Statistical Programming Manifesto · Declaration · Essay · AI Extension · 2026
We built one of the most technically sophisticated disciplines in pharmaceutical science. Then we forgot what it was for.
If you removed SDTM, ADaM, and TFLs entirely from our world tomorrow, what percentage of statistical programming organizations could still meaningfully contribute to a scientific or clinical decision?
The honest estimate: less than 20%. That is the reason this manifesto exists.
We are uncovering better ways to turn clinical data into decisions —
by doing this work and helping others grow in it.
Through twenty-nine years of this work, we have come to value:
For decades, statistical programmers did something genuinely impressive. They built macro libraries, standardized workflows, mastered CDISC, and relentlessly engineered the speed and quality of their outputs. SDTM datasets, ADaM structures, TFLs — produced faster, cleaner, more consistently than ever before.
And in doing so, they forgot one question: what are those outputs actually for?
The languages changed. SAS gave way to R. Proprietary tools gave way to open-source communities like Pharmaverse. But the underlying logic — receive a specification, produce an output, move on — remained perfectly, stubbornly intact.
Leaders doubled down on how, never asking what for. Efficiency became the religion. Effectiveness never made it into the creed.
The result: a profession structurally incapable of answering the question that matters most. Not "did we produce the output?" — but "did it help anyone decide anything worth deciding?"
We became world-class at producing artifacts. We became strangers to the science those artifacts were meant to serve.
There was a specific moment when the fracture began. It was not a crisis. It was a job posting.
The moment the role of "SAS Programmer" was formally separated from "Statistician," scientific understanding and operational execution were cleaved apart. Statistics would do the thinking. Programming would do the doing. It seemed efficient. It was the beginning of a slow hollowing out.
CDISC arrived in 1997 and made things worse in a way nobody intended. Standards meant to enable scientific communication became the cognitive boundary of an entire profession. If it fits the standard, it ships. If it doesn't, it gets bent until it does. Fit-for-purpose gave way to fit-for-compliance.
Offshoring deepened the wound. The promise: do the same work, cheaper, faster. The reality: more specifications, more oversight, more review cycles. An entire bureaucracy built to describe what to build — instead of helping people understand why they were building it.
Attrition in offshore locations remained persistently higher than onshore. The cycle reset again and again: train, document, hand over, lose, repeat. We did not create a more efficient system. We created the same inefficiency — distributed across more time zones.
Ask a statistical programmer what their output means clinically. Listen for the silence that follows.
| Figure | What it marks |
|---|---|
| < 20% | Estimated share who could contribute meaningfully to a clinical decision without standards-driven outputs |
| 1997 | CDISC founded: standardization accelerated, and with it the institutionalization of the execution-only mindset |
| ∞ | Pages of specification written to describe what to program, instead of time spent understanding why |
This is not a failure of competence. That would be easier to fix. It is something more subtle and more dangerous: well-governed drift. Doing the right things, in the right processes, toward destinations we never chose consciously.
I believe statistical programming is not a support function. It is the intelligence layer of drug development. And I believe the profession is at a pivotal moment — not because AI is arriving, but because what AI takes over will leave us exactly the space we always deserved: the space where understanding, judgment, and wisdom live.
I believe data without understanding is noise dressed up as output. I learned that from a boxplot I couldn't explain in Düsseldorf in 1997. I had produced the plot. It was technically correct. It made no sense whatsoever. My supervisor told me as much. So I sat down, studied the variables, understood how they were collected and what they actually measured. Only then could I show something meaningful. I have not forgotten it.
I believe efficiency is what AI will give us. Effectiveness is what we have to choose. In an industry where the wrong decision costs lives and the right one saves them, that choice is not a luxury. It is the job.
I believe we must stop the tool war. The debate between SAS and open source — between proprietary and Pharmaverse — is distraction at its purest form. Every hour spent arguing about the oven is an hour not spent thinking about the guest. The question was never which language you write in. It was always whether you understand what you are trying to say.
I believe the highest form of this work is not the deliverable. It is the decision it enables. The table, the listing, the figure — these are the mise en place. What we are here to produce is clarity, in the hands of someone who needs it to make a choice that matters.
The Data Caterer™ is not a role title. It is a stance toward the profession. It says: I know the guest before they sit down. I curate with intent. I govern with transparency. I synthesize across time and domain. And I remember, always, that the guest at the end of the table is not a clinician or a regulator.
It is a patient.
Here is the hard truth that is already beginning to land across the industry: AI can write code. It can read specifications. It can produce CDISC-conformant outputs. It can generate TFLs from ADaM structures at a fraction of current cost and time.
Everything our profession optimized for — execution speed, standard compliance, output volume — is now automatable. The checklist-follower is the easiest knowledge worker in the industry to replace.
AI will not make you irrelevant. The irrelevance was already there. AI will simply make it visible.
The programmers who will survive this are those who understood the science while writing the code. Those who could sit with a clinical team and translate research questions into analytical strategy. Those who owned their relevance — not just their output.
They are rare. They always were. The question is whether you choose to become one of them.
Every kitchen has a hierarchy. At the bottom: the Line Cook. Receives orders. Executes to spec. Technically proficient, disconnected from the guest. Data is a deliverable — produced, handed over, forgotten. The role is defined by the task, not the question behind it.
This is where statistical programming has lived for thirty years. It is where it cannot afford to remain.
The destination is the Data Caterer™: the professional who knows the guest before they sit down. Who curates, governs, and orchestrates data across the full pharma value chain — from protocol design to post-approval evidence. Not just execution: experience, wisdom, and the ability to build lasting knowledge.
The Data Caterer is not a title. It is a posture. It is the decision to understand the clinical question behind every dataset, to speak the language of the physician and the regulator, to design the analytical kitchen so others can cook the last mile.
This transformation is 80% mindset and 20% technology. No tool will give it to you. No offshoring model will produce it. No specification will specify it into existence. It must be chosen — actively, persistently, personally.
The courses below describe how data is served to different customers in different contexts. They are not a curriculum. They are not a technology checklist. Mastering dashboards does not make you a Data Caterer. Mastering one-pagers does not make you a Data Caterer. What makes you a Data Caterer is what sits underneath every course — foundational capabilities without which no format, however elegant, produces anything of value:
| | The Line Cook: Where We Are | The Data Caterer™: Where We Must Go |
|---|---|---|
| Orientation | Receives orders. Produces outputs. | Knows the guest before they sit down. |
| Thinking | Thinks in standards, not questions. | Thinks in clinical questions, not specs. |
| Posture | Waits to be told what to build. | Enters rooms where decisions are made. |
| Measure | Delivery speed. | Impact on decisions. |
| Connection | Disconnected from clinical decisions. | Connected across the full value chain. |
| In the Age of AI | Replaceable by a well-prompted model. | Irreplaceable because of accumulated wisdom. |
And they should be cooking for each other.
There is a misread of the menu I keep encountering. Readers see the five courses laid out — Fix-Prix through Omakase — and they read it as a ladder. Start at the bottom. Graduate upward. One day, when I am senior enough, I will cook Omakase.
This is Line Cook thinking with ambition bolted on. It is not what the menu says.
Every stage of the pharma value chain uses all five courses. What changes is the mix. A Data Caterer in regulatory submissions is not stuck at Fix-Prix forever. They are someone who knows when a reviewer's question demands a forty-eight-hour Omakase response synthesizing registry data — and has the wisdom to deliver it without breaking the submission. A Data Caterer in Medical Affairs is not "beyond" Fix-Prix. They know Fix-Prix barely belongs on their station at all, and they have stopped producing it just because tradition says they should.
The skill is not ascending the menu. It is reading the room and choosing the course.
And the kitchen where this reading happens has seven stations.
| # | Station | The Question Being Asked |
|---|---|---|
| 01 | Non-clinical Research | Is this molecule worth a human? |
| 02 | Early Clinical Development (Ph I–IIa) | Does it do what we think it does in people? |
| 03 | Late Clinical Development (Ph IIb–III) | Does the benefit justify the risk — and for whom? |
| 04 | Regulatory Submission | Will a regulator trust this enough to approve? |
| 05 | Medical Affairs | How do clinicians actually use this — and what do they need to know? |
| 06 | HEOR / HTA | Is this worth paying for, and compared to what? |
| 07 | Commercial & Publication | Who benefits, and how do we tell that story truthfully? |
Seven questions. Seven cultures. Seven failure modes. And five courses that show up in every one of them — but in dramatically different proportions, with dramatically different stakes.
Station 01, Non-clinical Research: Is this molecule worth a human?
In vitro assays, in vivo pharmacokinetics and pharmacodynamics, toxicology, target engagement studies, animal models. Data that will never see a CDISC structure and should not.
Omakase-dominant. There is no submission standard for "interesting hypothesis." The decision to progress a molecule is a synthesis of fragmented evidence across biological domains. Fine Dining emerges at go/no-go committees where one chart decides a three-year program. Fix-Prix barely exists — and when someone tries to force it here, they are mistaking the station for one they used to work in.
Station 02, Early Clinical Development (Ph I–IIa): Does it do what we think it does in people?
Phase I safety, PK, biomarkers, early efficacy signals, small-N adaptive designs. High stakes on small samples.
Fine Dining and Omakase. Decisions are fast and consequential. Fix-Prix is appropriate for the DSMB package — and nothing else. Build-Your-Own-Taco serves the clinical team probing signals as they emerge. The programmer who produces only submission-grade outputs here is producing the wrong thing for the wrong audience at the wrong moment.
Station 03, Late Clinical Development (Ph IIb–III): Does the benefit justify the risk, and for whom?
Large-N Phase II and III trials. SDTM. ADaM. TFLs. Safety databases. The canonical landscape statistical programming has called home for thirty years.
Fix-Prix and Raw Data Buffet by volume. But the decisions that matter — futility analyses, subgroup signals, adaptive design triggers, DSMB interactions — are Fine Dining and Omakase. This is the station where most programmers spend entire careers and still never leave Fix-Prix. Which is precisely the problem the manifesto names.
Station 04, Regulatory Submission: Will a regulator trust this enough to approve?
Integrated summaries of safety and efficacy. CDISC-compliant datasets. Reviewer-ready TFLs. Define-XML. The full regulatory stack.
Fix-Prix is non-negotiable and foundational. This is the one station in the kitchen where strict standards adherence is the entire point, not a failure of imagination. The regulator's trust is built on predictability. Do not improvise here during service. But the part people miss: Omakase emerges at two in the morning when a reviewer asks a question that cannot be answered from the submission alone.
Station 05, Medical Affairs: How do clinicians actually use this, and what do they need to know?
Post-marketing studies. Real-world evidence. Investigator-initiated trials. Advisory board outputs. Medical information queries running in the thousands.
Build-Your-Own-Taco and Fine Dining. Physicians asking ad-hoc questions need governed self-service. MSLs need one-page evidence summaries for specialist conversations held in hospital corridors. Fix-Prix is nearly irrelevant here. And yet many Medical Affairs teams still have programmers operating in pure Fix-Prix mode — producing outputs nobody reads, because the outputs are answers to questions nobody asked.
Station 06, HEOR / HTA: Is this worth paying for, and compared to what?
Cost-effectiveness models. Budget impact analyses. Real-world outcomes. Indirect treatment comparisons. Patient-reported outcomes. Payer dossiers.
Omakase masquerading as Fine Dining. The deliverable a payer sees looks like a single chart and a confidence interval. The work behind it is multi-year synthesis across trial data, registry data, literature, and health-economic modeling. This is one of the most sophisticated data-catering disciplines in the entire industry — and statistical programming is almost entirely absent from it. That absence is a profession-level strategic failure. The reimbursement decision determines whether patients ever receive the therapy their trial proved works.
Station 07, Commercial & Publication: Who benefits, and how do we tell that story truthfully?
Publication-ready analyses. Congress abstracts. Sales force materials. Patient journey data. Market research overlays.
Fine Dining is the entire point. One chart. One message. One decision a clinician will make about a patient on a Tuesday afternoon. Fix-Prix is irrelevant: nobody prescribes from a TFL. Omakase is dangerous: over-synthesis in commercial materials is where scientific integrity goes to die quietly, one caveat removed at a time.
The Value Chain Is Not a Pipeline.
We draw it as one. Arrows left to right. Non-clinical hands off to Early Clinical. Early Clinical hands off to Late. Submission hands off to Medical Affairs. And somewhere off the edge of most diagrams, HEOR and Commercial receive what they are given and do the best they can with it.
This is not a kitchen. This is a conveyor belt.
A real kitchen has stations that cook for each other. The sauté station times its finish to the grill. Mise en place on one station reduces chaos on three others. Nobody asks permission to communicate. The service depends on it.
Today, pharma's seven stations cook in isolation and hand off cold plates. Non-clinical does not listen to Medical Affairs. Submissions do not pre-answer HEOR's questions. Commercial is not constrained by what the submission team knows the data cannot support. Every handoff loses information. Every station reinvents what the previous one already knew.
The Data Caterer is the one who walks between stations carrying what each one needs from the others — often before they know they need it. Not because a process requires it. Because the patient at the end of the value chain does not care which station dropped the plate.
The menu tells you what to cook. The seven stations tell you where to cook it. What makes you a Data Caterer is knowing that the station next to yours is cooking for the same guest — and acting on that knowledge, every day, whether anyone asked you to or not.
Every profession has a trap that looks like rigor. Ours is the competency model. Name the skills, grid them out, score people against them, and something that looks like progress follows. It is a comforting exercise. It is also how the Line Cook was built.
The consulting world prescribes a T-Shape: one deep stem of expertise, one wide bar of transferable skills. It is an improvement on the I-Shape Line Cook — the programmer who is deep in SAS and shallow in everything else. But the T flattens a real truth about this profession: we do not need one deep skill. We need several — at different depths, for different stations — bound together by a crossbar of judgment and communication that turns craft into catering.
The honest shape is a comb. Multiple teeth, deliberately uneven, held together by a bar that is itself a skill. The teeth are where the craft lives. The bar is where the catering happens. Neither works without the other. And the whole comb changes shape depending on which station you are cooking at.
The comb has teeth of different lengths, and that is the point. No Data Caterer is equally deep in all of them. A Data Caterer in Late Clinical Development has a long Regulatory tooth; a Data Caterer in Non-Clinical Research barely has one at all. The depth is stationed to the work. The breadth is not negotiable.
Five teeth every Data Caterer carries — at whatever depth their station demands:
These five are the price of admission. The Line Cook mistakes them for the job. The Data Caterer knows they are the floor.
No amount of depth in the teeth produces a Data Caterer. A programmer with a twenty-year Regulatory tooth and nothing across the top is a deeply knowledgeable Line Cook. The crossbar is what makes the comb a comb — what takes craft out of the individual head and delivers it to a decision that has to be made.
The crossbar has three strands, braided together.
One without the other is not half a Data Caterer. It is something else entirely — and the profession is full of both halves, working past each other in parallel.
There is one more tooth. It does not fit cleanly into the craft teeth, and it does not sit on the crossbar, because it is growing faster than the comb itself.
That tooth is collaboration with AI. Not as a buzzword. Not as another tool to bolt onto the stack. As the single skill that determines whether the rest of the comb still matters in five years. The Line Cook's response to AI has three flavors: panic, dismissal, or the quiet hope that it will go away. None of them is a strategy.
The Data Caterer's response is different. It is to develop the one capability that makes every other skill on this list more valuable: the ability to direct an agentic system toward a clinical question worth answering, evaluate what it returns against a craft standard, and integrate its output into a decision a human will own.
A Data Caterer who cannot collaborate with AI will be replaced by one who can — and the replacement will not arrive from offshore. It will arrive from the colleague two desks over who decided, this quarter, to learn.
A comb is not a silhouette you grow once and carry forever. It bends to the work.
A Data Caterer in Regulatory Submission grows a long Regulatory tooth and a reinforced Technical tooth; their Therapeutic tooth can be shorter, because submission is not where new therapeutic questions are born. A Data Caterer in Medical Affairs grows a long Therapeutic tooth and a long Communication strand; their Regulatory tooth is present but modest. A Data Caterer in HEOR grows a Data Craft tooth longer than most late-phase programmers will ever build — because synthesis across registries, literature, health-economic models, and trial data demands it.
Different stations. Different combs. Same crossbar.
The skill is not to grow every tooth to the same length. It is to know which teeth your station demands, grow them deliberately, and never mistake a long tooth for a full comb.
I believe every programmer who reads this far owes themselves an honest answer to a small number of uncomfortable questions. Not to score themselves against a framework. To locate themselves against a destination.
If four of five are "no," the comb is mostly teeth. That is not a verdict. It is a diagnosis. The demands that follow are what to do about it.
This is not a memo to leadership. This is not a call to management. This is a demand that every statistical programmer makes of themselves — now, before the choice is made for them.
Artificial intelligence has arrived in statistical programming. The tools are real. The productivity gains are real. The threat — for those who have not yet made the transformation — is also real.
But not for the reason most people assume.
The assumption is that AI will replace statistical programmers. That is the wrong fear. The right fear is subtler: AI will not replace statistical programmers. It will replace the ones who were already replaceable.
A multiplier needs something to multiply.
Multiply AI by a programmer whose entire value lives in execution — translating specifications into code, mapping variables to SDTM domains, producing TFLs from a pre-approved shell — and you get faster execution. The output looks the same. It arrives sooner. The programmer becomes more productive at the work that was already at risk of automation.
This is not liberation. It is acceleration toward irrelevance.
Now multiply AI by a Data Caterer — someone who brings deep data understanding, regulatory judgment, scientific fluency, and stakeholder awareness to every engagement. The Data Caterer who once managed ten studies manages thirty. Not because they type faster. Because AI carries the execution load while the Caterer holds the judgment. The irreplaceable part — understanding what the data means, knowing when a finding is submission-threatening, reading the room in a DSMB — is not automated. It is amplified.
The value of AI to any individual is proportional to the value they brought before AI arrived.
If the answer to "what do you contribute beyond execution?" was unclear before, AI has made that question urgent.
The five foundational capabilities of the Data Caterer were always the standard. AI has not changed what the Data Caterer must be. It has changed how visible the gap has become — and introduced a new risk.
There is now a figure who did not exist five years ago: the AI-Assisted Imitation. The statistical programmer who has learned to operate AI tools fluently, who produces polished outputs, who speaks in the language of strategy, who moves quickly and sounds confident — but who lacks the underlying judgment that the Data Caterer brings.
The outputs are professional. The reasoning is borrowed. The caveat that should have been surfaced was not surfaced — because surfacing it required knowing the regulatory history, understanding the estimand implications, having enough scientific context to sense that something was off. The AI did not know. The programmer did not know. The output went forward.
The Line Cook was always recognizable. The specification arrived, the code was produced, the review was completed. The limits of the role were visible.
The AI-Assisted Imitation is not recognizable until something goes wrong.
This is not a critique of AI. It is a critique of mistaking fluency with tools for depth of understanding. The Data Caterer framework has always been about the latter. In an AI-augmented world, that distinction has never mattered more.
Remove the AI. What remains?
The discipline needs to ask this question honestly. About its teams. About its structures. About itself.
This is not a warning document. The point of naming these risks is to make the alternative concrete.
The Data Caterer in an AI-augmented world does not use AI less than the Line Cook does. They use it more deliberately.
The five foundational capabilities remain unchanged. AI has not altered what the Data Caterer must be. It has made becoming one more urgent — and the cost of not becoming one more visible.
This extension was added to the Statistical Programming Manifesto in 2026, in response to the rapid integration of generative AI into clinical data workflows.
The checklist is over. The craft begins.
The profession of statistical programming is at an inflection point it will not revisit. Those who treat this as someone else's problem will find themselves with a very clean, very compliant, very automated replacement arriving within the decade.
Those who act now — who reclaim the science, own the judgment, and build toward the Data Caterer™ — will define what this discipline becomes. Not the Line Cook. Not the specification executor. The strategist at the table where decisions that affect patients are made.
The menu is in front of you. Every programmer must choose which course to master next.