Statistical Programming Manifesto · Declaration · Essay · AI Extension · 2026

Code Without Purpose.

We built one of the most technically sophisticated disciplines in pharmaceutical science. Then we forgot what it was for.

Sascha Ahrweiler · Global Head of Statistical Programming · Bayer AG · PHUSE Board Member · datacaterer.com
The question nobody wants to answer

If you removed SDTM, ADaM, and TFLs entirely from our world tomorrow, what percentage of statistical programming organizations could still meaningfully contribute to a scientific or clinical decision?

The honest estimate: less than 20%. That number is the reason this manifesto exists.

Core Manifesto · The Declaration

Manifesto for the Data Caterer™

We are uncovering better ways to turn clinical data into decisions — by doing this work and helping others grow in it.

Through twenty-nine years of this work, we have come to value:

Understanding the guest over executing the specification
Scientific judgment over standard compliance
Enabling decisions over delivering outputs
Owning the question over answering the task
Earning your seat at the table over waiting to be invited
That is, while there is value in the items on the right — specifications must be met, standards must be respected, outputs must be delivered — we value the items on the left more.

The patient is the ultimate guest. Every dataset exists to serve them. We refuse to lose sight of that.

Twelve Principles of the Data Caterer

01
The clinical question comes first. Before we write a line of code, we ask: what decision does this data need to support, and for whom?
02
We know the guest before they sit down. Understanding our audience — the physician, the regulator, the DSMB, the statistician — is not optional. It is the work.
03
Standards are a floor, not a ceiling. CDISC exists to enable. A programmer who cannot think beyond SDTM and ADaM has mistaken the plate for the meal.
04
Correctness without comprehension is a liability. We once produced a boxplot we could not explain. We will not let that happen again.
05
The specification is a starting point, not a destination. A great Data Caterer reads the protocol to understand the science — not just to follow the instructions.
06
We own our own development. No employer, certification body, or offshoring model will grow our judgment for us. That work is ours.
07
SAS versus R is the wrong question. The right question is: am I using my tools to advance understanding, or to protect my comfort zone?
08
AI will take the last mile. We must own the mile before it. The execution layer is automatable. The judgment, the wisdom, the scientific understanding — that is where we build irreplaceability.
09
Drift is a choice. The slow, comfortable slide into pure execution happens through a thousand small decisions. We choose differently.
10
The best data storytelling renders itself invisible. When a decision-maker acts with confidence, the data behind it has done its job. We serve the decision, not our own craft.
11
We curate the kitchen so others can cook. Great Data Caterers design systems, govern assets, and build platforms that multiply the intelligence of everyone who touches the data after them.
12
The patient is always in the room. Behind every dataset is a trial. Behind every trial is a patient who trusted the system with their health. We do not forget this.
Extended Manifesto · The Essay
01

We Optimized the Wrong Thing.

For decades, statistical programmers did something genuinely impressive. They built macro libraries, standardized workflows, mastered CDISC, and relentlessly engineered the speed and quality of their outputs. SDTM datasets, ADaM structures, TFLs — produced faster, cleaner, more consistently than ever before.

And in doing so, they forgot one question: what are those outputs actually for?

The languages changed. SAS gave way to R. Proprietary tools gave way to open-source communities like Pharmaverse. But the underlying logic — receive a specification, produce an output, move on — remained perfectly, stubbornly intact.

Leaders doubled down on how, never asking what for. Efficiency became the religion. Effectiveness never made it into the creed.

The result: a profession structurally incapable of answering the question that matters most. Not "did we produce the output?" — but "did it help anyone decide anything worth deciding?"

We became world-class at producing artifacts. We became strangers to the science those artifacts were meant to serve.


02

The Moment It Broke.

There was a specific moment when the fracture began. It was not a crisis. It was a job posting.

The moment the role of "SAS Programmer" was formally separated from "Statistician," scientific understanding and operational execution were cleaved apart. Statistics would do the thinking. Programming would do the doing. It seemed efficient. It was the beginning of a slow hollowing out.

CDISC arrived in 1997 and made things worse in a way nobody intended. Standards meant to enable scientific communication became the cognitive boundary of an entire profession. If it fits the standard, it ships. If it doesn't, it gets bent until it does. Fit-for-purpose gave way to fit-for-compliance.

Offshoring deepened the wound. The promise: do the same work, cheaper, faster. The reality: more specifications, more oversight, more review cycles. An entire bureaucracy built to describe what to build — instead of helping people understand why they were building it.

Attrition in offshore locations remained persistently higher than onshore. The cycle reset again and again: train, document, hand over, lose, repeat. We did not create a more efficient system. We created the same inefficiency — distributed across more time zones.

Ask a statistical programmer what their output means clinically. Listen for the silence that follows.

< 20% · Estimated share who could contribute meaningfully to a clinical decision without standards-driven outputs
1997 · CDISC founded; standardization accelerated, and with it, the institutionalization of the execution-only mindset
Pages of specification written to describe what to program — instead of time spent understanding why

03

The Field Stands Indicted.


04

What I Believe.

This is not a failure of competence. That would be easier to fix. It is something more subtle and more dangerous: well-governed drift. Doing the right things, in the right processes, toward destinations we never chose consciously.

I believe statistical programming is not a support function. It is the intelligence layer of drug development. And I believe the profession is at a pivotal moment — not because AI is arriving, but because what AI takes over will leave us exactly the space we always deserved: the space where understanding, judgment, and wisdom live.

I believe data without understanding is noise dressed up as output. I learned that in Düsseldorf in 1997, from a boxplot I couldn't explain. I had produced the plot. It was technically correct. It made no sense whatsoever. My supervisor told me as much. So I sat down, studied the variables, understood how they were collected and what they actually measured. Only then could I show something meaningful. I have not forgotten it.

I believe efficiency is what AI will give us. Effectiveness is what we have to choose. In an industry where the wrong decision costs lives and the right one saves them, that choice is not a luxury. It is the job.

I believe we must stop the tool war. The debate between SAS and open source, between proprietary and Pharmaverse, is distraction in its purest form. Every hour spent arguing about the oven is an hour not spent thinking about the guest. The question was never which language you write in. It was always whether you understand what you are trying to say.

I believe the highest form of this work is not the deliverable. It is the decision it enables. The table, the listing, the figure — these are the mise en place. What we are here to produce is clarity, in the hands of someone who needs it to make a choice that matters.

The Data Caterer™ is not a role title. It is a stance toward the profession. It says: I know the guest before they sit down. I curate with intent. I govern with transparency. I synthesize across time and domain. And I remember, always, that the guest at the end of the table is not a clinician or a regulator.

It is a patient.


05

AI Is Not the Threat. You Are.

Here is the hard truth that is already beginning to land across the industry: AI can write code. It can read specifications. It can produce CDISC-conformant outputs. It can generate TFLs from ADaM structures at a fraction of current cost and time.

Everything our profession optimized for — execution speed, standard compliance, output volume — is now automatable. The checklist-follower is the easiest knowledge worker in the industry to replace.

AI will not make you irrelevant. The irrelevance was already there. AI will simply make it visible.

The programmers who will survive this are those who understood the science while writing the code. Those who could sit with a clinical team and translate research questions into analytical strategy. Those who owned their relevance — not just their output.

They are rare. They always were. The question is whether you choose to become one of them.


06

The Destination: From Line Cook to Data Caterer.

Every kitchen has a hierarchy. At the bottom: the Line Cook. Receives orders. Executes to spec. Technically proficient, disconnected from the guest. Data is a deliverable — produced, handed over, forgotten. The role is defined by the task, not the question behind it.

This is where statistical programming has lived for thirty years. It is where it cannot afford to remain.

The destination is the Data Caterer™: the professional who knows the guest before they sit down. Who curates, governs, and orchestrates data across the full pharma value chain — from protocol design to post-approval evidence. Not just execution: experience, wisdom, and the ability to build lasting knowledge.

The Data Caterer is not a title. It is a posture. It is the decision to understand the clinical question behind every dataset, to speak the language of the physician and the regulator, to design the analytical kitchen so others can cook the last mile.

This transformation is 80% mindset and 20% technology. No tool will give it to you. No offshoring model will produce it. No specification will specify it into existence. It must be chosen — actively, persistently, personally.

The courses below describe how data is served to different customers in different contexts. They are not a curriculum. They are not a technology checklist. Mastering dashboards does not make you a Data Caterer. Mastering one-pagers does not make you a Data Caterer. What makes you a Data Caterer is what sits underneath every course — foundational capabilities without which no format, however elegant, produces anything of value:


07

The Journey Every Programmer Must Choose.

Dimension | The Line Cook: Where We Are | The Data Caterer™: Where We Must Go
Orientation | Receives orders. Produces outputs. | Knows the guest before they sit down.
Thinking | Thinks in standards, not questions. | Thinks in clinical questions, not specs.
Posture | Waits to be told what to build. | Enters rooms where decisions are made.
Measure | Delivery speed. | Impact on decisions.
Connection | Disconnected from clinical decisions. | Connected across the full value chain.
In the Age of AI | Replaceable by a well-prompted model. | Irreplaceable because of accumulated wisdom.

08

The Kitchen Has Seven Stations.

And they should be cooking for each other.

There is a misread of the menu I keep encountering. Readers see the five courses laid out — Fix-Prix through Omakase — and they read it as a ladder. Start at the bottom. Graduate upward. One day, when I am senior enough, I will cook Omakase.

This is Line Cook thinking with ambition bolted on. It is not what the menu says.

Every stage of the pharma value chain uses all five courses. What changes is the mix. A Data Caterer in regulatory submissions is not stuck at Fix-Prix forever. They are someone who knows when a reviewer's question demands a forty-eight-hour Omakase response synthesizing registry data — and has the wisdom to deliver it without breaking the submission. A Data Caterer in Medical Affairs is not "beyond" Fix-Prix. They know Fix-Prix barely belongs on their station at all, and they have stopped producing it just because tradition says they should.

The skill is not ascending the menu. It is reading the room and choosing the course.

And the kitchen where this reading happens has seven stations.

# | Station | The Question Being Asked
01 | Non-clinical Research | Is this molecule worth a human?
02 | Early Clinical Development (Ph I–IIa) | Does it do what we think it does in people?
03 | Late Clinical Development (Ph IIb–III) | Does the benefit justify the risk, and for whom?
04 | Regulatory Submission | Will a regulator trust this enough to approve?
05 | Medical Affairs | How do clinicians actually use this, and what do they need to know?
06 | HEOR / HTA | Is this worth paying for, and compared to what?
07 | Commercial & Publication | Who benefits, and how do we tell that story truthfully?

Seven questions. Seven cultures. Seven failure modes. And five courses that show up in every one of them — but in dramatically different proportions, with dramatically different stakes.

Station 01 Non-clinical Research

Is this molecule worth a human?

Native Data

In vitro assays, in vivo pharmacokinetics and pharmacodynamics, toxicology, target engagement studies, animal models. Data that will never see a CDISC structure and should not.

Course Mix

Omakase-dominant. There is no submission standard for "interesting hypothesis." The decision to progress a molecule is a synthesis of fragmented evidence across biological domains. Fine Dining emerges at go/no-go committees where one chart decides a three-year program. Fix-Prix barely exists — and when someone tries to force it here, they are mistaking the station for one they used to work in.

Cross-Pollination
Real-world evidence from approved analogous mechanisms should be shaping target selection. Patient registry data on disease progression should be informing animal model design. Instead, non-clinical teams routinely design experiments in isolation from the clinical reality their molecule will eventually face. The Data Caterer here pulls signal from Medical Affairs and HEOR — years before a human is dosed.
Station 02 Early Clinical Development

Does it do what we think it does in people?

Native Data

Phase I safety, PK, biomarkers, early efficacy signals, small-N adaptive designs. High stakes on small samples.

Course Mix

Fine Dining and Omakase. Decisions are fast and consequential. Fix-Prix is appropriate for the DSMB package — and nothing else. Build-Your-Own-Taco serves the clinical team probing signals as they emerge. The programmer who produces only submission-grade outputs here is producing the wrong thing for the wrong audience at the wrong moment.

Cross-Pollination
Historical control data from published trials or registries can rescue an underpowered Phase I by providing a credible comparator. PK data from similar molecules in late-phase development can front-load dose-finding by months. The programmer who produces only Phase I outputs misses the most valuable analysis: the one that connects this trial to five years of prior human data the company already owns — and has never learned to look at.
Station 03 Late Clinical Development

Does the benefit justify the risk — and for whom?

Native Data

Large-N Phase II and III trials. SDTM. ADaM. TFLs. Safety databases. The canonical landscape statistical programming has called home for thirty years.

Course Mix

Fix-Prix and Raw Data Buffet by volume. But the decisions that matter — futility analyses, subgroup signals, adaptive design triggers, DSMB interactions — are Fine Dining and Omakase. This is the station where most programmers spend entire careers and still never leave Fix-Prix. Which is precisely the problem the manifesto names.

Cross-Pollination
Early clinical biomarker data should be informing Phase III subgroup pre-specification. Real-world comparator data can contextualize effect sizes the trial alone cannot. The Line Cook produces the TFL. The Data Caterer produces the TFL, plus a second analysis that shows what the TFL means in the real-world patient population that will actually receive this therapy.
Station 04 Regulatory Submission

Will a regulator trust this enough to approve?

Native Data

Integrated summaries of safety and efficacy. CDISC-compliant datasets. Reviewer-ready TFLs. Define.xml. The full regulatory stack.

Course Mix

Fix-Prix is non-negotiable and foundational. This is the one station in the kitchen where strict standards adherence is the entire point, not a failure of imagination. The regulator's trust is built on predictability. Do not improvise here during service. But the part people miss: Omakase emerges at two in the morning when a reviewer asks a question that cannot be answered from the submission alone.

Cross-Pollination
Registry data. Published literature on similar mechanisms. Prior submission precedents available in public FDA reviews. External control arms. All of this belongs in the submission team's back pocket before the first information request arrives. The Data Caterer in submissions is the one who, at eleven at night on a Thursday, assembles a registry-based sensitivity analysis in six hours and saves the approval timeline. Line Cooks panic in that moment. Data Caterers have been preparing for it since Month One of the program.
Station 05 Medical Affairs

How do clinicians actually use this — and what do they need to know?

Native Data

Post-marketing studies. Real-world evidence. Investigator-initiated trials. Advisory board outputs. Medical information queries running in the thousands.

Course Mix

Build-Your-Own-Taco and Fine Dining. Physicians asking ad-hoc questions need governed self-service. MSLs need one-page evidence summaries for specialist conversations held in hospital corridors. Fix-Prix is nearly irrelevant here. And yet many Medical Affairs teams still have programmers operating in pure Fix-Prix mode — producing outputs nobody reads, because the outputs are answers to questions nobody asked.

Cross-Pollination
Late-phase clinical trial subgroup data should be re-served to Medical Affairs in a completely different format than it was submitted in. The Data Caterer closes the loop the industry has kept open for decades: submission data becomes Medical Affairs data becomes R&D intelligence. Today, that loop is a dotted line on an org chart. It should be the central nervous system of the company.
Station 06 HEOR / HTA

Is this worth paying for, and compared to what?

Native Data

Cost-effectiveness models. Budget impact analyses. Real-world outcomes. Indirect treatment comparisons. Patient-reported outcomes. Payer dossiers.

Course Mix

Omakase masquerading as Fine Dining. The deliverable a payer sees looks like a single chart and a confidence interval. The work behind it is multi-year synthesis across trial data, registry data, literature, and health-economic modeling. This is one of the most sophisticated data-catering disciplines in the entire industry — and statistical programming is almost entirely absent from it. That absence is a profession-level strategic failure. The reimbursement decision determines whether patients ever receive the therapy their trial proved works.

Cross-Pollination
Clinical trial ADaM datasets are routinely rebuilt from scratch for HEOR models — because programming never designed them to serve economic analysis. The Data Caterer designs clinical trial datasets with HEOR's downstream needs in mind from protocol day one. Not as a favor. As a core deliverable — because the reimbursement decision is the second gate to the patient, and our work is only finished when they have walked through both.
Station 07 Commercial & Publication

Who benefits, and how do we tell that story truthfully?

Native Data

Publication-ready analyses. Congress abstracts. Sales force materials. Patient journey data. Market research overlays.

Course Mix

Fine Dining is the entire point. One chart. One message. One decision a clinician will make about a patient on a Tuesday afternoon. Fix-Prix is irrelevant: nobody prescribes from a TFL. Omakase is dangerous: over-synthesis in commercial materials is where scientific integrity goes to die quietly, one caveat removed at a time.

Cross-Pollination
The most honest publication figures come from people who also worked on the submission — they know exactly what the data cannot support. The Data Caterer protects scientific integrity at the precise point where commercial pressure is highest — because they are the only person in the room who has actually read the full dataset, and they remember what the confidence intervals looked like before the slide was polished.

The Value Chain Is Not a Pipeline.

We draw it as one. Arrows left to right. Non-clinical hands off to Early Clinical. Early Clinical hands off to Late. Submission hands off to Medical Affairs. And somewhere off the edge of most diagrams, HEOR and Commercial receive what they are given and do the best they can with it.

This is not a kitchen. This is a conveyor belt.

A real kitchen has stations that cook for each other. The sauté station times its finish to the grill. Mise en place on one station reduces chaos on three others. Nobody asks permission to communicate. The service depends on it.

Today, pharma's seven stations cook in isolation and hand off cold plates. Non-clinical does not listen to Medical Affairs. Submissions do not pre-answer HEOR's questions. Commercial is not constrained by what the submission team knows the data cannot support. Every handoff loses information. Every station reinvents what the previous one already knew.

The Data Caterer is the one who walks between stations carrying what each one needs from the others — often before they know they need it. Not because a process requires it. Because the patient at the end of the value chain does not care which station dropped the plate.

The menu tells you what to cook. The seven stations tell you where to cook it. What makes you a Data Caterer is knowing that the station next to yours is cooking for the same guest — and acting on that knowledge, every day, whether anyone asked you to or not.


09

What We Carry to the Station.

Every profession has a trap that looks like rigor. Ours is the competency model. Name the skills, grid them out, score people against them, and something that looks like progress follows. It is a comforting exercise. It is also how the Line Cook was built.

The consulting world prescribes a T-Shape: one deep stem of expertise, one wide bar of transferable skills. It is an improvement on the I-Shape Line Cook — the programmer who is deep in SAS and shallow in everything else. But the T flattens a real truth about this profession: we do not need one deep skill. We need several — at different depths, for different stations — bound together by a crossbar of judgment and communication that turns craft into catering.

The honest shape is a comb. Multiple teeth, deliberately uneven, held together by a bar that is itself a skill. The teeth are where the craft lives. The bar is where the catering happens. Neither works without the other. And the whole comb changes shape depending on which station you are cooking at.

09.1

The Teeth: Deep Craft, Unevenly Held.

The comb has teeth of different lengths, and that is the point. No Data Caterer is equally deep in all of them. A Data Caterer in Late Clinical Development has a long Regulatory tooth; a Data Caterer in Non-Clinical Research barely has one at all. The depth is stationed to the work. The breadth is not negotiable.

Five teeth every Data Caterer carries — at whatever depth their station demands:

Data Craft
Not data "expertise" in the HR sense, but the ability to read a dataset the way a clinician reads a patient history. How the variables were collected. What distorts them. What they cannot tell you. What the protocol was actually trying to measure, versus what the CRF ended up capturing. A Data Caterer who cannot read a dataset this way is pretending with a LIBNAME statement.
Technical Craft
SAS, R, the Pharmaverse, the agentic tools already arriving. The instruments of the station. What matters is fluency, not tribalism. The debate about which language is superior has cost this profession a decade of attention that belonged to the guest.
Therapeutic Craft
The biology, the disease, the standard of care, the competitive landscape. What a clinician knows before they open your output. Without this tooth, the deepest Technical Craft produces beautiful tables about conditions we do not actually understand.
Analytical Craft
The statistical reasoning behind every method we apply. Not the macro that runs it. The logic that justifies it. Why this estimand and not another. Why this imputation strategy. Why this sensitivity analysis is the one that matters. The Line Cook runs the code. The Data Caterer can defend the code in the room where the decision is made.
Regulatory Craft
Knowing when the rules are non-negotiable, when fit-for-purpose wins, and where the line between them runs. This is often mistaken for SOP literacy. It is not. SOP literacy is knowing what to follow. Regulatory Craft is knowing when to follow it strictly, when to negotiate, and when to say — with evidence and conviction — that the standard is the wrong answer for this question.

These five are the price of admission. The Line Cook mistakes them for the job. The Data Caterer knows they are the floor.

09.2

The Crossbar: What Turns Craft Into Catering.

No amount of depth in the teeth produces a Data Caterer. A programmer with a twenty-year Regulatory tooth and nothing across the top is a deeply knowledgeable Line Cook. The crossbar is what makes the comb a comb — what takes craft out of the individual head and delivers it to a decision that has to be made.

The crossbar has three strands, braided together.

The Operating Strand
Project and compound management. Stakeholder orchestration. Leading and developing teams. Coaching others into their own judgment. These are the skills the profession has called "soft" for thirty years, and therefore treated as optional. I reject that completely. There is nothing soft about the skill that decides whether the patient's data ever reaches the decision-maker. These are the skills that determine whether what you know leaves your head at all.
The Judgment Strand
Problem solving — the real kind, not the ticket-closing kind. Holistic strategic thinking. Business acumen. Business curiosity. These are the skills that let a Data Caterer read the room and choose the course. Without them, even deep craft serves the wrong question with perfect form. With them, even modest craft serves the right question well enough to matter. The Judgment Strand is what elevates this work from reporting to advising.
The Communication Strand
The strand most programmers were never told they were hired for. The ability to defend a confidence interval to a commercial team under pressure. The ability to tell a clinician what their data cannot support without losing the relationship. The ability to stand in front of a DSMB and say, in language a physician can act on, what the evidence does and does not allow. The boxplot I could not explain in 1997 was a failure of this strand. Every programmer who has ever produced a technically correct output they could not defend in a meeting has experienced the same failure.

Teeth without the crossbar, or the crossbar without the teeth, is not half a Data Caterer. It is something else entirely, and the profession is full of both halves, working past each other in parallel.

09.3

The Frontier Tooth.

There is one more tooth. It does not fit cleanly into the craft teeth, and it does not sit on the crossbar, because it is growing faster than the comb itself.

The Frontier Tooth — Generative & Agentic AI Literacy

Not as a buzzword. Not as another tool to bolt onto the stack. As the single skill that determines whether the rest of the comb still matters in five years. The Line Cook's response to AI has three flavors: panic, dismissal, or the quiet hope that it will go away. None of them is a strategy.

The Data Caterer's response is different. It is to develop the one capability that makes every other skill on this list more valuable: the ability to direct an agentic system toward a clinical question worth answering, evaluate what it returns against a craft standard, and integrate its output into a decision a human will own.

A Data Caterer who cannot collaborate with AI will be replaced by one who can — and the replacement will not arrive from offshore. It will arrive from the colleague two desks over who decided, this quarter, to learn.

09.4

The Comb Bends to the Station.

A comb is not a silhouette you grow once and carry forever. It bends to the work.

A Data Caterer in Regulatory Submission grows a long Regulatory tooth and a reinforced Technical tooth; their Therapeutic tooth can be shorter, because submission is not where new therapeutic questions are born. A Data Caterer in Medical Affairs grows a long Therapeutic tooth and a long Communication strand; their Regulatory tooth is present but modest. A Data Caterer in HEOR grows a Data Craft tooth longer than most late-phase programmers will ever build — because synthesis across registries, literature, health-economic models, and trial data demands it.

Different stations. Different combs. Same crossbar.

The skill is not to grow every tooth to the same length. It is to know which teeth your station demands, grow them deliberately, and never mistake a long tooth for a full comb.

09.5

The Honest Self-Audit.

I believe every programmer who reads this far owes themselves an honest answer to a small number of uncomfortable questions. Not to score themselves against a framework. To locate themselves against a destination.

If four of five are "no," the comb is mostly teeth. That is not a verdict. It is a diagnosis. The demands that follow are what to do about it.


10

What We Demand of Ourselves.

This is not a memo to leadership. This is not a call to management. This is a demand that every statistical programmer makes of themselves — now, before the choice is made for them.


AI Extension · Companion to the Manifesto · 2026
VI

The Multiplier Problem.

Artificial intelligence has arrived in statistical programming. The tools are real. The productivity gains are real. The threat — for those who have not yet made the transformation — is also real.

But not for the reason most people assume.

The assumption is that AI will replace statistical programmers. That is the wrong fear. The right fear is subtler: AI will not replace statistical programmers. It will replace the ones who were already replaceable.

A multiplier needs something to multiply.

AI (the tool) × Line Cook (execution only) = Faster execution (accelerated irrelevance)

Multiply AI by a programmer whose entire value lives in execution — translating specifications into code, mapping variables to SDTM domains, producing TFLs from a pre-approved shell — and you get faster execution. The output looks the same. It arrives sooner. The programmer becomes more productive at the work that was already at risk of automation.

This is not liberation. It is acceleration toward irrelevance.

AI (the tool) × Data Caterer™ (judgment + science) = Irreplaceable at scale (strategic amplification)

Now multiply AI by a Data Caterer — someone who brings deep data understanding, regulatory judgment, scientific fluency, and stakeholder awareness to every engagement. The Data Caterer who once managed ten studies manages thirty. Not because they type faster. Because AI carries the execution load while the Caterer holds the judgment. The irreplaceable part — understanding what the data means, knowing when a finding is submission-threatening, reading the room in a DSMB — is not automated. It is amplified.

The value of AI to any individual is proportional to the value they brought before AI arrived.

If the answer to "what do you contribute beyond execution?" was unclear before, AI has made that question urgent.


VII

The New Warning: The AI-Assisted Imitation.

The five foundational capabilities of the Data Caterer were always the standard. AI has not changed what the Data Caterer must be. It has changed how visible the gap has become — and introduced a new risk.

New risk · 2026

There is now a figure who did not exist five years ago: the AI-Assisted Imitation. The statistical programmer who has learned to operate AI tools fluently, who produces polished outputs, who speaks in the language of strategy, who moves quickly and sounds confident — but who lacks the underlying judgment that the Data Caterer brings.

The outputs are professional. The reasoning is borrowed. The caveat that should have been surfaced was not surfaced — because surfacing it required knowing the regulatory history, understanding the estimand implications, having enough scientific context to sense that something was off. The AI did not know. The programmer did not know. The output went forward.

The Line Cook was always recognisable. The specification arrived, the code was produced, the review was completed. The limits of the role were visible.

The AI-Assisted Imitation is not recognisable — until something goes wrong.

This is not a critique of AI. It is a critique of mistaking fluency with tools for depth of understanding. The Data Caterer framework has always been about the latter. In an AI-augmented world, that distinction has never mattered more.

The Test

Remove the AI. What remains?

Data Caterer™: A practitioner with genuine data understanding, regulatory instinct, and scientific grounding. The AI was a force multiplier.
Imitation: Someone who can no longer perform at the level their outputs suggested. The AI was a mask.

The discipline needs to ask this question honestly. About its teams. About its structures. About itself.


VIII

What AI-Augmented Data Catering Actually Looks Like.

This is not a warning document. The point of naming these risks is to make the alternative concrete.

In an AI-augmented world, the Data Caterer does not use AI less than the Line Cook does. They use it more deliberately.

The five foundational capabilities remain unchanged. AI has not altered what the Data Caterer must be. It has made becoming one more urgent — and the cost of not becoming one more visible.

This extension was added to the Statistical Programming Manifesto in 2026, in response to the rapid integration of generative AI into clinical data workflows.


The checklist is over. The craft begins.

The profession of statistical programming is at an inflection point it will not revisit. Those who treat this as someone else's problem will find themselves with a very clean, very compliant, very automated replacement arriving within the decade.

Those who act now — who reclaim the science, own the judgment, and build toward the Data Caterer™ — will define what this discipline becomes. Not the Line Cook. Not the specification executor. The strategist at the table where decisions that affect patients are made.

The menu is in front of you. Every programmer must choose which course to master next.

Statistical Programming Manifesto · Version 1.3 · 2026
Declaration · Extended Essay · AI Extension (Sections VI–VIII)
Sascha Ahrweiler · Global Head of Statistical Programming & Analysis Strategy & Operations · Bayer AG
PHUSE Board Member, Director Communication & Brand Strategy
datacaterer.com