When AI Moves Past the Mundane: Inside Sanofi’s Drug Discovery Pipeline
For 30 episodes of Practical AI in Healthcare, we’ve documented a recurring pattern. Guest after guest, the real AI wins show up in operations: scheduling, documentation, billing, claims processing. We even coined a shorthand for it: “Mundane Wins Matter.”
Then we sat down with Matt Truppo.
The wins are in the hard science
Truppo is Global Head of Research Platforms and Computational R&D at Sanofi. He leads a team that has moved AI out of pilot projects and into the core of drug discovery. Not document generation. Not regulatory filings. Target identification, molecular design, and protein engineering.
The headline numbers are striking. Sanofi’s multimodal target identification engine, which integrates proteomics, transcriptomics, genetics, and clinical safety data, identified 10+ novel drug targets in the first 12 months of deployment. “It’s the first time in my career that I’ve seen that achieved internally,” Truppo said. In his 25 years in pharma R&D, novel targets like these had typically taken decades of academic lab work to surface.
From there, the team built a second-generation engine for multi-target combinations. It screened 30 million target pairs in a matter of days and expanded Sanofi’s early-stage preclinical portfolio by 40%. The goal: break efficacy ceilings by identifying combinations of targets that can be hit simultaneously with multi-specific antibodies.
On the small molecule side, Sanofi’s “AI Auto Lead” initiative tested AI-driven lead optimization by having the system suggest which compounds to synthesize. Of 200 candidates selected from a million screened in silico, 75% could be physically made, and one-third showed biological activity around 10 nanomolar. The automated lab that will close the full design-make-test loop is under construction, scheduled for Q1 2027.
But the real story is the data
What separates Sanofi’s results from the usual pharma AI press releases? Truppo was direct about it: the data.
Sanofi sits on hundreds of millions of paired antibody sequences from the Kymab and Ablynx acquisitions. They fine-tuned large protein language models on that proprietary data, and those models now outperform AlphaFold for Sanofi’s specific applications. The protein engineering cycle has been cut in half.
Truppo made a point that extends well beyond pharma. His team compared models trained on 30 million publication abstracts versus models trained on full text with images. The abstract-trained models looked right on the surface but carried the biases of how authors summarize their own work. The full-text models were “more grounded in truth.”
“The competitive moat,” as we put it to Truppo, “is decades of curated biological data, not better algorithms.” He agreed.
Where it falls down
Truppo was equally forthcoming about the gaps. Data integration across hundreds of acquisitions remains a persistent challenge. Explainability still falls short of what most real-world applications demand. And change management (getting scientists to adopt tools when they’re ready, without pushing them before they are) is harder than the technology itself.
His response to the organizational challenge: build a “bilingual workforce.” Sanofi trained executives first, then pushed AI literacy courses to thousands of employees. Not everyone needs to be an expert, Truppo argued, but everyone needs to be literate enough to know what questions to ask.
The 30-second advice
When we asked Truppo for his 30-second recommendation to other pharma companies, the answer was blunt: “Get your house in order. Data foundations. It is the not sexy, boring sounding thing, but is absolutely critical and foundational to everything else you’re gonna build.”
This is Part 1. In Part 2, Truppo will walk us through the clinical development side and his team’s work on digital patient twins.
🎙️ Listen to the full conversation: https://practicalaiinhealthcare.com/episodes/#S1E32
You Might Also Enjoy
- AI-Enabled Real-World Data for Biopharma with Shashi Shankar — Tackles the same broken pharma data landscape from the other side: how biopharma companies consume and curate real-world data for drug development.
- Computable Definitions for Drug Development with Aaron Kamauu — The data infrastructure and standards layer that underpins pharmaceutical AI, including fit-for-purpose data assessment and getting the foundations right before building models.
- Healthcare Data Quality and the PIQI Framework with Charlie Harp — “AI-ready data” as the prerequisite for AI success, with direct parallels to the data quality foundation Truppo’s drug discovery pipeline depends on.