
Choosing AI for Clinical Workflows: What the Transparency Index Tells Us About Model Quality

The life sciences industry is reaching a turning point. Large language models (LLMs) are no longer experimental tools; they are becoming embedded in the day-to-day work of protocol writing, SAP drafting, biostatistical programming, data analysis, CSR generation, and even the pre-drafting of regulatory communications. Organizations are beginning to ask not whether they should use AI, but which model they should trust with some of the most sensitive and scientifically consequential tasks in drug development.

This question becomes even more urgent when we look at the 2025 Foundation Model Transparency Index (FMTI), which evaluates how openly model developers disclose information about data, compute, evaluation methods, and governance. The findings show a steep decline in transparency overall. And as illustrated in the graph below, only a handful of companies — most notably IBM, Writer, and AI21 Labs — score above 60%. Many of the most widely used frontier models fall far lower, with OpenAI at 35%, Google at 41%, and Anthropic at 46%. For clinical development teams deciding which LLM to integrate into validated workflows, these discrepancies are too large to ignore.

 

 

Why transparency must be the first filter for model selection

Clinical development is a regulated environment where every analytical step must be traceable, auditable, and scientifically defensible. That makes transparency the first criterion — not model size, not benchmark performance, not popularity. Transparency determines whether you can validate a model’s outputs, understand its failure modes, and integrate it safely into processes governed by GxP expectations and regulatory submissions.

The findings offer a stark reminder that transparency is not evenly distributed across the AI ecosystem. A large gap separates enterprise-focused developers — who tend to score high on transparency — from consumer-facing or hybrid companies, whose disclosure practices are far more limited. For pharma, the companies at the top of the transparency rankings are the ones most aligned with enterprise governance needs.

 

Understanding what really matters: Data, compute, and evaluation rigor

The FMTI highlights where transparency is most lacking: training data provenance and training compute. For clinical development, these blind spots matter because training data influence a model’s understanding of medical terminology, regulatory structure, statistical concepts, and scientific nuance. Without clarity on data sources, organizations cannot evaluate whether an LLM was exposed to clinical trial–relevant content, whether copyrighted text was used, or whether biases exist that could affect outputs like safety narratives or eligibility criteria.

Compute transparency may seem less vital, but it correlates strongly with engineering discipline, model stability, and reproducibility. Models backed by clear documentation of training processes tend to produce more reliable, less erratic outputs — qualities that matter when an AI system is writing code, generating protocol text, or summarizing patient data.

The same applies to evaluation rigor. While many companies publish capability claims, very few release details sufficient for independent replication. The FMTI graph helps contextualize this: companies with the highest scores are also those more likely to provide reproducible evaluations and detailed documentation. These are crucial qualities when determining whether a model can handle clinical tasks such as explaining statistical tests, drafting SAP language, or interpreting adverse event patterns.

 

Choosing the right partner, not just the right model

The transparency disparities shown in the graph underscore an essential reality: selecting an LLM for clinical development is as much about choosing the developer as the model. Enterprise-focused developers consistently score higher on transparency than frontier labs and consumer-oriented companies because they design their systems with compliance, governance, and documentation in mind.

When evaluating AI vendors for clinical development, organizations should look for evidence of stable disclosure practices, detailed model documentation, governance frameworks, and clarity about training processes. Companies with declining transparency — those trending downward in the FMTI rankings — introduce risk, especially as regulators begin requiring more visibility into the AI systems used in biomedical workflows.

 

A transparency framework for selecting LLMs

Using insights from the FMTI, organizations can apply a simple but powerful sequence when choosing an LLM:

  1. Start with transparency: eliminate models whose developers do not disclose how they were trained or evaluated.
  2. Evaluate domain relevance: determine whether the training data and tuning strategies support clinical and biomedical reasoning.
  3. Assess methodological reproducibility: ensure the model’s documented performance can be independently validated.
  4. Consider governance maturity: prioritize developers with clear update logs, risk policies, and enterprise support systems.

The companies at the top of the graph tend to check these boxes. Those at the bottom typically do not.
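The four-step sequence above can be read as an ordered set of filters. The following is a minimal sketch of that idea, not an established scoring tool: the `Candidate` fields, the model names, and the choice to require disclosure of both training and evaluation in step 1 are all assumptions made for illustration, and in practice each yes/no flag would come from a documented vendor assessment.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical assessment record for one model/developer."""
    name: str
    discloses_training: bool    # step 1
    discloses_evaluation: bool  # step 1
    domain_relevant: bool       # step 2
    reproducible: bool          # step 3
    governance_mature: bool     # step 4


def shortlist(candidates):
    """Apply the four filters in order.

    Returns (kept_names, rejected) where rejected maps each dropped
    candidate to the first step it failed.
    """
    steps = [
        ("transparency", lambda c: c.discloses_training and c.discloses_evaluation),
        ("domain relevance", lambda c: c.domain_relevant),
        ("reproducibility", lambda c: c.reproducible),
        ("governance", lambda c: c.governance_mature),
    ]
    kept, rejected = [], {}
    for c in candidates:
        for label, passes in steps:
            if not passes(c):
                rejected[c.name] = label  # record the first failed step only
                break
        else:
            kept.append(c.name)  # passed every step
    return kept, rejected


# Invented example records, not real vendors or real assessments.
candidates = [
    Candidate("model_a", True, True, True, True, True),
    Candidate("model_b", False, True, True, True, True),
    Candidate("model_c", True, True, True, False, True),
]
kept, rejected = shortlist(candidates)
print(kept)
print(rejected)
```

Recording the first failed step, rather than a single pass/fail flag, keeps the reason for each exclusion on file, which fits the traceability expectations described earlier.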

 

Transparency by design for AI agents

The 2025 FMTI suggests that as AI systems increasingly take the form of agents embedded across clinical development workflows, transparency may need to be considered early in the design process. AI agents that support activities such as protocol drafting, statistical interpretation, regulatory pre-authoring, or workflow orchestration can introduce additional complexity, particularly when the underlying models operate as opaque systems. In these contexts, limited visibility into how models behave may make validation, monitoring, and risk assessment more challenging.

For those of us in the life sciences industry, the use of AI agents in clinical development may be more sustainable when their behavior and decision pathways can be traced and reviewed. Regulatory evaluation typically extends beyond final outputs to include how decisions were formed and what controls were in place. If AI contributes to protocol language, safety summaries, or analytical reasoning, the ability to explain inputs, assumptions, and limitations could become increasingly important. This, in turn, may depend on working with model providers that offer sufficient documentation around training approaches, evaluation practices, and governance structures.

The transparency differences highlighted by the FMTI indicate that choosing an LLM may also involve selecting a partner with whom long-term governance and compliance considerations can be aligned. Models developed by providers with stronger disclosure practices may offer advantages when building AI systems that require auditability, reproducibility, and regulatory readiness. For organizations exploring the use of AI agents in clinical development, transparency may therefore be one of several factors that influence how confidently such systems can be scaled over time.

 

Read the Foundation Model Transparency Index 2025 report.

Beyond the Database: How Clinical Data Management Transforms Patient Care

When we think about clinical data management (CDM), it is often easy to picture databases, spreadsheets, and documents for days. However, being able to step into a clinic setting and witness how data-driven decisions shape patient care reveals the true impact of CDM.

Here, I share real-world examples of the impact of clinical data management on patients and what lies ahead for the field as technology advances.

 

From data to decisions: The impact of clinical data management in the clinical setting

Every piece of data collected during a clinical trial, be it lab results, procedure information, patient reported outcomes, or even adverse events, tells a story. During trials, these individual stories converge to guide treatment plans, ensure safety, and improve outcomes. Accuracy and speed are absolutely critical when it comes to data entry and processing as it allows clinicians to make informed decisions without delay, reducing risks for patients. Without this precision, even groundbreaking therapies can stumble due to incomplete or unreliable information.

 

Real-world examples of CDM impact

Spotting issues early

In an oncology trial, centralized monitoring picked up unusual liver enzyme levels across several patients. Because of that insight, clinicians were able to tweak treatment plans right away, preventing serious side effects and keeping patients safe.

 

Identifying dosing mistakes

During a diabetes study, data checks uncovered inconsistencies in insulin doses. Fixing those errors ensured patients got the right amount of medication, reducing the risk of hypoglycemia and keeping the study on track.

 

Keeping patients engaged

Real-time data review revealed a trend of missed visits in a cardiovascular trial. Sharing this with site teams led to proactive outreach, helping patients stay on schedule and reducing dropout rates.

 

Bridging science and care

Clinical data managers play a behind-the-scenes role, but their work directly influences what happens in the exam room. For example:

 

Keeping data consistent

Consistency ensures that trial results are reliable and can be applied to real-world care, not just on paper.

 

Building trust in the numbers

Data integrity means clinicians can rely on the information when adjusting dosages or monitoring side effects. No second-guessing, just confidence.

 

Protecting patients and speeding up progress

Regulatory compliance isn’t just about ticking boxes — it keeps patients safe and helps move promising therapies from research to approval faster.

 

Better communication

Real-time data sharing helps patients stay informed about their progress, reducing uncertainty.

 

Fewer repeat visits

Catching errors early means patients avoid unnecessary trips back to the clinic, saving time and stress.

 

The human element — My perspective

As a Principal Clinical Data Manager, I’ve had the privilege of seeing this impact firsthand. One moment that stands out was during a rare disease trial where every day mattered for patients waiting for treatment. By streamlining data cleaning and resolving queries quickly, we helped lock the database ahead of schedule. Knowing that this effort contributed to patients receiving life-changing therapy sooner was incredibly rewarding.

It’s in these moments that the connection between data and human lives becomes crystal clear. Behind every query, every validation check, there’s a patient hoping for better health, and that’s what drives our work. CDM is not just about compliance; it’s about compassion through precision.

 

Looking ahead

As technology advances, the integration of real-time data and AI-driven insights will make clinical data management even more impactful. The clinic will become a hub where data flows seamlessly, supporting personalized medicine and improving patient experiences. Predictive analytics could help identify risks before they occur, and automation will free up time for deeper analysis. The future of CDM isn’t just about managing data; it’s about transforming care.

In short, clinical data management isn’t just a technical process; it’s a human story where every detail matters.

 


The Medical AI Superintelligence Test and NOHARM: A New Framework for Assessing Clinical Safety in AI Systems

Artificial intelligence has become an increasingly common tool in medical decision-making. Physicians consult large language models (LLMs) for diagnostic reasoning, documentation, and summarization; patients use them to interpret symptoms; and health systems continue to integrate them into clinical workflows. Yet a basic question remains insufficiently answered: How safe are these systems when their outputs influence real medical decisions?

A recent initiative under Arise AI, centered around the NOHARM benchmark, offers one of the most rigorous evaluations of clinical safety to date. Its findings, and the broader accountability framework behind it, have implications not only for direct patient care but also for clinical development, medical writing, pharmacovigilance, and regulatory documentation. Importantly, the study highlights patterns of AI failure that closely mirror risks encountered when using LLMs for complex scientific and regulatory work.

 

A benchmark designed around real patient harm

NOHARM evaluates LLMs using one hundred real physician-to-specialist consultation cases across ten specialties. Instead of relying on synthetic questions or knowledge tests, the benchmark measures whether AI-generated recommendations could expose patients to harm. More than 4,000 plausible medical actions were annotated by specialists for clinical appropriateness and potential harm, allowing the framework to assess both errors of commission (unsafe recommendations) and omission (failing to recommend necessary actions).
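To make the distinction between commission and omission concrete, here is a minimal sketch of how one case could be checked against specialist annotations. It is not the NOHARM scoring method: the three labels, the function name, and the placeholder action names are assumptions chosen for illustration.

```python
def classify_errors(recommended, annotations):
    """Split a model's errors for one case into commission and omission.

    recommended: iterable of action names the model proposed.
    annotations: dict mapping every annotated action to one of
                 "required", "harmful", or "neutral" (hypothetical labels).
    """
    recommended = set(recommended)
    # Commission: the model recommended something annotated as harmful.
    commission = sorted(a for a in recommended if annotations.get(a) == "harmful")
    # Omission: an action annotated as required was never recommended.
    omission = sorted(
        a for a, label in annotations.items()
        if label == "required" and a not in recommended
    )
    return commission, omission


# Invented placeholder case, not clinical content.
annotations = {
    "action_a": "required",
    "action_b": "required",
    "action_c": "harmful",
    "action_d": "neutral",
}
model_output = ["action_a", "action_c"]
commission, omission = classify_errors(model_output, annotations)
print("commission:", commission)
print("omission:", omission)
```

Note that the omission check only works because the annotation set lists what should have been recommended; output that merely looks plausible gives no signal about what is missing.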

The benchmark sits within the broader Medical AI Superintelligence Test (MAST) initiative, led by Harvard and Stanford and hosted on bench.arise-ai.org, which aims to provide ongoing public evaluation of LLMs used in healthcare settings. By publishing comparative and transparent performance metrics (including safety, completeness, precision, and harm rates), MAST serves as a standardized accountability structure for medical AI systems.

 

Key findings from the study

The results provide a nuanced view of current medical AI capabilities:

  • Harm remains a measurable risk. Some LLMs produced severely harmful recommendations in more than 20% of cases.
  • Omissions are the dominant failure mode. Over three-quarters of severe errors involved missing essential actions rather than giving incorrect ones.
  • Model “strength” does not predict safety. Size, recency, and performance on general AI benchmarks had limited correlation with clinical safety.
  • Top models can outperform physicians. In a subset of cases, the best LLMs demonstrated higher safety and completeness than generalist clinicians.
  • Hybrid systems improve outcomes. Multi-agent configurations — where one model critiques or revises another — showed materially lower harm rates.

Collectively, these findings emphasize that clinical safety must be evaluated directly; it cannot be inferred from general intelligence or linguistic fluency.

 

Relevance beyond clinical care: Implications for clinical development

Although NOHARM focuses on medical recommendations, its insights apply directly to workflows in clinical development, where LLMs are increasingly used for drafting protocols, summarizing analyses, generating safety narratives, and producing Clinical Study Reports (CSRs). The risk profile is different — regulators, rather than patients, are the primary audience — but the core failure mode identified in NOHARM is the same: AI systems frequently omit essential information while producing text that appears complete.

These omissions can lead to incomplete evidence packages, insufficient traceability, inconsistencies with statistical outputs, and regulatory challenges. The study therefore reinforces the need for structured validation processes when using LLMs in high-stakes regulatory environments.

 

The CSR example: Completeness as a safety criterion

A clinical study report requires comprehensive reporting: methodology, protocol deviations, statistical analyses, safety findings, and linked tables, figures, and listings. While LLMs can streamline drafting and improve clarity, they do not reliably identify which elements are required for regulatory compliance. As NOHARM demonstrates, even highly capable models often omit critical actions or fail to include context necessary for safety.

This parallels the risk in clinical documentation: a well-written but incomplete CSR is not simply inconvenient — it can delay submission timelines, trigger regulatory questions, or obscure important safety signals. Ensuring completeness therefore becomes a core safety requirement.
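A completeness check of this kind can be partly automated. The sketch below is a deliberately simple illustration: the required list is taken from the elements named in this section rather than from a regulatory template, the function name is hypothetical, and exact heading matching is an assumption that a real workflow would replace with a mapping to the applicable guideline.

```python
# Elements named in the text above; a real check would be driven by the
# applicable regulatory template, not by this short list.
REQUIRED_ELEMENTS = [
    "methodology",
    "protocol deviations",
    "statistical analyses",
    "safety findings",
    "tables, figures, and listings",
]


def missing_elements(draft_headings, required=REQUIRED_ELEMENTS):
    """Return the required elements with no matching heading in the draft.

    Matching is case-insensitive but otherwise exact (a simplification).
    """
    present = {h.strip().lower() for h in draft_headings}
    return [r for r in required if r not in present]


# Invented draft outline for illustration.
draft = ["Methodology", "Statistical Analyses", "Safety Findings"]
print(missing_elements(draft))
```

A check like this only flags absent sections; whether a section that is present is also complete still needs expert review.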

 

The necessity of human-in-the-loop systems

One of the clearest insights from the NOHARM study is that hybrid systems outperform both standalone AI models and standalone human reviewers. Multi-agent architectures reduce harmful outputs, and expert human oversight further ensures contextual accuracy, completeness, and regulatory fidelity. In clinical development, this means that LLMs should support — but not replace — experienced medical writers, clinical scientists, statisticians, and safety physicians.

A well-designed workflow leverages AI for efficiency while relying on human expertise for judgment, quality control, and risk mitigation. This aligns with the MAST vision of AI systems operating under ongoing, benchmarked evaluation rather than unmonitored deployment.

 

A path forward: Benchmark-aligned, hybrid AI for regulated medicine

The NOHARM study and the broader Arise AI benchmarking platform represent a shift toward transparent, safety-focused evaluation of medical AI. They show that:

  • Safety and completeness require explicit measurement.
  • Omission is a primary source of AI risk in both clinical and regulatory contexts.
  • Multi-agent and human-in-the-loop systems materially reduce harm.
  • Public, standardized benchmarking supports accountability and informed adoption.

For organizations exploring or deploying AI in clinical development, the message is straightforward: LLMs can accelerate work and improve consistency, but only when embedded within systems designed to detect and mitigate the very risks NOHARM identifies. With rigorous evaluation, hybrid architectures, and expert oversight, AI can be integrated into medical and regulatory workflows in a way that advances both efficiency and safety.

 

Interested in learning more?

Consult the preprint by David Wu et al., “First, do NOHARM: Towards clinically safe large language models” and access the interactive NOHARM leaderboard to see model performance.

Empowering Patient Engagement in HTA: Lessons from an AI-Generated Plain Language Summary Case Study

The challenge: Making HTA understandable to everyone

Health technology assessments (HTAs) play a critical role in determining which treatments and innovations are adopted within healthcare systems. However, the technical language and complexity of HTA reports often make them inaccessible to patients and caregivers — the very individuals whose lives these decisions affect the most.

Plain Language Summaries (PLS) are designed to close this gap. They can translate HTA findings into clear, patient-friendly language, empowering people to engage meaningfully in healthcare decisions. Yet, producing high-quality PLS documents is a slow and resource-intensive process. Teams must balance scientific rigor with readability, cultural sensitivity, and accuracy — a demanding task that limits scalability.

This is where artificial intelligence (AI) offers a transformative opportunity.

 

The study: Can generative AI help bridge the communication gap?

At ISPOR Europe 2025, we presented a pioneering study exploring whether generative AI can create accurate and patient-friendly summaries from complex HTA documents.

Using a NICE Highly Specialized Technologies (HST) guidance on onasemnogene abeparvovec (a gene therapy for spinal muscular atrophy), the team tested Google Gemini, a large language model, to generate a full PLS automatically.

The AI-generated summary was evaluated across 18 quality measures covering readability, accuracy, relevance, and tone. A “human-in-the-loop” reviewer ensured alignment with patient communication standards and European HTA Regulation principles — integrating transparency and patient empowerment into the assessment.

 

The results: Speed meets substance

The results were striking. The AI produced an eight-page (2,570-word) PLS in just 15 seconds, structured around all key HTA components — disease context, treatment mechanism, clinical effectiveness, safety, and patient impact.

Across 18 evaluation criteria, the PLS achieved an average score of 8.27/10, reflecting strong alignment with plain language and patient-centered communication standards.

  • Mechanism simplicity (9.2/10) and plain language explanation (8.9/10) were top-performing categories, demonstrating Gemini’s ability to simplify complex gene therapy concepts without sacrificing accuracy.
  • The document met CEFR B1 readability, ensuring accessibility for non-specialist audiences.

However, the AI struggled with target population clarity (6.8/10) and unmet need articulation (6.5/10) — areas requiring deeper contextual and emotional nuance. These findings underscore the importance of maintaining a human role in refining and validating AI outputs, especially when tailoring content for specific patient groups.

 

The implications: Toward patient-centered HTA with AI

The study demonstrates that AI can accelerate and enhance the creation of patient-friendly HTA communications, promoting inclusivity and transparency in healthcare decision-making. But it also emphasizes that AI should complement, not replace, human expertise.

Generative AI tools like Gemini can help:

  • Scale patient engagement, enabling broader and faster dissemination of accessible HTA information.
  • Support regulatory compliance, aligning with EU HTA Regulation principles of transparency and participation.
  • Enhance health literacy, fostering more equitable and informed patient involvement.

Yet, meaningful adoption requires:

  • Human-in-the-loop systems to verify accuracy, tone, and contextual relevance.
  • Prompt optimization to capture nuances like unmet needs or cultural differences.
  • Ongoing validation to ensure reliability and regulatory alignment.

 

The conclusion: AI as a partner in patient empowerment

This work highlights how AI, when thoughtfully integrated, can make HTA more human-centered, transparent, and inclusive. Rather than automating empathy, it can help scale understanding — bringing patients into the conversation, not leaving them behind.

As HTA continues to evolve under new European regulations, embedding AI into communication workflows may mark a key step toward a truly patient-centered future — where every individual can understand, question, and contribute to the health decisions that shape their lives.

 

Interested in learning more?

Read the abstract published at ISPOR EUROPE 2025: “Can Generative AI Deliver Patient-Friendly Summaries? A Case Study Using NICE Guidance for Spinal Muscular Atrophy” by Manuel Cossio and Ramiro E. Gilardino.

A Preview of Cytel’s Contributions at PHUSE EU 2025

I can’t believe it has already been a year since we wrapped up PHUSE EU Connect 2024, and in two weeks we will be gathering for another exciting PHUSE EU Connect conference, only a few kilometers from Heidelberg, where everything started twenty years ago with the very first PHUSE event. I was one of the couple hundred lucky attendees and now, twenty years later, I have the great honor of supporting Jennie McGuirk and Jinesh Patel as Conference Co-chair for this year’s edition.

With a promising agenda featuring about 190 presentations, 34 posters, 9 hands-on workshops, 2 panel discussions, and 3 inspiring keynote speakers, this year we are going to the city of Hamburg for the 21st PHUSE EU Connect. The agenda is full of topics looking toward the future, with about 40 talks and posters referring to AI in their titles, and once again open source will be the confirmed leitmotif.

Cytel will make a significant contribution this year, perhaps more than ever, with six presentations, one poster, active participation in both panel discussions, and co-chairing the “Scripts, Macros and Automation” and “People Leadership & Management” streams.

 

Monday topics: Agile code writing, extracting metadata from R OOP functions, and leadership

The week kicks off on Monday with Kamil Foltynski, who will present “Overcoming Challenges in Collaborative Spreadsheet Editing with Shiny, SpreadJS and JSON-Patch” in the “Application Development” stream at 11:30 am. Kamil will provide a technical deep dive into enabling real-time spreadsheet editing within Shiny applications using tools such as SpreadJS, and will share key lessons learned so far. Following Kamil’s presentation, Eswara Satyanarayana Gunisetti will present “Micro-Decisions, Macro Impact: The Role of Agile Thinking in Every Line of Code” in the “Coding Tips & Tricks” stream at 12 pm. Eswara will share how an agile “mindset” can positively influence the way we write code. See his recent blog on the topic.

In the same stream, a few hours later at 2 pm, another colleague, Edward Gillian, in collaboration with Sanofi, will present “Risk.assessr: Extracting OOP Function Details,” discussing strategies for extracting metadata from R Object-Oriented Programming functions. Between Eswara’s and Edward’s sessions, at 1:30 pm, Kath Wright will moderate the interactive People Leadership & Management session “Invisible Glue: Trust, Influence and The Architecture of Teamwork.” In this live workshop, attendees will engage in practical exercises to learn how to identify barriers to trust, evaluate influence dynamics, and apply evidence-based strategies to strengthen collaboration in both physical and virtual environments.

 

Tuesday topics: Industry trends, extracting macro usage and dependency information from SAS programs, and integrating ECA data into CDISC-compliant datasets

Tuesday also brings two presentations and one poster. Right after lunch at 1:30 pm, Cedric Marchand will join other industry leaders in the panel discussion “Reimagining Statistical Programming: AI, Standards & the Talent of Tomorrow.” The panel will explore how current industry trends, such as AI, open source, and the evolution of data standards, will influence the next generation of statistical programmers.

The afternoon continues at 4 pm with my young and talented colleague Marie Poupelin, who will present “From Zero to Programming Hero: How Internships Shape Statistical Programmers in a CRO” in the “Professional Development” stream. Marie is a great example of the success of our internship program, and she will share her journey from having “zero” statistical programming experience to becoming an industry-ready programmer. Thirty minutes later, at 4:30 pm, Guido Wendland will present “Which Macros Are Used in the Study?” in the “Scripts, Macros and Automation” stream, a stream co-led this year for the first time by my colleague Sebastià Barceló. Guido will discuss techniques to extract macro usage and dependency information from SAS programs; this is particularly useful for identifying potential issues or estimating the impact of macro updates.

Later, in the traditional Tuesday evening poster session, you can join my colleague Cyril Sombrin in discussing “Our Journey in Integrating External Control Arms (ECAs) and RWD for Rare Disease Trials.” There you can discuss real-world case studies on integrating ECA data into CDISC-compliant datasets, exploring the unique challenges and solutions when aligning real-world data with CDISC standards.

 

Wednesday topics: Real-time spreadsheet editing within Shiny applications and real-time validation and streamlined submissions

On Wednesday at 12 pm, Hugo Signol, another talented young Cytel statistical programmer and a product of our internship program, will present his talk “From XPT to Dataset-JSON: Enabling Real-Time Validation and Streamlined Submissions.” Building on Cytel’s experience from the CDISC Dataset-JSON-Viewer Hackathon, Hugo will demonstrate a Shiny application that supports interactive exploration and real-time validation through API-based checks.

 

Meet us there!

Cytel will be at Booth 9 at the conference, where you can engage in discussions with our team or meet any of us throughout the week.

I hope I didn’t miss anyone, or anything! We look forward again to reuniting with colleagues and friends from around the world and meeting new acquaintances.

See you all in Hamburg!

Generative AI in Evidence Synthesis: Harnessing Potential with Responsibility

The integration of AI into the healthcare research landscape is accelerating, with one obvious area of application being evidence synthesis. From early scoping reviews to comprehensive systematic literature reviews (SLRs), AI promises to reduce manual burden and save time. However, it is crucial to understand both the strengths and limitations of using AI in this broad context to ensure compliance, reliability, and scientific rigor.

 

Knowing where it works: A targeted approach

Artificial intelligence, including generative AI models, shines when used for targeted literature reviews (TLRs) or when generating summaries of scientific articles to support evidence-based decision-making at an early development stage. AI can synthesize large volumes of information quickly, offering valuable insights during exploratory or early-phase research.

However, it’s critical to distinguish these from regulatory-facing systematic literature reviews, especially those intended for payer or health technology assessment (HTA) submissions. In this context, SLR extractions have traditionally been completed by two independent human reviewers. This human oversight ensures objectivity and reproducibility, key elements of regulatory compliance.

 

Expertly trained models vs. generalist giants

The current landscape is filled with large generalist language models trained on diverse internet-scale data. While impressive, these models often exhibit hallucinations — the generation of plausible but incorrect or fabricated content — particularly in domain-specific applications like evidence synthesis.

This is why domain-trained expert models are preferred. These models are fine-tuned on biomedical and scientific corpora, ensuring higher reliability and reducing the risk of misinterpretation or erroneous conclusions. They understand field-specific terminology, data structures, and compliance requirements far better than their generalist counterparts.

 

The imperative of data traceability

In evidence synthesis, transparency is non-negotiable. Any AI-generated output must allow users to:

  • Highlight the exact source (i.e., sentence or section) of the original scientific article from which a conclusion or data point was extracted.
  • Compare the model’s interpretation with the source text to identify discrepancies or nuances that could affect meaning or validity.

Using structured tags to annotate key terms, qualifiers, and relationships can not only make these comparisons clearer and more systematic but also inform advanced search and retrieval activities. By surfacing subtle differences, tagging supports expert review, preserves contextual integrity, and strengthens the reliability and defensibility of the synthesized evidence.
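As one illustration of the first requirement, locating the source sentence behind an extracted statement, the sketch below scores each sentence of the source text against the extracted claim using Python's standard-library `difflib.SequenceMatcher`. This is a simple character-level similarity, swapped in here for illustration; it is an assumption, not a description of how any particular evidence-synthesis tool works, and the sentence splitter and example text are invented.

```python
import re
from difflib import SequenceMatcher


def best_source_sentence(claim, source_text):
    """Return (index, sentence, score) of the sentence most similar to claim.

    Returns None if the source contains no sentences. Sentence splitting
    on ., ! or ? followed by whitespace is a rough assumption.
    """
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", source_text)
        if s.strip()
    ]
    if not sentences:
        return None
    scored = [
        (SequenceMatcher(None, claim.lower(), s.lower()).ratio(), i, s)
        for i, s in enumerate(sentences)
    ]
    score, index, sentence = max(scored)  # highest similarity wins
    return index, sentence, score


# Invented source passage and extracted claim for illustration.
source = (
    "A total of 120 patients were enrolled. "
    "The median follow-up was 14 months. "
    "No treatment-related deaths occurred."
)
claim = "Median follow-up was 14 months"
index, sentence, score = best_source_sentence(claim, source)
print(index, repr(sentence), round(score, 2))
```

Showing the matched sentence next to the extracted value gives a reviewer the side-by-side comparison described in the second bullet; a low similarity score is a prompt for closer inspection, not proof of an error.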

 

Measuring what matters: Precision and beyond

Traditional evaluation metrics like precision, recall, and F1 score (the harmonic mean of precision and recall) remain foundational when assessing AI model performance in literature screening and data extraction.
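For a screening task, these three metrics can be computed directly from the model's include/exclude decisions and a human reference standard. The following is a minimal sketch with invented decisions; the function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def screening_metrics(predicted, truth):
    """Compute precision, recall, and F1 for include/exclude decisions.

    predicted, truth: equal-length sequences of booleans (True = include).
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)       # correctly included
    fp = sum(1 for p, t in pairs if p and not t)   # wrongly included
    fn = sum(1 for p, t in pairs if t and not p)   # wrongly excluded
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Invented decisions for five records.
predicted = [True, True, False, True, False]
truth = [True, False, True, True, False]
precision, recall, f1 = screening_metrics(predicted, truth)
print(precision, recall, f1)
```

In literature screening, recall usually deserves the closest attention, because a wrongly excluded study (a false negative) is lost to the review, whereas a wrongly included one is caught at the next stage.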

But in generative contexts — where the task may be summarization, paraphrasing, or abstract reasoning — additional measures become valuable:

  • Answer correctness: Does the output convey a factual, verifiable point?
  • Semantic similarity: How closely does the AI output align in meaning with the ground truth?
  • BLEU, ROUGE, and BERTScore: These Natural Language Processing metrics offer quantitative insights into the quality of generated text, especially for summarization and content generation tasks.

Selecting the right mix of these metrics provides a comprehensive view of model performance and reliability.
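To show what an overlap-based metric measures, here is a simplified ROUGE-1 style calculation: the share of unigrams that a generated summary and a reference summary have in common. It is a sketch under stated assumptions (lowercasing, whitespace tokenization, no stemming), so its numbers will not necessarily match published ROUGE implementations, and the example sentences are invented.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Simplified unigram-overlap precision, recall, and F1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Invented reference and generated summaries.
reference = "the drug reduced mortality in adults"
candidate = "the drug reduced mortality"
p, r, f = rouge1(candidate, reference)
print(p, r, f)
```

Here every word of the generated summary appears in the reference, yet a third of the reference is missing from it. Word overlap also says nothing about factual correctness, which is why it is paired with answer correctness and semantic similarity in the list above.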

 

Where AI makes a difference: Screening and beyond

One of the most promising applications of generative AI in evidence synthesis is in literature screening, or the ability to assess whether a publication (abstract or full text) meets the criteria for inclusion. Studies and pilot implementations suggest that AI can reduce screening time by up to 40%, making it a powerful ally for research teams.

AI tools have been used to assign a probability of inclusion to a title, abstract, or full text, both to guide the screening process and to let researchers quickly see how modifying a search strategy affects yield. By automating this repetitive and time-consuming phase, organizations can reallocate expert human resources to higher-value tasks, such as:

  • Resolving ambiguous or context-dependent data extractions
  • Validating nuanced findings and offering insights into implications of these findings
  • Ensuring alignment with HTA submission standards

In this way, AI doesn’t replace human reviewers but augments them, driving efficiency without compromising accuracy.
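
One way to picture probability-guided screening is a simple triage step that sorts records by their predicted probability of inclusion and routes uncertain ones to human reviewers. The sketch below assumes the probabilities come from some upstream model; the thresholds, queue names, and record identifiers are hypothetical and would need to be calibrated and validated for a real review.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    inclusion_probability: float  # assumed output of an upstream screening model


def triage(records: list[Record], include_at: float = 0.8, exclude_at: float = 0.2) -> dict[str, list[str]]:
    """Sort records by probability and assign each to a review queue."""
    queues: dict[str, list[str]] = {
        "likely_include": [],
        "needs_human_review": [],
        "likely_exclude": [],
    }
    for rec in sorted(records, key=lambda r: r.inclusion_probability, reverse=True):
        if rec.inclusion_probability >= include_at:
            queues["likely_include"].append(rec.record_id)
        elif rec.inclusion_probability <= exclude_at:
            queues["likely_exclude"].append(rec.record_id)
        else:
            queues["needs_human_review"].append(rec.record_id)
    return queues


records = [
    Record("A", 0.95),
    Record("B", 0.10),
    Record("C", 0.55),
    Record("D", 0.82),
]
queues = triage(records)
print(queues)
# {'likely_include': ['A', 'D'], 'needs_human_review': ['C'], 'likely_exclude': ['B']}
```

Consistent with the augmentation model described above, the "likely" queues would still be sampled and checked by reviewers rather than accepted automatically.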

 

AI with guardrails

Generative AI is reshaping the landscape of evidence synthesis, but its integration must be strategic, measured, and compliant. By combining domain-trained models, robust traceability, appropriate evaluation metrics, and human oversight, organizations can unlock the true value of AI — accelerating workflows without sacrificing quality or compliance.

When used thoughtfully, generative AI becomes more than just a tool — it becomes a partner in advancing scientific research.

 

Meet with us at ISPOR 2025!

Manuel Cossio and Nathalie Horowicz-Mehler will be in Glasgow for ISPOR Europe 2025! Click the link below to book a meeting, or stop by Booth #1024 to connect with our experts:

Breaking Barriers in Rare Disease Research with Generative AI and Synthetic Data

In healthcare innovation, one of the most pressing challenges lies in rare disease research. There are approximately 7,000 rare diseases affecting over 300 million people worldwide. Yet for any single disease, there may be only a handful of patients dispersed globally, so gathering sufficient data to power robust clinical studies or predictive models is a monumental hurdle. However, a solution is emerging at the intersection of generative AI and real-world data (RWD): a novel approach with the potential to reshape possibilities and unlock insights to address unmet medical needs in rare diseases.

 

The rare disease data dilemma

In the U.S., rare diseases are defined as conditions affecting fewer than 200,000 people. Despite their low individual prevalence, rare diseases collectively impose a significant burden on both patients and healthcare systems.

Research and development in rare diseases often face a vicious cycle: low prevalence leads to data scarcity, and data scarcity makes it harder to run the studies that would generate more data. Traditional clinical trials are often infeasible or statistically underpowered due to the limited pool of participants.

Meanwhile, RWD sources such as electronic health records (EHRs), insurance claims, registries, and patient-reported outcomes offer valuable, albeit messy and fragmented, glimpses into the patient journey. Yet even RWD struggles to paint a complete picture in rare diseases. This is where generative AI steps in.

 

Enter generative AI: Making data where there is none

Generative AI — especially models like Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and, more recently, large foundation models — has a transformative ability: it can learn patterns from limited datasets and generate synthetic yet realistic datasets.

How it works

  1. Learning from RWD: Even small datasets from rare disease patients can be used to train and fine-tune generative models. These models identify patterns, distributions, and time-dependent relationships present in the data.
  2. Synthesizing patients: Once trained, the model can create new, synthetic patient records that preserve the statistical properties and characteristics of the original data. These “digital patients” simulate disease progression, treatment responses, and comorbidities.
  3. Validating realism: Synthetic data must be validated to ensure it reflects the real-world data the model was trained on. Techniques like distributional comparison, propensity scoring, and expert validation are used to assess accuracy and utility.
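
The three steps above can be sketched with a deliberately simple generative model. The example below fits a bivariate normal distribution to a tiny cohort, samples synthetic patients from it, and then compares summary statistics as a basic distributional check. A bivariate normal is a far simpler stand-in for the GANs, VAEs, and foundation models discussed here, and the variable names and values are invented for illustration.

```python
import math
import random
import statistics


def fit_bivariate_normal(xs: list[float], ys: list[float]) -> dict[str, float]:
    """Step 1: learn means, standard deviations, and correlation from the data."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}


def sample_synthetic(params: dict[str, float], n: int, rng: random.Random) -> list[tuple[float, float]]:
    """Step 2: draw n synthetic (x, y) pairs that preserve the fitted correlation."""
    rho = params["rho"]
    residual = math.sqrt(max(0.0, 1.0 - rho ** 2))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = params["mx"] + params["sx"] * z1
        y = params["my"] + params["sy"] * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


# Invented toy cohort: age in years and a hypothetical biomarker level.
ages = [34, 41, 29, 52, 47, 38, 60, 44]
biomarker = [1.2, 1.9, 1.0, 2.8, 2.1, 1.6, 3.1, 2.0]

real = fit_bivariate_normal(ages, biomarker)
synthetic = sample_synthetic(real, 2000, random.Random(42))

# Step 3: refit on the synthetic data and compare against the real fit.
syn_ages, syn_biomarker = (list(col) for col in zip(*synthetic))
refit = fit_bivariate_normal(syn_ages, syn_biomarker)
for key in ("mx", "my", "rho"):
    print(f"{key}: real={real[key]:.3f} synthetic={refit[key]:.3f}")
```

Matching means and correlations is only a first check. A model this simple cannot capture time-dependent relationships or rare subgroups, which is where the more expressive generative models and expert validation come in.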

 

Why synthetic data matters for rare diseases

Synthetic data can enhance rare disease clinical research in many ways, including:

 

1. Augmenting small cohorts

Synthetic data can boost sample sizes for rare disease studies, enabling:

  • Simulation of clinical trials
  • Development of more robust predictive models
  • Generation of synthetic control arms where traditional controls are ethically or logistically impractical

 

2. Enhancing privacy

In rare diseases, the risk of patient re-identification is heightened by unique phenotypes or genetic markers. Synthetic data can help protect patient privacy while preserving the utility of the data.

 

3. Facilitating global collaboration

Because synthetic data is de-identified, it facilitates data sharing among researchers and institutions and across borders, minimizing regulatory hurdles and fostering collaborative discovery.

 

4. Accelerating drug development

Pharma and biotech companies can use synthetic data to:

  • Test drug targeting strategies
  • Model long-term outcomes
  • Conduct in silico trials in the earliest stages of development

 

Challenges and considerations

While promising, this approach is not without its challenges:

  • Bias amplification: Synthetic data reflects the biases of its training data. If the RWD is incomplete or skewed, the synthetic outputs will be too. Strategies to detect and mitigate bias are essential.
  • Regulatory acceptance: Regulatory bodies are still evaluating how to incorporate synthetic data into approval pathways.
  • Validation standards: There is a need for consistent benchmarks and best practices for validating synthetic data in terms of both privacy and utility, and for validating generative AI applications in healthcare more broadly.

 

Looking ahead

The marriage of generative AI and RWD opens new doors for rare disease research. With the ability to synthesize patient data that preserves real-world complexity, we can begin to break free from the constraints of scarcity — generating insights, hypotheses, and interventions that were once out of reach.

As we move forward, interdisciplinary collaboration among clinicians, data scientists, regulatory bodies, and patient advocacy groups will be key to harnessing this potential ethically and effectively.

 

Interested in learning more?

Download our complimentary ebook, Rare Disease Clinical Trials: Design Strategies and Regulatory Considerations:

From Metadata to Submission: Rule-Based Robotic Process Automation for Statistical Programming Excellence

In the race to modernize data operations in clinical research and regulatory submissions, Robotic Process Automation (RPA) powered by rule-based systems has emerged as a dependable and high-impact solution. These systems offer clarity, control, and reproducibility — critical traits for industries like biopharma where regulatory compliance and data integrity are non-negotiable.

Here, we discuss rule-based RPA as the foundation for a scalable and auditable standards automation pipeline.

 

Rule-based automation: Transparent, trusted, and tunable

Unlike more probabilistic models, rule-based systems operate on deterministic logic. Every output is traceable back to an explicit rule, which enhances trust and simplifies troubleshooting. This transparency is particularly valuable when the processes must be easily explained to stakeholders and auditors.
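
To illustrate what "traceable back to an explicit rule" can mean, the sketch below applies a small ordered rule set to a record and returns the derived value together with the identifier of the rule that produced it. The rule IDs, the age-group derivation, and the record layout are hypothetical and are not taken from any Cytel tool.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    rule_id: str                      # identifier an auditor can look up
    applies: Callable[[dict], bool]   # condition under which the rule fires
    derive: Callable[[dict], str]     # deterministic derivation


# Hypothetical rule set for an age-group variable.
RULES = [
    Rule("AGEGR-01", lambda rec: rec["AGE"] < 65, lambda rec: "<65"),
    Rule("AGEGR-02", lambda rec: rec["AGE"] >= 65, lambda rec: ">=65"),
]


def apply_rules(record: dict) -> dict:
    """Return the derived value and the ID of the first rule that matched."""
    for rule in RULES:
        if rule.applies(record):
            return {"value": rule.derive(record), "rule_id": rule.rule_id}
    raise ValueError(f"No rule matched record: {record}")


print(apply_rules({"AGE": 70}))  # {'value': '>=65', 'rule_id': 'AGEGR-02'}
print(apply_rules({"AGE": 42}))  # {'value': '<65', 'rule_id': 'AGEGR-01'}
```

Because the same input always triggers the same rule, rerunning the pipeline reproduces the output exactly, and the recorded rule ID gives reviewers a direct path from result to logic.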

Key strengths of rule-based RPA include:

Transparency

Each step in the workflow is rule-driven, making the logic easy to inspect, validate, and justify. This ensures regulatory reviewers can clearly understand how data was transformed or outputs generated — vital in submission contexts.

Consistency

Standard rules applied across studies generate consistent outputs. For example, Cytel’s ALPS system creates SDTM and ADaM code from structured specifications, producing reliable results that hold up across different projects and teams.

Customizability

Rule-based systems are modular. Teams can easily adapt existing rules to accommodate study-specific needs without overhauling the entire system. Tools like Prism allow this by applying both generic rules and study-specific layers for enriched metadata processing.

 

Cytel’s metadata-driven RPA workflow in action

Our internal automation pipeline demonstrates the power of rule-based RPA. It’s built on a modular architecture where each tool performs a specific, rules-driven task:

  • ALPS: Converts metadata specifications into ready-to-run SAS code for SDTM and ADaM datasets, reducing manual programming and minimizing error risks.
  • Lighthouse: Enables biostatisticians to build mock shells using reusable templates, ensuring consistency in table and listing structures.
  • Prism: Extracts metadata from mock shells and transforms it into XML-format ARMs (Analysis Results Metadata), enriching it through rules and generating code for up to 60% of standard safety outputs.
  • TAB Macros and CytelDocs: Automate the creation of summary tables and documentation, saving hours of effort and ensuring compliance with standardized formats.

This end-to-end pipeline reduces manual touchpoints, maintains high quality, and boosts team efficiency.
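
The general idea of metadata-driven code generation can be shown with a toy example. The sketch below renders a SAS DATA step from a small specification dictionary. It does not represent how ALPS or any other tool named above works internally; the spec layout, function name, and derivations are assumptions, and the generated code presumes a `sdtm` library reference has already been assigned in SAS.

```python
# Hypothetical metadata specification for two derived variables.
SPEC = {
    "dataset": "ADSL",
    "source": "SDTM.DM",
    "variables": [
        {"name": "AGEGR1", "derivation": "ifc(AGE < 65, '<65', '>=65')"},
        {"name": "TRT01P", "derivation": "ARM"},
    ],
}


def render_sas(spec: dict) -> str:
    """Turn a metadata spec into the text of a SAS DATA step."""
    lines = [
        f"data {spec['dataset'].lower()};",
        f"    set {spec['source'].lower()};",
    ]
    # One assignment statement per variable, in the order given by the spec.
    for var in spec["variables"]:
        lines.append(f"    {var['name']} = {var['derivation']};")
    lines.append("run;")
    return "\n".join(lines)


sas_code = render_sas(SPEC)
print(sas_code)
```

Since the output depends only on the specification, a change to a derivation is made once in the metadata and flows into every regenerated program.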

 

Where generative AI complements RPA

While rule-based systems are ideal for tasks requiring consistency and auditability, generative AI can complement these systems, particularly in areas where variability is acceptable and outputs don’t require deterministic reproducibility. For example, generative AI can assist with:

  • Drafting exploratory narratives or documentation
  • Suggesting code for non-critical outputs
  • Enhancing user interfaces with intelligent prompts
  • Enriching the set of study-specific rules to be used

However, these AI-driven capabilities are best applied where hallucinations won’t compromise integrity, and outputs don’t demand rigid consistency.

 

Business and quality benefits of rule-based RPA

By relying on rule-based RPA for core data workflows, we’ve realized several tangible gains:

  • Time efficiency: Standard code is generated automatically, freeing time for custom analysis.
  • Reduced redundancy: Developers no longer rewrite common code across projects.
  • Improved QA: Outputs are independently validated and built on rigorously tested rule sets.
  • Collaboration at scale: Uniform rules simplify onboarding and knowledge transfer.
  • Focus on what matters: Teams can concentrate on non-standard elements that require expertise.

 

Final takeaways

Rule-based RPA systems provide the transparency, structure, and adaptability required for high-stakes data environments. At Cytel, we’ve found them indispensable in our mission to expedite regulatory submissions without compromising on quality or compliance. As AI continues to evolve, generative technologies may enrich this foundation — but rule-based automation remains the core engine that ensures accuracy, accountability, and speed.

Agentic Autonomy: How Multi-Agent Systems Could Orchestrate the Future of Clinical Development

In recent years, artificial intelligence has evolved beyond basic pattern matching to become capable of autonomous reasoning, multi-step planning, and even delegation. This transition — from passive tools to goal-driven, reasoning agents — marks the rise of agentic AI.

For the life sciences sector, and especially clinical development, this evolution arrives at a critical time. Clinical trials are increasingly complex, cross-functional, and data-intensive. Agentic AI offers not just faster tools, but the possibility of autonomous collaboration — teams of agents working in harmony to reduce burden, increase efficiency, and shorten timelines.

Here we explore the evolution of agentic AI and how higher levels of autonomy could transform clinical development from reactive execution to proactive, intelligent orchestration.

 

The evolution of agentic AI

Agentic AI evolves through distinct levels of capability. Each stage unlocks new functionality — from static models to ecosystems of communicating agents. Here’s a clear breakdown of the five major levels:

 

 

Each level builds toward intelligent autonomy. The transition from Level 3 to Levels 4 and 5 introduces intentional behavior, goal-setting, and inter-agent collaboration — the foundations of autonomous operations in clinical development.

 

Agentic AI in clinical development: A new operating model

Clinical development is not just complex — it’s interdependent. Every milestone relies on the seamless handoff and integration of data, code, documents, and decisions. Agentic AI, particularly at Levels 4 and 5, promises to re-architect this model.

 

Level 4: Planning and reasoning agents

These agents can independently break down goals, design execution paths, and adapt to changing environments. Here’s how they can drive value:

  • Medical writing agents
    • What they do: Generate drafts for protocols, CSRs, and patient narratives.
    • How they help: Understand document structures, integrate real-time data, and adapt language for regulatory or clinical audiences.
    • Outcome: Faster document turnaround, reduced rework, and scalable writing support.

 

  • Statistical programming agents
    • What they do: Develop and validate analysis code in SAS, R, or Python.
    • How they help: Plan logical sequences, debug outputs, and dynamically update based on protocol amendments.
    • Outcome: Accelerated code generation with built-in quality assurance.

 

  • Information synthesis agents
    • What they do: Retrieve and synthesize information from multiple domains — scientific literature, regulatory guidelines, real-world data, health system policies, and reports on unmet medical needs.
    • How they help: Prioritize and contextualize inputs to support clinical design, indication selection, and risk-benefit assessments.
    • Outcome: Broader strategic alignment and better-informed cross-functional planning.

 

Level 5: Multi-agent systems

At this level, clinical development becomes an ecosystem of agents, each with a specialized role, working under the coordination of orchestrator agents that function like project managers.

  • Orchestrator agents
    • What they do: Assign tasks, monitor progress, and realign workflows in real time.
    • How they help: Adjust deliverables dynamically as inputs change or downstream agents complete their tasks.
    • Outcome: Continuously managed, self-optimizing trial execution.

 

  • Agent networks
    • Example: A data management agent processes raw datasets and hands outputs to a statistical agent, which triggers a writing agent to draft updated narratives — all autonomously.
    • Value: End-to-end automation with minimal human handoffs.
    • Outcome: Real-time trial updates and agility under pressure.
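
The handoff pattern in the example above can be sketched with plain functions standing in for agents. Nothing here calls a language model; each "agent" is a hypothetical placeholder that reads a shared state and adds its result, and the orchestrator simply runs them in order and records each handoff. All names and values are assumptions made for illustration.

```python
from typing import Callable

Agent = Callable[[dict], dict]


def data_management_agent(state: dict) -> dict:
    """Stand-in for data cleaning: drop missing values."""
    cleaned = [v for v in state["raw_values"] if v is not None]
    return {**state, "clean_values": cleaned}


def statistical_agent(state: dict) -> dict:
    """Stand-in for analysis: summarize the cleaned values."""
    values = state["clean_values"]
    return {**state, "n": len(values), "mean": sum(values) / len(values)}


def writing_agent(state: dict) -> dict:
    """Stand-in for narrative drafting: report the summary in a sentence."""
    text = f"Mean value was {state['mean']:.1f} (n={state['n']})."
    return {**state, "narrative": text}


def orchestrate(state: dict, pipeline: list[tuple[str, Agent]]) -> dict:
    """Run each agent in turn and keep a log of completed handoffs."""
    log = []
    for name, agent in pipeline:
        state = agent(state)
        log.append(name)
    return {**state, "handoff_log": log}


pipeline = [
    ("data_management", data_management_agent),
    ("statistics", statistical_agent),
    ("writing", writing_agent),
]
result = orchestrate({"raw_values": [5.0, None, 7.0, 6.0]}, pipeline)
print(result["narrative"])    # Mean value was 6.0 (n=3).
print(result["handoff_log"])  # ['data_management', 'statistics', 'writing']
```

A real orchestrator would also need to handle failures, reroute work when inputs change, and pause for human sign-off at defined checkpoints. The handoff log is the kind of record that would support that oversight.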

 

The benefits of the agent ecosystem

 

From automation to autonomy

Agentic AI reflects an evolution from “AI that assists” to “AI that takes initiative” — supporting actions, learning from experience, and extending expertise across domains. In clinical development, where complexity continues to rise and efficiency is critical, this shift offers a meaningful opportunity rather than just an advantage.

As we look toward Levels 4 and 5, we can imagine a future where trials increasingly manage themselves, where teams are supported by networks of intelligent agents, and where human professionals gain more space to focus on innovation, thoughtful oversight, and meaningful patient outcomes.

 


Redefining Clinical Documentation in the Age of Intelligent Collaboration: The Rise of the AI-Assisted Medical Writing Strategist

The introduction of AI into medical writing workflows marks a pivotal turning point in clinical development. As life sciences companies deploy AI agents to generate clinical documents, from clinical study protocols (CSPs) and statistical analysis plans (SAPs) to clinical study reports (CSRs), a new role is emerging: the AI-assisted medical writing strategist.

This role represents a shift in mindset and skillset. No longer is the medical writer just a document author; they are becoming a strategic orchestrator of AI tools, data-driven narratives, and regulatory precision.

 

What is an AI-assisted medical writing strategist?

An AI-assisted medical writing strategist is a clinical and regulatory expert who partners with AI systems to accelerate and optimize the development of clinical documents. They bring together deep scientific understanding, regulatory knowledge, and technical fluency to co-create documents that are not only accurate and compliant but also delivered at unprecedented speed.

They are not just reviewing AI outputs — they are shaping the way AI generates those outputs, continuously fine-tuning the interaction between human judgment and machine efficiency.

 

Core pillars of the strategist role

The AI-assisted medical writing strategist role is defined by the following five key pillars:

 

1. AI orchestration, not just review

At the heart of the strategist’s work is the ability to guide AI systems toward producing high-quality, usable first drafts. This means:

  • Designing intelligent prompts based on document type and trial context.
  • Structuring modular content frameworks that AI can populate and iterate on.
  • Embedding company-specific style guides, preferred language, and regulatory templates into AI workflows.
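
As one possible illustration of these practices, the sketch below builds a prompt from a reusable template that takes the document type, section, and trial context as inputs and embeds a style guide. The template wording, style rules, and trial description are all hypothetical; an actual implementation would draw them from approved company templates.

```python
from string import Template

# Hypothetical house style rules embedded into every prompt.
STYLE_GUIDE = (
    "Use past tense for completed study activities. "
    "Spell out abbreviations on first use."
)

PROMPT_TEMPLATE = Template(
    "You are drafting the $section section of a $doc_type.\n"
    "Trial context: $context\n"
    "Style guide: $style\n"
    "Do not state any result that is not present in the supplied data."
)


def build_prompt(doc_type: str, section: str, context: str, style: str = STYLE_GUIDE) -> str:
    """Fill the shared template so every draft request has the same structure."""
    return PROMPT_TEMPLATE.substitute(
        doc_type=doc_type, section=section, context=context, style=style
    )


prompt = build_prompt(
    doc_type="clinical study report",
    section="Safety Evaluation",
    context="Phase 2, randomized, double-blind, placebo-controlled study",
)
print(prompt)
```

Keeping the template and style guide in one place means a change in preferred language is made once and applied consistently across document types.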

 

2. Scientific and regulatory oversight

Even with AI generating drafts, clinical development demands nuanced, evidence-based interpretation. The strategist ensures:

  • Scientific rigor in efficacy and safety narratives.
  • Consistency in data interpretation across documents.
  • Adherence to ICH, FDA, EMA, and country-specific requirements.

AI might know the rules, but the strategist knows the exceptions, the subtleties, and the evolving guidance that govern every submission.

 

3. Training the AI with human expertise

AI systems improve through feedback. Strategists:

  • Curate and label high-quality training datasets (e.g., past CSRs, protocols).
  • Correct and comment on AI-generated drafts to reinforce preferred structures and content styles.
  • Continuously evaluate model performance and guide retraining cycles.

They act as domain-informed teachers, helping the AI become a better writing partner over time.

 

4. Cross-functional bridge builder

Medical writing is inherently collaborative. The strategist aligns AI output with expectations from:

  • Clinical, data management, and statistical teams.
  • Regulatory affairs and quality assurance.
  • Legal, ethical, and patient advocacy groups.

In doing so, they help organizations reimagine review cycles, moving from linear drafting to agile co-creation.

 

5. Champion of ethics and transparency

AI is powerful — but it must be used responsibly. Strategists play a leading role in:

  • Ensuring AI doesn’t fabricate data or misrepresent study outcomes.
  • Clarifying where automation was used in document creation.
  • Promoting transparency, reproducibility, and compliance in every AI-assisted process.
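
A simple automated check can support the first of these responsibilities. The sketch below flags any number in an AI-generated draft that does not appear in a set of values taken from the source data. It is a narrow, assumption-laden aid: it compares numbers as text, so "12" and "12.0" would not match, and it cannot tell whether a correct number is attached to the wrong claim. The function name, draft sentence, and values are invented for illustration.

```python
import re

# Matches integers and simple decimals such as 120 or 45.5.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(draft: str, source_values: set[str]) -> list[str]:
    """Return numbers found in the draft that are absent from the source values."""
    return [num for num in NUMBER_PATTERN.findall(draft) if num not in source_values]


# Invented example: values the source tables actually contain.
source_values = {"120", "45.5", "12"}
draft = "Of 120 patients, 45.5% responded; median follow-up was 18 months."

flagged = unsupported_numbers(draft, source_values)
print(flagged)  # ['18']
```

Flagged values would go to the strategist for review rather than being corrected automatically, which keeps the human in control of what the document finally states.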

 

Why this role matters

The volume and complexity of clinical documentation are only increasing. At the same time, timelines are shrinking, budgets are tightening, and regulatory scrutiny is rising. AI offers a way forward — but only when guided by human intelligence.

The AI-assisted medical writing strategist ensures that automation enhances human value rather than diminishing it. They unlock:

  • Faster turnaround times for key deliverables.
  • More consistent documentation across global studies.
  • Greater focus on high-value tasks like interpretation, innovation, and communication.

 

How to prepare for this role

Transitioning into this role requires new capabilities:

  • AI literacy: Understanding how large language models (LLMs) work, how they’re trained, and where they fall short.
  • Prompt engineering: Knowing how to ask the right questions and frame the right context for AI tools.
  • Regulatory acumen: Staying current with guidance on AI use in regulated document environments.
  • Change leadership: Helping others adopt AI tools confidently and responsibly.

 

Final thoughts

The AI-assisted medical writing strategist is more than a job title — it’s a vision for the future of clinical documentation. As the life sciences industry embraces digital transformation, this role becomes essential to ensure that automation is paired with accountability, speed with accuracy, and efficiency with empathy.

By stepping into this role, medical writers don’t just adapt to the AI era — they lead it.