Trustworthy AI in Action: Predicting Stroke Risk Transparently with Claims-Based Machine Learning

In recent years, deep learning and large neural networks have garnered most of the attention in the machine learning (ML) community. Their ability to model complex, high-dimensional data is indeed impressive. But in healthcare — where decisions can have serious consequences and interpretability is paramount — simpler, transparent models like logistic regression still have an important role to play.

Not every problem requires a black box. When it comes to predicting disease risk using structured data, such as insurance claims, traditional models can offer accuracy and insight.

 

Claims databases: An untapped resource for disease risk prediction

Claims databases are an increasingly valuable source of real-world data (RWD). Unlike clinical trial data, which is highly controlled but limited in scale and scope, administrative claims datasets cover millions of lives over multiple years, reflecting real patient behavior and care patterns.

These databases include information on diagnoses, procedures, prescriptions, and demographics — elements that, while lacking granular clinical detail, can still reveal important patterns in disease progression and risk. The scale of these datasets allows for robust statistical modeling, even for rare outcomes.

 

The case for explainable machine learning in claims-based risk prediction

When working with claims data, models like logistic regression, Lasso, or Ridge regression are not just sufficient — they are often ideal. These models:

  • Produce coefficients that quantify the relationship between features and outcomes.
  • Allow for transparent understanding of why a prediction was made.
  • Are easier to validate and communicate to clinicians, payers, and regulators.

In contrast, deep learning models often deliver slightly higher accuracy at the cost of interpretability — a trade-off that may not be acceptable in regulated healthcare environments.
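As a toy illustration of that first point, a logistic regression fitted to synthetic claims-style features recovers coefficients that translate directly into odds ratios. The features and effect sizes below are invented for demonstration and are not taken from any real study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Hypothetical claims-derived features: standardized age plus two comorbidity flags
age = rng.normal(0, 1, n)
diabetes = rng.binomial(1, 0.2, n)
hypertension = rng.binomial(1, 0.3, n)

# Simulate an outcome with known log-odds contributions for each feature
logit = -2.0 + 0.8 * age + 0.5 * diabetes + 0.7 * hypertension
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([age, diabetes, hypertension])
model = LogisticRegression().fit(X, y)

# Each coefficient maps directly to an odds ratio for that risk factor
for name, coef in zip(["age", "diabetes", "hypertension"], model.coef_[0]):
    print(f"{name}: coefficient={coef:.2f}, odds ratio={np.exp(coef):.2f}")
```

Because the data were generated with known effects, the fitted coefficients land close to the true values, which is exactly the kind of transparent readout a clinician or regulator can interrogate.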

 

A real-world example: Predicting stroke risk with claims data

In a recent study, Cytel used data from over 2.5 million insured individuals to predict the risk of stroke hospitalization. Using only claims-based features such as age, medication use, comorbidities (e.g., diabetes, hypertension), and health service utilization, we compared the performance of several models, including:

  • Logistic Regression
  • Regularized linear models (Lasso and Ridge)
  • XGBoost (a state-of-the-art ML algorithm)

The results? All models achieved similar predictive performance, with area under the ROC curve (AUC) values around 0.81. Logistic regression — simple, explainable, and well-established — performed on par with XGBoost, demonstrating that advanced complexity wasn’t necessary to achieve meaningful predictive power.
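This kind of head-to-head comparison is easy to sketch. The snippet below is purely illustrative: it uses synthetic features rather than the study's claims dataset, and scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost, but on structured data of this kind the simple linear model and the boosted ensemble often land close together:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for claims-based features (e.g., age, comorbidity flags,
# utilization counts), with a rare-ish positive class
X, y = make_classification(n_samples=10000, n_features=20, n_informative=8,
                           weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

AUC on a held-out split is the same metric reported in the study, so the comparison mirrors the evaluation described above, just on invented data.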

 

Transparency enables trust and action

What sets models like logistic regression apart is their explainability. Stakeholders can see precisely how risk factors like atrial fibrillation, hypercholesterolemia, or age contribute to predicted stroke risk. This level of clarity is essential not only for clinicians making decisions, but also for data governance, compliance, and patient communication.

In a time when “black box” AI models are under increasing scrutiny, explainable models offer a pragmatic path forward — especially when paired with large-scale real-world datasets like claims data.

 

Keep it simple, keep it transparent

Healthcare doesn’t just need powerful algorithms — it needs trustworthy ones. As our study shows, standard machine learning models remain highly relevant, especially when applied to well-structured real-world data. Claims databases, in particular, offer a rich foundation for developing these models and making preventive healthcare smarter, earlier, and more accessible.

Blending Power and Flexibility: How AI-Generated R Code is Reshaping Clinical Trial Design

In today’s fast-evolving clinical research landscape, designing robust and efficient trials is more critical than ever. As statistical designs grow in sophistication, biostatisticians are increasingly relying on both commercial platforms and open-source tools to meet unique modeling needs. But this hybrid approach also comes with challenges, particularly for those new to advanced simulation software or lacking programming experience.

At Cytel, we’ve been exploring how artificial intelligence (AI) can help bridge this gap. At the 2025 Joint Statistical Meetings (JSM), we will present our latest innovation: AI-powered R code generation for clinical trial design, a feature embedded in our East Horizon™ platform. This assistant, called RCACTS (R Coding Assistant for Clinical Trial Simulation), represents a significant step forward in making custom trial design faster, more accessible, and more reliable.

 

Why talk about this now? The open-source imperative

While commercial clinical trial design software offers rapid design development through validated and user-friendly workflows, it doesn’t always address the full complexity of real-world problems. Trial statisticians often face challenges in areas such as oncology, rare diseases, and adaptive designs that require tailored statistical tests, unique outcome generation models, or alternative randomization techniques.

This is where open-source tools like R become invaluable. R allows statisticians to write custom code to simulate complex trial designs, perform Bayesian analyses, or integrate evolving regulatory guidance. Over the years, a vibrant ecosystem of R packages has emerged, offering a high degree of transparency, flexibility, and academic rigor.

Yet this flexibility comes with trade-offs: code development can be time-consuming, error-prone, and requires significant programming expertise. As a result, many biostatisticians find themselves switching between validated commercial workflows and custom R functions, leading to a process that is often fragmented and inefficient.

Recognizing this, Cytel’s East Horizon platform has introduced R integration points, enabling users to inject custom code directly into validated simulation workflows. This integration delivers the best of both worlds: the speed and structure of commercial software with the creativity and control of open-source.

 

Enter AI: Speed, simplicity, and smarter coding

Our next logical question was: can AI make this process even easier?

The answer, increasingly, is yes. With recent advances in generative AI, particularly large language models (LLMs), it’s now possible to assist in the generation of R code for simulation-based design tasks. At Cytel, we’ve harnessed OpenAI’s GPT-4o via API, securely deployed within Microsoft Azure, to create RCACTS, a coding assistant purpose-built for biostatisticians.

Unlike generic AI tools that produce standalone R scripts, RCACTS generates R code specifically tailored for the East Horizon simulation engine. It ensures that the generated functions:

  • Match expected input/output structures,
  • Include pre-defined parameters as shown in our internal statistical package CyneRgy,
  • Are immediately ready for integration and testing within a live trial design workflow.

With RCACTS, users can simply describe what they want in plain English and receive functioning R code that can be integrated into East Horizon.
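One way to enforce that kind of structural contract is a simple signature check on the generated code before it enters the workflow. The sketch below is a hypothetical illustration: the regular expression, the helper function, and the parameter names (NumSub, TreatmentID, and so on) are assumptions for demonstration, not RCACTS's actual validation logic or East Horizon's real interface:

```python
import re

# Parameter names the simulation engine is assumed to expect (illustrative only)
EXPECTED_PARAMS = ["NumSub", "TreatmentID", "Mean", "StdDev"]

def has_expected_signature(r_code: str, params=EXPECTED_PARAMS) -> bool:
    """Check that the R source defines a function exposing all expected parameters."""
    match = re.search(r"function\s*\(([^)]*)\)", r_code)
    if not match:
        return False
    found = [p.split("=")[0].strip() for p in match.group(1).split(",") if p.strip()]
    return all(p in found for p in params)

# A generated R function that satisfies the assumed contract
generated = """
SimulatePatientOutcome <- function(NumSub, TreatmentID, Mean, StdDev) {
    rnorm(NumSub, mean = Mean[TreatmentID + 1], sd = StdDev[TreatmentID + 1])
}
"""
print(has_expected_signature(generated))  # True for this example
```

A check like this catches only structural mismatches; statistical correctness of the generated code still requires human review, as discussed below.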

 

Who benefits? Everyone from newcomers to experts

One of the major advantages of this AI-enhanced workflow is lowering the barrier to entry. For a new user unfamiliar with Cytel’s R integration or syntax requirements, writing compatible code from scratch can be daunting. RCACTS significantly reduces the learning curve by providing validated function templates, sensible defaults, and clear parameterization, all supported by generative AI.

At the same time, experienced statisticians benefit by spending less time on repetitive coding tasks, debugging, or remembering function signatures. This allows them to focus on higher-level design questions, such as: What analysis method is most robust? How sensitive is the design to different outcome distributions? What dropout patterns pose the greatest risk?

Our assistant supports a wide range of trial design elements:

  • Simulating patient responses: binary, continuous, time-to-event, and repeated-measures endpoints.
  • Analyzing simulated data: Statistical analysis for these endpoints.
  • Randomization: Flexible randomization of patients across treatment groups.
  • Enrollment and dropout modeling: Custom mechanisms for realistic patient enrollment and dropout scenarios.
  • Treatment selection: Supporting multi-arm multi-stage (MAMS) trial designs.
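The first two items above can be sketched in a few lines: simulate binary responses for two arms under assumed response rates, then analyze the simulated data with a two-proportion z-test. The rates and sample sizes are illustrative, and a production simulation would be written in R against the East Horizon interface rather than in Python:

```python
import math
import numpy as np

rng = np.random.default_rng(42)

# Simulate binary patient responses under assumed response rates (illustrative values)
control = rng.binomial(1, 0.30, 200)
treatment = rng.binomial(1, 0.45, 200)

# Analyze the simulated data with a two-proportion z-test
p1, p2 = control.mean(), treatment.mean()
n1, n2 = len(control), len(treatment)
p_pool = (control.sum() + treatment.sum()) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value via the normal tail
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Repeating this simulate-then-analyze loop thousands of times is what turns a single trial sketch into an operating-characteristics estimate.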

 

Balancing innovation with responsibility

Of course, like any AI solution, there are caveats. AI-generated code must be carefully reviewed for correctness, appropriateness, and regulatory readiness. RCACTS includes a built-in testing functionality to flag structural or syntactic errors, but statistical validation remains the user’s responsibility. Also note that all data interactions adhere to Azure OpenAI’s stringent data protection policies to ensure security and compliance.

There’s also a broader concern: will over-reliance on AI limit the creativity and deep statistical thinking that define our profession? At Cytel, we view AI not as a replacement for expertise, but as a tool to amplify it. Our goal is to give statisticians more time and mental space to explore, iterate, and innovate rather than reduce them to prompt engineers.

 

Looking ahead

The future of clinical trial design lies in intelligent integration: combining the strengths of validated commercial tools, flexible open-source frameworks, and AI-powered coding assistance. With East Horizon and RCACTS, we believe we’re building the blueprint for this future, with a platform that supports both scientific rigor and operational speed.

As the field continues to evolve, biostatisticians will need tools that not only keep up with complexity but also support creativity, collaboration, and efficiency. AI-generated R code, embedded within a powerful simulation engine, is one such tool and is already transforming how we approach design flexibility in clinical trials.

 

Catch us at JSM 2025 to learn more about how AI is transforming the future of clinical trial design within Cytel.

Leveraging Mobile and Wearable Technology for Outcomes Research in Depression

As mobile and wearable technologies become increasingly integrated into daily life, their applications have expanded far beyond convenience and lifestyle. In the field of outcomes research — particularly within mental health — these technologies are opening new frontiers for understanding and monitoring clinical endpoints. A notable case is depression, where continuous digital monitoring can provide rich insights into both the course of illness and treatment impact.

This post draws on our findings from a recent systematic review and poster presentation to examine how mobile and wearable tools are currently deployed in depression monitoring and how this aligns with broader outcomes research goals.

 

Digital monitoring as a tool for mental health outcomes

Over the past five to six years, rates of depression have risen markedly across youth and adult populations globally, underscoring the need for scalable and effective monitoring strategies. In parallel, smartphones and wearables have become ubiquitous, capable of capturing passive, longitudinal health data. These digital tools offer unprecedented potential for outcomes research by providing real-time behavioral and physiological markers relevant to depression.

To map the current landscape, we conducted a comprehensive literature review focused on how smartphones and wearables are used to monitor depression in research contexts. This synthesis aimed to highlight prevailing methods, feature usage, and the extent to which demographic variability is accounted for — critical considerations in health outcomes analysis.

 

Key findings from the literature

We reviewed 140 studies and identified 22 that met our inclusion criteria. The following themes emerged:

 

Study characteristics

  • Recency: Most studies were published in 2024, reflecting the field’s rapid acceleration.
  • Geography: The U.S. and Pakistan emerged as leading contributors.
  • Sample Size: Studies included an average of 465 participants, suggesting moderately powered observational designs.

 

Demographic reporting

  • Gender and age: Captured in 20 of the 22 studies.
  • Ethnicity: Reported in just 9 studies.
  • Education and marital status: Only 4 studies reported these variables — yet both are key social determinants of health and influence depression outcomes.

 

Monitoring technologies and features

  • Smartphones were used in 20 of the 22 studies, highlighting their dominance.
  • Key features monitored included:
    • Mood tracking: 20 studies
    • Movement (accelerometer data): 10 studies
    • Heart Rate Variability (HRV): 5 studies
    • Word usage tracking: 4 studies
    • Sleep patterns: 2 studies

 

Clinical assessment tools

Self-reported clinical scales were commonly used as outcome anchors:

  • PHQ-9 (Patient Health Questionnaire-9): 6 studies
  • GAD-7 (Generalized Anxiety Disorder-7): 7 studies

(See our original poster for a visual breakdown of these features and tools.)

 

Implications for outcomes research

From an outcomes research perspective, these technologies offer compelling advantages:

  • Continuous and passive monitoring: Enables longitudinal capture of clinically relevant endpoints like mood, behavior, and sleep — reducing bias from intermittent self-reporting.
  • Scalability and reach: Mobile-based data collection can extend to underserved and geographically dispersed populations, improving study generalizability.
  • Early signal detection: Passive data streams can flag deterioration or improvement earlier than clinical visits alone, offering potential for timely interventions.

However, a consistent limitation observed in the literature is the underreporting of demographic variables — especially education and marital status. This omission constrains subgroup analysis and limits insights into how different populations experience depression and respond to interventions. In outcomes research, such data are essential for contextualizing and stratifying results across socioeconomic or cultural dimensions.

 

The path forward

As wearable and mobile sensors become more refined, their integration into real-world data frameworks will likely become standard practice in outcomes research. But to truly capitalize on this potential, researchers must enhance demographic reporting and examine interactions between digital phenotypes and traditional health indicators across diverse populations.

These tools not only offer more granular tracking of mental health status — they also help researchers and health systems better understand the dynamics of treatment effectiveness, burden of illness, and quality of life over time.

 

Interested in learning more?

This blog summarizes findings from the poster presentation, “Exploring Mobile and Wearable Technology for Early Depression Detection and Monitoring,” presented by Lyuboslav Ivanov and Manuel Cossio of Cytel and Universitat de Barcelona.

Smartwatches Are Transforming Clinical Trials: Insights from Digital Primary Endpoints

The landscape of clinical research is continually evolving, with a growing emphasis on leveraging digital technologies to enhance efficiency and data quality. Among these innovations, wearable devices like the Apple Watch have emerged as promising tools for continuous and remote patient monitoring.

We recently analyzed the current application of smartwatches in clinical trials, focusing on their role in capturing digital primary endpoints across a variety of therapeutic areas. Here, I share some of our key findings, including major application areas as well as the benefits and challenges associated with their wider adoption in clinical research.

 

Digital primary endpoints

One way smartwatches are being used in clinical trials is to collect digital primary endpoints — sensor-generated data often collected outside a clinical setting.

To understand the potential impact of smartwatches in this context, we analyzed 87 completed or terminated clinical trials listed on ClinicalTrials.gov that used Apple Watch technology, examining key variables such as therapeutic focus, endpoint types, geographic distribution, and study design. Here is what we found:

 

Key Findings

  • High completion rate: 93.1% of the trials were completed successfully.
  • Top therapeutic areas: Cardiology led with 28.7% of studies, followed by neurology (21.8%) and oncology (11.5%).
  • Common endpoints: ECG changes (18.4%), heart rate variability (12%), and oxygen saturation (10%) were the most frequently measured.
  • Study design: Interventional trials dominated (64%), with high recruitment rates across the board.
  • Geographic trends: North America hosted the majority of trials (55%), followed by Europe (30%).

 

Importantly, validation studies confirmed the diagnostic accuracy of these devices, supporting their potential for regulatory approval.

Figure 1. Leveraging Consumer-Grade Wearables in Clinical Trials: Insights From Digital Primary Endpoints

 

Cossio, M. & Gilardino, R. (2025, May 15). Leveraging Consumer-Grade Wearables in Clinical Trials: Insights From Digital Primary Endpoints [Conference presentation]. ISPOR 2025, Montreal, Canada.

 

Why wearables matter in clinical trials

In clinical trials, smartwatches offer several unique advantages:

  • Continuous, remote monitoring: Smartwatches enable continuous, remote monitoring of patients, reducing the need for in-person visits and enhancing data collection.
  • Scalability: Smartwatch use is ideal for decentralized or hybrid trials, where flexibility and patient engagement are key, enabling participation across wide geographies.
  • Reduced costs: Smartwatches can also help reduce trial costs by requiring fewer site visits, enabling decentralized trials, and providing real-time data collection with automated uploads.
  • Improved patient adherence and engagement: Smartwatches often include reminders, notifications, and user-friendly interfaces that help patients stay compliant with treatment schedules, data input, and study protocols.
  • Objective, high-frequency data: Smartwatches gather physiological metrics (e.g., heart rate, activity levels, sleep patterns) with high frequency and objectivity, reducing reliance on subjective self-reporting.
  • Increased accessibility and inclusivity: Smartwatches can broaden trial access for populations who may face barriers to travel or mobility, thereby enhancing demographic diversity and the generalizability of trial findings.

 

The growing use of wearables and the future of clinical trials

The growing use of wearables in clinical trials signals a shift toward more scalable, cost-effective, and patient-friendly research models. However, challenges remain — particularly around technical reliability and patient adherence. Future research should focus on integrating wearables into value-based healthcare and global trial frameworks.

The Use of AI in Clinical Trial Data Management

Clinical data managers play a key role in clinical trials, ensuring the integrity of the clinical data and bearing ultimate responsibility for preparing the data for statistical analysis. As clinical studies evolve, data management is becoming more complex with the use of multiple data sources, as well as the increased volume of data from case report forms, patient-reported outcomes, laboratory data, electronic health records, imaging reports, and more.

Here, we discuss various ways artificial intelligence has the potential to accelerate clinical trial data management as well as some of the benefits and challenges of using these groundbreaking tools.

 

Transforming data cleaning in clinical trials

The traditional method of data cleaning involves manual checks and review of data listings. Data that falls outside the expected results is queried, which leads to time-consuming communications that are often prone to error. Additionally, a significant amount of time is spent programming and validating data checks and listings.

AI has the potential to transform data cleaning: AI tools can quickly spot outliers, inconsistencies, and errors in datasets that may be missed with traditional methods. One example is an exception listing that compares elevated laboratory parameters against reported adverse events, flagging cases where an abnormal value has no corresponding event. AI can also detect possible missing or duplicate data. Ultimately, AI can lead to faster data availability by improving clinical trial data analysis and cleaning.
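A minimal version of such an exception listing can be expressed as a data-frame join. The sketch below uses toy data and hypothetical column names, simply flagging elevated lab values for which no adverse event was recorded for the same subject:

```python
import pandas as pd

# Toy lab results and adverse event data (column names are hypothetical)
labs = pd.DataFrame({
    "subject": ["001", "002", "003"],
    "parameter": ["ALT", "ALT", "ALT"],
    "value": [120, 35, 95],
    "upper_limit": [55, 55, 55],
})
adverse_events = pd.DataFrame({
    "subject": ["001"],
    "ae_term": ["Hepatic enzyme increased"],
})

# Keep only out-of-range values, then flag those with no matching adverse event
elevated = labs[labs["value"] > labs["upper_limit"]]
exceptions = elevated.merge(adverse_events, on="subject", how="left", indicator=True)
exceptions = exceptions[exceptions["_merge"] == "left_only"]
print(exceptions[["subject", "parameter", "value"]])
```

Here subject 001's elevated ALT has a matching adverse event, so only subject 003 is flagged for query. An AI-assisted workflow would generate and prioritize such listings rather than relying on hand-programmed checks alone.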

Furthermore, AI can be used to detect fraudulent data. While data managers typically review one patient’s data at a time, AI tools can look at a site’s data collectively and flag potentially fraudulent patterns (for example, dosing for all patients at the site recorded on the same day and time).

 

Improving database development in clinical trials

The use of AI is becoming more prevalent in the software industry: electronic data capture (EDC) companies are using AI to translate plain-text edit check requirements (e.g., temperature should be within 36–40 Celsius) into programmed edit checks. This will significantly improve database development timelines within the clinical trial space.
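Conceptually, the translation target is a small, testable rule. A minimal sketch of the programmed edit check for the temperature requirement above might look like the following (the function name and query message are illustrative, not any specific EDC system's API):

```python
def check_temperature(value_celsius, low=36.0, high=40.0):
    """Return a query message if the value fails the edit check, else None."""
    if value_celsius < low or value_celsius > high:
        return (f"Query: temperature {value_celsius} C is outside the "
                f"expected range {low}-{high} C")
    return None

print(check_temperature(37.2))   # within range, no query is raised
print(check_temperature(41.5))   # out of range, a query message is returned
```

The value of AI here is generating many such checks directly from the protocol's plain-text requirements, with the data manager reviewing rather than hand-coding each one.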

 

Shortening database lock timelines

AI can also shorten database lock timelines. Data managers are often tasked with manually reviewing data to ensure accuracy and to identify missing data. For example, a study participant may have a diagnosis of colon cancer but no prior therapy recorded within the expected timeframe for the condition. Such reviews take data managers a significant amount of time. AI can examine the data holistically, flag possible discrepancies, and reduce manual effort and human error, allowing for a cleaner database at a much faster rate.

 

The need for human involvement

While AI can process large amounts of data, identify suspicious patterns, and analyze images, human involvement is required to validate the AI functionality. Humans will also be required to perform ongoing reviews of the output and adjust the AI tools as the amount of clinical trial data increases. With the use of AI, individuals involved in the clinical trial can focus their skills and time on evaluating complex factors and decision-making.

 

Benefits and challenges of AI

The use of AI in clinical trial data management and EDC development has several benefits:

  • AI can automate time-consuming manual tasks for a faster and more accurate database.
  • AI can reduce the possibility of human error while continuously monitoring clinical trial data. This allows data issues to be detected quickly, which can improve data accuracy and safety for trial participants.

However, challenges remain:

  • Maintaining participant confidentiality while using AI tools can be a significant challenge.
  • AI systems must be configured and monitored to ensure there is no risk of accessing unauthorized or unblinded data.
  • Large amounts of data are required for the AI to be properly trained to deliver high-quality results.

Ultimately, a decision must be made as to whether the benefits of AI outweigh the challenges.

The pharmaceutical industry is heavily regulated. The validity of AI tools must be established before they are deployed for database development and data cleaning, ensuring each tool performs with the expected sensitivity and specificity and a negligible error rate.

 

Final takeaways

AI will play an important role in many aspects of clinical trials, including and beyond data management. From identifying potential compounds to automating routine tasks, innovating statistical programming, streamlining medical writing, and creating digital twins, we will only continue to see advancements in AI tools in the coming years. AI can be a groundbreaking tool to shorten drug development timelines and improve patient outcomes.

The Ethics of Artificial Intelligence in Clinical Development

If nothing else, the concept of artificial intelligence is polarizing — opinions on the topic tend to be very strong. While there are many subtleties in the viewpoints, they generally fall into two camps. The first consists of early adopters and technophiles who are practically buzzing with excitement about AI’s potential impact. The second camp is, to put it mildly, a bit more hesitant. For myriad reasons, they are reluctant — often antagonistically so — to place their trust in a computer system.

As someone who would place themselves in a more neutral position, I can’t help but feel that both sides are correct, at least to a certain degree.

 

Rapidly evolving AI tools and their potential impact on the industry

AI and other predictive modeling and analytics tools have reached a point of sophistication where it is obvious they have the potential to provide tremendous value. At the same time, we have seen how technology used poorly can have a catastrophic impact in the very areas it was intended to help. Given our roles within the life sciences and healthcare industries, we must proceed with care: every action, or inaction, carries real consequences for the health and well-being of people worldwide.

As technology becomes more accessible and affordable, the drive for adoption will only grow. At the same time, we are seeing change and growth at an unprecedented rate, with advances emerging faster than most of us can realistically grasp. The reality is that we may never have a simple answer to guide our actions; in fact, the questions will likely only become more complex and challenging as we move forward.

This doesn’t mean we don’t have an obligation to ask these questions, nor does it allow us to walk away simply because it’s too difficult. Instead, we must challenge ourselves to be more thoughtful and responsible, and to foster an open dialogue about the path forward for our industry. Regardless of your opinion, it is important that we all engage on this topic. Each of us brings unique perspectives and value — whether by raising overlooked concerns or by clarifying terms like AI, which often carry connotations beyond their actual meaning.

 

Critical conversations on the future of clinical development

Considering all of this, I invite you to join me on March 25, 2025, at 10 am ET for what I hope will be the first of many conversations about the ways data and analytics are disrupting our industry. In this live discussion, I will be joined by Allie DeLonay from the Data Ethics Practice at the SAS Institute to discuss the ethical use of artificial intelligence — both broadly and considering some of the nuances unique to clinical development.

Harnessing AI-Powered Tools for Clinical Trial Design Coding

The global move towards AI-powered tools is sweeping across the life sciences industry. In particular, the roles biostatisticians play in both clinical trial design and programming lend themselves to AI-based innovations.

Earlier this month, Cytel launched its first AI-driven solution for clinical trial design code generation and joined the artificial intelligence revolution. This innovation is predicated on several years of research and development, coupled with the recent maturing of AI-focused service providers. The solution is designed for optimal functioning within the East Horizon platform and intended to enhance R integration functionalities that are now embedded within our software.

 

What makes Cytel’s AI-powered R coding assistant unique?

The coding assistant generates R code with required parameters for East Horizon. Unlike generic AI-based coding tools that generate standalone R scripts, this solution ensures the generated code includes function templates, expected arguments, and input variable names; is structured for direct integration into East Horizon’s simulation engines; and is aligned with industry best practices for regulatory-compliant clinical trials. In addition, the coding assistant is purpose-built for biostatistics and clinical trial design.

General-purpose AI tools do not innately understand adaptive trial designs, survival analyses, or clinical trial randomization. Cytel’s AI-powered R coding assistant allows biostatisticians to generate custom statistical tests beyond software-native options; perform advanced patient data modeling (e.g., quasi-Poisson, longitudinal outcomes); and apply alternative randomization and dropout modeling methods.

Finally, the coding assistant is embedded within an industry-standard solution for trial design. The solution ensures compatibility with East Horizon’s statistical engine, generating code that is formatted correctly and validation-ready.

 

How does the solution work?

Users interested in augmenting their trial design simulation work can select the R integration features within the software and gain access to the coding assistant. Users then enter prompts in natural language to elicit a response. The user can review the response, iterate and refine with additional queries, and modify the code to fit the task at hand. Once refined, the code can be employed in simulation runs for additional validation and debugging.

 

Why does this matter?

The AI-powered R coding assistant in East Horizon enables biostatisticians to generate complex R code instantly; customize trial simulations with precise statistical methods; and reduce manual coding errors and speed up model validation.

 

Custom R coding for oncology designs

The ability to augment design characteristics with custom R code is especially relevant to the ever-evolving oncology area of study. As regulatory guidelines are routinely adjusted to comply with clinical practice and current research, oncology studies often require specific analysis approaches and/or patient outcome data generation methods to conform to changing evidence thresholds. For example, the testing method and analysis type chosen for a specific design can be highly sensitive to the underlying distribution of the data. Therefore, simulating designs with a variety of analysis types can help design studies that are robust to a variety of possible data distributions.

With this in mind, using commercial software to generate patient outcome data through simulation takes full advantage of the software’s native workflows and computing power. These data are then analyzed against a variety of analysis types using R code augmentation. This approach to analysis variation also lends itself to advanced Bayesian tests, affording biostatisticians maximum flexibility.
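The sensitivity of an analysis to the data-generating distribution is easy to demonstrate with a small Monte Carlo sketch. The example below, written in Python for illustration rather than R, estimates the power of a simple z-style mean comparison when outcomes are normal versus heavy-tailed; the sample sizes, effect size, and number of simulations are invented for demonstration:

```python
import numpy as np

rng = np.random.default_rng(7)

def estimate_power(sampler, n=100, effect=0.5, sims=2000, crit=1.96):
    """Monte Carlo power estimate: fraction of simulated trials rejecting H0."""
    rejections = 0
    for _ in range(sims):
        control = sampler(n)
        treatment = sampler(n) + effect   # shift the treatment arm by the effect
        diff = treatment.mean() - control.mean()
        se = np.sqrt(treatment.var(ddof=1) / n + control.var(ddof=1) / n)
        if abs(diff / se) > crit:
            rejections += 1
    return rejections / sims

# Same design, two different outcome-generation models
normal_power = estimate_power(lambda n: rng.normal(0, 1, n))
heavy_power = estimate_power(lambda n: rng.standard_t(3, n))  # heavy-tailed outcomes
print(f"power (normal outcomes): {normal_power:.2f}")
print(f"power (heavy-tailed outcomes): {heavy_power:.2f}")
```

The heavy-tailed scenario loses substantial power under the same nominal design, which is precisely why simulating a design against several plausible data distributions, and several analysis types, is worth the effort.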

 

Want to learn more?

In our recent webinar, “Evaluating Different Analysis Options for Your Oncology Study Design by Combining East Horizon and R,” J. Kyle Wathen and Valeria Mazzanti discuss clinical trial design using a combination of R coding and Cytel’s proprietary statistical software, with a focus on analysis testing variations:

The Year Ahead for FSP: Open Source, AI, Global Reach, and Cost Efficiency

The Biometrics FSP outsourcing market is evolving faster than ever. Looking back, 2024 was a year of transition for our industry as we put the COVID bubble in our rearview mirror and focused on efficient delivery of our portfolios. Looking ahead now to 2025, Biometrics FSP is on track for continued growth, with a strong emphasis on open-source technologies, global reach, artificial intelligence, and cost efficiency.

Here, I touch a bit on these four areas and share our thoughts on the impact they will have in 2025 and beyond.

 

Embracing open-source technologies

While the clinical research space has been slower to adopt open-source programming, tools like R and Shiny are quickly gaining traction as cost-effective and reliable solutions for data analysis and submissions.

Cytel has been leveraging open-source software for big data aggregation, application development, and validation. We will continue to be a key contributor to the open-source ecosystem and help organizations solve key design and analysis challenges, while offering access to the industry’s top R/Shiny talent pool.

 

Offshoring hub strategy and cost-effective solutions

The demand for more cost-effective solutions continues to drive the use of offshore resources across the industry. Cytel’s expansion into Eastern Europe and continued growth of our South Africa– and India-based teams positions us well to support our sponsors in reducing costs while maintaining the high levels of quality they have come to expect of Cytel.

 

Artificial intelligence

AI is revolutionizing the biometrics space, enabling real-time data monitoring, automating code generation, and improving statistical accuracy in clinical trials. By flagging anomalies and potential errors, AI reduces the risk of data discrepancies and enhances overall data quality. AI-powered tools are streamlining biometric services, automating routine tasks and allowing researchers to focus on high-value activities.

 

Secure data collection and real-time monitoring

Innovations in data collection and real-time monitoring are improving privacy, security, and data integrity. Advanced authentication methods and AI integration are helping ensure the accuracy and confidentiality of data.

Automation is also playing a key role by extracting data from unstructured sources, such as medical records, and reducing human error during data transcription. This further enhances the efficiency of electronic data capture (EDC) systems and boosts the overall reliability of clinical data.

 

Final takeaways

I’m more excited about 2025 than any year in my career. In an industry that has been criticized for moving too slowly and cautiously, we sit at an inflection point for the rapid evolution of decades-old models. Change can be exciting and scary all the same. Reach out — our FSP experts and I are eager to present content and engage with you throughout the year at events such as PHUSE, JSM, SCDM, and more.

AI’s Influence on SAS Programming

The advent of Artificial Intelligence (AI) has transformed numerous fields, and the domain of SAS (Statistical Analysis System) programming is no exception. From automating tedious tasks to enhancing decision-making processes, AI has made significant inroads into how SAS programmers work.

However, AI is not a substitute for programmers but a companion to them. While AI frees us to focus on critical thinking, creativity, and problem-solving, it still needs our expertise: domain knowledge remains essential.

To understand this transformation better, here we explore key ways AI has impacted SAS programming: comparing traditional and AI-assisted programming, examining the days before and after AI, and discussing the new responsibilities and skills required in the modern programming landscape.

 

Traditional SAS programming vs. AI-assisted SAS programming

Traditional SAS programming has long been a manual, code-intensive practice requiring a high level of expertise in statistical analysis and programming. In the earlier days, SAS programmers worked with well-defined, often repetitive tasks. The process of developing code required a deep understanding of the data and statistical methodologies, all while meticulously debugging and quality-checking code.

AI-assisted SAS programming introduces a new level of efficiency, allowing programmers to focus more on value-added tasks rather than repetitive work. Traditional SAS programming workflows are now supported by AI-driven automation tools that can generate code, optimize algorithms, and even offer suggestions for complex statistical analyses. For example, where traditional methods would require a programmer to sift through data to find patterns, AI can now analyze large datasets in seconds and offer insights that help in decision-making. This allows SAS programmers to focus on more strategic and high-level interpretations.

In essence, the role of the SAS programmer is evolving from “code generator” to “code curator”: one who maintains control over every step, providing deep customization and understanding of the entire process.

 

AI as a companion, not a substitute

The fear of AI replacing jobs has become a common narrative, but in the case of SAS programming, AI should be viewed as a companion rather than a replacement. While AI can optimize code, automate reporting, or even suggest corrections, it is still far from replacing the creative and analytical skills of programmers. AI systems can generate insights based on patterns within datasets, but understanding the nuances of those patterns and making informed decisions based on them remains a skill unique to the programmer.

SAS programmers have a deep understanding of the data they work with, including the context, limitations, and real-world implications of their findings. While AI can handle the heavy lifting in terms of data processing and analytics, the role of the programmer is to interpret these findings, cross-check their accuracy, and ensure the outputs are aligned with business goals or research questions.

Additionally, AI’s suggestions aren’t always perfect, especially when dealing with edge cases or complex datasets with nuanced relationships. In such scenarios, a programmer’s oversight is crucial to prevent AI-driven errors from propagating throughout the analysis.

 

Before and after AI

The landscape of SAS programming before the integration of AI was characterized by manual coding, exhaustive debugging processes, and labor-intensive quality control procedures. Let’s break down the key changes AI has brought to these areas:

 

Code development

Before AI, coding was manual and depended heavily on a programmer’s syntax knowledge and experience to ensure that the code adhered to best practices for efficiency and performance. This could be a time-consuming process, especially when dealing with large, complex datasets.

In the post-AI era, code development is becoming more efficient through AI-assisted coding tools. These tools can automatically suggest code snippets based on previous coding patterns or even generate entire blocks of code tailored to the dataset. AI-driven auto-complete features and advanced libraries that recommend the best statistical models or data manipulation techniques have significantly sped up the development process.

 

Debugging

Debugging used to be a meticulous and painstaking part of the SAS programmer’s job. Identifying errors in code or incorrect outputs often required going through large blocks of code line by line, manually reviewing logic and syntax.

AI has revolutionized debugging by identifying errors in real time, suggesting fixes, and even automatically correcting syntax errors. AI tools can also track changes in code and predict where potential issues might arise based on past errors, significantly reducing debugging time and enhancing code accuracy.

 

Quality control (QC)

Before AI, the QC process was often manual or semi-automated and prone to missed errors; it involved peer reviews, statistical validations, and rigorous testing to ensure that the code met the necessary standards. This was particularly important in industries such as healthcare or finance, where data accuracy is critical.

Today, AI-driven QC tools can automatically verify the integrity of datasets, flag inconsistencies, and ensure that statistical models meet predefined accuracy thresholds. These tools can run tests much faster than human reviewers, allowing for quicker validation cycles and better compliance with industry standards.
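At their core, automated QC tools apply data-integrity rules at scale. The following is a deliberately simplified, hypothetical Python sketch of that idea — plain rule-based checks rather than any real vendor tool’s logic — where each rule flags records that violate a constraint. All field names and thresholds are illustrative.

```python
# Hypothetical sketch of rule-based QC checks of the kind automated
# tools apply at scale: each rule flags records violating a constraint.
import math

def qc_check(records, rules):
    """Return a list of (record_index, rule_name) flags."""
    flags = []
    for i, rec in enumerate(records):
        for name, rule in rules.items():
            if not rule(rec):
                flags.append((i, name))
    return flags

# Illustrative constraints on (hypothetical) clinical data fields.
rules = {
    "age_in_range": lambda r: 0 <= r.get("age", -1) <= 120,
    "visit_after_screening": lambda r: r["visit_day"] >= 0,
    "weight_present": lambda r: r.get("weight_kg") is not None
                                and not math.isnan(r["weight_kg"]),
}

records = [
    {"age": 54, "visit_day": 14, "weight_kg": 71.2},          # clean
    {"age": 154, "visit_day": 14, "weight_kg": 68.0},         # implausible age
    {"age": 47, "visit_day": -3, "weight_kg": float("nan")},  # two issues
]

flags = qc_check(records, rules)
print(flags)
```

Real AI-driven QC goes further — learning expected patterns from historical data rather than relying only on hand-written rules — but the output is the same kind of flag list a reviewer can act on.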

AI can substantially boost productivity without replacing the need for a programmer’s intuition and expertise, freeing us for other developmental activities such as enhancing client outcomes, learning new skills, and mentoring to strengthen the overall team.

 

New responsibilities and skills for SAS programmers in the AI age

Working effectively with AI platforms requires programmers to:

  • Understand how to work alongside AI tools
  • Adopt AI-driven workflows for faster development cycles
  • Learn to guide and review AI-generated code
  • Develop additional skills such as data literacy, critical thinking, and awareness of ethical AI considerations

 

Industry AI tools

  • Tabnine: AI-powered code predictions
  • Snyk: AI-driven security checks
  • DeepCode: Real-time AI code review
  • SAS Viya: Integration of existing code with AI tools

 

Final takeaways

AI tools are transforming the role of SAS programmers, making them faster and more effective, but human expertise remains crucial in directing AI and ensuring high-quality outcomes. The future of programming likely lies in a hybrid approach that leverages both human expertise and AI-driven efficiencies.

 

Interested in learning more about AI in clinical development? Watch our recent webinar:

Driving Innovation in Clinical Trial Design: Open Source, Commercial Software, and AI in 2025

As we usher in a new year, we reflect on 2024’s prominent trends in simulation software for clinical trial design that will continue to drive innovation in the coming year. The two main areas of growth and innovation we see taking the lead in 2025 are:

  1. The combination of open source with commercial software solutions
  2. The increasing use of AI to generate open-source code and augment clinical trial design

 

Commercial software: Confident and quick design capabilities

Commercial software remains a common and popular choice for clinical trial design. These tools allow for confident and quick design through validated workflows and pre-coded, verified design types. Because they are widely accepted and offer a wealth of trial design options, biostatisticians can easily and quickly design and compare a variety of trials. Furthermore, users enjoy access to expert professional support, in addition to frequent software releases that keep methodologies and design types up to date.

 

Open-source code offers a high degree of flexibility

Although commercial software provides numerous benefits to biostatisticians, there are also drawbacks to this choice in isolation. In a complex scientific field, biostatisticians often encounter idiosyncratic problems that require unique and custom solutions. In these cases, validated commercial software may prove insufficient, and custom code must be developed to address the problem at hand. In fact, this need for flexibility is at the root of the rise of open-source software for custom coding using industry-accepted languages like R, Python, or Julia. These languages afford biostatisticians a degree of creativity in their work and go hand-in-hand with the collaborative nature of this highly academic field. Over the years, many code packages have been developed and shared as solutions to unique design aspects, helping to drive and shape industry trends.

However, with this near-limitless flexibility come several drawbacks. Vetting or developing a bespoke solution can be complicated and resource intensive. Time is required for collecting requirements, writing code, testing, and validating a custom open-source design option. This approach relies on expertise in both software development and statistical methodology. While biostatisticians have deep knowledge and experience in statistics and clinical trial design, they are not typically trained in best practices for software development and programming. These best practices are crucial in developing reliable, robust solutions that can easily be shared with others and that apply to a wide array of trials. Finally, the results derived from open-source code require additional resources for both design selection and communication of results in the context of a multidisciplinary team. The biostatistician’s attention is thus diverted from providing valuable strategic input to the clinical development team towards software development and implementation tasks.
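One software-development practice worth calling out is validation against an independent reference implementation. The following is a minimal, hypothetical Python sketch: a hand-rolled Welch t-statistic is checked against the equivalent function in SciPy (the same pattern applies to custom R code checked against an established package).

```python
# Hypothetical sketch: validate custom analysis code by comparing it
# against an independent, established reference implementation.
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Hand-rolled Welch t-statistic for two independent samples."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1), b.var(ddof=1)
    return (a.mean() - b.mean()) / np.sqrt(va / len(a) + vb / len(b))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 40)
b = rng.normal(0.5, 1.5, 60)

custom = welch_t(a, b)
reference = stats.ttest_ind(a, b, equal_var=False).statistic
assert np.isclose(custom, reference)  # custom code matches the reference
```

Collected into a test suite and run automatically, checks like this are a large part of what separates shareable, reusable open-source design code from one-off scripts.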

 

Combining open-source code with commercial software

Acknowledging these challenges, the industry is quickly adopting a combined-capabilities approach that incorporates the established, validated backbone of commercial software with the added creativity afforded by open-source code. This approach allows biostatisticians to augment elements of the design such as the choice of analysis type, statistical test, or the distributions used to generate various design inputs, without the need to code an entire design. In addition, clinical trial design professionals benefit from the cloud computing power embedded in some commercial software solutions, eliminating the need for maintaining an expensive internal computational grid. We believe that this integrated future of study design harnesses the benefits of both commercial software and open-source solutions while limiting the drawbacks experienced with each approach individually.
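The shape of such an integration point can be sketched abstractly. The following is a hypothetical Python illustration of the general pattern only, not the actual software’s API: the engine owns data generation and the simulation loop, while the analysis step is a user-supplied function that can be swapped without rewriting the design (in a real workflow, that plug-in would be custom R code).

```python
# Hypothetical sketch of an "integration point": the engine runs the
# simulation loop; the analysis step is a pluggable, user-supplied
# function. Names and parameters are illustrative only.
import numpy as np
from scipy import stats

def simulate_trial(analysis_fn, n_per_arm=100, effect=0.4,
                   n_sims=200, alpha=0.05, seed=1):
    """Engine side: generate data, delegate analysis to the plug-in."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_arm)
        treated = rng.normal(effect, 1.0, n_per_arm)
        if analysis_fn(control, treated) < alpha:
            rejections += 1
    return rejections / n_sims

# User side: swap the analysis type without touching the engine.
power_t = simulate_trial(lambda a, b: stats.ttest_ind(a, b).pvalue)
power_w = simulate_trial(lambda a, b: stats.mannwhitneyu(a, b).pvalue)
print(power_t, power_w)
```

The design choice is the separation of concerns: the validated engine keeps responsibility for data generation and bookkeeping, while the custom code is confined to a well-defined slot it can be tested in.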

 

The use of artificial intelligence in generating code for clinical trial design

Along with the intensive use of R and other coding languages, we believe that we will see increased interest in using AI solutions for a variety of clinical trial design and execution activities. These applications of AI may include data transformation and cleaning; statistical analysis; protocol writing; clinical data reporting; trial management practice; and efficient code generation and validation for clinical trial design. For the latter, AI solutions powered by Large Language Models (LLMs) can be harnessed to produce analysis-ready custom code based on project specifications. Indeed, over the past few months, Cytel has introduced an AI-driven coding assistant in its newest clinical trial design software to augment study designs with novel approaches via custom code. This approach holds several advantages, among them: the ability to generate code faster; the potential for efficient code validation and editing; and the ability to generate code using natural language prompts.

Alongside the great promise such tools hold, there are also potential drawbacks and concerns expressed by biostatisticians working in the field. AI-supported code generation requires close review by trained coders to ensure the code created using these tools is sound and applicable to the purpose for which it was created. While code generated by AI can save considerable resources, it requires close supervision and review for validation and application in practice. Over-reliance on code-generation tools may, over time, change the way in which statisticians think through complex coding problems and limit creativity in the field.

 

Final takeaways

The landscape of clinical trial design is poised for significant advancements in 2025, driven by the integration of commercial software and open-source solutions, as well as the innovative application of AI for code generation. By leveraging the strengths of commercial software — validated workflows, expert support, and computational power — and combining them with the flexibility and creativity of open-source coding, biostatisticians can overcome traditional challenges and design trials more efficiently. Furthermore, AI-powered tools promise to streamline the generation, validation, and customization of code, empowering teams to focus on strategic decision-making and innovation. These trends signal a promising era of collaboration, efficiency, and enhanced capabilities in clinical trial design.

 


Cytel’s East Horizon Platform now includes open-source integration points, allowing users to inject custom analysis types, statistical tests, and patient outcome generation into existing software workflows. In addition, the software includes an advanced AI-driven coding assistant that can generate compatible custom R code using plain language queries for integration in study designs. These new features, in combination with Cytel’s advanced trial simulation tools and cloud computing capabilities, offer a potent, comprehensive solution for clinical trial design and optimization.