
Defining Probability of Success in Clinical Trial Design with Commercial Software and R Coding

Written by Boaz N. Adler, Director, Global Product Engagement, and J. Kyle Wathen, Vice President, Scientific Strategy & Innovation

 

One of the pivotal metrics considered when designing a clinical trial is the study’s probability of success, which can be measured in several ways. Each definition of probability of success has benefits and limitations both in the way it is measured, and the way it is accounted for in study design and execution.

Here, we examine two definitions of probability of success in similar settings and describe how each can be defined and incorporated into a study design using a combination of commercial software and R coding. In leveraging two tools, statisticians can rely on the power and confidence of commercial software in conjunction with the flexibility of R code. 

Criteria for study success

One of the most basic criteria for study success is study power: the probability of detecting a true treatment effect. Power works well in single-endpoint, two-arm designs, where there are no multiplicity issues, and it has long been accepted as a measure of success in clinical trial design.

However, as more complex study designs are introduced, including those assessing multiple endpoints or multiple treatment arms, the definition of study success becomes more nuanced. In these cases, study statisticians need to define the winning condition for the study: the sequence or relative importance of multiple endpoints, or how many arms need to show statistical significance to declare a successful study. For example, if two co-primary outcomes are assessed, the study is declared a success only if both endpoints are statistically significant. An alternative is a primary outcome followed by a key secondary outcome, where the secondary outcome is tested only if the primary outcome is statistically significant.
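The two winning conditions above can be sketched as simple R functions. This is an illustrative sketch, not code from the designs discussed here; the p-values and the one-sided alpha of 0.025 are assumed for the example.

```r
alpha <- 0.025  # one-sided significance level (assumed for illustration)

# Co-primary endpoints: both must be statistically significant.
win_co_primary <- function(p1, p2) (p1 < alpha) && (p2 < alpha)

# Hierarchical (gatekeeping) testing: the key secondary endpoint is
# tested only if the primary endpoint succeeds.
win_hierarchical <- function(p_primary, p_secondary) {
  if (p_primary >= alpha) return(FALSE)  # gate closed: stop testing
  p_secondary < alpha
}

win_co_primary(0.01, 0.04)    # FALSE: second endpoint misses
win_hierarchical(0.01, 0.04)  # FALSE: secondary tested, not significant
win_hierarchical(0.01, 0.02)  # TRUE: both pass in sequence
```

Note that the hierarchical rule spends no alpha on the secondary test when the primary fails, which is what preserves the overall type-1 error rate.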

In any scenario with multiple endpoints, it is crucial to control the type-1 error rate across the statistical tests performed. In the case of multiple arms, the winning condition could be defined as winning on at least one arm, on the highest-dose arm, or on multiple arms, depending on the product profile and sponsor preference. Here too, because we are handling multiplicity, it is important to monitor the family-wise error rate (FWER) to ensure proper testing and sound conclusions.

In a multiplicity setting, study power and the winning condition together define the study's overall probability of winning. For example, sponsors may choose conjunctive power if success requires winning on both endpoints, and disjunctive power if success requires winning on at least one endpoint.
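Conjunctive and disjunctive power can be estimated by simulation. The sketch below is illustrative rather than taken from the designs in this post: the effect sizes (on the z-scale), the endpoint correlation, and the one-sided alpha are all assumptions for the example.

```r
set.seed(42)
n_sims <- 10000
alpha  <- 0.025
z_crit <- qnorm(1 - alpha)

# Assumed standardized effects for the two endpoints, and their correlation.
delta <- c(3.0, 2.8)
rho   <- 0.5

# Draw correlated standard-normal noise via a Cholesky factor,
# then shift each column by its assumed effect.
sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
noise <- matrix(rnorm(2 * n_sims), ncol = 2) %*% chol(sigma)
z     <- sweep(noise, 2, delta, "+")

wins <- z > z_crit
conjunctive_power <- mean(wins[, 1] & wins[, 2])  # win on both endpoints
disjunctive_power <- mean(wins[, 1] | wins[, 2])  # win on at least one
```

By construction, conjunctive power can never exceed disjunctive power, and the gap between them widens as the endpoint correlation decreases.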

In a seamless phase II/III design, the sponsor first tests two or more treatment arms against a control in a smaller phase II stage and, based on the results, the protocol determines which arm(s) “graduate” into the larger phase III stage. The sponsor must account for both the multiplicity across arms (FWER) and the type-1 error arising from testing across the study’s phases. A p-value combination method, or a similar type-1 error control method, can be used.
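One common p-value combination method is the inverse-normal combination test, sketched below. The stage weights and p-values here are illustrative; in practice the weights must be pre-specified in the protocol.

```r
# Inverse-normal combination of stage-wise p-values p1 (phase II stage)
# and p2 (phase III stage), with pre-specified weights w1, w2
# satisfying w1^2 + w2^2 = 1.
combine_inverse_normal <- function(p1, p2, w1 = sqrt(0.5)) {
  w2 <- sqrt(1 - w1^2)
  z  <- w1 * qnorm(1 - p1) + w2 * qnorm(1 - p2)
  1 - pnorm(z)  # combined p-value
}

# Reject at one-sided alpha = 0.025 if the combined p-value is small enough.
p_comb <- combine_inverse_normal(0.04, 0.01)
p_comb < 0.025  # TRUE: combined evidence crosses the boundary
```

Because the stage-wise p-values are independent under the null hypothesis, the combined statistic is standard normal under the null, which is what preserves the type-1 error rate across the two phases even after adaptive arm selection.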

The choice of winning condition and definition of probability of success depends on the product profile, design requirements, and sponsor preference. Biostatisticians then use a variety of tools, including commercial software and custom coding, to evaluate different study designs and select a design that is robust, in accordance with the outlined definition of success.

Next, we share two examples: one in which we relied solely on commercial software, and another in which we combined commercial software and R code to optimize two similar designs with different definitions of probability of success. 

 

Average study power as probability of success: Using commercial software

First, let’s look at an example in which we relied solely on commercial software: a group sequential study design with two arms and a single normal outcome, where average power, computed over a discrete set of assumed treatment effects, serves as the definition of the probability of study success.

The study design was simulated using Solara®, which can rapidly simulate hundreds, or even thousands, of study designs using cloud computational resources, and can select robust designs using pre-specified workflows and tools. Additionally, it can simulate a range of potential treatment effect scenarios and apply weights or “likelihoods” that these will be the true underlying treatment effect. Thus, a Solara design space can accommodate many variations in study design parameters and assess those against many realistic true underlying treatment effect assumptions, leading to a more accurate computation of average study power. 
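The weighted-average-power calculation described above can be sketched in a few lines of R. The candidate effects, weights, standard deviation, and sample size below are hypothetical placeholders, not the values from the Solara run; per-scenario power uses the standard fixed-sample two-arm formula.

```r
# Power of a two-arm comparison of means with known common sd,
# one-sided alpha (fixed-sample approximation).
power_two_arm <- function(delta, sd, n_per_arm, alpha = 0.025) {
  z_crit <- qnorm(1 - alpha)
  pnorm(delta / (sd * sqrt(2 / n_per_arm)) - z_crit)
}

deltas  <- c(0.0, 0.3, 0.5)  # candidate true treatment effects (assumed)
weights <- c(0.2, 0.5, 0.3)  # elicited likelihoods, summing to 1 (assumed)

powers    <- power_two_arm(deltas, sd = 1, n_per_arm = 100)
avg_power <- sum(weights * powers)  # weighted average study power
```

Note that the null scenario (delta = 0) contributes only alpha to the average, so placing non-trivial weight on "no effect" pulls average power well below the power computed at a single optimistic effect size.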

In this case, study statisticians were able to identify one group sequential and one sample size re-estimation alternative to the originally proposed group sequential design. Both designs perform better in terms of the studies’ operating characteristics and provide a more realistic probability of success based on study power and the assigned weights to the true underlying treatment effect scenarios. After defining the design space and setting up the simulation run, the software simulated over 11 million clinical trials in mere minutes. Design prioritization, selection, and team discussions took a few additional hours. This saved the team a significant amount of time in trial simulation and selection, compared to custom R code alone. 

 

Bayesian probability of success in a phase II study design: Using commercial software + R coding

Now, let’s consider a similar study design: a group sequential study with two arms and a single normal outcome. Here, however, we introduced a “go/no-go” decision point at the interim analysis. Accumulated data are assessed at the 50% information fraction to check for futility; if futility is not met, the study continues to its conclusion and a full readout based on the primary outcome. Futility is based on the Bayesian predictive probability of study success at the end of the study, and the overall probability of success accounts for both passing the go/no-go decision point and achieving success at the final analysis.
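A minimal sketch of a predictive-probability futility check for a normal endpoint is shown below. The assumptions are mine, not the exact method used in the study: a vague (flat) prior on the treatment effect, known variance, and the usual independent-increments (Brownian-motion) information structure of a group sequential design.

```r
# Bayesian predictive probability of final success given the interim
# z-statistic at information fraction t, under a flat prior on the drift.
pred_prob_success <- function(z_interim, info_frac = 0.5,
                              alpha = 0.025, n_draws = 100000) {
  set.seed(123)
  z_crit <- qnorm(1 - alpha)
  # Interim score statistic on the Brownian-motion scale: B(t) ~ N(theta*t, t).
  b <- z_interim * sqrt(info_frac)
  # Flat prior => posterior of the drift theta given B(t)=b is N(b/t, 1/t).
  theta <- rnorm(n_draws, b / info_frac, sqrt(1 / info_frac))
  # Final statistic = interim score + independent remaining increment.
  z_final <- b + theta * (1 - info_frac) +
    rnorm(n_draws, 0, sqrt(1 - info_frac))
  mean(z_final > z_crit)  # predictive probability of final success
}

# Futility rule: stop at the interim if the predictive probability is
# low, e.g. below 0.10 (threshold assumed for illustration).
pred_prob_success(z_interim = 0.2) < 0.10  # TRUE: weak interim => no-go
```

A promising interim result (say z around 2) yields a high predictive probability and the study continues; a flat interim result triggers the no-go, which is exactly the behavior the go/no-go decision point is meant to formalize.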

The addition of a futility rule based on Bayesian predictive probability of success at the interim places this design outside the immediate capabilities of commercial software such as Solara or East®. To overcome this hurdle, we used the East proprietary study design software with its integrated R code capabilities for a hybrid approach. We expanded the design coverage of East by including a new analysis method, developed as an R function, that incorporates a futility rule based on Bayesian predictive probability of success. In addition, we placed a continuous prior on the true treatment effect by integrating another R function into East to replace the way patient data are simulated.

The team began with a bimodal prior on the true treatment effect, reflecting the team’s belief that there was a significant chance of no treatment effect. The prior was constructed as a mixture of two normal distributions, one with mean 0 and the other with mean 0.7. Another useful distribution obtained from the simulation is the posterior distribution of the true treatment effect given that the phase II trial concluded with a “go” decision. This conditional posterior distribution can be very helpful in understanding risk when designing the phase III study that would follow, and comparing the prior and posterior distributions demonstrates how much information the phase II study provides.
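The mixture prior and the "posterior given go" idea can be sketched by simulation. The article specifies only the two component means (0 and 0.7); the mixture weight, component standard deviations, sample size, and go threshold below are made up for illustration.

```r
set.seed(7)
n_sims <- 20000

# Bimodal mixture prior on the true effect: a component near 0
# ("no effect") and a component near 0.7 ("benefit").
benefit <- rbinom(n_sims, 1, 0.6)  # 60% prior weight on benefit (assumed)
delta   <- ifelse(benefit == 1,
                  rnorm(n_sims, mean = 0.7, sd = 0.15),
                  rnorm(n_sims, mean = 0.0, sd = 0.10))

# Crude phase II "go" rule (illustrative): the observed effect estimate
# from n = 50 per arm (sd = 1 assumed) must exceed 0.3.
se_hat    <- sqrt(2 / 50)
delta_hat <- rnorm(n_sims, mean = delta, sd = se_hat)
go        <- delta_hat > 0.3

# Posterior of the true effect conditional on a "go" decision:
# shifted toward the benefit component relative to the prior.
mean(delta[go]) > mean(delta)  # TRUE: the go filter enriches for benefit
```

Comparing `delta` (prior draws) with `delta[go]` (posterior-given-go draws) is exactly the prior-versus-posterior comparison described above, and the fraction of simulations with `go == TRUE` estimates the probability of passing the decision point.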

By integrating two R functions into East, we computed the Bayesian assurance, or probability of success, assuming a continuous prior on the true treatment effect, and accounted for a new futility rule based on the Bayesian predictive probability of success, all with minimal R development.

 

A hybrid approach: Commercial software and custom R coding

Commercial software allows for confident and quick design through validated workflows and pre-specified, verified design types. In our first example, commercial software rapidly generated a large design space with over 10,000 clinical trial models, and we leveraged the system to prioritize, select, and compare different complex design options. This approach relies on the software to automate certain aspects of simulation and selection, based on criteria and likely execution scenarios entered by the user. The system is also flexible enough to allow weights to be assigned to particular treatment effects, so the user can incorporate priors into the analysis, yielding a more realistic simulation run and final study design.

R coding, by contrast, allows almost limitless methodological flexibility, but it depends on the user’s coding ability, requires time for writing and validation, and demands additional resources for communicating results and design selection. In our second example, thoughtfully injecting R code into the off-the-shelf software’s analyses, only where flexibility was required, mitigated some of these limitations while relying on the software’s convenient, pre-coded workflows to save time. We incorporated Bayesian assurance as the measure of study success and changed how patient data are simulated, using a continuous prior on the true underlying treatment effect. The simulation runs for the second study design completed rapidly on the commercial software, while we retained flexibility in both treatment effect modeling and the definition of study success.

We therefore believe a hybrid approach employing both commercial software and custom coding can provide time savings and confidence in automating basic design characteristics and types using existing software, interjecting with custom code where needed to incorporate novel methods or specific analysis types required for each project. 

 

Final takeaways

One of the key considerations in clinical trial study design is the definition of probability of study success, and the choice of definitions reflects the priorities and aims of study sponsors and designers. In early design conversations, special attention should be given both to the selection of study success criteria and the tools that will be used by study designers to pressure test these definitions and produce a robust study design. A thoughtful combination of both commercial software and custom R code can offer added flexibility and confidence in parameter selection, design simulation, and pressure testing.

 

This blog is adapted from Boaz Adler and Kyle Wathen’s presentation, “Congruent Statistical Tools: Variations on Probability of Success in Clinical Trials Using Commercial Software in Combination with R Code,” presented at Duke Industry Statistics Symposium 2024.

 

Interested in learning more? Valeria Mazzanti, Associate Director of Customer Success, and J. Kyle Wathen, Vice President, Scientific Strategy and Innovation, discuss the integration of East® and R. With this new capability, users have greater latitude in selecting input parameters, such as analysis types and test statistics, beyond those that are native to the software. Click to register:

 

Register for the Webinar

 

 

About Boaz N. Adler

Boaz Adler is Director of Global Product Engagement at Cytel. He has served as a Solutions Consultant and Analyst for Life Sciences companies and Health-Tech organizations for over a decade. Boaz’s interests are focused on tech and novel services innovations that contribute to more coherent and robust evidence generation across the drug development cycle. At Cytel, Boaz enhances the connection between Cytel’s software development team and its clients and supports clients in clinical trial optimization projects using Cytel’s cutting-edge technology.

 

 

About J. Kyle Wathen

Kyle Wathen is Vice President, Scientific Strategy & Innovation at Cytel. With a background in academia, consulting, and industry, he has more than 20 years of experience and is an expert on the development and application of novel Bayesian methodology for adaptive clinical trial designs. Kyle has been involved in many innovative clinical trials in oncology, neuroscience, infectious diseases, cardiovascular, and inflammation as well as many platform trials. He has also released several software packages including OCTOPUS, an R package for simulation of platform trials.

 


 
