New Primer and Ebook on Synthetic Control Arms

June 8, 2020

Cytel has recently published a new ebook on synthetic control arms, and a new scientific primer as well.

Rising trial costs and the growing availability of data have made synthetic control arms a less costly option, and a number of clinical trials now use them in place of all or part of a concurrent control group. They are generally most useful when practical or ethical considerations limit randomization to control. A synthetic control arm replaces the placebo or standard-of-care arm with a simulated arm that draws on data collected in previous trials or from real-world data sources. This saves the cost and time associated with patient recruitment and retention, and it avoids assigning very ill patients to placebo or to an already well-tested standard of care instead of the candidate drug.

The new primer focuses on assessing the validity of data, the validity of methodology, and the modes of analysis and interpretation within this burgeoning field. Each of these is a crucial part of understanding how to make the most impact with a synthetic control.


For our purposes, synthetic control arms randomize and adjust external data sources to create control arms for a trial. External data sources are any sources of clinical data collected outside the trial at hand, including data from previous trials, health records, medical claims data, and so forth. The synthetic control is created from randomized cohorts drawn from external data sources and adjusted using the statistical methods described below. For this post, we assume that all trials using synthetic control arms are also randomized controlled trials.

We will also use the term validity below to identify places where the scientific rigor of a dataset or a methodology might be called into question. While synthetic control arms can significantly speed up a trial, this must be accomplished with high levels of scientific validity. Because many of these methods are new, we encourage anyone considering them to have a data scientist on hand who can help avoid the pitfalls of poor data or sloppy methodology.

Data Validity:

When using external data sources within a trial, it is vital to ensure that the data are reliable and comprehensive. The data must also fit the needs of the new trial: differences in the patients sampled, the outcomes measured, or the data collection process must not introduce statistical bias.

Consider how your external control will be used when selecting data vendors, since the level of scrutiny applied to the underlying data will vary by use case. Regulatory bodies may reasonably request partial or full access to the underlying data, so in these cases it is important to make sure your data provider(s) can offer it.

Data validity must therefore be assessed along two dimensions: whether the original dataset is itself a reliable source of data, and whether it is similar enough to the data a new trial would have collected to provide relevant data for the control arm.

Five questions need to be evaluated to ensure data validity:

  • Was the original data collection process similar to that of the clinical trial?
  • Was the external control population similar to the clinical trial population? When it is not, the external population is considered “heterogeneous” relative to the trial population. A separate article covers how to handle heterogeneity between datasets.
  • Did the outcome definitions of the external control match those of the clinical trial? A separate post takes a more expanded look at using estimands.
  • Was the synthetic control dataset sufficiently reliable and comprehensive?
  • Were there any other limitations on the dataset?
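One way to make the population-similarity question concrete is to compare baseline covariates between the trial and the external control. The sketch below computes standardized mean differences (SMDs) in Python; the function names, the example covariates, and the 0.1 threshold (a commonly cited rule of thumb for balance) are illustrative assumptions, not part of the primer.

```python
import numpy as np


def standardized_mean_difference(trial_values, external_values):
    """SMD for one covariate: difference in means divided by the
    pooled standard deviation (square root of the average sample variance)."""
    trial = np.asarray(trial_values, dtype=float)
    external = np.asarray(external_values, dtype=float)
    pooled_sd = np.sqrt((trial.var(ddof=1) + external.var(ddof=1)) / 2.0)
    return (trial.mean() - external.mean()) / pooled_sd


def flag_imbalanced(trial_covariates, external_covariates, threshold=0.1):
    """Return {covariate: SMD} for covariates whose |SMD| exceeds the threshold.
    The 0.1 default is a common rule of thumb, used here as an assumption."""
    flagged = {}
    for name, trial_values in trial_covariates.items():
        smd = standardized_mean_difference(trial_values, external_covariates[name])
        if abs(smd) > threshold:
            flagged[name] = smd
    return flagged


# Hypothetical baseline data for five trial and five external patients.
trial = {"age": [60, 62, 64, 66, 68], "ecog": [0, 1, 1, 0, 1]}
external = {"age": [50, 52, 54, 56, 58], "ecog": [0, 1, 1, 0, 1]}
print(flag_imbalanced(trial, external))  # only "age" is flagged (SMD of about 3.16)
```

A large SMD does not by itself disqualify a dataset, but it signals heterogeneity that the adjustment methods discussed below would need to address.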

Finally, when considering appropriate datasets, remember the general rule of “garbage in, garbage out.” Stated more plainly, if the data are not up to par, no amount of quantitative maneuvering will produce high-quality, statistically rigorous output.

Methodological Validity:

A number of methods can be used to adjust external data to the needs of a trial. Depending on the circumstances, these range from regression and propensity score methods to Bayesian mixed models, neural network forecasts, and so forth.
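To illustrate the propensity-score end of that range, here is a minimal sketch, assuming a single continuous baseline covariate and a logistic model for the probability p of being enrolled in the trial rather than coming from the external source. Each external patient is weighted by the odds p / (1 − p), which reweights the external cohort toward the trial population (often called odds or ATT weighting). The function names and simulated data are hypothetical; a real analysis would use a validated statistical package, more covariates, and checks for overlap and extreme weights.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression fitted by Newton-Raphson; the first coefficient is the intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        gradient = X1.T @ (y - p)                          # score of the log-likelihood
        hessian = X1.T @ (X1 * (p * (1.0 - p))[:, None])   # negative Hessian (X^T W X)
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def external_odds_weights(X_trial, X_external):
    """Propensity p = P(in trial | covariates); external patients get weight p / (1 - p)."""
    X = np.vstack([X_trial, X_external])
    in_trial = np.concatenate([np.ones(len(X_trial)), np.zeros(len(X_external))])
    beta = fit_logistic(X, in_trial)
    X1_ext = np.column_stack([np.ones(len(X_external)), X_external])
    p_ext = 1.0 / (1.0 + np.exp(-X1_ext @ beta))
    return p_ext / (1.0 - p_ext)


# Hypothetical simulated data: one covariate (age); external patients are younger.
rng = np.random.default_rng(0)
X_trial = rng.normal(65, 8, size=(500, 1))
X_external = rng.normal(58, 8, size=(500, 1))
y_external = rng.binomial(1, 0.3, size=500)  # hypothetical binary outcomes

w = external_odds_weights(X_trial, X_external)
print("trial mean age:", X_trial.mean())
print("external mean age, raw:", X_external.mean())
print("external mean age, weighted:", np.average(X_external[:, 0], weights=w))
print("weighted control response rate:", np.average(y_external, weights=w))
```

The weighted external mean age should land much closer to the trial mean than the raw external mean; the weighted outcome rate then serves as a covariate-adjusted control estimate.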

Once you have established data validity, working with a quantitative strategist to determine the best methods for your trial can help you explore the options fully. Most quantitative strategists will evaluate the following questions to determine methodological validity:

  • Did the clinical trial include a concurrent control arm or is the synthetic control data the only control data?
  • How was the synthetic control data matched to the intervention group?
  • Were the results robust in sensitivity analyses of key assumptions and potential biases?
  • Were synthetic control comparisons possible for all clinically important outcomes?
  • Are the results applicable to your patients?
  • Were there any other limitations to the synthetic control methods?
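For the robustness question, one widely used sensitivity analysis for unmeasured confounding is the E-value of VanderWeele and Ding (2017): the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment assignment and outcome to fully explain away an observed risk ratio. The sketch below is one possible check, not the method the primer prescribes; it assumes the effect is reported as a risk ratio, and the example numbers are hypothetical.

```python
import math


def e_value(rr):
    """E-value for a point estimate on the risk-ratio scale."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective effects (RR < 1) are handled by taking the reciprocal.
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_for_ci(lower, upper):
    """E-value for the confidence limit closest to the null (RR = 1)."""
    if lower <= 1.0 <= upper:
        # The interval already includes the null, so no confounding is needed.
        return 1.0
    closest = lower if lower > 1.0 else upper
    return e_value(closest)


# Hypothetical result: external-control comparison gives RR = 2.0, 95% CI 1.5 to 2.7.
print(round(e_value(2.0), 2))              # 3.41
print(round(e_value_for_ci(1.5, 2.7), 2))  # 2.37
```

A small E-value means that fairly weak unmeasured differences between the trial and external populations could account for the result, which is exactly the concern with a synthetic control.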

Application of Results

Cytel scientists note that external data may not always provide evidence on the same outcomes as concurrent randomized controls. Interpretation and analysis therefore play a larger role in using synthetic control arms properly.

Such cases require careful consideration of all the techniques used to build a synthetic control. Working with experienced data scientists, statisticians and epidemiologists can give you the partnership you need to deliver a solid case for your regulatory submission.

For a non-technical introduction to synthetic control arms, download the Cytel ebook.


Thorlund, K., Dron, L., Park, J. J., & Mills, E. J. (2020). Synthetic and External Controls in Clinical Trials–A Primer for Researchers. Clinical Epidemiology, 12, 457.