Sofia S. Villar is a member of the DART (Design and Analysis of Randomised Trials) group at the MRC Biostatistics Unit in Cambridge, England. She was recently awarded the first Biometrika post-doctoral research fellowship.
In the post below, Villar offers her view on how challenges in rare disease drug development may be alleviated by Bayesian-Bandit adaptive designs. A "multi-armed bandit" is an agent that tries to acquire new knowledge while capitalizing on existing knowledge. Clinical studies can therefore use bandit designs to better serve patients whose primary goal in participating in a trial is to improve their own health outcomes.
Why Bayesian-Bandit Adaptive Designs Can Address Challenges in Rare Disease Drug Development
Adaptive clinical trials are attractive from many perspectives. By reducing the risk, time, and cost associated with clinical development, they are financially more appealing than conventional fixed randomised trials. Additionally, adaptive trials that modify randomization weights in response to accumulating outcome data can partially remedy the Therapeutic Misconception, by which patients (and their family members) believe that the foremost goal of a clinical trial is to improve their outcomes. In turn, this can help alleviate recruitment problems.
However, another important benefit of adaptive designs is their ability to balance two conflicting goals within a trial: correctly identifying the best treatment (learning) and using that knowledge to treat the largest number of patients better (earning). Correctly identifying the best treatment requires assigning some patients to every treatment, so the former necessarily limits the latter. In general, the optimal balance of learning versus earning depends on the size of the population with the disease (the patient horizon). In a rare disease context, where trial participants make up a large proportion of all patients with the condition, there are simply not enough patients to ensure a high probability of correctly detecting the best treatment (i.e., high power). The idea of prioritizing patient benefit over hypothesis testing when designing a trial is therefore widely accepted in the medical community dealing with these diseases.
A large body of theoretical literature accumulated over the past 50 years, known as bandit problems, addresses the optimal solution to the earning-versus-learning dilemma and has long been motivated by the optimal design of multi-arm clinical trials (Gittins and Jones, 1974). It provides a rationale for efficiently learning which treatment is best and then exploiting that knowledge for the benefit of future patients. Despite this long-standing theoretical work, bandit methods have never actually been applied in a real clinical trial.
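To make the earning-versus-learning idea concrete, here is a minimal sketch of one well-known Bayesian bandit rule, Thompson sampling. This is an illustrative stand-in, not the Gittins index or any rule from the literature discussed here: each arm keeps a Beta posterior over its response rate, and each patient is assigned to the arm with the highest posterior draw, so allocation gradually shifts toward the arm that appears best.

```python
import random

def thompson_sampling_trial(true_success_probs, n_patients, seed=0):
    """Allocate patients one at a time with Thompson sampling.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior; the
    next patient is assigned to the arm with the highest posterior draw.
    """
    rng = random.Random(seed)
    k = len(true_success_probs)
    successes = [0] * k
    failures = [0] * k
    allocations = [0] * k
    for _ in range(n_patients):
        # Draw one sample from each arm's posterior; assign to the best draw.
        draws = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                 for a in range(k)]
        arm = max(range(k), key=lambda a: draws[a])
        allocations[arm] += 1
        # Simulate this patient's (binary) response.
        if rng.random() < true_success_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return allocations, successes

# Hypothetical two-arm trial: 30% vs 50% response rates.
alloc, succ = thompson_sampling_trial([0.3, 0.5], n_patients=200)
```

Because the posterior for a better-performing arm concentrates on higher values, most of the 200 simulated patients typically end up on the superior arm, which is exactly the "earning" behaviour the bandit literature formalizes.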
A recent paper of mine  reviews the use of bandit methods in multi-arm trials. They were found to perform extremely well when judged solely on patient outcomes: they yield a significantly larger expected number of patient successes and a higher mean proportion of patients allocated to the best arm (when one exists) than the traditional fixed randomisation approach, and even than some other adaptive allocation rules. We have additionally demonstrated that fixing the randomization probability of the trial's control arm (while using bandit methods to allocate among the experimental arms) yields designs that score highly on both patient benefit and statistical power.
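The control-protected idea can be pictured as follows. This is a simplified sketch of the general scheme, not the exact rule from the paper: each patient is assigned to control with a fixed probability, and otherwise to an experimental arm chosen by a bandit rule (Thompson sampling is used here as a placeholder).

```python
import random

def control_protected_bandit(control_prob, exp_success_probs,
                             control_success_prob, n_patients, seed=1):
    """Fix the control arm's randomization probability; use a Bayesian
    bandit rule (Thompson sampling here) only among experimental arms."""
    rng = random.Random(seed)
    k = len(exp_success_probs)
    s = [0] * k  # experimental-arm successes
    f = [0] * k  # experimental-arm failures
    counts = {"control": 0, "experimental": [0] * k}
    for _ in range(n_patients):
        if rng.random() < control_prob:
            # Protected control allocation keeps power for the comparison.
            counts["control"] += 1
            _ = rng.random() < control_success_prob  # observe control outcome
        else:
            # Bandit allocation among the experimental arms only.
            draws = [rng.betavariate(s[a] + 1, f[a] + 1) for a in range(k)]
            arm = max(range(k), key=lambda a: draws[a])
            counts["experimental"][arm] += 1
            if rng.random() < exp_success_probs[arm]:
                s[arm] += 1
            else:
                f[arm] += 1
    return counts

# Hypothetical trial: one control arm and two experimental arms.
counts = control_protected_bandit(control_prob=1/3,
                                  exp_success_probs=[0.35, 0.55],
                                  control_success_prob=0.3,
                                  n_patients=300)
```

The fixed control probability guarantees the control arm a predictable share of patients (roughly a third here), which is what preserves statistical power, while the bandit component concentrates the remaining patients on the most promising experimental arm.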
In further work  we are attempting to overcome two remaining central limitations of bandit methods in practice: (1) they are fully sequential, i.e. they require each patient's endpoint to be observable soon after treatment, which narrows the medical settings to which they are applicable; (2) they are completely deterministic, removing the central tenet of randomization from the trial and exposing it to assignment bias. In a working paper, we propose a novel implementation of bandit-based rules that overcomes these difficulties, trading a small reduction in optimality for a fully randomized, adaptive group allocation procedure that offers substantial improvements in patient benefit for both small and large populations. We believe the resulting bandit rule is especially useful for trials whose main goal is selecting a superior treatment for further study, or when power is not the most important objective of the experiment (for example, in Phase II or rare disease settings).
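One way both limitations can be addressed at once is with group (block) allocation: patients are recruited in blocks, randomization probabilities are refreshed from the posteriors only between blocks (so endpoints need only be available at block boundaries, not after every patient), and every arm keeps a positive assignment probability (so allocation stays genuinely randomized). The sketch below illustrates that general idea under those assumptions; it is not the procedure from the working paper.

```python
import random

def block_randomized_bandit(true_probs, n_blocks, block_size, seed=2):
    """Group-sequential bandit sketch: randomization probabilities are
    refreshed between blocks from posterior simulation, and smoothing
    keeps every arm's assignment probability strictly positive."""
    rng = random.Random(seed)
    k = len(true_probs)
    s, f = [0] * k, [0] * k
    allocations = [0] * k
    for _ in range(n_blocks):
        # Estimate P(arm is best) by posterior simulation, then smooth so
        # no arm's probability falls to zero (preserving randomization).
        wins = [0] * k
        for _ in range(500):
            draws = [rng.betavariate(s[a] + 1, f[a] + 1) for a in range(k)]
            wins[max(range(k), key=lambda a: draws[a])] += 1
        probs = [(w + 1) / (500 + k) for w in wins]
        # Assign the whole block with these fixed probabilities; outcomes
        # are only fed back into the posteriors at the next block boundary.
        for _ in range(block_size):
            u, cum, arm = rng.random(), 0.0, k - 1
            for a in range(k):
                cum += probs[a]
                if u < cum:
                    arm = a
                    break
            allocations[arm] += 1
            if rng.random() < true_probs[arm]:
                s[arm] += 1
            else:
                f[arm] += 1
    return allocations

# Hypothetical two-arm trial run in 10 blocks of 20 patients.
alloc = block_randomized_bandit([0.3, 0.5], n_blocks=10, block_size=20)
```

Because each block uses fixed, strictly positive probabilities, the design is randomized rather than deterministic, yet the between-block updates still steer allocation toward the better arm, giving up only a little of the fully sequential rule's optimality.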
So watch this space!
Gittins, J. C. and Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments. In Gani, J., Sarkadi, K., and Vincze, I., editors, Progress in Statistics (European Meeting of Statisticians, Budapest, 1972), pages 241-266. North-Holland, Amsterdam, The Netherlands.