East® 4: Example 2

Case Study: Vesnarinone Trial Among Patients with Heart Failure
Vesnarinone is a drug that enhances contractility of the heart. Short-term placebo-controlled trials showed that it relieved symptoms and improved quality of life and prognosis in patients with a history of heart failure despite conventional therapy. However, the same studies raised concern over side effects and over a possible adverse effect on mortality with high-dose regimens. This prompted the trial described below, which was published in the New England Journal of Medicine on December 17, 1998, under the title “A dose-dependent increase in mortality with Vesnarinone among patients with severe heart failure”, authored by JN Cohn, SO Goldstein et al. A total of 189 centers in the United States and Canada accrued 3833 patients to the trial over a 15-month period between 1995 and 1996. Patients were randomized to conventional medication (placebo arm), conventional medication plus 30 mg of Vesnarinone per day, or conventional medication plus 60 mg of Vesnarinone per day. The primary outcome variable was mortality from all causes. This three-arm trial was planned to accrue 3618 patients, to be analyzed sequentially at 6-month intervals using the O’Brien and Fleming spending function, and to end once 232 deaths had occurred in the placebo group.

  1. The Fixed-Sample Design
  2. Group Sequential Designs
  3. Interim Monitoring of Vesnarinone Trial
  4. Early Termination of Vesnarinone Trial


1. The Fixed-Sample Design

The investigators expected a baseline mortality of 20% at one year in the placebo group, corresponding to a median survival time of 37.28 months. Vesnarinone was expected to bring the one-year mortality down to 14% (a 30% reduction) with either of the two dose levels considered, corresponding to a median survival time of 55.15 months. An overall significance level of 5% for a two-sided test was desired. A pair-wise comparison of each of the two active drug groups with the placebo group implied a Bonferroni adjustment to the alpha level for each two-sided comparison, which was then set to 2.5%. The study was required to have 90% power to detect the difference of interest. Based on the published data we may assume an average accrual rate of 260 patients per month. For the purpose of this example, and in line with the way the study was planned, we shall restrict our interest to only one pair-wise comparison, the placebo versus the 60 mg Vesnarinone arm. We may therefore assume an accrual rate of two-thirds of 260, that is, about 174 patients per month.
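The median survival times quoted above follow from the one-year mortality figures if survival is assumed to be exponential (constant hazard). The short Python sketch below reproduces them under that assumption; the function name is ours and is not part of East.

```python
import math

def median_from_one_year_mortality(mortality, months=12.0):
    """Median survival in months, assuming exponential survival (an assumption)."""
    hazard = -math.log(1.0 - mortality) / months  # constant monthly hazard
    return math.log(2.0) / hazard

median_placebo = median_from_one_year_mortality(0.20)
median_drug = median_from_one_year_mortality(0.14)
rate_two_arms = 260 * 2 / 3  # two of the three arms; the text rounds this to 174

print(f"placebo median: {median_placebo:.2f} months")
print(f"drug median:    {median_drug:.2f} months")
print(f"accrual rate:   {rate_two_arms:.1f} patients per month")
```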

 

In studies where time to event (death in our example) is the outcome of interest, it is important to find the right balance between patients enrolled and total study duration, since the power of the study is related to the number of events observed rather than the number of patients accrued. For the placebo-drug comparison of interest, and assuming no interim looks (fixed-sample study), East estimates that in order to achieve 90% power the trial should be stopped when a total of 334 events have been observed. But how long do we expect to wait in order to observe 334 events? That would depend on the number of patients accrued. The smaller the number of patients the longer it will be necessary to follow them up in order to observe the required number of events. East provides a graphical representation of the relationship between patient accrual and total study duration under the alternative hypothesis of a 30% reduction in baseline mortality.
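As a rough cross-check on the number of events, one can use the well-known Schoenfeld approximation for the log-rank test with equal allocation. This is a sketch of that approximation only, not of East's own calculation, which is not described here; it returns roughly 324 events, a little below the 334 reported by East.

```python
import math
from statistics import NormalDist

def schoenfeld_events(hazard_ratio, alpha_two_sided, power):
    """Approximate total events for a 1:1 log-rank comparison (Schoenfeld formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha_two_sided / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2

# Hazard ratio (drug vs placebo) implied by 14% vs 20% one-year mortality,
# assuming exponential survival in both arms.
hazard_ratio = math.log(1 - 0.14) / math.log(1 - 0.20)
events = schoenfeld_events(hazard_ratio, alpha_two_sided=0.025, power=0.90)

print(f"hazard ratio: {hazard_ratio:.3f}")
print(f"approximate events required: {math.ceil(events)}")
```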

 

For example, if 870 patients were accrued (435 per arm) the accrual period would last for 5 months and 29 more months of follow-up would be required (for a total study duration of 34 months) in order to observe the required 334 events. The investigators believed that they could accrue 2412 patients onto the study. The expected study duration (accrual plus follow-up) if these accrual goals are met is 16.7 months.
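The trade-off between accrual and study duration can be sketched numerically. The code below assumes exponential survival in both arms, uniform accrual at 174 patients per month, equal allocation and no dropout; these are our simplifying assumptions, and the function names are hypothetical. Under them, the time needed to expect 334 deaths lands close to the 34 and 16.7 months quoted above.

```python
import math

LAMBDA_C = -math.log(1 - 0.20) / 12  # monthly hazard, placebo (assumed exponential)
LAMBDA_T = -math.log(1 - 0.14) / 12  # monthly hazard, drug (assumed exponential)
ACCRUAL_RATE = 174.0                 # patients per month, from the text

def event_probability(hazard, accrual_months, study_months):
    """P(death observed by study end) with uniform accrual; needs study >= accrual."""
    a, t = accrual_months, study_months
    return 1 - (math.exp(-hazard * (t - a)) - math.exp(-hazard * t)) / (hazard * a)

def expected_events(n_patients, study_months):
    a = n_patients / ACCRUAL_RATE
    per_arm = n_patients / 2
    return per_arm * (event_probability(LAMBDA_C, a, study_months)
                      + event_probability(LAMBDA_T, a, study_months))

def study_duration(n_patients, target_events, upper=600.0):
    """Bisection for the study length at which expected events reach the target."""
    lo, hi = n_patients / ACCRUAL_RATE, upper
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_events(n_patients, mid) < target_events:
            lo = mid
        else:
            hi = mid
    return hi

for n in (870, 2412):
    print(f"{n} patients: about {study_duration(n, 334):.1f} months")
```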

 

2. Group Sequential Designs

In traditional fixed-sample studies the data are analyzed only once, when the target number of events has been reached. For a two-sided test at the 2.5% level, the null hypothesis of no difference would be rejected if the absolute value of the log-rank test statistic exceeded 2.24. Group sequential designs offer the possibility of analyzing the data more than once, so that the trial can be stopped as soon as enough evidence has accumulated. The penalty for this added flexibility is a more stringent criterion of significance as well as a potentially larger final number of events than would be required for a fixed-sample study, should the trial fail to reject the null hypothesis at all interim analyses. The Vesnarinone trial was intended to be monitored using stopping boundaries based on the O’Brien and Fleming error spending function to maintain an alpha level of 2.5% for each of the two placebo-drug comparisons.

 

2.1 The O'Brien and Fleming Stopping Boundaries

The graph below displays the boundaries to be used at each of five hypothetical equally spaced looks obtained in the spirit of the O’Brien and Fleming error spending function:

 

The x-axis indicates the number of events (deaths) at which each analysis is to be performed. At each look the test statistic (log-rank) is compared against the corresponding boundary values, and the trial is stopped the first time the test statistic crosses either stopping boundary. In particular, if it is larger than the upper boundary, the study is stopped and the new drug is considered superior to the placebo. If it is smaller than the lower boundary, the placebo is considered superior to the new drug. As the funnel shape of the boundaries indicates, rather large values of the test statistic are initially required for the trial to be stopped, but the strategy suggested by O’Brien and Fleming makes it easier to stop the trial as more evidence accumulates. If no early stopping occurs, the last look will be performed with 340 deaths. Notice, however, that to allow for multiple analyses, the absolute value of the boundary at each look is larger than the fixed-sample threshold for significance, namely ±2.24: in particular, at the last look, the boundary value is ±2.30.

 
2.2 The Pocock Stopping Boundaries

The figure below shows analogous boundaries generated in the spirit of the error spending function proposed by Pocock.

 
The boundaries form a horizontal threshold at ±2.67, larger than ±2.24 but much lower, at the initial looks, than the boundaries we have derived in the spirit of the O'Brien and Fleming use function. This translates into a maximum of 395 deaths, more than the 340 obtained for the previous scenario.
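The contrast between the two boundary shapes can be illustrated with the Lan-DeMets spending functions commonly used to approximate O'Brien-Fleming and Pocock boundaries. The sketch below assumes these standard forms with a two-sided alpha of 0.025; whether East uses exactly these expressions is an assumption on our part. Only the first-look boundary is computed, since later boundaries require numerical integration that is not shown here.

```python
import math
from statistics import NormalDist

N = NormalDist()
ALPHA = 0.025  # two-sided alpha for one placebo-drug comparison

def obf_spending(t, alpha=ALPHA):
    """Lan-DeMets O'Brien-Fleming-type cumulative alpha spent at information fraction t."""
    return 2 * (1 - N.cdf(N.inv_cdf(1 - alpha / 2) / math.sqrt(t)))

def pocock_spending(t, alpha=ALPHA):
    """Lan-DeMets Pocock-type cumulative alpha spent at information fraction t."""
    return alpha * math.log(1 + (math.e - 1) * t)

looks = [k / 5 for k in range(1, 6)]  # five equally spaced looks
for t in looks:
    print(f"t={t:.1f}  OBF-type spent={obf_spending(t):.6f}  "
          f"Pocock-type spent={pocock_spending(t):.6f}")

# At the first look the two-sided boundary follows directly from the alpha spent.
first_obf = N.inv_cdf(1 - obf_spending(looks[0]) / 2)
first_pocock = N.inv_cdf(1 - pocock_spending(looks[0]) / 2)
print(f"first-look boundary: OBF-type {first_obf:.2f}, Pocock-type {first_pocock:.2f}")
```

The O'Brien-Fleming-type function spends almost no alpha at the first look, which is why its early boundaries are so wide, whereas the Pocock-type function spends alpha much more evenly.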
 
2.3 Comparing the Three Designs in East

The interactive design window in East makes it very easy to compare side by side the characteristics of the three designs considered: the fixed-sample study (labeled Plan1), the 5-look study based on the O’Brien and Fleming use function (Plan2) and the 5-look study based on the Pocock use function (Plan3). For each design, once the design parameters have been entered in the interactive worksheet, East gives the accrual range either in terms of study duration or in terms of total number of patients. We entered 2412 as the committed accrual but, as described earlier, other values could have been chosen depending on the preferred balance between study duration and subjects to be accrued in order to reach the required number of events.

  
 

With the 1-look design, the study will last 16.7 months under the alternative hypothesis that Vesnarinone prolongs survival. Should the null hypothesis be true, the target number of events will be reached sooner and the expected study duration will be 15.1 months. Since the five-look designs allow early stopping, their expected study durations under the alternative are smaller than for the 1-look design, but both will require a larger number of events than the fixed-sample study should the early stopping boundaries never be crossed. Notice the impact of the shape of the boundaries on the expected accrual, study duration and number of events at termination under both the null and the alternative for the two five-look designs. This different behavior of the boundaries may be further explored in terms of the probability of stopping at each of the 5 looks, which East provides at the click of a button:

 
 

The table above displays these exit probabilities for the five-look O’Brien-Fleming design. Among the strengths of East is its ability to present side by side detailed information on the relative merits of competing design options, thus allowing a choice that best suits the needs of the investigation.

 
2.4 Simulating the Selected Design

Any study designed with East can be simulated under any choice of treatment differences. For clinical trials with survival endpoints it is convenient to express the treatment difference as the negative of the log hazard ratio of treatment to control. Thus, in the present case, the magnitude of the treatment difference under the alternative hypothesis is -log(lambda(T)/lambda(C)) = log(median(T)/median(C)) = log(55.15/37.28) = 0.39. If 0.39 is indeed the true treatment difference the study will have 90% power to reject the null hypothesis that the treatment difference is zero. But what if we have over-estimated the true treatment difference and in fact the negative of the log hazard ratio is only 0.3? We can simulate the 5-look O’Brien-Fleming design under this assumption. The results of 1000 such simulations are displayed below.

 
 

These simulations show that the null hypothesis of no treatment difference is rejected 683 times in 1000 simulations. Thus the power of the study would be only 68.3% if the negative of the log hazard ratio were 0.3 instead of 0.39.
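The loss of power can be checked roughly without simulation. The sketch below uses a fixed-sample normal approximation to the log-rank test (variance of the log hazard ratio taken as 4 divided by the number of events, 334 events, two-sided alpha of 0.025) and ignores the interim looks, so it is not a reproduction of East's simulation; the function name is ours. It gives a power near 0.69 for a treatment difference of 0.3, in line with the simulated 68.3%.

```python
import math
from statistics import NormalDist

def approx_power(delta, events, alpha_two_sided=0.025):
    """Fixed-sample normal approximation to log-rank power (interim looks ignored)."""
    z_crit = NormalDist().inv_cdf(1 - alpha_two_sided / 2)
    drift = abs(delta) * math.sqrt(events) / 2  # expected Z; Var(log HR) ~ 4 / events
    return NormalDist().cdf(drift - z_crit)

delta_design = math.log(55.15 / 37.28)  # negative log hazard ratio under the alternative
power_design = approx_power(delta_design, 334)
power_smaller = approx_power(0.30, 334)

print(f"design difference: {delta_design:.2f}")
print(f"approximate power at the design difference: {power_design:.2f}")
print(f"approximate power at a difference of 0.30:  {power_smaller:.2f}")
```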

 

3. Interim Monitoring of Vesnarinone Trial

Although we have designed the study assuming five equally spaced looks, any schedule of analyses can be adopted when actually monitoring the study. The actual trial was stopped at the ninth analysis, performed at the 19th month of study, with a value of the test statistic of -2.326 (two-sided p-value, p=0.02); thus a result in favor of placebo. The published report provides the dates of the interim evaluations, performed 5, 8, 12, 14, 15, 16, 17 and 18 months after the start of the trial, but does not specify either the number of events available at each of them or the value of the observed test statistic. For the purpose of this example we shall assume that the nine analyses were performed with 25, 50, 105, 160, 195, 227, 240, 284 and 335 deaths respectively and that the corresponding values of the test statistic (delta/SE) were 1.2 (0.132/0.11), 1.3 (0.156/0.12), -0.6 (-0.12/0.20), -0.98 (-0.196/0.20), -0.5 (-0.095/0.19), -1.3 (-0.234/0.18), -1.7 (-0.289/0.17), -2.15 (-0.344/0.16) and -2.35 (-0.329/0.14). We may enter these values up to the fourth look into the interim monitoring worksheet corresponding to the Plan2 design to obtain the screen displayed below:
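The hypothetical monitoring data above can be tabulated with a few lines of Python. The sketch recomputes each test statistic as delta/SE, the information fraction relative to the 340 deaths planned for Plan2, and a nominal (unadjusted) two-sided p-value; the variable names are ours.

```python
from statistics import NormalDist

# Assumed interim data from the text (not from the published report).
deaths = [25, 50, 105, 160, 195, 227, 240, 284, 335]
delta = [0.132, 0.156, -0.12, -0.196, -0.095, -0.234, -0.289, -0.344, -0.329]
se = [0.11, 0.12, 0.20, 0.20, 0.19, 0.18, 0.17, 0.16, 0.14]
MAX_DEATHS = 340  # maximum number of deaths under the Plan2 design

rows = []
for look, (d, est, s) in enumerate(zip(deaths, delta, se), start=1):
    z = est / s                                    # test statistic delta / SE
    info = d / MAX_DEATHS                          # information fraction
    p_nominal = 2 * (1 - NormalDist().cdf(abs(z))) # unadjusted two-sided p-value
    rows.append((look, d, info, z, p_nominal))
    print(f"look {look}: deaths={d:3d}  info={info:6.1%}  z={z:+.2f}  nominal p={p_nominal:.3f}")
```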

 
 

The boundaries above were computed for the hypothesized analyses using the Lan and DeMets alpha-spending function methodology. A graphical display of these boundaries and of the path of the test statistic can be obtained from East and is displayed below:

 
The stopping boundary has not yet been crossed, and a total of 160 deaths have been observed, corresponding to 47.1% of the projected maximum number of deaths. By invoking the alpha-spending function plot we can see how much of the type I error probability has been spent thus far: placing the vertical cursor close to 47.1% of information shows that we have spent 0.00056 alpha. The interim monitoring spreadsheet also tells us that, given the adopted monitoring strategy, we can achieve full power at the next look if it is taken with 334 events. This horizon is revised after every analysis and provides a measure of the penalty or reward associated with the actual monitoring strategy as opposed to the five equally spaced looks assumed at design. From the conditional power graph displayed in the next column (top right) we may also see the probability of rejecting the null hypothesis under a variety of alternative hypotheses. The graph is generated assuming that we shall perform the next (and last) analysis with the current revised horizon of events required to achieve full power (i.e., 334).

 

 

The conditional power chart reveals that the trial has a very small chance of being stopped declaring superiority of the new treatment, even if the true -log(lambda(T)/lambda(C)) were indeed around the hypothesized value of 0.39. On the other hand, the probability of declaring the placebo arm superior to Vesnarinone looks higher given the accumulated evidence.
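A conditional power figure of this kind can be approximated with the standard Brownian-motion (B-value) formula. The sketch below assumes a single further analysis at 334 events, a final critical value of 2.24 (our assumption, since little alpha has been spent so far) and the fourth-look statistic of -0.98 at 160 deaths. It is not East's calculation, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def conditional_power_upper(z_now, info_frac, delta, max_events, final_crit):
    """P(final Z > final_crit | current Z), assuming one remaining (final) analysis."""
    drift = delta * math.sqrt(max_events) / 2   # expected Z at full information
    b_now = z_now * math.sqrt(info_frac)        # B-value at the current look
    mean_rest = drift * (1 - info_frac)         # expected remaining increment
    sd_rest = math.sqrt(1 - info_frac)          # SD of the remaining increment
    return 1 - NormalDist().cdf((final_crit - b_now - mean_rest) / sd_rest)

info = 160 / 334
# Chance of ending in favor of the drug if the true difference is 0.39.
cp_drug = conditional_power_upper(-0.98, info, 0.39, 334, final_crit=2.24)
# Chance of ending in favor of placebo if the true difference equals the current
# estimate of -0.196, obtained by symmetry (flip the signs of z and delta).
cp_placebo = conditional_power_upper(0.98, info, 0.196, 334, final_crit=2.24)

print(f"conditional power for the drug (difference 0.39):       {cp_drug:.2f}")
print(f"conditional power for placebo (difference at estimate): {cp_placebo:.2f}")
```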

 

4. Early Termination of Vesnarinone Trial

Suppose we now perform the five additional analyses. At the ninth look the lower boundary is -2.34, and it is crossed since the value of the test statistic is -2.35. The trial is therefore stopped and the null hypothesis is rejected in favor of the alternative that the active drug is inferior to the placebo. East then computes the adjusted inference for the log hazard ratio, allowing for the nine looks at the data. The results are shown below:

 
 

The adjusted p-value is 0.023, confirming that the difference between placebo and Vesnarinone 60 mg, measured in terms of the negative log hazard ratio, is significant and in favor of placebo. This is reflected in the negative signs of the limits of the 95% confidence interval and of the median unbiased estimate.

 
