East® 4: Example 2
Case Study: Vesnarinone Trial Among Patients with Heart Failure
Vesnarinone is a drug that enhances contractility of
the heart. Short-term placebo-controlled trials showed that
it relieved symptoms and improved quality of life and
prognosis in patients with a history of heart failure
despite conventional therapy. However, the same studies
raised concern over the occurrence of side effects as well
as a possible adverse effect on mortality with high-dose
regimens. This prompted the trial described below, which
was published in the New England Journal of Medicine
on December 17, 1998, under
the title “A dose-dependent increase in mortality with
Vesnarinone among patients with severe heart failure”,
authored by JN Cohn, SO Goldstein et al. A total of 189
centers in the United States and Canada accrued 3833
patients to the trial over a 15-month period between
1995 and 1996. Patients were randomized to
conventional medication (placebo arm), conventional
medication plus 30 mg of Vesnarinone per day, or
conventional medication plus 60 mg of Vesnarinone per
day. The primary outcome variable was mortality from
all causes. This three-arm trial was planned to accrue
3618 patients, to be analyzed sequentially at 6-month
intervals using the O’Brien and Fleming spending
function, and to end once 232 deaths had
occurred in the placebo group.
- The Fixed-Sample Design
- Group-Sequential Designs
- Interim Monitoring of Vesnarinone Trial
- Early Termination of Vesnarinone Trial
1. The Fixed-Sample Design
The investigators expected a baseline mortality of
20% at one year in the placebo group, corresponding to a
median survival time of 37.28 months. Vesnarinone was
expected to bring the one-year mortality down to 14% (a
30% reduction) with either of the two dose levels considered,
corresponding to a median survival time of 55.15
months. An overall significance level of 5% on a two-sided
test was desired. Because each of the two active drug
groups was to be compared pair-wise with the placebo
group, a Bonferroni adjustment was applied and the alpha
level for each two-sided comparison was set to 2.5%.
The study was required to have 90% power to detect the
difference of interest. Based on the published data we
may assume an average accrual rate of 260 patients per
month. For the purpose of this example, and in line with
the way the study was planned, we shall restrict our
interest to only one pair-wise comparison, the placebo
versus the 60 mg Vesnarinone arm. We may therefore
assume an accrual rate of two-thirds of 260, that is, about
174 patients per month.
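
The conversion from one-year mortality to median survival can be checked with a short calculation. The sketch below assumes exponential survival (constant hazard), which is consistent with the medians quoted above; the function and variable names are illustrative and are not part of East.

```python
import math

def median_from_mortality(mortality, months=12.0):
    """Median survival in months, assuming exponential survival,
    given the proportion of patients who die within `months`."""
    hazard = -math.log(1.0 - mortality) / months  # constant monthly hazard
    return math.log(2.0) / hazard

median_placebo = median_from_mortality(0.20)  # 20% one-year mortality
median_drug = median_from_mortality(0.14)     # 14% one-year mortality

# Treatment difference expressed as the negative log hazard ratio,
# which equals log(median on drug / median on placebo) under this model.
treatment_difference = math.log(median_drug / median_placebo)

# Two of the three arms contribute to the placebo vs 60 mg comparison.
accrual_rate = 260 * 2 / 3

print(round(median_placebo, 2), round(median_drug, 2))
print(round(treatment_difference, 2), round(accrual_rate, 1))
```

Under these assumptions the medians come out at about 37.28 and 55.15 months, and the negative log hazard ratio at about 0.39.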
In studies where time to event (death in our example)
is the outcome of interest, it is important to find the
right balance between patients enrolled and total study
duration, since the power of the study is related to the
number of events observed rather than the number of
patients accrued. For the placebo-drug comparison of
interest, and assuming no interim looks (fixed-sample
study), East estimates that in order to achieve 90% power
the trial should be stopped when a total of 334 events
have been observed. But how long do we expect to wait
in order to observe 334 events? That would depend on
the number of patients accrued. The smaller the number of
patients, the longer it will be necessary to follow them
up in order to observe the required number of events.
East provides a graphical representation
of the relationship between patient accrual and total
study duration under the alternative hypothesis of a 30%
reduction in baseline mortality.
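
As a rough cross-check on the required number of events, the sketch below uses Schoenfeld's approximation for the log-rank test. This is a textbook formula, not necessarily the calculation East performs, so it should only be expected to land in the neighborhood of the 334 events quoted above (it comes out somewhat lower). The function name is illustrative.

```python
import math
from statistics import NormalDist

def schoenfeld_events(hazard_ratio, alpha_two_sided, power):
    """Approximate total events for a 1:1 log-rank comparison
    (Schoenfeld's formula); an assumption, not East's algorithm."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha_two_sided / 2)
    z_beta = z.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2

# Hazard ratio implied by 14% vs 20% one-year mortality under
# exponential survival.
hazard_ratio = math.log(1 - 0.14) / math.log(1 - 0.20)

events = schoenfeld_events(hazard_ratio, alpha_two_sided=0.025, power=0.90)
print(math.ceil(events))
```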
For example, if 870 patients were accrued (435 per
arm) the accrual period would last for 5 months and 29
more months of follow-up would be required (for a total
study duration of 34 months) in order to observe the
required 334 events. The investigators believed that they
could accrue 2412 patients onto the study. The expected
study duration (accrual plus follow-up) if these accrual
goals are met is 16.7 months.
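
The trade-off between accrual and study duration can be illustrated with a small calculation. The sketch below assumes uniform accrual at 174 patients per month, exponential survival in both arms under the alternative hypothesis, equal allocation, and no dropout; it also assumes the event target is reached only after accrual has ended. These are simplifying assumptions made for illustration, and East's own computation may differ in detail.

```python
import math

ACCRUAL_RATE = 174.0                   # patients per month, both arms combined
HAZARD_PLACEBO = -math.log(0.80) / 12  # 20% one-year mortality
HAZARD_DRUG = -math.log(0.86) / 12     # 14% one-year mortality

def prob_event(hazard, accrual_months, t):
    """P(death by calendar month t) for a patient who enters uniformly
    during the accrual period; valid for t >= accrual_months."""
    a = accrual_months
    return 1 - (math.exp(-hazard * (t - a)) - math.exp(-hazard * t)) / (hazard * a)

def expected_events(n_patients, t):
    """Expected deaths in both arms combined by calendar month t."""
    a = n_patients / ACCRUAL_RATE
    per_arm = n_patients / 2
    return per_arm * (prob_event(HAZARD_PLACEBO, a, t)
                      + prob_event(HAZARD_DRUG, a, t))

def study_duration(n_patients, target_events=334):
    """Calendar month at which the expected deaths reach the target,
    found by bisection between end of accrual and 600 months."""
    lo, hi = n_patients / ACCRUAL_RATE, 600.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_events(n_patients, mid) < target_events:
            lo = mid
        else:
            hi = mid
    return hi

for n in (870, 2412):
    print(n, round(n / ACCRUAL_RATE, 1), round(study_duration(n), 1))
```

Under these assumptions the two durations come out close to the 34 and 16.7 months quoted above.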
2. Group Sequential Designs
In traditional fixed-sample studies the data are
analyzed only once, when the target number of events
has been reached. For a two-sided test at the 2.5% level, the
null hypothesis would be rejected if the
absolute value of the log-rank test statistic exceeded 2.24. Group
sequential designs offer the possibility of analyzing the data
more than once, so that the trial can be stopped as soon as
enough evidence has accumulated. The penalty for this
added flexibility is a more stringent criterion of significance,
as well as a potentially larger final number of
events than a fixed-sample study would require,
should the trial fail to reject the null hypothesis at all
interim analyses. The Vesnarinone trial was intended to
be monitored using stopping boundaries based on the
O’Brien and Fleming error spending function to maintain
an alpha level of 2.5% for each of the two placebo-drug
comparisons.
2.1 The O'Brien and Fleming Stopping Boundaries
The graph below displays the boundaries to be used at
each of five hypothetical equally spaced looks obtained in
the spirit of the O’Brien and Fleming error spending
function:
The x-axis indicates the number of events (deaths) at which each analysis is to be performed. At each look the test statistic (log-rank) is compared against the corresponding boundary values,
and the trial is stopped the first time the test statistic
crosses either of the stopping boundaries. In particular,
if it is larger than the upper boundary then the study is
stopped and the new drug is considered superior
to the placebo. If it is smaller than the lower boundary
then the placebo is considered superior to the new
drug. As the funnel shape of the boundaries indicates,
rather large values of the test statistic are initially required
for the trial to be stopped, but the strategy suggested
by O’Brien and Fleming makes it easier to stop the
trial as more evidence accumulates. If no early stopping
occurs then the last look will be performed with 340
deaths. Notice, however, that to allow for multiple
analyses, at each look the absolute value of the boundary
is larger than the fixed-sample study threshold for
significance, namely ±2.24: in particular, at the last look,
the boundary value is ±2.30.
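
The funnel shape and the stopping rule can be sketched in a few lines. The example below uses the classical O'Brien-Fleming shape, in which the boundary at look k of K is proportional to sqrt(K/k), and simply takes the final value of 2.30 from the text as given. East derives its boundaries from an error spending function, so its values at the early looks will differ somewhat; the path of test statistics at the end is hypothetical.

```python
import math

def obrien_fleming_shape(final_boundary, n_looks):
    """Classical O'Brien-Fleming shape for equally spaced looks:
    boundary at look k is final_boundary * sqrt(n_looks / k)."""
    return [final_boundary * math.sqrt(n_looks / k)
            for k in range(1, n_looks + 1)]

def first_crossing(z_values, boundaries):
    """Apply the two-sided stopping rule look by look and report the
    first look at which either boundary is crossed."""
    for look, (z, c) in enumerate(zip(z_values, boundaries), start=1):
        if z >= c:
            return look, "drug superior"
        if z <= -c:
            return look, "placebo superior"
    return None, "no boundary crossed"

boundaries = obrien_fleming_shape(2.30, 5)
print([round(c, 2) for c in boundaries])

# Hypothetical path of log-rank statistics, for illustration only.
print(first_crossing([0.5, 1.1, 3.1, 0.0, 0.0], boundaries))
```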
2.2 The Pocock Stopping Boundaries
The figure below shows analogous boundaries generated
in the spirit of the error spending function proposed
by Pocock.
The boundaries describe a horizontal threshold of value ±2.67: larger than
±2.24 but, at the initial looks, much lower than the boundaries we have derived in the spirit of the O'Brien and Fleming use function. This translates into a maximum of 395 deaths, more than the 340 obtained for the previous scenario.
2.3 Comparing the Three Designs in East
The interactive design window in East makes it very
easy to compare side by side the characteristics of the
three designs considered: the fixed-sample study (labeled
Plan1), the 5-look study based on the O’Brien and
Fleming use function (Plan2) and the 5-look study based
on the Pocock use function (Plan3). For each design, once
the design parameters have been entered in the interactive
worksheet, East gives the accrual range either in
terms of study duration or in terms of total number of
patients. We entered 2412 as the committed accrual but,
as described earlier, other values could have been chosen
depending on the preferred balance between study
duration and subjects to be accrued in order to reach
the required number of events.
With the 1-look design, the study will last 16.7 months
under the alternative hypothesis that Vesnarinone
prolongs survival. Should the null hypothesis be true,
the target number of events will be reached sooner and
the expected study duration will be 15.1 months. Since
the five-look designs allow early stopping, their expected
study durations under the alternative are shorter than for
the 1-look design, but both will require a larger number
of events than the fixed-sample study should the early
stopping boundaries never be crossed. Notice the impact
of the shape of the boundaries on the expected accrual,
study duration and number of events at termination
under both the null and the alternative for the two five-look
designs.
This different behavior of the boundaries may be
further explored in terms of the probability of stopping
at each of the 5 looks, which East provides at the click of
a button:
The table above displays these exit probabilities for the five-look O'Brien-Fleming design. Among the strengths of East is its ability to present, side by side, detailed information on the relative merits of competing design options, thus allowing a choice that best suits the needs of the investigation.
2.4 Simulating the Selected Design
Any study designed with East can be simulated under
any choice of treatment differences. For clinical trials
with survival endpoints it is convenient to express the
treatment difference as the negative of the log hazard
ratio of treatment to control. Thus, in the present case,
the magnitude of the treatment difference under the
alternative hypothesis is -log(lambda(T)/lambda(C)) =
log(median(T)/median(C)) = log(55.15/37.28) = 0.39. If
0.39 is indeed the true treatment difference the study will
have 90% power to reject the null hypothesis that the treatment
difference is zero. But what if we have over-estimated
the true treatment difference and in fact the
negative of the log hazard ratio is only 0.3? We can
simulate the 5-look O’Brien-Fleming design under this
assumption. The results of 1000 such simulations are
displayed below.
These simulations show that the null hypothesis of no
treatment difference is rejected 683 times in 1000 simulations.
The power of the study would therefore
be only 68.3% if the negative of the log hazard ratio were
0.3 instead of 0.39.
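
A rough analytic cross-check on the simulated power is possible with the usual normal approximation to the log-rank statistic, whose mean is approximately the treatment difference times sqrt(events/4). The sketch below treats the trial as a single fixed-sample analysis at 334 events with a two-sided 2.5% test, so it ignores the interim looks and is not a replacement for the East simulation; the function name is illustrative.

```python
import math
from statistics import NormalDist

def approx_power(treatment_difference, events, alpha_two_sided=0.025):
    """Approximate power of a single log-rank analysis with equal
    allocation; the negligible opposite-tail rejection is ignored."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha_two_sided / 2)
    drift = treatment_difference * math.sqrt(events / 4)
    return z.cdf(drift - critical)

for delta in (0.39, 0.30):
    print(delta, round(approx_power(delta, events=334), 3))
```

Under this approximation the power is roughly 90% at a treatment difference of 0.39 and falls to somewhere near 70% at 0.3, in line with the 68.3% obtained by simulation.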
3. Interim Monitoring of Vesnarinone Trial
Although we have designed the study assuming five
equally spaced looks, any schedule of analyses can be
adopted when actually monitoring the study.
The actual trial was stopped at the ninth analysis,
performed at the 19th month of the study, with a value of the
test statistic of -2.326 (two-sided p-value, p=0.02), a
result in favor of placebo. The published report provides
the dates of the earlier interim evaluations, performed
after 5, 8, 12, 14, 15, 16, 17 and 18 months from the
start of the trial, but does not specify either the number
of events available at each of them or the value of the
observed test statistic. For the purpose of this example
we shall assume that the nine analyses were performed with 25,
50, 105, 160, 195, 227, 240, 284 and 335 deaths respectively
and that the corresponding values of the test
statistic (delta/SE) were 1.2 (0.132/0.11), 1.3 (0.156/0.12), -0.6 (-0.12/0.20), -0.98 (-0.196/0.20), -0.5 (-0.095/0.19), -1.3 (-0.234/0.18), -1.7 (-0.289/0.17), -2.15 (-0.344/0.16) and -2.35 (-0.329/0.14). We may enter these values up to the fourth
look into the interim monitoring worksheet corresponding
to the Plan2 design to obtain the screen displayed
below:
The boundaries above have been computed corresponding
to the hypothesized analyses by means of the
Lan and DeMets alpha-spending function methodology.
A graphical display of such boundaries and of the path of
the test statistic can be obtained from East and is displayed
below:
The stopping boundary has not been crossed yet and
a total of 160 deaths have been observed,
corresponding to 47.1% of the projected maximum
number of deaths. By invoking the alpha-spending
function plot we can see how much of the type I error
probability has been spent thus far. The interim monitoring
spreadsheet also tells us that, given the adopted
monitoring strategy, we can achieve full power at the
next look if it is taken with 334 events. This horizon is
revised after every analysis and provides a measure of the
penalty or reward associated with the actual monitoring
strategy as opposed to the 5 equally spaced looks assumed
at design.
By placing the vertical cursor close to 47.1% of information
we can see that we have spent 0.00056 alpha. From the
conditional power graph displayed in the next column
(top right) we may also see what is the probability of
rejecting the null hypothesis under a variety of alternative
hypotheses. The graph is generated assuming that we
shall perform the next (and last) analysis with the
current revised
horizon of events required to achieve full power (i.e., 355).
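
The information fraction and the amount of alpha spent can be approximated outside East. The sketch below uses the Lan-DeMets O'Brien-Fleming-type spending function, alpha(t) = 2 - 2*Phi(z_{alpha/2} / sqrt(t)), with a total two-sided alpha of 2.5%. Reading the 0.00056 quoted above as the amount spent on one boundary is an assumption made here, not something stated in the text; the function name is illustrative.

```python
import math
from statistics import NormalDist

def obf_type_alpha_spent(t, alpha_two_sided):
    """Cumulative two-sided type I error spent at information
    fraction t (0 < t <= 1), Lan-DeMets O'Brien-Fleming-type function."""
    z = NormalDist()
    z_half = z.inv_cdf(1 - alpha_two_sided / 2)
    return 2 - 2 * z.cdf(z_half / math.sqrt(t))

# 160 deaths out of the projected maximum of 340.
information_fraction = 160 / 340

spent_two_sided = obf_type_alpha_spent(information_fraction, 0.025)
spent_per_boundary = spent_two_sided / 2  # assumed symmetric split

print(round(information_fraction, 3))
print(spent_two_sided, spent_per_boundary)
```

The per-boundary amount is of the same order as the 0.00056 read off the spending function plot; the exact figure depends on where the cursor is placed.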
The conditional power chart reveals that the trial has a very small chance of being stopped
with a declaration of superiority for the new treatment, even if the true -log(lambda(T)/lambda(C)) were indeed around the hypothesized value of
0.39. On the other hand, the probability of declaring the placebo arm superior to Vesnarinone looks higher given the accumulated evidence.
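
Conditional power of this kind can be approximated with the standard Brownian-motion (B-value) argument. The sketch below assumes a single remaining analysis at the design maximum of 340 deaths and a final critical value of 2.30, both taken from the design section rather than from the revised horizon East uses for its chart, so the numbers are indicative only. The function name and its defaults are illustrative.

```python
import math
from statistics import NormalDist

def conditional_power(z_now, t, delta, max_events=340, final_boundary=2.30):
    """Probability of crossing the upper / lower final boundary given
    the current statistic z_now at information fraction t, if the true
    treatment difference (negative log hazard ratio) is delta."""
    z = NormalDist()
    drift = delta * math.sqrt(max_events / 4)  # mean of Z at full information
    mean_final = z_now * math.sqrt(t) + drift * (1 - t)
    sd = math.sqrt(1 - t)
    upper = 1 - z.cdf((final_boundary - mean_final) / sd)
    lower = z.cdf((-final_boundary - mean_final) / sd)
    return upper, lower

t = 160 / 340  # fourth look
# Under the hypothesized difference of 0.39.
print(conditional_power(-0.98, t, delta=0.39))
# Under the difference estimated at the fourth look (-0.196).
print(conditional_power(-0.98, t, delta=-0.196))
```

Under the hypothesized difference the chance of declaring the drug superior is small, while under the current estimate the chance of declaring the placebo superior is clearly the larger of the two.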
4. Early Termination
Suppose we now performed the additional 5 analyses.
At the ninth look the lower boundary is -2.34 and it is
crossed, since the value of the test statistic is -2.35. The
trial is therefore stopped and the null hypothesis is rejected
in favor of the alternative that the active drug is
inferior to the placebo. East would then compute the
adjusted inference for the log hazard ratio, allowing for
the nine interim looks at the data. The results are shown
below:
The adjusted p-value is 0.023, confirming that the
difference between placebo and Vesnarinone 60mg, when
measured in terms of the negative log hazard ratio, is
significant and in favor of placebo as expressed by the
negative signs of the limits of the 95% confidence interval
and by the negative sign of the median unbiased estimator.