Only in LogXact® 7
Firth’s Penalized Maximum Likelihood Method (PMLE)
LogXact 6, the previous version, offered two methods for estimating coefficients in unstratified logistic regression models:
1. Maximum Likelihood (ML), and
2. Exact (EX), based on conditioning out nuisance parameters.
Separation is said to occur when the data, visualized as points in covariate space, can be divided by a hyperplane that separates responses from non-responses. Separation is fairly common in small and medium-sized datasets. Paul Allison [1] gives an excellent explanation of both the phenomenon and the reasons it occurs so often in practice.
ML estimates of the coefficients are infinite for datasets that exhibit separation. This is undesirable because, in most applications, it is very unlikely that a covariate is a perfect predictor of the response.
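The divergence of the ML estimate under separation can be illustrated with a minimal sketch. The four-point dataset below is hypothetical (it is not taken from LogXact or from any study), and the model omits the intercept purely to keep the example short. Because every non-response has a negative covariate value and every response a positive one, the log-likelihood keeps increasing toward zero as the slope grows, so no finite maximizer exists.

```python
import math


def log_likelihood(beta, x, y):
    """Log-likelihood of the model logit(p) = beta * x (no intercept)."""
    total = 0.0
    for xi, yi in zip(x, y):
        # log P(Y = yi) = log sigmoid(z), with z = +beta*x for y = 1
        # and z = -beta*x for y = 0.
        z = beta * xi if yi == 1 else -beta * xi
        if z >= 0:
            total += -math.log1p(math.exp(-z))
        else:
            total += z - math.log1p(math.exp(z))
    return total


# Hypothetical, completely separated data: x < 0 always gives y = 0,
# x > 0 always gives y = 1.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

betas = (1.0, 5.0, 10.0, 20.0)
values = [log_likelihood(b, x, y) for b in betas]
for b, v in zip(betas, values):
    print(f"beta = {b:5.1f}   log-likelihood = {v:.3e}")
```

Each larger slope yields a strictly larger log-likelihood that never reaches its supremum of zero, which is what an infinite ML estimate means in practice.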
When there is separation, the EX procedure in LogXact computes the Median Unbiased Estimator (MUE) for the coefficient. These estimates are finite, but practitioners have remarked that they can be unreliable in certain situations.
To overcome these shortcomings, LogXact 7 provides an additional method for estimating coefficients in unstratified logistic regression:
3. Firth’s procedure using Penalized Likelihood (PMLE)
Firth’s method [2] estimates the coefficients by maximizing a penalized likelihood function, constructed by adding a penalty term to the log-likelihood. The penalty term equals the logarithm of the square root of the determinant of the information matrix. The method was originally developed to reduce bias in the ML estimate. Heinze and Schemper [3] have shown through extensive simulation that it also provides good estimates of logistic regression coefficients for datasets that exhibit separation.
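In symbols, the description above can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the LogXact documentation: $\ell(\beta)$ is the log-likelihood, $I(\beta)$ the Fisher information matrix, $X$ the design matrix, and $\pi_i$ the fitted probability for observation $i$.

$$
\ell^{*}(\beta) \;=\; \ell(\beta) \;+\; \tfrac{1}{2}\,\log\bigl|I(\beta)\bigr|,
\qquad
I(\beta) \;=\; X^{\top} W X,
\quad
W \;=\; \operatorname{diag}\bigl\{\pi_i\,(1-\pi_i)\bigr\}.
$$

For logistic regression, setting the derivative of $\ell^{*}$ to zero gives the modified score equations used in [2] and [3]:

$$
U^{*}(\beta_r) \;=\; \sum_{i=1}^{n} \Bigl[\, y_i - \pi_i + h_i\bigl(\tfrac{1}{2} - \pi_i\bigr) \Bigr]\, x_{ir} \;=\; 0,
$$

where $h_i$ is the $i$-th diagonal element of $H = W^{1/2} X \,(X^{\top} W X)^{-1} X^{\top} W^{1/2}$. Under separation the fitted probabilities approach 0 or 1, so $|I(\beta)|$ tends to zero and the penalty tends to $-\infty$; this is what keeps the maximizer finite.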
Example: Osteogenic Sarcoma
To illustrate the usefulness of the Firth procedure, let us look at a dataset from a 46-patient study of non-metastatic osteogenic sarcoma by Goorin, Perez-Atayde, Gebhardt, and Andersen. (Details are provided in the LogXact manual.) They were interested in determining the predictors of a three-year disease-free interval (DFI3). The covariates of interest were GENDER, any osteoid pathology (AOP), and lymphocytic infiltration (LYINF).
This dataset exhibits separation, so the maximum likelihood estimates for a logistic regression model with these covariates do not exist, as shown by the LogXact output given below. Notice that EX gave conditional maximum likelihood estimates (CMLEs) for GENDER and AOP, but the CMLE for LYINF does not exist, so an MUE was computed instead.

In LogXact 7, selecting the Firth procedure, we get the following output:

Notice that Firth’s method produces finite penalized ML estimates for all the coefficients. The point estimates are quite close to the CMLEs computed by EX for GENDER and AOP. There is an appreciable difference for the LYINF coefficient. Simulation experiments we have conducted suggest that the Firth estimate is likely to be a more reliable point estimate than the MUE.
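The finiteness of Firth estimates under separation can be reproduced with a minimal sketch. It is not LogXact's implementation and does not use the osteogenic sarcoma data: the four-point dataset is hypothetical, the function name `firth_logistic_1d` is invented for illustration, and the model is restricted to an intercept plus one covariate so that the 2x2 information matrix can be inverted by hand. The iteration is a Fisher-scoring update on the Firth-modified score, without the step-halving safeguards a production routine would need.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def firth_logistic_1d(x, y, max_iter=200, tol=1e-10):
    """Firth-penalized fit of logit(p) = b0 + b1 * x (hypothetical helper)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for design rows (1, x_i): [[a, b], [b, d]].
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        # Hat-matrix diagonals: h_i = w_i * (1, x_i) I^{-1} (1, x_i)'.
        h = [wi * (d - 2.0 * b * xi + a * xi * xi) / det
             for wi, xi in zip(w, x)]
        # Firth-modified residuals: y_i - p_i + h_i * (1/2 - p_i).
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Scoring step: I^{-1} applied to the modified score (u0, u1).
        s0 = (d * u0 - b * u1) / det
        s1 = (-b * u0 + a * u1) / det
        b0 += s0
        b1 += s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1


# Hypothetical, completely separated data (ordinary ML would diverge here).
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

b0, b1 = firth_logistic_1d(x, y)
print(f"intercept = {b0:.6f}, slope = {b1:.6f}")
```

On this toy dataset the slope settles at a finite positive value and, because the data are symmetric about zero, the intercept stays at zero.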
References:
1. Allison, P. (2004) Convergence Problems in Logistic Regression. Chapter 10 (pp. 238-252) in Numerical Issues in Statistical Computing for the Social Scientist, Altman, Gill, and McDonald (eds.), Wiley-Interscience.
2. Firth, D. (1993) Bias Reduction of Maximum Likelihood Estimates. Biometrika, 80, 27-38.
3. Heinze, G. and Schemper, M. (2002) A Solution to the Problem of Separation in Logistic Regression. Statistics in Medicine, 21, 2409-2419.