Interview with Alind Gupta: Transparent Machine Learning in Oncology
Cytel is hosting a webinar on Transparent Machine Learning in Oncology on April 21, 2020. Our speaker, Alind Gupta, Machine Learning specialist, will provide insights on a particular transparent ML method called Bayesian networks, and how we have been using it for HEOR and other real-world applications in oncology trials. As the adoption of machine learning is on the rise, we speak to Alind about the differences between black-box models and transparent machine learning, and how the latter is becoming more important in clinical research today. Alind also speaks about the application of ML to real-world data and how it is going to evolve in the coming years.
Machine learning (ML) aims to discover patterns from data that can be used for prediction, but the use of “black-box” ML models in healthcare research and decision-making has been limited, due to clinical liability and lack of trust from stakeholders. FDA guidelines for ML-based devices mandate transparency to assure continued safety and effectiveness, and notable recent failures have prompted increasing ML research into bias, fairness and causality. This has ramifications for all therapeutic areas but particularly within oncology.
Cytel: Tell us about your role at Cytel.
Alind Gupta (AG): I work as a machine learning researcher. My role requires me to collaborate with clients as well as work independently on technologies. I focus mainly on graphical models, Bayesian inference, survival models (Markov models, patient simulation) and traditional statistics work.
User-experience analysis is also an important part of my job, as I need to understand how people respond to advanced concepts in statistics and machine learning, and how they can be made more accessible to our clients.
Cytel: How is Machine Learning important in clinical research today? How would you describe its rate of adoption?
AG: Firstly, machine learning is more than just prediction, unlike the common perception around it. Machine learning is broadly about deriving knowledge from data and learning how to make optimal decisions, and only then comes its usefulness for prediction.
As for its rate of adoption or its usage, every big pharma company today has a data science or machine learning division that is primarily working on real-world data. Many hospitals are partnering with machine learning companies to improve patient care. Also, regulatory agencies like the FDA are providing guidelines for machine learning practices and have already cleared a few machine learning software products.
There is increasing interest in machine learning and it is being used for a lot of different purposes. At Cytel, we are using real-world data for informing adaptive trials. So, machine learning can be used as a part of analyzing real-world data to inform clinical trial design and adaptive trials, for looking at things like eligibility criteria, endpoints and surrogates for planning a trial.
People are looking at genomics and proteomics data for matching, for example, drugs to patient profiles. One of the things of interest in precision oncology is identifying whether certain genetic variants or biomarkers are prognostic of response to therapy. I know of companies that are working on drug development too.
Cytel: What is a black-box machine learning model? How is it useful and what are its challenges?
AG: A black-box model takes some input and transforms it into an output in the form of a prediction, but it is difficult to understand exactly how it is working on the inside. Patterns learned by a black-box model can be too complicated for humans to understand, particularly for subject matter experts like clinicians. These kinds of models, for example deep neural networks, are widely used in machine learning. They are very flexible and have a high capacity for learning very complicated patterns in data. But it is very difficult to look on the inside and say, "This is exactly what this model is doing and this is how it is using the data to convert it into an output like a prediction."
It is also challenging to build trust in such a model because it is hard to understand its biases, its limitations and where it is likely to fail. Additionally, it is difficult to troubleshoot when a model fails. There is also limited regulatory approval. There are cases of FDA clearances that took seven years, for example, because they involved black-box models. This means it takes longer for regulatory approval and research and development.
Cytel: Can you explain the concept of transparent machine learning? How does it apply to real-world data?
AG: Transparent machine learning is simply building interpretable models from the ground up. So, these are models that are not black boxes: a clinician can investigate and reason about what the model is doing, when it might fail, and why it made a specific decision. You should be able to show it to a clinician and ask, "Does it make sense to you? Does it align with how you might make a decision about a particular problem?" Using transparent machine learning builds trust. It is more useful for communication and for vetting your model from different angles, by different people who are going to be associated with it. It is also generally accepted that if a transparent machine learning model performs as well in terms of prediction as a black-box model, there is no reason why you shouldn't be using the transparent model over a black-box model.
In my upcoming webinar, I will be speaking about a specific kind of transparent model that I work with, known as Bayesian networks. They have the additional advantage of being able to work with real-world data (which generally has a lot of missing values) without the need to impute those values. Additionally, they can work with the large number of variables typical of real-world data, without doing variable selection in advance. These kinds of models are also flexible enough to work with temporal data.
If we look at immuno-oncology in particular, one of the challenges is that there is a large amount of patient-level or individual-level heterogeneity. Using this kind of method allows you to understand that heterogeneity, and you can predict which patients will most likely respond, what potentially causes adverse events, what variables are prognostic of survival and so on.
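As a minimal sketch of the missing-data point above, the toy example below shows how a discrete Bayesian network can produce a prediction when a variable is unobserved, by summing over its possible values rather than imputing one. Everything in it is an assumption made for illustration: the three-node structure (Biomarker and Stage both influencing Response), the variable names and all probabilities are invented, and it does not represent the models or data used in Cytel's work.

```python
from itertools import product

# Hypothetical toy network: Biomarker -> Response <- Stage.
# All probabilities below are invented for illustration only.
P_BIOMARKER = {"high": 0.3, "low": 0.7}
P_STAGE = {"early": 0.4, "late": 0.6}
P_RESPONSE = {  # P(Response = yes | Biomarker, Stage)
    ("high", "early"): 0.8,
    ("high", "late"): 0.5,
    ("low", "early"): 0.4,
    ("low", "late"): 0.1,
}


def prob_response(biomarker=None, stage=None):
    """Return P(Response = yes | observed parents).

    A parent passed as None is treated as missing and is marginalized
    out (summed over, weighted by its prior) instead of being imputed.
    """
    biomarkers = [biomarker] if biomarker is not None else list(P_BIOMARKER)
    stages = [stage] if stage is not None else list(P_STAGE)
    numerator = 0.0
    denominator = 0.0
    for b, s in product(biomarkers, stages):
        weight = P_BIOMARKER[b] * P_STAGE[s]  # joint prior of the parents
        numerator += weight * P_RESPONSE[(b, s)]
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    print(prob_response(biomarker="high", stage="early"))  # both observed
    print(prob_response(biomarker="high"))                 # stage missing
    print(prob_response())                                 # nothing observed
```

Because every number the model uses sits in a small, readable table, a clinician could in principle inspect each entry and question it, which is the sense of transparency described above. Real networks have many more variables and are typically built with dedicated libraries rather than by hand.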
Cytel: Can you tell us about a project where we are supporting a client with transparent machine learning technology?
AG: We have an ongoing project supporting a clinical trial for advanced cancer. The client wanted to identify subgroups that are most likely to respond to immunotherapy. In this case, a clinical trial is better than real-world data because even though the sample size is small, there is a large diversity of variables that are measured. We also have randomized patient assignment, which real-world data lacks.
Our objective is to build a machine learning model on top of this clinical trial data set to predict multiple outcomes. We are looking at adverse events and overall survival in these patients over three years, using just baseline variables. We are also trying to predict, at baseline and given several patient characteristics, which patients will survive and which patients will develop adverse events. Our team is interested in predictive performance, looking at variables that are prognostic for survival and adverse events, and in identifying the subgroup of patients that are most likely to respond to treatment.
In addition to the clinical trial, we also used real-world data that we were able to get for external validation of our model. Currently, we are working on how we can predict long-term responders and a patient's survival and response over their lifetime, and not just three years. Our client is interested in using this as an engine for driving health economic evaluations for HTA submissions.
Cytel: How do you envision the future of machine learning in the life sciences industry?
AG: I think there is going to be wider usage of machine learning in all aspects of clinical research. There are many problems that we can potentially address with machine learning and there is no one-size-fits-all solution. So, I believe the onus is on machine learning to show that it can provide an added benefit to the things that are already being done, while simultaneously taking care of things like bias and safety. So, I envision broader incorporation that is at the same time slow, careful and systematic, with a lot of validation to make sure that the long-term efficacy and safety of these methods are proven before they are fully incorporated and adopted by people.
Join Cytel for a complimentary webinar on Transparent Machine Learning in Oncology. Alind Gupta will present our continuing work in immuno-oncology using Bayesian network models for predicting safety and survival outcomes, extrapolating from limited follow-up data and validating with external real-world data for key subgroups. We will also present ways to incorporate subject-matter expertise and causality, and address ways to enhance transparency and communication for stakeholders.
About Alind Gupta
Alind Gupta is a Machine Learning specialist at Cytel in Toronto, Canada, focusing on probabilistic graphical models and Bayesian inference. His current work focuses on the use of Bayesian networks and Markov models for modelling heterogeneity in response to cancer immunotherapy and for long-term survival prediction using clinical trial and real-world data. Alind has a PhD from the University of Toronto, where he studied the genetics of rare diseases.