
Opening the Black Box: Moving to Explainable AI


Nowadays, it’s difficult to pick up a mainstream newspaper or read an industry publication without seeing a reference to Artificial Intelligence (AI) and progress towards innovations such as autonomous vehicles or customer behavior prediction. For the biopharma industry specifically, AI represents an opportunity to avert the R&D productivity crisis with paradigm-shifting applications such as in-silico drug design, prediction of trial risks and big data analytics.
However, every opportunity brings risks and challenges, and in this blog post I will discuss how pharma needs to address the opacity of AI to ensure trust and credibility with all stakeholders.


Black box AI
One of the challenges of many cutting-edge AI algorithms is that they operate as "black boxes": their inner workings are not readily understandable by humans, and the decision-making process is opaque. It can be impossible to determine why the system made a particular decision rather than another, when it succeeded and when it failed, and how to correct an error.
This lack of transparency is very problematic in a highly regulated industry like biopharma, not least in medical situations where, for example, a clinician needs to know the reasoning behind a particular decision, or regulators are reviewing datasets submitted to them as part of a regulatory submission package.
Further, the European GDPR legislation, which took effect in 2018, has brought forward a right of explanation: individuals can obtain "meaningful information about the logic involved" when automated decision-making takes place (see Articles 13, 14 and 15). Although the understanding of the regulation is still evolving, one interpretation could be that it effectively legislates against systems whose decisions are unexplainable.


"...Explainable models build the foundation of trust for healthcare stakeholders..."


Explainable AI

Against this backdrop, across all industries, there is a call for AI that produces explainable models while maintaining accurate predictions. It is vital that humans can understand and manage the emerging generation of artificially intelligent systems, while still harnessing their power.
One of the critical issues at stake is trust. Explainable models build the foundation of trust that healthcare stakeholders, from regulators to drug developers, need in order to move forward and adopt these systems. Producing explainable systems also brings protection against adversarial attacks, which in turn plays back into the issue of trust.
So what options are available when looking to explain AI models that are by their very nature complex and not easily understandable by humans? The idea of explanation has more than one interpretation, each relevant to different problems. Two such interpretations are explanation by design and explanation of black boxes.

"Explanation by design" is the problem where the data scientist, who is developing the model, must explain the logic. The explanation of the model is another deliverable along with the model and could be global or local in nature. A globally interpretable model is one where the logic is understandable for all outcomes. On the other hand, a locally interpretable model is one where it is possible to understand only specific decision outcomes.

By contrast, the "black box explanation approach" refers to a post-hoc reconstruction of the explanation of the decisions made by a black box algorithm. This is a harder problem, since the original data on which the algorithm was trained is usually not available. Moreover, the internal workings of the algorithm are hidden, given that it is a black box.

Below are a couple of examples of explainable AI methods in use. For more details about these algorithms, please refer to the sources listed under References.

REverse Time AttentIoN model (RETAIN) applied to Electronic Health Records (EHR) data
This model achieves high accuracy and importantly remains clinically interpretable. It is based on a two-level neural attention model that detects which physician visits were influential, and which diagnoses were important within them. The model relies on an attention mechanism modeled to represent the behavior of physicians. This algorithm is an example of "explanation by design."
It demonstrated predictive accuracy and scalability when tested on a large health system EHR dataset with 14 million visits completed by 263K patients over eight years.
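
To make the two attention levels concrete, here is a minimal NumPy sketch of how visit-level and diagnosis-level weights can be turned into per-diagnosis contributions. It is a simplified illustration, not the published implementation: in RETAIN the attention weights come from recurrent networks run over the visits in reverse time order, whereas here `alpha` and `beta` are random placeholders, and all names, sizes and the single-output (binary) setup are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visits, n_codes, emb_dim = 4, 6, 3  # hypothetical sizes

# Each visit is a multi-hot vector of diagnosis codes (1 = code recorded).
X = rng.integers(0, 2, size=(n_visits, n_codes)).astype(float)

# Linear embedding of each visit.
W_emb = rng.normal(size=(emb_dim, n_codes))
V = X @ W_emb.T                               # shape (n_visits, emb_dim)

# Placeholder attention weights (assumption: random, not learned).
scores = rng.normal(size=n_visits)
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                          # visit-level weights, sum to 1
beta = np.tanh(rng.normal(size=(n_visits, emb_dim)))  # variable-level weights

# Context vector: attention-weighted sum of visit embeddings.
context = (alpha[:, None] * beta * V).sum(axis=0)

# Linear output layer producing a single logit.
w_out = rng.normal(size=emb_dim)
b_out = 0.1
logit = w_out @ context + b_out

# Contribution of diagnosis k at visit i to the logit:
#   alpha_i * w_out . (beta_i * W_emb[:, k]) * X[i, k]
contrib = alpha[:, None] * ((beta * w_out) @ W_emb) * X

# Because every step after the attention weights is linear, the
# contributions add up exactly to the logit (minus the bias).
print("most influential visit:", int(np.argmax(alpha)))
print("contributions sum to logit:", np.isclose(contrib.sum() + b_out, logit))
```

The point of the design is that, once the attention weights are fixed, the prediction decomposes exactly into one number per visit and diagnosis, which is what a clinician can then inspect.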

Local Interpretable Model-Agnostic Explanations (LIME)
LIME seeks to provide explanations for individual predictions. It approximates the black box model with a simple, interpretable model that is valid only locally (i.e., in the neighborhood of the prediction that we need to explain). To fit this local model, LIME "perturbs" the inputs and observes how this affects the black box model’s outputs. This algorithm is an example of the "black box explanation approach."
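
The perturb-and-fit idea can be sketched in a few lines of NumPy. This is a simplified illustration for continuous tabular features, not the `lime` library itself: the `black_box` function, the Gaussian perturbation scale and the kernel width are all assumptions chosen for the example, and the feature-selection step of the published method is omitted.

```python
import numpy as np

def black_box(X):
    """Hypothetical opaque model: returns a score between 0 and 1.
    The third feature is deliberately ignored by the model."""
    z = 3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2
    return 1.0 / (1.0 + np.exp(-z))

def explain_locally(predict, x, n_samples=2000, scale=0.3,
                    kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around the instance x."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # 2. Query the black box on the perturbed points.
    y = predict(Z)
    # 3. Weight each sample by its proximity to x (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Solve weighted least squares for intercept + linear weights.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]

x0 = np.array([0.5, 1.0, -0.2])            # the prediction to explain
intercept, weights = explain_locally(black_box, x0)
for name, wt in zip(["feature_0", "feature_1", "feature_2"], weights):
    print(f"{name}: {wt:+.3f}")
```

The signs and sizes of the fitted weights are the explanation: around `x0`, the first feature pushes the score up, the second pushes it down, and the third has almost no effect. A different instance would generally yield different weights, which is what makes the explanation local.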

In Cytel's 2018 industry report, "Data Science Bridging the Gap Between Controlled Experiment and Real World Evidence," trust emerged as one of the critical issues to overcome across the spectrum of data science techniques and applications. For artificial intelligence in particular, it is essential that we develop solutions to explain cutting-edge algorithms and preserve human decision-making oversight. By opening the black box and developing trustworthy approaches, we will ensure that novel techniques remain in line with human values and expectations, and therefore allow their adoption to progress.

Cytel's data science team regularly works with customers on a variety of biomarker discovery, signal detection, and data mining projects.

References

Ribeiro, M. T., Singh, S., Guestrin, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135-1144.

 About the Authors


Munshi Imran Hossain is a Senior Analyst at Cytel. He has about 7 years of experience working in the areas of software development and data science. He holds an M. Tech in Biomedical Engineering from IIT Bombay. His interests include processing and analysis of biomedical data. Outside of work, he enjoys reading.