About three years ago, Cytel was helping a sponsor on a project where I had to conduct surveillance of some CRO deliverables, mainly for SDTM and ADaM packages. At first, I was involved in the review cycle of SDTM, and began by reviewing some initial mapping specifications including a draft SDTM Annotated CRF. The CRO in charge was quite experienced and there was nothing major to spot in all the different versions I had to review.
Surprisingly, it was not the case some months later, when I had to provide the same review support for the Biostatistics deliverables, specific to the ADaM package. The ADaM datasets overall were well designed, and there were no major open non-conformance issues. However, it was clear from the very beginning that there was something missing - a missing link between SDTM and ADaM.
The problem with data and metadata traceability
“An important component of a regulatory review is an understanding of the provenance of the data. Traceability permits an understanding of the relationships between the analysis results, analysis datasets, tabulation datasets, and source data” (FDA Study Data Technical Conformance Guidance).
When looking at the SDTM and ADaM packages together, there are three types of data (see figure below):
- SDTM Only: Source variables and records not used in ADaM, i.e. Screen Failures, IE domain, Suppl. Lab. Data, etc.
- SDTM and ADaM: Variables and records are copied from SDTM to ADaM
- ADaM Only: Derived or assigned variables/new records added in ADaM for the purpose of analysis
For point 1, we assume there is a definite reason for not bringing records and variables into ADaM, for example, when they are not needed in the analysis and instead, we have in place a strong quality control process.
Point 2 and 3 can pose some traceability problems.
Point 2 can be “easily” checked with appropriate tools or ad-hoc programs. For example, you can check if a variable in ADaM has define-xml Origin=Predecessor, then the variable should have the same content and attributes of the source (predecessor) variable in SDTM. So, not only the variable in ADaM must have the same label, type, etc. but also, most importantly, have the same meaning.
Moreover, if the variable has a code-list in the define-xml for SDTM, I would assume the same code-list applies in ADaM and therefore, the same should be available in the define-xml for ADaM. Of note, the same traceability should be guaranteed when the source of an ADaM dataset is another ADaM dataset i.e. an intermediate ADaM dataset1.
Application of Point 3 instead requires a more accurate manual and independent review, and there the traceability can be guaranteed by a number of actions which includes the creation of accurate documentation and clear pattern of derivations.
The missing link between SDTM and ADaM and the gap in the current validation tools
One of the common issues I observed in the ADaM package of the sponsor, was the missing link between SDTM and ADaM, particularly, in the documentation. Quite often, there are discrepancies between SDTM define-xml and ADaM define-xml, and also between the Clinical Study Data Reviewer Guide for SDTM (cSDRG) and the Analysis Data Reviewer Guide for ADaM (ADRG).
Define-xml for SDTM and define-xml for ADaM are usually treated by sponsors and vendors as two independent and separate deliverables; the same is true for the cSDRG and the ADRG. However, there are many repeated information between SDTM and ADaM datasets; the same is true for their documentation. This often creates inconsistencies, for example, between the same information that is reported in SDTM define-xml and ADaM define-xml.
Some of this sort of ‘redundant’ information, needs to be verified manually i.e. check if the version of the MedDRA dictionary reported in the “Study Data Standards and Dictionary Inventory” is the same as in cSDRG and ADRG.
For data and metadata (although machine-readable) , unless you have a complex linked-metadata repository system (that’s the idea beyond the CDISC 360 project2), these kind of discrepancies are difficult to spot in the review process and validation process, if for example you only “trust” the conformance rules implemented in the “de facto” standard validation tools, such as Pinnacle21. In fact, these checks do not spot the discrepancies given the fact that except for few cross-checks, the ADaM package validation is performed independently from the SDTM package validation.
Define-xml and reviewer guide(s) are two important and strategic elements for the success of your data submission. Their quality is your “business card” to the agency. One of the key factors is “consistency” of the information reported in SDTM and ADaM package. Discrepancies between the two might be spotted by the reviewer and result in questions, on the other hand, failing to spot these discrepancies can potentially generate misunderstandings.
This is a topic of a poster, “Please Take Care of your Metadata” which I will present at the upcoming PhUSE EU on November 10th, 9:00 am - 11:30 am. I will share some examples of missing links and how to check for them.
Cytel will be contributing to a rage of events at PhUSE EU Connect, on the 9th - 13th November 2020. Join us to take part in the sessions by clicking on the button.
- “ADaM Intermediate Dataset: how to improve your analysis traceability”, A. Tinazzi and T. Curto, PHUSE-US CONNECT 2018, Raleigh
About Angelo Tinazzi
Angelo Tinazzi is Senior Director, Statistical Programming, Clinical Data Standards and Clinical Data Submission at Cytel. He is a well- published and recognized expert in statistical programming with over 20 years' experience in clinical research. The application of CDISC standards in different therapeutic areas is part of his core expertise since 2003 in particular in the context of data submission to health authorities such as the FDA and PMDA.
Angelo is an authorized CDISC instructor and member of the CDISC ADaM Team as well as the CDISC European Committee where he also manages the Italian-speaking CDISC User Network.