As of March 2023, specifically for any study started on or after March 15, 2023, [1] the FDA recommends the use of Define-XML 2.1 for the submission of SEND, SDTM, and ADaM packages (this is not yet the case for PMDA). [2, 3, 4] However, since no "Date Requirement Ends" has been set for Define v2.0, submitting data using Define v2.0 is still acceptable.
The landing page for CDISC Define-XML 2.1 highlights six major updates:
1. Updated approach to def:Origin
2. Identification of the standards and controlled terminology
3. Added support for sub-classes (for ADaM)
4. Improved SENDIG support
5. v2.0 errata fixes
In this article, I will focus on items 1, 2, 3, and 6.
Updated Approach to def:Origin
In my opinion, the most significant of the six changes is the modification of the "Origin" element. It is noteworthy because it not only introduces a new attribute but also replaces the previous method of describing the origin of a variable or value-level item. In the previous version of Define-XML, only the "Type" attribute was available, whereas the Origin element now includes an additional attribute called "Source." According to the Define-XML 2.1 standard document, both attributes are required, except when the Type is "Predecessor."
The Define-XML standard document contains two tables that illustrate how these two attributes should be used to identify the various sources for SDTM and ADaM, respectively. This change has little impact on ADaM, where the previous Type remains the same and the Source is always "Sponsor," but it does require some additional effort for SDTM Define-XML.
In Define-XML 2.0, a variable in SDTM or a value-level could have the following origin:type:
However, in Define 2.1, you could have the combination of origin:type and origin:source illustrated in the following table.
Table 1: Change to "Origin" Element
Collected directly in the CRF
Collected by the subject through an instrument
Received from a central lab, e.g., labs, ECG
Calculated within an EDC (e.g., BMI) or by an instrument (e.g., questionnaire section scores within an ePRO)
Calculated in SDTM, e.g., EPOCH
Coding terms such as MedDRA for AE, or assigned through a third-party adjudication process, e.g., best tumor response in an oncology trial
Assigned in SDTM, e.g., --TESTCD, DOMAIN
Not directly collected but it could be assigned by protocol, e.g., VSPOS (Vital Signs Position)
The Origin Type terms were modified in the Define-XML controlled terminology, and Origin Source was introduced, starting with version 2020-03-27. Origin Type also allows "Not Available" and "Other," which were added in version 2021-03-26.
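The rule described above (Type and Source both required, except for "Predecessor") can be sketched as a small check. This is only an illustration: the function name `check_origin` is hypothetical, and the allowed values are listed as I understand them from the Define-XML controlled terminology, so they should be verified against the CT release you actually use.

```python
# Allowed values as I understand them from the Define-XML controlled
# terminology (assumption: verify against the CT release in use).
ORIGIN_TYPES = {
    "Assigned", "Collected", "Derived", "Not Available",
    "Other", "Predecessor", "Protocol",
}
ORIGIN_SOURCES = {"Investigator", "Sponsor", "Subject", "Vendor"}


def check_origin(origin_type, source=None):
    """Return a list of problems for one Type/Source pair (empty if none)."""
    problems = []
    if origin_type not in ORIGIN_TYPES:
        problems.append(f"unknown Type: {origin_type!r}")
    if origin_type == "Predecessor":
        # Source is not required when the Type is Predecessor.
        return problems
    if source is None:
        problems.append("Source is missing")
    elif source not in ORIGIN_SOURCES:
        problems.append(f"unknown Source: {source!r}")
    return problems
```

A Define-XML 2.0 value such as "CRF" is reported as an unknown Type, which is a quick way to spot metadata that has not yet been migrated.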
To illustrate the differences in the identification of the "origin" for certain variables in the DM domain, Figures 1 and 2 show two examples extracted from the sample Define-XML provided with the Define-XML standard versions 2.0 and 2.1, as rendered by the Define-XML stylesheet.
Figure 1: SDTM DM Portion of Define-XML Using Version 2.0
Figure 2: SDTM DM Portion of Define-XML Using Version 2.1
The following example shows the portion of Define-XML that describes the Origin element of SUBJID in version 2.0 versus version 2.1.
Figure 3: Example of How Origin=CRF Is Now Defined in Define-XML 2.1
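For readers without access to the figure, the difference can be approximated with the sketch below. The two fragments are my own illustrations rather than copies of the CDISC sample files: the OIDs, leaf IDs, and page numbers are hypothetical, and the Collected/Investigator pair for SUBJID is an assumption about how a CRF-collected variable would typically be described.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"
DEF20 = "http://www.cdisc.org/ns/def/v2.0"
DEF21 = "http://www.cdisc.org/ns/def/v2.1"

# Define-XML 2.0 style: the origin is described by Type only.
ITEM_V20 = f"""
<ItemDef xmlns="{ODM}" xmlns:def="{DEF20}"
         OID="IT.DM.SUBJID" Name="SUBJID" DataType="text" Length="4">
  <def:Origin Type="CRF">
    <def:DocumentRef leafID="LF.blankcrf">
      <def:PDFPageRef PageRefs="6" Type="PhysicalRef"/>
    </def:DocumentRef>
  </def:Origin>
</ItemDef>
"""

# Define-XML 2.1 style: Type plus the new Source attribute.
ITEM_V21 = f"""
<ItemDef xmlns="{ODM}" xmlns:def="{DEF21}"
         OID="IT.DM.SUBJID" Name="SUBJID" DataType="text" Length="4">
  <def:Origin Type="Collected" Source="Investigator">
    <def:DocumentRef leafID="LF.acrf">
      <def:PDFPageRef PageRefs="6" Type="PhysicalRef"/>
    </def:DocumentRef>
  </def:Origin>
</ItemDef>
"""


def read_origin(item_xml, def_ns):
    """Return (Type, Source) from the def:Origin child of an ItemDef."""
    item = ET.fromstring(item_xml.strip())
    origin = item.find(f"{{{def_ns}}}Origin")
    return origin.get("Type"), origin.get("Source")


print(read_origin(ITEM_V20, DEF20))
print(read_origin(ITEM_V21, DEF21))
```

Note that the namespace URI of the `def` prefix also changes between the two versions, so any parser written for 2.0 needs to be pointed at the 2.1 namespace.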
Identification of the Standards and Controlled Terminology
With Define-XML 2.0, you could specify the standard used in the whole package through the MetaDataVersion element and its StandardName and StandardVersion attributes; see Figure 4, taken from the sample package included with Define-XML 2.0.
Figure 4: Identification of Standard Used with Version 2.0
Prior to Define-XML 2.1, the CDISC controlled terminology version used in the CDISC package, either in SDTM, SEND, or ADaM, was typically specified in the reviewer guide or, in SDTM, it was assumed to be the version referenced in the TS domain through the TSVCDREF and TSVCDVER parameters.
Version 2.1 deprecates the two attributes highlighted in Figure 4 and introduces a new def:Standards element. Figure 5 shows how the new element is displayed in a new section of the rendered Define-XML.
Figure 5: New “Standards” Section in Define-XML 2.1
Furthermore, the datasets’ metadata, through the ItemGroupDef element, now includes a new attribute called def:StandardOID. This attribute allows you to reference one of the previously declared standards. The example in Figure 6 shows that, in the SDTM package, in addition to standard domains from the referenced SDTM IG, we also used additional SDTM domains specified in a separate IG, the Medical Device IG. We also have one domain, XS, that is not standard; this is flagged through a new attribute, def:IsNonStandard (set to “Yes” in this case, indicating that XS is not a standard domain in any SDTM IG).
Figure 6: Datasets’ Metadata Portion Indicating for Each Domain which Standard Was Used
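The link between the declared standards and each dataset can be sketched as follows. The fragment is a hand-written illustration, not the CDISC sample: the OIDs, the versions, and the choice of DI as a Medical Device domain are assumptions made for the example, and the helper `dataset_standards` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "odm": "http://www.cdisc.org/ns/odm/v1.3",
    "def": "http://www.cdisc.org/ns/def/v2.1",
}
DEF = "{http://www.cdisc.org/ns/def/v2.1}"

# Illustrative MetaDataVersion: two IGs and one CT package are declared,
# then each dataset points at a declared standard or is flagged non-standard.
MDV = """
<MetaDataVersion xmlns="http://www.cdisc.org/ns/odm/v1.3"
                 xmlns:def="http://www.cdisc.org/ns/def/v2.1"
                 OID="MDV.EXAMPLE" Name="Example study">
  <def:Standards>
    <def:Standard OID="STD.SDTMIG" Name="SDTMIG" Type="IG"
                  Version="3.3" Status="Final"/>
    <def:Standard OID="STD.SDTMIG-MD" Name="SDTMIG-MD" Type="IG"
                  Version="1.1" Status="Final"/>
    <def:Standard OID="STD.CT.SDTM" Name="CDISC/NCI" Type="CT"
                  PublishingSet="SDTM" Version="2022-12-16" Status="Final"/>
  </def:Standards>
  <ItemGroupDef OID="IG.DM" Name="DM" Repeating="No"
                def:StandardOID="STD.SDTMIG"/>
  <ItemGroupDef OID="IG.DI" Name="DI" Repeating="Yes"
                def:StandardOID="STD.SDTMIG-MD"/>
  <ItemGroupDef OID="IG.XS" Name="XS" Repeating="Yes"
                def:IsNonStandard="Yes"/>
</MetaDataVersion>
"""


def dataset_standards(mdv_xml):
    """Map each dataset name to its declared standard or 'non-standard'."""
    mdv = ET.fromstring(mdv_xml.strip())
    standards = {
        s.get("OID"): f'{s.get("Name")} {s.get("Version")}'
        for s in mdv.findall("def:Standards/def:Standard", NS)
    }
    result = {}
    for ig in mdv.findall("odm:ItemGroupDef", NS):
        if ig.get(DEF + "IsNonStandard") == "Yes":
            result[ig.get("Name")] = "non-standard"
        else:
            oid = ig.get(DEF + "StandardOID")
            result[ig.get("Name")] = standards.get(oid, "undeclared")
    return result


print(dataset_standards(MDV))
```

A dataset whose def:StandardOID does not match any declared def:Standard is reported as "undeclared", which is the kind of inconsistency worth catching before submission.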
Similarly, we could also have a code list including standard terms from multiple CDISC controlled terminology versions, although I don’t see the reason to have more than one CDISC controlled terminology version used in the same CDISC package.
Added Support for Sub-Classes
A new def:SubClass element was added to further classify/group datasets. It is currently applicable only to ADaM. The latest Define-XML controlled terminology version (2022-12-16) defines sub-class terms for three ADaM classes, as described in Table 2.
Table 2: The New Sub-Class Terminology
BASIC DATA STRUCTURE
BASIC DATA STRUCTURE
POPULATION PHARMACOKINETIC ANALYSIS
BASIC DATA STRUCTURE
MEDICAL DEVICE BASIC DATA STRUCTURE
MEDICAL DEVICE TIME-TO-EVENT
OCCURRENCE DATA STRUCTURE
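As a rough illustration of how a sub-class appears in the metadata, the sketch below assumes (based on my reading of the 2.1 schema) that it is expressed as a def:SubClass child of a def:Class element inside ItemGroupDef. The dataset name ADPPK and its OID are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "odm": "http://www.cdisc.org/ns/odm/v1.3",
    "def": "http://www.cdisc.org/ns/def/v2.1",
}

# Hypothetical ADaM dataset classified as BDS with a PopPK sub-class.
ITEM_GROUP = """
<ItemGroupDef xmlns="http://www.cdisc.org/ns/odm/v1.3"
              xmlns:def="http://www.cdisc.org/ns/def/v2.1"
              OID="IG.ADPPK" Name="ADPPK" Repeating="Yes" Purpose="Analysis">
  <def:Class Name="BASIC DATA STRUCTURE">
    <def:SubClass Name="POPULATION PHARMACOKINETIC ANALYSIS"
                  ParentClass="BASIC DATA STRUCTURE"/>
  </def:Class>
</ItemGroupDef>
"""


def class_and_subclasses(ig_xml):
    """Return (class name, list of sub-class names) for one ItemGroupDef."""
    ig = ET.fromstring(ig_xml.strip())
    cls = ig.find("def:Class", NS)
    subs = [s.get("Name") for s in cls.findall("def:SubClass", NS)]
    return cls.get("Name"), subs


print(class_and_subclasses(ITEM_GROUP))
```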
Define-XML 2.1 contains other updates, which are well documented in section 1.1.3, “Relationship to Prior Define-XML Specifications.”
For example, a new attribute, def:HasNoData, can be specified in the metadata for datasets (ItemGroupDef element; see Figure 7) and for variables (ItemRef element; see Figure 8).
Figure 7: HasNoData for Datasets
Figure 8: HasNoData for Variables
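A minimal sketch of how the flag could be read back from a Define-XML 2.1 file is shown below. The fragment is illustrative only: the OIDs are hypothetical, and CM is used as the empty dataset simply to echo the concomitant medication example from the SDTM IG.

```python
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}
HAS_NO_DATA = "{http://www.cdisc.org/ns/def/v2.1}HasNoData"

# Illustrative metadata: CM has no records, and one DM variable is empty.
MDV = """
<MetaDataVersion xmlns="http://www.cdisc.org/ns/odm/v1.3"
                 xmlns:def="http://www.cdisc.org/ns/def/v2.1"
                 OID="MDV.EXAMPLE" Name="Example study">
  <ItemGroupDef OID="IG.DM" Name="DM" Repeating="No">
    <ItemRef ItemOID="IT.DM.USUBJID" Mandatory="Yes" OrderNumber="1"/>
    <ItemRef ItemOID="IT.DM.RFICDTC" Mandatory="No" OrderNumber="2"
             def:HasNoData="Yes"/>
  </ItemGroupDef>
  <ItemGroupDef OID="IG.CM" Name="CM" Repeating="Yes" def:HasNoData="Yes"/>
</MetaDataVersion>
"""


def find_no_data(mdv_xml):
    """Return (empty datasets, empty variables as (dataset, ItemOID))."""
    mdv = ET.fromstring(mdv_xml.strip())
    empty_datasets, empty_variables = [], []
    for ig in mdv.findall("odm:ItemGroupDef", NS):
        if ig.get(HAS_NO_DATA) == "Yes":
            empty_datasets.append(ig.get("Name"))
        for ref in ig.findall("odm:ItemRef", NS):
            if ref.get(HAS_NO_DATA) == "Yes":
                empty_variables.append((ig.get("Name"), ref.get("ItemOID")))
    return empty_datasets, empty_variables


print(find_no_data(MDV))
```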
The addition of the HasNoData attribute contradicts a “historical” recommendation provided in the SDTM IG and in particular the following sentence:
“In the event that no records are present in a dataset (e.g., a small PK study where no subjects took concomitant medications), the empty dataset should not be submitted and should not be described in the Define-XML document. The annotated CRF will show the data that would have been submitted had data been received; it need not be reannotated to indicate that no records exist.”
However, that sentence has been removed in the latest version of the SDTM Implementation Guide (SDTM IG 3.4), aligning it with the recommendation provided in the CDISC SDTM Metadata Submission Guideline v2.0. [5]
The Define-XML 2.1 contains some good improvements for both human and machine “readability.”
The ADaM-specific Analysis Results Metadata (ARM) [6] is not yet incorporated in the main Define-XML 2.1 standard; therefore, when using Define-XML 2.1, you should still reference the ARM standard in the ODM element, as shown in Figure 9.
Figure 9: Referencing the ARM Standard
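In practice, referencing ARM means declaring its namespace on the ODM root element alongside the Define-XML 2.1 namespace. The sketch below checks for that declaration; the FileOID and CreationDateTime values are placeholders, and the helper `declared_namespaces` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

ARM_NS = "http://www.cdisc.org/ns/arm/v1.0"

# Illustrative ODM root element; FileOID and CreationDateTime are placeholders.
ODM_ROOT = """<?xml version="1.0" encoding="UTF-8"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.1"
     xmlns:arm="http://www.cdisc.org/ns/arm/v1.0"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     ODMVersion="1.3.2" FileType="Snapshot" FileOID="DEFINE.EXAMPLE"
     CreationDateTime="2023-06-01T00:00:00" def:Context="Submission"/>
"""


def declared_namespaces(xml_text):
    """Return {prefix: uri} for every namespace declared in the document."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ET.iterparse(source, events=("start-ns",))
    return {prefix: uri for _, (prefix, uri) in events}


namespaces = declared_namespaces(ODM_ROOT)
print("ARM referenced:", namespaces.get("arm") == ARM_NS)
```

The check uses `iterparse` with the `start-ns` event because ElementTree does not keep namespace declarations as ordinary attributes once the document is parsed.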
Most off-the-shelf software already has integrated support for the new version of Define-XML, including the necessary metadata. For internally developed software that relies on ad hoc metadata repositories and tools such as SAS macros, adopting Define-XML 2.1 may require additional metadata and the deprecation of certain elements from the previous version.
If you plan to move to Define-XML 2.1 for a study that has already started with Define-XML 2.0 (migration is not required if your study started before March 15, 2023), migrating from 2.0 to 2.1 is fairly straightforward. That, at least, has been our experience with the sponsors we have supported in adopting or migrating to Define-XML 2.1.
I would like to express my gratitude to Steve Wong from my team for his valuable contributions and efforts in conducting the “investigations” discussed in this article.
[1] In the CDISC and data submission context, the study start date is defined as the date of the first signed informed consent (SSTDTC parameter in the TS SDTM dataset).