One of the main challenges in converting legacy data to SDTM is applying the CDISC Standard Controlled Terminology (CDISC-CT) and other external dictionaries and standards (such as the ISO 8601 format for dates, or medical dictionaries such as MedDRA for Adverse Events). In this issue of the “Good Data Submission Doctor” I would like to go through some good rules to live by when converting legacy terminology to CDISC-CT.
1. Apply Standard CDISC-CT whenever you can
This means that standard terminology such as the No Yes Response (NY) codelist, or ISO formats (e.g. ISO 8601 for dates), should be applied wherever applicable:
- Is your Supplemental Qualifier a date? ISO 8601 applies, so convert the original date to the YYYY-MM-DD format (assuming it contains only a date part)
- Does your Supplemental Qualifier have only Yes/No results? Then convert “Yes” to “Y” and “No” to “N”
- Does the result for a specific parameter in a Findings domain have only Yes/No results? Then convert “Yes” to “Y” and “No” to “N” (and similarly for Negative/Positive)
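The conversions above are mechanical enough to script. The sketch below is a minimal Python illustration, not a validated tool: it assumes the legacy dates are complete, date-only values in one of a few hypothetical layouts (`RAW_DATE_FORMATS`) and that Yes/No answers arrive as free text. The function names `to_iso_date` and `to_ct` are illustrative only.

```python
from datetime import datetime

# Assumed raw date layouts found in the legacy source (hypothetical; adapt to
# your data). Order matters for ambiguous values such as 03/04/2020, which is
# read here as day/month/year.
RAW_DATE_FORMATS = ("%d/%m/%Y", "%d-%b-%Y", "%Y%m%d")

# Legacy Yes/No texts mapped to the CDISC-CT submission values.
YES_NO_MAP = {"yes": "Y", "no": "N"}


def to_iso_date(raw: str) -> str:
    """Convert a complete, date-only legacy value to ISO 8601 (YYYY-MM-DD)."""
    for fmt in RAW_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"Unrecognised date layout: {raw!r}")


def to_ct(raw: str, mapping: dict) -> str:
    """Map a legacy text to its CT submission value; fail loudly if unmapped."""
    key = raw.strip().lower()
    if key not in mapping:
        raise KeyError(f"No CT mapping for {raw!r}")
    return mapping[key]


print(to_iso_date("15-Mar-2020"))  # 2020-03-15
print(to_ct("Yes", YES_NO_MAP))    # Y
```

Partial dates (e.g. an unknown day) need a separate rule, since ISO 8601 represents them by truncation (e.g. `2020-03`) rather than as a full YYYY-MM-DD value.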
2. Column E (CDISC Submission Value) is the one to be ‘used’ in the dataset
My French colleagues might say “ça va sans dire”... but column E of the CDISC-CT Excel file is the one to use in the SDTM dataset. Column F, CDISC Synonym(s), cannot be used; it is there to help you convert the original term to the appropriate CDISC-CT term (the CDISC Submission Value).
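Column F is still useful as a lookup key when converting legacy text. A minimal sketch, assuming the CT Excel sheet has already been read into dictionaries with hypothetical keys `submission_value` (column E) and `synonyms` (column F), and that multiple synonyms in one cell are separated by semicolons: every synonym resolves to the column E value, which is the only value written to the dataset.

```python
def build_synonym_lookup(ct_rows):
    """Map each submission value (column E) and each synonym (column F),
    case-insensitively, to the submission value that goes into the dataset."""
    lookup = {}
    for row in ct_rows:
        submission = row["submission_value"]
        synonyms = [s.strip() for s in row.get("synonyms", "").split(";") if s.strip()]
        for term in [submission, *synonyms]:
            key = term.lower()
            previous = lookup.get(key)
            if previous is not None and previous != submission:
                # The same text points to two submission values: resolve by hand.
                raise ValueError(f"{term!r} is ambiguous: {previous} vs {submission}")
            lookup[key] = submission
    return lookup


def to_submission_value(raw, lookup):
    """Return the column E value for a legacy term; raise KeyError if unmapped."""
    return lookup[raw.strip().lower()]


# Illustrative rows for the SEX codelist.
sex_rows = [
    {"submission_value": "F", "synonyms": "Female"},
    {"submission_value": "M", "synonyms": "Male"},
]
lookup = build_synonym_lookup(sex_rows)
print(to_submission_value("female", lookup))  # F
```

Failing on ambiguous or unmapped text is deliberate: those are exactly the terms that need a human decision rather than a silent default.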
3. CDISC-CT in define.xml
- Some terminologies have a broad scope, e.g. UNIT can apply to laboratory data, to medication units, etc. In define.xml you can define subset codelists, e.g. LBUNIT, CMUNIT, etc., each listing only the applicable units
o If the CT subset is defined in define.xml, still reference the “Codelist Code” and “Code” as per CDISC-CT
- Only terms that occurred in the data or were pre-printed on the CRF should be reported in the CT definition in define.xml, e.g. if the CRF lists 5 races but all your subjects are WHITE, all 5 races should still be part of the CT reported in define.xml
o Do not include all the terms of a CDISC-CT codelist in define.xml if not all are applicable, e.g. do not include every term from the UNIT codelist (>500 terms)
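A subset codelist can be assembled from the parent CDISC-CT codelist, the observed values and the CRF's pre-printed terms. The sketch below builds a simplified fragment loosely modelled on a Define-XML `CodeList` with `EnumeratedItem` and `Alias Context="nci:ExtCodeID"` elements; it omits namespaces and other attributes a real define.xml needs, `subset_codelist` is a hypothetical helper, and the NCI codes in the example are placeholders, not real codes.

```python
import xml.etree.ElementTree as ET


def subset_codelist(oid, name, codelist_code, full_ct, observed, crf_terms):
    """Build a simplified define.xml CodeList with only the terms that occurred
    in the data or were pre-printed on the CRF.

    full_ct maps each CDISC Submission Value of the parent codelist to its
    NCI "Code"; codelist_code is the parent's "Codelist Code".
    """
    wanted = set(observed) | set(crf_terms)
    unknown = wanted - set(full_ct)
    if unknown:
        raise ValueError(f"Not in the CDISC-CT codelist: {sorted(unknown)}")
    codelist = ET.Element("CodeList", OID=oid, Name=name, DataType="text")
    for term in sorted(wanted):
        item = ET.SubElement(codelist, "EnumeratedItem", CodedValue=term)
        # Each term keeps its CDISC-CT "Code" even inside a subset.
        ET.SubElement(item, "Alias", Context="nci:ExtCodeID", Name=full_ct[term])
    # The subset still points back to the parent "Codelist Code".
    ET.SubElement(codelist, "Alias", Context="nci:ExtCodeID", Name=codelist_code)
    return codelist


# Placeholder NCI codes, not the real ones.
RACE_CT = {
    "AMERICAN INDIAN OR ALASKA NATIVE": "CXXXX1",
    "ASIAN": "CXXXX2",
    "BLACK OR AFRICAN AMERICAN": "CXXXX3",
    "NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER": "CXXXX4",
    "WHITE": "CXXXX5",
}
# Only WHITE occurred, but all 5 races were pre-printed on the CRF.
cl = subset_codelist("CL.RACE", "Race", "CXXXX0", RACE_CT,
                     observed={"WHITE"}, crf_terms=RACE_CT.keys())
print(len(cl.findall("EnumeratedItem")))  # 5
```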
4. Converting data to CDISC-CT
- Whenever a CDISC variable has a standard CT associated with it, every effort should be made to convert the original term to the corresponding term in the CDISC-CT
o An exception could be, for example, concomitant medication units collected as completely free text, where converting hundreds of terms to CDISC-CT might not be worth the effort
o When the original term is converted to a corresponding CDISC-CT term, it is recommended to store the original term as a supplemental qualifier, if not for all variables then at least for the “key” ones, for example any major conversion/“interpretation” of race. Another option is to add a table to the reviewer’s guide listing the CT conversions you applied
- Some codelists are not extensible, meaning that you cannot add new terms to them.
o For extensible codelists, the process of adding a new term would in theory require liaising with the CDISC-CT team and asking them to consider the new term; the CDISC-CT team assesses it and decides whether or not to add it. They might refuse the proposal and suggest an alternative mapping, e.g. because the term you identified is a synonym of an existing term. Check with your (or the sponsor’s) governance team whether a standard governance process requires this. Please note this can take some time.
o Terms such as MULTIPLE are allowed by the SDTMIG, but if you validate with Pinnacle 21 (P21) you will get an error when MULTIPLE is used with a non-extensible codelist (a classic example is RACE). This is expected; you just need to justify it in the reviewer’s guide, e.g. “the term MULTIPLE is allowed as per SDTMIG v3.2”
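The rules in this section can be combined into a single mapping step. The sketch below is an illustration under assumptions rather than a standard algorithm: `Codelist`, `map_value`, the `RACEORIG` QNAM and the reduced set of supplemental-qualifier fields are all hypothetical. It keeps the original text whenever a conversion is applied, and sorts out-of-codelist values into the three cases discussed above: SDTMIG-allowed values such as MULTIPLE (justify in the reviewer’s guide), non-extensible codelists (remap), and extensible codelists (check governance).

```python
from dataclasses import dataclass, field


@dataclass
class Codelist:
    name: str
    extensible: bool
    terms: set
    # Values the SDTMIG allows although they are not in a non-extensible
    # codelist (e.g. MULTIPLE for RACE); P21 will still flag them.
    sdtmig_allowed: set = field(default_factory=set)


def map_value(raw, mapping, codelist, usubjid, qnam):
    """Map a legacy term to CDISC-CT.

    Returns (value, supp, issues): supp is a minimal supplemental-qualifier
    record keeping the original term when a mapping was applied, and issues
    lists anything that needs remapping or a reviewer's guide note.
    """
    key = raw.strip().upper()
    supp = None
    if key in mapping:
        value = mapping[key]
        # Keep the original (pre-conversion) text as a supplemental qualifier.
        supp = {"USUBJID": usubjid, "QNAM": qnam, "QVAL": raw.strip()}
    else:
        value = key
    issues = []
    if value not in codelist.terms:
        if value in codelist.sdtmig_allowed:
            issues.append(f"{value}: allowed by SDTMIG, justify in reviewer's guide")
        elif not codelist.extensible:
            issues.append(f"{value}: not in non-extensible {codelist.name}, remap")
        else:
            issues.append(f"{value}: new term for extensible {codelist.name}, check governance")
    return value, supp, issues


race = Codelist("RACE", extensible=False, terms={"ASIAN", "WHITE"},
                sdtmig_allowed={"MULTIPLE"})
mapping = {"CAUCASIAN": "WHITE", "MIXED": "MULTIPLE"}
print(map_value("Caucasian", mapping, race, "STUDY1-001", "RACEORIG"))
```

The collected issues can feed the reviewer’s guide table directly, so every non-standard value is either remapped or explicitly justified.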
5. Case of text variables in SDTM
It is recommended to uppercase text variables, with the following possible exceptions:
- --TEST variables, where mixed case may be more readable
- Values from a CDISC-CT codelist that is itself in mixed case, e.g. many UNIT terms such as mg/dL
- Long text data such as long comments
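A minimal sketch of these case rules, assuming records are plain Python dictionaries keyed by SDTM variable name. Which variables hold mixed-case CT or long text is passed in by the caller (`ct_vars` and `keep_case_vars`, both hypothetical parameters of the hypothetical `apply_case_rules`); the same exemption can cover dictionary-coded variables such as MedDRA terms (see point 9).

```python
def apply_case_rules(record, ct_vars=frozenset(), keep_case_vars=frozenset()):
    """Uppercase character values in one SDTM record, except:
    - --TEST variables (two-letter domain prefix + TEST, e.g. LBTEST),
      kept in mixed case for readability;
    - ct_vars: variables populated from a mixed-case CDISC-CT codelist
      such as UNIT, whose published case is kept;
    - keep_case_vars: e.g. long comments or dictionary-coded (MedDRA) terms.
    Non-character values are returned unchanged.
    """
    out = {}
    for var, val in record.items():
        is_test = len(var) == 6 and var.endswith("TEST")
        keep = is_test or var in ct_vars or var in keep_case_vars
        out[var] = val if keep or not isinstance(val, str) else val.upper()
    return out


lb = {"LBTEST": "Glucose", "LBSPEC": "serum", "LBORRESU": "mg/dL", "LBSTRESN": 5.1}
print(apply_case_rules(lb, ct_vars={"LBORRESU"}))
# {'LBTEST': 'Glucose', 'LBSPEC': 'SERUM', 'LBORRESU': 'mg/dL', 'LBSTRESN': 5.1}
```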
6. Handling of CT for Quality of Life Questionnaires
- If a Quality of Life Questionnaire is standardized in CDISC-CT, the standard terminology should be used. Check here to see which questionnaires have been standardized in CDISC-CT https://www.cdisc.org/foundational/qrs
- If the questionnaire is not standardized, try to follow the CDISC recommendations on how QSTESTCD and QSTEST are defined. See here https://www.cdisc.org/system/files/members/standard/QRS/Reference/QRS_Naming_Rules_2017-06-23.xlsx
- If questionnaire questions are longer than 40 characters and have to be shortened, apply the same approach as for inclusion/exclusion criteria: shorten the text in QSTEST and add a table to the reviewer’s guide with the original text and its shortened version
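The shortening itself is an editorial decision, so the sketch below does not try to shorten text automatically. It assumes the agreed short texts are supplied by hand (the hypothetical `short_texts` dictionary), checks them against the 40-character limit, and collects the original/shortened pairs for the reviewer’s guide table. The QSTESTCD values in the example are made up.

```python
MAX_TEST_LEN = 40  # length limit for QSTEST


def shorten_qstest(questions, short_texts):
    """Return (qstest, rg_table).

    questions:   {QSTESTCD: full question text}
    short_texts: {QSTESTCD: manually agreed shortened text} for long questions
    rg_table:    (QSTESTCD, original text, shortened text) rows for the
                 reviewer's guide
    """
    qstest, rg_table = {}, []
    for code, text in questions.items():
        if len(text) <= MAX_TEST_LEN:
            qstest[code] = text  # short enough, keep as is
            continue
        short = short_texts.get(code)
        if short is None:
            raise ValueError(f"{code}: longer than {MAX_TEST_LEN} characters, needs a short text")
        if len(short) > MAX_TEST_LEN:
            raise ValueError(f"{code}: shortened text is still {len(short)} characters")
        qstest[code] = short
        rg_table.append((code, text, short))
    return qstest, rg_table


questions = {
    "XQ01": "Pain today?",
    "XQ02": "During the past week, how often did pain interfere with your sleep?",
}
qstest, table = shorten_qstest(questions, {"XQ02": "Pain interfered with sleep past week"})
for row in table:
    print(row)
```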
7. Standard terminology for SDTM domain name
- Standard domains are those defined in the SDTMIG (for the version you are using)
- Additional domain names have been “reserved” and are available in the CDISC-CT
- If the data you have to map ‘belong’ to any of the domains in the CDISC-CT, use the standard name and build your dataset according to the observation class the domain belongs to, e.g. events, interventions or findings
- In all other cases, you can use the following conventions to name your ‘sponsor-specific’ (non-standard) domain:
o X- for events, e.g. XA
o Y- for interventions, e.g. YA
o Z- for findings, e.g. ZH
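The naming decision can be expressed as a small function. This is a sketch under stated assumptions: `CT_DOMAINS` is a hypothetical excerpt (load the full list of domain abbreviations from your CT release), `choose_domain` is an illustrative helper, and the X/Y/Z mapping is the convention suggested above, applied by taking the first unused two-letter code.

```python
import string

# Hypothetical excerpt of the SDTM domain abbreviations published in CDISC-CT;
# in practice load the full list from the CT release you are using.
CT_DOMAINS = {"AE", "CM", "DM", "LB", "QS", "VS"}

# The X/Y/Z convention for sponsor-specific domains.
PREFIX_BY_CLASS = {"events": "X", "interventions": "Y", "findings": "Z"}


def choose_domain(obs_class, taken, ct_match=None):
    """Return the CDISC-CT domain code if the data belong to one, otherwise
    the first free sponsor-specific code for the observation class."""
    if ct_match is not None:
        if ct_match not in CT_DOMAINS:
            raise ValueError(f"{ct_match} is not a CDISC-CT domain code")
        return ct_match
    prefix = PREFIX_BY_CLASS[obs_class]
    for letter in string.ascii_uppercase:
        code = prefix + letter
        if code not in taken:
            return code
    raise ValueError(f"No free {prefix}- domain codes left")


print(choose_domain("findings", taken={"ZA"}))  # ZB
```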
8. Apply the most recent CDISC-CT available at the start of your mapping task, and consider upgrading to the most recent CDISC-CT available at database lock
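The version rule amounts to two date comparisons. A minimal sketch, assuming you have a list of CT release dates (the dates below are illustrative, not the actual CDISC publication dates): pick the latest release on or before the start of mapping, then list the releases published between that version and database lock as upgrade candidates. Both function names are hypothetical.

```python
from datetime import date


def ct_version_at(start, releases):
    """Latest CT release published on or before the start of mapping."""
    eligible = [r for r in releases if r <= start]
    if not eligible:
        raise ValueError("No CT release available at that date")
    return max(eligible)


def upgrade_candidates(in_use, db_lock, releases):
    """Releases published after the version in use, up to database lock."""
    return sorted(r for r in releases if in_use < r <= db_lock)


# Illustrative release dates only; take the real ones from CDISC.
releases = [date(2023, 3, 31), date(2023, 6, 30), date(2023, 9, 29), date(2023, 12, 15)]
in_use = ct_version_at(date(2023, 7, 10), releases)
print(in_use)  # 2023-06-30
print(upgrade_candidates(in_use, date(2024, 1, 5), releases))
```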
9. Use of other terminologies e.g. medical dictionaries
- Whenever data are coded with an external dictionary such as MedDRA, the dictionary’s original case should be kept, e.g. do not uppercase MedDRA variables.
10. ADaM has its own terminology
- Although limited compared to SDTM, ADaM has its own CDISC-CT, e.g. for the DTYPE variable
- Of course, SDTM CDISC-CT should still be maintained in ADaM
Bonus: CT and medical dictionaries when studies are pooled to support the ISS
- From the FDA Study Data Technical Conformance Guide:
“FDA recognizes that studies are conducted over many years, during which time versions of a terminology may change. Sponsors should use the most recent version of the dictionary available at the start of a clinical or nonclinical study. It is common to have different studies use different versions of the same dictionary within the same application (e.g., NDA, BLA)... Regardless of the specific versions used for individual studies, pooled analyses (e.g., for an integrated summary of safety) should be conducted using a single version of a terminology. The current version should be used at the time that data across studies are pooled...”
This means that in a submission package, individual study SDTM packages should keep the CDISC-CT version used at the time the Clinical Study Report was created (if the analysis was done with CDISC standards).
- There is not yet a standard approach to where up-versioning (migration to the most recent version of the terminology or medical dictionary) should be handled. Some sponsors create pooled SDTMs from the individual SDTMs, where they ‘align’ version differences; others apply the changes directly in ADaM.
- For medical dictionaries, “it does seem”, given the availability of dedicated variables in the ADaM OCCDS class, that ADaM is the recommended place for up-versioning dictionaries such as MedDRA.
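If up-versioning is done in ADaM, the core operation for MedDRA is to re-derive each term from the lowest level term (LLT) code using a single, current version, while keeping the study-level coding. The sketch below assumes each record carries its LLT code and that an LLT-to-preferred-term lookup for the pooling version is available from your licensed MedDRA files; `recode_for_pooling`, the record keys, the codes and the version labels are hypothetical and are not ADaM OCCDS variable names.

```python
def recode_for_pooling(ae_records, llt_to_pt, pooled_version):
    """Re-derive the preferred term of each AE record from its LLT code using a
    single MedDRA version, keeping the study-level coding untouched.

    llt_to_pt: {LLT code: preferred term} for pooled_version (each LLT is
    linked to exactly one preferred term).
    """
    pooled = []
    for rec in ae_records:
        pt = llt_to_pt.get(rec["llt_code"])
        if pt is None:
            raise KeyError(f"LLT {rec['llt_code']} not found in MedDRA {pooled_version}")
        # New keys are added; the original study coding stays in the record.
        pooled.append({**rec, "pt_pooled": pt, "meddra_version_pooled": pooled_version})
    return pooled


ae = [{"study": "STUDY1", "llt_code": "LLT-1", "pt": "Old PT", "meddra_version": "v-old"}]
print(recode_for_pooling(ae, {"LLT-1": "New PT"}, "v-current")[0]["pt_pooled"])  # New PT
```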