Over the last decade or so, there have been several discussions on how to handle multiple enrollments of the same subject in CDISC data packages. Members of the CDISC SDS Multiple Subject Instances (MSI) team have also shared previews of possible future modifications of the SDTM standard to handle multiple subject enrollments,[i] and we might finally have something available with the upcoming releases of the SDTM standard and Implementation Guide (IG).
Meanwhile, several sponsors and vendors have shared their experiences dealing with different scenarios of multiple subjects and, in some cases, provided alternative solutions,[ii][iii][iv][v] with the PHUSE “SDTM ADaM Implementation FAQ Data Submission” working group[vi] also providing some recommendations.
With no clear direction and no standard solution that avoids major conformance issues, this topic is causing a lot of “headaches” for sponsors and vendors. The FDA requirements added in the October 2018 version of their Study Data Technical Conformance Guide (SDTCG) were a further source of stress for CDISC implementers. “What? Another Demographics domain? DC domain what? What Observational Class should it be?”
The result is that, at least in the CDISC packages I see daily, everyone, vendors and sponsors alike, is interpreting the CDISC and FDA requirements in their own way.
With this article, I will try to summarize this topic by providing a current “state of the art” and some recommendations. I will focus specifically on two scenarios and as such will clarify what we mean by multiple subjects’ enrollments:
Multiple study subject enrollment is where a subject who participated in one trial is subsequently enrolled in another trial, where the subject could again fail screening several times or be successfully enrolled at the first attempt. The subject is assigned a new enrollment number according to the rules set up in the study.
Subject re-enrollment in the same study is where a subject in a single trial is allowed to be re-screened, meaning the subject is selected to participate in a trial, fails the screening, and at a later time might come back and go through the screening process again. Every time the subject comes in, the subject gets a new enrollment/subject number.
The case of a subject enrolled in multiple cohorts or study arms within the same study is not covered in this article,[vii] although many of the recommendations provided in this article apply to this scenario as well.
The question is: how can we represent this information in our CDISC datasets, for example in SDTM, so that we have a clear view of the subject's path through the study, while maintaining traceability and, hopefully, compliance with available standards and regulatory guidance?
Let’s look first at specific requirements from the SDTM IG and Health Regulatory guidance.
Who is Subject X for CDISC and Health Authorities?
The CDISC IG clearly states, for example in the DM section shown below, that the USUBJID variable must be used to uniquely identify a subject across all studies within a submission involving the product.
Furthermore, the FDA in its SDTCG[viii] adds that an “individual,” a study subject, should have the exact same unique identifier, thus the USUBJID variable, across all datasets, including between SDTM and ADaM, and that if a subject participated in more than one study, he or she should maintain the same USUBJID across all studies.
Similar requirements and additional clarifications can be found in both the FDA and the PMDA[ix] SDTCG.
In addition to USUBJID, which is a unique identifier for the subject across all studies, we also need a SUBJID that uniquely identifies each subject that participates in a study. Also, if a subject is screened and/or enrolled more than once in a study, then the subject’s SUBJID should be different for each unique screening or enrollment.
However, the CDISC IG, including the latest IG 3.4, does not make clear how to handle data about individuals who are screened or enrolled more than once in the same study. The only additional requirement is that the DM domain can contain only one record per subject, that is, per individual, but it is unclear how to represent demographics for subjects who have been enrolled more than once (SDTM IG 3.2 / 3.3 / 3.4 Section 5.2 “DM Assumptions”).
Let’s look now at a couple of scenarios on how to handle multiple enrollments.
Multiple study subject enrollment
The following scenario shows how, in study nr. 2, USUBJID was carried over from study nr. 1, while SUBJID and SITEID kept the values assigned during enrollment in study nr. 2.
Of note, in this and all subsequent scenarios and examples, USUBJID is built by concatenating the study ID, site ID, and subject ID or enrollment number. For example, in study 1, USUBJID was given the value STUDY01 concatenated with 010, the number of the site where the subject was enrolled, and 101, the subject number assigned during the “primary enrollment” (I will come back later to the definition of primary enrollment). However, this is simply a convention, as neither the CDISC IG nor the regulatory guidance, e.g., the FDA SDTCG, mandates any standard derivation.
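The convention above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the "-" separator, and the study 2 values (STUDY02, site 020, subject 205) are hypothetical and are not prescribed by the SDTM IG or the SDTCG.

```python
from typing import Optional


def build_usubjid(studyid: str, siteid: str, subjid: str, sep: str = "-") -> str:
    """Concatenate study ID, site ID and subject number (a convention, not a rule).

    The "-" separator is an assumption made for readability.
    """
    return sep.join([studyid, siteid, subjid])


def assign_usubjid(studyid: str, siteid: str, subjid: str,
                   prior_usubjid: Optional[str] = None) -> str:
    """Reuse the USUBJID from an earlier study when one is known.

    prior_usubjid would come from a CRF field asking about previous
    participation in studies of the same product (hypothetical field).
    """
    if prior_usubjid:
        return prior_usubjid
    return build_usubjid(studyid, siteid, subjid)


# Study 1: first participation, USUBJID is derived from the study 1 identifiers.
usubjid_study1 = assign_usubjid("STUDY01", "010", "101")

# Study 2: SUBJID and SITEID are new, but USUBJID is carried over from study 1.
usubjid_study2 = assign_usubjid("STUDY02", "020", "205", prior_usubjid=usubjid_study1)

print(usubjid_study1, usubjid_study2)
```

The point of the second function is that the derivation is applied only once per individual; every later participation looks up the existing value instead of deriving a new one.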
To achieve the USUBJID “uniqueness,” we need to be a bit “prescient” and plan fields in the CRF to check whether a subject participated in any previous study of the same “product.”
The same applies to rescreened subjects within the same study: the CRF should collect previously assigned screening/enrollment numbers.
It is also recommended to mention in the Clinical Study Data Reviewer Guide (cSDRG) which previous studies a subject could have been already enrolled in and the reason.
The same is true for submission projects where individual studies were conducted by other vendors or sponsors. It is recommended to check with the sponsor which studies could include subjects who already participated in previous studies. You will need to check whether USUBJID was correctly implemented, because retrospective changes to existing individual study packages could be problematic, if not discouraged. If needed, adjustments can be made in the pooled datasets, either in the integrated SDTM or in the integrated ADaM.
Subject re-enrollment in the same study
Let’s look now at a scenario where a subject might be screened several times in the same study (the figure below was taken from V. Poulsen and H. Pontoppidan Föh PHUSE-EU 2022 presentation).
While subject XXX is finally enrolled after two failures, and eventually randomized, subject YYY failed twice and never came back. At each attempt, both subjects were given a new enrollment number, and some data needed for screening the subject, e.g., vital signs, might be collected and re-collected.
Given the fact we must keep the same USUBJID when a subject is screened multiple times within the same study, this means in the DM domain we must decide from which enrollment the DM information should be derived. One approach could be as follows:
One record per USUBJID should be created in DM from the primary enrollment, where primary enrollment could be defined as the “successful enrollment,” or as the “last enrollment attempt” if the subject was never successfully enrolled.
Demographics data from previous enrollments, for example previously assigned enrollment numbers, can be mapped to SUPPDM. Alternatively, as per the FDA recommendation in their SDTCG, you can create another special purpose domain similar to DM. This can be considered a custom domain, named for example XM, that keeps the same set of variables expected in DM, such as RACE and AGE. Of note, a “DC – Demographics as Collected” (or “Demographics for Multiple Participations”?) domain is currently under discussion in the CDISC SDTM Team for inclusion in the next SDTM IG 4.0.[x]
It is also recommended to check the most recently provided demographics information, for example race and date of birth, for consistency with the information provided in previous enrollments.
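As a rough sketch of the approach above, the following Python code picks the primary enrollment among several attempts and lists demographic values that differ from it. The record layout, the ATTEMPT_DATE and ENROLLED fields, and all values are hypothetical; only USUBJID, SUBJID, RACE and BRTHDTC are actual SDTM variable names. When several attempts qualify, the sketch takes the latest one, which is an assumption rather than a rule.

```python
from datetime import date

# Hypothetical screening attempts for one individual (one USUBJID, three SUBJIDs).
attempts = [
    {"USUBJID": "STUDY01-010-103", "SUBJID": "101", "ATTEMPT_DATE": date(2022, 1, 10),
     "ENROLLED": False, "RACE": "ASIAN", "BRTHDTC": "1980-05-02"},
    {"USUBJID": "STUDY01-010-103", "SUBJID": "102", "ATTEMPT_DATE": date(2022, 3, 15),
     "ENROLLED": False, "RACE": "WHITE", "BRTHDTC": "1980-05-02"},
    {"USUBJID": "STUDY01-010-103", "SUBJID": "103", "ATTEMPT_DATE": date(2022, 6, 1),
     "ENROLLED": True, "RACE": "ASIAN", "BRTHDTC": "1980-05-02"},
]


def pick_primary(attempts):
    """Return the successful enrollment, or the last attempt if none succeeded."""
    successful = [a for a in attempts if a["ENROLLED"]]
    pool = successful or attempts
    return max(pool, key=lambda a: a["ATTEMPT_DATE"])


def demographic_mismatches(attempts, primary, fields=("RACE", "BRTHDTC")):
    """List (SUBJID, field, earlier value, primary value) for each inconsistency."""
    issues = []
    for attempt in attempts:
        if attempt is primary:
            continue
        for field in fields:
            if attempt[field] != primary[field]:
                issues.append((attempt["SUBJID"], field, attempt[field], primary[field]))
    return issues


primary = pick_primary(attempts)
mismatches = demographic_mismatches(attempts, primary)

print("DM record comes from SUBJID", primary["SUBJID"])
for subjid, field, old, new in mismatches:
    print(f"SUBJID {subjid}: {field} was {old!r}, primary enrollment has {new!r}")
```

The non-primary attempts are the ones that would go to SUPPDM or to a custom DM-like domain, and any mismatch found here is a candidate for a data query rather than something to resolve silently in the mapping.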
With regard to variables such as informed consent, the primary informed consent could be used to fill the RFICDTC variable in DM, while DS will contain the informed consent dates from all screening attempts, including the primary one (so there is no need for additional informed consent dates in SUPPDM). This goes against the current SDTM IG assumption 10.d for DM, which states “In the event that there are multiple informed consents, this will be the date of the first”; however, this will very likely change in the next SDTM version, so I suggest mapping to RFICDTC the date of the informed consent of the primary enrollment and not the first one.
SUBJID can also be kept in DS to distinguish the various enrollment/screening attempts, and likewise in any other domain where data from different enrollment attempts are mapped. This is also the recommendation in the FDA SDTCG, “even though it may cause validation errors,” as stated in the “Subject Identifier (SUBJID)” section. Again, this is something that might change in future versions of SDTM. As an alternative, some sponsors have used variables such as --GRPID to indicate which enrollment the information comes from, for example, “First Enrollment,” “Second Enrollment,” etc., and --SPID to store the corresponding SUBJID from DM.
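The informed consent mapping described above could look like the sketch below. It assumes a hypothetical list of collected consents keyed by SUBJID and reuses the invented identifiers from the earlier examples; the DS values shown (DSCAT "PROTOCOL MILESTONE", DSDECOD "INFORMED CONSENT OBTAINED") follow common SDTM practice but should be checked against your own standards.

```python
# Hypothetical informed consent dates, one per screening attempt (ISO 8601 strings).
consents = [
    {"SUBJID": "101", "ICDTC": "2022-01-10"},
    {"SUBJID": "102", "ICDTC": "2022-03-15"},
    {"SUBJID": "103", "ICDTC": "2022-06-01"},
]
USUBJID = "STUDY01-010-103"   # same individual across all attempts
PRIMARY_SUBJID = "103"        # SUBJID of the primary enrollment


def build_ds_consents(usubjid, consents):
    """One DS record per informed consent, keeping SUBJID to identify the attempt."""
    ordered = sorted(consents, key=lambda c: c["ICDTC"])
    return [
        {
            "USUBJID": usubjid,
            "SUBJID": c["SUBJID"],  # kept in DS; may trigger validation findings
            "DSSEQ": seq,
            "DSTERM": "INFORMED CONSENT OBTAINED",
            "DSDECOD": "INFORMED CONSENT OBTAINED",
            "DSCAT": "PROTOCOL MILESTONE",
            "DSSTDTC": c["ICDTC"],
        }
        for seq, c in enumerate(ordered, start=1)
    ]


def derive_rficdtc(consents, primary_subjid):
    """RFICDTC is taken from the primary enrollment, not from the first consent."""
    for c in consents:
        if c["SUBJID"] == primary_subjid:
            return c["ICDTC"]
    raise ValueError(f"No informed consent found for SUBJID {primary_subjid}")


ds = build_ds_consents(USUBJID, consents)
rficdtc = derive_rficdtc(consents, PRIMARY_SUBJID)

# Consents dated before RFICDTC are expected here and worth documenting in the cSDRG.
earlier = [r["SUBJID"] for r in ds if r["DSSTDTC"] < rficdtc]
print(rficdtc, earlier)
```

Listing the consents that precede RFICDTC makes it easier to explain to a reviewer why a conformance check on informed consent dates is expected to fire for these subjects.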
What about ADaM?
The same requirements discussed for SDTM apply to ADaM, so if the unique USUBJID throughout all study SDTM packages has been properly implemented, then you should not have any major problems in ADaM.
If you do not plan to include information about screen failure subjects in your ADaM package, my recommendation is also not to map any data related to screen failure attempts when these do not contribute to any analysis. An example is vital signs collected during previous failed screening attempts for a subject who was later randomized and had new “screening” vital signs collected.
ADaM has other problems to consider when integrating data from different studies in which subjects are enrolled in more than one study, or in which the same subject could be part of different types of cohorts. In the latter case, analysis periods may be assessed differently depending on the cohort in which the subject data are analyzed, potentially requiring one record per subject per cohort in ADSL (see also the Cytel experience with ADaM integration shared a few years back at the PHUSE-EU Interchange).[xi] The CDISC ADaM team did make a proposal,[xii] but the proposal and the draft guidance were never finalized and published, so the problem remains unresolved as of today.
As discussed at CDISC EU Interchange 2020,[xiii] “it is complicated to represent the multiple screening data without breaking few rules in SDTM/ADaM.” Examples include an informed consent in DS dated before the primary informed consent mapped in DM, or records in IE when a subject is rescreened and fails for the same reason. For this reason, it is important to openly discuss with your reviewer, for example through the Study Data Standardization Plan, the solution you intend to apply and which existing conformance criteria you expect to violate with your approach.
My final recommendation, to limit conformance issues, is to avoid being creative with the solutions you apply, or to use such solutions for internal purposes only, if you think they will help your organization, for example, to properly identify a subject enrolled multiple times within the same study or across studies.