Metadata Repositories: Overcoming Challenges with Automation

CDISC | data submission | metadata

February 23, 2024

Written by Angelo Tinazzi, Nicolas Rouillé, and Sebastià Barceló

In the realm of standards management, companies of all sizes are increasingly exploring the potential of metadata repositories (MDRs). From protocol development to eCRF, SDTM, and ADaM to Analysis Results, these repositories are being used to speed up study setup and delivery. Sponsors leverage metadata using a framework that involves putting together a data governance team that establishes and pilots company standards, defines roles, establishes workflows, develops standard operating procedures, and provides necessary training. This structured framework is supported by selecting or building a metadata repository that aligns with the established infrastructure.

However, unlike sponsors, CROs encounter specific challenges when implementing standards management or metadata use, given that each sponsor has unique processes and not all aspects of a clinical trial are always managed by a single CRO.

Here, we share a new approach to automation to address the challenges inherent in metadata repositories.

Use of metadata: Sponsors vs. CROs

The key distinction between sponsors and CROs is that sponsors have the “autonomy” to deploy their standards framework across their portfolio, on a limited number of indications, potentially integrating studies within a metadata repository while coexisting with off-MDR studies.

On the other hand, CROs that choose to implement MDRs operate within diverse client contexts where they must adapt to each client’s standards and requirements (or lack thereof), managing variability in data governance practices and technical choices. The scope of CRO contracts further adds nuance, as the benefit derived from MDR usage may vary based on the contracted services, ranging from full-service biometrics to limited reporting services. Lastly, in addition to compliance with health authorities’ submission requirements, CROs are required to adapt to the evolving sponsor strategies and technical choices.

Challenges in implementing a metadata repository

Both sponsor organizations and CROs share a common objective of reducing the duration of clinical trials and expediting the time to market for new products. Meeting the increasingly complex requirements of health authorities necessitates the development of technical strategies that align with the expected level of standardization, supported by meticulous documentation highlighting traceability, linkage of data elements, and detailed descriptions.

Moreover, we also need to consider the adoption of multiple programming languages like R and Python, and the impact this has on data workflows, process and resource configurations within a company, and outsourcing engagements between companies.

A new approach to addressing metadata usage challenges

There are numerous MDR solutions available, yet no single dominant leader has emerged. The proliferation of MDR solutions adds complexity to centralizing and managing metadata, with variations in data governance team assembly and standards implementation.

Contemporary MDR solutions tend to emphasize standardization in upstream artifacts like protocol and case report form development, leaving significant limitations in consuming metadata for statistical analysis deliverables such as ADaM; Tables, Listings, and Figures (TLFs); statistical analysis plans (SAPs); and study mock shell documents. Given these constraints, while still looking at the market options and evolving open-source initiatives such as “The Open Study Builder” from the CDISC COSA Initiative, we’ve pursued an alternative approach to harnessing metadata.

Our automation team’s approach involves a strategic division of the data workflow based on the type of deliverables, such as eCRF (CDASH), SDTM, SAP, study mock shells, ADaM, and TLFs. We identify and enrich metadata from each artifact to fully automate the production of a given deliverable. This user-centered strategy breaks down the challenge into manageable components, attributing each artifact to a specific function (e.g., data management, biostatistics, statistical programming). This approach provides subject-matter expertise and directly supports the conceptualization and testing of automation tools. It also accelerates tool development, allowing parallel development of multiple tools and asynchronous release. This helps to significantly speed up the release cadence and accommodates potential failures in tool development without hindering the use of tools created for other upstream or downstream steps of the workflow.

The Cytel PRISM application (presented at PHUSE-EU 2022 and 2023) is one example of how we broke down the challenge into manageable components.^1,2 With PRISM, we capture tagged metadata directly from study mock shells developed at Cytel with Lighthouse, another tool that supports Cytel biostatisticians in developing SAP mock shells from a standard library, and then automatically generate TLF programs in either SAS or R. At this stage, PRISM automates on average about 60% of the outputs needed for a study (see applied workflow in Figure 1).

Figure 1: Cytel analysis results standard ARS-driven generation of TLFs
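
To make the idea of metadata-driven program generation more concrete, here is a minimal Python sketch. It is not PRISM's actual implementation: the metadata fields (output_id, population_flag, and so on), the function name render_r_program, and the shape of the generated R code are all assumptions chosen for illustration. It only shows how tagged attributes of a single mock shell could, in principle, be rendered into a small R program.

```python
# Hypothetical metadata for one mock shell. The field names and values are
# illustrative assumptions, not the tags used by PRISM or Lighthouse.
shell_metadata = {
    "output_id": "T_14_1_1",
    "title": "Summary of Demographics",
    "dataset": "ADSL",
    "population_flag": "SAFFL",
    "treatment_var": "TRT01A",
    "analysis_vars": ["AGE", "SEX"],
}


def render_r_program(meta: dict) -> str:
    """Render a small R summary program from one shell's metadata."""
    variables = ", ".join(meta["analysis_vars"])
    lines = [
        # Header comment taken from the shell's identifier and title
        f"# {meta['output_id']}: {meta['title']}",
        "library(dplyr)",
        # Read the analysis dataset named in the metadata
        f"adam <- haven::read_xpt('{meta['dataset'].lower()}.xpt')",
        "result <- adam %>%",
        # Restrict to the analysis population and group by treatment
        f"  filter({meta['population_flag']} == 'Y') %>%",
        f"  group_by({meta['treatment_var']}) %>%",
        # Count non-missing values for each analysis variable
        f"  summarise(across(c({variables}), list(n = ~sum(!is.na(.x)))))",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_r_program(shell_metadata))
```

In a sketch like this, supporting SAS as well as R would mean adding a second rendering function that reads the same metadata, which is one reason to keep the metadata independent of the target language.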

The value of metadata can be maximized in clinical trial delivery by starting with the metadata inherent in study artifacts. This divergent approach accommodates a multitude of sponsor standards and delivery requirements without sacrificing the benefits of automation within an ecosystem of interdependencies between regulatory authorities, industry consortia, sponsors, CROs, and third-party technology vendors. Using this approach, we have streamlined the automation of TLF production.

About Angelo Tinazzi

Angelo Tinazzi is Senior Director, Statistical Programming, Clinical Data Standards, and Clinical Data Submission at Cytel. Angelo has over 25 years’ experience in clinical research with expertise in statistical programming and the application of CDISC standards across different therapeutic areas, in particular, with data submission to health authorities such as the FDA and PMDA.

About Nicolas Rouillé

Nicolas Rouillé is Senior Director, Statistical Programming, for Cytel’s Projects Based Services (PBS) business unit in Europe. He supervises statistical programming for the “Analysis” projects and has contributed to the expansion of Cytel’s service team and locations since 2013. With a biostatistics background, Nicolas has more than 20 years of experience in the pharmaceutical industry in different roles within biometrics groups.

About Sebastià Barceló

Sebastià Barceló is Principal Statistical Programmer at Cytel in Geneva, mainly working on automation initiatives. He has more than 9 years of experience in clinical research, in the areas of data management and statistical programming, in different roles at CROs in Spain and Switzerland.

Notes

1. “Leveraging the Analysis Results Standard (ARS): The Cytel PRISM Experience,” PHUSE 2023.
2. “From Artifact to Metadata: A Divergent Approach to Automation, through Standardization,” PHUSE 2022.
