A little walk in the CDISC Library, hand in hand with SAS
The Christmas break presented an opportunity to take my first concrete steps into the CDISC Library. Overall, it was a pleasant “promenade”.
The CDISC Library forms the foundation of an ongoing transformation in the way we will access and use the CDISC standards to facilitate the long-awaited, end-to-end implementation of the data process (see also the CDISC 360 project).
With the availability of the CDISC Library, vendors can now develop software that gives you instant access to the standards, e.g., Controlled Terminology or data standards (for example SDTM). The standards are now available in a machine-readable, non-proprietary format.
Accessing the CDISC Library
While access to the CDISC standards in PDF format (or Wiki) and to the controlled terminology through the NCI portal will remain free of charge, access to the CDISC Library is granted only to CDISC members and open source developers (see here for more details).
The CDISC Library can be accessed in two ways:
- Through the CDISC Library API (I’ll cover this later)
- Through the CDISC Library dashboard, where you can interactively browse the metadata repository in which the CDISC standards are hosted. The dashboard also provides the option to export reports in Excel (or CSV) format. See below some screenshots.
API and JSON data format
The above may not seem to be anything special, but let’s go back to the CDISC Library API and let me introduce a couple of technical concepts: the API and the JSON format.
API (and REST)
An Application Programming Interface (API) is essentially a layer that lets two or more applications communicate with each other through a defined set of procedures. APIs usually follow an architectural style, and the most popular one is REST (REpresentational State Transfer). In this architecture, which is based on a client/server model, the client (e.g., a user through a client application) calls the REST web service by making a “request” and the server (the REST web service) returns a “response”. The CDISC Library API is RESTful, meaning it meets the de facto standard set of rules (or constraints) that a REST web service is expected to follow.
A request to a RESTful service consists of 4 elements: endpoint, method, headers and data. For the purpose of accessing resources from the CDISC Library API, what matters is the “endpoint”, which is essentially a string containing the URL (Uniform Resource Locator), and the “method”, which in the case of the CDISC Library API is limited to the GET method - a method that lets you query the CDISC Library API.
For example, the following URL https://library.cdisc.org/api/mdr/ct/packages/sdtmct-2020-06-26 returns the Q2 2020 SDTM CDISC-CT; in this URL, https://library.cdisc.org/api represents the “root-endpoint” for the CDISC Library API, and /mdr/ct/packages/sdtmct-2020-06-26 is the specific path where our needed resource, the Q2 2020 SDTM CDISC-CT, is located.
We now know that a web service based on REST is composed of two main elements: Request and Response. By default, the CDISC Library API returns a response to the client in the JSON format, but other formats are also available, for example XML and CSV.
Querying the CDISC Library API from SAS
We mentioned before that APIs are based on the client/server model. Through the CDISC Library API, we can access the web server containing the CDISC data standards from a “client”. A client can be an end-user application that has integrated “requests” to the CDISC Library API, or a “script” written in a computer language such as SAS.
Because REST web services, such as the CDISC Library API, use HTTP, we need a way to access URLs over HTTP and a method to “interpret” the response. SAS offers several ways to do this; in particular, we can use PROC HTTP to send “requests” to the CDISC Library, and the JSON LIBNAME engine (available since SAS 9.4M4) to interpret, or import, the “response”.
For example, with the following SAS code we can query the CDISC Library API to get the full list of available standards and versions (the hidden “api-key” can be generated in the CDISC Library API Portal):
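The original listing is not reproduced here, so what follows is a minimal sketch rather than the author's exact code. It assumes the API key is stored in a macro variable called cdisc_api_key, that the response is written to a temporary fileref called resp (both names are illustrative), and that the request targets the /mdr/ct/packages path, chosen because it follows the URL pattern shown earlier.

```sas
/* Assumption: replace the placeholder with your own key from the API Portal */
%let cdisc_api_key = YOUR-API-KEY;

/* Temporary file that receives the JSON response (fileref name is illustrative) */
filename resp temp;

/* Send a GET request; the key is passed in the "api-key" request header */
proc http
  url="https://library.cdisc.org/api/mdr/ct/packages"
  method="GET"
  out=resp;
  headers "api-key"="&cdisc_api_key"
          "Accept"="application/json";
run;
```

The request only succeeds with a valid key, so it is worth checking the SAS log for the HTTP status before trying to read the response.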
The returned JSON file is a long string containing the response to the query.
Through the JSON engine and the following PROC COPY, the JSON content can be interpreted and automatically saved into a number of different SAS datasets. The list of SAS datasets created depends on the type of request/query.
libname space json fileref=resp; /* resp: assumed fileref holding the JSON response */
proc copy inlib=space outlib=work; run;
For example, the SAS dataset _links_packages contains one record for each type of controlled terminology (SDTM, CDASH and ADaM) and available versions.
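To check what was actually created, the copied datasets can be listed and the HREF values printed. This is only a small sketch; it assumes the PROC COPY step has already run, so that _links_packages sits in the WORK library.

```sas
/* List every dataset now present in WORK, including those copied from the JSON libref */
proc datasets lib=work;
quit;

/* Show the first few package paths returned by the API */
proc print data=work._links_packages(obs=10);
  var href;
run;
```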
The variable HREF contains the specific “path” to be appended to the CDISC Library API “root-endpoint” to get information about each individual CDISC controlled terminology package; therefore, if we want to access the latest available ADaM Controlled Terminology, we would need to replace the “path” in the URL of the above PROC HTTP syntax as follows:
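Rather than typing a specific path, one possible approach is to pick it up from the HREF variable and build the URL with macro variables. The sketch below is an illustration under several assumptions: WORK._LINKS_PACKAGES exists, the ADaM package paths contain the string adamct and end in an ISO date (by analogy with sdtmct-2020-06-26), and the macro variable cdisc_api_key holds your key. The names root, ct_path, ctresp and adamct are made up for this example.

```sas
/* Root endpoint of the CDISC Library API, as described in the text */
%let root = https://library.cdisc.org/api;

/* Assumption: paths end in an ISO date, so the highest HREF is the most recent package */
proc sql noprint;
  select max(href) into :ct_path trimmed
  from work._links_packages
  where href contains 'adamct';
quit;

/* Temporary file for the package response (fileref name is illustrative) */
filename ctresp temp;

/* Root endpoint plus the selected path gives the full URL */
/* Assumption: cdisc_api_key was set beforehand with %let */
proc http
  url="&root.&ct_path"
  method="GET"
  out=ctresp;
  headers "api-key"="&cdisc_api_key"
          "Accept"="application/json";
run;

/* Read the package content through the JSON engine */
libname adamct json fileref=ctresp;
```

Selecting the path from the data avoids editing the program each time a new package is published, at the cost of relying on the naming pattern of the paths.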
Overall, everything looks promising, although I have the impression that some aspects are still a “work in progress”. For example, there seems to be some delay between the time a new version of controlled terminology is released and the time it is made available in the CDISC Library (at the time I’m writing this article, the latest SDTM controlled terminology implemented in the CDISC Library is the 2020-Q2 release, 2020-06-26, whereas three more versions have been released since then). It would also be good to be able to query individual metadata attributes, such as the label of a standard variable, or to query a single metadata attribute for a set of standard variables.
Whether it makes sense to use the CDISC Library API in traditional SAS applications, such as a set of SAS programs to migrate legacy datasets to SDTM, or ADaM SAS programs, requires more assessment. So, if you want to know more about my journey into the CDISC Library, and for more “Tech-Enabled Standards” presentations, make sure you attend the next CDISC-EU Interchange, April 28-30, 2021 (see the draft agenda and how to register for the event).
If you want to know more about the CDISC Library, I also recommend the following readings:
Angelo Tinazzi is Senior Director, Statistical Programming, Clinical Data Standards and Clinical Data Submission at Cytel. He is a well-published and recognized expert in statistical programming with over 20 years' experience in clinical research. The application of CDISC standards in different therapeutic areas has been part of his core expertise since 2003, in particular in the context of data submission to health authorities such as the FDA and PMDA.
Angelo is an authorized CDISC instructor and member of the CDISC ADaM Team as well as the CDISC European Committee where he also manages the Italian-speaking CDISC User Network.