CARE-SM Toolkit

Why to use it?

The implementation of the Clinical And Registry Entries (CARE) Semantic Model for CSV data entails a meticulous and technically advanced workflow. By leveraging the power of the CARE-SM, YARRRML templates and incorporating the critical curation step executed by the CARE-SM toolkit, this implementation achieves robustness, accuracy, and reliability in generating RDF-based CARE SM-oriented patient data.

The toolkit serves as a module dedicated to performing a curation step prior to the conversion of data into RDF. The primary transformations carried out by the toolkit include:

  • Quality control for column names.

  • Adding every domain specific ontological term required to define every instances of the model, these terms are specific for every data element.

  • Splitting the column labeled as value into distinct datatypes. This enables YARRRML to interpret each datatype differently, facilitating the subsequent processing.

  • Conducting a quality control among age/date, startdate and enddate columns to ensure data consistency and validity.

  • Eliminating any row that lacks the minimal required data to minimize the generation of incomplete RDF transformations.

  • Creation of the column called uniqid that assigns a unique identifier to each observation. This prevents the RDF instances from overlapping with one another, ensuring their distinctiveness and integrity.

Requirements

  • In order to use CARE-SM Toolkit functionality:

  • All CSV files MUST be named according the data tags described at the CARE-SM glossary, documented at CARE-SM implementation E.g.: Diagnosis.csv, Birthdate.csv

  • All your CSV data content MUST be compatible with the CARE-SM glossary, documented at CARE-SM implementation

Docker

There’s a Docker-based implementation controlled via API (using FastAPI) that you can use for mounting this data transformation step as a part of your CARE-SM implementation.

You can edit the docker-compose.yaml to control the volume folder in order to pass your CSV-based patient data:

version: "3.8"

services:
  api:
    image: pabloalarconm/care-sm-toolkit:1.1.0 # Check latest version
    ports:
      - "8000:8000"
    volumes:
      - ./location/of/your/data:/code/data

Note IP and Port can be customized in the docker compose as well.

Run docker compose to start the containers:

 docker compose up -d

Once its running, you can use in your browser the OpenAPI documentation at http://localhost:8000/docs so inspect all the possible requests and trigger the execution

Alternatively, you trigger the data transformation in the terminal by the following:

curl -X POST http://localhost:8000/toolkit

Congrats! You will find your transformed data, stored as CARE.csv at the folder you defined as volume below.

To stop and remove the implementation, do the following:

docker compose down

Local implementation

If you are not interested in running our Docker image, you can install the Python module for local implementation.

pip install care-sm-toolkit==1.1.0

Then, change the folder path inside the trial.py script. And run it:

python3 trial.py