This software brings together data about clinical trials from various sources and converts it into RDF (Resource Description Framework) format. By extracting relevant information and mapping it to RDF, this tool helps make clinical trial data more accessible and interoperable for semantic web applications.
- Extract PubChem CIDs from JSON files: Extracts Compound Identifiers (CIDs) from JSON files containing clinical trial data.
- Fetch clinical trial data from PubChem using CIDs: Uses extracted CIDs to query PubChem and retrieve detailed clinical trial information.
- Convert clinical trial data to RDF: Transforms the clinical trial data into RDF format, making it compatible with semantic web standards.
git clone <repository-url>
cd clinical-trials-rdf-converter
pip install .
Use the extract-cid command to extract PubChem CIDs from a source JSON file.
The JSON file can be downloaded from https://pubchem.ncbi.nlm.nih.gov/source/15362#data=Annotations
ct-rdf extract-cid --basefile <source_json_file> --targetfile <output_cid_file>
Example:
ct-rdf extract-cid --basefile clinical_trials.json --targetfile extracted_cids.csv
Use the pubchem-from-cid command to fetch clinical trial data from PubChem using the extracted CIDs.
ct-rdf pubchem-from-cid --cidfile <input_cid_file> --targetfolder <output_folder>
Example:
ct-rdf pubchem-from-cid --cidfile extracted_cids.csv --targetfolder pubchem_data/
Use the build-rdf command to convert the fetched clinical trial data into RDF format.
ct-rdf build-rdf --inputfolder <json_input_folder> --ttlfile <output_rdf_ttl_file>
Example:
ct-rdf build-rdf --inputfolder pubchem_data/ --ttlfile clinical_trials_data.ttl
The ct-rdf fetch-ct
command is used to fetch data from the ClinicalTrials.gov API and convert it to RDF format. This command processes a list of clinical trial identifiers, retrieves detailed information from ClinicalTrials.gov, and generates an RDF file containing the data.
To use this command, you need to specify the following arguments:
inputfile
: The CSV file containing the list of clinical trial identifiers.start
: The starting index for processing the identifiers.end
: The ending index for processing the identifiers.outputfolder
: The folder where the resulting Turtle (.ttl) file will be saved.
Command Syntax:
ct-rdf fetch-ct inputfile.csv outputfile.ttl --start <start> --end <end>
You can now load the newly generated .ttl files to your favorite triple store. You can run Jena with Fuseki using
docker run -it -p 3030:3030 stain/jena-fuseki
TODO
Please refer to the following links to obtain the source data files used for extracting PubChem CIDs:
This project is licensed under the MIT License.
Data from ClinicalTrials.gov may be subject to copyright
If you have suggestions for improvements or new features, please create an issue or submit a pull request. Contact