This repository has been archived by the owner on Jun 2, 2023. It is now read-only.

README-impala.md

File metadata and controls

94 lines (71 loc) · 3.56 KB

Make sure you have Impala running on a cluster with the Common Data Model data loaded as described in https://github.com/OHDSI/CommonDataModel/tree/master/Impala.

Before running Achilles, create a database for the Achilles tables:

impala-shell -q 'CREATE DATABASE achilles'

Download the Impala JDBC drivers from the Cloudera website, and install them in a local directory called ~/impala-drivers/Cloudera_ImpalaJDBC4_2.5.36.

In R, use the following commands to install the Impala development version of Achilles. I used Docker to get all the prerequisites, running docker run -it --rm -v ~/impala-drivers/:/impala-drivers achilles_achilles bash, which mounts the local driver directory at /impala-drivers inside the container.

install.packages("devtools")
library(devtools)

install_github("tomwhite/SqlRender", ref="impala-timestamp")
library(SqlRender)
install_github("ohdsi/DatabaseConnector")
install_github("ohdsi/Achilles")
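
As a quick sanity check after installation, the following sketch reports which of the three packages R can find. It uses only base R's requireNamespace and is not part of Achilles; the variable names are illustrative.

```r
# Packages installed by the commands above
required <- c("SqlRender", "DatabaseConnector", "Achilles")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Show the status of each package and list any that are missing
print(installed)
notInstalled <- required[!installed]
if (length(notInstalled) > 0) {
  message("Not installed: ", paste(notInstalled, collapse = ", "))
}
```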

To run Achilles for a single, simple analysis, try:

library(Achilles)
connectionDetails <- createConnectionDetails(dbms="impala", 
                                             server="bottou01.sjc.cloudera.com",
                                             schema="omop_cdm_parquet",
                                             pathToDriver = "/impala-drivers/Cloudera_ImpalaJDBC4_2.5.36")
achillesResults <- achilles(connectionDetails, cdmDatabaseSchema="omop_cdm_parquet",
                            resultsDatabaseSchema="achilles", sourceName="Impala trial", runHeel = FALSE,
                            cdmVersion = "5", vocabDatabaseSchema="omop_cdm_parquet", analysisIds = c(1))

Have a look at the output:

impala-shell -q 'select * from achilles.achilles_results'
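
The same check can be done from R. The sketch below assumes the connectionDetails object created earlier and uses connect, querySql and disconnect from DatabaseConnector. The names buildResultsQuery and fetchAchillesResults are hypothetical helpers introduced here for illustration, not part of Achilles.

```r
# Hypothetical helper: build a query for one analysis in the results schema
buildResultsQuery <- function(resultsSchema, analysisId) {
  sprintf("SELECT * FROM %s.achilles_results WHERE analysis_id = %d",
          resultsSchema, as.integer(analysisId))
}

# Hypothetical helper: run the query and return the rows as a data frame
fetchAchillesResults <- function(connectionDetails, resultsSchema = "achilles", analysisId = 1) {
  connection <- DatabaseConnector::connect(connectionDetails)
  # Close the connection even if the query fails
  on.exit(DatabaseConnector::disconnect(connection))
  DatabaseConnector::querySql(connection, buildResultsQuery(resultsSchema, analysisId))
}

# Usage, once Impala is reachable:
# results <- fetchAchillesResults(connectionDetails)
# head(results)
```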

The following is a more complex analysis:

achillesResults <- achilles(connectionDetails, cdmDatabaseSchema="omop_cdm_parquet",
                            resultsDatabaseSchema="achilles", sourceName="Impala trial", runHeel = FALSE,
                            cdmVersion = "5", vocabDatabaseSchema="omop_cdm_parquet", analysisIds = c(105))

This is how you can run a set of analyses by exclusion; here, every analysis with a positive ID except 110 is run:

# All analysis IDs known to Achilles
allAnalyses <- getAnalysisDetails()$analysis_id
# Exclude analysis 110
subset1 <- setdiff(allAnalyses, c(110))
# Keep only positive IDs
subset2 <- Filter(function(x) { x > 0 }, subset1)
achillesResults <- achilles(connectionDetails, cdmDatabaseSchema="omop_cdm_parquet",
                            resultsDatabaseSchema="achilles", sourceName="Impala trial", runHeel = FALSE, createTable = FALSE,
                            cdmVersion = "5", vocabDatabaseSchema="omop_cdm_parquet", 
                            analysisIds = subset2)

Note that createTable has been set to FALSE so that the initial tables are not created, to save time.
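
To see what the exclusion steps do without connecting to Impala, here is a small worked example. The IDs are made up and stand in for the output of getAnalysisDetails(); the non-positive values are included only to show the effect of the filter.

```r
# Made-up analysis IDs standing in for getAnalysisDetails()$analysis_id
allAnalyses <- c(-1, 0, 1, 105, 110, 200)

# Drop analysis 110
subset1 <- setdiff(allAnalyses, c(110))

# Keep only IDs greater than zero
subset2 <- Filter(function(x) { x > 0 }, subset1)

# Leaves 1, 105 and 200
print(subset2)
```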

Finally, run all of the analyses with:

achillesResults <- achilles(connectionDetails, cdmDatabaseSchema="omop_cdm_parquet",
                            resultsDatabaseSchema="achilles", sourceName="Impala trial", runHeel = FALSE,
                            cdmVersion = "5", vocabDatabaseSchema="omop_cdm_parquet")
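
Since the calls above differ only in analysisIds and createTable, the shared arguments can be collected in one place. This is a sketch, not part of Achilles: buildAchillesArgs is a hypothetical helper, and its createTable = TRUE default is an assumption based on the note above that FALSE skips table creation.

```r
# Hypothetical helper: collect the arguments shared by the achilles() calls
buildAchillesArgs <- function(connectionDetails, analysisIds = NULL, createTable = TRUE) {
  args <- list(connectionDetails,
               cdmDatabaseSchema = "omop_cdm_parquet",
               resultsDatabaseSchema = "achilles",
               sourceName = "Impala trial",
               runHeel = FALSE,
               createTable = createTable,  # assumed default, see lead-in
               cdmVersion = "5",
               vocabDatabaseSchema = "omop_cdm_parquet")
  # Leave analysisIds out entirely to run every analysis
  if (!is.null(analysisIds)) {
    args$analysisIds <- analysisIds
  }
  args
}

# Usage, with Achilles loaded and connectionDetails defined:
# achillesResults <- do.call(achilles, buildAchillesArgs(connectionDetails, analysisIds = c(1)))
# achillesResults <- do.call(achilles, buildAchillesArgs(connectionDetails))
```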

You can uninstall the packages with:

remove.packages("Achilles")
remove.packages("DatabaseConnector")
remove.packages("SqlRender")

If you want to delete all the Achilles results, use the following:

impala-shell -q 'drop database achilles cascade'

Heel implementation for Impala

Currently, only the Achilles precomputations are fully developed for Impala. No work has been done on the Heel component of Achilles; note that the examples above all set runHeel = FALSE.
