This document describes the All of Us NLP deliverables associated with data ingestion and quality control, intended to support alpha release requirements. It is version controlled; read the version that lives in the branch or tag you need. The specification should always be consistent with the implemented curation processes.
- `src` - Source code in Java
- `main` - Scripts for setup, maintenance, deployment, etc.
- `test` - Unit tests
- `docker` - Dockerfile with all tools necessary for running the package
- `config` - Cloud Build configuration
Please reference the developer guide for development setup, and ensure the required environment variables are set as indicated there.
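As a sketch, a small helper can fail fast when required environment variables are missing before a build or deployment. The helper name and the example variable names are assumptions; consult the developer guide for the authoritative list.

```shell
# Hypothetical helper: verify that the named environment variables are set.
# Returns non-zero if any are missing.
check_env() {
  local missing=0
  for var in "$@"; do
    # ${!var} is bash indirect expansion: the value of the variable named $var
    if [ -z "${!var}" ]; then
      echo "ERROR: required environment variable $var is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (variable names are assumptions):
#   check_env GOOGLE_APPLICATION_CREDENTIALS PROJECT_ID || exit 1
```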
The following command builds the package with Maven for a given profile:

```shell
mvn clean install -U -P {profile}
```

where `{profile}` can be `direct`, `spark`, `flink`, or `dataflow`.
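When verifying the runner bundles, it can be convenient to wrap the build in a small helper. `build_profile` is a hypothetical helper for illustration, not part of the repository:

```shell
# Sketch: build a chosen runner profile; the profile names mirror the list above.
build_profile() {
  # -U forces a check for updated snapshots; -P selects the Maven profile
  mvn clean install -U -P "$1"
}

# Example: build the Dataflow bundle
#   build_profile dataflow
```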
To deploy to Google Dataflow, use the following command:

```shell
java -cp target/curation-nlp-bundled-dataflow-1.2-SNAPSHOT.jar \
  org.allofus.curation.pipeline.CurationNLPMain \
  --runner=DataflowRunner \
  --gcpTempLocation={bucket}/gcp_tmp \
  --stagingLocation={bucket}/staging \
  --tempLocation={bucket}/tmp \
  --resourcesDir={bucket}/resources \
  --input={bucket}/input \
  --output={bucket}/output \
  --inputType=jsonl \
  --outputType=jsonl \
  --project={project} \
  --region={region} \
  --subnetwork={subnet} \
  --usePublicIps=false \
  --maxNumWorkers=5 \
  --numberOfWorkerHarnessThreads=2 \
  --workerMachineType=n1-highmem-4 \
  --diskSizeGb=50 \
  --experiments=use_runner_v2 \
  --pipeline={pipeline} \
  --maxClampThreads=4 \
  --maxOutputPartitionSeconds=60 \
  --maxOutputBatchSize=100 \
  [--streaming --enableStreamingEngine]
```

The bracketed `--streaming --enableStreamingEngine` flags are optional and run the pipeline as a streaming job.
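For repeated deployments, the placeholder values can be centralized in a small wrapper script. The following sketch (the function name and argument layout are assumptions) assembles the core of the command above; extend it with the remaining flags as needed:

```shell
# Sketch: build the core of the Dataflow launch command from a few
# placeholder settings. Not part of the repository; for illustration only.
build_launch_cmd() {
  local bucket="$1" project="$2" region="$3"
  echo "java -cp target/curation-nlp-bundled-dataflow-1.2-SNAPSHOT.jar" \
    "org.allofus.curation.pipeline.CurationNLPMain" \
    "--runner=DataflowRunner" \
    "--gcpTempLocation=$bucket/gcp_tmp" \
    "--stagingLocation=$bucket/staging" \
    "--tempLocation=$bucket/tmp" \
    "--project=$project" \
    "--region=$region"
}

# Example (placeholder values):
#   build_launch_cmd gs://my-bucket my-project us-central1
```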
All actors calling APIs in production will use service accounts.