Elasticsearch-powered search interface to browse publicly available corpora
Try it out here: HCDS Corpus Browser
- Clone this repository:
git clone https://github.com/uhh-lt/corpus-browser.git
- Navigate to the docker directory:
cd corpus-browser/docker
- Adjust the environment variables:
- Copy the .env.example file to .env:
cp .env.example .env
- Change UID and GID in .env to the numeric uid and gid values printed by:
id
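The id command prints several fields on one line; the numeric values are also available separately via id -u and id -g. A minimal sketch that prints them in KEY=value form for copying into .env (the helper name print_ids is illustrative and not part of the repository):

```shell
# Hypothetical helper, not part of the repository: prints the numeric
# user and group IDs of the current user in KEY=value form.
print_ids() {
  printf 'UID=%s\n' "$(id -u)"
  printf 'GID=%s\n' "$(id -g)"
}

print_ids
```

The two printed lines can then be compared with, or pasted over, the UID and GID entries in .env.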
- Start the docker containers:
docker compose up -d
- Visit http://localhost:13100/ in your browser
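The containers may need a moment before the page responds. One possible way to wait for it from the command line is sketched below; it assumes curl is installed, and the helper name wait_for_url and its default values are illustrative, not part of the repository:

```shell
# Hypothetical helper, not part of the repository: polls a URL with curl
# until it answers successfully or the attempts run out.
# $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 2)
wait_for_url() {
  url="$1"; attempts="${2:-30}"; delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failure, -sS: quiet but report errors,
    # -o /dev/null: discard the response body
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example call (requires the containers to be running):
# wait_for_url http://localhost:13100/ && echo "corpus browser is up"
```

The function only reports whether the URL answered; it does not check what the page contains.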
To import a corpus:
- Navigate to the importer directory:
cd corpus-browser/importer
- Install conda environment:
conda env create -f environment.yaml
- Activate conda environment:
conda activate corpus-browser
- Run the importer:
python importer.py --index germanu15 --input_dir ../data/uhh/json
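Before running the importer, it can help to confirm that the input directory actually contains data. A minimal sketch, assuming the corpus is stored as files ending in .json inside or below the input directory (this file layout is an assumption, and the helper name count_json_files is illustrative):

```shell
# Hypothetical helper, not part of the repository: counts regular files
# ending in .json inside or below the given directory.
count_json_files() {
  # tr strips the padding some wc implementations add around the number
  find "$1" -type f -name '*.json' | wc -l | tr -d ' '
}

# Example call, run from the importer directory:
# count_json_files ../data/uhh/json
```

A result of 0 suggests the path passed to --input_dir is wrong or the data has not been downloaded yet.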
To run the frontend in development mode:
- Navigate to the docker directory (cd corpus-browser/docker) and remove frontend from COMPOSE_PROFILES in the .env file
- Start the docker containers:
docker compose up -d
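Docker Compose reads COMPOSE_PROFILES as a comma-separated list of profile names, so removing frontend means deleting that one entry and keeping the rest. The sketch below shows the string manipulation only; the helper name remove_profile and the profile name backend are hypothetical, and the actual profile names are whatever .env.example defines:

```shell
# Hypothetical helper, not part of the repository.
# $1 = comma-separated profile list, $2 = profile name to drop.
remove_profile() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | grep -Fvx -- "$2" \
    | paste -sd ',' -
}

# "backend" is a made-up profile name used for illustration only.
remove_profile "backend,frontend" "frontend"   # prints: backend
```

The printed value is what would go after COMPOSE_PROFILES= in .env; editing the file by hand works just as well.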
- Navigate to the frontend directory (cd corpus-browser/frontend) and install all dependencies (npm i)
- Start the frontend:
npm run dev
- Visit http://localhost:5173/ in your browser