(Almost) Universal batch job dockerfile #613

Open
wants to merge 7 commits into
base: uv_at_last
Changes from 2 commits
16 changes: 0 additions & 16 deletions batch/gdal-python.dockerfile

This file was deleted.

16 changes: 0 additions & 16 deletions batch/postgresql-client.dockerfile

This file was deleted.

16 changes: 0 additions & 16 deletions batch/tile_cache.dockerfile

This file was deleted.

66 changes: 66 additions & 0 deletions batch/universal_batch.dockerfile
@@ -0,0 +1,66 @@
FROM ghcr.io/osgeo/gdal:ubuntu-full-3.9.3
Collaborator

Seems like a big jump from gdal v1.2.2 in the gdal-python.dockerfile to gdal v3.9.3? Are you sure there are no incompatibilities? I guess we just need to do a lot of testing?

Member Author

Oh, no, that's just the (somewhat arbitrary) version of the dockerfile. v1.2.1 corresponds to GDAL v3.8.5 or so. So it's not that big a version bump. See here: https://github.com/wri/gfw-dockerfiles/blob/master/data-api-gdal.dockerfile#L1
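
To make the distinction concrete: the `LABEL version` in a dockerfile and the GDAL release baked into its base image are independent values. A minimal Python sketch, assuming the base image tag ends in the GDAL release (as `ubuntu-full-3.9.3` does in this PR); the helper name `dockerfile_versions` is hypothetical and not part of this repo:

```python
import re


def dockerfile_versions(dockerfile_text: str) -> dict:
    """Return the base image tag and the LABEL version found in a Dockerfile.

    Hypothetical helper for illustration; it only handles the simple
    `FROM image:tag` and `LABEL version="..."` forms used in this PR.
    """
    from_match = re.search(r"^FROM\s+\S+:(\S+)", dockerfile_text, re.MULTILINE)
    label_match = re.search(
        r'^LABEL\s+version="([^"]+)"', dockerfile_text, re.MULTILINE
    )
    return {
        "base_image_tag": from_match.group(1) if from_match else None,
        "label_version": label_match.group(1) if label_match else None,
    }


# First lines of batch/universal_batch.dockerfile as shown in this diff.
sample = "\n".join([
    "FROM ghcr.io/osgeo/gdal:ubuntu-full-3.9.3",
    'LABEL desc="Docker image with ALL THE THINGS for use in Batch by the GFW data API"',
    'LABEL version="v1.1"',
])

versions = dockerfile_versions(sample)
print(versions)
```

The two values it reports come from unrelated lines, which is why comparing a dockerfile label against a GDAL release number is misleading.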

LABEL desc="Docker image with ALL THE THINGS for use in Batch by the GFW data API"
LABEL version="v1.1"

ENV TIPPECANOE_VERSION=2.72.0
Collaborator

I assume it is intentional that you are upgrading tippecanoe so much from v1.3.1 to v2.72.0?

Member Author

ENV VENV_DIR="/.venv"

RUN apt-get update -y \
&& apt-get install --no-install-recommends -y python3 python-dev-is-python3 python3-venv \
postgresql-client jq curl libsqlite3-dev zlib1g-dev zip libpq-dev build-essential gcc g++ \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

RUN ln -s /usr/include /usr/include/gdal

# --system-site-packages is needed to copy the GDAL Python libs into the venv
RUN python -m venv ${VENV_DIR} --system-site-packages \
&& . ${VENV_DIR}/bin/activate \
&& python -m ensurepip --upgrade \
&& python -m pip install \
agate~=1.12.0 \
asyncpg~=0.30.0 \
awscli~=1.36.18 \
awscli-plugin-endpoint~=0.4 \
boto3~=1.35.77 \
click~=8.1.7 \
csvkit~=2.0.1 \
earthengine-api~=0.1.408 \
fiona~=1.9.6 \
gsutil~=5.31 \
numpy~=1.26.4 \
pandas~=2.1.4 \
psycopg2~=2.9.10 \
rasterio~=1.3.11 \
setuptools~=75.6 \
shapely~=2.0.4 \
SQLAlchemy~=1.3.24 \
tileputty~=0.2.10

RUN ln -s /usr/include /usr/include/gdal
Collaborator

Why did you need to do this here and also at line 15? Probably an unneeded duplicate.

Also, can you put a comment here on why you need to do this at all? Are the gdal header files all in /usr/include/* and you also want to make them appear under /usr/include/gdal/*?

Member Author

Oops, unneeded duplicate almost certainly! As to why, yes, I think it was just that. I'll add a comment. And possibly try omitting it, in case it's changed since it was necessary.
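
A duplicate like this is easy to catch mechanically. A rough Python sketch that reports `RUN` instructions appearing more than once in a dockerfile; the function name `find_duplicate_runs` is made up for illustration, and it only understands backslash line continuations, not heredocs or parser directives:

```python
from collections import Counter


def find_duplicate_runs(dockerfile_text: str) -> list[str]:
    """Return RUN instructions that appear more than once, in first-seen order."""
    instructions = []
    current = ""
    for raw_line in dockerfile_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("\\"):
            current += line[:-1].strip() + " "  # join continued lines
            continue
        current += line
        instructions.append(" ".join(current.split()))  # normalize whitespace
        current = ""
    counts = Counter(i for i in instructions if i.upper().startswith("RUN "))
    return [instr for instr, n in counts.items() if n > 1]


# Abbreviated stand-in for the dockerfile in this PR.
sample = "\n".join([
    "RUN ln -s /usr/include /usr/include/gdal",
    "RUN python -m venv /.venv --system-site-packages \\",
    "    && python -m ensurepip --upgrade",
    "RUN ln -s /usr/include /usr/include/gdal",
])

print(find_duplicate_runs(sample))
```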


# Install TippeCanoe
RUN mkdir -p /opt/src
WORKDIR /opt/src
RUN curl https://codeload.github.com/felt/tippecanoe/tar.gz/${TIPPECANOE_VERSION} | tar -xz \
&& cd /opt/src/tippecanoe-${TIPPECANOE_VERSION} \
&& make \
&& make install \
&& rm -R /opt/src/tippecanoe-${TIPPECANOE_VERSION}

# Copy scripts
COPY ./batch/scripts/ /opt/scripts/
COPY ./batch/python/ /opt/python/

# Make sure scripts are executable
RUN chmod +x -R /opt/scripts/
RUN chmod +x -R /opt/python/

ENV PATH="/opt/scripts:${PATH}"
ENV PATH="/opt/python:${PATH}"

ENV WORKDIR="/"
WORKDIR /

ENTRYPOINT ["/opt/scripts/report_status.sh"]
4 changes: 1 addition & 3 deletions scripts/develop
@@ -24,9 +24,7 @@ done
set -- "${POSITIONAL[@]}" # restore positional parameters

if [ "${BUILD}" = true ]; then
- docker build -t batch_gdal-python_test . -f batch/gdal-python.dockerfile
- docker build -t batch_postgresql-client_test . -f batch/postgresql-client.dockerfile
- docker build -t batch_tile_cache_test . -f batch/tile_cache.dockerfile
+ docker build -t batch_jobs_test . -f batch/universal_batch.dockerfile
docker build -t pixetl_test . -f batch/pixetl.dockerfile
docker compose -f docker-compose.dev.yml --project-name gfw-data-api_dev up --abort-on-container-exit --remove-orphans --build
else
4 changes: 1 addition & 3 deletions scripts/test
@@ -60,9 +60,7 @@ if [ $# -eq 0 ]; then
fi

if [ "${BUILD}" = true ]; then
- docker build -t batch_gdal-python_test . -f batch/gdal-python.dockerfile
- docker build -t batch_postgresql-client_test . -f batch/postgresql-client.dockerfile
- docker build -t batch_tile_cache_test . -f batch/tile_cache.dockerfile
+ docker build -t batch_jobs_test . -f batch/universal_batch.dockerfile
docker build -t pixetl_test . -f batch/pixetl.dockerfile
docker compose -f docker-compose.test.yml --project-name gfw-data-api_test build --no-cache app_test
fi
4 changes: 1 addition & 3 deletions scripts/test_v2
@@ -52,9 +52,7 @@ if [ $# -eq 0 ]; then
fi

if [ "${BUILD}" = true ]; then
- docker build -t batch_gdal-python_test . -f batch/gdal-python.dockerfile
- docker build -t batch_postgresql-client_test . -f batch/postgresql-client.dockerfile
- docker build -t batch_tile_cache_test . -f batch/tile_cache.dockerfile
+ docker build -t batch_jobs_test . -f batch/universal_batch.dockerfile
docker build -t pixetl_test . -f batch/pixetl.dockerfile
docker compose -f docker-compose.test.yml --project-name gfw-data-api_test build --no-cache app_test
fi
6 changes: 3 additions & 3 deletions terraform/main.tf
@@ -50,7 +50,7 @@ module "batch_gdal_python_image" {
image_name = substr(lower("${local.project}-gdal_python${local.name_suffix}"), 0, 64)
root_dir = "${path.root}/../"
docker_path = "batch"
- docker_filename = "gdal-python.dockerfile"
+ docker_filename = "universal_batch.dockerfile"
Collaborator

Can you have a single module "universal_batch_image" at line 48, adjust the image_name at line 50, remove lines 65-81, and then adjust lines 224, 226, and 227 to use the same docker url based on the common image name? That way, you're only creating one docker image instead of 3 during each data API deployment.

Member Author

Good point, will combine, thanks!
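
The effect of the suggestion can be sketched outside Terraform. The Python below mirrors the `substr(lower(...), 0, 64)` expression used for `image_name` and shows three job definitions resolving to one image. The `universal_batch` name segment, the sample project and suffix values, and the job-definition keys are placeholders, not values taken from this repo:

```python
def image_name(project: str, component: str, name_suffix: str) -> str:
    """Python equivalent of substr(lower("${project}-${component}${suffix}"), 0, 64)."""
    return f"{project}-{component}{name_suffix}".lower()[:64]


# Placeholder inputs; the real values come from Terraform locals.
shared_image = image_name("GFW-Data-API", "universal_batch", "-dev")

# One image built per deployment, referenced by all three job definitions.
job_images = {
    "gdal_python": shared_image,
    "postgresql_client": shared_image,
    "tile_cache": shared_image,
}

print(shared_image)
print(len(set(job_images.values())))
```

With a single image name, the build happens once and every job definition points at the same docker url.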

}

# Docker image for PixETL Batch jobs
@@ -68,7 +68,7 @@ module "batch_postgresql_client_image" {
image_name = substr(lower("${local.project}-postgresql_client${local.name_suffix}"), 0, 64)
root_dir = "${path.root}/../"
docker_path = "batch"
- docker_filename = "postgresql-client.dockerfile"
+ docker_filename = "universal_batch.dockerfile"
}

# Docker image for Tile Cache Batch jobs
@@ -77,7 +77,7 @@ module "batch_tile_cache_image" {
image_name = substr(lower("${local.project}-tile_cache${local.name_suffix}"), 0, 64)
root_dir = "${path.root}/../"
docker_path = "batch"
- docker_filename = "tile_cache.dockerfile"
+ docker_filename = "universal_batch.dockerfile"
}


6 changes: 3 additions & 3 deletions tests/conftest.py
@@ -181,11 +181,11 @@ def patch_run(self, *k, **kwargs):
ON_DEMAND_COMPUTE_JOB_QUEUE, cogify_env["computeEnvironmentArn"]
)

- aws_mock.add_job_definition(GDAL_PYTHON_JOB_DEFINITION, "batch_gdal-python_test")
+ aws_mock.add_job_definition(GDAL_PYTHON_JOB_DEFINITION, "batch_jobs_test")
aws_mock.add_job_definition(
- POSTGRESQL_CLIENT_JOB_DEFINITION, "batch_postgresql-client_test"
+ POSTGRESQL_CLIENT_JOB_DEFINITION, "batch_jobs_test"
)
- aws_mock.add_job_definition(TILE_CACHE_JOB_DEFINITION, "batch_tile_cache_test")
+ aws_mock.add_job_definition(TILE_CACHE_JOB_DEFINITION, "batch_jobs_test")
aws_mock.add_job_definition(PIXETL_JOB_DEFINITION, "pixetl_test", mount_tmp=True)

yield aws_mock.mocked_services["batch"]["client"], aws_mock.mocked_services["logs"][