Trouble with Dockerfile - here's what built for me #1380

Open
danbri opened this issue Jun 20, 2024 · 3 comments

danbri commented Jun 20, 2024

I've been trying to build in a Windows 11 + WSL2/Ubuntu environment. The Docker build wasn't working because the boost repo seemed to be offline.

The following just built for me. Being lazy, I just let claude.ai keep suggesting fixes based on the error messages, but it seems most of the iterations were working through the consequences of having to build the boost library from source. Use at your own peril!

FROM ubuntu:22.04 as base
LABEL maintainer="Johannes Kalmbach <kalmbacj@informatik.uni-freiburg.de>"
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV LC_CTYPE C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y software-properties-common

FROM base as boost
RUN apt-get update && apt-get install -y build-essential wget zlib1g-dev
WORKDIR /tmp
RUN wget https://boostorg.jfrog.io/artifactory/main/release/1.81.0/source/boost_1_81_0.tar.gz \
    && tar xzf boost_1_81_0.tar.gz \
    && cd boost_1_81_0 \
    && ./bootstrap.sh --prefix=/usr/local \
    && ./b2 install --with-program_options --with-iostreams --with-regex --with-url -sZLIB_INCLUDE=/usr/include -sZLIB_LIBRARY=/usr/lib/x86_64-linux-gnu/libz.so

FROM base as builder
COPY --from=boost /usr/local /usr/local
RUN apt-get update && apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git libjemalloc-dev ninja-build libzstd-dev libssl-dev zlib1g-dev
ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

COPY . /app/

WORKDIR /app/
ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /app/build/
RUN cmake -DCMAKE_BUILD_TYPE=Release -DLOGLEVEL=INFO -DUSE_PARALLEL=true -D_NO_TIMING_TESTS=ON -GNinja ..
RUN ninja -j2 sparqlExpressions
RUN ninja -j2
RUN ctest --rerun-failed --output-on-failure

FROM base as runtime
COPY --from=boost /usr/local /usr/local
WORKDIR /app
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev zlib1g-dev
ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

ARG UID=1000
RUN groupadd -r qlever && useradd --no-log-init -r -u $UID -g qlever qlever && chown qlever:qlever /app
USER qlever
ENV PATH=/app/:$PATH

COPY --from=builder /app/build/*Main /app/
COPY --from=builder /app/e2e/* /app/e2e/
ENV PATH=/app/:$PATH

USER qlever
EXPOSE 7001
VOLUME ["/input", "/index"]

ENV INDEX_PREFIX index
ENV MEMORY_FOR_QUERIES 70
ENV CACHE_MAX_SIZE_GB 30
ENV CACHE_MAX_SIZE_GB_SINGLE_ENTRY 5
ENV CACHE_MAX_NUM_ENTRIES 1000
ENTRYPOINT ["/bin/sh", "-c", "exec ServerMain -i \"/index/${INDEX_PREFIX}\" -j 8 -m ${MEMORY_FOR_QUERIES} -c ${CACHE_MAX_SIZE_GB} -e ${CACHE_MAX_SIZE_GB_SINGLE_ENTRY} -k ${CACHE_MAX_NUM_ENTRIES} -p 7001 \"$@\"", "--"]
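A note on the ENTRYPOINT line above: the trailing "--" and the "$@" work together so that extra arguments given to docker run after the image name are forwarded to ServerMain. The following is a minimal sketch of that shell mechanism on its own, without Docker; `show_args` and the sample arguments are placeholders made up for illustration.

```shell
#!/bin/sh
# With `sh -c SCRIPT NAME ARGS...`, NAME becomes $0 and ARGS become "$@".
# In the ENTRYPOINT array the final "--" plays the role of NAME, so whatever
# Docker appends after it shows up inside the script as "$@".

# show_args is a hypothetical helper that mimics the ENTRYPOINT shape.
show_args() {
  sh -c 'printf "arg: %s\n" "$@"' -- "$@"
}

# Placeholder arguments; each one is passed through intact, spaces included.
show_args --some-flag "value with spaces"
```

Without the "--" placeholder, the first forwarded argument would be consumed as $0 and silently dropped from "$@".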

Log from the build:

(base) danbri@tincan:/mnt/d/foafplus/quickstart-qlever/qlever-code$ docker buildx build -t qlever .
[+] Building 2285.4s (24/24) FINISHED                                                                                                                       docker:default
 => [internal] load build definition from Dockerfile                                                                                                                  0.1s
 => => transferring dockerfile: 2.37kB                                                                                                                                0.1s
 => [internal] load metadata for docker.io/library/ubuntu:22.04                                                                                                       0.0s
 => [internal] load .dockerignore                                                                                                                                     0.1s
 => => transferring context: 213B                                                                                                                                     0.0s
 => [internal] load build context                                                                                                                                     3.9s
 => => transferring context: 40.23kB                                                                                                                                  3.8s
 => [base 1/2] FROM docker.io/library/ubuntu:22.04                                                                                                                    0.0s
 => CACHED [base 2/2] RUN apt-get update && apt-get install -y software-properties-common                                                                             0.0s
 => [boost 1/3] RUN apt-get update && apt-get install -y build-essential wget zlib1g-dev                                                                             25.1s
 => [boost 2/3] WORKDIR /tmp                                                                                                                                          0.1s
 => [boost 3/3] RUN wget https://boostorg.jfrog.io/artifactory/main/release/1.81.0/source/boost_1_81_0.tar.gz     && tar xzf boost_1_81_0.tar.gz     && cd boost_1  108.6s
 => [runtime 1/6] COPY --from=boost /usr/local /usr/local                                                                                                             1.8s
 => [builder 2/9] RUN apt-get update && apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git libjemalloc-dev ninja-build  43.5s
 => [runtime 2/6] WORKDIR /app                                                                                                                                        0.1s
 => [runtime 3/6] RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 l  26.1s
 => [runtime 4/6] RUN groupadd -r qlever && useradd --no-log-init -r -u 1000 -g qlever qlever && chown qlever:qlever /app                                             0.5s
 => [builder 3/9] COPY . /app/                                                                                                                                        7.5s
 => [builder 4/9] WORKDIR /app/                                                                                                                                       0.1s
 => [builder 5/9] WORKDIR /app/build/                                                                                                                                 0.1s
 => [builder 6/9] RUN cmake -DCMAKE_BUILD_TYPE=Release -DLOGLEVEL=INFO -DUSE_PARALLEL=true -D_NO_TIMING_TESTS=ON -GNinja ..                                          69.8s
 => [builder 7/9] RUN ninja -j2 sparqlExpressions                                                                                                                   564.3s
 => [builder 8/9] RUN ninja -j2                                                                                                                                    1417.7s
 => [builder 9/9] RUN ctest --rerun-failed --output-on-failure                                                                                                       39.1s
 => [runtime 5/6] COPY --from=builder /app/build/*Main /app/                                                                                                          0.3s
 => [runtime 6/6] COPY --from=builder /app/e2e/* /app/e2e/                                                                                                            0.3s
 => exporting to image                                                                                                                                                2.4s
 => => exporting layers                                                                                                                                               2.4s
 => => writing image sha256:be1bf8e0a8d6a23910c04dda7017c09948b0277d577aff287e87647cec33cc9d                                                                          0.0s
 => => naming to docker.io/library/qlever  
Member

joka921 commented Jun 21, 2024

Hi,

  1. We always push Docker builds to Docker Hub for each commit to master, for x86-64 and ARM64. Is there any particular reason you did not simply pull those?
  2. You seem to just have had bad luck with your timing. In the meantime, those repositories all work again. Our own CI actions were failing for some hours yesterday for exactly the same reason: sometimes the Ubuntu PPAs are under maintenance, which you typically only notice if you frequently reinstall your dependencies (as is typical in CI builds).
  3. I don't really see an issue with QLever here. Can you please confirm that the "ordinary" build works again now that all the dependencies are back up?
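For readers who want to try the prebuilt image mentioned in point 1, here is a minimal sketch that pulls it and falls back to a local build only if the pull fails. The image name `adfreiburg/qlever` is an assumption (check the project README for the authoritative name and tag), and the `run` wrapper is a hypothetical helper that only prints commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Assumption: the prebuilt image is published on Docker Hub under this name.
IMAGE="${QLEVER_IMAGE:-adfreiburg/qlever}"

# Dry-run by default so the commands can be reviewed before anything runs.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Prefer the prebuilt image; build from the local Dockerfile only as a fallback.
run docker pull "$IMAGE" || run docker build -t qlever .
```

Run it with `DRY_RUN=0` to execute the commands for real. Note that the pulled image keeps its registry name, so later `docker run` commands would refer to that name rather than to the local tag `qlever`.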

Author

danbri commented Jun 24, 2024

Thanks - and sorry for sluggish reply!

My journey here was basically that I have heard good things about qlever and specifically it seems capable of running a copy of Wikidata, so I started with that in mind.

I think I found https://github.com/Buchhold/QLever/blob/master/docs/quickstart.md because I found the old wikidata.md in that repo, but I then found my way to https://github.com/ad-freiburg/qlever/blob/master/docs/quickstart.md and https://github.com/ad-freiburg/qlever/blob/master/docs/wikidata.md.OLD

The quickstart doesn't mention dockerhub, and I don't do a lot of docker work lately so I was trying to do as the quick start guide suggested, e.g.

cd $QLEVER_HOME
git clone --recursive -j8 https://github.com/ad-freiburg/qlever qlever-code
cd qlever-code
docker build -t qlever .

So - yes it looks like the ordinary build on the dockerhub directory should be fine.

I saw belatedly that there is a different quickstart within the github repo readme, https://github.com/ad-freiburg/qlever/tree/master emphasizing the python wrapper utility. I haven't tried it yet, but it is good to see that https://github.com/ad-freiburg/qlever-control/blob/main/src/qlever/Qleverfiles/Qleverfile.wikidata was updated recently.

With "raw" Docker I am having only partial success so far but I have not been able to put a lot of time into figuring things out yet.

I have just started an indexing job this way in PowerShell on Windows. Previous runs died from either memory use or too many open files (I think that was under Ubuntu/WSL2, which I've now abandoned). This machine does not have a ton of memory, fwiw.

PS D:\foafplus\quickstart-qlever\qlever-code> docker run -it --rm `
-v "${PWD}/index:/index" `
-v "${PWD}/wikidata-input:/input" `
--entrypoint /bin/bash `
qlever `
-c "bzip2 -dc /input/latest-truthy.nt.bz2 | /app/IndexBuilderMain -i /index/wikidata-truthy -f - -F nt -m '15GB' -s /input/wikidata.settings.json"

2024-06-24 22:40:57.710 - INFO: QLever IndexBuilder, compiled on Sun Jun 23 00:44:19 UTC 2024 using git hash 833925
2024-06-24 22:41:07.468 - INFO: You specified the input format: NT
2024-06-24 22:41:07.468 - INFO: Processing input triples from /dev/stdin ...
2024-06-24 22:41:07.471 - INFO: You specified "locale = en_US" and "ignore-punctuation = 1"
2024-06-24 22:41:07.471 - INFO: You specified "ascii-prefixes-only = true", which enables faster parsing for well-behaved TTL files
2024-06-24 22:41:07.471 - INFO: You specified "parallel-parsing = true", which enables faster parsing for TTL files with a well-behaved use of newlines
2024-06-24 22:41:07.471 - INFO: You specified "num-triples-per-batch = 2,000,000", choose a lower value if the index builder runs out of memory
2024-06-24 22:41:07.471 - INFO: By default, integers that cannot be represented by QLever will throw an exception
2024-06-24 22:41:07.856 - INFO: Parsing input triples and creating partial vocabularies, one per batch ...
2024-06-24 23:19:35.649 - INFO: Triples parsed: 580,000,000 [average speed 0.3 M/s, last batch 0.3 M/s, fastest 0.3 M/s, slowest 0.2 M/s]
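The two failure modes mentioned earlier (running out of memory, too many open files) each have a knob that can be tried. The sketch below is only that: the batch size follows the hint in the log line about "num-triples-per-batch", and the open-file target is an arbitrary illustrative value. The file name, both numbers, and the `pick_nofile` helper are assumptions, not recommendations from the QLever project.

```shell
#!/bin/sh
# 1. Memory: write a settings fragment with a smaller batch size than the
#    2,000,000 reported in the log. Merge this key into the real settings file
#    by hand rather than overwriting it. The value is illustrative.
SETTINGS_FILE="${SETTINGS_FILE:-${TMPDIR:-/tmp}/settings.sketch.json}"
cat > "$SETTINGS_FILE" <<'EOF'
{
  "num-triples-per-batch": 1000000
}
EOF

# 2. Open files: choose the largest soft limit that the hard limit permits.
#    pick_nofile is a hypothetical helper: $1 = desired limit, $2 = hard limit.
pick_nofile() {
  desired=$1
  hard=$2
  if [ "$hard" = "unlimited" ] || [ "$hard" -ge "$desired" ]; then
    echo "$desired"
  else
    echo "$hard"
  fi
}

NOFILE=$(pick_nofile 1048576 "$(ulimit -Hn)")

# This affects only the current shell and its children (a native run).
ulimit -Sn "$NOFILE" 2>/dev/null || echo "could not change the soft limit"
echo "soft open-file limit is now: $(ulimit -Sn)"

# For a container, the limit is set on the docker run command line instead,
# using the --ulimit nofile=<soft>:<hard> option.
```

A lower batch size trades indexing speed for a smaller peak memory footprint, so it is worth reducing it in steps rather than all at once.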

If this works I am hoping I can see an endpoint by running something like (some formatting trashed for github markdown):

PS D:\foafplus\quickstart-qlever\qlever-code> docker run -it -p 7001:7001 -v "${PWD}/index:/index" `
-e INDEX_PREFIX=wikidata-truthy -e MEMORY_FOR_QUERIES="15GB" `
-e CACHE_MAX_SIZE_GB="30GB" -e CACHE_MAX_SIZE_GB_SINGLE_ENTRY="5GB" `
--name qlever `
qlever
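Once a server like the one above is running, one way to check that the endpoint answers is a small SPARQL probe with curl. This is a sketch under assumptions: that the server listens on localhost:7001 (the port published above), and that it accepts a query through the standard `query` parameter of the SPARQL protocol. The `probe` function and the sample query are made up for illustration.

```shell
#!/bin/sh
# Assumed endpoint location, taken from the -p 7001:7001 mapping above.
ENDPOINT="${ENDPOINT:-http://localhost:7001}"
QUERY='SELECT * WHERE { ?s ?p ?o } LIMIT 5'

# Hypothetical helper: $1 = endpoint URL, $2 = SPARQL query.
probe() {
  # -G sends the url-encoded data as GET parameters; --fail turns HTTP errors
  # into a non-zero exit status; --max-time bounds the total wait in seconds.
  curl --silent --fail --max-time 10 -G "$1" \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$2"
}

if probe "$ENDPOINT" "$QUERY"; then
  echo
  echo "endpoint is up"
else
  echo "endpoint not reachable at $ENDPOINT"
fi
```

The query is deliberately tiny, so a slow or failed response points to the server or the port mapping rather than to the query itself.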

I've just installed the qlever tool in an Anaconda Python environment; hopefully I can migrate to using that. Probably I shouldn't be impatient and jump straight to Wikidata! Feel free to close this issue, since the boost repo issue turned out to be transient.

Member

hannahbast commented Jun 29, 2024

@danbri Can you start with the olympics dataset first (2M triples, that should be a matter of seconds)? And if that works, maybe try dblp next (400M triples, that should be a matter of minutes). Then wikidata (20B triples, that should be a matter of hours). Using the qlever script this should be as easy as (use a fresh directory for each dataset):

pip install qlever
qlever setup-config olympics
qlever get-data
qlever index
qlever start
qlever example-queries

If the performance you experience varies greatly from what is reported on https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines, please let us know.
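The staged approach suggested above (olympics, then dblp, then wikidata, each in a fresh directory) can be laid out as a small plan. The sketch below only prints the steps and does not run them. The config names `dblp` and `wikidata` are assumptions about what `qlever setup-config` accepts, and `BASE_DIR` and the `plan` function are hypothetical.

```shell
#!/bin/sh
# Datasets in increasing size. Only "olympics" is confirmed by the thread as a
# setup-config name; the other two are assumed to follow the same pattern.
DATASETS="${DATASETS:-olympics dblp wikidata}"
BASE_DIR="${BASE_DIR:-$HOME/qlever-indices}"

# Hypothetical helper: $1 = space-separated dataset names, $2 = base directory.
# Prints one fresh directory and the six qlever steps for each dataset.
plan() {
  for ds in $1; do
    echo "mkdir -p $2/$ds && cd $2/$ds"
    echo "qlever setup-config $ds"
    echo "qlever get-data"
    echo "qlever index"
    echo "qlever start"
    echo "qlever example-queries"
  done
}

plan "$DATASETS" "$BASE_DIR"
```

Moving on to the next dataset only after the previous one indexes and answers the example queries keeps any failure cheap to diagnose.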
