Skip to content

Add blocksize to DocumentDataset.read_* that uses dd.from_map #1421

Add blocksize to DocumentDataset.read_* that uses dd.from_map

Add blocksize to DocumentDataset.read_* that uses dd.from_map #1421

Workflow file for this run

name: Test Python package
on:
push:
branches:
- main
pull_request:
workflow_dispatch:
# When this workflow is queued, automatically cancel any previous running
# or pending jobs from the same branch
concurrency:
group: test-${{ github.ref }}
cancel-in-progress: true
jobs:
build_and_test:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest]
python-version: ["3.10"]
steps:
- uses: actions/checkout@v4
- name: Optionally free up space on Ubuntu
run: |
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install NeMo-Curator and pytest
# TODO: Remove pytest when optional test dependencies are added to setup.py
# Installing wheel beforehand due to fasttext issue:
# https://github.com/facebookresearch/fastText/issues/512#issuecomment-1837367666
# Explicitly install cython: https://github.com/VKCOM/YouTokenToMe/issues/94
run: |
pip install wheel cython
pip install --no-cache-dir .
pip install pytest
- name: Run tests
run: |
python -m pytest -v --cpu