Parallelization for Reading Large PGN Files #39

Open
dvub opened this issue Dec 16, 2023 · 1 comment

Comments

dvub commented Dec 16, 2023

Would it be possible to parallelize the reading of huge PGN files? For example, the file could be split into smaller chunks, either on disk or in memory, with the reader run on each chunk. If that is possible, is there a reason it hasn't been implemented in this crate yet? Thanks.

niklasf (Owner) commented Dec 22, 2023

A (completely correct) PGN parser cannot easily be parallelized at this level. For example, an unclosed { anywhere earlier in the file might mean that later chunks are actually part of a comment, even if they look like games.
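To make the comment problem concrete, here is a minimal std-only sketch (not code from this crate). The function names and the sample are hypothetical, and the "a game starts at a line beginning with `[Event `" rule is an assumed heuristic. The brace tracking is also simplified: it ignores braces inside quoted tag values and `;` line comments.

```rust
/// Sample input: the first game's comment quotes what looks like another game.
const SAMPLE: &str = "[Event \"A\"]\n\n1. e4 { a comment quoting another game:\n[Event \"B\"]\n1. d4 } e5 1-0\n\n[Event \"C\"]\n\n1. c4 *\n";

/// Naive heuristic (assumption): every line starting with "[Event " begins a game.
fn naive_game_count(pgn: &str) -> usize {
    pgn.lines().filter(|line| line.starts_with("[Event ")).count()
}

/// Same heuristic, but lines inside a `{ ... }` comment are ignored.
/// This needs the comment state carried over from everything before the line,
/// which is exactly what an arbitrary mid-file chunk does not have.
fn comment_aware_game_count(pgn: &str) -> usize {
    let mut in_comment = false;
    let mut count = 0;
    for line in pgn.lines() {
        if !in_comment && line.starts_with("[Event ") {
            count += 1;
        }
        // Brace comments in PGN do not nest, so a boolean is enough here.
        for byte in line.bytes() {
            match byte {
                b'{' => in_comment = true,
                b'}' => in_comment = false,
                _ => {}
            }
        }
    }
    count
}
```

On `SAMPLE`, the naive count sees three games while the comment-aware count sees two, because the `[Event "B"]` line sits inside a comment.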

It's not entirely impossible:

  • A parser could speculatively split the PGN on a boundary that looks like a game end and fix/reparse chunks in the rare case that the decision turned out to be wrong.
  • A parser could do a fast pass (similar to the skipping mode in this library) to determine game boundaries, followed by a slower but parallelizable pass.
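The second idea could look roughly like the sketch below. It uses only the standard library and is not this crate's API: `find_game_starts`, `split_games`, `event_name`, and `process_in_parallel` are hypothetical names, the boundary rule (a line starting with `[Event ` outside a `{ ... }` comment) is an assumption, and `event_name` is only a stand-in for the real, slower per-game parse.

```rust
use std::thread;

/// Fast sequential pass: byte offsets at which a game appears to start.
fn find_game_starts(pgn: &str) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut in_comment = false;
    let mut offset = 0;
    for line in pgn.split_inclusive('\n') {
        if !in_comment && line.starts_with("[Event ") {
            starts.push(offset);
        }
        // Simplified comment tracking: ignores quoted strings and ';' comments.
        for byte in line.bytes() {
            match byte {
                b'{' => in_comment = true,
                b'}' => in_comment = false,
                _ => {}
            }
        }
        offset += line.len();
    }
    starts
}

/// Cut the input into one slice per game using the offsets from the fast pass.
fn split_games(pgn: &str) -> Vec<&str> {
    let starts = find_game_starts(pgn);
    starts
        .iter()
        .enumerate()
        .map(|(i, &start)| {
            let end = starts.get(i + 1).copied().unwrap_or(pgn.len());
            &pgn[start..end]
        })
        .collect()
}

/// Stand-in for the slow per-game parse: returns the first quoted string,
/// which is the Event tag value for slices produced by `split_games`.
fn event_name(game: &str) -> String {
    game.split('"').nth(1).unwrap_or("").to_string()
}

/// Slow pass, spread over up to `workers` scoped threads. Output keeps game order.
fn process_in_parallel(pgn: &str, workers: usize) -> Vec<String> {
    let games = split_games(pgn);
    if games.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    let chunk_size = (games.len() + workers - 1) / workers;
    thread::scope(|scope| {
        let handles: Vec<_> = games
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|game| event_name(game)).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}
```

The fast pass stays sequential because the comment state at any byte depends on everything before it. Only the per-game work runs in parallel, so the approach pays off when that work dominates the boundary scan.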

I haven't done this because so far there have always been opportunities for more coarse-grained parallelism (e.g., https://database.lichess.org/ already comes as multiple independent files).
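A minimal sketch of that coarse-grained approach, assuming the input is already split into independent files: one worker thread per file, each reading its file sequentially. The names `count_event_lines` and `process_files` are hypothetical, and the per-file job is a placeholder (a naive line count) for a full sequential parse.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};
use std::thread;

/// Placeholder per-file job: counts lines that start with "[Event ".
/// A real job would run a complete sequential PGN parser over the file.
fn count_event_lines(path: &Path) -> io::Result<usize> {
    let reader = BufReader::new(File::open(path)?);
    let mut count = 0;
    for line in reader.lines() {
        if line?.starts_with("[Event ") {
            count += 1;
        }
    }
    Ok(count)
}

/// Runs one scoped worker thread per file; results come back in input order.
fn process_files(paths: &[PathBuf]) -> Vec<io::Result<usize>> {
    thread::scope(|scope| {
        let handles: Vec<_> = paths
            .iter()
            .map(|path| scope.spawn(move || count_event_lines(path)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}
```

Each file is parsed start to finish by a single thread, so no boundary detection is needed. For many files, a fixed-size worker pool would be a more sensible choice than one thread per file.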
