Parallelization for Reading Large PGN Files #39

dvub · 2023-12-16T21:17:50Z

Is it possible to implement some form of parallelization for reading huge PGN files? For example, split the file into smaller chunks, either on disk or in memory, and then run the reader on each chunk. If it is possible, why hasn't it been implemented in this crate yet? Thanks.

niklasf · 2023-12-22T17:53:08Z

A (completely correct) PGN parser cannot easily be parallelized at this level. For example, an unclosed { anywhere earlier in the file might mean that later chunks are actually part of a comment, even if they look like games.

It's not entirely impossible:

A parser could speculatively split the PGN on a boundary that looks like a game end and fix/reparse chunks in the rare case that the decision turned out to be wrong.
A parser could do a fast pass (similar to the skipping mode in this library) to determine game boundaries, followed by a slower but parallelizable pass.
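
To make the second option concrete, here is a rough sketch using only the Rust standard library. It is not how this crate works and it does not use the crate's API; `game_starts` and `par_map_games` are hypothetical names made up for illustration. The boundary rule is a deliberate simplification: a game is assumed to start at a line beginning with `[` that is outside a `{ ... }` comment and follows movetext. Games without headers, `%` escape lines, and indented tag lines are not handled.

```rust
use std::thread;

/// Fast sequential pass (hypothetical helper): byte offsets where games start.
/// Simplified rule: a game starts at the first line beginning with '[' that is
/// outside a brace comment and does not continue an existing header block.
fn game_starts(pgn: &str) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut in_comment = false; // inside { ... }, which may span several lines
    let mut in_headers = false; // inside a block of tag lines
    let mut offset = 0;
    for line in pgn.split_inclusive('\n') {
        if !in_comment && line.starts_with('[') {
            if !in_headers {
                starts.push(offset);
                in_headers = true;
            }
        } else {
            for b in line.bytes() {
                match b {
                    b'{' if !in_comment => in_comment = true,
                    b'}' if in_comment => in_comment = false,
                    b';' if !in_comment => break, // rest of line is a comment
                    _ => {}
                }
            }
            if !line.trim().is_empty() {
                in_headers = false; // movetext or comment text ends the headers
            }
        }
        offset += line.len();
    }
    starts
}

/// Slower, parallel pass (hypothetical helper): apply `f` to every game,
/// spreading batches of games over `workers` scoped threads.
/// Results come back in file order.
fn par_map_games<T, F>(pgn: &str, workers: usize, f: F) -> Vec<T>
where
    T: Send,
    F: Fn(&str) -> T + Sync,
{
    let starts = game_starts(pgn);
    let games: Vec<&str> = starts
        .iter()
        .enumerate()
        .map(|(i, &start)| {
            let end = starts.get(i + 1).copied().unwrap_or(pgn.len());
            &pgn[start..end]
        })
        .collect();

    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let per_worker = ((games.len() + workers - 1) / workers).max(1);
    let f = &f;

    thread::scope(|scope| {
        let handles: Vec<_> = games
            .chunks(per_worker)
            .map(|batch| {
                scope.spawn(move || batch.iter().map(|&g| f(g)).collect::<Vec<T>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().unwrap())
            .collect()
    })
}
```

The closure passed to `par_map_games` is where the real, slower parser would run on each game's text. The sequential pass still has to touch every byte, so this only pays off when the per-game work is much more expensive than the boundary scan.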

I haven't done this because so far there were always opportunities for more coarse-grained parallelism (e.g., https://database.lichess.org/ coming in multiple independent files already).
