I was wondering whether it's possible to implement some sort of parallelization for reading huge PGN files. For example, split the file into smaller chunks, either on disk or in memory, and then run the reader on each chunk. If it is possible, why hasn't it been implemented in this crate yet? Thanks.
A (completely correct) PGN parser cannot easily be parallelized at this level. For example, a `{` anywhere might mean that later chunks are actually part of a comment, even if they look like games.
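To make that concrete: whether a given byte offset sits inside a comment depends on every byte before it, so a chunk can't be parsed correctly without knowing the state at its start. A minimal sketch (`inside_brace_comment` is a hypothetical helper for illustration, not part of this crate):

```rust
/// Returns true if `prefix` ends inside a `{ ... }` comment.
/// Illustration only: a real tokenizer would also have to track
/// semicolon comments and quoted strings in tag pairs.
fn inside_brace_comment(prefix: &[u8]) -> bool {
    let mut inside = false;
    for &b in prefix {
        match b {
            b'{' => inside = true,
            b'}' => inside = false,
            _ => {}
        }
    }
    inside
}

fn main() {
    // This looks like two games if you naively split before the last
    // `[Event ...]`, but the second "game" sits entirely inside a comment:
    let pgn: &[u8] = b"[Event \"A\"]\n\n1. e4 { e.g.\n\n[Event \"B\"]\n\n1. d4 } 1-0\n";
    let split = pgn.iter().rposition(|&b| b == b'[').unwrap();
    assert!(inside_brace_comment(&pgn[..split]));
}
```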
It's not entirely impossible:
- A parser could speculatively split the PGN on a boundary that looks like a game end and fix/reparse chunks in the rare case that the decision turned out to be wrong.
- A parser could do a fast pass (similar to the skipping mode in this library) to determine game boundaries, followed by a slower but parallelizable pass.
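A rough, std-only sketch of the second idea, under my own assumptions (the `game_boundaries` scanner is mine, not this crate's API): a cheap sequential pass tracks brace-comment state and treats a `[` in column 0 after movetext as the start of the next game, yielding byte ranges that can then be handed to full parsers in parallel.

```rust
use std::ops::Range;

/// Fast boundary pass: track brace-comment state and treat a `[` in
/// column 0 after movetext as the start of the next game.
/// Illustration only: a real scanner would also handle `;` comments,
/// `%` escape lines, and quoted strings in tag pairs.
fn game_boundaries(pgn: &[u8]) -> Vec<Range<usize>> {
    let mut bounds = Vec::new();
    let (mut start, mut in_comment) = (0, false);
    let (mut line_start, mut in_movetext) = (true, false);
    for (i, &b) in pgn.iter().enumerate() {
        if in_comment {
            if b == b'}' {
                in_comment = false;
            }
        } else {
            match b {
                b'{' => in_comment = true,
                b'[' if line_start && in_movetext => {
                    bounds.push(start..i); // previous game ended here
                    start = i;
                    in_movetext = false;
                }
                b' ' | b'\t' | b'\r' | b'\n' | b'[' => {}
                _ if line_start => in_movetext = true,
                _ => {}
            }
        }
        line_start = b == b'\n';
    }
    if start < pgn.len() {
        bounds.push(start..pgn.len());
    }
    bounds
}

fn main() {
    // The `[Event "X"]` inside the comment must not create a boundary:
    let pgn: &[u8] =
        b"[Event \"A\"]\n\n1. e4 { note\n[Event \"X\"]\n} 1-0\n\n[Event \"B\"]\n\n1. d4 1/2-1/2\n";
    let bounds = game_boundaries(pgn);
    assert_eq!(bounds.len(), 2);
    // Each range is now an independent unit, so the slow full parse can
    // run per chunk, e.g. on scoped threads (or with rayon):
    std::thread::scope(|s| {
        for r in &bounds {
            let chunk = &pgn[r.clone()];
            s.spawn(move || assert!(chunk.starts_with(b"[Event")));
        }
    });
}
```

The boundary pass is inherently sequential, but it only inspects bytes, so it should be much cheaper than the full parse it enables to run in parallel.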
I haven't done this because, so far, there have always been opportunities for more coarse-grained parallelism (e.g., https://database.lichess.org/ already comes as multiple independent files).