to-tf-od: rework shard output #36

Open
fracpete opened this issue Jan 21, 2021 · 0 comments

Currently, the -s option is used to specify the shard files to generate. However, this is not feasible when creating hundreds of shard files.
Instead, add the following three options:

  • --num_shards <int>: how many shard files to generate, default is -1, so no sharding
  • --format <format>: the format for the file names (see below)
  • --width <int>: the number of digits to use for the index/total number of shards in the file name (left-padded with 0s), default is 5
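
A minimal sketch of how these options could be declared with Python's `argparse`. The function name `build_parser`, the `prog` value, the `shard_format` destination and the `None` default for `--format` are assumptions made for illustration; they are not taken from the existing code base.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the proposed sharding options."""
    parser = argparse.ArgumentParser(prog="to-tf-od")
    parser.add_argument(
        "--num_shards", type=int, default=-1,
        help="how many shard files to generate; -1 means no sharding")
    parser.add_argument(
        # dest is an assumption; it avoids shadowing the built-in format()
        "--format", dest="shard_format", type=str, default=None,
        help="format for the shard file names; supports {INDEX} and {TOTAL}")
    parser.add_argument(
        "--width", type=int, default=5,
        help="number of digits for index/total, left-padded with 0s")
    return parser


args = build_parser().parse_args(
    ["--num_shards", "10", "--format", "train.record-{INDEX}-{TOTAL}"])
print(args.num_shards, args.shard_format, args.width)
```
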

The shard format string supports the following two placeholders:

  • {INDEX} - the 1-based index for the shard file
  • {TOTAL} - the total number of shard files that get generated (see --num_shards)

This allows formats such as the following to be specified:

  • --format "train.record-{INDEX}-{TOTAL}" (see here)
  • --format "train-{INDEX}-{TOTAL}.tfrecords"
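
To illustrate the placeholder expansion, here is a rough sketch of how the file names could be derived from the format string. The helper name `shard_file_names` is hypothetical, and rejecting shard counts below 1 is an assumption; the non-sharded case (`--num_shards -1`) would be handled by the caller.

```python
from typing import List


def shard_file_names(fmt: str, num_shards: int, width: int = 5) -> List[str]:
    """Expand {INDEX} and {TOTAL} in fmt for each shard (hypothetical helper)."""
    if num_shards < 1:
        # Assumption: callers skip this helper entirely when not sharding.
        raise ValueError("num_shards must be at least 1")
    total = str(num_shards).zfill(width)  # left-pad with 0s
    return [
        fmt.replace("{INDEX}", str(index).zfill(width)).replace("{TOTAL}", total)
        for index in range(1, num_shards + 1)  # {INDEX} is 1-based
    ]


for name in shard_file_names("train-{INDEX}-{TOTAL}.tfrecords", 3):
    print(name)
```

With the default width of 5, the loop prints `train-00001-00003.tfrecords`, `train-00002-00003.tfrecords` and `train-00003-00003.tfrecords`.
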
@fracpete fracpete added the bug Something isn't working label Jan 21, 2021