
Support FileBackedList for PyTorch-CRF #417

Open
vrdn-23 opened this issue Jun 8, 2022 · 0 comments
Labels
enhancement New feature or request

Comments

@vrdn-23
Contributor

vrdn-23 commented Jun 8, 2022

The new PyTorch-CRF implementation does not currently support storing CRF features on disk. This option would benefit users with limited memory, who cannot hold all of the features in RAM.

Implementing this would mainly require re-implementing the scikit-learn train_test_split function used in pytorch_crf.py. I think this is a good idea for two reasons:

  • First, it would let us store CRF features on disk.
  • Second, the scikit-learn function does not allow stratification when a label has only one instance. This forces us to add the overhead of duplicating such single-instance examples whenever the stratify option is passed. A custom train_test_split could bypass this requirement, since I'm not convinced it is essential for achieving the intended effect of stratification.
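As a rough illustration of the second point, a custom split could operate on example indices and relax the single-instance restriction (scikit-learn's train_test_split raises a ValueError when the least populated class has only one member). This is a minimal sketch under assumptions, not the planned implementation: the function name stratified_index_split is hypothetical, it expects one stratification label per example, and it simply keeps every single-instance label in the training split. Because it returns indices rather than copies of the data, it could also be applied to features stored on disk.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_index_split(
    labels: Sequence[Hashable],
    test_size: float = 0.2,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Split example indices into train/test sets, stratified by label.

    Hypothetical helper: labels with a single example are allowed and are
    always placed in the training split (an assumed policy).
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be in (0, 1)")
    rng = random.Random(seed)

    # Group example indices by their stratification label.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    train_idx: List[int] = []
    test_idx: List[int] = []
    for label_indices in groups.values():
        rng.shuffle(label_indices)
        # Proportional share of this label for the test split, but always
        # leave at least one example of every label in the training split.
        n_test = min(int(round(len(label_indices) * test_size)),
                     len(label_indices) - 1)
        test_idx.extend(label_indices[:n_test])
        train_idx.extend(label_indices[n_test:])

    # Sorted indices let a file-backed store be read in on-disk order.
    return sorted(train_idx), sorted(test_idx)
```

For example, with labels ["greet"] * 5 + ["exit"] * 5 + ["rare"] and test_size=0.2, one "greet" and one "exit" example go to the test split, while the single "rare" example stays in training instead of triggering an error.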

It makes sense to handle this in a separate PR. The current PR already has many moving parts, and this change would be better evaluated on its own, since it would also require an efficient file-seeking mechanism for file-backed CRF features.
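One possible file-seeking mechanism, sketched under assumptions: pickle each feature record and append it to a single file, keeping only each record's byte offset in memory, so reading item i costs one seek and one unpickle. The class name OffsetIndexedList and its methods are hypothetical and are not meant to mirror any existing FileBackedList class in the codebase.

```python
import os
import pickle
import tempfile
from typing import Any, Iterator, List, Optional


class OffsetIndexedList:
    """Append-only list whose items live in a file on disk (hypothetical).

    Only the byte offset of each pickled item is kept in memory.
    """

    def __init__(self, path: Optional[str] = None):
        # Create a temporary backing file unless a path is given.
        self._owns_file = path is None
        if path is None:
            fd, path = tempfile.mkstemp(suffix=".pkl")
            os.close(fd)
        self.path = path
        self._offsets: List[int] = []
        # Binary read/write mode; any previous content is truncated.
        self._file = open(path, "w+b")

    def append(self, item: Any) -> None:
        # Record where this item starts, then write it at the end of the file.
        self._file.seek(0, os.SEEK_END)
        self._offsets.append(self._file.tell())
        pickle.dump(item, self._file, protocol=pickle.HIGHEST_PROTOCOL)

    def __len__(self) -> int:
        return len(self._offsets)

    def __getitem__(self, index: int) -> Any:
        if index < 0:
            index += len(self._offsets)
        if not 0 <= index < len(self._offsets):
            raise IndexError("index out of range")
        # Jump straight to the item's offset and unpickle just that record.
        self._file.seek(self._offsets[index])
        return pickle.load(self._file)

    def __iter__(self) -> Iterator[Any]:
        for i in range(len(self)):
            yield self[i]

    def close(self) -> None:
        # Close the file and delete it only if this object created it.
        self._file.close()
        if self._owns_file:
            os.remove(self.path)
```

The offset index costs one integer per record, which is usually far smaller than the features themselves, and a split that returns indices can fetch records lazily with features[i]. Pickle should only be used for files the application wrote itself, since unpickling untrusted data is unsafe.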

vrdn-23 added the enhancement (New feature or request) label Jun 8, 2022