
KeyError: 'train' for a dataset which doesn't have train folder, how to split? #85

Closed
vidyap-xgboost opened this issue Jul 18, 2020 · 2 comments

Comments

@vidyap-xgboost
Contributor

I've gone through the tutorials for updating hyperparameters, especially this one https://github.com/Tessellate-Imaging/monk_v1/blob/master/study_roadmaps/1_getting_started_roadmap/5_update_hyperparams/2_data_params/4)%20Play%20around%20with%20train-val%20splits.ipynb on splitting a dataset into train and validation sets. I couldn't get it to work, so I fell back on this Classifier to train my classifier from scratch.
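If you would rather split a foldered dataset (one subfolder per class) into explicit train/ and val/ directories yourself, a minimal sketch in plain Python is below. This is generic standard-library code, not part of the Monk API; `split_dataset` and its parameters are names chosen here for illustration.

```python
import os
import random
import shutil

def split_dataset(src, dst, train_frac=0.7, seed=0):
    """Copy a foldered dataset (one subfolder per class under src)
    into dst/train/<class> and dst/val/<class> splits."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    for cls in sorted(os.listdir(src)):
        cls_dir = os.path.join(src, cls)
        if not os.path.isdir(cls_dir):
            continue  # skip stray files next to the class folders
        files = sorted(os.listdir(cls_dir))
        rng.shuffle(files)
        cut = int(len(files) * train_frac)
        for split, names in (("train", files[:cut]), ("val", files[cut:])):
            out = os.path.join(dst, split, cls)
            os.makedirs(out, exist_ok=True)
            for name in names:
                shutil.copy2(os.path.join(cls_dir, name),
                             os.path.join(out, name))
```

You could then point the trainer at the train/ and val/ folders directly instead of relying on an internal split.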

When I run the below code:

gtf.Default(dataset_path="/content/drive/My Drive/Data/zalando/zalando",
            model_name="resnet152_v2", 
            freeze_base_network=True,
            num_epochs=2)

I keep getting this error.

Dataset Details
    Train path:     /content/drive/My Drive/Data/zalando/zalando
    Val path:       None
    CSV train path: None
    CSV val path:   None
    Label Type:     single

Dataset Params
    Input Size:   224
    Batch Size:   4
    Data Shuffle: True
    Processors:   2
    Train-val split:   0.7

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-10-ca95f2c1dd79> in <module>()
      2             model_name="resnet152_v2",
      3             freeze_base_network=True,
----> 4             num_epochs=2)

9 frames
/content/monk_v1/monk/gluon/finetune/level_1_dataset_base.py in set_dataset_dataloader(self, test)
    116 
    117 
--> 118                 self.system_dict["dataset"]["params"]["num_train_images"] = len(image_datasets["train"]);
    119                 self.system_dict["dataset"]["params"]["num_val_images"] = len(image_datasets["val"]);
    120 

KeyError: 'train'

Where did I go wrong?

Environment: Google Colab
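The KeyError suggests the dataset loader found no images to assign to a "train" split, which typically happens when the dataset path is wrong or inaccessible (for example, Google Drive not mounted after a runtime reset). A quick sanity check before calling gtf.Default(), using only the standard library (`list_class_folders` is a hypothetical helper, not Monk code):

```python
import os

def list_class_folders(dataset_path):
    """Return the class subfolders under dataset_path, or raise if the
    path is missing. Foldered-data mode expects one subfolder per class
    directly under dataset_path; an empty or wrong path can surface
    downstream as the KeyError above."""
    if not os.path.isdir(dataset_path):
        raise FileNotFoundError(f"Dataset path not found: {dataset_path}")
    return sorted(
        d for d in os.listdir(dataset_path)
        if os.path.isdir(os.path.join(dataset_path, d))
    )

# Example (path from the issue):
# list_class_folders("/content/drive/My Drive/Data/zalando/zalando")
```

If this raises or returns an empty list, the problem is the path or the download, not the trainer.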

@vidyap-xgboost
Contributor Author

vidyap-xgboost commented Jul 18, 2020

Surprisingly, this didn't happen when I first ran the code; it only started happening after I factory-reset the runtime.

[EDIT]
When I ran the above code for the first time, I was in the /content/mydrive/My Drive/Data/ folder. Could that be the reason? If so, how do I resolve this without changing into that directory?

[EDIT]
I changed to that folder and tried again; it's the same error. I am lost as to how this error suddenly occurred when it didn't occur the first time.

[UPDATE]
It's working again now after downloading the dataset again, so it was probably a path or dataset problem.

@abhi-kumar
Contributor

Thank you for the detailed analysis on debugging the error.
