NelsonGon edited this page Aug 16, 2020 · 14 revisions

Welcome to cytounet's wiki. 😃

Compiled below is a list of the most frequent questions/problems.

Model stuck when using predict

This may depend on your Keras/TensorFlow versions, but it has generally been fixed. If you run into this issue, please open a new issue.

Found 0 images when using predict

Please ensure that your test images are in subfolders. For instance, if you have a folder named test with images whose predictions are required, you should structure it as follows: test --> images_to_predict.

When calling predict, set the test path to test, not test/images_to_predict.
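The sketch below shows why the folder layout matters. It assumes that predict reads images roughly the way Keras' flow_from_directory does, which means it only looks inside subfolders of the path you pass. The helper count_findable_images and the extension list are hypothetical and are not part of cytounet.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions (illustrative, not cytounet's list).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def count_findable_images(test_path):
    """Hypothetical helper: count images one level down, inside subfolders.

    Roughly mimics a flow_from_directory-style scan: image files sitting
    directly in test_path are ignored.
    """
    root = Path(test_path)
    return sum(
        1
        for sub in root.iterdir() if sub.is_dir()
        for f in sub.iterdir() if f.suffix.lower() in IMAGE_EXTS
    )


def demo_layout():
    """Build test/images_to_predict/img_001.png and count from both paths."""
    with tempfile.TemporaryDirectory() as tmp:
        sub = Path(tmp) / "test" / "images_to_predict"
        sub.mkdir(parents=True)
        (sub / "img_001.png").write_bytes(b"")  # empty placeholder image
        right = count_findable_images(Path(tmp) / "test")  # pass "test"
        wrong = count_findable_images(sub)  # pass "test/images_to_predict"
    return right, wrong
```

With the correct path the image is found. With the subfolder path, nothing is found, which matches the "Found 0 images" symptom.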

Unknown loss function when using predict

If you trained with a custom loss function (for example dice_coef), you need to define a dictionary that maps each custom function's name to the function itself, and pass it as custom_objects when loading the saved model.
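Here is a minimal sketch of such a dictionary. It assumes the model was compiled with a loss or metric named dice_coef. The pure-Python dice_coef below is only a stand-in so the snippet runs without TensorFlow; in practice, reuse the exact function you trained with. CUSTOM_OBJECTS and load_trained_model are hypothetical names. custom_objects is the argument that Keras' load_model accepts for this purpose.

```python
# Stand-in dice_coef on flat sequences of values in [0, 1] (hypothetical;
# replace with the tensor-based function your model was trained with).
def dice_coef(y_true, y_pred, smooth=1.0):
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return (2.0 * intersection + smooth) / (sum(y_true) + sum(y_pred) + smooth)


# Keys must match the names the functions had at compile time; a missing
# key is what triggers the "Unknown loss function" error on loading.
CUSTOM_OBJECTS = {"dice_coef": dice_coef}


def load_trained_model(path):
    """Hypothetical loader; TensorFlow is imported only when called."""
    from tensorflow.keras.models import load_model
    return load_model(path, custom_objects=CUSTOM_OBJECTS)
```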

Training with data from load_augmentations leads to the model getting stuck after one epoch

This is related to Keras' model.fit method and its steps_per_epoch argument. If you set steps_per_epoch in model.fit, there may not be enough samples to fill every step, and training will simply hang with no warning.

Instead, leave steps_per_epoch unset and pass an appropriate batch_size for your data.
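As a rough guide, assume load_augmentations gives you a fixed number of samples. One full pass over that data then takes ceil(n_samples / batch_size) steps. The helper below is hypothetical and only illustrates that arithmetic.

```python
import math


def steps_per_epoch_for(n_samples, batch_size):
    """Hypothetical helper: batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)  # last batch may be partial


# Hypothetical usage: with in-memory arrays, passing only batch_size lets
# Keras derive the number of steps itself, so steps_per_epoch stays unset:
# model.fit(x_train, y_train, batch_size=8, epochs=20)
```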

Loss goes up and down

  • Try using a learning rate scheduler (for example Keras' LearningRateScheduler callback) to decrease the learning rate step by step.

  • If using SGD, it has been suggested to use a "raw" SGD, i.e. one without extra parameters such as decay or momentum. You can set momentum to 0 (the Keras default) or remove it altogether. At the time of writing, this is not possible without manually editing cytounet's source code. See this discussion.
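Here is a minimal step-decay schedule. It assumes you can pass extra callbacks to your model's fit call, where model is a placeholder. The halving factor and the 10-epoch interval are illustrative and are not cytounet defaults. Keras' LearningRateScheduler calls the schedule with the epoch index and the current learning rate.

```python
def step_decay(epoch, lr, initial_lr=0.01, drop=0.5, epochs_drop=10):
    """Halve the learning rate every `epochs_drop` epochs.

    `lr` (the current rate passed by Keras) is ignored: this schedule
    depends only on the epoch index. All defaults are illustrative.
    """
    return initial_lr * drop ** (epoch // epochs_drop)


def make_scheduler():
    """Hypothetical helper; TensorFlow is imported only when called."""
    from tensorflow.keras.callbacks import LearningRateScheduler
    return LearningRateScheduler(step_decay, verbose=1)


# Hypothetical usage:
# model.fit(x_train, y_train, epochs=50, callbacks=[make_scheduler()])
```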

How can I choose an optimal batch_size?

This will vary depending on your dataset and/or computational resources. For a general overview of the effects of batch_size, take a look at this paper, this and this.

Batch sizes that are powers of two (e.g. 8, 16, 32) are a common choice and are often reported to work well for most datasets.

A simple way to think of this is:

Small batch sizes --> Fast(er) training steps, faster convergence, but a noisier (less accurate) gradient estimate.

Large batch sizes --> Better gradient estimate, but require more memory and computational power and may take more time to converge.
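The trade-off in gradient quality can be seen in a toy simulation. The spread of mini-batch gradient estimates shrinks roughly in proportion to 1/sqrt(batch_size). The "gradients" here are random numbers standing in for per-sample gradients, not values from a real model.

```python
import random
import statistics


def gradient_estimate_spread(per_sample_grads, batch_size, n_batches=500, seed=0):
    """Std. dev. of mini-batch mean gradients drawn from a fixed population."""
    rng = random.Random(seed)
    estimates = [
        statistics.mean(rng.sample(per_sample_grads, batch_size))
        for _ in range(n_batches)
    ]
    return statistics.stdev(estimates)


# Toy population of 1,024 scalar "per-sample gradients" (illustrative only).
_rng = random.Random(42)
population = [_rng.gauss(1.0, 2.0) for _ in range(1024)]

for bs in (4, 16, 64, 256):
    print(bs, round(gradient_estimate_spread(population, bs), 3))
```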

Accuracy and Loss are stuck at the same value

  • Try a different optimiser. The default is SGD which may not perform well on some datasets. Try using Adam and see how that affects your results.

  • You may have an imbalanced dataset. In the train/test/validation data generators, try using a batch_size that ensures a balanced number of samples.

  • Your learning rate may be too high or too low. Use a learning rate scheduler, or write a callback that stops training when the loss plateaus.
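The "stop on plateau" idea can be sketched in a few lines of plain Python. The helpers below are hypothetical. Keras already ships EarlyStopping and ReduceLROnPlateau callbacks that implement this idea (both take monitor, min_delta and patience arguments), and those are what you would normally pass to fit.

```python
def epochs_without_improvement(losses, min_delta=1e-4):
    """Count trailing epochs whose loss did not beat the best by min_delta."""
    best = float("inf")
    wait = 0
    for loss in losses:
        if loss < best - min_delta:  # a real improvement resets the counter
            best = loss
            wait = 0
        else:
            wait += 1
    return wait


def should_stop(losses, patience=5, min_delta=1e-4):
    """Hypothetical plateau check: stop after `patience` flat epochs."""
    return epochs_without_improvement(losses, min_delta) >= patience
```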

My model trains so slowly, what can I do to get faster training times?

  • Try reducing the batch_size.

  • Try increasing the number of steps_per_epoch, or use a more powerful GPU.

  • You can also try to reduce the input_size to the data generators.

My model is not improving, what should I do?

Try reducing the learning rate. Alternatively, try a learning rate scheduler or different combinations of hyper-parameters. Built-in support for this is on the TODO list.
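A grid search is one simple way to try different combinations. The search space below (names and values) is a hypothetical example. train_and_score stands for whatever training and validation routine you use; it is not a cytounet function.

```python
import itertools

# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [4, 8, 16],
    "optimiser": ["SGD", "Adam"],
}


def grid(space):
    """Yield every combination of the search space as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


# Hypothetical usage: keep the configuration with the best validation score.
# best = max(grid(SEARCH_SPACE), key=train_and_score)
```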

My model has a low loss but does poorly on training data.

Please use generate_validation_data and feed the result to fit_generator. That way, you can see how the model performs on both the training and validation sets. Alternatively, try a different metric. We currently support dice and binary_crossentropy. You can implement your own and pass it to the metrics argument in unet.
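The following example shows why the choice of metric matters. On sparse masks, pixel accuracy can look excellent while the segmentation is useless. In this toy example, flat 0/1 lists stand in for masks, and an all-black prediction scores 0.95 accuracy but a Dice score of 0.

```python
def pixel_accuracy(mask, pred):
    """Fraction of pixels where prediction equals ground truth."""
    return sum(m == p for m, p in zip(mask, pred)) / len(mask)


def dice(mask, pred):
    """Dice overlap of two binary masks (1.0 if both are empty)."""
    inter = sum(m * p for m, p in zip(mask, pred))
    denom = sum(mask) + sum(pred)
    return 2.0 * inter / denom if denom else 1.0


mask = [1] * 5 + [0] * 95  # 5% foreground, as in a sparse cell mask
pred = [0] * 100           # an all-black prediction
```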

Model cannot train, throws an out-of-memory (resource exhausted) error

This is fairly common and is tied to your machine's computational power. The ideal fix is a more powerful machine. If you're low on resources, however, reduce the batch_size, reduce the input_shape and/or increase the number of steps_per_epoch. These changes will increase training time.

If training with augmented data, try a higher steps_per_epoch in the .fit method combined with a smaller batch_size, so that each step holds fewer images in memory. Note that this may affect accuracy.
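A back-of-the-envelope estimate shows why batch_size and input_shape matter. The memory for a single float32 input batch grows with batch_size × height × width × channels. This counts only the input tensor, so it is a lower bound. U-Net activations and gradients need far more memory, but they grow with the same product. The helper name is hypothetical.

```python
def input_batch_bytes(batch_size, height, width, channels=1, bytes_per_value=4):
    """Hypothetical helper: bytes for one float32 input batch (lower bound)."""
    return batch_size * height * width * channels * bytes_per_value


# Halving the image side cuts the pixel count (and this estimate) by 4x;
# halving batch_size cuts it by 2x.
full = input_batch_bytes(8, 512, 512)
smaller_images = input_batch_bytes(8, 256, 256)
smaller_batch = input_batch_bytes(4, 512, 512)
```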

Model gives a black prediction

The model has likely converged too early, and/or the learning rate was too low or too high. Try adjusting these, or train with more data. For example, you can use load_augmentations to process augmented images and enlarge your training set.
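To illustrate how augmentation enlarges a training set, the sketch below flips each image together with its mask, turning one pair into four. It is a hypothetical pure-Python stand-in that uses nested lists; it is not cytounet's load_augmentations. The key point is that every transform applied to an image must also be applied to its mask.

```python
def hflip(grid):
    """Mirror a 2-D grid left to right."""
    return [row[::-1] for row in grid]


def vflip(grid):
    """Mirror a 2-D grid top to bottom."""
    return grid[::-1]


def augment_pair(image, mask):
    """Return the original pair plus flipped copies (4 pairs in total)."""
    return [
        (image, mask),
        (hflip(image), hflip(mask)),
        (vflip(image), vflip(mask)),
        (vflip(hflip(image)), vflip(hflip(mask))),
    ]
```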

Thank you and do let us know if you have further questions.
