- cloud_labeling.ipynb
- Jupyter Notebook file to label and output patch data from a given date range
- visualize_patches.ipynb
- Jupyter Notebook file to cluster labeled patch data, create visualizations, and remove poorly labeled patches
- 80k_with_31_patches_clustered.ipynb
- Jupyter Notebook file to cluster labeled patch data together with the existing 80k patches
- matplotlib
- os
- sys
- glob
- numpy
- pyhdf.SD
- TensorFlow 1.12.0 (CPU build)
- pandas
- seaborn
- math
- sklearn
- Necessary elements:
- lib_hdfs directory
- .txt file of dates (see clouds/src_analysis/dates for examples)
- MOD02, MOD35 data from the NASA LAADS website (see here for download instructions)
- Run cloud_labeling.ipynb
- After running the notebook (and labeling), you will have the file needed for clustering and validation. It contains a list of labeled patch instances with the information required by the clustering model and analysis.
- patches_DDMMYYYY.npy, where DDMMYYYY is the date the patches were labeled
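
As a minimal sketch of how the output file might be named and read back, the snippet below builds a `patches_DDMMYYYY.npy` filename from a date and round-trips stand-in data through NumPy. The helper names (`patches_filename`, `load_patches`) and the array shape are hypothetical; the real contents are whatever cloud_labeling.ipynb writes, and `allow_pickle=True` is only an assumption in case the file stores Python objects rather than a plain numeric array.

```python
import os
import tempfile
from datetime import date

import numpy as np


def patches_filename(labeled_on):
    """Build the patches_DDMMYYYY.npy name for a labeling date (hypothetical helper)."""
    return "patches_{}.npy".format(labeled_on.strftime("%d%m%Y"))


def load_patches(path):
    """Load a patches file (hypothetical helper).

    allow_pickle=True is an assumption: it is required only if the file
    holds Python objects instead of a plain numeric array.
    """
    return np.load(path, allow_pickle=True)


# Round trip with stand-in data; the shape below is made up for illustration.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, patches_filename(date(2019, 7, 4)))
demo_patches = np.zeros((3, 8, 8, 2), dtype=np.float32)
np.save(demo_path, demo_patches)

loaded = load_patches(demo_path)
print(os.path.basename(demo_path), loaded.shape)
```

Only load pickled `.npy` files from sources you trust, since unpickling can execute arbitrary code.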
- Necessary elements:
- lib_hdfs directory
- encoder directory (see "load model" section of visualize_patches.ipynb)
- patches_DDMMYYYY.npy (my 31 labeled patches can be found here)
- Run visualize_patches.ipynb
- Edit num_clusters to change the number of clusters for agglomerative clustering
- Remove ambiguous or mislabeled patches from the patch list if necessary
- Save plot images if desired
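
The clustering and cleanup steps above can be sketched roughly as follows. This is not the notebook's code: the random `encodings` array stands in for encoder outputs, its shape and the `bad_indices` list are made up for illustration, and scikit-learn's `AgglomerativeClustering` is used with its default settings, which may differ from what visualize_patches.ipynb configures.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Stand-in for encoded patches: 31 patches with 16 features each (hypothetical sizes).
rng = np.random.RandomState(0)
encodings = rng.normal(size=(31, 16))

# Number of clusters; the notebook exposes this as num_clusters.
num_clusters = 4

# Hypothetical indices of patches judged ambiguous or mislabeled.
bad_indices = [2, 17]

# Drop the flagged patches with a boolean mask before clustering.
keep = np.ones(len(encodings), dtype=bool)
keep[bad_indices] = False
cleaned = encodings[keep]

# Agglomerative clustering assigns one integer cluster label per remaining patch.
labels = AgglomerativeClustering(n_clusters=num_clusters).fit_predict(cleaned)
print(cleaned.shape, len(labels))
```

Filtering before clustering keeps the flagged patches from influencing how clusters merge; filtering afterwards would only hide them from the plots.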
- Necessary elements:
- npy file containing the labels from clustering ALL data together (use the bash script located here)
- Run 80k_with_31_patches_clustered.ipynb