Repository containing code for experiments on early time-series classification of light curves, with the goal of identifying the type of astronomical source responsible for each light curve's emissions.
Required Platforms:
- Databricks for data processing and visualization using Apache Spark
Required Python Packages:
- numpy
- pandas
- matplotlib
- plotly
- seaborn
- scikit-learn
- tensorflow (1.13.2)
- keras
Required Linux utilities:
- awk
- sed
- cut
- gnuplot
Data:
- Training:
  - training_set.csv
  - training_set_metadata.csv
- Test:
  - test_set.csv
  - test_set_metadata.csv
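The following is a minimal sketch of loading one split and attaching the class labels. It assumes PLAsTiCC-style column names: `object_id`, `mjd`, `passband`, `flux` and `flux_err` in the observation file, and `object_id` and `target` in the metadata file. These column names and the helper `load_light_curves` are assumptions for illustration, not this repository's code. The usage below reads a few made-up rows instead of the real CSV files.

```python
import io

import pandas as pd


def load_light_curves(observations_csv, metadata_csv):
    """Join observations with metadata and return one time-sorted frame per object.

    Hypothetical helper; assumes PLAsTiCC-style columns
    (object_id, mjd, passband, flux, flux_err) and (object_id, target).
    """
    obs = pd.read_csv(observations_csv)
    meta = pd.read_csv(metadata_csv)
    # Attach the class label to every observation row of the same object.
    merged = obs.merge(meta[["object_id", "target"]], on="object_id", how="left")
    # Sort chronologically so each group is a time series ordered by MJD.
    merged = merged.sort_values(["object_id", "mjd"])
    return {oid: group.reset_index(drop=True) for oid, group in merged.groupby("object_id")}


# Usage with made-up in-memory rows standing in for training_set.csv / training_set_metadata.csv.
obs_csv = io.StringIO(
    "object_id,mjd,passband,flux,flux_err\n"
    "615,59750.5,1,-816.4,5.6\n"
    "615,59750.4,2,-544.8,3.6\n"
    "713,59825.3,0,-2.9,1.4\n"
)
meta_csv = io.StringIO("object_id,target\n615,92\n713,88\n")
curves = load_light_curves(obs_csv, meta_csv)
```

Keeping one frame per object makes it straightforward to truncate each light curve at an earlier time when simulating early classification.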
Types of models being tested:
- GRU-based RNN with passband embeddings, spatial dropout and max-pooling layers
- LSTM
  - LSTM
  - LSTM with spatial dropout and max-pooling layers
- Phased-LSTM
  - Variant 1: the inputs are only the flux values and flux error values for each of the passbands (6 flux values and 6 flux error values in total).
  - Variant 2: the inputs are flux values, flux error values and source wavelengths (6 flux values, 6 flux error values and 6 source wavelengths in total); a source wavelength is set to zero whenever the corresponding flux value is zero.
  - Variant 3: the inputs are flux values without passband distinction, flux error values without passband distinction, a passband indicator (1, 2, 3, 4, 5, 6) and source wavelengths. With this variant, validation accuracy stays below 50% over 50 epochs and training is not stable.
- Time-LSTM
  - LSTM
- Self-Attention (Transformer encoder architecture)
- Classical ML
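The Phased-LSTM Variant 1 and Variant 2 input layouts described above can be sketched in NumPy. This sketch makes three assumptions: each observation becomes one timestep, passbands are indexed 0 to 5, and the wavelength values are approximate LSST ugrizy effective wavelengths. These assumptions and the helper name `build_phased_lstm_inputs` are illustrative, not details taken from this repository. The observation times would be passed to the Phased-LSTM separately as its time input.

```python
import numpy as np

# Assumed approximate effective wavelengths (Angstrom) of the six LSST passbands
# u, g, r, i, z, y; placeholder values for illustration only.
PASSBAND_WAVELENGTHS = np.array([3671.0, 4827.0, 6223.0, 7546.0, 8691.0, 9712.0])
N_BANDS = 6


def build_phased_lstm_inputs(passband, flux, flux_err, with_wavelength=False):
    """Turn per-observation rows into dense Phased-LSTM input vectors.

    Hypothetical helper. Each observation becomes one timestep holding 6 flux
    slots and 6 flux-error slots (Variant 1); only the slot of the observed
    passband is filled. With with_wavelength=True, 6 wavelength slots are
    appended (Variant 2), zero wherever the matching flux slot is zero.
    """
    passband = np.asarray(passband, dtype=int)
    flux = np.asarray(flux, dtype=float)
    flux_err = np.asarray(flux_err, dtype=float)
    rows = np.arange(passband.shape[0])

    flux_block = np.zeros((rows.size, N_BANDS))
    err_block = np.zeros((rows.size, N_BANDS))
    # Scatter each observation into the column of its passband.
    flux_block[rows, passband] = flux
    err_block[rows, passband] = flux_err
    blocks = [flux_block, err_block]

    if with_wavelength:
        # Report a wavelength only where a non-zero flux value is present.
        blocks.append(np.where(flux_block != 0.0, PASSBAND_WAVELENGTHS, 0.0))
    return np.concatenate(blocks, axis=1)
```

For example, `build_phased_lstm_inputs([2, 0], [10.0, -3.0], [1.0, 0.5])` returns a `(2, 12)` array. The first row has 10.0 in the band-2 flux slot and 1.0 in the band-2 error slot.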
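The time gate is what distinguishes a Phased-LSTM from a plain LSTM, and it can be written out in NumPy for reference. This sketch follows the formulation in the Phased LSTM paper (Neil et al., 2016), with period `tau`, phase shift `shift`, open ratio `r_on` and leak rate `alpha`. The default values are illustrative assumptions, not the settings used in these experiments. The gate openness k_t blends the proposed cell state with the previous one as c_t = k_t * c̃_t + (1 - k_t) * c_{t-1}. As a result, irregularly sampled observations mostly update the cell while the gate is open.

```python
import numpy as np


def phased_lstm_time_gate(t, tau, shift, r_on=0.05, alpha=1e-3):
    """Openness k_t of the Phased LSTM time gate at timestamps t.

    phi = ((t - shift) mod tau) / tau is the phase within one oscillation period.
    The gate rises linearly over the first half of the open phase (phi < r_on / 2),
    falls linearly over the second half (phi < r_on), and otherwise leaks with
    slope alpha while closed.
    """
    t = np.asarray(t, dtype=float)
    phi = np.mod(t - shift, tau) / tau
    rising = 2.0 * phi / r_on
    falling = 2.0 - 2.0 * phi / r_on
    closed = alpha * phi
    return np.where(phi < 0.5 * r_on, rising, np.where(phi < r_on, falling, closed))
```

With `tau=10.0`, `shift=0.0` and `r_on=0.2`, the gate is open during the first 2 time units of every 10-unit period and peaks at t = 1.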
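The self-attention model is based on the Transformer encoder. Its core operation, scaled dot-product attention, is sketched below for a single head in NumPy. The padding mask is an added assumption, motivated by light curves having different lengths. The repository's actual layer configuration (number of heads, dimensions, positional encoding) is not described here.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v, mask=None):
    """Single-head scaled dot-product self-attention over one sequence.

    x: (seq_len, d_model); w_q, w_k: (d_model, d_k); w_v: (d_model, d_v).
    mask: optional (seq_len,) boolean array, True for real timesteps and False
    for padding; padded positions receive (numerically) zero attention weight.
    Returns the attended values (seq_len, d_v) and the weights (seq_len, seq_len).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if mask is not None:
        # Push padded keys to a very large negative score before the softmax.
        scores = np.where(mask[None, :], scores, -1e9)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

In a Transformer encoder, several such heads run in parallel and are followed by a position-wise feed-forward layer. The sketch covers only the attention step.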
Folders:
- /gru_emb_sd_mp
- /lstm
- /self-attention
- /classical_ml
- /misc_experiments: other experiments exploring the feasibility of ideas such as active learning
- /data
- /EDA
- /unsupervised_classification
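The README does not say which query strategy the active-learning experiment in /misc_experiments uses. As an assumed example only, least-confidence uncertainty sampling selects the unlabeled light curves whose highest predicted class probability is lowest. The helper name `least_confidence_query` is hypothetical.

```python
import numpy as np


def least_confidence_query(probabilities, n_queries):
    """Return indices of the n_queries least confident samples, most uncertain first.

    probabilities: (n_samples, n_classes) predicted class probabilities for the
    unlabeled pool. Confidence is the highest class probability of each sample.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    confidence = probabilities.max(axis=1)
    # A stable sort keeps the original order among equally confident samples.
    return np.argsort(confidence, kind="stable")[:n_queries]
```

The selected indices would then be labeled, added to the training set, and the model retrained before the next query round.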
Benchmark models being tested against:
- Avocado
- RAPID