This repository has been archived by the owner on Jan 23, 2024. It is now read-only.

A project developed during NASA SpaceAppsChallenge

commanderxa/dscovry

DSCOVRY

NASA SpaceAppsChallenge: Develop the Oracle of DSCOVR

Requirements

  • python >= 3.10
  • npm >= 9.2.0

To run

  • Create virtual environment:
python3 -m venv ./venv
  • Activate the environment:

    • for Linux:
    source ./venv/bin/activate
    • for Windows:
    .\venv\Scripts\activate
  • Install the dependencies:

    • python:
    pip install -r requirements.txt
    • npm (open a second terminal):
    cd ./client && npm i
  • Download the data and place it in the /data directory in the project root (create the directory beforehand).

  • Download the model and place it in the /models directory in the project root (create the directory beforehand).

  • Run the server:

python run.py
  • Run the client in the second terminal:
npm start

Using Neural Network

  • To train: specify the data in /nn/conf/config.yaml and run:
python ./nn/train.py
  • To validate: specify the data in /nn/main.py and run:
python ./nn/main.py

Project structure

This project incorporates:

  • /app - Backend
  • /nn - Deep Learning model

The backend is written in Flask, and the deep learning model, which we called DSCOVR(Y), was developed in PyTorch.

Data

Our deep learning model was trained on data from two datasets that we merged together.

Short names for the datasets:

  • RDS_D - raw data from the satellite dataset
  • KP_D - planetary k-index dataset

Data Cleaning

RDS_D has an entry for each minute of the year, whereas KP_D has an entry for each 3-hour period. Moreover, a KP_D entry does not describe a single discrete hour; it covers a range of 1 hour and 50 minutes.

This gives us:

  • RDS_D - data for each minute
  • KP_D - (presumably aggregated) data for 1 hour and 50 minutes of each 3-hour period, which leaves the remaining 1 hour and 10 minutes as a blind spot

In short, we checked whether each RDS_D entry falls within the 1 hour and 50 minute observation window of any KP_D period.

We used two scripts to clean the data; they are located in the /scripts directory.

  • kp_dataset_clean.py cleans the Kp index data, which originally contained a lot of data irrelevant to the problem. It also produces a well-structured, delimited file that pandas can open.
  • datasets_merge.py iterates over the raw satellite data; for each entry it iterates over the Kp index data and, if the entry falls within the measurement time range of any Kp index, adds that Kp index as a column to the entry.
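
The time-range check described above can be sketched as follows. This is a minimal illustration, not the code in datasets_merge.py: the function name attach_kp, the tuple layout, and the sample values are all hypothetical, and it assumes each Kp entry carries the start time of its 1 hour and 50 minute window and that entries outside every window are dropped. It also uses a binary search over sorted window starts instead of the nested iteration described above.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Length of one Kp observation window, as described in the text.
WINDOW = timedelta(hours=1, minutes=50)


def attach_kp(minute_rows, kp_windows):
    """Attach a Kp value to every row whose timestamp lies inside a Kp window.

    minute_rows: iterable of (timestamp, payload) tuples (hypothetical layout).
    kp_windows:  list of (window_start, kp) tuples, sorted by window_start.
    Rows that fall outside every window are dropped (an assumption).
    """
    starts = [start for start, _ in kp_windows]
    merged = []
    for ts, payload in minute_rows:
        i = bisect_right(starts, ts) - 1  # last window starting at or before ts
        if i < 0:
            continue  # row precedes the first window
        start, kp = kp_windows[i]
        if ts < start + WINDOW:  # half-open window [start, start + 1h50m)
            merged.append((ts, payload, kp))
    return merged


# Made-up sample data, for illustration only.
kp_windows = [
    (datetime(2023, 1, 1, 0, 0), 2.0),
    (datetime(2023, 1, 1, 3, 0), 3.3),
]
minute_rows = [
    (datetime(2023, 1, 1, 0, 30), "a"),  # inside the first window
    (datetime(2023, 1, 1, 2, 0), "b"),   # in the 1h10m gap
    (datetime(2023, 1, 1, 3, 49), "c"),  # inside the second window
]
merged = attach_kp(minute_rows, kp_windows)
```

With the sample data, rows "a" and "c" receive a Kp value and row "b" is discarded because it falls in the gap between windows.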

Model Architecture

The DSCOVR(Y) model has 53 inputs and 1 output. It has 1,175,681 parameters.

The architecture:

  • linear (53, 128)
  • batch_norm (128)
  • linear1 (128, 256)
  • batch_norm (256)
  • linear2 (256, 256)
  • batch_norm (256)
  • lstm (256, 256, 2 layers)
  • linear3 (256, 64)
  • linear4 (64, 1)
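
The stated parameter count can be checked by hand from the layer sizes. The sketch below is not taken from the project; it uses the usual PyTorch parameter formulas and assumes that every linear layer has a bias, that the batch norm layers are affine, and that the LSTM is unidirectional with two bias vectors per layer. Under those assumptions the list above sums to 1,175,681 only if batch_norm (256) is counted once (for example, if a single module is reused); counting it twice gives 1,176,193. That reading is an inference from the arithmetic, not something the source states.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out


def batch_norm_params(n_features: int) -> int:
    """Learnable scale and shift; running statistics are buffers, not parameters."""
    return 2 * n_features


def lstm_params(n_in: int, hidden: int, num_layers: int) -> int:
    """Unidirectional LSTM with two bias vectors per layer (PyTorch's layout)."""
    total = 0
    for layer in range(num_layers):
        layer_in = n_in if layer == 0 else hidden
        # 4 gates, each with input weights, recurrent weights and two biases
        total += 4 * (layer_in * hidden + hidden * hidden + 2 * hidden)
    return total


# ASSUMPTION: batch_norm (256) appears once here, i.e. it is counted a single time.
layers = [
    ("linear", linear_params(53, 128)),
    ("batch_norm (128)", batch_norm_params(128)),
    ("linear1", linear_params(128, 256)),
    ("batch_norm (256)", batch_norm_params(256)),
    ("linear2", linear_params(256, 256)),
    ("lstm", lstm_params(256, 256, 2)),
    ("linear3", linear_params(256, 64)),
    ("linear4", linear_params(64, 1)),
]

total_shared = sum(count for _, count in layers)
total_separate = total_shared + batch_norm_params(256)

for name, count in layers:
    print(f"{name:<18}{count:>10,}")
print(f"total, batch_norm (256) counted once:  {total_shared:,}")
print(f"total, batch_norm (256) counted twice: {total_separate:,}")
```

The two LSTM layers account for 1,052,672 of the parameters, so the recurrent block dominates the model size.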

License

The planetary k-index data is subject to the CC BY 4.0 license.

The license of our project is Apache 2.0.