
Find the best characteristics using various models to best predict the future returns

donQuiote/FIN-407-Project

Decoding Future Stock Returns: Revealing the Key Predictors

This is a project by Guillaume Ferrer, Gustave Paul Besacier, Eva Perazzi and Agustina Maria Zein, students in the course Machine Learning in Finance (FIN-407) at the Swiss Federal Institute of Technology Lausanne (EPFL).

About the project 📈📊

  • Goal: Retrieve a subset of predictors that best predict stock returns.

  • The code is accompanied by a 20-page report, which is available in the repository (...github).

Installation 💻

The code is optimized for Python 3.11.

Libraries

The following libraries are used:

  • NumPy
  • Matplotlib
  • scikit-learn
  • pandas
  • SciPy
  • json (standard library)
  • os (standard library)
  • seaborn
  • tqdm
  • torch (PyTorch)
  • transformers
  • mlxtend
  • (IPython)
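Before running main.py, it can help to check that these libraries are importable. The following is a minimal helper sketch, not part of the repository, that uses only the standard library (`importlib.util.find_spec`). The names are import names, so scikit-learn appears as `sklearn`; treating IPython as optional is an assumption based on the parentheses in the list above.

```python
import importlib.util

# Import names of the libraries listed above (scikit-learn is imported as "sklearn").
REQUIRED = [
    "numpy", "matplotlib", "sklearn", "pandas", "scipy", "json", "os",
    "seaborn", "tqdm", "torch", "transformers", "mlxtend",
]
# Assumption: IPython is optional, as suggested by the parentheses in the list.
OPTIONAL = ["IPython"]


def missing_modules(names):
    """Return the subset of top-level module `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing required modules:", ", ".join(missing))
    else:
        print("All required modules found.")
    for name in missing_modules(OPTIONAL):
        print("Optional module not found:", name)
```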

Files 📁

Main files

  • main.py : Master file from which the desired processes are called.

Directories

Contains the data used in the project in different formats (.txt, .csv). Some files also contain shortened datasets, computed to reduce run times. The data required to run the code is available here: https://drive.google.com/drive/folders/1ivoX5Kiannv-GN9mML8K0n7tHz3baE38?usp=share_link.

Contains the files that handle the data and run the statistical analysis.

  • Data_Handler.py : Handles data loading and the creation of shorter datasets. It can also run small analyses.
  • Data_Analysis.py : Runs the data analysis when called from main.py.
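As an illustration of what creating a shorter dataset can look like, here is a hypothetical sketch that keeps every row of a random subset of firms from a panel stored in long format. The helper `shorten_panel`, the column names `firm_id` and `date`, and the choice of sampling firms rather than dates are all assumptions; Data_Handler.py may shorten the data differently.

```python
import numpy as np
import pandas as pd


def shorten_panel(df, n_firms, id_col="firm_id", seed=0):
    """Hypothetical helper: keep all rows of a random subset of `n_firms` firms."""
    rng = np.random.default_rng(seed)
    firms = df[id_col].unique()
    keep = rng.choice(firms, size=min(n_firms, len(firms)), replace=False)
    return df[df[id_col].isin(keep)].reset_index(drop=True)


# Toy panel with made-up values: 10 firms observed on 3 dates.
panel = pd.DataFrame({
    "firm_id": np.repeat(np.arange(10), 3),
    "date": np.tile(["2020-01", "2020-02", "2020-03"], 10),
    "ret": np.linspace(-0.05, 0.05, 30),
})
short = shorten_panel(panel, n_firms=4)
print(short["firm_id"].nunique(), len(short))  # 4 12
```

Sampling whole firms keeps each retained firm's full time series, which matters if later steps use time-series information.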

Handles the various techniques used to impute NaN values and creates the dataset on which the machine learning algorithms are run.

  • Data_fitting.py : Imputes the missing values using the cross-sectional and time-series information in the data.
  • FittingTries.py : Earlier imputation attempts, based on firm-independent fitting.
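To make the idea of combining time-series and cross-sectional information concrete, here is a minimal sketch of one such scheme: first carry each firm's last observed value forward in time, then fill any remaining gaps with the median of that characteristic across firms on the same date. The function name, the column names and the order of the two steps are assumptions for illustration, not necessarily what Data_fitting.py does.

```python
import numpy as np
import pandas as pd


def impute_panel(df, cols, id_col="firm_id", date_col="date"):
    """Hypothetical imputation: firm-level forward fill, then same-date cross-sectional median."""
    out = df.sort_values([id_col, date_col]).copy()
    # Time-series step: propagate each firm's last observed value forward.
    out[cols] = out.groupby(id_col)[cols].ffill()
    # Cross-sectional step: fill what remains with the median across firms on that date.
    medians = out.groupby(date_col)[cols].transform("median")
    out[cols] = out[cols].fillna(medians)
    return out


df = pd.DataFrame({
    "firm_id": [1, 1, 2, 2, 3, 3],
    "date":    [1, 2, 1, 2, 1, 2],
    "size":    [1.0, np.nan, np.nan, 4.0, 3.0, 6.0],
})
print(impute_panel(df, ["size"])["size"].tolist())  # [1.0, 1.0, 2.0, 4.0, 3.0, 6.0]
```

Firm 1's gap is filled from its own past (1.0), while firm 2 has no earlier observation, so it receives the date-1 cross-sectional median of 1.0 and 3.0, which is 2.0.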

This directory contains all the methods we used to find a subset of predictors.

  • NonParametrical.py : Contains multiple implementations of the nonparametric estimation using the adaptive group Lasso.
  • lasso_elastic.py : Contains the implementation of Lasso models and of a bootstrap-enhanced technique, with the optimal regularization parameter determined by cross-validation and by the SURE (Stein's Unbiased Risk Estimate) framework. It also applies these methods to Elastic-Net.
  • SFFS.py : Contains the implementation of SFFS (Sequential Forward Feature Selection) using the AIC, BIC and HQIC criteria.
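The information criteria used for selection can be illustrated with a short sketch. For a Gaussian linear regression with $n$ observations, $k$ estimated coefficients and residual sum of squares RSS, and up to an additive constant, AIC $= n\ln(\mathrm{RSS}/n) + 2k$, BIC $= n\ln(\mathrm{RSS}/n) + k\ln n$ and HQIC $= n\ln(\mathrm{RSS}/n) + 2k\ln\ln n$. The sketch below is a plain greedy forward selection on OLS fits that stops once no remaining feature lowers the chosen criterion; it omits any backward ("floating") step, and the function names are hypothetical rather than taken from SFFS.py.

```python
import numpy as np


def information_criterion(rss, n, k, kind="bic"):
    """Gaussian-likelihood information criterion, up to an additive constant."""
    base = n * np.log(rss / n)
    if kind == "aic":
        return base + 2 * k
    if kind == "bic":
        return base + k * np.log(n)
    if kind == "hqic":
        return base + 2 * k * np.log(np.log(n))
    raise ValueError(f"unknown criterion: {kind}")


def rss_ols(X, y):
    """Residual sum of squares of an OLS fit of y on X (X already holds the intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def forward_select(X, y, kind="bic"):
    """Greedily add the feature that lowers the criterion most; stop when none does."""
    n, p = X.shape
    ones = np.ones((n, 1))
    selected, remaining = [], list(range(p))
    best = information_criterion(rss_ols(ones, y), n, 1, kind)  # intercept-only model
    while remaining:
        scores = []
        for j in remaining:
            design = np.hstack([ones, X[:, selected + [j]]])
            ic = information_criterion(rss_ols(design, y), n, design.shape[1], kind)
            scores.append((ic, j))
        score, j = min(scores)
        if score >= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected


# Synthetic data: only features 0 and 2 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
print(forward_select(X, y, kind="bic"))
```

The three criteria differ only in the per-parameter penalty ($2$, $\ln n$, $2\ln\ln n$), so for moderate $n$ BIC tends to keep the smallest subset and AIC the largest.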

Contains the implementation of the deep neural network that uses the predictors we found, in order to assess them.

  • FLASH.py : This module implements FAST (Financial Learning Algorithm for Signal Heuristics), a deep neural network designed to predict the direction of returns and thereby provide trading signals (buy/sell). The module contains all the data handling, model training and model inference.
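Predicting return direction turns the problem into binary classification, and the trading signal follows from the predicted probability of an up-move. The sketch below shows only that framing around a network, not the network itself. Labelling a zero return as "down", the 0.5 threshold and the helper names are assumptions; FLASH.py may define the target and the signal differently.

```python
import numpy as np


def direction_labels(next_returns):
    """Binary target: 1 if the next-period return is positive, else 0 (zero counts as down)."""
    return (np.asarray(next_returns) > 0).astype(int)


def to_signal(prob_up, threshold=0.5):
    """Map a predicted probability of an up-move to a signal: +1 = buy, -1 = sell."""
    return np.where(np.asarray(prob_up) > threshold, 1, -1)


def hit_rate(prob_up, next_returns, threshold=0.5):
    """Share of periods where the signal's direction matches the realized direction."""
    predicted_up = to_signal(prob_up, threshold) > 0
    realized_up = direction_labels(next_returns) == 1
    return float(np.mean(predicted_up == realized_up))


# Made-up numbers: model outputs (e.g. a sigmoid of the network's logit) and realized returns.
probs = np.array([0.8, 0.3, 0.55, 0.1])
rets = np.array([0.02, -0.01, -0.03, -0.02])
print(to_signal(probs))       # [ 1 -1  1 -1]
print(hit_rate(probs, rets))  # 0.75
```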

This directory contains the various graphs we used to analyze and represent the data.

Contains various utility files.

  • Grapher.py : Contains multiple functions to plot various graphs
  • Utilities.py : Single-use functions
  • shared_Treatment_Grapher.py : Contains the methods shared by more than one .py file, to avoid circular imports
  • Data_Treatment.py : Contains functions used to analyze the data, such as giving basic information about the raw data, applying the different treatments to the data, and creating and handling groups.

Old files. They should not be used, as they may contain inefficient code or even errors.

Usage 🫳

The code can be downloaded from the GitHub repository and run like any standard Python project. The original data file is quite large and is available on Google Drive: https://drive.google.com/drive/folders/1ivoX5Kiannv-GN9mML8K0n7tHz3baE38?usp=share_link

Contact 📒

  • Guillaume Ferrer: guillaume[dot]ferrer[at]epfl[dot]ch
  • Gustave Paul Besacier: gustave[dot]besacier[at]epfl[dot]ch
  • Eva Perazzi: eva[dot]perazzi[at]epfl[dot]ch
  • Agustina Maria Zein: agustina[dot]zein[at]epfl[dot]ch

Project link: https://github.com/gustavebesacier/MouLa

Acknowledgments 🤗

We thank the FIN-407 team for their support.
