paulojunqueira/Data-Science-Study-and-Projects

Data Science Study and Projects

This repo contains a list and summary of projects that I have done in the context of Data Science. This repository is constantly being built and updated.

Summary


Large Language Models - LLM

NLP

TTS - Text to Speech

  • Naive Video Voice Over TTS and Transcriptions
    • Kaggle Notebook project that extracts audio transcriptions, translates the text, generates the translated audio, and creates a voice-over video
    • Keywords: TTS | Hugging Face | Whisper | Facebook MMS-TTS | Translating

Computer Vision

  • Vehicle Tracking Time in Parking Lot

    • In this Kaggle Notebook, YOLO V8 tracking is used to track vehicles and time how long they stay in a parking lot. A simple color mapping is also created to infer the vehicle color.
    • Keywords: YOLO V8 | Tracking | Python | Image Detection
  • Autolabelling with Autodistill and GroundedSAM

    • In this Kaggle Notebook, to test the autodistill library, a video was broken into frames that were passed to the zero-shot object detection model GroundedSAM to automatically label the images and create a dataset
    • Keywords: Autodistill | GroundedSAM | Python | Image Detection
  • Zero-shot Object Detection with GroundingDINO

    • In this Kaggle Notebook, the zero-shot object detection model GroundingDINO is tested. It turns a text description into object detections in an image without training
    • Keywords: GroundingDINO | Python | Image Detection
  • Detecting and Counting Vehicles with Yolo V8 - Notebook

    • In this Kaggle Notebook, a Yolo V8 model was used to detect vehicles in a video. After detection, the number of vehicles that pass in each direction (up/down) is counted.
    • Keywords: Yolo V8 | Python | Image Detection
  • Detecting and Tracking People in a ROI - Notebook

    • In this Kaggle Notebook, a Yolo V8 model was used to detect and track people passing through a region of interest (ROI).
    • Keywords: Yolo V8 | Python | Image Detection | Tracking | Detection | ROI
  • Autoencoder with MNIST

    • In this notebook, an Autoencoder is implemented in PyTorch. It is then used to reconstruct the input MNIST dataset images from the learned features, going from random noise to MNIST-like images. The objective is to learn how the architecture works, as it can be used for dimensionality reduction, anomaly detection and more.
    • Keywords: Autoencoder | Dimensionality Reduction | Pytorch | Neural Network
  • RBM Boltzmann Experiment

    • In this notebook, a Restricted Boltzmann Machine (RBM) network is used to learn how to reconstruct its input: images from the Dogs vs. Cats and MNIST datasets. RBMs are used for many purposes, such as dimensionality reduction, regression and generative modeling.
    • Keywords: Image | RBM | Dimensionality Reduction | Neural Network
  • Audio to Image Pipeline - BirdCLEF 2022

    • This notebook implements one of the pipelines for transforming audio into spectrograms developed for the BirdCLEF 2022 competition, where our team achieved a Bronze medal, finishing in 68th position.
    • Keywords: Sound | Image | Transformation | Pipeline | Competition
  • CNN for MNIST with Pytorch and Transfer Learning (timm)

    • This notebook applies transfer learning with the PyTorch and timm libraries to a classification task on the MNIST dataset
    • Keywords: Classification | Image | Pytorch | CNN | timm | Transfer Learning
  • CNN for MNIST with Pytorch

    • This notebook explores the PyTorch library to develop a CNN model for the MNIST classification task
    • Keywords: Classification | Image | Pytorch | CNN
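
As an illustration of the up/down counting described in the vehicle counting notebook above, here is a minimal sketch of line-crossing logic. It is not the notebook's code: the function name `count_crossings`, the input format, and the counting line are assumptions made for this example, and the tracked centroids are assumed to come from a tracker such as the one in YOLO V8.

```python
def count_crossings(tracks, line_y):
    """Count tracked objects that cross a horizontal line, per direction.

    tracks: iterable of (frame_index, track_id, center_y) tuples, with the
            observations of each track_id given in chronological order
            (hypothetical format, chosen for this sketch).
    line_y: y coordinate (in pixels) of the counting line.
    Image y grows downward, so an increasing y means moving "down".
    """
    last_y = {}       # last seen center_y for each track_id
    counted = set()   # track ids that already crossed once
    counts = {"up": 0, "down": 0}

    for _frame, track_id, y in tracks:
        prev = last_y.get(track_id)
        last_y[track_id] = y
        if prev is None or track_id in counted:
            continue
        if prev < line_y <= y:        # was above the line, now on or below it
            counts["down"] += 1
            counted.add(track_id)
        elif prev >= line_y > y:      # was on or below the line, now above it
            counts["up"] += 1
            counted.add(track_id)
    return counts


# Three tracked vehicles: one moving down, one moving up, one never crossing.
example_tracks = [
    (0, 1, 90), (0, 2, 120), (0, 3, 50),
    (1, 1, 110), (1, 2, 95), (1, 3, 60),
]
example_counts = count_crossings(example_tracks, line_y=100)
```

Each track is counted at most once, which avoids double counting when a detection jitters around the line.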

Classification

  • Training Pipeline for AMEX Kaggle Competition

    • In this Kaggle Notebook, a complete pipeline was created for the AMEX Competition. The goal of the competition was to predict default. Different models and variables can be used, with feature engineering, tuning and out-of-fold (OOF) analysis. The main challenge of the competition was coping with the huge size of the data.
    • Keywords: Classification | GPU Processing | CATBOOST | XGB | TABNET | LOGIT | LGBM | Feature Engineering | Tuning
  • Catboost and SpaceShip Dataset

    • Introductory notebook on the SpaceShip dataset. EDA, feature engineering, tabular data augmentation, and Catboost model
    • Keywords: Classification | Data Augmentation | EDA
  • Simple Models Comparison for Titanic Dataset

    • Introductory notebook exploring the classic Titanic dataset. EDA, feature engineering and classification models comparison
    • Keywords: Classification | Ensemble | EDA

Python and PySpark

  • Ensemble Optimization with GA

    • In this notebook, a Genetic Algorithm (GA) is implemented to optimize how the out-of-fold (OOF) prediction files are combined, for better performance in an ensemble
    • Keywords: Genetic Algorithm | Optimization | Ensemble | OOF
  • Hands on Pyspark Overview

    • Notebook that compiles plenty of PySpark functions and transformations
    • Keywords: PySpark | Python | Pandas | Functions | Aggregations | Distributed Computation
  • Custom MLP

    • Notebook that implements a custom multilayer perceptron (MLP)
    • Keywords: Python | Neural Network | Numpy
  • Gradient Descent Experiments

    • Notebook with experiments and visualizations of Gradient Descent
    • Keywords: Python | Visualizations | Gradient Descent | Numpy
  • A Pathfinder Algorithm

    • A repository containing the implementation of an algorithm called Pathfinder. The pathfinder finds the least-cost path from A to B.
    • Keywords: Python | Numpy | Optimization
  • N Queen Problem and EA

    • This repo implements an Evolutionary Algorithm to solve an optimization problem called N-Queens. In this problem, N chess queens have to be placed on an NxN board so that none of them can attack another.
    • Keywords: Optimization | Evolutionary Algorithms | Python | N-Queen
  • Repo with Many other DS Developments and ML implementations

    • GitHub repository with many other small projects and applications, for instance a K-means implementation, regression analysis, SVM, decision trees, etc.
    • Keywords: Machine Learning | Data Science | Python
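
To make the N-Queens objective above concrete, here is a minimal sketch of a fitness function that an evolutionary algorithm could minimize. It is an illustration only, not the code from the repo: the name `attacking_pairs` and the one-queen-per-column encoding are assumptions made for this example.

```python
from itertools import combinations


def attacking_pairs(board):
    """Return the number of queen pairs that attack each other.

    board[i] is the row of the queen placed in column i, so there is
    exactly one queen per column (hypothetical encoding for this sketch).
    A board with 0 attacking pairs is a valid N-Queens solution.
    """
    conflicts = 0
    for (c1, r1), (c2, r2) in combinations(enumerate(board), 2):
        same_row = r1 == r2
        same_diagonal = abs(r1 - r2) == abs(c1 - c2)
        if same_row or same_diagonal:
            conflicts += 1
    return conflicts


solved = attacking_pairs([1, 3, 0, 2])      # a valid 4-queens placement
worst_row = attacking_pairs([0, 0, 0, 0])   # all queens in the same row
```

With this encoding, column conflicts are impossible by construction, so only rows and diagonals need to be checked.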

EDA

  • EDA ML Olympiad - Toxic Language 2024

    • Exploratory Data Analysis for the ML Olympiad - Toxic Language (PTBR). Data from Twitter (X) with PTBR tweets classified as toxic or non-toxic language.
    • Keywords: Classification | NLP | Competition | EDA | Visualization
  • EDA and Web Scraping for BirdCLEF 2023 Competition

    • Exploratory Data Analysis for the BirdCLEF 2023. Analysis of most/least common bird sounds, locations and other information. Sound to spectrogram transformation and analysis. A web scraper was also created to look for additional information about the birds on their Wikipedia pages
    • Keywords: Web Scraping | Sound Classification | Spectrogram | Competition | EDA | Visualization
  • Brief Analysis for AMEX Competition

    • Exploratory Data Analysis for the AMEX 2022 Competition.
    • Keywords: EDA | Classification | Competition
  • EDA for BirdCLEF 2022 Competition

    • Exploratory Data Analysis for the BirdCLEF 2022. Analysis of most/least common bird sounds, locations and other information. Sound to spectrogram transformation and analysis.
    • Keywords: Sound Classification | Spectrogram | Competition | EDA | Visualization
  • EDA for ENEM Data (in PT-BR)

    • This study presents a statistical analysis of the ENEM data. It was written in PT-BR and is one of the first studies that I did.
    • Keywords: Statistical Analysis | Academic Data | EDA
  • Used Car Sales Analysis (in PT-BR)

    • This study presents an analysis and some basic model development for a used car sales dataset. It was written in PT-BR and is one of the first studies that I did.
    • Keywords: Python | Analysis | Visualizations | EDA | Models
