Portfolio

Data Science & Machine Learning
- Credit Card Market Segmentation and Cluster Prediction
- Restaurant Review Rating Classification
Artificial Intelligence (AI)
- Facial Expression Recognition System
- Brain Tumor Detection and Localization
Machine Learning for NLP
Research Paper
- Enhancing GPT-3.5 for Zero-Shot Thai Intent Classification via Cross-Lingual Prompts, Chain-of-Thought, and Self-Consistency
Data Analysis and Visualization
- Netflix Top 10 and Financial Data Analysis
- My Tableau Dashboard
- Google Spreadsheet / Excel
Project Repositories
Data Engineer Workshop

Data Science & Machine Learning Projects

Credit Card Market Segmentation and Cluster Prediction - Explore Project
- Python, Exploratory Data Analysis (EDA), Pandas, Clustering, PCA, Logistic Regression
Restaurant Rating Classification - Explore Project
- Python, BERT, Logistic Regression, Pandas

Artificial Intelligence (AI) Projects

Facial Expression Recognition System - Explore Project
- Python, Exploratory Data Analysis (EDA), Pandas, OpenCV, Image Augmentation, Data Normalization, TensorFlow, CNNs, ResNet
Overview
- A system that automatically monitors people's emotions and expressions based on facial images
  - The dataset comprises 2,000 images with facial key-point annotations and 20,000 facial images, each labeled with a facial expression category.
  - The tasks include detecting facial key points and categorizing each face into one of five emotion categories.
- Tasks:
  - Perform image visualizations to understand the dataset.
  - Perform image augmentation to increase dataset diversity.
  - Conduct data normalization and prepare training data for model training.
  - Build deep Convolutional Neural Network (CNN) and Residual Network (ResNet) models for facial key-point detection.
  - Save the trained model for deployment.
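The augmentation and normalization steps above can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's code: the tiny 2x3 image, the `(x, y)` key-point layout, and the function names are assumptions made for the example. It shows the one detail that is easy to miss when augmenting key-point data: when the image is mirrored, the key-point coordinates must be mirrored too.

```python
def normalize(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def flip_horizontal(image, keypoints):
    """Mirror the image left-right and move the key points with it."""
    width = len(image[0])
    flipped_image = [list(reversed(row)) for row in image]
    # A point in column x lands in column (width - 1 - x) after mirroring.
    flipped_points = [(width - 1 - x, y) for x, y in keypoints]
    return flipped_image, flipped_points


# Hypothetical grayscale image (2 rows, 3 columns) with two key points.
image = [
    [0, 51, 102],
    [153, 204, 255],
]
keypoints = [(0, 1), (2, 0)]  # (x, y) = (column, row)

flipped, flipped_kp = flip_horizontal(image, keypoints)
normalized = normalize(flipped)
print(flipped_kp)
print(normalized)
```

The sketch does not swap left/right key-point labels (for example, left eye and right eye) after the flip; a real pipeline would likely need to handle that as well.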
Brain Tumor Detection and Localization - Explore Project
- Python, Exploratory Data Analysis (EDA), Pandas, scikit-learn, OpenCV, Image Segmentation, TensorFlow, ResNet50, ResUNet
Overview
- Improve the speed and accuracy of brain tumor detection and localization based on MRI scans
  - The data comprises 3,929 brain MRI scans with tumor locations, from https://www.kaggle.com/mateuszbuda/lgg-mri-segmentation
  - The tasks include classification to detect whether a tumor exists and localization of the tumor if it does.
- Tasks:
  - Perform data visualizations to understand the dataset.
  - Train a classifier model to detect whether a tumor is present.
  - Train a ResUNet segmentation model to localize the tumor if one exists.
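
The two-stage flow described above (classify first, segment only when a tumor is detected) can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's implementation: `toy_classify` and `toy_segment` are hypothetical stand-ins for the trained classifier and the ResUNet model, and the 0.5 threshold is an assumed default.

```python
def detect_and_localize(scan, classify, segment, threshold=0.5):
    """Run the classifier first; segment only scans flagged as containing a tumor."""
    tumor_probability = classify(scan)
    if tumor_probability < threshold:
        # No tumor predicted: skip segmentation and return an all-zero mask.
        empty_mask = [[0 for _ in row] for row in scan]
        return False, empty_mask
    return True, segment(scan)


def toy_classify(scan):
    """Stand-in classifier: uses the brightest pixel as a fake tumor probability."""
    return max(pixel for row in scan for pixel in row)


def toy_segment(scan):
    """Stand-in segmenter: marks every pixel brighter than 0.5 as tumor."""
    return [[1 if pixel > 0.5 else 0 for pixel in row] for row in scan]


healthy = [[0.1, 0.2], [0.0, 0.1]]
suspicious = [[0.1, 0.9], [0.0, 0.7]]

print(detect_and_localize(healthy, toy_classify, toy_segment))
print(detect_and_localize(suspicious, toy_classify, toy_segment))
```

Running the cheaper classifier first means the segmentation model is only invoked on scans that are likely to contain a tumor.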

Machine Learning for NLP Projects

Business Idea Generator App (BizGen) using Langchain and Large Language Model (LLM) - Explore Project
- Python, Langchain, OpenAI, LLM
Aspect Category and Polarity Classification - Explore Project
- Python, Pandas, NLTK, spaCy, Logistic Regression, DAN, CNNs
Overview
- Tasks:
  - Built a bag-of-words logistic regression model as a baseline for both sentiment and aspect classification, with features created from the cleaned text.
  - Performed oversampling by replicating the conflict-label examples in the training set, so that this label is better represented.
  - Trained both multi-class and multi-label logistic regression models for aspect classification.
  - For the multi-label setting, trained a separate binary logistic regression model for each aspect and combined their predictions into the final result.
  - For the deep learning models, tried both pre-trained GloVe 300-dimensional word embeddings from stanford.edu and Word2Vec.
  - Built a Deep Averaging Network (DAN) and a Convolutional Neural Network (CNN).
  - Tuned hyperparameters using grid search.
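Two of the steps above, oversampling the conflict label and combining per-aspect binary models, can be sketched in plain Python. This is a simplified illustration, not the project's code: the aspect names, the replication factor, and the keyword-based stand-in models are all hypothetical, and a real setup would use trained logistic regression models in their place.

```python
def oversample(examples, target_label, factor):
    """Repeat each example carrying target_label so it appears `factor` times in total."""
    result = []
    for text, label in examples:
        copies = factor if label == target_label else 1
        result.extend([(text, label)] * copies)
    return result


def predict_aspects(text, binary_models, threshold=0.5):
    """Combine independent per-aspect binary models into one multi-label prediction."""
    return sorted(
        aspect for aspect, model in binary_models.items() if model(text) >= threshold
    )


train = [
    ("great pasta", "positive"),
    ("good food, rude staff", "conflict"),
    ("too pricey", "negative"),
]
balanced = oversample(train, "conflict", factor=3)

# Hypothetical stand-ins: each "model" returns a score in [0, 1] for one aspect.
models = {
    "food": lambda t: 1.0 if ("food" in t or "pasta" in t) else 0.0,
    "service": lambda t: 1.0 if "staff" in t else 0.0,
}

print(len(balanced))
print(predict_aspects("good food, rude staff", models))
```

Training one binary model per aspect lets a single review receive several aspects at once, which a single multi-class model cannot express.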
Next Word Prediction - Explore Project
- Python, TensorFlow, Pandas
  Overview
  - Predict the next word given its first letter
    - The training set is drawn from https://huggingface.co/datasets/gigaword
    - The development set provided for evaluation contains 94,825 rows with 3 columns: 'context', 'first letter', and 'prediction'.
    - The 'first letter' column provides the initial letter of the word to be predicted for each context, while the 'prediction' column contains the actual word to be generated.
  - Tasks:
    - For the trigram model, a counter dictionary is used to count the number of occurrences of each trigram in the training data.
    - The counts are accumulated in small batches, with a batch size of 2048, to accommodate the large size of the training set.
    - Once the counts have been collected, the probability of each word is computed based on its frequency in the training data.
    - The trigram model is then used to predict the next word in a given context by selecting the word with the highest probability.
    - For KenLM (pre-trained 5-gram model), the next word in a given context is generated by looping over each word in the model's vocabulary and selecting the word with the highest probability.
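
The trigram steps above can be sketched with `collections.Counter`. This is a minimal sketch under assumptions, not the project's implementation: whitespace tokenization, lowercasing, the tiny corpus, and the way the first letter is used to filter candidates are all choices made for the example.

```python
from collections import Counter, defaultdict


def iter_batches(sentences, batch_size):
    """Yield consecutive slices so the whole corpus is never processed at once."""
    for start in range(0, len(sentences), batch_size):
        yield sentences[start:start + batch_size]


def train_trigram_counts(sentences, batch_size=2048):
    """Count, for every two-word history, how often each third word follows it."""
    counts = defaultdict(Counter)
    for batch in iter_batches(sentences, batch_size):
        for sentence in batch:
            tokens = sentence.lower().split()
            for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
                counts[(w1, w2)][w3] += 1
    return counts


def predict_next_word(counts, context, first_letter):
    """Return the most probable follower of the last two words that starts with first_letter."""
    tokens = context.lower().split()
    if len(tokens) < 2:
        return None
    followers = counts.get(tuple(tokens[-2:]), Counter())
    candidates = {w: c for w, c in followers.items() if w.startswith(first_letter)}
    if not candidates:
        return None
    total = sum(followers.values())
    # P(word | history) = count / total; the shared total does not change the ranking.
    return max(candidates, key=lambda w: candidates[w] / total)


corpus = [
    "the stock market rose sharply",
    "the stock market fell today",
    "the stock market rose again",
    "the stock price rose",
]
counts = train_trigram_counts(corpus, batch_size=2)
print(predict_next_word(counts, "the stock market", "r"))
print(predict_next_word(counts, "the stock market", "f"))
```

When the history was never seen, or no follower starts with the given letter, the sketch returns `None`; a fuller model would back off to bigram or unigram counts instead.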
Named Entity Recognition (NER) for Thai Language - Explore Project
- Python, scikit-learn, Pandas, pythainlp
  Overview
  - Named Entity Recognition (NER) for Thai Language
    - The training and development data for this project were in Thai, and were first tokenized and separated by '|' using the pythainlp library (newmm dictionary).
    - The resulting text was then tagged with entity types, including 'ORG', 'PER', 'MEA', 'LOC', 'TTL', 'DTM', 'NUM', 'DES', 'MISC', 'TRM', and 'BRN', using 'B_' before each tag.
    - Each word and tag were separated by '\t', while sentences were separated by '\n'.
    - To preprocess the training data for the models, each word and tag in the dataset were split and stored in two separate lists: one for token sequences and one for label sequences.
  - Tasks:
    - Implemented Conditional Random Fields (CRF) with only the word, the previous word, and the next word as features, as a baseline.
    - Added conjunctive features to the model, which took the form of {word i-1 – word i – word i+1}, resembling bigram and trigram features to capture more contextual information about the words.
    - Explored the use of conjunctive part-of-speech (POS) tags as a feature to recognize named entities based on grammatical context, using the pythainlp pos_tag (orchid_ud).
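
The feature sets above can be sketched as a function that turns each token into a dictionary of string features, a format that CRF toolkits commonly accept. This is an illustrative sketch, not the project's code: the feature names, the `<BOS>`/`<EOS>` padding symbols, and the hand-written POS tags in the example are assumptions (in the project, the tags came from pythainlp's `pos_tag`).

```python
def token_features(tokens, pos_tags, i):
    """Build baseline and conjunctive features for the token at position i."""
    last = len(tokens) - 1
    prev_word = tokens[i - 1] if i > 0 else "<BOS>"
    next_word = tokens[i + 1] if i < last else "<EOS>"
    prev_pos = pos_tags[i - 1] if i > 0 else "<BOS>"
    next_pos = pos_tags[i + 1] if i < last else "<EOS>"
    return {
        # Baseline: the word itself and its immediate neighbours.
        "word": tokens[i],
        "prev_word": prev_word,
        "next_word": next_word,
        # Conjunctive word features, resembling bigrams and trigrams.
        "word_bigram": f"{prev_word}|{tokens[i]}",
        "word_trigram": f"{prev_word}|{tokens[i]}|{next_word}",
        # Conjunctive POS feature capturing grammatical context.
        "pos_trigram": f"{prev_pos}|{pos_tags[i]}|{next_pos}",
    }


def sentence_features(tokens, pos_tags):
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]


# Hypothetical tokenized sentence with hand-written POS tags for illustration.
tokens = ["นาย", "สมชาย", "ไป", "กรุงเทพ"]
pos_tags = ["NOUN", "PROPN", "VERB", "PROPN"]

features = sentence_features(tokens, pos_tags)
print(features[1])
```

Joining neighbouring words into one string lets a linear model weight the specific combination, rather than each word independently.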

Research Paper

Enhancing GPT-3.5 for Thai Intent Classification via Cross-Lingual Prompts, Chain-of-Thought, and Self-Consistency - Read Paper
- Proposes a method for Thai intent classification using Large Language Models (LLMs) and cross-lingual techniques.
- Enhances classification performance by prompting GPT-3.5 in English rather than Thai.
- Our Cross-Lingual Chain-of-Thought prompt template (XCoT) integrates role-assigning and cross-lingual steps and improves LLM performance, surpassing standard prompts.
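
The self-consistency idea named in the title can be sketched as a majority vote over several sampled answers. This is a generic illustration, not the paper's code: the prompt wording in `build_prompt` is a hypothetical template (not the XCoT template itself), the intent names are made up, and `fake_sampler` is a stand-in for a call to the LLM that returns one final intent per sample.

```python
from collections import Counter


def build_prompt(thai_utterance, intents):
    """Hypothetical English prompt with a role, a translation step, and step-by-step reasoning."""
    return (
        "You are an expert intent classifier.\n"
        f"Utterance (Thai): {thai_utterance}\n"
        "Step 1: Translate the utterance into English.\n"
        "Step 2: Reason step by step about what the user wants.\n"
        f"Step 3: Answer with exactly one intent from: {', '.join(intents)}."
    )


def self_consistent_intent(sample_answer, prompt, n_samples=5):
    """Sample several answers and return the majority intent with its vote share."""
    votes = Counter(sample_answer(prompt) for _ in range(n_samples))
    intent, count = votes.most_common(1)[0]
    return intent, count / n_samples


# Stand-in for the LLM: replays canned answers instead of sampling reasoning paths.
canned = iter(["check_balance", "transfer", "check_balance", "check_balance", "transfer"])


def fake_sampler(prompt):
    return next(canned)


prompt = build_prompt("ขอเช็คยอดเงินหน่อย", ["check_balance", "transfer"])
print(self_consistent_intent(fake_sampler, prompt))
```

The vote share gives a rough confidence signal: a narrow majority suggests the sampled reasoning paths disagree.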

thitirat-mnc/DataSci-ML-Portfolio
