I'm a passionate Data Scientist currently pursuing a Master of Data Science at the University of California, Irvine. With a robust background in Computer Engineering, I specialize in transforming data into actionable insights and building models to solve real-world problems. My expertise lies in Machine Learning, Deep Learning, and Computer Vision.
"Data is the new oil, but it’s only valuable if you can extract meaning from it."
- 🔭 I’m currently developing advanced machine learning models and deepening my knowledge of data science.
- 🌱 I’m currently learning about Big Data technologies and cloud computing with AWS.
- 👯 I’m looking to collaborate on innovative data science projects and open-source contributions.
- 💬 Ask me about machine learning algorithms, data analysis techniques, and statistical modeling.
- 📫 How to reach me: pbgupta@uci.edu
- Programming Languages: Python, C++, MATLAB
- Web Development: HTML, CSS, Bootstrap, Flask
- Python Libraries and Frameworks: Pandas, NumPy, Matplotlib, TensorFlow, OpenCV, Scikit-learn, Keras
- Databases: MySQL, PostgreSQL
- Data Science: Data Analysis, Machine Learning, Deep Learning, LLMs, Generative AI
- Statistical Analysis: Strong applied statistics skills, including distributions, statistical testing, and regression
- Business Intelligence: Tableau, Microsoft Excel, AWS
- Big Data Technologies: Apache Spark
An AI-driven storytelling assistant that uses Retrieval-Augmented Generation (RAG) and the Gemini model to produce personalized, interactive narratives. Integrated DALL-E for visual storytelling within a Streamlit interface, enhancing user engagement by 60%.
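The retrieval step behind RAG can be illustrated with a minimal sketch. This is not the project's code: it assumes a plain bag-of-words cosine similarity in place of a real embedding model, and the names `retrieve` and `build_prompt` are hypothetical. The idea is only that the passages most similar to the request are looked up and prepended to the prompt before it goes to the language model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query (hypothetical helper)."""
    q = Counter(tokenize(query))
    scored = [(cosine_similarity(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(query, passages, k=1):
    """Prepend the retrieved context to the request (hypothetical prompt format)."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nRequest: {query}"
```

A production system would typically swap the bag-of-words scoring for dense embeddings and a vector store, but the retrieve-then-generate flow stays the same.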
Developed a chatbot leveraging LLaMA and BERT to deliver personalized, sentiment-aware financial advice. Achieved a 20% improvement in accuracy and deployed a real-time application with Streamlit.
A custom MCQ generator that uses LLMs through LangChain to automate question creation for educational purposes.
Engineered a recommendation system using Autoencoders and PCA to improve user preference predictions by 20%. Improved generalization with batch normalization and dropout, and deployed the model via Flask.
Created a high-accuracy (90%+) brain tumor classification model using ResNet and VGG-16. Fine-tuned with TensorFlow and served through Flask for real-time predictions, leveraging advanced preprocessing and data augmentation strategies.
A tool to estimate salaries for data science roles using web-scraped data from Glassdoor, with features like job title, location, and company. Implemented data cleaning, exploratory analysis, and machine learning models (Random Forest, Lasso, Linear Regression), achieving the best performance with Random Forest (MAE: 11.14).
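For readers unfamiliar with the metric, here is a minimal sketch of how models can be ranked by mean absolute error (MAE), the average absolute gap between predicted and actual values. It is not the project's code: the helper names are hypothetical and the numbers in the demo are made up purely for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def best_model(y_true, predictions):
    """Return (name of the lowest-MAE model, dict of all MAE scores).

    `predictions` maps a model name to its list of predicted values.
    """
    scores = {
        name: mean_absolute_error(y_true, y_pred)
        for name, y_pred in predictions.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up values for illustration only, not results from the project.
    actual = [100, 120, 90]
    demo_predictions = {
        "random_forest": [105, 115, 95],
        "lasso": [120, 100, 110],
        "linear_regression": [110, 135, 80],
    }
    name, scores = best_model(actual, demo_predictions)
    print(name, scores)
```

A lower MAE means predictions sit closer to the actual values on average, which is why the model with the smallest score is selected.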
Repository containing PDF guides explaining key AWS services (Lambda, API Gateway, RDS, VPC) and their interactions, offering a foundational understanding of cloud service integration.
Data Scientist with a solid background in computer engineering, experienced in transforming data into actionable insights and building models to solve complex problems. Proven track record of improving model accuracy and optimizing algorithms. Skilled in Python, SQL, and various data science tools and technologies.
- Kaggle: Google KaggleX BIPOC Mentee
- Contributed to open-source projects like SENMETER for sentiment analysis.
- 📖 A big fan of fantasy novels, always on the lookout for new book recommendations.