Staff Data Scientist with 11+ years of experience delivering data-driven solutions that improve the efficiency, accuracy, and utility of data processing. Proficient in building Natural Language Understanding and Natural Language Generation models (NLP = NLU + NLG). Looking for open-source collaborations and opportunities to work on productive remote teams.
Languages
: Python, C++, SQL
Developer Tools
: Jupyter, VS Code, Git, Confluence, Jira, Azure/AWS/GCP
ML Tech
: NLP (NLU, NLG), RNN, LSTM, BERT, GPT, CNN, Transformers, Hugging Face, fastai
ML Tools
: MLflow, Grafana, Prometheus, Gradio, Weights & Biases
Netacad Copilot - Engineering Technical Leader, Generative AI
Enhanced the Netacad platform with Generative AI capabilities that let course learners ask questions about course topics and receive answers drawn from course sections, using Retrieval-Augmented Generation (RAG) with LLMs such as Anthropic Claude Sonnet 3.4, Llama 3.1, and Mistral 7B.
- Applied prompt tuning and prompt engineering techniques, using Pydantic, to retrieve information from an OpenSearch database.
- Built a monitoring stack for LLM responses using Apache Superset and a PostgreSQL database.
- Added translation capabilities by building an LLM translation wrapper around the Mixtral 45B model.
- Developed a data pipeline that moves course data from GitLab repositories to OpenSearch, using PostgreSQL as an intermediate store and the PGSync connector.
- Added security-level access controls on data resources for course learners and course instructors.
GenosAI - Senior Staff, Generative AI
Enabled Retrieval-Augmented Generation (RAG) search over clinical trial, healthcare professional (HCP), and institution data using LLMs such as OpenAI GPT-4, Llama 2, and Mistral.
- Applied prompt tuning and prompt engineering techniques, using Pydantic, to generate Elasticsearch queries.
- Created document embeddings and stored them in Elasticsearch to enable k-nearest neighbors (KNN) search over the embeddings.
- Developed FastAPI APIs and exposed them on Amazon Elastic Kubernetes Service (EKS).
- Conducted code reviews and built CI/CD pipelines with CircleCI.
- Set up the infrastructure and deployed the app on Amazon EKS.
PII Classification
April 2022 - April 2023
Senior Staff, Machine Learning
- Implemented a PII/PHI/PCI detection strategy across multiple communication channels and file types using named entity recognition (NER) techniques.
- Reduced the false positive rate from 15 percent to 7 percent using semi-supervised learning techniques and architectures such as Longformer.
- Designed structured PII identification using character-level CNN models and integrated it with the existing regex-based system.
DocZ Document AI Assistant - Product Engineering AI/NLP Expert
- Enhanced the in-house DocZ product to condense clinical study report information with NLP actions, using techniques such as named entity recognition (NER with scispaCy and Microsoft Text Analytics for health).
- Condensed clinical study report documents by 75 percent using one-shot summarization based on Universal Sentence Encoder embeddings.
- Improved extraction of measurement tables from irregular RTF files to Excel by 95 percent using the tabula Python module.
Fraud Detection - Machine Learning/NLP Consultant
- Reduced fraud by 8 percent by implementing a gradient-boosted tree model.
- Lowered the client's key metric, the false positive to true positive ratio, from 6.5 to below 4.
Complaint Categorization
- Automated the previously manual complaint categorization process using TF-IDF, text analytics, and logistic regression, achieving an F1 score of 0.8 at each categorization level.
- Reduced the time to categorize 1,000 complaints from 20 business hours to 2 minutes.
Question Generation Wizard - Software NLP Engineer
- Automated generation of FAQ questions from a given answer and context using LSTM/RNN encoder-decoder deep learning models.
- Reduced FAQ question creation time by 80 percent compared with a subject matter expert.
Ticket Classification
- Used Azure Machine Learning to classify incoming software/hardware tickets into issue categories based on their email descriptions, using an ensemble of models including logistic regression, boosted decision trees, and naive Bayes.
- Reduced ticket classification time to 10 ms.
Forecasting Consumer Goods
- Converted Alteryx forecasting workflows for the top 8 products to R.
- Reduced forecasting time by 63 percent and increased the client's revenue by 29 percent.
Indian School of Business
Business Analytics Graduate Certificate
V.R. Siddhartha Engineering College
Bachelor's in Electronics and Communication Engineering