About Me
Graduate student in Big Data Analytics and Data Science at San Diego State University with 3+ years of professional experience in data science. Currently working as a Data Science Research Intern on the Automobile Insurance team at AAA. Demonstrated competency in applying advanced data mining, statistical analysis, and modeling techniques to generate insights and advance business objectives. Applied advanced machine learning, deep learning, and NLP algorithms in coursework and personal projects. Seeking to leverage expertise in Python, ML, and NLP to uncover actionable insights and gain meaningful experience in data science.
Skills and Interests:
- Statistical Computing Languages – Python, R, SAS, Stata
- Machine Learning Algorithms – Linear & Logistic Regression, Support Vector Machine, Clustering, Decision Trees, Random Forest, XGBoost
- Deep Learning (PyTorch) – CNN, RNN, LSTM, Transformers, LLMs
- Statistical Skills – Regression, Classification, Clustering, PCA, Time Series, Statistical Testing, Multivariate Statistics
- Visualization and Reporting – MS Excel, Tableau, PowerBI
- Database Querying – SQL, Snowflake
News
- [May. 2024] Started internship at AAA - Data Science Research
- [May. 2024] Graduate Research Excellence Award at SDSU Student Symposium (Link)
- [Aug. 2023] Started MS in Data Science at SDSU
- [May. 2022] Awarded Star of Business Service at HSBC
- [Sep. 2021] Promoted to Senior Analyst at HSBC
- [May. 2020] Awarded Team Star at HSBC
Projects
-
Personal Project
Wine reviews were used to predict the type of wine, with the goal of building a wine recommendation system based on a user's description of their taste preferences. Training was performed on an imbalanced dataset using classification algorithms such as SVM, Naive Bayes, and Random Forest. Neural networks (CNN, RNN, and LSTM) and transformer language models (DistilBERT and RoBERTa) were also used, with Natural Language Processing applied to clean and tokenize the unstructured data. Error analysis was done using SHAP.
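One common way to handle an imbalanced label distribution like the one mentioned above is to weight each class inversely to its frequency. The snippet below is a minimal sketch of that idea in plain Python. The project description does not say which balancing technique was actually used, so this is an illustrative assumption, and the wine labels and the `balanced_class_weights` helper are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes get larger weights, so each class contributes the same
    total weight to the training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, deliberately imbalanced wine-variety labels (not project data).
labels = ["Pinot Noir"] * 6 + ["Riesling"] * 3 + ["Malbec"] * 1
weights = balanced_class_weights(labels)

for variety, weight in sorted(weights.items()):
    print(f"{variety}: {weight:.3f}")
```

Weights of this form can typically be passed to a classifier's class-weight option, so that errors on rare wine types count more during training.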
-
Group Project
Deep learning and Natural Language Processing were used to develop an automated system that accurately recognizes a wide range of fruits and vegetables from images using a Vision Transformer. The methodology uses transfer learning to fine-tune pre-trained vision models on the Fruits and Vegetables dataset from Kaggle, followed by NLP to offer personalized recipe recommendations based on the identified ingredients.
-
Personal Project
This project uses unsupervised machine learning to group Reddit text and identify major conspiracy theories using NLP, LDA, spaCy, SVD, SBERT embeddings, and HDBSCAN.
-
Personal Project
This project uses LAPD crime data to train a model on crime type, location, premise, and time of occurrence, along with victim details. Supervised machine learning models such as Random Forest and XGBoost were used to predict whether a given crime will remain unresolved, along with the probability of resolution.
-
Professional Project
HSBC - UK Transaction Risk Management for Invoice Financing Clients
Project undertaken as part of a risk mitigation initiative at the onset of COVID-19. Data mining was done using SQL queries and Excel across all transactional and behavioral information for the customer base from 2007 onwards. Portfolio-level analysis was performed to determine risk appetite, taking into account trends in macroeconomic metrics. Customers were segmented using machine learning in Python on the basis of probability of default, subject to the loan amount requested. Strategy rules were developed with an 85% accuracy score. An automated decision engine for recessionary credit request approval was implemented, followed by rigorous testing and model documentation appropriate for risk models.
Benefit: ~35 million GBP
-
Professional Project
HSBC - UK Transaction Risk Management for Invoice Financing Clients
The HSBC business required an automated framework to raise early triggers for disruptive patterns in customer transaction behaviour. Data mining was done using SQL queries across all transactional and behavioural information of customers. Data engineering was done to create metrics that track 3-monthly and 6-monthly customer behaviour. The trigger framework was developed using ensemble machine learning algorithms in Python with ~80% prediction accuracy. Testing and model documentation were completed as appropriate for risk models.
Benefit: ~3 million GBP
-
Professional Project
AAA - Data Science Research
The project involved model development for risk segmentation of electric vehicles (EVs) using statistical analysis and machine learning. Collaborated with the team to extract data using Snowflake and develop an understanding of the insurance domain. Performed extensive exploratory data analysis to identify a new feature for EV loss data modeling. Developed an ensemble machine learning model in Python on AWS to understand the impact of the newly identified feature. Incorporated the feature into the existing segmentation model, achieving a considerable improvement in model performance (Gini increase of 11.6%). Presented insights to business leaders and cross-functional teams for implementation.
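For readers unfamiliar with the Gini metric cited above, the sketch below shows one common definition used for scoring models, Gini = 2 × AUC − 1, and how a relative change between two models could be computed. This is an illustrative assumption: the project description does not state which Gini variant was used or whether the 11.6% figure is a relative or absolute change. The scores and labels are made-up toy values, not project data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini coefficient of a scoring model, defined here as 2 * AUC - 1."""
    return 2 * auc(scores, labels) - 1


def relative_change(old, new):
    """Relative change from old to new, as a fraction of old."""
    return (new - old) / old


# Toy data: 1 = claim with loss, 0 = no loss (hypothetical).
labels = [0, 0, 1, 0, 1, 1]
baseline_scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90]
enhanced_scores = [0.10, 0.40, 0.45, 0.60, 0.65, 0.90]

gini_before = gini(baseline_scores, labels)
gini_after = gini(enhanced_scores, labels)
print(f"Gini before: {gini_before:.3f}, after: {gini_after:.3f}")
print(f"Relative change: {relative_change(gini_before, gini_after):.1%}")
```

The pairwise AUC is quadratic in the number of records, which is fine for a toy example. A rank-based computation would be the usual choice on real loss data.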
Experience
AAA - Data Science Research Intern
MAY 2024 - PRESENT
- Worked on a model refresh for risk segmentation of electric vehicles (EVs) using statistical analysis and machine learning
- Collaborated with the team to extract data using Snowflake and develop an understanding of the insurance domain
- Performed extensive exploratory data analysis to identify a new feature for EV loss data modeling
- Developed an ensemble machine learning model in Python on AWS to understand the impact of the newly identified feature
- Incorporated the feature into the existing segmentation model to achieve a Gini improvement of 11.6%
- Presented insights to business leaders and cross-functional teams for implementation
HSBC - Senior Analyst
SEP 2021 - AUG 2022
- Managed projects of small to medium complexity, communicating analytical solutions effectively to business heads
- Defined business problems, collected the required data, and analyzed the results to synthesize compelling insights
- Developed advanced analytics solutions, including forecasting, predictive modeling, clustering, and prescriptive analytics, to solve business problems using machine learning in Python
- Collaborated with cross-functional stakeholders to understand their business needs and formulated end-to-end analyses
HSBC - Analyst
JUL 2019 - AUG 2021
- Performed exploratory data analysis on sales and revenue data, followed by forecasting using machine learning in R, RShiny, and Python
- Built reusable and maintainable models that handle large amounts of data
- Implemented data-driven solutions using reports and visualizations in Excel and Tableau to communicate data insights to stakeholders
- Assisted with pulling data from the data lake, followed by data engineering and reduction
- Developed professional competency in MS Office tools such as PowerPoint, Word, Outlook, and Teams
Education
San Diego State University, CA, United States
MS in Big Data Analytics and Data Science
AUG 2023 - PRESENT
GPA - 4.0
CS 549 - Machine Learning
CS 553 - Neural Networks (Current)
CS 561 - Deep Learning for Natural Language Processing (Current)
LING 583 - Statistical Methods for Text Analysis
BDA 594 - Big Data Science and Analytics Platforms
BDA 602 - Machine Learning Engineering
BDA 696 - Advanced Special Topics in Big Data Analytics
BDA 797 - Research (Current)
University of Calcutta, India
MSc in Economics (Econometrics)
AUG 2017 - AUG 2019
GPA - 3.8
Awards
- Graduate Research Excellence Award, 2024 (SDSU) - Across all departments of the university
- Star of Business Service, 2022 (HSBC) - Company Level (1 among 12000+ employees)
- Team Star, 2020 (HSBC) - Organisation Level (1 among 2000+ employees)