
Anurima Saha

MS - Data Science
San Diego State University (SDSU)
Data Science Research Intern at AAA
3+ years of professional experience in ML

asaha8669 (at) sdsu.edu


About Me

Graduate student in Big Data Analytics and Data Science at San Diego State University with 3+ years of professional experience in data science. Currently working as a Data Science Research Intern with the Automobile Insurance team at AAA. Demonstrated competency in applying advanced data mining, statistical analysis, and modelling techniques to generate insights and drive business objectives. Applied Machine Learning, Deep Learning and NLP algorithms in coursework and personal projects. Seeking to leverage expertise in Python, ML and NLP to uncover actionable insights and gain meaningful experience in data science.

Skills and Interests

News

Projects

  1. Personal Project
    Wine reviews were used to predict the type of wine, with the goal of building a wine recommendation system based on a user's description of their taste preferences. Training was performed on an imbalanced dataset using classification algorithms such as SVM, Naive Bayes and Random Forest. Neural networks (CNN, RNN and LSTM) and pre-trained transformer models (DistilBERT and RoBERTa) were also used, with Natural Language Processing applied to clean and tokenize the unstructured text. Error analysis was performed using SHAP.

  2. Group Project
    Deep learning and Natural Language Processing were used to develop an automated system that recognizes a wide range of fruits and vegetables from their images using a Vision Transformer. Transfer learning was used to fine-tune pre-trained vision models on the Fruits and Vegetables dataset from Kaggle, and NLP was then used to offer personalized recipe recommendations based on the identified ingredients.

  3. Personal Project
    This project uses unsupervised machine learning to cluster Reddit posts and identify major conspiracy theories using NLP, LDA, spaCy, SVD, SBERT embeddings and HDBSCAN.

  4. Personal Project
    This project uses LAPD crime data to train models on crime type, location, premise and time of occurrence, along with victim details. Supervised machine learning models such as Random Forest and XGBoost were used to predict whether a crime will remain unresolved, along with the probability of resolution.

  5. Professional Project
    HSBC - UK Transaction Risk Management for Invoice Financing Clients
    Project undertaken as part of a risk mitigation initiative at the onset of COVID-19. Data mining was performed using SQL queries and Excel on all transactional and behavioral information for the customer base from 2007 onwards. Portfolio-level analysis was performed to determine risk appetite, taking into account trends in macroeconomic metrics. Customer segmentation was performed using machine learning in Python based on probability of default, conditioned on the loan amount requested. Strategy rules were developed with an 85% accuracy score. An automated decision engine for credit request approval during the recession was implemented, followed by rigorous testing and model documentation appropriate for risk models.
    Benefit: ~35 million GBP

  6. Professional Project
    HSBC - UK Transaction Risk Management for Invoice Financing Clients
    The HSBC business required an automated framework to raise early-warning triggers for disruptive patterns in customer transaction behaviour. Data mining was performed using SQL queries on all transactional and behavioural information of customers. Data engineering was done to create metrics tracking 3-monthly and 6-monthly customer behaviour. The trigger framework was developed using ensemble machine learning algorithms in Python, with ~80% prediction accuracy. Testing and model documentation were completed as appropriate for risk models.
    Benefit: ~3 million GBP

  7. Professional Project
    AAA - Data Science Research
    The project involved model development for risk segmentation of electric vehicles (EVs) using statistical analysis and machine learning. Collaborated with the team to extract data from Snowflake and build an understanding of the insurance domain. Performed extensive exploratory data analysis to identify a new feature for EV loss data modeling. Developed an ensemble machine learning model in Python on AWS to assess the impact of the newly identified feature. Incorporated the feature into the existing segmentation model, improving model performance (Gini increase of 11.6%). Presented insights to business leaders and cross-functional teams for implementation.
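The AAA project reports its improvement as a Gini increase. A common way insurance teams compute this is a normalized Gini from the ordered Lorenz curve: rank records by predicted risk and measure how quickly actual losses accumulate compared with a random ranking. The sketch below is a pure-Python version of that common definition; the exact Gini variant used in the project is not stated, so this is an illustration under that assumption, not the project's metric code. The function names `gini` and `normalized_gini` are hypothetical. For a binary outcome with no tied scores, the normalized value equals 2 * AUC - 1.

```python
from typing import Sequence


def gini(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Unnormalized Gini from the ordered Lorenz curve (illustrative sketch).

    Rows are ranked by predicted risk (highest first, ties kept in original
    order), and the running share of actual loss is compared with the share a
    random ranking would accumulate.
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    total = float(sum(actual))
    if total == 0:
        raise ValueError("total actual loss must be non-zero")
    order = sorted(range(n), key=lambda i: (-predicted[i], i))
    cumulative = 0.0
    lorenz_sum = 0.0
    for i in order:
        cumulative += actual[i]
        lorenz_sum += cumulative / total
    # A random ranking accumulates k/n of the loss after k rows, summing to (n + 1) / 2.
    return (lorenz_sum - (n + 1) / 2) / n


def normalized_gini(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Model Gini divided by the Gini of a perfect ranking (1.0 perfect, around 0 random)."""
    best = gini(actual, actual)
    if abs(best) < 1e-12:
        raise ValueError("actual values must vary to normalize the Gini")
    return gini(actual, predicted) / best


if __name__ == "__main__":
    losses = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.3, 0.1]
    print(normalized_gini(losses, scores))  # prints 0.5
```

Comparing `normalized_gini` for a model with and without a candidate feature on the same hold-out data is one way to quantify a feature's contribution.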
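The HSBC early-trigger project (project 6) mentions metrics that track 3-monthly and 6-monthly customer behaviour. A minimal sketch of that kind of feature, assuming one client's transaction totals are available as a list of monthly values: trailing 3- and 6-month averages and their ratio, where a ratio well below 1.0 signals a recent drop in activity. The function names, the ratio feature and the 0.7 threshold in `flag_drop` are hypothetical; the project itself used ensemble models for the triggers, which this sketch does not attempt.

```python
from statistics import mean
from typing import Dict, List, Optional


def rolling_mean(values: List[float], window: int, end: int) -> Optional[float]:
    """Mean of the `window` months ending at index `end` (inclusive),
    or None if there is not yet enough history."""
    if end + 1 < window:
        return None
    return mean(values[end + 1 - window : end + 1])


def behaviour_metrics(monthly_totals: List[float]) -> List[Dict[str, Optional[float]]]:
    """For each month, compute trailing 3- and 6-month averages and their ratio."""
    rows = []
    for month in range(len(monthly_totals)):
        avg_3m = rolling_mean(monthly_totals, 3, month)
        avg_6m = rolling_mean(monthly_totals, 6, month)
        ratio = None
        if avg_3m is not None and avg_6m:  # skip missing or zero baselines
            ratio = avg_3m / avg_6m
        rows.append({"month": month, "avg_3m": avg_3m, "avg_6m": avg_6m, "ratio_3m_6m": ratio})
    return rows


def flag_drop(rows: List[Dict[str, Optional[float]]], threshold: float = 0.7) -> List[int]:
    """Hypothetical rule: flag months where the 3m/6m ratio falls below `threshold`."""
    return [
        r["month"]
        for r in rows
        if r["ratio_3m_6m"] is not None and r["ratio_3m_6m"] < threshold
    ]


if __name__ == "__main__":
    totals = [100, 100, 100, 100, 100, 100, 40, 40, 40]
    print(flag_drop(behaviour_metrics(totals)))  # prints [8]
```

In practice, features like these would typically be computed per client and fed to a model rather than thresholded directly; the fixed rule here only makes the 3-month versus 6-month comparison concrete.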

Experience

AAA - Data Science Research Intern
MAY 2024 - PRESENT

HSBC - Senior Analyst
SEP 2021 - AUG 2022

HSBC - Analyst
JUL 2019 - AUG 2021

Education

San Diego State University, CA, United States
MS in Big Data Analytics and Data Science
AUG 2023 - PRESENT
GPA - 4.0

CS 549 - Machine Learning
CS 553 - Neural Networks (Current)
CS 561 - Deep Learning for Natural Language Processing (Current)
LING 583 - Statistical Methods for Text Analysis
BDA 594 - Big Data Science and Analytics Platforms
BDA 602 - Machine Learning Engineering
BDA 696 - Advanced Special Topics in Big Data Analytics
BDA 797 - Research (Current)

University of Calcutta, India
MSc in Economics (Econometrics)
AUG 2017 - AUG 2019
GPA - 3.8

Awards

