Sanjeev — Data Science Engineer

About Me

With 8+ years of industry experience in data engineering, analytics, warehousing techniques, ETL data pipelines & machine learning, along with numerous projects showcasing my command of data engineering through SQL, Python & dbt and strong interpersonal skills, I firmly believe that I can design, develop and deliver efficient analytical solutions to your business problems.

Resume

Experience

Data Engineer 3

Kraft Analytics Group

Jul 2023 – Present

● Implemented a multi-tenant dbt-based data model to automate table deployments, improving scalability and reducing onboarding time for new users from 6–9 months to 6 hours.

● Designed a nonprofit organization’s ticketing data model spanning 1,100 schools & 500,000 college student athletes who compete annually in college sports.

● Pioneered Prefect automation by using event-based triggers to create a cross-client internal Python package; reduced runtime by 30%.

● Revamped a client’s historical seat-level transactions per event per season FACT table logic; reduced pipeline runtime by 75%.

Senior Data Engineer

KlearNow.AI

Nov 2022 – May 2023

● Led the data engineering team in redesigning a PySpark pipeline that loads shipment & merchandise data into the Redshift warehouse’s one-big-table via a multi-node EMR cluster.

● Collaborated with the AI team in building a data pipeline used for real-time shipment tracking and analysis by ingesting data extracted from third-party shipment contracts.

● Reduced shipment tracking dashboard latency from 48 minutes to 17 minutes by identifying and resolving bottlenecks and inefficient code practices across the entire data pipeline.

● Designed & developed a HubSpot API pipeline in Python to extract customer interaction data such as contacts, emails, calls and notes, opening new avenues of insight. This increased customer retention by 18%.

● Reduced CPU utilization on the data warehouse by 22% by introducing materialized views and data validation before writing data into the respective schemas. This also fixed a multitude of data quality issues such as duplication & inaccuracy.

Data Engineer

AstraZeneca

Nov 2020 – Nov 2022

● Engineered a data pipeline that ingests global rare-disease clinical trial data to support the R&D team’s drug development, keeping data accuracy as the primary business goal while avoiding data redundancy.

● Developed a pipeline that migrated study & patient data from multiple sources into our Snowflake warehouse and in turn to our Qlik dashboards through AWS EC2 and S3 buckets. This opened the clinical trials to a larger population of patients based on the insight generated by the symptomology dashboards.

● Spearheaded a cross-functional data wrangling & integrity effort to predict deviation from other similar clinical trials by extracting data from various sources.

Graduate Research Assistant

Syracuse University

Jan 2020 – Mar 2020

● Designed and containerized a scalable learning platform; 800+ students registered in the first semester of launch.

Data Analytics Intern

Marathon Energy

May 2019 – Dec 2019

● Designed and implemented data pipelines with a Python-Selenium web scraper for revenue-driving teams to improve data governance. These pipelines fed Power BI dashboards with data extracted from sources like National Grid and exported to our local servers. This also improved data refresh rate by 94%.

Data Analyst

Latentview Analytics

Jan 2018 – Jul 2018

● Surveyed stockholders and conceived ideas as part of the data science team that predicts quarterly customer conversion rates and proposes strategies to improve them.

● Established in-house methods to extract results from end-to-end descriptive analysis that administered ad placement on the customer’s website, which increased sales and customer retention by 12%.

Data Engineer

Cognizant Technology Solutions

Aug 2014 – Jan 2018

● Led a cross-functional team that used international customer transaction and global monetary data to enhance data accuracy & compliance through data warehousing techniques & complex queries.

● Reduced service downtime by 85 minutes per day by implementing a caching system that warms up databases with daily foreign exchange data on service startup.

Portfolio

Trend Analysis of Stock Price and Financial News

predictive analytics, PySpark, web scraping

Drug Recommendation System

healthcare, RShiny, clustering

Data Warehousing & Pipelines

ETL, Dimensional Modelling, Data Architecture

Boston Crime Pipeline

Python Pipeline, Dimensional Warehouse

Customer Retention Booster

Python - Web Scraper, R, Shiny, ML-Classification

Blog Website - Flask

Python, Flask, HTML & CSS

Income Predictor

R, Shiny, ML-Clustering & Classification

Digit Recognizer

R, ML-Classification

Regression and Artificial Neural Network

R, Keras, Random Forest, Tidyverse

Publication Database

SQL Server, MS Access, MS Excel

Cluster Analysis and Decision Tree Induction

R, ML - Decision Tree & Cluster Analysis, Classification

Skills

Languages

SQL, Python, dbt, PySpark, R

Databases & Tools

Snowflake, Redshift, Hive, AWS - EC2, S3, EMR, RDS, Power BI, Tableau, Google Analytics, Docker

Data Analysis

Pandas, NumPy, Scikit-Learn, matplotlib, Shiny, ggplot2, Flask

Data Modeling & ETL

SSIS, OLTP, OLAP, Snapshot, KPI, dbt

Version Control & Collaboration

Git, CircleCI, JIRA, Confluence

Data Acquisition

Data Pre-processing

Machine Learning

Data Visualization

Data Pipelining

Data Governance

Data Warehousing

Database Management

Testimonials

I had the distinct pleasure of working with Sanjeev during his time at KlearNow.AI. Sanjeev is a quick study who jumped right into some APIs we needed to ingest into our data warehouse and not only figured out the connections, but optimized the whole process, shaving off precious time during the refresh. He works well in a group or as an individual contributor, asking questions when necessary and, most importantly, delivering solid, sustainable code that positively impacts the bottom line. I would work with Sanjeev on any project and wholeheartedly recommend him.

Guy Mofley

Sales Operations and Analytics Manager @ KlearNow.AI

It has been a pleasure working with Sanjeev. While he has interned with our company, he has learned a lot about the energy industry and has helped us streamline some of our processes.

Tammy Maule

Exec. Director @ Marathon Energy

Sanjeev created a web scrape tool for us that will look up the +4 of Zip Codes for us. The web scrape tool will save us significant time and is very easy to use.

Pamela Conner

Customer Admin @ Marathon Energy

The Python-Selenium automation tool that Sanjeev wrote would save around $4000 for the firm and multiple hours of manual work every week.

Jim Nichols

Vice President @ Marathon Energy

Education

Master of Science in Applied Data Science

Syracuse University

May 2020

Bachelor of Engineering in Electrical & Electronics Engineering

Anna University

May 2014

Get In Touch

Email sanjeevsramasamy@gmail.com

Phone +1 315 278 0599

Address

Secaucus, NJ 07094