Sanjeev

Data Engineer

About Me

With 7 years of industry experience in data engineering, analytics, warehousing techniques, ETL data pipelines & machine learning, along with numerous projects showcasing my command of data engineering through SQL, Python, PySpark & R, combined with outstanding interpersonal skills, I firmly believe that I can design, develop and deliver analytical solutions to your business problems with exceptional efficiency.

Download my CV

Experience

Data Engineer 3

Kraft Analytics Group

    ● Redesigned & deployed the retail & marketing end-to-end data pipeline, improving efficiency and consistency.

    ● Assisting the Enterprise Service team with ad hoc data modeling & data pipeline redesign efforts.

    ● Spearheading the dbt data modeling team to improve data integrity & semantics and to help stakeholders better understand the structure & relationships of the data lineage, thereby assisting the executive team in making informed decisions.

Senior Data Engineer

Klearnow.AI

    ● Led the data engineering team on re-designing a PySpark pipeline that loads shipment & merchandise data into Redshift warehouse’s one-big-table via a multi-node EMR cluster.

    ● Collaborated with the AI team in building a data pipeline used for real-time shipment tracking and analysis, by ingesting data extracted from third-party shipment contracts.

    ● Reduced shipment tracking dashboard latency from 48 minutes to 17 minutes by identifying and resolving bottlenecks and inefficient code practices across the entire data pipeline.

    ● Designed & developed a HubSpot API pipeline in Python to extract customer interaction data such as contacts, emails, calls and notes, and to harness new avenues of insight. This increased customer retention by 18%.

    ● Reduced CPU utilization on the data warehouse by 22% by introducing materialized views and data validation before writing data into the respective schemas. This also fixed a multitude of data quality issues such as duplication & data inaccuracy.

Data Engineer

AstraZeneca

    ● Engineered a data pipeline that ingests global rare disease clinical trials data that enhanced the R&D team’s drug development, keeping data accuracy as the primary business goal while avoiding data redundancy.

    ● Developed a pipeline that migrated study & patient data from multiple sources into our Snowflake warehouse and in turn to our Qlik dashboards through AWS EC2 and S3 buckets. This opened the Clinical Trials to a larger population of patients based on the insight generated by the symptomology dashboards.

    ● Spearheaded a cross-functional data wrangling & integrity effort to predict deviation from other similar clinical trials by extracting data from various sources.

Graduate Research Assistant

Syracuse University

    ● Designed and containerized a scalable learning platform; 800+ students registered in the first semester of launch.


Data Analytics Intern

Marathon Energy

    ● Designed and implemented data pipelines with a Python-Selenium web scraper for revenue-driving teams to improve data governance. These pipelines fueled Power BI dashboards with data extracted from sources like National Grid and exported to our local servers. This also improved data refresh rate by 94%.


Data Analyst

Latentview Analytics

    ● Surveyed stockholders and conceived ideas as part of the data science team to predict quarterly customer conversion rates and propose strategies to improve them.

    ● Established in-house methods to extract results from end-to-end descriptive analysis that guided ad placement on the customer’s website, which increased sales and customer retention by 12%.


Data Engineer

Cognizant Technology Solutions

    ● Led a cross-functional team that used international customer transaction and global monetary data to enhance data accuracy & compliance through data warehousing techniques & complex queries.

    ● Reduced service downtime by 85 minutes per day by implementing a caching system that warmed up databases with daily foreign exchange data on service startup.

Skills

Languages

    SQL, Python, R, PySpark


Databases & Tools

    Power BI, Tableau, AWS - EC2, S3, EMR, RDS, Redshift, Google Analytics, Docker, Hive, Snowflake


Data Analysis

    Pandas, NumPy, Scikit-Learn, matplotlib, Shiny, ggplot2, Flask


Data Modeling & ETL

    SSIS, OLTP, OLAP, Snapshot, KPI, dbt


Version Control & Collaboration

    Git, Jenkins, JIRA, Confluence

Data Acquisition
Data Pre-processing
Machine learning
Data Visualization
Data Pipelining
Data Governance
Data Warehousing
Database Management

Education

Master of Science in Applied Data Science

Syracuse University

Bachelor of Engineering in Electrical & Electronics Engineering

Anna University

Get In Touch

Address
Boston, MA 02446