Data Engineer

PURU SINGH

Building high-performance data pipelines and intelligent systems with Spark, Databricks, and Generative AI.


Where I've Worked

Data Engineer – Business Analyst, Gen-AI
Genpact · Client: GE Vernova
Bangalore, Karnataka · Oct 2024 – Present
  • Spearheaded the end-to-end migration of mission-critical ETL pipelines from PostgreSQL (Greenplum) to Spark SQL and PySpark on Databricks, cutting job runtimes by 80% by restructuring pipeline logic.
  • Owned development of the backend architecture on Databricks for 70+ Business Objects (BO) reports integrated with Oracle ERP systems, streamlining complex reporting for key business stakeholders.
  • Maintained the GSC (Global Supply Chain) Bowler performance-tracking system, ensuring data integrity for critical supply chain KPIs.
  • Modernized the Enterprise Data Lake (EDL) by transitioning sources from PowerNow 1.0 to 2.0, ensuring architectural consistency and zero data loss for global operations.
  • Orchestrated robust workflows using Apache Airflow and Databricks-native features to deliver scalable, production-grade data processing.
  • Engineered Gen-AI and RAG systems using Python to automate insight generation and improve data discoverability within the data ecosystem.
  • Collaborated with cross-functional teams to streamline the migration across the Ingestion, Transformation, and Visualization layers.
Genpact x GE Vernova - Databricks Migration
Intern – Data Analyst, Gen-AI
Genpact
Hyderabad, Telangana · Feb 2024 – Jul 2024
  • Developed a Python automation script using LangChain to summarize 10-K financial reports, achieving an 85% speedup over the existing manual process.
  • Built a Financial Health Dashboard for Consumer Banking using Power BI, providing a comprehensive overview of income and spending with automated goal suggestions.
  • Performed data preprocessing and cleaning to extract actionable insights for banking sector stakeholders.
Intern – Android Development & Security
DRDO
New Delhi, Delhi · Jul 2023 – Aug 2023
  • Developed a background Android application using Kotlin and Jetpack Compose to capture and forward system notifications via SMS.
  • Conducted static and dynamic security analysis using MobSF, identifying critical flaws and reducing total testing time by 70%.
  • Implemented real-time logging and system-level API integrations (SmsManager) for secure notification monitoring.

What I've Built

Loocle – Self-Hosted Photo Backup & Web App

  • Engineered a cloud-independent photo management system replicating the Google Photos experience on local hardware.
  • Built a Python Flask web app with fast metadata-based search, caching, and multi-shot facial recognition features.
  • Automated wireless Android backups using ADB over Wi-Fi for seamless syncing between devices.
  • Implemented automatic metadata tagging with local LLMs via Ollama to associate detected faces with person names, improving organization and speeding up retrieval.

Face Recognition via One-Shot Classification

  • Recognizes faces from a single reference image by comparing distances between face embeddings.
  • Achieves over 90% speedup compared with traditional face recognition approaches.
  • Built with FaceNet embeddings, Haar Cascade face detection, and one-shot classification.

My Toolkit

Python · Flask · Apache Spark · Apache Airflow · Databricks · SQL · AWS · REST APIs · C++ · Kotlin · Power BI · Excel · LangChain · RAG · Ollama · Machine Learning · Automation Scripting

Certifications

Data Engineer – Professional
Databricks
Data Engineer – Associate
Databricks
Generative AI Engineer – Associate
Databricks
Data Engineer – Associate
AWS
Cloud Practitioner (CLF-C01)
AWS
AI Fundamentals (AI-900)
Microsoft Azure

Background

Graphic Era (Deemed to be University)

B.Tech in Computer Science · 2020 – 2024 · Dehradun, Uttarakhand

GPA: 8.73/10

Achievements

Languages
English & Hindi
Typing Speed
172 WPM