Open to DA / BA roles · June 2026

Lokesh Gaddam

Data Analyst — Banking, Risk & Financial Analytics

I build analytics that flag risk before it becomes a loss. Loan default classifier at AUC 0.91 on 77K records. Complaint intelligence dashboard across 8 product lines. Backtested stock strategy with quantified Sharpe-like metrics.

Differentiator: I work backwards from the business decision, not forwards from the data. Every model I ship answers a specific question a manager would actually ask.
0.91 · AUC, Loan Default Model
77K · Records Modeled
8+ · Power BI Dashboard Views

About

The analyst who reads
the P&L first

Background

I specialise in financial and risk analytics — where a misclassified loan costs money and a missed complaint trend costs customers.

Final-year B.Tech (ECE — Data Science) at KL University, CGPA 8.2. Four shipped projects: raw CSV ingestion to production-ready Power BI reports, with reproducible Python pipelines and documented methodology throughout.

I write code to be read by the next analyst. I write dashboards to be trusted by someone who doesn't know what a p-value is.

Download Resume
"A dashboard that requires explanation has already failed. The insight should be visible before the meeting starts." — My design principle for BI reports
Education
B.Tech — ECE (Data Science Specialisation)
KL University · CGPA 8.2 · Graduating May 2026
Certifications
Google Data Analytics — Coursera
Python for Data Science — Coursera
Domain Focus
Loan default & credit risk · Complaint intelligence · Quantitative backtesting · ETL pipeline design · Statistical EDA · ML classification
Open To
Data Analyst · Business Analyst in banking, fintech, or financial services. Hybrid or remote. Vijayawada / Hyderabad preferred; open to relocation.

Flagship Case Study
Deep dive · Banking Risk

Bank Loan Default
Detection Pipeline

Problem
A lending portfolio with no automated early-warning system: risk officers were reviewing applications manually and missing behavioural signals.
Dataset
77,000+ loan records · 35 raw features · Class imbalance: 78% non-default / 22% default
Approach
EDA → imputation (median/mode by segment) → 14 engineered features (debt-to-income, repayment history score, utilisation flags) → SMOTE rebalancing → Random Forest + XGBoost comparison → threshold tuning to minimise false negatives
Result
AUC 0.91 · Recall 0.87 · Precision 0.84 · F1 0.855
Business impact
At 0.87 recall across 77K records, the model correctly flags ~14,700 of 16,940 true defaults and outputs a ranked risk score for credit decisioning.
Tools
Python · Pandas · Scikit-learn · XGBoost · Matplotlib · Seaborn · Jupyter
Confusion Matrix · Test Set · Threshold = 0.38
True Positives: 14,700 · False Positives: 3,220
False Negatives: 2,240 · True Negatives: 56,840
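The headline recall and class balance follow directly from these cell counts; a quick arithmetic check (numbers copied from the matrix above):

```python
# Confusion-matrix cells from the test set (threshold = 0.38)
tp, fp, fn, tn = 14_700, 3_220, 2_240, 56_840

total = tp + fp + fn + tn        # all scored records
defaults = tp + fn               # true defaults in the data
recall = tp / defaults           # share of true defaults caught

print(total, defaults, round(recall, 2))  # → 77000 16940 0.87
```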
Feature Importance · XGBoost
Debt-to-Income Ratio 0.24
Repayment History Score 0.19
Credit Utilisation Flag 0.15
Loan-to-Income Ratio 0.12
Employment Length 0.09
ROC Curve · XGBoost (AUC = 0.91) vs Random Forest (AUC = 0.87)

I optimised for recall over precision, setting the classification threshold at 0.38 (below default 0.5). This intentionally increases false positives — flagging some good loans as risky — but reduces far more costly missed defaults.

In a real lending context: a false positive costs a manual review; a false negative costs the loan principal. I documented that roughly 18% of flagged loans would be false alarms so the credit team could weigh the trade-off explicitly.
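The tuning step itself is simple to sketch: sweep candidate thresholds and keep the one that maximises recall subject to a precision floor. A minimal illustration with toy probabilities (the function name, data, and 0.75 floor are all hypothetical, not the production pipeline):

```python
def tune_threshold(y_true, y_prob, min_precision=0.75):
    """Return (threshold, recall) for the threshold with the highest
    recall whose precision stays at or above min_precision."""
    best = (0.5, 0.0)
    for t in [i / 100 for i in range(1, 100)]:
        preds = [int(p >= t) for p in y_prob]
        tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        if precision >= min_precision and recall > best[1]:
            best = (t, recall)
    return best

# Toy scores: 1 = default, probabilities from an imaginary model
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.6, 0.4, 0.3, 0.2, 0.55, 0.1, 0.45]
print(tune_threshold(y_true, y_prob))  # → (0.31, 1.0)
```

On this toy data the sweep lands on a cut-off well below 0.5, mirroring the 0.38 choice: lowering the threshold trades precision for recall on the minority class.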

First iteration used raw loan amount as a feature. The model overfit — high amounts dominated feature importance, masking behavioural signals. I rebuilt with ratio-based features (debt-to-income, loan-to-income). Feature importance shifted to repayment history and utilisation patterns.

Lesson: raw scale features in financial data tell you about the customer segment, not the risk. Engineer ratios, not magnitudes.
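A minimal sketch of that rebuild, using hypothetical column names (the real schema differs):

```python
import pandas as pd

# Toy frame; columns are stand-ins for the real loan fields
df = pd.DataFrame({
    "loan_amount":   [500_000, 50_000],
    "annual_income": [2_000_000, 100_000],
    "monthly_debt":  [30_000, 20_000],
})

# Ratio features are comparable across customer segments,
# unlike the raw loan amount that dominated the first model
df["loan_to_income"] = df["loan_amount"] / df["annual_income"]
df["debt_to_income"] = df["monthly_debt"] * 12 / df["annual_income"]
print(df[["loan_to_income", "debt_to_income"]])
```

The second toy customer borrows a tenth of the first's amount but carries far more risk per rupee of income, which only the ratios reveal.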

Next, I'd add a monitoring layer: track precision/recall on new data quarterly. I'd also add SHAP explanations per prediction so a credit officer can see exactly which features drove a high-risk flag — critical for RBI guidelines on algorithmic credit scoring.

SHAP explainability · Model drift monitoring · Regulatory documentation · API deployment

Jupyter notebook with full pipeline, confusion matrix, ROC curve, and feature importance chart. README documents all assumptions and how to reproduce on a new dataset.

View full project on GitHub →

Selected Projects
01

Customer Complaints Intelligence Dashboard

AmEx complaint records offered no visibility into which products, channels, or regions were driving repeat complaints — resolution teams were reactive, not systematic.

Approach
Python ETL across 50K+ records. SQL aggregations by product, channel, resolution time. Power BI with 8 drill-through views.
Result
8 product lines broken down by complaint volume, median resolution time, and escalation rate on one dashboard page.
Business Impact
Surfaces top 3 complaint drivers per product — gives an ops manager one screen to prioritise resolution resources.
Output
Power BI .pbix · DAX measures for complaint rate, SLA breach %, and YoY trend.
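The core product-level aggregation described above can be sketched in pandas; the column names and rows here are hypothetical stand-ins for the real complaint schema:

```python
import pandas as pd

# Toy complaint records; real data has 50K+ rows and more fields
complaints = pd.DataFrame({
    "product": ["Credit Card", "Credit Card", "Personal Loan", "Savings Account"],
    "channel": ["Phone", "Email", "Phone", "Branch"],
    "resolution_days": [3, 7, 5, 2],
    "escalated": [1, 0, 1, 0],
})

# Per-product KPIs feeding the dashboard: volume, median
# resolution time, and escalation rate
summary = complaints.groupby("product").agg(
    volume=("product", "size"),
    median_resolution_days=("resolution_days", "median"),
    escalation_rate=("escalated", "mean"),
)
print(summary)
```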
Power BI · Python · SQL · DAX
GitHub

Next: sentiment scoring on complaint text to auto-tag tone severity — distinguishing billing confusion from a regulatory grievance.

Complaint Intelligence · Power BI
50K+ · Total Records
4.2d · Avg Resolution
18% · SLA Breach
Complaints by Product: Credit Card ↑23% · Personal Loan ↑18% · Savings Account ↓4%
02

Stock Market Analysis & Strategy Backtesting

Retail investors were making entry/exit decisions on intuition, with no systematic evidence that their chosen indicators produce positive risk-adjusted returns.

Approach
5 years OHLCV data. MA crossover (20/50-day), RSI, Bollinger Band strategies. Backtested with 0.1% transaction cost assumption per trade.
Result
MA crossover: Sharpe-equivalent 1.34 vs buy-and-hold 0.87. RSI underperformed in trending markets — documented and explained.
Business Impact
Reproducible framework to evaluate any technical strategy before live deployment — risk-adjusted return, max drawdown, win-rate outputs.
Output
Python backtesting module, equity curve charts, performance comparison table across 3 strategies.
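A compressed sketch of the backtest mechanics described above: synthetic prices, shortened MA windows (3/6 rather than 20/50), a 0.1% cost per position change, and an annualised Sharpe-style ratio. Numbers here are illustrative only, not the project's results:

```python
import numpy as np
import pandas as pd

# Synthetic daily prices (geometric random walk)
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.cumprod(1 + rng.normal(0.001, 0.01, 300)))

fast = prices.rolling(3).mean()
slow = prices.rolling(6).mean()

# Long-only signal; shift(1) so we act on the NEXT bar (no look-ahead)
signal = (fast > slow).astype(int).shift(1).fillna(0)

rets = prices.pct_change().fillna(0) * signal
costs = signal.diff().abs().fillna(0) * 0.001   # 0.1% per position change
strat = rets - costs

# Annualised Sharpe-style ratio, risk-free rate assumed zero
sharpe = strat.mean() / strat.std() * np.sqrt(252)
print(round(sharpe, 2))
```

The transaction-cost line matters: charging 0.1% on every position change is what separates a paper edge from a deployable one.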
Python · Pandas · NumPy
GitHub

Next: walk-forward validation to prevent look-ahead bias, and Monte Carlo simulation for drawdown scenarios.

Equity Curve · Strategy Comparison (MA crossover vs Buy & Hold vs RSI)
1.34 · MA Sharpe
0.87 · B&H Sharpe
−14% · Max Drawdown
03

Age–Gender Detection · CNN Pipeline

Demographic classification on live video — for retail foot-traffic analytics, digital signage targeting, and security systems.

Approach
Fine-tuned VGG-16 on UTKFace (23K images). Separate heads for age regression and gender classification. OpenCV for real-time face detection.
Result
Gender: 91% accuracy. Age: MAE ±5.2 years on held-out test set. Real-time at 18 FPS on laptop GPU.
Business Impact
Proof-of-concept for in-store demographic analytics — age-group and gender breakdown of foot traffic without manual tagging.
Output
Trained weights, inference script, evaluation notebook with confusion matrix and error distribution by age bracket.
TensorFlow · Keras · OpenCV
GitHub

Next: data augmentation for low-light retail environments. Evaluate MobileNetV3 for edge deployment speed.

Inference Output · VGG-16 (sample prediction: Male, 28y)
91% · Gender Accuracy
±5.2y · Age MAE
18 FPS · Inference Speed

Top Repositories
Repository 01
banking-analysis
Loan default detection pipeline — EDA, feature engineering (14 ratio-based features), SMOTE rebalancing, XGBoost vs Random Forest comparison, threshold tuning.
AUC 0.91 · Recall 0.87
Repository 02
amex-complaints
Complaint intelligence dashboard — Python ETL on 50K+ records, SQL aggregations, Power BI report with 8 drill-through views and DAX-driven KPIs.
50K records · 8 BI views
Repository 03
stock-market-analysis
Strategy backtesting framework — MA crossover, RSI, and Bollinger Band strategies compared on 5 years of OHLCV data with transaction cost assumptions.
Sharpe 1.34 vs B&H 0.87

Contribution Activity

GitHub contribution chart for LokeshGaddam14

Skills & Proof

What I can do,
with receipts

No self-rated percentages. Every skill is backed by a project with documented outputs at real scale.

Python · Data Pipeline & Modeling

Primary tool
Built a 77K-record ETL + ML pipeline (loan default) from raw CSV to evaluated model — imputation, SMOTE, threshold tuning, all in one reproducible notebook. Engineered 14 ratio-based features from raw financial fields.
Pandas · NumPy · Scikit-learn · XGBoost · Matplotlib · Seaborn

SQL · Aggregation & ETL

Daily use
Wrote multi-table SQL aggregations across 50K+ complaint records — joining product, channel, and resolution tables, window functions for rolling averages, LAG-based trend columns. Feeds Power BI via DirectQuery.
Window functions · CTEs · Aggregation · ETL · Join optimisation
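The window-function pattern described above can be reproduced on an in-memory SQLite table (the table and monthly counts below are made up; the real queries run against the complaint tables):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly (month TEXT, complaints INTEGER)")
con.executemany("INSERT INTO monthly VALUES (?, ?)",
                [("2024-01", 120), ("2024-02", 150), ("2024-03", 90)])

# Rolling 2-month average plus a LAG-based month-over-month change,
# the same shape as the trend columns feeding Power BI
rows = con.execute("""
    SELECT month,
           complaints,
           AVG(complaints) OVER (ORDER BY month
                                 ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
               AS rolling_avg,
           complaints - LAG(complaints) OVER (ORDER BY month) AS mom_change
    FROM monthly
""").fetchall()
for r in rows:
    print(r)
```

Note the first row's `mom_change` is NULL: `LAG` has nothing to look back at, which is exactly the behaviour a trend column should surface rather than hide.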

Power BI · DAX & Dashboard Design

Primary BI tool
Delivered an 8-view complaint intelligence dashboard with custom DAX measures — complaint rate, SLA breach %, YoY trend. Drill-through model from portfolio level to individual product in two clicks.
Power BI · DAX · Power Query · Drill-through · Tableau

Statistical EDA & Hypothesis Testing

Applied in 4 projects
Applied chi-square, t-tests, and distribution analysis to validate feature relevance before modeling. Pearson correlation matrix to remove multicollinear features (threshold r > 0.85). Documented statistical significance of top predictors.
Hypothesis testing · Correlation analysis · Distribution plots · Feature selection
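The correlation-pruning step can be sketched as follows. The feature values are toy stand-ins, and the keep-the-first-seen rule is one reasonable convention, not necessarily the one the project used:

```python
import pandas as pd

# Toy features; two of them are near-duplicates by construction
df = pd.DataFrame({
    "debt_to_income":   [0.1, 0.4, 0.3, 0.8, 0.6],
    "loan_to_income":   [0.1, 0.4, 0.3, 0.8, 0.61],  # ~ copy of the above
    "employment_years": [1, 9, 4, 2, 7],
})

# Drop one feature from each pair with |Pearson r| above the threshold
corr = df.corr().abs()
cols = list(corr.columns)
to_drop = set()
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if corr.loc[a, b] > 0.85 and a not in to_drop and b not in to_drop:
            to_drop.add(b)   # keep the first-seen feature of the pair

print(sorted(to_drop))  # → ['loan_to_income']
```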

Machine Learning · Classification

Shipped: AUC 0.91
Trained and compared Random Forest vs XGBoost on imbalanced financial data. Tuned classification threshold to 0.38 to optimise recall on minority class. Documented precision/recall trade-off for a non-technical audience.
Random Forest · XGBoost · SMOTE · ROC/AUC · Threshold tuning

Journey

2021 – 2026

B.Tech — ECE (Data Science),
KL University · CGPA 8.2

Technical foundation in statistics, ML, data structures, and programming — with a specialisation track that prioritised applied analytics over theory.

  • ML algorithms, evaluation metrics, and model selection
  • Statistics, probability, linear algebra — applied to real datasets
  • Full project lifecycle in every major course: raw data to deliverable output
  • Google Data Analytics Professional Certificate (Coursera)
  • Python for Data Science Certificate (Coursera)
B.Tech ECE · Data Science specialisation · CGPA 8.2 · May 2026 graduate

Contact

Hire someone who
ships clean analysis

Available for roles starting June 2026

If you need an analyst who can take a messy financial dataset, build a defensible model, and explain the output to a credit manager without using the word "algorithm" — let's talk.

I'm specifically strong at: loan default and risk modeling · complaint intelligence dashboards · financial data pipelines · Power BI for non-technical audiences.

Send a direct message