Open to DA / BA roles · June 2026

Lokesh Gaddam

Data Analyst — Banking, Risk & Financial Analytics

I build analytics that flag risk before it becomes a loss. Loan default classifier at AUC 0.91 on 77K records. Complaint intelligence dashboard across 8 product lines. Backtested stock strategy with quantified Sharpe-like metrics.

Differentiator: I work backwards from the business decision, not forwards from the data. Every model I ship answers a specific question a manager would actually ask.
0.91 · AUC, Loan Default Model
77K · Records Modeled
8+ · Power BI Dashboard Views

About

The analyst who reads
the P&L first

Background

I specialise in financial and risk analytics — where a misclassified loan costs money and a missed complaint trend costs customers.

Final-year B.Tech (ECE — Data Science) at KL University, CGPA 8.2. Four shipped projects: raw CSV ingestion to production-ready Power BI reports, with reproducible Python pipelines and documented methodology throughout.

I write code to be read by the next analyst. I write dashboards to be trusted by someone who doesn't know what a p-value is.

Download Resume
"A dashboard that requires explanation has already failed. The insight should be visible before the meeting starts." — My design principle for BI reports
Education
B.Tech — ECE (Data Science Specialisation)
KL University · CGPA 8.2 · Graduating May 2026
Certifications
Google Data Analytics — Coursera
Python for Data Science — Coursera
Domain Focus
Loan default & credit risk · Complaint intelligence · Quantitative backtesting · ETL pipeline design · Statistical EDA · ML classification
Open To
Data Analyst · Business Analyst in banking, fintech, or financial services. Hybrid or remote. Vijayawada / Hyderabad preferred; open to relocation.

Flagship Case Study
Deep dive · Banking Risk

Bank Loan Default
Detection Pipeline

Problem
A lending portfolio with no automated early-warning system: risk officers were reviewing applications manually and missing behavioural signals.
Dataset
77,000+ loan records · 35 raw features · Class imbalance: 78% non-default / 22% default
Approach
EDA → imputation (median/mode by segment) → 14 engineered features (debt-to-income, repayment history score, utilisation flags) → SMOTE rebalancing → Random Forest + XGBoost comparison → threshold tuning to minimise false negatives
Result
AUC 0.91 · Recall 0.87 · Precision 0.84 · F1 0.855
Business impact
At 0.87 recall across 77K records, the model correctly flags ~14,700 of 16,940 true defaults and outputs a ranked risk score for credit decisioning.
Tools
Python · Pandas · Scikit-learn · XGBoost · Matplotlib · Seaborn · Jupyter
Confusion Matrix · Test Set · Threshold = 0.38
True Positives: 14,700 · False Positives: 3,220
False Negatives: 2,240 · True Negatives: 56,840
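The headline recall and class balance follow directly from these cell counts; a quick arithmetic check (numbers copied from the matrix above):

```python
# Confusion-matrix cells from the test set (threshold = 0.38)
tp, fp, fn, tn = 14_700, 3_220, 2_240, 56_840

total = tp + fp + fn + tn        # all scored records
defaults = tp + fn               # true defaults in the data
recall = tp / defaults           # share of true defaults caught

print(total, defaults, round(recall, 2))  # → 77000 16940 0.87
```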
Feature Importance · XGBoost
Debt-to-Income Ratio 0.24
Repayment History Score 0.19
Credit Utilisation Flag 0.15
Loan-to-Income Ratio 0.12
Employment Length 0.09
ROC Curve · XGBoost (AUC = 0.91) vs Random Forest (AUC = 0.87)

I optimised for recall over precision, setting the classification threshold at 0.38 (below default 0.5). This intentionally increases false positives — flagging some good loans as risky — but reduces far more costly missed defaults.

In a real lending context: a false positive costs a manual review; a false negative costs the loan principal. I documented that roughly 18% of flagged loans would be false alarms so the credit team could weigh the trade-off explicitly.
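The tuning step itself is simple to sketch: sweep candidate thresholds and keep the one that maximises recall subject to a precision floor. A minimal illustration with toy probabilities (the function name, data, and 0.75 floor are all hypothetical, not the production pipeline):

```python
def tune_threshold(y_true, y_prob, min_precision=0.75):
    """Return (threshold, recall) for the threshold with the highest
    recall whose precision stays at or above min_precision."""
    best = (0.5, 0.0)
    for t in [i / 100 for i in range(1, 100)]:
        preds = [int(p >= t) for p in y_prob]
        tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        if precision >= min_precision and recall > best[1]:
            best = (t, recall)
    return best

# Toy scores: 1 = default, probabilities from an imaginary model
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.6, 0.4, 0.3, 0.2, 0.55, 0.1, 0.45]
print(tune_threshold(y_true, y_prob))  # → (0.31, 1.0)
```

On this toy data the sweep lands on a cut-off well below 0.5, mirroring the 0.38 choice: lowering the threshold trades precision for recall on the minority class.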

First iteration used raw loan amount as a feature. The model overfit — high amounts dominated feature importance, masking behavioural signals. I rebuilt with ratio-based features (debt-to-income, loan-to-income). Feature importance shifted to repayment history and utilisation patterns.

Lesson: raw scale features in financial data tell you about the customer segment, not the risk. Engineer ratios, not magnitudes.
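A minimal sketch of that rebuild, using hypothetical column names (the real schema differs):

```python
import pandas as pd

# Toy frame; columns are stand-ins for the real loan fields
df = pd.DataFrame({
    "loan_amount":   [500_000, 50_000],
    "annual_income": [2_000_000, 100_000],
    "monthly_debt":  [30_000, 20_000],
})

# Ratio features are comparable across customer segments,
# unlike the raw loan amount that dominated the first model
df["loan_to_income"] = df["loan_amount"] / df["annual_income"]
df["debt_to_income"] = df["monthly_debt"] * 12 / df["annual_income"]
print(df[["loan_to_income", "debt_to_income"]])
```

The second toy customer borrows a tenth of the first's amount but carries far more risk per rupee of income, which only the ratios reveal.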

Next, I'd add a monitoring layer: track precision/recall on new data quarterly. I'd also add SHAP explanations per prediction so a credit officer can see exactly which features drove a high-risk flag — critical for RBI guidelines on algorithmic credit scoring.

SHAP explainability · Model drift monitoring · Regulatory documentation · API deployment

Jupyter notebook with full pipeline, confusion matrix, ROC curve, and feature importance chart. README documents all assumptions and how to reproduce on a new dataset.

View full project on GitHub →

Selected Projects
01

Customer Complaints Intelligence Dashboard

AmEx complaint records offered no visibility into which products, channels, or regions were driving repeat complaints — resolution teams were reactive, not systematic.

Approach
Python ETL across 50K+ records. SQL aggregations by product, channel, resolution time. Power BI with 8 drill-through views.
Result
8 product lines broken down by complaint volume, median resolution time, and escalation rate on one dashboard page.
Business Impact
Surfaces top 3 complaint drivers per product — gives an ops manager one screen to prioritise resolution resources.
Output
Power BI .pbix · DAX measures for complaint rate, SLA breach %, and YoY trend.
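The core product-level aggregation described above can be sketched in pandas; the column names and rows here are hypothetical stand-ins for the real complaint schema:

```python
import pandas as pd

# Toy complaint records; real data has 50K+ rows and more fields
complaints = pd.DataFrame({
    "product": ["Credit Card", "Credit Card", "Personal Loan", "Savings Account"],
    "channel": ["Phone", "Email", "Phone", "Branch"],
    "resolution_days": [3, 7, 5, 2],
    "escalated": [1, 0, 1, 0],
})

# Per-product KPIs feeding the dashboard: volume, median
# resolution time, and escalation rate
summary = complaints.groupby("product").agg(
    volume=("product", "size"),
    median_resolution_days=("resolution_days", "median"),
    escalation_rate=("escalated", "mean"),
)
print(summary)
```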
Power BI · Python · SQL · DAX
GitHub

Next: sentiment scoring on complaint text to auto-tag tone severity — distinguishing billing confusion from a regulatory grievance.

Complaint Intelligence · Power BI
50K+ · Total Records
4.2d · Avg Resolution
18% · SLA Breach
Complaints by Product: Credit Card ↑23% · Personal Loan ↑18% · Savings Account ↓4%
02

Stock Market Analysis & Strategy Backtesting

Retail investors were making entry/exit decisions on intuition, with no systematic evidence that their chosen indicators produce positive risk-adjusted returns.

Approach
5 years OHLCV data. MA crossover (20/50-day), RSI, Bollinger Band strategies. Backtested with 0.1% transaction cost assumption per trade.
Result
MA crossover: Sharpe-equivalent 1.34 vs buy-and-hold 0.87. RSI underperformed in trending markets — documented and explained.
Business Impact
Reproducible framework to evaluate any technical strategy before live deployment — risk-adjusted return, max drawdown, win-rate outputs.
Output
Python backtesting module, equity curve charts, performance comparison table across 3 strategies.
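A compressed sketch of the backtest mechanics described above: synthetic prices, shortened MA windows (3/6 rather than 20/50), a 0.1% cost per position change, and an annualised Sharpe-style ratio. Numbers here are illustrative only, not the project's results:

```python
import numpy as np
import pandas as pd

# Synthetic daily prices (geometric random walk)
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.cumprod(1 + rng.normal(0.001, 0.01, 300)))

fast = prices.rolling(3).mean()
slow = prices.rolling(6).mean()

# Long-only signal; shift(1) so we act on the NEXT bar (no look-ahead)
signal = (fast > slow).astype(int).shift(1).fillna(0)

rets = prices.pct_change().fillna(0) * signal
costs = signal.diff().abs().fillna(0) * 0.001   # 0.1% per position change
strat = rets - costs

# Annualised Sharpe-style ratio, risk-free rate assumed zero
sharpe = strat.mean() / strat.std() * np.sqrt(252)
print(round(sharpe, 2))
```

The transaction-cost line matters: charging 0.1% on every position change is what separates a paper edge from a deployable one.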
Python · Pandas · NumPy
GitHub

Next: walk-forward validation to prevent look-ahead bias, and Monte Carlo simulation for drawdown scenarios.

Equity Curve · Strategy Comparison (MA crossover vs Buy & Hold vs RSI)
1.34 · MA Sharpe
0.87 · B&H Sharpe
−14% · Max Drawdown
03

Age–Gender Detection · CNN Pipeline

Demographic classification on live video — for retail foot-traffic analytics, digital signage targeting, and security systems.

Approach
Fine-tuned VGG-16 on UTKFace (23K images). Separate heads for age regression and gender classification. OpenCV for real-time face detection.
Result
Gender: 91% accuracy. Age: MAE ±5.2 years on held-out test set. Real-time at 18 FPS on laptop GPU.
Business Impact
Proof-of-concept for in-store demographic analytics — age-group and gender breakdown of foot traffic without manual tagging.
Output
Trained weights, inference script, evaluation notebook with confusion matrix and error distribution by age bracket.
TensorFlow · Keras · OpenCV
GitHub

Next: data augmentation for low-light retail environments. Evaluate MobileNetV3 for edge deployment speed.

Inference Output · VGG-16 (sample prediction: Male, 28y)
91% · Gender Accuracy
±5.2y · Age MAE
18 FPS · Inference Speed

Top Repositories
Repository 01
banking-analysis
Loan default detection pipeline — EDA, feature engineering (14 ratio-based features), SMOTE rebalancing, XGBoost vs Random Forest comparison, threshold tuning.
AUC 0.91 · Recall 0.87
Repository 02
amex-complaints
Complaint intelligence dashboard — Python ETL on 50K+ records, SQL aggregations, Power BI report with 8 drill-through views and DAX-driven KPIs.
50K records · 8 BI views
Repository 03
stock-market-analysis
Strategy backtesting framework — MA crossover, RSI, and Bollinger Band strategies compared on 5 years of OHLCV data with transaction cost assumptions.
Sharpe 1.34 vs B&H 0.87

Contribution Activity

GitHub contribution chart for LokeshGaddam14

Skills & Proof

What I can do,
with receipts

No self-rated percentages. Every skill is backed by a project with documented outputs at real scale.

Python · Data Pipeline & Modeling

Primary tool
Built a 77K-record ETL + ML pipeline (loan default) from raw CSV to evaluated model — imputation, SMOTE, threshold tuning, all in one reproducible notebook. Engineered 14 ratio-based features from raw financial fields.
Pandas · NumPy · Scikit-learn · XGBoost · Matplotlib · Seaborn

SQL · Aggregation & ETL

Daily use
Wrote multi-table SQL aggregations across 50K+ complaint records — joining product, channel, and resolution tables, window functions for rolling averages, LAG-based trend columns. Feeds Power BI via DirectQuery.
Window functions · CTEs · Aggregation · ETL · Join optimisation
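The window-function pattern described above can be reproduced on an in-memory SQLite table (the table and monthly counts below are made up; the real queries run against the complaint tables):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly (month TEXT, complaints INTEGER)")
con.executemany("INSERT INTO monthly VALUES (?, ?)",
                [("2024-01", 120), ("2024-02", 150), ("2024-03", 90)])

# Rolling 2-month average plus a LAG-based month-over-month change,
# the same shape as the trend columns feeding Power BI
rows = con.execute("""
    SELECT month,
           complaints,
           AVG(complaints) OVER (ORDER BY month
                                 ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
               AS rolling_avg,
           complaints - LAG(complaints) OVER (ORDER BY month) AS mom_change
    FROM monthly
""").fetchall()
for r in rows:
    print(r)
```

Note the first row's `mom_change` is NULL: `LAG` has nothing to look back at, which is exactly the behaviour a trend column should surface rather than hide.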

Power BI · DAX & Dashboard Design

Primary BI tool
Delivered an 8-view complaint intelligence dashboard with custom DAX measures — complaint rate, SLA breach %, YoY trend. Drill-through model from portfolio level to individual product in two clicks.
Power BI · DAX · Power Query · Drill-through · Tableau

Statistical EDA & Hypothesis Testing

Applied in 4 projects
Applied chi-square, t-tests, and distribution analysis to validate feature relevance before modeling. Pearson correlation matrix to remove multicollinear features (threshold r > 0.85). Documented statistical significance of top predictors.
Hypothesis testing · Correlation analysis · Distribution plots · Feature selection
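The correlation-pruning step can be sketched as follows. The feature values are toy stand-ins, and the keep-the-first-seen rule is one reasonable convention, not necessarily the one the project used:

```python
import pandas as pd

# Toy features; two of them are near-duplicates by construction
df = pd.DataFrame({
    "debt_to_income":   [0.1, 0.4, 0.3, 0.8, 0.6],
    "loan_to_income":   [0.1, 0.4, 0.3, 0.8, 0.61],  # ~ copy of the above
    "employment_years": [1, 9, 4, 2, 7],
})

# Drop one feature from each pair with |Pearson r| above the threshold
corr = df.corr().abs()
cols = list(corr.columns)
to_drop = set()
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if corr.loc[a, b] > 0.85 and a not in to_drop and b not in to_drop:
            to_drop.add(b)   # keep the first-seen feature of the pair

print(sorted(to_drop))  # → ['loan_to_income']
```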

Machine Learning · Classification

Shipped: AUC 0.91
Trained and compared Random Forest vs XGBoost on imbalanced financial data. Tuned classification threshold to 0.38 to optimise recall on minority class. Documented precision/recall trade-off for a non-technical audience.
Random Forest · XGBoost · SMOTE · ROC/AUC · Threshold tuning

Journey

2021 – 2026

B.Tech — ECE (Data Science),
KL University · CGPA 8.2

Technical foundation in statistics, ML, data structures, and programming — with a specialisation track that prioritised applied analytics over theory.

  • ML algorithms, evaluation metrics, and model selection
  • Statistics, probability, linear algebra — applied to real datasets
  • Full project lifecycle in every major course: raw data to deliverable output
  • Google Data Analytics Professional Certificate (Coursera)
  • Python for Data Science Certificate (Coursera)
B.Tech ECE · Data Science specialisation · CGPA 8.2 · May 2026 graduate

Contact

Hire someone who
ships clean analysis

Available for roles starting June 2026

If you need an analyst who can take a messy financial dataset, build a defensible model, and explain the output to a credit manager without using the word "algorithm" — let's talk.

I'm specifically strong at: loan default and risk modeling · complaint intelligence dashboards · financial data pipelines · Power BI for non-technical audiences.

Send a direct message