Naman Varshney
Principal Architect · AI Systems & Infrastructure
I design resilient AI platforms that scale — from real-time event pipelines to cost-aware LLM planners

🏃‍♂️ HYROX Athlete
"Systems that endure — in code and in sport."
Bangalore Solo – April 2026
About
Professional
I'm Naman Varshney, Principal Architect with 12+ years building distributed systems. Currently leading AI & infra scaling at TripFactory, where I've reduced costs by ₹15+ Cr and maintained 99.99% uptime across 15+ microservices serving 500K+ users.
My expertise spans from high-frequency trading systems to AI-powered platforms. I believe in choosing boring technology that scales, then optimizing the hell out of it.
Personal

🏃‍♂️ HYROX Athlete
Next race: Bangalore Solo – April 2026
👨‍👩‍👧‍👦 Father of two
Teaching prioritization & efficiency
🏸 Sports
HYROX, Badminton, Cricket, Running
"The same grit and discipline I bring to competition, I bring to designing reliable systems."
Skills
🏗️ Production Stack Architecture
Real-time event processing with intelligent routing
💻 Languages
⚙️ Infra/Streaming
🤖 AI/LLM
☁️ Cloud/DevOps
Experience
TripFactory — Principal Architect
Context:
Large travel commerce codebase (search, pricing, booking, payments) with latency & cost issues.
Actions:
- Carved domain services (search/pricing/payments)
- Introduced BFF (Next.js) + caching strategies; added observability (SLOs, tracing)
- Piloted LLM assistants: log triage & catalog normalization (RAG)
Results:
Stack:
Artifacts: Before/after latency chart, service map, SLO dashboard
Vedantu — Payments & Reliability
Context:
Scale for live-class peaks; sensitive payments/refunds.
Actions:
- Split monolith into idempotent Spring services; release trains + trunk-based dev
- Event analytics (Kafka → BigQuery); operational runbooks
- Built payment reconciliation and fraud detection systems
Results:
Stack:
Artifacts: Idempotent payment flow diagram; dashboard red→green story
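To illustrate what "idempotent" means for a payment service, here is a minimal sketch in Python (the services described above were Spring-based; this is not that code). The class name, the `idempotency_key` parameter, the in-memory dictionary, and the `pay_N` identifiers are all hypothetical stand-ins. The idea is that a retried request with the same key replays the stored result instead of charging again.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PaymentResult:
    payment_id: str
    amount_paise: int
    status: str


class IdempotentPaymentService:
    """Replays the stored result when the same idempotency key is seen again."""

    def __init__(self) -> None:
        # Hypothetical stand-in for a durable store (a real service would
        # persist this and guard it against concurrent writes).
        self._results: Dict[str, PaymentResult] = {}
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_paise: int) -> PaymentResult:
        existing = self._results.get(idempotency_key)
        if existing is not None:
            # Same key with a different payload is treated as a client error.
            if existing.amount_paise != amount_paise:
                raise ValueError("idempotency key reused with a different amount")
            return existing  # retry: return the earlier result, no second charge

        self.charges_made += 1
        result = PaymentResult(f"pay_{self.charges_made}", amount_paise, "captured")
        self._results[idempotency_key] = result
        return result
```

Calling `charge("order-42", 50000)` twice returns the same `PaymentResult` both times and increments `charges_made` only once, which is the property that makes client retries safe.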
Via.com / EbixCash — Mobile + B2B Modules
Context:
Multi-brand travel apps and B2B agent portal.
Actions:
- Built Android & iOS apps from scratch, modularized for multi-brand rollouts
- Implemented dual authentication for B2B portal; incentive engine for agents
- Led mobile team scaling from 2 to 8 engineers
Results:
Stack:
Artifacts: App flows video; incentives ERD; API documentation
Shoppoke — First Employee, Full-Stack
Zero-to-one startup experience
Context:
Zero-to-one marketplace matching shopper requests to nearby retailers.
Actions:
- Shaped product with founder; built Android app + server APIs end-to-end
- Led a small engineering team; shipped consulting work (AxisRooms, Manipal Hospital) to bootstrap
- Established engineering practices and deployment pipelines
Results:
Stack:
Artifacts: Early architecture sketch, first-release screenshots
Flagship Case Studies
Three real systems, each with problem, architecture, and measurable results.
🎯 The Challenge
Teams were building AI features in silos with inconsistent models, costs, and governance. There was no centralized way to route requests, manage fallbacks, or ensure compliance across multiple LLM providers.
🏗️ Architecture
API Gateway → Model Router → Fallback Chain → Cost Optimizer → Analytics Dashboard
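The Model Router → Fallback Chain step can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the production design: the `Provider` type, its `cost_per_1k_tokens` field, and the cheapest-first ordering are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Provider:
    name: str                    # hypothetical provider label
    cost_per_1k_tokens: float    # hypothetical price, used only for ordering
    call: Callable[[str], str]   # sends the prompt and returns the response text


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain raised an error."""


def route_with_fallback(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try providers from cheapest to most expensive; return (name, response)."""
    errors = []
    for provider in sorted(providers, key=lambda p: p.cost_per_1k_tokens):
        try:
            return provider.name, provider.call(prompt)
        except Exception as exc:  # a real router would catch narrower error types
            errors.append(f"{provider.name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

If the cheapest provider times out, the request falls through to the next one in cost order, and the caller learns which provider actually served it, which is the kind of signal a cost or analytics layer could consume.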
📊 Results
🛠️ Tech Stack
💡 Built from my own HYROX journey: design your own HYROX-style simulations and workouts, start tracking them immediately on your Apple Watch, and get AI-driven training plans.
🎯 The Challenge
Athletes needed training plans that adapt in real time. Existing solutions were one-size-fits-all and couldn’t capture the unique demands of HYROX — let alone track custom simulations and workouts straight from the watch.
🏗️ Architecture
Apple Watch → Kafka Streams → Feature Store → LLM Planner → Web/Mobile
📊 Results
🛠️ Tech Stack
🎯 The Challenge
Daily ground operations across airport ↔ hotel routes suffered from 45% dead kilometers, 25% service delays, and ₹2L+ in daily fuel waste. The system needed real-time route optimization that honors time windows, vehicle constraints, and live traffic updates.
🏗️ Architecture
Booking API → Normalizer → Pooler → Route Optimizer → Vehicle Allocator
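As a rough illustration of the Pooler stage, the sketch below groups bookings on the same route when their pickup times fall within a shared time window and the vehicle has seats left. The greedy strategy, the `Booking` fields, and the `window_minutes` and `capacity` parameters are assumptions for this example; the actual pooling logic is not described in the text above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Booking:
    booking_id: str
    route: str          # e.g. "airport->hotel" (hypothetical route label)
    pickup_minute: int  # minutes since midnight
    passengers: int


def pool_bookings(bookings: List[Booking], window_minutes: int,
                  capacity: int) -> List[List[Booking]]:
    """Greedily pool same-route bookings by pickup time window and seat capacity."""
    by_route: Dict[str, List[Booking]] = {}
    for b in bookings:
        by_route.setdefault(b.route, []).append(b)

    pools: List[List[Booking]] = []
    for route in sorted(by_route):
        current: List[Booking] = []
        seats = 0
        for b in sorted(by_route[route], key=lambda x: x.pickup_minute):
            # The window is measured from the first pickup in the current pool.
            fits_window = bool(current) and (
                b.pickup_minute - current[0].pickup_minute <= window_minutes)
            fits_seats = seats + b.passengers <= capacity
            if fits_window and fits_seats:
                current.append(b)
                seats += b.passengers
            else:
                if current:
                    pools.append(current)
                # A booking larger than capacity still gets its own pool here.
                current = [b]
                seats = b.passengers
        if current:
            pools.append(current)
    return pools
```

Each resulting pool is a candidate trip that a downstream route optimizer and vehicle allocator could then schedule; fewer, fuller trips are one way to cut dead kilometers.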
📊 Results
🛠️ Tech Stack
⚡ AI Supplier Negotiation Engine
TripFactory Cost Optimization Platform
🎯 The Challenge
Manual supplier negotiations were time-intensive and inconsistent, leading to suboptimal pricing.
🏗️ Architecture
Event Stream → ML Models → Negotiation Logic → Supplier APIs → Analytics Dashboard
📊 Results
🛠️ Tech Stack
🌊 Real-time Travel Recommendation Engine
Personalized Travel Discovery at Scale
🎯 The Challenge
Static recommendation systems couldn't adapt to real-time user behavior and market dynamics.
🏗️ Architecture
User Events → Kafka → Feature Engineering → ML Pipeline → Recommendation API
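One way to picture the Feature Engineering stage is a sliding-window counter over recent user events. The sketch below keeps everything in memory and assumes events arrive in timestamp order per user; the class name, the `category` field, and the window semantics are hypothetical and only meant to show the shape of a real-time feature.

```python
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, Tuple


class RecentCategoryCounts:
    """Counts each user's events per category within a trailing time window."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        # user_id -> queue of (timestamp, category), oldest first
        self._events: Dict[str, Deque[Tuple[int, str]]] = defaultdict(deque)

    def add(self, user_id: str, category: str, ts: int) -> None:
        # Assumes timestamps are non-decreasing per user.
        self._events[user_id].append((ts, category))

    def features(self, user_id: str, now: int) -> Dict[str, int]:
        events = self._events[user_id]
        # Evict events that fell out of the window (ts <= now - window).
        while events and events[0][0] <= now - self.window_seconds:
            events.popleft()
        return dict(Counter(category for _, category in events))
```

A recommendation API could read these counts at request time, so a user who has just started browsing beach destinations sees that interest reflected without waiting for a batch job.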
📊 Results
🛠️ Tech Stack
📊 Kafka Streaming Infrastructure
Enterprise Event Processing Platform
🎯 The Challenge
Legacy batch processing couldn't handle real-time analytics and event-driven architecture needs.
🏗️ Architecture
Multi-DC Kafka → Stream Processing → Real-time Analytics → Monitoring Dashboard
📊 Results
🛠️ Tech Stack
Research Publications
Developing and Testing the Automated Post-Event Earthquake Loss Estimation and Visualisation (APE-ELEV) Technique
Anthony Astoul, Christopher Filliter, Eric Mason, Andrew Rau-Chaplin, Kunal Shridhar, Blesson Varghese and Naman Varshney
An automated, real-time, globally applicable earthquake loss model and visualiser for post-event earthquake analysis that relies on multiple sensor data sources. The system supports rapid data ingestion, loss estimation, and integration of data from multiple sources, with rapid visualisation at multiple geographic levels.
Keywords:
Impact: Real-time earthquake loss estimation system validated for ten global earthquakes using industry loss data
A Framework for Real-time Earthquake Loss Estimation and Visualisation
Naman Varshney, Anthony Astoul, Christopher Filliter, Eric Mason, Andrew Rau-Chaplin, Kunal Shridhar, Blesson Varghese
A comprehensive framework for automated post-event earthquake loss estimation combining multiple data sources with real-time visualisation capabilities. The system demonstrates feasibility using the 2011 Tohoku earthquake case study.
Keywords:
Impact: Framework for rapid post-earthquake response with multi-source data integration
Let's Work Together
Ready to Build Something Amazing?
Whether you need to scale AI systems, optimize infrastructure costs, or build reliable distributed platforms, I'd love to discuss how we can work together.