Senior AI/ML Engineer focused on architecting and delivering production-grade LLM systems, RAG pipelines, and multi-agent orchestration frameworks at enterprise scale. I work end-to-end across the AI platform stack — from vector-backed retrieval and LLM observability to distributed MLOps infrastructure on Azure, GCP, and AWS.
- Building agentic AI workflows with LangChain, LangGraph, and the MCP ecosystem
- Designing RAG pipelines with hybrid semantic search across Pinecone, Weaviate, FAISS, and ChromaDB
- Fine-tuning and deploying foundation models (GPT-4o, Claude, LLaMA, Mistral, Gemini) in regulated, high-availability environments
- Driving LLM observability, evaluation, and responsible AI practices across ML organizations
- Reach me at venkat.vasabathula@gmail.com
- LLM Systems & Generative AI — Production RAG, prompt engineering, fine-tuning (LoRA/PEFT), Chain-of-Thought reasoning, multi-modal pipelines.
- Agentic Frameworks — MCP client/server architectures, LangGraph orchestration, tool-using agents integrated with enterprise data sources.
- MLOps & Platform Engineering — Model versioning, A/B testing, evaluation pipelines, CI/CD for ML, distributed inference serving.
- AI Observability — LangSmith tracing, drift detection, latency/cost monitoring, model confidence and quality metrics.
Highlights from my professional experience (company names omitted).
- Architected a production-grade agentic MCP client–server framework using LangChain and LangGraph, integrating 5+ enterprise data sources and reducing Tier-1 support ticket volume by 20%.
- Engineered an enterprise RAG pipeline on Azure AI Search with FAISS / Pinecone / Weaviate, automated ingestion, and real-time KB sync — cutting query resolution time by 34%.
- Built an end-to-end LLM observability framework with LangSmith and distributed tracing for confidence, latency, and drift — reducing model degradation by 25%.
- Owned automated model evaluation pipelines benchmarking 40+ configurations across GPT-4, LLaMA, and other open-source LLMs, achieving a 42% performance improvement.
- Delivered scalable multi-model inference serving on Azure (FastAPI + Docker + Kubernetes) with caching, batching, and model versioning — sub-2s latency and 28% lower API cost.
- Built distributed Spark + Airflow pipelines on Databricks processing 100+ TB of data, and mentored engineers on AI architecture and LLM integration patterns.
- Earlier in my career: fine-tuned LLMs with LoRA/PEFT, deployed cloud-native inference on GCP/AWS (38% lower latency), and led MLOps migrations achieving a 99% uptime SLA.
Production AI Services Platform — Python, FastAPI, LangGraph, LangChain, Pinecone, Docker, Kubernetes, Azure
Cloud-native AI services platform with agentic workflows, vector-backed RAG, hybrid semantic search, and full observability. Serves 200K+ documents with sub-2-second responses and 99.5% uptime. CI/CD on Azure via GitHub Actions; model quantization reduced serving latency by 31%.
- M.S. Computer Science — California State University, Channel Islands
Coursework: Large Language Models, Neural Networks, NLP, Machine Learning, Distributed Systems
- Microsoft Certified: Azure AI Engineer Associate
- AWS Certified Solutions Architect – Associate
- Cisco – Data Analytics Essentials
- CSUCI Plot-A-Thon 2024 — 1st place, data visualization & analysis
Open to collaborating on production AI/ML systems, LLM platforms, and agentic AI research.