VideoDB
At VideoDB I built a real-time fact-checking system for live audio and video streams. Given a stream from YouTube Live or Google Meet, the pipeline transcribes claims as they are spoken, retrieves indexed video evidence, and decides within a second whether each claim is grounded in what the camera actually shows.
I also open-sourced VideoDB Claude Skills, a small library exposing video search, transcription, and scene understanding through a plain-English terminal interface using the Model Context Protocol (MCP). It shipped to the Claude developer marketplace.
I co-authored a benchmarking paper studying whether Gemini’s intermediate reasoning traces actually improve scene-understanding accuracy, or whether they merely look like they do. The answer depends heavily on task structure.
Nanyang Technological University, Singapore
I designed and optimised large-scale multilingual ASR and speaker diarization pipelines over GigaSpeech 2, Emilia, and NVIDIA Granary, analysing trade-offs in normalization strategy, language mixing, and benchmark generalization across regional accents.
My current paper builds on Indian-ASR-Bench, a WER benchmark of five ASR systems on the TIE dataset — 986 Indian English academic lecture clips. I evaluated Whisper Base, Medium, and Large alongside Parakeet-TDT-0.6B and Qwen3-ASR-1.7B, with breakdowns by region, speech rate, audio duration, gender, and discipline.
The paper focuses on why scale is not the whole story for accented academic speech: Whisper Medium beats Whisper Large overall, while Parakeet and Qwen3 are more robust on long clips. The methodological result is just as important — transcript normalization can change WER more than model choice.
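To illustrate the normalization point, here is a minimal sketch of word error rate computed before and after transcript normalization. The `normalize` function (lowercasing, punctuation stripping) and the example sentences are assumptions made up for illustration; they are not the normalizer or data used in the paper.

```python
import re
import string


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def normalize(text):
    """Hypothetical normalizer: lowercase, strip punctuation, collapse spaces."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


# Made-up example: the words are identical, only casing and punctuation differ.
reference = "The eigenvalues, as we saw, are real."
hypothesis = "the eigenvalues as we saw are real"

raw_wer = wer(reference, hypothesis)
normalized_wer = wer(normalize(reference), normalize(hypothesis))

print(f"raw WER: {raw_wer:.3f}")
print(f"normalized WER: {normalized_wer:.3f}")
```

In this toy case four of the seven reference words count as errors on the raw text and none after normalization, although the recognised words are the same. A gap of that size from formatting alone is why the normalization strategy needs to be fixed and reported alongside the model comparison.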
BITS Pilani, Goa
The first project was PustakAI, a curriculum-aligned QA pipeline for Indian school textbooks. I built the curation and evaluation harness, designed RAG-based prompting strategies, and fine-tuned Gemma3:1B and LLaMA3.2:3B variants. The paper was accepted at ACM COMPUTE 2025. The NCERT dataset is public on HuggingFace.
The second project is ICH-QA, a 131,495-pair synthetic QA dataset over 7,506 Wikipedia articles on Indian Cultural Heritage. RAG reaches 48% exact match (EM), confirming that the dataset requires genuine cultural grounding rather than surface pattern-matching.
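For readers unfamiliar with the metric, a minimal sketch of how an exact-match score can be computed is shown here. The normalization (lowercasing, stripping punctuation and English articles) follows a common SQuAD-style convention and is an assumption; the ICH-QA evaluation may normalize differently. The predictions and gold answers are made-up placeholders.

```python
import re
import string


def normalize_answer(text):
    """Assumed SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions, references):
    """Percentage of predictions that exactly match a gold answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    hits = sum(exact_match(p, golds) for p, golds in zip(predictions, references))
    return 100.0 * hits / len(predictions)


# Placeholder data for illustration only.
predictions = ["The Chola dynasty", "Hampi", "Kathak", "bharatanatyam"]
references = [["Chola dynasty"], ["Hampi, Karnataka"], ["Kathakali"], ["Bharatanatyam."]]

score = em_score(predictions, references)
print(f"EM: {score:.1f}%")
```

EM gives no credit for partially correct answers ("Hampi" against "Hampi, Karnataka" scores zero), so it is a strict measure of whether a system produces the expected answer string.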
I am currently writing up the ICH-QA work as a research paper with a PhD scholar, under the supervision of a professor from BITS Pilani, Goa and a professor from RMIT, Australia.
Video Fact Checker
A desktop application that monitors YouTube, Google Meet, and podcasts in real time. It captures system audio via VideoDB, transcribes it live, and uses Gemini to extract and verify claims every 20 seconds — classifying each as verified, misleading, or missing context with a confidence score.
Built as an Electron app with a React frontend and a lightweight Hono backend. Runs as a menu bar tray icon so it stays out of the way while monitoring any audio on the machine.
Open-source mobile translation app covering all 22 official Indian languages, powered by Google Gemini 2.5 Flash. Supports voice input with auto-transcription, text-to-speech output, and on-device translation history.
Built for KSP Datathon 2024. Trained a CatBoostClassifier on 329,000+ Karnataka State Police FIR records (2016–2024) to predict accident severity and recommend emergency response levels. Five interactive dashboards surface trends by district, road type, and time period.
Open Source
Open WebUI PR #23456 — replaced six instances of hardcoded forward-slash path concatenation in audio.py with os.path.join(), fixing a Windows audio load failure. Merged.
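The kind of change involved can be sketched as follows. This is not the code from the PR: the function names, directory, and file name are hypothetical, and `ntpath` and `posixpath` (the modules that `os.path` resolves to on Windows and on POSIX systems respectively) are called directly so the difference is visible on any platform.

```python
import ntpath
import posixpath


def build_path_hardcoded(cache_dir, filename):
    """Hypothetical 'before' pattern: the separator is baked into the string."""
    return cache_dir + "/" + filename


def build_path_portable(cache_dir, filename, pathmod):
    """Hypothetical 'after' pattern: the path module chooses the separator.

    In real code this would simply be os.path.join(cache_dir, filename);
    pathmod is a parameter here only so both platforms can be demonstrated.
    """
    return pathmod.join(cache_dir, filename)


windows_dir = r"C:\data\cache\audio"

# Hardcoded concatenation yields a mixed-separator path on Windows.
hardcoded = build_path_hardcoded(windows_dir, "clip.mp3")

# Joining via the path module uses each platform's native separator.
portable_win = build_path_portable(windows_dir, "clip.mp3", ntpath)
portable_posix = build_path_portable("/data/cache/audio", "clip.mp3", posixpath)

print(hardcoded)
print(portable_win)
print(portable_posix)
```

The hardcoded version produces `C:\data\cache\audio/clip.mp3`, while the joined version produces a path with consistent native separators on both platforms.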
OpenAI Codex issue #16303 — a UX improvement for skill display in the TUI: surfacing specific skill names instead of the generic “Read SKILL.md” message. Accepted and incorporated by the maintainers.
- Research Presenter, GLEX 2025 (2025)
  Global Space Exploration Conference, New Delhi. Real-time DEM refinement for autonomous space exploration.
- Invited Speaker, IIT Madras Industry Interaction Event (2025)
  On undergraduate research pathways and AI/ML internship experience.
- Guest, IIT Madras BS Story Podcast (2025)
  Research internships at NTU Singapore and BITS Pilani, and the Data Science programme.
- Finalist, Karnataka State Police Hackathon (2024)
  Public-safety AI tool built in 36 hours.
- Finalist, Smart India Hackathon (2023)
  National-level hackathon, ~50 finalist teams from over 1,000 submissions.
- Citizen Scientist, NASA Astrophile Campaign (2023)
  Asteroid candidate identifications via the IASC programme.
- Google PM Scholarship (2024)
- Bertelsmann Technology Scholarship (2023)
  Full scholarship for the Udacity Ethical Hacking Nanodegree.
- IIT Madras Merit Scholarship (2022–)
  Bachelor in Data Science & Applications.