VideoDB

Software Engineer Intern  ·  Jan 2026 – Apr 2026  ·  Remote

At VideoDB I built a real-time fact-checking system for live audio and video streams. Given a stream from YouTube Live or Google Meet, the pipeline transcribes claims as they are spoken, retrieves indexed video evidence, and decides within a second whether each claim is grounded in what the camera actually shows.

I also open-sourced VideoDB Claude Skills, a small library that exposes video search, transcription, and scene understanding through a plain-English terminal interface built on the Model Context Protocol (MCP). It shipped to the Claude developer marketplace.

I co-authored a benchmarking paper studying whether Gemini’s intermediate reasoning traces actually improve scene-understanding accuracy, or whether they merely look like they do. The answer depends heavily on task structure.

Nanyang Technological University, Singapore

Research Intern · Speech and Multilingual NLP  ·  Jan 2025 – Dec 2025  ·  Hybrid

I designed and optimised large-scale multilingual ASR and speaker diarization pipelines over GigaSpeech 2, Emilia, and NVIDIA Granary, analysing trade-offs in normalization strategy, language mixing, and benchmark generalization across regional accents.

My current paper builds on Indian-ASR-Bench, a WER benchmark of five ASR systems on the TIE dataset — 986 Indian English academic lecture clips. I evaluated Whisper Base, Medium, and Large alongside Parakeet-TDT-0.6B and Qwen3-ASR-1.7B, with breakdowns by region, speech rate, audio duration, gender, and discipline.

The paper focuses on why scale is not the whole story for accented academic speech: Whisper Medium beats Whisper Large overall, while Parakeet and Qwen3 are more robust on long clips. The methodological result is just as important — transcript normalization can change WER more than model choice.
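To make the normalization point concrete, here is a minimal sketch of how the same transcript pair can score very differently before and after normalization. It is an illustration only, not the evaluation code from the paper: the `normalize` rules, the function names, and the example sentence are all assumptions made up for this page.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    These rules are illustrative assumptions, not the benchmark's normalizer.
    """
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example pair: the words match, only casing and punctuation differ.
reference = "The gradient, however, vanishes."
hypothesis = "the gradient however vanishes"

raw_wer = wer(reference, hypothesis)
norm_wer = wer(normalize(reference), normalize(hypothesis))
print(f"raw WER: {raw_wer:.2f}, normalized WER: {norm_wer:.2f}")
```

On this toy pair every word counts as an error before normalization and none after, which is the kind of swing that can outweigh the gap between two models.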

BITS Pilani, Goa

Research Intern · NLP and LLM Evaluation  ·  Sep 2024 – Dec 2025  ·  Goa, India

The first project was PustakAI, a curriculum-aligned QA pipeline for Indian school textbooks. I built the curation and evaluation harness, designed RAG-based prompting strategies, and fine-tuned Gemma3:1B and LLaMA3.2:3B variants. The paper was accepted at ACM COMPUTE 2025. The NCERT dataset is public on Hugging Face.

The second project is ICH-QA, a 131,495-pair synthetic QA dataset over 7,506 Wikipedia articles on Indian Cultural Heritage. RAG reaches 48% exact match (EM), confirming that the dataset requires genuine cultural grounding rather than surface pattern-matching.
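For readers unfamiliar with the metric, exact match can be sketched roughly as below. This is an assumed, simplified scorer, not the ICH-QA evaluation script: the normalization (lowercasing and punctuation stripping only) and the toy answer strings are invented for illustration and are not dataset items.

```python
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation, collapse whitespace (assumed rules)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if not references or len(predictions) != len(references):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        normalize_answer(p) == normalize_answer(r)
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


# Invented toy strings, not taken from the dataset.
predictions = ["Hampi.", "the Chola dynasty", "Konark"]
references = ["Hampi", "Chola dynasty", "Konark"]

score = exact_match(predictions, references)
print(f"EM: {score:.2f}")
```

The second pair misses only because of a leading article, which shows how strict the metric is: partially correct or differently phrased answers earn no credit.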

I am currently writing up the ICH-QA work as a research paper with a PhD scholar, under the supervision of a professor at BITS Pilani, Goa and a professor at RMIT University, Australia.

Last revised June 2026