Show HN: A real time AI video agent with under 1 second of latency
By hassaanr on Hacker News
Tavus, an AI research company, has built a real-time conversational video agent that responds in under one second. The speedup comes from optimizing its Phoenix-2 model architecture and the rest of the pipeline: switching the rendering backbone from NeRF to Gaussian Splatting to generate video at 70+ fps, optimizing each component (vision, ASR, LLM, TTS) individually, and adding a specialized end-of-turn detection model so that conversations between humans and the AI feel natural.
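To make the idea of a per-stage latency budget concrete, here is a minimal Python sketch that treats the pipeline as strictly sequential stages and checks their sum against the 1-second target. The stage names follow the post, but the millisecond values are placeholders invented for illustration, not Tavus's measurements, and the sequential model is a simplifying assumption (some stages, vision for instance, may well run in parallel).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float  # time until this stage hands its first output downstream


def total_latency_ms(stages: list[Stage]) -> float:
    """Sequential model: end-to-end latency is the sum of per-stage latencies."""
    return sum(stage.latency_ms for stage in stages)


def headroom_ms(stages: list[Stage], budget_ms: float = 1000.0) -> float:
    """Remaining budget; a negative value means the pipeline is over budget."""
    return budget_ms - total_latency_ms(stages)


def slowest_stage(stages: list[Stage]) -> Stage:
    """The stage that offers the biggest win if optimized first."""
    return max(stages, key=lambda stage: stage.latency_ms)


# PLACEHOLDER numbers for illustration only; these are not Tavus's measurements.
pipeline = [
    Stage("vision", 50),
    Stage("asr", 100),
    Stage("llm_ttft", 300),
    Stage("tts_first_audio", 150),
    Stage("video_first_frame", 100),
]

print(total_latency_ms(pipeline))    # 700
print(headroom_ms(pipeline))         # 300.0
print(slowest_stage(pipeline).name)  # llm_ttft
```

With these made-up numbers the LLM's time to first token dominates the budget; in practice each value would come from profiling the real system.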
Key Points
- Achieved sub-second latency for conversational AI video by optimizing every millisecond across the entire pipeline (vision, ASR, LLM, TTS, and video generation)
- Replaced the NeRF-based backbone of the Phoenix-2 model with Gaussian Splatting, enabling 70+ fps frame generation on lower-end hardware and reducing compute requirements
- Identified time-to-first-token (TTFT), not tokens per second, as the critical LLM metric; standard providers such as Groq were too slow for this purpose despite their high throughput
- Built a specialized end-of-turn detection model that combines conversational signals with speculation on the user's input, so the agent need not wait out a long speech pause before replying; this avoids both talking over the user and responding late
- Went from needing a dedicated H100 GPU per conversation to running multiple conversations on lower-end hardware, through memory optimizations and better utilization of GPU cores
- Used architectural techniques such as choosing between streaming and batching and parallelizing processes to balance three competing constraints: latency, scale, and cost
- Validated the system in production with customers such as Delphi running multi-hour conversations with digital twins, demonstrating its reliability and user engagement
- Framed conversational video as a fundamental human-computer interface, one that must match human conversational timing (roughly 250 ms between utterances in natural conversation)
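Why TTFT can matter more than throughput is easiest to see with a little arithmetic. When the LLM's output is streamed into TTS, speech can begin as soon as the first short phrase is decoded, so the wait is roughly TTFT plus the time to decode a handful of tokens. The Python sketch below models that and includes a small helper for measuring TTFT and decode speed on any token iterator. Providers "A" and "B" and all their numbers are hypothetical and are not meant to represent Groq or any other real service.

```python
import time
from typing import Iterable


def measure_stream(tokens: Iterable[str]) -> tuple[float, float]:
    """Return (TTFT in ms, decode throughput in tokens/s) for any token stream."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in tokens:
        if first is None:
            first = time.perf_counter()  # timestamp when the first token arrives
        count += 1
    end = time.perf_counter()
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_s = end - first
    tokens_per_s = (count - 1) / decode_s if count > 1 and decode_s > 0 else float("inf")
    return (first - start) * 1000.0, tokens_per_s


def time_to_tokens_ms(ttft_ms: float, tokens_per_s: float, n_tokens: int) -> float:
    """Time until n_tokens are available: TTFT plus decode time for the tokens."""
    return ttft_ms + 1000.0 * n_tokens / tokens_per_s


# HYPOTHETICAL providers: A has high throughput, B has low TTFT.
a = dict(ttft_ms=400.0, tokens_per_s=500.0)
b = dict(ttft_ms=150.0, tokens_per_s=100.0)

# First 10-token phrase, which is all a streaming TTS stage needs to start speaking.
print(time_to_tokens_ms(**a, n_tokens=10), time_to_tokens_ms(**b, n_tokens=10))    # 420.0 250.0
# Full 300-token reply, which is what a non-streaming caller would wait for.
print(time_to_tokens_ms(**a, n_tokens=300), time_to_tokens_ms(**b, n_tokens=300))  # 1000.0 3150.0
```

Under these assumptions the low-TTFT provider starts speaking sooner even though it is five times slower at decoding; throughput only wins if the pipeline waits for the whole reply.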
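The post does not describe how Tavus's end-of-turn model works internally. As an illustration of the general idea only, here is a hypothetical Python policy that scales the required silence by a turn-completion probability `p_done` (assumed to come from some classifier over conversational signals) and starts a speculative reply draft early, discarding it if the user resumes speaking. The class name, thresholds, and action strings are all invented placeholders.

```python
from dataclasses import dataclass


@dataclass
class TurnTaker:
    """HYPOTHETICAL end-of-turn policy; all thresholds are illustrative placeholders."""
    min_wait_ms: float = 200.0         # silence needed when the turn is surely over
    max_wait_ms: float = 1200.0        # silence needed when the user likely continues
    speculate_after_ms: float = 100.0  # start drafting a reply after this much silence
    drafting: bool = False

    def required_silence_ms(self, p_done: float) -> float:
        """Interpolate the silence threshold from the turn-completion probability."""
        p = min(max(p_done, 0.0), 1.0)
        return self.max_wait_ms - p * (self.max_wait_ms - self.min_wait_ms)

    def step(self, silence_ms: float, p_done: float, user_speaking: bool) -> str:
        """Decide what to do for the current audio frame."""
        if user_speaking:
            if self.drafting:
                self.drafting = False
                return "cancel_draft"  # user resumed: discard the speculative reply
            return "listen"
        if silence_ms >= self.required_silence_ms(p_done):
            self.drafting = False
            return "respond"           # reuse the draft if one was already started
        if silence_ms >= self.speculate_after_ms and not self.drafting:
            self.drafting = True
            return "start_draft"       # speculatively run LLM/TTS on the input so far
        return "wait"


tt = TurnTaker()
print(tt.step(0, 0.1, True))      # listen
print(tt.step(150, 0.3, False))   # start_draft (threshold is 900 ms)
print(tt.step(0, 0.3, True))      # cancel_draft
print(tt.step(150, 0.3, False))   # start_draft
print(tt.step(300, 0.95, False))  # respond (threshold is about 250 ms)
```

The speculative draft is what lets a system like this respond close to the ~250 ms human gap when it is confident, while the longer threshold for uncertain pauses guards against talking over the user.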
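The post does not detail Phoenix-2's renderer. For readers unfamiliar with Gaussian Splatting, the Python sketch below shows the per-pixel step of standard 3D Gaussian Splatting: Gaussians already projected to 2D are sorted by depth and alpha-blended front to back. Unlike a NeRF, which queries a neural network at many sample points along every camera ray, this is a sort-and-blend operation that maps well onto GPU rasterization, which helps explain the frame-rate gains. It is a CPU illustration of the general technique, not Tavus's implementation; the `Splat2D` type and helper names are hypothetical.

```python
import math
from dataclasses import dataclass

Color = tuple[float, float, float]


@dataclass(frozen=True)
class Splat2D:
    """A Gaussian already projected to screen space."""
    mean: tuple[float, float]        # center in pixel coordinates
    cov: tuple[float, float, float]  # 2D covariance entries (sxx, sxy, syy)
    color: Color
    opacity: float                   # learned opacity in [0, 1]
    depth: float                     # view-space depth, used for sorting


def splat_alpha(s: Splat2D, px: float, py: float) -> float:
    """Opacity contributed by one Gaussian at pixel (px, py)."""
    sxx, sxy, syy = s.cov
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        return 0.0  # degenerate covariance: contributes nothing
    a, b, c = syy / det, -sxy / det, sxx / det  # inverse of the 2x2 covariance
    dx, dy = px - s.mean[0], py - s.mean[1]
    power = -0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy)
    return min(0.99, s.opacity * math.exp(power))


def composite_pixel(splats: list[Splat2D], px: float, py: float,
                    background: Color = (0.0, 0.0, 0.0)) -> Color:
    """Front-to-back blending: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in sorted(splats, key=lambda g: g.depth):  # nearest Gaussian first
        alpha = splat_alpha(s, px, py)
        if alpha < 1.0 / 255.0:
            continue  # negligible contribution
        for ch in range(3):
            rgb[ch] += s.color[ch] * alpha * transmittance
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:
            break  # pixel is effectively opaque; stop early
    return (rgb[0] + background[0] * transmittance,
            rgb[1] + background[1] * transmittance,
            rgb[2] + background[2] * transmittance)


red = Splat2D((0.0, 0.0), (1.0, 0.0, 1.0), (1.0, 0.0, 0.0), 0.5, depth=1.0)
green = Splat2D((0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 0.0), 0.5, depth=2.0)
print(composite_pixel([green, red], 0.0, 0.0))  # (0.5, 0.25, 0.0)
```

Because the list is sorted by depth before blending, the nearer red splat occludes half of the green one regardless of input order.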
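The post names streaming vs. batching and process parallelization without giving details. The following toy timing model, with hypothetical per-chunk costs for three stages, shows why forwarding each chunk as soon as it is processed, with stages running in parallel, cuts the time to first output. The streaming case uses the standard pipeline recurrence `finish[i][k] = max(finish[i-1][k], finish[i][k-1]) + cost[i]` and assumes all input chunks are available at time zero, a simplification for a live conversation.

```python
def batch_latency_ms(stage_costs_ms: list[float], n_chunks: int) -> tuple[float, float]:
    """Each stage processes every chunk before handing off. Returns (first output, done)."""
    if n_chunks < 1:
        raise ValueError("need at least one chunk")
    done = n_chunks * sum(stage_costs_ms)
    return done, done  # nothing reaches the user until the last stage finishes


def streaming_latency_ms(stage_costs_ms: list[float], n_chunks: int) -> tuple[float, float]:
    """Stages run in parallel and forward each chunk immediately. Returns (first, done).

    Stage i can start chunk k once stage i-1 has produced chunk k and
    stage i itself has finished chunk k-1.
    """
    if n_chunks < 1:
        raise ValueError("need at least one chunk")
    prev = [0.0] * n_chunks  # simplification: all input chunks ready at t = 0
    for cost in stage_costs_ms:
        cur = []
        free_at = 0.0
        for k in range(n_chunks):
            free_at = max(prev[k], free_at) + cost
            cur.append(free_at)
        prev = cur
    return prev[0], prev[-1]


costs = [20.0, 50.0, 30.0]  # HYPOTHETICAL per-chunk cost of three stages, in ms
print(batch_latency_ms(costs, 10))      # (1000.0, 1000.0)
print(streaming_latency_ms(costs, 10))  # (100.0, 550.0)
```

In this model streaming delivers the first output after one pass through the stages and finishes when the slowest stage drains its queue. Batching usually improves hardware utilization and cost per request, which is why it sits on the other side of the latency, scale, and cost trade-off the post describes.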