Vuen AI
Project Concept
Core Functionality: The agent is a Multimodal Voice-to-UI Assistant centered around a reactive 3D Visual Orb. It enables real-time, bidirectional voice conversations where users can interact not just verbally, but physically by dragging and dropping documents (PDF, DOCX) directly onto the orb to instantly “teach” the agent new context (RAG). The agent actively orchestrates a suite of Business Intelligence (BI) tools, dynamically generating and rendering interactive charts (Time Series, Geo Maps, KPI Cards) based on the conversation.
Working Prototype Stability: The prototype is highly stable, featuring robust state management for connection lifecycles (Idle, Listening, Thinking, Speaking) and error handling. It implements a resilient WebSocket connection for low-latency audio and includes sophisticated UI feedback loops (e.g., visual cues for file processing, drag states, and chart loading statuses).
Technical Complexity: The solution demonstrates high complexity through Client-Side Tool Orchestration. The agent handles a multimodal pipeline:
Audio Processing: Real-time FFT frequency analysis driving the 3D orb’s vertex shaders.
Context Injection: On-the-fly parsing of dropped files (mammoth, pdfjs-dist) into markdown context injected directly into the active ElevenLabs session.
Tool Execution: The LLM “calls” client-side functions (e.g.,
time_series_chart
,
geo_chart
), which the application parses to render Recharts components dynamically without page reloads.
Innovation & Real-World Impact
Innovation: The “Tactile Voice Interface”—where dropping a file onto a 3D avatar instantly creates a knowledge base—is a novel interaction pattern that bridges the gap between file systems and conversational AI. Furthermore, the ability of the agent to “show, don’t just tell” by summoning complex data visualizations on command transforms the browser from a passive document viewer into an active analytical partner.
Real-World Impact: This tool democratizes complex data analysis. A non-technical user can drop a sales report and simply ask, “Show me the revenue trend by region,” and the agent will instantly visualize the data, drastically reducing the time-to-insight for decision-makers.
Theme Alignment
This project perfectly aligns with the theme by transforming disparate web elements into a cohesive entity:
Browsers: Transformed from static pages into an immersive 3D spatial interface using React Three Fiber.
Voices: ElevenLabs powers the agent’s persona, giving it a natural, human-like presence that drives the interaction.
Clouds: High-speed cloud inference connects the user’s intent to actionable data and context processing.
Tools: The agent acts as an orchestrator, wielding 7 distinct BI tools as extensions of its capabilities to manipulate the user’s screen.
Technology Stack
Core Framework: React, Vite, TypeScript
AI & Voice: ElevenLabs React SDK (WebSockets), ElevenLabs Conversational AI
3D & Visuals: Three.js, React Three Fiber (R3F), GLSL Shaders (Custom noise/fresnel shaders), Framer Motion
Data Visualization: Recharts (Dynamic chart rendering)
Document Processing: PDF.js (PDF parsing), Mammoth (DOCX parsing)
UI/Styling: TailwindCSS, Shadcn/UI, Lucide React
Entry
Status: Submitted
Last saved: December 11 at 9:33 PM -05
Team Roster
Message board not available for this team yet.
Enoc Silva Team Lead RSVP Approved
Founder at Vuen AI
Cristian Mancilla David RSVP Approved
CTO at VuenAI