Whispers – A Real-Time Voice Journaling Agent Built with AssemblyAI

This content originally appeared on DEV Community and was authored by VaishakhVipin

This is a submission for the AssemblyAI Voice Agents Challenge

Whispers - A Real-Time Voice Journaling Agent

What I Built

Whispers is a voice-first journaling application powered by AssemblyAI's Universal-Streaming API. Users speak their thoughts in real time, and the app formats their words into reflective, readable journal entries. It serves as a personal wellness companion (part therapist, part mirror, part coach) that helps users capture their daily reflections through natural speech.

This project falls under the Real-Time Performance category. It demonstrates real-time audio processing with sub-300ms latency for live transcription display, and shows how AssemblyAI's Universal-Streaming technology can support responsive voice experiences that feel natural and immediate.

Demo

🎥 Video Demo: https://drive.google.com/file/d/1RHyqpW434EeTGdP6xMRYbZCfifNatZd7/view?usp=sharing

GitHub Repository

The complete source code is available at: https://github.com/VaishakhVipin/whispers-final

Key files demonstrating AssemblyAI integration:

backend/services/assembly.py - Python WebSocket streaming implementation
frontend/src/components/NotionLikeEditor.tsx - Frontend WebSocket integration
backend/routes/stream.py - Backend API endpoints for voice processing
frontend/src/lib/api.ts - Frontend API integration

Technical Implementation & AssemblyAI Integration

AssemblyAI's Universal-Streaming WebSocket API is the core of Whispers' real-time voice processing. The implementation streams microphone audio and receives live, formatted transcripts with minimal latency.

Key AssemblyAI Features Implemented:

Real-time WebSocket Connection: Direct streaming to AssemblyAI's v3 streaming endpoint with formatted finals
Live Transcription: Continuous audio processing with immediate text output and partial transcript display
Auto-formatting: Clean, punctuated transcripts with proper sentence boundaries using formatted_finals=true
Streaming State Management: Robust connection handling with proper cleanup and error recovery
Duplicate Detection: Intelligent handling to prevent transcription artifacts and repeated content
Paragraph Logic: Smart paragraph spacing based on content analysis and sentence boundaries
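
The connection URL behind these features can be assembled from its query parameters. The following is a minimal Python sketch, assuming the endpoint and parameter names that appear in the frontend snippet in this post (`sample_rate`, `formatted_finals`, `token`). `build_streaming_url` is a hypothetical helper for illustration, not a function from the repository.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the frontend snippet of this post.
STREAMING_ENDPOINT = "wss://streaming.assemblyai.com/v3/ws"


def build_streaming_url(token: str, sample_rate: int = 16000,
                        formatted_finals: bool = True) -> str:
    """Hypothetical helper: assemble the WebSocket URL with its query string."""
    params = {
        "sample_rate": sample_rate,
        # Query strings carry booleans as lowercase text.
        "formatted_finals": str(formatted_finals).lower(),
        "token": token,
    }
    # urlencode also escapes characters in the token that are unsafe in a URL.
    return f"{STREAMING_ENDPOINT}?{urlencode(params)}"
```

Building the query string with `urlencode` rather than string concatenation keeps the URL valid if the token ever contains reserved characters.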

Code Snippet - Python WebSocket Implementation:

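import asyncio
import json

import websockets

# ASSEMBLYAI_WS_BASE and get_assemblyai_token_universal_streaming() are the
# project's own helpers and are not shown in this excerpt.

async def stream_to_assemblyai(audio_generator):
    """
    Streams PCM audio chunks to AssemblyAI Universal-Streaming API and yields transcript text results.
    :param audio_generator: async generator yielding raw PCM audio bytes
    :yield: transcript text (str)
    """
    token = get_assemblyai_token_universal_streaming()
    ws_url = ASSEMBLYAI_WS_BASE + token

    async with websockets.connect(ws_url) as ws:
        async def send_audio():
            # Forward each PCM chunk as a binary frame, then ask the server to end the session.
            async for chunk in audio_generator:
                await ws.send(chunk)
            await ws.send(json.dumps({"terminate_session": True}))

        # Send in the background so slow audio input never blocks receiving.
        send_task = asyncio.create_task(send_audio())
        try:
            async for msg in ws:
                data = json.loads(msg)
                # Field names must match the endpoint behind ASSEMBLYAI_WS_BASE; the v3
                # endpoint in the frontend snippet reports type == "Turn" instead.
                if data.get("message_type") == "FinalTranscript":
                    yield data.get("text", "")
            await send_task
        finally:
            # Stop the sender if the consumer exits early or the socket fails.
            send_task.cancel()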
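
The function above expects `audio_generator` to be an async generator of raw PCM bytes. As a rough sketch of what such a generator could look like, the code below slices an in-memory buffer into fixed-duration chunks. It assumes 16 kHz audio (the `sample_rate` used in the frontend URL) in 16-bit mono, which the post does not state. The 50 ms chunk length is an illustrative choice, and `pcm_chunks` and `chunk_size_bytes` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator

SAMPLE_RATE = 16000      # matches sample_rate in the frontend URL
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM


def chunk_size_bytes(chunk_ms: int) -> int:
    """Number of bytes covering chunk_ms of audio at the assumed format."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000


async def pcm_chunks(pcm: bytes, chunk_ms: int = 50) -> AsyncIterator[bytes]:
    """Hypothetical audio_generator: yield the buffer in fixed-size chunks."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
        # Hand control back to the event loop so a receiver can run between chunks.
        await asyncio.sleep(0)
```

A call such as `pcm_chunks(recorded_bytes)` could be passed as `audio_generator`. A live microphone source would yield chunks as they are captured instead of slicing a finished buffer.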

Frontend JavaScript Integration:

// Connect to AssemblyAI WebSocket
const ws = new WebSocket(`wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&formatted_finals=true&token=${token}`);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type !== "Turn") return;

  const transcript = (data.transcript || "").trim();
  if (!transcript) return;

  if (data.turn_is_formatted) {
    // Final, formatted version - add to main transcription
    console.log("📝 Clean transcription:", transcript);

    setTranscriptionText(prev => {
      const trimmedPrev = prev.trim();

      // Duplicate detection: skip a turn that already ends the entry or already appears doubled
      if (trimmedPrev.endsWith(transcript) ||
          trimmedPrev.includes(transcript + " " + transcript)) {
        return prev;
      }

      // Decide paragraph spacing from the latest state (prev), not a stale closure value
      const shouldStartNewParagraph = shouldStartNewParagraphLogic(transcript, prev);
      const separator = shouldStartNewParagraph ? "\n\n" : " ";
      return prev + (prev && !prev.endsWith('\n\n') ? separator : "") + transcript;
    });
  } else {
    // Partial version - show in real-time stream
    setCurrentStreamText(transcript);
  }
};
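
The handler above performs two steps for each final turn: a duplicate check, then a choice between a space and a paragraph break. The post does not show the body of `shouldStartNewParagraphLogic`, so the Python sketch below substitutes a hypothetical heuristic: break once the current paragraph holds a few complete sentences. It ports only the simpler "already ends with this turn" duplicate check. All function names and the `max_sentences` threshold are assumptions for illustration.

```python
import re

SENTENCE_END = re.compile(r"[.!?]")


def should_start_new_paragraph(prev: str, max_sentences: int = 3) -> bool:
    """Hypothetical heuristic: break once the current paragraph is long enough."""
    current = prev.rstrip().split("\n\n")[-1]
    ends_cleanly = current.endswith((".", "!", "?"))
    return ends_cleanly and len(SENTENCE_END.findall(current)) >= max_sentences


def append_final_turn(prev: str, transcript: str) -> str:
    """Append a formatted final turn unless it repeats the end of prev."""
    turn = transcript.strip()
    if not turn or prev.strip().endswith(turn):
        return prev  # empty or duplicate turn: leave the entry unchanged
    if not prev.strip():
        return turn  # first turn of the entry needs no separator
    separator = "\n\n" if should_start_new_paragraph(prev) else " "
    return prev.rstrip() + separator + turn
```

Keeping this logic in a pure function that takes the previous text as an argument makes it easy to test without a live WebSocket session.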

UX Design & Features

Voice-First Interface:

Minimalist journaling canvas with vintage paper aesthetic
Pulsing recording indicator for live microphone status
Real-time word count and session duration tracking
Intelligent duplicate detection to prevent transcription artifacts

Smart Journaling Features:

Daily Reflection Prompts: Curated prompts that refresh daily at midnight GMT
Tone Rewriting: AI-powered text transformation (optimistic, technical, formal, etc.)
Session Management: Sessions are editable on the day they are created and read-only after that
Content Analysis: Automatic title generation, summaries, and key theme extraction
Search & Discovery: Full-text search across all journal entries
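
A daily prompt that changes at midnight GMT can be derived from the calendar date alone, with no scheduler. The sketch below is one way to do that under stated assumptions: GMT is treated as UTC, the prompt texts are placeholders (the app's curated list is not shown in the post), and `prompt_for` is a hypothetical function name.

```python
from datetime import datetime, timezone

# Placeholder prompts for illustration; the app's curated list is not shown in the post.
PROMPTS = [
    "What gave you energy today?",
    "What are you avoiding, and why?",
    "What would you tell yourself a year ago?",
]


def prompt_for(moment: datetime, prompts: list[str] = PROMPTS) -> str:
    """Pick one prompt per GMT calendar day, so it changes at 00:00 GMT.

    moment must be timezone-aware; GMT is treated as UTC here.
    """
    day = moment.astimezone(timezone.utc).date()
    # toordinal() grows by one each day, so the prompts rotate in order.
    return prompts[day.toordinal() % len(prompts)]
```

Because the choice depends only on the date, every user sees the same prompt on a given day and no state needs to be stored.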

Technical Architecture:

Frontend: React + TypeScript + Vite + Tailwind CSS + Shadcn/ui
Backend: FastAPI + Python for API endpoints and AI processing
Database: Supabase for user authentication and session storage
Search: Algolia for fast, semantic search across journal entries
AI Processing: Google Gemini for content summarization and tone rewriting

Key Technical Achievements

Real-Time Performance:

Sub-300ms latency for live transcription display
Seamless WebSocket connection management
Efficient audio processing with proper resource cleanup
Responsive UI updates synchronized with audio state

Domain Expertise:

Specialized journaling workflow optimized for voice input
Intelligent content organization with automatic categorization
User behavior analysis with session statistics and trends
Privacy-focused design with user data isolation

Robust Error Handling:

Graceful microphone permission management
Connection recovery mechanisms
Comprehensive logging for debugging
Fallback modes for degraded performance
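
Connection recovery is often built on retries with exponential backoff. The following is a generic sketch of that technique, not the project's actual recovery code, which the post does not show. `connect_with_retry` and its parameters are hypothetical, and retrying only on `OSError` is an assumption: the right exception types depend on the WebSocket client library in use.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def connect_with_retry(
    connect: Callable[[], Awaitable[T]],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return await connect()
        except OSError:
            # OSError covers ConnectionError and other socket-level failures.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            await asyncio.sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Capping the delay with `max_delay` keeps a long outage from pushing the wait between attempts to impractical lengths.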

Key Takeaways

AssemblyAI's Real-time Capabilities: The Universal-Streaming API provides low-latency, accurate transcription, which makes voice journaling feel natural and responsive.
WebSocket Management is Critical: Proper cleanup of WebSocket connections and audio resources is essential, especially when users navigate between pages or close the application.
Voice Journaling Requires Context: Beyond simple text capture, voice journaling benefits from emotional context, prompting, and intelligent content organization.
Immutable Journals Encourage Honesty: Making entries read-only once the day they were created has ended encourages more authentic, unfiltered self-reflection.
Real-time UX Demands Attention: Users expect immediate feedback when speaking, requiring careful attention to UI state management and audio-visual synchronization.
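
The same-day lock described above reduces to a date comparison. The sketch below shows one way to express it, assuming that "same day" is evaluated in UTC, since the post does not say which time zone the app uses. `is_editable` is a hypothetical function name.

```python
from datetime import datetime, timezone


def is_editable(created_at: datetime, now: datetime) -> bool:
    """True while now falls on the same calendar day as created_at.

    Assumption: "same day" is evaluated in UTC. Both arguments must be
    timezone-aware datetimes.
    """
    created_day = created_at.astimezone(timezone.utc).date()
    return now.astimezone(timezone.utc).date() == created_day
```

Enforcing a rule like this on the server as well as in the UI would keep a modified client from editing locked entries.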

What's Next

Immediate Roadmap:

Deploy live version with enhanced security and row-level security (RLS) re-enabled
Implement user streak tracking and habit formation features
Add sentiment analysis for emotional trend tracking
Create memory timelines and reflection insights

Future Enhancements:

Voice emotion detection for mood tracking
Collaborative journaling features
Integration with wellness apps and calendars
Advanced AI coaching and reflection prompts

Technical Stack

Frontend:

TypeScript (React)
Vite for fast development and building
Tailwind CSS for styling
Shadcn/ui for component library
React Router for navigation

Backend:

FastAPI for RESTful API endpoints
Python for server-side processing
Supabase for authentication and database
Algolia for search indexing

Voice & AI:

AssemblyAI Universal-Streaming for real-time transcription
Google Gemini for content analysis and rewriting
WebSocket for real-time communication

Deployment:

Vercel for frontend hosting
Vercel Functions for backend API
Environment-based security configuration

Final Note

Whispers is built for people who think best out loud. It turns traditional journaling into a conversation with yourself: live, raw, and authentically yours. By building on AssemblyAI's voice technology, Whispers makes capturing daily reflections as natural as talking, while providing the structure and insights that make journaling meaningful.

The project demonstrates how real-time voice technology can enhance personal wellness applications, creating a more intuitive and engaging way for users to document their thoughts, emotions, and personal growth journey.


APA

VaishakhVipin | Sciencx (2025-07-28T00:39:24+00:00) Whispers – A Real-Time Voice Journaling Agent Built with AssemblyAI. Retrieved from https://www.scien.cx/2025/07/28/whispers-a-real-time-voice-journaling-agent-built-with-assemblyai/
