AI Voice Agent
Learn how to enable voice-based interactions with your chatbot using the AI Voice Agent feature in OKChat AI.
Preview Feature: The AI Voice Agent is currently in preview. For access, please reach out to [email protected].
Introduction
The AI Voice Agent is an advanced feature in OKChat AI that enables voice-based interactions with your agent. This feature allows users to engage with the agent using natural language, making it ideal for hands-free operations and voice-activated devices.
Key Features
Voice Interaction
Users can speak to the chatbot, and the AI Voice Agent will respond with voice output.
Advanced Turn Detection
Detects when a user is speaking, with English, Multilingual, and Push-to-Talk (deprecated) detection modes.
Customizable Voice Widget
Customize the appearance, text, voice, and behavior of the voice widget to match your brand.
Real-Time Speech Recognition
Uses advanced speech recognition to understand and process user commands.
Flexible Provider Selection
Choose from a wide range of LLM, Text-to-Speech (TTS), and Speech-to-Text (STT) providers.
Voice Activity Detection (VAD)
Uses voice activity detection to separate speech from silence, so that only spoken audio is processed as a command.
Telephony Integration
Integrate your voice agent with phone systems for seamless communication.
Default Tools
Web search, URL scraping, weather, and knowledge base search are available by default.
Providers
OKChat AI’s Voice Agent supports multiple providers to give you maximum flexibility.
OpenAI Realtime
Uses OpenAI’s realtime API with OKChat’s knowledge base and tools for fast interactions.
Features: Various voices, keyword detection, voice vibe.
Gemini Live
Uses Google’s Gemini API for a seamless voice experience.
Features: Various voices, keyword detection, voice vibe.
OKCHAT Provider (Recommended)
The most flexible option, using OKChat’s own pipeline, optimized for the platform. It allows you to mix and match different providers for core functionalities.
LLM Providers: OpenAI, Google, Groq, and more.
TTS Providers: OpenAI, Google, ElevenLabs, Deepgram, Cartesia.
STT Providers: OpenAI, Deepgram, Speechmatics, Groq, Google.
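To make the mix-and-match idea concrete, here is a minimal sketch that models one pipeline as three independent choices and checks the TTS and STT choices against the lists above. The names PipelineSelection and unsupported_choices are hypothetical and are not part of any OKChat AI API; the LLM choice is not checked because that list is open-ended ("and more").

```python
from dataclasses import dataclass

# Provider names copied from the lists above. The LLM list is open-ended
# ("and more"), so only TTS and STT choices are checked here.
TTS_PROVIDERS = {"OpenAI", "Google", "ElevenLabs", "Deepgram", "Cartesia"}
STT_PROVIDERS = {"OpenAI", "Deepgram", "Speechmatics", "Groq", "Google"}


@dataclass(frozen=True)
class PipelineSelection:
    """Hypothetical container for one LLM/TTS/STT combination."""
    llm: str
    tts: str
    stt: str


def unsupported_choices(selection: PipelineSelection) -> list[str]:
    """Return one message per TTS/STT choice that is missing from the lists above."""
    problems = []
    if selection.tts not in TTS_PROVIDERS:
        problems.append(f"TTS provider not listed: {selection.tts}")
    if selection.stt not in STT_PROVIDERS:
        problems.append(f"STT provider not listed: {selection.stt}")
    return problems
```

For example, a Groq LLM with ElevenLabs for speech output and Deepgram for transcription produces no messages, while picking Speechmatics as the TTS provider is flagged because it appears only in the STT list.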
Getting Started with AI Voice Agent
Getting Started Steps
1. Accessing the AI Voice Agent
Log in to your OKChat AI dashboard.
Navigate to the AI Voice Agent section under Integration.
If you do not see this option, contact [email protected] to request access.
2. General Configuration
In the General tab, you can configure the core behavior of your voice agent.
- Prompt: Set the system prompt to define the agent’s personality and instructions.
- Voice Provider: Choose between OKCHAT (Recommended), OpenAI Realtime, or Gemini Live. The OKCHAT provider offers the most customization options.
- LLM Provider (OKCHAT only): If you chose the OKCHAT provider, select an underlying Large Language Model (LLM) provider like OpenAI, Google, or Groq.
- Model: Choose the specific model for the selected provider (e.g., gpt-4o, gemini-2.0-flash).
- Turn Detector: Select how the agent detects user speech: English, Multilingual, or Push To Talk (deprecated).
- TTS Provider (OKCHAT only): Choose a Text-to-Speech provider like OpenAI, Google, ElevenLabs, Deepgram, or Cartesia to generate the agent’s voice.
- Choose AI Voice: Select a specific voice from the chosen TTS provider’s library.
- STT Provider: Select a Speech-to-Text provider like OpenAI, Deepgram, Speechmatics, Groq, or Google for transcription.
- STT Model: Choose the specific transcription model.
- Language: Enable Auto-detect language or manually specify language codes (e.g., en for ISO 639-1, en-US for BCP-47, depending on the STT provider).
- Keywords: Provide a comma-separated list of domain-specific keywords to improve speech recognition accuracy.
- Greeting Message: Set the initial message the agent speaks when a conversation starts.
- Voice Vibe: Describe the desired personality, tone, and speaking style for the AI voice, particularly for OpenAI voices.
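The Language and Keywords fields both take short text values that are easy to mistype. The sketch below shows one way to check them before pasting them into the dashboard. It is only an illustration: classify_language_code and parse_keywords are hypothetical helpers, the BCP-47 pattern is deliberately simplified to the language-region form used in the example above, and how each STT provider actually validates these values is not covered here.

```python
import re

# ISO 639-1 codes are exactly two lowercase letters, e.g. "en".
ISO_639_1 = re.compile(r"[a-z]{2}")
# Simplified check for a language-region tag such as "en-US".
# Real BCP-47 tags allow more subtags (script, variant, ...), and a bare
# "en" is also a valid BCP-47 tag; this mirrors only the two forms above.
BCP_47_REGION = re.compile(r"[a-z]{2,3}-[A-Z]{2}")


def classify_language_code(code: str) -> str:
    """Label a code as 'iso-639-1', 'bcp-47' (language-region), or 'unknown'."""
    if ISO_639_1.fullmatch(code):
        return "iso-639-1"
    if BCP_47_REGION.fullmatch(code):
        return "bcp-47"
    return "unknown"


def parse_keywords(raw: str) -> list[str]:
    """Split a comma-separated list, trim whitespace, and drop empty
    entries and case-insensitive duplicates, keeping the first spelling."""
    seen = set()
    keywords = []
    for part in raw.split(","):
        word = part.strip()
        if word and word.lower() not in seen:
            seen.add(word.lower())
            keywords.append(word)
    return keywords
```

With these helpers, "en" is labeled iso-639-1, "en-US" is labeled bcp-47, and a keyword string with stray commas or repeated terms is reduced to a clean list.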
3. Appearance, Text, and Avatar
Customize the visual aspects of your voice widget.
- Appearance Tab: Adjust colors (background, text, button), radius for cards and buttons, and the widget’s on-screen position (bottom-right, top-left, etc.). You can also choose to hide the “Powered by OKCHAT.AI” watermark.
- Text Tab: Customize the text for buttons and status indicators like “Start call” or “Listening…”.
- Avatar Tab: Upload a custom image to be used as the agent’s avatar.
4. Advanced Configuration
Fine-tune the agent’s performance from the Config tab.
- VAD Threshold: Adjust VAD sensitivity (0-1). Lower is more sensitive. Default: 0.5.
- Prefix Padding (ms): Minimum speech duration to start a chunk. Default: 500ms.
- Silence Duration (ms): Minimum silence before ending a segment. Default: 1000ms.
- Temperature: Adjust response randomness (0-1). Higher is more random. Default: 0.7.
- Max Output Tokens: Limit response length. Default: 2048 tokens.
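The settings above can be summarized as a small record with the documented defaults and ranges. The following is a sketch only: AdvancedConfig and its field names are hypothetical and do not reflect how OKChat AI stores these values. The 0-1 ranges come from the list above; the rules that durations must be non-negative and the token limit must be positive are assumptions.

```python
from dataclasses import dataclass


@dataclass
class AdvancedConfig:
    """Hypothetical mirror of the Config tab, using the documented defaults."""
    vad_threshold: float = 0.5        # 0-1, lower is more sensitive
    prefix_padding_ms: int = 500      # minimum speech duration to start a chunk
    silence_duration_ms: int = 1000   # minimum silence before ending a segment
    temperature: float = 0.7          # 0-1, higher is more random
    max_output_tokens: int = 2048     # limit on response length

    def __post_init__(self) -> None:
        # Ranges documented in the list above.
        if not 0.0 <= self.vad_threshold <= 1.0:
            raise ValueError("vad_threshold must be between 0 and 1")
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0 and 1")
        # Assumed constraints, not stated in the documentation.
        if self.prefix_padding_ms < 0 or self.silence_duration_ms < 0:
            raise ValueError("durations must not be negative")
        if self.max_output_tokens <= 0:
            raise ValueError("max_output_tokens must be positive")
```

Creating AdvancedConfig() with no arguments yields the defaults, and an out-of-range value such as temperature=1.5 raises a ValueError instead of being silently accepted.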
5. Kiosk Mode
Configure a full-screen voice agent experience, ideal for public displays.
- Enable Kiosk Mode from the Kiosk tab.
- Customize the Onboarding Screen with a title, description, instructions, and brand colors.
- Upload a mascot image and set the button text.
- Use the provided Kiosk URL to launch the agent in a browser.
6. Telephony Integration
Connect your voice agent to phone lines from the Telephony tab. This allows users to call in and interact with your AI agent over the phone.
7. Embedding the Voice Widget
Copy the embed code from the bottom of the configuration page and paste it into your website’s HTML. The data-position attribute can be configured in the Appearance tab.
Use Cases
Customer Support
Provide voice-based customer support for users who prefer speaking over typing.
Hands-Free Interaction
Enable voice commands for users in environments where typing is inconvenient (e.g., driving, cooking).
Accessibility
Improve accessibility for users with disabilities by offering a voice-based interface.
Best Practices
Test Thoroughly
Test the voice widget in different environments to ensure accurate speech recognition.
Optimize VAD Settings
Adjust the VAD threshold, prefix padding, and silence duration to match your use case.
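To build intuition for how these three settings interact, the sketch below segments a series of per-frame speech probabilities. It is a simplified illustration based on the setting descriptions in this guide, not OKChat AI’s actual VAD: the function segment_speech, the fixed frame length, and the probability input are all assumptions.

```python
def segment_speech(probs, frame_ms=100, threshold=0.5,
                   prefix_padding_ms=500, silence_duration_ms=1000):
    """Return (start_ms, end_ms) speech segments from per-frame probabilities.

    A segment opens once speech has lasted at least prefix_padding_ms and
    closes once silence has lasted at least silence_duration_ms.
    """
    segments = []
    start = None            # start time of the currently open segment
    candidate_start = None  # where the current run of speech began
    speech_run = 0          # consecutive speech (ms) before a segment opens
    silence_run = 0         # consecutive silence (ms) inside an open segment

    for i, p in enumerate(probs):
        t = i * frame_ms
        is_speech = p >= threshold  # lower threshold -> more frames count as speech
        if start is None:
            if is_speech:
                if speech_run == 0:
                    candidate_start = t
                speech_run += frame_ms
                if speech_run >= prefix_padding_ms:
                    start = candidate_start
                    silence_run = 0
            else:
                speech_run = 0
        elif is_speech:
            silence_run = 0
        else:
            silence_run += frame_ms
            if silence_run >= silence_duration_ms:
                # The segment ends where the closing silence began.
                segments.append((start, t + frame_ms - silence_run))
                start = None
                speech_run = 0

    if start is not None:
        # Audio ended while a segment was still open; trim trailing silence.
        segments.append((start, len(probs) * frame_ms - silence_run))
    return segments
```

In this model, a shorter silence duration ends turns sooner but risks cutting off users who pause mid-sentence, and a lower threshold picks up quieter speech at the cost of reacting to more background noise.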
Monitor Performance
Regularly review the chatbot’s responses and adjust the temperature and max output tokens as needed.
User Guidance
Provide clear instructions to users on how to interact with the voice widget.
Troubleshooting
Voice Widget Not Responding
- Ensure the embed code is correctly placed in your website’s HTML.
- Check the VAD settings to ensure the widget is sensitive enough to detect speech.
Inaccurate Responses
Adjust the temperature setting to control the randomness of responses.
Review the max output tokens to ensure responses are not too long or too short.
Access Issues
If you do not see the AI Voice Agent option in your dashboard, contact [email protected] to request access.
Conclusion
The AI Voice Agent is a powerful preview feature that brings voice-based interaction to your OKChat AI chatbot. By customizing the voice widget and configuring the settings, you can create a seamless and engaging experience for your users. For access to this feature, reach out to [email protected].
For further assistance, refer to the OKChat AI support resources or contact our support team.