AI Voice Agent

Preview Feature: The AI Voice Agent is currently in preview. For access, please reach out to [email protected].

Introduction

The AI Voice Agent is an OKChat AI feature that lets users talk to your agent instead of typing. Users speak in natural language and the agent answers aloud, which suits hands-free operation and voice-activated devices.

Key Features

Voice Interaction

Users can speak to the chatbot, and the AI Voice Agent will respond with voice output.

Advanced Turn Detection

Detects when a user is speaking. Three turn detector modes are available: English, Multilingual, and Push-to-Talk.

Customizable Voice Widget

Customize the appearance, text, voice, and behavior of the voice widget to match your brand.

Real-Time Speech Recognition

Uses advanced speech recognition to understand and process user commands.

Flexible Provider Selection

Choose from a wide range of LLM, Text-to-Speech (TTS), and Speech-to-Text (STT) providers.

Voice Activity Detection (VAD)

Uses voice activity detection to tell speech apart from silence, so the agent processes only the audio in which the user is actually speaking.

Telephony Integration

Integrate your voice agent with phone systems for seamless communication.

Default Tools

Web search, URL scraping, weather, and knowledge base search are available by default.

Providers

OKChat AI’s Voice Agent supports multiple providers to give you maximum flexibility.

OpenAI Realtime

Uses OpenAI’s realtime API with OKChat’s knowledge base and tools for fast interactions.

Features: Various voices, keyword detection, voice vibe.

Gemini Live

Uses Google’s Gemini API for a seamless voice experience.

Features: Various voices, keyword detection, voice vibe.

OKCHAT Provider (Recommended)

The most flexible option, using OKChat’s own pipeline, optimized for the platform. It allows you to mix and match different providers for core functionalities.

LLM Providers: OpenAI, Google, Groq, and more.

TTS Providers: OpenAI, Google, ElevenLabs, Deepgram, Cartesia.

STT Providers: OpenAI, Deepgram, Speechmatics, Groq, Google.
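As a rough illustration of the mix-and-match idea, the sketch below checks a chosen LLM/TTS/STT combination against the provider lists above. The object shape and the `checkPipeline` function are hypothetical and not part of any OKChat API; in the product, these choices are made in the configuration UI. Because the LLM list ends with "and more", the LLM entry here is not exhaustive.

```javascript
// Provider names copied from the lists above. The LLM list is documented as
// "OpenAI, Google, Groq, and more", so it is incomplete by design.
// This object shape is an assumption made for illustration only.
const SUPPORTED = {
  llm: ["OpenAI", "Google", "Groq"],
  tts: ["OpenAI", "Google", "ElevenLabs", "Deepgram", "Cartesia"],
  stt: ["OpenAI", "Deepgram", "Speechmatics", "Groq", "Google"],
};

// Returns a list of problems. An empty list means every stage names a
// provider that appears in the documented list for that stage.
function checkPipeline(selection) {
  const problems = [];
  for (const stage of Object.keys(SUPPORTED)) {
    const chosen = selection[stage];
    if (!SUPPORTED[stage].includes(chosen)) {
      problems.push(`${stage}: "${chosen}" is not in the documented list`);
    }
  }
  return problems;
}

// Each stage can use a different provider.
console.log(checkPipeline({ llm: "Groq", tts: "ElevenLabs", stt: "Deepgram" }));

// Cartesia is listed for TTS only, so selecting it for STT is flagged.
console.log(checkPipeline({ llm: "OpenAI", tts: "Deepgram", stt: "Cartesia" }));
```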

Getting Started with AI Voice Agent

Getting Started Steps

1. Accessing the AI Voice Agent

Navigate to the AI Voice Agent section under Integration.

If you do not see this option, contact [email protected] to request access.

2. General Configuration

In the General tab, you can configure the core behavior of your voice agent.

Prompt: Set the system prompt to define the agent’s personality and instructions.
Voice Provider: Choose between OKCHAT (Recommended), OpenAI Realtime, or Gemini Live. The OKCHAT provider offers the most customization options.
LLM Provider (OKCHAT only): If you chose the OKCHAT provider, select an underlying Large Language Model (LLM) provider like OpenAI, Google, or Groq.
Model: Choose the specific model for the selected provider (e.g., gpt-4o, gemini-2.0-flash).
Turn Detector: Select how the agent detects user speech: English, Multilingual, or Push To Talk (deprecated).
TTS Provider (OKCHAT only): Choose a Text-to-Speech provider like OpenAI, Google, ElevenLabs, Deepgram, or Cartesia to generate the agent’s voice.
Choose AI Voice: Select a specific voice from the chosen TTS provider’s library.
STT Provider: Select a Speech-to-Text provider like OpenAI, Deepgram, Speechmatics, Groq, or Google for transcription.
STT Model: Choose the specific transcription model.
Language: Enable Auto-detect language or manually specify language codes (e.g., en for ISO 639-1, en-US for BCP-47, depending on the STT provider).
Keywords: Provide a comma-separated list of domain-specific keywords to improve speech recognition accuracy.
Greeting Message: Set the initial message the agent speaks when a conversation starts.
Voice Vibe: Describe the desired personality, tone, and speaking style for the AI voice, particularly for OpenAI voices.
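The Keywords and Language fields both take short text values, so it can help to tidy them before pasting them in. The sketch below is one way to do that. Both helper names are hypothetical, and whether OKChat itself trims, de-duplicates, or validates these values is not stated here, so treat that behavior as an assumption.

```javascript
// Split a comma-separated Keywords value into a clean list:
// trims whitespace, drops empty entries, and removes case-insensitive
// duplicates while keeping the first spelling seen.
function parseKeywords(raw) {
  const seen = new Set();
  const keywords = [];
  for (const part of raw.split(",")) {
    const word = part.trim();
    const key = word.toLowerCase();
    if (word && !seen.has(key)) {
      seen.add(key);
      keywords.push(word);
    }
  }
  return keywords;
}

// Rough shape check that follows the distinction drawn in the Language
// setting: a bare two-letter code such as "en" is treated as ISO 639-1,
// and a language-region tag such as "en-US" as BCP-47. It does not check
// that the code is actually registered or supported by an STT provider.
function languageCodeStyle(code) {
  if (/^[a-z]{2}$/.test(code)) return "iso-639-1";
  if (/^[a-z]{2,3}-[A-Z]{2}$/.test(code)) return "bcp-47";
  return "unknown";
}

console.log(parseKeywords("OKChat, kiosk ,, OKCHAT, Deepgram"));
console.log(languageCodeStyle("en"), languageCodeStyle("en-US"));
```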

3. Appearance, Text, and Avatar

Customize the visual aspects of your voice widget.

Appearance Tab: Adjust colors (background, text, button), radius for cards and buttons, and the widget’s on-screen position (bottom-right, top-left, etc.). You can also choose to hide the “Powered by OKCHAT.AI” watermark.
Text Tab: Customize the text for buttons and status indicators like “Start call” or “Listening…”.
Avatar Tab: Upload a custom image to be used as the agent’s avatar.

4. Advanced Configuration

Fine-tune the agent’s performance from the Config tab.

VAD Threshold: Sets VAD sensitivity on a 0-1 scale. Lower values are more sensitive. Default: 0.5.
Prefix Padding (ms): Minimum duration of speech required to start a chunk. Default: 500 ms.
Silence Duration (ms): Minimum duration of silence required to end a segment. Default: 1000 ms.
Temperature: Sets response randomness on a 0-1 scale. Higher values are more random. Default: 0.7.
Max Output Tokens: Caps the length of each response. Default: 2048 tokens.
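To build intuition for how the three VAD settings interact, here is a simplified segmenter that follows the descriptions above. It is an illustrative sketch, not OKChat's implementation: the `segmentSpeech` function, the per-frame speech probabilities, and the fixed frame length are all assumptions. A frame counts as speech when its probability is at or above the threshold, a segment opens after enough continuous speech, and it closes after enough continuous silence.

```javascript
// Defaults taken from the Config tab values listed above.
const DEFAULTS = { vadThreshold: 0.5, prefixPaddingMs: 500, silenceDurationMs: 1000 };

// probs: one speech probability (0-1) per audio frame (hypothetical input).
// frameMs: duration of each frame in milliseconds.
// Returns segments as { startMs, endMs } objects.
function segmentSpeech(probs, frameMs, options = {}) {
  const { vadThreshold, prefixPaddingMs, silenceDurationMs } = { ...DEFAULTS, ...options };
  const segments = [];
  let speechRun = 0;  // ms of consecutive speech frames
  let silenceRun = 0; // ms of consecutive silence inside an open segment
  let runStart = 0;   // time at which the current speech run began
  let open = null;    // segment being built, or null

  probs.forEach((p, i) => {
    const t = i * frameMs;
    if (p >= vadThreshold) {
      if (speechRun === 0) runStart = t;
      speechRun += frameMs;
      silenceRun = 0;
      // Open a segment only after enough continuous speech.
      if (!open && speechRun >= prefixPaddingMs) open = { startMs: runStart, endMs: t + frameMs };
      if (open) open.endMs = t + frameMs;
    } else {
      speechRun = 0;
      if (open) {
        silenceRun += frameMs;
        // Close the segment only after enough continuous silence.
        if (silenceRun >= silenceDurationMs) {
          segments.push(open);
          open = null;
          silenceRun = 0;
        }
      }
    }
  });
  if (open) segments.push(open); // audio ended while a segment was still open
  return segments;
}

// 100 ms frames: speech, a short pause, more speech, a long pause, a brief blip.
const frames = [
  ...Array(6).fill(0.9), ...Array(3).fill(0.1), ...Array(2).fill(0.9),
  ...Array(10).fill(0.1), ...Array(2).fill(0.9),
];
console.log(segmentSpeech(frames, 100));
console.log(segmentSpeech(frames, 100, { vadThreshold: 0.05 }));
```

With the defaults, the short pause stays inside one segment and the final 200 ms blip is ignored because it is shorter than the prefix padding. Lowering the threshold to 0.05 makes every frame count as speech, which is what "lower is more sensitive" means in practice.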

5. Kiosk Mode

Configure a full-screen voice agent experience, ideal for public displays.

Enable Kiosk Mode from the Kiosk tab.
Customize the Onboarding Screen with a title, description, instructions, and brand colors.
Upload a mascot image and set the button text.
Use the provided Kiosk URL to launch the agent in a browser.

6. Telephony Integration

Connect your voice agent to phone lines from the Telephony tab. This allows users to call in and interact with your AI agent over the phone.

7. Embedding the Voice Widget

Copy the embed code from the bottom of the configuration page and paste it into your website’s HTML. The value of the data-position attribute is set in the Appearance tab.

<script
  src="https://v2.okchat.ai/chatbot-voice-widget.js"
  data-chatbot-id="YOUR_CHATBOT_ID"
  data-position="bottom-right"
></script>
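On sites where you cannot edit the HTML directly, such as a single-page app, the same tag can be created from JavaScript. The sketch below is a minimal example under assumptions: `loadVoiceWidget` is a hypothetical helper, and whether the widget script reads its attributes correctly when injected this way is not confirmed here, so test it before relying on it. The URL and the two data attributes are the ones from the embed code above.

```javascript
// Builds the same <script> tag as the embed snippet and appends it to the page.
// `doc` is the page's document object; passing it in keeps the function
// testable outside a browser.
function loadVoiceWidget(doc, { chatbotId, position = "bottom-right" }) {
  if (!chatbotId) throw new Error("chatbotId is required");
  const script = doc.createElement("script");
  script.src = "https://v2.okchat.ai/chatbot-voice-widget.js";
  script.setAttribute("data-chatbot-id", chatbotId);
  script.setAttribute("data-position", position);
  doc.body.appendChild(script);
  return script;
}

// In a browser, call it once after the page has loaded:
// loadVoiceWidget(document, { chatbotId: "YOUR_CHATBOT_ID" });
```

Pasting the static embed code remains the documented route; use a loader like this only if the static tag is not an option.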

Use Cases

Customer Support

Provide voice-based customer support for users who prefer speaking over typing.

Hands-Free Interaction

Enable voice commands for users in environments where typing is inconvenient (e.g., driving, cooking).

Accessibility

Improve accessibility for users with disabilities by offering a voice-based interface.

Best Practices

Test Thoroughly

Optimize VAD Settings

Monitor Performance

User Guidance

Troubleshooting

Voice Widget Not Responding

Inaccurate Responses

Access Issues

Conclusion

The AI Voice Agent is a powerful preview feature that brings voice-based interaction to your OKChat AI chatbot. By customizing the voice widget and configuring the settings, you can create a seamless and engaging experience for your users. For access to this feature, reach out to [email protected].

For further assistance, refer to the OKChat AI support resources or contact our support team.
