Deepgram Text to Speech API JavaScript: A Comprehensive Guide

This guide covers how to use the Deepgram text-to-speech JavaScript SDK for converting text to real-time audio, customizing output, and integrating live transcription.


Integrating text-to-speech (TTS) into your JavaScript applications has never been easier, thanks to Deepgram’s API. With its ultra-low latency and real-time capabilities, it’s perfect for various use cases like live narration, interactive voice response systems, and more. In this article, we’ll walk you through the key steps for setting up Deepgram’s JavaScript SDK to convert text into lifelike speech. We’ll also cover everything from API key creation to handling audio streams in real time.

1. Installing the SDK

To get started, you’ll need to install Deepgram’s SDK. This can be done easily using npm:

npm install @deepgram/sdk

This package installs everything you need to start making TTS requests in JavaScript. You can find more details in the Deepgram documentation.

2. Creating API Keys

Before you can interact with the Deepgram API, you’ll need an API key for authentication. To generate one, create an account on Deepgram’s website, navigate to your dashboard, and generate a key. This key is crucial for all API requests.

For example:

const deepgram_api_key = 'your_deepgram_api_key';

You’ll use this key when initializing the SDK to authenticate your requests.
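In practice, avoid hardcoding the key in source files. A common pattern, sketched here, is to read it from an environment variable so it never lands in version control:

// Load the key from the environment (e.g. set DEEPGRAM_API_KEY in your shell or .env)
const deepgram_api_key = process.env.DEEPGRAM_API_KEY;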

3. Basic TTS Requests

With the SDK installed and your API key ready, you can make a basic TTS request. The Deepgram API converts your text input into speech. Here’s a simple example using the current (V3) SDK:

const { createClient } = require('@deepgram/sdk');

const deepgram = createClient(deepgram_api_key);

const inputText = 'Hello, this is Deepgram Text to Speech!';
const model = 'aura-asteria-en'; // Voice selection

deepgram.speak
  .request({ text: inputText }, { model })
  .then((response) => response.getStream())
  .then((stream) => {
    console.log('Audio stream received:', stream);
  });

In this example, aura-asteria-en is used for voice selection. Each voice model has its own tone and pacing, so choose the one that best fits your use case.
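To save the synthesized speech rather than just logging it, you can drain the stream into a file. This is a minimal sketch, assuming the V3 response exposes a web ReadableStream via getStream():

const fs = require('fs');

// Drain the web ReadableStream returned by getStream() into a local audio file
async function saveAudio(response, path) {
  const stream = await response.getStream();
  const reader = stream.getReader();
  const chunks = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(Buffer.from(value));
  }
  fs.writeFileSync(path, Buffer.concat(chunks));
}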


4. Streaming TTS for Real-Time Applications

For applications requiring real-time conversion, like live broadcasting or interactive voice agents, Deepgram’s WebSocket streaming API is a great fit. You can send text to the server and receive real-time audio.

Here’s how you might set up WebSocket streaming for real-time audio:

const WebSocket = require('ws');

const ws = new WebSocket('wss://api.deepgram.com/v1/speak?model=aura-asteria-en', {
  headers: { Authorization: `Token ${deepgram_api_key}` },
});

ws.on('open', function open() {
  ws.send(JSON.stringify({ type: 'Speak', text: 'Hello from Deepgram!' }));
  // Ask the server to synthesize everything buffered so far
  ws.send(JSON.stringify({ type: 'Flush' }));
});

ws.on('message', function incoming(data) {
  console.log('Audio data received:', data);
});

WebSockets enable you to send streaming text and receive continuous audio as speech.
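In practice, the server interleaves binary audio frames with JSON control messages, so you’ll want to tell them apart. A small sketch using the ws library’s isBinary flag (the handling below is illustrative; check the API reference for the full set of message types):

const audioChunks = [];

ws.on('message', (data, isBinary) => {
  if (isBinary) {
    // Raw audio bytes: buffer them or pipe them to a player
    audioChunks.push(data);
  } else {
    // JSON control/metadata messages from the server
    console.log('Server message:', JSON.parse(data.toString()));
  }
});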

5. Output Settings: Customizing Your Audio

Deepgram’s API provides flexibility to customize the audio output. You can tweak settings like the encoding, sample rate, and container format; for instance, you can request WAV output or other formats depending on your requirements.

Example:

{
  "encoding": "linear16",
  "sample_rate": 48000,
  "container": "wav"
}

You can define these properties when you send your API request to match the needs of your project.
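Through the V3 SDK, these settings map onto the options argument of speak.request. A minimal sketch, assuming the model and options shown above:

const response = await deepgram.speak.request(
  { text: 'Customized audio output' },
  {
    model: 'aura-asteria-en',
    encoding: 'linear16', // raw 16-bit PCM
    sample_rate: 48000,
    container: 'wav', // wrap the PCM in a WAV container
  }
);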

6. Managing API Requests: REST vs WebSocket

Deepgram offers two primary methods for interacting with the API:

  1. REST API for batch requests, perfect for one-shot conversions of a complete block of text into speech.
  2. WebSocket API for live, real-time applications.

The method you choose depends on your specific use case. REST is ideal for simple conversions, while WebSockets are best suited for real-time streaming needs.

7. Voice Selection

Deepgram offers multiple voices to choose from, each suited for different applications. You select a voice by passing the corresponding model name, such as aura-asteria-en; each voice carries its own tone and pacing.

const options = {
  model: 'aura-asteria-en', // Voice selection via the model name
};

Additional Information an Engineer Might Need

Error Handling and Retries

When working with APIs, error handling is essential. For WebSocket-based streaming, you’ll want to manage reconnection strategies and handle timeouts effectively.

ws.on('error', (error) => {
  console.error('WebSocket error:', error);
});
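For reconnection, a common pattern is exponential backoff. A minimal sketch (the retry delays and cap here are illustrative, not part of the SDK):

const WebSocket = require('ws');

function connectWithRetry(url, options, attempt = 0) {
  const ws = new WebSocket(url, options);

  ws.on('error', (error) => {
    console.error('WebSocket error:', error);
  });

  ws.on('close', () => {
    // Exponential backoff, capped at 30 seconds
    const delay = Math.min(1000 * 2 ** attempt, 30000);
    setTimeout(() => connectWithRetry(url, options, attempt + 1), delay);
  });

  return ws;
}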

Environment Configuration

If you’re deploying in browser environments, route requests through a server-side proxy so your API key is never exposed to the client. This is particularly important when working with frontend frameworks like React or Vue.
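A minimal sketch of such a proxy with Express (the route name and port are illustrative; the /v1/speak endpoint and Token auth header are Deepgram’s REST interface):

const express = require('express');

const app = express();
app.use(express.json());

// Browser clients call this route; the Deepgram key never leaves the server.
app.post('/api/tts', async (req, res) => {
  const dgRes = await fetch(
    'https://api.deepgram.com/v1/speak?model=aura-asteria-en&encoding=linear16&container=wav',
    {
      method: 'POST',
      headers: {
        Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ text: req.body.text }),
    }
  );
  res.set('Content-Type', 'audio/wav');
  res.send(Buffer.from(await dgRes.arrayBuffer()));
});

app.listen(3000);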

Multilingual Support

Deepgram’s TTS API supports multiple languages, making it a good fit for multilingual apps. The language is tied to the voice model you select: model names carry a language suffix, so pick a model for your target language when initializing your requests.

const model = 'aura-asteria-en'; // the '-en' suffix denotes an English voice

Common Use Cases for Deepgram TTS in JavaScript

  1. Live Transcriptions: Use the Deepgram SDK for real-time transcription and TTS to provide dynamic spoken feedback in applications.
  2. Interactive Apps: Combine TTS with speech-to-text (STT) for voice-driven apps, like personal assistants or customer service bots (see the sketch after this list).
  3. Transcribing Meetings: Convert meeting notes into audio files or use real-time transcription for live meeting platforms.
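As a rough sketch of the second pattern, assuming the V3 SDK with the nova-2 and aura-asteria-en models, you might transcribe a user’s audio and speak a reply:

const { createClient } = require('@deepgram/sdk');
const fs = require('fs');

const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

async function voiceRoundTrip(audioPath) {
  // 1. Speech-to-text on the user's audio
  const { result, error } = await deepgram.listen.prerecorded.transcribeFile(
    fs.createReadStream(audioPath),
    { model: 'nova-2' }
  );
  if (error) throw error;

  const userText = result.results.channels[0].alternatives[0].transcript;

  // 2. Build a reply (real application logic goes here)
  const reply = `You said: ${userText}`;

  // 3. Text-to-speech on the reply
  const response = await deepgram.speak.request({ text: reply }, { model: 'aura-asteria-en' });
  return response.getStream();
}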

With Deepgram’s powerful JavaScript SDK, integrating text-to-speech functionality into your web or mobile apps is simple and flexible. From handling real-time streaming audio to batch processing complete text, the platform offers robust solutions tailored to a wide range of applications. By following the steps outlined above, you can bring speech-to-text and TTS capabilities to life in your projects.

Deepgram JavaScript SDK Migration Notes: From V2 to V3

If you’re migrating from Deepgram Node SDK V2 to the new V3 of the JavaScript SDK, there are several key changes and improvements you’ll need to be aware of. This guide will help you understand the updates and how to adjust your code accordingly.

Key Changes and Updates

New Initialization Approach

– In V3, you no longer use the Deepgram class for initialization. Instead, you’ll use the createClient function to initialize the SDK. This change simplifies the SDK and makes it more versatile.

Before (V2):

const { Deepgram } = require("@deepgram/sdk");
const deepgram = new Deepgram(DEEPGRAM_API_KEY);

After (V3):

const { createClient } = require("@deepgram/sdk");
const deepgram = createClient(DEEPGRAM_API_KEY);

Async and Sync Transcription Methods

– In V3, transcription methods are split into synchronous and asynchronous variants. Instead of passing a callback option to a single method, V3 provides dedicated callback methods for asynchronous requests, which makes the code clearer when handling different transcription scenarios.
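For example, callback-based (asynchronous) transcription in V3 goes through a dedicated method together with the SDK’s CallbackUrl helper; the webhook URL below is illustrative:

const { createClient, CallbackUrl } = require("@deepgram/sdk");

const deepgram = createClient(DEEPGRAM_API_KEY);

// Results are POSTed to the callback URL instead of being returned inline
const { result, error } = await deepgram.listen.prerecorded.transcribeUrlCallback(
  { url: "https://example.com/audio.wav" },
  new CallbackUrl("https://example.com/webhook"),
  { model: "nova-2" }
);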

Scoped Configuration

– V3 introduces scoped configurations, allowing you to define settings (like URLs or headers) at a global level or for specific namespaces such as transcribe or listen. This is particularly useful when managing different environments (e.g., local vs. production).

Example:

const deepgram = createClient(DEEPGRAM_API_KEY, {
  global: { url: "http://localhost:8080" },
});

UMD and ESM Support

– V3 introduces broader compatibility with both UMD (Universal Module Definition) and ESM (ECMAScript Modules) formats, making it easier to use in various environments, including Node.js and browsers. You can now directly import Deepgram from a CDN using <script> tags.
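For instance, with ESM the import becomes a one-liner; this sketch assumes Node running with "type": "module" or a bundler:

// ESM import (Node with "type": "module", or a bundled browser app)
import { createClient } from "@deepgram/sdk";

const deepgram = createClient(DEEPGRAM_API_KEY);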

Switch from request to fetch

– The SDK now uses fetch for making HTTP requests instead of request, aligning with modern JavaScript standards.

Live Transcription Enhancements

– The live transcription events have been improved in V3, allowing for better handling of real-time data streams, which is crucial for applications involving real-time speech-to-text.
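A minimal sketch of V3 live transcription, using the SDK’s LiveTranscriptionEvents and assuming the nova-2 model:

const { createClient, LiveTranscriptionEvents } = require("@deepgram/sdk");

const deepgram = createClient(DEEPGRAM_API_KEY);
const connection = deepgram.listen.live({ model: "nova-2" });

connection.on(LiveTranscriptionEvents.Open, () => {
  console.log("Live connection open");
  // Send audio chunks with connection.send(...) once open
});

connection.on(LiveTranscriptionEvents.Transcript, (data) => {
  console.log(data.channel.alternatives[0].transcript);
});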

Error Handling Improvements

– Error messages and handling have been significantly improved in V3. This means you’ll receive more informative error messages, making debugging easier and improving overall robustness.
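In particular, V3 request methods return a { result, error } pair, so failures can be checked explicitly rather than caught blindly:

const { result, error } = await deepgram.listen.prerecorded.transcribeUrl(
  { url: "https://example.com/audio.wav" },
  { model: "nova-2" }
);

if (error) {
  console.error("Transcription failed:", error);
} else {
  console.log(result.results.channels[0].alternatives[0].transcript);
}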

Migration Steps

Installation

To update to V3, you can install the latest version of the SDK via npm:

npm install @deepgram/sdk

Transcription of Files

– The method for transcribing local or remote files has also changed. Here’s an example of how you would transcribe a file in V3:

Before (V2):

const response = await deepgram.transcription.preRecorded({
  stream: fs.createReadStream("./audio.wav"),
  mimetype: "audio/wav",
});

After (V3):

const { result, error } = await deepgram.listen.prerecorded.transcribeFile(
  fs.createReadStream("./audio.wav"),
  { model: "nova-2" }
);

URL Transcriptions

– Similarly, URL-based transcriptions have changed:

Before (V2):

const response = await deepgram.transcription.preRecorded({
  url: "https://example.com/audio.wav",
});

After (V3):

const { result, error } = await deepgram.listen.prerecorded.transcribeUrl({
  url: "https://example.com/audio.wav",
});

By following these migration steps, you’ll be able to upgrade your application to use the latest features and improvements in Deepgram’s JavaScript SDK V3.

For more detailed guidance, check out the official Deepgram SDK V2 to V3 Migration Guide.

Ready to start building? Explore the full capabilities of the SDK in the Deepgram API reference.
