Integrating text-to-speech (TTS) into your JavaScript applications has never been easier, thanks to Deepgram’s API. With its ultra-low latency and real-time capabilities, it’s perfect for various use cases like live narration, interactive voice response systems, and more. In this article, we’ll walk you through the key steps for setting up Deepgram’s JavaScript SDK to convert text into lifelike speech. We’ll also cover everything from API key creation to handling audio streams in real time.
To get started, you’ll need to install Deepgram’s SDK. This can be done easily using npm:
npm install @deepgram/sdk
This package installs everything you need to start making TTS requests in JavaScript. You can find more details in the Deepgram documentation.
Before you can interact with the Deepgram API, you’ll need an API key for authentication. To generate one, create an account on Deepgram’s website, navigate to your dashboard, and generate a key. This key is crucial for all API requests.
For example:
const deepgram_api_key = 'your_deepgram_api_key';
You’ll use this key when initializing the SDK to authenticate your requests.
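Rather than hard-coding the key in source, a common pattern is to read it from an environment variable. The helper below is a minimal sketch; the variable name DEEPGRAM_API_KEY is a convention chosen for this example, not a requirement:

```javascript
// Read the Deepgram API key from the environment instead of hard-coding it.
// DEEPGRAM_API_KEY is a conventional variable name used for this sketch.
function getDeepgramApiKey(env = process.env) {
  const key = env.DEEPGRAM_API_KEY;
  if (!key) {
    throw new Error('DEEPGRAM_API_KEY is not set');
  }
  return key;
}
```

This keeps the key out of version control and lets you swap keys per environment without touching code.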
With the SDK installed and your API key ready, you can make a basic TTS request. The Deepgram API converts your text input into speech. Here’s a simple example:
const { createClient } = require('@deepgram/sdk');
const deepgram = createClient(deepgram_api_key);

const inputText = 'Hello, this is Deepgram Text to Speech!';
const model = 'aura-asteria-en'; // Voice selection

deepgram.speak
  .request({ text: inputText }, { model })
  .then(async (response) => {
    const stream = await response.getStream();
    console.log('Audio stream received:', stream);
  });
In this example, aura-asteria-en is used for voice selection. You can adjust the speed, tone, and other attributes based on your use case.
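As a sketch of how you might centralize those attributes, the helper below assembles a request payload from text plus optional overrides. The speed field here is illustrative, not a confirmed parameter name; check the API reference for the exact options:

```javascript
// Build a TTS request payload; model defaults to aura-asteria-en, and the
// optional speed field is an illustrative attribute, not a confirmed name.
function buildTtsRequest(text, { model = 'aura-asteria-en', speed } = {}) {
  const payload = { text, model };
  if (speed !== undefined) {
    payload.speed = speed;
  }
  return payload;
}
```

Centralizing the payload construction like this makes it easy to keep voice settings consistent across requests.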
For applications requiring real-time conversion, like live broadcasting or interactive voice agents, Deepgram’s WebSocket streaming API is a great fit. You can send text to the server and receive real-time audio.
Here’s how you might set up WebSocket streaming for real-time audio:
const WebSocket = require('ws'); // Node.js WebSocket client

const ws = new WebSocket('wss://api.deepgram.com/v1/speak', {
  headers: { Authorization: `Token ${deepgram_api_key}` }
});

ws.on('open', function open() {
  ws.send(JSON.stringify({ type: 'Speak', text: 'Hello from Deepgram!' }));
});

ws.on('message', function incoming(data) {
  console.log('Audio data received:', data);
});
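If you stream longer content, it helps to factor the message construction out of the send call. This sketch serializes text into the { type: 'Speak', text } message shape used in the example above, splitting a paragraph into sentence-sized chunks:

```javascript
// Serialize one text chunk into the JSON message shape from the streaming
// example, and split a longer string into sentence-sized chunks to send.
function speakMessage(text) {
  return JSON.stringify({ type: 'Speak', text });
}

function toSpeakMessages(paragraph) {
  return paragraph
    .split(/(?<=[.!?])\s+/)          // split after sentence punctuation
    .filter((sentence) => sentence.length > 0)
    .map(speakMessage);
}
```

You could then iterate over toSpeakMessages(longText) and ws.send each message, keeping individual payloads small.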
WebSockets enable you to send streaming text and receive continuous audio as speech.
Deepgram’s API provides flexibility to customize the audio output settings. You can tweak settings like encoding, sample rate, and the number of audio channels. For instance, you can specify wav or other formats depending on your requirements.
Example:
{
  "encoding": "wav",
  "sample_rate": 48000,
  "channels": 1
}
You can define these properties when you send your API request to match the needs of your project.
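Assuming these settings travel as query parameters on the request URL (a common convention for REST TTS endpoints; verify against the API reference), a small helper can serialize them:

```javascript
// Serialize audio output settings into a query string. The parameter names
// mirror the JSON example above; the query-string transport is an assumption.
function buildAudioQuery({ encoding, sample_rate, channels } = {}) {
  const params = new URLSearchParams();
  if (encoding) params.set('encoding', encoding);
  if (sample_rate) params.set('sample_rate', String(sample_rate));
  if (channels) params.set('channels', String(channels));
  return params.toString();
}
```

You would append the returned string after a ? on the endpoint URL.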
Deepgram offers two primary methods for interacting with the API: a REST API for one-off requests and a WebSocket API for streaming. The method you choose depends on your specific use case. REST is ideal for simple conversions, while WebSockets are best suited for real-time streaming needs.
Deepgram offers multiple voices to choose from, each suited for different applications. In addition to selecting the appropriate voice model like aura-asteria-en, you can also adjust parameters such as speed and tone for enhanced customization.
const options = {
  voice: 'aura-asteria-en',
  speed: 1.2, // Adjust speed
};
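One way to keep such options manageable is to merge caller overrides over a set of defaults, so callers only specify what they change. A minimal sketch following the option names above:

```javascript
// Merge user-supplied voice options over defaults; option names follow the
// snippet above and are illustrative.
const defaultVoiceOptions = { voice: 'aura-asteria-en', speed: 1.0 };

function withVoiceOptions(overrides = {}) {
  return { ...defaultVoiceOptions, ...overrides };
}
```

For example, withVoiceOptions({ speed: 1.2 }) keeps the default voice while speeding up playback.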
When working with APIs, error handling is essential. For WebSocket-based streaming, you’ll want to manage reconnection strategies and handle timeouts effectively.
ws.on('error', (error) => {
  console.error('WebSocket error:', error);
});
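For reconnection, exponential backoff is a common strategy: wait a little longer after each failed attempt, up to a cap. A sketch of the delay schedule (the base and cap values are illustrative):

```javascript
// Compute the delay before reconnection attempt `attempt` (0-based),
// doubling each time and capping at maxMs. Base/cap values are illustrative.
function backoffDelay(attempt, baseMs = 500, maxMs = 10000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

You would call backoffDelay inside the error or close handler and schedule a fresh WebSocket connection with setTimeout, resetting the attempt counter once a connection succeeds.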
If you’re deploying in browser environments, ensure that you set up the required proxy configurations to secure your API requests. This is particularly important when working with frontend frameworks like React or Vue.
Deepgram’s TTS API supports multiple languages, making it perfect for multilingual apps. Simply select the appropriate language model when initializing your requests.
const model = 'es-ES'; // Spanish model
With Deepgram’s powerful JavaScript SDK, integrating text-to-speech functionality into your web or mobile apps is simple and flexible. From handling real-time streaming audio to batch processing pre-recorded text, the platform offers robust solutions tailored to a wide range of applications. By following the steps outlined above, you can bring speech-to-text and text-to-speech capabilities to life in your projects.
If you’re migrating from Deepgram Node SDK V2 to the new V3 of the JavaScript SDK, there are several key changes and improvements you’ll need to be aware of. This guide will help you understand the updates and how to adjust your code accordingly.
– In V3, you no longer use the Deepgram class for initialization. Instead, you’ll use the createClient function to initialize the SDK. This change simplifies the SDK and makes it more versatile.
V2:

const { Deepgram } = require("@deepgram/sdk");
const deepgram = new Deepgram(DEEPGRAM_API_KEY);

V3:

const { createClient } = require("@deepgram/sdk");
const deepgram = createClient(DEEPGRAM_API_KEY);
– In V3, transcription methods have been separated into synchronous and asynchronous methods. For instance, instead of using a callback directly in synchronous methods, V3 requires you to use a new method for asynchronous callbacks. This change allows for clearer code structuring when dealing with different transcription scenarios.
– V3 introduces scoped configurations, allowing you to define settings (like URLs or headers) at a global level or for specific namespaces such as transcribe or listen. This is particularly useful when managing different environments (e.g., local vs. production).
Example:

const deepgram = createClient(DEEPGRAM_API_KEY, {
  global: { url: "http://localhost:8080" }
});
– V3 introduces broader compatibility with both UMD (Universal Module Definition) and ESM (ECMAScript Modules) formats, making it easier to use in various environments, including Node.js and browsers. You can now directly import Deepgram from a CDN using <script> tags.
From request to fetch – The SDK now uses fetch for making HTTP requests instead of request, aligning with modern JavaScript standards.
– The live transcription events have been improved in V3, allowing for better handling of real-time data streams, which is crucial for applications involving real-time speech-to-text.
– Error messages and handling have been significantly improved in V3. This means you’ll receive more informative error messages, making debugging easier and improving overall robustness.
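Because V3 calls resolve to a { result, error } pair (as in the transcription examples in this guide), a small guard can centralize the error check. A minimal sketch:

```javascript
// Unwrap a V3-style { result, error } response: throw on error, otherwise
// return the result so callers can use it directly.
function unwrap({ result, error }) {
  if (error) {
    throw new Error(`Deepgram request failed: ${error.message ?? error}`);
  }
  return result;
}
```

This keeps per-call error handling out of your main flow while still surfacing failures loudly.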
To update to V3, you can install the latest version of the SDK via npm:
npm install @deepgram/sdk
– The method for transcribing local or remote files has also changed. Here’s how a local-file transcription compares between versions:

V2:

const response = await deepgram.transcription.preRecorded({
  stream: fs.createReadStream("./audio.wav"),
  mimetype: "audio/wav",
});

V3:

const { result, error } = await deepgram.listen.prerecorded.transcribeFile(
  fs.createReadStream("./audio.wav"),
  { model: "nova-2" }
);
– Similarly, URL-based transcriptions have changed:

V2:

const response = await deepgram.transcription.preRecorded({
  url: "https://example.com/audio.wav",
});

V3:

const { result, error } = await deepgram.listen.prerecorded.transcribeUrl({
  url: "https://example.com/audio.wav",
});
By following these migration steps, you’ll be able to upgrade your application to use the latest features and improvements in Deepgram’s JavaScript SDK V3.
For more detailed guidance, check out the official Deepgram SDK V2 to V3 Migration Guide.
Ready to start building? Explore the full capabilities of the SDK in the Deepgram API reference.