5 Ways to Subtitle and Caption Videos Automatically Using Speech Recognition

Adding subtitles to your videos is easier than it may seem, and brings all kinds of great benefits. In this tutorial we cover five ways to use speech recognition software to automatically add subtitles or captions to a video.

We’ll show you how to use the leading artificial intelligence technologies. Read on to find out more!

1. Add Automatic Subtitles in Premiere Pro (Subscription)

Learning how to auto-generate captions in Premiere Pro is pretty easy. We have a full step-by-step tutorial on speech recognition in Premiere Pro, or watch the video below to learn how with Tom:

2. YouTube’s Automatic Captioning (Free)

Creating YouTube auto generated subtitles is easy. Automatic transcription is built directly into the service, and can be edited in the Video Manager.

Although the automated closed captioning software does seem to be continually improving, YouTube auto generated subtitles are notoriously (and sometimes hilariously) imperfect. Thankfully, you have the option to manually adjust them.

How to Correct the YouTube Auto Generated Subtitles

Select the Subtitles & CC option to view the subtitle settings, and then click on the subtitle file you want to alter. In my case, that is English (Automatic).

Now, press the Edit button to begin adjusting them. On the left side, you’ll see the transcription that YouTube automatically generated. You can type over any of the incorrect subtitles to correct them. Use the embedded video player to work your way through the video and intervene as needed.

Click Publish Edits when you’re finished to save the corrected subtitles.

To export and download a subtitle file, click the Actions dropdown and select the file type you need. SRT is a widely supported format (Facebook video accepts it, for example). Google makes it tricky to save the subtitles from other people’s videos, but there is a workaround.
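
If you have never opened an SRT file, it helps to see how simple the format is: a running number, a start and end time written as HH:MM:SS,mmm, the caption text, and a blank line. The Python sketch below builds that layout from a few cues. The function names and the sample cues are made up for illustration and are not part of any YouTube tool.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Each block already ends with a newline, so joining with one more
    # newline leaves the blank line that separates SRT entries.
    return "\n".join(blocks)


# Hypothetical cues, for illustration only.
example_cues = [
    (0.0, 2.5, "Welcome to the tutorial."),
    (2.5, 6.0, "Today we are adding subtitles."),
]
srt_text = to_srt(example_cues)
print(srt_text)
```

Saving the printed text with a .srt extension gives you a file in the same layout as the one YouTube exports.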

There are other problems with YouTube’s service. For one, there is no way to collaborate on the subtitles. Or maybe your videos are for a course you’re producing and you don’t want to put your files on YouTube.

Or maybe you are Irish and YouTube just completely mangles your accent. There are plenty of reasons why YouTube isn’t always the right tool for the job. Let’s look at how to generate subtitles and captions for any video.

3. Watson + Amara (Free/Paid)

You might have heard about IBM’s Watson supercomputer when it defeated Jeopardy! champions in 2011. IBM have put Watson (now watsonx) technology to use processing all kinds of data, including speech. Simply upload an audio file to the speech-to-text demo and it will transcribe the audio for you.

Go ahead and jump to the app. Don’t be fooled: the demo has plenty of functionality. I uploaded a six-minute audio clip and it was transcribed in just a few minutes.

I recommend transcription as a finishing step in editing a video. Lock down your edit, then export a WAV audio file from your video editor.

If your app doesn’t support exporting directly to WAV, I recommend you use software like Audacity to convert from other audio formats to WAV.
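
Before uploading, it can be worth confirming that the export really is a plain WAV file and checking how long it runs. Here is a minimal sketch using Python’s standard `wave` module, which reads uncompressed PCM WAV files. The helper names and the generated demo file are assumptions for illustration; point `describe_wav` at your own export instead.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic properties of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_seconds": frames / rate,
        }


def write_silent_wav(path, seconds=1, sample_rate=16000):
    """Write a mono, 16-bit silent WAV file; used here only as demo input."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * (sample_rate * seconds))


# Demo with a generated file; swap in the path to your own export.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silent_wav(demo_path)
info = describe_wav(demo_path)
print(info)
```

If `wave.open` raises an error, the file is probably not an uncompressed WAV and is worth re-exporting or converting first.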

To upload audio, click on Select Audio File on the app page. Browse to your WAV file and choose it. Watson also supports Spanish, French, Portuguese and Japanese, so make sure to choose the language from the dropdown if it differs from English.

Note: If you use an ad blocking plugin for your web browser, make sure to disable it for the IBM website. The demo might not function correctly if adblock is enabled.

Once you’ve selected an audio file, the Watson technology will go to work on transcribing it.

A few minutes later, you’ll have a transcription of the audio. At this stage, don’t worry about correcting all of the incorrect words in the transcription. Copy the text from the transcription box and save it into a text file, using an app like Notepad on Windows or TextEdit on Mac.
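
A raw transcript often arrives as one long block of text. If you would rather review it in caption-sized lines, a few lines of Python can break it up before you save it. This is only a sketch: the 42-character limit is a common rule of thumb rather than a requirement of any tool mentioned here, and the function names and sample text are hypothetical.

```python
import os
import tempfile
import textwrap
from pathlib import Path


def split_transcript(text, max_chars=42):
    """Collapse whitespace and wrap the transcript into short lines."""
    normalized = " ".join(text.split())
    return textwrap.wrap(
        normalized,
        width=max_chars,
        break_long_words=False,  # never cut a word in half
        break_on_hyphens=False,  # keep hyphenated words together
    )


def save_transcript(lines, path):
    """Write one caption-sized line per row as UTF-8 text."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical transcript text, for illustration only.
raw = (
    "adding subtitles to your videos is easier than it may seem   "
    "and brings all kinds of great benefits for your audience"
)
lines = split_transcript(raw)
out_path = os.path.join(tempfile.mkdtemp(), "transcript.txt")
save_transcript(lines, out_path)
print("\n".join(lines))
```

The words themselves are left untouched, so you can still correct mistakes later during the sync step.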

If you are a capable programmer, IBM offers a much more in-depth way to use the Watson technology. Check out IBM’s tutorial on how to use the full Watson capabilities programmatically.
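
To give a flavor of what the programmatic route makes possible, here is a hedged Python sketch that turns a speech-to-text response into timed caption cues. It makes no network calls. The sample dictionary mimics the general shape I understand the service to return when word timestamps are requested (a `results` list, each with `alternatives` holding `[word, start, end]` entries under `timestamps`); treat that shape, the sample data, and the function names as assumptions and check them against IBM’s current API reference.

```python
def words_from_response(response):
    """Flatten (word, start, end) timings from a response dictionary."""
    words = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        best = alternatives[0]  # assumed: the first alternative is the best guess
        for word, start, end in best.get("timestamps", []):
            words.append((word, float(start), float(end)))
    return words


def group_into_cues(words, max_words=7):
    """Group consecutive words into (start, end, text) caption cues."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(word for word, _, _ in chunk)
        cues.append((chunk[0][1], chunk[-1][2], text))
    return cues


# Hypothetical response, shaped like the assumed format described above.
sample_response = {
    "results": [
        {
            "final": True,
            "alternatives": [
                {
                    "transcript": "adding subtitles is easy ",
                    "timestamps": [
                        ["adding", 0.0, 0.4],
                        ["subtitles", 0.4, 1.1],
                        ["is", 1.1, 1.3],
                        ["easy", 1.3, 1.8],
                    ],
                }
            ],
        }
    ]
}

words = words_from_response(sample_response)
cues = group_into_cues(words, max_words=2)
print(cues)
```

Grouping by a fixed word count is the simplest possible rule; splitting on pauses between words would likely give more natural captions.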

Use Amara to Create a Finished Subtitle File From Watson

Now you have a rough transcription of your audio. Next you’ll synchronize it with the video.

For this step we’ll use Amara, a service that’s designed to do just that. Amara can help us generate a finished subtitle file that can be used on many different services. Jump over to Amara’s site and sign up for a free account to get started.

After you’ve signed up for Amara, you’ll need to link to your video file. Amara doesn’t support uploading video directly to its site, so you’ll need to get the file online first. I have an FTP server, so I’ll often upload the video to my server and link to it.

If you don’t have a web server to use, you can upload to Vimeo, for example, and use it as a temporary host.

Amara generates captions for talking videos.

Paste in a link to your video and select the spoken language from the list. Hit Add to Amara Public, and Amara will load the video into its subtitling system. Now it’s time to upload the transcription Watson generated for us.

Click on Add/Edit subtitles on the left side of the screen to upload the text file to Amara.

Upload the text file that we created using Watson.

Amara will take you to the editing screen. Click on the Subtitle tools menu and select Upload subtitles from the dropdown list. Browse and select the Watson text file.

Press Upload.

Once you press Upload, Amara will load the generated text. Go back to the editing screen and start tweaking your subtitles.

Check out the video below to walk through the time sync process:

You could create subtitles using Amara alone, but that requires typing the captions from scratch. IBM Watson gives us a great starting point, and pairs well with Amara to get the timing right.

One of the perks of using Amara is that it’s great for collaboration and translation. If you have a skilled translator, they can just as easily link to your video and generate subtitles in a different language.

4. Descript (Subscription)

Descript is a tool with a refreshing approach to editing audio and creating transcripts, all in one package. It features a really slick interface and desktop apps. The service generates automatic captions with AI.

A free account lets you get started with text-based editing and use of the AI tool for video subtitles. I created an account, then opened an audio file in the app.

Once Descript processes your audio, you’ll see a rock-solid transcription in a document.

Descript’s automatic captions with AI are as accurate as those of any tool I’ve tried. Here’s the magic: your transcribed audio appears just like a document. You can scroll and review your take easily using the efficient interface.

But while you’re reviewing (and adjusting) your transcription, you can also edit and adjust your audio on the timeline. And notice those beautiful text annotations on top of the waveform!

On the waveform timeline, notice that Descript tags individual words to the timestamp.

With past tools, it always felt like choosing between a text-centric and an audio-centric workflow. But with Descript, it feels like a “best of both worlds” where you can think in audio and see the text output.

Because Descript has built-in recording tools, you could actually use it as a fully-featured recording center. Imagine using it to create a podcast and export a finished file that includes a ready-to-share transcript.

When you’re finished, you can export a subtitle file or jump directly to the video editing tool of your choice. 

Descript is an interesting AI tool for video subtitles, with a unique and efficient interface to edit video based on the automatically transcribed audio.

5. Kapwing (Subscription)

Kapwing is a browser-based video editor with a speech-to-text subtitling tool that’s very easy to use, and creates automatic translations, too. You can add captions for talking videos easily.

We are testing the results, and so far they are impressive. It’s the newest entry on this list, but probably the best option if you need to turn voice to captions and subtitles for social media videos, as it integrates well with online services.

Accessibility is Good for Everyone

With the rise of multiple online video platforms and the spread of video-capable smartphones, we’re now producing and using more and more video every day. This swell of video creates three new access problems:

  1. How do we find the videos we want in a crowded world?
  2. How do you watch all these new videos if you have a hearing impairment?
  3. How do you follow along with video that isn’t in your first language, as is now an everyday reality for millions of people?

Subtitles and captions solve all of these problems.

Search engines have no idea what’s in your videos. This is slowly changing with the rise of algorithms that can interpret images but, fundamentally, search engines are built to read text. The best way to tell the world what’s in your video is still to describe it.

Subtitles and closed captions provide exactly the kind of juicy text information that Google and the rest love to have.

As many as 15 percent of Americans have a hearing impairment. Closed captions, or written text overlaid on a video that replicates what the speaker is saying, are a key accessibility tool. That’s why automated closed captioning software is more relevant than ever.

Accessibility is essential in all kinds of videos, but it’s especially important when teaching with video. That’s why Envato is moving towards providing captions on as many Tuts+ videos as we possibly can.

It’s also worth noting that Facebook videos play automatically, but they play silently by default, so if you want everyone on Facebook to know what people in your videos are saying you really need captions or subtitles.

Subtitles are frequently used as a way to translate language from one medium to another (such as spoken English to written English) or one language to another. They’re a great way to make video more accessible to diverse linguistic audiences.

Learn Video Editing

We’ve built a complete guide to help you learn how to edit videos, whether you’re just getting started with the basics or you want to master video editing and post-production.

Also check out the time-saving video templates on Envato Elements, learn how to plan your video projects, and discover the best video editing software. More resources below:

And if you’re looking for some free B-roll to use in your next video, try the huge collection on Mixkit. Browse the growing library of free HD video, and sign up to see the brand new videos that are added each week.


This content originally appeared on Envato Tuts+ Tutorials and was authored by Andrew Childress

Andrew Childress | Sciencx (2016-07-08T23:38:41+00:00) 5 Ways to Subtitle and Caption Videos Automatically Using Speech Recognition. Retrieved from https://www.scien.cx/2016/07/08/5-ways-to-subtitle-and-caption-videos-automatically-using-speech-recognition/
