Tech Corner 3rd Q – 2018

Digital cameras and cheap mass storage have led to far more media being shot during production. When a director doesn’t have to turn off the cameras (two or three for every setup), they don’t. That way they avoid hair, makeup, wardrobe and personal assistants swarming the set and delaying the next take. How editorial handles this mass of material is evolving, albeit slowly. One problem is finding dialogue that you know exists, but not exactly where. These snippets of dialogue come from several sources: an ad-libbed line the director remembers, the script or a transcript of an interview.

PhraseFind

Avid Media Composer has two types of searches in the Find tool. The most basic is Find, where you type in a clip name, click on ‘Find,’ and the results window will list the located clips.

Or you type in text and search the project’s audio with PhraseFind. According to the Avid website: “PhraseFind automatically analyzes all clips in your project and phonetically indexes all audible [dialogue]…” The underlying technology is licensed from Nexidia. According to that company’s website: “Nexidia pioneered the field of neural phonetic speech analytics.” In other words, its technology analyzes speech in order to phonetically index and search audio, in many languages.

PhraseFind is often conflated with Avid’s ScriptSync because both use the same underlying phonetic-index technology. PhraseFind searches for dialogue. ScriptSync matches specific dialogue takes with text in a document or script; the Nexidia engine is tasked with automatically matching dialogue to text. The editor can then open that script and see markers which, when clicked, park the source monitor on that clip, at that line of dialogue. Most assistants I’ve talked with say the automatic matching in ScriptSync is too inaccurate to be relied on, so they do the matching manually.

Thus, ScriptSync can work without Nexidia. PhraseFind can’t.

I have used PhraseFind mostly on a documentary. The Lafayette Escadrille tells the story of several Americans who went to France during the early part of World War I, before the United States entered the war, and became pilots in an all-American air unit. From the latest script, here is a quote from an interview that I needed to locate. (Although there is a tape number and timecode for this quote, the clip name is missing. The tape number comes from the audio transcripts of the interview, so I don’t know where this clip resides.)

The top three results, all scoring 98, point to the same piece of dialogue in three different clips. Lower-scored results contain similar-sounding audio but are less exact matches, which means they could be similar phrases rather than the one searched for. At some point, as the score drops, the match isn’t close at all.
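The idea of trusting only the high-scoring results can be sketched in a few lines of Python. This is purely an illustration of a score cutoff, not anything PhraseFind exposes: the clip names and scores are made up, the `Hit` type and `confident_hits` function are hypothetical, and the cutoff of 90 is an arbitrary assumption.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One search result: a clip name and its match score (0-100)."""
    clip: str
    score: int


def confident_hits(hits: List[Hit], min_score: int = 90) -> List[Hit]:
    """Keep hits at or above min_score, highest score first.

    The default cutoff is an assumption for illustration only.
    """
    kept = [h for h in hits if h.score >= min_score]
    # sorted() is stable, so hits with equal scores keep their original order.
    return sorted(kept, key=lambda h: h.score, reverse=True)


# Made-up results: three clips with the same dialogue, plus two weaker matches.
hits = [
    Hit("clip_A", 98),
    Hit("clip_D", 71),
    Hit("clip_B", 98),
    Hit("clip_E", 55),
    Hit("clip_C", 98),
]

for hit in confident_hits(hits):
    print(hit.clip, hit.score)
```

In practice the useful cutoff depends on the material, so the editor still has to audition the results near the borderline.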

The Find window is independent of the application, meaning you can continue other tasks in Media Composer while Find performs a search.

Double-clicking the subclip, NARAYAN, opens that clip’s bin and loads the clip into the source monitor, parked right before the searched-for audio.

In settings, you can specify how far ahead of the ‘audio start’ the playback bar parks.

PhraseFind has been a great time saver in finding clips. It is also helpful in finding words from certain speakers to repair audio, for example replacing a single word at the end of a sentence where the speaker runs into another word.

But what if you don’t have a transcript or an accurate script, and the director remembers a certain ad-libbed line of dialogue? PhraseFind can be useful. But I experimented with another idea: creating transcripts with speech-to-text applications.

Dragon NaturallySpeaking is a highly regarded application for converting speech to text. Typically the source is a speaker talking into a microphone, as in dictation, and part of Dragon’s setup is training it on the speaker’s voice. At $300 for the Mac version, the cost is a little steep.

Through an Avid editors’ online forum I discovered Descript. The Descript application (at descript.com) is a transcription service.

The application is free. It takes an audio file and transcribes the speech to text using Google Speech as the underlying technology. The first 30 minutes of transcription are free. After that you are charged $0.15 per minute. A second tier costs $0.07 per minute plus a $10 monthly charge. The highest tier, which they call White Glove, uses human transcribers at $1.00 per minute.
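To see when the monthly tier starts to pay off, the rates quoted above can be compared with a little arithmetic. The sketch below is only an illustration: the function names are made up, and it assumes the 30 free minutes apply once, to pay-as-you-go use, which the pricing description does not spell out. Prices are kept in cents so the arithmetic stays exact.

```python
# Rates as quoted in the article, in cents to avoid floating-point rounding.
PAY_AS_YOU_GO_CENTS = 15     # $0.15 per minute
MONTHLY_RATE_CENTS = 7       # $0.07 per minute on the monthly tier
MONTHLY_FEE_CENTS = 1000     # $10 monthly charge


def pay_as_you_go_cost(minutes: int, free_minutes: int = 0) -> float:
    """Dollar cost at $0.15/minute after subtracting any free minutes."""
    billable = max(0, minutes - free_minutes)
    return billable * PAY_AS_YOU_GO_CENTS / 100


def monthly_tier_cost(minutes: int) -> float:
    """Dollar cost for one month: $10 fee plus $0.07/minute."""
    return (MONTHLY_FEE_CENTS + minutes * MONTHLY_RATE_CENTS) / 100


def break_even_minutes() -> float:
    """Minutes per month at which both tiers cost the same (no free minutes)."""
    return MONTHLY_FEE_CENTS / (PAY_AS_YOU_GO_CENTS - MONTHLY_RATE_CENTS)


print(break_even_minutes())        # minutes per month where the tiers meet
print(pay_as_you_go_cost(200))     # 200 minutes at $0.15
print(monthly_tier_cost(200))      # 200 minutes at $0.07 plus the fee
```

Under these assumptions the two tiers cost the same at 125 minutes a month, so anything past roughly two hours of audio a month favors the $10 tier.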

I tested the two applications with a short clip of audio from the same documentary.

With a short narration clip, Dragon produced the following text.

The transcription is pretty good. But it missed the names ‘William Thaw’ and ‘Bert Hall.’ It also incorrectly transcribed ‘merging’ and the ‘battle of the Martin.’ And it rendered ‘Chemin des Dames’ as ‘shim to Dom,’ but it’s hardly fair to expect it to transcribe French.

And with Descript:

Descript has its own text processor, so corrections can be made directly in the app. It also carries the analyzed sound file, highlighting the text as the file plays. The sound file itself can be edited to correct mistakes, which would be useful if you were editing a podcast.

It didn’t get ‘Chemin des Dames.’ But otherwise it was accurate. Descript got the names ‘William Thaw’ and ‘Bert Hall’ correct, as well as ‘Battle of the Marne.’

Have you ever used Google Voice as your cell phone’s answering machine? It will provide a text version of your phone message, but it is very often complete gibberish. How can Google Speech be so good?

The biggest concern with a service that uses the web to analyze audio is security. It may be possible to mostly remain anonymous, but who knows.

Another option for analyzing/searching audio for text is Soundbite, a stand-alone program from Boris FX, the developers of Continuum Complete, and sellers of mocha Pro tracking software and the VFX plug-in, Sapphire. The complete review appeared in this space in 2016 (“Two Tools,” CinemaEditor magazine, Q1, 2016).

Soundbite also uses the Nexidia technology. As a standalone program, it analyzes and searches QuickTime files, so it can basically work as PhraseFind for Adobe Premiere Pro or Apple’s Final Cut Pro.

I’m too far into my current film project to use Descript to transcribe dialogue. I plan on trying it in the future, as well as using PhraseFind. Anyone need a $300 copy of Dragon NaturallySpeaking?

Thanks to the producers of The Lafayette Escadrille for the use of materials and images from that film.