Using ChatGPT and OpenAI’s Whisper to transcribe and translate old family archives
In 2018, I was in Hawaii with my in-laws, and my wife and I took advantage of the trip to stage and record an interview of them talking about their history: their childhood in Thailand, their emigration to the United States, etc. Surprisingly, my wife’s mother had a lot to say and we ended up with more than an hour of interview footage. My wife asked questions in Thai and English, and her parents responded in Thai.
I was behind the camera, but of course could not understand anything. I had planned to have the footage transcribed and translated, but upon doing research back in 2018, I discovered that the prices for such services (resulting in an SRT subtitle file) were astronomical! To be more scrappy, I separated the work into transcription and translation, and set up a spreadsheet with a timecode column to make exporting an SRT from the spreadsheet easier once the work was done. I also hit up friends in Thailand to help find folks to do the work for an hourly rate. I can’t remember the exact numbers, but the final cost was many times lower when split up than it would have been if I had engaged a specialized, all-in-one service.
Tonight, using AI, I transcribed and translated an old, digitized audio tape into a bilingual SRT for almost literally pennies. The recording is from 1979 and is of my parents and me speaking Mandarin to each other while telling / reading stories. I was 3 years old (!). Although Mandarin was my first language, I can’t speak it well today (I was way better as a 3 year old). And as a father of a 5 year old now, it’s fun to hear myself when I was even younger than my son is today.
The quality of Whisper’s transcriptions of suboptimal recordings is amazing, and I expect it and future AI-based services like it to dramatically upgrade the automated captioning we see everywhere (like on YouTube), which is just OK.
For the translation and transcription, I used the following tools:
- OpenAI’s Whisper (via MacWhisper)
- GPT for (Google) Sheets and Docs plugin
- a text editor that supports search and replace (I used Sublime)
I used Adobe Premiere Pro to burn the captions into a video.
Steps, in detail
Digitize
Luckily, I digitized all of our family’s analog media years ago when I was at home for long periods of time during my father’s sickness. For cassette tapes, I used our old family Nakamichi tape player and a USB audio interface (a Blackmagic Design Intensity Shuttle (eBay), I believe). These days, you can get something like a Behringer U-Control UCA222 ($29) to digitize analog audio.
Compress / pre-process audio
I like using The Levelator to bring out dialog, but it’s controversial. Audio people often prefer to have control over compression, but I love this tool!
Transcribe
Open the audio file with MacWhisper. Whisper is resource intensive, so if you don’t have a fast computer, it could take some time to complete. Because there was both Mandarin and English in my recording, I left MacWhisper’s language option as “Auto Detect”. It shows you its progress in real time.
Import into Google Sheets
Export the transcript as a .csv file and import it into Google Sheets. MacWhisper’s .csv export uses semicolons between columns, so when you import, select “Custom” for separator type, and type ‘;’ as the separator.
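If you’d rather script this step than click through the import dialog, here’s a minimal Python sketch that reads a semicolon-separated export. The column names (start, end, text) and the sample rows are my own assumptions for illustration; check the header of your actual export before relying on them.

```python
import csv
import io

def read_transcript_rows(csv_text):
    """Parse semicolon-separated CSV text into a list of row dicts."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    return [dict(row) for row in reader]

# Hypothetical sample data; the real header names depend on the export.
sample = (
    "start;end;text\n"
    "00:01,000;00:03,500;你好\n"
    "00:03,500;00:06,000;Hello there\n"
)

rows = read_transcript_rows(sample)
for row in rows:
    print(row["start"], row["end"], row["text"])
```

The only important part is delimiter=";", which plays the same role as the “Custom” separator setting in Google Sheets.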
Translate using GPT
Install the GPT for (Google) Sheets and Docs plugin, open Google Sheets, and then navigate to Extensions->GPT for Sheets and Docs->Launch Sidebar. You’ll need to paste in an OpenAI API key, which you can get by creating an OpenAI account and giving them your credit card. These transactions are insanely cheap; here’s the pricing page. It costs about a nickel to translate 1000 rows of ~15-word prompts.
Add a column after the transcript and call it “Translation”. Use the plugin’s GPT formula in those cells to create a translation prompt that includes the transcript. Here’s the prompt I used:
=GPT(CONCATENATE("Translate the following to English. Output only the translation. ", C2))
(obviously, replace “C2” with the cell that holds the transcript text in your current row)
Copy and paste to all the cells you need to translate, and the plugin will go do its work.
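The same per-row pattern can be sketched outside the spreadsheet. This is not how the plugin works internally; it’s just the shape of the idea in Python. The ask_model argument is a hypothetical callable (prompt in, reply text out) that you would wire to whatever API client you use, and fake_model is a stand-in so the sketch runs without a network call.

```python
PROMPT_PREFIX = "Translate the following to English. Output only the translation. "

def build_prompt(transcript_line):
    """Mirror the spreadsheet formula: fixed instruction plus the transcript cell."""
    return PROMPT_PREFIX + transcript_line

def translate_rows(lines, ask_model):
    """Translate each transcript line with ask_model, a hypothetical callable
    that takes a prompt string and returns the model's reply text."""
    translations = []
    for line in lines:
        if not line.strip():
            # Skip empty cells instead of paying for an empty prompt.
            translations.append("")
            continue
        translations.append(ask_model(build_prompt(line)).strip())
    return translations

def fake_model(prompt):
    """Stand-in for a real API call; it just echoes the text after the instruction."""
    return "[EN] " + prompt[len(PROMPT_PREFIX):]

print(translate_rows(["你好", "", "Hello there"], fake_model))
```

Passing the model call in as a function keeps the prompt logic testable on its own, and makes it easy to swap providers later.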
Clean up
This isn’t a perfect system, and in some cells, the transcript itself will be wrong because Whisper got it wrong. The call to GPT might also return something conversational, like, “I’m sorry, but the phrase ‘xxx’ couldn’t be translated because ‘yyy’”, and some cells will have time-out errors.
You’ll need to do a bit of manual clean-up.
When I’m done, I like to copy the entire column and paste it back in as plain text (⌘-Shift-V) so the GPT formulas are wiped out. Otherwise, the system will sometimes re-query GPT, which can be annoying.
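If you have a lot of rows, a rough filter can point you at the cells most likely to need attention. Here’s a small sketch in Python; the phrases and error markers in it are guesses on my part, not an official list of what the plugin or GPT returns, so tune them to what you actually see in your sheet.

```python
# Assumed markers; adjust to match the replies and errors in your own sheet.
SUSPECT_PREFIXES = ("i'm sorry", "i am sorry", "sorry,", "as an ai")
ERROR_MARKERS = ("#error", "timed out")

def needs_review(translation):
    """Return True when a translation cell looks empty, apologetic, or errored."""
    text = translation.strip().lower().replace("\u2019", "'")
    if not text:
        return True
    if text.startswith(SUSPECT_PREFIXES):
        return True
    return any(marker in text for marker in ERROR_MARKERS)

def rows_to_review(translations):
    """Return spreadsheet row numbers (header in row 1, data starts at row 2)."""
    return [i + 2 for i, t in enumerate(translations) if needs_review(t)]

cells = ["Hello", "I’m sorry, but the phrase could not be translated.", "", "#ERROR!"]
print(rows_to_review(cells))
```

This only catches the obvious failures. A translation that is fluent but wrong will sail right through, so it doesn’t replace reading the results.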
Making the SRT file
An SRT file is a standard subtitles file format that is in plain text and easy to read. MacWhisper can directly export an SRT file, which you should do if you don’t need translation, but if you need to manipulate its output, it’s easier to work in a spreadsheet and export it in SRT format from there.
Luckily, the timestamps in and out are already in the correct format, so you just need to CONCATENATE a bunch of stuff together to make it work. I did this in a new column E in the spreadsheet. I could have crafted the formula manually, but instead, I asked ChatGPT to write it for me. It’s a little messy, but it works:
=ROW(A2)-1&CHAR(10)&"00:"&TEXT(A2,"hh:mm:ss,000")&" --> 00:"&TEXT(B2,"hh:mm:ss,000")&CHAR(10)&C2&CHAR(10)&D2&CHAR(10)
CHAR(10) is a line feed.
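For comparison, here’s a Python sketch of the same concatenation: an index, a timing line, the two caption lines, and a blank line between entries. It assumes the timestamps arrive as strings like 00:01,000 (minutes and seconds only) and adds the leading “00:” hours field when it’s missing; that input format is my assumption, so check it against your own columns.

```python
def normalize_timestamp(ts):
    """Prefix '00:' when the timestamp has only minutes and seconds (assumed input)."""
    return ts if ts.count(":") == 2 else "00:" + ts

def srt_entry(index, start, end, original, translation):
    """Build one SRT entry: index, timing line, then the two caption lines."""
    return (
        f"{index}\n"
        f"{normalize_timestamp(start)} --> {normalize_timestamp(end)}\n"
        f"{original}\n"
        f"{translation}\n"
    )

def build_srt(rows):
    """rows is an iterable of (start, end, original, translation) tuples."""
    entries = [srt_entry(i, *row) for i, row in enumerate(rows, start=1)]
    # Each entry ends with a newline, so joining on "\n" leaves one blank line between them.
    return "\n".join(entries)

srt_text = build_srt([
    ("00:01,000", "00:03,500", "你好", "Hello"),
    ("00:03,500", "00:06,000", "再见", "Goodbye"),
])
print(srt_text)
```

Because the text is written directly rather than copied out of spreadsheet cells, you can save it with open(path, "w", encoding="utf-8") and skip the copy-and-paste step.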
I used a multi-part prompt for ChatGPT to generate the Sheets formula, because it didn’t like doing it all at once:
> Give me a google sheets formula to create a valid SRT subtitle file entry using 4 columns: start time, end time, and 2 caption columns. The first 2 columns are already in the correct format.
> There is a header.
> Add a "00:" before the timestamps in the first 2 columns
Now, you have a bunch of SRT entries in column E (see the image above). Select them all (minus the header row) and paste them into a text editor like Sublime. There were some weird quotes in my text, so I had to search for all double quotes and replace them with nothing. Run your SRT through a syntax validator, and if it passes, you’re ready to burn it into a video!
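If you don’t have a validator handy, a quick structural check is easy to sketch in Python. This one only confirms that entries are numbered in order and that each timing line looks like HH:MM:SS,mmm --> HH:MM:SS,mmm. It is a sanity check I’m assuming is good enough for a first pass, not a full validator.

```python
import re

TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")

def validate_srt(text):
    """Return a list of problem descriptions; an empty list means none were found."""
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    if not blocks:
        return ["no entries found"]
    problems = []
    for expected, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if len(lines) < 3:
            problems.append(f"entry {expected}: expected index, timing, and caption text")
            continue
        if lines[0].strip() != str(expected):
            problems.append(f"entry {expected}: index line is {lines[0].strip()!r}")
        if not TIMING.match(lines[1].strip()):
            problems.append(f"entry {expected}: bad timing line {lines[1].strip()!r}")
    return problems

good = "1\n00:00:01,000 --> 00:00:03,500\n你好\nHello\n\n2\n00:00:03,500 --> 00:00:06,000\nHi\n"
bad = "1\n00:01,000 --> 00:03,500\nHi\n"
print(validate_srt(good))
print(validate_srt(bad))
```

It doesn’t check that end times come after start times or that entries don’t overlap, so a dedicated validator is still worth running before you burn anything in.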
Burning Captions into a video
This is almost as easy as dragging the SRT file into a Premiere Pro project. Once they are in a video timeline, all you have to do is select the “Burn Captions into Video” option when you export. If you are a command line person, you can also do this for free using ffmpeg.
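For the ffmpeg route, here’s a minimal sketch that wraps the command in Python. It relies on ffmpeg’s subtitles video filter, which needs an ffmpeg build that includes libass, and the file names are placeholders. Paths containing colons, quotes, or other special characters need extra escaping inside the filter string, which this sketch does not handle.

```python
import subprocess

def burn_in_command(video_path, srt_path, output_path):
    """Build an ffmpeg argument list that renders the subtitles into the picture."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"subtitles={srt_path}",  # draw the captions onto each frame
        "-c:a", "copy",                  # keep the audio stream as-is
        output_path,
    ]

def burn_in(video_path, srt_path, output_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(burn_in_command(video_path, srt_path, output_path), check=True)

# Placeholder file names, shown without running ffmpeg.
print(" ".join(burn_in_command("input.mp4", "captions.srt", "output.mp4")))
```

Since burning in captions re-encodes the video, expect it to take a while on long footage.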
Conclusion
There are existing services that will automatically transcribe videos for tens of dollars per hour instead of the thousands of dollars I was quoted 5 years ago, but Whisper-based transcripts are likely to be better, at least until Whisper is integrated into more products. And if you can string together your own workflows, it should only cost you pennies.
MacWhisper has a free tier, or you can pay a one-time upgrade price of $11 for better transcriptions using larger AI models. OpenAI’s GPT API costs almost nothing to use unless you’re doing a ton with it.
If you are reading this months or years after I’ve posted it, the world has probably changed a lot! I assume that both the effort required and price to do this sort of work will trend toward zero.
Bonus: ChatGPT (GPT-4) summary of the stories
I pasted all the Chinese from the tape into ChatGPT, and asked it to summarize in English. Here’s what it said (I don’t have access to the APIs with huge prompt/input sizes yet, so I had to send in chunks).
In the first story, the character is asked if they want to live with Yiguo. Tamara says yes, but is told to ask the Prince for his name. When the Prince returns, Tamara asks for his name. The storyteller’s mother is asked to retell this story as it is a favorite.
In the second story, the character sees a magical item in another place after passing through a door. They are placed inside and discover a cup. They continue on, and a big bad wolf appears, jumping into a river. The characters want to play on a small island in the sea.
In the third story, Vivian and friends go swimming one day. They find two other Vivians and decide to knock down a tree. Once the tree is knocked down, they discover a hole they can crawl into. They collect things and eventually find another hole to crawl through.
In the fourth story, Tamara is given many gems by a squirrel to make a hat. Another squirrel bites a large egg for Tamara. Tamara finishes the hat and takes it to Sewidge, but a snake on Sewidge’s body tries to bite her. A bird from the broken egg saves Tamara by burning the snake to death. Sewidge’s house burns down, and she turns into an old woman. Prince Igor and Tamara go back to their place in Guangzhou, where Tamara’s mother is waiting for them.
In the fifth story, there are three bears – a big bear father, a medium-sized bear mother, and a small bear baby. They live in a house in the forest and go for walks every morning. One day, a little girl named Goldilocks, with curly hair, walks through the forest and finds the bears’ house. Since the bears are out for a walk, she enters the house and tries sitting on each of their chairs. The father’s chair is too hard, the mother’s chair is too soft, but the baby’s chair is just right, though it breaks under her weight. Hungry, Goldilocks tries their cereal. The father’s cereal is too hot, the mother’s cereal is too cold, but the baby’s cereal is just right, so she eats it all. Goldilocks then goes upstairs and tries the beds, finding the baby’s bed to be just right. She falls asleep.
The three bears return and notice that someone has been sitting in their chairs, eating their cereal, and sleeping in their beds. They go upstairs and find Goldilocks still sleeping in the baby bear’s bed.
The story is about a character named Godlax, who wakes up frightened after hearing a sound and seeing three bears. They quickly jump out of bed and run out of the house, never daring to return. Outside, three kittens have lost their gloves, making them cry. Their mother scolds them for losing their gloves, and they go search for them. Once they find their gloves, they return home and eat, but the gloves are dirty. Their mother cleans the gloves and hangs them out to dry.
The story then transitions to the Three Little Pigs, who are playing and working together. Each pig builds a house from a different material: straw, wood, and bricks. One day, a big bad wolf comes and threatens to eat the first pig. When the pig refuses to come out, the wolf blows down the straw house and eats the pig. The same happens to the second pig with the wood house. However, the third pig’s brick house withstands the wolf’s huffing and puffing. The wolf attempts to enter through the chimney but ends up falling into boiling water prepared by the third pig. The pig then eats the cooked wolf as a meal.
The story ends with a scene involving a toy train, which breaks down and can’t move. The clown and children are frightened. The train explains that it’s sick and can’t move anymore, and they need to figure out what to do next.
In this story, the toys are worried about how to get to the children living on the mountain. A shiny new passenger engine arrives, but it refuses to carry the toys. A big engine arrives, but it also refuses to help, as it only carries machinery and goods. A rusty old engine arrives, but it is too tired to help. Finally, a little blue engine arrives, and although it is small, it agrees to carry the toys up the mountain. The toys and children are very happy.
In the second story, a turtle named Toby is running late for school. He meets a rabbit and a squirrel on his way, but they refuse to wait for him because he is slow. The turtle, rabbit, and squirrel eventually reach school just in time for class with their teacher, an owl.
In this story, the turtle rushes to class and arrives while the children are singing “Twinkle Twinkle Little Star.” The teacher, an owl, asks the rabbit, squirrel, and turtle to read. Each student reads something different, but the turtle admits he is slow and not very smart. However, he just wants to do good deeds and help others. The teacher encourages him that he can do good deeds if he wants to.
At snack time, the rabbit brings cookies for everyone. The students eat quickly and go outside to play, leaving the slow turtle behind. The squirrel accidentally falls into the water from a tree and calls for help. The rabbit tries to help but also falls into the water. Both the squirrel and rabbit cannot swim. The turtle, who can swim, jumps into the water, helps them onto his back, and brings them to safety. Everyone is grateful to the turtle, who is happy to have done a good deed despite being slow and not very smart.
In this story, ChuChu, the little switch engine, is very happy working at the railway station, moving train cars around on the tracks. He loves his job and is proud of himself. However, one day he dreams of becoming a bigger, more beautiful passenger engine when he grows up.
ChuChu meets a passenger engine who laughs at his dream, saying that while people, animals, and trees grow up, trains do not. ChuChu becomes sad and refuses to work, even when the engineer tries to make him go. The engineer calls him an “Old Tea Pot,” which angers ChuChu and makes him start working again.
Later, ChuChu arrives at a train station, where the passenger engine from before has broken down and is unable to move. The driver asks ChuChu to help. Despite being a small switch engine, ChuChu bravely takes on the task of pulling the passenger engine and its train cars, filled with people. He proves his strength by pulling the heavy train over mountains and through tunnels.
In the end, the workers praise ChuChu for his great effort and determination, showing that even a small switch engine can make a big difference.