
Last 6 Weeks Before Defense: What We Learned Transcribing 14 Interviews on a Deadline
A field-tested workflow for turning a stack of unprocessed interviews into coded, defensible thesis material when the clock has already started.

My partner defends her thesis in two weeks. Six weeks ago we sat down to plan the home stretch and realized she had 14 semi-structured interviews — most of them around an hour — sitting on her hard drive completely untouched. No transcripts, no codes, just a folder of .m4a files named after participant initials.
If you have ever been in that position, you know the feeling. The fieldwork is "done", but the analysis hasn't even started, and every day you spend wrestling with audio is a day you are not coding, writing, or sleeping.
This post is the honest version of what worked, what wasted our time, and what I would tell anyone else staring down a deadline with a pile of qualitative interviews. It is mostly methodology, with one short note about the tool we ended up building because of this experience.
Things we wasted time on
Trying to transcribe the first interview manually
The first night, we did what every advisor tells you to do: "transcribe at least one yourself, you'll learn things." Fine. The interview was 58 minutes. Including playback at 0.7x speed, fixing typos, separating speakers, and adding rough timestamps, it took her almost five hours.
That math does not work when you have 13 more interviews. Five hours times 14 is 70 hours of pure typing — about a month and a half of evenings — and we had six weeks to also code, write, and prep for the defense itself. Manual transcription is a useful exercise on interview number one. It is a project killer at scale.
Hitting the wall on the most-recommended tool
We signed up for the transcription service everyone in her cohort uses. Free tier got us through three interviews. On file four, we hit the upload wall, upgraded to the paid plan, and then discovered the paid plan caps you at 10 file imports per month. We had 14 interviews and three weeks to process them. The math, again, did not work.
We were not angry at the company — their pricing makes sense for someone who records two meetings a week and wants live notes. It does not make sense for a researcher who records nothing for nine months and then has 14 interviews in three weeks. The tool was matched to the wrong shape of work.
Hand-fixing speaker labels for half a Saturday
Default labels like Speaker 1 and Speaker 2 were not acceptable to her advisor, who wanted participants identified by code (P01, P02, etc.) and the interviewer marked clearly. We tried to fix this by find-and-replace in Word. It works on a single transcript. It falls apart across 14 of them, especially when the AI occasionally swaps speakers mid-interview. We lost most of a Saturday cleaning files we should have generated correctly the first time.
Eight lessons we would tell ourselves on day one
1. Transcribe each interview the same night you record it
The single biggest workflow change we made was treating transcription as part of fieldwork, not analysis. Same evening, push the file through. You catch problems while they are cheap to fix: a participant whose audio cut out, a recorder that picked up too much background noise, a section where you misheard the question. If you wait three weeks to find out interview seven has unusable audio, you cannot rerun it. If you find out the same night, sometimes you can.
There is also a psychological reason. A folder of 14 unprocessed audio files feels infinite. A folder of 14 transcripts feels like a job you can finish. Front-loading transcription buys you a sense of progress that carries you through coding.
2. Pick your downstream tool first, then pick your transcription format
We made the mistake of optimizing the transcripts for human reading before checking what NVivo (her chosen QDA tool) actually wanted. NVivo prefers timestamped .txt or .docx with consistent speaker prefixes; it can do auto-coding by speaker if your formatting is regular. We had to reformat the first three transcripts because we had used em-dashes for speaker turns and NVivo did not parse them.
Decide your analysis tool first. Read its import documentation. Then pick a transcription output format that drops cleanly into it. The order matters because reformatting 14 files by hand at 11pm the night before a chapter is due is not a position you want to be in.
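A quick pre-import check can catch this kind of formatting drift before the analysis tool does. The sketch below assumes one speaker turn per line, each starting with `Interviewer:` or a `P01:`-style code, optionally preceded by a `[HH:MM:SS]` timestamp. That convention and the function name are our illustration, not something taken from NVivo's documentation, so adjust the pattern to whatever your tool's import guide actually asks for.

```python
import re

# Assumed convention (hypothetical, adapt to your QDA tool's import rules):
# optional "[HH:MM:SS] " timestamp, then "Interviewer: " or "P01: ", then text.
TURN_RE = re.compile(r"^(\[\d{2}:\d{2}:\d{2}\] )?(Interviewer|P\d{2}): \S")


def find_bad_lines(text):
    """Return (line_number, line) pairs for non-blank lines that break the convention."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() and not TURN_RE.match(line):
            bad.append((number, line))
    return bad
```

Running this over every transcript before import gives you a short list of lines to fix instead of a failed import at 11pm.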
3. Keep timestamps even if they look ugly
We were tempted to strip timestamps because they made the document harder to read. Don't. When your supervisor asks "where did P04 say that thing about institutional trust?" you want to be able to scrub the audio in three seconds. We ended up keeping timestamps every 30 seconds in the transcript and they paid for themselves the first time her advisor asked to verify a quote. Reading flow is a lower priority than retrievability.
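Timestamps also make quote lookup scriptable. The sketch below assumes timestamps are written as `[HH:MM:SS]` inside the transcript text; the format and the function name are assumptions for illustration. Given a quote, it returns the offset in seconds of the nearest timestamp before it, which is where you would start scrubbing the audio.

```python
import re

# Assumed timestamp format: [HH:MM:SS]
STAMP_RE = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\]")


def locate_quote(transcript, quote):
    """Return the offset in seconds of the last timestamp before the quote, or None."""
    position = transcript.find(quote)
    if position == -1:
        return None  # quote not found verbatim
    last = None
    # Only scan the text that comes before the quote.
    for match in STAMP_RE.finditer(transcript, 0, position):
        last = match
    if last is None:
        return None  # no timestamp precedes the quote
    hours, minutes, seconds = (int(part) for part in last.groups())
    return hours * 3600 + minutes * 60 + seconds
```

With a timestamp every 30 seconds, the returned offset should land within about half a minute of the quote.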
4. Standardize speaker naming on the very first transcript
The naming scheme we settled on (Interviewer: and P01:, P02:, …) needs to be locked in before transcript number two. Otherwise you will end up with Researcher, Me, Q, Interviewer (J) scattered across files, and find-and-replace becomes a minefield. Pick the format that matches whatever your methodology chapter or appendix style requires, write it on a sticky note, and never deviate.
If your transcription tool produces speaker labels that don't match (e.g. SPEAKER_00), do the rename in a single batch script or a regex find-and-replace template, not by hand. We wrote a 10-line Python script that processed all 14 files in under a second. That script saved us hours.
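For anyone who wants a starting point, here is a minimal sketch of that kind of batch rename. It is not the script mentioned above. It assumes the tool writes labels like `SPEAKER_00` at the start of each line, and that you keep a small per-file mapping, since the tool may not give the interviewer the same label in every file. The file names, mapping, and function names are all hypothetical.

```python
import re
from pathlib import Path

# Hypothetical per-file mapping from raw labels to your naming scheme.
LABEL_MAPS = {
    "P01.txt": {"SPEAKER_00": "Interviewer", "SPEAKER_01": "P01"},
    "P02.txt": {"SPEAKER_00": "P02", "SPEAKER_01": "Interviewer"},
}

# A raw label at the start of a line, with an optional colon and spacing.
LABEL_RE = re.compile(r"^(SPEAKER_\d+):?[ \t]*", flags=re.MULTILINE)


def relabel(text, mapping):
    """Rewrite raw labels at line starts as 'Name: '; leave unmapped labels untouched."""
    def swap(match):
        raw = match.group(1)
        if raw not in mapping:
            return match.group(0)
        return f"{mapping[raw]}: "
    return LABEL_RE.sub(swap, text)


def relabel_folder(folder):
    """Relabel every .txt transcript in the folder that has a mapping, in place."""
    for path in sorted(Path(folder).glob("*.txt")):
        mapping = LABEL_MAPS.get(path.name)
        if mapping is None:
            continue
        original = path.read_text(encoding="utf-8")
        path.write_text(relabel(original, mapping), encoding="utf-8")
```

Run it on copies first. A script like this only renames labels; it cannot tell when the tool has swapped two speakers mid-interview, so that still has to be caught during review.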
5. For accented or non-English interviews, test on a 5-minute sample first
Two of her participants were not native English speakers and had thick accents in some sections. The recommended tool produced very confident but wrong transcripts on those two — names misspelled consistently, key methodological terms mistranscribed, and at one point a whole sentence inverted in meaning. We caught it because she happened to listen to that segment again. We almost didn't.
Before committing budget, slice off a five-minute sample from your hardest-to-transcribe interview (heaviest accent, worst audio, most jargon) and test every candidate tool on it. If the sample is bad, the full transcript will be worse, and you will spend more time fixing it than the cheaper tool would have cost.
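One way to compare candidate tools on that sample is word error rate: type out the five minutes by hand once, then score each tool's output against it. The sketch below is a plain word-level edit distance, assuming you have the hand-typed reference and each tool's output as strings. It is a rough yardstick rather than a standard benchmark setup, and the function name is ours.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (insert, delete, substitute) divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j]: edits to turn the reference words seen so far into the first j hypothesis words
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

Lower is better, but a low score does not guarantee a usable transcript. A single wrong word can invert the meaning of a sentence, so still listen to the sample against the best-scoring output before committing.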
6. Back everything up twice, in two places
Audio files are irreplaceable. We kept one copy on her laptop, one on an external drive, and one in her university OneDrive. This sounds excessive until you remember that grad students are notorious for spilling coffee on laptops in week 11. Transcripts are recoverable; audio is not. If you have to choose, prioritize the raw audio.
A bonus benefit of keeping audio in cloud storage is that some transcription tools can pull a file directly from a URL, which removes one upload step from your workflow.
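A backup only counts if the copy is intact. The sketch below compares SHA-256 checksums between the original folder and a backup folder and lists any audio file that is missing or differs. The folder layout, the `*.m4a` pattern, and the function names are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large audio files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_copies(original_dir, backup_dir, pattern="*.m4a"):
    """Return names of files that are missing from the backup or whose contents differ."""
    problems = []
    for original in sorted(Path(original_dir).glob(pattern)):
        copy = Path(backup_dir) / original.name
        if not copy.is_file() or sha256_of(original) != sha256_of(copy):
            problems.append(original.name)
    return problems
```

An empty list means every original has a byte-identical copy in the backup folder.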
7. Don't try to clean every transcript to publication quality before coding
This was the temptation that almost broke us in week two. There is a strong urge to polish each transcript to a perfect, quotable, appendix-ready state before moving on. Resist it. The right order is: rough transcript → coding → identify which quotes you actually need in the final write-up → polish only those quotes to publication quality. Cleaning the other 90% of the transcript is wasted effort. You will not cite most of it.
The transcripts in the appendix can be at "clean enough to read" quality. Only quoted excerpts in the body need to be perfect, and at that point you are polishing maybe 40 short passages, not 14 full hours.
8. Budget the bottleneck — review and fix is 25 to 40 percent of audio length
Even with fast AI transcription, you still need to listen back, fix names, mark inaudibles, and confirm speaker labels on tricky overlaps. We measured this: on a one-hour interview with reasonably clean audio, review took her about 18 minutes, or 30 percent of the audio length. On a noisier interview with two speakers talking over each other, review took 35 minutes. Plan for somewhere in that range.
If you skip the review step, your "transcribed in 5 minutes!" interview is not actually defensible. If you over-budget for it, you can always use the spare time to start coding. Always estimate the bottleneck.
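To turn that rule of thumb into a number for your own project, here is a small sketch. It assumes you know each interview's length in minutes and uses the 25 to 40 percent range as defaults. The function name and parameters are ours.

```python
def review_budget_hours(audio_minutes, low=0.25, high=0.40):
    """Return (low, high) review time in hours for a list of interview lengths in minutes."""
    total_minutes = sum(audio_minutes)
    return total_minutes * low / 60, total_minutes * high / 60


# Example: 14 interviews of roughly an hour each.
low_hours, high_hours = review_budget_hours([60] * 14)
print(f"Plan for {low_hours:.1f} to {high_hours:.1f} hours of review")
```

For 14 one-hour interviews that comes to roughly 3.5 to 5.6 hours of review. If much of your audio is noisy or has overlapping speakers, pass a higher `high` value rather than trusting the defaults.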
A note on Otter
A few people will read this and assume we are about to bash Otter. We are not. Otter is genuinely excellent for the workflow it was built for: you sit in meetings every week, you want live notes, you want a searchable archive of your work life. If that is your shape of work, the monthly subscription is fair, and the 10-import-per-month cap on the paid tier rarely matters because you mostly record live.
It just isn't the right shape for thesis-style burst transcription, where you record nothing for nine months, then have 14 interviews to process in three weeks, then go quiet again. For that pattern, the per-month limits are the wrong abstraction.
About the tool we ended up using
When we hit the import wall on file four, my partner asked why nobody made a per-hour transcription service for exactly this case. I built TranscribeBee over the next two weekends, partly to get her unstuck and partly because I suspected other researchers were in the same position.
It is intentionally simple. You upload audio (files up to 500MB), pay $2 per hour of audio processed, and get speaker-labeled, timestamped transcripts in 100+ languages. There is no monthly plan, no import cap, no subscription to cancel. If you transcribe one interview a year, you pay for one hour. If you transcribe 14, you pay for 14.
If you are in this exact situation (a stack of interviews, a hard deadline, a methodology that needs speaker labels and timestamps, a budget that doesn't justify a yearly subscription), TranscribeBee is built for you. If you record meetings every day and want live notes, stay with Otter; it is the better fit.
You can see what the output looks like on the sample transcript page before signing up. Or if you want to try it on the hardest five minutes of your hardest interview before committing, that's exactly what we'd recommend.
Either way: start tonight, transcribe as you go, lock in your speaker naming on file one, and don't try to make every transcript perfect. Save your perfectionism for the chapters you actually have to write.