Automating Medical and Legal Transcriptions

A generative AI-based solution that automates medical and legal transcription with good accuracy, so that transcriptionists can focus on reviewing the output instead of creating the transcription themselves, saving 75% of the time and cost involved.

Problem Statement

The client has hundreds of medical and legal audio recordings to transcribe on a daily basis. Handling this load accurately with human transcriptionists alone is a laborious and time-consuming process.

Hence, the client was looking for a way to automate this process.

Solution

To automate legal and medical transcription, we first studied the manual process and the nuances involved. We then evaluated multiple services available in the market, including Amazon Transcribe, OpenAI’s Whisper (open source) and others. Of these, Whisper worked best for our use case.
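One common way to compare transcription services is word error rate (WER) against a human-verified reference transcript. The sketch below is illustrative only: the source does not state which metric was used in the evaluation, and the names `word_error_rate` and `rank_services`, as well as the sample service labels, are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def rank_services(reference: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (service, WER) pairs sorted from lowest to highest WER."""
    scores = {name: word_error_rate(reference, text) for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical outputs from two candidate services for the same audio clip.
reference_text = "the patient was given aspirin"
candidate_outputs = {
    "service_a": "the patient given aspirin",
    "service_b": "the patient was given aspirin",
}
ranking = rank_services(reference_text, candidate_outputs)
```

A lower WER means fewer corrections for the reviewing transcriptionist. In practice the reference and hypothesis would usually also be normalized for punctuation and number formatting before scoring.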

The next task was speaker diarization, i.e., distinguishing what each speaker said and labeling the transcript accordingly. We did this using Meta’s demucs library, WhisperX, a punctuation model and NVIDIA’s NeuralDiarizer model.
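Conceptually, a diarized transcript combines two timelines: timestamped text from the transcription model and timestamped speaker turns from the diarization model. The following is a minimal sketch of one simple way to merge them, assigning each text segment to the speaker whose turn overlaps it the most. It is not the project's actual pipeline; the `Segment` and `Turn` types, the function names and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A piece of transcribed text with start/end times in seconds."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A span of audio attributed to one speaker by the diarizer."""
    start: float
    end: float
    speaker: str


def overlap(seg: Segment, turn: Turn) -> float:
    """Length in seconds of the time range shared by a segment and a turn."""
    return max(0.0, min(seg.end, turn.end) - max(seg.start, turn.start))


def label_segments(segments, turns, unknown="Unknown"):
    """Pair each segment with the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        if best is None or overlap(seg, best) == 0.0:
            labeled.append((unknown, seg.text))
        else:
            labeled.append((best.speaker, seg.text))
    return labeled


def format_transcript(labeled) -> str:
    """Merge consecutive segments by the same speaker into one line each."""
    merged = []
    for speaker, text in labeled:
        if merged and merged[-1][0] == speaker:
            merged[-1][1].append(text)
        else:
            merged.append((speaker, [text]))
    return "\n".join(f"{speaker}: {' '.join(parts)}" for speaker, parts in merged)


# Hypothetical model outputs for a short clip.
segments = [
    Segment(0.0, 2.0, "Good morning."),
    Segment(2.0, 4.5, "How are you feeling?"),
    Segment(4.6, 6.0, "Much better, thanks."),
]
turns = [Turn(0.0, 4.4, "Speaker 0"), Turn(4.4, 6.5, "Speaker 1")]
transcript = format_transcript(label_segments(segments, turns))
```

Segment-level assignment like this mislabels text when a speaker change falls in the middle of a segment. Word-level timestamps and punctuation-aware sentence boundaries are one way to reduce that kind of error.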

Technologies

Development: Python
Transcription: Whisper
Speaker Diarization: NVIDIA NeuralDiarizer, Meta demucs, WhisperX and a punctuation model

Value Delivered

Our program generates a speaker-diarized transcription of a 10-minute audio file in 5-6 minutes with 90% accuracy.