Automated Transcription Tool with Hate Speech Analysis


The challenge

The Agency for Audio and Audiovisual Media Services (AVMU) in Skopje continuously monitors the content broadcast by radio and TV stations in North Macedonia, ensuring transparency in the broadcasters’ work and upholding the regulations established by the law.

The agency must process, promptly and efficiently, the large pool of audio and visual content recorded every day from all radio and television stations. Doing this manually requires substantial human effort.

The solution

Based on the requirements analysis and the AVMU’s needs, the proposed solution is a multi-step framework that incorporates recent advances in Natural Language Processing and Speech Processing, hosted in an HPC-based environment, to fully automate the annotation of audio pools.

The pipeline consists of the following steps: 

  • Background music removal.
  • Transcription of human speech into text in the Macedonian language.
  • Segmentation and alignment to facilitate navigation in long audio recordings.
  • Speaker diarization to partition the audio stream into homogeneous segments according to the speaker’s identity.
  • Speaker identification and biometrics to improve search in audio files by speaker identity.
  • Hate speech detection in the Macedonian language.
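
The ordering of the steps above can be sketched as a chain of stages, each taking a recording and returning it with more annotations. This is only an illustrative sketch, not the AVMU implementation: the `Recording` class, the `make_stage` helper and all stage names are hypothetical, and each stage is a placeholder that merely records that it ran where a real system would call a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Recording:
    """State passed between stages (hypothetical, not the AVMU data model)."""
    channel: str
    audio: bytes
    steps_done: List[str] = field(default_factory=list)


# A stage takes a recording and returns it, possibly with new annotations.
Stage = Callable[[Recording], Recording]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only notes that it was executed.

    A real stage would run a model here (e.g. a speech recogniser).
    """
    def stage(rec: Recording) -> Recording:
        rec.steps_done.append(name)
        return rec
    return stage


# Stage order mirrors the list of pipeline steps; the names are assumptions.
PIPELINE: List[Stage] = [
    make_stage("background_music_removal"),
    make_stage("transcription"),
    make_stage("segmentation_and_alignment"),
    make_stage("speaker_diarization"),
    make_stage("speaker_identification"),
    make_stage("hate_speech_detection"),
]


def run_pipeline(rec: Recording, stages: List[Stage] = PIPELINE) -> Recording:
    """Apply every stage in order to a single recording."""
    for stage in stages:
        rec = stage(rec)
    return rec
```

Calling `run_pipeline(Recording(channel="demo", audio=b""))` returns the recording with all six step names listed in `steps_done`. Keeping every stage behind the same signature is one way to let individual steps be replaced or reordered without touching the rest.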

The pipeline is hosted in a horizontally scalable execution environment on a GPU cluster, providing fast and resilient processing of recorded audio streams arriving from multiple channels.
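
As a rough illustration of horizontal scaling, the sketch below fans recordings from several channels out to a pool of workers. It is a simplified stand-in under stated assumptions: a local thread pool plays the role of the GPU cluster, `process_recording` is a hypothetical placeholder for one full pipeline run, and the channel names are invented. Resilience features such as retries are not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def process_recording(channel: str, audio: bytes) -> Dict[str, object]:
    """Hypothetical placeholder for one full pipeline run on one recording."""
    return {"channel": channel, "bytes_processed": len(audio)}


def process_all(recordings: Dict[str, bytes], workers: int = 4) -> List[Dict[str, object]]:
    """Submit one task per channel and collect results in submission order.

    Raising `workers` stands in for adding nodes to the cluster.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(process_recording, channel, audio)
            for channel, audio in recordings.items()
        ]
        return [future.result() for future in futures]


if __name__ == "__main__":
    # Invented channel names and dummy audio payloads.
    print(process_all({"radio-1": b"abc", "tv-1": b"abcdef"}, workers=2))
```

Because each recording is handled independently, throughput in this sketch grows with the number of workers until the underlying hardware is saturated, which is the property a horizontally scalable deployment relies on.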

Figure 1 Framework Architecture

The benefits

The benefits to the AVMU of using the framework for fast processing and transcription of audio files are manifold. First, the results showed that our solution eases the analysis of broadcast audio streams sourced from multiple channels, including radio and television.

Second, given the computing demands of Deep Learning models, the high-performance computing environment allows near real-time pipeline execution, which helps the agency strengthen its internal processes. Third, the general framework can be easily improved and adapted to the agency’s needs, thanks to the micro-service architecture deployed in the initial solution.

Figure 2 Active audio streams and channels list
Figure 3 Transcription preview and hate speech analysis results