Speech synthesis and recognition systems (TTS, STT): Application in the sound search engine of the University Library “Svetozar Marković”

  • Adam Sofronijevic, Library of University of Belgrade
  • Nikola Smolenski, Library of University of Belgrade
  • Ivana Gavrilovic, Library of University of Belgrade
Keywords: natural language processing, speech synthesis, speech recognition, audio search, text search

Abstract

This paper addresses speech synthesis (text-to-speech, TTS) and speech recognition (speech-to-text, STT) technologies, with a focus on their application in the sound search engine of the University Library “Svetozar Marković”. The introductory section covers the subject both theoretically and historically, from 18th-century mechanical devices, such as Kratzenstein’s vocal tract model from 1779, to modern systems based on artificial intelligence and deep learning. The development of these technologies for the Serbian language is also outlined, from the first systems in the mid-1980s to today’s solutions. The paper describes the implementation of a search engine for audio recordings from the University Library’s YouTube channel, which uses Whisper JAX for speech recognition and achieves recognition accuracy of over 90%. Metadata collection, speech transcription, data processing, and storage in a Lucene/Solr-based database are described in detail. The system enables searching the transcribed content and supports finding similar words using Levenshtein distance, so that relevant recordings can still be found when the transcript contains recognition errors. Two challenges are also discussed: the metadata lack temporal subject descriptors, and recognition quality depends on the quality of the source recording. The paper concludes that the developed system is a useful tool that facilitates access to the library’s audiovisual content, and that further development and expansion of its functionality are planned.
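
To illustrate the similar-word matching mentioned above, the following is a minimal sketch of Levenshtein distance and a simple tolerance-based lookup. It is not the system's actual code: the function names `levenshtein` and `fuzzy_find`, the sample words, and the threshold `max_distance=1` are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # Keep the shorter string as b so each row is as small as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_find(query: str, words: list[str], max_distance: int = 1) -> list[str]:
    """Return the words within max_distance edits of the query (hypothetical helper)."""
    q = query.lower()
    return [w for w in words if levenshtein(q, w.lower()) <= max_distance]


# A misrecognized word in a transcript can still match the intended query.
transcript_words = ["librery", "libraries", "lecture"]
print(fuzzy_find("library", transcript_words))
```

Lucene and Solr expose edit-distance matching through fuzzy queries in the standard query parser (for example `term~1`); whether the described system relies on that operator or on a different mechanism is not stated in this abstract.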

Published
2025-12-30
Section
Digital marketing and multimedia