Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Vol 9 No 3 (2025): Volume 9, Issue 3, March 2025 | Pages: 182-192
International Research Journal of Innovations in Engineering and Technology
OPEN ACCESS | Research Article | Published Date: 28-03-2025
Audio content is abundant and diverse in today's digital age, ranging from music to podcasts and audio streams. Efficiently representing and searching this vast audio data is essential for applications like content identification, recommendation systems, and audio retrieval. Traditional audio fingerprinting methods have relied on handcrafted features and heuristics, which may lack scalability and robustness in real-world scenarios.
In contrast, deep learning has shown remarkable capabilities in various audio-related tasks, such as speech recognition and music classification. Leveraging deep learning-based methods for audio fingerprinting offers the potential to create compact yet informative representations of audio signals, enabling faster and more accurate content identification and search.
This paper explores deep learning models for developing advanced audio fingerprinting methods. Using Convolutional Neural Networks (CNNs) and U-Net autoencoders, a variant of the autoencoder, the work extracts audio features and then compresses and encodes them to reduce the feature space effectively. The scope also covers noise resilience: the audio fingerprints should remain consistent and robust even for noisy samples.
The compressed, encoded audio fingerprint is then used to search an audio database efficiently for tasks such as music identification. To build the audio database, FAISS (Facebook AI Similarity Search) is selected because it provides efficient vector search capabilities that are well suited to music identification.
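The search step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the fingerprints are random stand-in vectors rather than encoder outputs, the class name FlatFingerprintIndex and the 16-dimensional size are hypothetical, and the exhaustive L2 search is written in plain Python to stay self-contained. In practice, an exact FAISS index such as `faiss.IndexFlatL2` would play this role through its `add` and `search` methods.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length so distances ignore overall loudness."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class FlatFingerprintIndex:
    """Hypothetical stand-in for an exact (brute-force) vector index."""

    def __init__(self, dim):
        self.dim = dim
        self.ids = []
        self.vectors = []

    def add(self, track_id, fingerprint):
        if len(fingerprint) != self.dim:
            raise ValueError("fingerprint has the wrong dimensionality")
        self.ids.append(track_id)
        self.vectors.append(l2_normalize(fingerprint))

    def search(self, query, k=1):
        """Return the k closest (track_id, squared_distance) pairs."""
        q = l2_normalize(query)
        scored = sorted(
            (squared_l2(q, v), tid) for v, tid in zip(self.vectors, self.ids)
        )
        return [(tid, dist) for dist, tid in scored[:k]]


# Assumption: random vectors stand in for fingerprints produced by an encoder.
rng = random.Random(0)
DIM = 16
index = FlatFingerprintIndex(DIM)
catalog = {
    f"track_{i}": [rng.gauss(0, 1) for _ in range(DIM)] for i in range(50)
}
for tid, fp in catalog.items():
    index.add(tid, fp)

# A slightly perturbed copy of one fingerprint imitates a noisy recording.
noisy_query = [x + rng.gauss(0, 0.05) for x in catalog["track_7"]]
best_id, best_dist = index.search(noisy_query, k=1)[0]
print(best_id, round(best_dist, 4))
```

The exhaustive scan costs time linear in the catalog size, which is the main reason a dedicated vector search library becomes attractive as the database grows.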
Keywords: Efficient representation of audio, audio fingerprinting, deep learning, U-Net Autoencoders, Convolutional Neural Networks (CNNs), compact feature representation, noise resilience, audio analysis, audio database search, vector database, Facebook AI Similarity Search (FAISS)
Divesh Singh. (2025). Deep Learning-based Fingerprinting Methods for Audio Representation and Search. International Research Journal of Innovations in Engineering and Technology - IRJIET, 9(3), 182-192. Article DOI https://doi.org/10.47001/IRJIET/2025.903024
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.