Timbre Transfer from Flute to Sarangi Using Latent Diffusion Bridge

Aashish Shrestha; Sanjivan Satyal

DOI: https://doi.org/10.47001/IRJIET/2026.105091

Aashish Shrestha: Department of Electronics and Computer Engineering, Pulchowk Campus, Institute of Engineering, Lalitpur, Nepal
Sanjivan Satyal: Department of Electronics and Computer Engineering, Pulchowk Campus, Institute of Engineering, Lalitpur, Nepal

Vol 10 No 5 (2026): Volume 10, Issue 5, May 2026 | Pages: 683-686

International Research Journal of Innovations in Engineering and Technology

OPEN ACCESS | Research Article | Published Date: 31-05-2026

Abstract

Timbre transfer aims to modify the timbral characteristics of audio while preserving key musical elements such as melody and rhythm. Diffusion-based models have yielded promising results in image and audio synthesis; however, their application to traditional Nepali instruments remains largely unexplored. We explore an unsupervised method for timbre transfer to the Sarangi using latent diffusion bridges. In our experiments, the flute model maps the input audio to its Gaussian prior, and the Sarangi model reconstructs the target audio from that prior. The trained Sarangi model can serve as either the source or the target model. Experimental results demonstrate that the model preserves the melodic structure while altering timbral qualities.
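
The encode/decode procedure described in the abstract can be sketched in the style of a dual diffusion bridge: each instrument model defines a deterministic probability-flow ODE, the flute model's ODE is integrated from data up to the Gaussian prior, and the Sarangi model's ODE is integrated back down to data. This is a minimal sketch, not the authors' implementation. It assumes an EDM-style noise parameterization (x_sigma = x_0 + sigma * noise), a Heun solver on a geometric sigma schedule, and toy closed-form Gaussian denoisers (`gaussian_denoiser`, with made-up means and variances) in place of the trained flute and Sarangi networks and their audio latent space. All function names and parameter values are hypothetical.

```python
import numpy as np


def gaussian_denoiser(mu, s):
    """Hypothetical stand-in for a trained instrument model.

    If the (latent) data were distributed as N(mu, s^2), the optimal denoiser
    D(x; sigma) = E[x0 | x0 + sigma * noise = x] has this closed form.
    """
    def denoise(x, sigma):
        return mu + (s**2 / (s**2 + sigma**2)) * (x - mu)
    return denoise


def pf_ode(x, denoise, sigmas):
    """Integrate the EDM-style probability-flow ODE dx/dsigma = (x - D(x; sigma)) / sigma
    along `sigmas` (increasing = encode, decreasing = decode) with Heun's method."""
    x = np.asarray(x, dtype=float)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d_cur = (x - denoise(x, s_cur)) / s_cur
        x_euler = x + (s_next - s_cur) * d_cur              # Euler predictor
        d_next = (x_euler - denoise(x_euler, s_next)) / s_next
        x = x + (s_next - s_cur) * 0.5 * (d_cur + d_next)   # trapezoidal corrector
    return x


def bridge(x_src, src_denoise, tgt_denoise,
           sigma_min=0.002, sigma_max=80.0, steps=256):
    """Deterministic bridge: the source model's ODE carries the input up to the
    Gaussian prior, then the target model's ODE carries it back down."""
    # Assumed schedule; the paper's solver settings are not given here.
    sigmas = np.geomspace(sigma_min, sigma_max, steps)
    prior = pf_ode(x_src, src_denoise, sigmas)          # encode: source data -> prior
    return pf_ode(prior, tgt_denoise, sigmas[::-1])     # decode: prior -> target data


# Toy usage with made-up latent statistics for the two instruments.
flute = gaussian_denoiser(mu=0.0, s=1.0)
sarangi = gaussian_denoiser(mu=2.0, s=0.5)
contour = np.array([-1.0, 0.0, 0.5, 1.0])   # toy sequence of latent frames
out = bridge(contour, flute, sarangi)        # flute -> Sarangi
back = bridge(out, sarangi, flute)           # Sarangi model used as the source
```

Because each model only maps between its own data and the prior, swapping the arguments (`bridge(out, sarangi, flute)`) runs the Sarangi model as the source, mirroring the abstract's remark that the trained Sarangi model can serve as either source or target. In this toy setting the bridge is a monotone map, so the ordering of the input frames is preserved, a loose analogue of keeping the melodic contour while the "timbre" statistics change.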

Keywords

Audio synthesis, Latent Diffusion, Sarangi, Timbre Transfer.

Citation of this Article

Aashish Shrestha, & Sanjivan Satyal. (2026). Timbre Transfer from Flute to Sarangi Using Latent Diffusion Bridge. International Research Journal of Innovations in Engineering and Technology - IRJIET, 10(5), 683-686. Article DOI https://doi.org/10.47001/IRJIET/2026.105091

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
