Unsupervised Improvement of Audio-Text Cross-Modal Representations

dc.contributor.authorWang, Zhepei
dc.contributor.authorSubakan, Cem
dc.contributor.authorSubramani, Krishna
dc.contributor.authorWu, Junkai
dc.contributor.authorTIAGO FERNANDES TAVARES
dc.contributor.authorFABIO JOSE AYRES
dc.contributor.authorSmaragdis, Paris
dc.creatorWang, Zhepei
dc.creatorSubakan, Cem
dc.creatorSubramani, Krishna
dc.creatorWu, Junkai
dc.creatorSmaragdis, Paris
dc.date.accessioned2025-01-08T00:04:10Z
dc.date.available2025-01-08T00:04:10Z
dc.date.issued2023
dc.description.abstractRecent advances in using language models to obtain cross-modal audio-text representations have overcome the limitations of conventional training approaches that use predefined labels. This has allowed the community to make progress in tasks like zero-shot classification, which would otherwise not be possible. However, learning such representations requires a large amount of human-annotated audio-text pairs. In this paper, we study unsupervised approaches to improve the learning framework of such representations with unpaired text and audio. We explore domain-unspecific and domain-specific curation methods to create audio-text pairs that we use to further improve the model. We also show that when domain-specific curation is used in conjunction with a soft-labeled contrastive loss, we are able to obtain significant improvement in terms of zero-shot classification performance on downstream sound event classification or acoustic scene classification tasks.en
dc.formatDigital
dc.format.extent5 p.
dc.identifier.urihttps://repositorio.insper.edu.br/handle/11224/7245
dc.language.isoEnglish
dc.subjectAudio-text representation learningen
dc.subjectData augmentationen
dc.subjectContrastive learningen
dc.subjectSound event classificationen
dc.subjectAcoustic scene classificationen
dc.titleUnsupervised Improvement of Audio-Text Cross-Modal Representations
dc.typeconference paper
dspace.entity.typePublication
local.description.event2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
local.identifier.sourceUrihttps://arxiv.org/abs/2305.01864
local.publisher.cityNew York
local.publisher.countryUnited States
local.subject.cnpqEXACT AND EARTH SCIENCES::COMPUTER SCIENCE
local.subject.cnpqENGINEERING::ELECTRICAL ENGINEERING
local.typeConference Paper
relation.isAuthorOfPublicationb94cce1d-a49e-40dc-becd-051f9254fab8
relation.isAuthorOfPublication37971022-7c69-4e93-9186-4c9431a1f95c
relation.isAuthorOfPublication.latestForDiscoveryb94cce1d-a49e-40dc-becd-051f9254fab8
Files
Original Bundle
Now showing 1 - 2 of 2
Name: ACESSO_RESTRITO_Trabalho_de_Evento_2023_Unsupervised_improvement_of_audio_text_cross_modal_representations_TC.pdf
Size: 262.54 KB
Format: Adobe Portable Document Format
Name: Primeira_Pagina_Trabalho_de_Evento_2023_Unsupervised_improvement_of_audio_text_cross_modal_representations_TC.pdf
Size: 139.37 KB
Format: Adobe Portable Document Format