Weakly Labeled Sound Event Detection with a Capsule-Transformer Model

March 1, 2024

Kanghao Li

杨树国

Corresponding author

Li Zhao

Wenwu Wang

Abstract

Sound event detection (SED) is a widely studied field that has achieved considerable success. The dynamic routing mechanism of capsule networks has been used for SED, but its ability to capture global information in audio remains limited. In this paper, we propose an SED method that combines a capsule network with a transformer, leveraging the transformer's strength in capturing global features together with the capsule network's strength in capturing local features. The proposed method was evaluated on the DCASE 2017 Task 4 weakly labeled dataset, where it obtained an F-score of 60.6% and an Equal Error Rate of 0.75. Compared with the baseline systems, our method achieves significantly improved performance.

Keywords: Sound event detection, audio tagging, gated convolution, transformer, capsule network.

Type

Journal article

Publication

Digital Signal Processing

Last updated on March 9, 2026

Sound Event Detection, Audio Tagging, Gated Convolution, Transformer, Capsule Network, Other

Authors

Kanghao Li

杨树国

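Full Professor

Professor and doctoral supervisor; former postdoctoral researcher at Harbin Institute of Technology. Director of the Research Center for Data Science and Information Technology, Director of the Shandong Provincial Engineering Research Center for Scenario-Based Applications of Artificial Intelligence in Ocean Technology, Director of the Qingdao Innovation Center for Artificial Intelligence Ocean Technology, and Dean of the Institute of Mathematics and Interdisciplinary Research at Qingdao University of Science and Technology. Senior visiting scholar at the Georgia Institute of Technology (USA), the Chinese University of Hong Kong, and Beijing Jiaotong University. Executive council member of the Shandong Mathematical Society and the Shandong Applied Statistics Society, and standing committee member of the Artificial Intelligence Oceanography Professional Committee. In recent years, has led or participated in more than 40 research projects at various levels, funded by the National Natural Science Foundation of China, the Commission of Science, Technology and Industry for National Defense, the Ministry of Electronics Industry, the provincial natural science foundation, provincial key research programs, provincial university research programs, the provincial fund for outstanding young and middle-aged scientists, and the Qingdao Science and Technology Development Program.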

Li Zhao

Wenwu Wang
