ResCapsnet: A Capsule Network with CRAM and BiGRU for Sound Event Detection
Abstract
Sound event detection (SED) is a challenging task in which ambient sound events are detected in a given audio signal, which involves both classifying the events and estimating their onset and offset times. Deep learning methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have achieved promising performance in SED. However, when sound events overlap, existing deep learning methods remain limited in their ability to detect the individual events in the mixture. Inspired by the success of the dynamic routing mechanism of the capsule network (CapsNet), this paper proposes a capsule network model (ResCapsnet-BiGRU) based on a customized residual attention module (CRAM) and a bidirectional gated recurrent unit (BiGRU). CRAM extracts features relevant to sound events from log-mel spectrograms. Through dynamic routing, the capsule network addresses the problem of overlapping sound events. In addition, a BiGRU with time-distributed fully connected layers is adopted to capture contextual information. The proposed method was evaluated on two datasets: the Vehicle Weakly Labeled Sound Dataset (VWLSD, DCASE 2017 Task 4) and the Domestic Environment Sound Dataset (DESD, DCASE 2022 Task 4). On these datasets it achieved F-scores of 62.1% and 75.9%, respectively, on the audio tagging (AT) task, and 54.1% and 59.0%, respectively, on the SED task. The source code is available at https://github.com/123sunbing/ResCapsnet.git.
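The abstract credits dynamic routing with separating overlapping events. The sketch below illustrates the generic routing-by-agreement procedure introduced with CapsNet (Sabour et al., 2017). It is not the ResCapsnet implementation; see the linked repository for that. The following details are all assumptions chosen for illustration:

- the capsule counts and dimensions (32 input capsules, 10 output capsules of 16 dimensions),
- the 3 routing iterations,
- reading each output capsule's length as the presence probability of one event class.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-8):
    """Capsule non-linearity: keeps the vector's direction, maps its length into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s


def dynamic_routing(u_hat, num_iters=3):
    """Routing by agreement (generic CapsNet version, not the ResCapsnet code).

    u_hat: predictions from lower-level capsules, shape (num_in, num_out, dim_out).
    Returns the output capsules v, shape (num_out, dim_out).
    """
    assert num_iters >= 1
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, start uniform
    for _ in range(num_iters):
        # Coupling coefficients: softmax over output capsules for each input capsule.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c = c / c.sum(axis=1, keepdims=True)
        # Each output capsule is a coupling-weighted sum of the predictions it receives.
        s = np.einsum("io,iod->od", c, u_hat)
        v = squash(s)
        # Increase the logits of input capsules whose predictions agree with the output.
        b = b + np.einsum("iod,od->io", u_hat, v)
    return v


# Hypothetical sizes: 32 primary capsules routed to 10 event-class capsules of 16-D.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(32, 10, 16))
v = dynamic_routing(u_hat)
class_scores = np.linalg.norm(v, axis=-1)  # one score per class, each in [0, 1)
```

Because each class has its own output capsule with an independent length, several classes can score high at once. This is one plausible reading of why routing suits overlapping events; the paper itself should be consulted for how ResCapsnet maps capsule outputs to frame-level detections.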
Type
Publication
EURASIP Journal on Audio, Speech, and Music Processing
Authors
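Full Professor
Professor and doctoral supervisor; completed postdoctoral research at Harbin Institute of Technology. Director of the Research Center for Data Science and Information Technology, Director of the Shandong Provincial Engineering Research Center for Scenario-Based Applications of AI Ocean Technology, Director of the Qingdao AI Ocean Technology Innovation Center, and Dean of the Institute of Mathematics and Interdisciplinary Sciences at Qingdao University of Science and Technology. Senior visiting scholar at the Georgia Institute of Technology (USA), The Chinese University of Hong Kong, and Beijing Jiaotong University. Executive council member of the Shandong Mathematical Society and of the Shandong Society of Applied Statistics, and standing committee member of the Professional Committee on Artificial Intelligence Oceanography. In recent years, has led or participated in more than 40 research projects at various levels. These were funded by the National Natural Science Foundation of China, the Commission of Science, Technology and Industry for National Defense, the Ministry of Electronics Industry, the provincial natural science foundation, provincial key research programs, provincial university research programs, the provincial fund for outstanding young and middle-aged scientists, and the Qingdao Science and Technology Development Program.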