Örebro University Publications (oru.se)
Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition
Knowledge Technology, Department of Informatics, University of Hamburg, Hamburg, Germany.
Örebro University, School of Science and Technology. (Centre for Applied Autonomous Sensor Systems (AASS))
Knowledge Technology, Department of Informatics, University of Hamburg, Hamburg, Germany.
Knowledge Technology, Department of Informatics, University of Hamburg, Hamburg, Germany; exXxa GmbH, Hamburg, Germany.
2023 (English). In: Artificial Neural Networks and Machine Learning – ICANN 2023: 32nd International Conference on Artificial Neural Networks, Heraklion, Crete, Greece, September 26–29, 2023, Proceedings, Part VII / [ed] Lazaros Iliadis; Antonios Papaleonidas; Plamen Angelov; Chrisina Jayne, Springer, 2023, Vol. 14260, p. 376-388. Conference paper, published paper (refereed).
Abstract [en]

In recent research in the domain of speech processing, large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have reported state-of-the-art performance on various benchmarks. These systems intrinsically learn how to handle and remove noise conditions from speech. Previous research has shown that it is possible to extract the denoising capabilities of these models into a preprocessor network, which can be used as a frontend for downstream ASR models. However, the proposed methods were limited to specific fully convolutional architectures. In this work, we propose a novel method to extract the denoising capabilities that can be applied to any encoder-decoder architecture. We propose the Cleancoder preprocessor architecture, which extracts hidden activations from the Conformer ASR model and feeds them to a decoder to predict denoised spectrograms. We train our preprocessor on the Noisy Speech Database (NSD) to reconstruct denoised spectrograms from noisy inputs. Then, we evaluate our model as a frontend to a pretrained Conformer ASR model as well as a frontend to train smaller Conformer ASR models from scratch. We show that the Cleancoder is able to filter noise from speech and that it improves the total Word Error Rate (WER) of the downstream model in noisy conditions for both applications.
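The core idea described in the abstract can be sketched in miniature: freeze a pretrained encoder, tap its hidden activations, and fit a decoder that maps those activations back to a clean spectrogram. The sketch below is purely illustrative and is not the authors' implementation: the frozen "encoder" is a fixed random projection standing in for Conformer encoder layers, the "decoder" is a linear least-squares fit standing in for the trained Cleancoder decoder, and all shapes and noise levels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, F, H = 100, 80, 64  # time frames, mel bins, hidden size (all hypothetical)

clean = rng.random((T, F))                         # clean log-mel spectrogram
noisy = clean + 0.7 * rng.standard_normal((T, F))  # the same utterance with additive noise

# Frozen pretrained "encoder": produces hidden activations per frame.
W_enc = rng.standard_normal((F, H)) / np.sqrt(F)
hidden = noisy @ W_enc                             # (T, H) hidden activations

# Trainable "decoder": fit hidden activations -> clean spectrogram,
# mirroring the reconstruction objective used on the Noisy Speech Database.
W_dec, *_ = np.linalg.lstsq(hidden, clean, rcond=None)
denoised = hidden @ W_dec                          # (T, F) predicted denoised frames

print(denoised.shape)  # (100, 80)
# The decoded output is closer to the clean spectrogram than the noisy input:
print(np.mean((denoised - clean) ** 2) < np.mean((noisy - clean) ** 2))  # True
```

In the paper this denoised output is then fed as a frontend into a downstream Conformer ASR model; here the linear fit merely demonstrates the reconstruction-from-activations principle.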

Place, publisher, year, edition, pages
Springer, 2023. Vol. 14260, p. 376-388
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14260
Keywords [en]
Conformer, Noise Robustness, Speech Recognition
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:oru:diva-112056
DOI: 10.1007/978-3-031-44195-0_31
ISI: 001156958200031
Scopus ID: 2-s2.0-85174623830
ISBN: 9783031441943 (print)
ISBN: 9783031441950 (electronic)
OAI: oai:DiVA.org:oru-112056
DiVA id: diva2:1842287
Conference
32nd International Conference on Artificial Neural Networks (ICANN 2023), Heraklion, Crete, Greece, September 26-29, 2023
Note

The authors gratefully acknowledge support from the German BMWK (SIDIMO), the DFG (CML, LeCAREbot), and the European Commission (TRAIL, TERAIS).

Available from: 2024-03-04. Created: 2024-03-04. Last updated: 2024-03-04. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Möller, Matthias
