At this year’s Interspeech conference, in September, Alexa AI is co-organizing four special sessions (themed sessions within the main conference), all of which are currently seeking paper submissions.
One session is on machine learning and signal processing in the context of multiple networked smart devices. This session will address topics such as synchronization, arbitration (deciding which device should respond to a query), and privacy.
Another Interspeech session is on inclusive and fair speech technologies. Algorithmic bias has been well studied in natural-language processing and computer vision but less so in speech. Possible paper topics include methods of bias analysis and mitigation, dataset creation, and ASR for atypical speech.
The third session is on trustworthy speech processing, which focuses on the development of models whose goals go beyond accuracy to incorporate privacy, interpretability, fairness, ethics, bias mitigation, and related areas.
Finally, the fourth special session is on predicting the intelligibility of speech — both the raw acoustic signal and the signal generated by hearing aids — to hearing-impaired listeners. This session is related to the Clarity Challenge, a five-year challenge to improve hearing aids that Alexa AI is participating in.
There’s more information about the individual sessions below. Submissions to the special sessions should go through the main-conference submission portal. The submission deadline is March 21.
Challenges and opportunities for signal processing and machine learning for multiple smart devices
The purpose of this session is to promote research in multiple-device signal processing and machine learning by bringing together industry and academic experts to discuss topics that include but are not limited to:
- Multiple-device audio datasets
- Automatic speech recognition
- Keyword spotting
- Device arbitration (i.e., which device should respond to the user’s inquiry)
- Speech enhancement: de-reverberation, noise reduction, echo reduction
- Source separation
- Speaker localization and tracking
- Privacy-sensitive signal processing and machine learning
The session will bring together top researchers working in the multisensor domain. Even though their specific applications may differ (e.g., enhancement vs. acoustic-event detection), the similarity of the problem space encourages cross-pollination of techniques.
- Jarred Barber, applied scientist with Alexa AI
- Gregory Ciccarelli, applied scientist with Alexa AI
- Israel Cohen, Amazon Scholar and professor at Technion-Israel Institute of Technology
- Tao Zhang, senior manager of applied science with Alexa AI
Inclusive and Fair Speech Technologies
Alexa AI is co-organizing this session with leading researchers in the field from around the world. The session will feature a series of oral presentations (or posters with two-minute introductions if more than six papers are accepted) that may address but are not limited to the following topics:
- methods for bias analysis and mitigation, including algorithmic training criteria;
- creating, managing, and sharing datasets for bias quantification and methods for data augmentation, curation, and coding techniques, with an emphasis on user groups not included in standard corpora;
- ASR for atypical speech (e.g., ALS, stroke, deafness, Down syndrome);
- ethical considerations about inclusion, democratization of speech technologies, and making speech interaction seamless for all;
- applications of personalization techniques while fostering fairness (i.e., fairness-aware personalization).
- Peng Liu, senior machine learning scientist with Alexa AI
- Anirudh Mani, applied scientist with Alexa AI
- Tao Zhang, senior manager of applied science with Alexa AI
Trustworthy Speech Processing
Given the ubiquity of machine learning systems, it is important to ensure private and safe handling of data. Speech processing presents a unique set of challenges, given the rich information carried in linguistic and paralinguistic content, including speaker traits and interaction and state characteristics. This special session will bring together new and experienced researchers working on trustworthy machine learning and speech processing. The session organizers are seeking novel and relevant submissions from academic and industrial research groups showcasing both theoretical and empirical advancements in trustworthy speech processing (TSP).
Topics of interest include but are not limited to:
- Differential privacy
- Federated learning
- Ethics in speech processing
- Model interpretability
- Quantifying and mitigating bias in speech processing
- New datasets, frameworks, and benchmarks for TSP
- Discovery and defense against emerging privacy attacks
- Trustworthy machine learning in applications of speech processing, such as automatic speech recognition
Speech intelligibility prediction for hearing-impaired listeners
Disabling hearing impairment affects 360 million people worldwide, and one of the greatest challenges for hearing-impaired listeners is understanding speech in the presence of background noise. The development of better hearing aids requires prediction models that can take audio signals and use knowledge of the listener’s characteristics (e.g., an audiogram) to estimate the signals’ intelligibility. These include models that can estimate the intelligibility of natural signals and models that can estimate the intelligibility of signals that have been processed using hearing-aid algorithms.
The Clarity Prediction Challenge (part of the five-year Clarity Challenge) provides noisy speech signals that have been processed with a number of hearing-aid signal-processing systems, along with corresponding intelligibility scores. It asks contestants to produce models that can predict intelligibility scores given just the signals, their clean references, and a characterization of each listener’s specific hearing impairment. The challenge will remain open until the Interspeech submission deadline, and all entrants are welcome.
The special session welcomes submissions from entrants to the Clarity Prediction Challenge but also invites papers on related topics in hearing impairment and speech intelligibility, including but not limited to:
- Statistical speech modeling for intelligibility prediction
- Modeling energetic and informational noise masking
- Individualizing intelligibility models using audiometric data
- Intelligibility prediction in online and low-latency settings
- Model-driven speech intelligibility enhancement
- New methodologies for intelligibility model evaluation
- Speech resources for intelligibility model evaluation
- Applications of intelligibility modeling in acoustic engineering
- Modeling interactions between hearing impairment and speaking style
- Papers using the data supplied with the Clarity Prediction Challenge
Daniel Korzekwa, an applied-science manager with Alexa AI