ICASSP 2024: Seoul, Republic of Korea - Workshops
- IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Workshops, Seoul, Republic of Korea, April 14-19, 2024. IEEE 2024, ISBN 979-8-3503-7451-3
- Yujia Sun, Jinxin He, Yi Zhang, Xiaoming Liang, Ziyan Wang, Zhen Fu, Bo Chen:
The Fawaispeech System for Multi-Channel Speech Recognition in ICMC-ASR Challenge. 1-2
- Jingguang Tian, Shuaishuai Ye, Shunfei Chen, Yang Xiang, Zhaohui Yin, Xinhui Hu, Xinkang Xu:
The Royalflush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge. 1-2
- Minghui Wu, Luzhen Xu, Jie Zhang, Haitao Tang, Yanyan Yue, Ruizhi Liao, Jintao Zhao, Zhengzhe Zhang, Yichi Wang, Haoyin Yan, Hongliang Yu, Tongle Ma, Jiachen Liu, Chongliang Wu, Yongchao Li, Yanyong Zhang, Xin Fang, Yue Zhang:
The USTC-Nercslip Systems for the ICMC-ASR Challenge. 3-4
- Shangkun Huang, Yuxuan Du, Yankai Wang, Jing Deng, Rong Zheng:
The Fosafer System for The ICASSP2024 In-Car Multi-Channel Automatic Speech Recognition Challenge. 5-6
- Yuan Fang, Hao Li, Xueliang Zhang, Fei Chen, Guanglai Gao:
Cross-Attention-Guided WaveNet for Mel Spectrogram Reconstruction in The ICASSP 2024 Auditory EEG Challenge. 7-8
- Xingwei Sun, Qinglong Li, Kaichi Ma, Linzhang Wang, Yujun Wang:
Two-Stage Neural Network Model with Packet Loss Detection for ICASSP 2024 PLC Challenge. 9-10
- Zhongshu Hou, Tianchi Sun, Yuxiang Hu, Changbao Zhu, Kai Chen, Jing Lu:
SIR-Progressive Audio-Visual TF-GridNet with ASR-Aware Selector for Target Speaker Extraction in MISP 2023 Challenge. 11-12
- Changheon Han, Suhyun Lee:
Optimizing Music Source Separation In Complex Audio Environments Through Progressive Self-Knowledge Distillation. 13-14
- Nicolae-Catalin Ristea, Ando Saabas, Ross Cutler, Babak Naderi, Sebastian Braun, Solomiya Branets:
ICASSP 2024 Speech Signal Improvement Challenge. 15-16
- Xuzhi Zhao, Xi Liu, Xinyi Wang, Rui Yang, Yi Du, Yahui Peng:
Dual-Domain Neural Networks for Clinical and Low-Dose CBCT Reconstruction. 17-18
- Han Yin, Mou Wang, Jisheng Bai, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen:
Sub-Band and Full-Band Interactive U-Net with DPRNN for Demixing Cross-Talk Stereo Music. 21-22
- Zihan Zhang, Jiayao Sun, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie:
BS-PLCNet: Band-Split Packet Loss Concealment Network with Multi-Task Learning Framework and Multi-Discriminators. 23-24
- Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee:
Cross-Lingual Text-to-Speech via Hierarchical Style Transfer. 25-26
- Runduo Han, Xiaopeng Yan, Weiming Xu, Pengcheng Guo, Jiayao Sun, He Wang, Quan Lu, Ning Jiang, Lei Xie:
An Audio-Quality-Based Multi-Strategy Approach For Target Speaker Extraction in the MISP 2023 Challenge. 27-28
- Xiang Lyu, Yuhang Cao, Pengpeng Zou, Weilin Zhou:
Ximalaya ASDR System for ICASSP 2024 In-Car Multi-Channel (ICMC) ASR Challenge. 29-30
- Qinwen Hu, Tianyi Tan, Ming Tang, Yuxiang Hu, Changbao Zhu, Jing Lu:
General Speech Restoration Using Two-Stage Generative Adversarial Networks. 31-32
- Guochen Yu, Runqiang Han, Chenglin Xu, Haoran Zhao, Nan Li, Chen Zhang, Xiguang Zheng, Chao Zhou, Qi Huang, Bing Yu:
KS-Net: Multi-Band Joint Speech Restoration and Enhancement Network for 2024 ICASSP SSI Challenge. 33-34
- Nan Li, Guochen Yu, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu:
Multi-Stage Training For Cross-Domain Full-Band Audio Packet Loss Concealment. 35-36
- Fengyuan Hao, Huiyong Zhang, Lingling Dai, Xiaoxue Luo, Xiaodong Li, Chengshi Zheng:
Renet: A Time-Frequency Domain General Speech Restoration Network for ICASSP 2024 Speech Signal Improvement Challenge. 37-38
- Longjie Luo, Tao Li, Lin Li, Qingyang Hong:
The Xmuspeech System for Audio-Visual Target Speaker Extraction in MISP 2023 Challenge. 39-40
- Lorenz Diener, Solomiya Branets, Ando Saabas, Ross Cutler:
The ICASSP 2024 Audio Deep Packet Loss Concealment Grand Challenge. 41-42
- Alice Hein, Sven Gronauer, Klaus Diepold:
Patient-Specific Modeling of Daily Activity Patterns for Unsupervised Detection of Psychotic and Non-Psychotic Relapses. 43-44
- Lukas Henneke:
Improving Data-Driven RF Signal Separation with SOI-Matched Autoencoders. 45-46
- Jinting Wu, Mei Tu:
Unsupervised Relapse Detection Using Wearable-Based Digital Phenotyping for the 2nd E-Prevention Challenge. 47-48
- Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie:
RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement. 49-50
- Ana Clara C. Silveira, Diedre Santos do Carmo, Lucas H. Ueda, Denis G. Fantinato, Paula D. P. Costa, Letícia Rittner:
Vision Transformer MST++: Efficient Hyperspectral Skin Reconstruction. 51-52
- Tejas Jayashankar, Binoy Kurien, Alejandro Lancho, Gary C. F. Lee, Yury Polyanskiy, Amir Weiss, Gregory W. Wornell:
The Data-Driven Radio Frequency Signal Separation Challenge. 53-54
- Mike Thornton, Jonas Auernheimer, Constantin Jehn, Danilo P. Mandic, Tobias Reichenbach:
Detecting Gamma-Band Responses to the Speech Envelope for the ICASSP 2024 Auditory EEG Decoding Signal Processing Grand Challenge. 55-56
- Hongbo Lan, Tianyou Cheng, Maokui He, Hang Chen, Jun Du:
The USTC System for Cadenza 2024 Challenge. 57-58
- Pai Chet Ng, Zhixiang Chi, Malcolm Low, Juwei Lu, Konstantinos N. Plataniotis, Nikolaos V. Boulgouris, Thirimachos Bourlai, Yong Man Ro:
Hyperspectral Skin Vision Challenge: Can Your Camera See Beyond Your Skin? 59-60
- Abhayjeet Singh, Amala Nagireddi, Deekshitha G, Jesuraja Bandekar, Roopa R., Sandhya Badiger, Sathvik Udupa, Prasanta Kumar Ghosh, Hema A. Murthy, Pranaw Kumar, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich:
LIMMITS'24: Multi-Speaker, Multi-Lingual Indic TTS with Voice Cloning. 61-62
- He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li:
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge. 63-64
- Nikhil Das, Rakesh Pogula, Mohammad Imtiaz Ali, Sasank Kottapalli, Sanniboyina Venkata Kiran:
Rebuild, Regenerate: A Gated Temporal Convolution Based GAN for Speech Signal Improvement. 65-66
- Yejin Jeon, Youngjae Kim, Gary Geunbae Lee:
Leveraging Effective Language and Speaker Conditioning In Indic TTS for LIMMITS 2024 Challenge. 67-68
- Andreas Hauptmann, Mustafa Al-Rubaye, Miika T. Nieminen, Mikael A. K. Brix:
A Multi-Filter and Multi-Scale U-Net for Cone-Beam Computed Tomography with Hardware Constraints. 69-70
- Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu:
The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge. 71-72
- Daniil Vladimirov, Daniil Reutsky, Pavel Pischev, Vsevolod Plokhotnyuk, Egor I. Ershov:
MST-: A Modification of MST++ for Narrow Domain Hyperspectral Reconstruction. 73-74
- Mikael Brudfors, Mark Graham, Hyungon Ryu, Oliver Kutter:
MONAI for Deep-Learning Based CBCT Reconstruction. 75-76
- Lingling Dai, Yuxuan Ke, Huiyong Zhang, Fengyuan Hao, Xiaoxue Luo, Xiaodong Li, Chengshi Zheng:
A Time-Frequency Band-Split Neural Network For Real-Time Full-Band Packet Loss Concealment. 77-78
- Yu Tian, Ahmed Alhammadi, Abdullah Quran, Abubakar Sani Ali:
A Novel Approach to WaveNet Architecture for RF Signal Separation with Learnable Dilation and Data Augmentation. 79-80
- Ander Biguri, Subhadip Mukherjee:
Advancing the Frontiers of Deep Learning for Low-Dose 3D Cone-Beam Computed Tomography (CT) Reconstruction. 81-82
- Hao-Chiang Shao, Szu-Chi Wu, Yen-Liang Chuo, Jyun-Hao Lin, Yuan-Rong Liao, Tse-Yu Tseng:
RGBT2HS-Net: Reconstructing a Hyper-Spectral Volume from an RGB-T Stack via an Attention-Powered Multiresolution Framework. 83-84
- Austin Yunker, Rajkumar Kettimuthu, John C. Roeske:
Low Dose CBCT Denoising Using a 3D U-Net. 85-86
- Yuchen Wang, Hongyuan Wang, Jiang Xu, Chang Chen, Xue Hu, Fenglong Song, Lizhi Wang:
Hysat++: Hybrid Spectral-Wise Attention Transformer for Skin Spectral Reconstruction. 87-88
- Kamil Górzynski, Anna Ples, Ivan Ryzhankow, Bartlomiej Zych:
Bumblebee Your Way to Recovery: Transforming The Approach to Detection of Mental Health Relapses. 89-90
- Mostafa Naseri, Jaron Fontaine, Ingrid Moerman, Eli De Poorter, Adnan Shahid:
A U-Net Architecture for Time-Frequency Interference Signal Separation of RF Waveforms. 91-92
- Gerardo Roa Dabike, Michael A. Akeroyd, Scott Bannister, Jon Barker, Trevor J. Cox, Bruno Fazenda, Jennifer Firth, Simone Graetzer, Alinka Greasley, Rebecca R. Vos, William M. Whitmer:
The ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids. 93-94
- Çagkan Yapar, Fabian Jaensch, Jan Christian Hauffen, Francesco Pezone, Peter Jung, Saeid K. Dehkordi, Giuseppe Caire:
Demucs for Data-Driven RF Signal Denoising. 95-96
- Panagiotis Kaliosis, Sofia Eleftheriou, Christos Nikou, Theodoros Giannakopoulos:
A Self-Supervised Learning Approach for Detecting Non-Psychotic Relapses Using Wearable-Based Digital Phenotyping. 97-98
- Fadli Damara, Zoran Utkovski, Slawomir Stanczak:
Signal Separation in Radio Spectrum Using Self-Attention Mechanism. 99-100
- Sasidhar Alavala, Subrahmanyam Gorthi:
3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction Using SwinIR-Based Sinogram and Image Enhancement. 101-102
- Adria Mallol-Ragolta, Anika A. Spiesberger, Andreas Triantafyllopoulos, Björn W. Schuller:
Personalised Anomaly Detectors and Prototypical Representations for Relapse Detection from Wearable-Based Digital Phenotyping. 103-104
- Wojciech Czaja, Jeremiah Emidih, Brandon Kolstoe, Richard G. Spencer:
Hyperspectral Reconstruction of Skin through Fusion of Scattering Transform Features. 105-106
- Bryce Irvin, Sile Yin, Shuo Zhang, Marko Stamenovic:
A Fullband Neural Network for Audio Packet Loss Concealment. 107-108
- Matthew Daly:
Remixing Music for Hearing Aids Using Ensemble of Fine-Tuned Source Separators. 109-110
- Bo Wang, Xiran Xu, Zechen Zhang, Haolin Zhu, Yujie Yan, Xihong Wu, Jing Chen:
Self-Supervised Speech Representation and Contextual Text Embedding for Match-Mismatch Classification with EEG Recording. 111-112
- Xiran Xu, Bo Wang, Yujie Yan, Haolin Zhu, Zechen Zhang, Xihong Wu, Jing Chen:
ConvConcatNet: A Deep Convolutional Neural Network to Reconstruct Mel Spectrogram from the EEG. 113-114
- Akshit Arora, Rohan Badlani, Sungwon Kim, Rafael Valle, Bryan Catanzaro:
Scaling Nvidia's Multi-Speaker Multi-Lingual TTS Systems With Zero-Shot TTS to Indic Languages. 115-116
- Zelin Qiu, Jianjun Gu, Dingding Yao, Junfeng Li, Yonghong Yan:
BMMSNet: Bidirectional Mapping and Multilevel Similarity Comparison for EEG-Speech Match-Mismatch Problem. 117-118
- Keren Shao, Ke Chen, Shlomo Dubnov:
Music Enhancement with Deep Filters: A Technical Report for the ICASSP 2024 Cadenza Challenge. 119-120
- Daniil Robnikov, Tanel Alumäe:
Single-Stage TTS with Adapted Vocoder and Cross-Attention: TalTech Systems for the LIMMITS'24 Challenge. 121-122
- Hang Chen, Shilong Wu, Chenxi Wang, Jun Du, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Jingdong Chen, Odette Scharenborg, Zhong-Qiu Wang, Bao-Cai Yin, Jia Pan:
Summary on the Multimodal Information-Based Speech Processing (MISP) 2023 Challenge. 123-124
- Panagiotis Paraskevas Filntisis, Niki Efthymiou, George Retsinas, Athanasia Zlatintsi, Christos Garoufis, Thomas Sounapoglou, Panayotis Tsanakas, Nikolaos Smyrnis, Petros Maragos:
The 2nd E-Prevention Challenge: Psychotic and Non-Psychotic Relapse Detection Using Wearable-Based Digital Phenotyping. 125-126
- Lies Bollens, Corentin Puffay, Bernd Accou, Jonas Vanthornhout, Hugo Vanhamme, Tom Francart:
ICASSP 2024 Auditory EEG Decoding Challenge. 127-128
- Sayanton V. Dibbo, Juston S. Moore, Garrett T. Kenyon, Michael A. Teti:
LCANets++: Robust Audio Classification Using Multi-Layer Neural Networks with Lateral Competition. 129-133
- Alkis Koudounas, Eliana Pastor, Vittorio Mazzia, Manuel Giollo, Thomas Gueudré, Elisa Reale, Giuseppe Attanasio, Luca Cagliero, Sandro Cumani, Luca De Alfaro, Elena Baralis, Daniele Amberti:
Leveraging Confidence Models for Identifying Challenging Data Subgroups in Speech Models. 134-138
- Yi Zhu, Saurabh Powar, Tiago H. Falk:
Characterizing the Temporal Dynamics of Universal Speech Representations for Generalizable Deepfake Detection. 139-143
- Drew Grant, Helena Hahn, Adebayo Eisape, Valerie E. Rennoll, James E. West:
Multi-Modal Approaches for Improving the Robustness of Audio-Based COVID-19 Detection Systems. 144-148
- Yu Chen, Minglei Yang, Baixiao Chen, Meng Liu, Teng Ma:
A Method for Estimating Stationary Reflective Surfaces and Extracting Road Guardrails. 149-153
- Brayan Monroy, Jorge Bacca, Henry Arguello:
Predicting The Spectrum: Deep Adaptive Sensing for Hadamard Single Pixel Spectral Imaging. 154-158
- Kyriakos Lite, Bernhard Rinner:
Navigation for Autonomous Robots with Adaptive Information-Seeking. 159-163
- Anil Ganti, Michael A. Martinez, Granger Hickman, Jeffrey L. Krolik:
Wideband Adaptive Beamforming for a Partially-Calibrated Distributed Array. 164-168
- Fabio Broghammer, Thomas Wiedemann, Siwei Zhang, Benjamin Noack:
Simultaneous Gas Exploration and Network Localization with Robotic Swarms. 169-173
- Zexu Pan, Gordon Wichern, François G. Germain, Aswin Shanmugam Subramanian, Jonathan Le Roux:
Late Audio-Visual Fusion for in-the-Wild Speaker Diarization. 174-178
- Arijit Ukil, Angshul Majumdar, Antonio J. Jara, João Gama:
Deep Neural Network Model Compression and Signal Processing. 179-183
- Henrik Nilsson, Joakim Rydell, Anton Kullberg, Gustaf Hendeby:
Dronar: Obstacle Echolocation Using Drone Ego-Noise. 184-188
- Jiahua Wan, Hong Ren, Cunhua Pan, Zhiyuan Yu, Zhenkun Zhang, Yang Zhang:
Reconfigurable Intelligent Surface Assisted Integrated Sensing, Communication and Computation Systems. 189-193
- Weiqi Huang, Chenxu Zhang, Wen Qi:
Neuro-Mechanical Synergy: A Smart Idea of Intelligent Exoskeletons for Stroke Rehabilitation. 194-198
- Wageesha Manamperi, Thushara D. Abhayapala, Paul Holmberg:
Microphone Aligned Continuous Wearable Device-Related Transfer Function: Efficient Modeling and Measurements. 199-203
- Jun Li, Jihwan Youn, Ryan Wu, Jeroen Overdevest, Shunqiao Sun:
Performance Evaluation and Analysis of Thresholding-Based Interference Mitigation for Automotive Radar Systems. 204-208
- Qiancheng Wei, Yi Yuan, Ying Liu, Mohammed Ali Mohammed Al-Hababi, Qiya Su, Muyao Yu:
LASDNet: A Lightweight Adaptive Surface Defect Detection Network. 209-213
- Wanli Ni, Jiachen Han, Zhijin Qin:
Convergence Analysis of Semi-Federated Learning with Non-IID Data. 214-218
- Peng Wang, Yashuai Cao, Wanli Ni, Dongsheng Han:
Pareto-Optimal Waveform Design for Multi-User and Multi-Target MIMO-ISAC Systems. 219-223
- Xiaoye Wang, Fan Liu, Le Zheng:
Low-Overhead OFDM Sensing Waveform Design for 5G NR: A Coprime-Based Subcarrier Allocation Approach. 224-228
- Chenyuan Feng, Zhenyu Feng, Qing Wang:
Recommendation Algorithm Based on Federated Multi-modal Learning. 229-233
- Shuang Li, Yaxiu Sun, Zherui Zhang, Han Zhang, Yun Lin:
Spectrum Data Graph Structure Learning Based On Dual-View Contrastive Learning For Spectrum Prediction Of ISCC. 234-238
- Xuchen Li, Ronghao Lin, Hing Cheung So:
Sparse Array Design for MIMO Radar in Multipath Scenarios. 239-243
- Yhonatan Gayer, Vladimir Tourbabin, Zamir Ben-Hur, Jacob Donley, Boaz Rafaely:
Ambisonics Encoding For Arbitrary Microphone Arrays Incorporating Residual Channels For Binaural Reproduction. 244-248
- Sotirios Athanasoulias, Stavros Sykiotis, Nikos Temenos, Anastasios Doulamis, Nikolaos Doulamis:
A Pre-Training Pruning Strategy for Enabling Lightweight Non-Intrusive Load Monitoring On Edge Devices. 249-253
- Yu Yao, Feng Shu, Linlong Wu, Xu Cheng, Xuan Li, Jiangzhou Wang:
Anti-Jamming Strategy for IRS-Aided JRC System in Vehicular Networks. 254-258
- Edmond Ka Hin Chan, Yirui Deng, Amus Chee Yuen Goay, Deepak Mishra, Aruna Seneviratne:
Non-Invasive Occupancy Monitoring Using Sustainable Backscatter Tags. 259-263
- Giulia Slavic, Mattia Bracco, Lucio Marcenaro, David Martín Gómez, Carlo S. Regazzoni, Pamela Zontone:
Joint Data-Driven Analysis of Visual-Odometric Anomaly Signals in Generative AI-Based Agents. 264-268
- Sunil Bharitkar, Thaddeus Páez:
Edge-Optimized Model for Multimedia Classification using Linguistic Metadata. 269-273
- Yichun Li, Rajesh Nair, Syed Mohsen Naqvi:
Harnessing Video Intelligence: Intelligent System for ADHD Detection. 274-278
- Yirui Deng, Deepak Mishra, Shaghik Atakaramians, Aruna Seneviratne:
Smart CSI Processing for Accurate Commodity WiFi-Based Humidity Sensing. 279-283
- Jeroen Overdevest, Xinyi Wei, Hans Van Gorp, Ruud J. G. van Sloun:
Model-Based Diffusion for Mitigating Automotive Radar Interference. 284-288
- Pu Yang, Xiang-Gen Xia, Xichang Zhang, Jiaming Zhang, Yi Liu:
Full-Duplex Decode-And-Forward Relay for Joint Environmental Sensing And Self-Interference Cancellation. 289-293
- Hongwei Wang, Xi Zheng:
Robust Filtering of Distributed Cyber-Physical Systems with Cyber-Attack Detection. 294-298
- Mingfeng Zhang, Jinhua Sun, Xiaojun Wu:
Iterative Detection Algorithm with LDPC Decoder in Cyclic-Prefix OTFS System. 299-303
- Menghong Cai, Jun Fang, Huiping Duan, Xiaoyu Li, Hongbin Li:
A Variational Bayesian Inference-Inspired Unrolled Deep Network for Compressed Sensing. 304-308
- Xiao Wu, Xuan Mu, Wen Qi, Xiaorui Liu:
A Cross-Attention Emotion Recognition Algorithm Based on Audio and Video Modalities. 309-313
- Marios Nikolaos Militsis, Vasileios Mygdalis, Ioannis Pitas:
Soft ROI Pooling Distillation for Detection Transformers. 314-319
- Fuma Aki, Riku Ikeda, Takumi Saito, Ciaran Regan, Mizuki Oka:
Evolving Complex Environments in Evolution Gym using Large Language Models. 320-324
- Alfredo Nascita, Raffaele Carillo, Federica Giampetraglia, Antonio Iacono, Valerio Persico, Antonio Pescapè:
Interpretability and Complexity Reduction in IoT Network Anomaly Detection Via XAI. 325-329
- Arijit Ukil, Ishan Sahu, Mridul Biswas, Arpan Pal, Angshul Majumdar:
Structured Lottery Ticket Hypothesis for Effective Deep Neural Network Model Size Reduction. 330-334
- Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen:
Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding. 335-340
- Álvaro Martín-Cortinas, Daniel Sáez-Trigueros, Grzegorz Beringer, Iván Vallés-Pérez, Roberto Barra-Chicote, Biel Tura Vecino, Adam Gabrys, Thomas Merritt, Piotr Bilinski, Jaime Lorenzo-Trueba:
Investigating Self-Supervised Features for Expressive, Multilingual Voice Conversion. 341-345
- Antonino Ferraro, Valerio La Gatta, Marco Postiglione:
Empowering Network Security with Autoencoders. 346-350
- Cheng Kang, Daniel Novak, Katerina Urbanova, Yuqing Cheng, Yong Hu:
Domain-Specific Improvement on Psychotherapy Chatbot Using Assistant. 351-355
- Kaneez Khatoon, Deepak Mishra, Ravikant Saini:
IRS Assisted Secure NOMA for Untrusted Users. 356-360
- Giancarlo Sperlì, Andrea Vignali:
Anomaly Detection in Cyber-Physical Systems: A Case Study on Pump Health Monitoring. 361-364
- Yating Chen, Cai Wen, Yan Huang, Timothy N. Davidson:
CRB Optimization for Integrated Sensing and Communication Systems Using Hybrid Linear-Nonlinear Precoding. 365-369
- Xinrui Li, Baixiao Chen:
An Effective Composite Jamming Method for Synthetic Aperture Radar in Practical Electronic Countermeasures. 370-373
- Luca Barbieri, Mattia Brambilla, Monica Nicoli:
A Compressed Decentralized Federated Learning Frame Work for Enhanced Environmental Awareness in V2V Networks. 374-378
- Chengwei Lin, Wen Qi:
Integrating AI in Human-Robot Interaction: Emerging Challenges and Future Directions. 379-382
- Xinyu Huang, Yuwen Cao, Tomoaki Ohtsuki:
Deep Sub-Image Sampling Based Defense Against Spatial-Domain Adversarial Steganography. 383-387
- Weizhao Chen, Jiawang Wan, Fangwen Ye, Ran Wang, Cheng Xu:
QMARL: A Quantum Multi-Agent Reinforcement Learning Framework for Swarm Robots Navigation. 388-392
- Hollan Haule, Ian Piper, Patricia Jones, Tsz-Yan Milly Lo, Javier Escudero:
Collaborative Learning of Common Latent Representations in Routinely Collected Multivariate ICU Physiological Signals. 393-397
- Saad Mokssit, Daniel Bonilla Licea, Bassma Guermah, Mounir Ghogho:
An Object-Oriented Deep Learning Method for Video Frame Prediction. 398-402
- Yiwei Ding, Alexander Lerch:
Embedding Compression for Teacher-to-Student Knowledge Transfer. 403-407
- Sarthak Kumar Maharana, Krishna Kamal Adidam, Shoumik Nandi, Ajitesh Srivastava:
Acoustic-to-Articulatory Inversion for Dysarthric Speech: are Pre-Trained Self-Supervised Representations Favorable? 408-412
- Wenjie Li, Lei Chen, Bing Lv, Xiaobo Li, Weijun Wang, Kun Xu, Xilun Ding:
Deep Learning EEG Technology Development and Brain Computer Interface Chips Progress. 413-418
- Dror Jacoby, Jonatan Ostrometzky, Hagit Messer:
Integrated RNNs for Rainfall Sensing with Wireless Communication Networks. 419-423
- Yuzhuo Ren, Yining Deng, David Pajak, Robin Jenkin, Niranjan Avadhanam, Varsha Hedau:
Parameter Blending For Multi-Camera Harmonization For Automotive Surround View Systems. 424-428
- Zubair Shaban, Ranjitha Prasad, Pooja Kumari:
Over The Air Federated Learning in the Presence of Impulsive Noise. 429-433
- Patitapaban Palo, Satyajit Nayak, Durga Nagendra Raghava Kumar Modhugu, Satarupa Uttarkabat, Kwanit Gupta:
Drivesafe: Pose-Based Driver Monitoring In Software Defined Vehicle Using STGCN. 434-438
- Peter G. Vouras:
K-Space Beamforming for an Array of Quantum Sensors. 439-443
- Hyewon Han, Naveen Kumar:
A Cross-Talk Robust Multichannel VAD Model For Multiparty Agent Interactions Trained Using Synthetic Re-Recordings. 444-448
- Yi Luo, Rongzhi Gu:
Fast Random Approximation of Multi-Channel Room Impulse Response. 449-454
- Rino Kimura, Tomohiro Nakatani, Naoyuki Kamo, Marc Delcroix, Shoko Araki, Tetsuya Ueda, Shoji Makino:
Diffusion Model-Based MIMO Speech Denoising and Dereverberation. 455-459
- Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève:
Open Implementation and Study of Best-RQ for Speech Processing. 460-464
- Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-Yi Lee, Hsin-Min Wang, David Harwath:
SpeechCLIP+: Self-Supervised Multi-Task Representation Learning for Speech Via CLIP and Speech-Image Data. 465-469
- Thomas Deppisch, Jens Ahrens, Sebastià V. Amengual Garí, Paul Calamia:
Blind Estimation of Spatial Room Impulse Responses Using a Pseudo Reference Signal. 470-474
- Sebastian Löf, Cody Hesse, Carl Thomé, Carlos Lordelo, Jens Ahrens:
VICMus: Variance-Invariance-Covariance Regularization for Music Representation Learning. 475-479
- Haoming Guo, Seth Z. Zhao, Jiachen Lian, Gopala Anumanchipalli, Gerald Friedland:
Enhancing GAN-based Vocoders with Contrastive Learning Under Data-Limited Condition. 480-484
- Andreas Jonas Fuglsig, Jesper Jensen, Zheng-Hua Tan, Lars Søndergaard Bertelsen, Jens Christian Lindof, Jan Østergaard:
Joint Minimum Processing Beamforming and Near-End Listening Enhancement. 485-489
- Wei-Ting Lai, Lachlan Birnie, Thushara D. Abhayapala, Amy Bastine, Shaoheng Xu, Prasanga N. Samarasinghe:
A Two-Step Approach for Narrowband Source Localization in Reverberant Rooms. 490-494
- Fabian Ritter Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy H. M. Wong, Hung-Yi Lee, Eng Siong Chng, Nancy F. Chen:
Noise Robust Distillation of Self-Supervised Speech Models via Correlation Metrics. 495-499
- Alexander Bohlender, Ann Spriet, Wouter Tirry, Nilesh Madhu:
Insights into Magnitude and Phase Estimation by Masking and Mapping in DNN-Based Multichannel Speaker Separation. 500-504
- Moreno La Quatra, Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Sabato Marco Siniscalchi:
Benchmarking Representations for Speech, Music, and Acoustic Events. 505-509
- Takuya Higuchi, Avamarie Brueggeman, Masood Delfarah, Stephen Shum:
Multichannel Voice Trigger Detection Based on Transform-Average-Concatenate. 510-514
- Calum Heggan, Sam Budgett, Timothy M. Hospedales, Mehrdad Yaghoobi:
On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification. 515-519
- Chen Li:
External Knowledge Augmented Polyphone Disambiguation Using Large Language Model. 520-524
- Mohammad Bokaei, Jesper Jensen, Simon Doclo, Jan Østergaard:
Deep Low-Latency Joint Speech Transmission and Enhancement over A Gaussian Channel. 525-529
- Mingxue Song, Tetsuya Ueda, Ruifeng Zhang, Jiahui Hu, Shoji Makino:
Geometrically Constrained Joint Moving Source Extraction and Dereverberation Based on Constant Separating Vector Mixing Model. 530-534
- Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Takanori Ashihara, Shoko Araki, Jan Cernocký:
Probing Self-Supervised Learning Models With Target Speech Extraction. 535-539
- Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-Yi Lee:
Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR And Speech-to-Text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision. 540-544
- Shanshan Wang, Soumya Tripathy, Toni Heittola, Annamaria Mesaros:
Positive and Negative Sampling Strategies for Self-Supervised Learning on Audio-Video Data. 545-549
- Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain:
Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations. 550-554
- Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel:
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency. 555-559
- Natarajan Balaji Shankar, Ruchao Fan, Abeer Alwan:
SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR. 560-564
- Thai-Binh Nguyen, Alexander Waibel:
ConVoiFilter: A Case Study of Doing Cocktail Party Speech Recognition. 565-569
- Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation. 570-574
- Guang Chai, Zhibin Yu, Xiaofeng Wu, Muhammad Nabeel, Giuseppe Caire:
Joint Transmit and Receive Codebook Design for Self-Interference Suppression in a mmWave ISAC Terminal Device. 575-579
- Aditya Ravuri, Erica Cooper, Junichi Yamagishi:
Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction. 580-584
- Branimir Dropuljic, Miljenko Suflaj, Andrej Jertec, Leo Obadic:
Synthetic Speech Detection with Wav2vec 2.0 in Various Language Settings. 585-589
- Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai:
Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition. 590-594
- Srikanth Korse, Mohamed Elminshawi, Emanuël A. P. Habets, Srikanth Raj Chetupalli:
Training Strategies for Modality Dropout Resilient Multi-Modal Target Speaker Extraction. 595-599
- Ilyass Moummad, Nicolas Farrugia, Romain Serizel:
Self-Supervised Learning for Few-Shot Bird Sound Classification. 600-604
- Panos Kakoulidis, Nikolaos Ellinas, Georgios Vamvoukakis, Myrsini Christidou, Alexandra Vioni, Georgia Maniati, Junkwang Oh, Gunu Jho, Inchul Hwang, Pirros Tsiakoulis, Aimilios Chalamandaris:
Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations. 605-609
- Yi Zheng, Cunyi Liao, Ji Wang, Shouyin Liu:
A Transformer-Based Network for Unifying Radio Map Estimation and Optimized Site Selection. 610-614
- Yuanjie Wang, Zhanbo Feng, Zhenyu Liao:
FedRF-Adapt: Robust and Communication-Efficient Federated Domain Adaptation via Random Features. 615-619
- Arash Rasti-Meymandi, Pai Chet Ng, Huan Liu, Yuanhao Yu, Konstantinos N. Plataniotis:
Persota FL: A Robust-to-Noise Personalized Over the Air Federated Learning for Human Activity Recognition. 620-624
- Xiaoye Shi, Xiangcheng Su, Shu Cai, Zhaowei Zhang, Haibo Dai:
UAV Position Deployment and Power Optimization Based on User Clustering in IoT Network. 625-629
- Xavier F. Cadet, Ranya Aloufi, Sara Ahmadi-Abhari, Hamed Haddadi:
A Study on The Impact of Self-Supervised Learning on Automatic Dysarthric Speech Assessment. 630-634
- Seyed Mohammad Sheikholeslami, Pai Chet Ng, Huan Liu, Yuanhao Yu, Konstantinos N. Plataniotis:
Towards Collaborative Multimodal Federated Learning for Human Activity Recognition in Smart Workplace Environments. 635-639
- Kiyoshi Kurihara, Masanori Sano:
Low-Resourced Phonetic and Prosodic Feature Estimation With Self-Supervised-Learning-based Acoustic Modeling. 640-644
- Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-Yi Lee, David Harwath:
Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model. 645-649
- Weibin Li, Haifeng Zheng, Jun Fang, Xinxin Feng, Chunyan Cheng:
Intelligent Heavy Truck Platooning with ISCC for Enclosed Intermodal Railway-Road Transport Parks. 650-654
- Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu:
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification. 655-659
- Byoungsung Lim, David Han:
Cosmos: Coherent Scene with Multiple Objects Reconstruction. 660-664
- Jiahui Wang, Zan Xu, Ruxin Zhi, Liming Wang:
Reliability Study of LEO Satellite Networks Based on Random Linear Network Coding. 665-669
- Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote, James R. Glass:
Cross-Lingual Transfer Learning for Low-Resource Speech Translation. 670-674
- Luca Zampierin, Ghouthi Boukli Hacene, Bac Nguyen, Mirco Ravanelli:
Skill: Similarity-Aware Knowledge Distillation for Speech Self-Supervised Learning. 675-679
- Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters:
Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning. 680-684 - George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti:
Training Early-Exit Architectures for Automatic Speech Recognition: Fine-Tuning Pre-Trained Models or Training from Scratch. 685-689 - Yang Xiao, Rohan Kumar Das:
Dual Knowledge Distillation for Efficient Sound Event Detection. 690-694 - Xinyi Hu, Nikolaos Pappas, Howard H. Yang:
Version Age-Based Client Scheduling Policy for Federated Learning. 695-699 - Ioannis Ziogas, Hessa Alfalahi, Ahsan H. Khandoker, Leontios J. Hadjileontiadis:
Cochceps-Augment: A Novel Self-Supervised Contrastive Learning Using Cochlear Cepstrum-Based Masking for Speech Emotion Recognition. 700-704 - Tzu-Han Lin, How-Shing Wang, Hao-Yung Weng, Kuang-Chen Peng, Zih-Ching Chen, Hung-Yi Lee:
PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques. 705-709 - Tianyu Hu, Yunhang Xie, Shuai Wang, Lingxiang Li, Zhi Chen:
Directional Terahertz Radio Map-Assisted Federated Obstacle Sensing. 710-714 - David Nordlund, Jialing Liao, Zheng Chen:
Byzantine-Resilient Hierarchical Federated Learning with Clustered Over-The-Air Aggregation. 715-719 - Pantelis Mentesidis, Christos Papaioannidis, Ioannis Pitas:
Advancing Industrial Inspection: A Dataset for Automated Damage Detection in Insulated Pipes. 720-724 - Jingjie Fan, Rongzhi Gu, Yi Luo, Cong Pang:
A Unified Geometry-Aware Source Localization and Separation Framework for AD-HOC Microphone Array. 725-729 - Yuto Ishikawa, Kohei Konaka, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari:
Real-Time Speech Extraction Using Spatially Regularized Independent Low-Rank Matrix Analysis and Rank-Constrained Spatial Covariance Matrix Estimation. 730-734 - Hao Shi, Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani, Shoko Araki:
Ensemble Inference for Diffusion Model-Based Speech Enhancement. 735-739 - Diego Di Carlo, Aditya Arie Nugraha, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii:
Neural Steerer: Novel Steering Vector Synthesis with a Causal Neural Field over Frequency and Direction. 740-744 - H. Nazim Bicer, Cagdas Tuna, Andreas Walther, Emanuël A. P. Habets:
Data-Driven Joint Detection and Localization of Acoustic Reflectors. 745-749 - Karim Helwani, Masahito Togami, Paris Smaragdis, Michael M. Goodwin:
Sound Source Separation Using Latent Variational Block-Wise Disentanglement. 750-754 - Seokhyun Kim, Won Jeong, Hyung-Min Park:
Multi-Channel Speech Enhancement Using Beamforming and Nullforming for Severely Adverse Drone Environment. 755-759 - Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko:
Gated Low-Rank Adaptation for Personalized Code-Switching Automatic Speech Recognition on the Low-Spec Devices. 760-764 - Bo He, Shiqi Zhang, Xianrui Wang, Zheng Qiu, Daiki Takeuchi, Daisuke Niizumi, Noboru Harada, Shoji Makino:
Light Gated Multi Mini-Patch Extractor for Audio Classification. 765-769 - Zhiqiang Tan, Zhiwei Yao, Limin Xiao, Ming Zhao, Yunzhou Li:
A New Approach to Predict Radio Map via Learning-Based Spatial Loss Field. 770-774 - Thilo von Neumann, Christoph Böddeker, Tobias Cord-Landwehr, Marc Delcroix, Reinhold Haeb-Umbach:
Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization. 775-779 - Thomas Serre, Mathieu Fontaine, Éric Benhaim, Geoffroy Dutour, Slim Essid:
A Lightweight Dual-Stage Framework for Personalized Speech Enhancement Based on DeepFilterNet2. 780-784 - Akam Rahimi, Triantafyllos Afouras, Andrew Zisserman:
Voicevector: Multimodal Enrolment Vectors for Speaker Separation. 785-789 - Gyungmin Kim, Rui Jin, Wilson Funk, Seung-Jun Kim, Hyuk Lim:
Multiantenna Channel Map Estimation Using Deep Spatial Interpolation. 790-794 - Federico Miotello, Paolo Ostan, Mirco Pezzoli, Luca Comanducci, Alberto Bernardini, Fabio Antonacci, Augusto Sarti:
HOMULA-RIR: A Room Impulse Response Dataset for Teleconferencing and Spatial Audio Applications Acquired through Higher-Order Microphones and Uniform Linear Microphone Arrays. 795-799 - Niklas Vaara, Pekka Sangi, Miguel Bordallo López, Janne Heikkilä:
A Ray Launching Approach for Computing Exact Paths with Point Clouds. 800-804 - Olivia Zacharia, M. Vani Devi:
Super-Resolution Sensing of User Equipment Using Delay-Doppler Pilot-Data Structure in RIS-Aided OTFS Systems. 805-809 - Douglas W. Oard, Christopher Bearman, David Baker, Susannah Paletz, Johanne Trippas:
Fearless Steps APOLLO: Operational Disconnect Detection in Mission Control. 810-811 - Alkis Koudounas, Flavio Giobergia:
Houston we have a Divergence: A Subgroup Performance Analysis of ASR Models. 812-813 - Maron Schlemon, Martin Schulz, Rolf Scheiber, Marc Jäger, Joel A. Amao Oliva:
Real-Time Capability of DLR's Beamforming Synthetic Aperture Radar Processing Architecture. 814-817 - Luca Marinelli, Charalampos Saitis:
Explainable Modeling of Gender-Targeting Practices in Toy Advertising Sound and Music. 818-822 - Cameron Churchwell, Max Morrison, Bryan Pardo:
High-Fidelity Neural Phonetic Posteriorgrams. 823-827 - Masato Hagiwara, Marius Miron, Jen-Yu Liu:
ISPA: Inter-Species Phonetic Alphabet for Transcribing Animal Sounds. 828-832 - Pablo Alonso-Jiménez, Leonardo Pepino, Roser Batlle-Roca, Pablo Zinemanas, Dmitry Bogdanov, Xavier Serra, Martín Rocamora:
Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio. 833-837 - Zeyu Zhao, Pinzhen Chen, Peter Bell:
Regarding Topology and Adaptability in Differentiable WFST-Based E2E ASR. 843-847 - Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury:
Speech Representation Analysis Based on Inter- and Intra-Model Similarities. 848-852 - Luca Della Libera, Cem Subakan, Mirco Ravanelli:
Focal Modulation Networks for Interpretable Sound Classification. 853-857 - Maxime Jacquelin, Maëva Garnier, Laurent Girin, Rémy Vincent, Olivier Perrotin:
Exploring the Multidimensional Representation of Unidimensional Speech Acoustic Parameters Extracted by Deep Unsupervised Models. 858-862 - Golara Javadi, Kamer Ali Yuksel, Yunsu Kim, Thiago Castro Ferreira, Mohamed Al-Badrashiny:
Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing Through Analyzing Attentions of a Reference-Free Metric. 863-867 - Zeyu Zhao, Peter Bell, Ondrej Klejch:
Exploring Dominant Paths in CTC-Like ASR Models: Unraveling the Effectiveness of Viterbi Decoding. 868-872 - Chang-Bin Jeon, Gordon Wichern, François G. Germain, Jonathan Le Roux:
Why Does Music Source Separation Benefit from Cacophony? 873-877 - Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou:
Perceptual Musical Features for Interpretable Audio Tagging. 878-882 - Davide Salvi, Temesgen Semu Balcha, Paolo Bestagini, Stefano Tubaro:
Listening Between the Lines: Synthetic Speech Detection Disregarding Verbal Content. 883-887