ByteSing: A Chinese singing voice synthesis system using duration allocated encoder-decoder acoustic models and WaveRNN vocoders Y Gu, X Yin, Y Rao, Y Wan, B Tang, Y Zhang, J Chen, Y Wang, Z Ma 2021 12th International Symposium on Chinese Spoken Language Processing …, 2021 | 86 | 2021 |
Waveform modeling and generation using hierarchical recurrent neural networks for speech bandwidth extension ZH Ling, Y Ai, Y Gu, LR Dai IEEE/ACM Transactions on Audio, Speech, and Language Processing 26 (5), 883-894, 2018 | 81 | 2018 |
Speech bandwidth extension using bottleneck features and deep recurrent neural networks. Y Gu, ZH Ling, LR Dai Interspeech, 297-301, 2016 | 57 | 2016 |
A Kinect based gesture recognition algorithm using GMM and HMM Y Song, Y Gu, P Wang, Y Liu, A Li 2013 6th International Conference on Biomedical Engineering and Informatics …, 2013 | 34 | 2013 |
Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension. Y Gu, ZH Ling INTERSPEECH, 1123-1127, 2017 | 29 | 2017 |
Human action recognition based on depth images from Microsoft Kinect T Liu, Y Song, Y Gu, A Li 2013 Fourth Global Congress on Intelligent Systems, 200-204, 2013 | 29 | 2013 |
Multi-task WaveNet: A multi-task generative model for statistical parametric speech synthesis without fundamental frequency conditions Y Gu, Y Kang arXiv preprint arXiv:1806.08619, 2018 | 23 | 2018 |
Restoring high frequency spectral envelopes using neural networks for speech bandwidth extension Y Gu, ZH Ling 2015 International Joint Conference on Neural Networks (IJCNN), 1-8, 2015 | 11 | 2015 |
Video-to-audio generation with hidden alignment M Xu, C Li, Y Ren, R Chen, Y Gu, W Liang, D Yu arXiv preprint arXiv:2407.07464, 2024 | 5 | 2024 |
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation Q Zhu, J Zhang, Y Gu, Y Hu, L Dai Proceedings of the AAAI Conference on Artificial Intelligence 38 (17), 19768 …, 2024 | 5 | 2024 |
Rep2wav: Noise robust text-to-speech using self-supervised representations Q Zhu, Y Gu, R Chen, C Weng, Y Hu, L Dai, J Zhang arXiv preprint arXiv:2308.14553, 2023 | 4 | 2023 |
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance S Chen, Y Gu, J Zhang, N Li, R Chen, L Chen, L Dai arXiv preprint arXiv:2406.05325, 2024 | 3 | 2024 |
Sifisinger: A High-Fidelity End-to-End Singing Voice Synthesizer Based on Source-Filter Model J Cui, Y Gu, C Weng, J Zhang, L Chen, L Dai ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 2 | 2024 |
DurIAN-E: Duration informed attention network for expressive text-to-speech synthesis Y Gu, Y Bian, G Lei, C Weng, D Su arXiv preprint arXiv:2309.12792, 2023 | 2 | 2023 |
Speech vocoder based on deep convolutional neural networks HC Wu, Y Gu, ZH Ling Proc. of the 14th National Conference on Man-Machine Speech Communication …, 2017 | 2* | 2017 |
LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation S Chen, Y Gu, J Cui, J Zhang, R Chen, L Dai arXiv preprint arXiv:2408.12354, 2024 | 1 | 2024 |
EEG2Vec: Self-Supervised Electroencephalographic Representation Learning Q Zhu, X Zhao, J Zhang, Y Gu, C Weng, Y Hu arXiv preprint arXiv:2305.13957, 2023 | 1 | 2023 |
Video-to-Audio Generation with Fine-grained Temporal Semantics Y Hu, Y Gu, C Li, R Chen, D Yu arXiv preprint arXiv:2409.14709, 2024 | | 2024 |
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment Y Ren, C Li, M Xu, W Liang, Y Gu, R Chen, D Yu arXiv preprint arXiv:2409.08601, 2024 | | 2024 |
Opine: Leveraging a Optimization-Inspired Deep Unfolding Method for Multi-Channel Speech Enhancement A Li, R Chen, Y Gu, C Weng, D Su ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | | 2024 |