| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models | R Huang, J Huang, D Yang, Y Ren, L Liu, M Li, Z Ye, J Liu, X Yin, Z Zhao | International Conference on Machine Learning, 13916-13932, 2023 | 217 | 2023 |
| Bytesing: A chinese singing voice synthesis system using duration allocated encoder-decoder acoustic models and wavernn vocoders | Y Gu, X Yin, Y Rao, Y Wan, B Tang, Y Zhang, J Chen, Y Wang, Z Ma | 2021 12th International Symposium on Chinese Spoken Language Processing …, 2021 | 86 | 2021 |
| Mega-tts: Zero-shot text-to-speech at scale with intrinsic inductive bias | Z Jiang, Y Ren, Z Ye, J Liu, C Zhang, Q Yang, S Ji, R Huang, C Wang, ... | arXiv preprint arXiv:2306.03509, 2023 | 50 | 2023 |
| Ppg-based singing voice conversion with adversarial representation learning | Z Li, B Tang, X Yin, Y Wan, L Xu, C Shen, Z Ma | ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 39 | 2021 |
| A unified sequence-to-sequence front-end model for mandarin text-to-speech synthesis | J Pan, X Yin, Z Zhang, S Liu, Y Zhang, Z Ma, Y Wang | ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 34 | 2020 |
| Make-an-audio 2: Temporal-enhanced text-to-audio generation | J Huang, Y Ren, R Huang, D Yang, Z Ye, C Zhang, J Liu, X Yin, Z Ma, ... | arXiv preprint arXiv:2305.18474, 2023 | 33 | 2023 |
| Towards realistic visual dubbing with heterogeneous sources | T Xie, L Liao, C Bi, B Tang, X Yin, J Yang, M Wang, J Yao, Y Zhang, Z Ma | Proceedings of the 29th ACM International Conference on Multimedia, 1739-1747, 2021 | 33 | 2021 |
| Modeling F0 trajectories in hierarchically structured deep neural networks | X Yin, M Lei, Y Qian, FK Soong, L He, ZH Ling, LR Dai | Speech Communication 76, 82-92, 2016 | 32 | 2016 |
| A hybrid text normalization system using multi-head self-attention for mandarin | J Zhang, J Pan, X Yin, C Li, S Liu, Y Zhang, Y Wang, Z Ma | ICASSP 2020-2020 IEEE international conference on acoustics, speech and …, 2020 | 25 | 2020 |
| Clinical efficacy of bone cement-injectable cannulated pedicle screw short segment fixation for lumbar spondylolisthesis with osteoporosis | Y Liu, J Xiao, X Yin, M Liu, J Zhao, P Liu, F Dai | Scientific reports 10 (1), 3929, 2020 | 25 | 2020 |
| Mega-tts 2: Zero-shot text-to-speech with arbitrary length speech prompts | Z Jiang, J Liu, Y Ren, J He, C Zhang, Z Ye, P Wei, C Wang, X Yin, Z Ma, ... | arXiv preprint arXiv:2307.07218, 2023 | 21 | 2023 |
| Cross-speaker emotion transfer based on speaker condition layer normalization and semi-supervised training in text-to-speech | P Wu, J Pan, C Xu, J Zhang, L Wu, X Yin, Z Ma | arXiv preprint arXiv:2110.04153, 2021 | 19 | 2021 |
| Geneface++: Generalized and stable real-time audio-driven 3d talking face generation | Z Ye, J He, Z Jiang, R Huang, J Huang, J Liu, Y Ren, X Yin, Z Ma, Z Zhao | arXiv preprint arXiv:2305.00787, 2023 | 18 | 2023 |
| Biomechanical influence of the surgical approaches, implant length and density in stabilizing ankylosing spondylitis cervical spine fracture | Y Liu, Z Wang, M Liu, X Yin, J Liu, J Zhao, P Liu | Scientific reports 11 (1), 6023, 2021 | 18 | 2021 |
| Towards high-fidelity singing voice conversion with acoustic reference and contrastive predictive coding | C Wang, Z Li, B Tang, X Yin, Y Wan, Y Yu, Z Ma | arXiv preprint arXiv:2110.04754, 2021 | 17 | 2021 |
| Virtual try-on with pose-garment keypoints guided inpainting | Z Li, P Wei, X Yin, Z Ma, AC Kot | Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 16 | 2023 |
| Fine-Grained Prosody Modeling in Neural Speech Synthesis Using ToBI Representation | Y Zou, S Liu, X Yin, H Lin, C Wang, H Zhang, Z Ma | Interspeech, 3146-3150, 2021 | 16 | 2021 |
| Clapspeech: Learning prosody from text context with contrastive language-audio pre-training | Z Ye, R Huang, Y Ren, Z Jiang, J Liu, J He, X Yin, Z Zhao | arXiv preprint arXiv:2305.10763, 2023 | 15 | 2023 |
| A chapter-wise understanding system for text-to-speech in Chinese novels | J Pan, L Wu, X Yin, P Wu, C Xu, Z Ma | ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 14 | 2021 |
| Real3d-portrait: One-shot realistic 3d talking portrait synthesis | Z Ye, T Zhong, Y Ren, J Yang, W Li, J Huang, Z Jiang, J He, R Huang, ... | arXiv preprint arXiv:2401.08503, 2024 | 13 | 2024 |