Anwen Hu
Alibaba Group
Verified email at ruc.edu.cn
Title
Cited by
Year
mPLUG-Owl: Modularization empowers large language models with multimodality
Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang, A Hu, P Shi, Y Shi, C Li, ...
arXiv preprint arXiv:2304.14178, 2023
Cited by: 397; Year: 2023
WenLan: Bridging vision and language by large-scale multi-modal pre-training
Y Huo, M Zhang, G Liu, H Lu, Y Gao, G Yang, J Wen, H Zhang, B Xu, ...
arXiv preprint arXiv:2103.06561, 2021
Cited by: 116; Year: 2021
mPLUG-DocOwl: Modularized multimodal large language model for document understanding
J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao, G Xu, C Li, J Tian, Q Qi, ...
arXiv preprint arXiv:2307.02499, 2023
Cited by: 45; Year: 2023
Leveraging multi-token entities in document-level named entity recognition
A Hu, Z Dou, JY Nie, JR Wen
Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 7961-7968, 2020
Cited by: 29; Year: 2020
UReader: Universal OCR-free visually-situated language understanding with multimodal large language model
J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li, J Tian, Q Qian, J Zhang, Q Jin, ...
arXiv preprint arXiv:2310.05126, 2023
Cited by: 27; Year: 2023
ICECAP: information concentrated entity-aware image captioning
A Hu, S Chen, Q Jin
Proceedings of the 28th ACM International Conference on Multimedia, 4217-4225, 2020
Cited by: 18; Year: 2020
A roadmap for big model
S Yuan, H Zhao, S Zhao, J Leng, Y Liang, X Wang, J Yu, X Lv, Z Shao, ...
arXiv preprint arXiv:2203.14101, 2022
Cited by: 17; Year: 2022
Question-controlled text-aware image captioning
A Hu, S Chen, Q Jin
Proceedings of the 29th ACM International Conference on Multimedia, 3097-3105, 2021
Cited by: 13; Year: 2021
mPLUG-PaperOwl: Scientific diagram analysis with the multimodal large language model
A Hu, Y Shi, H Xu, J Ye, Q Ye, M Yan, C Li, Q Qian, J Zhang, F Huang
arXiv preprint arXiv:2311.18248, 2023
Cited by: 9; Year: 2023
Youku-mPLUG: A 10 million large-scale Chinese video-language dataset for pre-training and benchmarks
H Xu, Q Ye, X Wu, M Yan, Y Miao, J Ye, G Xu, A Hu, Y Shi, G Xu, C Li, ...
arXiv preprint arXiv:2306.04362, 2023
Cited by: 6; Year: 2023
Movie101: A new movie understanding benchmark
Z Yue, Q Zhang, A Hu, L Zhang, Z Wang, Q Jin
arXiv preprint arXiv:2305.12140, 2023
Cited by: 5; Year: 2023
mPLUG-DocOwl 1.5: Unified structure learning for OCR-free document understanding
A Hu, H Xu, J Ye, M Yan, L Zhang, B Zhang, C Li, J Zhang, Q Jin, F Huang, ...
arXiv preprint arXiv:2403.12895, 2024
Cited by: 4; Year: 2024
MPMQA: multimodal question answering on product manuals
L Zhang, A Hu, J Zhang, S Hu, Q Jin
Proceedings of the AAAI Conference on Artificial Intelligence 37 (11), 13958 …, 2023
Cited by: 4; Year: 2023
Accommodating audio modality in CLIP for multimodal processing
L Ruan, A Hu, Y Song, L Zhang, S Zheng, Q Jin
Proceedings of the AAAI Conference on Artificial Intelligence 37 (8), 9641-9649, 2023
Cited by: 4; Year: 2023
Generalizing multimodal pre-training into multilingual via language acquisition
L Zhang, A Hu, Q Jin
arXiv preprint arXiv:2206.11091, 2022
Cited by: 4; Year: 2022
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation
A Hu, S Chen, L Zhang, Q Jin
arXiv preprint arXiv:2305.06002, 2023
Cited by: 3; Year: 2023
Document-level named entity recognition by incorporating global and neighbor features
A Hu, Z Dou, J Wen
Information Retrieval: 25th China Conference, CCIR 2019, Fuzhou, China …, 2019
Cited by: 3; Year: 2019
Multimodal pretraining from monolingual to multilingual
L Zhang, L Ruan, A Hu, Q Jin
Machine Intelligence Research 20 (2), 220-232, 2023
Cited by: 2; Year: 2023
Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval
Y Shi, H Liu, H Xu, Z Ma, Q Ye, A Hu, M Yan, J Zhang, F Huang, C Yuan, ...
Proceedings of the 31st ACM International Conference on Multimedia, 4460-4470, 2023
Cited by: 1; Year: 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
A Hu, S Chen, L Zhang, Q Jin
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 1; Year: 2023