Anwen Hu

Cited by

	All	Since 2019
Citations	710	709
h-index	9	9
i10-index	8	8

Citations per year
	2020	2021	2022	2023	2024
Citations	6	29	67	338	269

Public access (based on funding mandates)

7 articles available
0 articles not available

Co-authors

Haiyang Xu	Alibaba Group, DIDI AI LABS, SEU	Verified email at seu.edu.cn
Qin Jin	School of Information, Renmin University of China	Verified email at ruc.edu.cn
Qinghao Ye	DAMO Academy, Alibaba Group; University of California, San Diego	Verified email at alibaba-inc.com
Guohai Xu	DAMO Academy, Alibaba Group	Verified email at alibaba-inc.com
Shizhe Chen	INRIA Paris	Verified email at inria.fr
Dou Zhicheng	Renmin University of China	Verified email at ruc.edu.cn
Ji-Rong Wen	Renmin University of China	Verified email at ruc.edu.cn
Jian-Yun Nie	University of Montreal	Verified email at iro.umontreal.ca

Anwen Hu

Alibaba Group

Verified email at ruc.edu.cn

Multimodal Pretraining, Image Captioning


Title	Authors	Venue	Cited by	Year
mPLUG-Owl: Modularization empowers large language models with multimodality	Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang, A Hu, P Shi, Y Shi, C Li, ...	arXiv preprint arXiv:2304.14178, 2023	397	2023
WenLan: Bridging vision and language by large-scale multi-modal pre-training	Y Huo, M Zhang, G Liu, H Lu, Y Gao, G Yang, J Wen, H Zhang, B Xu, ...	arXiv preprint arXiv:2103.06561, 2021	116	2021
mPLUG-DocOwl: Modularized multimodal large language model for document understanding	J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao, G Xu, C Li, J Tian, Q Qi, ...	arXiv preprint arXiv:2307.02499, 2023	45	2023
Leveraging multi-token entities in document-level named entity recognition	A Hu, Z Dou, JY Nie, JR Wen	Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 7961-7968, 2020	29	2020
UReader: Universal OCR-free visually-situated language understanding with multimodal large language model	J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li, J Tian, Q Qian, J Zhang, Q Jin, ...	arXiv preprint arXiv:2310.05126, 2023	27	2023
ICECAP: Information concentrated entity-aware image captioning	A Hu, S Chen, Q Jin	Proceedings of the 28th ACM International Conference on Multimedia, 4217-4225, 2020	18	2020
A roadmap for big model	S Yuan, H Zhao, S Zhao, J Leng, Y Liang, X Wang, J Yu, X Lv, Z Shao, ...	arXiv preprint arXiv:2203.14101, 2022	17	2022
Question-controlled text-aware image captioning	A Hu, S Chen, Q Jin	Proceedings of the 29th ACM International Conference on Multimedia, 3097-3105, 2021	13	2021
mPLUG-PaperOwl: Scientific diagram analysis with the multimodal large language model	A Hu, Y Shi, H Xu, J Ye, Q Ye, M Yan, C Li, Q Qian, J Zhang, F Huang	arXiv preprint arXiv:2311.18248, 2023	9	2023
Youku-mPLUG: A 10 million large-scale Chinese video-language dataset for pre-training and benchmarks	H Xu, Q Ye, X Wu, M Yan, Y Miao, J Ye, G Xu, A Hu, Y Shi, G Xu, C Li, ...	arXiv preprint arXiv:2306.04362, 2023	6	2023
Movie101: A new movie understanding benchmark	Z Yue, Q Zhang, A Hu, L Zhang, Z Wang, Q Jin	arXiv preprint arXiv:2305.12140, 2023	5	2023
mPLUG-DocOwl 1.5: Unified structure learning for OCR-free document understanding	A Hu, H Xu, J Ye, M Yan, L Zhang, B Zhang, C Li, J Zhang, Q Jin, F Huang, ...	arXiv preprint arXiv:2403.12895, 2024	4	2024
MPMQA: Multimodal question answering on product manuals	L Zhang, A Hu, J Zhang, S Hu, Q Jin	Proceedings of the AAAI Conference on Artificial Intelligence 37 (11), 13958 …, 2023	4	2023
Accommodating audio modality in CLIP for multimodal processing	L Ruan, A Hu, Y Song, L Zhang, S Zheng, Q Jin	Proceedings of the AAAI Conference on Artificial Intelligence 37 (8), 9641-9649, 2023	4	2023
Generalizing multimodal pre-training into multilingual via language acquisition	L Zhang, A Hu, Q Jin	arXiv preprint arXiv:2206.11091, 2022	4	2022
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation	A Hu, S Chen, L Zhang, Q Jin	arXiv preprint arXiv:2305.06002, 2023	3	2023
Document-level named entity recognition by incorporating global and neighbor features	A Hu, Z Dou, J Wen	Information Retrieval: 25th China Conference, CCIR 2019, Fuzhou, China …, 2019	3	2019
Multimodal pretraining from monolingual to multilingual	L Zhang, L Ruan, A Hu, Q Jin	Machine Intelligence Research 20 (2), 220-232, 2023	2	2023
Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval	Y Shi, H Liu, H Xu, Z Ma, Q Ye, A Hu, M Yan, J Zhang, F Huang, C Yuan, ...	Proceedings of the 31st ACM International Conference on Multimedia, 4460-4470, 2023	1	2023
Explore and Tell: Embodied Visual Captioning in 3D Environments	A Hu, S Chen, L Zhang, Q Jin	Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	1	2023
