Zhexin Zhang
Verified email at mails.tsinghua.edu.cn
Title
Cited by
Year
Safety assessment of Chinese large language models
H Sun, Z Zhang, J Deng, J Cheng, M Huang
arXiv preprint arXiv:2304.10436, 2023
Cited by: 51, Year: 2023
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics
J Guan, Z Zhang, Z Feng, Z Liu, W Ding, X Mao, C Fan, M Huang
ACL 2021, 2021
Cited by: 39, Year: 2021
SafetyBench: Evaluating the safety of large language models with multiple choice questions
Z Zhang, L Lei, L Wu, R Sun, Y Huang, C Long, X Liu, X Lei, J Tang, ...
arXiv preprint arXiv:2309.07045, 2023
Cited by: 35, Year: 2023
Recent advances towards safe, responsible, and moral dialogue systems: A survey
J Deng, H Sun, Z Zhang, J Cheng, M Huang
arXiv preprint arXiv:2302.09270, 2023
Cited by: 26, Year: 2023
Defending large language models against jailbreaking attacks through goal prioritization
Z Zhang, J Yang, P Ke, M Huang
arXiv preprint arXiv:2311.09096, 2023
Cited by: 22, Year: 2023
Unveiling the implicit toxicity in large language models
J Wen, P Ke, H Sun, Z Zhang, C Li, J Bai, M Huang
arXiv preprint arXiv:2311.17391, 2023
Cited by: 13, Year: 2023
Persona-Guided Planning for Controlling the Protagonist's Persona in Story Generation
Z Zhang, J Wen, J Guan, M Huang
NAACL 2022, 2022
Cited by: 13, Year: 2022
Ethicist: Targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation
Z Zhang, J Wen, M Huang
arXiv preprint arXiv:2307.04401, 2023
Cited by: 7, Year: 2023
MoralDial: A framework to train and evaluate moral dialogue systems via moral discussions
H Sun, Z Zhang, F Mi, Y Wang, W Liu, J Cui, B Wang, Q Liu, M Huang
arXiv preprint arXiv:2212.10720, 2022
Cited by: 7, Year: 2022
Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation
Z Zhang, J Cheng, H Sun, J Deng, F Mi, Y Wang, L Shang, M Huang
EMNLP 2022 Findings, 2022
Cited by: 6, Year: 2022
Automatic comment generation for Chinese student narrative essays
Z Zhang, J Guan, G Xu, Y Tian, M Huang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022
Cited by: 4, Year: 2022
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
Z Zhang, Y Lu, J Ma, D Zhang, R Li, P Ke, H Sun, L Sha, Z Sui, H Wang, ...
arXiv preprint arXiv:2402.16444, 2024
Cited by: 2, Year: 2024
InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning
Z Zhang, J Cheng, H Sun, J Deng, M Huang
Findings of the Association for Computational Linguistics: EMNLP 2023, 10421 …, 2023
Cited by: 2, Year: 2023
Enhancing Offensive Language Detection with Data Augmentation and Knowledge Distillation
J Deng, Z Chen, H Sun, Z Zhang, J Wu, S Nakagawa, F Ren, M Huang
Research 6, 0189, 2023
Cited by: 1, Year: 2023
Selecting Stickers in Open-Domain Dialogue through Multitask Learning
Z Zhang, Y Zhu, Z Fei, J Zhang, J Zhou
ACL 2022 Findings, 2022
Cited by: 1, Year: 2022
MoralDial: A framework to train and evaluate moral dialogue systems via constructing moral discussions
H Sun, Z Zhang, F Mi, Y Wang, W Liu, J Cui, B Wang, Q Liu, M Huang
arXiv preprint arXiv:2212.10720, 2022
Cited by: 1, Year: 2022
Self-Supervised Sentence Polishing by Adding Engaging Modifiers
Z Zhang, J Guan, X Cui, Y Ran, B Liu, M Huang
Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023
Year: 2023