Jaume Zaragoza-Bernabeu
Prompsit Language Engineering
Verified email at prompsit.com
Title
Cited by
Year
ParaCrawl: Web-scale acquisition of parallel corpora
M Bañón, P Chen, B Haddow, K Heafield, H Hoang, M Esplà-Gomis, ...
Association for Computational Linguistics (ACL), 2020
Cited by 204, 2020
Bifixer and bicleaner: two open-source tools to clean your parallel data
G Ramírez‐Sánchez, J Zaragoza-Bernabeu, M Bañón, S Ortiz-Rojas
Proceedings of the 22nd Annual Conference of the European Association for …, 2020
Cited by 35, 2020
Multilingual comparable corpora of parliamentary debates ParlaMint 2.1
T Erjavec, M Ogrodniczuk, P Osenova, N Ljubešić, K Simov, V Grigorova, ...
CLARIN ERIC, 2021
Cited by 15, 2021
MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages
M Bañón, M Esplà-Gomis, ML Forcada, C García-Romero, T Kuzman, ...
23rd Annual Conference of the European Association for Machine Translation …, 2022
Cited by 11, 2022
Bicleaner at WMT 2020: Universitat d’Alacant-Prompsit’s submission to the parallel corpus filtering shared task
M Esplà-Gomis, VM Sánchez-Cartagena, J Zaragoza-Bernabeu, ...
Proceedings of the fifth conference on machine translation, 952-958, 2020
Cited by 11, 2020
Bicleaner AI: Bicleaner goes neural
J Zaragoza-Bernabeu, G Ramírez‐Sánchez, M Bañón, S Ortiz-Rojas
Proceedings of the Thirteenth Language Resources and Evaluation Conference …, 2022
Cited by 8, 2022
Slovene-English parallel corpus MaCoCu-sl-en 2.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
Cited by 4, 2023
HPLT: High Performance Language Technologies
M Aulamo, N Bogoychev, S Ji, G Nail, G Ramírez‐Sánchez, J Tiedemann, ...
Proceedings of the 24th Annual Conference of the European Association for …, 2023
Cited by 2, 2023
Croatian-English parallel corpus MaCoCu-hr-en 2.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
Cited by 2, 2023
Serbian-English parallel corpus MaCoCu-sr-en 1.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
Cited by 2*, 2023
FastSpell: the LangId Magic Spell
M Bañón, J Zaragoza-Bernabeu, G Ramírez-Sánchez, S Ortiz-Rojas
arXiv preprint arXiv:2404.08345, 2024
Cited by 1, 2024
OpusCleaner and OpusTrainer, open source toolkits for training Machine Translation and Large language models
N Bogoychev, J van der Linde, G Nail, B Haddow, J Zaragoza-Bernabeu, ...
arXiv preprint arXiv:2311.14838, 2023
Cited by 1, 2023
Bulgarian-English parallel corpus MaCoCu-bg-en 2.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
Cited by 1, 2023
Human evaluation of web-crawled parallel corpora for machine translation
G Ramírez‐Sánchez, M Bañón, J Zaragoza-Bernabeu, S Ortiz-Rojas
Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval …, 2022
Cited by 1, 2022
A New Massive Multilingual Dataset for High-Performance Language Technologies
O de Gibert, G Nail, N Arefyev, M Bañón, J van der Linde, S Ji, ...
arXiv preprint arXiv:2403.14009, 2024
2024
Ukrainian-English parallel corpus MaCoCu-uk-en 1.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
2023
Albanian-English parallel corpus MaCoCu-sq-en 1.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
2023
Icelandic-English parallel corpus MaCoCu-is-en 2.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
2023
Turkish-English parallel corpus MaCoCu-tr-en 2.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
2023
Maltese-English parallel corpus MaCoCu-mt-en 2.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
2023