Publications
The MultiGEC dataset is presented in the paper:
- Arianna Masciolini, Andrew Caines, Orphée De Clercq, Joni Kruijsbergen, Murathan Kurfalı, Ricardo Muñoz Sánchez, Elena Volodina, Robert Östling, Kais Allkivi, Špela Arhar Holdt, Ilze Auzina, Roberts Darģis, Elena Drakonaki, Jennifer-Carmen Frey, Isidora Glišić, Pinelopi Kikilintza, Lionel Nicolas, Mariana Romanyshyn, Alexandr Rosen, Alla Rozovskaya, Kristjan Suluste, Oleksiy Syvokon, Alexandros Tantos, Despoina-Ourania Touriki, Konstantinos Tsiotskas, Eleni Tsourilla, Vassilis Varsamopoulos, Katrin Wisniewski, Aleš Žagar, and Torsten Zesch. Towards better language representation in Natural Language Processing - a multilingual dataset for text-level Grammatical Error Correction. International Journal of Learner Corpus Research, 2025 [full text] [bibtex]
Additional details about the state of GEC for the 12 MultiGEC languages at the time the dataset was compiled are provided in:
- Arianna Masciolini, Andrew Caines, Orphée De Clercq, Joni Kruijsbergen, Murathan Kurfalı, Ricardo Muñoz Sánchez, Elena Volodina, Robert Östling, Kais Allkivi, Špela Arhar Holdt, Ilze Auzina, Roberts Darģis, Elena Drakonaki, Jennifer-Carmen Frey, Isidora Glišić, Pinelopi Kikilintza, Lionel Nicolas, Mariana Romanyshyn, Alexandr Rosen, Alla Rozovskaya, Kristjan Suluste, Oleksiy Syvokon, Alexandros Tantos, Despoina-Ourania Touriki, Konstantinos Tsiotskas, Eleni Tsourilla, Vassilis Varsamopoulos, Katrin Wisniewski, Aleš Žagar, and Torsten Zesch. An overview of Grammatical Error Correction for the twelve MultiGEC-2025 languages. Gothenburg, Sweden, 2025. Institution for Swedish, Multilingualism, Language Technology; University of Gothenburg [full text] [bibtex]
An overview of the MultiGEC-2025 shared task is given in the paper:
- Arianna Masciolini, Andrew Caines, Orphée De Clercq, Joni Kruijsbergen, Murathan Kurfalı, Ricardo Muñoz Sánchez, Elena Volodina, and Robert Östling. The MultiGEC-2025 shared task on Multilingual Grammatical Error Correction at NLP4CALL. In Ricardo Muñoz Sánchez, David Alfter, Elena Volodina, and Jelena Kallas, editors, Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning, pages 1-33, Tallinn, Estonia, March 2025. University of Tartu Library [full text] [bibtex]
Other publications using MultiGEC
- Ryszard Staruch. UAM-CSI at MultiGEC-2025: Parameter-efficient LLM fine-tuning for multilingual grammatical error correction. In Ricardo Muñoz Sánchez, David Alfter, Elena Volodina, and Jelena Kallas, editors, Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning, pages 42-49, Tallinn, Estonia, March 2025. University of Tartu Library [full text] [bibtex] (winner of the 2025 shared task)
- Olga Seminck, Yoann Dupont, Mathieu Dehouck, Qi Wang, Noé Durandard, and Margo Novikov. Lattice @MultiGEC-2025: A spitful multilingual language error correction system using LLaMA. In Ricardo Muñoz Sánchez, David Alfter, Elena Volodina, and Jelena Kallas, editors, Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning, pages 34-41, Tallinn, Estonia, March 2025. University of Tartu Library [full text] [bibtex]