Publications

The MultiGEC dataset (version 1) is presented in the paper:

Arianna Masciolini, Andrew Caines, Orphée De Clercq, Joni Kruijsbergen, Murathan Kurfalı, Ricardo Muñoz Sánchez, Elena Volodina, Robert Östling, Kais Allkivi, Špela Arhar Holdt, Ilze Auzina, Roberts Darģis, Elena Drakonaki, Jennifer-Carmen Frey, Isidora Glišić, Pinelopi Kikilintza, Lionel Nicolas, Mariana Romanyshyn, Alexandr Rosen, Alla Rozovskaya, Kristjan Suluste, Oleksiy Syvokon, Alexandros Tantos, Despoina-Ourania Touriki, Konstantinos Tsiotskas, Eleni Tsourilla, Vassilis Varsamopoulos, Katrin Wisniewski, Aleš Žagar, and Torsten Zesch. Towards better language representation in Natural Language Processing - a multilingual dataset for text-level Grammatical Error Correction. International Journal of Learner Corpus Research, 2025 [full text] [bibtex]

Additional details about the state of GEC for the 12 MultiGEC languages at the time the first version of the dataset was compiled are provided in:

Arianna Masciolini, Andrew Caines, Orphée De Clercq, Joni Kruijsbergen, Murathan Kurfalı, Ricardo Muñoz Sánchez, Elena Volodina, Robert Östling, Kais Allkivi, Špela Arhar Holdt, Ilze Auzina, Roberts Darģis, Elena Drakonaki, Jennifer-Carmen Frey, Isidora Glišić, Pinelopi Kikilintza, Lionel Nicolas, Mariana Romanyshyn, Alexandr Rosen, Alla Rozovskaya, Kristjan Suluste, Oleksiy Syvokon, Alexandros Tantos, Despoina-Ourania Touriki, Konstantinos Tsiotskas, Eleni Tsourilla, Vassilis Varsamopoulos, Katrin Wisniewski, Aleš Žagar, and Torsten Zesch. An overview of Grammatical Error Correction for the twelve MultiGEC-2025 languages. Gothenburg, Sweden, 2025. Institution for Swedish, Multilingualism, Language Technology; University of Gothenburg [full text] [bibtex]

Data that was incorporated into MultiGEC after the shared task is described in:

Karl Törnblom Bartholf. From Fuþark to essay - how well does the Viking LLM perform Grammatical Error Correction? 2025 [full text] [bibtex]

MultiGEC-2025 shared task

An overview of the MultiGEC-2025 shared task is given in the paper:

Arianna Masciolini, Andrew Caines, Orphée De Clercq, Joni Kruijsbergen, Murathan Kurfalı, Ricardo Muñoz Sánchez, Elena Volodina, and Robert Östling. The MultiGEC-2025 shared task on Multilingual Grammatical Error Correction at NLP4CALL. In Ricardo Muñoz Sánchez, David Alfter, Elena Volodina, and Jelena Kallas, editors, Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning, pages 1–33, Tallinn, Estonia, March 2025. University of Tartu Library [full text] [bibtex]

System descriptions

Ryszard Staruch. UAM-CSI at MultiGEC-2025: Parameter-efficient LLM fine-tuning for multilingual grammatical error correction. In Ricardo Muñoz Sánchez, David Alfter, Elena Volodina, and Jelena Kallas, editors, Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning, pages 42–49, Tallinn, Estonia, March 2025. University of Tartu Library [full text] [bibtex] (winner)
Olga Seminck, Yoann Dupont, Mathieu Dehouck, Qi Wang, Noé Durandard, and Margo Novikov. Lattice @MultiGEC-2025: A spitful multilingual language error correction system using LLaMA. In Ricardo Muñoz Sánchez, David Alfter, Elena Volodina, and Jelena Kallas, editors, Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning, pages 34–41, Tallinn, Estonia, March 2025. University of Tartu Library [full text] [bibtex]

Other publications referencing MultiGEC

Arvin Jalali. Grammatical Error Correction using Large Language Models: A case study on Universal Dependencies treebanks. 2025 [full text] [bibtex]
Roman Kovalchuk, Mariana Romanyshyn, and Petro Ivaniuk. Introducing OmniGEC: A silver multilingual dataset for grammatical error correction. In Mariana Romanyshyn, editor, Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025), pages 162–178, Vienna, Austria (online), July 2025. Association for Computational Linguistics [full text] [bibtex]
Mengyang Qiu, Tran Minh Nguyen, Zihao Huang, Zelong Li, Yang Gu, Qingyu Gao, Siliang Liu, and Jungyeul Park. Multilingual grammatical error annotation: Combining language-agnostic framework with language-specific flexibility, 2025 [full text] [bibtex]
Josef Ruppenhofer, Annette Portmann, Christine Renker, Matthias Schwendemann, Katrin Wisniewski, and Torsten Zesch. Where it’s at: Annotating verb placement types in learner language. In Siyao Peng and Ines Rehbein, editors, Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025), pages 187–200, Vienna, Austria, July 2025. Association for Computational Linguistics [full text] [bibtex]
Robert Östling, Murathan Kurfali, and Andrew Caines. LLM-based post-editing as reference-free GEC evaluation. In Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, and Zheng Yuan, editors, Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 213–224, Vienna, Austria, July 2025. Association for Computational Linguistics [full text] [bibtex]
Martin Vainikko, Taavi Kamarik, Karina Kert, Krista Liin, Silvia Maine, Kais Allkivi, Annekatrin Kaivapalu, and Mark Fishel. Paragraph-level error correction and explanation generation: Case study for Estonian. In Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, and Zheng Yuan, editors, Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 953–967, Vienna, Austria, July 2025. Association for Computational Linguistics [full text] [bibtex]
Emilia Piotrowska. Multilingual document-level GEC evaluation, 2025 [full text] [bibtex]