Contributors

The MultiGEC dataset is the result of a collaboration between the CompSLA working group, responsible for the design, validation and distribution of the MultiGEC dataset, and over 20 external data providers, who contributed to its 17 subcorpora.

The individual MultiGEC subcorpora were curated by the following people:

Czech

Alexandr Rosen, Charles University, Prague

Acknowledgement

Compilation of the Czech part from existing resources was supported by the Czech Ministry of Education, Youth and Sports (grant no. LM2023044: Large Research, Development and Innovation Infrastructures). The task would have been much harder without the source GECCC corpus, the work of Jakub Náplava, Milan Straka and Jana Straková. Moreover, the result would have been much smaller, less varied, or non-existent without previously built learner corpora, the work of Karel Šebesta, Svatava Škodová, Barbora Štindlová, Jirka Hana and many others.

English

Diane Nicholls, ELiT, Cambridge University Press & Assessment
Andrew Caines, University of Cambridge
Paula Buttery, University of Cambridge

Acknowledgement

Andrew Caines has been supported by Cambridge University Press & Assessment. Thanks to Diane Nicholls and Paula Buttery for co-preparation of the Write & Improve Corpus 2024.

Estonian

Mark Fishel, University of Tartu, Estonia
Kais Allkivi, Tallinn University, Estonia
Kristjan Suluste, Eesti Keele Instituut, Estonia

Acknowledgement

We thank the following people for their contributions to compiling and annotating the corpus material: Pille Eslon, Kaisa Norak, Karina Kert, Silvia Maine and Linda Luig for their work on the EIC subcorpus; Kadri Sõrmus, Jelena Kallas, Sven Aller, Helen Kaljumäe, Silver Vapper, Anita Väli, Karoliina Jõgi and Marta Kohv for their work on the EKIL2 corpus and its source corpus EMMA; and Krista Liin for consulting on both datasets. Work on the EKIL2 dataset is co-funded by the European Union.

German

Andrea Horbach, IPN / CAU Kiel, Germany
Josef Ruppenhofer, FernUniversität in Hagen, Germany
Katrin Wisniewski, Universität Leipzig
Torsten Zesch, FernUniversität in Hagen, Germany

Acknowledgement

The MERLIN project was funded from 2012 until 2014 by the EU Lifelong Learning Programme under project number 518989-LLP-1-2011-1-DE-KA2-KA2MP.

Greek

Alexandros Tantos, Aristotle University of Thessaloniki
Konstantinos Tsiotskas, Aristotle University of Thessaloniki
Vassilis Varsamopoulos, Aristotle University of Thessaloniki
Pinelopi Kikilintza, Aristotle University of Thessaloniki
Elena Drakonaki, Aristotle University of Thessaloniki
Eleni Tsourilla, Aristotle University of Thessaloniki
Despoina-Ourania Touriki, Aristotle University of Thessaloniki

Acknowledgement

The compilation of the Greek Learner Corpus II has been supported by the Hellenic Foundation for Research and Innovation through the project “Latent Aspects in L2 Acquisition (LAL2A)” (Grant Number 3161) as part of the 1st Call for “Research Projects to Support Faculty Members & Researchers and Procure High-Value Research Equipment”.

Icelandic

Isidora Glišić, University of Iceland

Acknowledgement

The error corpora project was funded by the Icelandic Government as part of the Language Technology Programme for Icelandic 2019–2023. We thank the following people for their contributions to collecting, correcting, and annotating the corpora: Anton Karl Ingason, Lilja Björk Stefánsdóttir, Þórunn Arnardóttir, Dagbjört Guðmundsdóttir, Isidora Glišić and Xindan Xu.

Italian

Jennifer-Carmen Frey, Eurac Research Bolzano, Italy
Lionel Nicolas, Eurac Research Bolzano, Italy

Acknowledgement

The MERLIN project was funded from 2012 until 2014 by the EU Lifelong Learning Programme under project number 518989-LLP-1-2011-1-DE-KA2-KA2MP.

Latvian

Roberts Darģis, IMCS at the University of Latvia
Ilze Auzina, IMCS at the University of Latvia

Acknowledgement

Work on Latvian has been supported by the State Research Programme’s project LATE (grant agreement No. VPP-LETONIKA-2021/1-0006), which is in synergy with the Latvian Council of Science grant Common Writing Errors in Latvian (lzp-2023/1-0481).

Russian

Alla Rozovskaya, City University of New York (CUNY), USA

Slovene

Špela Arhar Holdt, University of Ljubljana, Slovenia
Aleš Žagar, University of Ljubljana, Slovenia

Acknowledgement

The research program Language Resources and Technologies for Slovene (P6-0411) and the projects Empirical Foundations for Digitally-Supported Development of Writing Skills (J7-3159) and Large Language Models for Digital Humanities (GC-0002) are funded by the Slovenian Research and Innovation Agency.

Swedish

Arianna Masciolini, University of Gothenburg, Sweden
Elena Volodina, University of Gothenburg, Sweden
Karl Törnblom Bartholf

Acknowledgement

Work on Swedish has been supported by Nationella Språkbanken and Huminfra, both funded by the Swedish Research Council (2018-2024, contract 2017-00626; 2022-2024, contract 2021-00176) and their participating partner institutions, as well as the Swedish Research Council grant 2019-04129.

Ukrainian

Oleksiy Syvokon, Microsoft
Mariana Romanyshyn, Grammarly

Acknowledgement

The creation of UA-GEC was initiated and supported by Grammarly. We extend special gratitude to Olena Nahorna, Pavlo Kuchmiichuk, Nastasiia Osidach, Ira Kotkalova, Anna Vesnii, Halyna Kolodkevych, and everyone else who participated in the corpus creation.