The MultiGEC dataset is the result of a collaboration between the CompSLA working group, responsible for the design, validation and distribution of the dataset, and over 20 external data providers, who contributed to its 17 subcorpora.

The individual MultiGEC subcorpora were curated by the following people:

Czech

  • Alexandr Rosen, Charles University, Prague

Acknowledgement

Compilation of the Czech part from existing resources was supported by the Czech Ministry of Education, Youth and Sports (grant no. LM2023044: Large Research, Development and Innovation Infrastructures). The task would have been much harder without the source GECCC corpus, which we owe to Jakub Náplava, Milan Straka and Jana Straková. Moreover, the result would have been much smaller, less varied, or non-existent without previously built learner corpora, which we owe to Karel Šebesta, Svatava Škodová, Barbora Štindlová, Jirka Hana and many others.

English

  • Diane Nicholls, ELiT, Cambridge University Press & Assessment
  • Andrew Caines, University of Cambridge
  • Paula Buttery, University of Cambridge

Acknowledgement

Andrew Caines has been supported by Cambridge University Press & Assessment. Thanks to Diane Nicholls and Paula Buttery for co-preparation of the Write & Improve Corpus 2024. Thanks also to Christopher Bryant for discussion around the use of ERRANT cross-linguistically.

Estonian

  • Mark Fishel, University of Tartu, Estonia
  • Kais Allkivi, Tallinn University, Estonia
  • Kristjan Suluste, Eesti Keele Instituut, Estonia

Acknowledgement

We thank the following people for their contributions to compiling and annotating the corpus material: Pille Eslon, Kaisa Norak, Karina Kert, Silvia Maine and Linda Luig for their work on the EIC subcorpus; Kadri Sõrmus, Jelena Kallas, Sven Aller, Helen Kaljumäe, Silver Vapper, Anita Väli, Karoliina Jõgi and Marta Kohv for their work on the EKIL2 corpus and its source corpus EMMA; Krista Liin for consulting on both datasets. Work on the EKIL2 dataset is co-funded by the European Union.

German

  • Andrea Horbach, IPN / CAU Kiel, Germany
  • Josef Ruppenhofer, FernUniversität in Hagen, Germany
  • Katrin Wisniewski, Universität Leipzig
  • Torsten Zesch, FernUniversität in Hagen, Germany

Greek

  • Alexandros Tantos, Aristotle University of Thessaloniki
  • Konstantinos Tsiotskas, Aristotle University of Thessaloniki
  • Vassilis Varsamopoulos, Aristotle University of Thessaloniki
  • Pinelopi Kikilintza, Aristotle University of Thessaloniki
  • Elena Drakonaki, Aristotle University of Thessaloniki
  • Eleni Tsourilla, Aristotle University of Thessaloniki
  • Despoina-Ourania Touriki, Aristotle University of Thessaloniki

Icelandic

  • Isidora Glišić, University of Iceland

Acknowledgement

The error corpora project was funded by the Icelandic Government as part of the Language Technology Programme for Icelandic 2019–2023. We thank the following people for their contributions to collecting, correcting, and annotating the corpora: Anton Karl Ingason, Lilja Björk Stefánsdóttir, Þórunn Arnardóttir, Dagbjört Guðmundsdóttir, Isidora Glišić and Xindan Xu.

Italian

  • Jennifer-Carmen Frey, Eurac Research Bolzano, Italy
  • Lionel Nicolas, Eurac Research Bolzano, Italy

Latvian

  • Roberts Darģis, University of Latvia
  • Ilze Auzina, University of Latvia

Russian

  • Alla Rozovskaya, City University of New York (CUNY), USA

Slovene

  • Špela Arhar Holdt, University of Ljubljana, Slovenia
  • Aleš Žagar, University of Ljubljana, Slovenia

Acknowledgement

The research program Language Resources and Technologies for Slovene (P6-0411) and the projects Empirical Foundations for Digitally-Supported Development of Writing Skills (J7-3159) and Large Language Models for Digital Humanities (GC-0002) are funded by the Slovenian Research and Innovation Agency.

Swedish

  • Arianna Masciolini, University of Gothenburg, Sweden
  • Elena Volodina, University of Gothenburg, Sweden

Acknowledgement

Work on Swedish has been supported by Nationella språkbanken and Huminfra, both funded by the Swedish Research Council (2018-2024, contract 2017-00626; 2022-2024, contract 2021-00176) and their participating partner institutions, as well as the Swedish Research Council grant 2019-04129.

Ukrainian

  • Oleksiy Syvokon, Microsoft
  • Mariana Romanyshyn, Grammarly

Acknowledgement

The creation of UA-GEC was initiated and supported by Grammarly. We extend special gratitude to Olena Nahorna, Pavlo Kuchmiichuk, Nastasiia Osidach, Ira Kotkalova, Anna Vesnii, Halyna Kolodkevych, and everyone else who participated in the corpus creation.