Final documentation for the SweLL corpora (as of August 2021)
Online version: https://spraakbanken.github.io/swell-release-v1/
Procedure for providing access to the SweLL corpora
- Application form: https://sunet.artologik.net/gu/swell
- Those who (1) are geographically from Europe and (2) have research interests within language learning (research, teaching, development, assessment, etc.) should be approved.
- Countries that can be approved without lawyers:
- EU: Austria, Belgium, Bulgaria, Croatia, Republic of Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, and Sweden.
- EEA: Iceland, Liechtenstein, Norway
- (Note that, as of October 2020, the status of the UK is still not clear. Note also that Switzerland does not belong to the EU/EEA.)
- Those who do not meet both of the conditions above should be advised to send their application to the GU lawyers <dataskydd@gu.se>. Alternatively, we should forward their applications to that address.
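The triage rule above can be sketched in a few lines of Python. This is only an illustration, not part of the official procedure: the function name `triage` and its return values are hypothetical, the country lists are copied from the bullets above (with Austria included on the assumption that it belongs in the EU list as a member state), and the EU/EEA lists are assumed to be the operative test for condition (1).

```python
# Countries that can be approved without lawyers, per the lists above.
# Assumption: Austria is included because it is an EU member state.
EU = {
    "Austria", "Belgium", "Bulgaria", "Croatia", "Republic of Cyprus",
    "Czech Republic", "Denmark", "Estonia", "Finland", "France", "Germany",
    "Greece", "Hungary", "Ireland", "Italy", "Latvia", "Lithuania",
    "Luxembourg", "Malta", "Netherlands", "Poland", "Portugal", "Romania",
    "Slovakia", "Slovenia", "Spain", "Sweden",
}
EEA = {"Iceland", "Liechtenstein", "Norway"}
NO_LAWYER_COUNTRIES = EU | EEA

LAWYER_EMAIL = "dataskydd@gu.se"  # address given in the procedure above


def triage(country: str, language_learning_interest: bool) -> str:
    """Hypothetical helper: return "approve" or "forward".

    "approve" means both conditions hold; "forward" means the application
    should go to the GU lawyers at LAWYER_EMAIL.
    """
    if country.strip() in NO_LAWYER_COUNTRIES and language_learning_interest:
        return "approve"
    return "forward"
```

For example, `triage("Sweden", True)` yields "approve", while `triage("Switzerland", True)` yields "forward", since Switzerland is outside the EU/EEA.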
If approved:
- Store application file on Nextcloud (folder SweLL-v1/SweLL_user_agreements)
- Add the Korp user, via https://ws.spraakbanken.gu.se/ , to the SweLL, SW1203, TISUS and SpIn corpora
- Add the user's mail address to the Nextcloud folder SweLL_release_v1 (for corpus file download), and mark it “Read only”.
- Mail the applicant. Use, for example, the following text and cc swell@svenska.gu.se:
Dear XXX,
Thank you for your interest in the SweLL data!
You should by now have received
(1) a mail invitation from “SBX” (or some variation of that) to access the data in the folder “SweLL_release_v1”. If not, please check your junk folder.
(2) a mail invitation to log in to Korp. To search the available learner corpora in Korp, please check our webpage: https://spraakbanken.gu.se/en/projects/swell/l2korp .
This access is personal and should not be shared with others.
Happy exploring,
SweLL team (swell@svenska.gu.se)
- The list of applicants should be available through https://sunet.artologik.net/gu/swell (application form), but we would need to evaluate the procedure and see whether there is a need to keep track of all approved users.
.zip files for download
Users who have been approved following an access application will get access to the following two .zip files:
- SweLL-pilot (collection period 2007-2016): 502 essays that were anonymized (with CEFR labels)
- SweLL-gold (collection period 2017-2020): 502 essays that were pseudonymized, normalized and correction-annotated (no CEFR labels)
SweLL-pilot.zip contains
- folders for the TISUS, SW1203 and SpIn subcorpora in the following formats:
- json (SVALA format)
- xml (Korp format)
- xml with linguistic annotations (Korp format)
- raw text
- metadata in an Excel file, ordered by essay-IDs, with one spreadsheet per subcorpus
- metadata descriptions as pdf files
- readme file with links to metadata descriptions for each subcorpus and links to articles: https://spraakbanken.github.io/swell-release-v1/Readme-SweLL-pilot
SweLL-gold.zip contains
- SweLL-gold corpus files (502 essays) in the following formats:
- json (SVALA format)
- two xml files in Korp format, one for the original version and one for the normalized version
- two xml files with linguistic annotations in Korp format, one for the original version and one for the normalized version
- two files with raw texts, one for the original version and one for the normalized version
- metadata in an Excel file, ordered by essay-IDs
- metadata description in pdf
- readme file with links to the metadata description and links to articles: https://spraakbanken.github.io/swell-release-v1/Readme-SweLL-gold
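To check that a downloaded archive holds the file types listed above, its contents can be counted per file extension. The following is a minimal sketch using only the Python standard library. The function name `count_by_extension` is hypothetical, and the sketch makes no assumption about the folder layout or file names inside the archives.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(zip_path):
    """Count the files in a .zip archive per file extension.

    zip_path may be a path or a binary file-like object.
    Directory entries are skipped; files without an extension
    are counted under "(none)".
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            suffix = PurePosixPath(info.filename).suffix.lower() or "(none)"
            counts[suffix] += 1
    return counts
```

Calling `count_by_extension("SweLL-gold.zip")` (assuming the file is in the working directory) returns a Counter keyed by extension, such as ".json" or ".xml", which can be compared against the lists above.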
ReadMe files for corpus users