Google Doc version (https://docs.google.com/document/d/1bAXIK85ttWJuOXRmsSm_-fTdAlMNKpJG7ddgIrHB77o/edit#)
Elena Volodina, Bea Megyesi, June 4 - July 15, 2018
This document contains instructions for manual conversion of hand-written essays to a digital format, and recommends the following flow:
Acquaintance with the guidelines (1) means actual study of this document, optimally combined with a practical test case on a number of real-life essays, to see how the guidelines answer the questions that come up.
Transcription workshop (2) is a practical one-day session in which several annotators work on actual essays and discuss uncertain cases among themselves and with a responsible researcher. The workshop is aimed at resolving subjective judgements in favour of objective decisions. Optimally, the annotators involved in this process build a network, so that they can ask each other when uncertainties arise during their later work.
Transcription, individual phase (3) is an individual process in which each annotator works on his/her share of essays.
Cross-consultation (4) is a step which can be of use in uncertain cases. We recommend contacting another assistant or a responsible researcher to double-check any uncertainties.
Transcription check (5) is performed by a third party, e.g. another annotator. During this stage random checks are performed on the transcribed files.
Two major principles for essay transcription are:
The following rules should apply to transcription of hand-written essays:
Authenticity of writing
Errors in the hand-writing should be preserved, i.e. no error correction! (see ③ in Figure 1). Tip: disable the spell-checker.
If there is a dubious case - for instance, whether the learner has written correctly or incorrectly - a positive assumption should be made. For example, if it is unclear whether two words have been written as one or as two items (with too little space between them), the “positive” assumption would be that the learner meant to write two items, and in practice in the transcribed format the string should be separated into two words. When it is obvious that the two words are written as one, that should be preserved (typed as one item). (See ④ in Figure 1)
In many cases some basic knowledge of Swedish should help to understand what is written (see ⑤ in Figure 1; as well as Figure 2, lilac underlinings). If the handwriting is illegible, write $ (dollar-sign) for each character that cannot be understood.
Figure 1. Essay example. Level A1 (beginner), nr tokens: 117, topic: Presentation/Om mig; transcription time: 15 min.
Figure 2. Deciphering letters in student writing: o vs a; capital/non-capital; hyphenation. Level B1 (intermed.), nr tokens: 330, topic: Min första dag i Sverige; transcription time: 34 min.
Sometimes students can be very creative and “invent” their own letters (see Figure 3). If you know that an equivalent letter exists in another language, use that one, as for example in the first word in Fig. 3: Sverigё.
If there is no way to reproduce that letter in typing, choose the closest one in shape, keeping to the positive assumption described in 5. For example, in the second example in Fig. 3, the options could be o and å, but å holds the positive assumption, so the transcribed version should contain the word går.
If there is no way you can render a “created” letter, use the dollar-sign $, as you would for an unintelligible letter.
Figure 3. Invented letters
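The dollar-sign convention makes it easy to find the uncertain spots in a finished transcription. The following is only a minimal sketch, not part of the SweLL tooling: the function names and the sample sentence are invented for illustration, and it assumes that $ appears in a transcription only as the placeholder for an illegible character.

```python
def count_illegible(text: str, placeholder: str = "$") -> int:
    """Return how many characters were marked as illegible."""
    return text.count(placeholder)


def illegible_tokens(text: str, placeholder: str = "$") -> list[str]:
    """Return the whitespace-separated tokens that contain a placeholder."""
    return [token for token in text.split() if placeholder in token]


# Invented sample sentence, not taken from a real essay.
sample = "Jag b$r i G$$eborg ."
print(count_illegible(sample))
print(illegible_tokens(sample))
```

A list like this could help decide which essays to bring to cross-consultation (4).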
Graphical issues (supra-linguistic features)
Re-read each hand-written essay once again and compare with your transcribed version. You will get used to the student’s handwriting by then, and will - probably - understand better what is written. Another reason for re-reading the essays is to double-check that no unintended error-correction is introduced (rules 1-5).
To be able to give a time estimate, we timed the digitization of several essays per level. The summary follows in Table 1 below.
Each essay will take longer in the beginning, when annotators are not yet confident with the guidelines and the process itself. The time will also depend on the legibility of the handwriting and on how many uncertain cases the essay contains. Take the time estimates below only as an approximation.
Table 1. Time estimation for essay transcription at different levels
During your work, write down your time per essay in an Excel sheet according to the example below (Table 2):
Table 2. Time estimation for annotator work on transcription
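If you prefer to keep the time log as a plain CSV file that can later be opened in Excel, a minimal sketch could look as follows. The column names and essay identifiers are assumptions made for this illustration and do not necessarily match Table 2; the token counts and times are the ones given in the captions of Figures 1 and 2.

```python
import csv
import io

# Assumed column names; adapt them to the columns used in Table 2.
FIELDS = ["essay_id", "level", "tokens", "minutes"]


def write_time_log(rows, handle):
    """Write one CSV line per transcribed essay to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def minutes_per_100_tokens(row):
    """Normalise transcription time by essay length."""
    return 100 * row["minutes"] / row["tokens"]


rows = [
    {"essay_id": "essay_001", "level": "A1", "tokens": 117, "minutes": 15},
    {"essay_id": "essay_002", "level": "B1", "tokens": 330, "minutes": 34},
]

buffer = io.StringIO()  # stands in for a real file in this sketch
write_time_log(rows, buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `open(path, "w", newline="", encoding="utf-8")` so that the csv module controls the line endings.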
Work on hand-written essays is potentially risky, since a certain amount of personal information in the text (as well as the handwriting itself) may reveal the person behind the text. That is why this work has to be performed in a safe environment. You will be introduced to the SweLL “kiosk” option for that. See instructions for “kiosk” here: https://github.com/spraakbanken/swell-project/blob/master/SweLL_kiosk_user_manual.md
You can save your work in any format while you are working, but you should deliver it in plain text format (Unicode, UTF-8).
Save information about time used for digitizing an essay - for our statistics.
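To check that a delivered file really is plain text in UTF-8, something like the sketch below could be used. It is an illustration only: the helper names and the file name are hypothetical, and the sample sentence is invented.

```python
import tempfile
from pathlib import Path


def save_transcription(path, text):
    """Save a transcription as plain text encoded in UTF-8."""
    Path(path).write_text(text, encoding="utf-8")


def is_valid_utf8(path):
    """Return True if the whole file decodes as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A temporary directory keeps this demonstration away from real essay files.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "essay_001.txt"  # hypothetical file name
    save_transcription(target, "Jag går till skolan .\n")
    print(is_valid_utf8(target))
```

A file saved in another encoding (for example Latin-1) will usually fail this check as soon as it contains å, ä or ö.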