swell-project

SweLL correction annotation guidelines

Motivation and purpose

The purpose of correction annotation is to categorize the corrections made in a learner text so that the learner corpus becomes searchable for different types of deviations from the target language norm. Annotating the learner texts according to the SweLL project's correction taxonomy is hence an important step in making the learner language collected in the corpus analyzable for SLA research purposes. Normalization (see the normalization guidelines), and by extension correction annotation, always includes an element of interpretation, since we cannot be sure exactly what the learner had in mind when attempting to express a certain content with a certain linguistic structure or expression. This document therefore contains guidance on how to apply the SweLL annotation taxonomy in order to minimize ambiguity and ensure the greatest possible inter-annotator agreement. We first give some general guidelines and thereafter describe specific ambiguous cases that might occur in relation to the different correction types.

[Critical discussion here: criteria for the error taxonomy. Something more general about how the taxonomy is structured - which levels there are, etc. - and why]


General principles of correction annotation


Taxonomy for annotation

SweLL correction annotation is done on the basis of the SweLL correction annotation taxonomy. The taxonomy is described as a whole in the SweLL corr-code book, which contains explanations and examples for all the correction codes in the SweLL-taxonomy.

Wrong alignment between source text and normalization

If the links between source text (original) and target text (normalized) are wrong, correct them in correction annotation mode before you tag the correction! If you are not sure how the links should be corrected, mark the segment as OBS!

More than one correction type within the same segment

If the same text segment contains more than one correction type, add all tags that are necessary.

Grouping of tokens

To show the alignment between the source text and the correction, you sometimes need to group tokens together. This is done by first marking all the tokens you want to group (for Mac: hold shift!) and then clicking the “group” button in the menu to the left.


Specific guidelines for different correction types

Consistency corrections - general function

This code covers necessary (follow-up) corrections in the text that come as a result of a previous correction, i.e. originally there was no mistake in the segment, but due to a correction introduced in the neighbouring context, a correction is necessary in the segment. By using this code we indicate that the error was not originally made by the student.

Lexical corrections - general function

The correction taxonomy contains six (?) different codes for lexical corrections.


Some specific guidelines regarding lexical corrections

L-W

Wrong content word/phrase. Also includes phrasal verbs, reflexives with a missing particle/reflexive marker, and multiword prepositions.

Example (see also code book):

[screenshot]

L-FL

L-FL applies to the use of an existing non-Swedish word, i.e. a foreign word that is NOT conventionally used in Swedish and can be identified as an existing word in another language. Naturally, annotators will not be able to identify all such cases, since we cannot expect them to be competent in all relevant languages, but it is still relevant to mark cases where possible.

O or L-FL?

Lexical crosslinguistic influence that can be identified from the context, but where the writer has formed a “Swedish” word from e.g. an English stem (see example below), is marked as L (wrong word), NOT L-FL.

[insert illustrated example –> busiga –> buisy here!]

Morphological codes (7)


Morphological corrections - general function

The SweLL taxonomy further contains eight morphological codes.


Some specific guidelines regarding morphological corrections

M-DEF

Corrections regarding the use of definite/indefinite form; also applies to a missing/redundant article. If there are several corrections within the same noun phrase, e.g. article + adjective + noun, they are to be tagged individually.

If the learner has produced a congruent definite/indefinite noun phrase but used it in the wrong context, the main word in the phrase is tagged as an M-DEF correction and the changes in its determiners are tagged as consistency corrections (C).

Example:

Use of a pronoun in object form instead of the definite article (dem-de) is also interpreted as an M-DEF correction.


M-F

Applies to deviant paradigm selection where the learner has chosen the correct grammatical category. This makes it possible to mark certain cases where the learner has acquired partial knowledge of the target language system but not [to be completed]


M-NUM

Corrections regarding deviations in number agreement; can apply to single words or groups of words. Mark elements that should agree with each other, if they are not “separated”! [clarify!!]


M-other

This code is to be applied when there are no convincing arguments for any other morphological code.


Orthographical codes (3)


Orthographical corrections - general function

The taxonomy distinguishes two different types of orthographical corrections.


Some specific guidelines regarding orthographical corrections

???


Punctuation codes (4)


Punctuation correction - general functions

There are four different punctuation codes in the SweLL taxonomy.


Some specific guidelines regarding punctuation corrections

SENT-segmentation

[Note! This correction type needs to be tested to see whether we need it or whether it can be covered by other codes.] This code is only to be used where other punctuation codes cannot be applied, i.e. when a word in the source text is replaced by punctuation.


Syntactical codes (6)


Syntactical corrections - general function

The SweLL taxonomy contains six codes regarding syntactical corrections.


Some specific guidelines regarding syntactical corrections

S-adv

Applies to ALL word order corrections involving adverbial placement - placement of a sentence adverbial in relation to the finite verb and all other kinds of misplaced adverbials (see code book and examples below).

Examples:


S-CON

Applies to problematic syntactical constructions, where the deviation is more complex and therefore not covered by one of the other syntactical tags (marked as OBS! in the example below!)

Example:


S-Msubj vs L-M/(S-M?)

[Here we need to address the possible overlap/ambiguity between the lexical and syntactical codes when it comes to missing words/syntactical functions]


S-W

Wrong function word, e.g. preposition (also multiword preposition), auxiliary verb, particle and reflexive marker.

Intelligibility (1)


Intelligibility corrections - general function

There is one code for marking unintelligible parts of a learner text.


Unidentified


Unidentified corrections - general function

The taxonomy also contains one code for marking corrections that cannot be categorized according to other codes.

Uni


SweLL code book (taxonomy for correction annotation + examples) [under revision!]

This code book contains the taxonomy for the SweLL correction annotation. The codes are presented in alphabetical order with examples. You will also find illustrations from the annotation tool for several of the examples in the appendix to the code book.


C (consistency)

This code covers necessary (follow-up) corrections in the text that come as a result of a previous correction, i.e. originally there was no mistake in the segment, but due to a correction introduced in the neighbouring context, a correction is necessary in the segment. By using this code we indicate that the error was not originally made by the student.

Examples


Lexical codes (4)


L

Wrong word or phrase. Also includes phrasal verbs, reflexives with a missing particle/reflexive marker, and multiword prepositions.

This error code can only be used to mark existing Swedish words that have been used in an incorrect way or context.

Examples


L-DER

Deviant (existing!) derivational affix used

Examples


L-FL

Foreign word (not conventionally used in Swedish)

Even if it is similar to a Swedish word and only the spelling is different.

Examples


L-REF

Reference error

This error code is used to mark, for example, gender errors in personal pronouns, as revealed by the context.

Examples


Morphological codes (8)


M-ADJ/ADV

Corrections concerning the confusion of adjective and adverbial endings

Examples


M-Case (called AGR in the tool right now!)

Corrections regarding the use of genitive (nouns) and dative forms (pronouns)

Examples


M-DEF

Deviation in definite/indefinite forms, including missing/redundant article.

May apply to groups of words.

Examples


M-F

Deviant paradigm selection, but correct grammatical category

Examples

*Jag sittade länge och väntade på bussen igår → jag satt länge och väntade på bussen igår.

*Vädret kommer att blir dåligare nästa vecka → vädret kommer att bli sämre nästa vecka.


M-GEND

Correction regarding grammatical gender

Examples


M-NUM

Deviation in number agreement. May apply to groups of words.

Examples


M-other

Ambiguous cases with several interpretations - to be applied when there are no convincing arguments for any other morphological code.

Examples


M-VERB

Covers deviations in the verb phrase, i.e. aspect, tense, mood, that are not covered by one of the other morphological subcategories (M-CASE, M-NUM, M-F).

Examples


OBS!

Code for marking elements you want to return to for review


Orthographic codes (3)


O

Orthographic / spelling error

Examples


O-CAP

Error with capitalization (upper / lower)

Examples

O-COMP


Error within compounds (oversplitting, overcompounding)

Examples


Punctuation codes (4)


P-W

Wrong punctuation

Examples


P-R

Redundant punctuation

Examples


P-M

Missing punctuation

Examples


SENT-segmentation (?)

Merging or splitting a sentence. Note! This error type needs to be tested to see whether it can be coded through other codes. Only to be used where other punctuation codes cannot be applied.

Examples


Syntactical codes (8)


S-adv

Word order error involving adverbial placement

Examples


S-COMP

Deviations regarding the choice between multiword expressions and compounds - including phrasal verbs

Example

S-finV

Word order error with finite verb placement

Examples


S-Msubj

Subject missing

Examples


S-M

Word missing - not including def/indef articles (see M-DEF)!

Examples

S-R

Word or phrase redundant

Examples


S-WO

Word or phrase order - other

Examples


S-CON

Problematic syntactical construction

Examples


X (intelligibility)

Impossible to interpret the writer’s intention with a word, phrase or sentence. A wild guess may be added, but this is not required.

Examples


Unidentified


Uni

Error that cannot be categorized according to other codes


Appendix


Illustrated examples


C (consistency)

We replace bostad (non-neuter word) with bostadsområde (neuter word), which influences the choice of pronoun min (non-neuter) versus mitt (neuter) according to the gender agreement rules.

L (wrong-word)

L-REF (reference correction)

M-F (deviant paradigm selection but correct grammatical category)

M-DEF

M-GEND

O

O-COMP



P-M


SENT-segmentation


S-adv


S-CON (under the OBS! tag)


S-finV


S-R


X