| --- |
| tags: |
| - spacy |
| - dacy |
| - danish |
| - token-classification |
| - pos tagging |
| - morphological analysis |
| - lemmatization |
| - dependency parsing |
| - named entity recognition |
| - coreference resolution |
| - named entity linking |
| - named entity disambiguation |
| language: |
| - da |
| license: apache-2.0 |
| model-index: |
| - name: da_dacy_large_trf-0.2.0 |
| results: |
| - task: |
| name: NER |
| type: token-classification |
| metrics: |
| - name: NER Precision |
| type: precision |
| value: 0.8858195212 |
| - name: NER Recall |
| type: recall |
| value: 0.8620071685 |
| - name: NER F Score |
| type: f_score |
| value: 0.8737511353 |
| dataset: |
| name: DaNE |
| split: test |
| type: dane |
| - task: |
| name: TAG |
| type: token-classification |
| metrics: |
| - name: TAG (XPOS) Accuracy |
| type: accuracy |
| value: 0.9913668347 |
| dataset: |
| name: UD Danish DDT |
| split: test |
| type: universal_dependencies |
| config: da_ddt |
| - task: |
| name: POS |
| type: token-classification |
| metrics: |
| - name: POS (UPOS) Accuracy |
| type: accuracy |
| value: 0.9908174469 |
| dataset: |
| name: UD Danish DDT |
| split: test |
| type: universal_dependencies |
| config: da_ddt |
| - task: |
| name: MORPH |
| type: token-classification |
| metrics: |
| - name: Morph (UFeats) Accuracy |
| type: accuracy |
| value: 0.9880227568 |
| dataset: |
| name: UD Danish DDT |
| split: test |
| type: universal_dependencies |
| config: da_ddt |
| - task: |
| name: LEMMA |
| type: token-classification |
| metrics: |
| - name: Lemma Accuracy |
| type: accuracy |
| value: 0.9589423796 |
| dataset: |
| name: UD Danish DDT |
| split: test |
| type: universal_dependencies |
| config: da_ddt |
| - task: |
| name: UNLABELED_DEPENDENCIES |
| type: token-classification |
| metrics: |
| - name: Unlabeled Attachment Score (UAS) |
| type: f_score |
| value: 0.9280885781 |
| dataset: |
| name: UD Danish DDT |
| split: test |
| type: universal_dependencies |
| config: da_ddt |
| - task: |
| name: LABELED_DEPENDENCIES |
| type: token-classification |
| metrics: |
| - name: Labeled Attachment Score (LAS) |
| type: f_score |
| value: 0.9079997669 |
| dataset: |
| name: UD Danish DDT |
| split: test |
| type: universal_dependencies |
| config: da_ddt |
| - task: |
| name: SENTS |
| type: token-classification |
| metrics: |
| - name: Sentences F-Score |
| type: f_score |
| value: 1.0 |
| dataset: |
| name: UD Danish DDT |
| split: test |
| type: universal_dependencies |
| config: da_ddt |
| - task: |
| name: coreference-resolution |
| type: coreference-resolution |
| metrics: |
| - name: LEA |
| type: f_score |
| value: 0.4672143289 |
| dataset: |
| name: DaCoref |
| type: alexandrainst/dacoref |
| split: custom |
| - task: |
| name: named-entity-linking |
| type: named-entity-linking |
| metrics: |
| - name: Named entity Linking Precision |
| type: precision |
| value: 0.84 |
| - name: Named entity Linking Recall |
| type: recall |
| value: 0.2153846154 |
| - name: Named entity Linking F Score |
| type: f_score |
| value: 0.3428571429 |
| dataset: |
| name: DaNED |
| type: named-entity-linking |
| split: custom |
| library_name: spacy |
| datasets: |
| - universal_dependencies |
| - dane |
| - alexandrainst/dacoref |
| metrics: |
| - accuracy |
| --- |
| |
| <a href="https://github.com/centre-for-humanities-computing/Dacy"><img src="https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png" width="175" height="175" align="right" /></a> |
|
|
| # DaCy large |
|
|
| DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines. |
| DaCy's largest pipeline has achieved state-of-the-art performance on part-of-speech tagging and dependency |
| parsing for Danish on the Danish Dependency Treebank, as well as competitive performance on named entity recognition, named entity disambiguation and coreference resolution. |
| To read more, check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results. |
| DaCy also contains guides on using the package as well as behavioural tests for biases and robustness of Danish NLP pipelines. |
|
|
|
|
| | Feature | Description | |
| | --- | --- | |
| | **Name** | `da_dacy_large_trf` | |
| | **Version** | `0.2.0` | |
| | **spaCy** | `>=3.5.2,<3.6.0` | |
| | **Default Pipeline** | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` | |
| | **Components** | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` | |
| | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) | |
| | **Sources** | [UD Danish DDT v2.11](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://huggingface.co/datasets/dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[DaCoref](https://huggingface.co/datasets/alexandrainst/dacoref) (Buch-Kromann, Matthias)<br />[DaNED](https://danlp-alexandra.readthedocs.io/en/stable/docs/datasets.html#daned) (Barrett, M. J., Lam, H., Wu, M., Lacroix, O., Plank, B., & Søgaard, A.)<br />[chcaa/dfm-encoder-large-v1](https://huggingface.co/chcaa/dfm-encoder-large-v1) (The Danish Foundation Models team) | |
| | **License** | `Apache-2.0` | |
| | **Author** | [Kenneth Enevoldsen](https://chcaa.io/#/) | |
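
The **spaCy** row above pins the pipeline to `>=3.5.2,<3.6.0`. As a minimal sketch, assuming plain numeric `major.minor.patch` version strings without pre-release suffixes, that range could be checked with the standard library alone. `parse_version` and `is_compatible` are hypothetical helpers for illustration, not part of spaCy or DaCy:

```python
def parse_version(version: str) -> tuple:
    """Turn a plain 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(installed: str, lower: str = "3.5.2", upper: str = "3.6.0") -> bool:
    """Return True if lower <= installed < upper (the range from the table)."""
    return parse_version(lower) <= parse_version(installed) < parse_version(upper)


if __name__ == "__main__":
    # A few candidate versions on either side of the supported range.
    for candidate in ("3.5.1", "3.5.2", "3.5.4", "3.6.0"):
        print(candidate, is_compatible(candidate))
```

In practice the installed version would come from `spacy.__version__`; for anything beyond plain numeric versions, a dedicated version parser is likely to be more robust than this tuple comparison.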
|
|
| ### Label Scheme |
|
|
| <details> |
|
|
| <summary>View label scheme (211 labels for 4 components)</summary> |
|
|
| | Component | Labels | |
| | --- | --- | |
| | **`tagger`** | `ADJ`, `ADP`, `ADV`, `AUX`, `CCONJ`, `DET`, `INTJ`, `NOUN`, `NUM`, `PART`, `PRON`, `PROPN`, `PUNCT`, `SCONJ`, `SYM`, `VERB`, `X` | |
| | **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `NumType=Ord\|POS=ADJ`, `POS=CCONJ`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Sup\|POS=ADV`, `Degree=Pos\|POS=ADV`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Number=Plur\|POS=DET\|PronType=Ind`, `POS=ADP`, `POS=ADV\|PartType=Inf`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=ADP\|PartType=Inf`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `NumType=Card\|POS=NUM`, `Degree=Pos\|POS=ADJ`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=PART\|PartType=Inf`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, 
`Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Number=Plur\|POS=PRON\|PronType=Ind`, `POS=INTJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Case=Gen\|POS=PROPN`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `POS=PRON\|PronType=Dem`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Number=Plur\|POS=NUM`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `POS=PRON`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Definite=Ind\|Number=Sing\|POS=NUM`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, 
`Foreign=Yes\|POS=ADV`, `POS=NOUN`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Degree=Sup\|POS=ADJ`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Mood=Imp\|POS=VERB`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `POS=X`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=VERB\|VerbForm=Ger`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Foreign=Yes\|POS=X`, 
`Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Number=Plur\|POS=PRON\|PronType=Rcp`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `POS=SYM`, `POS=DET\|PronType=Dem`, `Gender=Com\|Number=Sing\|POS=NUM`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `POS=VERB\|Tense=Pres`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NUM`, `Degree=Abs\|POS=ADV`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|POS=NOUN`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NUM`, `Definite=Def\|Number=Plur\|POS=NOUN`, `Case=Gen\|POS=NOUN`, `POS=AUX\|Tense=Pres\|VerbForm=Part` | |
| | **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `advmod:lmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:lmod`, `obl:tmod`, `punct`, `xcomp` | |
| | **`ner`** | `LOC`, `MISC`, `ORG`, `PER` | |
|
|
| </details> |
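
Each `morphologizer` label in the scheme above packs several `Key=Value` pairs, including a `POS` value, into one string separated by `|` (written as `\|` in the table only to escape the Markdown column separator). A minimal sketch of splitting such a label into a dictionary follows; `parse_morph_label` is a hypothetical helper for illustration, not a spaCy or DaCy function:

```python
def parse_morph_label(label: str) -> dict:
    """Split a label such as 'Definite=Ind|Gender=Com|POS=NOUN' into a dict."""
    features = {}
    for pair in label.split("|"):
        # partition keeps everything after the first '=' as the value.
        key, _, value = pair.partition("=")
        features[key] = value
    return features


if __name__ == "__main__":
    label = "Definite=Ind|Gender=Com|Number=Sing|POS=NOUN"
    print(parse_morph_label(label))
```

Multi-valued features such as `PronType=Int,Rel` are kept as a single string here; splitting them on `,` would be a further design choice.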
|
|
| ### Accuracy |
|
|
| | Type | Score (%) | |
| | --- | --- | |
| | `TOKEN_ACC` | 99.92 | |
| | `TOKEN_P` | 99.70 | |
| | `TOKEN_R` | 99.77 | |
| | `TOKEN_F` | 99.74 | |
| | `SENTS_P` | 100.00 | |
| | `SENTS_R` | 100.00 | |
| | `SENTS_F` | 100.00 | |
| | `TAG_ACC` | 99.14 | |
| | `POS_ACC` | 99.08 | |
| | `MORPH_ACC` | 98.80 | |
| | `MORPH_MICRO_P` | 99.45 | |
| | `MORPH_MICRO_R` | 99.32 | |
| | `MORPH_MICRO_F` | 99.39 | |
| | `DEP_UAS` | 92.81 | |
| | `DEP_LAS` | 90.80 | |
| | `ENTS_P` | 88.58 | |
| | `ENTS_R` | 86.20 | |
| | `ENTS_F` | 87.38 | |
| | `LEMMA_ACC` | 95.89 | |
| | `COREF_LEA_F1` | 46.72 | |
| | `COREF_LEA_PRECISION` | 45.91 | |
| | `COREF_LEA_RECALL` | 47.56 | |
| | `NEL_SCORE` | 34.29 | |
| | `NEL_MICRO_P` | 84.00 | |
| | `NEL_MICRO_R` | 21.54 | |
| | `NEL_MICRO_F` | 34.29 | |
| | `NEL_MACRO_P` | 86.71 | |
| | `NEL_MACRO_R` | 24.70 | |
| | `NEL_MACRO_F` | 37.28 | |
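
The F-scores in the table are consistent with the harmonic mean of the corresponding precision and recall, F = 2PR / (P + R). The sketch below recomputes three of them; `f_score` is a hypothetical helper, the NER and entity-linking inputs are the unrounded values from the metadata at the top of this card, and the coreference inputs are the rounded table values expressed as fractions:

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (precision, recall) pairs as fractions, copied from this model card.
    reported = {
        "NER (DaNE)": (0.8858195212, 0.8620071685),
        "NEL micro (DaNED)": (0.84, 0.2153846154),
        "Coref LEA (DaCoref)": (0.4591, 0.4756),
    }
    for name, (p, r) in reported.items():
        # Print as a percentage with two decimals, like the table.
        print(f"{name}: {100 * f_score(p, r):.2f}")
```

The printed values should line up with the `ENTS_F`, `NEL_MICRO_F` and `COREF_LEA_F1` rows. Recomputing from already rounded table values can differ in the last digit, which is why the unrounded metadata values are used where they are available.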
|
|
|
|
|
|
| ### Training |
| This model was trained using [spaCy](https://spacy.io) and logged to [Weights & Biases](https://wandb.ai/kenevoldsen/dacy-v0.2.0), where all the training logs are available. |