Description
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization. This NER model was trained over the XTREME dataset by using WordEmbeddings (glove_840B_300).
This NER model covers a subset of the 40 languages included in XTREME (shown here with their ISO 639-1 code):
af
, ar
, bg
, bn
, de
, el
, en
, es
, et
, eu
, fa
, fi
, fr
, he
, hi
, hu
, id
, it
, ja
, jv
, ka
, kk
, ko
, ml
, mr
, ms
, my
, nl
, pt
, ru
, sw
, ta
, te
, th
, tl
, tr
, ur
, vi
, yo
, and zh
Predicted Entities
- B-LOC
- I-LOC
- B-ORG
- I-ORG
- B-PER
- I-PER
How to use
document_assembler = DocumentAssembler() \
.setInputCol('text') \
.setOutputCol('document')
tokenizer = Tokenizer() \
.setInputCols(['document']) \
.setOutputCol('token')
embeddings = WordEmbeddingsModel\
.pretrained('glove_840B_300', 'xx')\
.setInputCols(["token", "document"])\
.setOutputCol("embeddings")
ner_model = NerDLModel.pretrained('ner_xtreme_glove_840B_300', 'xx') \
.setInputCols(['document', 'token', 'embeddings']) \
.setOutputCol('ner')
ner_converter = NerConverter() \
.setInputCols(['document', 'token', 'ner']) \
.setOutputCol('entities')
pipeline = Pipeline(stages=[
document_assembler,
tokenizer,
embeddings,
ner_model,
ner_converter
])
example = spark.createDataFrame(pd.DataFrame({'text': ['My name is John!']}))
result = pipeline.fit(example).transform(example)
val document_assembler = DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val tokenizer = Tokenizer()
.setInputCols("document")
.setOutputCol("token")
val embeddings = WordEmbeddingsModel.pretrained("glove_840B_300", "xx")
.setInputCols("document", "token")
.setOutputCol("embeddings")
val ner_model = NerDLModel.pretrained("ner_xtreme_glove_840B_300", "xx")
.setInputCols("document"', "token", "embeddings")
.setOutputCol("ner")
val ner_converter = NerConverter()
.setInputCols("document", "token", "ner")
.setOutputCol("entities")
val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings, ner_model, ner_converter))
val result = pipeline.fit(Seq.empty["My name is John!"].toDS.toDF("text")).transform(data)
import nlu
text = ["My name is John!"]
ner_df = nlu.load('xx.ner.ner_xtreme_glove_840B_300').predict(text, output_level='token')
Model Information
Model Name: | ner_xtreme_glove_840B_300 |
Type: | ner |
Compatibility: | Spark NLP 3.1.3+ |
License: | Open Source |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | xx |
Data Source
https://github.com/google-research/xtreme
Benchmarking
Language by language benchmarks (multi-label classification and CoNLL Eval):
###############################
lang: af
precision recall f1-score support
B-LOC 0.83 0.84 0.83 562
I-ORG 0.88 0.87 0.87 786
I-LOC 0.70 0.60 0.65 198
I-PER 0.90 0.91 0.91 504
B-ORG 0.87 0.81 0.84 569
B-PER 0.90 0.89 0.89 356
micro avg 0.86 0.85 0.85 2975
macro avg 0.84 0.82 0.83 2975
weighted avg 0.86 0.85 0.85 2975
processed 10808 tokens with 1487 phrases; found: 1460 phrases; correct: 1230.
accuracy: 84.50%; (non-O)
accuracy: 94.64%; precision: 84.25%; recall: 82.72%; FB1: 83.47
LOC: precision: 81.15%; recall: 82.74%; FB1: 81.94 573
ORG: precision: 85.18%; recall: 79.79%; FB1: 82.40 533
PER: precision: 87.85%; recall: 87.36%; FB1: 87.61 354
###############################
lang: ar
precision recall f1-score support
B-LOC 0.88 0.71 0.79 3780
I-ORG 0.76 0.87 0.81 10045
I-LOC 0.92 0.80 0.85 9073
I-PER 0.81 0.87 0.84 7937
B-ORG 0.76 0.76 0.76 3629
B-PER 0.82 0.82 0.82 3850
micro avg 0.82 0.82 0.82 38314
macro avg 0.82 0.81 0.81 38314
weighted avg 0.83 0.82 0.82 38314
processed 64347 tokens with 11259 phrases; found: 10564 phrases; correct: 8242.
accuracy: 82.24%; (non-O)
accuracy: 87.23%; precision: 78.02%; recall: 73.20%; FB1: 75.53
LOC: precision: 85.80%; recall: 69.23%; FB1: 76.63 3050
ORG: precision: 71.82%; recall: 72.28%; FB1: 72.05 3652
PER: precision: 77.73%; recall: 77.97%; FB1: 77.85 3862
###############################
lang: bg
precision recall f1-score support
B-LOC 0.89 0.91 0.90 6436
I-ORG 0.82 0.87 0.85 7964
I-LOC 0.85 0.78 0.82 3213
I-PER 0.89 0.88 0.89 4982
B-ORG 0.79 0.77 0.78 3670
B-PER 0.91 0.86 0.88 3954
micro avg 0.86 0.86 0.86 30219
macro avg 0.86 0.85 0.85 30219
weighted avg 0.86 0.86 0.86 30219
processed 83463 tokens with 14060 phrases; found: 13897 phrases; correct: 11836.
accuracy: 85.80%; (non-O)
accuracy: 93.83%; precision: 85.17%; recall: 84.18%; FB1: 84.67
LOC: precision: 88.08%; recall: 90.23%; FB1: 89.14 6593
ORG: precision: 75.27%; recall: 73.57%; FB1: 74.41 3587
PER: precision: 89.56%; recall: 84.19%; FB1: 86.79 3717
###############################
lang: bn
precision recall f1-score support
B-LOC 0.94 0.82 0.88 393
I-ORG 0.86 0.93 0.89 1031
I-LOC 0.94 0.82 0.88 703
I-PER 0.84 0.92 0.88 731
B-ORG 0.87 0.90 0.88 349
B-PER 0.87 0.91 0.89 347
micro avg 0.88 0.89 0.88 3554
macro avg 0.89 0.88 0.88 3554
weighted avg 0.88 0.89 0.88 3554
processed 4377 tokens with 1089 phrases; found: 1071 phrases; correct: 932.
accuracy: 89.00%; (non-O)
accuracy: 89.10%; precision: 87.02%; recall: 85.58%; FB1: 86.30
LOC: precision: 92.17%; recall: 80.92%; FB1: 86.18 345
ORG: precision: 85.40%; recall: 88.83%; FB1: 87.08 363
PER: precision: 83.75%; recall: 87.61%; FB1: 85.63 363
###############################
lang: de
precision recall f1-score support
B-LOC 0.78 0.77 0.78 4961
I-ORG 0.77 0.76 0.76 6043
I-LOC 0.77 0.58 0.66 2289
I-PER 0.96 0.84 0.89 6792
B-ORG 0.69 0.73 0.71 4157
B-PER 0.96 0.83 0.89 4750
micro avg 0.83 0.77 0.80 28992
macro avg 0.82 0.75 0.78 28992
weighted avg 0.84 0.77 0.80 28992
processed 97646 tokens with 13868 phrases; found: 13393 phrases; correct: 10307.
accuracy: 77.18%; (non-O)
accuracy: 91.95%; precision: 76.96%; recall: 74.32%; FB1: 75.62
LOC: precision: 75.34%; recall: 73.90%; FB1: 74.61 4866
ORG: precision: 64.23%; recall: 68.15%; FB1: 66.13 4411
PER: precision: 92.52%; recall: 80.17%; FB1: 85.90 4116
###############################
lang: el
precision recall f1-score support
B-LOC 0.84 0.84 0.84 4476
I-ORG 0.79 0.88 0.83 6685
I-LOC 0.74 0.54 0.62 1919
I-PER 0.90 0.87 0.88 5392
B-ORG 0.78 0.81 0.79 3655
B-PER 0.89 0.84 0.87 4032
micro avg 0.83 0.83 0.83 26159
macro avg 0.82 0.80 0.81 26159
weighted avg 0.83 0.83 0.83 26159
processed 90666 tokens with 12164 phrases; found: 12083 phrases; correct: 9880.
accuracy: 82.97%; (non-O)
accuracy: 94.09%; precision: 81.77%; recall: 81.22%; FB1: 81.49
LOC: precision: 82.55%; recall: 82.73%; FB1: 82.64 4486
ORG: precision: 75.29%; recall: 77.95%; FB1: 76.60 3784
PER: precision: 87.28%; recall: 82.52%; FB1: 84.83 3813
###############################
lang: en
precision recall f1-score support
B-LOC 0.80 0.77 0.78 4657
I-ORG 0.77 0.68 0.72 11607
I-LOC 0.87 0.62 0.72 6447
I-PER 0.93 0.75 0.83 7480
B-ORG 0.75 0.65 0.69 4745
B-PER 0.94 0.82 0.87 4556
micro avg 0.83 0.71 0.77 39492
macro avg 0.84 0.71 0.77 39492
weighted avg 0.84 0.71 0.76 39492
processed 80326 tokens with 13958 phrases; found: 12542 phrases; correct: 9604.
accuracy: 70.66%; (non-O)
accuracy: 84.66%; precision: 76.57%; recall: 68.81%; FB1: 72.48
LOC: precision: 72.53%; recall: 69.47%; FB1: 70.97 4460
ORG: precision: 67.03%; recall: 58.02%; FB1: 62.20 4107
PER: precision: 90.97%; recall: 79.37%; FB1: 84.77 3975
###############################
lang: es
precision recall f1-score support
B-LOC 0.94 0.85 0.89 4725
I-ORG 0.84 0.91 0.87 11371
I-LOC 0.90 0.73 0.81 6601
I-PER 0.95 0.86 0.91 7004
B-ORG 0.80 0.89 0.84 3576
B-PER 0.96 0.88 0.92 3959
micro avg 0.89 0.86 0.87 37236
macro avg 0.90 0.85 0.87 37236
weighted avg 0.89 0.86 0.87 37236
processed 64727 tokens with 12260 phrases; found: 11855 phrases; correct: 10412.
accuracy: 85.65%; (non-O)
accuracy: 91.26%; precision: 87.83%; recall: 84.93%; FB1: 86.35
LOC: precision: 91.20%; recall: 82.29%; FB1: 86.52 4263
ORG: precision: 78.06%; recall: 86.24%; FB1: 81.94 3951
PER: precision: 94.48%; recall: 86.89%; FB1: 90.53 3641
###############################
lang: et
precision recall f1-score support
B-LOC 0.82 0.82 0.82 5888
I-ORG 0.85 0.76 0.80 5731
I-LOC 0.71 0.73 0.72 2467
I-PER 0.95 0.86 0.90 5471
B-ORG 0.82 0.70 0.75 3875
B-PER 0.96 0.85 0.90 4129
micro avg 0.86 0.79 0.83 27561
macro avg 0.85 0.79 0.82 27561
weighted avg 0.86 0.79 0.83 27561
processed 80485 tokens with 13892 phrases; found: 12865 phrases; correct: 10397.
accuracy: 79.45%; (non-O)
accuracy: 91.98%; precision: 80.82%; recall: 74.84%; FB1: 77.71
LOC: precision: 75.94%; recall: 75.75%; FB1: 75.84 5873
ORG: precision: 75.55%; recall: 64.83%; FB1: 69.78 3325
PER: precision: 93.40%; recall: 82.95%; FB1: 87.87 3667
###############################
lang: eu
precision recall f1-score support
B-LOC 0.84 0.88 0.86 5682
I-ORG 0.86 0.75 0.80 5560
I-LOC 0.75 0.78 0.77 2876
I-PER 0.95 0.88 0.91 5449
B-ORG 0.81 0.72 0.77 3669
B-PER 0.94 0.83 0.88 4108
micro avg 0.87 0.82 0.84 27344
macro avg 0.86 0.81 0.83 27344
weighted avg 0.87 0.82 0.84 27344
processed 90661 tokens with 13459 phrases; found: 12843 phrases; correct: 10619.
accuracy: 81.56%; (non-O)
accuracy: 93.91%; precision: 82.68%; recall: 78.90%; FB1: 80.75
LOC: precision: 80.69%; recall: 84.72%; FB1: 82.66 5966
ORG: precision: 76.70%; recall: 68.08%; FB1: 72.13 3257
PER: precision: 91.35%; recall: 80.50%; FB1: 85.58 3620
###############################
lang: fa
precision recall f1-score support
B-LOC 0.91 0.81 0.86 3663
I-ORG 0.89 0.92 0.91 13255
I-LOC 0.92 0.85 0.88 8547
I-PER 0.85 0.87 0.86 7900
B-ORG 0.85 0.84 0.85 3535
B-PER 0.84 0.84 0.84 3544
micro avg 0.88 0.87 0.88 40444
macro avg 0.88 0.86 0.87 40444
weighted avg 0.88 0.87 0.88 40444
processed 59491 tokens with 10742 phrases; found: 10313 phrases; correct: 8793.
accuracy: 87.31%; (non-O)
accuracy: 90.55%; precision: 85.26%; recall: 81.86%; FB1: 83.52
LOC: precision: 89.56%; recall: 79.88%; FB1: 84.44 3267
ORG: precision: 83.85%; recall: 83.14%; FB1: 83.49 3505
PER: precision: 82.69%; recall: 82.62%; FB1: 82.65 3541
###############################
lang: fi
precision recall f1-score support
B-LOC 0.82 0.84 0.83 5629
I-ORG 0.84 0.79 0.81 5522
I-LOC 0.56 0.56 0.56 1096
I-PER 0.97 0.89 0.93 5437
B-ORG 0.79 0.69 0.74 4180
B-PER 0.97 0.88 0.92 4745
micro avg 0.86 0.81 0.84 26609
macro avg 0.83 0.78 0.80 26609
weighted avg 0.87 0.81 0.84 26609
processed 83660 tokens with 14554 phrases; found: 13685 phrases; correct: 11218.
accuracy: 81.22%; (non-O)
accuracy: 93.00%; precision: 81.97%; recall: 77.08%; FB1: 79.45
LOC: precision: 77.73%; recall: 79.13%; FB1: 78.42 5730
ORG: precision: 72.27%; recall: 63.47%; FB1: 67.58 3671
PER: precision: 95.96%; recall: 86.64%; FB1: 91.06 4284
###############################
lang: fr
precision recall f1-score support
B-LOC 0.90 0.79 0.84 4985
I-ORG 0.82 0.85 0.83 10386
I-LOC 0.89 0.72 0.79 5859
I-PER 0.95 0.81 0.87 6528
B-ORG 0.80 0.83 0.81 3885
B-PER 0.96 0.85 0.90 4499
micro avg 0.88 0.81 0.84 36142
macro avg 0.89 0.81 0.84 36142
weighted avg 0.88 0.81 0.84 36142
processed 68754 tokens with 13369 phrases; found: 12405 phrases; correct: 10621.
accuracy: 81.03%; (non-O)
accuracy: 89.43%; precision: 85.62%; recall: 79.44%; FB1: 82.42
LOC: precision: 86.39%; recall: 76.05%; FB1: 80.89 4388
ORG: precision: 75.96%; recall: 78.74%; FB1: 77.33 4027
PER: precision: 94.51%; recall: 83.82%; FB1: 88.84 3990
###############################
lang: he
precision recall f1-score support
B-LOC 0.81 0.67 0.73 5160
I-ORG 0.67 0.71 0.69 6907
I-LOC 0.73 0.56 0.63 3133
I-PER 0.76 0.83 0.79 6816
B-ORG 0.69 0.59 0.64 4142
B-PER 0.75 0.78 0.77 4396
micro avg 0.73 0.71 0.72 30554
macro avg 0.74 0.69 0.71 30554
weighted avg 0.74 0.71 0.72 30554
processed 85422 tokens with 13698 phrases; found: 12333 phrases; correct: 8741.
accuracy: 70.75%; (non-O)
accuracy: 87.43%; precision: 70.87%; recall: 63.81%; FB1: 67.16
LOC: precision: 78.18%; recall: 64.36%; FB1: 70.60 4248
ORG: precision: 62.20%; recall: 53.31%; FB1: 57.41 3550
PER: precision: 70.83%; recall: 73.07%; FB1: 71.93 4535
###############################
lang: hi
precision recall f1-score support
B-LOC 0.84 0.71 0.77 414
I-ORG 0.79 0.84 0.81 1123
I-LOC 0.80 0.55 0.65 398
I-PER 0.74 0.83 0.78 598
B-ORG 0.76 0.79 0.77 364
B-PER 0.82 0.82 0.82 450
micro avg 0.79 0.78 0.78 3347
macro avg 0.79 0.76 0.77 3347
weighted avg 0.79 0.78 0.78 3347
processed 6005 tokens with 1228 phrases; found: 1183 phrases; correct: 900.
accuracy: 77.92%; (non-O)
accuracy: 85.15%; precision: 76.08%; recall: 73.29%; FB1: 74.66
LOC: precision: 79.60%; recall: 67.87%; FB1: 73.27 353
ORG: precision: 72.11%; recall: 75.27%; FB1: 73.66 380
PER: precision: 76.67%; recall: 76.67%; FB1: 76.67 450
###############################
lang: hu
precision recall f1-score support
B-LOC 0.84 0.86 0.85 5671
I-ORG 0.81 0.82 0.81 5341
I-LOC 0.78 0.73 0.75 2404
I-PER 0.96 0.87 0.91 5501
B-ORG 0.81 0.77 0.79 3982
B-PER 0.96 0.86 0.91 4510
micro avg 0.87 0.83 0.85 27409
macro avg 0.86 0.82 0.84 27409
weighted avg 0.87 0.83 0.85 27409
processed 90302 tokens with 14163 phrases; found: 13631 phrases; correct: 11348.
accuracy: 82.81%; (non-O)
accuracy: 93.83%; precision: 83.25%; recall: 80.12%; FB1: 81.66
LOC: precision: 80.91%; recall: 83.09%; FB1: 81.98 5824
ORG: precision: 75.76%; recall: 71.82%; FB1: 73.74 3775
PER: precision: 93.65%; recall: 83.73%; FB1: 88.41 4032
###############################
lang: id
precision recall f1-score support
B-LOC 0.92 0.91 0.92 3745
I-ORG 0.87 0.90 0.89 8584
I-LOC 0.95 0.94 0.94 7809
I-PER 0.96 0.85 0.90 6520
B-ORG 0.86 0.87 0.87 3733
B-PER 0.96 0.86 0.91 3969
micro avg 0.92 0.89 0.91 34360
macro avg 0.92 0.89 0.90 34360
weighted avg 0.92 0.89 0.91 34360
processed 61834 tokens with 11447 phrases; found: 11094 phrases; correct: 9903.
accuracy: 89.21%; (non-O)
accuracy: 93.41%; precision: 89.26%; recall: 86.51%; FB1: 87.87
LOC: precision: 90.17%; recall: 89.88%; FB1: 90.02 3733
ORG: precision: 83.39%; recall: 84.89%; FB1: 84.14 3800
PER: precision: 94.58%; recall: 84.86%; FB1: 89.46 3561
###############################
lang: it
precision recall f1-score support
B-LOC 0.91 0.77 0.83 4820
I-ORG 0.83 0.82 0.83 9222
I-LOC 0.85 0.62 0.72 4366
I-PER 0.96 0.87 0.92 5794
B-ORG 0.80 0.81 0.81 4087
B-PER 0.97 0.90 0.93 4842
micro avg 0.88 0.81 0.84 33131
macro avg 0.89 0.80 0.84 33131
weighted avg 0.88 0.81 0.84 33131
processed 80871 tokens with 13749 phrases; found: 12679 phrases; correct: 10917.
accuracy: 80.69%; (non-O)
accuracy: 91.42%; precision: 86.10%; recall: 79.40%; FB1: 82.62
LOC: precision: 86.40%; recall: 73.05%; FB1: 79.17 4075
ORG: precision: 75.26%; recall: 76.29%; FB1: 75.77 4143
PER: precision: 95.90%; recall: 88.35%; FB1: 91.97 4461
###############################
lang: ja
precision recall f1-score support
B-LOC 0.83 0.51 0.64 5094
I-ORG 0.56 0.56 0.56 24814
I-LOC 0.83 0.50 0.62 17278
I-PER 0.84 0.50 0.63 21756
B-ORG 0.55 0.54 0.55 4267
B-PER 0.80 0.55 0.65 4085
micro avg 0.69 0.52 0.60 77294
macro avg 0.73 0.53 0.61 77294
weighted avg 0.73 0.52 0.60 77294
processed 306959 tokens with 13976 phrases; found: 10196 phrases; correct: 6870.
accuracy: 52.43%; (non-O)
accuracy: 86.02%; precision: 67.38%; recall: 49.16%; FB1: 56.84
LOC: precision: 80.60%; recall: 49.01%; FB1: 60.96 3134
ORG: precision: 52.36%; recall: 47.55%; FB1: 49.84 4236
PER: precision: 75.23%; recall: 51.14%; FB1: 60.89 2826
###############################
lang: jv
precision recall f1-score support
B-LOC 0.88 0.85 0.86 52
I-ORG 0.71 0.76 0.74 66
I-LOC 0.67 0.70 0.68 43
I-PER 0.84 0.70 0.77 44
B-ORG 0.74 0.78 0.76 40
B-PER 0.90 0.72 0.80 25
micro avg 0.77 0.76 0.76 270
macro avg 0.79 0.75 0.77 270
weighted avg 0.78 0.76 0.77 270
processed 678 tokens with 117 phrases; found: 112 phrases; correct: 90.
accuracy: 75.56%; (non-O)
accuracy: 88.94%; precision: 80.36%; recall: 76.92%; FB1: 78.60
LOC: precision: 84.00%; recall: 80.77%; FB1: 82.35 50
ORG: precision: 73.81%; recall: 77.50%; FB1: 75.61 42
PER: precision: 85.00%; recall: 68.00%; FB1: 75.56 20
###############################
lang: ka
precision recall f1-score support
B-LOC 0.82 0.72 0.77 5288
I-ORG 0.84 0.83 0.83 7800
I-LOC 0.72 0.59 0.65 2191
I-PER 0.80 0.88 0.84 4666
B-ORG 0.79 0.66 0.72 3807
B-PER 0.80 0.81 0.80 3962
micro avg 0.81 0.77 0.79 27714
macro avg 0.79 0.75 0.77 27714
weighted avg 0.81 0.77 0.79 27714
processed 81921 tokens with 13057 phrases; found: 11833 phrases; correct: 9017.
accuracy: 77.30%; (non-O)
accuracy: 90.06%; precision: 76.20%; recall: 69.06%; FB1: 72.45
LOC: precision: 78.11%; recall: 68.68%; FB1: 73.09 4650
ORG: precision: 73.14%; recall: 61.15%; FB1: 66.61 3183
PER: precision: 76.42%; recall: 77.16%; FB1: 76.79 4000
###############################
lang: kk
precision recall f1-score support
B-LOC 0.76 0.79 0.77 383
I-ORG 0.63 0.80 0.70 592
I-LOC 0.78 0.49 0.60 210
I-PER 0.84 0.84 0.84 466
B-ORG 0.63 0.61 0.62 355
B-PER 0.82 0.74 0.78 377
micro avg 0.72 0.74 0.73 2383
macro avg 0.74 0.71 0.72 2383
weighted avg 0.73 0.74 0.73 2383
processed 7936 tokens with 1115 phrases; found: 1081 phrases; correct: 744.
accuracy: 74.11%; (non-O)
accuracy: 89.57%; precision: 68.83%; recall: 66.73%; FB1: 67.76
LOC: precision: 74.62%; recall: 77.55%; FB1: 76.06 398
ORG: precision: 52.48%; recall: 50.70%; FB1: 51.58 343
PER: precision: 78.53%; recall: 70.82%; FB1: 74.48 340
###############################
lang: ko
precision recall f1-score support
B-LOC 0.88 0.81 0.85 5855
I-ORG 0.75 0.81 0.78 5437
I-LOC 0.79 0.82 0.80 2712
I-PER 0.74 0.84 0.79 3468
B-ORG 0.81 0.66 0.72 4319
B-PER 0.76 0.80 0.78 4249
micro avg 0.79 0.79 0.79 26040
macro avg 0.79 0.79 0.79 26040
weighted avg 0.79 0.79 0.79 26040
processed 80841 tokens with 14423 phrases; found: 13369 phrases; correct: 10274.
accuracy: 78.94%; (non-O)
accuracy: 90.59%; precision: 76.85%; recall: 71.23%; FB1: 73.93
LOC: precision: 83.32%; recall: 77.23%; FB1: 80.16 5427
ORG: precision: 71.82%; recall: 58.35%; FB1: 64.38 3509
PER: precision: 72.91%; recall: 76.06%; FB1: 74.45 4433
###############################
lang: ml
precision recall f1-score support
B-LOC 0.86 0.60 0.70 443
I-ORG 0.80 0.88 0.83 774
I-LOC 0.80 0.32 0.46 219
I-PER 0.73 0.85 0.78 492
B-ORG 0.74 0.71 0.72 354
B-PER 0.72 0.80 0.76 407
micro avg 0.77 0.75 0.76 2689
macro avg 0.77 0.69 0.71 2689
weighted avg 0.78 0.75 0.75 2689
processed 6727 tokens with 1204 phrases; found: 1101 phrases; correct: 810.
accuracy: 74.64%; (non-O)
accuracy: 87.88%; precision: 73.57%; recall: 67.28%; FB1: 70.28
LOC: precision: 83.06%; recall: 57.56%; FB1: 68.00 307
ORG: precision: 70.06%; recall: 68.08%; FB1: 69.05 344
PER: precision: 69.78%; recall: 77.15%; FB1: 73.28 450
###############################
lang: mr
precision recall f1-score support
B-LOC 0.84 0.69 0.76 525
I-ORG 0.78 0.91 0.84 852
I-LOC 0.67 0.49 0.57 258
I-PER 0.85 0.79 0.82 598
B-ORG 0.78 0.74 0.76 364
B-PER 0.82 0.79 0.80 375
micro avg 0.80 0.78 0.79 2972
macro avg 0.79 0.74 0.76 2972
weighted avg 0.80 0.78 0.78 2972
processed 7356 tokens with 1264 phrases; found: 1142 phrases; correct: 900.
accuracy: 77.59%; (non-O)
accuracy: 88.93%; precision: 78.81%; recall: 71.20%; FB1: 74.81
LOC: precision: 81.57%; recall: 67.43%; FB1: 73.83 434
ORG: precision: 73.85%; recall: 70.60%; FB1: 72.19 348
PER: precision: 80.28%; recall: 77.07%; FB1: 78.64 360
###############################
lang: ms
precision recall f1-score support
B-LOC 0.92 0.94 0.93 367
I-ORG 0.86 0.89 0.87 913
I-LOC 0.97 0.95 0.96 898
I-PER 0.95 0.78 0.86 555
B-ORG 0.82 0.85 0.84 375
B-PER 0.95 0.84 0.89 373
micro avg 0.91 0.88 0.90 3481
macro avg 0.91 0.87 0.89 3481
weighted avg 0.91 0.88 0.90 3481
processed 5874 tokens with 1115 phrases; found: 1087 phrases; correct: 952.
accuracy: 88.28%; (non-O)
accuracy: 92.12%; precision: 87.58%; recall: 85.38%; FB1: 86.47
LOC: precision: 91.94%; recall: 93.19%; FB1: 92.56 372
ORG: precision: 78.81%; recall: 81.33%; FB1: 80.05 387
PER: precision: 92.99%; recall: 81.77%; FB1: 87.02 328
###############################
lang: my
precision recall f1-score support
B-LOC 0.55 0.38 0.45 56
I-ORG 0.63 0.49 0.55 68
I-LOC 0.50 0.75 0.60 4
I-PER 0.30 0.67 0.41 46
B-ORG 0.84 0.48 0.62 33
B-PER 0.31 0.53 0.40 30
micro avg 0.44 0.51 0.47 237
macro avg 0.52 0.55 0.50 237
weighted avg 0.54 0.51 0.49 237
processed 756 tokens with 119 phrases; found: 108 phrases; correct: 46.
accuracy: 50.63%; (non-O)
accuracy: 74.07%; precision: 42.59%; recall: 38.66%; FB1: 40.53
LOC: precision: 55.26%; recall: 37.50%; FB1: 44.68 38
ORG: precision: 68.42%; recall: 39.39%; FB1: 50.00 19
PER: precision: 23.53%; recall: 40.00%; FB1: 29.63 51
###############################
lang: nl
precision recall f1-score support
B-LOC 0.90 0.85 0.87 5133
I-ORG 0.83 0.79 0.81 6693
I-LOC 0.90 0.68 0.77 3662
I-PER 0.96 0.86 0.91 6371
B-ORG 0.82 0.80 0.81 3908
B-PER 0.96 0.88 0.92 4684
micro avg 0.89 0.82 0.85 30451
macro avg 0.89 0.81 0.85 30451
weighted avg 0.89 0.82 0.85 30451
processed 85122 tokens with 13725 phrases; found: 12947 phrases; correct: 11196.
accuracy: 81.69%; (non-O)
accuracy: 92.93%; precision: 86.48%; recall: 81.57%; FB1: 83.95
LOC: precision: 86.27%; recall: 81.01%; FB1: 83.55 4820
ORG: precision: 78.22%; recall: 76.92%; FB1: 77.56 3843
PER: precision: 94.12%; recall: 86.08%; FB1: 89.92 4284
###############################
lang: pt
precision recall f1-score support
B-LOC 0.92 0.85 0.89 4779
I-ORG 0.83 0.88 0.85 10542
I-LOC 0.89 0.71 0.79 6467
I-PER 0.96 0.81 0.88 7310
B-ORG 0.81 0.85 0.83 3753
B-PER 0.96 0.85 0.90 4291
micro avg 0.88 0.83 0.85 37142
macro avg 0.89 0.82 0.86 37142
weighted avg 0.89 0.83 0.85 37142
processed 63647 tokens with 12823 phrases; found: 12187 phrases; correct: 10475.
accuracy: 82.53%; (non-O)
accuracy: 89.27%; precision: 85.95%; recall: 81.69%; FB1: 83.77
LOC: precision: 87.47%; recall: 81.04%; FB1: 84.13 4428
ORG: precision: 77.42%; recall: 81.32%; FB1: 79.32 3942
PER: precision: 93.00%; recall: 82.73%; FB1: 87.57 3817
###############################
lang: ru
precision recall f1-score support
B-LOC 0.75 0.81 0.78 4560
I-ORG 0.78 0.83 0.80 8008
I-LOC 0.63 0.69 0.66 3060
I-PER 0.93 0.83 0.88 7544
B-ORG 0.78 0.72 0.75 4074
B-PER 0.90 0.79 0.84 3543
micro avg 0.80 0.79 0.80 30789
macro avg 0.79 0.78 0.78 30789
weighted avg 0.81 0.79 0.80 30789
processed 71288 tokens with 12177 phrases; found: 11798 phrases; correct: 9093.
accuracy: 79.34%; (non-O)
accuracy: 89.47%; precision: 77.07%; recall: 74.67%; FB1: 75.85
LOC: precision: 74.10%; recall: 79.36%; FB1: 76.64 4884
ORG: precision: 71.75%; recall: 66.40%; FB1: 68.97 3770
PER: precision: 88.07%; recall: 78.15%; FB1: 82.82 3144
###############################
lang: sw
precision recall f1-score support
B-LOC 0.86 0.84 0.85 388
I-ORG 0.79 0.85 0.82 763
I-LOC 0.80 0.67 0.73 568
I-PER 0.97 0.87 0.92 744
B-ORG 0.84 0.88 0.86 374
B-PER 0.97 0.88 0.92 432
micro avg 0.87 0.83 0.85 3269
macro avg 0.87 0.83 0.85 3269
weighted avg 0.87 0.83 0.85 3269
processed 5786 tokens with 1194 phrases; found: 1161 phrases; correct: 1003.
accuracy: 82.90%; (non-O)
accuracy: 89.63%; precision: 86.39%; recall: 84.00%; FB1: 85.18
LOC: precision: 81.33%; recall: 78.61%; FB1: 79.95 375
ORG: precision: 81.98%; recall: 86.36%; FB1: 84.11 394
PER: precision: 95.66%; recall: 86.81%; FB1: 91.02 392
###############################
lang: ta
precision recall f1-score support
B-LOC 0.81 0.71 0.75 436
I-ORG 0.77 0.83 0.80 814
I-LOC 0.75 0.51 0.61 239
I-PER 0.84 0.91 0.87 615
B-ORG 0.77 0.68 0.72 383
B-PER 0.77 0.85 0.81 422
micro avg 0.79 0.79 0.79 2909
macro avg 0.78 0.75 0.76 2909
weighted avg 0.79 0.79 0.78 2909
processed 7234 tokens with 1241 phrases; found: 1179 phrases; correct: 869.
accuracy: 78.51%; (non-O)
accuracy: 88.98%; precision: 73.71%; recall: 70.02%; FB1: 71.82
LOC: precision: 77.89%; recall: 67.89%; FB1: 72.55 380
ORG: precision: 69.44%; recall: 61.10%; FB1: 65.00 337
PER: precision: 73.38%; recall: 80.33%; FB1: 76.70 462
###############################
lang: te
precision recall f1-score support
B-LOC 0.76 0.51 0.61 450
I-ORG 0.70 0.74 0.72 633
I-LOC 0.71 0.44 0.55 178
I-PER 0.50 0.77 0.61 294
B-ORG 0.61 0.57 0.59 340
B-PER 0.55 0.67 0.61 381
micro avg 0.63 0.64 0.63 2276
macro avg 0.64 0.62 0.61 2276
weighted avg 0.65 0.64 0.63 2276
processed 8155 tokens with 1171 phrases; found: 1083 phrases; correct: 627.
accuracy: 63.75%; (non-O)
accuracy: 85.62%; precision: 57.89%; recall: 53.54%; FB1: 55.63
LOC: precision: 73.91%; recall: 49.11%; FB1: 59.01 299
ORG: precision: 53.89%; recall: 50.88%; FB1: 52.34 321
PER: precision: 50.32%; recall: 61.15%; FB1: 55.21 463
###############################
lang: th
precision recall f1-score support
B-LOC 0.85 0.55 0.67 6503
I-ORG 0.63 0.67 0.65 56831
I-LOC 0.84 0.55 0.66 47608
I-PER 0.86 0.65 0.74 57522
B-ORG 0.50 0.54 0.52 5151
B-PER 0.48 0.60 0.53 5316
micro avg 0.73 0.62 0.67 178931
macro avg 0.69 0.59 0.63 178931
weighted avg 0.76 0.62 0.67 178931
processed 649606 tokens with 20897 phrases; found: 16403 phrases; correct: 11059.
accuracy: 61.94%; (non-O)
accuracy: 87.63%; precision: 67.42%; recall: 52.92%; FB1: 59.30
LOC: precision: 80.18%; recall: 51.38%; FB1: 62.62 4238
ORG: precision: 48.03%; recall: 44.13%; FB1: 46.00 5469
PER: precision: 75.18%; recall: 60.43%; FB1: 67.00 6696
###############################
lang: tl
precision recall f1-score support
B-LOC 0.83 0.90 0.86 327
I-ORG 0.83 0.81 0.82 1045
I-LOC 0.87 0.85 0.86 706
I-PER 0.95 0.84 0.89 813
B-ORG 0.79 0.82 0.81 341
B-PER 0.94 0.86 0.90 366
micro avg 0.87 0.84 0.85 3598
macro avg 0.87 0.85 0.86 3598
weighted avg 0.87 0.84 0.85 3598
processed 4627 tokens with 1034 phrases; found: 1040 phrases; correct: 857.
accuracy: 83.82%; (non-O)
accuracy: 86.73%; precision: 82.40%; recall: 82.88%; FB1: 82.64
LOC: precision: 79.26%; recall: 85.32%; FB1: 82.18 352
ORG: precision: 76.49%; recall: 79.18%; FB1: 77.81 353
PER: precision: 91.94%; recall: 84.15%; FB1: 87.87 335
###############################
lang: tr
precision recall f1-score support
B-LOC 0.87 0.79 0.83 4914
I-ORG 0.78 0.89 0.83 6979
I-LOC 0.83 0.68 0.75 3005
I-PER 0.95 0.84 0.89 5694
B-ORG 0.78 0.82 0.80 4154
B-PER 0.95 0.84 0.89 4519
micro avg 0.85 0.82 0.84 29265
macro avg 0.86 0.81 0.83 29265
weighted avg 0.86 0.82 0.84 29265
processed 75731 tokens with 13587 phrases; found: 12822 phrases; correct: 10708.
accuracy: 82.40%; (non-O)
accuracy: 92.07%; precision: 83.51%; recall: 78.81%; FB1: 81.09
LOC: precision: 84.17%; recall: 76.82%; FB1: 80.33 4485
ORG: precision: 73.45%; recall: 77.25%; FB1: 75.30 4369
PER: precision: 93.85%; recall: 82.41%; FB1: 87.76 3968
###############################
lang: ur
precision recall f1-score support
B-LOC 0.92 0.87 0.90 334
I-ORG 0.86 0.89 0.88 1005
I-LOC 0.91 0.90 0.91 904
I-PER 0.89 0.91 0.90 928
B-ORG 0.86 0.86 0.86 323
B-PER 0.89 0.89 0.89 363
micro avg 0.89 0.89 0.89 3857
macro avg 0.89 0.89 0.89 3857
weighted avg 0.89 0.89 0.89 3857
processed 5027 tokens with 1020 phrases; found: 1003 phrases; correct: 878.
accuracy: 89.50%; (non-O)
accuracy: 90.89%; precision: 87.54%; recall: 86.08%; FB1: 86.80
LOC: precision: 90.54%; recall: 85.93%; FB1: 88.17 317
ORG: precision: 86.07%; recall: 86.07%; FB1: 86.07 323
PER: precision: 86.23%; recall: 86.23%; FB1: 86.23 363
###############################
lang: vi
precision recall f1-score support
B-LOC 0.88 0.86 0.87 3717
I-ORG 0.86 0.86 0.86 13562
I-LOC 0.89 0.85 0.87 8018
I-PER 0.93 0.81 0.87 7787
B-ORG 0.82 0.82 0.82 3704
B-PER 0.92 0.85 0.88 3884
micro avg 0.88 0.84 0.86 40672
macro avg 0.88 0.84 0.86 40672
weighted avg 0.88 0.84 0.86 40672
processed 64967 tokens with 11305 phrases; found: 10904 phrases; correct: 9223.
accuracy: 84.22%; (non-O)
accuracy: 89.37%; precision: 84.58%; recall: 81.58%; FB1: 83.06
LOC: precision: 85.10%; recall: 83.72%; FB1: 84.40 3657
ORG: precision: 78.77%; recall: 78.35%; FB1: 78.56 3684
PER: precision: 90.06%; recall: 82.62%; FB1: 86.18 3563
###############################
lang: yo
precision recall f1-score support
B-LOC 0.73 0.77 0.75 39
I-ORG 0.76 0.79 0.78 87
I-LOC 0.90 0.89 0.90 72
I-PER 0.94 0.72 0.82 71
B-ORG 0.68 0.79 0.73 29
B-PER 0.97 0.70 0.81 43
micro avg 0.83 0.78 0.81 341
macro avg 0.83 0.78 0.80 341
weighted avg 0.84 0.78 0.81 341
processed 503 tokens with 111 phrases; found: 106 phrases; correct: 81.
accuracy: 78.30%; (non-O)
accuracy: 85.09%; precision: 76.42%; recall: 72.97%; FB1: 74.65
LOC: precision: 73.17%; recall: 76.92%; FB1: 75.00 41
ORG: precision: 64.71%; recall: 75.86%; FB1: 69.84 34
PER: precision: 93.55%; recall: 67.44%; FB1: 78.38 31
###############################
lang: zh
precision recall f1-score support
B-LOC 0.83 0.71 0.76 4371
I-ORG 0.75 0.68 0.71 17399
I-LOC 0.85 0.74 0.79 12282
I-PER 0.87 0.77 0.82 12897
B-ORG 0.67 0.65 0.66 3779
B-PER 0.81 0.76 0.79 3899
micro avg 0.80 0.72 0.76 54627
macro avg 0.80 0.72 0.75 54627
weighted avg 0.80 0.72 0.76 54627
processed 207505 tokens with 12532 phrases; found: 11033 phrases; correct: 8345.
accuracy: 72.07%; (non-O)
accuracy: 90.67%; precision: 75.64%; recall: 66.59%; FB1: 70.83
LOC: precision: 80.35%; recall: 66.99%; FB1: 73.06 3730
ORG: precision: 67.63%; recall: 59.85%; FB1: 63.50 3642
PER: precision: 78.80%; recall: 73.17%; FB1: 75.88 3661