Spanish NER for Laws and Money

Description

Pretrained Spanish Named Entity Recognition model for detecting laws and monetary amounts. This model was trained on in-house annotations, the available annotations of this dataset, and weak labelling from this model.

Predicted Entities

LAW, MONEY


How to use

# Assumes the johnsnowlabs library is installed and a valid Legal NLP license is configured
from johnsnowlabs import nlp

spark = nlp.start()

documentAssembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

tokenClassifier = nlp.RoBertaForTokenClassification.pretrained("legner_law_money", "es", "legal/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("ner")

# Merge the token-level tags in "ner" into entity chunks in "ner_chunk"
nerConverter = nlp.NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    tokenClassifier,
    nerConverter])

text = "La recaudación del ministerio del interior fue de 20,000,000 euros así constatado por el artículo 24 de la Constitución Española."

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
data = spark.createDataFrame([[""]]).toDF("text")
fitmodel = pipeline.fit(data)

light_model = nlp.LightPipeline(fitmodel)
light_result = light_model.fullAnnotate(text)

# Print each detected chunk followed by its entity label
for n in light_result[0]["ner_chunk"]:
    print(f"{n.result} ({n.metadata['entity']})")

Results

20,000,000 euros (MONEY)
artículo 24 de la Constitución Española (LAW)
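The results above are entity chunks: the token classifier labels individual tokens, and consecutive tokens of the same entity are then merged into one span (in Spark NLP this is normally the job of a converter stage such as NerConverter). The sketch below illustrates that merging step in plain Python. It assumes the model emits IOB-style tags (`B-LAW`, `I-LAW`, `B-MONEY`, `I-MONEY`, `O`), which this card does not confirm, and the helper `iob_to_chunks` is hypothetical, not part of any library.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level IOB tags into (chunk_text, label) pairs.

    Assumes tags look like "B-LAW", "I-LAW", or "O" (hypothetical tag scheme).
    Tokens are re-joined with single spaces, so original spacing is not preserved.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Emit the chunk being built (if any) and reset the state
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            # A new entity begins (an I- tag with a different label is treated leniently as a start)
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continue the current entity
            current_tokens.append(token)
        else:
            # "O" (outside) closes any open entity
            flush()
    flush()
    return chunks


if __name__ == "__main__":
    # Hypothetical tokens and tags for the second half of the example sentence
    tokens = ["fue", "de", "20,000,000", "euros", "así", "constatado", "por", "el",
              "artículo", "24", "de", "la", "Constitución", "Española", "."]
    tags = ["O", "O", "B-MONEY", "I-MONEY", "O", "O", "O", "O",
            "B-LAW", "I-LAW", "I-LAW", "I-LAW", "I-LAW", "I-LAW", "O"]
    for chunk, label in iob_to_chunks(tokens, tags):
        print(f"{chunk} ({label})")
```

With the hypothetical tags shown, this prints the same two lines as the Results section. A production converter works on character offsets rather than re-joining tokens, which is why it can reproduce the original text exactly.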

Model Information

Model Name: legner_law_money
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [ner]
Language: es
Size: 414.2 MB
Case sensitive: true
Max sentence length: 128
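The model accepts at most 128 tokens per sentence. This card does not say how longer inputs are handled, and the limit most likely counts RoBERTa subword tokens rather than words, so a sentence with fewer than 128 words can still exceed it. If your documents contain very long sentences, one hedged precaution is to split them before classification. The hypothetical helper `split_into_windows` below shows plain fixed-size splitting of a token list; it is an illustration, not a feature of the library.

```python
from typing import List


def split_into_windows(tokens: List[str], max_len: int = 128) -> List[List[str]]:
    """Split a token list into consecutive windows of at most max_len tokens.

    Hypothetical helper: when counting words instead of subword tokens,
    pass a max_len comfortably below the model's limit.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    # Slice the list in steps of max_len; the last window may be shorter
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


if __name__ == "__main__":
    words = [f"w{i}" for i in range(300)]
    print([len(window) for window in split_into_windows(words, max_len=100)])
```

Note that a naive cut can separate one entity, such as a long law reference, across two windows; splitting at clause or punctuation boundaries is preferable where possible.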

References

This model was trained on in-house annotations, the available annotations of this dataset, and weak labelling from this model.

Benchmarking

           label  precision    recall  f1-score   support
             LAW       0.95      0.96      0.96        20
           MONEY       0.98      0.99      0.99       106
        accuracy         -         -       0.98       126
       macro-avg       0.97      0.98      0.97       126
    weighted-avg       0.98      0.99      0.99       126
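The two summary rows follow the usual classification-report convention: `macro-avg` is the unweighted mean of the per-label scores, and `weighted-avg` weights each label by its support (20 LAW and 106 MONEY examples). The sketch below recomputes them from the per-label rows. Because the table's scores are already rounded to two decimals, the recomputed averages can differ from the reported ones by up to about 0.01.

```python
from typing import Dict, Tuple

# Per-label (precision, recall, f1, support), copied from the Benchmarking table
SCORES: Dict[str, Tuple[float, float, float, int]] = {
    "LAW": (0.95, 0.96, 0.96, 20),
    "MONEY": (0.98, 0.99, 0.99, 106),
}


def macro_avg(scores: Dict[str, Tuple[float, float, float, int]]) -> Tuple[float, ...]:
    """Unweighted mean of precision, recall, and f1 across labels."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


def weighted_avg(scores: Dict[str, Tuple[float, float, float, int]]) -> Tuple[float, ...]:
    """Support-weighted mean of precision, recall, and f1 across labels."""
    total = sum(s[3] for s in scores.values())
    return tuple(sum(s[i] * s[3] for s in scores.values()) / total for i in range(3))


if __name__ == "__main__":
    print("macro-avg   ", [round(x, 3) for x in macro_avg(SCORES)])
    print("weighted-avg", [round(x, 3) for x in weighted_avg(SCORES)])
```

The weighted average sits close to the MONEY scores because MONEY accounts for 106 of the 126 evaluated entities, so the LAW figures, based on only 20 examples, carry more uncertainty.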