Description
Legal RoBERTa Named Entity Recognition model for Spanish, able to recognize the following entities:
- LEY: Law
- TRAT_INTL: International Treaty (Agreement)
This model was originally trained on the scjn dataset, available here, and fine-tuned on internal documents, improving the coverage of the original version, published here.
Predicted Entities
- LEY
- TRAT_INTL
How to use
documentAssembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

tokenClassifier = nlp.RoBertaForTokenClassification.pretrained("legner_laws_treaties", "es", "legal/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("ner")

# NerConverter merges the token-level IOB tags into entity chunks,
# producing the "ner_chunk" column consumed below.
nerConverter = nlp.NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(
    stages=[documentAssembler,
            sentenceDetector,
            tokenizer,
            tokenClassifier,
            nerConverter])

text = "Sin perjuicio de lo dispuesto en el párrafo b), los requisitos y los efectos de una reivindicación de prioridad presentada conforme al párrafo 1), serán los establecidos en el Artículo 4 del Acta de Estocolmo del Convenio de París para la Protección de la Propiedad Industrial."

# Fit on an empty DataFrame (all stages are pretrained), then annotate with a LightPipeline.
data = spark.createDataFrame([[""]]).toDF("text")
fitmodel = pipeline.fit(data)
light_model = LightPipeline(fitmodel)
light_result = light_model.fullAnnotate(text)

chunks = []
entities = []
for n in light_result[0]['ner_chunk']:
    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    print(f"{n.result} ({n.metadata['entity']})")
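Under the hood, the model emits one IOB tag per token and the NerConverter stage merges consecutive tags into chunks. A minimal pure-Python sketch of that merging logic, assuming IOB2 tags like B-LEY/I-LEY (the tokens and tags below are illustrative, not actual model output):

```python
def merge_iob_chunks(tokens, tags):
    """Group (token, IOB tag) pairs into (chunk_text, entity) tuples."""
    chunks = []
    current_tokens, current_entity = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk, closing any open one.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_entity))
            current_tokens, current_entity = [token], tag[2:]
        elif tag.startswith("I-") and current_entity == tag[2:]:
            # Continuation of the current chunk.
            current_tokens.append(token)
        else:
            # "O" tag (or an inconsistent I- tag) closes the open chunk.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_entity))
            current_tokens, current_entity = [], None
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_entity))
    return chunks

tokens = ["el", "Convenio", "de", "París", "y", "el", "Artículo", "4"]
tags = ["O", "B-TRAT_INTL", "I-TRAT_INTL", "I-TRAT_INTL", "O", "O", "B-LEY", "I-LEY"]
print(merge_iob_chunks(tokens, tags))
# [('Convenio de París', 'TRAT_INTL'), ('Artículo 4', 'LEY')]
```

This is only a sketch; in the pipeline above the same grouping is done by NerConverter, which also carries character offsets and confidence metadata.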
Results
para la Protección de la Propiedad Industrial. (TRAT_INTL)
Model Information
Model Name: legner_laws_treaties
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [ner]
Language: es
Size: 464.4 MB
Case sensitive: true
Max sentence length: 128
References
This model was originally trained on the scjn dataset, available here, and fine-tuned on scraped documents (such as this one), improving the coverage of the original version, published here.
Benchmarking
label          prec       rec        f1
Macro-average  0.9361195  0.9294152  0.9368145
Micro-average  0.9856711  0.9857456  0.9851656
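For context on the two rows above: macro-averaging takes the unweighted mean of per-label scores, while micro-averaging pools true/false positive and false negative counts across labels before computing a single score. A minimal sketch with hypothetical counts (illustrative only, not the benchmark data above):

```python
def prf(tp, fp, fn):
    """Precision, recall, F1 from true-positive, false-positive, false-negative counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

# Hypothetical per-label (tp, fp, fn) counts -- not taken from this model.
counts = {"LEY": (90, 10, 5), "TRAT_INTL": (40, 5, 10)}

# Macro-average: mean of the per-label F1 scores.
macro_f1 = sum(prf(*c)[2] for c in counts.values()) / len(counts)

# Micro-average: pool the counts across labels, then score once.
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())
micro_f1 = prf(tp, fp, fn)[2]
```

Micro-averaging weights frequent labels more heavily, which is why the two rows can differ noticeably when label frequencies are imbalanced.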