Description
This model is a Deep Learning Portuguese Named Entity Recognition model for the legal domain, trained using Base Bert Embeddings, and is able to predict the following entities:
- ORGANIZACAO (Organizations)
- JURISPRUDENCIA (Jurisprudence)
- PESSOA (Person)
- TEMPO (Time)
- LOCAL (Location)
- LEGISLACAO (Laws)
- O (Other)
You can find different versions of this model in Models Hub:
- With a Deep Learning architecture (non-transformer) and Base Embeddings;
- With a Deep Learning architecture (non-transformer) and Large Embeddings;
- With a Transformers Architecture and Base Embeddings;
- With a Transformers Architecture and Large Embeddings;
Predicted Entities
PESSOA
, ORGANIZACAO
, LEGISLACAO
, JURISPRUDENCIA
, TEMPO
, LOCAL
How to use
documentAssembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
.setInputCols("sentence")\
.setOutputCol("token")
tokenClassifier = legal.BertForTokenClassification.pretrained("legner_lener_base","pt", "legal/models")\
.setInputCols("token", "sentence")\
.setOutputCol("label")\
.setCaseSensitive(True)
ner_converter = nlp.NerConverter()\
.setInputCols(["sentence","token","label"])\
.setOutputCol("ner_chunk")
pipeline = nlp.Pipeline(
stages=[
documentAssembler,
sentenceDetector,
tokenizer,
tokenClassifier,
ner_converter
]
)
example = spark.createDataFrame(pd.DataFrame({'text': ["""Mediante do exposto , com fundamento nos artigos 32 , i , e 33 , da lei 8.443/1992 , submetem-se os autos à consideração superior , com posterior encaminhamento ao ministério público junto ao tcu e ao gabinete do relator , propondo : a ) conhecer do recurso e , no mérito , negar-lhe provimento ; b ) comunicar ao recorrente , ao superior tribunal militar e ao tribunal regional federal da 2ª região , a fim de fornecer subsídios para os processos judiciais 2001.34.00.024796-9 e 2003.34.00.044227-3 ; e aos demais interessados a deliberação que vier a ser proferida por esta corte ” ."""]}))
result = pipeline.fit(example).transform(example)
Results
+--------------+---------+----------+
| token|ner_label|confidence|
+--------------+---------+----------+
| Mediante| O|0.99998605|
| do| O| 0.9999868|
| exposto| O|0.99998623|
| ,| O| 0.999987|
| com| O|0.99998677|
| fundamento| O| 0.9999863|
| nos| O|0.99998486|
| artigos| I-TEMPO| 0.9995784|
| 32| B-LOCAL| 0.9998317|
| ,| B-LOCAL|0.99983853|
| i| B-LOCAL| 0.9998391|
| ,| B-LOCAL| 0.999842|
| e| B-LOCAL| 0.9998447|
| 33| B-LOCAL| 0.9998419|
| ,| B-LOCAL| 0.9998423|
| da| B-LOCAL| 0.9998431|
| lei| B-LOCAL| 0.9998434|
| 8.443/1992| B-LOCAL|0.99982893|
| ,| O| 0.9999863|
| submetem-se| O|0.99998677|
| os| O| 0.9999873|
| autos| O|0.99998647|
| à| O|0.99998707|
| consideração| O| 0.9999871|
| superior| O| 0.9999868|
| ,| O|0.99998736|
| com| O| 0.9999876|
| posterior| O|0.99998707|
|encaminhamento| O|0.99998724|
| ao| O|0.99998707|
| ministério| O| 0.9999853|
| público| O| 0.9999854|
| junto| O|0.99998665|
| ao| O|0.99998516|
| tcu| O| 0.9993648|
| e| O|0.99998665|
| ao| O|0.99998677|
| gabinete| O| 0.9999856|
| do| O| 0.9999865|
| relator| O|0.99998575|
| ,| O| 0.9999872|
| propondo| O|0.99998724|
| :| O|0.99998707|
| a| O| 0.9999873|
| )| O| 0.9999873|
| conhecer| O|0.99998724|
| do| O| 0.9999872|
| recurso| O| 0.9999867|
| e| O| 0.9999872|
| ,| O| 0.9999869|
| no| O|0.99998695|
| mérito| O| 0.9999872|
| ,| O| 0.9999873|
| negar-lhe| O| 0.9999875|
| provimento| O|0.99998724|
| ;| O| 0.9999865|
| b| O|0.99998635|
| )| O| 0.9999871|
| comunicar| O| 0.9999869|
| ao| O| 0.9999872|
| recorrente| O| 0.9999854|
| ,| O| 0.999987|
| ao| O| 0.999987|
| superior| O| 0.9999805|
| tribunal| O|0.99998057|
| militar| O| 0.9999655|
| e| O|0.99998677|
| ao| O|0.99998665|
| tribunal| O|0.99996954|
| regional| O| 0.9999731|
| federal| O| 0.9999361|
| da| O| 0.9999758|
| 2ª| O| 0.9999704|
| região| O|0.99994576|
| ,| O| 0.999987|
| a| O| 0.9999872|
| fim| O|0.99998724|
| de| O| 0.999987|
| fornecer| O|0.99998724|
| subsídios| O| 0.9999871|
| para| O| 0.9999867|
| os| O| 0.9999863|
| processos| O| 0.9999849|
| judiciais| O| 0.9999815|
| 2001| O|0.99994475|
| .| O|0.99998444|
|34.00.024796-9| O| 0.9999273|
| e| O| 0.9999757|
| 2003| O| 0.9908976|
| .| O|0.99998164|
|34.00.044227-3| O| 0.9999851|
| ;| O| 0.9999866|
| e| O|0.99998695|
| aos| O| 0.9999869|
| demais| O|0.99998677|
| interessados| O| 0.9999867|
| a| O|0.99998707|
| deliberação| O|0.99998724|
| que| O| 0.9999871|
| vier| O| 0.9999868|
| a| O| 0.9999867|
| ser| O| 0.9999872|
| proferida| O| 0.9999871|
| por| O|0.99998695|
| esta| O|0.99998677|
| corte| O|0.99998224|
| ”| O| 0.9999714|
| .| O|0.99998647|
+--------------+---------+----------+
Model Information
Model Name: | legner_lener_base |
Compatibility: | Legal NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token] |
Output Labels: | [ner] |
Language: | pt |
Size: | 403.3 MB |
Case sensitive: | true |
Max sentence length: | 128 |
References
Original texts available in https://paperswithcode.com/sota?task=Token+Classification&dataset=lener_br and in-house data augmentation with weak labelling