Description
This is a Deep Learning Portuguese Named Entity Recognition (NER) model for the legal domain, trained using Base BERT Embeddings. It predicts the following entities:
- ORGANIZACAO (Organizations)
- JURISPRUDENCIA (Jurisprudence)
- PESSOA (Person)
- TEMPO (Time)
- LOCAL (Location)
- LEGISLACAO (Laws)
- O (Other, for tokens outside any entity)
You can find different versions of this model on the Models Hub:
- With a Deep Learning architecture (non-transformer) and Base Embeddings
- With a Deep Learning architecture (non-transformer) and Large Embeddings
- With a Transformers architecture and Base Embeddings
- With a Transformers architecture and Large Embeddings
Predicted Entities
PESSOA, ORGANIZACAO, LEGISLACAO, JURISPRUDENCIA, TEMPO, LOCAL
How to use
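# Assumes the johnsnowlabs package, which exposes the nlp and legal modules.
from johnsnowlabs import nlp, legal
import pandas as pd

# Start (or reuse) a Spark session with the licensed libraries loaded.
spark = nlp.start()

# Wrap raw text into document annotations.
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences.
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split sentences into tokens.
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Assign one label per token.
tokenClassifier = legal.BertForTokenClassification.pretrained("legner_lener_large", "pt", "legal/models")\
    .setInputCols(["token", "sentence"])\
    .setOutputCol("label")\
    .setCaseSensitive(True)

# Merge token labels into entity chunks.
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "label"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(
    stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        tokenClassifier,
        ner_converter
    ]
)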
example = spark.createDataFrame(pd.DataFrame({'text': ["""Mediante do exposto , com fundamento nos artigos 32 , i , e 33 , da lei 8.443/1992 , submetem-se os autos à consideração superior , com posterior encaminhamento ao ministério público junto ao tcu e ao gabinete do relator , propondo : a ) conhecer do recurso e , no mérito , negar-lhe provimento ; b ) comunicar ao recorrente , ao superior tribunal militar e ao tribunal regional federal da 2ª região , a fim de fornecer subsídios para os processos judiciais 2001.34.00.024796-9 e 2003.34.00.044227-3 ; e aos demais interessados a deliberação que vier a ser proferida por esta corte ” ."""]}))
result = pipeline.fit(example).transform(example)
Results
+--------------+-------------+----------+
| token| ner_label|confidence|
+--------------+-------------+----------+
| Mediante| O|0.99998903|
| do| O|0.99999386|
| exposto| O|0.99999356|
| ,| O|0.99998516|
| com| O| 0.9999937|
| fundamento| O|0.99998814|
| nos| O|0.99998933|
| artigos| I-TEMPO| 0.9768946|
| 32| B-LOCAL| 0.9833129|
| ,| B-LOCAL| 0.9897361|
| i| B-LOCAL| 0.9860687|
| ,| B-LOCAL|0.99019605|
| e| B-LOCAL| 0.988641|
| 33| B-LOCAL|0.98958844|
| ,| B-LOCAL| 0.989682|
| da| B-LOCAL|0.97983617|
| lei| B-LOCAL| 0.9777896|
| 8.443/1992| B-LOCAL|0.94548935|
| ,| O| 0.9997625|
| submetem-se| O|0.99999225|
| os| O|0.99999356|
| autos| O|0.99999285|
| à| O| 0.9999936|
| consideração| O| 0.9999945|
| superior| O| 0.9999938|
| ,| O|0.99999297|
| com| O| 0.9999949|
| posterior| O|0.99999535|
|encaminhamento| O|0.99999404|
| ao| O| 0.9999939|
| ministério| O|0.99998385|
| público| O|0.99997985|
| junto| O| 0.9999902|
| ao| O| 0.9999913|
| tcu| O| 0.9961068|
| e| O| 0.9999804|
| ao| O|0.99999124|
| gabinete| O| 0.9999747|
| do| O| 0.9999911|
| relator| O|0.99999297|
| ,| O|0.99998975|
| propondo| O| 0.9999942|
| :| O|0.99999416|
| a| O| 0.9999926|
| )| O| 0.999994|
| conhecer| O|0.99999493|
| do| O|0.99999475|
| recurso| O|0.99999416|
| e| O| 0.999994|
| ,| O| 0.9999923|
| no| O| 0.9999945|
| mérito| O|0.99999404|
| ,| O| 0.9999926|
| negar-lhe| O|0.99999475|
| provimento| O|0.99999505|
| ;| O| 0.9999914|
| b| O| 0.9999917|
| )| O| 0.9999943|
| comunicar| O|0.99999446|
| ao| O| 0.9999935|
| recorrente| O| 0.9999941|
| ,| O| 0.9999869|
| ao| O|0.99999243|
| superior| O| 0.9933063|
| tribunal| O|0.83631223|
| militar| O|0.76226306|
| e| O| 0.9988131|
| ao| O| 0.9999895|
| tribunal|B-ORGANIZACAO|0.94998056|
| regional| I-LEGISLACAO|0.91478646|
| federal| I-LEGISLACAO| 0.9775761|
| da| I-LEGISLACAO| 0.9674108|
| 2ª| I-LEGISLACAO| 0.9871655|
| região| I-LEGISLACAO|0.99471426|
| ,| O| 0.9999918|
| a| O|0.99999434|
| fim| O|0.99999356|
| de| O| 0.9999942|
| fornecer| O| 0.9999948|
| subsídios| O|0.99999213|
| para| O| 0.9999924|
| os| O| 0.9999925|
| processos| O|0.99998784|
| judiciais| O| 0.999987|
| 2001| O|0.99967766|
| .| O| 0.998813|
|34.00.024796-9| O| 0.9933802|
| e| O| 0.999508|
| 2003|I-ORGANIZACAO|0.51847184|
| .| O| 0.99998|
|34.00.044227-3| O| 0.9999412|
| ;| O| 0.9999937|
| e| O| 0.9999936|
| aos| O| 0.9999932|
| demais| O| 0.9999952|
| interessados| O|0.99999493|
| a| O| 0.999994|
| deliberação| O| 0.9999939|
| que| O| 0.9999942|
| vier| O| 0.9999944|
| a| O| 0.9999935|
| ser| O| 0.9999951|
| proferida| O| 0.9999954|
| por| O| 0.9999936|
| esta| O|0.99999356|
| corte| O|0.99992704|
| ”| O| 0.9994554|
| .| O| 0.9993955|
+--------------+-------------+----------+
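The labels in the table carry B- and I- prefixes, where B- opens an entity, I- continues it, and O marks tokens outside any entity. As a rough illustration of how such token labels can be merged into entity chunks, the sketch below groups a few rows copied from the table. The helper `group_chunks` is hypothetical and not part of Spark NLP, and its rule (a B- tag or a change of entity type starts a new chunk) is an assumption, not a description of how NerConverter behaves.

```python
from typing import List, Optional, Tuple

# One row per token: (token, label, confidence), as in the table columns.
Row = Tuple[str, str, float]
# One entry per chunk: (entity type, chunk text, lowest token confidence).
Chunk = Tuple[str, str, float]


def group_chunks(rows: List[Row]) -> List[Chunk]:
    """Merge consecutive B-/I- tokens of the same entity type into chunks.

    Hypothetical helper for illustration only; the grouping rule is an
    assumption and may differ from what NerConverter does.
    """
    chunks: List[Chunk] = []
    tokens: List[str] = []
    ent_type: Optional[str] = None
    min_conf = 1.0

    def flush() -> None:
        # Emit the chunk collected so far (if any) and reset the state.
        nonlocal tokens, ent_type, min_conf
        if tokens:
            chunks.append((ent_type, " ".join(tokens), min_conf))
        tokens, ent_type, min_conf = [], None, 1.0

    for token, label, conf in rows:
        if label == "O":
            flush()
            continue
        prefix, _, current = label.partition("-")
        # A B- tag, or an I- tag of a different type, starts a new chunk.
        if prefix == "B" or current != ent_type:
            flush()
            ent_type = current
        tokens.append(token)
        min_conf = min(min_conf, conf)
    flush()
    return chunks


# Rows copied verbatim from the results table.
rows = [
    ("ao", "O", 0.9999895),
    ("tribunal", "B-ORGANIZACAO", 0.94998056),
    ("regional", "I-LEGISLACAO", 0.91478646),
    ("federal", "I-LEGISLACAO", 0.9775761),
    ("da", "I-LEGISLACAO", 0.9674108),
    ("2ª", "I-LEGISLACAO", 0.9871655),
    ("região", "I-LEGISLACAO", 0.99471426),
    (",", "O", 0.9999918),
]

chunks = group_chunks(rows)
for chunk in chunks:
    print(chunk)
```

On these rows the sketch produces two chunks, an ORGANIZACAO chunk for "tribunal" and a LEGISLACAO chunk for "regional federal da 2ª região", because the entity type changes between them. Keeping the lowest token confidence per chunk is one simple way to flag spans that may deserve a manual check.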
Model Information
Model Name: legner_lener_large
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token]
Output Labels: [ner]
Language: pt
Size: 1.2 GB
Case sensitive: true
Max sentence length: 128
References
Original texts are available at https://paperswithcode.com/sota?task=Token+Classification&dataset=lener_br, supplemented by in-house data augmentation with weak labelling.