Description
IMPORTANT: Don’t run this model on the whole legal agreement. Instead:
- Split by paragraphs. You can use notebook 1 in Finance or Legal as inspiration;
- Use the
legclf_introduction_clause
Text Classifier to select only these paragraphs;
This is a Legal NER Model, aimed to process the first page of the agreements when information can be found about:
- Parties of the contract/agreement;
- Aliases of those parties, or how those parties will be called further on in the document;
- Document Type;
- Effective Date of the agreement;
This model can be used all along with its Relation Extraction model to retrieve the relations between these entities, called legre_contract_doc_parties
Other models can be found to detect other parts of the document, as Headers/Subheaders, Signers, “Will-do”, etc.
This is a md
(medium version) of the classifier, trained with more data and being more resistent to false positives outside the specific section, which may help to run it at whole document level (although not recommended).
Predicted Entities
PARTY
, EFFDATE
, DOC
, ALIAS
How to use
documentAssembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en") \
.setInputCols("sentence", "token") \
.setOutputCol("embeddings")\
ner_model = legal.NerModel.pretrained('legner_contract_doc_parties_md', 'en', 'legal/models')\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
ner_converter = nlp.NerConverter()\
.setInputCols(["sentence","token","ner"])\
.setOutputCol("ner_chunk")
nlpPipeline = nlp.Pipeline(stages=[
documentAssembler,
sentenceDetector,
tokenizer,
embeddings,
ner_model,
ner_converter])
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
text = ["""
INTELLECTUAL PROPERTY AGREEMENT
This INTELLECTUAL PROPERTY AGREEMENT (this "Agreement"), dated as of December 31, 2018 (the "Effective Date") is entered into by and between Armstrong Flooring, Inc., a Delaware corporation ("Seller") and AFI Licensing LLC, a Delaware limited liability company ("Licensing" and together with Seller, "Arizona") and AHF Holding, Inc. (formerly known as Tarzan HoldCo, Inc.), a Delaware corporation ("Buyer") and Armstrong Hardwood Flooring Company, a Tennessee corporation (the "Company" and together with Buyer the "Buyer Entities") (each of Arizona on the one hand and the Buyer Entities on the other hand, a "Party" and collectively, the "Parties").
"""]
res = model.transform(spark.createDataFrame([text]).toDF("text"))
Results
+------------+---------+
| token|ner_label|
+------------+---------+
|INTELLECTUAL| B-DOC|
| PROPERTY| I-DOC|
| AGREEMENT| I-DOC|
| This| O|
|INTELLECTUAL| B-DOC|
| PROPERTY| I-DOC|
| AGREEMENT| I-DOC|
| (| O|
| this| O|
| "| O|
| Agreement| O|
| "),| O|
| dated| O|
| as| O|
| of| O|
| December|B-EFFDATE|
| 31|I-EFFDATE|
| ,|I-EFFDATE|
| 2018|I-EFFDATE|
| (| O|
| the| O|
| "| O|
| Effective| O|
| Date| O|
| ")| O|
| is| O|
| entered| O|
| into| O|
| by| O|
| and| O|
| between| O|
| Armstrong| B-PARTY|
| Flooring| I-PARTY|
| ,| I-PARTY|
| Inc| I-PARTY|
| .,| O|
| a| O|
| Delaware| O|
| corporation| O|
| ("| O|
| Seller| B-ALIAS|
| ")| O|
| and| O|
| AFI| B-PARTY|
| Licensing| I-PARTY|
| LLC| I-PARTY|
| ,| O|
| a| O|
| Delaware| O|
| limited| O|
| liability| O|
| company| O|
| ("| O|
| Licensing| B-ALIAS|
| "| O|
| and| O|
| together| O|
| with| O|
| Seller| B-ALIAS|
| ,| O|
| "| O|
| Arizona| B-ALIAS|
| ")| O|
| and| O|
| AHF| B-PARTY|
| Holding| I-PARTY|
| ,| I-PARTY|
| Inc| I-PARTY|
| .| O|
| (| O|
| formerly| O|
| known| O|
| as| O|
| Tarzan| O|
| HoldCo| O|
| ,| O|
| Inc| O|
| .),| O|
| a| O|
| Delaware| O|
| corporation| O|
| ("| O|
| Buyer| B-ALIAS|
| ")| O|
| and| O|
| Armstrong| B-PARTY|
| Hardwood| I-PARTY|
| Flooring| I-PARTY|
| Company| I-PARTY|
------------------------
Model Information
Model Name: | legner_contract_doc_parties_md |
Compatibility: | Legal NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 16.2 MB |
References
Manual annotations on CUAD dataset
Benchmarking
label tp fp fn prec rec f1
I-PARTY 413 57 64 0.8787234 0.8658281 0.8722281
B-EFFDATE 43 2 5 0.95555556 0.8958333 0.92473114
B-DOC 75 7 16 0.91463417 0.82417583 0.867052
I-EFFDATE 138 6 8 0.9583333 0.94520545 0.9517241
I-ALIAS 5 0 4 1.0 0.5555556 0.71428573
I-DOC 176 20 40 0.8979592 0.8148148 0.8543689
I-FORMER_PARTY_NAME 2 0 0 1.0 1.0 1.0
B-PARTY 141 21 34 0.8703704 0.8057143 0.8367952
B-FORMER_PARTY_NAME 1 0 0 1.0 1.0 1.0
B-ALIAS 66 4 11 0.94285715 0.85714287 0.89795923
Macro-average 66 4 11 0.94184333 0.856427 0.8971066
Micro-average 66 4 11 0.9005947 0.85346216 0.87639517