Classifying Proposal Comments

Description

Given a proposal on a socially important issue, the model classifies whether a comment is In_Favor, Against, or Neutral towards the proposal.

Predicted Entities

In_Favor, Neutral, Against

Copy S3 URI

How to use

document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["document"]) \
    .setOutputCol("token")

classifier = legal.BertForSequenceClassification.pretrained("legclf_bert_support_proposal", "en", "legal/models")\
    .setInputCols(["document", "token"])\
    .setOutputCol("class")

clf_pipeline = nlp.Pipeline(stages=[
    document_assembler, 
    tokenizer,
    classifier    
])

empty_df = spark.createDataFrame([['']]).toDF("text")

model = clf_pipeline.fit(empty_df)

sample_text = ["""This is one of the most boring movies I have ever seen, its horrible. Christopher Lee is good but he is hardly in it, the only good part is the opening scene. Don't be fooled by the title. "End of the World" is truly a bad movie, I stopped watching it close to the end it was so bad, only for die-hard b-movie fans that have the brain to stand this vomit.""",

"""Of course, there is still a lot of possible improvement in the pipeline, but we definitely don't have to wait for some genius new technology to start. Why am I so definitely against this proposal though it sounds so reasonable and helpful? I'm definitely against the notion that we'll have to wait for a new genius industrial technology to show up to even think of starting a proper transformation. In my opinion, the opposite is true: We have to start right now with what we have & by the way develop better concepts of how to use all the technology & methods already available optimally. And for me, nuclear energy which is - by the way - relaunched with this proposal, is definitely not part of the game, not even in the modular mini-nuke version of Mr. Gates. There are people who know much more about renewable energy than Mr. Gates & completely energy independent who hate that book because of this crap.""",

"""One common defense policy would strengthen the voice and influence in our own backyard. A strong EU army can be a stabilizing factor in the unstable regions around our continent. We Europeans should take our safety and defense into our own hands and not rely on the US to do it for us."""
]

test = spark.createDataFrame(pd.DataFrame({"text": sample_text}))

result = model.transform(test)

Results

+--------+--------------------+
|   class|            document|
+--------+--------------------+
| Neutral|This is one of the...|
| Against|Of course, there ...|
|In_Favor|One common defense...|
+--------+--------------------+

Model Information

Model Name: legclf_bert_support_proposal
Compatibility: Legal NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 403.0 MB
Case sensitive: true
Max sentence length: 512

References

Train dataset available here

Benchmarking

label         precision  recall  f1-score  support 
Against       0.84       0.87    0.86      85      
In_Favor      0.87       0.84    0.86      90      
Neutral       0.98       0.98    0.98      57      
accuracy      -          -       0.89      232     
macro-avg     0.90       0.90    0.90      232     
weighted-avg  0.89       0.89    0.89      232