Multilabel Classification of Customer Service (Linguistic features)

Description

This is a multilabel text classification model that classifies a customer service chat message according to its linguistic features. A single message can receive several labels at once. The classes are the following:

  • Q - Colloquial variation
  • P - Politeness variation
  • W - Offensive language
  • K - Keyword language
  • B - Basic syntactic structure
  • C - Coordinated syntactic structure
  • I - Interrogative structure
  • M - Morphological variation (plurals, tenses…)
  • L - Lexical variation (synonyms)
  • E - Expanded abbreviations (I’m -> I am, I’d -> I would…)
  • N - Negation
  • Z - Noise phenomena like spelling or punctuation errors

Predicted Entities

B, C, E, I, K, L, M, N, P, Q, W, Z
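Because the model is multilabel, downstream code often needs a fixed-length representation of a prediction. The following is a minimal sketch, not part of the model itself, that turns a set of predicted labels into a 0/1 indicator vector. The name `to_indicator_vector` and the example prediction are hypothetical; the label order is the one listed above.

```python
# Label order taken from the "Predicted Entities" list.
LABELS = ["B", "C", "E", "I", "K", "L", "M", "N", "P", "Q", "W", "Z"]


def to_indicator_vector(predicted):
    """Return a 0/1 list aligned with LABELS for one message's predictions."""
    chosen = set(predicted)
    unknown = chosen - set(LABELS)
    if unknown:
        raise ValueError(f"Unknown label(s): {sorted(unknown)}")
    return [1 if label in chosen else 0 for label in LABELS]


# Hypothetical prediction, for illustration only.
vector = to_indicator_vector(["B", "I", "Z"])
print(vector)  # [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
```

A vector like this can be compared position by position with a gold-standard vector when computing per-label metrics.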

How to use

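from johnsnowlabs import nlp

# Start a Spark session with the John Snow Labs libraries available
# (assumes a valid Finance NLP license is already configured).
spark = nlp.start()

# Wrap the raw text column into Spark NLP's document annotation.
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")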

embeddings = nlp.UniversalSentenceEncoder.pretrained() \
    .setInputCols("document") \
    .setOutputCol("sentence_embeddings")

# Download the pretrained multilabel classifier from the finance model repository.
docClassifier = nlp.MultiClassifierDLModel.pretrained("finmulticlf_customer_service_lin_features", "en", "finance/models") \
    .setInputCols("sentence_embeddings") \
    .setOutputCol("class")

pipeline = nlp.Pipeline().setStages(
      [
        document_assembler,
        embeddings,
        docClassifier
      ]
    )

empty_data = spark.createDataFrame([[""]]).toDF("text")
model = pipeline.fit(empty_data)
light_model = nlp.LightPipeline(model)

result = light_model.annotate("""What do i have to ddo to cancel a Gold account""")

result["class"]

Results

['Q', 'B', 'L', 'Z', 'I']
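The single-letter codes are compact but not self-explanatory. As a small sketch, the class list from the Description can be stored in a dictionary and used to expand a prediction into readable text. The names `LABEL_DESCRIPTIONS` and `describe` are illustrative, and `predicted` stands in for the value of `result["class"]` shown above.

```python
# Descriptions copied from the class list in the Description section.
LABEL_DESCRIPTIONS = {
    "Q": "Colloquial variation",
    "P": "Politeness variation",
    "W": "Offensive language",
    "K": "Keyword language",
    "B": "Basic syntactic structure",
    "C": "Coordinated syntactic structure",
    "I": "Interrogative structure",
    "M": "Morphological variation",
    "L": "Lexical variation",
    "E": "Expanded abbreviations",
    "N": "Negation",
    "Z": "Noise phenomena (spelling or punctuation errors)",
}


def describe(labels):
    """Pair each predicted code with its description, keeping the input order."""
    return [(code, LABEL_DESCRIPTIONS[code]) for code in labels]


# Stand-in for result["class"] from the example above.
predicted = ["Q", "B", "L", "Z", "I"]
for code, description in describe(predicted):
    print(f"{code}: {description}")
```

For the example message, this prints one line per label, starting with `Q: Colloquial variation`.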

Model Information

Model Name: finmulticlf_customer_service_lin_features
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Size: 13.0 MB

References

https://github.com/bitext/customer-support-intent-detection-training-dataset

Benchmarking

label         precision  recall  f1-score  support 
B             1.00       1.00    1.00      485     
C             0.79       0.80    0.80      61      
E             0.74       0.89    0.80      44      
I             0.95       0.94    0.94      134     
K             0.96       0.96    0.96      108     
L             0.96       0.97    0.96      402     
M             0.93       0.93    0.93      134     
N             0.90       0.75    0.82      12      
P             0.77       0.90    0.83      30      
Q             0.73       0.68    0.71      212     
W             0.85       0.88    0.87      33      
Z             0.68       0.72    0.70      160     
micro-avg     0.90       0.90    0.90      1815    
macro-avg     0.85       0.87    0.86      1815    
weighted-avg  0.90       0.90    0.90      1815    
samples-avg   0.91       0.92    0.90      1815
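The macro and weighted averages in the table can be approximately reproduced from the per-label rows. The sketch below assumes the usual definitions: the macro average is the unweighted mean over labels, and the weighted average is the mean weighted by support. Because the inputs are already rounded to two decimals, the results only match the table to within rounding. The micro and samples averages need the raw prediction counts and cannot be recovered from these rows.

```python
# Per-label rows from the benchmark table: (label, precision, recall, f1, support).
ROWS = [
    ("B", 1.00, 1.00, 1.00, 485),
    ("C", 0.79, 0.80, 0.80, 61),
    ("E", 0.74, 0.89, 0.80, 44),
    ("I", 0.95, 0.94, 0.94, 134),
    ("K", 0.96, 0.96, 0.96, 108),
    ("L", 0.96, 0.97, 0.96, 402),
    ("M", 0.93, 0.93, 0.93, 134),
    ("N", 0.90, 0.75, 0.82, 12),
    ("P", 0.77, 0.90, 0.83, 30),
    ("Q", 0.73, 0.68, 0.71, 212),
    ("W", 0.85, 0.88, 0.87, 33),
    ("Z", 0.68, 0.72, 0.70, 160),
]

total_support = sum(row[4] for row in ROWS)


def macro_average(column):
    """Unweighted mean of one metric column (1=precision, 2=recall, 3=f1)."""
    return sum(row[column] for row in ROWS) / len(ROWS)


def weighted_average(column):
    """Mean of one metric column, weighted by each label's support."""
    return sum(row[column] * row[4] for row in ROWS) / total_support


print("total support:", total_support)
for name, column in (("precision", 1), ("recall", 2), ("f1-score", 3)):
    print(f"{name}: macro={macro_average(column):.3f} "
          f"weighted={weighted_average(column):.3f}")
```

The gap between the macro and weighted figures reflects the class imbalance: the frequent labels B and L score high, while the less reliable labels Q and Z pull the unweighted mean down.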