Extract aspects and entities from airline questions (ATIS dataset)

Description

Extract important aspects from user questions, such as ‘arrive_date’ (the arrival date of a flight) and ‘fare_amount’ (fare information), along with 77 other types of aspects and entities, to automate question answering.

Predicted Entities

aircraft_code, airline_code, airline_name, airport_code, airport_name, arrive_date.date_relative, arrive_date.day_name, arrive_date.day_number, arrive_date.month_name, arrive_date.today_relative, arrive_time.end_time, arrive_time.period_mod, arrive_time.period_of_day, arrive_time.start_time, arrive_time.time, arrive_time.time_relative, city_name, class_type, connect, cost_relative, day_name, day_number, days_code, depart_date.date_relative, depart_date.day_name, depart_date.day_number, depart_date.month_name, depart_date.today_relative, depart_date.year, depart_time.end_time, depart_time.period_mod, depart_time.period_of_day, depart_time.start_time, depart_time.time, depart_time.time_relative, economy, fare_amount, fare_basis_code, flight_days, flight_mod, flight_number, flight_stop, flight_time, fromloc.airport_code, fromloc.airport_name, fromloc.city_name, fromloc.state_code, fromloc.state_name, meal, meal_code, meal_description, mod, month_name, or, period_of_day, restriction_code, return_date.date_relative, return_date.day_name, return_date.day_number, return_date.month_name, return_date.today_relative, return_time.period_mod, return_time.period_of_day, round_trip, state_code, state_name, stoploc.airport_name, stoploc.city_name, stoploc.state_code, time, time_relative, today_relative, toloc.airport_code, toloc.airport_name, toloc.city_name, toloc.country_name, toloc.state_code, toloc.state_name, transport_type.
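
Many of the labels above are dotted, with a prefix such as fromloc, toloc or depart_date in front of a slot name such as city_name. The sketch below shows one way to split a label into those two parts so that downstream code can treat, for example, every city_name uniformly. Reading the prefix as a "role" is an assumption based on the label names, and `split_label` is a hypothetical helper, not part of Spark NLP.

```python
from collections import defaultdict


def split_label(label):
    """Split a dotted label into (role, slot); role is None for undotted labels."""
    role, _, slot = label.rpartition(".")
    return (role or None, slot)


# A handful of labels copied from the list above.
labels = [
    "fromloc.city_name",
    "toloc.city_name",
    "stoploc.city_name",
    "city_name",
    "round_trip",
    "depart_date.day_name",
]

# Group the roles under which each slot name occurs.
roles_by_slot = defaultdict(list)
for label in labels:
    role, slot = split_label(label)
    roles_by_slot[slot].append(role)

for slot, roles in roles_by_slot.items():
    print(slot, roles)
```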

How to use

...
embeddings = WordEmbeddingsModel.pretrained("glove_840B_300", "xx")\
          .setInputCols("document", "token") \
          .setOutputCol("embeddings")
ner = NerDLModel.pretrained("nerdl_atis_840b_300d", "en") \
        .setInputCols(["document", "token", "embeddings"]) \
        .setOutputCol("ner")
...
pipeline = Pipeline(stages=[document_assembler, tokenizer, embeddings, ner, ner_converter])
example = spark.createDataFrame(pd.DataFrame({'text': ["i want to fly from baltimore to dallas round trip"]}))
result = pipeline.fit(example).transform(example)
...
val embeddings = WordEmbeddingsModel.pretrained("glove_840B_300", "xx")
          .setInputCols(Array("document", "token")) 
          .setOutputCol("embeddings")
val ner = NerDLModel.pretrained("nerdl_atis_840b_300d", "en")
        .setInputCols(Array("document", "token", "embeddings"))
        .setOutputCol("ner")
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings, ner, ner_converter))
val data = Seq("i want to fly from baltimore to dallas round trip").toDS.toDF("text")
val result = pipeline.fit(data).transform(data)

Results

+------------+-------------------+
| ner_chunk  | label             |
+------------+-------------------+
| baltimore  | fromloc.city_name |
| dallas     | toloc.city_name   |
| round trip | round_trip        |
+------------+-------------------+
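
The chunks in the table above are spans merged from token-level tags. The sketch below illustrates that merging step in plain Python, assuming the model emits IOB tags (the B- and I- prefixes listed under Benchmarking). `merge_iob` is a hypothetical helper written for illustration, not the Spark NLP converter, and the tags are written by hand to match the table rather than produced by the model.

```python
def merge_iob(tokens, tags):
    """Merge token-level IOB tags into (chunk_text, label) pairs."""
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Emit the chunk collected so far, if any.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag, or an I- tag that does not continue the open chunk,
            # starts a new chunk.
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continue the open chunk.
            current_tokens.append(token)
        else:
            # "O": the token is outside any entity.
            flush()
            current_tokens, current_label = [], None
    flush()
    return chunks


tokens = "i want to fly from baltimore to dallas round trip".split()
# Hand-written tags (an assumption), chosen to reproduce the Results table.
tags = ["O", "O", "O", "O", "O", "B-fromloc.city_name", "O",
        "B-toloc.city_name", "B-round_trip", "I-round_trip"]

for chunk, label in merge_iob(tokens, tags):
    print(f"{chunk:<12}{label}")
```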

Model Information

Model Name: nerdl_atis_840b_300d
Type: ner
Compatibility: Spark NLP 2.7.1+
License: Open Source
Edition: Official
Input Labels: [document, token, embeddings]
Output Labels: [ner]
Language: en
Dependencies: glove_840B_300

Data Source

This model was trained on the ATIS dataset: https://www.kaggle.com/hassanamin/atis-airlinetravelinformationsystem

Benchmarking

|     | label                        |    tp |   fp |   fn |     prec |      rec |       f1 |
|----:|-----------------------------:|------:|-----:|-----:|---------:|---------:|---------:|
|   0 | B-airline_name               |   101 |    5 |    0 | 0.95283  | 1        | 0.975845 |
|   1 | B-booking_class              |     0 |    0 |    1 | 0        | 0        | 0        |
|   2 | B-toloc.state_code           |    18 |    0 |    0 | 1        | 1        | 1        |
|   3 | B-arrive_time.period_of_day  |     6 |    2 |    0 | 0.75     | 1        | 0.857143 |
|   4 | B-flight_number              |    11 |    5 |    0 | 0.6875   | 1        | 0.814815 |
|   5 | B-depart_date.year           |     3 |    0 |    0 | 1        | 1        | 1        |
|   6 | I-depart_time.time_relative  |     0 |    0 |    1 | 0        | 0        | 0        |
|   7 | I-depart_time.time           |    52 |    1 |    0 | 0.981132 | 1        | 0.990476 |
|   8 | B-transport_type             |    10 |    0 |    0 | 1        | 1        | 1        |
|   9 | I-depart_date.day_number     |    14 |    1 |    1 | 0.933333 | 0.933333 | 0.933333 |
|  10 | I-city_name                  |    12 |    1 |   18 | 0.923077 | 0.4      | 0.55814  |
|  11 | B-depart_time.end_time       |     2 |    0 |    1 | 1        | 0.666667 | 0.8      |
|  12 | B-depart_date.date_relative  |    17 |    2 |    0 | 0.894737 | 1        | 0.944445 |
|  13 | B-toloc.country_name         |     1 |    0 |    0 | 1        | 1        | 1        |
|  14 | I-depart_time.start_time     |     1 |    0 |    0 | 1        | 1        | 1        |
|  15 | B-flight_stop                |    21 |    0 |    0 | 1        | 1        | 1        |
|  16 | B-airline_code               |    27 |    1 |    7 | 0.964286 | 0.794118 | 0.870968 |
|  17 | B-stoploc.city_name          |    20 |    1 |    0 | 0.952381 | 1        | 0.97561  |
|  18 | I-fromloc.airport_name       |    15 |   19 |    0 | 0.441176 | 1        | 0.612245 |
|  19 | B-round_trip                 |    70 |    0 |    3 | 1        | 0.958904 | 0.979021 |
|  20 | B-day_name                   |     1 |    0 |    1 | 1        | 0.5      | 0.666667 |
|  21 | B-depart_time.time_relative  |    63 |    2 |    2 | 0.969231 | 0.969231 | 0.969231 |
|  22 | B-fromloc.airport_name       |    10 |   12 |    2 | 0.454545 | 0.833333 | 0.588235 |
|  23 | B-meal_code                  |     0 |    1 |    1 | 0        | 0        | 0        |
|  24 | B-fromloc.airport_code       |     5 |    6 |    0 | 0.454545 | 1        | 0.625    |
|  25 | B-arrive_time.start_time     |     8 |    1 |    0 | 0.888889 | 1        | 0.941177 |
|  26 | I-state_name                 |     0 |    0 |    1 | 0        | 0        | 0        |
|  27 | I-arrive_time.time           |    34 |    0 |    1 | 1        | 0.971429 | 0.985507 |
|  28 | B-airport_code               |     3 |    1 |    6 | 0.75     | 0.333333 | 0.461538 |
|  29 | B-depart_time.start_time     |     2 |    0 |    1 | 1        | 0.666667 | 0.8      |
|  30 | B-arrive_time.time_relative  |    27 |    3 |    4 | 0.9      | 0.870968 | 0.885246 |
|  31 | I-cost_relative              |     2 |    0 |    1 | 1        | 0.666667 | 0.8      |
|  32 | B-flight                     |     0 |    0 |    1 | 0        | 0        | 0        |
|  33 | B-toloc.state_name           |    25 |    3 |    3 | 0.892857 | 0.892857 | 0.892857 |
|  34 | I-arrive_time.start_time     |     1 |    0 |    0 | 1        | 1        | 1        |
|  35 | B-compartment                |     0 |    0 |    1 | 0        | 0        | 0        |
|  36 | B-toloc.airport_code         |     4 |    0 |    0 | 1        | 1        | 1        |
|  37 | B-connect                    |     6 |    0 |    0 | 1        | 1        | 1        |
|  38 | I-return_date.date_relative  |     2 |    0 |    1 | 1        | 0.666667 | 0.8      |
|  39 | I-depart_time.end_time       |     2 |    0 |    1 | 1        | 0.666667 | 0.8      |
|  40 | B-meal                       |    16 |    1 |    0 | 0.941176 | 1        | 0.969697 |
|  41 | B-arrive_date.month_name     |     5 |    2 |    1 | 0.714286 | 0.833333 | 0.769231 |
|  42 | B-arrive_date.day_number     |     5 |    2 |    1 | 0.714286 | 0.833333 | 0.769231 |
|  43 | I-toloc.state_name           |     1 |    0 |    0 | 1        | 1        | 1        |
|  44 | B-arrive_date.date_relative  |     2 |    0 |    0 | 1        | 1        | 1        |
|  45 | I-fromloc.state_name         |     1 |    0 |    0 | 1        | 1        | 1        |
|  46 | B-depart_time.period_of_day  |   121 |    1 |    9 | 0.991803 | 0.930769 | 0.960317 |
|  47 | I-toloc.airport_name         |     3 |    2 |    0 | 0.6      | 1        | 0.75     |
|  48 | B-class_type                 |    24 |    1 |    0 | 0.96     | 1        | 0.979592 |
|  49 | B-period_of_day              |     3 |    0 |    1 | 1        | 0.75     | 0.857143 |
|  50 | I-flight_number              |     0 |    0 |    1 | 0        | 0        | 0        |
|  51 | B-stoploc.airport_code       |     0 |    0 |    1 | 0        | 0        | 0        |
|  52 | B-flight_mod                 |    23 |    0 |    1 | 1        | 0.958333 | 0.978723 |
|  53 | I-airport_name               |    13 |    2 |   16 | 0.866667 | 0.448276 | 0.590909 |
|  54 | B-arrive_time.time           |    33 |    2 |    1 | 0.942857 | 0.970588 | 0.956522 |
|  55 | B-arrive_date.day_name       |    11 |    3 |    0 | 0.785714 | 1        | 0.88     |
|  56 | I-restriction_code           |     3 |    0 |    0 | 1        | 1        | 1        |
|  57 | I-flight_time                |     1 |    0 |    0 | 1        | 1        | 1        |
|  58 | B-meal_description           |    10 |    0 |    0 | 1        | 1        | 1        |
|  59 | I-flight_mod                 |     1 |    0 |    5 | 1        | 0.166667 | 0.285714 |
|  60 | B-flight_days                |    10 |    0 |    0 | 1        | 1        | 1        |
|  61 | I-stoploc.city_name          |    10 |    2 |    0 | 0.833333 | 1        | 0.909091 |
|  62 | B-economy                    |     6 |    0 |    0 | 1        | 1        | 1        |
|  63 | I-arrive_date.day_number     |     0 |    1 |    0 | 0        | 0        | 0        |
|  64 | B-toloc.airport_name         |     3 |    1 |    0 | 0.75     | 1        | 0.857143 |
|  65 | I-class_type                 |    17 |    0 |    0 | 1        | 1        | 1        |
|  66 | B-state_code                 |     1 |    0 |    0 | 1        | 1        | 1        |
|  67 | B-aircraft_code              |    27 |    0 |    6 | 1        | 0.818182 | 0.9      |
|  68 | B-days_code                  |     0 |    0 |    1 | 0        | 0        | 0        |
|  69 | B-or                         |     3 |    3 |    0 | 0.5      | 1        | 0.666667 |
|  70 | B-return_date.date_relative  |     1 |    1 |    2 | 0.5      | 0.333333 | 0.4      |
|  71 | B-fromloc.state_name         |    17 |    0 |    0 | 1        | 1        | 1        |
|  72 | B-depart_date.month_name     |    54 |    1 |    2 | 0.981818 | 0.964286 | 0.972973 |
|  73 | B-fromloc.city_name          |   703 |    6 |    1 | 0.991537 | 0.99858  | 0.995046 |
|  74 | B-restriction_code           |     4 |    2 |    0 | 0.666667 | 1        | 0.8      |
|  75 | B-state_name                 |     0 |    0 |    9 | 0        | 0        | 0        |
|  76 | B-city_name                  |    32 |    4 |   25 | 0.888889 | 0.561404 | 0.688172 |
|  77 | I-round_trip                 |    70 |    0 |    1 | 1        | 0.985915 | 0.992908 |
|  78 | I-transport_type             |     0 |    0 |    1 | 0        | 0        | 0        |
|  79 | B-depart_date.day_name       |   210 |    2 |    2 | 0.990566 | 0.990566 | 0.990566 |
|  80 | B-fare_basis_code            |    17 |    4 |    0 | 0.809524 | 1        | 0.894737 |
|  81 | I-arrive_time.end_time       |     8 |    1 |    0 | 0.888889 | 1        | 0.941177 |
|  82 | B-depart_date.day_number     |    53 |    1 |    2 | 0.981482 | 0.963636 | 0.972477 |
|  83 | B-return_date.day_name       |     0 |    0 |    2 | 0        | 0        | 0        |
|  84 | B-toloc.city_name            |   713 |   21 |    3 | 0.97139  | 0.99581  | 0.983448 |
|  85 | I-fare_amount                |     2 |    0 |    0 | 1        | 1        | 1        |
|  86 | B-fromloc.state_code         |    23 |    0 |    0 | 1        | 1        | 1        |
|  87 | B-cost_relative              |    36 |    0 |    1 | 1        | 0.972973 | 0.986301 |
|  88 | I-arrive_time.time_relative  |     0 |    0 |    4 | 0        | 0        | 0        |
|  89 | B-mod                        |     0 |    0 |    2 | 0        | 0        | 0        |
|  90 | B-flight_time                |     1 |    0 |    0 | 1        | 1        | 1        |
|  91 | B-arrive_time.end_time       |     8 |    1 |    0 | 0.888889 | 1        | 0.941177 |
|  92 | B-fare_amount                |     2 |    0 |    0 | 1        | 1        | 1        |
|  93 | I-toloc.city_name            |   263 |   12 |    2 | 0.956364 | 0.992453 | 0.974074 |
|  94 | I-fromloc.city_name          |   176 |    1 |    1 | 0.99435  | 0.99435  | 0.99435  |
|  95 | B-airport_name               |    10 |    2 |   11 | 0.833333 | 0.47619  | 0.606061 |
|  96 | I-airline_name               |    65 |    0 |    0 | 1        | 1        | 1        |
|  97 | I-depart_time.period_of_day  |     0 |    0 |    1 | 0        | 0        | 0        |
|  98 | B-depart_date.today_relative |     8 |    0 |    1 | 1        | 0.888889 | 0.941177 |
|  99 | B-depart_time.time           |    57 |    8 |    0 | 0.876923 | 1        | 0.934426 |
| 100 | B-depart_time.period_mod     |     5 |    0 |    0 | 1        | 1        | 1        |
| 101 | Macro-average                | 3487  | 157  |  176 | 0.768428 | 0.758601 | 0.763483 |
| 102 | Micro-average                | 3487  | 157  |  176 | 0.956916 | 0.951952 | 0.954427 |
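
As a reading aid for the table above, the sketch below recomputes precision, recall and F1 from the tp, fp and fn counts using the standard definitions. It assumes that a metric with a zero denominator is reported as 0, which matches rows such as B-booking_class. `prf` is a hypothetical helper. The last lines check that the reported macro F1 is consistent with the harmonic mean of the macro precision and recall; that is an observation about the numbers, not a statement of how the table was produced.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Row 0 of the table: B-airline_name.
airline_name = prf(tp=101, fp=5, fn=0)

# Micro-average: pool the counts over all labels, then compute the metrics.
micro = prf(tp=3487, fp=157, fn=176)

# Harmonic mean of the macro precision and recall reported in the table.
macro_p, macro_r = 0.768428, 0.758601
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

print("B-airline_name:", [round(x, 6) for x in airline_name])
print("Micro-average: ", [round(x, 6) for x in micro])
print("Macro F1:      ", round(macro_f1, 6))
```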