package coref

Type Members

  1. trait ReadSpanBertCorefTensorflowModel extends ReadTensorflowModel
  2. trait ReadablePretrainedSpanBertCorefModel extends ParamsAndFeaturesReadable[SpanBertCorefModel] with HasPretrained[SpanBertCorefModel]
  3. class SpanBertCorefModel extends AnnotatorModel[SpanBertCorefModel] with HasSimpleAnnotate[SpanBertCorefModel] with WriteTensorflowModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

    A coreference resolution model based on SpanBert

    A coreference resolution model identifies expressions which refer to the same entity in a text. For example, given a sentence "John told Mary he would like to borrow a book from her." the model will link "he" to "John" and "her" to "Mary".
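
    Span-based coreference models are commonly formulated so that each mention either picks an earlier mention as its antecedent or starts a new entity, and the clusters are what you get by following those links. The sketch below illustrates only that last grouping step, in plain Scala, for the example sentence. It assumes a hypothetical Mention type whose antecedent is an optional index of an earlier mention. It shows the general idea and is not SpanBertCorefModel's internal implementation or API.

```scala
// Hypothetical illustration only: not part of the Spark NLP API.
// Each mention optionally points (by index) to an earlier antecedent;
// a mention without an antecedent starts a new cluster.
final case class Mention(text: String, antecedent: Option[Int])

object AntecedentClusters {
  // Follow antecedent links back to a mention that has no antecedent.
  // Assumes links always point to an earlier mention, so there are no cycles.
  def root(mentions: IndexedSeq[Mention], i: Int): Int =
    mentions(i).antecedent match {
      case Some(j) => root(mentions, j)
      case None    => i
    }

  // Group mentions by their root mention, keeping document order
  // both across clusters and inside each cluster.
  def clusters(mentions: IndexedSeq[Mention]): Seq[Seq[String]] =
    mentions.indices
      .groupBy(i => root(mentions, i))
      .toSeq
      .sortBy(_._1)
      .map { case (_, idxs) => idxs.sorted.map(i => mentions(i).text) }

  def main(args: Array[String]): Unit = {
    // "John told Mary he would like to borrow a book from her."
    val mentions = IndexedSeq(
      Mention("John", None),
      Mention("Mary", None),
      Mention("he", Some(0)), // he -> John
      Mention("her", Some(1)) // her -> Mary
    )
    clusters(mentions).foreach(c => println(c.mkString(" <- ")))
  }
}
```

    Run as written, main prints one cluster per line: John <- he, then Mary <- her.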

    This model is based on SpanBert and has been fine-tuned on the OntoNotes 5.0 data set.

    Pretrained models can be loaded with the pretrained method of the companion object:

    val corefResolution = SpanBertCorefModel.pretrained()
      .setInputCols("sentence", "token")
      .setOutputCol("corefs")

    If no name is provided, the default model "spanbert_base_coref" is used. For the available pretrained models, please see the Models Hub.

    Sources:

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
    import com.johnsnowlabs.nlp.annotators.coref.SpanBertCorefModel
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val sentence = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")
    
    val tokenizer = new Tokenizer()
      .setInputCols("sentence")
      .setOutputCol("token")
    
    val corefResolution = SpanBertCorefModel.pretrained()
      .setInputCols("sentence", "token")
      .setOutputCol("corefs")
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      sentence,
      tokenizer,
      corefResolution
    ))
    
    val data = Seq(
      "John told Mary he would like to borrow a book from her."
    ).toDF("text")
    
    val result = pipeline.fit(data).transform(data)
    
    result.selectExpr("explode(corefs) AS coref")
      .selectExpr("coref.result as token", "coref.metadata").show(truncate = false)
    +-----+------------------------------------------------------------------------------------+
    |token|metadata                                                                            |
    +-----+------------------------------------------------------------------------------------+
    |John |{head.sentence -> -1, head -> ROOT, head.begin -> -1, head.end -> -1, sentence -> 0}|
    |he   |{head.sentence -> 0, head -> John, head.begin -> 0, head.end -> 3, sentence -> 0}   |
    |Mary |{head.sentence -> -1, head -> ROOT, head.begin -> -1, head.end -> -1, sentence -> 0}|
    |her  |{head.sentence -> 0, head -> Mary, head.begin -> 10, head.end -> 13, sentence -> 0} |
    +-----+------------------------------------------------------------------------------------+
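
    The metadata above is enough to rebuild the coreference chains: in this output a mention whose head is ROOT starts a chain, and every other mention names its chain head together with the head's character offsets. The sketch below works on the metadata maps exactly as printed in the table, as plain Map[String, String] values rather than a Spark DataFrame; the CorefMention type and the chains and headText helpers are hypothetical and not part of the Spark NLP API. Judging from this example (John is 0 to 3, Mary is 10 to 13), head.end appears to be inclusive, so the sketch slices with end + 1; treat that as an assumption read off the output above.

```scala
// Hypothetical helpers for illustration; not part of the Spark NLP API.
// Each value mirrors one row of the table above: the mention text and its metadata.
final case class CorefMention(token: String, metadata: Map[String, String])

object CorefChains {
  // A mention whose head is "ROOT" is itself the head of a chain.
  def isHead(m: CorefMention): Boolean = m.metadata.get("head").contains("ROOT")

  // Map each head's text to the mentions that refer to it, in input order.
  def chains(mentions: Seq[CorefMention]): Map[String, Seq[String]] = {
    val heads = mentions.filter(isHead).map(_.token)
    val refs  = mentions.filterNot(isHead)
    heads.map(h => h -> refs.filter(_.metadata.get("head").contains(h)).map(_.token)).toMap
  }

  // Recover the head text from the document, assuming head.end is inclusive.
  // Returns None for chain heads, whose offsets are -1.
  def headText(text: String, m: CorefMention): Option[String] =
    for {
      b <- m.metadata.get("head.begin").map(_.toInt)
      e <- m.metadata.get("head.end").map(_.toInt)
      if b >= 0 && e >= b && e < text.length
    } yield text.substring(b, e + 1)

  def main(args: Array[String]): Unit = {
    val text = "John told Mary he would like to borrow a book from her."
    val root = Map("head" -> "ROOT", "head.begin" -> "-1", "head.end" -> "-1", "sentence" -> "0")
    val rows = Seq(
      CorefMention("John", root),
      CorefMention("he", Map("head" -> "John", "head.begin" -> "0", "head.end" -> "3", "sentence" -> "0")),
      CorefMention("Mary", root),
      CorefMention("her", Map("head" -> "Mary", "head.begin" -> "10", "head.end" -> "13", "sentence" -> "0"))
    )
    chains(rows).toSeq.sortBy(_._1).foreach { case (h, rs) => println(s"$h -> ${rs.mkString(", ")}") }
    rows.filterNot(isHead).foreach(m => println(s"${m.token} refers to ${headText(text, m).getOrElse("?")}"))
  }
}
```

    Keying chains on the head text is a simplification: two different entities with the same surface form would be merged. Keying on head.begin and head.end instead would keep them apart.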

Value Members

  1. object SpanBertCorefModel extends ReadablePretrainedSpanBertCorefModel with ReadSpanBertCorefTensorflowModel with Serializable

    This is the companion object of SpanBertCorefModel. Please refer to that class for the documentation.
