Licensed Annotators


Spark-NLP Licensed

The following annotators are available with a John Snow Labs Spark NLP license. They are mostly intended for healthcare applications, but other applications have been built with these NLP features as well. See the John Snow Labs documentation for more information.


It classifies each clinically relevant named entity by its assertion type: “present”, “absent”, “hypothetical”, “conditional”, “associated_with_other_person”, etc.

Input types: “sentence”, “ner_chunk”, “embeddings”

Output type: “assertion”


Functions:

  • setLabelCol(label)
  • setMaxIter(maxiter)
  • setReg(lambda)
  • setEnet(enet)
  • setBefore(before)
  • setAfter(after)
  • setStartCol(s)
  • setEndCol(e)
  • setNerCol(n)
  • setTargetNerLabels(v)
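The setReg and setEnet parameters above suggest an elastic-net regularized model. As a rough illustration only (plain Python, not the annotator's actual implementation), this is how an elastic-net penalty mixes L1 and L2 regularization:

```python
# Illustrative sketch only: how an elastic-net penalty (cf. setReg / setEnet)
# combines L1 and L2 regularization of model weights. Not Spark NLP code.

def elastic_net_penalty(weights, reg, enet):
    """reg ~ overall regularization strength (cf. setReg);
    enet ~ L1/L2 mixing ratio (cf. setEnet): 0.0 = pure L2, 1.0 = pure L1."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return reg * (enet * l1 + (1.0 - enet) * 0.5 * l2)

penalty = elastic_net_penalty([0.5, -1.0, 2.0], reg=0.1, enet=0.5)
```

A higher enet value pushes the model toward sparser weights; reg scales the whole penalty.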


This deep-learning variant (per its training parameters below) likewise classifies each clinically relevant named entity by its assertion type: “present”, “absent”, “hypothetical”, “conditional”, “associated_with_other_person”, etc.

Input types: “sentence”, “ner_chunk”, “embeddings”

Output type: “assertion”


Functions:

  • setGraphFolder(p)
  • setConfigProtoBytes(b)
  • setLabelCol(label)
  • setStartCol(s)
  • setEndCol(e)
  • setBatchSize(size)
  • setEpochs(number)
  • setLearningRate(lr)
  • setDropout(rate)
  • setMaxSentLen(length)
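setMaxSentLen bounds how much of the sentence the network sees around the target chunk. A hedged sketch of that preprocessing idea (plain Python with a hypothetical helper, not the annotator's internals):

```python
# Illustrative sketch: pad or truncate a tokenized sentence to a fixed
# maximum length (cf. setMaxSentLen), keeping the target chunk in view.
# Hypothetical helper, not the annotator's actual preprocessing.

def fit_to_max_len(tokens, start, end, max_len, pad="<pad>"):
    if len(tokens) <= max_len:
        # short sentence: pad on the right up to the fixed length
        return tokens + [pad] * (max_len - len(tokens))
    # long sentence: center the window on the target span [start, end]
    mid = (start + end) // 2
    left = max(0, min(mid - max_len // 2, len(tokens) - max_len))
    return tokens[left:left + max_len]

window = fit_to_max_len(["no", "signs", "of", "pneumonia", "today"], 3, 3, 4)
```

Centering on the chunk rather than truncating from the end keeps the entity and its nearest context inside the window.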


Assigns an ICD-10 (International Classification of Diseases, 10th revision) code to chunks identified as “PROBLEM” by the NER Clinical Model.

Input types: “ner_chunk_tokenized”, “embeddings”

Output type: “resolution_cm”


Functions:

  • setSearchTree(s)
  • setNeighbours(k)
  • setThreshold(dist)
  • setMergeChunks(merge)
  • setMissAsEmpty(value)
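The setNeighbours, setThreshold, and setMissAsEmpty parameters suggest a nearest-neighbour lookup over embeddings with a distance cutoff. A toy sketch of that idea (plain Python; the codes, vectors, and distance metric here are made up, not the resolver's real index):

```python
import math

# Toy sketch: resolve a chunk embedding to the nearest code in a dictionary,
# returning an empty result when the best distance exceeds a threshold
# (cf. setNeighbours / setThreshold / setMissAsEmpty). Data is illustrative.

CODE_VECTORS = {
    "J18.9": [1.0, 0.0],   # hypothetical ICD-10 code with a toy embedding
    "I10":   [0.0, 1.0],
}

def resolve(embedding, threshold, miss_as_empty=True):
    best_code, best_dist = None, float("inf")
    for code, vec in CODE_VECTORS.items():
        dist = math.dist(embedding, vec)  # Euclidean distance
        if dist < best_dist:
            best_code, best_dist = code, dist
    if best_dist > threshold:
        # no code is close enough: emit empty or null per miss_as_empty
        return "" if miss_as_empty else None
    return best_code

match = resolve([0.9, 0.1], threshold=0.5)
```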


Identifies potential pieces of content containing personal information about patients and removes them by replacing them with semantic tags.

Input types: “sentence”, “token”, “ner_chunk”

Output type: “deidentified”


Functions:

  • setRegexPatternsDictionary(path, read_as, options)
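To make the tag-replacement idea concrete, here is a minimal sketch of regex-based de-identification in plain Python. The patterns are illustrative stand-ins, not the annotator's shipped dictionary (which setRegexPatternsDictionary loads from a file):

```python
import re

# Minimal sketch: replace each regex match with a semantic tag.
# Illustrative patterns only; not the annotator's real dictionary.

PATTERNS = [
    (r"\b\d{3}-\d{2}-\d{4}\b", "<SSN>"),   # hypothetical SSN pattern
    (r"\b\d{2}/\d{2}/\d{4}\b", "<DATE>"),  # hypothetical date pattern
]

def deidentify(text):
    for pattern, tag in PATTERNS:
        text = re.sub(pattern, tag, text)
    return text

clean = deidentify("Patient seen on 01/02/2020, SSN 123-45-6789.")
```

Replacing matches with tags rather than deleting them preserves sentence structure for downstream annotators.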


This spell checker uses TensorFlow to perform context-based spell checking. At the moment, this annotator cannot be trained from Spark NLP; only pretrained models are provided.

Input types: “token”

Output type: “token”
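As a rough picture of what “context-based” means here, a toy sketch that ranks correction candidates by how well they fit the neighbouring word (the frequency table is made up; the real annotator uses a TensorFlow language model, not a lookup table):

```python
# Toy sketch of context-sensitive correction: score each candidate by a
# made-up bigram frequency with its left neighbour. The real annotator
# uses a TensorFlow language model; this table is purely illustrative.

BIGRAM_FREQ = {
    ("chest", "pain"): 50,
    ("chest", "pane"): 0,
    ("window", "pane"): 40,
}

def best_candidate(left_word, candidates):
    # pick the candidate that co-occurs most often with the left context word
    return max(candidates, key=lambda c: BIGRAM_FREQ.get((left_word, c), 0))

fix = best_candidate("chest", ["pain", "pane"])
```

The same misspelling can resolve to different words depending on context, which is what distinguishes this from a plain dictionary spell checker.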


Consumes a pageMatrixCol produced by OCR and returns the chunks' locations in the original source file.

Input types: “chunk”

Output type: “chunk”

Functions:

  • setPageMatrixCol(string)
  • setMatchingWindows(int): text window around the target coordinates; improves precision when dealing with noisy documents
  • setWindowPageTolerance(bool): increases precision on noisy documents by increasing tolerance on multi-page files
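A simplified sketch of the underlying idea, mapping a chunk of text back to a position in its source page (plain Python over raw text; the annotator's actual coordinate logic works on the OCR page matrix, not string search):

```python
# Sketch: locate a chunk inside source-page text and report its
# line/column, loosely analogous to mapping chunks back to a page matrix.
# Plain Python illustration; not the annotator's coordinate logic.

def locate(page_text, chunk):
    idx = page_text.find(chunk)
    if idx == -1:
        return None  # chunk not found on this page
    line = page_text.count("\n", 0, idx)                # lines before the match
    col = idx - (page_text.rfind("\n", 0, idx) + 1)     # offset within the line
    return {"line": line, "col": col}

pos = locate("first line\nacute pain noted", "acute pain")
```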


