Refer to our GitHub page to take a look at the GH Issues, as
the project is yet small. You can create in there your own issues to either work
on them yourself or simply propose them.
Feel free to clone the repository locally and submit pull requests so we can
review them and work together.
Creating your first annotator transformer should not be hard, here are a few guidelines to get you started. Lets assume we want a wrapper annotator, which puts a character surrounding tokens provided by a Tokenizer
uid is utilized for transformer serialization, AnnotatorModel[MyAnnotator] will contain the common annotator logic We need to use standard constructor for java and python compatibility
class WordWrapper(override val uid: String) extends AnnotatorModel[WordWrapper] {
def this() = this(Identifiable.randomUID("WORD_WRAPPER"))
}
We need to establish the type of our annotator, which means its identifier when used by another annotator And also need to define which other annotators we require, for this example, we will work with tokens
import com.johnsnowlabs.nlp.AnnotatorType._
override val annotatorType: AnnotatorType = TOKEN
override val requiredAnnotatorTypes: Array[AnnotatorType] = Array[AnnotatorType](TOKEN)
This annotator is not flexible if we don't provide parameters
protected val character: Param[String] = new Param(this, "character", "this is the character used to wrap a token")
def setCharacter(value: String): this.type = set(pattern, value)
def getCharacter: String = $(pattern)
setDefault(character, "@")
Here is how we act, annotations will automatically provide our required annotations We generally use annotatorType for metadata keys
override def annotate(annotations: Seq[Annotation]): Seq[Annotation] = {
annotations.map(annotation => {
Annotation(
annotatorType,
annotation.begin,
annotation.end,
Map(annotatorType -> $(character) + annotation.result + $(character))
})
}