package normalizer
Type Members
- class DateNormalizer extends AnnotatorModel[DateNormalizer] with HasSimpleAnnotate[DateNormalizer]
Tries to normalize dates in chunk annotations. Dates are expected to be normalized to the format YYYY/MM/DD. If a date is normalized, the field normalized in the annotation metadata is true; otherwise it is false.
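The contract described above (a YYYY/MM/DD result plus a normalized flag in the metadata) can be illustrated with a minimal standalone sketch. This is not the annotator's implementation: the object name NormalizeSketch, the single month/day/year input pattern, and the choice to pass unparseable text through unchanged are all assumptions made for illustration.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helper, not part of the library.
object NormalizeSketch {
  // Assumption: input is written as month/day/year; the real annotator
  // is expected to handle many more patterns than this one.
  private val inputFormat = DateTimeFormatter.ofPattern("MM/dd/uuuu")
  private val outputFormat = DateTimeFormatter.ofPattern("uuuu/MM/dd")

  // Returns the result text and a metadata map carrying the normalized flag.
  // Text that does not parse is returned unchanged with normalized = false.
  def normalize(text: String): (String, Map[String, String]) =
    Try(LocalDate.parse(text.trim, inputFormat)).toOption match {
      case Some(date) => (date.format(outputFormat), Map("normalized" -> "true"))
      case None       => (text, Map("normalized" -> "false"))
    }
}
```

In this sketch, NormalizeSketch.normalize("08/02/2018") yields the result 2018/08/02 with normalized set to true, while "11/2018" does not match the single assumed pattern and comes back unchanged with normalized set to false.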
Example
Create date chunks from a text column and normalize them, with the anchor date set to 15 March 2000
val df = Seq(("08/02/2018"),("11/2018"),("11/01/2018"),("next monday"),("today"),("next week")).toDF("text")

val documentAssembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")

val chunksDF = documentAssembler
  .transform(df)
  .mapAnnotationsCol[Seq[Annotation]]("document",
    "chunk_date",
    CHUNK,
    (aa:Seq[Annotation]) =>
      aa.map( ann => ann.copy(annotatorType = CHUNK) )
  )

val dateNormalizerModel = new DateNormalizer()
  .setInputCols("chunk_date")
  .setOutputCol("date")
  .setAnchorDateDay(15)
  .setAnchorDateMonth(3)
  .setAnchorDateYear(2000)

val dateDf = dateNormalizerModel.transform(chunksDF)
Show results
dateDf.select("chunk_date.result","text").show()
+-------------+-----------+
|       result|       text|
+-------------+-----------+
| [08/02/2018]| 08/02/2018|
|    [11/2018]|    11/2018|
| [11/01/2018]| 11/01/2018|
|[next monday]|next monday|
|      [today]|      today|
|  [next week]|  next week|
+-------------+-----------+
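The example sets an anchor date of 15 March 2000 and includes relative inputs such as "today" and "next monday". The sketch below shows one plausible way relative expressions could be resolved against an anchor date using java.time. It is an assumption-laden illustration, not the annotator's logic: the object AnchorDateSketch, the resolve helper, and the reading of "next week" as the anchor plus seven days are all hypothetical.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.format.DateTimeFormatter
import java.time.temporal.TemporalAdjusters

// Hypothetical helper, not part of the library.
object AnchorDateSketch {
  private val outputFormat = DateTimeFormatter.ofPattern("uuuu/MM/dd")

  // Same anchor as the example: day 15, month 3, year 2000.
  val anchor: LocalDate = LocalDate.of(2000, 3, 15)

  // Resolves a few relative expressions against the anchor date.
  // Assumption: "next week" means the anchor plus seven days.
  def resolve(text: String, anchorDate: LocalDate): Option[String] = {
    val resolved = text.trim.toLowerCase match {
      case "today"       => Some(anchorDate)
      case "next week"   => Some(anchorDate.plusWeeks(1))
      case "next monday" =>
        // First Monday strictly after the anchor date.
        Some(anchorDate.`with`(TemporalAdjusters.next(DayOfWeek.MONDAY)))
      case _             => None
    }
    resolved.map(_.format(outputFormat))
  }
}
```

With the anchor above, which falls on a Wednesday, this sketch resolves "today" to 2000/03/15, "next week" to 2000/03/22, and "next monday" to 2000/03/20. Any other input returns None, which would correspond to a chunk left unnormalized.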
- case class MyCalendar(year: Try[Int], month: Try[Int], day: Try[Int]) extends Product with Serializable
Value Members
- object DateHelper