Load & Predict 1-liner
The johnsnowlabs library provides two simple methods with which most NLP tasks can be solved while achieving state-of-the-art results: load() and predict().
When building a load-&-predict based model you will follow these steps:

- Pick a model/pipeline/component you want to create from the Namespace.
- Call model = nlp.load(component), which returns an auto-completed pipeline.
- Call model.predict('that was easy') on some string input.

These 3 steps can be boiled down to just 1 line:
```python
from johnsnowlabs import nlp
nlp.load('sentiment').predict('How does this witchcraft work?')
```
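predict() is flexible about its input. As a minimal sketch (assuming the default models download successfully on first use), you can pass a single string or a list of strings, and you get a pandas DataFrame of predictions back:

```python
from johnsnowlabs import nlp

# Load the default English sentiment model once and reuse it
model = nlp.load('sentiment')

# A single string is treated as one document...
model.predict('I love this library!')

# ...and a list of strings as one document per element.
# The result is a pandas DataFrame with one row per document (or sentence)
# and the prediction columns appended.
df = model.predict(['I love this library!', 'This is terrible.'])
print(df.columns)
```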
nlp.load() supports the component types listed below in 1-liners; some can be prefixed with .train for training models. Any of these component-type strings can be passed to nlp.load(), which returns the default model for that component type for the English language.
You can further specify your model selection by placing a '.' behind your component selection. After the '.' you can specify the model you want via a dataset or model version. See the Models Hub, the Components Namespace, and The load function for more information.
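For example, a minimal sketch of this dot syntax (the exact model references below are assumptions; check the Models Hub for the references that actually exist):

```python
from johnsnowlabs import nlp

# Default English NER model
nlp.load('ner').predict('John Snow Labs is based in Delaware')

# Select an NER model trained on the OntoNotes dataset via the dot syntax
nlp.load('ner.onto').predict('John Snow Labs is based in Delaware')

# A language code can be prefixed the same way, e.g. German NER
nlp.load('de.ner').predict('John Snow Labs hat seinen Sitz in Delaware')
```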
Component type | nlp.load() base |
---|---|
Named Entity Recognition(NER) | nlp.load('ner') |
Part of Speech (POS) | nlp.load('pos') |
Classifiers | nlp.load('classify') |
Word embeddings | nlp.load('embed') |
Sentence embeddings | nlp.load('embed_sentence') |
Chunk embeddings | nlp.load('embed_chunk') |
Labeled dependency parsers | nlp.load('dep') |
Unlabeled dependency parsers | nlp.load('dep.untyped') |
Lemmatizers | nlp.load('lemma') |
Matchers | nlp.load('match') |
Normalizers | nlp.load('norm') |
Sentence detectors | nlp.load('sentence_detector') |
Chunkers | nlp.load('chunk') |
Spell checkers | nlp.load('spell') |
Stemmers | nlp.load('stem') |
Stopwords cleaners | nlp.load('stopwords') |
Cleaner | nlp.load('clean') |
N-Grams | nlp.load('ngram') |
Tokenizers | nlp.load('tokenize') |
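As a quick sketch of two of these component types in action (the exact output columns depend on which model resolves as the default):

```python
from johnsnowlabs import nlp

# Tokenizer: the result DataFrame contains one token per row
nlp.load('tokenize').predict('Each word becomes a row')

# Sentence embeddings: one embedding vector per sentence
nlp.load('embed_sentence').predict('This sentence gets one vector')
```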
Annotator & PretrainedPipeline based pipelines
You can create Annotator & PretrainedPipeline based pipelines using all the classes attached to the nlp module. nlp.PretrainedPipeline('pipe_name') gives access to Pretrained Pipelines:
```python
from johnsnowlabs import nlp
from pprint import pprint

# Start the Spark session and download the pretrained pipeline
nlp.start()
explain_document_pipeline = nlp.PretrainedPipeline("explain_document_ml")

# annotate() returns a dict mapping each output column name to a list of results
annotations = explain_document_pipeline.annotate("We are very happy about SparkNLP")
pprint(annotations)
```
OUTPUT:
```python
{'stem': ['we', 'ar', 'veri', 'happi', 'about', 'sparknlp'],
 'checked': ['We', 'are', 'very', 'happy', 'about', 'SparkNLP'],
 'lemma': ['We', 'be', 'very', 'happy', 'about', 'SparkNLP'],
 'document': ['We are very happy about SparkNLP'],
 'pos': ['PRP', 'VBP', 'RB', 'JJ', 'IN', 'NNP'],
 'token': ['We', 'are', 'very', 'happy', 'about', 'SparkNLP'],
 'sentence': ['We are very happy about SparkNLP']}
```
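Beyond annotate(), a PretrainedPipeline can also be applied to a Spark DataFrame for distributed processing. A minimal sketch (assuming a session started via nlp.start(); the selected columns follow the output names shown above):

```python
from johnsnowlabs import nlp

spark = nlp.start()
pipe = nlp.PretrainedPipeline("explain_document_ml")

# transform() runs the pipeline on a Spark DataFrame and returns
# full annotation structs; .result extracts the string outputs
df = spark.createDataFrame([["We are very happy about SparkNLP"]]).toDF("text")
pipe.transform(df).select("token.result", "pos.result").show(truncate=False)
```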
Custom Pipes
Alternatively, you can compose Annotators into a pipeline, which offers the highest degree of customization:
```python
from johnsnowlabs import nlp

spark = nlp.start(nlp=False)

# DocumentAssembler creates the 'doc' column, which the Tokenizer
# consumes to produce the 'tok' column
pipe = nlp.Pipeline(stages=[
    nlp.DocumentAssembler().setInputCol('text').setOutputCol('doc'),
    nlp.Tokenizer().setInputCols('doc').setOutputCol('tok'),
])

spark_df = spark.createDataFrame([['Hello NLP World']]).toDF("text")

# fit() initializes the pipeline; transform() applies it to the data
pipe.fit(spark_df).transform(spark_df).show()
```
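The pipeline can be extended by chaining output columns into the input columns of further annotators. A hedged sketch adding a Normalizer stage (assuming the default Normalizer behavior of stripping non-word characters from tokens):

```python
from johnsnowlabs import nlp

spark = nlp.start()

# Chain: text -> doc -> tok -> norm, each stage reading the previous output
pipe = nlp.Pipeline(stages=[
    nlp.DocumentAssembler().setInputCol('text').setOutputCol('doc'),
    nlp.Tokenizer().setInputCols('doc').setOutputCol('tok'),
    nlp.Normalizer().setInputCols('tok').setOutputCol('norm'),
])

spark_df = spark.createDataFrame([['Hello NLP World!!']]).toDF("text")
pipe.fit(spark_df).transform(spark_df).select('norm.result').show(truncate=False)
```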