To use most features you must start a Spark Session with nlp.start()
first.
This will launch a Java Virtual Machine(JVM) process on your machine
which has all of John Snow Labs and Sparks Scala/Java Libraries(JARs) you have access to loaded into memory.
The nlp.start()
method downloads loads and caches all jars for which credentials are provided if they are missing into ~/.jsl_home/java_installs
.
If you have installed via nlp.install()
you can most likely skip the rest of this page, since your secrets have been cached in ~/.jsl_home
and will be re-used.
If you disabled license caching while installing or if you want to tweak settings about your spark session continue reading this section further.
Outputs of running nlp.start()
tell you which jars are loaded and versions of all relevant libraries.
Authorization Flow Parameters
Most of the authorization Flows and Parameters of nlp.install()
are supported.
Review detailed docs here
Parameter | Description | Example | Default |
---|---|---|---|
None |
Load license automatically via one of the Auto-Detection Mechanisms | nlp.start() |
False |
browser_login |
Browser based authorization, Button to click on Notebooks and Browser Pop-Up otherwise. | nlp.start(browser_login=True) |
False |
access_token |
Vist my.johnsnowlabs.com to extract a token which you can provide to enable license access. See Access Token Example | nlp.start(access_token='myToken') |
None |
secrets_file |
Define JSON license file with keys defined by License Variable Overview and provide file path | nlp.start(secrets_file='path/to/license.json') |
None |
store_in_jsl_home |
Disable caching of new licenses to ~./jsl_home |
nlp.start(store_in_jsl_home=False) |
True |
local_license_number |
Specify which license to use, if you have access to multiple locally cached | nlp.start(license_number=5) |
0 |
remote_license_number |
Specify which license to use, if you have access to multiple via OAUTH on my.jsl.com | nlp.start(license_number=5) |
0 |
Manually specify License Parameters
These can be omitted according to the License Variable Overview
Parameter | Description |
---|---|
aws_access_key |
Corresponds to AWS_ACCESS_KEY_ID |
aws_key_id |
Corresponds to AWS_SECRET_ACCESS_KEY |
enterprise_nlp_secret |
Corresponds to HC_SECRET |
ocr_secret |
Corresponds to OCR_SECRET |
hc_license |
Corresponds to HC_LICENSE |
ocr_license |
Corresponds to OCR_LICENSE |
fin_license |
Corresponds to JSL_LEGAL_LICENSE |
leg_license |
Corresponds to JSL_FINANCE_LICENSE |
Sparksession Parameters
These parameters configure how your spark Session is started up.
See Spark Configuration for a comprehensive overview of all spark settings
Parameter | Default | Description | Example |
---|---|---|---|
spark_conf |
None |
Dictionary Key/Value pairs of Spark Configurations for the Spark Session | nlp.start(spark_conf={'spark.executor.memory':'6g'}) |
master_url |
local[*] |
URL to Spark Cluster master | nlp.start(master_url='spark://my.master') |
jar_paths |
None |
List of paths to jars which should be loaded into the Spark Session | nlp.start(jar_paths=['my/jar_folder/jar1.zip','my/jar_folder/jar2.zip'] ) |
exclude_nlp |
False |
Whether to include Spark NLP jar in Session or not. This will always load the jar if available, unless set to True . |
nlp.start(exclude_nlp=True) |
exclude_healthcare |
False |
Whether to include licensed NLP Jar for Legal,Finance or Healthcare. This will always load the jar if available using your provided license, unless set to True . |
nlp.start(exclude_healthcare=True) |
exclude_ocr |
False |
Whether to include licensed OCR Jar for Legal,Finance or Healthcare. This will always load the jar if available using your provided license, unless set to True . |
nlp.start(exclude_ocr=True) |
hardware_target |
cpu |
Specify for which hardware Jar should be optimized. Valid values are gpu ,cpu ,m1 ,aarch |
nlp.start(hardware_target='m1') |
model_cache_folder |
None |
Specify where models should be downloaded to when using model.pretrained() |
nlp.start(model_cache_folder=True) |