Starting a Spark Session

 

To use most features you must start a Spark Session with nlp.start()first.
This will launch a Java Virtual Machine(JVM) process on your machine which has all of John Snow Labs and Sparks Scala/Java Libraries(JARs) you have access to loaded into memory.

The nlp.start() method downloads loads and caches all jars for which credentials are provided if they are missing into ~/.jsl_home/java_installs.
If you have installed via nlp.install() you can most likely skip the rest of this page, since your secrets have been cached in ~/.jsl_home and will be re-used.
If you disabled license caching while installing or if you want to tweak settings about your spark session continue reading this section further.

Outputs of running nlp.start() tell you which jars are loaded and versions of all relevant libraries.
access_token1.png

Authorization Flow Parameters

Most of the authorization Flows and Parameters of nlp.install() are supported.
Review detailed docs here

Parameter Description Example Default
None Load license automatically via one of the Auto-Detection Mechanisms nlp.start() False
browser_login Browser based authorization, Button to click on Notebooks and Browser Pop-Up otherwise. nlp.start(browser_login=True) False
access_token Vist my.johnsnowlabs.com to extract a token which you can provide to enable license access. See Access Token Example nlp.start(access_token='myToken') None
secrets_file Define JSON license file with keys defined by License Variable Overview and provide file path nlp.start(secrets_file='path/to/license.json') None
store_in_jsl_home Disable caching of new licenses to ~./jsl_home nlp.start(store_in_jsl_home=False) True
local_license_number Specify which license to use, if you have access to multiple locally cached nlp.start(license_number=5) 0
remote_license_number Specify which license to use, if you have access to multiple via OAUTH on my.jsl.com nlp.start(license_number=5) 0

Manually specify License Parameters

These can be omitted according to the License Variable Overview

Parameter Description
aws_access_key Corresponds to AWS_ACCESS_KEY_ID
aws_key_id Corresponds to AWS_SECRET_ACCESS_KEY
enterprise_nlp_secret Corresponds to HC_SECRET
ocr_secret Corresponds to OCR_SECRET
hc_license Corresponds to HC_LICENSE
ocr_license Corresponds to OCR_LICENSE
fin_license Corresponds to JSL_LEGAL_LICENSE
leg_license Corresponds to JSL_FINANCE_LICENSE

Sparksession Parameters

These parameters configure how your spark Session is started up.
See Spark Configuration for a comprehensive overview of all spark settings

Parameter Default Description Example
spark_conf None Dictionary Key/Value pairs of Spark Configurations for the Spark Session nlp.start(spark_conf={'spark.executor.memory':'6g'})
master_url local[*] URL to Spark Cluster master nlp.start(master_url='spark://my.master')
jar_paths None List of paths to jars which should be loaded into the Spark Session nlp.start(jar_paths=['my/jar_folder/jar1.zip','my/jar_folder/jar2.zip'] )
exclude_nlp False Whether to include Spark NLP jar in Session or not. This will always load the jar if available, unless set to True. nlp.start(exclude_nlp=True)
exclude_healthcare False Whether to include licensed NLP Jar for Legal,Finance or Healthcare. This will always load the jar if available using your provided license, unless set to True. nlp.start(exclude_healthcare=True)
exclude_ocr False Whether to include licensed OCR Jar for Legal,Finance or Healthcare. This will always load the jar if available using your provided license, unless set to True. nlp.start(exclude_ocr=True)
hardware_target cpu Specify for which hardware Jar should be optimized. Valid values are gpu,cpu,m1,aarch nlp.start(hardware_target='m1')
model_cache_folder None Specify where models should be downloaded to when using model.pretrained() nlp.start(model_cache_folder=True)
Last updated