Getting Started

Spark NLP for Healthcare is a commercial extension of Spark NLP for clinical and biomedical text mining. If you don’t have a Spark NLP for Healthcare subscription yet, you can request a free trial using the link below.

[Try Free](https://www.johnsnowlabs.com/spark-nlp-try-free/)

Spark NLP for Healthcare provides healthcare-specific annotators, pipelines, models, and embeddings for:

- Clinical entity recognition
- Clinical entity linking
- Entity normalization
- Assertion status detection
- De-identification
- Relation extraction
- Spell checking and correction
- Entity resolution
- Rule-based contextual parsing
- Text generation
- Summarization
- Risk adjustment

The library offers access to several clinical and biomedical embeddings and transformers: JSL-BERT-Clinical, BioBERT, ClinicalBERT, GloVe-Med, and GloVe-ICD-O. It also includes more than 2,000 pre-trained healthcare models that can recognize the following entities (and many more):

- Clinical: Signs, Symptoms, Treatments, Procedures, Tests, Labs, Sections
- Drugs: Name, Dosage, Strength, Route, Duration, Frequency
- Risk Factors: Smoking, Obesity, Diabetes, Hypertension, Substance Abuse
- Anatomy: Organ, Subdivision, Cell, Structure, Organism, Tissue, Gene, Chemical
- Demographics: Age, Gender, Height, Weight, Race, Ethnicity, Marital Status, Vital Signs
- Sensitive Data: Patient Name, Address, Phone, Email, Dates, Providers, Identifiers

For more information, visit our [models](https://nlp.johnsnowlabs.com/models/) site.

Requirements

Spark NLP is built on top of Apache Spark 3.x.x. To use Spark NLP, you need:

- Java 8 or Java 11
- Apache Spark 3.x.x
- Python 3.7.x, 3.8.x, 3.9.x, or 3.10.x

Basic knowledge of Spark and a working Spark environment are recommended before using Spark NLP. Please refer to the Spark documentation to get started with Spark.
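To confirm the Python requirement programmatically, you can compare the running interpreter against the versions listed in the requirements. The sketch below does this. `SUPPORTED_PYTHON` and `python_supported` are illustrative names, not part of Spark NLP. The set simply mirrors the 3.7 to 3.10 range stated on this page.

```python
import sys

# Hypothetical helper data: mirrors the Python versions listed in the requirements (3.7 to 3.10).
SUPPORTED_PYTHON = {(3, 7), (3, 8), (3, 9), (3, 10)}


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) part of version_info is in SUPPORTED_PYTHON."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


if __name__ == "__main__":
    current = f"{sys.version_info[0]}.{sys.version_info[1]}"
    status = "supported" if python_supported() else "NOT in the supported list"
    print(f"Python {current} is {status}")
```

The check only looks at major and minor versions, because the requirements list patch releases as `x`.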

Installation

First, make sure the installed Java version is Java 8 or 11 (Oracle or OpenJDK):

```bash
java -version
# openjdk version "1.8.0_292"
```
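The Java check can also be scripted. This is a minimal sketch that assumes the usual `java -version` output format shown above, and the helper names are hypothetical. Java 8 reports its version in the legacy form `1.8.0_292` (major version 8). Java 9 and later start with the major version itself, for example `11.0.12`. Note that `java -version` writes to stderr, not stdout.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in the output")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy scheme: "1.8.0_292" -> 8
    return int(first)       # modern scheme: "11.0.12" -> 11


def installed_java_is_supported() -> bool:
    """Run `java -version` and check for Java 8 or 11 (requires java on PATH)."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    return java_major_version(result.stderr) in (8, 11)
```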

You can install the Spark NLP for Healthcare package by using:

```bash
pip install spark-nlp-jsl==${version} --extra-index-url https://pypi.johnsnowlabs.com/${secret.code} --upgrade
```

`${version}` is the version part of `${secret.code}`, that is, everything before the first hyphen (`secret.code.split('-')[0]`), e.g. 2.x.x.
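Assuming the secret code has the form `<version>-<rest>`, the version and the full install command can be derived in Python, as the sketch below shows. The helper names and the sample secret `2.7.3-EXAMPLE` are hypothetical placeholders, not a real license.

```python
def version_from_secret(secret: str) -> str:
    """Return the part of the secret code before the first '-', i.e. the library version."""
    return secret.split("-")[0]


def pip_install_command(secret: str) -> str:
    """Build the pip install command for Spark NLP for Healthcare from a secret code."""
    version = version_from_secret(secret)
    return (
        f"pip install spark-nlp-jsl=={version} "
        f"--extra-index-url https://pypi.johnsnowlabs.com/{secret} --upgrade"
    )


if __name__ == "__main__":
    # Placeholder secret for illustration only; substitute the code you received.
    print(pip_install_command("2.7.3-EXAMPLE"))
```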

`${secret.code}` is a secret code that is only available to users with a valid or trial license. If you have not received it yet, please contact us at [info@johnsnowlabs.com](mailto:info@johnsnowlabs.com).

Starting a Spark NLP Session from Python

You can start a Spark session with the following code:

```python
import sparknlp_jsl

# Replace {secret.code} with the secret code you received with your license.
spark = sparknlp_jsl.start(secret="{secret.code}")
```

Or use the SparkSession builder for more control over the configuration:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("Spark NLP Enterprise")
    .master("local[*]")                     # run locally on all available cores
    .config("spark.driver.memory", "16G")   # include a unit suffix such as G
    .config("spark.driver.maxResultSize", "2G")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryoserializer.buffer.max", "2000M")
    # Public Spark NLP library; replace ${version_public} with its version
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:${version_public}")
    # Licensed Spark NLP for Healthcare jar; replace ${secret.code} and ${version}
    .config("spark.jars", "https://pypi.johnsnowlabs.com/${secret.code}/spark-nlp-jsl-${version}.jar")
    .getOrCreate()
)
```
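You may prefer to keep these settings in one place, for example to adjust driver memory per machine. One option is to collect them in a dictionary and apply them to the builder in a loop. This is a minimal sketch: `enterprise_spark_conf` and `start_session` are hypothetical helpers. Following the rule given for the install command, the enterprise version is assumed to be the part of the secret before the first hyphen.

```python
def enterprise_spark_conf(secret: str, public_version: str, driver_memory: str = "16G") -> dict:
    """Collect the Spark settings from the builder example into a dict keyed by config name."""
    # Assumption: the enterprise version is the part of the secret before the first '-'.
    jsl_version = secret.split("-")[0]
    return {
        "spark.driver.memory": driver_memory,
        "spark.driver.maxResultSize": "2G",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.kryoserializer.buffer.max": "2000M",
        "spark.jars.packages": f"com.johnsnowlabs.nlp:spark-nlp_2.12:{public_version}",
        "spark.jars": f"https://pypi.johnsnowlabs.com/{secret}/spark-nlp-jsl-{jsl_version}.jar",
    }


def start_session(secret: str, public_version: str):
    """Start a local SparkSession using enterprise_spark_conf (requires pyspark)."""
    # Imported lazily so enterprise_spark_conf can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("Spark NLP Enterprise").master("local[*]")
    for key, value in enterprise_spark_conf(secret, public_version).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Separating the dictionary from the session start makes it easy to inspect or override individual settings before any JVM is launched.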