Text-to-SQL Generation (MIMICSQL)

Description

This model generates SQL queries from natural language questions. It is based on a small LLM fine-tuned by John Snow Labs on a dataset whose schema matches that of MIMIC-III.

How to use

from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp_jsl.annotator import Text2SQL

# Assumes `spark` is an active Spark session started with Healthcare NLP.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained text-to-SQL model.
text2sql = Text2SQL.pretrained("text2sql_mimicsql", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sql")

text = "Calculate the total number of patients who had icd9 code 5771"
data = spark.createDataFrame([[text]]).toDF("text")

pipeline = Pipeline(stages=[document_assembler, text2sql])
result = pipeline.fit(data).transform(data)

import com.johnsnowlabs.nlp.DocumentAssembler
import org.apache.spark.ml.Pipeline
// Needed for .toDS / .toDF; assumes an active `spark` session.
import spark.implicits._
// Text2SQL is provided by the licensed Healthcare NLP library.

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val text2sql = Text2SQL.pretrained("text2sql_mimicsql", "en", "clinical/models")
    .setInputCols(Array("document"))
    .setOutputCol("sql")

val pipeline = new Pipeline().setStages(Array(document_assembler, text2sql))

val text = """Calculate the total number of patients who had icd9 code 5771"""

val data = Seq(text).toDS.toDF("text")

val result = pipeline.fit(data).transform(data)


Results

[
SELECT COUNT ( DISTINCT DEMOGRAPHIC."SUBJECT_ID" )
FROM DEMOGRAPHIC
INNER JOIN PROCEDURES on DEMOGRAPHIC.HADM_ID = PROCEDURES.HADM_ID
WHERE PROCEDURES."ICD9_CODE" = "5771"
]
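
One way to sanity-check a generated query before running it on real data is to execute it against a small in-memory database. The sketch below is only an illustration: the two tables are toy stand-ins that model just the columns the query touches, the rows are made up, and SQLite is an assumed target engine rather than anything this model card prescribes. The string literal is written with single quotes ('5771') instead of the double quotes in the output above, because standard SQL reads double-quoted tokens as identifiers.

```python
import sqlite3

# Toy stand-ins for the two tables referenced by the generated query.
# Only the columns the query touches are modeled (an assumption for
# illustration; the real schema has more columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEMOGRAPHIC (SUBJECT_ID TEXT, HADM_ID TEXT);
CREATE TABLE PROCEDURES (SUBJECT_ID TEXT, HADM_ID TEXT, ICD9_CODE TEXT);
""")

# Made-up rows: patient p1 has two admissions, p2 and p3 have one each.
conn.executemany("INSERT INTO DEMOGRAPHIC VALUES (?, ?)", [
    ("p1", "a1"), ("p1", "a2"), ("p2", "a3"), ("p3", "a4"),
])
conn.executemany("INSERT INTO PROCEDURES VALUES (?, ?, ?)", [
    ("p1", "a1", "5771"), ("p1", "a2", "5771"),
    ("p2", "a3", "5771"), ("p3", "a4", "9904"),
])

# The generated query, with the literal adapted to single quotes.
generated_sql = """
SELECT COUNT ( DISTINCT DEMOGRAPHIC."SUBJECT_ID" )
FROM DEMOGRAPHIC
INNER JOIN PROCEDURES on DEMOGRAPHIC.HADM_ID = PROCEDURES.HADM_ID
WHERE PROCEDURES."ICD9_CODE" = '5771'
"""

# p1 matches on two admissions but is counted once; p2 matches once.
patient_count = conn.execute(generated_sql).fetchone()[0]
print(patient_count)
```

With these toy rows, p1 and p2 each have at least one admission with code 5771, so the DISTINCT count collapses p1's two matching admissions into a single patient.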

Model Information

Model Name: text2sql_mimicsql
Compatibility: Healthcare NLP 5.0.1+
License: Licensed
Edition: Official
Language: en
Size: 3.0 GB