Deploy LLMs as Private Azure Endpoints


The list below shows the John Snow Labs Medical LLM models available on the Azure Marketplace.

All LLMs on Azure expose an OpenAI-compatible API.

  • Medical LLM Medium
  • Medical LLM Small
  • Medical LLM - 14B
  • Medical LLM - 10B
  • Medical LLM - 7B
  • Medical Reasoning LLM - 14B
  • Medical Reasoning LLM - 32B

Deployment Instructions

  1. Subscribe to the product from the model's listing page using the Get It Now button.

  2. Create a virtual machine from the product listing. Make sure port 80 is open for inbound requests.

    Optionally, open port 3000 for the Open WebUI interface.

  3. Wait for the services to become active. This might take a few minutes on the initial boot.
    To check the status, log in to the instance and run:
    sudo systemctl status med-llm.service
  4. Once the service is active, access the model API docs at http://INSTANCE_IP/docs. A quick connectivity check is sketched after these steps.

  5. Open WebUI is hosted on port 3000. You can also interact with the model from there.
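
Before moving on, you can verify from your own machine that the API is reachable. The snippet below is a minimal sketch, assuming the Python requests package is installed and that INSTANCE_IP stands in for your VM's public IP address:

```python
import requests

# Placeholder: replace with the public IP of your Azure VM.
INSTANCE_IP = "203.0.113.10"

# The interactive API docs are served at /docs once the med-llm service is active;
# an HTTP 200 response indicates the endpoint is up and accepting requests.
response = requests.get(f"http://{INSTANCE_IP}/docs", timeout=10)
print(response.status_code)
```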

Model Interactions

Once deployed, the container exposes a RESTful API for model interactions.

Chat Completions

Use this endpoint for multi-turn conversational interactions (e.g., clinical assistants).

  • Endpoint: /v1/chat/completions
  • Method: POST
  • Example Request:
```python
payload = {
    "model": "Medical-LLM-7B",
    "messages": [
        {"role": "system", "content": "You are a professional medical assistant"},
        {"role": "user", "content": "Explain symptoms of chronic fatigue syndrome"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
}
```
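
The payload above only defines the request body. Below is a minimal end-to-end sketch for sending it, assuming the requests package is installed, that INSTANCE_IP stands in for your VM's public IP, and that the model name matches the product you deployed:

```python
import requests

INSTANCE_IP = "203.0.113.10"  # placeholder: replace with your VM's public IP

payload = {
    "model": "Medical-LLM-7B",
    "messages": [
        {"role": "system", "content": "You are a professional medical assistant"},
        {"role": "user", "content": "Explain symptoms of chronic fatigue syndrome"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
}

# POST the payload to the OpenAI-compatible chat completions route on port 80.
response = requests.post(f"http://{INSTANCE_IP}/v1/chat/completions", json=payload, timeout=120)
response.raise_for_status()

# The response follows the OpenAI chat completions schema.
print(response.json()["choices"][0]["message"]["content"])
```

The same pattern applies to the text completions endpoint below; only the route and the payload shape change.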

Text Completions

Use this endpoint for single-turn prompts or generating long-form medical text.

  • Endpoint: /v1/completions
  • Method: POST
  • Example Request:

```python
payload = {
    "model": "Medical-LLM-7B",
    "prompt": "Provide a detailed explanation of rheumatoid arthritis treatment",
    "temperature": 0.7,
    "max_tokens": 4096
}
```
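
Because the endpoints are OpenAI compatible, you can also call them with the official openai Python client instead of raw HTTP. The sketch below assumes openai>=1.0 is installed and that the deployment accepts a placeholder API key, since no key-based authentication is described above:

```python
from openai import OpenAI

# Point the standard OpenAI client at the private endpoint.
# The base_url IP is a placeholder for your VM's public IP; the api_key value
# is a dummy, on the assumption that the server does not validate it.
client = OpenAI(base_url="http://203.0.113.10/v1", api_key="not-needed")

completion = client.completions.create(
    model="Medical-LLM-7B",
    prompt="Provide a detailed explanation of rheumatoid arthritis treatment",
    temperature=0.7,
    max_tokens=4096,
)

print(completion.choices[0].text)
```

The client's chat interface, client.chat.completions.create, works the same way for the chat endpoint, taking a messages list instead of a prompt.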