Deploy LLMs as Private Azure Endpoints


The list below shows the John Snow Labs Medical LLM models available on the Azure Marketplace.

All LLMs on Azure expose an OpenAI-compatible API.

  • Medical LLM Medium
  • Medical LLM Small
  • Medical LLM - 14B
  • Medical LLM - 10B
  • Medical LLM - 7B
  • Medical Reasoning LLM - 14B
  • Medical Reasoning LLM - 32B

Deployment Instructions

  1. Subscribe to the product from the model's listing page using the Get It Now button.

  2. Create a virtual machine from the product listing. Make sure port 80 is open for inbound requests.

    Optionally, open port 3000 for the Open WebUI interface.

  3. Wait for the services to become active. This might take a few minutes on the initial boot.
    To check the status, log in to the instance and run:
    sudo systemctl status med-llm.service
  4. Once the service is active, access the model API docs at http://INSTANCE_IP/docs. A quick connectivity check is sketched after these steps.

  5. Open WebUI is hosted on port 3000. You can also interact with the model from there.
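
Before moving on, you can verify from your own machine that the API is reachable. The snippet below is a minimal sketch, assuming the Python requests package is installed and that INSTANCE_IP stands in for your VM's public IP address:

```python
import requests

# Placeholder: replace with the public IP of your Azure VM.
INSTANCE_IP = "203.0.113.10"

# The interactive API docs are served at /docs once the med-llm service is active;
# an HTTP 200 response indicates the endpoint is up and accepting requests.
response = requests.get(f"http://{INSTANCE_IP}/docs", timeout=10)
print(response.status_code)
```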

Model Interactions

Once deployed, the container exposes a RESTful API for model interactions.

Chat Completions

Use this endpoint for multi-turn conversational interactions (e.g., clinical assistants).

  • Endpoint: /v1/chat/completions
  • Method: POST
  • Example Request:
```python
payload = {
    "model": "Medical-LLM-7B",
    "messages": [
        {"role": "system", "content": "You are a professional medical assistant"},
        {"role": "user", "content": "Explain symptoms of chronic fatigue syndrome"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
}
```
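
The payload above only defines the request body. Below is a minimal end-to-end sketch for sending it, assuming the requests package is installed, that INSTANCE_IP stands in for your VM's public IP, and that the model name matches the product you deployed:

```python
import requests

INSTANCE_IP = "203.0.113.10"  # placeholder: replace with your VM's public IP

payload = {
    "model": "Medical-LLM-7B",
    "messages": [
        {"role": "system", "content": "You are a professional medical assistant"},
        {"role": "user", "content": "Explain symptoms of chronic fatigue syndrome"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
}

# POST the payload to the OpenAI-compatible chat completions route on port 80.
response = requests.post(f"http://{INSTANCE_IP}/v1/chat/completions", json=payload, timeout=120)
response.raise_for_status()

# The response follows the OpenAI chat completions schema.
print(response.json()["choices"][0]["message"]["content"])
```

The same pattern applies to the text completions endpoint below; only the route and the payload shape change.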

Text Completions

Use this endpoint for single-turn prompts or generating long-form medical text.

  • Endpoint: /v1/completions
  • Method: POST
  • Example Request:

```python
payload = {
    "model": "Medical-LLM-7B",
    "prompt": "Provide a detailed explanation of rheumatoid arthritis treatment",
    "temperature": 0.7,
    "max_tokens": 4096
}
```
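
Because the endpoints are OpenAI compatible, you can also call them with the official openai Python client instead of raw HTTP. The sketch below assumes openai>=1.0 is installed and that the deployment accepts a placeholder API key, since no key-based authentication is described above:

```python
from openai import OpenAI

# Point the standard OpenAI client at the private endpoint.
# The base_url IP is a placeholder for your VM's public IP; the api_key value
# is a dummy, on the assumption that the server does not validate it.
client = OpenAI(base_url="http://203.0.113.10/v1", api_key="not-needed")

completion = client.completions.create(
    model="Medical-LLM-7B",
    prompt="Provide a detailed explanation of rheumatoid arthritis treatment",
    temperature=0.7,
    max_tokens=4096,
)

print(completion.choices[0].text)
```

The client's chat interface, client.chat.completions.create, works the same way for the chat endpoint, taking a messages list instead of a prompt.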