Installation

 

Type of installation

  • Dedicated Server
  • AWS Marketplace
  • Azure Marketplace
  • EKS deployment
  • AKS deployment
  • AirGap Environment
  • OpenShift

Dedicated Server

Install NLP Lab (Annotation Lab) on a dedicated server to reduce the likelihood of conflicts or unexpected behavior.

Fresh install

To install NLP Lab, run the following command:

wget https://setup.johnsnowlabs.com/annotationlab/install.sh -O - | sudo bash -s $VERSION

Replace $VERSION in the command above with the version you want to install.

To install the latest available version of NLP Lab, use:

wget https://setup.johnsnowlabs.com/annotationlab/install.sh -O - | sudo bash -s --


Upgrade

To upgrade your NLP Lab installation to a newer version, run the following command in a terminal:

wget https://setup.johnsnowlabs.com/annotationlab/upgrade.sh -O - | sudo bash -s $VERSION

Replace $VERSION in the command above with the version you want to upgrade to.

To upgrade to the latest version of NLP Lab, use:

wget https://setup.johnsnowlabs.com/annotationlab/upgrade.sh -O - | sudo bash -s --
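
The install and upgrade one-liners differ only in the script name and in the optional version argument. The following is a minimal sketch, not part of NLP Lab. It downloads the script to a local file and then runs it, which keeps a copy of the exact script that ran. The function names are hypothetical, and the sketch assumes wget and sudo are available, as the one-liners already require.

```shell
#!/usr/bin/env bash
# Sketch: download the NLP Lab install or upgrade script to a file, then run it.
# Function names are hypothetical; wget and sudo are assumed to be installed.

NLP_LAB_SETUP_URL="https://setup.johnsnowlabs.com/annotationlab"

# Print the script URL for an action: "install" or "upgrade".
nlp_lab_script_url() {
  case "$1" in
    install | upgrade) echo "${NLP_LAB_SETUP_URL}/$1.sh" ;;
    *)
      echo "unknown action: $1 (expected install or upgrade)" >&2
      return 1
      ;;
  esac
}

# Download the script and run it. With no version, no argument is passed to
# the script, which matches the `bash -s --` form (latest version).
run_nlp_lab_script() {
  local action="$1" version="${2:-}" url
  url=$(nlp_lab_script_url "$action") || return 1
  wget -q "$url" -O "${action}.sh" || return 1
  if [ -n "$version" ]; then
    sudo bash "${action}.sh" "$version"
  else
    sudo bash "${action}.sh"
  fi
}

# Usage:
#   run_nlp_lab_script install "$VERSION"   # pinned version
#   run_nlp_lab_script upgrade              # latest version
```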

NOTE: The install/upgrade script displays the login credentials for the admin user on the terminal.

After the install/upgrade script finishes, NLP Lab is available at http://INSTANCE_IP or https://INSTANCE_IP, where INSTANCE_IP is the IP address of the server.

The Sign-In page includes a section that highlights the key features of NLP Lab using animated GIFs.
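
The services may take some time to start after the script finishes. The following is a minimal sketch that polls the URL until it responds; it is not part of NLP Lab. The function name, the default of 30 attempts 10 seconds apart, and treating any 2xx or 3xx status as "up" are all assumptions. INSTANCE_IP remains a placeholder, as in the text above.

```shell
#!/usr/bin/env bash
# Sketch: poll the NLP Lab URL until it answers with HTTP 2xx or 3xx.
# The function name, retry defaults and "up" criterion are assumptions.

# Usage: wait_for_nlp_lab URL [attempts] [delay_seconds]
wait_for_nlp_lab() {
  local url="$1" attempts="${2:-30}" delay="${3:-10}" i code
  for ((i = 1; i <= attempts; i++)); do
    # -k accepts a self-signed certificate; curl prints 000 if it cannot connect.
    code=$(curl -ks --max-time 10 -o /dev/null -w '%{http_code}' "$url" || true)
    if [[ "$code" =~ ^[23][0-9][0-9]$ ]]; then
      echo "NLP Lab answered with HTTP $code"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "NLP Lab did not answer at $url after $attempts attempts" >&2
  return 1
}

# Usage:
#   wait_for_nlp_lab "https://INSTANCE_IP"
```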

AWS Marketplace

Visit the product page on AWS Marketplace and follow the instructions in the video below to subscribe and deploy.

Deploy NLP Lab via AWS Marketplace

Azure Marketplace

Visit the product page on Azure Marketplace and follow the instructions in the video below to subscribe and deploy.

Deploy NLP Lab via Azure Marketplace

EKS deployment

  1. Create a NodeGroup for the cluster. Save the following configuration as eks-nodegroup.yaml, replacing the <...> placeholders with your own values:

    kind: ClusterConfig
    apiVersion: eksctl.io/v1alpha5
    metadata:
      name: <cluster-name>
      region: <region>
      version: "1.21"
    availabilityZones:
      - <zone-1>
      - <zone-2>
    vpc:
      id: "<vpc-id>"
      subnets:
        private:
          us-east-1d:
            id: "<subnet-id>"
          us-east-1f:
            id: "<subnet-id>"
      securityGroup: "<security-group>"
    iam:
      withOIDC: true
    managedNodeGroups:
      - name: alab-workers
        instanceType: m5.large
        desiredCapacity: 3
        volumeSize: 50
        volumeType: gp2
        privateNetworking: true
        ssh:
          publicKeyPath: <path/to/id_rsa.pub>

    Create the node group from this file, then associate an IAM OIDC provider with the cluster:

    eksctl create nodegroup --config-file eks-nodegroup.yaml
    
    eksctl utils associate-iam-oidc-provider --region=us-east-1 --cluster=<cluster-name> --approve
    
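  2. Set up Amazon EFS (Elastic File System), a scalable file storage service, as shared storage. The commands below create an IAM policy and a service account for the EFS CSI driver, and then install the driver with Helm. They do not create the EFS file system itself; you need its ID in the next step.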

    curl -S https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/v1.2.0/docs/iam-policy-example.json -o iam-policy.json
    aws iam create-policy \
      --policy-name EFSCSIControllerIAMPolicy \
      --policy-document file://iam-policy.json
    
    eksctl create iamserviceaccount \
      --cluster=<cluster-name> \
      --region=<region> \
      --namespace=kube-system \
      --name=efs-csi-controller-sa \
      --override-existing-serviceaccounts \
      --attach-policy-arn=arn:aws:iam::<AWS account ID>:policy/EFSCSIControllerIAMPolicy \
      --approve
    
    helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver
    
    helm repo update
    
    helm upgrade -i aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
      --namespace kube-system \
      --set image.repository=602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/aws-efs-csi-driver \
      --set controller.serviceAccount.create=false \
      --set controller.serviceAccount.name=efs-csi-controller-sa
    
    
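  3. Create a StorageClass that uses the EFS CSI driver and apply it. Replace <EFS file system ID> with the ID of your EFS file system: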

    cat <<EOF > storageClass.yaml
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: efs-sc
    provisioner: efs.csi.aws.com
    parameters:
      provisioningMode: efs-ap
      fileSystemId: <EFS file system ID>
      directoryPerms: "700"
    EOF
    
    kubectl apply -f storageClass.yaml
    

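    Edit the helm install command in annotationlab-installer.sh, inside the artifacts folder of the NLP Lab tarball (see Get Artifact under AirGap Environment), as follows. Note that sharedData.storageClass is set to the efs-sc storage class created above: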

    helm install annotationlab annotationlab-${ANNOTATIONLAB_VERSION}.tgz                                 \
        --set image.tag=${ANNOTATIONLAB_VERSION}                                                          \
        --set model_server.count=1                                                                        \
        --set ingress.enabled=true                                                                        \
        --set networkPolicy.enabled=true                                                                  \
        --set extraNetworkPolicies='- namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          app.kubernetes.io/name: traefik
          app.kubernetes.io/instance: traefik'                                                            \
        --set keycloak.postgresql.networkPolicy.enabled=true                                              \
        --set sharedData.storageClass=efs-sc                                                              \
        --set airflow.postgresql.networkPolicy.enabled=true                                               \
        --set postgresql.networkPolicy.enabled=true                                                       \
        --set airflow.networkPolicies.enabled=true                                                        \
        --set ingress.defaultBackend=true                                                                 \
        --set ingress.uploadLimitInMegabytes=16                                                           \
        --set 'ingress.hosts[0].host=domain.tld'                                                          \
        --set airflow.model_server.count=1                                                                \
        --set airflow.redis.password=$(bash -c "echo ${password_gen_string}")                             \
        --set configuration.FLASK_SECRET_KEY=$(bash -c "echo ${password_gen_string}")                     \
        --set configuration.KEYCLOAK_CLIENT_SECRET_KEY=$(bash -c "echo ${uuid_gen_string}")               \
        --set postgresql.postgresqlPassword=$(bash -c "echo ${password_gen_string}")                      \
        --set keycloak.postgresql.postgresqlPassword=$(bash -c "echo ${password_gen_string}")             \
        --set keycloak.secrets.admincreds.stringData.user=admin                                           \
        --set keycloak.secrets.admincreds.stringData.password=$(bash -c "echo ${password_gen_string}")
    
    
  4. Run the annotationlab-installer.sh script:

    ./artifacts/annotationlab-installer.sh
    
  5. Install the NGINX ingress controller:

    helm repo add nginx-stable https://helm.nginx.com/stable
    helm repo update
    helm install my-release nginx-stable/nginx-ingress
    
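  6. Create ingress.yaml and apply it. Replace domain.tld with the domain set in ingress.hosts[0].host in the installer script: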

    cat <<EOF > ingress.yaml
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        kubernetes.io/ingress.class: nginx
        meta.helm.sh/release-name: annotationlab
        meta.helm.sh/release-namespace: default
      name: annotationlab
    spec:
      defaultBackend:
        service:
          name: annotationlab
          port:
            name: http
      rules:
      - host: domain.tld
        http:
          paths:
          - backend:
              service:
                  name: annotationlab
                  port:
                    name: http
            path: /
            pathType: ImplementationSpecific
          - backend:
              service:
                  name: annotationlab-keyclo-http
                  port:
                    name: http
            path: /auth
            pathType: ImplementationSpecific
    EOF
    
    kubectl apply -f ingress.yaml
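
Several files in the EKS steps above contain <...> placeholders or the sample host domain.tld. The following is a minimal pre-flight sketch, not part of NLP Lab. check_placeholders is a hypothetical helper, and its regular expression is an assumption. It prints each matching line with its line number and returns non-zero if any file still needs editing. When you run it on shell scripts, the pattern can also match legitimate redirections, so treat its output as a list of lines to review.

```shell
#!/usr/bin/env bash
# Sketch: flag unreplaced <...> placeholders and the sample host domain.tld.
# check_placeholders and its pattern are assumptions, not an NLP Lab tool.

check_placeholders() {
  local file status=0
  for file in "$@"; do
    if [ ! -f "$file" ]; then
      echo "$file: not found" >&2
      status=1
      continue
    fi
    # grep -n prints every offending line with its line number.
    if grep -nE '<[^<>]+>|domain\.tld' "$file"; then
      echo "$file: replace the values above before applying it" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, from the directory that holds the files:
#   check_placeholders eks-nodegroup.yaml storageClass.yaml ingress.yaml \
#     artifacts/annotationlab-installer.sh
```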
    

AKS deployment

To deploy NLP Lab on Azure Kubernetes Service (AKS), first create a Kubernetes cluster in Microsoft Azure.

  1. Log in to the Azure Portal and search for Kubernetes services.

  2. On the Kubernetes services page, click the Create drop-down and select Create a Kubernetes cluster.

  3. On the Create Kubernetes cluster page, select the resource group and enter a name for the cluster.

  4. Keep the default values for the remaining fields and click Review + create.

  5. Click Create to start the deployment.

  6. Once the deployment is complete, click Go to resource.

  7. On the page of the new resource, click Connect. A list of commands to run in Cloud Shell or the Azure CLI to connect to the cluster is displayed. The following steps run these commands in order.

  8. Run the following commands to connect to Azure Kubernetes Service.

    az account set --subscription <subscription-id>
    

    NOTE: Replace <subscription-id> with your account's subscription ID.

    az aks get-credentials --resource-group <resource-group-name> --name <cluster-name>
    

    NOTE: Replace <resource-group-name> and <cluster-name> with the values you chose in Step 3.

  9. Check whether the azurefile or azuredisk storage class is available by running the following command:

    kubectl get storageclass


    In the next step, use the storage class you find here as the value of sharedData.storageClass in the Helm command.

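  10. Go to the artifacts directory of the NLP Lab tarball (see Get Artifact under AirGap Environment) and edit the helm install command in annotationlab-installer.sh as follows. Set sharedData.storageClass to the storage class found in Step 9 (this example uses azurefile):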

    helm install annotationlab annotationlab-${ANNOTATIONLAB_VERSION}.tgz                                 \
        --set image.tag=${ANNOTATIONLAB_VERSION}                                                          \
        --set model_server.count=1                                                                        \
        --set ingress.enabled=true                                                                        \
        --set networkPolicy.enabled=true                                                                  \
        --set extraNetworkPolicies='- namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          app.kubernetes.io/name: traefik
          app.kubernetes.io/instance: traefik'                                                            \
        --set keycloak.postgresql.networkPolicy.enabled=true                                              \
        --set sharedData.storageClass=azurefile                                                           \
        --set airflow.postgresql.networkPolicy.enabled=true                                               \
        --set postgresql.networkPolicy.enabled=true                                                       \
        --set airflow.networkPolicies.enabled=true                                                        \
        --set ingress.defaultBackend=true                                                                 \
        --set ingress.uploadLimitInMegabytes=16                                                           \
        --set 'ingress.hosts[0].host=domain.tld'                                                          \
        --set airflow.model_server.count=1                                                                \
        --set airflow.redis.password=$(bash -c "echo ${password_gen_string}")                             \
        --set configuration.FLASK_SECRET_KEY=$(bash -c "echo ${password_gen_string}")                     \
        --set configuration.KEYCLOAK_CLIENT_SECRET_KEY=$(bash -c "echo ${uuid_gen_string}")               \
        --set postgresql.postgresqlPassword=$(bash -c "echo ${password_gen_string}")                      \
        --set keycloak.postgresql.postgresqlPassword=$(bash -c "echo ${password_gen_string}")             \
        --set keycloak.secrets.admincreds.stringData.user=admin                                           \
        --set keycloak.secrets.admincreds.stringData.password=$(bash -c "echo ${password_gen_string}")
    
  11. Execute the annotationlab-installer.sh script to run the NLP Lab installation.

    ./annotationlab-installer.sh
    
  12. Verify that the installation succeeded by checking that the NLP Lab pods are running:

    kubectl get pods
    
  13. Install an ingress controller. It is required for load balancing.

    helm repo add nginx-stable https://helm.nginx.com/stable
    helm repo update
    helm install my-release nginx-stable/nginx-ingress
    
  14. Create a file named ingress.yaml with the following configuration. Replace domain.tld with the domain set in ingress.hosts[0].host in the installer script:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        kubernetes.io/ingress.class: nginx
        meta.helm.sh/release-name: annotationlab
        meta.helm.sh/release-namespace: default
      name: annotationlab
    spec:
      defaultBackend:
        service:
          name: annotationlab
          port:
            name: http
      rules:
      - host: domain.tld
        http:
          paths:
          - backend:
              service:
                  name: annotationlab
                  port:
                    name: http
            path: /
            pathType: ImplementationSpecific
          - backend:
              service:
                  name: annotationlab-keyclo-http
                  port:
                    name: http
            path: /auth
            pathType: ImplementationSpecific
    
  15. Apply ingress.yaml by running the following command:

    kubectl apply -f ingress.yaml
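
Steps 9 and 10 can be scripted. The following is a minimal sketch, not part of NLP Lab. It assumes GNU sed and assumes that the installer sets sharedData.storageClass on a single --set line, as shown in Step 10. pick_storage_class reads the output of `kubectl get storageclass -o name` and prints the first of azurefile, azurefile-csi, or azuredisk that exists. set_storage_class then rewrites that value in the script. Both function names are hypothetical. The preference order is also an assumption: Azure Files volumes support ReadWriteMany access, which shared data typically needs, while Azure Disk volumes are ReadWriteOnce.

```shell
#!/usr/bin/env bash
# Sketch: choose a storage class for sharedData.storageClass and write it into
# the installer. Function names and the preference order are assumptions.

# Read `kubectl get storageclass -o name` output on stdin and print the first
# preferred class that exists. Azure Files (ReadWriteMany) comes before Azure Disk.
pick_storage_class() {
  local names candidate
  names=$(sed 's|^storageclass\.storage\.k8s\.io/||')
  for candidate in azurefile azurefile-csi azuredisk; do
    if grep -qx "$candidate" <<<"$names"; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no azurefile or azuredisk storage class found" >&2
  return 1
}

# Replace the sharedData.storageClass value in the installer script (GNU sed -i).
set_storage_class() {
  local class="$1" script="${2:-annotationlab-installer.sh}"
  sed -i "s|sharedData\.storageClass=[^ ]*|sharedData.storageClass=${class}|" "$script"
}

# Usage, from the artifacts directory:
#   class=$(kubectl get storageclass -o name | pick_storage_class) &&
#     set_storage_class "$class" annotationlab-installer.sh
```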
    

AirGap Environment

Get Artifact

Run the following command in a terminal to download the compressed artifact (tarball) of NLP Lab.

wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/annotationlab/annotationlab-$VERSION.tar.gz

Extract the tarball and change to the extracted folder (artifacts):

tar -xzf annotationlab-$VERSION.tar.gz
cd artifacts

Replace $VERSION with the version you want to download and install.
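
An air-gapped host has no internet access, so the tarball presumably has to be downloaded on a connected machine and then copied over. The following minimal sketch covers both halves. artifact_url prints the download URL used above. extract_artifact unpacks the tarball and checks that the installer and updater scripts named in this section are present. The function names, the copy step, and the host name in the comments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the artifact URL and extract/verify the tarball.
# Function names and the scp host below are hypothetical.

# Print the tarball URL for a version, e.g. artifact_url "$VERSION".
artifact_url() {
  echo "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/annotationlab/annotationlab-$1.tar.gz"
}

# Extract the tarball into the current directory and confirm that the
# artifacts folder contains the installer and updater scripts.
extract_artifact() {
  local tarball="$1" script
  tar -xzf "$tarball" || return 1
  for script in annotationlab-installer.sh annotationlab-updater.sh; do
    if [ ! -f "artifacts/$script" ]; then
      echo "artifacts/$script not found in $tarball" >&2
      return 1
    fi
  done
}

# Usage:
#   On a machine with internet access:
#     wget "$(artifact_url "$VERSION")"
#     scp "annotationlab-$VERSION.tar.gz" user@airgap-host:   # hypothetical host
#   On the air-gapped host:
#     extract_artifact "annotationlab-$VERSION.tar.gz" && cd artifacts
```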


Fresh Install

Run the installer script annotationlab-installer.sh with sudo privileges.

$ sudo su
$ ./annotationlab-installer.sh


Upgrade

Run the upgrade script annotationlab-updater.sh with sudo privileges.

$ sudo su
$ ./annotationlab-updater.sh


OpenShift

Annotation Lab can also be installed using the operator framework on an OpenShift cluster. The Annotation Lab operator can be found under the OperatorHub.


Find and select

The OperatorHub lists a large number of operators that can be installed into your cluster. Search for the Annotation Lab operator under the AI/Machine Learning category and select it.


Install

Selecting Annotation Lab in the previous step opens a panel with basic information about the operator.

NOTE: Make sure you have defined shared storage such as efs/nfs/cephfs prior to installing the Annotation Lab Operator.

Click the Install button in the top-left corner of this panel to start the installation.

After successful installation of the Annotation Lab operator, you can access it by navigating to the Installed Operators page.


Create Instance

The next step is to create a cluster instance of Annotation Lab. Select the Annotation Lab operator on the Installed Operators page and switch to the Annotationlab tab. In this section, click Create Annotationlab to spawn a new instance of Annotation Lab.

Define shared Storage Class

Set the storageClass property in the YAML configuration to efs, nfs, or cephfs, depending on the shared storage you set up before installing the Annotation Lab operator.

Define domain name

Set the host property in the YAML configuration to the domain name you want to use instead of the default hostname annotationlab.

Click Create once you have made all the necessary changes. This sets up all the resources the instance needs and starts its services.


View Resources

After the instance is created, its page lists all of its resources, including supporting resources such as the secrets and config maps that were created.

Annotation Lab is now accessible at the domain name you provided, or at the location defined for this service on the Networking > Routes page.

Working behind a proxy

Custom CA certificate

You can include a custom CA certificate chain in the deployment. To do so, add the --set-file custom_cacert=./cachain.pem option to the helm install/upgrade command in the annotationlab-installer.sh and annotationlab-updater.sh files.

cachain.pem must contain the certificates in the following format:

-----BEGIN CERTIFICATE-----
....
-----END CERTIFICATE-----
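
Before you pass cachain.pem to --set-file, a quick structural check can catch an empty file or a truncated chain. The following is a minimal sketch; check_ca_chain is a hypothetical helper. It counts the BEGIN and END CERTIFICATE markers and fails if there are none or if they do not pair up. Otherwise, it prints the number of certificates. It does not decode the certificates. For that, `openssl x509 -in cachain.pem -noout -subject` parses the first certificate in the file.

```shell
#!/usr/bin/env bash
# Sketch: structural check of a PEM CA chain before it is passed to helm.
# check_ca_chain is a hypothetical helper; it does not validate signatures.

check_ca_chain() {
  local file="$1" begins ends
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # grep -c prints 0 and exits 1 when nothing matches; || true keeps the 0.
  begins=$(grep -c -- '-----BEGIN CERTIFICATE-----' "$file" || true)
  ends=$(grep -c -- '-----END CERTIFICATE-----' "$file" || true)
  if [ "$begins" -eq 0 ] || [ "$begins" -ne "$ends" ]; then
    echo "expected matching BEGIN/END CERTIFICATE markers in $file" >&2
    return 1
  fi
  # Print the number of certificates found.
  echo "$begins"
}

# Usage:
#   check_ca_chain ./cachain.pem
```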


Proxy env variables

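You can configure a proxy for external communication. To do so, add the following options to the helm install/upgrade command in the annotationlab-installer.sh and annotationlab-updater.sh files:

`--set proxy.http=[protocol://]<host>[:port]`
`--set proxy.https=[protocol://]<host>[:port]`
`--set proxy.no=<comma-separated list of hosts/domains>`

Helm treats unescaped commas in a --set value as separators. Escape each comma in the proxy.no list as \, and single-quote the option so that the shell keeps the backslash, for example --set 'proxy.no=localhost\,127.0.0.1'.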
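
If the proxy settings already exist in the HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables of the machine that runs the installer, you can build the options from them. The following is a minimal sketch. proxy_helm_args and the use of these variables are assumptions of this example, not part of the installer. The function prints one argument per line and skips unset variables. It escapes the commas in NO_PROXY as \, so that helm does not split the value. Loading the output with mapfile passes the backslashes to helm unchanged.

```shell
#!/usr/bin/env bash
# Sketch: build helm proxy options from HTTP_PROXY, HTTPS_PROXY and NO_PROXY.
# proxy_helm_args is hypothetical; the variable names are an assumption.

# Print one helm argument per line; variables that are unset or empty are skipped.
proxy_helm_args() {
  local args=()
  if [ -n "${HTTP_PROXY:-}" ]; then
    args+=(--set "proxy.http=${HTTP_PROXY}")
  fi
  if [ -n "${HTTPS_PROXY:-}" ]; then
    args+=(--set "proxy.https=${HTTPS_PROXY}")
  fi
  if [ -n "${NO_PROXY:-}" ]; then
    # helm --set splits on unescaped commas, so turn each "," into "\,".
    args+=(--set "proxy.no=$(printf '%s' "$NO_PROXY" | sed 's/,/\\,/g')")
  fi
  if [ "${#args[@]}" -gt 0 ]; then
    printf '%s\n' "${args[@]}"
  fi
}

# Usage inside annotationlab-installer.sh (sketch):
#   mapfile -t PROXY_ARGS < <(proxy_helm_args)
#   helm install annotationlab annotationlab-${ANNOTATIONLAB_VERSION}.tgz "${PROXY_ARGS[@]}" ...
```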

System requirements

You can install Annotation Lab on an Ubuntu 20+ machine.

Port requirements

Annotation Lab expects ports 80 and 443 to be open by default.

Server requirements

The minimum required configuration is 32 GB RAM, an 8-core CPU, and a 512 GB SSD.

If you need model training and pre-annotation on a large number of tasks, the ideal configuration is 64 GiB RAM, a 16-core CPU, a 2 TB HDD, and a 512 GB SSD.

Web browser support

Annotation Lab is tested with the latest version of Google Chrome and is expected to work in the latest versions of:
  • Google Chrome
  • Apple Safari
  • Mozilla Firefox
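
You can check the OS, RAM, and CPU requirements above before installing. The following is a minimal sketch with hypothetical helper names. check_os and check_resources compare the given values against the stated minimums. preflight reads the actual values from /etc/os-release, /proc/meminfo, and nproc. The sketch treats GB as GiB. It also accepts a MemTotal as low as 90% of 32 GiB, because the kernel reserves part of physical memory; this tolerance is an assumption. The sketch does not check disk space or ports.

```shell
#!/usr/bin/env bash
# Sketch: compare this machine with the minimum requirements (Ubuntu 20+,
# 32 GB RAM, 8 CPU cores). Helper names and the 10% RAM tolerance are assumptions.

MIN_RAM_GB=32
MIN_CPUS=8

# Succeed for Ubuntu with a major version of 20 or later.
check_os() {
  local id="$1" version_id="$2"
  [ "$id" = "ubuntu" ] && [ "${version_id%%.*}" -ge 20 ] 2>/dev/null
}

# ram_kb is MemTotal in kB; GB is treated as GiB, with a 10% tolerance.
check_resources() {
  local ram_kb="$1" cpus="$2" status=0
  local min_ram_kb=$((MIN_RAM_GB * 1024 * 1024 * 9 / 10))
  if [ "$ram_kb" -lt "$min_ram_kb" ]; then
    echo "RAM: ${ram_kb} kB is below ${MIN_RAM_GB} GB" >&2
    status=1
  fi
  if [ "$cpus" -lt "$MIN_CPUS" ]; then
    echo "CPU: ${cpus} cores is below ${MIN_CPUS}" >&2
    status=1
  fi
  return "$status"
}

# Read the real values from this machine and run both checks.
preflight() {
  local status=0 ram_kb
  # Source os-release in a subshell so ID/VERSION_ID do not leak into the caller.
  if ! ( . /etc/os-release && check_os "$ID" "$VERSION_ID" ); then
    echo "OS: not Ubuntu 20 or later" >&2
    status=1
  fi
  ram_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  check_resources "$ram_kb" "$(nproc)" || status=1
  return "$status"
}

# Usage:
#   preflight && echo "minimum OS, RAM and CPU requirements met"
```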