NLP Models Hub

 

The Annotation Lab 1.8.0 offers a tight integration with NLP Models Hub. Any compatible NER model and Embeddings can be downloaded and made available to the Annotation Lab users for preannotations either from within the application or via manual upload.

Models Hub page can be accessed via the left navigation menu by users in the UserAdmins group.

The Models Hub page has three tabs that hold information about models and embeddings.

Models Hub Tab

This tab lists all pretrained NER models and embeddings from NLP Models Hub which are compatible with Spark NLP 2.7.5 and which are defined for English language. By selecting one or multiple items from the grid view, users can download them to the Annotation Lab. The licensed/Healthcare models and embeddings are available to download only when a valid license is uploaded.

One restriction that we impose on models download/upload relates to the available disk space. Any model download requires that the double of its size is available on the local storage. If not enough space is available the download cannot proceed.

Disk usage view, search, and filter features are available on the upper section of the Models Hub page.

Available Models Tab

All the models available in the Annotation Lab are listed in this tab. The models are either trained within the Annotation Lab, uploaded to Annotation Lab by admin users, or downloaded from NLP Models Hub. General information about the models like labels/categories and the source (downloaded or trained or uploaded) of the model can be viewed. It is possible to delete any model or redownload failed ones by using the overflow menu on the top right corner of each model.

Since release 2.8.0 users can view the full version specification and the language of the model/embeddings on the Models Hub page of the model. Also to make searching of models/embeddings more efficient Annotation Lab offers a Language filter. This can be used to select models/embeddings on the Models Hub page according to their language.

untitled (2)

Download of model dependencies

In previous versions of Annotation Lab, when a user downloaded a model from the Models Hub page, for example, ner_healthcare_de, and tried to reuse labels from this model into a new project, via the Predefined labels page, an error message was preventing the saving of the configuration. The error (e.g. “w2v_cc_300d is not present in the machine”) appeared because the embeddings used to train the model were not available to the Annotation Lab server. Starting with version 2.8.0, Annotation Lab automatically downloads all necessary dependencies along with the model. So, the user does not have to manually handle this step – e.g. navigate to the Models Hub page and download the dependent embeddings.

Screen Recording 2022-03-16 at 9 58 12 PM

Available Embeddings Tab

This tab lists all embeddings available to the Annotation Lab together with information on their source and date of upload/download. Like models, any compatible embeddings can be downloaded from NLP Models Hub. By default, glove_100d embeddings are included in the deployment.

Available Rules Tab

Spark NLP for Healthcare supports rule-based annotations via the ContextualParser Annotator. In this release, Annotationlab adds support for creating and using ContextualParser rules in NER project.

Any user with admin privileges can see and edit the available rules under the Available Rules tab on the Models Hub page. Users can create new rules using the + Add Rules button.

There are two types of rules supported:

  • Regex Based: Users can define a regex that will be used to label all possible hit chunks and label them as being the target entity. For example, for labeling height entities the following regex can be used [0-7]'((0?[0-9])|(1(0|1))). All hits found in the task text that match the regex, are pre-annotated as heights.

  • Dictionary Based: Users can define and upload a CSV dictionary of keywords that cover the list of chunks that should be annotated as a target entity. For example, for the label female: woman, lady, girl, all occurrences of stings woman, lady, and girl within the text of a given task will be perannotated as female.

After adding a rule on Models Hub page, the Project Owner or Manager can add the rule to the configuration of the project where he wants to use it. This can be done via the Rules tab from the Project Setup page under the Project Configuration tab. A valid Spark NLP for Healthcare license is required to deploy rules from project config.

The user is notified when a rule was edited via an alert message “Redeploy preannotation server to apply these changes” on the rule edit form so that the users can redeploy the preannotation model.

redeploy-rules

Import and Export Rules

From this version, Annotationlab provides the feasibility of importing and exporting Contextual parser Rules from the Model Hub page.

Import Rules

This release provides user’s with the functionality to import rules, now users can import these rules from the Models Hub page under the Rules tab. Users can import both dictionaries as well as regex rules. Rules can be imported in the following formats:-

  1. A single JSON file.
  2. A ZIP archive containing multiple individual rules.

RulesImport

Export Rules

To export any rule the user needs to navigate to the Models Hub page under the Rules tab, then select from the available rules and click on the Export button. Rules will be downloaded in ZIP format containing files in JSON format. Also, these exported rules can be imported into AnnotationLab.

RulesExport

Custom Models/Embeddings Upload

Custom NER models or embeddings can be uploaded using the Upload button present in the top right corner of the page. The labels predicted by the uploaded NER model need to be specified using the Model upload form.

The models& embeddings to upload need to be Spark NLP compatible.

All available models are listed in the Spark NLP Pipeline Config on the Setup Page of any project and are ready to be included in the Labeling Config for preannotation.

Last updated