Annotation Lab offers out-of-the-box support for
To run pre-annotation on one or several tasks, the Pre-Annotate
button from the top right side of the
This information is crucial, especially when multiple users are training and deploying models in parallel. Before running preannotation on your tasks, carefully check the list of currently deployed models and their labels.
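The check described above can be sketched in a few lines of Python. This is only an illustration and not part of Annotation Lab: the function name `missing_labels`, the model name `model_a`, and the dictionary layout are hypothetical, and the deployed models and their labels are assumed to have been copied by hand from the dialog.

```python
def missing_labels(project_labels, deployed_models):
    """Return the project labels that no currently deployed model predicts.

    project_labels: iterable of label names from the project configuration.
    deployed_models: dict mapping a model name to the list of labels it
    predicts (hypothetical structure, filled in by hand from the UI).
    """
    covered = set()
    for labels in deployed_models.values():
        covered.update(labels)
    # Keep the order of the project labels in the result.
    return [label for label in project_labels if label not in covered]


# Hypothetical example: one deployed model covering two of three labels.
deployed = {"model_a": ["DRUG", "DOSAGE"]}
print(missing_labels(["DRUG", "DOSAGE", "Symptom"], deployed))
```

Any label returned by this check would receive no predictions until a model that covers it is deployed.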
If needed, users can deploy the models defined in the current project (based on the current Labeling Config) by clicking the Deploy button. After the deployment is complete, the preannotation can be triggered.
Since
If no preannotation server exists for the current project, the dialog box also offers the option to deploy a new server with the current project’s configuration. If this option is selected and enough resources are available (infrastructure capacity and a license, if required), the server is deployed and preannotation can be started. If there are no free resources, users can delete one or several existing servers from
Concurrency is not only supported between preannotation servers but also between training and preannotation. Users can have training running on one project and preannotation running on another project at the same time.
Preannotation Approaches
Pretrained Models
On the Add Label
button you can add the predefined labels to your project configuration and take advantage of the Spark NLP auto labeling capabilities.
In the example below, we are reusing the ner_posology model that comes with 7 labels related to drugs.
In the same manner, classification, assertion status, or relation models can be added to the project configuration and used for preannotation.
Starting from version 4.3.0, Finance and Legal models downloaded from the Models Hub can be used for pre-annotation of NER, assertion status and classification projects. Visual NER models can now be downloaded from the NLP Models Hub, and used for pre-annotating image-based documents. Once you download the models from the Models Hub page, you can see the model’s label in the
Rules
Preannotation of NER projects can also be done using
In the example below, we are reusing the available rules for preannotation.
Read more on how to create rules and reuse them to speed up the annotation process here.
Text Preannotation
Preannotation is available for projects whose tasks contain text. When you set up a project to use existing Spark NLP models for pre-annotation, you can run the designated models on all of your tasks by pressing the Pre-Annotate
button on the top-right corner of the
As a result, all predicted labels for a given task will be available in the
Visual Preannotation
For running pre-annotation on one or several tasks, the Pre-Annotate
button from the upper right side of the
Known Limitations:
- When bulk pre-annotation runs on many tasks, the pre-annotation can fail due to memory issues.
- Preannotation currently works at the token level, and does not merge all tokens of a chunk into one entity.
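If chunk-level entities are needed, token-level predictions can be merged in a post-processing step outside Annotation Lab. The following is a minimal sketch, assuming each prediction is a dictionary with character offsets `start` and `end` (end exclusive) and a `label`. This format and the function name `merge_token_predictions` are assumptions made for illustration, not the export format of the tool. Note that this simple rule also merges two distinct neighbouring entities that happen to share a label.

```python
def merge_token_predictions(text, tokens, max_gap=1):
    """Merge adjacent same-label token spans into chunk-level entities.

    tokens: list of dicts with "start", "end" (end exclusive) and "label",
    sorted by "start". This layout is an assumption for illustration.
    max_gap: largest number of characters allowed between two tokens of
    the same chunk (1 covers a single space).
    """
    chunks = []
    for tok in tokens:
        last = chunks[-1] if chunks else None
        if (last is not None
                and last["label"] == tok["label"]
                and tok["start"] - last["end"] <= max_gap):
            # Same label and close enough: extend the current chunk.
            last["end"] = tok["end"]
        else:
            # Different label or too far away: start a new chunk.
            chunks.append({"start": tok["start"], "end": tok["end"],
                           "label": tok["label"]})
    for chunk in chunks:
        chunk["text"] = text[chunk["start"]:chunk["end"]]
    return chunks


text = "Take aspirin 100 mg twice daily"
tokens = [
    {"start": 5, "end": 12, "label": "DRUG"},
    {"start": 13, "end": 16, "label": "STRENGTH"},
    {"start": 17, "end": 19, "label": "STRENGTH"},
    {"start": 20, "end": 25, "label": "FREQUENCY"},
    {"start": 26, "end": 31, "label": "FREQUENCY"},
]
for chunk in merge_token_predictions(text, tokens):
    print(chunk["label"], repr(chunk["text"]))
```

In this example the two STRENGTH tokens become one "100 mg" entity and the two FREQUENCY tokens become "twice daily".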
Pipeline Limitations
Loading too many models in the preannotation server is not memory efficient and may not be practically required. Starting from version
Another restriction for Annotation Lab versions older than 4.2.0 is that two models trained on different embeddings cannot be used together in the same project. In any of the cases above, the Labeling Config throws validation errors and the configuration cannot be saved, which prevents deployment of the preannotation server.
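The embeddings restriction can be checked before editing the Labeling Config. The sketch below is a hypothetical helper, not the validation that Annotation Lab performs: the model and embeddings names are placeholders, and the mapping from model to embeddings is assumed to be collected by hand, for example from each model's page on the Models Hub.

```python
from collections import defaultdict


def group_by_embeddings(model_embeddings):
    """Group model names by the embeddings they were trained on.

    model_embeddings: dict mapping a model name to an embeddings name
    (hypothetical structure used only for this illustration).
    """
    groups = defaultdict(list)
    for model, embeddings in model_embeddings.items():
        groups[embeddings].append(model)
    return {emb: sorted(names) for emb, names in groups.items()}


def is_compatible(model_embeddings):
    """True when all models share a single embeddings name."""
    return len(group_by_embeddings(model_embeddings)) <= 1


# Placeholder names: two models on one embeddings, one on another.
models = {"model_a": "emb_x", "model_b": "emb_x", "model_c": "emb_y"}
print(is_compatible(models))
print(group_by_embeddings(models))
```

When the check fails, the grouping shows which models would have to be removed from the project, or moved to a separate one, for the configuration to pass.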