Import Documents


Once a new project is created and its configuration is saved, the user is redirected to the Import page. Here the user has multiple options for importing tasks.

Plain text file

When you upload a plain text file, a single task is created that contains the entire content of the file.

This differs from earlier versions of Annotation Lab, which split the input text file on the newline character and created one task per line.

JSON file

To bulk import a list of documents, use the JSON import option. The expected format is shown below: a list of dictionaries, each with two key-value pairs (“text” and “title”).

[{"text": "Task text content.", "title": "Task title"}]
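As an illustration of the format above, the following minimal Python sketch builds such a list from (title, text) pairs and writes it to a file. The function names `build_tasks` and `write_import_file` and the output path are hypothetical and not part of Annotation Lab; only the “text” and “title” keys come from the format described here.

```python
import json
from pathlib import Path


def build_tasks(documents):
    """Turn (title, text) pairs into a list of {"text", "title"} dictionaries."""
    return [{"text": text, "title": title} for title, text in documents]


def write_import_file(documents, path):
    """Write the tasks as a JSON array to `path` (hypothetical helper)."""
    tasks = build_tasks(documents)
    payload = json.dumps(tasks, ensure_ascii=False, indent=2)
    Path(path).write_text(payload, encoding="utf-8")
    return tasks
```

For example, `write_import_file([("Task title", "Task text content.")], "tasks.json")` would produce a file with one dictionary, which results in one task when imported.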

CSV, TSV file

When a CSV or TSV file is used, the column names are interpreted as task data keys:

Task text content, Task title
this is a first task, Colon Cancer.txt
this is a second task, Breast radiation therapy.txt
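A file of this shape can be generated with Python's standard `csv` module, as in the sketch below. The column names `text` and `title` are an assumption that mirrors the JSON keys described earlier, and `tasks_to_csv` is a hypothetical helper, not an Annotation Lab function. Using `csv.DictWriter` means that any value containing the delimiter is quoted automatically.

```python
import csv
import io


def tasks_to_csv(tasks, fieldnames=("text", "title"), delimiter=","):
    """Render tasks as delimited text with a header row.

    Pass a tab character as `delimiter` to produce TSV instead of CSV.
    The default column names are an assumption, not a documented requirement.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(fieldnames),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()      # first row: the column names (task data keys)
    writer.writerows(tasks)   # one row per task
    return buffer.getvalue()
```

The returned string can be saved with a `.csv` or `.tsv` extension and uploaded from the Import page.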

Import annotated tasks

When importing tasks that already contain annotations (for example, tasks exported from another project or tasks with predictions generated by pre-trained models), the user can choose either to overwrite the existing completions/predictions or to skip the tasks that have already been imported into the project.

Dynamic Task Pagination

Earlier versions of the Annotation Lab supported pagination through the <pagebreak> tag. Adding or changing page breaks required a document pre-processing step, which meant extra effort for users.

Annotation Lab 2.8.0 changes how pagination works. Going forward, pagination is dynamic and can be configured from the Labeling page according to the user’s needs and preferences. Annotators and reviewers can choose the number of words to show on a single page from a predefined list of values, or enter a custom count.

A new settings option has been added to prevent a sentence from being split across two pages.
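To make the idea concrete, here is a rough Python sketch of word-count pagination with an option to keep sentences whole. It is not the Annotation Lab implementation: the function `paginate`, its parameters, and the naive sentence splitter (a break after `.`, `!` or `?` followed by whitespace) are all assumptions made for illustration.

```python
import re


def paginate(text, words_per_page, keep_sentences_whole=True):
    """Split text into pages of roughly `words_per_page` words (illustrative only)."""
    if words_per_page < 1:
        raise ValueError("words_per_page must be positive")

    if not keep_sentences_whole:
        # Plain word-count chunks; a sentence may straddle two pages.
        words = text.split()
        return [
            " ".join(words[i:i + words_per_page])
            for i in range(0, len(words), words_per_page)
        ]

    # Naive sentence boundaries: whitespace preceded by ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    pages, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new page if this sentence would push the page over the limit.
        if current and count + n > words_per_page:
            pages.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        pages.append(" ".join(current))
    return pages
```

In this sketch a sentence longer than the page size is placed on a page of its own rather than being split, so pages can exceed the requested word count when sentences are kept whole.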
