Import Documents

 

Once a new project is created and its configuration is saved, the user is redirected to the Import page. Here the user has multiple options for importing tasks.

Users can import files in the accepted formats in several ways: drag and drop the file(s) onto the upload box, select the file from the file explorer, provide the URL of a file in JSON format, or import directly from an S3 bucket. To import from an Amazon S3 bucket, the user needs to provide the connection details (credentials, access keys, and S3 bucket path). All documents present in the specified path are then imported as tasks in the current project.

Plain text file

When you upload a plain text file, only one task is created, containing the entire content of the input file.

This is a change from earlier versions of Generative AI Lab, in which the input text file was split at each newline character and one task was created per line.

JSON file

To bulk import a list of documents, use the JSON import option. The expected format is illustrated below. It consists of a list of dictionaries, each with two key-value pairs (“text” and “title”).

[{"text": "Task text content.", "title":"Task title"}]
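The following is a minimal sketch of generating such a file with Python's standard json module. The helper name build_import_payload and the sample documents are illustrative assumptions; only the “text” and “title” keys come from the format described above.

```python
import json


def build_import_payload(documents):
    """Turn (title, text) pairs into the list-of-dictionaries layout shown above.

    `documents` is a hypothetical in-memory list; in practice the pairs
    would come from wherever your source documents live.
    """
    return [{"text": text, "title": title} for title, text in documents]


documents = [
    ("Colon Cancer.txt", "this is a first task"),
    ("Breast radiation therapy.txt", "this is a second task"),
]

payload = build_import_payload(documents)

# Serialize to a JSON string; write this string to a .json file to upload it.
json_text = json.dumps(payload, ensure_ascii=False, indent=2)
print(json_text)
```

Each dictionary in the list becomes one task when the file is imported.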

CSV, TSV file

When a CSV- or TSV-formatted text file is used, the column names are interpreted as task data keys:

Task text content, Task title
this is a first task, Colon Cancer.txt
this is a second task, Breast radiation therapy.txt
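A file of this shape can be produced with Python's standard csv module, as in the sketch below. The column names text and title are an assumption borrowed from the JSON format; use whichever keys your project expects. The helper name tasks_to_csv is hypothetical. Using the csv module rather than manual string joining ensures that task text containing the delimiter is quoted correctly.

```python
import csv
import io


def tasks_to_csv(rows, fieldnames=("text", "title"), delimiter=","):
    """Serialize task dictionaries to delimited text with a header row.

    The header row carries the column names, which are read as task data keys.
    Pass delimiter="\t" to produce TSV instead of CSV.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(fieldnames),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"text": "this is a first task", "title": "Colon Cancer.txt"},
    {"text": "this is a second task, with a comma", "title": "Breast radiation therapy.txt"},
]

csv_text = tasks_to_csv(rows)                  # comma-separated
tsv_text = tasks_to_csv(rows, delimiter="\t")  # tab-separated
print(csv_text)
```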

Import annotated tasks

When importing tasks that already contain annotations (e.g. exported from another project, with predictions generated by pre-trained models), the user can either overwrite the existing completions/predictions or skip the tasks that are already imported into the project.

NOTE: When importing tasks from different projects in order to combine them in one project, users should check for overlapping task IDs. Generative AI Lab will simply overwrite tasks with the same ID.
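One way to check for overlaps before importing is sketched below. It assumes each exported task is a dictionary with a numeric "id" key, which is an assumption about the export layout and not something this page confirms; the helper names are hypothetical. Renumbering is only one possible way to resolve a clash.

```python
def overlapping_task_ids(existing_tasks, incoming_tasks, id_key="id"):
    """Return the sorted IDs that appear in both task lists."""
    existing_ids = {task[id_key] for task in existing_tasks}
    incoming_ids = {task[id_key] for task in incoming_tasks}
    return sorted(existing_ids & incoming_ids)


def renumber_tasks(tasks, start_id, id_key="id"):
    """Return copies of the tasks with consecutive IDs starting at start_id."""
    return [{**task, id_key: start_id + offset} for offset, task in enumerate(tasks)]


# Hypothetical exports from two projects; the "id" key is an assumed field name.
project_a = [{"id": 1, "title": "a1"}, {"id": 2, "title": "a2"}]
project_b = [{"id": 2, "title": "b1"}, {"id": 3, "title": "b2"}]

clashes = overlapping_task_ids(project_a, project_b)
if clashes:
    # Shift the incoming tasks past the highest existing ID so nothing is overwritten.
    next_free_id = max(task["id"] for task in project_a) + 1
    project_b = renumber_tasks(project_b, next_free_id)
```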

Dynamic Task Pagination

Earlier versions of Generative AI Lab supported pagination through the <pagebreak> tag. Adding or changing page breaks required a document pre-processing step, which meant extra effort for users.

Generative AI Lab 2.8.0 changes how pagination works. Going forward, pagination is dynamic and can be configured from the Labeling page according to the user’s needs and preferences. Annotators or reviewers can choose the number of words to include on a single page from a predefined list of values, or enter a custom count.

A new settings option prevents a sentence from being split across two pages.
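To make the idea concrete, here is an illustrative sketch of word-count pagination with an optional "keep sentences whole" setting. It is not the product's implementation; the function name, the parameter names, and the simple punctuation-based sentence splitter are all assumptions made for the example.

```python
import re


def paginate(text, words_per_page, keep_sentences_whole=False):
    """Split text into pages of at most words_per_page words.

    With keep_sentences_whole=True, page breaks fall only between sentences,
    so a sentence longer than the limit gets a page of its own.
    """
    if words_per_page < 1:
        raise ValueError("words_per_page must be at least 1")

    if keep_sentences_whole:
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        units = [sentence.split() for sentence in sentences if sentence]
    else:
        units = [[word] for word in text.split()]

    pages, current = [], []
    for unit in units:
        # Start a new page when adding this unit would exceed the limit.
        if current and len(current) + len(unit) > words_per_page:
            pages.append(" ".join(current))
            current = []
        current.extend(unit)
    if current:
        pages.append(" ".join(current))
    return pages


sample = "One two three. Four five six seven. Eight nine."
by_words = paginate(sample, 4)
by_sentences = paginate(sample, 4, keep_sentences_whole=True)
```

With a limit of four words, the plain mode breaks the second sentence across pages, while the sentence-preserving mode places each sentence on its own page.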

Import from Cloud Storage

Generative AI Lab 4.3.0 added support for importing tasks/documents stored in the cloud. The Import page includes a section where users define the S3 connection details (credentials, access keys, and S3 bucket path). All documents present in the specified path are imported as tasks in the current Generative AI Lab project. Version 5.9 of Generative AI Lab also allows you to import projects from S3 and Azure Blob.

Generative AI Lab 5.8 adds integration with Azure Blob storage, complementing the existing support for AWS S3. Tasks can be imported from and exported to either storage service.

Task Import from Azure Blob Storage:

Importing tasks from Azure storage containers works the same way as importing from AWS S3. Follow these steps to bring your Azure data into Generative AI Lab projects:

  • Prepare the Azure Source: Ensure the Azure storage container from which you intend to import tasks is accessible and the target files are available. Generative AI Lab currently supports various document types such as text, PDF, images, videos, and sound files.
  • In your Generative AI Lab project: Navigate to the Task Import page of the project into which you wish to import tasks.
  • Select Azure Blob Storage: Choose the “Azure BLOB” import option by clicking the corresponding radio button on the Import page.
  • Enter Azure Credentials: Provide the Azure connection details: Azure Container Name, Azure Account Name, and Azure Account Secret Key.
  • Initiate Import Process: Click the “Import” button to transfer compatible documents from the specified Azure container into the current Generative AI Lab project.


Import Project from S3 and Blob

Version 5.9 of Generative AI Lab allows you to import projects from S3 and Azure Blob.

Steps to import a project from S3:

  • Navigate to “Import Project”
  • Choose “AWS S3”
  • Input the path to the S3 file as s3://bucket/folder/file.zip
  • Provide S3 Access Key, S3 Secret Key, and Session Token (Required for MFA Accounts)
  • Click “Import”
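Before entering the path, it can help to confirm that it matches the s3://bucket/folder/file.zip pattern from the steps above. The sketch below does this with Python's standard urllib.parse module. The helper name split_s3_path is hypothetical, and the .zip check is an assumption drawn from the example path, not a documented requirement.

```python
from urllib.parse import urlparse


def split_s3_path(path):
    """Split an s3://bucket/key path into (bucket, key), validating its shape."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected a path like s3://bucket/folder/file.zip, got {path!r}")
    key = parsed.path.lstrip("/")
    # Assumption: a project archive is a .zip file, as in the example path.
    if not key.lower().endswith(".zip"):
        raise ValueError(f"expected the path to point at a .zip file, got {path!r}")
    return parsed.netloc, key


bucket, key = split_s3_path("s3://bucket/folder/file.zip")
print(bucket, key)
```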

Steps to import a project from Azure Blob:

  • Go to “Import Project”
  • Select “Azure Blob”
  • Enter the path to the Azure Blob file as Container/file.zip
  • Input Azure Account Name and Azure Account Secret Key
  • Click “Import”
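The Container/file.zip path used in the steps above can be checked in a similar way. This is a minimal sketch; the helper name split_azure_blob_path is hypothetical, and treating everything after the first slash as the blob name is an assumption about how the path is interpreted.

```python
def split_azure_blob_path(path):
    """Split a Container/blob path into (container, blob_name)."""
    container, separator, blob_name = path.strip("/").partition("/")
    if not separator or not container or not blob_name:
        raise ValueError(f"expected a path like Container/file.zip, got {path!r}")
    return container, blob_name


container, blob_name = split_azure_blob_path("Container/file.zip")
print(container, blob_name)
```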