Starting in Generative AI Lab 5.2, you can harness the potential of synthetic documents generated by LLMs such as ChatGPT. This integration allows you to easily create diverse and customizable synthetic text for your annotation tasks, enabling you to balance any entity skewness in your data and to train and evaluate your models more efficiently.
In Generative AI Lab 6.5.0, additional types of synthetic documents are supported: proportional and augmented.
This use of the feature depends on an LLM being configured.
Once the service provider integration is completed, it can be utilized in projects that can benefit from the robust capabilities of this new integration. Text generation becomes straightforward and effortless.
Generate synthetic tasks using Azure OpenAI
Azure OpenAI can also be used to generate synthetic tasks. Here’s a quick guide:
Setting up and Validating the New Service Provider:
- From the task page, click on the “Import” button and navigate to the “Generate Synthetic Task” page.
- Provide an appropriate prompt in the “Write Prompt” text box and click on the settings icon located on the right side of the page.
- Enter the API endpoint URL and secret key, then click on “validate.”
- After validating the connection, set the desired temperature and the number of tasks to generate.
- Click on the “Generate” button to create synthetic tasks.
For synthetic tasks, provide a prompt adapted to your data needs to initiate the generation process and obtain the required tasks. Users can further control the results by setting the “Temperature” and the “Number of text to generate.” The “Temperature” parameter governs the “creativity” or randomness of the LLM-generated text. Higher temperature values (e.g., 0.7) yield more diverse and creative outputs, whereas lower values (e.g., 0.2) produce more deterministic and focused outputs.
Proportional Augmentaiton
This method enhances data quality by using various testing techniques to generate new data based on an existing dataset. Proportional Augmentation is particularly effective in improving model performance by addressing specific weaknesses, such as the inability to recognize lowercase text, uppercase text, typos, and more. It is especially beneficial for bias and robustness testing, ensuring that the model produces high-quality and accurate results for machine learning, predictive modeling, and decision-making tasks. After setting the test types and max_proportion, click on “Generate Results” to create augmented tasks. Based on your configuration, data augmentation will enhance the existing tasks and generate new ones.
Another way to generate augmented tasks is through “Templatic augmentation”.
Templatic Augementation
Templatic Augmentation creates new data by using templates or patterns that are similar in structure and context to the original input. This method depends a lot on the templates provided by the user. There are two options for using this approach:
A. Manually Add Templates Users can manually choose templates along with the available labels. They can choose how many results to generate for each template using a scroll bar, which can be set from 1 to 50.