Generative AI Lab 7.4 - Annotate Deeper, Import Faster, Evaluate Better
Release date: 09-10-2025
We’re excited to announce the release of Generative AI Lab 7.4, a version focused on improving usability, flexibility, and evaluation workflows for LLM projects. This update brings significant enhancements such as long-form chunk-level comments for LLM evaluation annotations, real-time tooltips for XML creation, and expanded LLM integration options.
Alongside these updates, we’ve introduced improvements to analytics, simplified prompt import workflows, and added support for Claude across all key features. Usability has been further refined with better project configuration, user filtering in analytics, and external ID support for users.
This release also includes numerous bug fixes to ensure smoother workflows, stronger stability, and more consistent performance across annotation, evaluation, and integration processes.
LLM Evaluation - Explain your Annotations via Comments
What’s New: Generative AI Lab 7.4 introduces support for long-form, chunk-level comments in LLM evaluation and comparison projects. Users can now add a dedicated comment to each annotation span, separate from the existing meta field, to provide detailed explanations for labels such as “hallucinations” and other NER-style annotations.
Technical Details:
- Each annotation span has its own comment field, separate from the meta field.
- Comments support long-form input for detailed notes, for example, facts, contradictions, or references.
- Accessible via the annotation widget — appears when a chunk is clicked or labeled.
- Comments are saved with the chunk’s annotation data; they do not change existing meta logic.
- Supported for both HyperTextLabels and Labels.
- Annotation widget resized to fill the full vertical space; when relations exist, space is split 60/40 between NER spans and relations.
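For illustration, a labeled chunk and its comment might appear in the exported JSON roughly as sketched below. The field names and nesting (for example, the comment key and the from_name/to_name pair) are illustrative assumptions, not the exact export schema:
{
  "from_name": "label",
  "to_name": "text",
  "type": "labels",
  "value": {
    "start": 112,
    "end": 164,
    "labels": ["Hallucination"],
    "text": "the 2021 WHO report cited in the response"
  },
  "comment": "No such WHO report exists; the cited statistic could not be verified."
}
The comment travels with the span’s annotation data and stays separate from any existing meta entries, matching the behavior described above.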
User Benefits:
- Reliable data export – Comments are included in the JSON output as text and as key-value metadata, so downstream systems receive complete information.
- Seamless commenting experience – Users can add and view chunk-level comments directly in the annotation workflow.
- No migration worries – Older annotations remain fully compatible, so existing work is preserved without extra effort.
- Improved usability – The annotation widget automatically adjusts its size to provide the best experience, whether relation annotations are present or not.
Example Use Case: During evaluation, a user labels a text span as a “hallucination” and adds a detailed comment explaining why it is factually incorrect, providing context for future reviewers and model fine-tuning.
Guided XML Configuration via Tooltips
What’s New: Users now receive real-time tag-level tooltips while creating XML configurations in the Customize Label page. These tooltips provide clear descriptions and value suggestions for each tag, making XML creation more accurate and efficient.
Technical Details:
- Tooltip appears dynamically as the user types a tag during XML creation.
- Tooltip content includes:
- Description of the tag
- Expected attributes (if any)
- Implemented for all supported tags and attributes in the Customize Label page.
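As a concrete illustration, the kind of configuration these tooltips guide you through is a standard labeling XML like the sketch below. The tag structure follows the existing Customize Label conventions; the specific names, values, and colors are placeholders:
<View>
  <!-- Labels declares the label set; the tooltip explains attributes such as name and toName -->
  <Labels name="label" toName="text">
    <Label value="Hallucination" background="red"/>
    <Label value="Fact" background="green"/>
  </Labels>
  <!-- Text binds the label set to the task content; value="$text" points to the imported field -->
  <Text name="text" value="$text"/>
</View>
As you type each opening tag, the tooltip lists its description and expected attributes, so mistakes are caught before the configuration is saved.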
User Benefits:
- New Users: Understand tag semantics easily, reducing the learning curve and setup errors.
- Experienced Users: Speed up XML configuration with real-time guidance and attribute suggestions.
Example Use Case: A team setting up custom XML configurations for their project can now view tooltips for each tag, ensuring correct attribute usage and minimizing errors during the configuration process.
Improvements
Streamlined LLM Integration with Admin Approval Workflow
Generative AI Lab 7.4 simplifies LLM integration with a centralized approval system that allows users to request access to language models while maintaining administrative control over resource usage and permissions.
What’s New: Users can now add LLMs from the Configuration page and submit a request for admin approval to use those models for response generation. Once an approval is granted, the selected LLM appears as a selectable option wherever responses are generated, and administrators can revoke that permission at any time without affecting other models. In addition, ADHOC providers created by users are now listed on the Configuration page, improving visibility and making provider management easier.
Technical Details:
- All available LLMs are listed on the Configuration and Integration page.
- Users can select an LLM and submit an approval request to the admin.
- Before approval:
- The Generate Response button redirects to the setup page.
- After approval:
- The project owner can use the approved LLM to generate responses.
- ADHOC providers created by users are included in the LLM list.
- Admins can revoke or restore permissions for any LLM.
User Benefits:
- Teams: Streamlined LLM integration and admin approval without navigating multiple steps.
- Admins: Maintains control over LLM usage while allowing flexibility in project setup.
Example Use Case: A project team can select an LLM from the Configuration page and request approval. After the admin approves, they can start generating responses immediately. This reduces setup delays and improves operational efficiency.
Notes:
- Users cannot request a revoked LLM.
- Once an LLM is re-approved, it is automatically listed in the project LLM list without requiring a new request.
Flexible Project Setup with Optional LLM Configuration
Accelerate project creation by bypassing external LLM setup when not immediately needed. Create custom LLM configurations and access analytics and label customization without waiting for external service integration.
What’s New: The project configuration wizard for LLM projects now allows users to skip the LLM configuration step. By creating a custom LLM, users can customize labels and view analytics without needing to configure any external LLM service provider.
Technical Details:
- The wizard now allows skipping the LLM configuration step for LLM projects.
- Users can create a custom LLM and proceed directly to label customization and analytics.
User Benefits:
- Project Teams: Quickly set up projects and access analytics without relying on external LLMs.
- Annotators: Start customizing labels immediately and reduce setup time.
- Data Analysts: View project insights and metrics without waiting for LLM configuration.
Example Use Case: A user setting up an LLM project can create a custom LLM, skip the external configuration steps, and immediately customize labels and view project analytics.
Notes: If a user attempts to generate responses without any configured LLM, they will be redirected to the setup page to complete the necessary steps.
Comprehensive Analytics for LLM Evaluation
Gain deeper insights into your LLM projects with enhanced analytics that support complex evaluation structures including multiple rating systems, hypertext labels, and choice-based assessments.
What’s New: The Analytics page in LLM-based projects now supports multiple rating sections, HyperTextLabels, and Choices within the evaluation block. This provides more detailed and accurate analytics for completed evaluations.
Technical Details:
- Added support for multiple rating sections in evaluation blocks.
- HyperTextLabels and Choices are now fully displayed and counted in analytics.
- Updated chart behavior:
- Chart titles are always displayed.
- Subtitles now show “No data available yet” if no data exists.
User Benefits:
- Project Teams: Can view more detailed and accurate analytics with multiple rating sections.
- Data Analysts: Better insights into responses with full support for HyperTextLabels and Choices.
- Managers/Reviewers: Clearer visualization of results and improved consistency in the interface.
Example Use Case: A user reviewing an LLM-based project can now analyze multiple ratings, choices, and hypertext labels for each evaluation. This ensures more accurate reporting of team performance and evaluation results.
Note: All labels, classifications, and ratings defined after the following XML line will be included in the LLM analytics.
<View orientation="vertical" pretty="true" style="overflow-y: auto;" evaluation_block="true">
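For example, an evaluation block combining a rating, a choice set, and hypertext labels, all of which are now counted in analytics, could look roughly like the sketch below. The child tags and attribute values are illustrative and should be adapted to your project’s response element names:
<View orientation="vertical" pretty="true" style="overflow-y: auto;" evaluation_block="true">
  <!-- Illustrative children; analytics aggregates the ratings, choices, and labels defined here -->
  <Rating name="accuracy" toName="response" maxRating="5"/>
  <Choices name="verdict" toName="response">
    <Choice value="Correct"/>
    <Choice value="Hallucinated"/>
  </Choices>
  <HyperTextLabels name="issues" toName="response">
    <Label value="Hallucination" background="red"/>
  </HyperTextLabels>
</View>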
Simplified Prompt Import for LLM Evaluation and Comparison
Import prompts effortlessly using simplified JSON or CSV formats that work consistently across all LLM project types, replacing complex data structures with user-friendly options.
What’s New: Users can now import prompts using a simple JSON or CSV format across all LLM project types, replacing the previously complex JSON structure.
Technical Details:
- New lightweight JSON schema for prompt import:
{ "text": "Your Prompt Here" }
- Supports batch imports via JSON arrays or CSV files (a CSV sketch follows this list):
[
{"text":"Your Prompt Here"},
{"text":"Your Another Prompt Here"}
]
- Import available for:
- LLM Evaluation (Text & HTML types)
- LLM Comparison (Text & HTML types)
- Added ability to download a sample JSON directly from the import page.
- Updated “Import Sample Task” to use real prompts that generate LLM responses.
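For teams that prefer CSV, an equivalent import file can be as simple as a single column of prompts. The header name below mirrors the text key from the JSON schema and is an assumption rather than a documented requirement:
text
"Summarize the attached discharge note in two sentences."
"List all medications mentioned in the following report."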
User Benefits:
- Simplified Workflow: Removes the need for verbose completion-task JSON structures.
- Cross-Project Consistency: Same import structure now works for both LLM and Text projects.
- Faster Onboarding: Downloadable samples reduce setup errors and accelerate project configuration.
- Flexible Input Options: Teams can choose between JSON or CSV depending on workflow preference.
Example Use Case: A research team setting up an LLM Response Comparison project can quickly import 500 test prompts from a CSV file instead of building complex JSON payloads, allowing them to focus on analyzing model quality instead of data formatting.
Complete Claude Integration Across All Features
Expand your LLM toolkit with full Claude support across synthetic task generation, external prompts, and LangTest augmentation, providing greater flexibility and choice in your AI workflows.
What’s New: The application now provides full support for Claude across all major features, including:
- Synthetic Task Generation
- External LLM Prompts
- LangTest Augmented Tasks
This enhancement ensures seamless integration of Claude for multiple workflows, expanding flexibility and choice for users working with LLM-based tasks.
Technical Details:
- Added Claude integration for generating synthetic tasks.
- Enabled Claude as a provider for external LLM prompts.
- Extended LangTest pipeline to support Claude for augmented task generation.
Synthetic Tasks
External Prompt
User Benefits:
- Flexibility: Users can now select Claude as an alternative LLM for synthetic data generation and task augmentation.
- Consistency: Claude is supported across all major LLM-related features for a unified experience.
Example Use Case: A user creating synthetic tasks for evaluation can now select Claude as the LLM to generate tasks.
Enhanced Team Analytics with Individual User Filtering
What’s New: The “Submitted Completions Over Time” chart in the Team Productivity section now includes an option to filter submissions by individual users instead of viewing all users collectively.
User Benefit:
Users can analyze team productivity in more detail by filtering data for a specific user, making performance tracking more accurate.
Technical Details:
- Added user filter dropdown to the chart component in the Analytics Dashboard.
- Handled UI state management so that when a user is unselected, the chart resets to show data for all users.
Example Use Case: A project manager can now select a single user in the chart to check how many completions that user submitted over time.
External System Integration with User ID Mapping
What’s New: Admins can now add an External ID when creating a user. This field links a Generative AI Lab user to the matching account in an external application.
User Benefit:
Better mapping between Generative AI Lab and external systems, which improves integration and makes user management easier.
Technical Details:
- The User Creation form includes an External ID field with input validation. The field accepts any string, including special characters, up to 50 characters.
Example: An admin creating a new user for an enterprise integration can set the External ID as extemp-1023 to map the Generative AI Lab user with the enterprise HR system.
Bug Fixes
- Credentials Not Saved for Project Cloud Export/Import
Fixed an issue where S3 credentials were not being persisted for project export/import operations, requiring users to re-enter them each time. Credentials are now stored securely and reused across sessions. Additionally, sensitive credential information is no longer exposed in API payloads, improving security.
- Save Credentials in Import Project Form
Fixed an issue where the Save Credentials option in the Import Project form was not working as expected. Previously, credentials could not be saved and had to be re-entered for each import. This functionality now works correctly, allowing credentials to be securely saved and reused.
- Analytics Discrepancy with Multiple Ground Truth Completions
Fixed an issue where the Analytics page did not correctly use the highest-priority user’s submission when multiple ground truth completions existed. Previously, analytics could display results from a lower-priority user instead of the intended highest-priority user. This has been resolved, and analytics now consistently reflect the highest-priority user’s data.
- Missing Titles in Analytics Charts
Fixed an issue where the titles for the “Average edit time” and “Average number of edits per task” charts were displayed even when the charts were empty. These charts are now hidden when no data is available, ensuring a cleaner Analytics view.
- Triple Dot Menu Inaccessible After User Search
Fixed an issue where the triple dot menu (⋮) next to users in the search results was not accessible. This occurred because normal users and users created through security providers were listed in the same category. The system now separates them into their respective categories, ensuring the triple dot menu is fully accessible for normal users after performing a search.
- ‘h’ Hotkey Assignment Blocked Due to Default Hide Label Shortcut
Fixed an issue where the ‘h’ key could not be assigned as a hotkey in text-based projects because it conflicted with the default “hide labels” shortcut. The system now properly handles such conflicts by either preventing reassignment with a clear message or allowing reassignment if the default shortcut is intentionally overridden.
- LLM Response Fails to Generate with Invalid Input
Resolved a bug where responses failed to generate when the “responseName” input textbox contained spaces. The system now blocks spaces and allows only valid characters. Additionally, the search bar now remains visible regardless of search results.
- Element Name Matching for Labeling in LLM Evaluation and Comparison
Fixed an issue where labeling did not work unless the element name matched the expected response name. The system now correctly enforces the required element names, ensuring labeling works reliably in both evaluation and comparison workflows.
- Task ID Not Generated on First Azure Import
Fixed an issue where task IDs were not generated during the first import from Azure, causing the import process to fail. Task IDs are now correctly generated on initial imports, ensuring successful project setup from Azure.
- Optimize API Calls to Check the Deployed Pre-annotation Servers
Fixed an issue where the get_server API was being called on every page, causing unnecessary requests. The API is now called only on the Cluster page, while the Task page uses the new get_model_server API to check for an active Pay-As-You-Go server (has-payg-server). This improves performance and reduces redundant API calls.
- Cloud Import Fails for Visual NER Projects When Local Import Is Disabled
Fixed an issue where users could not import tasks from the cloud for Visual NER projects if local import was disabled in system settings. Cloud import now works correctly regardless of the local import configuration.
- Revoke Data Missing for LLM Request
Fixed an issue where the Revoke section did not display any data, even when items had been revoked. The section now correctly shows all revoked items, ensuring accurate visibility for LLM requests.
- Server Status Inconsistency and OCR Server Auto-Selection
Fixed an issue where the server status appeared as idle on the Cluster page but was incorrectly marked as busy on the Pre-annotation page. Server status now displays consistently across both pages.
Fixed an OCR server auto-selection issue on the import page. Users can now intentionally choose the OCR server or rely on automatic pre-selection.
- Empty CSV Exported for De-identified Text
Fixed an issue where exporting tasks as CSV for de-identified text resulted in an empty file. Exporting in JSON, CSV, or TSV formats now correctly includes all de-identified task data.