6.4.0
Release date: 28-04-2026
Visual NLP 6.4.0 Release Notes 🕶️
We are glad to announce that Visual NLP 6.4.0 has been released! Dicom improvements, a new OCR engine, and much more. 📢📢📢
Main Changes 🔴
- Dicom Processing Improvements
- New Optimized Fast Dicom Pipeline
- New V4 OCR engine
- New YOLO-based Layout analysis model
- New OpenVINO Text Detector model
- Two new AWS Marketplace listings
Dicom Processing Improvements
DICOM processing pipelines now support sampling of the frames in each study. This strategy speeds pipelines up by processing only a subset of all the frames in a study. The changes span several components.
DicomToImageV3 Changes
- Added a new param `setFrameSamplingStrategy()`. Valid values are `['Consecutive', 'Stride', 'Random', 'Middle']` for sub-sampling the frames.
- Added a new param `setFrameDimsCol()`, which contains metadata information about the image.
- Fixed the selection of frames via `setInputCols()`: `DicomToImageV3` accepts a column `parts`, which is a per-DICOM-file integer list representing a cherry-picked list of frame ids.
DicomToImageV3.setInputCols(['content', 'parts'])
Some corner cases were fixed in this feature.
- Added page number information in the `pagenum` column.
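To make the four sampling strategies concrete, here is a minimal, self-contained sketch of the frame indices each one could select from a multi-frame study. The `sample_frames` helper and its signature are hypothetical illustrations; only the strategy names (`Consecutive`, `Stride`, `Random`, `Middle`) come from the API above.

```python
# Illustrative sketch (not library code) of what the four frame-sampling
# strategies could select from an n_frames-long study.
import random

def sample_frames(n_frames, n_samples, strategy, seed=0):
    """Return the frame indices a given strategy would keep."""
    if n_samples >= n_frames:
        return list(range(n_frames))
    if strategy == "Consecutive":          # the first n_samples frames
        return list(range(n_samples))
    if strategy == "Stride":               # evenly spaced frames
        stride = n_frames // n_samples
        return list(range(0, n_frames, stride))[:n_samples]
    if strategy == "Random":               # random subset, kept in order
        return sorted(random.Random(seed).sample(range(n_frames), n_samples))
    if strategy == "Middle":               # n_samples frames around the center
        start = (n_frames - n_samples) // 2
        return list(range(start, start + n_samples))
    raise ValueError(f"unknown strategy: {strategy}")

print(sample_frames(100, 4, "Stride"))    # [0, 25, 50, 75]
print(sample_frames(100, 4, "Middle"))    # [48, 49, 50, 51]
```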
DicomDrawRegion Changes
- This component renders the regions into the frames of the input dicoms. When frame sampling was performed in previous stages, this component will now perform the extrapolation of the regions that were analyzed to all the frames in the dicom file.
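The extrapolation idea can be sketched in a few lines: each frame that was not analyzed reuses the regions of the nearest sampled frame. The function below is illustrative only, assuming a simple nearest-frame assignment; it is not the library's implementation.

```python
# Hypothetical sketch: regions were detected only on a sampled subset of
# frames, so every remaining frame borrows the regions of the nearest
# analyzed frame. Names are illustrative, not the library API.

def extrapolate_regions(n_frames, regions_by_frame):
    """regions_by_frame: {sampled_frame_index: [regions]} -> mapping for all frames."""
    sampled = sorted(regions_by_frame)
    full = {}
    for frame in range(n_frames):
        nearest = min(sampled, key=lambda s: abs(s - frame))
        full[frame] = regions_by_frame[nearest]
    return full

full = extrapolate_regions(6, {0: ["A"], 4: ["B"]})
# frames 0-2 reuse the regions from frame 0, frames 3-5 those from frame 4
```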
DicomMetadataDeidentifier
- Added guardrails for `remove`, `delete`, and `replaceWithLiteral` actions when applied to VR SQ (sequence) elements.
- Added support for removing or deleting group tags through a group strategy file via `setGroupStrategyFile("group_strategy.csv")`, for example deletion of all overlay tags in group `60xx`.
- Improved private tag removal with `setRemovePrivateTags(True)` to consistently delete private tags from the DICOM file.
- Improved tracking of DICOM object references to better preserve and trace metadata tag operations.
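As a rough illustration of how a group pattern such as `60xx` can match DICOM tag groups (the even-numbered overlay groups live in `0x6000`-`0x60FF`), here is a small sketch. The `group_matches` helper is hypothetical; the actual group strategy file format is defined by the library.

```python
# Sketch of matching a wildcarded hex group pattern like "60xx" against
# DICOM tag group numbers. Only setGroupStrategyFile and the 60xx example
# come from the release notes; this helper is illustrative.
import re

def group_matches(pattern, group):
    """pattern: hex string with 'x' wildcards (e.g. '60xx'); group: int tag group."""
    regex = "^" + pattern.lower().replace("x", "[0-9a-f]") + "$"
    return re.match(regex, f"{group:04x}") is not None

print(group_matches("60xx", 0x6000))  # True  (overlay group)
print(group_matches("60xx", 0x60ff))  # True
print(group_matches("60xx", 0x0010))  # False (patient module group)
```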
If you do not want to build a pipeline from scratch but still want to leverage this and other optimizations, check the next section.
New Optimized Fast Dicom Pipeline
This new pipeline leverages image re-scaling, compression, and frame sub-sampling for improved performance. Just call it like this:
from sparkocr.pretrained import DicomPretrainedPipeline
dcm_pipe = DicomPretrainedPipeline("dicom_deid_fully_optimized")
clean_dcm_df = dcm_pipe.transform(dicom_df)
New V4 OCR engine
This is a new OCR model that operates with an external text detector for high-recall use cases such as de-identification. Unlike the V2 family, which is Transformer-based, this model is CNN-based, which lets it deliver reasonable throughput even on CPU while remaining competitive in accuracy.
text_detector = ImageTextDetectorCraft()\
.pretrained("text_detection_v4", "en", "clinical/ocr")\
.setInputCol("image")\
.setOutputCol("regions")\
.setSizeThreshold(10)\
.setLinkThreshold(0.3)\
.setTextThreshold(0.4)\
.setWithRefiner(False)
text_extractor = ImageToTextV4()\
.pretrained("text_recognition_v4", "en", "clinical/ocr")\
.setInputCols(["image", "regions"])\
.setOutputCol("text")
New YOLO-based Layout analysis model
This new model can detect layout entities {Text, Title, List, Table, Figure}. It is similar in accuracy to other DiT-based models in the library, but with a speed-up of up to 10X over DiT options such as ImageLayoutAnalyzerDit.
This is how you use it:
doc_layout = DocumentLayoutAnalyzer \
.pretrained("doc_layout_jsl", "en", "clinical/ocr")
New OpenVINO Text Detector model
Our ImageTextDetector annotator, which is used in many OCR and de-identification pipelines, now supports OpenVINO checkpoints. To use it, call the annotator exactly as before, but pass the image_text_detector_open_vino model name like this:
ImageTextDetector.pretrained("image_text_detector_open_vino", "en", "clinical/ocr")
This model delivers a speed-up of around 2.2X on CPUs that support AI acceleration features such as AVX-512, VNNI, and bfloat16 (for example, AWS's C7a instance family).
Two new AWS Marketplace listings
- Vision OCR LLM: highly accurate text extraction model.
- Vision OCR Structured LLM: a highly accurate VLM-based text extraction model that can handle text and tables.
Previous versions
- 6.3.0
- 6.0.0
- 5.5.0
- 5.4.2
- 5.4.1
- 5.4.0
- 5.3.2
- 5.3.1
- 5.3.0
- 5.2.0
- 5.1.2
- 5.1.0
- 5.0.2
- 5.0.1
- 5.0.0
- 4.4.4
- 4.4.3
- 4.4.2
- 4.4.1
- 4.4.0
- 4.3.3
- 4.3.0
- 4.2.4
- 4.2.1
- 4.2.0
- 4.1.0
- 4.0.2
- 4.0.0
- 3.14.0
- 3.13.0
- 3.12.0
- 3.11.0
- 3.10.0
- 3.9.1
- 3.9.0
- 3.8.0
- 3.7.0
- 3.6.0
- 3.5.0
- 3.4.0
- 3.3.0
- 3.2.0
- 3.1.0
- 3.0.0
- 1.11.0
- 1.10.0
- 1.9.0
- 1.8.0
- 1.7.0
- 1.6.0
- 1.5.0
- 1.4.0
- 1.3.0
- 1.2.0
- 1.1.2
- 1.1.1
- 1.1.0
- 1.0.0