Description
This pretrained pipeline corrects skew in printed documents, which improves text readability and the accuracy of Optical Character Recognition (OCR). It automatically detects and adjusts misalignment or tilt in scanned or photographed documents so that each page is properly oriented for text extraction.
The pipeline analyzes the orientation of each document image, applies the necessary correction, and produces a more uniform, readable output. This preprocessing step matters for OCR because skewed or tilted pages can lead to inaccurate text recognition. Aligning the page first makes subsequent OCR more reliable when digitizing and extracting text from printed materials.
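The detect-then-correct idea described above can be illustrated with a small, self-contained sketch. It is not the pipeline's actual implementation, which is not documented here; it assumes one common approach, a projection-profile search, and runs on synthetic foreground points rather than a real page image. The function names `rotate`, `profile_score`, and `estimate_skew` are illustrative and are not part of Visual NLP.

```python
import math
from collections import Counter


def rotate(points, degrees):
    """Rotate (x, y) points about the origin by the given angle in degrees."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def profile_score(points, degrees):
    """Score how sharply the rows separate after undoing a candidate skew.

    Points are rotated back by the candidate angle and counted per integer
    row. Well-aligned text lines pile up in few rows, so the sum of squared
    row counts is largest near the true skew angle.
    """
    rows = Counter(round(y) for _, y in rotate(points, -degrees))
    return sum(n * n for n in rows.values())


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Return the candidate angle in [-max_angle, max_angle] with the best score."""
    steps = int(round(max_angle / step))
    candidates = [i * step for i in range(-steps, steps + 1)]
    return max(candidates, key=lambda d: profile_score(points, d))


# Synthetic page: five horizontal "text lines" of 200 pixels each,
# then skewed by 3 degrees to imitate a tilted scan.
page = [(x, y) for y in range(10, 60, 10) for x in range(200)]
skewed = rotate(page, 3.0)

angle = estimate_skew(skewed)        # detect the skew
deskewed = rotate(skewed, -angle)    # correct it
print(f"estimated skew: {angle} degrees")
```

A production system would work on pixel data and typically refine the search with a finer step or a different estimator; the brute-force search here is chosen only because it is easy to follow.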
How to use
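# Python (assumes an active SparkSession `spark` with Visual NLP installed)
from sparknlp.pretrained import PretrainedPipeline
pdf_pipeline = PretrainedPipeline('mixed_scanned_digital_pdf_skew_correction', 'en', 'clinical/ocr')
# Load every PDF under pdf_path as binary content
pdf_path = '/content/pdfs/'
pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()
# Apply the pretrained pipeline to the loaded files
result = pdf_pipeline.transform(pdf_example_df)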
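// Scala (assumes an active SparkSession `spark` with Visual NLP installed)
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val pdf_pipeline = new PretrainedPipeline("mixed_scanned_digital_pdf_skew_correction", "en", "clinical/ocr")
// Load every PDF under pdf_path as binary content
val pdf_path = "/content/pdfs/"
val pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()
// Apply the pretrained pipeline to the loaded files
val result = pdf_pipeline.transform(pdf_example_df)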
Example
Input

Output

"PEMBERITAHUAN PERTANYAAN DEWAN RAKYAT\n\nPERTANYAAN\n\nBUKAN JAWAB LISAN\n\nDARIPADA\n\nTUAN BUDIMAN BIN MOHD. ZOHDI\n\nSOALAN\n\nNO. 3\n\nTuan Budiman bin Moh\n\nTI\n\nd. Zohdi [ Sun\n\nbidang\n\nlah terkini pelaj\n\nar kolej komuniti di seluruh\n\ngai Besar ] minta MENTERI PENDIDIKAN\n\ndan kursus serta a\n\nPakah tahap\n\nut\n\nkebolehpasaran lepasan kolej k\n\nomuniti ini\n\nJAWAPAN\n\nTuan Yang di-Pertua,\n\nUntuk makluman Ahli Yang Berhormat\n\nSeluruh negara sehingga 16 Ogos 2016\n\njJumlah terkini pelajar aktif Kolej Komuniti di\n\nadalah se\n\nfamai 19,933 orang yang mengikut\n\nPengajian di peringkat meliputi kursus dip\n\nloma dan sijil.\n\nBerdasarkan Kajian\n\nPengesanan Graduan tahun 2015 iaitu soal\n\nSelidik yang dijalankan ke atas graduan\n\nsemasa musim konvokesyen dapatan me\n\ngraduan Kolej Komuniti adalah 97.4%.\n\nnunjukkan kadar kebolehpasaran bagi\n"
Model Information
| Model Name: | mixed_scanned_digital_pdf_skew_correction |
| Type: | pipeline |
| Compatibility: | Visual NLP 5.0.2+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |