OCR from Image?

Requests for new functionality or improvements in existing functionality. Please provide clear descriptions of your request, an example or if possible a real life scenario.
mhswa
Posts: 1
Joined: Thu Nov 15, 2018 8:25 am

Post by mhswa » Thu Nov 15, 2018 8:28 am

Hello

We upload a lot of PDF documents that are technical drawings with dimensions on them, and it doesn't seem that Tesseract processes them at all. See the example below. Is there any way we can get this to work? Sometimes we need to search with very little information, such as a dimension only.

https://imgur.com/a/v6Kl7gh

rosarior
Posts: 159
Joined: Tue Aug 21, 2018 3:28 am

Re: OCR from Image?

Post by rosarior » Thu Nov 15, 2018 11:37 pm

Tesseract can't recognize rotated text (https://github.com/tesseract-ocr/tesser ... -deskewing).

My recommendation would be to add a rotation transformation that aligns as many of the numbers as possible.

Off the top of my head, I can't think of any OCR engine that can recognize text with multiple rotations on the same page. I think you would need a trained neural network or a custom computer vision implementation to pull that off.
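
If a single rotation is not enough, one possible workaround is to run a separate OCR pass per rotation and merge the results. The sketch below is only an illustration of that idea, not something the OCR engine does on its own. The names `ocr_with_rotations`, `rotate`, and `ocr` are hypothetical: `rotate` and `ocr` are callables you supply yourself (for example, thin wrappers around Pillow's `Image.rotate` and pytesseract's `image_to_string`, assuming those libraries are installed).

```python
from typing import Any, Callable, Dict, Iterable


def ocr_with_rotations(
    image: Any,
    rotate: Callable[[Any, int], Any],
    ocr: Callable[[Any], str],
    angles: Iterable[int] = (0, 90, 180, 270),
) -> Dict[str, int]:
    """Run one OCR pass per rotation and collect the distinct tokens found.

    `rotate` and `ocr` are caller-supplied (hypothetical) callables, so this
    function does not depend on any particular imaging or OCR library.
    Returns a mapping of token -> first angle at which it was recognized.
    """
    found: Dict[str, int] = {}
    for angle in angles:
        text = ocr(rotate(image, angle))
        for token in text.split():
            # Keep the first angle that produced this token.
            found.setdefault(token, angle)
    return found
```

The merged tokens could then be indexed for search, so a dimension written vertically on the drawing is still found. This only helps with text at the angles you try; text at arbitrary angles would still be missed, and wrongly oriented passes may add some garbage tokens.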
