Newbie Questions

Questions, comments, discussions. Over time certain topics might be moved to their own category.
SimonSnow
Posts: 5
Joined: Wed Mar 27, 2019 5:09 am

Newbie Questions

Post by SimonSnow » Wed Mar 27, 2019 3:24 pm

Hi,

Thank you for the documentation. I successfully installed Mayan EDMS using the Basic deployment, and I am interested in the OCR function.

I would greatly appreciate any advice and assistance on using the system. I would like to ask a few questions.

For example, when I add a document (a purchase order) to the system, is OCR performed on it automatically, or do I have to click 'OCR' to start it? Where can I find the OCRed copy, so that I can paste its content into Excel for further processing? I understand that I get the OCRed page after clicking OCR; however, I would have to do this for every incoming purchase order, and I am trying to automate as much as possible.

On the Home page there are two searches: Page and Document. What is the difference?

Thanks

KevinPawsey
Posts: 85
Joined: Wed Aug 22, 2018 2:52 pm

Re: Newbie Questions

Post by KevinPawsey » Thu Mar 28, 2019 11:14 am

Hi Simon,

OCR is done automatically when the document is uploaded.

If you go to the document after a few minutes (how long depends on your CPU and the size of the document), you should see an OCR option at the side. The text content should be in there, and from there you can copy and paste the information you need. I don't think it produces an OCR'd "document" that you can then download.
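If you want to skip the manual copy and paste into Excel, one option is to put the OCR text of each page into a CSV file, which Excel opens directly. The sketch below assumes you already have the page texts as a list of strings (copied from the OCR view, or fetched through Mayan's REST API, whose endpoints vary by version and are not shown here). The function names `ocr_csv_text` and `save_for_excel` are hypothetical, not part of Mayan.

```python
import csv
import io


def ocr_csv_text(document_label: str, page_texts: list[str]) -> str:
    """Return CSV text with one row per page: document, page number, OCR text.

    `page_texts` is assumed to hold the OCR text of each page, in page order.
    """
    buffer = io.StringIO()
    # The default dialect quotes any field containing commas, quotes or newlines.
    writer = csv.writer(buffer)
    writer.writerow(["document", "page", "text"])
    for page_number, text in enumerate(page_texts, start=1):
        writer.writerow([document_label, page_number, text])
    return buffer.getvalue()


def save_for_excel(csv_text: str, path: str) -> None:
    """Write the CSV with a UTF-8 byte order mark so Excel detects the encoding."""
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        handle.write(csv_text)
```

For example, `save_for_excel(ocr_csv_text("PO-1001", pages), "po_1001_ocr.csv")` writes one file per purchase order; pulling out specific fields (order number, quantities) would still need parsing on top of this.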


Kevin
Running Mayan-EDMS on: OpenMediaVault, (Docker plugin), on x86 dual-core

SimonSnow
Posts: 5
Joined: Wed Mar 27, 2019 5:09 am

Re: Newbie Questions

Post by SimonSnow » Thu Mar 28, 2019 2:37 pm

Thank you for your reply.

Please advise what the differences between Page Search and Document Search are.

KevinPawsey
Posts: 85
Joined: Wed Aug 22, 2018 2:52 pm

Re: Newbie Questions

Post by KevinPawsey » Fri Mar 29, 2019 12:55 pm

Hi Simon,

I just thought of one proviso on "OCR is done automatically": when you set up the document type, there is an option for OCR.

Go to System/Setup/Document Types.
You will then see a "Setup OCR" button to the right of the document type. If you click it, there is a checkbox:
"Automatically queue newly created documents for OCR."

This means that whenever you import documents of that document type, they are automatically queued for OCR. I think the default document type has this option checked.
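To make the effect of that checkbox concrete, here is a toy model of the behaviour Kevin describes. It is not Mayan's code: `DocumentType`, `auto_ocr`, `Repository` and `submit_for_ocr` are hypothetical names. The point is that the checkbox only decides whether an upload is placed in the OCR queue; the OCR itself happens later, which fits the "few minutes" delay mentioned earlier.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DocumentType:
    label: str
    # Hypothetical stand-in for "Automatically queue newly created documents for OCR."
    auto_ocr: bool = True


@dataclass
class Repository:
    ocr_queue: deque = field(default_factory=deque)
    documents: list = field(default_factory=list)

    def upload(self, filename: str, document_type: DocumentType) -> None:
        """Store the document and queue it for OCR only if its type says so."""
        self.documents.append((filename, document_type.label))
        if document_type.auto_ocr:
            # Queued, not processed: a background worker would pick it up later.
            self.ocr_queue.append(filename)

    def submit_for_ocr(self, filename: str) -> None:
        """Manual path: roughly what clicking 'OCR' on a document does."""
        self.ocr_queue.append(filename)
```

With a "Purchase order" type that has the box ticked, every uploaded order ends up in the queue without any clicks; documents of a type with the box cleared only get OCR when submitted by hand.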

On to your next question: the difference between searching for pages and searching for documents.

If you search under "Page Search", the results will be the actual pages that contain the search term; this could be several pages within one document. Each result you open will have the search term on that page.

If you search under "Document Search", the results will be the documents that contain the search term. If several pages of a document contain the term, the result will be just the document rather than each individual page.
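The difference can be sketched in a few lines: Page Search returns one hit per matching page, and Document Search collapses those hits to one per document. This is an illustration of the idea only; the case-insensitive substring match is an assumption, and Mayan's actual matching rules may differ. A page whose text is empty (an image-only page that has not been OCRed) can never match.

```python
from dataclasses import dataclass


@dataclass
class Page:
    document: str
    number: int
    ocr_text: str = ""  # stays empty for an image-only page until OCR has run


def page_search(pages: list[Page], term: str) -> list[tuple[str, int]]:
    """One hit per page whose text contains the term (like Page Search)."""
    needle = term.lower()
    return [(p.document, p.number) for p in pages if needle in p.ocr_text.lower()]


def document_search(pages: list[Page], term: str) -> list[str]:
    """One hit per document with at least one matching page (like Document Search)."""
    hits: list[str] = []
    for document, _page_number in page_search(pages, term):
        if document not in hits:  # collapse several matching pages into one result
            hits.append(document)
    return hits
```

For a three-page document whose pages read "Chapter One", "Chapter Two" and "Chapter Three", `page_search(pages, "Chapter")` gives three hits while `document_search(pages, "Chapter")` gives one.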

Hope that my explanation makes sense.


Kevin

SimonSnow
Posts: 5
Joined: Wed Mar 27, 2019 5:09 am

Re: Newbie Questions

Post by SimonSnow » Fri Mar 29, 2019 7:54 pm

Kevin, thank you for your reply.

Based on your reply about search, here is my understanding. Let's say I have only one PDF document in the system, and it has 3 pages: page 1 has only a picture with the words 'Chapter One', page 2 has only a picture with the words 'Chapter Two', and page 3 has only a picture with the words 'Chapter Three'.

Scenario 1
Doing Page Search on 'Chapter', the result is 3 'icons' that represents three individual pages.
Doing Document Search on 'Chapter', the result is 1 'icon' that represents the document.

Scenario 2
Doing Page Search on 'Chapter Two', the result is 1 'icon' that represents a page.
Doing Document Search on 'Chapter Two', the result is 1 'icon' that represents the document.

However, if I did NOT run OCR on the PDF document after adding it to the system, all of these searches would return empty results, because the document contains only pictures.

KevinPawsey
Posts: 85
Joined: Wed Aug 22, 2018 2:52 pm

Re: Newbie Questions

Post by KevinPawsey » Sat Mar 30, 2019 4:32 am

Hi Simon,

yes, I believe you've got it; that is what would happen :)


Kevin

SimonSnow
Posts: 5
Joined: Wed Mar 27, 2019 5:09 am

Re: Newbie Questions

Post by SimonSnow » Mon Apr 01, 2019 10:48 pm

Thank you for your confirmation.

We accept that OCR does NOT provide 100% accuracy on scanned documents. Is it possible to fix the OCR copy of the document myself, to get better results on subsequent searches? Using the previous example, OCR on the document produced 'Chopter' instead of 'Chapter'.

Thanks

HarryE
Posts: 5
Joined: Wed Apr 03, 2019 12:01 pm

Re: Newbie Questions

Post by HarryE » Wed Apr 03, 2019 1:02 pm

In this case, the database is your friend.
Just access the public.ocr_documentpageocrcontent table, identify your document/page, and modify the OCR text using the result of another OCR engine or method.

HTH

SimonSnow
Posts: 5
Joined: Wed Mar 27, 2019 5:09 am

Re: Newbie Questions

Post by SimonSnow » Thu Apr 04, 2019 2:34 pm

Thank you. I will check it out.
