The Indxr application is designed to perform OCR on PDF documents stored within a SharePoint Online document library.
Upon launching the application, the user will be presented with the Application Settings page.
The following information is required:
- Api Key - this will be provided to you when you purchase an Indxr licence
- Log File Location - the file system location to store the Indxr log file
- Execution Staging Directory - the file system location where documents can be stored whilst being processed
- Language - the default OCR language. This should match the language used in the files.
- Authentication Refresh Time (Minutes) - The time in minutes before the user authentication context is refreshed.
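The required settings can be pictured as a small record with a completeness check. The following is a minimal sketch only: the class name, field names, and the `refresh_due` helper are hypothetical and are not taken from Indxr itself. It also assumes that the refresh time is measured from the last successful authentication.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ApplicationSettings:
    # Hypothetical field names mirroring the settings listed above.
    api_key: str
    log_file_location: str
    execution_staging_directory: str
    language: str
    auth_refresh_minutes: int

    def missing_fields(self):
        """Return the names of required settings that are empty or invalid."""
        text_fields = (
            "api_key",
            "log_file_location",
            "execution_staging_directory",
            "language",
        )
        problems = [name for name in text_fields if not getattr(self, name).strip()]
        if self.auth_refresh_minutes <= 0:
            problems.append("auth_refresh_minutes")
        return problems


def refresh_due(last_authenticated, now, settings):
    """Assumption: the context is refreshed once the configured minutes have elapsed."""
    return now - last_authenticated >= timedelta(minutes=settings.auth_refresh_minutes)
```

For example, a settings record with a blank Api Key would report `api_key` as missing, and a 30 minute refresh time would be due once 30 minutes have passed since the last authentication.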
You will next be asked to enter the URL of the SharePoint Online site where the documents requiring OCR are stored and to select an authentication method.
The available authentication methods are:
- Email and Password - this is the recommended approach. If MFA is enforced on your account, this approach will not work with your standard password; instead, you will need to create an application password. Instructions on how to do this, as per the relevant configuration for your organisation, can be found in this Microsoft support article
- Browser - this method will use the authentication session currently in use by your browser. If there is no active session, clicking Next will prompt the user to enter their SharePoint Online credentials and, if necessary, provide multi-factor authentication.
Subject to successful authentication, the user can then configure SharePoint and Performance settings.
SharePoint settings include the Source Library, Source Folder and Target document library for the OCR operation, along with the following configuration items.
- Copy metadata will transfer metadata from the source document to the target document. This will only succeed if the necessary fields are configured on the target library
- Copy permission will transfer the permissions from the source document to the target document
- Force OCR will ensure the application carries out OCR on all PDF files regardless of whether they already have a text layer or not
- Overwrite Source File will be available if the source and target libraries are the same
- New File Prefix will be available if the source and target libraries are the same but the Overwrite Source File setting is false. This is to ensure a unique filename is created
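The interaction between Force OCR, Overwrite Source File and New File Prefix can be sketched as follows. This is an illustrative sketch rather than Indxr's actual implementation: the class, function names and the default prefix `OCR_` are hypothetical, and keeping the original filename when the libraries differ is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SharePointSettings:
    # Hypothetical field names mirroring the settings described above.
    source_library: str
    target_library: str
    overwrite_source_file: bool = False
    new_file_prefix: str = "OCR_"  # hypothetical default value
    force_ocr: bool = False


def target_filename(settings, source_name):
    """Return the name the processed file would be saved under."""
    same_library = settings.source_library == settings.target_library
    if not same_library:
        # Assumption: with different libraries the original name is kept.
        return source_name
    if settings.overwrite_source_file:
        # Same library and overwrite enabled: the source file is replaced.
        return source_name
    # Same library without overwrite: the prefix keeps the filename unique.
    return settings.new_file_prefix + source_name


def needs_ocr(settings, has_text_layer):
    """Force OCR processes every PDF; otherwise only files without a text layer."""
    return settings.force_ocr or not has_text_layer
```

With the same source and target library and Overwrite Source File disabled, `report.pdf` would be written as `OCR_report.pdf` under these assumptions.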
Performance settings allow the user to configure the following items.
- Document Threads controls how many documents can be processed concurrently*
- Page Threads controls how many pages can be processed concurrently*
- Performance can be set to either fast or quality. Quality will provide the best results but will increase processing times
- Image Clean up will perform deskew, despeckle, rotation, and brightness and contrast adjustments before OCR is performed
* When adjusting these values, consideration needs to be given to the hardware of the machine hosting the tool. If processing power and memory on the host machine are not sufficient, the tool will not be able to execute at the configured performance levels.
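To illustrate why the two thread settings need to be considered together, the sketch below models Document Threads and Page Threads as nested thread pools. It is a simplified model under stated assumptions, not a description of how Indxr schedules work internally: `ocr_page` is a placeholder for the real OCR step, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def ocr_page(page: str) -> str:
    # Placeholder for the real OCR work carried out on a single page.
    return page.upper()


def process_document(pages: List[str], page_threads: int) -> List[str]:
    # Up to page_threads pages of this document are processed at once.
    with ThreadPoolExecutor(max_workers=page_threads) as pool:
        return list(pool.map(ocr_page, pages))


def run_job(
    documents: Dict[str, List[str]], document_threads: int, page_threads: int
) -> Dict[str, List[str]]:
    # Up to document_threads documents are processed at once.
    with ThreadPoolExecutor(max_workers=document_threads) as pool:
        futures = {
            name: pool.submit(process_document, pages, page_threads)
            for name, pages in documents.items()
        }
        return {name: future.result() for name, future in futures.items()}


def peak_page_workers(document_threads: int, page_threads: int) -> int:
    # In this model each document has its own page pool, so the worst case
    # number of simultaneously running page workers is the product.
    return document_threads * page_threads
```

Under this model, 4 document threads with 8 page threads could keep up to 32 page workers busy at once, which is why the host machine's processor and memory should be taken into account before raising either value.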
Clicking "Execute" will start the OCR job.
A summary is provided of the settings associated with the job. The grid will be dynamically updated as each document is processed. Rows in the grid will be colour coded based on the result of the action:
- Green - processed successfully
- White - in progress
- Red - OCR operation failed for the file
- Grey - OCR not required for the file / operation has been cancelled
The user has the option to export the grid to a CSV file and to open the log file to examine any processing errors.
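The colour coding and the CSV export can be pictured with a short sketch. The status names, the column headings and the `export_grid` function are hypothetical; the actual columns in the file exported by Indxr may differ.

```python
import csv
import io
from enum import Enum


class RowStatus(Enum):
    # Colours as described in the list above; the status names are hypothetical.
    SUCCEEDED = "Green"
    IN_PROGRESS = "White"
    FAILED = "Red"
    SKIPPED_OR_CANCELLED = "Grey"


def export_grid(rows, stream):
    """Write (file name, status) pairs to an open text stream as CSV."""
    writer = csv.writer(stream)
    writer.writerow(["File", "Status", "Colour"])  # hypothetical headings
    for file_name, status in rows:
        writer.writerow([file_name, status.name, status.value])


# Usage: export two rows to an in-memory buffer.
buffer = io.StringIO()
export_grid(
    [("invoice.pdf", RowStatus.SUCCEEDED), ("scan.pdf", RowStatus.FAILED)],
    buffer,
)
```

Rows marked Red in such an export are the ones worth cross-checking against the log file.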
The entire operation can be cancelled at any point by the user clicking the "Cancel" button. This will complete the processing of the current document before stopping the execution job.
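The cancel behaviour described above, where the current document is completed before the job stops, is a form of cooperative cancellation. A minimal sketch, assuming documents are handled one at a time and using hypothetical names throughout, could look like this:

```python
import threading


def run_until_cancelled(documents, cancel_requested, process):
    """Process documents in order, checking for cancellation between documents."""
    results = {}
    for name in documents:
        if cancel_requested.is_set():
            # Not started: these rows would appear grey in the grid.
            results[name] = "cancelled"
            continue
        # Once started, a document always runs to completion.
        process(name)
        results[name] = "processed"
    return results


# Usage: the cancel request arrives while the second document is in progress.
cancel = threading.Event()


def fake_ocr(name):
    if name == "b.pdf":
        cancel.set()  # simulates the user clicking "Cancel"


outcome = run_until_cancelled(["a.pdf", "b.pdf", "c.pdf"], cancel, fake_ocr)
```

Because the flag is only checked between documents, `b.pdf` still finishes processing and only `c.pdf` is marked as cancelled.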
Multiple instances of Indxr can be executed simultaneously targeting different document libraries. However, it is recommended that the performance of the host machine is monitored closely.
17/09/2021 - 22.214.171.124