In 'Analysis Only' configuration mode, the user can add one or more SharePoint sites to the configuration for analysis.
The managed path drop-down can be used to select from the predefined SharePoint Online managed paths. 'None' is provided as an option so that the root site of the tenant can be targeted.
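For reference, the managed path determines where a site sits in the tenant URL. The sketch below shows how a SharePoint Online site address is typically composed. It is not part of Indxr: the helper name `build_site_url`, the tenant `contoso` and the site name `Finance` are hypothetical, and it assumes the standard `sites` and `teams` managed paths.

```python
from typing import Optional

# Standard SharePoint Online managed paths (assumed here).
# Passing None targets the root site of the tenant.
MANAGED_PATHS = ("sites", "teams")


def build_site_url(tenant: str, managed_path: Optional[str], site_name: str = "") -> str:
    """Compose a SharePoint Online site URL (illustrative helper only)."""
    root = f"https://{tenant}.sharepoint.com"
    if managed_path is None:
        # 'None' in the drop-down: the root site has no managed path segment.
        return root
    if managed_path not in MANAGED_PATHS:
        raise ValueError(f"unknown managed path: {managed_path}")
    if not site_name:
        raise ValueError("a site name is required when a managed path is selected")
    return f"{root}/{managed_path}/{site_name}"


print(build_site_url("contoso", "sites", "Finance"))
print(build_site_url("contoso", None))
```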
When a site is added, a settings window is presented in which the user can select 'All Libraries' or choose specific libraries to analyse.
Note: As this is an 'Analysis Only' configuration, some of the settings pertaining to OCR are read-only.
Once added, the site will appear in the list. The pencil icon can be used to edit the settings for the site, and the bin icon removes the site from the analysis.
'Save Configuration' can be used to store the configuration in a separate .indxr file that can be reloaded at a later date.
Once all the required sites have been added, click 'Analyse' to run the analysis job.
The progress of the job is displayed, and the results are shown when the job has completed.
When complete, the results can be exported to a .csv file for further analysis or distribution.
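As an example of further analysis, the exported file can be summarised with a few lines of Python. This is a minimal sketch: the column layout of the export is not described here, so the column name is passed in by the caller, and the function name `count_by_column` is hypothetical.

```python
import csv
from collections import Counter


def count_by_column(csv_path: str, column: str) -> Counter:
    """Count how many result rows share each value in the given column."""
    # utf-8-sig tolerates a byte order mark if the export contains one.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {csv_path}")
        return Counter(row[column] for row in reader)
```

For instance, `count_by_column("results.csv", "Library")` would return the number of rows per library, assuming the export is saved as `results.csv` and contains a column with that name; check the header row of the actual export first.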
The configuration settings allow the user to set the following items:
- Document Threads controls how many documents can be processed concurrently.*
- Page Threads controls how many pages can be processed concurrently.*
- Performance can be set to either fast or quality. Quality provides the best results but increases processing times.
* When adjusting these values, consider the hardware of the machine hosting the tool. If the host machine does not have sufficient processing power and memory, the tool will not be able to run at the configured performance levels. Setting both values to the maximum available will not necessarily improve performance, as it may cause more locking when multiple threads attempt to access shared resources.
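One rough way to pick starting values is to keep the combined thread count within the number of logical cores on the host. The sketch below is a rule of thumb only, not how the tool sizes its thread pools: it assumes the worst case in which every document thread runs its full set of page threads at once, and the function name `suggest_thread_settings` is hypothetical.

```python
import os
from typing import Optional, Tuple


def suggest_thread_settings(
    requested_document_threads: int,
    requested_page_threads: int,
    cpu_count: Optional[int] = None,
) -> Tuple[int, int]:
    """Cap the requested values so that document * page threads <= logical cores."""
    cores = cpu_count or os.cpu_count() or 1
    # Never exceed the core count with document threads alone.
    document_threads = max(1, min(requested_document_threads, cores))
    # Give each document thread an equal share of the remaining capacity.
    page_threads = max(1, min(requested_page_threads, cores // document_threads))
    return document_threads, page_threads


print(suggest_thread_settings(4, 4, cpu_count=8))
```

Treat the result as a starting point and adjust it after observing CPU and memory use on the host during a trial run.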
Handling Large Libraries
To improve performance and allow for concurrent processing, we recommend that, where possible, large document libraries are split into smaller libraries with item counts below the SharePoint list view threshold of 5,000 items. Indxr will work across larger repositories, but processing times may be affected.
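When planning a split, the minimum number of libraries follows directly from the item count. The sketch below is simple arithmetic under the stated recommendation (each library stays below 5,000 items, so at most 4,999); the function name `libraries_needed` is hypothetical.

```python
LIST_VIEW_THRESHOLD = 5000  # SharePoint list view threshold


def libraries_needed(item_count: int, max_items: int = LIST_VIEW_THRESHOLD - 1) -> int:
    """Return the minimum number of libraries so each stays at or below max_items."""
    if item_count < 0:
        raise ValueError("item_count must not be negative")
    # Integer ceiling division; an empty library still counts as one library.
    return max(1, (item_count + max_items - 1) // max_items)


print(libraries_needed(23000))
```

A library of 23,000 items would therefore need at least five smaller libraries to stay under the threshold.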
OCR is a complex and resource-intensive operation. Processing of large repositories can easily take days or weeks. Indxr is designed so that multiple instances can be executed simultaneously with no additional licensing overhead.
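One way to take advantage of multiple instances is to divide the list of sites between them, so that each instance is given its own configuration. The sketch below shows a simple round-robin split; how sites are best assigned is an assumption here, and the function name `split_sites` and the example URLs are hypothetical.

```python
from typing import List, Sequence


def split_sites(site_urls: Sequence[str], instance_count: int) -> List[List[str]]:
    """Distribute sites round-robin across the given number of instances."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    # Instance i receives every instance_count-th site, starting at index i.
    return [list(site_urls[i::instance_count]) for i in range(instance_count)]


sites = [
    "https://contoso.sharepoint.com/sites/Finance",
    "https://contoso.sharepoint.com/sites/HR",
    "https://contoso.sharepoint.com/sites/Legal",
]
for number, group in enumerate(split_sites(sites, 2), start=1):
    print(f"Instance {number}: {group}")
```

Round-robin keeps the number of sites per instance even but ignores site size; if some sites are much larger than others, assigning them by hand may balance the work better.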