Extract Images from PDF

Overview

The 'Extract Images from PDF' Power Automate action enables images to be extracted from specified regions of a PDF document and returns an array of the extracted image files.

Although this action only extracts image regions from PDF documents, you can first convert other files to PDF using the 'Convert to PDF' flow action. This enables image regions to be extracted from 70+ different file types.

Please refer to the Supported File Types article for a complete list of the different file formats / document types which are supported for PDF conversion.

Parameters

The default 'Extract Images from PDF' flow action parameters are detailed below:

  • Filename: The PDF filename (including file extension)
  • File Content: A Base64 encoded representation of the PDF file to be processed.
  • Image Regions: An array of Image Regions (See below for further details)


Please refer to the Obtaining the 'File Contents' Parameter article for guidance on how to obtain the 'File Content' parameter ready to provide to an Encodian flow action. 

Extract Images from PDF Utility

The 'Extract Images from PDF Utility' helps you automatically generate the JSON data used by the 'Extract Images from PDF' Power Automate action. Upload a PDF document, browse its pages, and select the required regions; the utility then generates the JSON, which can be passed to the Encodian action.


Image Region Detail

An image region is specified as a rectangle defined by four coordinates: the X and Y coordinates of its lower left corner and the X and Y coordinates of its upper right corner.

The origin (0,0) of the coordinate system is the bottom left-hand corner of the page. Coordinates are specified in points (1 point = 1/72 inch); a typical A4 page is 595 x 842 points.
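
Many tools measure a rectangle from the top left corner of the page, or in millimetres, so the values usually need converting before they can be used as an image region. The following is a minimal Python sketch of that arithmetic only; the function names are illustrative and are not part of the Encodian action.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_to_points(mm: float) -> float:
    """Convert millimetres to points (1 point = 1/72 inch)."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


def top_left_box_to_region(left, top, width, height, page_height):
    """Convert a box measured from the top left corner of the page
    (all values in points) into lower left / upper right coordinates
    measured from the bottom left corner of the page."""
    lower_left_x = left
    lower_left_y = page_height - (top + height)
    upper_right_x = left + width
    upper_right_y = page_height - top
    return (lower_left_x, lower_left_y, upper_right_x, upper_right_y)


# An A4 page is 210 mm x 297 mm, which rounds to 595 x 842 points.
a4_width = round(mm_to_points(210))
a4_height = round(mm_to_points(297))

# A 200 x 150 point box placed 50 points in and 100 points down from the top left.
region = top_left_box_to_region(50, 100, 200, 150, a4_height)
print(a4_width, a4_height, region)
```

The only step that changes between the two conventions is the Y axis, which is flipped by subtracting from the page height.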


  • Image Region - Multiple image regions can be selected in one operation.  To create more than one region click the "Add new item" button:
    • Image Regions Name: A name for the region that will be used as the filename for the extracted image.  If duplicate names are provided, a number will be appended to the end of the name to guarantee uniqueness.
    • Image Regions Lower Left X Coordinate: Number of points across from the left hand edge of the page to the lower left corner of the rectangle
    • Image Regions Lower Left Y Coordinate: Number of points up from the bottom edge of the page to the lower left corner of the rectangle
    • Image Regions Upper Right X Coordinate: Number of points across from the left hand edge of the page to the upper right corner of the rectangle
    • Image Regions Upper Right Y Coordinate: Number of points up from the bottom edge of the page to the upper right corner of the rectangle.
    • Image Regions Page Number: The page number of the PDF from which to extract the image.
    • Image Regions Image Type: The required file type of the extracted image, choices are TIFF, PNG, JPG or BMP.
    • Image Regions Extract Entire Page:  If set to Yes any specified coordinates will be ignored and the entire page will be extracted as an image.
    • Image Regions Resolutions: The DPI of the extracted image.
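
The region settings above can be sanity-checked before they are sent to the action. The following Python sketch is illustrative only: the `ImageRegion` class and its field names are hypothetical and do not reflect the JSON schema the action expects. It also assumes that page numbers start at 1 and that the extracted image measures roughly (points / 72) x DPI pixels, neither of which is stated in this article.

```python
from dataclasses import dataclass
from typing import List, Tuple

ALLOWED_IMAGE_TYPES = {"TIFF", "PNG", "JPG", "BMP"}


@dataclass
class ImageRegion:
    # Hypothetical field names, chosen to mirror the parameters listed above.
    name: str
    lower_left_x: float
    lower_left_y: float
    upper_right_x: float
    upper_right_y: float
    page_number: int
    image_type: str = "PNG"
    extract_entire_page: bool = False
    resolution: int = 150  # arbitrary default for this sketch

    def problems(self, page_width: float = 595.0, page_height: float = 842.0) -> List[str]:
        """Return a list of human-readable problems (empty if none found)."""
        issues = []
        if self.image_type.upper() not in ALLOWED_IMAGE_TYPES:
            issues.append(f"unsupported image type: {self.image_type}")
        if self.page_number < 1:  # assumption: pages are numbered from 1
            issues.append("page number must be 1 or greater")
        if self.resolution <= 0:
            issues.append("resolution must be positive")
        if not self.extract_entire_page:
            # Coordinates only matter when a sub-region is requested.
            if self.lower_left_x >= self.upper_right_x:
                issues.append("lower left X must be less than upper right X")
            if self.lower_left_y >= self.upper_right_y:
                issues.append("lower left Y must be less than upper right Y")
            if (self.lower_left_x < 0 or self.lower_left_y < 0
                    or self.upper_right_x > page_width
                    or self.upper_right_y > page_height):
                issues.append("region extends beyond the page")
        return issues

    def approx_pixel_size(self) -> Tuple[int, int]:
        """Estimate output size in pixels, assuming points / 72 * DPI."""
        width_pts = self.upper_right_x - self.lower_left_x
        height_pts = self.upper_right_y - self.lower_left_y
        return (round(width_pts / 72 * self.resolution),
                round(height_pts / 72 * self.resolution))


logo = ImageRegion("logo", 50, 592, 250, 742, page_number=1, resolution=300)
print(logo.problems(), logo.approx_pixel_size())
```

The default page size in `problems` is A4 in points; pass the real page dimensions for other page sizes.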

Advanced Parameters

The advanced 'Extract Images from PDF' flow action parameters are detailed below:

mceclip3.png

Return Parameters

The 'Extract Images from PDF' Power Automate action returns the following data. 

Action Specific Return Values

  • Images - The collection of images extracted from the PDF document supplied.
    Each image within the Images collection contains the following values:
    • Filename - The filename of the image
    • FileContent - The Base64 encoded file content of the image

Working with the images returned (example)


The following is an example of the JSON payload that is returned in the response.

{
  "Images": [
    {
      "fileName": "image1.tiff",
      "fileContent": "SUkqAH...."
    },
    {
      "fileName": "image2.png",
      "fileContent": "iVBORw...."
    },
    {
      "fileName": "image3.jpeg",
      "fileContent": "/9j/4AAQ...."
    },
    {
      "fileName": "image4.png",
      "fileContent": "iVBOR...."
    }
  ],
  "HttpStatusCode": 200,
  "HttpStatusMessage": "",
  "OperationId": "**********-****-****-****-************",
  "Errors": [],
  "Operation Status": "Complete"
}

Standard Return Values

  • OperationId - The unique ID assigned to this operation.
  • HttpStatusCode - The HTTP Status code for the response.
  • HttpStatusMessage - The HTTP Status message for the response.
  • Errors - An array of error messages should an error occur.
  • Operation Status - Indicates whether the operation has completed, has been queued or has failed.
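
If the response is handled outside Power Automate, the return values can be checked and the images written to disk in a few lines. The following Python sketch is a hedged illustration: `save_extracted_images` is a hypothetical helper, the key names follow the example payload (`fileName`, `fileContent`), and treating anything other than HTTP 200 with an empty Errors array as a failure is an assumption rather than documented behaviour.

```python
import base64
from pathlib import Path
from typing import List


def save_extracted_images(response: dict, output_dir: str) -> List[Path]:
    """Check the standard return values, then write each image to disk."""
    status = response.get("HttpStatusCode")
    errors = response.get("Errors") or []
    # Assumption: only HTTP 200 with no error messages counts as success.
    if status != 200 or errors:
        raise RuntimeError(f"Extraction failed (HTTP {status}): {errors}")

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for image in response.get("Images", []):
        # Keep only the final path component so a name cannot escape output_dir.
        target = out / Path(image["fileName"]).name
        # The file content is Base64 text; decode it back to raw bytes.
        target.write_bytes(base64.b64decode(image["fileContent"]))
        saved.append(target)
    return saved
```

Calling `save_extracted_images(response, "extracted")` with the parsed response would write one file per entry in the Images collection and return the paths that were written.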

 

 
