Automated extraction of metadata from PDF documents with Power Automate

Automated extraction of metadata from PDF documents helps companies process incoming information efficiently. With tools such as Microsoft Power Automate, recurring tasks can be digitised and business processes optimised.

Background and workflow

Companies regularly receive PDF documents from partners, customers or service providers that contain important business data. These documents used to be managed manually, but the trend is now towards automation. Even with automation, a plausibility check of the received data should remain part of the process.
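The check of the received data mentioned above could look roughly like the following minimal sketch. It assumes the extraction step returns a flat dictionary. The field names `invoice_number`, `invoice_date` and `total_amount`, and the function `check_metadata`, are hypothetical and only serve as an illustration, since the actual fields depend on the documents being processed.

```python
from datetime import date, datetime
from typing import List, Optional


def check_metadata(metadata: dict, today: Optional[date] = None) -> List[str]:
    """Return a list of plausibility problems; an empty list means the record passed."""
    today = today or date.today()
    problems = []

    # Required fields (hypothetical names) must be present and non-empty.
    for field in ("invoice_number", "invoice_date", "total_amount"):
        value = metadata.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing field: {field}")

    # The date must parse as YYYY-MM-DD and must not lie in the future.
    raw_date = metadata.get("invoice_date")
    if raw_date:
        try:
            parsed = datetime.strptime(raw_date, "%Y-%m-%d").date()
            if parsed > today:
                problems.append("invoice_date is in the future")
        except (TypeError, ValueError):
            problems.append("invoice_date is not a valid YYYY-MM-DD date")

    # The amount must be a positive number.
    raw_amount = metadata.get("total_amount")
    if raw_amount not in (None, ""):
        try:
            if float(raw_amount) <= 0:
                problems.append("total_amount is not positive")
        except (TypeError, ValueError):
            problems.append("total_amount is not numeric")

    return problems
```

A record that produces a non-empty list would be routed to a person for review instead of being processed automatically.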

Overview of providers

Various third-party providers offer automated text extraction from documents. The extracted results can then be processed further within the Office 365 environment.

Provider     | Headquarters | Features                                          | Free monthly quota
-------------|--------------|---------------------------------------------------|-------------------
Encodian     | U.K.         | Text extraction from PDF documents                | 50 documents
Docparser    | U.S.         | Text extraction from PDF documents                | 30 documents
Parserr.com  | U.S.         | Text extraction from emails and their attachments | 10 documents
Aquaforest   | U.K.         | Text extraction from PDFs                         | 100 documents

The providers accept documents as email attachments or as files, analyse them using OCR technology and return the extracted content as metadata.
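To see which free tiers would cover a given monthly volume, the quotas from the overview can be compared directly. The following is a minimal sketch that uses only the figures listed above. The function name `providers_within_free_quota` is made up for this illustration, and the providers may change their quotas over time.

```python
# Free monthly quotas (in documents) as listed in the provider overview.
FREE_QUOTA = {
    "Encodian": 50,
    "Docparser": 30,
    "Parserr.com": 10,
    "Aquaforest": 100,
}


def providers_within_free_quota(documents_per_month: int) -> list:
    """Return providers whose free quota covers the volume, largest quota first."""
    if documents_per_month < 0:
        raise ValueError("documents_per_month must not be negative")
    fitting = [
        name for name, quota in FREE_QUOTA.items()
        if quota >= documents_per_month
    ]
    return sorted(fitting, key=lambda name: FREE_QUOTA[name], reverse=True)


# Example: 40 documents per month fit the free tiers of Aquaforest and Encodian.
print(providers_within_free_quota(40))
```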

Integration with Microsoft Power Automate

Automation is achieved through integration with platforms such as Zapier.com, IFTTT.com and Power Automate. The focus here is on integration with Power Automate for seamless workflow creation.

Provider    | Integration with Power Automate | Templates for flows
------------|---------------------------------|--------------------
Encodian    | Connector available             | No
Docparser   | Connector available             | Yes
Parserr     | Connector available             | Yes
Aquaforest  | Connector available             | Yes

With Parserr in particular, you can set up endpoints for receiving emails, so that emails with attachments can be fed directly into the workflow.

Microsoft solutions

Microsoft has introduced the paid product "SharePoint Syntex", which is based on the "Cortex" project. This tool uses AI for metadata extraction. After extraction, the information is stored in metadata columns of the document libraries. However, it does not currently offer direct integration with Power Automate.

Alternatively, Microsoft offers the Form Recognizer from Azure Cognitive Services. This service is available in Europe, e.g. in France or the U.K., and offers a free quota of 500 pages per month.