Automated extraction of metadata from PDF documents with Power Automate
Automated extraction of metadata from PDF documents helps companies process incoming information efficiently. With tools such as Microsoft Power Automate, business processes can be optimised by automating recurring tasks.
Background and workflow
Companies regularly receive PDF documents from partners, customers or service providers that contain important business data. Managing these documents manually was common in the past, but the current trend is towards automation. Despite automation, a logical check of the received data should still be part of the process.
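What such a logical check might look like depends on the document type. The following is a minimal sketch for invoice-like metadata. The field names `invoice_number`, `invoice_date` and `total`, as well as the date format, are assumptions chosen for illustration and do not come from any of the providers discussed here.

```python
from datetime import date, datetime
from typing import List, Optional


def check_invoice_metadata(meta: dict, today: Optional[date] = None) -> List[str]:
    """Return a list of plausibility problems; an empty list means the record passed.

    The keys used here are hypothetical and must be adapted to the
    fields your extraction service actually returns.
    """
    problems = []
    today = today or date.today()

    # The invoice number must be present and not just whitespace.
    if not str(meta.get("invoice_number", "")).strip():
        problems.append("invoice_number is missing")

    # The date must parse as YYYY-MM-DD (assumed format) and must not lie in the future.
    try:
        invoice_date = datetime.strptime(str(meta.get("invoice_date", "")), "%Y-%m-%d").date()
        if invoice_date > today:
            problems.append("invoice_date lies in the future")
    except ValueError:
        problems.append("invoice_date is missing or not in YYYY-MM-DD format")

    # The total must be a positive number.
    try:
        if float(meta.get("total", "")) <= 0:
            problems.append("total must be positive")
    except (TypeError, ValueError):
        problems.append("total is missing or not a number")

    return problems
```

Records with a non-empty problem list could be routed to a person for review instead of being processed automatically.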
Overview of providers
Various third-party providers help with automated text extraction from documents. The extracted results can be processed further within the Office 365 environment.
| Provider | Headquarters | Features | Free quota per month |
|---|---|---|---|
| Encodian | U.K. | Text extraction from PDF documents | 50 documents |
| Docparser | U.S. | Text extraction from PDF documents | 30 documents |
| Parserr.com | U.S. | Text extraction from emails and their attachments | 10 documents |
| Aquaforest | U.K. | Text extraction from PDFs | 100 documents |
With these providers, documents can be submitted as an email attachment or uploaded as a file. They are then analysed using OCR technology, and the extracted content is returned as metadata.
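To judge which free tier would cover a given monthly volume, the figures from the table can be compared directly. The short sketch below does only that. The quotas are copied from the table above and may have changed since, and the function name is made up for this example.

```python
from typing import List

# Free documents per month, copied from the table above (may be outdated).
FREE_DOCUMENTS_PER_MONTH = {
    "Encodian": 50,
    "Docparser": 30,
    "Parserr.com": 10,
    "Aquaforest": 100,
}


def providers_within_free_quota(expected_documents: int) -> List[str]:
    """Return the providers, sorted by name, whose free quota covers the expected volume."""
    return sorted(
        name
        for name, quota in FREE_DOCUMENTS_PER_MONTH.items()
        if quota >= expected_documents
    )
```

For an expected volume of 40 documents per month, for example, only Aquaforest and Encodian remain.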
Integration with Microsoft Power Automate
Automation is achieved through integrations with platforms such as Zapier.com, IFTTT.com and Power Automate. The focus here is on integration with Power Automate for seamless workflow creation.
| Provider | Integration with Power Automate | Templates for flows |
|---|---|---|
| Encodian | Connector available | No |
| Docparser | Connector available | Yes |
| Parserr.com | Connector available | Yes |
| Aquaforest | Connector available | Yes |
With Parserr in particular, you can set up endpoints that receive emails, so that emails with attachments can be fed directly into the workflow.
Microsoft solutions
Microsoft has introduced the paid product "SharePoint Syntex", which is based on the "Cortex" project. This tool uses AI for metadata extraction. After extraction, the information is stored in metadata columns of the document libraries. However, it cannot currently be used directly from Power Automate.
Alternatively, Microsoft offers Form Recognizer, which is part of Azure Cognitive Services. This service is available in European regions, e.g. France or the U.K., and offers a free quota of 500 pages per month.
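To give an impression of how Form Recognizer is addressed outside of a connector, the sketch below builds (but does not send) an analyze request for its REST API. It assumes the v3.0 API with version string `2022-08-31` and the prebuilt model `prebuilt-document`; both should be checked against the current Microsoft documentation. The endpoint and key come from your own Azure resource, and the function name is chosen for this example only.

```python
import urllib.request

# Assumption: v3.0 REST API version string; verify against the current documentation.
API_VERSION = "2022-08-31"


def build_analyze_request(
    endpoint: str,
    api_key: str,
    pdf_bytes: bytes,
    model_id: str = "prebuilt-document",  # assumption: general-purpose prebuilt model
) -> urllib.request.Request:
    """Build the POST request that submits a PDF for analysis, without sending it."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        f"{model_id}:analyze?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": api_key,  # key of the Azure resource
            "Content-Type": "application/pdf",
        },
    )
```

The service processes the document asynchronously: a successful submission is answered with status 202 and an `Operation-Location` header, and the result is fetched by polling that URL with the same key. In a Power Automate flow, the equivalent call would typically be made through a connector or an HTTP action.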



