Have you ever wondered how you could check if a PDF document contains a specific string? Wonder no more!
Let's dive into it directly.
The use case is as follows: enter a keyword, select a file, and check whether the file contains the keyword. The response should be "true" if it does.
This is a very simplified version that I built as a proof of concept (POC) for a colleague. It can later be extended to work with several files or several keywords.
So, what do we need to do?
We will use Power Automate and an AI Builder action.
For this blog post, we will keep things simple and use the manual trigger as our starting point.
The complete flow looks like this:
It looks easy, but bear with me. There are some important things to take care of.
We need the AI Builder action "Recognize text in an image or a PDF document". This will allow us to check the content of the document.
The output of the AI Builder action is the following:
In this case we need only the output "results", which is an object containing an array of all "lines". As you might guess, "lines" holds every line of text the PDF document contains.
We could check every line, which is a string, for the keyword, but this would be tedious and time-consuming, especially in Power Automate, where it means looping over each line.
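For comparison, here is a rough sketch of what the per-line check amounts to, written in Python rather than as a flow. The sample data is made up, and the field names "lines" and "text" are assumptions based on the description above; the real AI Builder output carries more fields.

```python
# Hypothetical sample of the "results" output; the real AI Builder payload
# has more fields, and the names "lines" and "text" are assumed here.
sample_results = {
    "lines": [
        {"text": "Invoice number 4711"},
        {"text": "Total amount due: 120 EUR"},
    ]
}


def any_line_contains(results: dict, keyword: str) -> bool:
    """Check each line's text in turn and stop at the first match."""
    for line in results.get("lines", []):
        if keyword in line.get("text", ""):
            return True
    return False


print(any_line_contains(sample_results, "amount"))   # True
print(any_line_contains(sample_results, "Contoso"))  # False
```

In a flow, that loop would be an "Apply to each" over the lines with a condition inside, which is the extra work we want to avoid.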
Therefore, we will use "results" as a whole. Now you might wonder how I am going to compare an object with a string and check whether it contains our keyword. Well, there is a trick:
contains(string(outputs('Recognize_text_in_an_image_or_a_PDF_document')?['body/responsev2/predictionOutput/results']),triggerBody()['text'])
You need to convert the object to a string, et voilà: you can check whether the PDF document contains the keyword.
As you can see, this is a very simple flow that lets you check PDF documents for keywords quickly and easily.
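To see why the trick works and where its limits are, here is a minimal Python sketch of the same idea. It assumes that string() on an object yields its JSON text, with json.dumps standing in for it; the sample data and the function name document_contains are made up for illustration.

```python
import json

# Hypothetical sample of the "results" output; field names are assumed.
sample_results = {
    "lines": [
        {"text": "Invoice number 4711"},
        {"text": "Total amount due: 120 EUR"},
    ]
}


def document_contains(results: dict, keyword: str, ignore_case: bool = False) -> bool:
    """Serialize the whole object once, then do a single substring check."""
    # ensure_ascii=False keeps characters such as "é" unescaped so they can match.
    haystack = json.dumps(results, ensure_ascii=False)
    if ignore_case:
        haystack, keyword = haystack.lower(), keyword.lower()
    return keyword in haystack


print(document_contains(sample_results, "Invoice"))                    # True
print(document_contains(sample_results, "invoice"))                    # False
print(document_contains(sample_results, "invoice", ignore_case=True))  # True
print(document_contains(sample_results, "lines"))                      # True: matches a property name
```

Two things are worth noting. The check is case-sensitive, just like contains() in Power Automate; wrapping both arguments in toLower() should give a case-insensitive variant. And because the whole object is turned into text, a keyword that happens to equal a property name (such as "lines" here) matches even if the word never appears in the document, and a keyword that is split across two lines will not be found.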
Let me know how you extended this solution :)