OCR extracted text options

eli · September 25, 2024, 9:58pm

I’ve been using the OCR service for a month or more now, mostly with great results. I’m using the service to extract text from images and then match that text to items in my DB. For the most part it works very well, and I’ve been able to come up with workarounds for the few minor issues I had, except for one thing. When text goes to a new line in the middle of a value, the extracted text does too, and I can’t get Bubble to recognize the item since it’s no longer an exact match. An example:

item in data is -
“On One Line”

extracted text reads as -
This would normally work if it was on
one line, but since it split in the middle of the value I want to match, it doesn’t work.

The simplest fix would be a way to make the OCR extract the text as a single line, or even just as plain text that doesn’t go to a new line unless the available space forces it onto one. I’ve tried all the available extraction formats, and each will go to a new line wherever there’s a new line in the scanned text. Is it possible to add a single-line option, or am I maybe missing something or doing it wrong?

Stephan · September 26, 2024, 3:05pm

Hello @eli ,

Thank you for reaching out.

To better understand the concern you’re experiencing, could you please provide more details about your use case? Specifically, could you confirm which plugin you’re referring to, as we offer two OCR plugins:

Additionally, it would be helpful if you could share screenshots or screencasts of your workflow setup, the plugin element, the issue you’re encountering, and the expected outcome. For recording screencasts, I recommend using Loom.

Could you also provide the test image you’re using so we can run tests on our side?

These details will help us replicate your setup and investigate further.

Thank you again, and I look forward to your reply.

Best regards,
Stefan

eli · September 26, 2024, 8:21pm

I’m only able to add one image per post so I’ll split this up.

@Stephan My app is designed to scan ingredient labels and match them to a record in my database. As of now we are using the OCR plugin on the left in your image: OCR- Convert Images & PDF to text. I upload an image like the one I’ve attached below. This image should reproduce the result I’m getting, as it’s the one I’ve used.
tWJ66

eli · September 26, 2024, 8:22pm

eli · September 26, 2024, 8:24pm

Below is a photo of the ingredient in my database
Screenshot 2024-09-26 144348

eli · September 26, 2024, 8:27pm

Here you can see the method I use to match text. I have other similar setups to match when an ingredient has multiple names and one of those other names is on the label instead of the primary name. I also have one set up to catch errors the OCR commonly makes with a certain word or ingredient name. These are fed into various repeating groups, then joined back into one group.

eli · September 26, 2024, 8:31pm

As mentioned, Potassium Sorbate doesn’t appear at all in this list, and I can’t simply add Potassium and Sorbate as alternate names because each of those can also be part of other ingredients, such as Potassium Alum. Anyway, this image is so you can see what I was mentioning with the other ingredient that was split onto a new line.
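
To illustrate the ambiguity described here, a minimal Python sketch (not Bubble, and not the app's actual matching logic) shows why the individual words can't safely serve as alternate names. The `names_containing` helper and the two-item `known_names` list are hypothetical; only the two ingredient names come from this post.

```python
# Hypothetical stand-in for the ingredient names stored in the database.
known_names = ["Potassium Sorbate", "Potassium Alum"]


def names_containing(word: str, names: list[str]) -> list[str]:
    """Return every name that contains `word` as a whole word (case-insensitive)."""
    target = word.lower()
    return [name for name in names if target in name.lower().split()]


# A lone "Potassium" on the label is ambiguous: it points at both records.
potassium_hits = names_containing("Potassium", known_names)
sorbate_hits = names_containing("Sorbate", known_names)
print(potassium_hits)
print(sorbate_hits)
```

With more ingredients in the list, any single word could collide in the same way, which is why the full two-word name has to survive extraction intact.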

Stephan · September 27, 2024, 3:10pm

Hello @eli,

Thank you for your message.

Regarding the OCR plugin, it works by retrieving text from the provided images, which you can then use for your specific use case. Based on your message and our tests, the plugin is functioning correctly and extracting the text from the image you shared.

For your use case, you can try using a Bubble expression like Arbitrary text with the OCR state’s “Output as Text” and then split the result using a comma:

This workaround will allow you to extract a list of ingredients from the image.

Please note that this is a solution for this specific use case. Our Zeroqode support is focused on providing general assistance and fixing plugin-related bugs. Unfortunately, we can’t provide customizations in the app.

Thanks for your understanding, and have a wonderful weekend!

Best regards,
Stefan

eli · September 27, 2024, 10:23pm

I did try this solution, but in my experience the text output will go to a new line wherever the original text did. So if I were to
go to a new line here, the extracted text would as well. When that happens in the middle of a value I need, the app does not recognize it. I’ve used (and retried today) the expression you suggested and several others, but it’s as if there’s a line break or something in the output, so that even with the split by comma operator it still doesn’t read correctly. I understand you’re only able to provide general support, though.
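
A minimal Python sketch of the behavior described above (an analogue, not the Bubble expression itself): splitting on commas and trimming each piece leaves a line break in the middle of a value untouched, so an exact match still fails. The sample `ocr_output` string is a hypothetical stand-in for the plugin's text output.

```python
# Hypothetical OCR output: the label wrapped between "Potassium" and "Sorbate".
ocr_output = "Water, Potassium\nSorbate, Citric Acid"

# Split on commas and trim whitespace from the ends of each piece.
# strip() only touches the ends, so the inner "\n" stays where it is.
items = [piece.strip() for piece in ocr_output.split(",")]

for item in items:
    # repr() makes the hidden line break visible as "\n".
    print(repr(item), item == "Potassium Sorbate")
```

The second piece still contains the line break, so it is not equal to the database value even though it looks the same when displayed on screen.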

Stephan · September 30, 2024, 2:05pm

Hello @eli ,

Thank you for your message, and apologies for the delayed reply over the weekend.

From your message, I understand that you prefer not to see the line breaks in the created list. Is that correct?

To remove them, you can use the Bubble expression “each item: formatted as JSON-safe” and then apply “Find and replace” to make the necessary adjustments to the list.
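
For readers who want to see the effect outside Bubble, here is a minimal Python sketch of the same idea: replace every line break with a space before splitting and matching. It is an analogue under stated assumptions, not the Bubble expression itself; `collapse_line_breaks`, `split_ingredients`, and the sample data are hypothetical.

```python
import re

# Hypothetical stand-in for the ingredient names stored in the database.
known_names = {"Water", "Potassium Sorbate", "Citric Acid"}


def collapse_line_breaks(text: str) -> str:
    """Replace each run of whitespace (spaces, \\n, \\r\\n, tabs) with one space."""
    return re.sub(r"\s+", " ", text).strip()


def split_ingredients(text: str) -> list[str]:
    """Flatten the text to one line, then split it into trimmed, non-empty items."""
    flat = collapse_line_breaks(text)
    return [piece.strip() for piece in flat.split(",") if piece.strip()]


# Hypothetical OCR output with a line break in the middle of a value.
ocr_output = "Water, Potassium\nSorbate,\r\nCitric Acid"

items = split_ingredients(ocr_output)
matched = [item for item in items if item in known_names]
print(items)
print(matched)
```

Collapsing the whitespace before the split means a value that wrapped across lines is rejoined with a single space, so the exact match against the stored name can succeed.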

Additionally, for your use case involving different types of images, you might find it helpful to use a ChatGPT plugin. With it, you can create a prompt that will automatically extract a list of ingredients from the output text.

I hope this helps! Please feel free to reach out if you have any further questions.

Looking forward to your reply.

Best regards,
Stefan