One of our suppliers sends multi-page invoices in which the invoice summary information is repeated at the top of each page, followed by a table with 5 named columns (Description, Order Qty, Shipped Qty, Unit Price, Net). The Description column can span multiple rows and is overloaded with 3 pieces of information we wish to extract (manufacturer, model #, and the actual description of the item).
I have been using advanced tagging to identify all 7 pieces of information we wish to extract (3 pieces in the Description column and 1 piece in each of the remaining 4 columns). I have also been using the multi-page support, since the table can span multiple pages.
The issue I have now is that a single item can span two pages, i.e. the overloaded Description field has one line at the end of one page and the remainder on the next. Any ideas on how to properly tag this?
Here is a rough example of the issue. Item #3 is the problem, as it is split between pages:
------------------------------------------------------------
Supplier name order date
Address order number
Phone
------------------------------------------------------------
Description Order Qty Ship Qty Unit Price Net
MyMfg1 MyModel#1 1 1 $10 $10
MyDescription of #1
MyMfg2 MyModel#2 1 1 $10 $10
MyDescription of #2
MyMfg3 MyModel#3 1 1 $10 $10
-------------------------- Page Break ------------------------------
Supplier name order date
Address order number
Phone
------------------------------------------------------------
Description Order Qty Ship Qty Unit Price Net
MyDescription of #3
MyMfg4 MyModel#4 1 1 $10 $10
MyDescription of #4
MyMfg5 MyModel#5 1 1 $10 $10
MyDescription of #5
Hi @Wittig ,
Thanks for reaching out on that.
Indeed, for now it is not possible to continue tagging a field that starts on one page and continues onto a second page.
Did you try leaving such documents untagged and tagging only documents whose descriptions don't span two pages? You could then check whether, at prediction time, documents with split descriptions are predicted correctly.
I found 12 invoices which span multiple pages but do not split the overloaded description field. After tagging these files as well as 12 more single-page invoices, the accuracy score is now 87% across the entire model, but the overloaded field is only at 39%. Is there a way to improve this?
Bad news. Once the model encounters one of these "page spanned" description fields, the accuracy goes way down for every entry thereafter. Even when it moves on to a new page with no "page spanned" description, it can't recover. In one document I tested, it even started to drop columns.
At this point it is unusable which is very disappointing.
Hi @Wittig ,
Sorry to hear about that.
Did you try tagging the Description as a single field (instead of 3 fields) to see if you get better results? If so, you might be able to add post-processing logic in a flow to separate the values.
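To illustrate the kind of post-processing meant here, below is a minimal sketch of splitting one combined Description value into its three parts. It assumes the layout of the sample invoice earlier in this thread: the first line holds a single-token manufacturer followed by the model, and all remaining lines are the item description. The names `DescriptionParts` and `split_description` are hypothetical, and Python is used only for illustration; in an actual flow the same logic would have to be written with flow expressions or in a function the flow calls.

```python
from typing import NamedTuple


class DescriptionParts(NamedTuple):
    manufacturer: str
    model: str
    description: str


def split_description(raw: str) -> DescriptionParts:
    """Split one combined Description value into its three parts.

    Assumed layout (taken from the sample invoice in this thread, not from
    any product documentation): line 1 is "<manufacturer> <model>" and every
    following line is free-text description.
    """
    # Drop empty lines and surrounding whitespace.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return DescriptionParts("", "", "")
    # Assumption: the manufacturer is a single token; the rest of line 1 is the model.
    manufacturer, _, model = lines[0].partition(" ")
    # Join any remaining lines into one description string.
    description = " ".join(lines[1:])
    return DescriptionParts(manufacturer, model.strip(), description)


if __name__ == "__main__":
    print(split_description("MyMfg3 MyModel#3\nMyDescription of #3"))
```

If a supplier uses multi-word manufacturer names, the single-token assumption breaks and a lookup list of known manufacturers would probably be needed instead.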
I have not tried that but if the AI model can't handle the complex parts of the document, I could just write a non-AI parser for the entire document. Our goal is to add more collections which contain invoices from other suppliers so we could have one process for all. Also, I would have to start the entire document tagging process over again or lose all the existing tagging.
I did try yet again to add even more training documents to the model, including many of the special cases/anomalies which show up in the invoices. Unfortunately, the accuracy score went down as a result, and when I tested the model with a document I had used to train it, it couldn't reproduce the training data.
Are there any plans for improving the types of issues I am encountering? If so, is there a timeline?
Antrod - I tried what you suggested, and I still get poor results. I trained a new model with 8 documents, and it has problems with both single and multi-page documents.
I have no idea how to make the tables any simpler.
Hi @Wittig ,
I'm really sorry to hear that the latest tests weren't positive.
Unfortunately, we have no date for when this scenario will be covered.
Are you using the Structured version of the Document processing model? If so, the last thing I can suggest is to test with the Unstructured version. I understand it could be tedious to recreate the model, but perhaps you could create a very basic version of it just to see whether you get better results.
@Antrod @Wittig I am facing the exact same issue. I need to extract information in tabular format from order confirmation PDFs received from suppliers. Each PDF has multiple items, and each item has a code, description, quantity and delivery date. So the table has four columns (Code, Description, Quantity, Delivery Date), with each row representing an item.
The problem arises when some details for an item are present at the bottom of one page and the remaining details are on the next page.
The document cannot be tagged correctly in such a case, and the accuracy of the model is pretty bad. Suppose the document has 2 pages; then the tagged tables look like this:
Code | Description | Quantity | Delivery Date |
101 | this is first item | 56 | 22.08.2023 |
102 | this is second item | 65 | 23.08.2023 |
103 | this is third item | | |
Code | Description | Quantity | Delivery Date |
 | | 72 | 24.08.2023 |
104 | this is fourth item | 80 | 23.08.2023 |
105 | this is fifth item | 60 | 21.08.2023 |
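One possible workaround is to repair the split row after extraction instead of at tagging time. The sketch below assumes the extracted tables arrive as one list of rows per page, and that a continuation row can be recognised because it is the first row on its page and its Code cell is empty. Both are assumptions based on the example above, not documented model behaviour, and `merge_split_rows` is a hypothetical helper name.

```python
from typing import Dict, List

Row = Dict[str, str]
COLUMNS = ("Code", "Description", "Quantity", "Delivery Date")


def merge_split_rows(pages: List[List[Row]]) -> List[Row]:
    """Flatten per-page tables, folding a page-split row back into one row."""
    merged: List[Row] = []
    for page in pages:
        for index, row in enumerate(page):
            # Assumption: a continuation is the first row on a page with no Code.
            is_continuation = (
                index == 0
                and bool(merged)
                and not row.get("Code", "").strip()
            )
            if is_continuation:
                previous = merged[-1]
                for column in COLUMNS:
                    value = row.get(column, "").strip()
                    if value:
                        # Append to whatever the previous page already captured.
                        existing = previous.get(column, "").strip()
                        previous[column] = f"{existing} {value}".strip()
            else:
                merged.append({c: row.get(c, "").strip() for c in COLUMNS})
    return merged


if __name__ == "__main__":
    pages = [
        [
            {"Code": "101", "Description": "this is first item",
             "Quantity": "56", "Delivery Date": "22.08.2023"},
            {"Code": "103", "Description": "this is third item"},
        ],
        [
            {"Quantity": "72", "Delivery Date": "24.08.2023"},
            {"Code": "104", "Description": "this is fourth item",
             "Quantity": "80", "Delivery Date": "23.08.2023"},
        ],
    ]
    for row in merge_split_rows(pages):
        print(row)
```

This only helps if the model still extracts the two halves reliably; it does not address the accuracy drop itself.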
You could also try this template that just uses text recognition to create a text replica of the given file & feeds it to GPT for data extraction: