IDP: Solving bot illiteracy in the digital workforce – Part 2

Editor’s note: This is a guest post from Jupp Stöpetie.

In this post we examine the role of Intelligent Document Processing (IDP) relative to Robotic Process Automation (RPA) and how these technologies drive Digital Transformation when combined. Part one looked at what is driving and enabling Digital Transformation. Part two is dedicated to RPA and why IDP is essential to it.

Robotic Process Automation 

Over the past three or four years, RPA has become the tool of choice for automating repetitive tasks at many companies. RPA vendors claim that their systems are easy to set up and maintain without any coding. They promote the idea that RPA systems can be operated by business managers after some lightweight training, so that basically every manager can create bots that replace human workers. This has led to the wide adoption of RPA systems in businesses. There is no longer any need for complicated, lengthy and costly automation projects managed by IT departments. RPA puts automation and the use of AI in the hands of business managers. Today the workforce in companies is increasingly a combination of humans and robots. But of course things are more complicated when you look a bit deeper than the RPA marketing collateral. What happens, for example, when we need to retrieve information from a document? That is no problem for human workers. But bots have no humanlike reading skills. They can only “read” data from structured sources like files and databases, because there it is easy to tell a bot the exact location of the data.
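To make the point about exact locations concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: the sample invoice, the column name `total` and the two helper functions are assumptions, not part of any RPA product. It shows a rule that works on a structured record and the same kind of fixed-location rule returning nonsense on a human-readable document.

```python
import csv
import io

# Hypothetical structured export: the bot is told which column holds the total.
STRUCTURED = "invoice_no,supplier,total\nINV-001,Acme GmbH,1200.50\n"

# The same information as a human-readable document: no fixed location.
DOCUMENT = (
    "Acme GmbH\n"
    "Invoice INV-001\n"
    "Thank you for your order. The amount due is 1200.50 EUR.\n"
)


def read_total_from_structured(data: str) -> float:
    """Look up the 'total' column of the first record by name."""
    row = next(csv.DictReader(io.StringIO(data)))
    return float(row["total"])


def read_by_position(text: str, line_no: int, start: int, end: int) -> str:
    """Naive bot rule: return the characters at a fixed line and column range."""
    return text.splitlines()[line_no][start:end]


structured_total = read_total_from_structured(STRUCTURED)
positional_guess = read_by_position(DOCUMENT, 2, 0, 7)

print(structured_total)   # the amount, found by column name
print(positional_guess)   # whatever text happens to sit at that position
```

The structured lookup returns the amount reliably. The positional rule only works if every document places the amount in exactly the same spot, which free-form documents do not.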

Why bots can’t read documents

A document can be seen as a container for content. The content is made up of static data plus explicit and hidden information describing the relationships between the data, which gives the document its meaning. A document also has metadata: the properties of the container itself. Content is represented in documents in a way that lets humans process it. Note that this has an important implication: most documents were never designed to be read by bots. When data must be extracted, it doesn’t matter to human workers whether a document has a fixed structure like a form, a semi-structure like an invoice or no structure like a contract. With proper instructions, human workers will be able to find the data they are looking for, although processing time may vary significantly depending on how much structure there is. Bots, however, have no cognitive reading skills. And adding OCR and data capture technology to an RPA solution is often not enough to make bots really skilled at processing documents.


Why OCR and Data Capture often do not offer the right reading skills for bots

The short answer: these technologies fall short because they were not developed for RPA. OCR was not designed to understand content. OCR is a technology for converting pixels into characters. Most OCR packages can also convert document images (scans) into text files while recreating the original layout. Data Capture systems use OCR technology and also many other AI technologies. Data capture systems were designed for extracting data from large volumes of documents with the highest possible accuracy. Neither OCR nor data capture were designed for RPA users to teach their bots how to read stuff. 

The users who usually set up data capture installations are engineers who know exactly how to use all the levers and parameters of these systems. They will often create scripts or even add custom-coded extensions in order to achieve the highest possible accuracy. The initial setup effort is high, but that makes a lot of sense: the ROI of data capture installations doesn’t have to come fast, because these installations are almost always set up to run for many years.

Batch-oriented data capture systems, although very powerful when set up correctly, are therefore not the first choice of RPA users who need to add document processing capabilities to their bots. These users are looking for simple, easy, fast and flexible functionality. The volumes they need to process are small. They also have a greater need for systems that learn while doing, because the full spectrum of variability in the documents to be processed will not be available at the start of the project. And often they need a higher level of intelligence, because they want to automate tasks that were formerly performed by humans. When they design new processes, RPA users want to create intelligent bots that behave just like humans. But what RPA users cannot handle, and what would break the RPA paradigm, is adding reading skills to bots at the expense of a lot of investment and special technical skills such as solution design, coding and production testing.

Note that in cases where RPA systems are used for processing large volumes of documents, it makes sense to use data capture systems to extract data from these documents, because in these cases efficiency, in other words accuracy, will most likely play a significant role. Processing large volumes of documents is data capture’s sweet spot.

What RPA users really need when they have to process documents is something that may look a bit like OCR and data capture but is much smarter, because it draws on a much broader set of AI technologies, and at the same time is easier to use.

IDP: a new product category for a new market

The massive spread of RPA installations in companies over the past three to four years has led to growing demand for easy, simple, flexible yet powerful intelligent data capture solutions. This fast-growing demand has spawned a new generation of companies that have gone down a different path, using different technologies than the incumbents in the data capture market have used for more than 20 years. These new systems are all based on the idea that deep neural networks and other forms of machine learning (ML) are better and much easier ways to meet the needs of RPA users. All you need to do is collect a lot of samples, train your neural networks, and off you go. What is unclear at this stage, however, is whether ML can actually deliver the accuracy that is needed when bots are expected to have humanlike reading skills while executing mission-critical tasks. When deep neural networks make mistakes, you cannot go in and correct these mistakes. These systems are black boxes. It is noteworthy that all incumbents are updating their existing offerings by adding ML technologies, especially with the goal of becoming better at processing unstructured documents. And it does not seem such a ridiculous assumption that these companies, which have many years of experience developing document processing systems, have an advantage over competitors that are fully ML-focused. It looks quite plausible that incumbents are better placed to marry old and new AI technologies, based on their solid understanding of how to build robust document processing systems.

All these efforts by old and new companies to develop document processing skills for RPA bots have led to a new category of products that cater to the digital transformation market rather than to the traditional capture market. The emergence of these new products led the Everest Group, a leading management consulting and research firm, to come up with a new product category: Intelligent Document Processing, or IDP.

Everest Group defines IDP as any software product or solution that captures data from documents (e.g., email, text, pdf, and scanned documents), categorizes, and extracts relevant data for further processing using AI technologies such as computer vision, OCR, Natural Language Processing (NLP), and machine/deep learning. These solutions are typically non-invasive and can be integrated with internal applications, systems, and other automation platforms.


About OCR and Online-learning

In many blogs about IDP, authors set up a contrast between OCR as the old-fashioned, unintelligent way of extracting data and modern AI-based extraction methods as state of the art and intelligent. First of all, as I pointed out, OCR is not data extraction. OCR is one of many AI technologies that are used in data capture systems. And there are different ways to build intelligence into systems that help to find and interpret data in documents. Machine learning may perform better on unstructured documents. When dealing with forms and semi-structured documents, systems that use templates, classifiers and other AI technologies, including machine learning, will almost always outperform systems that are based solely on machine learning. Note that these comprehensive systems, which apply ML in a smart way, learn from automated feedback: from users correcting mistakes, and from the results of successful and correct classification and extraction. This generates additional knowledge (expanding the space) and usage statistics for existing knowledge. See The Magic of Online-Learning.

What is the ideal IDP solution?

The challenge for intelligent document processing is that there seems to be no single ideal approach. Depending on the type of documents, the volumes that have to be processed and the importance of accuracy, ML, more traditional AI approaches or a blend of the two will be the best choice. For example, when document volumes are large, accuracy is important. When a shared service centre processes 50 million documents a year, improving accuracy from, say, 95% to 96% is significant: it reduces the number of documents that have to be corrected by 500,000. Another case where accuracy is critical is straight-through processing.
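The arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch under one assumption that the text implies but does not state: accuracy is measured per document, so a 95% accuracy means 5% of documents need a manual correction. The function name is made up for this example.

```python
def documents_needing_correction(volume: int, accuracy_percent: int) -> int:
    """Documents that need manual correction, assuming per-document accuracy."""
    return volume * (100 - accuracy_percent) // 100


volume = 50_000_000  # documents per year, as in the example above

before = documents_needing_correction(volume, 95)
after = documents_needing_correction(volume, 96)
saved = before - after

print(f"Corrections at 95%: {before:,}")
print(f"Corrections at 96%: {after:,}")
print(f"Documents no longer needing correction: {saved:,}")
```

One percentage point sounds small, but at this volume it takes the correction workload from 2.5 million to 2 million documents a year, a reduction of one fifth.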

It seems that the best option for customers is to adopt an IDP platform that lets them operate different solutions, or combinations of solutions, while shielding users from their complexity. Vinna is such a platform; it even allows crowdsourcing (human-in-the-loop verification) to be added without users ever being aware of the complexity of what goes on under the hood.


Jupp Stöpetie

As CEO of ABBYY Europe, Jupp Stöpetie established ABBYY’s presence in the Western European markets, growing the brand and its market presence to a leadership position over more than 25 years. His experience includes founding and growing companies and managing all levels of business operations, sales, and marketing. Jupp left ABBYY in spring 2020 and now works as an independent consultant based in Munich, Germany.