If your OCR is taking long, someone is looking at your data…

When you process bookkeeping and accounting documents with an optical character recognition (OCR) service, the results should take only seconds. If they don’t, it raises the question: what is really going on here?

First, let’s take an example from history. In the late 18th century, the Mechanical Turk (the Turk) baffled the world. The Mechanical Turk was a chess-playing machine that proved almost unbeatable for nearly a century, even playing and defeating Napoleon Bonaparte and Benjamin Franklin.

So, you might be wondering: how did that happen? The technology of the time didn’t allow a machine to play chess, let alone beat strong chess players. The Mechanical Turk turned out to be a hoax: it was simply a device operated by a human hidden inside it. The operators were some of the brightest chess players of the time.

Today, in a world full of AI promises, many companies claiming to offer fast, efficient and reliable OCR services are in fact relying on third-party crowdsourcing marketplaces. Amazon offers one such crowdsourcing service, ironically named Amazon Mechanical Turk.

In other words, companies relying on third-party crowdsourcing marketplaces are sending your data to a “farm” of data entry clerks in developing countries, where laborers “process” the dirty work that then gets marketed as AI OCR. This is what’s likely happening when your OCR takes more than a few seconds. If your OCR sometimes processes a document in a couple of minutes while at other times its processing seems to never end, it is almost certainly human-powered. Ask any techie with OCR experience and they will tell you that machine-powered OCR takes no longer than a few seconds, not minutes (and definitely not hours).

Unlike in the Mechanical Turk’s day, today’s technology makes 100% machine-powered OCR possible. Machine-powered OCR is not easy to build, but some companies have already done it.

Yes, there are two kinds of OCR:

  1. Human-Powered
  2. Machine-Powered

Besides processing time, what does choosing one type of OCR over the other mean for privacy, security and efficiency?

Using either type of OCR service is considered outsourcing, yet the risks associated with the two can vary a lot. Soha Systems reported that 63% of all data breaches result either directly or indirectly from access by third parties, such as outsourcing contractors and suppliers.

“We are not saying outsourcing is inherently bad, but organizations that do get breached have probably made some bad outsourcing decisions,” said John Yeo, Trustwave's European director.

Data breaches are just numbers we read about, until they affect us. According to Business Wire, human error is the leading cause of data breaches, being linked to more than 80% of cases. Unknowingly sending your data to crowdsourced clerks for processing is, by Trend Micro’s definition, a data breach. On the other hand, when your data is sent to and processed in the cloud using 100% machine computing power, you’re still technically outsourcing the same task, but with much better privacy and security. Bad actors also find it easier to mount social engineering attacks on your company when they can spy on your data, so imagine what they can do with full access. This is another risk you avoid when using true AI.

Not only is it faster and more secure, but with machine learning, AI OCR can now surpass human-level accuracy in most cases. On top of that, with true humanless AI you would most definitely not receive emails like the one below; a machine can’t catch COVID (or take a nap).

Please be informed that due to the rise in Covid-19 cases in the city where our Invoice Verification Team is based, many of the team members have been unable to attend work. This significant decrease in the labour resources has led to slower processing times for invoice verification.

So, what’s the deal? How much more expensive is machine-powered OCR? In fact, machine-powered OCR costs less. With human-powered OCR, you pay both a software cost and a human cost (not to mention the privacy issue, and how much that might cost).

A good question to address would be: if machine-powered OCR is superior in every aspect, why would someone choose a human-powered one? The answer is simply that they don’t know. You could call this fake product marketing: no OCR company makes it clear that it uses crowdsourced humans to get the job done. You would have to dig deep into the privacy policy to see that there might be a “data extraction team” or “verification team” involved in the process.

FISPAN’s browser extension allows business owners, accountants, and bookkeepers to process invoices right from their inbox. We use 100% machine-powered OCR because it’s simply better. Our browser extension is designed to work instantly and in context, on top of your inbox and accounting software. Don’t stare at a loading screen waiting for someone to get the job done. With AI OCR, processing takes a few seconds, which makes it a far more valuable tool: it saves you time and effort and gives you peace of mind.

 
