Amazon Textract, the open-source Tesseract, and the OCR built into off-the-shelf PDF readers are great for digitizing random notes, books, or newspapers... but what if you're working with receipts and invoices instead?
General-purpose OCR won't cut it there: the problem is too specific, and solving it takes more than elbow grease.
You'll also often need to do it at scale. When it's tax season and you need to digitize and file hundreds of receipts and invoices, all of which contain critical financial data, even an OCR algorithm that was 100% accurate every time would still be too dumb to extract meaning from receipts in bulk and give you structured, actionable data.
To put it simply: You need to work smarter, not harder. And that's the attitude that TAGGUN's machine-learning product brings to the table.
General-Purpose OCR: The 30,000-Foot View
To understand TAGGUN's solution, let's look at why receipts are a pain without a specialized receipt data engine:
- There's no standardized way of putting information on receipts or formatting them (layout, spacing, font choice). How do you present vendor headers, subtotals, taxes, and totals?
- Receipts often contain lines that are simply meant to be ignored, perhaps printed to satisfy regulations as old as the POS machine itself, on software that either hasn't been updated or can't be.
- Scans and photos of receipts won't be perfect. Lighting, angle, and blur issues mean you'd have to edit and fix each image before a Computer Vision (CV) analysis is even possible.
- The biggest challenge by far is interpreting meaning once you have the text. How do you even know you're looking at a receipt? How do you know which random sequence of characters is a vendor name, a location, an item name, an item code, or a price? And how do you then extract them as structured data that can be acted upon?
So the key here is context, and general-purpose OCR cannot extract that from receipts for you.
Sure, you can throw more resources at the problem (image pre-processing, more compute power, hand-written rules, etc.), but this problem space is too specific for them to cover meaningfully. General-purpose tools can extract blocks of text from images, and if you feed them templates they'll do a decent job of parsing receipts from one, maybe two, specific vendors. Beyond that, the effort skyrockets with every new layout; forget ever doing it at scale.
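To see why templates don't scale, here's a minimal sketch of a hand-written "template" for one hypothetical vendor: a regular expression that finds the total on receipts that print it as `TOTAL $12.34`. The vendors, receipt text, and pattern are all made up for illustration.

```python
import re
from typing import Optional

# Hypothetical template for "Vendor A", whose receipts print the total as "TOTAL $12.34".
VENDOR_A_TOTAL = re.compile(r"^TOTAL\s+\$(\d+\.\d{2})$", re.MULTILINE)

def extract_total(ocr_text: str) -> Optional[float]:
    """Return the total if the text matches Vendor A's layout, otherwise None."""
    match = VENDOR_A_TOTAL.search(ocr_text)
    return float(match.group(1)) if match else None

vendor_a = "COFFEE 3.50\nMUFFIN 2.75\nTOTAL $6.25"
# A hypothetical second vendor: different label, currency, and decimal separator.
vendor_b = "Espresso ...... 3,50\nSumme EUR 3,50"

print(extract_total(vendor_a))  # 6.25
print(extract_total(vendor_b))  # None: this layout needs yet another template
```

Every new layout, language, or currency format means another pattern to write and maintain, which is exactly where the template approach falls apart.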
The only real solution is one that has been trained to know what receipts and invoices are, what they look like, and how they are meant to be read, and one that keeps learning, adapting, and improving with every receipt it processes.
Enter: TAGGUN
TAGGUN is a receipt-scanning API with a powerful engine under the hood. It combines state-of-the-art OCR with cognitive learning technology (a mix of machine learning, regular expressions, NLP, and fuzzy matching for logically related fields) that can:
- scan and parse receipts in any language, without needing templates,
- extract information while preserving its context,
- and output structured data that can be easily read by humans and machines alike.
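As a rough illustration of what "fuzzy matching" means in this context (not TAGGUN's actual implementation), Python's standard `difflib` can map an OCR-misread label such as `T0TAL` or `SUBTOTAI` onto a known field name. The label list and the 0.7 cutoff are assumptions chosen for the example.

```python
import difflib
from typing import Optional

# Hypothetical canonical labels a receipt parser might look for.
KNOWN_LABELS = ["TOTAL", "SUBTOTAL", "TAX", "CHANGE", "CASH"]

def normalize_label(raw: str, cutoff: float = 0.7) -> Optional[str]:
    """Map a possibly misread label to the closest known label, or None if nothing is close."""
    matches = difflib.get_close_matches(raw.upper(), KNOWN_LABELS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(normalize_label("T0TAL"))     # TOTAL (zero misread for the letter O)
print(normalize_label("SUBTOTAI"))  # SUBTOTAL (L misread as I)
print(normalize_label("RECEIPT"))   # None
```

A production engine would combine signals like this with layout, context, and learned models; string similarity alone is only one small piece.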
Best of all? It runs as a conventional RESTful API in the cloud. You send it images of receipts (photographs or scanned copies in JPEG, PDF, PNG8, PNG24, GIF, or HEIF format), or just the URL of an image on your servers, and within seconds you get back JSON results with 90%+ accuracy, processed in real time.
Now, the million-dollar question.
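For the "just send a URL" option, a request can be built with nothing but Python's standard library. This is a minimal sketch under stated assumptions: the endpoint, the `apikey` header, and the `url` body field are placeholders, not TAGGUN's documented API, so check the official API reference for the real values.

```python
import json
import urllib.request

# ASSUMPTION: placeholder endpoint, header name, and body field (hypothetical).
API_URL = "https://api.example.com/receipt/url"
API_KEY = "YOUR_API_KEY"

def build_url_request(image_url: str) -> urllib.request.Request:
    """Build (but don't send) a POST asking the API to fetch and parse an image by URL."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "apikey": API_KEY},
    )

req = build_url_request("https://files.example.com/receipts/0001.jpg")
# To send it: urllib.request.urlopen(req, timeout=30), then json.load() the response body.
```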
Why Does TAGGUN Work Where General-Purpose OCR Comes Up Short?
Purpose-Built
TAGGUN has been trained extensively on millions of receipts, so it recognizes what a receipt is with human-level accuracy. It can parse structured, contextual data from nothing more than a picture of a receipt, even when the receipt uses a previously unseen format or a completely different language.
For example, if you're trying to budget for your household, all you have to do is:
- Point your phone at each receipt
- Take a picture
- Send it to TAGGUN
- Get back data as simple (just the totals) or as complex (vendor information, quantities and prices for each item, taxes, etc.) as you want
- Move on to the next one.
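Here's a sketch of that "as simple or as complex as you want" step, run against an illustrative JSON response. The field names (`totalAmount`, `merchantName`, `lineItems`, and so on) and the values are assumptions made up for this example, not TAGGUN's documented schema.

```python
import json

# ASSUMPTION: an illustrative response shape with made-up values.
SAMPLE_RESPONSE = json.loads("""
{
  "merchantName": {"data": "Corner Cafe", "confidenceLevel": 0.97},
  "date": {"data": "2021-03-14", "confidenceLevel": 0.92},
  "totalAmount": {"data": 6.25, "confidenceLevel": 0.99},
  "taxAmount": {"data": 0.50, "confidenceLevel": 0.64},
  "lineItems": [
    {"name": "Coffee", "quantity": 1, "price": 3.50},
    {"name": "Muffin", "quantity": 1, "price": 2.25}
  ]
}
""")

def summarize(response: dict, detailed: bool = False) -> dict:
    """Return just the total, or a fuller household-budget record when detailed=True."""
    summary = {"total": response["totalAmount"]["data"]}
    if detailed:
        summary.update(
            vendor=response["merchantName"]["data"],
            date=response["date"]["data"],
            tax=response["taxAmount"]["data"],
            items=[(i["name"], i["quantity"], i["price"]) for i in response["lineItems"]],
        )
    return summary

print(summarize(SAMPLE_RESPONSE))  # {'total': 6.25}
```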
It's that simple. With TAGGUN, you save the time and effort you'd otherwise spend picking through a mountain of receipts.
Even if general-purpose OCR APIs were 100% accurate, you'd get a long block of digitized text that you'd still have to scroll through to find the data you want. That's not nearly as useful as separate, clearly labeled fields for the date of purchase, vendor name and location, item IDs and names, taxes, and cost.
Only a specialized engine can create structured data from unstructured receipts (which might even be handwritten!).
Accurate, And Fast
Now imagine combining all that AI-powered goodness with Google Vision AI and Azure Cognitive Services, two of the leading raw OCR technologies. TAGGUN gives you results that are 90%+ accurate in under 5 seconds, with a confidence score for each extracted field, so your team can review the low-confidence fields and push accuracy even higher.
It's the difference between having a middle schooler read a newspaper article about the stock market versus an actual investment professional. Both can read it, but only the latter can understand it, and they do it much faster because they know what they're looking for.
To beat that accuracy, other solutions would have to brute-force the problem with human labor, such as Amazon Mechanical Turk. At that point, they would no longer be a viable real-time solution, free of queues, that you could integrate anywhere.
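Per-field confidence scores make a review queue straightforward. Below is a minimal sketch, assuming each extracted field carries a `confidenceLevel` between 0 and 1; the field names, values, and the 0.8 threshold are illustrative assumptions, not TAGGUN's documented schema or a recommended setting.

```python
from typing import Dict, List

# ASSUMPTION: illustrative threshold; tune it to your own tolerance for errors.
REVIEW_THRESHOLD = 0.8

def fields_needing_review(response: Dict[str, dict], threshold: float = REVIEW_THRESHOLD) -> List[str]:
    """Return names of fields scored below the threshold, least confident first."""
    scored = [
        (name, field["confidenceLevel"])
        for name, field in response.items()
        if isinstance(field, dict) and "confidenceLevel" in field
    ]
    return [name for name, score in sorted(scored, key=lambda pair: pair[1]) if score < threshold]

# Hypothetical response with made-up values.
response = {
    "merchantName": {"data": "Corner Cafe", "confidenceLevel": 0.97},
    "date": {"data": "2021-03-14", "confidenceLevel": 0.72},
    "totalAmount": {"data": 6.25, "confidenceLevel": 0.99},
    "taxAmount": {"data": 0.50, "confidenceLevel": 0.64},
}
print(fields_needing_review(response))  # ['taxAmount', 'date']
```

Sorting least-confident first puts the fields most likely to be wrong at the top of a reviewer's list.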
Built-In Pre- And Post-Processing
When digitizing receipts and invoices for your business, you'll often run into images that are low quality or full of artifacts. Most will be captured by people outside your organization, so you can't really control that process. Just to get started with these, you'd have to edit the images to fix noise, blur, contrast, orientation, and so on, and then post-process the results to correct errors (using OCR merging, error models, etc.).
TAGGUN handles all of this under the hood, so you can forget these parts of the pipeline even exist. From your point of view, it's an end-to-end solution: you send in images or scans of receipts and get back extracted information, saving your employees' time and your investors' capital in the process.
Existing general-purpose solutions, on the other hand, are like getting a capable engine (text recognition) for free: it's up to you to build the rest of the car around it.
Easy, Zero-Friction Integration
TAGGUN is a conventional RESTful API that lives in the cloud, so integrating it into your tech stack is fast: it's a matter of writing some boilerplate code in your language of choice that POSTs receipts to it and reads back the data. With time-to-market this short, you can focus on the parts of your business that actually need that time and effort.
General-purpose solutions, by contrast, are either too inflexible or come as complex SDKs and libraries like Tesseract, where the expertise and development costs alone might make them a non-starter.
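That boilerplate can be as small as wrapping an image in a `multipart/form-data` POST. This is a minimal sketch using only the standard library; the endpoint, the `apikey` header, and the `file` form-field name are hypothetical placeholders, so take the real ones from TAGGUN's API reference.

```python
import uuid
import urllib.request

# ASSUMPTION: placeholder endpoint, header, and form-field name (hypothetical).
API_URL = "https://api.example.com/receipt/file"
API_KEY = "YOUR_API_KEY"

def build_file_request(filename: str, image_bytes: bytes,
                       content_type: str = "image/jpeg") -> urllib.request.Request:
    """Wrap one receipt image in a multipart/form-data POST request (built, not sent)."""
    boundary = uuid.uuid4().hex  # random boundary that won't appear in the payload
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8") + image_bytes + f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "apikey": API_KEY,
        },
    )

# Usage (not executed here):
#   with open("receipt.jpg", "rb") as f:
#       req = build_file_request("receipt.jpg", f.read())
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       data = json.load(resp)
```

In practice you'd likely reach for an HTTP client library such as `requests`, which builds the multipart body for you; the stdlib version just shows there's nothing exotic involved.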
Saves You Infrastructure And Hardware Costs
TAGGUN is a REST API gateway to a powerful, self-learning AI engine, so adopting it saves you the cost of building an in-house ML/deep learning solution (hiring specialists, buying high-performance GPUs, renting servers, sourcing datasets large enough to train the AI), as well as the yearly expense of maintaining all of that.
TAGGUN's pricing is transparent, predictable, and a fraction of what building your own infrastructure from scratch would cost.
TAGGUN is also language- and platform-agnostic, while existing solutions might require vendor-specific hardware: PyTorch and CUDA libraries as dependencies mean an NVIDIA GPU, and the CPU fallback used in its absence degrades performance considerably.
Reliable Support
You can count on TAGGUN's dedicated Global Services team for top-notch, prompt support whenever and wherever you need it. Based on your feedback and usage patterns, TAGGUN's engineers can even train the AI on your data, or build you a custom AI model that fits your needs precisely, within weeks rather than months or years.
Conclusion
Digitizing receipts and invoices is hard.
Just throwing them into a PDF isn't enough. Just extracting a long block of raw text isn't enough. For precise, time-sensitive goals like budgeting, filing tax returns, making accurate cash flow forecasts, and managing employee expenses, you need a solution that understands the problem with receipts just as well as you do, and that can deliver, in real time, contextual data that gives you the confidence to make informed decisions.
So, really, the choice between TAGGUN and general purpose OCR solutions comes down to this: do you want to work smarter, or harder?
Shouldn't be too difficult a decision.