Microsoft Industry Blogs - United Kingdom


Many companies still read thousands of forms (invoices, tax forms, etc.) by hand and enter the data into a structured schema. Others outsource form processing to third-party companies, at a cost per page that can be prohibitive. Either way, form processing is time-consuming, tedious, expensive and prone to human error.

In this post, I will describe an end-to-end solution that automates form processing using our newest Cognitive Service, Form Recognizer. In this solution, we’ll analyse invoices from different clients, each with its own format and with keys and values placed in different positions.

Our goals are:

  • Ingest the invoices.
  • Validate that the documents are real invoices.
  • Extract the key-value pairs.
  • Send the extracted data to your ERP system for payment.
A mock invoice, with different areas of information that need processing highlighted.
Figure 1: Sample of a form you can process with the Form Recognizer Service


Azure Cognitive Services

Microsoft Azure provides many Cognitive Services (Vision, Speech, Language, Knowledge, and Search) to help developers bring AI into their applications. These Cognitive Services can be combined to make applications more intelligent, engaging and discoverable, without you needing to be a data scientist.

Using Computer Vision and Optical Character Recognition (OCR), we can detect and extract text from images. OCR is ideal for search, but it doesn’t associate each key with its value, so on its own it can’t extract forms into a structured schema or automate business workflows in organisations. That’s the gap Form Recognizer fills.
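To make the gap concrete, here is a toy Python sketch. It assumes OCR returns plain text lines in reading order (a simplification: real OCR output also carries bounding boxes) and tries to recover key-value pairs with a naive, hypothetical "Key: value" rule. The rule works when a key and its value share a line, but it silently misses values printed below or beside their label, which is exactly the layout variation between clients described above.

```python
from typing import Dict, List


def naive_pairs(ocr_lines: List[str]) -> Dict[str, str]:
    """Pair keys and values using only a 'Key: value' pattern on one line.

    Hypothetical heuristic for illustration only: it ignores layout,
    which is why plain OCR text is not enough for form extraction.
    """
    pairs: Dict[str, str] = {}
    for line in ocr_lines:
        # Split on the first colon only, so values like '10:30' survive.
        key, sep, value = line.partition(":")
        # Accept the line only if a non-empty value follows the colon.
        if sep and value.strip():
            pairs[key.strip()] = value.strip()
    return pairs


# Two hypothetical invoices with the same fields in different positions.
client_a = ["Invoice Number: INV-001", "Total: 120.00"]
client_b = ["Invoice Number:", "Total:", "INV-002", "95.50"]

print(naive_pairs(client_a))  # {'Invoice Number': 'INV-001', 'Total': '120.00'}
print(naive_pairs(client_b))  # {} : the values exist but are never associated
```

A model that also learns where values sit relative to their keys is what closes this gap.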

The Form Recognizer service, currently in preview, was announced in early May. It uses machine learning to extract key-value pairs and tables from forms. It is available both as a managed hosted service and as a Docker-based container image. The container option suits companies with restrictions on transferring data to the cloud, as it can run on a Kubernetes-based Azure Container Service, on an on-premises container service or on an Azure VM.

You can train a model and analyse your forms by calling a simple REST API, which makes the service easy to use from any language or platform.

The Form Recognizer service supports PDF (text or scanned), JPG and PNG input documents, and returns its output as JSON.
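Because only PDF, JPG and PNG inputs are accepted, it can be worth filtering attachments before sending them for analysis. The sketch below is our own addition, not part of the service: it checks each file’s leading "magic" bytes rather than trusting the file extension, since email attachments are often misnamed. The function names are hypothetical.

```python
from typing import Optional

# Well-known file signatures for the three formats Form Recognizer accepts.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
}


def detect_format(data: bytes) -> Optional[str]:
    """Return 'pdf', 'png' or 'jpg' if the bytes start with a known signature."""
    for magic, fmt in SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    return None  # unsupported: skip the file instead of sending it


def is_supported(path: str) -> bool:
    """Hypothetical pre-check: read the first 8 bytes (the PNG signature length)."""
    with open(path, "rb") as f:
        return detect_format(f.read(8)) is not None


print(detect_format(b"%PDF-1.7\n"))  # pdf
print(detect_format(b"PK\x03\x04"))  # None (e.g. a .docx or .zip attachment)
```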

The containerised version also requires a Computer Vision API resource, and the Docker container must be able to communicate with Azure for billing purposes.

A diagram showing how the Computer Vision API resource and Form Recognizer resource work together.
Figure 2: Docker-based container image components

To train a model, the Form Recognizer service needs at least five sample forms with the same layout for each type of document. Under the hood, it clusters the forms by type, trains the model autonomously and makes it available through a REST API.
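Before training, it is worth checking that every document type has enough samples. Here is a minimal sketch, assuming (hypothetically) that you keep training files in one folder per client layout, e.g. `contoso/inv1.pdf`. It groups the paths by folder and reports any type below the five-sample minimum. The service clusters forms itself, so this folder convention is only a local sanity check of ours.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict, Iterable

MIN_SAMPLES_PER_TYPE = 5  # minimum number of forms per document type


def undersized_types(paths: Iterable[str]) -> Dict[str, int]:
    """Return {document_type: count} for types with too few samples.

    Assumes a hypothetical layout where the parent folder name is the
    document type (one folder per client invoice layout).
    """
    counts = Counter(PurePosixPath(p).parent.name for p in paths)
    return {t: n for t, n in counts.items() if n < MIN_SAMPLES_PER_TYPE}


files = [f"contoso/inv{i}.pdf" for i in range(6)] + [f"fabrikam/inv{i}.pdf" for i in range(3)]
print(undersized_types(files))  # {'fabrikam': 3}
```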

The different processes that Form Recognizer runs to train itself on forms.
Figure 3: Form Recognizer Service, under the hood


Scenario

For this use case, we’ll ingest the forms into an Azure Blob Storage container, using an Azure Logic App that triggers the invoice extraction as soon as an email is received.

Because an attachment is not always an invoice, we’ll train and deploy a Custom Vision model to classify the attached files (Invoice/Not invoice). A Blob-triggered Azure Function will call the model’s REST API whenever a new document lands in Blob Storage. Real invoices will be copied into an input container, where Form Recognizer will process them.
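The decision the first function makes can be sketched independently of Azure. This assumes the classifier returns a JSON body shaped like a Custom Vision image-classification prediction (a `predictions` list of objects with `tagName` and `probability`). The tag name `Invoice`, the 0.8 threshold and the container name `input` are hypothetical choices for illustration.

```python
import json
from typing import Optional

INVOICE_TAG = "Invoice"    # hypothetical tag name used when training the classifier
THRESHOLD = 0.8            # hypothetical minimum confidence to accept a file
INPUT_CONTAINER = "input"  # hypothetical name of the validated-invoices container


def route_attachment(prediction_json: str) -> Optional[str]:
    """Return the container to copy the blob into, or None to leave it alone."""
    predictions = json.loads(prediction_json).get("predictions", [])
    # Highest probability reported for the 'Invoice' tag (0.0 if absent).
    invoice_prob = max(
        (p["probability"] for p in predictions if p.get("tagName") == INVOICE_TAG),
        default=0.0,
    )
    return INPUT_CONTAINER if invoice_prob >= THRESHOLD else None


sample = ('{"predictions": [{"tagName": "Invoice", "probability": 0.93}, '
          '{"tagName": "Not invoice", "probability": 0.07}]}')
print(route_attachment(sample))  # input
```

Keeping this logic in a plain function makes it easy to unit-test; the Blob-triggered function then only needs to call the prediction endpoint and copy the blob when a container name comes back.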

A second Blob-triggered function will call the Form Recognizer API on each document in the input container, and the result will be written to an output container in the same storage account. A copy activity in Azure Data Factory (ADF) can then ingest the data into a relational database (Azure SQL Database or Azure SQL Data Warehouse). Several options are available to trigger further processing in your ERP system.
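The exact JSON schema of the preview container’s output is not described here, so the sketch below assumes a hypothetical shape in which each page carries a `keyValuePairs` list whose `key` and `value` are lists of text fragments. It flattens one result into (document, page, key, value) rows and serialises them as CSV, a tabular form that an ADF copy activity can load into Azure SQL Database. Check your container’s actual output before relying on these field names.

```python
import csv
import io
import json
from typing import List, Tuple

# One output row: (document id, page number, key text, value text).
Row = Tuple[str, int, str, str]


def flatten_result(document_id: str, result_json: str) -> List[Row]:
    """Flatten one result into rows.

    ASSUMED (hypothetical) input shape:
    {"pages": [{"number": 1, "keyValuePairs": [
        {"key": [{"text": ...}], "value": [{"text": ...}]}]}]}
    """
    result = json.loads(result_json)
    rows: List[Row] = []
    for page in result.get("pages", []):
        for pair in page.get("keyValuePairs", []):
            # A key or value may be split into several text fragments; join them.
            key = " ".join(f["text"] for f in pair.get("key", []))
            value = " ".join(f["text"] for f in pair.get("value", []))
            rows.append((document_id, page.get("number", 0), key, value))
    return rows


def to_csv(rows: List[Row]) -> str:
    """Serialise rows as CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["document_id", "page", "key", "value"])
    writer.writerows(rows)
    return buffer.getvalue()
```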

Figure 4: High-level architecture of the end-to-end invoice processing automation solution

To set up Form Recognizer during the private preview, you will need to:

  1. Create the Computer Vision resource in the Azure portal.
  2. Create the Form Recognizer resource in the Azure portal. The link will be provided after your subscription is whitelisted. You can request access here.
  3. Set up an Ubuntu VM on Azure. Only an Ubuntu VM will allow you to mount Blob Storage as the input for Form Recognizer. Note that you can also use Azure Files as input.
  4. Install the Azure CLI on the host (the Ubuntu VM).
  5. Install Blobfuse to mount Blob Storage as a file system. If you’re using Azure Files instead, you will need to install the CIFS VFS packages.
  6. Create an Azure Storage account with two Blob containers: the first will receive the email attachments and the second will host the validated images.
  7. Download the Form Recognizer container. The details will be provided after your subscription is whitelisted.
  8. Run the container:
    1. docker run --rm -it -p 5000:5000 --memory 8g --cpus 2 --mount type=bind,source=/home/<user>,target=/input --mount type=bind,source=/home/<user>/output,target=/output containerpreview.azurecr.io/microsoft/cognitive-services-forms eula=accept apikey=<Key> billing=<endpoint> computervisionapikey=<key> computervisionendpointuri=<Endpoint>
  9. Create a dataset and train the model:
    1. curl -X POST "http://localhost:5000/forms/v1.0/dataset" --data "{\"name\": \"datasetname\", \"dataRef\": \"/input/<dataset path>\"}" -H "Content-Type: application/json"
    2. curl -X POST --data "{\"modelId\": <modelId>}" http://localhost:5000/forms/v1.0/dataset/<DatasetId>/score
  10. Run the model:
    1. curl -X POST "http://localhost:5000/forms/v1.0/dataset/1/score" -H "accept: application/json" -H "Content-Type: application/json-patch+json" -d "{\"modelId\": 1}"
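If you would rather drive the container from code (for example from the Blob-triggered functions) than from curl, the dataset and score calls can be built with Python’s standard library. This sketch mirrors the curl commands above (same paths, headers and JSON bodies) and assumes the container is listening on localhost:5000. The helper names are hypothetical, and no particular response format is assumed beyond it being JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000/forms/v1.0"  # port mapped in the docker run step


def create_dataset_request(name: str, data_ref: str) -> urllib.request.Request:
    """Hypothetical helper mirroring the 'create a dataset' curl call."""
    body = json.dumps({"name": name, "dataRef": data_ref}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/dataset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_request(dataset_id: int, model_id: int) -> urllib.request.Request:
    """Hypothetical helper mirroring the 'run the model' curl call."""
    body = json.dumps({"modelId": model_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/dataset/{dataset_id}/score",
        data=body,
        headers={
            "accept": "application/json",
            "Content-Type": "application/json-patch+json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request to the running container and decode its JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `send(score_request(1, 1))` is the equivalent of the last curl command; building the requests separately from sending them keeps the helpers testable without a live container.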


The different sections of the mock invoice that will be fed into Form Recognizer.
Figure 5: Sample of an input invoice to the Form Recognizer

Result

The data acquired by Form Recognizer, based on the earlier mock invoice.
Figure 6: Sample of an output of the Form Recognizer

Get started