August 31, 2020 | Important features, New Releases, OCR

GdPicture.NET OCR SDK: New MRZ Engine



Hi everyone,

Earlier this month, we released a new OCR engine to read ID documents, and today we’re going to tell you more about it.

We’re very happy with the results: our engine decodes MRZ characters on any image in less than 100 ms, even if the quality is low and the image is skewed.

But first…

What is MRZ?

MRZ stands for Machine Readable Zone.
It’s a standardized format that lets machines read ID documents such as passports, ID cards, and visas without anyone having to type the data in. These documents all print the MRZ in a specific font called OCR-B, with a fixed number of lines and characters per line.

The data of the MRZ is found at the bottom of the ID document.

US State Department image of a machine-readable passport ID page (Wikimedia Commons)

The International Civil Aviation Organization (ICAO) maintains the MRZ specifications, which are published as Document 9303.

Since not all IDs are created equal, there is more than one MRZ format: each type of document needs its own amount of information and its own adaptations. The current MRZ formats are:

  • TD1 (ID card, passport card)
  • TD2 (ID card, other official travel documents)
  • TD3 (passport)
  • MRV-A (visa)
  • MRV-B (visa)
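
In ICAO Document 9303, these formats differ mainly in their shape: TD1 uses 3 lines of 30 characters, TD2 and MRV-B use 2 lines of 36, and TD3 and MRV-A use 2 lines of 44, with visas starting with the letter V. As a minimal sketch, the likely format can be guessed from the text alone. The MrzShape class and its Guess method are hypothetical helper names for illustration, not part of GdPicture.NET.

```csharp
using System;
using System.Linq;

public static class MrzShape
{
    // Hypothetical helper: guesses the MRZ format from the line count and line length.
    public static string Guess(string mrzText)
    {
        string[] lines = mrzText
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(l => l.Trim())
            .Where(l => l.Length > 0)
            .ToArray();

        if (lines.Length == 0) return "Unknown";

        // All lines of an MRZ have the same length.
        int length = lines[0].Length;
        if (lines.Any(l => l.Length != length)) return "Unknown";

        // Visa MRZs start with 'V'; this separates MRV-A/MRV-B from TD3/TD2.
        bool isVisa = lines[0][0] == 'V';

        if (lines.Length == 3 && length == 30) return "TD1";
        if (lines.Length == 2 && length == 36) return isVisa ? "MRV-B" : "TD2";
        if (lines.Length == 2 && length == 44) return isVisa ? "MRV-A" : "TD3";
        return "Unknown";
    }
}
```

Shape alone cannot tell a country-specific layout from a standard one with the same dimensions, so a result such as "TD2" should be treated as a first guess only.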

There are also other specific formats depending on the country. For instance, French and Portuguese ID cards don’t follow these standards.

What sets the MRZ apart is that, in addition to storing your data in a machine-friendly layout, it adds checksum validation.

Key values such as the document number and the dates are each followed by a check digit that lets the machine verify that it read them properly. Of course, checksums are not infallible, but they significantly improve accuracy.
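
To make this concrete, here is a minimal sketch of the check digit calculation defined in ICAO Document 9303: each character is converted to a number (digits keep their value, A to Z become 10 to 35, the filler < counts as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The MrzCheckDigit class is a hypothetical helper written for this article, not a GdPicture.NET API.

```csharp
using System;

public static class MrzCheckDigit
{
    // Weights repeat in the order 7, 3, 1 over the characters of the field.
    private static readonly int[] Weights = { 7, 3, 1 };

    // Digits keep their value, A-Z map to 10-35, and the filler '<' counts as 0.
    private static int ValueOf(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
        if (c == '<') return 0;
        throw new ArgumentException("Invalid MRZ character: '" + c + "'");
    }

    // Returns the check digit (0-9) for one MRZ field.
    public static int Compute(string field)
    {
        int sum = 0;
        for (int i = 0; i < field.Length; i++)
        {
            sum += ValueOf(field[i]) * Weights[i % 3];
        }
        return sum % 10;
    }
}
```

For the document number L898902C3 used in the ICAO specimen passport, MrzCheckDigit.Compute("L898902C3") returns 6, which is the digit printed right after the number in the specimen MRZ.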

Specific cases

French ID cards also validate first names and last names. The French government decided that ensuring the validity of the names was as important as the integrity of the numbers, and as a result the cards don’t follow the global standard.

Portuguese ID cards also deviate from the standard MRZ format and use a different algorithm to validate the integrity of their ID number.

How to use MRZ with GdPicture.NET?

Now that you know the theory behind the MRZ, let’s dive into how to use it with GdPicture.NET.

Let’s assume you already have a working sample able to scan a document (if not, have a look at this documentation).

The only thing you need to do to switch from our general-purpose OCR to MRZ is to pass the special context as a parameter of the RunOCR method:

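// Attach the image to the OCR component, then run the engine in MRZ mode.
gdPictureOcr.SetImage(imageId);
string resultId = gdPictureOcr.RunOCR(OCRSpecialContext.MRZ);
// Retrieve the recognized MRZ text for this result.
string mrzRead = gdPictureOcr.GetOCRResultText(resultId, false);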

That’s it!
You get the OCR result as a string containing the MRZ value.
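
Once you have the MRZ string, you will usually want to split it into fields. The sketch below does this for a TD3 (passport) MRZ, using the field positions from ICAO Document 9303. It assumes the text consists of exactly two lines of 44 characters, which is an assumption about the input rather than a documented guarantee of GetOCRResultText. Td3Fields and Td3Parser are hypothetical names, not GdPicture.NET types.

```csharp
using System;

// Hypothetical container for a few TD3 fields; not a GdPicture.NET type.
public sealed class Td3Fields
{
    public string IssuingState { get; set; }
    public string Surname { get; set; }
    public string GivenNames { get; set; }
    public string DocumentNumber { get; set; }
    public string Nationality { get; set; }
    public string BirthDate { get; set; }   // YYMMDD
    public string Sex { get; set; }
    public string ExpiryDate { get; set; }  // YYMMDD
}

public static class Td3Parser
{
    // Assumes mrzText holds exactly the two 44-character lines of a TD3 MRZ.
    public static Td3Fields Parse(string mrzText)
    {
        string[] lines = mrzText.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        if (lines.Length != 2 || lines[0].Length != 44 || lines[1].Length != 44)
            throw new FormatException("Expected two lines of 44 characters.");

        // Line 1: document code (2), issuing state (3), then SURNAME<<GIVEN<NAMES padded with '<'.
        string[] names = lines[0].Substring(5).Split(new[] { "<<" }, 2, StringSplitOptions.None);
        string line2 = lines[1];

        return new Td3Fields
        {
            IssuingState = lines[0].Substring(2, 3).TrimEnd('<'),
            Surname = Clean(names[0]),
            GivenNames = names.Length > 1 ? Clean(names[1]) : "",
            // Line 2: fixed positions; the check digits at 9, 19, and 27 are skipped here.
            DocumentNumber = line2.Substring(0, 9).TrimEnd('<'),
            Nationality = line2.Substring(10, 3).TrimEnd('<'),
            BirthDate = line2.Substring(13, 6),
            Sex = line2.Substring(20, 1),
            ExpiryDate = line2.Substring(21, 6)
        };
    }

    // Filler characters separate name parts, so turn them into spaces.
    private static string Clean(string raw)
    {
        return raw.Replace('<', ' ').Trim();
    }
}
```

Calling Td3Parser.Parse(mrzRead) on the ICAO specimen passport gives the surname ERIKSSON, the given names ANNA MARIA, and the document number L898902C3.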

As always with OCR, the better your source (the scanned document), the more accurate the result. Do not hesitate to check our clean-up samples in the programming section of our documentation to improve your document’s quality.

Coming soon: MRZ validation

In a future release, the engine will also tell you whether the MRZ it read is valid.
In MRZ reading, a document is considered valid when all the checksummed data are consistent. The current specifications don’t validate a few parts of the MRZ (including names), which is why a document can still have a valid MRZ that contains inaccuracies.
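
To illustrate what this standard validity check involves, here is a minimal sketch for the second line of a TD3 (passport) MRZ, based on the check digit positions in ICAO Document 9303. It is not the upcoming GdPicture.NET validation feature, and Td3Validator is a hypothetical name. The line is accepted only if the document number, birth date, expiry date, personal number, and composite check digits all match.

```csharp
using System;

public static class Td3Validator
{
    private static readonly int[] Weights = { 7, 3, 1 };

    // Returns the check digit of a field, or -1 if it contains a character outside 0-9, A-Z and '<'.
    private static int CheckDigit(string field)
    {
        int sum = 0;
        for (int i = 0; i < field.Length; i++)
        {
            char c = field[i];
            int value;
            if (c >= '0' && c <= '9') value = c - '0';
            else if (c >= 'A' && c <= 'Z') value = c - 'A' + 10;
            else if (c == '<') value = 0;
            else return -1;
            sum += value * Weights[i % 3];
        }
        return sum % 10;
    }

    private static bool Matches(string field, char expected)
    {
        return expected >= '0' && expected <= '9' && CheckDigit(field) == expected - '0';
    }

    // Validates every check digit on line 2 of a TD3 MRZ (44 characters).
    public static bool IsLine2Valid(string line2)
    {
        if (line2 == null || line2.Length != 44) return false;

        string personalNumber = line2.Substring(28, 14);
        // An unused personal number field may carry '<' instead of a check digit.
        bool personalOk = (line2[42] == '<' && personalNumber.Trim('<').Length == 0)
                          || Matches(personalNumber, line2[42]);

        // The composite digit covers positions 1-10, 14-20, and 22-43 (1-based).
        string composite = line2.Substring(0, 10) + line2.Substring(13, 7) + line2.Substring(21, 22);

        return Matches(line2.Substring(0, 9), line2[9])      // document number
            && Matches(line2.Substring(13, 6), line2[19])    // date of birth
            && Matches(line2.Substring(21, 6), line2[27])    // date of expiry
            && personalOk
            && Matches(composite, line2[43]);
    }
}
```

Note that nothing on the first line, where the names are stored, is covered by any of these digits, which is exactly the gap described above.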

The GdPicture.NET MRZ engine will bring another level of security by validating this data. 

Meanwhile, please do not hesitate to share any document with a detection issue. You can send your files via our secured helpdesk, so our team can analyze them.

Stay tuned!

Etienne and Elodie
