Validating customers’ identity in the shortest possible time


In a world where every daily activity involves an exchange of data and information, verifying users’ personal identity plays a central role. Whether opening a bank account, applying for a loan or renting a safety deposit box, users are often asked to provide scans of their identification documents. Companies must therefore commit significant resources to verifying and cross-checking the documents and data they hold. Most companies simply outsource these verification activities to avoid the complexity of the task.


Let’s consider the different types of documents available: identity cards, regional service cards, driving licences, tax code cards and so on. Any of these documents could be used to access the service requested by the customer. This means extracting data from documents that are often provided in anything but optimal conditions. Scanned images can be of low quality, a single image may contain several documents, and some images are blurred in precisely the places that hold the information we are interested in. The sheer variety of possible cases makes the problem worth analysing and addressing in all its aspects.


To address this complexity, we have developed a solution that supports the automation of the identification process, making it effective, economical, accurate and suitable for the various scenarios described above.

  • First, using dedicated Neural Networks, we isolate the relevant fragments from the images provided during the initial scan, thus obtaining the data that requires processing.

  • These fragments then go through a pre-processing stage, in which suitable Computer Vision algorithms clean and filter the images so that the later stages receive input they can process efficiently.

  • Once filtered and selected, the images are submitted to a suite of Optical Character Recognition (OCR) systems, whose output is then cleaned, corrected and classified by means of advanced Text Processing and Machine Learning algorithms.

  • Thanks to this approach, the data of interest can be extracted automatically and reliably. This data can then be used as part of security procedures and in the verification of corporate identities, with the maximum possible saving in terms of time and resources.
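
To make the flow of these stages more concrete, here is a minimal Python sketch of how they could be chained. It is an illustration under stated assumptions, not our production code: the stage signatures (`Detector`, `Preprocessor`, `OcrEngine`), the `Extraction` record and the date-correction rule are all hypothetical, and the detection, Computer Vision and OCR stages are passed in as plain functions rather than implemented.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional

# Hypothetical stage signatures. Real stages would wrap a detection network,
# computer-vision filters and one or more OCR engines.
Image = bytes
Detector = Callable[[Image], List[Image]]   # scan -> document fragments
Preprocessor = Callable[[Image], Image]     # fragment -> cleaned fragment
OcrEngine = Callable[[Image], str]          # fragment -> raw text

# Letters that OCR commonly confuses with digits (illustrative, not exhaustive).
DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)
# Assumes dates are printed as DD/MM/YYYY, DD.MM.YYYY or DD-MM-YYYY.
DATE_PATTERN = re.compile(r"\b(\d{2})[./-](\d{2})[./-](\d{4})\b")


@dataclass
class Extraction:
    raw_text: str
    date_iso: Optional[str]  # first valid date found, as YYYY-MM-DD


def extract_date(raw_text: str) -> Optional[str]:
    """Replace digit look-alikes, then return the first valid date, if any."""
    candidate = raw_text.translate(DIGIT_FIXES)
    for day, month, year in DATE_PATTERN.findall(candidate):
        try:
            return date(int(year), int(month), int(day)).isoformat()
        except ValueError:
            continue  # e.g. 31/02/2020: matches the pattern but is not a date
    return None


def run_pipeline(
    image: Image,
    detect: Detector,
    preprocess: Preprocessor,
    ocr_engines: List[OcrEngine],
) -> List[Extraction]:
    """Detect fragments, clean them, then try each OCR engine in turn."""
    results = []
    for fragment in detect(image):
        cleaned = preprocess(fragment)
        raw, found = "", None
        for ocr in ocr_engines:
            raw = ocr(cleaned)
            found = extract_date(raw)
            if found is not None:
                break  # keep the first engine whose output validates
        results.append(Extraction(raw_text=raw, date_iso=found))
    return results
```

Trying the engines in order and keeping the first output that passes validation is only one possible way to combine a suite of OCR systems. Voting across engines or scoring each output by confidence would fit the same structure.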


Even on particularly complex datasets, characterised by low-quality images of documents that are difficult to process, such as paper identity cards, the minimum accuracy obtained from the pipeline is around 70%, rising to nearly 100% on more structured documents with better image quality.

Given that validation by an outsourcer can cost 10-20 cents or more per document, adopting our automated solution can generate significant savings even at volumes of a few tens of thousands of documents per month.
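
The arithmetic behind this claim can be sketched in a few lines of Python. The 10-20 cent range and the monthly volume come from the figures above. The per-document cost of the automated solution is not stated here, so `automated_cents_per_document` is a purely hypothetical parameter and the value used in the example is a placeholder, not a real price.

```python
def monthly_cost(documents_per_month: int, cents_per_document: int) -> float:
    """Monthly spend in whole currency units for a given per-document cost."""
    return documents_per_month * cents_per_document / 100


def monthly_saving(
    documents_per_month: int,
    outsourced_cents_per_document: int,
    automated_cents_per_document: int,  # hypothetical: not given in the text
) -> float:
    """Difference between outsourced and automated monthly spend."""
    return monthly_cost(
        documents_per_month, outsourced_cents_per_document
    ) - monthly_cost(documents_per_month, automated_cents_per_document)


# Outsourcing baseline for 30,000 documents per month at 10 and 20 cents each.
low = monthly_cost(30_000, 10)    # 3000.0
high = monthly_cost(30_000, 20)   # 6000.0

# Placeholder automated cost of 5 cents, used only to show the calculation.
example_saving = monthly_saving(30_000, 15, 5)  # 3000.0
```

At 30,000 documents per month, outsourcing alone therefore costs between 3,000 and 6,000 currency units per month, which is the amount against which any automated solution should be compared.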