How graph databases improve fraud detection

A new generation of fraud detection applications promises to overcome the shortcomings of current ones. Combatting fraud is an annoying and expensive hassle for everyone:

Credit cardholders don’t want the bother of listening to Muzak on a card issuer’s 800 number to report fraudulent charges, and then having to switch to a new card.
Merchants don’t want chargebacks undermining their profitability.
Card issuers don’t want fraudulent charges reducing their net income.

Fraud also increases the cost of operating the payment system for everyone:

Credit cardholders pay higher fees and interest charges than they’d like to.
Merchants pay higher transaction fees than they’d like to.
Card issuers charge higher interest rates, which no one likes, to cover their losses.
Police forces spend part of their budgets investigating and charging fraudsters.

Since none of us wants to wait any longer than we already do to complete a transaction, fraud detection applications must make each approve or reject decision in well under a second.

Mr. Richard Henderson, EMEA Team Lead Solution Architect at TigerGraph, said that his team “built a fraud detection application using machine learning and a graph database to demonstrate that a 50 per cent increase in frauds detected with a low number of false-positives is feasible while delivering excellent real-time performance. TigerGraph is now implementing the solution at major card-issuing banks in the United States.”

Limitations of the current fraud detection applications

In addition to the fast response time requirement, fraud detection applications operate under further constraints. The application must:

Respond to tens of thousands of authorization requests concurrently.
Track the approvals and rejections.
Operate with high availability to minimize fraud losses due to outages or degraded operation.
Operate within a budget that doesn’t scare the management of the card-issuing bank.

Because of these constraints, the fraud scoring systems in current fraud detection applications can use only the four immediately available transaction variables, plus a few fairly simplistic static rules, to make each approve or reject decision, as illustrated in the table below.

The rows in the table list all the variables that can be relevant to making each approve or reject decision. The numbers in the cells of the table show the relative importance of each variable.
The table illustrates that current fraud detection applications are severely limited because they can’t even consider the five most important variables. The limitations leave a lot of fraud undetected.
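To make the limitation concrete, here is a minimal sketch of a static rule-based scorer that sees only the current transaction. The article does not name the four immediately available variables, so the fields below (amount, merchant category, country, card-present flag), the rules, the point values, the home country and the reject threshold are all hypothetical illustrations, not any card issuer's actual rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    # Hypothetical stand-ins for the four immediately available variables.
    amount: float
    merchant_category: str
    country: str
    card_present: bool

HOME_COUNTRY = "CA"     # hypothetical: the cardholder's home country
REJECT_THRESHOLD = 60   # hypothetical cut-off score

# Hypothetical static rules: (description, predicate, points added if it fires).
RULES = [
    ("large amount", lambda t: t.amount > 1000.0, 40),
    ("high-risk merchant category",
     lambda t: t.merchant_category in {"electronics", "gift_cards"}, 30),
    ("foreign transaction", lambda t: t.country != HOME_COUNTRY, 20),
    ("card not present", lambda t: not t.card_present, 20),
]

def static_rule_score(txn: Transaction) -> int:
    """Sum the points of every rule that fires; no history is consulted."""
    return sum(points for _, rule, points in RULES if rule(txn))

def decide(txn: Transaction) -> str:
    """Approve or reject using the static score alone."""
    return "reject" if static_rule_score(txn) >= REJECT_THRESHOLD else "approve"
```

Because every rule looks only at the transaction in hand, the scorer cannot tell whether, for example, this card has made ten purchases in the last hour. That is the kind of information the calculated variables supply.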

Advantages of graph database fraud detection applications

Production-quality, highly scalable fraud detection applications that combine machine learning with a graph database can materially improve fraud detection and thereby reduce fraud losses for everyone.

Calculated variables

As illustrated in the table above, a graph database fraud detection application can use twelve calculated variables, on top of the immediately available transaction variables, to make each approve or reject decision without increasing response time. The period over which data is aggregated for the calculated variables ranges from 7 to 30 days.
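The article does not list the twelve calculated variables. The sketch below shows what windowed aggregates of this kind might look like, assuming a simple in-memory transaction history. The feature names (`txn_count_7d`, `amount_sum_7d`, `distinct_merchants_30d`, `avg_amount_30d`) are hypothetical examples chosen to match the 7- to 30-day windows the article mentions.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple

class PastTxn(NamedTuple):
    card_id: str
    merchant_id: str
    amount: float
    timestamp: datetime

def calculated_variables(history: Iterable[PastTxn], card_id: str, now: datetime) -> dict:
    """Aggregate one card's recent history into hypothetical windowed features."""
    week_ago = now - timedelta(days=7)
    month_ago = now - timedelta(days=30)
    # Keep only this card's transactions from the last 30 days.
    recent = [t for t in history
              if t.card_id == card_id and month_ago <= t.timestamp <= now]
    # Subset of those that fall in the last 7 days.
    last_7 = [t for t in recent if t.timestamp >= week_ago]
    return {
        "txn_count_7d": len(last_7),
        "amount_sum_7d": sum(t.amount for t in last_7),
        "distinct_merchants_30d": len({t.merchant_id for t in recent}),
        "avg_amount_30d": sum(t.amount for t in recent) / len(recent) if recent else 0.0,
    }
```

This toy version scans the whole history on every call. The article's argument is that such aggregates must be produced for every authorization request within the real-time budget, which is where the choice of database matters.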

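Computing these variables means selecting and aggregating up to 30 days of related transaction data for every authorization request. No relational database can select and aggregate the data required for the calculated variables and still achieve the real-time response target.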
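A graph database stores relationships directly, so a query can start at one card and follow only the edges attached to it, rather than joining large tables. The toy sketch below imitates that access pattern in memory. Cards and merchants are vertices, and each transaction is an edge recorded in both vertices' adjacency lists. The feature it computes, the number of other cards that used the same merchants within the window, is a hypothetical example of a relationship-based variable. This is not TigerGraph's query language or API.

```python
from collections import defaultdict
from datetime import datetime, timedelta

class TransactionGraph:
    """Toy in-memory stand-in for a graph database (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.card_edges = defaultdict(list)      # card_id -> [(merchant_id, timestamp)]
        self.merchant_edges = defaultdict(list)  # merchant_id -> [(card_id, timestamp)]

    def add_transaction(self, card_id: str, merchant_id: str, timestamp: datetime) -> None:
        # Store the edge on both endpoints so traversal works in either direction.
        self.card_edges[card_id].append((merchant_id, timestamp))
        self.merchant_edges[merchant_id].append((card_id, timestamp))

    def cards_sharing_merchants(self, card_id: str, now: datetime, days: int = 30) -> int:
        """Two-hop traversal card -> merchants -> other cards, limited to the window."""
        since = now - timedelta(days=days)
        # Hop 1: merchants this card used within the window.
        merchants = {m for m, ts in self.card_edges[card_id] if since <= ts <= now}
        # Hop 2: other cards seen at those merchants within the window.
        others = set()
        for m in merchants:
            for c, ts in self.merchant_edges[m]:
                if c != card_id and since <= ts <= now:
                    others.add(c)
        return len(others)
```

The work done depends only on the edges near the starting card, not on the total number of transactions stored. That locality is the property the article relies on when it claims real-time performance.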

Machine learning

Machine learning promises to adapt to the changing dynamics of fraud. However, a relational database can’t supply the values of the calculated variables to machine learning models quickly enough. That shortcoming has so far prevented machine learning from realizing its potential in fraud detection.

By moving the fraud detection application to a graph database, card issuers can add the output of a machine learning model to each approve or reject decision while still meeting the real-time response target.
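The article does not describe the model or how its output is combined with the rules. The sketch below shows one simple possibility. A logistic model turns calculated variables into a fraud probability, and a transaction is rejected if either the static rule score or the model probability crosses its threshold. The weights, bias, feature names and thresholds are hypothetical and hand-picked; a production model would be trained on labelled transactions.

```python
import math

# Hypothetical hand-picked weights over hypothetical calculated variables.
WEIGHTS = {
    "txn_count_7d": 0.3,
    "amount_sum_7d": 0.002,
    "distinct_merchants_30d": 0.1,
    "cards_sharing_merchants_30d": 0.05,
}
BIAS = -4.0

def model_fraud_probability(features: dict) -> float:
    """Logistic model: sigmoid of a weighted sum of the features (missing ones count as 0)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def decide(rule_score: int, features: dict,
           rule_threshold: int = 60, prob_threshold: float = 0.5) -> str:
    """Reject if the static rules or the model flag the transaction; otherwise approve."""
    if rule_score >= rule_threshold:
        return "reject"
    if model_fraud_probability(features) >= prob_threshold:
        return "reject"
    return "approve"
```

Keeping the static rules as a first check and adding the model as a second signal is only one design choice. Other options include feeding the rule score into the model as a feature or letting a trained model make the whole decision.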

First published at IT World Canada