
Detect and Prevent Fraud Using Graph + Machine Learning

Learn how graph technology plus machine learning can help identify risk and fraud patterns so that you can respond quickly. Many new fraud rings use sophisticated measures for credit card fraud and other schemes. Using Graph + ML lets you see beyond individual data points and uncover difficult-to-detect patterns. Join us to learn how to make the most of your time and resources with a graph database versus traditional relational database platforms.

We all know that fraud affects all of us; it's really an unseen cost. Whether it's the billions of dollars fraudsters are trying to get at, or money laundering, check kiting, or ACH transfers, what we want to talk about is that it's happening and the fraudsters are getting more and more sophisticated. The key takeaway from what Graham is going to show us today, in conjunction with what we can do with graph, is how we get there and how we do it. The other face of fraud is what we've all heard about in the news, whether it's insurance disputes or illicit medication prescriptions. It's hitting us all and it's hitting our families, and where we're trying to go with this is to use these technologies to get to the core of what happens and find ways to avoid it.

The other thing is these many faces of fraud, which is this contact surface. Once we see what it is, it can be in any location: money laundering, check kiting, theft, or synthetic identities. In many cases there are now combinations, so we're seeing groups or rings that are really focusing on that. It really doesn't matter what industry you're in; the unfortunate problem is that fraud is proliferating through our electronic channels. But there are some commonalities, and that's what we're going to talk about today: the actors, the locations, the different devices, and the activities. Those give us attributes that lend themselves so well to graph and machine learning, and that's what we're really going to grab on to today.
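As a small illustration of how shared devices and locations turn into graph structure, here is a minimal sketch, assuming a hypothetical list of (account, attribute) observations. It links any two accounts that share an attribute (a device fingerprint, a mailing address) and returns the connected groups; a large group of accounts tied together this way is one simple signal of a possible ring. The account IDs and attribute values are made up.

```python
from collections import defaultdict


def linked_groups(observations):
    """Group accounts connected through shared attributes.

    observations: iterable of (account_id, attribute) pairs. Two accounts are
    linked if they share any attribute; groups are the connected components.
    """
    by_attr = defaultdict(set)
    for account, attr in observations:
        by_attr[attr].add(account)

    parent = {}  # union-find forest over accounts

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for accounts in by_attr.values():
        accounts = sorted(accounts)
        find(accounts[0])  # register single-account attributes too
        for other in accounts[1:]:
            union(accounts[0], other)

    groups = defaultdict(set)
    for account in list(parent):
        groups[find(account)].add(account)
    # Largest groups first: they are the ones most worth a closer look.
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)


obs = [
    ("acct1", "device:A"), ("acct2", "device:A"),
    ("acct2", "addr:12 Elm St"), ("acct3", "addr:12 Elm St"),
    ("acct4", "device:B"),
]
print(linked_groups(obs))  # [['acct1', 'acct2', 'acct3'], ['acct4']]
```

Union-find keeps the grouping close to linear in the number of observations, which matters once the attribute table covers millions of accounts.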

Once we have those traits and know what that data looks like, we can use the power of these tools to analyze them and get out in front of them, so we can be proactive. One of the things we're starting to see is that the smarter fraudsters, if you will, are realizing they can't be the outliers anymore. No longer can they go get some small amount of money or do things off on the fringes; they've realized that fraud detection has gotten better. That's what drives the need for graph: taking this traditional one-dimensional problem in a relational structure and putting it into more of a three-dimensional structure, so we can actually look at how those networks are put together and flatten that. The ultimate point is to create dynamic capabilities so we can make these decisions on a real-time basis and move from the way we've been doing things to the new platforms. And this is the other thing fraudsters have started to learn: structure. The structures they build are very complex.

When we start to visualize some of these things, we can see what these patterns look like and how they start to hide in plain sight, for lack of a better word. We're going to spend some time looking at what these look like, but this is why graph and visualization become very powerful: from visual recognition alone you can see some things, but you can't quite see the whole picture, if you will. We need some help to do that. That brings us to the point: graph is really good for fraud detection, and we want to dial that up a notch. So what is it about graph that lets us do that in this application? We see our world on the left, with the things that matter in your world, and many of you joining us today live in this environment: SAP and Oracle, custom SQL applications, or perhaps you've moved to Hadoop or made whatever other structural changes. There's an infinite number of custom solutions and data sources, and our goal with a graph application is, as best we can, to leave most of that data where it is. What we really want to do is put together a three-dimensional view of a customer, of a pattern, of a source, or of a credit card.

The first step, in yellow, is to map those data sources. We may choose to move some of that data forward to make our selections faster, but on the whole our governing principle is to take the source schema, start to flatten it, and map it. The next step is entity resolution, which Graham will talk about a little, and it's our first view of machine learning. Now that we've started to build up all of the customers that exist out there, the different companies, the patterns, the credit cards and so on, the question is how to make the right associations. If it's Scott Heath, and Scott Heath has friends, or my dad, Scott Heath Senior, how do I make those different associations? We use machine learning, where we can, to identify those groupings with much greater certainty and to build them in a graph engine, which is this Oreo-cookie layer in the middle and is terribly important. We use technology elements inside the graph to show this representation in a holistic view: not just what Scott Heath has done in the past, but which credit cards I've used, mortgages, and other aspects, to put the whole picture together. Then we come back to machine learning on the right, which uses that graph for additional thoughtful processing: we learn from the graph and create the applications on the right. In the upper right-hand corner we see our customer; let's all wave to them. Ultimately, all of this technology has to be visible and usable by humans, and in some cases they don't necessarily care about all the great technology we just talked about. They just want to know: how can I get my job done, how can I stop fraud, and what can I do? So this is a good logical picture, and if you keep it in the back of your mind as we move forward today, it will be pretty helpful when we get to this meta model.
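The Scott Heath versus Scott Heath Senior case shows why name matching alone is not enough for entity resolution. Below is a minimal sketch, assuming a hand-written rule (name similarity via Python's `difflib.SequenceMatcher` plus an exact date-of-birth check) as a stand-in for the learned matcher described above; the threshold, field names, and records are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, strip punctuation, and sort tokens so 'Heath, Scott' matches 'Scott Heath'."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def name_similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_entity(rec_a, rec_b, threshold=0.85):
    """Hypothetical rule standing in for a learned model:
    names must be similar AND dates of birth must agree."""
    return name_similarity(rec_a["name"], rec_b["name"]) >= threshold and rec_a["dob"] == rec_b["dob"]


# Made-up records for illustration only.
son = {"name": "Scott Heath", "dob": "1965-01-01"}
same_person = {"name": "Heath, Scott", "dob": "1965-01-01"}
father = {"name": "Scott Heath Sr.", "dob": "1940-01-01"}

print(same_entity(son, same_person))  # True
print(same_entity(son, father))       # False: similar name, different person
```

In a real pipeline the fixed threshold and single extra field would be replaced by a model trained on many attributes (addresses, devices, accounts), which is where the machine learning earns its keep.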

The key concept we constantly share with our customers is multiple data sources. Whether it's ten different customer systems or an invoicing system, or, in our case here with banking and fraud, multiple accounts, all of those make up the branches at the bottom of the Rosetta Stone. What we're attempting to do in graph is put together a singular picture of all of the events, whether for a customer, an account, or a bigger picture, so we can start to do some very intelligent and thoughtful things with it. The key is this Rosetta Stone, or master translation, in the middle, where we take our old systems and map them into the new world of graph and its technology. The reason is that in the older world of SQL (and we love the older systems; they continue to work and will probably never go away), some things aren't done very well. In the system on the left we can see one-to-many and many-to-many relationships, and those ultimately manifest themselves in very large, cumbersome code. Graph, by contrast, flattens those kinds of searches into a very simple view: instead of writing complex queries, I can do some very intelligent and very fast things simply by looking at the network around me, or at proclivities I may have seen in the graph database. The point is that I can do some things very fast.
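To make the "Rosetta Stone" step concrete, here is a minimal sketch, assuming two hypothetical source tables (customer-to-account and account-to-merchant rows) that are mapped into one undirected adjacency structure. A question such as "which customers used a merchant that customer C1 also used?" then becomes a single breadth-first traversal of four hops, where SQL would typically need a chain of joins that grows with each hop. Table contents and IDs are illustrative.

```python
from collections import defaultdict, deque

# Hypothetical rows pulled from two separate source systems.
core_accounts = [("C1", "A1"), ("C2", "A2"), ("C2", "A3")]  # (customer, account)
card_txns = [("A1", "M1"), ("A2", "M1"), ("A3", "M2")]      # (account, merchant)


def build_graph(*edge_tables):
    """Map every source row to an undirected edge: a tiny 'Rosetta Stone' step."""
    graph = defaultdict(set)
    for table in edge_tables:
        for a, b in table:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def within_hops(graph, start, max_hops):
    """Breadth-first search: hop distance to every node reachable in <= max_hops edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]
    return dist


g = build_graph(core_accounts, card_txns)
# customer -> account -> merchant -> account -> customer is 4 hops.
print(within_hops(g, "C1", 4))
```

Adding a third source system only means passing another edge table to `build_graph`; the traversal code does not change, which is the flexibility the talk contrasts with brittle SQL joins.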

We can also run these queries, and build visualizations, in a much shorter period of time. That gets us to my favorite, which in this case is a very large set of inner and outer joins simply trying to do an HR-style rollup or lookup inside a SQL system. You can still do it with many of the technologies out there, but it consumes time, and in many cases it becomes very brittle. And if I don't know the query pattern in advance, it's very difficult to even start in SQL. Again, the point is that we can do things better. The other thing we see when we start to touch graph is this: in our little corporation here, Dwayne is trying to buy tickets, but our fraudster on the outside has sent a false advertisement and is able to steal that credit card number. In this kind of flat graph environment I can do some very intelligent things because of the way the data model works. Number one, it looks like a logical data model, when in fact it actually joins the data and lets us check degrees, clustering, similarity, and dependencies. This is what Graham is going to spend more time on when we get into the machine learning aspect, but the point is that we must get the data into a flat graph form to do those things. Once it's in that graph data model, or schema, we can start to do some really powerful things: look for dependencies; use clustering; measure the similarity of different things, whether past or present, so we can look at those patterns; do matching, meaning where those things are and how we highlight and identify them (we'll talk about visualization later); and, more importantly, understand centrality and how these things flow together. These are all concepts that are very difficult in a SQL world but very simple and extremely powerful in a graph world. Then, Great Scott, look: we've taken all those wonderful things in the graph on the left and put them all together, and I've created this great big, as we affectionately refer to it, hairball. In an overview like that, it's too much; a human can't see where the subtleties are or see some of these transactions. You can drill in, and we can do some things to help, but it's still quite difficult. Ultimately, what we're looking for is an enhanced user interface that's meaningful and usable. In this particular case we have visual cues, such as 80% tiles, that say, hey, perhaps there's something here my user should spend some quality time investigating. The goal is to create this kind of activity with graph and machine learning, something quite a bit more useful. So, with the concept that graph is good and machine learning is great, I'll hand it over to Graham, and he'll walk us through the machine learning portion.

Yes, indeed. We have several different types of analysis we can run on graph, and machine learning spans a big portion of those analyses. Before I get into specifics, I want to point out the link at the bottom of the screen to our machine learning resources; you can click it or visit it to access all of the code and ideas we've collated there. It's a pretty powerful resource. This slide suggests that graph is a great way to gain insight from your data using the relationships between samples, not just the properties of each sample or record. When we apply machine learning to this system, and to any graph database structure, we can extract these insights in an automated way and deliver them intelligently to our users.
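Several of the measures named above (degree, clustering, similarity, centrality) are short computations once the data is a graph. The sketch below assumes an undirected graph stored as a dict of neighbour sets and picks one common definition of each: the local clustering coefficient, Jaccard neighbourhood similarity, and degree centrality (chosen for brevity; betweenness centrality would require all shortest paths). These are standard textbook definitions, not necessarily the exact metrics used in the webinar's product.

```python
from itertools import combinations


def clustering(graph, node):
    """Local clustering coefficient: share of a node's neighbour pairs that are themselves linked."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2 * linked / (k * (k - 1))


def jaccard(graph, a, b):
    """Neighbourhood similarity: shared neighbours divided by all neighbours of a and b."""
    union = graph[a] | graph[b]
    return len(graph[a] & graph[b]) / len(union) if union else 0.0


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


# Toy undirected graph: a triangle a-b-c with d hanging off a.
g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(clustering(g, "a"), clustering(g, "b"))  # 0.333... and 1.0
print(jaccard(g, "b", "c"))                    # 0.333...
print(degree_centrality(g))
```

Each function only looks at a node's immediate neighbourhood, which is why these checks stay fast in a graph store while the equivalent SQL needs self-joins on an edge table.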
We're going to talk about a couple of different analysis patterns using machine learning in the graph world. The first uses traditional deep learning methodologies to uncover individual and organizational fraud. On the upper right-hand side we have a three-dimensional representation of a graph in which the nodes represent the individuals in the network and the edges represent financial transactions, perhaps time-stamped ones. A data scientist might look at this problem and say, no problem, I'll just take the data out of the graph and apply my standard machine learning techniques, like deep learning. In fact, that doesn't work well, because it doesn't leverage the power of the connections between the nodes. What we do instead is use the same or similar deep neural network architectures, but embed the graph into a lower-dimensional space to feed its information into those networks. If you're interested in learning more about neural networks as they relate to graph, keep following our website; we already have three or four articles out on applications of this technology, including code and specifics on how to run it.

This is exactly that: an analysis of individual credit card fraud detection. In the graph on the left we have a so-called super node graph, an assemblage of entities, each being one individual's financial transactions using a particular credit card. We popped one of these super nodes open on the right, and what you see in the star graph on the right is the financial transactions conducted by that individual, with two labeled suspected fraud cases. This obviously would not be the end-user UI, but it is how the technology works behind that UI. Once we detect these suspected fraud cases, we highlight them in a UI so an expert user can review them more carefully, verify whether they're real, and decide whether they need to be bumped up to the next investigative level. Again, this is standard deep learning methodology applied to graph-structured data.

This slide uses the same types of architectures, except that rather than flattening the graph as in the previous case, we take out pieces of the graph, so-called subgraphs, and calculate what are called topology metrics on each one. That's just a fancy term for how the thing looks geometrically: all of the metrics Scott mentioned earlier, such as betweenness, clustering, and similarity. We take all of these metrics from each subgraph and put them into our deep learning algorithm. The reason we care is that now we're not just looking at individual fraudulent transactions; we're beginning to analyze structures, networks of communities who are potentially committing fraud. This could be, for example, a money-laundering ring. I put this graph example together to highlight a case of money laundering: at the bottom right of the graph is a group of yellow companies that are very densely connected to each other and connected in only a few spots to the outside communities in the network. So the question is: is that a real group of companies, or a money-laundering ring that's appearing to act normally? As Scott mentioned, it's fairly simple for fraudsters nowadays to approximate normalcy in the properties of their nodes, for instance corporate revenues, addresses, and names. What's very hard to forge is the topology of the network. You can imagine one person sitting behind a computer forging the names of several companies and creating false identities, but when those entities actually start interacting with other entities, you can pick up in the graph structure that those interactions don't seem normal or realistic. That is something you can't do with relational databases, and it's the reason we use graph for this type of ring detection.

Here's an implementation: a schematic of a deep neural network doing this ring, or organizational, fraud detection. On the left-hand side we have an example corporate graph; this is real data being fed into the neural network. The output from the deep learning system goes into an accuracy calculation, and eventually we get an output, here a fraud potential measurement. Again, this is not the actual UI, but it would then be fed back into the UI designed by Expero.

So we've talked about a couple of ways to analyze both individual and organizational fraud, and in both cases we were talking about so-called supervised learning: those models are trained, and updated in real time, on examples that expert users have highlighted in historical transactional data. Now let's talk about a couple of unsupervised learning techniques for fraud analysis using graph. I mentioned that graph geometries are very hard to fake, and we want to leverage that in a completely automated way. We do that using sophisticated clustering algorithms (a couple of them are mentioned on the left), and we can start to build up detail about the types of companies that are organized in rings and in various geometric settings inside the graph. We can see a couple of different groups here: some of them look like normal operational entities, businesses and companies, and then we have the small fraudster ring at the bottom center. So the question is, can we use these fully automated algorithms to derive any insight? As it turns out, the answer is yes. On the left you see a big blob of graph that is completely unstructured; all of those entities are nodes. This is the same kind of financial transaction graph: the edges are financial transactions, and the nodes are individuals or corporations transacting money back and forth. We use one of these very simple unsupervised learning techniques to build up community clusters in that graph and then weight them, so that when we view the graph we see both a grouping of companies that all look alike and a scored output. The color in the graph on the right is the risk potential for each company based on its associations with other companies in the graph.

There's one more topic I want to touch on before I give it back to Scott: using a cutting-edge, highly sophisticated architecture to detect both individual and organizational fraud. There are many architectures leveraging machine learning that can solve these problems, and we touched on a few in the previous slides, but this one in particular is built specifically for this task, and we have found huge accuracy improvements using it. The architecture is called a graph convolutional network. Again, visit the Expero website and check out the blog section; I think we have two blogs up on this architecture right now, and one includes the math and equations, which you can skip to go straight to the overview. Here's a reference from an article we saw recently, which said that 118 billion dollars was lost last year to false-positive fraud declines. What's happening is that these companies, Visa and MasterCard, are cutting users off from using their credit cards at reputable stores. You know the moment: you're in the checkout line and your card doesn't work. That's where those billions are lost, so we're going to try to eliminate some of those false positives, and this is the system we've used to do that. At the bottom there, I mention again that this works for both individuals and organizations, and we'll see that briefly. This is another schematic of how the implementation works: we have a graph at number one; we extract information from it, maybe about an organization or maybe just an individual; we build a so-called feature vector from that information; we input the entire graph into the graph convolutional network architecture; and we get labels back out.
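The talk does not give the network's equations, so the sketch below assumes the widely used propagation rule for graph convolutional networks (Kipf and Welling): H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix, H holds each node's feature vector, and W is a weight matrix that training would normally learn. It shows a single layer in plain Python to make the "feature vectors in, graph structure mixed in" idea concrete; a real model stacks layers and learns W.

```python
import math


def gcn_layer(adj, features, weights):
    """One graph-convolution step, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n 0/1 adjacency matrix; features: n x f node features (H);
    weights: f x h matrix (W), normally learned during training.
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f, h = len(weights), len(weights[0])
    # Aggregate: normalized weighted sum of each node's own and neighbours' features.
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(f)] for i in range(n)]
    # Transform with W, then apply ReLU.
    return [[max(0.0, sum(agg[i][c] * weights[c][j] for c in range(f))) for j in range(h)] for i in range(n)]


# Two connected nodes: node 0 has feature 1.0, node 1 has 0.0.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [0.0]], [[1.0]]))  # [[0.5], [0.5]]
```

The example shows the key property: after one layer, node 1's representation already reflects its neighbour, so labels predicted from these outputs depend on graph structure and not only on each record's own properties.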

What we find is that we can use the entire graph structure, including the topologies and geometries I talked about before, to build very accurate models. This is the exact same architecture and the exact same data you saw before, but in this case we're again building a so-called super node graph: we take the community clusters colored on the left and collapse them down into the super node graph on the right, and then we deploy our graph convolutional network on that super node graph. What we have found is that, every time we've implemented this, the accuracy figures have been a good deal higher than with standard deep learning methodologies. Thank you, Scott.

All right, so if you've been following along: graph is good, machine learning is great, but adding the third piece, visualization, changes the game entirely.
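The super node collapse Graham describes, together with the "dense inside, few links outside" pattern of the yellow companies, can be sketched as follows. This assumes community labels have already been produced by some clustering algorithm (not shown; any community-detection method could supply them). The sketch counts internal edges per community, builds the weighted edges between super nodes, and reports each community's internal density and number of outside links. The companies and edges are hypothetical.

```python
from collections import Counter
from itertools import combinations


def collapse(edges, community):
    """Collapse communities into super nodes.

    edges: undirected (u, v) pairs; community: node -> community label.
    Returns per-community internal edge counts and weighted super-node edges.
    """
    internal, between = Counter(), Counter()
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            internal[cu] += 1
        else:
            between[tuple(sorted((cu, cv)))] += 1
    return internal, between


def ring_signals(edges, community):
    """Density (internal edges / possible edges) and outside-edge count per community."""
    internal, between = collapse(edges, community)
    sizes = Counter(community.values())
    external = Counter()
    for (a, b), weight in between.items():
        external[a] += weight
        external[b] += weight
    signals = {}
    for label, n in sizes.items():
        possible = n * (n - 1) / 2
        signals[label] = {
            "density": internal[label] / possible if possible else 0.0,
            "external_edges": external[label],
        }
    return signals


# Hypothetical example: four "yellow" companies all trading with each other,
# four "blue" companies in a chain, and a single edge between the two groups.
yellow, blue = ["y1", "y2", "y3", "y4"], ["b1", "b2", "b3", "b4"]
edges = list(combinations(yellow, 2)) + [("b1", "b2"), ("b2", "b3"), ("b3", "b4"), ("y1", "b1")]
community = {**{n: "yellow" for n in yellow}, **{n: "blue" for n in blue}}
print(collapse(edges, community)[1])  # Counter({('blue', 'yellow'): 1})
print(ring_signals(edges, community))
```

The weighted `between` counter is the super node graph itself; in the approach described in the talk, a graph convolutional network would then run on that smaller graph, with per-community measures like these among its input features.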

We now have to complete that circle by showing users what they should spend their time on versus what they shouldn't worry so much about, and that's where we get into how we do those things. In this case we have a positive machine learning result: we've noticed a trend and identified a positive use case, potentially recommending a different credit card based on travel or spend patterns. The negative case is someone whose activity, whether on a credit card or in some cases a combination of things, indicates to a user that they should be talked to. Here we're also showing that someone is potentially at risk of leaving our credit card company, so we should be careful with them and proactively reach out, perhaps through a manager. We have other tools in our visualization tool belt. One is showing the graph traversal, as Graham mentioned: how do we use the power of graph, and how do we use this screen to show richness or, in some cases, relevancy and the other elements behind it, and share them with a human being? We can also do geospatial views: if multiple credit card transactions are happening across a landscape, how could the cardholder possibly be in multiple places at one time? We can show those to users and let them make decisions. You can also see where we've tied into traditional OLAP or other SQL reporting that gives us information. We can do timelines that show violations over time, and as these things happen, our ratcheting overall score goes up, which tells someone it's worth investigating. And we can show the combination of all of these things in one screen, where I can see regionality, scorecards, and risk factors overlaid. Again, the point is that a human being has to use this, so let's build something that's useful and conveys that information. This is another look, where many of these dashboards combine SQL information, NoSQL information, and previous legacy data, and we can do standard reports as well. These dashboards are typically a mixed application. Again, we now have all these tools in our tool belt to find and stop fraud. With that, what I'd like to do now is show you some of what we like to call the art of the possible.
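The geospatial check Scott mentions (a card cannot be in multiple places at one time) is often implemented as an "impossible travel" rule. Here is a minimal sketch, assuming each card use carries a timestamp in hours and a latitude/longitude; it computes great-circle distance with the haversine formula and flags consecutive uses whose implied speed exceeds a hypothetical limit of 900 km/h (roughly airliner cruising speed). The locations and times are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impossible_travel(uses, max_kmh=900.0):
    """Flag consecutive uses of one card whose implied travel speed exceeds max_kmh.

    uses: list of (time_in_hours, lat, lon). max_kmh is an illustrative assumption.
    """
    ordered = sorted(uses)
    flags = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(ordered, ordered[1:]):
        km = haversine_km(lat1, lon1, lat2, lon2)
        hours = t2 - t1
        if km > 0 and (hours <= 0 or km / hours > max_kmh):
            flags.append((t1, t2, round(km)))
    return flags


# Hypothetical card uses: Charlotte, then Raleigh 3 h later, then London 1 h after that.
uses = [(0.0, 35.23, -80.84), (3.0, 35.78, -78.64), (4.0, 51.51, -0.13)]
print(impossible_travel(uses))  # one flag, for the Raleigh -> London pair
```

Each flag could feed the ratcheting risk score on the timeline, so an analyst sees the score jump at the moment the impossible pair occurs.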

In this case I'm showing a dashboard for, in our example, a bank, with some items on the left where we can look for trends. What I'm doing is what we call treasure mapping: looking for categories of fraud. Here I'm looking for potentially fraudulent transactions in a regional area, perhaps North Carolina. Across the bottom is a timeline slider, so I can go back in time, and thanks to the in-memory computing capability I can do that at my fingertips, very fast. As I move through this, I can pull up a graph; in this case I'm looking for multiple disputes that may have happened against a central gas station. I only see a couple of these for one individual, and I'd like to look at this individual, who is marked red, again using that visual component. Now I can use the information here to see that Fran Farmington has a history of different transactions: many of them have been fine, and there have been some recent suspicious ones, but only one has actually been disputed. So in this case it might be that fraud is occurring at the gas station, or perhaps someone at the gas station is committing it. The other thing I mentioned previously is up here: I can see other items that have been flagged by machine learning. In this case we see a trend where he hasn't been using his credit card as much; maybe he's disgruntled, and we see that he's potentially a higher risk for churn. So when one of our analysts reaches out to him, perhaps we should chat with him about what he thinks of us; perhaps there's a sentiment or something else behind that.
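The reasoning in this walkthrough (Fran's history is mostly clean, so a dispute at the gas station points at the merchant rather than the cardholder) can be expressed as a simple rule. The sketch below is hypothetical: it flags a merchant when at least `min_cards` cards with low overall dispute rates have disputed charges there. The thresholds, card IDs, and merchant names are illustrative assumptions, not the dashboard's actual logic.

```python
from collections import defaultdict


def suspect_merchants(transactions, min_cards=2, max_card_rate=0.2):
    """Flag merchants where several cards with otherwise low dispute rates dispute charges.

    transactions: list of (card, merchant, disputed) tuples.
    min_cards and max_card_rate are illustrative thresholds, not tuned values.
    """
    total, disputed = defaultdict(int), defaultdict(int)
    disputers = defaultdict(set)
    for card, merchant, is_disputed in transactions:
        total[card] += 1
        if is_disputed:
            disputed[card] += 1
            disputers[merchant].add(card)
    rate = {card: disputed[card] / total[card] for card in total}
    return sorted(
        merchant
        for merchant, cards in disputers.items()
        if sum(1 for card in cards if rate[card] <= max_card_rate) >= min_cards
    )


# Hypothetical history: cards "F" and "K" are mostly clean, each with one dispute
# at the same gas station; card "X" disputes everything it does.
txns = (
    [("F", "grocer", False)] * 9 + [("F", "gas_station", True)]
    + [("K", "grocer", False)] * 5 + [("K", "gas_station", True)]
    + [("X", "mall_store", True)]
)
print(suspect_merchants(txns))  # ['gas_station']
```

Discounting cards that dispute everything keeps one habitual disputer from implicating a merchant, while several otherwise reliable customers disputing at the same place raises it.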

In this use case we've been able to do a couple of things beyond fraud. In the next one, we use a similar style of dashboard, and the power of the graph, the visualization, and the machine learning, to stop opioid over-prescription. Again there's a timeline along the bottom, where we can go backward and forward in time, and over on the right we're looking for prescriptions without doctor visits. We can see our old friend the graph and what these items look like. I'd like to click on this particular individual, and there's our friend Fran again. Here we can see that he's actually had two cases where prescriptions were written without a doctor visit. So again, what we're trying to show is the power of the graph, the flag we've raised at the top, and how we use these visual cues. The goal is to combine all three of these components, if you will, into useful dashboards that help human beings figure out where the fraud is. Okay, gosh, Scott, that all looks great; you and Graham put together a really compelling case.
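The "prescriptions without doctor visits" flag amounts to looking for a prescription edge that has no matching visit edge between the same patient and doctor. Here is a minimal sketch using plain lists instead of a graph, assuming a hypothetical 30-day look-back window; the patient, doctors, and dates are made up.

```python
from datetime import date, timedelta


def prescriptions_without_visit(prescriptions, visits, window_days=30):
    """Return prescriptions with no visit to the prescribing doctor by the same
    patient in the window_days before (and including) the prescription date.

    prescriptions, visits: lists of (patient, doctor, date) tuples.
    window_days is an illustrative assumption, not a clinical rule.
    """
    visit_days = {}
    for patient, doctor, day in visits:
        visit_days.setdefault((patient, doctor), []).append(day)
    window = timedelta(days=window_days)
    return [
        (patient, doctor, day)
        for patient, doctor, day in prescriptions
        if not any(day - window <= v <= day for v in visit_days.get((patient, doctor), []))
    ]


# Hypothetical records for one patient.
visits = [("fran", "dr_a", date(2020, 3, 1))]
prescriptions = [
    ("fran", "dr_a", date(2020, 3, 10)),  # visit 9 days earlier: not flagged
    ("fran", "dr_b", date(2020, 3, 15)),  # never saw dr_b: flagged
    ("fran", "dr_a", date(2020, 6, 1)),   # last visit is outside the window: flagged
]
for rx in prescriptions_without_visit(prescriptions, visits):
    print(rx)
```

In a graph model the same check is a pattern match (patient to doctor via a prescription edge, with no visit edge inside the window), and the flagged patients are the ones a dashboard would mark for review.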

What we typically see with our customers is that you may have already started on this journey somewhere. You may have already done some graph technology downloads, or started looking on the web and realized, hey, this graph thing is pretty good. Or you may have already started something in your business unit, maybe a pilot. Here at Expero, we're not too worried about where you've started; it's about what you might need some assistance with, and we're here to help. You'll notice that we can help you in any of these categories, whether it's talking to us about how to get started and what products to use, a proof of concept, or help if you're in the middle of the process. It all starts with gathering your business requirements, your use cases, and your technology requirements. What we typically see somewhere along this chain is a rapid prototype, and whether you do it on your own or call us for tips and tricks, we're happy to help. This is typically a good place to start: take the business element of what you're trying to do with the technology, and get your feet wet. That's really what we suggest: come see us at Expero online, look at some of our technology demonstrations, and certainly reach out if any of this is interesting or you need additional help. When we lay out a prototype, this is what we normally do: start with some rapid visuals, and part of that is really getting your hands around the size of the data. What we typically find is that some of the older user interfaces are useful, but they tend to break or become brittle when you're looking at terabytes of data, and some of the data sizes Graham touched on today can be problematic. So a little bit of the data, a little bit of the visualization, and a prototype can go a very, very long way.