Graph databases are a rising tide in the world of big data insights, and the enterprises that tap into their power realize significant competitive advantages. So how might your enterprise leverage graph databases to generate competitive insights and derive significant business value from your connected data? This webinar will show you the top five most impactful and profitable use cases of graph databases.
I'll start this morning with a quick tour of what great databases there are out there. The greatest of them all, as maybe some of you have guessed, is a database with something like 200 or 300 trillion connections in it. If you look in detail inside, you'll see that it's made of things called neurons and things called synapses. The synapses are the connecting things, and if your brain's in good shape you'll have two or three hundred trillion of those at least. If you're an Einstein-level person it'll be a lot more than that, because it's the connections in your brain that make it function and that differentiate super-intelligent people from the rest of us. It doesn't look much like a database, but it really is one, because it stores information and provides ways of processing it.
That's just one example of the sort of network we're seeing everywhere. Another is the medium we're using for this talk this morning. The Internet is of course a database of sorts. It's quite small compared to the brain, in that it only has about 8 million servers connected to it, but each of those servers can obviously be immensely complicated inside, so it's still a very important network. If you look at that pattern of branches and leaves, it's very typical of organizational structures that we see all over the natural and indeed the built world. We'll see it again in the next illustration, which shows the ancestry of corn.
This picture is taken from a talk given by one of our customers, the agrochemical company Monsanto, who are using databases to model the changes in the genetic structure of maize, or corn, and the characteristics associated with that genetic structure. They needed to do very intense modeling over many generations, and they came to graph databases because their initial attempt using relational databases ran out of steam at eight or nine generations. One of the characteristics they noticed with a graph database is that it doesn't matter how many generations you use or how many points you've got in your database: it just doesn't run out of steam in the same way. We'll look a little at why that might be.
Another great database lives within the Internet, and it illustrates that the server count I mentioned only begins to scratch the surface of the Internet's complexity. This is just one application running on the Internet; in fact it's just one person's profile. If you look in the middle you may see the name of one of the Neo engineers [name unclear], and this is his mapping of his own LinkedIn profile. The little clusters are different places where he's worked or gone to college or whatever. One thing you might notice is that the connections are extremely rich and meshed. In no way could you describe this as a hierarchy or implement it as one; it's a very complex interlinked profile. If we think about those sorts of data, and just about any other sort of data you're familiar with, you'll realize that the world has come a long way since relational databases were developed in the 1970s.
In the 1970s relational databases were built to deal with a different sort of problem from the one we face with the data that's everywhere today. They were built for, and still deal with, rows and columns of data: basically, data that can be entered through forms on computers. That has taken us a long way. When you think about what the world was like when those first relational databases were developed, that model has stood up remarkably well. But it's not great for everything, and in some senses it's reaching the end of the road as the proliferation of data becomes so extreme and the types of data organization become so varied. It's against that background that a large variety of NoSQL databases have been developed. Neo4j is one of them, and it has particular characteristics as a graph database. So if you think about this typical data organization, that's the sort of thing we're trying to make it possible to work with, process and get value out of within a Neo4j database.
The agenda for today: we want to talk about what graphs are, what they're good at and how to get started.
OK, what are graphs? Any database is a way of representing data. It's not the real world; it's a way of modeling and storing information about the real world. That's a problem that has existed since computers were invented, and before that, of course, with index cards and files. There are lots of ways of representing data. In a relational database you see tables, rows and columns, and that's very good for certain things. It's particularly good where data structures don't change very often, and where you're much more interested in the data points than in the relationships between them: as it says here, minimal connectivity between data points.
The graph database, the way of representing data that we're going to talk about today, is just a different structure, a different way of organizing your data. It's particularly good where data structures are changing all the time, where it's difficult to predict in advance which connections are going to be important, and where it's difficult to process those connections. It's good where you're living in an agile world where requirements develop all the time. And it's particularly good for problems where relationships in the data contribute to meaning and value.
When we were looking at the human brain, we saw that the quality of the brain increases as you add more relationships, more synapses, between the neurons. It's just the same in a lot of problem domains: the more you can exploit relationships, the more value you can derive.
We're now going to talk about Neo4j's particular data model. It's a graph database, but a particular sort of graph database. We'll start with the very basic elements that make up the model. First of all we have what are called nodes. A node is any sort of data point; I often say a node is just a thing. Nodes are connected by relationships. You can have as many relationships as you like, and they can be self-referential, so a relationship can loop back to the same node. You can traverse a relationship in either direction. Those are the basic elements of a graph.
In Neo4j you'll find a couple of very powerful but still very simple enhancements to that model. The first is a concept called a label. You can label different types of node so that you can process them differently; in this example you might want to handle all the roads, or all the traffic lights, or all the traffic. Labels are powerful things. You can have as many labels as you like associated with a node, and just because you've assigned a label to a node doesn't mean that node has to have a lot in common with another node with the same label. We'll see that in a second. We've also added types to the relationships.
Then we've added what we call properties. Here we've got a property associated with one of the nodes, in this case a name, and a property associated with one of the relationships. You can have as many properties as you like on a node and as many as you like on a relationship. They all follow a straightforward key-value structure. And just because two nodes have the same label (two people might both have a label of Person) doesn't mean they need to have the same properties.
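The four elements just described (nodes, relationships, labels and properties) can be sketched in a few lines of plain Python. This is only an illustration of the model, not how Neo4j stores data internally; the class names, the sample people and the `neighbours` helper are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # A node can carry any number of labels and any number of key-value properties.
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    # A relationship has a start node, an end node, a type and its own properties.
    start: Node
    end: Node
    rel_type: str
    properties: dict = field(default_factory=dict)


# Two nodes share the Person label but do not share the same set of properties.
ann = Node({"Person"}, {"name": "Ann"})
dan = Node({"Person", "Driver"}, {"name": "Dan", "born": 1975})
car = Node({"Car"}, {"brand": "Volvo"})

relationships = [
    Relationship(ann, dan, "KNOWS"),
    Relationship(dan, car, "DRIVES", {"since": 2011}),
    Relationship(ann, ann, "MANAGES"),  # self-referential relationship
]


def neighbours(node, rels):
    """Return the nodes one relationship away, following it in either direction."""
    found = []
    for rel in rels:
        if rel.start is node:
            found.append(rel.end)
        elif rel.end is node:
            found.append(rel.start)
    return found
```

Calling `neighbours(dan, relationships)` returns Ann and the car, even though one of those relationships points towards Dan and the other away from him.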
That's the whole of the Neo4j graph model: nodes and relationships, labels and properties. It's very simple, but it's also very extensible. It's easy to add new nodes and new concepts. Here we've got some people, then a new concept, which is a university, and another new concept, which is a place of employment. You can go on adding nodes and new types of relationships as much as you like. That's why we say it's very suitable for an agile environment with an evolving data model: you can add new concepts and new nodes dynamically at any time.
Now, if you look at this model you might think: that's quite rich, it's nice, I can understand it, but it's no big deal. This is just the small example we're using to describe it. If you look a little further, you'll see that that little pattern might be one among millions, tens of millions, hundreds of millions or billions. What we need from a graph database is the ability to process patterns in billions of nodes connected by billions of relationships. That's the objective of Neo4j: to handle very large numbers of nodes and relationships while still maintaining the simplicity we saw in that model.
OK, so what would you use a graph database for? What are they good at? Let's first look at some of the big ways in which graph databases are being applied in the world. If you look at these, you'll see that the world we live in has changed significantly in the last 10 or 15 years, and that change has been driven, or at least made possible, by the application of graph technology and graph databases. Think about the way Google has come to dominate the world, compared with 15 years ago.
Back then there were a lot of competing search engines. Google captured the market because it applied graph thinking to the problem of search. In particular, they came up with an algorithm called PageRank, which meant that their ranking of pages was more useful than anybody else's. PageRank is a graph algorithm, and they needed to store their data as a graph to be able to run it. That's obviously an absolutely dominant part of our world today, and it's been made possible by a graph. The same is true of job search. A few years ago there were dozens of different job search engines, and there still are a few, but LinkedIn is now dominating that market, and again that's based on graph technology. The same is true across other industries.
Now, only one of these three is using Neo4j, and they're not using Neo4j for their main engine. What they've done is write their own graph databases. Neo4j is a graph database that's capable of delivering this sort of processing power, and it makes that capability available to companies that don't want to go to the trouble of writing their own. So Neo4j is an off-the-shelf, supported, tested, engineered product for people who've got applications that need graph databases, and who perhaps have ambitions to change the world in the way that some of these companies have.
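To make the remark that PageRank is a graph algorithm concrete, here is a minimal sketch of the textbook power-iteration form. It is a simplification for illustration, not Google's production system; the four-page link graph, the damping factor of 0.85 and the iteration count are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> outgoing links.

    Every link target is assumed to appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page passes its rank on equally along its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: three of the four pages link to "c".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

The point of the sketch is that the score of a page depends entirely on the relationships pointing at it, which is why the data has to be held as a graph.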
Let's look at some of the ways in which people are using Neo4j at the moment. There are a number of different use cases, but having worked for Neo for a couple of years, I can say that the biggest characteristic among the use cases is diversity. We see an extraordinary range of ways in which people can think of applying graphs, as we say, for pleasure and profit. But let's look at the use cases that are probably the commonest.
First of all, real-time recommendations. If you go to a website, that website has only one thing in mind: it really wants to sell you something. Very often the thing you buy from that website may be the thing you went there for in the first place, but it may well be something that's been suggested to you. Think about a physical store: you go in with one thing in mind, but you're entranced by all the attractive things hanging on racks or displayed on shelves, and you buy something else as well. On Amazon's web pages, they're very keen that you buy not only the book you originally went there for but also a couple of other books or something else. That science is the key to running a successful website, and it's based on recommendations.
Recommendations can be very simple. You can just bring up a list of products from the same manufacturer, or clothes of the same color, or something like that. But to be effective they need to be very rich and subtle. For example, you need to consider natural product pairings: if I buy a TV set I might also want an extended guarantee for it, and if I buy a printer I'll buy ink for it as well. Those relationships can get much more subtle, and what people do is mine their data to find the relationships in past buying patterns and use those to make recommendations. But that may be a bit naive in some cases. You can also look at the way users have behaved on your website. The individual who's got the current session, the customer, may have been looking at other products, and that may give you some clue as to their buying habits. Or they may fit a profile: you may be able to identify that they're in the age bracket 15 to 20 and interested in technology, and you might think they'd be interested in something that somebody of a similar profile was interested in. So it's a rich mixture of personal preferences, like products, behavior and people's profiles.
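The idea of mining past buying patterns can be sketched as a two-hop walk through a customer-to-product graph: from a customer to the products they bought, on to other customers who bought the same things, and from there to products the first customer has not bought yet. This is a deliberately naive illustration; the customers, baskets and scoring rule are invented, and a real recommendation engine would also weigh the behavior and profile signals described above.

```python
from collections import Counter

# Hypothetical purchase history: customer -> set of products bought.
purchases = {
    "alice": {"tv", "extended guarantee"},
    "bob": {"tv", "extended guarantee", "hdmi cable"},
    "carol": {"printer", "ink"},
    "dave": {"tv", "hdmi cable"},
}


def recommend(customer, purchases, top_n=3):
    """Suggest products bought by customers whose baskets overlap with this one."""
    owned = purchases[customer]
    scores = Counter()
    for other, basket in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & basket)
        if overlap == 0:
            continue  # no shared purchases, so no path through the graph
        for product in basket - owned:
            # Products from more similar customers score higher.
            scores[product] += overlap
    return [product for product, _ in scores.most_common(top_n)]
```

With this data, `recommend("dave", purchases)` suggests the extended guarantee, because the two other TV buyers both bought one, while Carol gets no suggestions because nobody shares a purchase with her.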
I particularly like the guy on the right of the picture. You may think he's a builder who has bought a bucket and a trowel to build a wall, but I think he's going to the beach: he's bought a bucket and a spade and he's going to spend the afternoon on the beach.
OK, so that's a recommendation engine. We've got some very well-known users, the biggest of which is probably Walmart, who use Neo4j to drive their website recommendations. Of the four largest US retailers at the moment, three are using Neo4j. So that's a good use case for a graph.
A second good use case is what we call master data management. I should say that this is only one of the use cases that share the MDM initials, because we also see a lot of people using Neo4j for metadata management, which is to say managing all the different data silos in their organization and providing connections between them. Master data management is the process of finding the unambiguous copy of a record, and of the structures around that record, in an organization. In this case we're looking at an employee tree.
There are numerous examples where this is being used: providing a total customer view, linking profiles where people may have different names in different silos, linking data stored in different silos, and trying to produce a definitive overlay showing the true picture of the data within an organization. We've got people who are using this within MDM products, like Pitney Bowes. We've also got a very good use case at Cisco, where it's used, as it says here, to provide a single source of truth for all of their internal and external hierarchies.
Fraud detection is a very good graph use case. Just to quickly describe what we see in fraud detection: if you go into a bank, open an account, immediately go to the ATM outside and try to withdraw a million pounds, the bank won't give it to you. It will probably take your card away, and an investigation will follow. So if you're committing a fraud, first of all you want to avoid the most elementary checks. You don't want to open a bank account and immediately try to withdraw a lot of money. What you do want is to look as normal as possible. Looking normal is a bit of a pain in the neck, because it means you can't be an outlier who's detected by any of the bank's anti-fraud systems, and you need to do a bit of work to achieve that.
If you're going to do that, you need to open several bank accounts. Opening several bank accounts is a bit painful, because you have to produce evidence of identity, such as utility bills and statements from previous banks, to prove that you are who you say you are. It's not very difficult to do that, but once you've done it you're still limited in the amount you can rob from the bank, because you don't want to stand out as an outlier. You still want to conduct normal-looking banking transactions, and you're going to have to do that a lot of times if you're going to make a lot of money by defrauding the bank of 20 or 30 pounds at a time. So you need to do it at scale. And remember, those identity tokens that you use to set up a bank account are hard to obtain, so you need to reuse them.
So the way these frauds are often conducted is that a lot of false bank accounts are set up using the same identity tokens, then money is withdrawn from all of those accounts at once and the fraudsters disappear from sight. It's quite hard to detect that sort of behavior with most systems, but it's very easy with a graph, because that pattern of people opening bank accounts using the same identity tokens is a pattern in a graph that can be detected and acted on. That's one of the ways in which Neo4j has been used for fraud detection and, as Gorka Sadowski says, it's capable of stopping advanced fraud in real time. We've got users like Ria, a money transfer agent, and some of the major credit card companies, who are using Neo4j for exactly this sort of real-time fraud detection.
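The shared-identity-token pattern can be sketched as finding connected groups in a graph of accounts and tokens. The following is a minimal illustration under invented data: the account names, phone numbers and addresses are hypothetical, and a production system would run this kind of pattern match inside the database rather than in application code.

```python
from collections import defaultdict

# Hypothetical accounts and the identity tokens used to open them.
accounts = {
    "acc1": {"phone:555-0101", "address:1 High St"},
    "acc2": {"phone:555-0101", "address:9 Low Rd"},
    "acc3": {"phone:555-0199", "address:9 Low Rd"},
    "acc4": {"phone:555-0142", "address:4 Mill Ln"},
}


def find_rings(accounts, min_size=2):
    """Group accounts that are linked, directly or indirectly, by shared tokens."""
    # Index: token -> accounts that used it.
    by_token = defaultdict(set)
    for account, tokens in accounts.items():
        for token in tokens:
            by_token[token].add(account)

    # Two accounts are linked when they share at least one token.
    linked = defaultdict(set)
    for sharing in by_token.values():
        for account in sharing:
            linked[account] |= sharing - {account}

    # Walk the links to collect each connected group once.
    seen, rings = set(), []
    for start in accounts:
        if start in seen:
            continue
        ring, stack = set(), [start]
        while stack:
            account = stack.pop()
            if account in ring:
                continue
            ring.add(account)
            stack.extend(linked[account] - ring)
        seen |= ring
        if len(ring) >= min_size:
            rings.append(sorted(ring))
    return rings
```

Here `acc1` and `acc3` share nothing directly, yet they end up in the same ring because each shares a token with `acc2`. That indirect link is the part that is awkward to spot when each account is examined as an isolated row.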
Most famously, the Panama Papers story was revealed by taking all of the information in the database that was stolen from the intermediary in Panama, and then using Neo4j to link together the identities and activities of that intermediary's customers. That produced a paper trail which allowed, for example, the recognition that the prime minister of Iceland had a wife who was investing, through an anonymous company, in some of the companies that he was bailing out, along with numerous other stories across the world.
It's a great use of a graph database, and one that would be practically impossible using any other technology.
Another use case is what we call graph-based search. You may be familiar with the concept of faceted search: if you go to a website, you're often invited to restrict your choice, in the case of Amazon to computer goods, or to books, or to movies. But graph-based search is much more than that. It allows multi-dimensional faceted search through the data, which lets you reach your destination much more quickly, even when there are lots of products that have the same name or otherwise similar characteristics. That's being used by quite a lot of companies in the search business, but it figures in unexpected places as well, such as Lufthansa, who are using it for in-flight entertainment, so you can use super-faceted search to find the movies you're interested in.
Another very obvious use case for Neo4j is mirroring the complexity of networks: physical computer networks linking servers and routers and bridges and all the other things you find. The power of a graph here is obviously that it allows arbitrary structures to be modeled very easily, but it's also very good at allowing you to express the different layers that you encounter. Here we've got physical assets, routers and so on, but on top of those there will be services, there will be groups of customers, there'll be all sorts of logical layering in the data, and each layer depends on the one below it. If you're trying to do root-cause detection or impact analysis, for example, you need to understand the links between the layers, and those can be very dynamic. We all know that with virtual hardware, applications can be moved without notification from server to server, which means that if there's an outage it's often quite difficult to work out what's going on. That's where a graph can give you an instant, up-to-date impact analysis of an outage. It's being used both by Cisco, whom we've mentioned before, and by HP, built into some of their components, so that some of the HP network management tools use Neo4j inside to provide network analysis, root-cause analysis and impact analysis.
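The layered impact analysis described above amounts to following "depends on" relationships backwards from the failed component. Here is a small sketch of that traversal; the component names and the dependency map are invented for the example, and a real network model would be far larger and would change as virtual machines move.

```python
from collections import deque

# Hypothetical layered model: each item lists what it directly depends on.
depends_on = {
    "customer-group-a": ["billing-service"],
    "customer-group-b": ["billing-service", "mail-service"],
    "billing-service": ["vm-1"],
    "mail-service": ["vm-2"],
    "vm-1": ["server-1"],
    "vm-2": ["server-2"],
    "server-1": ["router-1"],
    "server-2": ["router-1"],
}


def impacted_by(failed, depends_on):
    """Return everything that depends, directly or indirectly, on `failed`."""
    # Reverse the edges: component -> things that sit directly on top of it.
    dependents = {}
    for item, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, set()).add(item)

    # Breadth-first walk upwards through the layers.
    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for item in dependents.get(current, ()):
            if item not in impacted:
                impacted.add(item)
                queue.append(item)
    return impacted
```

An outage on `server-2` reaches `vm-2`, `mail-service` and `customer-group-b`, whereas an outage on `router-1` reaches every other item in the model. Root-cause analysis is the same walk in the opposite direction.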
Another very good use case is identity and access management. If you're a bank providing banking services for a multinational, you might have hundreds of users within that large organization: dozens of people who are allowed to access your banking facilities from within that multinational. You need to be very careful about which one has the authority to do what, what limits there are, where you need to go to a dual signature, and who has proxy rights to act on a bank account. It may be the same in personal banking, for parents looking after their children's bank accounts. So who has access to what is often a very non-hierarchical, very rich network of connected data, and Neo4j is great for modeling that. UBS in London is using Neo4j for managing their customers' access to their banking capabilities and, as you can see there, that's an award-winning application for UBS.
Neo4j is being adopted by numerous different companies in numerous different verticals. This is just a small selection, and you'll maybe see a few companies you recognize: TomTom, the in-car navigation company; Nomura, in financial services; Adidas, using it for graph-based search and website management. LinkedIn are using it in China to provide a new service, Chitu, using social marketing for people who have just left school or university and are at the first-job stage, where they don't have much in the way of a LinkedIn profile and need to be reached using social tools that work for that age group.
OK, so what is it that people are using graph databases for?
We've looked at the use cases, but why do people then stick with it? Why a graph? There are a number of characteristics of a graph database that become compelling once you get into it: they're very intuitive, they're very fast and they're very agile. We'll look at each of these in turn.
Here we see somebody who's thinking about a system design for the first time. What do they do? What do you do, what do I do? We draw it. We try to draw some sort of picture, and very often we find that we're drawing blobs linked by lines: we're drawing a graph. The amazing thing about Neo4j is that we store what you draw, so the structure in the database is a very direct representation of the business problem. It doesn't go through the conceptual, logical and physical design process. There's no point at which you denormalize the data to get better performance, as you typically do with relational databases, where it creates a big gap between the database as it's seen by the technical people and the database as it's seen by the business people. With Neo4j the database is understood by everybody and is extremely intuitive, which gives great benefits in terms of getting problems right first time and not having to do huge amounts of rework.
Looking at speed: we already saw how Monsanto had used Neo4j because they couldn't analyze their crop generations quickly enough otherwise. An even more powerful example is a company called Shutl in London, which used Neo4j to calculate optimal delivery strategies for goods. They do same-hour delivery, and they were so successful that they were bought by eBay. The service is now called eBay Now and is providing same-hour delivery services all over the world. The developers at eBay say "minutes to milliseconds", and that's literally true: very, very much faster using Neo4j. And finally we'll look at agility.
We've already talked about the way in which the graph can evolve and adapt. As you extend the scope of your system, it's very easy to include more data, more function and more business context just by adding new elements to the graph. It's a very non-destructive way of doing things. There's no major refactoring required, so you can build complicated applications in a nice agile way without worrying that you've gone off in the wrong direction to start with and that you're going to have to rebuild your database completely after a few months. It's an extremely forgiving, evolving and adaptive model.
The second thing that contributes to agility is the query language we've developed with Neo4j, a language called Cypher. It's what's called a declarative language, so with Cypher you don't need to worry about how the data is organized. You ask a simple question using a simple language. It's like SQL, to make it familiar for people, but it has some special facilities for pattern matching, and it's a business-level query that you design. People do tend to fall in love with Cypher because it is so efficient.
What we're looking at here is the same problem expressed in SQL with a relational database behind it, and in Cypher with Neo4j behind it. The query is just trying to count the number of employees who report to a senior manager, grouped by the first level of manager. If you look at the SQL you think: I'm not going to waste my time reading that, it looks frightening. You can imagine it's going to take a long time to debug, it's going to take a long time for anybody to understand when they need to make a change, and it has to be maintained as the structure of the database changes. So it's a simple question that has resulted in a very complicated piece of logic.
In Neo4j, on the other hand, I don't know how much you in the audience know about Cypher, but it's pretty easy to see what's going on here. We're just doing a MATCH, which is the Cypher equivalent of a SELECT statement. We're trying to find the boss, his subordinates and how many people those subordinates manage. So it's basically a two-line query in Neo4j, as against I can't even count how many lines in SQL. That has obvious impacts on your projects. You can write your queries more quickly, it's much easier to debug things and, most important as far as TCO, total cost of ownership, is concerned, you don't become dependent on key project members to maintain the code going forward. It's understandable, it's maintainable and it's modifiable.
When people have spent an hour or two using Cypher, they often say they will never go back to SQL again, because Cypher is such a natural way of looking at a natural model of the world: a model of things connected to other things, with no artificial intermediate concepts like rows, tables and columns. Just things connected to things, with a powerful language for getting at them.
So that's been a pretty quick introduction to Cypher. I should say it's being adopted quite widely, and we've created an open standard called openCypher, which is being used all over the place. We hope it will encourage a lot of front-end developers to develop new ways of displaying and interacting with Neo4j, but inevitably it will also create back-end developers competing directly with Neo, and we wish them luck.
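The slide with the two queries is not reproduced in this transcript, so the following is not the SQL or Cypher that was shown. It is only a sketch of the question being asked, written as a traversal over a small invented reporting tree: for each manager directly under the boss, count everyone below them at any depth. The names and the `manages` mapping are hypothetical, and the code assumes the reporting structure has no cycles.

```python
# Hypothetical reporting structure: manager -> direct reports.
manages = {
    "ceo": ["vp_sales", "vp_eng"],
    "vp_sales": ["sam", "sue"],
    "vp_eng": ["eve"],
    "eve": ["dev1", "dev2"],
}


def count_reports(person, manages):
    """Count everyone below `person` in the tree, at any depth."""
    direct = manages.get(person, [])
    return len(direct) + sum(count_reports(report, manages) for report in direct)


def reports_by_first_level(boss, manages):
    """For each manager directly under `boss`, count the people below them."""
    return {manager: count_reports(manager, manages) for manager in manages.get(boss, [])}
```

`reports_by_first_level("ceo", manages)` gives 2 for `vp_sales` and 3 for `vp_eng`. The traversal is short because the structure being queried is the reporting relationship itself, which is the point made above about the graph version of the query.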
I hope you've been intrigued and interested, and I hope you're now ready to have a go with Neo4j. It couldn't be easier to get started. You can download the product, which doesn't take more than a few minutes on most internet connections. When you've downloaded it you can quickly load the Neo4j Browser, which has a help guide with some example databases and access to all of the support information you need, so you can get started very quickly. On our website you'll find the GraphAcademy, which lists classroom-based and online training, and there's training happening in most cities once a month or so. There's one in London, I think on the 22nd of November, an advanced Cypher course; there's one in Amsterdam, and there are others in other parts of Europe. Finally, if you want to learn more, you can look at our YouTube channel, where there are the latest announcements about the most recent release of Neo4j, but also a very large range of people showing how to do particular things and how they solved their problems. Neo4j is an open source product with a very large and active community, so you're very welcome to participate.
That's the end of what I had to talk about. I've now been joined by Jesús to answer any questions. Is there anybody in the audience who's got any questions? I think we have one question, which is: how does Neo4j integrate with other elements in an architecture? Can you answer that question, Jesús?
Sure. Well, Neo4j is a data store, a very powerful data store, and as a data store it can integrate with a number of components.
I would first mention those that bring data into the store, specifically ETL tools. There are a number of ways in which they can integrate: that could be JDBC-based, or it could be through the REST API. The other category of elements in a standard data architecture would be the data consumers. That can be BI tools or visualizations, again through standard interfaces. There are also a number of language drivers if you decide to build your own visualization, for example, or your own client. There are a number of community drivers, but four of them are part of the standard delivery of Neo4j and supported by Neo Technology: JavaScript, Java of course, Python and .NET. So there are many different ways, and it's a simple component to integrate into your data architecture.
Do we have other questions? We have a question that says: what are good tools for visualizing graph output? I know Neo4j comes with its own browser, but most of my customers use Tableau or QlikView.
Yes, as I was mentioning, you may want to consume graph data and display it as a graph, and there are some commercial tools that can do that for you. We have partnerships with some of them; one would be Linkurious, for example. But there are also standard BI tools like the two you mention, Tableau and QlikView, and many others. Just to mention those two, they typically access Neo4j through the REST API. Tableau will use the Web Data Connector component, so it's pretty straightforward: you define in Cypher the dataset that you want to extract from Neo4j, it's imported directly into Tableau, and you can do your standard visualization or even combine it with data coming from other sources.
OK, we have other questions. Can you talk about the limitations of Neo4j? Under what circumstances would you suggest people not use Neo4j?
Well, as Jenny has described, it uses a particular way of modeling data. The graph model, the property graph we described, is really simple but really rich and very expressive, so I can't think of a domain that you can't model using a graph database. Now, coming back to what relational has been solving for many years: if your data is very predictable, if you get data through forms in a very structured way and you consume it to produce reports, then maybe you'll be fine with relational databases. But I can't name a domain where I would say don't use it. The key driver should be the value in the connections in your data. Data points are valuable, of course they are, but the connections between them are sometimes even more valuable. If you detect that that's the case, then graphs can definitely help you.
Another question: is it possible to deploy a distributed version of Neo4j, like traditional databases, with partitions and read replicas?
You can deploy Neo4j in a cluster configuration, and you can have multiple replicas of your data. In the current version you can't do partitioning: each of the instances in a cluster holds a full copy of your data. So it's distributed in that sense. But our experience is that it's a very efficient store, and we haven't come across a case that Neo4j can't handle in terms of capacity or storage limits on a single node. So yes, you can deploy it in a cluster configuration for high availability, for disaster recovery and to deal with high throughput. That's totally possible.
There's another question that asks: how many nodes can Neo4j handle?
Well, there used to be a number that I could quote, which up to version two-point-something was 32 billion, but when we released version 3 back in April the limit was removed. I think it's in the trillions or quadrillions, I don't know exactly, but basically it's unlimited in terms of the number of nodes.
That's it for the questions. OK, well, thank you everybody for joining today. We'll send an email with a link to the recording. Are we going to distribute the slides as well?
Yes, we can send you the slides, and the webinar is going to be recorded and available online in the coming week. Thank you everyone, and have a very nice day. Bye.