
Neo4j advances machine learning compatibility for its graph database

Graph database developer Neo4j Inc. is upping its machine learning game today with a new release of its Neo4j for Graph Data Science framework, which leverages deep learning and graph convolutional neural networks to make data about graph connections more accessible to mainstream data science algorithms.

Specifically, release 1.4 adds graph embedding, a technique that calculates the shape of the surrounding network for each data element within a graph. Graph databases are unique for their ability to represent complex relationships using nodes, relationships and key-value pairs, with each linked data item identified by a unique identifier. These connections can be traversed to find correlations that would be difficult or impossible to discover using relational tables because of the large number of joins that would be required.

However, multidimensional graph relationships don’t map cleanly to the lower-dimension vectors that are common in machine learning data sets. Graph embeddings bridge that gap by sampling the topology and properties of the graph, reducing its complexity to just the significant features needed for further machine learning.

“Graph embedding learns the structure of your graph to improve your knowledge of the graph,” said Alicia Frame, Neo4j’s product manager for the Graph Data Science library. “It’s graduating from chasing pointers to running really fast queries.” Without the reduction in complexity, an adjacency matrix for a 5 billion-node graph would need 5 billion squared elements. “This distills that giant graph into a computer representation of every node in your graph,” she said.
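The scale argument is easy to check with back-of-the-envelope arithmetic. The sketch below compares a full adjacency matrix for the article's 5 billion-node example against per-node embedding vectors; the embedding dimension of 128 is an assumption for illustration, not a figure from Neo4j.

```python
# Back-of-the-envelope comparison: a full adjacency matrix versus
# fixed-length node embeddings, using the article's 5 billion-node example.
NODES = 5_000_000_000          # 5 billion nodes, as in the article
EMBEDDING_DIM = 128            # assumed embedding size for illustration

adjacency_entries = NODES ** 2             # one entry per pair of nodes
embedding_entries = NODES * EMBEDDING_DIM  # one short vector per node

# The embedding table is smaller by a factor of NODES / EMBEDDING_DIM.
reduction_factor = adjacency_entries // embedding_entries
print(f"{adjacency_entries:.2e} vs {embedding_entries:.2e} entries")
print(f"reduction: {reduction_factor:,}x")
```

With these numbers the adjacency matrix would need 2.5 × 10¹⁹ entries, while the embedding table needs about 6.4 × 10¹¹, a reduction of roughly 39 million times.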

The enhancements significantly increase the scope of data science algorithms that can be run against a graph beyond the basic set that was included when the library was introduced in April. They’re part of Neo4j’s broader goal to take graph databases beyond queries of raw data to predict outcomes based on connections.

Specifically, the company is adding three new embedding options. First is Node2Vec, a popular graph embedding algorithm that uses neural networks to learn continuous feature representations for nodes, which can then be used for downstream machine learning tasks.
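At its core, Node2Vec generates random walks over the graph and feeds them to a skip-gram model, so that nodes appearing in similar walk contexts get similar vectors. The sketch below shows only the walk-generation step; the real algorithm biases the walks with return (p) and in-out (q) parameters and then trains the skip-gram, both omitted here for brevity. All names and the toy graph are illustrative assumptions.

```python
import random

def random_walks(adj, walks_per_node=2, walk_length=5, seed=42):
    """Generate unbiased random walks over an adjacency dict.

    Node2Vec biases these walks with its p and q parameters and trains
    a skip-gram model on them; this simplified sketch does neither.
    """
    rng = random.Random(seed)
    walks = []
    for node in adj:
        for _ in range(walks_per_node):
            walk = [node]
            for _ in range(walk_length - 1):
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy graph: a path 0-1-2-3, stored as adjacency lists.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
walks = random_walks(graph)
```

Each walk is a sequence of neighboring nodes; in the full algorithm these sequences play the role that sentences play when training word embeddings.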

FastRP (fast random projection) is a node-embedding algorithm that Neo4j says is up to 75,000 times faster than Node2Vec, with equivalent accuracy at extreme scale. Although it’s functionally equivalent to Node2Vec, Frame said many data scientists will likely use both.

“FastRP is lightning fast but more work to tune the embeddings to know what you want,” she said. “Many customers will run Node2Vec till they get results that make sense to them and then go to FastRP to run them at scale.”
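The random-projection idea behind FastRP can be sketched in a few lines: start each node with a random vector, then repeatedly multiply by the adjacency matrix so each node's vector absorbs information from progressively larger neighborhoods. This is a simplified illustration of the principle, not Neo4j's implementation, which uses sparse projections, degree normalization and weighted sums over iterations.

```python
import numpy as np

def fast_rp_sketch(adj_matrix, dim=4, iterations=2, seed=0):
    """Toy random-projection embedding (the core idea behind FastRP).

    There is no training loop, which is why this family of methods is
    so much faster than walk-based approaches like Node2Vec.
    """
    rng = np.random.default_rng(seed)
    n = adj_matrix.shape[0]
    embedding = rng.standard_normal((n, dim))  # random initial vectors
    for _ in range(iterations):
        # Each multiplication mixes in one more hop of neighborhood.
        embedding = adj_matrix @ embedding
        # Normalize rows so magnitudes stay comparable across iterations.
        norms = np.linalg.norm(embedding, axis=1, keepdims=True)
        embedding = embedding / np.maximum(norms, 1e-12)
    return embedding

# 4-node path graph: 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
emb = fast_rp_sketch(A)
```

Because the only heavy operation is matrix multiplication, the cost scales with the number of edges rather than with a per-walk training procedure, which is consistent with the speedup Neo4j claims.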

GraphSAGE is an embedding algorithm and process for inductive representation learning on graphs that uses graph convolutional neural networks. Because it learns an aggregation function rather than a fixed vector per node, it can be applied continuously as the graph updates, including to nodes that didn’t exist when the model was trained.
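The inductive trick can be illustrated with one mean-aggregator layer: average a node's neighbors' features, concatenate with the node's own features, and apply a learned weight matrix. The sketch below is a hypothetical minimal version of that layer, with placeholder (untrained) weights; real GraphSAGE samples a fixed number of neighbors and stacks several trained layers.

```python
import numpy as np

def graphsage_layer(features, adj, weight):
    """One mean-aggregator layer in the GraphSAGE style (simplified).

    Because the layer only looks at local neighborhoods, the same learned
    weights apply to nodes added after training -- the inductive property.
    """
    n = features.shape[0]
    agg = np.zeros_like(features)
    for v in range(n):
        neighbors = np.nonzero(adj[v])[0]
        if len(neighbors):
            # Average the features of v's neighbors.
            agg[v] = features[neighbors].mean(axis=0)
    # Concatenate self features with aggregated neighbor features,
    # then apply the weight matrix and a ReLU activation.
    combined = np.concatenate([features, agg], axis=1)
    return np.maximum(combined @ weight, 0.0)

# Toy example: 3 nodes in a triangle, 2 input and 2 output features.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A = np.ones((3, 3)) - np.eye(3)   # triangle adjacency matrix
W = np.eye(4)[:, :2]              # placeholder untrained weights
out = graphsage_layer(feats, A, W)
```

A new node only needs its own features and its neighbor list to get an embedding, so the model keeps working as the graph grows.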

The upshot is that “we’re taking techniques that used to require a Ph.D. and democratizing them so anyone can download and have the power of graph predictions,” said Frame, who holds a Ph.D. “Before, we’d use a graph to store the data with the machine learning happening in Python. We’re connecting the dots.”

First published at Silicon Angle
