Graph DBs in Enterprise: Top 3 use cases in which they make sense
Efficiently spotting suspect "circles" is a core feature of graph DBs for fraud fighting

A bit of history: the wild, early days of graph DBs

Maybe not many know that billions of dollars/euros were (publicly) invested from the early 2000s to 2015 and beyond in a topic called the "Semantic Web", which had a kind of graph database - triplestores - and its related knowledge representation format (RDF) as one of its core parts.

It was a large research effort, so it is normal that attempts were made to use these tools for... pretty much anything (web services, sensors, news, ancient texts, geometrical shapes, social networks, publishing data and many more).

The mantra was that triplestores were going to replace "traditional systems" like RDBMSs, that RDF would replace every other exchange format, and that graph query languages like SPARQL would be more natural and easier to understand.

Where Graph DBs fell short

Many of those experiments failed to reach the intended impact. Certain expectations of Graph DBs did not really stand up to the test of real enterprise use cases. Three of these, in my opinion, are:

  1. The hope that graph DBs would magically be a cure for data model changes. While it is easy to add a new "property type" in a graph (no SQL schema changes required), this is a minor plus compared to the changes still required in loading and application logic.
  2. The hope that a graph representation like RDF could be conveniently used for anything. Due to its "simplicity" (triples), RDF quickly becomes excruciating to use in the real world to express "meta" concepts such as time, provenance and trust. The complexity of the extra "triple castles" required very quickly shuts brains off and pushes developers - and people who "have to get stuff done" - back to JSON, CSVs and whatnot.
  3. The hope that graph DB query languages would be superior in all use cases. The reality is that, while graph query languages like SPARQL (the SQL of RDF) are more natural for certain questions, for others they are a nightmare (here is a sample query to "export a human-legible list of companies" from PermID.org, which is distributed in RDF).

Graph DBs are hot again (and what they are unbeatable at)

As with many hype cycles, winter eventually came for that research strand, but we are now experiencing a new bloom for Knowledge Graphs and graph DBs, driven, in my opinion, by a number of solid reasons.

On the one hand, tech and flexibility: new graph DB engines have emerged which are way more flexible and enterprise-ready. I am talking: multi-model DBs, graph DBs supporting "Property Graphs" (out of the box, more expressive than RDF), but also graph stores that allow "virtualization" (seeing data in other systems as a graph without the need to copy it). Again, the article above provides a useful overview of this.

On the other hand, enterprise adoption for the right use cases: gone are the days of wild experimentation; clever folks are now using graph DBs for use cases in which they provide critical benefits.

In general: graph DBs are absolutely unbeatable when one needs to explore the relationships between records "in depth".

To clarify: it is not just a matter of having "many" relations (regular DBs are great at that), but a matter of depth.

Does your use case require exploring the connections in "depth"? If so, you really should get the help of a graph DB.
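
To make the notion of "depth" concrete, here is a minimal Cypher sketch (the Person label, the KNOWS relationship and the name are hypothetical, not from any specific dataset): a single variable-length pattern walks the connections one to four hops deep, something that would require a chain of self-joins - one per hop - in SQL.

```
// Hypothetical example: find everyone reachable from one person
// through 1 to 4 "KNOWS" hops. One variable-length pattern in Cypher,
// versus four separate self-joins (or queries) in SQL.
MATCH (p:Person {name: 'Alice'})-[:KNOWS*1..4]->(contact:Person)
RETURN DISTINCT contact.name;
```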

Here are 3 notable classes of problems fitting this picture:

  1. Ring (graph pattern) detection: To fight fraud, for example, it is critical to flag individuals who are part of "suspect rings", which can have diverse shapes; in a review-fraud scenario, for instance, reviewers might be the authors of the item being reviewed, or part of the author's circle of friends.
  2. Shortest path (and other path-related algorithms): In law enforcement/cyber-security (and even life science), a typical question might be: what is the shortest path between two "entities" - be they physical machines, people, events or computers - across a complex network of facts and entities? (See the Cypher sketch after this list.)
  3. Graph metric computations: In a complex network of entities, which ones "stand out" because they are the most central? Graph metrics like "centrality" or "PageRank" make important nodes "emerge" from the many others.
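
As a rough illustration of use cases 2 and 3, here is a hedged Cypher sketch. The Entity label, the node names and the 'entities' graph projection are all hypothetical; the second statement additionally assumes the Neo4j Graph Data Science plugin is installed and an in-memory projection has already been created.

```
// Use case 2 (shortest path) - hypothetical labels and names:
// how are two records connected, across any relationship type,
// within at most 10 hops?
MATCH (a:Entity {name: 'Entity A'}), (b:Entity {name: 'Entity B'})
MATCH p = shortestPath((a)-[*..10]-(b))
RETURN p;

// Use case 3 (graph metrics) - assumes the Graph Data Science plugin
// and an already-created projection named 'entities' (hypothetical):
CALL gds.pageRank.stream('entities')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10;
```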

Graph DBs and 'Enterprise Wide' Knowledge Graphs

At Siren, we provide an "Enterprise-Wide Knowledge Graph", which means we connect to one or more sources where your data naturally lives (e.g. Elasticsearch for huge logs, DBMSs, Hadoop) and, purely via UI configuration (no ETL), make all the data look and feel like one huge enterprise knowledge graph.

In the picture below, we explore the time evolution of a knowledge graph with no need for a graph DB (the data in this case is in an Elasticsearch back-end).


While this is... pretty cool (one can have a knowledge graph as big as production instances of Elasticsearch or Spark), we're excited that Siren 10.2 adds our first support specifically for graph DB backends, to provide boosted performance for use cases such as those described above.

We start by supporting Neo4J. With the new connector, one can use Siren to:

  • Connect to a Neo4J database and browse its data in Siren dashboards, with its unique features including "set-to-set relational navigation", explorative link analysis, high-quality content search and discovery, and operational alerting (our general intro video here).
  • See Neo4J data "interconnected" with data you have in other systems: no need to move everything into Neo4J to ask questions across your enterprise backends.
  • Use graph DB superpowers for the use cases where they are critical: shortest path, graph pattern detection, graph metrics, etc.

A 'review fraud' fighting example in Siren with Neo4J

We illustrate this power with an example using a Neo4j graph of "movies", "reviewers", "actors" and "directors" with the associated relations. This is a view of the graph "schema":

The schema in our example: a person can be a reviewer, but also a director, an actor, etc.

The goal is to detect "suspect reviews" - e.g. reviews made by people who are also connected with the production of the movie itself.

After connecting and following the wizard, Siren presents the same schema in its internal configuration (and here one could connect to other data backends if required):


At this point one can explore data in Siren Dashboards and in Siren's built-in link analysis:


Automatic fraudster detection

At this point, we can use Neo4J's superpowers to efficiently spot suspect circles directly from the Siren UI. Here we insert the Neo4J Cypher query that detects a suspect circle, and we save it so that it will be automatically re-executed every day.
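
For illustration, a query along these lines could look like the following Cypher sketch. The labels and relationship types (Person, Movie, REVIEWED, ACTED_IN, DIRECTED, PRODUCED, FOLLOWS) are assumed from the standard Neo4j movies example; this is a sketch of the idea, not necessarily the exact query used here.

```
// Hypothetical "suspect review" pattern: a person reviewed a movie and
// is also involved in its production - either directly, or through
// someone they follow.
MATCH (reviewer:Person)-[:REVIEWED]->(movie:Movie)
MATCH (insider:Person)-[:ACTED_IN|DIRECTED|PRODUCED]->(movie)
WHERE reviewer = insider
   OR (reviewer)-[:FOLLOWS]-(insider)
RETURN DISTINCT reviewer.name AS suspectReviewer, movie.title AS movieTitle;
```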


The results of this query can:

  • Generate alerts automatically (e.g. emails) if these circles are detected.
  • Be browsed in the Siren Dashboards, where the flagged records can be singled out.
  • Be visualized on the graph, as "starting points" for investigations (the red icon on the left below is the automatically generated starting point).

Conclusion: The powerful case for Graph DBs as part of Enterprise Knowledge Graphs

Graph databases are gaining momentum, and rightfully so: there are use cases for which they are indispensable.

On the other hand, it's clear that IT departments won't be ditching RDBMSs, Spark, or Elasticsearch for big data logs anytime soon - nor should they.

To implement an enterprise-wide "knowledge graph" vision, encompassing everything from low-level logs to high-level entities, keeping data where it is (as much as possible) is obviously quite attractive - for IT departments, business leadership and compliance departments alike.

The Siren support for graph DBs, starting with the Neo4J connector, is a fundamental enabler of this vision - and is now also available in the free Siren Community edition.

For more on Siren and Knowledge Graphs, see https://siren.io/enterprise-knowledge-search-and-ediscovery/
