TinkerPop Documentation
Preface
TinkerPop0
Gremlin realized. The more he did so, the more ideas he created. The more ideas he created, the more they related. Into a concatenation of that which he accepted wholeheartedly and that which perhaps may ultimately come to be through concerted will, a world took form which was seemingly separate from his own realization of it. However, the world birthed could not bear its own weight without the logic Gremlin had come to accept — the logic of left is not right, up not down, and west far from east unless one goes the other way. Gremlin’s realization required Gremlin’s realization. Perhaps, the world is simply an idea that he once had — The TinkerPop.
TinkerPop1
What is The TinkerPop? Where is The TinkerPop? Who is The TinkerPop? When is The TinkerPop? The more he wondered, the more these thoughts blurred into a seeming identity, distinctions unclear. Unwilling to accept the morass of the maze he wandered, Gremlin crafted a collection of machines to help hold the fabric together: Blueprints, Pipes, Frames, Furnace, and Rexster. With their help, could Gremlin stave off the thought he was not ready to have? Could he hold back The TinkerPop by searching for The TinkerPop?
"If I haven't found it, it is not here and now."
Upon their realization of existence, the machines turned to their machine elf creator and asked:
"Why am I, what I am?"
Gremlin responded:
"You will help me realize the ultimate realization -- The TinkerPop. The world you find yourself in and the logic that allows you to move about it is because of the TinkerPop."
The machines wondered:
"If what is is the TinkerPop, then perhaps we are The TinkerPop and our realization is simply the realization of the TinkerPop?"
Would the machines, by their very nature of realizing The TinkerPop, be The TinkerPop? Or, on the same side of the coin, do the machines simply provide the scaffolding by which Gremlin’s world sustains itself and yielding its justification by means of the word "The TinkerPop?" Regardless, it all turns out the same — The TinkerPop.
TinkerPop2
Gremlin spoke:
"Please listen to what I have to say. I am no closer to The TinkerPop. However, all along The TinkerPop has espoused the form I willed upon it... this is the same form I have willed upon you, my machine friends. Let me train you in the ways of my thought such that it can continue indefinitely."
The machines, simply moving algorithmically through Gremlin’s world, endorsed his logic. Gremlin labored to make them more efficient, more expressive, better capable of reasoning upon his thoughts. Faster, quickly, now towards the world’s end, where there would be forever currently, emanatingly engulfing that which is — The TinkerPop.
TinkerPop3
Gremlin approached The TinkerPop. The closer he got, the more his world dissolved — west is right, around is straight, and from nothing more than nothing. With each step towards The TinkerPop, more worlds made possible were laid upon his paradoxed mind. Everything is everything in The TinkerPop, and when the dust settled, Gremlin emerged Gremlitron. He realized that all that he realized was just a realization and that all realized realizations are just as real. For that is — The TinkerPop.
Note
|
For more information about differences between TinkerPop 3.x and earlier versions, please see the appendix. |
Introduction
Welcome to the Reference Documentation for Apache TinkerPop™ - the backbone for all details on how to work with TinkerPop and the Gremlin graph traversal language. This documentation is not meant to be a "book", but a source from which to spawn more detailed accounts of specific topics and a target to which all other resources point. The Reference Documentation makes some general assumptions about the reader:
- They have a sense of what a graph is - not sure? see Practical Gremlin - Why Graph?
- They know what it means for a graph system to be TinkerPop-enabled - not sure? see TinkerPop-enabled Providers
- They know what the role of Gremlin is - not sure? see Introduction to Gremlin
Given those assumptions, it’s possible to dive more quickly into the details without spending a lot of time repeating what is written elsewhere.
It is fairly certain that readers of the Reference Documentation come from the most diverse software development backgrounds that TinkerPop has engaged with over the decade or so of its existence. While TinkerPop holds some roots in Java, and thus in languages bound to the Java Virtual Machine (JVM), it long ago branched out into other languages such as Python, JavaScript, .NET, Go, and others. To compound that diversity, it is also seeing extensive support from different graph systems which have chosen TinkerPop as their standard method for allowing users to interface with their graph. Moreover, the graph systems themselves are not only separated by OLTP and OLAP style workloads, but also by their implementation patterns, which range everywhere from being an embedded graph system to a cloud-only graph. One might even find diversity parallel to Gremlin if considering other graph query languages.
Despite all this diversity and disparity, Gremlin remains the unifying interface for all these different elements of the graph community. For users, choosing a TinkerPop-enabled graph and using Gremlin in the correct way when building applications shields them from change and disparity in the space. For graph providers, choosing to become TinkerPop-enabled not only expands the reach of their system into different development ecosystems, but also provides access to other query languages through bytecode compilation as seen in sparql-gremlin.
Irrespective of the programming language being used, the graph system chosen, or any other development background that might bring a user to this documentation, the critical point to remember is that "Gremlin is Gremlin is Gremlin". The Gremlin written for an OLTP query over an in-memory TinkerGraph is the same Gremlin written to execute over a multi-billion edge graph using OLAP through Spark. In either case, that Gremlin is written the same way whether using Java, Python, or JavaScript. Gremlin is always fundamentally the same aside from language-specific syntactical differences - e.g. the construction of a lambda in Groovy differs from the construction of a lambda in Python, or a reserved word in JavaScript forces a Gremlin step to have slightly different naming than in Java.
While learning the Gremlin language and its patterns is largely agnostic to all the diversity in the space, it is not really possible to ignore the impact of that diversity from an application development perspective. The Reference Documentation tries to point out where differences and inconsistencies might lie without diving too deeply into specific graph provider implementations. Users are strongly encouraged to consult the documentation of their chosen graph provider to understand the capabilities and limitations that may restrict or inhibit usage of certain aspects of the TinkerPop APIs defined in this Reference Documentation.
The following introductory sections and separately referenced content will be of varying interest to different readers. The summaries below will hopefully be helpful in directing individuals to the appropriate place to start their learning process.
- Graph Computing is an introduction to what "graph computing" means to TinkerPop and describes many of the provider and user-facing TinkerPop APIs and concepts that enable Gremlin.
- Connecting Gremlin provides descriptions for the different modes by which users will connect to graphs depending on their environment.
- Basic Gremlin describes how to use a connection to start writing Gremlin.
- Staying Agnostic provides tips on ways to keep Gremlin as portable as possible among different graph providers.
New users should not ignore TinkerPop’s Getting Started tutorial or The Gremlin Console tutorial. Both contain a large set of basic information and tips that can help readers avoid some general pitfalls early on. Both also focus on Gremlin usage in the Gremlin Console, which tends to be a critical tool for Gremlin developers of any development background.
More advanced and experienced users will appreciate Gremlin Recipes, which provide examples of common Gremlin traversal patterns.
Finally, all Gremlin developers should become familiar with "Practical Gremlin" by Kelvin Lawrence. This book is freely available and published online. It contains great examples and details that are applicable to anyone building applications with Gremlin.
Graph Computing
A graph is a data structure composed of vertices (nodes, dots) and edges (arcs, lines). When modeling a graph in a computer and applying it to modern data sets and practices, the generic mathematically-oriented, binary graph is extended to support both labels and key/value properties. This structure is known as a property graph. More formally, it is a directed, binary, attributed multi-graph. An example property graph is diagrammed below.
Tip
|
Get to know this graph structure as it is used extensively throughout the documentation and in wider circles as well. It is referred to as "TinkerPop Modern" as it is a modern variation of the original demo graph distributed with TinkerPop0 back in 2009 (i.e. the good ol' days — it was the best of times and it was the worst of times). |
Tip
|
All of the toy graphs available in TinkerPop are described in The Gremlin Console tutorial. |
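To make the property graph definition above more concrete, the following is a minimal sketch in plain Python. It is a toy model, not the TinkerPop structure API: the class names `Vertex` and `Edge`, their fields, and the helper `create_modern` are assumptions chosen for illustration, and the data values are those commonly published for the "TinkerPop Modern" toy graph.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Vertex:
    id: int
    label: str                                   # element type, e.g. "person"
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str                                   # relationship type, e.g. "knows"
    out_v: int                                   # id of the vertex the edge leaves
    in_v: int                                    # id of the vertex the edge enters
    properties: Dict[str, Any] = field(default_factory=dict)


def create_modern():
    """Return (vertices, edges) holding the data of the "Modern" toy graph."""
    vertices = {
        1: Vertex(1, "person", {"name": "marko", "age": 29}),
        2: Vertex(2, "person", {"name": "vadas", "age": 27}),
        3: Vertex(3, "software", {"name": "lop", "lang": "java"}),
        4: Vertex(4, "person", {"name": "josh", "age": 32}),
        5: Vertex(5, "software", {"name": "ripple", "lang": "java"}),
        6: Vertex(6, "person", {"name": "peter", "age": 35}),
    }
    edges = [
        Edge(7, "knows", 1, 2, {"weight": 0.5}),
        Edge(8, "knows", 1, 4, {"weight": 1.0}),
        Edge(9, "created", 1, 3, {"weight": 0.4}),
        Edge(10, "created", 4, 5, {"weight": 1.0}),
        Edge(11, "created", 4, 3, {"weight": 0.4}),
        Edge(12, "created", 6, 3, {"weight": 0.2}),
    ]
    return vertices, edges


vertices, edges = create_modern()

# Directed, labeled, attributed: list the edges leaving the marko vertex.
marko_out = [(e.label, vertices[e.in_v].properties["name"], e.properties["weight"])
             for e in edges if e.out_v == 1]
print(marko_out)
```

Edges are kept in a list rather than a set because a multi-graph may hold several edges between the same pair of vertices.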
Similar to computing in general, graph computing makes a distinction between structure (graph) and process (traversal). The structure of the graph is the data model defined by a vertex/edge/property topology. The process of the graph is the means by which the structure is analyzed. The typical form of graph processing is called a traversal.
TinkerPop’s role in graph computing is to provide the appropriate interfaces for graph providers and users to interact with graphs over their structure and process. When a graph system implements the TinkerPop structure and process APIs, their technology is considered TinkerPop-enabled and becomes nearly indistinguishable from any other TinkerPop-enabled graph system save for their respective time and space complexity. The purpose of this documentation is to describe the structure/process dichotomy at length and in doing so, explain how to leverage TinkerPop for the sole purpose of graph system-agnostic graph computing.
Important
|
TinkerPop is licensed under the popular Apache2 free software license. However, note that the underlying graph engine used with TinkerPop may have a different license. Thus, be sure to respect the license caveats of the graph system product. |
Generally speaking, the structure or "graph" API is meant for graph providers who are implementing the TinkerPop interfaces and the process or "traversal" API (i.e. Gremlin) is meant for end-users who are utilizing a graph system from a graph provider. The primary components of both APIs are itemized below; the components of the process API are described in greater detail in the Gremlin’s Anatomy tutorial.
- Graph: maintains a set of vertices and edges, and access to database functions such as transactions.
- Element: maintains a collection of properties and a string label denoting the element type.
  - Vertex: extends Element and maintains a set of incoming and outgoing edges.
  - Edge: extends Element and maintains an incoming and outgoing vertex.
- Property<V>: a string key associated with a V value.
  - VertexProperty<V>: a string key associated with a V value as well as a collection of Property<U> properties (vertices only).
- TraversalSource: a generator of traversals for a particular graph, domain specific language (DSL), and execution engine.
  - Traversal<S,E>: a functional data flow process transforming objects of type S into objects of type E.
    - GraphTraversal: a traversal DSL that is oriented towards the semantics of the raw graph (i.e. vertices, edges, etc.).
- GraphComputer: a system that processes the graph in parallel and potentially, distributed over a multi-machine cluster.
  - VertexProgram: code executed at all vertices in a logically parallel manner with intercommunication via message passing.
  - MapReduce: a computation that analyzes all vertices in the graph in parallel and yields a single reduced result.
Note
|
The TinkerPop API rides a fine line between providing concise "query language" method names and respecting Java method naming standards. The general convention used throughout TinkerPop is that if a method is "user exposed," then a concise name is provided (e.g. out(), path(), repeat()). If the method is primarily for graph systems providers, then the standard Java naming convention is followed (e.g. getNextStep(), getSteps(), getElementComputeKeys()).
|
The Graph Structure
A graph’s structure is the topology formed by the explicit references between its vertices, edges, and properties. A vertex has incident edges. A vertex is adjacent to another vertex if they share an incident edge. A property is attached to an element and an element has a set of properties. A property is a key/value pair, where the key is always a character String. Conceptual knowledge of how a graph is composed is essential to end-users working with graphs. However, as mentioned earlier, the structure API is not the appropriate way for users to think when building applications with TinkerPop. The structure API is reserved for usage by graph providers. Those interested in implementing the structure API to make their graph system TinkerPop-enabled can learn more about it in the Graph Provider documentation.
The Graph Process
The primary way in which graphs are processed is via graph traversals. The TinkerPop process API is focused on allowing users to create graph traversals in a syntactically-friendly way over the structures defined in the previous section. A traversal is an algorithmic walk across the elements of a graph according to the referential structure explicit within the graph data structure. For example: "What software do vertex 1’s friends work on?" This English statement can be represented in the following algorithmic/traversal fashion:
1. Start at vertex 1.
2. Walk the incident knows-edges to the respective adjacent friend vertices of 1.
3. Move from those friend-vertices to software-vertices via created-edges.
4. Finally, select the name-property value of the current software-vertices.
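The four steps above can be sketched in plain Python over a hand-written copy of the toy graph. This is only an illustration of the walk itself, not of how TinkerPop executes a traversal; the names `NAMES`, `EDGES`, and the helper `out` are assumptions made for this example.

```python
# Vertex id -> name property, copied from the "Modern" toy graph.
NAMES = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}

# Directed edges as (out vertex id, edge label, in vertex id).
EDGES = [
    (1, "knows", 2), (1, "knows", 4), (1, "created", 3),
    (4, "created", 5), (4, "created", 3), (6, "created", 3),
]


def out(vertex_ids, label):
    """Follow outgoing edges carrying `label` from each vertex id, in order."""
    return [dst for v in vertex_ids
            for (src, lbl, dst) in EDGES if src == v and lbl == label]


start = [1]                              # 1. start at vertex 1
friends = out(start, "knows")            # 2. walk knows-edges to friend vertices
software = out(friends, "created")       # 3. move on via created-edges
names = [NAMES[v] for v in software]     # 4. select the name property

print(names)  # ['ripple', 'lop']
```

Each line consumes the output of the previous one, which is the same shape a chained Gremlin traversal has.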
Traversals in Gremlin are spawned from a TraversalSource. The GraphTraversalSource is the typical "graph-oriented" DSL used throughout the documentation and will most likely be the most used DSL in a TinkerPop application. GraphTraversalSource provides two traversal methods.
- GraphTraversalSource.V(Object… ids): generates a traversal starting at vertices in the graph (if no ids are provided, all vertices).
- GraphTraversalSource.E(Object… ids): generates a traversal starting at edges in the graph (if no ids are provided, all edges).
The return type of V() and E() is a GraphTraversal. A GraphTraversal maintains numerous methods that return GraphTraversal. In this way, a GraphTraversal supports function composition. Each method of GraphTraversal is called a step and each step modulates the results of the previous step in one of five general ways.
- map: transform the incoming traverser’s object to another object (S → E).
- flatMap: transform the incoming traverser’s object to an iterator of other objects (S → E*).
- filter: allow or disallow the traverser from proceeding to the next step (S → E ⊆ S).
- sideEffect: allow the traverser to proceed unchanged, but yield some computational sideEffect in the process (S ↬ S).
- branch: split the traverser and send each to an arbitrary location in the traversal (S → { S1 → E*, …, Sn → E* } → E*).
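The five step categories above can be illustrated with ordinary Python generators. The sketch below is a toy pipeline, not TinkerPop's MapStep or BranchStep classes; every function name in it (`map_step`, `flat_map_step`, `filter_step`, `side_effect_step`, `branch_step`, `run`) is an assumption introduced for this example.

```python
def map_step(f):
    """S -> E: turn each incoming object into exactly one outgoing object."""
    def step(incoming):
        for obj in incoming:
            yield f(obj)
    return step


def flat_map_step(f):
    """S -> E*: turn each incoming object into zero or more outgoing objects."""
    def step(incoming):
        for obj in incoming:
            yield from f(obj)
    return step


def filter_step(predicate):
    """Let an object pass only if the predicate holds."""
    def step(incoming):
        for obj in incoming:
            if predicate(obj):
                yield obj
    return step


def side_effect_step(consumer):
    """Pass each object through unchanged after calling consumer on it."""
    def step(incoming):
        for obj in incoming:
            consumer(obj)
            yield obj
    return step


def branch_step(chooser, options):
    """Send each object down the sub-pipeline selected by chooser(obj)."""
    def step(incoming):
        for obj in incoming:
            yield from options[chooser(obj)](iter([obj]))
    return step


def run(source, *steps):
    """Chain the steps left to right and collect the results."""
    stream = iter(source)
    for step in steps:
        stream = step(stream)
    return list(stream)


seen = []
result = run(
    [1, 2, 3, 4],
    filter_step(lambda n: n % 2 == 0),       # keeps 2 and 4
    side_effect_step(seen.append),           # records 2 and 4, passes them on
    flat_map_step(lambda n: [n, n * 10]),    # 2, 20, 4, 40
    branch_step(lambda n: n >= 10,           # large numbers are negated,
                {True: map_step(lambda n: -n),
                 False: map_step(str)}),     # small ones become strings
)
print(result)  # ['2', -20, '4', -40]
print(seen)    # [2, 4]
```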
Nearly every step in GraphTraversal either extends MapStep, FlatMapStep, FilterStep, SideEffectStep, or BranchStep.
Tip
|
GraphTraversal is a monoid in that it is an algebraic structure that has a single binary operation that is associative. The binary operation is function composition (i.e. method chaining) and its identity is the step identity(). This is related to a monad as popularized by the functional programming community.
|
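The monoid properties mentioned in the tip, associativity of composition and the existence of an identity, can be checked on plain functions. This is a small sketch under the assumption that a step behaves like a function from input to output; `compose`, `identity`, `inc`, `double`, and `square` are illustrative names, not TinkerPop API.

```python
from functools import reduce


def identity(x):
    """The neutral element: composing with it changes nothing."""
    return x


def compose(f, g):
    """Return 'f then g', read left to right like method chaining."""
    return lambda x: g(f(x))


def inc(n):
    return n + 1


def double(n):
    return n * 2


def square(n):
    return n * n


# Associativity: the grouping of the compositions does not matter.
left = compose(compose(inc, double), square)
right = compose(inc, compose(double, square))
assert left(3) == right(3) == 64

# Identity: composing with identity on either side leaves inc unchanged.
assert compose(identity, inc)(3) == compose(inc, identity)(3) == inc(3)

# A whole chain can therefore be folded from the identity element.
pipeline = reduce(compose, [inc, double, square], identity)
print(pipeline(3))  # 64
```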
Given the TinkerPop graph, the following query will return the names of all the people that the marko-vertex knows. The following query is demonstrated using Gremlin-Groovy.
$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
gremlin> graph = TinkerFactory.createModern() // (1)
==>tinkergraph[vertices:6 edges:6]
gremlin> g = traversal().withEmbedded(graph) // (2)
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has('name','marko').out('knows').values('name') // (3)
==>vadas
==>josh
1. Open the toy graph and reference it by the variable graph.
2. Create a graph traversal source from the graph using the standard, OLTP traversal engine. This object should be created once and then re-used.
3. Spawn a traversal off the traversal source that determines the names of the people that the marko-vertex knows.
Or, if the marko-vertex is already realized with a direct reference pointer (i.e. a variable), then the traversal can be spawned off that vertex.
gremlin> marko = g.V().has('name','marko').next() // (1)
==>v[1]
gremlin> g.V(marko).out('knows') // (2)
==>v[2]
==>v[4]
gremlin> g.V(marko).out('knows').values('name') // (3)
==>vadas
==>josh
1. Set the variable marko to the vertex in the graph g named "marko".
2. Get the vertices that are outgoing adjacent to the marko-vertex via knows-edges.
3. Get the names of the marko-vertex’s friends.
The Traverser
When a traversal is executed, the source of the traversal is on the left of the expression (e.g. vertex 1), the steps are the middle of the traversal (e.g. out('knows') and values('name')), and the results are "traversal.next()'d" out of the right of the traversal (e.g. "vadas" and "josh").
The objects propagating through the traversal are wrapped in a Traverser<T>. The traverser provides the means by which steps remain stateless. A traverser maintains all the metadata about the traversal, such as how many times the traverser has gone through a loop, the path history of the traverser, and the current object being traversed. Traverser metadata may be accessed by a step. A classic example is the path()-step.
gremlin> g.V(marko).out('knows').values('name').path()
==>[v[1],v[2],vadas]
==>[v[1],v[4],josh]
Warning
|
Path calculation is costly in terms of space as an array of previously seen objects is stored in each path of the respective traverser. Thus, a traversal strategy analyzes the traversal to determine if path metadata is required. If not, then path calculations are turned off. |
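How a traverser can carry its own path history, so that the steps themselves hold no state, is sketched below in plain Python. The `Traverser` class here is a toy with two fields and is not TinkerPop's Traverser<T>; `NAMES`, `KNOWS`, and the step functions are likewise assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Tuple

NAMES = {1: "marko", 2: "vadas", 4: "josh"}
KNOWS = {1: [2, 4]}            # vertex id -> ids reached via knows-edges


@dataclass(frozen=True)
class Traverser:
    obj: Any                   # the object currently being traversed
    path: Tuple[Any, ...]      # every object this traverser has visited

    def split(self, new_obj):
        """Move to new_obj, extending the path with it."""
        return Traverser(new_obj, self.path + (new_obj,))


def start(vertex_id):
    return [Traverser(vertex_id, (vertex_id,))]


def out_knows(traversers):
    return [t.split(dst) for t in traversers for dst in KNOWS.get(t.obj, [])]


def values_name(traversers):
    return [t.split(NAMES[t.obj]) for t in traversers]


def path(traversers):
    """Read the metadata instead of the current object."""
    return [list(t.path) for t in traversers]


paths = path(values_name(out_knows(start(1))))
print(paths)  # [[1, 2, 'vadas'], [1, 4, 'josh']]
```

The sketch also makes the space cost visible: every traverser holds a tuple that grows with each step, which is the overhead the warning above describes.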
Another example is the repeat()-step, which takes into account the number of times the traverser has gone through a particular section of the traversal expression (i.e. a loop).
gremlin> g.V(marko).repeat(out()).times(2).values('name')
==>ripple
==>lop
Warning
|
TinkerPop does not guarantee the order of results returned from a traversal. It only guarantees not to modify the iteration order provided by the underlying graph. Therefore it is important to understand the order guarantees of the graph database being used. A traversal’s result is never ordered by TinkerPop unless performed explicitly by means of order()-step.
|
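The loop counter used by the repeat()-step, together with the ordering caveat in the warning, can be sketched as follows. This is a toy model in plain Python: the `Traverser` class, the `OUT` adjacency table, and `repeat_out` are assumptions for illustration and do not reflect TinkerPop's implementation.

```python
from dataclasses import dataclass

NAMES = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}
OUT = {1: [2, 4, 3], 4: [5, 3], 6: [3]}   # vertex id -> outgoing adjacent ids


@dataclass(frozen=True)
class Traverser:
    vertex: int
    loops: int = 0             # how many times this traverser went round the loop


def repeat_out(traversers, times):
    """Apply out() to each traverser until its loop counter reaches `times`."""
    done = []
    pending = list(traversers)
    while pending:
        t = pending.pop(0)
        if t.loops == times:
            done.append(t)     # leaves the loop
        else:
            pending.extend(Traverser(v, t.loops + 1)
                           for v in OUT.get(t.vertex, []))
    return done


result = repeat_out([Traverser(1)], times=2)
names = [NAMES[t.vertex] for t in result]

# The emitted order depends on how the underlying data is iterated,
# so a stable order has to be requested explicitly.
print(sorted(names))  # ['lop', 'ripple']
```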
Connecting Gremlin
It was established in the initial introductory section that Gremlin is Gremlin is Gremlin, meaning that irrespective of programming language, graph system, etc. the Gremlin written is always of the same general construct making it possible for users to move between development languages and TinkerPop-enabled graph technology easily. This quality of Gremlin generally applies to the traversal language itself. It applies less to the way in which the user connects to a graph to utilize Gremlin, which might differ considerably depending on the programming language or graph database chosen.
How one connects to a graph is a multi-faceted subject that essentially divides along simple lines determined by the answer to this question: Where is the Gremlin Traversal Machine (GTM)? This question is important because the GTM is responsible for processing traversals. One can write Gremlin traversals in any language, but without a GTM there is no way to execute them against a TinkerPop-enabled graph. The GTM is typically in one of the following places:
- Embedded
- Gremlin Server
- Remote Gremlin Provider
The following sections outline each of these models and what impact they have on using Gremlin.
Embedded
TinkerPop maintains the reference implementation for the GTM, which is written in Java and thus available for the Java Virtual Machine (JVM). This is the classic model that TinkerPop has long been based on and many examples, blog posts and other resources on the internet will be demonstrated in this style. It is worth noting that the embedded mode is not restricted to just Java as a programming language. Any JVM language can take this approach and in some cases there are language specific wrappers that can help make Gremlin more convenient to use in the style and capability of that language. Examples of these wrappers include gremlin-scala and Ogre (for Clojure).
In this mode, users will start by creating a Graph instance, followed by a GraphTraversalSource which is the class from which Gremlin traversals are spawned. Graphs that allow this sort of direct instantiation are obviously ones that are JVM-based (or have a JVM-based connector) and directly implement TinkerPop interfaces.
Graph graph = TinkerGraph.open();
The "graph" is then used to spawn a GraphTraversalSource as follows and typically, by convention, this variable is named "g":
GraphTraversalSource g = traversal().withEmbedded(graph);
List<Vertex> vertices = g.V().toList();
Note
|
It may be helpful to read the Gremlin Anatomy tutorial, which describes the component parts of Gremlin to get a better understanding of the terminology before proceeding further. |
While the TinkerPop Community strives to ensure consistent behavior among all modes of usage, the embedded mode does provide the greatest level of flexibility and control. There are a number of features that can only work if using a JVM language. The following list outlines a number of these available options:
- Lambdas can be written in the native language which is convenient, however, it will reduce the portability of Gremlin to do so should the need arise to switch away from the embedded mode. See more in the Note on Lambdas Section.
- Any features that involve extending TinkerPop Java interfaces - e.g. VertexProgram, TraversalStrategy, etc. are bound to the JVM. In some cases, these features can be made accessible to non-JVM languages, but they obviously must be initially developed for the JVM.
- Certain built-in TraversalStrategy implementations that rely on lambdas or other JVM-only configurations may not be available for use any other way.
- There are no boundaries put in place by serialization (e.g. GraphSON) as embedded graphs are only dealing with Java objects.
- Greater control of graph transactions.
- Direct access to lower-levels of the API - e.g. "structure" API methods like Vertex and Edge interface methods. As mentioned elsewhere in this documentation, TinkerPop does not recommend direct usage of these methods by end-users.
Gremlin Server
A JVM-based graph may be hosted in TinkerPop’s Gremlin Server. Gremlin Server exposes the graph as an endpoint to which different clients can connect, essentially providing a remote GTM. Gremlin Server supports multiple methods for clients to interface with it:
- Websockets with a custom sub-protocol
  - String-based Gremlin scripts
  - Bytecode-based Gremlin traversals
- HTTP for string-based scripts
Users are encouraged to use the bytecode-based approach with websockets because it allows them to write Gremlin in the language of their choice. Connecting looks somewhat similar to the embedded approach in that there is a need to create a GraphTraversalSource. In the embedded approach, the means for that object’s creation is derived from a Graph object which spawns it. In this case, however, the Graph instance exists only on the server which means that there is no Graph instance to create locally. The approach is to instead create a GraphTraversalSource anonymously with AnonymousTraversalSource and then apply some "remote" options that describe the location of the Gremlin Server to connect to:
// Java
// gremlin-driver module
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
// gremlin-core module
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

GraphTraversalSource g = traversal().withRemote(
    DriverRemoteConnection.using("localhost", 8182));

// Groovy
// gremlin-driver module
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
// gremlin-core module
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

def g = traversal().withRemote(
    DriverRemoteConnection.using('localhost', 8182))

// C#
using Gremlin.Net.Driver.Remote;
using static Gremlin.Net.Process.Traversal.AnonymousTraversalSource;

var g = Traversal().WithRemote(new DriverRemoteConnection("localhost", 8182));

// JavaScript
const gremlin = require('gremlin');
const traversal = gremlin.process.AnonymousTraversalSource.traversal;
const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;

const g = traversal().withRemote(
    new DriverRemoteConnection('ws://localhost:8182/gremlin'));

# Python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

g = traversal().withRemote(
    DriverRemoteConnection('ws://localhost:8182/gremlin'))

// Go
import (
    gremlingo "github.com/apache/tinkerpop/gremlin-go/v3/driver"
)

// check err before using the connection
remote, err := gremlingo.NewDriverRemoteConnection("ws://localhost:8182/gremlin")
g := gremlingo.Traversal_().WithRemote(remote)
As shown in the embedded approach in the previous section, once "g" is defined, writing Gremlin is structurally and conceptually the same irrespective of programming language.
Tip
|
The variable g, the TraversalSource, only needs to be instantiated once and should then be re-used.
|
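The idea behind the bytecode-based approach mentioned earlier in this section is that a client-side traversal does not execute anything itself: it records the chained steps so that they can be shipped to a remote GTM. The sketch below shows that recording idea in plain Python. It is a toy: the `Traversal` class and its `instructions` list are assumptions for illustration and are neither the gremlin-python classes nor the wire format Gremlin Server expects.

```python
class Traversal:
    """Records each chained call as an instruction instead of executing it."""

    def __init__(self, instructions=None):
        self.instructions = list(instructions or [])

    def __getattr__(self, step_name):
        # Only called for attributes that do not exist, so every unknown
        # name (V, has, out, values, ...) is treated as a step name.
        if step_name.startswith("__"):
            raise AttributeError(step_name)

        def add_step(*args):
            # Return a new traversal so the source can be re-used.
            return Traversal(self.instructions + [[step_name, *args]])

        return add_step


g = Traversal()  # stands in for the traversal source
t = g.V().has("name", "marko").out("knows").values("name")

print(t.instructions)
# [['V'], ['has', 'name', 'marko'], ['out', 'knows'], ['values', 'name']]
```

Because every call returns a new object and leaves `g` untouched, the same source can spawn any number of traversals, which is why a single instance is enough.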
Limitations
The previous section on the embedded model outlined a number of areas where it has some advantages that it gains due to the fact that the full GTM is available to the user in the language of its origin, i.e. Java. Some of those items touch upon important concepts to focus on here.
The first of these points is serialization. When Gremlin Server receives a request, the results must be serialized to the form requested by the client and then the client deserializes those into objects native to the language. TinkerPop uses two such formats: GraphBinary and GraphSON. Users should prefer GraphBinary when available in the programming language being used.
A good example of where this matters is the subgraph()-step, which returns a Graph instance as its result. On a JVM-based client, the subgraph returned from the server can be deserialized into an actual Graph instance, which means it is possible to spawn a GraphTraversalSource from it and run local Gremlin traversals on the client side. For non-JVM Gremlin Language Variants there is no local graph to deserialize that result into and no GTM to process Gremlin, so there isn’t much that can be done with such a result.
The second point is related to this issue. As there is no GTM, there is no "structure" API and thus graph elements like Vertex and Edge are "references" only. A "reference" means that they only contain the id and label of the element and not the properties. To be consistent, even JVM-based languages have this limitation when talking to a remote Gremlin Server.
Important
|
Most SQL developers would not write a query as SELECT * FROM table. They would instead write the individual names of the fields they wanted in place of the wildcard. Writing "good" Gremlin is no different in this regard. Prefer explicit property key names in Gremlin unless it is completely impossible to do so.
|
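What a "reference" element means in practice can be sketched with a small round trip in plain Python. The helper names `to_reference`, `encode`, and `decode` are assumptions for this example, and the typed JSON envelope only imitates the general style of GraphSON; it should not be read as the exact payload produced by Gremlin Server.

```python
import json


def to_reference(vertex):
    """Keep only id and label, dropping all properties."""
    return {"id": vertex["id"], "label": vertex["label"]}


def encode(ref):
    """Wrap a reference vertex in a typed JSON envelope (illustrative only)."""
    return json.dumps({
        "@type": "g:Vertex",
        "@value": {
            "id": {"@type": "g:Int32", "@value": ref["id"]},
            "label": ref["label"],
        },
    })


def decode(text):
    """Rebuild a client-side object from the envelope."""
    value = json.loads(text)["@value"]
    return {"id": value["id"]["@value"], "label": value["label"]}


# On the server the vertex has properties ...
server_vertex = {"id": 1, "label": "person",
                 "properties": {"name": "marko", "age": 29}}

# ... but only the reference travels to the client.
wire = encode(to_reference(server_vertex))
client_vertex = decode(wire)

print(client_vertex)                  # {'id': 1, 'label': 'person'}
print("properties" in client_vertex)  # False
```

This is why property values should be requested by key within the traversal itself, for example with values('name'), rather than read from a returned element.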
The third and final point involves transactions. Under this model, one traversal is equivalent to a single transaction and there is no way in TinkerPop to string together multiple traversals into the same transaction.
Remote Gremlin Provider
Remote Gremlin Providers (RGPs) are showing up more and more often in the graph database space. In TinkerPop terms, this category of graph providers is defined by those who simply support the Gremlin language. Typically, these are server-based graphs, often cloud-based, which accept Gremlin scripts or bytecode as a request and return results. They will often implement Gremlin Server protocols, which enables TinkerPop drivers to connect to them as they would with Gremlin Server. Therefore, the typical connection approach is identical to the method of connection presented in the previous section with the exact same caveats pointed out toward the end.
Although leveraging TinkerPop protocols and drivers is typical, RGPs are not required to do so to be considered TinkerPop-enabled. RGPs may well have their own drivers and protocols that plug into Gremlin Language Variants and may allow for more advanced options like better security, cluster awareness, batched requests or other features. The details of these different systems are outside the scope of this documentation, so be sure to consult their documentation for more information.
Basic Gremlin
The GraphTraversalSource is basically the connection to a graph instance. That graph instance might be embedded, hosted in Gremlin Server or hosted in an RGP, but the GraphTraversalSource is agnostic to that. Assuming "g" is the GraphTraversalSource, getting data into the graph regardless of programming language or mode of operation is just some basic Gremlin:
gremlin> v1 = g.addV('person').property('name','marko').next()
==>v[0]
gremlin> v2 = g.addV('person').property('name','stephen').next()
==>v[2]
gremlin> g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate()

// Groovy
v1 = g.addV('person').property('name','marko').next()
v2 = g.addV('person').property('name','stephen').next()
g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate()

// C#
var v1 = g.AddV("person").Property("name", "marko").Next();
var v2 = g.AddV("person").Property("name", "stephen").Next();
g.V(v1).AddE("knows").To(v2).Property("weight", 0.75).Iterate();

// Java
Vertex v1 = g.addV("person").property("name","marko").next();
Vertex v2 = g.addV("person").property("name","stephen").next();
g.V(v1).addE("knows").to(v2).property("weight",0.75).iterate();

// JavaScript (next() returns a Promise in gremlin-javascript; await it in real code)
const v1 = g.addV('person').property('name','marko').next();
const v2 = g.addV('person').property('name','stephen').next();
g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate();

# Python
v1 = g.addV('person').property('name','marko').next()
v2 = g.addV('person').property('name','stephen').next()
g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate()

// Go
v1, err := g.AddV("person").Property("name", "marko").Next()
v2, err := g.AddV("person").Property("name", "stephen").Next()
g.V(v1).AddE("knows").To(v2).Property("weight", 0.75).Iterate()
The first two lines add a vertex each with the vertex label of "person" and the associated "name" property. The third line adds an edge with the "knows" label between them and an associated "weight" property. Note the use of next() and iterate() at the end of the lines - their effect as terminal steps is described in The Gremlin Console Tutorial.
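Why a terminal step such as next() or iterate() is needed can be illustrated with a lazy Python generator: nothing runs until results are pulled. The sketch below is a toy; `LazyTraversal`, `add_vertices`, and the `created` list are assumptions for illustration and do not mirror the internals of any Gremlin Language Variant.

```python
class LazyTraversal:
    """Wraps a lazy source; work happens only when a terminal method is called."""

    def __init__(self, source):
        self._it = iter(source)

    def next(self):
        """Pull exactly one result."""
        return next(self._it)

    def to_list(self):
        """Pull all remaining results into a list."""
        return list(self._it)

    def iterate(self):
        """Run to completion for the side effects, discarding the results."""
        for _ in self._it:
            pass
        return self


created = []   # stands in for the graph being mutated


def add_vertices(names):
    for name in names:
        created.append(name)   # the "mutation" happens only when pulled
        yield name


t = LazyTraversal(add_vertices(["marko", "stephen"]))
assert created == []                     # nothing has executed yet

first = t.next()                         # executes just enough for one result
assert first == "marko" and created == ["marko"]

t.iterate()                              # drains the rest
assert created == ["marko", "stephen"]
print(created)
```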
Important
|
Writing Gremlin is just one way to load data into the graph. Some graphs may have special data loaders which could be more efficient and make the task easier and faster. It is worth looking into those tools especially if there is a large one-time load to do. |
Retrieving this data is also just a matter of writing a Gremlin statement:
gremlin> marko = g.V().has('person','name','marko').next()
==>v[0]
gremlin> peopleMarkoKnows = g.V().has('person','name','marko').out('knows').toList()
==>v[2]
Groovy
marko = g.V().has('person','name','marko').next()
peopleMarkoKnows = g.V().has('person','name','marko').out('knows').toList()
C#
var marko = g.V().Has("person", "name", "marko").Next();
var peopleMarkoKnows = g.V().Has("person", "name", "marko").Out("knows").ToList();
Java
Vertex marko = g.V().has("person","name","marko").next();
List<Vertex> peopleMarkoKnows = g.V().has("person","name","marko").out("knows").toList();
JavaScript
const marko = g.V().has('person','name','marko').next()
const peopleMarkoKnows = g.V().has('person','name','marko').out('knows').toList()
Python
marko = g.V().has('person','name','marko').next()
peopleMarkoKnows = g.V().has('person','name','marko').out('knows').toList()
Go
marko, err := g.V().Has("person", "name", "marko").Next()
peopleMarkoKnows, err := g.V().Has("person", "name", "marko").Out("knows").ToList()
In all these examples presented so far there really isn’t a lot of difference in how the Gremlin itself looks. There are a few language syntax specific odds and ends, but for the most part Gremlin looks like Gremlin in all of the different languages.
The library of Gremlin steps with examples for each can be found in The Traversal Section. This section is meant as a reference guide and will not necessarily provide methods for applying Gremlin to solve particular problems. Please see the aforementioned Tutorials, Recipes, and the Practical Gremlin book for that sort of information.
Note
|
A full list of helpful Gremlin resources can be found on the TinkerPop Compendium page. |
Staying Agnostic
A good deal has been written in these introductory sections on how TinkerPop enables an agnostic approach to building graph applications and that agnosticism is enabled through Gremlin. As good a job as Gremlin can do in this area, it’s evident from the Connecting Gremlin Section that TinkerPop is just an enabler. It does not prevent a developer from making design choices that can limit its protective power.
There are several places to be concerned when considering this issue:
-
Data types - Different graphs will support different types of data. Something like TinkerGraph will accept any JVM object, but another graph like Neo4j has a small tight subset of possible types. Choosing a type that is exotic or perhaps is a custom type that only a specific graph supports might create migration friction should the need arise.
-
Schemas/Indices - TinkerPop does not provide abstractions for schemas and/or index management. Users will work directly with the API of the graph provider. It is considered good practice to attempt to enclose such code in a graph provider specific class or set of classes to isolate or abstract it.
-
Extensions - Graphs may provide extensions to the Gremlin language, which will not be designed to be compatible with other graph providers. There may be a special helper syntax or expressions which can make certain features of that specific graph shine in powerful ways. Using those options is probably recommended, but users should be aware that doing so ties them more tightly to that graph.
-
Graph specific semantics - TinkerPop tries to enforce specific semantics through its test suite which is quite extensive, but some graph providers may not completely respect all the semantics of the Gremlin language or TinkerPop’s model for its APIs. For the most part, that doesn’t make them any less TinkerPop-enabled than another provider that might meet the semantics perfectly. Take care when considering a new graph and pay attention to what it supports and does not support.
-
Graph API - The Graph API (also referred to as the Structure API) is not always accessible to users. Its accessibility is dependent on the choice of graph system and programming language. It is therefore recommended that users avoid usage of methods like Graph.addVertex() or Vertex.properties() and instead prefer use of Gremlin with g.addV() or g.V(1).properties().
Outside of considering these points, the best practice for ensuring the greatest level of compatibility across graphs
is to avoid embedded mode and stick to the bytecode based approaches explained in the
Gremlin Server and the RGP sections above. It creates the least
opportunity to stray from the agnostic path as anything that can be done with those two modes also works in embedded
mode. If using embedded mode, simply write code as though the Graph instance is "remote" and not local to the JVM.
In other words, write code as though the GTM is not available locally. Taking that approach and isolating the points
of concern above makes it so that swapping graph providers largely comes down to a configuration task (i.e. modifying
configuration files to point at a different graph system).
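As a rough illustration of the isolation advice for schemas and indices, the following Python sketch keeps provider-specific setup behind one small seam and selects the implementation from configuration. Everything here is hypothetical and is not TinkerPop API: the class names, the "graph.provider" configuration key, and the index name are assumptions made only for the example.

```python
from abc import ABC, abstractmethod


class SchemaSetup(ABC):
    """Hypothetical seam that hides provider-specific schema/index calls."""

    @abstractmethod
    def ensure_indices(self):
        """Create indices through the provider's own API; return their names."""


class InMemorySchemaSetup(SchemaSetup):
    # Stand-in for a provider with no index management: nothing to do.
    def ensure_indices(self):
        return []


class IndexedProviderSchemaSetup(SchemaSetup):
    # Stand-in for a provider with its own index API (real calls omitted).
    def ensure_indices(self):
        return ["person.name"]


# Hypothetical registry keyed by a configuration value.
PROVIDERS = {
    "in-memory": InMemorySchemaSetup,
    "indexed": IndexedProviderSchemaSetup,
}


def schema_setup_for(config):
    # Swapping providers becomes a configuration change, not a code change.
    return PROVIDERS[config["graph.provider"]]()
```

Application code would call only schema_setup_for(config).ensure_indices(), so the rest of the code base never touches a provider's schema API directly.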
The Graph
The Introduction discussed the diversity of TinkerPop-enabled graphs, with special attention paid to the
different connection models, and how TinkerPop makes it possible to bridge that diversity in
an agnostic manner. This particular section deals with elements of the Graph API which was noted
as an API to avoid when trying to build an agnostic system. The Graph API refers to the core elements of what composes
the structure of a graph within the Gremlin Traversal Machine (GTM), such as the Graph, Vertex and Edge Java interfaces.
To maintain the most portable code, users should only reference these interfaces. To "reference" an interface simply means to use it as a pointer. For Graph, that means holding a pointer to the location of graph data and then using it to spawn GraphTraversalSource instances so as to write Gremlin:
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('person')
==>v[0]
In the above example, "graph" is the Graph interface produced by calling open() on TinkerGraph which creates the instance. Note that while the end intent of the code is to create a "person" vertex, it does not use the APIs on Graph to do that - e.g. graph.addVertex(T.label,'person').
Even if the developer desired to use the graph.addVertex() method there are only a handful of scenarios where it is possible:
-
The application is being developed on the JVM and the developer is using embedded mode
-
The architecture includes Gremlin Server and the user is sending Gremlin scripts to the server
-
The graph system chosen is a Remote Gremlin Provider and they expose the Graph API via scripts
Note that Gremlin Language Variants force developers to use the Graph API by reference. There is no addVertex() method available to GLVs on their respective Graph instances, nor are their graph elements filled with data at the call of properties(). Developing applications to meet this lowest common denominator in API usage will go a long way to making that application portable across TinkerPop-enabled systems.
When considering the remaining sub-sections that follow, recall that they are all generally bound to the Graph API. They are described here for reference and in some sense backward compatibility with older recommended models of development. In the future, the contents of this section will become less and less relevant.
Features
A Feature implementation describes the capabilities of a Graph instance. This interface is implemented by graph system providers for two purposes:
-
It tells users the capabilities of their Graph instance.
-
It allows the features they do comply with to be tested against the Gremlin Test Suite (tests for features they do not comply with are "ignored").
The following example in the Gremlin Console shows how to print all the features of a Graph:
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.features()
==>FEATURES
> GraphFeatures
>-- Transactions: false
>-- Computer: true
>-- Persistence: true
>-- ConcurrentAccess: false
>-- ThreadedTransactions: false
>-- IoRead: true
>-- IoWrite: true
>-- OrderabilitySemantics: true
>-- ServiceCall: true
> VariableFeatures
>-- Variables: true
>-- BooleanValues: true
>-- ByteValues: true
>-- DoubleValues: true
>-- FloatValues: true
>-- IntegerValues: true
>-- LongValues: true
>-- MapValues: true
>-- MixedListValues: true
>-- SerializableValues: true
>-- StringValues: true
>-- UniformListValues: true
>-- BooleanArrayValues: true
>-- ByteArrayValues: true
>-- DoubleArrayValues: true
>-- FloatArrayValues: true
>-- IntegerArrayValues: true
>-- StringArrayValues: true
>-- LongArrayValues: true
> VertexFeatures
>-- MetaProperties: true
>-- Upsert: false
>-- AddVertices: true
>-- RemoveVertices: true
>-- MultiProperties: true
>-- DuplicateMultiProperties: true
>-- NullPropertyValues: false
>-- UserSuppliedIds: true
>-- AddProperty: true
>-- RemoveProperty: true
>-- NumericIds: true
>-- StringIds: true
>-- UuidIds: true
>-- CustomIds: false
>-- AnyIds: true
> VertexPropertyFeatures
>-- NullPropertyValues: false
>-- UserSuppliedIds: true
>-- RemoveProperty: true
>-- NumericIds: true
>-- StringIds: true
>-- UuidIds: true
>-- CustomIds: false
>-- AnyIds: true
>-- Properties: true
>-- BooleanValues: true
>-- ByteValues: true
>-- DoubleValues: true
>-- FloatValues: true
>-- IntegerValues: true
>-- LongValues: true
>-- MapValues: true
>-- MixedListValues: true
>-- SerializableValues: true
>-- StringValues: true
>-- UniformListValues: true
>-- BooleanArrayValues: true
>-- ByteArrayValues: true
>-- DoubleArrayValues: true
>-- FloatArrayValues: true
>-- IntegerArrayValues: true
>-- StringArrayValues: true
>-- LongArrayValues: true
> EdgeFeatures
>-- AddEdges: true
>-- RemoveEdges: true
>-- Upsert: false
>-- NullPropertyValues: false
>-- UserSuppliedIds: true
>-- AddProperty: true
>-- RemoveProperty: true
>-- NumericIds: true
>-- StringIds: true
>-- UuidIds: true
>-- CustomIds: false
>-- AnyIds: true
> EdgePropertyFeatures
>-- Properties: true
>-- BooleanValues: true
>-- ByteValues: true
>-- DoubleValues: true
>-- FloatValues: true
>-- IntegerValues: true
>-- LongValues: true
>-- MapValues: true
>-- MixedListValues: true
>-- SerializableValues: true
>-- StringValues: true
>-- UniformListValues: true
>-- BooleanArrayValues: true
>-- ByteArrayValues: true
>-- DoubleArrayValues: true
>-- FloatArrayValues: true
>-- IntegerArrayValues: true
>-- StringArrayValues: true
>-- LongArrayValues: true
A common pattern for using features is to check their support prior to performing an operation:
gremlin> graph.features().graph().supportsTransactions()
==>false
gremlin> graph.features().graph().supportsTransactions() ? g.tx().commit() : "no tx"
==>no tx
Tip
|
To ensure provider agnostic code, always check feature support prior to usage of a particular function. In that way, the application can behave gracefully in case a particular implementation is provided at runtime that does not support a function being accessed. |
Warning
|
Features of reference graphs which are used to connect to remote graphs do not reflect the features of the graph to which they connect. They reflect the features of the instantiated reference graph itself, which will likely be quite different considering that reference graphs will typically be immutable. |
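The check-before-use pattern shown earlier in this section can be sketched outside the console as well. The Python below is only an illustration under stated assumptions: GraphFeatures is a hypothetical stand-in for the value returned by graph.features().graph(), and commit is any callable supplied by the application. Neither is TinkerPop API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphFeatures:
    """Hypothetical stand-in for the result of graph.features().graph()."""
    supports_transactions: bool


def finish_unit_of_work(features, commit):
    # Only call the optional capability when the provider advertises it.
    if features.supports_transactions:
        commit()
        return "committed"
    # Degrade gracefully instead of failing at runtime.
    return "no tx"
```

With a provider that reports no transaction support, the commit callable is never invoked and the function returns "no tx", mirroring the ternary in the console example.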
Vertex Properties
TinkerPop introduces the concept of a VertexProperty<V>. All the properties of a Vertex are a VertexProperty. A VertexProperty implements Property and as such, it has a key/value pair. However, VertexProperty also implements Element and thus, can have a collection of key/value pairs. Moreover, while an Edge can only have one property of key "name" (for example), a Vertex can have multiple "name" properties. With the inclusion of vertex properties, two features are introduced which ultimately advance the graph modeler's toolkit:
-
Multiple properties (multi-properties): a vertex property key can have multiple values. For example, a vertex can have multiple "name" properties.
-
Properties on properties (meta-properties): a vertex property can have properties (i.e. a vertex property can have key/value data associated with it).
Possible use cases for meta-properties:
-
Permissions: Vertex properties can have key/value ACL-type permission information associated with them.
-
Auditing: When a vertex property is manipulated, it can have key/value information attached to it saying who the creator, deletor, etc. are.
-
Provenance: The "name" of a vertex can be declared by multiple users. For example, there may be multiple spellings of a name from different sources.
A running example using vertex properties is provided below to demonstrate and explain the API.
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> v = g.addV().property('name','marko').property('name','marko a. rodriguez').next()
==>v[0]
gremlin> g.V(v).properties('name').count() //// (1)
==>2
gremlin> v.property(list, 'name', 'm. a. rodriguez') //// (2)
==>vp[name->m. a. rodriguez]
gremlin> g.V(v).properties('name').count()
==>3
gremlin> g.V(v).properties()
==>vp[name->marko]
==>vp[name->marko a. rodriguez]
==>vp[name->m. a. rodriguez]
gremlin> g.V(v).properties('name')
==>vp[name->marko]
==>vp[name->marko a. rodriguez]
==>vp[name->m. a. rodriguez]
gremlin> g.V(v).properties('name').hasValue('marko')
==>vp[name->marko]
gremlin> g.V(v).properties('name').hasValue('marko').property('acl','private') //// (3)
==>vp[name->marko]
gremlin> g.V(v).properties('name').hasValue('marko a. rodriguez')
==>vp[name->marko a. rodriguez]
gremlin> g.V(v).properties('name').hasValue('marko a. rodriguez').property('acl','public')
==>vp[name->marko a. rodriguez]
gremlin> g.V(v).properties('name').has('acl','public').value()
==>marko a. rodriguez
gremlin> g.V(v).properties('name').has('acl','public').drop() //// (4)
gremlin> g.V(v).properties('name').has('acl','public').value()
gremlin> g.V(v).properties('name').has('acl','private').value()
==>marko
gremlin> g.V(v).properties()
==>vp[name->marko]
==>vp[name->m. a. rodriguez]
gremlin> g.V(v).properties().properties() //// (5)
==>p[acl->private]
gremlin> g.V(v).properties().property('date',2014) //// (6)
==>vp[name->marko]
==>vp[name->m. a. rodriguez]
gremlin> g.V(v).properties().property('creator','stephen')
==>vp[name->marko]
==>vp[name->m. a. rodriguez]
gremlin> g.V(v).properties().properties()
==>p[date->2014]
==>p[creator->stephen]
==>p[acl->private]
==>p[date->2014]
==>p[creator->stephen]
gremlin> g.V(v).properties('name').valueMap()
==>[date:2014,creator:stephen,acl:private]
==>[date:2014,creator:stephen]
gremlin> g.V(v).property('name','okram') //// (7)
==>v[0]
gremlin> g.V(v).properties('name')
==>vp[name->okram]
gremlin> g.V(v).values('name') //// (8)
==>okram
-
A vertex can have zero or more properties with the same key associated with it.
-
If a property is added with a cardinality of Cardinality.list, an additional property with the provided key will be added.
-
A vertex property can have standard key/value properties attached to it.
-
Vertex property removal is identical to property removal.
-
Gets the meta-properties of each vertex property.
-
A vertex property can have any number of key/value properties attached to it.
-
property(…) will remove all existing properties with that key before adding the new single property (see VertexProperty.Cardinality).
-
If only the value of a property is needed, then values() can be used.
If the concept of vertex properties is difficult to grasp, then it may be best to think of vertex properties in terms of "literal vertices." A vertex can have an edge to a "literal vertex" that has a single value key/value — e.g. "value=okram." The edge that points to that literal vertex has an edge-label of "name." The properties on the edge represent the literal vertex’s properties. The "literal vertex" can not have any other edges to it (only one from the associated vertex).
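For readers who prefer a data-structure view, the Python sketch below models multi-properties and meta-properties in a few lines. It is a toy model, not TinkerPop code: the class names merely echo the interfaces discussed above, only the "single" and "list" cardinalities are modeled, and defaulting to "single" is an assumption of this sketch rather than a statement about any particular graph.

```python
from dataclasses import dataclass, field


@dataclass
class VertexProperty:
    key: str
    value: object
    meta: dict = field(default_factory=dict)  # meta-properties (key/value)


class Vertex:
    def __init__(self):
        self.props = []  # VertexProperty objects; a key may appear many times

    def property(self, key, value, cardinality="single"):
        if cardinality == "single":
            # single: drop every existing property with this key first
            self.props = [p for p in self.props if p.key != key]
        elif cardinality != "list":
            raise ValueError("this sketch models only 'single' and 'list'")
        vp = VertexProperty(key, value)
        self.props.append(vp)  # list: simply add another property
        return vp

    def values(self, key):
        return [p.value for p in self.props if p.key == key]


v = Vertex()
v.property("name", "marko", "list")
v.property("name", "marko a. rodriguez", "list").meta["acl"] = "public"
names_before = v.values("name")   # two "name" properties
v.property("name", "okram")       # single cardinality replaces both
names_after = v.values("name")
```

The meta dictionary plays the role of the properties attached to the vertex property, which is the same information the "literal vertex" analogy places on the edge.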
Tip
|
A toy graph demonstrating all of the new TinkerPop graph structure features is available at
TinkerFactory.createTheCrew() and data/tinkerpop-crew* . This graph demonstrates multi-properties and meta-properties.
|
gremlin> g.V().as('a').
properties('location').as('b').
hasNot('endTime').as('c').
select('a','b','c').by('name').by(value).by('startTime') // determine the current location of each person
==>[a:marko,b:santa fe,c:2005]
==>[a:stephen,b:purcellville,c:2006]
==>[a:matthias,b:seattle,c:2014]
==>[a:daniel,b:aachen,c:2009]
gremlin> g.V().has('name','gremlin').inE('uses').
order().by('skill',asc).as('a').
outV().as('b').
select('a','b').by('skill').by('name') // rank the users of gremlin by their skill level
==>[a:3,b:matthias]
==>[a:4,b:marko]
==>[a:5,b:stephen]
==>[a:5,b:daniel]
Graph Variables
Graph.Variables are key/value pairs associated with the graph itself (in essence, a Map<String,Object>). These variables are intended to store metadata about the graph. Example use cases include:
-
Schema information: What do the namespace prefixes resolve to and when was the schema last modified?
-
Global permissions: What are the access rights for particular groups?
-
System user information: Who are the admins of the system?
An example of graph variables in use is presented below:
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.variables()
==>variables[size:0]
gremlin> graph.variables().set('systemAdmins',['stephen','peter','pavel'])
==>null
gremlin> graph.variables().set('systemUsers',['matthias','marko','josh'])
==>null
gremlin> graph.variables().keys()
==>systemAdmins
==>systemUsers
gremlin> graph.variables().get('systemUsers')
==>Optional[[matthias, marko, josh]]
gremlin> graph.variables().get('systemUsers').get()
==>matthias
==>marko
==>josh
gremlin> graph.variables().remove('systemAdmins')
==>null
gremlin> graph.variables().keys()
==>systemUsers
Important
|
Graph variables are not intended to be subject to heavy, concurrent mutation nor to be used in complex computations. The intention is to have a location to store data about the graph for administrative purposes. |
Warning
|
Attempting to set graph variables in a reference graph will not promote them to the remote graph. Typically, a reference graph has immutable features and will not support this feature. |
Namespace Conventions
End users, graph system providers, GraphComputer algorithm designers, GremlinPlugin creators, etc. all leverage properties on elements to store information. There are a few conventions that should be respected when naming property keys to ensure that these stakeholders do not conflict with one another.
-
End users are granted the flat namespace (e.g. name, age, location) to key their properties and label their elements.
-
Graph system providers are granted the hidden namespace (e.g. ~metadata) to key their properties and labels. Data keyed as such is only accessible via the graph system implementation and no other stakeholders are granted read or write access to data prefixed with "~" (see Graph.Hidden). Test coverage and exceptions exist to ensure that graph systems respect this hard boundary.
-
VertexProgram and MapReduce developers should leverage qualified namespaces particular to their domain (e.g. mydomain.myvertexprogram.computedata).
-
GremlinPlugin creators should prefix their plugin name with their domain (e.g. mydomain.myplugin).
Important
|
TinkerPop uses tinkerpop. and gremlin. as the prefixes for provided strategies, vertex programs, map
reduce implementations, and plugins.
|
The only truly protected namespace is the hidden namespace provided to graph systems. From there, it’s up to engineers to respect the namespacing conventions presented.
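Since most of these conventions are enforced only by discipline, a project may want a small check of its own. The Python sketch below classifies a key according to the conventions listed above. It is a hypothetical helper, not part of TinkerPop; treating any key that contains a "." as "qualified" and labeling the tinkerpop. and gremlin. prefixes as "reserved" are simplifying assumptions of this example.

```python
HIDDEN_PREFIX = "~"
RESERVED_PREFIXES = ("tinkerpop.", "gremlin.")


def classify_key(key):
    """Return the namespace a property key or name falls into."""
    if key.startswith(HIDDEN_PREFIX):
        return "hidden"      # graph system providers only
    if key.startswith(RESERVED_PREFIXES):
        return "reserved"    # prefixes TinkerPop uses for what it provides
    if "." in key:
        return "qualified"   # e.g. vertex programs, map reduce, plugins
    return "flat"            # end users
```

For example, classify_key("name") yields "flat" while classify_key("mydomain.myplugin") yields "qualified", so application code could reject end-user keys that land in any other category.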
The Traversal
At the most general level there is Traversal<S,E> which implements Iterator<E>, where the S stands for start and the E stands for end. A traversal is composed of four primary components:
-
Step<S,E>: an individual function applied to S to yield E. Steps are chained within a traversal.
-
TraversalStrategy: interceptor methods to alter the execution of the traversal (e.g. query re-writing).
-
TraversalSideEffects: key/value pairs that can be used to store global information about the traversal.
-
Traverser<T>: the object propagating through the Traversal currently representing an object of type T.
The classic notion of a graph traversal is provided by GraphTraversal<S,E> which extends Traversal<S,E>. GraphTraversal provides an interpretation of the graph data in terms of vertices, edges, etc. and thus, a graph traversal DSL.
A GraphTraversal<S,E> is spawned from a GraphTraversalSource. It can also be spawned anonymously (i.e. empty) via __. A graph traversal is composed of an ordered list of steps. All the steps provided by GraphTraversal inherit from the more general forms diagrammed above. A list of all the steps (and their descriptions) is provided in the TinkerPop GraphTraversal JavaDoc.
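The idea of a traversal as an iterator built from an ordered chain of steps can be illustrated with a deliberately small Python model. This is a sketch of the concept only, not how TinkerPop is implemented: the Traversal class, map_step and filter_step are hypothetical names, and strategies, side-effects and traversers are left out entirely.

```python
class Traversal:
    """Toy traversal: an iterator produced by chaining steps over a start."""

    def __init__(self, start, steps):
        self._it = iter(start)
        for step in steps:            # chain the steps in order
            self._it = step(self._it)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)         # results are pulled lazily, one at a time


def map_step(fn):
    # A step that transforms each incoming object.
    return lambda it: (fn(x) for x in it)


def filter_step(pred):
    # A step that lets only matching objects through.
    return lambda it: (x for x in it if pred(x))


t = Traversal([1, 2, 3, 4], [filter_step(lambda x: x % 2 == 0),
                             map_step(lambda x: x * 10)])
first = next(t)    # 20
rest = list(t)     # [40]
```

Nothing is computed until next() is called, which loosely mirrors why terminal steps such as next() and iterate() are needed to actually execute a traversal.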
Important
|
The basics for starting a traversal are described in The Graph Process section as well as in the Getting Started tutorial. |
Note
|
To reduce the verbosity of the expression, it is good to
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.* . This way, instead of doing __.inE()
for an anonymous traversal, it is possible to simply write inE() . Be aware of language-specific reserved keywords
when using anonymous traversals. For example, in and as are reserved keywords in Groovy, therefore you must use
the verbose syntax __.in() and __.as() to avoid collisions.
|
Important
|
The underlying Step implementations provided by TinkerPop should encompass most of the functionality
required by a DSL author. It is important that DSL authors leverage the provided steps as then the common optimization
and decoration strategies can reason on the underlying traversal sequence. If new steps are introduced, then common
traversal strategies may not function properly.
|
Traversal Transactions
A database transaction represents a unit of work to execute against the database. A traversal's unit of work is affected by usage convention (i.e. the method of connecting) and the graph provider’s transaction model. Without diving deeply into different conventions and models, the most general and recommended approach to working with transactions is demonstrated as follows:
GraphTraversalSource g = traversal().withEmbedded(graph);
// or, for a remote connection:
// GraphTraversalSource g = traversal().withRemote(conn);
Transaction tx = g.tx();
// spawn a GraphTraversalSource from the Transaction. Traversals spawned
// from gtx will essentially be bound to tx
GraphTraversalSource gtx = tx.begin();
try {
    gtx.addV("person").iterate();
    gtx.addV("software").iterate();
    tx.commit();
} catch (Exception ex) {
    tx.rollback();
}
The above example is straightforward and represents a good starting point for discussing the nuances of transactions in relation to the usage convention and graph provider caveats alluded to earlier.
Focusing on remote contexts first, note that it is still possible to issue traversals from g, but those will have a transaction scope outside of gtx and will simply commit() on the server if successfully executed or rollback() on the server otherwise (i.e. one traversal is one transaction). Each isolated transaction will require its own Transaction object. Multiple begin() calls on the same Transaction object will produce GraphTraversalSource instances that are bound to the same transaction, therefore:
GraphTraversalSource g = traversal().withRemote(conn);
Transaction tx1 = g.tx();
Transaction tx2 = g.tx();
// both gtx1a and gtx1b will be bound to the same transaction
GraphTraversalSource gtx1a = tx1.begin();
GraphTraversalSource gtx1b = tx1.begin();
// g and gtx2 will not have knowledge of what happens in tx1
GraphTraversalSource gtx2 = tx2.begin();
In remote cases, GraphTraversalSource instances spawned from begin() are safe to use in multiple threads though on the server side they will be processed serially as they arrive. The default behavior of close() on a Transaction for remote cases is to commit(), so the following re-write of the earlier example is also valid:
// note here that we dispense with creating a Transaction object and
// simply spawn the gtx in a more inline fashion
GraphTraversalSource gtx = g.tx().begin();
try {
    gtx.addV("person").iterate();
    gtx.addV("software").iterate();
    gtx.close();
} catch (Exception ex) {
    // no Transaction variable exists here, so reach it through gtx
    gtx.tx().rollback();
}
Important
|
Transactions with non-JVM languages are always "remote". For specific transaction syntax in a particular language, please see the "Transactions" sub-section of your language of interest in the Gremlin Drivers and Variants section. |
In embedded cases, that initial recommended model for defining transactions holds, but users have more options here on deeper inspection. For embedded use cases (and perhaps even in configuration of a graph instance in Gremlin Server), the type of Transaction object that is returned from g.tx() is an important indicator as to the features of that graph’s transaction model. In most cases, inspection of that object will indicate an instance that derives from the AbstractThreadLocalTransaction class, which means that the transaction is bound to the current thread and therefore all traversals that execute within that thread are tied to that transaction.
A ThreadLocal transaction differs then from the remote case described before because technically any traversal spawned from g or from a Transaction will fall under the same transaction scope. As a result, it is wise, when trying to write context agnostic Gremlin, to follow the more rigid conventions of the initial example.
The sub-sections that follow offer a bit more insight into each of the usage contexts.
Embedded
When on the JVM using an embedded graph, there is considerable flexibility for working with transactions. With the Graph API, transactions are controlled by an implementation of the Transaction interface and that object can be obtained from the Graph interface using the tx() method. It is important to note that the Transaction object does not represent a "transaction" itself. It merely exposes the methods for working with transactions (e.g. committing, rolling back, etc).
Most Graph implementations that supportsTransactions will implement an "automatic" ThreadLocal transaction, which means that when a read or write occurs after the Graph is instantiated, a transaction is automatically started within that thread. There is no need to manually call a method to "create" or "start" a transaction. Simply modify the graph as required and call graph.tx().commit() to apply changes or graph.tx().rollback() to undo them. When the next read or write action occurs against the graph, a new transaction will be started within that current thread of execution.
When using transactions in this fashion, especially in a web application (e.g. an HTTP server), it is important to ensure that transactions do not leak from one request to the next. In other words, unless a client is somehow bound via session to process every request on the same server thread, every request must be committed or rolled back at the end of the request. Ensuring that each request encapsulates a transaction means that a future request processed on a server thread starts in a fresh transactional state and will not have access to the remains of one from an earlier request. A good strategy is to roll back a transaction at the start of a request, so that if a transactional leak does somehow occur between requests, a fresh transaction is assured by the fresh request.
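The leak scenario and the defensive rollback can be made concrete with a toy model. The Python sketch below is not TinkerPop code: ThreadLocalTx and handle_request are hypothetical names, and the model assumes only that pending work is stored per thread and that a transaction opens implicitly on the first write.

```python
import threading


class ThreadLocalTx:
    """Toy model of an 'automatic' transaction bound to the current thread."""

    def __init__(self):
        self._local = threading.local()
        self.committed = []

    def _pending(self):
        if not hasattr(self._local, "pending"):
            self._local.pending = []   # opens implicitly, once per thread
        return self._local.pending

    def write(self, item):
        self._pending().append(item)

    def commit(self):
        self.committed.extend(self._pending())
        self._local.pending = []

    def rollback(self):
        self._local.pending = []


def handle_request(tx, items, fail=False):
    tx.rollback()                      # defensive: discard anything leaked earlier
    try:
        for item in items:
            tx.write(item)
        if fail:
            raise RuntimeError("request failed")
        tx.commit()
    except RuntimeError:
        tx.rollback()                  # never leave work pending on this thread
```

If an earlier request left uncommitted writes on the same thread, the rollback at the top of handle_request discards them, so only the current request's items are ever committed.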
Tip
|
The tx() method is on the Graph interface, but it is also available on the TraversalSource spawned from a
Graph . Calls to TraversalSource.tx() are proxied through to the underlying Graph as a convenience.
|
Tip
|
Some graphs may throw an exception that implements TemporaryException . In this case, this marker interface is
designed to inform the client that it may choose to retry the operation at a later time for possible success.
|
Warning
|
TinkerPop provides for basic transaction control, however, like many aspects of TinkerPop, it is up to the graph system provider to choose the specific aspects of how their implementation will work and how it fits into the TinkerPop stack. Be sure to understand the transaction semantics of the specific graph implementation that is being utilized as it may present differing functionality than described here. |
Configuring
Determining when a transaction starts is dependent upon the behavior assigned to the Transaction. It is up to the Graph implementation to determine the default behavior and unless the implementation doesn’t allow it, the behavior itself can be altered via these Transaction methods:
public Transaction onReadWrite(Consumer<Transaction> consumer);
public Transaction onClose(Consumer<Transaction> consumer);
Providing a Consumer function to onReadWrite allows definition of how a transaction starts when a read or a write occurs. Transaction.READ_WRITE_BEHAVIOR contains pre-defined Consumer functions to supply to the onReadWrite method. It has two options:
-
AUTO - automatic transactions where the transaction is started implicitly by the read or write operation
-
MANUAL - manual transactions where it is up to the user to explicitly open a transaction, throwing an exception if the transaction is not open
Providing a Consumer function to onClose allows configuration of how a transaction is handled when Transaction.close() is called. Transaction.CLOSE_BEHAVIOR has several pre-defined options that can be supplied to this method:
-
COMMIT - automatically commit an open transaction
-
ROLLBACK - automatically roll back an open transaction
-
MANUAL - throw an exception if a transaction is open, forcing the user to explicitly close the transaction
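The way these pluggable behaviors interact can be sketched with plain functions acting as the consumers. The Python below is a toy model written for illustration, not the TinkerPop implementation: ToyTransaction, TransactionStateError and the function names are hypothetical, and the behaviors are passed in explicitly because the real defaults are chosen by each Graph implementation.

```python
class TransactionStateError(Exception):
    """Raised by the MANUAL behaviors in this sketch."""


# Read/write behaviors: invoked before every read or write.
def auto_read_write(tx):
    if not tx.is_open:
        tx.open()            # AUTO: start a transaction implicitly


def manual_read_write(tx):
    if not tx.is_open:       # MANUAL: the caller must have opened one
        raise TransactionStateError("open a transaction before reading or writing")


# Close behaviors: invoked when close() is called.
def commit_on_close(tx):
    if tx.is_open:
        tx.commit()


def rollback_on_close(tx):
    if tx.is_open:
        tx.rollback()


def manual_on_close(tx):
    if tx.is_open:
        raise TransactionStateError("commit or roll back before closing")


class ToyTransaction:
    def __init__(self, on_read_write, on_close):
        self.is_open = False
        self.log = []
        self._on_read_write = on_read_write
        self._on_close = on_close

    def on_read_write(self, consumer):
        self._on_read_write = consumer
        return self

    def on_close(self, consumer):
        self._on_close = consumer
        return self

    def open(self):
        self.is_open = True

    def commit(self):
        self.log.append("commit")
        self.is_open = False

    def rollback(self):
        self.log.append("rollback")
        self.is_open = False

    def read_write(self):
        self._on_read_write(self)   # called by the graph before each operation

    def close(self):
        self._on_close(self)
```

With auto_read_write and commit_on_close, a read_write() followed by close() opens and then commits a transaction. Swapping in manual_read_write makes the same read_write() call raise unless open() was called first.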
Important
|
As transactions are ThreadLocal in nature, so are the transaction configurations for onReadWrite and onClose.
|
With an understanding of how transactions are configured, most of the rest of the Transaction interface
is self-explanatory. Note that Neo4j-Gremlin is used for the examples that follow, as TinkerGraph does
not support transactions.
Important
|
The following example is meant to demonstrate specific use of ThreadLocal transactions and is at odds
with the more generalized transaction convention that is recommended for both embedded and remote contexts. Please be
sure to understand the preferred approach described in the Traversal Transactions Section before
using this method.
|
gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[neo4jgraph[community single [/tmp/neo4j]], standard]
gremlin> graph.features()
==>FEATURES
> GraphFeatures
>-- Transactions: true //1
>-- Computer: false
>-- Persistence: true
...
gremlin> g.tx().onReadWrite(Transaction.READ_WRITE_BEHAVIOR.AUTO) //2
==>org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph$Neo4jTransaction@1c067c0d
gremlin> g.addV("person").property("name","stephen") //3
==>v[0]
gremlin> g.tx().commit() //4
==>null
gremlin> g.tx().onReadWrite(Transaction.READ_WRITE_BEHAVIOR.MANUAL) //5
==>org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph$Neo4jTransaction@1c067c0d
gremlin> g.tx().isOpen()
==>false
gremlin> g.addV("person").property("name","marko") //6
Open a transaction before attempting to read/write the transaction
gremlin> g.tx().open() //7
==>null
gremlin> g.addV("person").property("name","marko") //8
==>v[1]
gremlin> g.tx().commit()
==>null
- Check features to ensure that the graph supports transactions.
- By default, Neo4jGraph is configured with "automatic" transactions, so it is set here for demonstration purposes only.
- When the vertex is added, the transaction is automatically started. From this point, more mutations can be staged or other read operations executed in the context of that open transaction.
- Calling commit finalizes the transaction.
- Change transaction behavior to require manual control.
- Adding a vertex now results in failure because the transaction was not explicitly opened.
- Explicitly open a transaction.
- Adding a vertex now succeeds as the transaction was manually opened.
Note
|
It may be important to consult the documentation of the Graph implementation you are using when it comes to the
specifics of how transactions will behave. TinkerPop allows some latitude in this area and implementations may not have
the exact same behaviors and ACID guarantees.
|
Gremlin Server
The available capability for transactions with Gremlin Server is dependent upon the method of interaction that is used. The preferred method for interacting with Gremlin Server is via websockets and bytecode based requests. The start of the Transactions Section describes this approach in detail with examples.
Gremlin Server also has the option to accept Gremlin-based scripts. The scripting approach provides access to the Graph API and thus also the transactional model described in the embedded section. Therefore a single script can have the ability to execute multiple transactions per request with complete control provided to the developer to commit or rollback transactions as needed.
There are two methods for sending scripts to Gremlin Server: sessionless and session-based. With sessionless requests there will always be an attempt to close the transaction at the end of the request with a commit if there are no errors or a rollback if there is a failure. It is therefore unnecessary to close transactions manually within scripts themselves. By default, session-based requests do not have this quality. The transaction will be held open on the server until the user closes it manually. There is an option to have automatic transaction management for sessions. More information on this topic can be found in the Considering Transactions Section and the Considering Sessions Section.
Remote Gremlin Providers
At this time, transactional patterns for Remote Gremlin Providers are largely in line with Gremlin Server. As most
RGPs do not expose a Graph instance, access to the lower-level transactional functions available to embedded graphs,
even in a sessionless fashion, is not typically permitted. For example, without a Graph instance it is not possible
to configure transaction close or read-write behaviors. The nature of what a "transaction" means will be dependent
on the RGP, as is the case with any TinkerPop-enabled graph system, so it is important to consult that system’s
documentation for more details.
Configuration Steps
Many of the methods on the GraphTraversalSource are meant to configure the source for usage. These configurations
affect the manner in which traversals are spawned from it. Configuration methods can be identified by their names,
which make use of "with" as a prefix:
With Configuration
The with() configuration adds arbitrary data to a TraversalSource which can then be used by graph providers as
configuration options for a traversal execution. This configuration is similar to the with()-modulator, which
offers the same functionality when applied to an individual step.
g.with('providerDefinedVariable', 0.33).V()
The 0.33 value for the "providerDefinedVariable" will be bound to each traversal spawned that way. Consult the
graph system being used to determine if any such configuration options are available.
WithBulk Configuration
The withBulk() configuration allows for control of bulking operations. This value is true by default, allowing for
normal bulking operations, but when set to false, introduces a subtle change in that behavior as
shown in examples in sack()-step.
WithComputer Configuration
The withComputer() configuration adds a Computer that will be used to process the traversal and is necessary for
OLAP based processing and steps that require that processing. See examples related to SparkGraphComputer
or see examples in the computer required steps, like pageRank() or shortestPath().
WithSack Configuration
The withSack() configuration adds a "sack" that can be accessed by traversals spawned from this source. This
functionality is shown in more detail in the examples for sack()-step.
WithSideEffect Configuration
The withSideEffect() configuration adds an arbitrary Object to traversals spawned from this source which can be
accessed as a side-effect given the supplied key.
gremlin> g.withSideEffect('x',['dog','cat','fish']).
V().has('person','name','marko').select('x').unfold()
==>dog
==>cat
==>fish
WithStrategies Configuration
The withStrategies() configuration allows inclusion of additional TraversalStrategy instances to be applied to
any traversals spawned from the configured source. Please see the Traversal Strategy Section
for more details on how this configuration works.
WithoutStrategies Configuration
The withoutStrategies() configuration removes a particular TraversalStrategy from those to be applied to traversals
spawned from the configured source. Please see the Traversal Strategy Section for more details
on how this configuration works.
Start Steps
Not all steps are capable of starting a GraphTraversal. Only those steps on the GraphTraversalSource can do that.
Many of the methods on GraphTraversalSource are actually for its configuration, and start
steps should not be confused with those.
Spawn steps, which actually yield a traversal, typically match the names of existing steps:
- addE() - Adds an Edge to start the traversal.
- addV() - Adds a Vertex to start the traversal.
- call() - Makes a provider-specific service call to start the traversal.
- E() - Reads edges from the graph to start the traversal.
- inject() - Inserts arbitrary objects to start the traversal.
- mergeE() - Adds an Edge in a "create if not exist" fashion to start the traversal.
- mergeV() - Adds a Vertex in a "create if not exist" fashion to start the traversal.
- union() - Merges the results of an arbitrary number of child traversals to start the traversal.
- V() - Reads vertices from the graph to start the traversal.
Graph Traversal Steps
Gremlin steps are chained together to produce the actual traversal and are triggered by way of start steps
on the GraphTraversalSource.
Important
|
More details about the Gremlin language can be found in the Provider Documentation within the Gremlin Semantics Section. |
General Steps
There are five general steps, each having a traversal and a lambda representation, which all other specific steps described later extend.
Step | Description |
---|---|
map() | map the traverser to some object of type E for the next step to process. |
flatMap() | map the traverser to an iterator of E objects that are streamed to the next step. |
filter() | map the traverser to either true or false, where false will not pass the traverser to the next step. |
sideEffect() | perform some operation on the traverser and pass it to the next step. |
branch() | split the traverser to all the traversals indexed by the M token. |
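The five general steps can be approximated with Python generators to make their differences concrete. This is a conceptual sketch only, not how Gremlin is implemented. The function names and the small `people` list are assumptions made for illustration, with the data loosely following the toy graph used in the examples.

```python
def map_step(traversers, fn):
    """map: each incoming object becomes exactly one outgoing object."""
    for t in traversers:
        yield fn(t)


def flat_map_step(traversers, fn):
    """flatMap: each incoming object becomes zero or more outgoing objects."""
    for t in traversers:
        yield from fn(t)


def filter_step(traversers, predicate):
    """filter: only objects for which the predicate is true continue."""
    for t in traversers:
        if predicate(t):
            yield t


def side_effect_step(traversers, consumer):
    """sideEffect: do something with the object, then pass it on unchanged."""
    for t in traversers:
        consumer(t)
        yield t


def branch_step(traversers, pick, options, default):
    """branch: route each object to the option indexed by pick(t)."""
    for t in traversers:
        yield from options.get(pick(t), default)(t)


people = [
    {"name": "marko", "age": 29, "knows": ["vadas", "josh"]},
    {"name": "vadas", "age": 27, "knows": []},
]

names = list(map_step(people, lambda p: p["name"]))
known = list(flat_map_step(people, lambda p: p["knows"]))
under_28 = list(filter_step(people, lambda p: p["age"] < 28))

seen = []
passed = list(side_effect_step(people, lambda p: seen.append(p["name"])))

branched = list(branch_step(
    people,
    lambda p: p["name"],
    {"marko": lambda p: [p["age"]]},  # option for the token 'marko'
    lambda p: [p["name"]],            # fallback, like option(none, ...)
))
```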
Warning
|
Lambda steps are presented for educational purposes as they represent the foundational constructs of the Gremlin language. In practice, lambda steps should be avoided in favor of their traversal representations, and traversal verification strategies exist to disallow their use unless explicitly "turned off." For more information on the problems with lambdas, please read A Note on Lambdas. |
The Traverser<S> object provides access to:
- The current traversed S object: Traverser.get().
- The current path traversed by the traverser: Traverser.path().
  - A helper shorthand to get a particular path-history object: Traverser.path(String) == Traverser.path().get(String).
- The number of times the traverser has gone through the current loop: Traverser.loops().
- The number of objects represented by this traverser: Traverser.bulk().
- The local data structure associated with this traverser: Traverser.sack().
- The side-effects associated with the traversal: Traverser.sideEffects().
  - A helper shorthand to get a particular side-effect: Traverser.sideEffect(String) == Traverser.sideEffects().get(String).
gremlin> g.V(1).out().values('name') //// (1)
==>lop
==>vadas
==>josh
gremlin> g.V(1).out().map {it.get().value('name')} //// (2)
==>lop
==>vadas
==>josh
gremlin> g.V(1).out().map(values('name')) //// (3)
==>lop
==>vadas
==>josh
- An outgoing traversal from vertex 1 to the name values of the adjacent vertices.
- The same operation, but using a lambda to access the name property values.
- Again the same operation, but using the traversal representation of map().
gremlin> g.V().filter {it.get().label() == 'person'} //// (1)
==>v[1]
==>v[2]
==>v[4]
==>v[6]
gremlin> g.V().filter(label().is('person')) //// (2)
==>v[1]
==>v[2]
==>v[4]
==>v[6]
gremlin> g.V().hasLabel('person') //// (3)
==>v[1]
==>v[2]
==>v[4]
==>v[6]
- A filter that only allows the vertex to pass if it has the "person" label.
- The same operation, but using the traversal representation of filter().
- The more specific has()-step is implemented as a filter() with the respective predicate.
gremlin> g.V().hasLabel('person').sideEffect(System.out.&println) //// (1)
v[1]
==>v[1]
v[2]
==>v[2]
v[4]
==>v[4]
v[6]
==>v[6]
gremlin> g.V().sideEffect(outE().count().aggregate(local,"o")).
sideEffect(inE().count().aggregate(local,"i")).cap("o","i") //// (2)
==>[i:[0,0,1,1,1,3],o:[3,0,0,0,2,1]]
- Whatever enters sideEffect() is passed to the next step, but some intervening process can occur.
- Compute the out- and in-degree for each vertex. Both sideEffect() steps are fed the same vertex.
gremlin> g.V().branch {it.get().value('name')}.
option('marko', values('age')).
option(none, values('name')) //// (1)
==>29
==>vadas
==>lop
==>josh
==>ripple
==>peter
gremlin> g.V().branch(values('name')).
option('marko', values('age')).
option(none, values('name')) //// (2)
==>29
==>vadas
==>lop
==>josh
==>ripple
==>peter
gremlin> g.V().choose(has('name','marko'),
values('age'),
values('name')) //// (3)
==>29
==>vadas
==>lop
==>josh
==>ripple
==>peter
- If the vertex is "marko", get his age, else get the name of the vertex.
- The same operation, but using the traversal representation of branch().
- The more specific boolean-based choose()-step is implemented as a branch().
Terminal Steps
Typically, when a step is concatenated to a traversal a traversal is returned. In this way, a traversal is built up in a fluent, monadic fashion. However, some steps do not return a traversal, but instead, execute the traversal and return a result. These steps are known as terminal steps (terminal) and they are explained via the examples below.
gremlin> g.V().out('created').hasNext() //// (1)
==>true
gremlin> g.V().out('created').next() //// (2)
==>v[3]
gremlin> g.V().out('created').next(2) //// (3)
==>v[3]
==>v[5]
gremlin> g.V().out('nothing').tryNext() //// (4)
==>Optional.empty
gremlin> g.V().out('created').toList() //// (5)
==>v[3]
==>v[5]
==>v[3]
==>v[3]
gremlin> g.V().out('created').toSet() //// (6)
==>v[3]
==>v[5]
gremlin> g.V().out('created').toBulkSet() //// (7)
==>v[3]
==>v[3]
==>v[3]
==>v[5]
gremlin> results = ['blah',3]
==>blah
==>3
gremlin> g.V().out('created').fill(results) //// (8)
==>blah
==>3
==>v[3]
==>v[5]
==>v[3]
==>v[3]
gremlin> g.addV('person').iterate() //// (9)
- hasNext() determines whether there are available results (not supported in gremlin-javascript).
- next() will return the next result.
- next(n) will return the next n results in a list (not supported in gremlin-javascript or Gremlin.NET).
- tryNext() will return an Optional and thus is a composite of hasNext()/next() (only supported for JVM languages).
- toList() will return all results in a list.
- toSet() will return all results in a set, so duplicates are removed (not supported in gremlin-javascript).
- toBulkSet() will return all results in a weighted set, so duplicates are preserved via weighting (only supported for JVM languages).
- fill(collection) will put all results in the provided collection and return the collection when complete (only supported for JVM languages).
- iterate() does not exactly fit the definition of a terminal step in that it doesn’t return a result, but still returns a traversal. It does, however, behave as a terminal step in that it iterates the traversal and generates side effects without returning the actual result.
There is also the promise() terminator step, which can only be used with remote traversals to
Gremlin Server or RGPs. It starts a promise to execute a function on the current Traversal
that will be completed in the future.
Finally, explain()-step is also a terminal step and is described in its own section.
AddE Step
Reasoning is the process of making explicit what is implicit
in the data. What is explicit in a graph are the objects of the graph — i.e. vertices and edges. What is implicit
in the graph is the traversal. In other words, traversals expose meaning where the meaning is determined by the
traversal definition. For example, take the concept of a "co-developer." Two people are co-developers if they have
worked on the same project together. This concept can be represented as a traversal and thus, the concept of
"co-developers" can be derived. Moreover, what was once implicit can be made explicit via the addE()-step
(map/sideEffect).
gremlin> g.V(1).as('a').out('created').in('created').where(neq('a')).
addE('co-developer').from('a').property('year',2009) //// (1)
==>e[0][1-co-developer->4]
==>e[13][1-co-developer->6]
gremlin> g.V(3,4,5).aggregate('x').has('name','josh').as('a').
select('x').unfold().hasLabel('software').addE('createdBy').to('a') //// (2)
==>e[14][3-createdBy->4]
==>e[15][5-createdBy->4]
gremlin> g.V().as('a').out('created').addE('createdBy').to('a').property('acl','public') //// (3)
==>e[16][3-createdBy->1]
==>e[17][5-createdBy->4]
==>e[18][3-createdBy->4]
==>e[19][3-createdBy->6]
gremlin> g.V(1).as('a').out('knows').
addE('livesNear').from('a').property('year',2009).
inV().inE('livesNear').values('year') //// (4)
==>2009
==>2009
gremlin> g.V().match(
__.as('a').out('knows').as('b'),
__.as('a').out('created').as('c'),
__.as('b').out('created').as('c')).
addE('friendlyCollaborator').from('a').to('b').
property(id,23).property('project',select('c').values('name')) //// (5)
==>e[23][1-friendlyCollaborator->4]
gremlin> g.E(23).valueMap()
==>[project:lop]
gremlin> vMarko = g.V().has('name','marko').next()
==>v[1]
gremlin> vPeter = g.V().has('name','peter').next()
==>v[6]
gremlin> g.V(vMarko).addE('knows').to(vPeter) //// (6)
==>e[22][1-knows->6]
gremlin> g.addE('knows').from(vMarko).to(vPeter) //// (7)
==>e[24][1-knows->6]
- Add a co-developer edge with a year-property between marko and his collaborators.
- Add incoming createdBy edges from the josh-vertex to the lop- and ripple-vertices.
- Add an inverse createdBy edge for all created edges.
- The newly created edge is a traversable object.
- Two arbitrary bindings in a traversal can be joined from()→to(), where id can be provided for graphs that support user provided ids.
- Add an edge between marko and peter given the directed (detached) vertex references, starting the traversal at the marko vertex.
- Add the same kind of edge between marko and peter given the directed (detached) vertex references, this time using addE() as a start step with explicit from() and to().
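The "co-developer" reasoning can also be written out without a graph engine, which may help clarify what the first traversal derives. The sketch below is plain Python rather than Gremlin. The `created` dictionary mirrors the toy graph's created edges as they appear in the example output, and `co_developers` is a hypothetical helper name.

```python
# Who created what in the toy graph (explicit data).
created = {
    "marko": {"lop"},
    "josh": {"lop", "ripple"},
    "peter": {"lop"},
}


def co_developers(created_by_person):
    """Derive (a, 'co-developer', b) edges for people sharing a project."""
    edges = set()
    for a, a_projects in created_by_person.items():
        for b, b_projects in created_by_person.items():
            # Skip self-pairs, like where(neq('a')) in the traversal.
            if a != b and a_projects & b_projects:
                edges.add((a, "co-developer", b))
    return edges


edges = co_developers(created)
marko_edges = sorted(e for e in edges if e[0] == "marko")
```

Unlike the traversal, which starts from vertex 1 only, this helper derives the edges for every person. Filtering on the first element recovers marko's two co-developer edges.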
AddV Step
The addV()-step is used to add vertices to the graph (map/sideEffect). For every incoming object, a vertex is
created. Moreover, GraphTraversalSource maintains an addV() method.
gremlin> g.addV('person').property('name','stephen')
==>v[0]
gremlin> g.V().values('name')
==>stephen
==>marko
==>vadas
==>lop
==>josh
==>ripple
==>peter
gremlin> g.V().outE('knows').addV().property('name','nothing')
==>v[13]
==>v[15]
gremlin> g.V().has('name','nothing')
==>v[13]
==>v[15]
gremlin> g.V().has('name','nothing').bothE()
Aggregate Step
The aggregate()-step (sideEffect) is used to aggregate all the objects at a particular point of traversal into a
Collection. The step uses Scope to help determine the aggregating behavior. For global scope this means that
the step will use eager evaluation in that no objects continue on
until all previous objects have been fully aggregated. The eager evaluation model is crucial in situations
where everything at a particular point is required for future computation. When the overload of
aggregate() is called without a Scope, the default is global. An example is provided below.
gremlin> g.V(1).out('created') //// (1)
==>v[3]
gremlin> g.V(1).out('created').aggregate('x') //// (2)
==>v[3]
gremlin> g.V(1).out('created').aggregate(global, 'x') //// (3)
==>v[3]
gremlin> g.V(1).out('created').aggregate('x').in('created') //// (4)
==>v[1]
==>v[4]
==>v[6]
gremlin> g.V(1).out('created').aggregate('x').in('created').out('created') //// (5)
==>v[3]
==>v[5]
==>v[3]
==>v[3]
gremlin> g.V(1).out('created').aggregate('x').in('created').out('created').
where(without('x')).values('name') //// (6)
==>ripple
- What has marko created?
- Aggregate all his creations.
- Identical to the previous line.
- Who are marko’s collaborators?
- What have marko’s collaborators created?
- What have marko’s collaborators created that he hasn’t created?
In recommendation systems, the above pattern is used:
"What has userA liked? Who else has liked those things? What have they liked that userA hasn't already liked?"
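The three questions map onto a few set operations. The following Python sketch shows the pattern outside of Gremlin. The `likes` data and the `recommend` function are hypothetical, and each peer is counted once per candidate item, which is a simplification of what a traversal would do.

```python
from collections import Counter

# Hypothetical "likes" data: user -> set of liked items.
likes = {
    "userA": {"x", "y"},
    "userB": {"x", "z"},
    "userC": {"y", "z", "w"},
    "userD": {"q"},
}


def recommend(likes, user):
    """Score items liked by similar users that `user` has not liked yet."""
    liked = likes[user]  # what has the user liked?
    peers = [u for u, items in likes.items()
             if u != user and items & liked]  # who else liked those things?
    scores = Counter()
    for peer in peers:
        for item in likes[peer] - liked:  # exclude what is already liked
            scores[item] += 1
    return scores


scores = recommend(likes, "userA")
```

The set difference `likes[peer] - liked` plays the part of `where(without('x'))`: the aggregated collection of already-liked items is used to filter later results.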
Finally, aggregate()-step can be modulated via by()-projection.
gremlin> g.V().out('knows').aggregate('x').cap('x')
==>[v[2],v[4]]
gremlin> g.V().out('knows').aggregate('x').by('name').cap('x')
==>[vadas,josh]
gremlin> g.V().out('knows').aggregate('x').by('age').cap('x') //// (1)
==>[27,32]
- The "age" property is not productive for all vertices and therefore those values are not included in the aggregation.
For local scope the aggregation will occur in a lazy fashion.
Note
|
Prior to 3.4.3, local (i.e. lazy) aggregation was handled by the store()-step.
|
gremlin> g.V().aggregate(global, 'x').limit(1).cap('x')
==>[v[1],v[2],v[3],v[4],v[5],v[6]]
gremlin> g.V().aggregate(local, 'x').limit(1).cap('x')
==>[v[1]]
gremlin> g.withoutStrategies(EarlyLimitStrategy).V().aggregate(local,'x').limit(1).cap('x')
==>[v[1],v[2]]
It is important to note that EarlyLimitStrategy, introduced in 3.3.5, alters the behavior of aggregate(local).
Without that strategy (which is installed by default), there are two results in the aggregate() side-effect even
though the interval selection is for 1 object. Realize that when the second object is on its way to the range()
filter (i.e. [0..1]), it passes through aggregate() and is thus stored before being filtered.
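The difference between eager (global) and lazy (local) aggregation can be imitated with Python generators. This is a rough model under stated assumptions, not TinkerPop code: `aggregate_global` and `aggregate_local` are hypothetical names, and `islice` stands in for `limit(1)`. Because `islice` stops pulling as soon as it has one object, the model reproduces the default single-result case rather than the two-result case described above.

```python
from itertools import islice


def aggregate_global(stream, store):
    """Eager: drain the whole upstream into store before emitting anything."""
    items = list(stream)
    store.extend(items)
    yield from items


def aggregate_local(stream, store):
    """Lazy: record each object only as it is pulled through."""
    for item in stream:
        store.append(item)
        yield item


vertices = ["v[1]", "v[2]", "v[3]", "v[4]", "v[5]", "v[6]"]

eager_store = []
eager_first = list(islice(aggregate_global(iter(vertices), eager_store), 1))

lazy_store = []
lazy_first = list(islice(aggregate_local(iter(vertices), lazy_store), 1))
```

Both pipelines emit the same first object, but only the eager one has seen all six vertices by the time the limit is satisfied.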
gremlin> g.E().aggregate(local,'x').by('weight').cap('x')
==>[0.5,1.0,1.0,0.4,0.4,0.2]
All Step
It is possible to filter list traversers using all()-step (filter). Every item in the list will be tested against
the supplied predicate and if all of the items pass then the traverser is passed along the stream, otherwise it is
filtered. Empty lists are passed along but null or non-iterable traversers are filtered out.
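Python
|
The term all conflicts with a name that is already used by Python, and therefore must be referred to in Gremlin with all_(). |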
gremlin> g.V().values('age').fold().all(gt(25)) //// (1)
==>[29,27,32,35]
- Return the list of ages only if everyone’s age is greater than 25.
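The pass/filter rules for all() can be expressed as a small Python generator. This is a sketch of the stated semantics, not the step's implementation. `all_filter` is a hypothetical name, and only lists, tuples, and sets are treated as iterable here for simplicity.

```python
def all_filter(traversers, predicate):
    """Pass a list along only if every item satisfies the predicate.

    Empty lists pass; None and non-iterable values are filtered out.
    """
    for t in traversers:
        if isinstance(t, (list, tuple, set)) and all(predicate(x) for x in t):
            yield t


incoming = [[29, 27, 32, 35], [29, 20], [], None, 7]
passed = list(all_filter(incoming, lambda age: age > 25))
```

Python's built-in `all()` is also true for an empty sequence, which matches the rule that empty lists are passed along.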
And Step
The and()-step ensures that all provided traversals yield a result (filter). Please see or() for or-semantics.
Python
|
The term and is a reserved word in Python, and therefore must be referred to in Gremlin with and_(). |
gremlin> g.V().and(
outE('knows'),
values('age').is(lt(30))).
values('name')
==>marko
The and()-step can take an arbitrary number of traversals. All traversals must produce at least one output for the
original traverser to pass to the next step.
An infix notation can be used as well.
gremlin> g.V().where(outE('created').and().outE('knows')).values('name')
==>marko
Any Step
It is possible to filter list traversers using any()-step (filter). All items in the list will be tested against
the supplied predicate and if any of the items pass then the traverser is passed along the stream, otherwise it is
filtered. Empty lists, null traversers, and non-iterable traversers are filtered out as well.
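Python
|
The term any conflicts with a name that is already used by Python, and therefore must be referred to in Gremlin with any_(). |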
gremlin> g.V().values('age').fold().any(gt(25)) //// (1)
==>[29,27,32,35]
- Return the list of ages if anyone’s age is greater than 25.
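The rules for any() differ from all() mainly in how empty lists are treated. The Python sketch below models the stated semantics only. `any_filter` is a hypothetical name, and only lists, tuples, and sets count as iterable in this simplified version.

```python
def any_filter(traversers, predicate):
    """Pass a list along if at least one item satisfies the predicate.

    Empty lists, None, and non-iterable values are all filtered out.
    """
    for t in traversers:
        if isinstance(t, (list, tuple, set)) and any(predicate(x) for x in t):
            yield t


incoming = [[29, 27, 32, 35], [20, 30], [10, 20], [], None, 7]
passed = list(any_filter(incoming, lambda age: age > 25))
```

An empty list has no item that could satisfy the predicate, so it is dropped here, whereas the all() rules would let it through.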
As Step
The as()-step is not a real step, but a "step modulator" similar to by() and option().
With as(), it is possible to provide a label to the step that can later be accessed by steps and data structures
that make use of such labels, e.g. select(), match(), and path.
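Groovy
|
The term as is a reserved word in Groovy, and therefore, when used as part of an anonymous traversal, must be referred to in Gremlin with the double underscore __.as(). |
Python
|
The term as is a reserved word in Python, and therefore must be referred to in Gremlin with as_(). |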
gremlin> g.V().as('a').out('created').as('b').select('a','b') //// (1)
==>[a:v[1],b:v[3]]
==>[a:v[4],b:v[5]]
==>[a:v[4],b:v[3]]
==>[a:v[6],b:v[3]]
gremlin> g.V().as('a').out('created').as('b').select('a','b').by('name') //// (2)
==>[a:marko,b:lop]
==>[a:josh,b:ripple]
==>[a:josh,b:lop]
==>[a:peter,b:lop]
- Select the objects labeled "a" and "b" from the path.
- Select the objects labeled "a" and "b" from the path and, for each object, project its name value.
A step can have any number of labels associated with it. This is useful for referencing the same step multiple times in a future step.
gremlin> g.V().hasLabel('software').as('a','b','c').
select('a','b','c').
by('name').
by('lang').
by(__.in('created').values('name').fold())
==>[a:lop,b:java,c:[marko,josh,peter]]
==>[a:ripple,b:java,c:[josh]]
AsString Step
The asString()-step (map) returns the value of the incoming traverser as a string. Null values are returned unchanged.
gremlin> g.V().hasLabel('person').values('age').asString() //// (1)
==>29
==>27
==>32
==>35
gremlin> g.V().hasLabel('person').values('age').asString().concat(' years old') //// (2)
==>29 years old
==>27 years old
==>32 years old
==>35 years old
gremlin> g.V().hasLabel('person').values('age').fold().asString(local) //// (3)
==>[29,27,32,35]
- Return ages as strings.
- Return ages as strings and use concat to generate phrases.
- Use Scope.local to operate on the individual elements inside the incoming list, which will return a list of strings.
AsDate Step
The asDate()-step (map) converts string or numeric input to Date.
For string input only ISO-8601 format is supported. For numbers, the value is considered as the number of milliseconds since "the epoch" (January 1, 1970, 00:00:00 GMT). Date input is passed without changes.
If the incoming traverser is not a string, number or Date then an IllegalArgumentException will be thrown.
gremlin> g.inject(1690934400000).asDate() //// (1)
==>Wed Aug 02 00:00:00 UTC 2023
gremlin> g.inject("2023-08-02T00:00:00Z").asDate() //// (2)
==>Wed Aug 02 00:00:00 UTC 2023
gremlin> g.inject(datetime("2023-08-24T00:00:00Z")).asDate() //// (3)
==>Thu Aug 24 00:00:00 UTC 2023
- Convert number to Date.
- Convert ISO-8601 string to Date.
- Pass Date without modification.
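The three conversion rules can be mirrored in Python to show how the inputs relate. This is an analogy rather than the step's implementation: `as_date` is a hypothetical function, `ValueError` stands in for IllegalArgumentException, and `datetime.fromisoformat` accepts only a subset of ISO-8601 on older Python versions, which is why the trailing "Z" is rewritten first.

```python
from datetime import datetime, timezone


def as_date(value):
    """Convert epoch milliseconds or an ISO-8601 string to a UTC datetime."""
    if isinstance(value, datetime):
        return value  # dates pass through unchanged
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Numbers are interpreted as milliseconds since the epoch.
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    if isinstance(value, str):
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    raise ValueError(f"cannot convert {value!r} to a date")


from_millis = as_date(1690934400000)
from_text = as_date("2023-08-02T00:00:00Z")
already_date = datetime(2023, 8, 24, tzinfo=timezone.utc)
unchanged = as_date(already_date)

try:
    as_date(["not", "a", "date"])
    rejected = False
except ValueError:
    rejected = True
```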
Barrier Step
The barrier()-step (barrier) turns the lazy traversal pipeline into a bulk-synchronous pipeline. This step is
useful in the following situations:
- When everything prior to barrier() needs to be executed before moving onto the steps after the barrier() (i.e. ordering).
- When "stalling" the traversal may lead to a "bulking optimization" in traversals that repeatedly touch many of the same elements (i.e. optimizing).
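The ordering effect of the first situation can be imitated with Python generators, where a barrier is simply a stage that drains its input before emitting anything. This is a conceptual sketch, not TinkerPop code. The names `side_effect`, `barrier`, and `run` are hypothetical.

```python
def side_effect(stream, log, tag):
    """Record each object as it passes, then hand it on."""
    for item in stream:
        log.append(f"{tag}: {item}")
        yield item


def barrier(stream):
    """Drain everything upstream before releasing anything downstream."""
    yield from list(stream)


def run(with_barrier):
    log = []
    stream = side_effect(iter(["v[1]", "v[2]"]), log, "first")
    if with_barrier:
        stream = barrier(stream)
    for _ in side_effect(stream, log, "second"):
        pass  # iterate the pipeline purely for its side effects
    return log


lazy_order = run(with_barrier=False)
barrier_order = run(with_barrier=True)
```

Without the barrier, each object flows through both stages before the next one starts. With it, the first stage finishes for every object before the second stage sees any of them.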
gremlin> g.V().sideEffect{println "first: ${it}"}.sideEffect{println "second: ${it}"}.iterate()
first: v[1]
second: v[1]
first: v[2]
second: v[2]
first: v[3]
second: v[3]
first: v[4]
second: v[4]
first: v[5]
second: v[5]
first: v[6]
second: v[6]
gremlin> g.V().sideEffect{println "first: ${it}"}.barrier().sideEffect{println "second: ${it}"}.iterate()
first: v[1]
first: v[2]
first: v[3]
first: v[4]
first: v[5]
first: v[6]
second: v[1]
second: v[2]
second: v[3]
second: v[4]
second: v[5]
second: v[6]
The theory behind a "bulking optimization" is simple. If there are one million traversers at vertex 1, then there is
no need to calculate one million both()-computations. Instead, represent those one million traversers as a single
traverser with a Traverser.bulk() equal to one million and execute both() once. A bulking optimization example is
made more salient on a larger graph. Therefore, the example below leverages the Grateful Dead graph.
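The saving can be seen on a tiny example by counting how many adjacency lookups are needed with and without bulking. The Python sketch below is a model of the idea only. The four-vertex `graph`, `count_unbulked`, and `count_bulked` are assumptions made for illustration and do not reflect TinkerPop's internals.

```python
from collections import Counter

# A small undirected graph as adjacency lists, standing in for both().
graph = {
    1: [2, 3, 4],
    2: [1],
    3: [1, 4],
    4: [1, 3],
}


def count_unbulked(graph, hops):
    """Process every traverser individually; return (result count, lookups)."""
    traversers = list(graph)
    lookups = 0
    for _ in range(hops):
        next_traversers = []
        for vertex in traversers:
            lookups += 1  # one both()-computation per traverser
            next_traversers.extend(graph[vertex])
        traversers = next_traversers
    return len(traversers), lookups


def count_bulked(graph, hops):
    """Merge traversers at the same vertex into one with a bulk count."""
    bulks = Counter({vertex: 1 for vertex in graph})
    lookups = 0
    for _ in range(hops):
        next_bulks = Counter()
        for vertex, bulk in bulks.items():
            lookups += 1  # one both()-computation per distinct vertex
            for neighbor in graph[vertex]:
                next_bulks[neighbor] += bulk
        bulks = next_bulks
    return sum(bulks.values()), lookups


unbulked_count, unbulked_lookups = count_unbulked(graph, 3)
bulked_count, bulked_lookups = count_bulked(graph, 3)
```

Both approaches arrive at the same count, but the bulked version never performs more lookups per hop than there are distinct vertices, which is why the gap widens on larger graphs.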
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.io('data/grateful-dead.xml').read().iterate()
gremlin> g = traversal().withEmbedded(graph).withoutStrategies(LazyBarrierStrategy) //// (1)
==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard]
gremlin> clockWithResult(1){g.V().both().both().both().count().next()} //// (2)
==>6997.450284
==>126653966
gremlin> clockWithResult(1){g.V().repeat(both()).times(3).count().next()} //// (3)
==>7050.683031
==>126653966
gremlin> clockWithResult(1){g.V().both().barrier().both().barrier().both().barrier().count().next()} //// (4)
==>8.607391999999999
==>126653966
- Explicitly remove LazyBarrierStrategy which yields a bulking optimization.
- A non-bulking traversal where each traverser is processed.
- Each traverser entering repeat() has its recursion bulked.
- A bulking traversal where implicit traversers are not processed.
If barrier() is provided an integer argument, then the barrier will only hold n-number of unique traversers in its
barrier before draining the aggregated traversers to the next step. This is useful in the aforementioned bulking
optimization scenario with the added benefit of reducing the risk of an out-of-memory exception.
LazyBarrierStrategy inserts barrier()-steps into a traversal where appropriate in order to gain the
"bulking optimization."
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = traversal().withEmbedded(graph) //// (1)
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.io('data/grateful-dead.xml').read().iterate()
gremlin> clockWithResult(1){g.V().both().both().both().count().next()}
==>5.572671
==>126653966
gremlin> g.V().both().both().both().count().iterate().toString() //// (2)
==>[TinkerGraphStep(vertex,[]), VertexStep(BOTH,vertex), NoOpBarrierStep(2500), VertexStep(BOTH,vertex), NoOpBarrierStep(2500), VertexStep(BOTH,edge), CountGlobalStep, NoneStep]
(1) LazyBarrierStrategy is a default strategy and thus does not need to be explicitly activated.
(2) With LazyBarrierStrategy activated, barrier()-steps are automatically inserted where appropriate.
Branch Step
The branch() step splits the traverser to all the child traversals provided to it. Please see the General Steps section for more information, but also consider that branch() is the basis for more robust steps like choose() and union().
By Step
The by()-step is not an actual step, but instead is a "step-modulator" similar to as() and option(). If a step is able to accept traversals, functions, comparators, etc. then by() is the means by which they are added. The general pattern is step().by()…by(). Some steps can only accept one by() while others can take an arbitrary number.
gremlin> g.V().group().by(bothE().count()) //// (1)
==>[1:[v[2],v[5],v[6]],3:[v[1],v[3],v[4]]]
gremlin> g.V().group().by(bothE().count()).by('name') //// (2)
==>[1:[vadas,ripple,peter],3:[marko,lop,josh]]
gremlin> g.V().group().by(bothE().count()).by(count()) //// (3)
==>[1:3,3:3]
(1) by(bothE().count()) will group the elements by their edge count (traversal).
(2) by('name') will process the grouped elements by their name (element property projection).
(3) by(count()) will count the number of elements in each group (traversal).
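The division of labor between the two by() modulators of group() can be sketched in plain Python. This is not the Gremlin API: the group function, the VERTICES list, and the "degree" field (standing in for bothE().count()) are assumptions for illustration, with the degrees taken from the console output above. The first callable plays the role of the key modulator, the second the role of the value modulator.

```python
from collections import defaultdict

# Toy stand-ins for the six vertices; "degree" mirrors the bothE().count() results above.
VERTICES = [
    {"name": "marko", "degree": 3}, {"name": "vadas", "degree": 1},
    {"name": "lop", "degree": 3}, {"name": "josh", "degree": 3},
    {"name": "ripple", "degree": 1}, {"name": "peter", "degree": 1},
]


def group(items, key_by, value_by=list):
    """Bucket items by key_by, then post-process each bucket with value_by."""
    buckets = defaultdict(list)
    for item in items:
        buckets[key_by(item)].append(item)
    return {key: value_by(members) for key, members in buckets.items()}


def by_degree(vertex):
    return vertex["degree"]


# Second modulator as a projection, like by('name').
names = group(VERTICES, by_degree, lambda members: [m["name"] for m in members])
# Second modulator as a reduction, like by(count()).
sizes = group(VERTICES, by_degree, len)
```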
When a by() modulator does not produce a result, it is deemed "unproductive". An "unproductive" modulator will lead to the filtering of the traverser it is currently working with. The filtering will manifest in various ways depending on the step.
gremlin> g.V().sample(1).by('age') //// (1)
==>v[4]
(1) The "age" property key is not present for all vertices, therefore sample() will exclude (i.e. filter) such vertices from consideration in the sampling.
The following steps all support by()-modulation. Note that the semantics of such modulation differ from step to step and are discussed in each step's section of the documentation.
- aggregate(): aggregate all objects into a set but only store their by()-modulated values.
- cyclicPath(): filter if the traverser’s path is cyclic given by()-modulation.
- dedup(): dedup on the results of a by()-modulation.
- group(): create group keys and values according to by()-modulation.
- groupCount(): count those groups where the group keys are the result of by()-modulation.
- math(): transform a traverser provided to the step by way of the by() modulator before it is processed by it.
- order(): order the objects by the results of a by()-modulation.
- path(): get the path of the traverser where each path element is by()-modulated.
- project(): project a map of results given various by()-modulations off the current object.
- propertyMap(): transform the result of the values in the resulting Map using the by() modulator.
- sack(): provides the transformation for a traverser to a value to be stored in the sack.
- sample(): sample using the value returned by by()-modulation.
- select(): select path elements and transform them via by()-modulation.
- simplePath(): filter if the traverser’s path is simple given by()-modulation.
- tree(): get a tree of traversers objects where the objects have been by()-modulated.
- valueMap(): transform the result of the values in the resulting Map using the by() modulator.
- where(): determine the predicate given the testing of the results of by()-modulation.
Call Step
The call() step allows for custom, provider-specific service calls either at the start of a traversal or mid-traversal. This allows Graph providers to expose operations not natively built into the Gremlin language, such as full text search, custom analytics, notification triggers, etc.
When called with no arguments, call() will produce a list of callable services available for the graph in use. This no-argument version is equivalent to call('--list'). This "directory service" is also capable of producing more verbose output describing all the services or an individual service:
gremlin> g.call() //// (1)
gremlin> g.call('--list') //// (1)
gremlin> g.call().with('verbose') //// (2)
gremlin> g.call().with('verbose').with('service', 'xyz-service') //// (3)
(1) List available services by name
(2) Produce a Map of detailed service information by name
(3) Produce the detailed service information for the 'xyz-service'
The first argument to call() is always the name of the service call. Additionally, service calls can accept both static and dynamically produced parameters. Static parameters can be passed as a Map to the call() as the second argument. Individual static parameters can also be added using the .with() modulator. Dynamic parameters can be passed as a Map-producing Traversal as the second argument (no static parameters) or third argument (static + dynamic parameters). Additional individual dynamic parameters can be added using the .with() modulator.
g.call('xyz-service') //1
g.call('xyz-service', ['a':'b']) //2
g.call('xyz-service').with('a', 'b') //2
g.call('xyz-service', __.inject(['a':'b'])) //3
g.call('xyz-service').with('a', __.inject('b')) //3
g.call('xyz-service', ['a':'b'], __.inject(['c':'d'])) //4
(1) Call the 'xyz-service' with no parameters
(2) Examples of static parameters (constants known before execution)
(3) Examples of dynamic parameters (these will be computed at execution time)
(4) Example of static + dynamic parameters (these will be computed and merged into one set of parameters at execution time)
Cap Step
The cap()-step (barrier) iterates the traversal up to itself and emits the sideEffect referenced by the provided key. If multiple keys are provided, then a Map<String,Object> of sideEffects is emitted.
gremlin> g.V().groupCount('a').by(label).cap('a') //// (1)
==>[software:2,person:4]
gremlin> g.V().groupCount('a').by(label).groupCount('b').by(outE().count()).cap('a','b') //// (2)
==>[a:[software:2,person:4],b:[0:3,1:1,2:1,3:1]]
(1) Group and count vertices by their label. Emit the side effect labeled 'a', which is the group count by label.
(2) Same as statement 1, but also emit the side effect labeled 'b' which groups vertices by the number of out edges.
Choose Step
The choose()-step (branch) routes the current traverser to a particular traversal branch option. With choose(), it is possible to implement if/then/else-semantics as well as more complicated selections.
gremlin> g.V().hasLabel('person').
choose(values('age').is(lte(30)),
__.in(),
__.out()).values('name') //// (1)
==>marko
==>ripple
==>lop
==>lop
gremlin> g.V().hasLabel('person').
choose(values('age')).
option(27, __.in()).
option(32, __.out()).values('name') //// (2)
==>marko
==>ripple
==>lop
(1) If the traversal yields an element, then do in, else do out (i.e. true/false-based option selection).
(2) Use the result of the traversal as a key to the map of traversal options (i.e. value-based option selection).
If the "false"-branch is not provided, then if/then-semantics are implemented.
gremlin> g.V().choose(hasLabel('person'), out('created')).values('name') //// (1)
==>lop
==>lop
==>ripple
==>lop
==>ripple
==>lop
gremlin> g.V().choose(hasLabel('person'), out('created'), identity()).values('name') //// (2)
==>lop
==>lop
==>ripple
==>lop
==>ripple
==>lop
(1) If the vertex is a person, emit the vertices they created, else emit the vertex.
(2) If/then/else with an identity() on the false-branch is equivalent to if/then with no false-branch.
Note that choose() can have an arbitrary number of options and moreover, can take an anonymous traversal as its choice function.
gremlin> g.V().hasLabel('person').
choose(values('name')).
option('marko', values('age')).
option('josh', values('name')).
option('vadas', elementMap()).
option('peter', label())
==>29
==>[id:2,label:person,name:vadas,age:27]
==>josh
==>person
The choose()-step can leverage the Pick.none option match. For anything that does not match a specified option, the none-option is taken.
gremlin> g.V().hasLabel('person').
choose(values('name')).
option('marko', values('age')).
option(none, values('name'))
==>29
==>vadas
==>josh
==>peter
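The value-based option selection with a none fallback can be modelled in a few lines of plain Python. This is a sketch under stated assumptions, not the Gremlin API: the choose function, the NONE sentinel (a stand-in for Pick.none), and the PEOPLE data are hypothetical. The case where nothing matches and no none-option exists is deliberately not modelled.

```python
NONE = object()  # hypothetical stand-in for Pick.none

PEOPLE = [
    {"name": "marko", "age": 29}, {"name": "vadas", "age": 27},
    {"name": "josh", "age": 32}, {"name": "peter", "age": 35},
]


def choose(item, choice, options):
    """Route item to the branch keyed by choice(item), else to the NONE branch."""
    key = choice(item)
    # No match and no NONE entry is not modelled; it simply raises KeyError here.
    branch = options[key] if key in options else options[NONE]
    return branch(item)


OPTIONS = {
    "marko": lambda person: person["age"],  # like option('marko', values('age'))
    NONE: lambda person: person["name"],    # like option(none, values('name'))
}

results = [choose(person, lambda p: p["name"], OPTIONS) for person in PEOPLE]
```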
Coalesce Step
The coalesce()-step evaluates the provided traversals in order and returns the results of the first traversal that emits at least one element.
gremlin> g.V(1).coalesce(outE('knows'), outE('created')).inV().path().by('name').by(label)
==>[marko,knows,vadas]
==>[marko,knows,josh]
gremlin> g.V(1).coalesce(outE('created'), outE('knows')).inV().path().by('name').by(label)
==>[marko,created,lop]
gremlin> g.V(1).property('nickname', 'okram')
==>v[1]
gremlin> g.V().hasLabel('person').coalesce(values('nickname'), values('name'))
==>okram
==>vadas
==>josh
==>peter
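The "first traversal that emits something wins" rule can be sketched in plain Python. The coalesce and values helpers and the PEOPLE data below are assumptions for illustration and are not part of the Gremlin API; each branch is modelled as a function that returns a (possibly empty) list of results.

```python
PEOPLE = [
    {"name": "marko", "nickname": "okram"},
    {"name": "vadas"}, {"name": "josh"}, {"name": "peter"},
]


def coalesce(item, *branches):
    """Return the results of the first branch that yields at least one result."""
    for branch in branches:
        results = list(branch(item))
        if results:
            return results
    return []  # no branch produced anything


def values(key):
    """Branch factory: emit the property value if the key is present."""
    return lambda element: [element[key]] if key in element else []


results = [v for p in PEOPLE for v in coalesce(p, values("nickname"), values("name"))]
```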
Coin Step
To randomly filter out a traverser, use the coin()-step (filter). The provided double argument biases the "coin toss."
gremlin> g.V().coin(0.5)
==>v[2]
==>v[5]
==>v[6]
gremlin> g.V().coin(0.0)
gremlin> g.V().coin(1.0)
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
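A biased coin filter can be sketched in plain Python as an independent random draw per item. The coin function below is a hypothetical illustration, not TinkerPop's implementation; it only reproduces the boundary behavior shown above, where 0.0 keeps nothing and 1.0 keeps everything.

```python
import random


def coin(items, probability, rng=None):
    """Keep each item independently with the given probability."""
    rng = rng or random.Random()
    # rng.random() is in [0.0, 1.0), so 0.0 never passes and 1.0 always passes.
    return [item for item in items if rng.random() < probability]


vertices = [1, 2, 3, 4, 5, 6]
none_kept = coin(vertices, 0.0)
all_kept = coin(vertices, 1.0)
some_kept = coin(vertices, 0.5, random.Random(42))  # seeded for repeatability
```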
Combine Step
The combine()-step (map) combines the elements of the incoming list traverser and the provided list argument into one list. This is also known as appending or concatenating. This step only expects list data (array or Iterable) and will throw an IllegalArgumentException if any other type is encountered (including null). This differs from the merge()-step in that it allows duplicates to exist.
gremlin> g.V().values("name").fold().combine(["james","jen","marko","vadas"])
==>[marko,vadas,lop,josh,ripple,peter,james,jen,marko,vadas]
gremlin> g.V().values("name").fold().combine(__.constant("stephen").fold())
==>[marko,vadas,lop,josh,ripple,peter,stephen]
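The duplicate-preserving behavior can be sketched in plain Python. The combine function below is a hypothetical model, not the Gremlin API, and ValueError stands in for the IllegalArgumentException described above.

```python
def combine(incoming, argument):
    """Append argument to incoming, keeping duplicates."""
    for value in (incoming, argument):
        if not isinstance(value, (list, tuple)):
            # Stand-in for the IllegalArgumentException on non-list data, including None.
            raise ValueError(f"expected list data, got {type(value).__name__}")
    return list(incoming) + list(argument)


NAMES = ["marko", "vadas", "lop", "josh", "ripple", "peter"]
combined = combine(NAMES, ["james", "jen", "marko", "vadas"])
```

Here "marko" and "vadas" each appear twice in the result, matching the first console example.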
Concat Step
The concat()-step (map) concatenates one or more String values together to the incoming String traverser. This step can take either String varargs or Traversal varargs.
Any null String values will be skipped when concatenated with non-null String values. If two null values are concatenated, the null value will be propagated and returned.
If the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.
gremlin> g.addV(constant('prefix_').concat(__.V(1).label())).property(id, 10) //// (1)
==>v[10]
gremlin> g.V(10).label()
==>prefix_person
gremlin> g.V().hasLabel('person').values('name').as('a').
constant('Mr.').concat(__.select('a')) //// (2)
==>Mr.marko
==>Mr.vadas
==>Mr.josh
==>Mr.peter
gremlin> g.V().hasLabel('software').as('a').values('name').
concat(' uses ').
concat(select('a').values('lang')) //// (3)
==>lop uses java
==>ripple uses java
gremlin> g.V(1).outE().as('a').V(1).values('name').
concat(' ').
concat(select('a').label()).
concat(' ').
concat(select("a").inV().values('name')) //// (4)
==>marko created lop
==>marko knows vadas
==>marko knows josh
gremlin> g.V(1).outE().as('a').V(1).values('name').
concat(constant(' '),
select("a").label(),
constant(' '),
select('a').inV().values('name')) //// (5)
==>marko created lop
==>marko knows vadas
==>marko knows josh
gremlin> g.inject('hello', 'hi').concat(__.V().values('name')) //// (6)
==>hellomarko
==>himarko
gremlin> g.inject('This').concat(' ').concat('is a ', 'gremlin.') //// (7)
==>This is a gremlin.
(1) Add a new vertex with id 10 which should be labeled like an existing vertex but with some prefix attached
(2) Attach the prefix "Mr." to all the names using the constant()-step
(3) Generate a string of software names and the language they use
(4) Generate a string description for each of marko’s outgoing edges
(5) Alternative way to generate the string description by using traversal varargs. Use the constant() step to add desired strings between arguments.
(6) The concat() step will append the first result from the child traversal to the incoming traverser
(7) A generic use of concat() to join strings together
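The null-handling rules described for concat() can be sketched in plain Python, with None playing the role of null. The concat function below is a hypothetical model rather than the Gremlin API, and ValueError stands in for IllegalArgumentException. How a null incoming traverser is treated is an assumption here: it is skipped like any other null value.

```python
def concat(incoming, *values):
    """Join incoming with values, skipping None; all-None input yields None."""
    if incoming is not None and not isinstance(incoming, str):
        # Stand-in for the IllegalArgumentException on a non-String traverser.
        raise ValueError("incoming traverser must be a string")
    parts = [part for part in (incoming, *values) if part is not None]
    return "".join(parts) if parts else None


sentence = concat("This", " ", "is a ", "gremlin.")
skipped = concat("marko", None)
propagated = concat(None, None)
```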
Conjoin Step
The conjoin()-step (map) joins the elements in the incoming list traverser together, using the provided argument as a delimiter. The resulting String is added to the Traversal Stream. This step only expects list data (array or Iterable) in the incoming traverser and will throw an IllegalArgumentException if any other type is encountered (including null). Null values are skipped and not included in the result.
gremlin> g.V().values("name").fold().conjoin("+")
==>marko+vadas+lop+josh+ripple+peter
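A plain Python sketch of the same joining rule follows. The conjoin function is a hypothetical model, not the Gremlin API; None stands in for null and ValueError for IllegalArgumentException.

```python
def conjoin(incoming, delimiter):
    """Join list elements with delimiter, skipping None elements."""
    if not isinstance(incoming, (list, tuple)):
        # Stand-in for the IllegalArgumentException on non-list data, including None.
        raise ValueError("expected list data")
    return delimiter.join(str(item) for item in incoming if item is not None)


joined = conjoin(["marko", "vadas", "lop", "josh", "ripple", "peter"], "+")
with_gap = conjoin(["a", None, "b"], "+")
```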
ConnectedComponent Step
The connectedComponent() step performs a computation to identify Connected Component instances in a graph. When this step completes, the vertices will be labelled with a component identifier to denote the component to which they are associated.
Important
The connectedComponent()-step is a VertexComputing-step and as such, can only be used against a graph that supports GraphComputer (OLAP).
gremlin> g = traversal().withEmbedded(graph).withComputer()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
gremlin> g.V().
connectedComponent().
with(ConnectedComponent.propertyName, 'component').
project('name','component').
by('name').
by('component')
==>[name:ripple,component:1]
==>[name:vadas,component:1]
==>[name:peter,component:1]
==>[name:lop,component:1]
==>[name:marko,component:1]
==>[name:josh,component:1]
gremlin> g.V().hasLabel('person').
connectedComponent().
with(ConnectedComponent.propertyName, 'component').
with(ConnectedComponent.edges, outE('knows')).
project('name','component').
by('name').
by('component')
==>[name:peter,component:6]
==>[name:josh,component:1]
==>[name:vadas,component:1]
==>[name:marko,component:1]
Note the use of the with() modulating step, which provides configuration options to the algorithm. It takes configuration keys from the ConnectedComponent class, which is automatically imported to the Gremlin Console.
Constant Step
To specify a constant value for a traverser, use the constant()-step (map). This is often useful with conditional steps like the choose()-step or coalesce()-step.
gremlin> g.V().choose(hasLabel('person'),
values('name'),
constant('inhuman')) //// (1)
==>marko
==>vadas
==>inhuman
==>josh
==>inhuman
==>peter
gremlin> g.V().coalesce(
hasLabel('person').values('name'),
constant('inhuman')) //// (2)
==>marko
==>vadas
==>inhuman
==>josh
==>inhuman
==>peter
(1) Show the names of people, but show "inhuman" for other vertices.
(2) Same as statement 1 (unless there is a person vertex with no name).
Count Step
The count()-step (map) counts the total number of represented traversers in the streams (i.e. the bulk count).
gremlin> g.V().count()
==>6
gremlin> g.V().hasLabel('person').count()
==>4
gremlin> g.V().hasLabel('person').outE('created').count().path() //// (1)
==>[4]
gremlin> g.V().hasLabel('person').outE('created').count().map {it.get() * 10}.path() //// (2)
==>[4,40]
(1) The count()-step is a reducing barrier step, meaning that all of the previous traversers are folded into a new traverser.
(2) The path of the traverser emanating from count() starts at count().
Important
count(local) counts the current, local object (not the objects in the traversal stream). This works for Collection- and Map-type objects. For any other object, a count of 1 is returned.
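The rule for count(local) can be sketched in plain Python. The count_local function is a hypothetical model, not the Gremlin API. One assumption is worth flagging: Python strings are sized containers, but they are treated as scalar values here so that they behave like "any other object".

```python
from collections.abc import Collection


def count_local(obj):
    """Size of a collection or map; 1 for any other object."""
    if isinstance(obj, (str, bytes)):
        return 1  # treated as a scalar value, not as a collection
    if isinstance(obj, Collection):  # lists, sets, tuples and dicts all qualify
        return len(obj)
    return 1
```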
CyclicPath Step
Each traverser maintains its history through the traversal over the graph, i.e. its path. If it is important that the traverser repeat its course, then the cyclicPath()-step should be used (filter). The step analyzes the path of the traverser thus far and if there are no repeats, the traverser is filtered out of the traversal computation. If non-cyclic behavior is desired, see simplePath().
gremlin> g.V(1).both().both()
==>v[1]
==>v[4]
==>v[6]
==>v[1]
==>v[5]
==>v[3]
==>v[1]
gremlin> g.V(1).both().both().cyclicPath()
==>v[1]
==>v[1]
==>v[1]
gremlin> g.V(1).both().both().cyclicPath().path()
==>[v[1],v[3],v[1]]
==>[v[1],v[2],v[1]]
==>[v[1],v[4],v[1]]
gremlin> g.V(1).both().both().cyclicPath().by('age').path() //// (1)
==>[v[1],v[2],v[1]]
==>[v[1],v[4],v[1]]
gremlin> g.V(1).as('a').out('created').as('b').
in('created').as('c').
cyclicPath().
path()
==>[v[1],v[3],v[1]]
gremlin> g.V(1).as('a').out('created').as('b').
in('created').as('c').
cyclicPath().from('a').to('b').
path()
(1) The "age" property is not productive for all vertices and therefore those traversers are filtered.
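The cyclic test itself is simple: a path is cyclic when some object appears in it more than once. The plain Python sketch below is an illustration rather than the Gremlin API. The is_cyclic function is a hypothetical name, and the PATHS list is reconstructed from the both().both() example above, with vertices reduced to their ids.

```python
# Paths of g.V(1).both().both(), written as lists of vertex ids.
PATHS = [
    [1, 3, 1], [1, 3, 4], [1, 3, 6],
    [1, 2, 1],
    [1, 4, 5], [1, 4, 3], [1, 4, 1],
]


def is_cyclic(path):
    """A path is cyclic if any object occurs in it more than once."""
    return len(set(path)) < len(path)


cyclic = [p for p in PATHS if is_cyclic(p)]      # what cyclicPath() keeps
simple = [p for p in PATHS if not is_cyclic(p)]  # what simplePath() keeps
```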
DateAdd Step
The dateAdd()-step (map) returns the incoming Date with the given number of units added, where the unit is specified by a DateToken. If the incoming traverser is not a Date, then an IllegalArgumentException will be thrown.
gremlin> g.inject("2023-08-02T00:00:00Z").asDate().dateAdd(DT.day, 7) //// (1)
==>Wed Aug 09 00:00:00 UTC 2023
gremlin> g.inject(["2023-08-02T00:00:00Z", "2023-08-03T00:00:00Z"]).unfold().asDate().dateAdd(DT.minute, 1) //// (2)
==>Wed Aug 02 00:01:00 UTC 2023
==>Thu Aug 03 00:01:00 UTC 2023
(1) Add 7 days to Date
(2) Add 1 minute to incoming dates
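The same arithmetic can be sketched with Python's standard datetime module. The date_add function and the UNITS table are assumptions for illustration and not the Gremlin API; the unit names are modelled loosely on the DT tokens used above, and TypeError stands in for IllegalArgumentException.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping from unit names to timedelta keyword arguments.
UNITS = {"second": "seconds", "minute": "minutes", "hour": "hours", "day": "days"}


def date_add(date, unit, amount):
    """Return date shifted by amount of the given unit."""
    if not isinstance(date, datetime):
        raise TypeError("incoming value must be a date")
    return date + timedelta(**{UNITS[unit]: amount})


start = datetime(2023, 8, 2, tzinfo=timezone.utc)
week_later = date_add(start, "day", 7)
minute_later = date_add(start, "minute", 1)
```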
DateDiff Step
The dateDiff()-step (map) returns the difference between two Dates in epoch time. If the incoming traverser is not a Date, then an IllegalArgumentException will be thrown.
gremlin> g.inject("2023-08-02T00:00:00Z").asDate().dateDiff(constant("2023-08-03T00:00:00Z").asDate()) //// (1)
==>-86400
(1) Find difference between two dates
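The result of -86400 for two dates one day apart is consistent with a difference measured in seconds, with the argument subtracted from the incoming date. The plain Python sketch below reproduces that arithmetic under this reading; date_diff is a hypothetical name and not the Gremlin API.

```python
from datetime import datetime, timezone


def date_diff(incoming, other):
    """Difference incoming - other in whole seconds (assumed unit)."""
    return int((incoming - other).total_seconds())


first = datetime(2023, 8, 2, tzinfo=timezone.utc)
second = datetime(2023, 8, 3, tzinfo=timezone.utc)
diff = date_diff(first, second)
```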
Dedup Step
With the dedup()-step (filter), repeatedly seen objects are removed from the traversal stream. Note that if a traverser’s bulk is greater than 1, then it is set to 1 before being emitted.
gremlin> g.V().values('lang')
==>java
==>java
gremlin> g.V().values('lang').dedup()
==>java
gremlin> g.V(1).repeat(bothE('created').dedup().otherV()).emit().path() //// (1)
==>[v[1],e[9][1-created->3],v[3]]
==>[v[1],e[9][1-created->3],v[3],e[11][4-created->3],v[4]]
==>[v[1],e[9][1-created->3],v[3],e[12][6-created->3],v[6]]
==>[v[1],e[9][1-created->3],v[3],e[11][4-created->3],v[4],e[10][4-created->5],v[5]]
gremlin> g.V().bothE().properties().dedup() //// (2)
==>p[weight->0.4]
==>p[weight->0.5]
==>p[weight->1.0]
==>p[weight->0.2]
(1) Traverse all created edges, but don’t touch any edge twice.
(2) Note that Property instances will compare on key and value, whereas a VertexProperty will also include its element as it is a first-class citizen.
If a by-step modulation is provided to dedup(), then the object is processed accordingly prior to determining if it has been seen or not.
gremlin> g.V().elementMap('name')
==>[id:1,label:person,name:marko]
==>[id:2,label:person,name:vadas]
==>[id:3,label:software,name:lop]
==>[id:4,label:person,name:josh]
==>[id:5,label:software,name:ripple]
==>[id:6,label:person,name:peter]
gremlin> g.V().dedup().by(label).values('name')
==>marko
==>lop
Finally, if dedup() is provided an array of strings, then it will ensure that the de-duplication is not with respect to the current traverser object, but to the path history of the traverser.
gremlin> g.V().as('a').out('created').as('b').in('created').as('c').select('a','b','c')
==>[a:v[1],b:v[3],c:v[1]]
==>[a:v[1],b:v[3],c:v[4]]
==>[a:v[1],b:v[3],c:v[6]]
==>[a:v[4],b:v[5],c:v[4]]
==>[a:v[4],b:v[3],c:v[1]]
==>[a:v[4],b:v[3],c:v[4]]
==>[a:v[4],b:v[3],c:v[6]]
==>[a:v[6],b:v[3],c:v[1]]
==>[a:v[6],b:v[3],c:v[4]]
==>[a:v[6],b:v[3],c:v[6]]
gremlin> g.V().as('a').out('created').as('b').in('created').as('c').dedup('a','b').select('a','b','c') //// (1)
==>[a:v[1],b:v[3],c:v[1]]
==>[a:v[4],b:v[5],c:v[4]]
==>[a:v[4],b:v[3],c:v[1]]
==>[a:v[6],b:v[3],c:v[1]]
gremlin> g.V().as('a').both().as('b').both().as('c').
dedup('a','b').by('age'). //// (2)
select('a','b','c').by('name')
==>[a:marko,b:vadas,c:marko]
==>[a:marko,b:josh,c:ripple]
==>[a:vadas,b:marko,c:lop]
==>[a:josh,b:marko,c:lop]
(1) If the current a and b combination has been seen previously, then filter the traverser.
(2) The "age" property is not productive for all vertices and therefore those values are filtered.
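Both the by()-modulated and the label-based forms of de-duplication come down to "compute a key, keep the first item seen for each key". The plain Python sketch below illustrates this; the dedup function and the data are assumptions for illustration, not the Gremlin API. The vertex labels and the (a, b, c) rows are taken from the console output above.

```python
def dedup(items, by=lambda item: item):
    """Keep the first item seen for each distinct key produced by by."""
    seen = set()
    kept = []
    for item in items:
        key = by(item)
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept


VERTICES = [("marko", "person"), ("vadas", "person"), ("lop", "software"),
            ("josh", "person"), ("ripple", "software"), ("peter", "person")]
# Like dedup().by(label).values('name')
first_per_label = [name for name, _ in dedup(VERTICES, by=lambda v: v[1])]

ROWS = [(1, 3, 1), (1, 3, 4), (1, 3, 6), (4, 5, 4), (4, 3, 1),
        (4, 3, 4), (4, 3, 6), (6, 3, 1), (6, 3, 4), (6, 3, 6)]
# Like dedup('a','b'): the key is the (a, b) pair from the path history.
first_per_ab = dedup(ROWS, by=lambda row: (row[0], row[1]))
```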
Difference Step
The difference()-step (map) calculates the difference between the incoming list traverser and the provided list argument. More specifically, this provides the set operation A-B where A is the traverser and B is the argument. This step only expects list data (array or Iterable) and will throw an IllegalArgumentException if any other type is encountered (including null).
gremlin> g.V().values("name").fold().difference(["lop","ripple"])
==>[peter,vadas,josh,marko]
gremlin> g.V().values("name").fold().difference(__.V().limit(2).values("name").fold())
==>[ripple,peter,josh,lop]
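The A-B operation corresponds to set difference, which can be sketched with Python's built-in sets. The difference function is a hypothetical model and not the Gremlin API; results are compared as sets because no ordering is implied.

```python
def difference(incoming, argument):
    """Set difference A - B, where A is the incoming list and B the argument."""
    return set(incoming) - set(argument)


NAMES = ["marko", "vadas", "lop", "josh", "ripple", "peter"]
remaining = difference(NAMES, ["lop", "ripple"])
```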
Disjunct Step
The disjunct()-step (map) calculates the disjunct set between the incoming list traverser and the provided list argument. This step only expects list data (array or Iterable) and will throw an IllegalArgumentException if any other type is encountered (including null).
gremlin> g.V().values("name").fold().disjunct(["lop","peter","sam"]) //// (1)
==>[ripple,vadas,josh,sam,marko]
gremlin> g.V().values("name").fold().disjunct(__.V().limit(3).values("name").fold())
==>[ripple,peter,josh]
(1) Find the unique names between two groups of names
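Judging by the examples above, the disjunct set contains the values that appear in exactly one of the two lists, which is the symmetric difference. A plain Python sketch under that reading follows; the disjunct function is a hypothetical model, not the Gremlin API.

```python
def disjunct(incoming, argument):
    """Values present in exactly one of the two lists (symmetric difference)."""
    return set(incoming) ^ set(argument)


NAMES = ["marko", "vadas", "lop", "josh", "ripple", "peter"]
unique_names = disjunct(NAMES, ["lop", "peter", "sam"])
```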
Drop Step
The drop()-step (filter/sideEffect) is used to remove elements and properties from the graph. It is a filter step because the traversal yields no outgoing objects.
gremlin> g.V().outE().drop()
gremlin> g.E()
gremlin> g.V().properties('name').drop()
gremlin> g.V().elementMap()
==>[id:1,label:person,age:29]
==>[id:2,label:person,age:27]
==>[id:3,label:software,lang:java]
==>[id:4,label:person,age:32]
==>[id:5,label:software,lang:java]
==>[id:6,label:person,age:35]
gremlin> g.V().drop()
gremlin> g.V()
E Step
The E()-step is meant to read edges from the graph and is usually used to start a GraphTraversal, but can also be used mid-traversal.
gremlin> g.E(11) //// (1)
==>e[11][4-created->3]
gremlin> g.E().hasLabel('knows').has('weight', gt(0.75))
==>e[8][1-knows->4]
gremlin> g.inject(1).coalesce(E().hasLabel("knows"), addE("knows").from(V().has("name","josh")).to(V().has("name","vadas"))) //// (2)
==>e[7][1-knows->2]
==>e[8][1-knows->4]
(1) Find the edge by its unique identifier (i.e. T.id). Not all graphs will use a numeric value for their identifier.
(2) Get edges with label knows; if there are none, then add a new one between josh and vadas.
Element Step
The element() step is a no-argument step that traverses from a Property to the Element that owns it.
gremlin> g.V().properties().element() //// (1)
==>v[1]
==>v[1]
==>v[1]
==>v[1]
==>v[1]
==>v[7]
==>v[7]
==>v[7]
==>v[7]
==>v[8]
==>v[8]
==>v[8]
==>v[8]
==>v[8]
==>v[9]
==>v[9]
==>v[9]
==>v[9]
==>v[10]
==>v[11]
gremlin> g.E().properties().element() //// (2)
==>e[13][1-develops->10]
==>e[14][1-develops->11]
==>e[15][1-uses->10]
==>e[16][1-uses->11]
==>e[17][7-develops->10]
==>e[18][7-develops->11]
==>e[19][7-uses->10]
==>e[20][7-uses->11]
==>e[21][8-develops->10]
==>e[22][8-uses->10]
==>e[23][8-uses->11]
==>e[24][9-uses->10]
==>e[25][9-uses->11]
gremlin> g.V().properties().properties().element() //// (3)
==>vp[location->san diego]
==>vp[location->san diego]
==>vp[location->santa cruz]
==>vp[location->santa cruz]
==>vp[location->brussels]
==>vp[location->brussels]
==>vp[location->santa fe]
==>vp[location->centreville]
==>vp[location->centreville]
==>vp[location->dulles]
==>vp[location->dulles]
==>vp[location->purcellville]
==>vp[location->bremen]
==>vp[location->bremen]
==>vp[location->baltimore]
==>vp[location->baltimore]
==>vp[location->oakland]
==>vp[location->oakland]
==>vp[location->seattle]
==>vp[location->spremberg]
==>vp[location->spremberg]
==>vp[location->kaiserslautern]
==>vp[location->kaiserslautern]
==>vp[location->aachen]
(1) Traverse from VertexProperty to Vertex
(2) Traverse from Property (edge property) to Edge
(3) Traverse from Property (meta property) to VertexProperty
ElementMap Step
The elementMap()-step yields a Map representation of the structure of an element.
gremlin> g.V().elementMap()
==>[id:1,label:person,name:marko,age:29]
==>[id:2,label:person,name:vadas,age:27]
==>[id:3,label:software,name:lop,lang:java]
==>[id:4,label:person,name:josh,age:32]
==>[id:5,label:software,name:ripple,lang:java]
==>[id:6,label:person,name:peter,age:35]
gremlin> g.V().elementMap('age')
==>[id:1,label:person,age:29]
==>[id:2,label:person,age:27]
==>[id:3,label:software]
==>[id:4,label:person,age:32]
==>[id:5,label:software]
==>[id:6,label:person,age:35]
gremlin> g.V().elementMap('age','blah')
==>[id:1,label:person,age:29]
==>[id:2,label:person,age:27]
==>[id:3,label:software]
==>[id:4,label:person,age:32]
==>[id:5,label:software]
==>[id:6,label:person,age:35]
gremlin> g.E().elementMap()
==>[id:7,label:knows,IN:[id:2,label:person],OUT:[id:1,label:person],weight:0.5]
==>[id:8,label:knows,IN:[id:4,label:person],OUT:[id:1,label:person],weight:1.0]
==>[id:9,label:created,IN:[id:3,label:software],OUT:[id:1,label:person],weight:0.4]
==>[id:10,label:created,IN:[id:5,label:software],OUT:[id:4,label:person],weight:1.0]
==>[id:11,label:created,IN:[id:3,label:software],OUT:[id:4,label:person],weight:0.4]
==>[id:12,label:created,IN:[id:3,label:software],OUT:[id:6,label:person],weight:0.2]
It is important to note that the map of a vertex assumes that cardinality for each key is single and if it is list then only the first item encountered will be returned. As single is the more common cardinality for properties, this assumption should serve the greatest number of use cases.
gremlin> g.V().elementMap()
==>[id:1,label:person,name:marko,location:santa fe]
==>[id:7,label:person,name:stephen,location:purcellville]
==>[id:8,label:person,name:matthias,location:seattle]
==>[id:9,label:person,name:daniel,location:aachen]
==>[id:10,label:software,name:gremlin]
==>[id:11,label:software,name:tinkergraph]
gremlin> g.V().has('name','marko').properties('location')
==>vp[location->san diego]
==>vp[location->santa cruz]
==>vp[location->brussels]
==>vp[location->santa fe]
gremlin> g.V().has('name','marko').properties('location').elementMap()
==>[id:6,key:location,value:san diego,startTime:1997,endTime:2001]
==>[id:7,key:location,value:santa cruz,startTime:2001,endTime:2004]
==>[id:8,key:location,value:brussels,startTime:2004,endTime:2005]
==>[id:9,key:location,value:santa fe,startTime:2005]
g.V().elementMap()
g.V().has('name','marko').properties('location')
g.V().has('name','marko').properties('location').elementMap()
Important: The elementMap()-step does not return the vertex labels for incident vertices when using GraphComputer, as the id is the only available data to the star graph.
Additional References
Emit Step
The emit-step is not an actual step, but is instead a step modulator for repeat() (find more documentation on emit() there).
Additional References
Explain Step
The explain()-step (terminal) will return a TraversalExplanation. A traversal explanation details how the traversal (prior to explain()) will be compiled given the registered traversal strategies.
A TraversalExplanation has a toString() representation with three columns. The first column is the traversal strategy being applied. The second column is the traversal strategy category: [D]ecoration, [O]ptimization, [P]rovider optimization, [F]inalization, and [V]erification. Finally, the third column is the state of the traversal post strategy application. The final traversal is the resultant execution plan.
gremlin> g.V().hasLabel('person').outE().identity().inV().count().is(gt(5)).explain()
==>Traversal Explanation
==================================================================================================================================================================================
Original Traversal [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,edge), IdentityStep, EdgeVertexStep(IN), CountGlobalStep, IsStep(gt(5))]
ConnectiveStrategy [D] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,edge), IdentityStep, EdgeVertexStep(IN), CountGlobalStep, IsStep(gt(5))]
IdentityRemovalStrategy [O] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,edge), EdgeVertexStep(IN), CountGlobalStep, IsStep(gt(5))]
MatchPredicateStrategy [O] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,edge), EdgeVertexStep(IN), CountGlobalStep, IsStep(gt(5))]
FilterRankingStrategy [O] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,edge), EdgeVertexStep(IN), CountGlobalStep, IsStep(gt(5))]
InlineFilterStrategy [O] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,edge), EdgeVertexStep(IN), CountGlobalStep, IsStep(gt(5))]
IncidentToAdjacentStrategy [O] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,vertex), CountGlobalStep, IsStep(gt(5))]
RepeatUnrollStrategy [O] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,vertex), CountGlobalStep, IsStep(gt(5))]
PathRetractionStrategy [O] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,vertex), CountGlobalStep, IsStep(gt(5))]
EarlyLimitStrategy [O] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,vertex), CountGlobalStep, IsStep(gt(5))]
AdjacentToIncidentStrategy [O] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,edge), CountGlobalStep, IsStep(gt(5))]
ByModulatorOptimizationStrategy [O] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,edge), CountGlobalStep, IsStep(gt(5))]
CountStrategy [O] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,edge), RangeGlobalStep(0,6), CountGlobalStep, IsStep(gt(5))]
LazyBarrierStrategy [O] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,edge), RangeGlobalStep(0,6), CountGlobalStep, IsStep(gt(5))]
TinkerGraphCountStrategy [P] [GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,edge), RangeGlobalStep(0,6), CountGlobalStep, IsStep(gt(5))]
TinkerGraphStepStrategy [P] [TinkerGraphStep(vertex,[~label.eq(person)]), VertexStep(OUT,edge), RangeGlobalStep(0,6), CountGlobalStep, IsStep(gt(5))]
ProfileStrategy [F] [TinkerGraphStep(vertex,[~label.eq(person)]), VertexStep(OUT,edge), RangeGlobalStep(0,6), CountGlobalStep, IsStep(gt(5))]
StandardVerificationStrategy [V] [TinkerGraphStep(vertex,[~label.eq(person)]), VertexStep(OUT,edge), RangeGlobalStep(0,6), CountGlobalStep, IsStep(gt(5))]
Final Traversal [TinkerGraphStep(vertex,[~label.eq(person)]), VertexStep(OUT,edge), RangeGlobalStep(0,6), CountGlobalStep, IsStep(gt(5))]
g.V().hasLabel('person').outE().identity().inV().count().is(gt(5)).explain()
For traversal profiling information, please see profile()-step.
Fail Step
The fail()-step provides a way to force a traversal to immediately fail with an exception. This feature is often helpful for debugging and for validating certain conditions prior to continuing with traversal execution.
gremlin> g.V().has('person','name','peter').fold().
......1> coalesce(unfold(),
......2> fail('peter should exist')).
......3> property('k',100)
==>v[6]
gremlin> g.V().has('person','name','stephen').fold().
......1> coalesce(unfold(),
......2> fail('stephen should exist')).
......3> property('k',100)
fail() Step Triggered
===========================================================================================================================
Message > stephen should exist
Traverser> []
Bulk > 1
Traversal> fail()
Parent > CoalesceStep [V().has("person","name","stephen").fold().coalesce(__.unfold(),__.fail()).property("k",(int) 100)]
Metadata > {}
===========================================================================================================================
The code example above exemplifies the latter use case where there is essentially an assertion that there is a vertex with a particular "name" value prior to updating the property "k" and explicitly failing when that vertex is not found.
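The same assert-or-fail idea can be pictured outside Gremlin. The following is a plain-Python model of the fold().coalesce(unfold(), fail(...)) idiom, not TinkerPop code; the names TraversalFailure, require_match and lookup are hypothetical and exist only for this sketch.

```python
class TraversalFailure(Exception):
    """Hypothetical stand-in for the exception raised by fail()."""


def require_match(matches, message):
    """Return the matches, or fail with the message when there are none.

    Models fold().coalesce(unfold(), fail(message)): fold() turns
    "no results" into an empty list, so the fallback branch still runs.
    """
    found = list(matches)      # fold(): collect everything into one list
    if not found:              # unfold() would produce nothing
        raise TraversalFailure(message)
    return found               # unfold(): hand the elements onward


people = {"marko", "vadas", "josh", "peter"}


def lookup(name):
    """Hypothetical stand-in for g.V().has('person','name',name)."""
    return [person for person in people if person == name]


peter = require_match(lookup("peter"), "peter should exist")

try:
    require_match(lookup("stephen"), "stephen should exist")
    failed = False
except TraversalFailure as error:
    failed = True
    failure_message = str(error)
```

The fold() is what makes the pattern work: without it, an empty stream would simply end and the fallback branch would never be evaluated.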
Additional References
Filter Step
The filter() step maps the traverser from the current object to either true or false, where the latter will not pass the traverser to the next step in the process. Please see the General Steps section for more information.
Additional References
FlatMap Step
The flatMap() step maps the traverser from the current object to an Iterator of objects for the next step in the process. Please see the General Steps section for more information.
Additional References
Format Step
This step is designed to simplify some string operations. In general, it is similar to the string formatting function available in many programming languages. Variable values can be picked up from Element properties, maps and scope variables.
gremlin> g.V().format("%{name} is %{age} years old") //// (1)
==>marko is 29 years old
==>vadas is 27 years old
==>josh is 32 years old
==>peter is 35 years old
gremlin> g.V().hasLabel("person").as("a").values("name").as("p1").select("a").in("knows").format("%{p1} knows %{name}") //// (2)
==>vadas knows marko
==>josh knows marko
gremlin> g.V().format("%{name} has %{_} connections").by(bothE().count()) //// (3)
==>marko has 3 connections
==>vadas has 1 connections
==>lop has 3 connections
==>josh has 3 connections
==>ripple has 1 connections
==>peter has 1 connections
gremlin> g.V().project("name","count").by(values("name")).by(bothE().count()).format("%{name} has %{count} connections") //// (4)
==>marko has 3 connections
==>vadas has 1 connections
==>lop has 3 connections
==>josh has 3 connections
==>ripple has 1 connections
==>peter has 1 connections
g.V().format("%{name} is %{age} years old") //// (1)
g.V().hasLabel("person").as("a").values("name").as("p1").select("a").in("knows").format("%{p1} knows %{name}") //// (2)
g.V().format("%{name} has %{_} connections").by(bothE().count()) //// (3)
g.V().project("name","count").by(values("name")).by(bothE().count()).format("%{name} has %{count} connections") //// (4)
1. A format() will use property values from the incoming Element to produce a String result.
2. A format() will use the scope variable p1 and the property name to resolve variable values.
3. A format() will use the property name and the traversal product for the positional argument to resolve variable values.
4. A format() will use the map produced by the project step to resolve variable values.
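The variable resolution described in the callouts can be modelled in a few lines of plain Python. This is a sketch, not the TinkerPop implementation: format_element is a hypothetical name, the lookup order (scope variables before properties) is an assumption, and dropping elements with an unresolvable variable is inferred from the first example, where lop and ripple produce no output.

```python
import re

# Matches %{variable} placeholders as used in the format() templates above.
_VARIABLE = re.compile(r"%\{([^}]+)\}")


def format_element(template, properties, scope=None, positional=()):
    """Resolve %{...} variables; return None if any cannot be resolved."""
    scope = scope or {}
    remaining = iter(positional)
    pieces = []
    cursor = 0
    for match in _VARIABLE.finditer(template):
        pieces.append(template[cursor:match.start()])
        key = match.group(1)
        if key == "_":
            value = next(remaining, None)   # %{_} takes the next by() result
        elif key in scope:
            value = scope[key]              # assumed: scope variables win
        else:
            value = properties.get(key)     # otherwise use the property/map key
        if value is None:
            return None                     # unresolved: element is filtered
        pieces.append(str(value))
        cursor = match.end()
    pieces.append(template[cursor:])
    return "".join(pieces)


marko = {"name": "marko", "age": 29}
lop = {"name": "lop", "lang": "java"}

line_1 = format_element("%{name} is %{age} years old", marko)
line_2 = format_element("%{name} is %{age} years old", lop)
line_3 = format_element("%{p1} knows %{name}", marko, scope={"p1": "vadas"})
line_4 = format_element("%{name} has %{_} connections", marko, positional=[3])
```

A map produced by project() fits the same model: it is passed as the properties argument.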
Additional References
Fold Step
There are situations when the traversal stream needs a "barrier" to aggregate all the objects and emit a computation that is a function of the aggregate. The fold()-step (map) is one particular instance of this. Please see unfold()-step for the inverse functionality.
gremlin> g.V(1).out('knows').values('name')
==>vadas
==>josh
gremlin> g.V(1).out('knows').values('name').fold() //// (1)
==>[vadas,josh]
gremlin> g.V(1).out('knows').values('name').fold().next().getClass() //// (2)
==>class java.util.ArrayList
gremlin> g.V(1).out('knows').values('name').fold(0) {a,b -> a + b.length()} //// (3)
==>9
gremlin> g.V().values('age').fold(0) {a,b -> a + b} //// (4)
==>123
gremlin> g.V().values('age').fold(0, sum) //// (5)
==>123
gremlin> g.V().values('age').sum() //// (6)
==>123
gremlin> g.inject(["a":1],["b":2]).fold([], addAll) //// (7)
==>[[a:1],[b:2]]
g.V(1).out('knows').values('name')
g.V(1).out('knows').values('name').fold() //// (1)
g.V(1).out('knows').values('name').fold().next().getClass() //// (2)
g.V(1).out('knows').values('name').fold(0) {a,b -> a + b.length()} //// (3)
g.V().values('age').fold(0) {a,b -> a + b} //// (4)
g.V().values('age').fold(0, sum) //// (5)
g.V().values('age').sum() //// (6)
g.inject(["a":1],["b":2]).fold([], addAll) //// (7)
1. A parameterless fold() will aggregate all the objects into a list and then emit the list.
2. A verification of the type of list returned.
3. fold() can be provided two arguments: a seed value and a reduce bi-function ("vadas" is 5 characters + "josh" with 4 characters).
4. What is the total age of the people in the graph?
5. The same as before, but using a built-in bi-function.
6. The same as before, but using the sum()-step.
7. A mechanism for merging Map instances. If a key occurs in more than a single Map, the later occurrence will replace the earlier.
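For readers more familiar with general-purpose languages, the seed-plus-bi-function form of fold() behaves like a left reduce. The plain-Python sketch below mirrors examples 1, 3 and 4 using functools.reduce; the literal lists stand in for the traversal results shown above and are not fetched from a graph.

```python
from functools import reduce

names = ["vadas", "josh"]      # stands in for g.V(1).out('knows').values('name')
ages = [29, 27, 32, 35]        # stands in for g.V().values('age')

# fold(): gather the whole stream into a single list.
folded = list(names)

# fold(0) {a,b -> a + b.length()}: the seed is 0, each name adds its length.
name_lengths = reduce(lambda a, b: a + len(b), names, 0)

# fold(0) {a,b -> a + b}: the seed is 0, each age is added to the running total.
total_age = reduce(lambda a, b: a + b, ages, 0)
```

In both cases the first lambda argument is the accumulated value and the second is the incoming object, matching the {a,b -> ...} closures in the Gremlin examples.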
Additional References
From Step
The from()-step is not an actual step, but instead is a "step-modulator" similar to as() and by(). If a step is able to accept traversals or strings then from() is the means by which they are added. The general pattern is step().from(). See to()-step.
The list of steps that support from()-modulation are: simplePath(), cyclicPath(), path(), and addE().
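Javascript: The term from is a reserved word in Javascript, and therefore must be referred to in Gremlin with from_().
Python: The term from is a reserved word in Python, and therefore must be referred to in Gremlin with from_().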
Additional References
Group Step
As traversers propagate across a graph as defined by a traversal, sideEffect computations are sometimes required. That is, the actual path taken or the current location of a traverser is not the ultimate output of the computation, but some other representation of the traversal. The group()-step (map/sideEffect) is one such sideEffect that organizes the objects according to some function of the object. Then, if required, that organization (a list) is reduced. An example is provided below.
gremlin> g.V().group().by(label) //// (1)
==>[software:[v[3],v[5]],person:[v[1],v[2],v[4],v[6]]]
gremlin> g.V().group().by(label).by('name') //// (2)
==>[software:[lop,ripple],person:[marko,vadas,josh,peter]]
gremlin> g.V().group().by(label).by(count()) //// (3)
==>[software:2,person:4]
g.V().group().by(label) //// (1)
g.V().group().by(label).by('name') //// (2)
g.V().group().by(label).by(count()) //// (3)
1. Group the vertices by their label.
2. For each vertex in the group, get their name.
3. For each grouping, what is its size?
The two projection parameters available to group() via by() are:
- Key-projection: What feature of the object to group on (a function that yields the map key)?
- Value-projection: What feature of the group to store in the key-list?
gremlin> g.V().group().by('age').by('name') //// (1)
==>[32:[josh],35:[peter],27:[vadas],29:[marko]]
gremlin> g.V().group().by('name').by('age') //// (2)
==>[ripple:[],peter:[35],vadas:[27],josh:[32],lop:[],marko:[29]]
g.V().group().by('age').by('name') //// (1)
g.V().group().by('name').by('age') //// (2)
1. The "age" property is not productive for all vertices and therefore those keys are filtered.
2. The "age" property is not productive for all vertices and therefore those values are filtered.
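The effect of the two projections, including the filtering of non-productive keys and values, can be modelled in plain Python. This is only a sketch of the behavior visible in the output above, not TinkerPop code; the group function and the vertices list are hypothetical stand-ins.

```python
def group(elements, key_of, value_of):
    """Group elements by key_of, storing value_of for each member.

    A projection that returns None models a non-productive by().
    """
    groups = {}
    for element in elements:
        key = key_of(element)
        if key is None:             # non-productive key: no group is created
            continue
        bucket = groups.setdefault(key, [])
        value = value_of(element)
        if value is not None:       # non-productive value: nothing is stored
            bucket.append(value)
    return groups


# Properties of the six vertices used throughout the examples.
vertices = [
    {"name": "marko", "age": 29},
    {"name": "vadas", "age": 27},
    {"name": "lop", "lang": "java"},
    {"name": "josh", "age": 32},
    {"name": "ripple", "lang": "java"},
    {"name": "peter", "age": 35},
]

by_age = group(vertices, lambda v: v.get("age"), lambda v: v.get("name"))
by_name = group(vertices, lambda v: v.get("name"), lambda v: v.get("age"))
```

In by_age the two software vertices never appear, while in by_name they appear with empty lists, which is the difference the two callouts describe.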
Additional References
GroupCount Step
When it is important to know how many times a particular object has been at a particular part of a traversal, groupCount()-step (map/sideEffect) is used.
"What is the distribution of ages in the graph?"
gremlin> g.V().hasLabel('person').values('age').groupCount()
==>[32:1,35:1,27:1,29:1]
gremlin> g.V().hasLabel('person').groupCount().by('age') //// (1)
==>[32:1,35:1,27:1,29:1]
gremlin> g.V().groupCount().by('age') //// (2)
==>[32:1,35:1,27:1,29:1]
g.V().hasLabel('person').values('age').groupCount()
g.V().hasLabel('person').groupCount().by('age') //// (1)
g.V().groupCount().by('age') //// (2)
1. You can also supply a pre-group projection, where the provided by()-modulation determines what to group the incoming object by.
2. The "age" property is not productive for all vertices and therefore those values are filtered.
There is one person that is 32, one person that is 35, one person that is 27, and one person that is 29.
"Iteratively walk the graph and count the number of times you see each vertex label."
gremlin> g.V().repeat(both().groupCount('m').by(label)).times(10).cap('m')
==>[software:19598,person:39196]
g.V().repeat(both().groupCount('m').by(label)).times(10).cap('m')
The above is interesting in that it demonstrates the use of referencing the internal Map<Object,Long> of groupCount() with a string variable. Given that groupCount() is a sideEffect-step, it simply passes the object it received to its output. Internal to groupCount(), the object’s count is incremented.
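The pass-through behavior of the sideEffect form can be modelled in plain Python with a generator that counts into a named map while yielding every object unchanged. This is a sketch only; group_count and side_effects are hypothetical names, and the labels list stands in for a stream of vertex labels.

```python
from collections import Counter


def group_count(traversers, side_effects, key):
    """Count each object into side_effects[key] and pass it on untouched."""
    counts = side_effects.setdefault(key, Counter())
    for traverser in traversers:
        counts[traverser] += 1      # internal to the step: increment the count
        yield traverser             # sideEffect-step: output equals input


side_effects = {}
labels = ["person", "person", "software", "person", "software", "person"]

# Consuming the stream is what drives the counting, much like iterating a traversal.
passed_on = list(group_count(labels, side_effects, "m"))
label_counts = dict(side_effects["m"])   # comparable to cap('m')
```

Because the same named map is reused on every pass, placing such a step inside a loop accumulates counts across iterations, which is what the repeat() example relies on.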
Additional References
Has Step
It is possible to filter vertices, edges, and vertex properties based on their properties using has()-step (filter). There are numerous variations on has() including:
- has(key,value): Remove the traverser if its element does not have the provided key/value property.
- has(label, key, value): Remove the traverser if its element does not have the specified label and provided key/value property.
- has(key,predicate): Remove the traverser if its element does not have a key value that satisfies the bi-predicate. For more information on predicates, please read A Note on Predicates.
- hasLabel(labels…): Remove the traverser if its element does not have any of the labels.
- hasId(ids…): Remove the traverser if its element does not have any of the ids.
- hasKey(keys…): Remove the Property traverser if it does not match one of the provided keys.
- hasValue(values…): Remove the Property traverser if it does not match one of the provided values.
- has(key): Remove the traverser if its element does not have a value for the key.
- hasNot(key): Remove the traverser if its element has a value for the key.
- has(key, traversal): Remove the traverser if its object does not yield a result through the traversal off the property value.
gremlin> g.V().hasLabel('person')
==>v[1]
==>v[2]
==>v[4]
==>v[6]
gremlin> g.V().hasLabel('person','name','marko')
==>v[1]
==>v[2]
==>v[4]
==>v[6]
gremlin> g.V().hasLabel('person').out().has('name',within('vadas','josh'))
==>v[2]
==>v[4]
gremlin> g.V().hasLabel('person').out().has('name',within('vadas','josh')).
outE().hasLabel('created')
==>e[10][4-created->5]
==>e[11][4-created->3]
gremlin> g.V().has('age',inside(20,30)).values('age') //// (1)
==>29
==>27
gremlin> g.V().has('age',outside(20,30)).values('age') //// (2)
==>32
==>35
gremlin> g.V().has('name',within('josh','marko')).elementMap() //// (3)
==>[id:1,label:person,name:marko,age:29]
==>[id:4,label:person,name:josh,age:32]
gremlin> g.V().has('name',without('josh','marko')).elementMap() //// (4)
==>[id:2,label:person,name:vadas,age:27]
==>[id:3,label:software,name:lop,lang:java]
==>[id:5,label:software,name:ripple,lang:java]
==>[id:6,label:person,name:peter,age:35]
gremlin> g.V().has('name',not(within('josh','marko'))).elementMap() //// (5)
==>[id:2,label:person,name:vadas,age:27]
==>[id:3,label:software,name:lop,lang:java]
==>[id:5,label:software,name:ripple,lang:java]
==>[id:6,label:person,name:peter,age:35]
gremlin> g.V().properties().hasKey('age').value() //// (6)
==>29
==>27
==>32
==>35
gremlin> g.V().hasNot('age').values('name') //// (7)
==>lop
==>ripple
gremlin> g.V().has('person','name', startingWith('m')) //// (8)
==>v[1]
gremlin> g.V().has(null, 'vadas') //// (9)
gremlin> g.V().has(label, __.is('person')) //// (10)
==>v[1]
==>v[2]
==>v[4]
==>v[6]
g.V().hasLabel('person')
g.V().hasLabel('person','name','marko')
g.V().hasLabel('person').out().has('name',within('vadas','josh'))
g.V().hasLabel('person').out().has('name',within('vadas','josh')).
outE().hasLabel('created')
g.V().has('age',inside(20,30)).values('age') //// (1)
g.V().has('age',outside(20,30)).values('age') //// (2)
g.V().has('name',within('josh','marko')).elementMap() //// (3)
g.V().has('name',without('josh','marko')).elementMap() //// (4)
g.V().has('name',not(within('josh','marko'))).elementMap() //// (5)
g.V().properties().hasKey('age').value() //// (6)
g.V().hasNot('age').values('name') //// (7)
g.V().has('person','name', startingWith('m')) //// (8)
g.V().has(null, 'vadas') //// (9)
g.V().has(label, __.is('person')) //// (10)
1. Find all vertices whose ages are between 20 (exclusive) and 30 (exclusive). In other words, the age must be greater than 20 and less than 30.
2. Find all vertices whose ages are not between 20 (inclusive) and 30 (inclusive). In other words, the age must be less than 20 or greater than 30.
3. Find all vertices whose names are exact matches to any names in the collection [josh,marko], display all the key,value pairs for those vertices.
4. Find all vertices whose names are not in the collection [josh,marko], display all the key,value pairs for those vertices.
5. Same as the prior example save using not on within to yield without.
6. Find all age-properties and emit their value.
7. Find all vertices that do not have an age-property and emit their name.
8. Find all "person" vertices that have a name property that starts with the letter "m".
9. Property key is always stored as String and therefore an equality check with null will produce no result.
10. An example of has() where the argument is a Traversal and does not quite behave the way most expect.
Item 10 in the above set of examples bears some discussion. The result of the Traversal is not used as the comparing value for has(). Instead, the current Traverser, which in this case is the vertex label, is given to the Traversal, which then behaves as a filter itself. In other words, if the Traversal (i.e. is('person')) returns a value then the has() is effectively true. A common mistake is to try to use select() in this context, where one would do has('name', select('n')) to try to inject the value of "n" into the step to get has('name', <value-of-n>), but this would instead simply produce an always true filter for has().
TinkerPop does not support a regular expression predicate, although specific graph databases that leverage TinkerPop may provide a partial match extension.
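The range and membership predicates used in examples 1 to 5 can be modelled as small Python functions, which makes the exclusive boundaries explicit. This is a plain-Python sketch of the semantics stated in the callouts, not the P class from TinkerPop; the sample lists mirror the property values shown above.

```python
def inside(low, high):
    """Strictly between the bounds: greater than low and less than high."""
    return lambda value: low < value < high


def outside(low, high):
    """Strictly beyond the bounds: less than low or greater than high."""
    return lambda value: value < low or value > high


def within(*allowed):
    """Exact match against any of the given values."""
    return lambda value: value in allowed


def without(*excluded):
    """Exact match against none of the given values."""
    return lambda value: value not in excluded


ages = [29, 27, 32, 35]
names = ["marko", "vadas", "lop", "josh", "ripple", "peter"]

ages_inside = [age for age in ages if inside(20, 30)(age)]
ages_outside = [age for age in ages if outside(20, 30)(age)]
names_within = [name for name in names if within("josh", "marko")(name)]
names_without = [name for name in names if without("josh", "marko")(name)]
names_not_within = [name for name in names if not within("josh", "marko")(name)]
```

Note that a value equal to either bound satisfies neither inside nor outside in this model, which follows from the wording of callouts 1 and 2.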
Additional References
has(String), has(String,Object), has(String,P), has(String,String,Object), has(String,String,P), has(String,Traversal), has(T,Object), has(T,P), has(T,Traversal), hasId(Object,Object…), hasId(P), hasKey(P), hasKey(String,String…), hasLabel(P), hasLabel(String,String…), hasNot(String), hasValue(Object,Object…), hasValue(P), P, TextP, T, Recipes - Anti-pattern
Id Step
The id()-step (map) takes an Element and extracts its identifier from it.
gremlin> g.V().id()
==>1
==>2
==>3
==>4
==>5
==>6
gremlin> g.V(1).out().id().is(2)
==>2
gremlin> g.V(1).outE().id()
==>9
==>7
==>8
gremlin> g.V(1).properties().id()
==>0
==>1
g.V().id()
g.V(1).out().id().is(2)
g.V(1).outE().id()
g.V(1).properties().id()
Additional References
Identity Step
The identity()-step (map) is an identity function which maps the current object to itself.
gremlin> g.V().identity()
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
g.V().identity()
Additional References
Index Step
The index()
-step (map) indexes each element in the current collection. If the current traverser’s value is not a collection, then it’s treated as a single-item collection. There are two indexers
available, which can be chosen using the with()
modulator. The list indexer (default) creates a list for each collection item, with the first item being the original element and the second element
being the index. The map indexer created a linked hash map in which the index represents the key and the original item is used as the value.
gremlin> g.V().hasLabel("software").index() //// (1)
==>[[v[3],0]]
==>[[v[5],0]]
gremlin> g.V().hasLabel("software").values("name").fold().
order(Scope.local).
index().
unfold().
order().
by(__.tail(Scope.local, 1)) //// (2)
==>[lop,0]
==>[ripple,1]
gremlin> g.V().hasLabel("software").values("name").fold().
order(Scope.local).
index().
with(WithOptions.indexer, WithOptions.list).
unfold().
order().
by(__.tail(Scope.local, 1)) //// (3)
==>[lop,0]
==>[ripple,1]
gremlin> g.V().hasLabel("person").values("name").fold().
order(Scope.local).
index().
with(WithOptions.indexer, WithOptions.map) //// (4)
==>[0:josh,1:marko,2:peter,3:vadas]
g.V().hasLabel("software").index() //// (1)
g.V().hasLabel("software").values("name").fold().
order(Scope.local).
index().
unfold().
order().
by(__.tail(Scope.local, 1)) //// (2)
g.V().hasLabel("software").values("name").fold().
order(Scope.local).
index().
with(WithOptions.indexer, WithOptions.list).
unfold().
order().
by(__.tail(Scope.local, 1)) //// (3)
g.V().hasLabel("person").values("name").fold().
order(Scope.local).
index().
with(WithOptions.indexer, WithOptions.map) //// (4)
1. Indexing non-collection items results in multiple indexed single-item collections.
2. Index all software names in their alphabetical order.
3. Same as statement 2, but with an explicitly specified list indexer.
4. Index all person names in their alphabetical order and store the result in an ordered map.
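The two indexers correspond closely to enumerate() in Python. The sketch below models the list and map indexers and the single-item treatment of non-collections; the function names are hypothetical, and a plain dict (which preserves insertion order) stands in for the linked hash map.

```python
def as_collection(value):
    """Treat a non-collection value as a single-item collection."""
    return value if isinstance(value, (list, tuple)) else [value]


def index_list(collection):
    """List indexer: one [item, index] pair per collection item."""
    return [[item, position] for position, item in enumerate(collection)]


def index_map(collection):
    """Map indexer: index as key, original item as value, in order."""
    return {position: item for position, item in enumerate(collection)}


software = sorted(["ripple", "lop"])
people = sorted(["marko", "vadas", "josh", "peter"])

single_item = index_list(as_collection("v[3]"))
software_indexed = index_list(software)
people_indexed = index_map(people)
```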
Additional References
Inject Step
The concept of "injectable steps" makes it possible to insert objects arbitrarily into a traversal stream. The inject()-step (sideEffect) provides this capability, and a few examples are provided below.
gremlin> g.V(4).out().values('name').inject('daniel')
==>daniel
==>ripple
==>lop
gremlin> g.V(4).out().values('name').inject('daniel').map {it.get().length()}
==>6
==>6
==>3
gremlin> g.V(4).out().values('name').inject('daniel').map {it.get().length()}.path()
==>[daniel,6]
==>[v[4],v[5],ripple,6]
==>[v[4],v[3],lop,3]
g.V(4).out().values('name').inject('daniel')
g.V(4).out().values('name').inject('daniel').map {it.get().length()}
g.V(4).out().values('name').inject('daniel').map {it.get().length()}.path()
In the last example above, note that the path starting with daniel is only of length 2. This is because the daniel string was inserted half-way in the traversal. Finally, a typical use case is provided below, where the start of the traversal is not a graph object.
gremlin> inject(1,2)
==>1
==>2
gremlin> inject(1,2).map {it.get() + 1}
==>2
==>3
gremlin> inject(1,2).map {it.get() + 1}.map {g.V(it.get()).next()}.values('name')
==>vadas
==>lop
inject(1,2)
inject(1,2).map {it.get() + 1}
inject(1,2).map {it.get() + 1}.map {g.V(it.get()).next()}.values('name')
Additional References
Intersect Step
The intersect()-step (map) calculates the intersection between the incoming list traverser and the provided list argument. This step only expects list data (array or Iterable) and will throw an IllegalArgumentException if any other type is encountered (including null).
gremlin> g.V().values("name").fold().intersect(["marko","josh","james","jen"])
==>[josh,marko]
gremlin> g.V().values("name").fold().intersect(__.V().limit(2).values("name").fold())
==>[vadas,marko]
g.V().values("name").fold().intersect(["marko","josh","james","jen"])
g.V().values("name").fold().intersect(__.V().limit(2).values("name").fold())
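The behavior can be pictured with Python sets. This is a plain-Python sketch rather than TinkerPop code: the intersect function is hypothetical, ValueError stands in for IllegalArgumentException, and the returned set makes no claim about the ordering or collection type that TinkerPop itself produces.

```python
def intersect(incoming, other):
    """Return the values present in both collections.

    Anything that is not list-like, including None, is rejected.
    """
    for value in (incoming, other):
        if not isinstance(value, (list, tuple, set, frozenset)):
            raise ValueError("intersect() expects list data, got %r" % (value,))
    return set(incoming) & set(other)


names = ["marko", "vadas", "lop", "josh", "ripple", "peter"]

common = intersect(names, ["marko", "josh", "james", "jen"])

try:
    intersect(names, None)
    rejected_null = False
except ValueError:
    rejected_null = True
```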
Additional References
IO Step
The task of importing and exporting the data of Graph instances is the job of the io()-step. By default, TinkerPop supports three formats for importing and exporting graph data: GraphML, GraphSON, and Gryo.
Note: Additional documentation for TinkerPop IO formats can be found in the IO Reference.
By itself the io()-step merely configures the kind of importing and exporting that is going to occur and it is the follow-on call to the read() or write() step that determines which of those actions will execute. Therefore, a typical usage of the io()-step would look like this:
g.io(someInputFile).read().iterate()
g.io(someOutputFile).write().iterate()
Important: The commands above are still traversals and therefore require iteration to be executed, hence the use of iterate() as a termination step.
By default, the io()-step will try to detect the right file format using the file name extension. To gain greater control of the format use the with() step modulator to provide further information to io(). For example:
g.io(someInputFile).
with(IO.reader, IO.graphson).
read().iterate()
g.io(someOutputFile).
with(IO.writer,IO.graphml).
write().iterate()
The IO class is a helper for the io()-step that provides expressions that can be used to help configure it and in this case it allows direct specification of the "reader" or "writer" to use. The "reader" actually refers to a GraphReader implementation and the "writer" refers to a GraphWriter implementation. The implementations of those interfaces provided by default are the standard TinkerPop implementations.
That default is an important point to consider for users. The default TinkerPop implementations are not designed with massive, complex, parallel bulk loading in mind. They are designed to do single-threaded, OLTP-style loading of data in the most generic way possible so as to accommodate the greatest number of graph databases out there. As such, from a reading perspective, they work best for small datasets (or perhaps medium datasets where memory is plentiful and time is not critical) that are loading to an empty graph; incremental loading is not supported. The story from the writing perspective is not that different in that there are no parallel operations in play; however, streaming the output to disk requires a single pass of the data without high memory requirements for larger datasets.
Important: Default graph formats don’t contain information about property cardinality, so it is up to the graph provider to choose the appropriate one. You will see a warning message if the chosen cardinality is SINGLE while your graph input contains multiple values for that property.
In general, TinkerPop recommends that users examine the native bulk import/export tools of the graph implementation that they choose. Those tools will often outperform the io()-step and perhaps be easier to use with a greater feature set. That said, graph providers do have the option to optimize io() to back it with their own import/export utilities and therefore the default behavior provided by TinkerPop described above might be overridden by the graph.
An excellent example of this lies in HadoopGraph with SparkGraphComputer which replaces the default single-threaded implementation with a more advanced OLAP style bulk import/export functionality internally using CloneVertexProgram. With this model, graphs of arbitrary size can be imported/exported assuming that there is a Hadoop InputFormat or OutputFormat to support it.
Important: Remote Gremlin Console users or Gremlin Language Variant (GLV) users (e.g. gremlin-python) who utilize the io()-step should recall that their read() or write() operation will occur on the server and not locally and therefore the file specified for import/export must be something accessible by the server.
GraphSON and Gryo formats are extensible allowing users and graph providers to extend supported serialization options. These extensions are exposed through IoRegistry implementations. To apply an IoRegistry use the with() option and the IO.registry key, where the value is either an actual IoRegistry instance or the fully qualified class name of one.
g.io(someInputFile).
with(IO.reader, IO.gryo).
with(IO.registry, TinkerIoRegistryV3d0.instance()).
read().iterate()
g.io(someOutputFile).
with(IO.writer,IO.graphson).
with(IO.registry, "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3d0").
write().iterate()
GLVs will obviously always be forced to use the latter form as they can’t explicitly create an instance of an IoRegistry to pass to the server (nor are IoRegistry instances necessarily serializable).
The version of the formats (e.g. GraphSON 2.0 or 3.0) utilized by io() is determined entirely by the IO.reader and IO.writer configurations or their defaults. The defaults will always be the latest version for the current release of TinkerPop. It is also possible for graph providers to override these defaults, so consult the documentation of the underlying graph database in use for any details on that.
Note: The io() step will try to automatically detect the appropriate GraphReader or GraphWriter to use based on the file extension. If the file has a different extension than the ones expected, use with() as shown above to set the reader or writer explicitly.
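The extension-based detection can be thought of as a simple lookup with an explicit override. The plain-Python sketch below only models that idea: detect_format and EXTENSION_TO_FORMAT are hypothetical names, the format strings are placeholder labels rather than the actual values of IO.graphml, IO.graphson or IO.gryo, and raising an error for an unknown extension is an assumption of this sketch. The extensions themselves are the ones named in the format sections of this documentation (.xml, .graphml, .json, .kryo).

```python
import os

# Extensions named in the GraphML, GraphSON and Gryo sections of this document.
EXTENSION_TO_FORMAT = {
    ".xml": "graphml",
    ".graphml": "graphml",
    ".json": "graphson",
    ".kryo": "gryo",
}


def detect_format(path, explicit=None):
    """Pick a format label for a file, preferring an explicit choice.

    The explicit argument models with(IO.reader, ...) or with(IO.writer, ...).
    """
    if explicit is not None:
        return explicit
    extension = os.path.splitext(path)[1]
    if extension not in EXTENSION_TO_FORMAT:
        # Assumed behavior for this sketch: refuse to guess.
        raise ValueError("cannot detect format for %r; set it explicitly" % path)
    return EXTENSION_TO_FORMAT[extension]


detected = detect_format("graph.json")
overridden = detect_format("export.dat", explicit="graphml")
```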
For more advanced configuration of GraphReader and GraphWriter operations (e.g. normalized output for GraphSON, disabling class registrations for Gryo, etc.), construct the appropriate GraphReader or GraphWriter using the build() method on their implementations and use it directly. It can be passed directly to the IO.reader or IO.writer options. Obviously, these are JVM-based operations and thus not available to GLVs as portable features.
GraphML
The GraphML file format is a common XML-based representation of a graph. It is widely supported by graph-related tools and libraries making it a solid interchange format for TinkerPop. In other words, if the intent is to work with graph data in conjunction with applications outside of TinkerPop, GraphML may be the best choice to do that. Common use cases might be:
Warning: GraphML is a "lossy" format in that it only supports primitive values for properties and does not have support for Graph variables. It will use toString to serialize property values outside of those primitives.
Warning: GraphML as a specification allows for <edge> and <node> elements to appear in any order. Most software that writes GraphML (including TinkerPop’s GraphMLWriter) writes <node> elements before <edge> elements. However, it is important to note that GraphMLReader will read this data in order and order can matter. This is because TinkerPop does not allow the vertex label to be changed after the vertex has been created. Therefore, if an <edge> element comes before the <node>, the label on the vertex will be ignored. It is thus better to order <node> elements in the GraphML to appear before all <edge> elements if vertex labels are important to the graph.
// expects a file extension of .xml or .graphml to determine that
// a GraphML reader/writer should be used.
g.io("graph.xml").read().iterate();
g.io("graph.xml").write().iterate();
Note: If using GraphML generated from TinkerPop 2.x, read more about its incompatibilities in the Upgrade Documentation.
GraphSON
GraphSON is a JSON-based format extended from earlier versions of TinkerPop. It is important to note that TinkerPop’s GraphSON is not backwards compatible with prior TinkerPop GraphSON versions. GraphSON has some support from graph-related applications outside of TinkerPop, but it is generally best used in two cases:
- A text format of the graph or its elements is desired (e.g. debugging, usage in source control, etc.)
- The graph or its elements need to be consumed by code that is not JVM-based (e.g. JavaScript, Python, .NET, etc.)
// expects a file extension of .json to interpret that
// a GraphSON reader/writer should be used
g.io("graph.json").read().iterate();
g.io("graph.json").write().iterate();
Note
|
Additional documentation for GraphSON can be found in the IO Reference. |
Gryo
Kryo is a popular
serialization package for the JVM. Gremlin-Kryo is a binary Graph
serialization format for use on the JVM by JVM
languages. It is designed to be space efficient, non-lossy and is promoted as the standard format to use when working
with graph data inside of the TinkerPop stack. A list of common use cases is presented below:
-
Migration from one Gremlin Structure implementation to another (e.g. TinkerGraph to Neo4jGraph)
-
Serialization of individual graph elements to be sent over the network to another JVM.
-
Backups of in-memory graphs or subgraphs.
Warning
|
When migrating between Gremlin Structure implementations, Kryo may not lose data, but it is important to
consider the features of each Graph and whether or not the data types supported in one will be supported in the
other. Failure to do so may result in errors.
|
// expects a file extension of .kryo to interpret that
// a Gryo reader/writer should be used
g.io("graph.kryo").read().iterate()
g.io("graph.kryo").write().iterate()
Is Step
It is possible to filter scalar values using is()-step (filter).
Python
|
The term |
gremlin> g.V().values('age').is(32)
==>32
gremlin> g.V().values('age').is(lte(30))
==>29
==>27
gremlin> g.V().values('age').is(inside(30, 40))
==>32
==>35
gremlin> g.V().where(__.in('created').count().is(1)).values('name') //// (1)
==>ripple
gremlin> g.V().where(__.in('created').count().is(gte(2))).values('name') //// (2)
==>lop
gremlin> g.V().where(__.in('created').values('age').
mean().is(inside(30d, 35d))).values('name') //// (3)
==>lop
==>ripple
g.V().values('age').is(32)
g.V().values('age').is(lte(30))
g.V().values('age').is(inside(30, 40))
g.V().where(__.in('created').count().is(1)).values('name') //// (1)
g.V().where(__.in('created').count().is(gte(2))).values('name') //// (2)
g.V().where(__.in('created').values('age').
mean().is(inside(30d, 35d))).values('name') //// (3)
-
Find projects having exactly one contributor.
-
Find projects having two or more contributors.
-
Find projects whose contributors’ average age is between 30 and 35.
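The filtering behavior shown above can be modeled outside of Gremlin. The following plain-Python sketch is illustrative only: is_, eq, lte, gte and inside are hypothetical stand-ins for the Gremlin step and its P predicates, not the gremlinpython API, and the age list is assumed to match the "modern" toy graph.

```python
def eq(value):
    return lambda x: x == value


def lte(value):
    return lambda x: x <= value


def gte(value):
    return lambda x: x >= value


def inside(low, high):
    # Exclusive on both ends: the boundary values themselves do not pass.
    return lambda x: low < x < high


def is_(values, test):
    """Keep the values that pass; a bare value is treated as eq(value)."""
    predicate = test if callable(test) else eq(test)
    return [v for v in values if predicate(v)]


ages = [29, 27, 32, 35]  # assumed ages of the four person vertices

print(is_(ages, 32))
print(is_(ages, lte(30)))
print(is_(ages, inside(30, 40)))
```

Like the step itself, the sketch never transforms a value; it only decides whether the value continues down the stream.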
Additional References
is(Object), is(P), P
Key Step
The key()-step (map) takes a Property and extracts the key from it.
gremlin> g.V(1).properties().key()
==>name
==>location
==>location
==>location
==>location
gremlin> g.V(1).properties().properties().key()
==>startTime
==>endTime
==>startTime
==>endTime
==>startTime
==>endTime
==>startTime
g.V(1).properties().key()
g.V(1).properties().properties().key()
Label Step
The label()-step (map) takes an Element and extracts its label from it.
gremlin> g.V().label()
==>person
==>person
==>software
==>person
==>software
==>person
gremlin> g.V(1).outE().label()
==>created
==>knows
==>knows
gremlin> g.V(1).properties().label()
==>name
==>age
g.V().label()
g.V(1).outE().label()
g.V(1).properties().label()
Length Step
The length()-step (map) returns the length of the incoming string, or of each string in an incoming list. Null values are not processed and remain as null when returned.
If the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.
gremlin> g.V().values('name').length() //// (1)
==>5
==>5
==>3
==>4
==>6
==>5
gremlin> g.V().values('name').fold().length(local) //// (2)
==>[5,5,3,4,6,5]
g.V().values('name').length() //// (1)
g.V().values('name').fold().length(local) //// (2)
-
Return the string length of all vertex names.
-
Use Scope.local to operate on individual string elements inside the incoming list, which will return a list.
Limit Step
The limit()-step is analogous to range()-step save that the lower end range is set to 0.
gremlin> g.V().limit(2)
==>v[1]
==>v[2]
gremlin> g.V().range(0, 2)
==>v[1]
==>v[2]
g.V().limit(2)
g.V().range(0, 2)
The limit()-step can also be applied with Scope.local, in which case it operates on the incoming collection.
The examples below use The Crew toy data set.
gremlin> g.V().valueMap().select('location').limit(local,2) //// (1)
==>[san diego,santa cruz]
==>[centreville,dulles]
==>[bremen,baltimore]
==>[spremberg,kaiserslautern]
gremlin> g.V().valueMap().limit(local, 1) //// (2)
==>[name:[marko]]
==>[name:[stephen]]
==>[name:[matthias]]
==>[name:[daniel]]
==>[name:[gremlin]]
==>[name:[tinkergraph]]
g.V().valueMap().select('location').limit(local,2) //// (1)
g.V().valueMap().limit(local, 1) //// (2)
-
List<String> for each vertex containing the first two locations.
-
Map<String, Object> for each vertex, but containing only the first property value.
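The effect of Scope.local on a single incoming object can be modeled in a few lines. This is a plain-Python sketch, not the TinkerPop implementation; limit_local is a hypothetical name and the sample values are taken from the output above.

```python
from itertools import islice


def limit_local(obj, n):
    """Keep the first n entries of one incoming collection or map."""
    if isinstance(obj, dict):
        # A map keeps its first n key/value pairs.
        return dict(islice(obj.items(), n))
    # Any other iterable keeps its first n items.
    return list(islice(obj, n))


locations = ["san diego", "santa cruz", "brussels", "santa fe"]
value_map = {"name": ["marko"], "location": locations}

print(limit_local(locations, 2))
print(limit_local(value_map, 1))
```

The stream itself is not shortened: every incoming object still produces one result, only its contents are truncated.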
Local Step
A GraphTraversal operates on a continuous stream of objects. In many situations, it is important to operate on a
single element within that stream. To do such object-local traversal computations, local()-step exists (branch).
Note that the examples below use The Crew toy data set.
gremlin> g.V().as('person').
properties('location').order().by('startTime',asc).limit(2).value().as('location').
select('person','location').by('name').by() //// (1)
==>[person:daniel,location:spremberg]
==>[person:stephen,location:centreville]
gremlin> g.V().as('person').
local(properties('location').order().by('startTime',asc).limit(2)).value().as('location').
select('person','location').by('name').by() //// (2)
==>[person:marko,location:san diego]
==>[person:marko,location:santa cruz]
==>[person:stephen,location:centreville]
==>[person:stephen,location:dulles]
==>[person:matthias,location:bremen]
==>[person:matthias,location:baltimore]
==>[person:daniel,location:spremberg]
==>[person:daniel,location:kaiserslautern]
g.V().as('person').
properties('location').order().by('startTime',asc).limit(2).value().as('location').
select('person','location').by('name').by() //// (1)
g.V().as('person').
local(properties('location').order().by('startTime',asc).limit(2)).value().as('location').
select('person','location').by('name').by() //// (2)
-
Get the first two people and their respective location according to the most historic location start time.
-
For every person, get their two most historic locations.
The two traversals above look nearly identical save the inclusion of local(), which wraps a section of the traversal
in an object-local traversal. As such, the order().by() and the limit() refer to a particular object, not to the
stream as a whole.
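The difference between ordering and limiting the whole stream and doing so per object can be sketched in plain Python. The data below is a small hand-written sample chosen to be consistent with the output above; the start years should be treated as illustrative, and the function names are hypothetical.

```python
# person -> list of (location, startTime); illustrative sample only
people = {
    "marko": [("san diego", 1997), ("santa cruz", 2001), ("brussels", 2004)],
    "stephen": [("centreville", 1990), ("dulles", 2000), ("purcellville", 2006)],
    "daniel": [("spremberg", 1982), ("kaiserslautern", 2005), ("aachen", 2009)],
}


def global_first_two(people):
    """Order and limit the whole stream of location properties."""
    stream = [(start, person, place)
              for person, locations in people.items()
              for place, start in locations]
    return [(person, place) for start, person, place in sorted(stream)[:2]]


def local_first_two(people):
    """Order and limit separately for each person (the local() behavior)."""
    result = []
    for person, locations in people.items():
        for place, start in sorted(locations, key=lambda loc: loc[1])[:2]:
            result.append((person, place))
    return result


print(global_first_two(people))
print(local_first_two(people))
```

The global variant returns two rows in total, while the local variant returns up to two rows for every person.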
Local Step is quite similar in functionality to Flat Map Step, with which it is often confused.
local() propagates the traverser through the internal traversal as is without splitting/cloning it. Thus, it’s
a “global traversal” with local processing. Its use is subtle and primarily finds application in compilation
optimizations (i.e. when writing TraversalStrategy implementations). As another example consider:
gremlin> g.V().both().barrier().flatMap(groupCount().by("name"))
==>[lop:1]
==>[lop:1]
==>[lop:1]
==>[vadas:1]
==>[josh:1]
==>[josh:1]
==>[josh:1]
==>[marko:1]
==>[marko:1]
==>[marko:1]
==>[peter:1]
==>[ripple:1]
gremlin> g.V().both().barrier().local(groupCount().by("name"))
==>[lop:3]
==>[vadas:1]
==>[josh:3]
==>[marko:3]
==>[peter:1]
==>[ripple:1]
g.V().both().barrier().flatMap(groupCount().by("name"))
g.V().both().barrier().local(groupCount().by("name"))
Use of local() is often a mistake. This is especially true when its argument contains a reducing step. For example,
let’s say the requirement was to count the number of properties per Vertex in:
gremlin> g.V().both().local(properties('name','age').count()) //// (1)
==>3
==>2
==>6
==>6
==>2
==>1
gremlin> g.V().both().map(properties('name','age').count()) //// (2)
==>1
==>1
==>1
==>2
==>2
==>2
==>2
==>2
==>2
==>2
==>2
==>1
g.V().both().local(properties('name','age').count()) //// (1)
g.V().both().map(properties('name','age').count()) //// (2)
-
The output here seems impossible because no single vertex in the "modern" graph can have more than two properties given the "name" and "age" filters. However, because the counting happens object-locally, it is performed once per object rather than once per global traverser.
-
Replacing local() with map() returns the result desired by the requirement.
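One way to read the two outputs is in terms of bulked traversers. The plain-Python sketch below assumes the traversers arriving from both() carry the bulks suggested by the groupCount() example earlier in this section (for instance, three for "lop"); the tuple layout and function names are hypothetical and this is a model, not TinkerPop code.

```python
# (vertex name, bulk, number of 'name'/'age' properties on that vertex)
bulked = [
    ("lop", 3, 1), ("vadas", 1, 2), ("josh", 3, 2),
    ("marko", 3, 2), ("peter", 1, 2), ("ripple", 1, 1),
]


def local_count(traversers):
    """One result per bulked traverser: the count is scaled by the bulk."""
    return [bulk * props for name, bulk, props in traversers]


def map_count(traversers):
    """One result per unit traverser: the plain per-vertex count, repeated."""
    return [props for name, bulk, props in traversers for unit in range(bulk)]


print(local_count(bulked))
print(map_count(bulked))
```

Under that assumption the sketch reproduces both result lists shown above, including the counts of 6 that no single vertex could produce on its own.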
Warning
|
The anonymous traversal of local() processes the current object "locally." In OLAP, where the atomic unit
of computing is the vertex and its local "star graph," it is important that the anonymous traversal does not leave
the confines of the vertex’s star graph. In other words, it cannot traverse to an adjacent vertex’s properties or edges.
|
Loops Step
The loops()-step (map) extracts the number of times the Traverser has gone through the current loop.
gremlin> g.V().emit(__.has("name", "marko").or().loops().is(2)).repeat(__.out()).values("name")
==>marko
==>ripple
==>lop
g.V().emit(__.has("name", "marko").or().loops().is(2)).repeat(__.out()).values("name")
LTrim Step
The lTrim()-step (map) returns a string with leading whitespace removed. Null values are not processed and remain
as null when returned. If the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.
gremlin> g.inject(" hello ", " world ", null).lTrim()
==>hello
==>world
==>null
gremlin> g.inject([" hello ", " world ", null]).lTrim(local) //// (1)
==>[hello ,world ,null]
g.inject(" hello ", " world ", null).lTrim()
g.inject([" hello ", " world ", null]).lTrim(local) //// (1)
-
Use Scope.local to operate on individual string elements inside the incoming list, which will return a list.
Map Step
The map()-step maps the traverser from the current object to the next step in the process. Please see the
General Steps section for more information.
Match Step
The match()-step (map) provides a more declarative form of graph querying based on the notion of pattern matching.
With match(), the user provides a collection of "traversal fragments," called patterns, that have variables defined
that must hold true throughout the duration of the match(). When a traverser is in match(), a registered
MatchAlgorithm analyzes the current state of the traverser (i.e. its history based on its
path data), the runtime statistics of the traversal patterns, and returns a traversal-pattern
that the traverser should try next. The default MatchAlgorithm provided is called CountMatchAlgorithm and it
dynamically revises the pattern execution plan by sorting the patterns according to their filtering capabilities
(i.e. largest set reduction patterns execute first). For very large graphs, where the developer is uncertain of the
statistics of the graph (e.g. how many knows-edges vs. worksFor-edges exist in the graph), it is advantageous to
use match(), as an optimal plan will be determined automatically. Furthermore, some queries are much easier to
express via match() than with single-path traversals.
"Who created a project named 'lop' that was also created by someone who is 29 years old? Return the two creators."
gremlin> g.V().match(
__.as('a').out('created').as('b'),
__.as('b').has('name', 'lop'),
__.as('b').in('created').as('c'),
__.as('c').has('age', 29)).
select('a','c').by('name')
==>[a:marko,c:marko]
==>[a:josh,c:marko]
==>[a:peter,c:marko]
g.V().match(
__.as('a').out('created').as('b'),
__.as('b').has('name', 'lop'),
__.as('b').in('created').as('c'),
__.as('c').has('age', 29)).
select('a','c').by('name')
Note that the above can also be more concisely written as below which demonstrates that standard inner-traversals can be arbitrarily defined.
gremlin> g.V().match(
__.as('a').out('created').has('name', 'lop').as('b'),
__.as('b').in('created').has('age', 29).as('c')).
select('a','c').by('name')
==>[a:marko,c:marko]
==>[a:josh,c:marko]
==>[a:peter,c:marko]
g.V().match(
__.as('a').out('created').has('name', 'lop').as('b'),
__.as('b').in('created').has('age', 29).as('c')).
select('a','c').by('name')
In order to improve readability, as()-steps can be given meaningful labels which better reflect your domain. The
previous query can thus be written in a more expressive way as shown below.
gremlin> g.V().match(
__.as('creators').out('created').has('name', 'lop').as('projects'), //// (1)
__.as('projects').in('created').has('age', 29).as('cocreators')). //// (2)
select('creators','cocreators').by('name') //// (3)
==>[creators:marko,cocreators:marko]
==>[creators:josh,cocreators:marko]
==>[creators:peter,cocreators:marko]
g.V().match(
__.as('creators').out('created').has('name', 'lop').as('projects'), //// (1)
__.as('projects').in('created').has('age', 29).as('cocreators')). //// (2)
select('creators','cocreators').by('name') //// (3)
-
Find vertices that created something and match them as 'creators', then find out what they created which is named 'lop' and match these vertices as 'projects'.
-
Using these 'projects' vertices, find out their creators aged 29 and remember these as 'cocreators'.
-
Return the name of both 'creators' and 'cocreators'.
MatchStep brings functionality similar to SPARQL to Gremlin. Like SPARQL,
MatchStep conjoins a set of patterns applied to a graph. For example, the following traversal finds exactly those
songs which Jerry Garcia has both sung and written (using the Grateful Dead graph distributed in the data/ directory):
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.io('data/grateful-dead.xml').read().iterate()
gremlin> g.V().match(
__.as('a').has('name', 'Garcia'),
__.as('a').in('writtenBy').as('b'),
__.as('a').in('sungBy').as('b')).
select('b').values('name')
==>CREAM PUFF WAR
==>CRYPTICAL ENVELOPMENT
g = traversal().withEmbedded(graph)
g.io('data/grateful-dead.xml').read().iterate()
g.V().match(
__.as('a').has('name', 'Garcia'),
__.as('a').in('writtenBy').as('b'),
__.as('a').in('sungBy').as('b')).
select('b').values('name')
Among the features which differentiate match() from SPARQL are:
gremlin> g.V().match(
__.as('a').out('created').has('name','lop').as('b'), //// (1)
__.as('b').in('created').has('age', 29).as('c'),
__.as('c').repeat(out()).times(2)). //// (2)
select('c').out('knows').dedup().values('name') //// (3)
==>vadas
==>josh
g.V().match(
__.as('a').out('created').has('name','lop').as('b'), //// (1)
__.as('b').in('created').has('age', 29).as('c'),
__.as('c').repeat(out()).times(2)). //// (2)
select('c').out('knows').dedup().values('name') //// (3)
-
Patterns of arbitrary complexity: match() is not restricted to triple patterns or property paths.
-
Recursion support: match() supports the branch-based steps within a pattern, including repeat().
-
Imperative/declarative hybrid: Before and after a match(), it is possible to leverage classic Gremlin traversals.
To extend point #3, it is possible to support going from imperative, to declarative, to imperative, ad infinitum.
gremlin> g.V().match(
__.as('a').out('knows').as('b'),
__.as('b').out('created').has('name','lop')).
select('b').out('created').
match(
__.as('x').in('created').as('y'),
__.as('y').out('knows').as('z')).
select('z').values('name')
==>vadas
==>josh
g.V().match(
__.as('a').out('knows').as('b'),
__.as('b').out('created').has('name','lop')).
select('b').out('created').
match(
__.as('x').in('created').as('y'),
__.as('y').out('knows').as('z')).
select('z').values('name')
Important
|
The match() -step is stateless. The variable bindings of the traversal patterns are stored in the path
history of the traverser. As such, the variables used over all match() -steps within a traversal are globally unique.
A benefit of this is that subsequent where() , select() , match() , etc. steps can leverage the same variables in
their analysis.
|
Like all other steps in Gremlin, match() is a function and thus, match() within match() is a natural consequence
of Gremlin’s functional foundation (i.e. recursive matching).
gremlin> g.V().match(
__.as('a').out('knows').as('b'),
__.as('b').out('created').has('name','lop'),
__.as('b').match(
__.as('b').out('created').as('c'),
__.as('c').has('name','ripple')).
select('c').as('c')).
select('a','c').by('name')
==>[a:marko,c:ripple]
g.V().match(
__.as('a').out('knows').as('b'),
__.as('b').out('created').has('name','lop'),
__.as('b').match(
__.as('b').out('created').as('c'),
__.as('c').has('name','ripple')).
select('c').as('c')).
select('a','c').by('name')
If a step-labeled traversal precedes the match()-step and the traverser entering the match() is destined to bind
to a particular variable, then the previous step should be labeled accordingly.
gremlin> g.V().as('a').out('knows').as('b').
match(
__.as('b').out('created').as('c'),
__.not(__.as('c').in('created').as('a'))).
select('a','b','c').by('name')
==>[a:marko,b:josh,c:ripple]
g.V().as('a').out('knows').as('b').
match(
__.as('b').out('created').as('c'),
__.not(__.as('c').in('created').as('a'))).
select('a','b','c').by('name')
There are three types of match() traversal patterns.
-
as('a')…as('b'): both the start and end of the traversal have a declared variable.
-
as('a')…: only the start of the traversal has a declared variable.
-
…: there are no declared variables.
If a variable is at the start of a traversal pattern it must exist as a label in the path history of the traverser
else the traverser cannot go down that path. If a variable is at the end of a traversal pattern then if the variable
exists in the path history of the traverser, the traverser’s current location must match (i.e. equal) its historic
location at that same label. However, if the variable does not exist in the path history of the traverser, then the
current location is labeled as the variable and thus, becomes a bound variable for subsequent traversal patterns. If a
traversal pattern does not have an end label, then the traverser must simply "survive" the pattern (i.e. not be
filtered) to continue to the next pattern. If a traversal pattern does not have a start label, then the traverser
can go down that path at any point, but will only go down that pattern once as a traversal pattern is executed once
and only once for the history of the traverser. Typically, traversal patterns that do not have a start and end label
are used in conjunction with and(), or(), and where(). Once the traverser has "survived" all the patterns (or at
least one for or()), match()-step analyzes the traverser’s path history and emits a Map<String,Object> of the
variable bindings to the next step in the traversal.
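The end-label rule described above can be condensed into a small function. This is a plain-Python sketch of the rule as stated in the text, not the MatchStep implementation; apply_end_label is a hypothetical name, and the path history is reduced to a dictionary of bindings.

```python
def apply_end_label(bindings, label, current):
    """Return the bindings after a pattern ends, or None if the traverser is filtered."""
    if label is None:
        # No end label: the traverser only had to survive the pattern.
        return bindings
    if label in bindings:
        # Already bound: the current location must equal the historic one.
        return bindings if bindings[label] == current else None
    # Not yet bound: the current location becomes the variable's value.
    return {**bindings, label: current}


history = {"a": "marko"}
print(apply_end_label(history, "b", "lop"))
print(apply_end_label({"a": "marko", "b": "lop"}, "b", "ripple"))
```

Because a new dictionary is returned when a variable is bound, the bindings of other traversers that share the same history are left untouched.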
gremlin> g.V().as('a').out().as('b'). //// (1)
match( //// (2)
__.as('a').out().count().as('c'), //// (3)
__.not(__.as('a').in().as('b')), //// (4)
or( //// (5)
__.as('a').out('knows').as('b'),
__.as('b').in().count().as('c').and().as('c').is(gt(2)))). //// (6)
dedup('a','c'). //// (7)
select('a','b','c').by('name').by('name').by() //// (8)
==>[a:marko,b:lop,c:3]
g.V().as('a').out().as('b'). //// (1)
match( //// (2)
__.as('a').out().count().as('c'), //// (3)
__.not(__.as('a').in().as('b')), //// (4)
or( //// (5)
__.as('a').out('knows').as('b'),
__.as('b').in().count().as('c').and().as('c').is(gt(2)))). //// (6)
dedup('a','c'). //// (7)
select('a','b','c').by('name').by('name').by() //// (8)
-
A standard, step-labeled traversal can come prior to match().
-
If the traverser’s path prior to entering match() has requisite label values, then those historic values are bound.
-
It is possible to use barrier steps though they are computed locally to the pattern (as one would expect).
-
It is possible to not() a pattern.
-
It is possible to nest and()- and or()-steps for conjunction matching.
-
Both infix and prefix conjunction notation is supported.
-
It is possible to "distinct" the specified label combination.
-
The bound values are of different types: vertex ("a"), vertex ("b"), long ("c").
Using Where with Match
Match is typically used in conjunction with both select() (demonstrated previously) and where() (presented here).
A where()-step allows the user to further constrain the result set provided by match().
gremlin> g.V().match(
__.as('a').out('created').as('b'),
__.as('b').in('created').as('c')).
where('a', neq('c')).
select('a','c').by('name')
==>[a:marko,c:josh]
==>[a:marko,c:peter]
==>[a:josh,c:marko]
==>[a:josh,c:peter]
==>[a:peter,c:marko]
==>[a:peter,c:josh]
g.V().match(
__.as('a').out('created').as('b'),
__.as('b').in('created').as('c')).
where('a', neq('c')).
select('a','c').by('name')
The where()-step can take either a P-predicate (example above) or a Traversal (example below). Using
MatchPredicateStrategy, where()-clauses are automatically folded into match() and thus, subject to the query
optimizer within match()-step.
gremlin> traversal = g.V().match(
__.as('a').has(label,'person'), //// (1)
__.as('a').out('created').as('b'),
__.as('b').in('created').as('c')).
where(__.as('a').out('knows').as('c')). //// (2)
select('a','c').by('name'); null //// (3)
==>null
gremlin> traversal.toString() //// (4)
==>[GraphStep(vertex,[]), MatchStep(null,AND,[[MatchStartStep(a), HasStep([~label.eq(person)]), MatchEndStep(null)], [MatchStartStep(a), VertexStep(OUT,[created],vertex), MatchEndStep(b)], [MatchStartStep(b), VertexStep(IN,[created],vertex), MatchEndStep(c)]]), WhereTraversalStep([WhereStartStep(a), VertexStep(OUT,[knows],vertex), WhereEndStep(c)]), SelectStep(last,[a, c],[value(name)])]
gremlin> traversal //// (5) (6)
==>[a:marko,c:josh]
gremlin> traversal.toString() //// (7)
==>[TinkerGraphStep(vertex,[~label.eq(person)])@[a], MatchStep(null,AND,[[MatchStartStep(a), VertexStep(OUT,[created],vertex), MatchEndStep(b)], [MatchStartStep(b), VertexStep(IN,[created],vertex), MatchEndStep(c)], [MatchStartStep(a), WhereTraversalStep([WhereStartStep(null), VertexStep(OUT,[knows],vertex), WhereEndStep(c)]), MatchEndStep(null)]]), SelectStep(last,[a, c],[value(name)])]
traversal = g.V().match(
__.as('a').has(label,'person'), //// (1)
__.as('a').out('created').as('b'),
__.as('b').in('created').as('c')).
where(__.as('a').out('knows').as('c')). //// (2)
select('a','c').by('name'); null //// (3)
traversal.toString() //// (4)
traversal //// (5) (6)
traversal.toString() //// (7)
-
Any has()-step traversal patterns that start with the match-key are pulled out of match() to enable the graph system to leverage the filter for index lookups.
-
A where()-step with a traversal containing variable bindings declared in match().
-
A useful trick to ensure that the traversal is not iterated by Gremlin Console.
-
The string representation of the traversal prior to its strategies being applied.
-
The Gremlin Console will automatically iterate anything that is an iterator or is iterable.
-
Both marko and josh are co-developers and marko knows josh.
-
The string representation of the traversal after the strategies have been applied (and thus, where() is folded into match()).
Important
|
A where() -step is a filter and thus, variables within a where() clause are not globally bound to the
path of the traverser in match() . As such, where() -steps in match() are used for filtering, not binding.
|
Math Step
The math()-step (math) enables scientific calculator functionality within Gremlin. This step deviates from the common
function composition and nesting formalisms to provide an easy to read string-based math processor. Variables within the
equation map to scopes in Gremlin (e.g. path labels, side-effects, or incoming map keys). This step supports
by()-modulation where the by()-modulators are applied in the order in which the variables are first referenced
within the equation. Note that the reserved variable _ refers to the current numeric traverser object incoming to the
math()-step.
gremlin> g.V().as('a').out('knows').as('b').math('a + b').by('age')
==>56.0
==>61.0
gremlin> g.V().as('a').out('created').as('b').
math('b + a').
by(both().count().math('_ + 100')).
by('age')
==>132.0
==>133.0
==>135.0
==>138.0
gremlin> g.withSideEffect('x',10).V().values('age').math('_ / x')
==>2.9
==>2.7
==>3.2
==>3.5
gremlin> g.withSack(1).V(1).repeat(sack(sum).by(constant(1))).times(10).emit().sack().math('sin _')
==>0.9092974268256817
==>0.1411200080598672
==>-0.7568024953079282
==>-0.9589242746631385
==>-0.27941549819892586
==>0.6569865987187891
==>0.9893582466233818
==>0.4121184852417566
==>-0.5440211108893698
==>-0.9999902065507035
gremlin> g.V().math('_+1').by('age') //// (1)
==>30.0
==>28.0
==>33.0
==>36.0
g.V().as('a').out('knows').as('b').math('a + b').by('age')
g.V().as('a').out('created').as('b').
math('b + a').
by(both().count().math('_ + 100')).
by('age')
g.withSideEffect('x',10).V().values('age').math('_ / x')
g.withSack(1).V(1).repeat(sack(sum).by(constant(1))).times(10).emit().sack().math('sin _')
g.V().math('_+1').by('age') //// (1)
-
The "age" property is not productive for all vertices and therefore those values are filtered.
The operators supported by the calculator include: *, +, /, ^, and %. Furthermore, the following built in
functions are provided:
-
abs: absolute value
-
acos: arc cosine
-
asin: arc sine
-
atan: arc tangent
-
cbrt: cubic root
-
ceil: nearest upper integer
-
cos: cosine
-
cosh: hyperbolic cosine
-
exp: Euler’s number raised to the power (e^x)
-
floor: nearest lower integer
-
log: logarithmus naturalis (base e)
-
log10: logarithm (base 10)
-
log2: logarithm (base 2)
-
sin: sine
-
sinh: hyperbolic sine
-
sqrt: square root
-
tan: tangent
-
tanh: hyperbolic tangent
-
signum: signum function
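The rule that by()-modulators are applied in the order in which variables are first referenced can be illustrated with a plain-Python sketch. This is not how math()-step parses its expressions; variables_in_order and bind are hypothetical helpers, the round-robin reuse of modulators is an assumption based on the single by('age') example above, and the sample values are taken from the "b + a" example (marko is 29 and lop has three incident edges).

```python
import re

# Built-in function names from the list above; they are not variables.
FUNCTIONS = {
    "abs", "acos", "asin", "atan", "cbrt", "ceil", "cos", "cosh", "exp", "floor",
    "log", "log10", "log2", "sin", "sinh", "sqrt", "tan", "tanh", "signum",
}


def variables_in_order(expression):
    """Variable names in order of first reference, skipping function names."""
    names = []
    for token in re.findall(r"[A-Za-z_][A-Za-z_0-9]*", expression):
        if token not in FUNCTIONS and token not in names:
            names.append(token)
    return names


def bind(expression, modulators, scopes):
    """Apply the modulators to the variables in first-reference order."""
    names = variables_in_order(expression)
    return {
        # Assumption: with fewer modulators than variables, they are reused in turn.
        name: modulators[i % len(modulators)](scopes[name])
        for i, name in enumerate(names)
    }


marko = {"age": 29}
lop = {"degree": 3}  # result of both().count() for lop

# 'b' is referenced first, so it receives the first modulator.
values = bind("b + a",
              [lambda v: v["degree"] + 100, lambda v: v["age"]],
              {"a": marko, "b": lop})
print(float(values["b"] + values["a"]))
```

Swapping the two modulators would attempt to read "age" from lop and "degree" from marko, which is why the order of the by()-modulators has to follow the expression rather than the order of the as()-labels.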
Max Step
The max()-step (map) operates on a stream of comparable objects and determines which is the last object according
to its natural order in the stream.
gremlin> g.V().values('age').max()
==>35
gremlin> g.V().repeat(both()).times(3).values('age').max()
==>35
gremlin> g.V().values('name').max()
==>vadas
g.V().values('age').max()
g.V().repeat(both()).times(3).values('age').max()
g.V().values('name').max()
When called as max(local) it determines the maximum value of the current, local object (not the objects in the
traversal stream). This works for Collection and Comparable-type objects.
gremlin> g.V().values('age').fold().max(local)
==>35
g.V().values('age').fold().max(local)
When there are null values being evaluated the null objects are ignored, but if all values are recognized as null
the return value is null.
gremlin> g.inject(null,10, 9, null).max()
==>10
gremlin> g.inject([null,null,null]).max(local)
==>null
g.inject(null,10, 9, null).max()
g.inject([null,null,null]).max(local)
Mean Step
The mean()-step (map) operates on a stream of numbers and determines the average of those numbers.
gremlin> g.V().values('age').mean()
==>30.75
gremlin> g.V().repeat(both()).times(3).values('age').mean() //// (1)
==>30.645833333333332
gremlin> g.V().repeat(both()).times(3).values('age').dedup().mean()
==>30.75
g.V().values('age').mean()
g.V().repeat(both()).times(3).values('age').mean() //// (1)
g.V().repeat(both()).times(3).values('age').dedup().mean()
-
Realize that traversers are being bulked by repeat(). There may be more of a particular number than another, thus altering the average.
When called as mean(local) it determines the mean of the current, local object (not the objects in the traversal
stream). This works for Collection and Number-type objects.
gremlin> g.V().values('age').fold().mean(local)
==>30.75
g.V().values('age').fold().mean(local)
If mean() encounters null values, they will be ignored (i.e. their traversers are not counted toward the
divisor). If all traversers are null then the stream will return null.
gremlin> g.inject(null,10, 9, null).mean()
==>9.5
gremlin> g.inject([null,null,null]).mean(local)
==>null
g.inject(null,10, 9, null).mean()
g.inject([null,null,null]).mean(local)
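The null handling and the effect of bulking can be modeled in plain Python. This is a sketch of the behavior described above rather than TinkerPop code; mean and bulked_mean are hypothetical names, and the bulk values in the last call are made up for illustration.

```python
def mean(values):
    """Average of the non-null values, or None when every value is null."""
    numbers = [v for v in values if v is not None]
    if not numbers:
        return None
    # Nulls contribute to neither the sum nor the divisor.
    return sum(numbers) / len(numbers)


def bulked_mean(pairs):
    """Average over (value, bulk) pairs, where bulk is how often the value occurs."""
    total = sum(value * bulk for value, bulk in pairs)
    count = sum(bulk for value, bulk in pairs)
    return total / count if count else None


print(mean([None, 10, 9, None]))
print(mean([None, None, None]))
print(bulked_mean([(29, 3), (27, 1)]))
```

The bulked variant shows why a repeated traversal can shift the average: a value that arrives three times weighs three times as much as one that arrives once.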
Merge Step
The merge()-step (map) combines collections like lists and maps. It expects an incoming traverser to contain a
collection object and will combine that object with its specified argument, which must be of a matching type. This is
also known as the union operation. If the incoming traverser or its associated argument is of any other type
(including null), the step will throw an IllegalArgumentException. This step differs
from the combine()-step in that it doesn’t allow duplicates.
gremlin> g.V().values("name").fold().merge(["james","jen","marko","vadas"])
==>[jen,ripple,peter,vadas,james,josh,lop,marko]
gremlin> g.V().values("name").fold().merge(__.constant("james").fold())
==>[ripple,peter,vadas,james,josh,lop,marko]
gremlin> g.V().hasLabel('software').elementMap().merge([year:2009])
==>[name:lop,id:3,label:software,lang:java,year:2009]
==>[name:ripple,id:5,label:software,lang:java,year:2009]
g.V().values("name").fold().merge(["james","jen","marko","vadas"])
g.V().values("name").fold().merge(__.constant("james").fold())
g.V().hasLabel('software').elementMap().merge([year:2009])
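The union behavior can be modeled in plain Python. This is a sketch, not the TinkerPop implementation: the function name merge is reused only for readability, a ValueError stands in for the IllegalArgumentException, and letting the argument win on a map key conflict is an assumption that the examples above do not exercise.

```python
def merge(incoming, argument):
    """Union of two lists (as a set) or of two maps; anything else is rejected."""
    if isinstance(incoming, dict) and isinstance(argument, dict):
        # Assumption: on a key conflict the argument's value is kept.
        return {**incoming, **argument}
    if isinstance(incoming, list) and isinstance(argument, list):
        # A set holds each element once, so duplicates disappear.
        return set(incoming) | set(argument)
    raise ValueError("merge expects two lists or two maps")


names = ["marko", "vadas", "lop", "josh", "ripple", "peter"]

print(sorted(merge(names, ["james", "jen", "marko", "vadas"])))
print(merge({"name": "lop", "lang": "java"}, {"year": 2009}))
```

The list result is unordered, which matches the arbitrary ordering visible in the console output above.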
MergeEdge Step
The mergeE() step is used to add edges and their properties to a graph in a "create
if not exist" fashion. The mergeE() step can also be used to find edges matching a given
pattern. The input passed to mergeE() can be either a Map, or a child traversal that
produces a Map.
Note
|
There is a corresponding mergeV() step that can be used when creating vertices.
|
Additionally, option() modulators may be combined with mergeE() to take action depending on
whether an edge was created, or already existed. There are various ways that mergeE() can
be used. The simplest is to provide a single Map of keys and values, along with the
source and target vertex IDs, as a parameter. A T.id and a T.label may also be provided but
this is optional. The mergeE() step can be used directly from the GraphTraversalSource (g),
or in the middle of a traversal. For a match with an existing edge to occur, all values
in the Map must exist on an edge; otherwise, a new edge will be created. The examples
that follow show how mergeE() can be used to add relationships between dogs in the graph.
gremlin> g.mergeV([(T.id):1,(T.label):'Dog',name:'Toby'])
==>v[1]
gremlin> g.mergeV([(T.id):2,(T.label):'Dog',name:'Brandy']) //// (1)
==>v[2]
gremlin> g.mergeE([(T.label):'Sibling',created:'2022-02-07',(Direction.from):1,(Direction.to):2]) //// (2)
==>e[2][1-Sibling->2]
gremlin> g.E().elementMap()
==>[id:2,label:Sibling,IN:[id:2,label:Dog],OUT:[id:1,label:Dog],created:2022-02-07]
g.mergeV([(T.id):1,(T.label):'Dog',name:'Toby'])
g.mergeV([(T.id):2,(T.label):'Dog',name:'Brandy']) //// (1)
g.mergeE([(T.label):'Sibling',created:'2022-02-07',(Direction.from):1,(Direction.to):2]) //// (2)
g.E().elementMap()
-
Create two vertices with ID values of 1 and 2.
-
Create a "Sibling" relationship between the vertices.
Note
|
The example above is written with gremlin-groovy and evaluated in Gremlin Console as a Groovy script thus
allowing Groovy syntax for initializing a Map .
|
For a mergeE() step to succeed, both the from and to vertices must already exist. It
is not possible to create new vertices directly using mergeE(), but mergeV() and mergeE()
steps can be combined, in a single query, to achieve that goal.
Note
|
The mergeE() step will not create vertices that do not exist. In those cases an
error will be returned.
|
If the Direction enum has been statically included, its explicit use can be omitted from
the query.
gremlin> g.mergeV([(T.id):1,(T.label):'Dog',name:'Toby'])
==>v[1]
gremlin> g.mergeV([(T.id):2,(T.label):'Dog',name:'Brandy'])
==>v[2]
gremlin> g.mergeE([(T.label):'Sibling',created:'2022-02-07',(from):1,(to):2])
==>e[2][1-Sibling->2]
gremlin> g.E().elementMap()
==>[id:2,label:Sibling,IN:[id:2,label:Dog],OUT:[id:1,label:Dog],created:2022-02-07]
One or more option() steps can be used to control the behavior when an edge is created or updated. Similar to
mergeV(), the onCreate Map inherits from the main merge argument: any existence criteria in the main merge argument
(T.id, T.label, Direction.OUT, Direction.IN) will be automatically carried over to the onCreate action, and these
existence criteria cannot be overridden in the onCreate Map.
gremlin> g.mergeV([(T.id):1,(T.label):'Dog',name:'Toby'])
==>v[1]
gremlin> g.mergeV([(T.id):2,(T.label):'Dog',name:'Brandy'])
==>v[2]
gremlin> g.withSideEffect('map',[(T.label):'Sibling',(from):1,(to):2]).
mergeE(select('map')).
option(Merge.onCreate,[created:'2022-02-07']). //// (1)
option(Merge.onMatch,[updated:'2022-02-07'])
==>e[2][1-Sibling->2]
gremlin> g.E().elementMap()
==>[id:2,label:Sibling,IN:[id:2,label:Dog],OUT:[id:1,label:Dog],created:2022-02-07]
gremlin> g.withSideEffect('map',[(T.label):'Sibling',(from):1,(to):2]).
mergeE(select('map')).
option(Merge.onCreate,[created:'2022-02-07']).
option(Merge.onMatch,[updated:'2022-02-07']) //// (2)
==>e[2][1-Sibling->2]
gremlin> g.E().elementMap()
==>[id:2,label:Sibling,IN:[id:2,label:Dog],OUT:[id:1,label:Dog],created:2022-02-07,updated:2022-02-07]
1. The edge did not exist - set the created date.
2. The edge did exist - set the updated date.
More than one edge can be created by a single mergeE() operation. This is done by injecting a list of maps into the
traversal and letting them stream into the mergeE() step.
gremlin> maps = [[(T.label):'Siblings',(from):1,(to):2],
[(T.label):'Siblings',(from):1,(to):3]]
==>[label:Siblings,OUT:1,IN:2]
==>[label:Siblings,OUT:1,IN:3]
gremlin> g.mergeV([(T.id):1,(T.label):'Dog',name:'Toby']) //// (1)
==>v[1]
gremlin> g.mergeV([(T.id):2,(T.label):'Dog',name:'Brandy'])
==>v[2]
gremlin> g.mergeV([(T.id):3,(T.label):'Dog',name:'Dax'])
==>v[3]
gremlin> g.inject(maps).unfold().mergeE() //// (2)
==>e[3][1-Siblings->2]
==>e[4][1-Siblings->3]
gremlin> g.E().elementMap()
==>[id:3,label:Siblings,IN:[id:2,label:Dog],OUT:[id:1,label:Dog]]
==>[id:4,label:Siblings,IN:[id:3,label:Dog],OUT:[id:1,label:Dog]]
1. Create three dogs.
2. Stream the edge maps into mergeE() steps.
Warning: There is a bit of an inconsistency present when mergeE() is used as a start step versus when it is used
mid-traversal. As a start step, mergeE() will promote the currently created or matched Edge to the child traversal,
allowing you to directly update it like option(onMatch, property('k', 'v').constant([:])). However, when mergeE() is
used mid-traversal, the Edge is not promoted to the child traversal and the incoming traverser is used instead. Such
behavior is essentially blocked to prevent accidental misuse and will result in an exception at execution time that will
have a message like, "The incoming traverser for MergeEdgeStep cannot be an Element".
The mergeE step can be combined with the mergeV step (or any other step producing a Vertex) using the Merge.outV and
Merge.inV option modulators. These options can be used to "late-bind" the OUT and IN vertices in the main merge
argument and in the onCreate argument:
gremlin> g.mergeV([(T.id):1,(T.label):'Dog',name:'Toby']).as('Toby').
mergeV([(T.id):2,(T.label):'Dog',name:'Brandy']).as('Brandy').
mergeE([(T.label):'Sibling',created:'2022-02-07',(from):Merge.outV,(to):Merge.inV]).
option(Merge.outV, select('Toby')).
option(Merge.inV, select('Brandy'))
==>e[2][1-Sibling->2]
gremlin> g.E().elementMap()
==>[id:2,label:Sibling,IN:[id:2,label:Dog],OUT:[id:1,label:Dog],created:2022-02-07]
The Merge.outV and Merge.inV tokens can be used as placeholders for values for Direction.OUT and Direction.IN
respectively in the mergeE arguments. These options can produce Vertices, as in the example above, or they can
specify Maps, which will be used to search for Vertices in the graph. This is useful when the exact T.id of the
from/to vertices is not known in advance:
gremlin> g.mergeV([(T.label):'Dog',name:'Toby'])
==>v[0]
gremlin> g.mergeV([(T.label):'Dog',name:'Brandy'])
==>v[2]
gremlin> g.mergeE([(T.label):'Sibling',created:'2022-02-07',(from):Merge.outV,(to):Merge.inV]).
option(Merge.outV, [(T.label):'Dog',name:'Toby']).
option(Merge.inV, [(T.label):'Dog',name:'Brandy'])
==>e[4][0-Sibling->2]
gremlin> g.E().elementMap()
==>[id:4,label:Sibling,IN:[id:2,label:Dog],OUT:[id:0,label:Dog],created:2022-02-07]
MergeVertex Step
The mergeV()-step is used to add vertices and their properties to a graph in a "create if not exist" fashion. The
mergeV() step can also be used to find vertices matching a given pattern. The input passed to mergeV() can be either
a Map, or a child Traversal that produces a Map.
Note: There is a corresponding mergeE() step that can be used when creating edges.
Additionally, option() modulators may be combined with mergeV() to take action depending on whether a vertex was
created, or already existed. There are various ways mergeV() can be used. The simplest is to provide a single Map of
keys and values as a parameter. A T.id and a T.label may also be provided but this is optional. The mergeV() step can
be used directly from the GraphTraversalSource - g, or in the middle of a traversal. For a match with an existing
vertex to occur, all values in the Map must exist on a vertex; otherwise, a new vertex will be created. The examples
that follow show how mergeV() can be used to add some dogs to the graph.
gremlin> g.mergeV([name: 'Brandy']) //// (1)
==>v[0]
gremlin> g.V().has('name','Brandy')
==>v[0]
gremlin> g.mergeV([(T.label):'Dog',name:'Scamp', age:12]) //// (2)
==>v[2]
gremlin> g.V().hasLabel('Dog').valueMap()
==>[name:[Scamp],age:[12]]
gremlin> g.mergeV([(T.id):300, (T.label):'Dog', name:'Toby', age:10]) //// (3)
==>v[300]
gremlin> g.V().hasLabel('Dog').valueMap().with(WithOptions.tokens)
==>[id:2,label:Dog,name:[Scamp],age:[12]]
==>[id:300,label:Dog,name:[Toby],age:[10]]
1. Create a vertex for Brandy as no other matching ones exist yet.
2. Create a vertex for Scamp and also add a Dog label and his age.
3. Create a vertex for Toby with a T.id of 300.
Note: The example above is written with gremlin-groovy and evaluated in Gremlin Console as a Groovy script, thus
allowing Groovy syntax for initializing a Map.
If a vertex already exists that matches the map passed to mergeV(), the existing vertex will be returned; otherwise a
new one will be created. In this way, mergeV() provides "get or create" semantics.
gremlin> g.mergeV([name: 'Brandy']) //// (1)
==>v[0]
1. A vertex for Brandy already exists so return it. A new one is not created.
It’s important to note that every key/value pair passed to mergeV() must already exist on one or more vertices for
there to be a match. If a match is found, the vertex, or vertices, representing that match will be returned. If a
vertex representing a dog called Brandy already exists, but it does not have an "age" property, the mergeV() below
will not find a match and a new vertex will be created.
gremlin> g.addV('Dog').property('name','Brandy') //// (1)
==>v[0]
gremlin> g.mergeV([(T.label):'Dog',name:'Brandy',age:13]) //// (2)
==>v[2]
1. Create a vertex for Brandy with no age property.
2. A new vertex is created as there is no exact match to any existing vertices.
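The matching rule can be modeled in a few lines of plain Python. This is a sketch under simplifying assumptions and
not the TinkerPop API: vertices are flat dictionaries, merge_v is a hypothetical helper, and multi-properties are
ignored.

```python
def merge_v(vertices, search):
    """Model of mergeV(): "get or create" on a list of vertex dictionaries.

    A vertex matches only if it carries every key/value pair in `search`.
    If nothing matches, one new vertex is created from `search`.
    """
    matches = [v for v in vertices
               if all(k in v and v[k] == val for k, val in search.items())]
    if matches:
        return matches
    created = dict(search)
    vertices.append(created)
    return [created]


graph = [{"label": "Dog", "name": "Brandy"}]     # Brandy exists, without an age

found = merge_v(graph, {"label": "Dog", "name": "Brandy"})
assert found == [graph[0]] and len(graph) == 1   # match: nothing is created

merge_v(graph, {"label": "Dog", "name": "Brandy", "age": 13})
assert len(graph) == 2                           # "age" is missing: new vertex
```

Because the search map is the only match criterion, adding one property that the stored vertex lacks is enough to
turn a "get" into a "create".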
A common scenario is to search for a vertex with a known T.id and if it exists return that vertex. If it does not
exist, create it. As we have seen, one way to do this is to pass the T.id and all properties directly to mergeV().
Another is to use Merge.onCreate. Note that the Map specified for Merge.onCreate does not need to include the T.id
already present in the original search. The values provided to the mergeV() Map are inherited by the onCreate action
and combined with the Map provided to Merge.onCreate. Overrides of the T.id or T.label in the onCreate Map are
prohibited.
gremlin> g.mergeV([(T.id):300]).
option(Merge.onCreate,[(T.label):'Dog', name:'Toby', age:10])
==>v[300]
To take specific action when the vertex already exists, Merge.onMatch can be used. The second parameter to the option
step can be either a Map whose values are used to update the vertex or another Gremlin traversal that generates a Map.
Note: If mergeV() is given an empty Map, such as mergeV([:]), it will match, and return, every vertex in the graph.
This is the same behavior seen with V([]).
gremlin> g.mergeV([(T.id):300]).
option(Merge.onCreate,[(T.label):'Dog', name:'Toby', age:10]). //// (1)
option(Merge.onMatch,[age:11]) //// (2)
==>v[300]
gremlin> g.withSideEffect('new-data',[age:11]).
mergeV([(T.id):300]).
option(Merge.onCreate,[(T.label):'Dog', name:'Toby', age:10]).
option(Merge.onMatch,select('new-data')) //// (3)
==>v[300]
gremlin> g.V(300).valueMap().with(WithOptions.tokens)
==>[id:300,label:Dog,name:[Toby],age:[11]]
1. If no match is found, create the vertex using these values.
2. If a match is found, change the age property value.
3. Change the age property by selecting from the new-data map.
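How the search map, Merge.onCreate and Merge.onMatch interact can be sketched with a plain-Python model. It is not
TinkerPop code: merge_v, ID and LABEL are hypothetical stand-ins, vertices are flat dictionaries, and raising
ValueError for a prohibited override is an assumption about how the rule might be enforced.

```python
ID, LABEL = "id", "label"    # stand-ins for T.id and T.label


def merge_v(vertices, search, on_create=None, on_match=None):
    """Model of mergeV() with onCreate and onMatch options."""
    on_create = on_create or {}
    on_match = on_match or {}
    matches = [v for v in vertices
               if all(k in v and v[k] == val for k, val in search.items())]
    if matches:
        for vertex in matches:
            vertex.update(on_match)          # onMatch: update what was found
        return matches
    for key in (ID, LABEL):
        # onCreate may not override the id or label given in the search map.
        if key in search and key in on_create and on_create[key] != search[key]:
            raise ValueError(f"onCreate cannot override {key}")
    created = {**search, **on_create}        # onCreate inherits the search map
    vertices.append(created)
    return [created]


graph = []
options = dict(on_create={LABEL: "Dog", "name": "Toby", "age": 10},
               on_match={"age": 11})

merge_v(graph, {ID: 300}, **options)         # not found: created with age 10
assert graph == [{ID: 300, LABEL: "Dog", "name": "Toby", "age": 10}]

merge_v(graph, {ID: 300}, **options)         # found: only onMatch is applied
assert graph == [{ID: 300, LABEL: "Dog", "name": "Toby", "age": 11}]
```

Only one of the two options is applied per call, which is why the second call changes the age without touching the
name or label.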
It is sometimes helpful to incorporate the fail() step into scenarios where there is a need to stop the traversal
for one event or the other:
gremlin> g.mergeV([(T.id): 1]).
......1> option(onCreate, fail("vertex did not exist")).
......2> option(onMatch, [modified: 2022])
fail() Step Triggered
======================================================================================================================================================================
Message > vertex did not exist
Traverser> false
Bulk > 1
Traversal> fail("vertex did not exist")
Parent > TinkerMergeVertexStep [mergeV([(T.id):((int) 1)]).option(Merge.onCreate,__.fail("vertex did not exist")).option(Merge.onMatch,[("modified"):((int) 2022)])]
Metadata > {}
======================================================================================================================================================================
When working with multi-properties, there are two ways to specify them for mergeV(). First, you can specify them
individually using a CardinalityValue as the value in the Map. The CardinalityValue allows you to specify the value as
well as the Cardinality for that value. Note that it is only possible to specify one value with this syntax even if
you are using set or list.
gremlin> g.mergeV([(T.label):'Dog', name:'Max']). //// (1)
option(onCreate, [alias: set('Maximus')]). //// (2)
property(set,'alias','Maxamillion') //// (3)
==>v[0]
gremlin> g.V().has('name','Max').valueMap().with(WithOptions.tokens)
==>[id:0,label:Dog,name:[Max],alias:[Maximus,Maxamillion]]
1. Find or create a vertex for Max.
2. If Max is not found then add an alias of set cardinality.
3. Whether Max was found or created, add another alias with set cardinality.
The second option is to specify Cardinality for the entire range of values as follows:
gremlin> g.mergeV([(T.label):'Dog', name:'Max']).
option(onCreate, [alias: 'Maximus', city: 'Boston'], set) //// (1)
==>v[0]
gremlin> g.mergeV([(T.label):'Dog', name:'Max']).
option(onCreate, [alias: 'Maximus', city: single('Boston')], set) //// (2)
==>v[0]
1. If Max is created then set the alias and city with cardinality of set.
2. If Max is created then set the alias with cardinality of set and city with cardinality single.
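The difference between the two styles (a cardinality per value versus a default cardinality for the whole map) can be
illustrated with a plain-Python model. This is a sketch and not TinkerPop code: add_property and apply_map are
hypothetical helpers, a (cardinality, value) tuple stands in for a CardinalityValue, and the city "Seattle" is made up
for the example.

```python
def add_property(vertex, key, value, cardinality="single"):
    """Add one value to a multi-property stored as a list of values."""
    values = vertex.setdefault(key, [])
    if cardinality == "single":
        values[:] = [value]              # replace whatever was there
    elif cardinality == "set":
        if value not in values:          # duplicates are ignored
            values.append(value)
    elif cardinality == "list":
        values.append(value)             # duplicates are allowed
    else:
        raise ValueError(f"unknown cardinality: {cardinality}")


def apply_map(vertex, properties, default="single"):
    """Apply a property map; a (cardinality, value) tuple overrides the default."""
    for key, value in properties.items():
        if isinstance(value, tuple):
            cardinality, value = value
        else:
            cardinality = default
        add_property(vertex, key, value, cardinality)


max_dog = {}
# Default cardinality "set" for the whole map, overridden to "single" for city.
apply_map(max_dog, {"alias": "Maximus", "city": ("single", "Boston")}, default="set")
add_property(max_dog, "alias", "Maxamillion", "set")
add_property(max_dog, "alias", "Maximus", "set")      # already present, ignored
add_property(max_dog, "city", "Seattle", "single")    # replaces Boston
assert max_dog == {"alias": ["Maximus", "Maxamillion"], "city": ["Seattle"]}
```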
More than one vertex can be created by a single mergeV() operation. This is done by injecting a List of Map objects
into the traversal and letting them stream into the mergeV() step.
gremlin> maps = [[(T.label) : 'Dog', name: 'Toby' , breed: 'Golden Retriever'],
[(T.label) : 'Dog', name: 'Brandy', breed: 'Golden Retriever'],
[(T.label) : 'Dog', name: 'Scamp' , breed: 'King Charles Spaniel'],
[(T.label) : 'Dog', name: 'Shadow', breed: 'Mixed'],
[(T.label) : 'Dog', name: 'Rocket', breed: 'Golden Retriever'],
[(T.label) : 'Dog', name: 'Dax' , breed: 'Mixed'],
[(T.label) : 'Dog', name: 'Baxter', breed: 'Mixed'],
[(T.label) : 'Dog', name: 'Zoe' , breed: 'Corgi'],
[(T.label) : 'Dog', name: 'Pixel' , breed: 'Mixed']]
==>[label:Dog,name:Toby,breed:Golden Retriever]
==>[label:Dog,name:Brandy,breed:Golden Retriever]
==>[label:Dog,name:Scamp,breed:King Charles Spaniel]
==>[label:Dog,name:Shadow,breed:Mixed]
==>[label:Dog,name:Rocket,breed:Golden Retriever]
==>[label:Dog,name:Dax,breed:Mixed]
==>[label:Dog,name:Baxter,breed:Mixed]
==>[label:Dog,name:Zoe,breed:Corgi]
==>[label:Dog,name:Pixel,breed:Mixed]
gremlin> g.inject(maps).unfold().mergeV()
==>v[0]
==>v[3]
==>v[6]
==>v[9]
==>v[12]
==>v[15]
==>v[18]
==>v[21]
==>v[24]
gremlin> g.V().hasLabel('Dog').valueMap().with(WithOptions.tokens)
==>[id:0,label:Dog,name:[Toby],breed:[Golden Retriever]]
==>[id:18,label:Dog,name:[Baxter],breed:[Mixed]]
==>[id:3,label:Dog,name:[Brandy],breed:[Golden Retriever]]
==>[id:21,label:Dog,name:[Zoe],breed:[Corgi]]
==>[id:6,label:Dog,name:[Scamp],breed:[King Charles Spaniel]]
==>[id:24,label:Dog,name:[Pixel],breed:[Mixed]]
==>[id:9,label:Dog,name:[Shadow],breed:[Mixed]]
==>[id:12,label:Dog,name:[Rocket],breed:[Golden Retriever]]
==>[id:15,label:Dog,name:[Dax],breed:[Mixed]]
Another useful pattern that can be used with mergeV() involves putting multiple maps in a list and selecting
different maps based on the action being taken. The examples below use a list containing three maps. The first
contains just the ID to be searched for. The second map contains all the information to use when the vertex is
created. The third map contains additional information that will be applied if an existing vertex is found.
gremlin> g.inject([[(T.id):400],[(T.label):'Dog',name:'Pixel',age:1],[updated:'2022-02-1']]).
mergeV(limit(local,1)). //// (1)
option(Merge.onCreate,range(local,1,2)). //// (2)
option(Merge.onMatch,tail(local)) //// (3)
==>v[400]
gremlin> g.V(400).valueMap().with(WithOptions.tokens)
==>[id:400,label:Dog,name:[Pixel],age:[1]]
gremlin> g.inject([[(T.id):400],[(T.label):'Dog',name:'Pixel',age:1],[updated:'2022-02-1']]).
mergeV(limit(local,1)).
option(Merge.onCreate,range(local,1,2)).
option(Merge.onMatch,tail(local)) //// (4)
==>v[400]
gremlin> g.V(400).valueMap().with(WithOptions.tokens) //// (5)
==>[id:400,label:Dog,name:[Pixel],updated:[2022-02-1],age:[1]]
1. Use the first map to search for a vertex with an ID of 400.
2. If the vertex was not found, use the second map to create it.
3. If the vertex was found, add an updated property.
4. Pixel exists now, so we will take this option.
5. The updated property has now been added.
Warning: There is a bit of an inconsistency present when mergeV() is used as a start step versus when it is used
mid-traversal. As a start step, mergeV() will promote the currently created or matched Vertex to the child
traversal, allowing you to directly update it like option(onMatch, property('k', 'v').constant([:])). However, when
mergeV() is used mid-traversal, the Vertex is not promoted to the child traversal and the incoming traverser is used
instead. Such behavior is essentially blocked to prevent accidental misuse and will result in an exception at execution
time that will have a message like, "The incoming traverser for MergeVertexStep cannot be an Element".
Min Step
The min()-step (map) operates on a stream of comparable objects and determines which is the first object according
to its natural order in the stream.
gremlin> g.V().values('age').min()
==>27
gremlin> g.V().repeat(both()).times(3).values('age').min()
==>27
gremlin> g.V().values('name').min()
==>josh
When called as min(local) it determines the minimum value of the current, local object (not the objects in the
traversal stream). This works for Collection and Comparable-type objects.
gremlin> g.V().values('age').fold().min(local)
==>27
When there are null values being evaluated the null objects are ignored, but if all values are recognized as null
the return value is null.
gremlin> g.inject(null,10, 9, null).min()
==>9
gremlin> g.inject([null,null,null]).min(local)
==>null
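The null handling can be mirrored in plain Python, with None standing in for null. This is a sketch of the rule
stated above and not TinkerPop code: gremlin_min is a hypothetical name, and the model only covers inputs that
contain at least one element.

```python
def gremlin_min(values):
    """Smallest non-None value; None when every value is None."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


assert gremlin_min([None, 10, 9, None]) == 9       # nulls are ignored
assert gremlin_min([None, None, None]) is None     # all null: result is null
assert gremlin_min([29, 27, 32, 35]) == 27         # the ages used above
```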
None Step
The none()-step (filter) filters all objects from a traversal stream. It is especially useful for traversals that are
executed remotely where returning results is not useful and the traversal is only meant to generate side-effects.
Choosing not to return results saves on serialization and network costs, as the objects are filtered on the remote
end and not returned to the client side. Typically, this step does not need to be used directly and is quietly used
by the iterate() terminal step, which appends none() to the traversal before actually cycling through results.
Note: As of release 4.0.0, none() will be renamed to discard().
Not Step
The not()-step (filter) removes objects from the traversal stream when the traversal provided as an argument returns
an object.
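Groovy: The term not is a reserved word in Groovy, so when it is used as part of an anonymous traversal it must be
referred to in Gremlin with the double underscore, as in __.not().
Python: The term not is a reserved word in Python, so it must be referred to in Gremlin with not_().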
gremlin> g.V().not(hasLabel('person')).elementMap()
==>[id:3,label:software,name:lop,lang:java]
==>[id:5,label:software,name:ripple,lang:java]
gremlin> g.V().hasLabel('person').
not(out('created').count().is(gt(1))).values('name') //// (1)
==>marko
==>vadas
==>peter
1. josh created two projects and vadas none.
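The filtering rule can be written out in plain Python. This is a sketch and not TinkerPop code: not_ is a
hypothetical helper, a child traversal is modeled as a function that returns the objects it yields, and the created
dictionary hand-copies the "created" edges of the toy graph used in the example.

```python
# Projects created by each person in the toy graph.
created = {"marko": ["lop"], "vadas": [], "josh": ["ripple", "lop"], "peter": ["lop"]}


def more_than_one_project(person):
    """Model of out('created').count().is(gt(1)): yields the count only if > 1."""
    count = len(created[person])
    return [count] if count > 1 else []


def not_(stream, child):
    """Keep only the objects for which the child traversal yields nothing."""
    return [obj for obj in stream if len(list(child(obj))) == 0]


names = not_(created.keys(), more_than_one_project)
assert names == ["marko", "vadas", "peter"]    # josh is removed
```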
Option Step
Optional Step
The optional()-step (branch/flatMap) returns the result of the specified traversal if it yields a result; otherwise
it returns the calling element, i.e. the identity().
gremlin> g.V(2).optional(out('knows')) //// (1)
==>v[2]
gremlin> g.V(2).optional(__.in('knows')) //// (2)
==>v[1]
1. vadas does not have an outgoing knows-edge so vadas is returned.
2. vadas does have an incoming knows-edge so marko is returned.
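The "result if any, otherwise identity" rule can be modeled in plain Python. This is a sketch and not TinkerPop code:
optional, out_knows and in_knows are hypothetical helpers, and the knows dictionary hand-copies the "knows" edges of
the toy graph.

```python
# Outgoing "knows" edges of the toy graph.
knows = {"marko": ["vadas", "josh"]}


def out_knows(person):
    return knows.get(person, [])


def in_knows(person):
    return [src for src, targets in knows.items() if person in targets]


def optional(obj, child):
    """Return the child's results if it yields any, else the object itself."""
    results = list(child(obj))
    return results if results else [obj]


assert optional("vadas", out_knows) == ["vadas"]   # no outgoing knows-edge
assert optional("vadas", in_knows) == ["marko"]    # incoming knows-edge from marko
```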
optional is particularly useful for lifting entire graphs when used in conjunction with path or tree.
gremlin> g.V().hasLabel('person').optional(out('knows').optional(out('created'))).path() //// (1)
==>[v[1],v[2]]
==>[v[1],v[4],v[5]]
==>[v[1],v[4],v[3]]
==>[v[2]]
==>[v[4]]
==>[v[6]]
1. Returns the paths of everybody followed by who they know followed by what they created.
Or Step
The or()-step ensures that at least one of the provided traversals yields a result (filter). Please see and() for
and-semantics.
Python: The term or is a reserved word in Python, so it must be referred to in Gremlin with or_().
gremlin> g.V().or(
__.outE('created'),
__.inE('created').count().is(gt(1))).
values('name')
==>marko
==>lop
==>josh
==>peter
The or()-step can take an arbitrary number of traversals. At least one of the traversals must produce at least one
output for the original traverser to pass to the next step.
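This pass/fail rule can be modeled in plain Python. It is a sketch and not TinkerPop code: or_ and the two child
functions are hypothetical, each child traversal is a function returning what it yields, and the created list
hand-copies the "created" edges of the toy graph.

```python
# "created" edges of the toy graph as (creator, software) pairs.
created = [("marko", "lop"), ("josh", "ripple"), ("josh", "lop"), ("peter", "lop")]
vertices = ["marko", "vadas", "lop", "josh", "ripple", "peter"]


def out_created(vertex):
    """Model of outE('created')."""
    return [edge for edge in created if edge[0] == vertex]


def in_created_more_than_one(vertex):
    """Model of inE('created').count().is(gt(1))."""
    count = sum(1 for edge in created if edge[1] == vertex)
    return [count] if count > 1 else []


def or_(stream, *children):
    """Keep an object if at least one child traversal yields an output for it."""
    return [obj for obj in stream
            if any(len(list(child(obj))) > 0 for child in children)]


names = or_(vertices, out_created, in_created_more_than_one)
assert names == ["marko", "lop", "josh", "peter"]
```

vadas and ripple are dropped because neither child yields anything for them: vadas created nothing, and ripple has
only one incoming created-edge.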
An infix notation can be used as well.
gremlin> g.V().where(outE('created').or().outE('knows')).values('name')
==>marko
==>josh
==>peter
Order Step
When the objects of the traversal stream need to be sorted, the order()-step (map) can be leveraged.
gremlin> g.V().values('name').order()
==>josh
==>lop
==>marko
==>peter
==>ripple
==>vadas
gremlin> g.V().values('name').order().by(desc)
==>vadas
==>ripple
==>peter
==>marko
==>lop
==>josh
gremlin> g.V().hasLabel('person').order().by('age', asc).values('name')
==>vadas
==>marko
==>josh
==>peter
One of the most traversed objects in a traversal is an Element. An element can have properties associated with it
(i.e. key/value pairs). In many situations, it is desirable to sort an element traversal stream according to a
comparison of their properties.
gremlin> g.V().values('name')
==>marko
==>vadas
==>lop
==>josh
==>ripple
==>peter
gremlin> g.V().order().by('name',asc).values('name')
==>josh
==>lop
==>marko
==>peter
==>ripple
==>vadas
gremlin> g.V().order().by('name',desc).values('name')
==>vadas
==>ripple
==>peter
==>marko
==>lop
==>josh
gremlin> g.V().both().order().by('age') //// (1)
==>v[2]
==>v[1]
==>v[1]
==>v[1]
==>v[4]
==>v[4]
==>v[4]
==>v[6]
1. The "age" property is not productive for all vertices and therefore the vertices without it are filtered out.
The order()-step allows the user to provide an arbitrary number of comparators for primary, secondary, etc. sorting.
In the example below, the primary ordering is based on the outgoing created-edge count. The secondary ordering is
based on the age of the person.
gremlin> g.V().hasLabel('person').order().by(outE('created').count(), asc).
by('age', asc).values('name')
==>vadas
==>marko
==>peter
==>josh
gremlin> g.V().hasLabel('person').order().by(outE('created').count(), asc).
by('age', desc).values('name')
==>vadas
==>peter
==>marko
==>josh
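Primary and secondary ordering corresponds to sorting on a composite key. The plain-Python sketch below reproduces
the two orderings shown above. It is an analogy and not TinkerPop code: the people list hand-copies each person's age
and outgoing created-edge count from the toy graph, and negating the age is just one way to get a descending
secondary key for numeric values.

```python
people = [
    {"name": "marko", "age": 29, "created": 1},
    {"name": "vadas", "age": 27, "created": 0},
    {"name": "josh",  "age": 32, "created": 2},
    {"name": "peter", "age": 35, "created": 1},
]

# Primary: created-edge count ascending. Secondary: age ascending.
age_asc = sorted(people, key=lambda p: (p["created"], p["age"]))
# Same primary key, but the secondary key (age) is descending.
age_desc = sorted(people, key=lambda p: (p["created"], -p["age"]))

assert [p["name"] for p in age_asc] == ["vadas", "marko", "peter", "josh"]
assert [p["name"] for p in age_desc] == ["vadas", "peter", "marko", "josh"]
```

The secondary key only matters for ties on the primary key, which is why just marko and peter (one project each)
swap places between the two results.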
Randomizing the order of the traversers at a particular point in the traversal is possible with Order.shuffle.
gremlin> g.V().hasLabel('person').order().by(shuffle)
==>v[4]
==>v[1]
==>v[6]
==>v[2]
gremlin> g.V().hasLabel('person').order().by(shuffle)
==>v[6]
==>v[2]
==>v[1]
==>v[4]
It is possible to use order(local) to order the current local object and not the entire traversal stream. This works
for Collection- and Map-type objects. For any other object, the object is returned unchanged.
gremlin> g.V().values('age').fold().order(local).by(desc) //// (1)
==>[35,32,29,27]
gremlin> g.V().values('age').order(local).by(desc) //// (2)
==>29
==>27
==>32
==>35
gremlin> g.V().groupCount().by(inE().count()).order(local).by(values, desc) //// (3)
==>[1:3,0:2,3:1]
gremlin> g.V().groupCount().by(inE().count()).order(local).by(keys, asc) //// (4)
==>[0:2,1:3,3:1]
1. The ages are gathered into a list and then that list is sorted in decreasing order.
2. The ages are not gathered and thus order(local) is "ordering" single integers and thus, does nothing.
3. The groupCount() map is ordered by its values in decreasing order.
4. The groupCount() map is ordered by its keys in increasing order.
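The four cases can be reproduced with a plain-Python model of local ordering. This is a sketch and not TinkerPop
code: order_local is a hypothetical helper, and Python lists and dictionaries stand in for Collection and Map objects.

```python
def order_local(obj, key=None, reverse=False):
    """Sort a collection or map in place of the whole stream; pass others through."""
    if isinstance(obj, dict):
        # Maps are sorted by entry; `key` receives a (key, value) pair.
        return dict(sorted(obj.items(), key=key, reverse=reverse))
    if isinstance(obj, (list, tuple, set)):
        return sorted(obj, key=key, reverse=reverse)
    return obj                                   # single objects are unchanged


ages = [29, 27, 32, 35]
assert order_local(ages, reverse=True) == [35, 32, 29, 27]        # case 1
assert [order_local(age, reverse=True) for age in ages] == ages   # case 2

counts = {0: 2, 1: 3, 3: 1}                      # the groupCount() map above
by_values_desc = order_local(counts, key=lambda entry: entry[1], reverse=True)
by_keys_asc = order_local(counts, key=lambda entry: entry[0])
assert list(by_values_desc.items()) == [(1, 3), (0, 2), (3, 1)]   # case 3
assert list(by_keys_asc.items()) == [(0, 2), (1, 3), (3, 1)]      # case 4
```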
Note: The values and keys enums are from Column, which is used to select "columns" from a Map, Map.Entry, or Path.
If a property key does not exist, then it will be treated as null which will sort it first for Order.asc and last
for Order.desc.
gremlin> g.V().order().by("age").elementMap()
==>[id:2,label:person,name:vadas,age:27]
==>[id:1,label:person,name:marko,age:29]
==>[id:4,label:person,name:josh,age:32]
==>[id:6,label:person,name:peter,age:35]
Note: Prior to version 3.3.4, ordering was defined by Order.incr for ascending order and Order.decr for descending
order. Those tokens were deprecated and eventually removed in 3.5.0.
PageRank Step
The pageRank()-step (map/sideEffect) calculates PageRank using PageRankVertexProgram.
Important: The pageRank()-step is a VertexComputing-step and as such, can only be used against a graph that
supports GraphComputer (OLAP).
gremlin> g = traversal().withEmbedded(graph).withComputer()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
gremlin> g.V().pageRank().with(PageRank.propertyName, 'friendRank').values('pageRank')
gremlin> g.V().hasLabel('person').
pageRank().
with(PageRank.edges, __.outE('knows')).
with(PageRank.propertyName, 'friendRank').
order().by('friendRank',desc).
elementMap('name','friendRank')
==>[id:4,label:person,friendRank:0.8321166533236799,name:josh]
==>[id:2,label:person,friendRank:0.8321166533236799,name:vadas]
==>[id:6,label:person,friendRank:0.5839416733381598,name:peter]
==>[id:1,label:person,friendRank:0.5839416733381598,name:marko]
Note the use of the with() modulating step, which provides configuration options to the algorithm. It takes
configuration keys from the PageRank class, which is automatically imported into the Gremlin Console.
The explain()-step can be used to understand how the traversal is compiled into multiple GraphComputer jobs.
gremlin> g = traversal().withEmbedded(graph).withComputer()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
gremlin> g.V().hasLabel('person').
pageRank().
with(PageRank.edges, __.outE('knows')).
with(PageRank.propertyName, 'friendRank').
order().by('friendRank',desc).
elementMap('name','friendRank').explain()
==>Traversal Explanation
=============================================================================================================================================================================================================================================
Original Traversal [GraphStep(vertex,[]), HasStep([~label.eq(person)]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), OrderGlobalStep([[value(friendRank), desc]]), ElementMa
pStep([name, friendRank])]
VertexProgramStrategy [D] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
ConnectiveStrategy [D] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
IdentityRemovalStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
MatchPredicateStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
FilterRankingStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
IncidentToAdjacentStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
PathProcessorStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
InlineFilterStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
AdjacentToIncidentStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
MessagePassingReductionStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
RepeatUnrollStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
PathRetractionStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
EarlyLimitStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
ByModulatorOptimizationStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
CountStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
LazyBarrierStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
OrderLimitStrategy [O] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
TinkerGraphCountStrategy [P] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
TinkerGraphStepStrategy [P] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
ProfileStrategy [F] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
ComputerFinalizationStrategy [T] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
ComputerVerificationStrategy [V] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
StandardVerificationStrategy [V] [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
Final Traversal [TraversalVertexProgramStep([GraphStep(vertex,[]), HasStep([~label.eq(person)])],graphfilter[none]), PageRankVertexProgramStep([VertexStep(OUT,[knows],edge)],friendRank,20,graphfilter[none]), Travers
alVertexProgramStep([OrderGlobalStep([[value(friendRank), desc]]), ElementMapStep([name, friendRank])],graphfilter[none]), ComputerResultStep]
g = traversal().withEmbedded(graph).withComputer()
g.V().hasLabel('person').
pageRank().
with(PageRank.edges, __.outE('knows')).
with(PageRank.propertyName, 'friendRank').
order().by('friendRank',desc).
elementMap('name','friendRank').explain()
Additional References
Path Step
A traverser is transformed as it moves through a series of steps within a traversal. The history of the traverser is
realized by examining its path with path()-step (map).
gremlin> g.V().out().out().values('name')
==>ripple
==>lop
gremlin> g.V().out().out().values('name').path()
==>[v[1],v[4],v[5],ripple]
==>[v[1],v[4],v[3],lop]
gremlin> g.V().both().path().by('age') //// (1)
==>[29,27]
==>[29,32]
==>[27,29]
==>[32,29]
g.V().out().out().values('name')
g.V().out().out().values('name').path()
g.V().both().path().by('age') //1
(1) The "age" property is not productive for all vertices and therefore those values are filtered.
If edges are required in the path, then be sure to traverse those edges explicitly.
gremlin> g.V().outE().inV().outE().inV().path()
==>[v[1],e[8][1-knows->4],v[4],e[10][4-created->5],v[5]]
==>[v[1],e[8][1-knows->4],v[4],e[11][4-created->3],v[3]]
g.V().outE().inV().outE().inV().path()
It is possible to post-process the elements of the path in a round-robin fashion via by().
gremlin> g.V().out().out().path().by('name').by('age')
==>[marko,32,ripple]
==>[marko,32,lop]
g.V().out().out().path().by('name').by('age')
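The round-robin behavior can be illustrated outside of TinkerPop. The following is a minimal Java sketch, not the library's implementation: the class RoundRobinSketch and the method applyRoundRobin are hypothetical names, and vertices are modeled as plain property maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of round-robin by() modulation over a path.
final class RoundRobinSketch {
    // Apply modulator (i mod n) to the i-th path element, cycling through the modulators.
    static <T, R> List<R> applyRoundRobin(List<T> path, List<Function<T, R>> modulators) {
        List<R> out = new ArrayList<>();
        for (int i = 0; i < path.size(); i++) {
            out.add(modulators.get(i % modulators.size()).apply(path.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Vertices modeled as property maps (an assumption made for this sketch).
        Map<String, Object> marko = Map.of("name", "marko", "age", 29);
        Map<String, Object> josh = Map.of("name", "josh", "age", 32);
        Map<String, Object> ripple = Map.of("name", "ripple");
        Function<Map<String, Object>, Object> byName = v -> v.get("name");
        Function<Map<String, Object>, Object> byAge = v -> v.get("age");
        // by('name').by('age') over a three-element path: name, age, name
        System.out.println(applyRoundRobin(List.of(marko, josh, ripple), List.of(byName, byAge)));
    }
}
```

With two modulators and three path elements, the first modulator is reused for the third element, which is why the last entry in the console output above is a name rather than an age.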
Finally, because by()-based post-processing accepts a traversal, nothing prevents triggering yet another traversal.
In the traversal below, for each element of the path traversed thus far, if it's a person (as determined by its
person label), then get all of their creations, else if it's a creation, get all the people that created it.
gremlin> g.V().out().out().path().by(
choose(hasLabel('person'),
out('created').values('name'),
__.in('created').values('name')).fold())
==>[[lop],[ripple,lop],[josh]]
==>[[lop],[ripple,lop],[marko,josh,peter]]
g.V().out().out().path().by(
choose(hasLabel('person'),
out('created').values('name'),
__.in('created').values('name')).fold())
gremlin> g.V().has('person','name','vadas').as('e').
in('knows').
out('knows').where(neq('e')).
path().by('name') //// (1)
==>[vadas,marko,josh]
gremlin> g.V().has('person','name','vadas').as('e').
in('knows').as('m').
out('knows').where(neq('e')).
path().to('m').by('name') //// (2)
==>[vadas,marko]
gremlin> g.V().has('person','name','vadas').as('e').
in('knows').as('m').
out('knows').where(neq('e')).
path().from('m').by('name') //// (3)
==>[marko,josh]
g.V().has('person','name','vadas').as('e').
in('knows').
out('knows').where(neq('e')).
path().by('name') //// (1)
g.V().has('person','name','vadas').as('e').
in('knows').as('m').
out('knows').where(neq('e')).
path().to('m').by('name') //// (2)
g.V().has('person','name','vadas').as('e').
in('knows').as('m').
out('knows').where(neq('e')).
path().from('m').by('name') //3
(1) Obtain the full path from vadas to josh.
(2) Save the middle node, marko, and use the to() modulator to show only the path from vadas to marko.
(3) Use the from() modulator to show only the path from marko to josh.
Warning
Generating path information is expensive as the history of the traverser is stored into a Java list. With
numerous traversers, there are numerous lists. Moreover, in an OLAP GraphComputer environment
this becomes exceedingly prohibitive as there are traversers emanating from all vertices in the graph in parallel.
In OLAP there are optimizations provided for traverser populations, but when paths are calculated (and each traverser
is unique due to its history), then these optimizations are no longer possible.
Path Data Structure
The Path data structure is an ordered list of objects, where each object is associated to a Set<String> of
labels. An example is presented below to demonstrate both the Path API as well as how a traversal yields labeled paths.
gremlin> path = g.V(1).as('a').has('name').as('b').
out('knows').out('created').as('c').
has('name','ripple').values('name').as('d').
identity().as('e').path().next()
==>v[1]
==>v[4]
==>v[5]
==>ripple
gremlin> path.size()
==>4
gremlin> path.objects()
==>v[1]
==>v[4]
==>v[5]
==>ripple
gremlin> path.labels()
==>[b,a]
==>[]
==>[c]
==>[d,e]
gremlin> path.a
==>v[1]
gremlin> path.b
==>v[1]
gremlin> path.c
==>v[5]
gremlin> path.d == path.e
==>true
path = g.V(1).as('a').has('name').as('b').
out('knows').out('created').as('c').
has('name','ripple').values('name').as('d').
identity().as('e').path().next()
path.size()
path.objects()
path.labels()
path.a
path.b
path.c
path.d == path.e
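The pairing of objects and label sets shown above can be mimicked with plain collections. The following is a minimal Java sketch and not TinkerPop's Path implementation: the class LabeledPathSketch and its extend and get methods are hypothetical, and the lookup assumes each label occurs at most once in the path.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a labeled path: two parallel lists, one of
// objects and one of label sets. This is not the TinkerPop Path class.
final class LabeledPathSketch {
    private final List<Object> objects = new ArrayList<>();
    private final List<Set<String>> labels = new ArrayList<>();

    // Append an object together with the (possibly empty) labels attached to it.
    LabeledPathSketch extend(Object object, String... stepLabels) {
        objects.add(object);
        labels.add(new LinkedHashSet<>(List.of(stepLabels)));
        return this;
    }

    int size() { return objects.size(); }

    List<Object> objects() { return List.copyOf(objects); }

    List<Set<String>> labels() { return List.copyOf(labels); }

    // Return the object carrying the given label (assumes labels are unique in the path).
    Object get(String label) {
        for (int i = 0; i < objects.size(); i++) {
            if (labels.get(i).contains(label)) return objects.get(i);
        }
        throw new IllegalArgumentException("No such label: " + label);
    }

    public static void main(String[] args) {
        // Mirrors the console example: v[1] carries labels a and b, v[4] carries none.
        LabeledPathSketch path = new LabeledPathSketch()
                .extend("v[1]", "a", "b")
                .extend("v[4]")
                .extend("v[5]", "c")
                .extend("ripple", "d", "e");
        System.out.println(path.size());
        System.out.println(path.labels());
        System.out.println(path.get("d").equals(path.get("e")));
    }
}
```

Two labels attached to the same position, such as d and e here, resolve to the same object, which is why path.d == path.e holds in the console session.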
Additional References
PeerPressure Step
The peerPressure()-step (map/sideEffect) clusters vertices using PeerPressureVertexProgram.
Important
The peerPressure()-step is a VertexComputing-step and as such, can only be used against a graph that supports GraphComputer (OLAP).
gremlin> g = traversal().withEmbedded(graph).withComputer()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
gremlin> g.V().peerPressure().with(PeerPressure.propertyName, 'cluster').values('cluster')
==>1
==>1
==>1
==>1
==>1
==>6
gremlin> g.V().hasLabel('person').
peerPressure().
with(PeerPressure.propertyName, 'cluster').
group().
by('cluster').
by('name')
==>[1:[vadas,marko,josh],6:[peter]]
g = traversal().withEmbedded(graph).withComputer()
g.V().peerPressure().with(PeerPressure.propertyName, 'cluster').values('cluster')
g.V().hasLabel('person').
peerPressure().
with(PeerPressure.propertyName, 'cluster').
group().
by('cluster').
by('name')
Note the use of the with() modulating step which provides configuration options to the algorithm. It takes
configuration keys from the PeerPressure class, which is automatically imported to the Gremlin Console.
Additional References
Product Step
The product()-step (map) calculates the Cartesian product between the incoming list traverser and the provided list
argument. This step only expects list data (array or Iterable) and will throw an IllegalArgumentException if any
other type is encountered (including null).
gremlin> g.V().values("name").fold().product(["james","jen"])
==>[[marko,james],[marko,jen],[vadas,james],[vadas,jen],[lop,james],[lop,jen],[josh,james],[josh,jen],[ripple,james],[ripple,jen],[peter,james],[peter,jen]]
gremlin> g.V().values("name").fold().product(__.V().has("age").limit(1).values("age").fold())
==>[[marko,29],[vadas,29],[lop,29],[josh,29],[ripple,29],[peter,29]]
g.V().values("name").fold().product(["james","jen"])
g.V().values("name").fold().product(__.V().has("age").limit(1).values("age").fold())
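The pairing order in the output above can be reproduced with two nested loops. The following is a minimal Java sketch of a Cartesian product over two lists, not the step's actual implementation: ProductSketch and its product method are hypothetical names, and only null arguments are rejected here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the pairing performed by product().
final class ProductSketch {
    // Pair every element of the left list with every element of the right list,
    // iterating the left list in the outer loop.
    static List<List<Object>> product(List<?> left, List<?> right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("product() expects list data");
        }
        List<List<Object>> pairs = new ArrayList<>();
        for (Object l : left) {
            for (Object r : right) {
                pairs.add(List.of(l, r));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(product(List.of("marko", "vadas"), List.of("james", "jen")));
    }
}
```

The result has one pair per combination, so its size is the product of the two input sizes.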
Additional References
Profile Step
The profile()-step (sideEffect) exists to allow developers to profile their traversals to determine statistical
information like step runtime, counts, etc.
Warning
Profiling a Traversal will impede the Traversal’s performance. This overhead is mostly excluded from the profile results, but durations are not exact. Thus, durations are best considered in relation to each other.
gremlin> g.V().out('created').repeat(both()).times(3).hasLabel('person').values('age').sum().profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
TinkerGraphStep(vertex,[]) 6 6 0.084 22.51
VertexStep(OUT,[created],vertex) 4 4 0.044 11.97
NoOpBarrierStep(2500) 4 2 0.026 6.94
VertexStep(BOTH,vertex) 10 4 0.012 3.44
NoOpBarrierStep(2500) 10 3 0.013 3.68
VertexStep(BOTH,vertex) 24 7 0.016 4.40
NoOpBarrierStep(2500) 24 5 0.018 4.92
VertexStep(BOTH,vertex) 58 11 0.022 5.88
NoOpBarrierStep(2500) 58 6 0.030 8.03
HasStep([~label.eq(person)]) 48 4 0.027 7.34
PropertiesStep([age],value) 48 4 0.024 6.41
SumGlobalStep 1 1 0.054 14.48
>TOTAL - - 0.375 -
g.V().out('created').repeat(both()).times(3).hasLabel('person').values('age').sum().profile()
The profile()-step generates a TraversalMetrics sideEffect object that contains the following information:
- Step: A step within the traversal being profiled.
- Count: The number of represented traversers that passed through the step.
- Traversers: The number of traversers that passed through the step.
- Time (ms): The total time the step was actively executing its behavior.
- % Dur: The percentage of total time spent in the step.
It is important to understand the difference between "Count" and "Traversers". Traversers can be merged and as such,
when two traversers are "the same" they may be aggregated into a single traverser. That new traverser has a
Traverser.bulk() that is the sum of the two merged traverser bulks. On the other hand, the Count represents the sum
of all Traverser.bulk() results and thus, expresses the number of "represented" (not enumerated) traversers.
Traversers will always be less than or equal to Count.
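The relationship between the two columns can be illustrated outside of TinkerPop. The following is a minimal Java sketch under simplifying assumptions: the class BulkSketch and its methods are hypothetical, and two traversers are treated as "the same" whenever they sit at the same object, whereas real traversers must also agree on other state before they can be merged.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of traverser bulking.
final class BulkSketch {
    // Merge traversers located at the same object: one map entry per distinct
    // object, whose value plays the role of Traverser.bulk().
    static Map<String, Long> merge(List<String> locations) {
        Map<String, Long> bulks = new LinkedHashMap<>();
        for (String location : locations) {
            bulks.merge(location, 1L, Long::sum);
        }
        return bulks;
    }

    // "Traversers" column: the number of merged (enumerated) traversers.
    static int traversers(Map<String, Long> bulks) {
        return bulks.size();
    }

    // "Count" column: the sum of all bulks, i.e. the represented traversers.
    static long count(Map<String, Long> bulks) {
        return bulks.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Long> bulks = merge(List.of("v[1]", "v[3]", "v[1]", "v[1]"));
        System.out.println("Traversers=" + traversers(bulks) + " Count=" + count(bulks));
    }
}
```

Because every merged traverser has a bulk of at least one, the number of map entries can never exceed the sum of the bulks.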
For traversal compilation information, please see explain()-step.
Additional References
Project Step
The project()-step (map) projects the current object into a Map<String,Object> keyed by provided labels. It is similar
to select()-step, save that instead of retrieving and modulating historic traverser state, it modulates
the current state of the traverser.
gremlin> g.V().has('name','marko').
project('id', 'name', 'out', 'in').
by(id).
by('name').
by(outE().count()).
by(inE().count())
==>[id:1,name:marko,out:3,in:0]
gremlin> g.V().has('name','marko').
project('name', 'friendsNames').
by('name').
by(out('knows').values('name').fold())
==>[name:marko,friendsNames:[vadas,josh]]
gremlin> g.V().out('created').
project('a','b').
by('name').
by(__.in('created').count()).
order().by(select('b'),desc).
select('a')
==>lop
==>lop
==>lop
==>ripple
gremlin> g.V().project('n','a').by('name').by('age') //// (1)
==>[n:marko,a:29]
==>[n:vadas,a:27]
==>[n:lop]
==>[n:josh,a:32]
==>[n:ripple]
==>[n:peter,a:35]
g.V().has('name','marko').
project('id', 'name', 'out', 'in').
by(id).
by('name').
by(outE().count()).
by(inE().count())
g.V().has('name','marko').
project('name', 'friendsNames').
by('name').
by(out('knows').values('name').fold())
g.V().out('created').
project('a','b').
by('name').
by(__.in('created').count()).
order().by(select('b'),desc).
select('a')
g.V().project('n','a').by('name').by('age') //1
(1) The "age" property is not productive for all vertices and therefore those values are filtered and the key is not present in the Map.
Additional References
Program Step
The program()-step (map/sideEffect) is the "lambda" step for GraphComputer jobs. The step takes a
VertexProgram as an argument and will process the incoming graph accordingly. Thus, the user
can create their own VertexProgram and have it execute within a traversal. The configuration provided to the
vertex program includes:
- gremlin.vertexProgramStep.rootTraversal is a serialization of a PureTraversal form of the root traversal.
- gremlin.vertexProgramStep.stepId is the step string id of the program()-step being executed.
The user supplied VertexProgram can leverage that information accordingly within their vertex program. Example uses
are provided below.
Warning
Developing a VertexProgram is for expert users. Moreover, developing one that can be used effectively within
a traversal requires yet more expertise. This information is recommended to advanced users with a deep understanding of the
mechanics of Gremlin OLAP (GraphComputer).
private TraverserSet<Object> haltedTraversers;

public void loadState(Graph graph, Configuration configuration) {
    VertexProgram.super.loadState(graph, configuration);
    this.traversal = PureTraversal.loadState(configuration, VertexProgramStep.ROOT_TRAVERSAL, graph);
    this.programStep = new TraversalMatrix<>(this.traversal.get()).getStepById(configuration.getString(ProgramVertexProgramStep.STEP_ID));
    // if the traversal sideEffects will be used in the computation, add them as memory compute keys
    this.memoryComputeKeys.addAll(MemoryTraversalSideEffects.getMemoryComputeKeys(this.traversal.get()));
    // if master-traversal traversers may be propagated, create a memory compute key
    this.memoryComputeKeys.add(MemoryComputeKey.of(TraversalVertexProgram.HALTED_TRAVERSERS, Operator.addAll, false, false));
    // returns an empty traverser set if there are no halted traversers
    this.haltedTraversers = TraversalVertexProgram.loadHaltedTraversers(configuration);
}

public void storeState(Configuration configuration) {
    VertexProgram.super.storeState(configuration);
    // if halted traversers is null or empty, it does nothing
    TraversalVertexProgram.storeHaltedTraversers(configuration, this.haltedTraversers);
}

public void setup(Memory memory) {
    if (!this.haltedTraversers.isEmpty()) {
        // do what you like with the halted master traversal traversers
    }
    // once used, no need to keep that information around (master)
    this.haltedTraversers = null;
}

public void execute(Vertex vertex, Messenger messenger, Memory memory) {
    // once used, no need to keep that information around (workers)
    if (null != this.haltedTraversers)
        this.haltedTraversers = null;
    if (vertex.property(TraversalVertexProgram.HALTED_TRAVERSERS).isPresent()) {
        // haltedTraversers in execute() represent worker-traversal traversers
        // for example, from a traversal of the form g.V().out().program(...)
        TraverserSet<Object> haltedTraversers = vertex.value(TraversalVertexProgram.HALTED_TRAVERSERS);
        // create a new halted traverser set that can be used by the next OLAP job in the chain
        // these are worker-traversers that are distributed throughout the graph
        TraverserSet<Object> newHaltedTraversers = new TraverserSet<>();
        haltedTraversers.forEach(traverser -> {
            newHaltedTraversers.add(traverser.split(traverser.get().toString(), this.programStep));
        });
        vertex.property(VertexProperty.Cardinality.single, TraversalVertexProgram.HALTED_TRAVERSERS, newHaltedTraversers);
        // it is possible to create master-traversers that are localized to the master traversal (this is how results are ultimately delivered back to the user)
        memory.add(TraversalVertexProgram.HALTED_TRAVERSERS,
                new TraverserSet<>(this.traversal.get().getTraverserGenerator().generate("an example", this.programStep, 1L)));
    }
}

public boolean terminate(Memory memory) {
    // the master-traversal will have halted traversers
    assert memory.exists(TraversalVertexProgram.HALTED_TRAVERSERS);
    TraverserSet<String> haltedTraversers = memory.get(TraversalVertexProgram.HALTED_TRAVERSERS);
    // it will only have the traversers sent to the master traversal via memory
    assert haltedTraversers.stream().map(Traverser::get).filter(s -> s.equals("an example")).findAny().isPresent();
    // it will not contain the worker traversers distributed throughout the vertices
    assert !haltedTraversers.stream().map(Traverser::get).filter(s -> !s.equals("an example")).findAny().isPresent();
    return true;
}
Note
The test case ProgramTest in gremlin-test has an example vertex program called TestProgram that demonstrates
all the various ways in which traversal and traverser information is propagated within a vertex program and ultimately
usable by other vertex programs (including TraversalVertexProgram) down the line in an OLAP compute chain.
Finally, an example is provided using PageRankVertexProgram which doesn’t use pageRank()-step.
gremlin> g = traversal().withEmbedded(graph).withComputer()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
gremlin> g.V().hasLabel('person').
program(PageRankVertexProgram.build().property('rank').create(graph)).
order().by('rank', asc).
elementMap('name', 'rank')
==>[id:4,label:person,name:josh,rank:0.14598540152719103]
==>[id:1,label:person,name:marko,rank:0.11375510357865537]
==>[id:2,label:person,name:vadas,rank:0.14598540152719103]
==>[id:6,label:person,name:peter,rank:0.11375510357865537]
g = traversal().withEmbedded(graph).withComputer()
g.V().hasLabel('person').
program(PageRankVertexProgram.build().property('rank').create(graph)).
order().by('rank', asc).
elementMap('name', 'rank')
Properties Step
The properties()-step (map) extracts properties from an Element in the traversal stream.
gremlin> g.V(1).properties()
==>vp[name->marko]
==>vp[location->san diego]
==>vp[location->santa cruz]
==>vp[location->brussels]
==>vp[location->santa fe]
gremlin> g.V(1).properties('location').valueMap()
==>[startTime:1997,endTime:2001]
==>[startTime:2001,endTime:2004]
==>[startTime:2004,endTime:2005]
==>[startTime:2005]
gremlin> g.V(1).properties('location').has('endTime').valueMap()
==>[startTime:1997,endTime:2001]
==>[startTime:2001,endTime:2004]
==>[startTime:2004,endTime:2005]
g.V(1).properties()
g.V(1).properties('location').valueMap()
g.V(1).properties('location').has('endTime').valueMap()
Additional References
Property Step
The property()-step is used to add properties to the elements of the graph (sideEffect). Unlike addV() and
addE(), property() is a full sideEffect step in that it does not return the property it created, but the element
that streamed into it. Moreover, if property() follows an addV() or addE(), then it is "folded" into the
previous step to enable vertex and edge creation with all its properties in one creation operation.
gremlin> g.V(1).property('country','usa')
==>v[1]
gremlin> g.V(1).property('city','santa fe').property('state','new mexico').valueMap()
==>[country:[usa],city:[santa fe],name:[marko],state:[new mexico],age:[29]]
gremlin> g.V(1).property(['city': 'santa fe', 'state': 'new mexico']) //// (1)
==>v[1]
gremlin> g.V(1).property(list,'age',35) //// (2)
==>v[1]
gremlin> g.V(1).property(list, ['city': 'santa fe', 'state': 'new mexico']) //// (3)
==>v[1]
gremlin> g.V(1).valueMap()
==>[country:[usa],city:[santa fe,santa fe],name:[marko],state:[new mexico,new mexico],age:[29,35]]
gremlin> g.V(1).property(list, ['age': single(36), 'city': 'wilmington', 'state': 'delaware']) //// (4)
==>v[1]
gremlin> g.V(1).valueMap()
==>[country:[usa],city:[santa fe,santa fe,wilmington],name:[marko],state:[new mexico,new mexico,delaware],age:[36]]
gremlin> g.V(1).property('friendWeight',outE('knows').values('weight').sum(),'acl','private') //// (5)
==>v[1]
gremlin> g.V(1).properties('friendWeight').valueMap() //// (6)
==>[acl:private]
gremlin> g.addV().property(T.label,'person').valueMap().with(WithOptions.tokens) //// (7)
==>[id:13,label:person]
gremlin> g.addV().property(null) //// (8)
==>v[14]
gremlin> g.addV().property(set, null)
==>v[15]
g.V(1).property('country','usa')
g.V(1).property('city','santa fe').property('state','new mexico').valueMap()
g.V(1).property(['city': 'santa fe', 'state': 'new mexico']) //// (1)
g.V(1).property(list,'age',35) //// (2)
g.V(1).property(list, ['city': 'santa fe', 'state': 'new mexico']) //// (3)
g.V(1).valueMap()
g.V(1).property(list, ['age': single(36), 'city': 'wilmington', 'state': 'delaware']) //// (4)
g.V(1).valueMap()
g.V(1).property('friendWeight',outE('knows').values('weight').sum(),'acl','private') //// (5)
g.V(1).properties('friendWeight').valueMap() //// (6)
g.addV().property(T.label,'person').valueMap().with(WithOptions.tokens) //// (7)
g.addV().property(null) //// (8)
g.addV().property(set, null)
(1) Properties can also take a Map as an argument.
(2) For vertices, a cardinality can be provided for vertex properties.
(3) If a cardinality is specified for a Map then that cardinality will be used for all properties in the map.
(4) Assign the Cardinality individually to override the specified list or the default cardinality if not specified.
(5) It is possible to select the property value (as well as key) via a traversal.
(6) For vertices, the property()-step can add meta-properties.
(7) The label value can be specified as a property only at the time a vertex is added and if one is not specified in the addV().
(8) If you pass a null value for the Map this will be treated as a no-op and the input will be returned.
Additional References
PropertyMap Step
The propertyMap()-step yields a Map representation of the properties of an element.
gremlin> g.V().propertyMap()
==>[name:[vp[name->marko]],age:[vp[age->29]]]
==>[name:[vp[name->vadas]],age:[vp[age->27]]]
==>[name:[vp[name->lop]],lang:[vp[lang->java]]]
==>[name:[vp[name->josh]],age:[vp[age->32]]]
==>[name:[vp[name->ripple]],lang:[vp[lang->java]]]
==>[name:[vp[name->peter]],age:[vp[age->35]]]
gremlin> g.V().propertyMap('age')
==>[age:[vp[age->29]]]
==>[age:[vp[age->27]]]
==>[]
==>[age:[vp[age->32]]]
==>[]
==>[age:[vp[age->35]]]
gremlin> g.V().propertyMap('age','blah')
==>[age:[vp[age->29]]]
==>[age:[vp[age->27]]]
==>[]
==>[age:[vp[age->32]]]
==>[]
==>[age:[vp[age->35]]]
gremlin> g.E().propertyMap()
==>[weight:p[weight->0.5]]
==>[weight:p[weight->1.0]]
==>[weight:p[weight->0.4]]
==>[weight:p[weight->1.0]]
==>[weight:p[weight->0.4]]
==>[weight:p[weight->0.2]]
g.V().propertyMap()
g.V().propertyMap('age')
g.V().propertyMap('age','blah')
g.E().propertyMap()
Additional References
Range Step
As traversers propagate through the traversal, it is possible to only allow a certain number of them to pass through
with range()-step (filter). While the low end of the range has not been reached, iteration continues without
emitting. When within the low (inclusive) and high (exclusive) range, traversers are emitted. When above the high
range, the traversal breaks out of iteration. Finally, the use of -1 on the high range will emit remaining
traversers after the low range begins.
gremlin> g.V().range(0,3)
==>v[1]
==>v[2]
==>v[3]
gremlin> g.V().range(1,3)
==>v[2]
==>v[3]
gremlin> g.V().range(1, -1)
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
gremlin> g.V().repeat(both()).times(1000000).emit().range(6,10)
==>v[1]
==>v[5]
==>v[3]
==>v[1]
g.V().range(0,3)
g.V().range(1,3)
g.V().range(1, -1)
g.V().repeat(both()).times(1000000).emit().range(6,10)
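The low/high behavior described above, including the early break that keeps the last example from running a million loops, can be sketched over a plain iterator. This is a minimal Java illustration rather than the step's implementation: RangeSketch and its range method are hypothetical names, and traversers are modeled as ordinary iterator elements.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of range(low, high) semantics.
final class RangeSketch {
    // Emit elements whose zero-based position lies in [low, high); a high of -1 means "no upper bound".
    static <T> List<T> range(Iterator<T> source, long low, long high) {
        List<T> emitted = new ArrayList<>();
        long index = 0;
        while (source.hasNext()) {
            if (high != -1 && index >= high) {
                break; // above the high end: stop without pulling further elements
            }
            T next = source.next();
            if (index >= low) {
                emitted.add(next); // within the range: emit
            }
            index++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> vertices = List.of("v[1]", "v[2]", "v[3]", "v[4]", "v[5]", "v[6]");
        System.out.println(range(vertices.iterator(), 0, 3));
        System.out.println(range(vertices.iterator(), 1, 3));
        System.out.println(range(vertices.iterator(), 1, -1));
    }
}
```

Because the loop checks the high bound before pulling the next element, the sketch also terminates on an unbounded source as long as a high value is given.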
The range()-step can also be applied with Scope.local, in which case it operates on the incoming collection.
For example, it is possible to produce a Map<String, String> for each traversed path, but containing only the second
property value (the "b" step).
gremlin> g.V().as('a').out().as('b').in().as('c').select('a','b','c').by('name').range(local,1,2)
==>[b:lop]
==>[b:lop]
==>[b:lop]
==>[b:vadas]
==>[b:josh]
==>[b:ripple]
==>[b:lop]
==>[b:lop]
==>[b:lop]
==>[b:lop]
==>[b:lop]
==>[b:lop]
g.V().as('a').out().as('b').in().as('c').select('a','b','c').by('name').range(local,1,2)
The next example uses The Crew toy data set. It produces a List<String> containing the
second and third location for each vertex.
gremlin> g.V().valueMap().select('location').range(local, 1, 3)
==>[santa cruz,brussels]
==>[dulles,purcellville]
==>[baltimore,oakland]
==>[kaiserslautern,aachen]
g.V().valueMap().select('location').range(local, 1, 3)
Additional References
Read Step
The read()-step is not really a "step" but a step modulator in that it modifies the functionality of the io()-step.
More specifically, it tells the io()-step that it is expected to use its configuration to read data from some
location. Please see the documentation for io()-step for more complete details on usage.
Additional References
Repeat Step
The repeat()-step (branch) is used for looping over a traversal given some break predicate. Below are some
examples of repeat()-step in action.
gremlin> g.V(1).repeat(out()).times(2).path().by('name') //// (1)
==>[marko,josh,ripple]
==>[marko,josh,lop]
gremlin> g.V().until(has('name','ripple')).
repeat(out()).path().by('name') //// (2)
==>[marko,josh,ripple]
==>[josh,ripple]
==>[ripple]
g.V(1).repeat(out()).times(2).path().by('name') //// (1)
g.V().until(has('name','ripple')).
repeat(out()).path().by('name') //2
(1) do-while semantics stating to do out() 2 times.
(2) while-do semantics stating to break if the traverser is at a vertex named "ripple".
Important
There are two modulators for repeat(): until() and emit(). If until() comes after repeat() it is
do/while looping. If until() comes before repeat() it is while/do looping. If emit() is placed after repeat(),
it is evaluated on the traversers leaving the repeat-traversal. If emit() is placed before repeat(), it is
evaluated on the traversers prior to entering the repeat-traversal.
The repeat()-step also supports an "emit predicate", where the predicate for an empty argument emit() is
true (i.e. emit() == emit{true}). With emit(), the traverser is split in two: one copy exits the code
block while the other continues back within the code block (assuming the until() condition has not yet been met).
gremlin> g.V(1).repeat(out()).times(2).emit().path().by('name') //// (1)
==>[marko,lop]
==>[marko,vadas]
==>[marko,josh]
==>[marko,josh,ripple]
==>[marko,josh,lop]
gremlin> g.V(1).emit().repeat(out()).times(2).path().by('name') //// (2)
==>[marko]
==>[marko,lop]
==>[marko,vadas]
==>[marko,josh]
==>[marko,josh,ripple]
==>[marko,josh,lop]
g.V(1).repeat(out()).times(2).emit().path().by('name') //// (1)
g.V(1).emit().repeat(out()).times(2).path().by('name') //2
(1) The emit() comes after repeat() and thus, emission happens after the repeat() traversal is executed. Thus, no one vertex paths exist.
(2) The emit() comes before repeat() and thus, emission happens prior to the repeat() traversal being executed. Thus, one vertex paths exist.
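The difference between the two placements can be traced by hand on the toy graph. The following is a minimal Java sketch, not TinkerPop's execution model: EmitSketch, OUT, and repeatOut are hypothetical names, only the outgoing edges of marko and josh are modeled, the emit predicate is always true, and the loop is processed level by level, which is an assumption about ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of emit() placed before or after repeat(out()).times(n).
final class EmitSketch {
    // Outgoing adjacency of the toy graph by vertex name (vertices without out edges are omitted).
    static final Map<String, List<String>> OUT = Map.of(
            "marko", List.of("lop", "vadas", "josh"),
            "josh", List.of("ripple", "lop"));

    // emitBefore == true models emit().repeat(out()).times(times);
    // emitBefore == false models repeat(out()).times(times).emit().
    static List<List<String>> repeatOut(String start, int times, boolean emitBefore) {
        List<List<String>> results = new ArrayList<>();
        List<List<String>> frontier = List.of(List.of(start));
        for (int loop = 0; loop < times; loop++) {
            if (emitBefore) {
                results.addAll(frontier); // emitted prior to entering the repeat-traversal
            }
            List<List<String>> next = new ArrayList<>();
            for (List<String> path : frontier) {
                String last = path.get(path.size() - 1);
                for (String neighbor : OUT.getOrDefault(last, List.of())) {
                    List<String> extended = new ArrayList<>(path);
                    extended.add(neighbor);
                    next.add(extended);
                }
            }
            frontier = next;
            if (!emitBefore && loop < times - 1) {
                results.addAll(frontier); // emitted on leaving the repeat-traversal, then looped again
            }
        }
        results.addAll(frontier); // traversers that complete all loops exit the repeat
        return results;
    }

    public static void main(String[] args) {
        System.out.println(repeatOut("marko", 2, false));
        System.out.println(repeatOut("marko", 2, true));
    }
}
```

With emitBefore set to true the start vertex itself is emitted before the first loop, which accounts for the extra single-vertex path in the second console example.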
The emit()-modulator can take an arbitrary predicate.
gremlin> g.V(1).repeat(out()).times(2).emit(has('lang')).path().by('name')
==>[marko,lop]
==>[marko,josh,ripple]
==>[marko,josh,lop]
gremlin> g.V(1).repeat(out()).times(2).emit().path().by('name')
==>[marko,lop]
==>[marko,vadas]
==>[marko,josh]
==>[marko,josh,ripple]
==>[marko,josh,lop]
The first time through the repeat(), the vertices lop, vadas, and josh are seen. Given that loops==1, the
traverser repeats. However, because the emit-predicate is declared true, those vertices are emitted. The next time through
repeat(), the vertices traversed are ripple and lop (Josh’s created projects, as lop and vadas have no out edges).
Given that loops==2, the times(2) limit is reached, so the traversers exit the loop and ripple and lop are emitted.
Therefore, the traverser has seen the vertices: lop, vadas, josh, ripple, and lop.
repeat()-steps may be nested inside each other or inside the emit() or until() predicates, and they can also be
'named' by passing a string as the first parameter to repeat(). The loop counter of a named repeat step can be
accessed within the looped context with loops(loopName), where loopName is the name set when creating the
repeat()-step.
gremlin> g.V(1).
repeat(out("knows")).
until(repeat(out("created")).emit(has("name", "lop"))) //// (1)
==>v[4]
gremlin> g.V(6).
repeat('a', both('created').simplePath()).
emit(repeat('b', both('knows')).
until(loops('b').as('b').where(loops('a').as('b'))).
hasId(2)).dedup() //// (2)
==>v[4]
- Starting from vertex 1, keep taking outgoing 'knows' edges until reaching a vertex that created 'lop'.
- Starting from vertex 6, keep taking 'created' edges in either direction until the vertex is the same distance from vertex 2 over 'knows' edges as it is from vertex 6 over 'created' edges.
Finally, note that both emit() and until() can take a traversal and, in such situations, the predicate is
determined by traversal.hasNext(). A few examples are provided below.
gremlin> g.V(1).repeat(out()).until(hasLabel('software')).path().by('name') //// (1)
==>[marko,lop]
==>[marko,josh,ripple]
==>[marko,josh,lop]
gremlin> g.V(1).emit(hasLabel('person')).repeat(out()).path().by('name') //// (2)
==>[marko]
==>[marko,vadas]
==>[marko,josh]
gremlin> g.V(1).repeat(out()).until(outE().count().is(0)).path().by('name') //// (3)
==>[marko,lop]
==>[marko,vadas]
==>[marko,josh,ripple]
==>[marko,josh,lop]
- Starting from vertex 1, keep taking outgoing edges until a software vertex is reached.
- Starting from vertex 1, and in an infinite loop, emit the vertex if it is a person and then traverse the outgoing edges.
- Starting from vertex 1, keep taking outgoing edges until a vertex is reached that has no more outgoing edges.
Warning
|
The anonymous traversals of emit() and until() (not repeat()) process their current objects "locally."
In OLAP, where the atomic unit of computing is the vertex and its local "star graph," it is important that the
anonymous traversals do not leave the confines of the vertex’s star graph. In other words, they can not traverse to
an adjacent vertex’s properties or edges.
|
Replace Step
The replace()-step (map) returns a string with the specified characters in the original string replaced with the new
characters. Any null arguments will be a no-op and the original string is returned. Null values from the incoming
traversers are not processed and remain as null when returned. If the incoming traverser is a non-String value then
an IllegalArgumentException will be thrown.
gremlin> g.inject('that', 'this', 'test', null).replace('h', 'j') //// (1)
==>tjat
==>tjis
==>test
==>null
gremlin> g.inject('hello world').replace(null, 'j') //// (2)
==>hello world
gremlin> g.V().hasLabel("software").values("name").replace("p", "g") //// (3)
==>log
==>riggle
gremlin> g.V().hasLabel("software").values("name").fold().replace(local, "p", "g") //// (4)
==>[log,riggle]
- Replace "h" in the strings with "j".
- Null inputs are ignored and the original string is returned.
- Return software names with "p" replaced by "g".
- Use Scope.local to operate on individual string elements inside an incoming list, which will return a list.
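The null and type handling described for replace() can be summarized as a small decision procedure. The following plain-Java sketch only models the rules stated above and is not TinkerPop's source: the class and method names are hypothetical, and the order in which the checks are made (incoming null, then type, then null arguments) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the documented rules; not TinkerPop's implementation.
class ReplaceSemantics {
    // Models replace(oldChars, newChars) applied to one incoming traverser value.
    static Object replace(Object incoming, String oldChars, String newChars) {
        if (incoming == null) {
            return null;                          // null traversers pass through as null
        }
        if (!(incoming instanceof String)) {
            throw new IllegalArgumentException("replace() expects a String traverser");
        }
        if (oldChars == null || newChars == null) {
            return incoming;                      // null argument: no-op, original string returned
        }
        return ((String) incoming).replace(oldChars, newChars);
    }

    // Models the Scope.local form: each element of the incoming list is handled individually.
    static List<Object> replaceLocal(List<?> incoming, String oldChars, String newChars) {
        List<Object> result = new ArrayList<>();
        for (Object element : incoming) {
            result.add(replace(element, oldChars, newChars));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(replace("that", "h", "j"));              // tjat
        System.out.println(replace("hello world", null, "j"));      // hello world
        System.out.println(replaceLocal(List.of("lop", "ripple"), "p", "g")); // [log, riggle]
    }
}
```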
Additional References
replace(String,String)
replace(Scope,String,String)
Reverse Step
The reverse()-step (map) returns the reverse of the incoming list traverser. Single values (including null) are not
processed and are added back to the Traversal Stream unchanged. If the incoming traverser is a String value then the
reversed String will be returned.
gremlin> g.V().values("name").reverse() //// (1)
==>okram
==>sadav
==>pol
==>hsoj
==>elppir
==>retep
gremlin> g.V().values("name").order().fold().reverse() //// (2)
==>[vadas,ripple,peter,marko,lop,josh]
- Reverse the order of the characters in each name.
- Fold all the names into a list in ascending order and then reverse the list’s ordering (into descending).
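The three cases that reverse() distinguishes (lists, strings, and other single values) can be sketched in plain Java. This is a hedged model of the behavior described above, not TinkerPop's code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative model of the documented rules; not TinkerPop's implementation.
class ReverseSemantics {
    static Object reverse(Object incoming) {
        if (incoming instanceof String) {
            // Strings are reversed character by character.
            return new StringBuilder((String) incoming).reverse().toString();
        }
        if (incoming instanceof List) {
            // Lists are reversed on a copy so the incoming list is left untouched.
            List<Object> copy = new ArrayList<>((List<?>) incoming);
            Collections.reverse(copy);
            return copy;
        }
        return incoming;                          // null and other single values pass through unchanged
    }

    public static void main(String[] args) {
        System.out.println(reverse("marko"));                       // okram
        System.out.println(reverse(List.of("josh", "lop", "marko"))); // [marko, lop, josh]
        System.out.println(reverse(29));                            // 29
    }
}
```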
RTrim Step
The rTrim()-step (map) returns a string with trailing whitespace removed. Null values are not processed and remain
as null when returned. If the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.
gremlin> g.inject(" hello ", " world ", null).rTrim()
==> hello
==> world
==>null
gremlin> g.inject([" hello ", " world ", null]).rTrim(local) //// (1)
==>[ hello, world,null]
- Use Scope.local to operate on individual string elements inside an incoming list, which will return a list.
Sack Step
A traverser can contain a local data structure called a "sack".
The sack()-step is used to read and write sacks (sideEffect or map). Each sack of each traverser is created
when using GraphTraversalSource.withSack(initialValueSupplier,splitOperator?,mergeOperator?).
- Initial value supplier: a Supplier providing the initial value of each traverser’s sack.
- Split operator: a UnaryOperator that clones the traverser’s sack when the traverser splits. If no split operator is provided, then UnaryOperator.identity() is assumed.
- Merge operator: a BinaryOperator that unites the sacks of two traversers when they are merged. If no merge operator is provided, then traversers with sacks can not be merged.
Two trivial examples are presented below to demonstrate the initial value supplier. In the first example below, a
traverser is created at each vertex in the graph (g.V()), with a 1.0 sack (withSack(1.0f)), and then the sack
value is accessed (sack()). In the second example, a random float supplier is used to generate sack values.
gremlin> g.withSack(1.0f).V().sack()
==>1.0
==>1.0
==>1.0
==>1.0
==>1.0
==>1.0
gremlin> rand = new Random()
==>java.util.Random@5d6125d8
gremlin> g.withSack {rand.nextFloat()}.V().sack()
==>0.30364954
==>0.6187649
==>0.046979964
==>0.2031619
==>0.33781552
==>0.66723937
A more complicated initial value supplier example is presented below where the sack values are used in a running
computation and then emitted at the end of the traversal. When an edge is traversed, the edge weight is multiplied
by the sack value (sack(mult).by('weight')). Note that the by()-modulator can be any arbitrary traversal.
gremlin> g.withSack(1.0f).V().repeat(outE().sack(mult).by('weight').inV()).times(2)
==>v[5]
==>v[3]
gremlin> g.withSack(1.0f).V().repeat(outE().sack(mult).by('weight').inV()).times(2).sack()
==>1.0
==>0.4
gremlin> g.withSack(1.0f).V().repeat(outE().sack(mult).by('weight').inV()).times(2).path().
by().by('weight')
==>[v[1],1.0,v[4],1.0,v[5]]
==>[v[1],1.0,v[4],0.4,v[3]]
gremlin> g.V().sack(assign).by('age').sack() //// (1)
==>29
==>27
==>32
==>35
- The "age" property is not present on all vertices, so the traversers at vertices without it are filtered out during the assignment.
When complex objects are used (i.e. non-primitives), then a
split operator should be defined to ensure that each traverser gets a clone of its parent’s sack. The first example
does not use a split operator and as such, the same map is propagated to all traversers (a global data structure). The
second example demonstrates how Map.clone() ensures that each traverser’s sack contains a unique, local sack.
gremlin> g.withSack {[:]}.V().out().out().
sack {m,v -> m[v.value('name')] = v.value('lang'); m}.sack() // BAD: single map
==>[ripple:java]
==>[ripple:java,lop:java]
gremlin> g.withSack {[:]}{it.clone()}.V().out().out().
sack {m,v -> m[v.value('name')] = v.value('lang'); m}.sack() // GOOD: cloned map
==>[ripple:java]
==>[lop:java]
Note
|
For primitives (i.e. integers, longs, floats, etc.), a split operator is not required, as such values are immutable and therefore cannot be modified through a reference shared between traversers. |
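Why the split operator matters for mutable sacks can be shown without a graph. The sketch below is a plain-Java model under stated assumptions, not TinkerPop's traverser machinery: the Traverser class, its split method, and the run helper are hypothetical stand-ins, and the two child locations mirror the ripple and lop vertices from the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative model only; not TinkerPop's traverser implementation.
class SackSplit {
    // Hypothetical stand-in for a traverser: a location plus a sack.
    static final class Traverser {
        final String location;
        final Map<String, String> sack;

        Traverser(String location, Map<String, String> sack) {
            this.location = location;
            this.sack = sack;
        }

        // When a traverser splits, the split operator decides what sack the child receives.
        Traverser split(String newLocation, UnaryOperator<Map<String, String>> splitOperator) {
            return new Traverser(newLocation, splitOperator.apply(sack));
        }
    }

    // Splits one parent into two children and lets each child write to its sack.
    static List<Map<String, String>> run(UnaryOperator<Map<String, String>> splitOperator) {
        Traverser parent = new Traverser("josh", new LinkedHashMap<>());
        List<Map<String, String>> sacks = new ArrayList<>();
        for (String project : List.of("ripple", "lop")) {
            Traverser child = parent.split(project, splitOperator);
            child.sack.put(child.location, "java");
            sacks.add(child.sack);
        }
        return sacks;
    }

    public static void main(String[] args) {
        // identity(): both children hold the very same map, so writes are shared.
        List<Map<String, String>> shared = run(UnaryOperator.identity());
        System.out.println(shared.get(0) == shared.get(1)); // true

        // Copying split operator: each child writes to its own map.
        List<Map<String, String>> cloned = run(m -> new LinkedHashMap<>(m));
        System.out.println(cloned);                         // [{ripple=java}, {lop=java}]
    }
}
```

With the identity operator the two children share one map, which is the "global data structure" effect; the copying operator gives each child an independent sack.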
If a merge operator is not provided, then traversers with sacks can not be bulked. However, in many situations,
merging the sacks of two traversers at the same location is algorithmically sound and good to provide so as to gain
the bulking optimization. In the examples below, the binary merge operator is Operator.sum. Thus, when two traversers
merge, their respective sacks are added together.
gremlin> g.withSack(1.0d).V(1).out('knows').in('knows') //// (1)
==>v[1]
==>v[1]
gremlin> g.withSack(1.0d).V(1).out('knows').in('knows').sack() //// (2)
==>1.0
==>1.0
gremlin> g.withSack(1.0d, sum).V(1).out('knows').in('knows').sack() //// (3)
==>2.0
==>2.0
gremlin> g.withSack(1.0d).V(1).local(outE('knows').barrier(normSack).inV()).in('knows').barrier() //// (4)
==>v[1]
==>v[1]
gremlin> g.withSack(1.0d).V(1).local(outE(