3.2.9
Gremlin Language Variants
Gremlin is an embeddable query language that can be represented using the constructs of a host programming language.
Any programming language that supports function composition
(e.g. fluent chaining) and function nesting (e.g. call stacks)
can support Gremlin. Nearly every modern programming language is capable of meeting both requirements.
With Gremlin, the distinction between a programming language and a query language is not as large as they
have historically been. For instance, with Gremlin-Java, the developer is able to have their application code and their
graph database queries at the same level of abstraction — both written in Java. A simple example is presented below
where the MyApplication
Java class contains both application-level and database-level code written in Java.
Warning
|
This is an advanced tutorial intended for experts knowledgeable in Gremlin in particular and TinkerPop in general. Moreover, the audience should understand advanced programming language concepts such as reflection, meta-programming, source code generation, and virtual machines. |
public class MyApplication {
public static void run(final String[] args) {
// assumes args[0] is a configuration file location
Graph graph = GraphFactory.open(args[0]);
GraphTraversalSource g = graph.traversal();
// assumes that args[1] and args[2] are range boundaries
Iterator<Map<String,Double>> result =
g.V().hasLabel("product").
order().by("unitPrice", incr).
range(Integer.valueOf(args[1]), Integer.valueOf(args[2])).
valueMap("name", "unitPrice")
while(result.hasNext()) {
Map<String,Double> map = result.next();
System.out.println(map.get("name") + " " + map.get("unitPrice"));
}
}
}
In query languages like SQL, the user must construct a string representation of their query and submit it to the database for evaluation. This is because SQL cannot be expressed in Java as they use fundamentally different constructs in their expression. The same example above is presented below using SQL and the JDBC interface. The take home point is that Gremlin does not exist outside the programming language in which it will be used. Gremlin was designed to be able to be embedded in any modern programming language and thus, always free from the complexities of string manipulation as seen in other database and analytics query languages.
public class MyApplication {
public static void run(final String[] args) {
// assumes args[0] is a URI to the database
Connection connection = DriverManager.getConnection(args[0])
Statement statement = connection.createStatement();
// assumes that args[1] and args[2] are range boundaries
ResultSet result = statement.executeQuery(
"SELECT Products.ProductName, Products.UnitPrice \n" +
" FROM (SELECT ROW_NUMBER() \n" +
" OVER ( \n" +
" ORDER BY UnitPrice) AS [ROW_NUMBER], \n" +
" ProductID \n" +
" FROM Products) AS SortedProducts \n" +
" INNER JOIN Products \n" +
" ON Products.ProductID = SortedProducts.ProductID \n" +
" WHERE [ROW_NUMBER] BETWEEN " + args[1] + " AND " + args[2] + " \n" +
"ORDER BY [ROW_NUMBER]"
while(result.hasNext()) {
result.next();
System.out.println(result.getString("Products.ProductName") + " " + result.getDouble("Products.UnitPrice"));
}
}
}
The purpose of this tutorial is to explain how to develop a Gremlin language variant. That is, for those developers that are interested in supporting Gremlin in their native language and there currently does not exist a (good) Gremlin variant in their language, they can develop one for the Apache TinkerPop community (and their language community in general). In this tutorial, Python will serve as the host language and two typical implementation models will be presented.
-
Using Jython and the JVM: This is perhaps the easiest way to produce a Gremlin language variant. With JSR-223, any language compiler written for the JVM can directly access the JVM and any of its libraries (including Gremlin-Java).
-
Using Python and GremlinServer: This model requires that there exist a Python class that mimics Gremlin-Java’s
GraphTraversal
API. With each method call of this Python class, GremlinBytecode
is generated which is ultimately translated into a Gremlin variant that can execute the traversal (e.g. Gremlin-Java).
Important
|
Apache TinkerPop’s Gremlin-Java is considered the idiomatic, standard implementation of Gremlin. Any Gremlin language variant, regardless of the implementation model chosen, must, within the constraints of the host language, be in 1-to-1 correspondence with Gremlin-Java. This ensures that language variants are collectively consistent and easily leveraged by anyone versed in Gremlin. |
Important
|
The "Gremlin-Python" presented in this tutorial is basic and provided to show the primary techniques used to construct a Gremlin language variant. Apache TinkerPop distributes with a full fledged Gremlin-Python variant that uses many of the techniques presented in this tutorial. |
Language Drivers vs. Language Variants
Before discussing how to implement a Gremlin language variant in Python, it is necessary to understand two concepts related to Gremlin language development. There is a difference between a language driver and a language variant and it is important that these two concepts (and their respective implementations) remain separate.
Language Drivers
A Gremlin language driver is a software library that is able to communicate with a TinkerPop-enabled graph system whether directly via the JVM or indirectly via Gremlin Server GremlinServer or some other RemoteConnection enabled graph system. Language drivers are responsible for submitting Gremlin traversals to a TinkerPop-enabled graph system and returning results to the developer that are within the developer’s language’s type system. For instance, resultant doubles should be coerced to floats in Python.
This tutorial is not about language drivers, but about language variants. Moreover, community libraries should make this
distinction clear and should not develop libraries that serve both roles. Language drivers will be useful to a collection
of Gremlin variants within a language community — able to support GraphTraversal
-variants as well as also other
DSL-variants (e.g. SocialTraversal
).
Note
|
GraphTraversal is a particular Gremlin domain-specific language (DSL),
albeit the most popular and foundational DSL. If another DSL is created, then the same techniques discussed in this tutorial
for GraphTraversal apply to XXXTraversal .
|
Language Variants
A Gremlin language variant is a software library that allows a developer
to write a Gremlin traversal within their native programming language. The language variant is responsible for
creating Gremlin Bytecode
that will ultimately be translated and compiled to a Traversal
by a TinkerPop-enabled graph system.
Every language variant, regardless of the implementation details, will have to account for the four core concepts below:
-
Graph
(data): The source of the graph data to be traversed and the interface which enables the creation of aGraphTraversalSource
(viagraph.traversal()
). -
GraphTraversalSource
(compiler): This is the typicalg
reference. AGraphTraversalSource
maintains thewithXXX()
-strategy methods as well as the "traversal spawn"-methods such asV()
,E()
,addV()
, etc. A traversal source’s registeredTraversalStrategies
determine how the submitted traversal will be ultimately evaluated. -
GraphTraversal
(function composition): A graph traversal maintains the computational steps such asout()
,groupCount()
,match()
, etc. This fluent interface supports method chaining and thus, a linear "left-to-right" representation of a traversal/query. -
(function nesting) : The anonymous traversal class is used for passing a traversal as an argument to a parent step. For example, in
repeat(.out())
,__.out()
is an anonymous traversal passed to the traversal parentrepeat()
. Anonymous traversals enable the "top-to-bottom" representation of a traversal. -
Bytecode
(language agnostic encoding): The source and traversal steps and their arguments are encoded in a language agnostic representation called Gremlin bytecode. This representation is a nested list of the form[step,[args*]]*
.
Both GraphTraversal
and _
define the structure of the Gremlin language. Gremlin is a _two-dimensional language supporting
linear, nested step sequences. Historically, many Gremlin language variants have failed to make the distinctions above clear
and in doing so, either complicate their implementations or yield variants that are not in 1-to-1 correspondence with Gremlin-Java.
By keeping these concepts clear when designing a language variant, the construction of the Gremlin bytecode representation is
easy.
Important
|
The term "Gremlin-Java" denotes the language that is defined by GraphTraversalSource , GraphTraversal ,
and __ . These three classes exist in org.apache.tinkerpop.gremlin.process.traversal.dsl.graph and form the definitive
representation of the Gremlin traversal language.
|
Gremlin-Jython and Gremlin-Python
Using Jython and the JVM
Jython provides a
JSR-223 ScriptEngine
implementation that enables the evaluation of
Python on the Java virtual machine. In other words, Jython’s
virtual machine is not the standard CPython reference implementation
distributed with most operating systems, but instead the JVM. The benefit of Jython is that Python code and classes
can easily interact with the Java API and any Java packages on the CLASSPATH
. In general, any JSR-223 Gremlin language
variant is trivial to "implement."
Jython 2.7.0 (default:9987c746f838, Apr 29 2015, 02:25:11)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.8.0_40
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
# this list is longer than displayed, including all jars in lib/, not just Apache TinkerPop jars
# there is probably a more convenient way of importing jars in Jython though, at the time of writing, no better solution was found.
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.9-standalone/lib/gremlin-console-3.2.9.jar")
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.9-standalone/lib/gremlin-core-3.2.9.jar")
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.9-standalone/lib/gremlin-driver-3.2.9.jar")
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.9-standalone/lib/gremlin-shaded-3.2.9.jar")
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.9-standalone/ext/tinkergraph-gremlin/lib/tinkergraph-gremlin-3.2.9.jar")
# import Java classes
>>> from org.apache.tinkerpop.gremlin.tinkergraph.structure import TinkerFactory
>>> from org.apache.tinkerpop.gremlin.process.traversal.dsl.graph import __
>>> from org.apache.tinkerpop.gremlin.process.traversal import *
>>> from org.apache.tinkerpop.gremlin.structure import *
# create the toy "modern" graph and spawn a GraphTraversalSource
>>> graph = TinkerFactory.createModern()
>>> g = graph.traversal()
# The Jython shell does not automatically iterate Iterators like the GremlinConsole
>>> g.V().hasLabel("person").out("knows").out("created")
[GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,[knows],vertex), VertexStep(OUT,[created],vertex)]
# toList() will do the iteration and return the results as a list
>>> g.V().hasLabel("person").out("knows").out("created").toList()
[v[5], v[3]]
>>> g.V().repeat(__.out()).times(2).values("name").toList()
[ripple, lop]
# results can be interacted with using Python
>>> g.V().repeat(__.out()).times(2).values("name").toList()[0]
u'ripple'
>>> g.V().repeat(__.out()).times(2).values("name").toList()[0][0:3].upper()
u'RIP'
>>>
Most every JSR-223 ScriptEngine
language will allow the developer to immediately interact with GraphTraversal
.
The benefit of this model is that nearly every major programming language has a respective ScriptEngine
:
JavaScript, Groovy,
Scala, Lisp (Clojure), Ruby, etc. A
list of implementations is provided here.
Traversal Wrappers
While it is possible to simply interact with Java classes in a ScriptEngine
implementation, such Gremlin language variants
will not leverage the unique features of the host language. It is for this reason that JVM-based language variants such as
Gremlin-Scala were developed. Scala provides many syntax niceties not
available in Java. To leverage these niceties, Gremlin-Scala "wraps" GraphTraversal
in order to provide Scala-idiomatic extensions.
Another example is Apache TinkerPop’s Gremlin-Groovy which does the same via the
Sugar plugin, but uses
meta-programming instead of object wrapping, where "behind the scenes,"
Groovy meta-programming is doing object wrapping.
The Jython example below uses Python meta-programming to add functionality to GraphTraversal
.
In particular, the getitem
and getattr
"magic methods" are leveraged.
def getitem_bypass(self, index):
if isinstance(index,int):
return self.range(index,index+1)
elif isinstance(index,slice):
return self.range(index.start,index.stop)
else:
return TypeError('Index must be int or slice')");
GraphTraversal.__getitem__ = getitem_bypass
GraphTraversal.__getattr__ = lambda self, key: self.values(key)
The two methods getitem
and getattr
support Python slicing and object attribute interception, respectively.
In this way, the host language is able to use its native constructs in a meaningful way within a Gremlin traversal.
Important
|
Gremlin-Java serves as the standard/default representation of the Gremlin traversal language. Any Gremlin
language variant must provide all the same functionality (methods) as GraphTraversal , but can extend it
with host language specific constructs. This means that the extensions must compile to GraphTraversal -specific
steps. A Gremlin language variant should not add steps/methods that do not exist in GraphTraversal . If an extension
is desired, the language variant designer should submit a proposal to Apache TinkerPop
to have the extension added to a future release of Gremlin.
|
Using Python and RemoteConnection
The JVM is a powerful piece of technology that has, over the years, become a meeting ground for developers from numerous language communities. However, not all applications will use the JVM. Given that Apache TinkerPop is a Java-framework, there must be a way for two different virtual machines to communicate traversals and their results. This section presents the second Gremlin language variant implementation model which does just that.
Note
|
Apache TinkerPop is a JVM-based graph computing framework. Most graph databases and processors today are built on the JVM. This makes it easy for these graph system providers to implement Apache TinkerPop. However, TinkerPop is more than its graph API and tools — it is also the Gremlin traversal machine and language. While Apache’s Gremlin traversal machine was written for the JVM, its constructs are simple and can/should be ported to other VMs for those graph systems that are not JVM-based. A theoretical review of the concepts behind the Gremlin traversal machine is provided in this article. |
This section’s Gremlin language variant design model does not leverage the JVM directly. Instead, it constructs a Bytecode
representation of a Traversal
that will ultimately be evaluated by RemoteConnection
(e.g. GremlinServer).
It is up to the language variant designer to choose a language driver to use for submitting the generated bytecode and
coercing its results. The language driver is the means by which, for this example, the CPython
VM communicates with the JVM.
# sudo easy_install pip
$ pip install gremlinpython
The Groovy source code below uses Java reflection to generate a Python class that is in 1-to-1 correspondence with Gremlin-Java.
class GraphTraversalSourceGenerator {
public static void create(final String graphTraversalSourceFile) {
final StringBuilder pythonClass = new StringBuilder()
pythonClass.append("from .traversal import Traversal\n")
pythonClass.append("from .traversal import TraversalStrategies\n")
pythonClass.append("from .traversal import Bytecode\n")
pythonClass.append("from ..driver.remote_connection import RemoteStrategy\n")
pythonClass.append("from .. import statics\n\n")
//////////////////////////
// GraphTraversalSource //
//////////////////////////
pythonClass.append(
"""class GraphTraversalSource(object):
def __init__(self, graph, traversal_strategies, bytecode=None):
self.graph = graph
self.traversal_strategies = traversal_strategies
if bytecode is None:
bytecode = Bytecode()
self.bytecode = bytecode
def __repr__(self):
return "graphtraversalsource[" + str(self.graph) + "]"
""")
GraphTraversalSource.getMethods(). // SOURCE STEPS
findAll { GraphTraversalSource.class.equals(it.returnType) }.
findAll {
!it.name.equals("clone") &&
!it.name.equals(TraversalSource.Symbols.withRemote)
}.
collect { SymbolHelper.toPython(it.name) }.
unique().
sort { a, b -> a <=> b }.
forEach { method ->
pythonClass.append(
""" def ${method}(self, *args):
source = GraphTraversalSource(self.graph, TraversalStrategies(self.traversal_strategies), Bytecode(self.bytecode))
source.bytecode.add_source("${SymbolHelper.toJava(method)}", *args)
return source
""")
}
pythonClass.append(
""" def withRemote(self, remote_connection):
source = GraphTraversalSource(self.graph, TraversalStrategies(self.traversal_strategies), Bytecode(self.bytecode))
source.traversal_strategies.add_strategies([RemoteStrategy(remote_connection)])
return source
""")
GraphTraversalSource.getMethods(). // SPAWN STEPS
findAll { GraphTraversal.class.equals(it.returnType) }.
collect { SymbolHelper.toPython(it.name) }.
unique().
sort { a, b -> a <=> b }.
forEach { method ->
pythonClass.append(
""" def ${method}(self, *args):
traversal = GraphTraversal(self.graph, self.traversal_strategies, Bytecode(self.bytecode))
traversal.bytecode.add_step("${SymbolHelper.toJava(method)}", *args)
return traversal
""")
}
pythonClass.append("\n\n")
////////////////////
// GraphTraversal //
////////////////////
pythonClass.append(
"""class GraphTraversal(Traversal):
def __init__(self, graph, traversal_strategies, bytecode):
Traversal.__init__(self, graph, traversal_strategies, bytecode)
def __getitem__(self, index):
if isinstance(index, int):
return self.range(index, index + 1)
elif isinstance(index, slice):
return self.range(index.start, index.stop)
else:
raise TypeError("Index must be int or slice")
def __getattr__(self, key):
return self.values(key)
""")
GraphTraversal.getMethods().
findAll { GraphTraversal.class.equals(it.returnType) }.
findAll { !it.name.equals("clone") }.
collect { SymbolHelper.toPython(it.name) }.
unique().
sort { a, b -> a <=> b }.
forEach { method ->
pythonClass.append(
""" def ${method}(self, *args):
self.bytecode.add_step("${SymbolHelper.toJava(method)}", *args)
return self
""")
};
pythonClass.append("\n\n")
////////////////////////
// AnonymousTraversal //
////////////////////////
pythonClass.append("class __(object):\n");
__.class.getMethods().
findAll { GraphTraversal.class.equals(it.returnType) }.
findAll { Modifier.isStatic(it.getModifiers()) }.
collect { SymbolHelper.toPython(it.name) }.
unique().
sort { a, b -> a <=> b }.
forEach { method ->
pythonClass.append(
""" @staticmethod
def ${method}(*args):
return GraphTraversal(None, None, Bytecode()).${method}(*args)
""")
};
pythonClass.append("\n\n")
// add to gremlin.python.statics
__.class.getMethods().
findAll { GraphTraversal.class.equals(it.returnType) }.
findAll { Modifier.isStatic(it.getModifiers()) }.
findAll { !it.name.equals("__") }.
collect { SymbolHelper.toPython(it.name) }.
unique().
sort { a, b -> a <=> b }.
forEach {
pythonClass.append("def ${it}(*args):\n").append(" return __.${it}(*args)\n\n")
pythonClass.append("statics.add_static('${it}', ${it})\n\n")
}
pythonClass.append("\n\n")
// save to a python file
final File file = new File(graphTraversalSourceFile);
file.delete()
pythonClass.eachLine { file.append(it + "\n") }
}
}
When the above Groovy script is evaluated (e.g. in GremlinConsole), Gremlin-Python is born. The generated Python
file is available at graph_traversal.py.
It is important to note that there is a bit more to Gremlin-Python in that there also exists Python implementations of TraversalStrategies
, Traversal
, Bytecode
, etc.
Please review the full implementation of Gremlin-Python here.
Note
|
In practice, TinkerPop uses the Groovy’s GStringTemplateEngine to help with the code generation task described
above and automates that generation as part of the standard build with Maven using the gmavenplus-plugin . See the
gremlin-python pom.xml for more details.
|
Of particular importance is Gremlin-Python’s implementation of Bytecode
.
class Bytecode(object):
def __init__(self, bytecode=None):
self.source_instructions = []
self.step_instructions = []
self.bindings = {}
if bytecode is not None:
self.source_instructions = list(bytecode.source_instructions)
self.step_instructions = list(bytecode.step_instructions)
def add_source(self, source_name, *args):
newArgs = ()
for arg in args:
newArgs = newArgs + (self.__convertArgument(arg),)
self.source_instructions.append((source_name, newArgs))
return
def add_step(self, step_name, *args):
newArgs = ()
for arg in args:
newArgs = newArgs + (self.__convertArgument(arg),)
self.step_instructions.append((step_name, newArgs))
return
def __convertArgument(self,arg):
if isinstance(arg, Traversal):
self.bindings.update(arg.bytecode.bindings)
return arg.bytecode
elif isinstance(arg, tuple) and 2 == len(arg) and isinstance(arg[0], str):
self.bindings[arg[0]] = arg[1]
return Binding(arg[0],arg[1])
else:
return arg
As GraphTraversalSource
and GraphTraversal
are manipulated, the step-by-step instructions are written to Bytecode
.
This bytecode is simply a list of lists. For instance, g.V(1).repeat(out('knows').hasLabel('person')).times(2).name
has
the Bytecode
form:
[
["V", [1]],
["repeat", [[
["out", ["knows"]]
["hasLabel", ["person"]]]]]
["times", [2]]
["values", ["name"]]
]
This nested list representation is ultimately converted by the language variant into GraphSON
for serialization to a RemoteConnection
such as GremlinServer.
$ bin/gremlin-server.sh -i org.apache.tinkerpop gremlin-python 3.2.9
$ bin/gremlin-server.sh conf/gremlin-server-modern-py.yaml
[INFO] GremlinServer -
\,,,/
(o o)
---oOOo-(3)-oOOo---
[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-modern-py.yaml
[INFO] GraphManager - Graph [graph] was successfully configured via [conf/tinkergraph-empty.properties].
[INFO] ScriptEngines - Loaded gremlin-jython ScriptEngine
[INFO] ScriptEngines - Loaded gremlin-groovy ScriptEngine
[INFO] ServerGremlinExecutor - Initialized GremlinExecutor and configured ScriptEngines.
[INFO] ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
[INFO] TraversalOpProcessor - Initialized cache for TraversalOpProcessor with size 1000 and expiration time of 600000 ms
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v1.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0
[INFO] AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0
[INFO] GremlinServer$1 - Channel started at port 8182.
Within the CPython console, it is possible to evaluate the following.
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from gremlin_python import statics
>>> from gremlin_python.structure.graph import Graph
>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
# loading statics enables __.out() to be out() and P.gt() to be gt()
>>> statics.load_statics(globals())
>>> graph = Graph()
>>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
# nested traversal with Python slicing and attribute interception extensions
>>> g.V().hasLabel("person").repeat(both()).times(2).name[0:2].toList()
[u'marko', u'marko']
>>> g = g.withComputer()
>>> g.V().hasLabel("person").repeat(both()).times(2).name[0:2].toList()
[u'peter', u'peter']
# a complex, nested multi-line traversal
>>> g.V().match( \
... as_("a").out("created").as_("b"), \
... as_("b").in_("created").as_("c"), \
... as_("a").out("knows").as_("c")). \
... select("c"). \
... union(in_("knows"),out("created")). \
... name.toList()
[u'ripple', u'marko', u'lop']
>>>
Important
|
Learn more about Apache TinkerPop’s distribution of Gremlin-Python here. |
Gremlin Language Variant Conventions
Every programming language is different and a Gremlin language variant must ride the fine line between leveraging the conventions of the host language and ensuring consistency with Gremlin-Java. A collection of conventions for navigating this dual-language bridge are provided.
-
If camelCase is not an accepted method naming convention in the host language, then the host language’s convention can be used instead. For instance, in a Gremlin-Ruby implementation,
outE("created")
may beout_e("created")
. -
If Gremlin-Java step names conflict with the host language’s reserved words, then a consistent amelioration should be used. For instance, in Python
as
is a reserved word, thus, Gremlin-Python usesas_
. -
If the host language does not use dot-notion for method chaining, then its method chaining convention should be used instead of going the route of operator overloading. For instance, a Gremlin-PHP implementation should do
$g→V()→out()
. -
If a programming language does not support method overloading, then varargs and type introspection should be used. In Gremlin-Python,
*args
does just that.
Conclusion
Gremlin is a simple language because it uses two fundamental programming language constructs: function composition
and function nesting. Because of this foundation, it is relatively easy to implement Gremlin in any modern programming
language. Two ways of doing this for the Python language were presented in this tutorial. One using Jython (on the JVM) and one using Python
(on CPython). It is strongly recommended that language variant designers leverage (especially when not on the JVM)
the reflection-based source code generation technique presented. This method ensures that the language
variant is always in sync with the corresponding Apache TinkerPop Gremlin-Java release version. Moreover, it reduces
the chance of missing methods or creating poorly implemented methods. While Gremlin is simple, there are nearly 200
steps in GraphTraversal
. As such, mechanical means of host language embedding are strongly advised.