apache tinkerpop logo


Gremlin Language Variants

Gremlin is an embeddable query language that can be represented using the constructs of a host programming language. Any programming language that supports function composition (e.g. fluent chaining) and function nesting (e.g. call stacks) can support Gremlin. Nearly every modern programming language is capable of meeting both requirements. With Gremlin, the distinction between a programming language and a query language is not as large as they have historically been. For instance, with Gremlin-Java, the developer is able to have their application code and their graph database queries at the same level of abstraction — both written in Java. A simple example is presented below where the MyApplication Java class contains both application-level and database-level code written in Java.

gremlin house of mirrors
This is an advanced tutorial intended for experts knowledgeable in Gremlin in particular and TinkerPop in general. Moreover, the audience should understand advanced programming language concepts such as reflection, meta-programming, source code generation, and virtual machines.
public class MyApplication {

  public static void run(final String[] args) {

    // assumes args[0] is a configuration file location
    Graph graph = GraphFactory.open(args[0]);
    GraphTraversalSource g = graph.traversal();

    // assumes that args[1] and args[2] are range boundaries
    Iterator<Map<String,Double>> result =
        order().by("unitPrice", incr).
        range(Integer.valueOf(args[1]), Integer.valueOf(args[2])).
        valueMap("name", "unitPrice")

    while(result.hasNext()) {
      Map<String,Double> map = result.next();
      System.out.println(map.get("name") + " " + map.get("unitPrice"));

In query languages like SQL, the user must construct a string representation of their query and submit it to the database for evaluation. This is because SQL cannot be expressed in Java as they use fundamentally different constructs in their expression. The same example above is presented below using SQL and the JDBC interface. The take home point is that Gremlin does not exist outside the programming language in which it will be used. Gremlin was designed to be able to be embedded in any modern programming language and thus, always free from the complexities of string manipulation as seen in other database and analytics query languages.

public class MyApplication {

  public static void run(final String[] args) {

    // assumes args[0] is a URI to the database
    Connection connection = DriverManager.getConnection(args[0])
    Statement statement = connection.createStatement();

    // assumes that args[1] and args[2] are range boundaries
    ResultSet result = statement.executeQuery(
      "SELECT Products.ProductName, Products.UnitPrice \n" +
      "  FROM (SELECT ROW_NUMBER() \n" +
      "                   OVER ( \n" +
      "                     ORDER BY UnitPrice) AS [ROW_NUMBER], \n" +
      "                 ProductID \n" +
      "            FROM Products) AS SortedProducts \n" +
      "      INNER JOIN Products \n" +
      "              ON Products.ProductID = SortedProducts.ProductID \n" +
      "   WHERE [ROW_NUMBER] BETWEEN " + args[1] + " AND " + args[2] + " \n" +

    while(result.hasNext()) {
      System.out.println(result.getString("Products.ProductName") + " " + result.getDouble("Products.UnitPrice"));

The purpose of this tutorial is to explain how to develop a Gremlin language variant. That is, for those developers that are interested in supporting Gremlin in their native language and there currently does not exist a (good) Gremlin variant in their language, they can develop one for the Apache TinkerPop community (and their language community in general). In this tutorial, Python will serve as the host language and two typical implementation models will be presented.

  1. Using Jython and the JVM: This is perhaps the easiest way to produce a Gremlin language variant. With JSR-223, any language compiler written for the JVM can directly access the JVM and any of its libraries (including Gremlin-Java).

  2. Using Python and GremlinServer: This model requires that there exist a Python class that mimics Gremlin-Java’s GraphTraversal API. With each method call of this Python class, Gremlin Bytecode is generated which is ultimately translated into a Gremlin variant that can execute the traversal (e.g. Gremlin-Java).

Apache TinkerPop’s Gremlin-Java is considered the idiomatic, standard implementation of Gremlin. Any Gremlin language variant, regardless of the implementation model chosen, must, within the constraints of the host language, be in 1-to-1 correspondence with Gremlin-Java. This ensures that language variants are collectively consistent and easily leveraged by anyone versed in Gremlin.
The "Gremlin-Python" presented in this tutorial is basic and provided to show the primary techniques used to construct a Gremlin language variant. Apache TinkerPop distributes with a full fledged Gremlin-Python variant that uses many of the techniques presented in this tutorial.

Language Drivers vs. Language Variants

Before discussing how to implement a Gremlin language variant in Python, it is necessary to understand two concepts related to Gremlin language development. There is a difference between a language driver and a language variant and it is important that these two concepts (and their respective implementations) remain separate.

Language Drivers

language-drivers A Gremlin language driver is a software library that is able to communicate with a TinkerPop-enabled graph system whether directly via the JVM or indirectly via Gremlin Server GremlinServer or some other RemoteConnection enabled graph system. Language drivers are responsible for submitting Gremlin traversals to a TinkerPop-enabled graph system and returning results to the developer that are within the developer’s language’s type system. For instance, resultant doubles should be coerced to floats in Python.

This tutorial is not about language drivers, but about language variants. Moreover, community libraries should make this distinction clear and should not develop libraries that serve both roles. Language drivers will be useful to a collection of Gremlin variants within a language community — able to support GraphTraversal-variants as well as also other DSL-variants (e.g. SocialTraversal).

GraphTraversal is a particular Gremlin domain-specific language (DSL), albeit the most popular and foundational DSL. If another DSL is created, then the same techniques discussed in this tutorial for GraphTraversal apply to XXXTraversal.

Language Variants

language-variants A Gremlin language variant is a software library that allows a developer to write a Gremlin traversal within their native programming language. The language variant is responsible for creating Gremlin Bytecode that will ultimately be translated and compiled to a Traversal by a TinkerPop-enabled graph system.

Every language variant, regardless of the implementation details, will have to account for the four core concepts below:

  1. Graph (data): The source of the graph data to be traversed and the interface which enables the creation of a GraphTraversalSource (via graph.traversal()).

  2. GraphTraversalSource (compiler): This is the typical g reference. A GraphTraversalSource maintains the withXXX()-strategy methods as well as the "traversal spawn"-methods such as V(), E(), addV(), etc. A traversal source’s registered TraversalStrategies determine how the submitted traversal will be ultimately evaluated.

  3. GraphTraversal (function composition): A graph traversal maintains the computational steps such as out(), groupCount(), match(), etc. This fluent interface supports method chaining and thus, a linear "left-to-right" representation of a traversal/query.

  4. __ (function nesting) : The anonymous traversal class is used for passing a traversal as an argument to a parent step. For example, in repeat(__.out()), __.out() is an anonymous traversal passed to the traversal parent repeat(). Anonymous traversals enable the "top-to-bottom" representation of a traversal.

  5. Bytecode (language agnostic encoding): The source and traversal steps and their arguments are encoded in a language agnostic representation called Gremlin bytecode. This representation is a nested list of the form [step,[args*]]*.

Both GraphTraversal and __ define the structure of the Gremlin language. Gremlin is a two-dimensional language supporting linear, nested step sequences. Historically, many Gremlin language variants have failed to make the distinctions above clear and in doing so, either complicate their implementations or yield variants that are not in 1-to-1 correspondence with Gremlin-Java. By keeping these concepts clear when designing a language variant, the construction of the Gremlin bytecode representation is easy.

The term "Gremlin-Java" denotes the language that is defined by GraphTraversalSource, GraphTraversal, and __. These three classes exist in org.apache.tinkerpop.gremlin.process.traversal.dsl.graph and form the definitive representation of the Gremlin traversal language.

Gremlin-Jython and Gremlin-Python

Using Jython and the JVM

jython-logo Jython provides a JSR-223 ScriptEngine implementation that enables the evaluation of Python on the Java virtual machine. In other words, Jython’s virtual machine is not the standard CPython reference implementation distributed with most operating systems, but instead the JVM. The benefit of Jython is that Python code and classes can easily interact with the Java API and any Java packages on the CLASSPATH. In general, any JSR-223 Gremlin language variant is trivial to "implement."

Jython 2.7.0 (default:9987c746f838, Apr 29 2015, 02:25:11)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.8.0_40
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
# this list is longer than displayed, including all jars in lib/, not just Apache TinkerPop jars
# there is probably a more convenient way of importing jars in Jython though, at the time of writing, no better solution was found.
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.4-standalone/lib/gremlin-console-3.2.4.jar")
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.4-standalone/lib/gremlin-core-3.2.4.jar")
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.4-standalone/lib/gremlin-driver-3.2.4.jar")
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.4-standalone/lib/gremlin-shaded-3.2.4.jar")
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.4-standalone/ext/tinkergraph-gremlin/lib/tinkergraph-gremlin-3.2.4.jar")
# import Java classes
>>> from org.apache.tinkerpop.gremlin.tinkergraph.structure import TinkerFactory
>>> from org.apache.tinkerpop.gremlin.process.traversal.dsl.graph import __
>>> from org.apache.tinkerpop.gremlin.process.traversal import *
>>> from org.apache.tinkerpop.gremlin.structure import *
# create the toy "modern" graph and spawn a GraphTraversalSource
>>> graph = TinkerFactory.createModern()
>>> g = graph.traversal()
# The Jython shell does not automatically iterate Iterators like the GremlinConsole
>>> g.V().hasLabel("person").out("knows").out("created")
[GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,[knows],vertex), VertexStep(OUT,[created],vertex)]
# toList() will do the iteration and return the results as a list
>>> g.V().hasLabel("person").out("knows").out("created").toList()
[v[5], v[3]]
>>> g.V().repeat(__.out()).times(2).values("name").toList()
[ripple, lop]
# results can be interacted with using Python
>>> g.V().repeat(__.out()).times(2).values("name").toList()[0]
>>> g.V().repeat(__.out()).times(2).values("name").toList()[0][0:3].upper()

Most every JSR-223 ScriptEngine language will allow the developer to immediately interact with GraphTraversal. The benefit of this model is that nearly every major programming language has a respective ScriptEngine: JavaScript, Groovy, Scala, Lisp (Clojure), Ruby, etc. A list of implementations is provided here.

Traversal Wrappers

While it is possible to simply interact with Java classes in a ScriptEngine implementation, such Gremlin language variants will not leverage the unique features of the host language. It is for this reason that JVM-based language variants such as Gremlin-Scala were developed. Scala provides many syntax niceties not available in Java. To leverage these niceties, Gremlin-Scala "wraps" GraphTraversal in order to provide Scala-idiomatic extensions. Another example is Apache TinkerPop’s Gremlin-Groovy which does the same via the Sugar plugin, but uses meta-programming instead of object wrapping, where "behind the scenes," Groovy meta-programming is doing object wrapping.

The Jython example below uses Python meta-programming to add functionality to GraphTraversal. In particular, the __getitem__ and __getattr__ "magic methods" are leveraged.

def getitem_bypass(self, index):
  if isinstance(index,int):
    return self.range(index,index+1)
  elif isinstance(index,slice):
    return self.range(index.start,index.stop)
    return TypeError('Index must be int or slice')");
GraphTraversal.__getitem__ = getitem_bypass
GraphTraversal.__getattr__ = lambda self, key: self.values(key)

The two methods __getitem__ and __getattr__ support Python slicing and object attribute interception, respectively. In this way, the host language is able to use its native constructs in a meaningful way within a Gremlin traversal.

Gremlin-Java serves as the standard/default representation of the Gremlin traversal language. Any Gremlin language variant must provide all the same functionality (methods) as GraphTraversal, but can extend it with host language specific constructs. This means that the extensions must compile to GraphTraversal-specific steps. A Gremlin language variant should not add steps/methods that do not exist in GraphTraversal. If an extension is desired, the language variant designer should submit a proposal to Apache TinkerPop to have the extension added to a future release of Gremlin.

Using Python and RemoteConnection

python-logo The JVM is a powerful piece of technology that has, over the years, become a meeting ground for developers from numerous language communities. However, not all applications will use the JVM. Given that Apache TinkerPop is a Java-framework, there must be a way for two different virtual machines to communicate traversals and their results. This section presents the second Gremlin language variant implementation model which does just that.

Apache TinkerPop is a JVM-based graph computing framework. Most graph databases and processors today are built on the JVM. This makes it easy for these graph system providers to implement Apache TinkerPop. However, TinkerPop is more than its graph API and tools — it is also the Gremlin traversal machine and language. While Apache’s Gremlin traversal machine was written for the JVM, its constructs are simple and can/should be ported to other VMs for those graph systems that are not JVM-based. A theoretical review of the concepts behind the Gremlin traversal machine is provided in this article.

This section’s Gremlin language variant design model does not leverage the JVM directly. Instead, it constructs a Bytecode representation of a Traversal that will ultimately be evaluated by RemoteConnection (e.g. GremlinServer). It is up to the language variant designer to choose a language driver to use for submitting the generated bytecode and coercing its results. The language driver is the means by which, for this example, the CPython VM communicates with the JVM.

# sudo easy_install pip
$ pip install gremlinpython

The Groovy source code below uses Java reflection to generate a Python class that is in 1-to-1 correspondence with Gremlin-Java.

class GraphTraversalSourceGenerator {

    public static void create(final String graphTraversalSourceFile) {

        final StringBuilder pythonClass = new StringBuilder()

        pythonClass.append("from .traversal import Traversal\n")
        pythonClass.append("from .traversal import TraversalStrategies\n")
        pythonClass.append("from .traversal import Bytecode\n")
        pythonClass.append("from ..driver.remote_connection import RemoteStrategy\n")
        pythonClass.append("from .. import statics\n\n")

// GraphTraversalSource //
                """class GraphTraversalSource(object):
  def __init__(self, graph, traversal_strategies, bytecode=None):
    self.graph = graph
    self.traversal_strategies = traversal_strategies
    if bytecode is None:
      bytecode = Bytecode()
    self.bytecode = bytecode
  def __repr__(self):
    return "graphtraversalsource[" + str(self.graph) + "]"
        GraphTraversalSource.getMethods(). // SOURCE STEPS
                findAll { GraphTraversalSource.class.equals(it.returnType) }.
                findAll {
                    !it.name.equals("clone") &&
                collect { SymbolHelper.toPython(it.name) }.
                sort { a, b -> a <=> b }.
                forEach { method ->
                            """  def ${method}(self, *args):
    source = GraphTraversalSource(self.graph, TraversalStrategies(self.traversal_strategies), Bytecode(self.bytecode))
    source.bytecode.add_source("${SymbolHelper.toJava(method)}", *args)
    return source
                """  def withRemote(self, remote_connection):
    source = GraphTraversalSource(self.graph, TraversalStrategies(self.traversal_strategies), Bytecode(self.bytecode))
    return source
        GraphTraversalSource.getMethods(). // SPAWN STEPS
                findAll { GraphTraversal.class.equals(it.returnType) }.
                collect { SymbolHelper.toPython(it.name) }.
                sort { a, b -> a <=> b }.
                forEach { method ->
                            """  def ${method}(self, *args):
    traversal = GraphTraversal(self.graph, self.traversal_strategies, Bytecode(self.bytecode))
    traversal.bytecode.add_step("${SymbolHelper.toJava(method)}", *args)
    return traversal

// GraphTraversal //
                """class GraphTraversal(Traversal):
  def __init__(self, graph, traversal_strategies, bytecode):
    Traversal.__init__(self, graph, traversal_strategies, bytecode)
  def __getitem__(self, index):
    if isinstance(index, int):
        return self.range(index, index + 1)
    elif isinstance(index, slice):
        return self.range(index.start, index.stop)
        raise TypeError("Index must be int or slice")
  def __getattr__(self, key):
    return self.values(key)
                findAll { GraphTraversal.class.equals(it.returnType) }.
                findAll { !it.name.equals("clone") }.
                collect { SymbolHelper.toPython(it.name) }.
                sort { a, b -> a <=> b }.
                forEach { method ->
                            """  def ${method}(self, *args):
    self.bytecode.add_step("${SymbolHelper.toJava(method)}", *args)
    return self

// AnonymousTraversal //
        pythonClass.append("class __(object):\n");
                findAll { GraphTraversal.class.equals(it.returnType) }.
                findAll { Modifier.isStatic(it.getModifiers()) }.
                collect { SymbolHelper.toPython(it.name) }.
                sort { a, b -> a <=> b }.
                forEach { method ->
                            """  @staticmethod
  def ${method}(*args):
    return GraphTraversal(None, None, Bytecode()).${method}(*args)
        // add to gremlin.python.statics
                findAll { GraphTraversal.class.equals(it.returnType) }.
                findAll { Modifier.isStatic(it.getModifiers()) }.
                findAll { !it.name.equals("__") }.
                collect { SymbolHelper.toPython(it.name) }.
                sort { a, b -> a <=> b }.
                forEach {
                    pythonClass.append("def ${it}(*args):\n").append("      return __.${it}(*args)\n\n")
                    pythonClass.append("statics.add_static('${it}', ${it})\n\n")

// save to a python file
        final File file = new File(graphTraversalSourceFile);
        pythonClass.eachLine { file.append(it + "\n") }

When the above Groovy script is evaluated (e.g. in GremlinConsole), Gremlin-Python is born. The generated Python file is available at graph_traversal.py. It is important to note that here is a bit more to Gremlin-Python in that there also exists Python implementations of TraversalStrategies, Traversal, Bytecode, etc. Please review the full implementation of Gremlin-Python here.

Of particular importance is Gremlin-Python’s implementation of Bytecode.

class Bytecode(object):
  def __init__(self, bytecode=None):
    self.source_instructions = []
    self.step_instructions = []
    self.bindings = {}
    if bytecode is not None:
      self.source_instructions = list(bytecode.source_instructions)
      self.step_instructions = list(bytecode.step_instructions)

  def add_source(self, source_name, *args):
    newArgs = ()
    for arg in args:
      newArgs = newArgs + (self.__convertArgument(arg),)
    self.source_instructions.append((source_name, newArgs))

  def add_step(self, step_name, *args):
    newArgs = ()
    for arg in args:
      newArgs = newArgs + (self.__convertArgument(arg),)
    self.step_instructions.append((step_name, newArgs))

  def __convertArgument(self,arg):
    if isinstance(arg, Traversal):
      return arg.bytecode
    elif isinstance(arg, tuple) and 2 == len(arg) and isinstance(arg[0], str):
      self.bindings[arg[0]] = arg[1]
      return Binding(arg[0],arg[1])
      return arg

As GraphTraversalSource and GraphTraversal are manipulated, the step-by-step instructions are written to Bytecode. This bytecode is simply a list of lists. For instance, g.V(1).repeat(out('knows').hasLabel('person')).times(2).name has the Bytecode form:

 ["V", [1]],
 ["repeat", [[
   ["out", ["knows"]]
   ["hasLabel", ["person"]]]]]
 ["times", [2]]
 ["values", ["name"]]

This nested list representation is ultimately converted by the language variant into GraphSON for serialization to a RemoteConnection such as GremlinServer.

$ bin/gremlin-server.sh -i org.apache.tinkerpop gremlin-python 3.2.4
$ bin/gremlin-server.sh conf/gremlin-server-modern-py.yaml
[INFO] GremlinServer -
       (o o)

[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-modern-py.yaml
[INFO] GraphManager - Graph [graph] was successfully configured via [conf/tinkergraph-empty.properties].
[INFO] ScriptEngines - Loaded gremlin-jython ScriptEngine
[INFO] ScriptEngines - Loaded gremlin-groovy ScriptEngine
[INFO] ServerGremlinExecutor - Initialized GremlinExecutor and configured ScriptEngines.
[INFO] ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
[INFO] TraversalOpProcessor - Initialized cache for TraversalOpProcessor with size 1000 and expiration time of 600000 ms
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v1.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0
[INFO] AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0
[INFO] GremlinServer$1 - Channel started at port 8182.

Within the CPython console, it is possible to evaluate the following.

Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from gremlin_python import statics
>>> from gremlin_python.structure.graph import Graph
>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
# loading statics enables __.out() to be out() and P.gt() to be gt()
>>> statics.load_statics(globals())
>>> graph = Graph()
>>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
# nested traversal with Python slicing and attribute interception extensions
>>> g.V().hasLabel("person").repeat(both()).times(2).name[0:2].toList()
[u'marko', u'marko']
>>> g = g.withComputer()
>>> g.V().hasLabel("person").repeat(both()).times(2).name[0:2].toList()
[u'peter', u'peter']
# a complex, nested multi-line traversal
>>> g.V().match( \
...     as_("a").out("created").as_("b"), \
...     as_("b").in_("created").as_("c"), \
...     as_("a").out("knows").as_("c")). \
...   select("c"). \
...   union(in_("knows"),out("created")). \
...   name.toList()
[u'ripple', u'marko', u'lop']
Learn more about Apache TinkerPop’s distribution of Gremlin-Python here.

Gremlin Language Variant Conventions

Every programming language is different and a Gremlin language variant must ride the fine line between leveraging the conventions of the host language and ensuring consistency with Gremlin-Java. A collection of conventions for navigating this dual-language bridge are provided.

  • If camelCase is not an accepted method naming convention in the host language, then the host language’s convention can be used instead. For instance, in a Gremlin-Ruby implementation, outE("created") may be out_e("created").

  • If Gremlin-Java step names conflict with the host language’s reserved words, then a consistent amelioration should be used. For instance, in Python as is a reserved word, thus, Gremlin-Python uses as_.

  • If the host language does not use dot-notion for method chaining, then its method chaining convention should be used instead of going the route of operator overloading. For instance, a Gremlin-PHP implementation should do $g->V()->out().

  • If a programming language does not support method overloading, then varargs and type introspection should be used. In Gremlin-Python, *args does just that.


Gremlin is a simple language because it uses two fundamental programming language constructs: function composition and function nesting. Because of this foundation, it is relatively easy to implement Gremlin in any modern programming language. Two ways of doing this for the Python language were presented in this tutorial. One using Jython (on the JVM) and one using Python (on CPython). It is strongly recommended that language variant designers leverage (especially when not on the JVM) the reflection-based source code generation technique presented. This method ensures that the language variant is always in sync with the corresponding Apache TinkerPop Gremlin-Java release version. Moreover, it reduces the chance of missing methods or creating poorly implemented methods. While Gremlin is simple, there are nearly 200 steps in GraphTraversal. As such, mechanical means of host language embedding are strongly advised.