3.2.1-SNAPSHOT

Gremlin Language Variants

Gremlin is an embeddable query language able to represent itself within the constructs of a host programming language. Any programming language that supports function composition (e.g. fluent chaining) and function nesting (e.g. call stacks) can support Gremlin. Nearly every modern programming language is capable of meeting both requirements. With Gremlin, the distinction between a programming language and a query language is not as sharp as it has historically been. For instance, with Gremlin-Java, the developer is able to have their application code and their graph database queries at the same level of abstraction, both written in Java. A simple example is presented below where the MyApplication Java class contains both application-level and database-level code written in Java.

Warning
This is an advanced tutorial intended for experts knowledgeable in Gremlin in particular and TinkerPop in general. Moreover, the audience should understand advanced programming language concepts such as reflection, meta-programming, source code generation, and virtual machines.
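import java.util.Iterator;
import java.util.Map;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;
import static org.apache.tinkerpop.gremlin.process.traversal.Order.incr;

public class MyApplication {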

  public static void run(final String[] args) {

    // assumes args[0] is a configuration file location
    Graph graph = GraphFactory.open(args[0]);
    GraphTraversalSource g = graph.traversal();

    // assumes that args[1] and args[2] are range boundaries
    Iterator<Map<String,Double>> result =
      g.V().hasLabel("product").
        order().by("unitPrice", incr).
        range(Integer.valueOf(args[1]), Integer.valueOf(args[2])).
        valueMap("name", "unitPrice");

    while(result.hasNext()) {
      Map<String,Double> map = result.next();
      System.out.println(map.get("name") + " " + map.get("unitPrice"));
    }
  }
}

In query languages like SQL, the user must construct a string representation of their query and submit it to the database for evaluation. This is because SQL cannot be expressed in Java: the two languages use fundamentally different constructs. The same example above is presented below using SQL and the JDBC interface. The take-home point is that Gremlin does not exist outside the programming language in which it will be used. Gremlin was designed to be easily embedded in any modern programming language and is thus always free from the complexities of string manipulation seen in other database and analytics query languages.

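import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class MyApplication {

  public static void run(final String[] args) throws SQLException {

    // assumes args[0] is a URI to the database
    Connection connection = DriverManager.getConnection(args[0]);
    Statement statement = connection.createStatement();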

    // assumes that args[1] and args[2] are range boundaries
    ResultSet result = statement.executeQuery(
      "SELECT Products.ProductName, Products.UnitPrice \n" +
      "  FROM (SELECT ROW_NUMBER() \n" +
      "                   OVER ( \n" +
      "                     ORDER BY UnitPrice) AS [ROW_NUMBER], \n" +
      "                 ProductID \n" +
      "            FROM Products) AS SortedProducts \n" +
      "      INNER JOIN Products \n" +
      "              ON Products.ProductID = SortedProducts.ProductID \n" +
      "   WHERE [ROW_NUMBER] BETWEEN " + args[1] + " AND " + args[2] + " \n" +
      "ORDER BY [ROW_NUMBER]");

    // ResultSet is a cursor: next() advances to the next row and returns false when exhausted
    while(result.next()) {
      System.out.println(result.getString("ProductName") + " " + result.getDouble("UnitPrice"));
    }
  }
}

The purpose of this tutorial is to explain how to develop a Gremlin language variant. Developers who want to support Gremlin in their native language, and for whose language no (good) Gremlin variant currently exists, can develop one for the Apache TinkerPop community (and their language community in general). In this tutorial, Python will serve as the host language and two typical implementation models will be presented.

  1. Using Jython and the JVM: This is perhaps the easiest way to produce a Gremlin language variant. With JSR-223, any language compiler written for the JVM can directly access the JVM and any of its libraries (including Gremlin-Java).

  2. Using Python and GremlinServer: This model requires that there exist a Python class that mimics Gremlin-Java’s GraphTraversal API. With each method call of this Python class, a ScriptEngine string is constructed (e.g. Gremlin-Groovy). Ultimately, that constructed traversal string is submitted to a GremlinServer-compliant graph system for evaluation.

Important
Apache TinkerPop’s Gremlin-Java is considered the idiomatic, standard implementation of Gremlin. Any Gremlin language variant, regardless of the implementation model chosen, must, within the constraints of the host language, be in 1-to-1 correspondence with Gremlin-Java. This ensures that language variants are collectively consistent and easily leveraged by anyone versed in Gremlin.

Language Drivers vs. Language Variants

Before discussing how to implement a Gremlin language variant in Python, it is necessary to understand two concepts related to Gremlin language development. There is a difference between a language driver and a language variant and it is important that these two concepts (and their respective implementations) remain separated.

Language Drivers

A Gremlin language driver is a software library that is able to communicate with a TinkerPop-enabled graph system, whether directly via the JVM or indirectly via GremlinServer. By and large, if a language driver is being developed, it is being developed to interact with GremlinServer or a RemoteConnection. Language drivers are responsible for submitting Gremlin traversals to a TinkerPop-enabled graph system and returning results to the developer that are within the type system of the developer’s language. For instance, resultant doubles should be coerced to floats in Python.

This tutorial is not about language drivers, but about language variants. Moreover, community libraries should make this distinction clear and should not develop libraries that serve both roles. Language drivers will be useful to a collection of Gremlin variants within a language community, as they are able to support GraphTraversal-variants as well as other DSL-variants (e.g. SocialTraversal).

Note
GraphTraversal is a particular Gremlin domain-specific language (DSL), albeit the most popular and foundational DSL. If another DSL is created, then the same techniques discussed in this tutorial for GraphTraversal apply to XXXTraversal.

Language Variants

A Gremlin language variant is a software library that allows a developer to write a Gremlin traversal within their native programming language. The language variant is responsible for creating a Traversal instance that will ultimately be evaluated by a TinkerPop-enabled graph system. The Traversal instance is either created directly on the JVM or as a String for ultimate conversion to a Traversal by a JSR-223 ScriptEngine (typically, via GremlinServer).

Every language variant, regardless of the implementation details, will have to account for the four core concepts below:

  1. Graph (data): The source of the graph data to be traversed and the interface which enables the creation of a GraphTraversalSource (via graph.traversal()).

  2. GraphTraversalSource (compiler): This is the typical g reference. A GraphTraversalSource maintains the withXXX()-strategy methods as well as the "traversal spawn"-methods such as V(), E(), addV(), etc. A traversal source’s registered TraversalStrategies determine how the submitted traversal will be compiled prior to evaluation.

  3. GraphTraversal (function composition): A graph traversal maintains all the computational steps such as out(), groupCount(), match(), etc. This fluent interface supports method chaining and thus, a linear "left-to-right" representation of a traversal/query.

  4. __ (function nesting): The anonymous traversal class is used for passing a traversal as an argument to a parent step. For example, in repeat(__.out()), __.out() is an anonymous traversal passed to the traversal parent repeat(). Anonymous traversals enable the "top-to-bottom" representation of a traversal.

Both GraphTraversal and __ define the structure of the Gremlin language. Gremlin is a two-dimensional language supporting linear, nested step sequences. Historically, many Gremlin language variants have failed to make the distinctions above clear and in doing so, either complicate their implementations or yield variants that are not in 1-to-1 correspondence with Gremlin-Java.

Important
The term "Gremlin-Java" denotes the language that is defined by GraphTraversalSource, GraphTraversal, and __. These three classes exist in org.apache.tinkerpop.gremlin.process.traversal.dsl.graph and form the definitive representation of the Gremlin traversal language.

Gremlin-Jython and Gremlin-Python

Using Jython and the JVM

Jython provides a JSR-223 ScriptEngine implementation that enables the evaluation of Python on the Java virtual machine. In other words, Jython’s virtual machine is not the standard CPython reference implementation distributed with most operating systems, but instead the JVM. The benefit of Jython is that Python code and classes can easily interact with the Java API and any Java packages on the CLASSPATH. In general, any JSR-223 Gremlin language variant is trivial to "implement."

Jython 2.7.0 (default:9987c746f838, Apr 29 2015, 02:25:11)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.8.0_40
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
# this list is longer than displayed, including all jars in lib/, not just Apache TinkerPop jars
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.1-SNAPSHOT-standalone/lib/gremlin-console-3.2.1-SNAPSHOT.jar")
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.1-SNAPSHOT-standalone/lib/gremlin-core-3.2.1-SNAPSHOT.jar")
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.1-SNAPSHOT-standalone/lib/gremlin-driver-3.2.1-SNAPSHOT.jar")
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.1-SNAPSHOT-standalone/lib/gremlin-shaded-3.2.1-SNAPSHOT.jar")
>>> sys.path.append("/usr/local/apache-gremlin-console-3.2.1-SNAPSHOT-standalone/ext/tinkergraph-gremlin/lib/tinkergraph-gremlin-3.2.1-SNAPSHOT.jar")
# import Java classes
>>> from org.apache.tinkerpop.gremlin.tinkergraph.structure import TinkerFactory
>>> from org.apache.tinkerpop.gremlin.process.traversal.dsl.graph import __
>>> from org.apache.tinkerpop.gremlin.process.traversal import *
>>> from org.apache.tinkerpop.gremlin.structure import *
# create the toy "modern" graph and spawn a GraphTraversalSource
>>> graph = TinkerFactory.createModern()
>>> g = graph.traversal()
# The Jython shell does not automatically iterate Iterators like the GremlinConsole
>>> g.V().hasLabel("person").out("knows").out("created")
[GraphStep(vertex,[]), HasStep([~label.eq(person)]), VertexStep(OUT,[knows],vertex), VertexStep(OUT,[created],vertex)]
# toList() will do the iteration and return the results as a list
>>> g.V().hasLabel("person").out("knows").out("created").toList()
[v[5], v[3]]
>>> g.V().repeat(__.out()).times(2).values("name").toList()
[ripple, lop]
# results can be interacted with using Python
>>> g.V().repeat(__.out()).times(2).values("name").toList()[0]
u'ripple'
>>> g.V().repeat(__.out()).times(2).values("name").toList()[0][0:3].upper()
u'RIP'
>>>

Nearly every JSR-223 ScriptEngine language will allow the developer to immediately interact with GraphTraversal. The benefit of this model is that nearly every major programming language has a respective ScriptEngine: JavaScript, Groovy, Scala, Lisp (Clojure), Ruby, etc. A list of implementations is provided here.

Traversal Wrappers

While it is possible to simply interact with Java classes in a ScriptEngine implementation, such Gremlin language variants will not leverage the unique features of the host language. It is for this reason that JVM-based language variants such as Gremlin-Scala were developed. Scala provides many syntax niceties not available in Java. To leverage these niceties, Gremlin-Scala "wraps" GraphTraversal in order to provide Scala-idiomatic extensions. Another example is Apache TinkerPop’s Gremlin-Groovy, which does the same via the Sugar plugin but uses meta-programming instead of explicit object wrapping; "behind the scenes," Groovy meta-programming is doing the object wrapping.

The Jython classes below wrap GraphTraversalSource and GraphTraversal. In doing so, they add methods that apply Python-specific constructs to Gremlin. In particular, the __getitem__ and __getattr__ "magic methods" are leveraged. It is important to note that the classes below are not complete and only provide enough functionality to demonstrate this sub-section’s tutorial material.

# GraphTraversalSource (incomplete)
class JythonGraphTraversalSource(object):
  def __init__(self, traversalSource):
    self.traversalSource = traversalSource
  def V(self,*args):
    return JythonGraphTraversal(self.traversalSource.V(*args))
  def __repr__(self):
    return self.traversalSource.toString()

# GraphTraversal (incomplete)
class JythonGraphTraversal(object):
  def __init__(self, traversal):
    self.traversal = traversal
  def V(self,*args):
    self.traversal = self.traversal.V(args)
    return self
  def values(self, *propertyKeys):
    self.traversal = self.traversal.values(propertyKeys)
    return self
  def toList(self):
    return self.traversal.toList()
  def __repr__(self):
    return self.traversal.toString()
  def __getitem__(self,index):
    if type(index) is int:
      self.traversal = self.traversal.range(index,index+1)
    elif type(index) is slice:
      self.traversal = self.traversal.range(index.start,index.stop)
    else:
      raise TypeError("index must be int or slice")
    return self
  def __getattr__(self,key):
    return self.values(key)

The two methods __getitem__ and __getattr__ support Python slicing and object attribute interception, respectively. In this way, the host language is able to use its native constructs in a meaningful way within a Gremlin traversal.

>>> graph
tinkergraph[vertices:6 edges:6]
>>> g = JythonGraphTraversalSource(graph.traversal())
>>> g
graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
# Python slices are converted to range()-steps
>>> g.V()[1:4]
[GraphStep(vertex,[]), RangeGlobalStep(1,4)]
# Python attribute selections are converted to values()-steps
>>> g.V()[1:4].name
[GraphStep(vertex,[]), RangeGlobalStep(1,4), PropertiesStep([name],value)]
>>> g.V()[1:4].name.toList()
[vadas, lop, josh]
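For readers without a Jython installation, the same two magic methods can be exercised in any Python interpreter. The following is a minimal sketch: SliceableTraversal is a hypothetical class that records step strings instead of wrapping a Java GraphTraversal, so it only illustrates how indexing and attribute access are translated into range() and values() steps.

```python
class SliceableTraversal(object):
    """Hypothetical recorder: stores step strings instead of wrapping a Java traversal."""

    def __init__(self):
        self.steps = ["V()"]

    def range(self, low, high):
        self.steps.append("range(%d, %d)" % (low, high))
        return self

    def values(self, *keys):
        self.steps.append("values(%s)" % ", ".join('"%s"' % k for k in keys))
        return self

    def __getitem__(self, index):
        # t[2] becomes range(2, 3); t[1:4] becomes range(1, 4)
        if isinstance(index, int):
            return self.range(index, index + 1)
        elif isinstance(index, slice):
            return self.range(index.start, index.stop)
        raise TypeError("index must be int or slice")

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails. Underscore-prefixed names
        # are rejected so that Python's own protocol lookups are not turned into steps.
        if key.startswith("_"):
            raise AttributeError(key)
        return self.values(key)

    def __repr__(self):
        return ".".join(self.steps)


print(SliceableTraversal()[1:4].name)  # V().range(1, 4).values("name")
print(SliceableTraversal()[2])         # V().range(2, 3)
```

Two caveats apply to this sketch. Open-ended slices such as [1:] are not handled, because slice.stop is None in that case. And since __getattr__ answers for any unknown name, a misspelled method name silently becomes a values() step rather than raising an error.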
Important
Gremlin-Java serves as the standard/default representation of the Gremlin traversal language. Any Gremlin language variant must provide all the same functionality (methods) as GraphTraversal, but can extend it with host language specific constructs. This means that the extensions must compile to GraphTraversal-specific steps. A Gremlin language variant should not add steps/methods that do not exist in GraphTraversal. If an extension is desired, the language variant designer should submit a proposal to Apache TinkerPop to have the extension added to a future release of Gremlin.

Auto-Generated Traversal Wrappers

In the previous example, only a subset of the GraphTraversalSource and GraphTraversal methods were implemented in the corresponding Jython classes. Unfortunately, adding the nearly 200 GraphTraversal methods to a wrapper class is both tedious and error-prone. To alleviate this pain, Python classes can be dynamically created using Groovy and Java reflection. The Groovy code for constructing the JythonGraphTraversal class source is reviewed below. By simply executing this code in the Gremlin Console, the gremlin-jython.py file is generated and can be execfile()'d by Jython.

Note
Any JVM language can use Java reflection to generate source code. The examples in this tutorial use Groovy because of its terse syntax and convenient multi-line string construct """ """. Moreover, the Gremlin Console is recommended for the Groovy script evaluation because all requisite TinkerPop libraries are pre-loaded and available at startup.
pythonClass = new StringBuilder();
pythonClass.append("from org.apache.tinkerpop.gremlin.process.traversal import *\n")
pythonClass.append("from org.apache.tinkerpop.gremlin.structure import *\n")
pythonClass.append("from org.apache.tinkerpop.gremlin.process.traversal.dsl.graph import __ as anon\n\n")
//////////////////////////
// GraphTraversalSource //
//////////////////////////
methods = GraphTraversalSource.getMethods().collect{it.name} as Set; []
pythonClass.append(
"""class JythonGraphTraversalSource(object):
  def __init__(self, traversalSource):
    self.traversalSource = traversalSource
  def __repr__(self):
    return self.traversalSource.toString()
""")
methods.each{ method ->
  returnType = (GraphTraversalSource.getMethods() as Set).findAll{it.name.equals(method)}.collect{it.returnType}[0]
  if(null != returnType && TraversalSource.isAssignableFrom(returnType)) {
  pythonClass.append(
"""  def ${method}(self, *args):
    self.traversalSource = self.traversalSource.${method}(*args)
    return self
""")
  } else if(null != returnType && Traversal.isAssignableFrom(returnType)) {
  pythonClass.append(
"""  def ${method}(self, *args):
    return JythonGraphTraversal(self.traversalSource.${method}(*args))
""")
  } else {
  pythonClass.append(
"""  def ${method}(self, *args):
    return self.traversalSource.${method}(*args)
""")
  }
}; []
pythonClass.append("\n\n")

////////////////////
// GraphTraversal //
////////////////////
methodMap = [as:"_as",in:"_in",and:"_and",or:"_or",is:"_is",not:"_not",from:"_from"].withDefault{ it }  //(1)
invertedMethodMap = [_as:"as",_in:"in",_and:"and",_or:"or",_is:"is",_not:"not",_from:"from"].withDefault{ it }
pythonClass.append(                                                           //(2)
"""class JythonGraphTraversal(object):
  def __init__(self, traversal):
    self.traversal = traversal
  def __repr__(self):
    return self.traversal.toString()
  def __getitem__(self,index):
    if type(index) is int:
      self.traversal = self.traversal.range(index,index+1)
    elif type(index) is slice:
      self.traversal = self.traversal.range(index.start,index.stop)
    else:
      raise TypeError("index must be int or slice")
    return self
  def __getattr__(self,key):
    return self.values(key)
""")
methods = GraphTraversal.getMethods().collect{methodMap[it.name]} as Set; []  //(3)
methods.each{ method ->
  returnType = (GraphTraversal.getMethods() as Set).findAll{it.name.equals(invertedMethodMap[method])}.collect{it.returnType}[0]
  if(null != returnType && Traversal.isAssignableFrom(returnType)) {          //(4)
  pythonClass.append(
"""  def ${method}(self, *args):
    self.traversal = self.traversal.${invertedMethodMap[method]}(*args)
    return self
""")
  } else {
  pythonClass.append(                                                         //(5)
"""  def ${method}(self, *args):
    return self.traversal.${invertedMethodMap[method]}(*args)
""")
  }
}; []
pythonClass.append("\n\n")

////////////////////////
// AnonymousTraversal //
////////////////////////
methods = __.getMethods().collect{methodMap[it.name]} as Set; []
pythonClass.append("class __(object):\n");
methods.each{ method ->
  pythonClass.append(
"""  @staticmethod
  def ${method}(*args):
    return anon.${invertedMethodMap[method]}(*args)
""")
}; []
pythonClass.append("\n\n")

// save to a python file
file = new File("/tmp/gremlin-jython.py")                                    //(6)
file.delete()
pythonClass.eachLine{ file.append(it + "\n") }
  1. There are numerous GraphTraversal step names that are reserved words in Python. Prefixing these steps with _ is the chosen workaround.

  2. Add Gremlin-Jython specific methods to JythonGraphTraversal. These methods are idiomatic Python extensions, not step additions.

  3. Use Java reflection to get all the methods of GraphTraversal.

  4. If the method is a fluent traversal-method, then mutate the underlying/wrapped GraphTraversal instance accordingly.

  5. If the method is not a fluent traversal-method, return the result of applying the method.

  6. Save the string representation of the Jython source code to gremlin-jython.py.
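The reserved-word workaround from callout 1 can also be derived rather than hardcoded. The sketch below is an assumption about how one might do it on the Python side using the standard keyword module; python_safe_name and gremlin_name are hypothetical helper names and play the roles of methodMap and invertedMethodMap in the Groovy script.

```python
import keyword


def python_safe_name(step_name):
    """Prefix a Gremlin step name with '_' when it collides with a Python reserved word."""
    return "_" + step_name if keyword.iskeyword(step_name) else step_name


def gremlin_name(python_name):
    """Invert python_safe_name: strip the '_' only if what follows is a reserved word."""
    if python_name.startswith("_") and keyword.iskeyword(python_name[1:]):
        return python_name[1:]
    return python_name


steps = ["as", "in", "and", "or", "is", "not", "from", "out", "values"]
for step in steps:
    print("%s -> %s" % (step, python_safe_name(step)))
```

The set of reserved words depends on the interpreter that runs this code (for example, async and await are reserved from Python 3.7 onward), so a derived mapping follows the host interpreter, whereas the fixed map in the Groovy script is stable across interpreters.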

From the Jython console, gremlin-jython.py is loaded and a complete Gremlin language variant is born: Gremlin-Jython. The generated file is available at gremlin-jython.py.

Jython 2.7.0 (default:9987c746f838, Apr 29 2015, 02:25:11)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.8.0_40
Type "help", "copyright", "credits" or "license" for more information.
>>> execfile("/tmp/gremlin-jython.py")
>>> from org.apache.tinkerpop.gremlin.tinkergraph.structure import TinkerFactory
>>> graph = TinkerFactory.createModern()
>>> g = JythonGraphTraversalSource(graph.traversal())
# using the Gremlin-Jython __getattr__ and __getitem__ extensions and anonymous traversals
>>> g.V().repeat(__.both("created")).times(2).name[1:3].path().toList()
[[v[1], v[3], v[4], josh], [v[1], v[3], v[6], peter]]
# JythonGraphTraversalSource works as expected -- an example using Gremlin-Jython w/ OLAP
>>> g = g.withComputer()
>>> g
graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
>>> g.V().repeat(__.both("created")).times(2).name[1:3].path().toList()
[[v[3], v[4], v[5], ripple], [v[1], v[4], v[5], ripple]]
>>>

Gremlin-Jython was simple to create. Unfortunately, this simplicity is not without some problems, which are itemized below. The interested reader can solve them as a training exercise.

  • The Gremlin-Jython API is non-informative as all methods take a tuple reference (*args).

    • The Gremlin-Java JavaDoc would be a sufficient guide to Gremlin-Jython (minus the extensions).

  • Lambdas are not supported: map(lambda x: x.get()) will throw an exception about not being able to coerce the lambda into java.util.function.Function.

    • Python type inspection with a creation of Function lambda wrapper would solve this problem.

  • __ is always required for anonymous traversals and thus, repeat(__.both()) cannot be replaced by repeat(both()).

    • By placing the @staticmethods outside of the __ Jython class, the methods would be globally scoped (analogous to import static in Java).

Note
Another technique that can be leveraged in most dynamic languages is to use meta-programming and intercept all method calls to the variant’s traversal classes. From there, the name of the method that was called, along with its parameters, is used to dynamically construct a method call to the wrapped traversal. In this way, there is no need to create a wrapper method for each method in GraphTraversalSource, GraphTraversal, and __. The drawback of this technique is that not all methods are fluent, and those that are not might need special handling. Moreover, runtime reflection is typically not efficient.
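A minimal sketch of this interception technique in Python is given below. FakeTraversal is a hypothetical stand-in for the wrapped Java GraphTraversal so that the example runs in any Python interpreter; DynamicTraversal is likewise a hypothetical name. The wrapper defines no step methods of its own and forwards every unknown method call through __getattr__.

```python
class FakeTraversal(object):
    """Hypothetical stand-in for a wrapped Java GraphTraversal."""

    def __init__(self):
        self.steps = []

    def out(self, *labels):
        self.steps.append(("out", labels))
        return self

    def values(self, *keys):
        self.steps.append(("values", keys))
        return self

    def toList(self):
        # Non-fluent: returns results rather than a traversal.
        return list(self.steps)


class DynamicTraversal(object):
    """Forwards any unknown method call to the wrapped traversal."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Reached only for names not defined on DynamicTraversal itself.
        if name.startswith("_"):
            raise AttributeError(name)
        target = getattr(self._wrapped, name)  # AttributeError for unknown steps

        def call(*args):
            result = target(*args)
            # Fluent methods return a traversal: keep the wrapper in the chain.
            if isinstance(result, type(self._wrapped)):
                self._wrapped = result
                return self
            # Non-fluent methods: hand the raw result back to the caller.
            return result

        return call


t = DynamicTraversal(FakeTraversal())
print(t.out("knows").values("name").toList())
```

The isinstance check is the "special handling" mentioned in the note: without it, a fluent call would leak the wrapped object and any host-language extensions on the wrapper would be lost for the rest of the chain.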

Using Python and GremlinServer

The JVM is a wonderful piece of technology that has, over the years, become a meeting ground for developers from numerous language communities. However, not all applications will use the JVM. Given that Apache TinkerPop is a Java framework, there must be a way for two different virtual machines to communicate traversals and their results. This section presents the second Gremlin language variant implementation model, which does just that.

Note
Apache TinkerPop is a JVM-based graph computing framework. Most graph databases and processors today are built on the JVM. This makes it easy for these graph system providers to implement Apache TinkerPop. However, TinkerPop is more than its graph API and tools; it is also the Gremlin traversal machine and language. While Apache TinkerPop’s Gremlin traversal machine was written for the JVM, its constructs are simple and can/should be ported to other VMs for those graph systems that are not JVM-based. A theoretical review of the concepts behind the Gremlin traversal machine is provided in this article.

This section’s Gremlin language variant design model does not leverage the JVM directly. Instead, it constructs a String representation of a Traversal that will ultimately be evaluated by a registered ScriptEngine at a GremlinServer or RemoteConnection. It is up to the language variant designer to choose a language driver to use for submitting the generated String and coercing its results. The language driver is the means by which, for this example, the CPython VM communicates with the JVM. The gremlinclient Python language driver is used and its installation via pip is provided below.

# sudo easy_install pip
$ sudo pip install gremlinclient
Important
When language drivers are separated from language variants, language variants can more easily choose a language driver to use. In fact, it is possible for multiple language drivers to be supported by a language variant as GremlinServer, for example, supports various interaction mechanisms such as WebSockets, REST, custom endpoints, etc.
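One way to keep the two roles apart is to have the variant depend only on a small driver interface. The sketch below assumes a single submit(script) method returning a list; that interface, CannedDriver, and ScriptTraversal are all hypothetical names for illustration and are not part of gremlinclient or TinkerPop. The canned driver performs no network I/O, so the example runs without a server.

```python
class CannedDriver(object):
    """Hypothetical driver used only for illustration: no network, canned answers."""

    def __init__(self, answers):
        self.answers = answers      # maps a script string to its result list
        self.submitted = []         # scripts received, in order

    def submit(self, script):
        self.submitted.append(script)
        return list(self.answers.get(script, []))


class ScriptTraversal(object):
    """Hypothetical variant-side class: builds a script and delegates evaluation."""

    def __init__(self, script, driver):
        self.script = script
        self.driver = driver

    def step(self, name, *args):
        rendered = ", ".join('"%s"' % a if isinstance(a, str) else str(a) for a in args)
        return ScriptTraversal("%s.%s(%s)" % (self.script, name, rendered), self.driver)

    def toList(self):
        # The variant never talks to the server itself; any object with submit() will do.
        return self.driver.submit(self.script)


driver = CannedDriver({'g.V().values("name")': ["marko", "vadas"]})
names = ScriptTraversal("g", driver).step("V").step("values", "name").toList()
print(names)  # ['marko', 'vadas']
```

Swapping CannedDriver for a WebSockets-based or REST-based driver would then require no change to the traversal-building code, provided the replacement exposes the same assumed submit method.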

The Groovy source code below uses Java reflection to generate a Python class that is in 1-to-1 correspondence with Gremlin-Java.

pythonClass = new StringBuilder()
pythonClass.append("from tornado import gen\n")
pythonClass.append("from tornado.ioloop import IOLoop\n")
pythonClass.append("from gremlinclient.tornado_client import submit\n")
pythonClass.append("""
class Helper(object):
  @staticmethod
  def stringOrObject(arg):
    if (type(arg) is str and
       not(arg.startswith("P.")) and
       not(arg.startswith("Order.")) and
       not(arg.startswith("T.")) and
       not(arg.startswith("Pop.")) and
       not(arg.startswith("Column."))):
      return "\\"" + arg + "\\""
    elif type(arg) is bool:
      return str(arg).lower()
    else:
      return str(arg)
  @staticmethod
  def stringify(*args):
    if len(args) == 0:
      return ""
    elif len(args) == 1:
      return Helper.stringOrObject(args[0])
    else:
      return ", ".join(Helper.stringOrObject(i) for i in args)
  @staticmethod
  @gen.coroutine
  def submit(gremlinServerURI, traversalString):
    response = yield submit(gremlinServerURI, traversalString)
    while True:
      result = yield response.read()
      if result is None:
        break
      raise gen.Return(result.data)\n

"""); //(1)

//////////////////////////
// GraphTraversalSource //
//////////////////////////
methods = GraphTraversalSource.getMethods().collect{it.name} as Set; []
pythonClass.append(
"""class PythonGraphTraversalSource(object):
  def __init__(self, gremlinServerURI, traversalSourceString):
    self.gremlinServerURI = gremlinServerURI
    self.traversalSourceString = traversalSourceString
  def __repr__(self):
    return "graphtraversalsource[" + self.gremlinServerURI + ", " + self.traversalSourceString + "]"
""")
methods.each{ method ->
  returnType = (GraphTraversalSource.getMethods() as Set).findAll{it.name.equals(method)}.collect{it.returnType}[0]
  if(null != returnType && Traversal.isAssignableFrom(returnType)) {
  pythonClass.append(
"""  def ${method}(self, *args):
    return PythonGraphTraversal(self.traversalSourceString + ".${method}(" + Helper.stringify(*args) + ")", self.gremlinServerURI)
""")
  } else {
  pythonClass.append(
"""  def ${method}(self, *args):
    return PythonGraphTraversalSource(self.gremlinServerURI, self.traversalSourceString + ".${method}(" + Helper.stringify(*args) + ")")
""")
  }
}; []
pythonClass.append("\n\n")

////////////////////
// GraphTraversal //
////////////////////
methodMap = [as:"_as",in:"_in",and:"_and",or:"_or",is:"_is",not:"_not",from:"_from"].withDefault{ it }
invertedMethodMap = [_as:"as",_in:"in",_and:"and",_or:"or",_is:"is",_not:"not",_from:"from"].withDefault{ it }
methods = GraphTraversal.getMethods().collect{methodMap[it.name]} as Set; []
methods.remove("toList")                                                                //(2)
pythonClass.append(
"""class PythonGraphTraversal(object):
  def __init__(self, traversalString, gremlinServerURI=None):
    self.traversalString = traversalString
    self.gremlinServerURI = gremlinServerURI
  def __repr__(self):
    return self.traversalString
  def __getitem__(self,index):
    if type(index) is int:
      return self.range(index,index+1)
    elif type(index) is slice:
      return self.range(index.start,index.stop)
    else:
      raise TypeError("index must be int or slice")
  def __getattr__(self,key):
    return self.values(key)
  def toList(self):
    return IOLoop.current().run_sync(lambda: Helper.submit(self.gremlinServerURI, self.traversalString))
""")
methods.each{ method ->
  returnType = (GraphTraversal.getMethods() as Set).findAll{it.name.equals(invertedMethodMap[method])}.collect{it.returnType}[0]
  if(null != returnType && Traversal.isAssignableFrom(returnType)) {
    pythonClass.append(
"""  def ${method}(self, *args):
    self.traversalString = self.traversalString + ".${invertedMethodMap[method]}(" + Helper.stringify(*args) + ")"
    return self
""")
  } else {
    pythonClass.append(
"""  def ${method}(self, *args):
    self.traversalString = self.traversalString + ".${invertedMethodMap[method]}(" + Helper.stringify(*args) + ")"
    return self.toList()
""")
  }
}; []
pythonClass.append("\n\n")

////////////////////////
// AnonymousTraversal //
////////////////////////
methods = __.getMethods().collect{methodMap[it.name]} as Set; []
pythonClass.append("class __(object):\n");
methods.each{ method ->
  pythonClass.append(
"""  @staticmethod
  def ${method}(*args):
    return PythonGraphTraversal("__").${method}(*args)
""")
}; []
pythonClass.append("\n\n")

// save to a python file
file = new File("/tmp/gremlin-python.py")                                                //(3)
file.delete()
pythonClass.eachLine{ file.append(it + "\n") }
  1. The Helper class contains static methods that are generally useful to the other classes. This could have been a separate file, but was included in the Groovy script so that the tutorial’s code is consolidated.

  2. toList()'s method def is not generated programmatically, but instead is hardcoded and uses the gremlinclient driver to communicate with GremlinServer.

  3. Save the string representation of the Python source code to gremlin-python.py.
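The argument-rendering logic of the Helper class can be tried on its own, without GremlinServer. The following sketch re-expresses stringOrObject and stringify as plain functions (string_or_object and stringify are names chosen here) and follows the same rules as the generated code: plain strings are quoted, strings that look like Gremlin enum or predicate references are passed through verbatim, and booleans are lowercased.

```python
# Prefixes treated as Gremlin tokens rather than string literals (same list as Helper).
ENUM_PREFIXES = ("P.", "Order.", "T.", "Pop.", "Column.")


def string_or_object(arg):
    """Render one Python argument as a Gremlin-Groovy argument."""
    if isinstance(arg, str) and not arg.startswith(ENUM_PREFIXES):
        return '"' + arg + '"'
    elif isinstance(arg, bool):
        return str(arg).lower()
    else:
        return str(arg)


def stringify(*args):
    """Render an argument tuple as a comma-separated Gremlin-Groovy argument list."""
    return ", ".join(string_or_object(arg) for arg in args)


script = "g.V().has(" + stringify("age", "P.gt(30)") + ").values(" + stringify("name") + ")"
print(script)  # g.V().has("age", P.gt(30)).values("name")
```

Two limitations follow directly from these rules and are worth keeping in mind when extending the generated code: a string containing a double quote is not escaped, so it yields an invalid (or unintended) script, and a genuine string value that happens to start with one of the prefixes, such as "P.S.", is sent unquoted.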

When the above Groovy script is evaluated in the Gremlin Console, Gremlin-Python is born. The generated file is available at gremlin-python.py. Now, from any Python virtual machine (not just Jython), Gremlin traversals can be expressed in native Python and a legal Gremlin-Groovy string is created behind the scenes.

Note
The string that is generated for submission to a GremlinServer or RemoteConnection does not have to be a Gremlin-Groovy string. However, it must be a string that has a respective ScriptEngine that is enabled on the remote location. It is recommended that a Gremlin-Groovy string be constructed as Gremlin-Groovy is maintained by Apache TinkerPop and is guaranteed to always be aligned with Gremlin-Java.

Be sure that GremlinServer is running and has a GraphSON endpoint. The following serializers were added to conf/gremlin-server-modern.yaml.

- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { useMapperFromGraph: graph }} # application/vnd.gremlin-v1.0+json
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { useMapperFromGraph: graph }}        # application/json

Then GremlinServer was started.

$ bin/gremlin-server.sh conf/gremlin-server-modern.yaml
[INFO] GremlinServer -
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----

[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-modern.yaml
[INFO] MetricManager - Configured Metrics Slf4jReporter configured with interval=180000ms and loggerName=org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics
[INFO] GraphManager - Graph [graph] was successfully configured via [conf/tinkergraph-empty.properties].
[INFO] ServerGremlinExecutor - Initialized Gremlin thread pool.  Threads in pool named with pattern gremlin-*
[INFO] ScriptEngines - Loaded gremlin-groovy ScriptEngine
[INFO] GremlinExecutor - Initialized gremlin-groovy ScriptEngine with scripts/generate-modern.groovy
[INFO] ServerGremlinExecutor - Initialized GremlinExecutor and configured ScriptEngines.
[INFO] ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
[INFO] OpLoader - Adding the standard OpProcessor.
[INFO] OpLoader - Adding the control OpProcessor.
[INFO] OpLoader - Adding the session OpProcessor.
[INFO] OpLoader - Adding the traversal OpProcessor.
[INFO] GremlinServer - Executing start up LifeCycleHook
[INFO] Logger$info - Loading 'modern' graph data.
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v1.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v1.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v1.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0
[INFO] AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0
[INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
[INFO] GremlinServer$1 - Channel started at port 8182.

Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> execfile("/tmp/gremlin-python.py")
# PythonGraphTraversalSource requires a GremlinServer endpoint and a traversal alias
>>> g = PythonGraphTraversalSource("ws://localhost:8182/", "g")
>>> g
graphtraversalsource[ws://localhost:8182/, g]
# nested traversal with Python slicing and attribute interception extensions
>>> g.V().repeat(__.both("created")).times(2).name[1:3].path()
g.V().repeat(__.both("created")).times(2).values("name").range(1, 3).path()
>>> g.V().hasLabel("person").repeat(__.both()).times(2).name[0:2].toList()
[u'marko', u'josh']
# PythonGraphTraversalSource works as expected -- an example using Gremlin-Python w/ OLAP
>>> g = g.withComputer()
>>> g
graphtraversalsource[ws://localhost:8182/, g.withComputer()]
>>> g.V().hasLabel("person").repeat(__.both()).times(2).name[0:2].toList()
[u'ripple', u'marko']
# a complex, nested multi-line traversal
>>> g.V().match( \
...     __._as("a").out("created")._as("b"), \
...     __._as("b")._in("created")._as("c"), \
...     __._as("a").out("knows")._as("c")). \
...   select("c"). \
...   union(__._in("knows"),__.out("created")). \
...   name.toList()
[u'marko', u'ripple', u'lop']
>>>
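
The slicing and attribute interception used in the session above can be illustrated with a minimal sketch. This is not the generated gremlin-python.py. TraversalSketch and its _step helper are hypothetical names introduced here, and the class only builds a Gremlin-Groovy string without contacting GremlinServer. It assumes that an unknown attribute such as name should become values("name") and that a slice such as [1:3] should become range(1, 3), which matches the strings printed in the session.

```python
class TraversalSketch(object):
    """Hypothetical stand-in for PythonGraphTraversal that only builds a string."""

    def __init__(self, traversal_string):
        self.traversal_string = traversal_string

    def _step(self, name, *args):
        # Render arguments as Groovy literals: strings are quoted, numbers are not.
        rendered = ", ".join(
            '"%s"' % a if isinstance(a, str) else str(a) for a in args)
        return TraversalSketch(
            "%s.%s(%s)" % (self.traversal_string, name, rendered))

    def V(self, *args):
        return self._step("V", *args)

    def out(self, *args):
        return self._step("out", *args)

    def __getattr__(self, key):
        # Only invoked when normal lookup fails, so V() and out() are unaffected.
        if key.startswith("__"):
            raise AttributeError(key)
        return self._step("values", key)

    def __getitem__(self, index):
        # Assumption: a slice maps to range(start, stop), a single index to range(i, i + 1).
        if isinstance(index, slice):
            if index.stop is None:
                raise ValueError("this sketch requires an explicit upper bound")
            start = 0 if index.start is None else index.start
            return self._step("range", start, index.stop)
        return self._step("range", index, index + 1)

    def __repr__(self):
        return self.traversal_string


g = TraversalSketch("g")
print(g.V().out("created").name[1:3])
```

Because __getattr__ is consulted only after ordinary attribute lookup fails, explicitly defined steps always take precedence over property access.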

Finally, for the sake of brevity, the Gremlin-Python implementation presented here was kept simple and, as such, has a few peculiarities that the interested reader may want to remedy as an exercise.

  • P, T, Order, etc. are handled via string analysis and are used as has("age","P.gt(36)"). It would be better to create P, T, etc. Python classes that yield the appropriate string representation.

  • Results are retrieved using toList(). This simple implementation does not account for GremlinServer’s result batching and is thus not optimal for large result sets.

  • While terminal methods such as next(), hasNext(), toSet(), etc. work, they simply rely on toList() in an awkward way.
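
As a starting point for the first exercise, the following is a minimal sketch of a Python P class whose instances yield the appropriate Gremlin-Groovy string. It is not part of the generated gremlin-python.py. The to_groovy and has helpers are hypothetical names introduced for illustration, only two predicates (gt and within) are shown, and only string and numeric arguments are handled.

```python
class P(object):
    """Hypothetical Python counterpart of Gremlin-Java's P predicates."""

    def __init__(self, operator, *values):
        self.operator = operator
        self.values = values

    @staticmethod
    def gt(value):
        return P("gt", value)

    @staticmethod
    def within(*values):
        return P("within", *values)

    def __repr__(self):
        # The representation is the Gremlin-Groovy text sent to the server.
        rendered = ", ".join(to_groovy(v) for v in self.values)
        return "P.%s(%s)" % (self.operator, rendered)


def to_groovy(value):
    """Render a Python value as Groovy source (strings, numbers, and P only)."""
    if isinstance(value, P):
        return repr(value)
    if isinstance(value, str):
        return '"%s"' % value
    return str(value)


def has(key, predicate):
    """Build the has() step string from a key and a value or P instance."""
    return "has(%s, %s)" % (to_groovy(key), to_groovy(predicate))


print(has("age", P.gt(36)))
```

With such a class, the predicate is written as has("age", P.gt(36)) rather than as the string "P.gt(36)", so mistakes surface in Python instead of in the remote ScriptEngine.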

Gremlin Language Variant Conventions

Every programming language is different, and a Gremlin language variant must walk the fine line between leveraging the conventions of the host language and ensuring consistency with Gremlin-Java. A collection of conventions for navigating this dual-language bridge is provided below.

  • If camelCase is not an accepted method naming convention in the host language, then the host language’s convention should be used instead. For instance, in a Gremlin-Ruby implementation, outE("created") should be out_e("created").

  • If Gremlin-Java step names conflict with the host language’s reserved words, then a consistent amelioration should be used. For instance, as is a reserved word in Python, so Gremlin-Python uses _as.

  • If the host language does not use dot-notation for method chaining, then its method chaining convention should be used instead of going the route of operator overloading. For instance, a Gremlin-PHP implementation should do $g->V()->out().

  • If a programming language does not support method overloading, then varargs and type introspection should be used. In Gremlin-Python, *args does just this and that is why there are not 200 methods off of PythonGraphTraversal.
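
Three of these conventions can be sketched in a few lines of Python. The function names below (to_snake_case, python_step_name, has) are hypothetical and are not taken from the generated gremlin-python.py. The snake_case rule is shown only to illustrate what a host language such as Ruby might apply, since Gremlin-Python as presented in this tutorial keeps camelCase. The has function assumes that a single varargs method can stand in for the overloaded has() methods of Gremlin-Java by inspecting argument types.

```python
import keyword
import re


def to_snake_case(step_name):
    """Convert a camelCase step name, e.g. outE -> out_e (V stays V)."""
    return re.sub(r"(?<=[a-z])([A-Z])",
                  lambda m: "_" + m.group(1).lower(), step_name)


def python_step_name(step_name):
    """Prefix an underscore when the step name is a Python reserved word."""
    return "_" + step_name if keyword.iskeyword(step_name) else step_name


def has(*args):
    """One varargs function in place of several overloaded has() methods."""
    if not 1 <= len(args) <= 3:
        raise TypeError("has() takes between 1 and 3 arguments")
    # Type introspection: strings are quoted, everything else is passed through.
    rendered = ", ".join(
        '"%s"' % a if isinstance(a, str) else str(a) for a in args)
    return "has(%s)" % rendered


print(to_snake_case("outE"))
print(python_step_name("as"))
print(has("person", "name", "marko"))
```

Using the keyword module rather than a hand-written list keeps the reserved-word rule consistent: every conflicting step name receives the same underscore prefix.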

As stated in Language Drivers vs. Language Variants, drivers and variants should be separate libraries. A proposed naming convention for each library type is gremlin-<language>-driver and gremlin-<language>. Unfortunately, numerous drivers and language variants already exist for Gremlin that don’t use this convention. However, moving forward, it might be good to be explicit in the naming so it’s obvious to users what is what.

Finally, note that Gremlin-Jython and Gremlin-Python (as presented in this tutorial) were only manually tested. This means that there are most likely errors in the translation, and thus some traversals may break. A future addition to this tutorial will explain how to leverage TinkerPop’s ProcessStandardSuite and ProcessComputerSuite to test not only JVM-based language variants, but also non-JVM variants. In doing so, every Gremlin language variant’s syntax and semantics will be validated and deemed an accurate representation of Gremlin-Java within another host language.

Conclusion

Gremlin is a simple language because it uses two fundamental programming language constructs: function composition and function nesting. Because of this foundation, it is relatively easy to implement Gremlin in any modern programming language. Two ways of doing this for the Python language were presented in this tutorial: one using Jython (on the JVM) and one using Python (on CPython). It is strongly recommended that language variant designers, especially those not on the JVM, leverage the reflection-based source code generation technique presented. This method ensures that the language variant is always in sync with the corresponding Apache TinkerPop Gremlin-Java release version. Moreover, it reduces the chance of missing methods or creating poorly implemented methods. While Gremlin is simple, there are nearly 200 steps in GraphTraversal. As such, computational means of host language embedding are strongly advised.