The Gremlin Console

3.7.2

In "The First Five Minutes" of the Apache TinkerPop™ tutorial on how to get started with TinkerPop and graphs, the importance of the Gremlin Console was introduced. This tutorial further explores the usage of the console in the daily work of Gremlin developers, delving more deeply into the details of its operations and expanding upon the basic usage guide in the reference documentation.

Important

This tutorial assumes that the Gremlin Console is installed and that you have some familiarity with Gremlin in general. Please be sure to read the Getting Started tutorial prior to proceeding further with this one.

The Gremlin Console serves a variety of use cases that can meet the needs of different types of Gremlin users. This tutorial explores the features of the Gremlin Console through a number of these different use cases to hopefully inspire you to new levels of usage. While a use case may not fit your needs, you may well find it worthwhile to read, as it is possible that a "feature" will be discussed that may be useful to you.

The following points summarize the key features discussed in each use case:

- A Learning Tool
  - Introducing the toy graphs
  - Finding help for commands
- Application Development
  - Static importing of common methods
  - Result iteration
- Ad-hoc Analysis
  - Importing new classes
  - Installing new dependencies
  - Deciding when to use the def keyword

Use Case: A Learning Tool

You are a new user of Apache TinkerPop and perhaps new to graphs as well. You’re trying to get familiar with how Gremlin works and how it might fit into your project. You want some "quick wins" with Gremlin and aim to conceptually prove that the TinkerPop stack is a good direction to go.

It cannot be emphasized enough just how important the Gremlin Console is to new users. The interactive nature of a REPL makes it possible to quickly try some Gremlin code and get some notion of success or failure without the longer process of build tools (e.g. Maven), IDEs, compilation, and application execution. The faster that you can iterate through versions of your Gremlin code, the faster you can advance your knowledge.

As a new user, your best way to learn is to try Gremlin with a graph already packaged with the console: TinkerGraph. TinkerGraph is an in-memory graph database that is easy to use and does not have a lot of configuration options to be concerned with. You can create an empty TinkerGraph as follows:

gremlin> graph = TinkerGraph.open() // (1)
==>tinkergraph[vertices:0 edges:0]
gremlin> g = traversal().withEmbedded(graph) // (2)
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]

(1) Creates the Graph instance that is the API to the structure of the graph.
(2) Creates the TraversalSource which is the API for processing or traversing that Graph.

Important

TinkerPop recommends creating the TraversalSource once and re-using it as necessary in your application.

Note

The `traversal()` method is statically imported from the `AnonymousTraversalSource` class so that it can be used in a more fluent fashion. Common imports exist for all languages that support Gremlin (Java, Python, .NET, JavaScript) to make Gremlin easier to read and to write.

Now that you have an empty TinkerGraph instance, you could load a sample of your data and get started with some traversals. Of course, you might also try one of the "toy" graphs (i.e. graphs with sample data) that TinkerPop packages with the console through the TinkerFactory. TinkerFactory has a number of static methods that can be called to create these standard TinkerGraph instances. They are "standard" in the sense that they are typically used for all TinkerPop examples and test cases.

- createClassic() - The original TinkerPop 2.x toy graph.
- createModern() - The TinkerPop 3.x representation of the "classic" graph, where the main difference is that vertex labels are defined and the "weight" edge property is a double rather than a float.
- createTheCrew() - A graph that demonstrates usage of the new structural features of TinkerPop 3.x such as vertex meta-properties and multi-properties.

gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]

As you might have noticed from the diagrams of these graphs or from the output of the Gremlin Console itself, these toy graphs are small (only a few vertices and edges each). It is nice to have a small graph when learning Gremlin, so that you can easily see if you are getting the results you expect. Even though these graphs are "small", they are robust enough in structure to try out many different kinds of traversals. However, if you find that a larger graph might be helpful, there is another option: The Grateful Dead.

gremlin> graph = TinkerFactory.createGratefulDead()
==>tinkergraph[vertices:808 edges:8049]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard]

The Grateful Dead graph ships with the Gremlin Console and the data can be found in several formats (along with the other toy graphs previously mentioned) in the console’s data directory.

Tip

If you find yourself in a position where you need to ask a question on the Gremlin Users mailing list about a traversal that you are having trouble with in your application, try to convert the gist of it to one of the toy graphs. Taking this step will make it easier for advanced Gremlin users to help you, which should lead to a faster response time for your problem. In addition, there is the added benefit that the mailing list post will be more relevant to other users, as it is not written solely in the context of your domain. If the sample data sets don’t properly demonstrate your issue, then including a Gremlin script that can construct a small body of sample data would be equally helpful.

As you get familiar with the console, it is good to know what some of the basic commands are. A "command" is not "Gremlin code", but something interpreted by the console to have special meaning in terms of configuring how the console works or performing a particular function outside of code itself. These commands are itemized in the reference documentation, but they can also be accessed within the console itself with the :help command.

gremlin> :help

For information about Groovy, visit:
    http://groovy-lang.org

Available commands:
  :help       (:h  ) Display this help message
  ?           (:?  ) Alias to: :help
  :exit       (:x  ) Exit the shell
  :quit       (:q  ) Alias to: :exit
  import      (:i  ) Import a class into the namespace
  :display    (:d  ) Display the current buffer
  :clear      (:c  ) Clear the buffer and reset the prompt counter
  :show       (:S  ) Show variables, classes or imports
  :inspect    (:n  ) Inspect a variable or the last result with the GUI object browser
  :purge      (:p  ) Purge variables, classes, imports or preferences
  :edit       (:e  ) Edit the current buffer
  :load       (:l  ) Load a file or URL into the buffer
  .           (:.  ) Alias to: :load
  :save       (:s  ) Save the current buffer to a file
  :record     (:r  ) Record the current session to a file
  :history    (:H  ) Display, manage and recall edit-line history
  :alias      (:a  ) Create an alias
  :grab       (:g  ) Add a dependency to the shell environment
  :register   (:rc ) Register a new command with the shell
  :doc        (:D  ) Open a browser window displaying the doc for the argument
  :set        (:=  ) Set (or list) preferences
  :uninstall  (:-  ) Uninstall a Maven library and its dependencies from the Gremlin Console
  :install    (:+  ) Install a Maven library and its dependencies into the Gremlin Console
  :plugin     (:pin) Manage plugins for the Console
  :remote     (:rem) Define a remote connection
  :submit     (:>  ) Send a Gremlin script to Gremlin Server
  :bytecode   (:bc ) Gremlin bytecode helper commands
  :cls        (:C  ) Clear the screen.

For help on a specific command type:
    :help command

The :help command shows a list of all the commands registered to the console. As this console is based on the Groovy Shell, you will see commands that are inherited from there in addition to the ones provided by TinkerPop. You can also request help on a specific command:

gremlin> :help :remote

usage: :remote [current|connect <type-of-remote> [<args>]|config <args>|list|next|prev|choose <index>|close]

Define and manage remote connections to use in conjunction with the :submit command, which will send Gremlin scripts to the specified remote agent for processing.

The Gremlin Console can also provide you with code help via auto-complete functionality. Use the <TAB> key to trigger a search of possible method names that might complete what you’ve typed to that point.

As you learn more about Gremlin, you will find many code examples in the documentation and almost all will be executable in the console. Trying these examples for yourself and modifying their execution slightly to see how output changes is a good way to go about your Gremlin education.

Use Case: Application Development

You are an application developer and the TinkerPop stack will be central to your application architecture. You need to develop a series of services that will execute queries against a graph database in support of the application front-end.

Most application developers use an IDE, such as IntelliJ, to help with their software development efforts. The IDE provides shortcuts and conveniences that make complex engineering jobs more productive. When developing applications for TinkerPop, the Gremlin Console should accompany the IDE as an additional tool to enhance that productivity. In other words, when you open your IDE, open the Gremlin Console next to it.

You will find that as you write Gremlin for your code base in your IDE, you will inevitably reach a point of sufficient complexity in your traversals where you will need to:

- Quickly test the traversal over real data to determine if it is correct.
- Test or debug pieces of the traversal in isolation.
- Experiment with different ways of expressing the same traversal.
- Examine the performance of a traversal through the profile() or explain() steps.

Consider an example where you are developing an application that uses TinkerGraph and the data from the "modern" toy graph. You want to encapsulate some logic for a graph traversal that finds a "person" vertex, iterates outgoing edges and groups the adjacent vertices as "value maps".

As you have read the TinkerPop documentation and have been experimenting with Gremlin for a while, you head to your IDE with your open project in it and write a simple class like this:

package com.my.company;

import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import static org.apache.tinkerpop.gremlin.structure.T.*;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*;

import java.util.List;
import java.util.Map;

public final class Traversals {
  public static Map<String,List<Vertex>> groupAround(GraphTraversalSource g, long vertexId) {
    // Explicit type arguments on group() make the chain's result type
    // match the declared return type.
    return g.V(vertexId).outE().
             <String,List<Vertex>>group().
               by(label).
               by(inV()).next();
  }
}

Note

TinkerPop code samples typically use static importing, which allows for a more fluid code style. If the static imports above were removed in favor of standard imports of the __ and T classes, the traversal would read as follows: g.V(vertexId).outE().group().by(T.label).by(__.inV()).next(). The console automatically performs the static imports for these methods, so they do not need to be imported again in that environment.

The diagram above displays the "modern" graph for reference. Assuming that g refers to a TraversalSource generated from a Graph instance that refers to that graph, calling groupAround with "1" as the vertexId argument should return a Map with two keys: "knows" and "created", where the "knows" key should have vertices "2" and "4" and the "created" key should have vertex "3". As you are a good developer, you know to write a unit test to validate this outcome. You write your test, compile your application, and execute your test only to find it failing on the "knows" key, which only has one vertex associated with it instead of two.

As you have the Gremlin Console open you decide to debug the problem there. You copy your Gremlin code from the IDE and execute it in the console and confirm the failure:

gremlin> g.V(1).outE().
           group().
             by(label).
             by(inV())
==>[created:v[3],knows:v[4]]

Note that next() is removed here. The Gremlin Console automatically tries to iterate all results from a line of execution. In the above case, that line returns a Traversal. A Traversal is an Iterator and when the console detects that type it steps through each item in the Iterator and prints it to the screen.

Trying it with the use of next() produces the following:

gremlin> g.V(1).outE().
           group().
             by(label).
             by(inV()).next()
==>created=v[3]
==>knows=v[4]

In this case, the line of execution does not return a Traversal. It returns the first item in the Traversal with the call to next(). This first item is a Map. When the console detects that it is a Map, it iterates the entrySet() and prints each Map.Entry to the screen. It is possible to "prevent" auto-iteration, which is useful when you want to work with a Traversal as a variable. You can do this with a clever use of a semi-colon:

gremlin> t = g.V(1).outE().
               group().
                 by(label).
                 by(inV());null
==>null
gremlin> t.next()
==>created=v[3]
==>knows=v[4]

Tip

In addition to "returning null", you could also return an empty list, as in: `t = g.V(1);[]`.

The first line assigns the Traversal to t, but the line itself is really two statements, separated by the semi-colon. The line of execution returns null, and that is what the console auto-iterates. At that point, you can work with t as you desire.
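The auto-iteration rules described so far can be sketched in plain Java. This is only an illustration of the behavior, not the console's actual implementation: `AutoIterateSketch` and `render` are hypothetical names, and the vertices are stood in for by strings so that the example needs nothing beyond the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that mimics the console behavior described above.
// It is NOT the Gremlin Console's real implementation.
final class AutoIterateSketch {

    // Returns the lines a REPL following these rules would print for one result.
    static List<String> render(Object result) {
        List<String> lines = new ArrayList<>();
        if (result instanceof Iterator) {
            // A Traversal is an Iterator: step through every item.
            Iterator<?> it = (Iterator<?>) result;
            while (it.hasNext()) {
                lines.add("==>" + it.next());
            }
        } else if (result instanceof Map) {
            // A Map is shown one entry at a time.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) result).entrySet()) {
                lines.add("==>" + e.getKey() + "=" + e.getValue());
            }
        } else {
            // Anything else (including null) is printed as a single value.
            lines.add("==>" + result);
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> grouped = new LinkedHashMap<>();
        grouped.put("created", "v[3]");
        grouped.put("knows", "v[4]");

        // An Iterator holding one Map: a single line, like the traversal without next().
        System.out.println(render(Arrays.asList(grouped).iterator()));
        // The Map itself: one line per entry, like the traversal with next().
        System.out.println(render(grouped));
        // null: what a line ending in ";null" hands to the console.
        System.out.println(render(null));
    }
}
```

Java prints the nested Map as `{created=v[3], knows=v[4]}` rather than in Groovy's `[created:v[3],knows:v[4]]` form, but the line-per-item structure is the point of the sketch.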

Turning your attention back to the original problem, you can now think about the issue with the Traversal not containing the appropriate number of vertices in the context of iteration. In the original Traversal the second by() modulator takes inV() as an argument (an anonymous Traversal spawned from the __ class whose methods are statically imported to the console). This by() tells Gremlin what aspect of the current group of edges should be stored in the list associated with that group. By specifying inV() you are saying that you want to store the in-vertices of the edges for a group.

Warning

While convenient, statically imported methods can be confusing for new users, especially those who are translating their code between the console (which is Groovy-based) and a Java IDE. Take care with the use of the in() method in this context, as the word in is reserved in Groovy. For the console, you must explicitly use this method as __.in().

Structurally, this Traversal is sound; however, it makes an assumption about how inV() will be utilized as an inner Traversal. It is always important to remember that the console does not auto-iterate every Traversal in your script. It only iterates the result of a line of execution. Therefore, inner Traversal instances do not get that benefit, and as such, inV() only has next() called upon it, pulling a single vertex from the "knows" edges. You can remedy that by adding fold() to inV() as follows:

gremlin> g.V(1).outE().
           group().
             by(label).
             by(inV().fold()).next()
==>created=[v[3]]
==>knows=[v[2], v[4]]

You can now see that your result is as expected and you can modify your Java class to reflect the change:

package com.my.company;

import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import static org.apache.tinkerpop.gremlin.structure.T.*;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*;

import java.util.List;
import java.util.Map;

public final class Traversals {
  public static Map<String,List<Vertex>> groupAround(GraphTraversalSource g, long vertexId) {
    // fold() collects every in-vertex of a group into a List.
    // Explicit type arguments on group() make the chain's result type
    // match the declared return type.
    return g.V(vertexId).outE().
             <String,List<Vertex>>group().
               by(label).
               by(inV().fold()).next();
  }
}

Result iteration represents the most common "simple" bug that users encounter. It’s all too easy to write a traversal as follows:

gremlin> g.V().has('name','marko').drop()
gremlin> g.V().has('name','marko').count()
==>0

As you can see, the first traversal removes vertices with the "name" field of "marko" and the second traversal verifies that there are no vertices named "marko" after the first is executed. After seeing success like that in the console, it is all too tempting to copy and paste that line of code to a Java class like:

package com.my.company;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

public final class Traversals {
  public static void removeByName(GraphTraversalSource g, String name) {
    g.V().has("name", name).drop();
  }
}

Of course, this won’t work and you will likely be left wondering why your unit test for "removeByName" is failing, but the identical line of code in the console is doing what is expected. The drop()-step is not some special form of terminating step that iterates the traversal - it is just one more step that vertices will pass through. Outside of the console you must add iterate() as follows:

package com.my.company;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

public final class Traversals {
  public static void removeByName(GraphTraversalSource g, String name) {
    g.V().has("name", name).drop().iterate();
  }
}

The call to iterate() will do what the console does automatically, executing the Traversal instance and stepping through the results. You will generally use iterate() to generate side-effects (e.g. drop vertices from the database), though it has its usage in the console as well. If you have an especially long result set for which side-effects will be generated, you can simply call iterate() on the traversal and avoid a long stream of output to the console.
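As a rough analogy in plain Java (not TinkerPop code), the sketch below shows why a lazily evaluated pipeline with a side effect does nothing until something pulls items through it. `LazyDrop` and its `iterate()` method are made-up names for illustration; the list of names stands in for the graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java analogy only; LazyDrop is a hypothetical class, not TinkerPop API.
final class LazyDrop implements Iterator<String> {
    private final List<String> store;
    private final Iterator<String> matches;

    LazyDrop(List<String> store, String name) {
        this.store = store;
        // Select the matching items up front, but do not remove anything yet.
        List<String> snapshot = new ArrayList<>();
        for (String s : store) {
            if (s.equals(name)) {
                snapshot.add(s);
            }
        }
        this.matches = snapshot.iterator();
    }

    @Override
    public boolean hasNext() {
        return matches.hasNext();
    }

    @Override
    public String next() {
        String item = matches.next();
        store.remove(item); // the side effect happens only when an item is pulled
        return item;
    }

    // Drain the pipeline purely for its side effects.
    void iterate() {
        while (hasNext()) {
            next();
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("marko", "vadas", "josh"));

        new LazyDrop(names, "marko");                // built, never iterated
        System.out.println(names.contains("marko")); // true: nothing was removed

        new LazyDrop(names, "marko").iterate();      // drained
        System.out.println(names.contains("marko")); // false
    }
}
```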

Gremlin written in the console usually has a copy and paste translation to source files (and vice versa). You need only recall the rules of iteration when you move code between them. It is equally important that you keep an eye on Traversal objects declared as inner traversals or within lambda expressions where they will not receive automatic iteration. Keeping these semantics in mind will save you from many annoying debugging sessions.

Use Case: Ad-hoc Analysis

You are doing some general analysis on a graph with Gremlin and decide that you’d like to store those results in Apache Cassandra for additional analysis with other tools.

The Gremlin Console is an indispensable tool for working with graph data, but it is also well suited for working with other types of data. Its ability to process data from different sources and formats provides a flexible environment for exploratory analysis. This ability stems from the underlying Groovy Shell and the fact that any JVM-based libraries are easily imported into it, making their classes and functions available at the prompt in conjunction with Gremlin.

Let’s consider an example where you are exploring "The Crew" toy graph and you are interested in doing some analysis on where people live and when they lived there. You decide to start simple and just get a basic feeling for the data of the "person" vertices in the graph:

gremlin> graph = TinkerFactory.createTheCrew()
==>tinkergraph[vertices:6 edges:14]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[tinkergraph[vertices:6 edges:14], standard]
gremlin>
gremlin> g.V().hasLabel('person').valueMap()
==>[name:[marko],location:[san diego,santa cruz,brussels,santa fe]]
==>[name:[stephen],location:[centreville,dulles,purcellville]]
==>[name:[matthias],location:[bremen,baltimore,oakland,seattle]]
==>[name:[daniel],location:[spremberg,kaiserslautern,aachen]]

You can see from the output above that there are four "person" vertices and each has a "name" property and a "location" property. The "location" is actually a multi-property, where "location" does not have one value, but several. If you look a bit closer you can also see that each "location" has meta-properties as well:

gremlin> g.V().hasLabel('person').as('person').
               properties('location').as('location').
               select('person','location').by('name').by(valueMap())
==>[person:marko,location:[startTime:1997,endTime:2001]]
==>[person:marko,location:[startTime:2001,endTime:2004]]
==>[person:marko,location:[startTime:2004,endTime:2005]]
==>[person:marko,location:[startTime:2005]]
==>[person:stephen,location:[startTime:1990,endTime:2000]]
==>[person:stephen,location:[startTime:2000,endTime:2006]]
==>[person:stephen,location:[startTime:2006]]
==>[person:matthias,location:[startTime:2004,endTime:2007]]
==>[person:matthias,location:[startTime:2007,endTime:2011]]
==>[person:matthias,location:[startTime:2011,endTime:2014]]
==>[person:matthias,location:[startTime:2014]]
==>[person:daniel,location:[startTime:1982,endTime:2005]]
==>[person:daniel,location:[startTime:2005,endTime:2009]]
==>[person:daniel,location:[startTime:2009]]

You are pleased. You like that you have the basic data present to achieve your goal, but you see a couple of problems. First, given a quick glance at the data, you can see that the data doesn’t uniformly start at a particular time. You were hoping to see data presented in such a way that each "person" had data starting and ending in the same years. For example, the first entry for "daniel" is "1982", while the first entry for "marko" is "1997" - where was "marko" in "1982"? The second problem you can see is that the data really isn’t in a format that you need. Ideally, you would like to have something that had rows and columns that was easily dumped to CSV for use in other tools. You currently have the data in two separate traversals and the data is nested.

As a first step to solving your problems, you need to determine the earliest "startTime" that is common to all the "person" vertices, as this will be the main filter for the data you intend to retrieve:

gremlin> firstYear = g.V().hasLabel('person').
                           local(properties('location').values('startTime').min()).
                           max().next()
==>2004

You store that result in a variable called "firstYear", as you will need that later to help filter results in the traversal that ultimately gets the data. In this way, all "person" vertices can be compared from the same start time. It is often helpful to store results from traversals if you intend to work with that data later and the traversal itself is expensive to execute. It is only important to keep in mind that you will be limited by the memory available to the console.
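To see why the answer is 2004, the same "latest of the per-person earliest years" arithmetic can be worked through in plain Java. This is a sketch rather than Gremlin: `FirstYearSketch`, `firstCommonYear` and `sampleData` are hypothetical names, and the start years are copied by hand from the "location" meta-properties printed earlier.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical worked example; mirrors local(...min()).max().next() on hand-copied data.
final class FirstYearSketch {

    // Returns the latest of the per-person earliest start years.
    static int firstCommonYear(Map<String, List<Integer>> startTimesByPerson) {
        return startTimesByPerson.values().stream()
                .mapToInt(years -> Collections.min(years)) // earliest start year per person
                .max()                                     // latest of those earliest years
                .getAsInt();                               // throws if the map is empty
    }

    // Start years per person, taken from the console output shown earlier.
    static Map<String, List<Integer>> sampleData() {
        Map<String, List<Integer>> starts = new LinkedHashMap<>();
        starts.put("marko", Arrays.asList(1997, 2001, 2004, 2005));
        starts.put("stephen", Arrays.asList(1990, 2000, 2006));
        starts.put("matthias", Arrays.asList(2004, 2007, 2011, 2014));
        starts.put("daniel", Arrays.asList(1982, 2005, 2009));
        return starts;
    }

    public static void main(String[] args) {
        // Per-person minimums are 1997, 1990, 2004 and 1982; the largest is 2004.
        System.out.println(firstCommonYear(sampleData()));
    }
}
```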

Tip

You can change the amount of memory allotted to the console by altering its -Xmx setting in bin/gremlin.sh. This setting controls the maximum size of the JVM memory allocation pool. To set this value to 1024 megabytes, you would set this value as follows: -Xmx1024m. It is likely best to append this setting to the initialization of the JAVA_OPTIONS variable in that script. If you choose to override JAVA_OPTIONS, be sure to examine the default settings in bin/gremlin.sh to include them as they should not be omitted in your override.

In an attempt to test things out, you take a naive approach at the traversal with your filter for "firstYear" applied:

gremlin> firstYear = g.V().hasLabel('person').
                           local(properties('location').values('startTime').min()).
                           max().next()
==>2004
gremlin> l = g.V().hasLabel('person').as('person').
                   properties('location').or(has('endTime',gt(firstYear)),hasNot('endTime')).as('location').
                   valueMap().as('times').
                   select('person','location','times').by('name').by(value).by().toList()
==>[person:marko,location:brussels,times:[startTime:2004,endTime:2005]]
==>[person:marko,location:santa fe,times:[startTime:2005]]
==>[person:stephen,location:dulles,times:[startTime:2000,endTime:2006]]
==>[person:stephen,location:purcellville,times:[startTime:2006]]
==>[person:matthias,location:bremen,times:[startTime:2004,endTime:2007]]
==>[person:matthias,location:baltimore,times:[startTime:2007,endTime:2011]]
==>[person:matthias,location:oakland,times:[startTime:2011,endTime:2014]]
==>[person:matthias,location:seattle,times:[startTime:2014]]
==>[person:daniel,location:spremberg,times:[startTime:1982,endTime:2005]]
==>[person:daniel,location:kaiserslautern,times:[startTime:2005,endTime:2009]]
==>[person:daniel,location:aachen,times:[startTime:2009]]

As you scan through the data, you can see that it appears to cover the range of time you were looking for. Of course, you still have the problem of the format of the data. Recalling that the Gremlin Console is an extension of the Groovy Console, you decide to just process "l" with some Groovy syntax to coerce it into the format that you would like to see for your rows and columns style output:

gremlin> firstYear = g.V().hasLabel('person').
                           local(properties('location').values('startTime').min()).
                           max().next()
==>2004
gremlin> l = g.V().hasLabel('person').as('person').
                   properties('location').or(has('endTime',gt(firstYear)),hasNot('endTime')).as('location').
                   valueMap().as('times').
                   select('person','location','times').by('name').by(value).by().toList()
==>[person:marko,location:brussels,times:[startTime:2004,endTime:2005]]
==>[person:marko,location:santa fe,times:[startTime:2005]]
==>[person:stephen,location:dulles,times:[startTime:2000,endTime:2006]]
==>[person:stephen,location:purcellville,times:[startTime:2006]]
==>[person:matthias,location:bremen,times:[startTime:2004,endTime:2007]]
==>[person:matthias,location:baltimore,times:[startTime:2007,endTime:2011]]
==>[person:matthias,location:oakland,times:[startTime:2011,endTime:2014]]
==>[person:matthias,location:seattle,times:[startTime:2014]]
==>[person:daniel,location:spremberg,times:[startTime:1982,endTime:2005]]
==>[person:daniel,location:kaiserslautern,times:[startTime:2005,endTime:2009]]
==>[person:daniel,location:aachen,times:[startTime:2009]]
gremlin> l.collect{
           row->((Math.max(row.times.startTime,firstYear))..((row.times.endTime?:2017)-1)).collect{
             year->[person:row.person,location:row.location,year:year]}}.flatten()
==>[person:marko,location:brussels,year:2004]
==>[person:marko,location:santa fe,year:2005]
==>[person:marko,location:santa fe,year:2006]
==>[person:marko,location:santa fe,year:2007]
==>[person:marko,location:santa fe,year:2008]
==>[person:marko,location:santa fe,year:2009]
==>[person:marko,location:santa fe,year:2010]
==>[person:marko,location:santa fe,year:2011]
==>[person:marko,location:santa fe,year:2012]
==>[person:marko,location:santa fe,year:2013]
==>[person:marko,location:santa fe,year:2014]
==>[person:marko,location:santa fe,year:2015]
==>[person:marko,location:santa fe,year:2016]
==>[person:stephen,location:dulles,year:2004]
==>[person:stephen,location:dulles,year:2005]
==>[person:stephen,location:purcellville,year:2006]
==>[person:stephen,location:purcellville,year:2007]
==>[person:stephen,location:purcellville,year:2008]
==>[person:stephen,location:purcellville,year:2009]
==>[person:stephen,location:purcellville,year:2010]
==>[person:stephen,location:purcellville,year:2011]
==>[person:stephen,location:purcellville,year:2012]
==>[person:stephen,location:purcellville,year:2013]
==>[person:stephen,location:purcellville,year:2014]
==>[person:stephen,location:purcellville,year:2015]
==>[person:stephen,location:purcellville,year:2016]
==>[person:matthias,location:bremen,year:2004]
==>[person:matthias,location:bremen,year:2005]
==>[person:matthias,location:bremen,year:2006]
==>[person:matthias,location:baltimore,year:2007]
==>[person:matthias,location:baltimore,year:2008]
==>[person:matthias,location:baltimore,year:2009]
==>[person:matthias,location:baltimore,year:2010]
==>[person:matthias,location:oakland,year:2011]
==>[person:matthias,location:oakland,year:2012]
==>[person:matthias,location:oakland,year:2013]
==>[person:matthias,location:seattle,year:2014]
==>[person:matthias,location:seattle,year:2015]
==>[person:matthias,location:seattle,year:2016]
==>[person:daniel,location:spremberg,year:2004]
==>[person:daniel,location:kaiserslautern,year:2005]
==>[person:daniel,location:kaiserslautern,year:2006]
==>[person:daniel,location:kaiserslautern,year:2007]
==>[person:daniel,location:kaiserslautern,year:2008]
==>[person:daniel,location:aachen,year:2009]
==>[person:daniel,location:aachen,year:2010]
==>[person:daniel,location:aachen,year:2011]
==>[person:daniel,location:aachen,year:2012]
==>[person:daniel,location:aachen,year:2013]
==>[person:daniel,location:aachen,year:2014]
==>[person:daniel,location:aachen,year:2015]
==>[person:daniel,location:aachen,year:2016]

You had to apply a bit of brute force, but now you have the rows and columns you wanted, with the data normalized and flattened in such a way that each year since "2004" is represented all the way up to "2016".
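
The Groovy post-processing above can be tried in isolation on hand-written data. The following is a minimal sketch, not part of the original traversal: the two rows and their start and end years are illustrative assumptions shaped like the maps in "l", and "firstYear" is hard-coded rather than computed from the graph.

```groovy
// Assumed value; in the tutorial this is computed by the firstYear traversal
firstYear = 2004

// Hypothetical rows shaped like the entries of "l"; the second has no endTime
rows = [
  [person: 'daniel', location: 'spremberg', times: [startTime: 1982, endTime: 2005]],
  [person: 'daniel', location: 'aachen',    times: [startTime: 2009]]
]

flattened = rows.collect { row ->
  // Clamp the start of the range to firstYear
  def from = Math.max(row.times.startTime, firstYear)
  // A missing endTime evaluates to null, so the Elvis operator substitutes 2017
  def to = (row.times.endTime ?: 2017) - 1
  // Emit one map per year in the inclusive range
  (from..to).collect { year -> [person: row.person, location: row.location, year: year] }
}.flatten()

flattened.each { println it }
```

With these sample rows, the first row contributes only the year 2004 and the second contributes 2009 through 2016, which shows how the hard-coded "2017" determines the last year produced.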

Unfortunately, you are unsatisfied. The added Groovy processing of "l" feels "wrong" despite producing the correct output. It relies on an unfortunate hack to deal with the possibility that the "endTime" property is absent (and therefore evaluates to "null"), which hard-codes the year "2017" into the code (you want the years through "2016"). You also recall that the Gremlin language has advanced considerably in TinkerPop 3.x and that it is usually possible to eliminate closures and other direct processing with Groovy. With those issues in mind, you look to enhance your work.

A first step would be to get rid of the hard-coded "2017". You decide to get the current year programmatically by using java.time.Year. This class is not one that is available by default in the console. You might think of this as similar to what happens when you decide to use a particular class in a Java file. You must "import" the classes that you wish to use. To do this, you need to use the import command:

console (groovy) groovy

gremlin> import java.time.Year
==>java.time.Year
gremlin> Year.now()
==>2024

import java.time.Year
Year.now()

You can now use Year with the constant() step to produce, for each person, the set of years from "firstYear" up to the current year:

console (groovy) groovy

gremlin> import java.time.Year
==>java.time.Year
gremlin> firstYear = g.V().hasLabel('person').
                           local(properties('location').values('startTime').min()).
                           max().next()
==>2004
gremlin> g.V().hasLabel("person").as("person").
               constant((firstYear..(Year.now().value)).toList()).unfold().as("year").
               select('person','year').by('name').by()
==>[person:marko,year:2004]
==>[person:marko,year:2005]
==>[person:marko,year:2006]
==>[person:marko,year:2007]
==>[person:marko,year:2008]
==>[person:marko,year:2009]
==>[person:marko,year:2010]
==>[person:marko,year:2011]
==>[person:marko,year:2012]
==>[person:marko,year:2013]
==>[person:marko,year:2014]
==>[person:marko,year:2015]
==>[person:marko,year:2016]
==>[person:marko,year:2017]
==>[person:marko,year:2018]
==>[person:marko,year:2019]
==>[person:marko,year:2020]
==>[person:marko,year:2021]
==>[person:marko,year:2022]
==>[person:marko,year:2023]
==>[person:marko,year:2024]
==>[person:stephen,year:2004]
==>[person:stephen,year:2005]
==>[person:stephen,year:2006]
==>[person:stephen,year:2007]
==>[person:stephen,year:2008]
==>[person:stephen,year:2009]
==>[person:stephen,year:2010]
==>[person:stephen,year:2011]
==>[person:stephen,year:2012]
==>[person:stephen,year:2013]
==>[person:stephen,year:2014]
==>[person:stephen,year:2015]
==>[person:stephen,year:2016]
==>[person:stephen,year:2017]
==>[person:stephen,year:2018]
==>[person:stephen,year:2019]
==>[person:stephen,year:2020]
==>[person:stephen,year:2021]
==>[person:stephen,year:2022]
==>[person:stephen,year:2023]
==>[person:stephen,year:2024]
==>[person:matthias,year:2004]
==>[person:matthias,year:2005]
==>[person:matthias,year:2006]
==>[person:matthias,year:2007]
==>[person:matthias,year:2008]
==>[person:matthias,year:2009]
==>[person:matthias,year:2010]
==>[person:matthias,year:2011]
==>[person:matthias,year:2012]
==>[person:matthias,year:2013]
==>[person:matthias,year:2014]
==>[person:matthias,year:2015]
==>[person:matthias,year:2016]
==>[person:matthias,year:2017]
==>[person:matthias,year:2018]
==>[person:matthias,year:2019]
==>[person:matthias,year:2020]
==>[person:matthias,year:2021]
==>[person:matthias,year:2022]
==>[person:matthias,year:2023]
==>[person:matthias,year:2024]
==>[person:daniel,year:2004]
==>[person:daniel,year:2005]
==>[person:daniel,year:2006]
==>[person:daniel,year:2007]
==>[person:daniel,year:2008]
==>[person:daniel,year:2009]
==>[person:daniel,year:2010]
==>[person:daniel,year:2011]
==>[person:daniel,year:2012]
==>[person:daniel,year:2013]
==>[person:daniel,year:2014]
==>[person:daniel,year:2015]
==>[person:daniel,year:2016]
==>[person:daniel,year:2017]
==>[person:daniel,year:2018]
==>[person:daniel,year:2019]
==>[person:daniel,year:2020]
==>[person:daniel,year:2021]
==>[person:daniel,year:2022]
==>[person:daniel,year:2023]
==>[person:daniel,year:2024]

import java.time.Year
firstYear = g.V().hasLabel('person').
                  local(properties('location').values('startTime').min()).
                  max().next()
g.V().hasLabel("person").as("person").
      constant((firstYear..(Year.now().value)).toList()).unfold().as("year").
      select('person','year').by('name').by()
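
The argument handed to constant() above is an ordinary Groovy expression that is evaluated before the traversal runs. As a minimal sketch, it can be inspected on its own; here "firstYear" is hard-coded as an assumption instead of being computed from the graph.

```groovy
import java.time.Year

// Assumed value; in the tutorial this comes from the firstYear traversal
firstYear = 2004

// Groovy property syntax: Year.now().value calls Year.now().getValue()
currentYear = Year.now().value

// An inclusive Groovy range converted to a plain List, as passed to constant()
years = (firstYear..currentYear).toList()

println "${years.size()} years from ${years.first()} to ${years.last()}"
```

Because the list is built once on the client side, every person receives the same set of years when unfold() expands it.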

From there you can build on that traversal to grab the "location" given the generated "year" for that data:

console (groovy) groovy

gremlin> import java.time.Year
==>java.time.Year
gremlin> firstYear = g.V().hasLabel('person').
                           local(properties('location').values('startTime').min()).
                           max().next()
==>2004
gremlin> g.V().hasLabel("person").as("person").
               constant((firstYear..(Year.now().value)).toList()).unfold().as("year").
               select("person").coalesce(
                 properties("location").filter(values("startTime").where(gte("year"))).
                                        order().by("startTime").limit(1),
                 properties("location").hasNot("endTime")).value().as("location").
               select("person","year","location").by("name").by().by()
==>[person:marko,year:2004,location:brussels]
==>[person:marko,year:2005,location:santa fe]
==>[person:marko,year:2006,location:santa fe]
==>[person:marko,year:2007,location:santa fe]
==>[person:marko,year:2008,location:santa fe]
==>[person:marko,year:2009,location:santa fe]
==>[person:marko,year:2010,location:santa fe]
==>[person:marko,year:2011,location:santa fe]
==>[person:marko,year:2012,location:santa fe]
==>[person:marko,year:2013,location:santa fe]
==>[person:marko,year:2014,location:santa fe]
==>[person:marko,year:2015,location:santa fe]
==>[person:marko,year:2016,location:santa fe]
==>[person:marko,year:2017,location:santa fe]
==>[person:marko,year:2018,location:santa fe]
==>[person:marko,year:2019,location:santa fe]
==>[person:marko,year:2020,location:santa fe]
==>[person:marko,year:2021,location:santa fe]
==>[person:marko,year:2022,location:santa fe]
==>[person:marko,year:2023,location:santa fe]
==>[person:marko,year:2024,location:santa fe]
==>[person:stephen,year:2004,location:purcellville]
==>[person:stephen,year:2005,location:purcellville]
==>[person:stephen,year:2006,location:purcellville]
==>[person:stephen,year:2007,location:purcellville]
==>[person:stephen,year:2008,location:purcellville]
==>[person:stephen,year:2009,location:purcellville]
==>[person:stephen,year:2010,location:purcellville]
==>[person:stephen,year:2011,location:purcellville]
==>[person:stephen,year:2012,location:purcellville]
==>[person:stephen,year:2013,location:purcellville]
==>[person:stephen,year:2014,location:purcellville]
==>[person:stephen,year:2015,location:purcellville]
==>[person:stephen,year:2016,location:purcellville]
==>[person:stephen,year:2017,location:purcellville]
==>[person:stephen,year:2018,location:purcellville]
==>[person:stephen,year:2019,location:purcellville]
==>[person:stephen,year:2020,location:purcellville]
==>[person:stephen,year:2021,location:purcellville]
==>[person:stephen,year:2022,location:purcellville]
==>[person:stephen,year:2023,location:purcellville]
==>[person:stephen,year:2024,location:purcellville]
==>[person:matthias,year:2004,location:bremen]
==>[person:matthias,year:2005,location:baltimore]
==>[person:matthias,year:2006,location:baltimore]
==>[person:matthias,year:2007,location:baltimore]
==>[person:matthias,year:2008,location:oakland]
==>[person:matthias,year:2009,location:oakland]
==>[person:matthias,year:2010,location:oakland]
==>[person:matthias,year:2011,location:oakland]
==>[person:matthias,year:2012,location:seattle]
==>[person:matthias,year:2013,location:seattle]
==>[person:matthias,year:2014,location:seattle]
==>[person:matthias,year:2015,location:seattle]
==>[person:matthias,year:2016,location:seattle]
==>[person:matthias,year:2017,location:seattle]
==>[person:matthias,year:2018,location:seattle]
==>[person:matthias,year:2019,location:seattle]
==>[person:matthias,year:2020,location:seattle]
==>[person:matthias,year:2021,location:seattle]
==>[person:matthias,year:2022,location:seattle]
==>[person:matthias,year:2023,location:seattle]
==>[person:matthias,year:2024,location:seattle]
==>[person:daniel,year:2004,location:kaiserslautern]
==>[person:daniel,year:2005,location:kaiserslautern]
==>[person:daniel,year:2006,location:aachen]
==>[person:daniel,year:2007,location:aachen]
==>[person:daniel,year:2008,location:aachen]
==>[person:daniel,year:2009,location:aachen]
==>[person:daniel,year:2010,location:aachen]
==>[person:daniel,year:2011,location:aachen]
==>[person:daniel,year:2012,location:aachen]
==>[person:daniel,year:2013,location:aachen]
==>[person:daniel,year:2014,location:aachen]
==>[person:daniel,year:2015,location:aachen]
==>[person:daniel,year:2016,location:aachen]
==>[person:daniel,year:2017,location:aachen]
==>[person:daniel,year:2018,location:aachen]
==>[person:daniel,year:2019,location:aachen]
==>[person:daniel,year:2020,location:aachen]
==>[person:daniel,year:2021,location:aachen]
==>[person:daniel,year:2022,location:aachen]
==>[person:daniel,year:2023,location:aachen]
==>[person:daniel,year:2024,location:aachen]

import java.time.Year
firstYear = g.V().hasLabel('person').
                  local(properties('location').values('startTime').min()).
                  max().next()
g.V().hasLabel("person").as("person").
      constant((firstYear..(Year.now().value)).toList()).unfold().as("year").
      select("person").coalesce(
        properties("location").filter(values("startTime").where(gte("year"))).
                               order().by("startTime").limit(1),
        properties("location").hasNot("endTime")).value().as("location").
      select("person","year","location").by("name").by().by()

Tip

Not sure what the above traversal is doing? When you come across a traversal that you don’t fully understand, the Gremlin Console is a great place to get help. You can dismantle a large traversal and execute it in smaller parts to see what each part produces as output.
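
As a sketch of that dismantling approach, the pieces below can be entered one at a time in the Gremlin Console. It assumes the "The Crew" toy graph created with TinkerFactory.createTheCrew(), and it picks "matthias" purely as an example. The explicit import is redundant in the console and is only needed if the snippet is run elsewhere with tinkergraph-gremlin on the classpath.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory

graph = TinkerFactory.createTheCrew()
g = graph.traversal()

// Part 1: which vertices does the traversal start from?
names = g.V().hasLabel('person').values('name').toList()

// Part 2: the "location" properties of one person and their meta-properties
places = g.V().has('person', 'name', 'matthias').properties('location').value().toList()
times = g.V().has('person', 'name', 'matthias').properties('location').valueMap().toList()

// Part 3: the fallback branch of coalesce(), i.e. the location with no endTime
current = g.V().has('person', 'name', 'matthias').
            properties('location').hasNot('endTime').value().toList()
```

Inspecting "names", "places", "times", and "current" in turn shows what each fragment contributes before the fragments are combined with coalesce() and select().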

You now have a traversal written in idiomatic Gremlin, with the results in the form that you wanted. Now you’d like to dump this data to Cassandra for further analysis in another tool. You decide to use the DataStax java-driver in the console to write to Cassandra.

The driver does not come bundled with the console and is not available on its classpath by default. You can bring other libraries into the console with the :install command. With :install, you reference the Maven coordinates (i.e. group, artifact, and version) of a library to have it automatically downloaded from a Maven repository and placed on the console classpath. If you have read through the reference documentation, you will have seen a number of examples of :install being used to bring in unbundled TinkerPop libraries, like neo4j-gremlin or hadoop-gremlin.

Important

Before you use the :install command, please be sure to read the reference documentation on Grape configuration. If you do not have proper settings in place, it is likely that the :install command will fail by way of download errors.

Tip

You can also manually "install" dependencies to the console by copying them into the Gremlin Console classpath. This is most easily accomplished by copying the required jar files to the `GREMLIN_HOME/lib` directory.

gremlin> :install com.datastax.cassandra cassandra-driver-core 2.1.9
==>Loaded: [com.datastax.cassandra, cassandra-driver-core, 2.1.9]
gremlin> import com.datastax.driver.core.*
==>groovy.grape.Grape, org.apache.commons.configuration2.*, ..., com.datastax.driver.core.*
gremlin> import static com.datastax.driver.core.querybuilder.QueryBuilder.*
==>groovy.grape.Grape, org.apache.commons.configuration2.*, ..., static com.datastax.driver.core.querybuilder.QueryBuilder.*
gremlin> cluster = com.datastax.driver.core.Cluster.builder().addContactPoint("localhost").build()
==>com.datastax.driver.core.Cluster@3e1624c7
gremlin> session = cluster.connect()
==>com.datastax.driver.core.SessionManager@35764bef
gremlin> session.execute("CREATE KEYSPACE crew WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }")
gremlin> session.execute("USE crew")
gremlin> session.execute("CREATE TABLE locations ( name varchar, location varchar, year int, PRIMARY KEY (name, year))")

In the above code, you first use :install to pull in the driver and its dependencies. Once that first line has executed to completion, you can inspect the GREMLIN_HOME/ext directory to see that the appropriate jar files have been copied to the classpath. The remaining lines demonstrate how to build a Cluster and open a Session against a running Cassandra instance. CQL statements are then issued to create the keyspace and table to hold the data.

Now that you have a Session established with a table to store the data in, you can iterate through the Traversal and stream the data to Cassandra:

gremlin> g.V().hasLabel("person").as("person").
               constant((firstYear..(Year.now().value)).toList()).unfold().as("year").
               select("person").coalesce(
                 properties("location").filter(values("startTime").where(gte("year"))).
                                        order().by("startTime").limit(1),
                 properties("location").hasNot("endTime")).value().as("location").
               select("person","year","location").by("name").by().by().
               forEachRemaining{
                 def statement = insertInto("locations").
                                 value("name", it.person).
                                 value("location", it.location).
                                 value("year", it.year)
                 session.execute(statement)
               }
gremlin> session.execute(select().all().from("locations"))
==>Row[daniel, 2004, kaiserslautern]
==>Row[daniel, 2005, kaiserslautern]
==>Row[daniel, 2006, aachen]
==>Row[daniel, 2007, aachen]
==>Row[daniel, 2008, aachen]
==>Row[daniel, 2009, aachen]
==>Row[daniel, 2010, aachen]
...
==>Row[stephen, 2015, purcellville]
==>Row[stephen, 2016, purcellville]

Iteration is performed by the call to forEachRemaining(). The closure supplied to that method is applied to each "row" in the Traversal. Note the use of def in that closure to declare the "statement" variable. In the console, using def inside a closure scopes that variable to the closure. Without def, the "statement" variable would be accessible globally (i.e. at the gremlin> prompt). Using def at the console prompt itself to define a variable is unnecessary, and referencing that variable afterwards results in an error:

gremlin> def x = 10
==>10
gremlin> x
No such property: x for class: groovysh_evaluate
Display stack trace? [yN] n
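
The scoping behavior of def inside a closure can be observed without Cassandra. The following is a minimal sketch meant to be run as a plain Groovy script: the two rows, the "executed" list (a stand-in for session.execute()), and the "lastRow" variable are all hypothetical names introduced for illustration.

```groovy
// Hypothetical rows shaped like the maps produced by select("person","year","location")
rows = [
  [person: 'marko', year: 2004, location: 'brussels'],
  [person: 'marko', year: 2005, location: 'santa fe']
]

// Stand-in for session.execute(): collects what would have been sent
executed = []

rows.iterator().forEachRemaining {
  // Declared with def: visible only inside this closure
  def statement = "${it.person}/${it.year}/${it.location}".toString()
  // Assigned without def: stored in the script binding, so visible afterwards
  lastRow = it
  executed << statement
}

println executed
println binding.hasVariable('lastRow')    // the undeclared variable leaked out
println binding.hasVariable('statement')  // the def variable did not
```

After the loop, "lastRow" remains available while "statement" does not, which is the distinction that matters when a closure is evaluated at the gremlin> prompt.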

Tip

If you find that you always work with a particular library, consider starting the Gremlin Console in interactive mode with an initialization script that prepares your environment for you. An "initialization script" is just a Groovy script that contains the initial commands to execute when the console starts. Following the use case, it would be nice if the initialization script contained the import statements for the driver and possibly the code to get the Session object ready for use. Start the Gremlin Console with that script by passing it as an argument on the command line: bin/gremlin.sh -i init.groovy.
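
A sketch of what such an init.groovy might contain for this use case is shown below. It only reuses statements from the earlier console session, and it assumes that the driver has already been installed with :install, that Cassandra is running on "localhost", and that the "crew" keyspace already exists.

```groovy
// init.groovy: hypothetical initialization script for the Cassandra use case
import com.datastax.driver.core.*
import static com.datastax.driver.core.querybuilder.QueryBuilder.*
import java.time.Year

// No def here, so that cluster and session remain available at the gremlin> prompt
cluster = Cluster.builder().addContactPoint("localhost").build()
session = cluster.connect()
session.execute("USE crew")
```

Leaving out def in the script is deliberate: as with variables defined at the prompt, it keeps "cluster" and "session" usable once the console has started.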

This use case focused on a Cassandra-related library, but it should be evident that it would be equally straightforward to perform the same data dump to HBase, Microsoft SQL Server, MongoDB, etc. Note also that you are not restricted to a "data dump". You could just as easily :install libraries to read data from Oracle into a graph, use functions from Commons Math, or do anything else you can think of with available JVM libraries.

Summary

These use cases have tried to demonstrate some of the common ways in which you can use the Gremlin Console. In the process, they exposed tips and pitfalls to be aware of when working with it. Hopefully, you have gained some new knowledge on what the console can do for you and have been inspired to work with it in more productive ways.