Getting Started

3.7.4-SNAPSHOT

Getting Started

Apache TinkerPop™ is an open source Graph Computing Framework. Within itself, TinkerPop represents a large collection of capabilities and technologies and, in its wider ecosystem, an additionally extended world of third-party contributed graph libraries and systems. TinkerPop’s ecosystem can appear complex to newcomers of all experience levels, especially when glancing at the reference documentation for the first time.

So, where do you get started with TinkerPop? How do you dive in quickly and get productive? Well … Gremlin, the most recognizable citizen of The TinkerPop, is here to help with this thirty-minute tutorial. That’s right: in just thirty short minutes, you too can be fit to start building graph applications with TinkerPop. Welcome to The TinkerPop Workout — by Gremlin!

The First Five Minutes

It is quite possible to learn a lot in just five minutes with TinkerPop, but before doing so, a proper introduction of your trainer is in order. Meet Gremlin!

gremlin standing

Gremlin helps you navigate the vertices and edges of a graph. He is essentially your query language to graph databases, as SQL is the query language to relational databases. To tell Gremlin how he should "traverse" the graph (i.e., what you want your query to do) you need a way to provide him commands in the language he understands — and, of course, that language is called "Gremlin". For this task, you need one of TinkerPop’s most important tools: The Gremlin Console.

Note	Are you unsure of what a vertex or edge is? That topic is covered in the next section, but please allow the tutorial to get you oriented with the Gremlin Console first, so that you have an understanding of the tool that will help you with your learning experience.

Download the console, unpackage it and start it:

$ unzip apache-tinkerpop-gremlin-console-3.7.4-SNAPSHOT-bin.zip
$ cd apache-tinkerpop-gremlin-console-3.7.4-SNAPSHOT
$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin>

Tip	Windows users may use the included `bin/gremlin.bat` (`bin/gremlin-java8.bat` for Java 8) file to start the Gremlin Console.

Tip	If you make a mistake when entering a command and the prompt gets "stuck" (for example, after a syntax error or an incomplete statement), you can use the `:clear` (or `:c`) command to reset the console’s input buffer and return to a normal prompt.

The Gremlin Console is a REPL environment, which provides a nice way to learn Gremlin as you get immediate feedback for the code that you enter. This eliminates the more complex need to "create a project" to try things out. The console is not just for "getting started", however. You will find yourself using it for a variety of TinkerPop-related activities, such as loading data, administering graphs and working out complex traversals.

To get Gremlin to traverse a graph, you need a TraversalSource instance, which holds a reference to a Graph instance, which in turn holds the structure and data of the graph. TinkerPop is a graph abstraction layer over different graph databases and different graph processors, so there are many Graph instances you can choose from to instantiate a connection to in the console. The best Graph instance to start with, however, is TinkerGraph. TinkerGraph is a fast, in-memory graph database with a small handful of configuration options, making it a good choice for beginners.

Tip	TinkerGraph is not just a toy for beginners. It is useful in analyzing subgraphs taken from a large graph, working with a small static graph that doesn’t change much, writing unit tests and other use cases where the graph can fit in memory.

Tip

For purposes of "getting started", resist the temptation to dig into more complex databases that have lots of configuration options or to delve into how to get Gremlin Server working properly. Focusing on the basics, presented in this guide, builds a good foundation for all the other things TinkerPop offers.

To make your learning process even easier, start with one of TinkerPop’s "toy" graphs. These are "small" graphs designed to provide a quick start into querying. It is good to get familiar with them, as almost all TinkerPop documentation is based on them and when you need help and have to come to the mailing list, a failing example put in the context of the toy graphs can usually get you a fast answer to your problem.

Tip

When asking questions on the mailing list or StackOverflow about Gremlin, it’s always helpful to include a sample graph so that those attempting to answer your question understand exactly what kind of graph you have and can focus their energies on providing a good, tested answer rather than trying to build sample data themselves. The sample graph should just be a simple Gremlin script that can be copied and pasted into a Gremlin Console session.

For your first graph, use the "Modern" graph, which looks like this:

tinkerpop modern

It can be instantiated in the console this way:

console (groovy) groovy

gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]

graph = TinkerFactory.createModern()
g = traversal().withEmbedded(graph)

The first command creates a Graph instance named graph, which thus provides a reference to the data you want Gremlin to traverse. Unfortunately, just having graph doesn’t provide Gremlin enough context to do his job. You also need something called a TraversalSource, which is generated by the second command. The TraversalSource provides additional information to Gremlin (such as the traversal strategies to apply and the traversal engine to use) which provides him guidance on how to execute his trip around the Graph.

There are several ways to create a TraversalSource. The example above uses the embedded style and is an approach restricted to languages using the Java Virtual Machine (JVM). Other methods are similar in form, but are not the focus of this tutorial. See the Reference Documentation for more information on the different ways of connecting with Gremlin.

Note	The `traversal()` method is statically imported from the `AnonymousTraversalSource` class so that it can be used in a more fluent fashion. There are common imports for all languages that support Gremlin to make it easier to read and to write (Java, Python, .NET, Javascript).

With your TraversalSource g available it is now possible to ask Gremlin to traverse the Graph:

console (groovy) groovy

gremlin> g.V() //// (1)
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
gremlin> g.V(1) //// (2)
==>v[1]
gremlin> g.V(1).values('name') //// (3)
==>marko
gremlin> g.V(1).outE('knows') //// (4)
==>e[7][1-knows->2]
==>e[8][1-knows->4]
gremlin> g.V(1).outE('knows').inV().values('name') //// (5)
==>vadas
==>josh
gremlin> g.V(1).out('knows').values('name') //// (6)
==>vadas
==>josh
gremlin> g.V(1).out('knows').has('age', gt(30)).values('name') //// (7)
==>josh

g.V() //// (1)
g.V(1) //// (2)
g.V(1).values('name') //// (3)
g.V(1).outE('knows') //// (4)
g.V(1).outE('knows').inV().values('name') //// (5)
g.V(1).out('knows').values('name') //// (6)
g.V(1).out('knows').has('age', gt(30)).values('name')    //7

Get all the vertices in the Graph.
Get the vertex with the unique identifier of "1".
Get the value of the name property on the vertex with the unique identifier of "1".
Get the edges with the label "knows" for the vertex with the unique identifier of "1".
Get the names of the people whom the vertex with the unique identifier of "1" "knows".
Note that when one uses outE().inV() as shown in the previous command, this can be shortened to just out() (similar to inE().outV() and in() for incoming edges).
Get the names of the people vertex "1" knows who are over the age of 30.

Tip	The variable `g`, the `TraversalSource`, only needs to be instantiated once and should then be re-used.

Important

A Traversal is essentially an Iterator so if you have code like x = g.V(), the x does not contain the results of the g.V() query. Rather, that statement assigns an Iterator value to x. To get your results, you would then need to iterate through x. This understanding is important because in the context of the console typing g.V() instantly returns a value. The console does some magic for you by noticing that g.V() returns an Iterator and then automatically iterates the results. In short, when writing Gremlin outside of the console always remember that you must iterate your Traversal manually in some way for it to do anything. The concept of "iterating your traversal" is described further in The Gremlin Console Tutorial.

In this first five minutes with Gremlin, you’ve gotten the Gremlin Console installed, instantiated a Graph and TraversalSource, written some traversals and hopefully learned something about TinkerPop in general. You’ve only scratched the surface of what there is to know, but those accomplishments will help enable your understanding of the more detailed sections to come.

The Next Fifteen Minutes

In the first five minutes of The TinkerPop Workout — by Gremlin, you learned some basics for traversing graphs. Of course, there wasn’t much discussion about what a graph is. A graph is a collection of vertices (i.e., nodes, dots) and edges (i.e., relationships, lines), where a vertex is an entity which represents some domain object (e.g., a person or a place) and an edge represents the relationship between two vertices.

modern edge 1 to 3 1

The diagram above shows a graph with two vertices, one with a unique identifier of "1" and another with a unique identifier of "3". There is an edge connecting the two with a unique identifier of "9". It is important to consider that the edge has a direction, which goes out from vertex "1" and in to vertex "3".

Important

Most TinkerPop implementations do not allow for identifier assignment. They will rather assign their own identifiers and ignore assigned identifiers that you attempt to assign to them.

A graph with elements that just have identifiers does not make for much of a database. To give some meaning to this basic structure, vertices and edges can each be given labels to categorize them.

modern edge 1 to 3 2

You can now see that vertex "1" is a "person" and vertex "3" is a "software" vertex. They are joined by a "created" edge which allows you to see that a "person created software". The "label" and the "id" are reserved attributes of vertices and edges, but you can add your own arbitrary properties as well:

modern edge 1 to 3 3

This model is referred to as a property graph and it provides a flexible and intuitive way in which to model your data.

Creating a Graph

As intuitive as it is to you, it is perhaps more intuitive to Gremlin himself, as vertices, edges and properties make up the very elements of his existence. It is indeed helpful to think of our friend, Gremlin, moving about a graph when developing traversals, as picturing his position as the traverser helps orient where you need him to go next. Let’s use the two-vertex, one-edge graph we’ve been discussing above as an example. First, you need to create this graph:

console (groovy) groovy

gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> v1 = g.addV("person").property(id, 1).property("name", "marko").property("age", 29).next()
==>v[1]
gremlin> v2 = g.addV("software").property(id, 3).property("name", "lop").property("lang", "java").next()
==>v[3]
gremlin> g.addE("created").from(v1).to(v2).property(id, 9).property("weight", 0.4)
==>e[9][1-created->3]

graph = TinkerGraph.open()
g = traversal().withEmbedded(graph)
v1 = g.addV("person").property(id, 1).property("name", "marko").property("age", 29).next()
v2 = g.addV("software").property(id, 3).property("name", "lop").property("lang", "java").next()
g.addE("created").from(v1).to(v2).property(id, 9).property("weight", 0.4)

There are a number of important things to consider in the above code. First, recall that id is "reserved" for special usage in TinkerPop. It is a member of the enum, T. Those "keys" supplied to the creation method are statically imported to the console, which allows you to access them without having to specify their owning enum. Think of id as a shorthand form that enables a more fluid code style. You would normally refer to it as T.id, so without that static importing you would instead have to write:

console (groovy) groovy

gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> v1 = g.addV("person").property(T.id, 1).property("name", "marko").property("age", 29).next()
==>v[1]
gremlin> v2 = g.addV("software").property(T.id, 3).property("name", "lop").property("lang", "java").next()
==>v[3]
gremlin> g.addE("created").from(v1).to(v2).property(T.id, 9).property("weight", 0.4)
==>e[9][1-created->3]

graph = TinkerGraph.open()
g = traversal().withEmbedded(graph)
v1 = g.addV("person").property(T.id, 1).property("name", "marko").property("age", 29).next()
v2 = g.addV("software").property(T.id, 3).property("name", "lop").property("lang", "java").next()
g.addE("created").from(v1).to(v2).property(T.id, 9).property("weight", 0.4)

Note

On the JVM, the fully qualified name for T is org.apache.tinkerpop.gremlin.structure.T. Another important static import that is often seen in Gremlin comes from org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__, which allows for the creation of anonymous traversals. You can find the analogous variations of T and __ for other Gremlin languages by viewing the "Common Imports" sections for the programming language you are interested in in the Reference Documentation.

Second, don’t forget that you are working with TinkerGraph, which allows for identifier assignment. That is not the case with most graph databases.

Finally, the label for an Edge is required and is thus part of the method signature of addEdge(). This usage of addEdge is creating an edge that goes out of v1 and into v2 with a label of "created".

Graph Traversal - Staying Simple

Now that Gremlin knows where the graph data is, you can ask him to get you some data from it by doing a traversal, which you can think of as executing some process over the structure of the graph. We can form our question in English and then translate it to Gremlin. For this initial example, let’s ask Gremlin: "What software has Marko created?"

To answer this question, we would want Gremlin to:

Find "marko" in the graph
Walk along the "created" edges to "software" vertices
Select the "name" property of the "software" vertices

The English-based steps above largely translate to Gremlin’s position in the graph and to the steps we need to take to ask him to answer our question. By stringing these steps together, we form a Traversal or the sequence of programmatic steps Gremlin needs to perform in order to get you an answer.

Let’s start with finding "marko". This operation is a filtering step as it searches the full set of vertices to match those that have the "name" property value of "marko". This can be done with the has() step as follows:

console (groovy) groovy

gremlin> g.V().has('name','marko')
==>v[1]

g.V().has('name','marko')

Note	The variable `g` is the `TraversalSource`, which was introduced in the "The First Five Minutes". The `TraversalSource` is created with `traversal().withEmbedded(graph)` and is the object used to spawn new traversals.

This bit of Gremlin can be improved and made more idiomatically pleasing by including the vertex label as part of the filter to ensure that the "name" property key refers to a "person" vertex.

console (groovy) groovy

gremlin> g.V().has('person','name','marko')
==>v[1]

g.V().has('person','name','marko')

We can picture this traversal in our little graph with Gremlin sitting on vertex "1".

modern edge 1 to 3 1 gremlin

When Gremlin is on a vertex or an edge, he has access to all the properties that are available to that element.

Important

The above query iterates all the vertices in the graph to get its answer. That’s fine for our little example, but for multi-million- or billion-edge graphs that is a big problem. To solve this problem, you should look to use indices. TinkerPop does not provide an abstraction for index management. You should consult the documentation of the graph you have chosen and utilize its native API to create indices which will then speed up these types of lookups. Your traversals will remain unchanged, however, as the indices will be used transparently at execution time.

Now that Gremlin has found "marko", he can consider the next step in the traversal where we ask him to "walk" along "created" edges to "software" vertices. As described earlier, edges have direction, so we have to tell Gremlin what direction to follow. In this case, we want him to traverse on outgoing edges from the "marko" vertex. For this, we use the outE step.

console (groovy) groovy

gremlin> g.V().has('person','name','marko').outE('created')
==>e[9][1-created->3]

g.V().has('person','name','marko').outE('created')

At this point, you can picture Gremlin moving from the "marko" vertex to the "created" edge.

modern edge 1 to 3 2 gremlin

To get to the vertex on the other end of the edge, you need to tell Gremlin to move from the edge to the incoming vertex with inV().

console (groovy) groovy

gremlin> g.V().has('person','name','marko').outE('created').inV()
==>v[3]

g.V().has('person','name','marko').outE('created').inV()

You can now picture Gremlin on the "software" vertex as follows:

modern edge 1 to 3 3 gremlin

As you are not asking Gremlin to do anything with the properties of the "created" edge, you can simplify the statement above with:

console (groovy) groovy

gremlin> g.V().has('person','name','marko').out('created')
==>v[3]

g.V().has('person','name','marko').out('created')

modern edge 1 to 3 4 gremlin

Finally, now that Gremlin has reached the "software that Marko created", he has access to the properties of the "software" vertex and you can therefore ask Gremlin to extract the value of the "name" property as follows:

console (groovy) groovy

gremlin> g.V().has('person','name','marko').out('created').values('name')
==>lop

g.V().has('person','name','marko').out('created').values('name')

You should now be able to see the connection Gremlin has to the structure of the graph and how Gremlin maneuvers from vertices to edges and so on. Your ability to string together steps to ask Gremlin to do more complex things depends on your understanding of these basic concepts.

Graph Traversal - Increasing Complexity

Armed with the knowledge from the previous section, let’s ask Gremlin to perform some more difficult traversal tasks. There’s not much more that can be done with the "baby" graph we had, so let’s return to the "modern" toy graph from the "First Five Minutes" section. Recall that you can create this Graph and establish a TraversalSource with:

console (groovy) groovy

gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]

graph = TinkerFactory.createModern()
g = traversal().withEmbedded(graph)

Earlier we’d used the has()-step to tell Gremlin how to find the "marko" vertex. Let’s look at some other ways to use has(). What if we wanted Gremlin to find the "age" values of both "vadas" and "marko"? In this case we could use the within comparator with has() as follows:

console (groovy) groovy

gremlin> g.V().has('person','name',within('vadas','marko')).values('age')
==>29
==>27

g.V().has('person','name',within('vadas','marko')).values('age')

It is worth noting that within is statically imported from P to the Gremlin Console (much like T is, as described earlier).

Note	On the JVM, the fully qualified name for `P` is `org.apache.tinkerpop.gremlin.process.traversal.P`. You can find the analogous variation of `P` for other Gremlin languages by viewing the "Common Imports" sections for the programming language you are interested in in the Reference Documentation.

If we wanted to ask Gremlin the average age of "vadas" and "marko" we could use the mean() step as follows:

console (groovy) groovy

gremlin> g.V().has('person','name',within('vadas','marko')).values('age').mean()
==>28.0

g.V().has('person','name',within('vadas','marko')).values('age').mean()

Another method of filtering is seen in the use of the where step. We know how to find the "software" that "marko" created:

console (groovy) groovy

gremlin> g.V().has('person','name','marko').out('created')
==>v[3]

g.V().has('person','name','marko').out('created')

gremlin on software vertex Let’s extend on that query to try to learn who "marko" collaborates with when it comes to the software he created. In other words, let’s try to answer the question of: "Who are the people that marko develops software with?" To do that, we should first picture Gremlin where we left him in the previous query. He was standing on the "software" vertex. To find out who "created" that "software", we need to have Gremlin traverse back in along the "created" edges to find the "person" vertices tied to it.

Tip	The nature of Gremlin leads to long lines of code. Readability can be greatly improved by using line spacing and indentation. See the Style Guide for recommendations on what well-formatted Gremlin should look like.

console (groovy) groovy

gremlin> g.V().has('person','name','marko').
           out('created').in('created').
           values('name')
==>marko
==>josh
==>peter

g.V().has('person','name','marko').
  out('created').in('created').
  values('name')

So that’s nice, we can see that "peter", "josh" and "marko" are all responsible for creating "v[3]", which is the "software" vertex named "lop". Of course, we already know about the involvement of "marko" and it seems strange to say that "marko" collaborates with himself, so excluding "marko" from the results seems logical. The following traversal handles that exclusion:

console (groovy) groovy

gremlin> g.V().has('person','name','marko').as('exclude').
           out('created').in('created').
           where(neq('exclude')).
           values('name')
==>josh
==>peter

g.V().has('person','name','marko').as('exclude').
  out('created').in('created').
  where(neq('exclude')).
  values('name')

We made two additions to the traversal to make it exclude "marko" from the results. First, we added the as() step. The as()-step is not really a "step", but a "step modulator" - something that adds features to a step or the traversal. Here, the as('exclude') labels the has()-step with the name "exclude" and all values that pass through that step are held in that label for later use. In this case, the "marko" vertex is the only vertex to pass through that point, so it is held in "exclude".

The other addition that was made was the where()-step, which is a filter step like has(). The where() is positioned after the in()-step that has "person" vertices, which means that the where() filter is occurring on the list of "marko" collaborators. The where() specifies that the "person" vertices passing through it should not equal (i.e., neq()) the contents of the "exclude" label. As it just contains the "marko" vertex, the where() filters out the "marko" that we get when we traverse back in on the "created" edges.

You will find many uses of as(). Here it is in combination with select:

console (groovy) groovy

gremlin> g.V().as('a').out().as('b').out().as('c').
           select('a','b','c')
==>[a:v[1],b:v[4],c:v[5]]
==>[a:v[1],b:v[4],c:v[3]]

g.V().as('a').out().as('b').out().as('c').
  select('a','b','c')

In the above example, we tell Gremlin to iterate through all vertices and traverse out twice from each. Gremlin will label each vertex in that path with "a", "b" and "c", respectively. We can then use select to extract the contents of that label.

Another common but important step is the group() step and its related step modulator called by(). If we wanted to ask Gremlin to group all the vertices in the graph by their vertex label we could do:

console (groovy) groovy

gremlin> g.V().group().by(label)
==>[software:[v[3],v[5]],person:[v[1],v[2],v[4],v[6]]]

g.V().group().by(label)

The use of by() here provides the mechanism by which to do the grouping. In this case, we’ve asked Gremlin to use the label (which, again, is an automatic static import from T in the console). We can’t really tell much about our distribution though because we just have unique identifiers of vertices as output. To make that nicer we could ask Gremlin to get us the value of the "name" property from those vertices, by supplying another by() modulator to group() to transform the values.

console (groovy) groovy

gremlin> g.V().group().by(label).by('name')
==>[software:[lop,ripple],person:[marko,vadas,josh,peter]]

g.V().group().by(label).by('name')

In this section, you have learned a bit more about what property graphs are and how Gremlin interacts with them. You also learned how to envision Gremlin moving about a graph and how to use some of the more complex, but commonly utilized traversal steps. You are now ready to think about TinkerPop in terms of its wider applicability to graph computing.

The Final Ten Minutes

In these final ten minutes of The TinkerPop Workout — by Gremlin, we’ll look at TinkerPop from a higher level and introduce different features it provides to help orient you to some of the project’s technology ecosystem. In this way, you can identify areas of interest and dig into the details from there.

Why TinkerPop?

provider integration The goal of TinkerPop, as a Graph Computing Framework, is to make it easy for developers to create graph applications by providing APIs and tools that simplify their endeavors. One of the fundamental aspects to what TinkerPop offers in this area lies in the fact that TinkerPop is an abstraction layer over different graph databases and different graph processors. As an abstraction layer, TinkerPop provides a way to avoid vendor lock-in to a specific database or processor. This capability provides immense value to developers who are thus afforded options in their architecture and development because:

They can try different implementations using the same code to decide which is best for their environment.
They can grow into a particular implementation if they so desire; e.g., start with a graph that is designed to scale within a single machine and then later switch to a graph that is designed to scale horizontally.
They can feel more confident in graph technology choices, as advances in the state of different provider implementations are behind TinkerPop APIs, which open the possibility to switch providers with limited impact.

TinkerPop has always had the vision of being an abstraction over different graph databases. That much is not new and dates back to TinkerPop 1.x. It is in TinkerPop 3.x, however, that we see the introduction of the notion that TinkerPop is also an abstraction over different graph processors like Spark. The scope of this tutorial does not permit it to delve into "graph processors", but the short story is that the same Gremlin statement we wrote in the examples above can be executed to run in distributed fashion over Spark or Hadoop. The changes required to the code to do this are not in the traversal itself, but in the definition of the TraversalSource. You can again see why we encourage graph operations to be executed through that class as opposed to just using Graph. You can read more about these features in this section on hadoop-gremlin.

Tip	To maintain an abstraction over `Graph` creation, use `GraphFactory.open()` to construct new instances. See the documentation for individual `Graph` implementations to learn about the configuration options to provide.

Loading Data

gremlin to the 7 There are many strategies for getting data into your graph. As you are just getting started, let’s look at the simpler methods aimed at "smaller" graphs. A "small" graph, in this context, is one that has fewer than ten million edges. The most direct way to load this data is to write a Groovy script that can be executed in the Gremlin Console, a tool that you should be well familiar with at this point. For our example, let’s use the Wikipedia Vote Network data set, which contains 7,115 vertices and 103,689 edges.

$ curl -L -O http://snap.stanford.edu/data/wiki-Vote.txt.gz
$ gunzip wiki-Vote.txt.gz

The data is contained in a tab-delimited structure in which vertices are Wikipedia users and edges from one user to another imply a "vote" relationship. Here is the script to parse the file and generate the Graph instance using TinkerGraph:

graph = TinkerGraph.open()
graph.createIndex('userId', Vertex.class) //1

g = traversal().withEmbedded(graph)

getOrCreate = { id ->
  g.V().has('user','userId', id).
    fold().
    coalesce(unfold(),
             addV('user').property('userId', id)).next()  //2
}

new File('wiki-Vote.txt').eachLine {
  if (!it.startsWith("#")){
    (fromVertex, toVertex) = it.split('\t').collect(getOrCreate) //3
    g.addE('votesFor').from(fromVertex).to(toVertex).iterate()
  }
}

To ensure fast lookups of vertices, we need an index. The createIndex() method is a method native to TinkerGraph. Please consult your graph databases' documentation for their index creation approaches.
This "get or create" traversal gets a vertex if it already exists; otherwise, it creates it. It uses coalesce() in a clever way by first determining whether the list of vertices produced by the previous fold() has anything in it by testing the result of unfold(). If unfold() returns nothing then that vertex doesn’t exist and the subsequent addV() inner traversal can be called to create it.
We are iterating each line of the wiki-Vote.txt file and this line splits the line on the delimiter, then uses some neat Groovy syntax to apply the getOrCreate() function to each of the two userId fields encountered in the line and stores those vertices in the fromVertex and toVertex variables, respectively.

Note	While this is a tab-delimited structure, this same pattern can be applied to any data source you require and Groovy tends to have nice libraries that can help make working with data quite enjoyable.

Warning

Take care if using a Graph implementation that supports transactions. As TinkerGraph does not, there is no need to commit(). If your Graph does support transactions, intermediate commits during load will need to be applied.

To load larger data sets you should read about the CloneVertexProgram, which provides a generalized method for loading graphs of virtually any size and consider the native bulk loading features of the underlying graph database that you’ve chosen.

Gremlin in Other Programming Languages

This tutorial focused on Gremlin usage within the Gremlin Console which means that the examples were Groovy-based and oriented toward the JVM. Gremlin, however, is far from being a Java-only library. TinkerPop natively supports a number of different programming languages, making it possible to execute all of the examples presented in this tutorial with little modification. These different language implementations of Gremlin are referred to as Gremlin Language Variants and they help make Gremlin more accessible and easier to use for those who do not use Java as their primary programming language.

console (groovy) groovy csharp java javascript python go

gremlin> v1 = g.addV('person').property('name','marko').next()
==>v[0]
gremlin> v2 = g.addV('person').property('name','stephen').next()
==>v[2]
gremlin> g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate()

v1 = g.addV('person').property('name','marko').next()
v2 = g.addV('person').property('name','stephen').next()
g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate()

Vertex v1 = g.AddV("person").Property("name","marko").Next();
Vertex v2 = g.AddV("person").Property("name","stephen").Next();
g.V(v1).AddE("knows").To(v2).Property("weight",0.75).Iterate();

Vertex v1 = g.addV("person").property("name","marko").next();
Vertex v2 = g.addV("person").property("name","stephen").next();
g.V(v1).addE("knows").to(v2).property("weight",0.75).iterate();

const v1 = g.addV('person').property('name','marko').next();
const v2 = g.addV('person').property('name','stephen').next();
g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate();

v1 = g.addV('person').property('name','marko').next()
v2 = g.addV('person').property('name','stephen').next()
g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate()

v1, err := g.AddV("person").Property("name", "marko").Next()
v2, err := g.AddV("person").Property("name", "stephen").Next()
g.V(v1).AddE("knows").To(v2).Property("weight", 0.75).Iterate()

Conclusion

…and that is the end of The TinkerPop Workout — by Gremlin. You are hopefully feeling more confident in your TinkerPop skills and have a good overview of what the stack has to offer, as well as some entry points to further research within the reference documentation. Welcome to The TinkerPop!