TinkerPop Upgrade Information

This document helps users of TinkerPop to understand the changes that come with each software release. It outlines new features, how to resolve breaking changes and other information specific to a release. This document is useful to end-users who are building applications on TinkerPop, but it is equally useful to TinkerPop providers, who build libraries and other systems on the core APIs and protocols that TinkerPop exposes.

These providers include graph system providers, graph processor providers, graph language providers, driver providers and plugin providers.

TinkerPop 3.2.0

Nine Inch Gremlins

TinkerPop 3.2.0

Release Date: April 8, 2016

Please see the changelog for a complete list of all the modifications that are part of this release.

Upgrading for Users

Hadoop FileSystem Variable

The HadoopGremlinPlugin defines two variables: hdfs and fs. The first is a reference to the HDFS FileSystemStorage and the latter is a reference to the local FileSystemStorage. Prior to 3.2.x, fs was called local. However, there was a variable name conflict with Scope.local. As such, local is now fs. This issue existed prior to 3.2.x, but was not realized until this release. Finally, this only affects Gremlin Console users.

Hadoop Configurations

Note that gremlin.hadoop.graphInputFormat, gremlin.hadoop.graphOutputFormat, gremlin.spark.graphInputRDD, and gremlin.spark.graphOutputRDD have all been deprecated. Using them still works, but moving forward, users only need to leverage gremlin.hadoop.graphReader and gremlin.hadoop.graphWriter. An example properties file snippet is provided below.

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

TraversalSideEffects Update

There were changes to TraversalSideEffects both at the semantic level and at the API level. Users that have traversals of the form sideEffect{...} that leverage global side-effects should read the following carefully. If a user's traversals do not use lambda-based side-effect steps and instead only use standard steps such as groupCount("m"), then the changes below will not affect them. Moreover, if a user's traversal only uses sideEffect{...} with closure (non-TraversalSideEffects) data references, then the changes below will not affect them. If the user's traversal uses side-effects in OLTP only, the changes below will not affect them. Finally, providers should not be affected by these changes save for some test cases.

TraversalSideEffects Get API Change

TraversalSideEffects can now logically operate within a distributed OLAP environment. In order to make this possible, it is necessary that each side-effect be registered with a reducing BinaryOperator. This binary operator will combine distributed updates into a single global side-effect at the master traversal. Many of the methods in TraversalSideEffects have been deprecated, but they remain backwards compatible save that TraversalSideEffects.get() no longer returns an Optional, but instead throws an IllegalArgumentException for an unknown key. While the Optional semantics could have remained, it was deemed best to directly return the side-effect value to reduce object creation costs; because all side-effects must be registered a priori, there is never a reason why an unknown side-effect key would be used. In short:

// change
traversal.getSideEffects().get("m").get()
// to
traversal.getSideEffects().get("m")

TraversalSideEffects Registration Requirement

All TraversalSideEffects must be registered upfront. This is because, in OLAP, side-effects map to Memory compute keys and as such, must be declared prior to the execution of the TraversalVertexProgram. If a user’s traversal creates a side-effect mid-traversal, it will fail. The traversal must use GraphTraversalSource.withSideEffect() to declare the side-effects it will use during its execution lifetime. If the user’s traversals use standard side-effect Gremlin steps (e.g. group("m")), then no changes are required.
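
To illustrate, here is a minimal sketch in Java (the key name and traversal are arbitrary): the side-effect is declared on the traversal source before any step references it.

// declared up front with withSideEffect(): valid in both OLTP and OLAP
g.withSideEffect("m", 0L, Operator.sum)
 .V().sideEffect(t -> t.sideEffects("m", 1L))
 .iterate();

// NOT declared up front: creating "m" mid-traversal will now fail
// g.V().sideEffect(t -> t.sideEffects("m", 1L)).iterate();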

TraversalSideEffects Add Requirement

In a distributed environment, a side-effect cannot be mutated directly and be expected to exist in the mutated form at the final, aggregated, master traversal. For instance, if the side-effect "myCount" references a Long, the Long cannot be updated directly via sideEffects.set("myCount", sideEffects.get("myCount") + 1). Instead, it must rely on the registered reducer to do the merging and thus, the Step must do sideEffects.add("myCount", 1), where the registered reducer is Operator.sum. Thus, the example below will increment "a". If no operator is provided, then the operator is assumed to be Operator.assign and the final result of "a" would be 1. Note that Traverser.sideEffects(key,value) uses TraversalSideEffects.add().

gremlin> traversal = g.withSideEffect('a',0,sum).V().out().sideEffect{it.sideEffects('a',1)}
==>v[3]
==>v[2]
==>v[4]
==>v[5]
==>v[3]
==>v[3]
gremlin> traversal.getSideEffects().get('a')
==>6
gremlin> traversal = g.withSideEffect('a',0).V().out().sideEffect{it.sideEffects('a',1)}
==>v[3]
==>v[2]
==>v[4]
==>v[5]
==>v[3]
==>v[3]
gremlin> traversal.getSideEffects().get('a')
==>1

ProfileStep Update and GraphTraversal API Change

The profile()-step has been refactored into two steps: ProfileStep and ProfileSideEffectStep. Users who previously used profile() in conjunction with cap(TraversalMetrics.METRICS_KEY) can now simply omit the cap step. Users who retrieved TraversalMetrics from the side-effects after iteration can still do so, but will need to specify a side-effect key when using profile(), e.g. profile("myMetrics").
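
For example (a rough sketch; the side-effect key "myMetrics" is arbitrary):

// profile into a named side-effect, then read the TraversalMetrics after iteration
Traversal<Vertex, Vertex> t = g.V().out().profile("myMetrics");
t.iterate();
TraversalMetrics metrics = t.asAdmin().getSideEffects().get("myMetrics");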

BranchStep Bug Fix

There was a bug in BranchStep that also rears itself in subclass steps such as UnionStep and ChooseStep. For traversals with branches that have barriers (e.g. count(), max(), groupCount(), etc.), the traversal needs to be updated, as the bug fix yields a different output. For instance, if a traversal is of the form g.V().union(out().count(),both().count()), the result is now different. In order to yield the same result as before, the traversal should be rewritten as g.V().local(union(out().count(),both().count())). Note that if a branch does not have a barrier, then no changes are required. For instance, g.V().union(out(),both()) does not need to be updated. Moreover, if the user's traversal already used the local()-form, then no changes are required either.

MemoryComputeKey and VertexComputeKey

Users that have custom VertexProgram implementations will need to change their implementations to support the new VertexComputeKey and MemoryComputeKey classes. In the VertexPrograms provided by TinkerPop, these changes were trivial, taking less than 5 minutes to make all the requisite updates.

  • VertexProgram.getVertexComputeKeys() returns a Set<VertexComputeKey>. No longer a Set<String>. Use VertexComputeKey.of(String key,boolean transient) to generate a VertexComputeKey. Transient keys were not supported in the past, so to make the implementation semantically equivalent, the boolean transient should be false.

  • VertexProgram.getMemoryComputeKeys() returns a Set<MemoryComputeKey>. No longer a Set<String>. Use MemoryComputeKey.of(String key, BinaryOperator reducer, boolean broadcast, boolean transient) to generate a MemoryComputeKey. Broadcasting and transients were not supported in the past so to make the implementation semantically equivalent, the boolean broadcast should be true and the boolean transient should be false.

An example migration looks as follows. What might currently look like:

public Set<String> getMemoryComputeKeys() {
  return new HashSet<>(Arrays.asList("a", "b", "c"));
}

Should now look like:

public Set<MemoryComputeKey> getMemoryComputeKeys() {
  return new HashSet<>(Arrays.asList(
    MemoryComputeKey.of("a", Operator.and, true, false),
    MemoryComputeKey.of("b", Operator.sum, true, false),
    MemoryComputeKey.of("c", Operator.or, true, false)));
}

A similar pattern should also be used for VertexProgram.getVertexComputeKeys().
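
For example, a migrated getVertexComputeKeys() might look like the following (the key names are illustrative):

public Set<VertexComputeKey> getVertexComputeKeys() {
  return new HashSet<>(Arrays.asList(
    VertexComputeKey.of("a", false),
    VertexComputeKey.of("b", false)));
}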

SparkGraphComputer and GiraphGraphComputer Persistence

The MapReduce-based steps in TraversalVertexProgram have been removed and replaced using a new Memory-reduction model. MapReduce jobs always created a persistence footprint, e.g. in HDFS, whereas Memory data was never persisted to HDFS. As such, there will be no data on disk that is accessible. For instance, there is no more ~reducing, ~traversers, or specially named side-effects such as m from a groupCount('m'). The data is still accessible via ComputerResult.memory(); it simply does not have a corresponding on-disk representation.

RemoteGraph

RemoteGraph is a lightweight Graph implementation that acts as a proxy for sending traversals to Gremlin Server for remote execution. It is an interesting alternative to the other methods for connecting to Gremlin Server in that all other methods involve construction of a String representation of the Traversal, which is then submitted as a script to Gremlin Server (via driver or REST).

gremlin> graph = RemoteGraph.open('conf/remote-graph.properties')
==>remotegraph[DriverServerConnection-localhost/127.0.0.1:8182 [graph='graph]]
gremlin> g = graph.traversal()
==>graphtraversalsource[remotegraph[DriverServerConnection-localhost/127.0.0.1:8182 [graph='graph]], standard]
gremlin> g.V().valueMap(true)
==>[name:[marko], label:person, id:1, age:[29]]
==>[name:[vadas], label:person, id:2, age:[27]]
==>[name:[lop], label:software, id:3, lang:[java]]
==>[name:[josh], label:person, id:4, age:[32]]
==>[name:[ripple], label:software, id:5, lang:[java]]
==>[name:[peter], label:person, id:6, age:[35]]

Note that g.V().valueMap(true) is executing in Gremlin Server and not locally in the console.
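
A sketch of the referenced conf/remote-graph.properties file is shown below; the property names reflect the example file shipped with the 3.2.0 distribution, so verify them against the packaged file before relying on them.

gremlin.graph=org.apache.tinkerpop.gremlin.process.remote.RemoteGraph
gremlin.remote.remoteConnectionClass=org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
gremlin.remote.driver.clusterFile=conf/remote-objects.yaml
gremlin.remote.driver.sourceName=g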

Upgrading for Providers

Graph System Providers

GraphStep Compilation Requirement

OLTP graph providers that have a custom GraphStep implementation should ensure that g.V().hasId(x) and g.V(x) compile to the same representation. This ensures a consistent user experience around random access of elements based on ids (as opposed to the former potentially doing a linear scan). A static helper method called GraphStep.processHasContainerIds() has been added. TinkerGraphStepStrategy was updated as such:

((HasContainerHolder) currentStep).getHasContainers().forEach(tinkerGraphStep::addHasContainer);

is now

((HasContainerHolder) currentStep).getHasContainers().forEach(hasContainer -> {
  if (!GraphStep.processHasContainerIds(tinkerGraphStep, hasContainer))
    tinkerGraphStep.addHasContainer(hasContainer);
});

Step API Update

The Step interface is fundamental to Gremlin. Step.processNextStart() and Step.next() both returned Traverser<E>. There were so many Traverser.asAdmin() and direct typecast calls throughout (especially in TraversalVertexProgram) that it was deemed prudent to have Step.processNextStart() and Step.next() return Traverser.Admin<E>. Moreover, this makes sense as this is internal logic where Admins are always needed. Providers with their own step definitions will simply need to change the method signatures of Step.processNextStart() and Step.next(). No logic update is required, save that asAdmin() calls can be safely removed if used. Also, Step.addStart() and Step.addStarts() now take Traverser.Admin<S> and Iterator<Traverser.Admin<S>>, respectively.
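
As a rough sketch, a provider's custom step typically only needs its override signature updated (the class and its logic are hypothetical):

public final class MyCustomStep<S> extends AbstractStep<S, S> {

  public MyCustomStep(final Traversal.Admin traversal) {
    super(traversal);
  }

  @Override
  protected Traverser.Admin<S> processNextStart() { // previously returned Traverser<S>
    return this.starts.next(); // the start iterator already yields Traverser.Admin instances
  }
}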

Traversal API Update

The way in which TraverserRequirements are calculated has been changed (for the better). The ramification is that post-compilation requirement additions no longer make sense and should not be allowed. To enforce this, the Traversal.addTraverserRequirement() method has been removed from the interface. Moreover, providers/users should never be able to add requirements manually (these should all be inferred from the final compilation). However, if need be, there is always RequirementsStrategy, which will allow the provider to add a requirement at strategy application time (though again, there should not be a reason to do so).

ComparatorHolder API Change

Providers that either have their own ComparatorHolder implementation or reason on OrderXXXStep will need to update their code. ComparatorHolder now returns List<Pair<Traversal,Comparator>>. This has greatly reduced the complexity of comparison-based steps like OrderXXXStep. However, it's a breaking API change, albeit a trivial one to adopt; just some awareness is required.
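
As a rough sketch of the new shape (a fragment of a step implementing ComparatorHolder<S, C>, assuming the javatuples Pair used elsewhere in TinkerPop; the field name is illustrative):

private final List<Pair<Traversal.Admin<S, C>, Comparator<C>>> comparators = new ArrayList<>();

public void addComparator(final Traversal.Admin<S, C> traversal, final Comparator<C> comparator) {
  this.comparators.add(Pair.with(traversal, comparator));
}

public List<Pair<Traversal.Admin<S, C>, Comparator<C>>> getComparators() {
  return this.comparators;
}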

GraphComputer Semantics and API

Providers that have a custom GraphComputer implementation will have a lot to handle. Note that if the graph system simply uses the SparkGraphComputer or GiraphGraphComputer provided by TinkerPop, then no updates are required. This only affects providers that have their own custom GraphComputer implementations.

Memory updates:

  • Any BinaryOperator can be used for reduction and is made explicit in the MemoryComputeKey.

  • MemoryComputeKeys can be marked transient and must be removed from the resultant ComputerResult.memory().

  • MemoryComputeKeys can be specified to not broadcast and thus, must not be available to workers to read in VertexProgram.execute().

  • The Memory API has been changed. No more incr(), and(), etc. Now it is just set() (used in setup()/terminate()) and add() (used in execute()); a short sketch follows at the end of this section.

VertexProgram updates:

  • VertexComputeKeys can be marked transient and must be removed from the resultant ComputerResult.graph().

Operational semantic test cases have been added to GraphComputerTest to ensure that all the above are implemented correctly.
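
As referenced in the Memory notes above, the fragment below sketches the general shape inside a custom VertexProgram; the key name and reducing operator are illustrative.

@Override
public void setup(final Memory memory) {
  memory.set("counter", 0L); // set() is only legal in setup()/terminate()
}

@Override
public void execute(final Vertex vertex, final Messenger<Long> messenger, final Memory memory) {
  memory.add("counter", 1L); // add() defers to the key's registered reducer (here Operator.sum)
}

@Override
public Set<MemoryComputeKey> getMemoryComputeKeys() {
  // broadcast so workers can read the value in execute(); non-transient so it survives into ComputerResult.memory()
  return Collections.singleton(MemoryComputeKey.of("counter", Operator.sum, true, false));
}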

Barrier Step Updates

The Barrier interface used to be simply a marker interface. Now it has methods and is the primary means by which the data of barrier steps is aggregated and distributed across an OLAP job. It is unlikely that Barrier was ever used directly by a provider's custom step. Instead, a provider most likely extended SupplyingBarrierStep, CollectingBarrierStep, and/or ReducingBarrierStep.

Providers that have custom extensions to these steps or that use Barrier directly will need to adjust their implementation slightly to accommodate a new API that reflects the Memory updates above. This should be a simple change. Note that FinalGet no longer exists and such post-reduction processing is handled by the reducing step (via the new Generating interface).

Performance Tests

The ProcessPerformanceSuite and TraversalPerformanceTest have been deprecated. They are still available, but going forward, providers should implement their own performance tests and not rely on the built-in JUnit benchmark-based performance test suite.

Graph Processor Providers

GraphFilter and GraphComputer

The GraphComputer API has changed with the addition of GraphComputer.vertices(Traversal) and GraphComputer.edges(Traversal). These methods construct a GraphFilter object which is also new to TinkerPop 3.2.0. GraphFilter is a "push-down predicate" used to selectively retrieve subgraphs of the underlying graph to be OLAP processed.
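
For example, a filtered OLAP job might be submitted as follows (the traversals and vertex program are illustrative; the labels come from the toy graph):

ComputerResult result = graph.compute(SparkGraphComputer.class)
    .vertices(__.hasLabel("person"))      // only load "person" vertices
    .edges(__.bothE("knows"))             // and only their "knows" edges
    .program(PageRankVertexProgram.build().create(graph))
    .submit().get();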

  • If the graph system provider relies on an existing GraphComputer implementation such as SparkGraphComputer and/or GiraphGraphComputer, then there is no immediate action required on their part to remain TinkerPop-compliant. However, they may wish to update their InputFormat or InputRDD implementation to be GraphFilterAware and handle the GraphFilter filtering at the disk/database level. It is advisable to do so in order to reduce OLAP load times and memory/GC usage.

  • If the graph system provider has their own GraphComputer implementation, then they should implement the two new methods and ensure that GraphFilter is processed correctly. There is a new test case called GraphComputerTest.shouldSupportGraphFilter() which ensures the semantics of GraphFilter are handled correctly. For a "quick and easy" way to move forward, look to GraphFilterInputFormat as a way of wrapping an existing InputFormat to do filtering prior to VertexProgram or MapReduce execution.

Note
To quickly move forward, the GraphComputer implementation can simply have GraphComputer.Features.supportsGraphFilter() return false and ensure that GraphComputer.vertices() and GraphComputer.edges() throw GraphComputer.Exceptions.graphFilterNotSupported(). This is not recommended, as it is best to support GraphFilter.

Job Chaining and GraphComputer

TinkerPop 3.2.0 has integrated VertexPrograms into GraphTraversal. This means that a single traversal can compile to multiple GraphComputer OLAP jobs. This requires that ComputerResults be chainable. There were never any explicit tests to verify whether a provider's GraphComputer could be chained, but now there are. Given a reasonable implementation, it is likely that no changes are required of the provider. However, to ensure the implementation is "reasonable", GraphComputerTests have been added.

  • For providers that support their own GraphComputer implementation, note that there is a new GraphComputerTest.shouldSupportJobChaining(). This test verifies that the ComputerResult output of one job can be fed into the input of a subsequent job. Only linear chains are tested/required currently. In the future, branching DAGs may be required.

  • For providers that support their own GraphComputer implementation, note that there is a new GraphComputerTest.shouldSupportPreExistingComputeKeys(). When chaining OLAP jobs together, if an OLAP job requires the compute keys of a previous OLAP job, then the existing compute keys must be accessible. A simple 2 line change to SparkGraphComputer and TinkerGraphComputer solved this for TinkerPop. GiraphGraphComputer did not need an update as this feature was already naturally supported.

Graph Language Providers

ScriptTraversal

For providers that have custom Gremlin language implementations (e.g. Gremlin-Scala), there is a new class called ScriptTraversal which will handle script-based processing of traversals. The entire GroovyXXXTest-suite was updated to use this new class. The previous TraversalScriptHelper class has been deprecated, so immediate upgrading is not required, but do look into ScriptTraversal as TinkerPop will be using it as a way to serialize "String-based traversals" over the network moving forward.

ByModulating and Custom Steps

If the provider has custom steps that leverage by()-modulation, those will now need to implement ByModulating. Most of the methods in ByModulating are default and, for most situations, only ByModulating.modulateBy(Traversal) needs to be implemented. Note that this method's body will most likely be identical to the custom step's already existing TraversalParent.addLocalChild(). It is recommended that the custom step not use TraversalParent.addLocalChild(), as this method may be deprecated in a future release. Instead, barring any complex usages, simply rename CustomStep.addLocalChild(Traversal) to CustomStep.modulateBy(Traversal).
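
A minimal sketch of the rename on a hypothetical custom step:

public final class CustomStep<S, E> extends MapStep<S, E> implements TraversalParent, ByModulating {

  private Traversal.Admin<S, E> byTraversal;

  public CustomStep(final Traversal.Admin traversal) {
    super(traversal);
  }

  @Override
  public void modulateBy(final Traversal.Admin<?, ?> byTraversal) {
    // this body is typically what the old CustomStep.addLocalChild(Traversal) did
    this.byTraversal = this.integrateChild(byTraversal);
  }

  @Override
  protected E map(final Traverser.Admin<S> traverser) {
    return TraversalUtil.apply(traverser, this.byTraversal);
  }
}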

TraversalEngine Deprecation and GraphProvider

The TraversalSource infrastructure has been completely rewritten. Fortunately for users, their code is backwards compatible. Unfortunately for graph system providers, a few tweaks to their implementation are in order.

  • If the graph system supports more than Graph.compute(), then implement GraphProvider.getGraphComputer().

  • For custom TraversalStrategy implementations, change traversal.getEngine().isGraphComputer() to TraversalHelper.onGraphComputer(Traversal); a sketch follows after this list.

  • For custom Steps, change implements EngineDependent to implements GraphComputing.
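
The sketch below illustrates the TraversalStrategy change referenced above (the strategy itself is hypothetical):

public final class MyProviderStrategy extends AbstractTraversalStrategy<TraversalStrategy.ProviderOptimizationStrategy>
    implements TraversalStrategy.ProviderOptimizationStrategy {

  @Override
  public void apply(final Traversal.Admin<?, ?> traversal) {
    // previously: if (traversal.getEngine().isGraphComputer()) return;
    if (TraversalHelper.onGraphComputer(traversal))
      return;
    // provider-specific OLTP optimizations would go here
  }
}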

TinkerPop 3.1.0

A 187 On The Undercover Gremlinz

TinkerPop 3.1.2

Release Date: April 8, 2016

Please see the changelog for a complete list of all the modifications that are part of this release.

Upgrading for Users

Aliasing Sessions

Calls to SessionedClient.alias() used to throw UnsupportedOperationException and it was therefore not possible to use that capability with a session. That method is now properly implemented and aliasing is allowed.

Remote Console

The :remote console command provides a way to avoid having to prefix scripts with the :> command when working with a remote. This mode of console usage can be convenient when working exclusively with a remote like Gremlin Server and there is only a desire to view the returned data and not to actually work with it locally in any way.

Console Remote Sessions

The :remote tinkerpop.server command now allows for a "session" argument to be passed to connect. This argument tells the remote to configure itself with a Gremlin Server session. In this way, the console can act as a window to script execution on the server and behave more like a standard "local" console when it comes to script execution.

TinkerPop Archetypes

TinkerPop now offers Maven archetypes, which provide example project templates to quickly get started with TinkerPop. The available archetypes are as follows:

  • gremlin-archetype-server - An example project that demonstrates the basic structure of a Gremlin Server project, how to connect with the Gremlin Driver, and how to embed Gremlin Server in a testing framework.

  • gremlin-archetype-tinkergraph - A basic example of how to structure a TinkerPop project with Maven.

Session Transaction Management

When connecting to a session with gremlin-driver, it is now possible to configure the Client instance so as to request that the server manage the transaction for each request.

Cluster cluster = Cluster.open();
Client client = cluster.connect("sessionName", true);

Passing true to the connect() method signifies that each request made by the client should be encapsulated in a transaction. With this configuration of the client, there is no need to close a transaction manually.

Session Timeout Setting

The gremlin-driver now has a setting called maxWaitForSessionClose that allows control of how long it will wait for an in-session connection to respond to a close request before it simply times out and moves on. When that happens, the server will eventually close the connection itself, either at session expiration or at shutdown.
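
Assuming the standard driver Cluster builder, the setting would be applied roughly as follows (the builder method name mirrors the setting name; treat it as an assumption to verify against the driver API):

// wait up to 5 seconds for the server to acknowledge a session close
Cluster cluster = Cluster.build("localhost")
    .maxWaitForSessionClose(5000)
    .create();
Client client = cluster.connect("my-session");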

Upgrading for Providers

Important
It is recommended that providers also review all the upgrade instructions specified for users. Many of the changes there may prove important for the provider’s implementation.

All Providers

Provider Documentation

Documentation related to the lower-level APIs used by a provider, that was formerly in the reference documentation, has been moved to its own documentation set that is now referred to as the Provider Documentation.

Graph System Providers

GraphProvider.clear() Semantics

The semantics of the various clear() methods on GraphProvider didn't really change, but it would be worth reviewing their implementations to ensure that they can be called successfully in an idempotent fashion. Multiple calls to clear() may occur for a single test on the same Graph instance, as 3.1.1-incubating introduced an automated method for clearing graphs at the end of a test and some tests call clear() manually.

Driver Providers

Session Transaction Management

Up until now, transaction management has been a feature of sessionless requests only, but the new manageTransaction request argument for the Session OpProcessor changes that. Session-based requests can now pass this boolean value on each request to signal to Gremlin Server that it should attempt to commit (or rollback) the transaction at the end of the request. By default, this value is false, so there is no change to the protocol for this feature.

scriptEvaluationTimeout Override

The Gremlin Server protocol now allows the passing of scriptEvaluationTimeout as an argument to the SessionOpProcessor and the StandardOpProcessor. This value will override the setting of the same name provided in the Gremlin Server configuration file on a per request basis.

Plugin Providers

RemoteAcceptor allowRemoteConsole

The RemoteAcceptor now has a new method called allowRemoteConsole(). It has a default implementation that returns false and should thus be a non-breaking change for current implementations. This value should only be set to true if the implementation expects the user to always use :> to interact with it. For example, the tinkerpop.server plugin expects all user interaction through :>, where the line is sent to Gremlin Server. In that case, that RemoteAcceptor implementation can return true. On the other hand, the tinkerpop.gephi plugin expects that the user sometimes calls :> and sometimes works with local evaluation as well. It interacts with the local variable bindings in the console itself. For tinkerpop.gephi, this method returns false.

TinkerPop 3.1.1

Release Date: February 8, 2016

Please see the changelog for a complete list of all the modifications that are part of this release.

Upgrading for Users

Storage I/O

The gremlin-core io-package now has a Storage interface. The methods that were available via hdfs (e.g. rm(), ls(), head(), etc.) are now part of Storage. Both HDFS and Spark implement Storage via FileSystemStorage and SparkContextStorage, respectively. SparkContextStorage adds support for interacting with persisted RDDs in the Spark cache.

This update changed a few of the file handling methods. As it stands, these changes only affect manual Gremlin Console usage, as HDFS support was previously provided via Groovy meta-programming. Thus, these are not "code-based" breaking changes.

  • hdfs.rmr() no longer exists. hdfs.rm() is now recursive. Simply change all references to rmr() to rm() for identical behavior.

  • hdfs.head(location,lines,writableClass) no longer exists.

    • For graph locations, use hdfs.head(location,writableClass,lines).

    • For memory locations, use hdfs.head(location,memoryKey,writableClass,lines).

  • hdfs.head(...,ObjectWritable) no longer exists. Use SequenceFileInputFormat as the parsing input format.

Given that HDFS (and now Spark) interactions are possible via Storage and no longer via Groovy meta-programming, developers can use these Storage implementations in their Java code. In fact, Storage has greatly simplified complex file/RDD operations in both GiraphGraphComputer and SparkGraphComputer.

Finally, note that the following low-level/internal classes have been removed: HadoopLoader and HDFSTools.

Gremlin Server Transaction Management

Gremlin Server now has a setting called strictTransactionManagement, which forces the user to pass aliases for all requests. The aliases are then used to determine which graphs will have their transactions closed for that request. The alternative is to continue with default operation, where the transactions of all configured graphs will be closed. It is likely that strictTransactionManagement (which is false by default so as to be backward compatible with previous versions) will become the future standard mode of operation for Gremlin Server, as it provides a more efficient method for transaction management.
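
In the Gremlin Server YAML configuration this is a single top-level flag; with it enabled, every request must supply aliases for the graphs it touches. A minimal fragment (the graph properties file shown is the stock one):

strictTransactionManagement: true
graphs: {
  graph: conf/tinkergraph-empty.properties}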

Deprecated credentialsDbLocation

The credentialsDbLocation setting was a TinkerGraph only configuration option to the SimpleAuthenticator for Gremlin Server. It provided the file system location to a "credentials graph" that TinkerGraph would read from a Gryo file at that spot. This setting was only required because TinkerGraph did not support file persistence at the time that SimpleAuthenticator was created.

As of 3.1.0-incubating, TinkerGraph received a limited persistence feature that allowed the "credentials graph" location to be specified in the TinkerGraph properties file via gremlin.tinkergraph.graphLocation and as such the need for credentialsDbLocation was eliminated.

This deprecation is not a breaking change, however users are encouraged to convert their configurations to use gremlin.tinkergraph.graphLocation as soon as possible, as the deprecated setting will be removed in a future release.

TinkerGraph Supports Any I/O

TinkerGraph's gremlin.tinkergraph.graphFormat configuration setting can now take a fully qualified class name of an Io.Builder implementation, which means that custom IO implementations can be used to read and write TinkerGraph instances.
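
For example, a TinkerGraph properties file might reference a custom Io.Builder implementation (the com.example class is hypothetical):

gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
gremlin.tinkergraph.graphLocation=/data/my-graph.dat
gremlin.tinkergraph.graphFormat=com.example.io.MyCustomIoBuilder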

Authenticator Method Deprecation

For users who have a custom Authenticator implementation for Gremlin Server, there will be a new method present:

public default SaslNegotiator newSaslNegotiator(final InetAddress remoteAddress)

Implementation of this method is now preferred over the old method with the same name that has no arguments. The old method has been deprecated. This is a non-breaking change as the new method has a default implementation that simply calls the old deprecated method. In this way, existing Authenticator implementations will still work.

Spark Persistence Updates

Spark RDD persistence is now much safer with a "job server" system that ensures that persisted RDDs are not garbage collected by Spark. With this, the user is provided a spark object that enables them to manage persisted RDDs much like the hdfs object is used for managing files in HDFS.

Finally, InputRDD instances no longer need a reduceByKey() postfix as view merges happen prior to writing the graphRDD. Note that a reduceByKey() postfix will not cause problems if it is kept; it is simply inefficient and no longer required.

Logging

Logging to Gremlin Server and Gremlin Console can now be consistently controlled by the log4j-server.properties and log4j-console.properties which are in the respective conf/ directories of the packaged distributions.

Gremlin Server Sandboxing

A number of improvements were made to the sandboxing feature of Gremlin Server (more specifically the GremlinGroovyScriptEngine). A new base class for sandboxing was introduced with AbstractSandboxExtension, which makes it a bit easier to build whitelist-style sandboxes. A usable implementation of this was also supplied with the FileSandboxExtension, which takes a configuration file containing a whitelist of accessible methods and variables that can be used in scripts. Note that the original SandboxExtension has been deprecated in favor of AbstractSandboxExtension or extending directly from Groovy's TypeCheckingDSL.

Deprecated supportsAddProperty()

It was realized that VertexPropertyFeatures.supportsAddProperty() was effectively a duplicate of VertexFeatures.supportsMetaProperties(). As a result, supportsAddProperty() was deprecated in favor of the other. If using supportsAddProperty(), simply modify that code to instead utilize supportsMetaProperties().

Upgrading for Providers

Important
It is recommended that providers also review all the upgrade instructions specified for users. Many of the changes there may prove important for the provider’s implementation.

Graph System Providers

Data Types in Tests

There were a number of fixes related to usage of appropriate types in the test suite. There were cases where tests were mixing types, such that a single property key might be assigned values of two different types. This mixed typing caused problems for some graphs and wasn't really something TinkerPop was looking to explicitly enforce as a rule of implementing the interfaces.

While the changes should not have been breaking, providers should be aware that improved consistencies in the tests may present opportunities for test failures.

Graph Database Providers

Custom ClassResolver

For providers who have built custom serializers in Gryo, there is a new feature that can be considered. A GryoMapper can now take a custom Kryo ClassResolver, which means that custom types can be coerced to other types during serialization (e.g. a custom identifier could be serialized as a HashMap). The advantage to taking this approach is that users will not need to have the provider's serializers on the client side. They will only need to exist on the server (presuming that the type is coerced to a type available on the client, of course). The downside is that serialization is then no longer a two-way street. For example, a custom ClassResolver that coerced a custom identifier to HashMap would let the client work with the identifier as a HashMap, but the client would then have to send that identifier back to the server as a HashMap, where it would be recognized as a HashMap (not an identifier).

Feature Consistency

There were a number of corrections made around the consistency of Features and how they were applied in tests. Corrections fell into two groups of changes:

  1. Bugs in how Features were applied to certain tests.

  2. Refactoring around the realization that VertexFeatures.supportsMetaProperties() is really just a duplicate of features already exposed as VertexPropertyFeatures.supportsAddProperty(). VertexPropertyFeatures.supportsAddProperty() has been deprecated.

These changes related to "Feature Consistency" open up a number of previously non-executing tests for graphs that did not support meta-properties, so providers should be wary of potential test failures on previously non-executing tests.

Graph Processor Providers

InputRDD and OutputRDD Updates

There are two new methods on the Spark-Gremlin RDD interfaces.

  • InputRDD.readMemoryRDD(): get a ComputerResult.memory() from an RDD.

  • OutputRDD.writeMemoryRDD(): write a ComputerResult.memory() to an RDD.

Note that both these methods have default implementations which simply work with empty RDDs. Most providers will never need to implement these methods as they are specific to file/RDD management for GraphComputer. The four classes that implement these methods are PersistedOutputRDD, PersistedInputRDD, InputFormatRDD, and OutputFormatRDD. For the interested provider, study the implementations therein to see the purpose of these two new methods.

TinkerPop 3.1.0

Release Date: November 16, 2015

Please see the changelog for a complete list of all the modifications that are part of this release.

Upgrading for Users

Shading Jackson

The Jackson library is now shaded to gremlin-shaded, which will allow Jackson to version independently without breaking compatibility with dependent libraries or with those who depend on TinkerPop. The downside is that if a library depends on TinkerPop and uses the Jackson classes, those classes will no longer exist with the standard Jackson package naming. They will have to be shifted as follows:

  • org.objenesis becomes org.apache.tinkerpop.shaded.objenesis

  • com.esotericsoftware.minlog becomes org.apache.tinkerpop.shaded.minlog

  • com.fasterxml.jackson becomes org.apache.tinkerpop.shaded.jackson

PartitionStrategy and VertexProperty

PartitionStrategy now supports partitioning within VertexProperty. The Graph needs to be able to support meta-properties for this feature to work.

Gremlin Server and Epoll

Gremlin Server provides a configuration option to turn on support for Netty native transport on Linux, which has been shown to help improve performance.

Rebindings Deprecated

The notion of "rebindings" has been deprecated in favor of the term "aliases". Alias is a better and more intuitive term than rebindings which should make it easier for newcomers to understand what they are for.

Configurable Driver Channelizer

The Gremlin Driver now allows the Channelizer to be supplied as a configuration, which means that custom implementations may be supplied.

GraphMLReader and Strict Option

The GraphMLReader now has a strict option on the Builder so that if a data type for a value is invalid in some way, GraphMLReader will simply skip that problem value. In that way, it is a bit more forgiving than before especially with empty data.

Transaction.close() Default Behavior

The default behavior of Transaction.close() is to rollback the transaction. This is in contrast to previous versions, where the default behavior was commit. Using rollback as the default should be thought of as a safer approach to closing, where a user must now explicitly call commit() to persist their mutations.
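
In practice, this simply means that mutations must be committed explicitly on a transactional graph, for example:

graph.addVertex("person");
graph.tx().commit(); // without this, graph.tx().close() now discards the new vertex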

See TINKERPOP-805 for more information.

ThreadLocal Transaction Settings

The Transaction.onReadWrite() and Transaction.onClose() settings now need to be set for each thread (if a behavior other than the default is desired). For gremlin-server users that may be changing these settings via scripts: if the settings are changed for a sessionless request, they will now only apply to that one request; if the settings are changed for an in-session request, they will now apply to all future requests made in the scope of that session.

Hadoop-Gremlin

  • Hadoop1 is no longer supported. Hadoop2 is now the only supported Hadoop version in TinkerPop.

  • Spark and Giraph have been split out of Hadoop-Gremlin into their own respective packages (Spark-Gremlin and Giraph-Gremlin).

  • The directory where application jars are stored in HDFS is now hadoop-gremlin-3.2.0-incubating-libs.

    • This versioning is important so that cross-version TinkerPop use does not cause jar conflicts.

See https://issues.apache.org/jira/browse/TINKERPOP-616

Spark-Gremlin

  • Providers that wish to reuse a graphRDD can leverage the new PersistedInputRDD and PersistedOutputRDD.

    • This allows the graphRDD to avoid serialization into HDFS for reuse. Be sure to enable a persisted SparkContext (see the documentation).

See https://issues.apache.org/jira/browse/TINKERPOP-868, https://issues.apache.org/jira/browse/TINKERPOP-925

TinkerGraph Serialization

TinkerGraph is serializable over Gryo, which means that it can be shipped over the wire from Gremlin Server. This feature can be useful when working with remote subgraphs.

Deprecation in TinkerGraph

The public static String configurations have been renamed. The old public static variables have been deprecated. If the deprecated variables were being used, then convert to the replacements as soon as possible.

Deprecation in Gremlin-Groovy

The closure wrappers classes GFunction, GSupplier, GConsumer have been deprecated. In Groovy, a closure can be specified using as Function and thus, these wrappers are not needed. Also, the GremlinExecutor.promoteBindings() method which was previously deprecated has been removed.

Gephi Traversal Visualization

The process for visualizing a traversal has been simplified. There is no longer a need to "name" steps that will represent visualization points for Gephi. It is possible to just "configure" a visualTraversal in the console:

gremlin> :remote config visualTraversal graph vg

which creates a special TraversalSource from graph called vg. The traversals created from vg can be used to :submit to Gephi.

Alterations to GraphTraversal

There were a number of changes to GraphTraversal. Many of the changes came by way of deprecation, but some semantics have changed as well:

  • ConjunctionStrategy has been renamed to ConnectiveStrategy (no other behaviors changed).

  • ConjunctionP has been renamed to ConnectiveP (no other behaviors changed).

  • DedupBijectionStrategy has been renamed (and made more effective) as FilterRankingStrategy.

  • The GraphTraversal mutation API has changed significantly, with all previous methods being supported but deprecated.

    • The general pattern used now is addE('knows').from(select('a')).to(select('b')).property('weight',1.0).

  • The GraphTraversal sack API has changed with all previous methods being supported but deprecated.

    • The old sack(mult,'weight') is now sack(mult).by('weight').

  • GroupStep has been redesigned such that there is now only a key- and value-traversal. No more reduce-traversal.

    • The previous group()-methods have been renamed to groupV3d0(). To immediately upgrade, rename all your group()-calls to groupV3d0().

    • To migrate to the new group()-methods, what was group().by('age').by(outE()).by(sum(local)) is now group().by('age').by(outE().sum()).

  • There was a bug in fold(), where if a bulked traverser was provided, the traverser was only represented once.

    • This bug fix might cause a breaking change to a user query if the non-bulk behavior was being counted on. If so, use dedup() prior to fold().

  • Both GraphTraversal.mapKeys() and GraphTraversal.mapValues() have been deprecated.

    • Use select(keys) and select(values). However, note that select() will not unroll the keys/values. Thus, mapKeys() becomes select(keys).unfold() and mapValues() becomes select(values).unfold().

  • The data type of Operator enums will now always be the highest common data type of the two given numbers, rather than the data type of the first number, as it’s been before.

Aliasing Remotes in the Console

The :remote command in Gremlin Console has a new alias configuration option. This alias option allows specification of a set of key/value alias/binding pairs to apply to the remote. In this way, it becomes possible to refer to a variable on the server as something other than what it is referred to for purposes of the submitted script. For example, once a :remote is created, this command:

:remote alias x g

would allow "g" on the server to be referred to as "x".

:> x.E().label().groupCount()

Upgrading for Providers

Important
It is recommended that providers also review all the upgrade instructions specified for users. Many of the changes there may prove important for the provider’s implementation.

All providers should be aware that Jackson is now shaded to gremlin-shaded and that this could represent a breaking change if the dependency was being used by way of TinkerPop; a direct dependency on Jackson may be required on the provider's side.

Graph System Providers

GraphStep Alterations

  • GraphStep is no longer in the sideEffect package, but now in the map package, as traversals support mid-traversal V().

  • Traversals now support mid-traversal V()-steps. Graph system providers should ensure that a mid-traversal V() can leverage any suitable index.

See https://issues.apache.org/jira/browse/TINKERPOP-762

Decomposition of AbstractTransaction

The AbstractTransaction class has been abstracted into two different classes supporting two different modes of operation: AbstractThreadLocalTransaction and AbstractThreadedTransaction, where the former should be used when supporting ThreadLocal transactions and the latter for threaded transactions. Of course, providers may still choose to build their own implementation on AbstractTransaction itself or simply implement the Transaction interface.

The AbstractTransaction gains the following methods to potentially implement (though default implementations are supplied in AbstractThreadLocalTransaction and AbstractThreadedTransaction):

  • doReadWrite that should execute the read-write consumer.

  • doClose that should execute the close consumer.

Transaction.close() Default Behavior

The default behavior for Transaction.close() is to rollback the transaction and is enforced by tests, which previously asserted the opposite (i.e. commit on close). These tests have been renamed to suit the new semantics:

  • shouldCommitOnCloseByDefault became shouldCommitOnCloseWhenConfigured

  • shouldRollbackOnCloseWhenConfigured became shouldRollbackOnCloseByDefault

If these tests were referenced in an OptOut, then their names should be updated.

Graph Traversal Updates

There were numerous changes to the GraphTraversal API. Nearly all changes are backwards compatible with respective "deprecated" annotations. Please review the respective updates specified in the "Graph System Users" section.

  • GraphStep is no longer in sideEffect package. Now in map package.

  • Make sure mid-traversal GraphStep calls are folding HasContainers in for index-lookups.

  • Think about copying TinkerGraphStepStrategyTest for your implementation so you know folding is happening correctly.

Element Removal

Element.Exceptions.elementAlreadyRemoved has been deprecated and test enforcement for consistency has been removed. Providers are free to deal with deleted elements as they see fit.

VendorOptimizationStrategy Rename

The VendorOptimizationStrategy has been renamed to ProviderOptimizationStrategy. This renaming is consistent with revised terminology for what were formerly referred to as "vendors".

GraphComputer Updates

GraphComputer.configure(String key, Object value) is now a method (with a default implementation). This allows the user to specify engine-specific parameters to the underlying OLAP system. These parameters are not intended to be cross-engine supported. Moreover, if there are no parameters that can be altered (beyond the standard GraphComputer methods), then the provider's GraphComputer implementation should simply return and do nothing.
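
For example, an engine-specific parameter might be passed through as follows (the Spark property shown is illustrative):

graph.compute(SparkGraphComputer.class)
     .configure("spark.executor.memory", "2g")
     .program(PageRankVertexProgram.build().create(graph))
     .submit();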

Driver Providers

Aliases Parameter

The "rebindings" argument to the "standard" OpProcessor has been renamed to "aliases". While "rebindings" is still supported it is recommended that the upgrade to "aliases" be made as soon as possible as support will be removed in the future. Gremlin Server will not accept both parameters at the same time - a request must contain either one parameter or the other if either is supplied.

ThreadLocal Transaction Settings

If a driver configures the Transaction.onReadWrite() or Transaction.onClose() settings, note that these settings no longer apply to all future requests. If the settings are changed for a sessionless request, they will only apply to that one request. If the settings are changed for an in-session request, they will only apply to all future requests made in the scope of that session.

TinkerPop 3.0.0

A Gremlin Rāga in 7/16 Time

TinkerPop 3.0.2

Release Date: October 19, 2015

Please see the changelog for a complete list of all the modifications that are part of this release.

Upgrading for Users

BulkLoaderVertexProgram (BLVP)

BulkLoaderVertexProgram now supports arbitrary inputs (in addition to HadoopGraph, which was already supported in version 3.0.1-incubating). It can now also read from any TP3-enabled graph, like TinkerGraph or Neo4jGraph.

TinkerGraph

TinkerGraph can now be configured to support persistence, where TinkerGraph will try to load a graph from a specified location and calls to close() will save the graph data to that location.

Gremlin Driver and Server

There were a number of fixes to gremlin-driver that prevent protocol desynchronization when talking to Gremlin Server.

On the Gremlin Server side, the WebSocket sub-protocol introduces a new "close" operation to explicitly close sessions. Prior to this change, sessions were closed in a more passive fashion (i.e. session timeout). There were also some bug fixes around the protocol as it pertains to third-party drivers (e.g. Python) using JSON for authentication.

Upgrading for Providers

Graph Driver Providers

Gremlin Server close Operation

It is important to note that this feature of the sub-protocol applies to the SessionOpProcessor (i.e. for session-based requests). Prior to this change, there was no way to explicitly close a session. Sessions would get closed by the server after timeout of activity. This new "op" gives drivers the ability to close the session explicitly and as needed.

TinkerPop 3.0.1

Release Date: September 2, 2015

Please see the changelog for a complete list of all the modifications that are part of this release.

Upgrading for Users

Gremlin Server

Gremlin Server now supports a SASL-based (Simple Authentication and Security Layer) authentication model and a default SimpleAuthenticator which implements the PLAIN SASL mechanism (i.e. plain text) to authenticate requests. This gives Gremlin Server some basic security capabilities, especially when combined with its built-in SSL feature.

There have also been changes in how global variable bindings in Gremlin Server are established via initialization scripts. The initialization scripts now allow for a Map of values that can be returned from those scripts. That Map will be used to set global bindings for the server. See this sample script for an example.

Neo4j

Problems related to using :install to get the Neo4j plugin operating in Gremlin Console on Windows have been resolved.

Upgrading for Providers

Graph System Providers

GraphFactoryClass Annotation

Providers can consider the use of the new GraphFactoryClass annotation to specify the factory class that GraphFactory will use to open a new Graph instance. This is an optional feature and will generally help implementations that have an interface extending Graph. If that is the case, then this annotation can be used in the following fashion:

@GraphFactoryClass(MyGraphFactory.class)
public interface MyGraph extends Graph {
}

MyGraphFactory must contain the static open method that is normally expected by GraphFactory.
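
A sketch of such a factory (the implementation class it returns is hypothetical):

public class MyGraphFactory {
  // GraphFactory discovers and invokes this static open() method reflectively
  public static MyGraph open(final Configuration configuration) {
    return new MyGraphImpl(configuration);
  }
}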

GraphProvider.Descriptor Annotation

There was a change that affected providers who implemented GraphComputer related tests such as the ProcessComputerSuite. If the provider runs those tests, then edit the GraphProvider implementation for those suites to include the GraphProvider.Descriptor annotation as follows:

@GraphProvider.Descriptor(computer = GiraphGraphComputer.class)
public final class HadoopGiraphGraphProvider extends HadoopGraphProvider {

    public GraphTraversalSource traversal(final Graph graph) {
        return GraphTraversalSource.build().engine(ComputerTraversalEngine.build().computer(GiraphGraphComputer.class)).create(graph);
    }
}

See: TINKERPOP-690 for more information.

Semantics of Transaction.close()

There were some adjustments to the test suite with respect to how Transaction.close() was being validated. For most providers, this will generally mean checking OptOut annotations for test renaming problems. The error that occurs when running the test suite should make it apparent that a test name is incorrect in an OptOut if there are issues there.

See: TINKERPOP-764 for more information.

Graph Driver Providers

Authentication

Gremlin Server now supports SASL-based authentication. By default, Gremlin Server is not configured with authentication turned on and authentication is not required, so existing drivers should still work without any additional change. Drivers should however consider implementing this feature as it is likely that many users will want the security capabilities that it provides.