3.7.4-SNAPSHOT

TinkerPop Future

This document offers a rough view of the features that can be expected in future releases and catalogs proposals for changes that are better written and understood as a document than as a dev list post.

This document is meant to change and be restructured frequently and should serve as a guide rather than as directions set in stone. Because this document looks to the future, the current version always lives on the master branch, which is therefore the only version that needs to be maintained.

Roadmap

This section provides some expectations as to what features might be provided in future major versions. It is not a guarantee that a feature will land in a particular version, but it gives some sense of what the core developers are focusing on. The most up-to-date version of this document will always be in the git repository.

The sub-sections that follow only outline those release lines that expect to include significant new features. Other release lines may continue to be active with maintenance work and bug fixes, but won’t be listed here. The items listed in each release line represent unreleased changes only. Once an official release is made for a line, the roadmap items are removed with new items taking their place as they are planned. The release line is removed from the roadmap completely when it is no longer maintained.

TinkerPop 4.x

TinkerPop 4 marks the beginning of the move to semantic versioning, as discussed in the DISCUSS thread. Development has begun with the switch from WebSocket to HTTP/1.1 for the underlying transport of Gremlin Server, along with many other proposed features. Here is a rough outline of where new features are expected to land, with major breaking features lined up for 4.0 and additional features lined up for minor versions. Additional details on each feature can be found in the Appendix.

4.0 Milestone Release (2024Q4)

As discussed in the DISCUSS thread, we are looking to release a milestone build of what is done so far to gather feedback.

This milestone would offer the opportunity to try Gremlin Server over HTTP with the revised GraphSON and GraphBinary serialization formats for TinkerPop 4.0, using the Java and Python drivers. These drivers will initially continue to support GraphSON, as its human-readable format helps with debugging and it offers wider serialization options for users in these early days. In the future, however, drivers are expected to support only GraphBinary for simplicity.

This milestone release would include the following list of features for preview:

4.0 (2025H1)

Full GA release of TinkerPop 4.0.

TinkerPop 5.x


Features originally planned for 3.7.x.

  • Add support for traversals as parameters for V(), is(), and has() (includes Traversal arguments to P)

  • Add subgraph/tree structure in all GLVs

  • Define semantics for query federation across Gremlin servers (depends on call() step)

  • Gremlin debug support

  • Case-insensitive search (TINKERPOP-2673)

  • Mutation steps for clone() of an Element and for moveE() for edges.

  • Add a language element to merge Map objects more easily.

Proposals

This section tracks and details future ideas. While the dev list is the primary place to talk about new ideas, complex topics can be initiated from and/or promoted to this space. While it is fine to include smaller bits of content directly in future/index.asciidoc, longer, more developed proposals and ideas are better added as individual asciidoc files, which are then linked from here to the GitHub repository, where they can be viewed in a formatted state. In this way, this section stays primarily a list of links to proposals rather than an expansion of text. Proposals should be named according to the pattern "proposal-<name>-<number>", where "name" is a short logical title that helps identify the proposal and "number" is the incremented proposal count.
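The naming rule can be expressed as a small helper that derives the next file name from those already present. This is a hypothetical sketch: the `.asciidoc` extension and the example file names are assumptions, and it takes the highest existing number plus one so that a gap in the sequence never produces a duplicate.

```python
import re

# Matches names such as "proposal-equality-1"; the optional ".asciidoc"
# suffix is an assumption about how the files are stored.
PROPOSAL_RE = re.compile(r"^proposal-(?P<name>[a-z0-9-]+)-(?P<number>\d+)(?:\.asciidoc)?$")


def next_proposal_name(name: str, existing: list[str]) -> str:
    """Build the next proposal file name from the ones already present."""
    numbers = []
    for existing_name in existing:
        match = PROPOSAL_RE.match(existing_name)
        if match:
            numbers.append(int(match.group("number")))
    # "number" is the incremented proposal count; using max() instead of
    # len() avoids collisions if a number was ever skipped.
    next_number = max(numbers, default=0) + 1
    # Turn a free-form title into a lowercase, dash-separated slug.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"proposal-{slug}-{next_number}.asciidoc"
```

For example, with three existing proposals, `next_proposal_name("Type System", ...)` would yield `proposal-type-system-4.asciidoc`.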

The general structure of a proposal is fairly open, but it should include an initial "Status" section that describes the current state of the proposal. A new proposal would likely have a status like "Open for discussion". From there, the proposal should describe the "motivation" for the change: what the issue is and why a change is needed. Finally, it should explain the details of the change itself.

At this stage, the proposal can then be submitted as a pull request for comment. As part of that pull request, the proposal should be added to the table below. Proposals always target the master branch.

The table below lists various proposals and their disposition. The Targets column identifies the release or releases to which the proposal applies, and the Resolved column clarifies the state of the proposal itself. Generally speaking, a proposal is "resolved" when the core tenets of its contents are established. For some proposals that might mean "fully implemented", but it might also mean "scheduled and scoped with open issues set aside". In that sense, the meaning is somewhat subjective. Consulting the "Status" section of the proposal itself will provide the complete story.

Proposal | Description | Targets | Resolved
Proposal 1 | Equality, Equivalence, Comparability and Orderability Semantics - Documents existing Gremlin semantics along with clarifications for ambiguous behaviors and recommendations for consistency. | 3.6.0 | N
Proposal 2 | Gremlin Arrow Flight. | 4.0.0 | N
Proposal 3 | Removing the Need for Closures/Lambda in Gremlin | 3.7.0 | N

Appendix

TinkerPop 4.x Feature Details

HTTP support - Server

Currently under development in the master-http branch. This body of work aims to replace the WebSocket protocol in Gremlin Server with HTTP/1.1 (DISCUSS thread). For API design, see TINKERPOP-3065 Implement a new HTTP API.
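To make the transport change concrete, here is a minimal Python sketch that builds (without sending) an HTTP POST carrying a Gremlin script. It borrows the JSON body shape `{"gremlin": "..."}` that Gremlin Server 3.x already accepts on its HTTP endpoint; the `/gremlin` path, the headers, and the `build_gremlin_request` name are assumptions, and the actual TP4 API is the one defined in TINKERPOP-3065.

```python
import json
import urllib.request

# Assumed endpoint: 8182 is Gremlin Server's default port, but the path is a guess.
GREMLIN_ENDPOINT = "http://localhost:8182/gremlin"


def build_gremlin_request(script: str, endpoint: str = GREMLIN_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying a Gremlin script."""
    body = json.dumps({"gremlin": script}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_gremlin_request("g.V().count()")
# urllib.request.urlopen(request) would send it to a running server.
```

Unlike WebSocket, each request here is self-contained, which is part of why connection options are expected to become simpler.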

HTTP support - GLVs

As the server will no longer support WebSocket, each GLV will also switch to HTTP. Connection options should be simpler with HTTP than with WebSocket and should be unified across all GLVs, to the extent each language’s libraries allow. This will also include implementing an interface for pluggable request interceptors for authentication, as raised in the DISCUSS thread.
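A pluggable request interceptor can be pictured as a function that receives an outgoing request and returns a (possibly modified) copy. The Python sketch below models a request as a plain dict and chains interceptors in order; the dict shape and the names `basic_auth_interceptor` and `apply_interceptors` are hypothetical, not an existing driver API.

```python
import base64
from typing import Callable

# A request is modeled as a plain dict; this shape is hypothetical.
Request = dict
Interceptor = Callable[[Request], Request]


def basic_auth_interceptor(username: str, password: str) -> Interceptor:
    """Return an interceptor that adds an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")

    def intercept(request: Request) -> Request:
        # Copy headers so the caller's request is left untouched.
        headers = dict(request.get("headers", {}))
        headers["Authorization"] = f"Basic {token}"
        return {**request, "headers": headers}

    return intercept


def apply_interceptors(request: Request, interceptors: list[Interceptor]) -> Request:
    """Run each interceptor in order, feeding each one the previous result."""
    for interceptor in interceptors:
        request = interceptor(request)
    return request
```

Keeping interceptors as pure functions over a request makes the same idea easy to mirror in each GLV, whatever HTTP library it uses.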

IO serialization updates

TinkerPop’s serialization IO has not been updated in quite a long time. Some serializers can and likely should be removed, and definitions should be updated to make IO simpler and more maintainable overall. (DISCUSS thread)

Switch default from GremlinGroovyScriptEngine to GremlinLangScriptEngine

Switching the default script processing from GremlinGroovyScriptEngine to GremlinLangScriptEngine is a step towards removing the dependency on Groovy in Gremlin Server. Currently, the TinkerPop testing system makes heavy use of the Groovy script engine, so a major portion of the work will involve updating the tests.

Gremlin Console rework

As a result of the removal of sessions and the switch to gremlin-lang, the Gremlin Console’s remote mode will be affected, and users may notice a difference in the interactive experience of the Console. Additional discussion may be needed on the impact and on which changes are acceptable. (DISCUSS thread)

Transaction redesign

As transactions will have to be implemented over HTTP, this is an opportunity to improve the usability of the transaction APIs. This potentially means redesigning the transaction model so that it is better suited to all graphs, aligning remote and embedded transaction usage, and ensuring transaction support in GLVs. Such an API redesign would be a breaking change that needs to be introduced in the initial release of TP4; that release can include stub implementations only, with the full implementation added iteratively in minor releases.

Bytecode removal

One of the purposes that bytecode served was to provide a universal way to translate a Traversal. However, with the introduction of the gremlin-lang parser, this need can be fulfilled differently: any Gremlin script can be converted into a Traversal in a uniform way, which reduces the need for bytecode. With two systems now serving a similar purpose, it is probably time to remove one of them during a major version upgrade (see DISCUSS thread).

Before the full removal can be implemented, a few updates will be needed in gremlin-lang to ensure appropriate types are covered. Each GLV will also have to be updated to switch from bytecode-based to string-based traversal construction. A proposed plan includes:

  1. Extract an interface from Bytecode and implement string-based traversals and request options

  2. Add support for missing types, such as UUID, Set, Edge, ByteBuffer, etc. in gremlin-lang (TINKERPOP-3023)

  3. Add missing types to GLVs and rework traversal generation

  4. Ensure Feature tests work properly

I/O serialization update needed

One important note for this proposed plan is that gremlin-lang currently does not cover all types supported via Bytecode. Either all missing types need to be fully defined and implemented in the gremlin-lang parser for parity (related to Type System), or the community has to reach consensus on whether reduced type support is acceptable and, if so, which types can be omitted for now.

Groovy removal in Gremlin Server

Removing Groovy from Gremlin Server implies:

  1. Revising the configuration system to avoid the Groovy init script. This is also an opportunity to simplify server set-up.

  2. Deprecate GremlinGroovyScriptEngine for GremlinLangScriptEngine for script processing

  3. Remove/replace all the Groovy based plugin infrastructure from the server

The main concern with Groovy is that it allows arbitrary code to be executed on the server, which creates security vulnerabilities. However, removing this system has far-reaching effects in the community that should be discussed.

Schema support

Schema support relies on a well-defined type system.

Multi-label, no label, mutable label support

TinkerPop only supports single, immutable labels for its Elements. Various providers have implemented their own mechanisms for multi-label, no-label, and/or mutable-label support, and many popular non-TinkerPop graph models also allow multiple labels. It is time to consider bringing these capabilities into TinkerPop for parity.
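One way to picture multi-label, no-label and mutable-label support is an element whose label is a mutable set. The dataclass below is a hypothetical sketch of that data model, not a proposed API: an empty set stands for "no label", and `has_label` adopts one possible reading of hasLabel() (match if any candidate label is present).

```python
from dataclasses import dataclass, field


@dataclass
class MultiLabelVertex:
    """Hypothetical element model with zero or more mutable labels."""

    vertex_id: object
    labels: set[str] = field(default_factory=set)  # empty set means "no label"

    def add_label(self, label: str) -> None:
        self.labels.add(label)

    def drop_label(self, label: str) -> None:
        # discard() makes dropping an absent label a no-op
        self.labels.discard(label)

    def has_label(self, *candidates: str) -> bool:
        # True if any candidate is among the element's labels.
        return any(candidate in self.labels for candidate in candidates)
```

Whether hasLabel() should instead require all candidates to match is one of the semantic questions such a change would have to settle.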

Multi/meta properties on edges

Currently, multi-properties and meta-properties only exist on vertices; this item would extend them to edges.
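The data structure this implies can be sketched as follows: each edge property key maps to a list of values (multi-properties), and each value carries its own key/value map (meta-properties). The `Edge` and `MetaProperty` classes are hypothetical illustrations, not TinkerPop’s structure API.

```python
from dataclasses import dataclass, field


@dataclass
class MetaProperty:
    """A property value that can carry its own key/value meta-properties."""

    value: object
    meta: dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    """Hypothetical edge whose keys map to lists of values (list cardinality)."""

    label: str
    properties: dict[str, list[MetaProperty]] = field(default_factory=dict)

    def add_property(self, key: str, value: object, **meta: object) -> MetaProperty:
        # Appending rather than overwriting is what makes this a multi-property.
        prop = MetaProperty(value, dict(meta))
        self.properties.setdefault(key, []).append(prop)
        return prop

    def values(self, key: str) -> list[object]:
        return [prop.value for prop in self.properties.get(key, [])]
```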

Pluggable System for explain/profile()

While TinkerPop provides explain() and profile() steps, switching to a pluggable architecture would increase flexibility for providers who wish to customize the amount and format of information they return. (DISCUSS Thread)
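A pluggable explain()/profile() could take the shape of a formatter registry: the engine gathers metrics, and a provider-chosen formatter decides how to present them. In the Python sketch below, the row shape `(step, traversers, ms)`, the registry, and both formatters are hypothetical.

```python
import json
from typing import Callable

# Hypothetical shape: a profile is a list of (step name, traversers, ms) rows.
ProfileRow = tuple[str, int, float]
Formatter = Callable[[list[ProfileRow]], str]

_FORMATTERS: dict[str, Formatter] = {}


def register_formatter(name: str, formatter: Formatter) -> None:
    """Let a provider plug in its own rendering of profile output."""
    _FORMATTERS[name] = formatter


def render_profile(rows: list[ProfileRow], format_name: str = "text") -> str:
    return _FORMATTERS[format_name](rows)


def _text_formatter(rows: list[ProfileRow]) -> str:
    # Fixed-width columns, similar in spirit to a console table.
    return "\n".join(f"{step:<30}{count:>10}{ms:>10.2f}" for step, count, ms in rows)


def _json_formatter(rows: list[ProfileRow]) -> str:
    return json.dumps([{"step": s, "traversers": c, "ms": m} for s, c, m in rows])


register_formatter("text", _text_formatter)
register_formatter("json", _json_formatter)
```

A provider could then call `register_formatter` with its own function to change the amount or format of information returned, without touching metric collection.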

An extension of this is to make explain() work remotely; see TINKERPOP-2128.

Improve local() step

The concept and application of the local() step have been somewhat confusing to users, and the addition of the string and list manipulation steps in 3.7 further blurred the definition of local execution in a traversal. This is a good time to start considering a redesign of the local() step.

Type conversion with cast() step

asString() and asDate() were introduced in 3.7; this item would introduce additional casting steps, such as toInt(), which should rely on a well-defined type system.

New Gremlin language elements for geospatial and vector computation

Similar to how string and list manipulation steps were introduced, there is room for first-class steps for vector computation and geospatial operations (DISCUSS Thread).
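As an illustration of the kind of computation a first-class vector step might encapsulate, here is cosine similarity between two embeddings in plain Python. The function name is hypothetical, the sketch covers only the vector side (not geospatial), and no particular Gremlin step signature is implied.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Today this kind of logic typically lives in lambdas or provider-specific extensions; a first-class step would make it portable across providers.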

Rework match() step

The match() step was an attempt to introduce a declarative form of querying in TinkerPop based on pattern matching. There are various issues with the step, and a rework is due.

Unresolved issues related to current match():

has() accepting Traversal

This body of work was on the roadmap for 3.7.x: adding support for traversals as parameters to has(), which should expand the usability of the Gremlin language.

Query status/query cancellation

These are useful features for debugging and improved resource management that providers have implemented themselves, and now would be a good time to bring them into TinkerPop for parity.

Unify algorithm steps

Move the algorithm steps into the call() step or generalize them in some way.
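Generalizing algorithm steps could mean dispatching them by name through a single call()-style entry point. The Python sketch below is a hypothetical registry with one connected-components service, implemented as breadth-first search over an adjacency list; the service name `connectedComponent`, the registry, and the graph representation are assumptions, not the real call() step’s service API.

```python
from collections import deque
from typing import Callable, Optional

Graph = dict[str, list[str]]  # adjacency list; direction is ignored below
_SERVICES: dict[str, Callable[[Graph, dict], object]] = {}


def service(name: str):
    """Decorator that registers an algorithm under a call()-style name."""
    def register(fn):
        _SERVICES[name] = fn
        return fn
    return register


def call(graph: Graph, name: str, params: Optional[dict] = None) -> object:
    return _SERVICES[name](graph, params or {})


@service("connectedComponent")
def connected_components(graph: Graph, params: dict) -> dict[str, str]:
    """Label each vertex with the smallest vertex id in its component."""
    # Build an undirected view so edge direction is ignored.
    neighbors: dict[str, set[str]] = {v: set() for v in graph}
    for v, outs in graph.items():
        for w in outs:
            neighbors[v].add(w)
            neighbors.setdefault(w, set()).add(v)
    component: dict[str, str] = {}
    # Visiting starts in sorted order means each component is first reached
    # from its smallest member.
    for start in sorted(neighbors):
        if start in component:
            continue
        members, queue, seen = [start], deque([start]), {start}
        while queue:
            v = queue.popleft()
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    members.append(w)
                    queue.append(w)
        for v in members:
            component[v] = start
    return component
```

New algorithms would then only need a registration rather than a new step in the grammar.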

Modernize IO for OLAP

As the name suggests, we should remove old file serialization formats and introduce a more modern format for IO. One possible candidate is GraphAr, a standard data file format for graph data storage and retrieval that is currently an incubating Apache project.

A potentially large extension of this work, which may not be included in this version, is revisiting OLAP in general to resolve open JIRA issues.

Remove neo4j-gremlin

As discussed in the DISCUSS thread, neo4j-gremlin was deprecated in 3.7 with the introduction of native transactions in TinkerGraph. TP4 would be the place to remove the module.

Documentation reorganization

In addition to the documentation updates needed for new TP4 features, this entails a major rework of the documentation structure.

The current documentation is very thorough in certain areas but lacking in many others. The accumulation of features and functionality over the past years likely means that some information is outdated or should be reworded for clarity. While we have a generous amount of reference material, implementation guidelines for contributors and providers tend to be lacking. TP4 is an opportunity to rework the documentation to be more thorough, concise, clear, and easy to update when new features are implemented.

Another implication is revisiting the current documentation generation process. We use a very complex scripting structure to orchestrate documentation generation, combined with Maven plugins for language-specific docs. This process may be affected by any major alteration to the documentation structure and would need some effort to revise.

Deprecate sparql-gremlin

This module of TinkerPop has been largely unmaintained and likely unused for many years. Unless we receive fresh interest and contributions, now would be the time to deprecate it, with removal in a future version.

Proxy implementation

Implementing a proxy for Gremlin Server might be a viable alternative to implementing clustering in the client, for orchestrating multiple Gremlin Server instances, and/or rerouting WebSocket/HTTP requests for compatibility.

io() step improvements

Simplify io() for data ingestion and export in both embedded and remote usage, and add support for the CSV format.
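For the CSV part, a minimal export might flatten each vertex into one row: id, label, then one column per property key. The column layout and the `vertices_to_csv` function below are assumptions for illustration, written with Python’s standard `csv` module.

```python
import csv
import io


def vertices_to_csv(vertices: list[dict]) -> str:
    """Write vertices as CSV: id, label, then one column per property key."""
    # Union of all property keys, sorted for a stable column order.
    keys = sorted({k for v in vertices for k in v.get("properties", {})})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "label", *keys])
    for v in vertices:
        props = v.get("properties", {})
        # Missing properties become empty cells.
        writer.writerow([v["id"], v["label"], *(props.get(k, "") for k in keys)])
    return buffer.getvalue()
```

A flat layout like this cannot express multi-properties or meta-properties directly, which is one of the design questions a CSV format for io() would need to answer.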

Matrix testing

This aims to create an automated testing setup that helps ensure compatibility between drivers and server across minor releases and makes sure API contracts are not broken unintentionally.
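The core of such a setup is enumerating which driver/server combinations to test. Since TinkerPop 4 moves to semantic versioning, a simple rule is to pair every driver release with every server release in the same major line. The version lists and the function below are hypothetical; a real matrix would be fed from release metadata and run by a CI system.

```python
from itertools import product

# Hypothetical version lists for illustration only.
DRIVERS = {"java": ["4.0.0", "4.1.0"], "python": ["4.0.0", "4.1.0"]}
SERVERS = ["4.0.0", "4.1.0"]


def major(version: str) -> str:
    return version.split(".")[0]


def compatibility_matrix(drivers: dict[str, list[str]], servers: list[str]) -> list[tuple[str, str, str]]:
    """Every (language, driver version, server version) pair within one major line."""
    combos = []
    for language, versions in sorted(drivers.items()):
        for driver_version, server_version in product(versions, servers):
            # Under semver, compatibility is only expected within a major version.
            if major(driver_version) == major(server_version):
                combos.append((language, driver_version, server_version))
    return combos
```

Testing older drivers against newer servers (and the reverse) is what catches accidental breaks in API contracts across minor releases.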

Improved telemetry in driver/server

This is a less well-defined area, aimed at improved metrics collection that can better aid debugging for users and providers. Work may include adding the ability to debug queries and traversals, adding OpenTelemetry support, etc.

Type System

TinkerPop has never defined its own type system and has relied on JVM types, which becomes a problem especially for GLVs whose languages do not have corresponding types. (DISCUSS thread)

4.x Branching Methodology

Development of 4.x occurs on the 4.0-dev branch. This branch was created as an orphan branch and therefore has no history tied to any other branch in the repo, including master. As such, there is no need to merge/rebase 4.0-dev. When it comes time to promote 4.0-dev to master, the procedure will be to:

  1. Create a 3.x-master branch from master

  2. Delete all content from master in one commit

  3. Rebase 4.0-dev on master

  4. Merge 4.0-dev to master and push

From this point, 3.x development will occur on 3.x-master and 4.x development will occur on master (with the same version branching as we have now, e.g. 3.3-dev, 4.1-dev, etc.). Changes on the 3.x-master branch will likely still be merged to master, but all of them will merge as no-op changes.
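The four promotion steps can be written out as git commands. The Python sketch below lists them and, by default, only returns the command strings (a dry run); passing `dry_run=False` runs them with `subprocess`. The remote name `origin`, the commit message, and the `promote` helper are assumptions.

```python
import subprocess

# Each step of the promotion procedure as a git command.
PROMOTION_STEPS = [
    # 1. Create a 3.x-master branch from master
    ["git", "branch", "3.x-master", "master"],
    # 2. Delete all content from master in one commit
    ["git", "checkout", "master"],
    ["git", "rm", "-r", "-q", "."],
    ["git", "commit", "-m", "Clear master before promoting 4.0-dev"],
    # 3. Rebase 4.0-dev on master (replays the orphan history on top)
    ["git", "checkout", "4.0-dev"],
    ["git", "rebase", "master"],
    # 4. Merge 4.0-dev to master and push
    ["git", "checkout", "master"],
    ["git", "merge", "4.0-dev"],
    ["git", "push", "origin", "master", "3.x-master"],
]


def promote(dry_run: bool = True) -> list[str]:
    """Return the commands (dry run) or also execute them in order."""
    executed = []
    for cmd in PROMOTION_STEPS:
        executed.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return executed
```

For the later "no-op" merges from 3.x-master, one common git technique is `git merge -s ours 3.x-master`, which records the merge while keeping master’s content unchanged.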