public interface MapReduce<MK,MV,RK,RV,R> extends Cloneable
MapReduce.Stage.MAP
stage processes the vertices of the Graph
in a logically parallel manner.
The MapReduce.Stage.COMBINE
stage aggregates the values of a particular map emitted key prior to sending across the cluster.
The MapReduce.Stage.REDUCE
stage aggregates the values of the combine/map emitted keys for the keys that hash to the current machine in the cluster.
The interface presented here is nearly identical to the interface popularized by Hadoop save the map() is over the vertices of the graph.Modifier and Type | Interface and Description |
---|---|
static interface |
MapReduce.MapEmitter<K,V>
The MapEmitter is used to emit key/value pairs from the map() stage of the MapReduce job.
|
static class |
MapReduce.NullObject
A convenience singleton when a single key is needed so that all emitted values converge to the same combiner/reducer.
|
static interface |
MapReduce.ReduceEmitter<OK,OV>
The ReduceEmitter is used to emit key/value pairs from the combine() and reduce() stages of the MapReduce job.
|
static class |
MapReduce.Stage
MapReduce is composed of three stages: map, combine, and reduce.
|
Modifier and Type | Field and Description |
---|---|
static String |
MAP_REDUCE |
Modifier and Type | Method and Description |
---|---|
default void |
addResultToMemory(Memory.Admin memory,
Iterator<KeyValue<RK,RV>> keyValues)
The final result can be generated and added to
Memory and accessible via DefaultComputerResult . |
MapReduce<MK,MV,RK,RV,R> |
clone()
When multiple workers on a single machine need MapReduce instances, it is possible to use clone.
|
default void |
combine(MK key,
Iterator<MV> values,
MapReduce.ReduceEmitter<RK,RV> emitter)
The combine() method is logically executed at all "machines" in parallel.
|
static <M extends MapReduce> |
createMapReduce(Graph graph,
Configuration configuration)
A helper method to construct a
MapReduce given the content of the supplied configuration. |
boolean |
doStage(MapReduce.Stage stage)
A MapReduce job can be map-only, map-reduce-only, or map-combine-reduce.
|
R |
generateFinalResult(Iterator<KeyValue<RK,RV>> keyValues)
The key/value pairs emitted by reduce() (or map() in a map-only job) can be iterated to generate a local JVM Java object.
|
default Optional<Comparator<MK>> |
getMapKeySort()
If a
Comparator is provided, then all pairs leaving the MapReduce.MapEmitter are sorted. |
String |
getMemoryKey()
The results of the MapReduce job are associated with a memory-key to ultimately be stored in
Memory . |
default Optional<Comparator<RK>> |
getReduceKeySort()
If a
Comparator is provided, then all pairs leaving the MapReduce.ReduceEmitter are sorted. |
default void |
loadState(Graph graph,
Configuration configuration)
When it is necessary to load the state of a MapReduce job, this method is called.
|
void |
map(Vertex vertex,
MapReduce.MapEmitter<MK,MV> emitter)
The map() method is logically executed at all vertices in the graph in parallel.
|
default void |
reduce(MK key,
Iterator<MV> values,
MapReduce.ReduceEmitter<RK,RV> emitter)
The reduce() method is logically on the "machine" the respective key hashes to.
|
default void |
storeState(Configuration configuration)
When it is necessary to store the state of a MapReduce job, this method is called.
|
default void |
workerEnd(MapReduce.Stage stage)
This method is called at the end of the respective
MapReduce.Stage for a particular "chunk of vertices."
The set of vertices in the graph are typically not processed with full parallelism. |
default void |
workerStart(MapReduce.Stage stage)
This method is called at the start of the respective
MapReduce.Stage for a particular "chunk of vertices."
The set of vertices in the graph are typically not processed with full parallelism. |
static final String MAP_REDUCE
default void storeState(Configuration configuration)
configuration
- the configuration to store the state of the MapReduce job in.default void loadState(Graph graph, Configuration configuration)
graph
- the graph the MapReduce job will run againstconfiguration
- the configuration to load the state of the MapReduce job from.boolean doStage(MapReduce.Stage stage)
stage
- the stage to check for definition.void map(Vertex vertex, MapReduce.MapEmitter<MK,MV> emitter)
MapReduce
classes must at least provide an implementation of MapReduce#map(Vertex, MapEmitter)
.vertex
- the current vertex being map() processed.emitter
- the component that allows for key/value pairs to be emitted to the next stage.default void combine(MK key, Iterator<MV> values, MapReduce.ReduceEmitter<RK,RV> emitter)
key
- the key that has aggregated valuesvalues
- the aggregated values associated with the keyemitter
- the component that allows for key/value pairs to be emitted to the reduce stage.default void reduce(MK key, Iterator<MV> values, MapReduce.ReduceEmitter<RK,RV> emitter)
key
- the key that has aggregated valuesvalues
- the aggregated values associated with the keyemitter
- the component that allows for key/value pairs to be emitted as the final result.default void workerStart(MapReduce.Stage stage)
MapReduce.Stage
for a particular "chunk of vertices."
The set of vertices in the graph are typically not processed with full parallelism.
The vertex set is split into subsets and a worker is assigned to call the MapReduce methods on it method.
The default implementation is a no-op.stage
- the stage of the MapReduce computationdefault void workerEnd(MapReduce.Stage stage)
MapReduce.Stage
for a particular "chunk of vertices."
The set of vertices in the graph are typically not processed with full parallelism.
The vertex set is split into subsets and a worker is assigned to call the MapReduce methods on it method.
The default implementation is a no-op.stage
- the stage of the MapReduce computationdefault Optional<Comparator<MK>> getMapKeySort()
Comparator
is provided, then all pairs leaving the MapReduce.MapEmitter
are sorted.
The sorted results are either fed sorted to the combine/reduce-stage or as the final output.
If sorting is not required, then Optional.empty()
should be returned as sorting is computationally expensive.
The default implementation returns Optional.empty()
.Optional
of a comparator for sorting the map output.default Optional<Comparator<RK>> getReduceKeySort()
Comparator
is provided, then all pairs leaving the MapReduce.ReduceEmitter
are sorted.
If sorting is not required, then Optional.empty()
should be returned as sorting is computationally expensive.
The default implementation returns Optional.empty()
.Optional
of a comparator for sorting the reduce output.R generateFinalResult(Iterator<KeyValue<RK,RV>> keyValues)
keyValues
- the key/value pairs that were emitted from reduce() (or map() in a map-only job)String getMemoryKey()
Memory
.default void addResultToMemory(Memory.Admin memory, Iterator<KeyValue<RK,RV>> keyValues)
Memory
and accessible via DefaultComputerResult
.
The default simply takes the object from generateFinalResult() and adds it to the Memory given getMemoryKey().memory
- the memory of the GraphComputer
keyValues
- the key/value pairs emitted from reduce() (or map() in a map only job).MapReduce<MK,MV,RK,RV,R> clone()
storeState(Configuration)
and loadState(org.apache.tinkerpop.gremlin.structure.Graph, Configuration)
model.
The default implementation simply returns the object as it assumes that the MapReduce instance is a stateless singleton.static <M extends MapReduce> M createMapReduce(Graph graph, Configuration configuration)
MapReduce
given the content of the supplied configuration.
The class of the MapReduce is read from the MAP_REDUCE
static configuration key.
Once the MapReduce is constructed, loadState(org.apache.tinkerpop.gremlin.structure.Graph, Configuration)
method is called with the provided configuration.graph
- The graph that the MapReduce job will run againstconfiguration
- A configuration with requisite information to build a MapReduceCopyright © 2013–2022 Apache Software Foundation. All rights reserved.