Improving the MapReduce Algorithm:
Avoiding Useless Work
• Can we avoid processing vertices that do not improve any distances in the
iteration? This turns out to be easy based on the following observations:
– If a vertex u has distance d[u]=∞, then it cannot help reduce the distance for
any of the vertices in its adjacency list.
– Assume vertex u had the same distance d[u]=x in iterations i and (i+1). For any
vertex v in its adjacency list, the Map function call for u would emit the same
value x+weight(u,v) in both iterations. Since this value was already included in
the Reduce computation in iteration i, it cannot result in improvements in
iteration (i+1).
• To exploit these properties, we can do the following:
– Like BFS, the program distinguishes between “active” and “inactive” vertices.
Active vertices are those that could potentially help reduce the distance for
another vertex. We define a vertex to be active if and only if its distance value
changed in the previous iteration. The only exception to this rule is source
vertex s, which is set to “active” before the first iteration. Note that a vertex
that was active in one iteration could become inactive in the next, and vice
versa.
– It is easy to prove that a vertex whose distance reached the final value, i.e.,
the shortest-path distance from s, will remain inactive afterwards. Hence the
algorithm can stop iterating as soon as all vertices become inactive.
• The corresponding program is shown next.
63