
Acyclic Dependency Digraphs for Immutable Chunks

Jason Cairns

2021-08-23

1 Introduction

Restricting the largeScaleR system to immutable chunks brings both guarantees and challenges. The guarantees include easier reasoning about the system and consistency of chunk-related transactions [1]. The principal challenge is the additional memory usage that comes from copying data, as compared to in-place modification. Much of the memory growth from immutable objects is optimised away within R, but this is difficult to rely upon once the system becomes distributed [2].

This report suggests the addition of a significant component to the largeScaleR system, one with positive implications for backup and checkpointing, a solution to the memory problem, and the means by which the system finally gains fault-tolerance. The intended component is a reification of dependencies between chunks as an abstract Directed Acyclic Graph (DAG), whose distributed realisation is described in an associated document, Distributed Hash Table for the Dependency Graph. Each chunk comes into being either through manipulation of other chunks, or through direct reading from an external system, such as a read from HDFS. In this fashion, a graph of dependencies can be defined over the set of chunks that exist as part of the system at any one point in time. Each node in this graph represents a single chunk, with directed edges pointing from each node to the nodes representing that chunk’s prerequisites. This can be represented as a diagram, or in textual notation, with makefile rules being a popular example [3]. The terminology of makefiles is used throughout this document for clarity, with ‘target’ referring to a node representing some chunk that depends on any number of chunks, themselves represented by ‘prerequisite’ nodes. In the diagram, ‘target’ nodes point back to their ‘prerequisite’ nodes. A ‘prerequisite’ may also be referred to as a ‘dependency’.
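As a concrete illustration of this notation (a sketch only, not the largeScaleR implementation), the makefile rule ‘c3: c1 c2’ corresponds to a target node c3 pointing back to its prerequisite nodes c1 and c2; a node here is simply a record of a chunk identifier and the identifiers of its direct prerequisites:

    ## Minimal sketch of dependency-graph nodes; names are illustrative only.
    new_node <- function(id, prerequisites = character(0)) {
      list(id = id, prerequisites = prerequisites)
    }

    graph <- list(
      c1 = new_node("c1"),                       # read directly from, e.g., HDFS
      c2 = new_node("c2"),                       # read directly from, e.g., HDFS
      c3 = new_node("c3", c("c1", "c2"))         # target: derived from c1 and c2
    )

    graph$c3$prerequisites                       # edges point back to prerequisites
    #> [1] "c1" "c2"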

2 Relation to System

An explicit graph that can be queried and modified by the system is especially useful because it may store further information on the system’s chunks, and information from the system may in turn be relayed back to update the graph. Importantly, the graph is modular; while the system depends upon the graph, querying and updating it, the graph remains a distinct and separate entity, described herein as an abstract data type.
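A minimal sketch of such an abstract data type follows, backed by an R environment purely for illustration; the operation names (new_graph, add_node, get_node, update_node) are hypothetical, and largeScaleR may equally realise the same interface over a distributed structure:

    ## Hypothetical interface to the graph as an abstract data type.
    new_graph <- function() new.env(parent = emptyenv())

    add_node <- function(graph, id, prerequisites = character(0), meta = list()) {
      assign(id, list(id = id, prerequisites = prerequisites, meta = meta),
             envir = graph)
      invisible(graph)
    }

    get_node <- function(graph, id) get(id, envir = graph)

    update_node <- function(graph, id, meta) {
      node <- get_node(graph, id)
      node$meta <- modifyList(node$meta, meta)   # merge in new metadata
      assign(id, node, envir = graph)
      invisible(graph)
    }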

3 Chunk Information: Location

An immediate example of the use of the graph is the storage of chunk metadata. Most importantly, information on chunk location can be stored as part of the graph, and the system can use it for chunk access and for directing RPCs to the location of their principal chunks.
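For example, a node might carry the address of the worker holding its chunk, with a lookup used to direct calls; the node fields, the worker address, and the send_rpc() helper below are illustrative assumptions rather than largeScaleR API:

    ## Sketch: chunk location stored as node metadata and used to route a call.
    g <- new.env(parent = emptyenv())
    g$c1 <- list(id = "c1", prerequisites = character(0),
                 location = "tcp://worker-03:5555")    # assumed worker address

    chunk_location <- function(graph, id) graph[[id]]$location

    ## a call on chunk "c1" is then directed to the worker that holds it
    send_rpc <- function(address, call) message("sending ", call, " to ", address)
    send_rpc(chunk_location(g, "c1"), "summary(c1)")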

4 Chunk Reference Counting

A valuable use of the graph is to attach referencing information to its nodes as an aid to distributed garbage collection. If a chunk is referenced in the system, that is reflected by a default marking on the graph. Upon garbage collection of a chunk reference in the relevant R process, the node may be marked as unreferenced, in turn triggering the global deletion of all data relating to that chunk throughout the system.
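One plausible realisation on the R side, sketched below with hypothetical names, registers a finaliser on the local chunk reference so that its collection marks the corresponding node as unreferenced:

    ## Sketch: local garbage collection of a chunk reference marks the node.
    graph <- new.env(parent = emptyenv())
    graph$c1 <- list(id = "c1", referenced = TRUE)

    chunk_reference <- function(graph, id) {
      ref <- new.env()
      ref$id <- id
      reg.finalizer(ref, function(e) {
        graph[[e$id]]$referenced <- FALSE   # mark the node unreferenced
        message("chunk ", e$id, " unreferenced; global deletion may be triggered")
      })
      ref
    }

    r <- chunk_reference(graph, "c1")
    rm(r); invisible(gc())   # local reference collected; node marked unreferenced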

5 Checkpointing and Checkpoint Triggers

In order to back up data, so that it may later be restored in the face of machine failure, checkpoints of individual chunks may be taken [4].

For all practical purposes, not every single chunk can be checkpointed. The amount of time spent writing to disk, or replicating across machines, is significant and would slow the system. As such, the non-trivial decisions of which chunks to checkpoint, and how to restore from sparse checkpoints, serve as the basis of this discussion.

A variety of mechanisms may be used to trigger the designation of a node as a checkpoint, as well as methods of checkpointing. The methods of checkpointing include redundancy across the cluster, or dumping the data structure to disk; dumping is favoured in this discussion because it is easier to reason about, but either could be considered, or even combined, without loss of generality [5]. Triggers for checkpointing may be classified by whether they occur at the creation of a node or during its lifetime.

Given that each node retains a reference to its direct prerequisites, information on the dependencies is easily accessed and may be propagated along the graph as each new node is added. This fact can be taken advantage of to implement creation-based checkpointing in aid of fault-tolerance.

I suggest a time-to-recover checkpointing scheme, which triggers a checkpoint upon reaching a defined limit on how long the system would take to recover in the face of a worst-case fault, such as a total power outage. This is performed by measuring how long each chunk takes to generate independently, and recording that information in the node associated with the chunk. Each node also takes the maximum cumulative generation time among its direct prerequisites and adds that to its own time, creating a cumulative generation time, in a similar fashion to a Merkle tree or blockchain, where properties of nodes accumulate along references, though without the cryptographic properties [6], [7]. When the pre-defined cumulative generation time limit is reached, the chunk at which the limit is reached is designated for checkpointing, and the cumulative generation time is reset (zeroed) at that node, so that chunks which subsequently depend upon it accumulate afresh.
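The following sketch illustrates the accumulation and reset, with an assumed time limit of 60 seconds and illustrative function names:

    ## Sketch of the time-to-recover rule over a list-backed graph.
    time_limit <- 60   # assumed worst-case recovery budget, in seconds

    add_node <- function(graph, id, gen_time, prerequisites = character(0)) {
      prior <- if (length(prerequisites) == 0) 0 else
        max(vapply(graph[prerequisites], function(n) n$cumulative, numeric(1)))
      cumulative <- gen_time + prior
      checkpoint <- cumulative >= time_limit
      graph[[id]] <- list(id = id, prerequisites = prerequisites,
                          cumulative = if (checkpoint) 0 else cumulative,
                          checkpoint = checkpoint)   # reset to zero at checkpoints
      graph
    }

    g <- list()
    g <- add_node(g, "c1", gen_time = 25)                        # cumulative 25
    g <- add_node(g, "c2", gen_time = 30, prerequisites = "c1")  # cumulative 55
    g <- add_node(g, "c3", gen_time = 10, prerequisites = "c2")  # 65 >= 60
    g$c3$checkpoint
    #> [1] TRUE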

Limits on chain length can be placed similarly, where instead of time, a maximum count of nodes forming the path back to the originating checkpoint can be taken, with checkpointing taking place upon reaching the limit.
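A corresponding sketch of the count-based variant, with an assumed maximum chain length:

    ## Path-length counter in place of time; illustrative only.
    max_chain <- 10   # assumed limit on nodes back to the last checkpoint or read

    chain_length <- function(graph, prerequisites) {
      if (length(prerequisites) == 0) return(0)
      1 + max(vapply(graph[prerequisites], function(n) n$chain, numeric(1)))
    }
    ## as with time, a node whose chain length reaches max_chain is designated
    ## a checkpoint and stores a chain value of zero for later dependents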

Dynamic checkpointing, taking place after node creation, may be used to checkpoint upon certain memory thresholds being reached on the chunk host, with a full dump to disk and a stall of the system occurring before any crash can take place.
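A rough sketch of such a trigger follows, with an assumed memory threshold and a plain dump of held chunks via saveRDS(); the threshold, paths, and chunk store are all illustrative:

    ## Sketch: dump held chunks to disk once memory use crosses a threshold.
    memory_used_mb <- function() sum(gc()[, 2])   # MB in use across Ncells/Vcells
    memory_limit_mb <- 4096                       # assumed per-host threshold

    checkpoint_if_needed <- function(chunks, dir = tempdir()) {
      if (memory_used_mb() < memory_limit_mb) return(invisible(FALSE))
      for (id in names(chunks)) {
        saveRDS(chunks[[id]], file.path(dir, paste0(id, ".rds")))  # full dump
      }
      invisible(TRUE)
    }

    ## e.g. called after each chunk operation on the host
    chunks <- list(c1 = rnorm(10), c2 = runif(10))
    checkpoint_if_needed(chunks)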

My suggestion for restoring the system to its current working state following node failure, from sparse checkpoints, is as follows. If each node retains a record of the precise function used to create its chunk, along with references to the chunks required by that function (its immediate prerequisites), then it holds an effective delta encoding of the means of attaining one chunk from its prerequisites, and the graph of dependencies can be seen as a complete record of the construction of chunks, somewhat akin to a function-based, rather than line-based, git [8]. As such, any given node may be reconstructed by recursively walking back over the graph along the dependencies of nodes, collecting the required difference functions in a stack, until arriving at checkpoints or at leaf nodes representing file reads. Upon reaching the checkpoints or leaves, ordered application of all of the accumulated difference functions, by popping the stack, should recreate the node to be restored, assuming referential transparency. Restoration from checkpoints in this manner serves effectively to enable fault-tolerance.
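The sketch below illustrates this restoration, with the recursion stack standing in for the explicit stack of difference functions; the node fields and the toy graph are assumptions for illustration, not the largeScaleR implementation:

    ## Sketch: rebuild a node by walking back to checkpoints or leaf reads,
    ## then reapplying the recorded generating (delta) functions in order.
    restore <- function(graph, id) {
      node <- graph[[id]]
      if (isTRUE(node$checkpoint)) return(node$checkpointed)       # stop at checkpoint
      if (length(node$prerequisites) == 0) return(node$generate()) # leaf: re-read source
      args <- lapply(node$prerequisites, function(p) restore(graph, p))
      do.call(node$generate, args)                                 # reapply the delta
    }

    ## toy graph: c1 is a file read, c2 doubles c1, c3 sums c2
    graph <- list(
      c1 = list(prerequisites = character(0), generate = function() 1:10),
      c2 = list(prerequisites = "c1", generate = function(x) x * 2),
      c3 = list(prerequisites = "c2", generate = function(x) sum(x))
    )
    restore(graph, "c3")
    #> [1] 110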

6 Self-Pruning of Dependency DAGs

Up to this point, the graph as described has been append-only. With such a description, it will grow excessively large, creating memory and traversal issues. In conjunction with checkpointing, I consider a means of pruning the graph, keeping it to the minimum size necessary for recovery of the current state of the system.

First, it is necessary to recognise that nodes representing unreferenced chunks still serve the bare purpose of delineating an intermediate (delta) transformation towards referenced chunks at some point further along the dependency path [9]. If checkpointed nodes lie on every path between one of these unreferenced delta nodes and all of the referenced chunks that it targets, the unreferenced delta node is no longer necessary, and may be pruned.

The task then becomes one of determining the unreferenced delta nodes for which every dependency path, followed backwards (from prerequisite to target), results in a checkpoint. A reference-counting algorithm suffices to reveal these nodes, combined with the important observation that checkpoints should not count as references. There are then four rules for the node-removal algorithm:

  1. Every node has a target counter, stored as an integer and initialised at zero.
  2. The introduction of a target node must result in the unit incrementation of the target counters of all of its direct prerequisites.
  3. The removal of a target node, or a node becoming a checkpoint, must result in the unit decrementation of the target counters of all of its direct prerequisites.
  4. Unreferenced nodes with a target counter of zero are to be removed.

With these rules followed, unreferenced nodes whose targets all resolve to checkpoints are immediately removed from the graph, thereby keeping the graph at the minimum size required for restoration at the current point in time.
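A sketch of the four rules over a list-backed graph follows; the node fields and function names are illustrative only:

    ## Sketch of the pruning rules; each node carries a target counter,
    ## a referenced flag, and a checkpoint flag.
    add_target <- function(graph, id, prerequisites = character(0)) {
      graph[[id]] <- list(prerequisites = prerequisites, targets = 0L,   # rule 1
                          referenced = TRUE, checkpoint = FALSE)
      for (p in prerequisites)
        graph[[p]]$targets <- graph[[p]]$targets + 1L                    # rule 2
      graph
    }

    decrement_prereqs <- function(graph, id) {                           # rule 3
      for (p in graph[[id]]$prerequisites)
        graph[[p]]$targets <- graph[[p]]$targets - 1L
      prune(graph)
    }

    prune <- function(graph) {                                           # rule 4
      drop <- vapply(graph, function(n) !n$referenced && n$targets == 0L,
                     logical(1))
      graph[!drop]
    }

    g <- list()
    g <- add_target(g, "c1")                        # file read
    g <- add_target(g, "c2", prerequisites = "c1")  # delta over c1
    g$c1$referenced <- FALSE                        # local reference to c1 collected
    g$c2$checkpoint <- TRUE                         # c2 becomes a checkpoint
    g <- decrement_prereqs(g, "c2")                 # rule 3, then rule 4 prunes c1
    names(g)
    #> [1] "c2"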

[1]
B. Goetz, Java concurrency in practice. Upper Saddle River, NJ: Addison-Wesley, 2006.
[2]
R Core Team, R language definition. 2020.
[3]
M. Shal, “Build system rules and algorithms,” published online, 2009.
[4]
E. N. (Mootaz) Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson, “A survey of rollback-recovery protocols in message-passing systems,” ACM Computing Surveys, vol. 34, no. 3, pp. 375–408, Sep. 2002, doi: 10.1145/568522.568525.
[5]
J. P. Walters and V. Chaudhary, “Replication-based fault tolerance for MPI applications,” IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 7, pp. 997–1010, Jul. 2009, doi: 10.1109/tpds.2008.172.
[6]
R. C. Merkle, “A digital signature based on a conventional encryption function,” in Advances in cryptology CRYPTO ’87, Springer Berlin Heidelberg, 1988, pp. 369–378. doi: 10.1007/3-540-48184-2_32.
[7]
S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” Decentralized Business Review, p. 21260, 2008.
[8]
S. Chacon and B. Straub, “Git basics,” in Pro git, Apress, 2014, pp. 15–41. doi: 10.1007/978-1-4842-0076-6_2.
[9]
J. Mogul et al., “Delta encoding in HTTP,” RFC Editor, Jan. 2002. doi: 10.17487/rfc3229.