* Disk representation ** A disk block now contains two copies of an internal node *** two sets of keys/children data, *** Earliest (logical) time at which they're valid **** Really only the second copy (the "modification block") needs this **** But we might as well include it on the first for sanity-checking. ** Leaf nodes are stored in the same way -- no persistence *** To change a key->value mapping, just create and point to a new leaf * High level implementation ideas ** Implement at block/node level to minimize changes to the B+ algorithm. ** Add caching *** Both for I/O efficiency and keeping to one logical TS per operation *** Have a 'logical time' counter. *** Keep a write-back cache of all the disk blocks we've touched (parents). *** If we get a request to modify a block: **** If its timestamp is current, just make the change (in memory) **** If its timestamp is old, make the necessary persistence changes (switching to the modification field or doing a path copy) in memory, and set the timestamp current for later modifications. *** Have a flush/commit operation that flushes the cache and increments the time counter * More detailed stuff ** A BPlusInternalNode (BPIN) represents "half" a block *** The memory representation contains the offset of the block, and which half *** Child pointers just point to the block as a whole *** getInternalNode() takes a pos and a TS and decides which half was meant *** putInternalNode() is straightforward ** Do caching *** Keep a table of loaded (half-)blocks (as BPINs) *** Change pass-by-value of BPINs to use ptrs, for consistency of shared cache ** Need to do splitting/version updating. *** Initially cached nodes are just read-only *** Then we can make them writable. (makeInternalNodeRW() or a getInternalNode flag?) **** Marks the block dirty and writable **** Brings the timestamp up to date **** If we can do this with the modification box, change the BPIN to "move" into the modification box **** Otherwise, perform a split, in the cache (including recursively applying this procedure to the parent, possibly splitting it...) ** Etc *** Also need to pass timestamps through the tree in lots of places *** And a stack of parent pointers **** Though maybe we could avoid this by just using the cache?