Memory hierarchy: CPU -> registers -> L1 -> L2 -> more caches -> main memory -> disk -> ...

The RAM and cell-probe models only count how many distinct things we access.
In a real hierarchy, larger levels are slower but transfer larger blocks (better parallelism).
Exploit locality to minimize the number of blocks that must be touched.

External memory model (I/O model, disk access model - DAM) [Aggarwal & Vitter 1988]
  Captures two levels of the hierarchy.
  Cache: M/B blocks of B words each (total size M), connected to the CPU via a fat pipe (instantaneous transfer).
  Disk: also connected to the cache, arranged in blocks of size B.
  Can read/write whole blocks. Transfers between disk and cache are slow.
  Goal: design algorithms that minimize the number of memory transfers.
  A T(N)-time RAM algorithm trivially uses at most T(N) memory transfers.
  We want to get it down: T(N)/B is the best we can hope for, but it is usually hard to achieve.

Searching
  B-trees give O(lg_{B+1} N).
  Lower bound of Omega(lg_{B+1} N) for searching (comparison model).
    - Information theory: lg(N+1) bits to discover, at most lg(B+1) bits learned per transfer.

Sorting
  O(N/B lg_{M/B} N/B) via M/B-way mergesort.
  Matching lower bound in the comparison model [A&V].

Permutation: rearrange N elements into some given new order
  Theta(min(N, N/B lg_{M/B} N/B))
    - N: pick up each element and move it to its new position
    - sorting bound: sort elements, sort permutation, undo permutation
  Lower bound (Omega) holds in the indivisible model (elements can't be divided, but other numbers can be split up, etc.)
  Open problem: can we do better in a weaker model?

Sorting data structures
  Search trees can't be used for optimal sorting: N B-tree inserts cost O(N lg_{B+1} N) instead of O(N/B lg_{M/B} N/B).
  Buffer trees give O(1/B lg_{M/B} N/B) amortized insert, delete, delete-min, and delayed search/range search.

Cache-oblivious model [Frigo, Leiserson, Prokop, Ramachandran 1999]
  Just like external memory, except the algorithm doesn't know B or M.
  Doesn't explicitly manage memory: it is a RAM algorithm.
  Memory is managed via automatic block transfers triggered by element accesses, using the offline optimal block replacement policy.
    (In practice: FIFO/LRU/... are 2-competitive given a cache of double the size.)
  Will assume M >= cB for some sufficiently large constant c (we usually don't need c to be very big, so that's OK).

Why cache-oblivious?
  Nice clean model.
  Allows RAM algorithms to be used directly.
  Captures multilevel memory hierarchies (can't do this cleanly in the external memory model).

Results
  B-tree: insert/delete/search in O(lg_{B+1} N) transfers [Bender, Demaine, Farach-Colton 2000], but simplified here.
  Sorting in O(N/B lg_{M/B} N/B) [Frigo et al.]
    requires the tall-cache assumption: M = Omega(B^(1+epsilon))
    tall cache is necessary in cache-oblivious (but not external-memory) [Brodal & Fagerberg 2003]
  Priority queue: insert/delete/delete-min in O(1/B lg_{M/B} N/B) [Arge, Bender, et al., STOC 2002]

Static search tree [Prokop 1999]
  Store N elements in order in a complete binary tree.
  Cut the tree at the middle level of edges: about sqrt(N)+1 subtrees of about sqrt(N) elements each.
  Recurse on the subtrees.
  Concatenate the recursive layouts: van Emde Boas (vEB) layout.
  Claim: search uses O(lg_{B+1} N) memory transfers.
  Pf: The layout recursion continues further, but we can stop the analysis when we reach a tree that fits in a block.
      Look at the level of detail (recursion) that straddles B: each subtree at this level has size <= B, but the subtrees at the next coarser level have size > B.
      Each such subtree costs <= 2 transfers to access (it is contiguous but may straddle a block boundary).
      How many do we access? Each one has height >= 1/2 lg B, so a root-to-leaf path visits at most (lg N)/(1/2 lg B) = 2 lg_B N of them, for at most 4 lg_B N transfers.
  Works for arbitrary height (not just 2^k).
  Works for constant-degree (greater than 1) trees, not just binary.

Ordered file maintenance (ofm)
  Problem: store N elements in order in an array of size O(N) (with O(1)-size gaps).
    Insert an element between 2 given elements, preserving order.
    Delete an element.
  Black box: can do this by rearranging O(lg^2 N) consecutive elements.

Dynamic search tree [Bender, Duan, Iacono, Wu 2002]
  Build a vEB-layout tree with each leaf corresponding to an array slot in the ofm structure.
  Each internal node stores the max of its children (ignoring empty slots).
  Search in O(lg_{B+1} N): at each node, look at the max stored in the left child to decide whether to go left or right.
  Insert(x):
    - search(x) -> pred or succ = where to insert in the ofm
    - insert into the ofm: changes O(lg^2 N) cells
    - update the corresponding leaves and propagate maxima up
        in a post-order traversal of the changed leaves and their ancestors
  Top part (the path from the changed range up to the root) costs O(lg_B N).
  Claim: if k cells change, the cost is O(lg_B N + k/B).
  Pf: Consider the level of detail straddling B again.
      Look at the bottom two levels, made of subtrees of size <= B.
      Can be done by scanning: only need to keep in cache the ofm block, the current bottom block of the tree, and the next-from-bottom block of the tree.
      Within an enclosing subtree of size J > B, the cost is O(J/B + 1) = O(J/B), so O(k/B) in total for the bottom two levels.
      Above the bottom two levels the cost is also O(k/B), since each subtree of size J > B has been reduced to a single node.
  Now have a B-tree with insert (and delete, equivalently) in O(lg_{B+1} N + (lg^2 N)/B) and search in O(lg_{B+1} N).
  Can get rid of the (lg^2 N)/B term via indirection:
    cluster the elements into groups of Theta(lg N)
    store the min of each group in the ofm (the previous structure)
    rewriting a group requires O((lg N)/B) <= O(lg_B N) transfers
    a group may need to be split after lg N inserts, but that cost can be amortized away
  Now O(lg_{B+1} N + (lg N)/B) = O(lg_{B+1} N) amortized updates.
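
Sketch: max-tree over the ofm slots (search and update logic only)
  A minimal Python sketch of the "each internal node stores the max of its children" idea from the dynamic search tree above. Assumptions that are not in the notes: the tree is kept in plain heap order instead of vEB order (the layout changes the transfer count, not the logic), the ofm is just a Python list with None for empty slots, the number of slots is a power of two, and maxima are refreshed level by level rather than in post-order. The names build_max_tree, successor_slot and refresh_range are made up for illustration.

```python
NEG_INF = float("-inf")  # stands in for "empty slot" inside the tree


def build_max_tree(slots):
    """Build a heap-indexed tree; tree[i] is the max key below node i.

    slots: sorted keys with None for gaps; len(slots) must be a power of two.
    Leaves live at tree[n + i] for slot i, the root at tree[1].
    """
    n = len(slots)
    tree = [NEG_INF] * (2 * n)
    for i, key in enumerate(slots):
        if key is not None:
            tree[n + i] = key
    for node in range(n - 1, 0, -1):
        tree[node] = max(tree[2 * node], tree[2 * node + 1])
    return tree


def successor_slot(tree, x):
    """Return the slot index holding the smallest key >= x, or None."""
    n = len(tree) // 2
    if tree[1] < x:
        return None  # every key is smaller than x
    node = 1
    while node < n:
        left = 2 * node
        # "Look at the left child": go left iff its max is already >= x.
        node = left if tree[left] >= x else left + 1
    return node - n


def refresh_range(tree, slots, lo, hi):
    """Recompute leaves and ancestor maxima after slots[lo:hi] changed (hi > lo)."""
    n = len(tree) // 2
    for i in range(lo, hi):
        tree[n + i] = slots[i] if slots[i] is not None else NEG_INF
    lo_node, hi_node = (n + lo) // 2, (n + hi - 1) // 2
    while lo_node >= 1:
        for node in range(lo_node, hi_node + 1):
            tree[node] = max(tree[2 * node], tree[2 * node + 1])
        lo_node, hi_node = lo_node // 2, hi_node // 2
```

  Usage: with slots = [2, None, 5, None, None, 9, 12, None], successor_slot(build_max_tree(slots), 6) lands on slot 5 (key 9). After writing 7 into slot 3 and calling refresh_range(tree, slots, 3, 4), the same query lands on slot 3.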
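
Sketch: van Emde Boas layout and blocks touched by a search
  The static search tree section above describes the layout recursively (cut at the middle level, recurse, concatenate). The Python sketch below generates that order for a complete binary tree and counts how many size-B blocks a root-to-leaf walk touches. Assumptions for illustration only: nodes are named by their heap (BFS) index with the root at 1, the top piece gets floor(h/2) levels, and veb_order and blocks_on_path are hypothetical helper names. Note that the layout code never looks at B; only the measurement does.

```python
def veb_order(height):
    """Heap indices of a complete binary tree with `height` levels, in vEB order."""

    def layout(root, levels):
        if levels == 1:
            return [root]
        top = levels // 2          # levels in the top subtree
        bottom = levels - top      # levels in each bottom subtree
        order = layout(root, top)
        # Roots of the bottom subtrees are the descendants of `root`
        # exactly `top` levels down, left to right.
        first = root << top
        for sub_root in range(first, first + (1 << top)):
            order.extend(layout(sub_root, bottom))
        return order

    return layout(1, height)


def blocks_on_path(position, height, leaf, B):
    """Count distinct blocks of B nodes touched walking from the root to a leaf.

    position: dict mapping heap index -> array position in the chosen layout.
    leaf: 0-based leaf number, left to right; its bits pick left/right turns.
    """
    node = 1
    blocks = {position[node] // B}
    for level in range(height - 2, -1, -1):
        node = 2 * node + ((leaf >> level) & 1)
        blocks.add(position[node] // B)
    return len(blocks)
```

  Usage: veb_order(4) gives [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]. For 16 levels and B = 255, the walk to the leftmost leaf touches 2 blocks in vEB order versus 9 blocks in plain BFS order. This is a nicely aligned case; in general each subtree of size <= B may straddle two blocks, which is where the factor 2 in the proof comes from.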
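
Sketch: comparing the external-memory sorting and permutation bounds
  The permutation bound Theta(min(N, N/B lg_{M/B} N/B)) from the external memory section is easier to read with numbers plugged in. A small Python sketch, with constant factors dropped and with sort_bound and permute_bound as made-up names; it only evaluates the formulas, it does not simulate any algorithm.

```python
import math


def sort_bound(N, M, B):
    """(N/B) * lg_{M/B}(N/B), ignoring constant factors. Needs M/B > 1."""
    return (N / B) * math.log2(N / B) / math.log2(M / B)


def permute_bound(N, M, B):
    """min(N, sorting bound): move elements one by one, or sort, whichever is cheaper."""
    return min(N, sort_bound(N, M, B))
```

  Usage: for N = 2^20, M = 2^10, B = 2^5 the sorting term is 2^15 * 3 = 98304, far below N, so sorting wins. For tiny blocks (M = 4, B = 2) the sorting term exceeds N and moving each element directly is the better option.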