Austin Clements (amdragon@mit.edu) and I are interested in working on the problem of data structures for persistent file systems. Motivation: We want to build a file system that's persistent, in the data-structures sense --- allowing the user to go back in time and access the previous states of the file system. This has lots of obvious applications: recovering from accidental deletions and other errors, automatic (albeit rudimentary) version control, etc. We're actually working on implementing such a system, called PersiFS, for a 6.824 (Distributed Systems) project, along with two other people. If you're interested in more details about why this is a good idea, or how to implement it (from a systems perspective), see [XXX PersiFS paper URL]. Our initial design calls for some fairly simple data structures that will provide most of the necessary functionality but leave something to be desired in terms of efficiency, in addition to having little theoretical value. We would like to identify data structures for this purpose --- perhaps devising new ones or variations on existing ones as necessary --- and implement them in order to compare their performance against our original data structures. Details: Our data structure needs to be able to support arbitrary modifications to the content and metadata of files on a filesystem, in a persistent way. Only partial persistence is necessary; we need to be able to view previous revisions, but only the latest revision should be modifiable. Performance in this case would have to be measured in terms of disk accesses, so we're looking at data structures in the external memory model. Cache-obliviousness is probably not required. Storing every version of every file on disk could require inordinately large amounts of disk space, so space-efficiency is critical. We can only afford to store changes between revisions. It would also be nice to support efficiently erasing a revision from the data structure, e.g. to purge old revisions to save space. This is non-critical, however, since our current design doesn't support this either. Design Ideas: WRITEME - Chunk indirection/LBFS to minimize storage costs. - Concatenation trees/ropes for file -> chunk mapping? - B-trees are always good -- Sleator-Tarjan persistifying them -- That Arge paper -- That other paper