\documentclass[twocolumn,11pt]{article}
\usepackage{fullpage}
\usepackage{url}

\title{PersiFS: \LARGE{A Continuously Versioned Network File System}}

\author{Austin T. Clements, Dan R. K. Ports, Ben A. Schmeckpeper, Hector Yuen \\
\small{\texttt{\{aclements, drkp, bschmeck, hyz\}@mit.edu}}}

\newcommand{\PersiFS}{\emph{PersiFS}}

\begin{document}

\maketitle

% First draft report. This should include a draft of your report's
% abstract, introduction, related work, and design sections. These
% sections should be in good shape, close to what they would look like
% in the final report. Be sure that the draft clearly states what your
% project's goals are, why those goals are worthwhile, and how you're
% going to achieve those goals.

\begin{abstract}
  {\small\it Most file systems are \emph{ephemeral}, meaning that once
    a change has been made, there is no way to recall the previous
    contents of the file system.  Backups, version control systems,
    and user interface improvements such as ``trash cans'' attempt to
    alleviate this problem; however, these are all rough
    approximations of \emph{persistent} file system structures, giving
    users restricted access to a restricted set of past states of the
    file system.  \PersiFS\ is a \emph{fully persistent} or
    \emph{continuously versioned} file system, allowing access to
    \emph{any} past state of the entire file system.  Furthermore,
    \PersiFS\ exposes the file system and its full set of past states
    via a conventional file system interface, allowing unmodified
    applications to access the past states of the file system.}
\end{abstract}

\section{Introduction}          % Ben

% Overall description
%% Meaning of ``persistence''
%% Versioning is done at the scope of the entire file system, not just
%% to individual files
%% Ask for a snapshot of the entire file system at a particular point
%% in time

\PersiFS\ is a network file system which remembers all changes made to
every file and allows the user to recall any and all modifications to
any file, including deletions.  Time stamps are used to select a view
of the file system's past state.  A user can specify a time in the
past at which the file was correct and \PersiFS\ will provide
read-only access to that past version of the file.  \PersiFS's time
stamp interface provides users with a natural way to express the
desire to access past versions of files.

A \emph{version-controlled file system} lets the user access not just
the current state of the file system but previous states as well.  For
example, backups are one typical, restricted form of a
version-controlled file system, allowing high-latency access to a
highly restricted set of past copies of a file system.

\PersiFS\ is a \emph{continuously} versioned file system.  A
continuously version-controlled file system allows access to the
complete state of the file system at \emph{any} point in the past.
Incorporating continuous versioning into the file system means that a
file can never be lost and mistaken modifications can always be
undone.  In \PersiFS, any change to a file forces the file system to
archive a copy of that file, indexed by a time stamp.

\subsection{Motivation}

% Motivations
%% Versioning is good
%% See the wiki
%%% File systems don't provide a trash can, even though UI's do.  This
%%% provides a file system-supported trash can that applications don't
%%% have to be aware of

Users frequently wish to undo changes they have made, whether
intentionally or inadvertently, to the file system --- for example,
inadvertent deletions, restoring files corrupted by application bugs,
or simply reverting to an earlier revision of a document. Regular file
systems do not support this operation natively, which results in a
number of tools that address parts of this problem in different ways.
Some operating systems provide a ``trash can'' interface to help users
avoid mistaken deletions of files.  However, this is only a partial
solution because it only addresses the problem of deletions, and even
then only until the trash can is emptied and the files are permanently
removed. The proliferation of ``undelete'' tools for administrators
suggests that this solution is inadequate. \PersiFS\ makes these tools
unnecessary, since files are never truly deleted from the file system.
Fears of accidental overwrite, rename or deletion are unnecessary.

Versioning is a natural and desirable aspect of file systems. Often,
critical files are versioned through the use of version control
systems. However, these must be explicitly configured, maintained, and
interacted with. Providing such support at the file system level gives
the ability to easily track changes to \emph{any} file over time. To
eliminate the need for user interaction and the possibilitiy of user
error, systems exist which automatically create periodic snapshots
(either of just critical system files, or of entire file systems).
These, however, suffer from problems of time aliasing, failing to
capture multiple changes made between snapshots and possibly creating
inconsistent snapshots at inopportune times.  By capturing every
revision, \PersiFS\ cannot suffer from time aliasing.

\subsection{Features}

% Key features
%% Continuous versioning
%% File system interface
%% High performance for a versioned file system
%% Actually stored on disk
%%% Using a regular file system to actually store our data
%%%% For reliability and ease of implementation
%%%% There's no direct correlation between the exported file
%%%% system and the underlying file system
%%%% And it wouldn't be difficult to translate over into a real disk
%%% (This detail should probably go in the internal representation
%%% section instead, though the fact that we store the FS on disk
%%% should be mentioned in the intro)
%% Network accessible
% Key contributions
%% File system interface capable of exposing all past states
%% Unmodified user applications and no special libraries

% Backups -> Snapshotting -> Continuous versioning

\PersiFS's main contribution is a novel file system interface that
allows users and applications to access the file system and its past
states without any modifications to applications or the need for any
special libraries. As a networked file system, \PersiFS\ allows remote
access to both current and archived files for multiple users.
\PersiFS\ is a \emph{continuously versioned} file system, so it is
capable of exposing \emph{all} past states, unlike \emph{snapshotting}
file systems that only save states at periodic intervals. Users can
access the entire file system tree as it existed at a particular time
simply by accessing the automatically mounted directory
\texttt{/persifs/servername@timestamp}.

\PersiFS\ provides persistent storage by storing all data on disk on a
central server. For reliability and ease of implementation, the
\PersiFS\ data structures are stored on a normal file system, though
with minor modifications it could operate on a physical disk as
well. Each client system runs a client daemon including an automounter
that provides the \texttt{/persifs} directory hierarchy, and forwards
requests to the appropriate server.
  
\section{Related Work}          % Dan
\label{sec:related-work}

Many other systems provide some form of persistent versioned storage.
\PersiFS\ differs from these systems in that it provides access to all
previous versions of the file system's state via a standard file system
interface.

\subsection{Version Control Systems}
\label{sec:related-work:vc}

Version control systems such as CVS~\cite{cvs} are the standard
mechanism for tracking revision histories in large projects. While
\PersiFS\ provides similar revision history operations, it does not
provide the higher-level semantics for synchronizing the work of
multiple users, such as file locking as in RCS~\cite{tichy85rcs} or
merging as in CVS. \PersiFS\ provides only revision history support;
it does not provide the higher-level functionality because the
appropriate semantics are dependent on the type of file and its mode
of use (e.g. text vs. binary files), and so should not be applied at
the file system level.
% XXX Mention that CVS-filesystem 6.824 project here if I can ever
% find it?

CVS organizes revision histories by assigning a version number to each
revision of a particular file. This is a natural interface for the
history of a particular file, but does not generalize well to tracking
the history of the entire file system. Many of the weaknesses of CVS as
a version control system are due to this interface: it does not
capture the notion of changes that occur simultaneously to multiple
files (changesets), and it does not effectively handle changes to the
directory structure, such as moving or renaming a file. The latter is
a major problem for a file system, since the directory structure can
be highly dynamic.

Subversion~\cite{subversion} addresses many of the shortcomings of CVS
by using a single global revision number that identifies a particular
state of the entire repository. This is similar to \PersiFS 's internal
representation; however, the user interface uses date/timestamps
instead because global revision numbers do not generally correspond to
a useful identifier from the user's perspective.

\subsection{Snapshots vs. Continuous Versioning}
\label{sec:related-work:snapshots}
Previous states of the file system are commonly stored by a backup
system that periodically archives the state of the filesystem.
\emph{Snapshot}-based version-controlled file systems are the natural
extension of this idea: they periodically take a snapshot of the state
of the filesystem, and make it readily available. The Plan 9 file
server~\cite{pike95plan}, for example, creates daily snapshots of the
filesystem and stores them on a write-once mass-storage system such as
a WORM jukebox or the Venti archival server~\cite{quinlan02venti}.
WAFL~\cite{hitz94file} provides similar snapshotting functionality
using copy-on-write disk blocks. AFS provides access to the most
recent snapshot through the \texttt{OldFiles} mechanism.
% XXX Find and add AFS cite

While a great improvement over a standard ephemeral file system,
snapshotting filesystems have the obvious disadvantage that they
cannot track changes that occur between revisions. \PersiFS\ is a
\emph{continuously} version-controlled file system: each change to the
file system is stored as a new revision.
Elephant~\cite{santry99deciding} and CVFS~\cite{soules03metadata} also
use this technique.

\subsection{Interfaces}
\label{sec:related-work:interfaces}

Some versioned file systems require special tools to access previous
versions of the file system. For example, CVFS is designed for
performing post-intrusion forensic analysis, so it does not provide an
interface for users to easily recover old versions of their
files. A convenient way to provide access to old versions is via a
filesystem interface. Plan 9 creates a directory hierarchy of
snapshots; Pike et al.~\cite{pike95plan} report that providing access
to snapshots via this interface is very convenient for users.

Ideally, it should be possible to interact with the versioned file
system exclusively through the file system interface, such that
unmodified standard UNIX utilities (\texttt{ls}, \texttt{cp}, etc.)
suffice for revision control. This is not possible in Elephant, which
adds new system calls. VersionFS~\cite{muniswamyreddy03-versionfs}
allows unmodified utilities to be used via a library wrapper and the
\texttt{LD\_PRELOAD} mechanism. \PersiFS\ uses a purely
file-system-based interface, using an automounter based on that of
SFS~\cite{sfs:mazieres-phd}, allowing standard utilities to be used
without any modification.
% Need to check Elephant claim. Express more disgust at VersionFS?

\section{Design}

\subsection{User Interface}     % Hector
\label{sec:design:ui}

\PersiFS\ provides access to both current and previous versions of the
file system state via a standard file system interface. Volumes are
accessed across the network using an automounter daemon in
essentially the same way as SFS~\cite{sfs:mazieres-phd}: a
\texttt{/persifs} directory is created, and the current version of the
file system on server \texttt{foo} is accessed via
\texttt{/persifs/foo}.

The automounter also provides access to previous versions of the file
system, by combining the server name with a timestamp, such as
\texttt{/persifs/foo@timestamp}. This gives a read-only snapshot of
the file system on \texttt{foo} as it appeared at the given time.

The user interface may also include a number of small tools that
improve the usability of \PersiFS. For example,
\begin{itemize}
\item \texttt{persifs pwd} \textit{datespec} \\
  Print the path to the current directory at the time specified by
  \emph{datespec}. Datespec should allow a wide range of time stamp
  formats, including relative ones as supported by CVS.
\item \texttt{persifs log} \emph{file}\\
  Display information about when \emph{file} was modified. This will
  require a special interface to the server to operate efficiently.
\end{itemize}

The administrator of a \PersiFS\ server should be able to set
retention policy, deleting data when it becomes older than a
pre-specified time, or merging together revisions that occur within a
certain time interval. This may be relegated to the domain of future
work depending on time and sanity constraints.

% File system interface
%% Harnessing the automounter
%%% Otherwise, this would either require an infinite number of
%%% directories to be explicitly exported, or there would have to be
%%% some mechanism for explicitly creating references to old versions
%% Read-only version at /persifs/foo@timestamp
%%% Dynamically mounted
%% Read-write current snapshot /persifs/foo
%% Fundamentally, path contains something to identify it as a
%% persifs-managed path (/persifs/), the address of the server (foo),
%% and optionally a timestamp (@timestamp)

% User tools

\subsection{Client-Server Interface} % Hector
\label{sec:design:client-server}
% The Client-Server Interface is going to be compatible with NFS,
% minimizing the need to change the behavior of the OS or the
% applications. The server is in charge of containing the PersiFS
% structure, taking care of the versioning.

% The client in the individual systems are going to run in the user
% level, interfacing the applications with the file system, like SFS,
% exposing PersiFS under \verb[/persifs/[. It is going to take care of
% the different versions, requesting the server to mount each one
% whenever it is necessary.

All data in \PersiFS\ is stored on a server, which contains the file
system structure and manages the versioning. Clients interact with the
server using a RPC-based protocol similar in spirit but somewhat
simpler than that of NFS~\cite{sandberg85design}, with support for versioned
operations.

Each client runs a user-level \PersiFS client that exposes \PersiFS\
under the \texttt{/persifs} directory. It is implemented using the SFS
user-level file system toolkit~\cite{sfs-toolkit}, minimizing the need
to change the behavior of the OS or the applications. The client acts
as an automounter, translating timestamps into revision numbers and
mounting directories under \texttt{/persifs} as necessary. It then
forwards requests to the appropriate \PersiFS\ server.

% Network file system interface
%% Server understands the file system structure and versioning
%% The client is responsible for exposing the /persifs/ interface on
%% individual computers and communicating with the appropriate
%% servers obtain and update file content
%% Interface to the server is on the same level as the NFS protocol,
%% though, of course, differs from the NFS protocol in its relative
%% simplicity and expression of versioned operations
%% (ie, the intelligence is mostly in the server, unlike block
%% server-based systems like Frangipani [ref] or Sun's NBD [ref?])

\subsection{Internal Representation} % Dan and Austin

% Storage/internal representation
% External memory model
% B+-tree over inodes
%% Better than B-tree because we can make it very flat because we only
%% have to store inode numbers in the internal nodes
% Inodes store pointers to lists of 

\newcommand{\bptree}{B\textsuperscript{+}-tree}
\newcommand{\bptrees}{\bptree{}s}

The file system server's internal representation is driven by two
structures: the \emph{layout tree} and the \emph{superblob}.  Both of
these structures are optimized for the external memory model, thus
yielding efficient operations in terms of number of disk accesses.
Furthermore, both have efficient on-disk representations.

Like traditional Unix file systems, \PersiFS\ uses inodes identified
with unique inumbers to store file and directory metadata.  Unlike
most file systems, however, inumbers are never recycled.  While not
strictly necessary, this eases inode generation and makes sense in a
system that never loses inodes.  Furthermore, this eases exportation
over NFS by eliminating the need for generation numbers in order to
uniquify file handles.  Also unlike most file systems, \PersiFS\
inumbers have no correlation to disk addresses and inode data is
stored along side file and directory contents.

The layout tree stores the overall structure of data in file system
indexed by inumber.  Actual inode and file contents are stored in the
superblob and pointed to by elements in the layout tree, avoiding the
problem of having to pack variable-sized records into the layout
tree.  The superblob is simply an append-only vector of all inodes and
file content chunks.
% XXX Mention future work of chunk pooling, garbage collection
% and compaction of unused versions, and/or chunk xdiffs here?

The layout tree is stored as a \bptree, where the branching factor is
determined by the requested block size, which itself is a parameter of
the file system.  A \bptree\ is used instead of a plain B-tree because
it allows for a higher branching factor, decreasing the depth of the
tree overall, and thus reducing the number of disk accesses necessary
to traverse it.
% Furthermore, it simplifies the operation of iterating over elements.
% XXX Motivate this sentence or nuke it

Files in \PersiFS\ consist of an inode and a sequence of
\emph{chunks}, which, concatenated, form the contents of the file.
Directories are also stored this way because they are simply files
with a special flag and a particular, internal format.  As such, the
layout tree is primarily keyed on inumbers and secondarily keyed on
\emph{chunk numbers}.  Each element is a 3-tuple of $<$inumber, chunk
number, superblob pointer$>$.  The zeroth chunk of a file is, by
convention, the inode and the remaining chunks are the content chunks
of the file.  By using a concatenation tree to represent the file
contents, \PersiFS\ is able to manipulate the contents of large files
with very little copying, while unmodified parts can remain shared
between versions.  Ropes~\cite{boehm95ropes} use a very similar
technique for manipulations of large, immutable strings.

% XXX Chunking

With the entire file system structure represented in a \bptree, it
becomes relatively trivial to add persistence.  Driscoll, et al.
describe a general technique for adding persistence to data structures
that operate under the cell-probe model, including
\bptrees~\cite{driscoll89peristent}.  Because the root itself has
multiple versions after this approach is applied, pointers to the
roots of the tree at different versions must be kept in a vector
indexed by version number.  Additionally, because time stamps, not
version numbers, are exposed by the file system interface (as
described in section~\ref{sec:design:ui}), a vector of $<$timestamp,
version number$>$ pairs must be maintained.  This can be searched
binarily to map between meaningful timestamps and internal version
numbers.  The layout tree is the only file system structure that needs
to be versioned.  Notably, the superblob does not need to be versioned
because data added to the superblob can never be removed or modified.

% Need to ponder this: http://portal.acm.org/citation.cfm?id=765854

Such a structure is difficult to manipulate without taking a global
lock.  Instead of allowing simultaneous modifications of the layout
tree, thus requiring locking, \PersiFS\ introduces a log.  The log
maintains a partial order over high-level operations to be performed
on the layout tree, including read operations.  A single background
thread is responsible for committing log operations to the layout tree,
eliminating problems with concurrent access to the tree.

The log is kept as a partial order and not simply a queue for
efficiency reasons.  By structuring the log as a partial order, commit
scheduling can be made far more flexible.  Many requests do not depend
on others, meaning that requests that require responses to the user
(for example, read operations) can be bubbled up in the log, while
other requests (such as write operations) that do not require the
operation to be committed to the layout tree for a successful
response, can be delayed.  Read operations cannot be unconditionally
brought to the front of the queue because they may depend on write
operations in the queue, introducing dependencies in the log's partial
order.
% XXX I think most reads can be done of existing versions, though I've
% unconvinced myself of how this interacts with caching, so I left it
% out of the above

\bibliographystyle{abbrv}
\bibliography{report}

\end{document}
