\begin{latexonly}
\svnInfo $Id$  
\end{latexonly}

\newenvironment{problem}{\paragraph{Potential attacks.}}{}
\newenvironment{solution}{\paragraph{Proposed solution.}}{}

We now turn our attention to specific OS features that applications
commonly depend upon. For each feature, we examine how a malicious OS
might use it to mount an attack on an application, and whether that
functionality can be securely delegated to the OS using a verifiable
interface. For concreteness, we use examples based on Linux; although
the details may differ, the components we discuss are common to most
operating systems.

\subsection{File System}
\label{sec:components:fs}

One of the most important services provided by the OS is access to
persistent storage; it is also particularly critical for security,
since both the program code and its (potentially sensitive) data are
stored on the file system.

\subsubsection{File Contents}
\label{sec:components:fs:contents}

\begin{problem}
  Protection is clearly needed for file contents. If files are stored
  unprotected, a malicious operating system could directly read an
  application's secret data as soon as it is written to disk. A
  malicious OS could also tamper with application binaries, replacing
  an application with code that simply prints out its sensitive data.
  It might also launch a replay attack, reverting a file to an earlier
  version, perhaps replacing a patched application with an earlier
  version that contains a buffer overflow. 
\end{problem}

\begin{solution}  
  Most protection architectures already provide some protection for
  file contents, thereby thwarting these attacks. For example,
  Overshadow's cryptographic secrecy and integrity protection extends
  to files stored on disk as well as memory. This is accomplished by
  translating all file I/O system calls into operations on
  memory-mapped file buffers. Since these buffers consist of memory
  pages that are shared between the application and the kernel, they
  are automatically encrypted and hashed when the OS flushes them to
  disk. Like memory regions, all files are encrypted with the same
  key, known only to the VMM and stored securely outside the VM;
  access control is independent of key management.

  To defend against tampering, reordering, and replay attacks on file
  contents, Overshadow maintains \emph{protection metadata} for each
  file, consisting of a secure hash of each page in the file, in
  addition to the randomly chosen block cipher initialization
  vectors. This protection metadata is protected by a MAC and
  freshness counter, and stored in the untrusted guest file system.
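
  As a rough illustration of this idea, the following Python sketch
  builds and checks per-file protection metadata. It is not
  Overshadow's implementation: the choice of SHA-256 and HMAC-SHA256,
  the 4096-byte page size, the 16-byte IVs, and all names are
  assumptions made for the example. Encryption is omitted, so the
  hashes simply cover whatever bytes are passed in, and the trusted
  counter argument stands in for state held outside the guest.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size for this sketch


@dataclass
class ProtectionMetadata:
    page_hashes: list  # one SHA-256 digest per page, in file order
    ivs: list          # one random 16-byte IV per page (encryption omitted)
    counter: int       # freshness counter; the trusted copy lives elsewhere
    tag: bytes = b""   # MAC over the counter, hashes, and IVs


def _pages(data):
    return [data[i:i + PAGE_SIZE] for i in range(0, len(data), PAGE_SIZE)]


def _mac(key, meta):
    # Bind the counter and the ordered (hash, IV) pairs under a single MAC.
    h = hmac.new(key, meta.counter.to_bytes(8, "big"), hashlib.sha256)
    for digest, iv in zip(meta.page_hashes, meta.ivs):
        h.update(digest + iv)
    return h.digest()


def seal(key, data, counter):
    pages = _pages(data)
    meta = ProtectionMetadata(
        page_hashes=[hashlib.sha256(p).digest() for p in pages],
        ivs=[os.urandom(16) for _ in pages],
        counter=counter,
    )
    meta.tag = _mac(key, meta)
    return meta


def verify(key, data, meta, trusted_counter):
    if not hmac.compare_digest(meta.tag, _mac(key, meta)):
        return False  # metadata was modified
    if meta.counter != trusted_counter:
        return False  # stale metadata: an older version was replayed
    pages = _pages(data)
    if len(pages) != len(meta.page_hashes):
        return False  # pages were added or removed
    return all(
        hmac.compare_digest(hashlib.sha256(p).digest(), d)
        for p, d in zip(pages, meta.page_hashes)
    )
```

  Because each hash is tied to a page position, swapping two pages is
  detected in the same way as modifying one, and comparing the stored
  counter against the trusted copy rejects an older but otherwise
  valid version of the file.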

  %\begin{ednote}{DRKP}
    %Relate this to how other systems secure file contents.
  %\end{ednote}
\end{solution}

\subsubsection{File Metadata}
\label{sec:components:fs:metadata}

\begin{problem}
  More subtly, file \emph{metadata} needs to be protected, including
  file names, sizes, and modification times. Many designs omit this
  aspect, relying on the OS for services such as pathname
  resolution. As a result, a malicious OS could perform a pathname
  lookup incorrectly. Even a system that protects file contents may be
  subverted if the OS redirects a request for a protected file to a
  \emph{different} but still valid protected file. For example, with
  Overshadow's file protection mechanism described above, such an
  attack could succeed if the OS also redirected the access to the
  protection metadata file that contains the hashes used to verify
  the file. The OS can only redirect file lookups to valid, existing
  protected files, but this still opens many possibilities for
  attack, such as redirecting a web server's request for
  \texttt{index.html} to its private key file instead.
\end{problem}

\begin{solution}  
  We propose using a trusted, protected daemon to maintain a secure
  namespace, mapping a file's pathname to the associated protection
  metadata. Applications can communicate with this daemon over a
  protected IPC channel (as described in
  Section~\ref{sec:components:ipc}), requesting directory lookups when
  files are opened, and updating the namespace when files or
  directories are created, removed, or renamed. Each file or directory
  can also have an associated list of SIDs (\emph{i.e.}~applications)
  that are allowed to access it. Maintaining this namespace requires
  adding code for directory lookups to the TCB, but this can be far
  smaller than a full file system implementation. This design was
  proposed in Overshadow~\cite{chen08:_overs}, and also used in
  VPFS~\cite{weinhold08:_vpfs}, a similar file system architecture for
  L4 that uses a small trusted server and an untrusted commodity file
  system to reduce TCB size.
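
  To make the daemon's role concrete, the sketch below models the
  namespace as an in-memory table. It is only an illustration under
  our own assumptions: the class and method names, the use of a flat
  pathname table rather than a directory tree, and the opaque
  metadata handles are all hypothetical, and the protected IPC
  transport is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    metadata_id: str  # opaque handle for the file's protection metadata
    allowed_sids: set = field(default_factory=set)


class Namespace:
    """Pathname table kept in the daemon's protected memory (sketch)."""

    def __init__(self):
        self._entries = {}  # pathname -> Entry

    def create(self, path, metadata_id, allowed_sids):
        if path in self._entries:
            raise FileExistsError(path)
        self._entries[path] = Entry(metadata_id, set(allowed_sids))

    def lookup(self, path, sid):
        entry = self._entries.get(path)
        if entry is None:
            raise FileNotFoundError(path)
        if sid not in entry.allowed_sids:
            raise PermissionError(path)
        return entry.metadata_id

    def rename(self, old, new):
        if old not in self._entries:
            raise FileNotFoundError(old)
        if new in self._entries:
            raise FileExistsError(new)
        self._entries[new] = self._entries.pop(old)

    def remove(self, path):
        if path not in self._entries:
            raise FileNotFoundError(path)
        del self._entries[path]
```

  An application that opens a file would first ask the daemon for the
  metadata handle bound to the requested pathname, and then verify
  the file returned by the OS against that metadata, so a redirected
  lookup fails verification.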

  Alternatively, as noted in~\cite{weinhold08:_vpfs}, storing a hash of
  a file's pathname in its header provides a much simpler way to verify
  that pathnames are looked up correctly, but does not allow directory
  contents to be enumerated.
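
  The pathname-hash alternative can be sketched in a few lines. The
  header layout, the use of SHA-256, and the function names below are
  assumptions for illustration, and the sketch presumes that the
  header itself is covered by the file's integrity protection.

```python
import hashlib
import hmac


def make_header(pathname: str) -> bytes:
    # Stored in the protected file header when the file is created.
    return hashlib.sha256(pathname.encode("utf-8")).digest()


def opened_expected_file(requested_path: str, header: bytes) -> bool:
    # Checked by the application-side shim after the OS returns a file.
    return hmac.compare_digest(make_header(requested_path), header)
```

  A lookup that the OS redirects to another protected file then fails
  the check, because that file's header carries the hash of a
  different pathname.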

  Similar ideas are used by other systems that build secure storage on
  an untrusted medium. VPFS also uses a trusted server to store file
  system metadata, although it uses different techniques to secure
  file contents~\cite{weinhold08:_vpfs}.
  SUNDR~\cite{li04:_secur_untrus_data_repos_sundr}
  and Sirius~\cite{goh03:_sirius} are distributed file systems that
  use client-side cryptography to avoid trusting the file server;
  because they have no trusted storage, they cannot guarantee
  freshness, and are therefore subject to \emph{fork attacks}, where
  the file server presents different versions to different
  clients. Like TDB~\cite{maheshwari00:_how_to_build_trust_datab}, we
  have available a small amount of trusted storage (in our case, in
  the VMM) that can be used to guarantee freshness.
\end{solution}

\subsection{Inter-Process Communication}
\label{sec:components:ipc}

A trusted inter-process communication mechanism is a key component of
a secure system. In addition to protecting application communications,
it is a useful building block for constructing other secure
components; for example, it is necessary for communicating with the
file system namespace daemon.

\begin{problem}
  IPC channels provided by the OS are insecure, and thus face all of
  the standard problems inherent in communication over an untrusted
  channel. A malicious OS might spy on IPC messages between protected
  processes, or might tamper with, drop, delay, reorder, or spoof
  messages.

  Many attacks are possible as a result. For example, a secure
  application might consist of a database of sensitive information
  such as credit card numbers that is accessible only through a
  restricted web interface. A malicious OS could observe the credit
  card numbers as they are transmitted over the web server's IPC
  connection to the database server, or it could tamper with the
  database by sending spoofed requests over the IPC connection.

  More subtle attacks are also possible, much like the attacks on file
  metadata. Rather than directly inspecting the contents of a protected
  application's IPC channel, the OS might redirect the connection to
  a different process that would then expose the data, such
  as \texttt{/bin/cat}. The OS could also simply refuse to deliver any
  messages between two processes.
  %\begin{ednote}{DRKP}
    %Good example of OS dropping messages causing a correctness
    %problem?
  %\end{ednote}
\end{problem}


\begin{solution}
  One way to provide secure IPC is to implement it entirely in the
  trusted layer, by setting up a message queue in the VMM. Processes
  could then enqueue messages or check for pending messages via a secure
  hypercall. A problem with this approach is that it is impractical for
  applications to poll for messages, since polling requires either
  waking up each process regularly or tolerating high message
  latency. However, we can use the guest operating system to provide
  asynchronous notifications: after sending a message through the VMM,
  the sender also sends the receiving process a signal through the guest
  OS. Because the guest OS does not handle message data, it cannot
  impact confidentiality, integrity, or ordering; the OS is relied upon
  only for availability.
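
  The division of labor can be illustrated with a small simulation.
  The class below merely stands in for queues held in the trusted
  layer, and the notification callback stands in for a signal sent
  through the guest OS; all names are hypothetical and no real
  hypercall interface is implied.

```python
from collections import defaultdict, deque


class VmmMailbox:
    """Stand-in for per-receiver message queues held in the trusted layer."""

    def __init__(self):
        self._queues = defaultdict(deque)  # receiver id -> FIFO of messages

    def send(self, sender, receiver, payload):
        # Models the "enqueue" hypercall; the payload never passes
        # through the guest OS.
        self._queues[receiver].append((sender, bytes(payload)))

    def receive(self, receiver):
        # Models the "check for pending messages" hypercall.
        queue = self._queues[receiver]
        return queue.popleft() if queue else None


def send_with_notification(vmm, sender, receiver, payload, notify):
    vmm.send(sender, receiver, payload)
    # Untrusted hint delivered by the guest OS; it may be dropped or delayed.
    notify(receiver)
```

  If the guest OS drops the notification, the message still sits in
  the trusted queue in its original order; only the time at which the
  receiver notices it is affected.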

  However, although this approach is suitable for small, infrequent
  messages, it is not ideal for large data transfers, both because
  data must be copied into and out of the VMM and because we want to
  keep VMM complexity to a minimum. Instead, we can use shared memory
  regions for most of the communication, using VMM-assisted
  communication only for bootstrapping the secure channel. Specifically,
  a protected process wishing to communicate with another process in the
  same compartment would create a shared memory region (\emph{e.g.\
  }using \texttt{mmap}), and populate it with a pair of message
  queues. Using Overshadow's protection mechanism for memory-mapped file
  contents, the OS cannot read or modify the contents of the shared
  memory region. However, the OS manages the namespace of these shared
  memory regions, so it might still attempt to map in a different
  region, such as the one corresponding to a different IPC channel. To
  defend against this, the sender can place a random nonce in the memory
  region, and communicate it securely to the recipient through the
  VMM. As before, the untrusted OS's signals can be used as asynchronous
  notifications.
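
  The nonce check can be sketched as follows. Placing the nonce at the
  start of the region, the 16-byte nonce length, and the function
  names are assumptions for illustration; byte arrays stand in for
  protected shared memory, and the transfer of the nonce through the
  VMM is not shown.

```python
import hmac
import os

NONCE_LEN = 16  # assumed nonce length


def init_region(region: bytearray) -> bytes:
    # Sender: write a fresh random nonce at the start of the region.
    nonce = os.urandom(NONCE_LEN)
    region[:NONCE_LEN] = nonce
    return nonce  # to be passed to the peer through the trusted channel


def region_matches(region: bytearray, expected_nonce: bytes) -> bool:
    # Receiver: confirm that the region the OS mapped is the intended one.
    return hmac.compare_digest(bytes(region[:NONCE_LEN]), expected_nonce)
```

  If the OS maps in the region belonging to a different channel, the
  nonce found there does not match the one received through the VMM,
  and the receiver can refuse to use the channel.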

  Implementing IPC in this way guarantees secrecy, integrity, and
  ordering, but it cannot guarantee that messages are received in a
  timely manner (or at all), because the operating system can delay
  or terminate any of the processes involved. We could have
  added acknowledgements to our message-passing protocol, blocking the
  sender until the receiver acknowledges the message, but chose not to
  because the OS could still stop the receiving process after it
  acknowledges the message but before acting on it. Instead, we
  require that applications not assume messages have been received
  unless they implement their own acknowledgement protocol. This is
  sound practice even with a correctly functioning OS, as the
  receiving process might be slow or have crashed.  
\end{solution}

\subsection{Process Management}
\label{sec:components:process}

The OS is responsible for the management of processes, including
starting new processes and terminating existing processes. In
addition, it manages process identities, which applications rely on
for directing signals and IPC messages. This opens several avenues of
attack.

\begin{problem}
  Although the OS cannot interfere with program execution contexts and
  control flow during normal operation, it might be able to do so when
  a new process is started. For example, when a process forks, the OS
  might initialize the child's memory with malicious code instead of
  the parent's, or set the starting instruction pointer to a different
  location. Signal delivery also presents an opportunity for a
  malicious OS to interfere with program control flow, since the
  standard implementation involves the OS redirecting a program's
  execution to a signal handler.

  A malicious OS might try to redirect signals, process return values,
  or other information to the wrong process. It might attempt to change
  a process's ID while it is running, or send the wrong child process ID
  to a parent.
\end{problem}

\begin{solution}
  Solutions for securing control flow for newly created processes are
  relatively well understood. Overshadow interposes on \texttt{clone}
  and \texttt{fork} system calls to set up the new thread's initial
  state. This includes cloning the memory integrity hashes and thread
  context (including the instruction pointer), thereby ensuring that
  the new thread can only be started in the same state as its parent.
  %\begin{ednote}{DRKP}
    %XOMOS does the same thing, I think --- need to check
  %\end{ednote}

  To ensure that signals are delivered to the correct entry point,
  Overshadow also maintains its own protected table of the application's
  signal handlers. It registers only a single signal handler with the
  kernel, which immediately makes a hypercall to the VMM. The VMM then
  securely transfers control to the appropriate signal handler.
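
  The dispatch step can be modeled as a lookup in a protected table.
  The sketch below is a simulation only: the class and method names
  are hypothetical, and the call into the dispatch routine stands in
  for the hypercall made by the single handler registered with the
  kernel.

```python
import signal


class SignalTable:
    """Application signal handlers kept in protected memory (sketch)."""

    def __init__(self):
        self._handlers = {}  # signal number -> handler

    def register(self, signum, handler):
        # Recorded in the protected table instead of being given to the kernel.
        self._handlers[signum] = handler

    def dispatch(self, signum):
        # Reached from the one kernel-visible handler. Only entry points
        # that the application registered can ever receive control.
        handler = self._handlers.get(signum)
        if handler is None:
            raise LookupError(f"no handler registered for signal {int(signum)}")
        return handler(signum)
```

  Since the kernel only ever sees the single trampoline, it can choose
  whether and when a signal is delivered, but not where execution
  resumes.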

  We can address the problems related to the OS managing process
  identity by using an independent process identity in conjunction
  with the secure IPC mechanism of
  Section~\ref{sec:components:ipc}. Whenever a new process is created,
  it is assigned a secure process ID (SPID) to identify it for secure
  IPC purposes; this is an identifier that is conceptually independent
  of the OS's process ID, although with a correctly functioning OS
  there will be a one-to-one relationship. When a process forks, the
  child's SPID is communicated to the parent, along with the OS's
  process ID, via a secure IPC message. When one process wants to send another
  a signal, it sends a secure IPC message identifying itself and the
  signal. Similarly, when a process exits, it sends its return value
  securely to its parent.
\end{solution}

\subsection{Time and Randomness}
\label{sec:components:time}

\begin{problem}
  The operating system maintains the system clock, which means that
  security-critical applications cannot rely on it. A malicious OS
  could speed up or slow down the clock, which could allow it to
  subvert expiration mechanisms in protocols like
  Kerberos~\cite{neuman94:_kerber} or time-based authentication
  schemes. It might also cause the clock to move backwards, an
  unexpected situation that could expose bugs in application code.

  In addition, the standard system source of randomness comes from the
  OS, making it unsuitable for use in cryptographic applications. A
  malicious OS could use this to control private keys generated by an
  application, or defeat many cryptographic protocols.
\end{problem}

\begin{solution}
  We see little alternative but to create a trusted clock and
  source of secure randomness. In our system, these would be
  implemented in the VMM, and time-related system calls and access to
  \texttt{/dev/random} would be transformed into hypercalls.

  Although this requires adding trusted components to the
  system, the TCB expansion is not significant because the VMM or
  microkernel likely already includes time and entropy services for
  its own use. However, keeping time perfectly synchronized between
  the guest OS's clock and the VMM's can be challenging even with a
  correctly functioning OS~\cite{vmware05:_timek_in_vmwar_virtual_machin}.
  
  %\begin{ednote}{DRKP}
    %Not as straightforward as it sounds; reference VMM timekeeping
    %complexities.
  %\end{ednote}
\end{solution}

\subsection{I/O and Trusted Paths}
\label{sec:components:io}

\begin{problem}
  An application's input and output paths to the external world go
  through the operating system, including display output and user
  input. The OS can observe traffic across these channels, capturing
  sensitive data as it is displayed on the screen, or input as the
  user types it in (\emph{e.g.\ }passwords). It could also send fake
  user input to a protected application, or display malicious output,
  such as a fake password entry window.

  Network I/O also depends on the operating system, but this poses
  less of a problem because many applications already treat the
  network as an untrusted entity. Cryptographic protocols such as SSL
  are sufficient to solve this problem, and are already in common use.
\end{problem}

\begin{solution}
  There are many complex issues inherent in designing a secure GUI,
  such as labeling windows and securing passphrase entry; many of
  these have been studied extensively in the context of multi-level
  secure operating systems~\cite{berger90:_compar_mode_works,
    shapiro04:_desig_of_eros_trust_window_system}. We do not address
  them here, but focus on the question of how to achieve a trusted
  path that does not rely on the operating system.
  
  A simple approach that maintains backward compatibility with
  existing applications is to run a dedicated, trusted X server in the
  application's compartment. Overshadow's memory protection can ensure
  that only the application and the virtual graphics card can access
  the server's framebuffer in unencrypted form. This approach requires
  adding the entire X server and its dependencies to the application's
  TCB. The situation may not be as grim as the number of lines of code
  would suggest, however, because the attack surfaces are limited.
  The trusted display server's interfaces can be limited to a secure
  socket to the protected application and the virtual I/O devices,
  so an attacker cannot easily interact with it.
  
  Nevertheless, we might still wish to avoid trusting the entire
  display server, and instead use an untrusted display server that
  manages the display without having access to the contents of
  windows. It seems possible to achieve this using a window system
  architecture where applications render their window contents into
  buffers, and the window server simply composites them. It is not
  clear, however, how to implement this in a way that maintains
  compatibility with existing applications.
\end{solution}

\subsection{Identity Management}
\label{sec:components:identity}

The OS is responsible for managing a number of types of identities; we
have already discussed the need to secure file system names and
process IDs. Several others also exist, including user and group IDs
and network endpoints (IP addresses, DNS names, and port numbers).

\begin{problem}
  These OS-managed identities are frequently used in authentication:
  applications often use the user ID of a local process or the IP
  address of a remote host to determine whether to grant access to a
  client. A malicious OS could cause a connection from an attacker to
  appear to be coming from a trusted local user or host.
\end{problem}

\begin{solution}
  Applications should not rely on these identities for authentication
  or other security-critical purposes. Secure authentication can
  be implemented cryptographically for either local or remote
  connections. It may also sometimes be possible
  to securely authenticate a local connection simply by verifying that
  both endpoints are in the same secure compartment.
\end{solution}

\subsection{Error Handling}
\label{sec:components:error}

When a system call fails, the operating system returns an error code
that, in addition to indicating failure, gives a reason for the
failure. A malicious OS might return an incorrect error code,
affecting a protected application's control flow. There are several
types of violations. It is relatively straightforward to detect values
that are clearly invalid according to the system call specification,
such as a ``bad file descriptor'' error on a \texttt{fork}
call. However, the OS might return a legitimate error code for an
error that did not take place. Certain error codes can be verified
because they correspond to functions that are implemented in or can be
verified by trusted components. For example, if \texttt{open} returns
a ``no such file'' error, the trusted namespace daemon
(Section~\ref{sec:components:fs:metadata}) can distinguish between
cases where the file legitimately never existed and those where the OS
is trying to conceal the file. Other error codes cannot easily be
verified: for example, the OS might return a ``network unreachable''
error on \texttt{connect} even though no network error took place. In these cases,
applications cannot rely on the error codes being accurate for
safety.
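
The first kind of check, rejecting error codes that the system call
specification does not allow, can be sketched as a table lookup. The
table below is an illustrative and deliberately incomplete subset
chosen for the example, not a full specification, and the function
name is hypothetical.

```python
import errno

# Illustrative subset of the error codes each call may legitimately return.
PLAUSIBLE_ERRORS = {
    "fork": {errno.EAGAIN, errno.ENOMEM, errno.ENOSYS},
    "connect": {
        errno.EBADF,
        errno.ECONNREFUSED,
        errno.ENETUNREACH,
        errno.ETIMEDOUT,
    },
}


def is_plausible(syscall: str, err: int) -> bool:
    allowed = PLAUSIBLE_ERRORS.get(syscall)
    if allowed is None:
        return True  # no table entry for this call, so nothing to check
    return err in allowed
```

Such a filter catches a ``bad file descriptor'' error returned from
\texttt{fork}, but it cannot tell whether a permitted code such as
``network unreachable'' reflects an error that actually occurred.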

Given that error checking is essential for constructing robust
software, it may seem alarming that error codes cannot always be
guaranteed accurate. However, applications frequently use error values
in ways that do not affect the secrecy of sensitive data. In our
experience, the most common responses to a failed system call are to
ignore the error, retry the system call, or fail-stop --- in most
cases, applications are mainly concerned with whether the system call
succeeded rather than the reason for failure. There are a few common
exceptions, such as the ``no such file'' error mentioned above, but
these errors are often verifiable and thus can be relied upon. Indeed,
many of the errors that cannot be verified correspond to availability
problems (\emph{e.g.}~``out of memory'' or other resource limit
errors), or untrusted components such as the network.

%%% Local Variables: 
%%% mode: latex
%%% TeX-command-default: "Make"
%%% TeX-PDF-mode: t
%%% TeX-master: "paper"
%%% End: 

% LocalWords:  metadata pathname lookup lookups namespace TCB IPC pathnames VMM
% LocalWords:  enqueue hypercall