% -*- TeX-master: "dp2.tex" -*-
% $Id: mission-distribution.tex,v 1.4 2004/05/06 16:51:44 dan Exp $

The mission distribution subsystem constitutes the logic that runs on
the control center. It receives missions and amendments from Earth,
and efficiently allocates them to rovers. In addition to basic
distribution, it accounts for possible rover failure, schedules
experiment repetition to account for sensor failure, and allows for
mission amendments.

The mission distribution subsystem allocates \emph{assignments} to
rovers. An assignment is a sequence of instructions for one rover that
must be executed in order, i.e. ``travel to location 37.33, -121.03;
take a photo; travel to location 37.45, -120.82; take another photo;
return to the control center''. The instruction format is described in
Section~\ref{sec:mission-execution} and
Table~\ref{tab:instruction-format}. Each assignment is transmitted by
the control center to a rover, which then executes the assignment and
returns the results to the control center. The process by which rovers
execute assignments is described in detail in
Section~\ref{sec:mission-execution}

\subsection{State Information}
\label{sec:distribution-state}

In order to manage the allocation of missions to rovers, the central
controller must keep track of the current state of the system at all
times. To accomplish this, it maintains two state tables: a table of
rover states, and a table of mission states.

The table of mission states primarily associates a mission ID with the
list of \emph{tasks} required to complete the mission. We define a
task to be a tuple consisting of an experiment, and the location at
which it needs to be performed; in addition, we associate with each
task record a list of the assignments that involve it. We also
associate additional data with each mission, as in
Table~\ref{tab:mission-state}. To deal with amendments, as in
Section~\ref{sec:new-missions}, we give each mission record a revision
number. We also track dependencies --- missions that must be completed
before this one -- for each mission; since we split scattered and
scattered-cluster missions into their individual components (see
Section~\ref{sec:new-missions}), this ensures that the components are
executed in order. Finally, we introduce the notion of a
\emph{priority value} for the mission, which captures the relative
importance of each mission. This is used in the scheduling algorithm,
and is described extensively in Section~\ref{sec:priority-values}.

\begin{table}[htbp]
  \centering
  \begin{tabular}{|c|c|}
    \hline
    \textbf{Field} & \textbf{Type} \\
    \hline
    Mission ID & integer \\
    Revision number & integer \\
    Tasks & list of task records (experiment/position/assignments) \\
    Dependencies & list of mission IDs \\
    Priority value & integer \\
    \hline
  \end{tabular}
  \caption{Mission state table fields}
  \label{tab:mission-state}
\end{table}

The table of rover states associates each rover's ID number with its
current status. There are four possible states each rover can be in:

\begin{itemize}
\item \emph{available}, meaning it is present at the control
  center and waiting to be given an assignment
\item \emph{busy}, meaning it is currently executing an assignment
\item \emph{lame}, meaning it is currently executing an assignment
  that has become moot due to a mission amendment
\item \emph{dead}, meaning that the control center has not heard from
  the rover and has presumed it to have failed
\end{itemize}

For rovers that are not in the available state, we associate with them
their current assignment (an assignment record), and the time at which
it was assigned. We also compute the expected return time, which is
the sum of the start time, the required travel time for the
assignment, and the sum of the predicted times required to complete
each experiment.

\begin{table}[htbp]
  \centering
  \begin{tabular}{|c|c|}
    \hline
    \textbf{Field} & \textbf{Type} \\
    \hline
    Rover ID & integer \\
    Rover state & 2-bit state type (see above) \\
    Current assignment & assignment record \\
    Start time & time \\
    Expected return time & time \\
    \hline
  \end{tabular}
  \caption{Rover state table fields}
  \label{tab:rover-state}
\end{table}


\subsection{Mission Scheduling and Assignment Algorithm}
\label{sec:scheduling-and-assignment}

The control center assigns missions to available rovers. It does so by
periodically checking which rovers are in the available state: at
regular intervals, it sends out a ``who's-here?'' message. It gathers
responses from the nearby rovers, and once a short timeout period has
elapsed, begins assigning missions to any nearby rovers that are in
the available state, according to the rover state table.

Mission scheduling is performed by a greedy algorithm. Our algorithm
is designed to meet the principal design objective of \emph{maximizing
  useful work}. We assume that rovers will fail using a memoryless
failure model having a mean time to failure of 100 days, and this
failure is independent of whether the rovers are active or idle.
Hence, in order to complete the maximum amount of work before all
rovers fail, it is optimal to have all the rovers doing as much work
as often as possible; idle rovers are ``wasted resources.''

The basic greedy algorithm is augmented by three additional
concepts. Priority values are used to express the relative priority of
missions; this scheme allows the most valuable missions to be executed
first. If beneficial, multiple missions may be \emph{batched} and
executed sequentially by the same rover. Finally, if additional rovers
remain idle with no missions to execute, they may be assigned to
cooperate with some other rover on an assignment that can be executed
in parallel.

\subsubsection{Priority Values}
\label{sec:priority-values}

In particular, we would like to accomplish the most useful work as
quickly as possible. This concept of relative utility is expressed by
the priority value associated with each mission. This is an extension
to the design specifications, but it has several important properties
that make it a natural addition. The first is the very likely
possibility that all missions are not of equal value. Second, even in
the original non-prioritized case, it neatly resolves the ambiguity of
whether each mission has the same value, or whether longer missions
are inherently more valuable than shorter ones: in the former case,
each mission can be assigned the same priority value, and in the
latter case each mission can be given a priority value proportional to
its length. It also allows NASA to provide low-priority experiments
that will be executed only if rovers are available, which can help
ensure that rovers are never idle. Finally, though we do not
explicitly address starvation in this design (we presume that
higher-valued missions really are valued higher, and hence should be
prioritized over others), it is easy to imagine a solution that would
address starvation by incrementing the priority values for missions
that have been in the queue longer.

Accordingly, we rank available assignments in terms of the quotient
$\left(\frac{\mbox{priority value}}{\mbox{required time}}\right)$, and
  assign them greedily. We define a procedure \proc{Greedy-Select}
  that takes a list of assignments, ranks them, and selects the
  highest-ranked. We will use this to iteratively assign the mission
  with highest quotient to the first available rover, and eliminate it
  from the available mission pool.


\subsubsection{Batching}
\label{sec:batching}

We also allow for \emph{batching} of multiple missions when it is
advantageous. That is, we may consider having a rover execute one
mission, then proceed to execute a different mission without returning
to the control center first. The advantage of this scheme is obvious:
if the two missions take place close together, more work can be
accomplished in the same time, since we avoid the overhead of having
to travel along the possibly-circuitous route from one mission site
back to the control center then to the second mission site. The
drawback is that the rover will be away from the control center
longer, and the probability that it will fail while executing its
assignment increases accordingly. Moreover, a rover failure has the
potential to be more catastrophic because it can cause the data for
\emph{all} the missions it is executing to be lost, not just a single
mission.

We attempt to quantify the benefit or cost of a particular batching
assignment of two missions $a$ and $b$ using this reasoning, in the
following equation:

\begin{multline*}
  \proc{Travel-Time}(\proc{End}(a) \to \proc{Start}(b))
- 
\shoveleft{  \proc{Travel-Time}(\proc{End}(a) \to \id{control-center} \to
  \proc{Start}(b))  }
  \\
\shoveleft{>
  \proc{Work-Required}(a) * \id{failure-probability} *}\\
  \shoveleft{ (\proc{Travel-Time}(\proc{End}(a) \to \proc{Start}(b))\; 
          +\proc{Work-Required}(b)}\\
         \shoveleft{ +\proc{Travel-Time}(\proc{End}(b) \to \id{control-center})
          -\proc{Travel-Time}(\proc{End}(a) \to \id{control-center})}
        )
\end{multline*}

This equation takes into account the benefit of the time saved by
traveling from $a$'s endpoint directly to $b$, rather than through the
control center, and weighs it against the product of the probability
that the rover will fail in the extra time it spends performing
mission $b$, with the work from mission $a$ that would be lost in such
a failure.

Before performing mission assignments, the mission distributor on the
control center first examines all pairs of available missions, and
evaluates whether it would be beneficial to batch them using this
metric. If so, it joins the two, and then repeats the process to see
if there are any other missions it would be beneficial to batch with
the pair. When we join two assignments together, we sum their priority
values; this reflects that the new assignment will be doing more
useful work. This process is repeated until no further beneficial
batchings remain.


\subsubsection{Cooperative Execution}
\label{sec:cooperative-rovers}

Our scheduling approach is optimized for the case in which there are
more unassigned missions than available rovers. In this case, it can
assign one rover per mission (and sometimes one rover per multiple
missions, when batching can be performed), selecting the missions that
provide the greatest benefit the most rapidly. This process ensures
that the most limited resource, rover time, is allocated effectively.

It is reasonable to assume that more missions will be available than
rovers. First, it is always possible for NASA to upload more missions
to ensure an ample supply is available. Since we divide scattered and
scattered-cluster missions into their component parts
(Section~\ref{sec:new-missions}), the mission distributor actually has
more missions in its queue than were transmitted by NASA. Finally, as
rovers inevitably fail, the number of remaining rovers will dwindle,
so that eventually there will be more missions than rovers, even if
originally more rovers than missions were available.

However, a recent amendment to the specifications proclaimed that at
any given time there would be approximately 25 missions either being
executed or waiting to be distributed. Hence, when we have a mission
available that requires some experiment to be performed repeatedly in
order to achieve the desired failure probability, we allow for
multiple rovers to be cooperatively assigned to the same mission.

Though this is not optimal in terms of maximizing useful work done,
since both rovers will incur the overhead of traveling to the
appropriate location, it allows missions to be performed more rapidly.
Additionally, the multiple rovers provide redundancy: if one rover
fails while performing the experiment, its previous results will have
been replicated onto the other rovers, minimizing the amount of result
data lost. It is certainly appropriate to perform cooperative
execution if idle rovers are available at the control center, since
their time would otherwise go to waste.

The algorithm in Figure~\ref{fig:cooperative-pseudocode} is used to
determine whether to assign rovers cooperatively to a mission
assignment. First, there must be rovers available and no outstanding
unassigned missions. Next, each assignment that has been previously
assigned to another rover is considered. The assignment must have
fewer rovers executing it than the maximum number of repetitions
required for any experiment in the assignment. The algorithm also
takes into account how long it has been since the previous rovers were
assigned to that mission: if the rovers had been assigned a long time
in the past, they may be almost complete executing the mission, and
there is no point to assigning more rovers to it. Hence, we take the
sum of the times since all previous rovers had been given that
assignment, divide it by the duration of the experiment requiring the
most repetitions, and subtract that from the maximum number of
repetitions required. If the resulting number of repetitions is
greater than 1, the assignments will be considered for cooperative
execution. If multiple assignments satisfy this criterion, the
allocation will be performed greedily using the same metric as for
normal assignments.

\begin{figure}[htbp]
  \centering
  \begin{codebox}
    \Procname{\proc{Assign-Cooperative-Assignments}()}
    \li \While rovers are available \textbf{and} no missions are unassigned
    \li \Do
      $\id{available-assignments} \gets \{\}$
      \li \For \id{assignment} \textbf{in} \id{busy-rover-assignments}
      \li \Do
        $x \gets
        \proc{Maximum-Number-of-Repetitions-in}(\id{assignment})$
        \li $x \gets x -
        \proc{Number-of-Rovers-Currently-Executing}(\id{assignment})$
        \li \If $x > 1$
        \li \Then
          $\id{available-assignments} \gets \id{available-assignments} \;\cup
          \; \{\id{assignment}\}$
      \End
      \End
    \li $\id{selected-assignment} \gets \proc{Greedy-Select}(assignments)$
    \li \proc{Assign}(\id{selected-assignment}, first available rover)
    \End
    \end{codebox}
  \caption{Cooperative assignment pseudocode}
  \label{fig:cooperative-pseudocode}
\end{figure}

\subsubsection{The Algorithm}
\label{sec:distributing-algorithm}

The mission distributing algorithm is presented in
Figure~\ref{fig:distributing-algorithm}. The mission distributor first
takes the list of available mission assignments, and batches them
together whenever appropriate. It then ranks the assignments in terms of
the quotient of their priority value and the required time-to-execute,
and greedily assigns them to available rovers. If all missions are
assigned and additional rovers remain, they will be considered for
cooperative assignment execution (as described above).

\begin{figure}[htbp]
  \centering
  \begin{codebox}
    \Procname{\proc{Distribute-Missions}()}
    \li \Comment Initialize the list of possible assignments from the
    available missions list
    \li $\id{assignments} \gets \{\}$
    \li \For $\id{mission}$ \textbf{in} \id{unassigned-missions}
    \li \Do
          $\id{assignments} \gets \id{assignments} \;\cup\;
            \{\id{mission}.\mbox{tasks}\}$
        \End
   \li
   \li \Comment Batch assignments together when beneficial
   \li \Comment Beneficial batchings are identified using the
   equation in Section~\ref{sec:batching}
   \li \While $\proc{Identify-Beneficial-Batchings}(\id{assignments})
            \neq \emptyset$
   \li \Do
         $\{a,b\} \gets
         \proc{First}(\proc{Identify-Beneficial-Batchings}(\id{assignments}))$
   \li   $\id{batched-assignment} \gets \proc{Join-Assignments}(a,b)$
   \li   $\id{assignments} \gets \id{assignments} \;\cup\;
         \{\id{batched-assignment}\} \; \setminus \; \{a,b\}$
       \End
   \li
   \li \Comment Greedily assign rovers to missions
   \li \While rovers are available \textbf{and} $\id{assignments} \neq
   \emptyset$
   \li   \Do
           $\id{selected-assignment} \gets \proc{Greedy-Select}(assignments)$
   \li     \proc{Assign}(\id{selected-assignment}, first available
           rover)
   \li     $assignments \gets assignments \;\setminus\;
   \{\id{selected-assignment}\}$
         \End
   \li
   \li \If rovers are still available
   \li   \Do
          \proc{Assign-Cooperative-Assignments}()
         \End
     
  \end{codebox}
  \caption{Mission distribution algorithm pseudocode}
  \label{fig:distributing-algorithm}
\end{figure}

\subsection{Result Collection}
\label{sec:result-collection}

The control center handles the task of retrieving mission results from
rovers that have completed their assignments, and transmitting this
data to NASA on Earth. When a rover returns to and locates the control
center, it begins a transmission to the control center. It then
transmits the contents of its mission results buffer to the control
center. This is a sequence of mission results, each tagged with
appropriate metadata: the location and type of experiment performed,
the mission ID and revision number of the mission that the experiment
was performed for, the ID number of the rover that performed the
experiment, and the time at which the experiment was performed.

When a mission result is received, the control center adds it to a
queue of results awaiting transmission back to Earth. The only
exception is if the same result --- with identical metadata, including
the performing-rover and timestamp --- is already in the queue. In
this case, the duplicate result is simply discarded. In all other
cases, the result is queued for transmission. We do so even if the
result is presumed to be invalid due to a sensor fault, since the
scientists at NASA may wish to attempt to recover meaningful data or
diagnose the faulty sensor for it. Similarly, we also return results
even if they may no longer be meaningful due to an amendment, since
the results may still be useful and it seems wasteful to simply
discard them. In doing so, we make the assumption that there is
adequate bandwidth (albeit high latency and limited availability) to
transmit all the data; if this is not the case, it may be necessary to
discard presumed-invalid results.

The control center is periodically queried by NASA on Earth to see if
there are any mission results available. If so, it transmits all
available mission results. Once the results are acknowledged, they are
deleted from the control center's queue. If all results pertaining to
a mission have been completed and transmitted to Earth, and no rovers
are currently assigned to that mission, the mission may be considered
complete and removed from the mission status table.

\subsection{Handling Rover Failures}
\label{sec:rover-failures}


Rovers can fail unpredictably. The control center can only detect
failures when it fails to receive any response from the rover. There
must therefore be a time after which the control center assumes the
rover has failed if no response has been received. We compute this
time by summing the required travel time and the time required to
perform and repeat the experiments. We then apply a ``slack'' scaling
factor that accounts for inaccuracies in travel and experiment times
estimate. The value of this scaling factor depends, of course, on the
quality of the estimates being used, and should be chosen accordingly.


The control center periodically iterates through the list of rovers,
and checks whether the difference between the current time and the
time that rover's mission was assigned is greater than the maximum
time previously mentioned. If the mission has not been completed by
this time, it is marked unassigned and the rover assigned to it is
marked dead. If it turns out that the rover was in fact still
operating, and it returns after being previously marked dead, the
experiment results will be downloaded, the mission marked completed,
and any new rover assigned to that mission treated as lame, as though
a mission amendment had been received. The second set of mission
results will still be returned to Earth, as described in
Section~\ref{sec:result-collection}.

We originally considered the possibility of sending a ``interceptor''
rover to travel to the location of the lame rover and inform it that
its mission has been made moot by an amendment. The two rovers could
then either return to the control center for a new assignment, or
proceed to execute the next mission. Though this could conceivably
eliminate some of the wasted time used by lame rovers that continue to
execute missions that are no longer relevant, it is fraught with
complexities and we concluded that it would be impractical. First, it
is not always beneficial to send an interceptor rover; if the lame
rover has almost completed executing its mission, the minimal time
savings will be nullified by the time required to send the
interceptor. More troublesome, it is non-trivial to correctly deliver
a message. The location of the target rover must be ascertained, and
this becomes impractical because of the number of variables
involved. The travel time and experiment execution times are only
estimates, so it is impossible to compute with complete certainty
where the rover will be found at a certain time. If the interceptor
does not find its target rover where it was expected to be, it cannot
determine whether the rover has failed, or whether it has simply
finished its earlier tasks more quickly or slowly than expected. In
light of these complexities, we opted not to use this approach.


\subsection{New Missions and Amendments}
\label{sec:new-missions}

When a mission is received, it is added to the mission table in the
unassigned state. Scattered and scattered-cluster missions are broken
up into their individual or cluster components, and these are treated
as separate missions. Dependency lists, as described in
Section~\ref{sec:distribution-state} are used to ensure that the
individual components of each scattered mission are executed in order.
This allows rovers to return to the control center to offload their
data between parts of the scattered mission, minimizing the amount of
data that will be lost if the rover fails. The tradeoff here is
between the time that will be lost traveling back to the control
center instead of directly between the two mission locations, and the
expected increase in the amount of work that will need to be redone if
the rover fails. Fortunately, this tradeoff is accounted for by the
batching algorithm (Section~\ref{sec:batching}). If it is ultimately
beneficial for a rover to travel directly between two mission
locations, it will do so; otherwise, it will travel first to the
control center. The priority value of a scattered or scattered-cluster
mission is divided among the new component missions, proportionally to
the amount of work in each mission.

When an amendment is received from Earth, the mission distributor
updates the mission table with the new parameters and increments the
mission's revision number.  If the mission is still unassigned, no
further processing is necessary; the newly-revised mission will be
distributed by the assignment algorithm when it is appropriate to do
so. If the mission has already been assigned or completed, then the
modified portions of the mission will be rescheduled as though it were
a new mission. Any rover that is currently executing the old revision
of the mission will be marked ``lame''.  If the lame rover completes
the old mission, the revision number information will indicate to the
control center that it is an outdated revision and is not satisfactory
for the purpose of marking the mission as completed. The results,
however, will still be sent back to Earth, since they may still be of
some value.

If any rovers are currently in the available state when new missions
or amendments are received, the mission distributor will re-run the
assignment algorithm to appropriately assign the new missions to the
available rovers.


% LocalWords:  rovers rover's