\documentclass{article} \include{6033-preamble} \usepackage{graphicx,fullpage} \begin{document} %\title {Design Project 2 Executive Summary} %\author {Dan Ports, Amanda Smith, Sarah Lieberman} %\maketitle \begin{center} \par \huge{AARDVARK} \vskip 1em {\lineskip .75em \Large{Dan Ports, Amanda Smith, and Sarah Lieberman}\par \Large{May 6, 2004}\par \vskip 1em \large{6.033 Design Project 2 Executive Summary}\par \large{Recitation \#9: Karger/Lesniewski-Laas} }\end{center} \section{Overview} The Mars Rover project can most easily be split into three components: communication protocol, mission execution, and mission distribution. The problems associated with communication protocol are relatively straightforward and include message authentication and error detection and correction. Mission execution covers issues relating to the actual completion of missions by rovers, and includes such issues as sensor failure and data relay between rovers. Mission distribution, possibly the most difficult problem in the project, contains the algorithm which the command center uses to distribute missions to rovers, as well as the mechanisms it uses to detect and deal with rover failure, distribute amendments, and coordinate data uploads. \section{Communications}\ The basic communication protocol is an at-least-once delivery system, with possible duplicate messages, using a standard packet structure, and only broadcasting one packet at a time to prevent misordering of packets. The packet structure contains the name of the sending device, a message nonce, a message type, a message (optional), a signature, and a checksum. The name is simply an integer coded into each rover and the command center, and the nonce is simply a large integer used as a message identifier. The type is an integer code such as "01" for an acknowledgement, "06" for mission data, and "10" for a request to resend a message. The message is any additional content, such as mission assignments or nonces in acknowledgements. Finally, the devices use a shared-secret key and a SHA1 hash as a checksum to provide authentication and guard against transmission errors, respectively. Encryption and other protections against malicious devices are not needed, because the only other devices on Mars are non-malicious Poodle rovers. To detect the presence of another device, a device sends out queries until it recieves an acknowledgement. The device then opens communication, which stops both devices from moving if it is acknowledged. The device will send out messages and packets of data one at a time, waiting for an acknowledgement (or a resend request) before sending another, thus ensuring at-least-once delivery; duplicate messages are fine since the only communications are data transmission rather than action requests. When the devices end communication, they send out end messages and then resume other operations. If one device does not recieve a message from another in a certain amount of time, it will try resending its last packet or sending a query; if this fails repeatedly, it will assume a dead device and end communication. \section{Mission Execution} \label{sec:mission-execution} Each rover is capable of autonomously executing missions according to parameters received from the control center. It does so by maintaining a queue of instructions. There are four types of instructions: \begin{itemize} \item travel to a certain location \item perform an experiment (e.g. taking a photo) \item deliver a message to another rover \item return to the control center \end{itemize} When a rover performs a experiment, there is a non-zero probability of sensor failure. However, to ensure that a correct result is returned with 95\% probability, the rover will repeat the experiment the required number of times. The rover will have a preprogrammed number of trials to perform with each sensor to ensure a 95\% chance of at least one accurate result, based on the failure rate of the sensor. When two rovers are sent together to perform a mission, or a rover travels within communication range of another rover, they will attempt to transmit data to each other. A rover will reject any data which it does not have room to store in favor of keeping its own mission data. Each data copy will have a version number so that if the command center recives multiple copies of data from the same mission, it will know which is the most complete and up-to-date. This will reduce the probability of data loss due to rover failure. \section{Mission Distribution} \label{sec:mission-distribution} The control center receives missions and amendments from Earth and allocates them to rovers. Hence, it must maintain a table of missions. Each mission comprises a set of tasks to be executed at a certain, and is identified with a mission ID and revision number. The control center also associates states with each mission: whether it has been assigned, and whether it has been partially or fully completed. The control center also maintains a table of rovers. A rover can be: \begin{itemize} \item \emph{available}, meaning it is present at the control center and waiting to be assigned a mission \item \emph{busy}, meaning it is currently executing a mission \item \emph{lame}, meaning it is currently executing a mission that has become moot due to an amendment % is there a better term for this? \item \emph{dead}, meaning that the control center has not heard from the rover and has presumed it to have failed \end{itemize} In addition, the control center keeps track of what mission, including revision number (if any) the rover is currently executing, the time it started, and when the rover is expected back (see below). When a mission is received, it is added to the mission table in the unassigned state. Scattered and scattered-cluster missions are broken up into their individual or cluster components, and these are treated as separate missions.\footnote{It may also be necessary to ensure that the components of a scattered or scattered-cluster mission are executed in order. If so, this can also be done by adding extra information to the missions table.} This forces rovers to return to the control center to offload their data between parts of the scattered mission, minimizing the amount of data that will be lost if the rover fails. Whenever rovers are in the available state, missions are assigned to them. Since all missions are treated as having equal priority, the goal is to maximize the number of missions that are executed before all rovers fail. Hence, we use a greedy algorithm that ranks missions in terms of their expected time to complete\footnote{This algorithm is simple but sub-optimal. It could (and probably will) be improved by taking in to account factors such as batching of nearby missions to minimize travel time.} This is computed as the sum of the round-trip travel time required plus for each experiment the product of the expected time required to execute the experiment and the expected number of times the experiment will need to be repeated. When a rover is available, it assigns the ``cheapest'' mission to the rover by uploading the proper instruction and setting the state tables appropriately. When an amendment is received from Earth, it updates the mission table with the new parameters and increments the mission's revision number. If the mission is still unassigned, no further processing is necessary. If the mission has already been assigned or completed, then the modified portions of the mission will be rescheduled as though it were a new mission. If a ``lame'' rover is still executing the mission, then it may be advantageous to dispatch another rover to tell it to abort.\footnote{Batching would be especially useful here --- the two rovers could then proceed to execute the amended mission and/or other missions without returning to the control center.} This can be implemented by treating the message-delivery as a mission whose expected time to completion for scheduling purposes takes into account both the time required for delivering the message and the time saved by aborting the mission. If the lame rover completes the old mission, the revision number information will indicate to the control center that it is outdated and should probably be ignored. Rovers can fail unpredictably. The control center can only detect failures when it fails to receive any response from the rover. There must therefore be a time after which the control center assumes the rover has failed if no response has been received. We compute this time by summing the required travel time and the bound of required time to perform and repeat the experiments in 95\% of cases. If the mission has not been completed by this time, it is marked unassigned and the rover assigned to it is marked dead. If it turns out that the rover was in fact still operating, and it returns after being marked dead, the experiment results will be downloaded, the mission marked completed, and any new rover assigned to that mission treated as lame, as though a mission amendment had been received. \section{Conclusion} \label{sec:conclusion} This design should be able to ensure that resources are efficiently allocated in order to ensure that as much valid scientific data as possible gets sent back to the scientists at NASA. In order to achieve this goal, we took into consider the various complications that could hinder the process. These complications include mission amendments, the failure of sensors and rovers, and poor communications. The mission distribution algorithm is made flexible by the variable amount of rovers that can get sent out to perform a mission. This flexibility is an asset because it allows the system to efficiently allocate resources under a wide range of work loads. Also, the priority values that can be assigned to missions make the overall system fairly customizable. With simple changes in the procedure for value assignment, the overall system can be customized to suit a variety of needs. Some of the design choices we have made depend on assumptions that would be valid in the real world, whereas other assumptions might only be valid given this design project’s model of the world. However, we believe that we have kept these assumptions to a reasonable level and that within the world of this design project the design would soundly deal with the issues with which it is presented. \end{document}