'\"
'\" Term paper for CS262 Fall 1987, Prof. Alan J. Smith.
'\" $Header$
'\"
'\" This paper must be run through ditroff, since it uses gremlin
'\" files. The complete processing command is:
'\"
'\" 	grn paper.ms | ditroff -ms
'\"
'\" With appropriate printer flags in the various commands.
'\"
'\" CW is used to place a string in fixed-width or switch to a
'\" fixed-width font.
'\" C is a typewriter font for a laserwriter. Use something else if
'\" you don't have one...
.de CW
.ie !\\n(.$ .ft C
.el \&\\$3\fC\\$1\fP\\$2
..
.RP
.TL
Customs \*- A Load Balancing System
.AU
Adam de Boor
.AI
Final Project
U.C. Berkeley, Computer Science 262
549-2264, deboor@buddy.Berkeley.EDU
.AB
This paper describes the implementation and performance of Customs, a
load-balancing system designed for a local area network of cooperating
workstations. It provides simple shell-level and program-level
interfaces to allow tasks to be sent to idle workstations. Customs
has been interfaced to a parallel Make facility and the combination
provides the user with significant performance improvements, usually with
no changes required of the user. The system has been in operation for
roughly three weeks as of this writing, and provides a reasonable
alternative to the usual remote-execution facilities available under
.UX .
.AE
.nr TM 1
.NH
Introduction
.PP
Customs is a load-balancing system for a local network of cooperating
workstations sharing a uniform filesystem. Experience shows that at
any given time, many of these workstations will be idle. Customs is
designed to take advantage of these idle workstations by providing a
simple method of finding an available workstation and executing a job
on it. Simple interfaces to the Customs system exist at both the
program and shell levels.
.PP
While Customs does attempt to provide an environment as similar to the local
environment as possible, it is designed for CPU-intensive jobs; thus it provides
no facilities (such as remote terminals or flow control) to support
interactive remote jobs.
.PP
A basic tenet of Customs is that a user's workstation is his or her
castle; thus it should be available only when the user is not using it
for anything (e.g. at night). To truly support this, however, requires
that foreign jobs disappear when the user begins to make use of the
workstation again. This means that the foreign jobs must be either
terminated or sent somewhere else. Customs does neither, assuming that
foreign jobs will complete in a few minutes, since they are non-interactive.
.PP
Customs was implemented and tested on a network of Sun 3 workstations running
Sun
.UX
release 3.2.
It requires no kernel modifications.
.PP
The rest of this paper is organized as follows: section 2 describes
the architecture of the system \*- the various components and their
responsibilities. Section 3 introduces the mechanism by which the
components communicate. Section 4 discusses the election algorithm
used to maintain the system, while section 5 describes how one uses
the system to execute a job on an idle workstation. Section 6 presents
examples of the system's performance, and section 7 details possible
security problems and how Customs deals with them (and where it
punts). Finally, section 8 suggests possible improvements that could
(or should) be made to the system in the future.
.NH
System Architecture
.PP
The Customs system is divided into a group of servers (called
\fIcustoms agents\fP) organized in a master/slave relationship,
and a set of client programs that use the system.
.PP
Each machine in the system runs a single customs agent, which serves
several purposes:
.RS
.IP (1)
It must decide if its machine is available
and transmit this decision to the current master agent regularly.
Four criteria are used to determine the machine's
availability:
.RS
.IP \(bu 2
None of the machine's three load averages may exceed a set maximum.
A load average is the number of
runnable jobs averaged over a given period of time; the three periods
used are one, five and fifteen minutes. Typically, this maximum is set at 1.
.IP \(bu 2
There is a minimum amount of swap space, expressed as
a percentage of the total available, that must be free. The initial
value for this limit is twenty-five percent.
.IP \(bu 2
The user must not have been active within a specified amount of time,
usually fifteen minutes. On Suns, idleness is measured as the time since
the keyboard device was last modified, taken from the modification time
of the device.
.IP \(bu 2
Finally, only a certain (usually small) number of jobs may
currently be running on behalf of other machines.
.RE
.IP (2)
It must execute jobs for other machines as ordered by
the master agent.
.IP (3)
It acts as the contact point for clients that wish to
use the system.
.IP (4)
It provides system information, such as when a
job starts and finishes, to a single logging process.
.RE
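.PP
The four availability criteria above can be sketched in C as follows. This
is a minimal illustration rather than the agent's actual code: the
structure names, field names and the function
.CW is_available
are hypothetical, and the treatment of the boundary cases (for example,
refusing once the import count reaches its limit) is an assumption.
.DS
.CW
```c
/* Hypothetical types; only the four criteria come from the text. */
struct avail_limits {
    double max_load;       /* typically 1.0 */
    int    min_swap_pct;   /* initially 25 */
    long   min_idle_secs;  /* usually 15 * 60 */
    int    max_imports;    /* usually small */
};

struct machine_state {
    double load[3];        /* 1-, 5- and 15-minute load averages */
    int    free_swap_pct;  /* free swap, percent of total */
    long   idle_secs;      /* time since keyboard device was modified */
    int    imports;        /* jobs now running for other machines */
};

/* Returns 1 if the machine may import a job, 0 otherwise. */
int is_available(const struct machine_state *m,
                 const struct avail_limits *lim)
{
    int i;

    for (i = 0; i < 3; i++)
        if (m->load[i] > lim->max_load)
            return 0;               /* some load average too high */
    if (m->free_swap_pct < lim->min_swap_pct)
        return 0;                   /* too little free swap space */
    if (m->idle_secs < lim->min_idle_secs)
        return 0;                   /* user was active too recently */
    if (m->imports >= lim->max_imports)
        return 0;                   /* already running enough imports */
    return 1;
}
```
.DE
.LP
An agent built this way would evaluate the test each time it reports to
the master and send the result along with its next reporting interval.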
.PP
The \fImaster agent\fP has three responsibilities,
in addition to those mentioned above:
.RS
.IP \(bu 2
It tracks the availability of all the machines in the network (as well
as the machines from which each agent is willing to accept jobs).
.IP \(bu 2
It allocates other \fIslave agents\fP (and itself) to execute (import)
jobs for clients based on the machines that are available.
.IP \(bu 2
It provides information about the network configuration
to any client that requests it (this information consists of the names
of the participating machines, their availability, the hosts from
which they will accept jobs, and the machine that was last allocated).
.RE
Any agent may act as the master agent.
.PP
The client programs that serve as the shell-level interface to the Customs
system are divided into three classes. The first class of clients uses
the system to actually export jobs.
The two clients that fall into this class are called
.CW pmake
and
.CW export .
PMake is a program that is compatible with the
.UX -standard
Make but that creates files in parallel and, when it is interfaced
with Customs, distributes their creation across a network.
Export is employed by the user to export a single command to an available
workstation. If no workstation is available, the command is run
locally. Export is also executed by PMake to handle the I/O, signals
and other bookkeeping required to maintain the exported job.
.PP
Another class of clients is represented by the
.CW importquota
client, which sets and prints the parameters used to determine the local
machine's availability. The above-mentioned four criteria may be set
along with the interval at which the availability of the machine is
transmitted to the master agent. This interval is initially ten
seconds. In addition to these parameters the ``use-local'' flag may
also be set using
.CW importquota .
If use-local is true (non-zero),
the local agent will refuse to allow jobs to be exported until the
local machine is unavailable, barring user idleness. Use-local
defaults to false.
.PP
The final class contains the clients that provide information about
the current state of the network.
.CW Reginfo
finds and displays all the hosts registered in the local network,
their availability (if they are not available, it prints why they are
not available), and the clients they serve, while
.CW logd
accepts logging information from all the registered hosts in the
network and prints the information in a standard format to a file.
.PP
The program-level interface that is used by most of the
above-mentioned clients consists of a library of C subroutines that
hide most of the details of the communications protocol
(e.g.
.CW Customs_Host
returns a host to which a job may be exported;
.CW Customs_SetAvailInterval
changes the interval at which the availability of the local machine is
sent to the master agent, etc.).
.NH
The RPC System
.PP
The various parts of the Customs system communicate using a remote
procedure call mechanism that is based on UDP datagrams, but also includes
support for TCP connections. Both point-to-point and broadcast RPCs
are supported. Remote procedures are called by number (the numbers must
be agreed upon beforehand). Functions in the serving program to handle the
remote calls are bound to (\fIsocket\fP, \fIprocedure-number\fP) pairs. When
a call is received on that socket for that procedure number, the function
is called with the address of the caller, a token for replying and the
input data for the call. When using a TCP stream connection, servers are
bound to the passive socket and are automatically duplicated to the
active connection created when the RPC system performs an \fIaccept\fP
on the passive socket.
.PP
The system is ``multi-threaded'' in that
multiple calls to different destinations and procedures may be pending
at the same time (from the same socket) and both calls and services
may be active on the same socket at the same time. In addition, the
server functions may be called by the system any time control has been
transferred to it (e.g. by making a remote call); thus the server
functions must be completely re-entrant.
Each server binding in the system has a cache of recent calls that is
used to avoid calling the server function more than once per call.
Entries are flushed from a server's cache ten seconds after they are
last referenced.
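.PP
The per-binding cache of recent calls might look something like the
following sketch. The fixed-size table, the use of the caller address and
message ID as the key, and all of the names are assumptions made for
illustration; only the ten-second lifetime after the last reference is
taken from the description above.
.DS
.CW
```c
#define CACHE_SLOTS 16   /* hypothetical table size */
#define CACHE_TTL   10   /* seconds since last reference */

struct cache_entry {
    int           in_use;
    unsigned long caller;    /* address of the caller */
    unsigned long msg_id;    /* message ID of the call */
    long          last_ref;  /* time of last reference, seconds */
};

/*
 * Returns 1 if the call is a duplicate of a recent one, so the
 * server function should not be invoked again. Otherwise the
 * call is recorded (if a slot is free) and 0 is returned.
 */
int cache_check(struct cache_entry *c, unsigned long caller,
                unsigned long msg_id, long now)
{
    int i, free_slot = -1;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (c[i].in_use && now - c[i].last_ref >= CACHE_TTL)
            c[i].in_use = 0;            /* flush a stale entry */
        if (c[i].in_use && c[i].caller == caller &&
            c[i].msg_id == msg_id) {
            c[i].last_ref = now;        /* referenced again */
            return 1;
        }
        if (!c[i].in_use && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0) {
        c[free_slot].in_use = 1;
        c[free_slot].caller = caller;
        c[free_slot].msg_id = msg_id;
        c[free_slot].last_ref = now;
    }
    return 0;
}
```
.DE
.LP
A real cache would presumably also hold the reply, so that a resent call
could be answered without running the server function a second time.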
.PP
There are no real connections between a client and a server, barring
the use of TCP for the communication, so a full internet destination address
must be specified with each call. In addition to the destination,
the parameters and a place to store the results of the call, the number of
times to resend the call and the time between resends must also be
specified for each call.
.PP
The RPC system places no interpretation on either the outgoing
parameters or the returned results. It does no byte-swapping,
encryption or scatter-gathering.
.PP
The RPC system also provides arbitrary timeout events and the ability
to watch input sources other than the sockets being used to send and
receive RPCs. The event mechanism is used extensively in the customs agents.
.PP
The protocol uses the acknowledgement implicit in the return of the
results of an RPC to reduce the number of messages sent. An
explicit acknowledgement is sent only when a reissued call is received
by a server while the original call is still being processed. In such
a case, an acknowledgement message is returned to the caller
and the duplicate call is dropped. The acknowledgement serves to
postpone the timeout of the call on the client side.
.PP
Each message sent by the system has the following header:
.KS
.sp
.GS
pointscale on
width 4i
height 1.5i
file header.g
.GE
.sp
.KE
Each field in the header is presented in network (big-endian) byte-order.
The \fIMessage ID\fP field is simply a sequence number that is
incremented for each RPC call. All messages having to do with a single call
have the same ID number. The \fIMessage Length\fP field contains
the number of bytes of data that follow the header.
.PP
A message may be of one of four types:
.nr pw \w'RPC_ACKNOWLEDGE  'u
.RS
.IP RPC_CALL \n(pwu
A call on the specified procedure number. The rest of the datagram
contains the parameters to pass to the remote procedure.
.IP RPC_ERROR \n(pwu
Sent as a reply to an RPC call when the call is in error. The rest of
the datagram contains the error code in network byte-order.
.IP RPC_REPLY \n(pwu
The message contains the results of an RPC call.
.IP RPC_ACKNOWLEDGE \n(pwu
The message is an explicit acknowledgement, as described above.
.RE
.LP
An additional bit is set in the \fIMessage Type\fP field if the
call is a broadcast, rather than a point-to-point, call. Broadcast
calls are handled specially in that errors are not returned nor are
explicit acknowledgements sent. This is to reduce the number of
replies received in response to the call.
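.PP
The message types and the broadcast bit could be represented as in the
sketch below. The numeric values, the position of the broadcast bit and
the helper names are all assumptions chosen for illustration; the real
wire values are not given here.
.DS
.CW
```c
/* Illustrative values only; not the actual wire encoding. */
enum rpc_type {
    RPC_CALL        = 1,
    RPC_REPLY       = 2,
    RPC_ERROR       = 3,
    RPC_ACKNOWLEDGE = 4
};

#define RPC_BROADCAST 0x100u   /* hypothetical flag bit */

/* True if the Message Type field marks a broadcast call. */
int rpc_is_broadcast(unsigned int type_field)
{
    return (type_field & RPC_BROADCAST) != 0;
}

/* The message type with the broadcast bit removed. */
enum rpc_type rpc_base_type(unsigned int type_field)
{
    return (enum rpc_type)(type_field & ~RPC_BROADCAST);
}

/*
 * May a server answer the given call with this kind of message?
 * Errors and explicit acknowledgements are never sent in
 * response to a broadcast call.
 */
int rpc_may_send(unsigned int call_type_field, enum rpc_type answer)
{
    if (rpc_is_broadcast(call_type_field) &&
        (answer == RPC_ERROR || answer == RPC_ACKNOWLEDGE))
        return 0;
    return 1;
}
```
.DE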
.PP
The RPC system is organized around a time-ordered event queue and a
set of masks of interesting streams for use with the \fIselect\fP
system call. These two together provide all that is needed to handle
the resends and message receipts required by the system. The arbitrary
timeouts and input sources are merely an exportation of the procedures
used internally.
.PP
In the initial implementation of the Customs system, I used Sun's RPC.
This obviated the need for a well-known port for the customs agents,
but the problems it created \*- agents mutually timing out when they
attempted to call each other at the same time, a lack of robustness
(due to time constraints imposed by the desire to have reasonably
short timeouts) when an allocated host went down \*- forced the writing
of the simpler, stateless RPC system now used.
.NH
When the Master Goes Away
.PP
Because there is a single master customs agent and no software is perfect,
there needs to be a way to select a new master agent when the old one goes
away (for whatever reason). This section discusses the election algorithm used
to maintain a single master controller in the local network.
.PP
Initial implementations were
oriented toward using the network file system for coordination. The
first used Sun's network file locking service on a well-known file to
determine which agent was the master. However, improper semantics
(locks not being released when the lock holder crashed) and bugs in
the lock servers that caused them to use all of a machine's swap space
over time brought about the second algorithm.
.PP
The second algorithm required the master to place the addresses of all
slaves in a well-known file. When the master crashed, the slaves would
attempt to elect the first one on the list and work their way down.
Problems with a stale data file and the inability to perform network
locking without starvation and without using the already-rejected Sun
service forced the scrapping of this algorithm as well.
.PP
The election algorithm used in the current implementation is based on
that designed by Gusella and Zatti\** for their clock synchronization daemon.
.FS
R. Gusella and S. Zatti, "An Election Algorithm for a Distributed
Clock Synchronization Program," IEEE 6th International Conference on
Distributed Computing Systems, Boston, (May 1986).
.FE
The algorithm uses two broadcast RPCs (Campaign and NewMaster).
Customs_Campaign() returns a boolean, while Customs_NewMaster()
returns nothing. The algorithm has the following components:
.IP (1)
When a slave finds itself without a master, either because the slave
just started up or a call to the master returned an error, it performs
a broadcast RPC on all the local agents' Campaign procedure. This call
can generate one of three responses from each agent in the network:
.RS
.RS
.IP \(bu 2
None. The agent approves of the election.
.IP \(bu 2
FALSE. The agent is the master agent and the campaigning agent should
register with it.
.IP \(bu 2
TRUE. The agent is, or knows of an agent that is, campaigning, and
there is a conflict.
.RE
.RE
.IP "\&"
If the agent receives no responses to its campaign, it elects itself
the master agent and broadcasts a call on all the agents' NewMaster
procedures informing them of its election.
.IP "\&"
If the agent receives a return of FALSE, it immediately aborts its
campaign and registers with the agent that returned FALSE. If the
registration fails, it starts over from the beginning.
.IP "\&"
If the agent receives a return of TRUE, it continues broadcasting for
the entire six seconds of the campaign, in the hopes of contacting the
master. If, at the end of the campaign, it still has not located the
master agent, it undergoes exponential backoff, waiting twice as long
each time some agent returns TRUE during the election. The initial wait
time is a random number of seconds between two and ten. Each agent's random
number generator is randomized based on the machine's ID number and the time
at which the agent started up. When the waiting period expires, it
campaigns again.
.IP (2)
When an agent receives a call on its Campaign procedure, it returns
TRUE if it is campaigning (the RPC system calls servers at any time,
even in the middle of a remote call) or it has a record of some other agent
that is, and FALSE if it is the master. If neither of these conditions
holds, the agent returns nothing, but records the caller's address for
future reference (the address is forgotten after ten seconds). This
makes it less likely that two agents will become master because of
packet loss.
.IP (3)
When an agent's NewMaster procedure is called, and the agent is not
campaigning, it immediately registers with the caller and notes the
new master's address. Should the agent consider itself to be the
master, it immediately cancels its master agent services and returns
to being a slave agent. This would be used to resolve conflicts should
two masters appear on the network, perhaps due to a network partition.
In the current implementation, however, this feature is not used.
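.PP
The server side of the Campaign procedure (point 2) and the backoff
schedule (point 1) can be sketched as below. The types and names are
hypothetical. Two details are assumptions rather than statements from the
algorithm above: a repeated call from the agent already on record is not
treated as a conflict, and the first wait is the initial random value,
which then doubles on each later conflict.
.DS
.CW
```c
enum campaign_reply { REPLY_NONE, REPLY_FALSE, REPLY_TRUE };

#define REMEMBER_SECS 10   /* recorded address is then forgotten */

struct agent {
    int           is_master;
    int           campaigning;
    unsigned long recorded_addr;  /* last campaigner heard from */
    long          recorded_at;    /* when it was heard, or -1 */
};

/* Decide the response to a Campaign call from `caller'. */
enum campaign_reply on_campaign(struct agent *a,
                                unsigned long caller, long now)
{
    int remembered = a->recorded_at >= 0 &&
                     now - a->recorded_at < REMEMBER_SECS;

    if (a->is_master)
        return REPLY_FALSE;       /* caller should register */
    if (a->campaigning ||
        (remembered && a->recorded_addr != caller))
        return REPLY_TRUE;        /* conflicting campaign */
    a->recorded_addr = caller;    /* note for future reference */
    a->recorded_at = now;
    return REPLY_NONE;            /* approve by staying silent */
}

/* Seconds to wait after the given number of conflicts (>= 1). */
long backoff_wait(long initial, int conflicts)
{
    long wait = initial;
    int i;

    for (i = 1; i < conflicts; i++)
        wait *= 2;
    return wait;
}
```
.DE
.LP
Here
.CW initial
stands for the random value between two and ten seconds chosen by the
campaigning agent.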
.PP
A heterogeneous network is supported by the passing of a special token
as the parameter for the two election remote procedure calls. In the
current implementation, this token is an integer that reveals the
byte-ordering of the calling machine. The RPC is serviced only if the
byte-ordering is the same. A better use of the token would be to specify the
actual machine type of the caller (Suns and IBM RT PCs have the same
byte-ordering but are obviously not binary-compatible, yet with the
current implementation, they would be in the same Customs system).
Additionally, part of the token could be used to partition the network into
smaller Customs networks of cooperating workstations, without having to
enumerate all the hosts in the sub-net as potential clients.
.PP
The algorithm itself is reasonably efficient. It allows an agent to
quickly find the current master on start-up. It has no problems with
stale data left in the file system. However, because the campaign time (six
seconds) is a large portion of the default interval at which the
machine's availability is sent to the master (which is when most
master-failures are detected), the chances of a
collision are much greater than those reported by Gusella and Zatti.
As a result, the election can take as long as thirty seconds, though
typically, even if the entire network is restarted at the same time,
it takes no more than five. The range of the initial wait in the exponential
backoff algorithm (two to ten seconds, so that half of the waits are shorter
and half longer than the six-second campaign interval) was
chosen to allow the conflict to be resolved quickly while lessening the chances
of another conflict.
.PP
The chances of a conflict could be reduced by resetting, when a
Campaign call is received, the timer used to send the machine's availability
to the master, since most of the master-failures are detected when the
availability is transmitted. In practice, however, the master does not
fail very often, so the time required to elect a new master is not as
important as it might be.
.PP
The only other problem with the algorithm lies in point number 3,
above. On rare occasions, say if a machine's network interface seizes
up long enough for a complete election to take place, a host can
become the master of a Customs network of one. If it responds faster
than the true master, it will gradually take away agents from the real
master, thus creating two Customs networks where only one should
exist. This could be alleviated, as is done in Gusella and Zatti's
clock synchronization daemon, by an agent that receives two (or more) FALSE
responses to its Campaign call notifying the first master it meets of
the other's existence. The two masters could then negotiate and one
revert to being just a slave.
.NH
Exporting a Job
.PP
The following section describes in detail the calls and actions
required to export a job and maintain it while it is running.
.PP
The exporting of a job to an idle workstation is a five-step process,
only three of which are visible to the client that wishes to export
the job. The process is illustrated in figure 1.
.KF
.sp
.GS
width 5i
height 4i
pointscale on
file export.g
.GE
.sp
.ce
\fBFigure 1\*-Exporting a Job\fP
.KE
.PP
The first step in the process is to locate a host to which the job may
be exported. This is accomplished by performing a Customs_Host()
call to the local customs agent. The local agent returns to the client
a structure, called an export permit, that contains both the address of the
host that will import the job and a unique identifier that is used to
identify the job once it is running. If the address in the permit is 0
(the internet wildcard address), there is no machine available to
import the job and the exportation has failed.
The call and return are messages 1 and 6 in figure 1.
.PP
In order for the local agent to return the export permit to the
client, it must first contact the current master agent to request a
host. This is a remote procedure call on the master's HostInternal
procedure and is message 2 in figure 1. The master eventually returns
the desired export permit, which the local agent then forwards to the
client as the return value for its Host RPC.
.PP
When the master receives a call on its HostInternal procedure, it
examines its list of hosts and, starting from the host it allocated
last, steps down the list searching for one that is marked available
and that will accept jobs from the requesting host. If no such host is
found before it returns to where it started, it returns an export
permit with the 0 address. If an available host is located, however,
it creates the unique identifier for the job and builds an export
permit that contains this identifier and the address of the requesting
client. It then executes a remote procedure call on the allocated
host's Allocated procedure to inform it of the job it will soon
receive. The allocated host records the information (it is flushed if
the job does not arrive within thirty seconds) and returns to the
master its new availability. These are messages 3 and 4.
.PP
Once the call to the allocated host completes, the master creates
another export permit using the allocated host's address and the same
unique identifier and returns this permit to the requesting agent as
the result of the HostInternal call. This is the fifth message in
figure 1.
.PP
If the allocated host has crashed since the last time the master heard
from it, the Allocated call will fail. The master then continues
searching from where it left off until it ends up where it started
from or it finds an available host that is up. When a slave transmits
its availability to the master, it also sends the time it will wait
before it transmits it again. If the master doesn't hear from the
slave in that time, it marks the host as unavailable. This increases
the probability that an allocated host is, in fact, up.
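.PP
The master's search for a host can be sketched as a circular scan of its
host list. The structure and the function
.CW allocate_host
are hypothetical. The
.CW up
flag stands in for the success or failure of the Allocated RPC, the
.CW accepts
flag stands in for the check against the requesting host, and beginning
the scan with the entry after the last allocation is an assumption.
.DS
.CW
```c
struct host {
    int available;  /* marked available by its last report */
    int accepts;    /* will take jobs from the requester */
    int up;         /* the Allocated call would succeed */
};

/*
 * Scan the list once, starting after the host allocated last.
 * Returns the index of the allocated host and updates *last,
 * or returns -1, meaning a permit with the 0 address.
 */
int allocate_host(const struct host *h, int n, int *last)
{
    int step;

    if (n <= 0)
        return -1;
    for (step = 1; step <= n; step++) {
        int i = (*last + step) % n;

        if (!h[i].available || !h[i].accepts)
            continue;       /* not a candidate */
        if (!h[i].up)
            continue;       /* Allocated failed; keep looking */
        *last = i;
        return i;
    }
    return -1;              /* back where the scan started */
}
```
.DE
.LP
Remembering the last allocation spreads successive jobs across the
available machines instead of always loading the first one in the list.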
.PP
Once the client has received an export permit, it must then build a
structure to be transmitted to the importing agent. This structure is
called a way bill (or invoice) and contains the following information:
the job identifier, the real and effective user ID's, the real and
effective group ID's, the ID's of all the other groups the process is
in, the file creation mask, the port number of a UDP socket to which
the agent may return the exit status of the job, the current working
directory, the file to execute, the arguments for the file and all the
strings in the current environment.
.PP
The client then performs an RPC on the importing agent's Import procedure,
using TCP for the communication and passing the way bill
as the call's arguments. The importing agent may refuse the job for
several reasons: the permit may have expired, as mentioned above; the user ID
may have changed between when the host was requested and when the
Import procedure was called; or the job may expect to run as the super-user.
If the agent agrees to accept the job, it returns the
string ``Ok'' to the client and executes the desired program in the
given environment. If the agent refuses to import the job, the return
value for the RPC is a string explaining why the job was refused.
These are the seventh and eighth messages in figure 1.
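.PP
The importing agent's decision can be sketched as a function that returns
the reply string. Only the string ``Ok'' and the three reasons for refusal
come from the description above. The structures, the wording of the
refusal messages and the choice to compare the real user ID are
assumptions.
.DS
.CW
```c
#define PERMIT_TTL 30   /* seconds before a permit is flushed */

/* What the agent recorded when its Allocated procedure ran. */
struct permit_record {
    long allocated_at;  /* time of the Allocated call */
    int  uid;           /* user ID of the exporter */
};

/* The user IDs carried in the way bill. */
struct waybill_ids {
    int ruid;           /* real user ID */
    int euid;           /* effective user ID */
};

/* Returns "Ok" or a string saying why the job is refused. */
const char *import_verdict(const struct permit_record *p,
                           const struct waybill_ids *w, long now)
{
    if (now - p->allocated_at > PERMIT_TTL)
        return "permit expired";
    if (w->ruid != p->uid)
        return "user ID changed";
    if (w->ruid == 0 || w->euid == 0)
        return "super-user jobs are refused";
    return "Ok";
}
```
.DE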
.PP
When the job has finished, the importing agent performs an RPC, to the
UDP port it was given in the way bill, on the client's Exit procedure.
The parameter for this call is simply the status returned by the job.
The client need return no data; it simply acknowledges the receipt of
the call and acts accordingly. These are the final messages (9 and 10)
sent during the lifetime of the exported job.
.PP
While the job is running, input to it is sent on the TCP connection
created by the initial Import RPC. Output
from the job comes over the same connection. Signals may be sent to
the job via an RPC on the importing agent's Kill procedure. The
parameters for the call are the job identifier and the signal to be
delivered. The call returns no data.
.PP
This paradigm was chosen because the system was designed mainly for
PMake. An alternative was to allocate a set of hosts at the
beginning of the process and cycle through them, executing the jobs on
each machine in turn. However, because PMakes tend to run for a while,
and because people tend to work in bursts and would like to have their
machines back as soon as possible when they return to work, and
because I planned not to have reclamation or termination of foreign
jobs in Customs, I deemed it unwise to do anything but query the
system for each job that needed exportation.
.NH
System Performance
.PP
This section discusses the performance of the Customs system, both
in terms of the overhead of using the system and the advantages of the
parallel, distributed PMake.
.PP
The time required to locate a host to which a job
may be exported averages 40 ms., while the entire overhead (excluding
the time required to find the absolute path of the current working
directory, which, under NFS, is significant) of executing a job on
another machine (this includes the time to find a host to which to
export the job, contact the host, start the job and receive the exit
status of the job from that host) ranges between 200 and 300 ms. These
times were arrived at by exporting the C program:
.DS
.CW
main()
{
	return(0);
}
.DE
several thousand times.
.KF
.sp
.GS
width 5i
height 4i
pointscale on
file performance.g
.GE
.sp
.ce
\fBFigure 2\*-Recreating the System in Parallel\fP
.QP
.ps -2
The \fIAll Local\fP curve represents the time required to
recreate the system when all the commands were executed on the same machine.
The \fIAll Remote\fP curve is the time when all the commands were
exported using Customs. The \fIOptimal\fP curve was calculated by
performing several runs where each command required to re-create the
system was executed and timed individually. Those execution times were
then used to calculate the running time of PMake if there were
absolutely no overhead of any kind. The curves were calculated using
between two and five runs for each concurrency. When the times varied
widely, more runs were used to determine the ``actual'' running time
for that concurrency. The data were gathered over a 10-hour period
(from 1 to 11 pm) on a Saturday.
.KE
.PP
The practical advantages of the parallel, distributed PMake are
displayed in figure 2. The test case used was that of recreating the
entire Customs system. This consists of compiling 27 source files,
ranging in length from 50 to 2200 lines of C code, and linking the
resulting object files into 7 executable images. The vertical axis
represents the time required by PMake to re-create the system, from
start to finish. The horizontal axis is the maximum number of jobs
executing at once. Because of the nature of the dependency graph
amongst the various files, that many jobs were executing most, but not all,
of the time (for example, the number of running jobs would go to zero
just after the last source file was compiled and before the linking of
the executables started).
.PP
As can be seen from the graph, while it is not economical to use
remote execution when only one file is being created at a time, as the
number of creations executing simultaneously increases, the running
time of the PMake approaches the optimal. As is also clear from the
graph, while running two or three jobs in parallel on the same machine
provides a noticeable improvement (due mostly to multiple jobs filling the
idle time normally caused by the network file access), the machine
quickly saturates, unlike the remote case.
.PP
The time required for the distributed creation varied considerably
more than that for the all-local creation.
Most of the variation in the remote times can be attributed to the
absolute nature of the availability of a machine. In Customs, a
machine is either available or it is not. There is no way to indicate
that one machine is more available than another (perhaps it is more
lightly loaded than the other), thus there is no way to determine
which would, in fact, be the best machine to export to. The machines
used in the test had load averages that ranged from 0.02 to above 0.5.
The running time of the various jobs, therefore, depended on the
machine on which they executed. I think it would be worthwhile to
change the definition of availability from an absolute to a rating,
say from one to a hundred, so that the master could allocate the
``most available'' machine in the network. This would reduce the
variability of the performance.
.PP
One concern I, and others, had at the start of the project was that
the master agent might prove a bottleneck when the system received
heavy use. Since the time required to locate an available host is
approximately 40 milliseconds, this allows the master to service
roughly twenty-five such requests a second. In addition it must field
availability calls and perform the tasks the slave agents perform. Yet
none of these tasks lasts very long (the most expensive task it must
do is import a process, and most of the work there is performed by a
child process), and the RPC system allows the host requests to be
serviced even when the master is involved in the expensive operation
of performing an RPC. Finally, unlike the Butler system at CMU\**,
.FS
D. Nichols, ``Using Idle Workstations in a Shared Computing
Environment,'' ACM Proceedings of the 11th Symposium on Operating
Systems Principles, pp 5-12.
.FE
I do not expect the Customs system to have to handle networks of more
than two hundred workstations, internet addressing being what it is:
the addresses are four bytes long and typically, at least at
Berkeley, the first three bytes are used to specify a network, leaving
a single byte to differentiate between machines. Two values of the
final byte (0 and 255) are devoted to broadcast messages, leaving 254
machines per network.
Thus I do not foresee the master as creating a bottleneck in the system.
.NH
Security
.PP
One of the most difficult problems for a
load-balancing/remote-execution system is that of security. This
section describes how Customs handles some of these problems and how
it could be changed to handle more of them.
.PP
There are two main aspects involved in creating a secure load-balancing
system. The client machines must be protected from malicious jobs
while the users of the system must be protected from modified and malicious
system software on the importing machine.
.PP
Customs attempts to address the former problem by means of the unique
job identifier associated with each exported job. This identifier
contains a monotonically-increasing integer in its lower half and the
user ID of the exporter in its upper half. The identifier expires
thirty seconds after it is created unless a job is actually exported
using that identifier. This prevents processes from exporting a job
using an old identifier. When a new master is elected, the low half of
the identifier is set to a random value and increases from there.
The passing of the user ID of the exporter to the master agent allows the
use of the system to be logged in a single place. A problem with this is that it
is possible to transmit a false user ID in the initial Host RPC and Customs
has no way of verifying that the process sending the RPC is, in fact, running
under that user ID.
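.PP
The identifier scheme described above might be sketched as follows.
This is an illustration only, not the Customs source: the 32-bit width
with 16-bit halves, the names
.CW IdAllocator ,
.CW allocate ,
.CW redeem
and
.CW exporter_uid ,
and the use of caller-supplied timestamps are all assumptions made for
the example. Only the layout (user ID in the upper half, counter in the
lower half), the thirty-second lifetime and the random starting value
come from the description.
.DS L
```python
import random

ID_BITS = 32                 # assumed total width; the paper does not give one
HALF = ID_BITS // 2
HALF_MASK = (1 << HALF) - 1
ID_LIFETIME = 30.0           # seconds an unused identifier remains valid


class IdAllocator:
    """Hypothetical allocator for job identifiers (illustrative names)."""

    def __init__(self, rng=random):
        # A newly elected master starts the counter at a random value.
        self.counter = rng.randrange(HALF_MASK + 1)
        self.pending = {}    # identifier -> creation time, not yet used

    def allocate(self, uid, now):
        # Counter in the lower half, exporter's user ID in the upper half.
        self.counter = (self.counter + 1) & HALF_MASK
        ident = ((uid & HALF_MASK) << HALF) | self.counter
        self.pending[ident] = now
        return ident

    def redeem(self, ident, now):
        """Return True if ident may be used to export a job at time now."""
        created = self.pending.pop(ident, None)
        return created is not None and now - created <= ID_LIFETIME


def exporter_uid(ident):
    # Recover the user ID stored in the upper half.
    return ident >> HALF
```
.DE
.PP
An identifier can be redeemed once, and only within its lifetime, so a
stale or replayed identifier is refused. In this sketch the counter
wraps around after 65536 allocations, which a real implementation would
have to consider. Note that nothing here verifies that the caller
really runs under the user ID it supplies, which is exactly the
weakness just described.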
.PP
An inelegant solution to this problem, since the Host call is
transmitted entirely on the local machine, is for the process ID and
stream number to be transmitted, instead of the user ID, and for the
customs agent to then examine kernel data structures to verify that
the process did indeed send the message and to extract the various
identifiers required by the importing agent from those data structures.
In the absence of a true authentication service, such admittedly-obscene
methods are required if authentication is to take place.
.PP
Another possibility is to use an authentication server such as that
proposed by Parris\**
.FS
C. Parris, ``An Authentication Scheme for Use in Dynamic Load Balancing,''
Technical Report UCB/CSD 87/375, Computer Science Division (EECS), University
of California, Berkeley, 1987.
.FE
for Zhou's\** load-balancing system.
.FS
S. Zhou and D. Ferrari, ``An Experimental Study of Load Balancing
Performance,'' Technical Report, Computer Science Division (EECS), University
of California, Berkeley, July 1986.
.FE
The particular authenticator described is not really appropriate to
the Customs system, however, because only one job is exported at a
time, as opposed to maintaining a connection to a remote shell that
executes jobs for the local shell; thus the overhead of the
authentication would be incurred for each transaction. Given the
amount of overhead involved, this is unacceptable.
.PP
While Customs will not execute a job as the super-user, it is, in fact,
rather lacking in the area of security. Customs offers absolutely no
protection, save public censure and ostracism,
against modified system software or a modified customs agent.
Even were the above-mentioned reading of kernel structures to be implemented,
there would be nothing to prevent someone from modifying the customs agent
to produce false identification. If the Customs service port were to be in
the internet reserved port range (below 1024), such fraudulence would have to be
performed by a person with super-user privileges. However,
.UX
(and especially Sun
.UX )
security being what it is, such privileges are not particularly hard to come by.
.NH
Summary and Conclusions
.PP
Customs is a load-balancing system for a small (no more than 200 or
so) network of cooperating workstations that share a uniform file
system. The various elements of the system communicate using a simple
remote procedure call mechanism that is based on datagrams. Each time
a client wishes to export a job, it must request a host from the
system, as opposed to maintaining a shell of some sort on another host
and using it to execute jobs for as long as it wants. The agents
themselves execute the jobs in the proper environment. This
environment is as similar to that on the exporting machine as
possible, but no support is provided for interactive jobs, as I expect
Customs to be used mostly for CPU-intensive tasks.
.PP
While Customs is operational, useful, and usable now, both it and
its users would certainly benefit from more work on the system:
.IP \(bu 2
The election algorithm needs to be improved to deal gracefully with more than
one master on the network.
.IP \(bu 2
The security of the system needs to be vastly improved. This could be
helped immensely by a relatively simple authentication service.
.IP \(bu 2
There needs to be more support for a heterogeneous network of
machines.
.IP \(bu 2
It may become necessary to implement some sort of workstation
reclamation if imported jobs execute for too long after the user of
the workstation returns.
.IP \(bu 2
People will want even greater control over the availability of their
machine.
.IP \(bu 2
Availability should be changed from an absolute to a relative quantity
so that the most appropriate machine may be selected.
.IP \(bu 2
The customs agents should provide more information so that programs
such as PMake can limit their concurrency automatically, without user
intervention.
