Automated system administration
by Bob Toxen
These days, almost everything can be done faster, more reliably, and less
expensively by computer -- even the administration of computers themselves!
It is amazing how much time a system administrator can save by automating
daily chores. Among the tasks that can be automated are: system startups,
checks for file system consistency, backups, and the addition of new users.
Of these, one of the most critical -- and commonly automated -- procedures is
system startup. Smooth startups are vital to the integrity of file systems
and the data they contain, so it's little wonder that many hardware vendors
bundle automatic boot programs with the systems they sell.
The first step in bringing a system up is to press the system's boot button,
sometimes called the reset button. (Even this can be automated on a
VAX, but we will not cover that here.) Once the boot button has been pressed, the
system usually enters a state referred to as the PROM monitor. At
this point, a command must typically be entered to start up UNIX. The
command's content depends on the make and model of computer. On some systems,
such as Silicon Graphics' IRIS, the system can be configured via DIP switches
to boot UNIX automatically when the boot button is pressed, thus making the
PROM monitor transparent to users. Such an approach is a good first step toward
automating system startup.
Irrespective of how UNIX is booted, though, it will generally come up in
single-user mode. At this point, an administrator will usually run
fsck and remove daemon locks for lpr, uucp, or any other process
before bringing the system up in multiuser mode. This, too, can be
automated. I usually make an
entry in my /.cshrc file to determine whether I'm in single-user mode. If I
am, the system starts a background process that brings on multiuser mode
within 30 seconds. Thus, if there is a reason not to enter multiuser mode, I
have plenty of time to kill the background process. The .cshrc
sequence that provides this function for csh is:
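if ( $$ < 5 ) then
    # Start the delayed switch in the background (assumed form; multi is the alias below)
    ( sleep 30 ; multi ) &
    echo "Going Multi in 30 seconds \
    (kill -9 $child to abort)"
endif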
The alias you choose for multi depends on the version of UNIX you
have. For example:
Version 7 and Berkeley 4.x:
alias multi 'kill -9 $$'
System III:
alias multi '/etc/init 2'
System V:
alias multi '/etc/telinit 2'
This alias device generally cannot be used if one's
single-user shell comes up in the Bourne shell because of a bug in
init that fails to tell the shell that it is a login shell.
The Bourne shell, moreover, looks at the .profile file rather than
the .cshrc file for login shells.
For those interested in fixing the bug in init, note that the
Bourne shell should be invoked as a login shell (with a name starting with a
dash (-) in argv[0]). In System III, this can be done by linking
/bin/sh to /bin/-sh and specifying the single-user shell as
/bin/-sh. This can be dangerous, though, since someone may think
that -sh was accidentally placed in /bin and so delete it.
In that event, you would want to refer to my previous column on file system
repair (October, 1984) since your system will subsequently not boot. In
System V, the bug can be fixed by changing the line in init
In other versions of UNIX, change the line looking something like:
A different shell may also be specified here if desired.
As soon as the system comes up, fsck must be run -- before
any disk writes occur, apart from the updating of a few inode access times.
It is convenient to invoke fsck at the start of the
/etc/rc file, which is itself invoked as a shell script by
init at the outset of multiuser mode. If one enters
multiuser mode automatically, as discussed above, it is mandatory that
fsck be run as the first item in /etc/rc, unless it
is done in the single-user .cshrc file. This is typically done with the lines
echo "Checking the file systems for consistency."
fsck
near the top of the /etc/rc file. In System III and V, these lines
should, of course, appear inside the if or
case statement for multiuser mode (usually state 2). (System
III and V offer many states besides single-user and multiuser modes. The
/etc/rc file is invoked whenever any of them is entered and is supplied
with arguments that indicate what the new mode is, how many times it has been
entered before, and what the previous mode was. System III even invokes
/etc/rc when single-user mode is entered. But all of this is fodder
for another column.) Of course, the /etc/checklist file should
contain a list of device filenames describing file systems to be checked by
fsck, specifying the block device for the root file system,
and defining raw (character) device files for other file systems.
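As a rough illustration of the System III/V arrangement described above, the top of /etc/rc might be sketched as follows. The function name check_filesystems and the FSCK variable are hypothetical; FSCK exists only so that the logic can be dry-run without touching real file systems. The argument order follows the description above (new state, times entered, previous state).

```shell
#!/bin/sh
# Sketch only: FSCK is a hypothetical override so the case logic can be
# exercised safely; on a real system the command would simply be fsck,
# which reads its list of devices from /etc/checklist.
FSCK=${FSCK:-fsck}

check_filesystems() {
    new_state=$1    # state being entered
    # $2 = number of times this state was entered before (unused here)
    # $3 = previous state (unused here)
    case "$new_state" in
    2)
        # State 2 is assumed to be multiuser mode, as is usual.
        echo "Checking the file systems for consistency."
        $FSCK
        ;;
    esac
}
```

In a real /etc/rc the function would be called as check_filesystems "$@" before anything writes to disk.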
In some implementations, /etc/rc's standard input, output, and error
are not directed to the console. Symptomatic of this is that either the echo
message does not appear on the console, you cannot get acknowledgment of your
responses to fsck's questions, or some of
fsck's messages are lost. To work around this condition,
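place an opening parenthesis before the first command in /etc/rc and close
the group after the last command with the line below. Standard output must
be redirected before standard error is duplicated onto it:
) < /dev/console > /dev/console 2>&1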
You may also have to use stty to reset the console's modes.
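The grouping technique can be tried safely on an ordinary file before editing /etc/rc. The following is an illustrative sketch only: run_grouped and its target argument are hypothetical names, and a regular file stands in for /dev/console.

```shell
#!/bin/sh
# Run a group of commands with input, output, and error all redirected
# as a unit. "$1" stands in for /dev/console in this dry run.
run_grouped() {
    target=$1
    (
        echo "message on standard output"
        echo "message on standard error" >&2
    ) < /dev/null > "$target" 2>&1
    # Redirections are applied left to right, so standard output is
    # attached to the target first and standard error then follows it.
}
```

After run_grouped /tmp/trial, both messages should appear in /tmp/trial, which suggests that the same wrapper around the body of /etc/rc would send both streams to the console.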
Once all file system checks have been performed, the next step would typically
be to mount the file systems that are most often used. This is particularly
important since /usr, commonly one of the file systems to be mounted,
must be accessible for many of the commands that users will subsequently
enter. The following sequence of file system mounts is typical for a system
with two disks that are each split into three partitions:
# md0b is the swap device
echo "Mounting md0c usr"; mount /dev/md0c /usr
echo "Mounting md1a mnt"; mount /dev/md1a /mnt
echo "Mounting md1b scratch"; mount /dev/md1b /scratch
echo "Mounting md1c usr/src"; mount /dev/md1c /usr/src
Next, the administrator must perform "clean-up" operations to minimize any
damage possibly caused by an earlier crash:
echo "Preserving editor files"
echo "Clearing tmp dirs"
rm -rf /tmp /usr/tmp
mkdir /tmp /usr/tmp ; chmod 777 /tmp /usr/tmp
chgrp sys /tmp /usr/tmp ; chown root /tmp /usr/tmp
echo "Removing locks"
rm -f /usr/adm/acct/nite/lock*
rm -f /usr/spool/uucp/LCK* /usr/spool/uucp/ST*
echo "Resetting logs"
cd /usr/adm ; cp sulog OLDsulog ; cp /dev/null sulog
cd /usr/adm ; cp cronlog OLDcronlog ; \
cp /dev/null cronlog
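The log-resetting lines repeat one pattern, which could be factored into a small helper. This is a sketch under assumptions: reset_log is a hypothetical name, and skipping a missing log is a choice made here rather than something the original commands do.

```shell
#!/bin/sh
# Keep one previous generation of a log as OLD<name>, then empty the log.
reset_log() {
    dir=$1
    name=$2
    # Nothing to do if the log does not exist yet.
    [ -f "$dir/$name" ] || return 0
    cp "$dir/$name" "$dir/OLD$name" && cp /dev/null "$dir/$name"
}
```

With this in place, the two log lines would become reset_log /usr/adm sulog and reset_log /usr/adm cronlog.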
At this point, the administrator is ready to start up various daemons:
echo "Starting update"; /etc/update
echo "Starting cron"; /etc/cron
echo "Starting uucico"; /usr/lib/uucp/uucico -r1&
Some of the preceding commands are commonly incorporated into /etc/rc
files. An additional feature, though, that I've added to the /etc/rc
file on my M68010 workstation allows me to get the current time from a VAX by
way of an Ethernet connection. To do this, I do a remote execution of
date and use sed to change the output to a
form suitable as a parameter to date on the local system.
This allows the local system to set the date to within a few seconds of the
time on the VAX.
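The sed conversion might look something like the sketch below. It is not the script from my /etc/rc: the function name date_arg is hypothetical, the input is assumed to be the classic date output (for example "Mon Mar  4 13:45:10 PST 1985"), and the output is assumed to be the System V style mmddhhmmyy argument. Berkeley systems order the fields differently.

```shell
#!/bin/sh
# Read one line of date output on standard input and print it as
# mmddhhmmyy. Assumes a purely alphabetic time zone name such as PST.
date_arg() {
    sed \
        -e 's/^[A-Za-z]* *\([A-Za-z]*\) *\([0-9]*\) \([0-9][0-9]\):\([0-9][0-9]\):[0-9]* [A-Z]* [0-9][0-9]\([0-9][0-9]\)$/\1 \2 \3\4\5/' \
        -e 's/^Jan /01 /' -e 's/^Feb /02 /' -e 's/^Mar /03 /' \
        -e 's/^Apr /04 /' -e 's/^May /05 /' -e 's/^Jun /06 /' \
        -e 's/^Jul /07 /' -e 's/^Aug /08 /' -e 's/^Sep /09 /' \
        -e 's/^Oct /10 /' -e 's/^Nov /11 /' -e 's/^Dec /12 /' \
        -e 's/^\(..\) \([0-9]\) /\1 0\2 /' \
        -e 's/ //g'
    # The first expression keeps month, day, hhmm, and two-digit year.
    # The month names are then replaced by numbers, a one-digit day is
    # padded with a zero, and the remaining spaces are removed.
}
```

The remote date command would be piped into date_arg and the result passed to the local date. The remote-execution command itself depends on the network software installed, so it is left out here.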
OTHER CANDIDATES FOR AUTOMATION
Incremental backups of disk file systems -- that is, backups of all files that
have been created or changed since the last full backup -- are seldom done as
often as they should be because of the time and effort required. But incremental
backups can be automated by making an entry in /usr/lib/crontab that
starts the process in the wee hours of the morning, when file systems are
usually quiescent. (Even if there is some activity, the only files that won't
be backed up are the ones that are actually changing at the time.) Backing up
disk 0 onto a file on disk 1 and vice versa is usually safe.
Alternatively, one could leave a tape in the tape drive, which is typically
not used heavily, and back up onto it using a crontab entry.
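The core of an incremental backup is selecting the files changed since the last full backup. One way to sketch it, assuming a timestamp file is touched whenever a full backup completes (the name changed_since and the timestamp file are hypothetical):

```shell
#!/bin/sh
# List regular files under a file system that were modified more
# recently than the timestamp file left by the last full backup.
changed_since() {
    fs=$1       # mount point to scan, e.g. /usr
    stamp=$2    # file touched at the time of the last full backup
    find "$fs" -type f -newer "$stamp" -print
}
```

The resulting list could be piped to cpio -o and written to the other disk or to tape. A crontab line along the lines of "0 3 * * * /etc/incr_backup" would run such a script at 3 a.m. each day; the script path is only an example.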
Of course, users should be warned not to leave write-enabled tapes in the
tape drive. In one installation, I created a backup script that first checked
to see if there was a tape in the drive. If so, it then read the first few
blocks and would only perform the backup if it could ascertain that the
mounted tape was, in fact, a backup tape.
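A check of that kind could be sketched as follows. This is not the script from that installation: the label string BACKUP-TAPE, the block size, and the function name is_backup_tape are all assumptions, and the label is presumed to have been written at the start of every backup tape.

```shell
#!/bin/sh
# Hypothetical label expected near the start of every backup tape.
BACKUP_LABEL=${BACKUP_LABEL:-BACKUP-TAPE}

# Succeed only if the first few blocks of the device contain the label.
# An empty drive or unreadable device yields no data, so the check fails.
is_backup_tape() {
    dev=$1
    dd if="$dev" bs=512 count=4 2>/dev/null | grep "$BACKUP_LABEL" >/dev/null
}
```

The backup script would then begin with a test such as "is_backup_tape /dev/rmt0 || exit 1", where the device name is again only an example.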
Scripts for creating new user accounts offer another example of automated
system administration. Such a script, run by root, can find the next unused
user-id (perhaps kept in a file) and prompt for the account
name, the person's name, and group-id to use. It could then
add an entry to the /etc/passwd file, create the home directory with
the correct permissions and ownership, and copy in default .profile,
.cshrc, .login, .logout, .exrc,
.mailrc, and .newsrc files.
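Two pieces of such a script can be sketched briefly. The names next_uid and passwd_line are hypothetical, as are the floor of 100 for ordinary accounts and the default shell. This version derives the next user-id from the password file itself (one more than the highest in use) rather than keeping it in a separate file, and it puts "*" in the password field so the account stays locked until a password is set.

```shell
#!/bin/sh
# Print one more than the highest user-id in the given passwd file,
# but never less than the floor given as the second argument.
next_uid() {
    awk -F: -v min="${2:-100}" '
        BEGIN { max = 0 }
        $3 + 0 >= max { max = $3 + 1 }
        END { print (max > min + 0 ? max : min + 0) }
    ' "$1"
}

# Build a passwd entry: name, uid, gid, full name, home, optional shell.
passwd_line() {
    printf '%s:*:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "${6:-/bin/sh}"
}
```

A full script would append the output of passwd_line to /etc/passwd, create the home directory, set its owner and permissions, and copy in the default start-up files.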
Many repetitive operations can be automated by creating a shell script or
csh alias. Almost invariably, use of these tools results in
less human effort and more efficient operation. Though I have discussed some
of the most common automated procedures, you can probably think up more that
apply to your environment.
Bob Toxen is a member of the technical staff at Silicon Graphics, Inc.
who has gained a reputation as a leading expert on UUCP
communications, file system repair, and UNIX utilities.
He has also done ports of System III and System V to systems based on
the Zilog 8000 and Motorola 68010 chips.
Copyright © 1985, 2007, 2014, 2020 Robert M. Toxen. All rights reserved.