Horizon Network Security™

Our Publications

PROBLEM SOLVER

Anatomy of a boot

by Bob Toxen


        "Panic:  init died!"

Most system administrators eventually see this message. It means that a system is dead or dying. If it appears after the system has been running trouble-free for a while, it indicates a minor problem that can be solved by shutting the system down normally (if possible), or by executing a sync and rebooting (running fsck in the process).

If, on the other hand, this same message or something similar appears when your system is booting up, you should panic! It means that files critical to your system's operation are incorrect or missing. In a previous issue (October, 1984), we investigated what a system administrator can do to prepare for this eventuality.

This month's column is concerned with what a system implementor (or anyone else with source code) can do to prevent the problem. I define the "system implementor" as a company that maintains system software. This is usually a hardware manufacturer.

THE BIRTH OF A KERNEL

In order to understand what prevents UNIX from booting up, one must first understand how it boots up normally. Many people know that to start a computer, they need only press a reset (boot) button. Some UNIX systems even boot automatically when they are powered on. This starts a program stored in non-erasable PROM memory called the "PROM monitor" or simply the "monitor".

This program in turn starts UNIX when a cryptic command is entered at the console terminal. The monitor then reads UNIX from the disk into memory and starts it running. Immediately, UNIX determines the amount of memory available in the system, ascertains how much is available for user processes, and displays the values on the console terminal.

If the system does not get to this point, it can be assumed that one of four things has happened. One, there may have been a hardware failure. Two, the hardware may have been incorrectly configured; perhaps a DIP switch was accidentally bumped. Three, the wrong version of software may have been installed in either the PROM monitor or the UNIX kernel. Four, the copy of the kernel on disk may have been damaged or erased.

The name of the file containing the kernel is usually /unix or /vmunix. A copy should be kept in a separate file as insurance against a damaged kernel. When /unix (or /vmunix) is changed, this backup copy should not be updated until after the new kernel has booted the system successfully. This is a hedge against the possibility that the new version will not work with your hardware or is otherwise defective.

THE KERNEL MATURES

After the kernel has "sized memory", it initializes any hardware needed for the root and swap disk devices. (Initialization of the console tty device and memory should already have been performed by this point.) The kernel then simulates a mount system call to configure the root file system. Next, process zero (which will become the scheduler) is built and initiated. This process, which contains hand-compiled code copied from kernel data space, does a fork system call.

The child that is created, named process one, invokes an exec system call to start /etc/init. The parent, process zero, then becomes the scheduler, also known as the swapper. This is not a user process but rather just another face of the kernel itself. At this point, the kernel is fully operational.

If the exec of /etc/init fails (because /etc/init is missing or incorrect) or if init ever dies, the kernel will detect it and print the message "panic: init died!" In some implementations, though, this does not actually denote a panic situation (that is, a fatal error). Although the chance of a single file (/etc/init) being damaged is small, the kernel can easily be modified to invoke, say, /etc/getty if init cannot be exec'd. Getty, like init, does not require standard input or output to be set up -- unlike most other programs.

INIT FIRES UP (VERSION 7 AND BERKELEY UNIX)

Different versions of UNIX have different versions of init. On Version 7 and Berkeley UNIX, init forks off a child process that opens /dev/console for reading and writing. Since the process has no open file descriptors at this point in the startup, /dev/console becomes file descriptor zero, which is also known as standard input. The dup system call is then invoked twice to duplicate this file descriptor as descriptors one and two, which are known as standard output and standard error. The child then issues ioctl or stty system calls to set the correct baud rate, erase character, and so forth on the tty port. Finally, it exec's /bin/sh and voilà -- the machine is in single-user mode.
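A minimal sketch of the child's steps, written as modern POSIX C rather than Version 7 code. POSIX termios calls (tcgetattr, tcsetattr) stand in for the V7 stty/ioctl interface, and the function names, the 9600 baud rate, and the backspace erase character are illustrative assumptions.

```c
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Make dev standard input, output, and error. With 0-2 closed, open()
 * returns the lowest free descriptor (0) and each dup() takes the next. */
int console_stdio(const char *dev) {
    close(0);
    close(1);
    close(2);
    int fd = open(dev, O_RDWR);
    if (fd != 0) {
        if (fd > 0)
            close(fd);
        return -1;
    }
    if (dup(0) != 1 || dup(0) != 2)
        return -1;
    return 0;
}

/* Set baud rate and erase character on the tty. */
int console_modes(int fd, speed_t speed, cc_t erase) {
    struct termios t;
    if (tcgetattr(fd, &t) == -1)
        return -1;                       /* not a terminal, or a bad node */
    if (cfsetispeed(&t, speed) == -1 || cfsetospeed(&t, speed) == -1)
        return -1;
    t.c_cc[VERASE] = erase;
    return tcsetattr(fd, TCSANOW, &t);
}

/* Body of init's single-user child: console as stdio, then the shell. */
void single_user_child(void) {
    if (console_stdio("/dev/console") == 0) {
        console_modes(0, B9600, '\b');   /* example speed and erase char */
        execl("/bin/sh", "sh", (char *)NULL);
    }
    _exit(1);                            /* open, dup, or exec failed */
}
```

The descriptor numbering relies only on the UNIX rule that open and dup return the lowest unused descriptor, which is why no dup2 is needed.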

INIT CHOKES AND DIES (VERSION 7 AND BERKELEY UNIX)

The kernel, /etc/init, /dev/console, and /bin/sh must all exist for the system to come up. A crash causing file system damage to one of these, or a problem as simple as an erroneous chmod, can keep the system down for good. I have already covered contingency plans for the kernel and /etc/init being damaged. Let's now consider how to deal with /dev/console problems. If either the open or ioctl system call fails, init can assume that the device node (the entry in /dev) is bad.

To catch other problems, one also might set a 10-second alarm clock prior to an open call and turn it off when the open completes. This will account for situations where an open hangs, which may occur if the major or minor device values are wrong (they might, for instance, erroneously refer to a tape drive that already has been turned off).

If init determines that /dev/console is bad, it can create its own version of the file. When init must resort to this, the file should be created in the root directory as a hedge against damage to the /dev directory. The file, typically called /console, should first be removed with the unlink system call in case an old version exists, and then created with the mknod system call. The major and minor device numbers that should be used will, of course, be hard-wired in init, but these are unlikely to change from release to release and are usually both zero anyway. The init process then can open /console instead of /dev/console.

If the exec of /bin/sh fails, init can try executing other programs that might allow the system to come up. If your system has csh, then /bin/csh is a good second choice. It's possible that a copy of your shell also is kept in /etc, so init might try to execute that file next. If this also fails, you can be assured that you have a very damaged file system. However, recovery is still possible.
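The fallback chain can be written as a loop over candidate programs, relying on the fact that execv returns only when it fails. The list, including the name /etc/sh for the spare copy, is an assumption for illustration; the same idea fits the kernel-level fallback from /etc/init to /etc/getty described earlier.

```c
#include <errno.h>
#include <unistd.h>

/* Candidate single-user programs, most preferred first.
 * "/etc/sh" stands for a spare copy of the shell (hypothetical name). */
const char *const single_user_shells[] = { "/bin/sh", "/bin/csh", "/etc/sh", NULL };

/* Try each program in turn. execv() returns only on failure, so reaching
 * the end of the list means every candidate is missing or damaged;
 * errno then describes the last failure. */
int exec_first(const char *const paths[]) {
    for (size_t i = 0; paths[i] != NULL; i++) {
        char *const argv[] = { (char *)paths[i], NULL };
        execv(paths[i], argv);
    }
    return -1;
}
```

In init's child this would be called as exec_first(single_user_shells), with the tape-loading fallback following if it returns.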

One can create a copy of the tape (or floppy) device, usually /dev/rmt0, in the root directory in much the same way as a copy of the console was created. A temporary file, say, /tmpexec -- with mode 770 -- can then be created. This will allow init to copy data from the tape drive to the temporary file until an EOF is reached on the tape. The init process can then close both file descriptors, issue a sync system call, sleep for 10 seconds, and exec /tmpexec.
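A sketch of the tape-loading step under stated assumptions: the tape node passed in (for example, a copy of /dev/rmt0 made in the root directory) is chosen by the caller, the 10240-byte read size is an assumed tape record size, and load_and_exec and copy_until_eof are hypothetical names. The file is created with mode 0770 as described, though the process umask may strip bits from that mode.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define RECORD_SIZE 10240   /* assumed tape record size (20 x 512-byte blocks) */

/* Copy everything from in to out until read() reports end of file. */
int copy_until_eof(int in, int out) {
    char buf[RECORD_SIZE];
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return 0;                        /* EOF: the whole program is loaded */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {   /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}

/* Load a program from tape into dest, flush it to disk, and run it.
 * Returns -1 only on failure; on success the exec never returns. */
int load_and_exec(const char *tape, const char *dest) {
    int in = open(tape, O_RDONLY);
    if (in == -1)
        return -1;
    unlink(dest);                            /* discard any stale copy */
    int out = open(dest, O_WRONLY | O_CREAT | O_TRUNC, 0770);
    if (out == -1) {
        close(in);
        return -1;
    }
    int rc = copy_until_eof(in, out);
    close(in);
    if (close(out) == -1 || rc == -1)
        return -1;
    sync();
    sleep(10);                               /* give the sync time to finish */
    char *const argv[] = { (char *)dest, NULL };
    execv(dest, argv);
    return -1;
}
```

A call such as load_and_exec("/rmt0", "/tmpexec") matches the procedure in the text, where /rmt0 is a hypothetical name for the copied tape node.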

The idea is to keep a tape of the shell and other useful programs around so they can be used when disaster strikes. The material on this backup can then be loaded into the system and used to fix damage. It may be necessary to create special versions of utilities to be included in the backup since the loading procedure will not allow arguments to be supplied. The tar and fsck commands are likely candidates for such modification.

Another problem to deal with is that the single-user shell may be successfully exec'd but then die immediately thereafter. This can happen if part of the binary gets clobbered in such a way that it starts up but quickly core dumps.

A way to detect this is to have the parent process invoke the time system call before the fork (and thus before the child execs the shell) and again after wait returns, to see how long the child was alive. If it was less than roughly 15 seconds, the shell can be assumed to have terminated abnormally and the parent, init, should be prodded into invoking other programs such as csh or tar, or possibly into using /console instead of /dev/console.
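The lifetime check might look like this, assuming POSIX fork, execv, and waitpid. The 15-second threshold follows the text; ran_long_enough is a hypothetical name, and the caller would move on to the next candidate (csh, tar, or /console) when it returns 0.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MIN_HEALTHY_SECONDS 15   /* "roughly 15 seconds" */

/* Run a program as a child and report whether it stayed alive long enough.
 * Returns 1 if it ran at least MIN_HEALTHY_SECONDS, 0 if it died sooner
 * (assume abnormal termination), or -1 if fork failed. */
int ran_long_enough(const char *path) {
    time_t start = time(NULL);            /* taken before the fork */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        char *const argv[] = { (char *)path, NULL };
        execv(path, argv);
        _exit(127);                       /* exec failed: die at once */
    }
    int status;
    while (waitpid(pid, &status, 0) == -1 && errno == EINTR)
        ;                                 /* retry if a signal interrupted the wait */
    return time(NULL) - start >= MIN_HEALTHY_SECONDS;
}
```

The trade-off is that an operator who types CTRL-D within the threshold is treated the same as a crashing shell, which only costs an extra fallback attempt.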

There are other possible techniques. One is to create a file system on a floppy (or a tape, if you are clever), including such critical programs as sh, ls, tar, chmod, and so forth. The init process could then attempt to mount the floppy when an exec of /bin/sh fails. In some cases, it may turn out that the best alternative is simply to fix the hardware and reload lost software from backup media.

INIT GOES MULTIUSER (VERSION 7 AND BERKELEY UNIX)

If all goes well, the parent will see its child process die. The final blow is usually delivered by a CTRL-D. The parent, init, then enters multiuser mode. This means that it reads the /etc/ttys file and forks and execs a getty (/etc/getty) for each tty on which users will be allowed to log in.

Each getty opens a tty device specified in /etc/ttys for standard input, output, and error, and then prompts for a login name. It then execs login, using the login name it receives as an argument. Login goes on to prompt for a password, verify it against /etc/passwd, and start whatever shell it finds listed in /etc/passwd.
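The getty side can be sketched as reading a name and handing it to login as an argument. The path /bin/login, the 64-byte name limit, and the helper names are assumptions; a real getty also sets terminal modes, which this sketch omits.

```c
#include <unistd.h>

/* Read a login name one byte at a time up to a newline. Returns its length,
 * or -1 on end of file, read error, or a name too long for buf. */
ssize_t read_login_name(int fd, char *buf, size_t size) {
    size_t n = 0;
    for (;;) {
        char ch;
        if (read(fd, &ch, 1) != 1)
            return -1;
        if (ch == '\n' || ch == '\r')
            break;
        if (n + 1 >= size)
            return -1;
        buf[n++] = ch;
    }
    buf[n] = '\0';
    return (ssize_t)n;
}

/* Prompt on standard output, read the name from standard input, exec login. */
void getty_prompt_and_login(void) {
    static const char prompt[] = "login: ";
    char name[64];
    for (;;) {
        if (write(1, prompt, sizeof prompt - 1) < 0)
            _exit(1);
        ssize_t n = read_login_name(0, name, sizeof name);
        if (n < 0)
            _exit(1);                 /* hangup or garbage: let init restart us */
        if (n > 0)
            break;                    /* empty line: prompt again */
    }
    execl("/bin/login", "login", name, (char *)NULL);
    _exit(1);
}
```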

INIT (SYSTEM III AND SYSTEM V)

In System III and System V, the administrator is given more control over init states (generically called single-user and multiuser modes) by configuring the ASCII file /etc/inittab. Under System III, one specifies the program that should be invoked on particular ttys in certain states. State 1 is considered to be single-user mode, and one usually starts /bin/sh or /bin/csh on /dev/console. For additional security, one might wish to invoke login instead.

Under System V, single-user mode is called "state s". When the system is first booted, init looks in the /etc/inittab file to see whether it should enter single-user mode or one of the multiuser modes. If single-user mode is specified (with the initdefault entry, or simply by default), init will invoke su, which in turn will look in the /etc/passwd file for an entry called root. The su command will then exec the program specified in this entry as the shell.
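System V inittab entries have four colon-separated fields, id:rstate:action:process, and the entry that names the boot state uses the action initdefault. The sketch below splits a line into those fields and picks the default state, falling back to single-user state s when no such entry exists, as described above; the struct and function names are hypothetical.

```c
#include <string.h>

/* One /etc/inittab entry: id:rstate:action:process */
struct inittab_entry {
    char *id, *rstate, *action, *process;
};

/* Split a line in place at its first three colons; the process field keeps
 * any further colons. Returns 0 on success, -1 if there are fewer than four fields. */
int parse_inittab_line(char *line, struct inittab_entry *e) {
    char *field[4];
    field[0] = line;
    for (int i = 1; i < 4; i++) {
        char *colon = strchr(field[i - 1], ':');
        if (colon == NULL)
            return -1;
        *colon = '\0';
        field[i] = colon + 1;
    }
    field[3][strcspn(field[3], "\n")] = '\0';   /* drop a trailing newline */
    e->id = field[0];
    e->rstate = field[1];
    e->action = field[2];
    e->process = field[3];
    return 0;
}

/* Boot state from the first initdefault entry, or 's' if there is none. */
char default_state(char *lines[], size_t n) {
    struct inittab_entry e;
    for (size_t i = 0; i < n; i++)
        if (parse_inittab_line(lines[i], &e) == 0 &&
            strcmp(e.action, "initdefault") == 0 && e.rstate[0] != '\0')
            return e.rstate[0];
    return 's';
}
```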

Thus, if /unix, /etc/init, /etc/inittab, /bin/su, /etc/passwd, or /bin/sh (or /bin/csh) is damaged, the system will not be able to come up. This illustrates the perils of requiring so many files to exist and be correct for a system to initialize. The dangers are even greater than they might first appear, because the administrator will frequently have cause to alter /etc/passwd and /etc/inittab. The init program can be modified to deal with these problems by using the techniques discussed in "Init Chokes and Dies (Version 7 and Berkeley UNIX)".

Be on guard, though -- if the /etc/inittab file is missing, the System V init program will still prompt the user on the console for the state to enter, but because of a bug it will not accept the answer on a computer based on the MC68000. The bug makes init dependent on the byte ordering of ints: init reads a single byte into the variable c, which is declared as an int, and on the MC68000 that byte lands in the high-order end of c rather than the low-order end. To cure the bug, find each place where init reads a single byte into c and use a variable declared as a char instead.

If you decide to add these features to the kernel and init, be sure you have a way to boot your system from a different disk whenever you debug your code!
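The byte-order bug in System V init can be reproduced in miniature. Both functions below read one byte from a descriptor; the first stores it into an int, as the buggy init did, and the second uses a char, as the fix recommends. The function names are hypothetical, and the sketch assumes a 32-bit int.

```c
#include <unistd.h>

/* Buggy pattern: the byte is stored in the int's first byte in memory.
 * That is the low-order byte on a little-endian CPU but the high-order
 * byte on a big-endian CPU such as the MC68000, so '2' becomes 0x32000000. */
int read_state_buggy(int fd) {
    int c = 0;
    if (read(fd, &c, 1) != 1)
        return -1;
    return c;
}

/* Fixed pattern: read into a char, then widen it to an int. */
int read_state_fixed(int fd) {
    char c;
    if (read(fd, &c, 1) != 1)
        return -1;
    return (unsigned char)c;
}
```

The fixed version returns the same value on every byte order, because the widening from char to int is done by the compiler rather than by where read happens to put the byte.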

I have implemented most of the features described here and thus have been able to boot up many systems and remedy many problems that otherwise would have been untouchable. These steps should prove to be good insurance for you as well.

Another insurance policy you should keep in the vault is a recent backup of all files. Under no circumstances should the steps proposed in this article be considered as a replacement for regular backup procedures; consider them, rather, as a complement. With a full backup to resort to, you'll be able to restore vital system and user files even after the severest disaster.

Bob Toxen has gained a reputation as a leading expert on UUCP communications, file system repair, and UNIX utilities. He has also done ports of System III and System V to systems based on the Zilog Z8000 and Motorola 68010 chips.
