Error management guidelines
The following points define the policy for dealing with errors and
error stacks:
-
Errors are divided into three general categories:
-
Expected errors - errors such as invalid user input, missing files, or an
incorrect volume in the drive.
-
Unexpected errors - errors that occur because of an external problem, but
are not normally encountered in the course of using a program. These include
running out of system resources, system call failures, data file corruption, and
external programs failing for unknown reasons.
-
Fatal errors - errors that normally indicate a bug within a program. These
include data structure failures and logic errors that can be detected. They
also include malloc failures, unless a program is allocating large amounts
of memory in an indeterminate manner.
-
All programs must be able to recover from expected errors. Recovery normally
involves asking the user to intervene and correct the problem. Making an
error stack available for the user to display often helps the user determine
the cause of the error.
-
Unexpected errors should be handled by gracefully shutting down the program.
An error stack should be made available to the user as the data might help
them correct the problem and rerun the program. The error stack should also
be logged.
-
Fatal errors are errors that usually cannot be diagnosed or corrected by a
user. When a program detects a fatal error, it should call one of the panic
functions as soon as possible. Trying to recover from a corrupt program is
usually not possible and can make the problem more difficult to resolve.
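
The three categories and the response each one calls for can be sketched as a small dispatch table. This is only an illustration: the names ErrCategory, ErrAction, action_for and sketch_panic are hypothetical and are not part of the documented error manager or its panic functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the Error(3tlib) API. */
typedef enum { ERR_EXPECTED, ERR_UNEXPECTED, ERR_FATAL } ErrCategory;
typedef enum { ACT_ASK_USER_AND_RETRY, ACT_LOG_AND_SHUT_DOWN, ACT_PANIC } ErrAction;

/* Map each category to the response described in the guidelines. */
ErrAction action_for(ErrCategory category)
{
    switch (category) {
    case ERR_EXPECTED:   return ACT_ASK_USER_AND_RETRY; /* recover */
    case ERR_UNEXPECTED: return ACT_LOG_AND_SHUT_DOWN;  /* graceful exit */
    case ERR_FATAL:      return ACT_PANIC;              /* stop at once */
    }
    return ACT_PANIC; /* an unknown category is itself a program bug */
}

/* Stand-in for a panic function: report and abort, with no recovery. */
void sketch_panic(const char *what)
{
    assert(what != NULL);
    fprintf(stderr, "panic: %s\n", what);
    abort();
}
```

A caller would classify the error first, then branch on action_for(); only the ACT_ASK_USER_AND_RETRY path returns control to normal processing.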
-
All return codes from system or library calls should be checked. This
includes close (close failure is normally a fatal error). Special
routines are provided to push UNIX errno error messages
onto the stack.
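
As a rough sketch of this rule, the function below checks open, read and close, and records the errno text for any call that fails. The ErrStack type and the es_push and es_push_errno helpers are hypothetical stand-ins written for this example; they are not the documented routines.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_ENTRIES 8
#define MAX_TEXT 128

/* Hypothetical error stack; entry 0 is the bottom of the stack. */
typedef struct {
    int  isOk;
    int  count;
    char text[MAX_ENTRIES][MAX_TEXT];
} ErrStack;

void es_push(ErrStack *es, const char *msg)
{
    assert(es != NULL && msg != NULL);
    if (es->count < MAX_ENTRIES)
        snprintf(es->text[es->count++], MAX_TEXT, "%s", msg);
    es->isOk = 0;
}

/* Hypothetical helper: push the system's message for an errno value. */
void es_push_errno(ErrStack *es, int err)
{
    es_push(es, strerror(err));
}

/* Return the first byte of a file, checking every call, close included. */
int first_byte(const char *path, ErrStack *es)
{
    unsigned char byte = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        es_push_errno(es, errno);
        return 0;
    }
    if (read(fd, &byte, 1) < 0)
        es_push_errno(es, errno);
    /* The guidelines normally treat a close failure as fatal;
       this sketch only records it. */
    if (close(fd) < 0)
        es_push_errno(es, errno);
    return byte;
}
```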
-
An error stack should contain data relevant to diagnosing a problem.
Great care must be taken in defining what is pushed on the error stack. Too
many inappropriate entries will confuse the user. Too little data will not
provide the user with enough information to determine the cause of an error.
-
Generally, the bottom entry in the stack should contain the error that occurred
and entries above that should contain the actions that were being performed
when the error occurred.
-
Where data such as file names are available, they should be formatted into the
localized messages.
-
Normally, a given function should only call one error manager push per
error instance. The function will then return to another layer, which can add
more information to the error stack if it has more meaningful data. The
exception is when a system or library call fails. Because the call reports its
error in errno rather than on a stack, one of the PushUnixErr routines
described in Error(3tlib) can be used to add two entries to the stack. If
a function is performing a complex operation, it might have enough information
to add more than one entry to the stack, but this probably means that the
function should be broken down into smaller operations.
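
The layering described above might look like the following sketch. All names here (ErrStack, es_push, es_push_unix_err, load_config, start_service) are hypothetical, and the choice of which two entries the errno helper adds (the errno text first, then a message naming the failed operation) is an assumption made for illustration, not the documented behavior of the PushUnixErr routines.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 8
#define MAX_TEXT 160

/* Hypothetical error stack; entry 0 is the bottom of the stack. */
typedef struct {
    int  isOk;
    int  count;
    char text[MAX_ENTRIES][MAX_TEXT];
} ErrStack;

void es_push(ErrStack *es, const char *msg)
{
    assert(es != NULL && msg != NULL);
    if (es->count < MAX_ENTRIES)
        snprintf(es->text[es->count++], MAX_TEXT, "%s", msg);
    es->isOk = 0;
}

/* Assumed behavior: the errno text goes on first, then what failed. */
void es_push_unix_err(ErrStack *es, int err, const char *what)
{
    es_push(es, strerror(err));
    es_push(es, what);
}

/* Lowest layer: a library call fails, so two entries are pushed. */
void load_config(const char *path, ErrStack *es)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int err = errno; /* save before any other call can change it */
        char msg[MAX_TEXT];
        snprintf(msg, sizeof msg, "Unable to open configuration file %s", path);
        es_push_unix_err(es, err, msg);
        return;
    }
    if (fclose(fp) != 0)
        es_push_unix_err(es, errno, "Unable to close configuration file");
}

/* Calling layer: adds exactly one entry describing its own action. */
void start_service(const char *path, ErrStack *es)
{
    load_config(path, es);
    if (!es->isOk)
        es_push(es, "Unable to start the service");
}

/* Display from the latest action down to the original error. */
void es_print(const ErrStack *es, FILE *out)
{
    for (int i = es->count - 1; i >= 0; i--)
        fprintf(out, "%s\n", es->text[i]);
}
```

With a missing file, the stack ends up with the system's errno text at the bottom, the file-specific message above it, and the caller's action on top, which matches the ordering recommended earlier.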
-
Function names should normally not be included in the text of messages. The
localized text is aimed at the end user who does not have the source. If the
same error can originate from two different functions, then different message
IDs with the same localized text should be assigned for each instance. The
same error being returned from multiple locations often indicates the need
for a common routine that performs the operation.
-
Since the message text is isolated from the code, message IDs should be
verbose and meaningful. They will be typed only a few times, but they will be
read many times.
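
A minimal sketch of the two message points above: verbose IDs, and distinct IDs that share the same user-facing text when the same error can come from two places. The MsgId values, the in-memory catalog array, and both functions are hypothetical; a real program would look the text up in its localized message catalog.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical IDs: long and descriptive, because they are read far
   more often than they are typed. */
typedef enum {
    MSG_UNABLE_TO_OPEN_CONFIG_FILE_AT_STARTUP,
    MSG_UNABLE_TO_OPEN_CONFIG_FILE_ON_RELOAD,
    MSG_COUNT
} MsgId;

/* Stand-in for a localized catalog. The text names no function, and
   both IDs deliberately map to the same wording. */
static const char *const catalog[MSG_COUNT] = {
    "Unable to open configuration file %s",
    "Unable to open configuration file %s",
};

/* Format the localized text for an ID, filling in the file name. */
int format_message(MsgId id, const char *file_name, char *buf, size_t size)
{
    assert((int)id >= 0 && (int)id < (int)MSG_COUNT);
    return snprintf(buf, size, catalog[id], file_name);
}

/* True when two IDs show the user the same text. */
int same_text(MsgId a, MsgId b)
{
    return strcmp(catalog[a], catalog[b]) == 0;
}
```

The user sees identical text in both cases, while a developer reading a log of message IDs can still tell which code path reported the error.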
-
Error stacks should be allocated at the top level of a program and passed
down. Allocating intermediate error stacks should be avoided unless there
is a need to preserve an existing error.
-
The convention for clearing an error status is to clear it as part of the
error recovery code. Thus all procedures can expect that an error status passed
to them has a value of OK, and they do not need to clear the error status to
indicate success.
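
The clearing convention can be sketched as follows. The procedure assumes the status is OK on entry and never resets it; only the recovery loop clears it before trying again. ErrStack, es_push, es_clear, parse_port and first_valid_port are hypothetical names, and the array of candidate inputs stands in for re-prompting a user.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical error stack; entry 0 is the bottom of the stack. */
typedef struct {
    int         isOk;
    int         count;
    const char *text[8];
} ErrStack;

void es_push(ErrStack *es, const char *msg)
{
    if (es->count < 8)
        es->text[es->count++] = msg;
    es->isOk = 0;
}

void es_clear(ErrStack *es)
{
    es->count = 0;
    es->isOk = 1;
}

/* A procedure: relies on an OK status on entry and never clears it. */
int parse_port(const char *input, ErrStack *es)
{
    char *end;
    long port;

    assert(es != NULL && es->isOk);
    port = strtol(input, &end, 10);
    if (end == input || *end != '\0' || port < 1 || port > 65535) {
        es_push(es, "Port must be a number from 1 to 65535");
        return 0;
    }
    return (int)port;
}

/* Recovery code: the only place where the status is cleared. */
int first_valid_port(const char *const inputs[], int n, ErrStack *es)
{
    for (int i = 0; i < n; i++) {
        int port = parse_port(inputs[i], es);
        if (es->isOk)
            return port;
        if (i + 1 < n)
            es_clear(es); /* clear before the next attempt */
    }
    return 0; /* the last error stays on the stack for the caller */
}
```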
-
The error stack should be the last non-optional parameter to a procedure. In
C++, error stacks should be passed as reference parameters rather than
pointers. The exceptions are code where the error stack is optional and C++
code that is callable from C.
-
The error stack should be the only reporting mechanism for an error. Using a
function return value to report status is redundant and ambiguous. Leave the return value
free for returning data. Use the isOk field of the error stack to
indicate success or failure.
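
The sketch below illustrates why the return value is left for data. A return of 0 is a legitimate result (an empty first field), so only the isOk field can distinguish it from a failure. The ErrStack layout, es_push and field_length are hypothetical names used only for this example; the error stack is passed as the last parameter, by pointer because the example is written in C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error stack; entry 0 is the bottom of the stack. */
typedef struct {
    int         isOk;
    int         count;
    const char *text[8];
} ErrStack;

void es_push(ErrStack *es, const char *msg)
{
    if (es->count < 8)
        es->text[es->count++] = msg;
    es->isOk = 0;
}

/* Return the length of the first field in a record. The return value
   carries data only; success or failure is reported through es. */
size_t field_length(const char *record, char separator, ErrStack *es)
{
    const char *end;

    assert(record != NULL && es != NULL);
    end = strchr(record, separator);
    if (end == NULL) {
        es_push(es, "Record does not contain a field separator");
        return 0;
    }
    return (size_t)(end - record);
}
```

A caller checks es->isOk after the call rather than testing the returned length against a sentinel value.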
© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 27 April 2004