SDI bus timeout/reset recovery
The SDI bus timeout/reset mechanism
is used to recover from hardware problems
in mass storage devices.
Such problems can cause jobs to time out,
or can cause the target device to generate a bus reset.
First, a few definitions to help explain how this mechanism
works:
Bus device reset-
Immediately forces a target device (a disk or tape drive, for
example) to drop all current and pending jobs and perform a
hard reset.
Hard reset-
A peripheral that implements hard resets will fail all
current and pending jobs to the HBA controller upon
receipt of a SCSI bus reset or bus device reset.
Different brands of controllers may respond to a hard reset differently;
for example, some HBAs may keep track
of failed jobs following a bus reset and resubmit them
without the driver's intervention.
Recovery gauntlet-
A target driver procedure used to recover from
a SCSI bus reset or timeout,
resubmitting those jobs which may have been dropped
or for which errors have occurred.
SCSI bus reset-
Clears all SCSI devices from the bus.
The reset may be initiated by any physical device on the bus,
including the HBA controller itself.
A SCSI bus reset is not a SCSI command,
but rather it is caused by asserting
an electrical signal on the SCSI bus.
What happens to jobs issued to a device prior to the reset
depends on whether the device supports hard resets and/or
soft resets.
Use only the reset types that a given device supports.
Soft reset-
After a soft reset,
a given SCSI peripheral will attempt
to process all jobs it has identified,
that is, all jobs that have gone through
the necessary SCSI protocol
to be owned by the target device.
If a job has not been identified completely by the target device,
the device will not process the job
and it is up to the HBA controller to resubmit or fail the job.
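The difference between hard and soft resets can be summarized as a small decision function. This is a minimal sketch for illustration only: the types reset_kind, job_fate, and struct job, and the function job_fate_after_reset, are hypothetical names and do not come from the SDI headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; none of these names are part of SDI. */
enum reset_kind { RESET_HARD, RESET_SOFT };

enum job_fate {
    JOB_FAILED_TO_HBA,  /* hard reset: the device fails the job to the HBA   */
    JOB_PROCESSED,      /* soft reset: the device still processes the job    */
    JOB_LEFT_TO_HBA     /* soft reset: the HBA must resubmit or fail the job */
};

struct job {
    int identified;     /* nonzero once the target device owns the job */
};

/* Report what happens to a job that was issued before the reset. */
enum job_fate job_fate_after_reset(enum reset_kind kind, const struct job *job)
{
    assert(job != NULL);

    if (kind == RESET_HARD)
        return JOB_FAILED_TO_HBA;   /* all current and pending jobs fail */

    /* Soft reset: only completely identified jobs are processed. */
    return job->identified ? JOB_PROCESSED : JOB_LEFT_TO_HBA;
}
```

Whether a job that is failed or left to the HBA is resubmitted automatically depends on the controller, as noted in the hard reset definition.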
The following commands and return values are used
for this feature.
Command completion values-
SDI_TIME_NOABORT-
The command has timed out but may still be
on the HBA controller or target device.
SDI_TIME-
A timed-out job has been either aborted or completed.
sfb commands-
SFB_RESET_DEVICE-
Reset a SCSI device through a bus device reset.
Returns SDI_RET_OK if the device reset is supported,
SDI_RET_ERR otherwise.
SFB_RESET_BUS-
Reset the SCSI bus.
Returns SDI_RET_OK if bus reset is supported,
SDI_RET_ERR otherwise.
SFB_TIMEOUT_ON-
Inform the HBA driver that timeouts are enabled.
SFB_TIMEOUT_OFF-
Inform the HBA driver that timeouts are disabled.
SDI ioctls-
B_NEW_TIMEOUT_VALUES-
Provide new timeout values for a specific SCSI address.
B_TIMEOUT_SUPPORT-
Turns timeout support on or off.
NOTE:
Any HBA driver supporting timeout/reset should set
HBA_TIMEOUT_RESET in its device flag.
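One way a target driver might combine the reset commands is to try the narrower device reset first and fall back to a bus reset when the device reset is not supported. The sketch below is an assumption, not the documented recovery procedure: the constant values are placeholders so that the example compiles on its own (the real definitions come from the SDI headers), and send_sfb_fn, stub_send, and try_reset are hypothetical names standing in for however the driver actually submits an sfb command.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values only; the real definitions live in the SDI headers. */
#define SDI_RET_OK          0
#define SDI_RET_ERR         (-1)
#define HBA_TIMEOUT_RESET   0x1u
enum { SFB_RESET_DEVICE = 1, SFB_RESET_BUS = 2 };

/* Hypothetical hook standing in for the driver's sfb submission path. */
typedef int (*send_sfb_fn)(int sfb_command, void *ctx);

enum reset_result { RESET_NONE, RESET_VIA_DEVICE, RESET_VIA_BUS };

/* Try a bus device reset first; fall back to a SCSI bus reset. */
enum reset_result try_reset(unsigned hba_flags, send_sfb_fn send, void *ctx)
{
    assert(send != NULL);

    /* The HBA driver does not advertise timeout/reset support. */
    if (!(hba_flags & HBA_TIMEOUT_RESET))
        return RESET_NONE;

    if (send(SFB_RESET_DEVICE, ctx) == SDI_RET_OK)
        return RESET_VIA_DEVICE;
    if (send(SFB_RESET_BUS, ctx) == SDI_RET_OK)
        return RESET_VIA_BUS;
    return RESET_NONE;
}

/* Stub HBA for illustration: ctx points to a bitmask of supported commands. */
int stub_send(int sfb_command, void *ctx)
{
    unsigned supported = *(const unsigned *)ctx;
    return (supported & (1u << sfb_command)) ? SDI_RET_OK : SDI_RET_ERR;
}
```

The order of escalation is a design choice in this sketch: a bus device reset affects one target, while a SCSI bus reset affects every device on the bus.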
Two triggers can cause the recovery mechanism to be invoked:
a job timeout detected by the HBA watchdog timer
or a specific SCSI command completion status code.
The status codes that trigger recovery include:
SDI_CRESET-
The associated command failed due to some device resetting the
SCSI bus.
SDI_RESET-
The associated command failed
due to the HBA resetting the SCSI bus.
SDI_TIME-
The job has been timed out and aborted.
SDI_TIME_NOABORT-
The job has been timed out, but not aborted.
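A target driver's completion handler might classify these status codes as sketched below. This is an illustrative assumption rather than the actual driver logic: the numeric values are placeholders (the real ones are defined in the SDI headers), and triggers_recovery is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values only; the real definitions live in the SDI headers. */
enum {
    SDI_CRESET = 1,     /* some device reset the SCSI bus      */
    SDI_RESET,          /* the HBA reset the SCSI bus          */
    SDI_TIME,           /* job timed out and was aborted       */
    SDI_TIME_NOABORT    /* job timed out but was not aborted   */
};

/*
 * Return 1 if the completion code should invoke the recovery mechanism,
 * 0 otherwise.  *may_be_outstanding is set to 1 when the job may still
 * be on the HBA controller or target device (SDI_TIME_NOABORT).
 */
int triggers_recovery(int comp_code, int *may_be_outstanding)
{
    assert(may_be_outstanding != NULL);
    *may_be_outstanding = 0;

    switch (comp_code) {
    case SDI_CRESET:
    case SDI_RESET:
    case SDI_TIME:
        return 1;
    case SDI_TIME_NOABORT:
        *may_be_outstanding = 1;
        return 1;
    default:
        return 0;       /* any other status: no timeout/reset recovery */
    }
}
```

Separating SDI_TIME_NOABORT matters because a job that was not aborted may still complete later, so a driver would presumably avoid treating it as already returned.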
For more details, see
sdi_timeout.
© 2005 The SCO Group, Inc. All rights reserved.
OpenServer 6 and UnixWare (SVR5) HDK - June 2005