LISA05 Tuesday: device errors, iostat, and logging

One of the questions raised at Tuesday night’s BoF was “why do some of the statistics that iostat -E displays result in a console message while others do not?” I was sitting in the back with a copy of Mike Kupfer’s split ON source tree, and decided to have a look. iostat(1M) is a kstat reader with some simple processing and formatted output. The output function is show_disk_errors(), but understanding it requires knowing how iostat groups the disks and statistics internally; the code that acquires the device statistics lives in cmd/stat/common/acquire_iodevs.c. Searching for “error” shows that the critical function is acquire_iodev_errors(), and that two classes of kstats contribute to the error output: device_error and iopath_error.

The most direct way to see these statistics is to invoke kstat(1M) with these classes. On my laptop, the result for device_error is

$ kstat -p -c device_error
sderr:0:sd0,err:Device Not Ready        0
sderr:0:sd0,err:Hard Errors     0
sderr:0:sd0,err:Illegal Request 1
sderr:0:sd0,err:Media Error     0
sderr:0:sd0,err:No Device       0
sderr:0:sd0,err:Predictive Failure Analysis     0
sderr:0:sd0,err:Product UJ-832D         Revision
sderr:0:sd0,err:Recoverable     0
sderr:0:sd0,err:Revision        1.50
sderr:0:sd0,err:Serial No
sderr:0:sd0,err:Size    0
sderr:0:sd0,err:Soft Errors     1
sderr:0:sd0,err:Transport Errors        0
sderr:0:sd0,err:Vendor  MATSHITA
sderr:0:sd0,err:class   device_error
sderr:0:sd0,err:crtime  76.139658104
sderr:0:sd0,err:snaptime        1857.960128997

(The laptop has no kstats of class iopath_error.)
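Since iostat itself reads these counters through libkstat, here is a minimal sketch of a reader that walks the kstat chain and prints the 32-bit named counters in those two classes. This is an illustrative sketch of the general approach, not the acquire_iodev_errors() code itself; build it with cc and -lkstat.

/*
 * Minimal libkstat reader: walk the kstat chain and print the named
 * 32-bit counters whose class is device_error or iopath_error,
 * roughly the selection that acquire_iodev_errors() performs.
 * Illustrative sketch only; not the iostat source.
 */
#include <stdio.h>
#include <string.h>
#include <kstat.h>

int
main(void)
{
	kstat_ctl_t *kc;
	kstat_t *ksp;
	kstat_named_t *knp;
	uint_t i;

	if ((kc = kstat_open()) == NULL) {
		perror("kstat_open");
		return (1);
	}

	for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
		if (ksp->ks_type != KSTAT_TYPE_NAMED)
			continue;
		if (strcmp(ksp->ks_class, "device_error") != 0 &&
		    strcmp(ksp->ks_class, "iopath_error") != 0)
			continue;
		if (kstat_read(kc, ksp, NULL) == -1)
			continue;

		knp = KSTAT_NAMED_PTR(ksp);
		for (i = 0; i < ksp->ks_ndata; i++) {
			/* The error counters are 32-bit values. */
			if (knp[i].data_type == KSTAT_DATA_UINT32)
				(void) printf("%s:%d:%s:%s\t%u\n",
				    ksp->ks_module, ksp->ks_instance,
				    ksp->ks_name, knp[i].name,
				    knp[i].value.ui32);
		}
	}

	(void) kstat_close(kc);
	return (0);
}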

We then look for the creation of the named kstat for each of these strings, that is, the kstat_create() invocations, to identify the structure member names associated with each string. Then we can look for statements that involve those member names; this leads us to the various SD_UPDATE_ERRSTATS() invocations throughout uts/common/io/scsi/targets/sd.c.
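On the driver side, the pattern looks roughly like the following: a kstat of class device_error is created with kstat_create(), its members are named with kstat_named_init() using the strings shown above, and a macro bumps the counters at error time. This is a hypothetical sketch of the pattern (the xx_ names are invented); the real structure and macro live in sd.c.

/*
 * Hypothetical sketch of how a driver exposes error counters of
 * class "device_error"; the xx_ names are made up, but the shape
 * mirrors the sd.c error-kstat and SD_UPDATE_ERRSTATS() arrangement.
 */
#include <sys/types.h>
#include <sys/kstat.h>

struct xx_errstats {
	kstat_named_t	xe_softerrs;
	kstat_named_t	xe_harderrs;
	kstat_named_t	xe_transerrs;
};

static kstat_t *xx_errstat_ksp;

static void
xx_errstats_create(int instance)
{
	struct xx_errstats *xep;

	xx_errstat_ksp = kstat_create("xxerr", instance, "xx0,err",
	    "device_error", KSTAT_TYPE_NAMED,
	    sizeof (struct xx_errstats) / sizeof (kstat_named_t),
	    KSTAT_FLAG_PERSISTENT);
	if (xx_errstat_ksp == NULL)
		return;

	xep = (struct xx_errstats *)xx_errstat_ksp->ks_data;
	kstat_named_init(&xep->xe_softerrs, "Soft Errors",
	    KSTAT_DATA_UINT32);
	kstat_named_init(&xep->xe_harderrs, "Hard Errors",
	    KSTAT_DATA_UINT32);
	kstat_named_init(&xep->xe_transerrs, "Transport Errors",
	    KSTAT_DATA_UINT32);
	kstat_install(xx_errstat_ksp);
}

/* Counterpart of SD_UPDATE_ERRSTATS(): bump one named counter. */
#define	XX_UPDATE_ERRSTATS(field)					\
	if (xx_errstat_ksp != NULL)					\
		((struct xx_errstats *)					\
		    xx_errstat_ksp->ks_data)->field.value.ui32++

At error time the driver then does something like XX_UPDATE_ERRSTATS(xe_softerrs), which is why a counter can move without any console message being produced.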

The discrepancy between updates and logging arises because the macro that bumps the counters, SD_UPDATE_ERRSTATS(), and the function that displays the error, sd_print_sense_msg(), are sometimes invoked together and sometimes are not. I don’t know the details of the SCSI error categories, but the decision to make some of these errors silent and others not appears arbitrary. So, unfortunately, the only way today to answer “why is this messaged?” is to read the code. (Perhaps an ‘sd’ expert can offer an enlightening comment.)

If you’re trying to anticipate and avoid potential failures, having messages emitted arbitrarily isn’t very helpful: that’s why Mike and the FMA team developed the fault management architecture, a framework in which errors are processed in a predictable fashion, resulting in the proper diagnosis of faults. Eric described one possible scenario involving disk errors, FMA, and ZFS a couple of weeks ago, but there’s a smaller step that seems useful involving only sd and FMA: the error increments could be converted into error events, and the decision to issue a notice deferred until a series of errors is diagnosed as a fault (except, I suppose, when an error can be immediately diagnosed as a critical fault). Taking this step would give consistent reporting for all disk consumers, including those less sophisticated than ZFS, like older filesystems and raw disk accessors.
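To make that concrete, here is a rough sketch of what posting such an error event might look like using the generic DDI fault management entry points. The ereport class name and payload members are invented for illustration, and the driver is assumed to have registered as ereport-capable via ddi_fm_init(); the real event classes and payload for sd would be defined as part of the work Eric described.

/*
 * Rough sketch: instead of silently bumping a kstat (and maybe
 * printing to the console), hand the error to FMA as an ereport and
 * let a diagnosis engine decide when a message is warranted.  The
 * class name and payload members are invented for illustration; the
 * driver is assumed to have called ddi_fm_init() with
 * DDI_FM_EREPORT_CAPABLE.
 */
#include <sys/types.h>
#include <sys/sunddi.h>
#include <sys/ddifm.h>
#include <sys/fm/protocol.h>
#include <sys/fm/util.h>

static void
xx_post_media_error(dev_info_t *dip, uint_t sense_key, uint_t asc)
{
	uint64_t ena = fm_ena_generate(0, FM_ENA_FMT1);

	ddi_fm_ereport_post(dip, "ereport.io.xxdisk.media-error", ena,
	    DDI_NOSLEEP,
	    FM_VERSION, DATA_TYPE_UINT8, FM_EREPORT_VERS0,
	    "sense-key", DATA_TYPE_UINT32, sense_key,
	    "asc", DATA_TYPE_UINT32, asc,
	    NULL);
}

A diagnosis module in fmd(1M) could then count these ereports over time and only emit a notice (or declare a fault) once it has something definite to say.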

In a software engineering sense, the FMA approach, in which error issuance and diagnosis are separated, is much sounder: when the driver is first written, field experience with the hardware device is typically limited. Over time, the actual impact of errors in practice becomes better known, and the diagnosis and the actions associated with it can be refined. fmd(1M) handles on-the-fly module updates gracefully, and can also deal with overlapping event flows, so that both primitive and ZFS-specific fault handling policies can be implemented, depending on how a particular device is used.

Now, the community that’s discussing technical issues and directions for fault management is aptly called the Fault Management community; if you are interested in how this work is going to proceed, and ways to contribute, I suggest joining it.

[ T: Solaris OpenSolaris LISA05 FMA zfs iostat kstat ]