Debugging Drivers
The hardest part of debugging code—especially kernel code—is not learning how to use the tools but how to analyze the data generated by the tools. The success you have with your analysis depends to a large degree on the skills and experience you bring to it, and these can always be enhanced by learning from others who have “earned their stripes” debugging kernel code.
This chapter offers a collection of driver-debugging tips and techniques from Apple engineers. It starts by listing a few general suggestions for driver debugging and itemizing the tools at your disposal. It then discusses these general areas of driver debugging:
Debugging drivers during the matching and loading stages
Two-machine debugging, including the set-up procedure, the kernel debugging macros, and debugging panics and hangs
Logging, including custom event logging
Some Debugging Basics
Before delving into the nitty-gritty of driver debugging, it might be worth your while to consider a few general debugging tips and to survey the debugging tools available to you.
General Tips
To help you get started, here are a few debugging tips based on the experiences of Apple kernel engineers.
Develop in incremental steps. In other words, don’t write a big chunk of driver code before debugging it. And at each stage verify that the driver can unload as well as load; a driver that can’t unload suggests a memory leak (most likely caused by a missing
release
somewhere).As a suggestion, get driver matching and the basic driver life-cycle methods working first (especially
start
). Then begin talking with your hardware. Once you’re sure that’s working properly, then write the code that talks with the rest of the system.Keep earlier versions of your code. If bugs start to appear in your code, you can “diff” the current version with the prior version to locate exactly which code is new (and thus the probable source of the problem).
Obtain the source code for all other parts of the kernel that your driver interacts with or depends on (directly or indirectly). Things go much easier when you have full source-level debugging.
Obtain symbols for each kernel extension that might affect, or be affected by, your driver. This would include your driver’s provider and its clients. For the full gamut of symbols, obtain a symboled kernel. In particular, a symboled kernel is required to use the kernel debugging macros (described in Using the Kernel Debugging Macros).
See Setting Up for Two-Machine Debugging for instructions on generating symbol files for kernel extensions. To obtain a symboled kernel, you might have to build the Darwin kernel from the open source. See Building and Debugging Kernels in Kernel Programming Guide for details.
Learn
gdb
thoroughly. Start by reading Debugging with GDB, provided as part of Tools Documentation.Become familiar with how computer instructions look in assembler (see Examining Computer Instructions).
Every now and then, single-step through every line of code to make sure your driver’s doing exactly what you want it to.
Don’t overly rely on calls to
IOLog
to provide you with debugging information (see Using IOLog).Make sure your KEXT includes only header files from
Kernel.framework
, in addition to the header files you define. If you include other headers, although their definitions may be in scope at compile time, the functions and services they define will not be available in the kernel environment.
Issues With 64-Bit Architectures
In OS X version 10.3, Apple introduced some changes to support the new 64-bit architectures. Because the changes are implemented in the operating system itself, some drivers may be affected even if they aren’t running on a 64-bit computer. To better explain these changes, this section first provides a brief description of PCI address translation.
Hardware devices on the PCI bus can handle 32-bit addresses, which means that a PCI device has a 4-gigabyte window into main memory. When a PCI device performs a data transaction to or from main memory, the device driver prepares this memory for I/O. On systems where both the memory subsystem and the PCI device drivers use 32-bit addressing, there are no difficulties. On systems where the memory subsystem uses 64-bit addressing, however, a PCI device can see only four gigabytes of main memory at a time. To address this issue, Apple chose to implement address translation. In this scheme, blocks of memory are mapped into the 32-bit address space of PCI devices. The PCI device still has a 4-gigabyte window into main memory, but that window may contain noncontiguous blocks of main memory. The address translation is performed by a part of the memory controller called the DART (device address resolution table). The DART keeps a table of translations to use when mapping between the physical addresses the processor sees and the addresses the PCI device sees (called I/O addresses).
If your driver adheres to documented, Apple-provided APIs, this address-translation process is transparent. For example, when your driver calls IOMemoryDescriptor’s prepare
method, a mapping is automatically placed in the DART. Conversely, when your driver calls IOMemoryDescriptor’s release
method, the mapping is removed. Although this has always been the recommended procedure, failure to do so may not have resulted in undesirable behavior in previous versions of OS X. Beginning in OS X version 10.3, however, failure to follow this procedure may result in random data corruption or panics.
If your driver experiences difficulty on an OS X version 10.3 or later system, you should check that you are following these guidelines:
Always call
IOMemoryDescriptor::prepare
to prepare the physical memory for the I/O transfer (in OS X version 10.3 and later, this also places a mapping into the DART).Balance each call to
prepare
with a call tocomplete
to unwire the memory.Always call
IOMemoryDescriptor::release
to remove the mapping from the DART.On hardware that includes a DART, pay attention to the DMA direction for reads and writes. On a 64-bit system, a driver that attempts to write to a memory region whose DMA direction is set up for reading will cause a kernel panic.
A side effect of the changes in the memory subsystem is that OS X is much more likely to return physically contiguous page ranges in memory regions. In earlier versions of OS X, the system returned multi-page memory regions in reverse order, beginning with the last page and going on towards the first page. Because of this, a multi-page memory region seldom contained a physically contiguous range of pages.
The greatly increased likelihood of seeing physically contiguous page ranges in memory regions might expose latent bugs in drivers that previously did not have to handle physically contiguous pages. If your driver is behaving incorrectly or panicking, be sure to investigate this possibility.
Another result of the memory-subsystem changes concerns the physical addresses some drivers obtain directly from the pmap layer. Because there is not a one-to-one correspondence between physical addresses and I/O addresses, physical addresses obtained from the pmap layer have no purpose outside the virtual memory system itself. Drivers that use pmap calls to get such addresses (such as pmap_extract
) will fail to function on systems with a DART. To prevent the use of these calls, OS X version 10.3 will refuse to load a kernel extension that uses them, even on systems without a DART.
Driver-Debugging Tools
Apple provides a number of tools that you can use to debug I/O Kit device drivers. Table 7-1 presents some of the more important tools.
Tool or API | Description |
---|---|
The GNU debugger, used in two-machine kernel debugging. See Tips on Using gdb for some useful information on | |
Kernel debugging macros | Powerful macros designed for use in |
A utility that does a variety of things with kernel extensions. It loads kernel extensions and, prior to loading, validates the extensions, providing diagnostic information related to file-permission and dependency errors. It also enables the debugging of kernel extensions (using | |
A utility that prints columns of information about the kernel extensions that are currently loaded in a system. The information most useful to driver debugging is the driver’s load address, references held by the driver to other kernel extensions, the number of references other extensions have to it, and the version of a kernel extension. See Unloading Your Driver for more information. | |
| Provides a snapshot of the I/O Registry, showing the hierarchical structure of client–provider relationships among current drivers and nubs in the system. With the necessary options, it shows the properties associated with each node in the registry. The I/O Registry Explorer shows the same information in a GUI interface. Both tool and application are particularly useful for debugging problems with matching. See Debugging Matching Problems. |
| The former tool displays a summary of memory allocation by allocator type (instance, container, and |
| Defines a set of values for the IOKitDebug property. When this property is defined, the system writes the status of various driver events, such as attaching, matching, and probing, to the system log. See Debugging Matching Problems for further information. |
Debugging Matching and Loading Problems
Before you begin debugging your driver’s hardware-specific functionality, you should first ensure that your KEXT is valid and that it loads, unloads, and matches properly. This section describes how to use OS X tools, such as kextload
and kextstat
, to authenticate and validate your KEXT, resolve its dependencies, and test its matching.
Driver Dependencies
Every driver declares its dependencies on other loadable kernel extensions (such as an I/O Kit family), kernel subcomponents (such as com.apple.kernel.iokit
), or KPIs (such as com.apple.kpi.libkern
) in its Info.plist
file. At the top level of the Info.plist
file, a driver lists each dependency in the form of a key-value pair in the OSBundleLibraries
dictionary; the key is the KEXT or kernel subcomponent name and the value is the KEXT’s or subcomponent’s version number.
In OS X v10.2, declaring a dependency on the I/O Kit kernel subcomponent com.apple.kernel.iokit
(version 6.x) implicitly allows your driver to access symbols in two other kernel subcomponents: com.apple.kernel.mach
and com.apple.kernel.bsd
.
In OS X v10.3, a driver that declares a dependency on com.apple.kernel.iokit
version 6.x (the 10.2 version) will still enjoy automatic access to Mach and BSD symbols.
In OS X v10.4, Apple introduced sustainable kernel programming interfaces, or KPIs. In particular, the KPIs support the development of NKEs (network kernel extensions), file-system KEXTs, and other non-I/O Kit KEXTs. KPIs are also recommended for pure I/O Kit KEXTs (KEXTs that use only I/O Kit-provided APIs) that target OS X v10.4 and later. Be aware, however, that no KEXT can declare dependencies on a combination of both kernel subcomponents and KPIs. To learn more about KPIs, see Kernel Framework Reference; to find out how to declare dependencies on them, see Kernel Extension Dependencies.
Using kextload, kextunload, and kextstat
During the development process, it’s a good idea to load and unload your driver regularly to catch problems in these processes before they become very difficult to find. The kextload
tool allows you to load your driver at the command line and performs a wide range of authentication and validation tests. kextload
also allows you to debug your driver’s start
and probe
routines before matching begins. The kextunload
tool terminates and unregisters the I/O Kit objects associated with your KEXT and unloads the code and personalities for that KEXT. For complete information on kextload
, kextunload
, and kextstat
, view the man
pages for these tools in a Terminal window.
You can use kextload
to load and start your KEXT with or without starting I/O Kit matching, but kextload
provides no information about the matching process itself. If your driver isn’t matching properly, see Debugging Matching Problems for more information.
Loading Your Driver
The kextload
tool checks your driver’s Info.plist
file thoroughly before it attempts to load your driver into the kernel. If required properties are missing or incorrectly identified, if your declared dependencies are missing or incompatible, or if your driver has the wrong permissions or ownership, kextload
refuses to load your KEXT. To find out why your KEXT doesn’t load, run kextload
with the -t
or the -tn
option. This tells kextload
to perform all its tests on your KEXT and provide you with a dictionary of the errors it finds (the addition of the -n
option prevents kextload
from loading the KEXT even if it finds no errors). For example, Listing 7-1 shows the Info.plist
of a KEXT named BadKEXT, that contains three errors.
Listing 7-1 Example Info.plist
containing errors
<key>IOKitPersonalities</key> |
<dict> |
<key>BadKEXT</key> |
<dict> |
<key>CFBundleIdentifier</key> |
<string>com.MySoftwareCompany.driver.BadKEXT</string> |
<key>IOClass></key> |
<string>com_MySoftwareCompany_driver_BadKEXT</string> |
<!-- No floating point numbers are allowed in the kernel. --> |
<key>IOKitDebug</key> |
<number>0.0</number> |
<key>IOMatchCategory</key> |
<string>com_MySoftwareCompany_driver_BadKEXT</string> |
<key>IOResourceMatch</key> |
<string>IOKit</string> |
<!-- The personality is missing its IOProviderClass key-value pair. --> |
</dict> |
</dict> |
<key>Libraries</key> |
<dict> |
<key>com.apple.kernel.iokit</key> |
<string>1.1</string> |
<!-- com.apple.kernel.libkern is misspelled. --> |
<key>com.apple.kernel.libker</key> |
<string>1.1</string> |
<key>com.apple.kernel.mach</key> |
<string>1.1</string> |
</dict> |
In addition, BadKEXT has ownership and permissions problems: It is owned by the user user_name
and the group staff
and its folders have permissions rwxrwxr-x
(775 octal). For reasons of security, KEXTs must be owned by the root user and the wheel group. Further, no component of a KEXT may be writable by any user other than root. (For more information on the correct permissions and ownership of KEXTs, see Testing and Deploying Drivers). Assuming root privileges by using the sudo
command and running kextload -t
on BadKEXT provides the following information, shown in Listing 7-2.
Listing 7-2 kextload -t
outputs errors found in KEXT
[computer_name:] user_name% sudo kextload -t BadKEXT.kext |
Password: |
can't add kernel extension BadKEXT.kext (validation error) (run kextload |
on this kext with -t for diagnostic output) |
kernel extension BadKEXT.kext has problems: |
Validation failures |
{ |
"Info dictionary missing required property/value" = { |
"IOKitPersonalities:BadKEXT:IOProviderClass" |
} |
"Info dictionary property value is of illegal type" = { |
"IOKitPersonalities:BadKEXT:IOKitDebug" |
} |
} |
Authentication failures |
{ |
"File owner/permissions are incorrect" = { |
"/Users/user_name/Projects/BadKEXT/build/BadKEXT.kext" |
"/Users/user_name/Projects/BadKext/build/BadKEXT.kext/Contents/Info.plist" |
"/Users/user_name/Projects/BadKext/build/BadKEXT.kext/Contents" |
"/Users/user_name/Projects/BadKext/build/BadKEXT.kext/Contents/MacOS/BadKEXT" |
"/Users/user_name/Projects/BadKext/build/BadKEXT.kext/Contents/MacOS" |
} |
} |
Missing dependencies |
{ |
"com.apple.kernel.libker" = |
"No valid version of this dependency can be found" |
} |
The checks kextload
performs are listed below.
Info.plist
dictionary must be present and must be of typedictionary
.CFBundleIdentifier
property must be of typestring
and no longer than 63 characters.CFBundleVersion
property must be a proper “vers” stylestring
.If the optional
OSBundleCompatibleVersion
property is present, it must be a proper “vers” style string with a value less than or equal to theCFBundleVersion
value.CFBundleExecutable
must be of typestring
.If the
IOKitPersonalities
property is present (theIOKitPersonalities
property is required for driver matching but isn’t required for the KEXT to be valid), all its values must be of typedictionary
.kextload
performs the following checks on the personalities.IOClass
property must be of typestring
.IOProviderClass
property must be of typestring
.CFBundleIdentifier
must be of typestring
.IOKitDebug
property, if present, must be of typeinteger
(and not a floating-point number).All property types must be valid for in-kernel use; for example, the
date
type is not allowed in the kernel.
OSBundleLibraries
must be present and be of typedict
. TheOSBundleLibraries
values must exist and have valid versions.If all personalities have the
IOKitDebug
property set to a nonzero value,kextload
marks the KEXT an ineligible for safe boot, even if theOSBundleRequired
property is present and valid.
If the -t
option doesn’t provide you with enough information, you can tell kextload
to give you a step-by-step account of its actions by using the -v
option followed by a number from 1 to 6 indicating how much information you want. In this way, you can see at what point kextload
fails.
Unloading Your Driver
The kextunload
tool simply tries to unload your driver’s code and personalities, terminating and unregistering all I/O Kit objects associated with your driver. Unfortunately, kextunload
provides very little information about why your driver might fail to unload. It’s possible that the system log (which you can view at /var/log/system.log
) will display some information about a failure to unload, but you’ll get the most useful information using the other methods and tools described in this section.
A driver that has a mismatched retain
or release
, one that is depended on by other kernel modules, or one that has a failing stop
method will fail to unload. Whatever the cause, if your driver fails to unload, the debugging process becomes arduous because you are forced to restart the computer each time you need to load and unload your KEXT.
In general, you should make only incremental changes in your driver’s code, checking its ability to load and unload after every change, to narrow down the possible culprits if your driver suddenly refuses to unload. In this way, you can revert to a previous, unloadable version of your driver and examine the changes you’ve made to it.
You can determine if your driver cannot unload because another module is depending on it by using the kextstat
tool to display information about all currently loaded KEXTs. Listing 7-3 shows a few lines of the output from kextstat
.
Listing 7-3 Partial listing of kextstat
output
Index Refs Address Size Wired Name (Version) <Linked Against> |
1 1 0x0 0x0 0x0 com.apple.kernel (6.0) |
2 8 0x0 0x0 0x0 com.apple.kernel.bsd (6.0) |
3 33 0x0 0x0 0x0 com.apple.kernel.iokit (6.0) |
4 32 0x0 0x0 0x0 com.apple.kernel.libkern (6.0) |
// Some lines not shown... |
11 7 0xaadd000 0x9000 0x8000 com.apple.iokit.IOPCIFamily (1.2) <4 3> |
// Some lines not shown... |
26 6 0xb14c000 0x1a000 0x19000 com.apple.iokit.IOUSBFamily (1.2) <4 3 2> |
// Some lines not shown... |
The first column of the kextstat
output is the load index of the KEXT. kextstat
uses this index in the last column to show which KEXTs a particular KEXT is linked against. For example, the IOPCIFamily KEXT, index 11 in Listing 7-3, depends on both com.apple.kernel.iokit
(index 3) and com.apple.kernel.libkern
(index 4).
The second column is the sum of all the references to the selected KEXT by other currently loaded KEXTs. Check this column if your driver fails to unload: If its number of references is nonzero, kextunload
cannot unload it.
If there are no other KEXTs depending on your driver yet it fails to unload, the problem is often a mismatched retain
or release
on an object instantiated from one of your classes. If you have multiple classes defined in your KEXT, you can use ioclasscount
to find out which class has outstanding instances. In a Terminal window, type
ioclasscount MyClassName |
The ioclasscount
tool outputs the instance count of the specified object, offset by the number of direct subclasses that have at least one instance allocated. With this information, you then go through your driver’s code looking for a retain
on your object that’s not balanced with a release
. A mismatched retain
can be difficult to find because often it is not explicitly in your code, but in a container class you’ve put your object into, in a method you’ve called, or in a still-running user application that holds an io_object_t
reference on the object. For example, calling
addEventSource ( My_Event_Source_Object ) |
implicitly places a retain on My_Event_Source_Object that remains until you call
removeEventSource ( My_Event_Source_Object ) |
Debugging Your Driver’s start and probe Methods
When you’ve determined that your KEXT can load and unload successfully, you may want to debug your driver’s start
and probe
methods. To do this, you use kextload
. This is because installing your KEXT in /System/Library/Extensions
and allowing kextd
to load it when needed does not present you with the opportunity to attach to the computer in a gdb
session before your driver has already matched and started.
If you are developing an I/O Kit driver, you can use kextload
with the -l
option to load your KEXT binary on the target computer but to refrain from sending any personalities to the kernel. By separating loading from matching, kextload -l
allows you to defer matching (and the calling of your driver’s start
and probe
methods) until after you set up a two-machine debugging session (for more information on how to set up such a session, see Setting Up for Two-Machine Debugging).
To avoid matching before you’re ready, you can invoke kextunload
on your driver before using kextload -l
to make sure none of your driver’s personalities remain in the kernel from an earlier debugging session. Be aware, however, that plugging in a new device after you’ve used kextload -l
to load your KEXT binary may cause some other driver to match on the device. If this is not desirable, take care not to change your system’s configuration until you are ready.
After you’ve attached to the target computer and set breakpoints on the start
and probe
methods, you use kextload
with the -m
option to send the personalities to the kernel, which triggers matching. If your KEXT’s start
and probe
methods do not execute, a successful match did not occur and you should examine the driver personalities to find out why. See Debugging Matching Problems for advice on how to debug matching problems.
If you are developing a non–I/O Kit KEXT, such as an NKE (network kernel extension) or a file system KEXT, you implement the MODULE_START
method instead of the start
method an I/O Kit driver implements. To debug the MODULE_START
method, you use kextload
with the -i
or -I
options to interactively load the KEXT and its dependencies, pausing after each step for your permission to continue.
Using the -i
option tells kextload
to automatically load your KEXT’s dependencies but to ask for permission to continue at each step in the process of loading the KEXT itself. The -I
option causes kextload
to pause for input at each step of every stage of loading, including the loading of dependencies. If you want to debug your KEXT’s MODULE_START
method, you use kextload -i
or -I
to load your KEXT binary and then set up a two-machine debugging session before you give permission to invoke the MODULE_START
method. Listing 7-4 shows the loading of an example KEXT called MyKEXT using kextload -i
.
Listing 7-4 Using kextload -i
for interactive KEXT loading
[computer_name:/tmp] user_name% sudo kextload -i MyKEXT.kext |
Password: |
Load extension MyKEXT.kext and its dependencies [Y/n]? y |
Load module /tmp/MyKEXT.kext/Contents/MacOS/MyKEXT [Y/n]? y |
kextload: module com.MySoftwareCompany.driver.MyKEXT created as #70 at |
address 0xaf5c000, size 8192 |
kextload: You can now break to the debugger and set breakpoints for this |
extension. |
Start module /tmp/MyKEXT.kext/Contents/MacOS/MyKEXT (answering no will abort |
the load) [Y/n]? y |
kextload: started module /tmp/MyKEXT.kext/Contents/MacOS/MyKEXT |
kextload: MyKEXT.kext loaded successfully |
Send personality "MyKEXT" to the kernel [Y/n]? y |
kextload: matching started for MyKEXT.kext |
[computer_name:/tmp] user_name% |
Debugging Matching Problems
Getting your driver to match correctly can be one of the more challenging aspects of I/O Kit driver development. If your driver loads and unloads properly but does not match, the most important thing you can do is become thoroughly familiar with your family’s matching language. Although the I/O Kit implements driver matching in a three-phase, subtractive process (described in detail inIOKit Fundamentals), each I/O Kit family is free to define an arbitrarily complex matching language of its own.
After the I/O Kit discards drivers of the wrong class in the class-matching phase, the I/O Kit examines the remaining drivers for family-specific matching properties in the passive-matching phase. If the family implements family-specific matching, it considers the candidate driver’s personality and raises or lowers the driver’s probe
score based on the matches it finds.
Although some families define no family-specific matching, instead preferring drivers to probe the device, many others exercise some control over the passive-matching phase. The USB family, for example, requires its drivers to use specific combinations of properties defined by the USB Common Class Specification. In order for your USB driver to match, you must include all the elements of exactly one property combination. Another example is the PCI family which defines a number of match keys whose values are the contents of various PCI configuration-space registers. A candidate PCI driver can then match on a single register value, a list of values, or even a partial register value indicated by a supplied bit mask.
If your driver does not match, first make sure the property values in your driver’s personality match the corresponding properties your device publishes in the I/O Registry. You can use the I/O Registry Explorer application (available at /Developer/Applications
) or the command-line tool ioreg
to view the properties your device publishes. If you use I/O Registry Explorer, select Find to search for your device name, follow the path to the object representing your device, and view its properties in the lower window. Alternately, to view the I/O Registry in a Terminal window, type
ioreg -bl |
The -b
option displays the object names in bold and the -l
option displays the object properties.
If there are no discrepancies between your device’s published properties and your driver’s matching properties, next make sure you know how your driver’s family implements matching. Families that define their own matching language do so by implementing the matchPropertyTable
method. By viewing the code for this method, you can determine how your driver’s family uses the properties it finds in a driver’s personality dictionary to adjust the probe
score.
In very rare cases, a driver might declare IOResources
as the value of its IOProviderClass
key. IOResouces
is a special nub attached to the root of the I/O Registry that makes resources, such as the BSD kernel, available throughout the system. Traditionally, drivers of virtual devices match on IOResources
because virtual devices do not publish nubs of their own. Another example of such a driver is the HelloIOKit KEXT (described in Hello I/O Kit: Creating a Device Driver With Xcode) which matches on IOResources
because it does not control any hardware.
Finally, you can place the IOKitDebug
property in your driver’s personality dictionary. When you give this property a non-zero value, it places the status of various events, such as attaching, matching, and probing, into the system log (available at /var/log/system.log
). The value of the IOKitDebug
property defines which events are logged; see IOKitDebug.h
(available in /System/Library/Frameworks/Kernel.framework/Headers/IOKit
) for the I/O Kit–defined values. Unless you are interested in only a single event or a particular set of events, set the value to 65535
to get information about all the events IOKitDebug
covers. With this information, you can see which events occurred and whether they succeeded or failed, but not why they failed.
Although the IOKitDebug
property can be useful during development, be sure to set its value to 0
(or remove the property altogether) before shipping your driver because a non-zero value will prevent your driver from loading during a safe-boot. For more information on safe booting, see the Loading Kernel Extensions at Boot Time section of Kernel Extension Programming Topics.
Two-Machine Debugging
For debugging a device driver or indeed any code that resides in the kernel, two computers are a necessity. Buggy kernel code has a nasty tendency to crash or hang a system, and so directly debugging that system is often impossible.
Two-machine debugging with gdb
is the main pathway to finding bugs in driver code. This section takes you through the procedure for setting up two computers for debugging, offers a few tips on using gdb
in kernel code, introduces you to the kernel debugging macros, and discusses techniques for finding bugs causing kernel panics and hangs.
Setting Up for Two-Machine Debugging
This section summarizes the steps required to set up two computers for kernel debugging. It draws heavily on the following documents, which you should refer to for more detailed information:
The tutorial Hello Debugger: Debugging a Device Driver with GDB. (This tutorial is part of Kernel Extension Programming Topics.)
The “When Things Go Wrong” section of Building and Debugging Kernels in Kernel Programming Guide. (This document is available on-line in Darwin Documentation.)
In two-machine debugging, one system is called the target computer and the other the host (or development) computer. The host computer is the computer that actually runs gdb
. It is typically the computer on which your driver is developed—hence, it’s also referred to as the development computer. The target computer is the system on which the driver to be debugged is run. Your host and target computers should be running the same version of the Darwin kernel, or as close as possible to same version. (Of course, if you’re debugging a panic-prone version of the kernel, you’ll want the host computer to run the most recent stable version of Darwin.) For optimal source-level debugging, the host computer should have the source code of the driver, any kernel extensions related to your driver (such as its client or provider), and perhaps even the kernel itself (/xnu
).
In order for two-machine debugging to be feasible, the following must be true:
For versions of OS X before OS X version 10.2, both computers must be on the same subnet.
You must have login access to both computers as an administrator (group
admin
), because you’ll need root privileges to load your KEXT (you can use thesudo
command).You must be able to copy files between the computers using FTP,
scp
(SSH),rsync
, AFP, or similar protocol or tool.
When all this is in place, complete the following steps:
Target Set the NVRAM
debug
variable to0x144
, which lets you drop into the debugger upon a non-maskable interrupt (NMI) and, if you’re running OS X v. 10.2 or later, lets you debug two computers not on the same subnet. You can usesetenv
to set the flag within Open Firmware itself (on PowerPC-based Macintosh computers), or you can use thenvram
utility. For the latter, enter the following as root at the command line:nvram boot-args="debug=0x144"
It’s a good idea to enter
nvram boot-args
(no argument) first to get any current NVRAM variables in effect; then include these variables along with the debug flag when you give thenvram boot-args
command a second time. Reboot the system.Host or Target Copy the driver (or any other kernel extension) to a working directory on the target computer.
Host Set up a permanent network connection to the target computer via ARP. The following example assumes that your test computer is
target.goober.com
:$ ping -c 1 target.goober.com
ping results: ....
$ arp -an
target.goober.com (10.0.0.69): 00:a0:13:12:65:31
$ arp -s target.goober.com 00:a0:13:12:65:31
$ arp -an
target.goober.com (10.0.0.69) at00:a0:13:12:65:31 permanent
This sequence of commands establishes a connection to the target computer (via
ping
), displays the information on recent connections ARP knows about (arp -an
), makes the connection to the target computer permanent by specifying the Ethernet hardware address (arp -s
), and issues thearp -an
command a second time to verify this.Target Create symbol files for the driver and any other kernel extensions it depends on. First create a directory to hold the symbols; then run the
kextload
command-line tool, specifying the directory as the argument of the-s
option:$ kextload -l -s /tmp/symbols /tmp/MyDriver.kext
This command loads
MyDriver.kext
but, because of the-l
option, doesn’t start the matching process yet (that happens in a later step). If you don’t want the driver to load just yet, specify the-n
option along with the-s
option. See Using kextload, kextunload, and kextstat for thekextload
procedure for debugging a driver’s start-up code.Target or Host Copy the symbol files to the host computer.
Host Optionally, if you want to debug your driver with access to all the symbols in the kernel, obtain or build a symboled kernel. For further information, contact Apple Developer Technical support. You can find the instructions for building the Darwin kernel from the Open Source code in the Building and Debugging Kernels in Kernel Programming Guide.
$ gdb /mach_kernel
If you have a symboled kernel, specify the path to it rather than
/mach_kernel
. It is important that you rungdb
on a kernel of the same version and build as the one that runs on the target computer. If the versions are different, you should obtain a symboled copy of the target’s kernel and use that.Host In
gdb
, add the symbol file of your driver.(gdb) add-symbol-file /tmp/symbols/com.acme.driver.MyDriver.sym
Add the symbol files of the other kernel extensions in your driver’s dependency chain.
Host Tell
gdb
that you will be debugging remotely.(gdb) target remote-kdp
Target Break into kernel debugging mode. Depending on the model of your target system, either issue the appropriate keyboard command or press the programmer’s button. On USB keyboards, hold down the Command key and the Power button; on ADB keyboards, hold down the Control key and the Power button. If you’re running OS X version 10.4 or later, hold down the following five keys: Command, Option, Control, Shift, and Escape.
You may have to hold down the keys or buttons for several seconds until you see the “Waiting for remote debugger connection” message.
Host Attach to the target computer and set breakpoints.
(gdb) attach target.goober.com
(gdb) break 'MyDriverClass::WriteData(* char)'
(gdb) continue
Be sure you give the
continue
command; otherwise the target computer is unresponsive.Target Start the driver running.
$ kextload -m -t /tmp/MyDriver.kext
The
-m
option starts the matching process for the driver. The-t
option, which tellskextload
to conduct extensive validation checks, is really optional here; ideally, your driver should have passed these checks during an earlier stage of debugging (see Using kextload, kextunload, and kextstat). After starting the driver, perform the actions necessary to trigger the breakpoint.Host When the breakpoint you set is triggered, you can begin debugging your driver using
gdb
commands. If you “source” the kernel debugging macros (see the following section, Using the Kernel Debugging Macros), you can use those as well.
Using the Kernel Debugging Macros
Apple includes a set of kernel debugging macros as part of Darwin. They have been written by engineers with an intimate knowledge of how the Darwin kernel works. Although it is possible to debug driver code without these macros, they will make the task much easier.
The kernel debugging macros probe the internal structures of a running OS X system in considerable depth. With them you can get summary and detailed snapshots of tasks and their threads in the kernel, including such information as thread priority, executable names, and invoked functions. The kernel debugging macros also yield information on the kernel stacks for all or selected thread activations, on IPC spaces and port rights, on virtual-memory maps and map entries, and on allocation zones. See Table 7-2 for a summary of the kernel debugging macros.
You can obtain the kernel debugging macros from the Darwin Open Source repository. They are in the .gdbinit
file in the /xnu/osfmk
branch of the source tree. Because .gdbinit
is the standard name of the initialization file for gdb
, you might already have your own .gdbinit
file to set up your debugging sessions. If this is the case, you can combine the contents of the files or have a “source” statement in one .gdbinit
file that references the other file. To include the macros in a .gdbinit
file for a debugging session, specify the following gdb
command shortly after running gdb
on mach_kernel
:
(gdb) source /tmp/.gdbinit |
(In this example, /tmp
represents any directory that holds the copy of the .gdbinit
file you obtained from the Open Source repository.) Because the kernel debugging macros can change between versions of the kernel, make sure that you use the macros that match as closely as possible the version of the kernel you’re debugging.
Macro | Description |
---|---|
| Displays a summary listing of tasks |
| Displays a summary listing of all activations |
| Displays the kernel stacks for all activations |
| Displays a summary listing of all the VM maps |
| Displays a summary listing of all the VM map entries |
| Displays a summary listing of all the IPC spaces |
| Displays a summary listing of all the IPC rights |
| Displays a summary listing of all the kernel extension binaries |
| Displays status of the specified task |
| Displays the status of all activations in the task |
| Displays all kernel stacks for all activations in the task |
| Displays status of the specified task's VM map |
| Displays a summary list of the task's VM map entries |
| Displays status of the specified task's IPC space |
| Displays a summary list of the task's IPC space entries |
| Displays status of the specified thread activation |
| Displays the kernel stack for the specified activation |
| Displays the status of the specified VM map |
| Displays a summary list of the specified VM map's entries |
| Displays the status of the specified IPC space |
| Displays a summary list of all the rights in an IPC space |
| Displays the status of the process identified by PID |
| Displays the status of the process identified by a proc pointer |
| Displays information about a kernel extension binary |
| Given an address, displays the kernel extension binary and offset |
| Displays zone information |
| Displays the panic log information |
| Switch thread context |
| Switch context |
| Reset context |
A subset of the kernel debugging macros are particularly useful for driver writers: showallstacks
, switchtoact
, showkmodaddr
, showallkmods
, and switchtoctx
. The output of showallstacks
lists all tasks in the system and, for each task, the threads and the stacks associated with each thread. Listing 7-5 shows the information on a couple tasks as emitted by showallstacks
.
Listing 7-5 Example thread stacks shown by showallstacks
(gdb) showallstacks |
... |
task vm_map ipc_space #acts pid proc command |
0x00c1e620 0x00a79a2c 0x00c10ce0 2 51 0x00d60760 kextd |
activation thread pri state wait_queue wait_event |
0x00c2a1f8 0x00ccab0c 31 W 0x00c9fee8 0x30a10c <ipc_mqueue_rcv> |
continuation=0x1ef44 <ipc_mqueue_receive_continue> |
activation thread pri state wait_queue wait_event |
0x00c29a48 0x00cca194 31 W 0x00310570 0x30a3a0 <kmod_cmd_queue> |
kernel_stack=0x04d48000 |
stacktop=0x04d4bbe0 |
0x04d4bbe0 0xccab0c |
0x04d4bc40 0x342d8 <thread_invoke+1104> |
0x04d4bca0 0x344b4 <thread_block_reason+212> |
0x04d4bd00 0x334e0 <thread_sleep_fast_usimple_lock+56> |
0x04d4bd50 0x81ee0 <kmod_control+248> |
0x04d4bdb0 0x45f1c <_Xkmod_control+192> |
0x04d4be00 0x2aa70 <ipc_kobject_server+276> |
0x04d4be50 0x253e4 <mach_msg_overwrite_trap+2848> |
0x04d4bf20 0x257e0 <mach_msg_trap+28> |
0x04d4bf70 0x92078 <.L_syscall_return> |
0x04d4bfc0 0x10000000 |
stackbottom=0x04d4bfc0 |
task vm_map ipc_space #acts pid proc command |
0x00c1e4c0 0x00a79930 0x00c10c88 1 65 0x00d608c8 update |
activation thread pri state wait_queue wait_event |
0x00ddaa50 0x00ddbe34 31 W 0x00310780 0xd608c8 <rld_env+10471956> |
continuation=0x1da528 <_sleep_continue> |
The typical number of stacks revealed by showallstacks
runs into the dozens. Most of the threads associated with these stacks are asleep, blocked on continuation (as is that for the second task shown in the above example). Stacks such as these you can usually ignore. The remaining stacks are significant because they reflect the activity going on in the system at a particular moment and context (as happens when an NMI or kernel panic occurs).
Thread activations and stacks in the kernel—including those of drivers—belong to the task named kernel_task
(under the command
column). When you’re debugging a driver, you look in the active stacks in kernel_task
for any indication of your driver or its provider, client, or any other object it communicates with. If you add the symbol files for these driver objects before you begin the debugging session, the indication will be much clearer. Listing 7-6 shows an active driver-related thread in kernel_task
in the context of adjacent threads.
Listing 7-6 Kernel thread stacks as shown by showallstacks
activation thread pri state wait_queue wait_event |
0x0101ac38 0x010957e4 80 UW 0x00311510 0x10b371c <rld_env+13953096> |
continuation=0x2227d0 <_ZN10IOWorkLoop22threadMainContinuationEv> |
activation thread pri state wait_queue wait_event |
0x0101aaf0 0x01095650 80 R |
stack_privilege=0x07950000 |
kernel_stack=0x07950000 |
stacktop=0x07953b90 |
0x07953b90 0xdf239e4 <com.apple.driver.AppleUSBProKeyboard + 0x19e4> |
0x07953be0 0xe546694 <com.apple.iokit.IOUSBFamily + 0x2694> |
0x07953c40 0xe5a84b4 <com.apple.driver.AppleUSBOHCI + 0x34b4> |
0x07953d00 0xe5a8640 <com.apple.driver.AppleUSBOHCI + 0x3640> |
0x07953d60 0xe5a93bc <com.apple.driver.AppleUSBOHCI + 0x43bc> |
0x07953df0 0x2239a8 <_ZN22IOInterruptEventSource12checkForWorkEv+18> |
0x07953e40 0x222864 <_ZN10IOWorkLoop10threadMainEv+104> |
0x07953e90 0x2227d0 <_ZN10IOWorkLoop22threadMainContinuationEv> |
stackbottom=0x07953e90 |
activation thread pri state wait_queue wait_event |
0x0101b530 0x0101c328 80 UW 0x00311500 0x10b605c <rld_env+13963656> |
continuation=0x2227d0 <_ZN10IOWorkLoop22threadMainContinuationEv> |
You can use showallstacks
in debugging panics, hangs, and wedges. For instance, it might reveal a pair of threads that are deadlocked against each other or it might help to identify a thread that is not handling interrupts properly, thus causing a system hang.
Another common technique using the kernel debugging macros is to run the showallstacks
macro and find the stack or stacks that are most of interest. Then run the switchtoact
macro, giving it the address of a thread activation, to switch to the context of that thread and its stack. From there you can get a backtrace, inspect frames and variables, and so on. Listing 7-7 shows this technique.
Listing 7-7 Switching to thread activation and examining it
(gdb) switchtoact 0x00c29a48 |
(gdb) bt |
#0 0x00090448 in cswnovect () |
#1 0x0008f84c in switch_context (old=0xcca194, continuation=0, new=0xccab0c) at |
/SourceCache/xnu/xnu-327/osfmk/ppc/pcb.c:235 |
#2 0x000344b4 in thread_block_reason (continuation=0, reason=0) at |
/SourceCache/xnu/xnu-327/osfmk/kern/sched_prim.c:1629 |
#3 0x000334e0 in thread_sleep_fast_usimple_lock (event=0xeec500, lock=0x30a3ac, |
interruptible=213844) at /SourceCache/xnu/xnu-327/osfmk/kern/sched_prim.c:626 |
#4 0x00081ee0 in kmod_control (host_priv=0xeec500, id=4144, flavor=213844, |
data=0xc1202c, dataCount=0xc12048) at /SourceCache/xnu/xnu-327/osfmk/kern/kmod.c:602 |
#5 0x00045f1c in _Xkmod_control (InHeadP=0xc12010, OutHeadP=0xc12110) at |
mach/host_priv_server.c:958 |
#6 0x0002aa70 in ipc_kobject_server (request=0xc12000) at |
/SourceCache/xnu/xnu-327/osfmk/kern/ipc_kobject.c:309 |
#7 0x000253e4 in mach_msg_overwrite_trap (msg=0xf0080dd0, option=3, send_size=60, |
rcv_size=60, rcv_name=3843, timeout=12685100, notify=172953600, rcv_msg=0x0, |
scatter_list_size=0) at /SourceCache/xnu/xnu-327/osfmk/ipc/mach_msg.c:1601 |
#8 0x000257e0 in mach_msg_trap (msg=0xeec500, option=13410708, send_size=213844, |
rcv_size=4144, rcv_name=172953600, timeout=178377984, notify=256) at |
/SourceCache/xnu/xnu-327/osfmk/ipc/mach_msg.c:1853 |
#9 0x00092078 in .L_syscall_return () |
#10 0x10000000 in ?? () |
Cannot access memory at address 0xf0080d10 |
(gdb) f 4 |
#4 0x00081ee0 in kmod_control (host_priv=0xeec500, id=4144, flavor=213844, |
data=0xc1202c, dataCount=0xc12048) at /SourceCache/xnu/xnu-327/osfmk/kern/kmod.c:602 |
602 res = thread_sleep_simple_lock((event_t)&kmod_cmd_queue, |
(gdb) l |
597 simple_lock(&kmod_queue_lock); |
598 |
599 if (queue_empty(&kmod_cmd_queue)) { |
600 wait_result_t res; |
601 |
602 res = thread_sleep_simple_lock((event_t)&kmod_cmd_queue, |
603 &kmod_queue_lock, |
604 THREAD_ABORTSAFE); |
605 if (queue_empty(&kmod_cmd_queue)) { |
606 // we must have been interrupted! |
Remember that when use the switchtoact
that you’ve actually changed the value of the stack pointer. You are in a different context than before. If you want to return to the former context, use the resetctx
macro.
The showallkmods
and showkmodaddr
macros are also useful in driver debugging. The former macro lists all loaded kernel extensions in a format similar to the kextstat
command-line utility (Listing 7-8 shows a few lines of output). If you give the showkmodaddr
macro the address of an “anonymous” frame in a stack, and if the frame belongs to a driver (or other kernel extension), the macro prints information about the kernel extension.
Listing 7-8 Sample output from the showallkmods
macro
(gdb) showallkmods |
kmod address size id refs version name |
0x0ebc39f4 0x0eb7d000 0x00048000 71 0 3.2 com.apple.filesystems.afpfs |
0x0ea09480 0x0ea03000 0x00007000 70 0 2.1 com.apple.nke.asp_atp |
0x0e9e0c60 0x0e9d9000 0x00008000 69 0 3.0 com.apple.nke.asp_tcp |
0x0e22b13c 0x0e226000 0x00006000 68 0 1.2 com.apple.nke.IPFirewall |
0x0e225600 0x0e220000 0x00006000 67 0 1.2 com.apple.nke.SharedIP |
0x0df5d868 0x0df37000 0x00028000 62 0 1.2 com.apple.ATIRage128 |
0x0de96454 0x0de79000 0x0001e000 55 3 1.3 com.apple.iokit.IOAudioFamily |
... |
Tips on Using gdb
If you hope to become proficient at I/O Kit driver debugging, you’ll have to become proficient in the use of gdb
. There’s no getting around this requirement. But even if you are already familiar with gdb
, you can always benefit from insights garnered by other driver writers from their experience.
Examining Computer Instructions
If you don’t have symbols for a driver binary—and even if you do—you should try examining the computer instructions in memory to get a detailed view of what is going on in that binary. You use the gdb
command x
to examine memory in the current context; usually, x
is followed by a slash (“/”) and one to three parameters, one of which is i
. The examine-memory parameters are:
A repeat count
The display format:
s
(string),x
(hexadecimal), ori
(computer instruction)The unit size:
b
(byte),h
(halfword),w
(word—four bytes),g
(giant word—eight bytes)
For example, if you want to examine 10 instructions before and 10 instructions after the current context (as described in Tips on Debugging Panics), you could issue a command such as:
(gdb) x/20i $pc -40 |
This command says “show me 20 instructions, but starting 40 bytes” (4 bytes per instruction) “before the current address in the program counter” (the $pc
variable). Of course, you could be less elaborate and give a simple command such as:
(gdb) x/10i 0x001c220c |
which shows you 10 computer instructions starting at a specified address. Listing 7-9 shows you a typical block of instructions.
Listing 7-9 Typical output of the gdb
“examine memory” command
(gdb) x/20i $pc-40 |
0x8257c <kmod_control+124>: addi r3,r27,-19540 |
0x82580 <kmod_control+128>: bl 0x8d980 <get_cpu_data> |
0x82584 <kmod_control+132>: addi r0,r30,-19552 |
0x82588 <kmod_control+136>: lwz r31,-19552(r30) |
0x8258c <kmod_control+140>: cmpw r31,r0 |
0x82590 <kmod_control+144>: bne+ 0x825c0 <kmod_control+192> |
0x82594 <kmod_control+148>: mr r3,r31 |
0x82598 <kmod_control+152>: addi r4,r27,-19540 |
0x8259c <kmod_control+156>: li r5,2 |
0x825a0 <kmod_control+160>: bl 0x338a8 |
<thread_sleep_fast_usimple_lock> |
0x825a4 <kmod_control+164>: lwz r0,-19552(r30) |
0x825a8 <kmod_control+168>: cmpw r0,r31 |
0x825ac <kmod_control+172>: bne+ 0x825c0 <kmod_control+192> |
0x825b0 <kmod_control+176>: addi r3,r27,-19540 |
0x825b4 <kmod_control+180>: bl 0x8da00 <fast_usimple_lock+32> |
0x825b8 <kmod_control+184>: li r3,14 |
0x825bc <kmod_control+188>: b 0x82678 <kmod_control+376> |
0x825c0 <kmod_control+192>: lis r26,49 |
0x825c4 <kmod_control+196>: li r30,0 |
0x825c8 <kmod_control+200>: lwz r0,-19552(r26) |
Needless to say, you need to know some assembler in order to make sense of the output of the examine-memory command. You don’t need to be an expert in assembler, just knowledgeable enough to recognize patterns. For example, it would be beneficial to know how pointer indirection with an object looks in computer instructions. With an object, there are two indirections really, one to get the data (and that could be null) and one to an object’s virtual table (the first field inside the object). If that field doesn’t point to either your code or kernel code, then there’s something that might be causing a null-pointer exception. If your assembler knowledge is rusty or non-existent, you can examine the computer instructions for your driver’s code that you know to be sound. By knowing how “healthy” code looks in assembler, you’ll be better prepared to spot divergences from the pattern.
Breakpoints
Using breakpoints to debug code inside the kernel can be a frustrating experience. Often kernel functions are called so frequently that, if you put a breakpoint on a function, it’s difficult to determine which particular case is the one with the problem. There are a few things you can do with breakpoints to ameliorate this.
Conditional breakpoints. A conditional breakpoint tells
gdb
to trigger a breakpoint only if a certain expression is true. The syntax iscond
<breakpoint index> <expression>. An example is the following:(gdb) cond 1 (num > 0)
However, conditional breakpoints are very slow in two-machine debugging. Unless you’re expecting the breakpoint expression to be evaluated only a couple dozen times or so, they are probably too tedious to rely on.
Cooperative breakpoints. To speed things up you can use two breakpoints that cooperate with each other. One breakpoint is a conditional breakpoint set at the critical but frequently invoked function. The other breakpoint, which has a command list attached to it, is set at a point in the code which is only arrived at after a series of events has occurred. You initially disable the conditional breakpoint and the second breakpoint enables it at some point later where the context is more pertinent to the problem you’re investigating (and so you don’t mind the slowness of expression evaluation). The following series of
gdb
commands sets up both breakpoints:(gdb) cond 1 (num > 0)
(gdb) disable 1
(gdb) com 2
enable 1
continue
end
If this debugging is something you do frequently, you can put breakpoint-setup commands into a macro and put that macro in your
.gdbinit
file.Breakpoints at dummy functions. You can use the previous two techniques on code that you do not own. However, if it’s your code that you’re debugging, the fastest way to trigger a breakpoint exactly when and where you want is to use dummy functions. Consider the follow stripped code snippet:
void dummy() {
;
}
void myFunction() {
// ...
if (num > 0)
dummy();
// ...
}
The expression in
myFunction
is the same exact expression you would have in a conditional breakpoint. Just set a breakpoint on the dummy function. When the breakpoint is triggered, get a backtrace, switch frames, and you’re in the desired context.
Single-Stepping
Single-stepping through source code does not necessarily take you from one line to the next. You can bounce around in the source code quite a bit because the compiler does various things with the symbols to optimize them.
There are two things you can do to get around this. If it’s your code you’re stepping through, you can turn off optimizations. Or you can single-step through the computer instructions in assembler because one line of source code typically generates several consecutive lines of assembler. So, if you find it hard to figure things out by single-stepping through source, try single-stepping through assembler.
To single-step in gdb
, use the stepi
command (si
for short). You can get a better view of your progress if you also use the display
command, as in this example:
(gdb) display/4i $pc |
This displays the program counter and the next three instructions as you step.
Debugging Kernel Panics
You might be familiar with kernel panics: those unexpected events that cripple a system, leaving it completely unresponsive. When a panic occurs on OS X, the kernel prints information about the panic that you can analyze to find the cause of the panic. On pre-Jaguar systems, this information appears on the screen as a black and white text dump. Starting with the Jaguar release, a kernel panic causes the display of a message informing you that a problem occurred and requesting that you restart your computer. After rebooting, you can find the debug information on the panic in the file panic.log
at /Library/Logs/
.
If you’ve never seen it before, the information in panic.log
might seem cryptic. Listing 7-10 shows a typical entry in the panic log.
Listing 7-10 Sample log entry for a kernel panic
Unresolved kernel trap(cpu 0): 0x300 - Data access DAR=0x00000058 PC=0x0b4255b4 |
Latest crash info for cpu 0: |
Exception state (sv=0x0AD86A00) |
PC=0x0B4255B4; MSR=0x00009030; DAR=0x00000058; DSISR=0x40000000; LR=0x0B4255A0; |
R1=0x04DE3B50; XCP=0x0000000C (0x300 - Data access) |
Backtrace: |
0x0B4255A0 0x000BA9F8 0x001D41F8 0x001D411C 0x001D6B90 0x0003ACCC |
0x0008EC84 0x0003D69C 0x0003D4FC 0x000276E0 0x0009108C 0xFFFFFFFF |
Kernel loadable modules in backtrace (with dependencies): |
com.acme.driver.MyDriver(1.6)@0xb409000 |
Proceeding back via exception chain: |
Exception state (sv=0x0AD86A00) |
previously dumped as "Latest" state. skipping... |
Exception state (sv=0x0B2BBA00) |
PC=0x90015BC8; MSR=0x0200F030; DAR=0x012DA94C; DSISR=0x40000000; LR=0x902498DC; |
R1=0xBFFFE140; XCP=0x00000030 (0xC00 - System call) |
Kernel version: |
Darwin Kernel Version 6.0: |
Wed May 1 01:04:14 PDT 2002; root:xnu/xnu-282.obj~4/RELEASE_PPC |
This block of information has several different parts, each with its own significance for debugging the problem.
The first line. The single most important bit of information about a panic is the first line, which briefly describes the nature of the panic. In this case, the panic has something to do with a data access exception. The registers that appear on the same line as the message are the ones with the most pertinent data; in this case they are the DAR (Data Access Register) and the PC (Program Counter) registers. The registers shown on the first line vary according to the type of kernel trap. The hexadecimal code before the description, which is defined in
/xnu/osfmk/ppc_init.c
, indicates the exception type. Table 7-3 describes the possible types of exceptions.
Trap Value | Type of Kernel Trap |
---|---|
| System reset |
| Computer check |
| Data access |
| Instruction access |
| External interrupt |
| Alignment exception |
| Illegal instruction |
The registers. Under the first “Exception state” is a snapshot of the contents of the CPU registers when the panic occurred.
The backtrace. Each hexadecimal address in the backtrace indicates the state of program execution at a particular point leading up to the panic. Because each of these addresses is actually that of the function return pointer, you need to subtract four from this address to see the instruction that was executed.
The kernel extensions. Under “Kernel loadable modules in backtrace” are the bundle identifiers (
CFBundleIdentifier
property) of all kernel extensions referenced in the backtrace and all other kernel extensions on which these extensions have dependencies. These are the kernel extensions for which you’ll probably want to generate symbol files prior to debugging the panic.The other exception states. Under “Proceeding back via exception chain:” are the previous exception states the kernel experienced, separated by snapshots of the contents of the CPU registers at the time the exceptions occurred. Most of the time, the first exception state (immediately following the first line of the panic log) gives you enough information to determine what caused the panic. Sometimes, however, the panic is the result of an earlier exception and you can examine the chain of exceptions for more information.
The kernel version. The version of the Darwin kernel and, more importantly, the build version of the
xnu
project (the core part of the Darwin kernel). If you are debugging with a symboled kernel (as is recommended), you need to get or build the symboled kernel from this version ofxnu
.
General Procedure
There are many possible ways to debug a kernel panic, but the following course of action has proven fruitful in practice.
Get as many binaries with debugging symbols as possible.
Make a note of all the kernel extensions listed under “Kernel loadable modules in backtrace”. If you don’t have debugging symbols for some of them, try to obtained a symboled version of them or get the source and build one with debugging symbols. This would include
mach_kernel
, the I/O Kit families, and other KEXTs that are part of the default install. You need to have the same version of the kernel and KEXT binaries that the panicked computer does, or the symbols won’t line up correctly.Generate and add symbol files for each kernel extension in the backtrace.
Once you’ve got the kernel extension binaries with (or without) debugging symbols, generate relocated symbol files for each KEXT in the backtrace. Use
kextload
with the-s
and-n
options to do this; kextload prompts you for the load address of each kernel extension, which you can get from the backtrace. Alternatively, you can specify the-a
option with-s
when usingkextload
to specify KEXTs and their load addresses. Although you don’t need to relocate symbol files for all kernel extensions, you can only decode stack frames in the kernel or in KEXTs that you have done this for. After you rungdb
onmach_kernel
(preferably symboled), usegdb
’sadd-symbol-file
command for each relocatable symbol files you’ve generated; see Setting Up for Two-Machine Debugging for details.Decode the addresses in the panic log.
Start with the PC register and possibly the LR (Link Register). (The contents of the LR should look like a valid text address, usually a little smaller than the PC-register address.) Then process each address in the backtrace, remembering to subtract four from each of the stack addresses to get the last instruction executed in that frame. One possible way to go about it is to use a pair of
gdb
commands for each address:(gdb) x/i <address>-4
...
(gdb) info line *<address>-4
You need the asterisk in front of the address in the
info
command because you are passing a raw address rather than the symbolgdb
expects. Thex
command, on the other hand, expects a raw address so no asterisk is necessary.Listing 7-11 gives an example of a symbolic backtrace generated from
x/i <address>-4
. You’ll know you’ve succeeded when all the stack frames decode to some sort of branch instruction in assembler.Interpret the results.
Interpreting the results of the previous step is the hardest phase of debugging panics because it isn’t mechanical in nature. See the following section, Tips on Debugging Panics, for some suggestions.
Listing 7-11 Example of symbolic backtrace
(gdb) x/i 0x001c2200-4 |
0x1c21fc <IOService::PMstop(void)+320>: bctrl |
0xa538260 <IODisplay::stop(IOService *)+36>: bctrl |
0x1bbc34 <IOService::actionStop(IOService *, IOService *)+160>: bctrl |
0x1ccda4 <runAction__10IOWorkLoopPFP8OSObjectPvn3_iPB2Pvn3+92>: bctrl |
0x1bc434 <IOService::terminateWorker(unsigned long)+1824>: bctrl |
0x1bb1f0 <IOService::terminatePhase1(unsigned long)+928>: bl |
0x1bb20c <IOService::scheduleTerminatePhase2(unsigned long)> |
0x1edfcc <IOADBController::powerStateWillChangeTo(unsigned long, unsigned long, IOService *)+88>: bctrl |
0x1c54e8 <IOService::inform(IOPMinformee *, bool)+204>: bctrl |
0x1c5118 <IOService::notifyAll(bool)+84>: bl |
0x1c541c <IOService::inform(IOPMinformee *, bool)> |
0x1c58b8 <IOService::parent_down_05(void)+36>: bl |
0x1c50c4 <IOService::notifyAll(bool)> |
0x1c8364 <IOService::allowCancelCommon(void)+356>: bl |
0x1c5894 <IOService::parent_down_05(void)> |
0x1c80a0 <IOService::serializedAllowPowerChange2(unsigned long)+84>: bl |
0x1c8200 <IOService::allowCancelCommon(void)> |
0x1ce198 <IOCommandGate::runAction(int (*)(OSObject *, void *, void *, void *, void *), void *, void *, void *, void *)+184>: bctrl |
0x1c802c <IOService::allowPowerChange(unsigned long)+72>: bctrl |
0xa52e6c ???? |
0x3dfe0 <_call_thread_continue+440>: bctrl |
0x333fc <thread_continue+144>: bctrl |
Tips on Debugging Panics
The following tips might make debugging a panic easier for you.
As noted previously, always pay attention to the first line of the panic message and the list of kernel extensions involved in the panic. The panic message provides the major clue to the problem. With the kernel extensions, generate relocated symbol files for the debugging session.
Don’t assume that the panic is not your driver’s fault just because it doesn’t show up in the backtrace. Passing a null pointer to an I/O Kit family or any other body of kernel code will cause a panic in that code. Because the kernel doesn’t have the resources to protect itself from null pointers, drivers must be extremely vigilant against passing them in.
The
showallstacks
kernel debugging macro is very useful for debugging panics. See Using the Kernel Debugging Macros for more information.Use
gdb
’s$pc
variable when debugging panics withgdb
. The$pc
variable holds the value of the program counter, which identifies the place where the panic exception was taken. If you want to examine the context of the panic, you could issue a command such as:(gdb) x/20i $pc -40
This displays 10 instructions in assembler before and after the point where the panic occurred. If you have the appropriate source code and symbols, you can enter:
(gdb) l *$pc
This shows the particular line of code that took the panic.
If you have a panic caused by an Instruction Access Exception (
0x400
) and the PC register is zero, it means that something in the kernel branched to zero. The top frame in the stack is usually a jump through some function pointer that wasn’t initialized (or that somehow got “stomped”).Panics caused by a Data Access Exception
0x300
are quite common. These types of panics typically involve indirection through zero (in other words, a dereferenced null pointer). Any time a null pointer is dereferenced, a panic results. When you get a Data Access Exception, first check the DAR register; if the value is less than about 1000 the panic is probably the result of indirection through a null pointer. This is because most classes are no larger than about 1000 bytes and when an offset is added to a null pointer, the result is about 1000 or less. If the result is much larger, it’s probable that the location of a pointer has been trashed (as opposed to its contents) and the contents of an unknown location is being used as a pointer in your code.A null pointer implies the possibility of a race condition with a shutdown or completion value. When called, a completion routine starts to free resources and if your driver is referencing these resources after the routine is called, it will probably get null pointers back.
If you get a panic in
kalloc
,kmem
, orIOMalloc
, it suggests that you’re using a freed pointer. Even if your driver code doesn’t show up in the backtrace, your driver could be the culprit. Using a freed pointer is likely to break the kernel’s internal data structures for allocation.Panics can also be caused by accidentally scribbling on someone else’s memory. These “land mines” can be notoriously hard to debug; a backtrace doesn’t show you much, except that something bad has happened. If the panic is reproducible, however, you have a chance to track down the offending code by using a “probe” macro. A probe macro helps to bracket exactly where in the code the scribbling happened. By definition, a scribble is a byte that does not have the expected contents (because they were altered by some other code). By knowing where the panic occurred and where the byte last held its expected value, you know where to look for the scribbling code.
Often it’s the case that your driver is the offending scribbler. To find your offending code (if any), define a probe macro that tests whether a memory address (
A
in the example below) has the expected value (N
). (Note thatA
is the address in the DAR register of the original scribble panic.) IfA
doesn’t have the expected value, then cause a panic right then and there:Uint32 N = 123;
Uint32 *A;
A = &N;
// ...
#define PROBE() do {
if (A && *A != N)
*(Uint32)0 = 0;
} while (0)
By putting the probe macro in every function where
A
andN
appear, you can narrow down the location of the scribble. If your driver is not the one doing the scribbling, it still might be indirectly responsible because it could be causing other code to scribble. For example, your driver might be asking some other code to write in your address space; if it’s passed them the wrong address, it might result in data being scribbled on inside it.
Debugging System Hangs
System hangs are, after kernel panics, the most serious condition caused by badly behaved kernel code. A hung system may not be completely unresponsive, but it is unusable because you cannot effectively click the mouse button or type a key. You can categorize system hangs, and their probable cause, by the behavior of the mouse cursor.
Cursor doesn’t spin and won’t move. This symptom indicates that a primary interrupt is not being delivered. The mouse doesn’t even spin because that behavior is on a primary interrupt; its not spinning indicates that the system is in a very tight loop. What has probably happened is that a driver object has disabled an interrupt, causing code somewhere in the driver stack to go into an infinite loop. In other words, a piece of hardware has raised an interrupt but the driver that should handle it is not handling it and so the hardware keeps raising it. The driver has probably caused this “ghost” interrupt, but is unaware it has and so is not clearing it.
Cursor spins but won’t move. This symptom indicates that a high-priority thread such as a timer is spinning.
Cursor spins and moves, but nothing else. This symptom suggests that the USB thread is still scheduled, thus indicating a deadlock in some driver object that is not related to USB or HI.
For system hangs with the first symptom—the cursor doesn’t spin and won’t move—your first aim should be to find out what caused the interrupt. Why is the hardware controlled by your driver raising the interrupt? If your driver is using a filter interrupt event source (IOFilterInterruptEventSource), you might want to investigate that, too. With a filter event source a driver can ignore interrupts that it thinks aren’t its responsibility.
With any system hang, you should launch gdb
on the kernel, attach to the hung system and run the showallstacks
macro. Scan the output for threads that are deadlocked against each other. Or, if it’s an unhandled primary interrupt that you suspect, find the running thread; if it is the one that took the interrupt, it is probably the thread that’s gone into an infinite loop. If the driver is in an infinite loop, you can set a breakpoint in a frame of the thread’s stack that is the possible culprit; when you continue and hit the breakpoint almost immediately, you know you’re in an infinite loop. You can single-step from there to find the problem.
Debugging Boot Drivers
The OS X BootX booter copies drivers for hardware required in the boot process into memory for the kernel’s boot-time loading code to load. Because boot drivers are already loaded by the time the system comes up, you do not have as much control over them as you do over non-boot drivers. In addition, a badly behaving boot driver can cause your system to become unusable until you are able to unload it. For these reasons, debugging techniques for boot drivers vary somewhat from those for other drivers.
The most important step you can take is to treat your boot driver as a non-boot driver while you are in the development phase. Remove the OSBundleRequired
property from your driver’s Info.plist
file and use the techniques described in this chapter to make sure the driver is performing all its functions correctly before you declare it to be a boot driver.
After you’ve thoroughly tested your driver, add the OSBundleRequired
property to its Info.plist
(see the document Loading Kernel Extensions at Boot Time to determine which value your driver should declare). This will cause the BootX booter to load your driver into memory during the boot process.
If your boot driver does have bugs you were unable to find before, you cannot use gdb
to debug it because it is not possible to attach to a computer while it is booting. Instead, you must rely on IOLog
output to find out what is happening. IOLog
is synchronous when you perform a verbose boot so you can use IOLog
statements throughout your boot driver’s code to track down the bugs. See Using IOLog for more information on this function.
To perform a verbose boot, reboot holding down both the Command
and V
keys. To get even more detail from the I/O Kit, you can set a boot-args
flag before rebooting. Assuming root
privileges with the sudo
command, type the following on the command line
%sudo nvram boot-args=”io=0xffff” |
Password: |
%shutdown -r now |
Although this technique produces voluminous output, it can be difficult to examine because it scrolls off the screen during the boot process. If your boot driver does not prevent the system from completing the boot process, you can view the information in its entirety in the system log at /var/log/system.log
.
Logging
Logging, the capture and display of information at certain points in the code, is useful in some situations as a debugging tool because it provides a linear record of what happens during the execution of driver code. This section describes the I/O Kit’s primary logging function, IOLog
, and discusses ways you can go about creating your own logging facility.
Using IOLog
It is natural to rely on printf
statements to display what’s going on in your application code during the debugging phase. For in-kernel drivers, the I/O Kit equivalent to the printf
statement is the IOLog
function, defined in /System/Library/Frameworks/Kernel.framework/Headers/IOKit/IOLib.h
. Unfortunately, IOLog
is only marginally useful for debugging kernel-resident drivers.
Because it executes inside the kernel, IOLog
is necessarily resource-frugal. The message buffer is not large and if you use IOLog
to log I/O activity, for example, the older messages are likely to be overwritten by newer ones, erasing the information you need for debugging. In general, you cannot successfully use IOLog
in any tight loop in your driver’s code, whether they are I/O loops or unintentional loops caused by errors in the code. The only way to ensure you get all the information you need in such a situation is to allocate your own buffer and create your own logging facility to write to it (for more information on this technique, see Custom Event Logging ).
If your driver is causing a panic, IOLog
will not help much in determining the cause. Because the IOLog
function is asynchronous, when the computer panics, the last IOLog
message is still waiting to be scheduled for output so you will miss the message containing information about the cause of the panic.
You cannot call IOLog
from an interrupt context. When you create an IOFilterInterruptEventSource object, you send it a filter function and an action function. You cannot use IOLog
in your filter routine because it runs at primary (hardware) interrupt time when very little of the system is available. You can, however, call IOLog
in your action routine because it runs on the work loop the event source is attached to. Note that when you create an IOInterruptEventSource object, you send it an action function that runs on the event source’s work loop, but no filter function. As with the action function for an IOFilterInterruptEventSource object, you can call IOLog
in the action function for an IOInterruptEventSource object.
Although the IOLog
function is not well-suited for use in the debugging process, it is useful for logging normal status and error messages. For example, the PhantomAudioDevice class (contained in the PhantomAudioDriver project at /Developer/Examples/Kernel/IOKit/Audio/PhantomAudioDriver
) uses IOLog
to log status in several of its methods. Listing 7-12 shows partial listings of PhantomAudioDevice’s createAudioEngines
and volumeChanged
methods.
Listing 7-12 Using IOLog
in PhantomAudioDevice methods
bool PhantomAudioDevice::createAudioEngines() |
{ |
... |
IOLog("PhantomAudioDevice[%p]::createAudioEngines()\n", this); |
audioEngineArray = OSDynamicCast(OSArray, |
getProperty(AUDIO_ENGINES_KEY)); |
if (audioEngineArray) { |
/* Create new audio engine with AudioEngine array in personality. */ |
... |
} |
else { |
IOLog("PhantomAudioDevice[%p]::createAudioEngines() - Error: no |
AudioEngine array in personality.\n", this); |
goto Done; |
} |
... |
} |
IOReturn PhantomAudioDevice::volumeChanged(IOAudioControl *volumeControl, |
SInt32 oldValue, SInt32 newValue) |
{ |
IOLog("PhantomAudioDevice[%p]::volumeChanged(%p, %ld, %ld)\n", this, |
volumeControl, oldValue, newValue); |
... |
} |
Custom Event Logging
You can pepper your code with IOLog
or printf
calls as a debugging technique, but, as noted in the previous section (Using IOLog), this approach has its limitations. Some drivers are especially sensitive to any impact on performance, and require more reliable and finer-grained logging data. If your driver falls into this category, you can create your own logging facility.
There are various ways to implement a custom debug-logging solution that avoids the drawbacks of IOLog
. You would need to allocate your own buffer for you log entries, implement functions and macros for writing those entries to the buffer, and devise some means for examining the contents of the buffer.
For purposes of illustration, this section looks at the logging feature of the AppleGMACEthernet driver. You can find the source code for this driver (including the logging tools) in the Darwin source code for OS X version 10.2.7 and above, available at http://www.opensource.apple.com/darwinsource/index.html. To find the source code, select a version of OS X equal to or greater than v10.2.7 and click Source (choose the source for the PPC version, if there's a choice). This displays a new page, which lists the open source projects available for the version of OS X you've chosen. Scroll down to AppleGMACEthernet and click it to view the source. Be prepared to supply your ADC member name and password.
The AppleGMACEthernet driver implements a kernel-resident logging infrastructure as well as a user-space tool to print the formatted contents of a log buffer. The tool used by this driver prints a report to the standard output that begins with header information followed by time-stamped log entries. You can redirect the output to a file and then add comments to the entries in the file. Listing 7-13 shows the first several seconds of the driver’s life (along with comments).
Listing 7-13 Sample output of a custom event log
0 1376a020 13779fe0 1376a020 [ffffffe0] |
27 38e9000 0 1 |
20: 39 291159 0 0 eNet |
30: 39 291266 19aec90 181c200 Strt |
40: 39 293857 19aec90 18a4000 KeyL |
50: 39 293899 100 100 =txq |
60: 39 293903 100 100 =txe |
70: 39 293907 40 40 =rxe |
80: 39 293908 100 1000040 parm |
90: 39 307103 1000 135f0000 =TxR |
a0: 39 307111 400 198b000 =RxR |
b0: 39 307119 17f4400 19c8f00 arys |
c0: 39 307120 19aec90 0 AttI |
d0: 39 307232 19aec90 0 RwPM |
e0: 39 307247 19aec90 2 mx4d |
f0: 39 307249 19aec90 2 ip4d |
100: 39 307316 19aec90 1 Pwr! |
110: 39 307436 19aec90 198e400 cfig |
120: 39 307465 19aec90 0 AttD |
130: 39 307598 19aec90 0 powr |
140: 39 307599 19aec90 0 RegS |
150: 39 310366 19aec90 0 Exit end of start method |
160: 46 944895 191d400 198e400 NetE |
170: 46 944899 191d400 0 Wake wake up ethernet cell |
180: 46 944938 0 0 +Clk |
190: 46 945093 199a5c0 1ee21000 Adrs |
1a0: 46 945096 19c8f00 198b000 IRxR |
1b0: 46 946111 191d400 135f0000 ITxR |
1c0: 46 946127 3 1010 wReg |
1d0: 46 946135 0 ff ChpI chip initialization |
1e0: 46 946136 4 9050 |
The report header consists of two lines. The first line identifies any flag in effect and gives the start, ending, and current addresses of the log buffer. The second line indicates the period (in seconds) spanned by the log entries, shows the physical address of the buffer, and tallies the numbers of alerts and buffer wrap-arounds since the driver started up.
Each log entry is identified by an offset (in hexadecimal) from the start of the buffer and a two-part time-stamp. The first part is the time in seconds since the driver started executing and the second is the time in microseconds (actually nanoseconds right-shifted by 10 to closely approximate microseconds). The final column contains four-character codes with special significance to the AppleGMACEthernet driver. The intermediate two columns show parameters whose value depends on the context indicated by the code.
The AppleGMACEthernet driver (in the UniEnet class) implements logging in a way that minimizes its effect on performance. The major steps it undertakes are the following:
It allocates a 64 kilobyte wrap-around buffer to hold the logging data and defines global pointers to that buffer.
It implements two low-level functions to log events to the buffer, one for general events and the other for alerts; alerts are supersets of events which first perform the log, then the
IOLog
, and then whatever you customize it to do.For convenience, it also defines macros built on these functions.
It inserts macro calls at critical points in the code.
It implements a user-space tool to format and print the contents of the log buffer.
It implements a user client to send the buffer data to the tool.
Listing 7-14 shows the definitions of the event-log and alert functions and macros.
Listing 7-14 Definition of logging macros and functions
#if USE_ELG /* (( */ |
#define ELG(A,B,ASCI,STRING) EvLog( (UInt32)(A), (UInt32)(B), |
(UInt32)(ASCI), STRING ) |
#define ALERT(A,B,ASCI,STRING) Alert( (UInt32)(A), (UInt32)(B), |
(UInt32)(ASCI), STRING ) |
void EvLog( UInt32 a, UInt32 b, UInt32 ascii, char* str ); |
void Alert( UInt32 a, UInt32 b, UInt32 ascii, char* str ); |
#else /* ) not USE_ELG: ( */ |
#define ELG(A,B,ASCI,S) |
#define ALERT(A,B,ASCI,STRING) IOLog( "UniNEnet: %8x %8x " STRING "\n", |
(unsigned int)(A), (unsigned int)(B) ) |
#endif /* USE_ELG )) */ |
If you’re curious, see the UniEnet class source for the implementations of EvLog
and Alert
.
The AppleGMACEthernet driver calls the ELG
and ALERT
macros at appropriate points in the source code. In the example in Listing 7-15, the ELG
macro is called just after the configureInterface
member function is invoked; the event code is ‘cfig’
and the parameters are the addresses of the current thread and the IONetworkInterface object passed in. (The final string parameter is printed in alerts and via IOLog
if the custom debugging feature is disabled.)
Listing 7-15 Calling the logging macro
bool UniNEnet::configureInterface( IONetworkInterface *netif ) { |
IONetworkData *nd; |
ELG( IOThreadSelf(), netif, 'cfig', "configureInterface" ); |
// ... |
The report shown in Listing 7-13 includes a log entry from this macro call, illustrating how it might appear (it’s the sixteenth one).
Although it might be convenient to have a tool that nicely prints the contents of the log buffer, there is considerable time and effort involved in developing this tool and the user-client object that must feed it the logging data. Instead of a user-space tool, you can examine the contents of the log buffer in gdb
. The contents of the buffer, however, will require more effort on your part to interpret.
Copyright © 2002, 2009 Apple Inc. All Rights Reserved. Terms of Use | Privacy Policy | Updated: 2009-08-14