Solaris Core Analysis, Part 1: mdb

来源：互联网发布：洛奇英雄传优化补丁编辑：程序博客网时间：2024/06/10 07:06

Solaris is one of the most stable operating systems available... butlets face it, stuff happens. Solaris does panic, but I want everyone tobe clear, a "panic", despite the seemingly contradictory name, is byits nature a controlled event. When the kernel encounters behavior thatis uncorrectable and will cause irreparable harm to the running systemor, even worse, corrupt data, the system will voluntarily tap out usingthe panic system call to get the system down quickly, hopefully leavinga core dump in its wake for post-mortem analysis.

In this blog entry we'll discuss core dumps and panic's in general.In part 2 we'll discuss a tool to make life just a little easier, theSolaris Crash Analysis Tool, or "Solaris CAT".

I want to point out that post-mortem core analysis is reallythe task of a kernel engineer. The fact is, way less than 1% of us whoever engage in core analysis are actually going to have any real ideaof what the hell we're doing. And thats ok! I guarantee that you'llpost something from an analysis to a mailing list and you'll get someasshole who forgets that he's been paid to work on the Solaris kernelfor the last 20 years while you work a job which is now on hold becauseof said core dump, with replies like "We can clearly see that due tothe memory address in this register that you are a moron...." The pointhere is, if you don't know what your doing, don't be discouraged. Whatwe, mere mortals, are trying to do is not necessarily solve the problembut provide clues which will help us guide our search, either byposting a stack trace to a mailing list, or send the dump to SunSupport, or to take a panic string and search the bug database or google for. The cuddletech rule of crashes is:

An unexpected crash is unacceptable; An unexplained crash is inexcusable.

If you're reading this you've probably lived through a panic before,but lets recap. The best explanation of a "crash" event and resultingdump can be found in the dumpadm(1M) man page:

     A crash dump is a disk copy of the physical memory
     of  the computer at the time of a fatal system error. When a
     fatal operating system error occurs,  a  message  describing
     the  error  is  printed to the console. The operating system
     then generates a crash dump by writing the contents of  phy-
     sical  memory to a predetermined dump device, which is typi-
     cally a local disk partition. The dump device can be config-
     ured  by way of dumpadm.  Once the crash dump has been writ-
     ten to the dump device, the system will reboot.

     Fatal operating system errors can be caused by bugs  in  the
     operating system, its associated device drivers and loadable
     modules, or by faulty  hardware.  Whatever  the  cause,  the
     crash  dump  itself  provides invaluable information to your
     support engineer to aid in diagnosing the problem. As  such,
     it  is  vital  that the crash dump be retrieved and given to
     your support provider. Following an operating system  crash,
     the  savecore(1M)  utility  is executed automatically during
     boot to retrieve the crash dump from the  dump  device,  and
     write it to a pair of files in your file system named unix.X
     and vmcore.X, where X is an integer  identifying  the  dump.
     Together,  these  data  files form the saved crash dump. The
     directory in which the crash dump is  saved  on  reboot  can
     also be configured using dumpadm.

I encourage you to read both the savecore(1M) and dumpadm(1M) man pages. You'll find that with savecore -Lyou can create a dump of a live system, so if you don't have a crashedsystem around to play with, use that. Alternatively, you can use reboot -d to dump a core and reboot.

At this point we'll assume you have a dump available. By default you'll find them in /var/crash/hostname/,you'll have dumps in pairs: vmcore.0 and unix.0. We feed these twofiles to mdb, the (-k, kernel) Modular DeBugger, to preform ouranalysis like so:

# mdb -k unix.0 vmcore.0 
Loading modules: [ unix krtld genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp ufs ip sctp usba lofs zfs random ipc md fcip fctl fcp crypto logindmux ptm nfs ]
>

You are now free to move about the dump. mdb commands are strangeand unusual at first, it takes a lot of time to get comfortable withit, but there are a couple of debugger commands that can give us theessence of what we need. Lets walk through them.

The ::status command will display high level informationregarding this debugging session. Of usefulness here is the dumps"panic message" and OS release.

> ::status
debugging crash dump vmcore.0 (64-bit) from hostname
operating system: 5.11 snv_43 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module "unix" due to a NULL pointer dereference
dump content: kernel pages only

The ::stack command will prove you with a stack trace, this is the same thing trace you would have seen in syslog or the console.

> ::stack
atomic_add_32()
nfs_async_inactive+0x55(fffffe820d128b80, 0, ffffffffeff0ebcb)
nfs3_inactive+0x38b(fffffe820d128b80, 0)
fop_inactive+0x93(fffffe820d128b80, 0)
vn_rele+0x66(fffffe820d128b80)
snf_smap_desbfree+0x78(fffffe8185e2ff60)
dblk_lastfree_desb+0x25(fffffe817a30f8c0, ffffffffac1d7cc0)
dblk_decref+0x6b(fffffe817a30f8c0, ffffffffac1d7cc0)
freeb+0x89(fffffe817a30f8c0)
tcp_rput_data+0x215f(ffffffffb4af7140, fffffe812085d780, ffffffff993c3c00)
squeue_enter_chain+0x129(ffffffff993c3c00, fffffe812085d780, fffffe812085d780, 1, 1)
ip_input+0x810(ffffffffa23eec68, ffffffffaeab8040, fffffe812085d780, e)
i_dls_link_ether_rx_promisc+0x266(ffffffff9a4c35f8, ffffffffaeab8040, fffffe812085d780)
mac_rx+0x7a(ffffffffa2345c40, ffffffffaeab8040, fffffe812085d780)
e1000g_intr+0xf6(ffffffff9a4b2000)
av_dispatch_autovect+0x83(1a)
intr_thread+0x50()

The ::msgbuf command will output the message buffer at thetime of crash; the message buffer is most commonly used by sysadminsthrough the "dmesg" command.

> ::msgbuf
MESSAGE                                                               
....
WARNING: IP: Hardware address '00:14:4f:xxxxxxx' trying to be our address xxxx
WARNING: IP: Hardware address '00:14:4f:xxxx' trying to be our address xxxx

panic[cpu0]/thread=fffffe80000adc80: 
BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module "unix" due to a NULL pointer dereference

sched: 
#pf Page fault
Bad kernel fault at addr=0x0
.... blah blah, snipped for brevity.

The ::panicinfo command will give you lots of fun crypticcounter information, of most interest is the first 3 lines, whichcontain the CPU on which the panic occured, the running thread, and thepanic message. You'll notice these are commonly repeated and the mostuseful pieces of information.

> ::panicinfo
             cpu                0
          thread fffffe80000adc80
         message BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module "unix" due to a NULL pointer dereference
             rdi                0
             rsi                1
             rdx fffffe80000adc80
             rcx                0
              r8                0
              r9 fffffe80dba125c0
             rax                0
             rbx fffffe8153a36040
             rbp fffffe80000ad4e0
             r10              3e0
             r10              3e0
             r11 ffffffffaeab8040
             r12 ffffffffb7b4cac0
             r13                0
             r14 fffffe820d128b80
             r15 ffffffffeff0ebcb
          fsbase ffffffff80000000
          gsbase fffffffffbc27850
              ds               43
              es               43
              fs                0
              gs              1c3
          trapno                e
             err                2
             rip fffffffffb838680
              cs               28
          rflags            10246
             rsp fffffe80000ad4c8
              ss                0
          gdt_hi                0
          gdt_lo         defacedd
          idt_hi                0
          idt_lo         80300fff
             ldt                0
            task               60
             cr0         80050033
             cr2                0
             cr3        10821b000

In my opinion, the koolest command is ::cpuinfo -v. Truthbe told, if you run multiple applications on a server the most commonquestion people (especially managers) want answered is "whichapplication did it?", being translated into geek-esse "who do I blame?"This command will help you determine that by displaying, complete withbeautiful ASCII art, the threads and process names running on each CPU(NRUN). In the following example, we know the event occured on CPU 0,thus thats the one we want to look at. Note that the "sched" processshould be interpreted as "kernel".

>  ::cpuinfo -v
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc2f370  1b    1    0 165   no    no t-1    fffffe80000adc80 sched
                       |    |    |
            RUNNING <--+    |    +--> PIL THREAD
              READY         |           6 fffffe80000adc80
             EXISTS         |           - fffffe80daab6a20 ruby
             ENABLE         |
                            +-->  PRI THREAD           PROC
                                   99 fffffe8000b88c80 sched

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  1 ffffffff983b3800  1f    1    0  59  yes    no t-0    fffffe80daac2f20 smtpd
                       |    |
            RUNNING <--+    +-->  PRI THREAD           PROC
              READY                99 fffffe8000bacc80 sched
           QUIESCED
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  2 ffffffff9967a800  1f    2    0  -1   no    no t-0    fffffe8000443c80
 (idle)
                       |    |
            RUNNING <--+    +-->  PRI THREAD           PROC
              READY                99 fffffe8000b82c80 sched
           QUIESCED                60 fffffe80018f8c80 sched
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  3 ffffffff9967a000  1f    1    0  -1   no    no t-0    fffffe8000535c80
 (idle)
                       |    |
            RUNNING <--+    +-->  PRI THREAD           PROC
              READY                60 fffffe8000335c80 zsched
           QUIESCED
             EXISTS
             ENABLE

The ::ps command allows us to see all running processes. Several flags are supported, including -z to display Zone ID's.

> ::ps -z
S    PID   PPID   PGID    SID  ZONE    UID      FLAGS             ADDR NAME
R      0      0      0      0     0      0 0x00000001 fffffffffbc25900 sched
R      3      0      0      0     0      0 0x00020001 ffffffff9970d928 fsflush
R      2      0      0      0     0      0 0x00020001 ffffffff9970e558 pageout
R      1      0      0      0     0      0 0x42004000 ffffffff9970f188 init
R  20534      1  20533  20533    24   1006 0x42010400 ffffffffb246f9b8 ruby
R  20532      1  20531  20531    24   1006 0x42010400 fffffe8109674308 ruby
R  20529      1  20528  20528    24   1006 0x42010400 fffffe80dc5602f0 ruby
...

We can use ::pgrep to search for processes and use theappropriate address for further digging. In the following example I'llfind a Java process and then determine which zone that process wasrunning in:

> ::pgrep java
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R   3628      1   3620   3574      0 0x42004400 fffffe80deeb3240 java
> fffffe80deeb3240::print proc_t p_zone->zone_name
p_zone->zone_name = 0xffffffffae0cef00 "testzone03"

There are many more tools and way to dig into your dumps using mdb.It can be confusing because you need to reference things by address,but you get more comfortable with it as you play around. If you areinterested in learning more I highly recommend reading Eric Lowe's "Examining the Anatomy of a Process", which digs into the topic of process examination via mdb.

One thing you'll notice in all this is that the messages at the time ofcrash on the console or in syslog contain almost everything you need toknow without digging too deeply. Therefore, assuming you have thosemessages, the most useful thing most people will extract from the corefiles is the output of the ::cpuinfocommand to see what process was on the offending CPU at the time of thecrash. Knowing what processes, zones, etc, were running at the time ofcrash are interesting but rarely mean much if they weren't directlyinvolved in the panic.

As I said, once you start getting into referencing memory addressesto deepen your analysis things get sticky and tricky very veryquickly... thats where Solaris CAT comes in, which we'll talk about inpart 2.