Database hangs are characterised by a number of processes waiting for some other activities to complete. Typically there is one or more blockers that are stuck or perhaps working hard and not freeing resources quickly enough. In order to diagnose this the following diagnostics are needed:
A. Hanganalyze and Systemstate Dumps
B. AWR/Statspack snapshots of General database performance
C. Up to date RDA
Hanganalyze and Systemstate Dumps
Hanganalyze and Systemstate dumps provide information on the processes in the database at a specific point in time. Hanganalyze provides information on all processes involved in the hang chain, whereas systemstate provides information on all processes in the database. When looking at a potential hang situation, you need to determine whether a process is stuck or moving slowly. By collecting these dumps at 2 consecutive intervals this can be established. If a process is stuck, these traces also provide the information to start further diagnosis and possibly help to provide the solution.
-
Hanganalyze is a summary and will confirm if the db is really hung or just slow and provides a consistent snapshot.
-
Systemstate dump shows what each process on the database is doing
Collection commands for Hanganalyze and Systemstate: RAC with fixes for bug 11800959 and bug 11827088
sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 266
oradebug -g all dump systemstate 266
exit
Collection commands for Hanganalyze and Systemstate: RAC without fixes for Bug 11800959 and Bug 11827088
sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
oradebug -g all dump systemstate 258
exit
In RAC environment, a dump will be created for all RAC instances in the DIAG trace file for each instance.
Explanation of Hanganalyze and Systemstate Levels
Hanganalyze levels:
-
Level 3: In 11g onwards, level 3 also collects a short stack for relevant processes in hang chain
Systemstate levels:
-
Level 258 is a fast alternative but we'd lose some lock element data
-
Level 267 can be used if additional buffer cache / lock element data is needed with an understanding of the cost