Oracle RAC Node Evictions — Top/Common Causes and Factors

The following are some of the most common factors that lead to node evictions, sudden death of the cluster stack,
node reboots, and an unhealthy cluster status:

• Network disruption, latency, or missing network heartbeats
• Delayed or missing disk heartbeats
• Corrupted network packets, which may also cause CSS reboots on certain platforms
• Slow or failing interconnect
• Known Oracle Clusterware bugs
• Unable to read/write or access the majority of the voting disks (files)
• Insufficient resources on the node (CPU/memory starvation), so that the OS cannot schedule key CRS daemon processes
• Manual termination of critical cluster stack daemon processes (ocssd, cssdagent, cssdmonitor)
• No space left on the file system that holds the Grid Infrastructure (GI) home or /var
• Sudden death or hang of CSSD process
• Excessive resource consumption (CPU, memory, swap) by ORAAGENT/ORAROOTAGENT, resulting in node eviction on specific OS platforms
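
The voting disk item in the list above comes down to a simple majority test: a node needs access to more than half of the voting disks to stay in the cluster. Here is a minimal sketch of that check. The function name and its arguments are hypothetical and not part of any Oracle tool.

```python
def has_voting_disk_majority(accessible: int, total: int) -> bool:
    """Return True if strictly more than half of the voting disks are accessible.

    Hypothetical helper for illustration; the counts would have to come
    from whatever monitoring you already have in place.
    """
    if total <= 0 or not 0 <= accessible <= total:
        raise ValueError("expected total > 0 and 0 <= accessible <= total")
    # Integer comparison avoids floating point: 2 of 3 passes, 2 of 4 does not.
    return 2 * accessible > total


if __name__ == "__main__":
    for accessible, total in [(3, 3), (2, 3), (1, 3), (2, 4)]:
        state = "majority" if has_voting_disk_majority(accessible, total) else "no majority"
        print(f"{accessible} of {total} voting disks accessible: {state}")
```

This is also why an odd number of voting disks is the usual choice: with four disks a node still needs three, so the fourth disk adds no extra tolerance for disk loss.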

 

Which process evicts nodes from the RAC cluster?

CSSD (the Cluster Synchronization Services daemon) monitors and evicts nodes.

It monitors nodes using two communication channels:

- Private Interconnect <=> Network Heartbeat

- Voting Disk Based communication <=> Disk Heartbeat

It evicts nodes (forcibly removes them from the cluster) based on heartbeat feedback, that is, on heartbeat failures.
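
The heartbeat-driven decision can be sketched roughly as follows. This is a simplified illustration, not how CSSD is actually implemented. The class, the function, and the two timeout values are assumptions made for the example; real thresholds depend on the cluster's CSS configuration.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative thresholds in seconds; assumed for this sketch,
# not read from any cluster configuration.
NETWORK_TIMEOUT_S = 30.0
DISK_TIMEOUT_S = 200.0


@dataclass
class NodeHeartbeats:
    """Last time each kind of heartbeat was seen from a node (hypothetical type)."""
    name: str
    last_network_hb: float  # private interconnect heartbeat timestamp
    last_disk_hb: float     # voting disk heartbeat timestamp


def eviction_reason(node: NodeHeartbeats, now: float,
                    network_timeout: float = NETWORK_TIMEOUT_S,
                    disk_timeout: float = DISK_TIMEOUT_S) -> Optional[str]:
    """Return why the node would be evicted, or None if both heartbeats are fresh."""
    if now - node.last_network_hb > network_timeout:
        return "missed network heartbeat"
    if now - node.last_disk_hb > disk_timeout:
        return "missed disk heartbeat"
    return None


if __name__ == "__main__":
    now = 1000.0
    nodes = [
        NodeHeartbeats("node1", last_network_hb=995.0, last_disk_hb=990.0),
        NodeHeartbeats("node2", last_network_hb=960.0, last_disk_hb=990.0),
    ]
    for n in nodes:
        print(n.name, eviction_reason(n, now) or "healthy")
```

In this sketch either channel going stale is enough to flag a node, which mirrors the two-channel monitoring described above.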

Why are nodes evicted?

Evicting (fencing) nodes is a preventive measure (a good thing)!

Nodes are evicted to prevent consequences of a split brain:

- Shared data must not be written by independently operating nodes.

- The easiest way to prevent this is to forcibly remove a node from the cluster.
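
To make the split-brain idea concrete, here is a minimal sketch of choosing which nodes to remove once the cluster has split into groups that can no longer reach each other. The survival rule used here (the largest group wins, and a tie goes to the group containing the lowest node number) is an assumption for illustration. The text above does not state it, and the actual Clusterware rules may differ by version. All function names are hypothetical.

```python
from typing import List, Set


def surviving_group(groups: List[Set[int]]) -> Set[int]:
    """Pick the group of nodes that keeps running (assumed rule, see lead-in)."""
    if not groups or any(len(g) == 0 for g in groups):
        raise ValueError("expected at least one group and no empty groups")
    # Larger groups win; on a tie, prefer the group with the lowest node number.
    return max(groups, key=lambda g: (len(g), -min(g)))


def nodes_to_evict(groups: List[Set[int]]) -> List[int]:
    """Return every node outside the surviving group, in ascending order."""
    survivor = surviving_group(groups)
    return sorted(n for g in groups if g is not survivor for n in g)


if __name__ == "__main__":
    # Node 1 has lost the interconnect to nodes 2 and 3.
    print(nodes_to_evict([{1}, {2, 3}]))
    # An even split: the group holding node 1 survives under the assumed rule.
    print(nodes_to_evict([{1, 2}, {3, 4}]))
```

Whatever rule decides the survivor, the point is the same: only one group keeps writing to the shared data, and the rest are removed.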
