https://database-heartbeat.com/

Understanding Exadata disk layout | Page On-Call DBA Exadata

Step By Step Patching an Exadata cell node Exadata Storage Cell Patching

OracleDBPro - Pini Dibask Blog: Data Protection Flashback

Oracle Wait Events Cheat Sheet - Performance Tuning Quick Reference | Oracle DBA Scripts & Database Utilities Wait Events

SQL Plan Baseline, SQL Patch, SQL Profile: Differences and Use Cases – Osman’s DBlog SQL PROFILE ,SQL PATCH ,SQL BASE LINE

In Oracle RAC (Real Application Clusters), the components CSSD, CRS, EVMD, and DIAG are core parts of the Clusterware stack. They work together to ensure cluster stability, node coordination, failover, and diagnostics.

1. CSSD (Cluster Synchronization Services Daemon)

🔹 What it is

CSSD (ocssd.bin) is the heartbeat and cluster membership manager.

🔹 Core Responsibilities

✅ Node Membership Management

Tracks which nodes are alive or dead
Maintains the cluster node list

✅ Heartbeat Mechanism

Uses two types:

Network heartbeat → via private interconnect
Disk heartbeat → via voting disks

👉 If a node misses heartbeats for longer than the configured timeout → it is considered dead.
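The timeout check behind the two heartbeats can be sketched roughly as follows. This is an illustrative model, not CSSD's actual code: the names `HeartbeatState`, `missed_heartbeats` and `is_considered_dead` are hypothetical, and the two timeout constants are assumptions chosen to mirror the commonly cited CSS defaults (misscount of 30 seconds for the network heartbeat, disktimeout of 200 seconds for the disk heartbeat).

```python
from dataclasses import dataclass

# Assumed timeouts in seconds, mirroring commonly cited CSS defaults
# (misscount for the network heartbeat, disktimeout for the disk heartbeat).
NETWORK_TIMEOUT_S = 30.0
DISK_TIMEOUT_S = 200.0


@dataclass
class HeartbeatState:
    """Last time (in seconds) each heartbeat type was seen for one node."""
    last_network: float
    last_disk: float


def missed_heartbeats(state: HeartbeatState, now: float) -> list[str]:
    """Return the heartbeat types that have been silent longer than their timeout."""
    missed = []
    if now - state.last_network > NETWORK_TIMEOUT_S:
        missed.append("network")
    if now - state.last_disk > DISK_TIMEOUT_S:
        missed.append("disk")
    return missed


def is_considered_dead(state: HeartbeatState, now: float) -> bool:
    """Treat the node as dead as soon as either heartbeat has timed out."""
    return bool(missed_heartbeats(state, now))


if __name__ == "__main__":
    node = HeartbeatState(last_network=100.0, last_disk=100.0)
    print(is_considered_dead(node, now=120.0))  # both heartbeats still within their timeouts
    print(missed_heartbeats(node, now=140.0))   # only the network heartbeat is overdue
```

The disk timeout is much longer than the network one in this model, so a short storage stall alone does not mark a node as dead.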

🔹 Split-Brain Prevention

This is CSSD’s most critical job.

Uses voting disks
Requires majority quorum (a node must be able to access more than half of the voting disks)
If a node loses quorum → it gets evicted (rebooted)
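The majority rule is simple arithmetic, and a small sketch makes it concrete. The function names `has_quorum` and `tolerated_disk_failures` are hypothetical and only illustrate the strict-majority check; they are not part of any Oracle API.

```python
def has_quorum(accessible_disks: int, total_disks: int) -> bool:
    """True when the node can access a strict majority of the voting disks."""
    if total_disks <= 0:
        raise ValueError("total_disks must be positive")
    if not 0 <= accessible_disks <= total_disks:
        raise ValueError("accessible_disks must be between 0 and total_disks")
    return accessible_disks > total_disks // 2


def tolerated_disk_failures(total_disks: int) -> int:
    """How many voting disks can be lost while a majority is still reachable."""
    if total_disks <= 0:
        raise ValueError("total_disks must be positive")
    return (total_disks - 1) // 2


if __name__ == "__main__":
    for total in (1, 3, 5):
        print(total, "voting disks tolerate", tolerated_disk_failures(total), "failure(s)")
    print(has_quorum(accessible_disks=1, total_disks=3))  # one of three is not a majority
```

This is also why an odd number of voting disks is the usual choice: a fourth disk does not let the cluster survive any more failures than three do.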

🔹 Failure Scenario

Example:

Node loses network connectivity
CSSD checks voting disks
If quorum lost → node is killed to protect data integrity

🔹 Key Files / Logs


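$GRID_HOME/log/<node>/cssd/ocssd.log
From Grid Infrastructure 12.1.0.2 onward, Clusterware traces are written under ADR: $ORACLE_BASE/diag/crs/<node>/crs/trace/ocssd.trc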

🔹 Key Process


ocssd.bin


⚙️ 2. CRS (Cluster Ready Services)

🔹 What it is

CRS (crsd.bin) is the resource manager of Oracle Clusterware.

🔹 Core Responsibilities

✅ Resource Management

Manages:

Databases
Instances
Listeners
VIPs
ASM

✅ Start/Stop Resources

Starts resources in correct order
Handles dependencies
Example: ASM → Database → Services

✅ Failover Management

If a resource fails → CRS restarts it
If node fails → relocates resources to another node
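The restart-then-relocate decision can be sketched as a small policy function. This is a simplified model under stated assumptions, not how crsd is implemented: `plan_recovery` and its parameters are hypothetical, and `restart_attempts` is only loosely modeled on the idea of a per-resource restart limit.

```python
from typing import Optional


def plan_recovery(
    node_alive: bool,
    failed_restarts: int,
    restart_attempts: int,
    candidate_nodes: list[str],
) -> tuple[str, Optional[str]]:
    """Decide what to do with a failed resource.

    Returns (action, target_node), where action is "restart", "relocate" or "offline".
    """
    # Resource failed but its node is healthy: retry locally while attempts remain.
    if node_alive and failed_restarts < restart_attempts:
        return ("restart", None)
    # Node is gone or local restarts are exhausted: move to another node if one exists.
    if candidate_nodes:
        return ("relocate", candidate_nodes[0])
    # Nowhere left to run the resource.
    return ("offline", None)


if __name__ == "__main__":
    print(plan_recovery(True, 0, 2, ["node2"]))   # first failure on a healthy node
    print(plan_recovery(False, 0, 2, ["node2"]))  # the hosting node itself has failed
```

Picking the first candidate node is an arbitrary choice for the sketch; a real placement policy would weigh server pools, load and resource constraints.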



🔹 Resource Dependency Example
Database depends on ASM
Listener depends on network
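Dependency-aware start ordering is a topological sort. The sketch below uses Python's standard `graphlib` module on the dependencies named above. The resource names in `DEPENDS_ON` are illustrative labels, not real Clusterware resource names, and the "service depends on database" entry is taken from the ASM → Database → Services example.

```python
from graphlib import TopologicalSorter

# Each key depends on the resources in its set (illustrative names only).
DEPENDS_ON = {
    "database": {"asm"},
    "service": {"database"},
    "listener": {"network"},
}


def start_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every resource starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Stopping runs in reverse: dependents go down before what they depend on."""
    return list(reversed(start_order(depends_on)))


if __name__ == "__main__":
    print("start:", start_order(DEPENDS_ON))
    print("stop: ", stop_order(DEPENDS_ON))
```

Several valid orders exist because the listener chain and the database chain are independent, so the sketch guarantees only that each resource comes after its dependencies.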


🔹 Logs
$GRID_HOME/log/<node>/crsd/crsd.log
From Grid Infrastructure 12.1.0.2 onward: $ORACLE_BASE/diag/crs/<node>/crs/trace/crsd.trc


🔹 Key Process
crsd.bin


📡 3. EVMD (Event Manager Daemon)

🔹 What it is

EVMD (evmd.bin) is the event notification system.

🔹 Core Responsibilities

✅ Event Publishing

Publishes cluster events:

Node up/down
Instance start/stop
Failover events

✅ Event Subscription

Applications/scripts can subscribe to events

✅ FAN (Fast Application Notification)

Cluster events reach clients (e.g., JDBC, OCI) as FAN events, typically delivered through ONS (Oracle Notification Service)
Helps apps react quickly to failures

🔹 Example Use Case

Node crashes
EVMD publishes the event
Application connection pool drops dead connections immediately
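The use case above is a publish/subscribe pattern, which can be sketched in a few lines. Everything here is hypothetical and in-process: `EventBus`, `ConnectionPool` and the `"node_down"` event name are illustrative stand-ins, not the EVMD, ONS or FAN client APIs.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Minimal publish/subscribe hub standing in for the event system."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Copy the handler list so handlers may subscribe or unsubscribe safely.
        for handler in list(self._subscribers.get(event_type, [])):
            handler(payload)


class ConnectionPool:
    """Toy pool that tracks connection ids per node."""

    def __init__(self, connections: dict[str, list[str]]) -> None:
        self._connections = {node: list(ids) for node, ids in connections.items()}

    def on_node_down(self, payload: dict) -> None:
        # Drop every connection to the failed node instead of waiting for timeouts.
        self._connections.pop(payload["node"], None)

    def open_connections(self) -> list[str]:
        return sorted(cid for ids in self._connections.values() for cid in ids)


if __name__ == "__main__":
    bus = EventBus()
    pool = ConnectionPool({"node1": ["c1", "c2"], "node2": ["c3"]})
    bus.subscribe("node_down", pool.on_node_down)
    bus.publish("node_down", {"node": "node1"})
    print(pool.open_connections())
```

The pool reacts to the notification itself, so it does not have to discover dead connections one at a time through TCP timeouts.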



🔹 Logs
$GRID_HOME/log/<node>/evmd/evmd.log
From Grid Infrastructure 12.1.0.2 onward: $ORACLE_BASE/diag/crs/<node>/crs/trace/evmd.trc


🔹 Key Process
evmd.bin


🩺 4. DIAG (Diagnostic Daemon)

🔹 What it is

DIAG (diagdaemon / integrated diag framework) is responsible for diagnostics and health monitoring.

🔹 Core Responsibilities

✅ Log Collection

Collects logs from all cluster components

✅ Health Monitoring

Tracks component health
Works with Cluster Health Monitor (CHM)

✅ Incident Detection

Detects critical issues
Generates trace files and dumps

🔹 Integration

Works with:

ADR (Automatic Diagnostic Repository)
Trace infrastructure

🔹 Logs Location

$GRID_HOME/log/<node>/diag/


🔗 How They Work Together

🔄 Startup Flow

OHASD starts
CSSD starts → establishes cluster membership
CRS starts → manages resources
EVMD starts → enables event system
DIAG runs in background

🔄 Failure Flow Example

Node Crash:

CSSD detects heartbeat loss
Node evicted
CRS relocates resources
EVMD sends notifications
DIAG logs everything



🧩 Architecture Relationship
OHASD (Oracle High Availability Services daemon)
   │
   ├── CSSD  → Cluster membership + heartbeat
   ├── CRS   → Resource management
   ├── EVMD  → Event system
   └── DIAG  → Diagnostics


🚨 Quick Comparison Table
Component | Role             | Critical Function
CSSD      | Cluster control  | Node membership, heartbeat
CRS       | Resource manager | Start/stop/failover
EVMD      | Event system     | FAN notifications
DIAG      | Diagnostics      | Logs, health, incidents

🛠️ Important Commands
Check cluster status
crsctl stat res -t

Check cluster health
crsctl check cluster -all

Check CSS
crsctl check css
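For scripted monitoring, the output of `crsctl check cluster -all` can be scanned for components that are not online. The sketch below is a rough parser under stated assumptions: `SAMPLE_OUTPUT` is illustrative text modeled on the usual layout (a node name line followed by CRS-nnnn messages), not output captured from a real cluster, and exact wording may differ between versions. `parse_cluster_check` is a hypothetical helper name.

```python
# Illustrative sample only; real output may differ by version and cluster state.
SAMPLE_OUTPUT = """\
**************************************************************
node1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
node2:
CRS-4537: Cluster Ready Services is online
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
**************************************************************
"""


def parse_cluster_check(text: str) -> dict[str, list[str]]:
    """Map each node name to the CRS messages that do not report 'is online'."""
    problems: dict[str, list[str]] = {}
    node = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"*"}:
            continue  # skip blank lines and separator rows
        if line.startswith("CRS-"):
            if node is not None and not line.endswith("is online"):
                problems[node].append(line)
        elif line.endswith(":"):
            node = line[:-1]  # assumed node header such as "node1:"
            problems[node] = []
    return problems


if __name__ == "__main__":
    for name, issues in parse_cluster_check(SAMPLE_OUTPUT).items():
        print(name, "OK" if not issues else issues)
```

On a real cluster the text could come from `subprocess.run(["crsctl", "check", "cluster", "-all"], capture_output=True, text=True).stdout`, run as a user with access to the Grid home.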


Tuesday, September 23, 2025

URL
