Tuesday, September 23, 2025

URL




DB MONITORING SCRIPTS DBACLASS

Tales From A Lazy Fat DBA

GitHub - fatdba/Oracle-Database-Scripts: My Oracle DB Scripts

DBATracker

oracle-developer (oracle-developer.net) · GitHub

ORACLE RAC STARTUP SEQUENCE - Oracle Consulting Services | USA | 99% Customer Retention | Doyensys

OracleView

download Oracle database virtual machines

Databases Are Fun – dohdatabase.com

Oracle Scratchpad | Just another Oracle weblog

Tanel Poder Consulting

GoldenGate Archives -

Home - All Things Oracle    (Oracle Patch)

https://database-heartbeat.com/ 

Oracle Exadata Command Reference  

Understanding Exadata disk layout | Page On-Call DBA   Exadata

Step By Step Patching an Exadata cell node  Exadata Storage Cell Patching

OracleDBPro - Pini Dibask Blog: Data Protection    Flashback 

Oracle Wait Events Cheat Sheet - Performance Tuning Quick Reference | Oracle DBA Scripts & Database Utilities     Wait Events

SQL Plan Baseline, SQL Patch, SQL Profile: Differences and Use Cases – Osman’s DBlog  SQL PROFILE ,SQL PATCH ,SQL BASE LINE

In Oracle RAC (Real Application Clusters), the components CSSD, CRS, EVMD, and DIAG are core parts of the Clusterware stack. They work together to ensure cluster stability, node coordination, failover, and diagnostics.

1. CSSD (Cluster Synchronization Services Daemon)

🔹 What it is

CSSD (ocssd.bin) is the heartbeat and cluster membership manager.


🔹 Core Responsibilities

✅ Node Membership Management

  • Tracks which nodes are alive or dead
  • Maintains the cluster node list

✅ Heartbeat Mechanism

Uses two types:

  • Network heartbeat → via private interconnect
  • Disk heartbeat → via voting disks

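👉 If a node misses heartbeats for longer than the configured timeout (the CSS misscount setting for the network heartbeat, disktimeout for the disk heartbeat) → it is considered dead.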
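The timeout check behind the two heartbeats can be pictured roughly as below. This is only a minimal sketch, not how ocssd.bin is implemented: the class, the function name, and the two timeout constants are illustrative assumptions loosely modeled on the misscount and disktimeout settings.

```python
from dataclasses import dataclass
from typing import List

# Assumed illustrative timeouts in seconds; on a real cluster these come
# from the CSS misscount and disktimeout settings.
NETWORK_TIMEOUT_S = 30.0
DISK_TIMEOUT_S = 200.0


@dataclass
class NodeHeartbeat:
    """Timestamps (in seconds) of the last heartbeats seen from one node."""
    name: str
    last_network: float
    last_disk: float


def missed_heartbeats(node: NodeHeartbeat, now: float) -> List[str]:
    """Return which heartbeat types are older than their timeout."""
    missed = []
    if now - node.last_network > NETWORK_TIMEOUT_S:
        missed.append("network")
    if now - node.last_disk > DISK_TIMEOUT_S:
        missed.append("disk")
    return missed
```

An empty list means the node is still within both timeouts; any entry in the list marks a heartbeat path that has gone quiet for too long.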


🔹 Split-Brain Prevention

This is CSSD’s most critical job.

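  • Uses voting disks
  • Requires majority quorum: a node must be able to access more than half of the voting disks
  • If a node loses quorum → it gets evicted (its Clusterware stack is restarted or the node is rebooted)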
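The majority rule is simple arithmetic, and a small sketch makes it concrete. The function names here are hypothetical and only model the "more than half of the voting disks" rule; they are not part of any Oracle tool.

```python
def has_quorum(accessible_disks: int, total_disks: int) -> bool:
    """A node keeps quorum only with a strict majority of voting disks."""
    if total_disks <= 0 or not 0 <= accessible_disks <= total_disks:
        raise ValueError("invalid voting disk counts")
    return accessible_disks > total_disks // 2


def tolerated_disk_failures(total_disks: int) -> int:
    """How many voting disks can be lost while a majority is still reachable."""
    if total_disks <= 0:
        raise ValueError("total_disks must be positive")
    return (total_disks - 1) // 2
```

With 3 voting disks a node survives the loss of 1, and with 5 it survives the loss of 2. An even count adds no extra tolerance (4 disks still tolerate only 1 failure), which is why an odd number of voting disks is the usual choice.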

🔹 Failure Scenario

Example:

  • Node loses network connectivity
  • CSSD checks voting disks
  • If quorum lost → node is killed to protect data integrity

🔹 Key Files / Logs

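$GRID_HOME/log/<node>/cssd/ocssd.log   (11gR2 layout; from 12.1.0.2 onward the trace is under $ORACLE_BASE/diag/crs/<node>/crs/trace/ocssd.trc)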

🔹 Key Process

ocssd.bin


⚙️ 2. CRS (Cluster Ready Services)

🔹 What it is

CRS (crsd.bin) is the resource manager of Oracle Clusterware.


🔹 Core Responsibilities

✅ Resource Management

Manages:

  • Databases
  • Instances
  • Listeners
  • VIPs
  • ASM

✅ Start/Stop Resources

  • Starts resources in correct order
  • Handles dependencies

Example:

ASM → Database → Services

✅ Failover Management

  • If a resource fails → CRS restarts it
  • If a node fails → relocates resources that can fail over (e.g., VIPs, services) to a surviving node
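The "restart first, relocate second" behavior can be sketched as below. This is a simplified model under stated assumptions: `recover_resource` and `try_restart` are hypothetical names, and the `restart_attempts` parameter only loosely mirrors the idea of a per-resource restart limit; the real placement logic in crsd is more involved.

```python
from typing import Callable, Optional, Sequence


def recover_resource(
    try_restart: Callable[[str], bool],
    local_node: str,
    other_nodes: Sequence[str],
    restart_attempts: int = 2,
) -> Optional[str]:
    """Return the node where the resource ends up running, or None."""
    # First try to bring the resource back where it was running.
    for _ in range(restart_attempts):
        if try_restart(local_node):
            return local_node
    # Local restarts are exhausted: try each surviving node in turn.
    for node in other_nodes:
        if try_restart(node):
            return node
    return None
```

Passing a `try_restart` callback that always fails for the local node shows the relocation path: the function gives up locally after `restart_attempts` tries and returns the first other node where the start succeeds.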

🔹 Resource Dependency Example

Database depends on ASM
Listener depends on network
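Dependency-driven start ordering is essentially a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the resource names are made up for illustration and are not real Clusterware resource names.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each resource maps to the set of resources it depends on.
# The names are hypothetical, not real Clusterware resource names.
DEPENDENCIES: Dict[str, Set[str]] = {
    "asm": set(),
    "network": set(),
    "database": {"asm"},
    "listener": {"network"},
    "service": {"database"},
}


def start_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order resources so every dependency starts before its dependents."""
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Stopping runs in reverse: dependents go down before dependencies."""
    return list(reversed(start_order(deps)))
```

Any order returned by `start_order(DEPENDENCIES)` places asm before database and database before service. The relative order of unrelated resources (such as asm and network) is not significant.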

🔹 Logs

$GRID_HOME/log/<node>/crsd/crsd.log   (11gR2 layout; from 12.1.0.2 onward see $ORACLE_BASE/diag/crs/<node>/crs/trace/crsd.trc)

🔹 Key Process

crsd.bin

📡 3. EVMD (Event Manager Daemon)

🔹 What it is

EVMD (evmd.bin) is the event notification system.


🔹 Core Responsibilities

✅ Event Publishing

  • Publishes cluster events:
    • Node up/down
    • Instance start/stop
    • Failover events

✅ Event Subscription

  • Applications/scripts can subscribe to events

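✅ FAN (Fast Application Notification)

  • FAN events reach clients (e.g., JDBC, OCI) through Oracle Notification Service (ONS), which works alongside the Clusterware event system
  • Helps apps react to failures quickly instead of waiting for network timeouts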

🔹 Example Use Case

  • Node crashes
  • EVMD sends event
  • Application connection pool drops dead connections immediately
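The publish/subscribe pattern in this use case can be sketched in a few lines. This is a toy in-process model only: `EventBus`, `ConnectionPool`, and the `"node_down"` event name are hypothetical and do not correspond to the actual EVMD or FAN APIs.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Toy stand-in for an event manager: handlers subscribe by event type."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Deliver the event to every handler registered for this type.
        for handler in self._subscribers.get(event_type, []):
            handler(payload)


class ConnectionPool:
    """Toy pool that maps a connection id to the node it is connected to."""

    def __init__(self, connections: Dict[str, str]) -> None:
        self.connections = dict(connections)

    def on_node_down(self, event: dict) -> None:
        # Drop every connection that points at the failed node.
        dead = event["node"]
        self.connections = {
            cid: node for cid, node in self.connections.items() if node != dead
        }
```

A pool subscribes with `bus.subscribe("node_down", pool.on_node_down)`. When `bus.publish("node_down", {"node": "node1"})` fires, the connections to node1 are removed at once, without waiting for each one to time out.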

🔹 Logs

$GRID_HOME/log/<node>/evmd/evmd.log   (11gR2 layout; from 12.1.0.2 onward see $ORACLE_BASE/diag/crs/<node>/crs/trace/evmd.trc)

🔹 Key Process

evmd.bin

🩺 4. DIAG (Diagnostic Daemon)

🔹 What it is

DIAG (diagdaemon / integrated diag framework) is responsible for diagnostics and health monitoring.


🔹 Core Responsibilities

✅ Log Collection

  • Collects logs from all cluster components

✅ Health Monitoring

  • Tracks component health
  • Works with Cluster Health Monitor (CHM)

✅ Incident Detection

  • Detects critical issues
  • Generates trace files and dumps

🔹 Integration

  • Works with:
    • ADR (Automatic Diagnostic Repository)
    • Trace infrastructure

🔹 Logs Location

$GRID_HOME/log/<node>/diag/

🔗 How They Work Together

🔄 Startup Flow

  1. OHASD starts
  2. CSSD starts → establishes cluster membership
  3. CRS starts → manages resources
  4. EVMD starts → enables event system
  5. DIAG runs in background

🔄 Failure Flow Example

Node Crash:

  1. CSSD detects heartbeat loss
  2. Node evicted
  3. CRS relocates resources
  4. EVMD sends notifications
  5. DIAG logs everything

🧩 Architecture Relationship

OHASD (Oracle High Availability Service)

├── CSSD → Cluster membership + heartbeat
├── CRS → Resource management
├── EVMD → Event system
└── DIAG → Diagnostics

🚨 Quick Comparison Table

Component | Role             | Critical Function
CSSD      | Cluster control  | Node membership, heartbeat
CRS       | Resource manager | Start/stop/failover
EVMD      | Event system     | FAN notifications
DIAG      | Diagnostics      | Logs, health, incidents

🛠️ Important Commands

Check cluster status

crsctl stat res -t

Check cluster health

crsctl check cluster -all

Check CSS

crsctl check css
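For routine checks, the commands above can be wrapped in a small script. The sketch below assumes `crsctl` is on the PATH of the user running it (typically the Grid Infrastructure owner) and uses `crsctl check css`, the form documented for 11gR2 and later. The `CHECKS` table and `run_check` function are illustrative names, not an Oracle-provided API.

```python
import shutil
import subprocess
from typing import Dict, List, Optional

# The three checks from this post, expressed as argument lists.
CHECKS: Dict[str, List[str]] = {
    "resources": ["crsctl", "stat", "res", "-t"],
    "cluster": ["crsctl", "check", "cluster", "-all"],
    "css": ["crsctl", "check", "css"],
}


def run_check(name: str, timeout_s: float = 60.0) -> Optional[str]:
    """Run one check and return its stdout, or None if crsctl is not found."""
    cmd = CHECKS[name]
    if shutil.which(cmd[0]) is None:
        return None  # not a Grid Infrastructure host, or PATH is not set
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout_s
    )
    return result.stdout


if __name__ == "__main__":
    for check_name in CHECKS:
        output = run_check(check_name)
        print(f"== {check_name} ==")
        print(output if output is not None else "crsctl not found on PATH")
```

Returning None when the binary is missing keeps the script harmless on hosts without Grid Infrastructure installed.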

