Saturday, February 22, 2020

AWR



show parameter statistics_level 

SELECT statistics_name, activation_level, system_status
FROM v$statistics_level
order by statistics_name;


You may need to set the interval or retention period for your AWR snapshots to values other than the defaults. First, check the current settings:

SELECT EXTRACT(day from retention) || ':' ||
       EXTRACT(hour from retention) || ':' ||
       EXTRACT (minute from retention)  awr_snapshot_retention_period,
       EXTRACT (day from snap_interval) *24*60+
       EXTRACT (hour from snap_interval) *60+
       EXTRACT (minute from snap_interval) awr_snapshot_interval
FROM dba_hist_wr_control;


exec DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(retention=> ,interval => );
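Both parameters are specified in minutes. A minimal sketch with hypothetical values, keeping 30 days (43200 minutes) of history and taking a snapshot every 30 minutes:

-- hypothetical values: 30 days retention, 30-minute snapshot interval
exec DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(retention => 43200, interval => 30);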

Generating an AWR Report Manually 

Path : $ORACLE_HOME/rdbms/admin

awrrpt.sql (non-RAC, or a single RAC instance)
awrgrpt.sql (RAC, all instances)

SELECT output FROM TABLE(dbms_workload_repository.awr_report_text(l_dbid => , l_inst_num => , l_bid => , l_eid => ));
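For example, with hypothetical identifiers (take the real DBID from V$DATABASE and the snapshot IDs from DBA_HIST_SNAPSHOT, as in the query below):

-- DBID and snapshot IDs here are illustrative only
SELECT output
FROM TABLE(dbms_workload_repository.awr_report_text(
       l_dbid => 1234567890, l_inst_num => 1, l_bid => 255, l_eid => 256));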

SELECT snap_id, begin_interval_time, end_interval_time
FROM DBA_HIST_SNAPSHOT
WHERE begin_interval_time > sysdate-1/6
ORDER BY 2;




Generating an AWR report for a Single SQL Statement

Path : $ORACLE_HOME/rdbms/admin

awrsqrpt.sql
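If you prefer to skip the interactive script, the same report is available as a function; it takes the usual identifiers plus the SQL_ID (all values below are hypothetical):

-- DBID, snapshot IDs, and SQL_ID are illustrative only
SELECT output
FROM TABLE(dbms_workload_repository.awr_sql_report_text(
       l_dbid => 1234567890, l_inst_num => 1, l_bid => 255, l_eid => 256,
       l_sqlid => 'b6mcn03jvfg41'));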

Comparing Database Performance Between Two Periods

Path : $ORACLE_HOME/rdbms/admin

awrddrpt.sql
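The comparison report also has a function form; the first four arguments identify the first period and the last four the second period (a sketch with hypothetical values; here both periods come from the same database):

-- arguments: dbid1, inst_num1, bid1, eid1, dbid2, inst_num2, bid2, eid2
SELECT output
FROM TABLE(dbms_workload_repository.awr_diff_report_text(
       1234567890, 1, 255, 256,
       1234567890, 1, 355, 356));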

Creating a Statistical Baseline for your database

The following example illustrates how to create a baseline from a known begin and end date and time.



exec dbms_workload_repository.create_baseline( -
start_time=>to_date('2013-07-21:00:00:00','yyyy-mm-dd:hh24:mi:ss'), -
end_time=>to_date('2013-07-21:06:00:00','yyyy-mm-dd:hh24:mi:ss'), -
baseline_name=>'Batch Baseline #1');


If you want a baseline to have a set expiration, you can simply specify its retention period when creating it by using the EXPIRATION parameter, which is specified in days:

exec dbms_workload_repository.create_baseline( -
start_time=>to_date('2013-07-21:00:00:00','yyyy-mm-dd:hh24:mi:ss'), -
end_time=>to_date('2013-07-21:06:00:00','yyyy-mm-dd:hh24:mi:ss'), -
baseline_name=>'Batch Baseline #1', -
expiration=>30);


You can also create a baseline from already created AWR snapshot IDs. To do this, run the create_baseline procedure as follows:

exec dbms_workload_repository.create_baseline( -
start_snap_id=>258,end_snap_id=>268,baseline_name=>'Batch Baseline #1', -
expiration=>30);


To modify the moving window baseline, use the modify_baseline_window_size procedure of the dbms_workload_repository package, passing the new window size in days (it cannot exceed the AWR retention period), as in the following:

exec dbms_workload_repository.modify_baseline_window_size(30);

To view baseline information:

column baseline_name format a20
column baseline_id format 99 heading B_ID
column start_id heading STA
column end_id heading END
column expiration heading EXP
set lines 150 
SELECT baseline_id, baseline_name, start_snap_id start_id,
         TO_CHAR(start_snap_time, 'yyyy-mm-dd:hh24:mi') start_time,
         end_snap_id end_id,
         TO_CHAR(end_snap_time, 'yyyy-mm-dd:hh24:mi') end_time,
         expiration
  FROM dba_hist_baseline
  ORDER BY baseline_id;



column start_snap_id heading STA
column end_snap_id heading END
SELECT start_snap_id, start_snap_time, end_snap_id,
end_snap_time, pct_total_time pct 
FROM (SELECT * FROM
       TABLE(DBMS_WORKLOAD_REPOSITORY.select_baseline_details(4)));




column metric_name format a50
column average format 99999999.99
column maximum format 99999999.99
SELECT metric_name, average, maximum FROM
(SELECT * FROM TABLE
(DBMS_WORKLOAD_REPOSITORY.select_baseline_metric('SYSTEM_MOVING_WINDOW')))
where lower(metric_name) like '%read%'
order by metric_name;


Renaming a baseline:

exec dbms_workload_repository.rename_baseline -
      ('Batch Baseline #9','Batch Baseline #10');


Dropping a baseline:

exec dbms_workload_repository.drop_baseline('Batch Baseline #1');

Dropping AWR snapshots:

exec dbms_workload_repository.drop_snapshot_range(255,256);

Getting ASH Information from the Data Dictionary


SELECT min(sample_time) FROM dba_hist_active_sess_history;

SELECT DISTINCT wait_class FROM dba_hist_event_name order by 1;

If you wanted to see the datafiles associated with waits (which could help pinpoint access hot spots in your database, so that you could perform file management and reorganization activities to alleviate the contention), you could issue the following query:

SELECT
  d.file_id file#, d.file_name, count(*) wait_count,  sum(s.time_waited) time_waited
FROM
  v$active_session_history s,
  dba_data_files d
WHERE
  d.file_id = s.current_file#
GROUP BY d.file_id, d.file_name
ORDER BY 3 desc;



If you wanted to see the top five events and their total wait time for activity within the past 15 minutes in your database, you could issue the following query:


select * from (
SELECT NVL(event, 'ON CPU') event, COUNT(*) total_wait_tm
FROM   v$active_session_history
WHERE  sample_time > SYSDATE - 15/(24*60)
GROUP BY event
ORDER BY 2 desc
)
where rownum <= 5;


If you wanted session-specific information, for example the top five sessions using the most CPU resources within the last 15 minutes, you could issue the following query:

column username format a12
column module format a30

SELECT * FROM
(
SELECT s.username, s.module, s.sid, s.serial#, count(*)
FROM v$active_session_history h, v$session s
WHERE h.session_id = s.sid
AND   h.session_serial# = s.serial#
AND   session_state= 'ON CPU' AND
      sample_time > sysdate - interval '15' minute
GROUP BY s.username, s.module, s.sid, s.serial#
ORDER BY count(*) desc
)
where rownum <= 5;


For instance, if you wanted to find the users consuming the most resources on a given day when performance was poor, you could issue the following query:


SELECT * FROM
(
SELECT u.username, h.module, h.session_id sid,
       h.session_serial# serial#, count(*)
FROM dba_hist_active_sess_history h, dba_users u
WHERE h.user_id = u.user_id
AND   session_state= 'ON CPU'
AND  (sample_time between to_date('2013-07-25:00:00:00','yyyy-mm-dd:hh24:mi:ss')
AND   to_date('2013-07-25:23:59:59','yyyy-mm-dd:hh24:mi:ss'))
AND u.username != 'SYS'
GROUP BY u.username, h.module, h.session_id, h.session_serial#
ORDER BY count(*) desc
)
where rownum <= 5;


Similarly, if you wanted to get the top table wait events for a given day when performance was poor, you could issue the following query:

SELECT * FROM
(
SELECT o.object_name, o.object_type, s.event, s.time_waited
FROM dba_hist_active_sess_history s, dba_objects o
WHERE s.sample_time
between to_date('2013-07-27:00:00:00','yyyy-mm-dd:hh24:mi:ss')
AND   to_date('2013-07-27:23:59:59','yyyy-mm-dd:hh24:mi:ss')
AND s.current_obj# = o.object_id
ORDER BY 4 desc
)
WHERE rownum <= 5;



DBMS_ADDM

Summary of DBMS_ADDM Subprograms

Table 17-1 DBMS_ADDM Package Subprograms

Subprogram: Description

ANALYZE_DB Procedure: Creates an ADDM task for analyzing in database analysis mode and executes it.
ANALYZE_INST Procedure: Creates an ADDM task for analyzing in instance analysis mode and executes it.
ANALYZE_PARTIAL Procedure: Creates an ADDM task for analyzing a subset of instances in partial analysis mode and executes it.
COMPARE_CAPTURE_REPLAY_REPORT Function: Produces a Compare Period ADDM report comparing the performance of a capture to a replay.
COMPARE_DATABASES Function: Produces a Compare Period ADDM report for a database-wide performance comparison.
COMPARE_INSTANCES Function: Produces a Compare Period ADDM report for an instance-level performance comparison.
COMPARE_REPLAY_REPLAY_REPORT Function: Produces a Compare Period ADDM report comparing the performance of a replay to another replay.
DELETE Procedure: Deletes an already created ADDM task (of any kind).
DELETE_FINDING_DIRECTIVE Procedure: Deletes a finding directive.
DELETE_PARAMETER_DIRECTIVE Procedure: Deletes a parameter directive.
DELETE_SEGMENT_DIRECTIVE Procedure: Deletes a segment directive.
DELETE_SQL_DIRECTIVE Procedure: Deletes a SQL directive.
GET_ASH_QUERY Function: Returns a string containing the SQL text of an ASH query identifying the rows in ASH with impact for the finding.
GET_REPORT Function: Retrieves the default text report of an executed ADDM task.
INSERT_FINDING_DIRECTIVE Procedure: Creates a directive to limit reporting of a specific finding type.
INSERT_PARAMETER_DIRECTIVE Procedure: Creates a directive to prevent ADDM from creating actions to alter the value of a specific system parameter.
INSERT_SEGMENT_DIRECTIVE Procedure: Creates a directive to prevent ADDM from creating actions to "run Segment Advisor" for specific segments.
INSERT_SQL_DIRECTIVE Procedure: Creates a directive to limit reporting of actions on specific SQL.
REAL_TIME_ADDM_REPORT Function: Produces a real-time report of ADDM activity.
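As a minimal sketch, the typical flow is to run one of the ANALYZE_* procedures between two snapshots and then fetch the text report with GET_REPORT (the task name and snapshot IDs below are hypothetical):

var tname varchar2(60)
-- analyze the local instance between snapshots 255 and 256 (illustrative IDs)
exec :tname := 'ADDM_255_256'; dbms_addm.analyze_inst(:tname, 255, 256);
set long 1000000
select dbms_addm.get_report(:tname) from dual;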



Report Header

This section contains information about the database and environment, along with the snapshot IDs and times. An important thing to verify is that configuration such as CPU and memory has not changed between the baseline period and the degraded period.

DB TIME
Description: Time spent in the database during the elapsed time; in other words, the sum of the time taken by all sessions in the database during the elapsed time. DB Time = CPU time + non-idle wait time. Note: it does not include background processes.
Analysis: DB Time > Elapsed Time means that sessions were active in the database concurrently.

You can find the average active sessions during the AWR period:

DB TIME / ELAPSED = 1964.97 / 899.99 = 2.18

So the database load (average active sessions) = 2.18 (the same idea as CPU load on UNIX).

This could mean that roughly 2 users were active on the database for the whole elapsed time, or 4 users for half of the elapsed time each, or 8 users for a quarter of it each, and so on.

A high DB Time value means database activity/sessions were high during the AWR period. The AWR report reviewed below is based on this DB Time.

This means that for every minute of elapsed time there are about 2.2 minutes of work done in the database.
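Because ASH samples each active session once per second, you can cross-check the current average active sessions directly (a sketch; the 15-minute window is arbitrary):

-- samples / seconds in the window approximates average active sessions
SELECT round(count(*) / (15 * 60), 2) avg_active_sessions
FROM   v$active_session_history
WHERE  sample_time > sysdate - 15/(24*60);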

 

ELAPSED TIME
Description: The time window over which this AWR report was generated.
Analysis: The elapsed time should contain the issue duration; take manual snapshots if required.

CPUs
Description: The logical CPU count (cores x threads per core); it is not the "actual" CPU (core) count.

STARTUP TIME
Description: The database startup time.

RAC
Analysis: If you have more than one node and you don't know on which node the issue is happening, take AWR reports from all nodes.
Load Profile

SIGNIFICANCE OF THIS SECTION:

In the load profile (average active sessions, DB CPU, logical and physical reads, user calls, executions, parses, hard parses, logons, rollbacks, transactions), check whether the numbers are consistent with each other and with the general database profile (OLTP/DWH/mixed).

  • Pay most attention to physical reads, physical writes, hard parse to parse ratio and executes to transaction ratio.
  • The ratio of hard parses to parses tells you how often SQL is being fully parsed. Full parsing of SQL statements has a negative effect on performance.
  • High hard parse ratios (>2 – 3 percent) indicate probable bind variable issues or maybe versioning problems.
  • Rows per sort can also be reviewed here to see if large sorts are occurring.
  • This section can help in the load testing for application releases. You can compare this section for the baseline as well as high load situation.
Redo Size (Bytes)
Description: The main sources of redo are (in roughly descending order) INSERT, UPDATE, and DELETE.
Analysis: Not a very scary number in our report. High redo figures mean that either lots of new data is being saved into the database, or existing data is undergoing lots of changes.

What do you do if you find that redo generation is too high (and there is no business reason for it)? Not much, really, since there is no "SQL ordered by redo" in the AWR report. Just keep an eye open for suspicious DML activity: any unusual statements? Usual statements processed more often than usual, or producing more rows per execution than usual? Also take a good look at the segment statistics sections (segments by physical writes, segments by DB block changes, etc.) to see if there are any clues there.

DB CPU
Description: The amount of CPU time spent on user calls. Like DB Time, it does not include background processes. The value is in microseconds.
Analysis: We have 8 cores, so we can potentially use 8 seconds of CPU time per second. Here DB CPU (s) per second is 1.9, i.e. the system is using 1.9 seconds of CPU out of the potential 8 seconds/second it could use. We are not CPU bound.

LOGICAL READS
Description: Logical reads = consistent gets + db block gets. Oracle first checks whether the data is available in the buffer cache (the SGA); if it is, a logical read is counted. More precisely, if Oracle reads a block that is consistent with a given point in time, the "consistent gets" counter is incremented; if the data is read in current mode (the most up-to-date copy of the block as it is right now), the "db block gets" counter is incremented instead. Logical reads = total "consistent gets" + total "db block gets". Both values can be seen in the 'Instance Activity Stats' section.
Analysis: Logical and physical reads combined measure how much I/O (physical and logical) the database is performing. If this is high, go to the "SQL ordered by Gets" section, which may help point out which SQL is doing the most logical reads.

USER QUERIES
Description: The number of user queries generated.

PARSES
Description: The total of all parses, hard and soft.

HARD PARSES
Description: Parses requiring a completely new parse of the SQL statement. These consume both latches and shared pool area.
Analysis: How much hard parsing is acceptable? It depends on many things: the number of CPUs, the number of executions, how sensitive the plans are to SQL parameters, etc. As a rule of thumb, anything below 1 hard parse per second is probably okay, and anything above 100 per second suggests a problem (if the database has a very large number of CPUs, say above 100, those numbers should be scaled up accordingly). It also helps to look at the number of hard parses as a percentage of executions, especially if you are in the grey zone.

If you suspect that excessive parsing is hurting your database's performance:
1) check the "Time Model Statistics" section (hard parse elapsed time, parse time elapsed, etc.);
2) see if there are any signs of library cache contention in the top-5 events;
3) see if CPU is an issue.

Soft Parses
Description: Not listed directly, but derived by subtracting hard parses from total parses. A soft parse reuses a previous hard parse and hence consumes far fewer resources. (A quick query to watch the parse counters live is shown after this table.)
Physical Reads
Description: If Oracle does not find the data in the buffer cache, it reads the block from disk and increments the physical read count. A buffer get is less expensive than a physical read, because for a physical read the database has to work harder and longer: essentially the time it would have taken from the buffer cache plus the time actually taken to read the block from disk.
Analysis: If this is high, go to the "SQL ordered by Reads" section, which may help point out which SQL is doing the most physical reads.

Executes (SQL)
Analysis: If executes per second looks enormous, that is a red flag. For example, such numbers combined with high CPU usage are enough to suspect context switching as the primary culprit: a SQL statement containing a PL/SQL function which in turn executes a SQL statement hundreds of thousands of times per function call. A high value also suggests that most of the database load falls on SQL statements inside PL/SQL routines.

User Calls
Description: The number of calls from a user process into the database: things like "parse", "fetch", "execute", "close".
Analysis: This is an extremely useful piece of information, because it sets the scale for other statistics (such as commits, hard parses, etc.). In particular, when the database is executing many times per user call, this can indicate excessive context switching (e.g. a PL/SQL function in a SQL statement called too often because of a bad plan). In such cases, looking into "SQL ordered by Executions" is the logical next step.

Logons
Description: The number of logons; it means what it says.
Analysis: Establishing a new database connection is expensive (and even more expensive in the presence of auditing or logon triggers). "Logon storms" are known to create very serious performance problems. If you suspect that a high number of logons is degrading your performance, check "connection management call elapsed time" in the Time Model Statistics section.

Sorts
Analysis: Sort operations consume resources, and expensive sorts may cause your SQL to fail by running out of TEMP space. So obviously, the less you sort, the better (and when you do sort, you should sort in memory). However, sort statistics are rarely decisive on their own: normally, if expensive sorts are hurting your SQL's performance, you'll notice it elsewhere first.

DB Time
Description: The average number of active sessions is simply DB Time per second.

Block Changes
Description: The number of blocks modified during the sample interval.
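As referenced in the hard/soft parse rows above, you can watch the cumulative parse counters live; a minimal sketch (the hard parse ratio is 'parse count (hard)' divided by 'parse count (total)'):

-- cumulative since instance startup; compare deltas over time
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('parse count (total)', 'parse count (hard)');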

Instance Efficiency Percentage


Rule of thumb: always minimize the number of hard parses. This reduction yields the benefit of minimizing the CPU overhead spent performing costly parse work.
Every ratio here should be as close to 100% as possible.

In-memory Sort %
Description: Shows the percentage of sort operations done in memory rather than on disk (temporary tablespace).
Analysis: In-memory sort % being low (in the high 90s or lower) indicates PGA_AGGREGATE_TARGET or SORT_AREA_SIZE issues.

Soft Parse %
Description: Shows the percentage of parses that found the SQL already in the shared pool, i.e. how often sessions issued a SQL statement that was already in the shared pool and could reuse an existing version of that statement.
Analysis: Soft parse % being low indicates bind variable and versioning issues. With 99.25% soft parse, only about 0.75% (100 - soft parse %) of parses are hard parses. A low hard parse rate is good for us.

% Non-Parse CPU
Description: Oracle should utilize the CPU mostly for statement execution, not for parsing.
Analysis: A value near 100% means most CPU resources are used for operations other than parsing, which is good for database health. Most of our statements were already parsed, so we weren't doing a lot of re-parsing; re-parsing is CPU-intensive and should be avoided.

Execute to Parse %
Description: Shows how often parsed SQL statements are reused without re-parsing. The way this ratio is computed, it will be near 100 percent when the application executes a given SQL statement many times over but has parsed it only once. If the number of parse calls is near the number of execute calls, the ratio trends toward zero; if executions increase while parse calls remain the same, the ratio trends up.
Analysis: When this number is low, parsing is consuming CPU and shared pool latches.

Parse CPU to Parse Elapsed %
Description: The ratio of CPU time to elapsed time spent parsing SQL statements.
Analysis: A low value means there could be a parsing problem: look at bind variable usage or shared pool sizing. A low value also means there is some bottleneck related to parsing; start by reviewing library cache contention and contention on shared pool latches. You may need to increase the shared pool.

Buffer Hit %
Description: Measures how many times a required block was found in memory rather than having to execute an expensive read operation on disk to get the block. (A query for computing this ratio live follows this table.)

Buffer Nowait %
Description: The percentage of times data buffers were accessed directly without any wait.
Analysis: This ratio relates to requests a server process makes for a specific buffer: the percentage of those requests for which the buffer is immediately available. All buffer types are included in this statistic. If the ratio is low, check the Buffer Wait Statistics section of the report for detail on which type of block is being contended for. Most likely, additional RAM will be required.

Library Hit %
Description: Shows the percentage of times SQL and PL/SQL were found in the shared pool.
Analysis: Library hit % is great when it is near 100%. If it were under 95%, we would investigate the size of the shared pool. If this ratio is low, possible causes and remedies include:
• Increase the SHARED_POOL_SIZE init parameter.
• CURSOR_SHARING may need to be set to FORCE.
• SHARED_POOL_RESERVED_SIZE may be too small.
• Inefficient sharing of SQL, PL/SQL or Java code.
• Insufficient use of bind variables.

Latch Hit %
Description: Shows the percentage of times latches were acquired without having to wait.
Analysis: If latch hit % is under 99%, you may have a latch problem. Tune latches to reduce cache contention.

Redo NoWait %
Description: Shows whether the redo log buffer is adequately sized.
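As a cross-check outside the report, the Buffer Hit % referenced above can be computed from the cumulative counters (a sketch; these are since-startup values, not interval deltas):

-- buffer hit % = 1 - physical reads / (db block gets + consistent gets)
SELECT ROUND(100 * (1 - phy.value / NULLIF(dbg.value + con.value, 0)), 2) buffer_hit_pct
FROM   v$sysstat phy, v$sysstat dbg, v$sysstat con
WHERE  phy.name = 'physical reads'
AND    dbg.name = 'db block gets'
AND    con.name = 'consistent gets';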

Top 10 Foreground Wait Events by Total Wait Time


This section is critical because it shows those database events that might constitute the bottleneck for the system.
Here, first check the wait class: if the wait class is User I/O, System I/O, Other, etc., things could be fine, but if the wait class is Concurrency, there could be a serious problem. Next, look at Total Wait Time (sec), which shows how long the DB waited in this class, and then at Wait Avg (ms). If Total Wait Time (sec) is high but Wait Avg (ms) is low, you can usually ignore it; if both are high, or Wait Avg (ms) is high, it has to be investigated further.

Note that there could be significant waits that are not listed here, so check the Foreground Wait Events (Wait Event Statistics) section for any other time consuming wait events.

For the largest waits look at the Wait Event Histogram to identify the distribution of waits.

DB CPU
Description: Time running on CPU (time waiting in the run queue is not included).
Analysis: Here 84.8% is the %DB Time for this event, which is really high! DB Time was 1965 minutes and DB CPU is 100,000 seconds / 60 = 1667 minutes; 1667 / 1965 = 84.8%, as shown in the table.

We can also find:

1) DB CPU load
= total DB CPU time / AWR time
= 100,000 / (900 * 60)
= 1.85

2) DB CPU utilization %
= DB CPU load / number of cores
= (1.85 / 8) x 100
= 23% of host cores

IMPORTANT: Your server may have other database instances sharing the CPU resources, so take those into account too.

Also, this does not mean that the server CPU is 84% utilized!

 

Sum of %DB Time
Analysis: The sum should be approximately 100%. If it is far below 100%, it may mean that the listed wait events are irrelevant, or that the server is overloaded.
enq: TX - row lock contention
Description: Sessions waited for locked rows.
Analysis: This event is currently only 0.2% of total DB Time, so we don't have to worry much about it. If it were a higher value, say 10%, we would have to find the root cause: go to "Segments by Row Lock Waits" to see which tables are getting locked, and then find in which SQL_IDs those tables are used (an ASH query for this is shown at the end of this section).

 

DB FILE SEQUENTIAL READ
Description: Single-block I/O. A sequential read is typically an index read followed by a table read, because the index lookup tells exactly which block to go to.
Analysis: The average I/O call here is 2 ms, which is not very high. If the wait average were very high, say 100 ms or 200 ms, it would mean your disks are slow. Ask: are your SQLs returning too many rows, is the I/O response bad on the server, is the DB not sized to cache enough result sets? Then look at the "File IO Stats" section of the AWR report.

The event indicates that index scans are happening while reading data from tables. A high number of these events may be caused by unselective indexes, i.e. the optimizer is not selecting the best indexes from the set of available ones; this results in extra I/O activity and contributes to slow SQL execution. Note that a high number is also normal for a properly tuned application with high transaction activity.

• If index range scans are involved, more blocks than necessary may be visited if the index is unselective. By forcing or enabling the use of a more selective index, we can access the same table data while visiting fewer index blocks (and doing fewer physical I/Os).
• If indexes are fragmented, we have to visit more blocks because there is less index data per block. In this case, rebuilding the index will compact its contents into fewer blocks.
• If the index being used has a large clustering factor, more table data blocks have to be visited to get the rows in each index block. By rebuilding the table with its rows sorted by the particular index columns, we can reduce the clustering factor and hence the number of table data blocks visited for each index block.

 

LOG FILE SYNC
Analysis: Here the wait avg (ms) is 6, which is not a scary number; above 20 ms we don't consider it good. Also go to the "Instance Activity Stats" section to see how many commits actually happened, then check what percentage of commits had to wait. Remember that short transactions with frequent commits are a property of OLTP applications.

DB FILE SCATTERED READ
Description: Caused by full table scans, possibly because of insufficient indexes or the unavailability of up-to-date statistics.
Analysis: To reduce this event, identify the tables on which full table scans are happening and create proper indexes so that Oracle does index scans instead; index scans help reduce the number of I/O operations. To get an idea of the tables on which full table scans are happening, refer to the "Segment Statistics" -> "Segments by Physical Reads" section of the AWR report, which lists both tables and indexes with physical reads. Note that physical reads don't necessarily mean full table scans, only the possibility of them.

Concurrency (wait class)
Analysis: The Concurrency wait class is a bad sign; if it is high, it needs to be analyzed.

direct path read temp / direct path write temp
Description: These wait events show temp file activity (sorts, hashes, global temp tables, bitmaps).
Analysis: Check the PGA, sort area, or hash area parameters; you might want to increase them.
 
Wait Class
Description: The Wait Class column helps classify whether the issue is related to the application or to the infrastructure. Wait events are broadly classified into wait classes such as:

Administrative
Application
Concurrency
User I/O
System I/O
Cluster
Commit
Configuration
Idle
Network

Buffer Busy Waits
Description: Indicates that a particular block is being used by more than one process at the same time. While the first process is reading the block, the other processes wait because the block is in unshared mode.
Analysis: A typical scenario for this event is a batch process that continuously polls the database by executing a particular SQL repeatedly, with more than one parallel instance of the process running: all instances try to access the same memory blocks because the SQL they execute is the same. This is one situation in which we experience this event.

enq: TX - row lock contention
Description: Oracle maintains data consistency with the help of its locking mechanism. When a particular row is being modified by a process, through an UPDATE, DELETE, or INSERT operation, Oracle tries to acquire a lock on that row. Only when the process has acquired the lock can it modify the row; otherwise it waits, and that wait triggers this event. The lock is released when the process holding it issues a COMMIT; waiting processes can then acquire the lock and perform their DML operations.

enq: UL - contention
Description: This enqueue wait occurs when an application takes explicit user locks, for example via DBMS_LOCK.

enq: TM - contention
Description: This usually happens due to an unindexed foreign key column on a table that is part of a DML operation.
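To tie row lock waits back to SQL, a quick ASH query works (recent activity only, since v$active_session_history is memory-resident; use dba_hist_active_sess_history for older periods):

-- top SQL_IDs waiting on row lock contention
SELECT sql_id, count(*) samples
FROM   v$active_session_history
WHERE  event = 'enq: TX - row lock contention'
GROUP  BY sql_id
ORDER  BY 2 DESC;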
Host CPU

A high level of DB CPU usage in the Top 10 Foreground Events (or Instance CPU: %Busy CPU) does not necessarily mean that CPU is a bottleneck. In this example, too, DB CPU is the highest consuming category in the "Top 10 Foreground Events".
Look at the Host CPU and Instance CPU sections. The key values are "%Idle" in the Host CPU section and "%Total CPU" in the Instance CPU section.
If "%Idle" is low and "%Total CPU" is high, the instance could have a CPU bottleneck (be CPU constrained). Otherwise, high DB CPU usage just means that the database spends a lot of its time on CPU (processing) compared to I/O and other events. In either case (CPU bottleneck or not) there could be individual expensive SQLs with high CPU time, which could indicate suboptimal execution plans, especially if accompanied by high (buffer) gets.
In our case %Idle is high (74%) and %Total CPU is just 7.45, so CPU is not a bottleneck in this example.
CPUs
Description: CPUs are actually threads. Here we have 8 cores and 8 threads per core, so CPUs = number of cores x number of threads per core = 8 x 8 = 64.

Load Average
Analysis: Compare the load average with the number of cores. Ideally the load average should be less than the number of cores, though this may not always hold (and it may not be an issue either!).

%Idle
Analysis: Can be misleading: sometimes %Idle is 50% yet the server is starving for CPU. 50% can mean that all your cores are busy; you may have free threads (CPUs), but you cannot run two processes concurrently on the same core. All percentages in this report are calculated based on CPUs (which are actually threads).

Cores
Analysis: Here we have an 8-core system. With 8 cores, a 60-minute hour gives 60 x 8 = 480 CPU minutes, and the total AWR duration is 8 hours, so 480 x 8 = 3840 CPU minutes in total.
Instance CPU

The analysis here mirrors the Host CPU section above: compare "%Total CPU" in this section against "%Idle" in the Host CPU section to decide whether the instance is CPU constrained.

Cache Sizes

From Oracle 10g onwards, the database server can do automatic memory management for the PGA and SGA components. Based on load, the server keeps allocating and deallocating memory among the different components of the SGA and PGA. For this reason, we can observe different sizes for the buffer cache and shared pool at the beginning and end of the AWR snapshot period.
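To see how the automatically managed SGA components are currently sized, a sketch (sizes are reported in bytes):

-- current size of each dynamic SGA component, largest first
SELECT component, current_size/1024/1024 size_mb
FROM   v$sga_dynamic_components
WHERE  current_size > 0
ORDER  BY current_size DESC;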

Shared Pool Statistics


Memory Usage %
Description: Shared pool usage.
Analysis: If usage is low (<85 percent), your shared pool is oversized; if Memory Usage % is very high, like 90%, your shared pool may be too small.

% SQL with executions >1
Description: Shows the percentage of SQL statements executed more than once. It should be very close to 100.
Analysis: If reuse is low (<60-70 percent), you may have bind variable or versioning issues. Ideally all the percentages in this area of the report should be as high (close to 100) as possible.

Memory for SQL w/exec>1
Description: Of the memory allocated to cursors, the percentage used by cursors executed more than once.



Time Model Statistics


The important statistic here is DB Time, which represents the total time spent in database calls. It is calculated by aggregating the CPU time and wait time of all sessions not waiting on an idle event (non-idle user sessions). Since this is cumulative across all non-idle sessions, it can exceed the actual wall clock time.


SQL EXECUTE ELAPSED TIME
Description: Time spent executing SQL statements.
Analysis: Out of the total DB Time of 117,898.47 seconds, 92.32% (108,843.51 seconds) was spent executing SQL queries, so our attention turns to finding which SQLs took so much DB Time.

DB CPU
Description: DB CPU represents time spent on CPU by foreground user processes. It does not include time spent waiting for CPU.
Analysis: DB Time and DB CPU define two important timescales: wait times should be measured against DB Time, while CPU consumption during a certain activity (e.g. CPU time spent parsing) should be measured against DB CPU.

The percentage shown for DB CPU may not be the only number to focus on. Also compute:

DB CPU usage (% of CPU power available) = CPU time / NUM_CPUS / elapsed time

where NUM_CPUS is found in the Operating System Statistics section. Of course, if there are other major CPU consumers on the system, the formula must be adjusted accordingly; to check, look at OS CPU usage statistics directly in the OS.

Parse Time Elapsed
Description: Represents the time spent on syntax and semantic checks.

Hard Parse Elapsed Time
Description: Represents the time spent on syntax and semantic checks PLUS the time spent optimizing the SQL and generating the execution plan.

% DB Time
Description: In the time model statistics hierarchy, a child statistic may be counted under more than one parent, which is why the percentages can sum to more than 100.

Soft parse time can be obtained by subtracting hard parse time from total parse time.
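The same statistics are visible live in V$SYS_TIME_MODEL (values are cumulative microseconds since instance startup, so compare deltas rather than absolute values):

-- time model statistics, largest consumers first
SELECT stat_name, round(value/1e6, 2) seconds
FROM   v$sys_time_model
ORDER  BY value DESC;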


Foreground Wait Class


This section is of less use: a wait can have multiple possible causes (in different classes) depending on the context, and there are normally only a handful of time-consuming waits, which can be analyzed and investigated individually.
There are over 800 distinct wait events, which Oracle groups into 12 wait classes; these are sometimes further divided into two broad categories, administrative and application wait classes.
The wait classes give overall information about whether the waits are happening on application events or system events.
User I/O
Analysis: High user I/O waits mean either that the proper indexes from the pool of available indexes are not being used, or that full table scans are happening on big tables with millions of rows.


Foreground Wait Events

The idle events are mostly listed at the end and should not get much focus.
This section is useful because there can be time-consuming wait events that do not appear in the "Top 10 Timed Foreground Events".
For the larger waits, look at the Wait Event Histogram to identify the distribution of waits: are they clustered closely around an average value, or is there a wide variance? Are there a large number of smaller waits, or a few larger waits?
SQL*Net message from client
Description: An idle wait event.
Analysis: We can estimate the average number of inactive sessions from this wait event:

Number of inactive sessions
= total wait time / (AWR time * 60)
= 12,679,570 / (900 * 60)
= 235 average inactive sessions

This doesn't necessarily mean end-user sessions as such, but rather connections from the application server connection pool.

Direct path read/write temp
Analysis: Shows excessive sorting/hashing/global temp table/bitmap activity going to your temporary tablespace. Review the PGA_AGGREGATE_TARGET setting; even if it looks big enough, many small sorts spilling to disk can mean your user load is over-utilizing it.

SQL*Net message to client
Analysis: Almost always indicates network contention.

SQL*Net more data from client
Analysis: If this is very low, it indicates that the Oracle Net session data unit (SDU) size is likely set correctly.

db file sequential read
Analysis: Usually indicates memory starvation; look at the DB cache analysis and at buffer busy waits along with cache latch issues.

db file scattered read
Analysis: Usually indicates excessive full table scans; look at the AWR segment statistics for tables that are fully scanned.

log file sync
Analysis: Log-file-related waits; look for excessive log switches, excessive commits, or a slow I/O subsystem.

Wait Event Histogram

This can be used to determine the distribution of wait times. These days, less than 5 ms is expected and more than 10 ms is considered poor. Analyzing the histogram can show whether a high average time is due to a few individual long waits.

DB FILE SEQUENTIAL READ
Analysis: This event will usually have the highest number of waits in the histogram. If roughly 50% of the waits are under 1 ms and another 30% are under 2 ms, the disks are working well: the wait is low for most sessions hitting the database. What we don't want is a high percentage (and a high number of waits) above 8 ms.

If DB FILE SEQUENTIAL READ turns out to be the key wait event, the next things to find are:

a) which segment is the bottleneck (go to the "Segments by Physical Reads" section);
b) which SQL query uses that segment.
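For step (a), segment-level counters can also be read live; a sketch using v$segment_statistics (for the AWR period itself, use the report's segment sections instead):

-- top 10 segments by cumulative physical reads since startup
SELECT *
FROM (SELECT owner, object_name, value
      FROM   v$segment_statistics
      WHERE  statistic_name = 'physical reads'
      ORDER  BY value DESC)
WHERE rownum <= 10;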


SQL ordered by Elapsed Time


This can be used to identify the long-running SQLs that could be responsible for a performance issue. It gives useful information about the CPU time, the number of executions, and the (SQL) Module; the top SQLs can be matched to long-running or slow processes in the application.

In this report, look for queries with low executions and high Elapsed time per Exec (s); such queries are candidates for troubleshooting or optimization. In the example report, the first query has the maximum elapsed time but only 2 executions, so it warrants investigation.

NOTE 1: The elapsed time can indicate whether a SQL is multithreaded (either parallel DML/SQL or multiple workers). In this case the elapsed time can be a multiple of the AWR duration (or of the observed clock time of the process/SQL), because the elapsed time for multithreaded SQL is the total across all workers or parallel slaves.
NOTE 2: The "SQL ordered" sections often contain the PL/SQL call that contains the SQLs. In this example, the WF_ENGINE procedure ultimately calls the SQL b6mcn03jvfg41; likewise the first line, a package call beginning BEGIN XXINV7566..., internally runs the SQL in line 2, the insert into XXINV7566_IQR....
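The same ranking can be pulled straight from the repository for a snapshot range; a sketch (:bid and :eid are the begin and end snapshot IDs):

-- top 10 SQL by total elapsed time between two snapshots
SELECT *
FROM (SELECT sql_id,
             round(sum(elapsed_time_delta)/1e6, 2) elapsed_s,
             sum(executions_delta) execs
      FROM   dba_hist_sqlstat
      WHERE  snap_id BETWEEN :bid AND :eid
      GROUP  BY sql_id
      ORDER  BY 2 DESC)
WHERE rownum <= 10;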

Elapsed per Exec (s)
Description: Elapsed time in seconds per execution of the SQL.

Captured SQL accounts for 79.1% of total DB Time
Description: Shows what percentage of DB Time the AWR report was able to capture in its SQL sections.
Analysis: Remember that the AWR report shows the SQLs that were in the shared pool at the end of the AWR period. This number should be high, meaning we captured the SQLs that consumed the DB Time. If it is low, try generating the AWR report over a shorter snapshot range so that the SQLs consuming DB Time are captured.

Executions
Description: Total number of executions of the SQL between the two snapshots.
Analysis: An important point: an execution count of 0 does not always mean the query was not executing; the query may have still been running when the AWR report was taken, so its completion was not covered by the report.

SQL Module
Description: Identifies the module executing the SQL; the process name at the OS level is displayed as the SQL module name.
Analysis: If the module name starts with any of the names below, don't consider those SQLs for tuning, as they are Oracle-internal SQLs: DBMS, sqlplusw, TOAD, rman, SQL, Enterprise Manager, ORACLE, MMON_SLAVE, emagent, etc.

In the list, XXIN1768 has two SQL_IDs. SQL #1 is a PL/SQL wrapper and took around 53k seconds; SQL #2 took 51k seconds and appears to be called from SQL #1, as their module names are the same. Since the SQL #2 insert statement took almost all of the time, we will focus on that query for tuning.

Elapsed Time
Description: The sum of all individual execution times for the sql_id. So if multiple sessions execute the same SQL, the elapsed time can be greater than the period between the two snap_ids.

SQL ordered by CPU Time


The most useful sections are SQL ordered by Elapsed Time, CPU Time, Gets, and Reads; all of them help identify whether a particular SQL from a particular module was running during the AWR report period.
In most cases this section does not reveal much more than the "SQL ordered by Elapsed Time" section; however, it sorts by CPU and can surface SQLs that are not in the previous section.

The top record in this table
Analysis: The first and second entries are part of the same transaction; the second SQL is the inner part of the first PL/SQL call. It accounts for a huge percentage of DB CPU, and remember that DB CPU was the top event in this AWR report.


SQL ordered by Gets


The most useful sections are SQL ordered by Elapsed Time, CPU Time, Gets, and Reads; all of them help identify whether a particular SQL from a particular module was running during the AWR report period.
This section covers logical reads from the buffer cache.
When "CPU Other" is a significant component of total response time, it is likely that the time is being spent retrieving and manipulating blocks and buffers (block accesses are also known as buffer gets and logical I/Os). The next step is then to come to this section to find the SQL statements that access the most blocks, since they are likely responsible for the majority of that time.
This section tells you HOW MANY BLOCKS WERE READ, while the "SQL ordered by Executions" section tells you how many rows were fetched; often the two sections share the same SQL_IDs.
If many blocks are read here while few rows are fetched there, something is wrong with the query: why does it read so many blocks to return so few rows? Possibly a bad execution plan.
A high number of buffer gets is one of the main indicators of SQLs with suboptimal execution plans. For example, the insert statement at row 4 (related to the PL/SQL at row 3) does a very high number of buffer gets per execution.
BUT bear in mind that a SQL can have a good execution plan and simply be doing a lot of work, so take into account the data volume (and the parameters being passed to the query).
You can easily see execution plans by running @awrsqrpt.sql and passing the offending SQL_ID as a parameter.
Refer to the "SQL ordered by Executions" section after reading this one.

Gets per Exec
Description: How many blocks were read per execution.
Analysis: The insert statement in the XXINV1738 module has a Gets per Exec that is too high; analyze that SQL with additional tooling such as SQLT and SQLHC.
SQL ordered by Reads


This section reports the contents of the SQL area ordered by the number of reads from the data files, and can be used to identify SQL causing I/O bottlenecks which consume the following resources:
• CPU time needed to fetch unnecessary data.
• File I/O resources to fetch unnecessary data.
• Buffer resources to hold unnecessary data.
• Additional CPU time to process the query once the data is retrieved into the buffer.
% Total can be used to evaluate the impact of each statement.
"Wait Time" is the sum of time waited on non-idle wait events, including the I/O waits for reading blocks from disk measured by 'db file sequential read' (single-block reads) and 'db file scattered read' (multi-block reads). When such wait events are significant components of response time, the next step is to find the SQL statements that read the most blocks from disk; this is the section to use.
This section covers physical reads from disk.
If the physical I/O waits (e.g. db file sequential read, db file scattered read, direct path read) are relatively high, this section can indicate which SQLs are responsible.
It can help identify the SQLs responsible for high physical I/O, which may indicate suboptimal execution plans, particularly when the execution plan contains full table scans or large index range scans (where more selective index scans would be preferable).
Captured SQL accounts for 76% of total
Analysis: Our goal is to have this percentage as high as possible; breaking this AWR report into smaller intervals will probably increase it.

The top record in this table
Analysis: We saw in the "Segments by Physical Reads" section that MTL_ITEM_CATEGORIES accounts for 51.32%, and the top SQL_ID here, with 40.18% of the total, uses that table. Although it executed only 73 times, its reads per execution are high, making it the top query consuming physical I/O. In contrast, the query at number 4 in this table executed around 40k times, but since its reads per execution are low, it is not the query to worry about.

Reads per Exec
Analysis: Possible reasons for high reads per exec include unselective indexes that require large numbers of blocks to be fetched (where those blocks are not cached well in the buffer cache), index fragmentation, a large clustering factor in the index, etc.


SQL ordered by Physical Reads (UnOptimized)

This section matters mainly when an Exadata machine is in use.
Read requests satisfied from the Smart Flash Cache in Exadata are termed 'optimized', since they return much faster than requests from disk (the implementation uses solid state device (SSD) technology). Additionally, read requests that access Storage Indexes via smart scans in Oracle Exadata V2 (significantly reducing I/O operations) also fall under 'optimized read requests', since they avoid reading blocks that contain no relevant data.
IMPORTANT: On database systems where 'optimized read requests' are not present (i.e. not using Exadata), UnOptimized Read Reqs will equal Physical Read Reqs (I/O requests satisfied from disk). In that case the 'UnOptimized Read Reqs' and 'Physical Read Reqs' columns display the same values, and the '%Opt' column displays zero.
 
Physical Read Reqs
Analysis: Note that the 'Physical Read Reqs' column in the 'SQL ordered by Physical Reads (UnOptimized)' section is the number of I/O requests, not the number of blocks returned. Be careful not to confuse it with the physical reads statistics in the 'SQL ordered by Reads' section, which count database blocks read from disk rather than actual I/Os (a single I/O operation may return many blocks).
