RAC性能优化:遭遇read by other session等待事件

日期: 2011-03-15 作者:人在旅途 来源:TechTarget中国

  2011年1月21日,接到运维中心王工电话,说ODS系统比较慢,要求诊断一下,给ODS系统做了AWR。首先交代一下ODS环境:   (1)环境:RedHat 5.1+Oracle10.2.0.4 RAC(两节点)   (2)现象:系统异常慢   由于当时质疑人资管控系统数据库有问题,而人资管控系统是在节点2实例上,所以查看节点2实例的top5等待事件:   节点2: Top 5 Timed Events             &n……

我们一直都在努力坚持原创.......请不要一声不吭,就悄悄拿走。

我原创,你原创,我们的内容世界才会更加精彩!

【所有原创内容版权均属TechTarget,欢迎大家转发分享。但未经授权,严禁任何媒体(平面媒体、网络媒体、自媒体等)以及微信公众号复制、转载、摘编或以其他方式进行使用。】

微信公众号

TechTarget微信公众号二维码

TechTarget

官方微博

TechTarget中国官方微博二维码

TechTarget中国

电子邮件地址不会被公开。 必填项已用*标注

敬请读者发表评论,本站保留删除与本文无关和不雅评论的权力。

  2011年1月21日,接到运维中心王工电话,说ODS系统比较慢,要求诊断一下,给ODS系统做了AWR。首先交代一下ODS环境:

  (1)环境:RedHat 5.1+Oracle10.2.0.4 RAC(两节点)

  (2)现象:系统异常慢

  由于当时质疑人资管控系统数据库有问题,而人资管控系统是在节点2实例上,所以查看节点2实例的top5等待事件:

  节点2:

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
------------------------------ ------------ ----------- ------ ------ ----------
gc buffer busy                    2,346,848      24,567     10   33.2    Cluster
gc buffer busy                    2,346,848      24,567     10   33.2    Cluster
read by other session             1,941,773      18,734     10   25.3   User I/O
read by other session             1,941,773      18,734     10   25.3   User I/O
db file scattered read              673,646      10,209     15   13.8   User I/O
db file scattered read              673,646      10,209     15   13.8   User I/O
CPU time                                          6,627           9.0
CPU time                                          6,627           9.0
log file sync                       129,112       4,068     32    5.5     Commit
log file sync                       129,112       4,068     32    5.5     Commit

  从top 5 event来看,主要是read类型的event比较多

  db file sequential read 顺序读导致

  gc buffer busy 产生的根本原因是Oracle 的cache fusion机制,简单来说cache fusion的主要目的

  1、减少各个节点对数据库的读写,提交效率,内存之间的数据同步要比磁盘到内存快很多

  2、保持数据的完整性,当某个节点执行查询时首先在本节点buffer中是否存在,如果不存在就会到另外节点buffer去查找,

  如果存在大量的cache交换就会产生gc buffer busy

  3、read by other session

  When information is requested from the database, Oracle will first read the data from disk into the database buffer cache.

  If two or more sessions request the same information, the first session will read the data into the buffer cache while other sessions wait.

  In previous versions, this wait was classified under the "buffer busy waits" event.

  However, in Oracle 10.1 and higher, this wait time is now broken out into the "read by other session" wait event.

  Excessive waits for this event are typically due to several processes repeatedly reading the same blocks,

  e.g. many sessions scanning the same index or performing full table scans on the same table.

  Tuning this issue is a matter of finding and eliminating this contention.

  该event是10.1以后新引入的,之前归类于buffer busy waits,Oracle对event的划分越来越细,

  产生原因:对同一个索引进行的大量扫描或对同一个表进行全表扫描

  查询等待的文件

  SELECT p1 "file#", p2 "block#", p3 "class#"
  FROM v$session_wait
  WHERE event = 'read by other session';

  查询热点块对应的对象

  SQL> SELECT relative_fno, owner, segment_name, segment_type
  FROM dba_extents
  WHERE file_id = &file
  AND &block BETWEEN block_id AND block_id + blocks - 1

  查看Top Sql 发现如下比较可疑,单次执行要花费20多分钟,

  查看表的索引无异常,查看执行计划是index range scan,貌似没有问题

  该表的数据量比较大且变化比较频繁,怀疑是由于索引引起的热点块,于是将其改为

  全表扫描,这样可以避免索引的热点块产生

  经过一段时间观察系统正常

  SQL ordered by Elapsed Time DB/Inst: BILLBJ/billbj1 Snaps: 15955-15956
  -> Resources reported for PL/SQL code includes the resources used by all SQL
  statements called by the code.
  -> % Total DB Time is the Elapsed Time of the SQL statement divided
  into the Total Database Time multiplied by 100
  Elapsed CPU Elap per % Total
  Time (s) Time (s) Executions Exec (s) DB Time SQL Id
  ---------- ---------- ------------ ---------- ------- -------------
  10,435 709 8 1304.3 66.8 5uc64k11j23wx
  Module: JDBC Thin Client
  select a.day day,sum(a.data) data,sum(a.indata) indata from bandwidth_d
  ailytable a where a.day between :v_start_date and :v_end_date and a.channelId
  in (select * from the (select cast(:v_channelId as channelIdArray) from dual))
  and a.nodeId in (select * from the (select cast(:v_nodeId as nodeIdArray) from d

  有时候产生同一个问题的几个原因可能是相悖的,比如read by other session,所以我们不能局限于某一个方面,此案例中db file sequential read wait也说明顺序读等待过多,最后的解决方法也说明查询未必适用索引就好。

  其实read by other session等待事件是要解决热点块问题,而解决热点块问题首先要减低IO次数,我们来看看IO情况:

~~~~~~~~~~~~                            Per Second       Per Transaction
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                                   ---------------       ---------------
                  Redo size:             28,546.06                881.39
                  Redo size:             28,546.06                881.39
              Logical reads:             37,292.67              1,151.45
              Logical reads:             37,292.67              1,151.45
              Block changes:                151.32                  4.67
              Block changes:                151.32                  4.67
             Physical reads:              2,872.88                 88.70
             Physical reads:              2,872.88                 88.70
            Physical writes:                  2.64                  0.08
            Physical writes:                  2.64                  0.08
                 User calls:                114.65                  3.54
                 User calls:                114.65                  3.54
                     Parses:                 60.26                  1.86
                     Parses:                 60.26                  1.86
                Hard parses:                 28.64                  0.88
                Hard parses:                 28.64                  0.88
                      Sorts:                 15.86                  0.49
                      Sorts:                 15.86                  0.49
                     Logons:                  0.42                  0.01
                     Logons:                  0.42                  0.01
                   Executes:                106.84                  3.30
                   Executes:                106.84                  3.30
               Transactions:                 32.39
               Transactions:                 32.39

  很明显逻辑读比较多,另外等待事件cr request retry也是造成性能差的一个原因。我们可以通过分割应用,对大表分区等方法进行系统优化。

相关推荐