Saturday, June 23, 2012

All About Oracle Database Latches


All About Latches

1. What are Oracle latches?
Latches are Oracle-internal low-level locks that protect the memory structures of the system global area (SGA) against simultaneous accesses. Depending on the memory range, there is either a single parent latch or several child latches that each protect subareas of the memory.

2. How does latch allocation work?
There are two different methods for requesting a latch: "Willing to Wait" and "Immediate", where most latch requests are based on "Willing to Wait":

Willing to Wait mode

  •  If a latch is currently in use, the system checks repeatedly in a loop whether the latch has become available ("spinning"). CPU is used in this loop. At the same time, there are NO active Wait events (not even "latch free" events). The number of loops is defined by the internal _SPIN_COUNT Oracle parameter. If only one CPU is available on the system, this step is skipped. Spinning is useful because, in general, latches are only held for a very short time and the request can be fulfilled quickly. In this case, spinning can prevent a context switch, which also consumes a large amount of resources.
  •  If the latch could not be allocated during the entire loop, the process assumes a "sleep" status for a certain period of time. This period is 0.01 seconds for the first sleep. During this sleep, the "latch free" wait event is activated. As of Oracle 10g "important" latches have to be specified in the form "latch: <latch_name>" whereas the less relevant latches will still be logged as "latch free".
  •  The two steps above are repeated until the latch is allocated successfully. The sleep time doubles with every second program run. The maximum sleep time is 2 seconds if another latch is not held, or 0.04 seconds if the process already holds another latch. 


Immediate mode

 
If the requested latch is already in use, the system immediately tries all the other child latches. If none of the child latches can be allocated, the system switches to Willing to Wait mode. 

3. How can I access latch information on the Oracle system?
The following V$ views contain latch information:

V$LATCH: Provides an overview of the latch waits since the system was started.


  •     NAME: The name of the latch.
  •     GETS: The number of "Willing to Wait" requests.
  •     MISSES: The number of "Willing to Wait" requests that could not allocate the latch in the first attempt.
  •     SPIN_GETS: The number of "Willing to Wait" requests that could allocate the latch in the first spinning without sleep periods.
  •     SLEEPS: The number of "Willing to Wait" requests where the process has entered a sleep period at least once.
  •     SLEEP1: The number of "Willing to Wait" requests with exactly one sleep.
  •     SLEEP2: The number of "Willing to Wait" requests with exactly two sleeps.
  •     SLEEP3: The number of "Willing to Wait" requests with exactly three sleeps.
  •     SLEEP4: The number of "Willing to Wait" requests with four or more sleeps.
  •     IMMEDIATE_GETS: The number of "Immediate" requests.
  •     IMMEDIATE_MISSES: The number of "Immediate" requests where the latch could not be allocated on the first attempt.
  •     WAIT_TIME: The combined sleep times of the latch (as of Oracle 9i). 

V$LATCHHOLDER: Provides an overview over the currently held latches.

  •    PID     : Oracle PID <opid> of the process that holds the latch.
  •    SID: Oracle SID of the session that holds the latch.
  •    NAME: The name of the held latch. 

V$LATCHNAME: Provides an overview of the names of all latches.

  •    LATCH#: The number of the latch.
  •    NAME: The name of the latch. 

V$LATCH_MISSES: The number of sleeps and immediate misses, including the Oracle kernel area.


  •    NWFAIL_COUNT: The number of immediate misses.
  •     SLEEP_COUNT: The number of sleeps.
  •     LOCATION: The Oracle kernel area that holds the requested latch. 


V$LATCH_PARENT: Provides an overview of parent latch waits since the system was started.

    This contains the same fields as V$LATCH.

V$LATCH_CHILDREN: Provides an overview of child latch waits since the system was started.

  •    CHILD#: The number of the child latch.
  •     All other fields are the same as in V$LATCH. 


4. How can I access latch information on the R/3 system?
Note the fundamentals for the Oracle performance analysis contained in Note 618868.

5. How I can recognize latch problems?
A latch problem usually exists when there are long "latch free" wait times. Therefore if, when you analyze the wait events in accordance with Note 619188,

you notice that either the entire system or individual transactions are impaired by "latch free" or "latch: <latch_name>" waits, it is worthwhile analyzing and optimizing the latch waits.

6. How can I determine the latches for which the system is currently waiting?
Use the following command to determine which sessions are currently waiting for which latch:

SELECT
   W.SID,
   L.NAME
FROM
   V$SESSION_WAIT W,
   V$LATCHNAME L
WHERE
   W.EVENT LIKE 'latch%' AND
   W.P2 = L.LATCH# AND
   W.STATE = 'WAITING';

7. How can I determine the current latch holders?
Note 20071 describes how you can use the V$LATCHHOLDER view or the BLOCKING_SESSION column from V$SESSION (Oracle 10g or higher) to determine the current

latch holder.

8. How can I determine the most critical latches?
The most critical latches can be determined based on the time spent waiting fr a latch. This information is contained in the WAIT_TIME column of the V$LATCH view. Use the following statement to retrieve the ten latches with the longest wait times since the system was started:

SELECT * FROM (SELECT NAME, WAIT_TIME FROM V$LATCH
ORDER BY WAIT_TIME DESC) WHERE ROWNUM <= 10;

The WAIT_TIME column is only available in Oracle versions after Oracle 8i. In earlier versions, you must use the SLEEPS column for this purpose.It is only useful to determine and tune the most critical latches if an actual latch problem exists. It is completely normal that certain latches such as "shared pool" or "library cache" have significant wait times and are listed at the top of the list.

9. How can I determine objects that are linked to latch waits?
As of Oracle 10g, the V$ACTIVE_SESSION_HISTORY view provides information about the last active wait situations and the affected objects. See Note 619188 for further information. Frequently, latch waits are the essential concurrency waits. Therefore, candidates for latch problems as of Oracle 10g can be determined using the following 

query:

SELECT * FROM
(SELECT
     ROUND(CONCURRENCY_WAIT_TIME / 1000000)
       "CONCURRENCY WAIT TIME (S)",
     EXECUTIONS
     SQL_ID,
     SQL_TEXT
   FROM
     V$SQLSTATS
   ORDER BY CONCURRENCY_WAIT_TIME DESC )
WHERE ROWNUM <=10;

10. What are the general reasons and possible solutions for long latch wait times?
The following are typical reasons for long latch wait times:

Very high access rates to certain memory resources
Reduce the load of these resources depending on the latch wait times that occur as described below.


CPU bottleneck

A CPU bottleneck on the database server may cause extremely long latch waits if processes that hold critical latches are pushed out of the CPU (context switch). In addition, a CPU bottleneck is intensified by the spinning process. Therefore, you should check at operating system level, or by using transactions ST06 or OS07, that there is no CPU bottleneck on the database server (see the CPU guidelines described in Note 618868).

Large-scale parallel processing

On R/3, if an action is parallel-processed too intensively and a lot of processes therefore access the same memory area at the same time, latch waits are very likely. The spinning causes the processes waiting for a latch to consume a lot of CPU memory, which means that the processes that have the latch are pushed from the CPU more quickly. As a result, the latch is kept longer than necessary and the situation deteriorates. Therefore check whether reducing the parallel processing of processes with similar database accesses would reduce the latch waits and improve performance.

Oracle bugs

           Some Oracle bugs may cause long latch wait times. Some of the bugs of which we are aware are listed below in the tuning tips for actual latch waits. However, it is always useful to check for other possible bugs in each case.

Interrupted network connection

           Read Note 20071 and ensure that the latch problem is not caused by an interrupted network connection.

Events and underscore parameters

           In individual cases, Oracle events or underscore parameters may be responsible for increased latch times. For example, setting the 10501 event may cause massive problems due to "shared pool" latches.

           Therefore, check whether there are settings that are not recommended by SAP or are no longer required with the database release that you use, and

remove these.

Oversized shared pool

  In some cases, an oversized shared pool can cause latch wait situations on the shared pool (for example, "library cache" or "row cache objects" latch) since the Oracle-internal administrative structure can by overloaded by the amount of simultaneously retained information. Check whether the configuration of the shared pool is significantly larger than Note 789011 recommends. If this is the case, you should consider a reduction of the SHARED_POOL_SIZE. As a quick solution you can use

ALTER SYSTEM FLUSH SHARED_POOL;

Paging

Increased paging may also lead to latch problems. If a process has allocated a latch, and must retrieve the relevant memory area from the disk, it may take longer until the latch is released again. Therefore, latch wait situations are more likely.

HP-UX: Process scheduling

When you use HP-UX, critical latch situations may be made worse by the default process scheduling mechanism, if processes in critical sections (in other words, with a held latch) are displaced from the CPU. In this case, you can reconsider using the HPUX_SCHED_NOAGE Oracle parameter in accordance with 

metalink document 217990.1 to reduce these displacements.

Error ORA-04031:

In individual cases, ORA-04031 error situations can lead to increased latch waits. Therefore, check if ORA-04031 errors were logged in critical latch periods, and see Note 869006 if this is the case.

11. How can I tune actual latch waits?
Depending on which latch the system is waiting for, you have the following options:

library cache / library cache pin

           Wait event > = 10g: "latch: library cache" / "latch: library cache pin"

Wait situations on the "library cache" latch and "library cache pin" latch usually occur in the SAP environment, if you execute a large number of identical SQL statements in parallel (with relation to bind variables). Therefore, avoid the parallel execution of a large number of similar SQL statements that only differ in the content of their bind variables. To determine which SQL statements are responsible for the latch problems, first determine the child latches:

SELECT NAME, CHILD#, WAIT_TIME
FROM V$LATCH_CHILDREN
WHERE NAME LIKE 'library cache%'
ORDER BY WAIT_TIME DESC;

           You can then start a query on V$SQL for the child latches that have the longest waiting period:

SELECT * FROM
( SELECT EXECUTIONS, PARSE_CALLS, ADDRESS, SQL_TEXT
   FROM V$SQL
   WHERE CHILD_LATCH = <child#>
   ORDER BY EXECUTIONS DESC )
WHERE ROWNUM <=10;

The system returns the 10 SQL statements that are protected by the child latch specified, and that are executed most frequently. In certain cases it may be recommended to increased the number of latches using the Oracle parameter _KGL_LATCH_COUNT. Depending on this parameter, the number of library cache" latches, "library cache pin" latches and "library cache pin allocation" latches is determined using the following 

formula:

#Latches = MIN(67, next_larger_prime_number(_KGL_LATCH_COUNT))

As a last solution, you may consider using SQL statements that have similar meaning but are different from a parse point of view (for example, by specifying a different comment or pseudo hint). By this, the load is distributed more evenly on different child latches. Wait situations for the "library cache" latch occur if the system must parse a large number of SQL statements. Therefore, you should check the 

following points:


  •     Check, as described in Note 618868, whether the shared pool size is sufficient.
  •     Avoid the large-scale use of REPARSE or SUBSTITUTE VALUES hints according to Note 129385, as these hints mean that significantly more statements must  be parsed.
  •    The creation of statistics, shared pool flushes and DDL statements (for example, TRUNCATE) also require additional parsing of statements. Therefore, 


you should also check to what extent these actions occur on a large scale.

shared pool

           Wait event >= 10g: "latch: shared pool"
           Refer to the information about the "library cache" latch above.

cache buffers chains

           Wait event >= 10g: "latch: cache buffer chains"
           "Cache buffers chains" latch waits are usually caused by hot blocks, that is, blocks that are accessed frequently. Hot blocks can be identified as  follows:

Execute the following statement repeatedly to determine which "cache buffers chains" child latch is responsible for the highest number of sleeps:

      SELECT
      *
      FROM
      (SELECT
      CHILD#,
      ADDR,
      WAIT_TIME
      FROM
      V$LATCH_CHILDREN
      WHERE
      NAME = 'cache buffers chains'
      ORDER BY
      WAIT_TIME DESC)
      WHERE
      ROWNUM < 10;


Now execute the following query for X$BH with the ADDR value of the most expensive child latch to determine which blocks protected by the child latch 

have the highest touchcount:

      SELECT /*+ RULE */
      SUBSTR(E.OWNER, 1, 15) OWNER,
      SUBSTR(E.SEGMENT_NAME, 1, 20) SEGMENT_NAME,
      X.FILE# "FILE",
      X.DBABLK "BLOCK",
      DECODE(X.DBABLK - BLOCK_ID + EXTENT_ID, 0, 'YES', 'NO') HEADER,
      SUM(X.TCH) "TOUCHCOUNT"
      FROM V$LATCH_CHILDREN L, X$BH X, DBA_EXTENTS E
      WHERE X.HLADDR = '<addr>' AND
      E.FILE_ID = X.FILE# AND
      X.HLADDR = L.ADDR AND
      X.DBABLK BETWEEN
      E.BLOCK_ID AND E.BLOCK_ID + E.BLOCKS - 1
      GROUP BY E.OWNER, E.SEGMENT_NAME, X.FILE#, X.DBABLK,
      DECODE(X.DBABLK - BLOCK_ID + EXTENT_ID, 0, 'YES', 'NO'),
      X.TCH
      ORDER BY SUM(X.TCH) ASC;
   
The touchcount is only a rough measurement for using individual blocks - even if a block is used repeatedly within a very short time, the touchcount is only increased once. Nevertheless, combined with the executed SQL statements, it may be a useful indication of the trigger of "cache buffers chains" latches.

By optimizing the accesses to the relevant objects, you can also reduce the number of "cache buffers chains" latches. On Oracle 8.1. 7, the "cache buffers chains" latches may be triggered by the Oracle bugs described in Notes 449136 and 488583. The number of "cache buffers chains" latches depends on the size of the buffer pool and can be approximated using the following formula:

#Latches =
   Next largest power of 2 (Size of the buffer pool/1048576).

The exact calculation formula depends on the use of DB_BLOCK_BUFFERS vs. DB_CACHE_SIZE, since block header information is also stored in the  granules when DB_CACHE_SIZE is used. Also, the size of the granules is important for DB_CACHE_SIZE. The following default values should in any case return 

the larger number of latches:

    * Buffer Pool < 1124073472 Byte -> 1024 Latches
    * Buffer Pool >= 1124073472 Byte -> 2048 Latches
    * Buffer Pool >= 2231369728 Byte -> 4096 Latches
    * Buffer Pool >= 4445962240 Byte -> 8192 Latches
    * Buffer Pool >= 8875147264 Byte -> 16384 Latches
    * Buffer Pool >= 17733517312 Byte -> 32768 Latches
    * Buffer Pool >= 35500000000 Byte -> 65536 Latches

If you have problems with "cache buffers chains" latches, and have defined an Oracle buffer pool just below one of the threshold values described above and have enough memory available to increase the size of the buffer pool, it may be useful to increase the buffer pool size to the next threshold value.

Alternatively, you can use the _DB_BLOCK_HASH_LATCHES parameter to define the number of "cache buffers chains" latches. However, you should only explicitly set this parameter in individual cases that warrant this setting.  If "cache buffers chains" latches occur in Oracle 10.2.0. 2, see Note 1020225.

cache buffers lru chain

         Wait event >= 10g: "latch: cache buffers lru chain"
          They are usually triggered by expensive SQL statements. Perform an SQL optimization as described in Note 766349.

 In addition, check whether the DBWR performs poorly when you save dirty blocks (for example, using a wait event analysis as described in Note 619188). You should also ensure that the Oracle buffer pool size is sufficient (see the information in Note 618868). Alternatively, you can use the _DB_BLOCK_LRU_LATCHES parameter to adjust the number of "cache buffers lru chain" latches. However, you should only explicitly set this parameter in individual cases that warrant this setting.

cache buffer handles

           Wait event >= 10g: "latch: cache buffer handles"

In Oracle 8.1. 7, the bug described in Note 400698 may be responsible for serializations of the "cache buffer handles" latch. enqueue hash chains

           Wait event >= 10g: "latch: enqueue hash chains"

If wait times for "enqueue hash chains" latches occur together with deadlock dumps, see Note 596420 and ensure that bottlenecks do not occur when dump files are created (I/O, mount options of the file system, and so on). archive control Wait situations on the "archive control" latch are generally a follow-on problem of archiver stuck situations. You should therefore check whether an archiver stuck situation has occurred, as described in Note 391.

undo global data

The "undo global data" latch is required for accessing undo data. Increased waiting imes on this latch are usually due to expensive SQL statements that access consistent read images due to the data in the undo segments. In this case, optimize the expensive SQL statements and make sure that there is no large number of open changes (also see Note 913247). 

query server process

The "query server process" latch is retained if new slaves are generated during parallel execution. Refer to the corresponding information in Note 651060.

simulator lru latch

Significant wait times for the "simulator lru latch" are often a result of processing-intensive SQL statements that execute a large number of buffer gets. Therefore, you must first check if these accesses exist and whether they can be optimized. Long waiting times for the "simulator lru latch" may also occur when you create fixed object statistics using DBMS_STATS.GATHER_FIXED_OBJECTS_STATS (or "brconnect -f oradict_stats"). This affects mainly systems with a large buffer pool, as in these cases, the table X$KCBSH becomes quite large. Therefore, the statistics creation on this table takes a long time. During the statistics creations, problems with the "simulator lru latch" may occur.

As a workaround, you can set the parameter  DB_CACHE_ADVICE to OFF, which means that the LRU simulation is deactivated in the buffer pool. Note that if you perform this task, you can no longer collect data in V$DB_CACHE_ADVICE (Note 617416).

12. Will latch deadlocks occur?
Generally, the Oracle implementation ensures that no latch deadlocks can occur. However, in the case of bugs, deadlocks may occur. In this case, the transaction terminates with ORA-00600 [504]. If you encounter this error and you cannot find a suitable note, you should create a customer message.

13. Where can I find more information about latches?
Oracle documentation:

Oracle9i Performance Tuning Guide and Reference
22 Instance Tuning
-> Wait Events
-> latch free

Oracle 9i Reference
-> Dynamic Performance Views
-> V$LATCH

Oracle 10g Performance Tuning Guide and Reference
10 Instance Tuning Using Performance Views
-> Wait Events Statistics

Oracle 10g Reference
-> 4 Dynamic Performance Views
-> V$LATCH


Regards
Manoj

No comments:

Post a Comment

Oracle E-business suite logs clean up

 Oracle E-business suite logs clean up #!/bin/bash cd $EBS_DOMAIN_HOME find $EBS_DOMAIN_HOME -type f -path "*/logs/*.log?*" -mtime...