Saturday, June 23, 2012

Tuning Garbage Collection Outline


Tuning Garbage Collection Outline (Oracle Apps) 


This document is a summary or outline of Sun's document: Tuning Garbage collection with the 1.4.2 Hotspot JVM located here: http://java.sun.com/docs/hotspot/gc1.4.2/

1.0 Introduction

    * For many applications garbage collection performance is not significant
    * Default collector should be first choice

2.0 Generations

    * The most straightforward GC simply walks every reachable object in the heap; anything left over is garbage.
          o This gets really slow as the number of live objects in the heap increases
    * GCs therefore make assumptions about how your application runs.
    * The most common assumption is that an object is most likely to die shortly after it is created: this is called infant mortality
    * The flip side is that an object that has been around for a while will likely stay around for a while.
    * GC organizes objects into generations (young, tenured, and perm). This is important!

2.1 Performance Considerations

    * Ways to measure GC Performance
          o Throughput - % of time not spent in GC over a long period of time.
          o Pauses - app unresponsive because of GC
          o Footprint - overall memory a process takes to execute
          o Promptness - time between object death, and time when memory becomes available
    * There is no one right way to size generations; make the call based on your application's usage.

2.2 Measurement

    * Throughput and footprint are best measured using metrics particular to the application.
    * Command line argument -verbose:gc output
      [GC 325407K->83000K(776768K), 0.2300771 secs]
          o GC - Indicates a minor collection (young generation). Full GC would instead indicate a major collection (tenured generation).
          o 325407K - The combined size of live objects before garbage collection.
          o 83000K - The combined size of live objects after garbage collection.
          o (776768K) - the total available space, not counting the space in the permanent generation, which is the total heap minus one of the survivor spaces.
          o 0.2300771 secs - time it took for garbage collection to occur.
    * You can get more detailed output using -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps
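The fields described above can be pulled out of a record mechanically. The following is a minimal sketch, assuming the 1.4.2-style single-line format shown in this section; the class name VerboseGcLine and its methods are made up for illustration and are not part of the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for one -verbose:gc record (hypothetical helper, not a JDK class). */
final class VerboseGcLine {
    // Optional "seconds: " prefix (printed with -XX:+PrintGCTimeStamps), then the record itself.
    private static final Pattern RECORD = Pattern.compile(
        "^(?:\\d+\\.\\d+:\\s*)?\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs\\]$");

    final boolean fullGc;    // true for "Full GC" (major), false for "GC" (minor)
    final long beforeKb;     // occupancy before the collection
    final long afterKb;      // occupancy after the collection
    final long availableKb;  // total available space (the figure in parentheses)
    final double pauseSecs;  // time the collection took

    private VerboseGcLine(boolean fullGc, long beforeKb, long afterKb,
                          long availableKb, double pauseSecs) {
        this.fullGc = fullGc;
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.availableKb = availableKb;
        this.pauseSecs = pauseSecs;
    }

    static VerboseGcLine parse(String line) {
        Matcher m = RECORD.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a -verbose:gc record: " + line);
        }
        return new VerboseGcLine(
            "Full GC".equals(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3)),
            Long.parseLong(m.group(4)),
            Double.parseDouble(m.group(5)));
    }

    /** Space freed by this collection, in KB. */
    long reclaimedKb() {
        return beforeKb - afterKb;
    }

    public static void main(String[] args) {
        VerboseGcLine gc = parse("[GC 325407K->83000K(776768K), 0.2300771 secs]");
        System.out.println((gc.fullGc ? "major" : "minor") + " collection reclaimed "
            + gc.reclaimedKb() + "K in " + gc.pauseSecs + " secs");
    }
}
```

Output produced with -XX:+PrintGCDetails has a richer layout and would need a different pattern.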

3 Sizing the Generations

    * The -Xmx value determines the maximum heap size; that much address space is reserved at JVM initialization.
    * The -Xms value is the amount of memory committed to the VM at init. The heap can grow up to the size of -Xmx.
    * The difference between -Xmx and -Xms is virtual memory: reserved but not yet committed.
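A running JVM can report these figures about itself through java.lang.Runtime. The sketch below is only an illustration: the class name HeapReport and its helpers are hypothetical, and the values returned by Runtime are approximations of the -Xms/-Xmx settings, not exact copies of them.

```java
/** Prints the heap figures that -Xms and -Xmx influence (hypothetical helper class). */
final class HeapReport {
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Memory that is reserved for the heap but not yet committed. */
    static long uncommittedBytes(long maxBytes, long committedBytes) {
        return maxBytes - committedBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();          // upper bound the heap may grow to (roughly -Xmx)
        long committed = rt.totalMemory();  // memory currently committed (starts near -Xms)
        long free = rt.freeMemory();        // unused part of the committed memory

        System.out.println("max (reserved):  " + toMb(max) + " MB");
        System.out.println("committed:       " + toMb(committed) + " MB");
        System.out.println("free in heap:    " + toMb(free) + " MB");
        System.out.println("not yet commit.: " + toMb(uncommittedBytes(max, committed)) + " MB");
    }
}
```

Running it with, say, -Xms128M -Xmx512M should show a committed figure near 128 MB and a maximum near 512 MB, with the gap between them being the virtual part.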

3.1 Total Heap

    * Total available memory is the most important factor affecting GC performance
    * By default the JVM grows or shrinks the heap at each GC to keep the ratio of free space to live objects at each collection within a specified range.
          o -XX:MinHeapFreeRatio - when the percentage of free space in a generation falls below this value the generation will be expanded to meet this percentage. Default is 40
          o -XX:MaxHeapFreeRatio - when the percentage of free space in a generation exceeds this value the generation will shrink to meet this value. Default is 70
    * For server applications
          o Unless you have problems with pauses grant as much memory as possible to the JVM
          o Set -Xms and -Xmx equal, or close to each other, for more predictable behavior and a faster startup (the JVM no longer keeps resizing the heap). If you make a poor choice, though, the JVM can't compensate for it.
          o Increase memory as you increase the number of processors, because memory allocation can be parallelized.
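The MinHeapFreeRatio / MaxHeapFreeRatio rule above amounts to a simple threshold check after each collection. This is a deliberately simplified model of that rule, not the JVM's actual resizing code; the class and method names are invented for the sketch.

```java
/** Simplified model of the free-ratio rule (hypothetical; the real JVM logic is more involved). */
final class HeapFreeRatioSketch {
    /**
     * Decides what the rule asks for, given a generation's capacity and the
     * space still occupied by live objects after a collection.
     */
    static String decide(long generationKb, long liveKb, int minFreePct, int maxFreePct) {
        double freePct = 100.0 * (generationKb - liveKb) / generationKb;
        if (freePct < minFreePct) {
            return "expand";   // too little free space: grow the generation
        }
        if (freePct > maxFreePct) {
            return "shrink";   // too much free space: give memory back
        }
        return "keep";
    }

    public static void main(String[] args) {
        // Defaults from the outline: MinHeapFreeRatio=40, MaxHeapFreeRatio=70.
        System.out.println(decide(1000, 700, 40, 70)); // 30% free
        System.out.println(decide(1000, 500, 40, 70)); // 50% free
        System.out.println(decide(1000, 200, 40, 70)); // 80% free
    }
}
```

Setting -Xms equal to -Xmx takes this decision away from the JVM entirely, which is why a poor fixed size cannot be compensated for.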

3.2 The Young Generation

    * The bigger the young generation, the fewer minor GCs occur; but for a fixed heap size this implies a smaller tenured generation, which increases the frequency of major collections.
    * To tune this, you need to look at your application and determine how long your objects live.
    * -XX:NewRatio=3 - the ratio of young to tenured is 1:3, so the young generation will occupy 1/4 of the overall heap
    * -XX:NewSize - Size of the young generation at JVM init. Calculated automatically if you specify -XX:NewRatio
    * -XX:MaxNewSize - The largest size the young generation can grow to (unlimited if this value is not specified at command line)
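The NewRatio arithmetic can be written out as follows. This is a rough sketch: the class name is hypothetical, the heap figure excludes the permanent generation, and the real JVM rounds sizes to its own alignment, so actual values will differ slightly.

```java
/** Back-of-the-envelope generation sizes implied by -XX:NewRatio (hypothetical helper). */
final class GenerationSizes {
    /** young : tenured = 1 : newRatio, so young is 1/(newRatio + 1) of the heap. */
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    static long tenuredMb(long heapMb, int newRatio) {
        return heapMb - youngMb(heapMb, newRatio);
    }

    public static void main(String[] args) {
        long heapMb = 512;
        for (int ratio = 2; ratio <= 3; ratio++) {
            System.out.println("NewRatio=" + ratio
                + " young=" + youngMb(heapMb, ratio) + " MB"
                + " tenured=" + tenuredMb(heapMb, ratio) + " MB");
        }
    }
}
```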

3.2.1 Young Generation Guarantee

    * The -XX:SurvivorRatio option can be used to tune the size of the survivor spaces.
    * Not often important for performance
          o -XX:SurvivorRatio=6 - the ratio of each survivor space to eden is 1:6, so each survivor space will be 1/8 of the young generation
          o If survivor spaces are too small, copying collection overflows directly into the tenured generation.
          o If survivor spaces are too large, they sit uselessly empty.
          o -XX:+PrintTenuringDistribution - shows the threshold chosen by JVM to keep survivors half full, and the ages of objects in the new generation.
    * Server Applications
          o First decide the total amount of memory you can afford to give the virtual machine. Then graph your own performance metric against young generation sizes to find the best setting.
          o Unless you find problems with excessive major collection or pause times, grant plenty of memory to the young generation.
          o Increasing the young generation becomes counterproductive at half the total heap or less (whenever the young generation guarantee cannot be met).
          o Be sure to increase the young generation as you increase the number of processors, since allocation can be parallelized.
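The 1/8 figure for -XX:SurvivorRatio=6 follows from the young generation holding one eden and two survivor spaces. A small sketch of that arithmetic, with a hypothetical class name and ignoring the JVM's own rounding:

```java
/** Splits a young generation into eden and two survivor spaces (hypothetical helper). */
final class YoungLayout {
    /** eden : survivor = survivorRatio : 1, and there are two survivor spaces. */
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        long youngMb = 128;
        int ratio = 6;
        System.out.println("each survivor space: " + survivorMb(youngMb, ratio) + " MB");
        System.out.println("eden:                " + edenMb(youngMb, ratio) + " MB");
    }
}
```

With a 128 MB young generation this gives two 16 MB survivor spaces and a 96 MB eden, i.e. each survivor space is 1/8 of the young generation.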

4 Types of Collectors

    * Everything up to this point describes the default garbage collector; there are other GCs you can use
    * Throughput Collector - Uses a parallel version of the young generation collector
          o -XX:+UseParallelGC
          o Tenured collector is the same as in default
    * Concurrent Low Pause Collector
          o Collects the tenured generation concurrently with the execution of the app.
          o The app is paused for short periods during collection
          o -XX:+UseConcMarkSweepGC
          o To enable a parallel young generation GC with the concurrent GC add -XX:+UseParNewGC to the startup. Don't add -XX:+UseParallelGC with this option.
    * Incremental Low Pause Collector
          o Sometimes called Train Collector
          o Collects a portion of the tenured generation at each minor collection.
          o Tries to minimize large pause of major collections
          o Slower than the default collector when considering overall throughput
          o Good for client apps (my observation)
          o -Xincgc
    * Don't mix these options; the JVM may not behave as expected.

4.1 When to use Throughput Collector

    * Large number of processors
    * Reduces serial execution time of app, by using multiple threads for GC
    * App with lots of threads allocating objects should use this with a large young generation
    * Server Applications (my observation)

4.2 The Throughput collector

    * By default the throughput collector uses the number of CPUs as its number of GC threads.
    * On a computer with one CPU it will not perform as well as the default collector
    * Overhead from parallel execution (synchronization costs)
    * With 2 CPUs the throughput collector generally performs as well as the default garbage collector.
    * With more than 2 CPUs you can expect to see a reduction in minor GC pause times
    * You can control the number of threads with -XX:ParallelGCThreads=n
    * Fragmentation can occur
          o Reduce GC threads
          o Increase Tenured Generation size

4.2.1 Adaptive Sizing

    * Keeps stats about GC times, allocation rates, and free space then sizes young and tenured generation to best fit the app.
    * J2SE 1.4.1 and later
    * -XX:+UseAdaptiveSizePolicy (on by default)

4.2.2 Aggressive Heap

    * Attempts to make maximum use of physical memory for the heap
    * Inspects computer resources (memory, num processors) and sets params optimal for long running memory allocation intensive jobs.
    * Must have at least 256MB of RAM
    * Originally intended for machines with lots of CPUs and RAM, but in 1.4.1+ it has shown improvements even on 4-way machines.
    * -XX:+AggressiveHeap

4.3 When to use the Concurrent Low Pause Collector

    * Apps that benefit from shorter GC pauses, and can share resources with GC during execution.
    * Apps with large sets of long living data (tenured generation)
    * Two or more processors
    * Interactive apps with modest tenured generation size, and one CPU

4.4 The Concurrent Low Pause Collector

    * Uses a separate GC thread to do parts of the major collection concurrently with the app threads.
    * Pauses app threads at the beginning of a collection and again toward the middle (the second pause tends to be the longer one)
    * The rest of the GC is in a single thread that runs at the same time as the app

4.4.1 Overhead of Concurrency

    * Doesn't provide much of an advantage on single-processor machines.
    * Fragmentation can occur.
    * On a two-processor machine one processor stays available to app threads while the concurrent GC thread runs, so the GC thread does not pause the app.
    * The more CPUs there are, the greater the advantages of the concurrent collector.

4.4.2 Young Generation Guarantee

    * There has to be enough contiguous space available in the tenured generation for all objects in the eden and one survivor space.
    * A larger heap is needed compared to the default collector.
    * Add the size of the young generation to the tenured generation.

4.4.3 Full Collections

    * If the concurrent collector is unable to finish collecting the tenured generation before the tenured generation fills up, the application is paused and the collection is completed.
    * When this happens you should make some adjustments to your GC params

4.4.4 Floating Garbage

    * Floating Garbage - Objects that die while the GC is running (after they have been checked).
    * As a rough rule of thumb, increase the tenured generation by 20% to account for floating garbage.

4.4.5 Pauses

    * First Pause - marks live objects - initial marking
    * Second Pause - remarking phase - checks objects that were missed during the concurrent marking phase due to the concurrent execution of the app threads.

4.4.6 Concurrent Phases

    * Concurrent Marking phase occurs between initial mark and remarking phase.
    * Concurrent sweeping phase collects dead objects after the remarking phase.

4.4.7 Measurements with the Concurrent Collector

    * Use -verbose:gc with -XX:+PrintGCDetails
    * CMS-initial-mark - shows GC stats for the initial marking phase
    * CMS-concurrent-mark - shows GC stats for concurrent marking phase.
    * CMS-concurrent-sweep - shows stats for concurrent sweeping phase
    * CMS-concurrent-preclean - stats for determining work that can be done concurrently
    * CMS-remark - stats for the remarking phase.
    * CMS-concurrent-reset - concurrent stuff is done, ready for next collection.

4.4.8 Parallel Minor Collection Options with Concurrent Collector

    * -XX:+UseParNewGC - for multiprocessor machines, enables multi threaded young generation collection.
    * -XX:+CMSParallelRemarkEnabled - reduce remark pauses

4.5 When to use the Incremental Low Pause Collector

    * Use when you can afford to trade off longer and more frequent young generation GC pauses for shorter tenured generation pauses
    * You have a large tenured generation
    * Single Processor

4.6 The Incremental Low Pause Collector

    * Minor collections same as default collector.
    * Don't try to use parallel GC with this collector
    * Incrementally Collects parts of the tenured generation at each young collection.
    * Tries to avoid long major collections by doing small chunks each minor collection.
    * Can cause fragmentation of the heap. Sometimes need to increase tenured generation size compared to the default.
    * There is some overhead required to maintain the position of the incremental collector. Less overhead than is required by the default collector.
    * First try the default collector, and adjust heap sizing. If major pauses are too long try incremental.
    * If the incremental collector can't collect the tenured generation fast enough you will run out of memory; try reducing the young generation.
    * If young generation collections do not free any space, it could be because of fragmentation. Increase the tenured generation size.

4.6.1 Measurements with the Incremental Collector

    * -verbose:gc and -XX:+PrintGCDetails
    * Look for the Train: to see the stats for the incremental collection.

5 Other Considerations

    * The permanent generation may be a factor on apps that dynamically generate and load many classes (JSP, CFM application servers)
    * You may need to increase the MaxPermSize, eg: -XX:MaxPermSize=128m
    * Apps that rely on finalization (finalize methods) will cause lag in garbage collection. Relying on it is a bad idea; use it only for erroneous situations.
    * Explicit garbage collection calls (System.gc()) force a major collection. You can measure the effectiveness of these calls by disabling them with -XX:+DisableExplicitGC
    * RMI garbage collection intervals can be controlled with
          o -Dsun.rmi.dgc.client.gcInterval=3600000
          o -Dsun.rmi.dgc.server.gcInterval=3600000
    * On Solaris 8+ you can enable the alternate libthread library (lightweight processes, LWPs); this may improve thread performance.
    * To enable it, add /usr/lib/lwp to LD_LIBRARY_PATH
    * Soft references are cleared less aggressively in the server VM.
    * The clearing rate can be tuned with, e.g., -XX:SoftRefLRUPolicyMSPerMB=10000
    * Default value is 1000, i.e. a soft reference survives one second per MB of free heap
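To make the SoftRefLRUPolicyMSPerMB figures concrete: the option gives the number of milliseconds a soft reference is kept, once it is no longer strongly reachable, for each megabyte of free heap. The sketch below only does that multiplication; the class name and the 100 MB free-heap figure are illustrative assumptions.

```java
/** Arithmetic behind -XX:SoftRefLRUPolicyMSPerMB (hypothetical helper, not a JVM API). */
final class SoftRefLifetime {
    /** Milliseconds a softly reachable object is kept, for a given amount of free heap. */
    static long lifetimeMs(long freeHeapMb, long msPerMb) {
        return freeHeapMb * msPerMb;
    }

    public static void main(String[] args) {
        long freeHeapMb = 100; // assumed free heap for the example
        System.out.println("default (1000):  " + lifetimeMs(freeHeapMb, 1000) / 1000 + " s");
        System.out.println("tuned   (10000): " + lifetimeMs(freeHeapMb, 10000) / 1000 + " s");
    }
}
```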

6 Conclusion

    * GC can be a bottleneck in your app.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

There are some good notes on JVM tuning from Mike Shaw on Steven Chan’s blog (here, here and here) and some good Metalink notes at the end of this post.
The important thing missing from all these notes (for someone like me who is new to Java) is the basics of Garbage Collection, generations, and how to read GC output.
In this post I’ll start with the basics of JVM GC (Garbage Collection) and then in the next post apply this theory to real-world performance issues w.r.t. the JVM (11i Java Virtual Machine).

Garbage - A Java object is considered garbage when it can no longer be reached from any pointer in the running program.

Generations - Memory in the JVM is managed in generations, i.e. the young generation and the tenured generation: memory pools holding objects of different ages. When a particular generation fills up, garbage collection occurs.

A. Young generation - Objects are initially allocated in the young generation (most objects die here). When the young generation fills up, it triggers a Minor Garbage Collection. Objects that survive minor collections are eventually moved to the tenured generation. A minor collection is quick compared to a Full/Major GC.

B. Tenured generation - Objects from the young generation that survive minor garbage collection are moved to an area called the tenured generation. When the tenured generation fills up, it triggers a major collection (aka Full GC or Full Garbage Collection). A major collection is slow because it involves all live objects.

Garbage Collection (GC) - is the program that clears garbage (dead Java objects). Garbage collection works on the fundamental principle that the majority of Java objects die young (shortly after arriving in the JVM). There are two kinds of garbage collection: Minor Garbage Collection and Major Garbage Collection (aka Full GC).

Example of Minor GC -  3824.164: [GC 196725K->141181K(209864K), 0.3295949 secs]
Example of Major GC (Full GC) -  3841.051: [Full GC 150466K->87061K(217032K), 3.2626248 secs]

Pauses: the time when the application becomes unresponsive because garbage collection is occurring.


Understanding JVM parameters for 11i
Sizing the generations is very important in tuning JVM GC. Before jumping to sizing the generations (young and tenured), let’s look at the default 11i JVM parameters.

In the context file ($APPL_TOP/admin/$CONTEXT_NAME.xml) the default entry for the JVM looks like

<jvm_options oa_var="s_jvm_options" osd="Solaris">-verbose:gc -Xmx512M -Xms128M -XX:MaxPermSize=128M -XX:NewRatio=2 -XX:+PrintGCTimeStamps -XX:+UseTLAB </jvm_options>

1. The line above sets the JVM (OACoreGroup) size in 11i
2. -Xms128M means start with a 128 MB heap
3. -Xmx512M means the JVM heap can grow up to a maximum of 512 MB
4. -XX:NewRatio=2 controls the young generation, i.e. the ratio between young and tenured generation is 1:2 (if the young generation is 50 MB then the tenured generation will be approx. 100 MB)
5. -XX:MaxPermSize=128M limits the permanent generation to 128 MB (the permanent generation is a separate area, not part of the tenured generation, and is not counted in -Xmx)
6. -XX:+UseTLAB enables thread-local object allocation (thread-local allocation buffers)
7. There are two more parameters (the 11i JVM uses default values): -XX:MinHeapFreeRatio=<minimum> & -XX:MaxHeapFreeRatio=<maximum>, with default values of 40 & 70 resp. (for Solaris)

If the percentage of free space in a generation falls below 40%, the generation will expand, and if the percentage of free space exceeds 70%, the generation will shrink.

Various types of Garbage Collectors
JDK 1.4.2 provides four types of collectors: the default collector plus three alternatives.

1. Default Collector: If you don’t specify any collector option for the JVM, the default collector is used.

2. Throughput Collector: This collector uses a parallel version of the young generation collector, but the tenured generation is collected in the normal way. To set the throughput collector use -XX:+UseParallelGC, so change

<jvm_options oa_var="s_jvm_options" osd="Solaris">-verbose:gc -Xmx512M -Xms128M -XX:MaxPermSize=128M -XX:NewRatio=2 -XX:+PrintGCTimeStamps -XX:+UseTLAB </jvm_options>
to
<jvm_options oa_var="s_jvm_options" osd="Solaris">-verbose:gc -Xmx512M -Xms128M -XX:MaxPermSize=128M -XX:NewRatio=2 -XX:+PrintGCTimeStamps -XX:+UseTLAB -XX:+UseParallelGC</jvm_options>

3. Concurrent Low Pause Collector: The concurrent collector collects the tenured generation concurrently with the execution of the application. A parallel version of the collector is used for the young generation. To set the Concurrent Low Pause Collector use -XX:+UseConcMarkSweepGC
like
<jvm_options oa_var="s_jvm_options" osd="Solaris">-verbose:gc -Xmx512M -Xms128M -XX:MaxPermSize=128M -XX:NewRatio=2 -XX:+PrintGCTimeStamps -XX:+UseTLAB -XX:+UseConcMarkSweepGC</jvm_options>

4. Incremental Low Pause Collector: This collector collects just a portion of the tenured generation at each minor garbage collection. To use the Incremental Low Pause Collector use
-Xincgc

If you are on JDK 1.4.2 with multiple CPUs, try setting the Concurrent Low Pause Collector as the garbage collector.

Rules of thumb for Garbage Collection / JVM tuning w.r.t. 11i
1. Stay on the latest JVM/JDK version wherever possible (the latest certified with 11i is JRE 6; you should be on at least 1.4.2)
2. For OACoreGroup consider no more than 100 active users per JVM
3. There should NOT be more than 1 active JVM per CPU
4. Try to reduce GC (Garbage Collection) frequency (especially Major/Full GC). Play with various JVM parameters like -Xmx, -Xms, -XX:MaxPermSize, -XX:NewRatio, -XX:+UseParallelGC / -XX:+UseConcMarkSweepGC
5. If you are on JDK 1.4.2 with a multi-CPU middle tier, use the Concurrent Low Pause Garbage Collector by setting -XX:+UseConcMarkSweepGC for the JVM
6. If you are using Oracle Configurator, assign a dedicated JVM for configurator requests
7. Try setting the JVM max size NOT greater than 1 GB (use multiple JVMs of 512 MB or 1024 MB); this is to reduce GC time (more heap means more time in each GC)
8. Minor GCs should occur at intervals long enough to allow many objects to die young (i.e. a lot of objects should die between two minor GCs).
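9. Collections occur when a generation fills up, so GC frequency is inversely proportional to the amount of memory. A larger JVM heap means fewer collections, but each full collection takes longer, so past a point more memory means longer pauses rather than better throughput (throughput being the time NOT spent on GC).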
10. Unless you have problems with pauses (time when the application becomes unresponsive because garbage collection is occurring), try granting as much memory as possible to the VM (128 MB to 512 MB is a good start; fine-tune as per load testing results)

How to find the JDK version used by Apache/Jserv (JVM) in 11i?

In the context file search for a parameter like s_jdktop

<JDK_TOP oa_var="s_jdktop">/oracle/apps/11i/vis11icomn/util/java/1.4/j2sdk1.4.2_04</JDK_TOP>

Where is the JVM log location in 11i?
$IAS_ORACLE_HOME/Apache/Jserv/logs/jvm/OACoreGroup.0.stdout  (GC output)
$IAS_ORACLE_HOME/Apache/Jserv/logs/jvm/OACoreGroup.0.stderr  (JVM Error)


How to read the GC (JVM stdout) file?

Example of a JVM stdout file to understand Garbage Collection in 11i

3824.164: [GC 196725K->141181K(209864K), 0.3295949 secs]
3840.734: [GC 207741K->150466K(217032K), 0.3168890 secs]
3841.051: [Full GC 150466K->87061K(217032K), 3.2626248 secs]
3854.717: [GC 155413K->97857K(215568K), 0.2732267 secs]
3874.714: [GC 166209K->109946K(215568K), 0.3498301 secs]

1. Lines 1, 2, 4 and 5 are examples of minor collections
2. Line 3 (Full GC) is an example of a major collection
3. The first entry in each line is the time in seconds since the JVM started. To find the time between two GCs (Garbage Collections) just subtract the earlier timestamp from the later one, e.g. 3840.734 - 3824.164 = 16.57 seconds
4. 196725K->141181K in the first line indicates the combined size of live objects before and after Garbage Collection (GC)
5. (209864K) in parentheses in the first line is the total available heap space, not counting the permanent generation (the total heap minus one of the survivor spaces). Note that the "after" figure of a minor collection can include objects that aren’t necessarily alive but can’t be reclaimed, either because they are directly alive, or because they are referenced from objects in the tenured generation.
6. 0.3295949 secs in the first line is the time taken to run the minor collection.
7. Full GC in line three represents a Full Garbage Collection or Major Collection
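
The same reading can be automated for a whole stdout file. Below is a rough sketch, assuming every record has the timestamped single-line form of the sample above; the class GcLogSummary and its methods are hypothetical names, and lines in any other format are simply skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Summarizes timestamped -verbose:gc records (hypothetical helper for illustration). */
final class GcLogSummary {
    // group 1 = seconds since JVM start, group 2 = "Full GC" or "GC", group 3 = pause in seconds
    private static final Pattern RECORD = Pattern.compile(
        "^(\\d+\\.\\d+): \\[(Full GC|GC) \\d+K->\\d+K\\(\\d+K\\), (\\d+\\.\\d+) secs\\]$");

    /** Seconds between the starts of consecutive collections. */
    static List<Double> intervals(List<String> lines) {
        List<Double> result = new ArrayList<Double>();
        Double previous = null;
        for (String line : lines) {
            Matcher m = RECORD.matcher(line.trim());
            if (!m.matches()) {
                continue; // not a GC record
            }
            double timestamp = Double.parseDouble(m.group(1));
            if (previous != null) {
                result.add(timestamp - previous);
            }
            previous = timestamp;
        }
        return result;
    }

    /** Sum of all reported collection times, in seconds. */
    static double totalPauseSecs(List<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            Matcher m = RECORD.matcher(line.trim());
            if (m.matches()) {
                total += Double.parseDouble(m.group(3));
            }
        }
        return total;
    }

    /** Number of major (Full GC) collections. */
    static int fullGcCount(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            Matcher m = RECORD.matcher(line.trim());
            if (m.matches() && "Full GC".equals(m.group(2))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "3824.164: [GC 196725K->141181K(209864K), 0.3295949 secs]",
            "3840.734: [GC 207741K->150466K(217032K), 0.3168890 secs]",
            "3841.051: [Full GC 150466K->87061K(217032K), 3.2626248 secs]",
            "3854.717: [GC 155413K->97857K(215568K), 0.2732267 secs]",
            "3874.714: [GC 166209K->109946K(215568K), 0.3498301 secs]");
        System.out.println("intervals (s):   " + intervals(log));
        System.out.println("total pause (s): " + totalPauseSecs(log));
        System.out.println("full GCs:        " + fullGcCount(log));
    }
}
```

Short intervals between minor collections or a growing number of Full GC records are the things to watch for when applying the rules of thumb above.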

References

  • 362851.1  Guidelines to setup the JVM in Apps E-Business Suite 11i and R12
  • 370583.1  Basic troubleshooting of JVM consuming cpu or too many JDBC connections in Apps 11i
  • 567647.1  Using Various Garbage Collection Methods For JVM Tuning
  • 390031.1  Performance Tuning Forms Listener Servlet In Oracle Applications


Regards
Manoj
