Java Performance Mindmap


Use new JVM releases! Java 1.6 reaches EOL (end of life) in autumn 2012!!!

JVM KEY: -verbose:gc
USE TOOLS
JVM KEY: -XX:+PrintGCDetails
VisualGC
try changing the thread stack size (each thread takes 200-300 KBytes for its stack by default)
JVM KEY: -Xss<size>
Hack: you can try replacing the HotSpot DLL in JVM 1.6 with the HotSpot v23 DLL from JVM v7
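Besides the global -Xss key, a stack size can be requested for a single thread through the four-argument Thread constructor. The sketch below is only an illustration, assuming Java 8 or newer; the class and method names are hypothetical, and the JVM is allowed to treat the requested size as a hint (round it or ignore it).

```java
final class StackSizeProbe {
    /** Recurses until the stack overflows and reports how many frames fit. */
    static int maxDepth() {
        int[] depth = {0};
        try {
            recurse(depth);
        } catch (StackOverflowError expected) {
            // The stack is exhausted; depth[0] holds the number of frames reached.
        }
        return depth[0];
    }

    private static void recurse(int[] depth) {
        depth[0]++;
        recurse(depth);
    }

    /** Runs the probe in a thread created with the requested stack size (a hint only). */
    static int depthWithStack(long stackBytes) {
        int[] result = new int[1];
        Thread probe = new Thread(null, () -> result[0] = maxDepth(), "stack-probe", stackBytes);
        probe.start();
        try {
            probe.join(); // join() also makes the write to result[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("256 KB requested: depth " + depthWithStack(256 * 1024));
        System.out.println("4 MB requested:   depth " + depthWithStack(4L * 1024 * 1024));
    }
}
```

On most platforms the larger request yields a deeper recursion, but that is not guaranteed, so measure on the target system.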

JVM v6 ships HotSpot v20, which lacks many critical performance fixes!
try increasing the heap size
JVM KEY: -Xmx<size>
the more memory the GC has, the better it works!

The default GC is chosen based on your platform characteristics!!! CMS and G1 are never chosen as defaults!!!
Serial GC
JVM KEY: -XX:+UseSerialGC
JVM KEY: -XX:+UseParallelGC
JVM KEY: -XX:+UseParallelOldGC
CMS (Concurrent Mark Sweep)
the only GC that supports NUMA!
it didn't turn on the old gen collection until 7u2
it is chosen automatically since 7u2
Doesn't unload classes by default
JVM KEY: -XX:+CMSClassUnloadingEnabled

Parallel GC

JVM KEY: -XX:+UseConcMarkSweepGC

Fully supported by Oracle since 7u4!!!


Not recommended for use with JVM 6!
At present (June 2012) it knows nothing about NUMA!
JVM KEY: -XX:+UseG1GC
the default heap memory region size is 1MB
tuning
JVM KEY: -XX:G1HeapRegionSize=n
the maximum value is 32MB!

set the maximum pause for GC (a soft parameter)

JVM KEY: -XX:MaxGCPauseMillis=<milliseconds>
JVM KEY: -XX:GCPauseIntervalMillis=<milliseconds>

set the allowed time interval between GCs (a soft parameter)
good productivity is desired
Throughput
Latency

Glossary

Garbage-First GC (G1) recommended if..

pause length < 0.5-1s
minimal tuning is desired
the heap size is more than 5-8GB
more than 50% of the heap is used

Books

Java Performance Tuning (2nd Edition), Jack Shirazi
http://www.amazon.com/gp/product/0596003773

Java Performance, Charlie Hunt
http://www.amazon.com/Java-Performance-Charlie-Hunt/dp/0137142528

GC Oracle HotSpot

the object allocation rate varies widely over time (for instance, 500MB/sec by day, 10MB/sec by night)
try changing the GC algorithm
less heap fragmentation is desired (fewer Full GCs)
your currently chosen GC works well
not recommended if..
strict requirement for pauses shorter than 100ms
maximal throughput is desired (use ParallelGC)

Information sources and authors of used materials

common tuning GC

https://shipilev.net
http://www.linkedin.com/in/alekseyshipilev
https://shipilev.net/pub/talks/jeeconf-May2012-perfMethodology.pdf
https://shipilev.net/pub/talks/jugru-June2012-perfMethodology-hi. v

Aleksey Shipilev
JVM problems?

do you need maximal throughput?

still choosing GC?

do you have a heap smaller than 2GB?
do you need shorter pauses?
do you have a strict requirement for pauses <20-30msec?

Materials

Java Performance
don't be afraid to use GC logging in production!!! the possible overhead is very low!
JVM KEY: -XX:+PrintGCDetails
JVM KEY: -XX:+PrintGCTimeStamps
JVM KEY: -XX:+PrintGCDateStamps
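The same counters that the GC log keys expose can also be read from inside the application through the standard java.lang.management MXBeans. A minimal sketch; the class name GcCounters is hypothetical, and the collector names printed depend on which GC the JVM selected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcCounters {
    /** Sums the collection counts of all collectors; undefined counts (-1) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Sums the approximate accumulated collection time in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```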

Sergey Kuksenko
http://ru.linkedin.com/pub/sergey-kuksenko/0/b49/b81

The Main information sources


USE TOOLS

JVM KEY: -XX:+PrintHeapAtGC
JVM KEY: -XX:+PrintTenuringDistribution
JVM KEY: -Xloggc:<file>
PrintGCStats

Authors

GChisto
VisualGC
JVM KEY: -XgcPrio:deterministic
JVM KEY: -XpauseTarget=<milliseconds>

Oracle JRockit
http://ru.linkedin.com/in/iwanowww
http://vimeo.com/43574752
http://www.slideshare.net/iwanowww/g1-gc-hotspot-jvm

try changing the GC algorithm

DeterministicGC

Vladimir Ivanov

Garbage Collector
USE TOOLS
JVM KEY: -XX:+PrintCompilation
MXBeans (VisualVM)
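The JIT-related MXBean that VisualVM displays can be queried directly as well. A small sketch using the standard CompilationMXBean; the class name JitTime and the -1 "unavailable" convention are assumptions made for this example.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

final class JitTime {
    /** Returns the accumulated JIT compilation time in ms, or -1 if the JVM does not report it. */
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1; // no JIT compiler, or time monitoring is not supported
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        System.out.println("JIT: " + (jit == null ? "none" : jit.getName()));
        System.out.println("total compilation time, ms: " + totalCompilationMillis());
    }
}
```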

Materials

THE DEFAULT MODE DEPENDS ON THE PLATFORM CHARACTERISTICS!!!


Must be the first switch provided on the command line.
It performs more aggressive optimization
Choose this mode if you need faster execution
JIT
a method will be compiled into native code after 10000 calls (by default)
slow launch?
JVM KEY: -XX:+TieredCompilation
JVM KEY: -XX:MaxInlineSize=<bytecode size>
maybe try a more aggressive inlining strategy?
the default value is 35 (a very small number)
The default value varies with the platform on which the JVM is running.
JVM KEY: -XX:CompileThreshold=<calls number>
the default value is 10000

Igor Maznitsa
http://www.igormaznitsa.com
http://ru.linkedin.com/in/igormaznitsa

The mind map preparation and translation

try changing the work mode (select the server mode for the JVM)

JVM KEY: -server

It's a good idea to make warm-up calls so that your methods get compiled into native code
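A warm-up phase can be as simple as calling the hot method more often than the compile threshold before the real work starts. The sketch below is illustrative only: the method work() and the figure of 20000 calls are assumptions, chosen to sit above the default threshold of 10000 mentioned in this map.

```java
final class WarmUp {
    /** A small, hot method: the kind the JIT compiles after enough calls. */
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i % 7;
        }
        return sum;
    }

    /** Calls work() repeatedly; the result is returned so the calls cannot be optimized away. */
    static long warmUp(int calls) {
        long sink = 0;
        for (int i = 0; i < calls; i++) {
            sink += work(100);
        }
        return sink;
    }

    public static void main(String[] args) {
        long sink = warmUp(20_000); // above the default compile threshold
        long start = System.nanoTime();
        sink += work(100);
        long elapsed = System.nanoTime() - start;
        System.out.println("warm call took " + elapsed + " ns (sink=" + sink + ")");
    }
}
```

Running with -XX:+PrintCompilation shows when work() actually gets compiled.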

JVM KEY: -XX:InlineSmallCode=<size of native code in bytes>
JVM KEY: -XX:FreqInlineSize=<bytecode size>

The default value varies with the platform on which the JVM is running.

JVM KEY: -client
sudo apt-get install sysstat
mpstat - Report processors related statistics

Must be the first switch provided on the command line.
Choose this mode if you need a faster start

CPU

Oracle JVM has a lot of keys to tune JIT compilation
JVM KEY: -verbose:class
MXBeans
JVM KEY: -noverify
JVM KEY: -Xshare:on

netstat - Print network connections, routing tables, interface statistics
sudo apt-get install iptraf
sudo apt-get install bwm-ng
iptraf - Interactive Colorful IP LAN Monitor

USE TOOLS

Network and IO

Classload

bwm-ng - a live bandwidth monitor for network and disk io
sar - collect, report, or save system activity information
top - display Linux tasks
strace - trace system calls and signals

try disabling class verification
try switching on class data sharing
Ubuntu
USE TOOLS
Profilers

sudo apt-get install linux-tools

perf - Performance analysis tools for Linux

System

too complex algorithms

simplify your algorithms
try algorithms with smaller "performance constants" (ArrayList instead of LinkedList, for instance)
check that you have caching in appropriate places
maybe creating new objects is better than caching in your case?

oprofile - a system-wide profiler for Linux systems
vmstat - Report virtual memory statistics
sudo apt-get install numactl
numastat - Print statistics about NUMA memory allocation
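The ArrayList versus LinkedList remark can be checked with indexed access: both lists return the same values, but LinkedList.get(i) walks the chain of nodes on every call. A rough sketch with hypothetical names; the timing in main is a quick look, not a proper benchmark.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class ListAccess {
    /** Appends 0..n-1 to the given list and returns it. */
    static List<Integer> fill(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    /** Sums elements by index: O(n) for ArrayList, O(n^2) for LinkedList. */
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 20_000;
        List<Integer> array = fill(new ArrayList<Integer>(), n);
        List<Integer> linked = fill(new LinkedList<Integer>(), n);

        long t0 = System.nanoTime();
        long a = sumByIndex(array);
        long t1 = System.nanoTime();
        long b = sumByIndex(linked);
        long t2 = System.nanoTime();

        System.out.println("ArrayList:  sum=" + a + ", " + (t1 - t0) / 1000 + " us");
        System.out.println("LinkedList: sum=" + b + ", " + (t2 - t1) / 1000 + " us");
    }
}
```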

TOOLS

algorithmic problems?
data (anti)caching polling

active
idle
http://java.net/projects/gchisto
http://java.net/projects/printgcstats/
GChisto - a garbage collection log visualization tool

PrintGCStats - a tool to report garbage collection statistics from HotSpot GC
VisualGC - a plugin for VisualVM
jstack - Stack Trace
jrmc - Oracle JRockit Mission Control profiler

GC
Java

Executor

http://visualvm.java.net/

VisualVM - a visual tool integrating several command line JDK tools

Java Performance Mind-Map (the info is compiled from presentations and the internet.)
IT IS NOT AN OFFICIAL DOCUMENT AND MAY HAVE ERRORS! USE IT AT YOUR OWN RISK!
v 1.0.8

too big %usr (mpstat)

spinloops

They look like 100% CPU load
solstudio
USE TOOLS
vtune
perf
hardware counters
try large memory pages
out of memory, but usage does not grow
JVM KEY: -XX:+UseLargePages
JVM KEY: -XX:PermSize
JVM KEY: -XX:MaxPermSize
JVM KEY: -XX:+UseConcMarkSweepGC
usage grows until out of memory
JVM KEY: -XX:+CMSPermGenSweepingEnabled
JVM KEY: -XX:+CMSClassUnloadingEnabled

TLB (translation lookaside buffer)

netstat sar iptraf bwm-ng
check network cables and their characteristics!
try decreasing the number of write/read operations
try decreasing the data size of write/read operations
try data compression
try buffering, Bandwidth-Delay Product, MTU
try changing network interfaces to faster ones
try using virtual network interfaces (move your application components into the cloud)
vmstat mpstat
USE TOOLS

Troubles with PermGen

USE TOOLS

busstat (Solaris)

too much network utilization?

memory bandwidth

try faster memory (dual-channel memory instead of single-channel memory)
several channels in the IMC (integrated memory controller)
solstudio
USE TOOLS
vtune
perf
hardware counters
try compressed 32-bit pointers
JVM KEY: -XX:+UseCompressedOops
for 64-bit systems!

USE TOOLS

too much scheduler utilization?

shrink your data sets (remember that RAM is a slow entity!)

too big %sys (mpstat)


JVM KEY: -XX:AllocatePrefetchStyle=<N>
capacity
try enabling/disabling the JVM software prefetcher

try limiting the number of threads in your application
top sar

0 - no prefetch instructions are generated
1 - execute prefetch instructions after each allocation (DEFAULT)
2 - use TLAB allocation watermark pointer to gate when prefetch instructions are executed
JVM KEY: -XX:AllocatePrefetchLines=<number of lines>
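One common way to cap the number of threads is to run tasks on a fixed-size pool instead of starting a thread per task. A minimal sketch, assuming Java 8 or newer; the class BoundedPool and the sum-of-squares workload are placeholders for real application tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BoundedPool {
    /** Computes 1^2 + 2^2 + ... + tasks^2 using at most 'threads' worker threads. */
    static long sumOfSquares(int tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 1; i <= tasks; i++) {
                final long n = i;
                futures.add(pool.submit(() -> n * n)); // queued if all workers are busy
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(sumOfSquares(100, threads));
    }
}
```

Sizing the pool to the number of available processors is only a starting point for CPU-bound work; I/O-bound tasks usually need a different number.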

USE TOOLS

try to add physical memory
try to decrease memory per process
swappiness

swapping?

JVM KEY: -XX:AllocatePrefetchDistance=<distance in bytes>
try enabling/disabling the hardware prefetcher
Temporal locality
Spatial locality
try block decompositions
try more compact data structures
Complex java.util collections may take 14-30 times more memory per item than the primitive representation!
since Java 6u21
since Java 6u20

GC IS NOT SWAPPING FRIENDLY!!!!


strace perf oprofile
tune the kernel
file bugs against the kernel
USE TOOLS

JVM KEY: -XX:+UseCompressedStrings
strings
JVM KEY: -XX:+OptimizeStringConcat
JVM KEY: -XX:+UseStringCache
Java profilers
USE TOOLS
perf solstudio hardware counters

memory problems?
kernel calls?

mpstat sar

caches USE TOOLS

plain shared memory (no guarantees)
HB (happens-before) via volatile (guarantees that changes are visible)

device communication?

too big %irq,%soft (mpstat)


primitives

Atomics (CAS) (guarantee of atomic changes)
Spin-loops
Spin-locks
Locks
Wait-locks
They generate 100% CPU load
synchronized
java.util.concurrent.locks.ReentrantLock
There is a bug in java.util.concurrent.locks.ReentrantReadWriteLock; it is fixed since 7b25!
consistency

try balancing IRQ processing; maybe only one CPU handles interrupts in the system
check the number of timers in your system
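A typical use of CAS is a retry loop: read the current value, compute the new one, and publish it only if nobody changed the value in between. The sketch below tracks a running maximum this way; the class CasMax is a hypothetical example, not something taken from the talks this map is based on.

```java
import java.util.concurrent.atomic.AtomicLong;

final class CasMax {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    /** Raises the stored maximum to 'candidate' if it is larger; safe to call from many threads. */
    void offer(long candidate) {
        long current = max.get();
        // Retry only while the candidate is still larger and another thread won the race.
        while (candidate > current && !max.compareAndSet(current, candidate)) {
            current = max.get();
        }
    }

    long get() {
        return max.get();
    }

    public static void main(String[] args) {
        CasMax m = new CasMax();
        m.offer(5);
        m.offer(3);
        m.offer(9);
        System.out.println(m.get());
    }
}
```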

system items?

iostat sar
caching
buffering

USE TOOLS

many disk operations?


decrease the number of disk operations
DON'T USE SSD!

choose the right primitive for interthread communication
coherence

expected number of conflicts
expected conflict length

too big %iowait (mpstat)

try noncoherency checks

use a light condition check before hard operations
it doesn't work?
it's a hard operation
Locks
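One reading of "light condition before hard operation" is the test-and-test-and-set idiom: do a cheap read first and attempt the expensive CAS only when it can succeed. A minimal sketch under that assumption; the class TtasFlag is hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class TtasFlag {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    /** Tries to take the flag without blocking; returns true on success. */
    boolean tryLock() {
        // Light condition: a volatile read that does not invalidate other cores' cache lines.
        if (locked.get()) {
            return false;
        }
        // Hard operation: the CAS runs only when the flag looked free.
        return locked.compareAndSet(false, true);
    }

    void unlock() {
        locked.set(false);
    }

    public static void main(String[] args) {
        TtasFlag flag = new TtasFlag();
        System.out.println(flag.tryLock()); // first attempt on a free flag
        System.out.println(flag.tryLock()); // second attempt while it is held
        flag.unlock();
    }
}
```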

top sar

USE TOOLS

try striping shared places

Queues Counters Immutability
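For counters, the JDK ships a striped implementation since Java 8: java.util.concurrent.atomic.LongAdder spreads updates over several internal cells and adds them up on read. A small sketch, assuming Java 8 or newer; the class StripedCounter and its parameters are hypothetical.

```java
import java.util.concurrent.atomic.LongAdder;

final class StripedCounter {
    /** Increments one shared LongAdder from several threads and returns the final sum. */
    static long countInParallel(int threads, int incrementsPerThread) {
        LongAdder counter = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment(); // contended updates land in different cells
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.sum(); // exact here because all writers have finished
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(4, 100_000));
    }
}
```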

file/block cache size is not enough?


techniques

increase memory for caches
don't call flush() too often

try giving up interthread communication altogether

Thread Locals
Check that your threads don't share a java.util.Random object
Use java.util.concurrent.ThreadLocalRandom
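The cost of frequent flush() calls can be made visible by counting how many writes actually reach the underlying stream. In the sketch below, CountingSink stands in for a real file or socket; both class names and the 256-byte buffer size are assumptions for the example.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class FlushCost {
    /** Stand-in for a slow device: it only counts the write calls it receives. */
    static final class CountingSink extends OutputStream {
        int calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    /** Writes 'bytes' single bytes through a 256-byte buffer and reports the device writes. */
    static int deviceWrites(int bytes, boolean flushEveryByte) {
        CountingSink sink = new CountingSink();
        try (OutputStream out = new BufferedOutputStream(sink, 256)) {
            for (int i = 0; i < bytes; i++) {
                out.write(i);
                if (flushEveryByte) {
                    out.flush(); // forces the buffered byte down to the device immediately
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println("flush per byte: " + deviceWrites(1000, true) + " device writes");
        System.out.println("flush on close: " + deviceWrites(1000, false) + " device writes");
    }
}
```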
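ThreadLocalRandom (available since Java 7) gives every thread its own generator, so no state is shared between threads. A minimal sketch; the class PerThreadRandom and the die-roll example are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

final class PerThreadRandom {
    /** Returns a value in 1..6 from the calling thread's own generator. */
    static int rollDie() {
        // Always call current() at the point of use; do not cache it in a shared field.
        return ThreadLocalRandom.current().nextInt(1, 7);
    }

    public static void main(String[] args) {
        Runnable roller = () ->
                System.out.println(Thread.currentThread().getName() + " rolled " + rollDie());
        new Thread(roller, "worker-1").start();
        new Thread(roller, "worker-2").start();
    }
}
```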

vmstat mpstat jstack
add/increase parallelization in your application
switch off CMT (chip multithreading) (?)
USE TOOLS
lock profilers (jrmc, etc)
jstack (it will show only very big locks)
USE TOOLS
wait locks
the number of RUNNABLE threads is not enough?
vtune
USE TOOLS
too few GC threads? (a rare case)
USE TOOLS
solstudio perf hardware counters
overclocking
the frequency is not enough?
tune cpufreq
check for the "ondemand" mode and change it if it is turned on
Libraries
https://github.com/peter-lawrey/Java-Thread-Affinity
numastat
try restricting communication between cores, packages, data centers
JVM KEY: -XX:+UseNUMA
USE TOOLS
the number of threads is not enough?
check for "false sharing" (cores working with the same cache line)
add object padding

ARM (32 bytes)
x86/SPARC/ARM (64 bytes)
PowerPC (128 bytes)
@sun.misc.Contended
Break into independent objects and add padding
Java 1.8
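Manual padding is usually done by surrounding the hot field with unused long fields so that two hot fields cannot land on the same cache line. The sketch below assumes a 64-byte line and Java 8 or newer; the class names are hypothetical, and because the JVM decides the real field layout, the padding is a best-effort trick rather than a guarantee.

```java
final class PaddedCounters {
    /** A counter followed by 7 unused longs (56 bytes), assuming a 64-byte cache line. */
    static final class PaddedLong {
        volatile long value;
        long p1, p2, p3, p4, p5, p6, p7; // padding only, never read
    }

    /** Two threads each increment their own counter; returns both final values. */
    static long[] countInParallel(int increments) {
        PaddedLong a = new PaddedLong();
        PaddedLong b = new PaddedLong();
        Thread ta = new Thread(() -> {
            for (int i = 0; i < increments; i++) {
                a.value++; // single writer per counter, so no lost updates
            }
        });
        Thread tb = new Thread(() -> {
            for (int i = 0; i < increments; i++) {
                b.value++;
            }
        });
        ta.start();
        tb.start();
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {a.value, b.value};
    }

    public static void main(String[] args) {
        long[] result = countInParallel(1_000_000);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

Whether the padding helps can be checked by timing the run with and without the p1..p7 fields.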

too big %idle (mpstat)

NUMA(NUCA) Non-Uniform Memory Access

Fractal structure
try switching on NUMA

it partly works under Windows (only interleaving)!

try to optimize locks and decrease their number
try lock-free algorithms and data structures
JVM KEY: -verbose:gc

try increasing the number of GC threads
decrease GC pauses

lock critical threads to a CPU (thread affinity)

CPU problems?

Remember that waking up a thread in Java is an expensive operation: it takes about 50 µs!
try special code
go to native code (JNI)
make your own intrinsics for the JIT
cryptoprocessors
GPU
Java->JNI calls are faster than JNI->Java ones
too hardcore a solution!

the number of execution units is not enough?

try special equipment
add more CPUs

Use Fork/Join to parallelize your tasks
try decreasing the number of branches
limited ILP (instruction level parallelism)?
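With the Fork/Join framework a task splits itself until the pieces are small enough to compute directly. The sketch below sums a range of numbers that way; the class RangeSum and the threshold of 10000 elements are assumptions for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this size, sum sequentially
    private final long from; // inclusive
    private final long to;   // exclusive

    RangeSum(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (long i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        long mid = from + (to - from) / 2;
        RangeSum left = new RangeSum(from, mid);
        RangeSum right = new RangeSum(mid, to);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute the right half in this thread
    }

    /** Sums the integers in [from, to) on a temporary pool. */
    static long sum(long from, long to) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new RangeSum(from, to));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 1_000_001));
    }
}
```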

Since Java 1.7

try rewriting code to decrease LSD (Loop Stream Detector)
make data loosely coupled

it's a very hard approach and mostly impossible from Java
