Java Performance Mindmap
JVM KEY: -verbose:gc
USE TOOLS
JVM KEY: -XX:+PrintGCDetails
VisualGC
try changing the thread stack size (each thread takes 200-300 KBytes for its stack by default)
JVM KEY: -Xss<size>
Hack: you can try to replace the HotSpot DLL in 1.6 with HotSpot v23 from JVM v7
The JVM v6 has HotSpot v20; this HotSpot version lacks many critical performance fixes!
try increasing the heap size
JVM KEY: -Xmx<size>
the more memory the GC has, the better it works!
The default GC is chosen according to your platform characteristics! In JVM 6 and 7, CMS and G1 are never chosen as defaults (G1 became the default only in JDK 9)!
Serial GC
JVM KEY: -XX:+UseSerialGC
JVM KEY: -XX:+UseParallelGC
JVM KEY: -XX:+UseParallelOldGC
CMS (Concurrent Mark Sweep)
the only GC that supports NUMA!
it didn't turn on the old gen collection until 7u2
it will be chosen automatically since 7u2
Doesn't unload classes by default
JVM KEY: -XX:+CMSClassUnloadingEnabled
Parallel GC
set the allowed time interval between GCs (a soft parameter)
good productivity is desired
Throughput
Latency
Glossary
pause length < 0.5-1 s
minimal tuning is desired
the heap size is more than 5-8 GB
more than 50% of the heap is used
http://www.amazon.com/gp/product/0596003773
http://www.amazon.com/Java-Performance-Charlie-Hunt/dp/0137142528
Books
GC Oracle HotSpot
the object allocation rate varies widely (for instance, 500 MB/sec by day, 10 MB/sec at night)
try changing the GC algorithm
less heap fragmentation is desired (fewer Full GCs)
your currently chosen GC works well
not recommended if...
there is a strict requirement for pauses shorter than 100 ms
the maximal throughput is desired (use ParallelGC)
common GC tuning
is your heap smaller than 2 GB?
do you need shorter pauses?
do you have a strict requirement for pauses < 20-30 ms?
Materials
Java Performance
don't be afraid to use GC logging in production! the possible overhead is very low!
JVM KEY: -XX:+PrintGCDetails
JVM KEY: -XX:+PrintGCTimeStamps
JVM KEY: -XX:+PrintGCDateStamps
Sergey Kuksenko
http://ru.linkedin.com/pub/sergey-kuksenko/0/b49/b81
JVM KEY: -XX:+PrintHeapAtGC
JVM KEY: -XX:+PrintTenuringDistribution
JVM KEY: -Xloggc:<file>
PrintGCStats
Authors
DeterministicGC
Vladimir Ivanov
Garbage Collector
USE TOOLS
JVM KEY: -XX:+PrintCompilation
MXBeans (VisualVM)
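The MXBeans that VisualVM displays can also be read from inside the application. The following is a minimal sketch using the standard `java.lang.management` API; the class name `JvmStats` and its method names are hypothetical and chosen only for illustration.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads JIT and GC counters through the platform MXBeans.
class JvmStats {

    /** Total JIT compilation time in milliseconds, or -1 if the JVM does not report it. */
    static long jitTimeMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    /** Sum of collection counts over all collectors; undefined (-1) counts are skipped. */
    static long gcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("JIT ms: " + jitTimeMillis() + ", GC count: " + gcCount());
    }
}
```

Sampling these values before and after a workload gives a rough picture of how much time went to compilation and how many collections ran, without attaching an external tool.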
Materials
Igor Maznitsa
http://www.igormaznitsa.com http://ru.linkedin.com/in/igormaznitsa
try changing the work mode (select the server mode for the JVM)
It's a good idea to make warm-up calls so that your methods get compiled into native code
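A warm-up can be as simple as calling the hot method in a loop before real traffic arrives. The sketch below assumes a hypothetical hot method `hot` and leaves the iteration count to the caller; the number of calls HotSpot needs before it compiles a method depends on the JVM mode and flags, so any concrete count is an assumption to be checked with `-XX:+PrintCompilation`.

```java
// Hypothetical example: WarmUp and hot() are illustration names, not part of any library.
class WarmUp {

    /** Stand-in for a performance-critical method: sum of squares below n. */
    static long hot(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    /**
     * Calls the hot method repeatedly so the JIT has a chance to compile it.
     * The result is returned so the calls cannot be discarded as dead code.
     */
    static long warmUp(int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += hot(100);
        }
        return sink;
    }

    public static void main(String[] args) {
        // 20000 is an assumed iteration count, not a documented threshold.
        System.out.println("warm-up sink: " + warmUp(20_000));
    }
}
```

Warm-up should exercise the same code paths and data shapes as production calls, otherwise the JIT may optimize for the wrong profile.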
JVM KEY: -XX:InlineSmallCode=<size of native code in bytes>
JVM KEY: -XX:FreqInlineSize=<bytecode size>
The default value varies with the platform on which the JVM is running.
JVM KEY: -client
sudo apt-get install sysstat
mpstat - Report processors related statistics
Must be the first switch provided on the command line. Choose this mode if you need faster startup
CPU
Oracle JVM has a lot of keys to tune JIT compilation
JVM KEY: -verbose:class
MXBeans
JVM KEY: -noverify
JVM KEY: -Xshare:on
netstat - Print network connections, routing tables, interface statistics, masquerade connections
sudo apt-get install iptraf
sudo apt-get install bwm-ng
iptraf - Interactive Colorful IP LAN Monitor
USE TOOLS
Network and IO
Classload
bwm-ng - a live bandwidth monitor for network and disk I/O
sar - collect, report, or save system activity information
top - display Linux tasks
strace - trace system calls and signals
try disabling class verification
try switching on class data sharing
Ubuntu
USE TOOLS
Profilers
System
simplify your algorithms
try algorithms with smaller "performance constants" (ArrayList instead of LinkedList, for instance)
check that you have caching in appropriate places
maybe it is better to create new objects instead of caching in your case?
oprofile - a system-wide profiler for Linux systems
vmstat - Report virtual memory statistics
sudo apt-get install numactl
numastat - Print statistics about NUMA memory allocation
TOOLS
algorithmic problems?
data (anti)caching polling
active
idle
http://java.net/projects/gchisto
http://java.net/projects/printgcstats/
GChisto - a garbage collection log visualization tool
PrintGCStats - a tool to report garbage collection statistics from HotSpot GC
VisualGC - a plugin for VisualVM
jstack - Stack Trace
jrmc - Oracle JRockit Mission Control profiler
GC
Java
Executor
http://visualvm.java.net/
Java Performance Mind-Map (the info is compiled from presentations and the internet.) IT IS NOT AN OFFICIAL DOCUMENT AND MAY HAVE ERRORS! USE IT AT YOUR OWN RISK!
v 1.0.8
spinloops
They look like 100% CPU loading
solstudio
USE TOOLS
vtune
perf
hardware counters
try large memory pages
Out of memory, but it does not grow
JVM KEY: -XX:+UseLargePages
JVM KEY: -XX:PermSize
JVM KEY: -XX:MaxPermSize
JVM KEY: -XX:+UseConcMarkSweepGC
It grows and runs out of memory
JVM KEY: -XX:+CMSPermGenSweepingEnabled
JVM KEY: -XX:+CMSClassUnloadingEnabled
netstat
sar
iptraf
bwm-ng
check network cables and their characteristics!
try decreasing the number of read/write operations
try decreasing the data size of read/write operations
try data compression
try buffering, Bandwidth-Delay Product, MTU
try changing network interfaces to faster ones
try using virtual network interfaces (move your application components into the cloud)
vmstat
mpstat
USE TOOLS
USE TOOLS
busstat (Solaris)
memory bandwidth
try faster memory (dual-channel memory instead of single-channel memory)
several channels in the IMC (integrated memory controller)
solstudio
USE TOOLS
vtune
perf
hardware counters
try compressed 32-bit pointers
JVM KEY: -XX:+UseCompressedOops
for 64-bit systems!
USE TOOLS
(values of -XX:AllocatePrefetchStyle)
0 - no prefetch instructions are generated
1 - execute prefetch instructions after each allocation (DEFAULT)
2 - use TLAB allocation watermark pointer to gate when prefetch instructions are executed
JVM KEY: -XX:AllocatePrefetchLines=<number of lines>
USE TOOLS
try adding physical memory
try decreasing memory per process
swappiness
swapping?
JVM KEY: -XX:AllocatePrefetchDistance=<distance in bytes>
try enabling/disabling the hardware prefetcher
Temporal locality
Spatial locality
try block decompositions
try more compact data structures
Complex java.util collections may take 14-30 times more memory per item than the primitive representation!
since Java 6u21
since Java 6u20
JVM KEY: -XX:+UseCompressedStrings
strings
JVM KEY: -XX:+OptimizeStringConcat
JVM KEY: -XX:+UseStringCache
Java profilers
USE TOOLS
perf
solstudio
hardware counters
mpstat sar
plain shared memory (no guarantees at all)
HB (happens-before) via volatile (guarantees that changes are visible)
device communication?
Atomics (CAS) (guarantee of atomic changes)
Spin-loops
Spin-locks
Locks
Wait-locks
They generate 100% CPU loading
synchronized
java.util.concurrent.locks.ReentrantLock
There is a bug in java.util.concurrent.locks.ReentrantReadWriteLock; the bug has been fixed since 7b25!
consistency
try balancing IRQ processing; maybe only one CPU handles interrupts in the system
check the number of timers in your system
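Two of the guarantees listed above can be sketched in a few lines: visibility through a volatile write/read pair, and atomic updates through a CAS retry loop. The class names `Publisher` and `MaxTracker` are hypothetical; only `volatile` and `java.util.concurrent.atomic.AtomicLong` come from the platform.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example: happens-before via a volatile flag.
class Publisher {
    private int data;               // plain field: no visibility guarantee on its own
    private volatile boolean ready; // volatile write/read creates the happens-before edge

    void publish(int value) {
        data = value;  // ordinary write ...
        ready = true;  // ... published by the volatile write that follows it
    }

    int awaitAndRead() {
        while (!ready) {     // volatile read; this is a spin-loop, so it burns CPU while waiting
            Thread.yield();
        }
        return data;         // sees the value written before ready was set
    }
}

// Hypothetical example: lock-free maximum maintained with compare-and-set.
class MaxTracker {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    void offer(long candidate) {
        long current;
        do {
            current = max.get();
            if (candidate <= current) {
                return;      // nothing to update
            }
        } while (!max.compareAndSet(current, candidate)); // retry if another thread won the race
    }

    long get() {
        return max.get();
    }
}
```

The spin in `awaitAndRead` is only acceptable for very short waits; for longer waits a blocking primitive avoids the 100% CPU load mentioned above.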
system items?
USE TOOLS
use a light condition check before hard operations
it doesn't work?
it's a hard operation
Locks
top sar
USE TOOLS
Thread Locals
Check that your threads don't share a java.util.Random object
Use java.util.concurrent.ThreadLocalRandom
vmstat
mpstat
jstack
add/increase parallelization in your application
switch off CMT (chip multithreading) (?)
USE TOOLS
lock profilers (jrmc, etc.)
jstack (it will show only very big locks)
USE TOOLS
wait locks
RUNNABLE thread number is not enough?
vtune
USE TOOLS
too few GC threads? (a rare case)
USE TOOLS
solstudio
perf
hardware counters
overclocking
the frequency is not enough?
tune cpufreq
check for the "ondemand" mode and change it if it is turned on
Libraries
https://github.com/peter-lawrey/Java-Thread-Affinity
numastat
try restricting communication between cores, packages, data centers
JVM KEY: -XX:+UseNUMA
USE TOOLS
thread number is not enough?
check for "false sharing" (cores working with the same cache line)
make object padding
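A shared `java.util.Random` makes all threads contend on one seed; `ThreadLocalRandom` gives each thread its own generator. A minimal sketch, with the hypothetical class name `Dice`:

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical example: per-thread random numbers without a shared Random instance.
class Dice {

    /** Returns a value in the range 1..6 using the calling thread's own generator. */
    static int roll() {
        // current() must be called on the thread that uses the generator;
        // do not cache the returned instance in a shared field.
        return ThreadLocalRandom.current().nextInt(1, 7);
    }

    public static void main(String[] args) {
        System.out.println("rolled: " + roll());
    }
}
```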
ARM (32 bytes)
x86/SPARC/ARM (64 bytes)
PowerPC (128 bytes)
@sun.misc.Contended
Break into independent objects and padding
Java 1.8
try to optimize locks and decrease their number
try lock-free algorithms and data structures
JVM KEY: -verbose:gc
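Manual padding can be sketched with a small class hierarchy, so that unused `long` fields sit on both sides of the hot field. This assumes 64-byte cache lines and relies on HotSpot placing superclass fields before subclass fields, which is an implementation detail and not a language guarantee; all class and field names here are hypothetical.

```java
// Hypothetical example of padding against false sharing (assumes 64-byte cache lines).
class LeftPadding {
    long p1, p2, p3, p4, p5, p6, p7;   // 56 bytes of padding before the hot field
}

class HotValue extends LeftPadding {
    volatile long value;                // the field that is written frequently
}

class PaddedCounter extends HotValue {
    long q1, q2, q3, q4, q5, q6, q7;   // 56 bytes of padding after the hot field

    /** Single-writer increment: the read-modify-write on a volatile is not atomic. */
    void add(long delta) {
        value += delta;
    }

    long get() {
        return value;
    }
}
```

Two such counters updated by different threads are then less likely to share a cache line. Whether the layout really came out as intended should be verified with a layout-inspection tool or with hardware counters, since the JVM is free to arrange fields differently.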
CPU problems?
Remember that waking up a thread in Java is an expensive operation; it takes about 50 µs!
try special code
go to native code (JNI)
make your own intrinsics for the JIT
cryptoprocessors
GPU
Java->JNI calls are faster than JNI->Java ones
too hardcore a solution!
Use Fork/Join to parallelize your tasks
try decreasing the number of branches
limited ILP (instruction level parallelism)?
try rewriting code to decrease LSD (Loop Stream Detector)
make data loosely coupled
it's a very hard approach and mostly impossible from Java
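Splitting a task with the Fork/Join framework might look like the sketch below, which sums an array by recursive halving. The class name `SumTask` and the cutoff value are assumptions for illustration; `ForkJoinPool.commonPool()` requires Java 8, and on Java 7 a `new ForkJoinPool()` would be used instead.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: parallel array sum with Fork/Join.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed cutoff; tune by measurement

    private final long[] values;
    private final int from;
    private final int to;   // exclusive

    SumTask(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;                       // small enough: sum sequentially
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(values, from, mid);
        left.fork();                            // run the left half asynchronously
        long right = new SumTask(values, mid, to).compute(); // right half in this thread
        return right + left.join();             // wait for the left half and combine
    }

    static long sum(long[] values) {
        return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, values.length));
    }
}
```

Computing one half directly and forking only the other keeps the current worker busy and halves the number of submitted tasks.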