Computing Infrastructure and Performance For CAE: George Chaltas Intel Corporation July 2008
[Figure: Processor; labels: TICK, Pentium D, Xeon; process nodes 32nm, 45nm, 65nm]
Copyright 2008 Intel Corporation
45nm Advantage
Intel Xeon 5300 Processor (Clovertown), 65nm: 582m Transistors, 8 MB Cache, 143 mm²*
Intel Xeon 5400 Processor (Harpertown), 45nm Hi-k: 820m Transistors, 12 MB Cache, 107 mm²*
Quad-Core Intel Xeon Processor 5400 Series Performance
Quad-Core Intel Xeon E5472 performance is 19% to 26% faster than previous quad core processor
ANSYS* Mechanical* 11.0, relative performance (higher is better):
    Quad-Core Intel Xeon X5365 / 1333 FSB: 1.00
    Quad-Core Intel Xeon E5472 / 1600 FSB: 1.19
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests.
Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of
systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit
http://www.intel.com/performance/resources/limits.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104. Copyright 2007, Intel Corporation. * Other names and brands may be
claimed as the property of others.
FLUENT* 6.3.26, relative performance (higher is better):
    Quad-Core Intel Xeon X5365 / 1333 FSB: 1.00
    Quad-Core Intel Xeon E5472 / 1600 FSB: 1.26
Data Source: Approved/published results as of Nov 11, 2007 (Fluent) and Nov 28, 2007 (ANSYS) using standard benchmarks, configurations on slide 30.
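The "19% to 26% faster" statement follows directly from the relative performance ratios on this slide. A minimal sketch of that arithmetic in Python, using only the chart values quoted above; the function name `percent_gain` is illustrative and not from the source:

```python
def percent_gain(baseline: float, new: float) -> float:
    """Percentage improvement of `new` over `baseline` for a higher-is-better score."""
    return (new / baseline - 1.0) * 100.0


# Relative performance values from the slide (X5365 is the 1.00 baseline).
results = {
    "ANSYS Mechanical 11.0": (1.00, 1.19),
    "FLUENT 6.3.26": (1.00, 1.26),
}

for name, (x5365, e5472) in results.items():
    print(f"{name}: E5472 is {percent_gain(x5365, e5472):.0f}% faster than X5365")
```

The same helper applies to the three-bar charts on the following slides, where the gain is taken between the two quad-core parts rather than against the dual-core baseline.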
Quad-Core Intel Xeon Processor 5400 Series Performance
Quad-Core Intel Xeon E5472 performance is 13% - 24% faster than previous quad core processor
Abaqus/Standard v6.7-2, relative performance (higher is better):
    Dual-Core Intel Xeon 5160 / 1333 FSB: 1.00
    Quad-Core Intel Xeon X5365 / 1333 FSB: 1.32
    Quad-Core Intel Xeon E5472 / 1600 FSB: 1.48 (+13%)
Abaqus/Explicit* v6.7-2, relative performance (higher is better):
    Dual-Core Intel Xeon 5160 / 1333 FSB: 1.00
    Quad-Core Intel Xeon X5365 / 1333 FSB: 1.41
    Quad-Core Intel Xeon E5472 / 1600 FSB: 1.74 (+24%)
Data Source: Approved/published results as of Nov 1, 2007 using standard Simulia benchmarks, configurations on slide 30.
Quad-Core Intel Xeon Processor 5400 Series Performance
Quad-Core Intel Xeon E5472 performance is 15% - 27% faster than previous quad core processor
PAM-CRASH* v2006.0, relative performance (higher is better):
    Dual-Core Intel Xeon 5160 / 1333 FSB: 1.00
    Quad-Core Intel Xeon X5365 / 1333 FSB: 1.40
    Quad-Core Intel Xeon E5472 / 1600 FSB: 1.78 (+27%)
PowerFLOW* 4.0a, relative performance (higher is better):
    Dual-Core Intel Xeon 5160 / 1333 FSB: 1.00
    Quad-Core Intel Xeon X5365 / 1333 FSB: 1.58
    Quad-Core Intel Xeon E5472 / 1600 FSB: 1.81 (+15%)
Data Source: Approved/published results as of Nov 11, 2007 using standard benchmarks, configurations on slide 30.
Quad-Core Intel Xeon Processor 5400 Series Performance
Quad-Core Intel Xeon E5472 performance is about 22% faster than previous quad core processor
LS-DYNA* 3 Vehicle Collision, elapsed time in seconds (lower is better):
    Dual-Core Intel Xeon 5160 / 1333 FSB: 39416
    Quad-Core Intel Xeon X5365 / 1333 FSB: 30198
    Quad-Core Intel Xeon E5472 / 1600 FSB: 23560 (+22%)
LS-DYNA* Neon Refined Revised, elapsed time in seconds (lower is better):
    Dual-Core Intel Xeon 5160 / 1333 FSB: 2885
    Quad-Core Intel Xeon X5365 / 1333 FSB: 2212
    Quad-Core Intel Xeon E5472 / 1600 FSB: 1715 (+22%)
Data Source: Published/submitted results as of Nov 11, 2007 on www.topcrunch.org, configurations on slide 30.
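Because the LS-DYNA charts report elapsed time (lower is better), the "+22%" annotation reads most naturally as a reduction in run time rather than a throughput ratio. The sketch below checks that reading. It assumes the times pair with the processors as X5365 = 30198 s and E5472 = 23560 s for 3 Vehicle Collision, and X5365 = 2212 s and E5472 = 1715 s for Neon Refined Revised; that pairing is inferred from the annotation, and the helper names are hypothetical.

```python
def time_reduction_pct(old_seconds: float, new_seconds: float) -> float:
    """Percentage of elapsed time saved when going from old to new."""
    return (old_seconds - new_seconds) / old_seconds * 100.0


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Throughput ratio: how many times faster the new run completes."""
    return old_seconds / new_seconds


# Assumed pairing of elapsed times (seconds): (X5365, E5472).
workloads = {
    "3 Vehicle Collision": (30198, 23560),
    "Neon Refined Revised": (2212, 1715),
}

for name, (x5365_s, e5472_s) in workloads.items():
    print(
        f"{name}: {time_reduction_pct(x5365_s, e5472_s):.1f}% less time, "
        f"{speedup(x5365_s, e5472_s):.2f}x speedup"
    )
```

Under these assumptions both workloads show roughly a 22% reduction in elapsed time, which corresponds to a speedup of about 1.28x to 1.29x. The two measures are not interchangeable, so it helps to state which one a percentage refers to.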
Tukwila for the World's Most Demanding Computers
Quad-core with 30 MB cache
2 billion transistors
Multi-threading technology
Intel QuickPath interconnect
Dual integrated memory controllers
Estimate >2X* performance
Mainframe-class RAS
*Compared to Dual-core Itanium Processor 9100 series
"HP has already successfully booted four key operating systems (Linux, Windows, HP-UX and OpenVMS) on our Tukwila-based Integrity servers and have found the initial silicon to be robust and of high quality."
Martin Fink, Senior VP & GM, Business Critical Systems, HP
Next Generation Nehalem Processor
KEY FEATURES and BENEFITS
Greater Instruction per clock and improved cache hierarchy
    Benefit: Faster Processing / core
Simultaneous Multi-Threading
    Benefit: Two Threads / core
45nm Intel multi-core processors (2 and 4 core implementations planned)
    Benefit: Energy efficient multi-core processing
Dynamic Resource Scaling: Any unneeded cores automatically put into sleep mode; remaining operating cores get access to ALL cache, bandwidth and power/thermal budgets
    Benefit: Lower power consumption during periods of low utilization
Turbo Mode: CPU operates at higher-than-stated frequency when operating below power and thermal design points
    Benefit: Additional Processing boost during peak demand periods
Source: Intel. All future products, computer systems, dates, and figures specified are preliminary based
on current expectations, and are subject to change without notice.
Faster cores, More cores/threads, Dynamically adaptable
Intel Software Advantage
Intel works directly with CAE software vendors.
Intel provides software development tools for High Performance Computing
Enables best performance on Intel Architecture workstations and servers today.
Enables the definition of and adoption of new platform and processor features tomorrow.
Software leadership in HPC
Intel has a rich and strong SW tools portfolio for HPC
Intel valued supplier to the industry, including the very popular parallelism support from Intel Threading Building Blocks
Performance & Scalability
Cluster Interconnect
File System
Model Decomposition
Model Complexity
Software Improvements
Interconnect Performance Impact
Ethernet vs. InfiniBand* Scaling Efficiency
FLUENT* 6.5.35, Sedan 4M Model, Intel "Endeavor" Cluster
[Chart: Scaling Efficiency (0.0% to 120.0%) vs. Cores (8, 16, 32, 64, 128; 8 cores per node); series: GigE, InfiniBand; annotation: 4x with 16 nodes]
Source: Intel internal measurements, configuration on slide 29
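The slide does not define scaling efficiency. A common definition, assumed here, is the ratio of the ideal elapsed time (the baseline time scaled down in proportion to the added cores) to the measured elapsed time. The sketch below uses that definition with an 8-core baseline, matching the first point on the chart. The timings are hypothetical placeholders, not measurements from the deck.

```python
def scaling_efficiency(base_cores: int, base_time: float,
                       cores: int, time: float) -> float:
    """Ideal time at `cores` (perfect linear scaling) divided by measured time."""
    ideal_time = base_time * base_cores / cores
    return ideal_time / time


# Hypothetical elapsed times in seconds, keyed by core count (NOT from the source).
timings = {8: 1000.0, 16: 520.0, 32: 280.0, 64: 160.0, 128: 100.0}

base_cores = 8
base_time = timings[base_cores]
for cores, elapsed in timings.items():
    eff = scaling_efficiency(base_cores, base_time, cores, elapsed)
    print(f"{cores:>4} cores: {eff:.1%} scaling efficiency")
```

With this definition, 100% means perfect linear scaling from the baseline. Values above 100% are possible when a larger core count lets the working set fit in cache, which may be why the chart axis extends to 120%.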
File System Performance Impact
Simulia* ABAQUS*/Standard 6.7-EF1 (Implicit Solver), 5M Element model, Intel "Endeavor" Cluster
[Chart: Relative Performance (higher is better), 0.0 to 3.5, vs. Cores (32p, 64p, 128p; 8 cores per node); series: default/Ethernet, Lustre*/InfiniBand*; data labels: 18%, 19.7%, 17%]
Source: Intel internal measurements, configuration on slide 29
Model Decomposition Affects
Scalability & Performance
LS-DYNA mpp971.2.7600.R3.1.sp, 3 Vehicle Collision, approx. 796K elements, 150ms
LS-DYNA mpp971.2.7600.R2.sp, Customer Problem, approx. 225K elements, 280ms
Intel Atlantis Cluster
3 Vehicle Collision
[Chart: Elapsed Time in seconds (lower is better), axis 0.0 to 1.2, and Relative Performance (higher is better), axis 0 to 1.6, vs. cores (16p, 32p, 64p, 128p, 256p); series: Default, Optimal, Relative Performance; data labels in extraction order: 1.00, 0.52, 0.18, 0.47, 0.26, 0.15, 0.09, 0.29, 0.13, 0.96]
Customer Problem
[Chart: Relative Elapsed Time (lower is better), axis 0 to 1.2, and Relative Performance (higher is better), axis 0 to 6, vs. cores (4p, 8p, 16p); series: Default, Optimal, Relative Performance; data labels in extraction order: 1.00, 0.76, 0.26, 0.11, 0.56, 0.62]
Source: Intel internal measurements, configuration on slide 29
Software & Model Impact on Scalability
Scaling Efficiency Improvements in Software and Model Size
FLUENT* 12.0 Beta and FLUENT 6.3.35
Intel's "Endeavor" Cluster, InfiniBand*
[Chart: Scaling Efficiency (0% to 120%) vs. Cores (8, 16, 32, 64, 128, 256, 512, 768, 1024, 1536; 8 cores per node); series: Truck 14M - 12 Beta, Truck 111M - 12 Beta, Truck 14M - 6.3.35, Truck 111M - 6.3.35]
Source: Intel internal measurements, configuration on slide 29
CAE Performance Factors
Processors & Platforms
    Number & performance of cores
    Reduced system bottlenecks
    Efficient Platforms
Clusters
    Interconnect latency & bandwidth
    I/O Performance
Software
    Optimization for target processors
    Design for scalability
Models
    Model partitioning
    Model size
Summary
CAE Complexity has Grown
High Performance Computing Has Grown
CAE Performance derives from Processors, Platforms, Clusters, Software and Models
Slide 29
Intel Cluster Configurations Overview
Nodes / cores
    Endeavor: 256 / 2048
    Atlantis: 64 / 512
Platform
    Endeavor: Intel Server System S1560SF; Intel Server Board S5400SF; 1U DP server
    Atlantis: Intel Server System S1560SF; Intel Server Board S5400SF; 1U DP server
CPU / Stepping
    Endeavor: Intel Xeon Processor E5462; C0 step (Harpertown); 2.8 GHz / 12 MB L2 cache; 1600 MHz FSB
    Atlantis: Intel Xeon Processor X5482; C0 step (Harpertown); 3.2 GHz / 12 MB L2 cache; 1600 MHz FSB
RAM
    Endeavor: 16 GB / node: FBDIMM 16x1GB 667MHz
    Atlantis: 16 GB / node: FBDIMM 8x2GB 667MHz
Hard drive
    Endeavor: 250GB SATA HDD
    Atlantis: 250GB SATA HDD
Cluster File System Abstract
    Endeavor: Panasas*, 7 shelves, 35 TB storage; Lustre*, 2.7 TB
    Atlantis: Panasas, 4 shelves, 13 TB storage
Interconnects
    Endeavor: GigE, IB
    Atlantis: GigE, IB
GigE Switch detail
    Endeavor: Cisco Catalyst* 4510, 336 ports
    Atlantis: Cisco Catalyst 4506, 144 ports
IB switch
    Endeavor: Cisco SFS 7024D (DDR), 288 ports
    Atlantis: SilverStorm 9080 (DDR), 96 ports
IB adapters
    Endeavor: Mellanox* MHGH28-XTC; PCI-E x8; Dual DDR InfiniBand* 4x
    Atlantis: Mellanox MHGH29-XTC; PCI-E x8; Dual DDR InfiniBand 4x
OS / IB stack
    Endeavor: RedHat* EL4 update 4; OFED 1.3
    Atlantis: RedHat EL4 update 4; OFED 1.2.5.5
Slide 30
Server Configurations
Dual-Core Intel Xeon processor 5160 based platform details: Supermicro* X7DB8+ server
platform with two Dual-Core Intel Xeon processors 5160 3.00GHz, 4MB L2 cache, 1333MHZ FSB,
16GB memory (8x2GB FBD 667MHz), 64-bit RedHat Enterprise* Linux* 4 Update 4.
Quad-Core Intel Xeon processor X5365 based platform details: Supermicro X7DB8+ server
platform with two Quad-Core Intel Xeon processors X5365 3.00GHz, 2x4MB L2 cache, 1333MHZ FSB,
16GB memory (8x2GB FBD 667MHz), 64-bit RedHat Enterprise Linux 4 Update 4.
Quad-Core Intel Xeon processor E5472 based platform details: Supermicro X7DWA-N server
platform with two Quad-Core Intel Xeon processors E5472 3.00GHz, 2x6MB L2 cache, 1600MHZ FSB,
16GB Memory (8x2GB FBD 800MHz), 64-bit RedHat Enterprise Linux 4 Update 4.
Benchmark Information: Further details are available at http://www.intel.com/performance/server/
PowerFLOW Benchmark description:
Geometric mean of two standard benchmarks: External Case 1 [19 state] and External Case 2 [19
state].
PAM-CRASH Benchmark description:
Standard frontal crash test from USNCAP consortium. Model Chrysler Neon* with 300K and 1M
elements.
LS-DYNA* Benchmark description: The workloads used in these comparisons are called
neon_refined_revised and 3 Vehicle Collision and are publicly available from www.topcrunch.org. The
metric for the benchmark is elapsed time in seconds (lower is better).
ANSYS* Benchmark description: The benchmark suite of 15 workloads covers a representative set of
structural analysis solvers and analysis types. Quad-core results are based on 8-process parallel
execution.
FLUENT benchmark description: The benchmark suite of 15 real-world cases covers a
representative set of CFD analysis types and simulation model sizes. Quad core results are based on 8-
process parallel FLUENT. See http://www.fluent.com/software/fluent for details.
Abaqus* benchmark description: Abaqus/Standard benchmarks include linear statics, nonlinear
statics, and natural frequency extraction workloads. Abaqus/Explicit benchmarks include workloads
modeling high-speed dynamic impact events and quasi-static events with complicated contact
conditions.
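The PowerFLOW description above reports a geometric mean of two benchmark cases. As a minimal sketch of how such a composite score could be formed, the snippet below computes a geometric mean in Python. The per-case scores are hypothetical placeholders, since the deck publishes only the combined result, and the function name is illustrative.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space for stability."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical relative-performance scores for External Case 1 and External Case 2.
case1, case2 = 1.20, 1.10
score = geometric_mean([case1, case2])
print(f"Composite score: {score:.3f}")
```

A geometric mean is a common choice for combining ratios because one unusually large case result influences it less than it would an arithmetic mean.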