Computing Infrastructure and Performance For CAE: George Chaltas Intel Corporation July 2008


Slide 1

Copyright 2008 Intel Corporation


Computing Infrastructure and Performance for CAE
George Chaltas
Intel Corporation
July 2008
Slide 2
Legal Disclaimers
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate
performance of Intel products as measured by those tests. Any difference in system hardware or software design or
configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of
systems or components they are considering purchasing. For more information on performance tests and on the performance of
Intel products, visit http://www.intel.com/performance/resources/limits.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104.
All dates and products specified are for planning purposes only and are subject to change without notice.
Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual
benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and
assigning them a relative performance number that correlates with the performance improvements reported.
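The relative-performance definition above normalizes every benchmark result by the baseline platform's result. Below is a minimal Python sketch of that division. The function name, the `higher_is_better` switch and the platform scores are hypothetical. The disclaimer only describes dividing higher-is-better scores, so inverting the ratio for elapsed-time metrics is an added convention.

```python
from typing import Dict


def relative_performance(results: Dict[str, float], baseline: str,
                         higher_is_better: bool = True) -> Dict[str, float]:
    """Normalize benchmark results so the baseline platform scores 1.0.

    Higher-is-better scores are divided by the baseline score. For
    lower-is-better metrics such as elapsed time (an assumed extension),
    the ratio is inverted so a faster platform still scores above 1.0.
    """
    base = results[baseline]
    if higher_is_better:
        return {name: value / base for name, value in results.items()}
    return {name: base / value for name, value in results.items()}


# Hypothetical benchmark scores, for illustration only
scores = {"platform_a": 40.0, "platform_b": 52.0, "platform_c": 60.0}
print(relative_performance(scores, baseline="platform_a"))
```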
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor
series, not across different processor sequences. See http://www.intel.com/products/processor_number for details.
Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear
facility applications.
Intel, Intel Xeon, Intel Core microarchitecture, Intel Pentium-D, Intel. Leap ahead. logo, Xeon Inside logo and the Itanium 2
Inside logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United
States and other countries.
* Other names and brands may be claimed as the property of others.
Copyright 2008 Intel Corporation.
Slide 3
Overview
CAE & High Performance Computing
Intel Architecture in High Performance Computing
Performance and Scalability
Slide 4
Great Engineering Before CAE
Great Wall
Wright Flyer
Pantheon
Brooklyn Bridge
Eiffel Tower
Slide 5
Near the dawn of CAE
VAX* 11/780
Introduced in October 1977
1 CPU
8 KB cache
4 MB memory
$150,000 (US, approx.)
Slide 6
CAE is Everywhere
Millau Viaduct
Shanghai Maglev
Computer Chips
Aircraft
Consumer Products
Sports
Automobiles
Slide 7
30 Years of Engineering Computing
Intel Endeavor Cluster, #78 on TOP500
287 nodes
574 E4372 Intel Xeon Processors @ 2.8 GHz
2296 cores
12 MB cache per processor 3.4 GB
16 GB memory per node 4.5 TB
21.810 TFLOPS LINPACK
VAX 11/780
1 CPU
8 KB cache
4MB memory
0.16 MFLOPS LINPACK
1.36E+8 times more FLOPS! (1978 vs. 2008)
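The headline ratio can be checked directly from the two LINPACK figures on this slide. The sketch below also recovers the per-node and per-processor counts implied by the slide's totals. Two processors per node and four cores per processor are inferred from those totals, not stated on the slide.

```python
# Figures quoted on the slide
VAX_LINPACK_FLOPS = 0.16e6          # VAX 11/780: 0.16 MFLOPS LINPACK
ENDEAVOR_LINPACK_FLOPS = 21.810e12  # Endeavor: 21.810 TFLOPS LINPACK
NODES = 287
PROCESSORS = 574
CORES = 2296
MEM_PER_NODE_GB = 16

# How many times more LINPACK FLOPS the 2008 cluster delivers
ratio = ENDEAVOR_LINPACK_FLOPS / VAX_LINPACK_FLOPS
print(f"FLOPS ratio: {ratio:.2e}")

# Counts implied by the totals (inferred, not stated on the slide)
print("processors per node:", PROCESSORS // NODES)
print("cores per processor:", CORES // PROCESSORS)
print("total memory (GB):", NODES * MEM_PER_NODE_GB)
```

The total memory works out to 4592 GB, which is consistent with the slide's "4.5 TB" when 1 TB is taken as 1024 GB.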
Slide 8
CAE Historical Complexity
[Chart: model size on a log scale (1 to 1,000,000) vs. year (1975 to 2010) for Implicit (DOF), Crash (elements) and CFD (cells)]
More Complex CAE Models need More Compute Cycles
Source: non-scientific survey of the author's colleagues who are CAE professionals.
Slide 9
Intel in High Performance Computing
374 systems in TOP500*
Large scale clusters in Dupont, WA for test & optimization
Teraflops Research Chip
Leading performance, performance/watt
Dedicated, renowned expertise
Broad SW tools portfolio
EMCORE** Connects Cables
*July, 2008, www.top500.org
** Other names and brands may be claimed as the property of others.
Slide 10
Product Cadence for Sustained Leadership
All dates, product features and plans are subject to change without notice.
TICK: new process technology; TOCK: new microarchitecture. Each process generation spans 2 years.
65nm: TICK Pentium D, Xeon; TOCK Core 2 Processor, Xeon Processor
45nm: TICK Penryn Family; TOCK Nehalem
32nm: TICK Westmere; TOCK Sandy Bridge
Slide 11
45nm Advantage
Intel Xeon 5300 Processor (Clovertown), 65nm: 582m transistors, 8 MB cache, two 143 mm² dies
Intel Xeon 5400 Processor (Harpertown), 45nm Hi-k: 820m transistors, 12 MB cache, two 107 mm² dies
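A quick way to quantify the 45nm advantage is to compare transistor density. The sketch below assumes that the quoted area is per die and that both parts use the same number of dies, as the repeated area figures on the slide suggest. If the transistor count covers both dies, the absolute per-mm² values overstate density, but the ratio between the two parts is unaffected.

```python
# (transistors, die area in mm^2) as quoted on the slide.
# Assumption: the area is per die and both parts use the same number of dies.
CLOVERTOWN_65NM = (582e6, 143.0)
HARPERTOWN_45NM = (820e6, 107.0)


def density(transistors: float, area_mm2: float) -> float:
    """Transistors per mm^2 of the quoted area."""
    return transistors / area_mm2


ratio = density(*HARPERTOWN_45NM) / density(*CLOVERTOWN_65NM)
print(f"45nm vs. 65nm density ratio: {ratio:.2f}")
```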
Slide 12
Quad-Core Intel Xeon Processor 5400 Series Performance
ANSYS* Mechanical* 11.0 (relative performance, higher is better):
Quad-Core Intel Xeon X5365 / 1333 FSB: 1.00
Quad-Core Intel Xeon E5472 / 1600 FSB: 1.19
The Quad-Core Intel Xeon E5472 performs 19% to 26% faster than the previous quad-core processor.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests.
Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of
systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit
http://www.intel.com/performance/resources/limits.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104. Copyright 2007, Intel Corporation. * Other names and brands may be
claimed as the property of others.
FLUENT* 6.3.26 (relative performance, higher is better):
Quad-Core Intel Xeon X5365 / 1333 FSB: 1.00
Quad-Core Intel Xeon E5472 / 1600 FSB: 1.26
Data Source: Approved/published results as of Nov 11, 2007 (Fluent) and Nov 28, 2007 (ANSYS) using standard benchmarks, configurations on slide 30.
Slide 13
Quad-Core Intel Xeon Processor 5400 Series Performance
Abaqus/Standard v6.7-2 (relative performance, higher is better):
Dual-Core Intel Xeon 5160 / 1333 FSB: 1.00
Quad-Core Intel Xeon X5365 / 1333 FSB: 1.32
Quad-Core Intel Xeon E5472 / 1600 FSB: 1.48 (+13% over X5365)
The Quad-Core Intel Xeon E5472 performs 13% to 24% faster than the previous quad-core processor.
Abaqus/Explicit* v6.7-2 (relative performance, higher is better):
Dual-Core Intel Xeon 5160 / 1333 FSB: 1.00
Quad-Core Intel Xeon X5365 / 1333 FSB: 1.41
Quad-Core Intel Xeon E5472 / 1600 FSB: 1.74 (+24% over X5365)
Data Source: Approved/published results as of Nov 1, 2007 using standard Simulia benchmarks, configurations on slide 30.
Slide 14
Quad-Core Intel Xeon Processor 5400 Series Performance
PAM-CRASH* v2006.0 (relative performance, higher is better):
Dual-Core Intel Xeon 5160 / 1333 FSB: 1.00
Quad-Core Intel Xeon X5365 / 1333 FSB: 1.40
Quad-Core Intel Xeon E5472 / 1600 FSB: 1.78 (+27% over X5365)
The Quad-Core Intel Xeon E5472 performs 15% to 27% faster than the previous quad-core processor.
PowerFLOW* 4.0a (relative performance, higher is better):
Dual-Core Intel Xeon 5160 / 1333 FSB: 1.00
Quad-Core Intel Xeon X5365 / 1333 FSB: 1.58
Quad-Core Intel Xeon E5472 / 1600 FSB: 1.81 (+15% over X5365)
Data Source: Approved/published results as of Nov 11, 2007 using standard benchmarks, configurations on slide 30.
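The percentage labels on these relative-performance charts compare the E5472 bar with the X5365 bar. The sketch below recomputes two of them. It uses the PAM-CRASH bars (1.40 vs. 1.78) and the PowerFLOW bars (1.58 vs. 1.81), taking the taller bar in each chart as the E5472 result, which is what the slide's claim implies. The helper name is illustrative.

```python
def percent_gain(new: float, old: float) -> float:
    """Fractional improvement of `new` over `old` for a higher-is-better metric."""
    return new / old - 1.0


# Relative-performance bars read from the charts: (X5365, E5472)
bars = {"PAM-CRASH v2006.0": (1.40, 1.78), "PowerFLOW 4.0a": (1.58, 1.81)}
for workload, (x5365, e5472) in bars.items():
    print(f"{workload}: {percent_gain(e5472, x5365):+.0%}")
```

The bar values are rounded to two decimals, so recomputing other charts this way can differ from the printed label by about a percentage point. For example, the Abaqus/Standard bars (1.32 vs. 1.48) give about 12%, while the chart is labeled +13%.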
Slide 15
Quad-Core Intel Xeon Processor 5400 Series Performance
LS-DYNA* 3 Vehicle Collision (elapsed time in seconds, lower is better):
Dual-Core Intel Xeon 5160 / 1333 FSB: 39416
Quad-Core Intel Xeon X5365 / 1333 FSB: 30198
Quad-Core Intel Xeon E5472 / 1600 FSB: 23560 (+22%)
The Quad-Core Intel Xeon E5472 is about 22% faster than the previous quad-core processor.
LS-DYNA* Neon Refined Revised (elapsed time in seconds, lower is better):
Dual-Core Intel Xeon 5160 / 1333 FSB: 2885
Quad-Core Intel Xeon X5365 / 1333 FSB: 2212
Quad-Core Intel Xeon E5472 / 1600 FSB: 1715 (+22%)
Data Source: Published/submitted results as of Nov 11, 2007 on www.topcrunch.org, configurations on slide 30.
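Because the LS-DYNA metric is elapsed time, a label like "+22%" can mean either a time reduction or a speedup. The sketch below uses the X5365 and E5472 times from this slide: 30198 s vs. 23560 s, and 2212 s vs. 1715 s. In each chart, the shortest time is taken as the E5472 result. Both calculations are shown, and the label matches the reduction in elapsed time. The helper names are illustrative.

```python
def time_reduction(t_new: float, t_old: float) -> float:
    """Fraction of elapsed time saved by the new platform."""
    return 1.0 - t_new / t_old


def speedup(t_new: float, t_old: float) -> float:
    """How many times faster the new platform completes the run."""
    return t_old / t_new


# Elapsed times in seconds from the slide: (X5365, E5472)
runs = {"3 Vehicle Collision": (30198.0, 23560.0),
        "Neon Refined Revised": (2212.0, 1715.0)}
for name, (t_x5365, t_e5472) in runs.items():
    print(f"{name}: {time_reduction(t_e5472, t_x5365):.0%} less time, "
          f"speedup {speedup(t_e5472, t_x5365):.2f}x")
```

For these workloads, a 22% cut in elapsed time corresponds to roughly a 1.28x to 1.29x speedup.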
Slide 16
Tukwila for the World's Most Demanding Computers
Quad-core with 30 MB cache
2 billion transistors
Multi-threading technology
Intel QuickPath interconnect
Dual integrated memory controllers
Estimated >2X performance*
Mainframe-class RAS
*Compared to Dual-core Itanium Processor 9100 series
"HP has already successfully booted four key operating systems (Linux, Windows, HP-UX and OpenVMS) on our Tukwila-based Integrity servers and have found the initial silicon to be robust and of high quality."
Martin Fink, Senior VP & GM, Business Critical Systems, HP
Slide 17
Next Generation Nehalem Processor
KEY FEATURES and BENEFITS
Greater instructions per clock and improved cache hierarchy: faster processing per core
Simultaneous Multi-Threading: two threads per core
45nm Intel multi-core processors (2- and 4-core implementations planned): energy-efficient multi-core processing
Dynamic Resource Scaling (unneeded cores are automatically put into sleep mode; the remaining active cores get access to ALL cache, bandwidth and power/thermal budgets): lower power consumption during periods of low utilization
Turbo Mode (the CPU operates at a higher-than-stated frequency when running below power and thermal design points): additional processing boost during peak demand periods
Source: Intel. All future products, computer systems, dates, and figures specified are preliminary based
on current expectations, and are subject to change without notice.
Faster cores, More cores/threads, Dynamically adaptable
Slide 18
Intel Software Advantage
Intel works directly with CAE software vendors.
Intel provides software development tools for High Performance Computing.
Enables the best performance on Intel Architecture workstations and servers today.
Enables the definition and adoption of new platform and processor features tomorrow.
Slide 19
Software leadership in HPC
Intel is a valued supplier to the industry, including the very popular parallelism support from Intel Threading Building Blocks.
Intel has a rich and strong SW tools portfolio for HPC.
Slide 20
Performance & Scalability
Cluster Interconnect
File System
Model Decomposition
Model Complexity
Software Improvements
Slide 21
Interconnect Performance Impact
Ethernet vs. InfiniBand* Scaling Efficiency
FLUENT* 6.5.35, Sedan 4M Model, Intel "Endeavor" Cluster
[Chart: scaling efficiency (0% to 120%) vs. cores (8 to 128, 8 cores per node) for GigE and InfiniBand; annotation: "4x with 16 nodes"]
Source: Intel internal measurements,
configuration on slide 29
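Scaling efficiency, as plotted here, compares the measured speedup with ideal linear speedup from the smallest run. The slide does not spell out its formula. The sketch below therefore assumes the common definition: base core count times base time, divided by core count times time. The timings are hypothetical, not the FLUENT measurements.

```python
def scaling_efficiency(base_cores: int, base_time: float,
                       cores: int, time: float) -> float:
    """Parallel efficiency relative to the smallest run.

    1.0 means perfect linear scaling: doubling the cores halves the time.
    """
    return (base_time * base_cores) / (time * cores)


# Hypothetical elapsed times (seconds) for one model, 8 cores per node
timings = {8: 1000.0, 16: 520.0, 32: 280.0, 64: 160.0}
base_cores = min(timings)
for cores, t in sorted(timings.items()):
    eff = scaling_efficiency(base_cores, timings[base_cores], cores, t)
    print(f"{cores:4d} cores: {eff:.0%}")
```

A slower interconnect mainly raises the time at higher core counts, which pulls these efficiency values down as the cluster grows.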
Slide 22
File System Performance Impact
Simulia* ABAQUS*/Standard 6.7-EF1 (Implicit Solver)
5M Element model, Intel "Endeavor" Cluster
[Chart: relative performance (higher is better, 0.0 to 3.5) at 32, 64 and 128 cores (8 cores per node) for default/Ethernet vs. Lustre*/InfiniBand*; annotated differences: 18%, 19.7%, 17%]
Source: Intel internal measurements, configuration on slide 29
Slide 23
Model Decomposition Affects Scalability & Performance
LS-DYNA mpp971.2.7600.R3.1.sp, 3 Vehicle Collision, approx. 796K elements, 150 ms
LS-DYNA mpp971.2.7600.R2.sp, Customer Problem, approx. 225K elements, 280 ms
Intel Atlantis Cluster
[Chart, 3 Vehicle Collision: elapsed time (lower is better) and relative performance (higher is better) vs. cores (16 to 256), Default vs. Optimal decomposition]
[Chart, Customer Problem: relative elapsed time (lower is better) and relative performance (higher is better) vs. cores (4 to 16), Default vs. Optimal decomposition]
Source: Intel internal measurements, configuration on slide 29
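The slide does not say how the Default and Optimal decompositions were produced. As one illustration of what model partitioning involves, here is a minimal sketch of recursive coordinate bisection (RCB), a widely used geometric partitioning technique. It is applied to hypothetical 2D element centroids and is not necessarily what LS-DYNA or this study used. The imbalance metric (largest partition divided by mean partition size) matters because each solver step waits for the most heavily loaded core.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical element centroid (x, y)


def rcb(points: Sequence[Point], n_parts: int) -> List[List[Point]]:
    """Recursive coordinate bisection into n_parts groups.

    Each step cuts across the axis with the larger extent, giving each side
    a share of points proportional to the number of parts it must produce.
    """
    if n_parts <= 1 or len(points) <= 1:
        return [list(points)]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Cutting the wider axis keeps partitions compact (less boundary to exchange)
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    left_parts = n_parts // 2
    cut = len(ordered) * left_parts // n_parts
    return rcb(ordered[:cut], left_parts) + rcb(ordered[cut:], n_parts - left_parts)


def imbalance(parts: List[List[Point]]) -> float:
    """Largest partition size divided by the mean size (1.0 is perfect balance)."""
    sizes = [len(p) for p in parts]
    return max(sizes) / (sum(sizes) / len(sizes))


# A 4 x 4 grid of centroids split across 4 hypothetical cores
grid = [(float(x), float(y)) for x in range(4) for y in range(4)]
parts = rcb(grid, 4)
print([len(p) for p in parts], imbalance(parts))
```

Real partitioners also weight elements by their computational cost (for example, contact regions in a crash model), which a pure point count like this one ignores.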
Slide 24
Software & Model Impact on Scalability
Scaling Efficiency Improvements in Software and Model Size
FLUENT* 12.0 Beta and FLUENT 6.3.35
Intel's "Endeavor" Cluster, InfiniBand*
[Chart: scaling efficiency (0% to 120%) vs. cores (8 to 1536, 8 cores per node) for Truck 14M - 6.3.35 and Truck 111M - 6.3.35]
Source: Intel internal measurements, configuration on slide 29
Slide 25
Software & Model Impact on Scalability
Scaling Efficiency Improvements in Software and Model Size
FLUENT* 12.0 Beta and FLUENT 6.3.35
Intel's "Endeavor" Cluster, InfiniBand*
[Chart: scaling efficiency (0% to 120%) vs. cores (8 to 1536, 8 cores per node) for Truck 14M - 12 Beta, Truck 111M - 12 Beta, Truck 14M - 6.3.35 and Truck 111M - 6.3.35]
Source: Intel internal measurements, configuration on slide 29
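The subtitle attributes part of the scaling improvement to model size. One common way to reason about this is a surface-to-volume argument. Compute work per core scales with the cells it owns, while halo exchange with neighbors scales roughly with the surface of that block. The sketch below is a toy model, not a fit to the FLUENT data. The cost constants are made up, and reading "14M" and "111M" as cell counts is an assumption.

```python
def toy_efficiency(cells: float, cores: int, base_cores: int = 8,
                   compute_cost: float = 1.0, halo_cost: float = 5.0) -> float:
    """Scaling efficiency in a toy model: per-core time = compute + halo exchange.

    Compute scales with the cells per core (a volume); halo exchange is taken
    to scale with that volume to the 2/3 power (its surface). The cost
    constants are made-up illustration values.
    """
    def core_seconds(n: int) -> float:
        per_core = cells / n
        return n * (compute_cost * per_core + halo_cost * per_core ** (2.0 / 3.0))
    return core_seconds(base_cores) / core_seconds(cores)


for cells in (14e6, 111e6):  # assumed cell counts for the Truck 14M and 111M models
    effs = [toy_efficiency(cells, n) for n in (8, 128, 1024, 1536)]
    print(f"{cells / 1e6:.0f}M cells:", " ".join(f"{e:.0%}" for e in effs))
```

In this model the larger mesh keeps a higher compute-to-communication ratio at every core count, so its efficiency falls off more slowly.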
Slide 26
CAE Performance Factors
Processors & Platforms
  Number & performance of cores
  Reduced system bottlenecks
  Efficient Platforms
Clusters
  Interconnect latency & bandwidth
  I/O Performance
Software
  Optimization for target processors
  Design for scalability
Models
  Model partitioning
  Model size
Slide 27
Summary
CAE Complexity has Grown
High Performance Computing Has Grown
CAE Performance derives from Processors, Platforms, Clusters, Software and Models
Slide 29
Intel Cluster Configurations Overview
(each row lists Endeavor | Atlantis)
Nodes / cores: 256 / 2048 | 64 / 512
Platform: Intel Server System S1560SF, Intel Server Board S5400SF, 1U DP server | Intel Server System S1560SF, Intel Server Board S5400SF, 1U DP server
CPU / stepping: Intel Xeon Processor E5462, C0 step (Harpertown), 2.8 GHz, 12 MB L2 cache, 1600 MHz FSB | Intel Xeon Processor X5482, C0 step (Harpertown), 3.2 GHz, 12 MB L2 cache, 1600 MHz FSB
RAM: 16 GB/node, FBDIMM 16x1GB 667MHz | 16 GB/node, FBDIMM 8x2GB 667MHz
Hard drive: 250GB SATA HDD | 250GB SATA HDD
Cluster file system: Panasas* (7 shelves, 35 TB storage) and Lustre* (2.7 TB) | Panasas (4 shelves, 13 TB storage)
Interconnects: GigE, IB | GigE, IB
GigE switch: Cisco Catalyst* 4510, 336 ports | Cisco Catalyst 4506, 144 ports
IB switch: Cisco SFS 7024D (DDR), 288 ports | SilverStorm 9080 (DDR), 96 ports
IB adapters: Mellanox* MHGH28-XTC, PCI-E x8, Dual DDR InfiniBand* 4x | Mellanox MHGH29-XTC, PCI-E x8, Dual DDR InfiniBand 4x
OS / IB stack: RedHat* EL4 update 4, OFED 1.3 | RedHat EL4 update 4, OFED 1.2.5.5
Slide 30
Server Configurations
Dual-Core Intel Xeon processor 5160 based platform details: Supermicro* X7DB8+ server
platform with two Dual-Core Intel Xeon processors 5160 3.00GHz, 4MB L2 cache, 1333MHz FSB,
16GB memory (8x2GB FBD 667MHz), 64-bit RedHat Enterprise* Linux* 4 Update 4.
Quad-Core Intel Xeon processor X5365 based platform details: Supermicro X7DB8+ server
platform with two Quad-Core Intel Xeon processors X5365 3.00GHz, 2x4MB L2 cache, 1333MHz FSB,
16GB memory (8x2GB FBD 667MHz), 64-bit RedHat Enterprise Linux 4 Update 4.
Quad-Core Intel Xeon processor E5472 based platform details: Supermicro X7DWA-N server
platform with two Quad-Core Intel Xeon processors E5472 3.00GHz, 2x6MB L2 cache, 1600MHz FSB,
16GB Memory (8x2GB FBD 800MHz), 64-bit RedHat Enterprise Linux 4 Update 4.
Benchmark Information: Further details are available at http://www.intel.com/performance/server/
PowerFLOW Benchmark description:
Geometric mean of two standard benchmarks: External Case 1 [19 state] and External Case 2 [19
state].
PAM-CRASH Benchmark description:
Standard frontal crash test from the USNCAP consortium, using a Chrysler Neon* model with 300K and 1M
elements.
LS-DYNA* Benchmark description: The workloads used in these comparisons are called
neon_refined_revised and 3 Vehicle Collision and are publicly available from www.topcrunch.org. The
metric for the benchmark is elapsed time in seconds (lower is better).
ANSYS* Benchmark description: The benchmark suite of 15 workloads covers a representative set of
structural analysis solvers and analysis types. Quad-core results are based on 8-process parallel
execution.
FLUENT benchmark description: The benchmark suite of 15 real-world cases covers a
representative set of CFD analysis types and simulation model sizes. Quad-core results are based on
8-process parallel FLUENT. See http://www.fluent.com/software/fluent for details.
Abaqus* benchmark description: Abaqus/Standard benchmarks include linear statics, nonlinear
statics, and natural frequency extraction workloads. Abaqus/Explicit benchmarks include workloads
modeling high-speed dynamic impact events and quasi-static events with complicated contact
conditions.