Exadata MAA
The preceding is intended to outline our general product direction. It is intended for information purposes
only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code,
or functionality, and should not be relied upon in making purchasing decisions. The development,
release, timing, and pricing of any features or functionality described for Oracle’s products may change
and remains at the sole discretion of Oracle Corporation.
Statements in this presentation relating to Oracle’s future plans, expectations, beliefs, intentions and
prospects are “forward-looking statements” and are subject to material risks and uncertainties. A detailed
discussion of these factors and other risks that affect our business is contained in Oracle’s Securities and
Exchange Commission (SEC) filings, including our most recent reports on Form 10-K and Form 10-Q
under the heading “Risk Factors.” These filings are available on the SEC’s website or on Oracle’s website
at http://www.oracle.com/investor. All information in this presentation is current as of September
2019 and Oracle undertakes no duty to update any statement in light of new information or future events.
Technical Presentation
April, 2020
Program Agenda
1 Exadata & Maximum Availability Architecture
2 MAA Reference Architectures
3 MAA Features in Exadata
4 MAA Exadata Lifecycle Operations
5 Summary
Autonomous Database
DBCS/ExaCS/ExaCC
$350K: Average cost of downtime per hour
$10M: Average cost of unplanned data center outage or disaster
87 hours: Average amount of downtime per year
91%: Percentage of companies that have experienced an unplanned data center outage in the last 24 months
Source: Gartner, Data Center Knowledge, IT Process Institute, Forrester Research
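As a rough illustration of how two of these figures could combine, the sketch below multiplies the average downtime per year by the average cost of downtime per hour. Combining the two averages is an assumption made here for illustration only; the cited sources do not state that the figures describe the same population.

```python
# Illustrative arithmetic only: multiplying the two slide averages together
# is an assumption, not a figure taken from the cited sources.
AVG_DOWNTIME_HOURS_PER_YEAR = 87      # from the slide
AVG_COST_PER_HOUR_USD = 350_000       # from the slide ($350K)


def annual_downtime_cost(hours_per_year: float, cost_per_hour: float) -> float:
    """Return the implied yearly cost of downtime in USD."""
    return hours_per_year * cost_per_hour


if __name__ == "__main__":
    cost = annual_downtime_cost(AVG_DOWNTIME_HOURS_PER_YEAR, AVG_COST_PER_HOUR_USD)
    print(f"Implied annual downtime cost: ${cost:,.0f}")
```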
Planned Outages
• Disruptive Schema Changes due to application changes to meet ever-changing business requirements
– Schema change impacts are greatly reduced with faster changes, index and object rebuilds and reorganizations
• Downtime required for lifecycle management like periodic upgrades of firmware & software, data migration
– Downtime required for lifecycle management is mitigated using fast online upgrades, patching automation with service migration, standby first patching, zero downtime migration
Unplanned Outages
• Data Corruptions due to hardware/software faults, media issues
– Data Corruptions are prevented or the potential downtime is reduced dramatically with additional corruption prevention, detection and auto-repair
• Application Brownouts due to server, instance, storage failures or due to planned maintenance
– Application Brownout reduced to sub-second with fastest instance recovery
• Disaster Recovery (DR) Challenges where the DR site is not keeping up with Production
– Disaster Recovery (DR) Challenges are mitigated with fastest redo apply resulting in low Recovery Time Objective
Partitioning
Oracle Exadata Columnar Flash Cache
Advanced
Database DB Machine
Hybrid Columnar
Compression Innovations Innovations Compression 10:1
HCC
Secure
Virtual
Networks
Cloud Exadata Public Cloud Service
Security and In Oracle Public
Hardening
Cloud Data
Oracle- Centers
Managed
Core Exadata Exadata
Platform Infrastructure
• Redundant Network
– Redundant 40Gb/s IB connections and switches
– Client access using HA bonded networks
– Integrated HA software/firmware stack
https://oracle.com/goto/maa
Bronze
DATABASE IN-MEMORY
Software checking
Active clusters, Online patching,
Disk/flash mirroring reconfiguration,
expansion
LAN WAN
Best MAA Database Platform | Fastest RAC Instance and Node Failure Recovery | Fastest Backup - RMAN Offload to Storage
Deep ASM Mirroring Integration | Fastest Data Guard Redo Apply | Complete Failure Testing with Lowest Brownouts
Frequently Updated Health Checks
Copyright © 2020 Oracle and/or its affiliates. 16
Exadata MAA Evolution • Choosing the SLA policy
Customer • Architecture • Application performance
Oracle • Database Management (Tooling)
• Infrastructure • Configuration, Tuning
Management • Lifecycle Operations (Tooling)
• Architecture • Application Performance
• Database Management Autonomous
• Configuration, Tuning Database / Database
• Lifecycle operations
Exadata
Infrastructure • Application Performance
• Cloud
Management
• Architecture On-Premises • Oracle owns and • Oracle owns and manages
• Configuration, Tuning Exadata manages the best Infrastructure
• Database Management integrated MAA
• Blueprints • Policy driven
• Lifecycle Operations DB platform
• Exadata is the best deployments
• Application Performance Cloud automation
integrated MAA DB • • MAA Integrated cloud
On-Premises platform for provisioning • Fully automated Self-
and life cycle Driving, Self-Securing,
• Blueprints operations Self-Repairing Database
• Feedback to
products & features
Instance
Dev, Test, Prod - Single Instance or
Database
Multitenant Database with Backups
Fiber Channel
SAN
10GigE or 25GigE
Recovery Appliance
• Delta Push & Backup Validation
• Incremental Forever
• Zero Data Loss Recoverability
Tape library
• Offsite Backups
• Vaulting
Prod/Departmental
Bronze +
• Real Application Clusters (RAC)
• Application Continuity
Local Backup
Replicated Backups
Outage Matrix (RTO/RPO*)
Unplanned Outage
• Recoverable node or instance failure: Zero**
• Disasters (corruptions and site failures): Hours to days. RPO since last backup or near zero with ZDLRA
Planned Maintenance
• Software/hardware updates: Zero**
• Major database upgrade: Minutes to hour
Checklist found in MAA OTN:
https://www.oracle.com/technetwork/database/options/clustering/applicationcontinuity/adb-continuousavailability-5169724.pdf
Primary Database
1) MAA Whitepaper: Application Checklist for Continuous Service for MAA Solutions
2) Using RHPhelper to Minimize Downtime During Planned Maintenance on Exadata (MOS 2385790.1)
3) Fleet Patch and Provisioning incorporates MAA practices
GOLD Primary Region
AD2
DG FSFO
AD1
Secondary Region
Mission Critical
Silver +
• Active Data Guard
• Comprehensive Data Protection
Diagram labels: Primary, Local backup, Local Standby, Remote Standby, Local backup
MAA Architecture:
• At least one standby required across AD or region.
• Primary in one data center (or AD) replicated to a Standby in another data center
• Active Data Guard Fast-Start Failover (FSFO)
• Local backups on both primary and standby
Outage Matrix (RTO/RPO*)
Unplanned Outage
• Recoverable node or instance failure: Seconds
• Disasters (corruptions and site failures): Seconds. RPO zero or seconds
Planned Maintenance
• Software/hardware updates: Zero
• Major database upgrade: Seconds
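To compare the Silver and Gold outage matrices side by side, the values can be held in a small lookup table. The sketch below is only an illustration: the values are copied from the two matrices in this deck (footnote markers dropped), while the dictionary layout and the function name `compare_tiers` are hypothetical.

```python
# Values are copied from the Silver and Gold outage matrices in this deck.
# The dictionary layout and the function name are hypothetical.
OUTAGE_MATRIX = {
    "Silver": {
        "Recoverable node or instance failure": "Zero",
        "Disasters: corruptions and site failures":
            "Hours to days. RPO since last backup or near zero with ZDLRA",
        "Software/hardware updates": "Zero",
        "Major database upgrade": "Minutes to hour",
    },
    "Gold": {
        "Recoverable node or instance failure": "Seconds",
        "Disasters: corruptions and site failures":
            "Seconds. RPO zero or seconds",
        "Software/hardware updates": "Zero",
        "Major database upgrade": "Seconds",
    },
}


def compare_tiers(event: str) -> dict:
    """Return the stated RTO/RPO for one event across all tiers."""
    return {tier: matrix[event] for tier, matrix in OUTAGE_MATRIX.items()}


if __name__ == "__main__":
    for tier, value in compare_tiers("Major database upgrade").items():
        print(f"{tier}: {value}")
```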
Exadata HARD checks on write, automatic disk scrub and repair HARD checks on write
Use Oracle GoldenGate (Required)
Use Edition-based Redefinition (Optional)
Use Oracle Sharding (Alternative)
[Figure: Oracle GoldenGate replication flow: Capture, Trail Files, Distribution Service, LAN / WAN / Internet over TCP/IP, Receiver Service, Trail Files, Delivery]
https://www.oracle.com/database/technologies/high-availability/sharding.html
Maximum Availability Architecture (MAA)
Brownout Reduction, Quality of Service, Performance, Management
Exadata: Data Protection
Corruption Detection, Prevention & Repair
Exadata: Data Protection
Storage Failures
• When a drive is reported as failed but has not physically failed, the drive is automatically power cycled to avoid a false-positive drive failure
• Works on both High Capacity Disks & Extreme Flash Cells
• The write IO is canceled and temporarily written to healthy flash on the same cell
[Chart: seconds, Exadata vs. Traditional Storage]
• Cell Side Disk Confinement
• When a disk goes bad and is taken offline
• Diagnostic is automatically run on the disk to determine health
• If healthy, disk is returned to ONLINE status and re-synched
• If unhealthy, health factor drop is performed, rebalance is performed and blue LED is lit after completion
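The disk confinement steps above can be read as a small decision flow. The sketch below is a simplified model of that flow, not Exadata's implementation: the `Disk` class, the status strings other than ONLINE, and the `run_diagnostics` callback are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Disk:
    """Hypothetical stand-in for a cell disk; not an Exadata data structure."""
    name: str
    status: str = "ONLINE"
    blue_led: bool = False
    log: List[str] = field(default_factory=list)


def confine_and_diagnose(disk: Disk, run_diagnostics: Callable[[Disk], bool]) -> Disk:
    """Model the confinement flow: offline, diagnose, then resync or drop."""
    disk.status = "OFFLINE"
    disk.log.append("taken offline")
    if run_diagnostics(disk):
        # Healthy: return the disk to ONLINE and resynchronize it.
        disk.log.append("diagnostics passed, resynchronizing")
        disk.status = "ONLINE"
    else:
        # Unhealthy: drop the disk, rebalance, then light the blue LED.
        disk.log.append("diagnostics failed, dropping disk")
        disk.status = "DROPPED"
        disk.log.append("rebalance complete")
        disk.blue_led = True
    return disk


if __name__ == "__main__":
    healthy = confine_and_diagnose(Disk("disk0"), lambda d: True)
    failed = confine_and_diagnose(Disk("disk1"), lambda d: False)
    print(healthy.status, healthy.blue_led)
    print(failed.status, failed.blue_led)
```

In this model the blue LED is set only after the rebalance step has been recorded, mirroring the "lit after completion" ordering in the slide.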
• Each IO is tagged with who issued the IO, purpose & priority
• Enables mixed workloads, consolidation of many databases with multiple tiers of performance
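One way to picture IO tagging is a priority queue in which every request carries its issuer, purpose, and priority. The sketch below is a generic illustration of that idea and makes no claim about how Exadata schedules IO internally; `TaggedIO`, `IOScheduler`, and the numeric priorities are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class TaggedIO:
    """Hypothetical tagged request; lower priority number is served first."""
    priority: int
    seq: int                          # tie-breaker keeps FIFO order per priority
    issuer: str = field(compare=False)
    purpose: str = field(compare=False)


class IOScheduler:
    """Minimal priority queue over tagged IO requests."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = count()

    def submit(self, issuer: str, purpose: str, priority: int) -> None:
        heapq.heappush(self._heap, TaggedIO(priority, next(self._seq), issuer, purpose))

    def next_io(self) -> TaggedIO:
        return heapq.heappop(self._heap)


if __name__ == "__main__":
    sched = IOScheduler()
    sched.submit("reporting_db", "batch scan", priority=5)
    sched.submit("oltp_db", "redo log write", priority=0)
    sched.submit("reporting_db", "backup read", priority=5)
    while True:
        try:
            io = sched.next_io()
        except IndexError:
            break
        print(io.issuer, io.purpose, io.priority)
```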
Cell IO Latency Capping
IOs are pumping. Slow IO? Undiscovered hardware / software issue?
Fault Management
• Components break
– Fully automated notification and replacement process through ASR (Auto Service Request)
• Components get sick
– Exadata uniquely qualified to handle sick components with full stack integration. Exadata provides system/service level high availability.
• Intelligent hardware/software integration helps prevent human error
– Blue light indicating disk replacement can be performed. Cell shutdown prevention and notification when redundancy would be compromised. X7 Do Not Service LED
• Cell Shutdown causing application outage
– Smart handshake with database tier and proactive redundancy checks during cell (or cellsrv) shutdown to prevent application outage.
• EXAchk provides configuration specific, up-to-date health check across the entire stack
• Covers Exadata, Database, Grid Infrastructure, ASM critical issues
• Provides MAA scorecard with MAA configuration gaps and guidance to mitigate
• Automated periodic scheduled runs with email notifications
• Continuous evolution of configuration checks
• EXAchk saves time and money through proactive health verification, which reduces downtime
• Currently has over 1000 checks per target
• Development recommends that the latest EXAchk be executed with the following frequency:
• Monthly
• Week before any planned maintenance activity
• Day before any planned maintenance activity
• Immediately after completion of planned maintenance activity or an outage or incident
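The maintenance-related part of this schedule can be derived from a planned maintenance date. The sketch below is a minimal illustration; the function name is hypothetical, and treating "immediately after completion" as the maintenance day itself is an assumption. The monthly run is independent of maintenance and is not modeled here.

```python
from datetime import date, timedelta


def exachk_runs_around_maintenance(maintenance_day: date) -> dict:
    """Return suggested EXAchk run dates around one maintenance window.

    Assumption: maintenance completes on maintenance_day, so the
    "immediately after" run is scheduled for that same day.
    """
    return {
        "week before": maintenance_day - timedelta(days=7),
        "day before": maintenance_day - timedelta(days=1),
        "immediately after": maintenance_day,
    }


if __name__ == "__main__":
    for label, day in exachk_runs_around_maintenance(date(2020, 4, 15)).items():
        print(f"{label}: {day.isoformat()}")
```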
Scale-Out with Fastest CPUs Get New and Fastest Processors First
Ultra-fast NVMe PCIe Flash Get Fastest NVMe PCIe Flash First
Tier PCIe Flash & Huge Disks Get Bigger Disk Drives First
PCI Flash
Exadata Hardware + Exadata Software + Oracle Database provide the ultimate performance!
Fastest OLTP
Fastest OLTP I/O with scale-out storage, RDMA, and NVMe flash
Fastest scale-out with unique RDMA algorithms for inter-node cluster coordination
Fully redundant and fastest recovery from failed or sick components
Best Consolidation
Uniquely prioritizes latency sensitive or important workloads through full stack
Uniquely isolates workloads from multiple tenants through full stack
Exadata X8M (changes from X8 in red)
Scale-Out 2 or 8 Socket Database Servers
Latest 24 core Intel Cascade Lake
X
cache on a read miss Database Read PMEM
< 1 second to complete cell eviction, maintaining SLA
Cell
• --verify-config and --roceswitch-precheck options available to check state ahead of time
• Thomson-Reuters
• Data Warehouse on Exadata, prior to write-back flash cache
• While resolving a gap of observed an average apply rate of 580MB/second
• Allstate Insurance
• Data Warehouse ETL processing resulted in average apply rate over a 3 hour
period of 668MB/second, with peaks hitting 900MB/second
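To put the Allstate figure in perspective, the sustained rate can be converted into a total volume of redo applied. This is a back-of-the-envelope calculation: it assumes the 668MB/second average held for exactly 3 hours and uses decimal units (1 TB = 1,000,000 MB).

```python
def redo_applied_mb(rate_mb_per_second: int, hours: int) -> int:
    """Total redo applied, in MB, at a constant average rate."""
    return rate_mb_per_second * hours * 3600


if __name__ == "__main__":
    total_mb = redo_applied_mb(668, 3)
    # Decimal units are an assumption: 1 TB = 1,000,000 MB.
    print(f"{total_mb:,} MB, about {total_mb / 1_000_000:.1f} TB")
```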
Example of database node power failure with an OLTP workload and CSS misscount=60
[Chart: Application Brownout in seconds, Exadata vs. 3rd Party Storage]
If a server disappears from both InfiniBand switches, declare it dead in less
No waiting for long heartbeat timeouts
Reduces application brownouts from 30+ seconds to < 2 seconds
Active/Active IB configuration provides:
Extreme throughput - 40 Gb/s QDR
Extreme availability - RDS failover in seconds with minimum application impact
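The idea behind fast node death detection can be sketched as a check with two paths: a fast path that fires when a server is no longer visible on either switch, and a slow path that waits for the heartbeat timeout. The code below is a simplified model and not the Exadata implementation; the function name, the switch names, and the state strings are hypothetical. The default of 60 seconds mirrors the CSS misscount used in the example.

```python
from typing import Dict


def node_state(seen_on_switch: Dict[str, bool],
               seconds_since_heartbeat: float,
               css_misscount: float = 60.0) -> str:
    """Return "dead" or "alive" for one server (simplified model)."""
    # Fast path: the server has disappeared from every switch.
    if seen_on_switch and not any(seen_on_switch.values()):
        return "dead"
    # Slow path: fall back to the heartbeat timeout.
    if seconds_since_heartbeat >= css_misscount:
        return "dead"
    return "alive"


if __name__ == "__main__":
    print(node_state({"ib_switch_a": False, "ib_switch_b": False}, 1.0))
    print(node_state({"ib_switch_a": True, "ib_switch_b": False}, 1.0))
```

Losing only one of the two switch paths does not trigger the fast path in this model, which fits the active/active configuration described in the slide.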
Let’s watch a 1 minute video featuring our Fast Node Death Detection
(FNDD) feature. If you watch carefully you will still be rewarded with
one new feature referenced at about the 35 second mark
The brownout associated with active/passive client access network port failure is
now 60% lower after we thoroughly verified a reduction to the network downdelay
parameter. It also prevents false positive VIP failover when using OVM.
This configuration change is now in the default Exadata deployment and the best
practice check is in exachk.
Database Tier
Storage Tier
X
• The tertiary mirror continues to provide protection just in case it's one of those days
• After the storage failure is repaired and
the cell caching state is deemed healthy
again, return to the primary mirror
Fast network failure detection Exadata Smart Write Back, Smart Flash Logging, Smart Scan Appliance mode support
and Reverse Offload
Fastest Redo Apply and Instance Recovery Cell Alert Summary
Redundancy protection on cellsrv shutdown
Efficient resilver rebalance after flash failure Flash and Disk Life Cycle Management Alerts
Reduced brownout for instance recovery
I/O latency capping for reads and writes Automatic LED support for disk removal
ILOM hang detection and repair
Cell IO timeout threshold Auto online
Redundancy protection on cell shutdown
Smart Write Back Flash Cache persistence Auto disk management
Automatic ASM mirror read on IO error corruption I/O and Network Resource Management Priority rebalance support
IO error prevention with Exadata disk scrubbing / ASM Health factor on predictively failed disks EM failure reporting
corruption repair
Disk confinement Failure Monitoring on database servers
Exadata HARD
IO hang detection and repair Updating database nodes with patchmgr
Corruption prevention with HARD support
Cell to Cell offload for Disk Repair Optimized and Faster Exadata Patching
Elimination of false positive drive failures
Cell-to-Cell Rebalance Preserves Flash Cache Custom Diagnostic Package for Cell Alerts
Redundancy Check during power down
Exadata Elastic Configuration VLAN support and automation
Blue OK-to-remove LED light notification Drop hard disk for replacement Exachk – full stack health check with critical issues alerts
• Few seconds of Blackout / Brownout
• Zero downtime using corruption prevention
• Faster deployment*, reduced guess-work & tuning requirements
• Meeting HA SLAs at any scale
• Integrated tools/reports with end-to-end visibility
• Reliable network & storage performance at any scale
Productive hours lost per 100 users per year 1,021 66 955 94%
Source: IDC
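The column headers for this row did not survive, but the numbers are consistent with a before value, an after value, their difference, and the percentage reduction. The sketch below checks that reading; the column interpretation is an assumption, not something the slide text confirms.

```python
def reduction(before: int, after: int) -> tuple:
    """Return (absolute difference, percentage reduction)."""
    diff = before - after
    return diff, diff / before * 100


if __name__ == "__main__":
    # Assumed reading of the row: 1,021 hours before, 66 hours after.
    diff, pct = reduction(1021, 66)
    print(f"difference: {diff} hours, reduction: {pct:.0f}%")
```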
Maximum Availability Architecture (MAA)
Summary
4 OF THE TOP 5
BANKS, TELCOS, RETAILERS RUN EXADATA
• Petabyte Warehouses
• Online Financial Trading
• Business Applications
• SAP, Oracle, Siebel, PSFT, …
• Massive DB Consolidation
• Public SaaS Clouds
• Oracle Fusion Apps,
Salesforce, SAS, …
• Scale-Out Servers
Oracle
Database
Appliance