Data Guard - Cheatsheet
This section is currently being updated; this statement will be removed when I have completed it.
Terminology
Log Files
## You can get the log locations from the below view
# Primary database server - if you have set up db_unique_name, tnsname and log_archive_dest_n
There are many options; see the broker section for more information.
# There are a number of specific information commands; here are the most used
DGMGRL> show database prod1 statusreport;
DGMGRL> show database prod1 inconsistentProperties;
DGMGRL> show database prod1 inconsistentlogxptProps;
DGMGRL> show database prod1 logxptstatus;
DGMGRL> show database prod1 latestlog;
# change the instance name to reflect the one you have chosen
There are a number of commands that you can use to change the state of the database
## turn off/on the redo transport service for all standby databases (run on the primary)
DGMGRL> edit database prod1 set state=transport-off;
DGMGRL> edit database prod1 set state=transport-on;
Redo Processing
There are a number of Oracle background processes that play a key role. First, on the primary database:
LGWR - log writer process, flushes redo from the SGA to the ORL (online redo log) files
LNS - LogWriter Network Service, reads the redo being flushed from the redo buffers by LGWR and performs a network send of the redo to the standby
ARCH - archives the ORL files to archive logs, which are also used to fulfill gap resolution requests; one ARCH process is dedicated to local redo log activity only and never communicates with a standby database
Then, on the standby database:
RFS - Remote File Server process, performs a network receive of the redo transmitted from the primary and writes it to the standby redo log (SRL) files
ARCH - performs the same role as on the primary, but on the standby
MRP - Managed Recovery Process, coordinates media recovery management; recall that a physical standby is in perpetual recovery mode
LSP - Logical Standby Process, coordinates SQL Apply; this process only runs on a logical standby
PR0x - recovery server processes, read redo from the SRL or archive log files and apply it to the standby database
Real-Time Apply
Enable real-time apply
sql> alter database recover managed standby database using current logfile disconnect;
## primary (example)
Note: this command can only run when the database is open
Logical Standby
schemas that are not maintained by SQL Apply
select owner from dba_logstdby_skip where statement_opt = 'INTERNAL SCHEMA' order by owner;
Note: the system and sys schemas are not replicated, so don't go creating tables in these schemas. The above command should return about 17 schemas (Oracle 11g) that are not replicated.
skip replication of tables
## Syntax
dbms_logstdby.skip (
  stmt        in varchar2,
  schema_name in varchar2 default null,
  object_name in varchar2 default null,
  proc_name   in varchar2 default null,
  use_like    in boolean  default true,
  esc         in char1    default null
);
## Examples
execute dbms_logstdby.skip(stmt => 'DML', schema_name => 'HR', object_name => 'EMPLOYEE');
execute dbms_logstdby.skip(stmt => 'SCHEMA_DDL', schema_name => 'HR', object_name => 'EMPLOYEE');
How much LCR cache is being used
select used_memory_size from v$logmnr_session where session_id = (select value from v$logstdby_stats where name = 'LOGMINER SESSION ID');
lagging SQL Apply
# apply lag: indicates how current the replicated data at the logical standby is
# transport lag: indicates how much of the redo data already generated is missing at the logical
# standby, in terms of redo records
select name, value, unit from v$dataguard_stats;
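The lag rows returned by the query above are easier to alert on once they are turned into seconds. The following is a minimal sketch, assuming the VALUE column for the lag rows is a string of the form `+DD HH:MM:SS`; the function names and the 60-second threshold are hypothetical and only for illustration.

```python
import re

# Assumption: lag values look like "+DD HH:MM:SS", e.g. "+00 00:01:30".
LAG_PATTERN = re.compile(r"^\+(\d+) (\d{2}):(\d{2}):(\d{2})$")


def lag_to_seconds(value: str) -> int:
    """Convert a lag string such as '+00 00:01:30' into whole seconds."""
    match = LAG_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected lag format: {value!r}")
    days, hours, minutes, seconds = (int(part) for part in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def lag_exceeds(value: str, threshold_seconds: int) -> bool:
    """Return True when the lag is strictly above the threshold."""
    return lag_to_seconds(value) > threshold_seconds


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real system.
    for name, value in [("apply lag", "+00 00:01:30"), ("transport lag", "+00 00:00:05")]:
        print(name, lag_to_seconds(value), lag_exceeds(value, 60))
```

Raising on an unexpected format, rather than returning zero, keeps a parsing problem from being mistaken for a standby that is fully caught up.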
SQL Apply component bottleneck
Name                                Value
----------------------------------------------------
TRANSACTIONS APPLIED                3764
TRANSACTIONS MINED                  4985
The number of mined transactions should be about twice the number of applied transactions; if this ratio decreases or stays at a low value, you need to start looking at the mining engine.
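The rule of thumb above is a simple ratio check. Here is a minimal sketch using the two counters shown in the sample output; the function names and the default target of 2.0 are assumptions made for illustration, not an Oracle-defined threshold.

```python
def mined_to_applied_ratio(mined: int, applied: int) -> float:
    """Ratio of mined to applied transactions."""
    if applied <= 0:
        raise ValueError("applied must be a positive count")
    return mined / applied


def mining_needs_attention(mined: int, applied: int, target_ratio: float = 2.0) -> bool:
    """True when the ratio sits below the target (assumed here to be 2.0)."""
    return mined_to_applied_ratio(mined, applied) < target_ratio


if __name__ == "__main__":
    # Counters taken from the sample output above.
    ratio = mined_to_applied_ratio(mined=4985, applied=3764)
    print(f"ratio = {ratio:.2f}")
    print("check the mining engine:", mining_needs_attention(4985, 3764))
```

A single reading says little on its own; the note above is about the ratio falling or staying low over time, so in practice the check would be repeated across several samples.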
Make sure all preparers are busy
select count(1) as idle_preparers from v$logstdby_process where type = 'PREPARER' and status_code = 16166;
IDLE_PREPARERS
----------------------------
0
Make sure the peak size is well below the amount allocated
select used_memory_size from v$logmnr_session where session_id = (select value from v$logstdby_stats where name = 'LOGMINER SESSION ID');
USED_MEMORY_SIZE
----------------------------
32522244
verify that the preparer does not have enough work for the applier processes
select (available_txn - pinned_txn) as pipeline_depth from v$logmnr_session where session_id = (select value from v$logstdby_stats where name = 'LOGMINER SESSION ID');
PIPELINE_DEPTH
----------------------------
8
select count(*) as applier_count from v$logstdby_process where type = 'APPLIER';
APPLIER_COUNT
----------------------------
20
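The two results above are read together: a pipeline depth smaller than the number of appliers suggests the preparer is not producing enough work to keep them busy. A minimal sketch of that comparison follows, assuming the check is simply "depth less than applier count"; the function name is hypothetical.

```python
def preparer_starves_appliers(pipeline_depth: int, applier_count: int) -> bool:
    """True when fewer transactions are ready than there are applier processes.

    Assumption: pipeline_depth is (available_txn - pinned_txn) and
    applier_count is the number of APPLIER rows, as in the queries above.
    """
    if pipeline_depth < 0 or applier_count < 0:
        raise ValueError("counts must not be negative")
    return pipeline_depth < applier_count


if __name__ == "__main__":
    # Values from the sample output above: depth 8, 20 appliers.
    print(preparer_starves_appliers(pipeline_depth=8, applier_count=20))
```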
Now subtract one from the other and work out the percentage rate; if pageout has increased above 5%, then increase MAX_SERVERS.
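The subtraction and percentage step can be sketched as below. This is only an illustration: it assumes the two figures being compared are pageout time and uptime, both in seconds and both taken from two snapshots of the same statistics, and every number in the example is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One reading of the statistics (assumed to be in seconds)."""
    uptime_seconds: float
    pageout_seconds: float


def pageout_percentage(first: Snapshot, second: Snapshot) -> float:
    """Pageout time as a percentage of uptime between two snapshots."""
    uptime_delta = second.uptime_seconds - first.uptime_seconds
    if uptime_delta <= 0:
        raise ValueError("second snapshot must be taken after the first")
    pageout_delta = second.pageout_seconds - first.pageout_seconds
    return 100.0 * pageout_delta / uptime_delta


if __name__ == "__main__":
    # Hypothetical snapshots taken 300 seconds apart.
    earlier = Snapshot(uptime_seconds=894856, pageout_seconds=20)
    later = Snapshot(uptime_seconds=895156, pageout_seconds=38)
    pct = pageout_percentage(earlier, later)
    print(f"pageout = {pct:.1f}% of uptime, above 5%: {pct > 5.0}")
```

Working from the difference between two snapshots, rather than the cumulative totals, shows the current rate instead of an average over the whole life of the apply session.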
unassigned large transactions
## By default, SQL Apply should be one-sixth of the number of applier processes
select (available_txn - pinned_txn) as pipeline_depth from v$logmnr_session where session_id = (select value from v$logstdby_stats where name = 'LOGMINER SESSION ID');
PIPELINE_DEPTH
----------------------------
256
select count(1) as idle_applier from v$logstdby_process where type = 'APPLIER' and status_code = 16166;
IDLE_APPLIER
---------------------------
12
select value from v$logstdby_stats where name = 'LARGE TXNS WAITING TO BE ASSIGNED';
VALUE
---------------------------
12
Monitoring
delays in redo transport
## you can use the dg_archivelog_monitor.sh script, which accepts three parameters, primary, physical
## and the archive log threshold (# of archive logs)
## now check on the primary we should be one in front (run on the primary)
sql> select thread#, sequence#, status from v$log;
Note: if using a RAC environment make sure you check each instance
check that redo has been applied (physical) - step 2
## check that MRP (applying_log) matches the RFS process, if the MRP line is missing then you nee
## start the apply process, you also may see the status of wait_for_gap so wait until the gap hav
## resolved first
sql> select client_process, process, sequence#, status from v$managed_standby;
check that redo has been applied (logical) - step 3
## if you are using a logical standby then you need to check the following to confirm the redo ha
## applied
sql> select applied_scn, latest_scn, mining_scn from v$logstdby_progress;
## if the mining scn is behind you may have a gap; check this by using the following
switchover (primary) - step 9
sql> alter database commit to switchover to physical standby with session shutdown;
Note: at this point if you want to rollback this switchover see my troubleshooting section to get
check the switchover status - step 10
sql> select switchover_status from v$database;
complete the switchover (physical) - step 11
sql> alter database commit to switchover to primary with session shutdown;
open the new primary - step 12
sql> alter database open;
finish off the old primary - step 13
sql> shutdown immediate;
sql> startup mount;
sql> alter database recover managed standby database using current logfile disconnect;
## check the sync status, it should say yes (run on the standby)
sql> select db_unique_name, protection_mode, synchronization_status, synchronized from v$archive_
Note: if using a RAC environment make sure you check each instance
check that redo has been applied (physical) - step 2
## check that MRP (applying_log) matches the RFS process, if the MRP line is missing then you nee
## start the apply process, you also may see the status of wait_for_gap so wait until the gap hav
## resolved first
sql> select client_process, process, sequence#, status from v$managed_standby;
check that redo has been applied (logical) - step 3
## if you are using a logical standby then you need to check the following to confirm the redo ha
## applied
sql> select applied_scn, latest_scn, mining_scn from v$logstdby_progress;
## if the mining scn is behind you may have a gap; check this by using the following
Prepare the logical standby - step 10
## confirm that the prepare has started to happen, you should see "preparing dictionary"
sql> select switchover_status from v$database;
## wait a while until the dictionary is built and sent and you should see "preparing switchover"
sql> select switchover_status from v$database;
Check primary database state - step 11
## you should now see it is in the state of "to logical standby"
sql> select switchover_status from v$database;
the last chance to CANCEL the switchover (no going back after this) - step 12
## on the primary
sql> alter database prepare to switchover cancel;
## on the logical
sql> alter database prepare to switchover cancel;
switchover the primary to a logical standby - step 13
sql> alter database commit to switchover to logical standby;
## check that it is ready to become the primary, you should see "to primary"
Check redo applied - step 1
select name, value, time_computed from v$dataguard_stats where name like '%lag%';
## You can also use the SCN number
the failover process (logical standby) - step 2
## Start by telling the apply process that this standby is going to be the new primary, and to ap
## the redo that it has
alter database activate logical standby database finish apply;
FAILOVER_SCN
-----------------------------------------------
7658841
## Now flashback the old primary to this SCN and start in mount mode
startup mount;
flashback database to scn 7658841;
alter database convert to physical standby;
shutdown immediate;
startup mount;
## hopefully the old primary will start to resolve any gap issues at the next log switch, which m
## process to get this standby going to catchup as fast as possible
alter database recover managed standby database using current logfile disconnect;
## eventually the missing redos will be sent to the standby and applied, bring us back to synchro
flashback_scn recovery_scn
---------------------------------------------------------
7658941 7659568
## Now flashback the old primary to this SCN and start in mount mode
startup mount;
flashback database to scn 7658941;
alter database convert to physical standby;
shutdown immediate;
startup mount;
bring back the old primary (logical standby) - step 2
## Now we need to hand feed the archive logs from the primary to the standby (old primary) into t
## process, so let's get those logs (run on the primary)
select file_name from dba_logstdby_log where first_change# <= recovery_scn and next_change# > fl
## Now you will hopefully have a short list of the files you need, now you need to register them
## the standby database (old primary)
## Now you can recover up to the SCN but not including the one you specify
recover managed standby database until change 7659568;
## Now the standby database becomes a logical standby as up to this point it has been a physical
alter database activate standby database;
## Lastly you need to tell your new logical standby to ask the primary for a new copy of the diction
## all the redo in between. The SQL Apply will connect to the new primary using the database link
## retrieve the LogMiner dictionary, once the dictionary has been built, SQL Apply will apply all
## redo sent from the new primary and get itself synchronized
create public database link reinstatelogical connect to system identified by password using 'serv