AIX Au Quotidien
Troubleshooting issues:
1. Don't start fixing an issue without a proper incident record.
2. Engage the relevant parties while working on the issue.
3. Always try to get information about the issue from the user (requestor) with questions
along the lines of "what, when, where".
4. Look at errpt first.
5. Check ‘alog -t console -o’ to see if it's a boot issue.
6. Also look at the log files mentioned in "/etc/syslog.conf"; they may give more information
for the investigation.
7. Check backups if you're looking for configuration change issues.
8. If you're running out of time, involve your next-level team and managers.
9. Take help from vendors like IBM, EMC, Symantec if necessary.
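Items 4 to 6 above can be gathered in one pass. The following is a minimal sketch, assuming a POSIX shell on the AIX host. The function names syslog_log_files and first_look are illustrative and not part of AIX, and the parsing assumes the usual "selector destination" layout of /etc/syslog.conf, with file destinations starting with "/".

```shell
# List the log files named as destinations in a syslog.conf-style file.
# Assumption: lines look like "selector destination [options]" and
# comment lines start with '#'. Remote targets (@host) are skipped.
syslog_log_files() {
    conf=${1:-/etc/syslog.conf}
    awk '$1 !~ /^#/ && $2 ~ /^\// { print $2 }' "$conf" | sort -u
}

# First look at a problem server: error log, console log, syslog targets.
# Run this on the AIX host itself; errpt and alog are AIX commands.
first_look() {
    echo "== errpt (first 20 lines of the summary) =="
    errpt | head -20
    echo "== console log (last 40 lines) =="
    alog -t console -o | tail -40
    echo "== log files named in /etc/syslog.conf =="
    syslog_log_files /etc/syslog.conf
}
```

Calling first_look prints the three views one after another; each log file it lists can then be inspected by hand.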
P1 issues:
If it's a priority 1 (P1) issue, you need to consider a few additional points apart from the above.
1. Make sure the change record is fully approved; otherwise don't start any of your tasks.
2. Ensure a properly validated CR procedure is in place: Precheck -> Installation -> Backout ->
Post-Verification.
3. Suppress alerts if needed.
4. Remember that Application/Database teams are responsible for their Application/Database
backup/restore and stop/start. Therefore alert the application teams.
5. Check the history of the servers (CRs or IRs) to see if there were any issues or change
failures for these servers.
6. EXPECT THE UNEXPECTED: Ensure you have a proper backout plan in place.
7. Ensure you are on the right server ('uname -n'/'hostname') before you perform the change.
8. Make sure your ID and the root ID are working and have not expired.
9. Ensure no one else from your team is working on the same task, to avoid one change being
performed by multiple SAs. It's better to verify with the ‘who -u’ command to see if there
are any SAs already working on the server.
10. Remember: one change at a time; multiple changes could cause problems and can
complicate troubleshooting.
11. Ensure there are no conflicting changes from other departments such as SAN,
network, firewall, application, which could disrupt your change.
12. Record the commands run and the console output in a notepad file (named after the
change).
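The "right server" and "who else is logged in" checks from items 7 and 9 can be scripted so they are never skipped. This is a minimal sketch, assuming a POSIX shell. The function names confirm_host and other_sessions and the host name aixprd01 in the usage line are hypothetical.

```shell
# Stop unless we are on the server named in the change record.
# $1 = expected host name as written in the CR; compared with 'uname -n'.
confirm_host() {
    expected=$1
    actual=$(uname -n)
    if [ "$actual" != "$expected" ]; then
        echo "WRONG SERVER: on '$actual', change is for '$expected'" >&2
        return 1
    fi
    echo "OK: on $actual"
}

# Filter 'who -u' output (read from stdin) down to sessions that belong
# to someone other than you. $1 = your own login name.
other_sessions() {
    awk -v me="$1" '$1 != me'
}
```

Possible usage before starting the change: confirm_host aixprd01 && who -u | other_sessions "$(whoami)". Any line printed by the second command is another login worth checking with the team first.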
1. Check if the server is part of a cluster (HACMP/PowerHA); if so, you have to follow a
different procedure.
2. Always make sure three essentials are in place before you perform any change:
“backup (mksysb); system information; console”.
3. Take system configuration information (sysinfo script).
4. Check lv/filesystem consistency with “df -k” (df should not hang); all lvs should be in sync
state: “lsvg -o|lsvg -il”.
5. Check errpt & ‘alog -t console -o’ to see if there are any errors.
6. Ensure the latest mksysb (OS image backup) is kept on the relevant NIM server.
7. Ensure a backup of the non-rootvg file systems has been taken.
8. Verify the boot list & boot device: “bootlist -m normal -o” and “ipl_varyon -i”.
9. Log in to the HMC console.
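The sync-state check in item 4 is easy to misread by eye on a server with many logical volumes. This is a minimal sketch, assuming a POSIX shell and the usual lsvg -l column layout, in which LV STATE (for example open/syncd or open/stale) is the sixth column. The function names stale_lvs and check_lv_sync are illustrative.

```shell
# Read 'lsvg -o | lsvg -il' output on stdin and print the name of every
# logical volume whose LV STATE column ends in "/stale".
# Assumption: LV STATE is the 6th whitespace-separated field of a data row.
stale_lvs() {
    awk '$6 ~ /\/stale$/ { print $1 }'
}

# Run the check on the live system (AIX only) and fail if anything is stale.
check_lv_sync() {
    stale=$(lsvg -o | lsvg -il | stale_lvs)
    if [ -n "$stale" ]; then
        echo "Stale logical volumes found, do not proceed:" >&2
        echo "$stale" >&2
        return 1
    fi
    echo "All logical volumes in the active volume groups are synced"
}
```

Keeping the filter separate from the lsvg call means it can be tried against saved output before it is trusted on a live server.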
1. Put the servers in maintenance mode (stop alerts) to avoid unnecessary incident alerts.
2. Check the filesystem count with “df -g|wc -l”; verify the count after migration or reboot.
3. Ensure there are no scheduled reboots in crontab. If there are any, comment them out before
you proceed with the change.
4. If the system has not been rebooted for a long time (> 100 days), perform ‘bosboot’ and then
reboot the machine (verify the fs/appfs after reboot), and then commence with the
migration/upgrade. [Don't reboot the machine if the bosboot fails!]
5. Read the log messages carefully; don't ignore warnings.
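Items 2 and 3 above (filesystem count and scheduled reboots) lend themselves to small helpers. This is a minimal sketch, assuming a POSIX shell. The function names and the CHG0000 file name in the usage note are hypothetical, and the crontab filter only looks for the words "reboot" and "shutdown", so a reboot hidden inside a wrapper script would not be caught.

```shell
# Print active (uncommented) crontab lines that mention reboot or shutdown.
# Reads crontab text on stdin, for example from 'crontab -l'.
scheduled_reboots() {
    awk '$1 !~ /^#/ && /(reboot|shutdown)/'
}

# Save the current 'df -g' line count so it can be compared later.
# $1 = file to store the count in. 'df -g' is the AIX form (sizes in GB).
save_fs_count() {
    df -g | wc -l | tr -d ' ' > "$1"
}

# Compare a saved count ($1 = file) with a new count ($2 = number).
compare_fs_count() {
    before=$(cat "$1")
    if [ "$before" -ne "$2" ]; then
        echo "Filesystem count changed: before=$before after=$2" >&2
        return 1
    fi
    echo "Filesystem count unchanged ($before)"
}
```

Possible usage: run crontab -l | scheduled_reboots and save_fs_count "$HOME/CHG0000.fscount" before the change, then compare_fs_count "$HOME/CHG0000.fscount" $(df -g | wc -l) after the reboot or migration.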
Inform the relevant application teams and SDMs, and get an extension with proper approvals.
Raise an incident record in support of the issue.
Successful Change: