Sunday, April 24, 2011

Recovering from 552, 554 and 556 in AIX


Causes of an LED 552, 554, or 556

An LED code of 552, 554, or 556 during a standard disk-based boot indicates that a failure occurred during varyon of the rootvg volume group.
Some known causes of an LED 552, 554, or 556 are:
  • a corrupted file system
  • a corrupted Journaled File System (JFS) log device
  • a bad IPL-device record or bad IPL-device magic number; the magic number indicates the device type
  • a corrupted copy of the Object Data Manager (ODM) database on the boot logical volume
  • a fixed disk (hard disk) in the inactive state in the root volume group

Recovery procedure

To diagnose and fix the problem, boot to a Service mode shell and run the fsck command (file system check) on each file system. If the file system check fails, you may need to perform other steps.
WARNING: Do not use this document if the system is a /usr client, diskless client, or dataless client.
  1. Boot your system into a limited function maintenance shell (Service or Maintenance mode) from bootable AIX media to use this recovery procedure. Refer to your system user's or installation and service guide for specific IPL procedures related to your type and model of hardware. You can also refer to the document titled "Booting in Service Mode", available at http://techsupport.services.ibm.com/server/aix.srchBroker for more information.
  2. With bootable media of the same version and level as the system, boot the system into Service mode. The bootable media can be any ONE of the following:
    • Bootable CD-ROM
    • mksysb
    • Bootable Install Tape
    Follow the screen prompts to the Welcome to Base OS menu.
  3. Choose Start Maintenance Mode for System Recovery (Option 3). The next screen displays prompts for the Maintenance menu.
    • Choose Access a Root Volume Group (Option 1). The next screen displays a warning that indicates you will not be able to return to the Base OS menu without rebooting.
    • Choose 0 to continue. The next screen displays information about all volume groups on the system.
    • Select the root volume group by number. The logical volumes in rootvg will be displayed with two options below.
    • Choose Access this volume group and start a shell before mounting the file systems (Option 2).
    If you receive errors from the preceding option, do not continue with the rest of this procedure. Correct the problem causing the error. If you need assistance correcting the problem causing the error, contact one of the following:
    • local branch office
    • your point of sale
    • your AIX support center
  4. Run the following commands to check and repair file systems.
    fsck -p /dev/hd4
    fsck -p /dev/hd2
    fsck -p /dev/hd9var
    fsck -p /dev/hd3
    fsck -p /dev/hd1
    
    NOTE: The -p flag shown above lets fsck silently fix minor problems. You can substitute the -y flag, which gives fsck permission to repair any file system corruption it finds; this avoids having to manually answer multiple confirmation prompts, but it can cause permanent data loss in some situations.
    If any of the following conditions occur, proceed accordingly.

    • If fsck indicates that block 8 could not be read, the file system is probably unrecoverable. See step 5 for information on unrecoverable file systems.
    • If fsck indicates that a file system has an unknown log record type, or if fsck fails in the logredo process, then go to step 6.
    • If the file system checks were successful, skip to step 8.
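    The fsck pass in step 4 can be sketched as a loop. This is a minimal illustration, not AIX-specific code: the DRY_RUN flag is an assumption of this sketch (not an fsck feature) that only prints the commands so the flow can be shown off-AIX; unset it on the real system to actually run the checks and collect failures for the second pass in step 7.

```shell
# Run fsck -p over the standard rootvg logical volumes and remember
# which ones fail, so they can be re-checked in step 7.
DRY_RUN=1      # illustration only: print commands instead of running them
failed=""
for lv in hd4 hd2 hd9var hd3 hd1; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "fsck -p /dev/$lv"
    elif ! fsck -p "/dev/$lv"; then
        failed="$failed $lv"     # remember for a second pass
    fi
done
[ -z "$failed" ] || echo "re-run fsck on:$failed"
```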
  5. The easiest way to fix an unrecoverable file system is to recreate it. This involves deleting it from the system and restoring it from a very current system backup. Note that hd9var and hd3 can be recreated, but hd4 and hd2 cannot be recreated. If hd4 and/or hd2 is unrecoverable, AIX must be reinstalled or restored from system backup. For assistance with unrecoverable file systems, contact your local branch office, point of sale, or AIX support center. Do not follow the rest of the steps in this document.
  6. A corruption of the JFS log logical volume has been detected. Use the logform command to reformat it.
    /usr/sbin/logform /dev/hd8
    
    Answer yes when asked if you want to destroy the log.
  7. Repeat step 4 for all file systems that did not successfully complete fsck the first time. If step 4 fails a second time, the file system is almost always unrecoverable. See step 5 for an explanation of the options at this point. In most cases, step 4 will be successful. If step 4 is successful, continue to step 8.
  8. With the key in Normal position (for microchannel machines), run the following commands to reboot the system:
    exit
    sync;sync;sync;reboot
    
    As you reboot in Normal mode, notice how many times LED 551 appears. If LED 551 appears twice, fsck is probably failing because of a bad fshelper file. If this is the case and you are running AFS, see step 11.
    The majority of instances of LED 552, 554, and 556 will be resolved at this point. If you still have an LED 552, 554, or 556, you may try the following steps.
    ATTENTION: The following steps will overwrite your Object Data Manager (ODM) database files with a very primitive, minimal ODM database. Due to the potential loss of user configuration data caused by this procedure, it should only be used as a last resort effort to gain access to your system to attempt to back up any data that you can. It is NOT recommended to use the following procedure in lieu of restoring from a system backup.
  9. Repeat step 1 through step 3.
  10. Run the following commands, which save a copy of the system's ODM configuration to a backup directory and replace it with the minimal ODM from the boot media.
    mount /dev/hd4 /mnt
    mount /dev/hd2 /mnt/usr
    mkdir /mnt/etc/objrepos/bak
    cp /mnt/etc/objrepos/Cu* /mnt/etc/objrepos/bak
    cp /etc/objrepos/Cu* /mnt/etc/objrepos
    umount /dev/hd2
    umount /dev/hd4
    exit
    
    Determine which disk is the boot disk with the lslv command. The boot disk will be shown in the PV1 column of the lslv output.
    lslv -m hd5
    
    Save the clean ODM database to the boot logical volume. (# is the number of the fixed disk, determined with the previous command.)
    savebase -d /dev/hdisk# 
    If you are running AFS, go to step 11; otherwise, go to step 12.
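    The boot-disk lookup and savebase sequence in step 10 can be sketched as follows. The lslv output here is hardcoded sample text for illustration; on the real system you would pipe the output of lslv -m hd5 into the awk command instead.

```shell
# Extract the boot disk from `lslv -m hd5` output: the disk name is
# the PV1 (third) column of the first map line, which appears on the
# third line of output (name line, header line, then the map).
sample="hd5:N/A
LP    PP1  PV1           PP2   PV2   PP3   PV3
0001  0001 hdisk2
0002  0002 hdisk2"
bootdisk=$(printf '%s\n' "$sample" | awk 'NR==3 {print $3; exit}')
# Print the matching savebase command rather than running it here.
echo "savebase -d /dev/$bootdisk"
```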
  11. If you are running the Andrew File System (AFS), use the following commands to find out whether you have more than one version of the v3fshelper file.
    cd /sbin/helpers
       ls -l v3fshelper*
    
    If you have only one version of the v3fshelper file (for example, v3fshelper), proceed to step 12.
    If there is a version of v3fshelper marked as original (for example, v3fshelper.orig), run the following commands:
    cp v3fshelper v3fshelper.afs
       cp v3fshelper.orig v3fshelper
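    The helper check in step 11 amounts to a simple conditional copy. The sketch below exercises that logic in a scratch directory standing in for /sbin/helpers (the file contents are made-up placeholders), so it can be shown end to end off-AIX:

```shell
# Set up a scratch stand-in for /sbin/helpers with placeholder files.
dir=$(mktemp -d)
echo "afs helper"  > "$dir/v3fshelper"       # current (AFS) helper
echo "orig helper" > "$dir/v3fshelper.orig"  # preserved original
cd "$dir"
# Only if an original copy exists: save the AFS helper aside for
# step 13, then restore the stock helper.
if [ -f v3fshelper.orig ]; then
    cp v3fshelper v3fshelper.afs
    cp v3fshelper.orig v3fshelper
fi
cat v3fshelper
```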
    
  12. WARNING: Do not proceed further if the system is a /usr client, diskless client, or dataless client. Make sure that hd5 resides on the edge of the disk and, if it occupies more than one partition, that the partitions are contiguous. On AIX 5.1 and later, also make sure that hd5 is larger than 12 MB:
    lslv hd5          (check the value of the PP SIZE: field)
    lslv -m hd5

    LP     PP1   PV1      PP2   PV2      PP3   PV3
    0001   0001  hdisk2
    0002   0002  hdisk2
    
    Recreate the boot image (hdisk# is the fixed disk determined in step 10):
    bosboot -a -d /dev/hdisk# 
    Make sure the bootlist is set correctly:
    bootlist -m normal -o
    
    Make changes, if necessary:
    bootlist -m normal hdiskX cdX
    
    (Substitute whatever device list you need.) NOTE: If you suspect that an inactive or damaged disk is causing the boot problem and the boot logical volume, hd5, is mirrored onto another disk, you may wish to list that other boot device first in the bootlist.

    Make sure that the disk drive that you have chosen as your bootable device has a yes next to it:
    ipl_varyon -i
    
    Example:
    PVNAME     BOOT DEVICE   PVID                               VOLUME GROUP ID
    hdisk1     NO            0007b53cbfd04a900000000000000000   0007b53c00004c00
    hdisk4     NO            0007b53c1244625d0000000000000000   0007b53c00004c00
    hdisk2     YES           0007b53c8ffd63120000000000000000   0007b53c00004c00
    
    From the above example, hdisk2 would be a bootable disk drive while hdisk1 and hdisk4 would not be.
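    Picking the bootable drives out of ipl_varyon -i output can be automated with a small filter. The sample output below is hardcoded, with shortened PVIDs, purely for illustration; on the real system you would pipe ipl_varyon -i into the awk instead.

```shell
# Print only the disks whose BOOT DEVICE column reads YES.
sample="PVNAME    BOOT DEVICE    PVID              VOLUME GROUP ID
hdisk1    NO             0007b53cbfd04a90  0007b53c00004c00
hdisk4    NO             0007b53c1244625d  0007b53c00004c00
hdisk2    YES            0007b53c8ffd6312  0007b53c00004c00"
bootable=$(printf '%s\n' "$sample" | awk 'NR > 1 && $2 == "YES" {print $1}')
echo "$bootable"
```

    With this sample data, only hdisk2 is printed, matching the example output above.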
  13. If you copied files in step 11, copy the AFS file system helper back to v3fshelper:
    cp v3fshelper.afs v3fshelper
    
  14. Turn the key to the Normal position (for microchannel machines) and run:
    sync;sync;sync;reboot
    
If you followed all of the preceding steps and the system still stops at an LED 552, 554, or 556 during a reboot in Normal mode, you may want to consider reinstalling your system from a recent backup. Isolating the cause of the hang could be excessively time-consuming and may not be cost-effective in your operating environment. Isolating the possible cause of the hang would require a debug boot of the system; instructions are included in the document "Capturing Boot Debug", available at http://techsupport.services.ibm.com/server/aix.srchBroker. It is still possible, in the end, that isolating the problem will indicate that a restore or reinstall of AIX is necessary to correct it.
If you wish to pursue further system recovery, you may be able to obtain assistance from one of the following:
  • local branch office
  • your point of sale
  • your AIX support center
 

VIO basics

The Virtual I/O Server is part of the IBM System p Advanced POWER Virtualization hardware feature. It allows LPARs to share physical resources, including virtual SCSI devices and virtual networking, which makes more efficient use of physical hardware and facilitates server consolidation.

The Virtual I/O Server is software that is located in a logical partition. This software facilitates the sharing of physical I/O resources between AIX® and Linux® client logical partitions within the server. The Virtual I/O Server provides virtual SCSI target and Shared Ethernet Adapter capability to client logical partitions within the system, allowing the client logical partitions to share SCSI devices and Ethernet adapters. The Virtual I/O Server software requires that the logical partition be dedicated solely for its use.
The Virtual I/O Server is available as part of the Advanced POWER™ Virtualization hardware feature.
Using the Virtual I/O Server facilitates the following functions:
-->Sharing of physical resources between logical partitions on the system
-->Creating logical partitions without requiring additional physical I/O resources
-->Creating more logical partitions than there are I/O slots or physical devices available with the ability for partitions to have dedicated I/O, virtual I/O, or both
-->Maximizing use of physical resources on the system
-->Helping to reduce the Storage Area Network (SAN) infrastructure
The Virtual I/O Server supports client logical partitions running the following operating systems:
-->AIX 5.3 or later
-->SUSE Linux Enterprise Server 9 for POWER (or later)
-->Red Hat® Enterprise Linux AS for POWER Version 3 (update 2 or later)
-->Red Hat Enterprise Linux AS for POWER Version 4 (or later)
For the most recent information about devices that are supported on the Virtual I/O Server, to download Virtual I/O Server fixes and updates, and to find additional information about the Virtual I/O Server, see the Virtual I/O Server Web site.
The Virtual I/O Server comprises the following primary components:
-->Virtual SCSI
-->Virtual Networking
-->Integrated Virtualization Manager
The following sections provide a brief overview of each of these components.


Virtual SCSI
Physical adapters with attached disks or optical devices on the Virtual I/O Server logical partition can be shared by one or more client logical partitions. The Virtual I/O Server offers a local storage subsystem that provides standard SCSI-compliant logical unit numbers (LUNs). The Virtual I/O Server can export a pool of heterogeneous physical storage as a homogeneous pool of block storage in the form of SCSI disks.
Unlike typical storage subsystems that are physically located in the SAN, the SCSI devices exported by the Virtual I/O Server are limited to the domain within the server. Although the SCSI LUNs are SCSI compliant, they might not meet the needs of all applications, particularly those that exist in a distributed environment.
The following SCSI peripheral-device types are supported:
-->Disks backed by a logical volume
-->Disks backed by a physical volume
-->Optical devices (DVD-RAM and DVD-ROM)


Virtual networking
Shared Ethernet Adapter allows logical partitions on the virtual local area network (VLAN) to share access to a physical Ethernet adapter and to communicate with systems and partitions outside the server. This function enables logical partitions on the internal VLAN to share the VLAN with stand-alone servers.


Integrated Virtualization Manager
The Integrated Virtualization Manager provides a browser-based interface and a command-line interface that you can use to manage IBM® System p5™ and IBM eServer™ pSeries® servers that use the IBM Virtual I/O Server. On the managed system, you can create logical partitions, manage the virtual storage and virtual Ethernet, and view service information related to the server. The Integrated Virtualization Manager is packaged with the Virtual I/O Server, but it is activated and usable only on certain platforms and where no Hardware Management Console (HMC) is present.

Introduction to VIO

Prior to the introduction of POWER5 systems, it was only possible to create as many separate logical partitions (LPARs) on an IBM system as there were physical processors. Given that the largest IBM eServer pSeries POWER4 server, the p690, had 32 processors, 32 partitions were the most anyone could create. A customer could order a system with enough physical disks and network adapter cards so that each LPAR would have enough disks to contain the operating system and enough network cards to allow users to communicate with each partition.
The Advanced POWER Virtualization™ feature of POWER5 platforms makes it possible to allocate fractions of a physical CPU to a POWER5 LPAR. Using virtual CPUs and virtual I/O, a user can create many more LPARs on a p5 system than there are CPUs or I/O slots. The Advanced POWER Virtualization feature accounts for this by allowing users to create shared network adapters and virtual SCSI disks. Customers can use these virtual resources to provide disk space and network adapters for each LPAR they create on their POWER5 system (see Figure ).


There are three components of the Advanced POWER Virtualization feature: Micro-Partitioning™, shared Ethernet adapters, and virtual SCSI. In addition, AIX 5L Version 5.3 allows users to define virtual Ethernet adapters permitting inter-LPAR communication. This paper provides an overview of how each of these components works and then shows the details of how to set up a simple three-partition system where one partition is a Virtual I/O Server and the other two partitions use virtual Ethernet and virtual SCSI to differing degrees. What follows is a practical guide to help a new POWER5 customer set up simple systems where high availability is not a concern, but becoming familiar with this new technology in a development environment is the primary goal.

Micro-Partitioning
An element of the IBM POWER Virtualization feature called Micro-Partitioning can divide a single processor into many different processors. In POWER4 systems, each physical processor is dedicated to an LPAR. This concept of dedicated processors is still present in POWER5 systems, but so is the concept of shared processors. A POWER5 system administrator can use the Hardware Management Console (HMC) to place processors in
a shared processor pool. Using the HMC, the administrator can assign fractions of a CPU to individual partitions. If one LPAR is defined to use processors in the shared processor pool, when those CPUs are idle, the POWER Hypervisor™ makes them available to other partitions. This ensures that these processing resources are not wasted. Also, the ability to assign fractions of a CPU to a partition means it is possible to partition POWER5 servers into many different partitions. Allocation of physical processor and memory resources on POWER5 systems is managed by a system firmware component called the POWER Hypervisor.

Virtual Networking
Virtual networking on POWER5 hardware consists of two main capabilities. One capability is provided by a software IEEE 802.1Q (VLAN) switch that is implemented in the Hypervisor on POWER5 hardware. Users can use the HMC to add virtual Ethernet adapters to their partition definitions. Once these are added and the partitions booted, the new adapters can be configured just like real physical adapters, and the partitions can communicate with each other without having to connect cables between the LPARs. Users can separate traffic from different VLANs by assigning different VLAN IDs to each virtual Ethernet adapter. Each AIX 5.3 partition can support up to 256 virtual Ethernet adapters.

In addition, a part of the Advanced POWER virtualization virtual networking feature allows users to share physical adapters between logical partitions. These shared adapters, called Shared Ethernet Adapters (SEAs), are managed by a Virtual I/O Server partition which maps physical adapters under its control to virtual adapters. It is possible to map many physical Ethernet adapters to a single virtual Ethernet adapter thereby eliminating a single physical adapter as a point of failure in the architecture.
There are a few things users of virtual networking need to consider before implementing it. First, virtual networking ultimately uses more CPU cycles on the POWER5 machine than dedicated physical adapters do. Users should consider assigning a physical adapter directly to a partition when heavy network traffic is predicted over a certain adapter. Second, users may want to take advantage of the larger MTU sizes that virtual Ethernet allows if they know that their applications will benefit from the reduced fragmentation and better performance that larger MTU sizes offer. The MTU size limit for a Shared Ethernet Adapter is smaller than that for virtual Ethernet adapters, so users will have to choose an MTU size carefully so that packets are sent to external networks with minimum fragmentation.

Virtual SCSI
The Advanced POWER Virtualization feature called virtual SCSI allows access to physical disk devices that are assigned to the Virtual I/O Server (VIOS). The system administrator uses VIOS logical volume manager commands to assign disks to volume groups and creates logical volumes in the Virtual I/O Server volume groups. Either these logical volumes or the physical disks themselves may ultimately appear as physical disks (hdisks) to the Virtual I/O Server's client partitions once they are associated with virtual SCSI host adapters. While the Virtual I/O Server software is packaged as an additional software bundle that a user purchases separately from the AIX 5.3 distribution, the virtual I/O client software is part of the AIX 5.3 base installation media, so an administrator does not need to install any additional filesets on a virtual SCSI client partition. Srikrishnan provides more details on how the virtual SCSI feature works.

Thursday, April 21, 2011

AIX Concepts

AIX
LVM:
VG: One or more PVs can make up a VG.
Within each volume group one or more logical volumes can be defined.
VGDA (volume group descriptor area): an area on the disk that contains information pertinent to the VG that the PV belongs to. It also includes information about the properties and status of all physical and logical volumes that are part of the VG.
VGSA (volume group status area): describes the state of all PPs from all physical volumes within a volume group. The VGSA indicates whether a physical partition contains accurate or stale information.
LVCB (logical volume control block): contains important information about the logical volume, such as the number of logical partitions and the disk allocation policy.
VG type    Max PVs   Max LVs   Max PPs/VG   Max PP size
Normal     32        256       32512        1 GB
Big        128       512       130048       1 GB
Scalable   1024      4096      2097152      128 GB
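The totals in the table follow from per-PV partition budgets, which a quick arithmetic check confirms. The 1016-PPs-per-PV figure is the normal/big VG default; the 2048 x 1024 budget for scalable VGs is an assumption used here for illustration:

```shell
# Sanity-check the VG limit arithmetic from the table above.
normal=$((1016 * 32))      # 1016 PPs/PV x 32 PVs (normal VG)
big=$((1016 * 128))        # 1016 PPs/PV x 128 PVs (big VG)
scalable=$((2048 * 1024))  # assumed 2048 PPs/PV x 1024 PVs (scalable VG)
echo "normal=$normal big=$big scalable=$scalable"
```

The three results match the Max PPs/VG column: 32512, 130048 and 2097152.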
PVIDs are stored in the ODM.
Creating a PVID: chdev -l hdisk3 -a pv=yes
Clearing the PVID: chdev -l hdisk3 -a pv=clear
Display the allocation of PPs to LVs: lspv -p hdisk0
Display the layout of a PV: lspv -M hdisk0
Disabling partition allocation for a physical volume: chpv -an hdisk2 (Allocatable=no)
Enabling partition allocation for a physical volume: chpv -ay hdisk2 (Allocatable=yes)
Change the disk to unavailable: chpv -vr hdisk2 (PV state=removed)
Change the disk to available: chpv -va hdisk2 (PV state=active)
Clean the boot record: chpv -c hdisk1
To define hdisk3 as a hot spare: chpv -hy hdisk3
To remove hdisk3 as a hot spare: chpv -hn hdisk3
Migrating data between two disks: migratepv hdisk1 hdisk2
Migrate only the PPs that belong to a particular LV: migratepv -l testlv hdisk1 hdisk5
Move data from one partition located on a physical disk to another physical partition on a different disk: migratelp testlv/1/2 hdisk5/123
Logical track group (LTG) size is the maximum allowed transfer size for a disk I/O operation: lquerypv -M hdisk0
VOLUME GROUPS
For each VG, two device driver files are created under /dev.
Creating a VG: mkvg -y vg1 -s 64 -V 99 hdisk4
Creating a big VG: mkvg -B -y vg1 -s 128 -f -n -V 101 hdisk2
Creating a scalable VG: mkvg -S -y vg1 -s 128 -f hdisk3 hdisk4 hdisk5
Adding disks that require more than 1016 PPs/PV: chvg -t 2 vg1
Information about a VG read from the VGDA located on a disk: lsvg -n vg1
Change the VG to vary on automatically at startup: chvg -ay newvg
Change the VG not to vary on automatically at startup: chvg -an newvg
Quorum ensures data integrity in the event of disk failure. A quorum is a state in which 51 percent or more of the VGDAs in a VG are accessible. When quorum is lost, the VG varies itself off.
Turn off the quorum: chvg -Qn testvg
Turn on the quorum: chvg -Qy testvg
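The quorum rule above is just a majority test over VGDA copies. A minimal sketch, assuming a two-disk VG where AIX keeps two VGDA copies on the first disk and one on the second (so losing the second disk still leaves 2 of 3 copies):

```shell
# Majority test: quorum holds while accessible VGDAs > half the total.
total_vgdas=3   # illustrative: 2 copies on disk 1, 1 copy on disk 2
accessible=2    # disk 2 has failed; disk 1's two copies remain
if [ $(( accessible * 2 )) -gt "$total_vgdas" ]; then
  echo "quorum held: VG stays varied on"
else
  echo "quorum lost: VG varies itself off"
fi
```

With accessible=1 (disk 1 lost instead), the same test fails and the VG varies itself off.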
To change the maximum number of PPs per PV: chvg -t 16 testvg
To change a normal VG to a scalable VG: 1. varyoffvg ttt 2. chvg -G ttt 3. varyonvg ttt
Change the LTG size: chvg -L 128 testvg (VGs are created with a variable logical track group size)
Hot spare: all PPs on the spare physical volume should be free. PPs located on a failing disk will be copied from their mirror copies to one or more disks from the hot spare pool.
Designate hdisk4 as a hot spare: chpv -hy hdisk4
Migrate data from a failing disk to a spare disk: chvg -hy vgname
Change the synchronization policy: chvg -sy testvg (the synchronization policy controls automatic synchronization of stale partitions within the VG)
Change the maximum number of PPs within a VG: chvg -P 2048 testvg
Change the maximum number of LVs per VG: chvg -v 4096 testvg
How to remove the VG lock: chvg -u vgname
Extending a volume group: extendvg testvg hdisk3; if a PVID is already present on the disk, use extendvg -f testvg hdisk3
Reducing a disk from a VG: reducevg testvg hdisk3
Synchronize the ODM information: synclvodm testvg
To move data from one system to another, use the exportvg command. The exportvg command only removes the VG definition from the ODM and does not delete any data from the physical disks: exportvg testvg
importvg: recreates the reference to the VG data and makes that data available. This command reads the VGDA of one of the PVs that are part of the VG. It uses redefinevg to find all other disks that belong to the VG. It adds the corresponding entries into the ODM database and updates /etc/filesystems with the new values: importvg -y testvg hdisk7
Example: moving app1vg from Server A to Server B:
  • Server A: lsvg -l app1vg
  • Server A: umount /app1
  • Server A: varyoffvg app1vg
  • Server B: lspv | grep app1vg
  • Server B: exportvg app1vg
  • Server B: importvg -y app1vg -n -V 90 vpath0
  • Server B: chvg -an app1vg
  • Server B: varyoffvg app1vg
Varying on a volume group: varyonvg testvg
Varying off a volume group: varyoffvg testvg
Reorganizing a volume group: this command is used to reorganize the physical partitions within a VG. The PPs are rearranged on the disks according to the intra- and inter-physical volume allocation policies: reorgvg testvg
Synchronize the VG: syncvg -v testvg; syncvg -p hdisk4 hdisk5
Mirroring a volume group: lsvg -p rootvg; extendvg rootvg hdisk1; mirrorvg rootvg; bosboot -ad /dev/hdisk1; bootlist -m normal hdisk0 hdisk1
Splitting a volume group: splitvg -y newvg -c 1 testvg
Rejoining the two copies: joinvg testvg
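The rootvg mirroring sequence above uses AIX-only commands, so this sketch only prints the steps in order as a dry run (the hdisk names are the same examples used above); on a real system you would execute each command instead of echoing it:

```shell
# Dry run of the rootvg mirroring sequence: extend the VG onto the
# new disk, mirror it, rebuild the boot image on the new disk, then
# add it to the normal-mode boot list.
for step in \
  "extendvg rootvg hdisk1" \
  "mirrorvg rootvg" \
  "bosboot -ad /dev/hdisk1" \
  "bootlist -m normal hdisk0 hdisk1"
do
  echo "$step"
done
```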
Logical Volumes:
Create an LV: mklv -y lv3 -t jfs2 -a im testvg 10 hdisk5
Remove an LV: umount /fs1; rmlv lv1
Delete all data belonging to logical volume lv1 on physical volume hdisk7: rmlv -p hdisk7 lv1
Display the number of logical partitions and their corresponding physical partitions: lslv -m lv1
Display information about logical volume testlv read from the VGDA located on hdisk6: lslv -n hdisk6 testlv
Display the LVCB: getlvcb -AT lv1
Increasing the size of an LV: extendlv -a ie -ex lv1 3 hdisk5 hdisk6
Copying an LV: cplv -v dumpvg -y lv8 lv1
Creating copies of an LV: mklvcopy -k lv1 3 hdisk7 &
Splitting an LV: umount /fs1; splitlvcopy -y copylv testlv 2
Removing a copy of an LV: rmlvcopy testlv 2 hdisk6
Changing the maximum number of logical partitions to 1000: chlv -x 1000 lv1
Installation:
New and complete overwrite installation: for a new machine, to overwrite an existing installation, or to reassign your hard disks.
Migration: upgrades between AIX versions, for example from 5.2 to 5.3. This method preserves most file systems, including the root volume group.
Preservation installation: use this method if you want to preserve the user data; files listed in /etc/preserve.list are kept. This installation overwrites the /usr, /tmp, /var and / (root) file systems by default. The /etc/filesystems file is listed in /etc/preserve.list by default.
TCB:
  • To check whether the TCB is installed: /usr/bin/tcbck
  • By installing a system with the TCB option, you enable the trusted path, trusted shell, trusted processes and system integrity checking.
  • Every device is part of the TCB, and every file in the /dev directory is monitored by the TCB.
  • Critical security information about many system files is stored in the /etc/security/sysck.cfg file.
  • You can enable the TCB only at installation time.
Installation steps: through the HMC -> activate -> override the boot mode to SMS.
Without an HMC -> after POST -> on hearing the two beeps -> press 1.
Insert AIX 5L CD 1 -> select boot options (option 5) -> select install/boot device (option 1) -> select CD/DVD -> select SCSI -> select normal boot -> exit from SMS -> system boots from the media -> choose language -> change/show installation settings -> new and complete overwrite -> select the hard disk -> install options -> press Enter to confirm -> after installation the system reboots automatically.
Erase a hard disk -> using the diag command.
Alternate Disk Installation:
  • Cloning the current running rootvg to an alternate disk
  • Installing a mksysb image on another disk.
alt_disk_copy: creates copies of rootvg on an alternate set of disks.
alt_disk_mksysb: installs an existing mksysb on an alternate set of disks.
alt_rootvg_op: performs wake, sleep and customize operations.
Alternate mksysb installation: smitty alt_mksysb
Alternate rootvg cloning: smitty alt_clone
Cloning AIX:
  • Having an online backup available, as in the case of a disk crash.
  • When applying new maintenance levels, a copy of the rootvg is made to an alternate disk, then updates are applied to that copy.
To view the BOS installation logs: cd /var/adm/ras -> cat devinst.log, or alog -o -f bosinstlog, or smit alog_show
Installation Packages:
Fileset: the smallest installable unit. Ex: bos.net.uucp
Package: a group of installable filesets. Ex: bos.net
Licensed program product: a complete software product. Ex: BOS
Bundle: a list of software that contains filesets, packages and LPPs. Install a software bundle using smitty update_all.
PTF: program temporary fix. It is an updated fileset or a new fileset that fixes a previous system problem. PTFs are installed through installp.
APAR: authorized program analysis report. APARs are applied to the system through instfix.
Fileset revision level identification: version:release:modification:fixlevel
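The version:release:modification:fixlevel string can be split with plain shell; the level 5.3.0.30 below is just an illustrative value, not a specific fileset from this document:

```shell
# Split a fileset level into its four dot-separated V.R.M.F fields.
level="5.3.0.30"
IFS=. read -r version release modification fix <<EOF
$level
EOF
echo "V=$version R=$release M=$modification F=$fix"
```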
To list the file sets that are below level 4.1.2.0: oslevel -l 4.1.2.0
To list the file sets at levels later than the current maintenance level: oslevel -g
To list all known recommended maintenance levels on the system: oslevel -rq
oslevel -s for the SP (service pack) level
Current maintenance level: oslevel -r
Installing software: applied and committed states
Applied: in the applied state, the previous version is stored in /usr/lpp/PackageName.
Committed: when an update is committed, the previous version is removed, so the update can no longer be rejected.
To install filesets within the bos.net software package from the /usr/sys/inst.images directory in the applied state: installp -avX -d /usr/sys/inst.images bos.net
Install software in the committed state: installp -acpX -d /usr/sys/inst.images bos.net
A record of the installp output is stored in /var/adm/sw/installp.summary
Commit all updates: installp -cgX all
List all installable software: installp -L -d /dev/cd0
Cleaning up after a failed installation: installp -C
Removing installed software: installp -ugp
Software installation: smitty install_latest
Committing applied updates: smitty install_commit
Rejecting applied updates: smitty install_reject
Removing installed software: smitty install_remove
To find what maintenance level your filesets are currently on: lslpp -l
To list the individual files that are installed with a particular fileset: lslpp -f bos.net
To list the installation and update history of filesets: lslpp -h
To list fixes that are on a CD-ROM in /dev/cd0: instfix -T -d /dev/cd0
To determine whether an APAR is installed: instfix -ik IY737478
To list what maintenance levels are installed: instfix -i | grep ML
To install an APAR: instfix -k IY75645 -d /dev/cd0
Installing an individual fix by APAR: smitty update_by_fix
To install new fixes available from IBM: smitty update_all
Verifying the integrity of the OS: lppchk -v
Creating installation images on disk: smitty bffcreate
Verify whether the software installed on your system is in a consistent state: lppchk
To install RPM packages using geninstall: geninstall -d Media all
Uninstall software: geninstall -u -f file
List installable software on a device: geninstall -L -d media
AIX Boot Process:
1. When the server is powered on, the power-on self test (POST) runs and checks the hardware.
2. On successful completion of POST, the boot logical volume is located using the bootlist.
3. The boot logical volume contains the AIX kernel, rc.boot, a reduced ODM and the boot commands. The AIX kernel is loaded into RAM.
4. The kernel takes control and creates a RAM file system.
5. The kernel starts /etc/init from the RAM file system.
6. init runs rc.boot 1 (rc.boot phase one), which configures the base devices.
7. rc.boot 1 calls the restbase command, which copies the ODM files from the boot logical volume to the RAM file system.
8. rc.boot 1 calls cfgmgr -f to configure the base devices.
9. rc.boot 1 calls bootinfo -b to determine the last boot device.
10. Then init starts rc.boot 2, which activates rootvg.
11. rc.boot 2 calls the ipl_varyon command to activate rootvg.
12. rc.boot 2 runs fsck -f /dev/hd4 and mounts the partition on the / of the RAM file system.
13. rc.boot 2 runs fsck -f /dev/hd2 and mounts the /usr file system.
14. rc.boot 2 runs fsck -f /dev/hd9var, mounts the /var file system, runs the copycore command to copy the core dump, if available, from /dev/hd6 to /var/adm/ras/vmcore.0, and then unmounts /var.
15. rc.boot 2 runs swapon /dev/hd6 to activate the paging space.
16. rc.boot 2 runs migratedev and copies the device files from the RAM file system to the / file system.
17. rc.boot 2 runs cp /../etc/objrepos/Cu* /etc/objrepos to copy the ODM files from the RAM file system to the / file system.
18. rc.boot 2 runs mount /dev/hd9var to mount the /var file system.
19. rc.boot 2 copies the boot log messages to alog.
20. rc.boot 2 removes the RAM file system.
21. The kernel starts the /etc/init process from the / file system.
22. /etc/init reads the /etc/inittab file and rc.boot 3 is started. rc.boot 3 configures the rest of the devices.
23. rc.boot 3 runs fsck -f /dev/hd3 and mounts the /tmp file system.
24. rc.boot 3 runs syncvg rootvg &.
25. rc.boot 3 runs cfgmgr -p2 or cfgmgr -p3 to configure the remaining devices; cfgmgr -p2 is used when the physical key on MCA architecture is in normal mode, and cfgmgr -p3 when it is in service mode.
26. rc.boot 3 runs the cfgcon command to configure the console.
27. rc.boot 3 runs the savebase command to copy the ODM files from /dev/hd4 to /dev/hd5.
28. rc.boot 3 starts syncd 60 and the errdemon.
29. rc.boot 3 turns off the LEDs.
30. rc.boot 3 removes the /etc/nologin file.
31. rc.boot 3 checks CuDv for chgstatus=3 and displays any missing devices on the console.
32. The next line of /etc/inittab is executed.
/etc/inittab file format: identifier:runlevel:action:command
mkitab -> add records to the /etc/inittab file
lsitab -> list records in the /etc/inittab file
chitab -> change records in the /etc/inittab file
rmitab -> remove records from the /etc/inittab file
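The identifier:runlevel:action:command format can be parsed with plain shell parameter expansion; the sample record below is in the style of a default AIX inittab entry and is shown only for illustration:

```shell
# Split one inittab record on its first three colons; everything
# after the third colon is the command (which may itself contain
# spaces and redirections).
line="rctcpip:2:wait:/etc/rc.tcpip > /dev/console 2>&1"
identifier=${line%%:*}; rest=${line#*:}
runlevel=${rest%%:*};   rest=${rest#*:}
action=${rest%%:*}
command=${rest#*:}
echo "id=$identifier level=$runlevel action=$action cmd=$command"
```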
To display a boot list: bootlist –m normal –o
To change a boot list: bootlist –m normal cd0 hdisk0
Troubleshooting on boot process:
Accessing a system that will not boot: press F5 on a PCI-based system to boot from the tape/CD-ROM -> insert volume 1 of the installation media -> select maintenance mode for system recovery -> Access a Root Volume Group -> select the volume group.
Damaged boot image: access the system that will not boot (as above) -> check the / and /tmp file system sizes -> determine the boot disk using lslv -m hd5 -> recreate the boot image using bosboot -a -d /dev/hdiskn -> check for CHECKSTOP errors in the error log; if such errors are found, the hardware is probably failing -> shut down and restart the system.
Corrupted file system, corrupted JFS log: access the system that will not boot -> run fsck on all file systems -> format the JFS log using /usr/sbin/logform /dev/hd8 -> recreate the boot image using bosboot -a -d /dev/hdiskn.
Super block corrupted: if fsck indicates that block 8 is corrupted, the super block for the file system is corrupted and needs to be repaired (dd count=1 bs=4k skip=31 seek=1 if=/dev/hdn of=/dev/hdn) -> rebuild the JFS log using /usr/sbin/logform /dev/hd8 -> mount the root and /usr file systems (mount /dev/hd4 /mnt; mount /usr) -> copy the system configuration to a backup directory (cp /mnt/etc/objrepos/Cu* /mnt/etc/objrepos/backup) -> copy the configuration from the RAM file system (cp /etc/objrepos/Cu* /mnt/etc/objrepos) -> unmount all file systems -> save the clean ODM to the BLV using savebase -d /dev/hdiskn -> reboot.
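The dd recipe above copies the backup super block (block 31, in 4 KB blocks) over the damaged primary copy at block 1. The same block copy can be demonstrated on a scratch file instead of a real /dev/hdn; the file name and marker string here are made up for illustration:

```shell
# Build a 32-block scratch "file system" and plant a marker in block 31.
dd if=/dev/zero of=fs.img bs=4k count=32 2>/dev/null
printf 'BACKUP-SUPERBLOCK' | dd of=fs.img bs=4k seek=31 conv=notrunc 2>/dev/null

# The repair from the text: read block 31, write it over block 1.
dd if=fs.img of=fs.img bs=4k skip=31 seek=1 count=1 conv=notrunc 2>/dev/null

# Block 1 now holds the backup copy.
dd if=fs.img bs=4k skip=1 count=1 2>/dev/null | head -c 17
rm -f fs.img
```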
Corrupted /etc/inittab file: check for an empty or missing inittab file; check for problems with /etc/environment, /bin/sh, /bin/bsh, /etc/fsck and /etc/profile -> reboot.
Run level -> a selected group of processes. 2 is multi-user and the default run level; S, s, M and m are for maintenance mode.
Identifying the current run level -> cat /etc/.init.state
Displaying the history of previous run levels: /usr/lib/acct/fwtmp < /var/adm/wtmp | grep run-level
Changing system run levels: telinit M
Run level scripts allow users to start and stop selected applications while changing the run level. Scripts beginning with K are stop scripts and those beginning with S are start scripts.
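A minimal sketch of that K/S naming convention: dispatch on the first letter of each script name (the script names here are invented for illustration):

```shell
# K* scripts get the "stop" argument, S* scripts get "start".
for script in K50sendmail S20nfs S99local; do
  case $script in
    K*) echo "$script stop" ;;
    S*) echo "$script start" ;;
  esac
done
```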
Go to maintenance mode by using shutdown -m
rc.boot file: the /sbin/rc.boot file is a shell script that is called by init. The rc.boot file configures devices, boots from disk, varies on the root volume group, enables file systems and calls the BOS installation programs.
/etc/rc file: it performs normal startup initialization: varies on all VGs, activates all paging spaces (swapon -a), configures all dump devices (sysdumpdev -q), performs file system checks (fsck -fp) and mounts all file systems.
/etc/rc.net: it contains network configuration information.
/etc/rc.tcpip: it starts all network-related daemons (inetd, gated, routed, timed, rwhod).
Backups:
mksysb: creates a bootable image of all mounted file systems on the rootvg. This command is used to restore a system to its original state.
Tape format: BOS boot image (kernel and device drivers), BOS install image (tapeblksz, image.data, bosinst.data), dummy table of contents, rootvg backup.
Exclude file systems using mksysb -ie /dev/rmt0; the exclusions are read from /etc/exclude.rootvg (cat /etc/exclude.rootvg).
List the contents of a mksysb image: smitty lsmksysb
Restore a mksysb image: smitty restmksysb
The savevg command finds and backs up all files belonging to the specified volume group. Ex: savevg -ivf /dev/rmt0 uservg
The restvg command restores the user volume group.
The backup command backs up all files and file systems. The restore command extracts files from archives created with the backup command.
Verify the contents of a backup media -> tcopy /dev/rmt0
Daily Management:
/etc/security/environ: contains the environment attributes for a user.
/etc/security/lastlog: an ASCII file that contains the last-login attributes (time of last unsuccessful login, unsuccessful login count, time of last login).
/etc/security/limits: specifies the process resource limits for each user.
/etc/security/user: contains the extended attributes for each user.
/usr/lib/security/mkuser.default: contains the default attributes for a new user.
/etc/utmp: contains a record of the users logged into the system. Command: who -a
/var/adm/wtmp: contains connect-time accounting records.
/etc/security/failedlogin: contains a record of unsuccessful login attempts.
/etc/environment: contains variables specifying the basic environment for all processes.
/etc/profile: the first file that the OS uses at login time.
To enable user smith to access this system remotely: chuser rlogin=true smith
Remove a user: rmuser smith
Remove a user along with the authentication information: rmuser -p smith
Display the current run level: who -r
Display the active processes: who -p
Changing the current shell: chsh
Change the prompt: export PS1="Ready."
To list all 64-bit processes: ps -M
To change the priority of a process: nice and renice
SUID (set-user-ID): this attribute sets the effective and saved user IDs of the process to the owner ID of the file on execution.
SGID (set-group-ID): this attribute sets the effective and saved group IDs of the process to the group ID of the file on execution.
The cron daemon runs shell commands at specified dates and times.
Use the at command to submit commands that are to be run only once.
System Planning:
RAID: Redundant array of independent disks.
RAID 0: Striping. Data is split into blocks of equal size and stored on different disks.
RAID 1: Mirroring. Duplicate copies are kept on separate physical disks.
RAID 5: Striping with Parity. Data is split into blocks of equal size. Additional data block containing parity information.
RAID 10: It is a combination of mirroring and striping.
AIX 5.3 requires at least 2.2 GB of physical space.
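The RAID 5 parity scheme described above can be demonstrated with shell arithmetic: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the survivors. The byte values below are arbitrary examples:

```shell
# Three one-byte "data blocks" and their XOR parity.
d1=170 d2=204 d3=240
parity=$(( d1 ^ d2 ^ d3 ))

# Disk holding d2 fails: XOR the parity with the surviving blocks.
rebuilt_d2=$(( parity ^ d1 ^ d3 ))
echo "parity=$parity rebuilt=$rebuilt_d2"   # rebuilt value equals d2
```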
Configuration:
ODM: ODM is a repository in which the OS keeps information about your system, such as devices, software, TCP/IP configuration.
Basic Components of ODM: object classes, objects, descriptors
ODM directories: /usr/lib/objrepos, /usr/share/lib/objrepos, /etc/objrepos
The following steps are required for an NFS implementation:
  • NFS daemons should be running on both the server and the client
  • The file systems that need to be remotely available have to be exported (smitty mknfsexp, exportfs -a, showmount -e myserver)
  • The exported file systems need to be mounted on the remote systems
NFS services: /usr/sbin/rpc.mountd, /usr/sbin/nfsd, /usr/sbin/biod,rpc.statd, rpc.lockd
Changing an exported file system: smitty chnfsexp
TCP/IP daemons: inetd, gated, routed, named
ODM commands: odmadd, odmchange, odmcreate, odmshow, odmdelete, odmdrop, odmget,
To start SMIT in graphical mode: smit -m
Creating an alias: alias rm=/usr/sbin/linux/rm
export PATH=/usr/linux/bin:$PATH; print $PATH
Network File System:
Daemons: server side (/usr/sbin/rpc.mountd, /usr/sbin/nfsd, portmap, rpc.statd, rpc.lockd), client side (/usr/sbin/biod)
Start the NFS daemons using mknfs -N. To start all NFS daemons, use startsrc -g nfs.
Exporting NFS directories:
  • Verify that NFS is running using lssrc -g nfs
  • smitty mknfsexp
  • Specify the path name and set the mode (rw, ro). This updates the /etc/exports file.
  • /usr/sbin/exportfs -a -> sends all the information in /etc/exports to the kernel.
  • Verify that all file systems are exported using showmount -e Myserver
Exporting an NFS directory temporarily: exportfs -i /dirname
Unexporting an NFS directory: smitty rmnfsexp
Establishing NFS mounts: smitty mknfsmnt
Changing an exported file system: smitty chnfsexp
Network configuration:
Stopping the TCP/IP daemons: use the /etc/tcp.clean script.
The /etc/services file contains information about the known services.
Add network routes using smitty mkroute or route add -net 192.168.1 -netmask 255.255.255.0
The traceroute command shows the route taken.
Changing the IP address: smitty mktcpip
Identifying network interfaces: lsdev -Cc if
Activating a network interface: ifconfig interface address netmask up
Deactivating a network interface: ifconfig tr0 down
Deleting an address: ifconfig tr0 delete
Detaching a network interface: ifconfig tr0 detach
Creating an IP alias: ifconfig interface address netmask alias
To determine the MTU size of a network interface: lsattr -El interface
Paging space: a page is a unit of virtual memory that holds 4 KB of data.
Increasing paging space: chps -s 3 hd6 (adds 3 LPs)
Reducing paging space: chps -d 1 hd6
Moving a paging space within the VG: migratepv -l hd6 hdisk0 hdisk1
Removing a paging space: swapoff /dev/paging03; rmps paging03
Device configuration:
lscfg -> details about devices. Ex: lscfg -vpl rmt0
To show more about a particular processor: lsattr -El proc0
To discover how much memory is installed: lsattr -El sys0 | grep realmem
To show processor details: lscfg | grep proc or lsdev -Cc processor
To show available processors: bindprocessor -q
To turn on SMT: smtctl -m on -w boot
To turn off SMT: smtctl -m off -w now
Modifying an existing device configuration: chdev. The device can be in the defined, stopped or available state.
To change the maxuproc value: chdev -l sys0 -a maxuproc=100
Remove a device configuration: rmdev -Rdl rmt0
bootinfo -y -> returns whether the hardware is 32-bit or 64-bit.
Commands to enable 64-bit mode: ln -sf /usr/lib/boot/unix_64 /unix -> ln -sf /usr/lib/boot/unix_64 /usr/lib/boot/unix -> bosboot -ad /dev/ipldevice -> shutdown -r -> ls -al /unix
File Systems:
Types: Journaled, Enhanced journaled, CDROM, NFS
FS Structure: Super block, allocation groups, inodes, blocks, fragments, and device logs
Super block: it contains control information about the file system, such as the overall file system size in 512-byte blocks, FS name, FS log device, version number, number of inodes, list of free inodes, list of free data blocks, date and time of creation, and FS state.
This data is stored in the first block of the file system, with a backup copy in block 31.
Allocation group: it consists of inodes and their corresponding data blocks.
Inodes: an inode contains control information about a file, such as type, size, owner, the dates and times when the file was created, modified and last accessed, and pointers to the data blocks that store the actual data. For JFS, the maximum number of inodes and files is determined by the number of bytes per inode (NBPI), fixed at file system creation; for JFS2 there is no NBPI, since inodes are allocated dynamically.
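The NBPI relationship is simple arithmetic: the number of inodes a JFS file system gets is its size divided by the NBPI. A sketch with illustrative numbers (a 64 MB file system and an NBPI of 4096, which is the JFS default):

```shell
# inodes = file system size in bytes / NBPI
fs_bytes=$(( 64 * 1024 * 1024 ))
nbpi=4096
echo "inodes=$(( fs_bytes / nbpi ))"
```

A 64 MB file system with NBPI 4096 therefore gets 16384 inodes; doubling the NBPI halves the inode count.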
Data blocks: hold the actual data. The default block size is 4 KB.
Device logs: the JFS log stores transactional information. This data can be used to roll back incomplete operations if the machine crashes. rootvg uses the LV hd8 as a common log.
FS differences:
Function        JFS      JFS2
Max FS size     1 TB     4 PB
Max file size   64 GB    4 PB
No. of inodes   Fixed    Dynamic
Inode size      128 B    512 B
Fragment size   512      512
Block size      4 KB     4 KB
Creating an FS: crfs -v jfs2 -g testvg -a size=10M -m /fs1
Display mounted FSs: mount
Display the characteristics of FSs: lsfs
Initialize a log device: logform /dev/loglv01
Display information about inodes: istat /etc/passwd
Monitoring and Performance Tuning:
Quotaon command enables disk quotas for one or more file systems
Ouotaoff command disables disk quotas for one or more file systems
Enable user quotas on /home: chfs –a “quota=userquota,groupquota” /home
To check the consistency of the quota files using quotacheck
Edquota command to create each user or group’s soft and hard limits for allowable disk space and maximum number of files
Error logging is automatically started by the rc.boot script.
The errstop command stops error logging.
The error logging daemon is errdemon.
Display the path to your system's error log file: /usr/lib/errdemon -l
Change the maximum size of the error log: /usr/lib/errdemon -s 2000000
Display all errors with a specific error ID: errpt -j 8527F6F4
Display all errors logged in a specific time window: errpt -s 1122164405 -e 1123100405
Delete all entries: errclear 0
Delete all entries classified as software errors: errclear -d S 0
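When only a saved listing is available (e.g. `errpt > errlog.txt` captured before a reboot), the same ID filtering can be done offline with awk. The sample log lines below are invented for illustration:

```shell
# Offline sketch: filter errpt-style output by error identifier.
# The sample entries are made up; a real file would come from 'errpt'.
cat > /tmp/errlog.txt <<'EOF'
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME DESCRIPTION
8527F6F4   1122164405 P H hdisk0        DISK OPERATION ERROR
A6DF45AA   1123100405 I O RMCdaemon     The daemon is started
EOF
# Keep only rows whose first column matches the wanted identifier
awk '$1 == "8527F6F4"' /tmp/errlog.txt
```

On a live system `errpt -j 8527F6F4` does this directly; the awk form is only useful against saved output.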
vmstat: reports kernel threads, virtual memory, disks, traps, and CPU activity.
To display 5 summaries at 1-second intervals: vmstat 1 5
kthr (kernel thread state): r → average number of runnable kernel threads; b → average number of kernel threads placed in the VMM wait queue.
memory (usage of virtual and real memory): avm → active virtual pages, the total number of pages allocated in paging space (a high value is not by itself an indicator of poor performance); fre → size of the free list (a large portion of real memory is used as a cache for file system data, so a small fre is normal).
page (information about page faults and paging activity): re → pager input/output list; pi → pages paged in from paging space; po → pages paged out to paging space; fr → pages freed; sr → pages scanned by the page-replacement algorithm; cy → clock revolutions of the page-replacement algorithm.
faults (trap and interrupt rate averages per second): in → device interrupts; sy → system calls; cs → kernel thread context switches.
cpu (breakdown of percentage usage of CPU time): us → user time; sy → system time; id → CPU idle time; wa → idle time with pending I/O requests; pc → number of physical processors consumed; ec → percentage of entitled capacity consumed.
disks: number of transfers per second per disk.
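Saved `vmstat 1 5` output can be summarized offline with awk. The sample rows below are invented, and the column positions (us/sy/id in fields 14-16) assume the classic 17-column AIX layout described above (2 kthr + 2 memory + 6 page + 3 faults + 4 cpu):

```shell
# Offline sketch: average us/sy/id over saved vmstat interval lines.
# Sample numbers are made up; field numbers assume the layout above.
cat > /tmp/vmstat.txt <<'EOF'
 1  0 22632 79520   0   0   0   0    0   0 120 300 150 10  5 80  5
 2  0 22632 79100   0   0   0   0    0   0 130 320 160 20 10 65  5
EOF
awk '{us+=$14; sy+=$15; id+=$16} END {print us/NR, sy/NR, id/NR}' /tmp/vmstat.txt
# → 15 7.5 72.5  (average user, system, and idle percentages)
```

The first interval line of real vmstat output is a since-boot average and is usually dropped before computing statistics like this.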
sar: sar 2 5 reports %usr, %sys, %wio, %idle, and physc every 2 seconds, 5 times.
To report activity for the first two processors once per second for the next 5 seconds: sar -u -P 0,1 1 5
topas: displays a full-screen, continuously updated summary of local system activity (CPU, memory, paging, disk, network, and top processes).
Tuning parameters:
The /etc/tunables directory centralizes the tunable files.
nextboot: this file is automatically applied at boot time.
lastboot: contains the tunable parameters, with their values, after the last boot.
lastboot.log: contains the log of the creation of the lastboot file.
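Comparing lastboot against nextboot shows which tunables will change at the next reboot. A sketch using simulated files under /tmp, since the stanzas below (a vmo minperm% entry) are made-up examples rather than a real system's /etc/tunables contents:

```shell
# Simulated /etc/tunables files; the vmo stanza values are invented.
mkdir -p /tmp/tunables
cat > /tmp/tunables/lastboot <<'EOF'
vmo:
        minperm% = "3"
EOF
cat > /tmp/tunables/nextboot <<'EOF'
vmo:
        minperm% = "5"
EOF
# Any diff line is a tunable whose value will change at the next boot
diff /tmp/tunables/lastboot /tmp/tunables/nextboot || true
```

On a real system the same comparison is `diff /etc/tunables/lastboot /etc/tunables/nextboot`.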