Tuesday, 8 December 2009

NIM : Replication issue

Introduction:
Since AIX 5.3 TL5, Network Installation Manager (NIM) supports replication of NIM objects from the NIM master to the alternate NIM master (APAR IY81860). In practice, this feature does not work properly: the replicate operation fails with the error shown under Details.

Impacted:
- All AIX versions released so far (as of this writing)
http://www-933.ibm.com/eserver/support/fixes/fixcentral/pfixpacks/53
http://www-933.ibm.com/eserver/support/fixes/fixcentral/pfixpacks/61

Details:
The setup consists of the following two nodes:
master (NIM master) and alternate (Alternate NIM master)

When issuing a regular sync operation on the NIM master, the operation is successful:
# nim -Fo sync alternate
...
nim_master_recover Complete

When issuing a sync operation on the NIM master with the replicate option (which copies all resources that are not yet present on the alternate NIM master), the following error is observed:
# nim -Fo sync -a replicate=yes alternate
...
nim_master_recover Complete
error replicating resources: unable to /usr/lpp/bos.sysmgt/nim/methods/c_rsh master
Finished Replicating NIM resources
...
Finished checking SPOTs
nim_master_recover Complete
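
Because the output still ends with "nim_master_recover Complete" when replication fails, the last line alone does not tell you whether the resources were copied. The sketch below scans a captured log for the error line instead. The function name replication_failed and the log path are hypothetical, and whether nim itself returns a non-zero exit status in this case is not something this post verified.

```shell
# Hypothetical helper (not part of NIM): succeeds (exit 0) when the
# captured sync output in file "$1" contains the replication error line.
replication_failed() {
    grep -q '^error replicating resources' "$1"
}

# Example use on the NIM master (log path is an assumption):
#   nim -Fo sync -a replicate=yes alternate > /tmp/nim_sync.log 2>&1
#   if replication_failed /tmp/nim_sync.log; then
#       echo "replication to the alternate master failed"
#   fi
```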

The replicate operation fails because the c_rsh utility is broken.
Further debugging of c_rsh on the NIM master showed that several ODM lookups take place just before the error.
# truss /usr/lpp/bos.sysmgt/nim/methods/c_rsh master date 2>&1 | grep objrepos
...
statx("/etc/objrepos/nim_object", 0x2FF1FD70, 76, 0) = 0
kopen("/etc/objrepos/nim_object", O_RDONLY) = 5
kopen("/etc/objrepos/nim_attr", O_RDONLY) = 5
kopen("/etc/objrepos/nim_attr.vc", O_RDONLY) = 6
...
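
To see at a glance which ODM object classes are touched, the truss output can be reduced to a list of distinct file names. This is only a small sketch: the function name odm_files_from_truss is hypothetical, and it assumes the paths appear in double quotes as in the trace above.

```shell
# Hypothetical helper: reads truss output on stdin and prints each
# distinct /etc/objrepos path exactly once, sorted.
odm_files_from_truss() {
    sed -n 's/.*"\(\/etc\/objrepos\/[^"]*\)".*/\1/p' | LC_ALL=C sort -u
}

# Example use on the NIM master:
#   truss /usr/lpp/bos.sysmgt/nim/methods/c_rsh master date 2>&1 | odm_files_from_truss
```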

Resolution:
Following PMR 25293.300.624, the IBM lab stated that a failing ODM lookup is the root cause of the issue. Because the lookup fails, a particular data structure is not populated, and a signal 11 (SIGSEGV) occurs when the code tries to copy a string into this structure.

APAR IZ66255 was created to address this issue.
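
Once a fix for this APAR is available, the instfix command can report whether it is installed on a system. The sketch below wraps that check. The function name apar_installed is hypothetical, and the message text it matches ("All filesets for ... were found") is an assumption about the instfix output wording, so verify it on your own AIX level.

```shell
# Hypothetical helper: reads the output of "instfix -ik <APAR>" on stdin
# and succeeds (exit 0) when it reports that all filesets were found.
# The matched message wording is an assumption, not taken from this post.
apar_installed() {
    grep -q "All filesets for $1 were found"
}

# Example use:
#   instfix -ik IZ66255 2>&1 | apar_installed IZ66255 && echo "IZ66255 is installed"
```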
