
Issue:

Recently I came across a situation with a customer who was experiencing sporadic LUN drops in their vSphere clusters. This occurred in both 4.1 and 4.0. What makes it unusual is that it occurred across two HP storage arrays. The LUNs would disappear for a few minutes and then come back. I approached it as a performance problem in their SAN environment.

These 0x28 errors aren’t unique to the HP arrays shown below. Other manufacturers, including EMC and IBM, seem to be fighting them as well. You can read about it here:

The vmkernel logs contained many lines like the following:

Dec 17 07:03:24 esx01 vmkernel: 22:01:35:32.584 cpu7:4517)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x410001176200) to NMP device "naa.600508b40008dfcc0000600000cc0000" failed on physical path "vmhba2:C0:T0:L1" H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.
Dec 17 07:03:24 esx01 vmkernel: 22:01:35:32.584 cpu7:4517)ScsiDeviceIO: 770: Command 0x2a to device "naa.600508b40008dfcc0000600000cc0000" failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.
Here is the HP published solution:
NOTE: The above-mentioned URL will take you to a non-HP Web site. HP does not control and is not responsible for information outside of the HP Web site.
The hexadecimal values H:0x0 D:0x28 P:0x0 decode to Task Set Full as per the above article.
VMware reports this error when the storage controller returns a Queue Full or BUSY signal to an IO request.
A storage controller may return a Queue Full or BUSY signal when it encounters resource congestion due to overutilization.
In VMware environments, this may be caused by a high queue depth at the controller ports during heavy workload, or by large IOs issued by VMware. By default VMware is capable of sending IO blocks of up to 32 MB.
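To make the H:/D:/P: decoding concrete, here is a minimal Python sketch that pulls the three status fields out of a vmkernel line and names the device status. The function and variable names (`decode_line`, `SCSI_DEVICE_STATUS`) are my own, the status table lists only a handful of standard SCSI status codes, and the regular expression assumes the log format shown in the lines above. Treat it as an illustration rather than a complete log parser.

```python
import re

# A few standard SCSI device status codes (the "D:" field in the log line).
SCSI_DEVICE_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
}

# Matches both the NMP and the ScsiDeviceIO line formats shown above.
LINE_RE = re.compile(
    r'device "(?P<device>[^"]+)".*?'
    r'H:(?P<h>0x[0-9a-fA-F]+) D:(?P<d>0x[0-9a-fA-F]+) P:(?P<p>0x[0-9a-fA-F]+)'
)


def normalize(line):
    """Undo common web-page damage: '0×0' -> '0x0', curly quotes -> '"'."""
    for bad in ("\u201c", "\u201d", "\u2033"):
        line = line.replace(bad, '"')
    return line.replace("\u00d7", "x")


def decode_line(line):
    """Return the decoded status fields, or None if the line does not match."""
    match = LINE_RE.search(normalize(line))
    if match is None:
        return None
    device_status = int(match.group("d"), 16)
    return {
        "device": match.group("device"),
        "host_status": int(match.group("h"), 16),
        "device_status": device_status,
        "plugin_status": int(match.group("p"), 16),
        "device_status_name": SCSI_DEVICE_STATUS.get(device_status, "UNKNOWN"),
    }


if __name__ == "__main__":
    sample = (
        'ScsiDeviceIO: 770: Command 0x2a to device '
        '"naa.600508b40008dfcc0000600000cc0000" failed '
        'H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.'
    )
    print(decode_line(sample))
```

Running it over a whole vmkernel log and counting the lines whose `device_status_name` is TASK SET FULL, per device, gives a quick picture of which LUNs are being throttled and when.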
In many cases the following steps have helped mitigate the issue:
  1. Capture the evaperf logs during the time the errors are reported and ensure that the array utilization is well within the acceptable safe IOPS values for the given configuration.
  2. Set the maximum IO size to 128 KB as mentioned in the below VMware article.
  3. Follow the EVA – VMware best practices guide and ensure the multipath policy is set correctly.
    Click here to access the technical article available at http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA1-2185ENW.pdf.
  4. Enable Adaptive Queue depth throttling as mentioned in the below article.
    NOTE: The above-mentioned URLs will take you to a non-HP Web site. HP does not control and is not responsible for information outside of the HP Web site.
    For EVA, a QFullSampleSize value of 32 and a QFullThreshold value of 8 have been found helpful in many cases.
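The numeric values from steps 2 and 4 can be kept in one place and turned into commands for review. The sketch below only builds and prints the command strings; it does not run anything on a host. It assumes the `esxcfg-advcfg` tool of ESX 4.x and the advanced option paths `/Disk/DiskMaxIOSize`, `/Disk/QFullSampleSize` and `/Disk/QFullThreshold`. None of these names appear in the HP text above, so check them against the VMware articles before applying anything.

```python
# Values come from the HP recommendations above.
# The option paths and the esxcfg-advcfg syntax are assumptions to verify.
RECOMMENDED = [
    ("/Disk/DiskMaxIOSize", 128),   # step 2: maximum IO size
    ("/Disk/QFullSampleSize", 32),  # step 4: adaptive queue depth throttling
    ("/Disk/QFullThreshold", 8),    # step 4: adaptive queue depth throttling
]


def build_commands(settings):
    """Return one 'esxcfg-advcfg -s <value> <option>' string per setting."""
    return [f"esxcfg-advcfg -s {value} {path}" for path, value in settings]


if __name__ == "__main__":
    for command in build_commands(RECOMMENDED):
        print(command)
```

Printing the commands first makes it easy to review the values and apply them to every host in the cluster consistently.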

One Response to “VMware NMP Errors and Lun Dropping with HP EVA SANs”

  1. This solution stopped the NMP errors from recurring in the logs and, more importantly, storage hasn’t dropped :)

    Here is the official link:
    http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=3664583&prodTypeId=12169&objectID=c02697105
