
Cisco UCS Recall Firmware Issues


Cisco is having to replace all the B440 blades that are deployed:
Update:
The firmware upgrade initially prescribed in this Field Notice has successfully detected FET failures and shut down servers, preventing a potential thermal event. Since this upgrade was released, however, a FET failure on a UCS B440 Blade Server has resulted in a second incident as described above. Cisco is directly contacting UCS B440 Blade Server customers and will replace UCS B440 Blade Servers currently deployed at customer sites. Cisco is making UCS B440 Blade Server hardware modifications, and a hardware replacement program has been launched. No other UCS hardware is affected.
Information is from a Cisco public website: http://www.cisco.com/en/US/ts/fn/634/fn63430.html

 


What’s New: NetApp 8.1 Storage Efficiency


NetApp 8.1 Cluster-Mode Overview

  • Scalable cluster of NetApp filers or vfilers, with 3rd party storage, managed centrally and working collectively. NAS scales to 24 nodes, SAN to 4.
  • Support for SAN protocols
  • NFS 4.1 support
  • pNFS support
  • Asynchronous volume SnapMirror for DR

NetApp 8.1 7-Mode

-Upgrades are much easier: a straight upgrade from 7.3.2+ to 8.1
-SAS and SATA drives can now be used in MetroCluster
 

Both Modes

***Ability to convert an aggregate from 32-bit to 64-bit in place rather than copying to a new aggregate. The conversion happens automatically as the aggregate grows beyond the 16 TB aggregate limit, with no performance impact.
***Maximum aggregate size increased by roughly 50% for most systems from 8.0.x to 8.1. This is in line with the release of 3 TB drives.
-Max volume size is static.
***32-bit to 64-bit volume SnapMirror is supported.
***64-bit aggregates are now the default rather than 32-bit.

Biggest Changes in Both Releases: Storage Efficiency = Deduplication, Cloning and Compression

  1. No deduplication license required
  2. Dedupe volume limit increased to the maximum volume size of the supported system
  3. Deduplication performance increased by 33%
  4. Hourly checkpoints, in case dedupe is stopped and restarted
  5. Block Ratios

Data Compression

  1. Compression wasn’t GA in 8.0.x; in 8.1 it is GA. When it was originally released it was inline only, meaning that data was compressed before going to disk. Now it is offered both inline and as a scheduled post-process.
  2. MetroCluster support since 8.0.2
  3. Current customer usage breaks down to 40% production vs. 60% backup and archive
  4. No licenses or volume limits
  5. Compression of existing data is 7x faster than in 8.0.x

***Comparing Dedupe to Compression – Dedupe looks at 4k blocks and creates pointers to other identical 4k blocks. Compression looks at a 32k block and finds the largest repeating pattern, repeating the process until no more repeating patterns remain.
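
To make the difference in granularity concrete, here is a minimal Python sketch. It is not NetApp's implementation: the SHA-256 fingerprint, the zlib compressor, and the function names are all assumptions chosen for illustration. Only the 4k and 32k sizes come from the notes above.

```python
import hashlib
import zlib

BLOCK_SIZE = 4 * 1024    # dedupe granularity from the notes above
GROUP_SIZE = 32 * 1024   # compression granularity from the notes above


def dedupe(data: bytes):
    """Store each distinct 4k block once and keep a pointer per logical block."""
    store = {}      # fingerprint -> block contents (stored once)
    pointers = []   # one fingerprint per logical block, in order
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        # SHA-256 is an assumption; any collision-resistant fingerprint works here.
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)
        pointers.append(fingerprint)
    return store, pointers


def rehydrate(store, pointers) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[fp] for fp in pointers)


def compress_groups(data: bytes):
    """Compress each 32k group independently (zlib stands in for the real codec)."""
    return [zlib.compress(data[offset:offset + GROUP_SIZE])
            for offset in range(0, len(data), GROUP_SIZE)]
```

Dedupe only helps when whole 4k blocks repeat, while compression can still shrink a group whose blocks are all different but contain repeating patterns, which is why the two are complementary.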

System Manager

  • 8.1 will require System Manager 2.0. System Manager will work back to Data ONTAP 7.2
  • Browser-based; supported on Windows and Linux
  • Cluster-Mode support
  • Ability to halt and reboot controllers
  • Create and manage SnapMirror relationships
  • Dedupe, compression, and snapshot integration
  • File and LUN cloning

What’s New In vSphere 5


Here is a list I compiled after completing my What’s New In vSphere 5 class.  I won’t bother getting into licensing….

-No block size limit for VMDK
-VMFS datastores scale to 64 TB, with a 2 TB limit on files
-32 cores up from 8, 1 TB of RAM
-NFS datastores increase from 32 to 256
-SATP modules are loaded on demand; no more setting them from the command prompt
-No more setting iSCSI port binding from the command prompt; it is GUI driven
-Software-based FCoE initiator

-VAAI thin provisioning stun + VMFS space reclamation = As the datastore reaches max capacity, instead of overfilling and risking data corruption, VMs are put in suspend mode at 97%. Space reclamation works with the storage array vendor to shrink the LUN size on a thin-provisioned volume as used space decreases, increasing storage efficiency.
-vCenter client for Linux.
-vCenter appliance, Linux based… No, it doesn’t support linked mode
-Profile-Driven Storage: create profiles for SLAs based on performance and availability.
-Storage DRS: DRS based on I/O and size
-Virtual machine format 8: not really exciting; USB 3.0 support and also Windows Aero
-Storage I/O Control: extended to NFS
-Network I/O Control: more granular, offering per-virtual-machine control.
-vSphere Web Client: wasn’t this in ESX 2?
-RTT latency increase from 5 ms to 10 ms in Enterprise Plus vMotion

http://www.vmware.com/files/pdf/products/vsphere/vmware-what-is-new-vsphere5.pdf

http://www.vmware.com/files/pdf/techpaper/Whats-New-VMware-vSphere-50-Storage-Technical-Whitepaper.pdf

http://www.vmware.com/files/pdf/techpaper/Whats-New-VMware-vSphere-50-Networking-Technical-Whitepaper.pdf


3PAR Virtualization Update


I got this from a friend this morning. Good stuff. New 3PAR virtualization white paper for InForm OS 3.1.1 and Recovery Manager updates: http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA3-4023ENW.pdf

Q: Should a Thin Provisioned Virtual Volume (TPVV) be used for ESX datastores?
A: Thin provisioning makes sense as long as you are not going to fill up the volume right away. If the volume is going to be full in a short time, there is no benefit to be gained from thin provisioning. If VMware snapshots are being created and are not being cleaned up, the benefits of thin provisioning will be negated. To the ESX host, it makes no difference if the host sees a thin provisioned LUN or a traditional “thick” provisioned LUN.

Q: Is there any overhead to using Thin Provisioned Virtual Volume (TPVV)?
A: The additional overhead of TPVVs as compared to traditional volumes is negligible.

Q: If using a 2 TB Thin Provisioned Virtual Volume (TPVV), what will vSphere see?
A: vSphere will see a 2 TB LUN available for its use. Without the use of VAAI and/or T10, vSphere cannot determine if a volume is a thin provisioned virtual volume or a traditional virtual volume. With vSphere 4 or vSphere 5.0 on InForm OS 2.3.1 we recommend installing the VAAI plug-in. For vSphere 5.0 and InForm OS 3.1.1 or higher, no plug-in is required.

Q: What size Virtual Volume (VV) should be created?
A: The volume size is not as important to HP 3PAR Storage Systems, as VVs are widely striped across as many drives as possible within the array. If using HP 3PAR Thin Provisioning Software, actual storage capacity is only consumed upon write. However, a 2 TB VV will be able to accommodate many more VMs than a 500 GB VV. ESX 4.1 or later has improved the way it performs metadata locking, meaning that you can now create a VV that is as large as you are comfortable with. For ESX 4.0 and prior, in order to minimize the impact of SCSI reservations and keep the environment well balanced, it is best to create 500 GB volumes (thin provisioned or “thick” provisioned).

Q: How many VMs can be put on a single Virtual Volume (VV)?
A: It depends. There is no one answer that will work for every situation. A number of factors such as server hardware, number of CPUs, amount of memory, type of VMs, applications running in the VMs, etc. will determine how many VMs can be comfortably hosted on a LUN.

Q: What type of path policy (Fixed, MRU, Round Robin) should be used with HP 3PAR Storage Systems?
A: With ESX 3.5, use the default policy (Fixed). With ESX 4.0 and later, change the default policy to the Round Robin path policy. To do this, log in to the service console for each ESX 4.0 host and run the following command line:
esxcli nmp satp setdefaultpsp --satp VMW_SATP_DEFAULT_AA --psp VMW_PSP_RR
For vSphere 5 you can change the path selection algorithm using the Manage Paths dialog box either from the Datastores or Devices view or from the command line on each host:
esxcli storage nmp satp set --satp=VMW_SATP_DEFAULT_AA --default-psp=VMW_PSP_RR
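
Since the command syntax differs between ESX 4.x and vSphere 5, a small helper can pick the right form when scripting across a mixed set of hosts. This is only a sketch: the function name and the idea of keying on the major version are my own assumptions, and it only builds the command line quoted above rather than running it.

```python
from typing import List


def default_psp_command(esx_major: int,
                        satp: str = "VMW_SATP_DEFAULT_AA",
                        psp: str = "VMW_PSP_RR") -> List[str]:
    """Return the esxcli argument list that sets the default PSP for a SATP."""
    if esx_major >= 5:
        # vSphere 5 syntax, as quoted in the answer above
        return ["esxcli", "storage", "nmp", "satp", "set",
                f"--satp={satp}", f"--default-psp={psp}"]
    if esx_major == 4:
        # ESX 4.x syntax, as quoted in the answer above
        return ["esxcli", "nmp", "satp", "setdefaultpsp",
                "--satp", satp, "--psp", psp]
    # ESX 3.5 keeps the default Fixed policy, so there is nothing to run
    raise ValueError("no Round Robin default is recommended before ESX 4.0")


if __name__ == "__main__":
    print(" ".join(default_psp_command(5)))
```

Returning a list instead of a string makes it easy to hand the result to something like subprocess.run on the host, or to join it for pasting into a console.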

Q: VMware supports thin VMs. Which thin provisioning should be used: VMware, HP 3PAR, or both?
A: VMware thin provisioning only applies to VMs at the VMFS level. It allows one to over-allocate VMs to maximize VMFS usage. If the goal is to reduce storage costs and maximize storage utilization, then use HP 3PAR Thin Provisioning Software to provision large VMFS volumes with minimal upfront storage costs. There are no additional storage savings to be realized by using VMware thin provisioning. VMware thin provisioning does consume some CPU cycles on the ESX host as it is performed at the software layer (as compared to HP 3PAR Thin Provisioning Software, which is performed on the array). It is perfectly fine to place VMware thin VMs on HP 3PAR Thin Provisioning Software volumes so long as you are prepared to manage thin provisioning at both the VMware level and the array level.

Q: Is the UNMAP primitive of VAAI supported across the board?
A: No, UNMAP is not supported prior to vSphere 5.0 with InForm OS 3.1.1.

Q: When do I need to install the VAAI plug-in?
A:

·         VAAI is not supported on HP 3PAR InForm OS 2.3.1 MU1 or earlier.

·         On vSphere 4.1, the VAAI 1.1 plug-in is required.

·         When running vSphere 5.0 with HP 3PAR InForm OS 2.3.1, the 3PAR VAAI 2.2 plug-in is required.

·         For vSphere 5.0 with HP 3PAR InForm OS 3.1.1 or higher, no plug-in is needed as all of the VAAI primitives are supported natively.
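
The plug-in rules above boil down to a small decision table. Here is a sketch of it in Python; the function name, the tuple encoding of versions, and the separate maintenance-update argument are assumptions of mine, and any combination the answer does not mention is reported as not covered rather than guessed.

```python
from typing import Tuple


def vaai_plugin(vsphere: Tuple[int, int],
                inform_os: Tuple[int, int, int],
                mu: int = 0) -> str:
    """Map a vSphere / InForm OS pair to the plug-in guidance listed above.

    vsphere is (major, minor), inform_os is (x, y, z), mu is the MU level.
    """
    # InForm OS 2.3.1 MU1 or earlier: VAAI is not supported at all
    if inform_os < (2, 3, 1) or (inform_os == (2, 3, 1) and mu <= 1):
        return "VAAI not supported"
    if vsphere == (4, 1):
        return "VAAI 1.1 plug-in required"
    if vsphere == (5, 0):
        if inform_os >= (3, 1, 1):
            return "no plug-in needed"
        if inform_os == (2, 3, 1):
            return "3PAR VAAI 2.2 plug-in required"
    return "not covered by the list above"
```

For example, vaai_plugin((5, 0), (3, 1, 1)) falls through to the native-support case, while the same host on InForm OS 2.3.1 MU2 gets the 2.2 plug-in answer.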


HP CloudSystem Matrix Lab


This was really cool. It was the most impressive thing at HP Discover for me. It also tied into the Instant-On Enterprise theme by showing the power of cloud maps and automation in the Matrix OS. In the past, working as a customer and as a consultant, I was able to implement and administer Insight Control; however, I never had the chance to use Insight Orchestration.

Some Quick Notes that I took are:

  • The Matrix software is version 6.3 running on a DL380 Central Management Server
  • Blade chassis with Flex-10 Virtual Connect/FlexFabric…
  • All HP storage is supported, and even NetApp and EMC
  • To use Storage Provisioning Manager you have to use an EVA or 3PAR; I found it odd that it is 3PAR and not LeftHand

Here is what we did in the lab. I’ll try to get slides. By the way, besides your workstation and a take-home lab manual, they provided the information on a 17-inch notebook as well. It was really done Vegas style.

1) Creating Cloud Maps- Inside Insight Orchestration Designer we created two templates, one for physical servers and one for virtual servers. On each template we dropped the logical group objects for networks, storage, and servers (physical or virtual), then published them.

2) Self Service Portal- We logged into this and were presented with a history of requests, running services, and resource utilization. In the self-service portal you can also specify which users have access to which templates. Next we created a service, which creates a request to deploy your cloud map. Once this is done the only thing left is to approve. We approved and our infrastructure was automatically deployed. No lab hang-ups. It worked flawlessly.

3) Importing Cloud Maps- This was the coolest thing we did in the lab; it really demonstrated the power and dynamic capability of the tool, and definitely makes you think Instant-On Enterprise. We logged into Insight Orchestration Designer and imported a very elaborate cloud map. It showed how easy it would be to replicate private cloud infrastructures by importing and exporting cloud maps.

4) Storage Provisioning Manager and SPE –  As stated earlier this is the part that is only supported with EVA and 3Par.  It manages storage provisioning in the Matrix OS.  

 Two main use models are:

  • Pre-populated model: the administrator populates the SPM catalog with storage services to be exposed to server management software, then sets access rights. Matrix then enables an administrator to request storage, which is sent to SPM as a storage service request. SPM then provides candidate services, and the administrator selects one or more appropriate volume services to fulfill the set of goals (requests).
  • Request-based model: the administrator uses Matrix OS to request storage before it is imported or created. The administrator then adds the arrays and volumes to fulfill the goal.

In class we modified SPE (Storage Pool Entities) and changed storage type, portability, OS, and added a volume. 

Insight Capacity Planner/Advisor- 

  • We created a planning scenario selecting 20 legacy servers; after gathering the performance statistics we could run scenario forecasts by entering CPU, memory, network I/O, and disk I/O growth rates.
  • We also created a planning scenario in which we consolidated to VMs
  • Power and Price were also reportable and comparable

HP P6000s: What’s New?


A lot. I got to see it in the flesh at HP Discover last week; I couldn’t really talk about it before then. So many new features and enhancements in one code release upgrade for existing x400 customers: that is the first thing that came to mind as I pored over all the new features in the P6000 from HP, not to mention the P6000 itself. HP released the P6300 and P6500; it looks like there isn’t an option as of yet for the 8400 replacement in terms of cache. I can only guess 3PAR is probably the reason why…

Here is my summary; I am probably missing some:

1) Thin Provisioning –  This needed to be done, glad to see it is finally here.

2) Dynamic Capacity Management – This is free now.

3) Business Copy – Support for VMFS volumes; also support for large LUNs

4) Different Connectivity Options – You can get different connectivity options such as iSCSI, FCoE, and FCP; this is nice

5) Shrink live LUNs

6) Change Vraid on the fly (can’t be thin provisioned)

7) Move LUN between disk groups – This is great; I wouldn’t want autotiering anyhow

8) Drive Form Factor – Shelves are either 12 LFF drives or 24 SFF SAS drives

9) 6Gb SAS connectivity for drive shelves…

10) Multiple connectivity options, e.g. Fibre Channel, iSCSI, FCoE, not to mention combo options, e.g. Fibre Channel and iSCSI

What’s Missing?

Still not sure on VAAI support and where this is headed

What do I have to do to get there?

Buy a P6000, or upgrade your existing Command View and XCS firmware

Anything you don’t like?

Yeah, I don’t like that they took the EVA4400 controller design and implemented it in the P6000 design with 2 controllers and 1 shelf.  I liked the previous EVA6400/8400 design much better.

How about the price?

The price is significantly less


HP Discover 2011


Yesterday I found out I am going to Vegas for the HP Discover 2011 conference. The first thing I did was start the registration process. I heard a rumor last week that a lot of sessions were already filled, and I can confirm this is true. As a side note, there is still so much left that you could easily spend two full weeks.

For me the highlight is the labs. I could only pick two labs, as stated in the registration system; this was somewhat disappointing, as I originally tried to schedule five. I scheduled Hands-on with the BladeSystem Matrix and Hands-on with 3PAR Advanced. It was really hard to narrow this down to two. I am also very excited about the E5000 best practices session. I love the idea of the E5000. If you haven’t had a chance to read about it, check it out here; it’s very innovative.

Subject | Start Date | Start Time
4573 – Keynote: IT will run on a converged infrastructure | 6/7/11 | 10:30 AM
4563 – HP CloudSystem: Build, manage and consume services across private, public and hybrid clouds | 6/7/11 | 1:30 PM
4020 – Virtual Desktop Infrastructure for the enterprise from HP, Microsoft, and Citrix | 6/7/11 | 3:00 PM
4857 – Addressing VDI storage requirements | 6/7/11 | 4:30 PM
3501 – Ten ways that HP StoreOnce helps protect your data | 6/7/11 | 6:00 PM
2080 – Hands-on experience with HP BladeSystem Matrix | 6/8/11 | 8:00 AM
4368 – HP Storage Track Keynote: converged storage for the next era of computing | 6/8/11 | 11:00 AM
5420 – Simplified server connectivity management with Virtual Connect Enterprise Manager | 6/8/11 | 2:00 PM
3139 – P4000 SAN evolution | 6/8/11 | 3:30 PM
3313 – HP StorageWorks EVA feature enhancements | 6/8/11 | 5:00 PM
4628 – Discover what’s new with the HP StorageWorks Enterprise Virtual Array | 6/9/11 | 8:00 AM
4964 – Implementing HP 3PAR Storage With VMware | 6/9/11 | 9:30 AM
3106 – Enterprise Databases on HP 3PAR Virtualized Storage Arrays | 6/9/11 | 11:00 AM
3704 – Hands-on with HP 3PAR advanced | 6/10/11 | 8:00 AM
4182 – Using the HP E5000 Messaging System to implement Exchange 2010 SP1 best practices | 6/10/11 | 11:00 AM

Virtual Connect Manager May Be Unable to Communicate if DNS Is Enabled for Virtual Connect Ethernet Modules


This advisory for Virtual Connect was released today. You can read about it here:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02720395&lang=en&cc=us&taskId=101&prodSeriesId=3540808&prodTypeId=329290

 

SUPPORT COMMUNICATION – CUSTOMER ADVISORY

Document ID: c02720395

Version: 3

Advisory: (Revision) HP Virtual Connect – Virtual Connect Manager May Be Unable to Communicate (NO_COMM) if DNS Is Enabled for Virtual Connect Ethernet Modules
NOTICE: The information in this document, including products and software versions, is current as of the Release Date. This document is subject to change without notice.

Release Date: 2011-03-07

Last Updated: 2011-03-07


DESCRIPTION

Document Version | Release Date | Details
3 | 03/07/2011 | Added clarifications to the three scenarios described in the Resolution section to ensure the full sequence of steps is followed.
2 | 03/04/2011 | Added additional details regarding the circumstances in which the issue may occur. Also, added three different workaround scenarios depending on whether Enclosure Bay IP Addressing (EBIPA) or external DHCP is being used and the version of OA firmware that is in use.
1 | 02/14/2011 | Original Document Release.

The HP Virtual Connect Manager (VCM) may not be able to communicate (NO_COMM) with Virtual Connect (VC) Ethernet modules in an HP BladeSystem c-Class enclosure or multiple enclosures that are part of the same Virtual Connect Domain. The loss of communication may occur in a new or an existing environment when a VCM Administrator attempts to perform any of the following tasks:

  • Firmware Update
  • Add/remove/reset server blades or Onboard Administrator (OA) modules
  • Retrieve any VC Ethernet module status and state information (e.g. stacking links, port statistics, etc.)
  • Add/edit/copy/delete/assign Server Profile
  • Add/edit/delete VC Network
  • Configure Port Mirroring
  • Restore Domain Configuration
  • Change SNMP Settings
  • Change Advanced Ethernet Settings

IMPORTANT: Attempting to execute any of the above tasks during NO_COMM adds additional risk of a network outage during the recovery steps described below.

Customers particularly susceptible to this issue have VC Modules with management IP Addresses configured in the 10.x.x.x range and configured for DNS. When this problem occurs, the VC Manager will still be accessible, but all VC Ethernet modules in the domain will be displayed with an Overall Status of “No Communication.” The Virtual Connect Domain will show a “failed” status, stacking links will show “failed” and Profiles and Networks will show a status of “Unknown.”

When in this state, server blades in the Virtual Connect domain will still be able to pass data traffic with no impact.

This occurs if DNS is enabled for the primary VC module. The VCM may initiate a DNS reverse lookup for a very limited scope of incorrect IP addresses for the VC Ethernet modules. If this reverse lookup fails, (i.e., it is not answered by the DNS infrastructure), the primary VC module will be able to communicate correctly with the VC Ethernet modules.

If the DNS infrastructure responds to this incorrect DNS reverse lookup, then VCM attempts to communicate with the VC Ethernet modules on this incorrect IP Address and fails, triggering a NO_COMM condition. Recently, the global DNS infrastructure began responding to these limited DNS reverse lookups.
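
As a quick way to triage which VC modules match the "particularly susceptible" description above (management IP in the 10.x.x.x range and DNS configured), here is a minimal sketch. The function name and inputs are hypothetical; how you collect the management IP and DNS setting from your enclosures is up to you. A False result does not mean a module is safe, since the advisory only says these configurations are particularly susceptible.

```python
import ipaddress

# The 10.x.x.x range called out in the advisory
TEN_NET = ipaddress.ip_network("10.0.0.0/8")


def particularly_susceptible(mgmt_ip: str, dns_enabled: bool) -> bool:
    """True when a module has DNS configured and a 10.x.x.x management address."""
    return dns_enabled and ipaddress.ip_address(mgmt_ip) in TEN_NET
```

For example, particularly_susceptible("10.20.30.40", True) flags the module, while the same address with DNS disabled does not.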

SCOPE

Any HP Virtual Connect Ethernet Modules in a c-Class BladeSystem enclosure running VC Firmware Version 1.3x, 2.x or 3.x (up to and including 3.15)

RESOLUTION

A future version of Virtual Connect firmware (currently planned for release in mid-March 2011) will prevent this issue from occurring.

As a workaround, disable DNS for the Virtual Connect Ethernet Modules in Enclosure Bay IP Addressing (EBIPA) or external DHCP. Removing DNS from VC Modules can potentially impact the following Virtual Connect features, if configured to use DNS names:

  1. Directory Server Settings – If a DNS name is configured for the Directory Server Address then it will no longer be resolved. The IP address will need to be configured as the Directory Server Address.
  2. SNMP Trap Destination – If a DNS name is configured for the SNMP Trap Destination then it will no longer be resolved. The IP address will need to be configured as the SNMP Trap Destination.
  3. From the VCM CLI – Any URL targets provided to save backup configuration or support dump will need to use IP address and not DNS name.

The following three workaround scenarios depend on whether Enclosure Bay IP Addressing (EBIPA) or external DHCP is being used and the version of Onboard Administrator (OA) firmware that is in use:

Scenario 1 – Enclosure Bay IP Addressing is being used to provide IP Addresses to the VC Ethernet Modules and the OA firmware version is 3.00 (or higher):

  1. Log into the Onboard Administrator and select Enclosure Settings > Enclosure Bay IP Addressing.
  2. Select the “Interconnect Bays” tab and remove DNS server IP address entries from the bays that include VC Ethernet Modules and click Apply.
  3. Within 5 minutes, the DNS settings for the modules should update and normal module communication will be restored.
  4. It is important that no VC domain changes are made until the following step is fully completed.
  5. From the VCM GUI, select “Tools” => “Reset Virtual Connect Manager.” This will force resynchronization of the modules if not synchronized in Step 3 above.

IMPORTANT : If the NO_COMM condition was present or detected during one of the VCM administrative update tasks (listed in the DESCRIPTION section above), VCM may automatically resynchronize the modules, which would create a temporary VC domain-wide network outage during VC module initialization in either Step 3 or Step 5 above (but not both). Outage time will vary depending on the size of the VC domain.

Scenario 2 - Enclosure Bay IP Addressing is being used to provide IP Addresses to the VC Ethernet Modules and the OA firmware version is 2.60 (or earlier). If iLO DNS name registrations are statically assigned in the DNS infrastructure, move to Step 3 below:

  1. If relying on Dynamic DNS updates for iLO, the OA firmware version must be updated to at least OA FW 3.11 before proceeding with the next step, otherwise iLO will only be reachable by IP address and there may be other ramifications to iLO LDAP Authentication.
  2. In the OA, in Enclosure Bay IP Addressing, Select the “Interconnect Bays” tab and remove the DNS server IP address entries from the “Shared Interconnect Settings.” Click Apply.
  3. In the OA, in Enclosure Bay IP Addressing, Select the “Device Bays” tab and remove DNS server IP address entries from the “Shared Interconnect Settings” and click Apply.
  4. Within 5 minutes, the DNS settings for the modules should update and normal module communication will be restored.
  5. It is important that no VC domain changes are made until the following step is fully completed.
  6. From the VCM GUI, select “Tools” => “Reset Virtual Connect Manager.” This will force resynchronization of the modules if not synchronized in Step 3 above.

IMPORTANT : If the NO_COMM condition was present or detected during one of the VCM administrative update tasks (listed in the DESCRIPTION section above), VCM may automatically resynchronize the modules, which would create a temporary VC domain-wide network outage during VC module initialization in either Step 4 or Step 6 above (but not both). Outage time will vary depending on the size of the VC domain.

Scenario 3 - External DHCP is being used to provide IP Addresses to the VC Ethernet Modules with any version of OA firmware:

  1. On the External DHCP Scope, create an exclusion range of IP addresses (preferably only the VC Ethernet module addresses). This exclusion range needs to be configured within EBIPA on the OA.
  2. In the OA, in Enclosure Bay IP Addressing, select the “Interconnect Bays” tab and configure the IP Addresses that were excluded in Step 1 above for bays that contain VC Ethernet modules. Do not configure DNS Server entries. Click Apply.
  3. It is important that no VC domain changes are made until the following steps are fully completed.
  4. Reboot the standby and primary VC modules to force them to use the new EBIPA lease. In a redundant design, the modules should be rebooted serially to mitigate downtime.

a. Reset the standby VC Module from OA.
b. Wait 15 minutes for the standby module to recover.
c. Reset the primary VC Module from OA.
d. Within 5 minutes, normal module communication will be restored.

IMPORTANT : If the NO_COMM condition was present or detected during one of the VCM administrative update tasks (listed in the DESCRIPTION section above), VCM may automatically resynchronize the modules, which would create a temporary VC domain-wide network outage during VC module initialization. Outage time will vary depending on the size of the VC domain.
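
Picking the right scenario depends on just two inputs, so it can be sketched as a small lookup. The function names and the "ebipa"/"dhcp" labels are my own assumptions. Note that the advisory does not say what to do for EBIPA with OA firmware between 2.60 and 3.00, so the sketch returns None there instead of guessing.

```python
from typing import Optional, Tuple


def parse_oa_version(text: str) -> Tuple[int, int]:
    """Turn an OA firmware string such as "3.00" into (3, 0)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def workaround_scenario(addressing: str, oa_version: str) -> Optional[int]:
    """Return the advisory scenario number (1, 2 or 3), or None if not covered."""
    if addressing == "dhcp":
        return 3                       # external DHCP, any OA firmware
    if addressing != "ebipa":
        raise ValueError("addressing must be 'ebipa' or 'dhcp'")
    version = parse_oa_version(oa_version)
    if version >= (3, 0):
        return 1                       # EBIPA with OA firmware 3.00 or higher
    if version <= (2, 60):
        return 2                       # EBIPA with OA firmware 2.60 or earlier
    return None                        # gap the advisory does not address
```

So workaround_scenario("ebipa", "3.11") points at Scenario 1 and workaround_scenario("dhcp", "2.25") at Scenario 3.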


Why VMware Enterprise Plus?


I got this question twice this week: “What is the benefit of VMware Enterprise Plus?” When I was a VMware customer I went Enterprise Plus right away. Lots of people complained; however, it’s great value IMO. Perhaps the biggest value of Enterprise Plus is just the licensing core count increase from 6 to 12. The current Nehalem-EX processors are at 8 cores, and AMD’s are at 12. Besides this, however, there are other great features:

  1. Host Profiles- These are used to automate configurations and ensure compliance; VMware is adding more and more configuration parameters in each release.
  2. 8-core SMP virtual machines
  3. Network I/O Control- this is used to apply QoS to virtual network resources
  4. Storage I/O Control- prioritization of VM resources to shared storage, redistributes VMs un
  5. Distributed Virtual Switch- Besides trimming down to one switch and then adding in hosts for consistency, a few other benefits: PVLANs, load-based teaming, ingress traffic shaping, and it is required for Cisco Nexus 1000v integration

Here are a few quick use cases and a review of PVLANs (they were new to me):

  • Community – Can communicate with each other within the community and with the router. Use case: a DMZ with 1 app server and 1 corresponding database server.
  • Isolated – Can only communicate with the router. Use case: a DMZ where you have a standalone web server that doesn’t need to communicate with other servers in the DMZ, as is often the case. Second use case: a VDI environment where desktops don’t need to communicate with each other (This example is from Eric Slueth blog video on Pvlans)
  • Promiscuous – For routers.
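
The three port types reduce to a simple reachability rule, sketched below in Python. The tuple representation of a port, (type, community name), is my own assumption for illustration and not how any switch models it.

```python
from typing import Optional, Tuple

# A port is (port_type, community_name); community_name only matters for
# "community" ports and is None otherwise.
Port = Tuple[str, Optional[str]]


def can_communicate(a: Port, b: Port) -> bool:
    """Apply the PVLAN rules from the list above to two ports."""
    a_type, a_community = a
    b_type, b_community = b
    # Promiscuous ports (the router) can talk to everything
    if "promiscuous" in (a_type, b_type):
        return True
    # Community ports can talk to members of the same community
    if a_type == "community" and b_type == "community":
        return a_community is not None and a_community == b_community
    # Isolated ports reach only promiscuous ports, handled above
    return False
```

In the DMZ use case, the app server and its database server would both be ("community", "app-db") and can reach each other, while two ("isolated", None) desktops in the VDI use case cannot.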

Configuration Best Practices for vSphere and HP EVA


Recently I was helping a customer architect a solution and I went back to the best practices for vSphere and the EVA. You can find them here: http://h20195.www2.hp.com/V2/GetPDF.aspx/4AA1-2185ENW.pdf

I thought I would write a quick summary with comments, as some of it seems neglected in terms of being updated.

  1. When creating hosts in Command View, create them as Type: VMware
  2. Use a single disk group with same-size disks, and make the disk group as large as possible
  3. Single disk group sparing is sufficient unless MTTR is > 7 days
  4. Don’t use Vraid6; Vraid5 provides adequate redundancy
  5. When using disks of varying performance characteristics, still use a single disk group
  6. Balance your VMFS LUNs across both controllers for ownership, with alternating failback and failover policies; read more here (http://www.ivobeerens.nl/?p=465)
  7. Configure the Round Robin advanced parameter to IOPS=1. I don’t think HP is right here; this was fixed in vSphere 4 Update 1, so it shouldn’t be necessary
  8. When using DR groups, ensure that the DR groups’ managing controllers are spread across controllers
  9. Use fewer than 16 VMDKs per VMFS datastore
  10. Use the same LUN ID per host per LUN
  11. Create VMFS datastores through vCenter for proper alignment
  12. For VMs with heavy I/O load, use the paravirtualized virtual adapters for the VM data LUNs
  13. Verify proper alignment of data drives for Windows 2003 machines within the OS
  14. Set Round Robin multipathing on your hosts and then reboot; don’t forget to set your Microsoft clusters to MRU. Here is the quick and easy command:
esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA --psp VMW_PSP_RR

VMware NMP Errors and LUN Dropping with HP EVA SANs


Issue:

Recently I came across a situation where a customer was experiencing sporadic LUN drops in their vSphere clusters. This occurred in 4.1 and 4.0. Also, what is unique is that this occurred between two HP storage arrays. The LUNs would disappear for a few minutes, then come back. I looked at it from the angle of performance problems in their SAN environment.

These 0x28 errors aren’t unique to the HP arrays shown below; other manufacturers, including EMC and IBM, seem to be fighting them as well. You can read about it here:

Looking at the vmkernel logs, there were lots of lines like:

Dec 17 07:03:24 esx01 vmkernel: 22:01:35:32.584 cpu7:4517)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x410001176200) to NMP device "naa.600508b40008dfcc0000600000cc0000" failed on physical path "vmhba2:C0:T0:L1" H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.
Dec 17 07:03:24 esx01 vmkernel: 22:01:35:32.584 cpu7:4517)ScsiDeviceIO: 770: Command 0x2a to device "naa.600508b40008dfcc0000600000cc0000" failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.
Here is the HP Published Solution
NOTE: The above-mentioned URL will take you to a non-HP Web site. HP does not control and is not responsible for information outside of the HP Web site.
The hexadecimal values H:0x0 D:0x28 P:0x0 decode to Task Set Full as per the above article.
VMware reports this error when the storage controller returns Queue Full or BUSY signal to an IO request.
A storage controller may return Queue Full or BUSY signal when it encounters resource congestion due to overutilization.
In VMware environments, this may be caused by high Queue Depth at controller ports during heavy workload or due to large size IOs issued by VMware. By default VMware is capable of sending IO blocks up to 32MB.
In many cases the following steps have helped mitigate the issue:
  1. Capture the evaperf logs during the time the errors are reported and ensure that the array utilization is well within the acceptable safe IOPS values for the given configuration.
  2. Set the maximum IO size to 128 as mentioned in the below VMware article.
  3. Follow the EVA – VMware bestpractices guide and ensure the multipath policy is set correctly.
    Click here to access the technical article available at http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA1-2185ENW.pdf.
  4. Enable Adaptive Queue depth throttling as mentioned in the below article.
    NOTE: The above-mentioned URLs will take you to a non-HP Web site. HP does not control and is not responsible for information outside of the HP Web site.
    For EVA, QFullSampleSize value of 32 and QFullThreshold value of 8 is found helpful in many cases.
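
To spot these errors in bulk, the H:/D:/P: triplet can be pulled out of each vmkernel line and the device status translated to its SCSI name. This is a minimal sketch: the function name and the returned dictionary layout are my own, and the table only covers a handful of standard SCSI status codes.

```python
import re

# Standard SCSI status codes; 0x28 is the one discussed in this post
SCSI_DEVICE_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
}

STATUS_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+)\s+D:(0x[0-9a-fA-F]+)\s+P:(0x[0-9a-fA-F]+)")


def decode_status(line: str):
    """Extract host/device/plugin status from a vmkernel log line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin = (int(value, 16) for value in match.groups())
    return {
        "host": host,
        "device": device,
        "plugin": plugin,
        "device_status": SCSI_DEVICE_STATUS.get(device, "unknown"),
    }
```

Running this over the vmkernel log and counting lines whose device_status is TASK SET FULL or BUSY per LUN gives a quick picture of which datastores are hitting the queue limits.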

5 Things I Love About 3PAR


Coming from an EVA background 3PAR was very new to me.  Right away there were several features that stuck out that I really like:

1)  VAAI Integration –  3PAR fully supports VAAI with its VAAI plugin for vSphere.  You can read more about that here

2) Rapid Provisioning –  This is perhaps my favorite thing about 3PAR, after suffering through the pain of LUN management and presentation in ESX clusters for the past several years.  3PAR offers Autonomic Groups, which simplify volume provisioning through Autonomic Host Groups and Volume Groups.  When a new host is added to a host group, all volumes are autonomically exported to the new host; likewise, when a new volume is added to a volume group, it is autonomically exported to all hosts in the group.  This can literally save hundreds of clicks.

3) Scalability –  I like how you can add additional controller nodes through upgrade paths in the F400, T400, and T800.

4) Hardware-Based Thin Provisioning – The 3PAR Gen3 ASIC enables fat-to-thin conversion without impacting workloads.  hardware > software :)  Also, thin provisioning in Oracle.

5) Replication – Not only can you add a card for Fibre Channel FCIP replication, you can also use the built-in Gigabit Ethernet ports for replication.  Remote Copy = easy mode; it is not hard to set up.

6) Really Active/Active – Unlike midtier arrays from other manufacturers, which have both optimized and non-optimized paths from the controller to the LUN, all 3PAR paths are optimized.  This is similar to high-end arrays like Hitachi and Symmetrix.  You can read about it here on page 4
What I was impressed by is this holds true in both F Series and T Series systems. 

More To Follow Soon…

Boston Trip with C7000, HP Flex-10, Vsphere and Netapp 3140 Implementation

Arriving in Boston last week, we were greeted with lots of SNOW and an unfinished office environment; we didn’t have a desk the whole time and had no chairs until the 3rd day.  Here are some pictures and videos of the Boston implementation.  It was a pleasure to get to know Brian Erwin and Troy Janke though.

SNMP Monitoring HP ESX Servers Tips and Tricks

Recently I was at a customer’s site doing a vSphere 4.1 upgrade and SAN expansion.  I configured the ESX hosts with SNMP monitoring for proactive support, using Insight Remote Support for call-home functionality with alerting in case of hardware failures.  Here are three good points I came across in dealing with SNMP monitoring configurations:

1)  If you deploy the latest 4.1 image through HP Insight Server Deployment Console, you won’t have to install the HP agents.  I really like this; it saves a lot of time.

Note:  If you’re not using HP Insight Server Deployment, you can get the agent here at this link, and you will have to install it manually.  The install is pretty lengthy, more than just an RPM.  You can just take all the defaults, then modify what I am showing in step 2 below.

2)  The easiest way to configure the settings is post-install through the System Management Homepage.  You can configure one system, then copy its configuration as shown in the System Management Homepage to the System Management Homepage of the other ESX hosts.  Below is the information that you will want to copy from the host.  Alternatively, if you are just starting out, replace the values in <> below with your desired configuration and you’re good to go.

dlmod cmaX /usr/lib64/libcmaX64.so
rwcommunity <your private community name> 127.0.0.1
rocommunity <your public community name> 127.0.0.1
rwcommunity  <your private community name> <your ip of Insight Control server w/ Remote Support>
rocommunity  <your public community name> <your ip of Insight Control server w/ Remote Support>
trapcommunity <your public community name>
trapsink <your ip of Insight Control server w/ Remote Support> <your public community name>
syscontact Root <root@localhost> (configure /etc/snmp/snmp.local.conf)
syslocation Unknown (edit /etc/snmp/snmpd.conf)

3) Don’t forget: when sending test traps, you have to comment out the line “Defaults requiretty” in /etc/sudoers on each ESX box :)
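
If you have more than a couple of hosts, the community and trap lines from step 2 can be generated rather than typed. The sketch below is a rough Python illustration: the function name `render_snmpd_conf` and the sample values are made up for this example, and it only emits the lines listed in step 2 (it leaves syscontact and syslocation alone).

```python
def render_snmpd_conf(private_community, public_community, insight_server_ip):
    """Build the snmpd.conf lines from step 2 for one ESX host."""
    lines = [
        "dlmod cmaX /usr/lib64/libcmaX64.so",
        # Local access for the HP agents
        f"rwcommunity {private_community} 127.0.0.1",
        f"rocommunity {public_community} 127.0.0.1",
        # Access for the Insight Control server with Remote Support
        f"rwcommunity {private_community} {insight_server_ip}",
        f"rocommunity {public_community} {insight_server_ip}",
        # Where traps are sent, and with which community
        f"trapcommunity {public_community}",
        f"trapsink {insight_server_ip} {public_community}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values only; substitute your own communities and server IP.
    print(render_snmpd_conf("example-rw", "example-ro", "192.0.2.10"), end="")
```

Review the output before pasting it into a host’s configuration, since the community names end up in plain text.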

Rethinking Network Design with FlexFabric and Vsphere 4.1

The biggest changes we have seen with vSphere are the vDS (distributed virtual switch) and FT (Fault Tolerance).  One of the biggest changes in vSphere 4.1 on the network side of the house is Network I/O Control (NetIOC), not to mention the seldom-mentioned LBT (load-based teaming).  While these changes occur in VMware, HP’s virtual 10Gb networking is changing too, bringing us Flex-Fabric, which is enough change to cause real confusion.  I see these changes as bringing a certain synergy to the datacenter from an HP blade perspective with vSphere 4.1 implementations.  I also see serious cost savings and increased efficiency, not to mention a cleaner design.

Here is the look from the physical topology:

Flex-Fabric brings HP’s Flex-10 converged I/O and one-hop FCoE.  While that may not go as far as Cisco does with their current Nexus line, there is still a major cost savings for the customer.  Take, for instance, the current Flex-10 Virtual Connect implementation: each c7000 would need a minimum of four switches, two for Virtual Connect SAN connections and two for network uplinks.  Now, with converged I/O, the customer could buy two switches and save roughly 40k per c7000.  Both switches would have uplinks to both SAN and network.

What will this look like inside of virtual connect/flex-fabric?

Instead of getting four FlexNICs, you will get three plus one converged adapter.

Will this work on G5s or G6s?  What about previous Flex-10 modules?

No, unfortunately; only G7s and Flex-Fabric modules.

Any recommendations or thoughts on a network design with vSphere 4.1 and Flex-Fabric?

VMware provides us with this best practice document: http://www.vmware.com/files/pdf/techpaper/VMW_Netioc_BestPractices.pdf  I honestly haven’t seen the latest cookbook for virtual connect.  Hopefully it addresses the distributed virtual switch now.  This was lacking last I saw.

Utilize vSphere Network I/O Control and 802.1Q with different dvPortGroups for virtual machine traffic, FT, vMotion, and service console to achieve a fully dynamic use of your 10Gb bandwidth, and use only two uplinks (the converged I/O) in an active/active scenario.  I really don’t see the sense in having one vDS with different dvPortGroups mapped to different FlexNIC uplinks, since VMware already has features such as shares, limits, and traffic shaping to tackle I/O contention and prioritize latency-sensitive traffic.  I would avoid the use of limits and reservations where possible.  Shares trump limits and reservations by making better use of capacity.

Limit vMotion through egress traffic shaping at the dvPortGroup, as ingress shaping isn’t needed with NetIOC.  This will help in a situation with multiple vMotions from many hosts.  Picture a scenario where you place multiple hosts of a cluster in maintenance mode and DRS is set to fully automated.  Capping vMotion bandwidth helps ensure latency-sensitive traffic isn’t interrupted.  The example below limits vMotion to 3GB.

Network Resource Pool     Host Limit   Physical Share   Share Value
FT                        Unlimited    High             100
vMotion Traffic           3GB          Normal           50
Management                Unlimited    Normal           50
VirtualMachine Traffic    Unlimited    Custom           75
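
To get a feel for what those share values mean, here is a rough Python sketch of how shares and a limit would split one 10Gb uplink if every pool were trying to saturate it at the same time. This is a simplified model, not VMware’s actual scheduler: the function name `allocate` is made up for this example, the 3GB vMotion limit is read as 3 Gbps, and in practice shares only matter among the pools that are actively sending.

```python
def allocate(link_gbps, pools):
    """Split link_gbps among pools by shares, honoring optional limits.

    pools maps a pool name to (shares, limit_gbps); limit_gbps is None
    for "Unlimited". Returns a dict of pool name -> Gbps.
    """
    alloc = {}
    active = dict(pools)
    remaining = float(link_gbps)
    while active:
        total_shares = sum(shares for shares, _ in active.values())
        # Pools whose share-based slice would exceed their limit get capped.
        capped = {
            name: limit
            for name, (shares, limit) in active.items()
            if limit is not None and remaining * shares / total_shares > limit
        }
        if not capped:
            for name, (shares, _) in active.items():
                alloc[name] = remaining * shares / total_shares
            break
        # Give capped pools their limit, then re-split what is left.
        for name, limit in capped.items():
            alloc[name] = limit
            remaining -= limit
            del active[name]
    return alloc


if __name__ == "__main__":
    table = {
        "FT": (100, None),
        "vMotion": (50, 3.0),
        "Management": (50, None),
        "VirtualMachine": (75, None),
    }
    for name, gbps in allocate(10, table).items():
        print(f"{name}: {gbps:.2f} Gbps")
```

With all four pools contending, the 275 total shares give vMotion about 1.8 Gbps, so the 3 Gbps limit never bites; it only matters when fewer pools are busy and vMotion’s slice would otherwise grow past it.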

So In Active/Active Flex-10 or Flex-Fabric does this mean that it will load balance automatically?

This isn’t exactly what you think.  This is where LBT steps in.  To use it, select “Route based on physical NIC load” in your dvPortGroup settings for teaming and failover.  LBT will only move a flow when the mean send or receive utilization of an uplink exceeds 75% of capacity over a 30-second period.  It won’t move flows more often than every 30 seconds.  There may be some hidden way to adjust that setting; I just don’t know it. :)
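
The LBT trigger condition is simple enough to write down. The Python below is only a sketch of the rule as described here, not VMware’s implementation: the function name `lbt_should_rebalance` is made up, and it assumes that exceeding the threshold in either direction (send or receive) is enough to trigger a move.

```python
def lbt_should_rebalance(tx_samples, rx_samples, capacity, threshold=0.75):
    """Return True if an uplink qualifies for a flow move.

    tx_samples and rx_samples are utilization samples collected over one
    30-second window, in the same units as capacity (for example Gbps).
    """
    def mean(samples):
        return sum(samples) / len(samples) if samples else 0.0

    limit = threshold * capacity
    # Assumption: either direction exceeding the limit triggers a move.
    return mean(tx_samples) > limit or mean(rx_samples) > limit
```

For a 10Gb uplink, a window averaging 8 Gbps of send traffic would qualify, while one averaging exactly 7.5 Gbps would not, since the mean has to exceed 75% of capacity.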

The actual best practices from VMware are as follows:

NetIOC Best Practices: VMware provides us with this best practice document: http://www.vmware.com/files/pdf/techpaper/VMW_Netioc_BestPractices.pdf

Flex-Fabric Best Practices:      http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02499726/c02499726.pdf

Best practice 1: When using bandwidth allocation, use “shares” instead of “limits,” as the former has greater flexibility for unused capacity redistribution. Partitioning the available network bandwidth among different types of network traffic flows using limits has shortcomings. For instance, allocating 2Gbps bandwidth by using a limit for the virtual machine resource pool provides a maximum of 2Gbps bandwidth for all the virtual machine traffic even if the team is not saturated. In other words, limits impose hard limits on the amount of the bandwidth usage by a traffic flow even when there is network bandwidth available.

Best practice 2: If you are concerned about physical switch and/or physical network capacity, consider imposing limits on a given resource pool. For instance, you might want to put a limit on vMotion traffic flow to help in situations where multiple vMotion traffic flows initiated on different ESX hosts at the same time could possibly oversubscribe the physical network. By limiting the vMotion traffic bandwidth usage at the ESX host level, we can prevent the possibility of jeopardizing performance for other flows going through the same points of contention.

Best practice 3: Fault tolerance is a latency-sensitive traffic flow, so it is recommended to always set the corresponding resource- pool shares to a reasonably high relative value in the case of custom shares. However, in the case where you are using the predefined default shares value for VMware FT, leaving it set to high is recommended.

Best practice 4: We recommend that you use LBT as your vDS teaming policy while using NetIOC in order to maximize the networking capacity utilization.

NOTE: As LBT moves flows among uplinks it may occasionally cause reordering of packets at the receiver.

Best practice 5: Use the DV Port Group and Traffic Shaper features offered by the vDS to maximum effect when configuring the vDS. Configure each of the traffic flow types with a dedicated DV Port Group. Use DV Port Groups as a means to apply configuration policies to different traffic flow types, and more important, to provide additional Rx bandwidth controls through the use of Traffic Shaper. For instance, you might want to enable Traffic Shaper for the egress traffic on the DV Port Group used for vMotion. This can help in situations when multiple vMotions initiated on different vSphere hosts converge to the same destination vSphere server.

VCAP-DCD-BETA-Experience

Today I was scheduled to start my VCAP-DCD beta test.  For the last month or so I have been working on my prep for my DCA on Dec 6th, which is a lot, lot of lab prep that takes forever.  Last week I got a beta invite for the DCD, so I chose at the last minute to switch gears.   Houston didn’t show available seats until Friday evening.  I actually called Pearson VUE VMware Friday morning to see if there was a glitch when there were 0 open seats.

I decided to take the test and wish for the best.

Arriving at the test center at 8:00am, when my test was scheduled to start, I was confronted with a long line of doctors and told to take a number, literally.  These doctors had to give biometrics, both palm prints and fingerprints, for their exam, and it took a hella long time for them to get going.  I was instructed “not to worry, your exam doesn’t start till you’re seated.”

Anyhow, I was seated around 8:25 and then started a 10-minute survey.  The exam seemed reasonable in difficulty as I was plowing away; however, all of a sudden I realized the questions were taking me too long, with 130 questions in 4 hours.  The questions had a lot of situations and they were involved, which means you have to read them more than once.  I could only answer the first 90 or so questions, then sped through the rest in my last half hour.  My test ended at 12:10, so I didn’t even get 4 hours.  I think my clock started at 8:00 even though my exam hadn’t started then.  Ugh.  The problem is that these situational questions carry a ton of information; in hindsight, knowing what I know now, I would have skimmed the questions more.  I also had a drag-and-drop question I had to reset 7 times: basically drag 4 items from the left, more than once, to the 7 on the right, but that whole system was bad.  I wasn’t expecting a perfect, glitchless exam though.  For the most part I thought the questions were good, though one Visio one was pretty awful.  If the exam is cut down to about 100 questions, that will probably be about right.  I did see that one guy only finished 80/130, so who knows.  Hopefully I am not penalized for quick-answering the last 30 of my questions.

For anyone who reads this, here is my advice: watch your time, don’t dwell on any one thing, and if you don’t know it, move on.  Oh, and btw, there is no back button; you can’t go back. :)  I am glad I am not getting my results for 6-8 weeks after my DCA.

Troubleshooting Netapp VSC2.0 with Vsphere 4.1

I recently came across a few issues implementing the NetApp Virtual Storage Console with vSphere 4.1.  When adding filers to the Provisioning and Cloning section, if timeouts occur or you get the generic message “A general Error has occured”, try the following workaround steps:

1) Does your filer have NFS enabled?  Even if you’re not using it, enable it; support can provide you a temporary key

https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb59430

2) e0M interface issue: the e0M interface should be temporarily disabled (bug # 320355)

http://now.corp.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=320355

3) Increasing ZAPI timeout via VSCPreferences file is in the KB below, and also in VSC’s known issues, and IAG on the NOW site. Links below:

https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb59154

http://now.netapp.com/NOW/download/software/vsc_win/2.0/

4) Enable SSL 2.0 in your web browser

VMware Vsphere Site Survey Fault Tolerance Troubleshooting

I was troubleshooting a hardware compatibility issue with Fault Tolerance when I ran into this neat utility (Site Survey).  A link to it can be found here on the VMware site.  Once it is installed, close your vCenter client and open it back up, click on a cluster, and then click the Site Survey tab.  A report will generate after a few minutes.

Also on this site is a CPU identification utility for vMotion compatibility testing, etc.

Vsphere Command-Line Reference

vSphere Command-Line Interface Reference

Also check here: http://www.vmware.com/pdf/vsphere4/r41/vsp4_41_vcli_inst_script.pdf

Data taken from: http://www.vmware.com/support/developer/vcli/vcli41/doc/reference/

The vSphere CLI command set allows you to run common system administration commands against vSphere systems from an administration server of your choice.

Linux Installation: If you accepted the defaults during installation, you can find the installed software in the following locations:

  • vSphere CLI scripts – /usr/bin
  • vSphere SDK for Perl utility applications – /usr/lib/vmware-vcli/apps
  • vSphere SDK for Perl sample scripts – /usr/share/doc/vmware-vcli/samples

Windows Installation: vSphere CLI commands are installed in C:\Program Files\VMware\VMware vSphere CLI\bin by default.

The vSphere CLI includes the commands listed below, as well as the resxtop and esxcli commands.

  • For resxtop, see the documentation in the Resource Management Guide.
  • For esxcli, see the online help. This command differs depending on the system you are running it on.

To display usage information, click the command name in the Documentation column of Table 1.

Table 1: vSphere CLI commands

Documentation Description
svmotion Moves a virtual machine’s configuration file and optionally its disks while the virtual machine is running. Must run against a vCenter Server system.
vicfg‑advcfg Performs advanced configuration including enabling and disabling CIM providers. Use this command as instructed by VMware.
vicfg‑authconfig Manages Active Directory authentication.
vicfg‑cfgbackup Backs up the configuration data of an ESXi system and restores previously saved configuration data.
vicfg‑dns.pl Specifies an ESX/ESXi host’s DNS (Domain Name Server) configuration.
vicfg‑dumppart Manages diagnostic partitions.
vicfg‑hostops Allows you to start, stop, and examine ESX/ESXi hosts and to instruct them to enter maintenance mode and exit from maintenance mode.
vicfg‑ipsec Supports setup of IPSec.
vicfg‑iscsi Manages iSCSI storage.
vicfg‑module Enables VMkernel options. Use this command with the options listed, or as instructed by VMware.
vicfg‑mpath Displays information about storage array paths and allows you to change a path’s state.
vicfg‑mpath35 Configures multipath settings for Fibre Channel or iSCSI LUNs.
vicfg‑nas Manages NAS file systems.
vicfg‑nics Manages the ESX/ESXi host’s NICs (uplink adapters).
vicfg‑ntp Specifies the NTP (Network Time Protocol) server.
vicfg‑rescan Rescans the storage configuration.
vicfg‑route Lists or changes the ESX/ESXi host’s route entry (IP gateway).
vicfg‑scsidevs Finds available LUNs.
vicfg‑snmp Manages the Simple Network Management Protocol (SNMP) agent.
vicfg‑syslog Specifies the syslog server and the port to connect to that server for ESXi hosts.
vicfg‑user Creates, modifies, deletes, and lists local direct access users and groups of users.
vicfg‑vmknic Adds, deletes, and modifies virtual network adapters (VMkernel NICs).
vicfg‑volume Supports resignaturing a VMFS snapshot volume and mounting and unmounting the snapshot volume.
vicfg‑vswitch Adds or removes virtual switches or vNetwork Distributed Switches, or modifies switch settings.
vifs.pl Performs file system operations such as retrieving and uploading files on the remote server.
vihostupdate Manages updates of ESX/ESXi hosts. Use vihostupdate35 for ESXi 3.5 hosts.
vihostupdate35 Manages updates of ESX/ESXi version 3.5 hosts.
vmkfstools Creates and manipulates virtual disks, file systems, logical volumes, and physical storage devices on ESX/ESXi hosts.
vmware‑cmd Performs virtual machine operations remotely. This includes, for example, creating a snapshot, powering the virtual machine on or off, and getting information about the virtual machine.

resxtop and Troubleshooting Memory for ESXi/ESX Hosts from the vMA

Here is a quick cheat sheet for the vMA (if you need it).

  • The default user name is vi-admin; the password is set on install
  • To add an ESX host, type vifp addserver <your esxhostname.company.com>
  • To show a quick list of your servers, type vifp listservers
  • To initialize a connection to a particular host, type vifpinit esxhostname.company.com
  • To capture logs from an ESX host, type vilogger enable --server <fqdn of esxhost you want to monitor> --numrotation 30 --maxfilesize 1023 --collectionperiod 100

How can I launch resxtop remotely?

$ resxtop --server <fqdn of your vCenter> --vihost <fqdn of esxhost you want to monitor> --username <your username to login Virtual Center>

(Note: you will be prompted for your password)

How can I run resxtop in batch mode and store all of that in a .csv?  The command below runs in batch mode (-b), includes all counters (-a), takes 60 samples (-n 60) at the default 5-second interval, and stores the output in a file named data.csv

$ resxtop -b -a -n 60 --server <fqdn of your virtualcenter server> --vihost <fqdn of your esxhost> --username <your virtualcenter username> > data.csv
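
Once you have data.csv, you usually only care about a handful of its many columns. Here is a rough Python sketch for pulling out every column whose header contains a given word. The function name `read_counters` is made up for this example, and it assumes the first row of the file holds the column names, so check your own file before relying on it.

```python
import csv


def read_counters(lines, needle):
    """Collect numeric samples for every column whose header contains needle.

    lines is any iterable of CSV text lines, such as an open file.
    Returns a dict of column header -> list of float samples.
    """
    reader = csv.reader(lines)
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if needle.lower() in name.lower()]
    series = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            if i >= len(row):
                continue
            try:
                series[header[i]].append(float(row[i]))
            except ValueError:
                pass  # skip blanks and non-numeric cells
    return series


# Example (needs a real data.csv in the current directory):
#     with open("data.csv", newline="") as f:
#         memory = read_counters(f, "Memory")
```

From there it is easy to take an average or maximum per column and spot the hosts or VMs worth a closer look.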

Why would I ever want to run resxtop when this stuff is in vCenter?  lol.

Some quick notes now that we’re in resxtop remotely using the vMA:

1) resxtop updates every 5 seconds; to change the delay, type s and then the refresh interval (20 would be 20 seconds)
2) Type V to show just virtual machines
3) To drill down into a virtual machine and look at its worlds, type e and then the GID
4) Typing c will bring up CPU, m will bring up memory, d will bring up disk, and n will bring up network

Troubleshooting Memory from the vMA with resxtop

1) Determine whether the balloon driver is installed in a virtual machine: type m for the memory view, then f to toggle fields, and select MCTL
–Now that it is selected, return to the main screen and look at the MCTL column.  If it says N for a virtual machine, the balloon driver isn’t installed

2) Look at the MEMSZ and GRANT counters: GRANT / MEMSZ = % memory used

3) To check demand for virtual machine memory, take a quick peek at the Memory Usage counter; the Average and Maximum (peak) columns help greatly.  If the average is > 80 or the peak is > 90, high demand for virtual machine memory might be causing a problem.  (I know, that one is in vCenter.)

4) To check whether your ESX host has swapped in the past, look at SWAP/MB.  If the value is > 0, it has swapped virtual machine memory in the past.  If it is 0, the ESX host doesn’t have any virtual machine memory swapped.

5) Look at SWCUR for your virtual machine; if the value is > 0, the ESX host has swapped memory from your test VM.

6) Look at your MCTLSZ; if it is > 0, your VM is ballooning.  If SWR/s or SWW/s is > 0, your virtual machine is swapping

7) Look at MCTLSZ for your test virtual machine; if the value is > 0, the VM is ballooning
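
The per-VM checks above boil down to a few comparisons, so here is a small Python sketch that applies them to one set of counter values read off the resxtop memory screen. The function name `triage_vm_memory` and the wording of the findings are made up for this example; the thresholds are simply the “> 0” rules from the steps above.

```python
def triage_vm_memory(memsz_mb, grant_mb, mctlsz_mb, swcur_mb, swr_s, sww_s):
    """Summarize one VM's memory state from resxtop counters."""
    findings = []
    # Step 2: GRANT / MEMSZ as a percentage of configured memory
    pct = 100.0 * grant_mb / memsz_mb if memsz_mb else 0.0
    findings.append(f"granted {pct:.0f}% of configured memory")
    # Steps 6 and 7: any balloon size means the VM is ballooning
    if mctlsz_mb > 0:
        findings.append("ballooning")
    # Step 5: swapped memory currently held for this VM
    if swcur_mb > 0:
        findings.append("has swapped memory")
    # Step 6: swap reads or writes in flight right now
    if swr_s > 0 or sww_s > 0:
        findings.append("actively swapping")
    return findings
```

A healthy VM comes back with just the granted percentage, while a VM under pressure picks up the extra findings in roughly the order the host escalates: ballooning first, then swapping.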
