This is the design that I constructed and implemented for my last company’s vSphere 4.0 Update 2 upgrade and hardware refresh of the production virtual environment. I created two highly available vSphere clusters, which I like to call “vClusters”, using the latest HP blade technology with HP Virtual Connect and Flex-10. The result was a very dynamic system with 2 clusters that could easily be scaled to 4.
Hardware:
- 2 HP blade chassis, each equipped with 2 Flex-10 and 2 8Gb Virtual Connect modules
- The chassis are interconnected with 4 CX-4 stacking cables, 2 per chassis side, running between the Flex-10 modules
- 18 BL460 G6 blades, each with Intel Westmere (32 nm, Nehalem family) 6-core processors and 48 GB of memory
- SAN: 2 HP EVA 8400s
- SAN core: Brocade 48000 (4Gb director series)
- Networking core: Cisco 6509s
- 1 DataDomain DD 560
VMware Environment:
- Licensing – All Enterprise Plus, for dVS, host profiles, Storage I/O Control (future), and 12-core processors (future)
- Each Cluster will hold 100-125 Virtual Machines with room for more than double the capacity
- VMware thin provisioning (reduced storage by more than 200%)
- Estimated maximum capacity per blade: 30 VMs
- 2 vClusters, each with 8 servers, 1 server reserved for HA; fully automated DRS, with DPM configured (not fully automated)
- 2 sandbox servers clustered with a private virtual honeypot
- Each VM upgraded to virtual hardware version 7 with the VMware VMXNET 3 adapter
- vRanger Pro 4.5
- 4 resource pools per cluster
- Templates – CPU and share resources kept to a minimum. The templates are actually powered-on VMs. Why? Who likes patching?
- Delete – A resource pool with no resources, mainly used to hold VMs that are powered off and waiting to be deleted
- Prod – A resource pool with shares set to high for both CPU and memory, with expandable reservation
- Dev/Test – A resource pool with shares set to normal for both CPU and memory, with expandable reservation
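The sizing figures in the list above (8 servers per vCluster, 1 reserved for HA, an estimated 30 VMs per blade, 100-125 VMs planned per cluster) can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the function and constant names are made up for illustration, and it treats the 30-VM-per-blade estimate as a hard ceiling, which real DRS/HA admission control does not do.

```python
# Rough sizing check for one vCluster, using the figures quoted in the post.
# All names here are illustrative; nothing below is a VMware API.

HOSTS_PER_CLUSTER = 8      # blades in each vCluster
HA_RESERVED_HOSTS = 1      # one server's worth of capacity held back for HA
MAX_VMS_PER_BLADE = 30     # the post's estimated per-blade maximum
PLANNED_VMS = (100, 125)   # planned VM count range per cluster


def usable_vm_capacity(hosts: int, reserved: int, vms_per_host: int) -> int:
    """VM slots left once the HA-reserved hosts are set aside."""
    if not 0 <= reserved < hosts:
        raise ValueError("reserved hosts must be between 0 and hosts - 1")
    return (hosts - reserved) * vms_per_host


def headroom_ratio(capacity: int, planned_vms: int) -> float:
    """How many times over the planned VM count fits into the capacity."""
    return capacity / planned_vms


capacity = usable_vm_capacity(HOSTS_PER_CLUSTER, HA_RESERVED_HOSTS, MAX_VMS_PER_BLADE)

if __name__ == "__main__":
    for planned in PLANNED_VMS:
        ratio = headroom_ratio(capacity, planned)
        print(f"{planned} planned VMs: {capacity} slots, {ratio:.2f}x headroom")
```

Changing `HA_RESERVED_HOSTS` or `MAX_VMS_PER_BLADE` shows how sensitive the headroom is to those two estimates.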
Networking:
- 80 Gb of uplinks to the core router (Cisco 6509): a 20 Gb trunk per Flex-10 module, with 2 Flex-10 modules per chassis across the 2 chassis.
- Flex-10 (Active/Active) – 20 Gb of networking to each blade, with 20 Gb of networking between blades across chassis (read about the configuration for Flex-10 and Virtual Connect here)
- dVS Fault Tolerance – Private network – Non-routable; only communicates within the blade chassis
- dVS vMotion – Private network – Non-routable; only communicates within the blade chassis
- dVS Virtual Machines – Different port groups, one per VLAN, for Dev/Test/Prod
- vS Service Console
Note: In 4.1 I would change this design: route vMotion, use mapped VLANs, and run 1 dVS for vMotion/Service Console/Virtual Machines Dev/Virtual Machines Test. I’d keep Fault Tolerance on a separate private switch. On the main dVS, however, I would incorporate Network I/O Control to use the 10 Gb pipe effectively and dynamically; this would also solve the egress problem of Flex-10 only controlling traffic one way.
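To illustrate why a shares-based scheme like Network I/O Control suits a single 10 Gb pipe, here is a minimal sketch of proportional sharing: when the link is saturated, each active traffic type gets bandwidth in proportion to its shares. The share values and traffic-type names below are assumptions chosen for the example, not settings from this environment, and the function is plain arithmetic rather than any VMware API.

```python
# Proportional-share split of a saturated link (illustrative only).

def split_bandwidth(shares: dict[str, int], link_gbps: float) -> dict[str, float]:
    """Divide link_gbps among traffic types in proportion to their shares.

    Only traffic types that are actively contending should be passed in;
    an idle type leaves its portion available to the others.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one traffic type needs a positive share")
    return {name: link_gbps * value / total for name, value in shares.items()}


# Hypothetical share values for three traffic types on one 10 Gb uplink.
example_shares = {"virtual_machines": 100, "vmotion": 50, "service_console": 50}
allocation = split_bandwidth(example_shares, link_gbps=10.0)

if __name__ == "__main__":
    for name, gbps in allocation.items():
        print(f"{name}: {gbps:.1f} Gb/s under full contention")
```

If vMotion is idle, calling `split_bandwidth` with only the remaining traffic types shows the bandwidth being redistributed instead of sitting unused, which is the main difference from carving the pipe into fixed-size FlexNICs.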
Storage and Backup:
- vRanger Pro 4.5 – Installed on VMs and configured to back up the vClusters at 50 VMs per hour. Very effective: 100% success rate on backups, with 0 errors and no troubleshooting. After troubleshooting VCB for 2 years, I honestly never thought I would see the day when backups were this good.
- DD 560 set up with a CIFS share for VMware backups; the ESX boxes back up directly to the DD 560. Pre-thin-provisioning compression ratio: 40:1.
- LUNs presented to each cluster with a standard size of 500 GB. Storage vMotion capability between clusters
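The backup figures above lend themselves to a quick back-of-the-envelope check. The sketch below estimates a backup window from the 50-VMs-per-hour rate and the on-disk footprint implied by a 40:1 ratio. It assumes the rate is constant and applies to whatever set of VMs is passed in (the post does not say whether 50 per hour is per cluster or overall), and the 1000 GB logical size is a made-up input, not a measured value.

```python
# Back-of-the-envelope backup estimates (illustrative names and inputs).

def backup_window_hours(vm_count: int, vms_per_hour: float) -> float:
    """Hours needed to back up vm_count VMs at a constant rate."""
    if vms_per_hour <= 0:
        raise ValueError("vms_per_hour must be positive")
    return vm_count / vms_per_hour


def physical_footprint_gb(logical_gb: float, reduction_ratio: float) -> float:
    """Space consumed on the backup target after a given reduction ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_gb / reduction_ratio


if __name__ == "__main__":
    for vms in (100, 125, 250):
        print(f"{vms} VMs at 50 per hour: {backup_window_hours(vms, 50):.1f} h")
    # Hypothetical 1000 GB of logical backup data at the quoted 40:1 ratio.
    print(f"1000 GB logical: {physical_footprint_gb(1000, 40):.1f} GB on disk")
```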
14 Responses to “The vCluster” – A Highly Available Dynamic Blade Solution Design with Vsphere 4
Jeffrey,
I like the solution but I have one question. Why use Fibre Channel and not iSCSI? It appears you’ve got plenty of IP bandwidth with your uplinks to the 6509s.
Hi John-
Thanks for your comment… You’re correct, iSCSI or NFS would be great. I would have liked to do it with LeftHand or a NetApp.
Hey Jeffrey, just wondering why:
you selected only 48GB of memory on BL460 g7 servers with 12 cores of CPU
EVA 8400s were the choice for storage (which explains picking FC for connectivity)
Did you consider the Nexus switches instead of Catalyst (FCoE instead of FC)? If so, why did you go with the Catalyst?
Any solution that meets the requirements properly within the budget is the right solution; I just like asking why.
Good questions. The servers were G6s…
The original servers that were picked had the quad-core Nehalem; however, the vendor upgraded to Westmere post-ordering… There are 12 slots in a G6, and 4 GB DIMMs are more cost-effective than 8 GB. Also, I believe if you increased to 8 GB the memory speed toggles down to 1066…
I originally designed the solution with Nexus 5000 upstream; however, the company already had the Catalyst and chose just to get a blade…
In regards to an FCoE solution, if I redid this today with an HP solution I would do FlexFabric… where the SAN and Ethernet are converged…
Makes perfect sense with the procs then. You can keep your speed at 1333 with 8 GB; you just need to use the LV/Reg memory. I would want to investigate any potential perf issue with low-voltage memory before utilising it though (I haven’t yet).
Hardware constraints are always a good reason for a design. If you are constrained to a piece of kit you just have to architect with it.
HP FlexFabric looks quite good – not true convergence without top of rack, but you can do this with either a Nexus 5000 or a 3Com (w/ FCoE module) – and to be honest I prefer it the way it is (doesn’t rely on a Nexus switch to work)
Is there any reason you used EVAs? Another design constraint or did you prefer them? If so what was the reason?
My fear with FCoE, still, is that it is over complicated, sending a lossless protocol over ethernet. At least with flexfabric the uplinks can be FC (fabric) or 10Gbe copper/fibre so is only converged within the enclosure.
I’m personally a big fan of NFS – EMC, NetApp and other storage vendors are currently investing a lot of R&D into it. It is much simpler and reliable (in production for decades), and there is little perceivable difference between NFS, iSCSI, FCoE and FC.
I would have preferred to use EMC or NetApp for the storage; in fact, I recommended both be considered over HP. After looking at VMware VAAI support, my recommendation only looks stronger… However, I think the 2-year plan was to use the 8400s and then do storage virtualization using SVSP; not sure if you have seen that… It is not only for HP, but I am not sure how tricky it would be to support with other SAN vendors…
I forgot to mention that HP have something coming up codenamed “fenway”. It provides FlexFabric and full Matrix capabilities for rack-mount HP servers – think a competitor for Xsigo.
This sounds cool. I really hate rack mounts altogether though… lol
Rack mounts are one of those things that polarise people – they either love them or hate them. I like them but see a lot of people who hate/don’t trust them.
All about meeting customer requirements
I am stuck with them at my present employer; instead of blading everything, we get a lot of rack mounts. I don’t think all customers really understand the lower operational costs of a bladed datacenter, where things get a lot easier. With a converged dynamic infrastructure you can do more with less…
Playing devil’s advocate here (as I’m a blade fan): I come across these kinds of things a fair bit when architecting a solution for a customer.
Negative feedback I’ve heard on blades is:
firmware/drivers update is too hard – if I miss one on the enclosure the servers don’t work (that one really annoys me actually – lazy IT)
If the enclosure fails I lose all my servers. If a server fails I only lose one server
They are more expensive
Power savings are not really that great
I am locked into one vendor when I buy an enclosure
I don’t think you’re locked into one vendor with IBM; I think they use an open standard. However, I used HP because of their passive backplane. Blades are a bit of a learning curve, especially with Flex-10 Virtual Connect. It was easy for me, but I have a strong enough background in networking… Not sure how many server people are strong in networking.
I have to say I agree with your preferences for storage. I’d also be worried about how long EVA will be around for. They have just bought a mid-enterprise to enterprise storage vendor in 3par and they already have VAAI….will be interesting to see
Yeah I think the EVA will be retired soon, I wouldn’t be shocked if Lefthand was supported with VAAI before the EVA…..
Good stuff Jeff. Thanks.