- VMware vSphere Virtual Datacenter design: which included the vCenter Server, Database Systems (vCenter & VUM), HA/DRS/DPM, Clusters and Resource Pools.
- VMware vSphere Network design: including the network design requirements, the number of networks and why to separate or segment them, (P)VLAN’s, virtual and distributed switches (incl. Cisco Nexus 1000V), security, an overview of Fibre Channel over Ethernet (FCoE), physical switch (pSwitch) configuration, VMDirectPath I/O design and limitations, the DNS design and last but not least the network naming conventions.
This looks interesting so let’s have a closer look at the sub-topics.
1. vCenter Server and Database Systems:
- OS type for the vCenter Server: obviously, always install a supported OS for your vCenter Server. This will prevent support issues when something goes wrong. Consider using a 64-bit (x64) OS for more scalability. Check the “Configuration Maximums for VMware vSphere” document for the supported OS’s.
- vCenter Server Hardware: well, the requirements can change between software releases, so check the document mentioned above for the latest specifications.
- vCenter Server name and IP address: for ease of administration and management, use a static configuration.
- Type of vCenter Server (virtual/physical): both installations are supported and in my opinion, the virtual installation is preferable because it provides more flexibility. A virtual vCenter Server should also be placed in an HA cluster to prevent it from becoming a single point of failure in the infrastructure.
- OS type for the vCenter Database Server: just like for the vCenter Server, always use a supported OS for the Database Server. The same goes for the database itself: use only a supported database type and version.
- vCenter Database Server name and IP address: just like for the vCenter Server, use a static configuration.
- Type of vCenter Database Server (virtual/physical): you can use one of the existing database servers in the infrastructure to save cost if it is already highly available. Also, consider migrating the physical machine to virtual to gain the advantages of a VM.
- vCenter Server Database location: for a small infrastructure with few ESX(i) hosts and VM’s you could use the vCenter Server as the location for the database. In all other cases, use a separate machine.
2. Clusters and Resource Pools:
- Scaling the infrastructure: when choosing between fewer, more powerful hosts and more, less powerful hosts, evaluate the pros and cons of both options.
- Mixed Clusters: you can use mixed clusters (with ESX 3.x and ESX 4.x) during an upgrade to the latest version, but you should upgrade the hosts first and then the VM’s.
- VMware HA failover policy: there are three options to choose from when you configure the failover policy of the HA cluster: “Host Failures Cluster Tolerates”, “Percentage of Cluster Resources Reserved” and “Specify a Failover Host”. I’m not going to explain each option here. Read the VMware vSphere 4.1 HA and DRS technical deepdive book written by Frank Denneman and Duncan Epping instead 🙂
- VMware HA Network Redundancy: network redundancy is crucial in an HA cluster. Make sure the host does not become isolated by configuring a second management network on a separate physical network or by using redundant NIC’s on a single network. Use the das.isolationaddress parameter in the advanced options of HA to eliminate the single point of failure of the isolation test address. When you configure this option, you should also increase the das.failuredetectiontime parameter to 20,000 milliseconds or more. Again, this topic is extensively discussed in Duncan and Frank’s book.
- Isolation Response: the default setting in vSphere 4 is Power off VM. Do not change the default unless the management network is not redundant, its availability cannot be guaranteed, and the virtual machine and storage networks are separated from the management network and are far more reliable.
- Blade Servers in a VMware HA cluster: keep in mind that the first 5 hosts added to the HA cluster automatically become “Primary Nodes”. In a blade environment, the blade chassis is a single point of failure, so the blade servers of a cluster should be distributed across multiple chassis and no single chassis should hold more than four hosts of the same cluster. That way all five primary nodes can never sit in the same chassis.
- VMware Fault Tolerance: you can protect the most critical VM’s in the infrastructure with FT. Keep in mind that many virtualization benefits like snapshots, DRS and Storage vMotion will not be available on an FT-protected VM. FT is only applicable to single-vCPU VM’s and requires a dedicated 1 Gbit network per host for FT logging, which increases the cost per host.
- VMware DRS Cluster design: as a rule of thumb, you should always configure DRS in the cluster. DRS will automatically load balance the workloads, which leads to better performance, scalability and manageability. Enable Enhanced vMotion Compatibility (EVC) mode; this will enable you to add newer hosts in the future. From the cluster scaling perspective, DRS benefits from more hosts per cluster rather than fewer. Be careful when configuring the DRS affinity and anti-affinity rules. Many rules will limit the migration choices and could potentially have a negative effect on the workload balance.
- Multiple HA & DRS Clusters: the VM’s/hosts per cluster maximums are one of many reasons why you should deploy multiple clusters in the infrastructure. It would be advisable to create one cluster for server-based load and a separate one for VDI-based load. Other aspects like security (a dedicated DMZ cluster) or separating production from non-production environments are also valid arguments for a multiple cluster design.
- Resource Pools: you can deploy resource pools for workloads that require dedicated and isolated resources or if you want to distribute the system resources among departments or project teams. Keep in mind that resource pools create an additional layer to monitor and that the share settings on the resource pool are not inherited by the VM’s.
- VMware DPM design: using DPM in environments where the workload is not constant will save power and cooling costs. If your design includes DPM, make sure that the hosts support iLO or IPMI technology.
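
To make the scaling and “Percentage of Cluster Resources Reserved” points a bit more tangible, here is a minimal sketch of the arithmetic. It is my own illustration, not something from the course or from VMware: the function name is hypothetical and it assumes all hosts in the cluster are identical, which real clusters often are not.

```python
def failover_reserve_percent(hosts: int, host_failures: int = 1) -> int:
    """Whole percentage of cluster resources to keep free for failover.

    Assumption (not from the course): every host contributes the same
    capacity, so surviving `host_failures` hosts means keeping
    host_failures / hosts of the cluster unused. The result is rounded
    up to a whole number.
    """
    if not 0 < host_failures < hosts:
        raise ValueError("need at least one surviving host")
    # Integer ceiling of 100 * host_failures / hosts, without floats.
    return (100 * host_failures + hosts - 1) // hosts
```

With one tolerated host failure this gives 25 for a 4-host cluster, 13 for 8 hosts and 7 for 16 hosts. That is the trade-off behind the scaling bullet: with fewer, bigger hosts each failure takes out a larger slice of the cluster, so more capacity has to sit idle.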
3. VMware vSphere Network Design:
- Network Design Guidelines & Requirements: when designing a new virtual network infrastructure, try to meet goals like reducing costs, boosting performance, improving security and availability, and providing more flexibility and manageability. In other words, the new design should be more efficient and modular.
- Number of Networks: the number of separate networks depends on the types of traffic required in the new environment. Consider placing the virtual machine, FT, iSCSI/NFS, HA, vMotion and management traffic each on a separate network. This will reduce congestion and latency and improve performance.
- Network Segmentation and VLAN’s: there are several reasons why you should use VLAN’s. In most cases, this will be applicable when there are more networks than physical NIC’s available.
- Private Virtual LAN’s (PVLAN’s): private VLAN’s provide layer 2 network isolation between VM’s on the same subnet. A common use of PVLAN’s is a DMZ network.
- Standard switches & distributed switches: dvSwitches are the right choice when your design includes the Enterprise Plus license, otherwise you’ll have to design the network with standard vSwitches.
- Cisco Nexus 1000V switch: the Nexus 1000V switch from Cisco is a distributed virtual software switch and behaves like a normal physical switch in the vSphere environment.
- Number of virtual switches: keep the number of switches as low as possible; this will simplify configuration, administration and monitoring.
- NIC Teams, Load Distribution and Availability: team the physical NIC’s to increase the network bandwidth, availability and redundancy. To avoid a single point of failure, never team two or more ports on the same NIC and spread the ports across multiple physical switches.
- NIC Team Failure Detection: as a failure detection mechanism, you could enable beacon probing on NIC teams that consist of three or more uplinks and are connected to multiple physical switches. Make sure that the physical switches don’t block the beacon probing packets.
- Virtual Switch Security: as a best practice, set “Forged Transmits”, “MAC Address Changes” and “Promiscuous Mode” to “Reject”. Allow these settings only when there is a valid reason to do so. For example, you must allow “MAC Address Changes” when you plan to use Microsoft NLB.
- Physical NIC features: when purchasing new physical NIC’s make sure that they support checksum offload, TCP segmentation offload, 64-bit direct memory access, Jumbo Frames etc. These features will help increase the network performance and reduce the CPU workload on the ESX(i) host.
- Fibre Channel over Ethernet: in large network infrastructures, consider using FCoE to consolidate I/O and reduce the number of host adapters and even physical switches. This will also save on power, cooling, cabling etc. Make sure that the switches used in the new environment are FCoE capable.
- Physical Switch configuration: always use redundant physical switches. Turn off STP if possible or enable “PortFast mode” on the ports connected to the ESX(i) hosts. This will prevent VM timeouts during a network outage. Consider using Jumbo Frames to improve the throughput on NFS/iSCSI networks. Do not forget to configure the whole stack to use Jumbo Frames (NIC’s, switches, SAN etc.) and benchmark your setup to determine if there are performance gains when using Jumbo Frames.
- VMDirectPath I/O design: this feature should not be used unless there is a clear and valid reason to do so. Features like vMotion, DRS, HA etc. are lost when using VMDirectPath I/O.
- DNS and Network Naming Conventions: to simplify the administration and management of the vSphere infrastructure, all key components like the ESX(i) hosts, vCenter Server, Database Server etc. should be registered in DNS. Choose simple, clear and easy to remember names for the infrastructure components. This will ease the administration and will probably save time when troubleshooting.
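
Two of the rules above, the NIC teaming one and the “whole stack” Jumbo Frames one, are easy to turn into a quick sanity check on a design document. The sketch below is only an illustration under my own assumptions: the `Uplink` type, the function names, the device names and the MTU of 9000 are all hypothetical, and nothing here talks to a real host or switch.

```python
from collections import Counter
from typing import NamedTuple


class Uplink(NamedTuple):
    """One physical port in a NIC team (all field values are made up)."""
    name: str     # port name, e.g. "vmnic0"
    adapter: str  # physical NIC card the port sits on
    pswitch: str  # physical switch the port is cabled to


def team_weaknesses(team: list[Uplink]) -> list[str]:
    """Return the single points of failure found in a NIC team."""
    problems = []
    # Rule 1: never team two or more ports on the same physical NIC.
    per_adapter = Counter(u.adapter for u in team)
    for adapter, ports in sorted(per_adapter.items()):
        if ports > 1:
            problems.append(f"{ports} ports share adapter {adapter}")
    # Rule 2: spread the ports across multiple physical switches.
    if len({u.pswitch for u in team}) < 2:
        problems.append("all ports go to the same physical switch")
    return problems


def jumbo_frame_gaps(path_mtu: dict[str, int], required: int = 9000) -> list[str]:
    """Return the components on a path whose MTU is below `required`."""
    return [name for name, mtu in path_mtu.items() if mtu < required]
```

For example, a team of `Uplink("vmnic0", "nic-a", "sw1")` and `Uplink("vmnic1", "nic-a", "sw1")` is flagged twice, while a path such as `{"vmknic": 9000, "vswitch": 9000, "pswitch": 1500, "array": 9000}` reports `pswitch` as the component that would silently break Jumbo Frames end to end.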
OK, this was day two. Again, a lot of interesting and useful stuff. Believe me, there is a lot more, but the topics described above will give you a quick and clear overview of the material this course covers. Next, the final day, day 3. Stay tuned!
Cheers!
– Marek.Z