Posted by Marek.Z on 7 March 2011
OK, the third and final day of the VMware vSphere Design Workshop covers the VMware vSphere Storage design, Virtual Machine design, and Management and Monitoring design modules. These topics look interesting, so let’s have a closer look.
1. VMware vSphere Storage design:
- Storage Design Guidelines: the storage design should provide several benefits to the enterprise: it should reduce costs, ease administration and improve availability without decreasing performance.
- Storage Network Technology (NFS, iSCSI, FC): there is no single storage network technology that is “the best”; it really depends on the use case. All technologies have their pros and cons. The key to performance is proper sizing and keeping saturation and latency low.
- Storage size: when designing the storage size, add about 20% to 30% extra capacity to accommodate snapshots, swap and log files. Keep in mind that the recovery time objective (RTO) can also affect the size of the storage (a small sizing sketch follows this list).
- VMFS Datastore size: the main factor in choosing the right size for a VMFS datastore is the number of VMs that can run on the datastore with acceptable latency.
- VMFS Block Size: the VMFS block size should be determined by the largest required virtual disk. As a best practice, keep the same block size across all datastores.
- Command Queuing: this can occur at the host and/or at the storage array level and can degrade the storage performance. The LUN queue depth parameter determines how many commands can be active to one LUN at the same time. If a host generates more commands to a LUN than the LUN queue depth, the excess commands are queued in the VMkernel, which leads to increased latency.
- VMFS Volumes per LUN: use a single VMFS volume per LUN. This minimizes the number of SCSI reservations per LUN and improves performance in a multi-host environment.
- Storage Security & Access: access to the storage array depends on the chosen technology. With NFS, you can use network segmentation or simply not mount an NFS volume on an ESX(i) host. On an iSCSI network, you can use VLANs and the Challenge Handshake Authentication Protocol (CHAP). On a Fibre Channel fabric, use zoning and LUN masking.
- Host LUN ID numbers: the storage array should present each LUN with the same LUN ID to all ESX(i) hosts. This prevents inconsistencies between the hosts and eases administration.
- Redundant Storage Paths: to provide storage high availability, you should always configure multipathing. The paths should be configured using separate HBAs/NICs, switches and storage processors. Also consult the array documentation for the specific multipath configuration and the multipath policy applicable to your array.
- Raw Device Mapping: in most cases VMDKs are sufficient, but if you want to take advantage of, for example, SAN software inside a VM, you should use RDMs. The I/O performance difference between a VMDK and an RDM is negligible.
- Virtual Disks: when choosing between thick and thin provisioned disks in a design, first decide on using NFS or VMFS. NFS datastores are thin provisioned by default and the monitoring/management is done entirely on the NFS server side. When using VMFS datastores, thin provisioning is optional. Before configuring, consider all pros and cons of thin vs. thick disk provisioning.
- Boot from SAN: boot ESX(i) from SAN if you are using diskless systems like blade servers. Keep in mind that you cannot boot from SAN when using NFS datastores or software iSCSI adapters. Booting from SAN also creates a dependency on the SAN, and since the ESXi Embedded version is gaining popularity, I would suggest using it where possible.
- N_Port ID Virtualization: NPIV gives each VM a virtual WWN identity on the FC fabric. This is useful for access control and when there is a requirement to monitor LUN usage at the VM level. Keep in mind that NPIV requires the VM to use RDMs and that the HBAs must support NPIV.
- Storage Naming Conventions: just like in other sub-components of the vSphere infrastructure, storage unit names should be consistent and reflect, for example, the location, type and number of the unit.
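To make the 20–30% headroom rule of thumb a bit more concrete, here is a minimal sizing sketch in Python. The VM count, disk and memory sizes and the 25% overhead factor are illustrative assumptions, not figures from the course.

```python
# Rough datastore sizing sketch (illustrative numbers only).
# Rule of thumb: add ~20-30% on top of the raw VM disk capacity
# for snapshots, VM swap files and log files.

def datastore_capacity_gb(num_vms, avg_vmdk_gb, avg_vm_memory_gb, overhead=0.25):
    """Estimate the usable capacity needed for one datastore."""
    vmdk_total = num_vms * avg_vmdk_gb
    # Assuming no memory reservations, each VM swap file is roughly
    # as large as the configured memory.
    swap_total = num_vms * avg_vm_memory_gb
    return (vmdk_total + swap_total) * (1 + overhead)

if __name__ == "__main__":
    # Hypothetical datastore hosting 15 VMs with 40 GB disks and 4 GB RAM each.
    needed = datastore_capacity_gb(num_vms=15, avg_vmdk_gb=40, avg_vm_memory_gb=4)
    print(f"Plan for roughly {needed:.0f} GB of usable capacity")
```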
2. Virtual Machine design:
- Number of vCPUs: the default is one vCPU unless there is an obvious need for more. Scheduling single-vCPU VMs is a lot easier for DRS than scheduling VMs with multiple vCPUs.
- Memory: for best memory performance, the VMs’ active memory should be kept in physical RAM. Limit memory overcommitment and be careful when using reservations. Make sure the VMware Tools are installed and transparent page sharing (TPS) is enabled.
- Virtual Machine Disk: as a best practice, use separate system and data disks and place them on one datastore unless they require different I/O characteristics (RAID level, latency, etc.). Separate disks simplify backup/restore and help distribute the I/O load.
- Swap File location: the VM swap file created by the ESX(i) host can be stored in several locations. It can be kept on shared storage together with the VM files; this is the default option and in most cases the preferable one. Another option is to keep the VM files on shared storage and the swap files on local disk, which reduces the replication bandwidth required but slows down vMotion. A third alternative is to store the swap files on a dedicated shared datastore, which improves replication performance but adds administrative overhead. The fourth option is to store both the swap file and the VM files on local disk, but this is not advisable unless you are building a test environment (a small swap-file sizing sketch follows this list).
- Virtual SCSI HBA type: use the default choice provided by the wizard when creating a new VM. One exception is the Paravirtual SCSI adapter (PVSCSI), which offers more throughput at lower CPU utilization and should be used in I/O-intensive VMs.
- Virtual NICs: wherever possible, use the VMXNET3 adapter. It offers the most features and the best performance.
- VMware Tools: VMware Tools provides many features, like memory management, improved display/mouse performance and graceful shutdown of the VM. Always install VMware Tools on supported operating systems.
- Virtual Machine Security: keep VMs as secure as physical machines; there is no exception. For an extra layer of security, you can use VMware vShield Zones in your environment.
- Virtual Machine Naming Conventions: name the VMs in an easy, consistent and logical way to ease management and administration.
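Because the swap file placement options above all hinge on how large the .vswp files actually get, here is a small sketch of that relationship; the VM names and figures are made up for illustration.

```python
# The per-VM swap file (.vswp) is roughly the configured memory minus the
# memory reservation, so reservations directly shrink the space (and the
# replication bandwidth) that swap file placement has to account for.

def vswp_size_gb(configured_mem_gb, reservation_gb=0.0):
    return max(configured_mem_gb - reservation_gb, 0.0)

# Hypothetical VMs: (name, configured memory in GB, memory reservation in GB)
vms = [("web01", 4, 0), ("db01", 16, 8), ("app01", 8, 0)]

for name, mem, res in vms:
    print(f"{name}: {vswp_size_gb(mem, res):.1f} GB swap file")

total = sum(vswp_size_gb(mem, res) for _, mem, res in vms)
print(f"Total space needed on the swap datastore: {total:.1f} GB")
```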
3. Management and monitoring design:
- Management Guidelines: in general, limit the number of monitoring and management agents that use the Service Console. Use tools like the vSphere CLI, vSphere PowerCLI or the vMA instead.
- Host Installation & Configuration: to simplify the installation of the hosts, use the ESXi Embedded version wherever possible and use automated installers for ESX. Post-installation tasks can easily be deployed using the Host Profiles feature.
- Number of vCenter Server Systems: the number of vCenter Servers depends on two key factors: the infrastructure size (does it exceed the maximums?) and requirements for other products like Site Recovery Manager. The geographical location of the datacenters is also a good reason to deploy multiple vCenter Servers. If the design includes multiple vCenter Servers, configure vCenter Linked Mode unless the systems are administered by different teams and require separate authentication systems.
- Templates: a good approach is to configure one template per OS type. Templates ease administration and are faster to deploy than new VMs. A good way to save storage cost is to place the templates on less expensive storage.
- vCenter Update Manager: for updating the ESX(i) hosts, vCenter Update Manager is the obvious choice. Updates are automated and the compliance check is built-in.
- Time Synchronization: time synchronization must be maintained between the critical components of every infrastructure. Configure the VMs to sync their time from a PDC or an internal stand-alone NTP server. For more info on timekeeping in VMs, read the “Timekeeping in VMware Virtual Machines” document.
- Snapshot Management: develop and maintain a snapshot policy for the virtual infrastructure to prevent performance issues and storage space overhead. Alternatively, you could make snapshots part of the change management procedure (a small snapshot-age check follows this list).
- CIM & SNMP: in the ESXi version, the CIM component is installed by default. Vendor-specific ESXi builds might provide more information than the standard VMware version. Enabling SNMP can be useful in an infrastructure where an SNMP management application is already running.
- Performance Monitoring & Alarms: the first question that arises is: what to monitor? You can use the organization’s SLAs in combination with the “Performance Best Practices for VMware vSphere” document to determine the monitoring strategy and then configure alarms to meet those requirements.
- Logging: just like with monitoring and alarms, the first questions that arise when configuring logging are what to log and how long the information should be retained. Longer retention means more information for troubleshooting and auditing, but if there are no specific requirements from the organization, keeping the defaults is the best approach. To simplify log management, use a central logging system that takes care of storing and archiving the logs. If there is already a logging server running in the infrastructure, use it; otherwise, use the one provided with the VMware vMA (vilogger).
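As an illustration of what a snapshot policy check could look like, here is a minimal sketch; it assumes the snapshot inventory has already been exported (for example via PowerCLI) into a simple list, and the 7-day limit and the sample data are assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical snapshot policy: no snapshot older than 7 days.
MAX_AGE = timedelta(days=7)

# Hypothetical inventory: (VM name, snapshot name, creation date).
snapshots = [
    ("db01", "pre-patch", datetime(2011, 2, 20)),
    ("web02", "before-upgrade", datetime(2011, 3, 5)),
]

now = datetime(2011, 3, 7)
for vm, snap, created in snapshots:
    age = now - created
    status = "POLICY VIOLATION" if age > MAX_AGE else "OK"
    print(f"{status}: {vm}/{snap} is {age.days} days old")
```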
After three days of an immense load of information about designing a new VMware vSphere 4 infrastructure, it’s time to recap the whole course. In my opinion, the VMware vSphere 4 Design Workshop gives you a clear, step-by-step picture of the design process. The course will not teach you “the best” way to design a new infrastructure or hand you “the best” practices, because there is no such thing. Best practices are merely guidelines. Every design is different and every decision made in a design has its pros and cons. Just make sure you understand why those decisions have been made and that you are able to explain them in your design.

The course is closely aligned with the VMware Plan and Design Kit for vSphere that is available through VMware Partner Central, so I suggest you take a look at that as well. The course is not intended for administrators but for system engineers, (virtual) infrastructure specialists and consultants responsible for designing new virtual infrastructures. The course length is too short in my opinion; one more day would be enough to discuss some of the topics in more depth and share experience among the participants. Overall, it is a good, informative course and I would recommend it to anyone who is interested in designing VMware vSphere 4 infrastructures.
Posted in VCAP-DCD, VMware
Posted by Marek.Z on 28 February 2011
Here is a recap of the second day of the VMware vSphere Design Workshop, which contained the following two modules:
- VMware vSphere Virtual Datacenter design: covering the vCenter Server, database systems (vCenter & VUM), HA/DRS/DPM, clusters and resource pools.
- VMware vSphere Network design: including the network design requirements, the number of networks and why to separate or segment them, (P)VLANs, virtual and distributed switches (incl. the Cisco Nexus 1000V), security, an overview of Fibre Channel over Ethernet (FCoE), physical switch (pSwitch) configuration, VMDirectPath I/O design and limitations, the DNS design and, last but not least, the network naming conventions.
These look interesting, so let’s have a closer look at the sub-topics.
1. vCenter Server and Database Systems:
- OS type for the vCenter Server: obviously, always install a supported OS for your vCenter Server. This will prevent support issues when something goes wrong. Consider using a 64-bit OS for more scalability. Check the “Configuration Maximums for VMware vSphere” document for the supported OSes.
- vCenter Server Hardware: well, this can change between software releases, so check the document mentioned above for the latest specifications.
- vCenter Server name and IP address: for ease of administration and management, use a static configuration.
- Type of vCenter Server (virtual/physical): both installations are supported and, in my opinion, the virtual installation is preferable because it provides more flexibility. A virtual vCenter Server should also be placed in an HA cluster to prevent it from becoming a single point of failure in the infrastructure.
- OS type for the vCenter Database Server: just like for the vCenter Server, always use a supported OS for the database server. The same goes for the database: use only a supported database type and version.
- vCenter Database Server name and IP address: just like for the vCenter Server, use a static configuration.
- Type of vCenter Database Server (virtual/physical): you can use one of the existing database servers in the infrastructure to save costs if it is already highly available. Also, consider migrating a physical database server to a virtual one to gain the advantages of a VM.
- vCenter Server Database location: for a small infrastructure with few ESX(i) hosts and VMs, you could host the database on the vCenter Server itself. In all other cases, use a separate machine (see the sketch below).
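A small decision sketch for the database location; the 5-host/50-VM thresholds reflect the commonly cited limits for the bundled SQL Server Express option in vSphere 4, so treat them as assumptions and verify against the current VMware documentation.

```python
# Decide whether hosting the vCenter database on the vCenter Server itself
# (bundled SQL Server Express) is acceptable for a given environment size.
# Thresholds are assumptions based on commonly cited vSphere 4 limits.

def local_db_acceptable(num_hosts, num_vms, max_hosts=5, max_vms=50):
    return num_hosts <= max_hosts and num_vms <= max_vms

if local_db_acceptable(num_hosts=4, num_vms=35):
    print("A database local to the vCenter Server is acceptable")
else:
    print("Use a separate, dedicated database server")
```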
2. Clusters and Resource Pools:
- Scaling the infrastructure: when deciding between fewer, more powerful hosts and more, less powerful hosts, evaluate the pros and cons of both options.
- Mixed Clusters: you can use mixed clusters (with ESX 3.x and ESX 4.x hosts) during an upgrade to the latest version, but you should upgrade the hosts first and then the VMs.
- VMware HA failover policy: there are three options you can choose from when configuring the failover policy of an HA cluster: “Host Failures Cluster Tolerates”, “Percentage of Cluster Resources Reserved” and “Specify a Failover Host”. I’m not going to explain each option here; read the VMware vSphere 4.1 HA and DRS Technical Deepdive book by Frank Denneman and Duncan Epping instead (a small admission-control calculation follows this list).
- VMware HA Network Redundancy: network redundancy is crucial in an HA cluster. Make sure a host does not become isolated by configuring a second management network on a separate physical network, or use redundant NICs on a single network. Use the das.isolationaddress parameter in the advanced options of HA to eliminate the single point of failure of the isolation test address. When you configure this option, you should also increase the das.failuredetectiontime parameter to 20,000 milliseconds or more. Again, this topic is extensively discussed in Duncan and Frank’s book.
- Isolation Response: the default setting in vSphere 4 is “Power off VM”. Do not change the default unless the management network is not redundant and its availability cannot be guaranteed, while the virtual machine and storage networks are separated from the management network and are far more reliable.
- Blade Servers in a VMware HA cluster: keep in mind that the first five hosts added to an HA cluster automatically become primary nodes. In a blade environment, the blade chassis is a single point of failure, so the cluster’s blade servers should be distributed across multiple chassis and no single chassis should contain more than four primary nodes.
- VMware Fault Tolerance: you can protect the most critical VMs in the infrastructure with FT. Keep in mind that many virtualization benefits like snapshots, DRS and Storage vMotion will not be available for an FT-protected machine. FT is only applicable to single-vCPU VMs and requires a dedicated 1 Gb network per host for FT logging, which increases the cost per host.
- VMware DRS Cluster design: as a rule of thumb, you should always configure DRS in the cluster. DRS automatically load balances the workloads, which leads to better performance, scalability and manageability. Enable Enhanced vMotion Compatibility (EVC) mode; this will enable you to add newer hosts in the future. From the cluster scaling perspective, DRS benefits from more hosts per cluster rather than fewer. Be careful when configuring DRS affinity and anti-affinity rules: too many rules limit the migration choices and could have a negative effect on the workload balance.
- Multiple HA & DRS Clusters: the VMs/hosts-per-cluster maximums are one of many reasons to deploy multiple clusters in the infrastructure. It is advisable to create separate clusters for server-based and VDI-based workloads. Other aspects like security (a dedicated DMZ cluster) and production vs. non-production environments are also valid arguments for a multiple-cluster design.
- Resource Pools: you can deploy resource pools for workloads that require dedicated and isolated resources, or if you want to distribute system resources among departments or project teams. Keep in mind that resource pools create an additional layer to monitor and that the share settings on a resource pool are not inherited by the VMs.
- VMware DPM design: using DPM in environments where the workload is not constant will save power and cooling costs. If your design includes DPM, make sure the hosts support iLO or IPMI technology.
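To illustrate the percentage-based admission control option mentioned under the HA failover policy, here is a quick calculation sketch; it assumes hosts of equal size, and the cluster sizes are arbitrary examples.

```python
# With uniformly sized hosts, the "Percentage of Cluster Resources Reserved"
# admission control policy roughly corresponds to:
#   reserved % = tolerated host failures / number of hosts

def failover_reserve_pct(num_hosts, host_failures_to_tolerate=1):
    return 100.0 * host_failures_to_tolerate / num_hosts

for hosts in (4, 8, 16):
    pct = failover_reserve_pct(hosts)
    print(f"{hosts}-host cluster, tolerate 1 failure: reserve ~{pct:.0f}% of resources")
```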
3. VMware vSphere Network Design:
- Network Design Guidelines & Requirements: when designing a new virtual network infrastructure, try to meet goals like reducing costs, boosting performance, improving security and availability, and providing more flexibility and manageability. In other words, the new design should be more efficient and modular.
- Number of Networks: the number of separate networks depends on the type of traffic required in the new environment. Consider placing the virtual machine network, FT, iSCSI/NFS, HA, vMotion and the management network on separate networks. This reduces congestion and latency and improves performance.
- Network Segmentation and VLANs: there are several reasons why you should use VLANs. In most cases, this applies when there are more networks than physical NICs available (see the uplink check after this list).
- Private Virtual LANs (PVLANs): private VLANs provide layer 2 network isolation between VMs on the same subnet. A common use of PVLANs is a DMZ network.
- Standard switches & distributed switches: dvSwitches are the right choice when your design includes the Enterprise Plus license; otherwise you’ll have to design the network with standard vSwitches.
- Cisco Nexus 1000V switch: the Nexus 1000V switch from Cisco is a distributed virtual software switch and behaves like a normal physical switch in the vSphere environment.
- Number of virtual switches: keep the number of switches as low as possible; this simplifies configuration, administration and monitoring.
- NIC Teams, Load Distribution and Availability: team the physical NICs to increase network bandwidth, availability and redundancy. To avoid a single point of failure, never team two or more ports on the same NIC, and spread the ports across multiple physical switches.
- NIC Team Failure Detection: as a failure detection mechanism, you could enable beacon probing on NIC teams that consist of three or more uplinks and are connected to multiple physical switches. Make sure the physical switches don’t block the beacon probing packets.
- Virtual Switch Security: as a best practice, set “Forged Transmits”, “MAC Address Changes” and “Promiscuous Mode” to “Reject”. Enable these settings only when there is a valid reason to do so; for example, you must allow “MAC Address Changes” when you plan to use Microsoft NLB.
- Physical NIC features: when purchasing new physical NICs, make sure they support checksum offload, TCP segmentation offload, 64-bit direct memory access, jumbo frames, etc. These features help increase network performance and reduce the CPU load on the ESX(i) host.
- Fibre Channel over Ethernet: in large network infrastructures, consider using FCoE to consolidate I/O and reduce the number of host adapters and even physical switches. This also saves on power, heat, cabling, etc. Make sure the switches used in the new environment are FCoE capable.
- Physical Switch configuration: always use redundant physical switches. Turn off STP if possible, or enable “PortFast” mode on the ports connected to the ESX(i) hosts; this prevents VM timeouts during a network outage. Consider using jumbo frames to improve throughput on NFS/iSCSI networks. Do not forget to configure the whole stack for jumbo frames (NICs, switches, SAN, etc.) and benchmark your setup to determine whether there are actual performance gains.
- VMDirectPath I/O design: this feature should not be used unless there is a clear and valid reason to do so. Features like vMotion, DRS and HA are lost when using VMDirectPath I/O.
- DNS and Network Naming Conventions: to simplify the administration and management of the vSphere infrastructure, all key components like the ESX(i) hosts, vCenter Server and database server should be registered in DNS. Choose simple, clear and easy-to-remember names for the infrastructure components. This eases administration and will probably save time when troubleshooting.
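To illustrate the “more networks than physical NICs” situation that drives the VLAN decision above, here is a tiny sketch; the traffic types and the six-NIC host are assumptions.

```python
# Check whether every traffic type can get its own redundant pair of uplinks,
# or whether 802.1Q VLAN trunking is needed. All figures are illustrative.

required_networks = ["management", "vMotion", "FT logging", "iSCSI/NFS", "VM traffic"]
physical_nics = 6  # hypothetical host with six 1 GbE ports

# Redundancy means at least two uplinks per traffic type when each type
# gets dedicated physical NICs.
nics_needed = 2 * len(required_networks)

if nics_needed > physical_nics:
    print(f"{nics_needed} NICs needed for dedicated, redundant networks "
          f"but only {physical_nics} available: trunk VLANs over shared uplinks")
else:
    print("Each traffic type can get its own redundant pair of NICs")
```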
OK, this was day two. Again, a lot of interesting and useful stuff. Believe me, there is a lot more, but the topics described above give you a quick and clear overview of the material this course covers. Next up: the final day, day 3. Stay tuned!
Posted in VCAP-DCD, VMware
Posted by Marek.Z on 21 February 2011
The upcoming posts will describe my experience of the VMware vSphere Design Workshop that I attended at Xpert Training Group (XTG) in Gouda. The instructor was Marcel van Os. Today: day one of this three-day course.
After some coffee and a quick introduction of the students, we jumped into the course material, which consisted of the following topics:
1. Course introduction and the modules overview.
2. Design Process Overview: this module describes the design methodology, criteria and approach. It covers the following phases of the design methodology: architectural vision (scope, goals, requirements, assumptions, constraints, risks), architectural analysis (a current-state analysis of the existing infrastructure) and technology architecture (creating a conceptual, logical and physical design). The design criteria should include usability (performance, availability and scalability), manageability (easy to deploy, administer, maintain and update/upgrade), security (minimize risks) and costs (meet the needs and objectives, fit within the budget). At the end of this module there is an example of the design process; it guides you through the process and, as a final step, describes which documents will be created (required and optional).
3. Host design module: this module concentrates on making the host design decisions based on the gathered information. This concerns the following:
- CPU capacity and number of hosts: one of many methods to gather CPU usage data is the VMware Capacity Planner. You could also use the OS’s built-in CPU monitoring utility, like Perfmon on Windows operating systems. Do not saturate the CPU with workloads; leave some spare room for future growth, the ESX Service Console, the VMkernel, VMware HA, etc. (a rough capacity sketch covering CPU and memory follows this list).
- Number of CPU cores per host: besides any preference for a certain model, keep in mind that the number of physical CPU cores should exceed the number of vCPUs in the largest virtual machine. More on this topic can be found in the “VMware vSphere CPU Scheduler in VMware ESX” document.
- Number of vCPUs per core: the number of vCPUs per processor core is mostly determined by the VMware supported maximum. Check the “Configuration Maximums for VMware vSphere” document for detailed information, because the supported maximums can change with a new software release.
- CPU features: first of all, the processor must be 64-bit; 32-bit systems are no longer applicable. Check whether the chosen processor supports VT-x and EPT from Intel or AMD-V and RVI from AMD.
- Licensing: the licensing for CPUs concerns two major aspects: the chosen license must support the total number of CPU sockets and the total number of cores per socket.
- Memory capacity and number of hosts: just like in the CPU capacity sub-topic described above, you could use the VMware Capacity Planner or an OS built-in utility to measure the memory usage of the systems. This gives you an overview of the current memory capacity, and from there you can calculate the memory capacity needed for the new environment. Keep some spare room for future growth, VM and Service Console overhead, VMware HA, etc. (see the capacity sketch after this list).
- Memory overcommitment: overcommitting the total physical memory of a host could save costs, but keep in mind that if the active memory is overcommitted, the host will force the VMs to use memory reclamation techniques to compensate for the lack of physical memory. This could lead to performance issues.
- Service Console memory: set the Service Console memory on an ESX host to its maximum (800 MB) to prevent it from running out of memory and swapping. Another option is to deploy the ESXi version, which does not require configuring Service Console memory.
- Service Console swap partition: for ESX hosts, configure a 1600 MB partition for the Service Console swap, or use the ESXi version to eliminate this.
- Non-Uniform Memory Access (NUMA) and settings in the host BIOS: for NUMA-capable hosts like Intel Nehalem or AMD Opteron systems, distribute the physical memory evenly between the CPUs. VMware ESX(i) is NUMA-aware, so you can disable the “Node Interleaving” feature in the BIOS.
- Host hardware type (rack mount vs. blade): compare hardware aspects such as scalability, modularity, flexibility, costs, etc. and base the decision on those criteria.
- ESX software type (ESX/ESXi): in the future, the ESX version will be replaced by ESXi. ESXi is the way to go unless there are hardware or software constraints like compatibility issues or management/backup software agents.
- Host BIOS settings: for best performance, enable the Intel VT-x & EPT or AMD-V & RVI settings in the BIOS. Enable Hyperthreading and Turbo Mode if the processor supports them. Disable unnecessary devices like serial, parallel and USB ports. Basically, turn off what you don’t need.
- PCI slot design for networking: consistent PCI slot assignment across all hosts is crucial if you plan to use automated installations or Host Profiles. It is easier to administer and simplifies troubleshooting.
- Power and cooling design: make sure that the datacenter where the new infrastructure will run has sufficient power and cooling capacity and that all active components have redundant power supplies. This will prevent unnecessary downtime.
- Host naming convention and IP addresses: to ease management and administration, use easy-to-understand host names and assign static IP addresses to the hosts. Register the host names and IP addresses in DNS.
- Host security: from the security perspective, the best choice is the ESXi version. It has a smaller attack surface and also offers a lockdown mode. The other thing to consider is the firewall: ESX has a built-in firewall, while ESXi requires an external firewall. Whichever version you choose, manage it centrally through vCenter Server. Limit the number of users that have root/administrator access, and limit the number of 3rd-party agents running in the Service Console or, if possible, move them to the vMA appliance.
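To round off the CPU and memory capacity sub-topics, here is a back-of-the-envelope host count sketch; every figure in it (peak workload, headroom, host specifications) is an illustrative assumption, not data from the course or Capacity Planner.

```python
import math

# Measured peak workload across all candidate VMs (hypothetical figures).
peak_cpu_mhz = 95_000        # summed peak CPU usage
peak_mem_gb = 420            # summed peak memory usage
growth_headroom = 0.20       # 20% spare room for future growth

# Hypothetical host model: 2 sockets x 4 cores x 2.666 GHz, 72 GB RAM.
host_cpu_mhz = 2 * 4 * 2666
host_mem_gb = 72
usable_ratio = 0.85          # leave ~15% per host for VMkernel/Service Console/HA

hosts_for_cpu = math.ceil(peak_cpu_mhz * (1 + growth_headroom) /
                          (host_cpu_mhz * usable_ratio))
hosts_for_mem = math.ceil(peak_mem_gb * (1 + growth_headroom) /
                          (host_mem_gb * usable_ratio))

print(f"Hosts needed for CPU:    {hosts_for_cpu}")
print(f"Hosts needed for memory: {hosts_for_mem}")
print(f"Cluster size (pick max): {max(hosts_for_cpu, hosts_for_mem)}")
```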
OK, that was just day one. A lot of info to digest, but it all makes sense and is interesting and good to know. The next blog post will cover day 2 of the course. Stay tuned!
Posted in VCAP-DCD, VMware