Next Gen Computing @nextgencomputing-blog - Tumblr Blog

A great video about how it is to work at Dell.

#iwork4dell

iSCSI Multipathing: Essential to ensure high availability and bandwidth in your iSCSI SAN

iSCSI connectivity is critical when used for SAN in a datacenter. High availability and guaranteed bandwidth are essential for meeting the desired up-time and performance requirements. Different multipathing schemes are used for iSCSI connections to ensure both. Multipathing, in simple terms, is the use of redundant physical components, such as adapters, cables and switches, to create redundant logical iSCSI paths between an iSCSI initiator and an iSCSI target. Having multiple paths ensures high availability in case of a failure and can provide higher bandwidth. There are three main types of multipathing for iSCSI.

Link Aggregation

Active/Standby Multipathing

Active/Active multipathing

Link Aggregation: IEEE 802.3ad Link Aggregation Control Protocol (LACP) and EtherChannel

Link Aggregation refers to the concept of aggregating two or more iSCSI links into a single link. IEEE standard 802.3ad and EtherChannel each independently specify a method for aggregating multiple Ethernet links to form a single network link. The packet traffic between an initiator and a target is distributed among the aggregated links using a hash function based on a variety of parameters, including source and destination MAC addresses, source and destination IP addresses, or other header fields. The paths are generally converged at the NIC driver layer or the operating system interface layer. A single IP address is assigned to the set of links on both the initiator and the target.

The IEEE 802.3ad standard does not require any particular distribution algorithm, but the distribution algorithm used needs to meet the following criteria.

The algorithm will not

a) Mis-order frames that are part of any given conversation, or

b) Duplicate frames.

These conditions are met by ensuring that all frames that compose a given conversation are transmitted on a single link in the order that they are generated. This means that the traffic might not be uniformly distributed across the aggregated links. Generally, there are enough conversations to, more or less, distribute the traffic between the aggregated links but if the traffic is composed of only one conversation, i.e. one initiator/destination, then at any given time, only one of the aggregated links will be used. In such a scenario, the effective bandwidth will be equivalent to the bandwidth of a single port. If this is a 10Gb environment and if only iSCSI traffic is passing through this port, this may be OK. Regardless, this will provide redundancy and will make the iSCSI link highly available. This is an important concept: Link Aggregation does not always mean a fatter pipe.
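The effect described above can be sketched in a few lines of Python. This is only an illustration, not the distribution algorithm of any particular NIC or switch: the `pick_link` function, the use of CRC32 as the hash and the MAC addresses are all assumptions made for the example.

```python
import zlib
from collections import Counter


def pick_link(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Map a conversation (source, destination) to one link.

    The hash is deterministic, so every frame of a given conversation
    lands on the same link and cannot be mis-ordered across links.
    """
    key = f"{src_mac}->{dst_mac}".encode()
    return zlib.crc32(key) % num_links


def distribute(frames, num_links):
    """Count how many frames land on each aggregated link."""
    return Counter(pick_link(src, dst, num_links) for src, dst in frames)


# One initiator talking to one target: a single conversation.
single = [("00:11:22:33:44:01", "00:aa:bb:cc:dd:01")] * 1000
single_usage = distribute(single, num_links=2)

# Many initiators talking to one target: many conversations.
many = [(f"00:11:22:33:44:{i:02x}", "00:aa:bb:cc:dd:01") for i in range(64)]
many_usage = distribute(many, num_links=2)

print("single conversation:", dict(single_usage))
print("many conversations: ", dict(many_usage))
```

With a single conversation, all 1000 frames are counted against one link no matter how many links are in the bundle. With many conversations the counts generally spread out, but how evenly depends entirely on the hash and the addresses involved.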

Using a round-robin algorithm instead of a MAC or IP hash would correct this problem, but it does not meet the criteria of the 802.3ad standard, because frames of a single conversation would be spread across links and could arrive out of order. EtherChannel and other nonstandard mechanisms do allow for the use of round-robin distribution. Using EtherChannel would provide the opportunity to get rid of this issue, but not all NIC and switch hardware supports EtherChannel.

As evident, the end-user performance and redundancy requirements, available hardware and other factors need to be carefully considered before deciding to use Link Aggregation.

Active/Standby Multipathing:

Active/Standby multipathing creates high availability but does not increase the bandwidth of the solution. With Active/Standby multipathing, the iSCSI initiator initiates one TCP connection per port to the iSCSI target. That means, for two NIC ports, two TCP connections will be established. At any given time, one of these TCP connections is the primary, active connection and the others are on standby. Data flows down the active connection until it fails. In case of such a failure, traffic is diverted to a standby connection. Since the standby connection is already fully established, failover is very fast.
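A minimal sketch of this behavior, assuming a hypothetical `ActiveStandbySession` class that only tracks path health and does not open real TCP connections:

```python
class ActiveStandbySession:
    """Toy model: all paths are established, only one carries data."""

    def __init__(self, paths):
        self.order = list(paths)
        self.healthy = {p: True for p in self.order}
        self.active = self.order[0]  # primary connection

    def fail(self, path):
        """Mark a path as failed; fail over if it was the active one."""
        self.healthy[path] = False
        if path == self.active:
            self._failover()

    def _failover(self):
        # Standby connections already exist, so failover is just a switch
        # to the next healthy path rather than a new login.
        for p in self.order:
            if self.healthy[p]:
                self.active = p
                return
        self.active = None

    def send(self, block):
        if self.active is None:
            raise ConnectionError("no healthy path to target")
        return (self.active, block)


session = ActiveStandbySession(["nic0", "nic1"])
print(session.send("block-0"))  # carried by nic0
session.fail("nic0")
print(session.send("block-1"))  # carried by nic1 after failover
```

Note that at no point do both paths carry data, which is why this scheme adds availability but not bandwidth.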

The choice between Active/Standby and Active/Active multipathing also depends upon the iSCSI SAN storage array. Not all array controllers support both Active/Active and Active/Standby. Some support only one of the two and some support both.

Also, it is important to note that Link Aggregation can be used with Active/Active or Active/Standby multipathing. As mentioned before, Link Aggregation creates a single link by aggregating multiple links. These "aggregated single links" can be multipathed in Active/Active or Active/Standby mode. This is especially useful when some aspect of the design dictates the use of Active/Standby multipathing. In this scenario, multipathing is not going to increase your bandwidth, but Link Aggregation can be used to improve total bandwidth by setting up Active/Standby multipathing between two aggregated links of two ports each. For example, if you need 20Gb of storage bandwidth with Active/Standby multipathing, you would need 4x 10Gb ports in two bundles of two to achieve this.
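The arithmetic behind that example can be written out as a small helper. The function name and parameters are made up for illustration, and the result is a best-case figure: as discussed in the Link Aggregation section, a hash-based bundle carrying a single conversation may only deliver the bandwidth of one port.

```python
def usable_bandwidth_gb(ports_per_bundle, port_speed_gb, bundles, mode):
    """Best-case usable bandwidth for multipathing over aggregated bundles."""
    bundle_bw = ports_per_bundle * port_speed_gb
    if mode == "active/standby":
        # Only one bundle carries traffic at a time.
        return bundle_bw
    if mode == "active/active":
        # All bundles can carry traffic at the same time.
        return bundle_bw * bundles
    raise ValueError(f"unknown mode: {mode}")


# 4x 10Gb ports in two bundles of two, Active/Standby between the bundles.
print(usable_bandwidth_gb(2, 10, 2, "active/standby"))  # 20
```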

Active/Active Multipathing:

Active/Active multipathing increases the available bandwidth between an iSCSI initiator and target. In Active/Active multipathing, the bandwidth of both adapters can be used. This is, of course, dependent upon the algorithm used for load balancing. If the load balancing algorithm performs hashing of IP or MAC addresses, then both links may not be in use simultaneously all the time. If round-robin or a similar algorithm is used, efficient simultaneous use of both adapters is possible.

Active/Active multipathing is performed by using a software agent (a multipathing driver) in the software stack above the SCSI layer. This multipathing module presents the two iSCSI initiators as a single path or link to the disk driver. As you can imagine, this is only possible when both of the iSCSI initiators are connected to the same end device (a LUN or datastore). This is ensured by using SCSI inquiry commands to check for commonality in device paths, usually the LUN serial number.
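As a rough sketch of what such a multipathing module does, the following Python groups discovered paths by the serial number they report and then spreads I/O over the paths of one device in round-robin fashion. The path names, serial numbers and the `RoundRobinDevice` class are hypothetical; a real driver would learn the serial numbers from SCSI inquiry responses rather than from a hard-coded list.

```python
from collections import defaultdict
from itertools import cycle


def group_paths_by_serial(discovered):
    """Group (path_name, lun_serial) pairs into one entry per LUN."""
    groups = defaultdict(list)
    for path, serial in discovered:
        groups[serial].append(path)
    return dict(groups)


class RoundRobinDevice:
    """Single logical device presented to the disk driver."""

    def __init__(self, serial, paths):
        self.serial = serial
        self._next_path = cycle(paths)

    def submit(self, io):
        # Each I/O goes down the next path in turn.
        return (next(self._next_path), io)


discovered = [
    ("nic0:lun0", "SN-A"),
    ("nic1:lun0", "SN-A"),  # same serial: second path to the same LUN
    ("nic0:lun1", "SN-B"),
]
groups = group_paths_by_serial(discovered)
device = RoundRobinDevice("SN-A", groups["SN-A"])
print([device.submit(f"io-{i}")[0] for i in range(4)])
```

Paths that report different serial numbers end up in different groups, so they are never merged into one logical device.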

As mentioned before, Link Aggregation can also be used with Active/Active multipathing, although it may not provide any significant benefits.

Summary:

Active/Active multipathing is a great way to ensure redundancy on the iSCSI connection and also to utilize the bandwidth of all available adapters used to create the iSCSI connection. It may not always be possible to use Active/Active multipathing because of storage array compatibility, iSCSI adapter support, OS/hypervisor support or cost. When Active/Active multipathing is not possible, Active/Standby, alone or along with Link Aggregation, also provides a good way of ensuring high availability and bandwidth for the iSCSI connection.

#iSCSI #Multipathing #HBA #NIC #Active-Active #Active-Standby #Active-Passive #Link Aggregation #LACP #IEEE 802.3ad #Round Robin #Datacenter #Data Center #High Availability #Redundancy #SAN

DDR4 is coming!

The DDR4 memory standard, the next generation of the memory technology standard, is expected to be available from JEDEC (the Joint Electron Device Engineering Council) in mid-2012, so some time pretty soon.

DDR4 standard development:

JEDEC announced the key attributes of DDR4 memory in August 2011. According to the DDR4 information on the JEDEC website, VDDQ will be held constant at 1.2V, while allowing for a future reduction in the VDD supply voltage. The per-pin data rates, over time, will range from 1.6 GT/s to an initial maximum objective of 3.2 GT/s (giga transfers per second). With DDR3 exceeding its expected peak of 1.6 GT/s, JEDEC expects that higher performance levels will be proposed for DDR4 in the future. A geardown mode for 2667 MHz data rates and beyond is also expected in DDR4.
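To put the per-pin data rates in perspective, here is a quick back-of-the-envelope conversion to peak module bandwidth. It assumes a 64-bit wide module data bus (the usual width for a non-ECC DIMM, and an assumption not taken from the JEDEC announcement) and ignores all protocol overhead, so the numbers are theoretical peaks only.

```python
def peak_bandwidth_gb_s(data_rate_gt_s, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s.

    data_rate_gt_s: per-pin transfer rate in giga transfers per second.
    bus_width_bits: assumed width of the module data bus.
    """
    return data_rate_gt_s * bus_width_bits / 8


for rate in (1.6, 3.2):
    print(f"{rate} GT/s -> {peak_bandwidth_gb_s(rate):.1f} GB/s")
```

Under these assumptions, 1.6 GT/s works out to 12.8 GB/s and 3.2 GT/s to 25.6 GB/s per module.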

Here are some of the other features and attributes that JEDEC has mentioned on its website as expected in the DDR4 standard.

Three data width offerings: x4, x8 and x16

New JEDEC POD12 interface standard for DDR4 (1.2V)

Differential signaling for the clock and strobes

New termination scheme versus prior DDR versions: In DDR4, the DQ bus shifts termination to VDDQ, which should remain stable even if the VDD voltage is reduced over time.

Nominal and dynamic ODT: Improvements to the ODT protocol and a new Park Mode allow for a nominal termination and dynamic write termination without having to drive the ODT pin

Burst length of 8 and burst chop of 4

Data masking

DBI: to help reduce power consumption and improve data signal integrity, this feature informs the DRAM as to whether the true or inverted data should be stored

New CRC for data bus: Enabling error detection capability for data transfers – especially beneficial during write operations and in non-ECC memory applications.

New CA parity for command/address bus: Providing a low-cost method of verifying the integrity of command and address transfers over a link, for all operations.

DLL off mode supported
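The DBI (data bus inversion) item in the list above can be illustrated with a small sketch. It assumes the commonly described per-byte rule, in which a byte is sent inverted when more than four of its bits are 0, so that at most four lines are driven low. That rule is an assumption made for this example and should be checked against the final JEDEC specification.

```python
def dbi_encode(byte):
    """Return (wire_byte, inverted_flag) for one byte lane."""
    zeros = 8 - bin(byte & 0xFF).count("1")
    if zeros > 4:
        # Too many low bits: send the inverted byte and flag it.
        return (~byte & 0xFF, True)
    return (byte & 0xFF, False)


def dbi_decode(wire_byte, inverted):
    """Recover the original byte from the wire byte and the flag."""
    return (~wire_byte & 0xFF) if inverted else wire_byte


for value in (0x00, 0x0F, 0x01, 0xFF):
    wire, flag = dbi_encode(value)
    print(f"data={value:#04x} wire={wire:#04x} inverted={flag}")
```

The receiving side uses the flag to decide whether the true or the inverted data should be stored, which matches the description in the list.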

DDR4 product availability:

In July 2012, Samsung announced that it had begun sampling the industry's first 16GB DDR4 RDIMMs for enterprise class server systems. (Samsung had developed its first DDR4 module much earlier.) Samsung said that it had created 8GB and 16GB DDR4 DIMMs using 30nm-class technology operating at 1.2V, which it expected to result in an approximately 40% reduction in power consumption over DDR3 DIMMs operating at 1.35V. Samsung also said in the announcement that 32GB DDR4 memory manufactured using 20nm-class technology will be available sometime next year.
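As a rough sanity check on the voltage part of that claim, the sketch below applies the textbook approximation that dynamic power scales with the square of the supply voltage when frequency and capacitance stay the same. That scaling rule is an assumption of this example, not something stated in Samsung's announcement.

```python
def voltage_only_power_reduction(v_old, v_new):
    """Fractional dynamic power reduction from a voltage drop alone,
    assuming power is proportional to voltage squared."""
    return 1 - (v_new / v_old) ** 2


reduction = voltage_only_power_reduction(1.35, 1.2)
print(f"{reduction:.1%}")
```

Under this assumption the move from 1.35V to 1.2V accounts for roughly a 21% reduction, so the rest of the expected 40% would have to come from other changes in the design.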

In May 2012, Micron announced its first fully functional DDR4 module, co-developed with Nanya and based on 30nm technology. The Micron website provides a great summary of DDR4 features and their comparison with DDR3.

Source: http://www.micron.com

Although DDR4 DIMMs will be available, we won't be able to use them until a processor is available that supports DDR4 memory. Currently, no such processor is available in the market. Enterprise customers will have to wait until Intel and AMD come out with processors that support DDR4.

#DDR4 #DRAM #DDR3 #JEDEC #Samsung #Nanya #Micron #16GB #8GB #30nm #20nm #DDR3 VS. DDR4 #Comparison

Hardware VS. Software iSCSI Initiators: The choice matters.

When deploying an iSCSI based network, especially a large iSCSI SAN in a datacenter, the choice of iSCSI initiator is critical and has a great impact on the choice of hardware, operating systems and your overall network architecture.

Let's start with a quick overview of iSCSI SAN. An iSCSI SAN is basically a network that allows one or more servers (or any other computing entity) to access shared storage using the iSCSI protocol.

What is an iSCSI initiator?:

An iSCSI Storage Area Network comprises an iSCSI initiator and an iSCSI target. The initiator is generally a server or some other form of computing entity that needs access to storage, and a target is generally a storage entity connected to the IP network, like a storage array with a RAID controller.

The diagram above conceptually describes iSCSI SAN with an initiator and a target.

Type of iSCSI initiators:

Typically, there are three different types of iSCSI initiators.

1. Software iSCSI initiator with traditional NIC:

The iSCSI initiator is implemented in the OS through iSCSI driver and a traditional NIC (Network Interface Card) is used for network connectivity. Traditional NICs, which are basically Ethernet adapters, can only transfer data in the form of TCP/IP packets. On the other hand, the data generated by the server or PC is block level data. This block level data needs to be converted into TCP/IP packets before the NIC can transmit that data. The server handles the packet creation of block level data and performs all of the TCP/IP processing using the iSCSI driver in the OS.

Benefit: Traditional NICs can be used. Expensive iSCSI HBAs are not needed.

Drawback: TCP/IP processing takes CPU cycles and impacts CPU performance.

2. Software iSCSI initiator with TOE capable NIC:

A NIC with TOE (TCP/IP Offload Engine) can perform TCP/IP processing and relieves the server of that task. The software iSCSI initiator still handles the iSCSI connection. Thus a TCP/IP offload storage NIC operates more like a storage HBA than a standard NIC.

Benefit: A traditional NIC cannot be used but the upgraded NIC hardware generally does not cost as much as an iSCSI HBA. At the same time, this results in performance improvement over iSCSI with traditional NICs.

Drawback: There is still some performance impact on the host server.

3. iSCSI HBA:

An iSCSI HBA takes the data in block form, performs processing on the adapter card with TCP/IP processing engines, and then sends the IP packets across an IP network.

Benefit: All packet processing is performed on the HBA hardware and iSCSI connections are also managed by the HBA hardware. This provides the highest performance for the host servers among the three options.

Drawback: iSCSI HBAs are generally more expensive than traditional NICs. Also, if you want to introduce iSCSI SAN to an infrastructure with multiple servers connected to an IP network, the existing NICs cannot be used. Additional iSCSI HBAs need to be used in each server.
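The three options differ mainly in where the iSCSI and TCP/IP processing happens. The following sketch captures that as a small data structure. The class and field names are made up for this illustration, and the "host" / "adapter" labels simply restate the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InitiatorType:
    name: str
    iscsi_processing: str  # "host" or "adapter"
    tcpip_processing: str  # "host" or "adapter"


INITIATOR_TYPES = [
    InitiatorType("Software initiator with traditional NIC", "host", "host"),
    InitiatorType("Software initiator with TOE capable NIC", "host", "adapter"),
    InitiatorType("iSCSI HBA", "adapter", "adapter"),
]


def host_cpu_tasks(initiator):
    """List the protocol layers that still consume host CPU cycles."""
    layers = (
        ("iSCSI", initiator.iscsi_processing),
        ("TCP/IP", initiator.tcpip_processing),
    )
    return [layer for layer, where in layers if where == "host"]


for t in INITIATOR_TYPES:
    print(f"{t.name}: host handles {host_cpu_tasks(t) or 'nothing'}")
```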

How to make the right choice?

Every environment has different needs and different considerations. There are no hard rules about when to use which type of iSCSI initiator. The choice depends upon your budget, management requirements, legacy hardware, performance requirements and available skill-set.

A software iSCSI initiator is the cheapest option and, in most cases, the simplest to implement and manage. Software iSCSI initiators will surely impact the performance of the server. These days, when high-speed, multi-core CPUs are common in servers, not all servers are run at maximum CPU capacity. In these scenarios, dedicating CPU resources to iSCSI operation for a software iSCSI initiator may not be an issue. If you have a legacy environment with traditional NICs, you can use them with software iSCSI initiators. A lot of new NIC hardware provides TCP/IP offloading capabilities. If you have such a NIC and would like to dedicate most of your CPU resources to your applications, then you can use software iSCSI initiators and offload TCP/IP operations to the NIC. If CPU performance is really critical, all CPU resources need to be assigned to your applications, and budget for iSCSI HBAs is available with an understanding of their management requirements, iSCSI HBAs can be used.

#Driver #HBA #IP #NIC #Offload #SAN #SCSI #Software #TCP #VMware #Virtualization #hardware #iSCSI #iSCSI initiator

Stretched Storage Cluster with VMware vMSC (vSphere Metro Storage Cluster): Great solution for Remote/Branch Office Virtualization

A stretched storage cluster is a storage cluster that is distributed across two or more geographical locations. It is suitable for geographically separate datacenters with high speed connectivity between them and with stringent disaster/downtime avoidance requirements. A stretched storage cluster is an important solution type and architectural concept, especially now when a lot of customers are utilizing virtualization in ROBO (Remote Office / Branch Office) type environments. VMware vSphere Metro Storage Cluster is a stretched storage cluster with VMware vSphere.

The diagram below shows the concept of the vMSC architecture.

Source: http://www.vmware.com/files/pdf/techpaper/vSPHR-CS-MTRO-STOR-CLSTR-USLET-102-HI-RES.pdf

When to use stretched or metro clusters:

If you need to often move workloads between two or more sites non-disruptively.

If one site goes down (network failure, server failure, storage failure or complete site failure at one site), you need to continue running workload from the other site.

The two sites are not very far. The distance will depend on interlink technologies, cluster requirements and storage technology used but generally it should be less than 100 km.

The two sites should be connected through dedicated high speed link and not just regular WAN connection.
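The distance limit in the list above is closely tied to latency. As a rough illustration, the sketch below estimates the round-trip propagation delay over fiber, assuming light travels at about 200,000 km/s in fiber (roughly two thirds of its speed in vacuum). It ignores switching, queuing and storage processing delays, so real round-trip times will be higher.

```python
FIBER_KM_PER_MS = 200.0  # assumed propagation speed: ~200,000 km/s


def round_trip_ms(distance_km):
    """Round-trip propagation delay in milliseconds over fiber."""
    return 2 * distance_km / FIBER_KM_PER_MS


for km in (10, 100, 1000):
    print(f"{km} km -> {round_trip_ms(km):.1f} ms round trip")
```

At 100 km the propagation delay alone is about 1 ms per round trip, and every synchronously mirrored write has to pay at least that much.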

How a backup-restore type disaster recovery solution is different from a stretched or metro cluster:

A disaster recovery solution works at any distance without a need for dedicated high-speed low-latency interlinks between sites.

The disaster recovery solution includes backup and replication at different levels and locations, i.e. at OS or hypervisor level, application level or at block or file level and also locally or at another site. It means that this strategy can be differently determined for different servers and applications. Some applications can be backed up only locally and some high priority critical applications can be backed up at another site. This means that you can customize your disaster recovery strategy to your budget and requirements. You do not have to set up an extensive inter-site network and storage infrastructure, like in the case of a stretched cluster.

A disaster recovery solution does not generally provide true non-disruptive high availability needed to migrate a workload between sites.

In most cases, when a stretched cluster is deployed, a complementary disaster recovery solution is also used.

Design considerations:

There are multiple factors affecting the design of a stretched storage cluster. A major consideration is which workloads should run at which location during normal operation. This can be determined by applying affinity to each workload for specific servers (and so to a site). To correctly design a vSphere vMSC or any other stretched cluster, many scenarios have to be considered: the type of failure (host level, network switch level, storage array level, enclosure level, disk level, etc.), the affinity of different workloads to different datacenters in regular (no-failure) operation, correct HW sizing to allow workload failover across datacenters for different types of failures, and so on.
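The HW sizing question can be sketched as a simple capacity check: for each site that might fail, can the surviving sites carry the total workload? The site names, capacity units and workload figures below are invented for illustration, and a real sizing exercise would look at CPU, memory, storage and network separately.

```python
def can_absorb_failover(sites, workloads):
    """Check, per failed site, whether the surviving sites can run everything.

    sites:     {site_name: capacity}
    workloads: {workload_name: (preferred_site, demand)}
    Returns    {failed_site: True/False}
    """
    total_demand = sum(demand for _site, demand in workloads.values())
    result = {}
    for failed in sites:
        surviving = sum(cap for name, cap in sites.items() if name != failed)
        result[failed] = total_demand <= surviving
    return result


sites = {"site-a": 100, "site-b": 60}
workloads = {
    "db": ("site-a", 40),       # affinity to site-a in normal operation
    "web": ("site-b", 30),      # affinity to site-b in normal operation
}
print(can_absorb_failover(sites, workloads))
```

In this made-up example the total demand is 70 units, so site-a could take over everything if site-b failed, but site-b (60 units) could not take over if site-a failed.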

Further reading:

VMware vSphere Metro Storage Cluster Case Study

Stretched Clusters and VMware vCenter Site Recovery Manager

#Stretched Storage Cluster #Metro Storage Cluster #VMware #vSphere #VMware vMSC #vSphere Storage Metro Cluster #Remote Office #Branch Office #ROBO #Virtualization #High Availability #Disaster Recovery

At the Dell Social Innovation Challenge 2012 award ceremony.

#Dell Social Innovation Challenge #DSIC #Social Entrepreneurship

Network Edge Virtualization: IEEE 802.1 Qbg and IEEE 802.1 BR

One of the biggest changes in the edge network due to virtualization is the VEB (Virtual Ethernet Bridge) within a host for layer-2 switching between VMs on the host.

Issues with current virtual server networking:

Management and policy change:

Network administrators cannot enforce policies on vSwitches within the hypervisor. vSwitches are controlled by server administrators.

Traditional datacenter network monitoring and control tools do not have visibility into the virtual networking within hypervisor.

Security policy cannot be applied to the vSwitches using the same tools.

This results in increased end-point network complexity.

Limited features:

Virtual switches within hypervisor generally do not support all functionalities of a traditional switch. This may sometimes be deliberate so as to keep the performance impact of the virtual switches to minimal.

Performance impact:

Virtual networking within hypervisor uses host CPU cycles and memory resources and impacts performance.

The problem exists because the Ethernet bridge that performs layer-2 switching for VM-to-VM traffic within the same host currently sits inside the host server, in the form of a vSwitch or a virtual Ethernet bridge within the NIC silicon. The obvious solution would be to move this Ethernet bridge out of the host server and into one of the edge switches (ToR or EoR). That way, the traffic between the VMs will also have to go to the edge switches, will be under the visibility and control of the network administrator, and will be subject to the same network policies as the rest of the network. The reason that this has not been implemented so far is that the current Ethernet standard does not allow "hairpin traffic", in which packets exit through the same switch port that they entered. There are two solutions that are currently being pursued to be made into IEEE standards.

Two different solutions:

Edge Virtual Bridging (EVB) - IEEE 802.1 Qbg

Bridge Port Extension - IEEE 802.1 BR

Neither of these standards has been finalized yet; both are expected to be ratified soon.

Edge Virtual Bridging (EVB) - IEEE 802.1 Qbg

IEEE 802.1 Qbg specifies a function in the controlling switch that allows a packet received on a switch port to be forwarded back out of the same port, a behavior called reflective relay or hairpin forwarding.

The VEB, the Ethernet bridge within a host, will forward all frames sourced by the virtual machines to the adjacent controlling switch. The controlling switch will apply various policies on those frames and then will forward them back to the VEB. The VEB will then forward the frame to the appropriate virtual machine based on the MAC address and the VLAN ID.
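The difference reflective relay makes can be shown with a toy forwarding function. This is a simplified model written for illustration only: the `switch_forward` function, the MAC table layout and the port names are assumptions, and real switches also handle VLANs, learning and policy enforcement.

```python
def switch_forward(mac_table, in_port, dst_mac, reflective_relay=False):
    """Decide what an edge switch does with a frame.

    mac_table: {mac_address: port} as learned by the switch.
    Returns the output port, "flood" for unknown destinations,
    or "drop" when the frame would have to go back out its ingress port
    and reflective relay is not enabled.
    """
    out_port = mac_table.get(dst_mac)
    if out_port is None:
        return "flood"
    if out_port == in_port and not reflective_relay:
        # Standard bridge behavior: never send a frame back where it came from.
        return "drop"
    return out_port


# Two VMs on the same host are both reachable through switch port 1.
mac_table = {"vm-a": "port1", "vm-b": "port1", "server-c": "port2"}

print(switch_forward(mac_table, "port1", "vm-b"))                         # drop
print(switch_forward(mac_table, "port1", "vm-b", reflective_relay=True))  # port1
print(switch_forward(mac_table, "port1", "server-c"))                     # port2
```

With reflective relay enabled, VM-to-VM traffic on the same host can make the round trip through the controlling switch instead of being filtered.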

Supporting IEEE 802.1 Qbg requires no hardware change. It will require changes in the VEB (in hypervisor or in NIC) and in the switch firmware to support reflective relay.

The IEEE 802.1 Qbg standard does not specify how a VEB in the hypervisor or NIC uses the relay function. These implementations will be vendor dependent and proprietary.

Bridge Port Extension - IEEE 802.1 BR

The purpose of this standard is to extend a bridge, and the management of its objects, beyond its physical enclosure using 802 LAN technologies and interoperable interfaces. IEEE 802.1 BR defines E-Tag.

E-Tag can be used to identify a virtual or physical interface and provide frame forwarding. With an E-Tag capable NIC or software driver, these interfaces could potentially be individual virtual or physical servers.

Management of large networks is highly complex. This complexity may be reduced by aggregating the more complex bridging functions onto fewer bridges and by collapsing bridge layers from a management perspective.

Implementation of E-Tag based networking will require E-Tag aware switches and port or fabric extending devices. This will require new hardware.

#Network Edge Virtualization #IEEE 802.1 Qbg #IEEE 802.1 BR #VEPA #E-Tag #VN-Tag #Reflective Relay #Port Extension #Bridge Extension

Dilbert on cloud computing.

#Cloud Computing

Virtualization. What? Why? How?

In the last 3-4 years, virtualization has gone from a marketing buzzword to a necessity that every CIO will prioritize. But even today, when I talk to engineers and technologists, not everyone is sure what virtualization is and, more importantly, what it is not. So many times, it is confused with aspects of cloud computing. I think that an attempt to unravel the intricacies of virtualization will be a justified start to a new blog on the next generation computing technologies.

Essentially, virtualization is a collection of technologies that abstracts the physical infrastructure and isolates it from the computing utility that utilizes that infrastructure. In the case of a server or a PC, this means abstracting the CPU, memory and IO devices and allowing more than one OS image to run simultaneously on a single piece of hardware (known as a "physical machine") by virtually dividing the hardware CPU, memory and IO resources among the "virtual machines" running those OS images. In this case, each OS image running on a virtual machine will behave as if it were running on its own hardware with CPU, memory and IO resources (that are, in fact, assigned to it virtually). The diagram on the right describes this concept. Don't worry if you do not know what a "hypervisor" is. We will talk about it below.
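The idea of virtually dividing one physical machine among several virtual machines can be sketched as simple bookkeeping. The `PhysicalMachine` class below is a toy written for this post, not how any real hypervisor works. In particular, it never hands out more than the hardware has, whereas real hypervisors often overcommit CPU and memory.

```python
class PhysicalMachine:
    """Toy model of a host whose resources are divided among VMs."""

    def __init__(self, cpu_cores, memory_gb):
        self.free = {"cpu_cores": cpu_cores, "memory_gb": memory_gb}
        self.vms = {}

    def create_vm(self, name, cpu_cores, memory_gb):
        request = {"cpu_cores": cpu_cores, "memory_gb": memory_gb}
        # Check everything first so a failed request changes nothing.
        for resource, amount in request.items():
            if amount > self.free[resource]:
                raise ValueError(f"not enough {resource} for {name}")
        for resource, amount in request.items():
            self.free[resource] -= amount
        self.vms[name] = request
        return request


host = PhysicalMachine(cpu_cores=16, memory_gb=64)
host.create_vm("vm-linux", cpu_cores=4, memory_gb=16)
host.create_vm("vm-windows", cpu_cores=8, memory_gb=32)
print(host.vms)
print("left on host:", host.free)
```

Each virtual machine sees only its own slice (for example 4 cores and 16 GB) and behaves as if that were the whole machine.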

"Virtualization" generally refers to the concept mentioned above, i.e. "hardware virtualization". Today, there are different technologies also enabling what is called "Network Virtualization", "Storage Virtualization", "memory virtualization", etc. They rely on the basic virtualization technologies as explained below in detail and also some other innovative techniques specific to networking and storage hardware subsystems to achieve certain behavior analogous to "server virtualization" or "hardware virtualization". These topics are out of the scope of this post but I would emphasize that if you can grasp the details of "server virtualization", then you can easily understand the details of "Network Virtualization" and "Storage Virtualization" and the rationale behind it going forward. I promise to deal with them in future posts.

Getting back to server virtualization .......

There are three methods or types of server virtualization:

1. Full Virtualization:

In Full Virtualization, hypervisor abstracts all hardware by capturing all I/O between an OS and the hardware. The guest OS is completely agnostic to the fact that it is running on a hypervisor and not natively on a physical machine. The advantage is that any guest OS can be run on a virtual machine without worrying about support for virtualization in the guest OS. Operating systems designed before virtualization can also be used on virtual machines. The potential disadvantage is that the system performance will degrade because the hypervisor needs to emulate each I/O operation of each virtual machine.

VMware ESXi enables full virtualization.

2. Paravirtualization:

Just like Full Virtualization, a hypervisor is installed on a physical server and a guest OS is installed into the environment. But unlike Full Virtualization, the guest OS knows that it is operating in a virtualized environment. The guest operating systems require extensions to make API calls to the hypervisor. Not every operating system or application can support paravirtualization. This form of virtualization generally performs more efficiently and faster than Full Virtualization, at least theoretically.

3. Hardware Assisted Virtualization:

Hardware Assisted Virtualization enables Full Virtualization by using hardware capabilities. Hardware-assisted virtualization relies on hardware extensions to the x86 system architecture to eliminate much of the hypervisor overhead associated with trapping and emulating I/O operations and status instructions executed within a guest OS. Hardware-assisted virtualization was added to x86 processors (Intel VT-x or AMD-V) in 2006. Most known hypervisors support Hardware-Assisted Virtualization.

Why Virtualization?

Efficient resource utilization and complexity reduction:

By abstracting the hardware resources, isolating the utilities/applications and optimizing the placement of the utilities on hardware, virtualization enables very efficient use of hardware. Also, by reducing the total hardware requirement, virtualization simplifies an architecture.

Enabler of Cloud Computing:

Virtualization is the fundamental enabling technology for cloud computing. It is the technology that enables a workload or application to be moved anywhere in the cloud and accessed from anywhere, from different devices. There are a lot of different clouds, classified based on architecture, usage type, end goal, etc., but in the end the isolation of utility from the compute infrastructure, which is the fundamental principle of cloud computing, is enabled by virtualization.

Some useful resources:

Here is a brief introductory video on the concept of virtualization with some whiteboarding by Dan Chu of VMware.

If you are interested in understanding the concepts of virtualization in greater detail, here is a great free e-book by AMD.

If you are interested in reading more about virtualization technologies and products, this is a good introductory book.

#virtualization #what is virtualization #concept #theory
