Cisco Throws Its Hat in the HCI Ring with HyperFlex

To the surprise of no one but to the delight of some, Cisco today unveiled its official entry into the hyperconverged infrastructure (HCI) market. While Cisco has previously partnered with HCI vendors like SimpliVity and Maxta, the newly announced Cisco HyperFlex Systems are the first HCI offering branded and supported by Cisco alone.

Cisco HyperFlex Systems: Hardware


The HyperFlex system consists of two main physical components: Cisco UCS servers for compute and storage, and UCS Fabric Interconnects (FIs) for management. The FIs can be either the 6248 or the 6296, and the servers will come in two flavors: HX220c nodes, based upon the UCS C220 platform, and HX240c nodes, based upon the UCS C240 platform. A hybrid cluster that includes UCS B200 blades will also be supported.

The product will initially be offered as bundles of servers plus Fabric Interconnects, with individual servers available as add-ons to an existing cluster. The smallest bundle will consist of three nodes (plus FIs) and is priced at under $60k US, while a four node bundle will be offered for less than $70k.


HyperFlex clusters require a minimum of three converged servers, and in the first release can support as many as eight in the same cluster. Servers will be fully configurable at time of order, so customers won’t be as restricted in their component choices as with many other HCI vendors. It’s also worth noting that different node types can be mixed within the same cluster.

New Fabric Interconnects are required for HyperFlex systems today, but there are future plans to allow customers to re-use existing FIs, and/or to incorporate HyperFlex into existing UCS systems, rather than having to purchase new FIs.

Cisco HyperFlex Systems: Software


Unsurprisingly, VMware vSphere is the hypervisor of choice for HyperFlex, and systems will ship with vSphere pre-installed. Hyper-V support will rapidly follow, and the development team is looking hard at both KVM and containers – with the latter a more likely tertiary target than KVM itself.

Storage services

The defining characteristic of HCI – software-defined storage, or as I prefer to call it in this context, aggregated DAS – is provided by Cisco’s HX Data Platform using technology from HCI startup vendor Springpath. Cisco states that their storage implementation is more performant, more resilient, and allows for faster recovery than other HCI platforms.  With HX Data Platform all write data is striped across all nodes simultaneously, rather than the more typical write-local-and-remote methodology.
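To illustrate the difference, here’s a toy sketch (hypothetical node names and chunking – not HX Data Platform internals) contrasting striping a write across all nodes with the more typical write-local-plus-remote-replica pattern:

```python
# Purely illustrative sketch of two HCI write-distribution strategies.
# Node names, chunk size, and placement logic are hypothetical.

def stripe_write(data: bytes, nodes: list[str], chunk_size: int = 4) -> dict[str, bytes]:
    """Stripe the write across ALL nodes round-robin (the HX-style approach)."""
    placement: dict[str, bytes] = {n: b"" for n in nodes}
    for i in range(0, len(data), chunk_size):
        node = nodes[(i // chunk_size) % len(nodes)]
        placement[node] += data[i:i + chunk_size]
    return placement

def local_plus_replica_write(data: bytes, local: str, replica: str) -> dict[str, bytes]:
    """The more common HCI pattern: one full local copy plus one remote replica."""
    return {local: data, replica: data}

nodes = ["hx-node1", "hx-node2", "hx-node3"]
striped = stripe_write(b"ABCDEFGHIJKL", nodes)
assert all(len(chunk) == 4 for chunk in striped.values())  # every node shares the write
```

The striped approach spreads the write load evenly from the start, rather than concentrating it on two nodes and rebalancing later.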

As has become table stakes for HCI, inline deduplication and inline compression are both provided – though, unlike some vendors’ implementations, neither can be turned off; both are always enabled.
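A toy sketch of how an inline dedup-plus-compression write path generally works (purely illustrative, not HX internals): fingerprint each incoming block, store only previously unseen blocks (compressed), and hand back a pointer:

```python
import hashlib
import zlib

# Illustrative inline dedup + compression pipeline (hypothetical, not HX code):
# duplicate blocks are detected before they hit storage, so they consume
# only a pointer; unique blocks are compressed on the way in.

class InlineDedupStore:
    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # fingerprint -> compressed block

    def write(self, block: bytes) -> str:
        fp = hashlib.sha256(block).hexdigest()
        if fp not in self.blocks:                  # inline dedup: skip known blocks
            self.blocks[fp] = zlib.compress(block) # inline compression before storing
        return fp                                  # caller keeps the pointer

    def read(self, fp: str) -> bytes:
        return zlib.decompress(self.blocks[fp])

store = InlineDedupStore()
p1 = store.write(b"A" * 4096)
p2 = store.write(b"A" * 4096)  # duplicate block: stored only once
assert p1 == p2 and len(store.blocks) == 1
```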


Cisco, echoing my own experience, says that customers have insisted that they do not want yet another management interface, as is so common in both converged and hyperconverged products; instead, they want this new architecture incorporated into their current management platforms. Ongoing HyperFlex management is handled by UCS Manager and vCenter, with vCenter expected to be the primary interface for daily operations. HCI operations are embedded within the vCenter UI via the standard UCS Manager plugin and an HX Data Platform plugin. The HX plugin integrates pointer-based snapshots into the vCenter Snapshot Manager and provides rapid VAAI-assisted VM cloning.

Further down the road are UCS Director and ACI integrations to help Cisco tie HyperFlex into the rest of their datacenter infrastructure and provide full automation and orchestration capabilities (which are lacking in this first release).


With HyperFlex, Cisco is trying to address some of the deficiencies seen in many of the current crop of HCI offerings:

Independent scaling of compute vs. capacity

HyperFlex supports the addition of compute-only nodes into the cluster. These compute hosts connect to shared storage presented by the converged nodes via proprietary IOVisor software (not to be confused with the open source, networking-focused IO Visor project).

In the initial release, compute-only nodes can be added to a cluster that already has at least four converged nodes, with a total of four compute-only nodes supported per cluster. This means that the largest cluster size supported in the first release is twelve: eight converged nodes and four compute-only nodes. Going forward, there will need to be at least as many converged nodes as compute-only nodes.
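Those first-release sizing rules reduce to a few simple checks; a hypothetical validator (limits as described in this announcement, not current documentation) might look like:

```python
# Hypothetical validator encoding the first-release HyperFlex sizing rules
# described above. Limits are per the initial release announcement.

MIN_CONVERGED = 3
MAX_CONVERGED = 8
MAX_COMPUTE_ONLY = 4
MIN_CONVERGED_FOR_COMPUTE = 4

def valid_cluster(converged: int, compute_only: int = 0) -> bool:
    if not (MIN_CONVERGED <= converged <= MAX_CONVERGED):
        return False
    if compute_only:
        if converged < MIN_CONVERGED_FOR_COMPUTE:
            return False                     # compute-only needs >= 4 converged nodes
        if compute_only > MAX_COMPUTE_ONLY:
            return False
        if compute_only > converged:         # at least as many converged as compute-only
            return False
    return True

assert valid_cluster(3)          # smallest bundle
assert valid_cluster(8, 4)       # largest first-release cluster: twelve nodes
assert not valid_cluster(3, 1)   # too few converged nodes for compute-only
assert not valid_cluster(8, 5)   # over the compute-only limit
```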

Cisco has stressed that supporting larger cluster sizes is a matter of qualification time and cycles, rather than a hard technical limitation.

Complete convergence

As many people have noted, despite the “hyper” moniker, the current slate of HCI vendors don’t handle networking convergence at all. With UCS Manager and the Fabric Interconnects, Cisco is providing the same level of convergence as with their standard UCS servers – which, to be frank, helped popularize the entire idea of “converged infrastructure.” In addition, Cisco has the full SDN capabilities of ACI to wrap and extend the solution from the application to the edge – something no one other than VMware themselves can do today.


Until we get our hands on it to play with (and break), the jury is, of course, still out on HyperFlex. On its face, however, Cisco has taken an interesting approach and appears to have a strong product. If the execution can match the overall design and put meat on the current bones of the roadmap, the HCI space will get very interesting over the next 12-18 months.


Cisco Live Runs on FlexPod

NetApp and Cisco have a long and well-regarded partnership, with the joint FlexPod offering being the best known and most marketed. The collaboration between the companies often extends in less well-advertised but no less interesting ways. One that has been a personal highlight for me is NetApp providing the storage for the infrastructure that runs the Network Operations Center (NOC) for five of the last Cisco Live events in the US and Europe. This includes acting as a member of the NOC team both prior to the show and during the event: NetApp personnel arrive with Cisco staff the week before the show begins to set up the environment, and ensure that everything runs smoothly and non-disruptively for the attendees.

The core infrastructure – composed of FlexPods, as we leverage Cisco Nexus switches and UCS servers in conjunction with our NetApp FAS storage – has been relatively small: fewer than 20 servers and less than 50TB of provisioned storage. From a sheer numbers perspective, the majority of the equipment managed by the NOC team is at the edge: 500+ switches and 600-900 wireless access points. (Any and all numbers vary by year and by location. YMMV.) What is common to all of this infrastructure: it must be able to be stood up quickly once on site, it must perform well (as the large number of attendees do their best to test the limits of the environment – whether accidentally or deliberately), and, most importantly, it must be highly reliable and cannot go down.

When we started it was with classic 7-mode systems: a mid-range FAS3200 series HA pair with several shelves of SAS drives for production on-site at the event, and a secondary FAS2200 series HA pair for DR and co-located services. Both systems worked well supporting the virtual infrastructure powering the event.



In 2014 we upgraded the production hardware to a FAS8000 series running clustered Data ONTAP, along with some new disk shelves. Flash Cache was also included to assist with things like VDI – that year the NOC provided virtual desktops for many of the labs being performed at the show. The system continued to work well, with zero downtime or performance issues, and provided significant storage efficiencies. We had so much extra space due to NetApp dedupe, thin provisioning, etc. that we even mirrored most data locally between the controllers to provide yet another level of redundancy (belts, suspenders, and safety pins).




Now we’ve upgraded again: starting with this week’s Cisco Live Europe show in Berlin, the Cisco Live NOC runs on an AFF MetroCluster!

What’s AFF? AFF stands for “All-Flash FAS” – the flash-only version of NetApp’s storage controllers that run clustered Data ONTAP, specifically optimized for low-latency flash performance. Sharing the same OS with our traditional FAS storage arrays means customers get all of the benefits of our rich family of integrated data management services, but there are now software optimizations for flash that are enabled only in the AFF series, and those optimizations are already showing significant improvements across minor version releases (8.3.0 -> 8.3.1 -> 8.3.2).

Why AFF? …why not? During last year’s Cisco Live US we found that the IO load on the existing back-end disks was approaching the point at which contention and undesirable latency would start to be introduced. While the controllers themselves could deliver more performance, we would have needed to add more disk shelves in order to significantly increase IOPS. Because we were not capacity bound, it made much more sense to instead replace the SAS drives with SSDs for the best possible performance and the most room for growth (in IO). We could have kept the existing FAS controllers to use with new SSDs – many of our FAS customers have been using hybrid or all-SSD configurations for years – but there was no good reason not to also take advantage of the performance improvements specific to the AFF line of controllers.

What’s MetroCluster? It’s an implementation of NetApp’s FAS (or AFF) storage controllers that provides high availability and disaster recovery across physical sites with zero data loss (zero RPO – recovery point objective) and minimal downtime (low to near-zero RTO – recovery time objective).  In order to achieve zero data loss, of course, you must be performing synchronous writes to two different sets of physical media, and for disaster recovery those sets must be in different physical locations. Because the speed of light is a real limit, in order to perform synchronous writes those two locations need to be relatively near each other so that the round-trip time latencies are acceptable (the controller can’t acknowledge a write operation back to the host until that write is committed at the remote site, not just the local site).  With a maximum supported distance of 200km (for now) you get a cluster that can operate across a “metropolitan” area. Customers have been using MetroCluster to protect their most mission critical data in this fashion for 10 years now.
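The physics is easy to put numbers on: light travels through optical fiber at roughly 200,000 km/s (about two-thirds of c), so a synchronous write over 200km must wait for at least a 2ms round trip before it can be acknowledged. A quick back-of-the-envelope sketch:

```python
# Back-of-the-envelope floor on synchronous-replication write latency.
# ~200,000 km/s is the approximate speed of light in fiber (~2/3 c);
# real links add switching and serialization overhead on top of this.

FIBER_KM_PER_MS = 200.0  # ~200,000 km/s == 200 km per millisecond

def min_sync_write_latency_ms(distance_km: float) -> float:
    """Minimum added latency per write: one round trip to the remote site."""
    return 2 * distance_km / FIBER_KM_PER_MS

# At MetroCluster's 200km maximum supported distance:
assert min_sync_write_latency_ms(200) == 2.0  # at least 2 ms added per write
```

This is why the supported distance is capped: beyond a metro-scale radius, the unavoidable round-trip time makes every synchronous write unacceptably slow.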

So why MetroCluster? As I noted above, we had been replicating most of the Cisco Live data locally for an extra level of protection anyway, but, more importantly, for Cisco Live Europe a different need arose: active/active storage across two physical locations. At prior shows, the completely redundant FlexPod environments (as shown in the diagram above) had been located proximate to each other. For the 2016 show the goal was to take advantage of the building layouts at the new location (City Cube in Berlin) to provide even more redundancy by placing half of the infrastructure in each of two different buildings (one FlexPod per building). Very early in these planning stages it became obvious that using an AFF MetroCluster for Cisco Live was simply the right thing to do.


We’re now a few days into Cisco Live Europe 2016, and things are going well. On Friday we’ll be having the traditional NOC panel during the last session slot of the show where we’ll discuss the build-out, how the entire infrastructure (wired, wireless, WAN, datacenter, etc.) has performed, lessons learned, and any interesting statistics.  I’ll also post a follow-up blog about my experiences at the show.

For now, here’s a pic of one of the FlexPods (one half of the core datacenter infrastructure) as we were getting it plugged in on the first day. This was before it was powered on – hence the lack of blinkenlights.


Cisco Champions 2016: NetApp Honorees

On Friday January 29th, Cisco welcomed this year’s honorees for the Cisco Champions 2016 program. While the complete list of award winners has not yet been published, I’m proud to be able to say I’ve been chosen as a Champion for the second year. And yes, even prouder to see other NetApp/Solidfire employees and “extended family” on the list:

  • Chris Reno (@thechrisreno), National Pre-Sales Engineer at ePlus, Inc
  • Dave Cain (@thedavecain), TME for Converged Infrastructures at NetApp
  • Henry Vail, Senior Architect for Converged Infrastructures at NetApp
  • Jarett Kulm (@JK47theweapon), Principal Technologist at HA Storage Systems and NetApp A-Team member
  • Melissa Palmer (@vmiss33), TME for Converged Infrastructures at NetApp
  • Pete Ybarra (@CertiPete), Field Technical Consultant at Avnet and NetApp A-Team member
  • Shawn Lieu (@ShawnLieu), Solutions Architect at Veeam and NetApp A-Team member

If there’s anyone that I’ve missed in the above list, please let me know and I’ll be happy to update & make sure that you’re included.

While a much younger program than VMware’s vExpert, the team at Cisco has done a fantastic job of ramping up quickly and truly building a thriving, interactive community. All the success of the program is due to the hard work, passion, and openness of both the program’s current leaders, Lauren Friedman (@Lauren) and Brandon Prebynski (@Prebynski), and its former stewards, Amy Lewis (@CommsNinja – now Director of Marketing for Solidfire at NetApp) and Rachel Bakker (@RBakker).


Tech Smorgasbord #6

An ongoing reference series for interesting technology or projects that deserve further investigation, or for technical documentation (in one media format or another) that looks to be especially good reference material.

There’s been so much good material coming out of late that I’m going to need to put together several of these smorgasbords just to catch up. Here’s the first batch of things I think you’ll find interesting:

Automatic for the People

If you’re into network automation, you might be following the work of Kirk Byers (@kirkbyers). Kirk has been focusing for a while now on tools and methods for automating network devices – Ansible, Paramiko, and particularly Python. His Python for Network Engineers is a good reference, and he routinely teaches classes on the subject – including free-by-email classes, the next of which starts in April. He recently blogged about using NAPALM – the Network Automation and Programmability Abstraction Layer – in conjunction with Ansible to automate IOS:

NAPALM, Ansible, and Cisco IOS

Another automation project, also utilizing Python and Ansible but originating from VMware, is Chaperone. The new toolkit is targeted at VMware’s SDDC products including vSphere, vCenter, vRealize Automation, vRealize Orchestrator, vRealize Operations, NSX, etc.

Virtually Anything

DoubleCloud Inc., founded by Steve Jin (@sjin2008), has announced a new “Super vCenter” product called DoubleCloud vSearch that looks pretty interesting: Google-style search and big data analytics for VMware environments, delivered as a single OVA and leveraging a simple HTML5 web UI.

You may also recall his DoubleCloud Interactive Cloud Environment (ICE) product that was launched last year to provide a single console for both CLI & GUI management of vCenter/ESXi environments (and the guests that run in those environments). Both vSearch and ICE are available as 60 day demo downloads, and ICE has a permanently free edition as well.

Keith Tenzer (@keithtenzer) has a really good blog covering Red Hat’s virtualization related technologies such as Red Hat Enterprise Virtualization and OpenStack. His most recent post is a nice write-up on Red Hat Enterprise Virtualization (RHEV) – Management Options.

NetApp News

Stefan Renner (@rennerstefan) has been publishing a number of interesting blog posts of late, with these two covering SnapMirror and Storage Virtual Machine (SVM) DR being of particular note.

How to create mirror-vault and version flexible SnapMirror relationship in CDOT 8.3

How to setup a SVM DR in CDOT 8.3.1 including all configuration and data

NetApp’s very own Andrew Sullivan (@andrew_ntap), co-host of the Tech ONTAP Podcast, has been very productive. He’s churned out a number of great scripting- and automation-focused blogs (including the first two below, and more on Docker in the next section), as well as co-writing this recent technical report on SDS from a NetApp/VMware perspective.

cDOT Environment Monitoring Using PowerShell

NetApp PowerShell Toolkit – Templates

TR-4308: Software-Defined Storage with NetApp and VMware

Ed Morgan (@mo6020) has written a handy little post on automating the NetApp simulator using Vagrant:

Using Vagrant to provision the Clustered Data ONTAP vSim

Docker Delights

Mr. Sullivan at work again – this time wearing his Containers Cap with a couple of excellent posts on running some NetApp tools inside of Docker:

Putting the NetApp Manageability SDK Into Docker Containers

Perfstat in a Docker Container

Another NetAppian, Jacint Juhaz (@jac1nt), has a nice compendium post around using Docker Swarm on AWS with Cloud ONTAP for persistent data.


Microsoft acquires SwiftKey

SwiftKey has been a must-have on all of my Android devices for years now. It’ll be interesting to see what happens after this acquisition – trepidation abounds.

Udacity is now offering an advanced-level Deep Learning course developed by Google that’s free for anyone to take, so long as they’re willing to put in some time: participants are expected to take approximately 3 months working about 6 hours/week. It’s part of Udacity’s Machine Learning Engineer Nanodegree program, which is not free overall but – at $199/month for an expected 10-12 months of work – is still pretty affordable, particularly since they promise a 50% refund if you complete and graduate within 12 months.


VMware vExpert 2016: NetApp Honorees

Last Friday VMware released the official list of the honorees for the VMware vExpert 2016 program. I’m proud to have been chosen for this award for the third year, and even prouder to see how many other NetApp employees, including our new Solidfire brethren, and “extended family” are on the list:

  • Chris Gebhardt (@chrisgeb), vTME and Dr. Desktop, Lord of EUC at NetApp
  • Henry Vail, Senior Architect for Converged Infrastructures at NetApp
  • Joel Kaufman (@thejoelk), TME Director for manageability at NetApp
  • Kyle Murley (@kylemurley), Systems Engineer for Solidfire at NetApp
  • Melissa Palmer (@vmiss33), TME for Converged Infrastructures at NetApp
  • Shawn Lieu (@ShawnLieu), Solutions Architect at Veeam and NetApp A-Team member

If there’s anyone that I’ve missed in the above list, please let me know and I’ll be happy to update & make sure that you’re included.