Cisco Live Runs on FlexPod

NetApp and Cisco have a long and well-regarded partnership, of which the joint FlexPod offering is the best known and most heavily marketed part. The collaboration between the companies often extends in less well-advertised but no less interesting ways. One that has been a personal highlight for me is NetApp providing the storage for the infrastructure that runs the Network Operations Center (NOC) for five of the last Cisco Live events in the US and Europe. This includes acting as a member of the NOC team both before and during the event: NetApp personnel arrive with Cisco staff the week before the show begins to set up the environment, and then ensure that everything runs smoothly and non-disruptively for the attendees.

The core infrastructure – FlexPods, since we use Cisco Nexus switches and UCS servers together with NetApp FAS storage – has been relatively small: fewer than 20 servers and less than 50TB of provisioned storage. By sheer numbers, most of the equipment managed by the NOC team is at the edge: 500+ switches and 600-900 wireless access points. (Any and all numbers vary by year and by location. YMMV.) What all of this infrastructure has in common: it must be possible to stand it up quickly once on site, it must perform well (as the large number of attendees do their best to test the limits of the environment, whether accidentally or deliberately), and, most importantly, it must be highly reliable and cannot go down.


When we started, we used classic 7-Mode systems: a mid-range FAS3200 series HA pair with several shelves of SAS drives for production on site at the event, and a secondary FAS2200 series HA pair for DR and co-located services. Both systems worked well supporting the virtual infrastructure powering the event.

[Image: CLUS2014]

In 2014 we upgraded the production hardware to a FAS8000 series system running clustered Data ONTAP, along with some new disk shelves. Flash Cache was also included to help with workloads such as VDI – that year the NOC provided virtual desktops for many of the labs being performed at the show. The system continued to work well, with zero downtime or performance issues, while providing significant storage efficiencies. We had so much extra space thanks to NetApp dedupe, thin provisioning, etc. that we even mirrored most data locally between the controllers to provide yet another level of redundancy (belts, suspenders, and safety pins).

[Image: CLUS2015 NOC capacity]

Now we’ve upgraded again: starting with this week’s Cisco Live Europe show in Berlin, the Cisco Live NOC runs on an AFF MetroCluster!

What’s AFF? AFF stands for “All-Flash FAS”: the flash-only version of NetApp’s storage controllers running clustered Data ONTAP, specifically optimized for low-latency flash performance. Because AFF shares the same OS as our traditional FAS storage arrays, customers get all of the benefits of our rich family of integrated data management services; on top of that, some software optimizations for flash are enabled only on the AFF series, and those optimizations are already showing significant improvements across minor version releases (8.3.0 -> 8.3.1 -> 8.3.2).

Why AFF? … why not? During last year’s Cisco Live US we found that the IO load on the existing back-end disks was approaching the point at which contention and undesirable latency would start to be introduced. While the controllers themselves could deliver more performance, we would have needed to add more disk shelves to provide any significant increase in IOPS. Because we were not capacity bound, it made much more sense to replace the SAS drives with SSDs instead, for the best possible performance and the most room for growth (in IO). We could have kept the existing FAS controllers and used them with new SSDs – many of our FAS customers have been using hybrid or all-SSD configurations for years – but there was no good reason not to also take advantage of the performance improvements specific to the AFF line of controllers.
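The capacity-versus-IOPS trade-off comes down to simple arithmetic: with spinning disks, random IOPS scale roughly with the number of spindles, so more performance means more drives and shelves even when the extra space isn't needed. The sketch below uses illustrative rule-of-thumb figures (about 150 IOPS per 10K RPM SAS drive and 24 drives per shelf); these are hypothetical numbers, not measurements from the Cisco Live systems.

```python
import math

# Illustrative rule-of-thumb figures (assumptions, not measurements from the show).
SAS_10K_IOPS_PER_DRIVE = 150   # hypothetical random IOPS per 10K RPM SAS drive
DRIVES_PER_SHELF = 24          # hypothetical number of drives per disk shelf


def drives_needed(target_iops: int, iops_per_drive: int = SAS_10K_IOPS_PER_DRIVE) -> int:
    """Number of spindles needed to sustain target_iops (rounded up)."""
    return math.ceil(target_iops / iops_per_drive)


def shelves_needed(target_iops: int) -> int:
    """Number of full shelves needed to hold those spindles (rounded up)."""
    return math.ceil(drives_needed(target_iops) / DRIVES_PER_SHELF)


if __name__ == "__main__":
    for iops in (10_000, 20_000, 40_000):
        print(f"{iops:>6} IOPS -> {drives_needed(iops):>3} drives, "
              f"{shelves_needed(iops)} shelves")
```

Under these assumptions, each doubling of the IOPS target roughly doubles the shelf count regardless of how much capacity is actually used. With SSDs, by contrast, a small number of drives can exceed these targets, which is why swapping the media made more sense than adding shelves.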

What’s MetroCluster? It’s an implementation of NetApp’s FAS (or AFF) storage controllers that provides high availability and disaster recovery across physical sites with zero data loss (zero RPO – recovery point objective) and minimal downtime (low to near-zero RTO – recovery time objective). To achieve zero data loss, of course, you must perform synchronous writes to two different sets of physical media, and for disaster recovery those sets must be in different physical locations. Because the speed of light is a real limit, synchronous writes require those two locations to be relatively near each other so that round-trip latencies remain acceptable (the controller can’t acknowledge a write back to the host until that write is committed at the remote site, not just the local site). With a maximum supported distance of 200km (for now), you get a cluster that can operate across a “metropolitan” area. Customers have been using MetroCluster to protect their most mission-critical data in this fashion for 10 years now.
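The distance limit follows from propagation delay alone. As a rough sketch, assuming light travels through optical fiber at about 200,000 km/s (roughly two-thirds of its speed in a vacuum) and ignoring switch, protocol, and controller overhead, the minimum latency that the remote round trip adds to a synchronous write can be estimated as follows. The function names and the parallel-commit model are illustrative assumptions, not a description of how MetroCluster works internally.

```python
# Rough propagation-delay model for synchronous replication (illustrative only).
# Assumption: light in optical fiber covers roughly 200 km per millisecond
# (~200,000 km/s); real links add switching, protocol, and controller overhead.
FIBER_KM_PER_MS = 200.0


def round_trip_ms(fiber_km: float) -> float:
    """Minimum round-trip propagation delay over fiber_km of fiber, in ms."""
    return 2 * fiber_km / FIBER_KM_PER_MS


def sync_write_latency_ms(commit_ms: float, fiber_km: float) -> float:
    """Lower bound on the latency a host sees for a synchronously mirrored write.

    Assumes the local and remote commits run in parallel and take the same
    time (commit_ms). The write is acknowledged only once BOTH sites have
    committed, so the slower remote path (round trip + commit) sets the floor.
    """
    local_path = commit_ms
    remote_path = round_trip_ms(fiber_km) + commit_ms
    return max(local_path, remote_path)


if __name__ == "__main__":
    for km in (1, 50, 200):
        print(f"{km:>4} km: +{round_trip_ms(km):.2f} ms round trip, "
              f"{sync_write_latency_ms(0.5, km):.2f} ms per write (0.5 ms commit)")
```

At 200km of fiber, propagation alone adds about 2 ms to every acknowledged write under these assumptions, which is why synchronous mirroring is confined to metropolitan distances.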

So why MetroCluster? As I noted above, we had already been replicating most of the Cisco Live data locally for an extra level of protection, but, more importantly, a different need arose for Cisco Live Europe: active/active storage across two physical locations. At prior shows, the completely redundant FlexPod environments (as shown in the diagram above) had been located close to each other. For the 2016 show the goal was to take advantage of the building layout at the new venue (City Cube in Berlin) to provide even more redundancy by placing half of the infrastructure in each of two different buildings (one FlexPod per building). Very early in the planning stages it became obvious that using an AFF MetroCluster for Cisco Live was simply the right thing to do.

We’re now a few days into Cisco Live Europe 2016, and things are going well. On Friday we’ll be having the traditional NOC panel during the last session slot of the show where we’ll discuss the build-out, how the entire infrastructure (wired, wireless, WAN, datacenter, etc.) has performed, lessons learned, and any interesting statistics.  I’ll also post a follow-up blog about my experiences at the show.

For now, here’s a pic of one of the FlexPods (one half of the core datacenter infrastructure) as we were getting it plugged in on the first day. This was before it was powered on – hence the lack of blinkenlights.

[Photo: one of the NOC FlexPods]

Cisco Champions 2016: NetApp Honorees

On Friday, January 29th, Cisco welcomed this year’s honorees for the Cisco Champions 2016 program. While the complete list of award winners has not yet been published, I’m proud to say that I’ve been chosen as a Champion for the second year. And yes, I’m even prouder to see other NetApp/SolidFire employees and “extended family” on the list:

  • Chris Reno (@thechrisreno), National Pre-Sales Engineer at ePlus, Inc
  • Dave Cain (@thedavecain), TME for Converged Infrastructures at NetApp
  • Henry Vail, Senior Architect for Converged Infrastructures at NetApp
  • Jarett Kulm (@JK47theweapon and jk-47.com), Principal Technologist at HA Storage Systems and NetApp A-Team member
  • Melissa Palmer (@vmiss33 and vmiss.net), TME for Converged Infrastructures at NetApp
  • Pete Ybarra (@CertiPete), Field Technical Consultant at Avnet and NetApp A-Team member
  • Shawn Lieu (@ShawnLieu), Solutions Architect at Veeam and NetApp A-Team member

If there’s anyone that I’ve missed in the above list, please let me know and I’ll be happy to update & make sure that you’re included.

While the program is much younger than VMware’s vExpert program, the team at Cisco has done a fantastic job of ramping up quickly and truly building a thriving and interactive community. All of the program’s success is due to the hard work, passion, and openness of both its current leaders, Lauren Friedman (@Lauren) and Brandon Prebynski (@Prebynski), and its former stewards, Amy Lewis (@CommsNinja – now Director of Marketing for SolidFire at NetApp) and Rachel Bakker (@RBakker).

Tech Smorgasbord #6

An on-going reference series for interesting technology or projects which deserve further investigation, or for technical documentation (of one media format or another) that looks to be especially good reference material.


There’s been so much good material coming out of late that I’m going to need to put together several of these smorgasbords just to catch up. Here’s the first batch of things I think you’ll find interesting:


Automatic for the People

If you’re into network automation, you might be following the work of Kirk Byers (@kirkbyers). Kirk has been focusing for a while now on tools and methods for automating network devices, such as Ansible and Paramiko, and particularly Python. His Python for Network Engineers is a good reference, and he routinely teaches classes on that subject, including free-by-email classes, the next of which starts in April. He recently blogged about NAPALM (Network Automation and Programmability Abstraction Layer) in conjunction with Ansible to automate IOS:

NAPALM, Ansible, and Cisco IOS

Another automation project, also utilizing Python and Ansible but originating from VMware, is Chaperone. The new toolkit is targeted at VMware’s SDDC products including vSphere, vCenter, vRealize Automation, vRealize Orchestrator, vRealize Operations, NSX, etc.


Virtually Anything

DoubleCloud Inc., founded by Steve Jin (@sjin2008), has announced a new “Super vCenter” product called DoubleCloud vSearch that looks pretty interesting: Google-style search and big data analytics for VMware environments, delivered as a single OVA with a simple HTML5 web UI.

You may also recall his DoubleCloud Interactive Cloud Environment (ICE) product, launched last year to provide a single console for both CLI and GUI management of vCenter/ESXi environments (and the guests that run in them). Both vSearch and ICE are available as 60-day demo downloads, and ICE has a permanently free edition as well.

Keith Tenzer (@keithtenzer) has a really good blog covering Red Hat’s virtualization-related technologies such as Red Hat Enterprise Virtualization and OpenStack. His most recent post is a nice write-up on Red Hat Enterprise Virtualization (RHEV) – Management Options.


NetApp News

Stefan Renner (@rennerstefan) has been publishing a number of interesting blog posts of late, with these two covering SnapMirror and Storage Virtual Machine (SVM) DR being of particular note.

How to create mirror-vault and version flexible SnapMirror relationship in CDOT 8.3

How to setup a SVM DR in CDOT 8.3.1 including all configuration and data

NetApp’s very own Andrew Sullivan (@andrew_ntap), co-host of the Tech ONTAP Podcast, has been very productive. He’s churned out a number of great scripting- and automation-focused blogs (including the first two below, plus more on Docker in the next section), and co-wrote this recent technical report on SDS from a NetApp/VMware perspective.

cDOT Environment Monitoring Using PowerShell

NetApp PowerShell Toolkit – Templates

TR-4308: Software-Defined Storage with NetApp and VMware

Ed Morgan (@mo6020) has written a handy little post on automating the NetApp simulator using Vagrant:

Using Vagrant to provision the Clustered Data ONTAP vSim


Docker Delights

Mr. Sullivan at work again, this time wearing his Containers Cap, with a couple of excellent posts on running some NetApp tools inside of Docker:

Putting the NetApp Manageability SDK Into Docker Containers

Perfstat in a Docker Container

Another NetAppian, Jacint Juhaz (@jac1nt), has a nice compendium post around using Docker Swarm on AWS with Cloud ONTAP for persistent data.


Miscellanea

Microsoft acquires SwiftKey

SwiftKey has been a must-have on all of my Android devices for years now. It’ll be interesting to see what happens after this acquisition – trepidation abounds.

Udacity is now offering an advanced-level Deep Learning course developed by Google that’s free for anyone willing to put in the time: participants are expected to take approximately 3 months, working about 6 hours/week. It’s part of Udacity’s Machine Learning Engineer Nanodegree program, which is not free overall but, at $199/month for an expected 10-12 months of work, is still pretty affordable, particularly since they promise a 50% refund if you complete and graduate within 12 months.
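For a rough sense of what the Nanodegree costs, the sketch below simply multiplies out the monthly fee. It assumes, as a simplification, that the 50% refund applies to all tuition paid and that no refund is given if you take longer than 12 months; check Udacity’s actual terms before relying on these numbers.

```python
# Back-of-the-envelope Nanodegree cost (illustrative; refund terms simplified).
MONTHLY_FEE_USD = 199
REFUND_FRACTION = 0.5          # assumption: refund is 50% of total tuition paid
REFUND_DEADLINE_MONTHS = 12    # assumption: refund only if you graduate by month 12


def tuition_usd(months: int) -> int:
    """Total tuition paid for a given number of months."""
    return months * MONTHLY_FEE_USD


def net_cost_usd(months: int) -> float:
    """Net cost after the graduation refund (no refund past the deadline)."""
    paid = tuition_usd(months)
    if months <= REFUND_DEADLINE_MONTHS:
        return paid * (1 - REFUND_FRACTION)
    return float(paid)


if __name__ == "__main__":
    for months in (10, 12):
        print(f"{months} months: ${tuition_usd(months)} paid, "
              f"${net_cost_usd(months):.0f} net")
```

At the expected 10-12 months, that works out to roughly $1,000-$1,200 after the refund under these assumptions.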

VMware vExpert 2016: NetApp Honorees

Last Friday VMware released the official list of honorees for the VMware vExpert 2016 program. I’m proud to have been chosen for this award for the third year, and even prouder to see how many other NetApp employees (including our new SolidFire brethren) and “extended family” are on the list:

  • Chris Gebhardt (@chrisgeb), vTME and Dr. Desktop, Lord of EUC at NetApp
  • Henry Vail, Senior Architect for Converged Infrastructures at NetApp
  • Joel Kaufman (@thejoelk), TME Director for manageability at NetApp
  • Kyle Murley (@kylemurley), Systems Engineer for SolidFire at NetApp
  • Melissa Palmer (@vmiss33 and vmiss.net), TME for Converged Infrastructures at NetApp
  • Shawn Lieu (@ShawnLieu), Solutions Architect at Veeam and NetApp A-Team member

If there’s anyone that I’ve missed in the above list, please let me know and I’ll be happy to update & make sure that you’re included.
