The Evolution of Data Center Fabrics
Evolving needs. Blooming technologies. Massive scale. Unified networks. Infinite bandwidth. Zero latency. Virtual overlays. Blend these all with ice, lemon and a pinch of sugar, and you have the perfect data center cocktail.
DC networks are currently a very hot topic among industry pundits. The disruption of the cloud-enabled world has pushed the performance challenge back to the data center. The era of distributed workloads is starting to fade, and all the compute might is now concentrated in the “cloud”. How are system designs and new technologies going to adapt to this change?
Masters of complexity
As human beings, we have an innate tendency to address problems from within the current paradigm. Only from time to time does a disruption emerge that forces us to take a step back and seek a new solution from scratch. We are reluctant to get out of our comfort zone, but when we do, there is usually the payoff of a great leap forward.
When approaching a new challenge, the initial focus is always on getting things to work. This is the comfort zone of the Masters of Complexity: these are our tools, these are our protocols, figure out a way to solve the problem with them, and do it in a timely and efficient manner. That’s what we engineers do, right?
When trying to build upon this, though, design decisions that made sense for the first iteration (the solution of the initial problem) don’t necessarily make sense anymore for the next iterations. At a certain point in time, there will be a need to reframe the current situation and design a new approach. This is precisely the stage at which networking, and more specifically Data Center networking, sits today. Let’s assess the situation, then.
As Berkeley’s Prof. Scott Shenker argues, we have become experts in this art of mastering complexity. If we are to build a new foundation for the next-generation, cloud-oriented services world, it’s time to focus on extracting simplicity. Simpler problems are easier to understand and analyze, and can be solved architecturally rather than accidentally. What are the tools that allow this process of extracting simplicity? Abstractions.
The power of abstractions
Abstractions allow for two fundamental benefits.
First, they enable a modular approach to problem solving. Each module addresses a separate part of the problem, which is usually simpler and faster to solve, and can therefore evolve independently to address future needs without constraints from the other modules. This is the “low-coupling, high-cohesion” principle.
Second, they make it possible to integrate existing technologies into the new system: the new system presents a familiar interface to the legacy environment while itself operating in a completely new manner. This makes the new technology easier to adopt and more likely to have a significant impact.
Therefore, with the right abstractions and the right interfaces we can design simpler, more elegant solutions that can be integrated into the existing world in a way that is productive and meaningful. And these new solutions and integrations, if they reach the appropriate level of abstraction, will enable new disruptions.
The key question is what level of abstraction is needed for these disruptions to emerge. We find a great example of this in the steps that the computing industry took on the road to server virtualization. Virtual memory was a nice abstraction, but hardly disruptive. Hard drive partitioning allowed for a new storage abstraction: nice, but no revolution ignited from it. Multi-threading? Some sort of CPU virtualization, nothing more. But when the unit of abstraction was moved up the chain to the server itself, things truly changed: this not only provided a consolidation and dollar-cutting benefit, it had a profound impact on the way servers could be operated, while preserving every existing way of interacting with them. Almost every service, application, network service or system could successfully interact with a virtual server from day one. The right interfaces were in place, and the right abstractions were those that provided benefits far beyond their usual scope.
These ideas are condensed in Nicira’s Martín Casado’s article The Overhead of Software Tunneling, where he argues:
…virtualization of computing hardware preserves the abstractions that were presented by the resources being virtualized. Why is this important? Because changing abstractions generally means changing the programs that want to use the virtualized resources. (…) Virtualization should not change the basic abstractions exposed to workloads, however it nevertheless does introduce new abstractions. However, the big disruption that followed server virtualization was not consolidation but the fundamental change to the operational model created by the introduction of the VM as a basic unit of operations.
We have seen the impact this abstraction, the virtual machine, has had on the computing world. Networking has seen nothing close to this level of simplicity extraction so far. Now there is a real need to disrupt this space too, because there is a new value chain, a new “stack” of cloud services, that demands a new kind of networking.
The Cloud Stack
Virtual machines, ubiquitous bandwidth and new business models have transformed the landscape and brought upon us the Cloud era, in which the investment model for new data centers has shifted to pay-per-use, cloud-style billing.
Virtualization, or the abstraction of computing hardware, requires low coupling between computing and networking needs: any service can now physically live anywhere, and the network must accommodate this. Server solutions like Cisco’s Unified Computing System provide this stateless computing platform up to the server-system level. Other providers like HP offer their own solutions. These systems can grow to tens or hundreds of physical servers, depending on the implementation, and within the system they address this ubiquitous networking need, but at some point they need to be interconnected with each other and with the rest of the world. That is the role of the Data Center fabric.
We could define a fabric as a deterministic, lossless communications platform that provides and guarantees certain bandwidth, oversubscription and latency conditions to the connected nodes, while offering certain features (QoS, security…). This does not necessarily mean infinite bandwidth, zero oversubscription and no latency; it just has to provide the same service level to every node, so that applications and services can be dynamically moved without communications constraints.
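To make “the same service level to every node” concrete, here is a minimal sketch (port counts and speeds are made up for illustration) that computes the oversubscription ratio each access switch offers and checks that it is uniform across the fabric:

```python
# Minimal sketch: checking that every access node gets the same service level.
# All figures (port counts, speeds) are hypothetical, for illustration only.

def oversubscription(server_ports: int, server_speed_gbps: float,
                     uplink_ports: int, uplink_speed_gbps: float) -> float:
    """Ratio of downstream (server-facing) capacity to upstream (fabric-facing) capacity."""
    downstream = server_ports * server_speed_gbps
    upstream = uplink_ports * uplink_speed_gbps
    return downstream / upstream

# A fabric built from identical access nodes offers the same ratio everywhere.
leaves = [
    {"name": "leaf-1", "server_ports": 48, "server_speed": 10, "uplinks": 4, "uplink_speed": 40},
    {"name": "leaf-2", "server_ports": 48, "server_speed": 10, "uplinks": 4, "uplink_speed": 40},
]

ratios = {l["name"]: oversubscription(l["server_ports"], l["server_speed"],
                                      l["uplinks"], l["uplink_speed"]) for l in leaves}
print(ratios)                          # {'leaf-1': 3.0, 'leaf-2': 3.0}
assert len(set(ratios.values())) == 1  # same service level for every connected node
```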
The so-called Cloud stack is the combination of the technologies that make up every layer of the value chain inside a data center and that speak to each other and to an orchestrator in order to provide the defining elements of a Cloud: automation, on-demand service provisioning, and tailored per-use billing. This stack, therefore, must use the interfaces that the storage, networking, computing, operating system, service provisioning and business process management layers (read: abstractions) provide, in order to orchestrate the delivery of cloud services.
The data center fabric seems to be the glue that holds together every other piece of the Cloud model. It therefore seems important to integrate this fabric into the stack in an effective way, which means it must provide a friendly interface too. Cisco’s recently announced Open Network Environment is a step towards this interface between the networking fabric and the other layers, providing the programmability needed for automation and customization to take place. There are other initiatives across the industry that also aim to do this. Is this the right level of abstraction to bring true disruption? Are we too narrow, or too broad, in our abstraction ambitions for them to be meaningful?
Building the new Fabric
The hot question today is how to provide this fabric to the computing, storage and services nodes. Lossless, low-latency, deterministic, feature-rich. And usable.
During the past few years, a number of efforts have been undertaken by vendors to provide greater stability and performance to existing networks. A summary of these efforts would be:
Topology simplification: via Multi-chassis Link Aggregation (MLAG) technologies, like Virtual Port-Channel, hardware redundancy can be provided while preserving topology simplicity. A single access node connects to what it sees as a single interconnect node, although that node is physically split across two boxes. This enables loop-free topologies, which in turn simplifies STP. And we all know how good it is to be liberated from Spanning Tree.
Fabric Extension: another simplification mechanism, which allows a single switch to be ripped apart and spread all over the Data Center. This means fewer boxes, and a certain topological simplification too, but most of all a significant reduction in management burden. Your top-of-the-rack switch is not really a switch anymore, but a linecard of your interconnect box, and therefore it is managed only from the central switch.
Data Center Bridging: extensions have been standardized to make Ethernet transport lossless, in order to enable the convergence of all kinds of Data Center traffic onto the same fabric (a rough sketch of the mechanism follows this list). Fibre Channel, I’m looking at you.
10 Gigabit Ethernet: a raw performance increase that allows for greater VM consolidation within a single server while accommodating every VM’s networking needs.
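The piece of Data Center Bridging that delivers that losslessness is Priority Flow Control (802.1Qbb): instead of dropping frames, the receiver pauses a single priority class when its buffer crosses a high-water mark and lets it resume once it drains below a low-water mark. A toy model, with invented buffer sizes and thresholds:

```python
# Toy model of Priority Flow Control (802.1Qbb), the mechanism behind "lossless" Ethernet.
# Thresholds and buffer sizes are made up; real switches tune these per port and priority.

class PriorityQueue:
    def __init__(self, xoff_threshold=80, xon_threshold=20):
        self.depth = 0                      # frames currently buffered for this priority
        self.xoff = xoff_threshold          # above this, ask the sender to pause
        self.xon = xon_threshold            # below this, ask the sender to resume
        self.paused = False

    def enqueue(self, frames: int) -> bool:
        """Buffer incoming frames; return True if a per-priority PAUSE must be sent upstream."""
        self.depth += frames
        if not self.paused and self.depth >= self.xoff:
            self.paused = True              # pause the sender instead of dropping frames
            return True
        return False

    def drain(self, frames: int) -> bool:
        """Forward frames out; return True if a resume (PAUSE with zero time) must be sent."""
        self.depth = max(0, self.depth - frames)
        if self.paused and self.depth <= self.xon:
            self.paused = False
            return True
        return False

storage_class = PriorityQueue()             # e.g. the priority class carrying FCoE traffic
print(storage_class.enqueue(90))            # True  -> upstream sender pauses, nothing is dropped
print(storage_class.drain(75))              # True  -> buffer drained, sender may resume
```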
Some are more prevalent and widely adopted than others, but we could agree that all of these look like interesting tools to build upon. Still, these are only tools; we still need to solve the architectural problem of designing the next-gen fabric.
How to build it, then? There are several approaches, but two main options emerge:
One alternative is to build every desired property into the fabric itself: a combination of cutting-edge equipment, modern, robust protocols and rich feature sets that move information around blazingly fast and allow workloads to move freely.
The other alternative is to build a raw, “dumb” physical fabric that provides lots of bandwidth and a predictable environment, and provide all features at a new layer on top of this platform, an overlay layer.
The first alternative provides a single abstraction for the whole Data Center fabric. The second provides two: one for the underlying physical fabric, one for the overlay. What is the advantage of each approach?
The full-fledged fabric seems like a simpler, more elegant solution, and it probably is. It is also a more ambitious one. It must accommodate multi-tenancy mechanisms in order to separate traffic from different domains (business areas, customers, services…). It must provide all the features and the speed, which most likely requires these features to be implemented in hardware, which in turn requires the design of specific ASICs or SoCs. It must be comprised of separate hardware elements that behave as one, which requires either “Borg-like” behemoth proprietary megaswitch architectures or modern protocols that replace the good ol’ spanning tree of yore. Of course, to exit this fabric there must be technologies in place that allow interaction with legacy STP deployments, and MLAG-compatible technologies like Virtual Port Channel+ add another small piece of complexity to provide this interoperability.
This Lord-of-Fabrics should also have Virtual Machine visibility built in if it wants to properly serve the new atomic unit of the Data Center, and there’s more than one way of doing that being discussed right now: new standard tags, like 802.1BR; no tags, like 802.1Qbg; new views, like SR-IOV… all explained in the links above. This discussion is far from closed, and there is no agreed standard on the market right now about which path to take, although 802.1BR is starting to look nicer and nicer: it addresses all kinds of port extension, both for VM visibility and for fabric extension. Still, it looks like a tough situation to be in.
For the Overlay Fabric approach the situation looks even worse, at least initially. From the VMs’ perspective things look nice: they are simply presented with a full traditional network within their overlay, with its L2, its L3 and the usual operational model. Little change is needed here. You just need to insert a small piece of software within the hypervisor and it will route VM traffic to the appropriate overlay tunnel. The physical network is sandwiched between overlays, opaque to the endpoints’ traffic. The technology used to build these tunnels (GRE, UDP-based encapsulations…) matters less than the way the overlay interacts with the rest of the world. And here is where things look ugly: at the interface level.
To exit the “overlay world”, traffic must be delivered to a traditional network via the only option available today: an 802.1Q trunk. So watch out, you’d better take good care of the old environment too, and find a way for your overlay technology to fit within the traditional VLAN limit, or multi-tenancy will be compromised.
This is one of the things that Derick Winkworth discusses in his article “The Sad State of Data Center Networking”:
Virtual-machines talk with each other through the overlays, but to get out to the network they transit an 802.1q trunk into the fabric and ultimately over to their default gateway. Worse yet, “fabric” vendors are developing features in the fabric that integrate with VMware APIs so they can track or otherwise do nifty things with VMs in the fabric. In other words, the state of affairs is such that vendors are accepting the “Inconsistent Network” as a fact of life and they are developing features around it.
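Here is a minimal sketch of what that hypervisor shim does, and of where the 802.1Q exit pinches: the overlay identifies tenants with a large segment ID (24 bits in VXLAN-style encapsulations, roughly 16 million segments), while the trunk towards the traditional network only offers the 12-bit VLAN field. Every name, address and mapping below is invented for illustration:

```python
# Sketch of the hypervisor's overlay shim: map a VM's traffic to a tenant segment and a
# remote tunnel endpoint, then note what survives when traffic must exit on an 802.1Q trunk.
# All addresses, VNIs and segment assignments are invented for illustration.

VNI_SPACE = 2 ** 24          # VXLAN-style segment IDs: ~16 million tenants/segments
VLAN_SPACE = 2 ** 12         # 802.1Q VLAN IDs: 4096 (4094 usable)

# Overlay control state: which segment a VM belongs to, and which host (tunnel endpoint) it lives on.
vm_segment = {"vm-a": 10001, "vm-b": 10001, "vm-c": 54321}
vm_location = {"vm-a": "10.0.0.1", "vm-b": "10.0.0.2", "vm-c": "10.0.0.2"}

def encapsulate(src_vm: str, dst_vm: str, payload: bytes) -> dict:
    """Wrap a VM-to-VM frame in an overlay tunnel header (conceptually: outer IP/UDP + segment ID)."""
    assert vm_segment[src_vm] == vm_segment[dst_vm], "different tenants never mix"
    return {
        "outer_dst_ip": vm_location[dst_vm],   # tunnel to the destination hypervisor
        "vni": vm_segment[src_vm],             # tenant segment carried in the overlay header
        "inner_frame": payload,                # original frame, untouched
    }

print(encapsulate("vm-a", "vm-b", b"hello"))

# Exiting the overlay: the gateway must squeeze each segment into a VLAN on the trunk.
print(f"{VNI_SPACE:,} overlay segments vs {VLAN_SPACE:,} VLANs on the 802.1Q trunk")
```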
What about performance? There are some like Martín Casado who argue (in the same article quoted before) that:
…at the edge, in software, tunneling overhead is comparable to raw forwarding, and under some conditions it is even beneficial. For virtualized workloads, the overhead of software forwarding is in the noise when compared to all of the other machinations performed by the hypervisor.
Technologies like passthrough are unlikely to have a significant impact on throughput, but they will save CPU cycles on the server. However, that savings comes at a fairly steep cost as we have explained before, and doesn’t play out in most deployment environments.
Let’s assume that this is not an issue right now. Still, we have lots of work to do, it would appear.
But there seems to be a lot of buzz around “OpenFlow” and “SDN” that surely mean this is about to change, right? Right?
Software Defined Networking and OpenFlow
These two are the new kids on the block. Suddenly, it’s hip and cool to be “SDN oriented” and to “support OpenFlow”. Let’s take a look inside and see what these actually mean to the modern Data Center fabric.
OpenFlow is a protocol developed in the academic world with the main goal of bringing programmability and customization to networking environments. It is an interface by which to instruct our network to do things: a standard programming language for the switches and routers that make up your network. Should you desire to implement a new, custom-made network protocol, OpenFlow is the way. Should you want a controller-based centralized architecture in which the Forwarding Information Base is pushed to every edge node and all routing decisions are taken centrally by the “brain”, OpenFlow would be a possible way to program this FIB in the remote switches.
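As a toy illustration of that controller-based picture, here is the kind of computation the central “brain” would perform before pushing forwarding state to the edge. The flow entries are simplified stand-ins for real OpenFlow match/action rules, and the topology, ports and MAC addresses are invented:

```python
# Toy model of a centralized controller computing and "pushing" forwarding state to switches.
# The flow entries are simplified stand-ins for OpenFlow match/action rules; topology,
# port numbers and MAC addresses are invented for illustration.

from collections import deque

# Adjacency: switch -> {neighbor: local egress port leading to that neighbor}
TOPOLOGY = {
    "edge-1":  {"spine-1": 49, "spine-2": 50},
    "edge-2":  {"spine-1": 49, "spine-2": 50},
    "spine-1": {"edge-1": 1, "edge-2": 2},
    "spine-2": {"edge-1": 1, "edge-2": 2},
}
# Where each host (identified by MAC) is attached: (switch, access port)
HOSTS = {"aa:aa:aa:aa:aa:01": ("edge-1", 1), "aa:aa:aa:aa:aa:02": ("edge-2", 7)}

def shortest_path(src: str, dst: str) -> list:
    """Plain BFS over the topology; the controller has the global view, the switches do not."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nbr in TOPOLOGY[path[-1]]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(path + [nbr])
    raise ValueError("no path")

def compute_flow_tables() -> dict:
    """For every destination MAC, install a (match dst_mac -> output port) entry on every switch."""
    tables = {sw: [] for sw in TOPOLOGY}
    for mac, (dst_sw, dst_port) in HOSTS.items():
        for sw in TOPOLOGY:
            if sw == dst_sw:
                out_port = dst_port
            else:
                next_hop = shortest_path(sw, dst_sw)[1]
                out_port = TOPOLOGY[sw][next_hop]
            tables[sw].append({"match": {"dst_mac": mac}, "action": {"output": out_port}})
    return tables

for switch, entries in compute_flow_tables().items():
    print(switch, entries)   # in a real setup these would be sent as OpenFlow flow modifications
```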
Software Defined Networking is a philosophy. It is the vision of a basic-hardware, feature-rich-software combination in which networking and communication needs will be addressed by, basically, general-purpose servers with lots of ports running networking-specific software.
This vision opens the door to futuristic revolutions like multipurpose wireless chips, but it also means the barriers to entry for new competitors in the networking space are close to gone: anyone with a good team of distributed systems developers can now become a player in the huge networking space, without the need to manufacture a single box, design a single custom ASIC or manage stock, inventory and production chains.
This is what some new companies like Nicira or Big Switch Networks are offering. They literally want to become the “VMware of networking”. They believe in delivering all the intelligence at the edge, within their software switches embedded in hypervisors or with general-purpose boxes loaded with customizable software. And they happen to use OpenFlow for that.
SDN is a vision of a new network in which the specific hardware used does not matter. OpenFlow is one of the technologies that this software network could be built upon.
But neither of these by itself solves the challenges discussed above that DC fabrics should address: how to provide a traditional interface towards the network for both VMs and physical nodes, while at the same time implementing multi-tenancy, stability, losslessness and predictable bandwidth and latency for every node.
Progress has been made in the last several years, and new technologies have enabled the transition from traditional access-distribution-core, spanning-tree based DC networks to more modern loop-free designs. The evolution of these technologies into new protocols could even make some of them next to irrelevant. After all, most of the efforts up to this day have been focused on fixing the shortcomings inherent to bridging. Loops kill bridged networks, so we do away with loops. Great scale suffocates bridged networks, so we split our L2 domains into little pieces. If we do away with bridging itself, there is suddenly no need for these workarounds in the new fabric.
Take TRILL (and its current pre-standard siblings) for example. It’s really MAC routing. With TTLs to make loops a non-issue. And optimal forwarding. And fast convergence. In this picture, do we really need MLAG technologies anymore? Don’t we lose some (not all) of the benefits of fabric extension? Plus, multipathing makes it possible to have more than two interconnect nodes, so new designs like leaf-and-spine, with lots of edge nodes connected to many backbone nodes, become a reality: a two-stage Clos network built with simpler, smaller nodes, but more resilient overall. Suddenly, you get deterministic latency for free, for every node.
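A quick way to see the “deterministic latency for free” claim: in a two-stage leaf-and-spine design every leaf-to-leaf path crosses exactly one spine, so every pair of edge nodes sees the same hop count, and the number of equal-cost paths equals the number of spines. A small sketch with hypothetical sizes:

```python
# Sketch: in a two-stage leaf-and-spine (Clos) fabric, every leaf reaches every other leaf
# through exactly one spine, so the hop count (and thus the base latency) is uniform, and
# the number of equal-cost paths equals the number of spines. Sizes below are hypothetical.

from itertools import permutations

def leaf_spine(num_leaves: int, num_spines: int):
    leaves = [f"leaf-{i}" for i in range(num_leaves)]
    spines = [f"spine-{j}" for j in range(num_spines)]
    # Full mesh between tiers: every leaf connects to every spine.
    links = {(l, s) for l in leaves for s in spines}
    return leaves, spines, links

leaves, spines, links = leaf_spine(num_leaves=8, num_spines=4)

for src, dst in permutations(leaves, 2):
    paths = [(src, s, dst) for s in spines if (src, s) in links and (dst, s) in links]
    assert len(paths) == len(spines)             # multipathing: one equal-cost path per spine
    assert all(len(p) - 1 == 2 for p in paths)   # always exactly two hops: leaf -> spine -> leaf

print(f"{len(leaves)} leaves, {len(spines)} spines: "
      f"every leaf pair has {len(spines)} equal-cost 2-hop paths")
```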
I believe there is a good chance that the technologies that started the transition to cloud-enabling networks will not make it to the end of the journey, and will be superseded by new, simpler solutions.
The picture of the fabric itself as the abstraction unit seems attractive. The picture of an almost self-configuring network that can be extended by adding new nodes like Lego blocks seems very flexible. The overlay alternative presents a traditional network and trusts that the underlying fabric will do its job.
Either vision, the Mighty-Fabric or the Overlay-Sandwich, has its pros and cons. Both worlds should in any case evolve together, each keeping an eye on what the other is doing.
The need for a solid fabric is there in both scenarios anyway, and there is a good chance that the level of functionality required to build this kind of fabric is such that adding the extra functionality needed to handle VM visibility and multi-tenancy involves just a bit more work. There is also the possibility that the transition to these fabrics will not be so fast, as they involve new networking protocols and, probably, a higher learning curve for network admins and architects. That is more friction, a less friendly interface. And they usually involve hardware implementations that work best within a single vendor. Overlay solutions could then gain traction running over existing networks and obviate the need for a better fabric. It would prove that the fabric-as-a-whole abstraction was too beautiful to be true, and that the more disruptive abstraction was the “old network presented in an overlay”, instead of the “new transparent network”.
Software-defined networking is a trend that will definitely help in the construction of this architecture, be it within the physical fabric or at the overlay level. It opens the door to several other sources of input that have traditionally been ignored. Business-logic-based forwarding? Automated partial fabric shutdowns based on workload demands? OpenFlow is the effort to standardize how the pieces of the SDN world talk to each other, but SDN itself is the enabling trend. I see OpenFlow being to networking what Android is to mobility: a standard, open platform to build value upon, one that does not negate the value of more cohesive, single-vendor platforms (iOS).
In the end, the vision that provides the greater potential for operational and business disruption will be the one that is finally adopted by the market at large. Today it may be a bit too early to tell which of these, or whether a new one, will mean the bigger change. It’s still important to understand where steps are being taken, in what direction they are headed, and how they are complementary or competitive, in order to discover whether we are using the right level of abstraction, or whether we should take another deep breath and re-frame the question once again.
Update: Nicira has just been acquired by VMware. And so it begins. The great battle of our time.