What is Open Converged?

This past December I made a switch and started to work for Datrium.  One of the most common things I hear in a first meeting with a prospect is "What is this open converged stuff?".  What I normally do is give a very brief history of converged infrastructure to explain how we got to this moment.

Starting about five or six years ago, the conversation from companies was Converged Infrastructure, or CI.  This meant that you continued to buy servers from one vendor, storage from another (probably separate from the compute vendor), networking, and so on, but you bought them all at once, converging your budget as well.  You then had an integrator or a partner wrap it all together in a rack with a list of approved software levels for all of the hardware and VMware combined.  For some, it took the inconsistency of building environments off the table, but it still forced you into a situation where each item was managed independently.  And when it was time to upgrade, you had to upgrade the entire stack again because the old stack was no longer supported.  It also kept you living in a world of RAID, pools, and LUNs/volumes.  Data migrations are painful for many organizations, so ripping and replacing can take months.  And in the end, the biggest bottleneck was your storage controller.  Either you had dual active/active controllers, in which case you could never use more than 50% of either one because you would be in trouble during a controller failure or crash, or you had an active/passive architecture, where the second controller simply passed front-end I/O to the primary node and waited in case of emergency.  The whole process needed some optimization.
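
To make that controller math concrete, here is a minimal sketch (my own illustration, not any vendor's published formula) of why an active/active pair caps out at 50% per controller:

```python
# Illustrative only: why an active/active controller pair caps out at ~50% each.
# If one of N controllers fails, the survivors must absorb its load, so each
# controller can only run at (N - 1) / N of its capacity in steady state.

def safe_utilization(num_controllers: int) -> float:
    """Max steady-state load per controller that still survives one failure."""
    return (num_controllers - 1) / num_controllers

print(f"Dual controllers: {safe_utilization(2):.0%} usable each")  # 50%
print(f"Four controllers: {safe_utilization(4):.0%} usable each")  # 75%
```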

Hyper Converged came out of the 2000s tech companies, Google for instance.  The idea here was that converging your infrastructure wasn't enough.  How do we take advantage of flash as more than just a cache tier and put it right next to the application?  How do we provide a single management interface for the entire stack so folks aren't jumping between consoles?  From a simplicity standpoint, HCI was a big benefit to VM admins everywhere: they could live in vCenter and do all their work from a single console.  But with that came some additional issues.  Vendor lock-in continued, especially as it related to nodes, which had to be identical to be part of the same cluster.  Rich data services, like protection, dedupe, and compression, came at the expense of VM performance, so people were forced to choose between lowering their storage footprint and getting optimal VM performance.  It also meant keeping multiple copies of the same data on different nodes to prevent outages in the case of a node failure.  That architecture created a lot of write amplification, where a write to one node had to be copied to multiple other nodes to give you the same resiliency as your own SAN.  That created situations where node-to-node traffic could be up to 75% of the overall HCI network traffic.  And lastly, you have to juggle this compute/storage combination and accept that you'll either have a lot of compute you can't use...or a lot of storage you don't.  It became pretty wasteful for infrastructure budgets.
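
To put a rough number on that write amplification, here is a back-of-envelope model (my own simplified math, not any vendor's published figures):

```python
# Back-of-envelope model of HCI write amplification (simplified, illustrative).
# With a replication factor RF, one logical write is stored RF times: once on
# the receiving node and RF - 1 more times on other nodes, with each extra
# copy crossing the node-to-node network.

def east_west_share(replication_factor: int) -> float:
    """Fraction of write-path network crossings that are node-to-node copies,
    assuming the front-end write crosses the network once and each of the
    RF - 1 replicas crosses it once more."""
    rf = replication_factor
    return (rf - 1) / rf

for rf in (2, 3, 4):
    print(f"RF={rf}: {east_west_share(rf):.0%} of write traffic is node-to-node")
# RF=2: 50%, RF=3: 67%, RF=4: 75% -- in the neighborhood of the figure above,
# before you even count rebuild and rebalance traffic.
```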

Open converged came about as Internet companies like Google and Facebook started to understand the limitations of the above architectures and wanted more of a rackscale architecture.  Think of storage at the top of the rack with compute nodes underneath driving it.  Around 2010 this began to develop as an idea, and it is much closer to what Datrium is.  Think of the compute in your VMware environment: how utilized are your processors?  Most studies have shown that the average customer is somewhere between 25% and 35%.  Wouldn't it be effective to use that excess compute to drive the performance of your environment?  How do we leverage flash at the host for performance?  And how do I keep the data services I love from my SAN (and maybe add a few more)?  That is where Datrium starts.  I think it's best to use a few bullets here to explain, at a high level, what we do.

  • Processing Power - We install our software on your ESX, KVM, or Docker hosts, where it leverages your existing free CPU to drive I/O.  For every core you give our software (a minimum of 2.5 is required), we will give you 10k IOPS.  And that is per host, so a system that can scale to 128 nodes can meet just about any workload (see the back-of-envelope sketch after this list).
  • Host flash - HCI got it right that flash is best at the host.  With a minimum of two flash drives in the host, we can store the entire host's VMs in flash and serve all reads from there.  We also dedupe, compress, and encrypt at the host.  In an average environment, that means we are serving 70% of your data traffic local to the host that is requesting it.  And we are doing that at the cost of server-side flash, which is a fraction of the cost of SAN flash.
  • Durable data - We provide a data node, either all-flash or with SATA drives.  This is your standard SAN architecture: dual controllers, dual power supplies, battery backed, with NVRAM in both controllers.  All writes in your system go to this node, and this is where we do snapshots and replication to other DVX systems or...the cloud!  This is an NFS node that can scale out to 10 total, and with our data reduction you can grow to over 1PB of data in a single namespace.  No volume management, no LUNs.  It is completely VM/container aware.  And we globally dedupe (more on that in a future blog post).
  • Snapshots/Backup - On top of all of that, we have policy-based snapshots and replication built in.  This means I can build a policy for all my SQL servers that have *SQL* in the name, and they are added to the policy automatically.  A single policy can define hourly, daily, weekly, and monthly backup/retention schedules while also defining replication of those snapshots to other DVX systems or, as mentioned before, to the cloud (another future blog post).  See the policy sketch after this list.
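
To make the processing-power bullet concrete, here is a back-of-envelope sketch of how I/O scales as you add hosts.  The 10k-IOPS-per-core and 2.5-core-minimum figures come from this post; the host and core counts are made-up examples.

```python
# Back-of-envelope I/O scaling using the 10k-IOPS-per-core figure from this
# post; the host and core counts below are made-up examples.
IOPS_PER_CORE = 10_000
MIN_CORES_PER_HOST = 2.5

def estimated_iops(hosts: int, cores_per_host: float) -> float:
    """Rough aggregate IOPS if each host donates cores_per_host cores."""
    assert cores_per_host >= MIN_CORES_PER_HOST, "the post cites a 2.5-core minimum"
    return hosts * cores_per_host * IOPS_PER_CORE

print(f"{estimated_iops(8, 4):,.0f} IOPS")    # 8 hosts x 4 cores   -> 320,000
print(f"{estimated_iops(128, 4):,.0f} IOPS")  # at the 128-node max -> 5,120,000
```

And for the snapshots bullet, here is a hypothetical sketch of what a name-pattern protection policy could look like.  This is not Datrium's actual interface, just an illustration of the idea.

```python
# Hypothetical sketch of a pattern-based protection policy -- not Datrium's
# actual interface, just the shape of the idea described above.
from dataclasses import dataclass, field
from fnmatch import fnmatch

@dataclass
class ProtectionPolicy:
    name_pattern: str                               # e.g. "*SQL*" matches VM names
    retention: dict = field(default_factory=dict)   # schedule -> snapshots to keep
    replicate_to: list = field(default_factory=list)

    def matches(self, vm_name: str) -> bool:
        return fnmatch(vm_name, self.name_pattern)

sql_policy = ProtectionPolicy(
    name_pattern="*SQL*",
    retention={"hourly": 24, "daily": 7, "weekly": 4, "monthly": 12},
    replicate_to=["dr-site-dvx", "cloud"],
)

# New VMs are picked up by name automatically -- no per-VM configuration:
for vm in ("prodSQL01", "webserver02", "devSQL99"):
    print(vm, "->", "protected" if sql_policy.matches(vm) else "not in policy")
```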

As you can tell, there is a lot to digest here.  But this high-level detail should help introduce you to Datrium and give you enough information to understand who we are.  In future posts I'm going to take each of these items and dive deeper into the benefits to customers.

But in the end, ask yourself: isn't a true converged offering one that allows you flexibility?  The ability to mix and match compute based upon workload type while sharing the same storage?  One that adds performance every time you add compute, and adds bandwidth and capacity separately from that?