Yes, it's that time of the year again -- EMC World!
I don’t do many industry events, but EMC World is mandatory for me — if you want to seriously talk storage (and everything that touches it!), it’s in a league all by itself.
The event, the people, the content -- all top-shelf.
Now that I’m part of VMware’s storage and availability group, EMC World is particularly important, as it’s our best opportunity to meet storage and data protection professionals face-to-face -- and ask serious questions.
The hot topic? SDS -- software-defined storage -- something VMware is extremely focused on.
One of the events I’m sponsoring is a VMware Storage and Availability Roadmap Review Session: 90 minutes, small group format, all-NDA. You won't find it on the EMC World agenda, as we're doing this separately.
Last year, we ran similar sessions that went very well — and much of the input shaped what eventually was announced as VSAN.
This year, we’d like to do it again. Lots of great stuff in the pipeline, but we need your perspective.
If you’d like to participate in a wide-ranging discussion (and potentially influence the direction of our roadmap!), here’s what you’ll need to know:
- 90 minute session, multiple storage and availability topics, small group format
- by reservation only, as last year we were over-subscribed
- presentation will be led by Vijay Ramachandran, facilitated by me
- we’ll be in a nice guest suite in the Venetian (very short walk from the floor)
- designed for VMware customers and channel partners
- sorry, no technology vendors this time due to space limitations.
If this sounds like something you’d be interested in — please drop me an email at chollis (at) vmware (dot) com. We’ll do our best to accommodate everyone’s busy schedule.
Looking forward to a great discussion!
Like this blog post? Why not subscribe via email?
My blog has been off the air for the last six days or so. Thousands of daily visitors would either get an error message, or perhaps a mangled, text-only version of the site.
My blog service provider -- TypePad -- successfully fended off a massive DDoS (distributed denial of service) attack, but it took them many days to do so. Thousands of their customers could only watch helplessly, day after day, and hope for the best.
After all, our options were limited :)
While the final story has yet to come out, there is strong evidence that this was an extortion attempt. Ransom was demanded -- and refused. Several similar incidents have already occurred; more are inevitable.
A couple of thoughts?
First, I realized just how dependent I had personally become on certain online services: the blog, my calendaring, sync-and-share, financial, travel, etc.
Take any one of them away for a few days, and I'm in a world of hurt -- with no Plan B. Worse, I don't have any good ideas on how to mitigate the risk going forward without serious complexity and cost.
Second, I had to wonder -- how many online services are prepared to respond to a similar extortion attempt? I talk to many folks who are involved in disaster recovery and business continuity -- I'd be curious just how many recognize this relatively new threat, have successfully prepared themselves -- or have even tested their capabilities.
When information is the new wealth, we're all potential victims of digital extortion.
Welcome back to an ongoing series, exploring the new world of software-defined storage.
What Is A Data Service?
If you’d like to catch up, please take a moment to read the previous posts:
“Introducing The Software-Defined Storage Series”
“Why Software-Defined Storage Matters”
“Building The SDS Conceptual Model — Part 1”
In the last post, we introduced the notions of applications, their containers -- and policy. We also discussed how policy is interpreted by the control plane, while mediating access to services and providing the required perspective to multiple stakeholders.
In this post, we’ll extend our SDS model to discuss data services (snaps, dedupe, etc.) as well as the data plane where data is physically stored and persisted.
To briefly recap our SDS model, we started with applications, their containers, and policies attached to each container listing its specific requirements.
Policies are interpreted by the control plane, which provisions services needed by the application, mediating access to resources, and providing storage-related views to multiple roles within the organization.
The next stop on our journey is data services.
If I were to attempt a definition, it would be “something that alters the state of storage”. Yes, that’s vague.
It’s far easier to make a partial list of familiar data services:
- snaps, clones, remote replication, federation, geo-dispersion, etc.
- caching, tiering, striping, etc.
- dedupe, compression, thin provisioning
- backups, archiving to another location, etc.
- encryption, compliance, auditing, etc.
- layered presentations and/or protocols: file over block, object over file, etc.
Note that none of these data services actually “store” (i.e. persist) data itself — that’s done by the next layer down in the data plane — instead, these data services add value beyond simply storing and persisting 1s and 0s.
In This Model, Data Services Are Now Separate
Calling out data services as a distinct layer — separate and independent from the data plane and control plane — is an important feature of this SDS model.
And it will be very controversial to many.
Historically, we’ve grown up mostly with all-in-one array solutions: an external array might present NFS, with its snaps etc. implemented underneath the covers and bound tightly to a specific array. Or a fibre-channel block array that had its own remote replication or tiering mechanism.
Why the big change here? Why put data services above the data plane?
The big reason is uniformity.
If we truly want to uniformly compose dynamic data services without regard to the underlying hardware, we’re going to want a standard set of data services that aren’t bound to a specific piece of external hardware.
Everyone who’s worked with storage stuff knows that — while certain data services might look the same on different arrays — they are certainly not implemented the same! A snap is not a snap is not a snap as you move across different arrays.
Separating data services from how data is actually stored creates powerful ancillary benefits as well: for example, being able to stack data services, dynamically invoke the resources required to implement them, align them with application boundaries, and more. All of those are nice, but the real big deal is that things work the same way, regardless of what you’re doing — or what storage you’re using.
To be clear, nothing in this SDS model precludes the use of data services that are tightly bound to a specific piece of hardware; but the strong belief is that — over time — the preference will be for data services that are independent of the data plane.
Stackable Data Services?
Yes — that’s the goal. An application container’s policy should be able to dynamically compose a stack of data services that work together in a logical manner.
Here’s this application data container.
I’d like it to be cached and tiered, a remote copy made for disaster recovery, continuous data protection to guard against corruption, old data archived out to a separate location, have it be encrypted as it’s sensitive — and an audit trail please, as we have auditors to report to.
But please do this just for this one application container, and not a bunch of other stuff that doesn’t need it. Don’t make my choices today difficult to change in the future.
And ideally do this regardless of the hardware being used :)
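That wish-list is really a service-composition problem: the container's policy names the services, and something has to assemble them into a working stack. Here's a minimal sketch in Python of how that might look; every name here is purely illustrative, not a real API.

```python
# A per-container policy lists the data services the application wants.
# All names are invented for illustration -- not a real product API.
POLICY = {
    "app": "payroll-db",
    "services": ["cache", "tier", "replicate-remote", "cdp",
                 "archive", "encrypt", "audit"],
}

def compose_stack(policy, data_plane):
    """Wrap each requested data service around the one below it,
    bottoming out at the data plane that actually persists the bits."""
    stack = data_plane
    for service in reversed(policy["services"]):
        stack = (service, stack)
    return stack

stack = compose_stack(POLICY, "data-plane")
# The outermost layer is "cache"; the innermost element is the data plane.
```

The point of the toy: only this one container gets this stack, and changing the policy list tomorrow simply re-composes it, which is exactly the "don't make my choices today hard to change" property asked for above.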
Now, to be fair, the vast majority of data services in use today are quite array-specific. Indeed, that’s where a lot of the “secret sauce” originates in the array business.
But — just as we’re starting to see software-only implementations of storage arrays (sometimes dubbed server-side storage, or server SANs) — we will inevitably see more software-only implementations of data services that are quite agnostic to their back end.
Next: The Data Plane
As we get to the bottom of the software-defined storage model, at some point we actually have to store (or persist) data. Here we also have distinct choices regarding our dynamically composed storage service: capacity, performance, cost, redundancy, sharing attributes, etc.
It’s useful to speak of “capabilities” of a given data plane: how much, how fast, how protected, how costly, etc. Ideally, these capabilities are potentials, and not actually allocated or provisioned until requested — as a composable service delivered on demand.
In our SDS model, these potential capabilities are exposed for consumption, and then instantiated when requested, driven by an application container’s policy choices.
Change the policy, change the physical instantiation.
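One way to picture this "potentials, not provisions" idea: a data plane advertises what it could deliver, and allocates nothing until a policy actually asks. A hypothetical sketch, with all names and numbers invented for illustration:

```python
class DataPlane:
    """Illustrative only: a data plane advertises capabilities as
    potentials; nothing is allocated until a policy requests it."""

    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = capabilities  # what *could* be delivered
        self.allocations = []             # what *has* been instantiated

    def can_satisfy(self, policy):
        # Every attribute the policy asks for must fit within capability.
        return all(self.capabilities.get(k, 0) >= v
                   for k, v in policy.items())

    def instantiate(self, policy):
        if not self.can_satisfy(policy):
            raise ValueError("policy exceeds this plane's capabilities")
        self.allocations.append(dict(policy))  # provisioned on demand
        return self.allocations[-1]

# A server-side data plane (numbers invented for the example).
plane = DataPlane("server-side", {"capacity_gb": 10000, "iops": 50000})
alloc = plane.instantiate({"capacity_gb": 500, "iops": 2000})
```

Change the policy and re-run `instantiate`, and you get a different physical allocation from the same advertised potential.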
Data Plane Choices
The SDS model we’re building here has three choices for a data plane.
One, of course, is the familiar external storage array: many choices on protection, performance enhancement, etc.
A second option would be to persist data using an external cloud service, in which case protection and other mechanisms would be the responsibility of the service provider.
And the third choice would be a software-only storage stack that runs on standard servers, with VMware’s VSAN being but one example.
As long as each data plane provider is dynamically composable — as a service — using key policy attributes like capacity, redundancy, performance, cost, etc. — each would qualify as software-defined storage under the model being presented here.
It should also be noted that any data plane has one or more “personalities” (data structure and protocol) in how the storage capacity is presented: blocks, filesystem, objects, key-value, etc.
But that’s not the end of the story, as it is possible that an upper-level data service may impose a different personality than the one that is native to the data plane. EMC’s ViPR, as one example, can present files over blocks, or objects over files. NetApp’s V-Series is yet another example.
Software-Based Vs. Hardware-Based
There will be some purists who will inevitably recoil at the fact that the software-defined storage model presented here allows for purpose-built hardware as part of the definition.
Before you decry me as a vendor shill, please consider the following:
- With this SDS model, the focus is on dynamically composable (and stackable) storage services, driven by application-centric policies. Whether or not you choose to implement those storage services entirely in (virtualized) software, or entirely in purpose-built storage arrays, or more likely some combination — it’s basically the same functional architecture.
- While we could debate the pros and cons of doing things one way or another, those are implementation choices, not architectural ones. For example, the palette of potential data services available today is very rich and mature when it comes to external storage arrays, and somewhat less so when considering newer software-only solutions.
Our model for software-defined storage should allow for various implementations, and not religiously enforce one over the other unless there’s a clear architectural reason for doing so.
Choices Still Abound
For many attributes in this model, there are still overlapping choices of where specific functionality goes.
One example: let’s say you’d like two copies of data for redundancy purposes. Should that go (a) in the data plane, (b) provisioned as a data service, or perhaps (c) something done by the application itself?
You could repeat that question for deduplication, caching, tiering, personality, etc. I think the tradeoff will end up being standard capabilities used consistently vs. specific and unique attributes used selectively.
Clearly, there is no “right” answer; it depends. But as long as the capability could be dynamically composed via programmatic interfaces (and done so aligned to application boundaries), it would qualify as software-defined storage using the model presented here.
In the next post, we’ll take a look at how this SDS model differs from the ones in use today — and what the impacts might be.
If we’re going to dig into software-defined storage, we’re going to need a conceptual model — just so we can keep the discussion organized.
The particular model I’ll be using for this discussion is the one VMware currently uses.
Vendor bias aside, I’ve personally found it the most useful model out there for explaining not only software-defined storage, but exposing important differences as compared to the way things are done today.
The model itself is not bound to any specific technology — but you will find many aspects already implemented in VMware’s current product set.
And an open invitation: if someone has a better model, please share it!
Models Do Matter
I think of a conceptual model as a precursor to an architectural model. The conceptual model details the functions and how they’d ideally interact; the architectural model instantiates them into a specific set of technologies and use cases.
As with any model, you’ll certainly find familiar functions and concepts — but here they are grouped and abstracted in different ways than you might expect.
If you’re new to this series — and are willing to do some prep — you might want to read this post and this post.
Enterprise architects know that how you group and abstract functionality is critically important. It’s not just drawing pretty pictures on a whiteboard. It’s serious stuff, not to be considered lightly.
Behind every IT architectural model, there’s always an organizational one implied. Change the architecture; you’ll end up changing how the organization uses it.
A good architecture often fails in the hands of a poor organizational model; great organizational models can be greatly hampered by poor architecture.
I think this is an extremely relevant point, as so many models I see are explicit accommodations to the in-place way of doing things. That’s dangerous territory in my book: best to envision a better model, and then be clear about the organizational implications. Work back from that point as needed; don’t start with a long list of compromises.
So let’s begin?
Application Centricity — And Policy
It’s not just a buzzword, it’s largely why IT exists in the first place: to deliver the applications that people want to use. For a discussion of software-defined anything, it’s the logical starting point.
In any SDDC (or SDS) discussion, we start with the needs of applications (or logically related groups of applications, if you prefer), and work downwards. At a high level, I think of an application as comprised of logic, data, resources required — and policy.
It is helpful to distinguish more precisely between application logic, and the “container” of information, resources and services it uses. While there are cases where the application logic itself might potentially take direct control of the resources it needs, I would argue this approach is frequently undesirable: brittle, inflexible, inefficient and very challenging to code.
One of the key ideas in SDDC is that policies are instead bound to application containers (e.g. virtual machines). Those policies are then used to drive the behavior of everything else: resources needed, services required, security — you name it.
Think of a bar code affixed to a package.
The bar code describes its contents, and how it must be handled. And you don’t need to open up the package every time a decision needs to be made.
It’s a simple and elegant construct.
Change a policy, and you change a behavior of an application’s container. Establish compliant policies, and verifying compliance becomes that much easier. Application developers are not disenfranchised — they can specify external requirements (e.g. redundancy, performance, etc.) without needing to define the implementation of those policies.
Going back to previous notions of composability, it should be clear that a policy drives the composition of supporting services. Think of a policy statement as a build manifest - or a blueprint - exactly what’s needed for a specific application, and at a specific point in time.
Storage and Policy
If we think about storage, there’s a very long list of attributes that could potentially be part of an application container’s policy: capacity, performance, availability, protection, encryption, retention, compliance, cost optimization, geographical location — and that’s only a partial list!
In our software-defined storage world, we’d ideally be able to dynamically compose a set of storage services needed at a particular point in time - without regard for what specific hardware is being used, how it’s currently configured, etc.
The notions of policies and composability are very extensible.
New policies — reflecting new requirements and new compositions — can be slightly modified versions of existing ones. Going further, policies can be conditional, as in if heavy-demand and end-of-month, then allocate-more-performance. Or if requested-service-not-available, try next-best-approach.
I also believe this approach to be reasonably future-proof.
New technologies — whatever they may be — are either new services that can be composed via policy, or better implementations of existing ones.
Yesterday, my highest-performance-possible policy was implemented via stripe-on-15K-disks. Today, it might be cache-using-flash, and tomorrow it might be use-all-flash.
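In other words, the policy name is the stable contract; what it resolves to underneath is free to change. A toy sketch of that indirection (names purely illustrative):

```python
# The policy name stays stable; today's implementation of it can change
# without touching any application container. All names are invented.
IMPLEMENTATIONS = {
    "highest-performance-possible": "cache-using-flash",  # was stripe-on-15K-disks
    "lowest-cost-possible": "dedupe-on-nearline",
}

def resolve(policy_name):
    """Look up today's implementation of a stable policy name."""
    return IMPLEMENTATIONS[policy_name]

# Tomorrow, a one-line change swaps in a better implementation:
IMPLEMENTATIONS["highest-performance-possible"] = "use-all-flash"
```

Applications keep asking for `highest-performance-possible`; only the mapping underneath moves with the technology.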
Granularity Matters, Perspective Matters
I believe the ability to dynamically apply specific policies to specific application boundaries is an essential defining characteristic of software-defined storage. Individual application requirements change frequently, and simply provisioning vast buckets of pre-defined capabilities certainly isn’t efficient, effective or responsive.
But that raises the question — where in the stack can one obtain the required clear view of application boundaries — as well as the resources and services they touch?
VMware believes the hypervisor (vSphere in this case) is in an architecturally privileged position to interpret and enforce per-application policies.
It sees the application itself (actually, the application container), as well as can dynamically arbitrate all the resources and services an application may be using.
I can’t disagree with this perspective — trying to do it “somewhere else” in the stack makes your head hurt. As a specific example: in the storage array world, it is always devilishly tough to discover application boundaries, and interpret policies. “Something else” would have to do it on behalf of the storage array — an administrator, management software, etc.
I can only assume the network world struggles with the same challenge.
Control Planes And Policies
The next stop on our software-defined storage tour is the layer responsible for interpreting policies, and then translating them into the required composed resources and services.
You might call this layer “management” or something similarly non-descriptive; many of us now call it the control plane, or — more accurately — control planes, as there are always multiple points of control and monitoring in any large environment.
Here is where the balancing act occurs between supply and demand. Here is where issues and problems are resolved, ideally in a manner transparent to both application and user. Here is where we continually automate — and re-automate — to the greatest degree possible.
There is nothing wrong with the notion of a "single pane of glass" — it’s just that everyone wants their own! Regardless of whether we’re considering a traditional IT organizational model, or a newer IT-as-a-service model — you can clearly delineate multiple points where storage has to appear in context with other relevant items.
One Thing — Many Perspectives
Let’s consider storage as an example of this.
If your operational model defines a dedicated storage administrator, they’re certainly going to want a storage-centric view of their world: what’s on the floor, who’s using it, how is it configured, how is it performing, etc.
If your model defines an infrastructure-as-a-service delivery manager, that person is going to want to see storage in the context of compute and network — all delivered as a consumable infrastructure service.
If you have application administrators and database administrators, the story will be familiar — they too will want to see storage in the context of their application and their database.
If you have dedicated availability administrators (data protection, business continuity, etc.) they will want to see storage in the context of applications and the services that protect them.
If you provide chargeback or showback, the portal will inevitably include storage services provisioned and consumed. Finance will want to understand the costs of storage services delivered. Compliance will want to understand how compliance policies are being enforced. Capacity planning will want forecasting models and constraints.
And that’s just a partial list.
Here is my point: it is unrealistic to think in terms of a single point of control when it comes to software-defined storage, or any dynamic, composable service for that matter. There will be many points of control, bounded by operational constraints: some passive observers, others able to change policy and thus change the composition of services and resources consumed.
The model for who-can-do-what must not be static either; it should quickly evolve and be as dynamic as applications themselves. The natural tendency will be to empower more organizational functions as needs evolve and maturity increases. When it comes to division of responsibilities, there are no right answers, only temporary solutions.
The implication? No more storage kingdoms.
Next up: data services and the data plane.
We live in an information age.
All those zettabytes of ones and zeros need to live somewhere. If they are to be of any value, they must be stored, protected and managed. The more information we produce, consume — and depend on — the more storage matters.
This was true twenty years ago; it will be true twenty years hence.
At the same time, it appears that software is eating our world: extending the power of human intellect in ways that continually surprise us — now often powered by the avalanche of information we are creating about ourselves and the world around us.
In particular, software is transforming how we think about data centers: the technologies and operating principles that enable us to produce, consume and act on information quickly and efficiently.
Software is inevitably changing core data center technologies — compute, network and storage — both individually and how they work together.
I believe this is what makes software-defined storage an interesting and relevant question for IT architects: how can we use software to become far better at storing, protecting and managing information?
These people are thinking about the future, and what it might bring.
The Expanding Digital Universe
It’s not hard to come to the conclusion that — yes — we’re all generating and consuming more information. But what may not be as obvious is where it’s all coming from — and how we will be expected to harness it.
The recent EMC/IDC Digital Universe report offers an excellent overview, giving good insights into the shape of the information world to come: dramatically more information from sensors and mobile devices, the growing need to extract analytical insights in near-realtime, as well as the obligation to secure and protect it from new threats.
Business leaders recognize the shift, and are starting to think differently.
An organization’s information base is now a critical asset on the balance sheet; the basis for new services, improved efficiency as well as competitive differentiation. In the information age, data matters.
The vast majority of this enterprise information wealth will inevitably be the responsibility of enterprise IT groups. While you can certainly outsource capacity, you can’t outsource accountability.
Challenges In The Here And Now
But we don’t have to look to an emerging world to make a case for relevancy; there are serious concerns in today’s pragmatic IT world. An awful lot is spent on storing, protecting and managing information, resulting in a continual search for new and innovative ways to get more value from every IT dollar.
While it’s true that unit costs for storage media will continue to drop, this modest cost is only a small component in a complex value chain that delivers consumable storage services to applications.
I pay $4 for a cup of coffee at Starbucks; the cost of the raw coffee beans is but pennies.
Indeed, to get an accurate lifecycle view of storage costs, we need to think less as technologists and more as manufacturers and retailers: considering labor, overhead, efficiency, distribution, consumption — and rapidly responding to changing demands.
The ultimate value of software-defined storage is greatly magnified when taking this broader view. And as information volumes continue to double and double and double again, more IT organizations will shift their view from the cost of the ingredients to the value of the service delivered.
“Software-Defined” — In One Word
What makes something software defined?
Definitions vary, but I have mine: composability — the ability to dynamically compose a service, on demand, and under programmatic control. If it helps, think of an automated factory line that builds each item to spec: quickly, accurately and efficiently.
Why is composability so important? Three reasons: efficiency, effectiveness and responsiveness.
Without composability, the catalog of services is largely static, and must be forecasted and provisioned far in advance. I call this “have a hunch, provision a bunch”. Since coming up short is not a good thing, over provisioning is the norm. And that’s not efficient.
Without composability, a given application’s requirement must “fit” within the confines of a pre-defined catalog of services. Making changes or adjustments is costly and time-consuming.
I wear a size 10.5 shoe. If the store only has size 10 or 11, I have some hard choices to make.
Composability ensures that the service delivered is effective for the requirement at hand.
Without composability, a service will likely be static in a world of application requirements that are inherently dynamic: more/less performance, more/less protection, more/less capacity — services need to be responsive to the needs at hand.
We’ve already seen how composability can be effective in the world of server virtualization — or software-defined compute if you prefer. Here is what I need to compose for this new application: virtual CPUs, memory, operating system images, application code, high availability, etc. As I’m doing this, I don’t have to think much about what flavor of hardware I’m using, or even its current configuration.
Compared to the physical world: far more efficient, effective and responsive.
The same inevitable trend has now begun in the network world via software-defined networking. Here is what I need to compose for this new network: topology, protection, services, bandwidth, latency, resiliency, security, etc.
As I’m doing this, I don’t have to think much about what vendor’s hardware is underneath it all.
Once again: far more efficient, effective and responsive.
We want the same from storage.
We want to be able to think of storage services as composable. Here is an application requirement. Here is what I would like to compose in terms of capacity, performance, protection, location, security, etc. I can do so efficiently, effectively — and respond quickly to any changes in circumstance.
Can we do this with today’s storage technologies? Perhaps. But there’s a clear opportunity to do things much, much better.
Other, Less Satisfying Definitions Of SDS
Just for completeness, I thought I’d share my thinking as to why I've discarded the many alternate definitions floating around for software-defined storage.
One popular definition for SDS is anything with an open API: REST or similar. I see that more as a necessary condition vs. a defining attribute. If I’m going to be dynamically composing storage services, of course I want to do that via an API.
However, the reverse isn’t true: having an API doesn’t mean I can dynamically compose storage services with it.
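A toy contrast may help here; both of these expose an "API", but only the second can compose a service to spec. Every call below is hypothetical, invented purely to make the distinction visible.

```python
# Merely having an API: a fixed, pre-provisioned catalog exposed over
# something REST-ish. You consume what was forecast; nothing is composed.
def list_luns():
    return ["lun0", "lun1"]  # take it or leave it

# Composable: the API accepts a policy and builds the service to spec,
# on demand, under programmatic control.
def compose_volume(policy):
    return {
        "capacity_gb": policy["capacity_gb"],
        "replicas": policy.get("replicas", 1),
        "encrypted": policy.get("encrypted", False),
    }

vol = compose_volume({"capacity_gb": 200, "replicas": 2})
```

The first system has an open API and still fails the composability test; the second passes it regardless of transport.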
Another definition you’ll hear is that software-defined storage has to be entirely software-based, and that it runs on standard server hardware. While that might be a desirable attribute for some reasons, it doesn’t speak to what that storage actually does, nor does it speak to how it’s controlled.
As an example, I could have a very dumb and uncommunicative storage stack running on bog-standard hardware — it would meet this definition, yet it wouldn’t be “software-defined” in any meaningful sense. Indeed, you’ll find that many of the current crop of “software-only” storage stacks lack this key ability to programmatically compose dynamic storage services.
Just because you’re software doesn’t mean you’re software-defined!
An interesting side-effect is that this “composability via API” definition can be inclusive of traditional external storage arrays — if they can compose a variety of storage services programmatically. One might argue that a software-only approach can do better than a hardware-based approach, but that’s entirely subjective — and not a defining attribute.
Software-Defined Storage As Part Of The Whole: SDDC
We must not lose sight that each software-defined technology is part of a greater whole. The prize at hand is the software-defined data center: an architectural pattern where all services are dynamically composable, both individually and collectively.
If we don’t keep this larger view firmly in mind, we can end up simply re-engineering existing technology silos. Put differently, any consideration of SDS implies new, converged operational models that span traditional boundaries.
It’s Not Going To Be Quick, And It’s Not Going To Be Easy
It took many years for server virtualization to be accepted as the norm in modern data centers. Part of this was technological maturity; a far greater challenge was the inherently slow pace of technology adoption and resulting organizational changes.
The way things were done had to be fundamentally re-engineered in order to fully exploit the benefits of server virtualization. And if you’ve been around ten years or more, you clearly remember how we used to do things.
I don’t spend much time in the networking world, but — from all appearances — that journey has clearly started. You can clearly identify progressive organizations who recognize the benefits of software-defined networking, and are motivated to move quickly. You can also see another tranche of slightly more conservative organizations who are studying and evaluating the technology, and are likely to move in that direction before long.
And, of course, some folks who aren’t interested in the least :)
I would expect to see the same pattern when it comes to software-defined storage.
My goal here is to be helpful to those progressive architects who “get” that storage is a fundamentally important technology in their worlds, and who appreciate that the fundamental architectural models have begun to change.
These people are motivated to capitalize on what’s now becoming possible — and do so as quickly as possible.
Next, we’ll start our tour through a software-defined storage model.