Caution never went out of style. Software-Defined Storage is becoming mainstream, but does it bring benefits to the workloads you support? Identify the fundamental considerations that apply when choosing a storage system, and see what is suitable for your applications and data.

Today’s software-defined world brings opportunities (and the challenges that come with them) for you to manage your data faster, cheaper and more easily. In our ever-changing world, it is important to be aware of these new technologies, but we must not forget the fundamentals of data.

Technology changes, Fundamentals don’t.

When we talk to IT managers, many of them say that one of their biggest pain points is not anything related to their data, but the question of which technology to bet on: reliable, established but aging traditional storage; flexible, faster, better, but newer software-defined storage; or perhaps both. Every vendor they talk to gives the customer its own point of view (PoV), with facts suited to selling rather than to solving the problem.

It’s not the data that matters, but the applications you run and how they use data.

Sales Pitches change, Fundamentals don’t.

Rather than talking about Lenovo’s offerings, this article discusses a framework for classifying data based on what we have learnt while working with multiple partners in both the traditional and software-defined worlds. We hope to give you, the reader, a perspective on where the world of storage stands today (and perhaps for the next six months) and help you navigate to the decision that best suits you, rather than any particular vendor.

The Data Framework

Most companies run a myriad of applications, and each uses data in different ways. It’s not the data that matters, but how you use it.

At the outset, make a list of all the different applications that you run in your company. Yes, not data, but applications.

Then segregate the applications into two categories: those that scale with the compute (like databases, web or application servers), and those that scale independently (like CCTV images or backups).

The first step is to check whether the workload is worth virtualizing. For example, a workload with core-based software licensing may not be, as a physical server with a high clock speed will give better TCO than virtualizing it (e.g. Oracle DB). It is also not worth virtualizing a server with a heavy load requirement (high CPU, high RAM). The exact threshold varies with your usage, but as a rule of thumb, any server needing more than 20 cores or more than 128GB of RAM is probably better kept physical. This is where a 3-tier architecture would be a better fit. (Footnote: a few HCI solutions can offer bare-metal connectivity over iSCSI, which is also a great, and recommended, solution.)
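As a rough sketch, that rule of thumb can be written as a simple check. The thresholds (20 cores, 128GB of RAM, core-based licensing) come from the guidance above; the function and parameter names are purely illustrative.

# Sketch of the rule of thumb above; thresholds are guidance, not hard limits.
def keep_physical(cores: int, ram_gb: int, core_based_license: bool) -> bool:
    """Return True if the workload is probably better left on bare metal / 3-tier."""
    if core_based_license:          # e.g. Oracle DB: licensing cost dominates
        return True
    return cores > 20 or ram_gb > 128

# Example: a 24-core, 256GB database server stays physical;
# a 4-core, 32GB web server is a virtualization candidate.
print(keep_physical(cores=24, ram_gb=256, core_based_license=False))  # True
print(keep_physical(cores=4,  ram_gb=32,  core_based_license=False))  # False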

The second step is to identify all candidates for virtualization (if they are not already virtualized). Once you have the list of workloads ready, use a procedure called application grouping to organize how these workloads are managed.
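The grouping procedure itself will vary by organization; the sketch below shows just one way to start, bucketing workloads by whether their storage scales with compute or independently. The workload names and field names are illustrative only.

# Hypothetical application-grouping sketch; names and fields are illustrative.
workloads = [
    {"name": "Oracle DB",    "scales_with_compute": True},
    {"name": "Web servers",  "scales_with_compute": True},
    {"name": "CCTV archive", "scales_with_compute": False},
    {"name": "Backups",      "scales_with_compute": False},
]

groups = {"scales_with_compute": [], "scales_independently": []}
for w in workloads:
    key = "scales_with_compute" if w["scales_with_compute"] else "scales_independently"
    groups[key].append(w["name"])

print(groups)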

Today, Hyperconverged Infrastructure (HCI) has proved itself able to handle mission-critical workloads as easily as, or better than, 3-tier architectures. While VDI and Exchange are the easy business cases to make for HCI, there is technically no virtualized workload today that runs better on 3-tier than on HCI. Market-research company IDC treats All-Flash Arrays (AFA) and hybrid storage as separate categories of products and systems; in reality, however, an AFA built on Software-Defined Storage (SDS) is a subset of HCI architectures.

As an example, a Nutanix node can function as an all-flash (or hybrid) storage node and scale out to an unlimited number of nodes, whereas a Nimble Storage all-flash node cannot function as an HCI node. Hence the only reason to consider a Nimble product over Nutanix would be price. However, a non-HCI product still means adopting the 3-tier architecture, with the bottlenecks of a SAN fabric. If the application performs better with its data close to it, 3-tier doesn’t make sense, even with the latest and greatest SDS technology.

In fact, most workloads in this category are better off either on bare metal or on HCI nodes (regardless of brand) than on 3-tier. The only case where 3-tier looks better is when there is a human perception issue.

The third step is to move to the sizing and technology discussion. This is usually when vendor pre-sales teams get involved. One item that always helps vendors size correctly is the inventory sheet; not only does it give a view of existing configurations and utilization, it also lets the team size quickly rather than wait for data. It also means all the vendors in the discussion start with the same set of variables, reducing ambiguity in the proposals.
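The exact columns of an inventory sheet differ from company to company; the sketch below assumes a hypothetical minimal set of fields covering configuration and utilization, in the spirit of the paragraph above.

import csv, io

# Hypothetical minimal inventory sheet; the columns are illustrative, not a standard.
inventory_csv = """hostname,cpu_cores,ram_gb,storage_used_tb,avg_cpu_util_pct,peak_iops
db01,24,256,4.0,65,18000
web01,4,32,0.2,30,900
cctv01,8,64,120.0,10,400
"""

for row in csv.DictReader(io.StringIO(inventory_csv)):
    # Every vendor sizing against the same sheet starts from the same variables.
    print(row["hostname"], row["cpu_cores"], "cores,", row["storage_used_tb"], "TB used")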

There is a major difference between a characteristic of a system and a capability of that system. When it comes to storage, many vendors have blurred the line between the two and are selling on features rather than addressing a pain point. Rapid advancements in technology lead to new capabilities and characteristics, but the fundamentals don’t change.

It is very important to understand how data will be accessed and what it will be used for; that defines its characteristics. Making that process more efficient is the capability of the system.

It’s not the data that matters, but the applications you run and how they use data.

When it comes to storage, we can categorize workloads into three broad boxes (a short classification sketch follows the list):

#1

Performance & Latency – Applications such as analytics, where an entire data set needs to be accessed quickly, loaded into the compute, and processed, typically for datasets in the Terabyte range. All-flash arrays fit perfectly into this space, and numerous companies have offerings in this segment.

#2

Capacity – Large file services or farms of images, where file sizes may range from bytes to Terabytes and file counts from 100,000 to trillions. CCTV fits this category, where performance is not a concern but large capacity is. The same applies to workloads like backup, archive, render farms for digital media, and large chip-design companies with millions or billions of small to large CAD files.

#3

Performance & Capacity – Typically large file farms that need all of their data analyzed: Big Data farms that also need performance, artificial intelligence setups, large HPC, and so on. These form a growing industry where the lines between software and hardware are blurring.
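As promised above, here is a small, purely illustrative sketch that maps the example workloads named in the three boxes to their categories. The labels and data structure are assumptions, not a standard taxonomy.

# Hypothetical mapping of the example workloads above to the three boxes.
STORAGE_BOXES = {
    "performance_and_latency":  ["analytics on TB-scale datasets"],            # box 1
    "capacity":                 ["CCTV", "backup", "archive", "render farms",
                                 "CAD file repositories"],                      # box 2
    "performance_and_capacity": ["Big Data farms", "AI setups", "large HPC"],   # box 3
}

def box_for(workload: str) -> str:
    """Return the box a named workload falls into, or 'unclassified'."""
    for box, examples in STORAGE_BOXES.items():
        if workload in examples:
            return box
    return "unclassified"

print(box_for("CCTV"))  # capacity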

Broadly speaking, characteristics mostly depend on the system’s components, while capabilities depend on the software it runs. RAID is a capability, but performance is a function of the disks as well as the intelligence of the software. Never get caught up in the features or the details; instead, understand how the system addresses the pain points that are vital to your organization.

Framework To Evaluate Needs

When considering infrastructure, map out different vendor solutions against the framework below. It is the guidance we at Lenovo use to help our sellers correctly position Lenovo offerings to customers, but it can easily be adapted to suit your needs as well:

As a reminder, this is a storage-only framework, with the assumption that the application in question scales independently of compute.


Framework Capabilities Explained

Data Type: This concerns how data is accessed, and how information can be collated from both types of data. Big Data workloads use both structured and unstructured data, while CCTV stores videos (unstructured data) in a structured manner. Consider how often the data needs to be accessed; that will give insight into how to manage the various types of data.

Performance and Latency: These are defined by the application or the business need.

Capacity: Today’s hard disks reach up to 12TB per disk, but their performance is still low. To draw the necessary performance out of such disks, you need many of them to create a viable solution. However, we now also see 7.68TB and 15TB read-intensive SSDs being used, and this is where a lot of interesting solutions appear. High-capacity read-intensive SSDs are very good for analytics (previously the domain of hundreds of 10K RPM disk drives), making such solutions more affordable.
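To make the performance point concrete, here is a back-of-the-envelope sketch. The per-device IOPS figures (roughly 100 IOPS for a high-capacity 7.2K RPM disk, tens of thousands for a read-intensive SSD) are ballpark assumptions, not measured values.

# Back-of-the-envelope sketch; per-device IOPS figures are ballpark assumptions.
TARGET_IOPS = 50_000

HDD_IOPS_EACH = 100       # roughly what a 12TB 7.2K RPM disk sustains
SSD_IOPS_EACH = 50_000    # a single read-intensive SSD, order of magnitude

hdds_needed = -(-TARGET_IOPS // HDD_IOPS_EACH)   # ceiling division -> 500 disks
ssds_needed = -(-TARGET_IOPS // SSD_IOPS_EACH)   # -> 1 SSD

print(f"HDDs needed: {hdds_needed}, SSDs needed: {ssds_needed}")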

Access to Open APIs: Industry standards help both small and big companies innovate and offer systems that integrate with other vendors or even the open-source community. A must-have.

Hybrid Cloud: While cloud is a highly debated topic, hybrid cloud is not. Time is the decider, much like the decision between buying a house and renting one: the longer the data has to exist, the closer to home it should be. Public clouds are great for spikes in workload requirements, and having a system that can leverage both is prudent.

TCA/TCO: This is another key discussion point, and should consider the following factors (a simple cost sketch follows this list):
  • Quantifiable: Acquisition, Installation, Warranty, Manpower, Software, Power & Cooling, Upgrades.
  • Unquantifiable: Infrastructure Complexity, Cost of migrating to another vendor, Cost of not using open standards (if any), Time required to learn.
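As a simple illustration of the quantifiable side, the sketch below totals the listed cost factors over a planned term; every figure is a placeholder, not a quote.

# Hypothetical TCO sketch over a 5-year term; all figures are placeholders.
quantifiable_costs = {
    "acquisition":      40_000,
    "installation":      2_000,
    "warranty_support":  8_000,
    "manpower":         15_000,
    "software":         10_000,
    "power_cooling":     6_000,
    "upgrades":          5_000,
}

tco_5yr = sum(quantifiable_costs.values())
print(f"5-year TCO: {tco_5yr:,} USD")  # unquantifiable factors still need judgment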

Scaling Type: For most workloads, scale-out has better capabilities, is more cost-effective at large capacities, and comes with all the advantages that massive parallelism brings.
With the advancements in distributed file systems, scale-up storage technologies are slowly becoming a thing of the past. However, two workloads where scale-up remains applicable are:

  • Low-cost SMB setups, where storage budgets are usually below 30,000 USD. SDS architectures require hardware, software and support, with a minimum starting point of 2–3 nodes. This puts the entry point of SDS above 50,000 USD, which is not relevant to this market.
  • Large core-banking setups, where reliability matters more than cost, which need as few updates as possible and prefer tried-and-tested setups. As SDS is a newer technology prone to frequent updates (as simple as they may be), it has yet to be adopted for core storage. We see this changing in the near future, but not yet.

Fitting A Workload To The Framework

Now let’s take three example workloads and see how they fit into the framework:

The intent is that you can easily place your own workload into the above table, making it a good starting point for any technology discussion on storage.

CIO PROFILE
Kashish Karnick
Product Manager Storage and Software Defined, Lenovo Asia Pacific

New technologies crop up every day, creating opportunities for innovative solutions that can truly add value. But the complexity of technology is a roadblock to this value, and Kashish loves being able to simplify it, every day.