Enterprises today generate large amounts of data. This data can be used to gather valuable hidden insights and discover new business opportunities. However, the key to reaping the business benefits of this data is effective data management.

A famous company once coined the phrase: “the whole world is… data!” If you think about it, it makes sense. YouTube, Uber, Deliveroo, SAP, Facebook and other technology businesses all run on data. Today’s increasingly connected world reflects that very phrase. Data growth is now exponential, driven by the rise of the Internet of Things, social media and other technologies that generate huge amounts of data. A substantial amount of data also means a substantial amount of untapped knowledge for companies.

Companies generate data. In many cases, they have more data than they need. But that data can only be converted into meaningful intelligence if organisations manage it efficiently and analyse it for business insights. The possibilities of what one can do with the data are endless.

Possibilities of Big Data:

  • LinkedIn could use Twitter feeds posted by selected customer segments to gauge market demand and trends for specific products, or even to identify the need for a new product: the source content for innovation.
  • Toyota used virtual reality to let customers test drive and experience a new car before it was ever physically built.
  • Biometric data for the entire population of a country like India was collected to analyse citizen behaviour and create programs specifically for farmers.
  • A computer system could visually recognise the gender of a person walking into a retail store, regardless of nationality, and then offer them relevant products.

Before corporations can analyse their data, they must first manage it effectively.

Effective management of “hot data”

Frequently accessed data is called “hot data.” Databases, web farms, emails, web pages and similar workloads typically rely on hot data. With the cost gap between flash and traditional spinning disks becoming smaller, the adoption of all-flash arrays for mission-critical data changed from being a fad into standard industry practice. The leader in the all-flash array space is Nimble Storage.

But the real growth in flash adoption came with hybrid arrays, which tier data between flash and traditional disks.

This capability then extended to storage virtualization, where customers could combine their existing traditional storage with new all-flash arrays, and the virtualization software would tier data between these two (or three) layers. IBM’s Spectrum Virtualize/Storwize and DataCore offer such capabilities and are the most mature in utilising these technologies.
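The tiering idea described above can be sketched as a toy policy, purely as an illustration (the threshold, function name, and block IDs here are hypothetical, not any vendor’s actual implementation): blocks that are accessed often are placed on flash, while the rest are demoted to spinning disk.

```python
# Illustrative sketch of flash/disk tiering: promote frequently accessed
# ("hot") blocks to flash, demote infrequently accessed ("cold") blocks
# to traditional spinning disk. All names and thresholds are hypothetical.

HOT_THRESHOLD = 10  # hypothetical: accesses per window that count as "hot"

def place_tier(access_counts: dict) -> dict:
    """Map each block ID to a storage tier based on its access frequency."""
    return {
        block: "flash" if count >= HOT_THRESHOLD else "disk"
        for block, count in access_counts.items()
    }

# A database page read 42 times lands on flash; a backup chunk read once
# stays on cheaper spinning disk.
print(place_tier({"db-page-1": 42, "backup-chunk-9": 1}))
# {'db-page-1': 'flash', 'backup-chunk-9': 'disk'}
```

Real hybrid arrays track access frequency continuously and move data in the background; the point of the sketch is only that placement is a function of access patterns, not of capacity.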

How to store large capacity “cold data”

On the opposite side of the coin, you have what is commonly referred to as “cold data,” or infrequently accessed data, of which there are two types. First, there is traditional structured data, which enterprises have long relied on; this is typically backup or archival data. The other type is large file systems, which store large video files, images and other sizeable file types. File sizes can range from a few kilobytes to several terabytes, and file counts from a few hundred to billions or even trillions.

Furthermore, it’s a very different task to manage 100 million files of 2MB each than it is to manage 1,000 files of 200GB each, even though both occupy the same total capacity of roughly 200TB. Some could be pictures, some videos, some blogs, some Excel sheets, others PowerPoint files, and so on. Existing architectures are not able to support this level of granularity from a data management standpoint, and so new software defined architectures evolved to address the need.
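As a quick back-of-envelope check (using illustrative figures and decimal units), the same raw capacity can correspond to wildly different file counts, and therefore wildly different metadata loads:

```python
# Back-of-envelope: identical raw capacity, vastly different file counts.
# Decimal units (1 TB = 10**12 bytes) are assumed throughout.

MB, GB, TB = 10**6, 10**9, 10**12

def total_capacity_tb(file_count: int, file_size_bytes: int) -> float:
    """Raw capacity of a workload in decimal terabytes."""
    return file_count * file_size_bytes / TB

small_files = total_capacity_tb(100_000_000, 2 * MB)  # 100 million x 2 MB
large_files = total_capacity_tb(1_000, 200 * GB)      # 1,000 x 200 GB

print(small_files)  # 200.0 (TB)
print(large_files)  # 200.0 (TB)

# Same capacity, but 100,000 times as many files (and metadata entries)
# for the storage system to track.
print(100_000_000 // 1_000)  # 100000
```

It is this file-count dimension, not raw capacity, that overwhelms traditional architectures.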

  • Object storage was created to address this high volume of granular data, which could scale to exabytes. The leader in the Object Storage SDS space is Cloudian.
  • “File and block”, especially in large capacities, was also difficult to address due to the escalating costs of storage area networks (SAN). This is where SDS stepped in with scale-out architectures that helped reduce the cost of storage significantly. The leader in the “file and block” SDS space is Nexenta.
  • Specific homogeneous workloads like HPC, which require performance, capacity, and scale, have evolved into running machine learning tasks. These require not just petaflops of compute but also petabytes of data. Here, specialised low-latency software defined offerings like IBM’s Spectrum Scale are the ideal fit for large setups.

To cope with these different types of storage data, the industry needed to become more granular in how it addresses different workloads. This has to be done while adopting new Software Defined technologies and creating a bridge between the traditional solutions of today and the Software Defined solutions of tomorrow.

Sachin Bhatia
Chief Marketing Officer, Asia Pacific

Sachin Bhatia is Head of Marketing for Asia Pacific at Lenovo Data Center Group. He has over sixteen years of professional experience in integrated marketing, having developed and executed regional marketing programs for leading IT and Banking brands.