Taming Unstructured Data in Storage
Unstructured data is data that isn’t organized in a predefined way. In other words, it lacks a classification scheme that identifies its content. Common examples include emails, documents and social media posts. Indeed, virtually all human communication in natural language is unstructured, but other forms of data fall into this category as well, such as videos, photographs and audio recordings.
Structured data, by contrast, follows a predefined data model, such as the rows and columns of a database table, so its content is organized for easy search and retrieval.
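To make the distinction concrete, the short Python sketch below stores the same fact, an order amount, in two ways: as structured records with named fields and as free text in an email. The records, the email text and the helper names (amount_for_structured, amount_for_unstructured) are hypothetical illustrations, not part of any product or dataset mentioned in this article. The structured lookup is an exact query on a field; the unstructured version has to guess at the answer with a text pattern.

```python
import re

# Hypothetical structured records: every value sits in a named field,
# so a query can address it directly without interpreting free text.
structured_orders = [
    {"order_id": 1001, "customer": "Acme Corp", "amount_usd": 250.00},
    {"order_id": 1002, "customer": "Globex", "amount_usd": 1200.50},
]

# Hypothetical unstructured text carrying similar information, as in an email.
unstructured_email = (
    "Hi team, Globex just confirmed their order. "
    "The total came to $1,200.50 and they want delivery next week."
)


def amount_for_structured(customer):
    """Return the order amount for a customer by matching a named field."""
    for order in structured_orders:
        if order["customer"] == customer:
            return order["amount_usd"]
    return None


def amount_for_unstructured(text):
    """Return the first dollar amount found in free text, or None.

    This is a fragile heuristic: it cannot tell which customer the amount
    belongs to, and it misses amounts written any other way.
    """
    match = re.search(r"\$([\d,]+\.\d{2})", text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


print(amount_for_structured("Globex"))
print(amount_for_unstructured(unstructured_email))
```

Both calls print 1200.5 here, but only the structured lookup is reliable; the regular expression works solely because this email happens to phrase the amount in one expected format.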
In the course of business, unstructured data can easily account for much, if not most, of the data collected and stored. According to estimates published in IBM Systems Magazine, “90 percent of the world’s data has been generated in the past two years, and 80 percent of new data is unstructured, growing at twice the rate of structured data. It’s expected that 40 ZB of data will be created by 2020—300 times the amount in 2005.”
Impact of Unstructured Data on Storage
Like a knot in a fishing line, unstructured data that keeps growing can pile up and tangle attempts to mine it for information that’s important to the business, such as identifying growth opportunities, mitigating risks and achieving compliance. But before sophisticated analytics can work their magic on unstructured data to deliver those insights, the data must be stored, managed and moved in and through a variety of systems.
In short, unstructured data represents both an organizational and a storage challenge as its volume continues to rise.
“Additional data volume translates into added workload for the storage administrator. It’s more data for the administrator to manage, and it generates additional costs in staffing to pick up some of that workload,” according to a TechTarget article summarizing its Unstructured data FAQ audiocast.
“And it’s not just sheer storage. The complexity of storage environments also compounds the problem, especially when factoring in issues like virtualization and tiered storage. Increasing data volume also impacts backup and disaster recovery in terms of backup windows and recovery time objectives (RTO).”
Cloud Versus Data Center Storage
Storage environments have also changed to accommodate the rapid growth in structured and unstructured data. The cloud has certainly brought cost and capacity advantages, but the public cloud can present too much risk for data that needs greater protection. As a result, many companies are turning to private and hybrid clouds to take full advantage of the cloud’s storage benefits while keeping data secure.
Whether a company chooses to store data in its own data center or in a cloud, storage hardware optimization is a must to prevent server sprawl and escalating costs. Furthermore, managing the burgeoning amount of data within the storage environment calls for sophisticated approaches such as storage virtualization and tiered storage.
“At root, the key requirements of big data storage are that it can handle very large amounts of data and keep scaling to keep up with growth, and that it can provide the input/output operations per second (IOPS) necessary to deliver data to analytics tools,” explains Antony Adshead, storage editor at ComputerWeekly.