Aug 01

1TB is not 1TB: do you know your Data Storage Factor, and what it is costing you?

Data volumes continue to breed like rabbits, only faster. The types of data needing storage are exploding, and generations (and tiers) of data are multiplying too, not least because IT has to satisfy legislative and regulatory requirements. Enterprises are paying a fortune to store this explosion, but they need only do so if they stick with traditional approaches. Constellation's analysis is that enterprises urgently need to understand their Data Storage Factor (DSF). Researching this opens the door to major savings, by keeping only the necessary data, without compromising the business. The problem is not new, but it continues to accelerate out of control, and managing it is akin to herding an army of ants if you do not know where you stand.

Understanding your DSF

Start with a simple proposition. Your enterprise has 1TB of 'normal' data, say traditional, transactional business data that you need to run the business (for the moment, forget about all the other information on servers, PCs, etc.). Most medium and larger businesses will have far more than this, but 1TB makes for simpler example math. The simple fact is that 1TB is not 1TB. It is at least 2TB once backed up and, once disaster recovery (DR) policies and other options (for example, snapshots and operational disk efficiencies) are taken into account, it will exceed 4TB. It may be much, much more. How much more (the Data Storage Factor) will only become apparent if you dig deep. Remember, this is not about the technologies used to store your data (SSD, HDD, tape, optical, etc.) but about the volume of data that needs to be saved.

Considerations and challenges

When thinking about data, a plethora of issues should come to mind; the ones listed here are only a selection: backup, operations, DR, compliance and operational efficiency. Each possesses its own characteristics, many of which interweave.

For example, what is the backup policy? Is it to have one backup, or to have clear generations and tiers of backup (as happens in most enterprises): yesterday, last week, last month, etc.?

Then there are operational considerations. If primary systems go down (those that affect the capability of the business to operate), the imperative is to recover and restart as quickly as possible. This may involve, for example, regular snapshots, so that the minimum of work has to be redone. There is also the reliability consideration: how often do 'backups' which are thought to be reliable prove unreliable, incomplete or plain useless? This is what has driven the need for more backups, on the principle that the more there are, the better, for safety's sake.

Now add DR. IT disaster recovery is in use at most larger organizations. At its simplest, replica data is stored 'somewhere else' so that if a disaster occurs (it does not matter what) not all is lost. DR likely doubles the data volume that needs storing, and it may be more, depending on the policies adopted and an enterprise's specific requirements.

In today's world we also cannot forget compliance. Legislators and regulators want more and more to be kept in a form that can be readily accessed and not modified (to the eternal joy of lawyers armed with legally-enabled discovery). This adds a further, usually difficult to quantify, increase in the amount of data that has to be kept somewhere, safely.

Finally, at least for this blog, there is operational efficiency.
It is not uncommon to see disks that are deliberately run at no more than 30%-50% ‘data occupancy’, because this is what assures the performance that users and applications want.
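That occupancy figure is itself a multiplier on raw capacity. As a minimal sketch (the function name and the 40% default are illustrative assumptions, not from the blog), the raw disk needed per terabyte of data at a given occupancy ratio is just a division:

```python
# Hypothetical illustration of the occupancy point above: raw disk capacity
# required when disks are deliberately run at partial 'data occupancy'
# (the 30%-50% range mentioned in the text) for performance reasons.

def raw_capacity_tb(data_tb: float, occupancy: float = 0.40) -> float:
    """Raw capacity (TB) needed to hold data_tb at the given occupancy ratio."""
    return data_tb / occupancy

print(raw_capacity_tb(1.0))        # 2.5 TB of raw disk per 1 TB of data at 40%
print(raw_capacity_tb(1.0, 0.30))  # ≈3.33 TB of raw disk per 1 TB at 30%
```

In other words, a 30% occupancy policy alone more than triples the raw capacity behind every terabyte of data, before any backup or DR copies are counted.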

The impact on your DSF

Now let us make some simple policy assumptions: backup is done daily over a 7-day week, and 30 days are kept; 3 generations of the total past month are kept as further backup; and a full copy of the current and previous 3 months is kept offsite (DR). If we start with our 1TB, that means 7TB by the end of the week and 30TB by the end of the month. Add the 3 generations of past months, at its simplest, and this is 120TB. Add DR and you have 240TB in total. What matters here is the DSF, in this case 240 times the original 1TB.

You may think a DSF of 240 is unreasonable. Maybe so: for example, if only 15% of data changes each day, you could back up only the changed 0.15TB/day, which amounts to 'only' around 2TB at the end of week 1 and, after 30 days, only some ~6TB. Now add the 3 generations of monthly backup and you have 24TB; add the DR again and it is 'only' 48TB. Yes, a DSF of 48 is much better than the crude DSF of 240. Yet it is horrifying enough: what would you think if you were buying something and were told you would need 48 times as much in order to be safe?

But remember, you have not yet allowed for disk efficiency, nor snapshots, nor compliance requirements (there are plenty of compliance executives who will argue that operational data and its backups must be kept separate from compliance copies), nor growth, nor the multiple new types of data arriving (Twitter messages may only be 140 characters each, but there are an awful lot of tweets that may be relevant to your enterprise's social network image), etc.
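The arithmetic above can be sketched in a few lines. This is only an illustration of the blog's example policies (the function names and parameters are assumptions for this sketch, not a standard DSF formula):

```python
# Sketch of the DSF arithmetic in the text: daily backups kept for 30 days,
# 3 prior monthly generations retained, and DR doubling the total.

def dsf_full_backups(days_kept: int = 30,
                     monthly_generations: int = 3,
                     dr_multiplier: int = 2) -> float:
    """Crude DSF: every daily backup is a full 1x copy of the primary data."""
    month = float(days_kept)                       # 30 full copies after a month
    total = month * (1 + monthly_generations)      # current + 3 past months = 120
    return total * dr_multiplier                   # DR doubles it -> 240

def dsf_incremental(change_rate: float = 0.15,
                    days_kept: int = 30,
                    monthly_generations: int = 3,
                    dr_multiplier: int = 2) -> float:
    """Incremental DSF: one full copy plus only the daily changed fraction."""
    month = 1.0 + (days_kept - 1) * change_rate    # ~5.35x after a month
    total = month * (1 + monthly_generations)
    return total * dr_multiplier

print(dsf_full_backups())   # 240.0
print(dsf_incremental())    # ≈42.8 (the blog rounds the ~5.35TB month up to ~6TB, giving 48)
```

The exact incremental figure depends on how you round the monthly total, but either way the order of magnitude, tens of times the primary data, is the point.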

What to do?

In essence, you must know your DSF in order to understand the scale of the challenge. Only then can you act. If data storage factors like 240 or even 48 seem excessive, they probably are (they are used here to make the point). But is a DSF of 5, 10 or 20 competitive for your industry? The good news is that there are a variety of techniques and technologies that can assist. But these only become relevant if you understand the scale of the data volumes challenge before you start to implement 'solutions'. Some, indeed, involve measures as radical as wholly rethinking how you store data (and these are practical). [Constellation is currently evolving its own 'C-DSF', a combination of best practices, use of automated software and application of Constellation's analyst insights into storage technologies and techniques and their evolution.]
