<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2enclosuresfull.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Road to Recovery</title>
	
	<link>http://www.backupandbeyond.com</link>
	<description>It's All About Getting Your Data Back!</description>
	<lastBuildDate>Wed, 06 Jan 2010 15:23:54 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/BackupBeyond" /><feedburner:info uri="backupbeyond" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><itunes:explicit>no</itunes:explicit><itunes:subtitle>It's All About Getting Your Data Back!</itunes:subtitle><item>
		<title>Farewell – Well, at Least to this URL</title>
		<link>http://feedproxy.google.com/~r/BackupBeyond/~3/96hi74Bze3U/</link>
		<comments>http://www.backupandbeyond.com/farewell-%e2%80%93-well-at-least-to-this-url/#comments</comments>
		<pubDate>Mon, 28 Dec 2009 01:55:20 +0000</pubDate>
		<dc:creator>Steve Kenniston</dc:creator>
				<category><![CDATA[Backup]]></category>

		<guid isPermaLink="false">http://www.backupandbeyond.com/?p=488</guid>
		<description><![CDATA[All good things… que sera, sera… and all that jazz.  I am moving on, but NOT going away.  As many of you who follow me know, I have been in the storage industry for a good part of my career.  I had always wanted to deliver a consistent, storage focused, thought leadership blog and have [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_490" class="wp-caption alignleft" style="width: 160px"><img class="size-thumbnail wp-image-490" title="farewell" src="http://www.backupandbeyond.com/wp-content/uploads/2009/12/farewell-150x150.jpg" alt=" " width="150" height="150" /><p class="wp-caption-text"> </p></div>
<p>All good things… que sera, sera… and all that jazz.  I am moving on, but NOT going away.  As many of you who follow me know, I have been in the storage industry for a good part of my career.  I had always wanted to deliver a consistent, storage focused, thought leadership blog and have been able to accomplish that with Road to Recovery.  The trouble is that Road to Recovery and the URL <a href="../../../../../">www.backupandbeyond.com</a> are very backup focused, and with my new role I will be focused on other areas of the storage industry.</p>
<p>I am embarking on my own “Personal Journey”, and after giving it much thought (as will anyone who has tried to register a URL that is both “cool” and describes what you do) I am moving the blog to <a href="http://www.thestoragealchemst.com/">www.thestoragealchemist.com</a>.  I wanted to keep it more generic, so that as I move around this great industry I will not have to change the URL and risk folks not finding me.  The new site will discuss all things storage, not just data protection.</p>
<p>I want to thank all of you who have helped me create this blog and all of you who read it, and I hope you have found it useful.  You have all inspired me to continue my blogging, as well as other social media activity (you can find me on Twitter @skenniston).  For those of you who have placed me on your blogroll, I thank you and hope that you will update the roll to my new URL.</p>
<p>I hope many of you will follow me to the new site.  I am keeping to my commitment to myself to be consistent, thoughtful and to drive thought leadership without vendor hype.  I may talk about products that I am more familiar with than others, but my goal is to discuss technology value, not drive FUD or vendor hype.</p>
<p>Thanks again and please stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.backupandbeyond.com/farewell-%e2%80%93-well-at-least-to-this-url/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.backupandbeyond.com/farewell-%e2%80%93-well-at-least-to-this-url/</feedburner:origLink></item>
		<item>
		<title>Data Protection Management from ‘Nice to Have’ to ‘Need to Have’</title>
		<link>http://feedproxy.google.com/~r/BackupBeyond/~3/4e-1s6ffSqU/</link>
		<comments>http://www.backupandbeyond.com/data-protection-management-from-%e2%80%98nice-to-have%e2%80%99-to-%e2%80%98need-to-have%e2%80%99/#comments</comments>
		<pubDate>Tue, 15 Dec 2009 05:43:47 +0000</pubDate>
		<dc:creator>Steve Kenniston</dc:creator>
				<category><![CDATA[Archive]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[Data Deduplication]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Replication]]></category>
		<category><![CDATA[Avamar]]></category>
		<category><![CDATA[Classification]]></category>
		<category><![CDATA[Data Protection Advisor]]></category>
		<category><![CDATA[Data Protection Management]]></category>
		<category><![CDATA[Deduplication]]></category>
		<category><![CDATA[protection]]></category>
		<category><![CDATA[Recovery]]></category>
		<category><![CDATA[Virtualization]]></category>

		<guid isPermaLink="false">http://www.backupandbeyond.com/?p=482</guid>
		<description><![CDATA[Data protection management has come a long way in the past decade.  More importantly, the features and functionality that are in products these days, and that customers have come to expect, are no longer ‘nice to have’ features in the data center; they are ‘need to have’ features.
Additionally, the term ‘data protection’ is morphing [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_483" class="wp-caption alignleft" style="width: 160px"><img class="size-thumbnail wp-image-483" title="imagine" src="http://www.backupandbeyond.com/wp-content/uploads/2009/12/imagine-150x150.jpg" alt=" " width="150" height="150" /><p class="wp-caption-text"> </p></div>
<p>Data protection management has come a long way in the past decade.  More importantly, the features and functionality that are in products these days, and that customers have come to expect, are no longer ‘nice to have’ features in the data center; they are ‘need to have’ features.</p>
<p>Additionally, the term ‘data protection’ is morphing every day and has different meanings to different people.  Questions like ‘is replication data protection?’ or ‘is archive data protection?’ or ‘is DR / BC a function of protection?’ are now common in IT circles.  Each in their own right is a methodology for protecting information or has some play in the grand scheme of data protection.  The reality is, much like every answer in IT, the answer to these questions is ‘it depends’.  Data Protection has many different definitions, which start to expand the scope of what it actually is and more importantly, how it is managed cost effectively across the whole environment.</p>
<p>It is this expanding scope of data protection  where data protection management tools come into play, and the more flexible and granular the tool, the more effective.  It is hard to have good data protection capabilities without having insight to the environment.  First, understanding what type of data lives in the environment, where it is, how it is used and some characteristics about its age or its access frequency helps to determine how to best protect the information.  This is where a data protection management tool that provides some insight to the file system adds a great deal of value.</p>
<p>Next, if archive is a part of data protection (and I would argue that a functional archive, when used properly, is) then a data protection management tool that provides insight to the data in the archive can also help manage the overall protection process within the greater environment.  Knowing whether the data in the archive is actually being accessed or whether it can be deleted (unless stored for compliance purposes) can help to control archive costs.</p>
<p>If replication is a part of the overall data protection scheme, a data protection management tool that provides insight to this process can also add a great deal of value.  Identifying if links are up, if data is moving between sites and if the data is available, accessible and meets my recovery point objectives at the remote site can ease the concern of recoverability in the event of a disaster.</p>
<p>And finally, a good tool provides as much information as possible, such as deduplication rates, tape growth and disk growth (in disk based backup targets, including deduplication targets), as well as true analytics into the backup environment to help decide when to switch from a tape-based solution to a disk-based solution.  These analytics need to be in-depth enough to show what happens if some data that is being protected with traditional backup technologies is moved to a next generation solution, such as source-based deduplication: what effect will it have on the overall backup environment, will it help to better control costs, and will it help to improve SLAs?</p>
<p>At a higher level, customers are telling me that they no longer want to manage backup; they just want it to work and they want proof it is working.  As customers move to a more virtualized IT infrastructure, they find that they are being forced to rearchitect their data protection environment and they are now looking to solutions that elevate the process.  IT is looking for tools to make their environment “data protection aware.”  As virtual machines are added to the environment, IT wants them to be protected automatically and wants notification if they are not, so it can mitigate any risk, and let’s face it, backup is all about risk mitigation.  Backup is insurance.  Wouldn’t it be nice if your insurance company had deeper insight into all the cars and drivers in your family, told you each month when your teenager was speeding, and warned you that your premiums would go up if they didn’t start driving the speed limit, all <em>before</em> they got the ticket and your premiums increased?</p>
<p>Any tool that IT invests in for a common process, data protection in this case, needs to be flexible enough to allow IT to manage as much of the overall process as possible from a single pane of glass.  Good data protection management tools need to give IT as much visibility into the overall data protection environment as possible, so that IT can make good decisions about which data protection technologies to invest in and, in turn, meet its overall SLAs and hence its business objectives.</p>
<p>There is no sense spending a great deal of money on rearchitecting a backup environment if there is no insight to the success of the new architecture.  Sooner or later, management needs to have the pretty graphs that prove to someone that the right decisions are being made when it comes to protecting information, how much is spent on data protection and whether the SLAs can be met.  Not having a good data protection management tool, and spending too much on new data protection architectures while not meeting your SLAs, could lead to an RGE (resume generating event).  Data protection management tools today are a need to have, not a nice to have.  Make the investment and put your data protection environment back on the Road to Recovery.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.backupandbeyond.com/data-protection-management-from-%e2%80%98nice-to-have%e2%80%99-to-%e2%80%98need-to-have%e2%80%99/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.backupandbeyond.com/data-protection-management-from-%e2%80%98nice-to-have%e2%80%99-to-%e2%80%98need-to-have%e2%80%99/</feedburner:origLink></item>
		<item>
		<title>How Much Backup Capacity Does Deduplication Really Save?</title>
		<link>http://feedproxy.google.com/~r/BackupBeyond/~3/w2uF5yUaGcg/</link>
		<comments>http://www.backupandbeyond.com/how-much-backup-capacity-does-deduplication-really-save/#comments</comments>
		<pubDate>Mon, 30 Nov 2009 16:00:32 +0000</pubDate>
		<dc:creator>Steve Kenniston</dc:creator>
				<category><![CDATA[Backup]]></category>
		<category><![CDATA[Data Deduplication]]></category>
		<category><![CDATA[Replication]]></category>
		<category><![CDATA[Data Protection]]></category>
		<category><![CDATA[Deduplication]]></category>
		<category><![CDATA[Restore]]></category>
		<category><![CDATA[Tape]]></category>

		<guid isPermaLink="false">http://www.backupandbeyond.com/how-much-backup-capacity-does-deduplication-really-save/</guid>
		<description><![CDATA[There is a lot of discussion around data deduplication for backup these days.  (I wish I could deduplicate all the turkey I ate last week.)  In fact, Gartner claims that “…by 2012, deduplication will be applied to 75% of backups.”  And when asked “Why?” the response was “…deduplication is too compelling to ignore.”  But I [...]]]></description>
			<content:encoded><![CDATA[<p>There is a lot of discussion around data deduplication for backup these days.  (I wish I could deduplicate all the turkey I ate last week.)  In fact, Gartner claims that “…by 2012, deduplication will be applied to 75% of backups.”  And when asked “Why?” the response was “…deduplication is too compelling to ignore.”  But I say “prove it”.  So I put together some backup capacity numbers for storing data on tape (non-compressed and compressed) versus storing data, deduplicated (fixed block and variable block), on disk and the numbers show a dramatic savings in backup space which translates into cost savings.</p>
<h2>The Parameters</h2>
<p>As with any ‘analysis’ numbers can be ‘spun’ to make them say what you want.  That said, I tried to be as straight forward as possible, so let me also show my methodology so you can see how my numbers were derived.</p>
<ul>
<li>I charted the amount of capacity created using a retention policy of:
<ul>
<li>14 Dailies</li>
<li>4 Weeklies</li>
<li>12 Monthlies</li>
</ul>
</li>
<li>I selected 10TB of primary storage capacity</li>
<li>I did this for file system backups only</li>
<li>I charted the data for 30%, 40%, 50% and 60% primary storage growth rates</li>
<li>I charted traditional tape based backup (non-compressed)</li>
<li>I charted traditional tape based backup (compressed, 2:1)</li>
<li>I charted fixed block disk based deduplicated backup</li>
<li>I charted variable block disk based deduplicated backup (3 to 5 times more efficient than fixed block deduplication)</li>
</ul>
<h2>The Effect</h2>
<p>The first thing to think about is the sheer number of full backup copies that must be maintained when utilizing the above retention schedule.  The above retention policy leads to 17.2 copies of the primary storage (12 monthlies + 4 weeklies + the equivalent of 1.2 copies from the dailies = 17.2 copies).  Translation: one terabyte of primary storage becomes 17.2 terabytes of tape storage.  This means backup administrators need to pay for the physical tapes as well as the offsite transport and storage costs.  Now 17.2 terabytes of tape doesn’t sound like much, but keep in mind that is for 1TB of primary capacity.  Ten TB of primary capacity yields 172 TB of tape capacity.  Now add in year over year storage growth.  At 30% primary storage growth, backup storage grows 23%; at 40% primary storage growth, backup storage grows 29%; at 50% primary storage growth, backup storage grows 33%; and at 60% primary storage growth, backup storage grows 38%.</p>
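<p>The copy count above is simple enough to check in a few lines.  The sketch below is only an illustration: the 1.2 figure for the dailies is taken from the text as given (how it was derived is not stated), and the function and constant names are made up for the example.</p>

```python
# Full-backup equivalents retained under the policy described above.
MONTHLY_FULLS = 12
WEEKLY_FULLS = 4
DAILY_EQUIVALENT = 1.2  # the post's figure for 14 dailies; its derivation is not stated


def retained_copies() -> float:
    """Number of full-copy equivalents of primary storage kept in backup."""
    return MONTHLY_FULLS + WEEKLY_FULLS + DAILY_EQUIVALENT


def tape_capacity_tb(primary_tb: float, compression_ratio: float = 1.0) -> float:
    """Tape capacity consumed for a given primary capacity and compression ratio."""
    return primary_tb * retained_copies() / compression_ratio


for primary in (1, 10):
    print(
        primary, "TB primary:",
        round(tape_capacity_tb(primary), 1), "TB uncompressed,",
        round(tape_capacity_tb(primary, 2.0), 1), "TB at 2:1 compression",
    )
```

<p>Changing the retention counts or the compression ratio shows how quickly a modest primary footprint multiplies on the backup side.</p>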
<p>Figure 1 below shows 10 TB of primary capacity growing at 30%, 40%, 50% and 60% along the x-axis, and the corresponding capacity of tape or disk consumed along the y-axis.</p>
<p style="text-align: center;">
<p style="text-align: center;">
<div id="attachment_460" class="wp-caption aligncenter" style="width: 499px"><img class="size-full wp-image-460 " title="BU Storage Capacity" src="http://www.backupandbeyond.com/wp-content/uploads/2010/11/BU-Storage-Capacity.jpg" alt="  " width="489" height="239" /><p class="wp-caption-text">  </p></div>
<p style="text-align: center;">Figure 1</p>
<p>The graph shows that compressed backup to tape obviously yields a 50% capacity improvement over non-compressed tape as one would expect.  It also reflects that fixed block deduplicated disk capacity is only about 48% more efficient than uncompressed tape storage yet variable block deduplication is 81% more storage efficient than uncompressed tape storage.</p>
<p>Interestingly, the chart also reveals that fixed block deduplication is 3% less efficient than compressed tape, whereas variable block deduplication is 62% more efficient than compressed tape.  Typically, with the same data change rates and equivalent data sets, variable block deduplication is 3 to 5 times more efficient than fixed block deduplication.</p>
<p>The moral of the story: if you’re going to do deduplication, variable block is the way to go.  From a cost perspective, there is essentially no difference in the $/TB price; however, there is much more value in the long run with variable block deduplication.  Vendors typically charge a $/TB price for their deduplication solutions.  The difference between fixed and variable block deduplication comes down to the capacity of data that is stored in the backups, which directly translates into costs.  If you take a look at Figure 2, starting with 1TB of primary capacity growing at 25% over the course of one year, IT will need almost 2TB of backup capacity with fixed block deduplication versus less than 1TB of capacity using variable block deduplication (this assumes fixed block is 5x less efficient, based on empirical data that has been collected in the field).  The most important part of this graph is the slope of the blue and red lines.  The steeper the slope (red line), the more frequently IT will need to purchase capacity to protect the given data set, as well as pay for licensing as it pertains to deduplication software.  IT wants the smaller slope.</p>
<p style="text-align: center;">
<div id="attachment_472" class="wp-caption aligncenter" style="width: 532px"><img class="size-full wp-image-472 " title="fixed-var-dd" src="http://www.backupandbeyond.com/wp-content/uploads/2010/11/fixed-var-dd.jpg" alt=" " width="522" height="332" /><p class="wp-caption-text"> </p></div>
<p style="text-align: center;">Figure 2</p>
<p style="text-align: left;"><em>*Note: Some companies will position their fixed block technologies as variable block by stating that you (the user) have the ability to set the block size to whatever you want; however, once set, it stays that way for all of your data.  The difference is that true variable block technologies adjust the block size on the fly, using their algorithms to ensure maximum efficiency with no management.</em></p>
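<p>To see why block boundaries matter, here is a toy sketch of the two approaches.  It is not any vendor’s algorithm: the window size, the mask and the use of SHA-1 over a small window are assumptions made for the example (real products typically use a rolling hash), and the data is random bytes.  One byte is inserted at the front of the data, then each method reports how many bytes it would have to store again.</p>

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Cut the data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, window: int = 16, mask: int = 0x3F) -> list:
    """Cut wherever a fingerprint of the last `window` bytes matches a pattern.

    Boundaries depend only on content, so they move with the data when bytes
    are inserted. The mask gives chunks of roughly 64 bytes on average.
    """
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        fingerprint = hashlib.sha1(data[end - window:end]).digest()[0]
        if fingerprint & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start != len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def new_bytes(old_chunks: list, new_chunks: list) -> int:
    """Bytes in new_chunks whose content was not already stored."""
    stored = {hashlib.sha256(chunk).digest() for chunk in old_chunks}
    return sum(len(c) for c in new_chunks if hashlib.sha256(c).digest() not in stored)


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
modified = b"X" + original  # a single byte inserted at the front

fixed_new = new_bytes(fixed_chunks(original), fixed_chunks(modified))
variable_new = new_bytes(variable_chunks(original), variable_chunks(modified))
print("fixed block re-stores", fixed_new, "bytes")
print("variable block re-stores", variable_new, "bytes")
```

<p>With fixed blocks, the one-byte insert shifts every block, so nothing matches what was stored before.  With content-defined boundaries, only the data up to the first boundary changes and everything after it deduplicates.</p>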
<h2>Bang for the Buck</h2>
<p>The most important benefit, as with most things in IT, is overall cost savings.  Deduplicated disk solutions are anywhere from 2.5X to 3X more expensive than tape per terabyte; however, with the overall capacity savings, there can be significant cost savings.  Figure 3 is representative of the overall costs of new deduplicating disk systems and traditional tape backup systems (including tapes and off-site storage costs).  I will caveat this by saying every TCO and ROI has a ton of ‘what ifs’ that factor into overall costs, including things like FTE for backup engineers and long term retention costs, but for the most part, disk systems reduce a good deal of these costs (with the exception of power and cooling) and increase the reliability, security and performance of backups and recoveries.</p>
<p style="text-align: center;">
<div id="attachment_462" class="wp-caption aligncenter" style="width: 498px"><img class="size-full wp-image-462 " title="BU costs" src="http://www.backupandbeyond.com/wp-content/uploads/2010/11/BU-costs.jpg" alt="  " width="488" height="316" /><p class="wp-caption-text">  </p></div>
<p style="text-align: center;">Figure 3</p>
<p><em><sup>1</sup></em><em> The chart above is based on a rough cost of $8,000 per terabyte of tape backup system costs (including media and off-site storage) and rough cost of $20,000 per terabyte of deduplicated disk backup system costs for the period of one year.  Prices will vary depending upon your configuration and these estimates do not include space, power, cooling or human costs.</em></p>
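<p>As a rough cross-check, the per-terabyte prices in the footnote can be combined with the capacity figures quoted earlier (172 TB of uncompressed tape for 10 TB of primary, 2:1 tape compression, and deduplicated disk being 48% and 81% more efficient than uncompressed tape for fixed and variable block respectively).  This is a sketch under those assumptions only; it is not necessarily how Figure 3 was produced, and the names in it are made up for the example.</p>

```python
TAPE_COST_PER_TB = 8_000    # rough figure from the footnote above
DISK_COST_PER_TB = 20_000   # rough figure from the footnote above
UNCOMPRESSED_TAPE_TB = 172.0  # 10 TB primary x 17.2 retained copies

# Backup capacity per option, derived from the efficiency figures in the post.
capacity_tb = {
    "tape (uncompressed)": UNCOMPRESSED_TAPE_TB,
    "tape (compressed 2:1)": UNCOMPRESSED_TAPE_TB / 2,
    "disk (fixed block dedup)": UNCOMPRESSED_TAPE_TB * (1 - 0.48),
    "disk (variable block dedup)": UNCOMPRESSED_TAPE_TB * (1 - 0.81),
}

cost = {}
for option, terabytes in capacity_tb.items():
    price = TAPE_COST_PER_TB if option.startswith("tape") else DISK_COST_PER_TB
    cost[option] = terabytes * price

for option in capacity_tb:
    print(f"{option}: {capacity_tb[option]:.1f} TB, ${cost[option]:,.0f}")
```

<p>Under these assumptions the higher $/TB price of disk only pays off when the deduplication ratio is high enough, which is the point of the variable block argument above.</p>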
<p>As I stated above there are only a few factors that are involved in this very raw calculation.  There are a number of other factors involved with a backup process including WAN costs (if replacing tape with disk), remote office facilities, installation (professional services), and software and hardware maintenance to name a few.  But no matter how you look at it, disk based backup with variable block deduplication wins over tape.</p>
<p>Backing data up to deduplicated disk not only saves the amount of backup capacity that is used, it also has other implications for a data protection environment.  First, backing up to disk versus backing up to tape helps to reduce the reliance on tape and the inherent limitations, security concerns and reliability issues surrounding tape.  Recovery of data from disk reduces the operational costs and decreases the recovery time objective.  Additionally the reliability of disk with RAID is much higher than the reliability of tape.</p>
<p>New data protection technologies are evolving backup to a degree where the entire data protection process is getting easier to manage by removing multiple points of management (backup servers, media servers, tape libraries and physical tape).  As backup continues to evolve, this can help simplify the overall process and:</p>
<ul>
<li>Increase reliability of backups</li>
<li>Increase reliability of recoveries</li>
<li>Decrease backup times</li>
<li>Decrease the time to recover data</li>
</ul>
<h2>The Bottom Line</h2>
<p>New challenges in protecting information are arising every day.  Whether it is data growth, remote office data protection or virtualization, backup is getting harder, not easier.  Data deduplication is providing backup administrators with tremendous benefits around backup processes and cost savings.  It is important to keep in mind that everybody’s environment is different and utilizes different methods and processes for managing and protecting information.  It is also important to take a look at your data protection environment today and understand the use cases where it is time to make new investments.  I encourage you to look at new technologies to help you with emerging challenges and weigh the overall solution, including costs as well as the benefits of disk based recovery.  New backup technologies that leverage data deduplication can save IT a lot of money and put you back on the Road to Recovery.<span id="_marker"> </span></p>
<div class="mceTemp mceIEcenter">
<dl id="attachment_417" class="wp-caption aligncenter" style="width: 310px;">
<dt class="wp-caption-dt"><img class="size-medium wp-image-417" title="r2r" src="http://www.backupandbeyond.com/wp-content/uploads/2009/11/r2r-300x248.jpg" alt=" " width="300" height="248" /></dt>
<dd class="wp-caption-dd"></dd>
</dl>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.backupandbeyond.com/how-much-backup-capacity-does-deduplication-really-save/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.backupandbeyond.com/how-much-backup-capacity-does-deduplication-really-save/</feedburner:origLink></item>
		<item>
		<title>Enterprise Data Protection at the Edge</title>
		<link>http://feedproxy.google.com/~r/BackupBeyond/~3/tTitPZEFPVg/</link>
		<comments>http://www.backupandbeyond.com/enterprise-data-protection-at-the-edge/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 23:03:47 +0000</pubDate>
		<dc:creator>Steve Kenniston</dc:creator>
				<category><![CDATA[Backup]]></category>
		<category><![CDATA[Data Deduplication]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Avamar]]></category>
		<category><![CDATA[Data Protection]]></category>
		<category><![CDATA[Deduplication]]></category>
		<category><![CDATA[desktop laptop]]></category>
		<category><![CDATA[Recovery]]></category>

		<guid isPermaLink="false">http://www.backupandbeyond.com/?p=477</guid>
		<description><![CDATA[What does that really mean?  When I worked for Veritas, back in 1998 we acquired a company based out of Canada called TeleBackup that backed up desktop / laptops.  In 1999 Veritas acquired Seagate and the Backup Exec product which also had a desktop / laptop option.  These products were meant to eventually be integrated [...]]]></description>
			<content:encoded><![CDATA[<p>What does that really mean?  When I worked for Veritas, back in 1998 we acquired a company based out of Canada called TeleBackup that backed up desktop / laptops.  In 1999 Veritas acquired Seagate and the Backup Exec product which also had a desktop / laptop option.  These products were meant to eventually be integrated into the main backup applications but never were.  Additionally, a lot of that software was given away (hard to make a business on that) and for the most part,  lived on a shelf somewhere and was never installed.</p>
<p>In 2004 I worked for Connected Corporate (acquired by Iron Mountain), whose sole business was desktop / laptop backup.  (In fact, from 2000 to 2004 I worked as an analyst for ESG covering all the vendors in the backup space and used the Connected product to back up my work laptop, and it actually saved my hide once.)  While the company executed a successful exit, the business was (and probably still is) only about a $20M to $40M business.</p>
<p>Why do I bring this up?  There is a new reality in IT these days.  I have said it before, IT is accountable for 100% of the data created in any company, including that stored on desktop/laptops.  This means that not only do they have to provide a location to store this data but IT also needs to provide tools to protect this information and ensure that this information is highly recoverable for both business productivity purposes as well as corporate and legal governance.   This means that desktop / laptop backup is now gaining a lot more visibility in the enterprise.</p>
<p>However, desktop / laptop data protection is one of those areas in IT that is just a nuisance because it seems like it should be an easy problem to solve, but there are so many moving parts to it that it ends up falling by the wayside.</p>
<p>A successful desktop / laptop backup technology needs three very specific capabilities:</p>
<ul>
<li>Integrate seamlessly with the existing backup solution in the enterprise</li>
<li>Share a common, deduplicated, back end repository</li>
<li>Have a very SIMPLE and robust end-user interface to allow for end-user restores</li>
</ul>
<p>The desktop / laptop solutions I discussed above did not, and do not, have these capabilities.  Even though these technologies come from reputable companies, not having these three capabilities is what has led to their very low adoption.</p>
<p>These three capabilities are all inter-related.  First, IT needs an integrated solution because they do not want yet another piece of software in their environment to manage, especially data protection software.  The fundamentals of backup are pretty simple.  Install an agent on the machine you want to protect, go to the management interface of the backup application, set up a few simple rules or policies (back up this system, at this time, to this device, catalog it and finally, keep the data for ‘x’ number of days, weeks, etc.) and start protecting your data.</p>
<p>One challenge is that most backup products don’t have an agent that is lightweight enough to run as a client on a desktop or laptop.  This causes incredible performance degradation of the system during backups, and let’s face it, if you have a laptop, 9 times out of 10 you’re going to be working on it when the backup kicks off so you will end up shutting it down which leaves you with unprotected data.  Client side data reduction techniques help to reduce this problem.  By moving less data, they run for shorter periods of time so there is little to no end user impact.</p>
<p>Next, if you did have an agent that worked well enough to back up all the desktop / laptop systems, then it would impede the backups of the other mission critical systems in the environment by utilizing all of the resources on the devices where the data is being backed up to.  (Take a look at <a href="../../../../../architecting-for-recovery/">Architecting for Recovery</a> for more info.)  This means that IT would have to set up additional, separate devices to protect one subset of systems, leaving them with more devices to manage and making it a hassle to implement.  (This is one reason why ‘cloud’ like solutions have become popular, providing fewer things to manage; however, not every company wants their data outside of their control.)</p>
<p>Also, if you look at the nature of data on desktops and laptops, they share a ton of common data.  Why would any IT person want to back up that much data over and over again?  Traditional desktop / laptop solutions don’t provide robust capabilities for reducing the amount of redundant data that needs to be protected, which also translates into longer backup times and more ‘storage’ utilization (making it more costly).  Deduplication allows you to implement a common repository.</p>
<p>Finally, the tools for end user recoverability need to be very robust.  The last thing IT has time for is an increased call volume to perform data recovery for end users.  This also means that data needs to be stored on disk, because end users aren’t going to load tapes to recover data, and that it needs to be stored in the most efficient manner possible to save on costs.</p>
<p>There are a number of other nice-to-have features, but the lack of the three capabilities outlined above has limited the adoption of desktop / laptop backups. Until today there hasn’t been a good solution that met these criteria.</p>
<p style="text-align: left;">
<div id="attachment_478" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-478" title="dt-lt" src="http://www.backupandbeyond.com/wp-content/uploads/2009/11/dt-lt-300x42.jpg" alt=" " width="300" height="42" /><p class="wp-caption-text"> </p></div>
<p>This week <a href="http://www.emc.com/about/news/press/2009/20091117-01.htm">EMC | Avamar</a> launched a desktop / laptop backup component as part of their enterprise solution.  The difference between traditional desktop / laptop solutions and the Avamar solution is that the Avamar solution is 100% integrated as a part of its enterprise backup application, storing data on disk with a high degree of efficiency leveraging single instancing and deduplication.  Additionally, clients are free and they all share a common backend repository with the enterprise backup application that is protecting other common data in the enterprise.  Finally, end-users are able to perform their own restores.  What does all this mean?  Simplicity and low cost.</p>
<p>The Avamar backup technology provides enormous economies of scale when extending from the enterprise to the desktop / laptop.  By backing up to a single common repository utilizing global single instancing and deduplication, you NEVER back up the same data twice, no matter where the data lives.</p>
<p>Think about this scenario – a user creates some document, say a PowerPoint presentation.  This presentation ends up being emailed to a number of people in the company and then saved on the desktop as well as in a number of file shares (home directories) on the NAS system.  This one 1MB presentation can represent 120MB of backup disk capacity.</p>
<p>Now if you utilize Avamar, the process would be as follows.  First, the enterprise application would back up the NAS box and may see the file 20 times.  Avamar would single instance and deduplicate it such that only one instance is backed up.  Next, the desktops start their backup process and see that the Avamar Data Store has already protected this data so again, it doesn’t need to move or store any additional data.  A pointer is created to let the data store know that the desktop / laptop also has the ability to recover this same file.  This provides tremendous scalability.  This essentially means protecting all your desktops / laptops for free.</p>
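<p>To make the scenario above concrete, here is a minimal sketch of whole-file single instancing.  It is NOT how Avamar is implemented (Avamar works at a sub-file level and its internals are not described in this post); the class name, method names, client names and paths are all made up for illustration.  The idea is simply that content is stored once, keyed by a hash, and every additional copy becomes a pointer.</p>

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store (hypothetical): each unique payload is kept once."""

    def __init__(self):
        self.blobs = {}     # content hash -> bytes, stored exactly once
        self.pointers = {}  # (client, path) -> content hash

    def backup(self, client, path, data):
        """Record a file; return True only if new data had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        # Whether or not the data was new, remember that this client can restore it.
        self.pointers[(client, path)] = digest
        return is_new

    def restore(self, client, path):
        return self.blobs[self.pointers[(client, path)]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())

    def logical_bytes(self):
        return sum(len(self.blobs[d]) for d in self.pointers.values())


MB = 1024 * 1024
store = SingleInstanceStore()
deck = b"x" * MB  # stand-in for the 1MB presentation

# The NAS backup sees the same file in 20 home directories.
for i in range(20):
    store.backup("nas", f"/home/user{i}/deck.ppt", deck)

# A laptop backs up the same file later; only a pointer is added.
store.backup("laptop-01", "C:/Docs/deck.ppt", deck)

print(store.logical_bytes() // MB, "MB protected")
print(store.stored_bytes() // MB, "MB stored")
```

<p>In this toy run, 21 copies of the file are protected while only one copy is actually stored, which is the effect described in the paragraph above.</p>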
<p>The technology is easy to manage (same client, same simple management tools), it provides a simple to navigate end user interface for self restores, and provides an integrated, single instance, deduplicated backend.</p>
<p>Seems like a triple play from the Avamar product and is helping to put IT back on the Road to Recovery.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.backupandbeyond.com/enterprise-data-protection-at-the-edge/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.backupandbeyond.com/enterprise-data-protection-at-the-edge/</feedburner:origLink></item>
		<item>
		<title>Architecting for Recovery</title>
		<link>http://feedproxy.google.com/~r/BackupBeyond/~3/g-6HH6oFH5E/</link>
		<comments>http://www.backupandbeyond.com/architecting-for-recovery/#comments</comments>
		<pubDate>Tue, 17 Nov 2009 18:03:33 +0000</pubDate>
		<dc:creator>Steve Kenniston</dc:creator>
				<category><![CDATA[Backup]]></category>
		<category><![CDATA[Data Deduplication]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Avamar]]></category>
		<category><![CDATA[Recovery]]></category>

		<guid isPermaLink="false">http://www.backupandbeyond.com/?p=454</guid>
		<description><![CDATA[Here is a shocker for you: backup IS a science.  Good backup administrators / architects are worth their weight in gold.  CIOs just wish backup would go away.  Backup costs money, it’s not strategic, it chews up manpower, and when it is &#8216;running&#8217; (successfully or not) no one really pays attention to it, but [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_470" class="wp-caption alignright" style="width: 261px"><img class="size-medium wp-image-470" title="architecture" src="http://www.backupandbeyond.com/wp-content/uploads/2009/11/architecture-251x300.jpg" alt=" " width="251" height="300" /><p class="wp-caption-text"> </p></div>
<p>Here is a shocker for you: backup IS a science.  Good backup administrators / architects are worth their weight in gold.  CIOs just wish backup would go away.  Backup costs money, it’s not strategic, it chews up manpower, and when it is &#8216;running&#8217; (successfully or not) no one really pays attention to it.  But when it fails or, more likely, when you need to restore data and can&#8217;t, someone can lose their job.  So backup is VERY important, it is a science, and architecting a backup environment correctly takes time, skill, money and someone who knows what they are doing.</p>
<p>Good backup administrators architect for recovery, not for backup.  Prove it, you say.  Okay, question: “Why do backup administrators do full backups of Exchange every night?”  Answer &#8211; because it is way easier and much faster to perform a one step full recovery for Exchange than it is to lay down the weekly full and apply the incrementals.  Since mail is considered a “critical application” in the enterprise these days, and downtime is critical for this application, good backup administrators architect for the least amount of downtime for the application.  This also applies to databases.  Ninety-five percent of all databases are actually snapped for quick recovery and I would also bet that a full backup is performed on them (or the snap) every evening.</p>
<p>Recovery is a primary driver of any good backup architecture but lately I have been hearing a great deal of talk around ‘backup consolidation’.  The reality is, there is no ‘one size fits all’ when it comes to backup software or hardware.  Consolidating backup software may make your environment easier to manage, but does it provide you the tools/technology you need to maximize your data protection objectives in your environment?  Consolidating backup targets (tape / disk) may yield fewer devices to manage, but what happens to your overall backup and recovery performance when doing so?  While new technologies may help fine-tune the science side of backup, they still need an artist’s touch.</p>
<p>An area where consolidation comes up quite frequently in the backup arena is around new data deduplication solutions.  While these technologies add tremendous value, that does not mean you should forget about good backup architecture practices.  For example, if deduplication is the removal of duplicate data, how much duplicate data is there really between your production databases and your file systems within your company?  Mixing the storage repository for your file system and database data just doesn’t buy you a lot in your deduplicated backend, so why mix them?  It would make sense, however, to have a device / appliance for each database or set of databases that have common data as well as a device / appliance for file systems that have common data.  Doing so would yield better backup and recovery performance and would probably mirror the same set of rules you would have used in your ‘old’ backup environment.  (Notice, I said ‘rules’ not devices or technologies.)  Now, as long as the cost of having multiple devices (including management costs) isn’t exponentially higher, recovery can be much easier and faster.</p>
<p>Another interesting side note: since most IT shops do FULL backups every night of their database, for the purpose of faster recovery, why wouldn&#8217;t you want to have a dedicated backup storage device that does a &#8216;full&#8217; backup every night of the data and only needs to move the changed data?  This is the very nature of the Avamar technology and what this ‘next generation’ backup technology is designed to accomplish versus what traditional backup technologies try to do with cumbersome processes of full and incremental backups.  Why not, for example, set up a dedicated Avamar Data Store for DB backups with the proper number of nodes for performance, and leave it at that?</p>
<h3>Best Practices / Professional Services Have the Last Word</h3>
<p>Instead of naysayers making a bunch of statements that certain technologies ‘can’t’ solve a problem, why wouldn’t they take a page out of a professional services handbook that says ‘if the solution is architected properly (and can be delivered at the right cost, and meet your business objectives) then there is no reason not to make any technology work to its maximum potential and solve difficult problems’?  That is the real science.</p>
<p>Ten years ago, backup administrators would say, “okay, if you can&#8217;t get the backup / restore performance you need for that data set, then we will add another media server, get some more licenses and backup that data separately such that when you need to perform a restore, you can set up a dedicated media server for faster recovery.”  Should this be any different today?</p>
<p>Backup is about recovery and more importantly performance (RTO) but it is also about architecture and a good backup architecture will put you on the Road to Recovery.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.backupandbeyond.com/architecting-for-recovery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.backupandbeyond.com/architecting-for-recovery/</feedburner:origLink></item>
		<item>
		<title>Computer Comedy</title>
		<link>http://feedproxy.google.com/~r/BackupBeyond/~3/52kYMjvMjeI/</link>
		<comments>http://www.backupandbeyond.com/computer-comedy/#comments</comments>
		<pubDate>Mon, 09 Nov 2009 16:04:35 +0000</pubDate>
		<dc:creator>Steve Kenniston</dc:creator>
				<category><![CDATA[Backup]]></category>

		<guid isPermaLink="false">http://www.backupandbeyond.com/?p=437</guid>
		<description><![CDATA[I just love this&#8230; I have seen this circulating around the internet and it is just too funny not to pass along.]]></description>
			<content:encoded><![CDATA[<p>I just love this&#8230; I have seen this circulating around the internet and it is just too funny not to pass along.</p>
<div id="attachment_439" class="wp-caption aligncenter" style="width: 586px"><img class="size-full wp-image-439" title="country-folk-tech" src="http://www.backupandbeyond.com/wp-content/uploads/2009/11/country-folk-tech.jpg" alt=" " width="576" height="1994" /><p class="wp-caption-text"> </p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.backupandbeyond.com/computer-comedy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.backupandbeyond.com/computer-comedy/</feedburner:origLink></item>
		<item>
		<title>Deduplication – Older than You Think</title>
		<link>http://feedproxy.google.com/~r/BackupBeyond/~3/guxNIQUtbjo/</link>
		<comments>http://www.backupandbeyond.com/deduplication-older-than-you-think/#comments</comments>
		<pubDate>Fri, 30 Oct 2009 16:48:42 +0000</pubDate>
		<dc:creator>Steve Kenniston</dc:creator>
				<category><![CDATA[Data Deduplication]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[Dedupe]]></category>
		<category><![CDATA[Deduplication]]></category>

		<guid isPermaLink="false">http://www.backupandbeyond.com/?p=400</guid>
		<description><![CDATA[Forty years ago the internet was born.  Since that time, people have been trying to stuff more data in smaller spaces - This was the beginning of deduplication.]]></description>
			<content:encoded><![CDATA[<div id="attachment_404" class="wp-caption alignleft" style="width: 160px"><img class="size-thumbnail wp-image-404" title="data-deduplication" src="http://www.backupandbeyond.com/wp-content/uploads/2009/10/data-deduplication-150x150.jpg" alt="  " width="150" height="150" /><p class="wp-caption-text">  </p></div>
<p>So I am a big fan of National Public Radio – NPR.  Today I learned that yesterday, 10/29/09, was the 40<sup>th</sup> anniversary of the ‘internet’.  Now, I am sure there are a number of theories on when the internet was started and who started it, but it is safe to say that at this time in history 40 years ago, two guys in California tried to send the first message, the five-letter word ‘LOGIN’, over a wire between two computers (only ‘LO’ made it through before the system crashed), and internet messaging was born.</p>
<p>Since this point in time people have been trying to reduce the amount of data sent over the internet.  From email to instant messaging, from full files to compressed files and from disk drives to USB drives – people are always trying to make information trafficked over the internet smaller and faster.  No surprise coming from a group of people who have turned every term on the internet into an acronym, from USB, ISP, PDA, and LCD to SRM, ARM, and DPM, techies are always trying to stuff more data into smaller spaces.</p>
<p>Over the past 2 years data deduplication has become the latest fad in putting more data into a smaller space.  By removing redundant ‘blocks’ of data from the mass of files stored, it is conceivable to reduce your data footprint by as much as 70%.  Deduplication is playing a predominant role in backup, especially backup over the WAN.  With deduplication, you can easily move your data over the WAN to a central data center for protection, moving only small changes (blocks not files) of data, and make even more room for Facebook, Hulu, iTunes and more.  What is next for the internet?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.backupandbeyond.com/deduplication-older-than-you-think/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.backupandbeyond.com/deduplication-older-than-you-think/</feedburner:origLink></item>
		<item>
		<title>Comprehensive Capacity Optimization – Deduplication 2.0</title>
		<link>http://feedproxy.google.com/~r/BackupBeyond/~3/TAqWAW2Wso0/</link>
		<comments>http://www.backupandbeyond.com/comprehensive-capacity-optimization-deduplication-2-0/#comments</comments>
		<pubDate>Wed, 07 Oct 2009 15:58:42 +0000</pubDate>
		<dc:creator>Steve Kenniston</dc:creator>
				<category><![CDATA[Archive]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Data Deduplication]]></category>
		<category><![CDATA[Disaster Recovery]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Avamar]]></category>
		<category><![CDATA[Data Protection]]></category>
		<category><![CDATA[Deduplication]]></category>
		<category><![CDATA[Disk Library]]></category>
		<category><![CDATA[protection]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://www.backupandbeyond.com/?p=318</guid>
		<description><![CDATA[Technology is great, isn&#8217;t it?  When someone thinks they have a new idea on the same old technology foundation they call it &#8220;X 2.0&#8221;.  I have been watching the banter between analysts and vendors (specifically NTAP’s Dr. Dedupe and Permabit’s CEO Tom Cook) on the topic of Deduplication 2.0 and it is my belief that [...]]]></description>
			<content:encoded><![CDATA[<p>Technology is great, isn&#8217;t it?  When someone thinks they have a new idea on the same old technology foundation they call it &#8220;X 2.0&#8221;.  I have been watching the banter between analysts and vendors (specifically NTAP’s Dr. Dedupe and Permabit’s CEO Tom Cook) on the topic of Deduplication 2.0 and it is my belief that the proverbial boat is being missed (since we are using water analogies).  I have been watching these guys hash it out for the past few weeks and decided I have to jump in.  I find the real value of these conversations is the value to the end user.  At the end of the day, it doesn&#8217;t really matter who &#8216;coined&#8217; or &#8216;invented&#8217; a term (like deduplication 2.0); what does matter is whether the term actually helps describe a technology and how that technology can be leveraged to make things better in the data center.  We should focus on the implications of this new generation of deduplication &#8211; ‘deduplication 2.0’.</p>
<p>In May I delivered a presentation to a number of EMC customers on the topic of Data Deduplication 2.0 &#8211; Comprehensive Capacity Optimization.  The point of my presentation was simple (and keep in mind this was before the Data Domain acquisition): there are a number of capacity optimization technologies/capabilities that are available to customers today.  Originally these deduplication technologies were used primarily for backup purposes but slowly, deduplication is making its way into primary storage. Deduplication in primary storage makes a lot of sense FOR DATA THAT IS STATIC.  Why only static data?  Static data is data that isn&#8217;t used frequently (that doesn&#8217;t mean it&#8217;s not important, it simply is not accessed often); because access to this data is infrequent, the performance requirements for this data are lower than those for active data. Remember: nothing in IT is free.  If I deduplicate data, then in order to use it I must ‘rehydrate’ it, and that has a performance implication.  So I want to be careful where I deduplicate data so as not to inhibit performance on production data.</p>
<p>Dr. Dedupe and Tom allude to Deduplication 2.0 moving beyond backup storage and into primary storage.  While deduplication in primary storage is technically possible, it is important that customers understand two important points:</p>
<p>1) Performance: whatever I do to deduplicate (I prefer the term ‘optimize’) capacity in order to save space, I must ‘undo’ in order to use the data.  If I set a policy that says any data that is 30 days old can be ‘optimized’, I need to be sure that data 30 days old is not active or I could pay a substantial performance penalty when using this data.  I may instead set a policy that says ‘any data that hasn’t been touched in 30 days can be optimized’.  Even then, I would want to make sure that there is no scenario where, at the end of a quarter let’s say, I would need to rehydrate all data in order to run some report.</p>
<p>2) Comprehensive and cumulative deduplication throughout my storage tiers.  What do I mean?  If I compress and single instance (deduplicate) data on my primary storage utilizing one set of deduplication technologies, say single instancing and compression algorithms, and then I back up this data using sub-file deduplication, a separate set of algorithms, then what I am left with are two separate silos of deduplicated data, and no one wins in this scenario.</p>
<p>It is important, no matter what deduplication technology you decide to use, that you can actually leverage the data stored in the deduplication device and that as data moves from device to device it doesn’t need to be rehydrated before it is moved.</p>
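<p>The ‘not touched in 30 days’ policy from point 1 can be sketched in a few lines.  This is only an illustration under my own assumptions: the function name, the <code>protected</code> flag (a manual exclusion for data you know will be needed again, such as quarter-end report sources) and the file names are all hypothetical and do not come from any particular product.</p>

```python
from datetime import datetime, timedelta


def eligible_for_optimization(last_access, now, idle_days=30, protected=False):
    """Return True when a file has been idle long enough to be optimized.

    `protected` is a hypothetical manual exclusion for data that will be
    needed in bulk later (for example, quarter-end reporting sources).
    """
    if protected:
        return False
    return now - last_access >= timedelta(days=idle_days)


now = datetime(2009, 10, 7)

# name -> (last access time, protected from optimization?)
files = {
    "q3-report-source.db": (datetime(2009, 7, 1), True),
    "old-slides.ppt": (datetime(2009, 6, 15), False),
    "active-plan.xls": (datetime(2009, 10, 1), False),
}

to_optimize = sorted(
    name
    for name, (accessed, protected) in files.items()
    if eligible_for_optimization(accessed, now, protected=protected)
)
print(to_optimize)
```

<p>Here only the idle, unprotected file is selected.  The report source is old enough to qualify, but it is excluded so that nobody has to rehydrate it at the end of the quarter.</p>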
<p>A great use case of capacity optimization in primary storage is how EMC evolved the Celerra product this year.  Through a policy, let&#8217;s say any data that is older than 30 days, is compressed and stored as a single instance, with users seeing as much as 30% to 50% storage savings.</p>
<p>The real goal of Deduplication 2.0, and I think Dr. Dedupe alluded to this in his post &#8220;<a href="http://blogs.netapp.com/drdedupe/2009/09/the-dedupe-20-pundits-are-still-swimming-in-lake-10.html" target="_blank">The Dedupe 2.0 Pundits Are Still Swimming in Lake 1.0</a>&#8221;, is that customers win when deduplication technology is a part of the core system or file system, when I no longer need to rehydrate data as I move it from primary storage to secondary storage.  If each storage device in the &#8216;stack&#8217; understands the language of the device in the stack ahead of it and the &#8216;deduplication&#8217; or file system is coordinated and cumulative from device to device, then the customer is the winner.  This pertains to primary storage, backup storage and archive storage.  Never having to rehydrate data allows for more efficiency and a reduced tax on devices, which can save the end user money.</p>
<p>Tom Cook, CEO of <a href="http://www.permabit.com/" target="_blank">Permabit</a>, points out in his blog post &#8220;<a href="http://www.dedupe2.com/blog/2009/9/11/dedupe-10-vs-dedupe-20-the-debate-ensues.html" target="_blank">Dedupe 1.0 vs. Dedupe 2.0: The debate ensues</a>&#8221; that the only value to deduplication for primary storage is to move your data to a deduplicated archive, which allows you to store data efficiently for the long term.  I agree with that, but as we have seen, it is not that practical.  Why? Because at the end of the day, the costs to manage storage are going up, up, up and the costs to buy storage are going down, down, down.  End users (NOT IT) are generally lazy or, should I really say, just too busy to manage this storage.  In order to properly archive data, you need to have a policy that tells you what to move and when to move it.  IT can make all the recommendations in the world about the value of archive, but if users or really, line-of-business managers don&#8217;t tell IT what data is important and what can be archived, then IT doesn&#8217;t really have a choice, which makes the premise of moving data to an archive, deduplicated or not, moot.</p>
<p>The real issue is balancing capacity optimization (to what granularity you deduplicate data) against performance on the appropriate tier of data, given that deduplication will happen on all tiers of storage.  The higher the performance requirements (tier 1), the less &#8216;optimized&#8217; I make the data; the lower the performance requirements (tier x, archive), the more optimized I make the data.  The benefits to the customer are that optimization can be A) consistent across each of their devices and B) cumulative from device to device, removing silos of deduplicated data across the stack.</p>
<p>For more on tiered dedupe, read my <a href="../../../../../betamax-redux/" target="_blank">Betamax Redux</a> blog post on EMC&#8217;s vision for deduplication and hopefully this will put you on a high performance ‘Road to Recovery’.</p>
<p style="text-align: center;">
]]></content:encoded>
			<wfw:commentRss>http://www.backupandbeyond.com/comprehensive-capacity-optimization-deduplication-2-0/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.backupandbeyond.com/comprehensive-capacity-optimization-deduplication-2-0/</feedburner:origLink></item>
		<item>
		<title>Bloody Backup and Archive</title>
		<link>http://feedproxy.google.com/~r/BackupBeyond/~3/EJ_r9G2bF5c/</link>
		<comments>http://www.backupandbeyond.com/bloody-backup-and-archive/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 18:40:27 +0000</pubDate>
		<dc:creator>Steve Kenniston</dc:creator>
				<category><![CDATA[Backup]]></category>

		<guid isPermaLink="false">http://www.backupandbeyond.com/?p=387</guid>
		<description><![CDATA[Another great post from my colleague Mike Dutch
Many users believe that their backup tapes are their archive as well.    Additionally deduplicating storage systems are driving a similar notion that a backup platform and archive platform could be common.  Opinions definitely vary on this topic so I encourage all to comment.  Let’s take a deeper look…
The [...]]]></description>
			<content:encoded><![CDATA[<p>Another great post from my colleague Mike Dutch</p>
<p>Many users believe that their backup tapes are their archive as well.    Additionally deduplicating storage systems are driving a similar notion that a backup platform and archive platform could be common.  Opinions definitely vary on this topic so I encourage all to comment.  Let’s take a deeper look…</p>
<p>The reason you “back up” a set of data is that you might need to recover the primary data if it becomes unavailable or corrupted.  If you want to access a data set as it existed at a particular point in time but can’t, you can replace the primary data with the backup copy.  (SNIA defines backup as … “A collection of data stored on (usually removable) non-volatile storage <a href="http://www.snia.org/education/dictionary/m#media">media</a> for purposes of <a href="http://www.snia.org/education/dictionary/r#recovery">recovery</a> in case the original copy of data is lost or becomes inaccessible; also called a <a href="http://www.snia.org/education/dictionary/b/#backup_copy#backup_copy">backup copy.</a>”)</p>
<p>The reason you “archive” a data set is that you want to preserve it.  It remains the primary data but because you&#8217;ll rarely access it, you want to put it somewhere safe just in case you ever want or need to access it again.  The <a href="http://www.snia.org/forums/dmf/knowledge/term_bridge/">SNIA Data Management Forum</a> defines an archive as &#8220;a specialized repository (including the supporting processes, policies, hardware, and software) used to preserve information and data for the long-term.&#8221;  The capabilities of an archive &#8220;include the ability to preserve, protect, control, maintain authenticity and integrity, accommodate physical and logical migration, and guarantee access to information and data objects over their required retention period.&#8221;</p>
<p>Regardless of whether archive should be used as a noun or a verb, the point is that the purpose and therefore the lifecycle of data in an archive repository differ from a backup copy. While few would disagree with this premise, I&#8217;d wager that most people believe this implies you must store and manage these copies separately.  You can, but you don&#8217;t have to if you&#8217;re using a data protection solution that fully supports your business processes.</p>
<p>Someday, the notion of data protection will be subsumed by the notion of data storage.  If we store data, why shouldn&#8217;t we expect to get it back when we want it?  Why shouldn&#8217;t we expect to resume an application from whatever point in time we want to?  If the system can’t do this, is it really protecting my data?  This leads us to the question of what data protection is.</p>
<p>The SNIA definition of data as &#8220;The digital representation of anything in any form&#8221; obscures its richness (sight, sound, touch, smell, taste). After all, shouldn&#8217;t analog information such as printed books be considered data? Of course, a dictionary is not an encyclopedia, and a definition should be succinct. I&#8217;ll read the SNIA definition as meaning, “Data is something that can be processed by a computer after any necessary format transformations.”</p>
<p>Let&#8217;s posit that data protection means assurance that data is accessible to authorized users with acceptable performance in an auditable manner. That sounds reasonable, yet this definition exceeds the usual scope of data protection. Data protection is usually measured in terms of availability metrics, that is, in terms of recovery point objective (RPO) and recovery time objective (RTO). We also want assurance that data has not been altered or destroyed in an unauthorized manner (data integrity). And of course, we don&#8217;t want our data to be available to anyone who should not have access to it, whether it is &#8220;leaked&#8221; over a network or exposed by losing control of physical storage media. Even if an authorized change was made, the user may change their mind and want to access an earlier version of the data. Also, what about poor performance? I know I&#8217;ll find something else to do if application performance degrades to the point where I cannot remain productive. Unacceptable performance equates to unavailability. Auditable means the ability to verify who controlled what and when (to comply with GRC initiatives and provide a chain of custody).</p>
<p>The traditional definitions of operational recovery and disaster recovery, distinguished by the impact of the outages (whether caused by operational errors, data corruption, or hardware failures), are subsumed by this accessible-performant-compliant definition of data protection. Retention and long-term preservation of fixed content (and related metadata) within an archive repository also fall under the &#8220;ensure data is accessible&#8221; umbrella of our broad definition of data protection. Regardless of whether performance and compliance capabilities are included in your definition of data protection, they remain requirements of conducting an effective and responsible business.</p>
<p>Let&#8217;s get back to the main idea of this post, namely, that while it is necessary to MANAGE backup and archive data separately, it is not necessary to STORE backup and archive data separately.  Storage systems with data deduplication capabilities are one proof point. An accessible-performant-compliant definition of data protection broadens the opportunities for both resource sharing and risk reduction.  Data protection is much more than backup and archive.  It&#8217;s about keeping your business fit by ensuring its lifeblood, its data, is clean and flowing freely.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.backupandbeyond.com/bloody-backup-and-archive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.backupandbeyond.com/bloody-backup-and-archive/</feedburner:origLink></item>
		<item>
		<title>The Side Effects of Backup on Server Virtualization</title>
		<link>http://feedproxy.google.com/~r/BackupBeyond/~3/cJLcvgizC9w/</link>
		<comments>http://www.backupandbeyond.com/the-side-effects-of-backup-on-server-virtualization/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 22:25:48 +0000</pubDate>
		<dc:creator>Steve Kenniston</dc:creator>
				<category><![CDATA[Backup]]></category>
		<category><![CDATA[Data Deduplication]]></category>
		<category><![CDATA[Virtualization]]></category>
		<category><![CDATA[Avamar]]></category>
		<category><![CDATA[Data Protection]]></category>
		<category><![CDATA[Dedupe]]></category>
		<category><![CDATA[Deduplication]]></category>
		<category><![CDATA[protection]]></category>
		<category><![CDATA[Recovery]]></category>
		<category><![CDATA[Restore]]></category>
		<category><![CDATA[vmware]]></category>

		<guid isPermaLink="false">http://www.backupandbeyond.com/?p=380</guid>
		<description><![CDATA[Server virtualization has changed the IT landscape dramatically.  It has become a magic potion curing a number of ills in the physical server world such as low individual CPU utilization and excess use of space, power and cooling in the data center.  However, like all potions that cure what ails you, there can be side [...]]]></description>
			<content:encoded><![CDATA[<p>Server virtualization has changed the IT landscape dramatically.  It has become a magic potion curing a number of ills in the physical server world such as low individual CPU utilization and excess use of space, power and cooling in the data center.  However, like all potions that cure what ails you, there can be side effects.  You need to be careful of what the Witch Doctor orders.</p>
<p>When I speak with customers who have aggressively implemented a virtual server infrastructure, 9 out of 10 tell me that they underestimated the effect virtualization would have on their backups and backup processes. When backup is not considered during the virtual server assessment and planning process, it can make virtualization less of the magic potion they had hoped for. So what is the issue? Backup is a virtualization bottleneck, and without addressing it, you may not be able to obtain the server consolidation ratios you had been expecting, which can have a negative effect on your virtual server TCO and ROI.</p>
<p>This is a timely discussion as <a href="http://vmworld.com/index.jspa">VMworld</a> has just concluded.  VMware users flocked to VMworld looking for best practices when it comes to implementing virtual server technology.  Because virtualization allows IT to reduce the overall physical hardware infrastructure, users will be looking at how to maximize their server consolidation ratios (get as many virtual servers on a physical server as they can and still provide good application performance).</p>
<p>I often hear that companies assess their environments by looking at the production applications in their physical server environment, identifying their workloads, and translating that into some consolidation ratio of physical servers to virtual servers. I also hear, from these same customers, that backup was never taken into consideration during the assessment phase when trying to identify the best possible consolidation ratios. These customers implement their new virtual server environments, install the backup agent they had previously been using for physical server backups, and attempt to back up their virtual servers, only to find that they can protect just 50% to 60% of the new environment. Why?</p>
<p>Let’s look at the physics. Let’s say your virtualization ratio is 12 virtual servers to 1 physical server. Twelve physical servers back up with 12 NICs, 12 CPUs, 12 memory ‘chunks’, etc. When you moved these 12 physical servers into the virtual world and put them on one physical server, did you put 12 NICs in the new physical server? Did you put 12 CPUs in the new server? Do you have 12x the memory? Chances are, probably not. However, the capacity you have to back up didn’t change, did it? So how could one expect backup, which is I/O, memory and CPU intensive, to perform well in a virtual world?</p>
<p>Figure 1 below shows that when you back up 12 separate physical servers, the resource drain on each server is roughly 25% (per system during a full backup). When you virtualize these 12 servers onto one or two physical servers, your physical system utilization shoots up to 80%+. This utilization can be so dramatic that it actually affects the number of virtual servers you can have on these systems, which can ruin your virtual server TCO / ROI.</p>
<p style="text-align: center;">
<div id="attachment_381" class="wp-caption aligncenter" style="width: 648px"><img class="size-full wp-image-381" title="vm" src="http://www.backupandbeyond.com/wp-content/uploads/2009/09/vm.jpg" alt="Figure 1" width="638" height="368" /><p class="wp-caption-text">Figure 1</p></div>
<p style="text-align: center;">
<p>Simple math dictates that unless you have all the same resources on your new physical server as you did on all your physical servers before the consolidation, you won’t get the same backup performance. I have spoken with customers who aimed for a 25 to 1 virtual to physical server consolidation but were only able to reach a 15 to 1 ratio, because their backup application couldn’t handle 25 virtual servers on one physical server without leaving some unprotected.</p>
<p>People could argue that if you properly schedule each virtual machine to back up in a window when none of the other systems are backing up, then perhaps you could get by with traditional backup. The flip side is that IT has been telling me they don’t want to manage the backup process any more than they have to. So how do you ‘fix’ this problem?</p>
<p>The issue is that backup is a very I/O-intensive application, so there is only one way to fix the problem: you need to reduce the amount of I/O generated and sent through the physical devices that house the virtual servers during backup. Virtual servers were designed to provide a lot of benefits, but high I/O capability is not one of them. (This is okay; every technology implementation has its tradeoffs. When the positives outweigh the negatives, especially in a substantial way, as they do with virtual servers, you usually have a paradigm shift, and this is what we are seeing with virtual servers.)</p>
<p>So how do you change the I/O pattern of backup? You do so by decreasing the amount of data that uses the shared resources during backup. There are a couple of ways to do this. One way is to leverage the storage array and snapshot the data. Snapshots allow you to make copies of virtualized server data, mount those copies on a proxy host, and off-load the backups from the physical server that houses the virtual servers. The downsides are:</p>
<p>1)      This becomes a new set of processes to manage, unlike traditional backup processes</p>
<p>2)      You need extra storage capacity with this solution</p>
<p>3)      You will need to manage another physical server (proxy server)</p>
<p>4)      You will need more backup agents from your backup software provider</p>
<p>The most efficient way, however, is to take advantage of a new backup software application that leverages data reduction (data deduplication) on the client. Your processes stay the same, there is no need for additional primary storage hardware, and by leveraging a ‘smarter’ backup client you reduce the I/O tax on your physical server devices and can thereby maximize the TCO / ROI of your new virtual server environment.</p>
<p>Additionally, a number of these technologies have further offerings that truly make them next generation. Backup licensing is slowly moving to a capacity-based license model. One great feature of these new products is that there is no charge for clients or agents. This allows you to create a virtual server template with the backup agent embedded within it. You no longer have to worry about proliferating backup clients and then paying for all those clients when it is time to ‘true up’ with your backup software vendor. Data deduplication technologies also offer the ability to replicate the backup data efficiently to disk at a remote site, so you can develop a more efficient disaster recovery plan that reduces the reliance on tape and increases your overall operational efficiency.</p>
<p>Regardless of which path you choose, each requires IT to rethink their backup strategies when it comes to protecting virtual server environments.</p>
<p>I encourage you to do two things as you consider moving to a virtual server infrastructure:</p>
<p>1)      Make sure you are thinking about data protection when architecting your new virtual server environment</p>
<p>2)      Check out some of the new technologies and best practices offered by vendors for protecting virtual servers.</p>
<p>Hopefully this will help put your virtual server world back on the <em>Road to Recovery</em>!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.backupandbeyond.com/the-side-effects-of-backup-on-server-virtualization/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.backupandbeyond.com/the-side-effects-of-backup-on-server-virtualization/</feedburner:origLink></item>
	<media:rating>nonadult</media:rating></channel>
</rss>
