<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
<channel>
<title>Brian Biles' Blog</title>
<link>http://www.dedupematters.com/brianbilesblog/</link>
<description />
<language>en-US</language>
<lastBuildDate>Mon, 04 Oct 2010 05:30:00 -0700</lastBuildDate>
<generator>http://www.typepad.com/</generator>

<docs>http://www.rssboard.org/rss-specification</docs>
<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/BrianBiles" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="brianbiles" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
<title>1+1 = 3 Even Before OpEx Discount</title>
<link>http://www.dedupematters.com/brianbilesblog/2010/10/1-plus-1-equals-3-even-before-opex-discount.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/brianbilesblog/2010/10/1-plus-1-equals-3-even-before-opex-discount.html</guid>
<description>The big story in backup software is that it’s not just software anymore. Apparently my prior blog’s last line (below) was prescient. Maybe it was the memo itself. Increasingly, the new field of battle is in deeper integration between backup software and dedupe storage. Avamar started this wave a long time ago, and it remains the most advanced implementation. More traditional backup software is also moving in this direction: Symantec pioneered the idea using OST and Data Domain Boost to integrate NetBackup, and now Backup Exec 2010 with Data Domain and other OST licensees. Starting this summer, NetApp and SyncSort...</description>
<content:encoded>&lt;p&gt;The big story in backup software is that it’s not just software anymore. Apparently my prior blog’s last line (below) was prescient. Maybe it was the memo itself.&lt;/p&gt; 

&lt;p&gt;Increasingly, the new field of battle is in deeper integration between backup software and dedupe storage. Avamar started this wave a long time ago, and it remains the most advanced implementation. More traditional backup software is also moving in this direction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Symantec pioneered the idea using OST and Data Domain Boost to integrate NetBackup, and now Backup Exec 2010 with Data Domain and other OST licensees.&lt;/li&gt;

&lt;li&gt;Starting this summer, NetApp and SyncSort are offering channel-based integration packages for SyncSort to manage snapshots on FAS systems. Not quite Boost, but reacting to the same problem.&lt;/li&gt;

&lt;li&gt;Also starting this summer, Symantec are offering their own storage systems in some geographies for PureDisk. (I love OST, but how weird is it to be on the NBU 5000 team in the “No Hardware Agenda” company? It would be like working in DB2 support at Oracle.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As noted below, EMC has moved aggressively to integrate NetWorker and Data Domain based on the Boost technology at the Storage Node (backup server) level. This is now available in the latest release of NetWorker. Avamar and NetWorker have already been integrated for some time from the NetWorker client.&lt;/p&gt;

&lt;p&gt;Integration brings speed, as discussed below, but it also brings lower OpEx. Following up on the launch last May of NetWorker using Boost to integrate with Data Domain, we looked in detail at some of the results in our beta sites. To these customers Data Domain was already fast, but being able to schedule replication, manage multiple retention policies, clone to tape, and monitor and report on Data Domain systems, all within NetWorker, also brings significant efficiency improvement. One customer said they expect to see a 20-30% reduction in management time.&lt;/p&gt;

&lt;p&gt;See prior memo.&lt;/p&gt;</content:encoded>


<category>Data Domain</category>
<category>Deduplication</category>

<dc:creator>Data Domain</dc:creator>
<pubDate>Mon, 04 Oct 2010 05:30:00 -0700</pubDate>

</item>
<item>
<title>Thinking Outside the Box</title>
<link>http://www.dedupematters.com/brianbilesblog/2010/05/thinking-outside-the-box.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/brianbilesblog/2010/05/thinking-outside-the-box.html</guid>
<description>If you follow the backup / recovery technology landscape, the most important landscape change since the dawn of replicated dedupe systems just happened. But it might be hard to understand until you really play it out in your head. This blog ends with some of the conclusions that will continue to resonate long after the PR dies down. DD Boost is distributed software. Part of it runs on the backup server of traditional backup software. The other part of it runs on the Data Domain system. It off-boards the math of the Data Domain dedupe process (cutting the data stream...</description>
<content:encoded>&lt;p&gt;If you follow the backup / recovery technology landscape, the most important landscape change since the dawn of replicated dedupe systems just happened. But it might be hard to understand until you really play it out in your head. This blog ends with some of the conclusions that will continue to resonate long after the PR dies down.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;DD Boost&lt;/b&gt; is distributed software. Part of it runs on the backup server of traditional backup software. The other part of it runs on the Data Domain system. It off-boards the math of the Data Domain dedupe process (cutting the data stream into variable-length segments, fingerprinting them, and if they are new, compressing them) to the backup server. Lookups of what’s new versus old are done in batches on the backing Data Domain system.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This makes the backup server 20% less loaded (less data to copy).&lt;/li&gt;

&lt;li&gt;It makes the LAN between the backup server and the Data Domain system 95% less loaded (sending only deduped compressed segments).&lt;/li&gt;

&lt;li&gt;It gives the Data Domain system 50% higher aggregate backup throughput compared to our prior Symantec OpenStorage benchmarks, or about 2.4x faster than the VTL or NFS benchmarks on the same system.&lt;/li&gt;

&lt;li&gt;It includes management interfaces for backup software to control the Data Domain system, including managing per-file replication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The performance of all Data Domain systems is being restated this week based on DD Boost, our new fastest protocol. We didn’t say it at the time, but the recently announced Data Domain Global Deduplication Array (GDA) software is based on extensions to the DD Boost technologies. It will support DD Boost implementations across the industry. The GDA performance doesn’t need restating. But the DD880, for example, just went from a prior benchmark for aggregate backup speed of 5.4 TB/hour to 8.8 TB/hour.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;NetWorker&lt;/b&gt; will integrate the distributed DD Boost library in the second half of 2010. It will then have caught up with NetBackup in Data Domain integration and speed. It will have all the same control over Data Domain replication that Symantec products have had (or, in NetWorker terminology, Cloning Controlled Replication.) This caps a long list of fundamental improvements NetWorker has focused on for the last couple years to optimize for disk-based backup, especially to dedupe storage. NetWorker now offers best of breed technology for dedupe at both the NetWorker Client and through the Storage Node server.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NetWorker has integrated the Avamar client code in its clients for some time. By policy, if an admin wants dedupe to happen at the client, the Avamar process may be invoked, interacting with an Avamar storage system in the datacenter.&lt;/li&gt;

&lt;li&gt;With these announcements, NetWorker now integrates through DD Boost in its Storage Node server. Data Domain will be seen as a standard storage device type.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;b&gt;What does this all mean?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;&lt;i&gt;In the future, most backups to Data Domain systems will be through DD Boost.&lt;/i&gt; Using the OpenStorage distributed integration method, Symantec and Data Domain proved a while ago that the combination of backup software and dedupe storage could be made more manageable. Now, it’s also faster and more efficient. By a lot. The only reasons not to use DD Boost moving forward are that you need to use fibre channel (FC) or that the backup software doesn’t have a DD Boost integration program. Since LAN bandwidth is made so small with DD Boost, a lot of admins can consider LANs where before FC was the only technical alternative. Backup vendors have been asleep at the switch on competing with OpenStorage. They won’t make that mistake for too much longer.&lt;/p&gt;

&lt;p&gt;Remember when Symantec first partnered with NetApp after EMC bought Legato? Their pair-wise effort created some of the Ur-structures that underlie OpenStorage, but it took a while to generalize the interface. We are in the same maturation process between Data Domain and NetWorker teams. Those lessons will result in a clean set of interfaces, but it will take a little time to complete.&lt;/p&gt;

&lt;p&gt;But don’t misunderstand: our engineering relationship with Symantec is still critical to us. This is one of those times when it serves our mutual customers hugely to have us collaborate, even if some of our products can also compete. Our respective teams have continued to have strong links, and we wouldn’t have it any other way. DD Boost is the new name of the Data Domain OpenStorage software option because (1) DD Boost supports more than Symantec OpenStorage now, and (2) it does a lot more than it used to. But the interaction of DD Boost with NetBackup and now Backup Exec is through the Symantec OpenStorage APIs, and new features will continue to be developed in tandem as long as they allow us to participate.&lt;/p&gt;

&lt;p&gt;In the short term the advantages to Data Domain customers using DD Boost-linked software (EMC or Symantec) are significant and differentiating. At some point, the other backup software vendors will recognize how important it is to catch up on these features. You can estimate the relative R&amp;D focus of these vendors by timing how long it takes for them to support OpenStorage-like interfaces. We’re happy to help them figure it out. In the meantime, DD Boost is already supported by more than half the enterprise backup market.&lt;/p&gt;

&lt;p&gt;&lt;i&gt;You can’t do a DD Boost style approach with a post-process dedupe system.&lt;/i&gt; Data is deduped live, on the way in, inline. If they ever published their internal dedupe throughputs, post-process vendors might be asked to compare to Data Domain’s. That was already awkward; now they might never do it.&lt;/p&gt;

&lt;p&gt;Even inline vendors such as IBM are going to have to gulp a little. EMC now offers a complete dedupe storage system (the DD880) that is more than 4x the backup speed of their comparable single-controller ProtecTier gateway (rated at 500 MB/sec., or 1.8 TB/hour). Their real systems, appliances with disk attached so you can review price-performance apples/apples, are considerably smaller and slower.&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Unlike pure software vendors trying to distribute dedupe, the EMC approach includes complete dedupe storage systems,&lt;/i&gt; so performance and management are predictable, simple and scalable. We’re focused on developing end-to-end solutions. We can imagine things differently because we develop both the software and the storage. Remember how in the 90s some used to think LUN management would be done on servers? It didn’t scale, and in the end, serious implementations use arrays to do this. Systems have already won with dedupe as well; some vendors just didn’t seem to get the memo.&lt;/p&gt;
</content:encoded>



<dc:creator>Data Domain</dc:creator>
<pubDate>Tue, 11 May 2010 05:30:00 -0700</pubDate>

</item>
<item>
<title>Global Positioning</title>
<link>http://www.dedupematters.com/brianbilesblog/2010/04/global-positioning.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/brianbilesblog/2010/04/global-positioning.html</guid>
<description>Some have accused this blog of being published too slowly for the fast-moving world of the blogosphere. It shouldn’t be a surprise then that after going dark for a year, I’m going to respond to some older thoughts about global deduplication offered by Curtis Preston here (updated for the DD880 a little here), also from a year ago. In the meantime, a lot has changed, including input from a new user poll by ESG that observed that most users don’t think it’s a high priority. As of last weekend, Curtis and Lauren Whitehouse both believe that Global Dedupe matters a...</description>
<content:encoded>&lt;p&gt;Some have accused this blog of being published too slowly for the fast-moving world of the blogosphere. It shouldn’t be a surprise then that after going dark for a year, I’m going to respond to some older thoughts about global deduplication offered by Curtis Preston &lt;a href="http://www.backupcentral.com/content/view/231/47"&gt;here&lt;/a&gt; (updated for the DD880 a little &lt;a href="http://www.backupcentral.com/content/view/256/47"&gt;here&lt;/a&gt;), also from a year ago. In the meantime, a lot has changed, including input from a new &lt;a href="http://www.enterprisestrategygroup.com/2010/04/its-not-about-reduction-ratios-the-real-impact-of-global-deduplication/feed"&gt;user poll&lt;/a&gt; by ESG that observed that most users don’t think it’s a high priority. As of last weekend, Curtis and Lauren Whitehouse &lt;a href="http://www.backupcentral.com/content/view/317/47"&gt;both believe&lt;/a&gt; that Global Dedupe matters a lot. This week, we’re announcing our first Data Domain Global Deduplication Array, which settles the issue for EMC (and might have contributed to the recent spike in discussion activity on the topic.) We never said global dedupe was undesirable, we just said it was hard to build it right. It also may never be something most users ask for, since it should be invisible.&lt;/p&gt;

&lt;p&gt;A lot of the historic debate depended on your assumptions about how fast and big a dedupe system is with a given controller, and then how easy it is to deal with if the backup load overcomes a single-controller system’s resources. Let’s start with sizing. In the first articles, Curtis suggested that if you require throughput faster than one dedupe controller can support, you really need global dedupe, or sharing dedupe across multiple controllers. His suggested throughput threshold a year ago was about 10 TB/day of backups, assuming a 12-hour backup window. A 10 TB backup over 12 hours suggests a max of about 250 MB/sec., or a little less than 1 TB/hour. The success of Data Domain without global dedupe historically argues that our enablement of larger single-controller systems has generally kept pace with the adoption curve of dedupe storage into larger enterprises.&lt;/p&gt;

&lt;p&gt;It’s important to remember that the DD architecture is CPU-centric. DD systems are on a relentless march to be faster and bigger by riding the results of Moore’s law, benefits not enjoyed by most of our competitors who are disk-bound. In 2004, a DD system’s throughput was 150 GB/hour and stored 1.25 TB. Over 5 years (including the DD880, announced after Curtis’ first post), our systems became almost 40x faster and 60x bigger, a general trend that should continue based on Intel forecasts.&lt;/p&gt;

&lt;p&gt;So if you have to set a threshold for the debate regarding DD systems, 10 TB/day isn’t a reasonable point of concern. If the concern starts at 12 hours of backup for a DD880, that’s a 65 TB/day workload. 65 TB of backup policies is a big granule to manage; it’s not in the details anymore. This is as big as a lot of customers’ spans for an entire NetWorker data zone or a NetBackup master server (don’t attack me here, I know folks with bigger ones too). It’s worth 6 of whatever Curtis was evaluating at the time of the original post. In the subsequent post, he notes this and observes that he knows sites that are bigger still. OK.&lt;/p&gt;

&lt;p&gt;How about migration if you outgrow a single-controller system’s footprint? This is easier than it might appear. For example, backup software policies target different groups of data to different targets. Some can even detect file-system full targets and automatically aim at a secondary target. Or, to expand or consolidate, because it’s backup, the simplest thing is just to start backing up to a new larger system and keep an older smaller system around for restores until its data’s retention period ends, at which point it can just be decommissioned. Data Domain systems (except the DD140) allow a user to start with a given capacity and add more later, so an expansion is not a migration. We also have a lot of ways to move easily from one single-controller system to another bigger one if a system is outgrown; e.g. on the DD690 to DD880 transition, existing disk shelves can stay in place and only the head needs to transition. Other techniques, such as replicating deduped data, can allow straightforward consolidation with a minimum of transition. If some think global dedupe is the only way to address these issues, as Chuck Hollis sometimes says, we may agree to disagree.&lt;/p&gt;

&lt;p&gt;But let’s assume global dedupe across controllers is important. The key place where the notes above don’t apply is when there are groups of our largest systems in one place, so a single larger system isn’t available to provide a consolidative effect. Integrating them seamlessly into one bigger system does take a lot more work under the hood. After 5 years in the lab, we just announced our first multi-controller system, the Global Deduplication Array (GDA), which extends our SISL architecture to support global deduplication with dynamic load balancing. Here, we specifically focused on scale and simplicity, without sacrificing any of our well known properties: inline dedupe, with a design aimed at maximizing data invulnerability. You can read elsewhere for &lt;a href="http://forms.datadomain.com/go/datadomain/WS_WP_DDGDA_10"&gt;details&lt;/a&gt;. This system includes up to 2 controllers. Quick summary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Performance: up to 12.8 TB/hour aggregate backup throughput, or with Curtis’ 12-hour metric, 150 TB/day;&lt;/li&gt;

&lt;li&gt;Capacity: up to 285 TB addressable (post-RAID, spares etc.) for storing globally deduped info;&lt;/li&gt;

&lt;li&gt;(Comparison to 2004 DD flagship system, for those keeping track: 85x faster, 228x bigger.)&lt;/li&gt;

&lt;li&gt;Dynamic load balancing for throughput and capacity, but with global dedupe across all data.&lt;/li&gt;

&lt;li&gt;Unlike many systems promoting global dedupe, the GDA design compares all new data to all data stored across the system, regardless of which client it was backed up from. Similar but different files with different names from different clients are deduped against each other.&lt;/li&gt;

&lt;li&gt;In the first release, it will support Symantec and EMC backup products only, leveraging software distributed to the backup server (Symantec OpenStorage now, and in the second half of 2010, a different/similar approach in EMC NetWorker.) Here, for extra credit, we also gave this product a global namespace.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Could we do an N-way (N&gt;2) version of the GDA? Could we apply this technique to smaller systems? Yes, architecturally. We’ll review these kinds of questions against other alternatives over time, and we’ll be very interested in customer feedback. The current GDA is &gt;15x faster than 10 TB/day. Based on Intel roadmaps, we expect this 2-controller approach to keep growing to 5x-10x its current scale in a few years. At that point, while this is all hypothetical, it raises the question of the most appropriate eggs/baskets ratio. Also, the current GDA is already highly competitive with the largest concentration of multi-controller dedupe nodes (sometimes more-way) in products from our competitors that do global dedupe.&lt;/p&gt;

&lt;p&gt;Does it have limitations in the first release? Sure. It doesn’t support all backup apps; we focused on the most advanced Enterprise apps. It doesn’t support archive apps the way other Data Domain systems do. Etc. It’s the first chapter in a long upcoming program. But we included the things first that our large customers wanted most, and watch this space.&lt;/p&gt;

&lt;p&gt;Will people still buy multiple single-controller systems, with local deduplication and global management, DDX-style? Yes. The DD880 just doubled in size, so it now hosts 140 TB of addressable space. Per the above discussion, 65 TB of throughput per day is a pretty big granule. And it’s going to get bigger over time. Migration and consolidation after growth are already pretty easy. So what is the first priority for us with the GDA? Even simpler management when you need a lot of big systems in one place.&lt;/p&gt;</content:encoded>



<dc:creator>Data Domain</dc:creator>
<pubDate>Sun, 11 Apr 2010 20:13:03 -0700</pubDate>

</item>
<item>
<title>If Not a Bailout, What?</title>
<link>http://www.dedupematters.com/brianbilesblog/2009/03/if-not-a-bailout-what.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/brianbilesblog/2009/03/if-not-a-bailout-what.html</guid>
<description>Quantum's revenue run rate has been in free fall over the past several years, while their balance sheet has ballooned into one big IOU. This preceded the bad economy; even while Sun and IBM tape revenues were growing, Quantum tape was being voted off the island. With the loan last week, they're officially on EMC's life support. If they were going to acquire Quantum whole, they would have already done so. It is also a poorly kept secret that EMC has their own designs on this technology and only has a temporary need for Quantum to stay around. One can...</description>
<content:encoded>&lt;p&gt;Quantum&amp;#39;s revenue run rate has been in free fall&amp;#0160;over the past several years, while&amp;#0160;their balance sheet has ballooned into one big IOU. This preceded the bad economy; even while Sun and IBM tape revenues were growing, Quantum tape was being voted off the island. With the &lt;a href="http://www.theregister.co.uk/2009/03/27/emc_loan_to_quantum/"&gt;loan&lt;/a&gt;&amp;#0160;last week,&amp;#0160;they&amp;#39;re officially on EMC&amp;#39;s life support. If&amp;#0160;they were going to acquire Quantum whole, they would have already done so. It is also a poorly kept secret that EMC has&amp;#0160;their own designs on this technology and only has a temporary need for Quantum to stay around.&amp;#0160;One can imagine why it worked out this way, but evidently EMC just wants some assets, not the company or the people.&lt;/p&gt;
&lt;p&gt;Why customers would directly place their trust in such a vendor is a mystery. Think of other tech companies that have gone into years of chronic reverse. How many turn around? Their layoff history doesn&amp;#39;t bode well for future products and ongoing support. EMC (and later Dell) will functionally replace Quantum&amp;#39;s channel as customers hope for a &lt;a href="http://searchdatabackup.techtarget.com/news/article/0,289142,sid187_gci1351790,00.html" target="_blank"&gt;supportable product experience&lt;/a&gt;, hiding the weakened brand, but even that is still to come. As with other former high tech stars, Quantum has a rich history, but that&amp;#39;s not helpful anymore. They have to spend more time on how they are keeping creditors at bay than on how they&amp;#39;re building markets. &lt;/p&gt;
&lt;p&gt;Unlike the US Treasury with GM, where there&amp;#39;s no long term plan to keep the assets, this is not really a bailout; it&amp;#39;s something else. There is a critical choice facing bloggers. Which metaphor should be applied?&lt;/p&gt;
&lt;p&gt;A. In financial terms, Quantum is a &lt;a href="http://www.businessweek.com/magazine/content/09_04/b4117024316675.htm" target="_blank"&gt;zombie&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;B. They are cutting off one limb after another and trying to regain momentum. It reminds me of the &lt;a href="http://www.youtube.com/watch?v=2eMkth8FWno" target="_blank"&gt;Black Knight&lt;/a&gt; in Monty Python&amp;#39;s Holy Grail.&lt;/p&gt;
&lt;p&gt;C. EMC can prop up this regime as long as it suits its need to gain access to the resource it wants. Quantum is EMC&amp;#39;s &lt;a href="http://en.wikipedia.org/wiki/Banana_republic" target="_blank"&gt;banana republic&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;D. From the outside, it looks like slow &lt;a href="http://www.wisegeek.com/what-is-organ-harvesting.htm" target="_blank"&gt;organ harvesting&lt;/a&gt;.&lt;/p&gt;</content:encoded>


<category>Dedupe market</category>

<dc:creator>Data Domain</dc:creator>
<pubDate>Tue, 31 Mar 2009 15:21:33 -0700</pubDate>

</item>
<item>
<title>Dedupe and Storage are Not Commodities</title>
<link>http://www.dedupematters.com/brianbilesblog/2009/03/dedupe-and-storage-are-not-commodities.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/brianbilesblog/2009/03/dedupe-and-storage-are-not-commodities.html</guid>
<description>In a commodity market, differentiation is very low between producers. Suppliers are chosen purely on price and distribution. Dedupe storage products are starkly differentiated. The products couldn't be more distinct, though in the rush to learn about why Data Domain has grown so fast, it's easy to misunderstand what's going on. Dedupe is not a commodity; it's not even a product category. Bob Passmore at Gartner makes a persuasive case that nothing in storage is a commodity. Enterprise disks aren't commodities. Storage systems definitely aren't commodities; the lowest common denominator between them is not sufficiently capable or predictable for customers...</description>
<content:encoded>&lt;p&gt;In a &lt;span style="TEXT-DECORATION: underline"&gt;&lt;a href="http://en.wikipedia.org/wiki/Commodity"&gt;commodity&lt;/a&gt;&lt;/span&gt; market, differentiation is very low between producers. Suppliers are chosen purely on price and distribution.&lt;/p&gt;
&lt;p&gt;Dedupe storage products are starkly differentiated. The products couldn&amp;#39;t be more distinct, though in the rush to learn about why Data Domain has grown so fast, it&amp;#39;s easy to misunderstand what&amp;#39;s going on. Dedupe is not a commodity; it&amp;#39;s not even a product category. &lt;/p&gt;
&lt;p&gt;Bob Passmore at Gartner makes &lt;a href="http://www.gartnerinfo.com/datacenter08/ "&gt;a persuasive case that&lt;/a&gt; nothing in storage is a commodity.&amp;#0160;Enterprise &lt;em&gt;disks &lt;/em&gt;aren&amp;#39;t commodities. Storage systems definitely aren&amp;#39;t commodities; the lowest common denominator between them is not sufficiently capable or predictable for customers of any reasonable scale. &lt;/p&gt;
&lt;p&gt;&lt;em&gt;Deduplication is not a product category&lt;/em&gt;. It&amp;#39;s not even a specific method. It&amp;#39;s an effect: data size gets reduced by pooling redundancies. Depending on vendor, deduplication itself can mean significant variation in: &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data reduction effects and predictability&lt;/li&gt;
&lt;li&gt;Delays and impact on I/O&lt;/li&gt;
&lt;li&gt;Applicability to applications and data types&lt;/li&gt;
&lt;li&gt;Resilience&lt;/li&gt;
&lt;li&gt;Overheads in hardware and competing system processes&lt;/li&gt;
&lt;li&gt;Throughput&lt;/li&gt;
&lt;li&gt;Replication flexibility and speed&lt;/li&gt;
&lt;li&gt;Packaging (slow backup software, inline storage, post-process kludge)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Production dedupe systems have weird optimizations that you can&amp;#39;t find off the shelf. It is not a simple solution to get right, and if you start with the wrong architecture, it won&amp;#39;t get fixed. It uses massive system computing and disk access overhead for hours. It is sensitive to the data patterns being input. It&amp;#39;s also very sensitive to all the normal problems of storage systems; if there&amp;#39;s data loss or corruption, a single block can affect hundreds of files. Vendors who are late to market will take shortcuts. In the end, these only show up in bad side effects. Under stress, the problems will be visible. &lt;/p&gt;
&lt;p&gt;Dedupe has no particular packaging consistency. It has no particular interface for debate at standards committees. It&amp;#39;s less like an &amp;quot;it&amp;quot; and more like a &amp;quot;them.&amp;quot; &lt;a href="http://www.backupcentral.com/components/com_mambowiki/index.php/Disk_Targets%2C_currently_shipping"&gt;This site&lt;/a&gt; shows just some of the divergent characteristics of some of the vendors. &lt;/p&gt;
&lt;p&gt;Storage is not a commodity. Dedupe is not a commodity (it&amp;#39;s not even a single category). Dedupe storage is &lt;em&gt;far&lt;/em&gt; from being a commodity. &lt;/p&gt;</content:encoded>


<category>Dedupe market</category>

<dc:creator>Data Domain</dc:creator>
<pubDate>Wed, 18 Mar 2009 08:25:22 -0700</pubDate>

</item>
<item>
<title>Keeping It Real</title>
<link>http://www.dedupematters.com/brianbilesblog/2008/12/keeping-it-real.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/brianbilesblog/2008/12/keeping-it-real.html</guid>
<description>Over the last few years, it started sounding like deduplication was becoming commoditized and turning into just a feature. FalconStor, Sepaton and Quantum announced massive clustered dedupe projects in 2005-7 bolted onto existing VTL products. They lined up OEMs or acquirers, including HP, Sun and EMC. The specs were unbelievable. If it is too good to be true, sometimes there is a reason. A year or more later, Data Domain is scaling as promised, but the bolt-ons are struggling to meet expectations in robustness and economic impact. Dedupe is an effect, not a method, and the methods are clearly not...</description>
<content:encoded>&lt;p&gt;Over the last few years, it started sounding like deduplication was becoming commoditized and turning into just a feature. FalconStor, Sepaton and Quantum announced massive clustered dedupe projects in 2005-7 bolted onto existing VTL products. They lined up OEMs or acquirers, including HP, Sun and EMC. The specs were unbelievable. &lt;/p&gt;
&lt;p&gt;If it is too good to be true, sometimes there is a reason. A year or more later, Data Domain is scaling as promised, but the bolt-ons are struggling to meet expectations in robustness and economic impact. Dedupe is an effect, not a method, and the methods are clearly not a commodity, as can be seen in examples &lt;a href="http://searchdatabackup.techtarget.com/news/article/0,289142,sid187_gci1315280,00.html"&gt;like this&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;Bolt-on specs are sometimes a wish list. For example, Quantum&amp;#39;s DXi 7500 maximum speed specs need a dual-controller system. But that configuration is apparently not available, more than a year since it &lt;a href="http://www.examiner.com/p-4469~Quantum_Introduces_DXi7500_Extending_Data_De_duplication_and_Replication_Benefits_Across_the_Distributed_Enterprise.html"&gt;was announced&lt;/a&gt;. What else is lurking that could throw claimed specs into a new light?&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.backupcentral.com/content/view/166/47"&gt;For example&lt;/a&gt;, in that same system (the basis of the EMC DL3D 3000)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Backups are written to a non-dedupe disk cache to get higher speed, except in the narrow case of the &amp;quot;adaptive&amp;quot; or &amp;quot;dedupe on&amp;quot; modes, in which writes running at a third or less of their benchmark rates are deduped inline; and&lt;/li&gt;
&lt;li&gt;All cached data is kept after deduping as long as possible, so benchmark reads are generally from the pre-dedupe cache to get higher speed (ask about their truncation model).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So their benchmarks could all involve the cache, not the dedupe store. What&amp;#39;s the real dedupe rate? Can you know when dedupe on the non-dedupe cache will complete? Can you trust the specs? Time will tell. &lt;/p&gt;
&lt;p&gt;Here is the problem. If dedupe matters, you have to be able to finish backup and dedupe in a day. A post-process system will fill up much faster than expected and/or get controller-bound and slow if it gets too far behind. If you replicate, the DR site will always be behind. It would really be a bummer if you didn&amp;#39;t expect this. &lt;/p&gt;
&lt;p&gt;So here&amp;#39;s an experiment you can try at home as a reality check. (These products still don&amp;#39;t have much of a reference base that will look like your situation, and without that, a reasonable buyer should do a proof of concept lab test.) If you do a bolt-on test, do it with a pair of large dedupe products, replicating across a LAN to avoid latency issues. &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Back up 30 or 40 TB as fast as you can (or whatever they say they can dedupe in a day). Bolt-on system designs can be very curious under stress, so don&amp;#39;t go small.&lt;/li&gt;
&lt;li&gt;Time the duration from start of backup to finish of dedupe replication. The end-to-end rate of this test will be a fair proxy for the internal dedupe rate as long as it replicates only deduped data. (Good luck estimating dedupe completion time with their GUIs - it is often left as a daily calculus project.)&lt;/li&gt;
&lt;li&gt;Now try restoring from the replica to see how fast restores are; the replica only has deduped data, so it won&amp;#39;t fake you out by restoring from the non-dedupe cache.&lt;/li&gt;
&lt;/ul&gt;
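&lt;p&gt;Once you have the stopwatch numbers, turning them into rates is simple. Here is a rough Python sketch, assuming decimal units (1 TB = 1,000,000 MB). The function names are made up for the example, and the 30 TB in 20 hours input is a hypothetical result, not a measurement of any product.&lt;/p&gt;
&lt;pre&gt;
```python
# Sketch: turn the stopwatch numbers from the lab test into rates.
# Assumes decimal units (1 TB = 1,000,000 MB). The 30 TB / 20 hour
# inputs below are hypothetical, not measurements of any product.

MB_PER_TB = 1_000_000
SECONDS_PER_HOUR = 3600


def end_to_end_rate_mb_per_sec(data_tb: float, elapsed_hours: float) -> float:
    """Average rate from start of backup to finish of dedupe replication."""
    return data_tb * MB_PER_TB / (elapsed_hours * SECONDS_PER_HOUR)


def tb_per_day(data_tb: float, elapsed_hours: float) -> float:
    """Data the pair could back up, dedupe and replicate in 24 hours at that rate."""
    return data_tb / elapsed_hours * 24


if __name__ == "__main__":
    rate = end_to_end_rate_mb_per_sec(30, 20)
    print(f"End-to-end rate: {rate:.0f} MB/sec")                  # 417
    print(f"Sustainable load: {tb_per_day(30, 20):.0f} TB/day")   # 36
```
&lt;/pre&gt;
&lt;p&gt;If the sustainable load comes out below what you need to protect each day, the system will fall further behind every night.&lt;/p&gt;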
&lt;p&gt;Data Domain&amp;#39;s systems are easy: with inline dedupe, the only throughput is dedupe throughput. &lt;/p&gt;</content:encoded>


<category>Dedupe market</category>

<dc:creator>Data Domain</dc:creator>
<pubDate>Thu, 11 Dec 2008 14:42:21 -0800</pubDate>

</item>
<item>
<title>Storage Vendors' Stages of Grief About Deduplication</title>
<link>http://www.dedupematters.com/brianbilesblog/2008/12/storage-vendors-stages-of-grief-about-deduplication.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/brianbilesblog/2008/12/storage-vendors-stages-of-grief-about-deduplication.html</guid>
<description>Deduplication is a challenge to most large storage vendors. It means they get to sell fewer disk drives, and their business plans involve more drives. I expect most vendors will go through Elisabeth Kübler-Ross' five stages of grief in confronting this dilemma. You can already see it in practice in some of the biggest vendors. Denial: Internally: Deduplication is too slow and it puts data at risk. It ain't real storage, it doesn't work. Their implementation is risky! Tell customers: How about a free VTL with your next SAN/NAS upgrade? Let's get lunch. Anger: Internally: Who do they think they...</description>
<content:encoded>&lt;p&gt;Deduplication is a challenge to most large storage vendors. It means they get to sell fewer disk drives, and their business plans involve more drives. I expect most vendors will go through Elisabeth Kübler-Ross&amp;#39; &lt;a href="http://en.wikipedia.org/wiki/K%C3%BCbler-Ross_model"&gt;five stages of grief&lt;/a&gt; in confronting this dilemma. You can already see it in practice in some of the biggest vendors. &lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Denial&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="TEXT-DECORATION: underline"&gt;Internally: &lt;/span&gt;Deduplication is too slow and it puts data at risk. It ain&amp;#39;t real storage, it doesn&amp;#39;t work. Their implementation is risky!&lt;/li&gt;
&lt;li&gt;&lt;span style="TEXT-DECORATION: underline"&gt;Tell customers: &lt;/span&gt;How about a free VTL with your next SAN/NAS upgrade? Let&amp;#39;s get lunch.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Anger:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="TEXT-DECORATION: underline"&gt;Internally: &lt;/span&gt;Who do they think they are? Dedupe is a fad! Real storage is tons of platters, spindles, lots of sheet metal. This whole industry is going to hell! We need to muddy the water. Buy or OEM something fast and hopefully cheap. Start blogging!&lt;/li&gt;
&lt;li&gt;&lt;span style="TEXT-DECORATION: underline"&gt;Tell customers: &lt;/span&gt;How &amp;#39;bout a deal on your next VTL expansion? Running out of space? Right now we can discount VTL really low. I wouldn&amp;#39;t bring this up, but I see you&amp;#39;re talking to Data Domain... Did I tell you we have box seats at the game next week?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Bargaining:&lt;/em&gt;&lt;/strong&gt; &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="TEXT-DECORATION: underline"&gt;Internally:&lt;/span&gt; We need to do something. Our sales people are screaming bloody murder! Can we resell something cheap? &lt;em&gt;One size does not fit all.&lt;/em&gt; Toss something out we have lying around in the labs. We bundled it in enough deals to tell IDC we have market share, so stop bugging us.&lt;/li&gt;
&lt;li&gt;&lt;span style="TEXT-DECORATION: underline"&gt;Tell customers: &lt;/span&gt;Please spend a month to review our VTL. Look, it has a little deduper in it. No? Take some more time, look at our esoteric backup software - it has that dedupe thing you have been asking about! No? We can get you a deal on it with that SAN upgrade we have been talking about. We totally understand your needs. Use some professional services hours. Oh, you need our challenging dedupe NAS. No?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Depression:&lt;/em&gt;&lt;/strong&gt; &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="TEXT-DECORATION: underline"&gt;Internally: &lt;/span&gt;Crap, this stuff isn&amp;#39;t going away. What, they went public? We let the market decide, and it turned out we got weak products, by &lt;em&gt;(choose one)&lt;/em&gt; trying to preserve and bolt on to our existing designs - or - we OEM&amp;#39;d weak products for time-to-market from vendors who were desperate. &lt;em&gt;(Sigh)&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="TEXT-DECORATION: underline"&gt;Tell customers: &lt;/span&gt;What&amp;#39;s the difference anyway? It&amp;#39;s just a feature! What? It&amp;#39;s not? You need to test it? Look at this roadmap. This PDF says we check almost as many options as they do. I&amp;#39;m sure we have a reference &lt;em&gt;somewhere...&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Acceptance:&lt;/strong&gt;&lt;/em&gt; &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="TEXT-DECORATION: underline"&gt;Internally:&lt;/span&gt; Wonder if they are accepting resumes?&lt;/li&gt;
&lt;li&gt;&lt;span style="TEXT-DECORATION: underline"&gt;Tell customers:&lt;/span&gt; We are working on something else, we need more time. Wait for us! I&amp;#39;m serious, our backup software may be less fussy now. We can talk over lunch.&lt;/li&gt;
&lt;/ul&gt;</content:encoded>


<category>Deduplication</category>

<dc:creator>Data Domain</dc:creator>
<pubDate>Thu, 11 Dec 2008 14:42:04 -0800</pubDate>

</item>
<item>
<title>Nothing Has Changed</title>
<link>http://www.dedupematters.com/brianbilesblog/2008/12/nothing-has-changed.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/brianbilesblog/2008/12/nothing-has-changed.html</guid>
<description>Last week, Data Domain announced the highest performance inline deduplication system in the industry with the DD690. EMC and Quantum followed hastily with their own announcements. Nothing has changed. EMC is repackaging the Quantum 7500 in the EMC DL3D line. But the inline dedupe rate is still about 150 MB/sec., according to this discussion, after which the only rates we can know for sure are to and from their non-dedupe cache. So the effective capacity, the amount you can dedupe in a backup day, is something less than 13 TB (dedupe rate x 24 hours). If you don't finish deduping...</description>
<content:encoded>&lt;p&gt;Last week, Data Domain announced the highest performance inline deduplication system in the industry with the DD690. EMC and Quantum followed hastily with their own announcements. &lt;/p&gt;
&lt;p&gt;Nothing has changed&lt;/p&gt;
&lt;p&gt;EMC is repackaging the Quantum 7500 in the EMC DL3D line. But the inline dedupe rate is still about 150 MB/sec., according to this discussion, after which the only rates we can know for sure are to and from their non-dedupe cache. So the &lt;em&gt;&lt;strong&gt;effective capacity&lt;/strong&gt;&lt;/em&gt;, the amount you can dedupe in a backup day, is something less than 13 TB (dedupe rate x 24 hours). If you don&amp;#39;t finish deduping in a day, it will compromise tomorrow&amp;#39;s throughput and backup window. So the only way you could use 150 TB of this disk for dedupe is if your backup window is more than eleven days long. &lt;/p&gt;
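&lt;p&gt;The arithmetic behind that figure can be sketched in a few lines of Python. This is a rough illustration, not a vendor tool: it assumes decimal units (1 TB = 1,000,000 MB) and a dedupe rate sustained for a full 24 hours, and the function names are made up for the example. At exactly 150 MB/sec, 150 TB works out to about 11.6 days of nonstop deduping; any lower sustained rate stretches that further.&lt;/p&gt;
&lt;pre&gt;
```python
# Sketch: effective daily dedupe capacity from a sustained dedupe rate.
# Assumes decimal units (1 TB = 1,000,000 MB) and a full 24-hour day.
# The 150 MB/sec rate and 150 TB disk size come from the post; the
# function names are illustrative.

SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000


def effective_capacity_tb_per_day(dedupe_rate_mb_per_sec: float) -> float:
    """Return how many TB can be deduplicated in one 24-hour day."""
    return dedupe_rate_mb_per_sec * SECONDS_PER_DAY / MB_PER_TB


def days_to_dedupe(total_tb: float, dedupe_rate_mb_per_sec: float) -> float:
    """Return the days of continuous deduping needed to process total_tb."""
    return total_tb / effective_capacity_tb_per_day(dedupe_rate_mb_per_sec)


if __name__ == "__main__":
    daily = effective_capacity_tb_per_day(150)
    print(f"Effective capacity: {daily:.2f} TB/day")                   # 12.96
    print(f"Days to dedupe 150 TB: {days_to_dedupe(150, 150):.1f}")    # 11.6
```
&lt;/pre&gt;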
&lt;p&gt;Why offer 148-180 TB in this architecture? Because it is not a dedupe system, it is a traditional VTL (even if it has a NAS interface) -- a disk-based, short-term-retention I/O buffer for data on the way to tape. The extra storage baggage? Call that &lt;strong&gt;&lt;em&gt;dupe&lt;/em&gt;&lt;/strong&gt; storage. It is not meaningful for sizing dedupe or replication. Data Domain&amp;#39;s dedupe speed and resulting effective capacity are greater by more than a factor of two. &lt;/p&gt;
&lt;p&gt;Seems like everything&amp;#39;s still the same:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data Domain is still the deduplication performance leader;&lt;/li&gt;
&lt;li&gt;EMC and Quantum are still selling traditional VTLs based on dedupe storage;&lt;/li&gt;
&lt;li&gt;EMC is still selling massively over-provisioned disk systems;&lt;/li&gt;
&lt;li&gt;Quantum is really trying to sell tape libraries above all.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Oh, I guess one thing has changed. Inline deduplication used to be considered too slow compared to post-process. I guess that has changed. The non-dedupe write speed of the EMC DL3D 3000, at 1.44 TB/hour, is about the same as the inline dedupe speed of the DD690. If post-process isn&amp;#39;t even faster than inline, why bother? &lt;/p&gt;</content:encoded>


<category>Deduplication</category>

<dc:creator>Data Domain</dc:creator>
<pubDate>Thu, 11 Dec 2008 14:32:58 -0800</pubDate>

</item>

</channel>
</rss>

