<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
<channel>
<title>Rich's Blog</title>
<link>http://www.dedupematters.com/richs_blog/</link>
<description />
<language>en-US</language>
<lastBuildDate>Fri, 12 Mar 2010 14:57:37 -0800</lastBuildDate>
<generator>http://www.typepad.com/</generator>

<docs>http://www.rssboard.org/rss-specification</docs>
<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/richcolbert" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="richcolbert" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
<title>Ambush!</title>
<link>http://www.dedupematters.com/richs_blog/2010/03/ambush.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/richs_blog/2010/03/ambush.html</guid>
<description>Yesterday morning I was in San Francisco to meet with a multinational life sciences organization. The purpose of the meeting, as we were told, was to deliver a roadmap presentation across several EMC storage product families. Our collective team met early and huddled up to go over our playbook. I overdo the sports metaphors intentionally. Sales and sports differ primarily in the lack of physical talent of the sales team, and the fact that sports are actually interesting to watch. In our huddle, we gathered around laptops, reviewed presentation materials, added slides to various decks, shuffled things around, traded decks...</description>
<content:encoded>Yesterday morning I was in San Francisco to meet with a multinational life sciences organization. The purpose of the meeting, as we were told, was to deliver a roadmap presentation across several EMC storage product families. Our collective team met early and huddled up to go over our playbook. I overdo the sports metaphors intentionally. Sales and sports differ primarily in the lack of physical talent of the sales team, and the fact that sports are actually interesting to watch. In our huddle, we gathered around laptops, reviewed presentation materials, added slides to various decks, shuffled things around, traded decks on USB sticks, and finally double-checked to ensure we had the correct customer name on the title slide (we did not, as it turns out, so that’s always a nice thing to add to the final checklist).&lt;br /&gt;&lt;br /&gt;At the appointed time our team boldly strode to the meeting room and was greeted by our customer. One thing that I’ve noticed since joining EMC is that in a multi-divisional presentation we often outnumber the customer, sometimes by a significant margin. For this meeting, however, we were instructed to bring no more than five people. Our customer was under no such obligation, and as we quickly discovered, we were outnumbered approximately 4-to-1, with around 20 members of the customer’s global technology organization in the room. This, however, was foreshadowing, and not the actual ambush. I’m sure you’re waiting for that shoe to drop given the title of this blog entry.&lt;br /&gt;&lt;br /&gt;No, the actual ambush happened moments later as we were seated and just prior to the introductions. The customer began with the following statement: “We thank you for coming. We have one hundred and twenty minutes together. If you have any PowerPoint presentations prepared, put them away. We have no interest in seeing them. We have one slide for you. 
That slide will present a scenario for you, and we would like to spend the next two hours in an open dialogue discussing what EMC’s vision is to help us achieve our goals.”&lt;br /&gt;&lt;br /&gt;It was brilliant.&lt;br /&gt;&lt;br /&gt;In that split second the customer transformed the meeting from a sales pitch to a solutions dialogue. I was reminded of an article I’d seen recently discussing dialogue versus debate, and I thought it was worth sharing. The article can be found &lt;a href="http://www.nald.ca/clr/study/scdvd.htm"&gt;here&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;The customer’s scenario described a looming data growth problem they were anticipating over the next five years, largely due to a transformation in the methods by which they conduct research. From an IT and engineering point of view there are few enterprises that compare to life sciences in terms of the magnitude of data generated. Geology, climate, and energy seem to be built from similar DNA (pun intended), and they all share one thing in common. Any time we take the massively large or infinitesimally small pieces of the physical world (universe, really) and extract data, we are faced with an enormous variety of challenges. Data storage footprint, cost, performance, protection, transmission, and administration are all exponentially more difficult due to the sheer volume. &lt;br /&gt;&lt;br /&gt;While these demanding environments require at a bare minimum the biggest and fastest solutions on the market today, it is clear that brute force alone is not sufficient to meet the needs of these challenges. 
While perhaps the storage industry in general has long been successful with a Tim Allen Home Improvement approach to products, it is apparent that bigger and faster alone doesn’t scale into these new, demanding environments.&lt;br /&gt;&lt;br /&gt;It would be really great if I could wrap up by claiming that we presented products to the customer that met all of their needs and that digital ink has met a bevy of digital contracts. But that’s not the case, and that’s not the point. My point is simply that I believe EMC learned as much from this dialogue as the customer did, if not more. The customer created a perfect environment for dialogue, and I thought to myself, &lt;em&gt;“isn’t that the sales team’s job?”&lt;/em&gt; Well, if so, then it doesn’t happen as often or as well as it could. I am thankful for the lesson. The customer has taught me something, and even more importantly they have armed me with important information. Now I can go sell where I need to – internally. In order for EMC to meet the demands of the most extreme workloads five years down the road we have some serious work to do. And while I truly believe EMC leads the industry in many sectors, and particularly in data protection, I am also aware that there is no rest to be had if we want to continue to satisfy the requirements of the most demanding customers in the storage industry. &lt;br /&gt;&lt;br /&gt;Why is that important? Well, the more capable our solutions become at the high end, the more headroom they provide for everyone else. While life sciences and a few other industries must lead the charge by their very nature, I think most customers would appreciate knowing that they aren’t venturing into new territory in terms of scope and scale when designing their infrastructure. All of this without subjecting anyone to two hours of PowerPoint; I hate to sound like a sales guy, but that’s a win-win.</content:encoded>



<dc:creator>Data Domain</dc:creator>
<pubDate>Fri, 12 Mar 2010 14:57:37 -0800</pubDate>

</item>
<item>
<title>Replication Without Borders</title>
<link>http://www.dedupematters.com/richs_blog/2009/10/replication-without-borders.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/richs_blog/2009/10/replication-without-borders.html</guid>
<description>A past conversation with a Data Domain Fortune 1000 customer revealed that, at the primary data center, they were maintaining no fewer than 15 types of backup devices and associated software - specifically for the purpose of being able to support the restoration of data in remote sites. They had nearly 700 of these remote sites and a variety of regional data centers. For a number of reasons, including acquisitions, the company's backup infrastructure had evolved over time to become incredibly diverse and difficult to manage. To mitigate, they had recently installed Data Domain systems at their primary and...</description>
<content:encoded>&lt;p&gt;A past conversation with a&amp;#0160;Data Domain Fortune 1000 customer revealed that, at the primary data center, they were maintaining no fewer than 15 types of backup devices and associated software&amp;#0160;- specifically for the purpose of being able to support the restoration of data in remote sites. They had nearly 700 of these remote sites and a variety of regional data centers. For a number of reasons, including acquisitions, the company&amp;#39;s backup infrastructure had evolved over time to become incredibly diverse and difficult to manage. To mitigate, they had recently installed Data Domain systems at their primary and regional data centers deployed with a multi-site replication topology. Their longer-term vision was to tackle the challenges they faced with their remote offices. &lt;/p&gt;
&lt;p&gt;The story underscores the point that the replication needs of large organizations can be virtually boundless and can require tremendous configuration flexibility. Data Domain recently expanded the capabilities of its replicator software in two significant dimensions. &lt;/p&gt;
&lt;p&gt;First, replication fan-in for many-to-one topologies now supports up to 180 remote sites all replicating into a single system at a central hub site. One might wonder who would ever need that much, but the aforementioned customer is one of many I have spoken to who have such an environment. &lt;/p&gt;
&lt;p&gt;Second, Data Domain systems now support cascaded replication. In other words, you can now replicate data from one location to a secondary location, and then from the secondary location to a third site. While this might be more than some customers require, it&amp;#39;s a feature that is requested more often than you might think. &lt;/p&gt;
&lt;p&gt;Recently I was discussing Data Domain systems and technology at an energy company. Since the bulk of the conversation was about their two data center locations, I assumed that cascaded replication would be of little interest. Turns out I was way off target. The company in question recently experienced a security-related &amp;#39;incident&amp;#39; that exposed a weakness in their dual data center model. The company realized then that they needed a stronger segregation of duties and are moving towards adding a third data center with strong isolation from both the first and the second site. In their new model, sites A and B will selectively cross-replicate with each other, and then both replicated data sets will also replicate into a secured and hardened site C. Administrators at sites A and B will not have physical or administrative access to equipment at site C. &lt;/p&gt;
&lt;p&gt;However, I believe the main adopters of cascaded replication will be organizations that want to create a replication topology that maps precisely to their distributed site model. I see cascading fitting well into organizations with a combination of small remote sites, medium-sized regional hubs, and large global data centers. In this model you will see any number of small, remote sites replicating into regional hubs. These regional hubs will then replicate the remote site data plus their own local data to the larger, global data centers for longer term retention and a minimal amount of tape creation as may be required. &lt;/p&gt;
&lt;p&gt;Let me backtrack a little and point out that Data Domain&amp;#0160;already had an industry-leading replication capability before the recent enhancements. In fact, some of the competition has been shipping a very limited and inflexible replication capability for only a few short months, while others are still promising &amp;#39;Replication 1.0&amp;#39; in the near future. &lt;/p&gt;
&lt;p&gt;With Data Domain, IT architects can now even more easily design and implement whatever replication topology their businesses require. With a flexible replication technology, customers can mold their replication strategy around the environment with respect to WAN topology, regional affinities and the desire to massively centralize or eliminate tape automation. &lt;/p&gt;</content:encoded>



<dc:creator>Data Domain</dc:creator>
<pubDate>Fri, 16 Oct 2009 14:32:22 -0700</pubDate>

</item>
<item>
<title>Packing Your Bag for VMworld</title>
<link>http://www.dedupematters.com/richs_blog/2009/08/packing-your-bag-for-vmworld--vmworld-2009-is-drawing-near-and-its-time-to-start-preparing-for-the-trip-im-particular.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/richs_blog/2009/08/packing-your-bag-for-vmworld--vmworld-2009-is-drawing-near-and-its-time-to-start-preparing-for-the-trip-im-particular.html</guid>
<description>VMworld 2009 is drawing near, and it’s time to start preparing for the trip. I’m particularly excited about this year’s show. If there’s one IT event you choose to attend all year, VMworld 2009 should be near the top of the list for consideration. VMware has a great website for planning your time at the show. If typing “vmworld” into your browser and pressing ctrl+enter is too much effort, I’ve provided this convenient link: http://www.vmworld.com/community/conferences/2009/ VMworld always has an extensive schedule of events and activities, not to mention a list of sponsors and exhibitors a mile long. This year is...</description>
<content:encoded>&lt;p&gt;VMworld 2009 is drawing near, and it’s time to start preparing for the trip. I’m particularly excited about this year’s show.&amp;#0160; If there’s one IT event you choose to attend all year, VMworld 2009 should be near the top of the list for consideration. &lt;/p&gt;&lt;p&gt;VMware has a great website for planning your time at the show. If typing “vmworld” into your browser and pressing ctrl+enter is too much effort, I’ve provided this convenient link:&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.vmworld.com/community/conferences/2009/"&gt;http://www.vmworld.com/community/conferences/2009/&lt;/a&gt;&lt;/p&gt;&lt;p&gt;VMworld always has an extensive schedule of events and activities, not to mention a list of sponsors and exhibitors a mile long. This year is no exception. So how do you not get overwhelmed by all the possibilities? No worries – just sit back and let me do the planning for you.&lt;/p&gt;&lt;p&gt;Item 1:&amp;#0160; Make a short list of your one or two biggest virtual wishes in advance. Touring the floor can be a marathon. If you have one or two problems to solve it can really help keep you focused on finding the right vendors to talk to.&lt;/p&gt;&lt;p&gt;Item 2:&amp;#0160; Select two or three speaking sessions to attend.&amp;#0160; See item 1 for suggestions on topics.&amp;#0160; It’s hard to pick speaking sessions while the show is in progress.&amp;#0160; Reading the list with a tote under one arm and a Caesar salad on a paper plate balanced on the hand opposite while standing in a moving sea of humanity takes more talent than most of us have. Also, since you’re here at dedupematters.com, I’ll go ahead and assume you’ve got some level of interest in the Data Domain story. Therefore, I’d like to be so bold as to suggest a couple of sessions. 
“Planning for Optimized and Cost-effective Storage Utilizing Deduplication and Virtualization” (Session BC3223) featuring Data Domain customer Jules Thomas, and “The Real Deal on Dedupe and Virtualization” featuring our own Enterprise Application Technologist, Daniel Budiansky.&lt;/p&gt;&lt;p&gt;Item 3:&amp;#0160; Make extensive use of the hands-on labs. There’s no substitute for the experience of doing it yourself. There are labs covering a vast array of subjects, so there’s sure to be more than one that gives you the opportunity to experience something new. &lt;/p&gt;&lt;p&gt;Item 4:&amp;#0160; Spend some time with a big vendor.&amp;#0160; I know it sounds obvious, but it’s easy to overlook the big companies because you’re already familiar with what they have to offer.&amp;#0160; Cisco is certainly on my short list.&lt;/p&gt;&lt;p&gt;Item 5:&amp;#0160; Stop by the Data Domain booth. We’re going to be wearing EMC shirts for the first time. I think a bunch of us are planning on sneaking over to the regular EMC booth and trying to blend in.&amp;#0160; Who knows, if we play our cards right we might even get invited back to &lt;a href="http://www.emcworld.com/" target="_blank"&gt;EMC World&lt;/a&gt; in Boston next May. Yeah, we love that show!&lt;/p&gt;&lt;p&gt;Most importantly, have a little fun while you’re in town. Maybe it’s time to put on your black leather baseball cap, rent a red Trans-Am, and ride in style to the &lt;a href="http://www.foreigneronline.com/"&gt;Foreigner&lt;/a&gt; show on Wednesday night. I wonder if they still have that big, inflatable jukebox?&lt;/p&gt;&lt;p&gt;However you pack your bags, and whatever your plans, I look forward to seeing you at the show.&lt;/p&gt;&lt;p&gt;Oh, and if you’re curious about how Data Domain can help protect your virtual infrastructure, check out our solutions page &lt;a href="http://www.datadomain.com/solutions/vmware.html"&gt;here&lt;/a&gt; . Or better yet, just stop by our booth. 
Once we’re done with our shenanigans, we’ll be happy to talk shop.&lt;/p&gt;</content:encoded>



<dc:creator>Data Domain</dc:creator>
<pubDate>Wed, 26 Aug 2009 07:56:00 -0700</pubDate>

</item>
<item>
<title>How Much Speed Does a Large Enterprise Need?</title>
<link>http://www.dedupematters.com/richs_blog/2009/07/how-much-speed-does-a-large-enterprise-need.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/richs_blog/2009/07/how-much-speed-does-a-large-enterprise-need.html</guid>
<description>The storage industry is perpetually awash with a variety of performance measurements. These statistics, commonly referred to as “speeds and feeds,” are essential to help the enterprise technologist determine which solutions are suitable for a given storage workload. They allow for a semblance of apples-to-apples comparison across competing products, even if one must tweak the numbers to account for varying levels of vendor optimism. Protection storage (i.e. backup disk, tape, or other media) performance is typically measured in data transfer rates. These rates are commonly expressed in MB/sec or TB/hr. Both aggregate speed and single stream performance are important to...</description>
<content:encoded>&lt;p&gt;The storage industry is perpetually awash with a variety of performance measurements.&amp;#0160; These statistics, commonly referred to as “speeds and feeds,” are essential to help the enterprise technologist determine which solutions are suitable for a given storage workload.&amp;#0160; They allow for a semblance of apples-to-apples comparison across competing products, even if one must tweak the numbers to account for varying levels of vendor optimism.&lt;/p&gt;
&lt;p&gt;Protection storage (i.e. backup disk, tape, or other media) performance is typically measured in data transfer rates.&amp;#0160; These rates are commonly expressed in MB/sec or TB/hr.&amp;#0160; Both aggregate speed and single stream performance are important to the backup workload.&amp;#0160; Aggregate speed is indicative of the big picture; given x amount of data to back up, what is the minimum possible time that the storage system will require to complete all backups?&amp;#0160; Single stream performance is an important consideration as well, especially for high-speed backup clients such as large, powerful database servers.&amp;#0160; Ideally, both aggregate speed and individual stream speed will be very fast.&amp;#0160; But how fast is fast enough, especially for the enterprise?&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;The answer, as it turns out, is not all that simple.&amp;#0160; Storage vendors who compete on speeds and feeds have a tendency to focus the customer’s attention strictly on the characteristics of their own device, ignoring the fact that protection storage is a downstream element of what is often a very large and complex technology ecosystem.&amp;#0160; Here are a few somewhat imperfect analogies that may help simplify and put things in perspective.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Pitching a Baseball (single-stream):&amp;#0160; When a pitcher throws a baseball it reaches its maximum velocity upon release.&amp;#0160; The baseball never accelerates on the way to home plate.&amp;#0160; The catcher’s mitt doesn’t exert pull on the baseball.&amp;#0160; Therefore, if you equate the storage device with the catcher’s mitt, you clearly need a mitt that can handle a 100mph fastball, but is there a material difference between mitts that can handle 150mph versus 200mph?&amp;#0160; Conversely, if the catcher’s mitt can only handle a 70mph pitch then your catcher will be calling for knuckleballs and changeups all night long.&amp;#0160; In a similar fashion, backup data never speeds up after it leaves the client, yet the backup storage device must be able to handle the fastest streams possible.&amp;#0160; Backup storage devices do not exert pull.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Terminal Velocity or Backup to Null (aggregate):&amp;#0160; Any collection of data residing in a data center has what I think of as a “terminal velocity.”&amp;#0160; That’s the hypothetical transfer rate that would exist if all of the backup clients sent their data directly to a null device.&amp;#0160; In other words, how fast can the existing backup clients (or other protocols and techniques) lift the data off of their provisioned storage and send it into an infinitely fast receptacle?&amp;#0160; The goal here is to hypothetically remove the network or Fibre Channel (i.e. transports), any other intermediary devices (backup servers), and of course the backup storage device itself.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Given these two ideas, it should be fairly clear that a backup storage device cannot be deployed as an accelerator.&amp;#0160; It need only be thought of as a potential bottleneck.&amp;#0160; The goal then is to avoid having the backup storage device slow down the backups, which coincidentally is also the goal of the transports (network or SAN) and the intermediaries (backup servers).&amp;#0160; The further upstream you push the backup bottleneck, the closer you come to achieving terminal velocity in your backup environment, which is exactly what you want to do.&amp;#0160; The more successful enterprise shops are the ones who intuitively understand these principles and architect complete solutions instead of comparing storage device speeds and feeds in a vacuum.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;So I still haven’t answered the question ‘how fast is fast enough for the enterprise?’&amp;#0160; The answer lies in a compound question:&amp;#0160; How fast must a storage solution be to complete backups within the time allotted *and* if you have such a solution, is the rest of the infrastructure up to the task?&amp;#0160; Simply put, your backup storage should be fast enough to not be the bottleneck *unless* you are able to complete your backups comfortably within your backup window anyway, in which case you may not care.&amp;#0160; At the highest level you can say that faster is always better, but only up to a certain point.&lt;br /&gt;Now here’s how the performance question starts to get interesting with Data Domain and deduplication.&amp;#0160; For years many vendors have been using speeds and feeds arguments, warranted or not, to position their solutions against Data Domain.&amp;#0160; However, Data Domain Operating System 4.6 broke through an important inflection point in the evolution of deduplication.&amp;#0160; As a result, the DD690 attained an amazing 750 MB/sec of aggregate throughput utilizing a very minimal number of disks.&amp;#0160; This meant that for the first time it was almost as fast to back up straight to Data Domain’s deduplicated storage as it was to an “enterprise” post-process deduplication controller, or even to one of the many non-deduplicating VTL systems.&amp;#0160; &lt;br /&gt;&lt;/p&gt;
&lt;p&gt;With the introduction of Data Domain’s new flagship product, the DD880, the ante has doubled.&amp;#0160; The DD880 sports an outstanding 1,500 MB/sec, or 5.4 TB/hr, of aggregate throughput, placing it clearly on top of the leaderboard across all backup storage targets, dedupe and non-dedupe alike.&amp;#0160; By comparison, a single Data Domain DD880 controller ingests data faster than a 2-node NetApp 1400 (post-process dedupe), 2-node HP 9000 (post-process dedupe), or 2-node IBM 7650G (inline dedupe) active-active cluster.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Read this if you read nothing else:&amp;#0160; Data Domain’s single-controller inline ingest speed is now faster than a 2-node active-active clustered post-process ingest speed.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;With the introduction of the DD880, speeds and feeds are no longer an inline versus post-process argument.&amp;#0160; In fact, the sole argument that supported post-process deduplication has just been obliterated.&amp;#0160; Speeds and feeds are no longer a dedupe versus non-dedupe argument either.&amp;#0160; The DD880 is the new standard in backup storage performance spanning all categories.&amp;#0160; Now we can get back to the task at hand and select a solution based on what is really important – the business requirements and the other technological realities of the enterprise.&amp;#0160; We can put the speeds and feeds arguments to rest.&amp;#0160; The DD880 wins in an overwhelming landslide.&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;</content:encoded>



<dc:creator>Data Domain</dc:creator>
<pubDate>Fri, 24 Jul 2009 10:14:00 -0700</pubDate>

</item>
<item>
<title>"Second Generation" Deduplication</title>
<link>http://www.dedupematters.com/richs_blog/2009/05/second-generation-deduplication.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/richs_blog/2009/05/second-generation-deduplication.html</guid>
<description>There's a new spin in the dedupe marketplace these days. Several of the smaller and more recent entrants to the market are waving the 'second generation' flag. This is the typical nonsense that accompanies products that are late to market. In my experience, 'late' and 'better prepared' are usually antithetical. Ask any high school teacher lucky enough to have had me as a student in a first period class. It's like implying that sitting on the sidelines for the entire season makes the backup quarterback better than the guy who has taken every snap and led the team to the...</description>
<content:encoded>&lt;p&gt;There&amp;#39;s a new spin in the dedupe marketplace these days. Several of the smaller and more recent entrants to the market are waving the &amp;#39;second generation&amp;#39; flag. This is the typical nonsense that accompanies products that are late to market. In my experience, &amp;#39;late&amp;#39; and &amp;#39;better prepared&amp;#39; are usually antithetical. Ask any high school teacher lucky enough to have had me as a student in a first period class. It&amp;#39;s like implying that sitting on the sidelines for the entire season makes the backup quarterback better than the guy who has taken every snap and led the team to the championship. Not a very logical argument. &lt;/p&gt;
&lt;p&gt;In very broad terms I think the occasional success of this &amp;#39;lack of real experience is a leadership quality&amp;#39; message is somewhat enabled by a broad turn in the perception of dedupe. Looking back as recently as last year, people who were investigating deduplication were asking the difficult and important questions. There wasn&amp;#39;t an inherent trust that the technology actually worked and was suitable for production. The problem today is that the tough questions are fewer and farther between. &lt;/p&gt;
&lt;p&gt;People can quickly be made to forget that dedupe is extremely hard to do. As evidence to support this hypothesis I&amp;#39;d point to EMC&amp;#39;s dedupe portfolio. First, let me be clear. This isn&amp;#39;t an attack on the Avamar technology they purchased or the Quantum technology they repackage. Those are different topics for a different day. The point I&amp;#39;d make here is that for a company as large as EMC, neither of the two deduplication technologies they bring to market&amp;#0160;was developed internally. You would think that EMC at least tried to build a dedupe product internally, and perhaps they did. But the trend seems to continue with IBM also looking to the outside world for help to release a dedupe product. &lt;/p&gt;
&lt;p&gt;Developing a deduplication product is not trivial. If it were just about time and money, EMC and IBM certainly have the resources to go it alone. Even when you do partner, there is no guarantee that the product will be ready for prime time. &lt;/p&gt;
&lt;p&gt;So why is it that any vendors are getting a free pass today on what should be a rigorous examination of technology? To be very clear, I&amp;#39;ve used EMC and IBM as examples to demonstrate that building dedupe is hard - i.e. you can&amp;#39;t just throw dollars at the problem and cross your fingers. However, these are not necessarily the same vendors selling the &amp;#39;second generation&amp;#39; story. Those &amp;#39;vendors who shall not be spoken of&amp;#39; won&amp;#39;t get a free bump in their Google relevance ranking here. Wait, if I&amp;#39;m speaking about the ones who shall not be spoken of...&lt;/p&gt;
&lt;p&gt;Let me put it directly. Data Domain has spent years in the real world proving our technology does what we claim, across thousands of customers in a myriad of complex environments. We&amp;#39;ve earned our solid reputation and continue to do so day in and day out. We have advanced the state of the technology by leaps and bounds over a long period of time with a great amount of engineering effort, upon a foundation of production deployment experience that is second to none in the industry. And yet I&amp;#39;d gladly return our free pass and welcome renewed skepticism from potential customers. Once the misperception that dedupe is easy is allowed to take hold, it takes a truly open mind to return to a healthy level of doubt. And that&amp;#39;s exactly what needs to happen in order to make an informed decision. Otherwise, we have a bizarre situation where new, immature and untested products in the dedupe marketplace are attempting to rest upon Data Domain&amp;#39;s laurels. That makes as much sense as version 1.0 being billed as a &amp;#39;second generation&amp;#39; product. &lt;/p&gt;</content:encoded>


<category>Dedupe technology</category>

<dc:creator>Data Domain</dc:creator>
<pubDate>Thu, 21 May 2009 13:31:45 -0700</pubDate>

</item>
<item>
<title>Collection Replication</title>
<link>http://www.dedupematters.com/richs_blog/2009/05/collection-replication.html</link>
<guid isPermaLink="true">http://www.dedupematters.com/richs_blog/2009/05/collection-replication.html</guid>
<description>There is a common misconception that the principle known as Occam's Razor can be summed up as "all things being equal, the simplest answer is the right one." While that notion works wonders on a television show where important mysteries need to get resolved in 48 minutes or less, it's not really the point of this very important concept. Occam's Razor in a modernized nutshell means "don't make things more complicated than they have to be." It's a powerful idea, and one that proves itself over and over again in the real world. One example that comes to mind is...</description>
<content:encoded>&lt;p&gt;There is a&amp;#0160;common misconception that the principle known as &lt;a href="http://en.wikipedia.org/wiki/Occam&amp;#39;s_razor"&gt;Occam&amp;#39;s Razor&lt;/a&gt; can be summed up as &amp;quot;all things being equal, the simplest answer is the right one.&amp;quot; While that notion works wonders on a television show where important mysteries need to get resolved in 48 minutes or less, it&amp;#39;s not really&amp;#0160;the point of this very important concept. &lt;/p&gt;
&lt;p&gt;Occam&amp;#39;s Razor in a modernized nutshell means &amp;quot;don&amp;#39;t make things more complicated than they have to be.&amp;quot; It&amp;#39;s a powerful idea, and one that proves itself over and over again in the real world. &lt;/p&gt;
&lt;p&gt;One example that comes to mind is Data Domain&amp;#39;s &lt;em&gt;Collection Replication&lt;/em&gt;. Collection replication is one of several forms of replication available when using Data Domain systems. The beauty of collection replication is the power of simplicity. Collection replication is an option when a customer wants to perform simple system-to-system mirroring, leveraging the efficiency of data deduplication to replicate vast quantities of data offsite for disaster recovery protection. &lt;/p&gt;
&lt;p&gt;Data Domain takes advantage of that inherent simplicity by streamlining the replication process to the greatest extent possible. Because collection replication is aware that &lt;em&gt;all &lt;/em&gt;new, unique (i.e. non-duplicate) data segments written to the local filesystem must be transferred, it is able to send that new data across the wire immediately. However, any data that is not new and unique does not need to be sent. In the instance of moves or deletes, collection replication sends across a minimal subset of deduplication-aware &amp;#39;housekeeping instructions&amp;#39; to the remote side. Once data in the form of a file or virtual tape arrives at the remote system, it is immediately visible and immediately available. This is true even while replication is still in progress and there are still additional files or tapes inbound. &lt;/p&gt;
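&lt;p&gt;The behavior described above can be illustrated with a toy model. This is a minimal sketch, not Data Domain code: the class names, the fixed 4-byte segments, and the SHA-256 fingerprints are all assumptions chosen to keep the example short. It shows only the principle that new, unique segments cross the wire once, while moves and deletes travel as small namespace instructions.&lt;/p&gt;

```python
import hashlib


class Replica:
    """Hypothetical destination: a segment store plus a namespace of file recipes."""

    def __init__(self):
        self.segments = {}   # fingerprint -> segment bytes
        self.namespace = {}  # file name -> ordered list of fingerprints

    def apply(self, op):
        kind = op[0]
        if kind == "segment":
            _, fingerprint, data = op
            self.segments[fingerprint] = data
        elif kind == "file":
            _, name, recipe = op
            self.namespace[name] = recipe  # file is readable from this point on
        elif kind == "move":
            _, old, new = op
            self.namespace[new] = self.namespace.pop(old)
        elif kind == "delete":
            _, name = op
            del self.namespace[name]  # segment cleanup is out of scope here

    def read(self, name):
        return b"".join(self.segments[fp] for fp in self.namespace[name])


class Source:
    """Hypothetical source that deduplicates inline and mirrors to one replica."""

    def __init__(self, replica, segment_size=4):
        self.replica = replica
        self.segment_size = segment_size  # assumption: fixed-size segments
        self.known = set()                # fingerprints already stored and sent
        self.bytes_sent = 0               # payload bytes that crossed the wire

    def write(self, name, data):
        recipe = []
        for start in range(0, len(data), self.segment_size):
            segment = data[start:start + self.segment_size]
            fingerprint = hashlib.sha256(segment).hexdigest()
            recipe.append(fingerprint)
            if fingerprint not in self.known:
                # Only new, unique segments are transferred.
                self.known.add(fingerprint)
                self.bytes_sent += len(segment)
                self.replica.apply(("segment", fingerprint, segment))
        self.replica.apply(("file", name, recipe))

    def move(self, old, new):
        # Housekeeping instruction only; no segment data is resent.
        self.replica.apply(("move", old, new))

    def delete(self, name):
        self.replica.apply(("delete", name))
```

&lt;p&gt;In this sketch, writing a second file that shares most of its segments with the first increases bytes_sent only by the size of the segments that differ, and a move or delete leaves bytes_sent unchanged.&lt;/p&gt;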
&lt;p&gt;If a typical deduplication effect preemptively eliminates 98% of the data to be transferred, then collection replication ensures that the remaining 2% is moved as quickly as possible. And when compared to other Data Domain replication techniques, you can think of it this way: if multi-site replication functions like a network of highways, then collection replication is like a drag strip. If all you want to do is move deduplicated data from point A to point B, there isn&amp;#39;t a faster way to do it. Period. &lt;/p&gt;
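&lt;p&gt;To put the 98% figure in concrete terms, here is a rough back-of-the-envelope calculation. The function names, the 10 TB backup size, and the 100 Mbit/s link are hypothetical values chosen for illustration, and the estimate ignores protocol overhead and local compression.&lt;/p&gt;

```python
def replicated_gb(logical_gb, eliminated_pct):
    """Gigabytes left to send after deduplication removes eliminated_pct percent."""
    return logical_gb * (100 - eliminated_pct) / 100


def transfer_hours(gb, link_mbps):
    """Hours to move gb gigabytes over a link of link_mbps megabits per second.

    Uses decimal units (1 GB = 8000 megabits) and assumes a fully used link.
    """
    seconds = gb * 8 * 1000 / link_mbps
    return seconds / 3600


# Hypothetical example: a 10 TB backup, 98% eliminated, 100 Mbit/s WAN link.
to_send = replicated_gb(10000, 98)
print(to_send, "GB to replicate")
print(round(transfer_hours(to_send, 100), 1), "hours with deduplication")
print(round(transfer_hours(10000, 100), 1), "hours without deduplication")
```

&lt;p&gt;Under those assumptions, 200 GB crosses the wire in roughly 4.4 hours, compared with roughly 222 hours for the full 10 TB.&lt;/p&gt;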
&lt;p&gt;There are other powerful use-cases as well. Collection replication is a fantastic option for nearline or archival storage when you want to replicate millions of small files without incurring overhead for the metadata associated with each individual file. There isn&amp;#39;t another deduplication-aware replication technology available that has this capability. &lt;/p&gt;
&lt;p&gt;Of course, sometimes a simple environment grows more complicated over the course of time. In that circumstance, Data Domain replication that started out as collection based can be converted to any of our other replication topologies quickly and easily, and in most cases without having to resend data that&amp;#39;s already been stored at the replication destination. &lt;/p&gt;
&lt;p&gt;Collection replication is an enterprise-class capability, and has a major advantage in the marketplace when it comes to serving the high-volume replication needs of the largest data centers. By comparison, neither IBM/Diligent nor Sepaton has deduplication-aware replication, although both have been promising it for some time. Other solutions such as the joint EMC/Quantum product, the DL 3000 (formerly known as the DL3D 3000, a.k.a. the DXi7500), must first wait for their slower, post-process deduplication to complete, then for data replication to occur, and then for several other byzantine processes, including a &amp;#39;namespace sync&amp;#39; and possibly a shell script, to execute before the data is available. Even then, the data on the remote side is painfully slow to read. &lt;/p&gt;
&lt;p&gt;So why is collection replication unique to Data Domain? The answer has to do with the architecture. First, a system has to perform true inline deduplication to achieve the effective replication throughput of Data Domain&amp;#39;s collection replication. All net-new data stored on disk must be immediately known with certainty to be unique. That factor alone disqualifies most of the competition. Second and equally important, a system must have a log-structured file system in order to achieve the efficiency of Data Domain&amp;#39;s collection replication. The log structure of the Data Domain file system effectively queues up data for replication and ensures write-order integrity by design, without the need for complex layers of additional replication code. Those two factors combined ensure that collection replication will remain unique to Data Domain for the foreseeable future. &lt;/p&gt;</content:encoded>


<category>Replication</category>

<dc:creator>Data Domain</dc:creator>
<pubDate>Thu, 07 May 2009 10:37:51 -0700</pubDate>

</item>

</channel>
</rss>

