Ongoing Trends with Tape: A Response

cyoung

On The Networker Blog, Preston de Guise makes a lot of observations that are spot-on and that I agree with 100%. A few, however, raise larger questions and deserve further consideration, and, at least as it applies to dedupe, I would say that Preston is wrong.

First, long-term data retention for regulatory reasons will continue to use tape, per point #2. Absolutely; we’re in sync there. But is that data being stored as a backup or an archive? To some extent it depends on what you mean by an archive vs. a backup and what you consider a single copy of data (given that an archive is generally considered the retention of a single copy of data for a prolonged period of time). If, for example, you consider a version of a file to be “unique” data and you have a policy to store only changed versions on a monthly basis, then we are really talking about creating an archive rather than storing a backup for a long time. Why might that be important? Well, it goes some way to addressing the amount of data that requires storage, as backup nearly always creates duplication, triplication, etc. of data. The implication is cost, even if tape is very cheap at as little as $0.05/GB, provided you aren’t experiencing undue cost overheads associated with improper handling of tape or the lack of optimization of access to tape (EDT to the rescue, anyone?). Moreover, it reinforces the growing message that tape is really the archival medium of choice and will be for decades to come, while its presence in the backup world will dwindle as HDD- and SSD-based backup grows.
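To put rough numbers on the backup vs. archive distinction, here is a minimal sketch of the arithmetic. Only the $0.05/GB figure comes from the text above; the dataset size, the number of retained monthly fulls and the monthly change rate are made-up assumptions for illustration, and the model ignores compression, handling and drive costs.

```python
# Back-of-the-envelope comparison: retaining monthly full backups vs.
# keeping one base copy plus only the changed versions (an archive).
TAPE_COST_PER_GB = 0.05  # USD, the figure quoted in the post


def backup_footprint_gb(dataset_gb: float, monthly_fulls: int) -> float:
    """Every retained monthly full stores the whole dataset again."""
    return dataset_gb * monthly_fulls


def archive_footprint_gb(dataset_gb: float, change_rate: float, months: int) -> float:
    """One base copy, then only the changed fraction for each later month."""
    return dataset_gb * (1 + change_rate * (months - 1))


def tape_cost(gb: float) -> float:
    return gb * TAPE_COST_PER_GB


if __name__ == "__main__":
    dataset_gb = 10_000   # hypothetical 10 TB dataset
    months = 12           # hypothetical one year of monthly retention
    change_rate = 0.05    # hypothetical 5% of the data changes per month

    backup_gb = backup_footprint_gb(dataset_gb, months)
    archive_gb = archive_footprint_gb(dataset_gb, change_rate, months)
    print(f"backup : {backup_gb:>10,.0f} GB  ${tape_cost(backup_gb):,.2f}")
    print(f"archive: {archive_gb:>10,.0f} GB  ${tape_cost(archive_gb):,.2f}")
```

Even at tape prices, the gap between the two footprints grows with every retained copy, which is the cost implication being made here.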

Second, the statement “these non-enterprise tapes were at best unreliable formats – they actually gave a lot of fodder to the ‘tape is dodgy’ theme” is absolutely true. What baffles me is not that vendors continue to keep such antiquated drives on life support, but that the same vendors who supply enterprise-class drives would allow their lower-end offerings to hinder their success with higher-margin, more robust technologies. It beggars belief. Something tape vendors need to address is their failure to band together to tout the benefits of their technologies, as the disk vendors have done.

Lastly, deduplication… I don’t agree that it will not migrate to tape in a meaningful way, unless you constrain your thinking to the idea that deduplicated data must always move to tape in its deduplicated form. Okay, so now you’re probably thinking I’m just proving the point. But wait. What I’m saying is fundamental to the underlying rationale for deduplication in the first place; namely, the elimination of duplicate copies of data. To that end, I think the meaningful move of deduplication to tape will come as a result of software becoming more intelligent, so that it can build an archive of unique data from a deduplicated store, combined with a metadata schema that allows a single copy of data to be recovered to its source without needing to be rehydrated at restore time. How might this work? Well, LTO-5 and LTFS may help in this area: with LTFS, LTO-5 tape media has gained the ability to be self-describing, meaning that the function previously handled by a dedupe appliance can instead be handled through an intelligent metadata store held on the LTO-5 cartridge itself. Taking this approach will enable people to dedupe data on disk and then move it to tape, retaining only one hydrated copy of the data on tape; that is how dedupe and tape coexist!
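As a rough illustration of the idea, the sketch below builds an archive that holds one hydrated copy of each unique file plus a catalog recording every source path that referenced it. This is an assumption-laden toy, not an LTFS or EDT interface: the `CatalogEntry`, `build_archive` and `restore` names are hypothetical, a byte buffer stands in for the tape, and SHA-256 over whole files stands in for whatever file-level fingerprint a real product would use.

```python
# Toy model of "one hydrated copy on tape plus a metadata catalog".
# All names are hypothetical; a bytearray stands in for the tape.
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class CatalogEntry:
    digest: str                     # file-level fingerprint of the content
    offset: int                     # where the single copy starts on "tape"
    size: int                       # length of that copy in bytes
    sources: List[str] = field(default_factory=list)  # every original path


def build_archive(files: Iterable[Tuple[str, bytes]]) -> Tuple[bytes, Dict[str, CatalogEntry]]:
    """Write each unique file once, sequentially, and record all its sources."""
    tape = bytearray()
    catalog: Dict[str, CatalogEntry] = {}
    for path, data in files:
        digest = hashlib.sha256(data).hexdigest()
        entry = catalog.get(digest)
        if entry is None:
            # First time this content is seen: append one whole, hydrated copy.
            entry = CatalogEntry(digest, offset=len(tape), size=len(data))
            tape.extend(data)
            catalog[digest] = entry
        entry.sources.append(path)  # duplicates cost only a catalog record
    return bytes(tape), catalog


def restore(tape: bytes, catalog: Dict[str, CatalogEntry], source_path: str) -> bytes:
    """Recover a file to its source with one contiguous read, no reassembly."""
    for entry in catalog.values():
        if source_path in entry.sources:
            return tape[entry.offset:entry.offset + entry.size]
    raise KeyError(source_path)


if __name__ == "__main__":
    files = [
        ("/finance/report.doc", b"quarterly numbers"),
        ("/home/alice/report_copy.doc", b"quarterly numbers"),
        ("/legal/contract.pdf", b"terms and conditions"),
    ]
    tape, catalog = build_archive(files)
    print(len(catalog), "unique files,", len(tape), "bytes on tape")
    print(restore(tape, catalog, "/home/alice/report_copy.doc"))
```

Because each unique file is stored whole and contiguously, a restore is a single sequential read rather than a reassembly of scattered blocks; the catalog is the only place the duplicates still exist.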

Outside of these comments regarding Preston’s predictions, I also want to mention a prediction of my own: the disparity between the capabilities of tape hardware and those of backup / archive applications will grow. This is already happening, as companies like SpectraLogic invest in developing extremely large tape systems with significant intelligence, yet the way in which applications use and access the library hasn’t evolved much beyond the days of the first autoloaders. This will create a growing opportunity for companies like us to bring solutions to market that close this gap and help end-users tap into otherwise unrealized performance / management improvements. We have been doing this with our Enterprise DistribuTape (EDT) software for years and we keep going from strength to strength. I wouldn’t be surprised if others follow suit.

December 8th, 2011 by cyoung


Category: Opinion

Comments (2)

 

  1. Hi Chris,

    I considered LTFS when I wrote that deduped-data-on-tape still won’t fly.

    The simple fact is that while LTFS has been developed to allow a simulacrum of random access to tape, the seeks for data are still going to be wildly variant. I’ve worked with LTO in HSM environments, and the results are not pretty; if you consider that dedupe is effectively a block-level implementation of single-instance HSM, the end result is obvious.

    Having also dealt with file-level recoveries from block-level backups, the performance issues when dealing with recovery of large, highly fragmented files are … well, to be frank, hideous. If we’re talking large deduplicated data stores, again, it’s a similar sort of problem.

    So yes, it’ll “work”, but for very small values of “work”. Undoubtedly vendors will promote this as a usable solution, too, and undoubtedly some companies will buy into it. But the resource requirements will be higher, the performance lower, and the RTOs will have to be very large to support recovery from deduplicated tape – so it’s quite likely it’ll be at best a niche sort of solution.

    Cheers,
    Preston.

    • cyoung says:

      Preston,

      I agree with you that if you attempt to put block-level deduplicated data onto tape, the seek times would be atrocious. Realistically, anything more granular than file-level deduplication is likely untenable. Moreover, using tape where a quick RTO is required will always be problematic, and this is the reason (as you know and stated) that tape is a fail-safe medium and an archive medium. Keeping those two applications in mind, where I see deduplication and tape meaningfully converging is, again, in tapping into a deduplicated data store, re-hydrating unique instances of data, building a new metadata catalog that records where the data originated, and storing all of that on an LTFS-enabled volume. This doesn’t fully address your RTO comment, because the tape-based metadata catalog would still hold a lot of pointers to what, in effect, would look like highly fragmented data on tape, and the start, stop, seek, backhitch, etc. joy that can come with tape could (likely would) hamper restore performance. I think you will see an answer to that too (albeit I don’t think there is a huge requirement, because we are talking about data that, in a lot of cases, won’t be restored or will only be restored very infrequently). I suspect you will see hardware vendors evolve their LTFS-supported / enabled hardware to support more than 2 partitions, and that this will allow for some form of “on tape defragmentation”, as is already being done with Oracle T10KC drives. Again, I don’t know that the need for this is significant, because of the low probability of archive data needing to be restored quickly, but it is something that I think will come to pass and mark part of what I consider the meaningful convergence of deduplication and tape.

      Best,

      Chris
