January 2, 2009

Shoah Foundation tames 8 PB with tape and automation


The Shoah Foundation, founded by Steven Spielberg after Schindler’s List to preserve Holocaust survivors’ narratives and now a part of the University of Southern California, has conducted interviews with thousands of survivors in 56 countries. The Foundation holds 52,000 interviews that amount to 105,000 hours of footage.

CTO Sam Gustman says the footage was originally shot on analog video cameras, then converted to Digital Betacam and MPEGs for distribution online. It currently amounts to 135 TB. However, the Foundation is converting the footage to Motion JPEG 2000, which will create bigger files, about 4 PB of data by Gustman's estimate. Each video will be copied twice, bringing the total to 8 PB.
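As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The inputs are the numbers quoted above; the decimal unit conversions (1 PB = 1,000 TB, 1 TB = 1,000 GB) and the reading of "copied twice" as two full copies of the 4 PB collection are assumptions of this sketch, not statements from Gustman.

```python
# Figures quoted in the article.
HOURS_OF_FOOTAGE = 105_000   # hours of interview footage
CURRENT_TB = 135             # size of the current digital copies, in TB
JPEG2000_PB = 4              # estimated size after Motion JPEG 2000 conversion, in PB
COPIES = 2                   # assumption: "copied twice" means two full copies

# Assumption: decimal storage units.
TB_PER_PB = 1_000
GB_PER_TB = 1_000


def gb_per_hour(total_tb, hours):
    """Average storage used per hour of footage, in GB."""
    return total_tb * GB_PER_TB / hours


jpeg2000_tb = JPEG2000_PB * TB_PER_PB
total_pb = JPEG2000_PB * COPIES
growth = jpeg2000_tb / CURRENT_TB
current_rate = gb_per_hour(CURRENT_TB, HOURS_OF_FOOTAGE)
new_rate = gb_per_hour(jpeg2000_tb, HOURS_OF_FOOTAGE)

print(f"Total with {COPIES} copies: {total_pb} PB")
print(f"Growth over current footage: about {growth:.1f}x")
print(f"Current average: about {current_rate:.2f} GB per hour")
print(f"Motion JPEG 2000 average: about {new_rate:.1f} GB per hour")
```

Under these assumptions the conversion grows the collection roughly thirtyfold, from a little over 1 GB to about 38 GB per hour of footage.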

Gustman says the Foundation received a $2 million donation of SL8500 tape libraries, Sun STK 6540 arrays and servers from Sun Microsystems in June. The Foundation runs an automated transcoding system on the servers, which uses the 140 TB of 6540 disk capacity as workspace. Sun’s SAM-FS software will automate the migration of data within the system, first to the 6540 and then to the SL8500 silo for long-term storage.

We’re hearing a lot in the industry these days about rich content applications such as this one moving to clustered disk systems, but Gustman said disk costs too much for the Foundation’s budget. He sees the potential for an eventual move to disk storage, but “disk is still too expensive, four to five times the total cost of ownership, mostly for power and cooling.”

Another advantage of the T10000 tape drives the Foundation plans to use is that they remove the need to migrate the entire collection to disk during copying, transcoding and technology refreshes. A T10000 drive can make copies or do conversions directly between drives in the robot, and the SAM-FS virtualization layer means that can happen transparently.

However, speaking for an organization charged with the historic preservation of records, Gustman agreed with others I’ve talked to on this subject that there is still no great way to preserve digital information in the long term. “The problem with digital preservation right now is that you have to put energy into it; you can’t just stick it in a box and hope it’s there 100 years from now,” he said. “Maybe there’ll be something eventually that you don’t have to put energy into, but it doesn’t exist yet.”


Source: Search Storage
