Ruby sensei Chad Fowler in Santa Clara earlier this summer. I love hanging out with Chad when I get the chance. Always a treat.
The Golden Gate Bridge from the Marin Headlands. It’s a classic shot that I’m happy to take time and time again. Photographed with an iPhone 4.
It’s a beautiful day to run off a time lapse or three or five at Crater Lake today. The best part about doing a time lapse is that once you’ve set it up, you can chill out for a while and drink the scenery in.
Funny side note: everybody that stops at this outlook is walking up to get a shot from this same angle. A camera on a tripod is a magnet.
The rapid pace at which I acquire data—somewhere between 1 and 2TB per year—means that I’ve cycled through many a storage setup over the last few years. At one point, my entire photo library fit on a single drive. I’d expect this is true for most photographers in this day where 2TB drives can be had for ridiculously low prices.
When all your data fits on a single drive, the strategy is easy. A primary drive for your active library. A secondary drive for the backup that stays at home. And a third drive to make a backup that lives somewhere else, such as your safe deposit box. Easy. Clean. Simple. If you fall into this category, count your lucky stars. The only reason to read onward is for entertainment value.
When you exceed the amount of data you can comfortably put on a single disk, it all gets way too complicated. Quickly. And it’s not like you can jam 2TB of data on a 2TB disk. No no no no nooooo. You really don’t want to run a filesystem more than 75-80% full. So, really, you have to bite the bullet at 1.5TB if you’re using 2TB disks.
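For the arithmetic-inclined, that rule of thumb is easy to sketch in a couple of lines of shell (the 75% ceiling here matches the 1.5TB figure above; 2TB is treated as 2000GB for simplicity):

```shell
# Rule of thumb: don't run a filesystem more than ~75-80% full.
# Usable space on a 2TB disk at a 75% ceiling, in GB:
DISK_GB=2000
CEILING_PCT=75
USABLE_GB=$((DISK_GB * CEILING_PCT / 100))
echo "Plan on no more than ${USABLE_GB}GB per 2TB disk"
```

Run the same math at 80% and you get 1600GB—either way, a healthy chunk of every disk you buy is spoken for before you store a single frame.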
A few years ago, when I exceeded what I could comfortably put on a 750GB hard drive (the biggest that you could reasonably get your mitts on at the time), I decided to stay away from anything too terribly complex and to continue on the same basic idea. Times two. Everything before 2006 on one drive, everything after on another. That meant two onsite and two offsite backup drives. A bit bulky, but not too bad. It seemed like it would be fairly simple to linearly scale this approach.
Except that it’s not quite linear if you want to maintain a nice YYYY/MM/DD folder structure. To do that and stay within a 75-80% cushion means that you give up even more space. Luckily, hard drive capacities quickly inflated and I simply upgraded my 2 primary + 2 onsite backup + 2 offsite backup strategy from 750GB disks to 1.5TB disks and finally 2TB.
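For what it’s worth, keeping that YYYY/MM/DD layout is nothing exotic. Here’s a sketch of filing a freshly imported photo by today’s date—the `~/Photos` root and the filename are made-up placeholders, and `touch` just stands in for a real import:

```shell
# File a new photo into a YYYY/MM/DD tree under ~/Photos.
# IMG_0001.CR2 and the Photos root are placeholder names.
photo="IMG_0001.CR2"
touch "$photo"                          # stand-in for a real import
dest="$HOME/Photos/$(date +%Y/%m/%d)"   # e.g. ~/Photos/2010/08/15
mkdir -p "$dest"
mv "$photo" "$dest/"
```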
My data acquisition rate, however, ended up out-running the ability of Seagate, Hitachi, or Western Digital to keep up. I ended up going to 3+3+3 drives late last year. By the middle of this year—in part thanks to my dabbling in video—I could see 4+4+4 on the horizon. And really, 5+5+5 was going to happen in 2011, thanks to the lost space that the 25-35% buffer demands. It was clear that it was time to do what I hadn’t wanted to do all these years: Go RAID.
I can clearly hear some of you dear readers shouting at me right about now. “Delete more photos! Surely you don’t need to keep all of them!?!?!?” Damn. I wish that helped. I do, in point of fact, delete obvious dingers. I even delete some of the ones that aren’t obvious. If I didn’t do that, you don’t even want to know how much data I’d have on hand. I’d guess it’d be past 8TB by now. Seriously.
So, RAID. Let’s get one thing out of the way right now. RAID isn’t magic. It’s not backup. And each kind of RAID has its own consequences. Even today’s spiffy Beyond RAID and RAID-X devices that purport to make things better don’t solve it all. RAID, as we know it, is a power tool. It solves some problems fairly well, but gives you the ability to screw things up just as fast and with tons more data than ever before if you mess up. I’ve worked with big-ass RAID setups in data centers and that’s precisely why I didn’t want to go there at home all these years.
A quick aside: In a perfect world where Sun hadn’t gotten weird, maybe ZFS would have saved us by now at the workstation level like it has many a sysadmin in the server farms. But something happened with legal types and Apple bugged out of ZFS and now we’re stuck waiting to see if something better comes out of the Infinite Loop. Then, Sun got bought by Oracle which is even weirder, as Google is finding out. It’s probably a good thing Apple got cold feet.
Anyway, it’s 2010. 2TB hard drives are the norm. And if you want to stack a bunch of ‘em together into a single large volume, you’re looking at RAID in one form or another. If you condense all the gobbledygook mumbo-jumbo and cut through a lot of the crap, I personally think it boils down to the following options, at least at the SOHO level:
Given that, I pondered for a while. Weeks. Months, it felt like. Not full time, mind you. But, every time I was on an airplane, I doodled on napkins trying to decide what I was going to do to balance out the pros and cons of each approach available. This week, up against a wall with data storage and needing to do something about it, I finally decided that I would pursue a strategy that incorporated the best of two different RAID approaches, giving speed where I wanted it most and safety where I needed it.
For my local working dataset—where I feel the need for speed—I built an 8TB RAID 0 out of 4 Hitachi 7K2000 2TB drives in a FirmTek enclosure. Lloyd Chambers has benchmarked this exact array running at a peak of 500MB/s when installed inside a Mac Pro. At 50% full, it runs north of 400MB/s. My array won’t run this fast in my enclosure, but that’s because I bought a port-multiplied enclosure a while back that was more optimized for my previous strategy. The theoretical max for my current enclosure should be a bit under 300MB/s. Not shabby. And if I need the speed the drives can give, I can change enclosures and SATA cards.
Next, for on-site backup purposes, I picked up a six-bay ReadyNAS Pro Pioneer and stuffed it with a bunch of 2TB Seagate drives that I’ve been accumulating. I’ve set it up for dual disk redundancy. This means that it can weather having two disks die. With a full set of 6 disks, it’s the perfect size to back up my 8TB working array over the network. Even better, the 2TB drives I put into it are from a wide range of lots, as I’ve been picking them up in ones and twos from different sources over the past six months or so. This helps increase the odds that there won’t be a double or triple disk failure.
Once implemented, instead of a bunch of volumes all over my Finder, this will give me one big volume on my desktop. Then, there’s a big volume out on the network to be the safe copy. Better yet, thanks to the fact that the ReadyNAS has an rsync server, I don’t even have to mount the drive via the Finder. I can just launch a script to sync things up in the background and keep my desktop nice and tidy. Finally, the ReadyNAS also has a feature that scans all the data on the drives every week and checks for bit rot.
Inevitably, at some point, I’ll need to expand again. I figure I’ll run into the wall again late next year at current rates. I hope that 3TB drives will be common at that point and that I can move from 8TB to 12TB in a nice smooth move. If, on the other hand, data piles up faster than that, it’s almost certainly going to be because of video. In that case, the solution will be straightforward. I’ll repeat the double-down strategy, but split photos and video apart into their own local and remote arrays.
For the time being, however, I’ve got working room again. And a sane plan that gets me a bit further down the road. At least a year or 18 months. Maybe a bit more.
“Wait a minute,” you say. “What about those offsite backups?” Well, dear reader, those are going to remain, for now, on portable external drives that I cart back and forth to the bank. It’s not a great solution, but it works acceptably well. I still hope to replace this with a better solution someday. The ReadyNAS boxes have the ability to sync with another ReadyNAS in a remote location. That could be an interesting thing to investigate if I end up liking the box I have here now.
No, not Malcolm Gladwell’s book. This is an episode of Radiolab on WNYC talking about why we blink and how it might be tied to storytelling. From last year, but so what.
I’ve noticed something kinda humorous over the last year or so. When I’m editing a massive group of photos—sorting out selects from keepers from junk—and I’m listening to music with a good beat, I move between photos in sync with the beat.
Heavy duty strong beat dance music or rock and roll, I move between frames fast. Sometimes on the beat, sometimes every other beat. But my fingers dance on the keyboard right in time. When I get too involved and really start moving too fast, I have to back up a few frames and re-roll and pick a slower subset of beats.
Beat/move, beat/move, beat/move, hold on a few beats and check out this image, move forward on the beat. Move back on the beat. Keep going. Beat/move/beat/move.
I’m sure it looks its craziest when I’m bouncing between three or four images trying to decide rankings. In my brain, it makes total sense. I’m sure that to an outsider, however, it can look like a hyper-spastic seizure-inducing what-the-frak-is-he-doing mess. Fine by me. Whatever works to slice and dice a set down.
This observation brought to you by the ranking of 3 and the right arrow key. Now back to editing another set. The Foo Fighters are up in iTunes. Oh yah.
In Fast Company. From the article:
According to Bruno Giussani, a former journalist who now directs TEDGlobal, “Chris [Anderson] believes in the fact that when you put smart people together into a room, they tend to engage, and he wants to use that to make a contribution toward a better world.”
It’s amazing. At every TED event I’ve attended, I’ve watched so many smart people connect and engage. It’s one of the few things that gives me real hope for the future.
Wade Allison in the New Scientist. It’s only a single point of view, but given how much pollution—from mercury to CO2—is created by our current mainstream power generating technologies, it’s something not to be dismissed out of hand.