Dr. C. Scott Ananian (cananian) wrote,

Flash Filesystems

Dave Woodhouse, in a recent article, calls shenanigans on hardware translation for flash devices. I agree: flash memory is a Horse Of A Different Color, and trying to gussy it up to look like a rotating disk of magnetized rust is a bad and leaky abstraction that will only end in tears. But I don't think the engineers' better judgment will prevail: the use of hardware translation is driven by Windows/DOS compatibility concerns, since Microsoft (to my knowledge) has shown no interest in writing a new filesystem for flash drives.

OLPC used a raw flash device in the XO-1, but in their follow-on had to switch to a hardware-translation device, because market/scale economics were making those devices cheaper and cheaper while the original raw flash device was (a) not increasing in volume (i.e., getting relatively more expensive), (b) not increasing in size (no one wanted to make new ones), and (c) getting discontinued (i.e., impossible to buy).

The best one can hope for is that a raw interface be offered in addition to the standard "Windows-compatible" one, for specialized embedded or high-performance applications. But the chicken-and-egg problem applies: until there are compelling gains, these interfaces won't be purchased in sufficient volumes to yield reasonable prices, and no one is writing the optimized filesystems because you can't find reasonably-priced flash devices to run them on. The end result is likely to be that Worse Is Better, and we'll be left with another set of legacy chains. Given enough time and transistors, the hardware may eventually grow Heroic Measures to work around the bad abstraction (see: the x86 instruction set).

If your filesystem is large enough and the amount of data being rewritten small enough, the flash problems "probably" won't bite you until after the device is obsolete: flash storage doesn't have to be good, it just has to be "not too bad" until, say, 3 years after purchase. Like non-removable rechargeable batteries that slowly lose capacity, your filesystem will slowly degrade, which is one more reason to eventually buy a new device, and I've never known manufacturers to be especially sorry about that. Heroic Measures may never be needed or taken.
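To put rough numbers on "large enough" and "small enough", here is a back-of-the-envelope sketch. It reads "flash problems" narrowly as erase-cycle wear and assumes idealized wear leveling, which is generous. The function name, the parameters, and every figure in the example are hypothetical and not taken from any real device; `write_amplification` is a stand-in for however many extra bytes the translation layer rewrites per byte the filesystem asks for.

```python
def years_until_wearout(capacity_bytes, erase_cycles, bytes_written_per_day,
                        write_amplification=1.0):
    """Rough lifetime estimate, assuming wear is spread perfectly evenly."""
    # Total bytes the filesystem can write before every block has used up
    # its erase budget, after discounting the translation layer's overhead.
    total_writable = capacity_bytes * erase_cycles / write_amplification
    return total_writable / bytes_written_per_day / 365


# Hypothetical device: 1 GiB of flash, 10,000 erase cycles per block,
# 500 MiB rewritten per day, translation layer doubling every write.
print(years_until_wearout(2**30, 10_000, 500 * 2**20, 2.0))
```

With numbers like these the estimate comes out in decades, far past any plausible obsolescence date. Shrink the device or multiply the write amplification and the margin shrinks in proportion.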

Leaving amateur market economics (and despair), let's revisit a cryptic and probably overlooked paragraph in my olpcfs proposal:

Implementations tuned for flash memory can use Adaptive Packed Memory Arrays for efficient CAS. This is an open research topic, but there is some evidence that a high-performance scalable flash-based filesystem can more profitably be implemented using cache-oblivious B-Trees and APMAs, rather than more "traditional" filesystem structures.

Here I'm trying to build a little bridge between filesystem designers and functional data structure researchers, two groups who probably rarely sit down together for a beer. I think Packed Memory Arrays are the "B-trees of flash filesystems": a better way to build an on-disk index given the peculiar characteristics of flash memory. Just as BeOS demonstrated that your filesystem could be "B-trees all the way down", I believe you could build a compelling filesystem using PMAs as the primary data structure. Ultimately, I suspect that the development strategy Dave Woodhouse describes — small incremental changes to existing filesystems whose philosophies are "close enough" — will probably prevail over a ground-up PMA rewrite. Incremental improvements and shared codebases are the Right Strategy for successful open source projects: you get more developers and testers for those parts which aren't flash-specific, and you've already gotten yourself out of the Cathedral with some working code to kick things off.

But if anyone's interested in attempting a clean slate design, PMAs are my advice. Maybe you'll come up with something so wonderful it will make a compelling book (and inspire folks like me), even if ultimately you don't win in the marketplace.

(But maybe you'll win! Flash storage and your optimized filesystem will prevail, and one day we'll think of rotating media in the same way we now think of core memory, floppy disks, tape drives, and the x86 instruction set... er, hmm.)
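For anyone who hasn't met a Packed Memory Array: the idea is a sorted array with deliberate gaps, so an insert only has to rewrite a small neighbourhood instead of shifting everything after it. What follows is a minimal sketch of a plain PMA, not the adaptive variant mentioned above and certainly not a filesystem. The class name, the `LEAF` size, and the density limits (100% allowed in a leaf, tapering to 50% for the whole array) are illustrative choices of mine, and the lookup is a linear scan purely to keep the code short.

```python
from bisect import insort

LEAF = 4  # slots per leaf segment (hypothetical choice for this sketch)


class PackedMemoryArray:
    """Sorted keys kept in one array, with deliberate gaps (None)."""

    def __init__(self, capacity=8):
        # Capacity is assumed to be LEAF times a power of two.
        self.slots = [None] * capacity
        self.count = 0

    def keys(self):
        return [k for k in self.slots if k is not None]

    def _predecessor_slot(self, key):
        # Index of the last stored key <= key, or -1 if there is none.
        # Linear scan for clarity; a real PMA would binary-search.
        pos = -1
        for i, k in enumerate(self.slots):
            if k is None:
                continue
            if k > key:
                break
            pos = i
        return pos

    def _spread(self, start, size, keys):
        # Rewrite one window, spacing its keys evenly across it.
        self.slots[start:start + size] = [None] * size
        for i, k in enumerate(keys):
            self.slots[start + i * size // len(keys)] = k

    def insert(self, key):
        cap = len(self.slots)
        height = (cap // LEAF).bit_length() - 1  # levels above the leaves
        pos = max(self._predecessor_slot(key), 0)
        size, level = LEAF, 0
        # Walk up from the leaf window until one has room under its limit.
        while size <= cap:
            start = (pos // size) * size
            keys = [k for k in self.slots[start:start + size] if k is not None]
            limit = 1.0 - 0.5 * level / height if height else 1.0
            if len(keys) + 1 <= limit * size:
                insort(keys, key)
                self._spread(start, size, keys)
                self.count += 1
                return
            size, level = size * 2, level + 1
        # Even the whole array is too dense: double it and retry.
        all_keys = self.keys()
        self.slots = [None] * (2 * cap)
        self._spread(0, 2 * cap, all_keys)
        self.insert(key)


demo = PackedMemoryArray()
for k in [5, 3, 8, 1, 9]:
    demo.insert(k)
print(demo.slots)  # keys in sorted order, separated by None gaps
```

The part to notice is that each insert rewrites one contiguous window and leaves the rest of the array alone. Whether that pattern maps well onto erase blocks is exactly the open question.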

Tags: olpc, olpcfs
