In preparation for upcoming posts on applying flash memory technologies to directory services, this blog post will cover the basics of flash memory. This background will help people understand why flash, used through the ZFS secondary filesystem cache (a.k.a. L2ARC) and the ZFS Intent Log (a.k.a. ZIL), can improve overall directory performance. Further, the combination of flash memory and ZFS will enable radical new directory services architectures. More on those applications in a future blog post. For now, let's review the basics of flash memory.
Flash memory is a non-volatile storage medium that is read and written electrically through erasure and reprogramming. Compared with hard disk drives, flash memory tolerates kinetic shock better, consumes less power, and delivers far greater IOPS. This combination of features makes flash memory a natural intermediate tier between hard disks and DRAM in the storage hierarchy. As an aside, read Brendan Gregg's blog post on how Sun's ZFS makes the best of flash memory easily accessible to all applications through the ZFS Intent Log and the ZFS secondary cache (a.k.a. L2ARC).
Flash memory consists of an array of memory cells built from floating-gate transistors. There are currently two categories of flash memory devices: single-level cell (SLC) and multi-level cell (MLC). SLC devices store a single bit of information per cell, while MLC devices store multiple bits per cell. The inherent danger with MLC devices is that when memory cells fail, more data is lost per block than with SLC. StorageSearch.com has a great article titled Are MLC SSDs Ever Safe in Enterprise Apps that gives an in-depth look at this issue. Sun recognizes this tradeoff and has elected to stick with SLC-based flash memory for its performance and better reliability characteristics.
The two best-known limitations of flash memory are erase-before-write and write endurance. Erase-before-write means that an occupied data block (typically a 512KB erase block composed of 4KB pages) must be erased before new data can be written to it. The performance implication of this limitation is that the device appears to deliver very high write performance until every block has been written to for the first time. Once the effects of erase-before-write kick in, overall write throughput can drop significantly.
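To make the throughput cliff concrete, here is a toy Python model of erase-before-write. It is purely illustrative (the class, the block count, and the 10x erase-plus-program cost ratio are all my assumptions, not figures from any real device): the first pass over pre-erased blocks is cheap, while the second pass pays an erase on every write.

```python
# Toy model of erase-before-write. All names and cost ratios here are
# hypothetical, chosen only to illustrate the effect described above.

class ToyFlashBlock:
    def __init__(self):
        self.erased = True        # fresh from the factory: pre-erased

    def write(self):
        """Return the relative cost of writing this block."""
        if self.erased:
            self.erased = False
            return 1              # program only: the fast path
        # Occupied block: must erase first, then program.
        return 10                 # erase + program: much slower (illustrative)

blocks = [ToyFlashBlock() for _ in range(4)]

# First pass: every block is pre-erased, so writes are cheap.
first_pass = sum(b.write() for b in blocks)

# Second pass: every block is now occupied, so each write pays an erase.
second_pass = sum(b.write() for b in blocks)

print(first_pass, second_pass)    # the second pass costs far more
```

The exact ratio varies by device; the point is only that write cost jumps once the pool of pre-erased blocks is exhausted.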
Write endurance simply means that each memory cell can be erased only a limited number of times before it fails. Current flash memory devices range from roughly 100,000 to 1,000,000 write-erase cycles, with MLC devices at the low end and SLC devices at the high end.
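A quick back-of-the-envelope calculation shows why these cycle counts are less scary than they sound when wear is spread evenly. The capacity and daily write volume below are hypothetical numbers I picked for illustration; only the 100,000-cycle figure comes from the range above, and the estimate assumes perfect wear leveling (covered next).

```python
# Endurance lifetime estimate assuming perfect wear leveling.
# Capacity and write rate are illustrative, not specs for any real device.

capacity_bytes = 32 * 10**9        # hypothetical 32 GB flash module
endurance_cycles = 100_000         # low end of the cycle range above
daily_writes_bytes = 50 * 10**9    # hypothetical 50 GB written per day

# With perfect wear leveling, every cell absorbs an equal share of writes.
total_writable = capacity_bytes * endurance_cycles
lifetime_days = total_writable / daily_writes_bytes

print(round(lifetime_days / 365, 1), "years")
```

Real devices fall short of perfect wear leveling, but the exercise shows why controllers invest so heavily in the strategies described below.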
Flash memory producers typically employ one or more of the following strategies to address these limitations.
- Wear Leveling techniques distribute writes across the full array of memory cells to avoid pockets of cell failure. Wear leveling is also used to relocate data from bad blocks to working cells.
- Garbage Collection is performed by the flash memory controller during idle device cycles: it consolidates occupied data blocks and erases the freed blocks. Garbage collection helps to avoid erase-before-write during write operations by keeping the erased block pool as large as possible. You might say that garbage collection is automated defragmentation that runs while the flash memory is idle.
- The TRIM command, when invoked by the operating system, tells the flash device to free blocks that have been marked for deletion. For example, when you empty the trash bin on a Microsoft Windows desktop, the files are marked for deletion but not actually deleted. If Microsoft Windows is configured to issue the TRIM command when files are deleted, the blocks occupied by the deleted files are erased and made available for new data. If blocks were only marked as deleted and never erased, the flash memory device would fill up until it had to erase-before-write on every write operation, because no free (i.e. pre-erased) blocks would be available. Fortunately, the use of flash memory by ZFS via the ZFS Intent Log and ZFS secondary cache does not need the TRIM command, because the data is deleted outright rather than just marked for deletion.
- Reserve Capacity ensures that the flash memory device has excess capacity for bad block management and working space for garbage collection. Sun's flash memory devices reserve approximately 25% of each flash module for this purpose.
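The strategies above can be sketched together as a miniature flash translation layer. Everything in this sketch is hypothetical (the `ToyFTL` class, its block counts, and its policies are teaching devices, not a real controller design), but it shows how wear leveling, garbage collection, and TRIM cooperate to keep the erased-block pool stocked.

```python
# Illustrative sketch of a tiny flash translation layer (FTL) combining
# wear leveling, idle-time garbage collection, and TRIM. A teaching model
# only -- real controllers are vastly more sophisticated.

class ToyFTL:
    def __init__(self, n_blocks):
        self.erase_counts = [0] * n_blocks   # wear accumulated per block
        self.free = set(range(n_blocks))     # erased, ready-to-write blocks
        self.live = {}                       # logical address -> physical block
        self.stale = set()                   # blocks holding deleted data

    def _pick_block(self):
        # Wear leveling: of the free blocks, pick the least-worn one.
        return min(self.free, key=lambda b: self.erase_counts[b])

    def write(self, logical):
        if not self.free:
            self.garbage_collect()           # forced erase-before-write path
        block = self._pick_block()
        self.free.discard(block)
        if logical in self.live:             # overwrite: old copy goes stale
            self.stale.add(self.live[logical])
        self.live[logical] = block

    def trim(self, logical):
        # TRIM: the OS reports this data as deleted, so its block can be
        # reclaimed ahead of time instead of lingering as occupied.
        if logical in self.live:
            self.stale.add(self.live.pop(logical))

    def garbage_collect(self):
        # Idle-time GC: erase stale blocks to refill the free pool.
        for block in self.stale:
            self.erase_counts[block] += 1
            self.free.add(block)
        self.stale.clear()

ftl = ToyFTL(n_blocks=8)
for addr in range(8):
    ftl.write(addr)          # fill the device
ftl.trim(0)
ftl.trim(1)                  # the OS deletes two files
ftl.garbage_collect()        # an idle cycle reclaims their blocks
print(len(ftl.free))         # two blocks are erased and ready again
```

Note how TRIM and garbage collection feed wear leveling: reclaimed blocks re-enter the free pool with their erase counts intact, so the least-worn block is always chosen next.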
A third, less well-known limitation of some flash memory devices is write cache reliability. The core issue is that most flash memory devices use DRAM to buffer data before it is written to flash in order to increase throughput. The danger with this kind of buffering is that during a power loss, any data in the DRAM cache that has not yet been committed to flash can be lost. Sun's flash modules mitigate this problem with a super-capacitor that gives the on-board DRAM buffer time to commit all buffered data to flash memory before power is exhausted.
That wraps up this blog post.
Have a flashtastic day!