Cost pressures in memory-intensive applications such as high-end consumer products, networking, and industrial systems are driving engineers to find new ways to reduce system cost while still improving performance. Error correction code (ECC) technology is essential for maintaining reliability and extending memory longevity in NAND flash. To get better efficiency from NAND flash-based systems in these markets, developers continue to use architectures in which ECC is implemented in the host MCU rather than integrated into the memory. This article explores the differences between integrated and host-based ECC and details the impact of each approach on system performance, reliability, and ultimately cost.
Error Correction Code Technology
When selecting flash memory for a system, developers have the choice between NAND and NOR technologies. A NAND cell is smaller than a NOR cell, so NAND has a lower cost per bit than NOR memory. This in turn has led to NAND flash being available in higher densities compared to NOR flash. Furthermore, the physics behind the NOR cell results in a longer program-erase (P/E) time compared to NAND. Due to these advantages, NAND is being adopted at an ever-increasing rate.
The disadvantages of NAND flash have traditionally been its lower endurance and slower read performance. NAND cells wear down or lose their ability to hold a programmed value over time, causing memory bits to switch states. When a block begins experiencing wear, its data can be transferred to another block. To prevent data loss as cells degrade, ECC technology is used.
ECC uses redundancy to verify that stored data matches what was written to memory. In addition, when an error is detected, ECC can correct a limited number of errors per block to guarantee higher data integrity. When a certain error threshold is exceeded, the data is moved to a new block. The abandoned block is marked as “bad” and never used again. Thus, NAND flash combined with ECC can provide the level of integrity required by high-reliability applications.
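The redundancy principle described above can be illustrated with a toy code. The sketch below uses a Hamming(7,4) code, which stores 3 parity bits alongside 4 data bits and can locate and correct a single flipped bit. This is only an illustration: it is not the multi-bit ECC used in NAND flash, and the function names are hypothetical.

```python
from typing import Tuple


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (bit i holds position i+1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    # Codeword positions: 1=p1, 2=p2, 3=d0, 4=p4, 5=d1, 6=d2, 7=d3
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(codeword: int) -> Tuple[int, int]:
    """Return (data nibble, error position); position 0 means no error seen."""
    bits = [(codeword >> i) & 1 for i in range(7)]
    # XOR of the positions of all set bits is 0 for a valid codeword and
    # equals the position of the flipped bit after a single-bit error.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1  # correct the flipped bit
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    stored = hamming74_encode(0b1011)
    worn = stored ^ (1 << 4)  # simulate one cell flipping state (position 5)
    print(hamming74_decode(worn))  # prints (11, 5)
```

A code like this corrects one error per codeword. The 8-bit and 12-bit ECC discussed in this article correct more errors per protected chunk, at the price of more parity storage and more logic.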
However, because ECC generation and checking is a process that takes time, it impacts throughput and system cost, depending upon how it is implemented. In general, ECC can be integrated with the memory itself or managed externally by the host processor. Figure 1 shows the various implementation options. The integrated approach has two configurations: a one-die approach where ECC is part of the memory die and a two-die approach where a controller IC (with serial interface and ECC) is collocated with the memory die. With a host-based approach, ECC support is part of the MCU NAND flash controller accessing the NAND flash. The availability of these three options from memory manufacturers allows OEMs to select the best tradeoffs for their application.
A Flexible Approach to ECC
ECC integrated into the NAND flash memory offers the advantage that ECC is managed directly by the memory chip itself. However, while this simplifies system design to some degree, the integrated approach comes at the price of higher memory cost and reduced read performance. The reduced read performance is due to the slower internal clocks in the flash compared to the much higher internal clock frequencies used in host processors.
Memory cost is higher because integrating ECC increases the size and complexity of the NAND flash device. Consider that a hardware implementation of 8-bit ECC has a gate count of about 50K. This represents an impact of about 1.7% on the gate count of a simple memory controller (3000K). If integrated on a NAND memory, however, the impact is between 10% and 15%, which raises the cost of the memory more significantly. For systems with large memory requirements that use multiple memory devices, integrating ECC with the NAND memory means this added cost is paid multiple times, as opposed to paying for ECC a single time when it is based in the host MCU.
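The arithmetic behind these figures is simple enough to check. The sketch below reproduces the controller-side percentage from the two gate counts quoted above and counts how many ECC blocks a design pays for under each approach. The function names are hypothetical, and the device counts in the loop are arbitrary examples.

```python
ECC_GATES = 50_000            # 8-bit ECC hardware, as quoted in the text
CONTROLLER_GATES = 3_000_000  # simple memory controller, as quoted in the text


def ecc_overhead_percent(ecc_gates: int, base_gates: int) -> float:
    """ECC gate count as a percentage of the logic it is added to."""
    return 100.0 * ecc_gates / base_gates


def ecc_blocks_paid_for(nand_devices: int, integrated: bool) -> int:
    """Integrated ECC is bought once per NAND device; host-based ECC once."""
    return nand_devices if integrated else 1


if __name__ == "__main__":
    print(f"{ecc_overhead_percent(ECC_GATES, CONTROLLER_GATES):.2f}%")  # prints 1.67%
    for devices in (1, 2, 4):
        print(devices,
              ecc_blocks_paid_for(devices, integrated=True),
              ecc_blocks_paid_for(devices, integrated=False))
```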
Read performance also drops because integrated ECC adds latency to each memory read at a clock rate lower than that at which a host controller could process ECC checking. Figure 2 shows a comparison of read throughput for NOR flash, NAND flash with integrated ECC, and NAND flash with host-based ECC. As can be seen, NAND flash with integrated ECC has less than half the performance of NOR flash. However, when ECC is host-based, NAND flash read performance nearly doubles, putting it almost on par with NOR flash.
Host-based ECC also provides better performance when an error is detected (and corrected). Figures 3a and 3b show the impact of an error on the Read First Data Time (RFDT). With integrated ECC, RFDT increases from 45 to 70 microseconds. With host-based ECC, RFDT is much better, only increasing from 35 to 45 microseconds.
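To put the RFDT figures side by side, the sketch below computes the correction penalty for each approach and a rough average RFDT for a given share of reads that need correction. The RFDT values are the ones quoted above. The `error_fraction` parameter and the linear averaging are assumptions made for illustration, not measurements from the article.

```python
# (RFDT without error, RFDT with a corrected error), in microseconds
RFDT_US = {
    "integrated": (45.0, 70.0),
    "host-based": (35.0, 45.0),
}


def error_penalty_us(clean: float, corrected: float) -> float:
    """Extra time added to RFDT when an error has to be corrected."""
    return corrected - clean


def expected_rfdt_us(clean: float, corrected: float, error_fraction: float) -> float:
    """Average RFDT if `error_fraction` of reads need correction (assumed model)."""
    if not 0.0 <= error_fraction <= 1.0:
        raise ValueError("error_fraction must be between 0 and 1")
    return clean + error_fraction * (corrected - clean)


if __name__ == "__main__":
    for name, (clean, corrected) in RFDT_US.items():
        penalty = error_penalty_us(clean, corrected)
        print(f"{name}: +{penalty:.0f} us ({100 * penalty / clean:.0f}% slower)")
```

With these numbers the integrated approach pays 25 microseconds per corrected read, against 10 microseconds for the host-based approach.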
Another important factor is the strength of the ECC algorithm. Figure 4 shows that 12-bit ECC extends NAND lifetime by 1.4X in read cycles and by 1.47X in P/E cycles compared to 8-bit ECC. With integrated ECC, the algorithm is fixed, typically at the 8-bit level to keep chip cost down. When ECC is host-based, developers have the flexibility to choose a stronger level of ECC to extend the effective lifetime of NAND memory blocks. This in turn enables developers to select a lower-density (and less expensive) memory device without compromising reliability.
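Stronger ECC is not free: it needs more parity bytes in the spare area. The sketch below estimates that cost under two assumptions not stated in this article: that the ECC is a binary BCH code, and that each codeword protects a 512-byte chunk of data. It relies on the standard bound that a binary BCH code over GF(2^m) correcting t errors needs at most m × t parity bits. The function name is hypothetical.

```python
import math


def bch_parity_bytes(data_bytes: int, t: int) -> int:
    """Upper bound on parity bytes for a binary BCH code correcting t bit errors.

    Assumes one codeword per `data_bytes` chunk. The field size m is the
    smallest value for which data plus parity fit in 2**m - 1 bits.
    """
    data_bits = data_bytes * 8
    m = 1
    while (1 << m) - 1 < data_bits + m * t:
        m += 1
    return math.ceil(m * t / 8)


if __name__ == "__main__":
    print(bch_parity_bytes(512, 8))   # prints 13
    print(bch_parity_bytes(512, 12))  # prints 20
```

Under these assumptions, moving from 8-bit to 12-bit correction raises the parity cost from 13 to 20 bytes per 512-byte chunk. A host-based design can make that tradeoff against the spare area available, while an integrated design cannot.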
Issues also arise from incompatible ECC specs. Memory chips from different vendors may implement the ECC protection spare area using a different page length, or may interleave the spare structures rather than running them consecutively. Each vendor may also use a different flag to designate the error threshold level (i.e., the threshold before a block is considered “bad” and its data is transferred to another block). These differences can create problems for developers, including incompatibility of ECC status bits.
When ECC is host-based, the MCU determines the particular ECC specs to use and applies them consistently across all memory chips. In this way, incompatibility issues are avoided and system design is simplified.
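One way to picture this is a single host-side policy object that every read passes through, whichever chip served it. The sketch below is a minimal illustration, not an actual MCU driver: the class names, the `(chip, block)` keying, and the example values of 12 correctable bits with a threshold of 9 are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set, Tuple


class ReadAction(Enum):
    OK = "ok"                        # data good, block stays in service
    RELOCATE = "relocate"            # data corrected, block should be retired
    UNCORRECTABLE = "uncorrectable"  # more errors than the ECC can fix


@dataclass(frozen=True)
class EccPolicy:
    correctable_bits: int    # ECC strength chosen by the host
    relocate_threshold: int  # corrected-bit count at which a block is retired

    def classify(self, bit_errors: int) -> ReadAction:
        if bit_errors > self.correctable_bits:
            return ReadAction.UNCORRECTABLE
        if bit_errors >= self.relocate_threshold:
            return ReadAction.RELOCATE
        return ReadAction.OK


@dataclass
class BadBlockTable:
    bad: Set[Tuple[int, int]] = field(default_factory=set)

    def handle_read(self, policy: EccPolicy, chip: int, block: int,
                    bit_errors: int) -> ReadAction:
        """Apply the same policy to any chip and mark worn blocks as bad."""
        action = policy.classify(bit_errors)
        if action is not ReadAction.OK:
            self.bad.add((chip, block))
        return action


if __name__ == "__main__":
    policy = EccPolicy(correctable_bits=12, relocate_threshold=9)
    table = BadBlockTable()
    print(table.handle_read(policy, chip=0, block=17, bit_errors=3).value)  # prints ok
    print(table.handle_read(policy, chip=1, block=42, bit_errors=9).value)  # prints relocate
```

Because the threshold lives in the host, swapping in a NAND device from another vendor does not change when a block is judged bad.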
Lowering System Cost
The advantages of using NAND flash with host-based ECC can be applied to systems currently using NOR flash. NAND flash is available with serial interfaces such as SPI, which saves 9 pins compared to parallel NAND flash devices, shrinks the overall memory footprint, and simplifies board layout. For many applications, OEMs can migrate existing designs from more expensive SPI NOR flash to SPI NAND flash with little disruption and realize substantial cost savings.
It is important to note that while migrating to NAND flash with integrated ECC provides immediate short-term benefits — such as utilizing the same bus protocol and SPI controller already present in the system — this approach has long-term drawbacks. As discussed earlier, integrated ECC offers only about half the performance at a higher system cost compared to host-based ECC. Thus, any OEM considering taking advantage of the cost efficiencies of NAND flash should seriously consider taking the additional step to implement a host-based ECC architecture.
ECC is a critical technology for assuring the high reliability of NAND flash. By implementing ECC using a host-based approach, OEMs can significantly lower memory costs while retaining the flexibility to select the optimal configuration for their application. That flexibility includes choosing stronger ECC, having full control of error resilience and the error threshold level, improving NAND read and P/E cycle lifetimes, and being able to select lower-density memories for volume production. Host-based ECC also eliminates potential incompatibility issues that can arise with integrated ECC implementations.
High-reliability memory is an important requirement for many applications. With its high densities, serial interfaces such as SPI, and low cost per bit, NAND flash using host-based ECC enables developers to lower system cost while shrinking the overall memory footprint and simplifying board layout.
The availability of both high-reliability NAND flash and NOR flash gives developers greater flexibility in design. Together, these memory technologies allow developers to choose the optimal capabilities for their application, such as NOR if faster random access is needed or NAND for faster write performance.
— Jim Yastic, senior technical marketing manager, Macronix America