What are the Common Memory Error Types and How Do ECC DIMMs Work?

2023-12-22 ATP Blogs
DRAM,DRAM modules,SDRAM,DDR1

Defective main memory can disrupt business operations with performance degradation or hardware crashes, leading to costly downtime. Dynamic random access memory (DRAM) modules typically have built-in mechanisms that address memory errors. This post answers the most common questions on computer memory errors to help you ensure high availability and maximum reliability of DRAM installed in your mission-critical systems.


What are the types of memory errors?

Memory errors fall into two broad categories:

Soft Memory Errors randomly corrupt memory bits and alter stored data but do not cause physical damage to the memory module. They damage the data being processed rather than the system hardware, but in mission-critical applications such as medical equipment, industrial controllers, autonomous cars, security/surveillance systems and data centers, uncorrected soft errors may lead to catastrophic outcomes.


There are two types of soft memory errors:

  • Chip-Level Soft Errors are usually due to the radioactive decay of elements in the memory chip packaging, which emits alpha particles. When these alpha particles hit the chip, they disturb the electrical charge in a memory cell and cause it to change state, corrupting the stored data. Due to advancements in memory design and technology, these types of errors are now rare.


  • System-Level Soft Errors usually occur when data is hit by a glitch or noise while it is on the data bus. Noise is interference or static that degrades signal integrity and can come from electromagnetic interference (EMI), radio waves, electrical wiring, lightning, bad connections, and other sources. The system may misinterpret the noise as a data bit and then use or execute the bad data bit or program code, resulting in an error.


Hard Memory Errors are errors that keep recurring as a result of hardware or physical defects on the memory module. They are commonly caused by operating a system beyond the memory's rated speed or by exposing the system to static electricity. Other causes include environmental factors such as temperature, shock/vibration, electrical/voltage stress or physical stress. Mishandling, aging, or manufacturing defects can also affect the reliability of hardware components. Hard errors are usually permanent and require module replacement.


How can you tell if the memory error is soft or hard?

Soft memory errors can typically be cleared by rebooting the system. If the errors keep recurring after a reboot, they are most likely hard errors, and the solution is to replace the memory chip or module entirely.
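The reboot test above can be sketched in a few lines of Python. This is only an illustration: it assumes you already have, from some logging tool, the set of failing memory addresses observed in each boot session, and it treats an address that fails in more than one session as a suspected hard error. The function name `classify_errors`, the labels and the sample addresses are all hypothetical.

```python
from collections import Counter


def classify_errors(errors_per_boot):
    """Classify failing addresses as suspected soft or hard errors.

    errors_per_boot: a list with one set of failing addresses per boot
    session (hypothetical input, e.g. collected from a memory test log).
    """
    boots_seen = Counter()
    for addresses in errors_per_boot:
        # Count each address once per boot session.
        for addr in set(addresses):
            boots_seen[addr] += 1
    # An error that survives a reboot (seen in 2+ sessions) is suspected hard.
    return {
        addr: "hard (suspected)" if count > 1 else "soft (suspected)"
        for addr, count in boots_seen.items()
    }


# Three boot sessions; address 0x2F40 keeps failing after every reboot.
sample_log = [{0x1000, 0x2F40}, {0x2F40}, {0x2F40, 0x8800}]
result = classify_errors(sample_log)
for addr in sorted(result):
    print(hex(addr), result[addr])
```

A recurring address is only a hint. A module flagged this way should still be confirmed with a proper memory test before it is replaced.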


How costly are memory errors?

At best, memory errors can degrade performance. At worst, they can cause system crashes. Aside from hardware repair and replacement costs, memory failures can cause major end-user service disruptions, damage important data and consequently affect general operations.


What external factors affect memory performance and reliability?

Extreme temperatures are generally considered to affect memory because they cause physical changes to its materials and components, so companies make considerable investments in thermal and cooling solutions. Increased utilization and DIMM age can also affect memory performance and reliability and increase the severity of memory errors.


What error correction mechanisms are available and how do such mechanisms work?

In mission-critical applications where data corruption and system failure must be avoided, dual in-line memory modules (DIMMs) with error correcting code (ECC) are used. ECC DIMMs can do either single-bit error correction (SEC) or SEC and double-bit error detection (SECDED). SEC alone cannot detect double-bit errors, so it will report the memory as error free if there are two error bits. SECDED, on the other hand, can detect all single- and double-bit errors but will correct only single-bit errors. It cannot reliably detect triple-bit errors or correct double-bit errors.

More advanced error detection and correction can be handled by more complex codes such as ChipKill™ or Advanced ECC memory, which are capable of detecting and correcting multi-bit errors that standard ECC cannot correct. Developed specifically for the NASA Pathfinder mission to Mars, ChipKill works by creating a duplicate set of data in the form of a checksum in another part of the memory subsystem. When memory failure occurs, data recovery is done by recalculating the data from the checksum information, allowing the DIMM to withstand even the failure of an entire DRAM chip and resulting in better system availability. Studies have shown that ChipKill reduces uncorrectable error rates by up to 4X compared to SECDED.
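The SECDED behavior described above can be illustrated with a textbook extended Hamming code. The sketch below protects a 4-bit value with 4 check bits purely for brevity. Real ECC DIMMs protect much wider words, and the exact code used by a given memory controller is not specified in this article. The function names `encode`, `decode` and `flip` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits (0..15) into 8 bits.

    Index 0 holds the overall parity bit; indexes 1..7 hold a Hamming(7,4)
    codeword with check bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the parity of all 8 bits even
    return code


def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    damaged = list(code)
    for pos in positions:
        damaged[pos] ^= 1
    return damaged


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits in 1..7.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2  # 1 means an odd number of bits were flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Treated as a single-bit error; syndrome 0 points at the parity bit.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: double-bit error, detect only.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
print(decode(word))              # no error injected
print(decode(flip(word, 5)))     # single-bit error
print(decode(flip(word, 2, 6)))  # double-bit error
```

Because any odd number of flipped bits looks like a single-bit error to this decoder, a triple-bit error can be "corrected" into wrong data, which is the limitation noted above.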


What are correctable and non-correctable errors?

Correctable errors are generally single-bit errors that the system or the built-in ECC mechanism can correct. These errors do not cause system downtime or data corruption. Uncorrectable errors are generally multi-bit errors that could cause the system to crash or shut down immediately.
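On a Linux host with an EDAC driver loaded, the kernel typically exposes per-memory-controller counters for correctable and uncorrectable errors as `ce_count` and `ue_count` files under `/sys/devices/system/edac/mc/`. The following is a minimal sketch for reading them. Linux and EDAC are assumptions that go beyond this article, and the helper name `read_edac_counts` is hypothetical.

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {'correctable': n, 'uncorrectable': n}}.

    Returns an empty dict if the EDAC directory is missing, for example
    on a system without ECC memory or without an EDAC driver.
    """
    base = Path(base)
    if not base.is_dir():
        return {}
    counts = {}
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = {"correctable": ce, "uncorrectable": ue}
    return counts


if __name__ == "__main__":
    for name, c in read_edac_counts().items():
        print(name, "correctable:", c["correctable"],
              "uncorrectable:", c["uncorrectable"])
```

A steadily rising correctable count on one controller is often treated as an early warning, while any uncorrectable count usually warrants immediate attention.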


Physically, how does an ECC DIMM differ from a non-ECC DIMM?

As a rule of thumb, if the number of chips on the module is divisible by three, the module is an ECC DIMM. A standard (non-ECC) module has eight memory chips that store data, providing it to the CPU on demand. An ECC memory module has an additional, ninth memory chip that is used to detect and correct errors for the eight data chips. The table below shows illustrations of ECC and non-ECC DIMMs from ATP.

Table 1. ATP DDR/DDR2/DDR3/DDR4 ECC and non-ECC DIMMs.
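The "eight chips plus one" layout lines up with the arithmetic of a Hamming-style SECDED code. A quick sketch, assuming 8-bit-wide (x8) DRAM chips and a 64-bit data word (both are assumptions, not stated above): the number of check bits r must satisfy 2^r >= m + r + 1 for m data bits, plus one extra overall parity bit for double-bit detection. The function name `secded_check_bits` is hypothetical.

```python
def secded_check_bits(data_bits):
    """Check bits needed by a Hamming-based SECDED code for data_bits of data."""
    r = 0
    # Smallest r whose syndrome can address every data and check bit position.
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall parity bit adds double-bit detection


CHIP_WIDTH = 8  # assumed x8 DRAM chips
for data_bits in (8, 16, 32, 64):
    check = secded_check_bits(data_bits)
    print(f"{data_bits} data bits -> {check} check bits, "
          f"{data_bits + check} bits total")

total_bits = 64 + secded_check_bits(64)
print("x8 chips needed for a 64-bit word:", total_bits // CHIP_WIDTH)
```

Under these assumptions a 64-bit word needs 8 check bits, for 72 bits in total, which is exactly nine x8 chips.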


ATP DRAM Differentiators

ATP DRAM products are used in applications where the highest degree of reliability is required. Memory errors can have a major impact on operations, so ATP painstakingly ensures that all its DRAM products meet the toughest standards.


  • Functional Testing: Automatic Testing Equipment (ATE) 
    Major integrated circuits (ICs) used in ATP DRAM products are sourced from Tier 1 manufacturers and undergo meticulous testing to ensure excellent reliability and longevity. All DRAM modules undergo stringent functional testing using Automatic Testing Equipment (ATE) to detect structural and component defects and to screen out marginal timings and signal integrity (SI) issues.

Figure 1. Functional testing using ATP Automatic Testing Equipment (ATE).

  • System Testing: Test During Burn-In (TDBI) 
    At mass production (MP) level, all the modules are subjected to Test During Burn-In (TDBI), which combines temperature, load, speed and time to stress-test the memory module and to screen out weak ICs. ATP's TDBI aims to effectively screen out defective DRAM chips that will potentially fail during the early life failure (ELF) period. By ensuring that only robust DRAM chips are on the module, TDBI significantly lowers failure rates and extends the product service life.

    Even a 0.01% error rate on a device that is 99.99% effective can increase failure rates at the module level and lead to failure in actual usage, so TDBI is designed to detect and screen out that 0.01% to ensure the DRAM modules' reliability.

Figure 2. ATP Test During Burn-In (TDBI) for 100% of DRAM modules at mass production (MP) level screens out weak ICs.


  • ATP Mini Chamber
    During TDBI, the specially designed ATP Mini Chamber isolates the temperature cycling to the targeted area so only the modules are subjected to burn-in. This makes it easy to find the root cause of failure and keeps the motherboard in stable operation.

Figure 3. ATP Mini Chamber subjects only the DRAM modules to temperature cycling.
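To see why a 0.01% chip-level error rate matters at module level, as noted in the TDBI section above, consider a rough calculation. It assumes that the 0.01% figure is a per-chip probability of early-life failure, that chips fail independently, and that a module fails if any one of its chips fails. The chip counts of 9 and 18 are illustrative, and the function name is hypothetical.

```python
def module_failure_probability(chip_failure_rate, chips_per_module):
    """Probability that at least one chip on the module fails.

    Assumes independent chip failures, which is a simplification.
    """
    return 1.0 - (1.0 - chip_failure_rate) ** chips_per_module


CHIP_FAILURE_RATE = 0.0001  # 0.01% per chip, as quoted in the text
for chips in (9, 18):
    p = module_failure_probability(CHIP_FAILURE_RATE, chips)
    print(f"{chips} chips per module: {p:.4%} chance of a failing module")
```

Under these assumptions the module-level rate is roughly the chip-level rate multiplied by the number of chips, so screening weak chips before they reach the field has a multiplied effect on module reliability.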


ATP's industrial DRAM products are available in legacy SDRAM and a complete range of DDR1, DDR2, DDR3 and DDR4 modules including the latest DDR4-2666 in different densities and form factors.



