My dmesg is being spammed with:
kernel:EDAC MC0: UE page 0x0,offset 0x0,grain 1073741824,row 3,labels ":": i3200 UE
Any idea what's wrong?
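The address fields in that line can be decoded by hand. Below is a rough shell sketch. It assumes the `page` field is a page frame number and that pages are the usual 4 KiB on x86. `decode_edac_line` is a hypothetical helper name and is not part of any EDAC tool.

```shell
# Decode the address fields of an EDAC "UE"/"CE" log line.
# Assumption: "page" is a page frame number and pages are 4 KiB (x86).
decode_edac_line() {
    line="$1"
    page=$(printf '%s\n' "$line" | sed -n 's/.*page \(0x[0-9a-fA-F]*\).*/\1/p')
    offset=$(printf '%s\n' "$line" | sed -n 's/.*offset \(0x[0-9a-fA-F]*\).*/\1/p')
    grain=$(printf '%s\n' "$line" | sed -n 's/.*grain \([0-9]*\).*/\1/p')
    # Give up if any of the three fields is missing.
    [ -n "$page" ] && [ -n "$offset" ] && [ -n "$grain" ] || return 1
    # Shell arithmetic accepts 0x-prefixed hex constants.
    addr=$(( $page * 4096 + $offset ))
    printf 'address=0x%x grain=%d bytes (%d MiB)\n' "$addr" "$grain" $(( grain / 1048576 ))
}
```

For the line above it should print `address=0x0 grain=1073741824 bytes (1024 MiB)`. In other words, the error is only pinned down to a 1 GiB range starting at physical address 0, not to a specific location.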
Here are the loaded modules:
# lsmod | grep edac
i3200_edac              3330  0
edac_core              46581  2 i3200_edac
edac-util doesn't show any errors:
# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: ch0: 0 Corrected Errors
mc0: csrow0: ch1: 0 Corrected Errors
mc0: csrow1: 0 Uncorrected Errors
mc0: csrow1: ch0: 0 Corrected Errors
mc0: csrow1: ch1: 0 Corrected Errors
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: ch0: 0 Corrected Errors
mc0: csrow2: ch1: 0 Corrected Errors
mc0: csrow3: 0 Uncorrected Errors
mc0: csrow3: ch0: 0 Corrected Errors
mc0: csrow3: ch1: 0 Corrected Errors
mc0: csrow4: 0 Uncorrected Errors
mc0: csrow4: ch0: 0 Corrected Errors
mc0: csrow4: ch1: 0 Corrected Errors
mc0: csrow5: 0 Uncorrected Errors
mc0: csrow5: ch0: 0 Corrected Errors
mc0: csrow5: ch1: 0 Corrected Errors
mc0: csrow6: 0 Uncorrected Errors
mc0: csrow6: ch0: 0 Corrected Errors
mc0: csrow6: ch1: 0 Corrected Errors
mc0: csrow7: 0 Uncorrected Errors
mc0: csrow7: ch0: 0 Corrected Errors
mc0: csrow7: ch1: 0 Corrected Errors
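To cross-check edac-util, the same counters can be read directly from sysfs. This is a minimal sketch. It assumes the legacy EDAC sysfs layout, with `mcN/ue_count`, `mcN/ce_count` and `mcN/csrowN/ue_count` under /sys/devices/system/edac/mc. `edac_counts` is a hypothetical name.

```shell
# Print raw EDAC error counters straight from sysfs.
# $1: EDAC root directory (defaults to the standard sysfs location).
edac_counts() {
    root="${1:-/sys/devices/system/edac/mc}"
    for mc in "$root"/mc[0-9]*; do
        [ -d "$mc" ] || continue   # unmatched glob: no memory controllers found
        name=$(basename "$mc")
        # Controller-wide totals
        echo "$name: ue_count=$(cat "$mc/ue_count") ce_count=$(cat "$mc/ce_count")"
        # Per-csrow uncorrected totals (legacy csrow layout)
        for row in "$mc"/csrow[0-9]*; do
            [ -d "$row" ] || continue
            echo "$name: $(basename "$row"): ue_count=$(cat "$row/ue_count")"
        done
    done
}
```

Run `edac_counts` with no argument to read the real sysfs tree, or pass another directory as the first argument.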
Solution
This seems to be a memory error, but not a fatal one. Running
echo 0 > /sys/module/edac_core/parameters/edac_mc_log_ce
will stop the spam on the console until the next reboot.
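Here is a slightly more defensive version of the same command, as a sketch. It checks that the parameter file is writable and reads the value back afterwards. `disable_ce_logging` is a hypothetical name, and the default path is the one from the command above.

```shell
# Turn off logging of correctable (CE) EDAC events until the next reboot.
# $1: path to the parameter file (defaults to the edac_core one).
disable_ce_logging() {
    param="${1:-/sys/module/edac_core/parameters/edac_mc_log_ce}"
    if [ ! -w "$param" ]; then
        echo "cannot write $param (not root, or edac_core not loaded?)" >&2
        return 1
    fi
    echo 0 > "$param"
    # Read the value back to confirm the change took effect.
    echo "$(basename "$param")=$(cat "$param")"
}
```

Because the change only lasts until the next reboot, you can make it stick by adding a line like `options edac_core edac_mc_log_ce=0` to a file under /etc/modprobe.d/. This assumes edac_core stays a loadable module, as in the lsmod output above.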
Basically, "ce" is short for correctable errors (so they don't mean the RAM itself is "defective" either).
See the kernel documentation about EDAC and the EDAC wiki for more details.
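I may be completely wrong, but we had a server (ECC RAM) where there were no uncorrectable errors and memdisk didn't show any problems. So I left it running with the same RAM, changed the logging output, and started monitoring for uncorrectable errors. We have had no further problems with it.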
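For the "monitor uncorrectable errors" part, something as small as this can be run from cron. It is only a sketch: `check_ue`, the state-file path and the exit codes are hypothetical choices. It also assumes the mc0 `ue_count` sysfs file is present.

```shell
# Warn when the uncorrected-error counter has grown since the last check.
# $1: counter file (defaults to mc0's ue_count in sysfs)
# $2: state file holding the previous value (hypothetical location)
check_ue() {
    count_file="${1:-/sys/devices/system/edac/mc/mc0/ue_count}"
    state_file="${2:-/var/tmp/edac_mc0_ue_count}"
    current=$(cat "$count_file") || return 2
    # No state file yet (first run): treat the previous value as 0.
    previous=$(cat "$state_file" 2>/dev/null || echo 0)
    echo "$current" > "$state_file"
    if [ "$current" -gt "$previous" ]; then
        echo "uncorrected errors: $previous -> $current" >&2
        return 1
    fi
    return 0
}
```

It returns 1 when the counter has grown since the previous run, 2 when the counter can't be read, and 0 otherwise. A cron job can then mail or log on a non-zero status.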