I'm working with an embedded ARM platform with built-in NAND flash. My rootfs partition is squashfs. Both U-Boot and the kernel use OMAP_ECC_BCH8_CODE_HW. The problem is that several boards (not just one) stopped working after a power outage (they had been in use for about 2 months). These errors can be seen while booting:
How should I debug this? I haven't erased the flash, so it's still possible to run some tests on it. What I've done so far:
It seems to me that these errors are recoverable, and that's why nanddump corrected them in the first case. I compared these 2 dumps and there are only three differences (the 3 ECC corrections reported by nanddump?).
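One way to check whether the differences between the two dumps are isolated bitflips that BCH8 could have corrected is to compare them bit by bit, grouped by ECC step. The sketch below assumes data-only dumps (no OOB bytes interleaved) and 512-byte ECC steps with a strength of 8 bits per step; `compare_dumps`, `bit_diff` and `report_steps` are hypothetical helper names, not part of mtd-utils.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 512-byte ECC steps and 8 correctable bits per step (BCH8).
 * Adjust both if your NAND driver is configured differently. */
#define ECC_STEP_SIZE 512u
#define ECC_STRENGTH  8u

/* Number of bits that differ between two bytes (portable popcount of XOR). */
static unsigned bit_diff(uint8_t a, uint8_t b)
{
    uint8_t x = (uint8_t)(a ^ b);
    unsigned n = 0;
    while (x) {
        x &= (uint8_t)(x - 1); /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Compare two data-only dumps of the same NAND area.
 * per_step must hold ceil(len / ECC_STEP_SIZE) entries; each entry receives
 * the number of differing bits in that ECC step.
 * Returns the total number of differing bits. */
static unsigned long compare_dumps(const uint8_t *a, const uint8_t *b,
                                   size_t len, unsigned *per_step)
{
    assert(a != NULL && b != NULL && per_step != NULL);

    size_t steps = (len + ECC_STEP_SIZE - 1) / ECC_STEP_SIZE;
    for (size_t s = 0; s < steps; s++)
        per_step[s] = 0;

    unsigned long total = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned d = bit_diff(a[i], b[i]);
        per_step[i / ECC_STEP_SIZE] += d;
        total += d;
    }
    return total;
}

/* Print each ECC step that differs, flagging steps with more differing
 * bits than BCH8 could have corrected. */
static void report_steps(const unsigned *per_step, size_t steps)
{
    for (size_t s = 0; s < steps; s++) {
        if (per_step[s] == 0)
            continue;
        printf("step %zu (offset 0x%zx): %u bit(s) differ%s\n",
               s, s * (size_t)ECC_STEP_SIZE, per_step[s],
               per_step[s] > ECC_STRENGTH ? " (more than BCH8 can correct)" : "");
    }
}
```

A possible use is to load an ECC-corrected dump and a raw dump of the same partition (for example one taken with nanddump's `--noecc`), call `compare_dumps`, then pass the per-step counts to `report_steps`. If every differing step shows only one or two bits, the differences look like ordinary bitflips that the ECC handled.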
The question is: why weren't these errors corrected by the system automatically? Is it because squashfs is not an "MTD-aware" filesystem and shouldn't be used on MTD devices? If so, should I use squashfs over UBI? What about the kernel then (as far as I know, it has to be a raw image in order to boot it from U-Boot)? Thanks for any help!
Indeed, the Linux MTD layer doesn't do any maintenance on the NAND/NOR memory. For example, when a bitflip happens on your NAND, it's corrected by the ECC. The MTD layer is aware of that, but it doesn't DO anything about it. It only reports it to the caller (as -EUCLEAN once the number of corrected bitflips reaches a threshold, or -EBADMSG when the data could not be corrected). So you need another layer on top of MTD to take care of that. One solution is to use UBI, which is designed to solve this kind of problem. Have a look at the UBI documentation on linux-mtd. If you want to stick with squashfs, it's possible to add another MTD abstraction on top of UBI (gluebi), then run squashfs on top of that. The result looks like this:
It makes a scary picture, but it works pretty well ;) Have a look at these slides from free-electrons for more info (the picture comes from slide 47). About the kernel, I'm not sure, but I think U-Boot does support UBI. I've never tried it, though...
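The point that MTD reports bitflips but leaves the repair to a higher layer can be modelled in a few lines. This is a simplified sketch, not kernel code: `default_bitflip_threshold`, `model_mtd_read_result` and `model_upper_layer` are hypothetical names. The threshold rule mirrors the default used by the Linux NAND core in recent kernels (3/4 of the ECC strength, rounded up), and the "scrub on -EUCLEAN" action stands in for what UBI does when it sees bitflips; real UBI error handling is more involved.

```c
#include <assert.h>
#include <errno.h>

/* EUCLEAN is Linux-specific; this fallback (value from
 * asm-generic/errno.h) is only for building the model on other hosts. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif

/* Default threshold when the driver sets none: 3/4 of the ECC strength,
 * rounded up. */
static unsigned default_bitflip_threshold(unsigned ecc_strength)
{
    return (ecc_strength * 3 + 3) / 4;
}

/* Model of what a read through MTD returns for one page.
 * max_bitflips:  highest number of bits corrected in any single ECC step
 * uncorrectable: nonzero if some step had more errors than the ECC can fix */
static int model_mtd_read_result(unsigned max_bitflips, int uncorrectable,
                                 unsigned bitflip_threshold)
{
    assert(bitflip_threshold > 0);
    if (uncorrectable)
        return -EBADMSG;  /* data could not be corrected */
    if (max_bitflips >= bitflip_threshold)
        return -EUCLEAN;  /* corrected, but the block is degrading */
    return 0;             /* clean, or corrected without telling the caller */
}

/* Simplified model of a maintenance layer such as UBI. MTD itself stops
 * after returning the code above; acting on it is this layer's job. */
enum action { ACT_NONE, ACT_SCRUB, ACT_FAIL };

static enum action model_upper_layer(int ret)
{
    switch (ret) {
    case 0:
        return ACT_NONE;
    case -EUCLEAN:
        return ACT_SCRUB; /* copy data to a fresh block, erase the old one */
    default:
        return ACT_FAIL;  /* report the error upward */
    }
}
```

With BCH8 (strength 8) the default threshold works out to 6, so in this model a read whose worst ECC step needed 3 corrections returns 0 and the caller is not told anything. Only a layer that acts on -EUCLEAN, such as UBI, moves the data off the degrading block before the bitflips accumulate beyond what the ECC can fix.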