Crucial SSD bug and firmware fix

About 2.5 years ago I got a ThinkPad X230 as my work machine. It’s quite a decent one given the hardware specs, and I have been happy with it ever since, even though I’m not able to use the touchpad at all. The quality is just so poor compared to the Apple MacBook I owned before. But anyway, that’s another topic…

What I want to talk about here is my experience with the hard drive. When I was asked what kind of drive I wanted, I said it should be an SSD. The benefits are the speed improvements and no risk of a hard-drive crash when traveling or walking around while the laptop is turned on. The speed especially is amazing: my installed Ubuntu starts in less than 20s, and copying small files is very fast too.

So all was working fine, but a couple of months ago I noticed the first glitches. Sometimes when I took my laptop out of the docking station or put it back in, the system seemed to hang for a short moment. Something similar also happened while walking around and, e.g., playing a movie. Usually it took up to 5s and then all was working fine again. But then one more issue appeared, which caused my LVM partitions to be remounted read-only. This was quite frustrating because an additional fsck had to be run on each reboot due to inconsistencies in my ext4 journal data. Even more frustrating was that at irregular intervals my LightDM display manager froze and had to be restarted by logging in via SSH from a different machine. If I had no other machine available, I had to reboot the laptop because no key press was accepted.

All this came to a head last Friday when I wanted to participate in a debugging theme night at the local CCC (Chaos Computer Club) hackerspace. Once I arrived there I wanted to boot up my laptop, but I saw nothing more than a black screen after the initial BIOS routines were done. It didn’t go away after a couple of cold restarts. So actively participating was only a dream. Also the first question, “Looks like your SSD died. How is your backup doing?”, which I got from several people, was kinda expected. 🙂 Thankfully I was able to answer that it was only 2 days old… But even with that, a lot of work would have been ahead of me. Anyway, I will have to think about whether I wanna go back there after that bad omen…

Back at home, after the machine had been turned off for a while, it actually booted again. Hurray! I was confronted with all the orphaned inodes again across the partitions, which I had to repair first, but then the system started up as usual. After creating another backup I investigated further to see if the drive was reporting any kind of failure. But all looked fine, as before. None of the entries in the SMART status showed a failure. So I ran a long test with ‘smartctl’, and once I got the full report, I saw a line saying that my SSD might be affected by a firmware bug which makes it stop working until the next reboot. And during a reboot even a BSOD could happen. The problem starts to occur after the first 5000 hours of power-on time and keeps happening even after a drive reset. A full description can be found on the Tom’s Hardware website.

After reading the details and the announcements from Crucial about this problem, I looked at the firmware of my SSD and noticed that it was years old. Its version was “000F”, which looks to be the second firmware version for this drive. Since then numerous updates have been released. So it was clear to me that a firmware update was necessary. I finally did that with a bootable CD-R and all went smoothly. The necessary restart didn’t show a problem, and even though I was confronted with orphaned inodes again (most likely from the last shutdown), the system works perfectly. I tested all the situations which had caused my machine to fail in the past. All of them passed. No more hanging videos while walking around, and no more read-only remounted file systems after taking the machine out of the docking station or putting it back in. Also no freeze of LightDM has been seen so far!
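If you want to check your own drive, here is a minimal sketch of how the two values that mattered in my case, the firmware version and the power-on hours, could be pulled out of the ‘smartctl’ output. It assumes the usual text layout of `smartctl -i -A` (a `Firmware Version:` line and a `Power_On_Hours` row in the attribute table). The function names, the `/dev/sda` default and the 5000-hour constant (simply the figure mentioned above) are my own placeholders, not anything official from Crucial.

```python
import re
import subprocess

# Approximate figure from the text above; an assumption, not an official value.
BUG_THRESHOLD_HOURS = 5000


def parse_smartctl(output):
    """Extract firmware version and power-on hours from `smartctl -i -A` text."""
    info = {"firmware": None, "power_on_hours": None}

    # The info section contains a line like "Firmware Version: 000F".
    fw = re.search(r"^Firmware Version:\s*(\S+)", output, re.MULTILINE)
    if fw:
        info["firmware"] = fw.group(1)

    # Attribute rows have ten columns:
    # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Power_On_Hours":
            # Some drives report the raw value as e.g. "5432h+10m"; keep the leading digits.
            hours = re.match(r"\d+", fields[9])
            if hours:
                info["power_on_hours"] = int(hours.group())
    return info


def read_drive(device="/dev/sda"):
    """Run smartctl on a device (usually needs root) and parse its output."""
    result = subprocess.run(
        ["smartctl", "-i", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return parse_smartctl(result.stdout)


def past_threshold(info, threshold=BUG_THRESHOLD_HOURS):
    """True if the drive has been powered on at least `threshold` hours."""
    hours = info["power_on_hours"]
    return hours is not None and hours >= threshold
```

Calling `read_drive()` as root and passing the result to `past_threshold()` tells you whether the drive is already in the range where the hangs started for me. Whether a given firmware version is actually affected still has to be looked up in the vendor’s release notes.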

For my part, I have learned from this and will check for firmware updates far more often, now including the drive as well.

One more thing you might wanna do, if you own an SSD and run Linux, is to adjust several mount options to extend the lifetime of your disk.