A great Vista troubleshooting story

I read every comment left on this site. If you have your own website or blog and you include that address with your comment, I’ll probably pay a visit. That’s how I found this illuminating Vista troubleshooting report by David Moisan:

For the past six months, I’d been scraping by with Vista. Every few days, my machine would lock up. I tried many combinations of hardware: was it my cheap USB hub? Remove it. My FireWire? Removed. I wasn’t happy with the performance, but it was never really bad enough for me to consider going back to XP (or going to Linux, as some would think). I added 2 gigs of Crucial memory to the box when I renovated my server, but the lockups didn’t go away.

Sometimes the machine would simply lock up, but other times, the symptoms were very interesting:  I’d wake the machine up from sleep for the morning, and over a minute or so, the computer would slowly grind to a halt.   Control-Alt-Delete would often yield an error message (a message about being unable to bring up the security dialog.)

I can hear the grumbling now: “Vista’s a train wreck. Not ready for prime time. How could Microsoft have released this dog?”

Except, in this case, the problem had nothing to do with software. It wasn’t caused by Microsoft or by a third-party driver. The solution was as simple as swapping a $4 SATA cable.

I’ve said it before, but it’s worth repeating: Whenever you begin experiencing instability problems with a PC, the very first thing to check is hardware:

  • Most system and application failures are fairly easy to identify. Random failures often indicate hardware problems.
  • Bad RAM, overheating, and defective hard disks, in order, are the most common hardware failures in my experience. A cheap or overtaxed power supply can cause grief as well.
  • Hardware can fail over time. Most people assume that the problem is software because they haven’t changed any hardware lately.

Thanks for sharing the story, David!

15 thoughts on “A great Vista troubleshooting story”

  1. This happened to me with XP as well. What I thought was a dying PC turned out to be nothing more than a crimped IDE cable.

    I wonder, is there a test suite that people could boot from a CD to identify things like a failing cable?

  2. Been there, done that.

    I had a P3 system that had problems. Touching the hosts file, starting the system, or even unplugging and plugging the Ethernet cable in would cause SVCHOST to go 100% for about 30 minutes. I scrubbed and reinstalled, and that was fixed for about 3 months before that hit again. Total Annihilation was slower than I ever remember it playing. Then, even with dealing with those issues, I was getting random crashes and at two a day, I pulled the plug and got another motherboard. Never had that problem again with that system, even though I was still technically running the same software.

    Upgraded to a P4, and had rather unimpressive results with the thing. Then we had the big bad capacitor scare. According to my mobo manufacturer, I wasn’t affected. Then the computer would just shut down. Testing memory and the motherboard proved nothing, but examining the motherboard, I noticed that the stupid capacitors were indeed bulging. I finally found somebody that had a motherboard (we went from DDR to DDR2 memory and all that) that I could use (I wanted new, not used) and found that my system was now a heck of a lot faster than it had been. No overclocking, no change in memory, video or anything else. Everything that was in the old system was in the new system, nothing added.

    The only weirdness was that using Javascript under Firefox would time out. Google being the real issue. Two thirds of the time, Javascript would time out (and I would cancel it with no ill effect) doing a Google search. IE6/7 was unaffected, so I used IE for all my Google searches.

    Now I’m running a Q6600. No more Firefox weirdness. And while I’m still technically at 2.4GHz, I’m not hyperthreading either, so I’m getting the full 2.4 per process.

  3. Some things never change – just the other day I broke out the pencil eraser to clean up some RAM contacts. It turned around some very bad behavior that had the owner worried about viruses, yadda, yadda.

  4. I support many different computers, old, new, fast, slow…, in many different environments.

    In my experience hardware is rarely the cause of PC malfunctions.
    The frequency of problems I encounter from most to least follows:
    1. Software applications loading too many unnecessary processes into memory at boot, straining system resources
    2. Malware
    3. Badly written or not thoroughly debugged software – dlls, applications and/or drivers.
    4. Combination of software not tested or envisioned by authors.
    5. Microsoft updates.
    6. Hardware overheating.
    7. Other hardware issues.

    Today’s computers rival simple life-forms. Their complexity is comparable to a single-cell organism or greater. All organisms have good days and bad days, so too with computers.

  5. Good advice, Ed, and true to my experience. I’ve lost count of the number of times replacing a HD, power supply, or RAM made me appear to be a genius to my friends. (It’s a good thing they didn’t figure it out for themselves.) Except for HDs, most computer parts don’t fail right away; they seem to flicker and dim, much like remote control batteries.

  6. Interesting to read these comments. I run a (small, part-time) computer repair biz in Ireland, and such random or intermittent faults drive me nuts. In our rural area we have a lot of problems with consistency of the mains power supply, and this seems to cause a lot of faults. My problem is that I worry I am replacing parts unnecessarily, especially the motherboard. I have not been able to find either software or hardware that can independently test and verify intermittent faults – at least not at a reasonable price!!

  7. From my experience as a LAN admin, from family and friends’ computer failures, and from e-mail I receive through my website, I’ve noticed that most computer problems are caused by:
    Ignorance: Many people just don’t know any better. They use their computer without knowing basic things like the importance of having up-to-date malware protection and what not to do on the Internet.
    Hardware (or lack of): Many people buy a PC based on price instead of features. They don’t understand that you get what you pay for. Once they start using their cheap PC and it slows down they want to blame Windows for being so slow.
    Malware: It blows me away every time I work on a PC that has outdated anti-malware protection that is riddled with malware and the PC user has no idea why they need anti-malware protection or how to use the protection they have.

  8. Definitely a good read, as are the comments. In my experience computer problems are usually caused by the following (in order of course):
    1. User
    2. Software
    3. Hardware

    You really have to hand it to software though. With the huge variety of hardware available it adds an extreme amount of possible issues and it’s pretty amazing that it works as well as it does.

    The random issues do usually come down to a hardware failure or hardware/software compatibility problem.

  9. I installed all the beta versions of Windows Vista and never had any blue screen issues with my P4 PC. When Vista went RTM, I installed it and experienced blue screen stop errors which were quite puzzling. After three months of troubleshooting all my hardware, which included cleaning all metal contacts to the motherboard, the very last effort was to replace the power supply. Now, no more blue screens.
    As it happened, I purchased a brand new Antec Sonata II case which included an Antec SmartPower 450 watt power supply at the time I installed Vista RTM. I recall I transferred all my hardware to this new case and this is when the Blue Screen issues became apparent. Windows Vista was definitely not the culprit…the defective power supply (new) was! I have not had a Blue Screen in the months past since changing out the defective Antec PSU!

  10. Thanks for the kind words, Ed. There are a few other glitches I’m working on elsewhere on my machine, but that was a big win. My guess is that the connectors on the SATA cable were badly crimped and affected by temperature changes; at times it acted like a thermal problem, and other times not!

    A clue that may help others that I did not mention in the original post: Many times, STOP codes (blue screen codes) are related to disk problems. As you may know, there are four subcodes to a blue screen error. One of them you may see is 0x00000185. In the Microsoft documentation, this is listed as a disk error that usually happens to a SCSI disk with defective cables or bad termination. No reason it can’t happen to SATA controllers.

    I was getting 0x00000185 errors but didn’t realize their significance. Bad on me, since I’m an experienced tech who does this by day. But there it was. Check the cables next time you see this error.

    Take care.

  11. I had a strange problem with Windows XP on my office machine (2GB of RAM): sometimes Java applications crashed with a GPF in the JVM (which is very unusual for the JVM; it’s pretty stable). Sometimes (usually at 10-day intervals) Windows simply froze. The technical department believed that the problem was Windows XP itself and that I should reinstall it. Well… A nightly memory test showed that this computer simply could not run its memory at 400MHz (as it should, given DDR400 RAM): the last few MBs threw a lot of errors in the fifth memtest86 test. Reducing the memory speed to 333MHz gave me back a stable system.

  12. “Whenever you begin experiencing instability problems with a PC, the very first thing to check is hardware”.

    I disagree. You can spend a lot more time purchasing and replacing hardware than checking out software problems. A reboot fixes most problems I run into.

    As you have stated in other posts, there are the tweaks and “optimizations” that users perform intentionally, or that various software such as ad/spyware removers “helpfully” perform for you. There are all sorts of registry keys that can cause pretty strange behavior. (Just yesterday I fixed a slowdown by opening up Regmon and noticing that the mouse driver was querying, hundreds of times a second, a particular folder that didn’t exist. I added the folder, and the slowdown disappeared.)

    You can’t go too far the other way, saying that all problems are software: “see, I reformatted and reinstalled, and now everything is fine. stupid windows, bill gates tax, etc”. I have seen people do things like that, and the machines work for a while, but then you get back to the section of the hard drive that is failing, or the drive head starts jumping around once more software is installed and running, and it draws more current, causing strange things.

    And, when living in a small town in NH, power surges always caused all sorts of strange behavior – we always saw a couple computers the day after a thunderstorm, and it took us a while to figure out that we needed to pay attention to the weather to help us diagnose problems.

    And of course, there are the computers that like going for a drive, and so work fine once they bring them to the shop, and stay fine once they go back. (or just work in the shop, since it was relatively humid there, and had very low chance of static buildup being a problem).

    But, as a random guess of things that I have fixed over the years, I’d guess 20% hardware problems, 70% software problems, and 10% unsolvable/reformat/was thinking of upgrading anyway.

  13. Jon, you have completely missed my point. So let me try again.

    By “instability,” I mean problems that don’t respond to conventional troubleshooting. An unstable system is one that crashes or hangs or slows down for no apparent reason. Most software-related problems are relatively easy to diagnose and fix if you know what to look for. Likewise, transient problems that can be cured with a reboot don’t fall into this category.

    My bullet list at the end of the post started out: Most system and application failures are easy to identify. It’s the random failures that usually indicate hardware issues, in my experience.
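
The disk-related STOP subcode David describes in comment 10 lends itself to a small lookup. What follows is a minimal Python sketch, not an official tool: `DISK_STATUS_HINTS` and `disk_hint` are hypothetical names, the descriptions paraphrase Microsoft's documentation for disk-related status codes, and treating the value quoted in the comment (0x00000185) as the low bits of the full status code 0xC0000185 is an assumption.

```python
# Hypothetical table of disk-related status codes that can appear as a
# parameter of a STOP (blue screen) error. Descriptions are paraphrased.
DISK_STATUS_HINTS = {
    0xC000009C: "STATUS_DEVICE_DATA_ERROR: bad blocks on the disk",
    0xC000009D: "STATUS_DEVICE_NOT_CONNECTED: loose or defective cabling, "
                "bad termination, or a controller that cannot see the disk",
    0xC000016A: "STATUS_DISK_OPERATION_FAILED: bad blocks on the disk",
    0xC0000185: "STATUS_IO_DEVICE_ERROR: defective cabling or bad termination",
}

# Severity bits that mark a status value as an error.
ERROR_SEVERITY = 0xC0000000


def disk_hint(code):
    """Return a hint for a status code, or None if it is not in the table.

    Assumption: a value written without the severity bits (for example
    0x00000185) refers to the same code with those bits set.
    """
    if code in DISK_STATUS_HINTS:
        return DISK_STATUS_HINTS[code]
    return DISK_STATUS_HINTS.get(code | ERROR_SEVERITY)


if __name__ == "__main__":
    for value in (0x00000185, 0xC000009D, 0x00012345):
        print(f"0x{value:08X}: {disk_hint(value) or 'no disk-related hint'}")
```

A hit in this table does not prove that a cable is at fault, but it is a cheap reason to reseat or swap the cable before reinstalling anything.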

Comments are closed.