Yet another issue

The server I spoke of in a recent blog post, the one kernel panicking every 2 days under kernels 2.6.9-8 and 2.6.9-10 as compiled by CentOS, while the same setup runs fine on 3 other VPS’s and 1 other dedicated server, seems to be flying through its tests.

The testing done thus far has shown that the memory passes the latest Memtest86 (v3.3).

The system can sustain an install of Windows XP.

The system successfully passes PassMark’s tests without problem (except 3D graphics, since there’s no DirectX, but that’s irrelevant in a Linux server environment).

The CPU passes the CPU test, the memory passes fine, the disk drives are fine.

The system crashes around every two days in the data centre.

So, how on earth do you figure that out? I guess the next test would be to prove it can stay up for 2 days or more outside the data centre, running the software we run, but that defies the simple theory, since the same setup on other machines doesn’t panic. So far, the only machine to have problems is THIS machine.

So, how do you eliminate that?

It’s got a 400W power supply, a Pentium D 820 at 2.8 GHz, a standard Gigabyte motherboard, 2 sticks of 1 GB RAM, and a plain IDE drive (because SATA wouldn’t work).

I wonder if the issue could be related to a driver, or perhaps the NIC, or whether something on the network is conflicting.
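One place to start on the driver/NIC theory is the kernel’s own logs and interrupt table. A minimal sketch, assuming a Linux box with /proc mounted (the `eth0` name is a placeholder; substitute the real interface):

```shell
#!/bin/sh
# Sketch: look for driver/NIC trouble in the kernel ring buffer and IRQ table.

# Any panic, oops, or NIC-related noise in the kernel log?
dmesg 2>/dev/null | grep -iE 'panic|oops|eth|irq' | tail -n 20

# Devices sharing an interrupt line can point at driver conflicts.
grep -i eth /proc/interrupts || echo "no eth interrupts listed"

# CPU model sanity check while we're here.
grep -m1 'model name' /proc/cpuinfo
```

On a box that panics intermittently, anything suspicious here would at least narrow the search before swapping hardware.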

I like to stick with the laws of science, because, as proven by my machine’s issues, there’s always an answer. Something is to blame for it, and the only way to get to the bottom of it is to push it. Push it to its limits. I got them to do that, and it survived the 15 minute full load PassMark Burn In Test.

That’s generally pretty conclusive that things are running fine, at full load.
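For what it’s worth, a similar full-load test can be run from the Linux side without PassMark. A rough sketch, not any particular tool’s method: peg both cores of the Pentium D with hashing work for a fixed window (`NPROCS` and `DURATION` are assumptions to tweak; a real burn-in would run for hours, not seconds):

```shell
#!/bin/sh
# Rough burn-in sketch: NPROCS workers hash streams of zeroes until
# DURATION seconds elapse, keeping the CPU, caches and memory bus busy.
NPROCS=2      # Pentium D 820 has two cores
DURATION=5    # seconds; extend dramatically for a real burn-in
end=$(( $(date +%s) + DURATION ))
i=1
while [ "$i" -le "$NPROCS" ]; do
  ( while [ "$(date +%s)" -lt "$end" ]; do
      dd if=/dev/zero bs=1M count=32 2>/dev/null | md5sum >/dev/null
    done ) &
  i=$(( i + 1 ))
done
wait
echo "burn-in window complete"
```

Running this on the actual kernel and drivers that crash is arguably a better match for the failure conditions than booting into Windows to test.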

So, we are stuck in the middle.

It’s not a hardware issue, because the hardware runs XP, and the tests all passed successfully.

It’s not a software issue, because the exact same software running on this machine is running on the others (no kidding, I literally installed the same software: copied all the source out of the source folders, compiled it again, installed everything, and copied the configuration and data over).

So, we’ve got a bit of an issue. It’s crashing every 2 days or so, and it’s not a hardware or software issue (as far as we can tell).
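One thing that would help turn “it crashes every 2 days” into evidence is capturing the actual panic trace when it happens. Since the box is remote, a network console is one option on 2.6 kernels. A sketch only, with placeholder addresses (the IPs, ports, MAC and interface below are all assumptions to substitute):

```
# Load netconsole so panic output is sent over UDP to a log host.
# Format: netconsole=src-port@src-ip/dev,dst-port@dst-ip/dst-mac
modprobe netconsole netconsole=6665@10.0.0.5/eth0,6666@10.0.0.2/00:11:22:33:44:55

# And have the kernel reboot itself 30s after a panic instead of hanging:
echo 'kernel.panic = 30' >> /etc/sysctl.conf
sysctl -p
```

With that in place, the next panic would leave a trace on the log host instead of vanishing when the data centre staff hit reset.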

I’m still thinking of other ways we can get this demonstrated and proven beyond doubt, and beyond doubt means you can perform an action and get it to do the same thing again. That’s what I deem conclusive. It’s like turning on a light when the switch is on, assuming the light bulb is not faulty: “Let there be light”. Flick the switch (or don’t pay the bill), and “let there be no light”.

Think Boolean, it’s either on or off.

So, there’s an explanation for the server reboots somewhere, and we’ll get to the bottom of it one way or another.

On another note, I’m getting myself a newer graphics card, to match my fiancé’s FX6200. I know, probably not the best move, considering an upgrade to Dual Core can’t be too much further around the corner, but I figure my machine’s still young, running fine, and has plenty of life in it, so a replacement AGP card will bring it fully up to speed for now. That might be as far as I go with this machine until we decide on the move to Dual Core in the middle of the year or so. Then again, I might decide against dual core and go for the power saving benefits AMD chips have. But I just can’t get my head around 2.0 GHz being better than 3.0 GHz+. It’s a strange way of thinking.

Computers, fascinating machines. They can be both simple, and very complex, even to the most advanced of users.

This entry was posted in Linux, Random.
