AMD Opteron PROBLEM
AMD has a problem with Opteron x52s and x54s, which is basically a temperature-induced FP bug.
When the chips get hot and you are running FP-heavy code, you get 'inconsistent' results.
According to AMD, only a small number of chips out there, about 3,000, have this problem, and it has a test that can identify them. If you are affected, it will replace the chips for free.
You can read the full release here, and there are numbers to call for most of the world if you are affected. I called the US number and got an earful of screechy static, so I am not sure what they will say. Hopefully this will be rectified soon, because it is a slight impediment to getting a fix.
Me: (dialing) beep – beep – beep
AMD: Screeech – squeal static - squeal!
Me: Ow. (hangs up)
(repeat with the same results, three times)
In my opinion this is the worst kind of problem you can get, the key word being 'inconsistent'. If your chip craps out on you, it is fairly obvious that it has: the box doesn't power on, even if you can see it through the smoke. When you get calculations that silently come up wrong, well, that is very, very bad.
If you have a large numerical simulation that you ran for a month, and you later discover that you had a bad chip, you have to redo the entire thing. The 3,000 or so affected people may very well end up more unhappy than a single bad chip or two might suggest.
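To make 'inconsistent' a bit more concrete, here is a minimal sketch of the kind of sanity check you could run yourself: do the same FP-heavy calculation several times and compare the results bit for bit. This is not AMD's test, and it does not reproduce the heat conditions that trigger the bug. The function names, the workload, and the run counts are all made up for illustration.

```python
import math
import struct


def fp_workload(n=200_000):
    """Hypothetical FP-heavy kernel; on healthy hardware it is fully deterministic."""
    acc = 0.0
    for i in range(1, n + 1):
        acc += math.sin(i) * math.sqrt(i) / (i + 0.5)
    return acc


def bits(x):
    """Return the raw 64-bit pattern of a float so the comparison is exact."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def consistent(runs=5, n=200_000):
    """Run the same workload several times; True if every result is bit-identical."""
    patterns = {bits(fp_workload(n)) for _ in range(runs)}
    return len(patterns) == 1


if __name__ == "__main__":
    print("consistent" if consistent() else "INCONSISTENT results, be suspicious")
```

A check like this can only tell you that something went wrong while it was running. A clean pass on a cool, idle box says little about a month-long simulation on a hot one, which is exactly why a silent FP bug is so nasty.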
On the upside, it looks like AMD is getting out in front of this and replacing the chips before it becomes headline news. I guess AMD learned as much from FDIV as Intel did.