electricshampo1 16 hours ago

Nice to see SDC concerns being taken more seriously by hardware folks. Once software gets to sufficient quality (which we have achieved in many cases), these kinds of rando hw issues are the only remaining causes of "impossible" bugs that waste endless engineering time to debug.

I wonder how much of this relies on or is made easier by the clustered core architecture of E-Core Xeons. In comparison each physical core of P-Core Xeons is its own island basically.

trebligdivad 17 hours ago

Is this limited to lockstep between softcores on a die - so good for low-level failures like soft errors, but no good if the package dies? (Still very neatly done)

bombela 15 hours ago

I wonder what the ratio of software bugs to these types of hardware bugs is in the wild. Maybe the product of this paper will help produce this metric.