Read data damage and wear leveling

A newly formatted drive usually holds all zeros. An erased block of a solid-state device is all 1s, making a raw read of an erased block all 0xFF characters. However, it’s unusual for a user to read an erased block during normal I/O operations.
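The difference between a zero-filled block and an erased block can be sketched in a few lines of Python. This is only an illustration on in-memory buffers: the 4096-byte block size and the helper name `looks_erased` are assumptions made for the example, not values taken from any particular device.

```python
ERASED_BYTE = 0xFF  # eight bits, all set to 1


def looks_erased(block: bytes) -> bool:
    """Return True if the block is non-empty and every byte is 0xFF."""
    return len(block) > 0 and all(b == ERASED_BYTE for b in block)


# Assumed block size for illustration only.
BLOCK_SIZE = 4096

formatted = bytes(BLOCK_SIZE)                 # zero-filled, like a newly formatted drive
erased = bytes([ERASED_BYTE]) * BLOCK_SIZE    # what a raw read of an erased block would show

print(looks_erased(formatted))  # False
print(looks_erased(erased))     # True
print(bin(ERASED_BYTE))         # 0b11111111
```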

A technique used in the past is to write a known pattern to the entire drive. Then as database activity executes against that same drive, incorrect behavior (stale read, lost write, or read of incorrect offset) can be detected when the pattern unexpectedly appears.
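The pattern technique can be sketched with an in-memory stand-in for a drive. Everything named here is hypothetical and chosen for the example: the `FakeDrive` class, the 512-byte sector size, the stamp value, and the `drop_writes` parameter that simulates a lost write. A real test would run against an actual device.

```python
SECTOR = 512
# Hypothetical stamp value repeated to fill one sector.
PATTERN = b"\xDE\xAD\xBE\xEF" * (SECTOR // 4)


class FakeDrive:
    """In-memory stand-in for a drive; drop_writes simulates lost writes."""

    def __init__(self, sectors: int, drop_writes=()):
        self.data = bytearray(PATTERN * sectors)  # stamp the whole drive
        self.drop_writes = set(drop_writes)

    def write(self, sector: int, payload: bytes) -> None:
        if sector in self.drop_writes:
            return  # lost write: acknowledged, but nothing is stored
        start = sector * SECTOR
        self.data[start:start + SECTOR] = payload

    def read(self, sector: int) -> bytes:
        start = sector * SECTOR
        return bytes(self.data[start:start + SECTOR])


def find_suspect_sectors(drive, written_sectors):
    """Return sectors that were written but still read back the stamp."""
    return [s for s in written_sectors if drive.read(s) == PATTERN]


drive = FakeDrive(sectors=8, drop_writes={5})
written = [1, 3, 5]
for s in written:
    drive.write(s, bytes([s]) * SECTOR)

print(find_suspect_sectors(drive, written))  # [5]
```

A sector that the workload wrote but that still returns the stamp points to a lost write or stale read. The check relies on overwritten sectors keeping their physical location, which is the assumption that solid-state storage does not honor.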

This technique doesn’t work well on solid-state storage. The erasure and read-modify-write (RMW) activities for writes destroy the pattern. The solid-state storage garbage collection (GC) activity, wear leveling, proportional/set-aside list blocks, and other optimizations tend to direct writes to different physical locations, unlike spinning media’s sector reuse.

The firmware used in solid-state storage tends to be more complex than that of its spinning media counterparts. Many drives use multiple processing cores to handle incoming requests and garbage collection activities. Keep your solid-state device firmware up to date to avoid known problems.

A common garbage collection (GC) approach for solid-state storage helps prevent repeated-read data damage. When the same cell is read repeatedly, electron activity can leak and damage neighboring cells. Solid-state storage protects the data with various levels of error correction code (ECC) and other mechanisms.

One such mechanism relates to wear leveling. Solid-state storage keeps track of the read and write activity on the storage device. The garbage collection can determine hot spots or locations wearing faster than other locations. For example, the GC determines that a block is in a read-only state and needs to move. This movement is generally to a block with more wear, so the original block can be used for writes. This process helps balance the wear on the drive, but it moves read-only data to a location that has more wear and mathematically increases the failure chances, even if slightly.
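The relocation described above can be sketched as a toy model. This is not how any specific firmware works: the `Block` type, the erase counts, and the selection rule (least-worn read-only block moves to the most-worn free block) are assumptions chosen to mirror the paragraph.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    erase_count: int
    data: Optional[str] = None  # None means the block is free
    read_only: bool = False     # data that is read but not rewritten


def relocate_cold_data(blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Move read-only data from the least-worn block to the most-worn free block.

    Returns (source_index, target_index), or None if no move is worthwhile.
    """
    cold = [i for i, b in enumerate(blocks) if b.data is not None and b.read_only]
    free = [i for i, b in enumerate(blocks) if b.data is None]
    if not cold or not free:
        return None
    src = min(cold, key=lambda i: blocks[i].erase_count)
    dst = max(free, key=lambda i: blocks[i].erase_count)
    if blocks[dst].erase_count <= blocks[src].erase_count:
        return None  # the move would not free a less-worn block
    blocks[dst].data, blocks[dst].read_only = blocks[src].data, True
    blocks[src].data, blocks[src].read_only = None, False
    blocks[src].erase_count += 1  # the source is erased before it is reused
    return src, dst


blocks = [
    Block(erase_count=3, data="archive page", read_only=True),
    Block(erase_count=900),
    Block(erase_count=450, data="hot page"),
]
print(relocate_cold_data(blocks))      # (0, 1)
print(blocks[1].data, blocks[0].data)  # archive page None
```

After the move, the lightly worn block 0 is available for new writes, while the read-only data sits on the block with 900 erases, which is the trade-off the paragraph describes.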

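Another side effect of wear leveling can occur with SQL Server. Suppose you execute DBCC CHECKDB, and it reports an error. If you run it a second time, there’s a small chance that it reports additional errors or a different pattern of errors, because the solid-state storage GC activity might move data between the two executions.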

Recommendations:
