That confirms my suspicions ...
Your $50 consumer-class disk drives will simply not do error recovery correctly on this controller. It is a TLER (time-limited error recovery) issue.
When the disk hits a bad block, meaning one that gives an ECC error or a read error, or one that just does not read immediately, it goes into a deep recovery phase to try to get the data. If it recovers, it moves on; otherwise it locks up all I/O until the firmware-specified timeout, which is 10+ seconds depending on the firmware/model. The problem is that most controllers only allocate 7-8 seconds for recovery. If a drive takes longer than that, bad things happen, like drives going offline and data getting lost.
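To make the timing mismatch concrete, here is a rough sketch in Python. It is only an illustration of the comparison described above, not how any controller firmware actually works. The function name and the 3-second figure are my own assumptions; the 7-second and 10-second figures are the ones quoted above.

```python
def drive_dropped(recovery_seconds: float, controller_window_seconds: float) -> bool:
    """Return True if the drive is still busy recovering when the controller gives up."""
    return recovery_seconds > controller_window_seconds


CONTROLLER_WINDOW = 7.0  # low end of the 7-8 second range quoted above

cases = {
    "consumer drive in deep recovery": 10.0,  # the "10+ seconds" case
    "drive that gives up early": 3.0,         # illustrative value, an assumption
}

for label, recovery in cases.items():
    dropped = drive_dropped(recovery, CONTROLLER_WINDOW)
    status = "dropped offline" if dropped else "stays online"
    print(f"{label}: {recovery:.0f}s recovery vs {CONTROLLER_WINDOW:.0f}s window -> {status}")
```

The point is simply that the drive's recovery time has to come in under the controller's window, or the controller treats the drive as failed.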
You need to run enterprise-class disks, which are programmed to give up after just a few seconds. Not only will this minimize the timeouts, but you may never even see timeouts, as these drives also typically have 2 more ECC bits. Heck, with your current drives you pretty much have as many data bits on the RAID as the ECC recovery capability covers, so if you read every bit on the entire RAID twice, you should statistically expect to lose somewhere between 512 bytes and 64KB.
Also, Intel does NOT certify, qualify, or recommend these drives for use with this RAID (or with any of their Matrix controllers for server use).
Seagate does not design those drives for 24x7 operation. Those disks are designed for 2400 hours of light-duty use per year. Do the math on how many days that is.
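For anyone who wants the math spelled out, here is a quick back-of-the-envelope calculation in Python. The variable names are mine; the only input is the 2400 hours/year figure above.

```python
RATED_HOURS_PER_YEAR = 2400           # light-duty rating quoted above
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 365 * HOURS_PER_DAY  # 8760 hours in a 24x7 year

# How many days of round-the-clock running the rating covers
rated_days = RATED_HOURS_PER_YEAR / HOURS_PER_DAY

# What fraction of a 24x7 year the rating covers
duty_fraction = RATED_HOURS_PER_YEAR / HOURS_PER_YEAR

print(f"{rated_days:.0f} days of continuous use per year")
print(f"{duty_fraction:.1%} of a 24x7 year")
```

That works out to 100 days of continuous running, or a bit over a quarter of what a server in 24x7 use puts on the drive each year.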
Your solution is to get enterprise class drives.
You might want to read this ..
http://www.experts-exchange.com/articles/Storage/Misc/Disk-drive-reliability-overview.html
This goes into it in more detail (a WD paper, but TLER is the same issue with Seagate disks):
http://www.wdc.com/en/library/sata/2579-001098.pdf
Now, Intel does have a firmware/driver update that may help, but it is not a cure; it will only prevent other types of issues. See the link below.
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=1657&DwnldID=8849&lang=eng
My formal recommendation is for you to get enterprise-class drives that will not have this issue.