A number nerd's wet storage dream

If you're a regular reader of Maximum PC, then a name you're likely to remember is Backblaze, a cloud-based backup firm that routinely shares its data about hard drive failures and other aspects of its operations. That level of openness is pretty rare. Not many companies offer the same transparency (Puget Systems comes to mind), and even fewer would splash the Internet with raw data. Well, that's just what Backblaze did, offering up raw data collected from more than 41,000 disk drives in its data center.
Backblaze reckons this is the largest data set on disk drive performance ever to be made public, and if there is a larger collection, it's news to us as well. Inside the two files (one containing 2013 data and one containing 2014 data) you'll find daily snapshots of the state of every HDD in Backblaze's data center, including each drive's serial number, model number, and all of its S.M.A.R.T. data, which tells you how many hours the drive has been running, its temperature, whether sectors have gone bad, and more.
Here's what you'll find in the snapshots:
- Date: The date of the file in yyyy-mm-dd format.
- Serial Number: The manufacturer-assigned serial number of the drive.
- Model: The manufacturer-assigned model number of the drive.
- Capacity: The drive capacity in bytes.
- Failure: Contains a "0" if the drive is OK. Contains a "1" if this is the last day the drive was operational before failing.
- SMART Stats: 80 columns of data holding the raw and normalized values for 40 different SMART stats, as reported by the given drive. Each value is the number reported by the drive.
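For anyone who wants to take Backblaze up on its offer, the snapshot layout above maps naturally onto a table with one row per drive per day. The Python sketch below is illustrative only and rests on assumptions: it assumes the files are comma-separated with a header row, and that the columns are named `date`, `serial_number`, `model`, `capacity_bytes`, `failure`, and `smart_<id>_normalized` / `smart_<id>_raw` (check these against the actual header before relying on it). The helper names and the sample rows, serial numbers, and model names are made up for illustration. The SMART attribute IDs in the comments (5, 9, 194) are the standard ones for reallocated sectors, power-on hours, and temperature.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the column names are an assumption about the real files.
# SMART 9 = power-on hours, SMART 194 = temperature (Celsius), SMART 5 = reallocated sectors.
SAMPLE = """date,serial_number,model,capacity_bytes,failure,smart_9_normalized,smart_9_raw,smart_194_normalized,smart_194_raw
2014-01-01,SN001,MODEL-A,3000592982016,0,98,1234,100,30
2014-01-01,SN002,MODEL-B,4000787030016,0,99,567,100,
2014-01-02,SN001,MODEL-A,3000592982016,1,98,1258,100,31
"""


def summarize(csv_text):
    """Return {model: (drive_days, failures)} across the daily snapshots in csv_text."""
    stats = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = stats[row["model"]]
        entry[0] += 1                    # each row is one drive observed on one day
        entry[1] += int(row["failure"])  # "1" only on the drive's last operational day
    return {model: tuple(counts) for model, counts in stats.items()}


def smart_raw(row, attr_id):
    """Return the raw value of a SMART attribute, or None if the drive didn't report it."""
    value = row.get(f"smart_{attr_id}_raw", "")
    return int(value) if value else None


if __name__ == "__main__":
    print(summarize(SAMPLE))
    for row in csv.DictReader(io.StringIO(SAMPLE)):
        print(row["serial_number"], "hours:", smart_raw(row, 9), "temp:", smart_raw(row, 194))
```

Because the failure flag is set only on a drive's last operational day, summing it per model counts each failed drive once, while counting rows gives the number of drive-days observed. Blank cells are treated as "not reported," since not every drive exposes every SMART attribute.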
"There are lots of smart people out there who like working with data, and you may be one of them. Now it's your turn to pore over the data and find hidden treasures of insight. All we ask is that if you find something interesting, that you post it publicly for the benefit of the computing community as a whole," Backblaze stated in a blog post.
Backblaze isn't being lazy by turning this data over to the public. After all, this is a company that sells a backup service, and while it routinely performs analysis (which it often shares), digging even deeper into the data is a time-consuming task, and one with diminishing returns as far as its primary business is concerned.
That said, if anyone on the Internet wants to comb through the data and post any conclusions, they're certainly welcome to do that.
The files are available from Backblaze, along with information on how to decipher the data.
Follow Paul on Google+, Twitter, and Facebook
