Monday, January 4, 2016

Why academia has a data sharing problem

Martin Bobrow, Chair of the Wellcome Trust's advisory group on data access, published an enlightening summary of data sharing problems in Nature, in which he asked:
Most research-funding agencies, and most scientists, now agree that research data should be shared — provided that those who donate their data and samples are protected. This approach is strongly advocated by organizations such as the Global Alliance for Genomics and Health. But data sharing will work well only when it is streamlined, efficient and fair. How can more scientists be encouraged and helped to make their data available, without adding an undue administrative burden?
I think the burden he's addressing is actually split into at least two parts:

1. The burden of actually sharing data. This is what usually comes to mind when people think of data sharing being difficult, and it involves hammering out infrastructure and data formats to enable sharing.

2. The burden created by actually making data available. Being the 'owner' of data brings both the opportunity to take first crack at investigating that data and the responsibility to share it. Sharing imposes a real cost, both in serving people who want access to the data and in storing it (though both costs are continually falling).

Realistically, there's an actual disincentive to share academically generated data. Sharing data essentially gives potential competitors 'your data' at no cost, which may vaporize whatever competitive scientific advantage you had gained.

Further on in the article, Bobrow offers this explanation:
It is reasonable for scientists to impose certain conditions or restrictions on the use of their hard-earned data sets, but these should be proportionate and kept to a minimum. Justifiable conditions can range from requiring secondary users to acknowledge the source of the data in publications, to stipulating a fair embargo time on the use of new data releases. Whatever the conditions imposed, they need to be presented clearly to data users.

Criteria used to judge academic careers still focus heavily on individual publication records and provide little incentive for wider data sharing. Scientists who let others use their data deserve reward too.
So yes, the issue with academic data sharing is incentive.

People who put together well-designed data sets should be rewarded for their expertise and talent in doing so. Good data isn't as simple as sending a box of samples to a [insert your favourite high-throughput technology] production center; it requires knowledge of what constitutes 'normal' samples and of experimental design, not to mention handling the logistics of obtaining the right samples in the first place.

Why wouldn't someone deserve credit for that?