A deluge of digital data from scientific research has spawned a controversy over who should have access to it, how it can be stored and who will pay to do so.

The matter was the subject of discussion after the journal Science published a paper on Thursday by Francine Berman, a computer scientist at Rensselaer Polytechnic Institute who leads a group focused on research data, and Vinton Cerf, a vice president of Google. The paper calls for a different culture around scientific data, based on acknowledging that costs must be shared. It also explores economic models that would involve the support of various scientific communities: public, private and academic.

In an interview, Dr. Cerf said storing and sharing digital information was becoming a “crucial issue” for both public and private institutions.

The debate is likely to accelerate next week, when federal agencies are expected to file proposals for how they would “support increased public access to the results of research funded by the federal government.” The plans were requested by John P. Holdren, director of the federal Office of Science and Technology Policy, in a memorandum in February.

But Dr. Holdren also directed that the plans be carried out using “resources within the existing agency budget.” That is likely to be a formidable challenge. While the cost of data storage is falling rapidly, the amount of information created by data-intensive science is immense. In addition, many agencies have complicated arrangements that give corporations favorable access to federal data, which those corporations then resell. The agencies must also overcome the hurdle of developing systems that make the data accessible.

Still, the federal guidelines underscore the importance of digital information in scientific research and the growing urgency of resolving these problems.

“Data is the new currency for research,” said Alan Blatecky, the director of advanced cyberinfrastructure at the National Science Foundation.
“The question is how do you address the cost issues, because there is no new money.”

Dr. Berman and Dr. Cerf argued in their paper that private companies, as well as academic and corporate laboratories, must be willing to invest in new data centers and storage systems so that crucial research data is not irretrievably lost. “There is no economic ‘magic bullet’ that does not require someone, somewhere, to pay,” they wrote.

Dr. Berman is the chairwoman of the United States branch of the Research Data Alliance, an organization of academic, government and corporate researchers attempting to build new systems to store the digital data sets being created, and to develop software techniques for integrating different kinds of data and making them accessible. “Publicly accessible data requires a stable home and someone to pay the mortgage,” she said in an interview.

Google initially offered to host large data sets for scientists free, then killed the program in 2008, after just a year, for unspecified business reasons. The company may have been taken aback by the size of scientific research data sets. For example, the Obama administration’s proposal to eventually capture a year of activity from just one million neurons in the human brain (the brain has roughly 85 billion to 100 billion neurons) would require about 3 petabytes of data storage, or almost one-third the amount generated by the Large Hadron Collider over the same period.

Dr. Berman said she was heartened by a growing international recognition of the scope of the problem. The Research Data Alliance, begun last August with an international telephone conference of just eight researchers, now has more than 750 academic, corporate and government scientists and information technology specialists in 50 countries.

In their paper, she and Dr. Cerf argue that coping with the explosion of data will require a cultural shift not just at government and corporate institutions, but also among individual scientists. “The casual approach for many scientists has been to ‘stick it on my disk drive and make it available to anyone who wants to use it,’ ” Dr. Cerf said.

They argued that the costs need not be prohibitive. “If you want to download a song from iTunes, it’s not free, but it doesn’t break the bank,” Dr. Berman said.

Even those who believe that information should be free and open acknowledge that easy availability of data from government-subsidized projects can give private firms an unfair and unnecessary advantage. And some scientists argue that charging for data would have advantages. “Paying a small fee for downloads in the aggregate would also act as an incentive for providing the needed infrastructure,” said Bernardo A. Huberman, a physicist at Hewlett-Packard Laboratories.

In his memorandum, Dr. Holdren told the federal agencies that the release of research papers could be delayed for up to a year; the reasons were not explained. That has angered activists who favor immediate and broad availability of publicly financed research.

“In scientific fields, a year is a very long time,” said Carl Malamud, the founder of Public.Resource.Org, a nonprofit group that works to make government information freely available online. Meanwhile, he said, corporations could sell the information. “It’s a sop to the special interests that publish this stuff.”

Dr. Berman said there were models that could provide ideas for the new infrastructure needed to store the data and make it accessible. One is the Protein Data Bank, a database of biological molecules that is heavily used by the life sciences community and is publicly supported. That data is freely available.
However, she also pointed to the social science database of the Longitudinal Study of American Youth, which is maintained by the Inter-university Consortium for Political and Social Research at the University of Michigan. Its users are charged a subscription fee.
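The storage scale cited earlier, about 3 petabytes a year for a million neurons, can be sanity-checked with a rough back-of-envelope calculation. The sampling rate and bytes-per-sample figures below are illustrative assumptions, not details from the article or the administration’s proposal; they simply show one set of parameters under which the arithmetic lands near the quoted number.

```python
# Rough estimate of yearly storage for recording neural activity.
# NEURONS comes from the article; the sampling rate and bytes per
# sample are assumed values chosen for illustration only.

NEURONS = 1_000_000          # neurons recorded (from the article)
SAMPLES_PER_SECOND = 100     # assumed sampling rate per neuron
BYTES_PER_SAMPLE = 1         # assumed storage per sample
SECONDS_PER_YEAR = 365 * 24 * 3600

total_bytes = NEURONS * SAMPLES_PER_SECOND * BYTES_PER_SAMPLE * SECONDS_PER_YEAR
petabytes = total_bytes / 1e15

print(f"{petabytes:.2f} PB per year")  # about 3.15 PB, close to the article's 3 PB
```

Under these assumptions the total comes out to roughly 3 petabytes a year, consistent with the figure in the article; higher sampling rates or larger samples would scale the estimate proportionally.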
This article has been revised to reflect the following correction:
Correction: August 15, 2013
An article on Tuesday about the challenges of storing and making vast amounts of scientific data, much of it publicly financed, readily available referred incompletely to instructions from John P. Holdren, director of the federal Office of Science and Technology Policy, in a memorandum sent to federal agencies in February. While the memo said a guideline for making research papers publicly available would be an embargo period of a year after publication, it also stipulated that individual agencies could tailor their plans to release papers on a different time frame.