ACS CAP – Data Curation Subcommittee meeting – 2/7/2011
Attendees: Carol Hoover, Bob Schwarzwalder, Dave Martinsen
The subcommittee held a general discussion of specific uses of data by ACS:
ACS sometimes requests that data be submitted with a journal article for use in verification or fraud detection, e.g., to verify the purity of a molecule or to confirm the expected variability of the data.
Cambridge University Press is using OSCAR to parse the experimental sections of journal papers, extracting chemical formulas/names, IR spectra, etc. for possible re-use: to archive data, ensure validity, detect errors, and create new sources of data. The Journal of Organic Chemistry is currently using OSCAR.
IUCr has a program that checks crystallographic data for anomalies in bond angles, etc.
It is a challenge to integrate fraud detection into the peer review system.
The January/February 2011 issue of D-Lib Magazine is dedicated to issues surrounding data: http://www.dlib.org/
NISO – a working group is developing best practices for supplemental materials.
Purdue – has produced useful data curation tools.
MestreLab – a manufacturer of NMR software, has produced a relational, cloud-based NMR tool for laboratories that backs up data to the cloud.
Mackenzie Smith (Assoc. Director for Technology at MIT Libraries) does not recommend that publishers become data curators. ACS and this subcommittee do not recommend it either.
Question to consider – If ACS were to participate in data curation, what would it do, and how, as a publisher? As a professional society?
It was suggested that this subcommittee describe what ACS needs, identify any specific requirements, identify options for participating in data issues, and make recommendations. The subcommittee needs more feedback/guidance from ACS on this issue.
Discussed the idea of a user survey at LANL and Stanford asking researchers about their data curation needs and how they believe publishers/professional societies can or should participate. Stanford has surveyed on these questions in the past, but not since the NSF Data Management Plan requirements were issued in January 2011. LANL has not surveyed on this issue but would consider doing so.
Issues involved in data curation:
Value of data prior to publication (e.g., patents, prior claims)
Data request fulfillment
Public uses – crowdsourcing?
Open Data movement
There are many more questions than answers in the data curation arena at the moment.
ACS on Campus – librarians have asked for sessions on data management, so there is interest.
ACS has formed a Data Integrity group (Darla Henderson).