Publishing old research data is like going through your attic for a garage sale. It provokes ambivalent feelings: nostalgia and frustration from spending hours digging through old, sometimes poorly documented stuff, as well as satisfaction from finally presenting a well-sorted selection of used, yet still valuable, goods to anyone who is interested in them. Data that underpinned long-published papers may no longer be current, but can still offer value to the research community. The reasons for making old data available only now may vary widely: perhaps Open Science was not yet a hot topic at the time the data were produced, or research funders did not yet require their grant recipients to make the data openly available. Regardless, someone somewhere may very well have a use for the data of previous research projects.
Based on my personal experience, this series of blog posts investigates what happens if one decides to publish old research data and what such data publication “post-research”, i.e. well after the publication of a paper or the termination of a research project, involves and requires. This first part will offer some background and discuss my first, intuitive publication attempt.
Growing desire for post-research data publication
Interest in and effort toward data sharing are continuously growing in the scientific community. Science funders (e.g., SNSF, 2018) as well as publishers (e.g., ESA, 2019a) increasingly require researchers to make their research data publicly available during the publication process. Published research data should be FAIR, i.e., findable, accessible, interoperable, and reusable (FORCE11, 2016). Still, many (former) researchers have unpublished data stored on internal institutional servers or personal devices, where they are hardly discoverable by or accessible to the scientific community. Especially small datasets in the “long tail” of research data are often not well curated (Ferguson et al., 2014; Palmer et al., 2007). Having become aware of the importance of publishing data, (former) researchers now increasingly intend to publish their data “post-research”, i.e., some time (or even years) after the respective research article(s) have been published. This is what I am experiencing myself, as a former researcher in the field of ecology.
For my PhD thesis, I collected data on tree seedling growth and phenology, as well as associated data describing maternal trees, seed source environments, and test site conditions (Frank, 2016; Figure 1). I used these data to compare three major tree species in Switzerland with respect to their risk of being poorly adapted to climate change. Together with my research colleagues, I published three papers based on these data (Frank et al., 2017a, 2017b, 2017c). In addition, two follow-up projects use(d) the data for their analyses (e.g., Frank et al., 2019). Obviously, the data were and still are valuable, not only because they cost a lot of time and money to collect, but also because such datasets are rare worldwide. Nevertheless, I did not openly publish the dataset, mainly because during my research project I was never asked (and certainly not forced) to do so. In 2016/2017, when I finished my PhD thesis and published the three papers, neither my funders nor my publishers required data publication. Furthermore, like every PhD candidate, I was extremely pressed for time during the final stage of my PhD thesis, which did not allow for additional data publication efforts (or so I thought). Now, almost three years after the completion of my PhD thesis, I want to find out if and how data publication is still possible.
Why and where to publish data post-research?
I began this process by thinking about my motivation to publish data from my PhD thesis. First, I felt that the data should become available to other researchers for re-use. Second, I hoped to be rewarded for doing so by getting another citable academic paper on my publication list. I had heard of data papers and their benefits for data owners (scholarly recognition) and other researchers (easy data discovery and reuse; Chavan and Penev, 2011). Therefore, it seemed logical to me to attempt a data publication in a well-known, peer-reviewed ecological journal. I chose Ecology, the journal in which I had published the first article of my PhD project (Frank et al., 2017c). Data papers in Ecology undergo full peer review, during which the ecological significance and quality of the data as well as the technical requirements for high-standard metadata are evaluated (ESA, 2019b).
What data should be published?
For my first data paper draft, I decided to focus on a subset of the project’s data, i.e., the seedling growth and phenology data (Figure 1, A). Those data had been assessed for the three tree species Norway spruce (Picea abies), silver fir (Abies alba), and European beech (Fagus sylvatica) during two consecutive years, and at two test sites. To reduce the complexity of my first post-research data publication attempt, I further concentrated on a subset of these data, i.e., on the data collected for Norway spruce and silver fir during 2013 and 2014. This dataset had been the basis for the first publication within my PhD project (Frank et al., 2017c).
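A selection like the one described above (two species, two years) can be sketched in a few lines of Python. This is only an illustration: the column names (`species`, `year`, `site`, `height_mm`) and all values in the sample table are made up for the example and do not reflect the layout of the actual dataset.

```python
import csv
import io

# Hypothetical sample table; column names and values are invented for illustration.
SAMPLE = """species,year,site,height_mm
Picea abies,2013,site_1,152
Abies alba,2014,site_2,98
Fagus sylvatica,2013,site_1,210
Picea abies,2012,site_1,75
"""

# Selection criteria taken from the text: Norway spruce and silver fir, 2013 and 2014.
KEEP_SPECIES = {"Picea abies", "Abies alba"}
KEEP_YEARS = {2013, 2014}


def select_subset(rows):
    """Keep only rows for the chosen species and measurement years."""
    return [
        row for row in rows
        if row["species"] in KEEP_SPECIES and int(row["year"]) in KEEP_YEARS
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
subset = select_subset(rows)
for row in subset:
    print(row["species"], row["year"], row["height_mm"])
```

Writing the selection down as a small script, rather than filtering by hand in a spreadsheet, has the side benefit of documenting exactly which records ended up in the published subset.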
Preparing the data paper draft
To prepare my data paper draft, I followed the instructions of Ecology for data papers (ESA, 2019b). These include formal requirements concerning the data (e.g., logical formatting, conversion to plain text, compression of multiple files) and metadata (e.g., structure and format). The latter should adhere to the standards of Michener et al. (1997). A data paper for Ecology has to be submitted according to the following structure: Title, Authors, Abstract, Key Words, Metadata, Acknowledgements, and Literature Cited. The final data paper, published in print and online, contains only the parts “Title” to “Key Words”. Actual metadata and data files are attached as supplementary material online. The ESA data paper instructions recommend consulting recent data papers published in Ecology for further orientation. Recent data papers in Ecology also contain a section “Introduction” in addition to the sections mentioned above (e.g., Bello et al., 2017). However, before I finished my data paper draft, I started to doubt my approach to this data publication project. As a result, at the time of writing this blog post, i.e., half a year after the start of this data publication self-experiment, the data are still not published. What happened?
Challenges
First, I could not definitively decide on the extent of data, i.e., the number of data subsets of the former research project that should be included in my data publication. I originally thought I should focus on a manageable subset of data, but now that I was making the effort of data publication anyway, perhaps I should consider including more data (Figure 1). In addition, I had not determined whether there were any outstanding legal restrictions or technical aspects related to data publication that had to be considered. Until these questions were answered, I could not really proceed.
Second, I was no longer sure what the best way to publish my data was. Was the data paper really the ideal vehicle, or would it make more sense to simply place the data in a designated data repository? I realized that I should have started the data publication experiment by preparing and uploading my data and metadata to a data repository before starting to write a data paper draft. It is important to thoroughly review the data first, and this can only be achieved by digging into the data and exploring their condition. Any dataset published in a data paper has to be stored in an open repository anyway, either before submission or after acceptance of the data paper. Therefore, it would have been wise to complete the data repository step first. I also began to wonder whether Ecology would be the ideal place to publish my data paper once I had completed uploading the data to a repository. There is no doubt that the journal is a subject-specific vehicle for promoting the availability of ecological datasets. Furthermore, a data paper in Ecology is not entirely behind a paywall: abstracts and access links to data and metadata are available online to anyone. However, Ecology charges $250 for each data paper “due to the financial liability of long-term hosting and maintenance” (ESA, 2019b). In my view, publication fees are acceptable only if the publisher provides a clear, additional value for the author and the data user beyond peer review and “long-term accessibility and maintenance of data papers”. For my purposes, I would have hoped for a helpful data paper template or a practical, up-to-date metadata scheme. Unfortunately, Ecology offers neither.
Finally, preparing the data paper draft proved to be trickier than expected. The instructions and requirements for metadata preparation in Ecology are, in my opinion, not very user-friendly (ESA, 2019b). The instructions consist mainly of the above-mentioned standard ecological metadata descriptors from Michener et al. (1997). It was time-consuming to gather all relevant descriptors and to compile the metadata catalogue accordingly. Such an exhausting procedure seems strange given that several data journals now provide templates (e.g., Scientific Data; Springer Nature Publishing, 2019) and so-called “Data Paper Tools” that facilitate metadata preparation for researchers. Suggestions for such tools are provided by the Global Biodiversity Information Facility (GBIF, 2019) and seem to work with a variety of data journals (not, however, Ecology). Standardized metadata tools assist not only the researcher who attempts to publish their data, but also the researcher who later accesses the dataset for further use.
Lessons learned so far
The post-research publication of my data is much more complex than I had expected when I started this self-experiment. I realized that it requires a more systematic approach and enough time to plan and thoroughly prepare a data publication. If you start from scratch as I did, make sure to think carefully about which data to publish and where to publish them. If the amount of time you can spend on your data publication project is limited, then uploading your data and corresponding metadata to a trustworthy repository might be sufficient, provided that the metadata contain a detailed description of the dataset, including contact information for further inquiries. If you choose to prepare an additional data paper, select a journal that is user-friendly both for you as data publisher and for those who will want to look at or reuse your data. Do not hesitate to contact your university library if you need more information.
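For readers who like to script such things, the minimal repository route (plain-text data files plus a metadata file with a description and contact information, compressed into one archive) might look roughly like the sketch below. The list of required fields is my own assumption for illustration, not the schema of any particular repository or journal, and all names and values in the example record are placeholders.

```python
import io
import json
import zipfile

# Hypothetical minimum set of fields; real repositories define their own schemas.
REQUIRED_FIELDS = ("title", "description", "creators", "contact_email", "license")


def missing_fields(metadata):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


def build_package(metadata, data_files):
    """Bundle plain-text data files and a metadata file into one in-memory zip archive."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError(f"metadata incomplete, missing: {', '.join(missing)}")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("metadata.json", json.dumps(metadata, indent=2))
        for name, text in data_files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


# Placeholder record and data; replace with the real description of your dataset.
metadata = {
    "title": "Example seedling dataset",
    "description": "Placeholder description of variables, units, and methods.",
    "creators": ["A. Researcher"],
    "contact_email": "researcher@example.org",
    "license": "CC-BY-4.0",
}
package = build_package(metadata, {"seedlings.csv": "species,year\nPicea abies,2013\n"})
names = zipfile.ZipFile(io.BytesIO(package)).namelist()
print(names)
```

The check for missing fields is deliberately strict: an archive without a description or contact address is refused rather than silently created, which mirrors the condition stated above for the repository-only route.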
Next steps
After my first post-research data publication attempt, I decided to retrace several of my steps and start over more systematically. My goal is still to publish the data, but also to develop guidelines based on my experience that can help (former) researchers with their post-research data publication. I also want to investigate the role of libraries in this process to figure out how they may provide support in post-research data publication. More about this in my next posts on this blog.
What are YOUR experiences with post-research data publication? Please get in touch and let me know: aline.frank@ub.unibe.ch
Literature
Bello, C., Galetti, M., Montan, D., Pizo, M. A., Mariguela, T. C., Culot, L., … Jordano, P. (2017). Atlantic frugivory: a plant–frugivore interaction data set for the Atlantic Forest. Ecology, 98(6), 1729–1729. http://doi.org/10.1002/ecy.1818
Chavan, V., & Penev, L. (2011). The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics, 12(Suppl 15), S2. http://doi.org/10.1186/1471-2105-12-S15-S2
ESA (2019a). Ecological applications data policy. Retrieved on February 14, 2019, from https://esajournals.onlinelibrary.wiley.com/hub/journal/19395582/resources/data-policy-eap
ESA (2019b). Ecology: data paper instructions. Retrieved on February 14, 2019, from https://esajournals.onlinelibrary.wiley.com/hub/journal/19399170/resources/data_paper_inst_ecy
Ferguson, A. R., Nielson, J. L., Cragin, M. H., Bandrowski, A. E., & Martone, M. E. (2014). Big data from small data: data-sharing in the “long tail” of neuroscience. Nature Neuroscience, 17(11), 1442–1447. http://doi.org/10.1038/nn.3838
FORCE11 (2016). FAIR Data Principles. Retrieved on February 14, 2019, from https://www.force11.org/group/fairgroup/fairprinciples
Frank, A. D. (2016). Genecology of Norway spruce, silver fir, and European beech in Switzerland: Are current populations adapted to future climates? ETH Dissertation Nr. 23644. ETH, Zürich.
Frank, A., Heiri, C., & Kupferschmid, A. D. (2019). Growth and quality of Fagus sylvatica saplings depend on seed source, site, and browsing intensity. Ecosphere, 10(1), e02580. http://doi.org/10.1002/ecs2.2580
Frank, A., Howe, G. T., Sperisen, C., Brang, P., St.Clair, J. B., Schmatz, D. R., & Heiri, C. (2017a). Risk of genetic maladaptation due to climate change in three major European tree species. Global Change Biology, 23(12), 5358–5371. http://doi.org/10.1111/gcb.13802
Frank, A., Pluess, A. R., Howe, G. T., Sperisen, C., & Heiri, C. (2017b). Quantitative genetic differentiation and phenotypic plasticity of European beech in a heterogeneous landscape: indications for past climate adaptation. Perspectives in Plant Ecology, Evolution and Systematics, 26, 1–13. http://doi.org/10.1016/j.ppees.2017.02.001
Frank, A., Sperisen, C., Howe, G. T., Brang, P., Walthert, L., St.Clair, J. B., & Heiri, C. (2017c). Distinct genecological patterns in seedlings of Norway spruce and silver fir from a mountainous landscape. Ecology, 98(1), 211–227. http://doi.org/10.1002/ecy.1632
GBIF (2019). Data papers: getting scholarly recognition for your datasets. Retrieved on February 14, 2019, from https://www.gbif.org/data-papers
Michener, W. K., Brunt, J. W., Helly, J. J., Kirchner, T. B., & Stafford, S. G. (1997). Nongeospatial metadata for the ecological sciences. Ecological Applications, 7(1), 330–342. http://doi.org/10.1890/1051-0761(1997)007[0330:NMFTES]2.0.CO;2
Palmer, C. L., Cragin, M. H., Heidorn, P. B., & Smith, L. C. (2007). Data curation for the long tail of science: the case of environmental sciences. 3rd International Digital Curation Conference, May 2014, 1–6.
SNSF (2018). Open Research Data. Retrieved on May 4, 2018, from http://www.snf.ch/en/theSNSF/research-policies/open_research_data/Pages/default.aspx
Springer Nature Publishing (2019). Scientific data submission guidelines. Retrieved on February 14, 2019, from https://www.nature.com/sdata/publish/submission-guidelines#sec-3