My research field for my PhD in Climate Sciences at the University of Bern is paleoclimatology, the study of past climate variability. My research is based on core samples of sediments deposited at the bottom of lakes, which we use to reconstruct environmental or climatic changes that happened in the past, typically on timescales of 100 to 20’000 years (though some lake sediments extend back millions of years). We use measurements of sediment properties, or of biological remains in the sediment, as ‘proxies’ for environmental or climatic data that was not measured in the past. One of the challenges of working in this field is determining whether one particular proxy record is representative of changes that happened over a large area, or if the variations found in the sediments reflect only local changes. Each lake has its own interesting story to tell, but if we want to draw conclusions about regional or continental scale variations in vegetation patterns, land-use, or climate, it is clear that one site is not enough. It is likewise not enough to compare lakes to other lakes; we gain confidence in our results by comparing with other paleo data such as tree-rings, corals, marine sediments, ice cores, or archaeological data. Real insights are almost impossible without being able to compare your results to results found at other sites. This is where Open Science and FAIR data practices (Findable, Accessible, Interoperable and Reusable) become incredibly valuable.
Some of the most important research papers in the field of paleoclimatology have resulted from syntheses of all available data about a specific region or time period. Prior to my PhD, I was involved in two of these synthesis projects and my task was to scour the available publications and databases and compile the data files in a consistent format. This is a time-consuming task and requires close attention to detail to ensure consistency among datasets in terms of measurement units, naming conventions, and even methodological considerations. The published datasets that I found were inconsistent in their formatting, terminology, and what information was included. My objective was to produce data files in a consistent, machine-readable format with metadata included. If datasets were not made available as supplemental material or in data repositories, we emailed authors to ask if they would be willing to share their data files. Some researchers were happy to do so, while others were reluctant. Some researchers requested that we include them as authors before they would share their data with us. This might be justified in some cases but is not always appropriate. In the end, some datasets were not included in our compilation because researchers did not want to share their data, but these were rare exceptions.
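To give a sense of what this harmonization involves, here is a minimal Python sketch. It is not the code used in the synthesis projects; the column names, the target names (`depth_cm`, `age_cal_yr_bp`), the choice of centimetres as the common depth unit, and the two example lakes are all hypothetical and chosen only for illustration.

```python
import json

# Conversion factors to centimetres (the target unit is an arbitrary choice here).
DEPTH_TO_CM = {"mm": 0.1, "cm": 1.0, "m": 100.0}


def harmonize(rows, column_map, depth_unit, metadata):
    """Rename columns to shared names and convert depth to centimetres."""
    factor = DEPTH_TO_CM[depth_unit]
    data = []
    for row in rows:
        # Keep only the mapped columns, stored under their standardized names.
        new_row = {
            column_map[name]: float(value)
            for name, value in row.items()
            if name in column_map
        }
        new_row["depth_cm"] = round(new_row["depth_cm"] * factor, 6)
        data.append(new_row)
    # Metadata travels with the data so the file stays self-describing.
    return {"metadata": metadata, "data": data}


# Two made-up datasets with different column names and depth units.
lake_a = harmonize(
    rows=[{"Depth (m)": "0.5", "Age": "1200"}],
    column_map={"Depth (m)": "depth_cm", "Age": "age_cal_yr_bp"},
    depth_unit="m",
    metadata={"site": "Lake A (hypothetical)", "archive": "lake sediment"},
)
lake_b = harmonize(
    rows=[{"depth_cm": "50", "age_bp": "1150"}],
    column_map={"depth_cm": "depth_cm", "age_bp": "age_cal_yr_bp"},
    depth_unit="cm",
    metadata={"site": "Lake B (hypothetical)", "archive": "lake sediment"},
)

print(json.dumps([lake_a, lake_b], indent=2))
```

After this step both records share the same column names and units, so they can be compared or merged directly. In practice the mapping tables are where most of the manual effort goes, because each publication uses its own conventions.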
Overall, there is a strong trend in the geoscience and paleoclimate communities towards publishing data files following the FAIR principles. Many funding agencies and publishers now require that data is made available when articles are published. Datasets may be published as supplemental material to publications (sometimes this means the data is behind a paywall) or stored in a data repository (PANGAEA and the World Data Service for Paleoclimatology are two of the most commonly used repositories). However, making data truly interoperable and reusable requires further work. The paleoclimate community is diverse and interdisciplinary, which has led to the application of different standards in different fields. Nonetheless, progress has been made towards community-wide standardization of data formatting and terminology for a wide variety of data types (see PaCTS 1.0: A Crowdsourced Reporting Standard for Paleoclimate Data). This effort was coordinated in part by the Past Global Changes association hosted at the University of Bern. This group has taken a leading role in organizing efforts to make paleodata more open and accessible and has recently published a summary of the current status of open paleodata, “Building and Harnessing Open Paleodata”.
More work remains to be done to help researchers embrace and implement these types of tools. According to a 2017 survey [1], only 25% of geoscience data was submitted to a data repository. The primary barrier to data sharing is a lack of time or knowledge to prepare and submit datasets in the correct formats. This highlights the importance of funding agencies providing resources for open data publication. Workshops and resources to train researchers in the use of open data tools are needed to increase the adoption of open data practices in the future. New tools using DOIs to track and credit the re-use of datasets need to be implemented more widely to alleviate researchers’ concerns about receiving credit for open data. The good news is that regardless of the effort it takes to adhere to the FAIR principles, the potential benefit of open data for research outcomes is clear to anyone involved in the paleoscience community.
1. Stuart, D., Baynes, G., Hrynaszkiewicz, I., Allin, K., Penny, D., Lucraft, M., & Astell, M. (2018). Practical challenges for researchers in data sharing. Springer Nature. https://doi.org/10.6084/m9.figshare.5975011.v1