Research data are all (digital) data that originate during scientific activity and serve as the basis for research results¹. The type of data depends very much on the respective discipline. Some examples are series of measurements, images, audiotapes, models, algorithms and scripts, which can be structured and stored in a variety of file formats.
Reliable research data management:
- covers the proper indexing, processing and storage of the data
- minimises the risk of data loss and data abuse
- ensures long-term reusability of data
- offers transparency of the research process
- facilitates sharing your data within the scientific community
- helps to conform to funder standards (e.g. ERC, Horizon 2020, DFG)
Managing research data involves establishing a needs-based plan to ensure effective use, re-use, publication and archiving of data. The stages of the data life cycle can serve as an orientation for creating a data management plan that suits the characteristics of your project.
1) Kindling, M.; Schirmbacher, P. (2013): "Die digitale Forschungswelt" als Gegenstand der Forschung. In: Information : Wissenschaft und Praxis 64 (2/3), pp. 127-136.
The data life cycle model visualises all stages that research data pass through during a research project, from their gathering/creation to their reuse. These stages can vary from discipline to discipline; however, in general six to seven stages are identified:
- (Grant application and preparation)
A data management plan (DMP) systematically describes how research data are managed within a research project. It documents the storage, indexing, maintenance and processing of data. A data management plan is essential in order to make data interpretable and re-usable for third parties. It is therefore recommended to assign data management responsibilities before the start of a project. The following questions can serve as an orientation:
Which data will be generated and used within the project?
Which data have to be archived at the end of a project?
Who is responsible for the indexing of metadata?
For what period of time will the data be archived?
Who will be able to use the data after the end of the project and under which licensing conditions?
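The questions above can also be kept as a simple checklist while a plan is being drafted. The following is only a minimal sketch in Python: the class `DmpDraft`, its field names and the helper `open_questions` are invented for illustration and are not part of DMPonline or of any funder template.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical field names; each one maps to one of the orientation questions.
QUESTIONS = {
    "data_generated": "Which data will be generated and used within the project?",
    "data_to_archive": "Which data have to be archived at the end of a project?",
    "metadata_responsible": "Who is responsible for the indexing of metadata?",
    "archive_period_years": "For what period of time will the data be archived?",
    "reuse_and_license": "Who will be able to use the data after the end of the "
                         "project and under which licensing conditions?",
}


@dataclass
class DmpDraft:
    """Working draft of a data management plan (illustrative only)."""
    data_generated: Optional[str] = None
    data_to_archive: Optional[str] = None
    metadata_responsible: Optional[str] = None
    archive_period_years: Optional[int] = None
    reuse_and_license: Optional[str] = None


def open_questions(draft: DmpDraft) -> list[str]:
    """Return the questions that the draft does not answer yet."""
    return [QUESTIONS[f.name] for f in fields(draft)
            if getattr(draft, f.name) is None]


# Example values are made up.
draft = DmpDraft(
    data_generated="Series of measurements and analysis scripts",
    archive_period_years=10,
)
for question in open_questions(draft):
    print("Open:", question)
```

A structure like this makes it easy to see before the project starts which responsibilities have not yet been assigned.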
National and international research funding organizations increasingly require data management plans to be integrated into project proposals. The costs of research data management during the project and of providing data for re-use will be funded.
The exact steps for creating a DMP are explained using the tool DMPonline, developed by the British Digital Curation Centre (DCC).
HU Berlin offers example DMPs for Horizon 2020.
An increasing number of funding bodies and publishers expect research data to be freely available to the public.
For example, DFG and EU proposals have to contain a data management plan. Moreover, DFG and BMBF demand further information on how the data will be used once the project has ended and whether the data will be made accessible in a repository.
Several publishers (e.g. the Nature Publishing Group) combine the publication of a paper with the submission of the related data as a supplement.
Guidelines of funding agencies:
DFG Guidelines for Safeguarding Good Research Practice (Code of Conduct)
Open Research Data Pilot within the EU Framework Programme Horizon 2020.
Metadata provide structured information about your research data. They play a prominent role in the later retrieval of the data sets and ensure their exchange and re-usability.
In order to raise the effectiveness of metadata, a standardization of descriptions is necessary. By using metadata standards, metadata from different sources can be linked and processed together. Every scientific community has its own documentation scheme and metadata standard according to the specific needs of the discipline.
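The benefit of a shared standard can be illustrated with a small sketch. It assumes two data sets described with different local field names and maps both onto a few Dublin Core element names (`title`, `creator`, `date`). The records, the local field names and the mapping tables are invented for illustration; a real project would use the standard of its own discipline.

```python
# Two records describing data sets, exported with different (invented) field names.
lab_record = {"name": "Soil moisture series", "author": "A. Example", "year": "2020"}
survey_record = {"Titel": "Interview recordings", "Urheber": "B. Example", "Jahr": "2019"}

# Hypothetical mappings from each local scheme to Dublin Core element names.
LAB_TO_DC = {"name": "title", "author": "creator", "year": "date"}
SURVEY_TO_DC = {"Titel": "title", "Urheber": "creator", "Jahr": "date"}


def to_common(record, mapping):
    """Rename the fields of a record according to a mapping; drop unmapped fields."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


catalogue = [
    to_common(lab_record, LAB_TO_DC),
    to_common(survey_record, SURVEY_TO_DC),
]

# Once both records share one scheme, they can be sorted and listed together.
for entry in sorted(catalogue, key=lambda e: e["date"]):
    print(entry["date"], entry["creator"], entry["title"])
```

If both sources had used the same standard from the start, the mapping step would not be needed at all.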
The website forschungsdaten.info offers a summary of (discipline-specific) metadata standards.
Good academic practice requires the storage of research data for a period of at least ten years (compare the DFG Guidelines). The BTU recommends that research data be stored in professional and, if possible, discipline-specific data archives (repositories) to minimize the risk of data loss and data abuse.
Finding a repository
Research data can be archived and published in online repositories. In recent years, hundreds of discipline-specific or institutional repositories have been established. Depending on your research discipline, the options can be numerous and confusing. The Registry of Research Data Repositories, re3data, a service offered by DataCite and supported by the German Research Foundation, provides a good overview and helps users to identify important characteristics of a research data repository at first sight. Several icons visualize the characteristics of the platforms, e.g. the terms of use and licenses of the data. Moreover, re3data.org shows whether a research data repository is certified or supports a repository standard.
Research funders and journals increasingly ask for publication of the (raw) data that you collected for your study. Furthermore, research data publication increases the visibility of your data by making them citable.
There are different ways to open your data to the public:
- As an independent publication in a repository
- As a data supplement of a paper
- As a so-called "Data Paper" in a data journal
Suitable repositories for your research discipline can be found in the Registry of Research Data Repositories (↗Finding a repository). Moreover, research data can be submitted to many publishers as a data supplement to enrich the text publication.
Some journals specialise primarily in the description and review of data sets. So-called Data Papers support formal peer review, publication and citation of research data. A selection of available data journals is offered by forschungsdaten.org and the Humboldt-Universität zu Berlin.
Persistent identification is the process of assigning a permanent, digital identifier consisting of numbers and/or alphanumerical characters to a data set (or any other digital object).
Frequently used identification systems are DOI (Digital Object Identifier) and URN (Uniform Resource Name). Unlike other identifiers (such as URL addresses), a persistent identifier refers to the object itself rather than to its location on the internet. Even if the location of a persistently identified object changes, the identifier remains the same. All that needs to be changed is the URL stored in the identification database. This ensures that data sets are permanently findable, retrievable and citable.
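The principle can be sketched with a toy identification database. This is only an illustration in Python: the class `IdentifierRegistry`, the DOI-style identifier and the URLs are invented, and real DOI or URN resolvers work through their own infrastructure.

```python
class IdentifierRegistry:
    """Toy stand-in for an identification database: identifier -> current URL."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        """Assign an identifier to an object stored at the given URL."""
        self._locations[identifier] = url

    def update_location(self, identifier, new_url):
        """Record that the object has moved; the identifier itself is unchanged."""
        if identifier not in self._locations:
            raise KeyError(f"unknown identifier: {identifier}")
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        """Return the URL where the object can currently be found."""
        return self._locations[identifier]


registry = IdentifierRegistry()
pid = "10.1234/example-dataset"  # made-up DOI-style identifier
registry.register(pid, "https://old-server.example.org/data/42")

# A citation stores the identifier, not the URL.
citation = f"Example, A. (2020): Soil moisture series. doi:{pid}"

# The data set moves to another server; only the registry entry changes.
registry.update_location(pid, "https://new-repository.example.org/records/42")

print(citation)
print(registry.resolve(pid))
```

The citation written before the move still leads to the data set, because it refers to the identifier and the registry now returns the new location.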
You can obtain a URN free of charge from the Deutsche Nationalbibliothek and a DOI from members of DataCite (GESIS, TIB, ZB MED and ZBW).