1. Introduction
Data management is an integral part of good scientific practice. Yet adequate data management remains challenging to realize, especially for researchers. Templates for data management plans, which aim to support researchers in planning and implementing data management, often lack concrete recommendations (; ). Consequently, researchers ask for ‘much more tailored guidance and discipline-specific examples’ () to manage data according to their needs and the common practice of their community. To address this need for more tailored and specific guidance, Science Europe established the concept of domain data protocols (DDP), i.e., ‘a “model DMP” for a given domain or community that shares common methods’ ().
Based on this concept, members of the project Domain Data Protocols for Empirical Educational Research (DDP-Bildung) developed the Standardized Data Management Plan for Educational Research, Stamp for short (; ). As outlined in the second section, the Stamp supports researchers in planning and implementing data management in educational research. It consists of eight content modules, each containing a so-called minimal condition on managing data, granular checklists for meeting the minimal condition, and auxiliary materials that provide guidelines and examples on different data management activities. The eight content modules reflect the FAIR Data Principles (), the idea of Open Science, and the rules of good scientific practice.
Initially, we developed the Stamp for the educational research community, aiming to provide guidance tailored to this domain. At the same time, we expected that adequate data management depends more on the type of data processed, the methods of data analysis, and the information contained in the data than on a particular community or domain (). We therefore considered the transferability of the Stamp to other domains and its usability beyond educational research throughout the entire development process. We designed the Stamp with a flexible structure that allows it to evolve and adapt to changing data management practices, such as the use of new types of data. This structure allows not only the future development of the Stamp within educational research but also its transfer to other research domains.
To investigate the extent to which the Stamp is transferable and usable beyond educational research, staff members of DDP-Bildung organized two workshops in August 2022, funded by KonsortSWD (). As discussed in section three of this paper, the outcomes of these workshops highlight that the Stamp’s minimal conditions can serve as a basis for data management and its documentation across domains. Likewise, checklists can be transferred to other domains, at least to some extent, as exemplified in section four. In sum, transferring the Stamp to other domains is largely a matter of translating its terminology into that of the respective communities.
2. The Standardized Data Management Plan for Educational Research
Researchers are increasingly encouraged by different stakeholders, such as academic journals, professional associations, and research funding agencies, to make the research process as transparent as possible in order to enable reproducible results, and to share data FAIRly and openly with others. Such requirements can be challenging. First, guidelines of good scientific practice often remain very general and highly abstract. Second, researchers often regard data management as a necessary evil. Third, not all researchers are familiar with the FAIR Data Principles and the idea of Open Science. At the same time, existing tools to foster the creation of FAIR data, such as templates for data management plans (DMP), vary widely and rarely provide tailored, discipline-specific guidance ().
To address this lack of discipline-specific guidance, the project DDP-Bildung developed the Stamp. Funded by the German Federal Ministry of Education and Research (grant number 16QK01), the project brought together 12 research institutions in Germany with diverse areas of expertise in educational research to develop the Stamp between June 2019 and May 2022. Following the concept of Science Europe, DDPs are open, standardized, and referenceable data protocols, serving as ‘model’ DMPs for a particular domain or community (). Accordingly, a DDP covers different topics of data management, such as ‘documentation and data quality,’ ‘legal and ethical requirements, codes of conduct,’ or ‘data sharing and long-term preservation’ (). For each of these topics, a DDP includes, e.g., ‘applicable regulations,’ ‘applicable standards,’ and ‘templates and examples’ (). While the formal minimum conditions are meant to ‘set a minimum standard’ for data management and its quality, standards, templates, and examples assist researchers in implementing their data management ().
DDPs are regarded as a ‘pragmatic solution’ for the benefit of various stakeholders. They assist researchers in managing data and in preparing project proposals and funding applications, and they offer support for archiving and sharing data. By describing activities to manage data throughout the data life cycle, DDPs also simplify the budgeting of such activities. They support the replication of results by the community and the re-use of data by others in new (research) contexts. By implementing standardized procedures, DDPs simplify review processes regarding data management, reducing the effort of examining funding applications and data management reports. Finally, DDPs foster data ingestion in repositories and archives by assisting researchers in creating FAIR data. In light of the various recommendations and initiatives to reform research assessment and to give greater weight to managing and sharing research data, we also consider the Stamp an effective support for these efforts (; ; ).
In the DDP-Bildung project, we investigated educational research as a domain characterized by common methods of data processing, analysis, use, and interpretation. The types of data in educational research are very heterogeneous, but there is a common terminology to describe them. We applied this terminology in developing the Stamp to improve its understandability and acceptance, paying close attention to the needs and requirements of the educational research domain by involving potential users throughout the project. The Stamp’s primary aim is to standardize and facilitate data processing and management in empirical educational research following the FAIR Data Principles. Data can thus be made available to third parties as openly as possible and as closed as necessary, in line with the idea of Open Science and, in particular, the ‘A’ of FAIR (; ).
The Stamp consists of eight so-called content modules and a basic module providing information, e.g., on the project and its data, as well as an introduction and a glossary, as illustrated in Figure 1. The eight content modules comprise research ethics, data protection, copyright, data organization, transparency, availability, long-term preservation, and responsibilities and expenses. Each content module is hierarchically structured and subdivided into different elements. At the first level, the modules contain minimal conditions on managing FAIR data, accompanied by explanatory notes. In principle, these minimal conditions form the resulting DMP, and, in line with the basic idea of Science Europe, a reference to them should be meaningful enough for funders. The second level comprises checklists for managing data and complying with the minimal conditions (level of actions). The third level provides legal requirements, e.g., in the context of data protection regulations or copyright, as well as standards, use cases, and further resources (level of auxiliary materials). On this level, the Stamp contains domain-specific guidance in terms of its standards, referencing, e.g., guidelines and best-practice advice of associations, research centers, and repositories in educational research. Use cases illustrate possible applications, introducing challenges and solutions for the various data management activities in the context of educational research, employing the terminology of educational researchers and reflecting their work routines.
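The three-level hierarchy of a content module can be sketched as a simple data structure. The following Python example is illustrative only: the class and function names (`ContentModule`, `ChecklistItem`, `AuxiliaryMaterial`, `dmp_text`) are hypothetical and not part of the Stamp, and the sketch assumes only what the text states, namely that level 1 holds the minimal condition (the minimal conditions together form the resulting DMP), level 2 the checklist, and level 3 the auxiliary materials.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChecklistItem:
    """Level 2 (hypothetical model): one data management activity."""
    item_id: str   # e.g. "II.Ba.5"
    activity: str


@dataclass
class AuxiliaryMaterial:
    """Level 3 (hypothetical model): legal requirement, standard, use case, or resource."""
    kind: str      # e.g. "use case", "standard"
    title: str


@dataclass
class ContentModule:
    """A content module with its three levels (hypothetical model)."""
    number: str               # Roman numeral, e.g. "II"
    title: str
    minimal_condition: str    # level 1
    checklist: List[ChecklistItem] = field(default_factory=list)      # level 2
    auxiliary: List[AuxiliaryMaterial] = field(default_factory=list)  # level 3


def dmp_text(modules: List[ContentModule]) -> str:
    """Join the level-1 minimal conditions, which together form the resulting DMP."""
    return "\n".join(f"{m.number}. {m.title}: {m.minimal_condition}" for m in modules)


data_protection = ContentModule(
    number="II",
    title="Data Protection",
    minimal_condition=("Personal data will be processed in accordance with "
                       "the legal requirements of data protection."),
    checklist=[ChecklistItem("II.Ba.5", "project members implement a consent management system")],
    auxiliary=[AuxiliaryMaterial("use case", "Restricting Participant's Right")],
)

print(dmp_text([data_protection]))
```

In this sketch, only the level-1 minimal conditions are assembled into the DMP text, so checklists and auxiliary materials can be extended or replaced without changing the DMP itself.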
In contrast to the Stamp, most conventional DMP templates, e.g., in DMPOnline (), from the National Science Foundation (), or Horizon Europe (), consist of sets of questions about data management. However, they rarely provide guidance on how to find answers to these questions, and thus on how to plan, implement, and realize data management activities. The Stamp sets out to address this need, giving concrete answers to a large number of questions on how researchers should manage their data and providing guidance in the form of checklists and auxiliary materials. The Stamp thus differs from conventional DMP templates by going beyond simply naming data management activities and providing explanatory notes (see Figure 1, level 1). Researchers can apply the Stamp by using its checklists alone. This option may particularly suit experienced researchers who want to give their data management a clear structure that is easy to compare with other projects, thereby promoting the idea of Open Science. Depending on their individual level of expertise, researchers can also choose to draw on more guidance. For example, early career researchers in their first research project may want to consult legal regulations or a use case, or even attend a training recommended in the further resources of the Stamp.
Unlike traditional DMP templates, the Stamp, with its clear structure and step-by-step instructions, provides low-threshold support that helps researchers help themselves, and thereby also aims to increase data availability. As recent research shows, research data availability in the field of educational psychology was as low as 7.16% in 2020, regardless of the research data policies of scientific journals and research institutions (). Since such policies alone do not appear to raise availability, introducing targeted incentives could substantially increase the availability of research data in cases where data management is not compulsory. This requires that researchers understand the benefits of sharing data and have the capability and motivation to do so. Unlike conventional DMP templates, the Stamp is one example of such an incentive.
3. The Cross-Disciplinary Character of the Stamp’s Minimal Conditions
Initially, we designed the Stamp to manage data in educational research. However, after consulting various stakeholders, we are convinced that the Stamp can be used beyond educational research, expecting that data management depends more on the types of data processed and the methods employed than on a particular domain. To put this expectation to the test, we organized two workshops: one with data management experts from the social sciences on August 16th, 2022, and one with experts from research domains outside the social sciences on August 23rd, 2022. In these workshops, we discussed the extent to which minimal conditions and checklists can be used beyond educational research. We announced the workshops via mailing lists and blog posts on the website of the project DDP-Bildung, and DDP partners further disseminated the announcements into related domains. In total, more than 100 people applied to participate in one of the workshops. As places for each event were limited to 30 people, we selected participants by date of registration. In the end, 28 experts representing social science domains such as political science, sociology, and psychology participated in the first workshop. In addition, 26 experts joined the second workshop, representing, e.g., chemical science, agriculture, biology, physics, linguistics, philosophy, or medical science.
Most of these participants were from German universities and were involved in general data management. They included particularly knowledgeable data management professionals as well as leading experts. This allowed us to gather statements from people who could provide insights into entire domains rather than speaking from limited personal experience.
Participants of both workshops agreed that the Stamp’s minimal conditions can be used in their domains, while also sharing examples of aspects that would need to be added to fully meet their needs. Due to their abstract nature, the minimal conditions do not claim exclusive validity for educational research but general validity for managing data, regardless of the specific domain. Minimal conditions on data organization (IV.), transparency (V.), or availability (VI.), as well as on ensuring the responsibilities and expenses necessary for data management (VIII.), are not domain-specific (see Table 1). They concern all domains and empirical research projects. For example, to work with data, researchers in any domain must ensure adequate documentation that enables (re-)users of the data to retrace its provenance, understand it, assess its quality, and interpret it meaningfully. This applies not only to the original researchers but also to others examining the data or (re-)using it for new purposes. The procedures and standards of data documentation, as well as the metadata schemas employed, might be domain-specific, but the need for data documentation is cross-disciplinary.
Table 1: Minimal conditions of the Stamp’s eight content modules
| Content module | Minimal condition |
|---|---|
| I. Research Ethics | Data and related materials are processed in accordance with the rules of good scientific practice. The project members respect the (personal) rights of everyone involved in the project throughout its duration and beyond. |
| II. Data Protection | Personal data will be processed in accordance with the legal requirements of data protection. This applies to (1) the secure processing of personal data during the project, (2) the availability of data for re-use by others beyond the project, and (3) the long-term preservation of relevant materials. |
| III. Copyright | Data and related materials generated in the project, as well as re-used materials of third parties, are processed in accordance with the provisions of intellectual property rights. This applies to (1) the legally compliant use of materials within the project, (2) the transfer of copyrights to re-use the data in the context of data sharing with others, and (3) the long-term preservation of relevant materials beyond the project. |
| IV. Data Organisation | Data and related materials are systematically stored and secured in a protected back-up system to ensure their usability during the project (and, if necessary, beyond). |
| V. Transparency | Data and related materials are processed and documented in the project in such a way that project members as well as third parties can (1) retrace the entire data genesis, and (2) (re-)use the data and related materials in the current project as well as beyond. |
| VI. Availability | As far as possible, all data and related materials generated in the project will be made available for re-use by others via a repository or research data centre. Shared data and related materials should be as comprehensive as possible in terms of content, as open as possible in terms of legal and research ethics, and made available as early as possible during the project. |
| VII. Long-Term Preservation | Data and related materials that cannot be shared for re-use by others will be preserved for at least 10 years after the end of the project in accordance with the rules of good scientific practice. |
| VIII. Responsibilities and Expenses | Responsibilities and expenses are defined for processing data and related materials. This concerns both the responsibilities for implementing the present Stamp and the expenses necessary to do so (1) over the entire duration of the project, (2) for the availability of data for re-use by others, and (3) for the long-term preservation of data and related materials beyond the project. |
Some minimal conditions, e.g., on research ethics (I.) or long-term preservation (VII.), might be of different relevance to different domains. However, in all domains, whether a minimal condition is relevant for a given project must be examined at the beginning of the research process. For example, representatives of the natural sciences reported that their researchers are less often confronted with obligations to protect their research objects, e.g., in terms of data protection (II.). Nevertheless, all domains have an ethical obligation to protect their researchers and institutions and to be aware of (negative) consequences of research outcomes, as outlined in the content module on research ethics (I.).
Another example is long-term preservation, that is, storing materials that cannot be shared for re-use beyond the end of the project. Among other factors, long-term preservation depends on legal issues, such as intellectual property rights or data protection regulations, and on characteristics of the data, such as its volume. An illustrative example is the Large Hadron Collider at CERN near Geneva, where ‘in November 2018 alone, … 15.8 petabytes of data were recorded’ (). Such volumes can be preserved, if at all, only with considerable expenditure of resources, which is very challenging in view of energy-saving measures and other issues ().
Other minimal conditions, such as data protection (II.), are partially or even completely irrelevant for some domains or empirical research projects. Projects that do not process personal data do not need to consider data protection regulations. This may apply to entire domains, but even in medical or social science research there are empirical projects that do not process personal data, e.g., when examining organizations or associations, or when working exclusively with aggregated data. In short, whether data protection regulations apply depends on whether personal data is processed and on the information covered in this data (), regardless of the domain concerned. The flexible structure of the Stamp enables users to skip data management activities or even entire modules if they are irrelevant for a given project. However, as one participant from the natural sciences summarized, once personal data is processed, such as when organizing an event, the Stamp’s minimal condition and corresponding checklists serve as a guideline for complying with data protection regulations.
Finally, some of the Stamp’s minimal conditions do not completely meet the requirements of some domains. For example, the minimal condition on copyright (III.) does not cover patent rights, which might be relevant, e.g., in engineering or computer science. In addition, as participants from the life sciences highlighted, other legal regulations, such as animal protection in veterinary medicine or the Nagoya Protocol on Access and Benefit-sharing () in biology, are not considered. Here, the respective domains are called upon to discuss possible additions to the Stamp’s minimal conditions required for their data.
4. Applying the Stamp’s Checklists outside Educational Research
In addition to the minimal conditions, we explored the cross-disciplinary character of the Stamp’s checklists by discussing two of them in the workshops. The first checklist focuses on consent management in the context of data protection, as displayed in Table 2. For researchers in the social sciences, data protection, informed consent, and secure data handling are part of their everyday work routine. Regarding the checklist, participants of both workshops agreed that consent management is a legal requirement for processing personal data, regardless of the domain.
Table 2: Checklist on consent management (II.Ba.5)
| NOT APPLICABLE | CONSIDERED | ITEM | ACTIVITY |
|---|---|---|---|
| | ⚬ | II.Ba.5 | project members implement a consent management system, considering |
| | ⚬ | II.Ba.5.1 | the purposes of processing consented to |
| | ⚬ | II.Ba.5.2 | further recipients (or categories of recipients) of data |
| | ⚬ | II.Ba.5.3 | the duration of data storage as well as the criteria for data destruction |
| | ⚬ | II.Ba.5.4 | the claiming of rights by the persons concerned, documenting |
| | ⚬ | II.Ba.5.4.1 | who has claimed which rights, when, and how |
| | ⚬ | II.Ba.5.4.2 | the timely communication with persons who claimed their rights |
| | ⚬ | II.Ba.5.4.3 | consequences of claiming rights for further data processing; see use case on Objection Data Processing and Revocation of Consent (II.Ba.5.4.3.F) |
| | ⚬ | II.Ba.5.4.4 | restrictions of the rights of data subjects and their conditions; see use case on Restricting Participant’s Right (II.Ba.5.4.4.F) |
| | ⚬ | II.Ba.5.5 | possible deadlines, e.g., for destroying contact data, according to the consent form |
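To illustrate what items II.Ba.5.4.1, II.Ba.5.4.2, and II.Ba.5.5 might look like in practice, the following Python sketch keeps a minimal log of rights claims and checks two kinds of deadlines. It is a hypothetical example, not a tool provided by the Stamp: the record fields, the pseudonymous participant IDs, and the 30-day response period (loosely modelled on the GDPR’s one-month period for responding to data subject requests) are assumptions, and actual deadlines depend on the applicable regulations and the consent form.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumption: a 30-day response period, loosely modelled on the GDPR's one-month rule.
RESPONSE_PERIOD = timedelta(days=30)


@dataclass
class RightsClaim:
    """Who claimed which right, when, and how (cf. II.Ba.5.4.1)."""
    participant_id: str              # pseudonymous ID rather than a name
    right: str                       # e.g. "access", "erasure", "objection"
    claimed_on: date
    channel: str                     # how the claim was made, e.g. "email"
    answered_on: Optional[date] = None


def overdue_claims(claims: List[RightsClaim], today: date) -> List[RightsClaim]:
    """Unanswered claims whose response period has expired (cf. II.Ba.5.4.2)."""
    return [c for c in claims
            if c.answered_on is None and today > c.claimed_on + RESPONSE_PERIOD]


def contact_data_due(collected_on: date, retention_days: int, today: date) -> bool:
    """True once the destruction deadline taken from the consent form is reached (cf. II.Ba.5.5)."""
    return today >= collected_on + timedelta(days=retention_days)


claims = [
    RightsClaim("P017", "erasure", date(2022, 3, 1), "email"),
    RightsClaim("P042", "access", date(2022, 3, 20), "letter", answered_on=date(2022, 3, 25)),
]
print([c.participant_id for c in overdue_claims(claims, today=date(2022, 4, 15))])  # ['P017']
print(contact_data_due(date(2022, 1, 1), retention_days=180, today=date(2022, 8, 1)))  # True
```

Storing pseudonymous IDs rather than names in such a log is a design choice of this sketch; it keeps the documentation of rights claims separate from the contact data that may later have to be destroyed.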
Participants agreed that the checklist contains the information needed to meet the legal requirements of data protection regulations. However, while the checklist on consent management can be used in many domains, it requires some adaptations, for example, when collecting test results in the form of blood samples in medical science. It is also important to note that such legal requirements depend on the region in which researchers operate. Consequently, all minimal conditions and checklists of the Stamp containing legal information must be adapted to regions outside Germany and the European Union, e.g., for international cooperative projects, as discussed with representatives from the social sciences in the first workshop.
In addition, participants of both workshops examined the usability of one of the checklists on data documentation that ensures transparency. The discussions focused on the documentation of measurement instruments for quantitative (V.Al.11) and qualitative (V.As.6) data, as shown in Table 3, and on the extent to which the listed activities can be transferred to other domains. In general, participants agreed that the checklist’s content is relevant to all domains. Data documentation, e.g., regarding data provenance and the measurement instrument employed, is key to ensuring the understandability, interpretability, and (in part) replicability of data.
Table 3: Checklists on documenting measurement instruments for quantitative (V.Al.11) and qualitative (V.As.6) data
| NOT APPLICABLE | CONSIDERED | ITEM | ACTIVITY |
|---|---|---|---|
| V.Al.11 Documentation of the Measuring Instrument | | | |
| ⚬ | ⚬ | V.Al.11 | The documentation contains information on the measurement instrument at the study level |
| ⚬ | ⚬ | V.Al.11.1 | (documentation of the) original (digitalized) measurement instrument, including all questions, items, tasks, and answer options (as well as options for refusing to answer) |
| ⚬ | ⚬ | V.Al.11.2 | (documentation of) digitalized interviewer instructions |
| ⚬ | ⚬ | V.Al.11.3 | (documentation of the) digitalized skip pattern structure in the measurement instrument |
| V.As.6 Documentation of the Measuring Instrument and the Method of Data Analysis | | | |
| | ⚬ | V.As.6 | The documentation contains information on the survey instrument and the evaluation method at the study level |
| ⚬ | ⚬ | V.As.6.1 | (documentation of the) original (digitalized) measurement instrument |
| ⚬ | ⚬ | V.As.6.2 | (documentation of) digitalized documentation forms, socio-demographic questionnaire, interview guide, etc. |
| ⚬ | ⚬ | V.As.6.3 | description of document collection or protocolling of interviews; see recommendations on Observation Protocols (V.As.6.3.E) |
| | ⚬ | V.As.6.4 | description of specifications regarding data storage, transport, and preservation of recordings and observations |
| | ⚬ | V.As.6.5 | description of data collection and measures to ensure data quality |
| ⚬ | ⚬ | V.As.6.6 | citation (source, persistent identifier, and mode of citation, if applicable) and licence conditions of all materials (re-)used in the project, such as |
| ⚬ | ⚬ | V.As.6.6.1 | data, artefacts, and documents |
| ⚬ | ⚬ | V.As.6.6.2 | further measurement instruments, like interview guides |
| ⚬ | ⚬ | V.As.6.6.3 | supporting materials for interviewing, stimuli, etc. |
| | ⚬ | V.As.6.7 | (documentation of the) method of data analysis |
Of course, different domains employ different documentation standards, metadata sets, and (controlled) vocabularies to describe, e.g., the process of data gathering. Such standards and vocabularies are usually domain-specific to a great extent. However, as discussed in the workshops, the domain-specific character of data documentation concerns documentation standards rather than the content of documentation. Regarding content, there was consensus that data documentation must cover the process of data gathering and the measurement instrument(s) employed, regardless of the domain. Therefore, while the Stamp’s checklists on data documentation might not match the standards of domains outside educational research, their structure and content can serve as a blueprint for other domains to develop their own standardized data management plans, including checklists on data documentation.
Another limitation discussed concerned the content of the checklists and the data management activities covered. One participant representing the life sciences mentioned the need to include additional activities, such as documenting the technical devices used for data gathering. Transparent documentation of research on a particular disease may require more than documenting the blood samples, e.g., naming the hardware employed to analyze them. As the Stamp is a first attempt to design a standardized data management plan, we could not consider every research scenario during its development. However, its flexible structure enables researchers to easily modify checklists as needed, e.g., by including activities to document technical devices for data gathering and analysis.
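The following Python sketch illustrates how a checklist could be extended with such a domain-specific activity and how items could be recorded as considered or not applicable. It is hypothetical throughout: the class names, the assumption that rows with only one checkbox in Tables 2 and 3 are mandatory items that cannot be marked as not applicable, and the item ID `V.As.6.8` for documenting technical devices are invented for illustration and are not part of the Stamp.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Status(Enum):
    OPEN = "open"
    CONSIDERED = "considered"
    NOT_APPLICABLE = "not applicable"


@dataclass
class Item:
    item_id: str
    activity: str
    optional: bool = True            # assumption: mandatory items cannot be skipped
    status: Status = Status.OPEN


class Checklist:
    """A modifiable checklist (hypothetical model)."""

    def __init__(self, items: Iterable[Item]) -> None:
        self.items: Dict[str, Item] = {i.item_id: i for i in items}

    def add(self, item: Item) -> None:
        """Extend the checklist, e.g. with a domain-specific activity."""
        if item.item_id in self.items:
            raise ValueError(f"duplicate item {item.item_id}")
        self.items[item.item_id] = item

    def mark(self, item_id: str, status: Status) -> None:
        """Record an item as considered or not applicable."""
        item = self.items[item_id]
        if status is Status.NOT_APPLICABLE and not item.optional:
            raise ValueError(f"{item_id} is mandatory and cannot be marked not applicable")
        item.status = status

    def open_items(self) -> List[str]:
        """IDs of items that have not yet been addressed."""
        return [i.item_id for i in self.items.values() if i.status is Status.OPEN]


docs = Checklist([
    Item("V.As.6.1", "original (digitalized) measurement instrument"),
    Item("V.As.6.4", "specifications on data storage, transport, and preservation", optional=False),
])
# Hypothetical life-science extension; the ID "V.As.6.8" is invented for illustration.
docs.add(Item("V.As.6.8", "technical devices used for data gathering and analysis"))
docs.mark("V.As.6.1", Status.NOT_APPLICABLE)
docs.mark("V.As.6.4", Status.CONSIDERED)
print(docs.open_items())  # ['V.As.6.8']
```

Keeping added items in the same structure as the original ones means that, in this sketch, a domain-specific extension is tracked exactly like any other checklist activity.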
Finally, participants of both workshops discussed the terminology used in the Stamp, its minimal conditions, and checklists. During its development, the Stamp was continuously evaluated by researchers, data stewards, and data managers in educational research. Consequently, it employs the terminology used and understood by researchers in this domain. Making the Stamp more usable outside educational research requires adapting this terminology to the common practice of the respective domain. For example, as representatives from the natural sciences argued, the titles of the content modules might not be self-explanatory for all domains. Depending on the terminology used in a particular domain, renaming them would therefore be advisable.
5. Conclusions
To provide more tailored, discipline-specific guidance that assists researchers in conducting their data management, we set out to develop a first domain data protocol for educational research in the form of the Standardized Data Management Plan for Educational Research (Stamp). As intended, our endeavor resulted in a tailor-made tool for educational research, containing checklists and auxiliary materials that guide researchers through their data management and help them produce shareable data in line with the FAIR Data Principles and the idea of Open Science.
However, due to its flexible structure, the Stamp also has a cross-disciplinary character. Its minimal conditions reflect requirements of good scientific practice on the replicability of research outputs, data management, and data sharing. Consequently, the minimal conditions are not domain-specific but cross-disciplinary. Of course, some of the Stamp’s minimal conditions go too far for some domains, while others are insufficient for other domains. Nevertheless, with its eight minimal conditions, the Stamp provides a first set of requirements for managing FAIR data that can be adopted, rearranged, and supplemented according to the requirements of other domains.
Likewise, the Stamp’s checklists, which define activities to manage data and to meet the minimal condition of each content module, can be re-used by domains outside educational research, at least when the same types of data are processed with similar methods. Moreover, the flexible structure of the Stamp enables researchers to adapt the checklists and extend their content with further data management activities, transferring them to other domains. The Stamp thus serves as a blueprint for developing DDPs for such domains, which requires two further adjustments. First, the terminology must be translated into the terminology of the respective domain. Second, domain-specific, tailored guidance must be added at the auxiliary level, e.g., in terms of documentation and metadata standards.
In sum, the Stamp can best be described as a comprehensive data protocol rather than a ‘model DMP’ for a given domain. This data protocol now needs to be tested for its real-life applicability in other domains. Transfer and adaptation by additional domains will facilitate further use, expanding the available set of minimal conditions and checklists across domains. In any case, testing its applicability will reveal similarities and differences in data management across domains and thus foster our understanding of domain-specific practices in managing data.