FAIR Data

The FAIR principles are guidelines for making research outputs, including (but not limited to) research data, software and protocols, Findable, Accessible, Interoperable and Reusable. They were formulated and formally published in 2016 in Scientific Data, a Nature journal.

- FINDABLE: first of all, data should be easy to find for both humans and computers. This is achieved through rich, detailed metadata (labels describing author, title, date, provenance, etc.) and unique, persistent identifiers (stable references that will always point to the same resource, such as a DOI or a handle).

To remain retrievable, data must be kept for at least 10 years in an archive that offers preservation guarantees, and stored in multiple copies in a secure environment during the research as well, not only once it is finished.

 

- ACCESSIBLE: data and metadata must be Accessible. This does not mean “open”: it means that users know how to reach the data and, where permitted, download them through an open, free and universally implementable protocol. Authentication and authorisation procedures and/or confidentiality agreements may be in place.

The guiding principle is “as open as possible, as closed as necessary”, depending on specific requirements such as the protection of sensitive data or intellectual property.

Data can be FAIR and still be closed for security or privacy reasons.

Descriptive metadata again play a key role, for example by signalling that particular transfer protocols (other than HTTP) are required or that an API (Application Programming Interface) is available.

Ideally, to be accessible, data should be saved in non-proprietary, uncompressed, unencrypted formats based on documented standards.

 

- INTEROPERABLE: data must be described using standards relevant to the community. A valuable tool in this respect is the FAIRsharing registry, a curated and informative resource on data and metadata standards, linked to databases and data policies.

 

- REUSABLE: the ultimate goal of the FAIR principles is to optimise the reuse of data. To achieve this, data and metadata should be well described, so that they can be replicated and/or combined in different settings, and should carry a clear and accessible data usage licence (such as a Creative Commons licence).
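Under ACCESSIBLE, data are ideally saved in non-proprietary, uncompressed, unencrypted formats with documented standards. For tabular data, CSV (documented in RFC 4180) is one such format. The following is a minimal sketch that writes a small table as UTF-8 CSV using only Python's standard library; the function name `save_as_csv` and the column names are hypothetical examples, not a prescribed layout.

```python
import csv
from pathlib import Path


def save_as_csv(path: Path, fieldnames: list[str], rows: list[dict]) -> None:
    """Write rows to a plain-text, UTF-8 encoded CSV file with a header line.

    csv.DictWriter ends lines with CRLF by default, as RFC 4180 recommends;
    newline="" stops Python from translating line endings again on write.
    """
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Usage (hypothetical file and measurements):
# save_as_csv(Path("measurements.csv"), ["sample", "temperature_c"],
#             [{"sample": "A1", "temperature_c": 21.5}])
```

A plain-text format like this can be opened by any tool without a licence, which is what makes it a safer long-term choice than a proprietary spreadsheet file.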

 

 

Open data is one of the pillars of open science and one of the objectives of the Politecnico di Torino’s “PoliTo4Impact” Strategic Plan.

The topic was addressed by a specific pilot programme in Horizon 2020 and plays an important role in Horizon Europe, the European framework programme for 2021-2027, which introduces the obligation to make the datasets supporting research results open.

 

Each letter in FAIR refers to a set of principles, 15 in total.

Below is the full list of principles.


FINDABLE:

F1. (meta)data are assigned a globally unique and persistent identifier

F2. data are described with complete metadata (defined by R1 below)

F3. metadata clearly and explicitly include the identifier of the data they describe

F4. the (meta)data are recorded or indexed in a searchable resource

 

ACCESSIBLE:

A1. (meta)data are retrievable by their identifier using a standardised communication protocol

   A1.1 the protocol is open, free and universally implementable

   A1.2 the protocol allows for an authentication and authorisation procedure, where necessary

A2. metadata are accessible, even when the data are no longer available
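Principle A1 in practice: a DOI is resolved over HTTPS, an open, free and universally implementable protocol, through the resolver at https://doi.org/. DOI registration agencies such as Crossref and DataCite also support content negotiation, so a client can ask the same URL for machine-readable metadata instead of the human-readable landing page. The sketch below only builds the request and does not send it; the helper name `metadata_request` is hypothetical, and the example DOI is that of the 2016 article introducing the FAIR principles.

```python
from urllib.parse import quote
from urllib.request import Request

RESOLVER = "https://doi.org/"
# Media type for CSL JSON citation metadata, offered through DOI content negotiation
CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi: str) -> Request:
    """Build (but do not send) an HTTPS request for a DOI's metadata."""
    doi = doi.strip()
    # Accept the common written forms "doi:10..." and "https://doi.org/10..."
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are not URL-safe, keeping the "/" separator
    return Request(RESOLVER + quote(doi, safe="/"), headers={"Accept": CSL_JSON})


req = metadata_request("doi:10.1038/sdata.2016.18")
print(req.full_url)  # https://doi.org/10.1038/sdata.2016.18
```

Passing `req` to `urllib.request.urlopen` should then return JSON metadata (title, authors, publisher and so on) whenever the registration agency supports that media type.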

 

INTEROPERABLE:

I1. (meta)data use a formal, accessible, shared and widely applicable language for knowledge representation

I2. (meta)data use vocabularies that follow FAIR principles

I3. (meta)data include qualified references to other (meta)data

 

REUSABLE:

R1. (meta)data are richly described with a plurality of accurate and relevant attributes

   R1.1. the (meta)data are released with a clear and accessible data use licence.

           If you need help, you can use the License Selector Tool.

   R1.2. the (meta)data provide detailed information on provenance      

   R1.3. the (meta)data meet standards relevant to the subject area of a scientific community
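Several of the principles above can be read as checks on a dataset's metadata record. The following is a toy sketch, not an assessment method: the `DatasetRecord` fields and the `missing_principles` function are hypothetical, only loosely inspired by common repository metadata, and they test just F1, F2, R1.1 and R1.2 by checking that the relevant fields are filled in. Because the identifier is a field of the record itself, F3 holds by construction. The title and author in the example are invented.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    identifier: str        # F1/F3: persistent identifier, e.g. a DOI or handle
    title: str             # F2: descriptive metadata
    creators: list[str]    # F2: who produced the data
    licence: str = ""      # R1.1: e.g. an SPDX identifier such as "CC-BY-4.0"
    provenance: str = ""   # R1.2: how, when and by whom the data were produced


def missing_principles(record: DatasetRecord) -> list[str]:
    """Return the codes of the principles whose fields are empty in this record."""
    gaps = []
    if not record.identifier.strip():
        gaps.append("F1")
    if not record.title.strip() or not record.creators:
        gaps.append("F2")
    if not record.licence.strip():
        gaps.append("R1.1")
    if not record.provenance.strip():
        gaps.append("R1.2")
    return gaps


draft = DatasetRecord(identifier="", title="Tensile tests on printed samples",
                      creators=["Rossi, M."])
print(missing_principles(draft))  # ['F1', 'R1.1', 'R1.2']
```

Checking that a field is present says nothing about whether its content is accurate or uses community standards (R1.3); automated services such as F-UJI evaluate far more than this.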

 

The FAIR self-assessment tool by the Australian Research Data Commons (ARDC) helps you assess how FAIR your dataset is.

F-UJI is a web service that allows you to evaluate the “FAIRness” of your data automatically if you have a PID (DOI or URL) associated with the dataset you want to evaluate. It is based on metrics developed by the FAIRsFAIR project.

FAIR Enough is another web service: by entering the DOI, URL or handle of an online resource (your dataset or other research output), you get an evaluation of whether it complies with the FAIR principles. It was developed by Maastricht University.

FAIR data does NOT mean OPEN data.

The “A” in the acronym “FAIR” indicates that the data must be accessible in some way, but not necessarily open to all.

Remember that data can be confidential but still be managed according to FAIR principles. The guiding principle to follow is “as open as possible, as closed as necessary” depending on the specific needs. For ethical, privacy or intellectual property protection reasons, some data may have to remain closed.

Read more:

What is the difference between “FAIR data” and “Open data” if there is one?

Three camps, one destination: the intersections of research data management, FAIR and Open

Think about how research and innovation could advance faster with the increased reproducibility and transparency enabled by FAIR/Open data and think about the people who could benefit from your data. The first to benefit from FAIR data is you!

“As a scientist, you should treat your data as a love letter to your future self” (Lambert Heller, German National Library of Science and Technology, Nature Index 360°, February 2019)

Source: Sara Jones, DCC, University of Glasgow, Open Science Days 2015, 21st & 23rd April, Prague & Brno

Metadata standards vary from discipline to discipline. Some resources that may be helpful in finding standards for your specific field of research are:

In some fields of engineering, technology and design, standards for data and metadata are still evolving. It is useful to check with your research community on the development and co-creation of these standards.

The journey towards FAIR data starts even before you begin your research project. Drawing up a Data Management Plan (DMP) will make you think about your research data management practices from the very beginning. During the research cycle, as shown in the figure below, you will have to decide how to manage and store your data while the research is under way (the “active” data management phase). When writing the DMP, it is important to devise an archiving and backup strategy for active data management; to do this, also consider the resources your institution has in place.

Towards the end of the project, you will instead have to choose which data you wish to preserve in the long term. This process is called long-term archiving. To archive your research results in the long term, you need a certified repository that guarantees certain standards of security and control. You can find some information in the section below, “What is a repository and how can I choose one?”.
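Both the backup strategy for active data and long-term archiving rely on keeping multiple copies, and those copies are only useful if they are still identical to the original. A common way to verify this is to compare checksums. The following is a minimal sketch using Python's standard `hashlib`, assuming the copies are reachable as local file paths; the function names `sha256_of` and `copies_match` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, *copies: Path) -> bool:
    """True if every copy has the same SHA-256 checksum as the original."""
    reference = sha256_of(original)
    return all(sha256_of(copy) == reference for copy in copies)


# Usage (hypothetical paths):
# copies_match(Path("data/raw.csv"), Path("/mnt/backup/raw.csv"))
```

Storing the checksum next to the data, for example in a small text file, lets you repeat the check later even if the original copy is no longer available.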

Different research disciplines produce different kinds of research outputs, and various elements may need to be considered when deciding what to select and preserve. Here are some general guidelines that apply to most research disciplines. For your discipline, check the best practices followed by your research community and/or consult your RDM consultant.

Definitely deposit:

  • Original data sets, original software code, raw data obtained from the analysis of physical samples, observational data that cannot be regenerated.
  • Non-original data sets that are not readily available, online or otherwise, and that you have permission to share.
  • For social science data, include descriptions of studies, code books and summary statistics.

Possibly deposit:

Intermediate versions of analysis or code if potentially useful to others or if they have been used in publications or theses.

It is not necessary to deposit:

  • Incomplete, non-functional or intermediate versions of code that would be of marginal use to others.
  • Output files from analyses if 1) the data set and code used to generate the output are deposited and 2) it is easy enough to regenerate the output from the deposited files.
  • Data sets stored and accessible through other institutions or organisations.
  • Graphs or tables created from the original data that could be easily regenerated.

Do not deposit:

Any data containing personal information that identifies human subjects, or data whose deposit could violate legal contracts.

Exceptions:

Analysis output files can be deposited if they take a long time to regenerate or are excessively large or cannot be easily recreated from the deposited data set and code.
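The rule for analysis output files and its exception can be combined into a single decision. The sketch below is one possible reading of these guidelines, with a hypothetical function and parameter names; it does not replace the best practices of your community or the advice of your RDM consultant.

```python
def worth_depositing_output(data_and_code_deposited: bool,
                            easy_to_regenerate: bool,
                            slow_to_regenerate: bool = False,
                            very_large: bool = False) -> bool:
    """Decide whether analysis output files are worth depositing."""
    # Exception: deposit anyway if the output takes long to regenerate or is very large
    if slow_to_regenerate or very_large:
        return True
    # General rule: not necessary when the data set and code are deposited
    # and the output is easy to regenerate from them
    return not (data_and_code_deposited and easy_to_regenerate)


print(worth_depositing_output(True, True))                           # False
print(worth_depositing_output(True, True, slow_to_regenerate=True))  # True
```

The exception is checked first because it overrides the general rule: even a reproducible output is worth keeping if regenerating it is costly.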

In the context of research data and outputs, a repository is a digital environment that enables the long-term preservation of research data and other digital outputs. Essentially, it should offer the following functionalities:

  1. Stores data securely
  2. Ensures that data can be found
  3. Describes data appropriately (metadata)
  4. Adds licence information

You can deposit data in a generic repository (e.g. Zenodo, Harvard Dataverse) or in a subject-specific repository (e.g. Dryad). Looking for a repository for your discipline? Search www.re3data.org for suitable data repositories. See a demo of a search for data repositories using the re3data directory.

Preferably, you should deposit your data in a reference repository of your scientific community.

As good practice, after depositing your data, we invite you to archive the metadata (DOIs) in the PORTO@IRIS institutional archive, in the “9. FAIR Data Collection”.

 

OpenAIRE provides a detailed cost guide that gives an indication of the time, effort and budget required for RDM-related activities, ranging from data cleaning and software licences to data analysis and archiving in a repository.

Remember that these costs can be budgeted in your funding proposals.

We cannot imagine conducting research without software. Researchers use software for research activities or develop their own software as part of the research results.

For good scientific practice, research software should adhere to FAIR principles to enable full repeatability, reproducibility and re-use. Research software should be archived for reproducibility and actively maintained for reuse.

Publishing open source research software on platforms such as GitHub and GitLab is an established practice in science.

Growing community initiatives such as Software Carpentry help train researchers with no specific background in software development or programming to establish workflows for managing, monitoring, preserving and, where possible, sharing or publishing research software, using task automation tools and version control systems such as Git.

For more information on how to make your software FAIR: