how to disclose data for double-blind review and make it archived open data upon acceptance
2018-09-06
Openness in science is key to fostering progress through transparency, reproducibility, and replicability. Although open access and open data are two essential pillars in open science, open data forms the foundation for excellence in evidence-based research. For years, I have promoted open science practices, both open access and open data, within the software engineering research field.
Several venues in software engineering1 are embracing double-blind review and urging the sharing of data with reviewers, aiming to produce open data when the papers are accepted2.
While I strive to turn these recommendations into requirements (we are making progress), numerous authors have expressed concerns about potential missteps when double-blind review and data sharing occur simultaneously. The process is indeed straightforward, but once the data is appropriately open and archived, there's no turning back3.
In this post, I will explain how to prepare your dataset for double-blind review and subsequently release it as properly licensed and archived4 open data using a straightforward method5.
First, I would like to reiterate that you should not distribute preprints, postprints, and datasets from non-persistent systems such as personal websites (whether on your own server or your institution's server) or consumer cloud storage (e.g., Dropbox, Google Drive). These systems are extremely volatile, and data can disappear from them, sometimes within a short period6.
Scientific and research communities require your knowledge to last forever, and yes, any knowledge you produce is valuable. This is why your data (and pre/postprints) should be released under a proper license and preserved in archived repositories, where no one, not even you, can delete it.
Here's where figshare.com7 and Zenodo.org come into the picture. Figshare and Zenodo are two discipline-agnostic platforms for archived open data. Figshare is a for-profit enterprise managed by open science proponents, whereas Zenodo is a non-profit institution supported by OpenAIRE and CERN. Both are free for users and offer the same functionality, but their content archival methods differ. Figshare's content is secured and distributed with CLOCKSS, while Zenodo, to my knowledge, uses no additional digital preservation or archival system. It is, however, hosted at CERN, which provides reasonable confidence in its data preservation capabilities.
Without further ado, let’s get into it. I’ll explain the process for figshare first, followed by Zenodo. I’m assuming that you already have:
- An anonymized dataset ready for third-party people and machines to view.
- An account on either figshare or Zenodo. You don’t need an anonymous/dummy account.
I have created a dummy dataset named test.csv to demonstrate the process for both figshare and Zenodo.
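Before uploading, it can help to scan the dataset for leftover identifying strings. The following is a minimal sketch, not part of the figshare or Zenodo workflow: the function names and the list of identifying terms are hypothetical, and the check only catches literal matches and email-like strings, so it complements a manual review rather than replacing it.

```python
import re
from pathlib import Path

# Loose pattern for email-like strings; an assumption, not a full RFC 5322 parser.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_identifying_lines(text, identifying_terms):
    """Return (line_number, line) pairs that contain an email-like string
    or any of the given terms (case-insensitive literal match)."""
    lowered_terms = [term.lower() for term in identifying_terms if term]
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        if EMAIL_PATTERN.search(line) or any(t in lowered for t in lowered_terms):
            hits.append((number, line))
    return hits


def scan_file(path, identifying_terms):
    """Run the same check on a text file such as test.csv."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_identifying_lines(text, identifying_terms)


# Hypothetical sample content and terms, for illustration only.
sample = (
    "id,answer\n"
    "1,yes\n"
    "2,contact jane.doe@example.org\n"
    "3,collected at Example University\n"
)
hits = find_identifying_lines(sample, ["Example University"])
for number, line in hits:
    print(f"line {number}: {line}")
```

In practice you would pass your own names, institution, project name, and grant numbers as terms, and call `scan_file` on every file you plan to share.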
Double-blind data submission on figshare
In figshare, when you initiate a submission, you’ll need to fill in details such as title, authors, submission categories, item type, keywords, and description.
You can provide unblinded author details at this stage. The author details will be blinded once the item is published for double-blind review. Ensure that the title and description are blinded, as these fields will become visible.
Now for the crucial part. Make sure to select Generate a private link and do not check the Publish option. The following screenshot displays the correct settings.
There's no need to check the options Apply embargo, Make file(s) confidential8, and Reserve Digital Object Identifier.
After verifying that the Publish option is not checked, you can save the item.
Include the private URL in your submission. In my case, the private URL was _https://figshare.com/s/9cfe7e5d1d07140ff285_.
Here’s what reviewers see when they open your dataset from the private URL:
The dataset is presented nicely and can be downloaded. The title, category, and description are shown, while the author details and your username are not displayed. You’re all set. The dataset is private to you and those who know the private URL. Furthermore, the dataset is not indexed by figshare.
Important: private URLs at figshare expire after 12 months and can be extended by contacting support. This is designed to prevent the use of private URLs as final placeholders.
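To see whether a private URL will outlive the review process, you can estimate its expiry date. This is a small sketch under assumptions: the 12-month lifetime comes from the note above, but how figshare counts those months exactly is not something I can confirm, and the function names are made up for illustration.

```python
from datetime import date


def private_link_expiry(created):
    """Estimate the expiry date as the same calendar day one year later."""
    try:
        return created.replace(year=created.year + 1)
    except ValueError:
        # Created on 29 February: fall back to 28 February of the next year.
        return created.replace(year=created.year + 1, day=28)


def link_covers(created, expected_decision):
    """True if the expected decision date falls before the estimated expiry."""
    return expected_decision < private_link_expiry(created)


# Hypothetical dates, for illustration only.
created_on = date(2018, 9, 6)
print(private_link_expiry(created_on))
print(link_covers(created_on, date(2019, 3, 1)))
```

If the expected decision date is close to the estimated expiry, contact figshare support early to extend the link.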
If you have multiple files for double-blind review (e.g., an appendix, a dataset, and some scripts), upload them separately. Then, create a figshare Collection. Collections can also be shared privately via a single URL.
Open data upon acceptance on figshare
Congratulations on getting your paper accepted. Now, you only need to perform three steps to turn your figshare submission into open data.
- Choose either the CC0 or the CC-BY 4.0 license. CC0 equates to rendering the data as public domain (it can't be freer), while CC-BY requires attribution when reusing the data and permits any form of reuse9. Earlier versions of the CC-BY license are not appropriately worded for data.
- Check the Publish option.
- Use the DOI you've just received to properly reference your open data in your paper.
Here is a sample of my published test file:
Graziotin, Daniel (2018): Test upload to demonstrate private sharing for peer review and DBR. figshare. Dataset. DOI: 10.6084/m9.figshare.7048631.v1.
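If you cite several datasets, a tiny helper keeps the references consistent. The sketch below reproduces the citation layout of my test file and builds the resolvable link through the doi.org resolver. The function names are illustrative, and the layout is just the one shown above, not an official figshare format.

```python
def doi_url(doi):
    """Return the resolvable URL for a DOI via the doi.org resolver."""
    return "https://doi.org/" + doi.strip()


def format_dataset_citation(author, year, title, repository, doi):
    """Format a dataset reference in the layout used for the test upload."""
    return f"{author} ({year}): {title}. {repository}. Dataset. DOI: {doi}."


citation = format_dataset_citation(
    author="Graziotin, Daniel",
    year=2018,
    title="Test upload to demonstrate private sharing for peer review and DBR",
    repository="figshare",
    doi="10.6084/m9.figshare.7048631.v1",
)
print(citation)
print(doi_url("10.6084/m9.figshare.7048631.v1"))
```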
Done. You are now an open science hero and my personal hero.
Double-blind data submission on Zenodo
Zenodo also allows submissions of blinded data, but I find the process less intuitive, and it comes with a twist. As far as I know, the only way of making a submission compatible with double-blind review is to publish the submission as open access [sic], with all details blinded. The other options either render the file inaccessible (the Embargo access and Closed access options) or expose your username during the request process (the Restricted access option).
Important: your dataset will be anonymous and compliant with double-blind review. However, the file will be publicly accessible and indexed on Zenodo.
Start by uploading your dataset. Ensure that you click the green Start upload button after selecting your dataset.
Enter the various details, but avoid revealing identifying information. The following part differs from the figshare approach: enter Anonymous as the author's first and last name. Even though you're using your account to upload the item, the published data will display Anonymous as the author, and your username won't be exposed.
Select the CC-BY 4.0 license. CC-BY mandates attribution when reusing the data and permits any type of reuse9.
As mentioned earlier, choose Open Access as your access right.
Save and publish the dataset. Use the DOI you receive to reference the dataset in your double-blind submission.
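If you prefer scripting over the web form, the same choices can be written down as a metadata payload. The sketch below only builds and prints the payload, with no network call. The field names (`upload_type`, `creators`, `access_right`, `license`) and the license identifier follow my reading of Zenodo's REST API documentation and are assumptions to verify against it before use. The function name is hypothetical.

```python
import json


def blinded_zenodo_metadata(title, description):
    """Build a deposit metadata payload with all author details blinded."""
    return {
        "metadata": {
            "title": title,                  # must not identify the authors
            "upload_type": "dataset",
            "description": description,      # must not identify the authors
            "creators": [{"name": "Anonymous"}],
            "access_right": "open",          # Open Access, as described above
            "license": "cc-by-4.0",          # assumed identifier for CC-BY 4.0
        }
    }


# Hypothetical title and description, for illustration only.
payload = blinded_zenodo_metadata(
    title="Dataset for a submission under double-blind review",
    description="Blinded replication package.",
)
print(json.dumps(payload, indent=2))
```

Keeping the payload in a file next to your dataset also documents exactly what was blinded, which makes the later reveal step less error-prone.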
This is what reviewers will see:
Open data upon acceptance on Zenodo
Congratulations on getting your paper accepted. There are only two steps required to make your Zenodo submission open data. Technically, it was already open data. You just need to reveal it now.
- Replace the Anonymous entry with the actual author details.
- Use the DOI to properly reference your open data in your paper.
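For those who keep their deposit metadata in a script, the reveal step amounts to swapping the creators list. The sketch below works on a plain dictionary and performs no network call. The payload layout mirrors my reading of Zenodo's deposit metadata and is an assumption, and the function name and author names are hypothetical.

```python
import copy


def reveal_authors(payload, authors):
    """Return a copy of the payload with Anonymous replaced by real authors."""
    if not authors:
        raise ValueError("at least one author name is required")
    revealed = copy.deepcopy(payload)  # leave the blinded payload untouched
    revealed["metadata"]["creators"] = [{"name": name} for name in authors]
    return revealed


# Hypothetical blinded payload and author names, for illustration only.
blinded = {
    "metadata": {
        "title": "Dataset for a submission under double-blind review",
        "creators": [{"name": "Anonymous"}],
        "access_right": "open",
    }
}
revealed = reveal_authors(blinded, ["Doe, Jane", "Roe, Richard"])
print(revealed["metadata"]["creators"])
```

Remember that saving a published record creates a new published version, so double-check the author list before saving.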
That’s all. For any questions, feel free to contact me or leave a comment below. This is particularly relevant if I have overlooked something about Zenodo.
- This post is aimed at the software engineering research community, but it applies to any discipline.
- I have collaborated with CHASE, MSR, PROFES, ESEM, and ESEC/FSE [to be defined] on open science policies, serving as the open science chair for all except MSR. Fun fact: the CHASE workshop was the first software engineering venue to adopt open science policies in 2016. At this point, I would like to extend my sincere gratitude to Daniel Méndez for his support and his attempts to incorporate open science practices into conferences while involving me in the process.
- Both figshare and Zenodo are versioned. Each time you save a published submission, a new version is created and also published. Unpublishing a submission and its versions is not straightforward. Proceed with caution.
- Archiving means there are mechanisms to ensure that data is appropriately preserved, duplicated, and distributed in such a way that it will stand the test of time even if the hosting platform should fail. See, for example, LOCKSS and CLOCKSS.
- This approach is also compatible with single-blind review and open peer review.
- Edit 2019-09-26: As of September, I am no longer an advisor for any company, including figshare. Full disclosure: I am a figshare advisor. If you ask me to pick between figshare and Zenodo, my answer will be biased.
- Make file(s) confidential is an option used when only metadata should be made publicly available. In software engineering research, it's unlikely we will need to use this.
- Some people have asked me why not opt for the CC BY-NC 4.0, which does not permit commercial use of your data. Here's why [1].
I do not use a commenting system anymore, but I would be glad to read your feedback. Feel free to contact me.