
Some thoughts on institutional research software management and persistent identification

Pablo de Castro, Open Access Advocacy Librarian at U Strathclyde (with thanks to Alan Morrison, Research Data Management Officer, for explaining the institutional workflows around research software at Strathclyde Uni)

See also this previous StrathOA blog post by Alan Morrison "Depositing, distributing & citing software and code (A Zenodo – GitHub integration)"

DataCite will hold a webinar on "DOIs for research software" on Wed May 24th (two weeks from the time of writing). The event is a good opportunity to share some thoughts on research software and on efforts to identify it persistently. These thoughts cover how far institutions specifically support research software management, plus some considerations on persistent identifiers. The latter are prompted by a question the event title immediately raises: why is it called "DOIs for research software" rather than "persistent identifiers for research software"?

a. On research software management and its support from institutions

1. Research software is a key part of the gradually developing European Open Science Cloud (EOSC). It is also a critical element in any discussion of research reproducibility.

Slide from the presentation “Software – a different kind of research object?” delivered by Neil Chue-Hong (Software Sustainability Institute) at the Lancaster University 3rd Data Conversation linked below

2. Data repositories have been collecting software for quite some time, but depositing it tends to be a researcher-led task. Proactive institutions can certainly support their academics here, usually as part of a wider conversation on Research Data Management; see for instance the inspiring 3rd Data Conversation "Software as data" held by colleagues at Lancaster University on Oct 3rd, 2017.

3. However, institutions seldom address research software management as a separate area with its own workflows and resources; it is usually handled as part of their RDM work. RDM policies are quite widespread (Strathclyde has recently issued one), but they tend not to include sections specifically devoted to research software management.

4. When discussing broad lines of work such as persistent identifiers for research software, the institutional perspective is highly relevant. Members of an institutional Open Research team are arguably best placed to give the kind of Open Research implementation advice that would ensure research software is always persistently identified. Critically, they can provide this advice in a discipline-agnostic way. This places them at the forefront of any dissemination activity around PIDs, not just for software but for other entities too (datasets, but also projects or research equipment and facilities).

5. The intersection between persistent identification and institutional advocacy offers Open Research implementation teams a potential route towards more holistic support for managing the various research outputs produced by research groups, departments and schools.

b. On persistent identifiers for research software (or for any other entity in the area of "emerging PIDs" such as geosamples, conferences or research equipment and facilities)

A prominent research information management workflow modeller made the following remark during a discussion on PIDs at a recent euroCRIS event: "After extensively discussing the issue within the team, we decided not to implement a PID-issuing feature for all sorts of entities in [specific commercial CRIS solution] – which we could easily do from a technical perspective – because we could add to the confusion by enabling a mechanism to inadvertently create duplicate unique persistent identifiers for those entities".

An interesting example of this risk of duplication is the VasoTracker software, developed by researchers at the Universities of Strathclyde and Durham within the Wellcome-funded 'Optical Cannula' project (persistent grant ID https://doi.org/10.35802/202924, among other acknowledged funding sources). As described on the VasoTracker website, this is a collection of open source tools for studying vascular physiology. The motivation for its development is also explained on the homepage.

As the VasoTracker software has not been deposited in Pure, the system Strathclyde uses as its data repository, it has no DOI. The reasons it hasn't been deposited probably come down to (i) the frequent misconception among researchers that datasets only cover the supplementary data underpinning publications and (ii) a likely wish to avoid keeping what has quickly become a live software package up to date in several places at once, which may have led to the website (and its associated GitHub repository) being chosen as the default 'containers' for the code.

So would Strathclyde researchers developing code, and their institutional Open Research support teams, learn any new tricks at a webinar on "DOIs for research software"? Presumably yes, even if only about how Zenodo can help with depositing, maintaining and versioning code. Perhaps DataCite will also soon start supporting the issuing of DOIs for research software via Fabrica, as it already does for geosamples and might one day do for research instruments and facilities.
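Zenodo registers its DOIs through DataCite, so one way to check whether a piece of research software already carries a DataCite DOI is to search DataCite's public REST API. The Python sketch below assumes the `https://api.datacite.org/dois` endpoint, its `query` and `resource-type-id` parameters, and the JSON:API response layout (`data[].attributes` with `doi`, `titles` and `types`); these should be checked against DataCite's current API documentation, and the helper names are hypothetical.

```python
"""Sketch: look up software DOIs registered with DataCite.

Assumptions (verify against DataCite's API docs): the /dois endpoint,
the `query` and `resource-type-id` filters, and the JSON:API layout
data[].attributes.{doi, titles, types}.
"""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DATACITE_DOIS = "https://api.datacite.org/dois"


def software_search_url(query: str, page_size: int = 25) -> str:
    """Build a search URL restricted to records typed as Software."""
    params = {"query": query, "resource-type-id": "software", "page[size]": page_size}
    return f"{DATACITE_DOIS}?{urlencode(params)}"


def extract_software_dois(payload: dict) -> list[tuple[str, str]]:
    """Return (DOI, first title) pairs for Software records in a response."""
    results = []
    for record in payload.get("data", []):
        attrs = record.get("attributes", {})
        # Belt-and-braces check in case the server-side filter is ignored.
        if attrs.get("types", {}).get("resourceTypeGeneral") != "Software":
            continue
        titles = attrs.get("titles") or [{}]
        results.append((attrs.get("doi", ""), titles[0].get("title", "")))
    return results


def search_software(query: str) -> list[tuple[str, str]]:
    """Run the search against the live API (needs network access)."""
    with urlopen(software_search_url(query)) as resp:
        return extract_software_dois(json.load(resp))
```

Calling `search_software("VasoTracker")` would show whether any Software-typed DOI mentions the package; the parsing is kept separate from the network call so it can be checked offline.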

There is, however, one interesting aspect of the VasoTracker software that bears on the remark above about the risk of inadvertently creating duplicate unique PIDs. VasoTracker already has a PID. It's an RRID rather than a DOI, granted (hence the nuanced title of the DataCite webinar?), but it is still a persistent identifier. How RRID:SCR_017233 came to be assigned is not easy to tell. It's highly unlikely that it resulted from outreach by the researchers who developed the software; it looks rather as if it had been identified automatically by an algorithm searching across the Internet, including all GitHub repositories.
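RRIDs are sometimes written with a space after the colon, which matters if they are to be matched or cross-linked by machines. A minimal normalisation sketch in Python follows. The pattern (capital-letter prefix, underscore, accession) and the resolver base URL `https://scicrunch.org/resolver/` are assumptions for illustration, not the RRID specification.

```python
import re

# Assumed shape: "RRID:" + capital-letter authority prefix + "_" + accession.
# Deliberately permissive after the underscore, since accession formats vary.
RRID_PATTERN = re.compile(r"^\s*RRID:\s*([A-Z]+_[A-Za-z0-9_:-]+)\s*$")
RESOLVER = "https://scicrunch.org/resolver/"  # assumed resolver base URL


def normalise_rrid(raw: str) -> str:
    """Return the compact form RRID:PREFIX_ACCESSION, or raise ValueError."""
    match = RRID_PATTERN.match(raw)
    if not match:
        raise ValueError(f"not a recognisable RRID: {raw!r}")
    return f"RRID:{match.group(1)}"


def resolver_url(raw: str) -> str:
    """Build a resolver link from a (possibly untidy) RRID string."""
    return RESOLVER + normalise_rrid(raw)
```

With this sketch, `"RRID: SCR_017233"` and `"RRID:SCR_017233"` normalise to the same string, so they compare equal when cross-linking.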

In fact, AI-driven PID cross-linking routines could quite quickly populate the PID Graph that we are so painstakingly building these days. The SciCrunch portal that hosts these RRIDs is already able to crawl the published research literature for references to a specific 'identified entity' (a software package in this case, but also a research instrument or facility, or an antibody), with the caveat that the literature needs to be available Open Access; otherwise even the cleverest modern algorithm will crash into old-fashioned, profit-driven paywalls.

SciCrunch's identification of the research publications that cite this RRID-tagged software is not yet perfect: the list on the RRID webpage includes only two of the seven references shown on the VasoTracker webpage (as identified by the software creators themselves). Still, the fact that these references appear at all on the same SciCrunch page that describes RRID:SCR_017233 is huge progress, and a hint of what we will be able to achieve in the not-so-distant future.
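The PID Graph idea, and the two-out-of-seven coverage figure, can be made concrete with a toy example. In the Python sketch below, nodes are identifier strings and edges are typed relations; the relation name "Cites" is borrowed from DataCite's relationType vocabulary, while the class, functions and paper identifiers are hypothetical.

```python
from collections import defaultdict


class PIDGraph:
    """Toy PID graph: nodes are identifier strings, edges are typed relations."""

    def __init__(self) -> None:
        # (target PID, relation) -> set of source PIDs pointing at the target
        self._incoming = defaultdict(set)

    def add(self, source: str, relation: str, target: str) -> None:
        """Record that `source` stands in `relation` to `target`."""
        self._incoming[(target, relation)].add(source)

    def citing(self, target: str) -> set[str]:
        """Return every PID recorded as citing `target`."""
        return set(self._incoming.get((target, "Cites"), set()))


def coverage(found: set[str], expected: set[str]) -> float:
    """Share of the expected citing works that the graph actually found."""
    return len(found & expected) / len(expected) if expected else 0.0


# Hypothetical identifiers: the creators list seven papers, the crawler found two.
software = "RRID:SCR_017233"
creators_list = {f"hypothetical-paper-{i}" for i in range(1, 8)}
graph = PIDGraph()
for paper in ("hypothetical-paper-1", "hypothetical-paper-2"):
    graph.add(paper, "Cites", software)
print(f"{coverage(graph.citing(software), creators_list):.0%}")  # 29%
```

Measuring coverage against a list curated by the software creators gives a simple yardstick for how far automated cross-linking still has to go.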

The risks of duplication highlighted by the research information management colleague at the euroCRIS meeting should, however, be kept in mind as DOI coverage expands. While duplication is not necessarily a problem per se, it would make sense for the different PID initiatives to provide some (reasonably simple) mechanism for mapping duplicate entries to each other.
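One reasonably simple mapping mechanism is a crosswalk recording which identifiers denote the same entity. The Python sketch below uses a union-find (disjoint-set) structure, so that declaring A = B and B = C also makes A = C findable. This is an illustrative technique, not something any PID initiative is known to run; the DOI and repository URL in the usage are hypothetical placeholders, since VasoTracker has no DOI.

```python
class PIDCrosswalk:
    """Union-find over identifiers: each set holds the PIDs of one entity."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, pid: str) -> str:
        """Return the representative PID of `pid`'s set (adding it if new)."""
        self._parent.setdefault(pid, pid)
        root = pid
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every PID on the way straight at the root.
        while self._parent[pid] != root:
            self._parent[pid], pid = root, self._parent[pid]
        return root

    def declare_same(self, a: str, b: str) -> None:
        """Record that identifiers `a` and `b` refer to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, pid: str) -> set[str]:
        """All identifiers known to denote the same entity as `pid`."""
        root = self._find(pid)
        return {p for p in list(self._parent) if self._find(p) == root}


crosswalk = PIDCrosswalk()
# Hypothetical duplicates for one software package.
crosswalk.declare_same("RRID:SCR_017233", "doi:10.9999/hypothetical-deposit")
crosswalk.declare_same("doi:10.9999/hypothetical-deposit", "https://example.org/hypothetical-repo")
print(sorted(crosswalk.equivalents("https://example.org/hypothetical-repo")))
```

The point of the design is that each initiative only has to assert pairwise equivalences it knows about; transitive closure comes for free.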

#research software #research data management #persistent identifiers #PIDs #DataCite #PID graph #VasoTracker

On “going deeper than the article”

It may seem a strange thing to take an interest in, but DOIs (digital object identifiers) are commonplace for readers of journals, and understanding how this heaving virtual library is organised can be useful in all sorts of ways.

I stumbled upon an (arguably) interesting blog post, "DOIs unambiguously and persistently identify published, trustworthy, citable online scholarly literature. Right?" (via PLOS Blogs: "Research findings: going deeper than the article").

CrossRef is one DOI registration agency, but there are actually 8 such agencies, and the DOI string itself carries no indication of which agency registered it. This leads to inconsistencies in how DOIs work with the various web services known as APIs (application programming interfaces), which provide standardised ways to access information about the content at a DOI 'address'. For example:

The South Park movie has a DOI:

http://dx.doi.org/10.5240/B1FA-0EEC-C316-3316-3A73-L

The following two DOIs point to the same article; there is no apparent difference between the two copies:

http://dx.doi.org/10.6084/m9.figshare.91541

http://dx.doi.org/10.1038/npre.2012.7151.1

These journals assigned DOIs, but not through CrossRef:

http://dx.doi.org/10.3233/BIR-2008-0496

http://dx.doi.org/10.6084/m9.figshare.95564

http://dx.doi.org/10.3205/cto000081

To search metadata for the above examples, you need to visit four sites:

http://search.crossref.org

https://ui.eidr.org/search

https://www.medra.org/en/search.htm

http://search.datacite.org/ui
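Part of this four-sites problem can be automated. The DOI resolver offers a lookup reporting which registration agency (RA) holds a given DOI, and it also supports content negotiation, where the same DOI URL returns machine-readable metadata when requested with a suitable `Accept` header. The Python sketch below assumes the `https://doi.org/ra/` lookup path and its JSON reply shape, as well as the RA name spellings used as keys in `SEARCH_SITES`; all helper names are hypothetical, and the network-dependent `lookup_agency` is shown without a sample result.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request, urlopen

RESOLVER = "https://doi.org/"
# Assumed endpoint: the resolver's RA lookup, expected to answer with a
# JSON list such as [{"DOI": "...", "RA": "..."}].
RA_LOOKUP = RESOLVER + "ra/"

# Hypothetical mapping from RA name (spellings assumed) to the four search sites.
SEARCH_SITES = {
    "Crossref": "http://search.crossref.org",
    "EIDR": "https://ui.eidr.org/search",
    "mEDRA": "https://www.medra.org/en/search.htm",
    "DataCite": "http://search.datacite.org/ui",
}


def bare_doi(url_or_doi: str) -> str:
    """Strip a resolver prefix such as http://dx.doi.org/ to get the bare DOI."""
    for prefix in ("http://dx.doi.org/", "https://dx.doi.org/",
                   "http://doi.org/", "https://doi.org/"):
        if url_or_doi.startswith(prefix):
            return url_or_doi[len(prefix):]
    return url_or_doi


def agency_from_payload(payload: list) -> str:
    """Read the agency name out of an RA lookup response."""
    return payload[0].get("RA", "unknown") if payload else "unknown"


def lookup_agency(doi: str) -> str:
    """Ask the resolver which agency registered `doi` (needs network access)."""
    with urlopen(RA_LOOKUP + quote(bare_doi(doi), safe="/")) as resp:
        return agency_from_payload(json.load(resp))


def search_site_for(agency: str) -> Optional[str]:
    """Pick the agency-specific search site, if it is one of the four above."""
    return SEARCH_SITES.get(agency)


def csl_json_request(doi: str) -> Request:
    """Build a content-negotiation request for CSL JSON metadata: one route
    that should work for agencies supporting it, such as Crossref and DataCite."""
    return Request(RESOLVER + quote(bare_doi(doi), safe="/"),
                   headers={"Accept": "application/vnd.citationstyles.csl+json"})
```

Passing the request from `csl_json_request` to `urlopen` would fetch citation metadata without needing to know in advance which agency's search site to visit.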

The truth is, the DOI has nothing specifically to do with citation or scholarly publishing. It is simply an identifier that can be used for virtually any application.

One of the more common companions to CrossRef (a not-for-profit body representing commercial publishers) is DataCite (comprising public research organisations, plus Microsoft Research), which assigns DOIs for the up-and-coming data repository figshare.

See also:

FundRef: CrossRef's service for accessing funding information related to scholarly content.

CrossCheck: CrossRef's plagiarism-screening service, powered by iThenticate, from the same company as Turnitin, which many universities use to check essays for plagiarism. Presentation slides from the neuroscientist who created Turnitin describe how it works with CrossCheck.

CrossMark: informs readers of changes to content in a publication, including from within a PDF file. A video explaining it is available on CrossRef's site, and some of their conference slides are on SlideShare.
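FundRef's funder identifiers can also be queried programmatically. The Python sketch below assumes Crossref's public REST API `funders` endpoint with `query` and `rows` parameters, and a response whose `message.items` entries carry `name` and `uri` fields; check these against Crossref's API documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_FUNDERS = "https://api.crossref.org/funders"


def funder_search_url(name: str, rows: int = 5) -> str:
    """Build a Crossref REST API query for funders whose names match `name`."""
    return f"{CROSSREF_FUNDERS}?{urlencode({'query': name, 'rows': rows})}"


def funders_from_payload(payload: dict) -> list[tuple[str, str]]:
    """Return (funder name, funder identifier URI) pairs from a response.
    The 'message', 'items', 'name' and 'uri' keys are assumed field names."""
    items = payload.get("message", {}).get("items", [])
    return [(item.get("name", ""), item.get("uri", "")) for item in items]


def search_funders(name: str) -> list[tuple[str, str]]:
    """Query the live API (needs network access)."""
    with urlopen(funder_search_url(name)) as resp:
        return funders_from_payload(json.load(resp))
```

As with the other sketches, parsing is kept apart from the network call so the extraction logic can be checked against a saved response.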

#scientific research #digital object identifier #CrossRef #DataCite #Fundref #CrossCheck #CrossMark

#data #repository #datacite #biomed

Check out this presentation by Björn Brembs.

Hear him speak live tomorrow at the DataCite Workshop in Köln.

#DataCite #Björn Brembs #open source #open data #open science