My last project update included information about self-depositing software in platforms such as Zenodo, Figshare, and the Open Science Framework (OSF). All the repositories and project management tools discussed have integrations with GitHub and other source code hosting platforms, which help provide stable homes for software and encourage software citation. In addition to these platforms, institutional repositories (IRs) offer yet another location for self-depositing software. In my research into how Git repositories are archived, I am interested in whether institutional repositories are practical places for source code. My investigations, which are exploratory, consider how IRs handle more complex files, such as different types of source code and software, as well as how they account for multiple versions of a particular piece of software. As part of this, I examine the limitations and problems that have been raised about IRs generally, as well as various solutions suggested for moving beyond IRs in favor of a more federated and networked system. One of the main questions in which I am interested is: what are the benefits and drawbacks of depending on the current distributed model of IRs for long-term preservation of and access to source code?
The IASGE team is deviating from our regularly scheduled program of research to talk about restrictions being placed on members of our community. A disclaimer: the opinions expressed here do not necessarily reflect those of NYU or the Sloan Foundation. On July 20, 2019, Shahin Sorkh, a computer engineering student and full-time developer, was trending on Hacker News, where someone had linked his blog post about what it is like to be a dev in Iran. This post, previously hosted on GitHub Pages, 404'd on July 25th when GitHub decided to comply with U.S. export control laws, which apply to "Specially Designated Nationals (SDNs) and other denied or blocked parties under U.S. and other applicable law [...] including prohibited end uses described in 15 CFR 744", as mentioned in the Trade Controls page on GitHub Help. Users with IP addresses originating in Crimea, Sudan, Cuba, Iran, North Korea, and Syria, or whose payment history or other information linked them to those locations, were affected.
And when we say affected...we mean that users received emails from GitHub saying that their accounts were blocked. They were given little to no advance notice and no option to back up their repositories.
We, the IASGE team, have chosen to write about this because restricting members of the Git community, even when authorized by federal law, has far-reaching and chilling consequences for open source, open scholarship, and the open exchange of information and ideas.
Self-archiving is the act of depositing a copy of a digital object in a repository so that it is openly accessible to researchers and the public free of charge. While theses, dissertations, pre- and post-prints, and other text-based manuscripts often dominate the literature on self-depositing, datasets and software also have a place within this discourse. As part of the Open Access (OA) model, self-archiving in an open repository is considered "Green" Open Access, while "Gold" Open Access is publishing in a fully OA or a hybrid journal (cf. Harnad et al., 2008; Harnad, 2015). This blog post discusses the options available for self-archiving source code so that it is openly available and citable. I begin with a discussion of the problem of source code citability. I then discuss solutions to this problem, including integrations between source code hosting platforms and repositories, the open-access repository Zenodo, and the collaborative workspace Open Science Framework (OSF). Institutional Repositories (IRs) offer yet another option for preserving works and making them discoverable. I will discuss them in my next blog post and will limit this post to non-IR options.