Scholarly version control, community, & method tracking with Git hosting platforms (GHPs)

Even though there are illustrative zines and plain-language guides to Git and Git hosting platforms (e.g., Oh shit, git!, Really Friendly Git Intro, and Non-technical person's guide to becoming an open source software contributor via Github), it is still difficult for many overcommitted scholars to re-imagine their research habits. Many still see a barrier to entry in adding a git commit and a git push to the end of a research day. As with many first impressions, it's hard to shake the idea that Git was originally created by and for computer scientists. There will also always be a collaborator who says "screw it, can we just use Dropbox?" (Raj 2016). All of this is true, but as more users adopt Git hosting platforms (GHPs), new features are being developed, or existing ones hacked, to better serve scholarly needs. This post will cover three of the eight (and counting) notable ways that scholars have incorporated GHPs, specifically GitHub, GitLab, and Bitbucket, into their daily research, documentation, collaboration, application, and advocacy for openness: (1) version control, (2) community & collaboration, and (3) method/protocol tracking.
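
For readers who have not seen it, the end-of-day routine of a git commit and a git push can be quite small. The following is a minimal Python sketch, assuming a local repository that already has a remote and an upstream branch configured; the function names and the commit-message format are hypothetical, and in practice most people simply type the same three git commands in a terminal.

```python
import subprocess
from datetime import date
from typing import List, Optional


def end_of_day_commands(repo_dir: str, note: str,
                        today: Optional[date] = None) -> List[List[str]]:
    """Build (but do not run) the git commands for a daily checkpoint.

    Hypothetical helper: the message format is an illustrative choice.
    """
    today = today or date.today()
    message = f"Research log {today.isoformat()}: {note}"
    return [
        ["git", "-C", repo_dir, "add", "--all"],           # stage new, changed, and deleted files
        ["git", "-C", repo_dir, "commit", "-m", message],  # record a dated snapshot locally
        ["git", "-C", repo_dir, "push"],                   # send it to the upstream branch on the GHP
    ]


def run_checkpoint(commands: List[List[str]]) -> None:
    """Run each command in order.

    Raises subprocess.CalledProcessError if a step fails (for example,
    git commit exits non-zero when there is nothing to commit).
    """
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A day's checkpoint would then be `run_checkpoint(end_of_day_commands("dissertation", "cleaned interview transcripts"))`. Keeping the command-building step separate from the running step makes it easy to print and review the commands before anything touches the repository.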

Web Archiving Source Code?

Although web archiving tools and techniques are primarily used to capture material related to cultural heritage, politics, and social media, they can also be leveraged to capture source code in ways that let users download repositories much as they would on the live web. The question is how we can extend web archiving to capture code and its ephemera in a manner that most fully reflects the scholarly git experience.

For this blog post, I will cover three examples of state-of-the-art web archiving tools (Archive-It, Heritrix, and Webrecorder) and explore examples in which members of the web archiving and software preservation communities have used web harvesting tools to capture software and content from git hosting platforms. These tools were chosen because they are popular among academic libraries for external (subscription-based) and local (in-house) captures, as indicated in the NDSA 2017 Survey of Web Archiving in the United States (Farrell, McCain, Praetzellis, Thomas, & Walker, 2018).
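
One practical question when pointing a crawler such as Heritrix at a git hosting platform is which URLs to seed. The Python sketch below assembles a candidate seed list for a single GitHub repository and renders it as plain text, one URL per line. The choice of pages (issues, pull requests, wiki, releases, commit history) is an assumption about what counts as a repository's ephemera, the helper names are hypothetical, and this is not a configuration format specific to Archive-It, Heritrix, or Webrecorder.

```python
from typing import List

# Assumption: which pages count as "ephemera" is a curatorial choice, not a standard.
EPHEMERA_PATHS = ["issues", "pulls", "wiki", "releases", "commits"]


def github_seed_urls(owner: str, repo: str, branch: str = "main") -> List[str]:
    """Hypothetical helper: candidate seed URLs for one GitHub repository."""
    base = f"https://github.com/{owner}/{repo}"
    seeds = [base]                                         # the repository landing page
    seeds += [f"{base}/{path}" for path in EPHEMERA_PATHS]  # surrounding platform pages
    # The "Download ZIP" link for one branch. GitHub may redirect it to another
    # host, so the crawl scope may need widening for the file itself to be captured.
    seeds.append(f"{base}/archive/refs/heads/{branch}.zip")
    return seeds


def to_seed_list(seeds: List[str]) -> str:
    """Render seeds as plain text, one URL per line, ending with a newline."""
    return "\n".join(seeds) + "\n"
```

For example, `print(to_seed_list(github_seed_urls("example-lab", "survey-analysis")))` prints seven URLs that could be reviewed before being entered into a crawl. Capturing the ZIP link alongside the HTML pages is what would give a replayed copy a downloadable snapshot of the code, not just a picture of the repository page.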

Git moving into open scholarship

A few weeks ago, I broadly defined Git and scholarship as separate entities. My original plan for this second Git-focused post was to list the features that Git hosting platforms offer and match them to the traditional scholarly responsibilities they serve (e.g., publishing a book on GitHub, or collaborating on reproducible data and software via GitLab). But as the school year came to a close, a spark hit me while I reflected on the courses I’ve taken, the projects I’ve delivered, and the conferences I’ve attended. Openness was a common theme throughout these experiences, so now I want to be explicit about why Git is not just for programmers and why adaptations of this tool via Git hosting platforms are beneficial and influential to scholars, research institutions, and anyone else on the path of lifelong learning.

State-of-the-Art Web Archiving Techniques: Part I

Over the next few months, the Investigating and Archiving the Scholarly Git Experience (IASGE) team will explore how academics are using git hosting platforms and how Library and Information Science (LIS) professionals can effectively archive and make accessible the scholarship and scholarly ephemera hosted on platforms such as GitHub, GitLab, and Bitbucket.

My role within this project is to investigate archival and preservation methods for capturing, storing, preserving, and making accessible git repositories, including source code and its contextual ephemera. What I will be exploring ranges from self-archiving git repositories to fully programmatic means of capture. Some of my main research questions include: How can traditional archiving and preservation practices, such as appraisal and policy, inform how git is preserved? How can newer technologies, such as web archiving, be leveraged to include, support, and further the preservation of git for future use?
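
At the self-archiving end of that range, a researcher can already make a complete copy of a repository's history with stock git. The Python sketch below builds the commands for a mirror clone plus a single-file bundle; the directory layout and function names are hypothetical. This captures the git data itself (commits and refs such as branches and tags); platform ephemera such as issues and pull-request discussions live outside the git data and are not included.

```python
import os
import subprocess
from typing import List


def self_archive_commands(repo_url: str, workdir: str, name: str) -> List[List[str]]:
    """Hypothetical helper: commands to mirror a repository and pack it into one file."""
    mirror_dir = os.path.join(workdir, f"{name}.git")
    # Absolute path, because the bundle commands run with -C inside mirror_dir.
    bundle_path = os.path.abspath(os.path.join(workdir, f"{name}.bundle"))
    return [
        # --mirror copies every ref the host exposes into a bare repository.
        ["git", "clone", "--mirror", repo_url, mirror_dir],
        # A bundle packs all refs and their objects into one file git can clone from later.
        ["git", "-C", mirror_dir, "bundle", "create", bundle_path, "--all"],
        # verify checks that the bundle file is valid before it is stored.
        ["git", "-C", mirror_dir, "bundle", "verify", bundle_path],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping with CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A single file is convenient for deposit in an institutional repository or other storage system, and the bundle can later be restored with an ordinary `git clone` pointed at the `.bundle` file.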

What is Scholarship & Git?

In the coming months, we will cover scholars' experiences with Git and Git hosting platforms (e.g., GitLab), why these experiences matter, and how we as LIS professionals can preserve the materials, workflows, and data in these systems. As the Git Research Scientist, I will share protocols, use cases, and features of Git integrated into the academic environment.

This post will start with the basics to make sure that we’re using the same terminology and share the same general foundation of understanding. I begin by asking a few questions: What is the scholarly experience? What is the Git experience? How do these two things intersect? The definitions reviewed here are neither all-inclusive nor mutually exclusive, but they will be expanded upon in coming posts as we look deeper into how scholarship as a concept has evolved, and how scholars have adapted Git as a tool that can fulfill these new needs.