In the coming months, we will cover scholars’ experiences with Git and Git hosting platforms (e.g., GitLab), why these experiences matter, and how we as LIS professionals can preserve the materials, workflows, and data in these systems. As the Git Research Scientist, I will share protocols, use cases, and features of Git as it is integrated into the academic environment.
This post starts with the basics so that we share the same terminology and a common foundation of understanding. I begin with a few questions: What is the scholarly experience? What is the Git experience? How do the two intersect? The definitions reviewed here are neither all-inclusive nor mutually exclusive, and they will be expanded in coming posts as we look more deeply at how scholarship as a concept has evolved and how scholars have adapted Git as a tool that meets these new needs.
What is the scholarly experience?
First, a definition of the “scholarly experience,” to distinguish the needs of the scholarly environment from those of commercial and hobbyist software developers. Git is open to all users, only a portion of whom are working academics or are engaged in projects related to academic research. Because this blog focuses on academic uses of Git, we need to be clear about what we mean by “academic.” A working definition is difficult, however, because the range of experiences that qualify as academic, as research, or as academic research is extremely broad. For the purposes of this study, academic activities are those taking place in, through, or in conjunction with an institution of higher education and/or a major academic funding agency. While scholarship has progressed and transformed over the decades (e.g., through attention to diversity, inclusion, equity, and accessibility), Ernest Boyer’s 1990 book, Scholarship Reconsidered: Priorities of the Professoriate, published by the Carnegie Foundation for the Advancement of Teaching, covers the core tenets that define scholarship: basic research, service (applied research), communication (synthesizing across disciplines, topics, and time), the scholarship of agreement (diplomacy and rigor to the discipline), and teaching and learning.
Boyer’s framework further holds that each core tenet should include an element of public sharing, both to give back to the community and to give peers an opportunity to receive, review, and collaborate on the resulting teaching, research, and service output. This emphasis on synthesizing and communicating a scholar’s work to the peer community, and on opening it to peer review, sustains a continuous cycle of advancement through research and teaching. In this blog, the terms scholar, academic, and researcher are synonymous and used interchangeably.
In short, the scholarly experience “melds discovery, integration, application, and teaching” (Glassick, 1998), while sharing publicly in order to contribute to community development and to gain validation through peer exposure and review. Loertscher (2012) sums this up as “four defining characteristics … 1) questioning, 2) gathering and exploring evidence, 3) trying out and refining new insights, and 4) going public.”
What is the Git experience?
The word “git” is British English slang for an “annoying, ignorant person,” which sets the scene for the Git experience in open-source software. Wikipedia’s Git page notes that when Linus Torvalds developed Git in 2005, he joked that he names his projects after himself: “git” poked fun at his own “egotistical” nature, and he described the program as “the stupid content tracker.” (Bonus: The initial commit of Git is captured in the Internet Archive!) Basically, Git is “a program to manage your source code history… to manage projects, or a set of files, as it changes over time.” (Steeves, What is Git?) Within a few years of Git’s release, widespread use and demand for a more user-friendly interface led to Git-compatible hosting platforms such as GitHub, Bitbucket, GitLab, and SourceForge. These platforms offer the core functionality of the local Git command-line experience and add a friendly web-based GUI, access control levels, and collaborative features such as wikis, task management, bug tracking, and pull requests.
Whether a user works with Git through the command line or through a hosting platform’s GUI (to be reviewed in the next blog post), the basic steps to get started are the same:
- A project’s files, and the record of the activity within them, are stored in a data structure called a repository. A repository is just a regular folder of files, like any folder on your computer, except that someone has initialized Git inside it (by running git init).
- As files are edited, Git can be used to create a snapshot: a record of the changes made at that specific moment in time, along with metadata about those changes (e.g., author, date, message/note).
- There are a few phases in Git’s version control feature:
  - The Working Directory: the project folder that holds your files; edits made here are not yet recorded by Git.
  - The Staging Area: where file changes are marked for inclusion in the next snapshot, but no new repository version is created yet. To stage all changed files:
    git add .
    To view the changed files held in the staging area:
    git status
  - The Repository: where staged changes are committed as a new version of the project’s files:
    git commit -m "message describing changes made"
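The three phases above can be sketched end to end on the command line. This is a minimal illustration assuming Git is installed; the folder name my-project, the file notes.txt, and the author identity are hypothetical placeholders, not part of any real project.

```shell
set -e

# Hypothetical project folder; any regular folder works.
mkdir my-project
cd my-project

# Initialize Git inside the folder, turning it into a repository.
git init

# Placeholder author identity, set for this repository only.
git config user.name "Example Scholar"
git config user.email "scholar@example.edu"

# Working directory: create a file. Git sees it but has not recorded it.
echo "First draft of research notes" > notes.txt
git status --short          # "?? notes.txt" marks an untracked file

# Staging area: mark the change for inclusion in the next snapshot.
git add .
git diff --cached --stat    # summarizes what is currently staged

# Repository: record the snapshot with a descriptive message.
git commit -m "Add first draft of research notes"

# Every commit carries metadata: short hash, author, date, and message.
git log --format="%h %an %ad %s" --date=short
```

Running git status again after the commit should report a clean working tree, since every change has moved through staging into the repository.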
Git and its compatible hosting platforms were created for software product development, and their efficient, simplified workflow suits that purpose. Developers from small startups to tech giants like Microsoft, Spotify, and Airbnb rely on Git and GitHub to optimize efficiency among their software engineers in the agile development process. Because developers can work on copies of the same code files at the same time, merge their changes, and deploy an aggregated version of the code, teams get quick updates, version control, and accountability for each developer’s contributions. Developer retention is also low, with developers tending to stay on a project for less than two years (Paysa, 2017), so Git’s historical provenance lets incoming developers consult the record and pick up where the last person left off. Similarly, scholars often need to document data as they experiment, analyze, and publish results. Peer reviewers and researchers referencing publications need to be able to reproduce those results, and Git’s version control feature, described above, makes this possible (Ram, 2013). This is just one of many benefits that the Git experience offers to scholars.
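To make the reproducibility point concrete, here is a hedged sketch of one common practice, not necessarily the workflow Ram (2013) describes: tagging the commit behind a manuscript so reviewers can later retrieve exactly that version. The project name analysis-project, the tag paper-v1, the file analysis.cfg, and its contents are all hypothetical.

```shell
set -e

# Hypothetical research project with a placeholder author identity.
mkdir analysis-project
cd analysis-project
git init
git config user.name "Example Scholar"
git config user.email "scholar@example.edu"

# Analysis settings used for the submitted manuscript (hypothetical values).
echo "threshold = 0.05" > analysis.cfg
git add analysis.cfg
git commit -m "Analysis settings used in submitted manuscript"

# An annotated tag records who marked this state, when, and why.
git tag -a paper-v1 -m "State of the analysis at manuscript submission"

# Work continues after submission; -a stages modified tracked files.
echo "threshold = 0.01" > analysis.cfg
git commit -am "Tighten threshold for follow-up study"

# A reviewer can read the tagged version without disturbing current work.
git show paper-v1:analysis.cfg      # prints: threshold = 0.05
```

Reviewers could also run git checkout paper-v1 to restore the whole tagged project, at the cost of leaving the repository in a “detached HEAD” state until they switch back to a branch.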
Even so, the scholarly labor and content contributed through Git hosting platforms have not received the same recognition or preservation care as work deposited in data repositories or institutional repositories. This is where our analysis of how academia integrates Git comes into play.
How could you incorporate Git into the academy? How do you use Git in your workflow? My next post will cover popular Git hosting platforms as tools and their common use cases. Tune in next week for a post from our LIS Research Scientist on web archiving and preserving the Git experience.
Greetings! My name is Sarah Nguyen, and I’m the Research Scientist for the Sloan Foundation grant-funded project Investigating & Archiving the Scholarly Git Experience, hosted by NYU Libraries. I am currently wrapping up my first year as an MLIS candidate at the University of Washington iSchool, and I am exploring open technologies and community around accessible data management through a few grant-funded projects in addition to this one: Preserve This Podcast, CUNY City Tech’s OER program, and the Dance Heritage Coalition with the Mark Morris Dance Group Archive Project. Away from the internet, I can be found riding a Cannondale mountain bike or practicing movement through dance.
Boyer, E. L. (1990). Scholarship Reconsidered: Priorities of the Professoriate. Princeton, NJ: Carnegie Foundation for the Advancement of Teaching. https://eric.ed.gov/?id=ED326149
Glassick (1998). Scholarship assessed: Evaluation of the professoriate. The Journal of Academic Librarianship, 24(4), 336–337. https://doi.org/10.1016/S0099-1333(98)90125-2
Loertscher, J. (2012). Using a scholarly approach to improve teaching and learning in biochemistry higher education. Biochemistry and Molecular Biology Education, 40(6), 388–389. https://doi.org/10.1002/bmb.20648
Ram, K. (2013). Git can facilitate greater reproducibility and increased transparency in science. Source Code for Biology and Medicine, 8(7). https://doi.org/10.1186/1751-0473-8-7