IASGE at iPres2019: A Recap and Review

Hello world! Vicky here, project lead for IASGE, making my blog debut to tell you all about how great #iPres2019 was! iPres is the International Conference on Digital Preservation, held this year at the EYE Film Museum, the national museum for film in the Netherlands, located on Amsterdam’s IJ harbour.

iPres2019 was hosted by the Dutch Digital Heritage Network, a collaboration between cultural heritage institutes working to address the challenge of access to digital resources over time, with "[t]he ultimate goal to develop a network of common facilities, services and knowledge base to improve the visibility, usability, and sustainability of the rich digital collections of Dutch heritage institutes" (iPres About Page).

The conference program included hands-on activities such as the Hackathons (one team from the Software Preservation Network came with awesome Emulation as a Service Infrastructure (EaaSI) bootable USB sticks! I had a working EaaSI system in 5 minutes!!) and the "Great Digital Preservation Bake-off" (competing toolkits and solutions!), as well as more traditional conference offerings like a poster and demo session, panels, and paper presentations.

While we were missing Sarah deeply, Genevieve, David Millman (the named PI of the IASGE grant!), and I were able to attend on behalf of the IASGE team.

Left-to-right: David Millman (IASGE PI), Vicky Steeves (IASGE Project Lead), and Genevieve Milliken (IASGE Research Scientist) at iPres 2019.

As part of our dissemination work for this project, Genevieve and I presented a poster at iPres about the environmental scan we undertook to appraise the techniques, tools, standards, and workflows currently available for archiving Git repositories and the scholarly ephemera (e.g., code annotations, discussions on issues and merge requests) found on Git hosting platforms. The current approaches can be placed into four buckets:

  1. Web Archiving: the process of collecting portions of the World Wide Web; web archivists typically use web crawlers to capture web pages automatically due to the massive size and amount of information on the Web. Some projects of interest using web archiving to capture Git repositories: Scholarly Orphans and Autopilot from Webrecorder.
  2. Self-Archiving: researchers archiving their own Git repositories. We observed this tends to happen in Zenodo, figshare, and the Open Science Framework; characteristics that draw researchers to these platforms include versioning, DOI minting, and discovery.
  3. Programmatic Capture: the use of APIs, crawlers, and listers to collect source code and/or its scholarly ephemera from code hosting platforms. Software Heritage uses this method to capture source code from repositories, and the GH Archive uses it to capture scholarly ephemera from GitHub via the GitHub API (see the sketch after this list).
  4. Software Preservation: the preservation of compiled, complex digital objects (here, the software itself is the complex digital object). Organizations of interest for this section include: the Software Preservation Network, the Software Sustainability Institute, and UNESCO’s partnership with Inria and Software Heritage.
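To make the programmatic capture bucket a bit more concrete, here is a minimal sketch of pulling scholarly ephemera (issues and their comment threads) from a single repository via the GitHub REST API. The owner and repository names are hypothetical placeholders, and this is not the pipeline that GH Archive or Software Heritage actually run; it only illustrates the API-based approach.

```python
import json

import requests

API = "https://api.github.com"
OWNER, REPO = "example-owner", "example-repo"  # hypothetical placeholders


def fetch_issues(owner, repo, token=None):
    """Yield every issue (and pull request) in a repository, following pagination."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    url = f"{API}/repos/{owner}/{repo}/issues"
    params = {"state": "all", "per_page": 100}
    while url:
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        yield from resp.json()
        # GitHub signals further pages in the Link header; requests parses it for us
        url = resp.links.get("next", {}).get("url")
        params = None  # the "next" URL already carries its own query string


if __name__ == "__main__":
    # Capture each issue plus its comment thread into one local JSON file
    archive = []
    for issue in fetch_issues(OWNER, REPO):
        comments = requests.get(issue["comments_url"]).json()
        archive.append({"issue": issue, "comments": comments})
    with open(f"{REPO}-ephemera.json", "w") as fh:
        json.dump(archive, fh, indent=2)
```

In practice, anything at scale needs an API token and rate-limit handling, which is part of why services like GH Archive exist in the first place.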

This work is meant to inform how code and ephemera on Git hosting platforms move from a phase in which they are highly active and collaborative to a state in which they are stable, permanently citable, and under active, professional preservation. Ultimately, we hope to fill gaps in the current digital archiving landscape by gathering and interpreting a broad range of scholarship on these and allied topics in order to think more strategically about the future of these vulnerable works.

You can view and click around our poster below, which goes over our findings in these areas; any feedback is welcome! Feel free to reach out to me at vicky.steeves@nyu.edu:



We had a lot of interest in our poster and received insightful questions, helpful feedback, and invitations to collaborate. Below are some candids from the session, taken from the iPres Flickr and all licensed CC BY 4.0, courtesy of the conference photographer Sebastiaan ter Burg.

Vicky and Genevieve in front of their poster.
Vicky explaining the IASGE poster to a conference-goer.

Genevieve talking to a conference-goer about our research.
Genevieve lighting up the background of this photo with laughter.

iPres2019 was not only wonderful because we had the opportunity to present our work, but also because we got to meet colleagues trying to address similar problems, such as:

  • Memento Tracer: a framework for scalable, high-quality web archiving. Martin Klein led a workshop on Monday afternoon, and you can read the collaborative notes here. Memento Tracer has three essential parts:
    • a browser extension that records Traces (a set of instructions for capturing the essence of web publications of a certain class, like capturing an entire Bitbucket repository),
    • a repository where anyone can upload/download/reuse Traces (this is great, because that means Traces can be versioned! and no one has to reinvent the wheel!), and
    • a headless-browser capture process that uses Traces as guidance to navigate and capture web publications (so if we have one working Trace for Bitbucket, we can use it for all the Bitbucket repos we want!). A rough sketch of this Trace-guided capture idea appears after this list.

    Martin explained that Memento Tracer can be used in conjunction with ORCID to track researchers across all the platforms they use for scholarship and preserve their work. You can see examples of how this works for 16 test (but real!) researchers at: https://myresearch.institute.

  • SARA – Software Archiving of Research Artifacts: this was the poster next to our IASGE poster, and it had a very similar goal of preserving academic code! The goal of SARA is to “enable [researchers] to capture the intermediate statuses of their research work already during the process […] The collected research data and the different versions of the associated software tools are therefore traceable for later research.” Right now the requirement for capturing Git repositories is that they must live in GitLab (which I love!), but I’m going to keep my eye on this project for next steps.
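To give a rough feel for the Trace-plus-headless-browser idea behind Memento Tracer, here is a hypothetical sketch using Selenium. The "trace" below is just an invented Python dictionary of CSS selectors to follow from a repository landing page; it is not the real Memento Tracer trace format, tooling, or output (which produces proper web archives), only an illustration of reusable capture instructions driving a headless browser.

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Invented, simplified "trace": which links to follow from a repository landing page.
# The real Memento Tracer format differs; this only illustrates the idea of
# reusable capture instructions guiding a headless browser.
TRACE = {
    "start_url": "https://bitbucket.org/example-owner/example-repo",  # placeholder
    "follow_selectors": ["a[href*='/src/']", "a[href*='/issues/']"],
}


def capture(trace, out_prefix="capture"):
    opts = Options()
    opts.add_argument("--headless=new")
    driver = webdriver.Chrome(options=opts)
    try:
        # Load the starting page named in the trace and keep its HTML
        driver.get(trace["start_url"])
        pages = {trace["start_url"]: driver.page_source}

        # Collect the URLs the trace tells us to follow, then visit and capture each one
        targets = []
        for selector in trace["follow_selectors"]:
            for link in driver.find_elements(By.CSS_SELECTOR, selector):
                href = link.get_attribute("href")
                if href:
                    targets.append(href)
        for url in targets:
            driver.get(url)
            pages[url] = driver.page_source

        # Write each captured page to disk, tagged with its source URL
        for i, (url, html) in enumerate(pages.items()):
            with open(f"{out_prefix}-{i}.html", "w", encoding="utf-8") as fh:
                fh.write(f"<!-- {url} -->\n{html}")
    finally:
        driver.quit()


if __name__ == "__main__":
    capture(TRACE)
```

The appeal of the model is exactly what the workshop emphasized: once someone has written a working set of instructions for, say, Bitbucket, everyone can reuse it for every Bitbucket repository.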

If you want to hear about the other work I presented at iPres and some other things I learned there, I would invite you to check out this blog post on Data Dispatch: https://data-services.hosting.nyu.edu/5-things-i-learned-at-ipres-2019.

The next post on this blog will be an update from Sarah about another notable way in which scholars use Git and Git hosting platforms (following the first four ways she discussed in her July blog post). As Genevieve and Sarah have said in previous posts, IASGE welcomes active conversation about all the topics covered in our blog posts. Please do get in touch with thoughts and recommendations on this recap or related projects. You can contact me via email or submit an issue or merge request to our GitLab repository.