Preservica Users

Leveraging the Preservica API for Data Exports

Maxime Champagne

November 9th, 2020

The journey to becoming a trusted digital repository is a significant endeavour for any organization. As part of Library and Archives Canada’s (LAC) efforts in this regard, we have spent the past two years testing Preservica and integrating it with our workflows to support the management of LAC’s digital holdings.

Preservica is a promising option for the digital preservation sector: it is used in many countries, it is compliant with the OAIS model, and LAC and other institutions benefit from their collective investment in the platform. However, as those familiar with commercial off-the-shelf (COTS) solutions know, such products tend to cover most of an institution’s requirements but not all of them, whereas custom-made solutions can be tailored to specific needs… for a price. The Preservica API helps to bridge that gap. Its functions cover almost every aspect of the platform and allow for greater flexibility.

This article is a contribution from LAC to the Preservica developer blog, in the hope of helping the global community achieve its digital preservation goals by leveraging Preservica functionality.

My name is Maxime Champagne, supervisor of the digital repository at Library and Archives Canada.

Summary of What and Why

LAC relies heavily on its data tape infrastructure for preserving content. Our requirement is to export content from Preservica and store it on data tape while preserving the fixity values and validating them upon transfer.

While working with the Preservica software and cloud-based tools, we found that our collections were growing to the point where we could no longer export them using the regular methods alone.

The Preservica API addresses this by allowing very large volumes of data to be exported, because every export is broken down into small, independent transactions. To support these transactions, we established a framework of generic functions.

Process Overview

Function Summary and Explanation

Get-PreservicaStructure

This function starts at the root of the active Preservica Instance and crawls down through each Structure Object (folder), mapping it to a hash table for future use.

When you later retrieve an object and know the Structure Object (SO) it belongs to, you can look that SO up in the hash table (a collection of key-value pairs) to recover the object’s place in the hierarchy, that is, its full folder path.
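
The idea can be illustrated with a minimal Python sketch (the project itself uses PowerShell-style functions, so this is an illustration rather than LAC’s code). It assumes the hierarchy has already been fetched into nested dictionaries; the `ref`, `title` and `children` field names and the `build_path_map` function are hypothetical and are not part of the Preservica API.

```python
def build_path_map(folder, parent_path=""):
    """Map every folder reference to its full path, walking depth-first."""
    path = f"{parent_path}/{folder['title']}"
    path_map = {folder["ref"]: path}
    for child in folder.get("children", []):
        path_map.update(build_path_map(child, path))
    return path_map


# Hypothetical in-memory hierarchy standing in for the API responses.
root = {
    "ref": "so-1",
    "title": "Publications",
    "children": [
        {"ref": "so-2", "title": "2019", "children": []},
        {"ref": "so-3", "title": "2020", "children": [
            {"ref": "so-4", "title": "Reports", "children": []},
        ]},
    ],
}

path_map = build_path_map(root)
print(path_map["so-4"])  # /Publications/2020/Reports
```

Once the map exists, resolving the folder path of any object is a single dictionary lookup on its parent reference.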

Convert-ArrayToHashMap

This function reads an array object and converts it to a Hash Table (or dictionary object) for fast access.

This removes the need to search the array each time a key is needed; instead, every key is mapped directly to its value.
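
As a rough Python illustration of the same technique (not the project’s own implementation), the sketch below indexes a list of records by one field. The `array_to_hash_map` name and the sample records are assumptions made for the example.

```python
def array_to_hash_map(rows, key_field):
    """Index a list of dictionaries by one of their fields.

    If two rows share the same key, the later row wins.
    """
    return {row[key_field]: row for row in rows}


# Hypothetical records; the field names are chosen for illustration only.
rows = [
    {"ref": "io-1", "fileName": "report.pdf"},
    {"ref": "io-2", "fileName": "minutes.docx"},
]

by_ref = array_to_hash_map(rows, "ref")
print(by_ref["io-2"]["fileName"])  # minutes.docx
```

Building the map costs one pass over the array, after which each lookup takes roughly constant time instead of a scan of the whole array.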

Get-PreservicaObjectsFromFolder -so_ref $root_SO

This function requires a reference to a Structure Object (SO). It crawls down from that folder and stores every Information Object (IO) and Content Object (CO) it finds in an array.
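
A minimal Python sketch of such a crawl follows, assuming the folder contents are already available as nested dictionaries. The `type` and `children` fields and the `collect_objects` function are hypothetical; the real function would obtain each level from the Preservica API.

```python
def collect_objects(folder):
    """Return every non-folder object found at or below `folder`."""
    found = []
    stack = [folder]
    while stack:
        current = stack.pop()
        for child in current.get("children", []):
            if child["type"] == "SO":
                stack.append(child)  # a sub-folder: visit it later
            else:
                found.append(child)  # an IO or CO: keep it
    return found


# Hypothetical folder tree standing in for the API responses.
root_so = {
    "ref": "so-1", "type": "SO", "children": [
        {"ref": "io-1", "type": "IO"},
        {"ref": "so-2", "type": "SO", "children": [
            {"ref": "io-2", "type": "IO"},
            {"ref": "co-1", "type": "CO"},
        ]},
    ],
}

objects = collect_objects(root_so)
print(sorted(o["ref"] for o in objects))  # ['co-1', 'io-1', 'io-2']
```

An explicit stack is used instead of recursion so that very deep hierarchies do not run into the interpreter’s recursion limit.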

Get-PreservicaUpdatedObjects -sinceDate $sinceDate

This function retrieves all objects that were updated since the date specified.
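
The selection logic can be sketched in Python as a simple date filter. This is a local illustration only: the actual function presumably asks the Preservica API for the updated objects, and the `updated` field and `updated_since` function here are assumptions.

```python
from datetime import datetime, timezone


def updated_since(objects, since_date):
    """Keep only the objects whose last update is on or after `since_date`."""
    return [obj for obj in objects if obj["updated"] >= since_date]


# Hypothetical records with timezone-aware timestamps.
objects = [
    {"ref": "io-1", "updated": datetime(2020, 3, 1, tzinfo=timezone.utc)},
    {"ref": "io-2", "updated": datetime(2020, 10, 15, tzinfo=timezone.utc)},
]

since = datetime(2020, 6, 1, tzinfo=timezone.utc)
recent = updated_since(objects, since)
print([obj["ref"] for obj in recent])  # ['io-2']
```

Filtering by date like this is what makes incremental exports possible: each run only needs to handle what changed since the previous one.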

Get-PreservicaMetadataForObjects

This function populates each object in the provided array with generic metadata from Preservica’s XIP schema, such as:

'fileName' = $fileName
'fileSize' = $fileSize
'fixityAlgorithm' = $fixityAlgortihm
'fixityValue' = $fixityValue
'content' = $downloadURL

Get-PreservicaParentDataForObjects

This function enriches each object with data from its parent object, setting the following fields:

$row.path = $path;
$row.io_ref = $io_ref;
$row.securityTag = $securityTag;
$row.so_ref = $so_ref;
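
One way to picture this step is the Python sketch below. It assumes, purely for illustration, that each row knows its parent Information Object (`io_ref`), that the parent carries the security tag and the folder reference, and that a folder-path map has already been built. Which entity each field really comes from is not stated here, and `add_parent_data` and the sample data are hypothetical.

```python
def add_parent_data(rows, io_by_ref, path_by_so_ref):
    """Copy parent details onto each row; unknown parents leave None values."""
    for row in rows:
        parent = io_by_ref.get(row["io_ref"], {})
        row["securityTag"] = parent.get("securityTag")
        row["so_ref"] = parent.get("so_ref")
        row["path"] = path_by_so_ref.get(row["so_ref"])
    return rows


# Hypothetical lookup tables, keyed by reference for fast access.
io_by_ref = {"io-1": {"securityTag": "open", "so_ref": "so-3"}}
path_by_so_ref = {"so-3": "/Publications/2020"}

rows = add_parent_data(
    [{"fileName": "report.pdf", "io_ref": "io-1"}],
    io_by_ref,
    path_by_so_ref,
)
print(rows[0]["path"])  # /Publications/2020
```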

Get-PreservicaEventLogs

This function extracts the last 100 events for each object.

Get-dataFromMODSMetaDataFragment

This function extracts specific metadata stored within a MODS fragment for each object. In the fields below, OCLC refers to LAC’s library cataloguing solution:

'link' = $link – OCLC reference number
'link_source' = $link_source – the source name – being OCLC in this case
'language' = $language – language of the publication
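
The extraction can be sketched in Python with the standard library’s XML parser. The mapping used here is an assumption: it takes the reference number from a MODS `identifier` element, the source name from that element’s `type` attribute, and the language from `language/languageTerm`. The fragments LAC stores may be structured differently, and `extract_mods_fields` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The MODS version 3 namespace.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def extract_mods_fields(fragment):
    """Pull the identifier, its source and the language out of a MODS fragment."""
    root = ET.fromstring(fragment)
    identifier = root.find("mods:identifier", MODS_NS)
    language = root.find("mods:language/mods:languageTerm", MODS_NS)
    return {
        "link": identifier.text if identifier is not None else None,
        "link_source": identifier.get("type") if identifier is not None else None,
        "language": language.text if language is not None else None,
    }


# Made-up sample fragment; the identifier value is not a real record number.
sample = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:identifier type="oclc">123456789</mods:identifier>
  <mods:language>
    <mods:languageTerm type="code">fre</mods:languageTerm>
  </mods:language>
</mods:mods>"""

fields = extract_mods_fields(sample)
print(fields["link_source"], fields["link"], fields["language"])  # oclc 123456789 fre
```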

Get-PreservicaObjects

This function creates the folder structure for each object and downloads the files.
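
Because the stated requirement is to validate fixity values upon transfer, a Python sketch of this last step might combine the folder creation with a checksum comparison. The download itself is replaced by an in-memory byte string, the `save_and_verify` function is hypothetical, and the algorithm name is assumed to be one that Python’s `hashlib` accepts (for example `sha256`).

```python
import hashlib
import tempfile
from pathlib import Path


def save_and_verify(export_root, row, data):
    """Write `data` under the object's folder path and check its fixity."""
    target_dir = Path(export_root) / row["path"].strip("/")
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / row["fileName"]
    target.write_bytes(data)

    # Re-read the written file in chunks so large files are not loaded at once.
    digest = hashlib.new(row["fixityAlgorithm"])
    with target.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest().lower() == row["fixityValue"].lower()


payload = b"example content"  # stands in for the downloaded bytes
row = {
    "path": "/Publications/2020",
    "fileName": "report.txt",
    "fixityAlgorithm": "sha256",
    "fixityValue": hashlib.sha256(payload).hexdigest(),
}

with tempfile.TemporaryDirectory() as export_root:
    ok = save_and_verify(export_root, row, payload)
print(ok)  # True
```

Hashing the file after it has been written, rather than the bytes in memory, checks what actually landed on the destination storage.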

Source Code

The source code for the project can be found at https://github.com/lac-preservica/dams.
