Leveraging the Preservica API for Data Exports
Maxime Champagne
November 9th, 2020
The journey to become a trusted digital repository is a significant endeavour for any organization. As part of Library and Archives Canada’s (LAC) efforts in this regard, we have spent the past two years testing Preservica and integrating our workflows with it to support the management of LAC’s digital holdings.
Preservica is a promising platform for the digital preservation sector: it is used in many countries, it complies with the OAIS model, and LAC and other institutions benefit from their collective investment in it. However, as anyone familiar with commercial off-the-shelf (COTS) software knows, such solutions tend to cover most of an institution’s requirements but not all of them, whereas custom-made solutions can be tailored to specific needs… for a price. The Preservica API helps to bridge that gap. Its functions cover almost every aspect of the platform and allow for greater flexibility in how it is used.
This article is LAC’s contribution to the Preservica developer blog, written in the hope of helping the global community achieve its digital preservation goals by leveraging Preservica functionality.
My name is Maxime Champagne, and I am the supervisor of the digital repository at Library and Archives Canada.
Summary of What and Why
LAC relies heavily on its data tape infrastructure for preserving content. Our requirement is to export content from Preservica and store it on data tape, preserving and validating the fixity values during the transfer.
While working with the Preservica software and cloud-based tools, we found that our collections were growing to the point where we could no longer export them using the regular methods alone.
The Preservica API addresses this. It allows very large volumes of data to be exported, with no practical upper limit, because the work is broken down into many small transactions. To support these transactions, we established a framework of generic functions.
Process Overview
Function Summary and Explanation
Get-PreservicaStructure
This function starts at the root of the active Preservica instance and crawls down through each Structure Object (SO), that is, each folder, mapping it to a hash table for future use.
When retrieving objects, if you know the reference of the Structure Object an object belongs to, you can look it up in the hash table (a collection of key-value pairs) and retrieve its place in the hierarchy (the full folder path).
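The path lookup described above can be sketched as follows. This is a minimal illustration rather than the code from the LAC repository: the sample folders, the `ref`, `title` and `parent` field names, and the `Resolve-FolderPath` helper are all assumptions made for the example, and no API call is made.

```powershell
# Hypothetical sample data: each folder (Structure Object) has a reference,
# a title, and the reference of its parent ($null at the root).
$folders = @(
    [pscustomobject]@{ ref = 'so-1'; title = 'Root';      parent = $null  }
    [pscustomobject]@{ ref = 'so-2'; title = 'Published'; parent = 'so-1' }
    [pscustomobject]@{ ref = 'so-3'; title = '2020';      parent = 'so-2' }
)

# Index the folders by reference so each lookup is a single hash table access.
$structure = @{}
foreach ($folder in $folders) {
    $structure[$folder.ref] = $folder
}

# Hypothetical helper: walk up the parent chain and assemble the full path.
function Resolve-FolderPath {
    param(
        [hashtable]$Structure,
        [string]$SoRef
    )
    $parts = @()
    $current = $Structure[$SoRef]
    while ($null -ne $current) {
        # Prepend the title so the root ends up first.
        $parts = @($current.title) + $parts
        if ($current.parent) {
            $current = $Structure[$current.parent]
        } else {
            $current = $null
        }
    }
    return ($parts -join '/')
}

Resolve-FolderPath -Structure $structure -SoRef 'so-3'   # Root/Published/2020
```

Because the whole structure is mapped once, resolving the path of any object afterwards needs no further requests to the server.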
Convert-ArrayToHashMap
This function reads an array object and converts it to a Hash Table (or dictionary object) for fast access.
The function removes the need to search the array for a given key, and instead maps all the keys with their relevant values.
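As a rough sketch of the idea, assuming each array element exposes a unique key property, the conversion could look like this. The `ConvertTo-LookupTable` name, its parameters, and the sample objects are hypothetical; the actual Convert-ArrayToHashMap in the repository may differ.

```powershell
# Hypothetical helper: index an array of objects by one of their properties.
function ConvertTo-LookupTable {
    param(
        [object[]]$Items,
        [string]$KeyProperty
    )
    $table = @{}
    foreach ($item in $Items) {
        # If two items share a key, the later one replaces the earlier one.
        $table[$item.$KeyProperty] = $item
    }
    return $table
}

# Hypothetical sample objects.
$objects = @(
    [pscustomobject]@{ ref = 'io-1'; title = 'Annual report' }
    [pscustomobject]@{ ref = 'io-2'; title = 'Newsletter' }
)

$byRef = ConvertTo-LookupTable -Items $objects -KeyProperty 'ref'
$byRef['io-2'].title   # Newsletter
```

Scanning the array with Where-Object for every lookup costs time proportional to the array size, whereas the hash table answers each lookup in roughly constant time, which matters once the collection holds many thousands of objects.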
Get-PreservicaObjectsFromFolder -so_ref $root_SO
This function requires a reference to a Structure Object (SO). It crawls down from that folder and collects every Information Object (IO) and Content Object (CO) it finds into an array.
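The crawl can be pictured as a simple recursion. The sketch below runs against a hypothetical in-memory map of folders to children instead of the Preservica API, and the `Get-ObjectsBelowFolder` name and the `ref`, `type` and `so_ref` fields are assumptions made for the example.

```powershell
# Hypothetical in-memory stand-in for the repository: each folder reference
# maps to its direct children. A real crawl would request them from the API.
$children = @{
    'so-1' = @(
        [pscustomobject]@{ ref = 'so-2'; type = 'SO' }
        [pscustomobject]@{ ref = 'io-1'; type = 'IO' }
    )
    'so-2' = @(
        [pscustomobject]@{ ref = 'io-2'; type = 'IO' }
        [pscustomobject]@{ ref = 'io-3'; type = 'IO' }
    )
}

# Hypothetical helper: emit every non-folder object found below a folder.
function Get-ObjectsBelowFolder {
    param(
        [hashtable]$Children,
        [string]$SoRef
    )
    foreach ($child in $Children[$SoRef]) {
        if ($child.type -eq 'SO') {
            # Recurse into sub-folders.
            Get-ObjectsBelowFolder -Children $Children -SoRef $child.ref
        } else {
            # Emit the object, remembering which folder it was found in.
            [pscustomobject]@{ ref = $child.ref; type = $child.type; so_ref = $SoRef }
        }
    }
}

# Wrapping the call in @() guarantees an array even for zero or one result.
$found = @(Get-ObjectsBelowFolder -Children $children -SoRef 'so-1')
$found.Count   # 3
```

Recording the folder reference on each result is what later allows an object to be matched against the structure hash table to recover its full path.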
Get-PreservicaUpdatedObjects -sinceDate $sinceDate
This function retrieves all objects that were updated since the date specified.
Get-PreservicaMetadataForObjects
This function populates the provided array of objects with generic metadata from Preservica’s XIP schema, such as:
'fileName' = $fileName
'fileSize' = $fileSize
'fixityAlgorithm' = $fixityAlgortihm
'fixityValue' = $fixityValue
'content' = $downloadURL
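With the fixity algorithm and value available for each file, the validation mentioned in the summary can be done after a file has been written to its destination. The following is a sketch only: `Test-Fixity` is a hypothetical helper, and the way the algorithm name is normalised is an assumption, since the exact names returned by Preservica are not shown in this article. Get-FileHash is a standard PowerShell cmdlet.

```powershell
# Hypothetical helper: recompute a file's hash and compare it with the
# fixity value recorded in the metadata.
function Test-Fixity {
    param(
        [string]$Path,
        [string]$Algorithm,
        [string]$ExpectedValue
    )
    # Get-FileHash expects names such as SHA1 or SHA256, without a hyphen.
    $name = $Algorithm.Replace('-', '').ToUpper()
    $actual = (Get-FileHash -LiteralPath $Path -Algorithm $name).Hash
    # -eq on strings is case-insensitive, so hex case differences do not matter.
    return ($actual -eq $ExpectedValue)
}

# Demonstration with a small temporary file whose SHA-256 value is well known.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'fixity-sketch.txt'
[System.IO.File]::WriteAllText($file, 'abc')
$expected = 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad'

Test-Fixity -Path $file -Algorithm 'SHA-256' -ExpectedValue $expected   # True
```

A file that fails this check should not be treated as safely transferred; it can be downloaded again or flagged for investigation.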
Get-PreservicaParentDataForObjects
This function enriches the data by associating each object with its parent object and setting the following fields on each row:
$row.path = $path;
$row.io_ref = $io_ref;
$row.securityTag = $securityTag
$row.so_ref = $so_ref;
Get-PreservicaEventLogs
This function extracts the last 100 events for each object.
Get-dataFromMODSMetaDataFragment
This function extracts specific metadata stored within a MODS fragment for each object. In this case, OCLC refers to LAC’s library cataloguing solution:
'link' = $link – OCLC reference number
'link_source' = $link_source – the source name – being OCLC in this case
'language' = $language – language of the publication
Get-PreservicaObjects
This function creates the folder structure for each object and downloads the files.
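A minimal sketch of this last step, assuming each row carries the `path`, `fileName` and `content` fields listed earlier, might look like the following. `New-ExportTarget`, the sample row, and the placeholder URL are hypothetical, and the download call is left commented out because it needs a live server and valid credentials.

```powershell
# Hypothetical helper: build the local target path for one object and make
# sure its folder exists.
function New-ExportTarget {
    param(
        [string]$ExportRoot,
        [psobject]$Row
    )
    $folder = Join-Path $ExportRoot $Row.path
    # -Force creates missing parent folders and does not fail if the folder exists.
    New-Item -ItemType Directory -Path $folder -Force | Out-Null
    return (Join-Path $folder $Row.fileName)
}

# Hypothetical sample row and export location.
$exportRoot = Join-Path ([System.IO.Path]::GetTempPath()) 'preservica-export-sketch'
$row = [pscustomobject]@{
    path     = 'Root/Published/2020'
    fileName = 'report.pdf'
    content  = 'https://example.org/placeholder-download-url'
}
$target = New-ExportTarget -ExportRoot $exportRoot -Row $row

# $headers is a hypothetical hash table holding whatever authentication
# header the API expects.
# Invoke-WebRequest -Uri $row.content -Headers $headers -OutFile $target
```

Handling one file per call keeps each transaction small, so a failed download can be retried on its own without restarting the whole export.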
Source Code
The source code for the project can be found at https://github.com/lac-preservica/dams.