Getting Started

Using the PAR API to create Custom Migrations

Jack O'Sullivan

August 11th, 2020

Since the release of v6, Preservation Actions within Preservica have been defined and controlled using a PAR (Preservation Action Registries) data model. To facilitate this, Preservica’s Registry also exposes a PAR API that allows a full range of CRUD operations on this data. This API also makes it possible to write new migration actions using Preservica’s existing toolset, for example to introduce re-scaling to your image/video migrations, or to produce different output formats altogether. In this article, we will introduce the key concepts in this data model, explain how Preservica uses and interprets them, and introduce the API calls required to create your own custom actions. We will do this through a worked example, using ImageMagick to create a custom “re-size migration” for images.

You should take care when making changes to your Preservica Registry. Changes to the Registry will affect the functionality of Preservica, particularly in relation to preservation actions such as characterisation and validation of content, and the available migration options. Changes to Tools, Preservation Actions and Business Rules that are already present in the Registry (as opposed to new custom Tools, Preservation Actions and Business Rules that you create) may be overwritten in future Preservica upgrades without warning. If you’re in doubt about the consequences of any changes you wish to make, you should contact Preservica support.

This functionality is only available in the EPC and EoP editions of Preservica. To successfully perform the create/update calls described in this article, you will need the credentials of a user with the “SDB_REGISTRY_ADMIN_USER” role. EPC customers can only obtain this role by contacting Preservica support; EoP customers will need to discuss this with the system admins in charge of their user directory.

Endpoints

The PAR API endpoint is part of the Registry. In this article, we will use the Registry at https://demo.preservica.com/Registry; for any URLs listed here, you should substitute your own Registry’s location for this part. All requests and endpoints that form part of the PAR API return JSON-formatted data, except deletion requests, which return no content.

All read-only PAR endpoints are unauthenticated, meaning you don’t need to supply any Preservica credentials to access them. Because you don’t have to specify any headers or parameters, the read-only endpoints are easy to use from a browser, as well as from dedicated API tools such as Postman or SoapUI.

The write requests (Create, Update and Delete) all require authentication using HTTP Basic Auth, with the Preservica credentials of a user with the relevant Registry Admin role.

The endpoints are documented at:

https://demo.preservica.com/Registry/par/documentation.html

Each core entity type in the PAR data model has its own listing endpoint at https://demo.preservica.com/Registry/par/{entity-name}. For example, to get a listing of Tools on Preservica’s demo system, we can send a GET request to:

https://demo.preservica.com/Registry/par/tools
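The read-only calls need nothing more than a plain GET, so a few lines of standard-library Python are enough to script them. This is a minimal sketch: the base URL is the demo Registry used throughout this article and should be replaced with your own, and the helper names `entity_url` and `get_json` are illustrative, not part of any Preservica library. The shape of the returned JSON is not described here, so the sketch simply decodes it.

```python
import json
import urllib.request
from typing import Any, Optional

# Replace with your own Registry's location.
REGISTRY = "https://demo.preservica.com/Registry"


def entity_url(base: str, entity: str, guid: Optional[str] = None) -> str:
    """Listing URL for an entity type, or the URL of one entity when its GUID is given."""
    url = f"{base.rstrip('/')}/par/{entity}"
    return f"{url}/{guid}" if guid else url


def get_json(url: str) -> Any:
    """Plain GET: the read-only PAR endpoints need no credentials or headers."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

For example, `get_json(entity_url(REGISTRY, "tools"))` lists the Tools, and passing the ImageMagick GUID as the third argument fetches that single entry.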

A new entity can be created by sending a POST to this same endpoint, with the JSON of the entity to create forming the POST’s body. It is advisable to include a “Content-Type” header with the value “application/json;charset=UTF-8” with this request.

Each individual entity can be accessed by appending the GUID of its PAR Identifier (see PAR Identifiers below for more details) to the listing URL. For example, to get the entry for ImageMagick, we can send a GET request to:

https://demo.preservica.com/Registry/par/tools/835e539a-abe2-55c1-b58b-8e68da664d8f

Each existing entity can be updated by sending a PUT request to this endpoint, with the JSON that should now define the entity as the PUT’s body. Again, it is advisable to include a “Content-Type” header with the value “application/json;charset=UTF-8” with this request.

Existing entities can be deleted by sending a DELETE request to this endpoint.
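The three write operations differ only in HTTP method and body, and all need HTTP Basic Auth. The sketch below builds such a request with Python’s standard library; `write_request` is a hypothetical helper name, and the credentials are assumed to belong to a user holding the Registry Admin role described earlier.

```python
import base64
import json
import urllib.request
from typing import Optional


def write_request(method: str, url: str, user: str, password: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build a POST, PUT or DELETE request for the PAR API using HTTP Basic Auth."""
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {credentials}"}
    data = None
    if body is not None:
        # POST and PUT carry the entity's JSON; DELETE has no body.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json;charset=UTF-8"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

Sending is then `urllib.request.urlopen(write_request("POST", listing_url, user, password, entity))`; note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, such as a rejected login.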

Key Data Model concepts

PAR Identifiers

One of the key concepts of the Preservation Action Registries project is that it defines models and protocols for registries, not a single master “registry”; therefore, there is no single authoritative naming authority. This leads to a problem of ambiguity. If two different systems both implement a PAR API, and both describe a Preservation Action for something like “migrate-to-pdf”, but one does so in terms of LibreOffice and the other in terms of PostScript, then we would have conflicting Preservation Actions with the same name. One way of avoiding this ambiguity is to give every entity a UUID, which is guaranteed to be unique (or at least, wildly unlikely not to be unique!); however, this is not very human-friendly.

The compromise is to use a three-part Identifier object, consisting of a GUID, a name, and a namespace within which the name is asserted to be unique. With this name-and-namespace approach, we can generate Type-5 GUIDs, so that systems can define entities with any name they like without fear of ambiguity with other systems.

Tools

Simply speaking, a Tool is just a description of a particular piece of software that you want to use to perform some action. If you have third-party software that you wish to run, it is possible to define custom entries and add them to your Registry; however, for security reasons, it is currently only possible to execute the tools Preservica already uses. The good news is that the migration Tools Preservica currently wraps are very flexible and can be used for a wide variety of purposes beyond the actions already defined.

The full range of Tools Preservica currently wraps can be found through a simple HTTP GET request to the endpoint:

https://demo.preservica.com/Registry/par/tools

Most of the Tools that Preservica currently wraps, and all of the migration tools, are command line interfaces.

Preservation Actions

The next entity to consider is the Preservation Action. This is where we define how we actually want to use the tool. A Preservation Action should provide all the parameters required to achieve the desired behaviour, including prototype values for the input and output files, in the order required, as well as the type of Preservation Action being performed (the list of Preservation Action Types can be found at https://demo.preservica.com/Registry/par/preservation-action-types). If you are defining a Property Extraction or Validation action, you also need to define how the output of the Tool can be parsed to extract the values required.

This is perhaps best illustrated with the simplest example: migrating image files to JPG using ImageMagick.

Given a TIFF file called “my_image.tiff”, you can achieve this on the command line by simply running:

convert my_image.tiff my_image.jpg

In this example, the only parameters are the input file name and the output file name, which have to be specified in that order. ImageMagick infers from the “.jpg” in the output file name that you want to convert your input to JPG.

The corresponding Preservation Action in Preservica can be found at https://demo.preservica.com/Registry/par/preservation-actions/eb870704-7f39-52aa-a2dc-6ead71a314b8

The definition consists of 72 lines of JSON:

{
    "constraints": [
        {
            "allowedFormats": [],
            "allowedPropertiesAllOf": [],
            "allowedPropertiesAnyOf": []
        }
    ],
    "description": "Migrate Image files to JPG using ImageMagick",
    "example": "convert ${inputfile} ${outputfile}",
    "id": {
        "guid": "eb870704-7f39-52aa-a2dc-6ead71a314b8",
        "name": "migrate-to-jpg-imagemagick",
        "namespace": "http://par.preservica.com"
    },
    "inputFiles": [
        {
            "description": "File that will be acted upon",
            "file": {
                "filepath": ""
            },
            "name": "inputfile.$ext"
        }
    ],
    "inputToolArguments": [
        {
            "description": "Input File Name",
            "type": "command_line",
            "value": "${inputfile}"
        },
        {
            "description": "Output File Name",
            "type": "command_line",
            "value": "${outputfile}"
        }
    ],
    "localLastModifiedDate": "2019-07-29T10:46:00Z",
    "outputFiles": [
        {
            "description": "File that will be created",
            "file": {
                "filepath": ""
            },
            "name": "inputfile.jpg"
        }
    ],
    "tool": {
        "id": {
            "guid": "835e539a-abe2-55c1-b58b-8e68da664d8f",
            "name": "imagemagick-cli",
            "namespace": "http://par.preservica.com"
        },
        "localLastModifiedDate": "2019-07-29T10:45:52Z",
        "toolLabel": "convert",
        "toolName": "ImageMagick CLI",
        "toolOperatingEnvironments": [
            "windows command line",
            "linux command line"
        ],
        "toolPublisher": "ImageMagick Studio LLC - http://www.imagemagick.org/",
        "toolVersion": "6.7.8-9"
    },
    "type": {
        "id": {
            "guid": "45bad06c-e2c0-5b46-b740-0cd2ed140a6f",
            "name": "mig",
            "namespace": "http://id.loc.gov/vocabulary/preservation/eventType/"
        },
        "label": "migration",
        "localLastModifiedDate": "2019-07-29T10:45:48Z"
    }
}

Breaking this down:

The “constraints” object allows you to constrain the Preservation Action such that it will only run on content that matches the constraint. Currently Preservica does not support this part of the PAR model and you can safely omit it from any POSTs or PUTs you make.

The “description” field describes, in human-readable form and at a high level, what the Preservation Action does.

The “example” should convey a technical description of the underlying mechanism. Here we can see it looks very similar to the command line instruction listed above.

The “id” field is a PAR Identifier (see PAR Identifiers).

In this case, “inputFiles” is an array of one file. Preservica’s Preservation Actions use the convention “inputfile.$ext”, which allows us to define other files that are named in relation to this one, without being prescriptive about what that name or extension is. We can see how this works in this example by skipping down a few lines to “outputFiles”, again an array of one file. Here the name is “inputfile.jpg”. This means that whatever the name of the actual input (e.g. “my_image.tiff”), we will ask for an output that retains the basename (“my_image”) but uses the extension “jpg”. Most migration tools make formatting decisions based on the output file name specified, so this is an important point.

“inputToolArguments” is an array of two entries, matching the command line instruction above: the input file name and the output file name, in that order. Each argument has a description, a type and a value. The description is informational only; the type specifies how the argument is passed (for command line tools, this should be “command_line”); and the value is the literal value to use, except in a few special cases. Here, ${inputfile} and ${outputfile} are prototype values: we use this convention to tell Preservica to substitute the actual input and output file names of the content it is currently being applied to.

The “tool” and “type” fields simply contain the entire Tool and Preservation Action Type entity definitions, i.e. the software to use and the type of action this is.

The “localLastModifiedDate” field reports when this entity was last changed in this specific Registry. When creating or modifying any entity, you do not need to provide this.
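To make the file-naming and argument conventions concrete, the sketch below expands a command-line Preservation Action into an argument list for one real input file. It is an illustration of the conventions described above, not Preservica’s internal implementation; the helper names are hypothetical, and it assumes the first “outputFiles” entry determines the output extension and that “toolLabel” is the executable name.

```python
from pathlib import PurePath
from string import Template
from typing import List


def output_name(actual_input: str, output_pattern: str) -> str:
    """Apply the "inputfile.<ext>" convention: keep the real basename, take the pattern's extension."""
    return str(PurePath(actual_input).with_suffix(PurePath(output_pattern).suffix))


def build_argv(action: dict, actual_input: str) -> List[str]:
    """Expand a command-line Preservation Action into an argument list (no shell involved)."""
    outputfile = output_name(actual_input, action["outputFiles"][0]["name"])
    argv = [action["tool"]["toolLabel"]]  # e.g. "convert"
    for arg in action["inputToolArguments"]:
        if arg["type"] == "command_line":
            # ${inputfile} and ${outputfile} are prototype values; anything else is literal.
            argv.append(Template(arg["value"]).safe_substitute(
                inputfile=actual_input, outputfile=outputfile))
    return argv
```

Applied to the JPG migration above with the input “my_image.tiff”, this yields `["convert", "my_image.tiff", "my_image.jpg"]`, mirroring the command line shown earlier.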

Business Rules

The final key entity is the Business Rule. This is what Preservica actually uses to determine which Preservation Actions can be applied to particular content in a given context. Preservica uses Business Rules to constrain Preservation Actions so that they only apply to content of certain formats, and for certain types of action; this is best illustrated by example. The Business Rule for using the JPEG migration action can be found at https://demo.preservica.com/Registry/par/business-rules/53185e9f-d80d-5dde-ae3d-94dc849ee698

This definition is over 300 lines long, but the bulk of that is in the “formats” list (which is elided here for clarity).

{
  "id": {
    "guid": "53185e9f-d80d-5dde-ae3d-94dc849ee698",
    "name": "jpeg-migration-imagemagick",
    "namespace": "http://par.preservica.com"
  },
  "description": "Migrate image to JPEG using ImageMagick default settings",
  "formats": [{
      "guid": "1a22525d-75ff-52a8-8c53-c4e41873ff43",
      "name": "fmt/3",
      "namespace": "http://www.nationalarchives.gov.uk"
    }, {
      "guid": "698aab31-8e7a-5f1d-9f83-6f68a8908438",
      "name": "fmt/4",
      "namespace": "http://www.nationalarchives.gov.uk"
    },…
  ],
  "formatFamilies": [],
  "preservationActionTypes": [{
      "id": {
        "guid": "45bad06c-e2c0-5b46-b740-0cd2ed140a6f",
        "name": "mig",
        "namespace": "http://id.loc.gov/vocabulary/preservation/eventType/"
      },
      "label": "migration",
      "localLastModifiedDate": "2019-07-29T10:45:48Z"
    }
  ],
  "preservationActions": [{
      "optionalInputProperties": [],
      "outputPropertiesRetrieved": [],
      "outputFilesRetrieved": [],
      "preservationAction": {
        "guid": "eb870704-7f39-52aa-a2dc-6ead71a314b8",
        "name": "migrate-to-jpg-imagemagick",
        "namespace": "http://par.preservica.com"
      },
      "priority": 1
    }
  ],
  "notes": "Use ImageMagick with default settings for migration of image files to compressed JPEG",
  "localLastModifiedDate": "2020-04-08"
}

Breaking this down:

The “id” field is a PAR Identifier.

The “description” field is a high-level human readable summary of the intention of the Rule; this is what Preservica shows the end user in the GUI when asking them to select Business Rules to apply, so keeping this brief is advisable.

The “formats” field is an array of PAR Identifiers for the formats of content that this Rule applies to. In this example, we can see fmt/3 and fmt/4 (both variants of GIF) and in the full rule, most other image formats are also listed. This means that any GIF content (or in fact, most image content) is eligible for migration to JPEG using this Business Rule.

The “formatFamilies” field is an array of Format Family objects. The benefit of defining a Business Rule in terms of Format Families instead of individual formats is that changes to the Format Family mean that the evaluation of whether the rule applies to a specific format is automatically “updated”, i.e. you don’t have to edit the Business Rule itself.

The “preservationActionTypes” field is an array of Preservation Action Type entities. In this case, because conversion to JPEG is always lossy, we use the Business Rule to constrain this migration so that it cannot apply to “Normalizations” where a new Preservation copy of some content is being generated.

The “preservationActions” field is effectively an array of Preservation Actions to use. The “priority” field can be used to describe cases where the intent of the rule can be achieved by multiple different actions, and you want to specify an order in which they should be attempted. Currently, Preservica only supports using lower priority actions if the referenced Preservation Action entity isn’t in the Registry. If you are defining property extraction, you can specify multiple actions with the same priority to run all of the Preservation Actions (i.e., you can use two different tools to extract two different sets of properties).

The “notes” field is intended as a longer-form, human-readable companion to the description, allowing you to record the rationale for the Business Rule in detail.
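The way the “formats”, “preservationActionTypes” and “priority” fields interact can be sketched in a few lines of Python. This is a simplified model for a migration rule only, with hypothetical helper names: it compares formats by name alone, ignores “formatFamilies”, and assumes that priority 1 is tried first, with later entries used only when an earlier action is missing from the Registry. It does not cover the property-extraction case, where actions sharing a priority all run.

```python
from typing import Optional, Set


def rule_applies(rule: dict, puid: str, action_type: str) -> bool:
    """Does a Business Rule cover this format (e.g. "fmt/353") and action type (e.g. "mig")?

    Simplification: formats are compared by name only and formatFamilies are ignored.
    """
    formats = {f["name"] for f in rule.get("formats", [])}
    types = {t["id"]["name"] for t in rule.get("preservationActionTypes", [])}
    return puid in formats and action_type in types


def choose_action(rule: dict, registry_guids: Set[str]) -> Optional[dict]:
    """Return the reference of the first available Preservation Action, by ascending "priority".

    Assumption: priority 1 is tried first; later entries are only used when an earlier
    action is missing from the Registry.
    """
    for entry in sorted(rule.get("preservationActions", []), key=lambda e: e["priority"]):
        if entry["preservationAction"]["guid"] in registry_guids:
            return entry["preservationAction"]
    return None
```

With the JPEG rule above, `rule_applies(rule, "fmt/3", "mig")` is true, while a normalization request for the same GIF is rejected because only the migration type is listed.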

Worked Example

In this worked example, we will walk through exactly what we need to specify so that we can have a Business Rule where our TIFF to JPEG migration automatically resizes the images to match the viewer size used in the Universal Access system. When images are displayed in that system, they are placed on a canvas 600px in height and 904px in width, so for images larger than that, we want to reduce the size so that the image fits into that box.
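The target size for a given image can be worked out in advance. Below is a sketch of the “shrink only if larger, keep the aspect ratio” calculation; ImageMagick performs this itself, and its rounding may differ from Python’s `round` by a pixel in edge cases. The function name `shrink_to_fit` is hypothetical.

```python
from typing import Tuple


def shrink_to_fit(width: int, height: int, box_w: int = 904, box_h: int = 600) -> Tuple[int, int]:
    """Scale (width, height) down to fit inside box_w x box_h, preserving aspect ratio.

    Images that already fit are left unchanged (never enlarged).
    """
    if width <= box_w and height <= box_h:
        return width, height
    # The more constraining dimension determines the scale factor.
    scale = min(box_w / width, box_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a tall 4599x7351px image, height is the constraining dimension (600/7351 is smaller than 904/4599), giving 375x600px; an 800x500px image already fits and is left as it is.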

We can do this using ImageMagick, so we have no need for a new tool.

The ImageMagick website describes various ways of scaling and resizing images; we need the example listed as “Only Shrink Larger Images”, i.e. the command line instruction for our operation is:

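convert my_image.tiff -resize "904x600>" my_image.jpg

Note that this command line instruction is valid for all supported image input formats, so the Preservation Action can be described without needing to specify TIFF at all.

This is very similar to the default example above, but in this case we have two new command line parameters, “-resize” and “904x600>”. On a shell command line, the “904x600>” argument must be quoted, because an unquoted “>” is treated by the shell as output redirection; in the Preservation Action, each argument is supplied as a separate value and is written without quotes.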

Each of these will require a new “inputToolArgument” in our Preservation Action:

{
    "description": "Resize flag",
    "type": "command_line",
    "value": "-resize"
}

and

{
    "description": "Resize value",
    "type": "command_line",
    "value": "904x600>"
}
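Rather than writing the whole new action by hand, it can be derived from the existing JPG migration fetched from the Registry. The sketch below assumes the default action JSON shown earlier as input; `resized_variant` is a hypothetical helper. It inserts the two new arguments between the input and output file names, replaces the id and description, and drops the fields this article says can be omitted (“localLastModifiedDate” and “constraints”).

```python
from typing import Any


def _strip_dates(value: Any) -> Any:
    """Return a copy without "localLastModifiedDate" at any depth; the Registry maintains it."""
    if isinstance(value, dict):
        return {k: _strip_dates(v) for k, v in value.items() if k != "localLastModifiedDate"}
    if isinstance(value, list):
        return [_strip_dates(v) for v in value]
    return value


def resized_variant(base: dict, new_id: dict, description: str, geometry: str = "904x600>") -> dict:
    """Derive a resizing migration from an existing ImageMagick migration action."""
    action = _strip_dates(base)  # builds new dicts/lists, so `base` is left untouched
    action.pop("constraints", None)  # not currently supported by Preservica, safe to omit
    action["id"] = new_id
    action["description"] = description
    action["example"] = f"convert ${{inputfile}} -resize {geometry} ${{outputfile}}"
    args = action["inputToolArguments"]
    # Settings go between the input and output file names, as on the command line.
    out = next(i for i, a in enumerate(args) if a["value"] == "${outputfile}")
    args[out:out] = [
        {"description": "Resize flag", "type": "command_line", "value": "-resize"},
        {"description": "Resize value", "type": "command_line", "value": geometry},
    ]
    return action
```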

We will also need a new id for this Preservation Action. For this, we need to decide on a name, and on the namespace in which we are asserting that this name is unique. The name should be human-readable and, for forward compatibility, safe to use as part of a URL. For that reason, Preservica uses the convention of all lower case, with hyphens rather than spaces as word breaks. We also have a convention for Preservation Actions: the name starts with the type of action, ends with the tool name, and describes what happens in between, so for this action a good example would be “migrate-to-ua-resized-jpg-imagemagick”.
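These naming conventions are easy to check mechanically. The following is a hypothetical helper, not part of Preservica: it composes a name from the action type, a description and the tool, and rejects anything that is not lower-case words joined by single hyphens.

```python
import re

# Lower-case alphanumeric words joined by single hyphens, e.g. "migrate-to-jpg-imagemagick".
URL_SAFE_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def action_name(action_type: str, description: str, tool: str) -> str:
    """Compose "<action type>-<description>-<tool>" and check it is URL-safe."""
    name = "-".join(f"{action_type} {description} {tool}".lower().split())
    if not URL_SAFE_NAME.fullmatch(name):
        raise ValueError(f"name is not lower-case and hyphenated: {name!r}")
    return name
```

For example, `action_name("migrate", "to ua resized jpg", "imagemagick")` returns “migrate-to-ua-resized-jpg-imagemagick”.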

All of Preservica’s Preservation Actions are declared in the namespace http://par.preservica.com (the namespace does not need to be a URL, and any URL used does not need to resolve to a real resource). To create a Type-5 GUID, we need a UUID representing the namespace, so in Preservica’s code, http://par.preservica.com maps to a UUID that was initially chosen at random. For this example, I’m going to use http://par-example.my-institution.org as the namespace, mapping to the randomly generated UUID “ecff79c8-5b8b-45e0-8a66-db854784e589”.

With that, we can use UUID Tools to generate our Type-5 GUID “460cdcc4-8025-5be9-85ab-5aba66f57738”.
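If you prefer to compute the GUID locally, Python’s standard `uuid` module can produce Type-5 (SHA-1, name-based) UUIDs. This sketch assumes the GUID is derived from the name string alone, using the namespace’s UUID; the exact input string Preservica hashes is not stated here, so compare the result with UUID Tools before relying on it. `par_identifier` and the mapping table are hypothetical.

```python
import uuid

# Namespace string -> namespace UUID (chosen at random once, then kept fixed).
NAMESPACE_UUIDS = {
    "http://par-example.my-institution.org": uuid.UUID("ecff79c8-5b8b-45e0-8a66-db854784e589"),
}


def par_identifier(name: str, namespace: str) -> dict:
    """Build a three-part PAR Identifier whose GUID is a Type-5 UUID of the name."""
    guid = uuid.uuid5(NAMESPACE_UUIDS[namespace], name)
    return {"guid": str(guid), "name": name, "namespace": namespace}
```

Because Type-5 UUIDs are deterministic, the same name and namespace always give the same GUID, while the same name in a different namespace gives a different one, which is exactly the property the PAR Identifier relies on.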

Thus, the Preservation Action JSON we require will be:

{
    "description": "Migrate Image files to JPG using ImageMagick downsizing large images so that the resulting image fits inside a 904x600px box",
    "example": "convert ${inputfile} -resize 904x600> ${outputfile}",
    "id": {
        "guid": "460cdcc4-8025-5be9-85ab-5aba66f57738",
        "name": "migrate-to-ua-resized-jpg-imagemagick",
        "namespace": "http://par-example.my-institution.org"
    },
    "inputFiles": [
        {
            "description": "File that will be acted upon",
            "file": {
                "filepath": ""
            },
            "name": "inputfile.$ext"
        }
    ],
    "inputToolArguments": [
        {
            "description": "Input File Name",
            "type": "command_line",
            "value": "${inputfile}"
        },
        {
            "description": "Resize flag",
            "type": "command_line",
            "value": "-resize"
        },
        {
            "description": "Resize value",
            "type": "command_line",
            "value": "904x600>"
        },
        {
            "description": "Output File Name",
            "type": "command_line",
            "value": "${outputfile}"
        }
    ],
    "outputFiles": [
        {
            "description": "File that will be created",
            "file": {
                "filepath": ""
            },
            "name": "inputfile.jpg"
        }
    ],
    "tool": {
        "id": {
            "guid": "835e539a-abe2-55c1-b58b-8e68da664d8f",
            "name": "imagemagick-cli",
            "namespace": "http://par.preservica.com"
        },
        "toolLabel": "convert",
        "toolName": "ImageMagick CLI",
        "toolOperatingEnvironments": [
            "windows command line",
            "linux command line"
        ],
        "toolPublisher": "ImageMagick Studio LLC - http://www.imagemagick.org/",
        "toolVersion": "6.7.8-9"
    },
    "type": {
        "id": {
            "guid": "45bad06c-e2c0-5b46-b740-0cd2ed140a6f",
            "name": "mig",
            "namespace": "http://id.loc.gov/vocabulary/preservation/eventType/"
        },
        "label": "migration"
    }
}

We can now POST this JSON directly to the Preservation Actions endpoint, https://demo.preservica.com/Registry/par/preservation-actions.

Once we’ve done that, our new Preservation Action is ready to be used, so we now need to create the Business Rule that provides our constraints.

In this case, we want to limit the action to content identified as a TIFF (fmt/353), and because we’re not only compressing, but also changing the image dimensions, we only want to allow it for “migration”, not “normalization”.

Again, we will need an id, and again we will follow Preservica’s naming conventions. For migrations, the name combines the output format, a description of the type of migration, and the tool used, so we will use “jpeg-migration-ua-resize-imagemagick”. With the same namespace considerations, our GUID is “ba9d5759-e4c2-5f94-9ada-36fb36c37208”.

We will need to know the PAR Identifier for TIFF (fmt/353), which we can find from any Business Rule involving TIFF, or from the file-format entry for TIFF itself (https://demo.preservica.com/Registry/par/file-formats/fmt/353).

We also need the “Migration” Preservation Action Type and the id of the Preservation Action we just created. These all combine into the following JSON definition:

{
    "id": {
        "guid": "ba9d5759-e4c2-5f94-9ada-36fb36c37208",
        "name": "jpeg-migration-ua-resize-imagemagick",
        "namespace": "http://par-example.my-institution.org"
    },
    "description": "Migrate image to JPEG using ImageMagick with downsize of large images",
    "formats": [
        {
            "guid": "985814d4-147e-590e-b1cb-faff53bef136",
            "name": "fmt/353",
            "namespace": "http://www.nationalarchives.gov.uk"
        }
    ],
    "formatFamilies": [],
    "preservationActionTypes": [
        {
            "id": {
                "guid": "45bad06c-e2c0-5b46-b740-0cd2ed140a6f",
                "name": "mig",
                "namespace": "http://id.loc.gov/vocabulary/preservation/eventType/"
            },
            "label": "migration"
        }
    ],
    "preservationActions": [
        {
            "optionalInputProperties": [],
            "outputPropertiesRetrieved": [],
            "outputFilesRetrieved": [],
            "preservationAction": {
                "guid": "460cdcc4-8025-5be9-85ab-5aba66f57738",
                "name": "migrate-to-ua-resized-jpg-imagemagick",
                "namespace": "http://par-example.my-institution.org"
            },
            "priority": 1
        }
    ],
    "notes": "Use ImageMagick with resize to fit the UA image canvas (only resize images that are larger than the canvas dimensions) for migration of image files to compressed JPEG"
}
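The same rule can be assembled in code from the identifiers worked out above, which is handy if you create several similar rules. This is a sketch with a hypothetical `migration_rule` helper; it hard-codes the “Migration” action type and a single priority-1 action, as in the JSON above. The Preservation Action it references should already exist in the Registry, which is why this article creates the action before the rule.

```python
from typing import Dict, List

# The "Migration" Preservation Action Type, as it appears in the rule above.
MIGRATION_TYPE = {
    "id": {
        "guid": "45bad06c-e2c0-5b46-b740-0cd2ed140a6f",
        "name": "mig",
        "namespace": "http://id.loc.gov/vocabulary/preservation/eventType/",
    },
    "label": "migration",
}


def migration_rule(rule_id: Dict[str, str], description: str, notes: str,
                   formats: List[Dict[str, str]], action_id: Dict[str, str]) -> dict:
    """Assemble a migration-only Business Rule pointing at a single Preservation Action."""
    return {
        "id": rule_id,
        "description": description,
        "formats": formats,
        "formatFamilies": [],
        "preservationActionTypes": [MIGRATION_TYPE],
        "preservationActions": [{
            "optionalInputProperties": [],
            "outputPropertiesRetrieved": [],
            "outputFilesRetrieved": [],
            "preservationAction": action_id,
            "priority": 1,
        }],
        "notes": notes,
    }
```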

We can POST this directly to the Business Rules endpoint, https://demo.preservica.com/Registry/par/business-rules.

Once we’ve done that, we should be able to use this migration within Preservica.

We tested this with a large (4599x7351px) TIFF image. The new Business Rule shows up in the drop-down when we run the Create New Representation workflow, the migration creates a 375x600px JPG, and the details of the command line instruction are recorded in the Event details.
