OPEX

Creating complex assets within an OPEX

Jody Palmer

January 11th, 2023

Until now, users needed to package complex assets as PAX zips: a zipped folder structure, with an optional XIP file for fixity information, as in this example (see Using OPEX and PAX for Ingesting Content):

folder/
	folder.opex
	folder1/
		folder1.opex
		pamphlet.pax.zip.opex
		pamphlet.pax.zip/
			pamphlet.xip
			Representation_Preservation/
				page1/
					page1.tiff
				page2/
					page2.tiff
			Representation_Access/
				programme/
					programme.pdf
		test_doc.pdf
		test_doc.pdf.opex

In Preservica 6.6.1 we are adding the ability to use PAX folders without having to zip them up, and without having to use XIP metadata to supply fixity information. OPEX is being updated to support this with:

  • fixity paths
  • manifest paths
  • asset folders

PAX folders are a specific implementation of these, which Preservica will support.

OPEX with Asset Folders

To support folder assets, there needs to be a common way to identify them. Currently, OPEX metadata files for content files sit next to them, and OPEX metadata files for folders sit inside them; we have to change this relationship slightly. In OPEX 1.2, if a folder has an OPEX metadata file as a sibling, then that folder must be treated as an asset. This places no requirement on the structure, content or metadata within a folder asset: adding an OPEX metadata file alongside the folder is enough to make it a valid asset folder within the OPEX definition.

We don’t place requirements on OPEX consumers to support any particular asset folder package. A consumer should indicate which folder asset structures it supports and report an error if it finds an unsupported one. This allows OPEX to remain agnostic to the definition of an asset folder, which may be heavily tied to the consuming system.
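The sibling rule can be illustrated with a minimal Python sketch. It assumes the sibling metadata file is named after the folder with an .opex suffix (as in the pamphlet.pax / pamphlet.pax.opex example later in this post); the function name find_asset_folders is hypothetical and is not part of any Preservica library.

```python
from pathlib import Path


def find_asset_folders(root: Path) -> list[Path]:
    """Return folders under root that have a sibling '<folder name>.opex' file.

    Assumption: the sibling metadata file is the folder name plus '.opex'.
    Folders identified as assets are not descended into, because OPEX places
    no requirements on what they contain.
    """
    assets = []
    pending = [root]
    while pending:
        current = pending.pop()
        for child in sorted(current.iterdir()):
            if not child.is_dir():
                continue
            sibling = child.with_name(child.name + ".opex")
            if sibling.is_file():
                assets.append(child)   # sibling metadata: treat as an asset
            else:
                pending.append(child)  # ordinary folder: keep walking
    return sorted(assets)
```

An ordinary folder such as folder1/, whose folder1.opex sits inside it, is walked as usual; only a folder with metadata next to it is reported as an asset.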

OPEX Schema Update

The OPEX schema has been extended to allow an OPEX metadata file for asset folders, so that a single OPEX metadata file can describe the content inside an asset folder.

  • The OPEX Manifest now supports relative paths inside an OPEX asset folder
  • Fixities can specify what file they refer to in a relative path attribute inside the OPEX asset folder

Like all parts of OPEX, these elements are not mandatory, but they are highly recommended: when errors occur during the transfer of data across networks or between devices, they allow consumers to stop processing and report the problem to a user. If a manifest element is added to OPEX metadata, it must refer to all files and folders to be ingested, which is already the case for non-asset folders.
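As a rough illustration of how a producer might populate these elements, the following Python sketch walks an asset folder and builds a Transfer section with relative-path fixities and a manifest. The element and attribute names follow the sample in this post; the function name build_asset_folder_opex, the choice of SHA-1 and the upper-case hex digests are assumptions made for the example, not requirements stated here.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

OPEX_NS = "http://www.openpreservationexchange.org/opex/v1.2"


def _tag(name: str) -> str:
    # Qualify an element name with the OPEX 1.2 namespace.
    return f"{{{OPEX_NS}}}{name}"


def build_asset_folder_opex(asset_folder: Path) -> ET.Element:
    """Build OPEX metadata listing every file and folder inside asset_folder."""
    ET.register_namespace("", OPEX_NS)  # serialise with a default namespace
    root = ET.Element(_tag("OPEXMetadata"))
    transfer = ET.SubElement(root, _tag("Transfer"))
    fixities = ET.SubElement(transfer, _tag("Fixities"))
    manifest = ET.SubElement(transfer, _tag("Manifest"))
    folders = ET.SubElement(manifest, _tag("Folders"))
    files = ET.SubElement(manifest, _tag("Files"))

    for path in sorted(asset_folder.rglob("*")):
        # Paths are relative to the asset folder and use forward slashes.
        rel = path.relative_to(asset_folder).as_posix()
        if path.is_dir():
            ET.SubElement(folders, _tag("Folder")).text = rel
        else:
            digest = hashlib.sha1(path.read_bytes()).hexdigest().upper()
            ET.SubElement(
                fixities, _tag("Fixity"),
                {"path": rel, "type": "SHA-1", "value": digest},
            )
            entry = ET.SubElement(files, _tag("File"), {"type": "content"})
            entry.text = rel
    return root
```

The result can be serialised with ET.tostring(root, encoding="unicode") and written next to the asset folder as its sibling .opex file.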

Unzipped PAX are treated as OPEX Asset Folders

OPEX processors can in principle support any asset folder structure, but Preservica can’t support arbitrary folder structures. In Preservica 6.6.1, the processing of OPEX asset folders will support unzipped PAX folder structures (the PAX format is described at Using OPEX and PAX for Ingesting Content). To be treated as a PAX by Preservica, the folder name must end in .pax (lower case). The following unzipped PAX in the pamphlet.pax folder is a valid example of an OPEX asset folder.

folder/
	folder.opex
	folder1/
		folder1.opex
		pamphlet.pax.opex (sibling opex meaning that folder should be treated as an asset)
		pamphlet.pax/
			Representation_Preservation/
				page1/
					page1.tiff
				page2/
					page2.tiff
			Representation_Access/
				programme/
					programme.pdf

The corresponding pamphlet.pax.opex file is shown below (note that all sections of the file are optional):

<OPEXMetadata xmlns="http://www.openpreservationexchange.org/opex/v1.2"> 
       <Transfer>
             <SourceID>de6b04b2-d2e4-4ac7-b869-eced1ac0faea_795449a2-9e3a-40b3-9fec-32bb1cb252bb</SourceID>
             <Fixities>
                   <Fixity path="Representation_Preservation/page1/page1.tiff" type="SHA-1" value="59DDADED34E12AEC38706DE8A6F3EDBCF2E93E9C"/>
                   <Fixity path="Representation_Preservation/page2/page2.tiff" type="SHA-1" value="59DDADED34E12AEC38706DE8A6F3EDBCF2E93E9C"/>
                   <Fixity path="Representation_Access/programme/programme.pdf" type="SHA-1" value="59DDADED34E12AEC38706DE8A6F3EDBCF2E93E9C"/>
             </Fixities>
             <Manifest>
                   <Folders>
                         <Folder>Representation_Preservation</Folder>
                         <Folder>Representation_Preservation/page1</Folder>
                         <Folder>Representation_Preservation/page2</Folder>
                         <Folder>Representation_Access</Folder>
                         <Folder>Representation_Access/programme</Folder>
                   </Folders>
                   <Files>
                         <File type="content">Representation_Preservation/page1/page1.tiff</File>
                         <File type="content">Representation_Preservation/page2/page2.tiff</File>
                         <File type="content">Representation_Access/programme/programme.pdf</File>
                   </Files>
             </Manifest>
       </Transfer>
       <Properties>
             <Title>Pamphlet</Title>
             <Description></Description>
             <SecurityDescriptor>open</SecurityDescriptor>
       </Properties>
       <DescriptiveMetadata>
             <Item xmlns="http://fake.co.uk">
                   <Name>Pamphlet</Name>
             </Item>
       </DescriptiveMetadata>
 </OPEXMetadata>
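On the consuming side, a file like this lets a processor check an asset folder before ingesting it. The sketch below is one possible way to do that in Python, not a description of how Preservica implements it: verify_asset_folder and the HASH_NAMES mapping are hypothetical, and only the "SHA-1" type label appears in the sample above (the other labels are assumptions).

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

NS = {"opex": "http://www.openpreservationexchange.org/opex/v1.2"}
# Assumed mapping from OPEX fixity type labels to hashlib algorithm names.
HASH_NAMES = {"MD5": "md5", "SHA-1": "sha1", "SHA-256": "sha256", "SHA-512": "sha512"}


def verify_asset_folder(opex_file: Path, asset_folder: Path) -> list[str]:
    """Return a list of problems found; an empty list means the folder checks out."""
    problems = []
    root = ET.parse(opex_file).getroot()

    # Check each fixity against the file its relative path points to.
    for fixity in root.findall("opex:Transfer/opex:Fixities/opex:Fixity", NS):
        rel_path = fixity.get("path")
        algorithm = HASH_NAMES.get(fixity.get("type"))
        if rel_path is None or algorithm is None:
            problems.append(f"unusable fixity entry: {fixity.attrib}")
            continue
        target = asset_folder / rel_path
        if not target.is_file():
            problems.append(f"missing file: {rel_path}")
            continue
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual.lower() != fixity.get("value", "").lower():
            problems.append(f"fixity mismatch: {rel_path}")

    # If a manifest is present, it must account for every file in the folder.
    manifest = root.find("opex:Transfer/opex:Manifest", NS)
    if manifest is not None:
        listed = {f.text for f in manifest.findall("opex:Files/opex:File", NS)}
        on_disk = {
            p.relative_to(asset_folder).as_posix()
            for p in asset_folder.rglob("*") if p.is_file()
        }
        for rel in sorted(on_disk - listed):
            problems.append(f"file not in manifest: {rel}")
        for rel in sorted(listed - on_disk):
            problems.append(f"manifest file missing on disk: {rel}")
    return problems
```

A consumer following the recommendation in this post would stop processing and report to the user whenever the returned list is not empty.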

Existing OPEX producers

If you’re already creating OPEX, with or without PAX, you don’t need to change the way you do things. However, if you want to start using complex PAX structures, or you want to simplify your existing ways of working with PAX packages, the OPEX 1.2 schema will be supported in Preservica 6.6.1, along with support for PAX packages as asset folders.