OAI-PMH Data Provider
Preservica provides an OAI-PMH Data Provider implementation for metadata harvesting. This can be used by an external system, such as a cataloging application, to retrieve information about the hierarchy of logical objects - structural objects (folders) and information objects (assets) - in the archive.
Authentication
All requests to the Preservica OAI-PMH Data Provider require authentication. All requests must include a valid Preservica access token (see Chapter 6) in a Preservica-Access-Token HTTP header. Data returned from any method on the API will be based on the access permissions of the user for whom the access token was generated.
For backward compatibility reasons, the OAI-PMH Data Provider also supports HTTP basic authentication, where a valid Preservica user name and password are encoded into an HTTP Authorization header.
Unauthenticated access to the OAI-PMH Data Provider is no longer supported.
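As a minimal sketch of the two authentication options described above, the following Python helper builds the HTTP headers for a request. The function name and its parameters are illustrative assumptions, not part of any Preservica library; only the Preservica-Access-Token and Authorization header names come from this guide.

```python
import base64


def build_auth_headers(token=None, username=None, password=None):
    """Return the HTTP headers for one request to the Data Provider.

    Prefers the access token and falls back to HTTP basic authentication.
    (Hypothetical helper; names are assumptions made for illustration.)
    """
    if token:
        return {"Preservica-Access-Token": token}
    if username is not None and password is not None:
        # Basic authentication: base64 of "user:password".
        credentials = f"{username}:{password}".encode("utf-8")
        encoded = base64.b64encode(credentials).decode("ascii")
        return {"Authorization": f"Basic {encoded}"}
    raise ValueError("an access token or a user name and password is required")
```

The returned dictionary can be passed to whichever HTTP client is used to call the Data Provider.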
OAI-PMH Commands
The Preservica OAI-PMH Data Provider implements all the mandatory command verbs in the protocol. The full details are provided here:
https://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolMessages
The following sections provide any additional details specific to the Preservica implementation.
Identify
See: https://www.openarchives.org/OAI/openarchivesprotocol.html#Identify
This command returns basic information about the repository. Note that the Preservica OAI-PMH Data Provider uses the YYYY-MM-DDThh:mm:ssZ granularity level.
GetRecord
See: https://www.openarchives.org/OAI/openarchivesprotocol.html#GetRecord
This command returns the metadata for a single Preservica record, and requires two query parameters to be specified: identifier and metadataPrefix.
The value of the identifier query parameter indicates the required record. The value takes the form oai:type:unique_id, where type is the type of record ("so" for a structural object, or "io" for an information object) and unique_id is the record’s unique identifier (typically a UUID value).
The value of the metadataPrefix query parameter indicates the required metadata format for the response; the value should be the name of the metadata schema as stored in Preservica. (Schemas can be registered in Preservica through the Administration interface; the registration process is fully covered in the Preservica Administration Guide.) So, if the XIP schema is stored in the system with the name XIP, the parameter should be specified as metadataPrefix=XIP.
To determine whether a record can be returned based on the specified metadata prefix, the Data Provider applies the following rules:
- If XIP is requested, the full XIP metadata record will be returned.
- If the XIP record in the database contains embedded metadata for the requested schema, the embedded metadata is returned; e.g. for metadataPrefix=MODS, if an XIP record contains embedded MODS metadata, that metadata will be returned.
- If oai_dc metadata is requested, but no oai_dc metadata is embedded in the record, the basic XIP fields of the record are transformed to an equivalent oai_dc record. (Support for oai_dc is a requirement of the OAI-PMH protocol.)
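The three rules above can be read as a simple decision procedure. The following Python sketch models them from a client's point of view; it is an illustration only, not the Data Provider's actual implementation, and the function name and return labels are assumptions.

```python
def choose_metadata_source(prefix, embedded_prefixes):
    """Model the rules above for one record.

    prefix is the requested metadataPrefix; embedded_prefixes is the set
    of schema names embedded in the record's XIP. Returns "xip",
    "embedded" or "transformed_dc", or None when the record cannot be
    returned in the requested format.
    """
    if prefix == "XIP":
        return "xip"             # full XIP metadata record
    if prefix in embedded_prefixes:
        return "embedded"        # embedded metadata for that schema
    if prefix == "oai_dc":
        return "transformed_dc"  # basic XIP fields transformed to oai_dc
    return None
```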
For example, the request:
http://server.com/OAI-PMH?verb=GetRecord&identifier=oai:so:62335e6b-6686-400d-80c9-82d1ac8f8c16&metadataPrefix=XIP
will return a single OAI record (as per the standard protocol) for the structural object with the identifier 62335e6b-6686-400d-80c9-82d1ac8f8c16, with the metadata in XIP format.
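A GetRecord request URL like the one above could be assembled as follows. This is a sketch under stated assumptions: the helper names are hypothetical, and http://server.com/OAI-PMH is the placeholder base URL used throughout this guide.

```python
from urllib.parse import urlencode

RECORD_TYPES = {"so", "io"}  # structural object, information object


def make_identifier(record_type, unique_id):
    """Build an identifier of the form oai:type:unique_id."""
    if record_type not in RECORD_TYPES:
        raise ValueError(f"unknown record type: {record_type!r}")
    return f"oai:{record_type}:{unique_id}"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build a GetRecord URL; colons in the identifier are left unescaped."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        },
        safe=":",
    )
    return f"{base_url}?{query}"
```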
ListRecords
See: https://www.openarchives.org/OAI/openarchivesprotocol.html#ListRecords
This command allows the client to request metadata for a number of Preservica records; typically the client specifies a date range for harvesting, with the Data Provider returning the details of any records created, updated and/or deleted during that time.
For the Preservica Data Provider, each OAI-PMH record element returned contains three sections: header, metadata and about (see http://www.openarchives.org/OAI/openarchivesprotocol.html#Record).
- The header contains the unique identifier for the object.
- The metadata section contains an XML fragment conforming to the metadata schema as specified in the metadataPrefix argument of the ListRecords request. (The required schema is specified using a query parameter in the same way as for the GetRecord command.)
- The about section contains the unique identifier, the change type and the basic XIP metadata for the record.
Selective Harvesting
As part of a request using the ListRecords command, a client can include from and until timestamps in a request, using query parameters. This enables the client to avoid retrieving duplicates of data already received. The encoding format for dates is covered in http://www.openarchives.org/OAI/openarchivesprotocol.html#Dates.
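The from and until values must use the YYYY-MM-DDThh:mm:ssZ granularity. A minimal Python sketch for producing them from timezone-aware datetime values is shown here; the function names are hypothetical, and http://server.com/OAI-PMH is a placeholder base URL.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def to_oai_timestamp(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ (UTC)."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def list_records_url(base_url, start, end, metadata_prefix):
    """Build a ListRecords URL for the date range start..end."""
    query = urlencode(
        {
            "verb": "ListRecords",
            "from": to_oai_timestamp(start),
            "until": to_oai_timestamp(end),
            "metadataPrefix": metadata_prefix,
        },
        safe=":",  # keep the colons in the timestamps readable
    )
    return f"{base_url}?{query}"
```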
The following example HTTP GET request made against the Preservica Data Provider harvests all records created, updated or deleted between 1 January 2010 and 4 August 2010, with descriptive metadata in Dublin Core:
http://server.com/OAI-PMH?verb=ListRecords&from=2010-01-01T00:00:00Z&until=2010-08-04T23:59:59Z&metadataPrefix=oai_dc
The example response to this request lists three records: two structural objects that were added to the repository, and one that was deleted:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xip="http://preservica.com/XIP/v6.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAIPMH.xsd">
<responseDate>2010-08-04T11:57:00Z</responseDate>
<request until="2010-08-04" from="2010-01-01" metadataPrefix="oai_dc" verb="ListRecords">http://server.com/OAI-PMH</request>
<ListRecords>
<record>
<header>
<identifier>oai:so:d580fd11-015b-4594-9fa8-ce2548ac4887</identifier>
<datestamp>2010-07-29T13:19:31Z</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Folder 01</dc:title>
<dc:description>Folder 01 full description</dc:description>
<dc:type>StructuralObject</dc:type>
<dc:identifier>d580fd11-015b-4594-9fa8-ce2548ac4887</dc:identifier>
<dc:date>2010-07-29T13:19:31Z</dc:date>
</oai_dc:dc>
</metadata>
<about>
<aboutRecord xmlns="http://www.preservica.com/OAI-PMH/Extension">
<Identifier>oai:so:d580fd11-015b-4594-9fa8-ce2548ac4887</Identifier>
<ChangeType>Created</ChangeType>
<XIP xmlns="http://preservica.com/XIP/v6.0">
<StructuralObject>
<Ref>d580fd11-015b-4594-9fa8-ce2548ac4887</Ref>
<Title>Folder 01</Title>
<Description>Folder 01 full description</Description>
<SecurityTag>open</SecurityTag>
</StructuralObject>
</XIP>
</aboutRecord>
</about>
</record>
<record>
<header>
<identifier>oai:so:6e11a303-c851-4c11-9ac7-98023c169f9a</identifier>
<datestamp>2010-07-29T13:19:38Z</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Folder 02</dc:title>
<dc:description>Folder 02 full description</dc:description>
<dc:type>StructuralObject</dc:type>
<dc:identifier>6e11a303-c851-4c11-9ac7-98023c169f9a</dc:identifier>
<dc:relation>d580fd11-015b-4594-9fa8-ce2548ac4887</dc:relation>
<dc:date>2010-07-29T13:19:38Z</dc:date>
</oai_dc:dc>
</metadata>
<about>
<aboutRecord xmlns="http://www.preservica.com/OAI-PMH/Extension">
<Identifier>oai:so:6e11a303-c851-4c11-9ac7-98023c169f9a</Identifier>
<ChangeType>Created</ChangeType>
<XIP xmlns="http://preservica.com/XIP/v6.0">
<StructuralObject>
<Ref>6e11a303-c851-4c11-9ac7-98023c169f9a</Ref>
<Title>Folder 02</Title>
<Description>Folder 02 full description</Description>
<SecurityTag>open</SecurityTag>
<Parent>d580fd11-015b-4594-9fa8-ce2548ac4887</Parent>
</StructuralObject>
</XIP>
</aboutRecord>
</about>
</record>
<record>
<header status="deleted">
<identifier>oai:so:d68e4b99-bb6f-4ba7-b1be-ab35b87a6270</identifier>
<datestamp>2010-08-02T14:23:49Z</datestamp>
</header>
</record>
</ListRecords>
</OAI-PMH>
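A client might reduce a ListRecords response like the one above to the fields it needs for synchronisation. The following Python sketch uses only the standard library; the function name and the shape of the returned dictionaries are assumptions, while the namespaces are taken from the example response.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "ext": "http://www.preservica.com/OAI-PMH/Extension",
}


def summarise_records(xml_text):
    """Return one dict per record: identifier, datestamp, deleted, change_type."""
    root = ET.fromstring(xml_text)
    summaries = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        header = record.find("oai:header", NS)
        # Deleted records have no about section, so ChangeType may be absent.
        change = record.find("oai:about/ext:aboutRecord/ext:ChangeType", NS)
        summaries.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "deleted": header.get("status") == "deleted",
            "change_type": change.text if change is not None else None,
        })
    return summaries
```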
Flow Control
Selective harvesting using the ListRecords command can potentially return large amounts of data (depending on the arguments); if too much data is returned there is a danger of an HTTP timeout. OAI-PMH allows for the partitioning (i.e. paging) of data in response to a ListRecords command via flow control (see http://www.openarchives.org/OAI/openarchivesprotocol.html#FlowControl). With flow control, the data provider replies to a ListRecords request with an incomplete list of records and a resumption token. The resumption token is passed as an argument in the next ListRecords request by the client (as a query parameter), to retrieve the next batch (page) of records.
For example, the following request:
http://server.com/OAI-PMH?verb=ListRecords&from=1990-01-01T00:00:00Z&until=2010-08-04T23:59:59Z&metadataPrefix=oai_dc
is likely to match a very large number of records. The Preservica OAI-PMH Data Provider has a deployment property that sets the maximum number of records in a single response. (This is initially set to 200 records, but can be modified as required.) If this limit is lower than the number of records to be returned, the response will include a resumption token at the end of the message:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xip="http://preservica.com/XIP/v6.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAIPMH.xsd">
<responseDate>2010-08-04T11:57:00Z</responseDate>
<request until="2010-08-04" from="1990-01-01" metadataPrefix="oai_dc" verb="ListRecords">http://server.com/OAI-PMH</request>
<ListRecords>...
<resumptionToken>metadataPrefix%3Doai_dc%26from%3D1990-01-01T00%3A00%3A00Z%26until%3D2010-08-04T23%3A59%3A59Z%26change%3D170</resumptionToken>
</ListRecords>
</OAI-PMH>
The resumption token can then be used in a subsequent request to get the next "page" of results:
http://server.com/OAI-PMH?verb=ListRecords&resumptionToken=metadataPrefix%3Doai_dc%26from%3D1990-01-01T00%3A00%3A00Z%26until%3D2010-08-04T23%3A59%3A59Z%26change%3D170
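The paging loop described in this section can be sketched as follows. To keep the example independent of any particular HTTP client, it takes a caller-supplied fetch function (a hypothetical callable that accepts a dictionary of query parameters and returns the response body as a string); the function name harvest_identifiers is likewise an assumption.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, first_params):
    """Yield every record identifier, following resumption tokens.

    fetch(params) must return the XML response body for the given
    query parameters.
    """
    params = dict(first_params)
    while True:
        root = ET.fromstring(fetch(params))
        for identifier in root.iter(f"{OAI}identifier"):
            yield identifier.text
        token = root.find(f".//{OAI}resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            return
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {
            "verb": first_params["verb"],
            "resumptionToken": token.text.strip(),
        }
```

Because the function is a generator, a client can process each page as it arrives rather than holding the whole harvest in memory.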
Different Types of Record Changes
The core OAI-PMH protocol does not currently provide a direct means of identifying the type of change to a record (apart from deletion; see below).
New Records and Updated Records (Metadata and Structure)
The aboutRecord element within the OAI-PMH about element contains a ChangeType element that indicates the type of change, and therefore the reason the record is in the response. Values for ChangeType include Created, Modified and Moved.
Deleted Records
The OAI-PMH protocol does provide a means of identifying records that have been deleted, using the status attribute of the record header (see http://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords).
The Preservica OAI-PMH Data Provider implementation maintains information about deletions with no time limit, i.e. the "deleted record level" is persistent. As stated in the OAI-PMH protocol, this fact is returned in the deletedRecord element of the response to the Identify command.
An example of a deleted record is as follows:
<record>
<header status="deleted">
<identifier>oai:so:d68e4b99-bb6f-4ba7-b1be-ab35b87a6270</identifier>
<datestamp>2010-08-02T14:23:49Z</datestamp>
</header>
</record>
A hard-deleted record has no metadata, so the OAI-PMH record consists only of a header whose status attribute is "deleted".
ListIdentifiers
See: https://www.openarchives.org/OAI/openarchivesprotocol.html#ListIdentifiers.
This command is similar to ListRecords, except that each OAI-PMH record returned contains only the header section - i.e. there are no metadata or about sections. Note that even though metadata is not returned, this command still requires the metadataPrefix query parameter; identifiers are only returned for records that are available for the specified metadata schema.
The following example HTTP GET request uses the same arguments as the ListRecords example:
http://server.com/OAI-PMH?verb=ListIdentifiers&from=2010-01-01T00:00:00Z&until=2010-08-04T23:59:59Z&metadataPrefix=oai_dc
The example response to this request lists just the headers for the three records:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xip="http://preservica.com/XIP/v6.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAIPMH.xsd">
<responseDate>2010-08-04T11:57:00Z</responseDate>
<request until="2010-08-04" from="2010-01-01" metadataPrefix="oai_dc" verb="ListIdentifiers">http://server.com/OAI-PMH</request>
<ListIdentifiers>
<header>
<identifier>oai:so:d580fd11-015b-4594-9fa8-ce2548ac4887</identifier>
<datestamp>2010-07-29T13:19:31Z</datestamp>
</header>
<header>
<identifier>oai:so:6e11a303-c851-4c11-9ac7-98023c169f9a</identifier>
<datestamp>2010-07-29T13:19:38Z</datestamp>
</header>
<header status="deleted">
<identifier>oai:so:d68e4b99-bb6f-4ba7-b1be-ab35b87a6270</identifier>
<datestamp>2010-08-02T14:23:49Z</datestamp>
</header>
</ListIdentifiers>
</OAI-PMH>
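Since ListIdentifiers returns headers only, a client typically just needs to separate live identifiers from deleted ones. A minimal Python sketch, with an assumed function name, might look like this:

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def split_headers(xml_text):
    """Return (live, deleted) identifier lists from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    live, deleted = [], []
    for header in root.findall("oai:ListIdentifiers/oai:header", NS):
        identifier = header.findtext("oai:identifier", namespaces=NS)
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    return live, deleted
```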
ListMetadataFormats
See: http://www.openarchives.org/OAI/openarchivesprotocol.html#ListMetadataFormats.
This command allows a client to request the metadata formats supported by the Data Provider.
An example HTTP GET request is:
http://server.com/OAI-PMH?verb=ListMetadataFormats
When no identifier is specified, the response indicates the two metadata formats that are supported for all records: XIP and OAI Dublin Core (oai_dc). The response is:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xip="http://www.nationalarchives.gov.uk/XIP"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAIPMH.xsd">
<responseDate>2010-08-04T11:55:42Z</responseDate>
<request verb="ListMetadataFormats">http://server.com/OAI-PMH</request>
<ListMetadataFormats>
<metadataFormat>
<metadataPrefix>XIP</metadataPrefix>
<schema>http://preservica.com/XIP/v6.0/XIP.xsd</schema>
<metadataNamespace>http://preservica.com/XIP/v6.0</metadataNamespace>
</metadataFormat>
<metadataFormat>
<metadataPrefix>oai_dc</metadataPrefix>
<schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
<metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
</metadataFormat>
</ListMetadataFormats>
</OAI-PMH>
An optional identifier query parameter can be specified to request the metadata formats for a particular record:
http://server.com/OAI-PMH?verb=ListMetadataFormats&identifier=oai:so:6e11a303-c851-4c11-9ac7-98023c169f9a
The response lists XIP, plus the schemas for any descriptive metadata fragments associated with the record. (If these do not explicitly include oai_dc, it is added as well.) The response for this example, assuming the structural object has some EAD descriptive metadata, is:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xip="http://www.nationalarchives.gov.uk/XIP"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAIPMH.xsd">
<responseDate>2010-08-04T11:55:42Z</responseDate>
<request verb="ListMetadataFormats" identifier="oai:so:6e11a303-c851-4c11-9ac7-98023c169f9a">http://server.com/OAI-PMH</request>
<ListMetadataFormats>
<metadataFormat>
<metadataPrefix>XIP</metadataPrefix>
<schema>http://preservica.com/XIP/v6.0/XIP.xsd</schema>
<metadataNamespace>http://preservica.com/XIP/v6.0</metadataNamespace>
</metadataFormat>
<metadataFormat>
<metadataPrefix>ead</metadataPrefix>
<schema>http://www.loc.gov/ead/ead.xsd</schema>
<metadataNamespace>urn:isbn:1-931666-22-9</metadataNamespace>
</metadataFormat>
<metadataFormat>
<metadataPrefix>oai_dc</metadataPrefix>
<schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
<metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
</metadataFormat>
</ListMetadataFormats>
</OAI-PMH>
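Before harvesting, a client could check which metadataPrefix values are available by parsing a ListMetadataFormats response. The following Python sketch (function name assumed) maps each prefix to its namespace:

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def supported_prefixes(xml_text):
    """Map each metadataPrefix in the response to its metadataNamespace."""
    root = ET.fromstring(xml_text)
    formats = {}
    for fmt in root.findall("oai:ListMetadataFormats/oai:metadataFormat", NS):
        prefix = fmt.findtext("oai:metadataPrefix", namespaces=NS)
        formats[prefix] = fmt.findtext("oai:metadataNamespace", namespaces=NS)
    return formats
```

A client can then test, for example, whether "ead" is among the returned keys before requesting a record in that format.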
ListSets
Note: Sets are an optional feature in the OAI-PMH protocol and are not currently implemented by the Preservica Data Provider.
Error Reporting
Since the OAI-PMH protocol uses HTTP for transport, if the data provider is unavailable or a request fails at the HTTP level, the client will receive an appropriate HTTP error response (e.g. 400 for bad request, 401 for unauthorised, 404 for not found, 500 for internal server error, etc.).
OAI-PMH also defines a number of its own error responses, which are reported in an error element of the response.
For full details of all OAI-PMH errors, see
http://www.openarchives.org/OAI/openarchivesprotocol.html#HTTPResponseFormat, and http://www.openarchives.org/OAI/openarchivesprotocol.html#ErrorConditions.
Common Errors
Bad Verb
For an unknown / malformed command verb, the response contains the badVerb error:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xip="http://www.nationalarchives.gov.uk/XIP"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAIPMH.xsd">
<responseDate>2010-08-04T14:34:17Z</responseDate>
<request>http://server.com/OAI-PMH</request>
<error code="badVerb">Illegal OAI-PMH verb</error>
</OAI-PMH>
Missing Arguments
If the request includes illegal arguments or is missing required arguments, the badArgument error is returned:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xip="http://www.nationalarchives.gov.uk/XIP"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAIPMH.xsd">
<responseDate>2010-08-04T14:42:08Z</responseDate>
<request verb="ListRecords">http://server.com/OAI-PMH</request>
<error code="badArgument">Required OAI-PMH argument is missing</error>
</OAI-PMH>
GetRecord Errors
A number of errors can be returned in response to a GetRecord request.
If the value of the identifier argument for a GetRecord request is unknown, the idDoesNotExist error is returned:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xip="http://www.nationalarchives.gov.uk/XIP"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAIPMH.xsd">
<responseDate>2010-08-04T14:53:27Z</responseDate>
<request identifier="oai:so:ed52b531-5954-4df3-83c4-e586700ab11e" verb="GetRecord">http://server.com/OAI-PMH</request>
<error code="idDoesNotExist">oai:so:ed52b531-5954-4df3-83c4-e586700ab11e is a valid OAI-PMH identifier, but does not map to an item in this repository</error>
</OAI-PMH>
If the value of the metadataPrefix argument is not supported by the repository, the cannotDisseminateFormat error is returned:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xip="http://www.nationalarchives.gov.uk/XIP"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAIPMH.xsd">
<responseDate>2010-08-04T14:44:09Z</responseDate>
<request metadataPrefix="mads" verb="GetRecord">http://server.com/OAI-PMH</request>
<error code="cannotDisseminateFormat">Unknown metadata format</error>
</OAI-PMH>
If the metadataPrefix is recognised, but metadata is not available in that format, the cannotDisseminateFormat error is returned again but with a different message.
ListRecords Errors
A number of errors can be returned in response to a ListRecords request.
If the value of the metadataPrefix argument is not supported by the repository, the cannotDisseminateFormat error is returned:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xip="http://www.nationalarchives.gov.uk/XIP"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAIPMH.xsd">
<responseDate>2010-08-04T14:44:09Z</responseDate>
<request metadataPrefix="mads" verb="ListRecords">http://server.com/OAI-PMH</request>
<error code="cannotDisseminateFormat">Unknown metadata format</error>
</OAI-PMH>
If the combination of the values of the from, until and metadataPrefix arguments results in an empty list, the noRecordsMatch error is returned:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xip="http://www.nationalarchives.gov.uk/XIP"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAIPMH.xsd">
<responseDate>2010-08-04T14:46:09Z</responseDate>
<request until="1990-08-04" from="1990-01-01" metadataPrefix="oai_dc" verb="ListRecords">http://server.com/OAI-PMH</request>
<error code="noRecordsMatch"/>
</OAI-PMH>
ListMetadataFormats Errors
If the value of the identifier argument for a ListMetadataFormats request is unknown, the idDoesNotExist error is returned:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xip="http://www.nationalarchives.gov.uk/XIP"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAIPMH.xsd">
<responseDate>2010-08-04T14:53:27Z</responseDate>
<request identifier="oai:so:ed52b531-5954-4df3-83c4-e586700ab11e" verb="ListMetadataFormats">http://server.com/OAI-PMH</request>
<error code="idDoesNotExist">oai:so:ed52b531-5954-4df3-83c4-e586700ab11e is a valid OAI-PMH identifier, but does not map to an item in this repository</error>
</OAI-PMH>
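All of the error responses above share the same shape: an error element with a code attribute and an optional message. A client could therefore turn them into exceptions in one place. The following Python sketch shows one way to do this; the OAIError class and raise_for_oai_error function are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


class OAIError(Exception):
    """An OAI-PMH protocol error, e.g. badVerb or idDoesNotExist."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}" if message else code)
        self.code = code
        self.message = message


def raise_for_oai_error(xml_text):
    """Parse a response; raise OAIError if it contains an error element."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", NS)
    if error is not None:
        # The message may be empty, as in the noRecordsMatch example.
        raise OAIError(error.get("code"), (error.text or "").strip())
    return root
```

A harvesting client may prefer to catch OAIError with code noRecordsMatch and treat it as an empty result rather than a failure.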