
Upload Content (S3 Compatible)

The API web application provides a subset of S3 endpoints for content upload, including support for multi-part upload. This allows 3rd party components such as the AWS S3 Command Line Interface (CLI) or Software Development Kits (SDKs) to be used to perform the upload.

Packages or files are uploaded into the root of tenant-specific buckets and are then automatically processed by ingest or accrual workflows. Detailed information on the S3 REST API can be found on the Amazon website (https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html).

S3 Endpoints

The S3 compatible endpoints are available at the following location:

http://server.com/api/s3/buckets

This URL is normally configured in 3rd party applications to access the Preservica S3 endpoints.

All endpoints require an access token, supplied either in a Preservica-Access-Token header or embedded as the credential value (AWS Access Key) within the Authorization header. The endpoints do not require an AWS Secret Key, which should be set to NOT_USED. Some endpoints require additional user metadata as headers (prefixed with x-amz-meta-) to complete successfully.

Note: Only the AWS V4 Authorization header format is supported (e.g. AWS4-HMAC-SHA256 Credential=preservica_access_token/b/c/s3/aws4_request, SignedHeaders=headers, Signature=signature).
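
For clients built on an AWS SDK rather than the CLI, the same credentials can be supplied when the client is constructed. The following is a minimal sketch using the Python SDK (boto3); the host name and token are the example values used elsewhere in this section, and path-style addressing is an assumed setting chosen to match the documented URLs.

import boto3
from botocore.config import Config

# Example access token from this guide; in practice use a valid
# Preservica access token for your tenancy.
PRESERVICA_ACCESS_TOKEN = "163202a5-217b-459d-b3c1-caf6f0023ec4"

# The access token is supplied as the AWS Access Key; the secret key is
# not checked by Preservica and is set to NOT_USED. Path-style addressing
# keeps the bucket name in the URL path, matching the documented URLs.
s3 = boto3.client(
    "s3",
    endpoint_url="http://server.com/api/s3/buckets",
    aws_access_key_id=PRESERVICA_ACCESS_TOKEN,
    aws_secret_access_key="NOT_USED",
    config=Config(s3={"addressing_style": "path"}),
)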

The following S3 endpoints are supported:

Endpoint                      Type    Parameters          Metadata                    Description
/api/s3/buckets               GET     -                   -                           List buckets
/api/s3/buckets/bucket/key    HEAD    -                   -                           Check key exists
/api/s3/buckets/bucket/key    PUT     -                   structuralobjectreference   Upload package file
/api/s3/buckets/bucket/key    DELETE  -                   -                           Delete file
/api/s3/buckets/bucket/key    POST    uploads             structuralobjectreference   Initiate multi-part upload
/api/s3/buckets/bucket/key    PUT     uploadId=upload_id  -                           Upload part
/api/s3/buckets/bucket/key    POST    uploadId=upload_id  -                           Complete multi-part upload
/api/s3/buckets/bucket/key    DELETE  uploadId=upload_id  -                           Abort multi-part upload
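
As an illustration of the simpler endpoints in the table, the sketch below reuses the boto3 client configured above to list the available buckets and check whether a key already exists; the bucket and key names are example values.

from botocore.exceptions import ClientError

# GET /api/s3/buckets - list the buckets available to the tenancy
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])

# HEAD /api/s3/buckets/bucket/key - check whether a key already exists
try:
    s3.head_object(Bucket="ee.package.upload", Key="sample.zip")
    print("key already exists")
except ClientError as error:
    if error.response["Error"]["Code"] == "404":
        print("key does not exist")
    else:
        raise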

Buckets

Tenant-specific buckets exist for package upload:

http://server.com/api/s3/buckets/tenant.package.upload

In each case, the body should contain the file content.

Note: An Ingest (v6) Workflow Context must be active for the package upload requests to be successful.

The upload is handled synchronously but the ingest into Preservica will be processed asynchronously. A progress token is provided in the Preservica-Progress-Token header that can be used to check the status of the ingest (see Chapter 4).

For example, the PUT request:

http://server.com/api/s3/buckets/ee.package.upload/sample.zip

with header (access token not included):

x-amz-meta-structuralobjectreference=192b1095-be91-4b9c-a5d6-db73bcdf5b96

Note: collectionreference can be used instead of structuralobjectreference for backward compatibility.

will perform the following:

• Upload the sample.zip file;

• Start a workflow to ingest the extracted content of the zip into the referenced folder within the EE tenancy;

and return an OK (200) response code with ETag and Preservica-Progress-Token headers.
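
A minimal sketch of this request using the boto3 client configured earlier (the bucket, key and folder reference are the example values above):

# PUT /api/s3/buckets/ee.package.upload/sample.zip
with open("sample.zip", "rb") as package:
    response = s3.put_object(
        Bucket="ee.package.upload",
        Key="sample.zip",
        Body=package,
        # Sent as the x-amz-meta-structuralobjectreference header
        Metadata={"structuralobjectreference": "192b1095-be91-4b9c-a5d6-db73bcdf5b96"},
    )

# The ingest progress token is returned in the Preservica-Progress-Token
# response header (botocore lower-cases header names)
headers = response["ResponseMetadata"]["HTTPHeaders"]
print(response["ETag"], headers.get("preservica-progress-token"))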

Multi-part

To upload large packages or files, the multi-part endpoints should be used for improved upload speed and the ability to recover from network issues. Multi-part upload can also be used to ingest growing files, as the client can upload parts as they become ready. An example of the normal sequence of requests for a multi-part upload is:

Initiate the multi-part upload with the POST request:
http://server.com/api/s3/buckets/ee.package.upload/sample.zip?uploads

with header (access token not included):

x-amz-meta-structuralobjectreference=192b1095-be91-4b9c-a5d6-db73bcdf5b96

Note: collectionreference can be used instead of structuralobjectreference for backward compatibility.

returns the response:

<InitiateMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
	<Bucket>ee.package.upload</Bucket>
	<Key>sample.zip</Key>
	<UploadId>sample.zip-1681a25a-c8f2-4ffc-bbed-8c7083d2fe32</UploadId>
</InitiateMultipartUploadResult>

Upload part 1 with the PUT request:
http://server.com/api/s3/buckets/ee.package.upload/sample.zip?uploadId=sample.zip-1681a25a-c8f2-4ffc-bbed-8c7083d2fe32&partNumber=1

Upload part 2 with the PUT request:
http://server.com/api/s3/buckets/ee.package.upload/sample.zip?uploadId=sample.zip-1681a25a-c8f2-4ffc-bbed-8c7083d2fe32&partNumber=2

Complete the multi-part upload with the POST request:
http://server.com/api/s3/buckets/ee.package.upload/sample.zip?uploadId=sample.zip-1681a25a-c8f2-4ffc-bbed-8c7083d2fe32

returns the response:

<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
	<Location>http://server.com/api/s3/buckets/ee.package.upload/sample.zip</Location>
	<Bucket>ee.package.upload</Bucket>
	<Key>sample.zip</Key>
	<ETag>temp-etag-ee.package.upload-sample.zip-progressToken-c4d92368-c0da-11e8-a355-529269fb1459</ETag>
</CompleteMultipartUploadResult>
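
The same sequence can be driven from the low-level boto3 calls. The following sketch reuses the client configured earlier; the 50 MiB part size is an assumed value, and the bucket, key and folder reference are the example values from this section.

PART_SIZE = 50 * 1024 * 1024  # assumed 50 MiB parts

# Initiate the multi-part upload
upload = s3.create_multipart_upload(
    Bucket="ee.package.upload",
    Key="sample.zip",
    Metadata={"structuralobjectreference": "192b1095-be91-4b9c-a5d6-db73bcdf5b96"},
)
upload_id = upload["UploadId"]

parts = []
try:
    # Upload each part in sequence (parts could also be sent in parallel)
    with open("sample.zip", "rb") as package:
        part_number = 1
        while True:
            chunk = package.read(PART_SIZE)
            if not chunk:
                break
            part = s3.upload_part(
                Bucket="ee.package.upload",
                Key="sample.zip",
                UploadId=upload_id,
                PartNumber=part_number,
                Body=chunk,
            )
            parts.append({"PartNumber": part_number, "ETag": part["ETag"]})
            part_number += 1

    # Complete the upload; this may take several minutes while the file
    # is assembled on the server (see the important note below)
    result = s3.complete_multipart_upload(
        Bucket="ee.package.upload",
        Key="sample.zip",
        UploadId=upload_id,
        MultipartUpload={"Parts": parts},
    )
except Exception:
    # Abort so the server can discard any parts that were uploaded
    s3.abort_multipart_upload(
        Bucket="ee.package.upload", Key="sample.zip", UploadId=upload_id
    )
    raise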

The upload is handled synchronously but the ingest into Preservica will be processed asynchronously. A progress token is provided in the Preservica-Progress-Token header that can be used to check the status of the ingest (see Chapter 4). For a multi-part upload the progress token is additionally appended to the end of the ETag in the XML response, preceded by "-progressToken-" (e.g. temp-etag-bucketname-objectKey-progressToken-02e3734f-6707-483d-aee3-4f55c4cb9efb).
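
For example, continuing the sketch above, the progress token can be recovered from the returned ETag:

# Split the ETag on the "-progressToken-" marker to recover the token
progress_token = result["ETag"].split("-progressToken-")[-1]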

Important note: The endpoint to complete a multi-part upload may take several minutes to complete. The server will return a status code immediately and send whitespace characters as the file is assembled to keep the connection alive. The XML response that will be sent once the assembly is complete must be checked to ensure the request was successful. This closely matches the AWS S3 implementation (see https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html).

A multi-part upload which has been initiated but never completed will be removed automatically by the system after approximately 7 days.

AWS S3 Command Line Interface (CLI)

The AWS CLI is a tool provided by Amazon to manage AWS services. It is available for most operating systems and can be downloaded from https://aws.amazon.com/cli/.

The tool can be used to simplify multi-part upload: it switches to multi-part upload automatically when the file being uploaded is large, so the client does not need to manage the process itself.

For example, the script fragment:

set AWS_ACCESS_KEY_ID=163202a5-217b-459d-b3c1-caf6f0023ec4
set AWS_SECRET_ACCESS_KEY=NOT_USED
aws --endpoint-url http://server.com/api/s3/buckets s3 cp --metadata structuralobjectreference=22b47084-c365-4547-919f-0f3985cb1b97 "1GB_Package.zip" s3://ee.package.upload

will output:

Completed 70.2 MiB/960.9 MiB (62.1 MiB/s) with 1 file(s) remaining

and perform the following:

  • Initiate a multi-part upload for the 1GB package
  • Split the package into parts
  • Upload the parts in parallel
  • Complete the multi-part upload
  • Start a workflow to ingest the package into the folder within the EE tenancy.

Once the upload process is complete the tool will output:

upload: .\1GB_Package.zip to s3://ee.package.upload/1GB_Package.zip

Note: The AWS_ACCESS_KEY_ID environment variable must be set to a valid Preservica Access Token.
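
For Python clients, a rough boto3 equivalent of the CLI command above is sketched below; it reuses the client configured earlier, and upload_file switches to multi-part upload automatically for large files. The threshold and part size shown are illustrative values, not requirements.

from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # use multi-part above 100 MiB
    multipart_chunksize=50 * 1024 * 1024,   # 50 MiB parts
)

s3.upload_file(
    "1GB_Package.zip",
    "ee.package.upload",
    "1GB_Package.zip",
    ExtraArgs={"Metadata": {"structuralobjectreference": "22b47084-c365-4547-919f-0f3985cb1b97"}},
    Config=config,
)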
