DuraCloud Storage and Ingestion Options


DuraCloud™ @TDL currently offers members three storage options: two in the Amazon cloud and one at the Texas Advanced Computing Center (TACC). By default, content ingested into a DuraCloud™ @TDL space is stored in Amazon S3; copies of the content can also be stored in Amazon Glacier and at TACC. (Members who do not want to use S3 for long-term storage may request that the original copy be removed from Amazon S3.)

DuraCloud™ @TDL will, in the future, serve as an ingestion point for content that will be replicated to the Digital Preservation Network.

Storage Options

Amazon S3

Amazon S3 (Simple Storage Service) is secure, durable, highly scalable object storage in the Amazon cloud. Amazon S3 is the default storage option for content ingested into DuraCloud™ @TDL. It is “high-availability” storage, meaning that content stored here can be retrieved with relative ease.

Members who do not wish to store their content in Amazon S3 over the long term can request that it be copied over to other storage options (such as Amazon Glacier) and deleted from S3.

The Amazon S3 service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon S3 synchronously stores your data across multiple facilities before returning SUCCESS. In addition, Amazon S3 calculates checksums on all network traffic to detect corruption of data packets when storing or retrieving data. Unlike traditional systems that can require laborious data verification and manual repair, Amazon S3 performs regular, systematic data integrity checks and is built to be automatically self-healing.

Costs:

  • Storage: $360 per TB per year
  • Data In: $0
  • Data Out: $120 per TB

Amazon Glacier

Amazon Glacier is a low-cost cloud archive storage service that provides secure and durable storage for data archiving and online backup. To keep costs low, Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are acceptable. The Texas Digital Library uses Amazon Glacier storage located within the continental United States.

The Glacier service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Unlike traditional systems that can require laborious data verification and manual repair, Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing.

Costs:

  • Storage: $120 per TB per year
  • Data In: $0
  • Data Out: $120 per TB

iRODS at the Texas Advanced Computing Center (TACC)

The Texas Digital Library has established an instance of iRODS at the Texas Advanced Computing Center (TACC). iRODS at TACC enables users to store their material within the geographic boundaries of Texas and rely upon the computing services of the Academy. Policies within the TACC iRODS system are established by TACC and TDL personnel.

Costs:

  • Storage: $205 per TB per year
  • Data In: $120
  • Data Out: $120
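
To compare the three storage options, the yearly storage rates listed above can be plugged into a small calculation. The following Python sketch is only an illustration: the function names and the 2 TB collection are hypothetical, and it covers storage fees only, leaving out the Data In and Data Out charges.

```python
# Annual storage rates in USD per TB, copied from the cost lists on this page.
STORAGE_RATE_PER_TB_YEAR = {
    "Amazon S3": 360,
    "Amazon Glacier": 120,
    "iRODS at TACC": 205,
}


def annual_storage_cost(terabytes, options):
    """Return the yearly storage cost in USD for keeping a copy in each option."""
    return {name: terabytes * STORAGE_RATE_PER_TB_YEAR[name] for name in options}


def total_annual_storage_cost(terabytes, options):
    """Sum the yearly storage cost across all selected options."""
    return sum(annual_storage_cost(terabytes, options).values())


if __name__ == "__main__":
    # Hypothetical collection: 2 TB kept in Glacier with a second copy at TACC.
    selected = ["Amazon Glacier", "iRODS at TACC"]
    for name, usd in annual_storage_cost(2, selected).items():
        print(f"{name}: ${usd:,.2f} per year")
    print(f"Total: ${total_annual_storage_cost(2, selected):,.2f} per year")
```

Actual invoices depend on the member allowance and on transfer activity, so treat the result as a rough planning figure.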

Ingestion Modes and Resources

Content Ingestion

Members may manage ingestion and retrieval of items through the DurAdmin administrative interface or through the DuraSync tool.

  • DurAdmin: DuraCloud™ supports uploading files through file selection or drag-and-drop in the web-based DuraCloud™ administrative interface. However, this method requires you to start a new upload for each file or set of files every time you want to update them. The web-based interface also does not allow you to upload whole directories at once. The DuraCloud™ @TDL administrative interface is located at https://dcloud.tdl.org/duradmin/.
  • DuraCloud Sync: This application allows you to continuously copy files from any number of local folders to a DuraCloud™ space. As you add, update, and delete files locally, these changes are automatically propagated to the cloud. You can use the tool in two different modes: GUI mode or command-line mode.
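
To make the idea of propagating local changes more concrete, the Python sketch below shows one common way a sync process can detect added, updated, and deleted files: take a checksum snapshot of a folder, take another one later, and compare the two. This is not the Sync Tool's actual implementation; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def snapshot(folder):
    """Map each file's relative path to a SHA-256 checksum of its contents."""
    root = Path(folder)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def diff_snapshots(previous, current):
    """Classify the changes between two snapshots as added, updated, or deleted."""
    added = sorted(set(current) - set(previous))
    deleted = sorted(set(previous) - set(current))
    updated = sorted(
        name for name in set(current) & set(previous)
        if current[name] != previous[name]
    )
    return {"added": added, "updated": updated, "deleted": deleted}
```

A sync process built on this idea would upload the files listed under "added" and "updated" and remove the ones listed under "deleted" from the remote space.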

Instructions for setting up and using these tools are available HERE.

DuraSync GUI Mode

To run in GUI mode, download and install the DuraCloud™ @TDL Java tool. Once the installation is complete, the application will guide you through the setup.

DuraSync Command-Line Mode

Command-line mode is useful for those users running in a server environment, for those who want to run the Sync Tool in scripts, or for those who simply prefer a command-line interface.

Special Considerations for Ingestion of Large Files: Chunker and Stitcher Tools in DuraCloud

Large files may be broken up into “chunks” as they are ingested into DuraCloud™ @TDL. When these files are retrieved, they are automatically “stitched” back together. Checksums and other resources help ensure that files retain their bit integrity.
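
The chunk-and-stitch idea can be sketched in a few lines of Python. This is a simplified illustration and not the DuraCloud Chunker or Stitcher code: the function names, the in-memory handling, and the use of SHA-256 are all assumptions. It splits content into fixed-size pieces, records a checksum for each piece, and checks both the pieces and the reassembled whole when stitching.

```python
import hashlib


def chunk_bytes(data, chunk_size):
    """Split data into consecutive chunks, recording a checksum for each chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    for start in range(0, len(data), chunk_size):
        piece = data[start:start + chunk_size]
        chunks.append((piece, hashlib.sha256(piece).hexdigest()))
    return chunks


def stitch_chunks(chunks, expected_checksum):
    """Verify each chunk, reassemble them in order, and verify the whole file."""
    for index, (piece, checksum) in enumerate(chunks):
        if hashlib.sha256(piece).hexdigest() != checksum:
            raise ValueError(f"chunk {index} failed its checksum")
    data = b"".join(piece for piece, _ in chunks)
    if hashlib.sha256(data).hexdigest() != expected_checksum:
        raise ValueError("stitched file failed its checksum")
    return data


if __name__ == "__main__":
    original = bytes(range(256)) * 10  # sample content standing in for a large file
    whole_checksum = hashlib.sha256(original).hexdigest()
    pieces = chunk_bytes(original, 1000)
    restored = stitch_chunks(pieces, whole_checksum)
    print(len(pieces), restored == original)
```

If any chunk is altered in storage or transit, its checksum no longer matches and the stitch step fails instead of silently returning a damaged file.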

Retrieval

DuraCloud Retrieval Tool (command-line tool)

The Retrieval Tool is a utility that transfers (or “retrieves”) digital content from DuraCloud™ @TDL to your local file system.

Instructions for setting up and using the DuraCloud™ @TDL ingestion and retrieval tools mentioned above are available HERE.



Questions about setting up and using DuraCloud™ @TDL? Contact the TDL Helpdesk.


DuraCloud™ is open source software developed by DuraSpace.