
S3 Backup design doc#2627

Open
mvandenburgh wants to merge 11 commits into master from s3-backup-design-doc

Conversation

@mvandenburgh
Member

This PR lays out a design for S3 backup using S3 Replication and the Glacier Deep Archive storage class. Related: #524

@kabilar requested review from satra and yarikoptic on November 3, 2025 19:57
We are no longer considering the use of a bucket in a different region.
@waxlamp force-pushed the s3-backup-design-doc branch 6 times, most recently from dd89dbd to 333137a on November 4, 2025 15:34
We are expecting a total of 6 PB over the next 5 years, not 30 PB.
@waxlamp force-pushed the s3-backup-design-doc branch from 333137a to da16c20 on November 4, 2025 15:35
@waxlamp force-pushed the s3-backup-design-doc branch from 2efa373 to 7429734 on November 4, 2025 15:54
@waxlamp force-pushed the s3-backup-design-doc branch from 7429734 to 3e74365 on November 4, 2025 15:55
waxlamp and others added 3 commits November 4, 2025 14:30
Co-authored-by: Cody Baker <51133164+CodyCBakerPhD@users.noreply.github.com>
Co-authored-by: Cody Baker <51133164+CodyCBakerPhD@users.noreply.github.com>
When the AWS docs say "GB", they mean 10^9 bytes, not 2^30.

Co-authored-by: Cody Baker <51133164+CodyCBakerPhD@users.noreply.github.com>
waxlamp and others added 3 commits November 4, 2025 14:33
Co-authored-by: Cody Baker <51133164+CodyCBakerPhD@users.noreply.github.com>
Clarify purpose of calculating the expected bucket storage cost covered by AWS already.
while the associated backup costs would represent only an additional $`\$5900 / \$126000 \approxeq 4.7\%`$ of the cost of the storage itself.
To help provide a significant level of safety to an important dataset, AWS may be willing to cover such a low marginal cost.
Collaborator


Suggested change
To help provide a significant level of safety to an important dataset, AWS may be willing to cover such a low marginal cost.
To help provide a significant level of safety to an important database, it may be worth reaching out to see if AWS may be willing to cover such a low marginal cost.

The original wording sounds as if we are speaking for AWS.

Collaborator


Although, given their previous seeming lack of concern about applying Glacier to the main archive contents (to 'save ephemeral costs'), I am guessing their perspective is less about the monetary aspect (which is being waived either way) than about the actual additional storage at the data center (essentially doubling the size of the archive, even as it grows).

Collaborator


Please remove this line as it has been confirmed that the Open Data program will not cover backup.

Member


Suggested change
To help provide a significant level of safety to an important dataset, AWS may be willing to cover such a low marginal cost.

@CodyCBakerPhD
Collaborator

@satra Two things relevant to this discussion:

  • in your current discussions with AWS cloud architects / the open data team, have they had any thoughts or suggestions on the topic of backup?
  • has the MIT Engaging cluster team ever given you a quote for storage expansion (i.e., one-time cost, any recurring costs, etc.)? @kabilar was unaware of any

Comment on lines +53 to +92
## Cost

**Below are the additional costs introduced by this backup feature** for a 1 PB primary bucket (assuming both the primary bucket and backup bucket are in us-east-2). All of this information was gathered from the "Storage & requests", "Data transfer", and "Replication" tabs on [https://aws.amazon.com/s3/pricing/](https://aws.amazon.com/s3/pricing/).

**Storage Costs** (backup bucket in us-east-2):

- Glacier Deep Archive storage: ~$0.00099/GB/month (≈ $0.99/TB/month)
- 1 PB = 1,000 TB, so 1,000 TB × $0.99/TB/month = **$990/month**

**Data Transfer Costs**:

- Same-region data transfer between S3 buckets is free

**Retrieval Costs** (only incurred when disaster recovery is needed):

- Glacier Deep Archive retrieval:
  - $0.02/GB (standard, 12-hour retrieval)
  - $0.0025/GB (bulk retrieval, can take up to 48 hours)

If the entire primary bucket were destroyed (not the expected scale of data
loss, but useful as a worst-case analysis), the cost to restore from backup
using bulk retrieval would be

$`1\ \rm{PB} \times \frac{1000\ \rm{TB}}{\rm{PB}} \times \frac{1000\ \rm{GB}}{\rm{TB}} \times \frac{\$0.0025}{\rm{GB}} = \$2500`$.
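
As a quick sanity check of the arithmetic in this section, here is a minimal sketch in plain Python, using only the prices quoted above:

```python
# Sanity check of the per-PB backup costs, using the AWS prices quoted above.
# AWS prices use decimal units: 1 PB = 1,000 TB = 1,000,000 GB.
GB_PER_PB = 1000 * 1000

DEEP_ARCHIVE_PER_GB_MONTH = 0.00099  # $/GB/month, Glacier Deep Archive storage
BULK_RETRIEVAL_PER_GB = 0.0025       # $/GB, one-time bulk retrieval

storage_per_pb_month = GB_PER_PB * DEEP_ARCHIVE_PER_GB_MONTH  # -> 990.0
restore_per_pb = GB_PER_PB * BULK_RETRIEVAL_PER_GB            # -> 2500.0

print(f"Backup storage: ${storage_per_pb_month:,.0f}/month per PB")
print(f"Worst-case bulk restore: ${restore_per_pb:,.0f} per PB")
```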

### Future Costs

The DANDI Archive is expecting roughly 1 PB of new data in each of the next five years, culminating in a total of 6 PB.

Scaling up the previous analysis, the monthly backup storage cost is projected to rise to a total of **~$5,900/month** (6 × $990/month) once all of that data is in place.
The worst-case disaster recovery cost would similarly scale up to a total of **~$15,000** (6 × $2,500).

An open question is whether the AWS Open Data Sponsorship program would cover the marginal costs of backup. A quick estimate shows that once all 6 PB has been uploaded, the expected bucket cost for the primary bucket (i.e., what the AWS Open Data Sponsorship program covers already, excluding backup) will be:

$$
6\ \rm{PB} \times \frac{1000\ \rm{TB}}{\rm{PB}} \times \frac{1000\ \rm{GB}}{\rm{TB}} \times \frac{\$0.021/mo}{\rm{GB}} \approxeq \$126000/mo
$$

while the associated backup costs would represent only an additional $`\$5900 / \$126000 \approxeq 4.7\%`$ of the cost of the storage itself.
To help provide a significant level of safety to an important dataset, AWS may be willing to cover such a low marginal cost.
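
The projected figures scale the same way; a minimal sketch using only the prices quoted above (the ~4.7% marginal figure follows directly):

```python
# Projected monthly costs at the 6 PB scale, using the prices quoted above.
GB_PER_PB = 1000 * 1000
TOTAL_PB = 6

backup_monthly = TOTAL_PB * GB_PER_PB * 0.00099  # Deep Archive -> ~$5,940/month
primary_monthly = TOTAL_PB * GB_PER_PB * 0.021   # S3 Standard  -> ~$126,000/month

print(f"Backup: ${backup_monthly:,.0f}/month")
print(f"Primary: ${primary_monthly:,.0f}/month")
print(f"Marginal cost of backup: {backup_monthly / primary_monthly:.1%}")  # ~4.7%
```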
Member


@satra The design here looks good, but the main open question and blocker is cost.

Would the AWS Open Data program be open to covering the backup storage costs as well (~$12,000/year for a 1 PB backup, and ~$72,000/year for the projected 6 PB backup)? See the doc for more details.


Member


They won't cover backup. We can cover the Glacier Deep Archive equivalent to start with and then figure it out from there.

Collaborator


Might I suggest that, if we are to move forward with the S3 Replication + Deep Archive strategy, we start by testing it with only a certain percentage of assets while closely monitoring costs to ensure our predictions are accurate?

(And while we're at it, we could attempt some limited replication tests to make sure we understand exactly how that process works and to verify that it behaves as expected, with restoration pricing also meeting predictions.)
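
For what it's worth, such a scoped pilot could be expressed as a prefix-filtered replication rule. A minimal boto3 sketch, assuming versioning is already enabled on both buckets; the bucket names, IAM role ARN, and test prefix below are all hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Replicate only objects under a test prefix, sending them straight to
# Glacier Deep Archive in the backup bucket.
s3.put_bucket_replication(
    Bucket="dandi-primary-bucket",  # hypothetical primary bucket name
    ReplicationConfiguration={
        # Hypothetical IAM role granting S3 permission to replicate
        "Role": "arn:aws:iam::123456789012:role/s3-backup-replication",
        "Rules": [
            {
                "ID": "backup-pilot",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": "blobs/0"},  # hypothetical slice of assets
                "Destination": {
                    "Bucket": "arn:aws:s3:::dandi-backup-bucket",  # hypothetical
                    "StorageClass": "DEEP_ARCHIVE",
                },
                # Deletions in the primary are NOT propagated to the backup
                "DeleteMarkerReplication": {"Status": "Disabled"},
            }
        ],
    },
)
```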

@CodyCBakerPhD
Collaborator

Note that some minor engineering effort will be required to figure out how to set up and configure S3 replication to offer delayed deletion (i.e., keeping garbage-collected assets on the backup bucket for some period of time after they are removed from the main bucket).

https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication-what-is-isnot-replicated.html#replication-delete-op otherwise describes replication as keeping all data ever uploaded for all time, which could accumulate costs even beyond what has been estimated.
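
One plausible pattern for the delayed deletion (a sketch under stated assumptions, not a tested configuration): enable delete-marker replication in the replication rule, so a deletion in the primary bucket makes the backup copy noncurrent, and then add a lifecycle rule on the backup bucket that expires noncurrent versions after a retention window. The bucket name and the 180-day window below are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# On the backup bucket, permanently expire noncurrent object versions
# 180 days after a replicated delete marker (or a newer version) supersedes
# them, instead of keeping every version forever. This assumes the
# replication rule sets DeleteMarkerReplication to "Enabled".
s3.put_bucket_lifecycle_configuration(
    Bucket="dandi-backup-bucket",  # hypothetical backup bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-backup-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "NoncurrentVersionExpiration": {"NoncurrentDays": 180},
            }
        ]
    },
)
```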
