nocentino/mongodb-flasharray-backup

MongoDB Snapshot & PITR — Pure Storage FlashArray + Fusion + Ops Manager

PowerShell scripts for crash-consistent snapshot backup and point-in-time recovery of a MongoDB 8.0 Enterprise sharded cluster using Pure Storage FlashArray, Pure Storage Fusion, and MongoDB Ops Manager 8.0.

The three FlashArrays (one per MongoDB node) are enrolled in a Pure Storage Fusion fleet, which provides a unified control plane across the entire storage estate. A single Connect-Pfa2Array call targets the Fusion gateway; all subsequent FlashArray operations are routed to the correct array via -ContextName, eliminating per-array session management and enabling fleet-wide protection group coordination. When New-MongoSnapshot.ps1 triggers a protection group snapshot, Fusion coordinates it across all three arrays, producing a single consistent point-in-time across every MongoDB data volume regardless of which physical array hosts it.

New-MongoSnapshot.ps1 also uses the Ops Manager Third-Party Backup API to open a $backupCursor on one secondary per shard, which pins the WiredTiger checkpoint and freezes journal cleanup. While the cursor is open, the Fusion-coordinated FlashArray protection group snapshot captures all three data volumes in a coordinated crash-consistent sweep: no fsyncLock, no write stall, no primary involvement. Because the journal is intact at the snapshot point, every volume image is guaranteed crash-recoverable on restart.

Restore-MongoSnapshot.ps1 stops agents, unmounts volumes, overwrites each volume in-place from the snapshot (a sub-second CoW pointer swap), remounts, and restarts agents; WiredTiger handles crash recovery automatically.

For PITR, pitr/Start-OplogTailer.ps1 uses the Ops Manager Oplog Snapshot API to continuously capture oplog as .oplogs segment files, and pitr/Invoke-OplogReplay.ps1 replays those segments to a target timestamp after the snapshot restore completes. See docs/how-it-works.md for the full recoverability deep dive.


Quick Start

1. Install prerequisites

Install-Module PureStoragePowerShellSDK2

PowerShell 7+ required. SSH key auth must be configured from this machine to every cluster node (packer@aen-mongo-{01,02,03}).

2. Configure your environment

cp .env.example .env
# Edit .env — fill in FA_ENDPOINT, FA_PASSWORD, FA_USERNAME, OM_BASE_URL,
# OM_API_VERSION, OM_GROUP_ID, OM_CLUSTER_ID, OM_PUBLIC_KEY, OM_PRIVATE_KEY,
# MONGOSH_PATH, MONGOS_HOST, MONGOS_PORT, SSH_USER, MONGO_TOOLS_BASE,
# FA_PROTECTION_GROUP, FA_CLUSTER_NAME,
# CLUSTER_NODES (comma-separated fallback node list used when OM is unreachable)
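
Before running any script, it can help to confirm that .env defines every key listed in the comment above. The following is a minimal Python sketch, not part of the repository. It assumes .env uses plain KEY=VALUE lines and treats every listed key as mandatory, which may be stricter than the scripts themselves (MONGO_TOOLS_BASE, for instance, appears to be optional). The function names are hypothetical.

```python
# Keys taken from the Quick Start comment; treating all of them as
# mandatory is an assumption made for this sketch.
REQUIRED_KEYS = [
    "FA_ENDPOINT", "FA_PASSWORD", "FA_USERNAME", "OM_BASE_URL",
    "OM_API_VERSION", "OM_GROUP_ID", "OM_CLUSTER_ID", "OM_PUBLIC_KEY",
    "OM_PRIVATE_KEY", "MONGOSH_PATH", "MONGOS_HOST", "MONGOS_PORT",
    "SSH_USER", "MONGO_TOOLS_BASE", "FA_PROTECTION_GROUP",
    "FA_CLUSTER_NAME", "CLUSTER_NODES",
]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in listed order."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

Calling missing_keys(open(".env").read()) returns an empty list when every key has a value.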

3. Initialize protection groups (one-time; re-run after adding nodes)

pwsh ./Initialize-ProtectionGroups.ps1 -WhatIf   # preview first
pwsh ./Initialize-ProtectionGroups.ps1

4. Take a snapshot

pwsh ./New-MongoSnapshot.ps1
# Note the tag printed at the end, e.g. om-20260512-143022

5. Restore from a snapshot

pwsh ./Restore-MongoSnapshot.ps1 -SnapshotTag "om-20260512-143022"
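
The example tags in this README follow the shape om-YYYYMMDD-HHMMSS. The Python sketch below validates a tag and extracts its timestamp, which can be useful when choosing among several snapshots. The format is inferred from the examples only, the time zone of the stamp is not stated here, and parse_snapshot_tag is a hypothetical helper, not something the scripts provide.

```python
from datetime import datetime


def parse_snapshot_tag(tag):
    """Split 'om-YYYYMMDD-HHMMSS' into a naive datetime (time zone unknown)."""
    prefix, _, stamp = tag.partition("-")
    if prefix != "om" or not stamp:
        raise ValueError(f"unexpected snapshot tag: {tag!r}")
    # strptime raises ValueError if the stamp does not match the pattern.
    return datetime.strptime(stamp, "%Y%m%d-%H%M%S")
```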

For full PITR (snapshot + oplog replay) see Backup & Recovery Workflows below.


Environment

| Role | Host | IP | Ports |
| --- | --- | --- | --- |
| Ops Manager | aen-mongo-00 | 10.21.229.11 | 8080 |
| Cluster Node 1 | aen-mongo-01 | 10.21.229.8 | 27017 (mongos), 27020–27022 (shards) |
| Cluster Node 2 | aen-mongo-02 | 10.21.229.9 | 27020–27022 |
| Cluster Node 3 | aen-mongo-03 | 10.21.229.10 | 27020–27022 |

| Shard | Port | Role |
| --- | --- | --- |
| aen-shard_0 | 27020 | Config server (CSRS) |
| aen-shard_1 | 27021 | Data shard |
| aen-shard_2 | 27022 | Data shard |

  • MongoDB: 8.0.21-ent, 3-node replica set per shard
  • Ops Manager: 8.0.23
  • Storage: Pure Storage FlashArray — one array per node (sn1-x90r2-f07-27, sn1-x90r2-f06-27, sn1-x90r2-f06-33), data volumes mounted at /data/mongo
  • SSH user: packer (passwordless sudo)

Prerequisites

  • PowerShell 7+
  • PureStoragePowerShellSDK2 module: Install-Module PureStoragePowerShellSDK2
  • All three FlashArrays enrolled in the same Pure Storage Fusion fleet
  • SSH key auth from this machine to packer@aen-mongo-{01,02,03}
  • .env at project root — copy .env.example and fill in FlashArray endpoint/credentials, Ops Manager API keys, cluster topology
  • Ops Manager API user with the GLOBAL_BACKUP_ADMIN role, plus an API public/private key pair
  • Ops Manager third-party backup enabled: mms.featureFlag.backup.thirdPartyManaged=true
  • aen-cluster registered for third-party backup (state = ACTIVE)
  • Protection groups initialized on all FlashArrays (Initialize-ProtectionGroups.ps1)

System Requirements

Single data disk per node — LVM is not supported

Each cluster node has a single Pure Storage pRDM volume mounted at /data/mongo. This is a raw block device; the volume is not part of an LVM logical volume or a multi-disk RAID set. Resolve-NodeToArrayVolumeMap in Config.ps1 identifies the backing FlashArray volume via the SCSI serial number of the device returned by findmnt -no SOURCE /data/mongo. If /data/mongo is an LVM logical volume composed of multiple pRDMs, the serial-number lookup will return the wrong device or no device at all, and the node-to-volume mapping will fail. See TODO.md for the planned multi-volume / LVM enhancement.

Oplog directory is on the data disk

The Ops Manager backup agent writes per-RS oplog snapshot files (.oplogs) to /data/mongo/oplog/ on each agent node. This directory lives on the same FlashArray volume as the MongoDB data files. A FlashArray snapshot of the data volume therefore captures a consistent point-in-time copy of both WiredTiger data files and any oplog segment files written by the agent up to that moment. See TODO.md for the planned work to move the oplog directory to a dedicated disk.

SSH requirements

All scripts connect to cluster nodes over SSH. The following must be satisfied before running any script:

| Requirement | Reason |
| --- | --- |
| SSH key-based auth (no password) for $SshUser on all cluster nodes | Every script uses -o BatchMode=yes. Any password or interactive SSH prompt causes an immediate auth failure. |
| Passwordless sudo for $SshUser on all cluster nodes | Restore-MongoSnapshot.ps1 runs privileged commands over SSH without a TTY: systemctl stop/start mongodb-mms-automation-agent, umount/mount /data/mongo, pkill mongod/mongos, blockdev, udevadm settle, blkid, xfs_repair, and e2fsck. Without NOPASSWD in /etc/sudoers, each command will hang waiting for a password and the restore will time out. Add with: echo "$SshUser ALL=(ALL) NOPASSWD: ALL" \| sudo tee /etc/sudoers.d/$SshUser on each node. |
| $SshUser is a member of the mongod OS group on all nodes | The OM backup agent writes .oplogs files under /data/mongo/oplog/ as user mongod with directory permissions drwxr-x---. scp in pitr/Start-OplogTailer.ps1 runs as $SshUser and will fail with exit 1 if that user is not in the mongod group. Add with sudo usermod -aG mongod $SshUser on each node, then close any existing ControlMaster sockets so new connections pick up the updated group. |
| mongorestore on $PATH for $SshUser on each shard primary | Invoke-OplogReplay.ps1 SCPs .oplogs files to /tmp/ on the shard primary and runs mongorestore there via SSH. Either ensure mongorestore is on the default $PATH for $SshUser, or set MONGO_TOOLS_BASE in .env to its directory. |
| python3 and libsnappy.so on each shard primary | pitr/Invoke-OplogReplay.ps1 SCPs pitr/decode_oplogs.py to /tmp/ on each shard primary and runs it via SSH to decompress OM .oplogs files before passing them to mongorestore. Python 3 is standard on RHEL/Rocky 9. libsnappy.so is provided by the snappy RPM (snappy-1.1.x), which is a transitive dependency of the MongoDB automation agent, so it is present on any node where the OM agent is installed. Verify with: rpm -q snappy and python3 --version. |
| SSH ControlMaster socket directory /tmp/ssh-mux-* is writable | Config.ps1 sets ControlPath=/tmp/ssh-mux-%C. If /tmp is not writable for $SshUser, multiplexing silently falls back to a new connection per call, which can saturate sshd MaxStartups under concurrent load. |
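
The SSH requirements can be checked by hand with a non-interactive, multiplexed ssh call. The Python sketch below only assembles such a command line. BatchMode=yes and ControlPath=/tmp/ssh-mux-%C come from this README; ControlMaster=auto and ControlPersist=60 are assumed values, since the full option set in Config.ps1 is not reproduced here, and build_ssh_command is a hypothetical name.

```python
def build_ssh_command(user, host, remote_command):
    """Return an argv list for a non-interactive, multiplexed ssh call."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # fail instead of prompting
        "-o", "ControlMaster=auto",           # assumed: reuse one master connection
        "-o", "ControlPath=/tmp/ssh-mux-%C",  # value stated for Config.ps1
        "-o", "ControlPersist=60",            # assumed idle lifetime in seconds
        f"{user}@{host}",
        remote_command,
    ]
```

Passing the result of build_ssh_command("packer", "aen-mongo-01", "sudo -n true") to subprocess.run exercises both key-based auth and passwordless sudo, because sudo -n exits with an error rather than prompting for a password.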

Scripts

All scripts dot-source Config.ps1, which loads credentials from .env and provides runtime discovery helpers (Get-ClusterNodes, Resolve-FaContextNames, Resolve-NodeToArrayVolumeMap). Node lists and volume mappings are always discovered at runtime from Ops Manager and SCSI serials — nothing is hardcoded.

| Script | Purpose |
| --- | --- |
| Initialize-ProtectionGroups.ps1 | One-time setup: discovers nodes, resolves FA volumes via SCSI serial, creates/updates the PG on every array. Idempotent. Use -Prune to remove orphaned members after scale-down. |
| New-MongoSnapshot.ps1 | Takes a crash-consistent snapshot. Coordinates Ops Manager $backupCursor with a FlashArray PG snapshot. Writes a metadata sidecar to ~/mongo-snapshots/<tag>.json. |
| Restore-MongoSnapshot.ps1 | Restores from a named snapshot. Pre-flight verifies all FlashArray snapshots exist on all arrays, all node volume sizes match, and all nodes are reachable via SSH. Stops agents, unmounts, overwrites volumes in-place, remounts, restarts agents, waits for cluster stabilization, verifies document counts. Destructive. |
| Remove-OldArtifacts.ps1 | Deletes FA PG snapshots, local sidecar JSON files, oplog segments, and log files older than N days. |
| pitr/Start-OplogTailer.ps1 | Continuously captures oplog using the OM Oplog Snapshot API. Each cycle creates an oplog snapshot job, SCPs .oplogs segment files per shard to ~/mongo-oplog-stream/<tag>/<shardId>/segments/, detects gaps via OM's previousEnd field, and updates ~/mongo-oplog-stream/<tag>/state.json. Run in its own terminal. |
| pitr/Stop-OplogTailer.ps1 | Stops the tailer and writes t2-mark.json (document counts at stop time) for use as the PITR upper-bound assertion. |
| pitr/Invoke-OplogReplay.ps1 | Applies captured oplog segments to a restored cluster via mongorestore --oplogReplay. Accepts -TargetTimestamp for sub-segment precision. |
| pitr/decode_oplogs.py | Python helper deployed automatically by Invoke-OplogReplay.ps1. Decodes the OM .oplogs binary format (snappy-compressed BSON wrapper) to raw oplog BSON so mongorestore --oplogReplay can consume it. Requires Python 3 and libsnappy.so on each agent node; both are present by default on RHEL/Rocky 9 with the snappy RPM installed. |
| tests/Start-InsertLoad.ps1 | Continuous background writer into testdb.loadtest for testing under load. |
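
The gap detection mentioned for pitr/Start-OplogTailer.ps1 can be pictured as a chain check: each segment's previousEnd should equal the end of the segment captured before it. The Python sketch below illustrates that idea only. It is not the tailer's code; previousEnd is the one field named in this README, while the end key and the list-of-dicts shape are assumptions made for the example.

```python
def find_gaps(segments):
    """Return (index, expected, actual) for each segment that does not chain.

    `segments` is an ordered list of dicts with an 'end' value and a
    'previousEnd' value (hypothetical shape for this sketch).
    """
    gaps = []
    for i in range(1, len(segments)):
        expected = segments[i - 1]["end"]
        actual = segments[i]["previousEnd"]
        if actual != expected:
            gaps.append((i, expected, actual))
    return gaps
```

An empty result means the captured segments form an unbroken chain; any entry marks a point past which replay cannot safely continue.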

Key usage

# Protection group setup
pwsh ./Initialize-ProtectionGroups.ps1
pwsh ./Initialize-ProtectionGroups.ps1 -Prune -WhatIf     # preview orphan removal

# Snapshot
pwsh ./New-MongoSnapshot.ps1

# Restore
pwsh ./Restore-MongoSnapshot.ps1 -SnapshotTag "om-20260505-201951"
pwsh ./Restore-MongoSnapshot.ps1 -SnapshotTag "om-20260505-201951" -Force

# Cleanup
pwsh ./Remove-OldArtifacts.ps1 -OlderThanDays 30
pwsh ./Remove-OldArtifacts.ps1 -OlderThanDays 30 -WhatIf

# Oplog tailer
pwsh ./pitr/Start-OplogTailer.ps1 -SnapshotTag "om-20260505-201951"
pwsh ./pitr/Stop-OplogTailer.ps1  -SnapshotTag "om-20260505-201951"

# Oplog replay
pwsh ./pitr/Invoke-OplogReplay.ps1 -SnapshotTag "om-20260505-201951"
pwsh ./pitr/Invoke-OplogReplay.ps1 -SnapshotTag "om-20260505-201951" -TargetTimestamp 1778030500

Note: The config shard (aen-shard_0, CSRS) will report NotWritablePrimary during oplog replay — expected. Only data shards need oplog replay.


Backup & Recovery Workflows

Snapshot-only restore (crash recovery)

# 1. Take snapshot
pwsh ./New-MongoSnapshot.ps1

# 2. (disaster occurs)

# 3. Restore to snapshot point-in-time
pwsh ./Restore-MongoSnapshot.ps1 -SnapshotTag "om-20260505-201951"

On restart, WiredTiger recovers from its last checkpoint and replays the journal captured in the snapshot; only writes that had not reached the journal at snapshot time are lost.

Point-in-time restore (snapshot + oplog)

How oplog replay works: pitr/Start-OplogTailer.ps1 uses the OM Oplog Snapshot API to download .oplogs segment files. These files use OM's internal format: a BSON metadata header followed by a snappy-compressed block of raw oplog BSON. pitr/Invoke-OplogReplay.ps1 automatically deploys pitr/decode_oplogs.py to each agent node via SCP before the replay loop begins. The decoder uses Python's ctypes to call snappy_uncompress from the libsnappy.so system library, strips the header documents, and writes raw oplog BSON to stdout. mongorestore --oplogReplay reads this standard BSON directly. No manual decompression step is required.
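
Stripping header documents relies on a property of BSON itself: every document starts with a little-endian int32 holding its total length and ends with a NUL byte, so a stream of concatenated documents can be split without parsing the fields. The Python sketch below shows that splitting step only. It is not decode_oplogs.py, and it makes no claim about the exact layout of the .oplogs header or where the snappy block begins.

```python
import struct


def iter_bson_documents(buf):
    """Yield each BSON document in `buf` as bytes, using its length prefix."""
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < 5:
            raise ValueError("truncated BSON length prefix")
        # First four bytes: total document length, little-endian int32.
        (length,) = struct.unpack_from("<i", buf, offset)
        if length < 5 or offset + length > len(buf):
            raise ValueError(f"bad BSON document length {length} at offset {offset}")
        if buf[offset + length - 1] != 0:
            raise ValueError("BSON document missing trailing NUL")
        yield buf[offset:offset + length]
        offset += length
```

A truncated download shows up here as a ValueError, because the last length prefix points past the end of the buffer.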

# 1. Take snapshot (T1) — writes per-shard oplog anchors into the metadata sidecar
pwsh ./New-MongoSnapshot.ps1
# Note the tag, e.g. om-20260505-201951

# 2. Start the continuous oplog tailer in its own terminal (long-running)
pwsh ./pitr/Start-OplogTailer.ps1 -SnapshotTag "om-20260505-201951"

# 3. (disaster occurs at T2)

# 4. Stop the tailer (writes the T2-mark count file used for the replay range check)
pwsh ./pitr/Stop-OplogTailer.ps1 -SnapshotTag "om-20260505-201951"

# 5. Restore to snapshot (T1)
pwsh ./Restore-MongoSnapshot.ps1 -SnapshotTag "om-20260505-201951"

# 6. Replay oplog segments to advance cluster from T1 → T2
pwsh ./pitr/Invoke-OplogReplay.ps1 -SnapshotTag "om-20260505-201951"
# Or to a specific timestamp:
pwsh ./pitr/Invoke-OplogReplay.ps1 -SnapshotTag "om-20260505-201951" -TargetTimestamp 1778030500
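
The example value 1778030500 looks like a Unix epoch timestamp in seconds; this README does not state the unit explicitly, so treat that as an assumption. Under that assumption, the Python sketch below converts between a timezone-aware datetime and the integer passed to -TargetTimestamp. Both helper names are hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment):
    """Convert a timezone-aware datetime to whole Unix epoch seconds."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return int(moment.timestamp())


def from_epoch_seconds(seconds):
    """Convert Unix epoch seconds to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

For instance, from_epoch_seconds(1778030500) is 2026-05-06 01:21:40 UTC. Requiring an aware datetime avoids silently using the local time zone of the machine running the replay.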

Adding or replacing a cluster node

The backup and restore pipeline is topology-agnostic — node lists, FlashArray volume mappings, and shard counts are all discovered at runtime from Ops Manager and SCSI serial numbers. There are no hardcoded node counts in the snapshot, restore, tailer, or replay scripts.

Adding a node

  1. Register the node and its mongod processes in Ops Manager (or via automationConfig). Once OM shows the node as a member of the cluster, Get-ClusterNodes will pick it up automatically — no script changes required.

  2. Update CLUSTER_NODES in .env to include the new node. Get-ClusterNodes queries Ops Manager first, but falls back to .env if OM is unreachable. The restore script uses the same fallback. If .env is not updated, a restore attempted while OM is down will miss the new node's volume.

  3. Run Initialize-ProtectionGroups.ps1 to add the new volume to the protection group. This is required before taking a snapshot — the snapshot pre-flight verifies that every node's volume is a PG member and will abort before opening any backup cursor if any volume is absent.

    # Idempotent — existing members are untouched.
    pwsh ./Initialize-ProtectionGroups.ps1
  4. Take a new snapshot.
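
The fallback described in step 2 can be sketched in Python as follows. This is an illustration of the described behaviour, not the actual Get-ClusterNodes implementation: the query_ops_manager callable is hypothetical, and treating OSError (which includes ConnectionError) as "OM is unreachable" is an assumption.

```python
def get_cluster_nodes(query_ops_manager, env_cluster_nodes):
    """Prefer the live Ops Manager node list; fall back to CLUSTER_NODES.

    Returns (nodes, source), where source is 'ops-manager' or 'env'.
    """
    try:
        nodes = list(query_ops_manager())
        if nodes:
            return nodes, "ops-manager"
    except OSError:
        pass  # treated here as "Ops Manager unreachable"
    # CLUSTER_NODES is a comma-separated list in .env.
    fallback = [n.strip() for n in env_cluster_nodes.split(",") if n.strip()]
    if not fallback:
        raise RuntimeError("no nodes from Ops Manager and CLUSTER_NODES is empty")
    return fallback, "env"
```

Returning the source alongside the list makes it easy to log when a run used the .env fallback, which is exactly the case where a stale CLUSTER_NODES would miss a new node.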

Removing a node

Order matters — follow these steps in sequence:

  1. Deregister the node from Ops Manager (remove it from the automationConfig or decommission via the UI). Get-ClusterNodes queries OM first; if the node is still registered, -Prune will not consider its volume orphaned.

  2. Remove the node from CLUSTER_NODES in .env (for the OM-fallback path).

  3. Run Initialize-ProtectionGroups.ps1 -Prune to remove the orphaned volume from the PG. Use -WhatIf first to preview:

    pwsh ./Initialize-ProtectionGroups.ps1 -Prune -WhatIf   # preview what would be removed
    pwsh ./Initialize-ProtectionGroups.ps1 -Prune            # requires typing the PG name to confirm

    -Prune compares the live OM-discovered node list against all current PG members and removes any volume whose node no longer exists in the cluster.
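
At its core, the -Prune comparison is a set difference between current PG members and the volumes expected for live nodes. The Python sketch below shows that comparison under a simplifying assumption: volumes are matched by a name of the form <node>-data, a pattern taken from the error message in Troubleshooting. The real script resolves volumes via SCSI serial, so this is an illustration rather than its logic, and plan_prune is a hypothetical name.

```python
def plan_prune(pg_members, live_nodes, volume_name_for):
    """Return PG member volumes that no live node accounts for, sorted."""
    expected = {volume_name_for(node) for node in live_nodes}
    return sorted(set(pg_members) - expected)


def data_volume_name(node):
    """Assumed naming convention: '<node>-data'."""
    return f"{node}-data"
```

If a removed node is still registered in Ops Manager, it remains in live_nodes and its volume never appears in the result, which is why deregistration has to come first.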

Note: The build/ provisioning scripts (Provision-MongoStorage.ps1, Verify-MongoStorage.ps1, Set-MongoVmStaticIp.ps1) contain hardcoded node-to-array mappings used for initial infrastructure setup. These are not on the backup/restore path but must be updated manually when adding or removing a node.


Output Locations

| Artifact | Path |
| --- | --- |
| Snapshot metadata | ~/mongo-snapshots/<tag>.json |
| Oplog tailer segments | ~/mongo-oplog-stream/<tag>/<shardId>/segments/<startTs>_<endTs>.oplogs |
| Oplog tailer state | ~/mongo-oplog-stream/<tag>/state.json |
| Oplog tailer gap markers | ~/mongo-oplog-stream/<tag>/gap-<timestamp>.json |
| T2 mark | ~/mongo-oplog-stream/<tag>/t2-mark.json |
| Snapshot logs | ~/mongo-snapshot-logs/ |
| Restore logs | ~/mongo-restore-logs/ |
| Oplog tailer logs | ~/mongo-oplogtailer-logs/ |
| Oplog replay logs | ~/mongo-oplogreplay-logs/ |
| Insert load status | ~/mongo-loadtest-status.json |
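
Because segment files are named <startTs>_<endTs>.oplogs, the set of segments relevant to a given replay target can be worked out from file names alone. The Python sketch below does that under two assumptions that this README does not confirm: both timestamps are plain integers, and they are in the same unit as the replay target. It is not how Invoke-OplogReplay.ps1 selects segments, and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def parse_segment_name(path):
    """Return (start, end) parsed from '<startTs>_<endTs>.oplogs'."""
    name = PurePosixPath(path).name
    stem, dot, ext = name.rpartition(".")
    if dot != "." or ext != "oplogs":
        raise ValueError(f"not an .oplogs segment: {path!r}")
    start, sep, end = stem.partition("_")
    if not sep:
        raise ValueError(f"missing '_' separator: {path!r}")
    return int(start), int(end)  # int() raises ValueError on non-numeric parts


def segments_for_target(paths, target):
    """Return segments starting at or before `target`, ordered by start."""
    parsed = sorted((parse_segment_name(p), p) for p in paths)
    return [p for (start, _end), p in parsed if start <= target]
```

The last segment returned may extend past the target, which is where a sub-segment cutoff such as -TargetTimestamp would apply.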

Troubleshooting

| Symptom | When it occurs | Cluster state | What to do |
| --- | --- | --- | --- |
| OM snapshot job '...' is already in progress | New snapshot blocked by stuck job | Untouched | Call /fail to release the backup cursor: . ./Config.ps1; Invoke-OmApi -Method POST -Path "group/$GroupId/clusters/$ClusterId/snapshot/<id>/fail". Poll until state = FAILED, then re-run. |
| Snapshot 'om-...' found on N of M arrays | Restore pre-flight: tag missing on one or more arrays | Untouched; aborts before any change | Snapshot was deleted (retention) or a node was added after the snapshot. List available snapshots per array: Get-Pfa2ProtectionGroupSnapshot -Array $FA -ContextName @($ctx) -Filter "source.name='$ProtectionGroupName'". Use a newer tag or take a fresh snapshot. |
| missing member snapshot '...aen-mongo-XX-data' | Restore pre-flight: volume not in PG at snapshot time | Untouched; aborts before any change | Volume was added to cluster after snapshot. Run pwsh ./Initialize-ProtectionGroups.ps1 then take a fresh snapshot. |
| Volume size mismatch | Restore pre-flight | Untouched; aborts before any change | Snapshot was taken from a different-sized volume. Do not restore. Investigate which snapshot matches the current live volume. |
| STEP 4 volume overwrite fails | Mid-restore | Dangerous: agents stopped, /data/mongo unmounted, volumes partially overwritten | 1. Retry failed overwrites manually: New-Pfa2Volume -Array $FA -ContextName @($ShortName) -Name $VolumeName -SourceName "$ProtectionGroupName.$SnapshotTag.$VolumeName" -Overwrite $true. 2. Rescan and remount /data/mongo on each node: echo 1 \| sudo tee /sys/block/$DISK/device/rescan > /dev/null; sudo mount /data/mongo. 3. Start agents: sudo systemctl start mongodb-mms-automation-agent. Do NOT start agents until all overwrites are complete; partial restores leave mixed data epochs across shards. |
| New node stays (not reachable/healthy) in rs.status() | After adding a node via automationConfig API | Cluster healthy; new node isolated | firewalld active on new node. Open ports: sudo firewall-cmd --permanent --add-port=2702{0,1,2}/tcp && sudo firewall-cmd --reload |
| Agent loops with Error ensuring directory /data/mongo/logs exists | After pushing automationConfig for a new node | Cluster healthy; new node mongods not starting | Pre-create dirs as root: sudo mkdir -p /data/mongo/{logs,shard0,shard1,shard2} && sudo chown -R mongod:mongod /data/mongo |
| systemctl enable mongodb-mms-automation-agent fails on new node | Agent install on new node | N/A | RPM only ships init.d script. Copy unit file from existing node: scp packer@aen-mongo-01:/etc/systemd/system/mongodb-mms-automation-agent.service /tmp/ then install on new node and sudo systemctl daemon-reload |
| THIRD_PARTY_DISCOVERY_ERROR from snapshot pre-flight | After adding a new node to the cluster | Cluster healthy; no snapshot possible | Third-party backup must be re-activated after any topology change. In OM UI: Servers → select new node → enable Backup and Monitoring → Deploy Changes. Wait for deployment to complete, then retry the snapshot. |
| decode_oplogs.py not found | Invoke-OplogReplay.ps1 cannot locate the helper | Cluster healthy; replay not started | decode_oplogs.py must be in the pitr/ directory with Invoke-OplogReplay.ps1. It is part of the repo; verify with ls pitr/decode_oplogs.py. |
| snappy_uncompress failed from decoder | decode_oplogs.py decompression error on agent node | Cluster healthy; this segment skipped | libsnappy.so version mismatch or corrupt .oplogs file. Verify rpm -q snappy on the agent returns snappy-1.1.x. If the segment file is corrupt (truncated download), re-download: restart the tailer run or retrieve the file manually from the OM agent's oplog directory (/data/mongo/oplog/<shardId>/<port>/<date>/). |

Known Limitations

Multiple clusters on the same Fusion fleet

The scripts are scoped to a single cluster per .env file. All scripts read one $GroupId, $ClusterId, $MongosHost, and $ProtectionGroupName from the active .env and operate against that cluster only.

Workaround (today): Use a separate .env file per cluster and pass -ConfigFile by sourcing the right file before running each script, or maintain multiple project clones with their own .env. Use a unique FA_PROTECTION_GROUP name per cluster (e.g. cluster-a-pg, cluster-b-pg) to prevent snapshot name collisions on shared arrays.

What won't collide today: Snapshot names are <FA_PROTECTION_GROUP>.<tag>, so as long as each cluster has a unique PG name, snapshots on a shared fleet co-exist safely. The ClusterName tag is already written to every snapshot for identification.
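
The naming scheme is easy to reproduce. The Python sketch below builds the protection group snapshot name and the per-volume member name; the member pattern follows the -SourceName value shown in Troubleshooting ("$ProtectionGroupName.$SnapshotTag.$VolumeName"). The helper names are hypothetical, and the volume name in the usage note is an illustrative value.

```python
def pg_snapshot_name(protection_group, tag):
    """Name of the protection group snapshot: '<pg>.<tag>'."""
    return f"{protection_group}.{tag}"


def member_snapshot_name(protection_group, tag, volume):
    """Name of one volume's snapshot inside it: '<pg>.<tag>.<volume>'."""
    return f"{protection_group}.{tag}.{volume}"
```

Two clusters that snapshot in the same second still get distinct names, for example cluster-a-pg.om-20260505-201951 and cluster-b-pg.om-20260505-201951, as long as their PG names differ.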

What's missing: Restore-MongoSnapshot.ps1 looks up snapshots by PG name and tag only — it does not assert that the snapshot's ClusterName tag matches the target cluster before overwriting volumes. With unique PG names per cluster this is not a practical risk, but the validation is absent.

Tracked in TODO: parameterized multi-cluster support.


Set-Pfa2ProtectionGroupSnapshotTagBatch with -ContextName returns HTTP 500

PUT /protection-group-snapshots/tags/batch?context_names=<remote-array> returns HTTP 500 InternalServerError (willRetry=False, "Unidentified internal error") for every remote array in the fleet. The identical request without context_names succeeds on the gateway.

Workaround (current): New-MongoSnapshot.ps1 connects directly to each fleet member in STEP 7.5 and writes post-snapshot tags (mongo:postSnap, mongo:t1ts) without -ContextName. The management FQDN for each member is derived from the gateway endpoint by replacing the short name prefix (e.g. sn1-x90r2-f06-27.puretec.purestorage.com becomes sn1-x90r2-f07-27.puretec.purestorage.com). All arrays must share the same management domain for this to work.
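
The FQDN derivation amounts to swapping the first DNS label of the gateway endpoint for the member's short name. The Python sketch below shows that substitution; it is an illustration and not the code in New-MongoSnapshot.ps1, and derive_member_fqdn is a hypothetical name.

```python
def derive_member_fqdn(gateway_fqdn, member_short_name):
    """Replace the first DNS label of the gateway with the member's short name."""
    _short, dot, domain = gateway_fqdn.partition(".")
    if not dot or not domain:
        raise ValueError("gateway endpoint has no domain part")
    return f"{member_short_name}.{domain}"
```

The ValueError branch covers a gateway given as a bare host name or IP-less short name, where there is no shared management domain to reuse.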

Tracked: issue #1 · Repro script: tests/Repro-PgSnapshotTagContextNames.ps1


Additional Documentation
