PowerShell scripts for crash-consistent snapshot backup and point-in-time recovery of a MongoDB 8.0 Enterprise sharded cluster using Pure Storage FlashArray, Pure Storage Fusion, and MongoDB Ops Manager 8.0.
The three FlashArrays, one per MongoDB node, are enrolled in a Pure Storage Fusion fleet, which provides a unified control plane across the entire storage estate. A single Connect-Pfa2Array call targets the Fusion gateway; all subsequent FlashArray operations are routed to the correct array via -ContextName, eliminating per-array session management and enabling fleet-wide protection group coordination. When New-MongoSnapshot.ps1 triggers a protection group snapshot, Fusion coordinates it across all three arrays, giving a single consistent point-in-time across every MongoDB data volume regardless of which physical array hosts it.

New-MongoSnapshot.ps1 also uses the Ops Manager Third-Party Backup API to open a $backupCursor on one secondary per shard, which pins the WiredTiger checkpoint and freezes journal cleanup. While the cursor is open, the Fusion-coordinated FlashArray protection group snapshot captures all three data volumes in a coordinated crash-consistent sweep: no fsyncLock, no write stall, no primary involvement. Because the journal is intact at the snapshot point, every volume image is guaranteed crash-recoverable on restart.

Restore-MongoSnapshot.ps1 stops agents, unmounts volumes, overwrites each volume in-place from the snapshot (a sub-second CoW pointer swap), remounts, and restarts agents; WiredTiger handles crash recovery automatically.

For PITR, pitr/Start-OplogTailer.ps1 uses the Ops Manager Oplog Snapshot API to continuously capture oplog as .oplogs segment files, and pitr/Invoke-OplogReplay.ps1 replays those segments to a target timestamp after the snapshot restore completes. See docs/how-it-works.md for the full recoverability deep dive.
1. Install prerequisites
Install-Module PureStoragePowerShellSDK2

PowerShell 7+ required. SSH key auth must be configured from this machine to every cluster node (packer@aen-mongo-{01,02,03}).
2. Configure your environment
cp .env.example .env
# Edit .env — fill in FA_ENDPOINT, FA_PASSWORD, FA_USERNAME, OM_BASE_URL,
# OM_API_VERSION, OM_GROUP_ID, OM_CLUSTER_ID, OM_PUBLIC_KEY, OM_PRIVATE_KEY,
# MONGOSH_PATH, MONGOS_HOST, MONGOS_PORT, SSH_USER, MONGO_TOOLS_BASE,
# FA_PROTECTION_GROUP, FA_CLUSTER_NAME,
# CLUSTER_NODES (comma-separated fallback node list used when OM is unreachable)

3. Initialize protection groups (one-time; re-run after adding nodes)
pwsh ./Initialize-ProtectionGroups.ps1 -WhatIf # preview first
pwsh ./Initialize-ProtectionGroups.ps1

4. Take a snapshot
pwsh ./New-MongoSnapshot.ps1
# Note the tag printed at the end, e.g. om-20260512-143022

5. Restore from a snapshot
pwsh ./Restore-MongoSnapshot.ps1 -SnapshotTag "om-20260512-143022"

For full PITR (snapshot + oplog replay) see Backup & Recovery Workflows below.
| Role | Host | IP | Ports |
|---|---|---|---|
| Ops Manager | aen-mongo-00 | 10.21.229.11 | 8080 |
| Cluster Node 1 | aen-mongo-01 | 10.21.229.8 | 27017 (mongos), 27020–27022 (shards) |
| Cluster Node 2 | aen-mongo-02 | 10.21.229.9 | 27020–27022 |
| Cluster Node 3 | aen-mongo-03 | 10.21.229.10 | 27020–27022 |
| Shard | Port | Role |
|---|---|---|
| aen-shard_0 | 27020 | Config server (CSRS) |
| aen-shard_1 | 27021 | Data shard |
| aen-shard_2 | 27022 | Data shard |
- MongoDB: 8.0.21-ent, 3-node replica set per shard
- Ops Manager: 8.0.23
- Storage: Pure Storage FlashArray, one array per node (sn1-x90r2-f07-27, sn1-x90r2-f06-27, sn1-x90r2-f06-33), data volumes mounted at /data/mongo
- SSH user: packer (passwordless sudo)
- PowerShell 7+
- PureStoragePowerShellSDK2 module: Install-Module PureStoragePowerShellSDK2
- All three FlashArrays enrolled in the same Pure Storage Fusion fleet
- SSH key auth from this machine to packer@aen-mongo-{01,02,03}
- .env at project root: copy .env.example and fill in FlashArray endpoint/credentials, Ops Manager API keys, cluster topology
- Ops Manager API user role: GLOBAL_BACKUP_ADMIN and an API public and private key.
- Ops Manager third-party backup enabled: mms.featureFlag.backup.thirdPartyManaged=true
- aen-cluster registered for third-party backup (state = ACTIVE)
- Protection groups initialized on all FlashArrays (Initialize-ProtectionGroups.ps1)
Each cluster node has a single Pure Storage pRDM volume mounted at /data/mongo. This is a raw block device; the volume is not part of an LVM logical volume or a multi-disk RAID set. Resolve-NodeToArrayVolumeMap in Config.ps1 identifies the backing FlashArray volume via the SCSI serial number of the device returned by findmnt -no SOURCE /data/mongo. If /data/mongo is instead an LVM logical volume composed of multiple pRDMs, the serial-number lookup will return the wrong device or no device at all, and the node-to-volume mapping will fail. See TODO.md for the planned multi-volume / LVM enhancement.
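As a rough illustration of a serial-based lookup (this is not the code in Resolve-NodeToArrayVolumeMap), the Python sketch below assumes the block device exposes the usual Pure Storage NAA identifier, in which the vendor prefix 624a9370 is followed by the 24-hex-digit FlashArray volume serial. The function name and the sample identifiers are hypothetical.

```python
import re
from typing import Optional

# Vendor prefix that Pure Storage FlashArray volumes carry in their NAA identifier.
PURE_NAA_PREFIX = "624a9370"


def pure_serial_from_wwid(wwid: str) -> Optional[str]:
    """Return the FlashArray volume serial embedded in a SCSI WWID, or None.

    Accepts common spellings such as 'naa.624a9370...', '0x624a9370...'
    and the multipath-style '3624a9370...'.
    """
    match = re.search(PURE_NAA_PREFIX + r"([0-9a-f]{24})$", wwid.strip().lower())
    # FlashArray reports volume serials in upper case, so normalise to that.
    return match.group(1).upper() if match else None
```

A device that is not backed by a FlashArray volume (or an LVM device-mapper path with no SCSI identity of its own) yields None, which is the failure mode described above.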
The Ops Manager backup agent writes per-RS oplog snapshot files (.oplogs) to /data/mongo/oplog/ on each agent node. This directory lives on the same FlashArray volume as the MongoDB data files. A FlashArray snapshot of the data volume therefore captures a consistent point-in-time copy of both WiredTiger data files and any oplog segment files written by the agent up to that moment. See TODO.md for the planned work to move the oplog directory to a dedicated disk.
All scripts connect to cluster nodes over SSH. The following must be satisfied before running any script:
| Requirement | Reason |
|---|---|
| SSH key-based auth (no password) for $SshUser on all cluster nodes | Every script uses -o BatchMode=yes. Any password or interactive SSH prompt causes an immediate auth failure. |
| Passwordless sudo for $SshUser on all cluster nodes | Restore-MongoSnapshot.ps1 runs privileged commands over SSH without a TTY: systemctl stop/start mongodb-mms-automation-agent, umount/mount /data/mongo, pkill mongod/mongos, blockdev, udevadm settle, blkid, xfs_repair, and e2fsck. Without NOPASSWD in /etc/sudoers, each command will hang waiting for a password and the restore will time out. Add with: echo "$SshUser ALL=(ALL) NOPASSWD: ALL" \| sudo tee /etc/sudoers.d/$SshUser on each node. |
| $SshUser is a member of the mongod OS group on all nodes | The OM backup agent writes .oplogs files under /data/mongo/oplog/ as user mongod with directory permissions drwxr-x---. scp in pitr/Start-OplogTailer.ps1 runs as $SshUser and will fail with exit 1 if that user is not in the mongod group. Add with sudo usermod -aG mongod $SshUser on each node, then close any existing ControlMaster sockets so new connections pick up the updated group. |
| mongorestore on $PATH for $SshUser on each shard primary | Invoke-OplogReplay.ps1 SCPs .oplogs files to /tmp/ on the shard primary and runs mongorestore there via SSH. Either ensure mongorestore is on the default $PATH for $SshUser, or set MONGO_TOOLS_BASE in .env to its directory. |
| python3 and libsnappy.so on each shard primary | pitr/Invoke-OplogReplay.ps1 SCPs pitr/decode_oplogs.py to /tmp/ on each shard primary and runs it via SSH to decompress OM .oplogs files before passing them to mongorestore. Python 3 is standard on RHEL/Rocky 9. libsnappy.so is provided by the snappy RPM (snappy-1.1.x), which is a transitive dependency of the MongoDB automation agent, so it is present on any node where the OM agent is installed. Verify with: rpm -q snappy and python3 --version. |
| SSH ControlMaster socket directory /tmp/ssh-mux-* is writable | Config.ps1 sets ControlPath=/tmp/ssh-mux-%C. If /tmp is not writable for $SshUser, multiplexing silently falls back to a new connection per call, which can saturate sshd MaxStartups under concurrent load. |
All scripts dot-source Config.ps1, which loads credentials from .env and provides runtime discovery helpers (Get-ClusterNodes, Resolve-FaContextNames, Resolve-NodeToArrayVolumeMap). Node lists and volume mappings are always discovered at runtime from Ops Manager and SCSI serials — nothing is hardcoded.
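The load-then-discover behaviour can be pictured with a small Python sketch. It is not Config.ps1: the parsing rules, the function names, and the `query_ops_manager` callable are assumptions made for illustration. Only the CLUSTER_NODES variable and the "Ops Manager first, .env as fallback" order come from this README.

```python
from typing import Callable, Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_cluster_nodes(env: Dict[str, str],
                      query_ops_manager: Callable[[], List[str]]) -> List[str]:
    """Prefer the live node list; use CLUSTER_NODES only if OM is unreachable."""
    try:
        return query_ops_manager()
    except OSError:  # connection refused, timeout, DNS failure, ...
        fallback = env.get("CLUSTER_NODES", "")
        return [node.strip() for node in fallback.split(",") if node.strip()]
```

The fallback list is only as current as the last edit to .env, which is why a stale CLUSTER_NODES value matters for restores attempted while Ops Manager is down.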
| Script | Purpose |
|---|---|
| Initialize-ProtectionGroups.ps1 | One-time setup: discovers nodes, resolves FA volumes via SCSI serial, creates/updates the PG on every array. Idempotent. Use -Prune to remove orphaned members after scale-down. |
| New-MongoSnapshot.ps1 | Takes a crash-consistent snapshot. Coordinates Ops Manager $backupCursor with a FlashArray PG snapshot. Writes a metadata sidecar to ~/mongo-snapshots/<tag>.json. |
| Restore-MongoSnapshot.ps1 | Restores from a named snapshot. Pre-flight verifies all FlashArray snapshots exist on all arrays, all node volume sizes match, and all nodes are reachable via SSH. Stops agents, unmounts, overwrites volumes in-place, remounts, restarts agents, waits for cluster stabilization, verifies document counts. Destructive. |
| Remove-OldArtifacts.ps1 | Deletes FA PG snapshots, local sidecar JSON files, oplog segments, and log files older than N days. |
| pitr/Start-OplogTailer.ps1 | Continuously captures oplog using the OM Oplog Snapshot API. Each cycle creates an oplog snapshot job, SCPs .oplogs segment files per shard to ~/mongo-oplog-stream/<tag>/<shardId>/segments/, detects gaps via OM's previousEnd field, and updates ~/mongo-oplog-stream/<tag>/state.json. Run in its own terminal. |
| pitr/Stop-OplogTailer.ps1 | Stops the tailer and writes t2-mark.json (document counts at stop time) for use as the PITR upper-bound assertion. |
| pitr/Invoke-OplogReplay.ps1 | Applies captured oplog segments to a restored cluster via mongorestore --oplogReplay. Accepts -TargetTimestamp for sub-segment precision. |
| pitr/decode_oplogs.py | Python helper deployed automatically by Invoke-OplogReplay.ps1. Decodes the OM .oplogs binary format (snappy-compressed BSON wrapper) to raw oplog BSON so mongorestore --oplogReplay can consume it. Requires Python 3 and libsnappy.so on each agent node; both are present by default on RHEL/Rocky 9 with the snappy RPM installed. |
| tests/Start-InsertLoad.ps1 | Continuous background writer into testdb.loadtest for testing under load. |
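To make the "older than N days" rule for Remove-OldArtifacts.ps1 concrete, here is a minimal Python sketch of an age check. It assumes the snapshot tag encodes the creation time as om-YYYYMMDD-HHMMSS, which is inferred from the example tags in this README; the real script may well use the array's own creation timestamps instead. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def tag_timestamp(tag: str) -> datetime:
    """Parse a tag such as 'om-20260512-143022' into a naive datetime."""
    return datetime.strptime(tag, "om-%Y%m%d-%H%M%S")


def expired_tags(tags: Iterable[str], older_than_days: int, now: datetime) -> List[str]:
    """Return the tags whose encoded time is before now minus N days."""
    cutoff = now - timedelta(days=older_than_days)
    return [tag for tag in tags if tag_timestamp(tag) < cutoff]
```

Passing `now` explicitly keeps the check deterministic, which is convenient for a -WhatIf style preview.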
# Protection group setup
pwsh ./Initialize-ProtectionGroups.ps1
pwsh ./Initialize-ProtectionGroups.ps1 -Prune -WhatIf # preview orphan removal
# Snapshot
pwsh ./New-MongoSnapshot.ps1
# Restore
pwsh ./Restore-MongoSnapshot.ps1 -SnapshotTag "om-20260505-201951"
pwsh ./Restore-MongoSnapshot.ps1 -SnapshotTag "om-20260505-201951" -Force
# Cleanup
pwsh ./Remove-OldArtifacts.ps1 -OlderThanDays 30
pwsh ./Remove-OldArtifacts.ps1 -OlderThanDays 30 -WhatIf
# Oplog tailer
pwsh ./pitr/Start-OplogTailer.ps1 -SnapshotTag "om-20260505-201951"
pwsh ./pitr/Stop-OplogTailer.ps1 -SnapshotTag "om-20260505-201951"
# Oplog replay
pwsh ./pitr/Invoke-OplogReplay.ps1 -SnapshotTag "om-20260505-201951"
pwsh ./pitr/Invoke-OplogReplay.ps1 -SnapshotTag "om-20260505-201951" -TargetTimestamp 1778030500

Note: The config shard (aen-shard_0, CSRS) will report NotWritablePrimary during oplog replay; this is expected. Only data shards need oplog replay.
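The -TargetTimestamp value in the examples looks like Unix epoch seconds. The README does not state the unit, so treat that as an assumption. Under it, a short Python sketch (helper names hypothetical) converts between the integer and a readable UTC time when choosing a recovery point:

```python
from datetime import datetime, timezone

ISO_UTC = "%Y-%m-%dT%H:%M:%SZ"


def epoch_to_utc(seconds: int) -> str:
    """Render epoch seconds as an ISO-8601 UTC string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(ISO_UTC)


def utc_to_epoch(iso_utc: str) -> int:
    """Turn an ISO-8601 UTC string back into epoch seconds."""
    parsed = datetime.strptime(iso_utc, ISO_UTC).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

Under that reading, 1778030500 corresponds to 2026-05-06T01:21:40Z.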
# 1. Take snapshot
pwsh ./New-MongoSnapshot.ps1
# 2. (disaster occurs)
# 3. Restore to snapshot point-in-time
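pwsh ./Restore-MongoSnapshot.ps1 -SnapshotTag "om-20260505-201951"

On restart, WiredTiger crash recovery starts from the last checkpoint in the snapshot and replays the journal captured with it; writes that had not reached the journal at snapshot time are not restored.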
How oplog replay works:
pitr/Start-OplogTailer.ps1 uses the OM Oplog Snapshot API to download .oplogs segment files. These files use OM's internal format: a BSON metadata header followed by a snappy-compressed block of raw oplog BSON. pitr/Invoke-OplogReplay.ps1 automatically deploys pitr/decode_oplogs.py to each agent node via SCP before the replay loop begins. The decoder uses Python's ctypes to call snappy_uncompress from the libsnappy.so system library, strips the header documents, and writes raw oplog BSON to stdout. mongorestore --oplogReplay reads this standard BSON directly. No manual decompression step is required.
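The two primitives mentioned above can be sketched in Python. This is not the repo's decode_oplogs.py, and the exact .oplogs layout is not reproduced here; the sketch only shows (a) calling the snappy C API through ctypes and (b) walking a buffer of concatenated BSON documents by their 4-byte little-endian length prefix. Function names are hypothetical.

```python
import ctypes
import ctypes.util
import struct
from typing import Iterator


def snappy_uncompress(compressed: bytes) -> bytes:
    """Decompress one raw snappy block via libsnappy's C API."""
    lib_name = ctypes.util.find_library("snappy")
    if lib_name is None:
        raise OSError("libsnappy not found; is the snappy package installed?")
    lib = ctypes.CDLL(lib_name)
    size_ptr = ctypes.POINTER(ctypes.c_size_t)
    lib.snappy_uncompressed_length.argtypes = [ctypes.c_char_p, ctypes.c_size_t, size_ptr]
    lib.snappy_uncompressed_length.restype = ctypes.c_int
    lib.snappy_uncompress.argtypes = [ctypes.c_char_p, ctypes.c_size_t,
                                      ctypes.c_char_p, size_ptr]
    lib.snappy_uncompress.restype = ctypes.c_int

    out_len = ctypes.c_size_t(0)
    # Both functions return 0 (SNAPPY_OK) on success.
    if lib.snappy_uncompressed_length(compressed, len(compressed),
                                      ctypes.byref(out_len)) != 0:
        raise ValueError("not a valid snappy block")
    out = ctypes.create_string_buffer(out_len.value)
    if lib.snappy_uncompress(compressed, len(compressed), out,
                             ctypes.byref(out_len)) != 0:
        raise ValueError("snappy_uncompress failed")
    return out.raw[:out_len.value]


def iter_bson_documents(buf: bytes) -> Iterator[bytes]:
    """Yield each BSON document from a buffer of back-to-back documents."""
    offset = 0
    while offset < len(buf):
        if offset + 4 > len(buf):
            raise ValueError("truncated BSON length prefix")
        (size,) = struct.unpack_from("<i", buf, offset)
        # The smallest valid document is 5 bytes: length prefix + terminator.
        if size < 5 or offset + size > len(buf):
            raise ValueError("truncated or corrupt BSON document")
        yield buf[offset:offset + size]
        offset += size
```

A truncated download shows up here as a ValueError from either function, which corresponds to the "segment skipped" case a decoder has to handle.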
# 1. Take snapshot (T1) — writes per-shard oplog anchors into the metadata sidecar
pwsh ./New-MongoSnapshot.ps1
# Note the tag, e.g. om-20260505-201951
# 2. Start the continuous oplog tailer in its own terminal (long-running)
pwsh ./pitr/Start-OplogTailer.ps1 -SnapshotTag "om-20260505-201951"
# 3. (disaster occurs at T2)
# 4. Stop the tailer (writes the T2-mark count file used for the replay range check)
pwsh ./pitr/Stop-OplogTailer.ps1 -SnapshotTag "om-20260505-201951"
# 5. Restore to snapshot (T1)
pwsh ./Restore-MongoSnapshot.ps1 -SnapshotTag "om-20260505-201951"
# 6. Replay oplog segments to advance cluster from T1 → T2
pwsh ./pitr/Invoke-OplogReplay.ps1 -SnapshotTag "om-20260505-201951"
# Or to a specific timestamp:
pwsh ./pitr/Invoke-OplogReplay.ps1 -SnapshotTag "om-20260505-201951" -TargetTimestamp 1778030500

The backup and restore pipeline is topology-agnostic: node lists, FlashArray volume mappings, and shard counts are all discovered at runtime from Ops Manager and SCSI serial numbers. There are no hardcoded node counts in the snapshot, restore, tailer, or replay scripts.
- Register the node and its mongod processes in Ops Manager (or via automationConfig). Once OM shows the node as a member of the cluster, Get-ClusterNodes will pick it up automatically; no script changes are required.
- Update CLUSTER_NODES in .env to include the new node. Get-ClusterNodes queries Ops Manager first, but falls back to .env if OM is unreachable. The restore script uses the same fallback. If .env is not updated, a restore attempted while OM is down will miss the new node's volume.
- Run Initialize-ProtectionGroups.ps1 to add the new volume to the protection group. This is required before taking a snapshot: the snapshot pre-flight verifies that every node's volume is a PG member and will abort before opening any backup cursor if any volume is absent.

      # Idempotent: existing members are untouched.
      pwsh ./Initialize-ProtectionGroups.ps1

- Take a new snapshot.
Order matters — follow these steps in sequence:
- Deregister the node from Ops Manager (remove it from the automationConfig or decommission via the UI). Get-ClusterNodes queries OM first; if the node is still registered, -Prune will not consider its volume orphaned.
- Remove the node from CLUSTER_NODES in .env (for the OM-fallback path).
- Run Initialize-ProtectionGroups.ps1 -Prune to remove the orphaned volume from the PG. Use -WhatIf first to preview:

      pwsh ./Initialize-ProtectionGroups.ps1 -Prune -WhatIf   # preview what would be removed
      pwsh ./Initialize-ProtectionGroups.ps1 -Prune           # requires typing the PG name to confirm

-Prune compares the live OM-discovered node list against all current PG members and removes any volume whose node no longer exists in the cluster.
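The comparison behind -Prune (and behind the membership pre-flight that runs before a snapshot) amounts to a set difference. The Python sketch below illustrates that idea only; the function name and the node-to-volume mapping passed in are hypothetical stand-ins for what the runtime discovery helpers would produce.

```python
from typing import Dict, List, Set, Tuple


def diff_pg_members(node_volumes: Dict[str, str],
                    pg_members: Set[str]) -> Tuple[List[str], List[str]]:
    """Compare discovered node volumes with current protection group members.

    Returns (missing, orphaned):
      missing  - volumes of live nodes that are not yet in the PG
      orphaned - PG members whose node is no longer in the cluster
    """
    expected = set(node_volumes.values())
    missing = sorted(expected - pg_members)
    orphaned = sorted(pg_members - expected)
    return missing, orphaned
```

A non-empty `missing` list is the condition under which a snapshot should not start; a non-empty `orphaned` list is what a prune would remove, which is why the node has to be deregistered from Ops Manager first.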
Note: The build/provisioning scripts (Provision-MongoStorage.ps1, Verify-MongoStorage.ps1, Set-MongoVmStaticIp.ps1) contain hardcoded node-to-array mappings used for initial infrastructure setup. These are not on the backup/restore path but must be updated manually when adding or removing a node.
| Artifact | Path |
|---|---|
| Snapshot metadata | ~/mongo-snapshots/<tag>.json |
| Oplog tailer segments | ~/mongo-oplog-stream/<tag>/<shardId>/segments/<startTs>_<endTs>.oplogs |
| Oplog tailer state | ~/mongo-oplog-stream/<tag>/state.json |
| Oplog tailer gap markers | ~/mongo-oplog-stream/<tag>/gap-<timestamp>.json |
| T2 mark | ~/mongo-oplog-stream/<tag>/t2-mark.json |
| Snapshot logs | ~/mongo-snapshot-logs/ |
| Restore logs | ~/mongo-restore-logs/ |
| Oplog tailer logs | ~/mongo-oplogtailer-logs/ |
| Oplog replay logs | ~/mongo-oplogreplay-logs/ |
| Insert load status | ~/mongo-loadtest-status.json |
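Because segment files are named <startTs>_<endTs>.oplogs, the set of segments relevant to a replay can be derived from the file names alone. The Python sketch below assumes both timestamps are integers in the same unit as -TargetTimestamp and that a gap means one segment starts after the previous one ended. The tailer itself detects gaps via OM's previousEnd field, so this is only a local cross-check with hypothetical helper names.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[int, int, str]  # (start_ts, end_ts, path)


def parse_segment(path: str) -> Segment:
    """Split '<dir>/<startTs>_<endTs>.oplogs' into its two timestamps."""
    name = path.rsplit("/", 1)[-1]
    if not name.endswith(".oplogs"):
        raise ValueError(f"not an oplog segment: {path}")
    start, _, end = name[: -len(".oplogs")].partition("_")
    return int(start), int(end), path


def segments_for_target(paths: Iterable[str], target_ts: int) -> List[Segment]:
    """Segments, in order, that begin at or before the target timestamp."""
    return [seg for seg in sorted(parse_segment(p) for p in paths)
            if seg[0] <= target_ts]


def find_gaps(segments: List[Segment]) -> List[Tuple[int, int]]:
    """Return (previous_end, next_start) for every hole between segments."""
    return [(prev[1], nxt[0])
            for prev, nxt in zip(segments, segments[1:])
            if nxt[0] > prev[1]]
```

If `find_gaps` reports a hole before the target timestamp, replay cannot be continuous up to that point from these files alone.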
| Symptom | When it occurs | Cluster state | What to do |
|---|---|---|---|
| OM snapshot job '...' is already in progress | New snapshot blocked by stuck job | Untouched | Call /fail to release the backup cursor: . ./Config.ps1; Invoke-OmApi -Method POST -Path "group/$GroupId/clusters/$ClusterId/snapshot/<id>/fail". Poll until state = FAILED, then re-run. |
| Snapshot 'om-...' found on N of M arrays | Restore pre-flight: tag missing on one or more arrays | Untouched; aborts before any change | Snapshot was deleted (retention) or a node was added after the snapshot. List available snapshots per array: Get-Pfa2ProtectionGroupSnapshot -Array $FA -ContextName @($ctx) -Filter "source.name='$ProtectionGroupName'". Use a newer tag or take a fresh snapshot. |
| missing member snapshot '...aen-mongo-XX-data' | Restore pre-flight: volume not in PG at snapshot time | Untouched; aborts before any change | Volume was added to cluster after snapshot. Run pwsh ./Initialize-ProtectionGroups.ps1 then take a fresh snapshot. |
| Volume size mismatch | Restore pre-flight | Untouched; aborts before any change | Snapshot was taken from a different-sized volume. Do not restore; investigate which snapshot matches the current live volume. |
| STEP 4 volume overwrite fails | Mid-restore | Dangerous: agents stopped, /data/mongo unmounted, volumes partially overwritten | 1. Retry failed overwrites manually: New-Pfa2Volume -Array $FA -ContextName @($ShortName) -Name $VolumeName -SourceName "$ProtectionGroupName.$SnapshotTag.$VolumeName" -Overwrite $true. 2. Rescan and remount /data/mongo on each node: echo 1 \| sudo tee /sys/block/$DISK/device/rescan > /dev/null; sudo mount /data/mongo. 3. Start agents: sudo systemctl start mongodb-mms-automation-agent. Do NOT start agents until all overwrites are complete; partial restores leave mixed data epochs across shards. |
| New node stays (not reachable/healthy) in rs.status() | After adding a node via automationConfig API | Cluster healthy; new node isolated | firewalld active on new node. Open ports: sudo firewall-cmd --permanent --add-port=2702{0,1,2}/tcp && sudo firewall-cmd --reload |
| Agent loops with Error ensuring directory /data/mongo/logs exists | After pushing automationConfig for a new node | Cluster healthy; new node mongods not starting | Pre-create dirs as root: sudo mkdir -p /data/mongo/{logs,shard0,shard1,shard2} && sudo chown -R mongod:mongod /data/mongo |
| systemctl enable mongodb-mms-automation-agent fails on new node | Agent install on new node | N/A | RPM only ships init.d script. Copy unit file from existing node: scp packer@aen-mongo-01:/etc/systemd/system/mongodb-mms-automation-agent.service /tmp/ then install on new node and sudo systemctl daemon-reload |
| THIRD_PARTY_DISCOVERY_ERROR from snapshot pre-flight | After adding a new node to the cluster | Cluster healthy; no snapshot possible | Third-party backup must be re-activated after any topology change. In OM UI: Servers → select new node → enable Backup and Monitoring → Deploy Changes. Wait for deployment to complete, then retry the snapshot. |
| decode_oplogs.py not found | Invoke-OplogReplay.ps1 cannot locate the helper | Cluster healthy; replay not started | decode_oplogs.py must be in the pitr/ directory with Invoke-OplogReplay.ps1. It is part of the repo; verify with ls pitr/decode_oplogs.py. |
| snappy_uncompress failed from decoder | decode_oplogs.py decompression error on agent node | Cluster healthy; this segment skipped | libsnappy.so version mismatch or corrupt .oplogs file. Verify rpm -q snappy on the agent returns snappy-1.1.x. If the segment file is corrupt (truncated download), re-download: restart the tailer run or retrieve the file manually from the OM agent's oplog directory (/data/mongo/oplog/<shardId>/<port>/<date>/). |
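The "found on N of M arrays" pre-flight from the table can be pictured as a simple membership check per array. The Python sketch below is illustrative only: it assumes protection group snapshots are named <PG name>.<tag> (as described elsewhere in this README) and takes a hypothetical mapping of array name to the snapshot names listed on that array.

```python
from typing import Dict, List, Set


def arrays_missing_snapshot(pg_name: str, tag: str,
                            snapshots_by_array: Dict[str, Set[str]]) -> List[str]:
    """Arrays on which '<pg_name>.<tag>' does not exist."""
    wanted = f"{pg_name}.{tag}"
    return sorted(array for array, names in snapshots_by_array.items()
                  if wanted not in names)


def assert_snapshot_on_all_arrays(pg_name: str, tag: str,
                                  snapshots_by_array: Dict[str, Set[str]]) -> None:
    """Raise before any destructive step if the tag is incomplete."""
    missing = arrays_missing_snapshot(pg_name, tag, snapshots_by_array)
    if missing:
        total = len(snapshots_by_array)
        raise RuntimeError(
            f"Snapshot '{tag}' found on {total - len(missing)} of {total} arrays; "
            f"missing on: {', '.join(missing)}"
        )
```

Running a check like this before stopping agents or unmounting volumes is what keeps the cluster in the "Untouched" state when it fails.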
The scripts are scoped to a single cluster per .env file. All scripts read one $GroupId, $ClusterId, $MongosHost, and $ProtectionGroupName from the active .env and operate against that cluster only.
Workaround (today): Use a separate .env file per cluster and pass -ConfigFile by sourcing the right file before running each script, or maintain multiple project clones with their own .env. Use a unique FA_PROTECTION_GROUP name per cluster (e.g. cluster-a-pg, cluster-b-pg) to prevent snapshot name collisions on shared arrays.
What won't collide today: Snapshot names are <FA_PROTECTION_GROUP>.<tag>, so as long as each cluster has a unique PG name, snapshots on a shared fleet co-exist safely. The ClusterName tag is already written to every snapshot for identification.
What's missing: Restore-MongoSnapshot.ps1 looks up snapshots by PG name and tag only — it does not assert that the snapshot's ClusterName tag matches the target cluster before overwriting volumes. With unique PG names per cluster this is not a practical risk, but the validation is absent.
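For illustration, the absent validation could look roughly like the Python sketch below. It is a hypothetical guard, not something the restore script does today: it assumes the snapshot's tags are available as a plain key/value mapping and that the tag key is literally ClusterName.

```python
from typing import Mapping


def assert_cluster_tag(snapshot_tags: Mapping[str, str], expected_cluster: str) -> None:
    """Refuse to continue when the snapshot belongs to a different cluster."""
    actual = snapshot_tags.get("ClusterName")
    if actual is None:
        raise ValueError("snapshot carries no ClusterName tag; refusing to overwrite volumes")
    if actual != expected_cluster:
        raise ValueError(
            f"snapshot is tagged for cluster '{actual}', "
            f"but the active configuration targets '{expected_cluster}'"
        )
```

Treating a missing tag as a failure is a deliberately conservative choice for a destructive operation; a real implementation might prefer a warning plus -Force.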
Tracked in TODO: parameterized multi-cluster support.
PUT /protection-group-snapshots/tags/batch?context_names=<remote-array> returns HTTP 500 InternalServerError (willRetry=False, "Unidentified internal error") for every remote array in the fleet. The identical request without context_names succeeds on the gateway.
Workaround (current): New-MongoSnapshot.ps1 connects directly to each fleet member in STEP 7.5 and writes post-snapshot tags (mongo:postSnap, mongo:t1ts) without -ContextName. The management FQDN for each member is derived from the gateway endpoint by replacing the short name prefix (e.g. sn1-x90r2-f06-27.puretec.purestorage.com → sn1-x90r2-f07-27.puretec.purestorage.com). All arrays must share the same management domain for this to work.
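The FQDN derivation described in the workaround is a prefix swap on the gateway endpoint. A minimal Python sketch (function name hypothetical) of that rule:

```python
def member_fqdn(gateway_endpoint: str, member_short_name: str) -> str:
    """Replace the gateway's short name with a fleet member's short name.

    Only valid when every array shares the gateway's management domain.
    """
    _, dot, domain = gateway_endpoint.partition(".")
    if not dot or not domain:
        raise ValueError("gateway endpoint must be a fully qualified name")
    return f"{member_short_name}.{domain}"
```

An endpoint given as a bare host name or an IP address cannot be rewritten this way, which is one practical consequence of the shared-domain requirement.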
Tracked: issue #1 · Repro script: tests/Repro-PgSnapshotTagContextNames.ps1
- docs/how-it-works.md: recoverability deep dive covering $backupCursor, WiredTiger crash recovery, PITR gap invariant, Third-Party Backup API reference, node selection, full/incremental snapshots, gotchas
- docs/ops-manager-install-notes.md: Ops Manager 8.0 installation on RHEL 9.5
- Config.ps1: all shared topology and path variables
- docs/MongoDB-Storage-Provisioning-Runbook.md: FlashArray volume provisioning runbook