Performance Improvements and Bug Fixes#662
Open
mikewiebe wants to merge 62 commits into
Open
Conversation
1. Only fetch VPC pair serials for vpc or aa_fex interfaces. 2. Add lookup tables for replaced, similar to what overridden already has. replaced still does linear scans of self.have and self.pb_input for each interface
1. Fix dcnm_interface unit tests based on performance refactoring 2. dcnm_intf_compare_want_and_have(), the nvPairs update handling was indented outside the for nk in nv_keys loop, so only the last nvPair comparison could trigger action = "update"
Changes 1. Targeted State Fetch in get_have() What changed: get_have() previously fetched all VRFs on the fabric and their attachments regardless of playbook scope. It now restricts the attachment query to only the VRFs named in the playbook when state is merged, replaced, or deleted. Why: A fabric with 500 VRFs but a playbook targeting 5 VRFs was fetching attachments for all 500. The targeted path adds an early-return when all wanted VRFs are new (nothing to fetch at all). States overridden and query preserve full-fabric semantics as required. Improvement: O(n) API data reduction proportional to the ratio of fabric VRFs to playbook VRFs. 2. O(1) Lookup Dictionaries for Diff Computation What changed: Added have_create_by_name, have_attach_by_name, and want_create_by_name dictionaries (keyed by vrfName) built once after get_have() and get_want() complete. Replaced all inner for-loop scans and find_dict_in_list_by_key_value(search=self.have_*/self.want_create, ...) lookups in diff_merge_create(), diff_merge_attach(), get_diff_delete(), get_diff_replace(), and get_diff_override(). Why: Every diff method was doing O(n) linear scans through have_create/have_attach/want_create for each item in an outer loop, resulting in overall O(n²) complexity. With 100 VRFs this is 10,000 comparisons instead of 100. Improvement: All diff-phase VRF lookups drop from O(n²) to O(n). 3. Bulk Attachment Fetch in get_diff_query() What changed: get_diff_query() previously made one GET /vrfs/attachments API call per VRF — N calls for N VRFs. Both the config-present and no-config branches now make a single bulk call with all VRF names comma-joined, then index the results by vrfName for O(1) access in the loop. Why: The GET_VRF_ATTACH endpoint already supports comma-separated VRF names. The per-VRF loop was an unnecessary fan-out of API calls. This mirrors the pattern already used in get_have(). Improvement: N API calls reduced to 1 per query run. 4. Early-Exit Fast Path in Delete-Readiness Polling What changed: wait_for_vrf_del_ready() previously entered a polling while-loop for every VRF unconditionally. An initial single-pass poll now checks each VRF's state before entering the retry loop, and immediately skips VRFs already in terminal states (NA, OUT-OF-SYNC, FAILED). Why: In the common case where VRFs are already detached and ready, every VRF was still making an initial API call inside the retry loop with sleep delays. The quick-scan pass avoids the loop overhead entirely for those VRFs. Improvement: Eliminates unnecessary retry-loop overhead for VRFs already in terminal states. 5. Deduplication of VRF Deploy Lists What changed: Applied sorted(set(...)) before joining all_vrfs into the vrfNames undeploy string in get_diff_delete() and get_diff_override(). Why: Defensive guard against duplicate VRF names in the undeploy payload, which could cause controller-side errors or redundant operations in edge cases. Improvement: Prevents potential duplicate undeploy requests. Testing: All 63 existing unit tests pass without modification. No changes to module interface, argument spec, or external behavior.
1. Hybrid Bulk Threshold for get_have()
Constant added: BULK_GET_HAVE_VRF_THRESHOLD = 25 at module level.
New API paths (GET_VRF_TARGETED) added to both v11 and v12 path dicts:
"/rest/top-down/fabrics/{}/vrfs?vrf-names={}"
get_have() restructured: for targeted states (merged/replaced/deleted) with explicit config:
< 25 VRFs in config: calls GET_VRF_TARGETED directly via dcnm_send, fetching only the named VRFs — avoids downloading the full fabric VRF payload.
≥ 25 VRFs in config: falls back to get_vrf_objects() (bulk GET_VRF) + in-memory filter — one bulk call is cheaper than many individual calls for large configs.
Non-targeted states (overridden/query/deleted-without-config): full-fabric sweep unchanged.
2. Batch VRF-Lite Switch Lookups
All three call sites where get_vrf_lite_objects() was called per-switch now use a pre-fetch batch per VRF instead:
get_have() outer loop: before processing the per-switch inner loop for a given VRF, collects all serial numbers for that VRF and makes one GET_VRF_SWITCH call with all serial numbers comma-joined. Results are indexed by serialNumber in lite_by_sn, used for O(1) lookup in the inner loop.
get_diff_query() config branch: before the per-switch inner loop for each want_c, makes one batch call with all switches of that VRF. Results in lite_by_sn dict; per-switch lite_attach is reconstructed from the indexed batch response.
get_diff_query() no-config branch: same pattern — one batch call per fabric VRF with all its switches.
Test updates: Added two combined fixture entries (mock_vrf_attach_get_ext_object_combined_sn1_sn2 and mock_vrf_attach_get_ext_object_combined_sn1_sn4) to the JSON fixtures, and updated the 11 test cases that exercise VRF-Lite playbook configs to use one combined mock instead of two sequential mocks
…ible-dcnm into int_perf_updates
…call reduction) - Split deletion readiness checks into separate attachment and object status methods - Remove duplicated quick-scan logic in VRF deletion flow - Use bulk GET APIs for checking all pending resources in single call per retry - Reduces deletion polling from N×M API calls to M calls (N=resources, M=retries)
…& Action Plugin + Log Module Addition for Networks. Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
… Logging in VRF/Network Action Plugin
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
…ze Pending Deploy in Networks Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
…-dcnm into performance_bulk_n
…-dcnm into performance_bulk_n
Performance Bulk Updates + Handling
Bulk API updates + Fixes
Enable actions on int_perf_updates Remove before merge to develop
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR delivers a broad set of performance improvements and bug fixes across the NDFC/DCNM collection, with a focus on reducing controller API volume, improving large-scale fabric workflows, and fixing several idempotence and deployment edge cases.
The main improvements introduce bulk API usage where supported, batched polling and delete workflows, cached controller lookups, and O(1) in-memory lookup maps for frequently accessed inventory, interface, VRF, network, link, resource manager, and vPC pair data. These changes significantly reduce repeated API calls and repeated list scans during merged, replaced, overridden, deleted, and query operations.
The PR also improves deployment accuracy for VRF and Network resources by tracking affected switch serials and supporting both switch-level and resource-level deploy modes. Network and VRF delete workflows now use batched readiness checks and bulk delete retry handling, making large-scale cleanup operations more reliable.
Several correctness fixes are included as well, including improved TOR attachment handling for networks, secondary gateway handling for MSD workflows, VRF VLAN and BGP password idempotence fixes, Resource Manager scope comparison fixes, interface native VLAN and defaulting fixes, and safer vPC pair caching/deploy behavior.
Overall, these changes make the modules faster, more scalable, and more predictable when operating against larger fabrics or repeated idempotent runs.
Changes
dcnm_interfaceIF_WITH_SNO.- Added per-switch/per-interface interface detail caching.
- Prefetches interface policy details for all switches referenced in intent before diffing.
- Splits grouped bulk interface responses into individual cache entries.
- Optimized overridden, deleted, and replaced workflows with lookup maps instead of repeated scans.
- Uses bulk interface modify endpoint when supported, including
207 Multi-Statushandling.- Flattens interface deploy payloads to reduce repeated deploy calls.
- Limits safety deploy checks to
check_deploy=truecases.nvPairskeys.- Fixed native VLAN-related idempotence gaps.
- Fixed stale/default handling for port-channel and vPC member interfaces after parent operations.
- Added TOR uplink awareness so TOR uplink ports are not incorrectly defaulted during overridden/default operations.
dcnm_inventory- Added cached POAP inventory lookup by serial number.
- Reuses existing inventory data for known switches and skips unnecessary reachability checks for idempotent runs.
- Groups create/import payloads by common role/configuration to reduce duplicate discover/import calls.
- Optimized rediscovery, health polling, credential matching, and role assignment using dictionaries/sets.
- Batches NDFC role assignments into a single roles API call.
- Reduced polling windows for all-switch readiness and config-save retries.
- Added fast-path handling for merged reruns with no new switches while still honoring save/deploy flows.
- Updated diff formatting to reflect grouped switch create/import payloads.
dcnm_links- Added bulk link update endpoint support.
- Sends all link modifications through a single bulk PUT when supported.
- Preserves existing per-link update behavior when bulk APIs are unavailable.
- Accepts both
200 OKand207 Multi-Statusfor bulk link updates.- Minor unit test expectation cleanup.
dcnm_networkdeploy_modesupport for switch-level versus resource-level deploy flows.- Added bulk network update support using
/top-down/v2/bulk-update/networkswhere available.- Added switch-level deploy helpers and serial-to-network deploy mapping.
- Added optimized network inventory retrieval using targeted GETs for small requests and bulk GETs for large/overridden requests.
- Added O(1) attachment lookup by network name.
- Added batched attachment readiness checks, detach/deploy handling, and network delete polling.
- Added bulk network delete with retry and partial-failure handling.
- Optimized attachment/deploy payload generation using serial attach/detach maps.
- Fixed merged TOR port handling so new TOR ports are appended without dropping existing TOR attachments.
- Fixed replaced/overridden TOR behavior so TOR ports are removed only when required.
- Improved parsing for mixed leaf/TOR
portNames, residual ports, and TOR-only attachment strings.- Added normalization for one-sided TOR intent on vPC peers.
- Fixed config-only deployment detection so network configuration changes deploy even without attachment deltas.
- Improved named-network not-found handling across NDFC and OneManage response variants.
- Removed direct
/tmp/dcnm.logdebug writes from the action plugin.dcnm_resource_manager- Added bulk resource creation endpoint support.
- Added bulk payload builder for
Fabric,Device,DeviceInterface,DevicePair, andLinkscopes.- Uses bulk resource create when supported and falls back to individual create otherwise.
- Accepts both
200 OKand207 Multi-Statusfor bulk create/delete operations.entityType/scopeTypecomparisons case-insensitive.- Preserves optional
resourceValueandvrfNamein generated bulk payloads.dcnm_vpc_pair- Caches pair info under both peer DB IDs when available.
- Added sync status caching for repeated deploy checks.
- Returns deep copies from cache to prevent mutation side effects.
- Reuses cached pair info across delete/query/override flows.
- Reduces unnecessary recommendation, peer-link, and sync-status API calls.
- Invalidates sync cache around deploy/save retry points so retry checks use fresh controller state.
- Refined deploy decision logic so create/merge paths go directly to deploy payload while no-change existing pairs use sync checks.
dcnm_vrfdeploy_modesupport for switch-level versus resource-level deploy flows.- Added bulk VRF update support using
/top-down/v2/bulk-update/vrfswhere available.- Added switch-level deploy helpers and serial-to-VRF deploy mapping.
- Added O(1) lookup maps for VRF create, attach, and desired state.
- Optimized VRF-lite data retrieval by batching
GET_VRF_SWITCHby VRF and serials.- Filters VRF processing to playbook-targeted VRFs before attachment/switch lookups.
- Added batched VRF attachment delete readiness checks.
- Added bulk VRF delete with retry and partial-failure handling.
enableL3VniNoVlanis enabled or desired VLAN is0.- Fixed BGP password clear handling by skipping key-type comparison when desired password is empty.
- Fixed no-attach deploy accumulation to avoid duplicating VRFs already in deploy diffs.
- Allows
OUT-OF-SYNC,FAILED, andNAterminal states during delete readiness checks.- Rebuilds lookup maps during failure/rollback paths to keep retry state consistent.
- Action plugin now validates wrapped deploy payloads and calls shared parent deploy logic.
Test Coverage
dcnm_interfaceIF_WITH_SNOprefetch/cache behavior across Ethernet, SVI, loopback, port-channel, vPC, multi-interface, bunched-interface, and overridden flows.dcnm_inventorydcnm_linksdcnm_networkdcnm_resource_managerdcnm_vpc_pairdcnm_vrfGET_VRF_SWITCHresponses across multiple switches.NAterminal VRF delete state, bulk delete retry behavior, and delete failure reporting.