@@ -1039,3 +1039,164 @@ Expected Outcomes
10391039- Comprehensive **documentation**, including setup guides and best
10401040 practices.
10411041- A **short tutorial video** demonstrating installation and usage.
1042+
1043+ Persistent & Scheduled Firmware Upgrades
1044+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1045+
1046+ .. image:: ../images/gsoc/ideas/2023/firmware.jpg
1047+
1048+ .. important::
1049+
1050+ Languages and technologies used: **Python**, **Django**, **Celery**,
1051+ **REST API**, **JavaScript**.
1052+
1053+ **Mentors**: *Federico Capoano*, *TBA*.
1054+
1055+ **Project size**: 350 hours.
1056+
1057+ **Difficulty rate**: medium.
1058+
1059+ This project aims to enhance `OpenWISP Firmware Upgrader
1060+ <https://github.com/openwisp/openwisp-firmware-upgrader>`__ with two
1061+ complementary features that improve reliability and operational
1062+ flexibility for mass firmware upgrades: **persistent retries** for offline
1063+ devices (`#379
1064+ <https://github.com/openwisp/openwisp-firmware-upgrader/issues/379>`__)
1065+ and **scheduled execution** for planned maintenance windows (`#380
1066+ <https://github.com/openwisp/openwisp-firmware-upgrader/issues/380>`__).
1067+
1068+ Currently, firmware upgrades in OpenWISP happen immediately via Celery
1069+ tasks. If a device is offline at the moment of upgrade, the task fails and
1070+ requires manual retry. In large deployments, this becomes unmanageable.
1071+ Additionally, network operators need the ability to schedule upgrades
1072+ during low-usage windows without manual intervention at execution time.
1073+
1074+ Expected outcomes
1075+ +++++++++++++++++
1076+
1077+ Introduce support for **persistent mass upgrades** that automatically
1078+ retry for offline devices and **scheduled mass upgrades** that execute at
1079+ a user-defined future time.
1080+
1081+ 1. **Persistent mass upgrades** (`#379
1082+ <https://github.com/openwisp/openwisp-firmware-upgrader/issues/379>`__)
1083+
1084+ Mass upgrade operations should be able to retry indefinitely for
1085+ devices that are offline at the initial execution time.
1086+
1087+ - Add a ``persistent`` boolean field to mass upgrade operations
1088+ (visible in admin and REST API, checked by default, immutable after
1089+ creation).
1090+ - Track retry count and scheduled retry time in the
1091+ ``UpgradeOperation`` model.
1092+ - Implement **device online detection**:
1093+
1094+ - Prefer using the ``health_status_changed`` signal from OpenWISP
1095+ Monitoring (with mocking for testing).
1096+ - Fallback: periodic retries with randomized exponential backoff
1097+ (configurable, max once every 12 hours).
1098+
1099+ - Implement **retry strategy**:
1100+
1101+ - Randomized exponential backoff with indefinite retries.
1102+ - Periodic reminders (default every 2 months) via
1103+ ``generic_notification`` to admins about devices still pending
1104+ upgrade, with links to a filtered list of the pending devices.
1105+ - Continue until an admin cancels or all devices are upgraded.
1106+
1107+ - **Integration with Celery**: Use a new Celery task to "wake up"
1108+ pending upgrades, with randomized delays to prevent system overload.
1109+ - **Failure handling**: Use ``generic_notification`` for failures
1110+ requiring attention (devices offline too long, upgrade errors).
1111+ - **Edge cases**: Handle concurrent signal triggers and ensure only one
1112+ upgrade runs per device. Rollback support is not needed.
1113+
1114+ 2. **Scheduled mass upgrades** (`#380
1115+ <https://github.com/openwisp/openwisp-firmware-upgrader/issues/380>`__)
1116+
1117+ Allow users to schedule mass upgrades for future execution.
1118+
1119+ - **UI**: Add optional datetime scheduling on the mass upgrade
1120+ confirmation page. Default is immediate execution unless a future
1121+ datetime is set.
1121+ - **Validation**: The scheduled datetime must:
1122+
1123+ - Be in the future.
1124+ - Respect a minimum delay (e.g., 10 minutes).
1125+ - Not exceed a maximum horizon (e.g., 6 months).
1126+
1127+ - **Timezone handling**: User input in browser timezone, storage in
1128+ UTC, server timezone clearly indicated in UI.
1129+ - **Status model**: Extend to include ``scheduled`` state with
1130+ transitions: scheduled → running, scheduled → canceled, scheduled →
1131+ failed.
1132+ - **Execution model**: Use a Celery Beat periodic task (every minute) to
1133+ scan and execute due upgrades. **Avoid Celery eta/countdown** for
1134+ reliability with far-future tasks.
1135+ - **Runtime validation**: Re-evaluate devices, permissions, and firmware
1136+ availability at execution time. Cancel with error if all targets
1137+ become invalid.
1138+ - **Conflict prevention**: Prevent creating conflicting mass upgrades
1139+ (scheduled or immediate) when one is already pending.
1140+ - **Notifications**: Send ``generic_notification`` when scheduled
1141+ upgrades start and complete.
1142+
1143+ 3. **Combined features**
1144+
1145+ Scheduled upgrades should also support persistence. A scheduled upgrade
1146+ that starts but has offline devices should continue retrying according
1147+ to the persistence logic.
1148+
1149+ 4. **General requirements**
1150+
1151+ - Operations are editable only while in ``scheduled`` status.
1152+ - Clear exposure of scheduled status and datetime in admin list, detail
1153+ view, and REST API.
1154+ - Full feature parity between Django admin and REST API.
1155+
1156+ 5. **Testing and documentation**
1157+
1158+ - Test coverage **must not decrease** from current levels.
1159+ - **Browser tests** for the scheduling UI and admin interface workflows
1160+ are required.
1161+ - Documentation has to be kept up to date, including:
1162+
1163+ - Usage instructions for persistent and scheduled upgrades.
1164+ - Updated screenshots reflecting UI changes.
1165+ - One short example usage video per feature.
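
The fallback retry timing from item 1 (randomized exponential backoff) can be sketched as follows. This is only an illustration under stated assumptions, not code from OpenWISP Firmware Upgrader: the function name ``next_retry_delay``, the 60 second initial delay, and the "equal jitter" formula are hypothetical choices, and the 12 hour figure is read here as a cap on the delay between retries, which is one possible interpretation of the requirement.

```python
import random

INITIAL_DELAY = 60           # seconds; hypothetical default, not from the project
MAX_DELAY = 12 * 60 * 60     # 12 hours, interpreted here as the cap on the delay


def next_retry_delay(retry_count, initial=INITIAL_DELAY, cap=MAX_DELAY, rng=random):
    """Return the number of seconds to wait before the next retry.

    ``retry_count`` is the number of retries already attempted (0-based).
    """
    # Double the ceiling on every retry, but never exceed the cap.
    # The exponent is clamped so the intermediate integer stays small.
    ceiling = min(cap, initial * 2 ** min(retry_count, 32))
    # "Equal jitter": keep half of the ceiling fixed and randomize the rest,
    # so devices that went offline together do not all retry at once.
    return ceiling / 2 + rng.uniform(0, ceiling / 2)
```

With these assumed defaults the first retry waits between 30 and 60 seconds, and once the cap is reached every retry waits between 6 and 12 hours. A real implementation would presumably store the resulting retry time alongside the retry count mentioned above and make the parameters configurable.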
1166+
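The validation and timezone rules for scheduled upgrades in item 2 can be made concrete with a minimal sketch. It uses only the standard library and assumes the example limits quoted above (10 minutes, and 6 months approximated as 180 days). The function name ``validate_scheduled_at`` is hypothetical, and a Django implementation would more likely raise a form or serializer validation error than ``ValueError``.

```python
from datetime import datetime, timedelta, timezone

MIN_DELAY = timedelta(minutes=10)   # example value from the requirements
MAX_HORIZON = timedelta(days=180)   # "6 months", approximated for this sketch


def validate_scheduled_at(scheduled_at, now=None):
    """Validate a user-supplied datetime and return it normalized to UTC."""
    # A naive datetime cannot be converted reliably, so reject it outright.
    if scheduled_at.tzinfo is None or scheduled_at.utcoffset() is None:
        raise ValueError("scheduled datetime must be timezone-aware")
    if now is None:
        now = datetime.now(timezone.utc)
    # Input arrives in the browser timezone; storage is in UTC.
    scheduled_utc = scheduled_at.astimezone(timezone.utc)
    if scheduled_utc <= now:
        raise ValueError("scheduled datetime must be in the future")
    if scheduled_utc - now < MIN_DELAY:
        raise ValueError("scheduled datetime is too close to the current time")
    if scheduled_utc - now > MAX_HORIZON:
        raise ValueError("scheduled datetime is too far in the future")
    return scheduled_utc
```

Passing ``now`` explicitly keeps the function easy to test without mocking the clock.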
1167+ Prerequisites to work on this project
1168+ +++++++++++++++++++++++++++++++++++++
1169+
1170+ Applicants must demonstrate a solid understanding of:
1171+
1172+ - **Python**, **Django**, and **JavaScript**.
1173+ - REST APIs and background task processing (Celery, Celery Beat).
1174+ - Timezone handling and datetime management.
1175+ - Experience with `OpenWISP Firmware Upgrader
1176+ <https://github.com/openwisp/openwisp-firmware-upgrader>`__ is
1177+ essential. Contributions or resolved issues in this repository are
1178+ considered strong evidence of the required proficiency.
1179+
1180+ Open questions for contributors
1181+ +++++++++++++++++++++++++++++++
1182+
1183+ 1. **Persistence implementation**: What is the optimal database schema for
1184+ tracking persistent upgrade state while maintaining compatibility with
1185+ existing upgrade operation models?
1186+ 2. **Scheduling mechanism**: How exactly should the Celery Beat periodic
1187+ task be configured to reliably detect and execute due scheduled
1188+ upgrades without performance issues?
1189+ 3. **Timezone UX**: What is the best way to handle timezone display and
1190+ input in the admin interface to minimize user confusion?
1191+ 4. **Backoff strategy**: What are the optimal parameters for randomized
1192+ exponential backoff (initial delay, max delay, randomization factor)?
1193+ 5. **Conflict detection**: How should conflicting operations be detected
1194+ and prevented? What defines a "conflict"?
1195+ 6. **Monitoring integration**: How exactly should the
1196+ ``health_status_changed`` signal from OpenWISP Monitoring be integrated
1197+ for optimal online detection?
1198+ 7. **Notification frequency**: What are the optimal default periods for
1199+ reminder notifications about pending persistent upgrades?
1200+ 8. **Edge case handling**: How should edge cases be handled, such as
1201+ devices that are offline for months, or mass upgrades with very large
1202+ device counts?
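
As a possible starting point for question 2, the logic that a periodic scan would run on each tick can be sketched in plain Python. Everything here is an assumption made for illustration: the ``Operation`` dataclass stands in for whatever model ends up holding the scheduled state, and ``transition`` and ``start_due`` are hypothetical helpers. Only the ``scheduled`` status and its three outgoing transitions come from the requirements above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Transitions out of "scheduled" as listed in the requirements.
ALLOWED_TRANSITIONS = {"scheduled": {"running", "canceled", "failed"}}


@dataclass
class Operation:
    """Hypothetical in-memory stand-in for a scheduled mass upgrade."""
    id: int
    status: str
    scheduled_at: datetime  # stored in UTC


def transition(op, new_status):
    """Change the status of ``op``, rejecting transitions that are not allowed."""
    if new_status not in ALLOWED_TRANSITIONS.get(op.status, set()):
        raise ValueError(f"invalid transition: {op.status} -> {new_status}")
    op.status = new_status


def start_due(operations, now):
    """Mark every due scheduled operation as running and return their ids."""
    started = []
    for op in operations:
        # Only operations still in "scheduled" are considered, so running the
        # scan twice for the same minute does not start anything twice.
        if op.status == "scheduled" and op.scheduled_at <= now:
            transition(op, "running")
            started.append(op.id)
    return started
```

A database-backed version would need to make the status change atomic so that two overlapping scans cannot start the same operation. How best to achieve that, and how to keep the query cheap on large tables, remains part of the open question.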