Description:
When the supporter lookup fails in the custom flow for org 310, the server currently halts confirmation email processing. This caused an incident in which most confirmation emails were not sent.
Background:
Org 310 runs a customized flow that performs a lookup to check if a supporter is already subscribed before sending the confirmation email (saving their action?).
If the supporter is already subscribed, they get a one-button email; if not, they get the normal two-button email (confirm action + opt-in vs. confirm action + opt-out).
In both cases, an email should be sent.
The lookup is a microservice running locally that checks in the CRM whether the supporter is already a member and "translates" between the API response and what proca expects. It has a number of extra features and a local cache to make the lookup faster.
The CRM API went down, which broke the lookup microservice, which in turn caused proca to fail, in some cases at least, before reaching the confirmation email step. Roughly 1400 actions were delayed or left unconfirmed as a result. Actions were saved as expected, but supporters had no way to confirm them because no emails were sent.
Expected behavior:
If the lookup request fails, the system should default to two-button emails and continue sending the confirmation email.
A lookup failure can be:
- a connection error
- no data returned
- an HTTP error (40x, 50x) or any status other than 200
- wrong data returned
- the service taking too long to reply
- any of the other million ways a system can fail
In all of these cases, proca should send an email (the default two-button email).
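The fallback rule above can be sketched as follows. This is only an illustration in Python, not proca's actual code: the template names, the `choose_template` function, and the `subscribed` field in the lookup response are all hypothetical, and the `lookup` callable is assumed to enforce its own timeout and raise on connection errors, timeouts, and non-200 responses.

```python
from typing import Any, Callable

# Hypothetical template identifiers, used only for this sketch.
ONE_BUTTON = "one_button"
TWO_BUTTON = "two_button"


def choose_template(lookup: Callable[[str], Any], email: str) -> str:
    """Pick the confirmation email template; never raise.

    Any failure of the lookup falls back to the default two-button email,
    so the confirmation email is always sent.
    """
    try:
        result = lookup(email)
    except Exception:
        # Connection error, timeout, HTTP error, or anything else.
        return TWO_BUTTON

    # No data or wrong data: the response shape here is an assumption.
    if not isinstance(result, dict):
        return TWO_BUTTON
    subscribed = result.get("subscribed")
    if not isinstance(subscribed, bool):
        return TWO_BUTTON

    return ONE_BUTTON if subscribed else TWO_BUTTON
```

The key design choice is that the lookup result only ever upgrades the email to the one-button variant; every other outcome, including unexpected exceptions, ends in the two-button default.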
Integration with Prometheus:
We should send events (lookup of an existing email, lookup of a non-existing email, error) to make the lookup easier to monitor.
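As a rough sketch of what these three events could look like as a single counter with an `outcome` label, here is a standard-library-only Python example that renders the Prometheus text exposition format. The metric name `proca_supporter_lookup_total`, the label values, and both function names are assumptions made for illustration; a real integration would more likely use an existing Prometheus client library.

```python
from collections import Counter

# Hypothetical label values for the three events named above.
OUTCOMES = ("existing", "non_existing", "error")

_counts: Counter = Counter()


def record_lookup(outcome: str) -> None:
    """Count one lookup; unknown outcomes are counted as errors."""
    if outcome not in OUTCOMES:
        outcome = "error"
    _counts[outcome] += 1


def render_metrics() -> str:
    """Render the counters in the Prometheus text exposition format."""
    lines = ["# TYPE proca_supporter_lookup_total counter"]
    for outcome in OUTCOMES:
        lines.append(
            f'proca_supporter_lookup_total{{outcome="{outcome}"}} {_counts[outcome]}'
        )
    return "\n".join(lines) + "\n"
```

With one labelled counter, an alert on the rate of `outcome="error"` relative to the total would surface an outage like the one described above.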