Skip to content

feat: Endpoint timeout middleware#120

Closed
ahmadbky wants to merge 5 commits intomasterfrom
feat-mw-timeout
Closed

feat: Endpoint timeout middleware#120
ahmadbky wants to merge 5 commits intomasterfrom
feat-mw-timeout

Conversation

@ahmadbky
Copy link
Copy Markdown
Member

@ahmadbky ahmadbky commented Mar 6, 2026

No description provided.

This new GraphQL node returns the trending event editions, meaning the
editions where players set the most records in the last days, with a
certain limit.

Refs: #102.
* Fix the default order of the `players` and `maps` GraphQL nodes, which
  is based on the score of the players/maps, but was sorting them in an
  ascending order when the "order" input was set to "ASCENDING", and
  vice-versa when set to "DESCENDING". This seems logical, but it
  implies that the default sort is by score in the ascending order,
  which leads to a sort by rank in a descending order, which is not
  expected, as we want the default sort to be by rank in an ascending
  order.
* Fix column selection mismatch between the `maps` and `players` table
  for the fetch of the `maps()` GraphQL node.
Previously, the `lastDays` argument of the `trendingEventEditions`
GraphQL node had a default value of 7. But it is better to not have a
default value at all, and let the user specify it, with a max of 30
days.
* Add the middleware for the request timeout. Make it configurable with
  custom durations and handlers in case of timeout.
* Add configuration environment variables for the timeout used by the
  middleware for the endpoints, and another for the URL of the Discord
  webhook.
* Add unit tests.
* Wrap this middleware for the main server before wrapping the
  `TracingLogger`.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a slow-request timeout notification middleware to the Actix-based game_api, while also introducing several unrelated GraphQL API changes (new trending query + ranking/leaderboard adjustments).

Changes:

  • Add RequestTimeoutNotifier Actix middleware plus associated env configuration for timeout thresholds.
  • Tighten session key handling/parsing and update CI to provide the session key during tests.
  • Extend GraphQL API with trending_event_editions, adjust ranking query selection/sorting, and add a limit argument for mappack leaderboard.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
crates/graphql-api/src/objects/root.rs Adds trending_event_editions; adjusts select and sort behavior in cursor-paginated ranking queries.
crates/graphql-api/src/objects/mappack.rs Adds limit parameter to leaderboard Redis ZRANGE.
crates/game_api/src/main.rs Wires the new slow-request timeout notifier middleware into the Actix app.
crates/game_api/src/env.rs Adds request timeout env var; changes session key handling to parse into cookie::Key.
crates/game_api/src/configure/slow_req_mw.rs New middleware implementation + unit tests; adds tracing/webhook timeout handlers.
crates/game_api/src/configure.rs Exposes the new slow_req_mw module.
.github/workflows/ci.yml Provides RECORDS_API_SESSION_KEY to CI tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +6 to +15
use std::future::{Ready, ready};
use std::time::Duration;
use tokio::time::sleep;

#[derive(Clone)]
pub struct WebhookTimeoutHandler;

impl TimeoutHandler for WebhookTimeoutHandler {
fn on_timeout(&self, _: &TimeoutInfo) {
// TODO: webhook handler
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WebhookTimeoutHandler::on_timeout is currently a no-op (TODO), but it is already wired into the middleware chain in main.rs and there is an env var (WEBHOOK_REQUEST_TIMEOUT) suggesting it should notify. Either implement the webhook behavior (and read the URL from env) or remove this handler from the chain until it’s functional to avoid silently dropping timeout alerts.

Suggested change
use std::future::{Ready, ready};
use std::time::Duration;
use tokio::time::sleep;
#[derive(Clone)]
pub struct WebhookTimeoutHandler;
impl TimeoutHandler for WebhookTimeoutHandler {
fn on_timeout(&self, _: &TimeoutInfo) {
// TODO: webhook handler
use std::env;
use std::future::{Ready, ready};
use std::time::Duration;
use tokio::time::sleep;
use reqwest::Client;
use serde_json::json;
#[derive(Clone)]
pub struct WebhookTimeoutHandler;
impl TimeoutHandler for WebhookTimeoutHandler {
fn on_timeout(&self, info: &TimeoutInfo) {
// Read webhook URL from environment; if not set or empty, do nothing.
let webhook_url = match env::var("WEBHOOK_REQUEST_TIMEOUT") {
Ok(url) if !url.is_empty() => url,
_ => return,
};
// Clone fields needed inside the async task.
let endpoint_path = info.endpoint_path.clone();
let timeout_ms = info.timeout.as_millis();
// Spawn an async task to send the webhook without blocking the caller.
tokio::spawn(async move {
let client = Client::new();
let payload = json!({
"endpoint_path": endpoint_path,
"timeout_ms": timeout_ms,
});
if let Err(err) = client.post(webhook_url).json(&payload).send().await {
tracing::error!(error = %err, "Failed to send timeout webhook");
}
});

Copilot uses AI. Check for mistakes.
Comment on lines +159 to +167
async fn slow_req_mw_catch() {
static TIMEOUT_CATCHED: OnceLock<()> = OnceLock::new();

#[derive(Clone)]
struct OnTimeout;
impl TimeoutHandler for OnTimeout {
fn on_timeout(&self, _: &super::TimeoutInfo) {
let _ = TIMEOUT_CATCHED.set(());
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test flag is named TIMEOUT_CATCHED, which is grammatically incorrect and makes the intent harder to read in assertions. Consider renaming to TIMEOUT_CAUGHT (or TIMEOUT_TRIGGERED) to improve clarity.

Copilot uses AI. Check for mistakes.
use records_lib::{DbEnv, LibEnv};

fn parse_session_key(input: &str) -> Result<Key, Box<dyn Error>> {
Key::try_from(input.as_bytes()).map_err(From::from)
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_session_key passes the raw string bytes directly into Key::try_from without trimming. When sess_key is loaded via file_read(), the file content will commonly include a trailing newline, which will change the byte length and can cause Key::try_from to fail unexpectedly. Trim whitespace/newlines (e.g., input.trim()) before converting to bytes.

Suggested change
Key::try_from(input.as_bytes()).map_err(From::from)
Key::try_from(input.trim().as_bytes()).map_err(From::from)

Copilot uses AI. Check for mistakes.
Comment on lines +151 to +154
or_default_val(|| Duration::from_secs(10)),
],
description: "The timeout for a request to be considered as running for too long",
default_val_fmt: "10s",
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REQUEST_TIMEOUT_MS is parsed as milliseconds, but the displayed default is 10s and the default value is set via Duration::from_secs(10). This is confusing for operators because the env var name implies milliseconds. Either rename the variable (e.g. REQUEST_TIMEOUT) or change the default/display to milliseconds (e.g. 10000ms) so units are consistent.

Suggested change
or_default_val(|| Duration::from_secs(10)),
],
description: "The timeout for a request to be considered as running for too long",
default_val_fmt: "10s",
or_default_val(|| Duration::from_millis(10_000)),
],
description: "The timeout for a request to be considered as running for too long",
default_val_fmt: "10000ms",

Copilot uses AI. Check for mistakes.
Comment on lines 89 to 98
App::new()
.wrap(cors)
.wrap(middleware::from_fn(configure::mask_internal_errors))
.wrap(middleware::from_fn(configure::fit_request_id))
.wrap(configure::slow_req_mw::RequestTimeoutNotifier::new(
game_api_lib::env().request_timeout.get(),
TracingTimeoutHandler.chain_with(WebhookTimeoutHandler),
))
.wrap(TracingLogger::<configure::RootSpanBuilder>::new())
.wrap(
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RequestTimeoutNotifier is wrapped before TracingLogger and SessionMiddleware, so the timeout measurement excludes time spent in those outer middlewares (Actix runs the last wrap() as the outermost layer). If the goal is to warn on end-to-end request latency, consider registering this middleware last (outermost) so it covers the full pipeline.

Copilot uses AI. Check for mistakes.
Comment on lines +149 to 158
limit: Option<isize>,
) -> GqlResult<Vec<MappackPlayer<'a>>> {
let db = ctx.data_unchecked::<Database>();
let mut redis_conn = db.redis_pool.get().await?;

let limit = limit.map(|l| l.saturating_sub(1)).unwrap_or(-1);

let leaderboard: Vec<u32> = redis_conn
.zrange(mappack_lb_key(AnyMappackId::Id(&self.mappack_id)), 0, -1)
.zrange(mappack_lb_key(AnyMappackId::Id(&self.mappack_id)), 0, limit)
.await?;
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leaderboard’s limit handling turns 0 into -1 via saturating_sub(1), which makes Redis return the full leaderboard instead of an empty result. Negative limits will also produce surprising ranges. Consider validating limit (e.g. require limit > 0) and translating it to an inclusive stop index only for positive values.

Copilot uses AI. Check for mistakes.
Comment on lines +264 to +269
async fn trending_event_editions(
&self,
ctx: &async_graphql::Context<'_>,
last_days: Option<u8>,
limit: Option<u64>,
) -> GqlResult<Vec<EventEdition<'_>>> {
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is titled about an endpoint timeout middleware, but it also introduces unrelated GraphQL/API behavior changes here (new trending_event_editions query and ranking sort logic changes later in the file). Consider splitting these into separate PRs (or updating the PR title/description) to keep review scope and release notes accurate.

Copilot uses AI. Check for mistakes.
tracing::warn!(
"Request to {} took more than {}ms",
info.endpoint_path,
info.timeout.as_micros()
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TracingTimeoutHandler logs a "{}ms" suffix but uses info.timeout.as_micros(), which reports microseconds. This will produce incorrect values in logs; use as_millis() (or change the message/unit to µs) to keep units consistent with the configured timeout.

Suggested change
info.timeout.as_micros()
info.timeout.as_millis()

Copilot uses AI. Check for mistakes.
@ahmadbky ahmadbky closed this Mar 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants