
Repflow Lambda Case Interview Instructions

Context

This exercise focuses on the AWS Lambda function in lambda/lambda_function.py. It processes inbound emails delivered via Amazon SES and S3, classifies and parses potential sponsorship opportunities with AI agents, persists conversation data through a backend API (lambda/api_client.py), conditionally creates deals, and may send a reply via SES.

Your goal is to demonstrate your ability to:

  • Read and understand unfamiliar code quickly.
  • Infer the business use case and data flow.
  • Identify design patterns and pitfalls.
  • Propose and justify fixes and improvements.
  • Design and write tests using mocks/doubles for external systems.

Timebox your work as appropriate (suggested total: 2–3 hours). Prioritize clarity and judgment.


Folder Contents You’ll Use

  • lambda/lambda_function.py — Main Lambda logic and AI orchestration.
  • lambda/api_client.py — Async client for backend API calls (aiohttp).
  • lambda/package.py — Packaging helper for Lambda (reference only, do not run).
  • lambda/database/ — Data models (reference only; Lambda uses API, not DB directly).

You do not need the rest of the repository to complete this exercise.


Environment Assumptions (Do Not Call Real Services)

  • You must NOT make real AWS or HTTP calls; use mocks for every external dependency.
  • Expected environment variables:
    • API_BASE_URL
    • API_AUTH_TOKEN (required by create_api_client() in api_client.py)
    • S3_BUCKET
    • SES_REGION
    • OPENAI_API_KEY (agents are mocked; no real calls)
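
For tests, it helps to validate this environment up front. A minimal, hypothetical helper (the variable names match the list above; the validation logic itself is an assumption, not code from the repo):

```python
import os

# Variable names taken from the exercise's environment assumptions.
REQUIRED_ENV_VARS = (
    "API_BASE_URL",
    "API_AUTH_TOKEN",
    "S3_BUCKET",
    "SES_REGION",
    "OPENAI_API_KEY",
)

def validate_env(env=os.environ):
    """Return the required variables as a dict, raising if any are missing or empty."""
    missing = [name for name in REQUIRED_ENV_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_ENV_VARS}
```

In tests, pass a plain dict (or use pytest's monkeypatch) instead of touching the real os.environ.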

High-Level Flow (What the Lambda Does)

Key entry points and helpers in lambda/lambda_function.py:

  • lambda_handler(event, context) → async_lambda_handler(event, context)
  • Parse event: parse_ses_event() → builds EmailEvent (message_id, thread_id, sender/recipient, subject)
  • Fetch email body: fetch_email_from_s3(bucket, key)
  • Early exits/gates:
    • Already processed: was_email_processed() / mark_email_processed()
    • Blacklist: is_blacklisted()
    • No-reply detection via regex
  • Store inbound message: store_message() (SimpleMessage via API)
  • Build contextual conversation history: get_conversation_history() + format_conversation_history()
  • Run FSM for deal + conversation: run_deal_conversation_fsm()
    • Agents: deal_filter_agent, deal_parser_agent, comprehensive_preference_agent, qa_polisher_agent
    • Qualification logic: check_qualification_criteria()
    • Possible deal creation: create_deal_from_qualified_lead() and APIClient.create_conversation_for_deal()
  • Conditional reply via SES: send_ses_reply()

Also present for reference: process_multi_agent_qualification(), a legacy pipeline-style flow that is not the primary path.
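
The steps above can be sketched as a skeleton. This is illustrative control flow only: function names follow the list above, but the signatures are simplified, the helpers are injected as stubs, and the real handler in lambda_function.py may order things differently:

```python
import asyncio

async def async_lambda_handler_sketch(event, deps):
    """Illustrative flow only; `deps` bundles stubbed helpers (an assumption, not the real API)."""
    email = deps["parse_ses_event"](event)

    # Early exits/gates.
    if await deps["was_email_processed"](email["message_id"]):
        return {"statusCode": 200, "body": "already processed"}
    if await deps["is_blacklisted"](email["sender"]):
        await deps["mark_email_processed"](email["message_id"])
        return {"statusCode": 200, "body": "blacklisted"}

    # Fetch body, store inbound message, run the FSM.
    body = deps["fetch_email_from_s3"](email["message_id"])
    await deps["store_message"](email, body)
    reply = await deps["run_deal_conversation_fsm"](email, body)

    # Conditional reply, then mark processed.
    if reply:
        deps["send_ses_reply"](email, reply)
    await deps["mark_email_processed"](email["message_id"])
    return {"statusCode": 200, "body": "ok"}
```

Driving a skeleton like this with recording stubs is one way to check ordering of side effects (store before reply, mark-processed last) before writing tests against the real module.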


Your Tasks

1) Code Comprehension and Business Inference

  • Deliver a concise system diagram or bullet flow covering the steps above.
  • Summarize the business use case: qualifying inbound brand sponsorships, gathering missing info, evaluating against creator preferences (User.preferences), creating deals, continuing the conversation, and replying.

Artifacts to submit:

  • A short write-up (Markdown) with the diagram/flow and business summary.

2) Design Patterns and Trade-offs

Identify patterns and discuss trade-offs with specific references to functions/classes:

  • Orchestration/Pipeline: run_deal_conversation_fsm() vs process_multi_agent_qualification()
  • State Machine: LeadDealState, ConversationState (from database.models.lead)
  • DTOs/Validation: LeadQualificationData, Deal, QAPolisherResponse, EmailEvent
  • API Client abstraction: APIClient methods (get_lead_by_thread_id, create_deal, etc.)
  • Utilities: extract_email_address, normalize_subject_for_thread, generate_thread_id, extract_budget_value

Artifacts:

  • Bullet list of patterns and trade-offs with code citations.

3) Pitfalls, Edge Cases, and Risks

Provide a prioritized list with suggested mitigations. Examples to consider (not exhaustive):

  • ID field inconsistency: deals sometimes use id vs leads using _id (see create_deal_from_qualified_lead() vs log lines in get_lead_by_thread_id()).
  • Signature mismatch: qa_polisher_agent forbids signature but send_ses_reply() appends one.
  • Async correctness: sync boto3 inside async handler (fetch_email_from_s3, send_ses_reply) may block.
  • Idempotency placement: mark_email_processed() timing vs side effects.
  • S3 availability: missing or delayed object should be handled gracefully before agent calls.
  • Thread ID design: hashing normalized subject + sorted emails can collide for similar threads.
  • Packaging/runtime mismatch: package.py pins Python 3.13 which may not match AWS supported runtime.
  • Logging/PII: printing full event payload.
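
To make the thread ID concern concrete, here is a sketch of the hashing scheme as described (hash of normalized subject plus sorted participant emails; the real generate_thread_id may differ in normalization details):

```python
import hashlib
import re

def generate_thread_id_sketch(subject: str, email_a: str, email_b: str) -> str:
    """Hash of normalized subject + sorted participant emails (direction/case invariant)."""
    # Strip leading Re:/Fwd:/Fw: prefixes (possibly stacked), then lowercase.
    normalized = re.sub(
        r"^(?:(?:re|fwd?)\s*:\s*)+", "", subject.strip(), flags=re.IGNORECASE
    ).lower()
    participants = "|".join(sorted((email_a.lower(), email_b.lower())))
    return hashlib.sha256(f"{normalized}|{participants}".encode()).hexdigest()[:16]
```

The collision risk follows directly: two genuinely distinct threads between the same two parties with the same normalized subject produce the same ID, so conversations can be merged incorrectly.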

Artifacts:

  • Risk list with code references and your proposed mitigations.

4) Testing Design and Targeted Unit/Integration Tests

Write tests with mocks/doubles. Do not call real services.

  • Pure function unit tests:

    • extract_email_address() (various header formats)
    • normalize_subject_for_thread() (strips Re/Fwd variants, whitespace)
    • generate_thread_id() (direction/case invariance; subject variations)
    • extract_budget_value() (currencies, commas, ranges → lower bound)
    • determine_deal_type_from_budget() (affiliate/hybrid/ugc/flat)
    • is_partnership_type_accepted() (acceptance matrix)
    • format_conversation_history() output markers
  • FSM path tests (mock agents + APIClient):

    • Deal Filter → Sponsorship high confidence vs Other/low confidence
    • Deal Parser populates some fields, leaves others missing
    • Preference Agent returns CONTINUE/NEGOTIATE/REJECT
    • Verify run_deal_conversation_fsm() transitions and side effects, including deal creation when check_qualification_criteria() passes
  • Handler integration-style tests (async):

    • Already processed → early 200
    • Blacklisted/no-reply → mark processed and 200
    • Existing qualified lead with conversation → create_message_for_conversation() path and fallback to store_message()
    • New Sponsorship thread → INFO_GATHERING ask or finalization; ensure store_message() and reply logic; mark_email_processed() called
    • SES send success/failure: outbound message stored only on success
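
As a sketch of the pure-function style, here is one such test using a stand-in implementation built on email.utils.parseaddr. The stub is an assumption about how extract_email_address behaves, not the repo's code; your real tests should import the actual function:

```python
from email.utils import parseaddr

def extract_email_address_stub(header: str) -> str:
    """Stand-in for extract_email_address: pull the bare, lowercased address from a header."""
    return parseaddr(header)[1].lower()

# Table-driven cases covering common From/To header formats.
CASES = [
    ("Brand Rep <rep@brand.com>", "rep@brand.com"),
    ("rep@brand.com", "rep@brand.com"),
    ('"Rep, Brand" <Rep@Brand.com>', "rep@brand.com"),
]

def test_extract_email_address():
    for header, expected in CASES:
        assert extract_email_address_stub(header) == expected
```

With pytest, the same table is usually expressed via @pytest.mark.parametrize so each case reports independently.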

Artifacts:

  • Test plan write-up and code (prefer pytest + pytest-asyncio).
  • Use mocks for:
    • agents.Runner.run(...) returning instances compatible with DealFilterResponse, LeadQualificationData, PreferenceEvaluationDraftResponse, QAPolisherResponse
    • APIClient methods (get_lead_by_thread_id, create_lead, update_lead, create_deal, get_conversation_history, create_conversation_for_deal, create_message_for_conversation, create_simple_message, check_email_processed, mark_email_processed, check_email_blacklisted)
    • boto3 S3/SES calls in fetch_email_from_s3 and send_ses_reply
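
The async mocking pattern might look like this. The APIClient method name comes from the list above; the driver function and the Runner result shape are assumptions for illustration:

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

async def exercise(api_client, runner):
    """Hypothetical driver standing in for the code under test."""
    lead = await api_client.get_lead_by_thread_id("thread-1")
    result = await runner.run("deal_filter_agent", "email body")
    return lead, result.final_output

# AsyncMock makes every method awaitable; set return values per call site.
api = AsyncMock()
api.get_lead_by_thread_id.return_value = {"_id": "lead-1", "status": "NEW"}

runner = AsyncMock()
runner.run.return_value = MagicMock(
    final_output={"is_sponsorship": True, "confidence": 0.9}
)

lead, output = asyncio.run(exercise(api, runner))
```

For boto3, patch the module-level client (or the fetch_email_from_s3/send_ses_reply wrappers) with plain MagicMock, since those calls are synchronous.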

5) Proposed Fixes and Small Refactors

Provide a short design note with specific, scoped changes (cite functions/lines):

  • Align data contracts: standardize id vs _id for deals/leads throughout the Lambda.
  • Signature policy: remove or make signature optional in send_ses_reply() to match qa_polisher_agent rules.
  • Idempotency: move mark_email_processed() earlier and add idempotent inserts on message creation.
  • Resilience: add retries/backoff to APIClient._make_request() for transient errors.
  • Async correctness: consider aioboto3 or run sync boto3 calls in a thread executor.
  • Config hygiene: require env vars; reduce PII logs; fix package.py runtime to match Lambda.
  • Observability: structured logs keyed by thread_id; basic counters for outcomes.
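
Two of the fixes above can be sketched generically: a retry-with-backoff wrapper (for APIClient._make_request-style transient failures) and offloading a blocking boto3 call via asyncio.to_thread. Names and error classes here are illustrative, not from the repo:

```python
import asyncio

async def with_retries(coro_factory, attempts=3, base_delay=0.1,
                       retriable=(ConnectionError, TimeoutError)):
    """Retry an async operation with exponential backoff on transient errors."""
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except retriable:
            if attempt == attempts - 1:
                raise  # exhausted retries; surface the original error
            await asyncio.sleep(base_delay * 2 ** attempt)

async def fetch_email_async(s3_client, bucket, key):
    """Run the blocking boto3 get_object in a worker thread so the event loop stays free."""
    response = await asyncio.to_thread(s3_client.get_object, Bucket=bucket, Key=key)
    return response["Body"].read().decode()
```

aioboto3 is the heavier alternative; to_thread is the smaller, lower-risk change for a handful of S3/SES calls.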

Artifacts:

  • A brief Markdown with your proposed changes and rationale.

Sample SES Event Payload

Use this as a starting point for tests of parse_ses_event() and the handler. Timestamps may vary.

{
  "Records": [{
    "ses": {
      "mail": {
        "messageId": "abcdef123",
        "timestamp": "2025-10-08T21:06:52Z",
        "commonHeaders": {
          "from": ["Brand Rep <rep@brand.com>"],
          "to": ["creator@repflow.app"],
          "subject": "Sponsorship opportunity for Q4"
        }
      },
      "receipt": {}
    }
  }]
}
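
A minimal parser over this payload might look like the following. It is illustrative only: the real parse_ses_event builds an EmailEvent (including thread_id) and will differ in shape and error handling:

```python
def parse_ses_event_sketch(event: dict) -> dict:
    """Pull the fields the Lambda cares about out of an SES event record (sketch)."""
    mail = event["Records"][0]["ses"]["mail"]
    headers = mail["commonHeaders"]
    return {
        "message_id": mail["messageId"],
        "sender": headers["from"][0],
        "recipient": headers["to"][0],
        "subject": headers["subject"],
    }
```

A fixture returning the sample payload above, plus variants with missing Records/commonHeaders, covers both the happy path and the malformed-event edge cases.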

Submission

Please provide:

  • Your Markdown write-ups (Stages 1–3 and 5) in a single SOLUTION.md.
  • Your tests (a tests/ folder is fine) with instructions to run them locally using mocks. Prefer pytest (pytest-asyncio for async), and indicate your Python version.
  • Brief notes on assumptions and any scope you intentionally left out due to time.

We will evaluate:

  • Clarity and correctness of your understanding.
  • Practicality and prioritization of your critiques and fixes.
  • Thoughtful testing strategy with good isolation.
  • Communication quality and technical judgment.