-
Notifications
You must be signed in to change notification settings - Fork 68
CLI code agent: prompt user to retry on transient API errors #3130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -74,6 +74,12 @@ function createBaseEnvironment(): Record< string, string > { | |
| delete env.ANTHROPIC_CUSTOM_HEADERS; | ||
| delete env.CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS; | ||
|
|
||
| // Fail fast on transient API errors so the user-mediated retry prompt can | ||
| // intervene instead of the SDK burning through its default 10 retries. | ||
| if ( ! env.CLAUDE_CODE_MAX_RETRIES ) { | ||
| env.CLAUDE_CODE_MAX_RETRIES = '1'; | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is to prevent the SDK internally re-trying. The default is 10 so before this change, the SDK was re-trying after a timeout without notifying the client code, so if the timeout is 4 mins, it would take 40 minutes to fail. Now, it fails immediately, and the agent handles the error; the user is notified that
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if we should retry at least once without user knowing. |
||
| } | ||
|
|
||
| return env; | ||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -358,8 +358,11 @@ export async function runCommand( options: { | |
| return answers; | ||
| } | ||
|
|
||
| const MAX_RETRY_ATTEMPTS = 4; | ||
|
|
||
| async function runAgentTurn( | ||
| prompt: string | ||
| prompt: string, | ||
| retryAttempt = 0 | ||
| ): Promise< { status: TurnStatus; usage?: { numTurns: number; costUsd?: number } } > { | ||
| const env = await resolveAiEnvironment( currentProvider ); | ||
| ui.beginAgentTurn(); | ||
|
|
@@ -425,14 +428,22 @@ export async function runCommand( options: { | |
| numTurns: result.numTurns, | ||
| }; | ||
| turnStatus = 'max_turns'; | ||
| } else if ( result.interrupted ) { | ||
| turnStatus = 'interrupted'; | ||
| } else { | ||
| turnStatus = result.success ? 'success' : 'error'; | ||
| } | ||
| } | ||
| } | ||
| } catch ( error ) { | ||
| turnStatus = 'error'; | ||
| throw error; | ||
| // In JSON mode there's no interactive retry, so re-throw and let | ||
| // the caller record the error. In interactive mode, fall through | ||
| // so the post-loop retry prompt offers the user a chance to retry. | ||
| if ( isJsonMode ) { | ||
| throw error; | ||
| } | ||
| ui.showError( getErrorMessage( error ) ); | ||
| } finally { | ||
| await persist( ( recorder ) => recorder.recordTurnClosed( turnStatus ) ); | ||
| ui.endAgentTurn(); | ||
|
|
@@ -462,6 +473,33 @@ export async function runCommand( options: { | |
| } | ||
| } | ||
|
|
||
| if ( turnStatus === 'error' && ! isJsonMode ) { | ||
| if ( retryAttempt >= MAX_RETRY_ATTEMPTS ) { | ||
| ui.showInfo( | ||
| __( 'The server has not recovered after multiple attempts. Please try again later.' ) | ||
| ); | ||
| } else { | ||
| const answer = await ui.askUser( [ | ||
| { | ||
| question: __( 'There was a hiccup on the server. Do you want to continue?' ), | ||
| options: [ | ||
| { label: 'Yes', description: __( 'Continue from where you left off' ) }, | ||
| { label: 'No', description: __( 'Stop here' ) }, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if a third-option (other) is needed where the user can give different instructions. |
||
| ], | ||
| }, | ||
| ] ); | ||
| const choice = Object.values( answer )[ 0 ]?.toLowerCase(); | ||
| if ( choice === 'yes' ) { | ||
| ui.showInfo( __( 'Retrying…' ) ); | ||
| // If the SDK threw before emitting any result, sessionId is | ||
| // still whatever it was before this turn; without one to resume, | ||
| // replay the original prompt instead. | ||
| const retryPrompt = sessionId ? 'Continue from where you left off.' : prompt; | ||
| return runAgentTurn( retryPrompt, retryAttempt + 1 ); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| return { | ||
| status: turnStatus, | ||
| usage: maxTurnsResult, | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not strictly related to these changes, but I have identified that if the user presses ESC while in the
Take screenshotstep, the process terminates abruptly, so I added this to prevent that scenario. Now, it is correctly handled.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like there should be a better way to catch these SDK errors other than checking strings but it seems there's no other valid option now?