Add TTS #647

Open
kmcnaught wants to merge 3 commits into TheJoeFin:main from kmcnaught:tts

@kmcnaught

As per #446

I built this for an accessibility use case - reading onscreen text out from games for a dyslexic user. I used Claude Code with human review and steering. Thanks for the good base on which to build!

I'm opening a PR mainly to show what I've done - you'll probably want to rewrite in your preferred way (or ask your agent to!) but I thought it was helpful reference at least.

Notes:

  • I added the functionality as a post-grab action first, but with notifications on as well, I found the notification chime kept interrupting (cancelling) the TTS. I'm not sure if this is solvable another way.
  • I then added an option to use TTS instead of a notification. This gets round the problem for now (and works for my 1-user use case), but it's a bit messy UX.
  • Both post-grab and 'instead of notification' are implemented. I expect you'll only want one or the other.

kmcnaught and others added 3 commits May 22, 2026 11:34
Adds a "Speak text" PostGrabAction that reads OCR output aloud using
Windows.Media.SpeechSynthesis. Speaks the final transformed text after
all other actions run in FullscreenGrab; in GrabFrame, speaks only when
the captured text changes to avoid repeating on every OCR tick.

- ITtsEngine interface for future engine swappability
- WindowsSpeechEngine wraps WinRT SpeechSynthesizer + MediaPlayer
- TtsService queues utterances via Channel<string> so new text waits
  rather than interrupting in-progress speech
- TtsSpeakWordLimit setting (default 100) truncates long captures;
  configurable in General Settings
- PostGrabActionManager: new SpeakText_Click action at order 6.6
- GrabFrame: speaks on text change when action is checked
- Tests: count updated to 6, "Speak text" assertion, fire-and-forget test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds SpeakInsteadOfToast setting: when enabled, Text Grab speaks the
captured text aloud rather than chiming a notification toast. Defaults
to enabled.

Includes:
- Stop-speaking button in GrabFrame to cancel playback mid-sentence
- TTS drain-on-shutdown fix so the queue empties cleanly on exit
- Stop() method on TtsService to flush the queue and cancel speech

Adds a dedicated Voice Output page in Settings containing:
- Voice picker populated from SpeechSynthesizer.AllVoices
- Word limit setting
- Speak-instead-of-notification toggle (moved from General Settings)
- Preview button to hear the selected voice

WindowsSpeechEngine now applies the saved TtsVoiceName before
synthesising. TtsVoiceName setting added (empty = system default).
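
The queueing, word limit, and Stop() behaviour described in the commit messages above could look roughly like the sketch below. It is not the PR's code: `ISpeechEngine`, `QueuedSpeaker`, and `RecordingEngine` are hypothetical names, the fake engine stands in for the WinRT-backed one so the sketch runs anywhere, and collapsing whitespace during truncation is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for the PR's ITtsEngine; the real interface may differ.
public interface ISpeechEngine
{
    Task SpeakAsync(string text, CancellationToken token);
}

// Fake engine that records what it was asked to speak instead of playing audio.
public sealed class RecordingEngine : ISpeechEngine
{
    public List<string> Spoken { get; } = new();

    public async Task SpeakAsync(string text, CancellationToken token)
    {
        await Task.Delay(10, token); // pretend playback takes a moment
        Spoken.Add(text);
    }
}

public sealed class QueuedSpeaker
{
    private readonly Channel<string> _queue = Channel.CreateUnbounded<string>();
    private readonly ISpeechEngine _engine;
    private readonly int _wordLimit;
    private readonly object _gate = new();
    private readonly Task _worker;
    private CancellationTokenSource _current = new();

    public QueuedSpeaker(ISpeechEngine engine, int wordLimit = 100)
    {
        _engine = engine;
        _wordLimit = wordLimit;
        _worker = Task.Run(ConsumeAsync); // single consumer: one utterance at a time
    }

    // Keeps at most wordLimit whitespace-separated words (whitespace is collapsed).
    public static string Truncate(string text, int wordLimit)
    {
        string[] words = text.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Take(wordLimit));
    }

    // New text waits behind whatever is already queued or playing.
    public void Speak(string text)
    {
        if (!string.IsNullOrWhiteSpace(text))
            _queue.Writer.TryWrite(Truncate(text, _wordLimit));
    }

    // Discards pending utterances and cancels the one in progress.
    public void Stop()
    {
        while (_queue.Reader.TryRead(out _)) { }
        lock (_gate) { _current.Cancel(); }
    }

    // Stops accepting text and completes once everything queued has been spoken.
    public Task DrainAsync()
    {
        _queue.Writer.TryComplete();
        return _worker;
    }

    private async Task ConsumeAsync()
    {
        await foreach (string utterance in _queue.Reader.ReadAllAsync())
        {
            var cts = new CancellationTokenSource();
            lock (_gate) { _current = cts; }
            try { await _engine.SpeakAsync(utterance, cts.Token); }
            catch (OperationCanceledException) { /* Stop() cancelled this utterance */ }
        }
    }
}
```

With a single consumer reading the channel, utterances come out in the order they were queued, which is what keeps new text from interrupting speech that is already playing.
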
@kmcnaught
Author

Forgot one note:

  • If you don't have the app staying alive in the system tray, you need to ensure that the TTS completes before shutdown, otherwise it gets cut off almost immediately. I'm not sure if it would be more appropriate to have an overlay element controlling the TTS that stays there until it's finished, to allow you to stop if for example you've accidentally triggered it on a large body of text. hmm.
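
One possible way to handle the shutdown concern above is to wait for playback on exit, but only up to a limit, so an accidental capture of a large body of text cannot hold the process open indefinitely. This is only a sketch of that idea, not what the PR does; `ShutdownHelper` and `WaitForSpeechAsync` are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

public static class ShutdownHelper
{
    // Returns true if speech finished within the timeout, false if we gave up waiting.
    public static async Task<bool> WaitForSpeechAsync(Task speechTask, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(speechTask, Task.Delay(timeout));
        if (finished != speechTask)
            return false;

        await speechTask; // surface any exception thrown during playback
        return true;
    }
}
```

The caller would pass whatever task represents the remaining speech and, when the result is false, cancel playback before exiting.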

Owner

@TheJoeFin left a comment

Overall looking good. There are a few changes that need to be made, and please rebase onto dev; if that has too many conflicts, let me know.

Owner

There should be a radio group that includes the notification option:

After grab always:

  • Nothing
  • Show notification
  • Speak text aloud
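
The three mutually exclusive options above map naturally onto a single enum-valued setting in place of separate booleans. A minimal sketch, assuming hypothetical names (`AfterGrabBehavior`, `AfterGrab.Run`) and with the notification and speech calls passed in as delegates:

```csharp
using System;

// One value per radio button; only one can be selected at a time.
public enum AfterGrabBehavior
{
    Nothing,
    ShowNotification,
    SpeakText
}

public static class AfterGrab
{
    // Runs exactly one of the supplied callbacks, or neither for Nothing.
    public static void Run(
        AfterGrabBehavior behavior,
        string text,
        Action<string> notify,
        Action<string> speak)
    {
        switch (behavior)
        {
            case AfterGrabBehavior.ShowNotification:
                notify(text);
                break;
            case AfterGrabBehavior.SpeakText:
                speak(text);
                break;
            default:
                break; // Nothing
        }
    }
}
```

Because the setting holds a single value, a notification chime and speech can never both fire for the same grab.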

Owner

This is fine for this implementation, but it makes me think there needs to be a more central service for handling 'text actions' like copy, speak, etc.

We don't need to solve this now; just a thought.

Owner

This seems good for an MVP of this feature. The only thing I think might be nice is adding a way to set synthesizer.Options.SpeakingRate since the default setting felt a little slow. Also if a user has all of these speech options set up on their Windows settings do those flow through here or do they need to be set twice?
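
A speaking-rate setting along the lines suggested above would probably want its saved value sanitised before it reaches the synthesizer. The sketch below covers only that settings-side step. `SpeechRateSetting` is a hypothetical helper, and the 0.5 to 6.0 bounds are an assumption that should be checked against the `SpeechSynthesizerOptions.SpeakingRate` documentation. It does not address whether Windows speech settings flow through.

```csharp
using System;

public static class SpeechRateSetting
{
    // Assumed bounds; verify against the SpeakingRate documentation.
    public const double Min = 0.5;
    public const double Max = 6.0;
    public const double Default = 1.0;

    // Returns a rate that is safe to apply, falling back to Default for NaN.
    public static double Sanitize(double saved)
    {
        if (double.IsNaN(saved))
            return Default;
        return Math.Clamp(saved, Min, Max);
    }
}

// On Windows the result would then be applied before synthesising, for example:
//   synthesizer.Options.SpeakingRate = SpeechRateSetting.Sanitize(savedRate);
```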

Comment on lines -60 to +63
- <Value Profile="(Default)">False</Value>
+ <Value Profile="(Default)">True</Value>
Owner

Undo this change; keep the default for Make single line as False.

Comment on lines +24 to +34
<ui:ToggleSwitch
    x:Name="SpeakInsteadOfToastToggle"
    Checked="SpeakInsteadOfToastToggle_Checked"
    Unchecked="SpeakInsteadOfToastToggle_Checked">
    <TextBlock Style="{StaticResource TextBodyNormal}">
        Speak text instead of showing notification
    </TextBlock>
</ui:ToggleSwitch>
<TextBlock Margin="0,4,0,0" Style="{StaticResource TextBodyNormal}">
    Speaks the grabbed text aloud rather than showing a notification.
</TextBlock>
Owner

When this is enabled AND the Post grab 'speak' option is enabled the text is spoken twice.

        break;

    case "SpeakText_Click":
        Singleton<TtsService>.Instance.Speak(text);
Owner

make sure the 'speak always' setting is not enabled to avoid speaking twice
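
The check asked for above could be as small as the following guard. `SpeakGuard` and its parameter names are hypothetical; in the PR the second flag would presumably come from the SpeakInsteadOfToast setting.

```csharp
public static class SpeakGuard
{
    // The post-grab action speaks only when the "speak always" path will not.
    public static bool ShouldSpeakFromPostGrab(bool postGrabSpeakChecked, bool speakAlwaysEnabled)
        => postGrabSpeakChecked && !speakAlwaysEnabled;
}

// Inside the "SpeakText_Click" case, the call would then be wrapped, for example:
//   if (SpeakGuard.ShouldSpeakFromPostGrab(true, speakAlwaysEnabled))
//       Singleton<TtsService>.Instance.Speak(text);
```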

Comment on lines +974 to +986

<Button
    x:Name="StopSpeakingBTN"
    Grid.Column="6"
    Width="30"
    Height="30"
    Margin="6,0,0,0"
    Click="StopSpeakingBTN_Click"
    Style="{StaticResource SymbolButton}"
    ToolTip="Stop speaking"
    Visibility="Collapsed">
    <ui:SymbolIcon Symbol="SpeakerOff24" />
</Button>
Owner

I never actually see this button appear 🤔

@kmcnaught
Author

Thanks for the review, will get on it.

Quick question: do you agree that we should have both the post-grab action and the "speak instead of notification"? If so, I'll just make sure to avoid the duplicate speech. But I wasn't completely sure about the UX.

@TheJoeFin
Owner

@kmcnaught let's just do the post-grab for now and work on bringing TTS to more places later. So to that point, let's pull out everywhere that setting adds code:

  • Settings.settings, Settings.designer.cs, App.config
  • Word Border double click
  • Grab Frame stop button


2 participants