Add TTS #647
Adds a "Speak text" PostGrabAction that reads OCR output aloud using Windows.Media.SpeechSynthesis. Speaks the final transformed text after all other actions run in FullscreenGrab; in GrabFrame, speaks only when the captured text changes to avoid repeating on every OCR tick.

- ITtsEngine interface for future engine swappability
- WindowsSpeechEngine wraps WinRT SpeechSynthesizer + MediaPlayer
- TtsService queues utterances via Channel<string> so new text waits rather than interrupting in-progress speech
- TtsSpeakWordLimit setting (default 100) truncates long captures; configurable in General Settings
- PostGrabActionManager: new SpeakText_Click action at order 6.6
- GrabFrame: speaks on text change when action is checked
- Tests: count updated to 6, "Speak text" assertion, fire-and-forget test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
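The queueing behaviour described above (new text waits rather than interrupting in-progress speech) can be sketched roughly as follows. This is not the PR's TtsService: the class name QueuedSpeaker and the Func<string, Task> delegate standing in for ITtsEngine are assumptions for illustration, since the real signatures are not shown in this thread.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical sketch of a queue-backed speaker; not the PR's TtsService.
public sealed class QueuedSpeaker
{
    private readonly Channel<string> _queue = Channel.CreateUnbounded<string>();
    private readonly Func<string, Task> _speakAsync; // stand-in for an engine call
    private readonly Task _worker;

    public QueuedSpeaker(Func<string, Task> speakAsync)
    {
        _speakAsync = speakAsync;
        _worker = Task.Run(() => DrainAsync());
    }

    // Fire-and-forget: enqueue the text and return immediately.
    public void Speak(string text)
    {
        if (!string.IsNullOrWhiteSpace(text))
            _queue.Writer.TryWrite(text);
    }

    // Stop accepting new text and wait until queued utterances have been spoken.
    public Task CompleteAsync()
    {
        _queue.Writer.TryComplete();
        return _worker;
    }

    private async Task DrainAsync()
    {
        // A single reader speaks one utterance at a time, in arrival order.
        await foreach (string text in _queue.Reader.ReadAllAsync())
            await _speakAsync(text);
    }
}
```

Because only one reader drains the channel, a second Speak call made while the first utterance is playing simply waits its turn.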
Adds SpeakInsteadOfToast setting: when enabled, Text Grab speaks the captured text aloud rather than chiming a notification toast. Defaults to enabled.

Includes:
- Stop-speaking button in GrabFrame to cancel playback mid-sentence
- TTS drain-on-shutdown fix so the queue empties cleanly on exit
- Stop() method on TtsService to flush the queue and cancel speech
Adds a dedicated Voice Output page in Settings containing:
- Voice picker populated from SpeechSynthesizer.AllVoices
- Word limit setting
- Speak-instead-of-notification toggle (moved from General Settings)
- Preview button to hear the selected voice

WindowsSpeechEngine now applies the saved TtsVoiceName before synthesising. TtsVoiceName setting added (empty = system default).
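For reference, a word-limit truncation of the kind the TtsSpeakWordLimit setting implies might look like the sketch below. The helper name SpeechText.TruncateWords, the whitespace-based word splitting, and the treatment of a non-positive limit as "no limit" are all assumptions; the PR's actual truncation code is not shown in this thread.

```csharp
using System;

// Hypothetical helper; the PR's real truncation logic may differ.
public static class SpeechText
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Returns at most wordLimit whitespace-separated words joined by single spaces.
    // Assumption: a limit of zero or less means "do not truncate".
    public static string TruncateWords(string text, int wordLimit)
    {
        if (string.IsNullOrWhiteSpace(text))
            return string.Empty;

        string[] words = text.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries);

        if (wordLimit <= 0 || words.Length <= wordLimit)
            return string.Join(" ", words);

        // Keep only the first wordLimit words.
        return string.Join(" ", words, 0, wordLimit);
    }
}
```

Note that this sketch also collapses line breaks and repeated spaces, which is harmless for speech but would not be appropriate for text that is copied to the clipboard.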
Forgot one note:
TheJoeFin left a comment
Overall this is looking good. There are a few changes that need to be made, and please rebase onto dev. If that causes too many conflicts, let me know.
The notification options should be a radio group, laid out like this:
After grab always:
- Nothing
- Show notification
- Speak text aloud
This is fine for this implementation, but it makes me think there needs to be a more central service for handling 'text actions' like copy, speak, etc.
We don't need to solve that now; it's just a thought.
This seems good for an MVP of this feature. The only thing I think might be nice is adding a way to set synthesizer.Options.SpeakingRate, since the default setting felt a little slow. Also, if a user has all of these speech options set up in their Windows settings, do those flow through here, or do they need to be set twice?
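One way the speaking-rate suggestion could be approached is to validate a saved rate before handing it to the synthesizer. The sketch below is only an illustration: the helper SpeechRate.Clamp and the idea of a saved rate setting are hypothetical, and it assumes the 0.5 to 6.0 range (with 1.0 as the default) documented for SpeechSynthesizerOptions.SpeakingRate. It does not address whether Windows-level speech settings flow through.

```csharp
using System;

// Hypothetical helper for validating a user-chosen speaking rate.
public static class SpeechRate
{
    // Assumed bounds, taken from the documented range for
    // SpeechSynthesizerOptions.SpeakingRate; 1.0 is the normal rate.
    public const double Min = 0.5;
    public const double Max = 6.0;
    public const double Default = 1.0;

    public static double Clamp(double requested)
    {
        // Fall back to the default for values that cannot be clamped meaningfully.
        if (double.IsNaN(requested))
            return Default;

        return Math.Clamp(requested, Min, Max);
    }
}

// On Windows, the clamped value would then be applied before synthesising, e.g.:
//   synthesizer.Options.SpeakingRate = SpeechRate.Clamp(savedRate);
// where savedRate is a hypothetical setting that this PR does not define.
```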
<Value Profile="(Default)">False</Value>
<Value Profile="(Default)">True</Value>
Undo this change; keep the default for Make single line as False.
<ui:ToggleSwitch
    x:Name="SpeakInsteadOfToastToggle"
    Checked="SpeakInsteadOfToastToggle_Checked"
    Unchecked="SpeakInsteadOfToastToggle_Checked">
    <TextBlock Style="{StaticResource TextBodyNormal}">
        Speak text instead of showing notification
    </TextBlock>
</ui:ToggleSwitch>
<TextBlock Margin="0,4,0,0" Style="{StaticResource TextBodyNormal}">
    Speaks the grabbed text aloud rather than showing a notification.
</TextBlock>
When this is enabled AND the post-grab 'speak' option is enabled, the text is spoken twice.
        break;

    case "SpeakText_Click":
        Singleton<TtsService>.Instance.Speak(text);
Make sure the 'speak always' setting is not enabled here, to avoid speaking twice.
<Button
    x:Name="StopSpeakingBTN"
    Grid.Column="6"
    Width="30"
    Height="30"
    Margin="6,0,0,0"
    Click="StopSpeakingBTN_Click"
    Style="{StaticResource SymbolButton}"
    ToolTip="Stop speaking"
    Visibility="Collapsed">
    <ui:SymbolIcon Symbol="SpeakerOff24" />
</Button>
I never actually see this button appear 🤔
Thanks for the review, will get on it. Quick question: do you agree that we should have both the post-grab action and the "speak instead of notification" setting? If so, I'll just make sure to avoid the duplicate speech. But I wasn't completely sure about the UX.
@kmcnaught let's just do the post-grab action for now and work on bringing TTS to more places later. To that end, let's pull out all the code that setting adds.
As per #446
I built this for an accessibility use case - reading onscreen text out from games for a dyslexic user. I used Claude Code with human review and steering. Thanks for the good base on which to build!
I'm opening a PR mainly to show what I've done - you'll probably want to rewrite in your preferred way (or ask your agent to!) but I thought it was helpful reference at least.
Notes: