Commit 895c5dd

Update on-device explainer with changes + languages (#171)
* Update on-device explainer with changes + languages
* Update on-device explainer with changes + languages
---------
Co-authored-by: Evan Liu <[email protected]>
1 parent 4d6e11e commit 895c5dd

1 file changed

+79
-32
lines changed

explainers/on-device-speech-recognition.md

Lines changed: 79 additions & 32 deletions
@@ -11,7 +11,7 @@ The Web Speech API is a powerful browser feature that enables applications to pe
1111
To address these issues, we introduce **on-device speech recognition capabilities** as part of the Web Speech API. This enhancement allows speech recognition to run locally on user devices, providing a faster, more private, and offline-compatible experience.
1212

1313
## Why Use On-Device Speech Recognition?
14-
14+
1515
### 1. **Privacy**
1616
On-device processing ensures that neither raw audio nor transcriptions leave the user's device, enhancing data security and user trust.
1717

@@ -20,6 +20,36 @@ Local processing reduces latency, providing a smoother and faster user experienc
2020

2121
### 3. **Offline Functionality**
2222
Applications can offer speech recognition capabilities even without an active internet connection, increasing their utility in remote or low-connectivity environments.
23+
## New API Members
24+
25+
This enhancement introduces new members to the Web Speech API to support on-device recognition: a dictionary for configuration, an instance attribute, and static methods for managing capabilities.
26+
27+
### `SpeechRecognitionOptions` Dictionary
28+
29+
This dictionary is used to configure speech recognition preferences, both for individual sessions and for querying or installing capabilities.
30+
31+
It includes the following members:
32+
33+
- `langs`: A required sequence of `DOMString` representing BCP-47 language tags (e.g., `['en-US']`).
34+
- `processLocally`: A boolean that, if `true`, instructs the recognition to be performed on-device. If `false` (the default), any available recognition method (cloud-based or on-device) may be used.
35+
36+
37+
```idl
38+
dictionary SpeechRecognitionOptions {
39+
  required sequence<DOMString> langs; // BCP-47 language tags
40+
  boolean processLocally = false; // Instructs the recognition to be performed on-device. If `false` (default), any available recognition method may be used.
41+
};
42+
```
43+
44+
#### Example Usage
45+
```javascript
46+
const recognition = new SpeechRecognition();
47+
recognition.options = {
48+
  langs: ['en-US'],
49+
  processLocally: true
50+
};
51+
recognition.start();
52+
```
2353

2454
## Example use cases
2555
### 1. Company with data residency requirements
@@ -31,57 +61,74 @@ Some websites would only adopt the Web Speech API if it meets strict performance
3161
### 3. Educational website (e.g. khanacademy.org)
3262
Applications that need to function in unreliable or offline network conditions—such as voice-based productivity tools, educational software, or accessibility features—benefit from on-device speech recognition. This enables uninterrupted functionality during flights, remote travel, or in areas with limited connectivity. When on-device recognition is unavailable, a website can choose to hide the UI or gracefully degrade functionality to maintain a coherent user experience.
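
The hide-or-degrade decision described above could be sketched as follows. This is a minimal illustration, not part of the proposal: `voiceUiStateFor` and `decideVoiceUi` are hypothetical helper names, the returned labels (`'enabled'`, `'pending'`, `'hidden'`) are assumptions, and the recognizer is passed in as a parameter so the sketch also runs where `SpeechRecognition` is not defined.

```javascript
// Hypothetical helper: maps an availability status string to a UI decision.
// The status strings are the ones listed in this explainer; the returned
// labels are illustrative only.
function voiceUiStateFor(status) {
  switch (status) {
    case 'available':
      return 'enabled'; // show the microphone button
    case 'downloadable':
    case 'downloading':
      return 'pending'; // show a "setting up" hint instead of the button
    default:
      return 'hidden'; // hide the voice UI entirely
  }
}

// Hypothetical helper: asks the recognizer for on-device availability and
// falls back to hiding the UI when the API is missing or the query fails.
async function decideVoiceUi(recognizer, langs) {
  if (!recognizer || typeof recognizer.available !== 'function') {
    return 'hidden';
  }
  try {
    const status = await recognizer.available({ langs, processLocally: true });
    return voiceUiStateFor(status);
  } catch (err) {
    return 'hidden';
  }
}
```

In a page, the call might look like `decideVoiceUi(globalThis.SpeechRecognition, ['en-US'])`, with the resolved label used to toggle the relevant controls.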
3363

34-
## New Methods
64+
## New API Components
65+
66+
### 1. `static Promise<AvailabilityStatus> SpeechRecognition.available(SpeechRecognitionOptions options)`
67+
This static method checks the availability of speech recognition capabilities matching the provided `SpeechRecognitionOptions`.
3568

36-
### 1. `Promise<boolean> availableOnDevice(DOMString lang)`
37-
This method checks if on-device speech recognition is available for a specific language. Developers can use this to determine whether to enable features that require on-device speech recognition.
69+
The method returns a `Promise` that resolves to an `AvailabilityStatus` enum string:
70+
- `"available"`: Ready to use according to the specified options.
71+
- `"downloadable"`: Not currently available, but resources (e.g., language packs for on-device) can be downloaded.
72+
- `"downloading"`: Resources are currently being downloaded.
73+
- `"unavailable"`: Not available and not downloadable.
3874

3975
#### Example Usage
4076
```javascript
41-
const lang = 'en-US';
42-
SpeechRecognition.availableOnDevice(lang).then((available) => {
43-
  if (available) {
44-
    console.log(`On-device speech recognition is available for ${lang}.`);
77+
// Check availability for on-device English (US)
78+
const options = { langs: ['en-US'], processLocally: true };
79+
80+
SpeechRecognition.available(options).then((status) => {
81+
  console.log(`Speech recognition status for ${options.langs.join(', ')} (on-device): ${status}.`);
82+
  if (status === 'available') {
83+
    console.log('Ready to use on-device speech recognition.');
84+
  } else if (status === 'downloadable') {
85+
    console.log('Resources are downloadable. Call install() if needed.');
86+
  } else if (status === 'downloading') {
87+
    console.log('Resources are currently downloading.');
4588
  } else {
46-
    console.log(`On-device speech recognition is not available for ${lang}.`);
89+
    console.log('Not available for on-device speech recognition.');
4790
  }
4891
});
4992
```
5093

51-
### 2. `Promise<boolean> installOnDevice(DOMString[] lang)`
52-
This method install the resources required for on-device speech recognition for the given BCP-47 language codes. The installation process may download and configure necessary language models.
94+
### 2. `static Promise<boolean> SpeechRecognition.install(SpeechRecognitionOptions options)`
95+
This method installs the resources required for speech recognition matching the provided `SpeechRecognitionOptions`. The installation process may download and configure necessary language models.
5396

5497
#### Example Usage
5598
```javascript
56-
const lang = 'en-US';
57-
SpeechRecognition.installOnDevice([lang]).then((success) => {
99+
// Install on-device resources for English (US)
100+
const options = { langs: ['en-US'], processLocally: true };
101+
SpeechRecognition.install(options).then((success) => {
58102
  if (success) {
59-
    console.log('On-device speech recognition resources installed successfully.');
103+
    console.log(`On-device speech recognition resources for ${options.langs.join(', ')} installed successfully.`);
60104
  } else {
61-
    console.error('Unable to install on-device speech recognition.');
105+
    console.error(`Unable to install on-device speech recognition resources for ${options.langs.join(', ')}. This could be due to unsupported languages or download issues.`);
62106
  }
63107
});
64108
```
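
The two static methods can be combined with the `options` attribute into a single flow. The sketch below is one possible arrangement, not a prescribed pattern: `prepareOnDeviceRecognition` is a hypothetical helper name, the constructor is passed in as a parameter (an assumption made so the sketch can be exercised with a stand-in), and treating `"downloading"` as "not ready yet" is a simplification.

```javascript
// Hypothetical helper: returns a configured recognizer when on-device
// recognition is (or becomes) available for the given languages, else null.
// `SpeechRecognitionImpl` is assumed to expose the static available() and
// install() methods described in this explainer.
async function prepareOnDeviceRecognition(SpeechRecognitionImpl, langs) {
  const options = { langs, processLocally: true };

  let status = await SpeechRecognitionImpl.available(options);

  // Only trigger an install when resources are reported as downloadable.
  if (status === 'downloadable') {
    const installed = await SpeechRecognitionImpl.install(options);
    if (!installed) {
      return null;
    }
    // Re-check rather than assuming the install made everything available.
    status = await SpeechRecognitionImpl.available(options);
  }

  if (status !== 'available') {
    // 'downloading' or 'unavailable': the caller may retry later or fall back.
    return null;
  }

  const recognition = new SpeechRecognitionImpl();
  recognition.options = options;
  return recognition;
}
```

A caller would then start the session only when a recognizer is returned, for example `const r = await prepareOnDeviceRecognition(SpeechRecognition, ['en-US']); if (r) r.start();`.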
65109

66-
## New Attribute
67-
68-
### 1. `mode` attribute in the `SpeechRecognition` interface
69-
The `mode` attribute in the `SpeechRecognition` interface defines how speech recognition should behave when starting a session.
70-
71-
#### `SpeechRecognitionMode` Enum
72-
73-
- **"on-device-preferred"**: Use on-device speech recognition if available. If not, fall back to cloud-based speech recognition.
74-
- **"on-device-only"**: Only use on-device speech recognition. If it's unavailable, throw an error.
75-
76-
#### Example Usage
77-
```javascript
78-
const recognition = new SpeechRecognition();
79-
recognition.mode = "ondevice-only"; // Only use on-device speech recognition.
80-
recognition.start();
81-
```
110+
## Supported languages
111+
The availability of on-device speech recognition languages is user-agent dependent. As an example, Google Chrome supports the following languages for on-device recognition:
112+
* de-DE (German, Germany)
113+
* en-US (English, United States)
114+
* es-ES (Spanish, Spain)
115+
* fr-FR (French, France)
116+
* hi-IN (Hindi, India)
117+
* id-ID (Indonesian, Indonesia)
118+
* it-IT (Italian, Italy)
119+
* ja-JP (Japanese, Japan)
120+
* ko-KR (Korean, South Korea)
121+
* pl-PL (Polish, Poland)
122+
* pt-BR (Portuguese, Brazil)
123+
* ru-RU (Russian, Russia)
124+
* th-TH (Thai, Thailand)
125+
* tr-TR (Turkish, Turkey)
126+
* vi-VN (Vietnamese, Vietnam)
127+
* zh-CN (Chinese, Mandarin, Simplified)
128+
* zh-TW (Chinese, Mandarin, Traditional)
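
Because the list above is user-agent dependent, a site would typically query availability at runtime rather than hard-code it. The following is a rough sketch under stated assumptions: `findAvailableOnDeviceLangs` is a hypothetical helper name, and the recognizer is passed in as a parameter exposing the static `available()` method described earlier.

```javascript
// Hypothetical helper: returns the subset of candidate BCP-47 tags whose
// on-device status is reported as 'available'. Each language is queried
// separately so one unsupported tag does not hide the others.
async function findAvailableOnDeviceLangs(recognizer, candidateLangs) {
  const results = await Promise.all(
    candidateLangs.map(async (lang) => {
      const status = await recognizer.available({
        langs: [lang],
        processLocally: true,
      });
      return { lang, status };
    })
  );
  return results
    .filter((result) => result.status === 'available')
    .map((result) => result.lang);
}
```

The result order follows the order of `candidateLangs`, so a site can pass its languages in preference order and pick the first match.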
82129

83130
## Privacy considerations
84-
To reduce the risk of fingerprinting, user agents must implementing privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).
131+
To reduce the risk of fingerprinting, user agents must implement privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).
85132

86133
## Conclusion
87134
The addition of on-device speech recognition capabilities to the Web Speech API marks a significant step forward in creating more private, performant, and accessible web applications. By leveraging these new methods, developers can enhance user experiences while addressing key concerns around privacy and connectivity.
