Update on-device explainer with changes + languages (#171)
* Update on-device explainer with changes + languages
---------
Co-authored-by: Evan Liu <[email protected]>
explainers/on-device-speech-recognition.md
79 additions & 32 deletions
@@ -11,7 +11,7 @@ The Web Speech API is a powerful browser feature that enables applications to pe
11
11
To address these issues, we introduce **on-device speech recognition capabilities** as part of the Web Speech API. This enhancement allows speech recognition to run locally on user devices, providing a faster, more private, and offline-compatible experience.
12
12
13
13
## Why Use On-Device Speech Recognition?
14
-
14
+
15
15
### 1. **Privacy**
16
16
On-device processing ensures that neither raw audio nor transcriptions leave the user's device, enhancing data security and user trust.
17
17
@@ -20,6 +20,36 @@ Local processing reduces latency, providing a smoother and faster user experienc
20
20
21
21
### 3. **Offline Functionality**
22
22
Applications can offer speech recognition capabilities even without an active internet connection, increasing their utility in remote or low-connectivity environments.
23
+
## New API Members
24
+
25
+
This enhancement introduces new members to the Web Speech API to support on-device recognition: a dictionary for configuration, an instance attribute, and static methods for managing capabilities.
26
+
27
+
### `SpeechRecognitionOptions` Dictionary
28
+
29
+
This dictionary is used to configure speech recognition preferences, both for individual sessions and for querying or installing capabilities.
30
+
31
+
It includes the following members:
32
+
33
+
- `langs`: A required sequence of `DOMString` representing BCP-47 language tags (e.g., `['en-US']`).
- `processLocally`: A boolean that, if `true`, instructs the recognition to be performed on-device. If `false` (the default), any available recognition method (cloud-based or on-device) may be used.
35
+
36
+
37
+
```idl
dictionary SpeechRecognitionOptions {
  required sequence<DOMString> langs; // BCP-47 language tags
  boolean processLocally = false; // Instructs the recognition to be performed on-device. If `false` (default), any available recognition method may be used.
};
```
43
+
44
+
#### Example Usage
45
+
```javascript
const recognition = new SpeechRecognition();
recognition.options = {
  langs: ['en-US'],
  processLocally: true
};
recognition.start();
```
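The example above can be wrapped with feature detection and basic event handling. What follows is a minimal sketch under stated assumptions, not a definitive implementation: `startLocalRecognition`, `SpeechRecognitionCtor`, and `onTranscript` are hypothetical names introduced here so the constructor can be injected rather than assumed to be a global. The `options` attribute is the one proposed in this explainer; `onresult`, `onerror`, and `start()` are existing Web Speech API members.

```javascript
// Hypothetical helper: starts a recognition session that must run on-device.
// SpeechRecognitionCtor is injected so the sketch does not assume a global.
function startLocalRecognition(SpeechRecognitionCtor, langs, onTranscript) {
  if (typeof SpeechRecognitionCtor !== 'function') {
    return null; // API not exposed; the caller can hide its voice UI.
  }
  const recognition = new SpeechRecognitionCtor();
  // `options` is the attribute proposed in this explainer.
  recognition.options = { langs, processLocally: true };
  recognition.onresult = (event) => {
    // Report the top alternative of the most recent result.
    const latest = event.results[event.results.length - 1];
    onTranscript(latest[0].transcript);
  };
  recognition.onerror = (event) => {
    console.error(`Speech recognition error: ${event.error}`);
  };
  recognition.start();
  return recognition;
}
```

In a browser this might be called as `startLocalRecognition(window.SpeechRecognition, ['en-US'], console.log)`. Injecting the constructor also makes the helper easy to exercise with a stub in tests.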
23
53
24
54
## Example use cases
25
55
### 1. Company with data residency requirements
@@ -31,57 +61,74 @@ Some websites would only adopt the Web Speech API if it meets strict performance
31
61
### 3. Educational website (e.g. khanacademy.org)
32
62
Applications that need to function in unreliable or offline network conditions—such as voice-based productivity tools, educational software, or accessibility features—benefit from on-device speech recognition. This enables uninterrupted functionality during flights, remote travel, or in areas with limited connectivity. When on-device recognition is unavailable, a website can choose to hide the UI or gracefully degrade functionality to maintain a coherent user experience.
This method checks if on-device speech recognition is available for a specific language. Developers can use this to determine whether to enable features that require on-device speech recognition.
69
+
The method returns a `Promise` that resolves to an `AvailabilityStatus` enum string:
70
+
- `"available"`: Ready to use according to the specified options.
- `"downloadable"`: Not currently available, but resources (e.g., language packs for on-device) can be downloaded.
- `"downloading"`: Resources are currently being downloaded.
- `"unavailable"`: Not available and not downloadable.
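The four status values map naturally onto UI decisions. The sketch below is one possible mapping, offered as an illustration only: the function name `actionForAvailability` and the action strings it returns are hypothetical and not part of the proposal. It takes the status string as input and makes no assumption about how that string was obtained.

```javascript
// Maps an AvailabilityStatus string (values listed above) to a UI decision.
// The action names are illustrative and not part of the proposal.
function actionForAvailability(status) {
  switch (status) {
    case 'available':
      return 'enable'; // Show the voice feature right away.
    case 'downloadable':
      return 'offer-download'; // Ask before fetching language resources.
    case 'downloading':
      return 'show-progress'; // Resources are already on their way.
    case 'unavailable':
      return 'hide'; // Hide the UI or degrade gracefully.
    default:
      throw new TypeError(`Unknown availability status: ${status}`);
  }
}
```

Throwing on unknown values keeps the mapping honest if the enum gains members later.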
This method install the resources required for on-device speech recognition for the given BCP-47 language codes. The installation process may download and configure necessary language models.
This method installs the resources required for speech recognition matching the provided `SpeechRecognitionOptions`. The installation process may download and configure necessary language models.
console.log(`On-device speech recognition resources for ${options.langs.join(', ')} installed successfully.`);
60
104
} else {
61
-
console.error('Unable to install on-device speech recognition.');
105
+
console.error(`Unable to install on-device speech recognition resources for ${options.langs.join(', ')}. This could be due to unsupported languages or download issues.`);
62
106
}
63
107
});
64
108
```
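The availability check and the installation step can be combined into a single flow. The following is a hedged sketch: `ensureResources`, `checkAvailability`, and `installResources` are hypothetical names. The two function parameters stand in for the static methods described in this explainer and are assumed to resolve to an `AvailabilityStatus` string and to a boolean success flag, respectively.

```javascript
// Hypothetical flow: resolves to true when recognition resources are ready.
// checkAvailability(options) -> Promise<AvailabilityStatus string> (assumed)
// installResources(options)  -> Promise<boolean>                    (assumed)
async function ensureResources(options, checkAvailability, installResources) {
  const status = await checkAvailability(options);
  if (status === 'available') {
    return true;
  }
  if (status === 'downloadable') {
    // Installation may download language models, so this can take a while.
    return Boolean(await installResources(options));
  }
  // 'downloading' or 'unavailable': nothing to trigger from here.
  return false;
}
```

Passing the two operations in as parameters keeps the sketch independent of the final method names and lets it run against stubs.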
65
109
66
-
## New Attribute
67
-
68
-
### 1. `mode` attribute in the `SpeechRecognition` interface
69
-
The `mode` attribute in the `SpeechRecognition` interface defines how speech recognition should behave when starting a session.
70
-
71
-
#### `SpeechRecognitionMode` Enum
72
-
73
-
- **"on-device-preferred"**: Use on-device speech recognition if available. If not, fall back to cloud-based speech recognition.
74
-
- **"on-device-only"**: Only use on-device speech recognition. If it's unavailable, throw an error.
75
-
76
-
#### Example Usage
77
-
```javascript
78
-
const recognition = new SpeechRecognition();
79
-
recognition.mode = "ondevice-only"; // Only use on-device speech recognition.
80
-
recognition.start();
81
-
```
110
+
## Supported languages
111
+
The availability of on-device speech recognition languages is user-agent dependent. As an example, Google Chrome supports the following languages for on-device recognition:
112
+
* de-DE (German, Germany)
* en-US (English, United States)
* es-ES (Spanish, Spain)
* fr-FR (French, France)
* hi-IN (Hindi, India)
* id-ID (Indonesian, Indonesia)
* it-IT (Italian, Italy)
* ja-JP (Japanese, Japan)
* ko-KR (Korean, South Korea)
* pl-PL (Polish, Poland)
* pt-BR (Portuguese, Brazil)
* ru-RU (Russian, Russia)
* th-TH (Thai, Thailand)
* tr-TR (Turkish, Turkey)
* vi-VN (Vietnamese, Vietnam)
* zh-CN (Chinese, Mandarin, Simplified)
* zh-TW (Chinese, Mandarin, Traditional)
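Because support is user-agent dependent, a page may want to map a user's preferred language onto whatever tags are listed. The sketch below shows one simple matching strategy. The names `EXAMPLE_ON_DEVICE_LANGS` and `pickSupportedLang` are hypothetical, and the array copies the example list above only as illustrative data. Falling back to another region of the same language is a simplification that will not suit every application.

```javascript
// Example list copied from this section; actual support varies by user agent
// and version, so treat it as illustrative data only.
const EXAMPLE_ON_DEVICE_LANGS = [
  'de-DE', 'en-US', 'es-ES', 'fr-FR', 'hi-IN', 'id-ID', 'it-IT', 'ja-JP',
  'ko-KR', 'pl-PL', 'pt-BR', 'ru-RU', 'th-TH', 'tr-TR', 'vi-VN', 'zh-CN',
  'zh-TW',
];

// Returns the listed tag matching `requested`: an exact (case-insensitive)
// match first, otherwise the first tag sharing the primary language subtag,
// otherwise null.
function pickSupportedLang(requested, supported = EXAMPLE_ON_DEVICE_LANGS) {
  const wanted = requested.toLowerCase();
  const exact = supported.find((tag) => tag.toLowerCase() === wanted);
  if (exact) {
    return exact;
  }
  const primary = wanted.split('-')[0];
  const sameLanguage = supported.find(
    (tag) => tag.toLowerCase().split('-')[0] === primary
  );
  return sameLanguage ?? null;
}
```

A caller that cannot accept a regional fallback can compare the returned tag with the requested one and treat a mismatch as unsupported.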
82
129
83
130
## Privacy considerations
84
-
To reduce the risk of fingerprinting, user agents must implementing privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).
131
+
To reduce the risk of fingerprinting, user agents must implement privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).
85
132
86
133
## Conclusion
87
134
The addition of on-device speech recognition capabilities to the Web Speech API marks a significant step forward in creating more private, performant, and accessible web applications. By leveraging these new methods, developers can enhance user experiences while addressing key concerns around privacy and connectivity.