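Introduction

With the rapid progress of AI, speech recognition has become an important part of modern applications. In Unity development, adding speech recognition can noticeably improve the user experience, especially in games, VR/AR applications, and interactive exhibits. Unlike cloud-based speech recognition, offline recognition needs no network connection, keeps audio on the device, and avoids network round-trip latency. Vosk is an open-source offline speech recognition library that is lightweight, accurate, and cross-platform, which makes it a good fit for Unity developers.

Vosk is based on deep neural networks combined with hidden Markov models (DNN-HMM). It supports more than 20 languages, including Chinese, English, French, and German, and provides pre-trained models in several sizes for different scenarios. Because it runs fully offline, it is particularly suitable for applications with strict data-privacy requirements.

This article describes in detail how to configure Vosk in a Unity project, laying the groundwork for speech recognition.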

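1. Vosk Overview and Features

Vosk is an open-source offline speech recognition engine built on the Kaldi speech recognition toolkit. Its core features are:

  • Fully offline: no network connection is needed; all processing happens on the device, which protects data security and privacy
  • Multilingual: supports more than 20 languages, including Chinese, English, French, and German, which suits internationalized projects
  • Lightweight and efficient: models are small (the smallest is only 12 MB) and memory use is low, so it runs smoothly even on embedded devices such as the Raspberry Pi
  • High accuracy: based on deep learning; recognition accuracy can exceed 95% in quiet environments
  • Cross-platform: supports Windows, Linux, macOS, Android, and iOS
  • Real-time recognition: provides a streaming API for real-time recognition, with latency kept under 200 ms

Compared with other speech recognition options, Vosk performs well on resource consumption and response time, which makes it well suited to real-time voice interaction in Unity projects.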

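2. Preparing the Environment

Before integrating Vosk, complete the following preparation:

2.1 System and Unity Requirements

  • Unity version: 2019.4 or later is recommended, with .NET 4.x or later. Note that the sample scripts in section 6 use UnityWebRequest.result, which is only available from Unity 2020.2 onward
  • Operating system: a Windows, macOS, or Linux development environment
  • Storage: at least 1 GB of free space (for the model files)

2.2 Downloading the Vosk Files

  1. Vosk Unity plugin: get the Vosk C# bindings from GitHub (https://github.com/alphacep/vosk-unity-asr)
  2. Speech model: download the model for your language from the Vosk model list (https://alphacephei.com/vosk/models):
    • Chinese small model (vosk-model-small-cn-0.22, about 42 MB): suited to mobile devices and embedded systems
    • Chinese standard model (vosk-model-cn-0.22, about 1.3 GB): higher accuracy, suited to servers or high-performance devices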

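3. Configuring the Unity Project

3.1 Creating the Unity Project and Importing Vosk

  1. Create a new Unity project or open an existing one
  2. Create a Plugins folder under Assets and put the Vosk DLL files in it (such as libvosk.dll and vosk.dll)
  3. Import the downloaded Vosk Unity plugin files into the project

3.2 Importing the Model Files

  1. Create a StreamingAssets folder under Assets (if it does not exist yet)
  2. Put the downloaded model (for example vosk-model-small-cn-0.22.zip) into the StreamingAssets folder
    • Note: whether the archive can stay zipped depends on how the model is loaded. The sample script in the vosk-unity-asr plugin unpacks the archive at runtime, so the zip can be used as-is with it. The Vosk Model constructor itself takes the path of an unpacked model folder, which is what the code in section 6 passes, so for that code the archive must be extracted first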
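
If the model does need to be unpacked at runtime (for example when the recognizer is constructed from a folder path, as in section 6), the extraction step can be sketched roughly as follows. This is a minimal sketch, not the plugin's own code: ModelUnpacker and EnsureUnpacked are hypothetical names, and it assumes the archive contains a single top-level folder named after the archive. In Unity, System.IO.Compression.ZipFile may need an extra assembly reference depending on the API compatibility level.

```csharp
using System.IO;
using System.IO.Compression;

public static class ModelUnpacker
{
    // Extracts zipPath into targetRoot (only if not already extracted) and
    // returns the folder expected to hold the model.
    // Assumption: the archive holds one top-level folder named like the zip file.
    public static string EnsureUnpacked(string zipPath, string targetRoot)
    {
        string modelName = Path.GetFileNameWithoutExtension(zipPath);
        string modelDir = Path.Combine(targetRoot, modelName);

        if (!Directory.Exists(modelDir))
        {
            // ExtractToDirectory creates targetRoot when it does not exist yet.
            ZipFile.ExtractToDirectory(zipPath, targetRoot);
        }
        return modelDir;
    }
}
```

In a Unity project, targetRoot would typically be a writable location such as Application.persistentDataPath, and the returned folder is what gets passed to the Vosk Model constructor.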

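3.3 Configuring Player Settings

  1. Open "File > Build Settings > Player Settings"
  2. Under "Configuration", make sure the .NET runtime / API compatibility setting is ".NET 4.x" or later (the exact label varies between Unity versions)
  3. Apply the settings required by the target platform:
    • Windows: no special configuration needed
    • Android: make sure the microphone permission is declared and granted
    • iOS: a microphone usage description must also be configured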

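4. Model Selection and Optimization

4.1 Choosing a Model

Picking a model that matches the application is essential:

Model type     Size          Typical use                          Hardware requirements
Small          40-50 MB      Mobile devices, embedded systems     Low-end CPU, 256 MB+ RAM
Standard       1.3-1.5 GB    Desktop applications, servers        Multi-core CPU, 2 GB+ RAM
Professional   1.5 GB+       Professional speech recognition      High-performance CPU, 8 GB+ RAM

4.2 Performance Tips

  • Audio format: make sure the input audio is 16 kHz, 16-bit, mono, which is the standard input format for Vosk models
  • Preprocessing: apply audio filtering to reduce background noise
  • Resource management: release the recognizer when speech recognition is not needed, to reduce memory use
  • Multithreading: run recognition on a separate thread so that the main thread is not blocked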
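
As an illustration of the audio-format point, Unity delivers microphone samples as floats in the range [-1, 1], while the byte-based recognizer input expects 16-bit little-endian PCM. The conversion can be sketched as below; PcmConverter is a hypothetical helper name, and the sketch assumes the samples are already mono at the model's sample rate.

```csharp
using System;

public static class PcmConverter
{
    // Converts float samples in [-1, 1] to 16-bit little-endian PCM bytes.
    // Out-of-range input is clamped so the cast to short cannot overflow.
    public static byte[] ToPcm16(float[] samples)
    {
        var bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            float clamped = Math.Max(-1f, Math.Min(1f, samples[i]));
            short value = (short)Math.Round(clamped * short.MaxValue);
            bytes[i * 2] = (byte)(value & 0xFF);            // low byte first
            bytes[i * 2 + 1] = (byte)((value >> 8) & 0xFF); // then high byte
        }
        return bytes;
    }
}
```

Clamping matters because a sample slightly above 1.0 would otherwise wrap around to a large negative value and produce an audible click.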

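5. Common Problems and Solutions

The following problems come up frequently when configuring and using Vosk:

  1. Model fails to load
    • Cause: the model path is wrong or the model files are incomplete
    • Fix: check that the model is in the StreamingAssets folder and that the files are complete
  2. Low recognition accuracy
    • Cause: environmental noise or a mismatched audio format
    • Fix: add an audio preprocessing step and make sure the input is 16 kHz, 16-bit, mono
  3. Performance problems
    • Cause: the model is too large or the hardware is underpowered
    • Fix: pick a model size that matches the device, or consider showing a loading screen while the model loads
  4. Platform compatibility problems
    • Cause: library files built for a different platform
    • Fix: make sure the Vosk library files were compiled for the target platform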
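
For the model-loading problem in particular, a quick check of the path before constructing the recognizer gives a clearer message than a native load failure. The following is a minimal sketch under stated assumptions: ModelPathCheck is a hypothetical helper, and it only verifies that the path is a non-empty folder, not that the model contents are valid.

```csharp
using System;
using System.IO;

public static class ModelPathCheck
{
    // Returns null when the folder looks usable, otherwise a short reason.
    public static string Validate(string modelDir)
    {
        if (string.IsNullOrEmpty(modelDir))
            return "model path is empty";
        if (modelDir.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
            return "path points at an archive, not an unpacked folder";
        if (!Directory.Exists(modelDir))
            return "folder does not exist: " + modelDir;
        if (Directory.GetFileSystemEntries(modelDir).Length == 0)
            return "folder is empty: " + modelDir;
        return null;
    }
}
```

A caller would log the returned reason and skip model creation when it is not null.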

6. Code for an AI Chat Built on Vosk

Main controller module

using UnityEngine;
using UnityEngine.Networking;
using UnityEngine.UI;
using TMPro;
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

public class AiChat : MonoBehaviour
{
    [Header("UI Bindings")]
    public TMP_InputField inputField;       // question input box
    public Button submitButton;             // submit button
    [SerializeField] Text answerText;       // answer text box
    [SerializeField] Toggle ttsToggle;      // text-to-speech switch

    [Header("API Settings")]
    [SerializeField] string apiUrl = "<API endpoint URL>";
    [SerializeField] string apiKey = "<API key>";

    [Header("Chat Settings")]
    public string askTag = "";
    // AIChatAssistant (the RAG chat component) is defined elsewhere in the
    // project and is not shown in this article.
    public AIChatAssistant getRAGChat;

    private bool isStreaming = false;
    private Coroutine streamCoroutine;
    private StringBuilder fullResponse = new StringBuilder();

    void Start()
    {
        // Check network reachability first
        if (Application.internetReachability != NetworkReachability.NotReachable)
        {
            // Original binding, kept for reference:
            // submitButton.onClick.AddListener(OnSubmitClicked);

            // The click is currently routed to the RAG chat module instead:
            // the first listener copies the question text over, the second
            // starts the RAG request.
            submitButton.onClick.AddListener(OnSubmitClick);
            submitButton.onClick.AddListener(() => StartCoroutine(getRAGChat.SubmitQuestion()));

            // Initial state
            submitButton.interactable = true;
            answerText.text = "Waiting for a question...";
            // Text-to-speech is on by default
            ttsToggle.isOn = true;
        }
        else
        {
            answerText.text = "Please check the network connection!";
        }
    }

    // Click handler that forwards the question to the RAG chat module
    void OnSubmitClick()
    {
        getRAGChat.questionInput = inputField.text;
    }

    public void OnSubmitClicked()
    {
        if (string.IsNullOrWhiteSpace(inputField.text))
        {
            answerText.text = "<color=#FF0000>Please enter a valid question!</color>";
            return;
        }

        // Stop any request that is still in progress
        if (isStreaming && streamCoroutine != null)
        {
            StopCoroutine(streamCoroutine);
        }

        // Reset state
        isStreaming = true;
        fullResponse.Clear();
        answerText.text = "Thinking...";
        submitButton.interactable = false;

        // Start the request
        streamCoroutine = StartCoroutine(StreamChatCompletion(inputField.text));
    }

    IEnumerator StreamChatCompletion(string userMessage)
    {
        // Build the request payload
        var requestData = new RequestData
        {
            messages = new List<Message>
            {
                new Message { role = "user", content = userMessage + "," + askTag }
            },
            stream = true
        };

        string jsonPayload = JsonUtility.ToJson(requestData);
        byte[] payloadBytes = Encoding.UTF8.GetBytes(jsonPayload);

        using (UnityWebRequest request = new UnityWebRequest(apiUrl, "POST"))
        {
            request.uploadHandler = new UploadHandlerRaw(payloadBytes);
            // DownloadHandlerBuffer collects the whole body, so the SSE lines
            // are parsed after the request completes, not as they arrive.
            request.downloadHandler = new DownloadHandlerBuffer();
            request.SetRequestHeader("Content-Type", "application/json");
            request.SetRequestHeader("Authorization", "Bearer " + apiKey);
            request.disposeDownloadHandlerOnDispose = true;

            yield return request.SendWebRequest();

            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"API Error: {request.error}");
                Debug.LogError($"Response Code: {request.responseCode}");
                Debug.LogError($"Response: {request.downloadHandler.text}");
                answerText.text = $"<color=#FF0000>Request failed: {request.error}</color>";
                FinishRequest();
                yield break;
            }

            string rawResponse = request.downloadHandler.text;
            Debug.Log($"Raw API Response: {rawResponse}");

            if (string.IsNullOrEmpty(rawResponse))
            {
                answerText.text = "<color=#FFA500>The server returned an empty response</color>";
                FinishRequest(); // re-enable the button on this path as well
                yield break;
            }

            bool receivedValidResponse = false;

            foreach (string line in rawResponse.Split('\n'))
            {
                string payload = line.Trim();
                if (payload.Length == 0) continue;

                // Skip SSE event and comment lines
                if (payload.StartsWith("event:") || payload.StartsWith(":")) continue;

                // Strip the SSE "data:" prefix, then check for the end marker
                if (payload.StartsWith("data:")) payload = payload.Substring(5).Trim();
                if (payload == "[DONE]")
                {
                    Debug.Log("Received [DONE] marker");
                    break;
                }

                // Some back ends send the JSON double-encoded as a quoted
                // string; unwrap it only in that case, since unescaping plain
                // JSON would corrupt it.
                if (payload.Length >= 2 && payload.StartsWith("\"") && payload.EndsWith("\""))
                {
                    payload = payload.Substring(1, payload.Length - 2)
                        .Replace("\\\"", "\"")
                        .Replace("\\\\", "\\");
                }

                try
                {
                    var response = JsonUtility.FromJson<StreamResponse>(payload);

                    // Append the delta content, if any
                    if (response != null && response.choices != null && response.choices.Length > 0 &&
                        response.choices[0].delta != null &&
                        !string.IsNullOrEmpty(response.choices[0].delta.content))
                    {
                        fullResponse.Append(response.choices[0].delta.content);
                        answerText.text = fullResponse.ToString();
                        receivedValidResponse = true;
                    }
                }
                catch (Exception e)
                {
                    Debug.LogWarning($"Parse error: {e.Message}\nRaw data: {payload}");
                }

                yield return null; // let the UI refresh between chunks
            }

            FinishRequest();

            if (receivedValidResponse)
            {
                // Read the answer aloud when automatic text-to-speech is on
                if (ttsToggle.isOn) UITTSController.Instance.OnConvertClick();
            }
            else if (rawResponse.Contains("error"))
            {
                // Try to extract the error message
                try
                {
                    var errorResponse = JsonUtility.FromJson<ErrorResponse>(rawResponse);
                    answerText.text = $"<color=#FF0000>API error: {errorResponse.error.message}</color>";
                }
                catch
                {
                    answerText.text = $"<color=#FF0000>Unknown API error: {rawResponse}</color>";
                }
            }
            else
            {
                answerText.text = $"<color=#FFA500>No valid response received. Raw data:\n{rawResponse}</color>";
            }
        }
    }

    // Re-enables the UI once a request has finished, successfully or not.
    void FinishRequest()
    {
        isStreaming = false;
        submitButton.interactable = true;
    }

    // Request data structures
    [System.Serializable]
    private class RequestData
    {
        public List<Message> messages;
        public bool stream;
    }

    [System.Serializable]
    private class Message
    {
        public string role;
        public string content;
    }

    // Response data structures
    [System.Serializable]
    private class StreamResponse
    {
        public string id;
        public string @object;
        public int created;
        public string model;
        public Choice[] choices;
    }

    [System.Serializable]
    private class Choice
    {
        public int index;
        public Delta delta;
        public string finish_reason;
    }

    [System.Serializable]
    private class Delta
    {
        public string content;
    }

    // Error response structures
    [System.Serializable]
    private class ErrorResponse
    {
        public ErrorInfo error;
    }

    [System.Serializable]
    private class ErrorInfo
    {
        public string message;
        public string type;
        public string code;
    }
}
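
The line handling inside StreamChatCompletion can also be isolated into a small, Unity-independent helper, which makes it easier to test. The sketch below is illustrative only: SseLines and ExtractPayloads are hypothetical names, and it assumes the server uses the common "data: {...}" lines terminated by "data: [DONE]".

```csharp
using System;
using System.Collections.Generic;

public static class SseLines
{
    // Returns the payload of every "data:" line in body, in order,
    // stopping at the [DONE] end-of-stream marker.
    public static List<string> ExtractPayloads(string body)
    {
        var payloads = new List<string>();
        foreach (string raw in body.Split('\n'))
        {
            string line = raw.Trim(); // also removes a trailing '\r'
            // Blank lines, "event:" lines and comments are not data lines.
            if (!line.StartsWith("data:", StringComparison.Ordinal)) continue;

            string payload = line.Substring(5).Trim();
            if (payload == "[DONE]") break;
            if (payload.Length > 0) payloads.Add(payload);
        }
        return payloads;
    }
}
```

Each returned payload can then be handed to the JSON parser. Note that the end marker is compared only after the "data:" prefix has been stripped.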

TTS speech synthesis module

AudioManager.cs

using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class AudioManager : MonoBehaviour
{
    private AudioSource audioSource;

    void Awake()
    {
        audioSource = gameObject.AddComponent<AudioSource>();
    }

    // Downloads the MP3 at url and plays it once.
    public IEnumerator DownloadAndPlayAudio(string url)
    {
        Debug.Log($"Starting audio download: {url}");

        // Explicitly request MP3 decoding
        using (UnityWebRequest www = UnityWebRequestMultimedia.GetAudioClip(url, AudioType.MPEG))
        {
            ((DownloadHandlerAudioClip)www.downloadHandler).streamAudio = true;
            ((DownloadHandlerAudioClip)www.downloadHandler).compressed = false;

            // Give up after 15 seconds
            www.timeout = 15;
            var operation = www.SendWebRequest();

            while (!operation.isDone)
            {
                Debug.Log($"Download progress: {www.downloadProgress:P}");
                yield return null;
            }

            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Download failed: {www.error}, response code: {www.responseCode}");
                yield break;
            }

            Debug.Log($"Audio download finished, size: {www.downloadedBytes} bytes");
            AudioClip clip = DownloadHandlerAudioClip.GetContent(www);

            if (clip == null || clip.length == 0)
            {
                Debug.LogError("Audio decoding failed");
                yield break;
            }

            // The clip is downloaded and played exactly once; a second
            // download of the same URL would restart playback.
            audioSource.clip = clip;
            audioSource.Play();
            Debug.Log("Audio playback started");
        }
    }

    public void TogglePause()
    {
        if (audioSource.isPlaying)
        {
            audioSource.Pause();
        }
        else
        {
            audioSource.UnPause();
        }
    }

    public void StopPlayback()
    {
        audioSource.Stop();
    }

    public bool IsPlaying()
    {
        return audioSource.isPlaying;
    }

    // Playback position as a fraction in [0, 1]
    public float GetPlaybackProgress()
    {
        if (audioSource.clip == null || Mathf.Approximately(audioSource.clip.length, 0f))
        {
            return 0f;
        }
        return Mathf.Clamp01(audioSource.time / audioSource.clip.length);
    }
}

BaiduTTSController.cs

using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class BaiduTTSController : MonoBehaviour
{
    // Credentials from the Baidu AI Cloud console. They are hard-coded here
    // for brevity; avoid shipping real secrets inside a client build.
    private const string CLIENT_ID = "<client ID>";
    private const string CLIENT_SECRET = "<client secret>";
    private string accessToken = "";

    // Fetches the access token asynchronously
    public IEnumerator GetAccessToken()
    {
        string url = $"<Baidu AI Cloud token URL>client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}";

        using (UnityWebRequest www = UnityWebRequest.Get(url))
        {
            yield return www.SendWebRequest();

            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Token request failed: {www.error}");
                yield break;
            }

            TokenResponse response = JsonUtility.FromJson<TokenResponse>(www.downloadHandler.text);
            accessToken = response.access_token;
        }
    }

    // Creates a speech synthesis task; callback receives the task id on success
    public IEnumerator CreateTTSTask(string text, System.Action<string> callback)
    {
        string apiUrl = $"<API endpoint URL>access_token={accessToken}";

        CreateTaskRequest requestData = new CreateTaskRequest
        {
            text = text,
            format = "mp3-16k",
            voice = 0,
            lang = "zh",
            speed = 5,
            pitch = 5,
            volume = 5
        };

        using (UnityWebRequest www = new UnityWebRequest(apiUrl, "POST"))
        {
            byte[] bodyRaw = System.Text.Encoding.UTF8.GetBytes(JsonUtility.ToJson(requestData));
            www.uploadHandler = new UploadHandlerRaw(bodyRaw);
            www.downloadHandler = new DownloadHandlerBuffer();
            www.SetRequestHeader("Content-Type", "application/json");

            yield return www.SendWebRequest();

            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Task creation failed: {www.error}");
                yield break;
            }

            TaskCreateResponse response = JsonUtility.FromJson<TaskCreateResponse>(www.downloadHandler.text);
            callback?.Invoke(response.task_id);
        }
    }

    // Queries the task status; callback receives the audio URL once the task
    // has succeeded and the URL is reachable. It is invoked at most once.
    public IEnumerator QueryTaskStatus(string taskId, System.Action<string> callback)
    {
        string apiUrl = $"<API endpoint URL>access_token={accessToken}";

        QueryTaskRequest requestData = new QueryTaskRequest
        {
            task_ids = new string[] { taskId }
        };

        using (UnityWebRequest www = new UnityWebRequest(apiUrl, "POST"))
        {
            byte[] bodyRaw = System.Text.Encoding.UTF8.GetBytes(JsonUtility.ToJson(requestData));
            www.uploadHandler = new UploadHandlerRaw(bodyRaw);
            www.downloadHandler = new DownloadHandlerBuffer();
            www.SetRequestHeader("Content-Type", "application/json");

            yield return www.SendWebRequest();

            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Status query failed: {www.error}");
                yield break;
            }

            TaskQueryResponse response = JsonUtility.FromJson<TaskQueryResponse>(www.downloadHandler.text);
            bool succeeded = response != null && response.tasks_info != null &&
                response.tasks_info.Length > 0 &&
                response.tasks_info[0].task_status == "Success";

            // Not finished yet: the caller is expected to poll again
            if (!succeeded) yield break;

            string audioUrl = response.tasks_info[0].task_result.speech_url;
            Debug.Log($"Audio URL: {audioUrl}");

            // Verify the URL with a HEAD request before handing it over
            using (UnityWebRequest headRequest = UnityWebRequest.Head(audioUrl))
            {
                yield return headRequest.SendWebRequest();
                if (headRequest.result == UnityWebRequest.Result.Success)
                {
                    callback?.Invoke(audioUrl);
                }
                else
                {
                    Debug.LogError($"Audio URL is not reachable: {headRequest.error}");
                }
            }
        }
    }

    // Data models
    [System.Serializable]
    private class TokenResponse
    {
        public string access_token;
    }

    [System.Serializable]
    private class CreateTaskRequest
    {
        public string text;
        public string format;
        public int voice;
        public string lang;
        public int speed;
        public int pitch;
        public int volume;
    }

    [System.Serializable]
    private class TaskCreateResponse
    {
        public string task_id;
    }

    [System.Serializable]
    private class QueryTaskRequest
    {
        public string[] task_ids;
    }

    [System.Serializable]
    private class TaskQueryResponse
    {
        public TaskInfo[] tasks_info;
    }

    [System.Serializable]
    private class TaskInfo
    {
        public string task_status;
        public TaskResult task_result;
    }

    [System.Serializable]
    private class TaskResult
    {
        public string speech_url;
    }
}

UITTSController.cs

using System.Collections;
using TMPro;
using UnityEngine;
using UnityEngine.UI;

public class UITTSController : MonoBehaviour
{
    public static UITTSController Instance { get; private set; }

    [Header("UI Components")]
    public Text inputField;
    public Button convertButton;
    public Text statusText;
    public Slider progressSlider;

    private BaiduTTSController ttsController;
    private AudioManager audioManager;

    private void Awake()
    {
        // Simple singleton: keep the first instance, destroy duplicates
        if (Instance != null && Instance != this)
        {
            Destroy(gameObject);
            return;
        }
        Instance = this;
    }

    void Start()
    {
        ttsController = gameObject.AddComponent<BaiduTTSController>();
        audioManager = gameObject.AddComponent<AudioManager>();

        // Check network reachability
        if (Application.internetReachability != NetworkReachability.NotReachable)
        {
            StartCoroutine(InitializeSystem());
            // Bind the button event
            convertButton.onClick.AddListener(OnConvertClick);
        }
        else
        {
            statusText.text = "Network connection failed";
        }
    }

    IEnumerator InitializeSystem()
    {
        statusText.text = "Initializing";
        yield return ttsController.GetAccessToken();
        statusText.text = "Speech playback ready";
        convertButton.interactable = true;
    }

    public void OnConvertClick()
    {
        if (string.IsNullOrEmpty(inputField.text)) return;

        StartCoroutine(ConvertProcess());
    }

    IEnumerator ConvertProcess()
    {
        convertButton.interactable = false;
        statusText.text = "Generating speech";

        // Create the task; the callback only runs when creation succeeded
        bool taskCreated = false;
        yield return ttsController.CreateTTSTask(inputField.text, (taskId) => {
            taskCreated = true;
            StartCoroutine(PollTaskStatus(taskId));
        });

        if (!taskCreated)
        {
            // Without this the button would stay disabled after a failure
            statusText.text = "Failed to create the speech task";
            convertButton.interactable = true;
        }
    }

    IEnumerator PollTaskStatus(string taskId)
    {
        float timeout = 30f;
        float pollInterval = 1f;
        bool isCompleted = false;

        while (timeout > 0 && !isCompleted)
        {
            statusText.text = $"Processing... {timeout:0}s left";

            // Wait for a single status query to finish
            yield return StartCoroutine(ttsController.QueryTaskStatus(taskId, (audioUrl) => {
                StartCoroutine(PlayAudio(audioUrl));
                isCompleted = true;
            }));

            if (isCompleted) break;

            yield return new WaitForSeconds(pollInterval);
            timeout -= pollInterval;
        }

        if (!isCompleted)
        {
            statusText.text = "Request timed out";
            Debug.LogError($"Status polling timed out for task {taskId}");
        }
        convertButton.interactable = true;
    }

    IEnumerator PlayAudio(string url)
    {
        statusText.text = "Loading audio...";
        yield return audioManager.DownloadAndPlayAudio(url);

        statusText.text = audioManager.IsPlaying() ? "Playing" : "Playback failed";
        convertButton.interactable = true;

        // Update the progress bar while the clip plays
        while (audioManager.IsPlaying())
        {
            progressSlider.value = audioManager.GetPlaybackProgress();
            yield return null;
        }
    }
}

Vosk speech recognition module

VoskSpeechRecognizer.cs

using UnityEngine;
using UnityEngine.UI;
using System.Threading;
using System.Collections.Concurrent;
using System.Collections;
using Vosk;
using Newtonsoft.Json.Linq;
using TMPro;

public class VoskSpeechRecognizer : MonoBehaviour
{
    public Button toggleButton;
    public Text resultText;
    public TMP_InputField outputText;
    // Name of the unpacked model folder inside StreamingAssets; replace with your own.
    public string modelFolder = "vosk-model-small-cn-0.22";

    // Resolved on use: Application.streamingAssetsPath must not be read from a
    // MonoBehaviour field initializer. On Android, StreamingAssets lives inside
    // the APK, so the model has to be copied to a regular folder first.
    private string modelPath => System.IO.Path.Combine(Application.streamingAssetsPath, modelFolder);

    private VoskRecognizer recognizer;
    private AudioClip recordingClip;
    private volatile bool isRecording;   // read by the recognition thread
    private Thread recognitionThread;
    private int sampleRate = 16000;

    // Values shown on the main thread
    private string displayText = "Speech recognition ready";
    private string threadStatus = "";
    private volatile string partialResult = "";
    private volatile string finalResult = "";

    private ConcurrentQueue<float[]> audioDataQueue = new ConcurrentQueue<float[]>();
    private ConcurrentQueue<string> statusQueue = new ConcurrentQueue<string>();
    private int lastPosition = 0;
    private bool modelInitialized = false;

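    void Start()
    {
        displayText = "Initializing...";
        resultText.text = displayText;
        toggleButton.interactable = false;
        // Ask for microphone permission first; that coroutine goes on to
        // initialize the model once permission has been granted.
        StartCoroutine(RequestMicrophonePermission());
    }

    IEnumerator InitializeModel()
    {
        try
        {
            // Initialize Vosk (model loading is synchronous and may take a while)
            Vosk.Vosk.SetLogLevel(0);
            Model model = new Model(modelPath);
            recognizer = new VoskRecognizer(model, sampleRate);
            modelInitialized = true;

            displayText = "Ready, press the button to start recognition";
            toggleButton.interactable = true;
            toggleButton.onClick.AddListener(ToggleRecording);
        }
        catch (System.Exception e)
        {
            displayText = $"Initialization failed: {e.Message}";
            Debug.LogError(e);
        }
        yield return null;
    }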

    void Update()
    {
        // 1. Collect microphone samples on the main thread (assumes a mono clip)
        if (isRecording && recordingClip != null && Microphone.IsRecording(null))
        {
            int currentPosition = Microphone.GetPosition(null);

            // The recording clip is a ring buffer. When the write position
            // wraps, flush the tail of the buffer before reading from the
            // start again, otherwise those samples are lost.
            if (currentPosition < lastPosition)
            {
                EnqueueSamples(lastPosition, recordingClip.samples - lastPosition);
                lastPosition = 0;
            }

            if (currentPosition > lastPosition)
            {
                EnqueueSamples(lastPosition, currentPosition - lastPosition);
                lastPosition = currentPosition;
            }
        }

        // 2. Drain status updates from the background thread
        while (statusQueue.TryDequeue(out string status))
        {
            threadStatus = status;
            Debug.Log(status);
        }

        // 3. Pick the text to show (final result > partial result > thread status > default)
        if (!string.IsNullOrEmpty(finalResult))
        {
            displayText = $"Final result: {finalResult}";
            outputText.text = finalResult;
        }
        else if (!string.IsNullOrEmpty(partialResult))
        {
            displayText = $"Recognizing: {partialResult}";
            outputText.text = partialResult;
        }
        else if (!string.IsNullOrEmpty(threadStatus))
        {
            displayText = threadStatus;
        }

        // 4. Update the UI
        resultText.text = displayText;
    }

    // Copies count samples starting at offset from the recording clip into the queue.
    void EnqueueSamples(int offset, int count)
    {
        if (count <= 0) return;
        float[] samples = new float[count];
        recordingClip.GetData(samples, offset);
        audioDataQueue.Enqueue(samples);
    }

    void ToggleRecording()
    {
        if (!modelInitialized)
        {
            displayText = "The model has not finished initializing";
            return;
        }

        isRecording = !isRecording;

        if (isRecording)
        {
            // Start recording
            try
            {
                displayText = "Starting the microphone...";

                // Reset state
                partialResult = "";
                finalResult = "";
                threadStatus = "";
                audioDataQueue = new ConcurrentQueue<float[]>();
                lastPosition = 0;

                // 10-second looping buffer at the model's sample rate
                recordingClip = Microphone.Start(null, true, 10, sampleRate);

                if (recordingClip == null)
                {
                    displayText = "Could not create the recording clip";
                    isRecording = false;
                }
                else
                {
                    statusQueue.Enqueue("Microphone started");
                    recognitionThread = new Thread(ProcessAudio);
                    recognitionThread.IsBackground = true;
                    recognitionThread.Start();
                }
            }
            catch (System.Exception e)
            {
                displayText = $"Failed to start recording: {e.Message}";
                isRecording = false;
                Debug.LogError(e);
            }
        }
        else
        {
            // Stop recording
            displayText = "Stopping recording...";
            Microphone.End(null);

            // ProcessAudio leaves its loop once isRecording is false, so wait
            // briefly for it instead of calling Thread.Abort().
            bool stopped = recognitionThread == null || recognitionThread.Join(500);

            if (stopped)
            {
                // Flush whatever the recognizer still holds
                string flushed = JObject.Parse(recognizer.FinalResult())["text"]?.ToString();
                if (!string.IsNullOrEmpty(flushed)) finalResult = flushed;
            }

            displayText = !string.IsNullOrEmpty(finalResult)
                ? $"Final result: {finalResult}"
                : "Recognition finished, no result";
        }

        toggleButton.GetComponentInChildren<Text>().text = isRecording ? "Stop" : "Start";
    }

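    // Runs on the background thread: converts queued samples and feeds them to Vosk.
    void ProcessAudio()
    {
        statusQueue.Enqueue("Audio processing thread started");

        while (isRecording)
        {
            if (audioDataQueue.TryDequeue(out float[] samples))
            {
                // Convert floats in [-1, 1] to 16-bit little-endian PCM,
                // clamping so the cast to short cannot overflow
                byte[] audioBytes = new byte[samples.Length * 2];
                for (int i = 0; i < samples.Length; i++)
                {
                    float clamped = System.Math.Max(-1f, System.Math.Min(1f, samples[i]));
                    short sample = (short)(clamped * short.MaxValue);
                    audioBytes[i * 2] = (byte)(sample & 0xFF);
                    audioBytes[i * 2 + 1] = (byte)((sample >> 8) & 0xFF);
                }

                try
                {
                    if (recognizer.AcceptWaveform(audioBytes, audioBytes.Length))
                    {
                        // An utterance ended: publish the final text if there is any
                        string text = JObject.Parse(recognizer.Result())["text"]?.ToString() ?? "";
                        if (text.Length > 0)
                        {
                            finalResult = text;
                            partialResult = "";
                            statusQueue.Enqueue($"Final result: {text}");
                        }
                    }
                    else
                    {
                        string partial = JObject.Parse(recognizer.PartialResult())["partial"]?.ToString() ?? "";
                        if (partial.Length > 0 && partial != partialResult)
                        {
                            // A new utterance has started; show it instead of the old final text
                            finalResult = "";
                            partialResult = partial;
                            statusQueue.Enqueue($"Partial result: {partial}");
                        }
                    }
                }
                catch (System.Exception e)
                {
                    statusQueue.Enqueue($"Recognition error: {e.Message}");
                    Debug.LogError(e);
                }
            }
            else
            {
                Thread.Sleep(10);
            }
        }
    }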

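    void OnApplicationQuit()
    {
        // Let the worker thread leave its loop, then wait briefly for it
        isRecording = false;
        if (recognitionThread != null && recognitionThread.IsAlive)
        {
            recognitionThread.Join(500);
        }

        if (recognizer != null)
        {
            recognizer.Dispose();
        }

        Debug.Log("Vosk resources released");
    }

    // Requests microphone permission on mobile platforms, then initializes the model
    IEnumerator RequestMicrophonePermission()
    {
#if UNITY_ANDROID
        // Android runtime permissions go through UnityEngine.Android.Permission
        string mic = UnityEngine.Android.Permission.Microphone;
        if (!UnityEngine.Android.Permission.HasUserAuthorizedPermission(mic))
        {
            displayText = "Requesting microphone permission...";
            UnityEngine.Android.Permission.RequestUserPermission(mic);

            // Wait up to about 10 seconds for the user to answer the dialog
            float waited = 0f;
            while (!UnityEngine.Android.Permission.HasUserAuthorizedPermission(mic) && waited < 10f)
            {
                waited += Time.unscaledDeltaTime;
                yield return null;
            }

            if (!UnityEngine.Android.Permission.HasUserAuthorizedPermission(mic))
            {
                displayText = "Microphone permission is required";
                yield break;
            }
        }
#else
        if (Application.platform == RuntimePlatform.IPhonePlayer)
        {
            displayText = "Requesting microphone permission...";
            yield return Application.RequestUserAuthorization(UserAuthorization.Microphone);

            if (!Application.HasUserAuthorization(UserAuthorization.Microphone))
            {
                displayText = "Microphone permission is required";
                yield break;
            }
        }
#endif

        // Continue with initialization
        StartCoroutine(InitializeModel());
    }
}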
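
The recognizer's Update() reads new samples by comparing the previous and current microphone positions in a looping clip. The arithmetic for the wrap-around case is easy to get wrong, so it is sketched here in isolation. RingBufferReader and NewRanges are hypothetical names introduced for illustration; positions and lengths are counted in samples.

```csharp
using System.Collections.Generic;

public static class RingBufferReader
{
    // Returns the (offset, count) ranges that cover every sample written
    // between the last read position and the current write position.
    public static List<(int offset, int count)> NewRanges(int last, int current, int bufferLength)
    {
        var ranges = new List<(int offset, int count)>();
        if (current > last)
        {
            // No wrap: one contiguous range.
            ranges.Add((last, current - last));
        }
        else if (current < last)
        {
            // Wrapped: first the tail of the buffer, then the new head.
            if (bufferLength > last) ranges.Add((last, bufferLength - last));
            if (current > 0) ranges.Add((0, current));
        }
        return ranges;
    }
}
```

With a 10-second clip at 16 kHz the buffer holds 160000 samples, so a wrap occurs every ten seconds of recording; skipping the tail range would drop a fraction of a second of audio each time.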

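Conclusion

With the steps above, the Vosk offline speech recognition environment is configured in the Unity project. As a lightweight and accurate offline solution, Vosk gives Unity developers a practical tool for building voice interaction. Its offline operation suits applications with strict data-privacy requirements, and its cross-platform support means one implementation can be deployed to many kinds of devices.

Configuring the environment correctly is only the first step. In real projects, parameters still need tuning and performance still needs optimizing for the specific use case. A sensible approach is to start testing with a small model, improve recognition quality step by step, and then decide whether a larger model is needed.

As voice interaction continues to develop, offline recognition solutions such as Vosk are likely to play a role in more scenarios, giving users a more natural and more secure way to interact.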

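Note: this article focuses on environment setup. The scripts in section 6 are sample code that depends on project-specific components and service credentials not shown here. Refer to the official Vosk documentation and example code for details of the recognition API.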