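Introduction
Artificial intelligence is advancing quickly, and speech recognition has become an important part of modern applications. Adding speech recognition to a Unity project can greatly improve the user experience, especially in games, VR/AR applications, and interactive exhibits. Traditional cloud services need a network connection. Offline speech recognition does not, so it offers better privacy and lower latency. Vosk is an open-source offline speech recognition library. It is lightweight, accurate, and cross-platform, which makes it a good choice for Unity developers.
Vosk uses hybrid deep neural network / hidden Markov model (DNN-HMM) acoustic models. It supports more than 20 languages, including Chinese, English, French, and German, and provides pretrained models in several sizes for different scenarios. Because it runs entirely offline, it is particularly well suited to applications with strict data-privacy requirements.
This article explains how to set up Vosk in a Unity project as the foundation for speech recognition. It then presents the scripts of an AI chat demo built on that setup.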
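1. Introduction to Vosk and its features
Vosk is an open-source offline speech recognition engine built on the Kaldi speech recognition toolkit. Its core features are:
Fully offline: no network connection is needed. All audio is processed locally on the device, which keeps data secure and private.
Multilingual: supports more than 20 languages, including Chinese, English, French, and German, which suits international projects.
Lightweight and efficient: the models are small (the smallest is only about 12 MB) and use little memory, so they run smoothly even on embedded devices such as the Raspberry Pi.
High accuracy: built on deep-learning acoustic models. In quiet environments accuracy can exceed 95%, but real results depend on the model, the speaker's accent, and background noise.
Cross-platform: supports Windows, Linux, macOS, Android, iOS, and other platforms.
Real-time recognition: a streaming API returns partial results while the user is still speaking. On adequate hardware, latency can typically be kept within about 200 ms.
Compared with other speech recognition solutions, Vosk uses few resources and responds quickly. This makes it well suited to real-time voice interaction in Unity projects.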
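2. Preparation
Complete the following steps before integrating Vosk.
2.1 System and Unity requirements
Unity version: 2019.4 or later is recommended, with Api Compatibility Level set to .NET 4.x (called .NET Framework in newer Unity versions). The scripts in section 6 use UnityWebRequest.result, which requires Unity 2020.2 or later.
Operating system: a Windows, macOS, or Linux development environment.
Storage: at least 1 GB of free space for model files. The standard Chinese model alone is about 1.3 GB, so allow more space if you use it.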
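2.2 Download the Vosk files
Vosk Unity plugin: get the Vosk C# bindings and the Unity sample project from GitHub (https://github.com/alphacep/vosk-unity-asr).
Speech models: download the language model you need from the Vosk model list (https://alphacephei.com/vosk/models):
Small Chinese model (vosk-model-small-cn-0.22, about 42 MB): suitable for mobile devices and embedded systems.
Standard Chinese model (vosk-model-cn-0.22, about 1.3 GB): higher accuracy, suitable for servers or high-performance devices.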
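Vosk joins the recognized words with spaces, so the Chinese models above return text such as "今天 天气 很 好". If the UI should show ordinary Chinese text, a small post-processing step can remove the spaces between Chinese characters. The sketch below is a hypothetical helper (`VoskTextCleanup` is our name and is not part of Vosk). As a simplifying assumption, it treats only the basic CJK Unified Ideographs block, U+4E00 to U+9FFF, as Chinese.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of Vosk): removes the spaces Vosk puts
// between recognized words, but only where both neighbours are Chinese.
public static class VoskTextCleanup
{
    // Treats the CJK Unified Ideographs block (U+4E00 to U+9FFF) as Chinese.
    private static bool IsCjk(char c) => c >= '\u4E00' && c <= '\u9FFF';

    public static string RemoveCjkSpaces(string text)
    {
        if (string.IsNullOrEmpty(text)) return text ?? string.Empty;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            bool betweenCjk = c == ' '
                && i > 0 && i < text.Length - 1
                && IsCjk(text[i - 1]) && IsCjk(text[i + 1]);
            if (!betweenCjk) sb.Append(c); // keep everything except CJK-CJK spaces
        }
        return sb.ToString();
    }
}
```

For example, the helper could be applied to the final result before it is written to the UI. Spaces next to Latin words such as "app" in "打开 app 设置" are kept.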
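3. Unity project configuration
3.1 Create the project and import Vosk
Create a new Unity project or open an existing one.
Create a Plugins folder under Assets for the Vosk libraries: the managed Vosk.dll plus the native library for each target platform (for example libvosk.dll on Windows).
Import the downloaded Vosk Unity plugin files into the project. The scripts in section 6 also use TextMeshPro and Newtonsoft.Json, which is available as the com.unity.nuget.newtonsoft-json package.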
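3.2 Import the model files
Create a StreamingAssets folder under Assets if it does not already exist.
Put the downloaded model in StreamingAssets. The Vosk Model class loads a model from an unpacked directory. For the VoskSpeechRecognizer script in section 6, unzip the archive (for example vosk-model-small-cn-0.22.zip) so that StreamingAssets/vosk-model-small-cn-0.22 contains the model files.
Note: the official vosk-unity-asr sample keeps the zip in StreamingAssets and extracts it at runtime. On Android, StreamingAssets is packed inside the APK. The model must therefore be copied or extracted to Application.persistentDataPath before Vosk can open it.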
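If you prefer to ship the zip, the archive has to be unpacked before `new Model(...)` can open it. The sketch below assumes a desktop build, where the zip can be read as a normal file; on Android it would first have to be copied out of the APK, for example with UnityWebRequest. `ModelUnpacker` and the `.extracted` marker file are hypothetical names. `ZipFile` comes from System.IO.Compression. Depending on the Unity version and Api Compatibility Level, the System.IO.Compression.FileSystem assembly may need to be referenced explicitly.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: unpacks a Vosk model zip once and returns the folder
// that can be passed to new Model(...). Later calls reuse the extracted copy.
public static class ModelUnpacker
{
    public static string EnsureExtracted(string zipPath, string targetRoot)
    {
        // "vosk-model-small-cn-0.22.zip" -> "<targetRoot>/vosk-model-small-cn-0.22"
        string modelName = Path.GetFileNameWithoutExtension(zipPath);
        string modelDir = Path.Combine(targetRoot, modelName);
        string marker = Path.Combine(modelDir, ".extracted");
        if (File.Exists(marker)) return modelDir; // finished on an earlier run

        // Remove leftovers of an interrupted earlier attempt.
        string staging = modelDir + ".tmp";
        if (Directory.Exists(staging)) Directory.Delete(staging, true);
        if (Directory.Exists(modelDir)) Directory.Delete(modelDir, true);

        Directory.CreateDirectory(staging);
        ZipFile.ExtractToDirectory(zipPath, staging);

        // Model zips usually wrap everything in one top-level folder; unwrap it.
        string[] dirs = Directory.GetDirectories(staging);
        bool singleFolder = dirs.Length == 1 && Directory.GetFiles(staging).Length == 0;
        Directory.Move(singleFolder ? dirs[0] : staging, modelDir);
        if (Directory.Exists(staging)) Directory.Delete(staging, true);

        File.WriteAllText(marker, "ok"); // written last, so it only exists after a full extraction
        return modelDir;
    }
}
```

In Unity it might be called as `ModelUnpacker.EnsureExtracted(Path.Combine(Application.streamingAssetsPath, "vosk-model-small-cn-0.22.zip"), Application.persistentDataPath)`, with the returned folder passed to `new Model(...)`. Because of the marker file, later launches skip extraction.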
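3.3 Player settings
Open Edit > Project Settings > Player (also reachable through File > Build Settings > Player Settings).
Under Other Settings > Configuration, set Api Compatibility Level to .NET 4.x (.NET Framework in newer Unity versions).
Then apply the settings for your target platform:
Windows: no special configuration is needed.
Android: the app needs the microphone permission (RECORD_AUDIO). Unity adds it to the manifest when scripts use the Microphone class. On Android 6.0 and later, the user must also grant it at runtime.
iOS: fill in Microphone Usage Description in Player Settings. Unity writes it to Info.plist as NSMicrophoneUsageDescription.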
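4. Model selection and optimization tips
4.1 Model selection strategy
Choose a model that fits the application scenario:
| Model type | Size | Typical use | Hardware requirements |
|---|---|---|---|
| Small model | 40-50 MB | Mobile devices, embedded systems | Low-end CPU, 256 MB+ RAM |
| Standard model | 1.3-1.5 GB | Desktop applications, servers | Multi-core CPU, 2 GB+ RAM |
| Professional model | 1.5 GB+ | Professional speech recognition | High-performance CPU, 8 GB+ RAM |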
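4.2 Performance tips
Audio format: make sure the input is 16 kHz, 16-bit, mono PCM, the standard input format for Vosk models.
Preprocessing: filter the audio to reduce background noise.
Resource management: when speech recognition is not needed, release the recognizer by calling Dispose() on VoskRecognizer and Model to free memory.
Multithreading: run recognition on a separate thread so that it does not block Unity's main thread.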
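The multithreading and resource-management tips fit together in one pattern: a background thread consumes audio chunks from the main thread and shuts down cleanly. The sketch below is hypothetical (`RecognitionWorker` is our name). It takes the recognition step as a delegate, so in a Unity script that delegate might be `chunk => recognizer.AcceptWaveform(chunk, chunk.Length)`. `BlockingCollection` and `Thread` are standard .NET types.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch: one background thread consumes audio chunks produced on
// the main thread; Stop() ends it cleanly instead of calling Thread.Abort.
public sealed class RecognitionWorker : IDisposable
{
    private readonly BlockingCollection<byte[]> queue = new BlockingCollection<byte[]>();
    private readonly Action<byte[]> process;   // e.g. chunk => recognizer.AcceptWaveform(chunk, chunk.Length)
    private readonly Thread thread;
    private bool disposed;

    public Exception LastError { get; private set; }

    public RecognitionWorker(Action<byte[]> process)
    {
        this.process = process;
        thread = new Thread(Run) { IsBackground = true };
        thread.Start();
    }

    // Producer side (e.g. Update). Throws InvalidOperationException after Stop().
    public void Enqueue(byte[] chunk) => queue.Add(chunk);

    private void Run()
    {
        // Blocks while the queue is empty; the loop ends after CompleteAdding()
        // once every queued chunk has been handed to process.
        foreach (byte[] chunk in queue.GetConsumingEnumerable())
        {
            try { process(chunk); }
            catch (Exception e) { LastError = e; } // keep the thread alive on bad chunks
        }
    }

    // Finishes the chunks already queued, then waits for the thread to exit.
    public void Stop()
    {
        if (!queue.IsAddingCompleted) queue.CompleteAdding();
        thread.Join();
    }

    public void Dispose()
    {
        if (disposed) return;
        disposed = true;
        Stop();
        queue.Dispose();
    }
}
```

Stop() drains the audio already queued and joins the thread instead of aborting it, so the recognizer can then be disposed safely. The delegate runs on the worker thread and must not touch Unity objects. Results should go back to the main thread through a thread-safe queue.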
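5. Common problems and solutions
These problems often come up when configuring and using Vosk:
Model fails to load
Cause: a wrong model path or incomplete model files.
Solution: check that the unpacked model folder is in StreamingAssets and that the path given to Model points at that folder itself, not at the zip or at a parent folder. Also verify that the files are complete.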
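Load failures are easier to diagnose if the model path is checked before it is passed to Vosk. The sketch below is a hypothetical helper. The expected files (am/final.mdl and conf/model.conf) are our assumption about the layout of recent Vosk model folders, so the list may need adjusting for other models.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical pre-flight check for a Vosk model folder. The expected files are
// an assumption about recent Vosk models; adjust the list for your model.
public static class ModelCheck
{
    private static readonly string[] Expected = { "am/final.mdl", "conf/model.conf" };

    public static List<string> FindProblems(string modelDir)
    {
        var problems = new List<string>();
        if (File.Exists(modelDir))
        {
            problems.Add("path is a file (a zip?), expected an unpacked folder: " + modelDir);
            return problems;
        }
        if (!Directory.Exists(modelDir))
        {
            problems.Add("folder not found: " + modelDir);
            return problems;
        }
        foreach (string rel in Expected)
        {
            string full = Path.Combine(modelDir, rel.Replace('/', Path.DirectorySeparatorChar));
            if (!File.Exists(full)) problems.Add("missing: " + rel);
        }
        return problems;
    }
}
```

A non-empty list can be shown in the UI, so the app never tries to load a missing or incomplete model.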
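Low recognition accuracy
Cause: background noise or a mismatched audio format.
Solution: add an audio preprocessing step and make sure the input is 16 kHz, 16-bit, mono.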
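When a device cannot deliver 16 kHz mono audio directly, the samples can be converted before they reach Vosk. The sketch below assumes Unity-style float samples in [-1, 1], interleaved when there is more than one channel. `PcmUtils` is a hypothetical name. The resampler uses plain linear interpolation with no anti-aliasing filter. This is a simplification; a dedicated resampler gives better quality.

```csharp
using System;

// Hypothetical helpers for preparing float samples in [-1, 1] for Vosk,
// which expects 16 kHz, 16-bit, mono, little-endian PCM.
public static class PcmUtils
{
    // Averages interleaved channels into a single mono channel.
    public static float[] ToMono(float[] interleaved, int channels)
    {
        if (channels <= 1) return interleaved;
        var mono = new float[interleaved.Length / channels];
        for (int i = 0; i < mono.Length; i++)
        {
            float sum = 0f;
            for (int c = 0; c < channels; c++) sum += interleaved[i * channels + c];
            mono[i] = sum / channels;
        }
        return mono;
    }

    // Linear-interpolation resampler without a low-pass filter (simplified).
    public static float[] Resample(float[] input, int srcRate, int dstRate)
    {
        if (srcRate == dstRate || input.Length == 0) return input;
        int outLen = (int)((long)input.Length * dstRate / srcRate);
        var output = new float[outLen];
        double step = (double)srcRate / dstRate;
        for (int i = 0; i < outLen; i++)
        {
            double pos = i * step;
            int i0 = (int)pos;
            int i1 = Math.Min(i0 + 1, input.Length - 1);
            double frac = pos - i0;
            output[i] = (float)(input[i0] * (1 - frac) + input[i1] * frac);
        }
        return output;
    }

    // Clamps to [-1, 1] and writes 16-bit little-endian samples.
    public static byte[] ToPcm16(float[] samples)
    {
        var bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            float s = Math.Max(-1f, Math.Min(1f, samples[i]));
            short v = (short)(s * short.MaxValue);
            bytes[2 * i] = (byte)(v & 0xFF);            // low byte first
            bytes[2 * i + 1] = (byte)((v >> 8) & 0xFF); // then high byte
        }
        return bytes;
    }
}
```

A typical chain is `PcmUtils.ToPcm16(PcmUtils.Resample(PcmUtils.ToMono(data, channels), 48000, 16000))`. The resulting bytes can be passed to `AcceptWaveform(bytes, bytes.Length)`.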
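Performance problems
Cause: the model is too large or the hardware is too weak.
Solution: choose a model size that matches the device, or show a loading screen while the model loads.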
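A loading screen only helps if the model loads without freezing the main thread. The sketch below assumes it is acceptable to construct the Vosk model off the main thread, since the constructor is a native call that does not touch Unity objects. It runs the expensive factory with Task.Run and polls for completion from Update. `BackgroundLoad<T>` is a hypothetical helper. Compute the model path on the main thread first, because most Unity APIs may only be used there.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: runs an expensive constructor on the thread pool so the
// main thread can keep rendering a loading screen and poll IsDone each frame.
public sealed class BackgroundLoad<T>
{
    private readonly Task<T> task;

    public BackgroundLoad(Func<T> factory)
    {
        task = Task.Run(factory);
    }

    public bool IsDone => task.IsCompleted;
    public bool Succeeded => task.Status == TaskStatus.RanToCompletion;

    // Only meaningful once IsDone and Succeeded are both true.
    public T Result => task.Result;

    // The loader's own exception (unwrapped from AggregateException), or null.
    public Exception Error => task.Exception?.InnerException;
}
```

In a MonoBehaviour, Start could create `load = new BackgroundLoad<Model>(() => new Model(path));`. Update would then check `load.IsDone` and either create the recognizer from `load.Result` or show `load.Error`.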
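Platform compatibility problems
Cause: native library files built for a different platform.
Solution: use the Vosk native library compiled for each target platform and CPU architecture, and assign each file to its platform in the plugin's import settings in the Inspector.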
6. AI chat implementation based on Vosk
Main control module (AiChat.cs)
```csharp
using UnityEngine;
using UnityEngine.Networking;
using UnityEngine.UI;
using TMPro;
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

public class AiChat : MonoBehaviour
{
    [Header("UI bindings")]
    [SerializeField] public TMP_InputField inputField;
    [SerializeField] public Button submitButton;
    [SerializeField] Text answerText;
    [SerializeField] Toggle ttsToggle;

    [Header("API Settings")]
    // Placeholders: fill in your own chat-completions endpoint and key.
    // A key shipped inside a client build can be extracted; a proxy server is safer.
    [SerializeField] string apiUrl = "API endpoint";
    [SerializeField] string apiKey = "API key";

    [Header("Chat Settings")]
    [SerializeField] public string askTag = "";          // text appended to every question
    [SerializeField] public AIChatAssistant getRAGChat;  // RAG component from the author's project (not shown)

    private bool isStreaming = false;
    private Coroutine streamCoroutine;
    private readonly StringBuilder fullResponse = new StringBuilder();

    void Start()
    {
        if (Application.internetReachability != NetworkReachability.NotReachable)
        {
            // The button drives the RAG assistant. OnSubmitClicked (direct streaming
            // chat) is not wired here; bind it in the Inspector if you want it.
            submitButton.onClick.AddListener(OnSubmitClick);
            submitButton.onClick.AddListener(() => StartCoroutine(getRAGChat.SubmitQuestion()));
            submitButton.interactable = true;
            answerText.text = "Waiting for a question...";
            ttsToggle.isOn = true;
        }
        else
        {
            answerText.text = "Please check your network connection!";
        }
    }

    // Copies the typed question into the RAG assistant before it submits.
    void OnSubmitClick()
    {
        getRAGChat.questionInput = inputField.text;
    }

    public void OnSubmitClicked()
    {
        if (string.IsNullOrWhiteSpace(inputField.text))
        {
            answerText.text = "<color=#FF0000>Please enter a valid question!</color>";
            return;
        }
        if (isStreaming && streamCoroutine != null) StopCoroutine(streamCoroutine);

        isStreaming = true;
        fullResponse.Clear();
        answerText.text = "Thinking...";
        submitButton.interactable = false;
        streamCoroutine = StartCoroutine(StreamChatCompletion(inputField.text));
    }

    // Requests an OpenAI-style streamed completion. DownloadHandlerBuffer collects
    // the whole body first; the chunks are then replayed one per frame.
    IEnumerator StreamChatCompletion(string userMessage)
    {
        var requestData = new RequestData
        {
            messages = new List<Message> { new Message { role = "user", content = userMessage + "," + askTag } },
            stream = true
        };
        byte[] payloadBytes = Encoding.UTF8.GetBytes(JsonUtility.ToJson(requestData));

        using (UnityWebRequest request = new UnityWebRequest(apiUrl, "POST"))
        {
            request.uploadHandler = new UploadHandlerRaw(payloadBytes);
            request.downloadHandler = new DownloadHandlerBuffer();
            request.SetRequestHeader("Content-Type", "application/json");
            request.SetRequestHeader("Authorization", "Bearer " + apiKey);

            yield return request.SendWebRequest();

            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"API Error: {request.error}, code {request.responseCode}, body: {request.downloadHandler.text}");
                Finish($"<color=#FF0000>Request failed: {request.error}</color>");
                yield break;
            }

            string rawResponse = request.downloadHandler.text;
            Debug.Log($"Raw API Response:\n{rawResponse}");
            if (string.IsNullOrEmpty(rawResponse))
            {
                Finish("<color=#FFA500>The server returned an empty response</color>");
                yield break;
            }

            bool receivedValidResponse = false;
            foreach (string line in rawResponse.Split('\n'))
            {
                string trimmedLine = line.Trim();
                // Skip blank lines, SSE event names and SSE comments.
                if (trimmedLine.Length == 0 || trimmedLine.StartsWith("event:") || trimmedLine.StartsWith(":")) continue;

                // SSE lines look like "data: {...}"; the stream ends with "data: [DONE]".
                string jsonStr = trimmedLine.StartsWith("data:") ? trimmedLine.Substring(5).Trim() : trimmedLine;
                if (jsonStr == "[DONE]") break;

                try
                {
                    // Some gateways send each chunk as a JSON-encoded string ("{\"id\":...}").
                    // Unescape (naively) only in that case; plain JSON is parsed as-is.
                    if (jsonStr.Length >= 2 && jsonStr.StartsWith("\"") && jsonStr.EndsWith("\""))
                    {
                        jsonStr = jsonStr.Substring(1, jsonStr.Length - 2)
                            .Replace("\\\"", "\"")
                            .Replace("\\\\", "\\");
                    }

                    var response = JsonUtility.FromJson<StreamResponse>(jsonStr);
                    string content = (response?.choices != null && response.choices.Length > 0)
                        ? response.choices[0].delta?.content
                        : null;
                    if (!string.IsNullOrEmpty(content))
                    {
                        fullResponse.Append(content);
                        answerText.text = fullResponse.ToString();
                        receivedValidResponse = true;
                    }
                }
                catch (Exception e)
                {
                    Debug.LogWarning($"Parse error: {e.Message}\nRaw data: {jsonStr}");
                }
                yield return null; // one chunk per frame (typewriter effect)
            }

            string finalText;
            if (receivedValidResponse)
            {
                finalText = fullResponse.ToString();
            }
            else if (rawResponse.Contains("error"))
            {
                try { finalText = $"<color=#FF0000>API error: {JsonUtility.FromJson<ErrorResponse>(rawResponse).error.message}</color>"; }
                catch { finalText = $"<color=#FF0000>Unknown API error: {rawResponse}</color>"; }
            }
            else
            {
                finalText = $"<color=#FFA500>No valid response received. Raw data:\n{rawResponse}</color>";
            }
            Finish(finalText);

            // Read the answer aloud only when there is one. UITTSController reads its own
            // inputField, which should be bound to the same Text as answerText.
            if (receivedValidResponse && ttsToggle.isOn) UITTSController.Instance.OnConvertClick();
        }
    }

    // Restores the UI after a request ends, whether it succeeded or not.
    void Finish(string text)
    {
        answerText.text = text;
        isStreaming = false;
        submitButton.interactable = true;
    }

    [System.Serializable] private class RequestData { public List<Message> messages; public bool stream; }
    [System.Serializable] private class Message { public string role; public string content; }
    [System.Serializable] private class StreamResponse
    {
        public string id; public string @object; public int created; public string model; public Choice[] choices;
    }
    [System.Serializable] private class Choice { public int index; public Delta delta; public string finish_reason; }
    [System.Serializable] private class Delta { public string content; }
    [System.Serializable] private class ErrorResponse { public ErrorInfo error; }
    [System.Serializable] private class ErrorInfo { public string message; public string type; public string code; }
}
```
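The AiChat script buffers the whole response with DownloadHandlerBuffer, so no text appears until the server has finished. Token-by-token display would require processing chunks as they arrive, for example in the ReceiveData override of a DownloadHandlerScript subclass. A network chunk can end in the middle of a line, or even in the middle of a multi-byte UTF-8 character, so a small stateful splitter is needed. The sketch below is hypothetical (`SseLineSplitter` is our name). It assumes the OpenAI-style `data: {...}` / `data: [DONE]` framing that the script already parses.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: turns arbitrary network chunks into complete
// "data:" payloads of an OpenAI-style server-sent event stream.
public sealed class SseLineSplitter
{
    // A Decoder (unlike Encoding.GetString) keeps an incomplete UTF-8 sequence
    // at the end of one chunk and completes it with the next chunk.
    private readonly Decoder decoder = Encoding.UTF8.GetDecoder();
    private readonly StringBuilder pending = new StringBuilder();

    public bool Done { get; private set; }   // true after "data: [DONE]"

    public List<string> Feed(byte[] data, int length)
    {
        var payloads = new List<string>();
        // Room for this chunk plus up to 3 leftover bytes from the previous one.
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(length + 3)];
        int written = decoder.GetChars(data, 0, length, chars, 0);
        pending.Append(chars, 0, written);

        string buffered = pending.ToString();
        int newline;
        while (!Done && (newline = buffered.IndexOf('\n')) >= 0)
        {
            string line = buffered.Substring(0, newline).TrimEnd('\r');
            buffered = buffered.Substring(newline + 1);
            if (!line.StartsWith("data:")) continue;   // skips blank, "event:" and ":" lines
            string payload = line.Substring(5).Trim();
            if (payload == "[DONE]") { Done = true; break; }
            payloads.Add(payload);
        }
        pending.Clear().Append(buffered);   // keep the unfinished last line
        return payloads;
    }
}
```

Each returned payload is one JSON chunk that can be handed to JsonUtility. Because the decoder carries incomplete byte sequences between calls, Chinese characters are never split in half.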
TTS speech synthesis module
AudioManager.cs
```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class AudioManager : MonoBehaviour
{
    private AudioSource audioSource;

    void Awake()
    {
        audioSource = gameObject.AddComponent<AudioSource>();
    }

    // Downloads an MP3 once and plays it.
    public IEnumerator DownloadAndPlayAudio(string url)
    {
        Debug.Log($"Downloading audio: {url}");
        using (UnityWebRequest www = UnityWebRequestMultimedia.GetAudioClip(url, AudioType.MPEG))
        {
            var handler = (DownloadHandlerAudioClip)www.downloadHandler;
            handler.streamAudio = true;
            handler.compressed = false;
            www.timeout = 15; // seconds

            var operation = www.SendWebRequest();
            while (!operation.isDone)
            {
                Debug.Log($"Download progress: {www.downloadProgress:P}");
                yield return null;
            }

            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Download failed: {www.error} (HTTP {www.responseCode})");
                yield break;
            }
            Debug.Log($"Audio downloaded: {www.downloadedBytes} bytes");

            AudioClip clip = DownloadHandlerAudioClip.GetContent(www);
            if (clip == null || clip.length == 0)
            {
                Debug.LogError("Audio decoding failed");
                yield break;
            }
            audioSource.clip = clip;
            audioSource.Play();
            Debug.Log("Playback started");
        }
    }

    public void TogglePause()
    {
        if (audioSource.isPlaying) audioSource.Pause();
        else audioSource.UnPause();
    }

    public void StopPlayback() => audioSource.Stop();

    public bool IsPlaying() => audioSource.isPlaying;

    // Playback position as a 0..1 fraction of the clip length.
    public float GetPlaybackProgress()
    {
        if (audioSource.clip == null || Mathf.Approximately(audioSource.clip.length, 0f)) return 0f;
        return Mathf.Clamp01(audioSource.time / audioSource.clip.length);
    }
}
```
BaiduTTSController.cs
```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Client for Baidu AI Cloud long-text TTS: get an access token, create a
// synthesis task, then query the task until it returns an MP3 URL.
public class BaiduTTSController : MonoBehaviour
{
    // Placeholders: use your own credentials.
    private const string CLIENT_ID = "ID";
    private const string CLIENT_SECRET = "secret";
    private string accessToken = "";

    public IEnumerator GetAccessToken()
    {
        string url = $"<Baidu AI Cloud token URL>client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}";
        using (UnityWebRequest www = UnityWebRequest.Get(url))
        {
            yield return www.SendWebRequest();
            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Token request failed: {www.error}");
                yield break;
            }
            accessToken = JsonUtility.FromJson<TokenResponse>(www.downloadHandler.text).access_token;
        }
    }

    // Creates a synthesis task; callback receives the task_id (not called on failure).
    public IEnumerator CreateTTSTask(string text, System.Action<string> callback)
    {
        string apiUrl = $"<TTS create API URL>access_token={accessToken}";
        var requestData = new CreateTaskRequest
        {
            text = text, format = "mp3-16k", voice = 0, lang = "zh", speed = 5, pitch = 5, volume = 5
        };
        using (UnityWebRequest www = PostJson(apiUrl, JsonUtility.ToJson(requestData)))
        {
            yield return www.SendWebRequest();
            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Task creation failed: {www.error}");
                yield break;
            }
            var response = JsonUtility.FromJson<TaskCreateResponse>(www.downloadHandler.text);
            if (string.IsNullOrEmpty(response.task_id))
            {
                Debug.LogError($"No task_id in response: {www.downloadHandler.text}");
                yield break;
            }
            callback?.Invoke(response.task_id);
        }
    }

    // Queries a task once. The callback receives the audio URL only when the task
    // has succeeded and the URL answers a HEAD request; otherwise the caller polls again.
    public IEnumerator QueryTaskStatus(string taskId, System.Action<string> callback)
    {
        string apiUrl = $"<TTS query API URL>access_token={accessToken}";
        var requestData = new QueryTaskRequest { task_ids = new[] { taskId } };
        using (UnityWebRequest www = PostJson(apiUrl, JsonUtility.ToJson(requestData)))
        {
            yield return www.SendWebRequest();
            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Status query failed: {www.error}");
                yield break;
            }
            var response = JsonUtility.FromJson<TaskQueryResponse>(www.downloadHandler.text);
            if (response.tasks_info == null || response.tasks_info.Length == 0 ||
                response.tasks_info[0].task_status != "Success")
                yield break;

            string audioUrl = response.tasks_info[0].task_result.speech_url;
            Debug.Log($"Audio URL: {audioUrl}");
            using (UnityWebRequest headRequest = UnityWebRequest.Head(audioUrl))
            {
                yield return headRequest.SendWebRequest();
                if (headRequest.result == UnityWebRequest.Result.Success) callback?.Invoke(audioUrl);
                else Debug.LogError($"Audio URL not reachable: {headRequest.error}");
            }
        }
    }

    // Builds a JSON POST request with a buffered download handler.
    private static UnityWebRequest PostJson(string url, string json)
    {
        var www = new UnityWebRequest(url, "POST");
        www.uploadHandler = new UploadHandlerRaw(System.Text.Encoding.UTF8.GetBytes(json));
        www.downloadHandler = new DownloadHandlerBuffer();
        www.SetRequestHeader("Content-Type", "application/json");
        return www;
    }

    [System.Serializable] private class TokenResponse { public string access_token; }
    [System.Serializable] private class CreateTaskRequest
    {
        public string text; public string format; public int voice; public string lang;
        public int speed; public int pitch; public int volume;
    }
    [System.Serializable] private class TaskCreateResponse { public string task_id; }
    [System.Serializable] private class QueryTaskRequest { public string[] task_ids; }
    [System.Serializable] private class TaskQueryResponse { public TaskInfo[] tasks_info; }
    [System.Serializable] private class TaskInfo { public string task_status; public TaskResult task_result; }
    [System.Serializable] private class TaskResult { public string speech_url; }
}
```
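BaiduTTSController inserts the credentials and access token into its URLs by plain string interpolation. If a value might contain reserved characters, percent-encoding it is safer. The sketch below is a hypothetical helper built on Uri.EscapeDataString. The real endpoint URLs are not shown here; the test uses example.com as a stand-in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: appends percent-encoded key=value pairs to a URL.
public static class QueryString
{
    public static string Append(string baseUrl, IEnumerable<KeyValuePair<string, string>> pairs)
    {
        string query = string.Join("&", pairs.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value ?? string.Empty)));
        if (query.Length == 0) return baseUrl;

        // Start a query with '?', or continue an existing one with '&'.
        string separator = !baseUrl.Contains("?") ? "?"
            : (baseUrl.EndsWith("?") || baseUrl.EndsWith("&")) ? "" : "&";
        return baseUrl + separator + query;
    }
}
```

For example, a client secret containing "+" or a space is sent as %2B or %20, not as a literal character that the server might decode differently.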
UITTSController.cs
```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.UI;

public class UITTSController : MonoBehaviour
{
    public static UITTSController Instance { get; private set; }

    [Header("UI Components")]
    public Text inputField;        // text to read aloud (e.g. the chat answer)
    public Button convertButton;
    public Text statusText;
    public Slider progressSlider;

    private BaiduTTSController ttsController;
    private AudioManager audioManager;

    private void Awake()
    {
        // Simple singleton: keep the first instance, destroy duplicates.
        if (Instance != null && Instance != this) { Destroy(gameObject); return; }
        Instance = this;
    }

    void Start()
    {
        ttsController = gameObject.AddComponent<BaiduTTSController>();
        audioManager = gameObject.AddComponent<AudioManager>();
        convertButton.interactable = false;   // enabled once the token request has finished
        if (Application.internetReachability != NetworkReachability.NotReachable)
        {
            StartCoroutine(InitializeSystem());
            convertButton.onClick.AddListener(OnConvertClick);
        }
        else
        {
            statusText.text = "Network connection failed";
        }
    }

    IEnumerator InitializeSystem()
    {
        statusText.text = "Initializing";
        yield return ttsController.GetAccessToken();
        statusText.text = "Speech ready";
        convertButton.interactable = true;
    }

    public void OnConvertClick()
    {
        if (string.IsNullOrEmpty(inputField.text)) return;
        StartCoroutine(ConvertProcess());
    }

    IEnumerator ConvertProcess()
    {
        convertButton.interactable = false;
        statusText.text = "Generating speech";
        string taskId = null;
        yield return ttsController.CreateTTSTask(inputField.text, id => taskId = id);
        if (string.IsNullOrEmpty(taskId))
        {
            statusText.text = "Failed to create the TTS task";
            convertButton.interactable = true;
            yield break;
        }
        yield return PollTaskStatus(taskId);
    }

    // Polls once per second for up to 30 seconds until the audio URL is available.
    IEnumerator PollTaskStatus(string taskId)
    {
        float timeout = 30f;
        const float pollInterval = 1f;
        string audioUrl = null;
        while (timeout > 0 && audioUrl == null)
        {
            statusText.text = $"Processing... {timeout} s";
            yield return ttsController.QueryTaskStatus(taskId, url => audioUrl = url);
            if (audioUrl != null) break;
            yield return new WaitForSeconds(pollInterval);
            timeout -= pollInterval;
        }

        if (audioUrl == null)
        {
            statusText.text = "Request timed out";
            Debug.LogError("TTS status polling timed out");
            convertButton.interactable = true;
            yield break;
        }
        yield return PlayAudio(audioUrl);
    }

    IEnumerator PlayAudio(string url)
    {
        statusText.text = "Loading...";
        yield return audioManager.DownloadAndPlayAudio(url);
        statusText.text = "Playing";
        convertButton.interactable = true;
        while (audioManager.IsPlaying())
        {
            progressSlider.value = audioManager.GetPlaybackProgress();
            yield return null;
        }
    }
}
```
Vosk speech recognition module
VoskSpeechRecognizer.cs
```csharp
using UnityEngine;
using UnityEngine.UI;
using System.IO;
using System.Threading;
using System.Collections;
using System.Collections.Concurrent;
using Vosk;
using Newtonsoft.Json.Linq;
using TMPro;

public class VoskSpeechRecognizer : MonoBehaviour
{
    public Button toggleButton;
    public Text resultText;
    public TMP_InputField outputText;
    // Name of the *unpacked* model folder inside StreamingAssets. The full path is built
    // at runtime: Application.streamingAssetsPath cannot be used in a field initializer.
    public string modelFolder = "vosk-model-small-cn-0.22";

    private Model model;
    private VoskRecognizer recognizer;
    private AudioClip recordingClip;
    private volatile bool isRecording;      // read by the recognition thread
    private Thread recognitionThread;
    private const int sampleRate = 16000;   // Vosk models expect 16 kHz mono
    private const int clipSeconds = 10;     // length of the looping microphone buffer

    private string displayText = "Speech recognition ready";
    private string threadStatus = "";
    // Written by the recognition thread, read in Update.
    private volatile string partialResult = "";
    private volatile string finalResult = "";
    private readonly ConcurrentQueue<float[]> audioDataQueue = new ConcurrentQueue<float[]>();
    private readonly ConcurrentQueue<string> statusQueue = new ConcurrentQueue<string>();
    private int lastPosition = 0;
    private bool modelInitialized = false;

    void Start()
    {
        displayText = "Initializing...";
        resultText.text = displayText;
        toggleButton.interactable = false;
        // Ask for microphone permission first (mobile only), then load the model.
        StartCoroutine(RequestMicrophonePermission());
    }

    IEnumerator InitializeModel()
    {
        try
        {
            Vosk.Vosk.SetLogLevel(0);
            // On Android, StreamingAssets lives inside the APK, so the model must be
            // copied to persistentDataPath first and that path used here instead.
            string modelPath = Path.Combine(Application.streamingAssetsPath, modelFolder);
            if (!Directory.Exists(modelPath))
                throw new DirectoryNotFoundException($"Model folder not found: {modelPath}");

            model = new Model(modelPath);   // blocks the main thread while loading
            recognizer = new VoskRecognizer(model, sampleRate);
            modelInitialized = true;
            displayText = "Ready, click the button to start recognition";
            toggleButton.interactable = true;
            toggleButton.onClick.AddListener(ToggleRecording);
        }
        catch (System.Exception e)
        {
            displayText = $"Initialization failed: {e.Message}";
            Debug.LogError(e);
        }
        yield return null;
    }

    void Update()
    {
        if (isRecording && recordingClip != null && Microphone.IsRecording(null))
        {
            int currentPosition = Microphone.GetPosition(null);
            if (currentPosition < lastPosition)
            {
                // The looping buffer wrapped: read the tail, then continue from the start.
                EnqueueSamples(lastPosition, recordingClip.samples - lastPosition);
                lastPosition = 0;
            }
            if (currentPosition > lastPosition)
            {
                EnqueueSamples(lastPosition, currentPosition - lastPosition);
                lastPosition = currentPosition;
            }
        }

        while (statusQueue.TryDequeue(out string status))
        {
            threadStatus = status;
            Debug.Log(status);
        }

        // Live partial text first, then the last final result, then status messages.
        string partial = partialResult, final = finalResult;
        if (!string.IsNullOrEmpty(partial)) { displayText = $"Live: {partial}"; outputText.text = partial; }
        else if (!string.IsNullOrEmpty(final)) { displayText = $"Final result: {final}"; outputText.text = final; }
        else if (!string.IsNullOrEmpty(threadStatus)) displayText = threadStatus;
        resultText.text = displayText;
    }

    // Copies count samples starting at offset from the (mono) recording clip.
    void EnqueueSamples(int offset, int count)
    {
        if (count <= 0) return;
        float[] samples = new float[count];
        recordingClip.GetData(samples, offset);
        audioDataQueue.Enqueue(samples);
    }

    void ToggleRecording()
    {
        if (!modelInitialized) { displayText = "The model has not finished loading"; return; }
        if (isRecording) StopRecording(); else StartRecording();
        toggleButton.GetComponentInChildren<Text>().text = isRecording ? "Stop" : "Start";
    }

    void StartRecording()
    {
        try
        {
            displayText = "Starting microphone...";
            partialResult = "";
            finalResult = "";
            threadStatus = "";
            while (audioDataQueue.TryDequeue(out _)) { }   // drop audio from the previous session
            lastPosition = 0;

            recordingClip = Microphone.Start(null, true, clipSeconds, sampleRate);
            if (recordingClip == null) { displayText = "Could not create the recording clip"; return; }

            isRecording = true;
            recognitionThread = new Thread(ProcessAudio) { IsBackground = true };
            recognitionThread.Start();
        }
        catch (System.Exception e)
        {
            displayText = $"Failed to start recording: {e.Message}";
            isRecording = false;
            Debug.LogError(e);
        }
    }

    void StopRecording()
    {
        isRecording = false;
        Microphone.End(null);
        // Let the worker drain its queue and exit. Thread.Abort is unreliable and
        // not supported on every Unity scripting backend.
        bool stopped = recognitionThread == null || recognitionThread.Join(1000);
        if (stopped)
        {
            // Flush audio still buffered inside the recognizer.
            string text = JObject.Parse(recognizer.FinalResult())["text"]?.ToString();
            if (!string.IsNullOrEmpty(text)) finalResult = text;
        }
        partialResult = "";
        displayText = string.IsNullOrEmpty(finalResult) ? "Recognition ended, no result" : $"Final result: {finalResult}";
    }

    // Background thread: converts queued float samples to 16-bit PCM and feeds Vosk.
    void ProcessAudio()
    {
        statusQueue.Enqueue("Audio processing thread started");
        while (isRecording || !audioDataQueue.IsEmpty)
        {
            if (!audioDataQueue.TryDequeue(out float[] samples)) { Thread.Sleep(10); continue; }

            byte[] audioBytes = new byte[samples.Length * 2];
            for (int i = 0; i < samples.Length; i++)
            {
                // Clamp first so out-of-range samples cannot overflow a short.
                short sample = (short)(Mathf.Clamp(samples[i], -1f, 1f) * short.MaxValue);
                audioBytes[i * 2] = (byte)(sample & 0xFF);             // little-endian low byte
                audioBytes[i * 2 + 1] = (byte)((sample >> 8) & 0xFF);  // high byte
            }

            try
            {
                if (recognizer.AcceptWaveform(audioBytes, audioBytes.Length))
                {
                    // End of an utterance: {"text": "..."}
                    string text = JObject.Parse(recognizer.Result())["text"]?.ToString();
                    if (!string.IsNullOrEmpty(text))
                    {
                        finalResult = text;
                        statusQueue.Enqueue($"Final result: {text}");
                    }
                    partialResult = "";
                }
                else
                {
                    // Still speaking: {"partial": "..."}
                    partialResult = JObject.Parse(recognizer.PartialResult())["partial"]?.ToString() ?? "";
                }
            }
            catch (System.Exception e)
            {
                statusQueue.Enqueue($"Recognition error: {e.Message}");
            }
        }
    }

    void OnDestroy()
    {
        isRecording = false;
        if (Microphone.IsRecording(null)) Microphone.End(null);
        bool stopped = recognitionThread == null || recognitionThread.Join(1000);
        if (stopped)
        {
            recognizer?.Dispose();
            model?.Dispose();
        }
        Debug.Log("Vosk resources released");
    }

    IEnumerator RequestMicrophonePermission()
    {
        if (Application.platform == RuntimePlatform.Android || Application.platform == RuntimePlatform.IPhonePlayer)
        {
            // Depending on the Unity version, Android may need
            // UnityEngine.Android.Permission.RequestUserPermission instead.
            displayText = "Requesting microphone permission...";
            yield return Application.RequestUserAuthorization(UserAuthorization.Microphone);
            if (!Application.HasUserAuthorization(UserAuthorization.Microphone))
            {
                displayText = "Microphone permission is required";
                yield break;
            }
        }
        yield return InitializeModel();
    }
}
```
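`Microphone.Start(null, true, 10, sampleRate)` records into a looping clip. When Microphone.GetPosition wraps around, the new samples sit partly at the end of the clip and partly at its start. This arithmetic can be isolated in a pure function. `RingReader.NewSegments` is a hypothetical helper that returns the (offset, count) ranges to read with AudioClip.GetData.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: which sample ranges of a looping recording buffer are new
// since the last read. Each (offset, count) can be passed to AudioClip.GetData.
public static class RingReader
{
    public static List<(int offset, int count)> NewSegments(int lastPos, int currentPos, int bufferLength)
    {
        var segments = new List<(int offset, int count)>();
        if (currentPos > lastPos)
        {
            segments.Add((lastPos, currentPos - lastPos));        // no wrap
        }
        else if (currentPos < lastPos)
        {
            segments.Add((lastPos, bufferLength - lastPos));      // tail up to the end of the clip
            if (currentPos > 0) segments.Add((0, currentPos));    // head after wrapping
        }
        // currentPos == lastPos is treated as "nothing new" (a full lap looks the same).
        return segments;
    }
}
```

Equal positions count as "no new data". If Update stalls for longer than the whole clip (10 s here), a full lap of audio is lost, so the clip should be much longer than the worst expected frame hitch.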
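Conclusion
The steps above set up Vosk offline speech recognition in a Unity project. Vosk is a lightweight, accurate offline recognizer that gives Unity developers a solid tool for voice interaction. Working offline suits privacy-sensitive applications, and cross-platform support lets one codebase reach many kinds of devices.
A correct setup is only the first step. Real projects still need parameter tuning and performance work for their specific scenario. Start testing with a small model, improve recognition step by step, and move to a larger model only if the requirements call for it.
As voice interaction technology matures, offline recognizers such as Vosk will be used in more applications, giving users more natural and more secure interaction.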
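References
Vosk model list (https://alphacephei.com/vosk/models)
Vosk Unity plugin on GitHub (https://github.com/alphacep/vosk-unity-asr)
Unity audio system documentation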
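Note: sections 1 to 5 cover environment setup. The scripts in section 6 are one example of handling microphone input and calling the Vosk API. For further details, see the official Vosk documentation and sample code.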