[C#.NET] Text to Speech with Microsoft Speech SDK 11

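A recent project required users to type text on their phone and have a Flash movie on a desktop web page read it aloud.
I had already built the phone-to-desktop interaction before, using a QR code and Node.js, so I won't go into it here.
This article shows how to implement Text to Speech with Microsoft Speech SDK 11.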

My development environment is Windows 7, and the server runs Windows Server 2008.
If you are on Windows XP / Windows 2000 Server, look for tutorials on Microsoft Speech API (SAPI) 5.4 instead; Speech SDK 11 seems to install only on Windows Vista / Windows Server 2003 or later.

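I originally wrote this with SAPI 5.4, but it lacks a Traditional Chinese language pack, so input text could only be handled as Simplified Chinese. To read mixed Chinese and English, the program also has to inspect character codes and switch the pronunciation language itself;
otherwise English words are spelled out letter by letter, which is pretty silly...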
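
To illustrate the kind of character-code check that the SAPI 5.4 approach needed, here is a minimal sketch. It is not the code from the original project: the class name `ScriptSplitter` is hypothetical, and treating only the CJK Unified Ideographs block (U+4E00 to U+9FFF) as Chinese is a simplifying assumption. It splits a string into runs so that each run could be spoken with a different voice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
static class ScriptSplitter
{
    // Assumption: only the CJK Unified Ideographs block counts as Chinese.
    static bool IsCjk(char c)
    {
        return c >= '\u4E00' && c <= '\u9FFF';
    }

    // Splits text into consecutive runs; Key is true for a CJK run, false otherwise.
    public static List<KeyValuePair<bool, string>> Split(string text)
    {
        var runs = new List<KeyValuePair<bool, string>>();
        var current = new StringBuilder();
        bool currentIsCjk = false;

        foreach (char c in text)
        {
            bool cjk = IsCjk(c);
            // Close the current run when the script changes
            if (current.Length > 0 && cjk != currentIsCjk)
            {
                runs.Add(new KeyValuePair<bool, string>(currentIsCjk, current.ToString()));
                current.Clear();
            }
            currentIsCjk = cjk;
            current.Append(c);
        }

        // Add the final run, if any
        if (current.Length > 0)
        {
            runs.Add(new KeyValuePair<bool, string>(currentIsCjk, current.ToString()));
        }
        return runs;
    }
}
```

Calling `ScriptSplitter.Split("我有iPhone手機")` yields three runs: a Chinese run, "iPhone", and another Chinese run. Spaces and punctuation fall into the non-Chinese runs here, which a real implementation would probably want to handle more carefully.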

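This example builds a TTS console application: given some text, it generates a WAV file and then converts the WAV to MP3 with the LAME MP3 Encoder.
Finally, a web .ashx handler calls the resulting .exe.
(The reason for this roundabout route is that I couldn't touch the client's server environment and couldn't install the required components there, so everything had to go on our own server.)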

On to the main topic. Download and install the following:

Microsoft Speech Platform – Runtime (Version 11)
http://www.microsoft.com/en-us/download/details.aspx?id=27225
This is the runtime. I forgot to install it at first, which is why I kept getting errors. I downloaded the x86 version.
Microsoft Speech Platform – Software Development Kit (SDK) (Version 11)
http://www.microsoft.com/en-us/download/details.aspx?id=27226
This is the development kit. I downloaded the x86 version here as well.
Microsoft Speech Platform – Runtime Languages (Version 11)
http://www.microsoft.com/en-us/download/details.aspx?id=27224
The language pack. Download the Traditional Chinese (zh-TW) one; it is a female voice.

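1. After installing, open Visual Studio 2010, create a new Console Application project (C# in this example), and add a reference to C:\Program Files (x86)\Microsoft SDKs\Speech\v11.0\Assembly\Microsoft.Speech.dll. The code in the next step also uses System.Web.Security, so add a reference to System.Web as well; this means targeting the full .NET Framework rather than the Client Profile.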
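
Before moving on, it can help to confirm that the runtime and the language pack are actually visible to the program. The following is a small sketch (the class name `ListVoices` is just for illustration) that prints every voice the Speech Platform reports. It assumes the Microsoft.Speech.dll reference from the step above is in place.

```csharp
using System;
using Microsoft.Speech.Synthesis;

class ListVoices
{
    static void Main()
    {
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            // Enumerate the voices installed for the Speech Platform runtime
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                // Name is the string that SelectVoice expects
                Console.WriteLine("{0} | {1} | {2}", info.Name, info.Culture, info.Gender);
            }
        }
    }
}
```

If nothing is printed, the language pack is probably missing, or its bitness (x86 / x64) does not match the runtime and the build target.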

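2. Enter the following code

The idea is quite simple, and the comments explain each step.
The code in this example is rough, so feel free to refactor it XD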

using System;
using System.Diagnostics;
using System.Reflection;
using System.Web.Security;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Synthesis;

namespace Synth_API
{
    class Program
    {
        static void Main(string[] args)
        {
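            // Directory that contains this executable
            string workpath = System.IO.Path.GetDirectoryName(Assembly.GetEntryAssembly().Location);
            // Input text, passed in as: Synth_API.exe "something I want to say"
            if (args.Length == 0)
            {
                Console.Error.WriteLine("Usage: Synth_API.exe \"text to speak\"");
                Environment.Exit(2);
            }
            string strInput = args[0];
            // Use the MD5 hash of the text as the audio file name
            string sID = getMD5Hash(strInput);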

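            string wavPath = System.IO.Path.Combine(workpath, "wav", sID + ".wav");
            string mp3Path = System.IO.Path.Combine(workpath, "mp3", sID + ".mp3");

            // Skip synthesis if the MP3 already exists; retry up to 3 times on errors instead of looping forever
            for (int attempt = 0; attempt < 3 && !System.IO.File.Exists(mp3Path); attempt++)
            {
                try
                {
                    // Create the SpeechSynthesizer; "using" releases it when done
                    using (SpeechSynthesizer synth = new SpeechSynthesizer())
                    {
                        // Select the voice; this is the (rather long) name of the Traditional Chinese female voice
                        synth.SelectVoice("Microsoft Server Speech Text to Speech Voice (zh-TW, HanHan)");
                        // Set the volume and speaking rate
                        synth.Volume = 100;
                        synth.Rate = -3;

                        // Send the output to a WAV file
                        synth.SetOutputToWaveFile(wavPath, new SpeechAudioFormatInfo(32000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
                        // Synthesize the speech
                        synth.Speak(strInput);
                        // Close the WAV file before handing it to LAME
                        synth.SetOutputToNull();
                    }

                    // Convert the WAV to MP3 with the LAME MP3 Encoder; paths are quoted in case they contain spaces
                    using (Process p = Process.Start(System.IO.Path.Combine(workpath, "lame.exe"), "-f \"" + wavPath + "\" \"" + mp3Path + "\""))
                    {
                        p.WaitForExit();
                    }
                }
                catch (Exception ex)
                {
                    // On error, report it on stderr and try again
                    Console.Error.WriteLine(ex.Message);
                }
            }

            // If every attempt failed, print nothing to stdout and exit with an error code
            if (!System.IO.File.Exists(mp3Path))
            {
                Environment.Exit(1);
            }

            // Print the file name (without extension) and exit successfully
            Console.Write(sID);
            Environment.Exit(0);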

        }

        // MD5 hash helper; lazily borrowed from System.Web :P (requires a reference to System.Web)
        private static string getMD5Hash(string strToHash)
        {
            return FormsAuthentication.HashPasswordForStoringInConfigFile(strToHash, "MD5").ToLower();
        }
    }
}
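
`FormsAuthentication.HashPasswordForStoringInConfigFile` pulls in System.Web and has been marked obsolete since .NET 4.5. If that dependency is unwanted, `getMD5Hash` could be replaced with something like the sketch below, which uses `System.Security.Cryptography` directly. The names `HashHelper` and `Md5Hex` are hypothetical, and the sketch assumes the text should be hashed as UTF-8 bytes; check that the resulting file names match existing ones before switching.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical replacement for getMD5Hash, for illustration only.
static class HashHelper
{
    // Returns the MD5 hash of the text (encoded as UTF-8) as lowercase hex
    public static string Md5Hex(string text)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(text));
            StringBuilder sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
            {
                // "x2" formats each byte as two lowercase hex digits
                sb.Append(b.ToString("x2"));
            }
            return sb.ToString();
        }
    }
}
```

MD5 is used here only to derive a stable file name from the text, not for anything security-related.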

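3.
Once that's done, build the project to produce the .exe file.
Copy it to the server, together with lame.exe and the wav and mp3 subfolders that the code expects next to the .exe. Note that the server also needs the Speech SDK 11 files listed above installed.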

4.
Now prepare the web .ashx part, which calls the .exe above.
Create a tts.ashx in the web site with the following code:

<%@ WebHandler Language="C#" Class="tts" %>

using System;
using System.Web;
using System.Net;
using System.IO;
using System.Diagnostics;
using System.Web.Security;

public class tts : IHttpHandler
{

    public void ProcessRequest(HttpContext context)
    {
        // The text to be spoken; QueryString values are already URL-decoded, so no extra UrlDecode is needed
        string txt = context.Request.QueryString["txt"];
        if (string.IsNullOrEmpty(txt))
        {
            context.Response.StatusCode = 400;
            return;
        }

        // Remove double quotes and trailing backslashes so the text cannot break out of the quoted argument
        string safeTxt = txt.Replace("\"", "").TrimEnd('\\');

        string output;
        // Run the external program: the Synth_API.exe built above
        using (Process p = new Process())
        {
            // Placeholder path; point it at wherever Synth_API.exe was deployed
            p.StartInfo.FileName = "D:\\SomeFolder\\Synth_API.exe";
            // This becomes args[0]. The text may contain spaces, e.g. "This is an apple", so wrap it in double quotes.
            p.StartInfo.Arguments = "\"" + safeTxt + "\"";
            p.StartInfo.UseShellExecute = false;
            p.StartInfo.RedirectStandardOutput = true;
            p.Start();

            // Read the MD5 file name printed by Synth_API.exe
            output = p.StandardOutput.ReadToEnd().Trim();
            p.WaitForExit();
        }

        // Build the file name and path (same placeholder folder as above)
        string fileName = output + ".mp3";
        string filePath = "D:\\SomeFolder\\mp3\\" + fileName;
        if (output.Length == 0 || !File.Exists(filePath))
        {
            context.Response.StatusCode = 500;
            return;
        }

        // Read the whole MP3 into memory
        byte[] bytes = File.ReadAllBytes(filePath);

        // Set the MIME type to MP3 audio and tell the browser to treat the response as a download
        context.Response.ContentType = "audio/mpeg";
        context.Response.AddHeader("Content-Disposition", "attachment; filename=" + HttpUtility.UrlEncode(fileName, System.Text.Encoding.UTF8));
        context.Response.BinaryWrite(bytes);
        context.Response.Flush();
        context.Response.End();


    }

    public bool IsReusable
    {
        get
        {
            return false;
        }
    }

}

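That's it. The handler can now be called as "tts.ashx?txt=text to speak".
If Process cannot launch the external program, it may be a permissions issue; check the Identity setting of the IIS application pool.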
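
Because the text travels in the query string, a caller should percent-encode it first. This is especially important for Chinese text and for characters such as spaces, "&" and "+". A minimal sketch of building the request URL from C# follows; the class name `TtsUrl` and the example host are assumptions for illustration, not part of the original project.

```csharp
using System;

// Hypothetical client-side helper, for illustration only.
static class TtsUrl
{
    // Builds "handlerUrl?txt=..." with the text percent-encoded as UTF-8
    public static string Build(string handlerUrl, string text)
    {
        return handlerUrl + "?txt=" + Uri.EscapeDataString(text);
    }
}
```

For example, `TtsUrl.Build("http://example.com/tts.ashx", "你好 world")` returns `http://example.com/tts.ashx?txt=%E4%BD%A0%E5%A5%BD%20world`.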

I'll skip the Flash demo 😛
It just sends the string to tts.ashx and plays back the returned MP3.

Partner Studio

patw's Technical Notes
