
Experimental Analysis and Modeling of Speech Prosody (語音韻律的實驗分析與建模)

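Author: Gu Wentao (顧文濤)

Publisher: World Publishing Corporation, Beijing (世界圖書北京公司)

Publication date: January 1, 2013

ISBN: 9787510056529

Language: Traditional Chinese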

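This book collects a selection of the author's papers on speech prosody. It systematically examines prosodic features at several levels, including duration, lexical tone, tonal coarticulation, sentence intonation, prosodic structure, focal accent, emotional expression, and speaker style, with particular emphasis on quantitative modeling. The studies focus mainly on Mandarin and Cantonese, cover a number of other Chinese dialects, and also include cross-linguistic comparison and language-contact research.

Gu Wentao is a native of Yangzhou, Jiangsu. He holds a doctorate in engineering in Communication and Information Systems from Shanghai Jiao Tong University and was a postdoctoral researcher at the University of Tokyo, Japan. He is currently a Distinguished Professor and doctoral supervisor in the Department of Linguistic Science and Technology (語言科技系), School of Chinese Language and Literature, Nanjing Normal University. He has been a visiting scholar at Bell Labs in the United States, a JSPS foreign research fellow at the University of Tokyo, and an associate researcher at the Chinese University of Hong Kong. His main research areas are experimental phonetics and speech information processing, in particular the analysis and modeling of speech prosody. He has published more than 40 papers in leading international journals, including Phonetica, IEEE Transactions on Audio, Speech, and Language Processing, Speech Communication, and IEICE Transactions on Information and Systems, and at major international conferences, including INTERSPEECH, ICPhS, ICASSP, ISCSLP, and Speech Prosody. He currently leads four funded projects, one of each of the following: a National Social Science Fund of China project, a sub-project of a National Social Science Fund major bidding project, a Jiangsu Provincial Social Science Fund project, and a major project of a Key Research Base of Philosophy and Social Sciences in Jiangsu universities.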

PART I SPEAKER ADAPTATION FOR DURATION MODEL IN MANDARIN TEXT-TO-SPEECH SYNTHESIS
1 Introduction
  1.1 Introduction to Duration Modeling in TTS Systems
    1.1.1 Text-to-Speech Synthesis and Segmental Duration
    1.1.2 Duration Model
  1.2 Speaker Adaptation for Duration Model--Goal and Basic Assumption
  1.3 The Source Model for Mandarin Duration
    1.3.1 Phone Categorization
    1.3.2 Multiplicative Model
    1.3.3 Duration Factors
2 Model-Based Optimal Text Selection
  2.1 Introduction
  2.2 Coverage and Statistical Model
  2.3 Model-Based Greedy Text Selection
    2.3.1 Analysis-of-Variance Model
    2.3.2 Design Matrix and Parameter Estimability
    2.3.3 Matroid Cover Problem
    2.3.4 Model-Based Greedy Algorithm
  2.4 Multi-Model Based Greedy Algorithm
    2.4.1 Modified Algorithm for Multi-Model Cases
    2.4.2 Experimental Result
    2.4.3 Analysis of Computational Complexity
  2.5 Further Generalization of the Algorithm
  2.6 Experimental Result and Discussion
  2.7 Conclusion
3 Speech Data
  3.1 Speech Recording
  3.2 Segmentation and Labeling
  3.3 Data Analysis
4 Analysis of Multi-Speaker Mandarin Duration Models
  4.1 Statistical Analysis
  4.2 Multiplicative Model Fitting
  4.3 Effects of Factors in Duration Models
    4.3.1 Vowel
    4.3.2 Plosive Burst and Aspiration
    4.3.3 Plosive Closure
    4.3.4 Nasal Coda
    4.3.5 Fricative
    4.3.6 Sonorant Consonant
    4.3.7 Common Effects across Phone Categories
  4.4 Compensatory Effects
    4.4.1 Burst/Aspiration and Closure of Plosives
    4.4.2 Vowel and Nasal Coda
    4.4.3 Obstruent and Vowel
    4.4.4 Obstruent and Glide
    4.4.5 Syllabic Compensatory Effects
  4.5 Syllable Duration
5 Speaker Adaptation for Duration Modeling
  5.1 An Efficient Speaker Adaptation Model
    5.1.1 Target Model Assumption
    5.1.2 Validity of Scalable Hypothesis
    5.1.3 Theoretic Analysis of Model Estimation
    5.1.4 Model Fitting by Linear Regression
    5.1.5 Sentence Effect on Model Estimation
    5.1.6 Analysis of Model Robustness
PART II Quantitative Analysis and Modeling of Tonal and Intonational Variations on Various Layers
References
