<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0094)http://3c.nii.org.tw/3c/silicon/embedded/MPEG/Synthetic%20and%20SNHC%20Audio%20in%20MPEG-4.htm -->
<!-- saved from url=(0099)http://leonardo.telecomitalialab.com/icjfiles/mpeg-4_si/10-SNHC_audio_paper/10-SNHC_audio_paper.htm --><HTML><HEAD><TITLE>Synthetic and SNHC Audio in MPEG-4</TITLE>
<META http-equiv=Content-Type content="text/html; charset=windows-1252">
<META content="MSHTML 6.00.2800.1106" name=GENERATOR>
<META content=D:\office97\Office\html.dot name=Template></HEAD>
<BODY vLink=#800080 link=#0000ff 
background="&#65288;8&#65289;Synthetic and SNHC Audio in MPEG-4.files/yellowline.jpg"><B><FONT 
face=Arial size=4>
<P align=center>Synthetic and SNHC Audio in MPEG-4</P></FONT><FONT size=2>
<P align=center>Eric D. Scheirer </P></FONT></B><FONT size=2>
<P align=center>Machine Listening Group, MIT Media Laboratory<BR>E15-401D, 
Cambridge MA 02143-4307 USA<BR>Tel: +1 617 253 0112 Fax: +1 617 258 
6264<BR>eds@media.mit.edu<BR></P>
<P align=center><B>Youngjik Lee </B>and<B> Jae-Woo Yang</B></P>
<P align=center>Switching and Transmission Technology Laboratories, ETRI</P>
<P align=center>&nbsp;</P></FONT><FONT face=Arial>
<P>Abstract</P></FONT>
<DIR>
<DIR><FONT face=Arial></FONT><FONT size=2>
<P>In addition to its sophisticated audio-compression capabilities, MPEG-4 
contains extensive functions supporting synthetic sound and the 
synthetic/natural hybrid coding of sound. We present an overview of the 
Structured Audio format, which allows efficient transmission and client-side 
synthesis of music and sound effects. We also provide an overview of the 
Text-to-Speech Interface, which standardizes a single format for communication 
with speech synthesizers. Finally, we present an overview of the AudioBIFS 
portion of the Binary Format for Scene Description, which allows the description 
of hybrid soundtracks, 3-D audio environments, and interactive audio 
programming. The tools provided for advanced audio functionality in MPEG-4 are a 
new and important addition to the world of audio 
standards.</P></FONT></DIR></DIR>
<P align=center>
<HR>

<P></P><B><I><FONT face=Arial>
<P><A name=_Toc434198797>Introduction</A></P></FONT></I></B>
<DIR><FONT face=Arial><I></I></FONT>
<DIR><FONT face=Arial><I><B></B></I></FONT><FONT size=2>
<P>This article describes the parts of MPEG-4 that govern the compression, 
representation, and transmission of synthetic sound and the combination of 
synthetic and natural sound into hybrid soundtracks. Through these tools, MPEG-4 
provides advanced capabilities for ultra-low-bitrate sound transmission, 
interactive sound scenes, and flexible, repurposable delivery of sound 
content.</P>
<P>We will discuss three MPEG-4 audio tools. The first, MPEG-4 Structured Audio, 
standardizes precise, efficient delivery of synthetic music and sound effects. 
The second, MPEG-4 Text-to-Speech Interface, standardizes a representation 
protocol for synthesized speech, an interface to text-to-speech synthesizers, 
and the automatic synchronization of synthetic speech and "talking head" 
animated face graphics [24]. The third, MPEG-4 AudioBIFS--part of the main BIFS 
framework--standardizes terminal-side mixing and post-production of audio 
soundtracks [22]. AudioBIFS enables interactive soundtracks and 3-D sound 
presentation for virtual-reality applications. In MPEG-4, the capability to mix 
and synchronize real sound with synthetic is termed <I>Synthetic/Natural Hybrid 
Coding</I> of Audio, or SNHC Audio.</P>
<P>The organization of the present paper is as follows. First, we provide a 
general overview of the objectives for synthetic and SNHC audio in MPEG-4. This 
section also introduces concepts from speech and music synthesis to readers 
whose primary expertise may not be in the field of audio. Next, a detailed 
description of the synthetic-audio codecs in MPEG-4 is provided. Finally, we 
describe AudioBIFS and its use in the creation of SNHC audio soundtracks. 
</P></FONT></DIR></DIR>
<P align=center><A name=_Toc434198798></A>
<HR>

<P></P><B><I><FONT face=Arial>
<P>Synthetic Audio in MPEG-4: Concepts and Requirements</P></FONT></I></B>
<DIR><FONT face=Arial><I></I></FONT>
<DIR><FONT face=Arial><I><B></B></I></FONT><FONT size=2>
<P>In this section, we introduce speech synthesis and music synthesis. Then we 
discuss the inclusion of these technologies in MPEG-4, focusing on the 
capabilities provided by synthetic audio and the types of applications that are 
better addressed with synthetic audio coding than with natural audio 
coding.</P></FONT></DIR></DIR><FONT face=Arial>
<P>Relationship between natural and synthetic coding</P></FONT>
<DIR>
<DIR><FONT face=Arial></FONT><FONT size=2>
<P>Modern standards for natural audio coding [1, 2] use perceptual models to 
compress natural sound. In coding synthetic sound, perceptual models are not 
used; rather, very specific parametric models are used to transmit sound 
<I>descriptions</I>. The descriptions are received at the decoding terminal and 
converted into sound through real-time sound <I>synthesis</I>. The parametric 
model for the Text-to-Speech Interface is fixed in the standard; in the 
Structured Audio toolset, the model itself is transmitted as part of the 
bitstream and interpreted by a reconfigurable decoder.</P>
<P>Natural and synthetic audio are not unrelated methods for transmitting sound. 
Especially as sound models in perceptual coding grow more sophisticated, the 
boundary between "decompression" and "synthesis" becomes somewhat blurred. 
Vercoe, Gardner, and Scheirer [28] have discussed the relationships among 
parametric models of sound, digital sound creation and transmission, perceptual 
coding, parametric compression, and various techniques for algorithmic 
synthesis. </P></FONT></DIR></DIR><FONT face=Arial>
<P><A name=_Toc434198805>Concepts in speech synthesis</A></P></FONT>
<DIR>
<DIR><FONT face=Arial></FONT><FONT size=2>
<P>Text-to-speech (TTS) systems generate speech from input text. This 
technology translates textual information into speech so that it can be 
delivered over speech channels such as telephone lines. 
Today, TTS systems are used for many applications, including automatic 
voice-response systems (the "telephone menu" systems that have become popular 
recently), e-mail reading, and information services for the visually handicapped 
[9, 10].</P>
<P>TTS systems typically consist of multiple processing modules as shown in 
Figure 1. Such a system accepts text as input and generates a corresponding 
<I>phoneme</I> sequence. Phonemes are the smallest units of human language; each 
phoneme corresponds to one sound used in speech. A surprisingly small set of 
phonemes, about 120, is sufficient to describe all human languages. </P></FONT>
<P align=center><IMG height=263 
src="&#65288;8&#65289;Synthetic and SNHC Audio in MPEG-4.files/image30.gif" 
width=503></P><B><FONT size=2>
<P align=center>Figure 1: Block diagram of a text-to-speech system, showing the 
interaction between text-to-phoneme conversion, text understanding, and prosody 
generation and application</P></FONT></B><FONT size=2>
<P>The phoneme sequence is used in turn to generate a basic speech sequence 
without <I>prosody</I>, that is, without pitch, duration, and amplitude 
variations. In parallel, a text-understanding module analyzes the input for 
phrase structure and inflections. Using the result of this processing, a prosody 
generation module creates the proper prosody for the text. Finally, a prosody 
control module changes the prosody parameters of the basic speech sequence 
according to the results of the text-understanding module, yielding synthesized 
speech.</P>
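As a concrete (and deliberately toy) illustration of this pipeline, the sketch below wires together placeholder stages for text-to-phoneme conversion, prosody generation, and prosody control. Every function name and the one-pseudo-phoneme-per-letter rule are invented for illustration; no real synthesis is performed:

```python
# Toy sketch of the TTS pipeline in Figure 1. All names and rules here are
# hypothetical illustrations, not part of any MPEG-4 API.

def text_to_phonemes(text):
    # Placeholder grapheme-to-phoneme step: one pseudo-phoneme per letter.
    return [ch.lower() for ch in text if ch.isalpha()]

def generate_prosody(text):
    # Placeholder "text understanding": stress the first letter of each word.
    prosody = []
    for word in text.split():
        for i, ch in enumerate(word):
            if ch.isalpha():
                prosody.append({"pitch": 1.2 if i == 0 else 1.0,
                                "duration": 1.0})
    return prosody

def apply_prosody(phonemes, prosody):
    # Prosody control: pair each basic speech unit with its parameters.
    return list(zip(phonemes, prosody))

def synthesize(text):
    phonemes = text_to_phonemes(text)        # text-to-phoneme conversion
    prosody = generate_prosody(text)         # text understanding + prosody
    return apply_prosody(phonemes, prosody)  # -> "synthetic speech" units

speech = synthesize("Hello world")
```

A real system would replace each placeholder with a dictionary- or rule-based converter, a parser, and a signal-processing back end, but the data flow between the modules is the same.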
<P>One of the first successful TTS systems was the DecTalk English speech 
synthesizer developed in 1983 [11]. This system produces very intelligible 
speech and supports eight different speaking voices. However, developing speech 
synthesizers of this sort is a difficult process, since it is necessary to 
extract all the acoustic parameters for synthesis. It is a painstaking process 
to analyze enough data to accumulate the parameters that are used for all kinds 
of speech.</P>
<P>In 1992, CNET in France developed the pitch-synchronous overlap-and-add 
(PSOLA) method to control the pitch and phoneme duration of synthesized speech 
[25]. Because this technique makes prosody easy to manipulate, speech 
synthesized with PSOLA sounds more natural; recorded human speech can also 
serve as a prosodic guide in an analysis-synthesis process that modifies pitch 
and duration. However, if the pitch is changed too much, the resulting speech 
is easily recognized as artificial.</P>
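The core overlap-and-add operation behind PSOLA can be sketched in a few lines. This is an illustrative simplification (uniformly spaced pitch marks, no pitch-mark detection, duration held fixed), not CNET's implementation:

```python
# Illustrative PSOLA-style overlap-and-add: windowed grains are cut around
# evenly spaced pitch marks and re-spaced to change the perceived pitch.
import math

def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def psola(signal, period, factor):
    """Re-space grains of length 2*period; spacing scaled by `factor`
    (< 1 raises pitch, > 1 lowers it). Duration is preserved."""
    win = hann(2 * period)
    # Analysis: one windowed grain per input pitch mark.
    centers = list(range(period, len(signal) - period, period))
    grains = [[signal[c - period + i] * win[i] for i in range(2 * period)]
              for c in centers]
    # Synthesis: overlap-add the grains at the new spacing.
    new_period = max(1, round(period * factor))
    out = [0.0] * len(signal)
    c = period
    while c + period <= len(out):
        # Pick the analysis grain whose original position is closest to c.
        g = min(len(grains) - 1, max(0, round(c / period) - 1))
        for i, s in enumerate(grains[g]):
            out[c - period + i] += s
        c += new_period
    return out

# Example: a pulse train with an 80-sample period; halving the grain
# spacing roughly doubles the pitch while keeping the duration.
sig = [1.0 if i % 80 == 0 else 0.0 for i in range(800)]
shifted = psola(sig, period=80, factor=0.5)
```

Real PSOLA systems first locate true pitch marks in the recorded speech; re-spacing the grains around those marks is what makes the prosody modification "pitch-synchronous".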
<P>In 1996, ATR in Japan developed the CHATR speech synthesizer [10]. This 
method relies on short samples of human speech without modifying any 
characteristics; it locates and sequences phonemes, words, or phrases from a 
database. A large database of human speech is necessary to develop a TTS system 
using this method. Automatic tools may be used to label each phoneme of the 
human speech to reduce the development time; typically, hidden Markov models 
(HMMs) are used to align the best phoneme candidates to the target speech. The 
synthesized speech is very intelligible and natural; however, this method of TTS 
requires large amounts of memory and processing power. </P>
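A toy sketch of this unit-selection idea follows. The database, its (phoneme, pitch) unit format, and the greedy nearest-pitch rule are all invented for illustration; CHATR itself stores waveform units, uses richer cost functions, and labels its database with HMM alignment:

```python
# Toy concatenative (unit-selection) synthesis: units are looked up in a
# speech database and sequenced without modification. The database and the
# selection rule here are hypothetical illustrations.

# Each unit: (phoneme label, pitch in Hz); a real system stores waveforms.
database = [
    ("h", 100), ("e", 110), ("l", 105), ("o", 95),
    ("e", 140), ("l", 90), ("o", 120),
]

def select_units(target_phonemes, target_pitch=100):
    """Greedy selection: for each target phoneme, pick the database unit
    with that label whose pitch is closest to the target pitch."""
    selected = []
    for ph in target_phonemes:
        candidates = [u for u in database if u[0] == ph]
        if not candidates:
            raise KeyError(f"no unit in database for phoneme {ph!r}")
        selected.append(min(candidates,
                            key=lambda u: abs(u[1] - target_pitch)))
    return selected

units = select_units(["h", "e", "l", "l", "o"])
```

The memory and processing cost noted above comes from the database: the more recorded units it holds, the more likely a natural-sounding match exists for each target, but the larger the search at synthesis time.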
<P>The applications of TTS are expanding in telecommunications, personal 
computing, and the Internet. Current research in TTS includes voice conversion 
(synthesizing the sound of a particular speaker's voice), multi-language TTS, 
and enhancing the naturalness of speech through more sophisticated voice models 
and prosody generators.</P></FONT></DIR></DIR><FONT face=Arial>
<P><A name=_Toc434198806>Applications for speech synthesis in 
MPEG-4</A></P></FONT>
<DIR>
<DIR><FONT face=Arial></FONT><FONT size=2>
<P>The synthetic speech system in MPEG-4 was designed to support interactive 
applications using text as the basic content type. Some of these applications 
include on-demand storytelling, motion picture dubbing, and "talking head" 
synthetic videoconferencing.</P>
<P>In the story-telling on demand (STOD) application, the user can select a 
story from a huge database stored on fixed media. The STOD system reads the 
story aloud, using the MPEG-4 Text-to-Speech Interface (henceforth, TTSI) with 
the MPEG-4 facial animation tool or with appropriately selected images. The user 
can pause and resume the reading at any moment through the local machine's 
user interface (for example, with the mouse or keyboard), and can also select 
the gender, age, and speech rate of the electronic story-teller.</P>
<P>In a motion-picture-dubbing application, synchronization between the MPEG-4 
TTSI decoder and the encoded moving picture is the essential feature. The 
architecture of the MPEG-4 TTS decoder provides several levels of 
synchronization granularity. By aligning the composition time of each sentence, 
coarse granularity of synchronization can be easily achieved. To get more 
finely-tuned synchronization, information about the speaker lip shape can be 
used. The finest granularity of synchronization can be achieved by using 
detailed prosody transmission and video-related information such as sentence 
duration and offset time in the sentence. With this synchronization capability, 
the MPEG-4 TTSI can be used for motion picture dubbing by following the lip 
shape and the corresponding time in the sentence.</P>
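Coarse-granularity synchronization amounts to scheduling each synthesized sentence at the composition time of its matching video segment. The sketch below uses invented field names purely as an illustration; in MPEG-4 itself this timing is carried by the systems layer:

```python
# Hypothetical sketch of coarse-granularity dubbing synchronization:
# each sentence starts at its video segment's composition time.

sentences = ["First line.", "Second line."]
video_segments = [{"composition_time": 0.0}, {"composition_time": 2.5}]

def schedule(sentences, video_segments):
    """Pair each sentence with the composition time at which its
    synthesized speech should begin."""
    return [(seg["composition_time"], text)
            for seg, text in zip(video_segments, sentences)]

timeline = schedule(sentences, video_segments)
```

Finer granularity, as described above, would add per-word or per-phoneme offsets within each sentence on top of this sentence-level alignment.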
<P>To enable synthetic video-teleconferencing, the TTSI decoder can be used to 
drive the facial-animation decoder in synchronization. <I>Bookmarks</I> in the 
TTSI bitstream control an animated face by using facial animation parameters 
(FAP); in addition, the animation of the mouth can be derived directly from the 
speech phonemes. Other applications of the MPEG-4 TTSI include speech synthesis 
for avatars in virtual reality (VR) applications, voice newspapers, and 
low-bitrate Internet voice tools.<A name=_Toc434198799></A> 
</P></FONT></DIR></DIR><FONT face=Arial>
<P><A name=_Toc434198804></A>Concepts in music synthesis</P></FONT>
<DIR>
<DIR><FONT face=Arial></FONT><FONT size=2>
<P>The field of music synthesis is too large and varied to give a complete 
overview here. An artistic history by Chadabe [4] and a technical overview by 
Roads [16] are sources that provide more background on the concepts developed 
over the last 35 years.</P>
<P>The techniques used in MPEG-4 for synthetic music transmission were 
originally developed by Mathews [13, 14], who demonstrated the first digital 
synthesis programs. The so-called <I>unit-generator model</I> of synthesis he 
developed has proven to be a robust and practical tool for musicians interested 
in the precise control of sound. This paradigm has been refined by many others, 
particularly Vercoe [26], whose language "Csound" is very popular today with 
composers. </P>
<P>In the unit-generator model (also called the <I>Music-N</I> model after 
Mathews
