frame size which is defined as part of the encoding. This does not work when carrying frames of different sizes unless the frame sizes are relatively prime. If they are not, the frames MUST indicate their size.

For frame-based codecs, the channel order is defined for the whole block. That is, for two-channel audio, right and left samples SHOULD be coded independently, with the encoded frame for the left channel preceding that for the right channel.

All frame-oriented audio codecs SHOULD be able to encode and decode several consecutive frames within a single packet. Since the frame size for the frame-oriented codecs is given, there is no need to use a separate designation for the same encoding with a different number of frames per packet.

RTP packets SHALL contain a whole number of frames, with frames inserted according to age within a packet, so that the oldest frame (to be played first) occurs immediately after the RTP packet header. The RTP timestamp reflects the instant at which the first sample in the first frame was sampled, that is, the oldest information in the packet.

Schulzrinne & Casner        Standards Track                    [Page 11]

RFC 3551                    RTP A/V Profile                    July 2003

4.5 Audio Encodings

   name of                              sampling              default
   encoding  sample/frame  bits/sample      rate  ms/frame  ms/packet
   __________________________________________________________________
   DVI4      sample        4                var.                   20
   G722      sample        8              16,000                   20
   G723      frame         N/A             8,000        30         30
   G726-40   sample        5               8,000                   20
   G726-32   sample        4               8,000                   20
   G726-24   sample        3               8,000                   20
   G726-16   sample        2               8,000                   20
   G728      frame         N/A             8,000       2.5         20
   G729      frame         N/A             8,000        10         20
   G729D     frame         N/A             8,000        10         20
   G729E     frame         N/A             8,000        10         20
   GSM       frame         N/A             8,000        20         20
   GSM-EFR   frame         N/A             8,000        20         20
   L8        sample        8                var.                   20
   L16       sample        16               var.                   20
   LPC       frame         N/A             8,000        20         20
   MPA       frame         N/A              var.      var.
   PCMA      sample        8                var.                   20
   PCMU      sample        8                var.                   20
   QCELP     frame         N/A             8,000        20         20
   VDVI      sample        var.             var.                   20

   Table 1: Properties of Audio Encodings (N/A: not applicable; var.: variable)

The characteristics of the audio encodings described in this document are shown in Table 1; they are listed in order of their payload type in Table 4. While most audio codecs are only specified for a fixed sampling rate, some sample-based algorithms (indicated by an entry of "var." in the sampling rate column of Table 1) may be used with different sampling rates, resulting in different coded bit rates. When used with a sampling rate other than that for which a static payload type is defined, non-RTP means beyond the scope of this memo MUST be used to define a dynamic payload type and MUST indicate the selected RTP timestamp clock rate, which is usually the same as the sampling rate for audio.

4.5.1 DVI4

DVI4 uses an adaptive delta pulse code modulation (ADPCM) encoding scheme that was specified by the Interactive Multimedia Association (IMA) as the "IMA ADPCM wave type". However, the encoding defined here as DVI4 differs in three respects from the IMA specification:

o  The RTP DVI4 header contains the predicted value rather than the first sample value contained in the IMA ADPCM block header.

o  IMA ADPCM blocks contain an odd number of samples, since the first sample of a block is contained just in the header (uncompressed), followed by an even number of compressed samples. DVI4 has an even number of compressed samples only, using the `predict' word from the header to decode the first sample.

o  For DVI4, the 4-bit samples are packed with the first sample in the four most significant bits and the second sample in the four least significant bits. In the IMA ADPCM codec, the samples are packed in the opposite order.

Each packet contains a single DVI block. This profile only defines the 4-bit-per-sample version, while IMA also specified a 3-bit-per-sample encoding.
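The DVI4 nibble order from the last bullet above can be illustrated with a minimal C sketch. The function names dvi4_pack and dvi4_unpack are hypothetical, and the sketch assumes the 4-bit ADPCM codes have already been produced by an encoder; it covers only the octet packing, not the predictor or step-size adaptation. It rejects an odd number of codes, since DVI4 carries an even number of compressed samples only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack 4-bit DVI4 codes (values 0..15) into octets.
 * DVI4 puts the first sample of each pair in the four most significant
 * bits and the second in the four least significant bits, the opposite
 * of IMA ADPCM. Returns the number of octets written, or 0 if n_codes is
 * odd, because DVI4 cannot signal a half-filled last octet. */
size_t dvi4_pack(const uint8_t *codes, size_t n_codes, uint8_t *out)
{
    if (n_codes % 2 != 0)
        return 0;
    for (size_t i = 0; i < n_codes; i += 2) {
        uint8_t first = (uint8_t)(codes[i] & 0x0Fu);      /* earlier sample */
        uint8_t second = (uint8_t)(codes[i + 1] & 0x0Fu); /* later sample */
        out[i / 2] = (uint8_t)((first << 4) | second);
    }
    return n_codes / 2;
}

/* Hypothetical inverse of dvi4_pack: split each octet into two 4-bit
 * codes, high nibble first. Returns the number of codes written. */
size_t dvi4_unpack(const uint8_t *in, size_t n_octets, uint8_t *codes)
{
    for (size_t i = 0; i < n_octets; i++) {
        codes[2 * i] = (uint8_t)(in[i] >> 4);        /* first sample */
        codes[2 * i + 1] = (uint8_t)(in[i] & 0x0Fu); /* second sample */
    }
    return 2 * n_octets;
}
```

For example, packing the codes 0x1, 0x2, 0xA, 0xF this way yields the octets 0x12 and 0xAF, whereas the IMA ADPCM order would give 0x21 and 0xFA.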
The "header" word for each channel has the following structure:

   int16   predict;   /* predicted value of first sample
                         from the previous block (L16 format) */
   u_int8  index;     /* current index into stepsize table */
   u_int8  reserved;  /* set to zero by sender, ignored by receiver */

Each octet following the header contains two 4-bit samples; thus the number of samples per packet MUST be even, because there is no means to indicate a partially filled last octet.

Packing of samples for multiple channels is for further study.

The IMA ADPCM algorithm was described in the document IMA Recommended Practices for Enhancing Digital Audio Compatibility in Multimedia Systems (version 3.0). However, the Interactive Multimedia Association ceased operations in 1997. Resources for an archived copy of that document and a software implementation of the RTP DVI4 encoding are listed in Section 13.

4.5.2 G722

G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding within 64 kbit/s". The G.722 encoder produces a stream of octets, each of which SHALL be octet-aligned in an RTP packet. The first bit transmitted in the G.722 octet, which is the most significant bit of the higher sub-band sample, SHALL correspond to the most significant bit of the octet in the RTP packet.

Even though the actual sampling rate for G.722 audio is 16,000 Hz, the RTP clock rate for the G722 payload format is 8,000 Hz because that value was erroneously assigned in RFC 1890 and must remain unchanged for backward compatibility. The octet rate or sample-pair rate is 8,000 Hz.

4.5.3 G723

G723 is specified in ITU Recommendation G.723.1, "Dual-rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s". The G.723.1 5.3/6.3 kbit/s codec was defined by the ITU-T as a mandatory codec for ITU-T H.324 GSTN videophone terminal applications.
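As a rough arithmetic check on the two rates above, the minimal C sketch below multiplies each nominal rate by the 30 ms frame length listed for G723 in Table 1 and rounds up to whole octets, which gives 24 octets at 6.3 kbit/s and 20 octets at 5.3 kbit/s, the high-rate and low-rate frame sizes given later in this section. With the 8,000 Hz clock from Table 1, each 30 ms frame also advances the RTP timestamp by 240. The helper names frame_octets and frame_ticks are hypothetical, the rates are assumed to be exactly 6,300 and 5,300 bit/s, and the actual bit allocation inside a frame is defined by G.723.1 rather than derived this way.

```c
#include <stdint.h>

/* Hypothetical helper: octets needed for one frame of a codec with a
 * nominal bit rate of rate_bps and a frame length of frame_ms
 * milliseconds. Both the bit count and the octet count are rounded up. */
uint32_t frame_octets(uint32_t rate_bps, uint32_t frame_ms)
{
    uint32_t bits = (rate_bps * frame_ms + 999u) / 1000u; /* ceil(bits per frame) */
    return (bits + 7u) / 8u;                              /* ceil(octets per frame) */
}

/* Hypothetical helper: RTP timestamp ticks covered by one frame, for an
 * RTP clock of clock_hz and a frame length of frame_ms milliseconds. */
uint32_t frame_ticks(uint32_t clock_hz, uint32_t frame_ms)
{
    return clock_hz * frame_ms / 1000u;
}
```

Because the RTP timestamp refers to the oldest frame in a packet, a packet carrying three such frames would, assuming continuous transmission, be followed by a packet whose timestamp is 3 * 240 = 720 ticks later.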
The algorithm has a floating-point specification in Annex B to G.723.1, a silence compression algorithm in Annex A to G.723.1, and a scalable channel coding scheme for wireless applications in G.723.1 Annex C.

This Recommendation specifies a coded representation that can be used for compressing the speech signal component of multimedia services at a very low bit rate. Audio is encoded in 30 ms frames, with an additional delay of 7.5 ms due to look-ahead. A G.723.1 frame can be one of three sizes: 24 octets (6.3 kb/s frame), 20 octets (5.3 kb/s frame), or 4 octets. These 4-octet frames are called SID frames (Silence Insertion Descriptor) and are used to specify comfort noise parameters. There is no restriction on how 4-, 20-, and 24-octet frames are intermixed. The least significant two bits of the first octet in the frame determine the frame size and codec type:

     bits  content                      octets/frame
     00    high-rate speech (6.3 kb/s)            24
     01    low-rate speech (5.3 kb/s)             20
     10    SID frame                               4
     11    reserved

It is possible to switch between the two rates at any 30 ms frame boundary. Both rates (5.3 kb/s and 6.3 kb/s) are a mandatory part of the encoder and decoder. Receivers MUST accept both data rates and MUST accept SID frames unless restriction of these capabilities has been signaled. The MIME registration for G723 in RFC 3555 [7] specifies parameters that MAY be used with MIME or SDP to restrict to a single data rate or to restrict the use of SID frames. This coder was optimized to represent speech with near-toll quality at the above rates using a limited amount of complexity.

The packing of the encoded bit stream into octets and the transmission order of the octets are specified in Rec. G.723.1 and are the same as those produced by the G.723 C code reference implementation. For the 6.3 kb/s data rate, this packing is illustrated as follows, where the header (HDR) bits are always "0 0" as shown in Fig. 1 to indicate operation at 6.3 kb/s, and the Z bit is always set to zero. The diagrams show the bit packing in "network byte order", also known as big-endian order. The bits of each 32-bit word are numbered 0 to 31, with the most significant bit on the left and numbered 0. The octets (bytes) of each word are transmitted most significant octet first. The bits of each data field are numbered in the order of the bit stream representation of the encoding (least significant bit first). The vertical bars indicate the boundaries between field fragments.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    LPC    |HDR|      LPC      |      LPC      |    ACL0   |LPC|
   |           |   |               |               |           |   |
   |0 0 0 0 0 0|0 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
   |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ACL2   |ACL|A| GAIN0 |ACL|ACL|    GAIN0      |    GAIN1      |
   |         | 1 |C|       | 3 | 2 |               |               |
   |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
   |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | GAIN2 | GAIN1 |    GAIN2      |    GAIN3      | GRID  | GAIN3 |
   |       |       |               |               |       |       |