
Multimedia - Streaming Audio
MPEG-2 Advanced Audio Coding (AAC)
Description
Also known as MPEG-2 NBC (non-backward-compatible), AAC represents the current state
of the art in audio coding. It supports up to 48 audio channels, 15 low frequency
enhancement channels, and 15 embedded data streams, and has multi-language capability.
It also offers a better compression ratio than Layer-3. Formal MPEG listening tests have
demonstrated that it provides slightly better audio quality at 96 kb/s
than Layer-3 at 128 kb/s or Layer-2 at 192 kb/s. AAC reduces the data rate by a
factor of about 16 while maintaining CD quality.
The combination of high coding gain and great flexibility opens
up a wide field of applications. With sampling frequencies between 8 kHz and 96
kHz and any number of channels between 1 and 48, the method is well prepared for
future developments in the audio sector. Compared to well-known coding methods
such as MPEG-2 Layer-2, it achieves the same subjective quality at half the
bit rate.
The driving force behind the development of AAC was the quest for an efficient
coding method for surround signals, such as the 5-channel signals (left, right,
center, left-surround, right-surround) used in cinemas today. MPEG-2 has included
algorithms for these signals for quite a while. For technical and historical
reasons, however, they did not reach optimum efficiency. The aim was therefore a
considerable decrease in the necessary bit rate.
Features of MPEG-2 AAC
- High compression performance is achieved.
- Flexibility of encoding and decoding complexity, e.g., different spatial
resolution, temporal resolution, and quality enables very flexible trade-off
between quality, performance and cost.
- Object-based coding functionalities allow for interaction with audio-visual
objects and enable new interactive applications in a mobile environment.
- It uses the following compression tools:
  - Huffman coding
  - quantization and scaling
  - M/S matrixing
  - intensity stereo
  - coupling channel
  - backward adaptive prediction
  - temporal noise shaping (TNS)
  - inverse modified discrete cosine transform (IMDCT)
  - gain control and hybrid filter bank (inverse polyphase quadrature filter (IPQF) + IMDCT)
- The most important new tool is backward adaptive prediction, which
uses about 45% of the decoding time. There is also a Low Complexity profile:
  - no prediction
  - TNS limited to 12 coefficients, but still over an 18 kHz bandwidth
- And a Scalable Sampling Rate profile:
  - no prediction
  - no coupling channel
  - gain control
  - hybrid filter bank (IPQF + divided IMDCT)
  - TNS limited to 12 coefficients and to a 6 kHz bandwidth
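Of the tools listed above, M/S matrixing is the simplest to illustrate. The following is a minimal sketch, assuming plain lists of sample values; an actual AAC coder applies the matrixing to spectral coefficients and switches it on or off per band, which is not shown here. The function names `ms_encode` and `ms_decode` are hypothetical and used only for illustration.

```python
def ms_encode(left, right):
    """Convert left/right values to mid (sum) and side (difference) values."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert the matrixing: recover left/right from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Illustrative values only: the two channels are nearly identical,
# so the side signal is small and cheap to code.
left = [0.50, 0.25, -0.75, 1.00]
right = [0.50, 0.25, -0.50, 0.75]

mid, side = ms_encode(left, right)
rec_left, rec_right = ms_decode(mid, side)
print("side:", side)
print("round trip ok:", rec_left == left and rec_right == right)
```

When the two channels are strongly correlated, most of the energy ends up in the mid signal, which is why the matrixing can save bits for such material.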
Configuration of MPEG-2 AAC
- Scalable coding of speech and music for different transmission methods,
such as Internet and Digital Broadcasting.
- Multimedia streams control other multimedia: Control audio and streams
(synchronization and switching, etc.)
- Presentation control: Control the display, audio and other presentable
output
- User interface control: Interface with user
- Graphics composition and control: Control object placement,
transparency effect, and its user interaction
- Return channel management: Control return channel (method, data rate,
protocols etc.)
- Conditional access management: Entitlement management and control
- EPG display and control: Electronic program guide management and
display
- Profile management: Profile of the client for selective adaptation
- Resource management: Resources available at the client (e.g. storage,
digital interface, and other peripherals)
- Diversity of mobile devices (e.g. PDA, sub-notebooks, notebooks, or
portable workstations) in regard to available resources.
- Diversity of wireless networks (e.g. HIPERLAN, GSM, UMTS, or satellite)
in regard to network topology, protocols, bandwidth, reliability etc.
Applications for MPEG-2 AAC
- Broadcast
- Content based Storage and Retrieval
- Digital AM Broadcasting
- Digital Television Set-Top Box and DVD
- Infotainment
- Mobile Multimedia
- Real Time Communications
- Streaming Audio-Video on the Internet / Intranet
- Studio and Television Post-production
- Surveillance and Virtual Meeting
- Delivery of audio for wireless distribution - via 3G or Bluetooth.
Due to its high coding efficiency, AAC is a prime candidate for any digital
broadcasting system. AAC has been selected for use within the DRM system.
Digital Radio Mondiale (DRM) is a world consortium dedicated to forming a single
world standard for digital broadcasting in the AM radio bands below 30 MHz. Due to
its superior performance, AAC will also play a major role in the delivery of
high-quality music via the Internet.
MPEG Audio Layer-3 (MP3)
Description
Development of a very powerful algorithm began in 1987; it was later standardized
by ISO/IEC as ISO-MPEG Audio Layer-3 (ISO/IEC 11172-3 and ISO/IEC 13818-3).
By using MPEG audio coding, one can shrink down the original sound data from
a CD by a factor of 12, without losing sound quality. Factors of 24 and even more
still maintain a sound quality that is significantly better than what you get by
just reducing the sampling rate and the resolution of your samples. Basically, this
is realized by perceptual coding techniques addressing the perception of sound
waves by the human ear.
By exploiting stereo effects and by limiting the audio bandwidth, the coding
schemes may achieve an acceptable sound quality at even lower bitrates. MPEG
Layer-3 is the most powerful member of the MPEG audio coding family. For a
given sound quality level, it requires the lowest bitrate; for a given bitrate,
it achieves the highest sound quality.
Using MPEG audio, one may achieve a typical data reduction of
| 1:4 | by Layer 1 (corresponds with 384 kbps for a stereo signal) |
| 1:6...1:8 | by Layer 2 (corresponds with 256...192 kbps for a stereo signal) |
| 1:10...1:12 | by Layer 3 (corresponds with 128...112 kbps for a stereo signal) |
still maintaining the original CD sound quality. For the use of low bit-rate audio
coding schemes in broadcast applications at bitrates of 60 kbit/s per audio channel,
the ITU-R recommends MPEG Layer-3 (ITU-R doc. BS.1115).
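The relation between a reduction ratio and the resulting bitrate can be checked with a little arithmetic. The sketch below assumes a 16-bit stereo PCM reference sampled at 48 kHz; the text does not state which reference rate was used, so this is an assumption. With it, the Layer 1 and Layer 2 figures come out exactly, while the Layer 3 range is only approximate. The function names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=2):
    """Bitrate of uncompressed PCM audio in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compressed_bitrate_kbps(reduction_ratio, sample_rate_hz=48000):
    """Bitrate left after reducing the PCM rate by the given ratio.

    The 48 kHz default is an assumption, not something the text specifies.
    """
    return pcm_bitrate_kbps(sample_rate_hz) / reduction_ratio


print("PCM at 48 kHz:  ", pcm_bitrate_kbps(48000), "kbps")
print("PCM at 44.1 kHz:", pcm_bitrate_kbps(44100), "kbps")

for layer, ratio in [("Layer 1", 4), ("Layer 2", 6), ("Layer 2", 8), ("Layer 3", 12)]:
    print(f"{layer}: 1:{ratio} -> {compressed_bitrate_kbps(ratio):.0f} kbps")
```

With the CD sampling rate of 44.1 kHz the uncompressed rate is about 1411 kbps, so the same ratios give somewhat lower bitrates than those quoted.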
| sound quality | bandwidth | mode | bitrate | reduction ratio |
| telephone sound | 2.5 kHz | mono | 8 kbps | 96:1 |
| better than short-wave | 4.5 kHz | mono | 16 kbps | 48:1 |
| better than AM radio | 7.5 kHz | mono | 32 kbps | 24:1 |
| similar to FM radio | 11 kHz | stereo | 56...64 kbps | 26...24:1 |
| near-CD | 15 kHz | stereo | 96 kbps | 16:1 |
| CD | >15 kHz | stereo | 112...128 kbps | 14...12:1 |
Features of MP3:
Major enhancements over the Layer I and Layer II algorithms include:
- Alias reduction - Layer III specifies a method of processing the MDCT
values to remove some artifacts caused by the overlapping bands of the polyphase
filter bank inherited from Layers I and II.
- Nonuniform quantization - The Layer III quantizer raises its input to
the 3/4 power before quantization to provide a more consistent signal-to-noise
ratio over the range of quantizer values. The requantizer in the MPEG/audio
decoder re-linearizes the values by raising its output to the 4/3 power.
- Entropy coding of data values - Layer III uses Huffman codes to encode
the quantized samples for better data compression.
- Use of a "bit reservoir" - The design of the Layer III bit stream
better fits the variable-length nature of the compressed data. As with Layer II,
Layer III processes the audio data in frames of 1,152 samples. Unlike Layer II,
the coded data representing these samples does not necessarily fit into a
fixed-length frame in the coded bit stream. The encoder can donate bits to or
borrow bits from the reservoir when appropriate.
- Noise allocation instead of bit allocation - The bit allocation
process used by Layers I and II only approximates the amount of noise caused by
quantization to a given number of bits. The Layer III encoder uses a noise
allocation iteration loop. In this loop, the quantizers are varied in an
orderly way, and the resulting quantization noise is actually calculated and
specifically allocated to each subband.
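The nonuniform quantization described in the list above can be sketched as follows. This is a simplified illustration, assuming a single step size and plain rounding; the actual Layer III quantizer derives its step size from global gain and scale factors and uses a slightly different rounding rule, neither of which is reproduced here. The function names are hypothetical.

```python
import math


def quantize(value, step):
    """Compress the magnitude with a 3/4 power law, then round to an integer."""
    magnitude = (abs(value) / step) ** 0.75
    return int(math.copysign(round(magnitude), value))


def dequantize(index, step):
    """Re-linearize by raising the magnitude to the 4/3 power."""
    return math.copysign(abs(index) ** (4.0 / 3.0) * step, index)


step = 1.0  # illustrative step size only
for x in [0.5, 2.0, 16.0, -81.0, 300.0]:
    q = quantize(x, step)
    print(f"x={x:8.2f}  index={q:4d}  reconstructed={dequantize(q, step):8.2f}")
```

Because of the power law, the spacing between adjacent reconstruction levels grows with the magnitude, so large values are quantized more coarsely than small ones.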
Configuration of MP3:
- Designed as an adaptive representation scheme that also accommodates very low
bitrate applications, it is well suited to mobile multimedia applications.
- A complement of services over a fixed unidirectional communication channel.
- The system can consist of a single (logical) origination point, a
real-time, unidirectional communication channel and a large number of end-user
receiver/decoder terminals. It is a one-to-many, or possibly a few-to-many
system.
- The ability to trade off between quality, performance and cost.
Applications of MP3:
- Digital Audio Broadcasting (EUREKA DAB, WorldSpace, ARIB, DRM)
- ISDN transmission for broadcast contribution and distribution purposes
- Archival storage for broadcasting
- Accompanying audio for digital TV (DVB, Video CD, ARIB)
- Internet streaming (Microsoft Netshow, Apple Quicktime)
- Portable audio (mpman, mplayer3, Rio, Lyra, YEPP and others)
- Storage and exchange of music files on computers
Performance metrics: MP3 Decoder
Analog Devices' MPEG-1 Layer 3 (MP3) multi-channel audio decoder reference design
implements digital audio decoding in real time on the ADSST-2185M 16-bit
fixed-point Digital Signal Processor (DSP). The chipset decodes primary and
extended streams for Layer 3 of the MPEG-1 standard. The MPEG-1 audio decoder fully
complies with the ISO/IEC 11172-3 audio standard. The decoder runs completely within
the internal RAM of the DSP and requires 36 MIPS.
Processor: ADSP-2185M (75 MHz)
Processor: Proprietary SIMD DSP core at 150 MHz
MPEG-4 Audio
MPEG-4 is an ISO/IEC standard (ISO/IEC 14496) developed by MPEG (Moving Picture
Experts Group), which builds on the proven success of three fields:
- Digital television
- Interactive graphics applications (synthetic content)
- Interactive multimedia (World Wide Web, distribution of and access to content)
MPEG-4 provides the standardized technological elements enabling the
integration of the production, distribution and content access paradigms of
the three fields.
AAC (Advanced Audio Coding) is possibly the strongest contender
to displace MP3. AAC takes advantage of the best features of MPEG-2, Dolby
Digital, and AT&T's Perceptual Audio Coder (PAC). Impartial labs have tested
AAC and consider it to be of high quality. It requires a lower bandwidth
than MP3 (64 kbps / channel), but a typical implementation requires 30 to
40% more MIPS than MP3.
Description
MPEG-4 Audio facilitates a wide variety of applications which could range from
intelligible speech to high quality multichannel audio, and from natural sounds to
synthesized sounds. In particular, it supports the highly efficient representation
of audio objects consisting of:
- Speech signals: Speech coding can be done at bitrates from 2 kbit/s
up to 24 kbit/s using the speech coding tools.
- Synthesized Speech: Scalable TTS coders operate at bitrates from 200 bit/s
to 1.2 kbit/s. They take a text, or a text with prosodic parameters (pitch
contour, phoneme duration, and so on), as input and generate intelligible
synthetic speech.
- General audio signals: Support for coding general audio ranging from
very low bitrates up to high quality is provided by transform coding techniques.
With this functionality, a wide range of bitrates and bandwidths is covered. It
starts at a bitrate of 6 kbit/s and a bandwidth below 4 kHz but also includes
broadcast quality audio from mono up to multichannel. Furthermore, AAC
(with some modifications) is the only high-quality audio coding scheme used
within the MPEG-4 general audio standard, the future "global multimedia
language". Due to its high coding efficiency, AAC is a prime candidate for
any digital broadcasting system.
- Synthesized Audio: Synthetic Audio support is provided by a Structured
Audio Decoder implementation that allows the application of score-based control
information to musical instruments described in a special language.
- Bounded-complexity Synthetic Audio: This is provided by a Structured
Audio Decoder implementation that allows the processing of a standardized
wavetable format.
Features of MPEG-4 Audio
- It supports high performance data compression.
- A trade-off between quality and performance can be made by scaling encoder
and decoder complexity, spatial resolution, temporal resolution, and quality.
- Content-based coding enables interactivity with objects.
- Additional functionality like speed control and pitch change for speech
signals
- Additional functionality like scalability in terms of bitrate, bandwidth,
error robustness, complexity, etc.
- Composition interactivity, Objects synchronization, and Improved coding
efficiency.
- Improved temporal random access, Content-based scalability, Auxiliary data
capability
- Compatibility with MPEG-2 standard
- Copy protection and User interaction
- Downloading of audio-visual objects and other information data.
- Multipoint operation, Robustness to information error and loss
- Coding of multiple concurrent data streams
Configuration of MPEG-4 Audio
MPEG-4 Audio provides several "profiles" to allow the optimal use of MPEG-4 in
different applications. At the same time the number of profiles is kept as low as
possible in order to maintain maximum interoperability. MPEG-4 offers the following
profiles:
- The Speech Audio Profile provides a parametric speech coder, a CELP
speech coder and a Text-To-Speech interface.
- The Synthesis Audio Profile provides the capability to generate sound
and speech at very low bitrates.
- The Scalable Audio Profile, a superset of the Speech Profile, is
suitable for scalable coding of speech and music and for different transmission
methods, such as Internet and Digital Broadcasting.
- The Main Audio Profile is a rich superset of the three previous
profiles (scalable, speech, synthesis) containing tools for both natural and
synthetic audio.
- The High Quality Audio Profile contains the CELP speech coder and the
Low Complexity AAC coder including Long Term Prediction. Scalable coding can be
performed by the AAC Scalable coder. Optionally, the error resilient bitstream
syntax may be used.
- The Low Delay Audio Profile contains the parametric and CELP speech
coders (optionally using the error resilient bitstream syntax), the Low Delay
AAC coder and the Text-to-Speech interface.
- The Natural Audio Profile contains all natural audio coding tools
available in MPEG-4, but not the synthetic ones.
- The Mobile Audio Internetworking Profile contains the low delay and
scalable AAC object types including TwinVQ and BSAC. This profile is intended
to extend communication applications using non-MPEG speech coding algorithms
with high quality audio coding capabilities.
Applications of MPEG-4 Audio
- Broadcast
- Content based Storage and Retrieval
- Digital AM Broadcasting
- Digital Television Set-Top Box and DVD
- Infotainment
- Mobile Multimedia
- Real Time Communications
- Streaming Audio-Video on the Internet / Intranet
- Studio and Television Post-production
- Surveillance and Virtual Meeting
- Delivery of audio for wireless distribution - via 3G or Bluetooth.
Performance metrics: MPEG-4 AAC decoder
Processor: ADSP-2189 at 75 MHz
| MIPS | 48 |
| PM | 30 KW |
| DM | 29 KW |
Processor: Proprietary SIMD DSP core
| MIPS | 38 |
| PM | 27 KW |
| DM | 28 KW |
Dolby AC-3
Description
AC-3 is a flexible audio data compression technology capable of encoding a
variety of audio channel formats into a single low-rate bitstream. Eight channel
configurations are supported, ranging from conventional mono or stereo to a
surround format with six discrete channels (left, center, right, left surround,
right surround and subwoofer). The AC-3 bitstream specification permits sample
rates of 48 kHz, 44.1 kHz, or 32 kHz, and supports data rates ranging from
32 kbps (kilobits per second) to 640 kbps.
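The limits quoted above can be captured in a small configuration check. This is a minimal sketch that tests only the sample rates and the data rate range given in the text; the actual specification further restricts data rates to a discrete table, which is not reproduced here. The constant and function names are hypothetical.

```python
# Values taken from the description above.
AC3_SAMPLE_RATES_HZ = (48000, 44100, 32000)
AC3_MIN_KBPS = 32
AC3_MAX_KBPS = 640


def is_valid_ac3_config(sample_rate_hz, bitrate_kbps):
    """Return True if the sample rate and data rate fall within the quoted limits.

    Range check only; the discrete list of permitted data rates is not modeled.
    """
    return (sample_rate_hz in AC3_SAMPLE_RATES_HZ
            and AC3_MIN_KBPS <= bitrate_kbps <= AC3_MAX_KBPS)


for rate, kbps in [(48000, 384), (96000, 384), (44100, 16), (32000, 640)]:
    print(rate, kbps, is_valid_ac3_config(rate, kbps))
```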
AC-3 coding technology has been adopted by the Advanced Television Systems
Committee (ATSC) as the audio service standard for High Definition Television
(HDTV) in the United States. It has also found applications in consumer media
(laserdisc, digital video disc) and direct satellite broadcast. At present, there
are more than a dozen semiconductor manufacturers working on AC-3 decoder chips.
Features of AC-3
- Mixdown
- Loudness control
- Backward compatibility
- Integral dynamic range control system
- High resolution spectral envelope coding
- Hybrid forward/backward adaptive bit allocation
- Very high coding gain at modest complexity
- Bit starvation is avoided during extreme signal demands by invoking the
technique of coupling
- Low end-to-end delay mode
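The mixdown feature listed above lets a decoder fold a multichannel program into fewer output channels. The following is a rough sketch of a stereo mixdown of one five-channel sample, assuming a gain of about -3 dB for the center and surround channels and omitting the subwoofer channel. These gains are illustrative assumptions; a real decoder takes its mix levels from the bitstream and also scales the result to avoid clipping. The function name is hypothetical.

```python
import math

MINUS_3_DB = math.sqrt(0.5)  # about 0.707; assumed mix level, not from the text


def downmix_to_stereo(left, center, right, left_sur, right_sur,
                      center_gain=MINUS_3_DB, surround_gain=MINUS_3_DB):
    """Fold one five-channel sample into a left/right stereo pair."""
    out_left = left + center_gain * center + surround_gain * left_sur
    out_right = right + center_gain * center + surround_gain * right_sur
    return out_left, out_right


# A center-only signal appears equally in both output channels.
print(downmix_to_stereo(0.0, 1.0, 0.0, 0.0, 0.0))
# A left-surround signal appears only in the left output channel.
print(downmix_to_stereo(0.0, 0.0, 0.0, 1.0, 0.0))
```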
Configurations of AC-3
- Associated services may be embedded into the AC-3 bit stream. These include
visually impaired (a verbal description of the visual scene), hearing impaired
(dialogue with enhanced intelligibility), commentary, dialogue, and second
stereo programs.
- Provides a means of conditional access.
- Datacasting
- Audio rendering
- Synchronization with auxiliary media.
Applications of AC-3
- Consumer electronics equipment for cable television
- Direct digital broadcast via satellite
- Pre-recorded media.
- CD players
- Digital Studio as well as Home Theatre systems
- Set Top Decoders
- Solid State Audio Recorder
- Surround systems in Cinema halls and entertainment media.
- Consumer media (laserdisc, digital video disc)
Performance metrics: DOLBY AC-3 decoder
| Standard Supported | Dolby Digital (AC-3) Multichannel Decode Standard |
| DSP Processor | Single ADSP-21061 KS-160 |
| MIPS for 6-Channel Decode | 25 MIPS |
| Memory | Data Memory: 19 K words; Program Memory: 8 K words |
| THD+N | SNR of -120 dB FS |