1. Background

In parallel with the evolution of broadband networks and digital audio equipment, information rates for delivery and storage have risen rapidly owing to the demand for high-quality audio signals (high sampling rates, high word resolution, and multiple channels). NTT Communication Science Labs recognized the importance of lossless compression technology for audio signals and of its standardization, considering interoperability, long-term maintenance, and clear IPR status. The Laboratories took the initiative in promoting this technology as a standard in the ISO/IEC*3 MPEG group.
2. Progress toward international standardization

For this standardization work, NTT initiated discussions on the need and requirements and prepared the call for technologies. In line with the normal standardization process, a number of improvements and integration steps were carried out on top of the initial reference model. Partners in this standardization work included the Technical University of Berlin (Germany), RealNetworks Corp. (USA), and I2R (Singapore). After the specification had been tentatively defined, it was voted on twice by 23 national bodies. The last ballot closed last week, and it has been disclosed that the standard has been approved. This means that the specification for lossless coding has now been officially established as [14496-3 3rd ED AMD 2 (ALS: Audio Lossless)*4]. It is expected that this standard will be used as a common tool for various applications and that it will continue to be maintained so that compressed files can be perfectly decoded even after 100 years. The MPEG group will continue working on the reference software and conformance testing. It is also expected that a consortium of essential patent holders will be organized for the collection and distribution of patent royalties.
3. Technical merits

[Main Points]
- Assured perfect reconstruction after compression
- State-of-the-art compression performance
- Significant reduction of transmission and storage costs with only a short decoding time

Several standard audio coding schemes are already in widespread use, such as MP3, AAC*5, and the scheme used for MiniDisc. These are all perceptual coding schemes that offer a high compression ratio at the cost of minor waveform distortion at the decoder. These approaches carefully control the quantization distortion based on the characteristics of human hearing. The decoded waveform differs from the original, although it is perceptually very close to it. In contrast to perceptual coding, lossless coding assures perfect reconstruction of the waveform without a single bit of difference. This is very important for applications such as waveform editing and the archiving of high-quality audio signals. The price of perfect reconstruction is a limited compression ratio: the compressed file size varies from 15 to 70% of the original, depending on the statistical properties of the original waveform. Even so, the compression performance exceeds that of ZIP*6.
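The idea of lossless coding and the definition of the compression ratio can be illustrated with a small sketch. The Python code below is not the MPEG-4 ALS algorithm; it assumes a toy first-order predictor (each sample is predicted by the previous one) followed by zlib's DEFLATE. The function names `encode` and `decode` and the synthetic 440 Hz test tone are hypothetical choices made only for illustration. It checks that the decoded samples match the original exactly and computes the ratio as compressed size divided by original size.

```python
import math
import struct
import zlib


def encode(samples):
    """Toy lossless encoder for signed 16-bit samples (not MPEG-4 ALS)."""
    prev = 0
    residuals = []
    for x in samples:
        # Prediction residual x[n] - x[n-1], wrapped to 16 bits so it always fits.
        residuals.append((x - prev) & 0xFFFF)
        prev = x
    packed = struct.pack(f"<{len(residuals)}H", *residuals)
    return zlib.compress(packed, 9)


def decode(blob):
    """Invert encode(): undo DEFLATE, then re-accumulate the residuals."""
    packed = zlib.decompress(blob)
    residuals = struct.unpack(f"<{len(packed) // 2}H", packed)
    prev = 0
    samples = []
    for r in residuals:
        value = (prev + r) & 0xFFFF
        # Map the wrapped value back to the signed 16-bit range.
        prev = value - 0x10000 if value >= 0x8000 else value
        samples.append(prev)
    return samples


# Synthetic test signal (an assumption): one second of a 440 Hz tone at 44.1 kHz.
signal = [round(10000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(44100)]

blob = encode(signal)
restored = decode(blob)
original_bytes = 2 * len(signal)      # 16-bit PCM
ratio = len(blob) / original_bytes    # smaller is better

print("bit-exact:", restored == signal)
print(f"compression ratio: {ratio:.2f}")
```

Actual ratios depend heavily on the signal, so none are quoted here; real codecs use much stronger predictors and entropy coders than this sketch.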

Figure 1 compares the compression performance of MPEG-4 ALS with other available compression tools for audio signals. The vertical axis denotes the compression ratio (the compressed file size divided by the original size; the smaller, the lower the cost), and the horizontal axis shows the decoding time (the shorter, the more convenient). The standardized specification offers wide flexibility in selecting the operation mode at the encoder. One can select a very fast mode with lower compression performance or a very high compression mode at the cost of slower encoding and decoding. The proprietary decoder can improve the speed. We can see that the standardized specification provides state-of-the-art technology.

MPEG-4 ALS accepts a variety of input formats:
- Sampling rates of up to 192 kHz (44.1 kHz for CD)
- Various integer PCM formats of up to 32 bits per sample (16 bits for CD)
- 32-bit floating-point data in the IEEE 754 format (integer for CD)
- Up to 65536 channels (2 channels for CD)

It can be used for almost all applications. Decoding is generally very fast, at least 10 times faster than the playback time of the music. File compression obviously reduces the size of archive files. It is also useful for downloading, since the download time of a compressed file is significantly shorter and the decoding time is much shorter than the playback or download time.
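The effect on download time can be checked with simple arithmetic. The sketch below computes the raw PCM bit rate of the CD format and estimates transfer times. The 5-minute track length, the 1 Mbit/s link, and the compression ratio of 0.5 are assumed example values (the ratio is simply a point inside the 15 to 70% range quoted earlier), and the helper names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) PCM bit rate in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


def transfer_seconds(duration_s, bit_rate, link_bit_rate, compression_ratio=1.0):
    """Time to move `duration_s` seconds of audio over a link of `link_bit_rate` bit/s."""
    return duration_s * bit_rate * compression_ratio / link_bit_rate


cd_rate = pcm_bit_rate(44_100, 16, 2)    # 1,411,200 bit/s

# Assumed example values, not figures from the standard.
track_s = 300        # 5-minute track
link = 1_000_000     # 1 Mbit/s link
ratio = 0.5          # compressed size / original size

plain = transfer_seconds(track_s, cd_rate, link)           # about 423 s
packed = transfer_seconds(track_s, cd_rate, link, ratio)   # about 212 s
decode_upper_bound = track_s / 10   # "at least 10 times faster" than playback

print(cd_rate, round(plain, 1), round(packed, 1), decode_upper_bound)
```

Under these assumptions the download time is halves, and decoding adds at most about 30 s, well below the 300 s playback time.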

The specification features a number of technologies for reducing the bit rate. In particular, NTT contributed to the development of the following elementary tools:
- Time-domain linear prediction based on PARCOR coefficients
- Multi-channel coding (collaborative work between NTT and the University of Tokyo)
- Long-term prediction (collaborative work between NTT and the University of Tokyo)
- Common factor coding and masked compression for floating-point data
- Progressive-order prediction for random accessibility
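To give a feel for the first tool in this list, the following sketch derives PARCOR (reflection) coefficients from a signal's autocorrelation using the textbook Levinson-Durbin recursion. It is a generic illustration, not the procedure defined in the MPEG-4 ALS specification: the function names, the test signal, and the prediction order of 4 are assumptions, and sign conventions for PARCOR coefficients differ between texts.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Biased (unnormalized) autocorrelation estimate r[0..max_lag]."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (parcor, predictor, error) for the linear predictor
    x_hat[n] = sum(predictor[i - 1] * x[n - i] for i in 1..order)."""
    a = [0.0] * (order + 1)
    err = r[0]
    parcor = []
    for m in range(1, order + 1):
        # Correlation left unexplained by the order-(m-1) predictor.
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        parcor.append(k)
        updated = a[:]
        updated[m] = k
        for i in range(1, m):
            updated[i] = a[i] - k * a[m - i]
        a = updated
        err *= 1.0 - k * k   # prediction error energy shrinks at each order
    return parcor, a[1:], err


# Assumed test signal: a sinusoid plus a little seeded noise.
rng = random.Random(0)
x = [math.sin(0.1 * n) + 0.1 * rng.uniform(-1.0, 1.0) for n in range(2000)]

r = autocorrelation(x, 4)
parcor, predictor, err = levinson_durbin(r, 4)
print("PARCOR:", [round(k, 3) for k in parcor])
print("all |k| < 1:", all(abs(k) < 1.0 for k in parcor))
```

With this autocorrelation estimate every coefficient lies strictly between -1 and 1, which corresponds to a stable synthesis filter and gives a quantizer a bounded range to cover.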

In parallel with its standardization activities, NTT labs have developed proprietary technologies for efficient algorithms and efficient implementations while maintaining compliance with the standard.
4. Future tasks

NTT Communication Science Labs will continue to support the standardization of the conformance tests and reference software and the enhancement of encoder performance. In parallel, NTT Communications Corp. will design and provide integrated delivery or archiving systems using practical software compliant with this standard. In addition, NTT group companies will develop products, through collaborative work with partners or through licensing, for various applications, including professional audio editing tools, portable music players, and the editing or archiving of medical or environmental data.
<Terminology>
1. MPEG

Moving Picture Experts Group: a standardization group in ISO/IEC JTC1/SC29/WG11. This group has established a number of important compression schemes for video and audio since 1988.
2. PARCOR coefficient

Partial Auto Correlation: a set of predictive parameters invented at NTT Musashino Lab in 1972. This set has the properties of stability and easy quantization and is therefore widely used in speech coding and synthesis and in other signal processing areas.
3. ISO/IEC

ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission) are organizations that establish international standards in various fields.
4. 14496-3 3rd ED AMD 2 (ALS)

MPEG-4 Audio, 3rd edition, Amendment 2. It is usually called ALS.
5. AAC

Advanced Audio Coding: an efficient multi-channel audio coder established in 1997. Its perceptual quality is better than that of MP3. The coder is used in the Japanese digital broadcasting system and in some portable music players.
6. ZIP

A general-purpose lossless compression tool that adaptively updates its codebook depending on the input sequence. It can compress text and program sources and has been incorporated into operating systems.