An Unsupervised Adaptive Filtering Approach of 2-to-5 Channel Upmix

This paper proposes a new algorithm for converting two-channel audio material to five channels, based on subband unsupervised adaptive filtering. The algorithm uses a subband analysis-processing-synthesis framework. In each subband, a robust stereo image is obtained using principal component analysis, and an effective energy redistribution among the surround channels is achieved by mapping the cross-correlation between the two input channels to a weighted panning matrix.
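As a rough illustration of the two per-subband quantities the abstract mentions, the sketch below computes the principal axis of a stereo block (a simple stand-in for the stereo image direction) and the normalized cross-correlation between the two channels. The function names, the block-based zero-lag statistics, and in particular the `surround_weight` mapping are assumptions made for illustration; the abstract does not specify the paper's adaptive filter or its weighted panning matrix.

```python
import math

def stereo_band_stats(left, right):
    """Return (principal_angle, correlation) for one block of a stereo subband.

    Assumes both inputs are equal-length, non-empty, zero-mean sample lists.
    """
    n = len(left)
    # Zero-lag second-order statistics of the two channels.
    e_ll = sum(l * l for l in left) / n
    e_rr = sum(r * r for r in right) / n
    e_lr = sum(l * r for l, r in zip(left, right)) / n
    # Principal axis of [[e_ll, e_lr], [e_lr, e_rr]] in closed form:
    # 0 means "all left", pi/2 "all right", pi/4 "in-phase center".
    angle = 0.5 * math.atan2(2.0 * e_lr, e_ll - e_rr)
    # Normalized cross-correlation in [-1, 1]; 0 if either channel is silent.
    denom = math.sqrt(e_ll * e_rr)
    rho = e_lr / denom if denom > 0.0 else 0.0
    return angle, rho

def surround_weight(rho):
    """Hypothetical mapping (not from the paper): the less correlated the
    inputs, the larger the share of energy sent to the surround channels."""
    return 1.0 - min(1.0, abs(rho))
```

With identical left and right channels the correlation is 1 and the hypothetical surround weight is 0, so nothing would be steered to the surrounds; with fully uncorrelated channels the weight rises to 1.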

In digital media, physical or virtual sound scenes are typically represented in either a channel-based or an object-based representation [1]. The channel-based representation, such as stereo or 5.1 surround sound, is the most widely used because it maps directly onto the speaker configuration of the playback system. However, the channel-based representation lacks the flexibility to support different speaker configurations. The object-based representation, on the other hand, can be applied to any loudspeaker configuration by rendering the sound objects based on their spatial attributes [2]. Its difficulty is that it requires significantly more storage and higher transmission bandwidth [3]. To avoid these problems, a new representation inspired by the human auditory system has been developed, which describes the sound scene in terms of foreground and background sound.

These sound components are usually referred to as the primary and ambient components, respectively [4]. The primary components usually consist of multiple point-like sound sources, whereas the ambient components are made up of environmental sound, such as reverberation, applause, or natural sounds like a waterfall. Such a primary- and ambient-based representation facilitates flexible rendering of the sound scene based on the loudspeaker configuration without degrading the efficiency of the reproduction.
However, the primary and ambient components are usually mixed together in the audio signal of existing channel-based audio formats, which necessitates the extraction of the primary and ambient components from the audio signal. Prior to primary-ambient extraction (PAE), preprocessing such as the short-time Fourier transform (STFT) can be applied. The output of PAE is the extracted primary and ambient components, along with their spatial attributes. These spatial attributes can either be incorporated in the extracted components or transmitted to the receiver for flexible rendering. Post-processing techniques, which may include enhancement [5], [6], coding [4], [7], re-mixing [8]-[11], or simply sending the signals to the playback system [12]-[14], can be employed at the receiver depending on the requirements of the application.
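To make the extraction step more concrete, here is a minimal sketch of one common way to do it: a PCA-based split of a single stereo frame, where the part lying along the principal axis is treated as primary and the residual as ambient. This is an illustrative assumption rather than the method of any cited work; the function name is hypothetical, and for simplicity it operates on a time-domain frame instead of STFT bins. The returned angle plays the role of a spatial attribute (the panning direction of the primary component).

```python
import math

def pca_primary_ambient(left, right):
    """Split one stereo frame into primary and ambient parts by PCA.

    Returns (primary, ambient, angle), where primary and ambient are lists
    of (left, right) sample pairs and angle is the principal-axis direction.
    Assumes equal-length, non-empty, zero-mean inputs.
    """
    n = len(left)
    e_ll = sum(l * l for l in left) / n
    e_rr = sum(r * r for r in right) / n
    e_lr = sum(l * r for l, r in zip(left, right)) / n
    # Direction of the dominant eigenvector of the 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * e_lr, e_ll - e_rr)
    c, s = math.cos(angle), math.sin(angle)
    primary, ambient = [], []
    for l, r in zip(left, right):
        p = c * l + s * r              # coordinate along the principal axis
        pl, pr = c * p, s * p          # projection back into (left, right)
        primary.append((pl, pr))
        ambient.append((l - pl, r - pr))  # residual, orthogonal to the axis
    return primary, ambient, angle
```

By construction the primary and ambient parts sum back to the input frame, and a single amplitude-panned source with no ambience yields an ambient part that is zero up to rounding error.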

This is a paper I published in 2005 at an AES convention; for more information, please see here.
