Concert Hall Impulse Responses — Pori, Finland:
Analysis results
Juha Merimaa 1, Timo Peltonen 2, and Tapio Lokki 3
juha.merimaa@hut.fi, timo.peltonen@akukon.fi, tapio.lokki@hut.fi
1 Laboratory of Acoustics and Audio Signal Processing
Helsinki University of Technology
P.O.Box 3000, FI-02015 TKK, Finland
2 Akukon Oy Consulting Engineers
Kornetintie 4 A, FI-00380 Helsinki, Finland
3 Telecommunications Software and Multimedia Laboratory
Helsinki University of Technology
P.O.Box 5400, FI-02015 TKK, Finland
May 6, 2005
1 Introduction
This document presents analysis results of a published set of concert hall impulse responses. For
a detailed description of the hall, measurement positions, sound sources and microphones, as well
as the measurement procedure and post processing of the responses, see the reference document
(Merimaa et al., 2005). The responses and all documentation are available for download from
http://www.acoustics.hut.fi/projects/poririrs/ .
A major part of this document consists of individual analysis results for each pair of source-
receiver positions. The results include standard room acoustical parameters, spectrogram plots and
special directional analysis of the responses. The document is organized as follows. Section 2 describes
the computation and purpose of the listed room acoustical parameters and Section 3 introduces
the two types of figures used to illustrate the responses. The results for the individual responses
are collected at the end of the document, with each source-receiver combination presented
on its own page.
2 Room acoustical parameters
The room acoustical parameters are a very traditional way of characterizing rooms or concert
halls. There is some controversy in the literature describing the relation of these parameters to the
actual perception of a listener. Nevertheless, the following subsections try to give some guidelines
for established interpretation of the results. In order to limit the analysis, we have chosen to
present only standardized parameters (ISO 3382, 1997) with the addition of Gade’s (1989a; 1989b; 1992)
support, describing the ability of a musician to hear him/herself when performing on the stage.
All parameters reported in this document have been calculated at octave bands. The filtering
has been performed using ANSI S1.1-1986 standard filters as implemented in the Octave Matlab
toolbox (Couvreur, 1997). Except for the spatial parameters interaural cross-correlation and lateral
energy fraction, all other parameters are reported as an average of the values derived from both of
the DPA 4006 microphones.

                125 Hz   250 Hz   500 Hz   1 kHz   2 kHz   4 kHz   8 kHz
T30 (s)          2.7      2.5      2.4      2.4     2.1     1.7     1.2
EDT (s)          2.5      2.4      2.4      2.3     2.0     1.6     0.9
G (dB)           8.9      8.7      8.9      9.6     9.9     8.2     4.3
C80 (dB)         0.3      3.6      3.1      1.8     0.6     1.6     6.6
1 − IACC_E       0.10     0.26     0.74     0.74    0.74    0.73    -
1 − IACC_L       0.12     0.31     0.87     0.89    0.94    0.94    -
LF_P             0.24     0.24     0.28     0.37    0.34    0.41    -
LF_SF            0.37     0.33     0.34     0.37    0.48    0.59    -
SNR (dB)         66.1     65.7     67.3     72.0    ??.3    ??.2    ??.5

Table 1: Room acoustical parameters at octave bands averaged over all responses with receiver
positions in the audience area.
All parameters except the signal-to-noise ratio (SNR, which is actually not a room acoustical
parameter but describes the measurements) have been calculated from the denoised responses (see
Merimaa et al., 2005, Section 5.1). This can be motivated as follows: Examination of the achieved
SNRs and start times of the denoising process reveals that the parameters that are evaluated over a
limited decay or period of time include in all cases only measured data. This leaves only strength and
clarity, where the computation of energy of the whole impulse response includes the extrapolated
parts. However, the denoising process extrapolates the responses in such a way that the strength
and clarity computed from the denoised responses are actually better estimates of the real parameters
than those obtained from the measured parts alone.
The average parameters over all responses with receiver positions in the audience area are listed
in Table 1. This is the most appropriate way to characterize the hall itself. The parameters for the
individual source-receiver pairs presented at the end of this document should not be interpreted too
strictly. It has been shown that the parameters can vary considerably between individual locations
or even small displacements of measurement positions in a hall (Pelorson et al., 1992; Bradley, 1994;
Nielsen et al., 1998; Okano et al., 1998; de Vries et al., 2001), although it is common to assume
that the perception of a hall does not vary as much within the hall.
In the following subsections, each reported parameter and the applied computation methods
are described.
2.1 Reverberation time and early decay time
The reverberation time (RT) is the oldest and most common parameter describing concert hall
acoustics. It is defined as the time that it takes for the sound inside a hall to decay 60 dB after a
source is turned off. Similarly, the early decay time (EDT) is defined as the time during which the
first 10 dB of the decay process occurs, multiplied by six.
In a perfectly diffuse hall, EDT and RT would always yield exactly the same values. In practice
they do, however, differ to some degree, and EDT is more dependent on the geometry of a hall
and on the measurement position (Barron, 1995). Both RT and EDT affect the subjective sense of
reverberance or liveness of a hall. EDT is considered a better descriptor for the running reverber-
ance, since during continuous sound most of the reverberation tail is masked by the sound itself.
Only when there is a longer break in the source signal will a listener be able to hear the full decay
characterized by the RT.
According to Beranek (1996) the subjective reverberance is mainly determined by reverberation
time at mid and high frequencies above approximately 350 Hz. On the other hand, low frequency
reverberation creates a sense of warmth, which is more of a timbral attribute. Furthermore, excessive
low frequency reverberation can make a hall sound boomy. For describing the warmth, Beranek
(1996) has proposed bass ratio defined as the ratio of average of RTs at the 125 and 250 Hz octave
bands to that at 500 and 1000 Hz. Gade (1989b) has proposed a similar measure computed from
EDTs at 250 and 500 Hz related to EDTs at 1 and 2 kHz.
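As a worked example, both warmth measures can be evaluated from the hall averages listed in Table 1. The sketch below assumes plain arithmetic means over the named bands and assumes that Gade's measure is formed as the same kind of ratio; the function and variable names are illustrative and do not come from the original analysis.

```python
# Hall-average values from Table 1 (seconds), keyed by octave-band centre frequency in Hz.
T30 = {125: 2.7, 250: 2.5, 500: 2.4, 1000: 2.4, 2000: 2.1, 4000: 1.7, 8000: 1.2}
EDT = {125: 2.5, 250: 2.4, 500: 2.4, 1000: 2.3, 2000: 2.0, 4000: 1.6, 8000: 0.9}


def band_ratio(values, low_bands, ref_bands):
    """Ratio of the mean over low_bands to the mean over ref_bands."""
    low = sum(values[f] for f in low_bands) / len(low_bands)
    ref = sum(values[f] for f in ref_bands) / len(ref_bands)
    return low / ref


# Beranek's bass ratio: RT at 125 and 250 Hz relative to RT at 500 and 1000 Hz.
bass_ratio = band_ratio(T30, (125, 250), (500, 1000))
# Gade's similar measure: EDT at 250 and 500 Hz relative to EDT at 1 and 2 kHz.
edt_ratio = band_ratio(EDT, (250, 500), (1000, 2000))

print(f"bass ratio from T30: {bass_ratio:.2f}")
print(f"low/mid ratio from EDT: {edt_ratio:.2f}")
```

Both ratios come out slightly above one for these hall averages, which reflects the somewhat longer decay at low frequencies.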
The standard (ISO 3382, 1997) allows several ways to measure RT and EDT. We have chosen to
derive them from a least-squares line fit to backward integrated squared impulse responses, which
gives an ensemble average of the decay curves that would be obtained with random noise samples
as an excitation (Schroeder, 1965). The RT and EDT are calculated from the slope of the fitted
line. For determining the RT, the line was fitted between the −5 and −35 dB points, which gives the
standardized T30 value (ISO 3382, 1997), and for EDT between the 0 and −10 dB points relative to
the level of the direct sound.
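The procedure described above (backward integration followed by a least-squares line fit) can be sketched as follows. This is a minimal broadband illustration, not the authors' implementation: it assumes the response starts at the arrival of the direct sound, it omits the octave band filtering, and the function names, the 8 kHz example rate and the synthetic exponential decay are all made up for the example.

```python
import math


def schroeder_db(h):
    """Backward-integrated squared impulse response in dB re. its initial value."""
    acc, curve = 0.0, []
    for x in reversed(h):
        acc += x * x
        curve.append(acc)
    curve.reverse()
    return [10.0 * math.log10(e / curve[0]) for e in curve if e > 0.0]


def decay_time(curve_db, fs, upper_db, lower_db):
    """60 dB decay time from a least-squares line fitted between two levels."""
    pts = [(n / fs, level) for n, level in enumerate(curve_db)
           if lower_db <= level <= upper_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_l = sum(level for _, level in pts) / len(pts)
    slope = (sum((t - mean_t) * (level - mean_l) for t, level in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second, negative
    return -60.0 / slope


# Synthetic check: an ideal exponential decay with a known RT of 2.0 s.
fs = 8000          # example sampling rate (assumption for this sketch only)
rt_true = 2.0
h = [10.0 ** (-3.0 * n / (fs * rt_true)) for n in range(3 * fs)]

curve = schroeder_db(h)
t30 = decay_time(curve, fs, -5.0, -35.0)   # fit between -5 and -35 dB
edt = decay_time(curve, fs, 0.0, -10.0)    # fit between 0 and -10 dB
print(f"T30 = {t30:.2f} s, EDT = {edt:.2f} s")
```

For an ideal exponential decay the two estimates coincide, which matches the statement that EDT and RT agree in a perfectly diffuse hall; with measured responses they generally differ.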
2.2 Strength
The strength factor G is a parameter describing the amount of sound energy directed to a listening
position. It is defined as the logarithmic ratio of the total energy of a measured response to that
produced by the same sound source with the same excitation at a distance of 10 m in a free
field (ISO 3382, 1997). In addition to specific acoustical features of a hall, strength depends on
the distance from the sound source, as well as on the reverberation time (Barron and Lee, 1988).
Perceptually the strength (especially at mid-frequencies 500 and 1000 Hz, Beranek 1996) determines
the loudness of a hall. As a spectral parameter, the strength at different frequency bands is, of
course, also related to the timbre.
Given the applied level calibration (Merimaa et al., 2005, Section 4.3), computation of the
strength is straightforward. The total energy at each octave band was divided by the energy of the
impulse response of the corresponding filter, and 10 dB was added to the resulting value.
2.3 Clarity
The clarity index is defined as the ratio of early energy to the late (reverberant) energy expressed
in decibels (ISO 3382, 1997). In the clarity index C80, the response is divided into the early and
late parts at 80 ms after the arrival of the direct sound. Clarity is highly correlated with the decay
parameters RT and EDT (the correlation is especially high between C80 and the ratio of RT and
EDT; Barron, 1995) and it also depends on the distance from the sound source somewhat
similar to G (Barron and Lee, 1988; Barron, 1995). The clarity is related to the perception of what
Beranek (1996) calls horizontal definition, defined as “the degree to which sounds that follow one
another stand apart”.
The C80 values presented in this paper were calculated such that the responses were divided
into early and late part before the octave band filtering. This way the time-domain spreading due
to the filtering does not affect the division.
2.4 Interaural cross-correlation and lateral energy fraction
Interaural cross-correlation (IACC) and lateral energy fraction (LF) are related to the spatial
properties of a hall. IACC can be calculated from impulse responses measured with a dummy
or a real head and it is defined as the maximum of the normalized interaural cross-correlation
function over lags in the range of [−1, 1] ms (ISO 3382, 1997). IACC is typically divided into
IACC_E integrated over 0–80 ms from the arrival of the direct sound and to IACC_L integrated over
80–1000 ms (Bradley, 1994; Hidaka et al., 1995). For determining the LF, an omnidirectional and
a figure-of-eight microphone are needed. LF is defined as the fraction of the energy arriving from
lateral directions during the early part of a response (0–80 ms) (ISO 3382, 1997).
The IACC and LF are intended as measures of spatial impression. The spatial impression is
typically divided into auditory source width (ASW) and listener envelopment (LEV). ASW depends
mainly on the early reflections characterized by IACC_E and LF, whereas the envelopment is created
by diffuse late reverberation as measured with IACC_L (Morimoto and Maekawa, 1989; Bradley
and Soulodre, 1995a). Bradley and Soulodre (1995b) have also proposed late lateral energy (as
opposed to energy fraction) for measuring envelopment. Okano et al. (1998) have proposed the
average of [1 − IACC_E] over the
octave bands centered at 500, 1000, and 2000 Hz combined with strength at frequencies below 125
Hz as the best descriptors for ASW. However, despite the established nature of these parameters,
they do not always seem to be able to describe the perception when acoustical environments of
considerably different size are compared (Merimaa and Hess, 2004).
At low frequencies the [1 − IACC_E] values are always low since the relatively small distance
between the ears of the dummy head compared to the wavelength of sound results in a high
correlation. Nevertheless, at octave bands up to 1 kHz, the hall averages of [1 − IACC_E] and LF
have been shown to correlate strongly (Bradley, 1994). With increasing frequency, IACC gains
more sensitivity to sound emanating from directions closer to the median plane than the (ideally)
frequency independent figure-of-eight directivity pattern used in LF. According to Okano et al.
(1998) IACC_E describes the human spatial perception better than LF when these two parameters
disagree.
The IACC values reported in this paper were computed from the non-diffuse-field-equalized
dummy head responses and they are reported in the form [1 − IACC] such that high values describe
high diffuseness (low correlation).
The LF estimates were derived both from measurements with the Pearl TL-4 stereo back-to-back
cardioid microphone (LF_P) and the SoundField microphone system (LF_SF). The omnidirectional
energy was integrated over 0–80 ms and the lateral energy over 5–80 ms to exclude possible leakage
of the direct sound into the lateral signal (ISO 3382, 1997). With the Pearl microphone, the
responses of the left and right channels were subtracted from one another to obtain the
figure-of-eight directivity pattern and summed to obtain the omnidirectional reference response. In case
of the SoundField microphone, both directivity patterns were readily available, but the utilized
figure-of-eight channel Y was scaled by 1/√2 to compensate for the B-format gain convention. It
is interesting to notice that the results differ considerably. Part of these differences may be due
to nonideal directivity patterns of the microphones. Furthermore, measurements of a SoundField
MKV system (Farina, 2001) have shown another source of error caused by frequency dependent
variations in the relative gains of the omnidirectional and figure-of-eight channels. This result leads
the authors to believe that the LF_P values are more reliable.
2.5 Support
Support (ST) aims at describing a hall from a performing musician’s point of view. The ST
parameters were developed to describe the perceptual support, which is “the property which makes
the musician feel that he can hear himself and that it is not necessary to force the instrument to
develop the tone” (Gade, 1989a).
ST is defined as the ratio of reflected energy to the emitted energy, as measured with an
omnidirectional microphone at a distance of 1 m from a sound source on the stage. The emitted
energy is integrated over the time interval of 0–10 ms after the arrival of the direct sound, including
typically the direct sound and the first floor reflection. For computing the reflected energy, Gade
(1989a,b) originally proposed two time intervals: 20–100 ms (ST1) and 20–200 ms (ST2). In a
later publication (Gade, 1992), ST1 was renamed ST_early, and ST_late, which includes reflected
energy integrated over 100–1000 ms, was introduced. Furthermore, ST2 was replaced by ST_total, which can
be calculated as the (linear) sum of ST_early and ST_late.
The ST_early and ST_late values for the three source positions are listed in Table 2. These parameters
were calculated from responses which were excluded from the public database due to little use
for anything apart from the support calculation. The responses were measured with the DPA 4006
microphone pair at a distance of 1 m from each source position and the parameters computed from
both microphones were averaged. Only the omnidirectional sound source (as opposed to having
also a subwoofer) was used and for this reason the ST at the 125 Hz octave band has been omitted
from the results. Furthermore, the source cannot be considered omnidirectional at the omitted 8
kHz octave band, and the listed 4 kHz results should also be interpreted with some care since the
results are prone to random errors depending on the direction of the measurement position relative
to the sound source.

ST_early    250 Hz   500 Hz   1 kHz   2 kHz   4 kHz
S1          −14.1    −12.7    −11.3   −10.3   −9.9
S2          −11.8    −9.7     −10.5   −8.3    −5.1
S3          −12.3    −9.0     −10.5   −9.2    −8.7
Avg.        −12.6    −10.2    −10.7   −9.2    −7.4

ST_late     250 Hz   500 Hz   1 kHz   2 kHz   4 kHz
S1          −11.2    −12.1    −12.0   −10.4   −10.6
S2          −13.2    −12.0    −11.7   −11.3   −9.9
S3          −14.6    −12.7    −11.9   −11.6   −12.1
Avg.        −12.8    −12.3    −11.9   −11.1   −10.8

Table 2: Support (in dB) at each source position.
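The windowed parameters of Sections 2.3 to 2.5 all reduce to energy sums over time intervals measured from the arrival of the direct sound. The sketch below illustrates this for clarity, lateral energy fraction, support and IACC. It is a broadband simplification under stated assumptions: the response is assumed to start at the direct sound, no octave band filtering is applied, the IACC window is truncated at its edges instead of reading samples outside it, and all function names and the synthetic three-impulse response are hypothetical.

```python
import math


def energy(h, fs, t_start, t_end=None):
    """Sum of squared samples from t_start to t_end (seconds after the direct sound)."""
    first = int(round(t_start * fs))
    last = len(h) if t_end is None else int(round(t_end * fs))
    return sum(x * x for x in h[first:last])


def to_db(ratio):
    return 10.0 * math.log10(ratio)


def clarity_c80(h, fs):
    """Early (0-80 ms) to late (after 80 ms) energy ratio in dB."""
    return to_db(energy(h, fs, 0.0, 0.080) / energy(h, fs, 0.080))


def lateral_fraction(h_omni, h_fig8, fs):
    """Lateral energy over 5-80 ms relative to omnidirectional energy over 0-80 ms."""
    return energy(h_fig8, fs, 0.005, 0.080) / energy(h_omni, fs, 0.0, 0.080)


def support(h, fs):
    """Return (ST_early, ST_late, ST_total) in dB re. the energy emitted in 0-10 ms."""
    emitted = energy(h, fs, 0.0, 0.010)
    early = energy(h, fs, 0.020, 0.100) / emitted
    late = energy(h, fs, 0.100, 1.000) / emitted
    return to_db(early), to_db(late), to_db(early + late)  # total is the linear sum


def iacc(left, right, fs, t_start, t_end, max_lag_s=0.001):
    """Maximum absolute normalized cross-correlation for lags within +/- max_lag_s."""
    first = int(round(t_start * fs))
    last = min(int(round(t_end * fs)), len(left), len(right))
    seg_l, seg_r = left[first:last], right[first:last]
    norm = math.sqrt(sum(x * x for x in seg_l) * sum(x * x for x in seg_r))
    max_lag = int(round(max_lag_s * fs))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        s = sum(seg_l[n] * seg_r[n + lag]
                for n in range(len(seg_l)) if 0 <= n + lag < len(seg_r))
        best = max(best, abs(s) / norm)
    return best


# Synthetic response: direct sound plus two reflections (illustrative values only).
fs = 1000                                # low example rate, one sample per millisecond
h = [0.0] * 1200
h[0], h[50], h[300] = 1.0, 0.2, 0.1      # direct sound, 50 ms and 300 ms reflections
# Pretend the direct sound is frontal and the reflections are fully lateral.
h_fig8 = [0.0 if n == 0 else x for n, x in enumerate(h)]

c80 = clarity_c80(h, fs)
st_early, st_late, st_total = support(h, fs)
lf = lateral_fraction(h, h_fig8, fs)
one_minus_iacc_e = 1.0 - iacc(h, h, fs, 0.0, 0.080)   # identical ears give 0
print(f"C80 = {c80:.1f} dB, ST_early = {st_early:.1f} dB, ST_late = {st_late:.1f} dB")
print(f"ST_total = {st_total:.1f} dB, LF = {lf:.3f}, 1 - IACC_E = {one_minus_iacc_e:.2f}")
```

For the Pearl microphone described in Section 2.4, the omnidirectional and figure-of-eight inputs to `lateral_fraction` would be the sample-wise sum and difference of the two cardioid channels.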
2.6 Signal-to-noise ratio
As mentioned earlier, signal-to-noise ratio (SNR) is actually not a room acoustical parameter but
describes the measurements. The listed SNR values were calculated as the ratio of the peak value
of a response to the background noise level averaged over 10 % of the end of a response prior to
denoising. The DPA 4006 responses were used, and in each case it was verified that the 10 % of
the samples were indeed background noise.
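A minimal sketch of this SNR estimate is given below. It assumes that the background noise level is taken as the RMS value of the last 10 % of the samples and that the ratio is expressed as 20 log10 of the amplitude ratio; neither detail is spelled out in the text, and the function name is hypothetical.

```python
import math


def snr_db(h, tail_fraction=0.10):
    """Peak value of h relative to the RMS noise level in its final tail_fraction, in dB."""
    n_tail = max(1, int(len(h) * tail_fraction))
    tail = h[-n_tail:]
    noise_rms = math.sqrt(sum(x * x for x in tail) / n_tail)  # assumed noise measure
    peak = max(abs(x) for x in h)
    return 20.0 * math.log10(peak / noise_rms)


# Synthetic check: unit peak followed by a constant-magnitude noise floor of 0.001.
h = [1.0] + [0.001 * (-1) ** n for n in range(999)]
snr = snr_db(h)
print(f"SNR = {snr:.1f} dB")
```

As in the measurements, the estimate is only meaningful if the final 10 % of the response really contains background noise and no decaying signal.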
3 Figures
The figures presented for each source-receiver pair at the end of this document consist of
spectrograms and a special directional analysis of the early responses.
3.1 Spectrograms
The spectrograms provide a time-frequency representation of the decay of a response. They were
calculated from the rightmost (reference) DPA 4006 microphone measurements after applying the
denoising procedure (Merimaa et al., 2005, Section 5.1). The computation was performed using
1024 sample FFT with 50 % overlapping Hann windowed time frames. The resolution is thus the
same as that used in the denoising. The energy of each time-frequency component is plotted on a
dB scale. The frequency range of the plots has been limited to 20 kHz and the spectrograms are
presented over the full length of the waveform files in the database. Furthermore, the frequency-
dependent starting point of the extrapolated exponentially decaying random noise created in the
denoising process is shown with a solid black line on top of the spectrograms.
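The framing described above (1024-sample frames, 50 % overlap, Hann window, energy in dB) can be sketched as follows. This is an illustration only: it uses a naive DFT from the standard library where a real implementation would call an FFT routine, the periodic form of the Hann window is an assumption, and the 48 kHz sampling rate used for the resolution figures is assumed here and not stated in this section.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]


def spectrogram_db(x, n_fft=1024, overlap=0.5, floor_db=-200.0):
    """Return (frames, hop); frames[i][k] is the energy of bin k in frame i, in dB."""
    hop = int(n_fft * (1.0 - overlap))
    window = hann(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + n] * window[n] for n in range(n_fft)]
        row = []
        for k in range(n_fft // 2 + 1):
            # Naive DFT of one bin; an FFT gives the same values far faster.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n in range(n_fft))
            power = abs(acc) ** 2
            row.append(10.0 * math.log10(power) if power > 0.0 else floor_db)
        frames.append(row)
    return frames, hop


# Resolution of the analysis for an assumed sampling rate of 48 kHz.
fs_assumed = 48000
freq_resolution_hz = fs_assumed / 1024           # spacing of the frequency bins
hop_time_ms = 1000.0 * (1024 // 2) / fs_assumed  # time step between frames

# Small demonstration with a 16-point transform to keep the naive DFT fast.
x = [math.cos(2.0 * math.pi * 4 * n / 16) for n in range(64)]
frames, hop = spectrogram_db(x, n_fft=16)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in frames]
print(f"{len(frames)} frames, hop {hop} samples, peak bins {peak_bins}")
```

With these settings each bin is about 47 Hz wide and successive frames are about 11 ms apart under the assumed sampling rate.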
3.2 Directional analysis
The directional analysis plots illustrate the direction dependent arrival of sound to a measurement
position during a time period of 100 ms starting from slightly before the arrival of the direct sound.