Abstract
Depending on the acoustic scenario, people with hearing loss face challenges on a
different scale than people with normal hearing when comprehending sound, especially
speech. This happens particularly during social interactions within a group, which often
occur in environments with low signal-to-noise ratios. This communication disruption can
create a barrier for people to acquire and develop communication skills as a child or to
interact with society as an adult.
Hearing loss compensation aims to provide an opportunity to restore the auditory part of
socialization.
Technological and academic efforts have progressed toward a better understanding of the
human hearing system. Through constant efforts to present new algorithms, miniaturization,
and new materials, constantly improving hardware with high-end software is being
developed, with new features and solutions to broad and specific auditory challenges. The
effort to deliver innovative solutions to the complex phenomena of hearing loss
encompasses tests, verifications, and validations in various forms. As newer devices
achieve their purpose, the tests need to increase in sensitivity, requiring conditions
that effectively assess the improvements.
Regarding realism, many levels are required in hearing research, from pure-tone assessment
in small soundproof booths to hundreds of loudspeakers combined with visual stimuli
through projectors or head-mounted displays, with light and movement control. Hearing-aid
research commonly relies on loudspeaker setups to reproduce sound sources. In addition,
auditory research can use well-known auralization techniques to generate sound signals.
These signals can be encoded to carry more than sound-pressure-level information, adding
spatial information about the environment where the sound event happened or was simulated.
This work reviews physical acoustics, virtualization, and auralization concepts and their
uses in listening-effort research. This knowledge, combined with the experiments executed
during the studies, aimed to provide a hybrid auralization method to be virtualized in
four-loudspeaker setups. Auralization methods are techniques used to encode spatial
information into sounds. The main methods were discussed and derived, observing their
spatial sound characteristics and trade-offs for use in auditory tests with one or two
participants. Two well-known auralization techniques (Ambisonics and Vector-Based
Amplitude Panning) were selected and compared through a calibrated virtualization setup
with regard to spatial distortions in the binaural cues. These techniques were chosen
because they rely on loudspeakers, albeit only a small number of them. Furthermore, the
spatial cues were examined by adding a second listener to the virtualized sound field. The
outcome reinforced the literature on spatial localization with these techniques, showing
Ambisonics to be less spatially accurate but more immersive than Vector-Based Amplitude
Panning.
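The amplitude-panning side of that comparison can be made concrete with a short sketch. In
2D VBAP, the gains of the two loudspeakers adjacent to a phantom source are found by
inverting the matrix whose columns are the loudspeakers' unit direction vectors. The
fragment below is a minimal illustration with hypothetical loudspeaker angles, not the
setups used in the studies.

```python
import math

def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Gains for a phantom source between two loudspeakers (2D VBAP).

    Solves p = g1*l1 + g2*l2 for the unit direction vectors, then
    normalizes so that g1^2 + g2^2 = 1 (constant perceived power).
    """
    def unit(deg):
        rad = math.radians(deg)
        return (math.cos(rad), math.sin(rad))

    p = unit(source_deg)
    l1, l2 = unit(spk1_deg), unit(spk2_deg)

    # Invert the 2x2 matrix whose columns are the loudspeaker vectors.
    det = l1[0] * l2[1] - l2[0] * l1[1]
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (p[1] * l1[0] - p[0] * l1[1]) / det

    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

# Source midway between loudspeakers at +/-45 degrees: equal gains.
g1, g2 = vbap_2d(0.0, -45.0, 45.0)
```

For a source midway between the two loudspeakers, both gains come out equal, preserving
constant power; a source exactly at a loudspeaker direction receives all the gain, which
is the behavior that gives VBAP its sharp localization.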
A combined study was defined to observe changes in listening effort due to different
signal-to-noise ratios and reverberation in a virtualized setup. This experiment aimed to
produce the correct sound field via a virtualized setup and to assess listening effort via
subjective impressions from a questionnaire, an objective physiological outcome from EEG,
and behavioral performance on word recognition. Nine levels of degradation were imposed on
speech signals presented over speech maskers separated in the virtualized space through
first-order Ambisonics in a setup with 24 loudspeakers. A high correlation between
participants' performance and their questionnaire responses was observed. The results
showed that increased virtualized reverberation time negatively impacts speech
intelligibility and listening effort.
A new hybrid auralization method was proposed, merging the investigated techniques that
presented complementary spatial sound features. The method was derived through room
acoustics concepts and a specific objective parameter derived from the room impulse
response called Center Time. The verification of the binaural cues was carried out with
three different (simulated) rooms. As validation with test subjects was not possible due
to the COVID-19 pandemic, a psychoacoustic model was implemented to estimate the spatial
accuracy of the method within a four-loudspeaker setup. The same verification and model
estimation were also performed with the introduction of hearing aids. The results showed
that the hybrid method with four loudspeakers can be considered for audiological tests,
within some limitations: the setup can provide binaural cues up to a maximum ambiguity
angle of 30 degrees in the horizontal plane for a centered listener.
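The binaural-cue verification mentioned above rests on standard estimators; a common way
to estimate the interaural time difference (ITD) is to take the lag that maximizes the
cross-correlation between the two ear signals. The sketch below applies that estimator to
a synthetic click pair; it illustrates the general technique only, not the exact analysis
pipeline of the studies.

```python
import numpy as np

def itd_from_pair(left, right, fs):
    """Estimate the interaural time difference (seconds) as the lag
    that maximizes the cross-correlation of the two ear signals.
    Positive values mean the left-ear signal arrives first."""
    corr = np.correlate(right, left, mode="full")
    lag = np.argmax(corr) - (len(left) - 1)  # right-ear delay in samples
    return lag / fs

fs = 48000
# Synthetic ear signals: the same click, 10 samples later at the right ear.
left = np.zeros(256)
left[50] = 1.0
right = np.zeros(256)
right[60] = 1.0
itd = itd_from_pair(left, right, fs)  # 10 samples at 48 kHz (~208 us)
```

In practice the ear signals would be band-limited binaural recordings or BRIRs rather than
ideal impulses, and the search is usually restricted to physically plausible lags (roughly
below 1 ms for a human head).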
Introduction
Individuals with normal hearing can often effortlessly comprehend complex listening
scenarios involving multiple sound sources, background noise, and echoes. However, those
with hearing loss may find these situations particularly challenging. Such environments
are commonly encountered in daily life, particularly during social events, and they can
negatively impact the communication abilities of individuals with hearing loss. The
difficulties associated with understanding complex listening scenarios can be a
significant barrier for individuals with hearing loss, leading to reduced participation in
social activities.
Several hearing research laboratories worldwide are developing systems to realistically
simulate challenging scenarios through virtualization, to better understand and help with
these everyday challenges. The virtualization of sound sources is a powerful tool for
auditory research, capable of achieving a high level of detail, but current methods use
expensive, expansive technology. In this work, a new auralization method has been
developed to achieve sound spatialization with reduced hardware requirements, making
virtualization at the clinical level possible.
Key Chapters
Chapter 2: Literature Review
Examines previous work in virtualization and auralization, basic concepts of human sound
perception, room acoustics, and loudspeaker-based virtualization.
Chapter 3: Investigation of Binaural Cue Distortions
Compares VBAP and Ambisonics methods through a calibrated virtualization setup in terms
of
spatial distortions and examines spatial cues with a second listener.
Chapter 4: Behavioral Study
Examines subjective effort within virtualized sound scenarios (first-order Ambisonics),
focusing on how signal-to-noise ratio (SNR) and reverberation affect listening effort in
speech-in-noise tasks.
Chapter 5: The Iceberg Method
Proposes a hybrid auralization method combining VBAP and Ambisonics for small
reproduction
systems (four loudspeakers), evaluated with objective parameters and hearing aids.
Conclusion
Throughout this study, a new auralization method called Iceberg was conceptualized and
compared to well-known methods, including VBAP and first-order Ambisonics, using objective
parameters. The Iceberg method is innovative in that it uses Center Time (TS) to find the
transition point between early and late reflections, in order to split the Ambisonics
impulse responses and distribute them adequately. In this proposed method, VBAP is
responsible for localization cues, while Ambisonics contributes to the sense of immersion.
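The splitting step can be sketched from the standard definition of center time,
t_s = ∫ t·p²(t) dt / ∫ p²(t) dt, the energy-weighted mean arrival time of the room impulse
response. The code below computes t_s for a synthetic decaying-noise RIR and splits the
response there; it is an illustration of the idea only, not the thesis's exact
implementation (which may, for example, reference time to the direct-sound arrival).

```python
import numpy as np

def center_time(rir, fs):
    """Center time t_s (seconds): energy-weighted mean arrival time of
    the room impulse response (first moment of the squared response)."""
    t = np.arange(len(rir)) / fs
    energy = rir ** 2
    return float(np.sum(t * energy) / np.sum(energy))

def split_at_center_time(rir, fs):
    """Split an RIR at t_s into an early part (panned with VBAP for
    localization) and a late part (rendered with Ambisonics for
    immersion), following the hybrid scheme sketched above."""
    k = int(round(center_time(rir, fs) * fs))
    return rir[:k], rir[k:]

fs = 48000
rng = np.random.default_rng(0)
# Synthetic RIR: exponentially decaying noise with a ~0.5 s tail.
t = np.arange(int(0.5 * fs)) / fs
rir = rng.standard_normal(t.size) * np.exp(-6.9 * t / 0.5)
early, late = split_at_center_time(rir, fs)
```

For an exponentially decaying response, t_s lands early in the decay, so most of the
direct sound and early reflections go to the localization branch while the diffuse tail
feeds the immersion branch.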
In the center position, the Iceberg method matched the localization accuracy of the other
methods while adding to the sense of immersion. A second listener added to the side did
not introduce undesired effects into the auralization. Additionally, it was found that
virtualizing sound sources with Ambisonics can impose limitations on a participant's
behavior in a listening-in-noise test, due to its sweet spot. However, these limitations
can be circumvented and the approach extended to Iceberg, yielding subjective responses
that align with behavioral performance in speech intelligibility tests while increasing
localization accuracy.
Iceberg: A loudspeaker-based room auralization method for auditory research
Sergio Luiz Aguirre
Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy
Hearing Sciences – Scottish Section, School of Medicine, University of Nottingham
Supervised by William M. Whitmer, Lars Bramsløw, & Graham Naylor
2022

There is no "nonspatial hearing" — Jens Blauert

Acknowledgements
Thank you, Obrigado, Gracias, Grazie, Tak skal du have, Dank u zeer & Danke sehr

Firstly, I would like to express my gratitude to my supervisors, Drs. Bill Whitmer, Lars
Bramsløw, and Graham Naylor, for their guidance, expertise, and remarkable patience
throughout this process. Your support and mentorship have been invaluable in helping me to
fill my knowledge gaps, tirelessly encouraging me to ask the right questions, and guiding
me to produce high-quality scientific research. I also thank Dr. Thomas Lunner for his
initial guidance, insightful questions, and comments.

Thank you to the special people at Eriksholm Research Centre in Denmark and the people at
Hearing Sciences – Scottish Section. Working with such fantastic top teams has been a joy
and a privilege. A special thank you to Jette, Michael, Bo, Niels, Claus, Dorothea, Sergi,
Jeppe, James, Lorenz, Johannes, and Hamish. Thanks to all the people involved in HEAR-ECO
for their hard work, especially Hidde, Beth, Patrycja, Tirdad, and Defne.

I am deeply grateful to my sweet wife, Lilian, for the love, encouragement, and support
she has given me throughout this journey. Her unwavering support has been an enduring seed
of resilience and inspiration. I cannot thank her enough for being such an integral part
of my life. Thanks to my friends Math and Gil, for always being there.

Special thanks to my former professors, Drs. Arcanjo Lenzi, William D'Andrea Fonseca, Eric
Brandão, Paulo Mareze, Stephan Paul, and Bruno Sanches Masiero, for stimulating critical
thinking and for all the support, knowledge, and encouragement. Thank you also to my
former Oticon Medical colleagues Simon Ziska Krogholt, Patrick Maas, Brian Skov, and Jens
T. Balslev for their support.

I would like to thank you, my Professor, Professora Dra. Dinara Xavier Paixão. All of this
is possible because of you and your determination to create an official undergraduate
course in Acoustical Engineering in Brazil. This course is praised for forming remarkable
professionals who are recognized worldwide. It is not just my dream that you have made
possible, but that of the countless people for whom this course has been a life-changing
experience. We know that this was a collective effort, but your role was vital. Your way
of showing that politics is a part of everything, and that we need to be gentle but
correct, made all the difference. Thank you. Muito Obrigado.

I sincerely thank my friends and colleagues from my undergraduate studies in acoustical
engineering (UFSM/EAC) and the master's (UFSC/LVA). Your support, happiness, patience, and
encouragement have been invaluable throughout this journey. Thank you for helping me to
develop my skills and knowledge and for being such a positive influence on my academic
career. I am deeply grateful for all you have done for me and look forward to continuing
our professional relationship.

I want to thank the oldest friends, Fabrício, André, Juliano, and the Panteon. I value the
bond and history that we share. Thank you for being such wonderful friends.

I want to express my heartfelt thanks to the Brazilian CNPq and the government
(Lula/Dilma) policies that support students with low income and from public schools. With
their financial assistance, I could pursue my studies and achieve my goals. I am deeply
grateful for their support and the opportunity to receive a quality education.

I would also like to express my gratitude to Marie Skłodowska-Curie Actions for their
support of my doctoral education. Their reference programme for doctoral education has
provided me with invaluable resources and opportunities, and I am extremely grateful for
their support. Thank you for helping me achieve my goals and being a valuable part of my
academic journey.

I want to express my gratitude to all those who will read this thesis in the future. Your
time and attention are greatly appreciated. I wish you a good reading experience and hope
that you will find the ideas and research presented in this work to be both
thought-provoking and beneficial. Thank you again for considering this work.

Author's Declaration
This thesis is the result of the author's original research. Chapter 4 is a collaboration
work with Tirdad Seifi-Ala. The author has composed it, and it has not been previously
submitted for any other academic qualification. This project has received funding from the
European Union's Horizon 2020 research and innovation programme under the Marie
Skłodowska-Curie grant agreement No 765329; the funder had no role in study design.

Sergio Luiz Aguirre

Contents
Abstract
Acknowledgements
Author's Declaration
Nomenclature
1 Introduction
  1.1 Motivations
  1.2 Aims and Scope
  1.3 Contributions
  1.4 Organization of the Thesis
2 Literature Review
  2.1 Introduction
  2.2 Human Binaural Hearing
    2.2.1 Spatial Hearing Concepts
    2.2.2 Binaural cues
    2.2.3 Monaural cues
    2.2.4 Head-related transfer function
    2.2.5 Subjective aspects of an audible reflection
  2.3 Spatial Sound & Virtual Acoustics
    2.3.1 Virtualization
      2.3.1.1 Auralization
      2.3.1.2 Reproduction
    2.3.2 Auralization Paradigms
      2.3.2.1 Binaural
      2.3.2.2 Panorama
      2.3.2.3 Sound Field Synthesis
    2.3.3 Room acoustics
      2.3.3.1 Room acoustics parameters
      2.3.3.2 Reverberation Time
      2.3.3.3 Clarity and Definition
      2.3.3.4 Center Time
      2.3.3.5 Parameters related to spatiality
    2.3.4 Loudspeaker-based Virtualization in Auditory Research
      2.3.4.1 Hybrid Methods
      2.3.4.2 Sound Source Localization
  2.4 Listening Effort Assessment
  2.5 Concluding Remarks
3 Binaural cue distortions in virtualized Ambisonics and VBAP
  3.1 Introduction
  3.2 Methods
    3.2.1 Setups and system characterization
      3.2.1.1 Reverberation time
      3.2.1.2 Early-reflections
    3.2.2 Procedure
    3.2.3 Calibration
    3.2.4 VBAP Auralization
    3.2.5 Ambisonics Auralization
  3.3 Results
    3.3.1 Analysis
    3.3.2 Centered position
      3.3.2.1 Centered ITD
      3.3.2.2 Centered ILD
    3.3.3 Off-centered position
      3.3.3.1 Off-center ITD
      3.3.3.2 Off-center ILD
  3.4 Discussion
  3.5 Concluding Remarks
4 Subjective Effort within Virtualized Sound Scenarios
  4.1 Introduction
  4.2 Methods
    4.2.1 Participants
    4.2.2 Stimuli
    4.2.3 Apparatus
    4.2.4 Auralization
    4.2.5 Procedure
    4.2.6 Questionnaire
    4.2.7 Statistics
  4.3 Results
  4.4 Discussion
  4.5 Concluding Remarks
5 Iceberg: A Hybrid Auralization Method Focused on Compact Setups
  5.1 Introduction
  5.2 Iceberg, a Hybrid Auralization Method
    5.2.1 Motivation
    5.2.2 Method
      5.2.2.1 Components
      5.2.2.2 Energy Balance
      5.2.2.3 Iceberg proposition
    5.2.3 Setup Equalization & Calibration
  5.3 System Characterization
    5.3.1 Experimental Setup
    5.3.2 Virtualized RIRs & BRIRs
    5.3.3 Conditions
    5.3.4 Reverberation Time
  5.4 Main Results
    5.4.1 Centered Position
      5.4.1.1 Interaural Time Difference
      5.4.1.2 Interaural Level Difference
      5.4.1.3 Azimuth Estimation
    5.4.2 Off-Center Positions
      5.4.2.1 Interaural Time Difference
      5.4.2.2 Interaural Level Difference
      5.4.2.3 Azimuth Estimation
    5.4.3 Centered Accompanied by a Second Listener
      5.4.3.1 Interaural Time Difference
      5.4.3.2 Interaural Level Difference
      5.4.3.3 Azimuth Estimation
  5.5 Supplementary Test Results
    5.5.1 Centered Position (Aided)
      5.5.1.1 Interaural Time Difference
      5.5.1.2 Interaural Level Difference
      5.5.1.3 Azimuth Estimation
    5.5.2 Off-center Positions (Aided)
      5.5.2.1 Interaural Time Difference
      5.5.2.2 Interaural Level Difference
      5.5.2.3 Azimuth Estimation
  5.6 Discussion
    5.6.1 Subjective impressions
    5.6.2 Advantages and Limitations
    5.6.3 Study limitations and Future Work
  5.7 Concluding Remarks
6 Conclusion
  6.1 Iceberg
  6.2 General Discussion
    6.2.1 Iceberg capabilities
    6.2.2 Iceberg & Second Joint Listener
    6.2.3 Iceberg: Listener Wearing Hearing Aids
    6.2.4 Iceberg Limitations
  6.3 General Conclusion
  6.4 Main Contributions
Bibliography
Appendices
A ITDs Ambisonics
B Delta ILD Ambisonics
C Wave Equation and Spherical Harmonic Representation
  C.1 Wave Equation in Spherical Coordinates
  C.2 Separation of the Variables
  C.3 Spherical Harmonics
D Reverberation time in Acoustic Simulation
E Alpha Coefficients
F Questionnaire

List of Tables
2.1 Non-exhaustive overview list of hybrid auralization methods proposed in the
    literature. The A-B order of the techniques does not represent any order of
    significance.
2.2 Overview of Localization Error Estimates or Measurements from Loudspeaker-Based
    Virtualization Systems Using Various Auralization Methods.
3.1 Sound pressure level difference between direct sound and early reflections
    ∆SPL [dB]
4.1 The questionnaire for subjective ratings of performance, effort and engagement
    (English translation from Danish)
4.2 Results of linear mixed model based on SNR and RT predictors estimates of the
    questionnaire.
4.3 Pearson skipped correlations between performance and self-reported questions.
5.1 Reverberation Time in three virtualized environments
5.2 One-way ANOVA, columns are absolute difference between estimated and reference
    angles for different KEMAR positions and RTs.
5.3 Hearing Level in dB according to the proposed Standard Audiograms
5.4 One-way ANOVA, columns are absolute difference between estimated and reference
    angles for different positions and RTs.
5.5 Maximum ITD according to displacement

List of Figures
2.1 Two-dimensional representation of the cone of confusion.
2.2 A descriptive definition of the measured free-field HRTF for a given angle.
2.3 Polar coordinate system related to head incidence angles
2.4 Head-related transfer functions of four human test participants, frontal incidence
2.5 Audible effects of a single reflection
2.6 Binaural reproduction setups
2.7 Vector-based amplitude panning: 2D display of sound sources positions and weights.
2.8 Diagram representing the placement of speakers in the VBAP technique
2.9 Spherical Harmonics Y_n^m(θ, φ).
2.10 B-format components: omnidirectional pressure component W, and the three velocity
     components X, Y, Z.
2.11 Illustration of Huygens' Principle of a propagating wavefront.
2.12 Normalized Room Impulse Response: example from a real room in the time domain
     (left), and in the time domain in dB (right).
2.13 LoRa implementation processing diagram
3.1 Hearing Sciences - Scottish Section Test Room.
3.2 Eriksholm Test Room.
3.3 Reverberation time in third of octave
3.4 HATS and KEMAR inside test room in Glasgow
3.5 HATS and KEMAR inside Eriksholm's Anechoic Room
3.6 Description of experiment's measured positions and mannequin placement
3.7 Interaural cross correlation - Frontal angle
3.8 Polar representation IACC
3.9 Interaural Time Difference by angle: VBAP accompanied
3.10 Ambisonics - directivity representation in 2D
3.11 Interaural Time Difference by angle: Ambisonics accompanied
3.12 Interaural Level Differences: VBAP and Ambisonics
3.13 Interaural Level Differences: averaged octave bands as a function of azimuth angle
     for a HATS Brüel and Kjær TYPE 4128-C in the horizontal plane.
3.14 Interaural Level Differences with additional listener (VBAP and Ambisonics)
3.15 Discrepancies in Interaural Level Differences (VBAP)
3.16 VBAP Interaural Level Differences as function of azimuth angle around the centered
     listener.
3.17 Discrepancies in Interaural Level Differences: Ambisonics
3.18 Ambisonics Interaural Level Differences as function of azimuth angle around the
     centered listener.
3.19 VBAP Off-center ITD HATS 25 cm
3.20 VBAP Off-center ITD HATS 50 cm
3.21 VBAP Off-center ITD HATS 75 cm
3.22 VBAP Off-center ITD displaced HATS
3.23 VBAP ITD considering real sound sources only
3.24 Ambisonics ITD as a function of source angle
3.25 ILD (VBAP and Ambisonics) in off-center setups
3.26 ILD centered setup and off-center VBAP setups
3.27 Differences in the ILD between centered and off-center (25 cm) in VBAP setups
3.28 Differences in the ILD between centered and off-center (50 cm) in VBAP setups
3.29 Differences in the ILD between centered and off-center (75 cm) in VBAP setups
3.30 ILD centered setup and off-center Ambisonics setups
4.1 Auralization procedure implemented to create mixed audible HINT sentences with 4
    spatially separated talkers at the sides and back (maskers) and one target in front.
4.2 Spatial setup of the experiment
4.3 Eriksholm Anechoic Room: Reverberation Time
4.4 Eriksholm Anechoic Room: Background noise
4.5 Experiment setup placed inside anechoic room.
4.6 Overall reverberation time (RT) as a function of receptor (head) position in the
    mid-sagittal plane re center (0 cm)
4.7 Sound pressure level Ambisonics virtualized setup
4.8 Participant positioned for the test.
4.9 Experiment's trial design
4.10 Graphic User Interface
4.11 Performance accuracy (word-scoring)
4.12 Self-reported subjective intelligibility
4.13 Self-reported subjective effort
4.14 Self-reported subjective disengagement
5.1 Top view. Loudspeakers position on
horizontal plane to virtu- alization with prop osed Iceb erg metho d. . . . . . . . . . . . . .
123 5.2 Normalized Ambisonics first-order RIR generated via ODEON soft w are. Left panel depicts
the w a v eform; right panel depicts the wa veform in dB. . . . . . . . . . . . . . . . . . . .
. . . . . 125 5.3 Reflectogram split into Direct Sound Early and Late Reflections. 126 5.4 Iceb
erg’s pro cessing Blo c k diagram. The Ambisonics RIR is treated, split, and con v olv ed to an
input signal. A virtual audi- tory scene can b e created by playing the multi-c hannel output
signal with the appropriate setup.. . . . . . . . . . . . . . . . . 129 5.5 Omnidirectional c
hannel of Ambisonics RIR for a sim ulated ro om. 129 5.6 RIR Ambisonics segments . . . . . . . .
. . . . . . . . . . . . . 130 5.7 Example of signal auralized with the Iceb erg metho d . . . .
. . 132 5.8 Loudsp eak er frequency resp onse comparison . . . . . . . . . . . 135 5.9 Loudsp
eak ers normalized frequency resp onse . . . . . . . . . . . 135 5.10 Loudsp eak ers normalized
frequency resp onse filtered . . . . . . . 136 5.11 BRIR/RIR acquisition flow chart: Iceb erg
auralization metho d. . 141 5.12 BRIR measurement setup: B&K HA TS and KEMAR p ositioned
inside the anechoic ro om. . . . . . . . . . . . . . . . . . . . . . . 141 5.13 Measuremen t p
ositions (grid) . . . . . . . . . . . . . . . . . . . 142 5.14 R T within Iceb erg virtualized
en vironmen t . . . . . . . . . . . . 144 5.15 Iceb erg Cen tered Interaural Time Difference . .
. . . . . . . . . 145 LIST OF FIGURES xix 5.16 Iceb erg Centered In teraural Time Difference R Ts
= 0, 0.5 and 1.1 s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 5.17
Iceb erg cen tered ILD . . . . . . . . . . . . . . . . . . . . . . . . 147 5.18 Iceb erg and
Real loudsp eak ers ILDs as a function of azimuth angle . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . 148 5.19 Iceb erg Heatmap Absolute ∆ ILD . . . . . . . . . . . . . . .
. 148 5.20 Iceb erg: Heatmap Absolute ∆ ILD min us JND . . . . . . . . . . 149 5.21 Iceb erg
metho d: Estimated azimuth angle . . . . . . . . . . . . 150 5.22 Iceb erg ITD F rontal
displacement . . . . . . . . . . . . . . . . . 152 5.23 Iceb erg ITD F rontal and lateral
displacement . . . . . . . . . . 152 5.24 Delta Interaural Level Differences R T = 0.0 s . . . .
. . . . . . 153 5.25 Delta Interaural Level Differences R T = 0.5 s . . . . . . . . . . 154 5.26
Delta Interaural Level Differences R T = 1.1 s . . . . . . . . . . 155 5.27 Estimated (mo del by
Ma y and Kohlrausc h [ 182 ]) frontal az- im uth angle at different p ositions inside the loudsp
eak er ring as function of the target angle. . . . . . . . . . . . . . . . . . . . . 156 5.28
ITD with second listener present . . . . . . . . . . . . . . . . . 157 5.29 Delta Interaural
Level Differences Cen tered+Second Listener . . 158 5.30 Estimated lo calization error with
presence of a second listener . 160 5.31 Difference to target in estimated lo calization with
presence of a second listener . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 5.32
Estimated error to R T=0 considering the estimation of real loudsp eak ers as basis. . . . . . .
. . . . . . . . . . . . . . . . . . 162 5.33 HA TS w earing a Oticon Hearing Device (righ t ear)
. . . . . . . 164 5.34 In teraural Time Difference Iceb erg metho d (aided) . . . . . . . . 165
5.35 Iceb erg metho d ILD (aided condition) . . . . . . . . . . . . . . 166 5.36 Azim uth angle
estimation (aided condition) . . . . . . . . . . . 167 5.37 Absolute difference in estimated
azimuth angle (aided condition) 168 5.38 T ukey test to compare means aided condition. . . . . .
. . . . . 169 5.39 Iceb erg metho d off center ITD (aided condition) . . . . . . . . . 170 5.40
Delta Interaural Level Differences Aided R T = 0.0 s . . . . . . . 172 5.41 Delta Interaural
Level Differences Aided R T = 0.5 s . . . . . . . 172 5.42 Delta Interaural Level Differences
Aided R T = 1.1 s . . . . . . . 173 5.43 Estimated fron tal azim uth angle on different p
ositions (aided condition) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 A.1
ITD as a function of source angle Ambisonics . . . . . . . . . . 225 B.1 Differences in the ILD
Am bisonics, 25 cm . . . . . . . . . . . . . 226 B.2 Differences in the ILD Am bisonics, 50 cm .
. . . . . . . . . . . . 227 B.3 Differences in the ILD Am bisonics, 75 cm . . . . . . . . . . . .
. 227 D.1 Reverberation time (a) Classro om (b) Restaurant . . . . . . . . 231 E.1 Classro om
alpha co efficien ts . . . . . . . . . . . . . . . . . . . . 232 E.2 Restaurant alpha co efficien ts
. . . . . . . . . . . . . . . . . . . . 233 E.3 Anechoic alpha co efficien ts . . . . . . . . . .
. . . . . . . . . . . 233 xx Nomenclature General Sym b ols C 50 Clarit y: the ratio b et ween
the first 50 ms of the RIR and from 50 ms to the end, Eq. (2.12), page 36 . C 80 Clarit y: the
ratio b etw een first 80 ms of the RIR and RIR from 80 ms to the end, Eq. (2.12), page 36 . D 50
Clarit y: ratio b et w een first RIR 50 ms of a RIR and the complete RIR, Eq. (2.14), page 37 . D
80 Clarit y: ratio b etw een first 80 ms of a RIR and the the complete RIR, Eq. (2.14), page 37 .
g Gain matrix, page 25 . h ( t ) Impulse resp onse energy in time domain, Eq. (2.10), page 35 .
h b ( t ) RIR measured with a pressure gradient microphone, Eq. (2.16), page 38 . h L ( t )
Impulse resp onses collected from the left ear, page 39 . h R ( t ) Impulse resp onses collected
from the right ear, page 39 . l 1 V ector from center p oint to c hannel 1, Eq. (2.3), page 25 .
l 2 V ector from center p oint to c hannel 2, Eq. (2.3), page 25 . L 12 Sp eak er P osition
Matrix (Channels), page 25 . m Am bisonics comp onen ts order, Eq. (2.9), page 30 . N n um b er
of necessary sources to Ambisonics repro duction, Eq. (2.9), page 30 . p V ector from center p
oint to virtual font, Eq. (2.3), page 25 . xxi p p-v alue for the t-statistic of the hypothesis
test that the corresp ond- ing co efficien t is equal to zero or not., page 114 . p n L ( t )
Bandpassed left impulse resp onse, Eq. (3.7), page 71 . p n R ( t ) Bandpassed righ t impulse
resp onse, Eq. (3.7), page 71 . p L ( t ) Impulse response at the en trance of the left ear
canal, Eq. (3.5), page 70 . p R ( t ) Impulse resp onse at the en trance of the right ear canal,
Eq. (3.5), page 70 . R T Rev erb eration time, page 34 . R T 60 Rev erb eration time, page 34 .
s ( t ) Arbitrary sound source signal, page 29 . S l ( t ) Time signal recorded with the set
microphone and the loudsp eaker l , page 66 . S E Standard error of the co efficien ts, page 114 .
T 20 Rev erb eration Time ( T 60 ) extrap olated from 25 dB of energy deca y, Eq. (2.10), page
35 . T 30 Rev erb eration Time ( T 60 ) extrap olated from 35 dB of energy deca y, Eq. (2.10),
page 35 . t Time, page 12 . t s Cen ter Time, Eq. (2.15), page 37 . T 60 Rev erb eration Time,
Eq. (2.10), page 34 . v ( t ) 1kHz Sin usoidal 1 k Hz signal recorded from the calibrator in
VFS, Eq. (3.2), page 66 . V V olume of the ro om, Eq. (2.10), page 34 . v l ( t ) Calibrator
signal recorded in the left ear, Eq. (3.1), page 65 . v r ( t ) Calibrator signal recorded in
the right ear, Eq. (3.1), page 65 . Greek Sym b ols α l, rms Calibration factor for the left
ear, Eq. (3.1), page 65 . xxii α r, rms Calibration factor for the right ear, Eq. (3.1), page 65
. ¯ α Av eraged Absorption Co efficient, Eq. (2.10), page 34 . Γ l Lev el factor to the loudsp
eaker l , Eq. (3.3), page 66 . ω Angular frequency, page 12 . ϕ Elev ation angle related to the
ears axis of the listener, page 29 . θ Azimuthal angle related to the ears axis of the listener,
page 29 . Mathematical Op erators and Con v en tions β Fixed-effects regression co efficient, page
114 . e Exp onen tial function, where e (1) ≈ 2 , 7182, page 12 . R In tegral, page 12 . j √ −
1, imaginary op erator, page 12 . τ Time delay, Eq. (2.18), page 39 . t x t-statistic for each
co efficien t to test the n ull hypothesis, page 114 . Y m n ( θ , ϕ ) Spherical harmonics
function of order n and degree m , Eq. (2.7), page 29 . L eq Equiv alen t contin uous sound lev
el, page 103 . max() F unction that returns the elemen t with the maximum v alue for a sequence
of num b ers, or for a vector, Eq. (2.18), page 39 . RMS() Ro ot mean square, Eq. (3.2), page 66
. Acron yms and Abbreviations 2D Tw o-dimensions in space, page 24 . 3D Three-dimensions in
space, page 24 . vs. F rom Latin V ersus is the past participle of vertar e . which means
“against” and “as opp osed or compared to., page 81 . AD/D A Analog-to-Digital Digital-to-Analog
conv erter, page 59 . AR Augmen ted reality, page 27 . xxiii ASW Apparen t Source Width, page 38
. BRIR Binaural ro om impulse resp onse, page 15 . CTC Cross-talk cancellation, page 23 . dB HL
Hearing Loss in decib els, page 102 . DBAP Distance-Based Amplitude Panning, page 47 . DS Direct
sound, page 15 . EcoEG Combination study num b er 3: Eco (Reverberation/Ecological) and EEG,
page 97 . EEG Electro encephalogram, page 50 . FFRs Brainstem frequency resp onses, page 50 .
FFT F ast F ourier transform, page 70 . FIR Finite Impulse Resp onse, page 71 . HA TS Head and
torso sim ulator, page 60 . HC B& K 4128 HA TS at cen ter p osition, page 79 . HC K+ X
B& K 4128 HA TS at cen ter p osition and KEMAR at X cm to the left, page 80 . HC K- X
B& K 4128 HA TS at center p osition and KEMAR at X cm to the righ t, page 79 . HEAR- ECO
Inno v ativ e Hearing Aid Researc h – Ecological Conditions and Out- come Measures, page 97 .
HINT Hearing in Noise T est, page 103 . HO A Higher Order Ambisonics, page 31 . HR TF
Head-Related T ransfer F unction, page 12 . IA CC Interaural Cross-Correlation Co efficien t, page
39 . IA CF In teraural cross-correlation function, Eq. (3.5), page 70 . ILD In teraural Level
Difference, page 10 . IPD In teraural Phase Difference, page 10 . xxiv ITD In teraural Time
Difference, page 10 . ITF Interaural T ransfer F unction, page 14 . JND Just noticeable
difference, page 93 . KEMAR Kno wles Electronics Manikin for Acoustic Research, page 60 . LEF
Lateral Energy F raction, Eq. (2.16), page 38 . LEV Listener Env elopment , page 39 . LG Lateral
Strength, Eq. (2.17), page 39 . LMM Linear Mixed-effect Mo del, page 113 . LPF Lo w-pass filter,
page 73 . L TI Linear and Time-Inv arian t System, page 33 . MD AP Multiple-Direction Amplitude
Panning, page 47 . MO A Mixed Order Ambisonics, page 52 . MTF Monaural T ransfer F unction, page
13 . NSP Nearest Sp eak er, page 52 . PLE P erceptual Lo calization Error, page 41 . PT A4 F our
bands pure tone audiometry, page 102 . RIRs Ro om impulse resp onse, page 15 . SH Spherical
Harmonics, page 28 . SPL Sound pressure lev el, page 35 . SR T Sp eec h Reception Threshold,
page 52 . VBAP V ector-Based Amplitude P anning, page 23 . VBIP V ector-Based In tensit y
Panning, page 41 . VFS V olts full scale, page 66 . VSE Virtual Sound Environmen t, page 18 . W
Omnidirectional channel, Eq. (2.9), page 29 . WFS W av e Field Syn thesis, page 31 . xxv LIST OF
FIGURES xxvi X Bi-directional pattern c hannel to wards the source, Eq. (2.9), page 29 . Y
Bi-directional pattern channel perp endicular to the source in az- im uth, Eq. (2.9), page 29 .
Z Bi-directional pattern channel p erp endicular to the source in elev a- tion, Eq. (2.9), page
29 . Chapter 1 In tro duction Individuals with normal hearing often can effortlessly comprehend
complex listening scenarios involving multiple sound sources, background noise, and echoes [226]. However, those with hearing loss may find these situations particularly challenging [273, 289, 304, 317]. These environments are commonly encountered in daily life, particularly during social events, and they can negatively impact the communication abilities of individuals with hearing loss [137, 260]. The difficulties associated with understanding complex listening scenarios can be a significant barrier for individuals with hearing loss, leading to reduced participation in social activities [16, 63, 119].

1.1 Motivations

Several hearing research laboratories worldwide are developing systems that realistically simulate challenging scenarios through virtualization, to better understand and help with these everyday challenges in people's lives [41, 79, 102, 116, 118, 160, 161, 188, 195, 218–220, 259, 272, 298]. The virtualization of sound sources is a powerful tool for auditory research, capable of achieving a high level of detail, but current methods use expensive, expansive technology [293]. In this work, a new auralization method has been developed to achieve sound spatialization with a reduced hardware requirement, making virtualization at the clinic level possible.

1.2 Aims and Scope

The main objective of this research was to investigate various parameters of sound virtualization methods related to their localization accuracy, with a focus on perceptually based methods [39], in their optimal but also in challenging conditions. Furthermore, a new auralization method oriented to a smaller setup is proposed to reduce hardware requirements. The specific objectives were:

• To investigate spatial distortions through binaural cue differences in two well-known virtualization setups: Vector-Based Amplitude Panning (VBAP) and Ambisonics.
• To investigate the influence of a second listener inside the sound field (VBAP and Ambisonics).
• To evaluate the feasibility of a speech-in-noise test within Ambisonics virtualized reverberant rooms.
• To study the relation between reverberation, signal-to-noise ratio (SNR), and listening effort in environments virtualized in first-order Ambisonics.
• To propose an auralization method using four loudspeakers, measure it, and analyze its binaural cues, objective level, and reverberation time against existing methods.
• To test and analyze the influence of acquiring signals with hearing aid microphones in virtualized scenes using the new auralization method with a four-loudspeaker setup.

1.3 Contributions

The main contribution of this research to
the scientific field of auditory perception is the development of a new auralization method that addresses the current gap in the virtualization of sound sources using a small number of loudspeakers. Specifically, this method aims to achieve both good localization accuracy and a high level of immersion simultaneously, which has been a challenge in previous approaches. Furthermore, the proposed method combines existing techniques and can be implemented using readily available hardware, requiring a minimum of four loudspeakers. This makes it more accessible for audiologists and researchers to create realistic listening scenarios for patients and participants while reducing the technical resources required for implementation. Overall, this work represents a valuable contribution to the field of auditory perception and has the potential to advance the understanding of spatial hearing and the development of effective hearing solutions.

1.4 Organization of the Thesis

In Chapter 2, a review examines previous work carried out in several different areas concerning virtualization and the auralization of sound sources. The chapter starts with an overview of the basic concepts of human sound perception. Next, virtual acoustics are explored, reviewing the generation of virtual acoustic environments using different rendering paradigms and methods. In addition, relevant room acoustics concepts and objective parameters, and their relation to hearing perception, are described. Finally, the review considers auralization and virtualization as applied to auditory research. This review stresses the importance of virtual sound sources for greater realism and ecological validity in auditory research, and the challenges of adequately creating a virtual environment focused on auditory research.

Chapter 3 presents an investigation of binaural cue distortions in imperfect setups. First, the methods are described, including the complete auralization of signals using two different methods and the calibration of the system. The investigation first compares both auralization methods through the same calibrated virtualization setup in terms of spatial distortions. Then the spatial cues are examined with the addition of a second listener to the virtualized sound field. Both investigations are performed with the primary listener on and off-center.

In Chapter 4, a behavioral study examines subjective effort within virtualized sound scenarios. As the study was part of a collaborative project, only one auralization method was selected: first-order Ambisonics. The aim was to examine how SNR and reverberation combine to affect effort in a speech-in-noise task. The feasibility of using first-order Ambisonics was also examined; however, the sound sources were well separated in space, and localization accuracy was not a factor. An important aspect of the study was an auralization issue involving head movement, observed during pilot data collection. This issue led to a solution that allowed the study to continue. The results verified the relationships between subjective effort and acoustic demand. Furthermore, this issue led to the further investigation of the effect of off-center listening, considered in both Chapter 3 and Chapter 5.

In Chapter 5, a hybrid method of auralization is proposed, combining the methods examined and used in previous chapters: VBAP and Ambisonics. This method was designed to allow auralized signals to be virtualized in a small reproduction system, thus providing better accessibility to research within the virtualized sound field in clinics and research centers that do not have a sizeable acoustic apparatus. The hybrid auralization method aims to unite the strengths of both techniques: localization by VBAP and immersion by Ambisonics. Both of these psychoacoustic strengths are related to the room's impulse response. The hybrid method convolves the desired signal with distinct parts of an Ambisonics-format impulse response that characterizes the desired environment. The potential for generating auralizations for a reproduction system with at least four loudspeakers is demonstrated. The virtualization system was tested with three different scenarios. Parameters relevant to the perception of a scene, such as reverberation time, sound pressure level, and binaural cues, were evaluated at different positions within the speaker arrangement. The effects of a second participant inside the ring were also investigated. The evaluated parameters were as expected with the listener in the system's center (sweet spot). However, deviations and issues at specific presentation angles were identified that could be improved in future implementations. Such errors also need to be further investigated as to their influence on the subjective perception of the scenario, which was not performed due to the COVID-19 pandemic. An alternative robustness assessment was performed offline, examining the localization accuracy with a model proposed by May et al. [182]. The method also proved effective for tests with hearing aids for listeners positioned in the center of the speaker arrangement. However, the method's performance with hearing instruments using compression algorithms and advanced signal processing still needs to be verified.

Chapter 6 presents a general discussion of the feasibility of applying tests using the proposed method and an overview of the processes. In addition, the relevant contributions of the work are presented, as are the limitations and suggestions for further improvements.

Chapter 2

Literature Review

2.1 Introduction

The field of
audiology is concerned with the study of hearing and hearing disorders, as well as the assessment and rehabilitation of individuals with hearing loss [110]. In this review chapter, we will explore various topics related to human binaural hearing, spatial sound, and virtual acoustics to provide a comprehensive overview of the current state of knowledge in these fields and highlight their important contributions to our understanding of hearing and auditory perception. First, we will delve into the intricacies of human binaural hearing. Next, we will examine the concepts of spatial hearing, including the various binaural and monaural cues that contribute to our ability to localize sound in space. We will also explore the head-related transfer function, which describes the way sounds are filtered as they travel from their source to the eardrum, as well as the subjective aspects of audible reflections. Next, we will turn our attention to spatial sound and virtual acoustics. We will discuss the virtualization of sound, including the various methods used to achieve it, such as auralization and virtual sound reproduction. We will also examine the different auralization paradigms used in auditory research, including binaural, panorama, vector-based amplitude panning, Ambisonics, and sound field synthesis. We will then examine the role of room acoustics in virtualization and auditory research, including the various parameters used to describe room acoustics, such as reverberation time, clarity and definition, center time, and parameters related to spatiality. Finally, we will explore the use of loudspeaker-based virtualization in auditory research, including hybrid methods and sound source localization, as well as the assessment of listening effort.
2.2 Human Binaural Hearing

The engineering side of the listening process can be modeled, in simplified form, through two input blocks separated in space [92]. These inputs are limited in frequency and level, and are followed by a signal processing chain that relates the medium transformations of the wave propagation from air to fluid and to electrical pulses [315]. Although this block modeling can be reasonably accurate for educational purposes, it falls short of capturing the true effect and importance of listening on our essence as human beings. The ability to feel and interpret the world through the sense of hearing, and to attribute meaning to sound events, enables humans to enrich their tangible world [56, 244]. For instance, a characteristic sound can evoke memories or trigger an alert [128]. A piece of music can bring tears to one's eyes or persuade someone to purchase more cereal [13, 114]. A person's voice can activate certain facial nerves, turning hidden teeth into a smile. These are some of the reasons why researchers and clinicians dedicate their lives to understanding the transformation of sound events into auditory events, with a scientific dedication focused on creating solutions and opening opportunities for more people to experience the sound they love and deserve: a dedication focused on people and their needs.

As the auditory system comprises two sensors, normal-hearing listeners can experience the benefits of comparing sounds autonomously, relating them to the space around them [21]. This constant signal comparison is the main principle of binaural hearing, where the differences between these sounds allow for the identification of the direction of a sound event, as well as the sensation of sound spatiality [9, 40]. Usually, these signals are assumed to be part of a linear and time-invariant system, which helps in studying how humans interpret the information present in the different signals across the time and frequency domains. However, this assumption of linearity can fail when analyzing fast sound sources, reflective surfaces, or sound propagating through disturbed air [200, 255]. Nonetheless, the advantages of quantifying and capturing the effect have led to significant progress in the hearing sciences.

2.2.1 Spatial Hearing Concepts

Identifying the direction of incidence of a sound source based on the audible waves received by the listener is defined as an act or process of human sound localization [285]. For research in acoustics, it is relevant to acknowledge that the receiver is, in general, a human being. The main anatomical characteristic of the human hearing mechanism is the binaural system: there are two signal reception points (external ears positioned on opposite sides of the head), although the whole set (torso, head, hearing pavilions) can also modify, to some extent, the signal that reaches the two tympanic membranes [153, 216]. Human binaural hearing and its associated effects have been extensively reported by Blauert [38].

In addition to analyzing sound sources' spatial location, the central auditory system extracts real-time information from the sound signals related to the acoustic environment, such as its geometry and physical properties [153]. Another benefit is the possibility of separating and interpreting combined sounds, especially from sources in different directions [170, 242].
2.2.2 Binaural cues

The speed of sound propagation in air can be assumed to be finite and approximately constant, since air is an approximately non-dispersive medium [18]. Thus, when the incidence is not directly frontal or rear, the wavefront travels through different paths to the ears, reaching them at different times. The time interval a sound takes to arrive at both ears is commonly expressed in the literature as the Interaural Time Difference (ITD) [39]. It is a crucial cue for sound source localization of low-frequency sounds [39, 153, 242]; moreover, it is considered the primary localization cue [306]. For continuous pure tones and other periodic signals, the ITD can be expressed as the Interaural Phase Difference (IPD) [285]. On the other hand, most mammals' high-frequency sound source localization is based on a comparative analysis of the sound energy in each ear's frequency bands, the Interaural Level Difference (ILD). The so-called duplex theory surmises that ITD cues are the basis of low-frequency sound localization and ILD cues of high-frequency localization; its authorship is assigned to Lord Rayleigh at the beginning of the last century [246].
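As a concrete illustration of these two cues, the sketch below estimates a broadband ITD as the lag that maximizes the interaural cross-correlation of the two ear signals, and a broadband ILD as the RMS level ratio between the ears. This is a minimal sketch, not the calibrated procedure used later in this work; the sampling rate, the synthetic test signal, and the function names are assumptions made for the example.

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the ITD (s) as the lag that maximizes the
    interaural cross-correlation of the two ear signals."""
    corr = np.correlate(left, right, mode="full")
    lags = np.arange(-(len(right) - 1), len(left))
    return lags[np.argmax(corr)] / fs

def estimate_ild(left, right):
    """Estimate the broadband ILD (dB) as the RMS level ratio
    between the left- and right-ear signals."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    return 20 * np.log10(rms(left) / rms(right))

# Synthetic example: noise reaching the left ear 0.5 ms earlier
# and 6 dB stronger than the right ear (a source on the left).
fs = 48000
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs // 10)
delay = int(0.0005 * fs)                  # 0.5 ms in samples
left = np.concatenate([noise, np.zeros(delay)])
right = np.concatenate([np.zeros(delay), noise]) * 10 ** (-6 / 20)

print(estimate_itd(left, right, fs))      # ≈ -0.0005 s: left ear leads
print(estimate_ild(left, right))          # ≈ 6 dB: left ear stronger
```

With the sign convention above, a negative lag means the left-ear signal leads, consistent with a leftward source.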
These binaural cues are related to the azimuthal position; however, they are not equally successful at explaining localization of elevated positions [37, 250]. An ambiguity in binaural cues, caused by head symmetry and referred to as the cone of confusion [296], can hinder correct sound source localization. The cone of confusion is the imaginary cone extended sideways from each ear on which sound source locations create the same interaural differences (see Figure 2.1).

Figure 2.1: Two-dimensional representation of the cone of confusion.

Head movements are essential for resolving the ambiguous cues from sound sources located on the cone of confusion. As a person moves their head, they change the reference and the incidence angle, helping them resolve the duality. This change is reflected in the cues associated with the directional sound filtering caused by the human body's reflection, absorption, and diffraction.

2.2.3 Monaural cues

Monaural cues are related to spatial impression, especially in the localization of elevated sound sources. These cues give, to some extent, limited but crucial localization abilities to people with unilateral hearing loss [72, 307]. This type of cue is centered on instantaneous level comparison and frequency changes. As the level of a sufficiently continuous sound source changes, the approach or retreat of that source can be estimated. Furthermore, when head movements shape the frequency content, the disturbance, mainly provided by the pinnae, can help the listener learn the position of a sound source [129, 292]. In addition, the importance of prior knowledge of the sound for the deconvolution process has also been investigated, revealing mixed results [307].
2.2.4 Head-related transfer function The Head-Related T ransfer F unction (HR TF) describ es the
directional filtering of incoming sound due to h uman bo dy parts suc h as the head and pinnae [
189 ]. The free-field HR TF can b e expressed as the division of the impulse resp onses in the
frequency domain measured at the en trance to the ear canal and the cen ter of the head but with
the head absen t [ 108 ] (see Figure 2.2 ). HR TFs dep end on the direction of incidence of the
sound and are generally measured for some discrete incidence directions. Mathematical mo dels
can also generate individualized HR TFs based on anthropometric measures [ 52 ] or through
geometric generalization [ 70 ]. Figure 2.2: A descriptive definition of the me asur e d fr e
e-field HR TF for a given angle. The referential system related to the head can b e seen in
Figure 2.3 , where β is the elev ation angle in the midplane, and ϕ is the angle defined in the
horizon tal plane. Chapter 2. Human Binaural Hearing 13 Figure 2.3: Polar c o or dinate system r
elate d to he ad incidenc e angles, adapte d fr om Portela [ 240 ]. Supp ose the distance to the
sound source exceeds 3 meters. In that case, it can b e considered approximately a plane w a ve,
thus making the previous HR TFs almost indep enden t of the distance to the sound source [ 38 ].
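Numerically, the free-field HRTF definition above amounts to an element-wise division of two spectra. A minimal NumPy sketch, using synthetic impulse responses as stand-ins for real measurements:

```python
import numpy as np

fs = 48_000                      # sampling rate (Hz), illustrative
n = 512                          # impulse response length in samples

# Synthetic stand-ins for measured impulse responses:
# h_ref: center of the head position, head absent (here: a pure delay)
# h_ear: entrance of the ear canal (here: delay plus a simple reflection)
h_ref = np.zeros(n); h_ref[40] = 1.0
h_ear = np.zeros(n); h_ear[40] = 0.9; h_ear[55] = 0.3

# Free-field HRTF: ratio of the two spectra (Section 2.2.4 definition)
H = np.fft.rfft(h_ear) / np.fft.rfft(h_ref)

freqs = np.fft.rfftfreq(n, d=1 / fs)
magnitude_db = 20 * np.log10(np.abs(H))
print(freqs.shape, magnitude_db.shape)   # one magnitude value per frequency bin
```

With real data, the reference measurement must have sufficient energy at all frequencies of interest, otherwise the division amplifies measurement noise; the pure-delay reference used here sidesteps that issue by construction.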
Blauert [39] also explains two other types of HRTF, namely:

• Monaural Transfer Function (MTF): relates the sound pressure, at a measurement point in the ear canal, from a sound source at any position to the sound pressure measured at the same point with a sound source at a reference position (ϕ = 0 and β = 0). The MTF is given by

MTF = (p_i / p_1)|_{r, ϕ, β, f} / (p_i / p_1)|_{ϕ = 0°, β = 0°, f},   (2.1)

where p_i can be p_1, p_2, p_3, or p_4:
– p_1: sound pressure at the center of the head position with the listener absent;
– p_2: sound pressure at the entrance of the occluded ear canal;
– p_3: sound pressure at the entrance of the open ear canal;
– p_4: eardrum sound pressure.

• Interaural Transfer Function (ITF): relates the sound pressures at corresponding measurement points in the two ear canals. The reference pressure is taken at the ear facing the sound source. The ITF can be obtained through

ITF = p_i (side opposite the source) / p_i (side facing the source).   (2.2)

Larger variations are seen above 200 Hz in HRTFs [293] because the head, torso, and shoulders begin to interfere significantly at frequencies up to approximately 1.5 kHz (mid frequencies). In addition, the pinna and the cavum conchae (the space inside the most inferior part of the helix; it forms the vestibule that leads into the external acoustic meatus [270]) distort frequencies greater than 2 kHz. HRTF measurements vary from person to person, as seen in Figure 2.4, where TS 1, TS 2, TS 3, and TS 4 represent the HRTFs of different people. When recordings use mannequins or other people's ear canals (non-individualized HRTFs), the reproduction precision in terms of spatial location and realism tends to be diminished [51, 178]. This poorer precision arises because the transfer function differs for each individual, especially at high frequencies [155]. This dependence is related to the wavelength and the singular irregularity of each human being's ear canal [38].

Figure 2.4: Head-related transfer functions of four human test participants, frontal incidence, from Vorländer [293].
Binaural Impulse Response

A Binaural Room Impulse Response (BRIR) results from measuring the response of a room to excitation by an (ideally) impulsive sound [183]. BRIRs are composed of a sequence of sounds. Parameters such as the magnitude, the decay rate, the phase, and the time distribution are the key to understanding how a BRIR can audibly characterize a room to human perception [167]. Although air contains a small portion of CO2, which is dispersive, the sound propagation velocity can be considered homogeneous in air (a non-dispersive medium) [312] for Room Impulse Responses (RIRs). The first sound from a source that reaches a receiver inside the room travels the shortest distance and is called the direct sound (DS). The following sounds usually result from reflections that travel a longer path, losing energy at each interaction and resulting in an exponential decay of magnitude. The BRIR is intended to collect the room information like a regular impulse response, but with two sensors separated like the ears of a typical human head. Nowadays, a BRIR can be recorded with small microphones placed in the ear canals of a person or with microphones placed in mannequins [197]. A BRIR is the auditory time representation of a source–receiver pair defined by its position, orientation, and acoustic properties such as the directionality of the sound source, as well as by the physical elements within the environment [38, 108]. The convolution of a BRIR with audio signals is a feasible task for modern computation, which allows the creation and manipulation of sounds even in real-time applications [62, 217]. Thus, it is possible to impose the spatial and reverberant characteristics of different spaces on a given sound [109].

2.2.5 Subjective aspects of an audible reflection

The impulse response is composed of the direct sound followed by a series of reflections (early and late reflections) [45, 165]. Essential knowledge of how the human auditory system processes the spectral and spatial information contained in the impulse response has been obtained through studies with simulated acoustic fields [6, 17, 93, 125, 141, 174, 176, 188, 193, 257, 305, 321]. The results of Barron's experiments, depicted in Figure 2.5, involved the reproduction of both a direct sound and a lateral reflection. These two auditory stimuli were manipulated in terms of their time delay and relative amplitude, with the goal of eliciting subjective impressions correlated with these factors. By varying the time between the direct sound and the reflection, as well as the relative amplitude of these stimuli, it was possible to better understand how these characteristics impact the overall auditory experience.

Figure 2.5: Audible effects of a single reflection arriving from the side (adapted from Rossing [254]).

The audibility threshold curve indicates that the reflection will be inaudible if the delay or the relative level is minimal. The reflection's subjective effect also depends on the direction of incidence of the sound source in the horizontal and vertical planes. It is possible to note that for delays of up to 10 milliseconds, the relative difference in level must be at least −20 dB for the reflection to be noticeable. The echo effect is typically observed at delays of more than 50 milliseconds, being an acoustic repetition with a high relative level, approximately the same energy as the direct sound. The coloring effect is associated with the significant change in the spectrum caused by the constructive and destructive interference of superposed sound waves. The image change happens when there are reflections with relative levels higher than the direct sound or with minimal delays. In this case, the subjective perception is that the sound source is at a different position in space than the visual system perceives.

2.3 Spatial Sound & Virtual Acoustics

The sound perceived by humans is identified and classified based on physical properties, such as intensity and frequency [242]. Human beings are equipped with two ears (two highly efficient sound sensors), enabling a real-time comparison of these properties between the captured sound signals [9]. The sounds and the dynamic interaction between sound sources, their positions and movements, and the physical interaction of the generated sound waves with the environment can be perceived by normal-hearing people, providing what is called spatial awareness [153]. That auditory spatial awareness includes the localization of the sound source, the estimation of distance, and the estimation of the size of the surrounding space [38, 305]. A person with hearing loss may lose this ability partially or entirely; spatial awareness is also tied to the listener's experience with the sound and the environment, motivation, and fatigue level [54, 304].
In the field of virtual acoustics, the ultimate goal is to generate a sound event that elicits a desired auditory sensation, creating a Virtual Sound Environment (VSE) [293]. In order to achieve this, it is necessary to synthesize or record the acoustic properties of the target scene and subsequently reproduce them in a manner that accurately reflects the original acoustic conditions [97]. This involves careful consideration of the various factors that contribute to the overall auditory experience, including the spectral and spatial characteristics of the sound. By accurately recreating these properties, it is possible to create a highly immersive and realistic VSE that effectively conveys the intended auditory experience to the listener [196, 213, 293].

2.3.1 Virtualization

Nowadays, it is possible to create audio files containing information about sound properties related to a specific space [293]. For example, it is possible to encode information about the source and receiver positions, the transmission path, reflections on surfaces, and the amount of energy absorbed and scattered (e.g., Odeon [59], a commercially available acoustical software). The sound field properties can be simulated, synthesized, or recorded in situ [113, 293]. These signals can be encoded and reproduced correctly in various reproduction systems [122, 161]. The creation of reproducible files containing such information is called auralization.
As different interpretations of the terms occur in the literature, in this thesis the virtualization process is considered to encompass both the auralization and the reproduction of a sound (recorded, simulated, or synthesized) that includes spatial properties.

2.3.1.1 Auralization

Auralization is a relatively recent procedure. The first studies were conducted in 1929, when Spandöck and colleagues tried to process signals measured in a scale-model room. After that, in 1934, Spandöck [280] succeeded in the first auralization, in the analog way, using ultrasonic signals of scale models recorded on magnetic tapes. In 1962, Schroeder [263] incorporated computer processing into auralization. In 1968, Krokstad [146] developed the first acoustic room simulation software. The term auralization was introduced in the literature by Kleiner in 1993:

“Auralization is the process of rendering audible, by physical or mathematical modeling, the sound field of a source in a space, in such a way as to simulate the binaural listening experience at a given position in the modeled space.” (Kleiner [138])

In his book titled Auralization, published in 2008, Vorländer defined:

“Auralization is the technique of creating audible sound files from numerical (simulated, measured, or synthesized) data.” (Vorländer [293])

In this work, auralization is understood as a technique to create files that can be rendered as perceivable sounds. An auralization method describes the technique; it can involve one or more auralization techniques. These sounds can then be virtualized (reproduced) via loudspeakers or headphones and provide audible information about a specific acoustical scene in a defined space, following Vorländer's definition. That definition was chosen to encourage the separation of the processes, as an auralized sound file can contain information that allows it to be decoded in different reproduction systems [320]. Auralization is consolidated in architectural acoustics [45, 148, 165, 254], and it is also emerging in environmental acoustics [19, 68, 69, 139, 162, 231, 232]. This technique allows a piece of audible information to be easily accessed and understood. It is also an integral part of the entertainment industry in games, movies, and virtual or mixed reality [320]. Knowing an environment's acoustic properties allows one to manipulate it or to add synthesized or recorded elements, leading the receiver to the desired auditory impression, including the sound's spatial distribution [62]. This process is also used in hearing research, allowing researchers to introduce more ecologically valid sound scenarios into their studies (see Section 2.3.4). Sound spatiality, or the perception of sound waves arriving from various directions and the ability to locate them in space, is a crucial aspect of the auditory experience [40].
Auralization, which is analogous to visualization, involves the representation of sound fields and sources, the simulation of sound propagation, and the strategy to decode them in the spatial reproduction setup [293]. That is typically achieved through three-dimensional computer models and digital signal processing techniques, which are applied to generate auralizations that can be reproduced via acoustic transducers [293]. The modeling paradigm used to create the spatial sensation can be perceptually or physically based [39, 106, 164, 276]. Multiple dimensions influence sound perception; the type of sound generation, the wind direction, the temperature, the movement of source and receiver, the space (size, shape, and content), the receiver's spatial sensitivity, and the source directivity are some examples. That implies the importance of physical effects such as Doppler shifts [96, 284, 293]. Furthermore, the review of room acoustics and psychoacoustics elements (see Section 2.3.3) corroborates the understanding of the auralization modeling procedure.

2.3.1.2 Reproduction

Sound signals containing the acoustic characteristics of a space can be reproduced either with binaural techniques (headphones or loudspeakers) or with multiple loudspeakers (multichannel techniques) [293]. Moreover, an acoustic model of a space can be analytically or numerically implemented, with a series of competent algorithms and commercial software and tools available [49]. With that, it is also possible to measure micro and macro acoustic properties of materials in a laboratory or in situ [206] and to access databases of various coefficients and indexes for an extended catalog of materials [50, 71, 158, 266]. On the reproduction end of the virtualization process, factors such as frequency and level calibration, signal processing, and the frequency response of the hardware can significantly impact the accuracy of the final sound (e.g., the orientation/correction of the microphone when calibrating the system [274]). Depending on the chosen paradigm, a lack of attention to these details may disrupt an accurate description of the sound field, sound event, or sound sensation [214, 282, 283, 320]. Additionally, the quality of the stimuli may be compromised depending on the chosen reproduction technique, which is often tied to the available hardware [77, 166, 275, 276]. That can lead to undesired effects on the level of immersion and to problems with the accuracy of sound localization and identification (e.g., source width, source separation, sound pressure level, and coloration and spatial confusion effects [97]). The process of building a VSE is called sound virtualization, which involves both the auralization and reproduction stages to create audible sound from a file. The main technical approaches or paradigms for reproducing auralized sound are Binaural, Panorama, and Sound Field Synthesis (Section 2.3.2). These paradigms can be distinguished by their output, which can be physically or perceptually motivated. For example, while binaural methods are treated apart, they can be intrinsically classified in a physically-motivated paradigm, since their success relies on reproducing the correct physical signal at a specific point in the listener's auditory system, typically the entrance of the ear canal [106].

2.3.2 Auralization Paradigms

2.3.2.1 Binaural

Binaural hearing, which refers to the ability to perceive sound in a three-dimensional auditory space, is a fundamental concept in auditory research and has been extensively studied by researchers such as Blauert [40]. In the context of auralization, the term “binaural” refers to the specific paradigm that aims to reproduce the exact sound pressure of a sound event at the listener's eardrums. That can be achieved through the use of headphones or a pair of loudspeakers (known as transaural reproduction) [314]. However, when using distant loudspeakers, it is necessary to consider the interference that can occur between the sounds coming from each speaker. To mitigate this issue, techniques such as cross-talk cancellation (CTC) [60, 262] can be employed, which involve manipulating a set of filters to cancel out the distortions caused by the sound from one speaker reaching the opposite ear. Another form of binaural reproduction involves the use of closer loudspeakers that are near-field compensated. Binaural methods over headphones are commonly applied. They require no extensive hardware (in simple setups that do not track the listener's head), providing a valid acoustic representation and spatial awareness [293]. A disadvantage of this method can be its dependence on the accuracy of individualized HRTFs (as each human being has their own slightly different anatomic “filter set”) [314].
Over headphones, moreover, the movement of the listener's head can be disruptive to the immersion [179]. It may require tracking the head's movement [11, 115, 252], e.g., when movements are required or allowed in an experiment. Furthermore, a listener wearing a pair of headphones may not represent a realistic situation. For example, an experiment with a virtual auditory environment that represents a regular daily conversation with aged participants may lose the task's ecological validity. Also, headphones usually prevent the listener from wearing hearing devices. Figure 2.6 illustrates the main idea behind different binaural reproduction setups.

Figure 2.6: Binaural reproduction setups: headphones, transaural, and near-field transaural (adapted from Kang and Kim [131]).

2.3.2.2 Panorama

The Panorama paradigm encompasses auralization methods focused on delivering accurate ITDs and ILDs at the listener's position, also known as stereophonic techniques [106, 276]. The most well-known methods are based on amplitude panning [180], including low-order Ambisonics [91] and Vector-Based Amplitude Panning (VBAP) [241]. Higher Order Ambisonics is an extension of the Ambisonics method that is typically considered not a panning method but rather a sound field synthesis method (see Section 2.3.2.3). VBAP employs local panning by rendering sound using pairs or triplets of loudspeakers. In contrast, Ambisonics uses global panning to produce a single virtual source using all available loudspeakers [282].

Vector-Based Amplitude Panning: Vector-Based Amplitude Panning (VBAP) is a first-order approximation of the composition of emitted signals that creates virtual sources [241]. The virtualization process using VBAP is based on amplitude panning in two dimensions (variation in amplitude between the speakers), which is derived from the Law of Sines and the Law of Tangents (see Benesty et al. [23] for a derivation of these laws). The original hypothesis of VBAP assumes that the speakers are arranged symmetrically, equidistant from the listener, and in the same horizontal plane. VBAP does not limit the number of usable speakers but uses a maximum of three simultaneously. The speakers are arranged on a reference circle (2D case) or sphere (3D case), and a limitation of the technique is that virtual sources cannot be created outside this region. VBAP is mainly used for the reproduction of synthetic sounds [180].
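Computationally, the 2-D case boils down to inverting a 2×2 matrix of loudspeaker direction vectors, as formalized next in Eqs. (2.3)-(2.5). A minimal sketch; the ±30° base angles are an illustrative stereo setup, and the constant-power normalization is one common choice, not prescribed by the derivation:

```python
import numpy as np

def vbap_gains_2d(source_az_deg, spk_az_deg=(-30.0, 30.0)):
    """2-D VBAP gains for a loudspeaker pair (after Pulkki's formulation).

    Solves p^T = g L for the gain vector g and rescales it for
    constant power. The +/-30 degree speaker angles are illustrative.
    """
    az = np.radians(source_az_deg)
    p = np.array([np.cos(az), np.sin(az)])            # unit vector to source
    L = np.array([[np.cos(np.radians(a)), np.sin(np.radians(a))]
                  for a in spk_az_deg])               # rows point to speakers
    g = p @ np.linalg.inv(L)                          # cf. Eq. (2.5)
    return g / np.linalg.norm(g)                      # constant-power scaling

print(np.round(vbap_gains_2d(0.0), 3))    # centered source -> equal gains
print(np.round(vbap_gains_2d(30.0), 3))   # source at a speaker -> that speaker only
```

A source outside the arc spanned by the pair would yield a negative gain here, which is exactly the limitation noted above: virtual sources cannot be created outside the region covered by the loudspeakers.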
The formulation of the VBAP method (from Pulkki [241]) for two dimensions starts from the stereophonic configuration of two channels (see Figure 2.7). It is reformulated on a vector base formed by the unit-length vectors l_1 = [l_11 l_12]^T and l_2 = [l_21 l_22]^T, which point to the speakers, and the unit-length vector p = [p_1 p_2]^T, which points to the virtual source and presents itself as a linear combination of the vectors l_1 and l_2. The notation ^T is used here to identify matrix transposition.

Figure 2.7: Vector-based amplitude panning: 2D display of sound source positions and weights.

Consider the vector p:

p = g_1 l_1 + g_2 l_2,   (2.3)

where g_1 and g_2 (scalars) are the gain factors to be calculated for positioning the vector relative to the virtual source. In matrix form,

p^T = g L_12,   (2.4)

where g = [g_1 g_2] and L_12 = [l_1 l_2]^T. The gains can be calculated by

g = p^T L_12^{-1} = [p_1 p_2] [l_11 l_12; l_21 l_22]^{-1}.   (2.5)

The formulation is also expanded to three dimensions:

p = g_1 l_1 + g_2 l_2 + g_3 l_3,   (2.6)

and

p^T = g L_123,   (2.7)

where g_1, g_2, and g_3 are gain factors, g = [g_1 g_2 g_3], and L_123 = [l_1 l_2 l_3]^T. The detailed derivation can be found in [241]. The derivation can use triangles and the three-dimensional system. Figure 2.8 presents an example of the loudspeaker distribution for the virtualization of a virtual source P using VBAP in three dimensions.

Figure 2.8: Diagram representing the placement of speakers in the VBAP technique, adapted from [241].

Some factors contribute to methods based on Amplitude Panorama being widely used in virtual audio applications, such as the low computational cost and the flexibility in the speakers' placement.

Ambisonics: The original Ambisonics auralization method is an amplitude panning method that differs from Vector-Based Amplitude Panning (VBAP) in several ways. While VBAP only uses positive weights to pan sound across speakers, Ambisonics uses a combination of positive and negative weights to create a shift in frequency and amplitude. This results in a more homogeneous sound field, albeit with a broader virtual source. Additionally, Ambisonics has all loudspeakers active for any source position, while VBAP only activates specific speakers based on the desired source position [199]. One of the benefits of Ambisonics is its scalability for reproduction on different loudspeaker arrays and the ability to encode and decode the sound field during the recording and reproduction process [161]. This versatility is possible because Ambisonics signals can be directly recorded using an appropriate microphone array or simulated through numerical acoustic algorithms that model the directional sensitivity of the microphone array [5, 46, 59]. The signal can then be decoded and rendered in real time to different arrays with various numbers of loudspeakers. Hence, an Ambisonics decoder is a tool for converting an Ambisonics representation of a sound field into a multichannel audio format that can be reproduced over a given speaker setup [130, 235, 238]. In order to reproduce an Ambisonics signal, it must first be transformed, or “decoded,” into a format compatible with a specific speaker configuration. Simple decoders consist of a frequency-independent weighting matrix [282]. It is also possible to reproduce the signal via headphones, which can be considered a specific speaker setup, by scaling it down to binaural signals [320]. Additionally, Ambisonics can enhance realism by tracking head movements and correcting the binaural signals using HRTFs as filters [277]. This feature is particularly relevant in the recording and broadcasting industry, especially with emerging technologies such as augmented reality (AR) [320].

According to Schröder [261], decomposition in spherical harmonics (SH) is a recent analysis technique widely used in the modeling of directivity patterns. Analogous to a Fourier transform in the frequency domain, SH decomposes the signal in the spatial domain into spherical functions (in the Fourier transform, the decomposition is into sine or cosine functions) weighted by the coefficients of the corresponding spherical harmonics. According to Pollow [239], it is commonly applied to multi-dimensional domain problems. However, the analytical requirements for cases with few dimensions (two in the case of the sound field) are considerably simplified. Manipulating the wave equation by separation of variables is an essential tool here. Appendix C shows the derivation of SH through the separation of variables of the wave equation in spherical coordinates (Equation C.1). The solutions to the linear wave equation in spherical coordinates expressed in the frequency domain (Helmholtz equation) are orthogonal basis functions Y_n^m(θ, ϕ), where n is the degree and m is the order. These angle-dependent functions are called spherical harmonics and can represent, for example, a sound field [309]. That is the core assumption of Ambisonics recording and reproduction.
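The orthogonality claim can be checked numerically for the lowest degrees. A minimal sketch with closed-form real-valued spherical harmonics up to degree 1 (written here in an orthonormal convention, one of several used by Ambisonics tools) and Monte-Carlo integration over the sphere:

```python
import numpy as np

rng = np.random.default_rng(0)

# Closed-form real spherical harmonics up to degree 1, orthonormal on the
# sphere, as functions of azimuth az and colatitude col:
sh = {
    (0, 0):  lambda az, col: np.full_like(az, 0.5 * np.sqrt(1 / np.pi)),
    (1, -1): lambda az, col: np.sqrt(3 / (4 * np.pi)) * np.sin(col) * np.sin(az),
    (1, 0):  lambda az, col: np.sqrt(3 / (4 * np.pi)) * np.cos(col),
    (1, 1):  lambda az, col: np.sqrt(3 / (4 * np.pi)) * np.sin(col) * np.cos(az),
}

# Uniform random points on the sphere for Monte-Carlo integration
n_pts = 200_000
az = rng.uniform(0, 2 * np.pi, n_pts)
col = np.arccos(rng.uniform(-1, 1, n_pts))   # uniform in cos(colatitude)

# Integral of Y_a * Y_b over the sphere ~ (4*pi / n) * sum(Y_a * Y_b);
# an orthonormal set yields the identity matrix.
keys = list(sh)
G = np.array([[np.mean(sh[a](az, col) * sh[b](az, col)) * 4 * np.pi
               for b in keys] for a in keys])
print(np.round(G, 1))       # approximately the 4x4 identity matrix
```

This orthogonality is what makes the SH coefficients of a measured sound field uniquely recoverable by projection, the operation underlying Ambisonics encoding.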
Figure 2.9 depicts SHs up to order N = 2.

Figure 2.9: Spherical harmonics Y_n^m(θ, ϕ). Rows correspond to orders 0 ≤ n ≤ 2, columns to degrees −n ≤ m ≤ n (adapted from Pollow [239]).

The four SH weights Y_n^m(θ, ϕ) that encode all the spatial audio information into a First-Order Ambisonics file are given by

B_n^m(t) = s(t) Y_n^m(θ_s, ϕ_s),   (2.8)

where s(t) is the source signal in the time domain and Y_n^m(θ_s, ϕ_s) are the encoding coefficients for the source s(t). Computed at first order in the B-format, the normalized components can be described as [172]:

W = B_0^0 = S Y_0^0(θ_S, ϕ_S) = S (0.707)
X = B_1^1 = S Y_1^1(θ_S, ϕ_S) = S cos θ_S cos ϕ_S
Y = B_1^{−1} = S Y_1^{−1}(θ_S, ϕ_S) = S sin θ_S cos ϕ_S
Z = B_1^0 = S Y_1^0(θ_S, ϕ_S) = S sin ϕ_S   (2.9)

The resulting four-channel signals are equivalent to an omnidirectional microphone (W) and three orthogonal bi-directional (commonly called figure-of-eight) microphones (X, Y, and Z). The channels can represent the pressure and the particle velocity of a given sound (see Figure 2.10). It is possible to transcode and manipulate the generated signal to change its orientation with a matrix multiplication in signal processing. Also, it is possible to decode the same encoded signal to a single sound source, headphones, or a multichannel array.
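Equation (2.9) translates directly into code. A minimal sketch that encodes a mono signal to first-order B-format and then decodes it to a single virtual microphone aimed at an arbitrary direction; the virtual-microphone decode and the example angles are illustrative choices, not part of the cited formulation:

```python
import numpy as np

def encode_foa(s, theta, phi):
    """Encode mono signal s to B-format (W, X, Y, Z) following Eq. (2.9).

    theta: source azimuth (rad), phi: source elevation (rad).
    """
    w = s * 0.707
    x = s * np.cos(theta) * np.cos(phi)
    y = s * np.sin(theta) * np.cos(phi)
    z = s * np.sin(phi)
    return np.stack([w, x, y, z])

def decode_virtual_mic(b, theta, phi, p=0.5):
    """Decode B-format to one virtual microphone aimed at (theta, phi).

    p blends omni (p=1) and figure-of-eight (p=0); p=0.5 gives a cardioid.
    This simple decode is an illustrative basic choice.
    """
    w, x, y, z = b
    direction = (np.cos(theta) * np.cos(phi) * x
                 + np.sin(theta) * np.cos(phi) * y
                 + np.sin(phi) * z)
    return p * w / 0.707 + (1 - p) * direction

s = np.sin(2 * np.pi * 440 * np.arange(4800) / 48_000)   # 0.1 s, 440 Hz tone
b = encode_foa(s, theta=np.radians(90), phi=0.0)          # source at the left
left = decode_virtual_mic(b, np.radians(90), 0.0)         # mic aimed at source
right = decode_virtual_mic(b, np.radians(-90), 0.0)       # mic aimed away
print(round(float(np.max(np.abs(left))), 2),
      round(float(np.max(np.abs(right))), 2))             # full level vs. null
```

Rotating the scene, as mentioned above, is a matrix multiplication applied to the (X, Y, Z) rows of `b`, leaving W untouched.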
Figure 2.10: B-format c omp onents: omnidir e ctional pr essur e c omp onent W, and the thr e e
velo city c omp onents X, Y, Z. Extr acte d fr om Rumsey [ 256 ]. The limitation of first-order
Ambisonics is spatial precision since it is only effec- tiv e at a point cen tered within a
defined area. This limitation can b e o vercome with higher-order comp onen ts. Adding a set of
higher-order comp onen ts im- pro v es the directionalit y . How ever, increasing the n um b er
of comp onents will also increase the num b er of loudsp eakers required to play higher-order
Am- bisonics. That means a more accurate sound field representation if the order is increased.
The num b er of channels N for a p eriphonic Am bisonics of order m order is N = ( m + 1) 2 for
3D repro duction and N = (2 m + 1) for 2D [ 65 ]. Chapter 2. Spatial Sound & Virtual
Acoustics 31 2.3.2.3 Sound Field Synthesis The ob jective of tec hniques from Sound Field Syn
thesis remains the same as in the techniques from the p erceptually-motiv ated paradigm: a
spatial sound field repro duction. The p erceptually motiv ated are centered on the
psychoacoustic effects of summing binaural cues that lead the listener to p erceiv e a virtual
source. On the other side, the Sound Field Synthesis techniques rely on the ph ysical
reconstruction of the original/simulated sound field to a sp ecific area. The main techniques are
the extension of Am bisonics repro duction to higher orders called Higher Order Am bisonics
(HOA) and the W a ve Field Syn thesis (WFS) [ 24 , 25 ]. The HOA extends the order of the
classical Ambisonics and, therefore, the number of sound sources arranged in a spherical array. As the Ambisonics order increases, the perceived sound source direction accuracy also increases, although more loudspeakers are required [97]. An important distinction can be made between Ambisonics and HOA. Given the truncation possibility in Ambisonics, the method is treated as a soft transition from a perceptually based method to a physically based one. Although HOA relies on the same principle as Ambisonics, it is classified as a sound field synthesis paradigm (physically based), along with WFS. The limitations of HOA are reported in the literature by several studies [26, 27, 64, 73, 236, 299], especially frequency aliasing, which leads to pressure errors, and the sweet spot size [253]. The WFS formulation relies on Huygens' principle: a propagating wavefront at any instant conforms to the envelope of the spherical waves emanating from every point on the wavefront at the prior instant [281]; the principle is illustrated in Figure 2.11.

Chapter 2. Spatial Sound & Virtual Acoustics

Figure 2.11: Illustration of Huygens' principle for a propagating wavefront.

A conceptual difference between WFS and HOA is that in HOA the sound field characteristics are described at a point (or small area) inside the array, while in WFS the sound pressure must be known on the border of the reproduction area. A review and comparison of both methods and their compromises in terms of spatial aliasing errors and noise amplification is presented in [65]. Their findings indicated similar constraints for both methods; however, both are characterized by the requirement of large loudspeaker arrays. HOA was found to allow a larger central listening area, whereas WFS is limited by distortion (aliasing) of the higher frequencies, depending on the number of loudspeakers. Regardless, as the scope of this thesis is to work with a small number of loudspeakers, these methods will not be discussed further.

2.3.3 Room acoustics
Different aspects are taken into account when describing the hearing experience of a human being in a space, for example, the individuality of auditory training, familiarity with the space, personal preferences, mood, fatigue, culture, and the spoken language [10, 32, 123, 169, 184, 190, 250]. However, there are similarities in the expressions used between sample groups. Such an effect is attributed to the similarity of the auditory-cognitive mechanism of human beings [40].

In architectural and room acoustics, studies of sound properties assume that a sound source and a receiver in a given space form a linear and time-invariant (LTI) system [45]. Thus, a complete LTI characterization of each source-receiver pair can be expressed by its impulse response in the time domain or its transfer function in the frequency domain [45, 293].

2.3.3.1 Room acoustics parameters

Objective parameters are essential in acoustic projects and in the composition of statistical models that aim to predict the human interpretation of acoustic phenomena [254]. Objective parameters derived from the LTI impulse response aim to create metrics that quantify subjective descriptors from numerous experiments [254]. The calculation and measurement of many objective parameters are described in an appendix to the International Organization for Standardization (ISO) standard 3382 [127].

2.3.3.2 Reverberation Time

The reverberation time (RT) measures the time it takes for the impulse response's sound pressure level to decrease to one-millionth of its maximum value, equivalent to a decline of 60 dB; it is also often referred to as RT60 or T60. Note that the reverberation time measures how fast the decay of sound energy occurs, not how long the reverberation lasts in the environment, which depends on the sound source power and the background noise. The RT was the first parameter studied, modeled, and understood in relation to several subjective aspects of the human hearing experience in a room. Today, it is considered the most critical parameter, although it is not sufficient to describe human perception completely. Wallace C. Sabine [258] initially described it through mathematical relations obtained by an empirical method, later developing the theoretical bases together with W. S. Franklin [86]. The analytical form of the reverberation time obtained by Sabine is given by:

T_{60} = 0.161 \, \frac{V}{S\bar{\alpha}} \quad \mathrm{[s]} \qquad (2.10)

where V is the volume of the room and S\bar{\alpha} represents the amount of absorption present in the environment; the unit of absorption is named the Sabin in honor of Sabine.
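Equation 2.10 translates directly into a few lines of code. The sketch below uses a hypothetical shoebox room with invented absorption coefficients; it illustrates Sabine's formula only, not a measurement procedure:

```python
def sabine_t60(volume_m3, surfaces):
    """Sabine reverberation time (Eq. 2.10).

    surfaces: iterable of (area_m2, absorption_coefficient) pairs, so that
    the sum of area * alpha is the total absorption S*alpha_bar in Sabins.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption  # seconds

# Hypothetical 10 m x 8 m x 3 m room (240 m^3) with assumed coefficients
room_surfaces = [
    (10 * 8 * 2, 0.05),  # floor and ceiling
    (10 * 3 * 2, 0.10),  # long walls
    (8 * 3 * 2, 0.10),   # short walls
]
t60 = sabine_t60(240.0, room_surfaces)  # about 2.06 s for these assumptions
```

Note that Sabine's relation assumes a diffuse field and relatively low average absorption, which is why the subsequent models cited below refine it.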
Subsequent models improved the calculation of the reverberation time by considering the evolution of the energy density and the sound absorption by the air [74], the specular reflection of each sound wave [187, 271], the propagation path [148], and the triaxial arrangement of different absorption coefficients [14, 82], among others.

In addition to statistical theory, T60 can be obtained from the measurement of the impulse response. In measurements, the T60 is obtained subject to limitations regarding the background noise level and the sound source's maximum sound pressure level. Thus, according to the ISO 3382 standard [127], the measurement's dynamic range must present the end of the decay at least 15 dB above the background noise and start 5 dB below the maximum. For example, the sound pressure level required to measure the T60 in a room with a background noise of 30 dB is 110 dB (30 + 15 + 60 + 5).

Linear behavior is noted by observing the square of the energy h^2(t) in the decay curve plotted in dB (see Figure 2.12). Thus, to reduce the dynamic range required for measurement, it is possible to estimate the T60 through other limits. The T60 is sometimes mistaken for twice the T30, which is incorrect: the T20 and T30 also correspond to the time the sound pressure level (SPL) inside the room takes to drop 60 dB, but estimated from measurement ranges restricted to -5 dB to -25 dB and -5 dB to -35 dB, respectively. Therefore, a linear energy decay produces the relation T60 = T30 = T20.

Figure 2.12: Normalized room impulse response: example from a real room in the time domain (left) and in the time domain in dB (right).
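The estimation from restricted decay ranges can be sketched numerically: backward-integrate the squared impulse response to obtain the decay curve in dB (the Schroeder curve) and fit a line over the chosen range. A minimal sketch with a synthetic noise-free exponential decay; the function names and test signal are illustrative, not taken from ISO 3382:

```python
import numpy as np

def schroeder_curve_db(h):
    """Backward-integrated energy decay curve (Schroeder curve) in dB."""
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10 * np.log10(energy / energy[0])

def decay_time(h, fs, lo_db, hi_db):
    """T60 estimate from a linear least-squares fit of the Schroeder curve
    between lo_db and hi_db (use -5/-25 for T20, -5/-35 for T30)."""
    edc = schroeder_curve_db(h)
    t = np.arange(len(h)) / fs
    mask = (edc <= lo_db) & (edc >= hi_db)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # decay rate in dB/s
    return -60.0 / slope                          # time for a 60 dB drop

# Synthetic exponential decay with a known T60 of 1.2 s
fs, t60 = 8000, 1.2
t = np.arange(int(fs * t60 * 1.5)) / fs
h = np.exp(-6.91 * t / t60)        # amplitude decays 60 dB over t60 seconds
t20 = decay_time(h, fs, -5, -25)   # close to 1.2 s
```

For an ideal exponential decay, the fit recovers the same value from the -5/-25 and -5/-35 ranges, which is the T60 = T30 = T20 relation stated above.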
The T20 is obtained as the decay rate from a linear least-squares regression of the measured decay curve, also called the Schroeder curve, in the range from -5 dB to -25 dB. In comparison, the T30 is obtained when the curve fit is carried out in the range between -5 dB and -35 dB [127].

2.3.3.3 Clarity and Definition

The clarity and definition parameters express a balance between the energy that arrives earlier and later in the impulse response, which is related to human beings' particular ability to distinguish sounds in sequence [44, 45, 57, 247, 254]. First reflections arriving within the limits of 50 or 80 milliseconds tend to be integrated by the auditory system into the direct sound. Thus, if the first reflections contain relatively more energy than the reverberant tail, the sound will be experienced as amplified. On the other hand, if the reverberant tail has more energy and is long enough, it will be perceived separately and mask the next direct sound. The limits of 50 and 80 milliseconds are defined in the literature as appropriate for optimizing speech and music, respectively [245, 247].

The clarity defined in the ISO 3382 standard measures the ratio between the energy in the first reflections and the energy in the rest of the impulse response. Positive clarity values, which are given in dB, mean more energy in the first reflections; negative values indicate more energy in the reverberant tail; a null value indicates balance between the two parts of the impulse response. The clarity is given by:

C_{80} = 10 \log \left( \frac{\int_0^{80\,\mathrm{ms}} h^2(t)\,\mathrm{d}t}{\int_{80\,\mathrm{ms}}^{\infty} h^2(t)\,\mathrm{d}t} \right) \qquad (2.11)

and

C_{50} = 10 \log \left( \frac{\int_0^{50\,\mathrm{ms}} h^2(t)\,\mathrm{d}t}{\int_{50\,\mathrm{ms}}^{\infty} h^2(t)\,\mathrm{d}t} \right) \qquad (2.12)

The definition parameter, in turn, is presented on a linear scale and computes the ratio between the energy contained in the first reflections and the total energy of the impulse response. Values greater than 0.5 indicate that most of the impulse response's energy is contained in the first reflections. The definition is given by:

D_{80} = \frac{\int_0^{80\,\mathrm{ms}} h^2(t)\,\mathrm{d}t}{\int_0^{\infty} h^2(t)\,\mathrm{d}t} \qquad (2.13)

and

D_{50} = \frac{\int_0^{50\,\mathrm{ms}} h^2(t)\,\mathrm{d}t}{\int_0^{\infty} h^2(t)\,\mathrm{d}t} \qquad (2.14)

2.3.3.4 Center Time

The center time is a parameter analogous to the previous ones, measuring the balance between the energy contained in the early reflections and the reverberant tail's energy. However, the center time is particularly interesting in pointing out what can be seen as the center of gravity of the squared impulse response. Moreover, the center time does not require a predefined transition point between the first reflections and the reverberant tail. Thus, the center time of an impulse response is defined by:

t_s = \frac{\int_0^{\infty} t\,h^2(t)\,\mathrm{d}t}{\int_0^{\infty} h^2(t)\,\mathrm{d}t} \qquad (2.15)

2.3.3.5 Parameters related to spatiality

The relation between the human auditory sensation of spatiality and the objective parameters derived from measurements is studied in detail in the literature [20, 21, 88]. These parameters observe how the sound energy distribution is arranged in terms of direction and timing. The principal sensations and their related parameters are presented here for better understanding.

Apparent Source Width

The Apparent Source Width (ASW) is related to the impression of the sound source's size, or how the source is distributed in space. An objective metric associated with ASW is the Lateral Energy Fraction (LEF), given by Equation 2.16:

\mathrm{LEF} = \frac{\int_{5\,\mathrm{ms}}^{80\,\mathrm{ms}} h_b^2(t)\,\mathrm{d}t}{\int_0^{80\,\mathrm{ms}} h^2(t)\,\mathrm{d}t} \qquad (2.16)

where h(t) is the impulse response measured with a microphone that has an omnidirectional sensitivity pattern and h_b(t) is the impulse response measured with a microphone that has a bidirectional (pressure-gradient) sensitivity pattern at the same position as the omnidirectional one. Thus, this objective parameter represents the ratio between the lateral energy that reaches the receiver between 5 and 80 milliseconds (i.e., the energy contained in the early reflections, excluding the direct sound) and the total energy arriving from all directions between 0 and 80 milliseconds [21]. As low and mid frequencies make the dominant contributions to the LEF, this parameter is usually represented by the arithmetic mean of the octave-band values obtained between 125 Hz and 1000 Hz [45, 254].

Listener Envelopment

The Listener Envelopment (LEV) is related to the impression of being immersed in the room's reverberant field. In Bradley and Soulodre's experiments [44], with test participants inside an anechoic room, the sense of envelopment was assessed with loudspeakers. The authors found the LEV to be associated with the ratio between the lateral energy and the total energy reaching the receiver. The lateral energy is contained in the impulse response measured with a bidirectional microphone after the first 80 milliseconds. The total energy is defined from the impulse response measured with an omnidirectional microphone, in free-field conditions, 10 meters away from the same sound source operating at the same power. The ratio is called the Lateral Strength (LG) and is given by:

\mathrm{LG} = \frac{\int_{80\,\mathrm{ms}}^{\infty} h_b^2(t)\,\mathrm{d}t}{\int_0^{\infty} h_{10}^2(t)\,\mathrm{d}t} \qquad (2.17)

Interaural Cross-Correlation Coefficient

In his work, Keet [133] proposed an auditory-cognitive process relating the spatial impression to a comparison of the signals received by the two ears. The cross-correlation function measures the degree of similarity of the signals. Therefore, the Interaural Cross-Correlation Coefficient (IACC) was incorporated as a third parameter related to the spatial impression. The IACC is defined as the absolute maximum value of the ratio between the cross-correlation function of the impulse responses collected at the left ear (h_L(t)) and the right ear (h_R(t)) and the total energy contained in each of them:

\mathrm{IACC} = \max \left| \frac{\int_{t_1}^{t_2} h_L(t)\,h_R(t+\tau)\,\mathrm{d}t}{\sqrt{\int_{t_1}^{t_2} h_L^2(t)\,\mathrm{d}t \,\int_{t_1}^{t_2} h_R^2(t)\,\mathrm{d}t}} \right| \qquad (2.18)

where \int_{t_1}^{t_2} h_L^2(t)\,\mathrm{d}t and \int_{t_1}^{t_2} h_R^2(t)\,\mathrm{d}t are the energies between the instants t_1 and t_2 in the impulse responses from the left and right ears; \int_{t_1}^{t_2} h_L(t)\,h_R(t+\tau)\,\mathrm{d}t is the cross-correlation function between the impulse responses; and \tau ranges between 0 and 1 ms.
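The energy-ratio parameters above map directly onto discrete sums over a sampled impulse response. The sketch below is illustrative (hypothetical function names, simple rectangular integration), not an ISO 3382 implementation:

```python
import numpy as np

def clarity_db(h, fs, limit_ms=80):
    """C50/C80 (Eqs. 2.11-2.12): early-to-late energy ratio in dB."""
    n = int(fs * limit_ms / 1000)
    return 10 * np.log10(np.sum(h[:n] ** 2) / np.sum(h[n:] ** 2))

def definition(h, fs, limit_ms=50):
    """D50/D80 (Eqs. 2.13-2.14): early-to-total energy ratio, linear scale."""
    n = int(fs * limit_ms / 1000)
    return np.sum(h[:n] ** 2) / np.sum(h ** 2)

def center_time(h, fs):
    """t_s (Eq. 2.15): center of gravity of the squared impulse response."""
    t = np.arange(len(h)) / fs
    return np.sum(t * h ** 2) / np.sum(h ** 2)

def iacc(h_left, h_right, fs, max_lag_ms=1.0):
    """IACC (Eq. 2.18): peak of the normalized interaural cross-correlation."""
    norm = np.sqrt(np.sum(h_left ** 2) * np.sum(h_right ** 2))
    best = 0.0
    for lag in range(int(fs * max_lag_ms / 1000) + 1):  # tau in [0, 1] ms
        n = len(h_left) - lag
        c = np.dot(h_left[:n], h_right[lag:lag + n])    # sum h_L(t) h_R(t+tau)
        best = max(best, abs(c))
    return best / norm
```

The IACC loop evaluates the cross-correlation only for lags between 0 and 1 ms, following the definition of tau given above; identical left and right impulse responses yield an IACC of 1.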
2.3.4 Loudspeaker-based Virtualization in Auditory Research

Virtualization of sounds through the auralization of simulated environments has been used in architectural design to preview the sound behavior of rooms when changing a space's design, or even to preview a completely new space before it is built [293]. As room acoustics simulation and the auralization process evolved into a sound equivalent of the visual preview rendered from 3D models, they have also found applications in research outside the architectural field [282, 320]. Lately, the virtualization of sound sources has been applied to extend the ecological validity of sound scenarios in auditory research [161].

Research that utilizes binaural virtualization with headphones is common in the auditory research literature [4, 38, 142, 248, 305]. Its advantages may include, but are not limited to, individual control of the stimuli reproduced in each ear, a smaller setup, and easier calibration [251]. Although binaural reproduction is a suitable method for some research questions, others may require a more complex test environment, especially research encompassing hearing aids.

In that regard, the use of loudspeakers can be associated with single-loudspeaker presentations, where one loudspeaker reproduces a single sound source positioned in space (e.g., [176, 230, 268, 321]), or with virtualization methods that manage auralized files to be perceived as single sources or complex environments [89, 177, 295].

For the virtualization of sound sources, the number of loudspeakers depends on the selected method of encoding and decoding the spatial information [293]. For example, a quadraphonic loudspeaker arrangement was found to be sufficient to reproduce a diffuse sound field with a convincing perceptual spatial impression when listener movements were constrained [117]. However, utilizing Directional Audio Coding, Laitinen and Pulkki [150] found that an adequate reproduction of diffuse sound would require 12 to 20 loudspeakers.

VBAP and HOA techniques were evaluated with different numbers of loudspeakers in simulations by Grimm et al. [97]. Perceptual localization error (PLE) was computed for arrays utilizing these techniques. Eight loudspeakers were estimated to be sufficient in terms of sound source localization. In the same work, Grimm et al. showed that the effects of virtualization with VBAP and HOA on hearing-aid beam patterns are present with fewer than 18 loudspeakers over a 4 kHz bandwidth (spatial aliasing above the 5.7 dB criterion).
However, the spectral distances, a weighted sum of the absolute differences in ripple and spectral slope between virtual and reproduced sound sources, were all very low, indicating high naturalness when compared to subjective data from Moore and Tan [191].

Aguirre [1] evaluated VBAP and its variation, Vector-Based Intensity Panning (VBIP), in terms of spatial accuracy with 30 normal-hearing participants within an array of eight loudspeakers. There was no significant difference among stimuli (speech, intermittent white noise, and continuous white noise) for either technique. An average PLE of around 4° was found, consistent with the values simulated by Grimm et al. [97].

Evaluating SNR benefits of HA beamformer algorithms within a spherical array of 41 loudspeakers, Oreinos and Buchholz [212] found similar results between the real environment and the auralized one. Reproduction errors in HOA reproduction for hearing aids were studied in [213]. Reverberation was found to reduce the time-averaged errors introduced by HOA, implying that the frequency limit of usable HOA renderings can be extended in such environments.

Loudspeaker-based virtualization has been used in hearing research contexts evaluating normal-hearing listeners, hearing-impaired listeners, and hearing aid users through different methods [6-8, 30, 31, 55, 61, 80, 93, 102, 136, 168, 174, 188, 220, 303, 322]. Furthermore, some studies explored the ecological validity of the techniques with subjective responses based on psycholinguistic measures, comparing in-situ conditions with those virtualized in the laboratory [66, 103, 286].

The process of virtualizing sound sources using loudspeakers is complex [282] and requires a thorough understanding of physical acoustics, psychoacoustics, signal processing fundamentals, and proper calibration of software and hardware [126, 165, 293]. As a result, research centers have developed systems to establish reliable procedures for virtualizing sound sources for auditory testing. Examples of such systems include the transaural CTC system developed by Aspöck et al. [17]; the system with a spherical array of 42 loudspeakers capable of rendering scenarios using HOA up to fifth order and VBAP, presented by Parsehian et al. [215]; and the Loudspeaker-Based Room Auralization (LoRa) system developed by Favrot [79], which can render auditory scenes using pure HOA and a hybrid version with Nearest Speaker (NSP) and HOA. In addition, Grimm [100, 101] introduced the Toolbox for Acoustic Scene Creation and Rendering (TASCAR), which can render perceptually plausible scenes in real time using VBAP and a 2D HOA implementation. A recent study by Hamdan and Fletcher [107], in 2022, proposed a two-loudspeaker method based on the transaural approach with cross-talk cancellation. While this list is not exhaustive, these studies provide recommendations and guidelines for the field and highlight the importance of implementing reliable systems and verifying their sound fields objectively and subjectively to increase the ecological validity of auditory research and hearing aid development.
2.3.4.1 Hybrid Methods

Hybrid methods that combine the reproduction of direct sound and reverberation are not new, having been developed since at least the 1980s by the Ambiophonics group [42, 95]. They proposed the Ambiophonics method to reproduce concerts for one or two home listeners as if they were in the hall where the recording was performed. This method combined crosstalk-canceled stereo dipole reproduction with signals convolved with the IRs of the recorded spaces [76, 94]. The system aims to enhance the reproduction of recordings from existing systems (e.g., stereo and 5.1). The group also developed a new recording methodology called the Ambiophone: a microphone arrangement composed of two head-spaced omnidirectional microphones covered by a baffle in the rear to favor room reflections from frontal directions.

In 2010, the Loudspeaker-Based Room Auralization (LoRa) method developed by Favrot [79] applied the hybrid concept using HOA and the nearest speaker (NSP) for the direct sound and early reflections. The method uses the envelope from simulated rooms to reduce the computational cost by multiplying it with uncorrelated noise. The scheme was originally conceptualized for a large spherical 69-loudspeaker array. Figure 2.13 depicts its system schematic.

Figure 2.13: LoRa implementation processing diagram. The multichannel RIR is derived in eight frequency bands and for each part of the input RIR (figure from Favrot [79]).

Pelzer et al. [221] presented a comparison between transaural, or cross-talk cancellation (CTC), VBAP, and fourth-order Ambisonics, alongside two new hybrid proposals: (1) direct sound and early reflections through CTC with late reflections through fourth-order Ambisonics, and (2) direct sound and early reflections through VBAP with late reflections through fourth-order Ambisonics. The hybrid methods were implemented in a single case without generalization to different simulations. These methods were tested within a 24-loudspeaker array, with no statistically significant change in human localization performance for any of the methods.

Pausch et al. [217] presented a method designed for investigations with subjects with hearing loss. The method mixes binaural techniques to process components in complex simulated environments and CTC to present them over loudspeakers. At the same time, the head position can be tracked, allowing user interaction.

In 2017, Pulkki et al. [243] presented the
first-order Directional Audio Coding (DirAC) method, a technique for reproducing spatial sound over a standard stereo audio system. It is based on first-order Ambisonic channels, which encode the sound pressure and particle velocity at a listener's location to represent the sound field. These channels are transformed into a stereo audio signal using a frequency-dependent matrix, which preserves the spatial cues that are important for localizing sound sources. The method infers the direction of arrival of the sound source in order to virtualize it through amplitude panning, and it works with real-world recordings. The DirAC method is effective for various types of audio content, including music, speech, and sound effects. It can potentially improve the spatial realism of audio experiences over traditional stereo systems and has applications in myriad fields, including entertainment, gaming, and virtual reality.
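The first-order (B-format) encoding that DirAC analyzes can be sketched as follows. This is a generic horizontal-only illustration of the pressure and particle-velocity channels, not Pulkki's DirAC processing itself, and the 1/sqrt(2) scaling of W is one common convention:

```python
import numpy as np

def encode_bformat_2d(signal, azimuth_deg):
    """Horizontal first-order Ambisonics (B-format) encoding of a mono signal.
    W carries sound pressure; X and Y carry the particle-velocity components."""
    az = np.radians(azimuth_deg)
    w = signal / np.sqrt(2.0)  # pressure channel, common -3 dB convention
    x = signal * np.cos(az)    # front-back velocity component
    y = signal * np.sin(az)    # left-right velocity component
    return w, x, y

# A source at 90 degrees (hard left) drives only the Y velocity channel
w, x, y = encode_bformat_2d(np.ones(4), 90.0)
```

DirAC then estimates, per time-frequency bin, the direction of arrival and diffuseness from the ratios of these channels before re-rendering the directional part by amplitude panning.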
Table 2.1 presents an overview of the listed methods, the techniques involved, their purpose, and their parameters.

Table 2.1: Non-exhaustive overview of hybrid auralization methods proposed in the literature. The A-B order of the techniques does not represent any order of significance.

Year | Method                      | Authors             | Technique A           | Technique B | Proposed loudspeaker number | Proposed application
1986 | Ambiophonics                | Farina et al. [76]  | Crosstalk Cancelation | Binaural    | 2   | Music reproduction
2005 | DirAC                       | Pulkki et al. [243] | Ambisonics            | VBAP        | 2+  | Multiple applications
2010 | LoRa                        | Favrot [79]         | HOA                   | NSP         | 64  | -
2014 | -                           | Pelzer et al. [221] | Crosstalk Cancelation | HOA         | 24  | -
2014 | -                           | Pelzer et al. [221] | VBAP                  | HOA         | 24  | -
2018 | Extended Binaural Real-Time | Pausch et al. [217] | Binaural              | CTC         | 2   | Hearing loss investigations

2.3.4.2 Sound Source Localization

A comparison between VBAP and Ambisonics
conducted by Frank [84] demonstrated a median deviation of experimental results from the ideal localization curve of 2.35° ± 2.93° for VBAP and 1.05° ± 4.07° for third-order Ambisonics using a max-rE decoder. The setup was placed in a typical non-anechoic studio with a regular array of 8 loudspeakers, listening at the central position of a 2.5 m radius circle. The subjective results came from 14 participants listening to pink noise. These experimental results were compared to a localization model (Lindemann [157]) based on ITD and ILD from impulse responses. The model results showed a deviation close to the standard deviation of the subjective listening tests: 2.35° for VBAP and 3.37° for third-order Ambisonics using max-rE. Off-center measurements were pointed out by the author as necessary for future investigation.

Ambisonics at first, third, and fifth order was examined in another study by Frank and Zotter [85], with 15 normal-hearing listeners, a 12-loudspeaker setup, and pink noise with interval attenuation as the stimulus. This study investigated the effect of listening position (centered and off-center) and order. The results showed, for first-order rendering, a localization error of around 5° for the centered listener and 30° for the off-center position.

Ambisonics at first order with four loudspeakers and at third order with eight loudspeakers was also investigated by Stitt et al. [283]. This study was conducted in a non-reverberant environment to verify the effects of off-center position and Ambisonics order. The setup was a circular array with a 2.2-meter radius and an RT of 0.095 s. Eighteen test participants listened to white noise bursts of 0.2 s. In this acoustically dry condition, the centered first-order median absolute error was around 10°, while at the off-center positions tested it was close to 30°. As expected, the error was lower at third order, achieving a median absolute error of around 8° at the center and 11° off-center.

A study by Laurent et al. [275] investigated the effect of 3D audio reproduction artifacts on hearing devices, assessing ITD, ILD, and DI for HOA (third and fifth orders), VBAP, distance-based amplitude panning (DBAP), and multiple-direction amplitude panning (MDAP). The study was conducted in a non-anechoic room with 32 loudspeakers in a spherical configuration. The loudspeaker distance from the center was 1.5 m, except for the four loudspeakers at the top, which were only 98 cm away. This study investigated centered and off-center positions (10 and 20 cm). According to the authors, the results presented an expected Ambisonics limitation in reproducing ITD because of spatial aliasing at high frequencies. In addition, they investigated an MVDR monaural beamformer, which did not reproduce the correct ITD, especially off-center. At the centered position, only DBAP could not correctly reproduce ITD. Ambisonics ITDs deteriorate more than VBAP ITDs at off-center positions. ILD errors in virtualized sound sources can make a system unreliable for testing HI with processing based on ILDs. In the experiment, the ILDs were less affected by beamforming processing in VBAP, and Ambisonics benefited from the max-rE decoding that maximizes the energy vector. However, the authors expect a better ILD representation from VBAP, as HOA has an aliasing frequency limitation.

Hamdan and Fletcher [107] present the development of a compact two-loudspeaker virtual sound reproduction system for clinical testing of spatial hearing with hearing-assistive devices. The system is based on the transaural method with cross-talk cancellation and is suitable for use in small, reverberant spaces, such as clinics and small research labs. The authors evaluated the system's performance regarding the accuracy of sound pressure reproduction in the frontal hemisphere and found that it could produce virtual sound fields up to 8 kHz. They suggest that tracking the listener's position could improve the system's performance. Overall, the authors believe this system is a promising tool for the clinical testing of spatial hearing with hearing-assistive devices.

Finally, a study by Bates et al. [22] evaluated second-order Ambisonics and VBAP localization errors in subjective listening tests and through ITD and IACF comparisons. They presented the stimuli to a simultaneous set of nine listeners at different positions inside a concert room (around 1 s of RT). With 16 loudspeakers, they used 1 second of speech (male and female), white noise, and music. The results indicate that VBAP and Ambisonics techniques cannot consistently create spatially accurate virtual sources for a distributed audience in a reverberant environment. The off-center positions are compromised depending on technique and stimulus. Depending on the stimulus, centered positions resulted in localization errors between 10° and 20°. In the spatial distribution inside the ring, a bias away from the target image position and towards the nearest contributing loudspeaker is more present for Ambisonics than for VBAP. The authors mentioned that the room acoustics could also impact localization accuracy.

The number of variables across these previous studies and their contributions is large, e.g., objective measures, technique variations, number of loudspeakers, loudspeaker distance, number of simultaneous listeners, reverberation time, and array form. Table 2.2 presents an overview of methods and estimated or measured localization errors.
Table 2.2: Overview of localization error estimates or measurements from loudspeaker-based virtualization systems using various auralization methods.

Method                                        | Error at center position                    | Error at off-center position                | Loudspeakers
Present study (Iceberg), VBAP/Ambisonics      | 30° (max estimated), 7° (average estimated) | 30° (max estimated), 7° (average estimated) | 4
Frank [84], VBAP                              | 2.35° (average)                             | N/A                                         | 8
Frank [84], HOA (3rd order)                   | 3.37° (average)                             | N/A                                         | 8
Zotter [85], Ambisonics                       | 5° (median)                                 | 30° (median)                                | 12
Zotter [85], HOA (3rd order)                  | 2° (median)                                 | 15° (median)                                | 12
Zotter [85], HOA (5th order)                  | 1° (median)                                 | 10° (median)                                | 12
Stitt et al. [283], Ambisonics                | 10° (median)                                | 30° (median)                                | 8
Stitt et al. [283], HOA (3rd order)           | 8° (median)                                 | 11° (median)                                | 8
Bates et al. [22], Ambisonics (2nd order)     | 10° (mean)                                  | 20° (mean)                                  | 16
Bates et al. [22], VBAP                       | 10° (mean)                                  | 20° (mean)                                  | 16
Grimm et al. [97], HOA (3rd order)            | 2° (estimated)                              | 6° (estimated)                              | 8
Grimm et al. [97], VBAP                       | 4° (estimated)                              | 6° (estimated)                              | 8
Aguirre [1], VBAP                             | 4° (median)                                 | N/A                                         | 8
Hamdan and Fletcher [107], CTC                | 2° (max head displacement)                  | N/A                                         | 2
Huisman et al. [125], Ambisonics              | 30° (median)                                | N/A                                         | 4
Huisman et al. [125], HOA (3rd order)         | ≈ 15° (median)                              | N/A                                         | 8
Huisman et al. [125], HOA (5th order)         | ≈ 8° (median)                               | N/A                                         | 12
Huisman et al. [125], HOA (11th order)        | ≈ 5° (median)                               | N/A                                         | 24

2.4 Listening Effort Assessment

The
regular task of following a conv ersation, listening to a p erson’s sp eec h, or in teracting
with someone in a conv ersation may require additional effort in an unfa v orable or c hallenging
sound en vironmen t [ 227 ]. The listening effort is defined as ”the delib erate allo cation of
mental resources to ov ercome obstacles in goal pursuit when carrying out a [listening] task” [
224 ]. Studying asp ects of the listening effort related to different acoustic situations through
reliable metho ds can lead to the developmen t of solutions to reduce it, improving the qualit y
of life [304]. However, there is no consensus in the literature on the best method to measure listening effort. Attempts to measure how much energy a person expends in a specific acoustic situation may rely on different paradigms. The literature reports objective measurements of physiological parameters associated with changes in effort, such as pupil dilation [151, 209, 211, 301, 302, 319], brainstem frequency-following responses (FFRs) and cortical electroencephalogram (EEG) activity from event-related potentials [28, 33], or alpha band oscillations [186, 223]. In addition, the behavioral perspective studies changes in response time in single-task [204] or dual-task paradigm tests, also assuming that they are related to changes in cognitive load in auditory tests [87, 225, 228]. In turn, subjective assessments of listening effort are performed through questionnaires [323] or effort scales [147, 149, 249, 260], and their results generally agree with performance metrics [192]. Although subjective measurements are intuitive and valid, they tend to be less accepted as an indication of the amount of listening effort because of differences between objective and subjective outcomes [151, 225]. For instance, Zekveld and Kramer [318] present evidence of disagreement between the physiological and the subjective measures, where young normal-hearing participants attributed high subjective effort to the most challenging conditions despite their smaller pupil dilation. The authors assumed that methodological aspects and the participants' tendency to drop out were also related to pupil dilation at low levels of intelligibility. In a study on syntactic complexity and noise level in auditory effort, Wendt et al. [300] evaluated effort through self-rated effort and pupil dilation. They found both background noise and syntactic complexity reflected in their measurements. However, at high levels of intelligibility, the methods show different results. According to the authors, the explanation is that each measure represents a different aspect of the effort. In turn, Picou et al. [226] and Picou and Ricketts [229], using response time in a dual task as a behavioral measure, found that subjective ratings of listening effort were correlated with performance rather than with listening effort. Interestingly, though, in this study a question about control was correlated with the response-time results. The varied outcomes from the subjective and objective paradigms proposed to achieve a proxy for listening effort can indicate that these methods are quantifying separate aspects of a complex global process [12, 224].
Another explanation suggests a bias in the subjective method due to the heuristic strategies adopted by the participants to minimize the effort [192]. The mentioned strategy would consist of replacing the question about the amount of effort spent with a more straightforward question related to how they performed in the task. Concomitantly, studies based on objective measurement paradigms also have divergent results. For example, even physiological measures sensitive to the spectral content of stimuli, such as pupil dilation and alpha power, are not always related and can be sensitive to different aspects of listening effort [186]. Even within the same paradigm, a different task may indicate that different aspects are being observed. For example, Brown and Strand [53] analyzed the role of working memory as a weighting factor on listening effort. Although increasing background noise indeed increases listening effort measured by the dual-task paradigm, the memory load was not affected. They also suggested that working memory and listening effort are related in the recall-based single task, unlike in the dual task. In Lau et al. [151], significant differences between sentence recognition and word recognition were found in pupil dilation measurements and in subjective ratings, although with no correlation between the objective and subjective measures. The demand for mental resources can also be affected by personal factors, such as fatigue and motivation [224].

At the same time, several physical-acoustical artifacts can degrade a sound, creating or leading to difficulties in everyday communication (increasing listening effort), especially in social situations. The masking noise, the spectral content of the noise, the Signal-to-Noise Ratio (SNR), and the environment reverberation are examples of artifacts capable of smearing the temporal envelope cues [163]. Speech intelligibility was also assessed in a virtual environment consisting of a large spherical array of 64 loudspeakers reproducing Mixed-Order Ambisonics (MOA) [6], which presented Speech Reception Thresholds (SRTs) comparable to a real room in a co-located situation of masker and target. With a spatial separation of 30 degrees, the virtual environment led to an SRT benefit of 3 dB; it was argued that this benefit was not present in more reverberant or complex scenes, suggesting the masking effect of more challenging scenes. SRTs for normal-hearing and hearing-impaired listeners using hearing aids were also investigated by [31]. A complex scenario (a reverberant cafeteria) and an anechoic situation were evaluated in a spherical array of 41 loudspeakers. The virtualization was provided by convolving the direct sound and early reflections of the RIR with the anechoic sentence and presenting the sound through the Nearest Speaker (NSP), while the late reflections of the RIR were created through the directional envelope of each loudspeaker with uncorrelated noise.

The reviewed studies were conducted in
laboratories, mainly taking advantage of spatial sound and virtual acoustics via loudspeaker or headphone reproduction. Thus, the complex nature of human auditory phenomena and the importance of reproducibility in hearing research highlight the need for innovative tools such as spatial sound [134]. Virtualized sound allows for realistic and controllable sound environments, enabling control over selected parameters and consistent reproduction of experiments [61, 161, 282, 293]. This technology can help hearing investigations become more true-to-life and reliable [134, 161, 251]. For example, it can be used to study listening effort and speech intelligibility using virtual sound sources to create ecologically valid and controlled environments [7, 177]. It can also enable the integration of virtual sound scenarios with ecological tasks involving multiple people, providing an ecologically valid assessment of the performance of hearing solutions that is more accessible than large field studies (e.g., in Bates et al. [22]). Additionally, spatial audio enables the accessible investigation of the effects of spatial separation on binaural cues in different environments, of the role of binaural hearing in spatial perception, and of new hearing aid hardware and algorithms [61, 97, 213]. Overall, spatial sound and virtual acoustics in hearing research offer numerous benefits and represent a valuable tool for advancing our understanding of hearing and developing effective hearing solutions.

2.5 Concluding Remarks

The literature review suggests a contrast between localization and immersion in auralization methods that virtualize sound using a low number of loudspeakers. Thus, there is a need for a method that can achieve useful performance in both localization and immersion with a small number of loudspeakers and that is reliable in rendering sound for a listener in the presence of another listener within the virtualized sound field.
Previous methods, including hybrid approaches, have been developed using a larger number of loudspeakers and different techniques for balancing energy. A recent study, from 2022, proposed a method using only two loudspeakers; however, it implemented a different auralization method and had its limitations. The method proposed in this study is innovative, using a room acoustic parameter called center time to calculate the energy balance of room impulse responses and combining it with two known auralization methods.

Chapter 3
Binaural cue distortions in virtualized Ambisonics and VBAP

3.1 Introduction

In acoustics, complex communication scenarios can involve simultaneous sound sources, distracting background noise, moving sound sources, sources without large spatial separation, and low signal-to-noise ratios. Although people with normal hearing can deal with most of these conditions in a relatively
efficient way, people with hearing loss perform poorly [273, 289, 317]. Since social events are often a real example of complex communication, the interaction barriers make people avoid them and sometimes ostracize themselves [16, 63]. That can be a factor in decreasing the quality of life of people with hearing problems.

In hearing research, innovative signal processing techniques, new devices, more powerful hardware, and updated parameter settings are continuously developed and evaluated.
These technological improvements aspire to resolve communication problems in everyday situations for hearing aid users [227], increasing their socialization and quality of life [119]. Tests such as speech recognition in noise are developed and tailored to evaluate the human auditory response in everyday acoustic situations better than clinical tests based on pure-tone stimulation [145]. Even though the tasks are moving towards a more realistic representation, they still need to improve their ecological validity [134].

Auralization methods are designed to create files meant to be reproduced for a specific listener or a group of listeners; these files contain particular characteristics that try to mimic a recorded or digitally created sound scene according to the method. The mathematical formulations that produce these characteristics for the psychoacoustically based methods focus on delivering accurate binaural cues. The listener position, physical obstacles, and the listener's movement will impact distinct methods and cues differently.

A VSE is an auralized sound field that can contain realistic elements, such as high background noise, high reverberation, and concomitant sound events from different directions; currently, it is possible to create a VSE employing loudspeaker arrays or headphones for the listener [61, 79, 294]. Furthermore, through a VSE it is also possible to enable a participant to wear, for example, a hearing aid during the test. Thus, the researcher can maintain control of the stimuli, the incidence direction, the signal-to-noise ratio (SNR), among other settings, while examining the hearing device performance in a more ecological situation [98, 161, 269].

Although novel technologies emerge and contribute to emulating sound sources and even entire complex sound scenes with humans' social interaction [267], these opportunities are often overlooked in auditory evaluations. Typically, tests are performed by observing only one individual within the laboratory [81, 89, 104, 152, 169, 175]. Furthermore, the systems are designed to acquire responses from a single individual at a time [41, 79, 102, 118, 195, 218-220, 259]. A reasonable explanation for this is the low cost and complexity of auralization through headphones. More complex techniques, like Wave Field Synthesis, do not limit the listener to a restricted spot [207], reproducing a complete sound field, although at the cost of a large number of sound sources in a specifically treated room.

Social situations can have an effect on people's listening effort [230, 234] and their motivation to listen [181, 224]. In this context, social interactions have been simulated through avatars or audiovisual recordings in virtual environments, gaining space in auditory research [116, 160, 161, 272, 298]. Although this can be considered a significant asset, it also focuses on a single individual's responses to simulated social stimuli. The scenario creates a
ground for this study to investigate controlled acoustical changes in the VSE. This study assesses two main situations within a ring of loudspeakers virtualizing sound sources with Ambisonics and VBAP: (1) the displacement of the listener from the center (sweet spot), and (2) the effect of including a second simultaneous listener inside the ring. These topics can help understand the perception of sound in these specific virtualization methods, increasing the fundamental scientific basis for future hearing research applications. The changes to the sound field were observed in three major spatial cues: ITD, ILD, and IACC. That was explored by changing the listener's position and including a second listener inside the ring of loudspeakers while measuring BRIRs. These metrics can describe the spatial perception of an auralized sound signal [47, 48], with ITD and ILD responsible for localization and IACC for perceived spaciousness and listener envelopment [44]. Therefore, these measurements can indicate the possibility of including a simultaneous second participant in any hearing test with virtualized, spatially distributed sound sources.

Two different auralization techniques were used to virtualize sound sources: vector-base amplitude panning (VBAP) [241] and Ambisonics [91]. Both techniques rely on the same receptor-dependent psychoacoustic paradigm to provide an auditory sense of immersion for those with normal hearing [161, 180]. These techniques aim to deliver the correct binaural cues to a point or area to create a realistic spatial sound impression, albeit through different mathematical formulations. This work investigates whether the techniques can provide an appropriate spatial impression for young normal-hearing listeners.

Hypothesis

The main research question is how scenarios auralized with VBAP and Ambisonics are affected when the listener is displaced from the center and when another listener is inside the ring. The hypothesis is that localization cues can be better provided by VBAP, especially in off-center positions. In contrast, Ambisonics can provide a better sense of immersiveness. Also, the second listener would impact Ambisonics-virtualized sound sources more than VBAP.

3.2 Methods

The experiment was conducted in two different locations. The first
one is a sound-treated test room at the Hearing Sciences - Scottish Section in Glasgow (see Figure 3.1); the second is an anechoic test room at Eriksholm Research Centre (see Figure 3.2). This section presents the rooms' acoustic characterizations and the methods used in this experiment.

Figure 3.1: Hearing Sciences - Scottish Section test room.

Figure 3.2: Eriksholm test room.

3.2.1 Setups and system characterization

The experiment conducted in Glasgow took place in a large soundproof audiometric booth (4.3 × 4.7 × 2.9 m; IAC Acoustics). An azimuthal circular array configuration of 24 loudspeakers (3.5-m diameter; 15° of separation; Tannoy VX6) was used. The ceiling and walls were covered with 100-mm deep acoustic foam wedges to reduce reflections; the floor was carpeted with a foam underlay. The AD/DA audio interface used was a Ferrofish Model A32. The loudspeakers received signals amplified by ART SLA4 amplifiers. The reference microphone used to characterize the Glasgow test room was a 1/2" G.R.A.S. 40AD pressure-field microphone set with a GRAS 26CA preamplifier. It was oriented 90 degrees vertically from the sound source.

At Eriksholm, an equivalent setup was fitted, this time in a full anechoic room from IAC Acoustics. The room's outer dimensions are 6.7 × 5.8 × 4.9 m, and its inner dimensions, measured from the tips of the foam wedges, are 4.3 × 3.4 × 2.7 m. An azimuthal circular array configuration of 24 active loudspeakers (16 Genelec 8030A and 8 Genelec 8030C; 2.4-m diameter; 15° of separation) was used. The AD/DA was a MOTU PCI-e 424 combined with a FireWire 24-channel audio extension. The reference microphone used to characterize the Eriksholm test room was a 1/2" B&K 4192 pressure-field microphone with a type 2669 preamplifier, supplied by a type 5935 power module. It was oriented 90 degrees vertically from the sound source. The signal acquisition and processing were performed entirely in Matlab 2020a using the ITA-Toolbox v.9 [29].

The technical setup was equivalent in both rooms: a B&K head and torso simulator (HATS) model 4128-C mannequin was used for the measurements, and a Knowles Electronics Mannequin for Acoustic Research (KEMAR) was used as a physical obstacle. Although technically both devices are head and torso simulators, in this thesis HATS will refer to the B&K 4128-C for simplicity. The sampling rate of the recordings was fixed at 48 kHz, resulting in an
uncertainty of ±20 µs, therefore not compromising the final analysis.

3.2.1.1 Reverberation time

The reverberation time is one of the most critical objective parameters of a room [154]. The decay of sound energy to 60 dB below its peak after the cessation of a sound source characterizes the RT. The parameter is frequency-dependent; it is associated with speech understanding, sound quality, and the subjective perception of the size of the room. For controlled environments, the values are fractions of seconds. The T60 for both rooms in third-octave bands is presented in Figure 3.3.

Figure 3.3: Reverberation time in third-octave bands up to 16 kHz.

The rooms' reverberation time T20 was measured using an arbitrarily chosen loudspeaker and the microphone setup described in Section 3.2.1. The measurement and analysis were performed in Matlab through the ITA-Toolbox software.

3.2.1.2 Early reflections

To ensure that there is no influence of the environment, Recommendation ITU-R BS.1116-3:2015 [126] determines that the magnitude of the first
reflections should be at least 10 dB below the magnitude of the direct sound (ΔSPL ≥ 10 dB). The SPL differences determined in the environments of this work met this requirement. Table 3.1 shows the difference in sound pressure level between the direct sound and the early reflections. The higher differences in the Eriksholm environment are consistent with its anechoic setup compared to the sound-treated booth in Glasgow, where the floor provides some energy to the reflections.

Table 3.1: Sound pressure level difference between direct sound and early reflections

Angle [°]   ΔSPL Eriksholm [dB]   ΔSPL Glasgow [dB]
0           -20.99                -14.94
15          -23.40                -15.31
30          -22.66                -14.61
45          -21.97                -15.45
60          -20.39                -13.28
75          -21.22                -15.19
90          -17.71                -15.33
105         -21.49                -15.22
120         -17.83                -15.68
135         -20.12                -15.23
150         -19.70                -14.62
165         -19.13                -16.11
180         -24.57                -15.03
195         -23.56                -13.52
210         -22.62                -14.81
225         -21.04                -15.39
240         -22.29                -14.25
255         -23.73                -14.37
270         -20.90                -14.01
285         -24.06                -12.56
300         -19.61                -15.95
315         -17.68                -15.03
330         -21.46                -15.66
345         -23.08                -15.95

3.2.2 Procedure

The experiment studied how the
presence of a second listener within a loudspeaker ring affects the spatial cues of the reproduced sound field. The data were collected through the HATS, and a second listener simultaneously inside the virtualized sound area was simulated through another mannequin (KEMAR), as shown in Figures 3.4 and 3.5.

Using the reverberation time results presented in Section 3.2.1, the appropriate length of a logarithmic sweep signal was calculated as approximately four times the highest T60 value, giving 1.49 seconds. Also, a stop margin of 0.1 seconds was set to ensure the quality of the room impulse responses (RIRs) that were obtained [75, 194]. The frequency range of the sweep was from 50 Hz to 20 kHz.

Figure 3.4: HATS (with motion-tracking crown) and KEMAR inside the test room in Glasgow.

Figure 3.5: HATS and KEMAR inside the anechoic test room at Eriksholm.

The position of the head has a significant effect on the measured signals. To have a reliable assessment of the absolute three-dimensional position of the HATS, its position was measured with a Vicon infrared tracking system with an accuracy of 0.5 mm in Glasgow. At Eriksholm, a laser tape measure was used to ensure the correct positions. The microphones' height in both experiments was set to match the geometrical center of the loudspeaker enclosures in all measurements.

The first position measured used the HATS in the center, without interference from another obstacle inside the ring, to provide a baseline. Figure 3.6a illustrates a set of positions used to study the influence of a second listener inside the ring while keeping the test subject in the center (the sweet spot). Three different positions for the KEMAR (50, 75, and 100 cm of separation) were measured with the HATS fixed at the center of the loudspeaker array. The data collected are from the microphones in the HATS ears; the KEMAR was only a physical obstacle to simulate a listener inside the ring. Figure 3.6b illustrates a different set of measured positions, maintaining a minimum separation of 50 cm between the centers of the heads. The purpose of these positions with the HATS off-center was to identify the presence of distortions caused by the decentralization of the subject and by the addition of a listener within the circle of loudspeakers as a physical obstacle to sound waves. The positioning was standardized so that movements along the x-axis to the left and right of the dummies were annotated as negative and
positive, respectively.

3.2.3 Calibration

To calibrate the HATS recordings, the adapter B&K UA-1546 was connected to the B&K 4231 calibrator. That provided a 97.1 dB SPL signal, which corresponds to 1.43 Pa, instead of 94 dB without the adapter. The recorded signal from each ear was used to calibrate the levels of all measurements. The calibration factors were calculated as:

\alpha_{l,\mathrm{rms}} = \frac{1.43}{\mathrm{rms}(v_l(t)_{1\,\mathrm{kHz}})} \, \frac{\mathrm{Pa}}{\mathrm{VFS}},    (3.1a)

\alpha_{r,\mathrm{rms}} = \frac{1.43}{\mathrm{rms}(v_r(t)_{1\,\mathrm{kHz}})} \, \frac{\mathrm{Pa}}{\mathrm{VFS}},    (3.1b)

where \alpha_{l,\mathrm{rms}} is the calibration factor for the left ear; \alpha_{r,\mathrm{rms}} is that for the right ear; v_l(t) is the calibrator signal recorded in the left ear; and v_r(t) is that for the right ear.

Figure 3.6: HATS in gray, KEMAR in yellow. a) Measured positions with the HATS centered and the KEMAR present in the room in different positions (three combinations). b) Measured positions with the HATS in different positions and the KEMAR present in the room in different positions (nine combinations).

The individual loudspeakers' sound pressure levels for the same file can differ depending on several factors (e.g., the amplification system's level). To balance that, a factor was measured for a GRAS 1/2" pressure-field microphone recording a pistonphone's calibrated 1 kHz sound signal. The calibration factor \alpha_{\mathrm{rms}} was calculated from the root mean square (RMS) using:

\alpha_{\mathrm{rms}} = \frac{10}{\mathrm{rms}(v(t)_{1\,\mathrm{kHz}})} \, \frac{\mathrm{Pa}}{\mathrm{VFS}},    (3.2)

where v(t)_{1\,\mathrm{kHz}} is the sinusoidal signal at 10 Pa recorded from the calibrator in volts full scale (VFS). The loudspeaker correction factor is calculated through an iterative process that starts by reproducing an RMS-scaled version of a pink noise signal at 70 dB SPL:

\mathrm{pink\_noise}(t) = \frac{\mathrm{pink\_noise}(t)}{\mathrm{rms}(\mathrm{pink\_noise}(t))} \; 10^{\frac{70 - \mathrm{dBperV}}{20}} \; \Gamma_l,    (3.3)

where \Gamma_l is the level factor for loudspeaker l, with initial value 1, and \mathrm{dBperV} = 20 \log_{10}(\alpha_{\mathrm{rms}} / 20\,\mu\mathrm{Pa}). The signal \mathrm{pink\_noise}(t) is played through a loudspeaker l and simultaneously recorded with the microphone as S_l(t); the SPL of the recorded signal is calculated as follows:

\mathrm{SPL}_l\,[\mathrm{dB}] = 20 \log_{10}\!\left(\frac{S_l(t)\,[\mathrm{VFS}] \; \alpha_{\mathrm{rms}}\,\frac{\mathrm{Pa}}{\mathrm{VFS}}}{20\,[\mu\mathrm{Pa}]}\right),    (3.4)

Ten measurements are performed sequentially, with intervals of 1 second; the next iteration happens if the SPL obtained exceeds the tolerance of 0.5 dB in any of the measurements. A step of ±0.1 VFS is set to update \Gamma_l for its next iteration according to the SPL obtained.

3.2.4 VBAP Auralization

In the
first measurement, VBAP was the technique used to auralize the files. The first step in the signal processing was recording the 24 RIRs, one from each loudspeaker. Knowing the RT of the room, a sweep (50-20000 Hz) was created fulfilling the length requirement; in this case, a logarithmic sweep of 1.49 seconds. After that, an inverse (minimum-phase) filter was created to compensate for the frequency responses of the different loudspeakers. The signal is then processed through the VBAP technique for the specified array of 24 loudspeakers. The output is a file with 24 channels containing the sweep signal appropriately weighted for the specific angle. The signal can be processed through a single channel (when the angle to be played is at a loudspeaker position) or through up to two combined channels when it is a virtual loudspeaker position. Each channel was also convolved with the designed filter. The final (auralized) signal was used as the excitation in the transfer function where the receptors were the pair of microphones in the B&K HATS.

3.2.5 Ambisonics Auralization

In the second
measurement, at the Eriksholm test room, the files were auralized with first-order Ambisonics similarly to VBAP. To be able to process the excitation signal and acquire the impulse responses, some adaptations were required. In this case, the Ambisonics auralization process requires an encoded impulse response that contains the magnitude and the direction-of-incidence information for each instance of time. This RIR can be attained via computer simulation or recorded with a specific array of microphones. The ODEON software version 12.15 was used to simulate the sound behavior in an anechoic environment and encode the impulse responses in first-order Ambisonics format around the listener. The Odeon software is based on a hybrid numeric method [59]. In general, the image-source method, which is deterministic, is favored in the region of the first reflections up to an order predetermined by the user. Then, reflections of orders beyond the predetermined transition order are calculated using ray tracing, a stochastic method [148, 201].
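As a didactic sketch of the deterministic part of such a hybrid method, the first-order image sources of a shoebox room can be computed by mirroring the source across each wall. The sketch below is a minimal illustration, not Odeon's implementation; the source position, receiver position, and reflection coefficient are arbitrary assumptions:

```python
import numpy as np

def first_order_images(src, room_dims):
    """Mirror the source across each of the six walls of a shoebox room.

    Returns the six first-order image-source positions; higher orders
    would be obtained by mirroring the images again.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room_dims[axis]):
            img = np.array(src, dtype=float)
            img[axis] = 2.0 * wall - img[axis]  # reflection across the wall plane
            images.append(img)
    return images

def delays_and_gains(images, receiver, c=343.0, beta=0.9):
    """Propagation delay (distance / c) and 1/r spreading, with one
    reflection attenuated by an arbitrary coefficient beta."""
    out = []
    for img in images:
        r = np.linalg.norm(np.asarray(receiver, dtype=float) - img)
        out.append((r / c, beta / r))
    return out

# Example: booth-like dimensions (4.3 x 4.7 x 2.9 m, as in Section 3.2.1)
imgs = first_order_images([2.0, 2.0, 1.5], [4.3, 4.7, 2.9])
taps = delays_and_gains(imgs, [2.5, 3.0, 1.2])
print(len(imgs))  # 6 first-order reflections
```

Each (delay, gain) pair corresponds to one tap of the early part of an RIR; the stochastic ray-tracing stage would then fill in the late reverberant tail.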
Therefore, it is possible to simulate the sound behavior from a 3D model description of the space and details of its acoustic properties. From that simulation result, any music or sound can be exported as if recorded inside that space at the given positions of source and receptor [288]. Another option is to export the room impulse response, which represents the sound behavior for the given source-receptor positions. In version 12 of the Odeon software, the RIR can also be exported as a BRIR or as first- or second-order Ambisonics.

The materials selected to compose the simulation, and their corresponding absorption coefficients used in the ODEON simulation, are listed in Appendix E. In total, 72 different RIRs (5 degrees of separation) were simulated for different source-receptor positions. The simulated source positions were at the same distance of 1.35 meters from the center as the loudspeakers in the anechoic room. These RIRs were convolved with the appropriate sweep signal, producing a four-channel first-order Ambisonics sweep signal. These signals were then processed by a decoder for the loudspeaker array's specific positions, generating the auralized 24-channel files. The inverse filter procedure was applied to each loudspeaker, as was the calibration of the sound pressure level across loudspeakers. The alpha factor was calculated as \alpha_{\mathrm{rms}} = \frac{1}{\mathrm{rms}(v(t)_{1\,\mathrm{kHz}})} \frac{\mathrm{Pa}}{\mathrm{VFS}}, since the recorded input was from a B&K type 4231 sound calibrator delivering 1 Pa. The equalized, convolved, decoded, and filtered sweep signals contain the simulated source-receptor sound distributions in magnitude, time, and space as if recorded inside the simulated room. In this experiment, the simulated room has an absorption coefficient equal to one on all surfaces, simulating the anechoic condition. The setup in first-order Ambisonics was chosen given the possibility of exploring a reduction in the number of loudspeakers in future experiments and the possibility of generating it through validated software such as Odeon.
3.3 Results

In this study, the performance of the system was evaluated by collecting and analyzing results based on the positions of a mannequin within the virtual sound field (i.e., center and off-center) and on the conditions under which the system was tested (i.e., with and without the presence of a second head-and-torso simulator). The results were presented in terms of angles referenced counter-clockwise, which allowed for a detailed analysis of the system's performance under various conditions. Through this analysis, it was possible to gain a comprehensive understanding of the system's capabilities and identify potential areas for improvement.

3.3.1 Analysis

The signals were played and simultaneously recorded; the recorded result carried the auditory spatial effects of the auralization and also the physical limitations given by the virtualization setup (e.g., the loudspeakers' frequency responses and the presence of loudspeakers inside the room). As the recorded sweep is longer than the original one, zero-padding was performed; in that process, zeros are appended to the end of the time-domain signal, which nonetheless yields the equivalent convolution [242]. After that, it was possible to calculate the virtual environment's impulse response by dividing the recorded signal by the zero-padded version of the initial sweep, both in the frequency domain.

For both measurements, the interaural time difference is calculated by comparing the sound's arrival time between the two channels of a binaural room impulse response (BRIR). There are different methods for ITD calculation [132, 314]. In this work, ITDs were estimated as the delay that corresponds to the maximum of the normalized interaural cross-correlation function (IACF). According to ISO 3382-1:2009 [127], the IACF is calculated as:

\mathrm{IACF}_{t_1,t_2}(\tau) = \frac{\int_{t_1}^{t_2} p_L(t)\, p_R(t+\tau)\, \mathrm{d}t}{\sqrt{\int_{t_1}^{t_2} p_L^2(t)\, \mathrm{d}t \int_{t_1}^{t_2} p_R^2(t)\, \mathrm{d}t}},    (3.5)

where p_L(t) is the impulse response at the entrance of the left ear canal and p_R(t) is that for the right canal. The interaural cross-correlation coefficients, IACC [127], are given by:

\mathrm{IACC}_{t_1,t_2} = \max |\mathrm{IACF}(\tau)|, \quad \text{for } -1\,\mathrm{ms} < \tau < 1\,\mathrm{ms}.    (3.6)

Similarly, to calculate the interaural level difference (ILD), a fast Fourier transform (FFT) is applied to the time-domain impulse responses, the spectrum is divided into averaged octave bands, and the ratio in dB between the frequency magnitudes is calculated as the ILD:

\mathrm{ILD}(n) = 20 \log_{10} \frac{\sqrt{\int p_R^n(t)^2\,\mathrm{d}t}}{\sqrt{\int p_L^n(t)^2\,\mathrm{d}t}},    (3.7)

where n is the given frequency band, p_R^n(t) is the band-passed right impulse response, and p_L^n(t) is that of the left channel.

3.3.2 Centered position

In the centered-position configuration (Figure 3.6a), the
listener remains at the ideal VSE position (center) to focus on the effect of an added listener inside the loudspeaker ring. This framework can be valuable to auditory research, as it can be used to analyze group responses to interviews, arguments, collaborative work, social stress, or disputes between individuals in listening tasks.

The IACC for the frontal angle (0°) across frequencies is shown in Figure 3.7. High values indicate that the system delivers the same signal to both ears. Conversely, the drop in IACC values at high frequencies can indicate that Ambisonics may fail to render specific frequencies, affecting the octave-band analysis. The IACC values measured across all angles for VBAP and Ambisonics can be found in Figure 3.8. They indicate that Ambisonics tends to provide less lateralization at lower frequencies (constant and higher IACC values) and lower but constant values at high frequencies, possibly translating to blurred sound localization.
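The cue extraction defined in Equations 3.5-3.7 can be sketched as follows. This is a simplified version that assumes a two-channel BRIR is already available, integrates over the whole response rather than a time window, and uses a synthetic impulse pair instead of measured data:

```python
import numpy as np

FS = 48_000  # measurement sampling rate (Section 3.2.1)

def itd_iacc(p_l, p_r, fs=FS, tau_max_ms=1.0):
    """ITD and IACC from the normalized IACF (Eqs. 3.5 and 3.6).

    The IACF is evaluated for lags within +/- 1 ms; the ITD is the lag
    of the maximum of |IACF| and the IACC is that maximum itself.
    """
    max_lag = int(fs * tau_max_ms / 1000)
    norm = np.sqrt(np.sum(p_l**2) * np.sum(p_r**2))
    lags = np.arange(-max_lag, max_lag + 1)
    iacf = np.array([np.sum(p_l * np.roll(p_r, -lag)) for lag in lags]) / norm
    best = np.argmax(np.abs(iacf))
    return lags[best] / fs, np.abs(iacf[best])

def ild(p_l, p_r):
    """Broadband ILD as the dB ratio of channel energies (Eq. 3.7
    without the octave-band splitting used in the thesis)."""
    return 20 * np.log10(np.sqrt(np.sum(p_r**2)) / np.sqrt(np.sum(p_l**2)))

# Toy BRIR: the right ear leads by 10 samples and is 6 dB stronger
p_l = np.zeros(1024); p_l[100] = 1.0
p_r = np.roll(p_l, -10) * 2.0
tau, coherence = itd_iacc(p_l, p_r)
print(round(abs(tau) * 1e6))     # 208 (10 samples at 48 kHz, in microseconds)
print(round(ild(p_l, p_r), 1))   # 6.0
```

On measured BRIRs, the ITD estimate would follow the low-pass filtering described in Section 3.3.2.1, and the ILD would be evaluated per octave band; the sign convention of the lag depends on how the correlation is defined.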
Figure 3.7: Interaural cross-correlation as a function of frequency in octave bands - frontal angle 0°.

Figure 3.8: Interaural cross-correlation for averaged octave bands in the Ambisonics and VBAP techniques, represented in polar coordinates.

That can happen due to a tilt in positioning the HATS or to imprecision in the virtualization system. For example, a high-frequency sound wave at 8 kHz has a wavelength of approximately 4 cm, and approximately 2 cm at 16 kHz, which means that even a slight tilt can influence high-frequency IACC. Furthermore, the inverse FIR filter applied was not the inverse of the broadband signal but was filtered in third-octave bands. That decision was a signal processing compromise, as a broadband filter would only partially compensate for the loudspeakers' geometry or phase differences at high frequencies. This point can be further investigated as a way to improve the Ambisonics reproduction.

There is a relative increase of variation with frequency in the VBAP results, which is present to a lesser extent in the Ambisonics IACC results. That reveals a difficulty for Ambisonics in driving a good sense of localization, as a high coherence level indicates sound coming from the front or back [58]. At the same time, because Ambisonics activates all available loudspeakers to render the sound in the sweet-spot area, the sense of immersion is higher.

3.3.2.1 Centered ITD

The ITD results presented were obtained
after a tenth-order low-pass Butterworth filter (LPF) was applied. The filter's cutoff frequency was 1,000 Hz, to approximate the low-frequency dominance in ITD [38, 124, 197, 242].

Vector Based Amplitude Panning

The light blue line in Figure 3.9 shows the ITD results for the initial setup (HATS alone, centered). The system presented a magnitude peak at a response time of approximately 650 µs, which corresponds to approximately 22 cm for a wave traveling at the velocity of sound propagation in air. This distance is comparable to the distance between the HATS microphones (19 cm). It is appropriate to note that the symmetry of the HATS is also present in the HATS-alone results (triangles in Figure 3.9), providing reassurance about the quality of the collected data.

The HATS was kept in the center of the loudspeaker ring for the next set of measurements. A second listener's influence was then simulated by introducing a KEMAR and varying its position along the lateral axis (x-axis). The results are presented in Figure 3.9.

Figure 3.9: a) HATS alone at center. b) Light blue line: HATS alone at center. Black line: HATS centered and KEMAR at 0.5 m to the right. Blue line: HATS centered and KEMAR at 0.75 m to the right. Red line: HATS centered and KEMAR at 1 m to the right.

The ITD data obtained from this experiment make it possible to see that the second mannequin (KEMAR) acts as an obstacle affecting the interaural time difference at the HATS in the center of the loudspeaker ring. At the closest position of the second listener (50 cm from the center), there is a reduction of the ITD values (angles between 285 and 305 degrees); the maximum difference is 50 µs. That effect is related to the insertion of the physical obstacle represented by the second listener. As the sound wave diffracts, different paths to the listener's ears are imposed, reducing the difference in the sound's arrival time between the ears. The effect should therefore be centered at 270 degrees; however, the second listener was not perfectly aligned with the lateral axis of the centered listener. That was a limitation of the experiment, as the KEMAR was placed on an ordinary chair whose bottom is not flat.

Ambisonics

The ITD results
for the initial setup (HATS alone, centered), virtualized via Ambisonics auralization, are presented in Figure 3.11. The system showed a magnitude peak at a response time of roughly 600 µs, 50 µs lower than with the VBAP method. Another characteristic of the Ambisonics ITDs is the flat behavior around the lateral angles, which is generated mainly by the chosen order of the Ambisonics auralization. In first order, the horizontal directivity is determined by the intersection of bi-directional (figure-eight) sensitivity patterns combined with an omnidirectional one, as illustrated in Figure 3.10. That can also limit the localization performance when utilizing first-order Ambisonics, even when reproduced through a higher number of loudspeakers.

Figure 3.10: Horizontal 2D Ambisonics directional sensitivity representation. The red line represents an omnidirectional pattern, the black line represents a bidirectional pattern, y-axis oriented (null points at the sides), and the purple line is a bidirectional pattern representation, x-axis oriented (null points at the front and the back).

The HATS was kept in the center of the loudspeaker ring, and a second listener's influence on the sound field was simulated by introducing a KEMAR at three different positions along the x-axis: 50, 75, and 100 cm to the left of the HATS (i.e., at 270°). The results are presented in Figure 3.11 by the black, blue, and red lines. The data clearly demonstrate that, as an obstacle, the second listener (KEMAR) does not influence the interaural time difference when using Ambisonics with the HATS at the center of the loudspeaker ring.

Figure 3.11: a) HATS alone at center. b) Light blue line: HATS alone at center. Black line: HATS centered and KEMAR at 0.5 m to the right. Blue line: HATS centered and KEMAR at 0.75 m to the right. Purple line: HATS centered and KEMAR at 1 m to the right.

3.3.2.2 Centered ILD

The effects at higher frequencies due to a second listener require
an analysis of a different parameter. Instead of studying the difference in the arrival time of the sound between the ears, the representative metric is the level difference between the ears. Effects such as absorption, reflection, and diffraction occur before the sound pressure signal reaches the eardrums: the torso, shoulders, outer ear, and pinna mechanically affect an incoming sound wave. These effects are angle- and frequency-dependent, as waves of different frequencies have different wavelengths [39, 40, 90].

The effects on ILD caused by the virtualization process were calculated as the differences between the reference ILDs, measured with the HATS alone and centered, and the ILDs measured with the HATS and a second mannequin (KEMAR). As a reference, Figure 3.12 presents the ILDs for each method at twelve different angles (30-degree separation) around the listener.

Figure 3.12: Interaural level differences as a function of octave-band center frequencies at twelve different angles around the central point.

There are differences between the ILDs calculated from measurements with the two techniques in the energy of the averaged octave bands. However, the ILDs from VBAP present a more significant, and more natural, dependence on incidence angle than those from Ambisonics [222]. Furthermore, the ILD peak for Ambisonics is observed around 2,000 Hz, which can be interpreted as the frequency limit for reproducing level differences between the ears when decoding through 24 loudspeakers [299]. A more comprehensive comparison between the techniques with the HATS centered alone can be observed in the heatmap representation of Figure 3.13, which includes all 72 measured angles (5-degree separation). The homogeneity across angles in the Ambisonics measurements indicates that its ILD lacks precision as a binaural spatial cue. Localization accuracy in Ambisonics reproduction, especially at lateral angles, is highly dependent on its order (acquisition and reproduction) [27].

Figure 3.13: Interaural level differences in averaged octave bands as a function of azimuth angle for a HATS Brüel and Kjær Type 4128-C in the horizontal plane.

Figure 3.14 shows the energy difference across the octave bands for eight different incidence angles for both techniques, with and without the presence of the second mannequin. In both techniques, the strongest influence happens when the second mannequin is closest to the center. The second listener is at the right angle in VBAP (270°), while in Ambisonics it is positioned to the left (90°).

Figure 3.14: Interaural level differences (octave bands) at angles around the central point considering different displacements of the second listener.

The ILDs calculated from the measurements with the second mannequin present are not extensively different from the reference ILD. The difference is proposed to be observed as a distortion parameter. These differences were calculated by subtracting the ILDs with the second mannequin from the specified center-alone reference ILD. Ideally, all graphs should be black for a full match (no difference between setups/positions), meaning no measured distortion.

Vector Based Amplitude Panning

Figure 3.15 presents the differences between the ILDs calculated from the HATS centered (HC) and the configurations that combine the HATS centered plus the KEMAR in one of the three positions (e.g., HC K-50 is the notation defined for HATS centered and KEMAR at 50 cm to the right). The sounds were auralized via VBAP for all 72 angles (5° spacing). The angles that correspond to loudspeaker locations (15° spacing) were reproduced directly by the physical loudspeaker at that angle.

Figure 3.15: VBAP discrepancies in ILD between HATS at the center and: (Top) HATS at the center plus KEMAR at 50 cm to the right, (Middle) HATS at the center plus KEMAR at 75 cm to the right, (Bottom) HATS at the center plus KEMAR at 100 cm to the right.

The differences at frequencies over 1 kHz are pronounced for angles to the right side of the centered HATS, 270-305° azimuth.
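The octave-band ILD of Equation (3.7) and the distortion parameter defined above (the ILD with the second mannequin minus the center-alone reference, all zeros meaning a full match) can be sketched as follows; the band centers and the filter design are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def ild_octave_bands(p_left, p_right, fs,
                     centers=(250, 500, 1000, 2000, 4000, 8000)):
    """Equation (3.7): per-band level ratio in dB between the
    right and left band-passed impulse responses."""
    ild = []
    for fc in centers:
        sos = butter(4, [fc / np.sqrt(2), fc * np.sqrt(2)],
                     btype="bandpass", fs=fs, output="sos")
        r = sosfiltfilt(sos, p_right)
        l = sosfiltfilt(sos, p_left)
        ild.append(20 * np.log10(np.sqrt(np.sum(r ** 2))
                                 / np.sqrt(np.sum(l ** 2))))
    return np.array(ild)

def ild_distortion(ild_with_kemar, ild_reference):
    """Distortion parameter: per-band difference from the
    center-alone reference ILD (zero means no measured distortion)."""
    return ild_with_kemar - ild_reference
```

For instance, a right-channel response with twice the amplitude of the left yields 20·log10(2) ≈ 6 dB in every band, and comparing a measurement against itself yields an all-zero distortion map.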
Smaller effects can also be noted at other angles that correspond to virtual sound sources (where there is no loudspeaker and the sound source is produced via the auralization technique). These effects diminish as the second mannequin's position moves farther from the centered receptor, indicating a smaller acoustic shadow.

Figure 3.16: VBAP interaural level differences as a function of azimuth angle around the centered listener.

Figure 3.16 shows the ILD in six octave bands from impulse responses recorded with files auralized using VBAP. The HATS-centered (HC) position refers to the HATS alone, and it is compared to the configurations adding the second listener (KEMAR) at three different positions, 50, 75, and 100 cm displaced from the center (K+50, K+75, and K+100, respectively). The mismatch is pronounced when the KEMAR is closer (blue line), especially at the angles blocked by the KEMAR. As the second listener blocks the sound wave, an acoustic shadow is created, which reduces the sound energy at the ear facing the sound source, decreasing the level difference between the ears. There is also a reduction in ILD for angles from 35 to 50 degrees. That can be related to the opposite effect, where the mannequin reflects part of the sound, increasing the level at the centered HATS's opposite ear. These findings support the interpretation that a substantial effect occurs on the ILDs at the KEMAR's closest position.

Ambisonics

Figure 3.17 presents the calculated differences between the ILDs from the Ambisonics auralization with the same configurations (i.e., HC vs. HC K-50, HC vs. HC K-75, and HC vs. HC K-100). For convenience, the second mannequin was positioned to the left of the center (90°). The switch from right to left does not affect the comparison, as both the HATS and the Eriksholm test room are symmetric. Figure 3.18 shows the ILD in octave bands, highlighting that the strongest effect is at 8 kHz.

Figure 3.17: Ambisonics discrepancies in ILD between HATS at the center and: (Top) HATS at the center plus KEMAR at 50 cm to the left, (Middle) HATS at the center plus KEMAR at 75 cm to the left, (Bottom) HATS at the center plus KEMAR at 100 cm to the left.
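As background for the first-order behavior discussed in this section, the sketch below illustrates 2D first-order Ambisonics encoding (one omnidirectional and two figure-eight components, as in Figure 3.10) and a basic sampling decoder for a uniform loudspeaker ring. The gain and normalization conventions are illustrative assumptions, not the exact decoder used in this work.

```python
import numpy as np

def encode_first_order_2d(azimuth_rad):
    """2D first-order B-format: an omnidirectional W component plus
    the two horizontal figure-eight components X and Y."""
    w = 1.0 / np.sqrt(2.0)       # common W scaling convention (assumed)
    x = np.cos(azimuth_rad)      # figure-eight, front-back oriented
    y = np.sin(azimuth_rad)      # figure-eight, left-right oriented
    return np.array([w, x, y])

def decode_first_order_2d(bformat, n_speakers=24):
    """Basic (sampling) decoder for a uniform ring: most loudspeakers
    receive signal for any source angle, which is consistent with
    first order engaging the whole array around the sweet spot."""
    angles = 2 * np.pi * np.arange(n_speakers) / n_speakers
    w, x, y = bformat
    gains = (np.sqrt(2.0) * w
             + 2.0 * (x * np.cos(angles) + y * np.sin(angles)))
    return gains / n_speakers
```

With this convention, the decoder gains for a source at azimuth θ follow (1 + 2 cos(θ − φ_i))/N per loudspeaker i, so the gains sum to one and peak at the loudspeaker nearest the source.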
The results demonstrate that including a second listener has a negligible effect on first-order Ambisonics ILDs. However, in the reference measurement (HATS alone), the ILDs did not adequately reproduce this spatial cue throughout the angles around the listener, given the observable minor ILD differences across angles, especially over 2 kHz.

Figure 3.18: Ambisonics interaural level differences as a function of azimuth angle around the centered listener.

3.3.3 Off-centered position

Being able to have the participant away from the center of the loudspeaker ring can be valuable for testing simultaneous participants or a particular
participants or a particular ph ysical apparatus’ influence ( e.g. Listening effort ev aluated
under presence of another individual [ 230 ]). Auditory research that aims to test the influence
of a particular noise, SNR, or the direction of the noise on the in teraction in participan ts’
con versation can benefit from a setup that would make it p ossible to virtualize a sound scene
and presen t it without spatial distortions. Mea- suremen ts aiming to study the influence of
off-center HA TS displacement w ere p erformed in nine differen t configurations: with HA TS and
KEMAR indep en- den tly displaced 25, 50, and 75 cm from the cen ter, resulting in separations
of 50, 75, 100, 125, and 150 cen timeters (See Figure 3.6b ). The listening p osition is
critical to the auralization pro cess tec hniques pre- sen ted in this w ork as they are derived
and programmed to render the sound in the center of a loudsp eaker arra y . Adding computer p o
w er to real-time pro- Chapter 3. Results 83 cessing could handle participan t mo vemen ts;
although that can b e considered, it w as not in this part of the exp erimen t scop e. Suc h pro
cessing fo cuses on dynamics (head motion). The fo cus here is the effects of sub-optimal p
ositions and the influence of a second listener as an obstacle to the sound field. 3.3.3.1 Off-cen
ter ITD The effects of off-center p ositioning on sound’s arriv al time can affect the sub- jectiv
e p erception of the sound incidence direction. V ector Based Amplitude P anning Observing the
ITD results shown in Figures 3.19, 3.20, and 3.21, almost no influence of the second mannequin (KEMAR) can be noted, even with the HATS off-center. The ITD at off-center positions deviates from the centered-HATS ITD in the same proportion regardless of the second listener's (KEMAR's) position. Nonetheless, Figure 3.22 shows that a pronounced effect appears when shifting the HATS off-center. When the displacement exceeds 25 cm, the spikes represent a difficulty of the vector-based amplitude panning process in generating the virtual sound sources. This behavior is expected, as the VBAP mathematical formulation is derived from a unitary vector pointing to the center.

Figure 3.19: ITD as a function of source angle. Light blue line: HATS alone at the center. Black line: HATS at -25, KEMAR at +25. Blue line: HATS at -25, KEMAR at +50. Red line: HATS at -50, KEMAR at +75.

Figure 3.20: ITD as a function of source angle. Light blue line: HATS alone at the center. Black line: HATS at -50, KEMAR at +25. Blue line: HATS at -50, KEMAR at +50. Red line: HATS at -50, KEMAR at +75.

In Figures 3.20 and 3.21 it is possible to observe more considerable distortions in the ITD for the virtual sound sources (sharp peaks crossing the reference line, in addition to being offset from it). Such distortions increase as the HATS is moved away from the central position. Sound sources reproduced using VBAP in this loudspeaker ring at these receptor positions would not be correctly interpreted in terms of direction by the listener. The ITD difference is greater when the sound sources are at angles close to the front or rear (0° and 180°) directions. This effect is related to the HATS's physical displacement. The ITD results at lateral angles present a larger lobe at the HATS's right ear (270°) and a sharpened lobe at the HATS's left ear (90°), which shows the off-center displacement. This effect occurs because the HATS is not at the center of the ring (see Figure 3.23b), and the angles and separations between the loudspeakers are modified. The effect is even more apparent when looking only at the ITDs of the real sound sources (angles corresponding to loudspeaker locations), without the distortions created by the VBAP auralization (see Figure 3.23a).

Figure 3.21: ITD as a function of source angle. Light blue line: HATS alone, centered. Black line: HATS at -75, KEMAR at +25. Blue line: HATS at -75, KEMAR at +50. Red line: HATS at -50, KEMAR at +75.

Figure 3.22: ITD as a function of source angle. Light blue line: HATS alone, centered. Black line: HATS at -25, KEMAR at +25. Blue line: HATS at -50, KEMAR at +50. Red line: HATS at -75, KEMAR at +75.

Figure 3.23: a) ITD for real sound sources. Light blue line: HATS alone, centered. Black line: HATS at -25, KEMAR at +25. Blue line: HATS at -50, KEMAR at +50. Red line: HATS at -75, KEMAR at +75. b) Scheme of the HATS off-center position at -75 cm, facing the third loudspeaker.

Ambisonics

The VBAP method
constructs the auditory spatial cues through one to three loudspeakers in this setup, usually in the same quadrant. Ambisonics, in contrast, uses all the available loudspeakers in the rendering process. Hence, sound localization benefits from VBAP auralization compared to Ambisonics due to the nature of the methods [104, 105, 175, 180, 221]. Furthermore, the ITD results observed for first-order Ambisonics reflect the method's limitation in sweet-spot size. Figure 3.24 shows the calculated ITD in three different configurations, H+25 K-25, H+50 K-50, and H+75 K-75, plus the center configuration for comparison. To improve readability, the ITD results for the remaining spatial configurations (which were similar across conditions) can be found in Appendix A. The expected size of the listening area is 20 cm when combining 24 loudspeakers to reproduce Ambisonics in a 2D horizontal matrix [299].

Figure 3.24: ITD as a function of source angle in the Ambisonics setup. Light blue line: HATS alone, centered. Black line: HATS at -25, KEMAR at +25. Blue line: HATS at -50, KEMAR at +50. Red line: HATS at -75, KEMAR at +75.

A displacement of 25 cm or greater puts the receptor outside the sweet spot. Therefore, it is possible to observe in Figure 3.24 that Ambisonics does not virtualize this acoustic cue correctly outside the center position, as the values remain mostly constant for the side being played.

3.3.3.2 Off-center ILD

ILDs can
be highly sensitive to the listener's position in a virtualized sound field, given the smaller wavelengths involved. The composition of a virtualized sound wave is performed by simultaneously combining sounds from several sound sources, which requires a highly precise combination. This section investigates the ILD changes due to having the listener away from the optimal position while another listener is present, i.e., the ILD behavior when the HATS and a second participant are away from the center.

A comparison of the ILD results across the positions is shown in Figure 3.25; for both techniques, it presents the calculated ILDs over frequency for eight incidence directions spaced 45 degrees apart in azimuth, at three different positions plus the centered position as a reference. The pattern deviation as the receptor is moved from the center is not the same across the techniques. As expected, the physical construction of the summed sound wave in Ambisonics, which relies on all loudspeakers, has a higher impact on the ILDs than VBAP, which combines only a few sound sources from the same quadrant.

Figure 3.25: ILD as a function of frequency at different angles (line color) for VBAP (top row) and Ambisonics (bottom row) for symmetrical displacement in off-center setups.

For files auralized through VBAP, the discrepancies between the ILD measured with the HATS in the center (optimal position) and at the other positions can be interpreted as acoustic artifacts capable of conveying a wrong localization of the sound source. Although the second listener did not have a primary influence, the observed displacement from the center affects the ILD pattern, especially at higher frequencies. For Ambisonics, the listener position is critical: the ILD differences from the center to the off-center positions create artifacts that compromise the ILD as a cue for sound localization at all tested positions.

Vector Based Amplitude Panning

The
top row of Figure 3.25 shows the ILD screening at some of the incidence angles. A comprehensive visualization of the ILDs across angles is presented in Figure 3.26 for the reference centered (top) and off-centered positions. There is an effect on the ILDs when moving the receptor from the center position and adding a second listener inside the loudspeaker ring. Although noticeable, the effect still preserves the pattern, allowing the differences to be interpreted as artifacts. The vertical zero-ILD lines indicate the frontal and rear angles (0° and 180°), where the sound should arrive at the ears with the same level. These vertical black lines shift as the listener is displaced from the center: at 75 cm displacement, the lowest-value vertical lines in Figure 3.26 appear at 35° (frontal) and 145° (rear).

Figure 3.26: VBAP setups: ILD at the centered position (top); ILD in off-center setups: HATS at 25 cm to the left with KEMAR at 25 cm to the right (middle top); HATS at 50 cm to the left with KEMAR at 50 cm to the right (middle bottom); HATS at 75 cm to the left with KEMAR at 75 cm to the right (bottom).

The differences between the ILD with the HATS in the reference position (alone and in the center) and the configurations with the HATS outside the center simultaneously with the KEMAR are shown in Figures 3.27, 3.28, and 3.29. The acoustic field behavior outside the center of the ring at frequencies above 1 kHz presents significant ILD differences for the measured configurations, especially at angles that are virtual sound sources. The ILD difference reaches up to 15 dB. As with the ITD, the ILD data from the HATS in the off-center position show the acoustic shadowing effect caused by the KEMAR.

Figure 3.27: VBAP differences in the ILD between the centered-alone and off-center-with-KEMAR setups: HATS at 25 cm to the left with: KEMAR at 25 cm to the right (top); KEMAR at 50 cm to the right (middle); KEMAR at 75 cm to the right (bottom).

Figure 3.28: VBAP differences in the ILD between the centered setup and 25 cm off-center VBAP setups: HATS at 50 cm to the left with: KEMAR at 25 cm to the right (top); KEMAR at 50 cm to the right (middle); KEMAR at 75 cm to the right (bottom).

Figure 3.29: VBAP differences in the ILD between the centered setup and off-center setups: HATS at 75 cm to the left with: KEMAR at 25 cm to the right (top); KEMAR at 50 cm to the right (middle); KEMAR at 75 cm to the right (bottom).
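The VBAP formulation referred to throughout this chapter, with gains derived from unit vectors pointing from the center toward the loudspeakers, can be sketched for the 2D pairwise case as follows; the loudspeaker layout and the power normalization are illustrative assumptions, not the exact implementation used in this work.

```python
import numpy as np

def vbap_2d_gains(source_az_deg, speaker_az_deg):
    """2D pairwise VBAP: find the loudspeaker pair enclosing the
    source direction, solve g = p @ inv(L) for the pair's unit-vector
    base matrix L, and normalize the gains to constant power."""
    p = np.array([np.cos(np.radians(source_az_deg)),
                  np.sin(np.radians(source_az_deg))])
    n = len(speaker_az_deg)
    gains = np.zeros(n)
    for i in range(n):
        j = (i + 1) % n
        # Rows of L are the unit vectors of the candidate pair
        L = np.array([[np.cos(np.radians(speaker_az_deg[i])),
                       np.sin(np.radians(speaker_az_deg[i]))],
                      [np.cos(np.radians(speaker_az_deg[j])),
                       np.sin(np.radians(speaker_az_deg[j]))]])
        g = p @ np.linalg.inv(L)
        if np.all(g >= -1e-9):      # source lies between this pair
            g = np.clip(g, 0.0, None)
            g /= np.linalg.norm(g)  # constant-power normalization
            gains[i], gains[j] = g
            return gains
    return gains
```

On a 24-loudspeaker ring with 15° spacing, a source aligned with a loudspeaker drives that loudspeaker alone, while a source halfway between two loudspeakers splits the power equally between them, which mirrors the real versus virtual source distinction made in the measurements above.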
The closer the KEMAR is positioned to the HATS, the greater the discrepancies in ILD that occur around positions near 270 degrees. This effect is due to the diffraction and absorption of the sound at the second listener (KEMAR) and happens for both real (loudspeaker) and virtual sound-source locations.

Ambisonics

Ambisonics presents a more considerable limitation regarding movement outside the center of the ring due to its nature. The sound composition requires a combination of amplitude and phase from all available loudspeakers, with the correct representation achieved only for an area at the center and without obstructions. The ILD in octave bands is shown in Figure 3.30. The low amplitude and homogeneity across frequencies demonstrate that Ambisonics is limited in rendering the proposed binaural cue, not appropriately delivering the level differences outside the center. The ILD differences from the off-center positions to the HATS centered are presented in Appendix B.

Figure 3.30: Ambisonics setups: ILD at the centered position (top); ILD in off-center setups: HATS at 25 cm to the left with KEMAR at 25 cm to the right (middle top); HATS at 50 cm to the left with KEMAR at 50 cm to the right (middle bottom); HATS at 75 cm to the left with KEMAR at 75 cm to the right (bottom).

3.4 Discussion

Once the listener is centered in the loudspeaker array, the second listener did not affect the auralization other than at the angles physically shadowed by the second listener. Thus, a second listener does not deteriorate the spatial cues in either auralization technique analyzed in this work. For VBAP, discrepancies in ITD only occurred when the second listener was positioned 50 cm away, the closest measured position in
this experiment. Also, the differences in ILD for VBAP are more notable at the second listener's closest position. Concurrently, Ambisonics did not present an apparent difference in ITD for a centered listener when a second listener was placed inside the ring. The difference in the Ambisonics ILDs from the centered reference indicates an acoustic shadow (this time at the left angle of 90 degrees) and an additional slight difference across the other angles.

There is an apparent effect on ITD as the listener is moved out of the center. For VBAP, the magnitude peak remains practically the same, approximately 650 microseconds, while the ITD zero value (sound reaching both ears simultaneously) is shifted. At 75 cm off-center to the left side, the difference in arrival time corresponds to a shift of approximately 30 degrees. That is in line with the setup, as the mannequin was placed in front of another loudspeaker. However, the Ambisonics ITDs demonstrate that the composition of magnitude and phase is not completed at off-centered positions. The Ambisonics weights are calculated so that the sound waves from the loudspeakers interact at the center position and form a sound field representing a sound wave from a defined incidence angle. Moving the primary listener to the right makes the interaction between the loudspeakers inaccurate. In this case, the time difference becomes wrong because the low Ambisonics truncation order increases the aliasing effect, as can be similarly observed for third and fifth orders in Laurent et al. [275]. The sound from the right mainly reaches the right ear and travels on to the left ear before the sound from the left side can travel the extra distance. That is an expected effect, since even the minimum displacement (25 cm) is larger than the expected reproducible area (around 20 cm) for this setup.

There was no difference observed as the second listener's (KEMAR's) position was changed (25, 50, and 75 cm to the right of the center) in any of the VBAP measurements with the HATS positioned to the left of the center. Considering that the ITD just-noticeable difference (JND) in anechoic conditions is of the order of 10 to 20 microseconds [38, 140, 241], the ITD results when the off-center HATS position was 25 cm to the left were a good approximation of the reference centered measurement. That means that a listener relying only on the ITD cue would not be able to discern a difference concerning the direction of incidence if placed at these positions. It is also worth considering that the JND in reverberant conditions is even higher [140] and that the artifact can be masked by reverberation [97], which would benefit the auralization process. The HATS measurements at the 50- and 75-cm positions present peaks and crossover values across the line that corresponds to the centered ITD, which indicate distortion problems at low frequencies regarding this spatial cue. A similar analysis of the KEMAR impact on ITDs from the Ambisonics virtualization cannot be achieved, since the ITD is not accurately rendered outside of the sweet spot.

To perform the off-center ILD analysis, each interaural level difference result for a position combination (HATS and KEMAR) was subtracted from the HATS-alone results. In the VBAP method, a shadow effect generated by the second listener is present, as expected, mainly when the first listener is 25 or 50 cm left of center. However, the differences at high frequencies are essentially at virtual sources, which indicates the difficulty of creating the virtual-sound-source impression outside the center position, independently of the second listener's presence [2]. Off-center positions did not allow the accurate synthesis of the ILDs from the loudspeakers using Ambisonics. The method did not reproduce time or level differences accurately in these conditions, which could lead to not achieving the correct spatial impression. That is in line with the literature: although generally investigating higher Ambisonics orders, studies report the complexity of accurately rendering high-frequency cues [279, 290], as well as the off-center increase in accuracy obtained by increasing the Ambisonics order with a proper number of loudspeakers [275].

It should be noted that the current study did not measure changes in ITD and ILD for off-center listener positions without the presence of a second listener. Based on the effects of having the first listener off-center with a second listener present, coupled with the smaller changes with a second listener when the first listener is centered, it can be deduced from the current results that the off-center position has a degrading effect on the ITD and ILD. Considering that many simulations are limited by a "sweet spot" for the listener(s), the off-center position, as opposed to the presence of a second listener, is probably the greatest liability for multi-listener methods in hearing research.

3.5 Concluding Remarks

The more demanding the test requirements in terms of localization of the sound source (beyond left, right, front, and back), the more the researcher should move towards VBAP. In the case of fixed positions and a requirement for a greater sense of immersion, Ambisonics should be able to build more convincing sound scenarios. The techniques do not affect the ILD and ITD acoustic cues at the central position for one test participant. The addition of a second listener within the ring also does not significantly affect these parameters at the three distances tested, except for the angles hidden by the second listener's shadow. Thus, it is suitable to move towards subjective tests with a centered participant and an actor on the side. Although the second listener did not deteriorate the techniques, they present different performances in terms of spatial representation and, notably, a different sense of immersion. Thus, the purpose of the test to be designed must be taken into account when defining the auralization method.

There is a clear degradation when two test subjects are simultaneously present, both in off-center positions, regardless of the distance of the second listener. The VBAP measurements showed increasing ITD differences with increasing distance from the center, and significant differences in ILD. These differences indicate the creation of acoustic artifacts, possibly generated by the method's difficulty in correctly virtualizing high frequencies outside the sweet spot. For the ITD parameter, the position displaced 25 cm from the center shows little difference or evidence of artifacts generated by virtualization errors.
At the same time, the other distances present significant differences and artifacts. The binaural
cues analysis suggests that VBAP is less sensitive to the participant p ositions than the
Ambisonics setup. Ho w ev er, it is relev ant to note that although the differences in the
binaural cues denote differences in audio spatialization, reflecting on the p erceived angle of
incidence of the sound, b oth techniques can b e calibrated to repro duce the stim uli at a
desired lev el of sound pressure. That means that an auralized sound can b e repro duced with
the correct sound pressure level although its direction ma y not b e correctly interpreted b y
the listener as their binaural cues are not b eing deliv ered appropriately . Chapter 4 Sub
jectiv e Effort within Virtualized Sound Scenarios This exp eriment was a collaborative study
(EcoEG [3]) with fellow HEAR-ECO PhD student Tirdad Seifi-Ala, also from the University of Nottingham, that combined the virtualization of sound sources and electroencephalography (EEG) to assess listening effort in ecologically valid conditions. Both students contributed equally to the study design, preparation, data collection and interpretation. TSA additionally performed the data analysis; SA additionally performed the room simulations, stimuli preparation, software interface and sound calibration. As definitions can vary, this chapter uses the following terms:

• Simulation: numerical acoustic simulation of the spatial behavior of a sound in a defined space.

• Auralization: creation of a file that can be converted to a perceivable sound and contains spatial information.

• Sound Virtualization: reproduction of an auralized sound file through loudspeakers or headphones.
4.1 Introduction

The interest from researchers and clinicians in listening effort measures has grown recently [83, 135, 210], and the importance of studying listening effort in an ecologically valid sound environment follows the same trend [134]. The previous chapter discussed the feasibility and constraints of the virtualized sound field through binaural cues and the foreseeable effects on spatial impression and localization. This chapter investigates whether reverberation and the signal-to-noise ratio (SNR) are reflected in behavioral data, used as a proxy for subjective listening effort, in a virtualized sound environment.

Reverberation is the accumulation of energy reflections (sound) in an enclosed space that creates diffusion in its sound field [256]. Reverberation time (RT), in turn, is an objective parameter that represents the amount of time required to dissipate the energy of a sound source by one-millionth of its value (60 dB) after the source has ceased [254]. This parameter was reviewed in Section 2.3.3.2. The remaining sound energy can blur the auditory cues and the rapid transitions between phonemes and decrease the low-frequency modulation of a signal; it may compromise speech intelligibility [39, 112]. Since reverberation is a complex phenomenon, depending on space and frequency [111, 185], a wide range of physical-acoustical factors may limit some comparisons: for example, the reproduction method, the masker type, the position and number of sources, the SNR, the sound pressure level of the presentation, the reverberation time interval studied, and whether the simulated position is in a free or a diffuse sound field. Like the methodologies, the findings in terms of reverberation's influence on listening effort can also vary across experiments.

Previous studies investigated the effect of reverberation on speech intelligibility and listening effort. Variations across reverberation time, level, and population groups were observed. For example, a correlation between age and reverberation was traced in work by Neuman et al. [203]. That study found that reverberation negatively impacts the SNR necessary to reach 50% speech recognition, and that this impact varies across ages, with the effect decreasing as age increases. The sensitivity of subjective measures and electrodermal
activities was evaluated by Holube et al. [121]; the effect of reverberation was found statistically significant for the subjective measures but not for the electrodermal activity. A study from Picou et al. [225] presents response time in a dual-task paradigm as a behavioral measure of listening effort. In their study, there was no significant effect on response time, neither within the same SNR conditions nor when comparing the response times of equal performance scores. The impact on listening effort was studied by Kwak et al. [149] through subjective ratings, resulting in a significant effect of reverberation both on ratings of listening effort and on sentence recognition performance. In Nicola and Chiara's study [204], the negative influence of reverberation on response time was considered indicative of an increase in listening effort; that study assessed the influence of reverberation and noise fluctuation on response time. The different methodologies applied in these studies and their groups of participants must be carefully analyzed, as they can explain the different results.

Ambisonics arrangements (Mixed Order Ambisonics (MOA) [78, 177] and HOA) are already used in audiological studies [7, 77, 173, 303]. This study proposed a low-order (first-order) Ambisonics implementation. The low-order technique is more sensitive to the listener position [64, 65], which was also verified in this study. That can be seen as a counter-intuitive and non-conventional choice, although it was meant to assess low-order Ambisonics' feasibility in audiological studies and its constraints. This decision was a step towards confirming the feasibility of a listener in a centralized position found in Chapter 3, observing its constraints, and further developing an auralization method with lower hardware
requirements in Chapter 5.

Hypothesis

The main research question is how the auralized acoustic scenario, specifically the room and the SNR, increases auditory effort when virtualized. The hypothesis for the experiment is that a longer RT provided through sound virtualization and a lower SNR both lead to more significant listening effort. Reverberation time can influence normal-hearing and hearing-impaired people in different ways. For example, on average, hearing-impaired listeners experience more significant difficulties understanding speech in a reverberant condition than normal-hearing listeners, so they can suffer more from the strain of listening. As reverberation's effects on hearing-impaired listeners vary (see Chapter 2), this study employed only normal-hearing participants to investigate the effects of audio degradation. To subjectively assess changes in hearing effort, a questionnaire was provided to participants, asking how much effort they experienced in each condition (described in Section 4.2). This investigation is the first step towards understanding the feasibility of including the simplified virtualization of sound sources in the expanding field of listening effort research.

4.2 Methods

This experiment was designed to
gather data for two parallel analyses: the first was to evaluate differences in behavioral performance (speech recognition) and subjective impressions of listening effort driven by different scenarios, manipulating the room type and the signal-to-noise ratio (SNR). The second study compared physiological responses of the brain, as measures of listening effort, to the same behavioral performance. This chapter focuses on the experiment's first study (behavioral data vs. subjective impressions). Three rooms were chosen for this study: a classroom, a restaurant dining area, and an anechoic room.

For this experiment, a setup was developed to investigate the listening effort caused in nine different situations: three room simulations, characterized by their reverberation time, and three SNRs. The setup was composed of four recorded talkers acting as maskers and one talker acting as the target. The talkers' positions were all spatially separated. The test paradigm involved the auditory presentation of Danish hearing in noise test (HINT) sentences [205] on top of four speech maskers, with the participant recalling the words they could keep in memory after 2 seconds. The sound sources were spatially distributed, and the participant was informed that the target speech always came from the front. The participants' responses were word scored (i.e., word-based speech intelligibility) by Danish-speaking clinicians. The method in this study follows a four-talker babble setup similar to [209, 302], which investigated SNR and masker types using pupillometry as a proxy for listening effort. Also, a study from Wendt et al. [301] investigated the impact of noise and noise reduction through an equivalent setup. This method's innovation relies on using first-order Ambisonics to generate the reverberation based on ODEON-simulated rooms.

4.2.1 Participants

For the data collection, 18 normal-hearing native Danish-speaking adults (eight females) with an average age of 36.9 ± 11.2 years gave written consent and initially participated in the test. One participant was placed outside the sound field sweet spot, so his data were discarded, and the data for the other 17 participants were used for further analysis. Ethical approval for the study was obtained from the Research Ethics Committees of the Capital Region of Denmark. For each participant, the pure-tone average of air-conduction thresholds at 0.5, 1, 2 and 4 kHz (PTA4) was tested and confirmed below 25 dB HL.

4.2.2 Stimuli

The target stimulus consisted of simple Danish
sentences spoken by a male speaker. The sentences were from the HINT in Danish [205] and were 1.3-1.8 s in duration. The masking signal consisted of four different speakers, two female and two male, reading a Danish-language newspaper [302]. The total duration of each of the masker recordings was approximately 90 seconds. The maskers' onset was 3 s before and offset was 2 s after the target, resulting in a masker duration of 6.3-6.8 s. In each trial, the time segment used from each masker was randomized. In addition, the spatial position of each masker was also randomized in each trial, but always interspersing male and female talkers. The overall maskers' equivalent continuous sound level Leq was set at 70 dB (64 dB for each masker, since four incoherent 64 dB sources sum to 64 + 10 log10(4) ≈ 70 dB), and the target Leq was set at 62 dB, 67 dB and 72 dB to generate three SNR conditions: -8, -3 and +2 dB. In this study, SNR was defined as the equivalent continuous sound level of the target signal relative to the competing masking Leq. The chosen reverberation conditions aimed to represent common everyday situations. The RTs of the anechoic and reverberant conditions studied were defined as the overall reverberation time obtained through the output of the simulation software (ODEON Software v.12). The absorption coefficients and relative areas used to obtain the mentioned conditions are presented in Appendix E.

Five source positions (one target and four maskers) were created around a receptor in each simulated room. All positions were 1.35 m from the center of each room, where the receptor is located. The approach of creating two different rooms, instead of changing the parameters of a single room, was chosen to achieve a more natural sound field. That way, the absorption coefficients applied to the rooms' materials were kept close to real. The virtualization of the proposed acoustic scenarios follows the path indicated in Figure 4.1. An acoustic simulation is performed to create the appropriate characteristics of the sound according to the room. The software calculates the amplitude and the incidence directions of sound and its reflections arriving from specific sources at a receptor position inside the room. For each source-receptor combination, the software generates a room impulse response that is encoded in first-order Ambisonics in the AmbiX [198] format (a channel-order specification for Ambisonics in which the first four channels are WYZX, compared to WXYZ in the FuMa specification). The generated file was convolved with anechoic audio and decoded to the specific array of 24 loudspeakers.
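As a sketch of these last two steps, the snippet below converts a first-order RIR from FuMa channel order to AmbiX, convolves it with a mono anechoic signal, and decodes it to a 24-loudspeaker ring. It is illustrative only: the decoder shown is a hypothetical basic (sampling) decoder, and the function names are ours, not the toolchain actually used in this experiment.

```python
import numpy as np
from scipy.signal import fftconvolve

def fuma_to_ambix_fo(rir):
    """First-order FuMa B-format (channels W, X, Y, Z, with W at -3 dB)
    to AmbiX (ACN channel order W, Y, Z, X with SN3D normalization)."""
    w, x, y, z = rir
    # Undo FuMa's 1/sqrt(2) weighting on W; at first order the
    # directional channels already share the same scaling.
    return np.stack([w * np.sqrt(2.0), y, z, x])

def auralize(anechoic, ambix_rir, decoder):
    """Convolve a mono anechoic signal with each channel of a 4-channel
    AmbiX room impulse response, then decode to loudspeaker feeds."""
    ambi = np.stack([fftconvolve(anechoic, ch) for ch in ambix_rir])
    return decoder @ ambi  # shape: (n_loudspeakers, n_samples)

# Hypothetical basic (sampling) decoder for a horizontal 24-speaker ring:
# row l holds the first-order spherical-harmonic values at loudspeaker l.
az = np.deg2rad(np.arange(24) * 15.0)
decoder = np.stack([np.ones_like(az), np.sin(az),
                    np.zeros_like(az), np.cos(az)], axis=1) / 24.0
```

A W-only impulse (an omnidirectional source) then decodes to equal gain on all 24 loudspeakers, which is a quick sanity check for such a chain.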
Figure 4.1: Auralization procedure implemented to create mixed audible HINT sentences with four spatially separated talkers at the sides and back (maskers) and one target in front.

4.2.3 Apparatus

The experiment was set up in an anechoic room (IAC Acoustics) with 4.3 m × 3.4 m × 2.7 m (inner dimensions). The experimental setup consisted of a circular array of 24 loudspeakers positioned at 15° intervals on the azimuth and at 1.35 m distance from the center. The target sound was reproduced at 0° (the participant's front); the maskers were auralized at ±90° and ±150° (Figure 4.2). The position of the participant during the whole test was monitored through a laser line and a camera, ensuring they remained in the sweet spot.
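For reference, the ring geometry just described can be written out numerically. The sketch below uses only the values stated above (24 loudspeakers, 15° spacing, 1.35 m radius); the helper `ring_index` is a hypothetical convenience for mapping the target and masker azimuths onto loudspeaker indices.

```python
import numpy as np

# Assumed from the text: 24 loudspeakers every 15 degrees on a 1.35 m ring.
N_SPK, STEP_DEG, RADIUS_M = 24, 15.0, 1.35
az_deg = np.arange(N_SPK) * STEP_DEG
xy = RADIUS_M * np.stack([np.cos(np.deg2rad(az_deg)),   # front axis
                          np.sin(np.deg2rad(az_deg))],  # left axis
                         axis=1)

def ring_index(angle_deg):
    """Index of the loudspeaker closest to a requested azimuth."""
    return int(round((angle_deg % 360.0) / STEP_DEG)) % N_SPK

target_idx = ring_index(0)                        # front
masker_idx = [ring_index(a) for a in (90, -90, 150, -150)]
```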
Stimuli were routed through a sound card (MOTU PCIe-424, with a FireWire connection to the MOTU 24 I/O audio interface) and were played via 16 Genelec 8030A and 8 Genelec 8030C loudspeakers (Genelec Oy, Iisalmi, Finland), aligned in frequency and level. The BioSemi EEG device was used to collect the physiological data, which helped to restrain participants' movement; the EEG data were not analyzed in this study.

Figure 4.2: Spatial setup of the experiment: test subjects attended to target (in blue) stimuli from a 0° angle in front. The masking talkers (in red) are presented at lateral ±90° and rear ±150° positions.

All enclosed spaces, including controlled audiological environments, have a certain degree of reverberation due to acoustically reflective surfaces and background noise due to equipment. The levels of reverberation and background noise meet the criteria of Recommendation ITU-R BS.1116-3 [126] and are shown in Figures 4.4 and 4.3, respectively.

Figure 4.3: Reverberation time inside the anechoic room at Eriksholm Research Centre with the setup in place.

Figure 4.4: Eriksholm anechoic room: A-weighted background noise. Loudspeakers and lights on, motorized chair off.

The parameters were measured with the setup (loudspeakers, motorized chair and BioSemi EEG equipment) inside the room and positioned as in the experiment. Figure 4.5 shows the setup placed inside the anechoic room.

Figure 4.5: Setup inside the anechoic room (motorized chair, adjustable neck support and EEG equipment).
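A side note on the level bookkeeping used in this chapter: band or source levels such as the background-noise levels in Figure 4.4 combine energetically, not arithmetically. A minimal sketch (the function name is ours) that also reproduces the masker calibration of Section 4.2.2, where four 64 dB maskers sum to approximately 70 dB overall:

```python
import math

def db_sum(levels_db):
    """Energetic sum of independent levels (e.g., A-weighted band levels
    of background noise, or incoherent sources) into one overall dB value."""
    return 10.0 * math.log10(sum(10.0 ** (l / 10.0) for l in levels_db))

# Four incoherent 64 dB maskers combine to about 70 dB overall:
total = db_sum([64, 64, 64, 64])
```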
4.2.4 Auralization

Acoustic Scene Generation and Room Acoustic Simulation

To simulate the acoustic characteristics of the chosen scenarios, geometric models were created in the room acoustics software ODEON. Next, the Ambisonics room impulse responses were simulated using ODEON software, version 12 [59]. The absorption coefficients of the room surfaces are listed in Appendix E. All sentences were auralized in Ambisonics [15], truncated at first order and encoded to 24 channels. The analysis utilized the Institute of Technical Acoustics (ITA) Toolbox [29, 67]. Rooms were chosen as representative of realistic, not extreme, acoustic conditions. The spaces simulated were a classroom (9.46 m × 6.69 m × 3.00 m) with an overall RT of 0.5 seconds, and a restaurant's dining area (12.19 m × 7.71 m × 2.80 m) with an overall RT of 1.1 seconds. The distance between source and receptor was kept the same, 1.35 m, across rooms. Target and masker positions were simulated by selecting the appropriate simulated RIR to convolve, i.e., the simulated source-receptor RIR that corresponds to the desired reproduction angle.

Ambisonics Sweet Spot

In this study, two different metrics were used to compare the off-center performance of virtual sources auralized with first-order Ambisonics: the RT and the sound pressure level (SPL). That is, whether the presented virtualized sound field delivered the correct amount of reverberation and also the correct sound pressure level of each source, resulting in the appropriate signal-to-noise ratio, when the listener was not perfectly centered. To estimate each position's metrics, a logarithmic sweep signal (50-20000 Hz, 2.73 s; FFT degree 18, sample frequency 96 kHz) was generated and convolved with the first-order Ambisonics RIR calculated by ray tracing in ODEON for each modeled room. The simulated rooms presented an overall theoretical reverberation time of 0, 0.5, and 1.1 s. These auralized files were encoded to 24 channels distributed on the horizontal axis. Subsequently, the files were played inside the anechoic room and simultaneously recorded. From the division in the frequency domain of the recorded signal by the zero-padded initial signal (deconvolution), the calculated impulse response (or binaural RIR (BRIR) when recorded with a HATS) represents the virtualized system, including the physical effects of the array and all calibration.

Reverberation Time

The RT was calculated
with the ITA-Toolbox from the initial 20 dB decrease from peak level (T20) in the virtualized IRs. Figure 4.6 shows the overall RT results at the center position and when moving the receptor (manikin) towards the front.

Figure 4.6: Overall reverberation time (RT) as a function of receptor (head) position in the mid-sagittal plane re center (0 cm).

The results showed slightly greater RTs (0.58 and 1.16 s) than what was simulated in the ODEON software (0.5 and 1.1 s). However, this was expected, since there is equipment inside the anechoic room (e.g., a large chair and loudspeakers) that can be considered reflective surfaces and was not present in the simulation. The results showed that there is no major effect on the energy decay for small head movements.

Sound Pressure Level

The sound pressure level was determined by convolving the target and masker sounds with the impulse responses collected across twelve positions, with horizontal displacements of 2.5, 5 and 10 cm and forward (mid-sagittal) displacements of 2.5 and 5 cm. The results are
shown in Figure 4.7. The four speech talkers were individually convolved, and the equivalent sound pressure level was determined using the calibration factor. The measure is the average over 20 different sentences.

Figure 4.7: Sound pressure level virtualized through Ambisonics at different listener positions.

The changes in SPL as a function of off-center position do not follow a consistent pattern. The SPL changes were, however, mostly similar across the three simulated rooms, with the exception of three positions where the restaurant (1.1 s RT) was 1-1.5 dB different (x = 2.5, y = 0; x = 0, y = 2.5; x = 10, y = 2.5). The center position is the optimal position for sound pressure level accuracy. To help obtain reliable, appropriate data from the experiment, a neck rest as well as a video feed and a laser line were added to the setup after the first pilot test. The participants were asked to stay in contact with the neck rest at all times. The clinician was able to see the laser line at the participant's head throughout the test and could ask the participant to quickly correct posture at the start of each block, or at any point of the session after the participant needed a break. Figure 4.8 shows a participant positioned with all sensors connected. Another important finding was that, after adjusting the participant's position, the motorized chair had to be unplugged; otherwise the EEG data would be compromised.

Figure 4.8: Participant positioned for the test.

4.2.5 Procedure

There were 9 different conditions based on the SNR (+2 dB, -3 dB, -8 dB) and the reverberation time (0 s, 0.5 s, 1.1 s) of the sound. Each condition was presented in a separate block, and each block consisted of 20 sentences, so in total there were 9 blocks and 180 sentences presented to the participants in the main test. In addition, each participant went through a training round at the beginning, consisting of 20 sentences with different conditions. The procedure for each trial is illustrated in Figure 4.9
. Each trial started with 2 s of silence (preparation), then 3 s of background noise, which served primarily as a baseline period for the separate EEG analysis. Then a HINT sentence was played as the background noise continued, for 1.5 s on average. After the target sentence finished, the background noise continued for another 2 seconds, during which participants needed to maintain the words they had just listened to (maintenance), also serving primarily for the companion analysis of EEG responses relative to baseline. When the background noise stopped, the participants were instructed to repeat all the words within the sentence (recall). Listening effort reflected in alpha-power changes in the maintenance phase has been investigated in [208, 310, 311, 313].

Figure 4.9: Trial design. For each trial, 20 in each block, there were 2 s of silence, then 3 s of masker (4 spatially separated talkers), then a Danish HINT sentence as target stimulus in the presence of the continuing masker, then 2 additional seconds of masker, followed by silence, when the participant repeated as many target words as they could understand and keep in memory.
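The trial timing above can be expressed as a small onset schedule. The sketch below is illustrative (function and key names are ours); for sentence durations of 1.3-1.8 s it reproduces the 6.3-6.8 s masker durations quoted in Section 4.2.2.

```python
def trial_schedule(sentence_s):
    """Phase onset times (s) for one trial, following the design above:
    2 s silence, 3 s masker-only baseline, the target sentence with the
    masker continuing, then 2 s maintenance, then recall in silence."""
    t = {}
    t["silence"] = 0.0
    t["masker_on"] = 2.0
    t["target_on"] = t["masker_on"] + 3.0
    t["target_off"] = t["target_on"] + sentence_s
    t["masker_off"] = t["target_off"] + 2.0   # end of maintenance
    t["recall"] = t["masker_off"]
    return t

sched = trial_schedule(1.5)
masker_dur = sched["masker_off"] - sched["masker_on"]   # 6.5 s here
```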
Figure 4.10 shows the graphical user interface designed and implemented for this experiment. The 24-channel audio files were produced beforehand (offline), being calibrated to the specific setup. Along with the audio presentation, the software also sent a series of triggers, in sync with the presentation timings, to the EEG software (ActiView, BioSemi) to mark the EEG measurement appropriately for the companion analysis.

Figure 4.10: Graphical user interface used to acquire the data from participants. Words are state buttons that alternate between green and red, being saved as 1 or 0, respectively.
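The word scoring produced by this interface reduces to a simple proportion over the saved 1/0 flags; a minimal sketch (names are illustrative):

```python
def block_score(trials):
    """Word-based intelligibility (%) for one block: `trials` holds, per
    sentence, the list of 1/0 button states saved by the interface."""
    flags = [f for trial in trials for f in trial]
    return 100.0 * sum(flags) / len(flags)

# e.g., two sentences with 4/5 and 3/5 words marked correct:
score = block_score([[1, 1, 0, 1, 1], [1, 0, 1, 0, 1]])   # 70.0
```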
4.2.6 Questionnaire

At the end of each block (SNR × room condition), a three-item questionnaire was presented to the participants; the English translation is shown in Table 4.1. The questionnaire was translated from Zekveld and Kramer [318] into Danish. The response to each question had a scale of 0 to 100 in integer units (Appendix F). The first question aimed to measure participants' estimation of their performance, referred to as "Subjective intelligibility" in the rest of the text. The second question was to measure participants' perception of effort, referred to as "Subjective effort". The third question was provided to measure how often participants gave up during the test, referred to as "Subjective disengagement".

Table 4.1: The questionnaire for subjective ratings of performance, effort and engagement (English translation from Danish).

Question 1: How many words do you think that you understood correctly?
Question 2: How much effort did you spend when listening to the sentences?
Question 3: How often did you give up trying to perceive the sentences?

4.2.7 Statistics

A linear mixed model (LMM) [171, 233] was used to investigate SNR and RT effects on performance and the questionnaire. The effects of SNR and RT on EEG power in different alpha bands were also explored through an LMM in the collaborative analysis performed by Seifi Ala. SNR and RT were fixed factors, while participants were random factors in the model. Implemented in MATLAB, the syntax for the LMM was Dependent ~ 1 + SNR*RT + (1|SubjectID), with Dependent being either performance or questionnaire. Both the SNR (-5, 0, 5) and RT (-0.53, -0.03, 0.56) levels were re-centered around zero for the model.

4.3 Results

This section highlights the findings concerning the study's questions: the feasibility of having a hearing-in-noise test virtualized in first-order Ambisonics, and the
influence of degradation through SNR and reverberation on speech intelligibility. The participants' behavioral performance (i.e., speech recognition accuracy) demonstrated significant effects of SNR (β = 5.98, SE = 0.30, t(158) = 19.67, p < 0.001) and RT (β = -31.17, SE = 1.78, t(158) = -17.49, p < 0.001) and a significant interaction between the two (β = 1.76, SE = 0.43, t(158) = 4.04, p < 0.001). Figure 4.11 presents the mean performance (percent correctly recalled words) as a function of SNR for each room. Less signal degradation, whether higher SNR or lower RT, led to higher performance accuracy.

Figure 4.11: Performance accuracy based on percentage of correctly recalled words as a function of SNR and RT (line color/shading). Error bars represent the standard error of the mean. Lines/symbols are staggered for legibility and do not indicate variation in SNR.

The statistical analysis of the results for subjective intelligibility (Figure 4.12), subjective effort (Figure 4.13), and subjective disengagement (Figure 4.14) is shown in Table 4.2. All the measures show a significant interaction between SNR and RT. Lower signal degradation (higher SNR and lower RT) led to a higher subjective estimation of intelligibility performance accuracy and to decreased reported effort and disengagement.

Table 4.2: Results of the linear mixed model based on SNR and RT predictor estimates of the questionnaire. DF = 158.

Predictor: SNR
  Subjective intelligibility: β = 5.71, SE = 0.42, t = 13.48, p < 0.001
  Subjective effort: β = -5.60, SE = 0.41, t = -13.57, p < 0.001
  Subjective disengagement: β = -5.78, SE = 0.48, t = -11.85, p < 0.001
Predictor: RT
  Subjective intelligibility: β = -33.74, SE = 2.47, t = -13.61, p < 0.001
  Subjective effort: β = 23.58, SE = 2.41, t = 9.76, p < 0.001
  Subjective disengagement: β = 33.39, SE = 2.85, t = 11.68, p < 0.001
Predictor: SNR × RT
  Subjective intelligibility: β = 1.56, SE = 0.60, t = 2.57, p = 0.010
  Subjective effort: β = 1.50, SE = 0.59, t = 2.54, p = 0.012
  Subjective disengagement: β = -2.06, SE = 0.69, t = -2.94, p = 0.003

Figure 4.12: Subjective intelligibility as a
function of SNR and RT (line color/shading). Error bars represent the standard error of the mean. Lines/symbols are staggered for legibility and do not indicate variation in SNR.

The subjective impressions of how much effort was required and how willing participants were to give up in each situation are presented in Figures 4.13 and 4.14, respectively.

Figure 4.13: Subjective effort as a function of SNR and RT (line color/shading). Error bars represent the standard error of the mean. Lines/symbols are staggered for legibility and do not indicate variation in SNR.

Figure 4.14: Subjective disengagement as a function of SNR and RT (line color/shading). Error bars represent the standard error of the mean. Lines/symbols are staggered for legibility and do not indicate variation in SNR.

The results show the statistically significant contributions of reverberation and SNR to perceived performance, effort and disengagement. From Figures 4.12, 4.13, and 4.14, the self-report scales varied near-linearly with the signal degradations across conditions, agreeing generally with the behavioral data (see Figure 4.11). The subjective effort increases with the reverberation time: the more time the energy needs to dissipate in the environment, the greater the perceived effort. The results from all the self-report scale questions were highly correlated with performance. Pearson skipped correlations [308] revealed significant correlation coefficients (see Table 4.3):

Table 4.3: Pearson skipped correlations between performance and self-reported questions.

performance vs subjective intelligibility:  r = 0.95,  CI [0.93, 0.96]
performance vs subjective effort:           r = -0.79, CI [-0.84, -0.74]
performance vs subjective disengagement:    r = -0.94, CI [-0.96, -0.92]

4.4 Discussion

This
study presented an interesting challenge to the researchers. The pilot data pointed in the direction of the virtualization not rendering the correct sound, especially not the correct sound pressure level. The setup was retested and investigated in different positions, and the problem was identified: the first-order Ambisonics rendering has a relatively small sweet spot. Thus, participants were monitored to be in the correct position during the testing. The sweet spot's capabilities in terms of correct overall SPL reproduction presented limitations of ±1 dB relative to the target SPL up to 5 cm, and ±3 dB up to 10 cm off center. Although the literature did not test this exact Ambisonics implementation, and used different performance measures, the findings agree with reported contrasts caused by the reproduction method at similar distances from the center. As a reference, Grimm et al. [97] analyzed simulated Ambisonics environments with different numbers of loudspeakers, studying their influence on a representative hearing aid algorithm. They showed a decrease in SNR errors when increasing the number of loudspeakers and decreasing frequency: for a bandwidth of 2 kHz at the central listening position, 12 loudspeakers would be required for HOA; with 24 loudspeakers, the bandwidth at the central listening position would be 6 kHz. Laurent et al. [276] analyzed the reconstruction error to assess the rendering system's frequency capabilities. A KEMAR was fitted with a hearing aid, without processing, to collect the impulse responses. Regarding range, a third-order implementation with 29 loudspeakers decreased from 3,150 Hz in the center to 2,500 Hz when positioned 10 cm from the center.

Tests that involve separated sound sources, auralized and virtualized by loudspeaker setups, need to be verified in terms of sweet spot size for the specific sound parameters (e.g., RT and SPL). An off-centered or moving head can, in a first-order Ambisonics auralization, easily encounter a spot in space where, for example, the wave-field combination may partially cancel one or more maskers, increasing the SNR even if the intended SNR is low (see Figure 4.7). At other off-center spots it could also be possible to partially cancel the target. These distortions could profoundly impact the results and not represent what would be achieved in the real scenario being simulated. For normal-hearing participants, a more psychologically oriented psychoacoustic auralization method such as lower-order Ambisonics can provide the desired acoustic impression, in terms of objective and subjective performance, when the calibration is performed and the setup limitations (e.g., a very restricted sweet spot) are respected. An investigation of performance in off-center positions using hearing-impaired participants would be an important next step towards understanding a broad clinical application of this method.
Participants were tested in three different SNRs (-8, -3, +2 dB) and three virtual rooms (with RTs of 0, 0.5, and 1.1 s). The more the manipulated signal was degraded (lower SNR and higher RT), the more demanding the listening conditions became, which lowered the participants' speech intelligibility. A questionnaire was used as a subjective measure of effort. Comprehensibly, participants reported increased speech intelligibility, less cognitive effort, and less tendency toward disengagement as the signal degradation diminished. That denotes that if they could recall the speech well, they perceived that they performed well and also spent less effort. The results from all three questions within the questionnaire were strongly correlated (either positively or negatively) with the speech intelligibility of the participants. They changed significantly with both SNR and RT and the interaction between them.

When asked about subjective impressions of each block, the participants demonstrated having perceived the proposed signal degradation in both SNR and RT. That is in line with the studies from Zekveld et al. [319], Holube et al. [121], Neuman et al. [203], Kwak et al. [149], Nicola & Chiara [204], and Picou & Ricketts [229]. Furthermore, studies that cross objective measurements of physiological parameters associated in the literature with changes in effort can have divergent outcomes, as discussed in Chapter 2. From that discussion, it is speculated that these different methods, proposed to achieve a proxy for listening effort, are sensitive to separate aspects of a complex global process [12, 224]. Another explanation would be the minimization of effort by the participant through heuristic strategies in the subjective method [192], and lastly, the effect of working memory being related differently to different methods [53, 186]. A separate study by Tirdad Seifi-Ala from this combined experiment examined the correlation between objective (physiological responses of the brain) and subjective paradigms.

4.5 Concluding Remarks

In
this study, nine levels of degradation were imposed on speech signals over speech maskers separated in space and virtualized. Three different SNRs (-8, -3, +2 dB) and three different simulated rooms (with RTs of 0, 0.5, 1.1 s) were used to manipulate task demand. Speech intelligibility was assessed through a word-scored speech-in-noise test performed in a 24-loudspeaker setup utilizing first-order Ambisonics. The results showed a high correlation between participants' performance and their responses to questions about subjective intelligibility, effort, and disengagement. The main effects and interaction of SNR and RT were demonstrated on all questions. Furthermore, it was observed that the reverberation time inside a room impacts both speech intelligibility and listening effort. This study demonstrated the possibility of virtualizing a combination of sound sources in low-order Ambisonics and extracting quality behavioral data.

Chapter 5
Iceberg: A Hybrid Auralization Method Focused on Compact Setups

5.1 Introduction

People usually wear their hearing
devices in spaces very different from the laboratories' soundproof booths in everyday life. Additionally, everyday sounds are more complex and different from the pure tones, words, and phrases without context utilized in many hearing tests. Therefore, hearing research has increasingly aimed to include acoustic verisimilitude in auditory tests to make them more realistic and/or ecologically valid [61, 79, 101, 177, 212, 217]. Thus, researchers can evaluate new features and algorithms implemented on hearing devices and experiment with different fittings and treatments while maintaining repeatability and control.

One can utilize a particular auralization technique to create reproducible sound files in a listening area. These sounds attempt to mimic the acoustical characteristics of environments (from actual recordings or acoustic simulations). They can then be played through a set of loudspeakers or a pair of headphones, creating both the subjective impression and the objective representation of listening to the intended sound environment [293].

Through an auralization method, it is possible to create a sound file containing spatial information about the scene and a series of details about the configuration of the reproduction system [293]. The reproduction system includes, for example, the number of loudspeakers and their physical positions, the number of audio channels available, and the distance from the loudspeakers to the listening position. The size of the effective listening reproduction area, where the auditory spatial cues of the scene are most accurate, is usually called the "sweet spot" [253]. Spatialization accuracy is affected differently by different systems as well as by auralization methods [65, 97, 166, 275, 276]. The auralization method can be decisive in the choice of reproduction system; for example, certain methods require certain numbers of loudspeakers [62, 217]. Consequently, the auralization method can be a limiting factor depending on the tests or experiments. A dedicated setup capable of handling different auralization methods with a large listening area [188] may require an excessive amount of funding and physical space. These requirements can be a limiting factor for conducting research and developing innovative treatments.

This chapter proposes a compact setup with a hybrid auralization method. It is characterized under some conditions (RTs, presence of a second listener, and listener position) by considering the intended use in auditory evaluations, as in the previous chapter. The setup aims to reproduce sound scenes maintaining spatial localization and creating an immersive sound environment from either a scenario in an actual room or virtual rooms created in acoustic software.

5.2 Iceberg: A Hybrid Auralization Method

The Iceberg auralization method combines two well-known methods: VBAP and Ambisonics.
In Chapter 3, VBAP and Ambisonics binaural cues were objectively evaluated. The VBAP method was found to render accurate cues in the center position, even with a second listener inside the array. That corroborates the use of VBAP to increase ecological validity in auditory tests [134]. On the other hand, Ambisonics delivered less precise localization cues, imposing more restrictions on the listener's position. The results are in line with literature presenting poor localization but high immersiveness for low-order Ambisonics [104, 105] and, conversely, lesser immersiveness and greater localization accuracy for VBAP [89, 104]. Therefore, the idea here is to provide an auralization in which the temporal and spectral features of the sounds are encoded through VBAP, while the spaciousness provided through the reverberation envelope is encoded through Ambisonics. This specific combination of auralization methods has also been considered to decrease the number of loudspeakers necessary for a setup that requires regular hearing devices. At the same time, the setup may allow some degree of head movement without the need for tracking equipment. That is a countermeasure to overcome common limitations in ordinary auditory test spaces [316].

5.2.1 Motivation

The primary motivation for creating this auralization method was to test hearing aid users in typical situations while wearing hearing devices in a small setup. Therefore, the
method is loudspeaker-based, but at the same time, the number of loudspeakers and the system complexity were also constraints. The theoretical support for combining these auralization methods and proposing the smaller virtualization setup is gathered from room acoustic parameters and psychoacoustic principles presented in the review and during this chapter. These parameters and principles led to a system able to use RIRs from simulated environments (spaces that may only exist in a computer) and RIRs recorded in real ones. The initial Iceberg focus is on tests that manipulate sound scenarios to evaluate speech intelligibility masked by noise from static positions, as tested with low-order Ambisonics in Chapter 4.

5.2.2 Method

The Iceberg method is a relatively easy-to-use algorithm that can be introduced to test environments with a simple calibration process. The virtualization system presented auralized files in a quadraphonic array with loudspeakers positioned at 0, 90, 180, and 270 degrees (see Figure 5.1). Other horizontal setup arrangements can be implemented depending on the need, considering the system's angle rotation, frequency response, and the potential variation in localization accuracy. Although there is a minimum number of necessary loudspeakers (four), the method can be used to auralize files for setups with a larger number of loudspeakers. The presented algorithm was implemented in MATLAB (MathWorks).

Figure 5.1: Top view. Loudspeaker positions on the horizontal plane for virtualization with the proposed Iceberg method.

The proposed loudspeaker setup had a radius of 1.35 m. Other distances need to be evaluated with regard to the system frequency response. The proposed Iceberg implementation derives an appropriate multi-channel audio signal with specific information from a sound and its reflections (incidence angle, sound energy, spatial and temporal distribution). These parameters can be encoded into a sound file with the reproduction setup's specific calibration values and positioning orientation.
Finally, the auralized file can be reproduced (virtualized) as spatial sound.

5.2.2.1 Components

The proposed Iceberg method is a hybrid auralization method, a combination of VBAP and first-order Ambisonics; Section 2.3.2.2 reviews the derivation of both methods. Both techniques are based on amplitude panning. The main difference is in the mathematical formulation of the gains applied to the amplitude of each sound source. VBAP treats the reproduced sound as a unitary vector in a two- or three-dimensional plane (Equations 2.4 and 2.7, respectively). The weights applied to the amplitude of the signal at each loudspeaker are derived from the tangent law. A vector is traced from the nearest available sources between the listening position and the desired source position (Equation 2.3). On the other hand, Ambisonics utilizes all available loudspeakers to compose the sound field. The method combines the amplitudes of the sources, calculating their weights according to the sum of spherical harmonics (Equation 2.9) that represents the pressure field formed by the sound wave (Equation 2.8). While VBAP concentrates the energy between two loudspeakers in its 2D implementation, Ambisonics spreads it across all available loudspeakers. That leads to a more immersive experience with Ambisonics, while VBAP can better represent the sound source direction.

5.2.2.2 Energy Balance

The energy balance between the methods is calculated based on the Ambisonics first-order impulse response (see the example in Figure 5.2): on the left is the impulse response (or decay curve), which is not the decay of the squared value of the sound pressure signal. On the right are the 10 log10(h^2(t)) curves for the different channels. Note that in these curves the maximum level is 0 dB, as the interest is in the time it takes for the power to drop by 60 dB.
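This reading of the normalized decay curves can be sketched in a few lines. The following is a Python illustration (not the MATLAB code used in this work); a synthetic exponential decay stands in for a measured RIR, and the function names are hypothetical.

```python
import math

def decay_curve_db(h):
    """Normalized 10*log10(h^2(t)) curve of an impulse response, peak at 0 dB."""
    energy = [x * x for x in h]
    peak = max(energy)
    return [10.0 * math.log10(e / peak) if e > 0 else float("-inf")
            for e in energy]

def time_to_drop(curve_db, fs, drop_db=60.0):
    """First time (s) after the peak at which the curve has fallen by drop_db."""
    start = curve_db.index(0.0)  # sample where the power peaks
    for n in range(start, len(curve_db)):
        if curve_db[n] <= -drop_db:
            return (n - start) / fs
    return None  # decay range insufficient to observe the full drop

# synthetic RIR: pure exponential decay whose power falls 60 dB in 0.5 s
fs, rt = 1000, 0.5
h = [math.exp(-3.0 * math.log(10) * n / (fs * rt)) for n in range(fs)]
curve = decay_curve_db(h)
t60 = time_to_drop(curve, fs)  # close to 0.5 s for this synthetic decay
```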
Also, note that there is a small time gap between time 0 s and the time at which the energy value of h(t) is maximum. This interval corresponds to the time it takes the sound wave to travel between the source and the receiver and allows an estimate of the distance between them. For recorded IRs, the gap also includes the system delay, which should be compensated. Basing the balance on the first-order Ambisonics response was the choice made here, since the impulse response of an environment can easily be acquired utilizing a first-order Ambisonics microphone array. Furthermore, it is possible to find commercially available acoustic software tools to simulate sound environments capable of exporting impulse responses in Ambisonics format.

Figure 5.2: Normalized Ambisonics first-order RIR generated via ODEON software. Left panel depicts the waveform; right panel depicts the waveform in dB.

The system's design requires an RIR to be split into two parts. The first part contains the amount of energy to be delivered through VBAP. The second part will be computed through Ambisonics. From the reflectogram, the time representation of the latency and attenuation of the direct sound (DS), early reflections (ER), and late reflections (LR; see Figure 5.3), it is possible to find the point in time representing the direct sound (the first peak) and then separate it correctly from the rest of the RIR. Although splitting the RIR into DS and remainder may be the most straightforward method, the achieved results were initially perceived in personal experience as unnatural: a highlighted "dry" (not reverberant) sound from a defined position followed by a very distant, disconnected reverberation, counter to the aims of a more ecologically valid sound reproduction. Thus, in the proposed method, the ER part was included with the DS part.

Figure 5.3: Reflectogram split into Direct Sound, Early, and Late Reflections.

The late reflections of an RIR refer to the signal wavefronts reflected and scattered several times across the different possible paths. These reflections overlap each other, and as time progresses, successive wavefronts interact with any surface, increasing reflection order, changing direction, and decreasing the remaining sound energy. The literature indicates a psychoacoustical approximation of the time point in a specific RIR when the human auditory system can no longer distinguish single reflections due to reflection density [38]. Lindau [156] proposed a transition point in time (transition time, t_m) based on the mean free path length of the wavefront (Equation 5.1):

t_m = 20 (V/S) + 12 [ms],   (5.1)
where V is the volume of the room in m^3 and S is the surface area inside the room in m^2.

The minimum necessary order of reflections to represent a uniform and isotropic sound field that leads to diffuse reverberation from an Image Source (IS) model is 3. That agrees with observations from Kuttruff [148] on the specular reflections' contribution to diffuse energy in an RIR. This approach was implemented in a similar hybrid method by Pelzer et al. [221]. Another method, developed by Favrot [79], also uses the IS order information from simulated RIRs computed with ODEON software. Its IS reflection order information provides a point for obtaining a segment of the file with the late reflections envelope, used by the system to deliver a hybrid multi-channel RIR. These methods consider the RIR and mix specific stimuli into the output, as does the proposed method. Other hybrid auralization methods such as DirAC [243] consider the recording of a sound event (in Ambisonics) and drive the reproduction based on an energy analysis spanning all sound source directions. Thus, DirAC is intended to work primarily with recorded scenes instead of convolutions with RIRs.

5.2.2.3 Iceberg proposition

The proposed Iceberg method, however, uses neither the t_m method, which is dependent on the volume of the room and the IS simulated reflection order, nor the LR envelope time, derived from an IS simulation. Instead, a different parameter is proposed that allows generalizing to both recorded and simulated Ambisonics RIRs. The parameters of clarity and definition are metrics to determine the early/late energy balance [43]. However, a fixed time of 50 or 80 milliseconds is not appropriate to represent the transition point (from early to late reflections) in every RIR, as the slope will differ and depend on many factors [45]. The transition point changes as the amount of energy and the decay distribution change from RIR to RIR. A similar parameter that is not time-fixed is the center time (Ts), given by Equation 2.15 (see Section 2.3.3). This parameter is also derived from the squared RIR, calculating the transition point from early to late reflections represented as the RIR's center of gravity. The method's name comes from this singularity of RIRs: they present a center of gravity in their power decay representation, similar to the physical blocks of frozen water called icebergs. For icebergs, the center of gravity is the equilibrium point between the force of gravity and the water buoyancy [34]. This representation is translated to the Iceberg method as the transition point between early and late reflections of an RIR.

The process entails an RIR applied through multiplication in the frequency domain, equivalent to a convolution in the time domain, to a sound that can be virtualized through the system. The first action of the method's algorithm is the identification of the center time Ts in the omnidirectional channel of the Ambisonics RIR. A schematic overview of the method is presented in Figure 5.4.

Figure 5.4: Iceberg's processing block diagram. The Ambisonics RIR is treated, split, and convolved with an input signal. A virtual auditory scene can be created by playing the multi-channel output signal with the appropriate setup.

Figure 5.5 shows an example of the RIR relative to the omnidirectional input channel simulated through ODEON V.12 [59] for the simulated restaurant dining room used in Chapter 4, with 1.1 seconds of reverberation time.

Figure 5.5: Omnidirectional channel of an Ambisonics RIR for a simulated room. The blue line indicates the part before the calculated Center Time, hence indicated as the direct sound plus the early reflections. The orange line indicates the late reverberation part of the RIR.
Figure 5.6 presents an example of the Ambisonics RIR in the left column and the omnidirectional channel relative to the DS+ER part in the middle column. The right-column graphs represent the four-channel late reflections part of the Ambisonics RIR.

Figure 5.6: First column: four-channel Ambisonics RIR. Middle column: omnidirectional channel (DS+ER part). Right column: four-channel Ambisonics RIR (LR part).

In sequence, the method first splits the RIR based on the Ts. Then the direct sound and the early reflections are convolved with the signal to be reproduced. In this step, only the omnidirectional channel is used. Finally, the signal is processed using VBAP to provide its directional properties. The VBAP method utilized was implemented in [237]. The VBAP output is two-channel panned audio that is sent to the channels of the corresponding loudspeakers. The output signal corresponds to the relative full scale of the panned signal if the provided Ambisonics RIR is normalized, or to the absolute value in the case of an un-normalized RIR. With normalized RIRs, calibration of a sound pressure level is required, and the reproduction level can be set according to the application needs. Assuming a coherent sum between two loudspeakers that are set to reproduce the scaled signal at a predefined level, a proportion is computed as follows:

LS1 = 20 log10(10^(level/20) * sin^2(theta)),   (5.2a)
LS2 = 20 log10(10^(level/20) * cos^2(theta)),   (5.2b)

where the user sets the level in dB SPL and theta is the incidence angle. A similar level calibration, recording a pure tone from a calibrator with a microphone to find the system's alpha coefficient (as explained in 3.2.3), will allow playing the signal over each loudspeaker at the intended level. A frequency filter for each loudspeaker is also possible if the loudspeakers' FRF needs to be individually adjusted to achieve a flat(ter) response.

The second part of the impulse response is then convolved with the signal, all four channels of the proposed quadraphonic system being used. First, an Ambisonics decoder matrix observing the loudspeakers' positions is created. Thus, the convolved signal is decoded from its B-format to A-format. The implementation utilized in the algorithm to create the decoder matrix and to decode the signal uses functions from the work of Politis [237]. The separated signals are then merged, being ready to be reproduced.

Figure 5.7 shows an example of an auralization of five seconds of the International Speech Test Signal (ISTS) [120]. The top graph is the original signal, and the mid-top graph is the signal convolved with the DS and ER of the omnidirectional channel of the Ambisonics RIR. The envelope is minimally affected by the ER. The mid-bottom graph shows the signal convolved with the LR part of the four channels and decoded from Ambisonics B-format. The diffuse nature of the Ambisonics-generated LR is evident in the smoother overall envelope. The bottom graph shows the result of the Iceberg method, the merged signal.

Figure 5.7: Iceberg method example. Top graph: original signal. Mid-top graph: DS+ER part (VBAP). Mid-bottom graph: LR part (Ambisonics). Bottom graph: merged signal (Iceberg).

This process provides an auralized file that should be reproduced through an equalized and calibrated setup. An equalization and calibration proposal is described in Section 5.2.3 and can be applied to similar setups with equivalent hardware. However, the results may vary depending on hardware quality, loudspeaker amplification, and frequency response. In this work, the electroacoustical requirements (7.2.2) and reference listening room (8.2) from Recommendation ITU-R 1116-3 [126], Methods for the subjective assessment of small impairments in audio systems, were observed. The frequency-specific reverberation times were lower than the Recommendation: 0.04 s from 0.2-4 kHz (0.08 s at 0.125 kHz), versus 0.18 s in the Recommendation. The anechoic characteristic of the room was intentionally chosen in this case to evaluate reverberation in the virtualization setup. A setup within a different space will have different room acoustic characteristics. The experimenter can compensate for the need for greater reverberation by controlling the input RIRs. The electroacoustical requirements for the loudspeakers are also relevant, as they aim to guarantee correct frequency reproduction or the possibility of compensating the frequency response with appropriate hardware. The room proportions are also essential when setting up a test environment, especially if the reproduction will include low frequencies affected by the room's
Eigentones (standing waves). The address https://github.com/aguirreSL/HybridAuralization contains an example and the necessary resources to auralize files according to this Iceberg method.

This study utilized Ambisonics first-order impulse responses generated with the ODEON V.12 software. The choice was made for convenience, and it can be extended to any equivalent Ambisonics RIR, simulated or recorded. The resulting RIR from ODEON is normalized. With that, the user can play a sound at a different level (from the simulated one) without rerunning the simulation, using the normalized version. As an option, the method can denormalize it (dividing the RIR by its corresponding factor provided in the ODEON grid [159]). The denormalized result will be auralized at the level simulated in ODEON (or equivalent software).

5.2.3 Setup Equalization & Calibration

The setup can include a calibration and equalization procedure, included in the MATLAB scripts, to ensure correct sound level reproduction and also a flatter frequency response from the system's loudspeakers, avoiding additional undesired coloration artifacts. First, a factor was calculated to transform the acquired signals from full scale to dB SPL. This step consists of recording a pure tone at a specific frequency (1 kHz) with a known input level of 1 Pa and calculating a factor to convert the input from full scale (FS) to Pa. The term indirect refers to the fact that this calculated factor is applied to all frequencies, under the assumption that the setup (microphone, pre-amplifier, power supply, and AD/DA converter) has a flat frequency response in the audible frequency range. To calculate the conversion factor, a sound pressure calibrator (in this case a B&K 4231) was connected to the microphone (a 1/2" B&K 4192 pressure-field microphone and a type 2669 pre-amplifier, supplied by a B&K 5935 power module). That provided a 93.98 dB SPL signal, which corresponds to 1 Pa.
The calibration factor (alpha_rms) was calculated as in Equation 5.3. Although this step was not needed for the frequency equalization, it was convenient because, once measured, all the following measurements could be performed without the need to enter the room.

alpha_rms = 1 / RMS(v(t)_1kHz)   [Pa/FS],   (5.3)

The next step consists of equalizing the frequency response of each loudspeaker. An RIR from each loudspeaker was measured and, based on that, an inverted FIR filter was individually created to be applied to the signals to be reproduced. The frequency response was converted to its third-octave version, normalized, and inverted to create a vector with 27 values from 50 Hz to 20 kHz. These vectors contained the correction values in the frequency domain and can be applied to any input signal. To apply these corrections, a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) was used in MATLAB to fit the values to the given input. Figure 5.8 presents an example of the normalized third-octave moving-average RIR acquired with a loudspeaker (blue line), the same RIR acquired with a signal that was filtered (red line), and the filter frequency values obtained with the inversion of the original RIR (black line).

Figure 5.8: Loudspeakers' normalized frequency response and inverted filter. Dotted lines represent ITU-R 1116-3 limits.

Figures 5.9 and 5.10 show the moving average of each loudspeaker's normalized frequency response without and with the filter, respectively.

Figure 5.9: Loudspeakers' normalized frequency response (colored solid lines); dotted lines represent ITU-R 1116-3 limits.

Figure 5.10: Loudspeakers' normalized frequency response with frequency filter correction (colored solid lines); dotted lines represent ITU-R 1116-3 limits.

As the amplification of each active loudspeaker is individually controlled, it is possible that the same file could be reproduced at a different sound pressure level (if someone inadvertently or accidentally changes the volume control directly on the loudspeaker, for example). Since alpha_rms was already calculated and it was possible to convert a signal from FS to Pa, and consequently to dB SPL, and vice versa, the individual loudspeakers' SPLs were measured with a signal defined to be played at 70 dB SPL (Equation 5.4):

signal(t) = signal(t) / RMS(signal(t)) * 10^((70 - dBperV)/20) * Gamma_l,   (5.4)

where Gamma_l is the level factor for loudspeaker l, with initial value 1, and dBperV = 20 log10(alpha_rms / 20u). The signal(t) was played through a loudspeaker l and simultaneously recorded with the microphone, S_l(t); the SPL of the recorded signal was calculated as follows:

SPL_l = 20 log10( S_l(t)[FS] * alpha_rms [Pa/FS] / 20 [uPa] )   [dB],   (5.5)

Ten measurements were performed sequentially with each loudspeaker at intervals of 1 s; another iteration of measurements was performed if the measured SPL exceeded the tolerance of 0.5 dB on any of the measurements. A step of +-0.1 [FS] is set to update Gamma_l in its next iteration according to the SPL obtained.

5.3 System Characterization

The Iceberg auralization method in a four-loudspeaker system (the minimum required) was evaluated for its capability to reproduce the intended reverberation time and the appropriate binaural cues. This section describes the system setup and the conditions experimented with utilizing
the Iceberg metho d. The metho d’s accuracy at the optimal and sub-optimal p ositions was
considered in this c haracterization as w ell the impact of the R T. F urthermore, placing a
second listener inside the ring was in vestigated to supp ort a more ecolog- ical situation. By
the end, a complementary study for those conditions was conducted with an aided mannequin to
supplement the ob jective data as the pandemic preven ted sub jective data collection. The
presen t study used the IT A-T o olb ox [ 29 ] for signal acquisition and pro cess- ing. T o
further enhanc e the accuracy of the lo calization estimates, a MA TLAB implemen tation of the
May and Kohlrausch [ 182 ] lo calization mo del from the Auditory Mo deling T o olb o x (AMT,
https://www.am to olb o x.org ) [ 287 ]) was also emplo y ed. The May mo del is sp ecifically
designed to b e robust against the detrimental effects of reverberation on lo calization p
erformance, making it an ideal choice for supplementing the ob jective data gathered in the
present study . The reverberation, or the p ersistence of sound after its initial source has
ceased, w as a parameter in this test that could significantly distort the estimated lo cation of
a sound source. The Ma y mo del accounts for reverbera- tion’s influence through frequency-dep
endent time dela y parameters, enabling more accurate lo calization estimates in reverberant en
vironments. By incorpo- rating the mo del in our analysis, we supplemen ted the ob jective data
gathered through signal pro cessing with an additional lay er of mo deling that allow ed a
relativ e comparison with previous studies. The main ob jective of an auralization metho d and
its virtualization setup is Chapter 5. System Characterization 139 to deliver appropriate
spatial a w areness to human listeners. The natural step for this would b e to verify and v
alidate the metho d. Unfortunately , special conditions were in place during the course of this
study; due to CO VID-19 re- strictions, v alidation tests with participants w ere not feasible.
Section 5.5 extends the system verification and analysis to a targeted application in hearing aid research. Although it does not replace a subjective impression validation and analysis, it can help understand and predict the system's behavior in a typical use case for hearing research, which is the user with hearing aids.

5.3.1 Experimental Setup

The proposed method was implemented, and the tests were conducted at Eriksholm Research Centre in Denmark. The test environment was an anechoic room (IAC Acoustics) with inner dimensions of 4.3 m × 3.4 m × 2.7 m. Signals were routed through a sound card (MOTU PCIe-424) with a FireWire 440 connection to the MOTU Audio 24 I/O interface and played via Genelec 8030C loudspeakers (Genelec Oy, Iisalmi, Finland). The well-controlled sound environment was appropriate for the assessment of small impairments in audio systems, although the acoustic properties of the room exceed those of the sound booths and rooms commonly encountered in audiology clinics [316].

5.3.2 Virtualized RIRs & BRIRs

A set of 72 room impulse responses (RIRs) and 72 binaural room impulse responses (BRIRs) was acquired through the system, spaced at 5-degree angles around the center position, assuming x as the lateral axis and y as the front-back (mid-sagittal) axis of a person inside the ring. Moreover, the same number of RIRs and BRIRs was measured at off-center positions.

The virtualized RIRs and BRIRs were acquired using a logarithmic sweep signal (50-20000 Hz, 2.73 s, FFT degree 18, sampling frequency 96 kHz) [194] as input. The signal was auralized to each angle with the Iceberg method for the same three spaces as in Chapter 4: a classroom (9.46 m × 6.69 m × 3.00 m) with an overall Reverberation Time (RT) of 0.5 s, a restaurant dining area (12.19 m × 7.71 m × 2.80 m) with an overall RT of 1.1 s, and an anechoic room (4.3 m × 3.4 m × 2.7 m) with an ideal overall RT of 0.0 s. All rooms were acoustically simulated in ODEON software V.12, which generated the Ambisonics RIRs representing each mentioned source-receiver configuration.
The absorption coefficients of the room surfaces are listed in Appendix E.

The initial step to acquire the RIRs and BRIRs was to auralize the sweep file with the Iceberg method to the desired positions (72 angles around the center) in the three room conditions, and then play it through the four loudspeakers positioned at the front (0°), left (90°), back (180°), and right (270°) counter-clockwise angles. The auralized version of the sweep should correspond to the signal played in the virtual environment, as the reverberation added by the anechoic room is negligible. After that, the recorded file was deconvolved with a zero-padded version of the raw sweep (see Figure 5.11). The playback and recording used the maximum sampling rate supported by the AD/DA system (96,000 Hz), as the differences in time are on the µs scale. The step size in time, given in microseconds, is therefore step size = (1/96,000) × 1,000,000 = 10.42 µs. The created sweep duration was 2.731 s (FFT degree = 18).

Figure 5.11: BRIR/RIR acquisition flowchart: Iceberg auralization method.

A manikin with artificial pinnae (HATS model 4128-C; Brüel & Kjær) was used to record the binaural files. Also, a second listener was simulated during the tests with a different manikin (KEMAR; GRAS) (see Figure 5.12). The HATS recordings were calibrated as described in Section 3.2.3, following Equations 3.1a and 3.1b.

Figure 5.12: BRIR measurement setup: B&K HATS and KEMAR positioned inside the anechoic room.
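The acquisition chain above (play the auralized sweep, record at the manikin, deconvolve with a zero-padded raw sweep) can be sketched as below. The exponential-sweep generator and the regularized spectral division are illustrative simplifications, not the ITA-Toolbox routines:

```python
import numpy as np

FS = 96_000            # AD/DA sampling rate [Hz]
N = 2 ** 18            # FFT degree 18 -> 262,144 samples (about 2.731 s)

def log_sweep(f1=50.0, f2=20_000.0, n=N, fs=FS):
    """Exponential (logarithmic) sine sweep from f1 to f2."""
    t = np.arange(n) / fs
    T = n / fs
    k = np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * T / k * (np.exp(t / T * k) - 1.0))

def deconvolve(recording, sweep):
    """Impulse response by regularized spectral division of the recording
    by a zero-padded version of the raw sweep."""
    n = len(recording) + len(sweep)      # zero-padding avoids circular wrap-around
    R = np.fft.rfft(recording, n)
    S = np.fft.rfft(sweep, n)
    eps = 1e-12                          # regularization outside the sweep band
    return np.fft.irfft(R * np.conj(S) / (np.abs(S) ** 2 + eps), n)

# Sanity check: a recording that is just the sweep delayed by 480 samples
# (5 ms) must deconvolve to an impulse peaking at sample 480. One sample
# corresponds to the step size 1/96,000 s = 10.42 us quoted above.
sweep = log_sweep()
delay = 480
recording = np.concatenate([np.zeros(delay), sweep])
ir = deconvolve(recording, sweep)
print(np.argmax(np.abs(ir)))
```

The zero-padding of both signals before the FFT is what prevents the circular convolution artifacts the text's zero-padded deconvolution step is meant to avoid.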
5.3.3 Conditions

The auralized files were then recorded under the following conditions:

• Optimal position (alone and centered)
• Optimal position (centered) accompanied by a second listener
• Off-center positions, alone

The position grid can be visualized in Figure 5.13.

Figure 5.13: Measurement positions: obtained through virtualized sound sources with the Iceberg method (VBAP and Ambisonics) in a four-loudspeaker setup.

The most accurate performance is theoretically expected at the optimal position.
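As Figure 5.13 notes, the Iceberg method combines VBAP with Ambisonics. Purely as an illustration of the VBAP part, a minimal two-dimensional pairwise gain computation for a four-loudspeaker ring can be sketched as follows (function names and structure are illustrative, not the thesis implementation):

```python
import numpy as np

SPEAKERS_DEG = [0, 90, 180, 270]   # front, left, back, right (counter-clockwise)

def vbap_gains(azimuth_deg):
    """2D pairwise amplitude panning: distribute a virtual source between
    the two loudspeakers whose arc contains the target azimuth."""
    az = azimuth_deg % 360
    for i in range(4):
        a, b = SPEAKERS_DEG[i], SPEAKERS_DEG[(i + 1) % 4]
        span = (b - a) % 360                       # 90 degrees for this ring
        if (az - a) % 360 <= span:
            # unit vectors of the active pair and of the target direction
            L = np.array([[np.cos(np.radians(a)), np.cos(np.radians(b))],
                          [np.sin(np.radians(a)), np.sin(np.radians(b))]])
            p = np.array([np.cos(np.radians(az)), np.sin(np.radians(az))])
            g = np.linalg.solve(L, p)
            g /= np.linalg.norm(g)                 # constant-power normalization
            gains = np.zeros(4)
            gains[i], gains[(i + 1) % 4] = g
            return gains
    raise ValueError("unreachable")

print(np.round(vbap_gains(0), 3))    # source at a real loudspeaker: all gain on it
print(np.round(vbap_gains(45), 3))   # midway: equal gains on front and left
```

Only the two loudspeakers of the active quadrant receive signal, which is the property discussed later in Section 5.4.3.2 when a second listener shadows part of the ring.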
These techniques provide virtualization assuming the receiver (listener) is in the center of the loudspeaker ring [65, 241]. Adding a second listener to the reproduction area and/or moving the primary listener away from the center can challenge the system's ability to render the scene as intended. The following sections present and discuss the system's capability to reproduce Iceberg-auralized files by measuring the binaural cues and the RT in different conditions.

5.3.4 Reverberation Time

A room's characteristic wave-field pattern can affect the human perception of a reproduced sound. Room acoustics can alter attributes related to spatial perception. For example, a recorded sound has almost no chance of being correctly reproduced if the reproduction room has stronger reverberation than the recording room. Also, reverberation overshoot can smear the perceived direction of a sound source, as early reflections would be heightened in this case [242].

The RT was calculated from impulse responses measured within the three virtualized environments (note that the simulated environments were aimed at RTs of 0, 0.5, and 1.1 seconds). Reverberation time was calculated using the ITA-Toolbox. The parameters were set as follows: frequency range from 125 Hz to 16 kHz, one band per octave, and a threshold of 20 dB below the maximum. The reverberation time was shown to be stable in this virtualization setup.
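The ITA-Toolbox performs this evaluation internally; the idea of fitting a 20 dB decay range can be illustrated with Schroeder backward integration (a T20-style sketch under assumed parameters, not the toolbox's actual routine):

```python
import numpy as np

FS = 96_000

def t20(ir, fs=FS):
    """RT via Schroeder backward integration: fit the energy decay curve
    between -5 dB and -25 dB and extrapolate to a 60 dB decay (T20)."""
    edc = np.cumsum((ir.astype(float) ** 2)[::-1])[::-1]   # energy decay curve
    edc_db = 10 * np.log10(edc / edc[0])
    t = np.arange(len(ir)) / fs
    mask = (edc_db <= -5) & (edc_db >= -25)                # 20 dB evaluation range
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)        # decay rate in dB/s
    return -60.0 / slope

# Synthetic check: exponentially decaying noise with a known RT of 0.5 s
# (the amplitude factor 6.91 = ln(1000) gives -60 dB at t = RT).
rt_true = 0.5
t = np.arange(int(FS * rt_true * 2)) / FS
ir = np.exp(-6.91 * t / rt_true) * np.random.default_rng(0).standard_normal(len(t))
print(round(t20(ir), 2))
```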
An overall RT of approximately 0.08 s can be observed for the anechoic simulation (0 s RT). That is most likely driven by the presence of hardware inside the anechoic room: the loudspeakers and the wooden base of the chair, although covered with foam. The overall reverberation time was measured without an omnidirectional sound source. To circumvent this limitation, the measurement was repeated using all 24 loudspeakers as sound sources, one at a time. The overall RT in this case was taken as the maximum value across frequencies in octave bands from 125 Hz to 16 kHz. Figure 5.14 presents box plots of the measured values in relation to the position inside the room. Rows represent the target RT (0, 0.5, and 1.1 s).
The top row presents results without lateral displacement, the middle row presents the results for a lateral displacement of 2.5 cm from the center, and the bottom row presents the results for a lateral displacement of 5 cm from the center.

Figure 5.14: Reverberation Time of the environments measured with files produced with the Iceberg method and virtualized over four loudspeakers.

Table 5.1 presents the medians of the overall RTs. It is therefore possible to notice that the virtualized environments' RTs tend to be stable and, for the measured conditions, under the just-noticeable difference (JND) of 5% [264, 265] across positions inside the room.

Table 5.1: Reverberation Time in three virtualized environments at different positions inside the loudspeaker ring. Values are the overall RT in seconds.

Position [cm]  | RT = 0 | RT = 0.5 | RT = 1.1
x=0.0; y=0.0   | 0.085  | 0.519    | 1.114
x=0.0; y=2.5   | 0.085  | 0.519    | 1.111
x=0.0; y=5.0   | 0.085  | 0.526    | 1.113
x=0.0; y=10.0  | 0.084  | 0.526    | 1.147
x=2.5; y=0.0   | 0.085  | 0.531    | 1.120
x=2.5; y=2.5   | 0.086  | 0.529    | 1.114
x=2.5; y=5.0   | 0.084  | 0.559    | 1.148
x=2.5; y=10.0  | 0.083  | 0.546    | 1.157
x=5.0; y=0.0   | 0.085  | 0.537    | 1.139
x=5.0; y=2.5   | 0.085  | 0.538    | 1.138
x=5.0; y=5.0   | 0.085  | 0.548    | 1.138
x=5.0; y=10.0  | 0.084  | 0.552    | 1.147

5.4 Main Results

This section presents the results based on the mannequin positions (center and off-center) and conditions (HATS alone and HATS with KEMAR), with angles referenced clockwise.

5.4.1 Centered Position

5.4.1.1 Interaural Time Difference

The blue line in Figure 5.15 represents the Interaural Time Difference (ITD), low-pass filtered at 1 kHz, virtualized through the proposed system.

Figure 5.15: Interaural Time Difference under 1 kHz as a function of azimuth angle for a HATS Brüel & Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg auralization method on a four-loudspeaker setup. The red line shows the ITD results with real loudspeakers (without virtualization). The blue and red shaded areas are the confidence intervals according to the sample rate. The black line represents the analytical ITD values.

Wang and Brown [297] defined the analytical ITD (black line in the figure; see Equation 5.6), considering a centered, perfect sphere of radius a = 10.5 cm, a sound propagation velocity c = 340 m/s, and θ the angle in radians:

ITD = (2a/c) · sin(θ)     (5.6)

The maximum absolute difference found is 170 µs, representing a mismatch of around 15° at the given angle. The calculated average difference is 67 µs, representing a difference of around 7° in localization.

Figure 5.16: Interaural Time Difference at 1 kHz as a function of azimuth angle for a HATS Brüel & Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg method on a four-loudspeaker setup in three different reverberation time scenarios.

Three different simulated rooms were measured using files generated via the Iceberg method for a four-loudspeaker setup, keeping the listener in the center position.
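Equation 5.6 also lets one translate the reported ITD errors (170 µs maximum, 67 µs average) into equivalent azimuth mismatches. A small sketch, inverting the equation around the front, where its slope is steepest:

```python
import numpy as np

A = 0.105   # radius of the spherical head model [m]
C = 340.0   # speed of sound [m/s]

def itd_us(theta_deg):
    """Analytical ITD of Equation 5.6, in microseconds."""
    return (2 * A / C) * np.sin(np.radians(theta_deg)) * 1e6

def itd_error_to_deg(err_us):
    """Azimuth mismatch equivalent to an ITD error, inverting Eq. 5.6
    around the front (theta = 0 deg), where the ITD slope is steepest."""
    return np.degrees(np.arcsin(err_us * 1e-6 * C / (2 * A)))

print(round(itd_us(90)))              # maximum ITD of the model, about 618 us
print(round(itd_error_to_deg(170)))   # about 16 deg (the text reports ~15 deg)
print(round(itd_error_to_deg(67)))    # about 6 deg (the text reports ~7 deg)
```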
Figure 5.16 presents the ITDs acquired with the Iceberg method for RT = 0 s (blue), RT = 0.5 s (red), and RT = 1.1 s (yellow), together with the ITD for RT = 0 obtained without virtualization (i.e., reproduced through real loudspeakers) in black. There were no substantial differences across the different virtualized reverberation times. This was expected, though, as the direct sound drives the ITD.

5.4.1.2 Interaural Level Difference

Figure 5.17 shows the ILDs (calculated following Equation 3.7) across octave bands for the angles around the center, horizontally spaced 30° for better visualization. The ILDs were more affected than the ITDs, with a substantial reduction in ILD relative to the actual loudspeakers observed in the 2 kHz band. The ILD values have a similar pattern and magnitude for a significant part of the spectrum at most angles.

Figure 5.17: Iceberg Interaural Level Differences as a function of octave-band center frequencies; separate lines for angles of incidence.
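Equation 3.7 is defined earlier in the thesis; for illustration only, a common band-energy formulation of an octave-band ILD from a BRIR is sketched below. This formulation is an assumption standing in for Eq. 3.7, not necessarily its exact definition:

```python
import numpy as np

FS = 96_000

def octave_band_ild(brir_left, brir_right, fs=FS,
                    centers=(125, 250, 500, 1000, 2000, 4000, 8000, 16000)):
    """ILD per octave band, taken as 10*log10 of the left-to-right energy
    ratio within each band (energies summed over rFFT bins)."""
    freqs = np.fft.rfftfreq(len(brir_left), 1 / fs)
    L = np.abs(np.fft.rfft(brir_left)) ** 2
    R = np.abs(np.fft.rfft(brir_right)) ** 2
    ilds = {}
    for fc in centers:
        band = (freqs >= fc / np.sqrt(2)) & (freqs < fc * np.sqrt(2))
        ilds[fc] = 10 * np.log10(L[band].sum() / R[band].sum())
    return ilds

# Toy check: a right-ear signal attenuated by 6 dB broadband should yield
# an ILD of about 6 dB in every octave band.
rng = np.random.default_rng(1)
left = rng.standard_normal(FS // 10)
right = left * 10 ** (-6 / 20)
print({fc: round(v, 1) for fc, v in octave_band_ild(left, right).items()})
```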
Figure 5.18 presents the ILDs for both setups, real loudspeakers and the Iceberg method, in six octave bands as a function of azimuth, in steps of 15°. The top-right corner graph shows the 2 kHz band. It shows that, apart from the positions where there is an actual loudspeaker (i.e., 0°, 90°, 180°, and 270°), the differences are large, greater than 10 dB at some azimuth angles.

Figure 5.18: Iceberg ILD as a function of azimuth angle. Listener alone in the center.

Figure 5.19 shows the absolute difference in ILD between physical loudspeakers and virtual loudspeakers created by the Iceberg method.

Figure 5.19: Iceberg method: absolute ILD differences as a function of azimuth angle.

Tu [291] measured just-noticeable differences (JNDs) in ILDs for normal-hearing participants using pure tones at different presentation levels. These JNDs can be used to estimate the perceptibility of the differences between the ILDs obtained with physical loudspeakers and with the Iceberg auralization setup, i.e., to analyze whether the ILD difference between setups would be perceived in a given frequency band. Figure 5.20 presents the values from Figure 5.19 minus the appropriate pure-tone ILD JND values. Positive values, which exceed the JND, could thus be perceived other than intended; that is, a perceptible ILD deviation can cause spatial distortion [38]. The 2-kHz ILDs show up to 8° of divergence across most angles. ILDs in other frequency bands (1, 2, 8, and 16 kHz) also presented values that could relate to noticeable differences (up to 4 dB), but those are mostly limited to frontal ±30° angles.

The 2 kHz mismatch can be considered a flaw in the reproduction system. Its effect on sound localization and on the subjective impression of complex sounds involving these frequencies needs further investigation regarding the scale of the spatial distortion. As the ITDs and the ILDs at other frequencies were relatively well preserved in the auralized system compared with the real loudspeakers, it is possible that this flaw at 2 kHz has a minimal effect, especially for lower-frequency stimuli. System reliability should be verified first for stimuli with peak energy in the 2 kHz band or for tasks requiring greater localization accuracy (e.g., with sound sources within ±30°).

Figure 5.20: Iceberg: absolute ILD differences over the JND as a function of azimuth angle around the central point.

5.4.1.3 Azimuth Estimation

The frontal azimuth angle was estimated using the binaural model by May and Kohlrausch [182]. Each BRIR was convolved with a pink noise of 2.9 s duration as input to the model. The mean of the azimuth estimates for each file is reported as the azimuth predicted by the model. Figure 5.21 presents the angles estimated with the May and Kohlrausch model for files auralized with the Iceberg method for an anechoic room and virtualized over the four-loudspeaker setup (blue curve), the angles estimated for binaural files acquired without virtualization with real loudspeakers (red curve), and the reference (dotted black).

Figure 5.21: Iceberg method: estimated azimuth angle (model by May and Kohlrausch [182]), HATS centered, and RT = 0 s.

The model's results are in line with the analysis of the binaural cues, supporting the assumption of the worst localization accuracy around ±30° (30° and 330° in Figure 5.21). Also, the virtualized sound tends to have more difficulty separating from the frontal angle (0°).

5.4.2 Off-Center Positions

Moving the primary listener off-center (displaced on both the x and y axes) is proposed to measure the impact of a person's head (and body) not being centered, such as when not fixated, on the system's ability to render the appropriate binaural cues.

5.4.2.1 Interaural Time Difference

Figure 5.22 presents 72 measured ITDs around the listener (5° spacing) in four different placements: at the center and displaced forwards (y-axis) by 2.5, 5, and 10 cm. When displaced from the center position, the Iceberg method can cope with delivering a reasonable interaural time difference for frontal displacements of up to 5 cm, or for a simultaneous lateral and frontal misplacement of up to 2.5 cm. However, compared to the center position, the error increased dramatically with a 10 cm displacement for frontal angles (around ±45°), by up to 400 µs relative to the listener in the center. Lateral displacement positions (2.5 and 5.0 cm) were also investigated. The ITD results for these displacements presented the same trend as seen without lateral displacement. Similar results were found when virtualizing the scenes with reverberation times of 0.5 and 1.1 seconds. All combined results are presented in Figure 5.23 to improve readability.

Figure 5.22: Iceberg ITD as a function of frontal displacement: centered listener in the proposed Iceberg method in a four-loudspeaker setup.

Figure 5.23: ITD in the Iceberg virtualized setup with listener displacement: listener positioned 2.5 cm off-center in the proposed Iceberg method in a four-loudspeaker setup.

ITDs were affected by frontal displacements depending on the amount of reverberation simulated. In the simulated dry condition, the squared behavior is present at 5 cm off-center; with mild reverberation, the effect only appears at a displacement of 10 cm; and the largest reverberation tested showed the problem of virtualizing sources at all off-center positions. The deviation is centered at ±45° in all conditions. Lateral movements were even more affected, as expected, delivering ITDs based on the loudspeaker positions (the squared shape) and not on the circular placement of the virtualized sound sources for displacements beyond 3.5 cm from the center (combining the lateral and frontal movements).

5.4.2.2 Interaural Level Difference

Figure 5.24 presents the difference between the ILDs measured at the center and the ILDs measured at different positions for a dry room simulation (RT = 0 s). The lateral displacement (x-axis) is ordered by rows (top row = center, middle row = 2.5 cm, and bottom row = 5 cm to the right). The four columns correspond to frontal displacements (y-axis) of 0 (center), 2.5, 5, and 10 cm. Note that these ILD errors are additional to the previously discussed errors introduced by the simulation itself (with the listener at the center).

Figure 5.24: Difference in ILD as a function of azimuth angle for a HATS Brüel & Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg auralization method on a four-loudspeaker setup (RT = 0.0 s).

The ILDs are affected by listener displacement mostly in the mid frequencies and only at certain angles. A lateral displacement of 2.5 cm produces larger interference (up to 8 dB) at the left angles 40° and 130°. In contrast, at other angles, the ILD differences are lower than 3 dB. The 5 cm displacement also presents differences of up to 15 dB at these same angles and up to 8 dB contralaterally (220° and 320°). Frontal displacement follows a similar pattern, with more differences at some of the rear angles (130° and 220°). These particular differences indicate a relatively low impact on ILD cues using the Iceberg method in the simulated anechoic room (RT = 0 s). Similar results in terms of affected angles were found when analyzing the ILDs for the same listener positions in simulated rooms with RT = 0.5 s (Figure 5.25) and RT = 1.1 s (Figure 5.26). These conditions are closer to everyday situations. The increased energy of the late reflections results in smaller magnitude differences in ILD, indicating slightly better performance for the more realistic simulations.

Figure 5.25: Difference in ILD as a function of azimuth angle for a HATS Brüel & Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg method on a four-loudspeaker setup, RT = 0.5 s.

Figure 5.26: Difference in ILD as a function of azimuth angle for a HATS Brüel & Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg method on a four-loudspeaker setup, RT = 1.1 s.

5.4.2.3 Azimuth Estimation

Using again the May et al. model, in the same setup as in Section 5.4.1.3, to predict the localization of a sound source, Figure 5.27 presents the predicted source locations when moving the listener along the grid positions mentioned (x = 0, 2.5, and 5 cm; y = 0, 2.5, 5, and 10 cm). The different RTs are represented by the line colors in the graphs (blue = 0.0 s, red = 0.5 s, yellow = 1.1 s). The results indicate that the system's spatial sound accuracy is dependent on the listener position. On the other hand, the error is not dependent on the reverberation time. Lateral movements increase the error on the side that is getting closer to the ear, while it is lessened on the contralateral side. Frontal movements increase the number of angles that do not deliver the correct source angle (the longer straight horizontal line around zero). The model estimates a maximum error of up to ≈ 30° for a listener within 3.5 cm of the center (combining lateral and frontal displacement).

Figure 5.27: Estimated (model by May and Kohlrausch [182]) frontal azimuth angle at different positions inside the loudspeaker ring as a function of the target angle.

The errors, comparing the angles estimated at displaced positions with those estimated at the center position, are lessened with increasing reverberation for the majority of the angles.

5.4.3 Centered, Accompanied by a Second Listener

The binaural cues were investigated while adding a second listener to the scene and maintaining the first in the center (sweet spot). The second listener was positioned at three different lateral (x-axis) distances to the left of the center:

• 50 cm (simulating shoulder to shoulder).
• 75 cm.
• 100 cm.

5.4.3.1 Interaural Time Difference

The upper row of Figure 5.28 shows the ITDs for the setup with the HATS alone at the center (blue line) and with a second listener positioned at the right side at three different distances from the center, together with the reference. The ITDs in black were computed with no virtualization, as a reference.

Figure 5.28: ITDs and absolute ITD differences as a function of angle for multiple configurations with (colored lines) and without a second listener (black line).

There was a small difference (≈ 15 µs) when the second listener was placed at the closest position (50 cm), considering the rear and right angles. The absolute difference has a maximum of 201 µs, equivalent to approximately 15° in the source position (see the bottom row of Figure 5.28).

5.4.3.2 Interaural Level Difference

Figure 5.29 presents the difference (∆ILD) between the ILDs computed from the BRIRs collected with and without a second listener inside the ring. The panel rows, top to bottom, show the ∆ILDs for simulated rooms with RTs of 0, 0.5, and 1.1 s. The columns represent the different distances between the centered and the second listener, from 50 to 100 cm, left to right.
Figure 5.29: Interaural level differences averaged over octave bands as a function of azimuth angle for a HATS Brüel & Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg method on a four-loudspeaker setup.

The results show that adding a second listener impacts the ILD at the angles shadowed by the second listener. The effect is more pronounced in the higher octave bands (8 kHz and 16 kHz), reaching approx. 14 dB, especially at the closest and farthest distances (50 cm and 100 cm). Although there is less impact of having a second listener at the intermediate distance (approx. 9 dB), it still produces noticeable ILD changes in the 4 kHz band. The ∆ILD produced by the presence of a second listener is expected as a result of natural acoustic shadowing. The analysis of the ILD around the listener is also important in the hemifield opposite to the second listener (i.e., 180-360°). Auralization methods that rely on the full set of loudspeakers to form the sound pressure (e.g., Ambisonics) can introduce errors on the side with a free path, as a physical object prevents the sound wave from forming accordingly at the center (sweet spot). Although the Iceberg method is partially composed of first-order Ambisonics, which requires all loudspeakers combined to form the appropriate auralization, the VBAP part presents the sound only through the indicated quadrant, not requiring the other loudspeakers to be active. That extends the system's robustness with a limited number of loudspeakers and a frequency limit (not being dependent on the Ambisonics order).

5.4.3.3 Azimuth Estimation

Figure 5.30 depicts the frontal azimuth angles estimated by the May et al. model for pink-noise inputs of 2.9 seconds. The pink noise was convolved with the recorded BRIRs, which in turn were recorded using files generated by the Iceberg auralization method in a four-loudspeaker setup. The columns in the top row present graphs with the average estimated angle for a centered listener accompanied by a second listener at 50 cm (light blue curve), 75 cm (red curve), and 100 cm (yellow curve), according to the room simulation (denoted by the reverberation time). The shaded area corresponds to the standard deviation. The top-left graph also presents the estimated angles for the real-loudspeaker condition without virtualization (blue dotted line). Finally, the bottom-row graphs show the differences between the estimated azimuth angle and the target angle (the estimation error).

Figure 5.30: Top row = estimated localization error with the presence of a second listener; bottom row = difference to the reference. Columns depict different RTs; line colors, different second-listener positions.

According to the model's results, this difference reveals that the sound created via the Iceberg method and virtualized via a four-loudspeaker setup gives consistent localization cues even with a side listener inside the ring at the described positions. The median error is 9.9°, and the standard deviation is 8.8°. On the other hand, the distribution of these differences made clear that the setup of four loudspeakers has more difficulty accurately presenting the localization cues between the frontal loudspeakers, reaching up to 27° of mismatch at these positions (45° and 315°). This result is in line with simulations from Grimm et al. [97]. The values obtained with the low number of speakers utilized (four) are in line with those expected from the literature [97], with an equivalent pattern [84]. Although this can raise a flag for experiments needing a more precise localization representation, the Iceberg method can improve simple setups' realism. Accordingly, it needs to be investigated thoroughly with subjective listening tests, especially at the lateral angles.

Figure 5.31 presents the absolute differences between the estimated angles of arrival for the centered position alone and for the centered position accompanied by a second listener at 50 cm, 75 cm, and 100 cm in the three simulated rooms tested. These differences reflect the estimated influence of having the second listener inside the ring.

Figure 5.31: Absolute difference to the target in estimated localization considering the presence of a second listener and the reverberation time.

The average error presents a slight increase with the proximity of the second listener. However, the effect is less perceivable at moderate RT. This suggests that an acoustic shadow is present. A one-way ANOVA of the estimated absolute errors across the RT and KEMAR-position groups was proposed. For the distribution with 8 degrees of freedom and a number of observations equal to 30, the tabulated value of Snedecor's F distribution at p = 0.05 is 2.26. Thus, F values smaller than the tabulated value support the null hypothesis that there is no significant difference between the means of the absolute errors of the groups of angles, H0: µi = µj. From the analysis, the F statistic (presented in Table 5.2) indicates that H0 is retained for all groups.

Table 5.2: One-way ANOVA; columns are the absolute differences between estimated and reference angles for the different KEMAR positions and RTs.

Source  | SS      | df  | MS      | F      | Prob > F
Columns | 746.5   | 8   | 93.3142 | 1.3755 | 0.2062
Error   | 21980.9 | 324 | 67.8423 |        |
Total   | 22727.4 | 332 |         |        |
Therefore, there is no statistical difference b et ween the KEMAR p ositions for an y of the ev
aluated R Ts. That suggests the metho d’s stability in this setup, ev en with a second listener
considering the rev erb erations and p ositions tested. The mo del has difficulty estimating the
extreme lateral lo cations (90 º and 270), as ev en the actual loudsp eakers could not reac h
this estimation. A comparison b et w een the estimated angles with a listener alone in the
center p osition ac- quired only with actual loudsp eak ers (without virtualization) and the
listener in the center accompanied by a second listener acquired from virtualized files with the
Iceb erg mo del is presented in Figure 5.32 . Figure 5.32: Estimate d err or to R T=0 c
onsidering the estimation of r e al loud- sp e akers as b asis. The data analyzed in this
section suggests that b y observing the indicated p o- Chapter 5. Supplemen tary T est Results
163 sitions where the estimated difference can b e significant, although comparable to similar
methods listed in T able 2.2 experiments with equiv alent require- men ts (e.g., Chapter 4 ) can
b enefit from applying the Iceb erg metho d. The metho d will fairly repro duce sounds with the
presence of a second listener and increase the sense of immersiveness while repro ducing
spatialized sound with only four loudsp eak ers. Sub jective tests are needed to inv estigate
further the system’s spatial rendering p erformance. 5.5 Supplemen tary T est Results A concern
about virtualization pro cesses is ho w reliable they are when an extra la y er of signal pro
cessing is added to the experimental setup [97, 213]; that is, how the sound
acquisition through a hearing device microphone and its signal processing
would be affected by virtualization relative to a simple loudspeaker
reproduction or the real-life situation [98]. This section describes a
comparison of the binaural cues with and without hearing aids. The RIRs were
collected in the same positions as presented in Section 5.4.1 and Section
5.4.2. Further, inspired by the study of Simon et al. [276], the robustness of
the virtualization setup outside the sweet spot was evaluated. Oticon Opn S1
Mini RITE hearing aids with open domes were coupled to each ear of the HATS
manikin (see Figure 5.33). Modern hearing devices like these present a series
of signal processing features that can affect the analysis depending on the
brand or model. To ensure comparability of the results with other devices,
specific features were not enabled. The devices were programmed in the fitting
software to compensate for the hearing loss of the N3 moderate standard
audiogram [35]; the beamformer sensitivity was set to omnidirectional, and the
noise reduction was set to off. The hearing level of the audiogram is
presented in Table 5.3. The open domes were chosen to set the virtualization
system's most difficult signal mix condition: the signal played through the
system is not attenuated, as the ear is not occluded. The amplified signal
from the hearing device is received at the eardrum (microphone) 8.1 ms after
the original signal.

Figure 5.33: HATS wearing the Oticon Opn S1 Mini RITE.

Table 5.3: Hearing level in dB according to the proposed Standard Audiograms
for the Flat and Moderately sloping group [35].

Nº  Category   Frequency [Hz]:  250  375  500  750  1000  1500  2000  3000  4000  6000
N3  Moderate                     35   35   35   35    40    45    50    55    60    65

5.5.1 Centered Position (Aided)

The system was tested by measuring the BRIR with the listener (manikin) in the
center. The calculated binaural cues are presented at incidence angles
separated by 15° at 1.35 m from the listener.
5.5.1.1 Interaural Time Difference

The ITD results (see Figure 5.34) in the aided condition were very similar to
the unaided condition (Section 5.4.1.1). The maximum absolute difference found
is 170 µs, representing a mismatch of around 15° at the given angle (the same
as the previously measured unaided ITD difference). The calculated average
difference is 67 µs, representing a difference of around 7° in localization.

Figure 5.34: Interaural Time Difference under 1 kHz with the proposed Iceberg
method as a function of azimuth angle for a HATS Brüel and Kjær TYPE 4128-C in
the horizontal plane wearing a pair of hearing aids in omnidirectional mode
(blue line). The red line is the ITD results with real loudspeakers (without
virtualization). The black line represents the analytical ITD values.

Figure 5.34 depicts higher differences concentrated in specific regions:
angles around ±30° to the front and back. The similarity to the unaided
condition is expected, as the devices are not blocking the sound wave or
increasing the path more in one ear than the other (i.e., there is only a
static group delay added to the system). Therefore, the sound reaches the HATS
microphones with hearing aids proportionally as in the previous unaided
condition.
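ITD values of this kind are commonly obtained by cross-correlating the
low-passed ear signals and taking the lag of the correlation peak. The sketch
below is our own minimal Python illustration of that generic procedure, not
the thesis analysis code (which was implemented in MATLAB); all function names
are ours:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, correlate, correlation_lags

def estimate_itd(left, right, fs, fmax=1000.0):
    """Estimate the ITD in seconds from a binaural signal pair.

    Both channels are low-pass filtered below fmax (ITD cues are most
    reliable under ~1 kHz); the lag of the cross-correlation peak is taken
    as the ITD. Positive values mean the left ear leads (right channel lags).
    """
    sos = butter(4, fmax, btype="lowpass", fs=fs, output="sos")
    l = sosfiltfilt(sos, left)
    r = sosfiltfilt(sos, right)
    lags = correlation_lags(len(r), len(l), mode="full")
    return lags[np.argmax(correlate(r, l, mode="full"))] / fs

# Hypothetical check: delay the right channel by 20 samples (~417 µs at 48 kHz)
fs, delay = 48000, 20
sig = np.random.default_rng(0).standard_normal(fs)
left = sig
right = np.concatenate([np.zeros(delay), sig[:-delay]])
itd = estimate_itd(left, right, fs)
print(f"estimated ITD = {itd * 1e6:.0f} µs")
```

At 48 kHz, the 170 µs maximum deviation reported above corresponds to a lag of
roughly eight samples, which is why low-pass filtering and long analysis
windows help stabilize the peak pick.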
5.5.1.2 Interaural Level Difference

Figure 5.35 shows the effect on the ILD at the centered position. Although in
the higher octave bands (8 and 16 kHz) the difference between the ILD of an
aided HATS with real loudspeakers and that of the aided HATS using the Iceberg
method is somewhat larger than in the unaided case (see Figure 5.18), the
effect in the 2 kHz band is considerably smaller. That can be due to the added
delay in the signal, which can diminish the possible comb filtering of the
Iceberg method in this specific frequency region, especially for the angles
between two loudspeakers (where there is a larger distance between the real
loudspeakers and the virtualized sound source).

Figure 5.35: Interaural Level Differences as a function of octave-band center
frequencies. Angles around the central point.
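Octave-band ILDs of this kind can in principle be computed by band-pass
filtering both ear signals into octave bands and taking the per-band RMS level
difference. The following is our own illustrative Python sketch, not the
thesis code; the band edges at fc/√2 and fc·√2 and the function names are
assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def octave_band_ild(left, right, fs, centers=(250, 500, 1000, 2000, 4000, 8000)):
    """Per-octave-band ILD: RMS level difference in dB (left re right)."""
    ilds = {}
    for fc in centers:
        lo, hi = fc / np.sqrt(2), fc * np.sqrt(2)  # assumed octave band edges
        sos = butter(4, [lo, min(hi, 0.45 * fs)], btype="bandpass",
                     fs=fs, output="sos")
        l = sosfiltfilt(sos, left)
        r = sosfiltfilt(sos, right)
        ilds[fc] = 20 * np.log10(np.sqrt(np.mean(l ** 2))
                                 / np.sqrt(np.mean(r ** 2)))
    return ilds

# Hypothetical check: right channel attenuated by 6 dB
fs = 48000
left = np.random.default_rng(1).standard_normal(fs)
right = left * 10 ** (-6 / 20)
ilds = octave_band_ild(left, right, fs)
print({fc: round(v, 2) for fc, v in ilds.items()})  # each band ≈ +6.0 dB
```

Because the filters are linear, a broadband level offset between the ears
shows up identically in every band, as the check illustrates.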
5.5.1.3 Azimuth Estimation

Figure 5.36 presents the angles estimated using May and Kohlrausch's [182]
model for files auralized with the Iceberg method and virtualized over the
four-loudspeaker setup (blue curve), the angles estimated for binaural files
acquired without virtualization with real loudspeakers (red curve), and the
reference (dotted black). The model's result is in line with the analysis of
the binaural cues, supporting the assumption of the worst localization
accuracy around ±30° (30° and 330° in Figure 5.21). Some differences bigger
than the standard deviation are noted between different RTs, especially close
to the lateral angles (90° and 270°). The results suggest that the added
reverberation can negatively impact localization accuracy.

Figure 5.36: Iceberg method: estimated azimuth angle (model by May and
Kohlrausch [182]), HATS centered and aided.

Also, according to the figure, the virtualized sound tends to have more
difficulty separating from the frontal angle (0°), denoted by the flat lines
from 30° up to 340°. Figure 5.37 depicts the boxplot diagram of the absolute
differences grouped by RT. An ANOVA analysis of the variance of the estimated
absolute errors between the RT and position groups was proposed. For the
distribution with 2 degrees of freedom and a number of observations equal to
30, the tabulated value of Snedecor's F distribution at p = 0.05 is equal to
3.32.

Figure 5.37: Absolute difference to target in estimated localization in aided
condition considering different RTs.

Thus, values greater than the tabulated F deny the null hypothesis that there
is no significant difference between the means of the absolute error of the
groups of angles, H0: µi = µj. From the analysis of the F statistic (presented
in Table 5.4), H0 is rejected and the alternative hypothesis H1: µi ≠ µj is
accepted (F = 5.68).

Table 5.4: One-way ANOVA; columns are the absolute differences between
estimated and reference angles for different positions and RTs.

Source    SS        df   MS        F      Prob > F
Columns    520.77     2  260.386   5.68   0.0045
Error     4947.29   108   45.808
Total     5468.06   110

To identify in which sets of means the discrepancy is statistically
significant, Tukey's multiple comparison test was performed, and the result is
shown in Figure 5.38.

Figure 5.38: Tukey test to compare means in aided condition. The group mean at
RT 1.1 s presented a significant difference from the mean of group RT 0.0 s.

This reflects a trend towards an increase in the estimated localization error
when there is signal amplification through the hearing aid, which did not
occur in the similar condition without the aid, seen in Section 5.4.1.3.

5.5.2 Off-center Positions
(Aided)

The listener was moved from the center position to simulate a displaced test
participant wearing hearing aids. The BRIRs were measured in the positions
described in Section 5.3.3, and the results are analyzed in this section.

5.5.2.1 Interaural Time Difference

Figure 5.39 presents the ITD for the different angles around the listener as
the listener is displaced to different positions according to the specified
grid.

Figure 5.39: Interaural Time Differences as a function of octave-band center
frequencies. Angles around the central point.

When the listener moves 5 cm to the front, the correct ITD for the frontal
angles starts to blur, especially around ±45 degrees in the frontal
hemisphere, where the ITD indicates that the sound is coming from 90° or 0°.
Beyond this distance, the rear ±45° are also affected, pointing to the break
of the panning illusion. Compared to the unaided condition (Section 5.4.2.1),
this condition is slightly more sensitive to displacements.

Although the ITD analysis is angle-dependent, the results in Table 5.5
indicate that the displacement limitations can be mapped overall to indicate
the maximum distance. Table 5.5 shows the maximum ITD difference according to
the displacement. The maximum difference can indicate the tendency of the ITD
shape to become squared, representing no virtualization, which may help to
identify displacement limitations. The squared behavior occurs when the sound
of one individual speaker is the main pressure contribution, arriving too
early at one of the HATS ears because of the HATS's position.

Table 5.5: Maximum ∆ITD relative to the center position according to
displacement; rows refer to lateral displacement and columns refer to frontal
displacement (both in cm).

RT = 0.0 s
Lateral \ Frontal   0.0       2.5       5.0       10.0
0.0                 0 µs       88 µs    182 µs    374 µs
2.5                 233 µs    239 µs    364 µs    472 µs
5.0                 317 µs    353 µs    399 µs    566 µs

RT = 0.5 s
Lateral \ Frontal   0.0       2.5       5.0       10.0
0.0                 0 µs       97 µs    229 µs    386 µs
2.5                 213 µs    157 µs    313 µs    472 µs
5.0                 317 µs    282 µs    389 µs    566 µs

RT = 1.1 s
Lateral \ Frontal   0.0       2.5       5.0       10.0
0.0                 0 µs      140 µs    299 µs    341 µs
2.5                 236 µs    310 µs    372 µs    437 µs
5.0                 283 µs    372 µs    380 µs    520 µs

In this case, frontal displacements up to 2.5 cm do not present the squared
behavior, with a maximum ∆ITD of 140 µs (RT = 1.1 s), considering the centered
position as the reference. Lateral movements are more affected, starting to
present the squared behavior at the transition angles between the rear
loudspeaker and the right side (230°) and between the right loudspeaker and
the front (310°). This pattern seems not to be RT-dependent, which is expected
due to the ITD's nature.

5.5.2.2 Interaural Level Difference

Figure 5.40 presents the ILD, considering the simulation of an anechoic
environment (RT = 0 s), at 24 angles around the listener as the listener is
displaced to different positions according to the specified grid. Compared to
the normal condition, although it presents the same pattern, the aided
condition has smaller differences between ILDs across more angles and
frequencies.

Figure 5.40: Difference in ILD as a function of azimuth angle for a B&K
4128-C. Iceberg method, horizontal plane in a four-loudspeaker setup
(RT = 0.0 s).

The differences were also lessened as the RT increased, as can be seen in
Figures 5.41 and 5.42. This result shows that increasing the reverberation can
positively affect the ILD error at off-center positions (reducing the
differences relative to the ILD in the center).

Figure 5.41: Difference in ILD as a function of azimuth angle for a HATS Brüel
and Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg
method on a four-loudspeaker setup, RT = 0.5 s.

Figure 5.42: Difference in ILD as a function of azimuth angle for a HATS Brüel
and Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg
method on a four-loudspeaker setup, RT = 1.1 s.
5.5.2.3 Azimuth Estimation

Figure 5.43 presents the estimated azimuth angle [182] according to the
position of the listener. The different RTs are represented by the line colors
in the graphs (blue = 0.0 s, red = 0.5 s, yellow = 1.1 s). The results
demonstrate that the Iceberg method presents less accuracy in reproducing
sound at frontal angles, especially ±30° (30° and 330°). The lateral
discrepancy is smaller and is also noted with real loudspeakers, which can
imply that the model used has some difficulty assessing that region.

Figure 5.43: Estimated frontal azimuth angle at different positions inside the
loudspeaker ring as a function of the target angle (aided condition). Model by
May and Kohlrausch [182].
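For intuition about how an ITD deviation maps to an azimuth error, a crude
estimate can be obtained by numerically inverting Woodworth's spherical-head
formula, ITD = (a/c)(sin θ + θ). This toy inversion is our own illustration
and is unrelated to the May and Kohlrausch model used for the figures (a
trained, auditory-model-based estimator); the head radius is an assumed value:

```python
import numpy as np

C = 343.0             # speed of sound [m/s]
HEAD_RADIUS = 0.0875  # assumed head radius [m]

def itd_woodworth(azimuth_deg):
    """Woodworth's frontal-plane ITD model: ITD = a/c * (sin θ + θ)."""
    th = np.radians(azimuth_deg)
    return HEAD_RADIUS / C * (np.sin(th) + th)

def azimuth_from_itd(itd, grid=np.linspace(-90, 90, 3601)):
    """Invert the model numerically: grid angle whose predicted ITD is closest."""
    return grid[np.argmin(np.abs(itd_woodworth(grid) - itd))]

est = azimuth_from_itd(itd_woodworth(30.0))
print(est)  # recovers ≈ 30°
```

Because the model is monotonic between -90° and 90°, the inversion is
unambiguous in the frontal hemisphere; front-back confusions, as seen in the
flat regions of Figure 5.43, cannot be resolved from the ITD alone.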
According to the model's results, for an aided listener the localization error
is up to 30 degrees within a frontal or lateral displacement of 5 centimeters.
In the case of 10 cm of displacement, the virtualization fails, presenting the
squared behavior on the contralateral side. The increase of reverberation
tends to maintain the maximum error magnitude, although it increases the
spread to more angles. That means the lateral side close to the loudspeaker
will present the sound source position less at the desired position and more
at the loudspeaker's physical position. Medium reverberation is less affected
by displacement, meaning extreme cases should drive extra care with listener
positioning.

5.6 Discussion

This chapter
proposed a new hybrid auralization method (Iceberg) for a virtualization setup
composed of four loudspeakers at a 1.35 m distance from the center. This setup
is relatively limited and is intended as a feasible alternative to the much
more expensive and complicated arrangements proposed and used in the reviewed
literature (see Section 2.3). The innovative factor of the Iceberg method is
the use of a room acoustic parameter, the center time, to compute the
transition point between early and late reflections. The Iceberg's channel
mixing and distribution automation are generalized to any RIR collected or
converted into first-order Ambisonics. Implemented in MATLAB, the Iceberg
auralization algorithm can generate .wav files that were virtualized in a
setup with four loudspeakers (90-degree spacing around the listener). Three
simulated sound scenarios were predefined and simulated using acoustic
modeling software, generating RIRs in Ambisonics format. The setup provided
appropriate reverberation times even when the listener was away from the
center position.

Regarding binaural cues, at the optimal position the maximum deviation in ITD
was 170 µs, corresponding to a shift of approximately 15° for sources around
±30° in front of and behind the centered listener. The considerable distance
between loudspeakers is the most likely cause of this deviation. In contrast
with Chapter 3, the Iceberg method could not reproduce the ITDs with the same
accuracy as VBAP at the sweet-spot position. However, it presented a better
performance than Ambisonics. The high accuracy of VBAP can be attributed to
the number of loudspeakers, 24, which lessens their physical distance and,
therefore, its maximum error. However, even with the bigger number of
loudspeakers, Ambisonics was truncated at first order, thus not having the
benefit of more sound sources.

There were also deviations in ILD, mainly at the same angles. However, the ILD
deviations were most significant in the 2 kHz octave band. The actual effects
of this difference on signals that encompass these frequencies should be
further investigated in validation tests. The ILDs also denoted patterns with
better representation through VBAP than first-order Ambisonics in Section
3.3.2.2. The Iceberg method with four loudspeakers presents a pattern with
ILDs closer to actual loudspeakers than pure Ambisonics, but again not as
accurate as VBAP with 24 channels. This is characterized mainly by a
difference in the 2 kHz octave band, which needs further investigation and
consideration in experiments requiring ILD accuracy at that frequency band.
Overall, the results for binaural cue reproduction via the Iceberg method with
four loudspeakers are better than pure first-order Ambisonics but worse than
VBAP (considering 24 loudspeakers). Therefore, the Iceberg method can be
considered an option when the number of loudspeakers is limited or the need
for a sense of realism is higher: it combines relative accuracy with a sense
of immersion.

The maximum estimated localization uncertainty was around 30 degrees for the
Iceberg method in the minimal configuration of four loudspeakers. The
different amounts of reverberation tested did not impact the results. Although
the estimated localization was imperfect, the method's performance was in line
with similar VBAP implementations [97]. The results were similar in the aided
condition, with ILDs indicating better cue reproduction in the 2 kHz octave
band. This improvement was not translated into a better estimated angle,
yielding about the same results. A slight variation was identified, and a
statistically significant difference was found between different RTs,
especially at the lateral angles. This deviation needs to be further evaluated
with other models and also with subjective validation, especially as the model
presented unexpected results at these angles for non-virtualized sound
sources.

A second
listener was introduced at the side of the primary listener, while maintaining
the primary listener in the optimal position, to simulate a condition where
there is a need for social interaction or presence in a test. In this case,
the binaural cues provided by the Iceberg auralization method virtualized in a
four-loudspeaker setup were compared to a baseline without virtualization
(actual loudspeakers). Also, the model of May and Kohlrausch [182] was applied
to predict frontal localization accuracy. Three distances were tested with the
three simulated rooms (different RTs). There was the expected acoustical
shadow at angles blocked by the second listener but not at the remaining
sound-source locations around the listener. That can be considered a measure
of rendering robustness; the second listener did not break the virtualization
of binaural cues by scrambling the sound pressure summation.

Regarding sub-optimal positions of the unaided HATS, Section 5.4.2 presented
surprising results. The virtualized effect was affected differently for the
displaced HATS according to the amount of RT. In the dry condition (RT = 0 s),
displacements up to 5 cm when moving forward did not present the undesired
effect. The mild reverberation (0.5 s) got the undesired effect only at 10 cm
from the center, and the large RT at up to 5 cm. The large reverberation
(1.1 s) was heavily affected, presenting the squared behavior at all
off-center displacements. Lateral movements were affected in a similar way for
all the RTs tested, presenting the effect at displacements further than 3.5 cm
from the center. The ILDs presented the shadowing effect as expected,
increasing the distortions with the distance and reducing them with the
increase of reverberation. The combination came to an estimated azimuth angle
in practice not dependent on RT and an error of ≈30° with displacements up to
2.5 cm. As the displacement from the center increases, the maximum estimated
error increases but also moves, meaning that the
virtualization is affected but would still produce a virtualized effect.

In the aided condition (Section 5.5.2), the off-center ITDs indicated a
maximum frontal displacement for the Iceberg method under 10 cm from the
center in the unaided condition and under 5 cm in the aided condition. The
ILDs were also impacted by distance, but to a lower extent, while the
different RTs affected the ITDs less and the ILDs more. The ILDs present a
smearing behavior, lowering the error, with higher RT. That behavior suggests
an equivalent compensation of the error predicted by the model. Within that
distance limit, the maximum error predicted was around 30 degrees for all RTs,
agreeing in the end with the non-aided condition.

When the listener is away from the center, the Iceberg method virtualized
using the four-loudspeaker setup increases the deviations in binaural cues
compared to the cues at those sound-source angles, with a near-complete loss
of gradient in the cues (i.e., either zero or extreme values) occurring when
the listener was 10 cm in front of the center position. ITDs for this
condition revealed minor differences across the tested reverberation
conditions. The values indicated that the files created by the method and
reproduced on the four-loudspeaker configuration produce similar ITDs as the
baseline condition, having the weak point at 30 degrees. The absolute ITD
values align with similar experiments found in the literature for VBAP
configurations without a second listener [241]. The acoustic shadow is
indicated by an increase of the ∆ITD around 270 degrees (left side),
especially at the closest position, similar to the finding with pure VBAP in
Chapter 3. Also, the difference in ILDs (∆ILDs) showed that the presence is
well captured at higher frequencies. All RT conditions and positions
demonstrated the capture of differences in ILD on the left side of the
mannequin. A ∆ILD is expected as a result of natural acoustic shadowing
produced by the presence of a second listener.

The benefit of the Iceberg method is that the VBAP part is not limited in
frequency by aliasing at higher frequencies and does not require all
loudspeakers to be active simultaneously, as pure Ambisonics does. The way the
division is done in the Iceberg method brings the Ambisonics responsibility to
the time domain, making the method more natural to the physical presence
between loudspeakers and the listener. That extends the system's robustness
with a limited number of loudspeakers and frequency limit (not being dependent
on the Ambisonics order). The predicted error for a second listener was
compared to the Iceberg baseline condition (listener centered, alone). The
method presents a deviation of around 10 degrees in all RTs when the second
listener is in the shoulder-to-shoulder situation, the closest position
(50 cm). As the difference is at the second listener position, it is possible
to argue that the Iceberg energy balance is advantageous, not entirely
depending on the four loudspeakers' summation. Therefore, compared to VBAP or
Ambisonics, the Iceberg method is a suitable option in terms of localization
that adds the benefit of immersiveness with modest hardware.

5.6.1 Subjective impressions

The auralization was compared by the author and his
supervisor to VBAP and Ambisonics in subjective listening sessions. The
experiment was not performed systematically, as the Covid-19 emergency rules
imposed a series of restrictions, and these impressions are initial opinions.
The speech signals were auralized via Iceberg, VBAP, and Ambisonics and
reproduced in an anechoic room. The rooms were simulated in Odeon software
v.12 with reverberation times equivalent to 0.5 and 1.3 seconds. Both agreed
that the sound direction from VBAP is easily identifiable but with poor
immersiveness, as all the reverberation came from a specific side (two
loudspeakers). Ambisonics offered a more immersive experience with all
loudspeakers active simultaneously, but the localization was very difficult; a
"blurred" position seems to be a trending description. The Iceberg system
provided a sound localization close to VBAP while maintaining the
immersiveness.

The Iceberg method, upon a trade-off in spatial localization, allows for the
reproduction of sounds that can be easily manipulated regarding sound-source
direction, sound pressure level, reverberation time, and simultaneous sound
sources. That makes it possible to create or reproduce specific virtual sound
scenarios with high reproducibility. Thus, researchers can conduct auditory
tests with increased ecological validity in spaces that usually do not count
on numerous loudspeakers, as is common in clinics, universities, and small
companies. Notwithstanding these benefits, some limitations challenge the
method with a small number of loudspeakers. These limitations impose some
constraints on its use in terms of the spatial localization of sound sources.

5.6.2 Advantages and Limitations

A
fundamental advantage of the proposed Iceberg method is the minimum number of
loudspeakers required (four), as well as its compatibility with any RIR
already collected in low- or high-order Ambisonics. That is possible because
an RIR in HOA can be easily scaled down to first-order Ambisonics and its
spatial sound properties composed with any given sound via the algorithm
[237]. Furthermore, an essential part of the method's definition, and an
additional advantage, is the automation of the definition of the amount of
energy from the RIR that corresponds to each specific auralization technique.
That automation, performed through the center time room acoustics parameter,
allows a smooth transition between the direct sound and early reflections
portion and the late reflections of the RIR, resulting in a potentially more
natural sound while maintaining control over the incidence direction.
The auralization method is designed for a virtualization setup of four
loudspeakers. However, it is possible to use it with more loudspeakers,
reducing the eventual limitations on spatial accuracy. Furthermore, although
not within the scope of this thesis, the method, using the VBAP technique,
would allow the possibility of dynamically moving sound sources around the
listener.

5.6.3 Study limitations and Future Work

The initial
aim of this study was to investigate the correlation between objective
parameters related to spatial sound, particularly those psychoacoustically
motivated by auralization methods, and subjective responses to these methods.
However, due to the Covid-19 pandemic, tests with participants were not
possible because of the risk of infection, as mandated by government rules. As
a result, the study is limited to verifying objective parameters. Therefore,
Section 5.5 was included to explore the system capabilities within a relevant
context for hearing research, although without subjective tests involving
participants. In future work, structured validation with participants would be
of value to the field, allowing for adjustments and the measurement of the
effectiveness of this method in real-world auditory tests. Additionally,
future implementations of this method could include improvements such as
guided sound-source movements around the listener, with simultaneous updates
of the VBAP and Ambisonics weights defined by time constants, and the ability
to pan with intensity using techniques such as Vector-Based Intensity Panning
(VBIP), which could be tailored to specific cases with different loudspeaker
arrangements or stimulus frequency content and potentially merged with VBAP
depending on the type of stimuli and specific frequencies.
5.7 Concluding Remarks

Tests that require hearing aids can be performed, considering some
constraints, utilizing the proposed Iceberg method. These tests aimed to
verify the impact of the auralization method through a simple setup (four
loudspeakers) on the virtualized spatial impression by analyzing the binaural
cues and their deviations from actual sound-source loudspeakers. This is an
important step, although not discounting the importance of validation with
test participants. For a centered listener, the verified deviation in binaural
cues presented limitations of around 30° in localization (through ITD) with
reasonably matching ILDs. The system's reliability is compromised as the
listener is moved out of the sweet spot, but less so than when unaided,
possibly due to comb filtering or the addition of compression into the signal
path. Small movements up to 2.5 cm generated errors within a JND, meaning they
likely would not be perceived as distortions or artifacts. Thus, tests with
people that require sound sources positioned in spaces larger than 30° can
benefit from this Iceberg method, which incorporates spatial awareness and
immersiveness.

Chapter 6

Conclusion

Throughout the course of this study, a new auralization
method called Iceberg was conceptualized and compared to well-known methods,
including VBAP and first-order Ambisonics, using objective parameters. The
Iceberg method is innovative in that it uses the center time (ts) to find the
transition point between early and late reflections in order to split the
Ambisonics impulse responses and adequately distribute them. VBAP is
responsible for localization cues in this proposed method, while Ambisonics
contributes to the sense of immersion. In the center position, the Iceberg
method was found to be in line with the localization accuracy of the other
methods while also adding to the sense of immersion. Also, a second listener
added at the side did not present undesired effects on the auralization.
Additionally, it was found that virtualization of sound sources with
Ambisonics can imply limitations on a participant's behavior due to its sweet
spot in a listening-in-noise test. However, these limitations can be
circumvented, and this extends to Iceberg, resulting in subjective responses
that align with behavioral performance in speech intelligibility tests and
increased localization accuracy.

6.1 Iceberg

In the
previous chapter, we conducted a thorough analysis comparing the performance
of the Iceberg method to the results presented in Chapter 3 and the relevant
literature in Chapter 2. This comparison included evaluating the Iceberg
method's performance at the center position, at various off-center positions,
and in the presence of a second listener. The results showed that the Iceberg
method was able to provide the designed overall reverberation times of 0
seconds, 0.5 seconds, and 1.1 seconds across all measured positions.
Additionally, the differences between the reverberation times were below the
5% JND threshold. When comparing values to the ones obtained with a HATS in
the center without virtualization, it is noteworthy that the Iceberg method
uses 20 fewer loudspeakers than the VBAP and Ambisonics configurations. The
Iceberg method exhibited lower accuracy in reproducing ITDs at the sweet-spot
position than VBAP, but it performed better than first-order Ambisonics. We
also observed detrimental deviations in ILDs, with values exceeding 4 dB,
particularly at the same angles as the ITDs. The most significant ILD
deviations occurred in the 2 kHz octave band, which could influence the
perceived localization accuracy. Further investigation through validation
tests is necessary to fully understand the extent of these differences between
the methods. Regarding overall binaural cue reproduction, the Iceberg method
using four loudspeakers was superior to pure first-order Ambisonics but less
accurate than VBAP with 24 loudspeakers.

The Iceberg method presented a maximum estimated localization error of around
30 degrees for angles within plus or minus 40 degrees from the center while
the listener was centered. Although this magnitude matches the similar methods
in Table 2.2, the binaural cues pointed to a lower estimate (around 15
degrees). Therefore, further studies with perceptual evaluation are highly
encouraged. In the aided condition, we observed that the ITD was not affected
at the center position, and the ILD was closer to the VBAP condition with 24
loudspeakers. However, this improvement was not reflected in the model
estimate, which still showed maximum deviations of around ±30°.

At off-center positions, the Iceberg method showed slight variations in
localization estimates, particularly at lateral angles, which were found to be
statistically significant when comparing different reverberation times. This
variation is likely due to the method's spatial limitation, known as the sweet
spot, as discussed in Chapter 2. When the reverberation time was 0 s or 1.1 s,
the sweet spot was more limited in terms of displacement from the center (up
to 3.5 cm). This means that these conditions were more prone to breaking
virtualization when sound sources were virtualized on the contralateral side
of the displacement. In contrast, the mild condition (0.5 s) maintained this
up to 5 cm. A sweet spot is generally smaller in first-order Ambisonics
compared to VBAP with a 24-loudspeaker setup, as identified in Chapter 3.
However, it is important to note that objective parameters may not always
correspond directly to subjective impressions. Despite this, the Iceberg
method with four loudspeakers was found to perform similarly to VBAP (with 24
loudspeakers) in terms of binaural cue reproduction. The model estimates also
showed that, within a combined displacement of up to 3.5 cm in both lateral
and frontal directions, the maximum error would be less than 30 degrees,
indicating the presence of virtualization (i.e., the sound being physically
composed of more than just the nearest speaker). It is therefore recommended
to evaluate this deviation further using other models and subjective
validation tests.

The results in Section 5.4.3.3, the condition with the listener in the center,
showed that the presence of a second listener did not negatively affect the
performance of the Iceberg method in any of the reverberation conditions
tested. No statistical difference in the means of estimated error was
identified when considering the three RT conditions and the three KEMAR
positions. The binaural cue
errors follo w ed the same trend as the Alone version, meaning that ITDs p ointed to an error
around 15 degrees, but with ILDs having absolute v alues with differences exceeding 4 dB (JND),
which can probably explain the 30 º error estimated by the mo del in the w orst p osition ( i.e.
, the angle of the virtualized sound source at ± 45 º ). Based on these results, the Iceb erg
metho d can b e viable for virtualization setups with limited loudsp eakers or when a higher
sense of realism is desired. 6.2 General Discussion In this work, we explored the use of
auralization methods in hearing research as a means of improving the ecological validity of acoustic environments. The use of virtualized sound fields has become increasingly popular in laboratory tests. However, it is essential to understand the limitations of these methods in order to ensure unbiased results [97]. Our literature review (Chapter 2) identified the need for auralization methods that can be implemented in smaller-scale setups, and our initial evaluations focused on the spatial accuracy of several fundamental auralization methods, as well as their potential use in tasks involving multiple listeners. A collaborative study allowed us to test one of these techniques with real participants, and our findings highlighted both the limitations and potential improvements of using Ambisonics for conducting listening effort tests. Based on this experience and our knowledge of room acoustics and auralization, we proposed a new hybrid method called Iceberg, which combines the strengths of Ambisonics and VBAP and can be implemented using just four loudspeakers. This proposed method offers a low-cost option for auralization that could increase its adoption among researchers worldwide.

In Chapter 3, the VBAP and Ambisonics auralization methods were objectively characterized and compared in terms of binaural cues for the center and off-center positions. This investigation provided a foundation for combining the methods and further highlighted the strengths of each technique: localization in VBAP and immersiveness in Ambisonics. Objective parameters extracted from BRIRs and RIRs were examined for a single listener and in the presence of a second listener in the room. The results showed that the presence of a second listener did not significantly impact the performance of VBAP. At the same time, Ambisonics was less effective in reproducing the examined cues, especially with a second listener present. This information was crucial in developing the proposed Iceberg auralization method, which combines the strengths of both VBAP and Ambisonics to create a hybrid method suitable for use with simple setups such as four loudspeakers.

The results of the collaborative study described in Chapter 4 demonstrate the feasibility of using a virtualization method to deliver a hearing test with a certain level of spatial resolution and immersion across different room simulations and signal-to-noise ratios. This study suggests that virtualization methods have the potential to provide realistic acoustic environments for hearing tests, allowing researchers to vary the acoustic demands of a task and potentially improve ecological validity. Additionally, the significant correlation between participants' subjective perception of effort and their speech recognition performance highlights the importance of considering listening effort in hearing research. However, the limitations and potential solutions identified in this study also highlight the need for further investigation into virtualization methods in hearing research, including developing new auralization methods that address these limitations.

In Chapter 5, we presented the development of a new auralization method called Iceberg, which was designed to be compatible with small-scale virtualization setups using only four loudspeakers. Previous hybrid methods combining Ambisonics and VBAP have been developed, but the innovative aspect of the Iceberg method is its approach to handling and combining the different methods to virtualize sounds while delivering appropriate spatial cues. This feature is achieved by identifying a transition point in the RIR using the Central Time parameter from the omnidirectional channel of an Ambisonics RIR. This automated process allows the user to input any Ambisonics RIR, along with the desired presentation angle(s) and sound file(s), to be auralized using the VBAP and Ambisonics methods merged into a final multi-channel .wav file for presentation over a four-loudspeaker system. One of the benefits of this approach is that it does not require any additional parameters, such as those generated by a simulation program, and it can be used with any Ambisonics RIR, including those in higher-order format, which must be converted to an order appropriate for the number of loudspeakers. Overall, the development of the Iceberg method illustrates the potential for adapting existing technology to meet the needs of smaller-scale virtualization setups while still delivering realistic spatial cues. This approach could support the broader adoption of auralization in hearing research and encourage researchers to utilize virtualized sound fields in their protocols.

6.2.1 Iceberg capabilities

The auralization method proposed in this
work combines the use of Ambisonics RIRs and VBAP to balance the acoustic energy in two spatial domains: the perception of sound localization and the perception of immersion. This results in a file that captures the characteristics of a given sound as if it were played in the desired environment. The method can be reproduced with as few as four loudspeakers but is scalable to any larger array, theoretically increasing its efficiency. In addition, multiple sound sources can be virtualized and merged at presentation to create more complex environments. The input to Iceberg includes Ambisonics RIRs corresponding to specific source-and-receiver positions and the sounds to be virtualized, preferably recorded in (near) anechoic conditions.
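The transition-point step at the heart of this pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions (48 kHz sampling rate, W channel first, hypothetical function names), not the released implementation: the Central Time Ts, i.e. the energy-weighted mean arrival time of the omnidirectional channel, splits the RIR into an early part (direct sound and early reflections, rendered with VBAP) and a late part (the reverberant tail, rendered with Ambisonics).

```python
import numpy as np

FS = 48_000  # sampling rate in Hz (assumed)

def central_time(omni_rir):
    """Central Time Ts: energy-weighted mean arrival time of an RIR, in s."""
    energy = omni_rir**2
    t = np.arange(len(omni_rir)) / FS
    return np.sum(t * energy) / np.sum(energy)

def split_rir(ambi_rir):
    """Split an Ambisonics RIR (channels x samples, W channel first) at Ts.

    Hypothetical helper: the early slice would feed the VBAP renderer and
    the late slice the Ambisonics renderer, before both are merged into
    one multi-channel file for playback.
    """
    ts = central_time(ambi_rir[0])   # Ts from the omnidirectional (W) channel
    cut = int(round(ts * FS))        # transition point in samples
    return ambi_rir[:, :cut], ambi_rir[:, cut:], ts
```

Because Ts is derived from the RIR itself, no extra simulation parameters are needed, which matches the property highlighted above: any (appropriately order-reduced) Ambisonics RIR can be fed in directly.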
The method can pan the source around the listener, as the VBAP component is independent of Ambisonics. However, it is recommended that RIRs be generated for specific angles when using room acoustic software to generate the Ambisonics RIRs. One benefit of this method is that it can reproduce sounds above the cut-off frequency associated with lower-order Ambisonics, owing to its use of VBAP, which is not inherently frequency limited [241]. VBAP is responsible for the delivery of both the direct sound and the early reflections. Additionally, the default properties are defined to work with normalized RIRs, enabling the researcher to specify the sound pressure level of the auralized files.

6.2.2 Iceberg & Second Joint Listener

Testing with a second listener inside the loudspeaker ring helps illuminate the potential of this virtualization system for different tasks and human-interaction situations [143, 202, 230, 234]. A system that allows these tasks and situations needs to deliver the appropriate sound properties for the sound to be perceived as coming from the intended position [97]. Ambisonics was shown to be ineffective in this test, as the shadow caused by a second listener prevented higher-frequency spatial information from being correctly presented, distorting the sound field (especially in low-order Ambisonics). Vector-based solutions can suffer less, as the sound is physically formed from two (or three, in 3D setups) loudspeakers in the same quadrant. This means that interference will happen only at angles where the acoustic shadow of a physical object would naturally interfere in a non-virtualized reproduction. In Chapter 5, BRIRs were acquired with files generated by the Iceberg method and reproduced via a modest setup composed of four loudspeakers in the presence of a second listener. It could be observed that the second listener did not disturb the sound field, as the (primary) listener in the center position received the appropriate binaural cues. The system designed to reproduce files virtualized with the Iceberg method managed to perform competitively with systems with more loudspeakers rendering pure methods (see Table 2.2).

6.2.3 Iceberg: Listener Wearing Hearing Aids

Adding the possibility of allowing
participants to use hearing devices is another crucial step in making auditory tests with auralized files accessible to more researchers [134, 144]. It has been observed that hearing aid signals can influence the intelligibility and clarity of speech in virtualized sound fields [7, 97, 99, 103, 161, 188, 213, 276]. When the hearing aid signals are not appropriately aligned with the characteristics of the virtualized sound field, listeners may struggle to comprehend spoken words or sentences [98]. This issue can be exacerbated when the virtualized sound field includes noise or other distractions that can interfere with speech perception, or when the hearing aid signals fail to amplify or enhance the speech signal to an adequate degree [98, 137]. If the hearing aid signals do not correctly capture the sound field, and therefore do not correct it to the individual needs and preferences of the listener, the listener may experience difficulty using the virtualized sound field comfortably and effectively.

Swept signals were auralized by the Iceberg method, played through the system, recorded with a manikin wearing hearing aids, and deconvolved. The resulting BRIRs were analyzed in terms of binaural cues and compared to the same signals from actual loudspeakers. The localization error was estimated by May and Kohlrausch's probabilistic model for robust sound source localization based on a binaural auditory front end. This model estimates the location of a sound source using binaural cues, such as interaural level differences and interaural time differences, extracted from the signals received by the two ears. By combining these cues in a probabilistic framework, the model can robustly estimate the location of the sound source, even in noisy or distracting environments. Evaluation of the model suggests its potential for use in practical contexts such as hearing aids or virtual reality systems. Results obtained using the Iceberg method with an aided HATS showed performance similar to the unaided results with the listener positioned in the sweet spot, indicating suitable performance (see Section 5.5).

6.2.4 Iceberg Limitations

The virtualization system playing files auralized with
the Iceberg method has been found to be less effective outside of the sweet spot, as the binaural cues are not correctly rendered. This mismatch, which occurs for displacements of more than 2.5 cm, can be mitigated by keeping the listener centered in the virtualized sound field. While this is a significant limitation, the method can still be applied with simple measures, such as a modest head restraint, reducing the setup requirements compared to other classical methods. One major limitation of the Iceberg method is its spatial resolution: it is recommended for scenarios with a minimum of 30° of separation between sound sources (the separation can be lower for sources closer to the loudspeakers, although the error distribution should then be checked). Furthermore, the distance to the sound source should be equal to the radius of the loudspeaker array, as Ambisonics and VBAP cannot define sources inside the array; VBAP can only pan between physical sound sources. These limitations should be considered when using the Iceberg method to create virtualized sound fields.

6.3 General Conclusion

As computational capacity increases, using more complex and natural sound scenarios in auditory research becomes feasible and desirable. This technology allows for testing new features, sensors, and algorithms in controlled conditions with increasing realism and ecological validity. Even clinical tests can benefit from auralization, allowing for investigations in different scenarios with varying acoustics (e.g., in a speech-in-noise test). The spatial-cue performance of the Iceberg auralization method, reproducing files through a system of four loudspeakers, is largely sufficient for these types of tests. It is essential to understand the constraints of auralization methods, Iceberg included, which are tied to the virtualization setup and should be weighed by researchers based on their needs and the available hardware. However, with Iceberg, virtualization can be conducted by auditory research groups that cannot afford or house expensive anechoic chambers with tens or hundreds of loudspeakers and sophisticated hardware, yet need more freedom than headphones provide. The method presented in this work serves as an additional tool for researchers to consider.

6.4 Main Contributions

In this work, we have presented a novel auralization method called Iceberg, designed to create virtualized sound scenarios for use in auditory research. The main contributions of this work are:

1. The development of a hybrid auralization method that combines two psychoacoustic virtualization methods to balance the energy of an RIR and output a multi-channel file for presentation.

2. The implementation of an effective, simple, and partially automated auralization method that allows for the creation of reasonably realistic virtualized sound scenarios with a modest setup.

3. The exploration of the use and limitations of auralization methods in auditory research, including the suggestion that the Iceberg method has the potential to be a helpful tool for testing new features, sensors, and algorithms in controlled conditions with increasing realism and ecological validity.

4. Research into the limitations and feasibility of using Ambisonics in the context of speech intelligibility with normal-hearing listeners.

5. Identification of the potential for the Iceberg method to be applied in a range of practical contexts, including hearing aids and virtual reality systems.

Bibliography

[1] Aguirre, S. L. (2017). Implementação e avaliação de um
sistema de virtualização de fontes sonoras (in Portuguese). Master's thesis, Programa de Pós-Graduação em Engenharia Mecânica, Universidade Federal de Santa Catarina. (Cited on pages 41 and 49)

[2] Aguirre, S. L., Bramsløw, L., Lunner, T., and Whitmer, W. M. (2019). Spatial cue distortions within a virtualized sound field caused by an additional listener. In Proceedings of the 23rd International Congress on Acoustics: integrating 4th EAA Euroregio 2019, pages 6537–6544, Berlin, Germany. ICA International Congress on Acoustics, Deutsche Gesellschaft für Akustik. (Cited on page 94)

[3] Aguirre, S. L., Seifi-Ala, T., Bramsløw, L., Graversen, C., Hadley, L. V., Naylor, G., and Whitmer, W. M. (2021). Combination study 3. http://hear-eco.eu/combination-study-3/. (accessed: 24.11.2021). (Cited on page 97)

[4] Agus, T. R., Akeroyd, M. A., Gatehouse, S., and Warden, D. (2009). Informational masking in young and elderly listeners for speech masked by simultaneous speech and noise. The Journal of the Acoustical Society of America, 126(4):1926–1940. (Cited on page 40)

[5] Ahnert Feistel Media Group (2011). EASE enhanced acoustic simulator for engineers. https://www.afmg.eu/en/ease-enhanced-acoustic-simulator-engineers. Last checked on: Nov 28, 2021. (Cited on page 27)

[6] Ahrens, A., Marschall, M., and Dau, T. (2017). Measuring speech intelligibility with speech and noise interferers in a loudspeaker-based virtual sound environment. The Journal of the Acoustical Society of America, 141(5):3510–3510. (Cited on pages 16, 42 and 52)

[7] Ahrens, A., Marschall, M., and Dau, T. (2019). Measuring and modeling speech intelligibility in real and loudspeaker-based virtual sound environments. Hearing Research, 377:307–317. (Cited on pages 42, 53, 99 and 191)

[8] Ahrens, A., Marschall, M., and Dau, T. (2020). The effect of spatial energy spread on sound image size and speech intelligibility. The Journal of the Acoustical Society of America, 147(3):1368–1378. (Cited on page 42)

[9] Akeroyd, M. A. (2006). The psychoacoustics of binaural hearing. International Journal of Audiology, 45(sup1):25–33. (Cited on pages 9 and 17)

[10] Alfandari Menase, D. (2022). Motivation and fatigue effects in pupillometric measures of listening effort. PhD thesis, University of Nottingham. (Cited on page 33)

[11] Algazi, V. R., Duda, R. O., and Thompson, D. M. (2004). Motion-tracked binaural sound. Journal of the Audio Engineering Society, 52(11):1142–1156. (Cited on page 23)

[12] Alhanbali, S., Dawes, P., Millman, R. E., and Munro, K. J. (2019). Measures of listening effort are multidimensional. Ear and Hearing. (Cited on pages 51 and 118)

[13] Alpert, M. I., Alpert, J. I., and Maltz, E. N. (2005). Purchase occasion influence on the role of music in advertising. Journal of Business Research, 58(3):369–376. (Cited on page 8)

[14] Arau-Puchades, H. (1988). An improved reverberation formula. Acta Acustica united with Acustica, 65(4):163–180. (Cited on page 34)

[15] Archontis Politis (2020). Higher Order Ambisonics (HOA) library. (Cited on page 107)

[16] Arlinger, S. (2003). Negative consequences of uncorrected hearing loss—a review. International Journal of Audiology, 42(sup2):17–20. (Cited on pages 1 and 55)

[17] Aspöck, L., Pausch, F., Stienen, J., Berzborn, M., Kohnen, M., Fels, J., and Vorländer, M. (2018). Application of virtual acoustic environments in the scope of auditory research. In XXVIII Encontro da Sociedade Brasileira de Acústica, SOBRAC, Porto Alegre, Brazil. SOBRAC. (Cited on pages 16 and 42)

[18] Attenborough, K. (2007). Sound Propagation in the Atmosphere, pages 113–147. Springer New York, New York, NY. (Cited on page 10)

[19] Baldan, S., Lachambre, H., Delle Monache, S., and Boussard, P. (2015). Physically informed car engine sound synthesis for virtual and augmented environments. In 2015 IEEE 2nd VR Workshop on Sonic Interactions for Virtual Environments (SIVE), pages 1–6. IEEE. (Cited on page 20)

[20]
Barron, M. (1971). The subjective effects of first reflections in concert halls—the need for lateral reflections. Journal of Sound and Vibration, 15(4):475–494. (Cited on page 37)

[21] Barron, M. and Marshall, A. (1981). Spatial impression due to early lateral reflections in concert halls: The derivation of a physical measure. Journal of Sound and Vibration, 77(2):211–232. (Cited on pages 9, 37 and 38)

[22] Bates, E., Kearney, G., Furlong, D., and Boland, F. (2007). Localization accuracy of advanced spatialisation techniques in small concert halls. The Journal of the Acoustical Society of America, 121. (Cited on pages 47, 49 and 53)

[23] Benesty, J., Sondhi, M., and Huang, Y. (2008). Springer Handbook of Speech Processing. Springer-Verlag Berlin Heidelberg. (Cited on page 24)

[24] Berkhout, A. J. (1988). A holographic approach to acoustic control. Journal of the Audio Engineering Society, 36(12):977–995. (Cited on page 31)

[25] Berkhout, A. J., de Vries, D., and Vogel, P. (1993). Acoustic control by wave field synthesis. The Journal of the Acoustical Society of America, 93(5):2764–2778. (Cited on page 31)

[26] Bertet, S., Daniel, J., Parizet, E., and Warusfel, O. (2009). Influence of microphone and loudspeaker setup on perceived higher order ambisonics reproduced sound field. Proceedings of Ambisonics Symposium. (Cited on page 31)

[27] Bertet, S., Daniel, J., Parizet, E., and Warusfel, O. (2013). Investigation on localisation accuracy for first and higher order ambisonics reproduced sound sources. Acta Acustica united with Acustica, 99:642–657. (Cited on pages 31 and 77)

[28] Bertoli, S. and Bodmer, D. (2014). Novel sounds as a psychophysiological measure of listening effort in older listeners with and without hearing loss. Clinical Neurophysiology. (Cited on page 50)

[29] Berzborn, M., Bomhardt, R., Klein, J., Richter, J.-G., and Vorländer, M. (2017). The ITA-Toolbox: An open source MATLAB toolbox for acoustic measurements and signal processing. In 43rd Annual German Congress on Acoustics, Kiel (Germany), 6 Mar 2017 – 9 Mar 2017, volume 43, pages 222–225. (Cited on pages 60, 107 and 138)

[30] Best, V., Kalluri, S., McLachlan, S., Valentine, S., Edwards, B., and Carlile, S. (2010). A comparison of CIC and BTE hearing aids for three-dimensional localization of speech. International Journal of Audiology, 49(10):723–732. (Cited on page 42)

[31] Best, V., Keidser, G., Buchholz, J. M., and Freeston, K. (2015). An examination of speech reception thresholds measured in a simulated reverberant cafeteria environment. International Journal of Audiology. (Cited on pages 42 and 52)

[32] Best, V., Marrone, N., Mason, C. R., and Kidd, G. (2012). The influence of non-spatial factors on measures of spatial release from masking. The Journal of the Acoustical Society of America, 131(4):3103–3110. (Cited on page 33)

[33] Bidelman, G. M., Davis, M. K., and Pridgen, M. H. (2018). Brainstem-cortical functional connectivity for speech is differentially challenged by noise and reverberation. Hearing Research. (Cited on page 50)

[34] Bigg, G. R. (2015). The science of icebergs, pages 21–124. Cambridge University Press. (Cited on page 128)

[35] Bisgaard, N., Vlaming, M. S. M. G., and Dahlquist, M. (2010). Standard audiograms for the IEC 60118-15 measurement procedure. Trends in Amplification, 14(2):113–120. (Cited on pages 163 and 164)

[36] Blackstock, D. (2000). Fundamentals of Physical Acoustics. A Wiley-Interscience publication. Wiley. (Cited on page 228)

[37] Blauert, J. (1969). Sound localization in the median plane. Acta Acustica united with Acustica, 22(4):205–213. (Cited on page 10)

[38] Blauert, J. (1997). Spatial hearing: the psychophysics of human sound localization. MIT Press. (Cited on pages 9, 13, 14, 16, 18, 40, 73, 93, 126 and 149)

[39] Blauert, J. (2005). Communication acoustics. Springer-Verlag Berlin Heidelberg, 1 edition. (Cited on pages 2, 3, 10, 13, 20, 76 and 98)

[40] Blauert, J. (2013). The technology of binaural listening. Springer. (Cited on pages 9, 20, 22, 33 and 76)

[41] Blauert, J., Lehnert, H., Sahrhage, J., and Strauss, H. (2000). An interactive virtual-environment generator for psychoacoustic research. I: Architecture and implementation. Acta Acustica united with Acustica, 86:94–102. (Cited on pages 1 and 57)

[42] Bock, T. M. and Keele, Jr., D. B. D. (1986). The effects of interaural crosstalk on stereo reproduction and minimizing interaural crosstalk in nearfield monitoring by the use of a physical barrier: part 1. Journal of the Audio
Engineering Society. (Cited on page 43)

[43] Bradley, J. S. (1986). Speech intelligibility studies in classrooms. The Journal of the Acoustical Society of America, 80(3):846–854. (Cited on page 127)

[44] Bradley, J. S. and Soulodre, G. A. (1995). Objective measures of listener envelopment. The Journal of the Acoustical Society of America, 98(5):2590–2597. (Cited on pages 36, 38 and 58)

[45] Brandão, E. (2018). Acústica de salas: Projeto e modelagem. Editora Blucher, São Paulo. (Cited on pages 16, 19, 33, 36, 38 and 128)

[46] Brandao, E., Morgado, G., and Fonseca, W. (2020). A ray tracing engine integrated with Blender and with uncertainty estimation: Description and initial results. Building Acoustics, 28:1–20. (Cited on page 27)

[47] Breebaart, J., van de Par, S., Kohlrausch, A., and Schuijers, E. (2004). High-quality parametric spatial audio coding at low bitrates. Journal of the Audio Engineering Society. (Cited on page 57)

[48] Breebaart, J., van de Par, S., Kohlrausch, A., and Schuijers, E. (2005). Parametric coding of stereo audio. EURASIP Journal on Advances in Signal Processing, pages 1–18. (Cited on page 57)

[49] Brinkmann, F., Aspöck, L., Ackermann, D., Lepa, S., Vorländer, M., and Weinzierl, S. (2019). A round robin on room acoustical simulation and auralization. The Journal of the Acoustical Society of America, 145(4):2746–2760. (Cited on page 21)

[50] Brinkmann, F., Aspöck, L., Ackermann, D., Opdam, R., Vorländer, M., and Weinzierl, S. (2021). A benchmark for room acoustical simulation. Concept and database. Applied Acoustics, 176:107867. (Cited on page 21)

[51] Brinkmann, F., Lindau, A., and Weinzierl, S. (2017). On the authenticity of individual dynamic binaural synthesis. The Journal of the Acoustical Society of America, 142(4):1784–1795. (Cited on page 14)

[52] Brown, C. and Duda, R. (1998). A structural model for binaural sound synthesis. IEEE Transactions on Speech and Audio Processing, 6(5):476–488. (Cited on page 12)

[53] Brown, V. A. and Strand, J. F. (2019). Noise increases listening effort in normal-hearing young adults, regardless of working memory capacity. Language, Cognition and Neuroscience. (Cited on pages 51 and 118)

[54] Brungart, D. S., Cohen, J., Cord, M., Zion, D., and Kalluri, S. (2014). Assessment of auditory spatial awareness in complex listening environments. The Journal of the Acoustical Society of America, 136(4):1808–1820. (Cited on page 18)

[55] Buchholz, J. M. and Best, V. (2020). Speech detection and localization in a reverberant multitalker environment by normal-hearing and hearing-impaired listeners. The Journal of the Acoustical Society of America, 147(3):1469–1477. (Cited on page 42)

[56] Byrnes, H. (1984). The role of listening comprehension: A theoretical base. Foreign Language Annals, 17(4):317. (Cited on page 8)

[57] Campanini, S. and Farina, A. (2008). A new Audacity feature: room objective acoustical parameters calculation module. (Cited on page 36)

[58] Choi, I., Shinn-Cunningham, B. G., Chon, S. B., and Sung, K.-m. (2008). Objective measurement of perceived auditory quality in multichannel audio compression coding systems. Journal of the Audio Engineering Society, 56(1/2):3–17. (Cited on page 73)

[59] Claus Lynge Christensen, Gry Bælum Nielsen, J. H. R. (2008). Danish acoustical society round robin on room acoustic computer modelling. https://odeon.dk/learn/articles/auralisation/. Last checked on: Nov 28, 2021. (Cited on pages 18, 27, 68, 107 and 129)

[60] Cooper, D. H. and Bauck, J. L. (1989). Prospects for transaural recording. Journal of the Audio Engineering Society, 37(1/2):3–19. (Cited on page 22)

[61] Cubick, J. and Dau, T. (2016). Validation of a virtual sound environment system for testing hearing aids. Acta Acustica united with Acustica. (Cited on pages 42, 53, 56 and 120)

[62] Cuevas-Rodríguez, M., Picinali, L., González-Toledo, D., Garre, C., de la Rubia-Cuestas, E., Molina-Tanco, L., and Reyes-Lecuona, A. (2019). 3D Tune-In Toolkit: An open-source library for real-time binaural spatialisation. PLoS ONE, 14(3):e0211899. (Cited on pages 16, 20 and 121)

[63] Cunningham, L. L. and Tucci, D. L. (2017). Hearing loss in adults. New England Journal of Medicine, 377(25):2465–2473. (Cited on pages 1 and 55)

[64] Daniel, J. (2000). Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia (in French). PhD thesis, University of Paris VI. (Cited on pages 31 and 99)

[65] Daniel, J. and Moreau, S. (2004). Further study of sound field coding with higher order ambisonics. In Audio Engineering Society Convention 116. (Cited on pages 30, 32, 99, 121 and 142)

[66] Davies, W. J., Bruce, N.
S., and Murph y , J. E. (2014). Soundscap e repro- duction and synthesis. A cta A custic a unite
d with A custic a , 100(2):285–292. (Cite d on p age 42 ) [67] Dietrich, P ., Masiero, B., M ¨
uller-T rap et, M., Pollo w, M., and Scharrer, R. (2010). Matlab to olbox for the comprehension
of acoustic measurement and signal processing. In Fortschritte der Akustik – DAGA. (Cited on page 107)
[68] Dreier, C. and Vorländer, M. (2020). Psychoacoustic optimisation of aircraft noise: challenges and limits. In Inter-Noise and Noise-Con Congress and Conference Proceedings, volume 261, pages 2379–2386. Institute of Noise Control Engineering. (Cited on page 20)
[69] Dreier, C. and Vorländer, M. (2021). Aircraft noise: auralization-based assessment of weather-dependent effects on loudness and sharpness. The Journal of the Acoustical Society of America, 149(5):3565–3575. (Cited on page 20)
[70] Duda, R., Avendano, C., and Algazi, V. (1999). An adaptable ellipsoidal head model for the interaural time difference. In 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), volume 2, pages 965–968. (Cited on page 12)
[71] Dunne, R., Desai, D., and Heyns, P. S. (2021). Development of an acoustic material property database and universal airflow resistivity model. Applied Acoustics, 173:107730. (Cited on page 21)
[72] Eddins, D. A. and Hall, J. W. (2010). Binaural processing and auditory asymmetries. In Gordon-Salant, S., Frisina, R. D., Popper, A. N., and Fay, R. R., editors, The Aging Auditory System, pages 135–165. Springer New York, New York, NY. (Cited on page 11)
[73] Epain, N., Guillon, P., Kan, A., Kosobrodov, R., Sun, D., Jin, C., and Van Schaik, A. (2010). Objective evaluation of a three-dimensional sound field reproduction system. In Burgess, M., Davey, J., Don, C., and McMinn, T., editors, Proceedings of the 20th International Congress on Acoustics, ICA 2010, volume 2, pages 949–955. International Congress on Acoustics (ICA). (Cited on page 31)
[74] Eyring, C. F. (1930). Reverberation time in "dead" rooms. The Journal of the Acoustical Society of America, 1(2A):168–168. (Cited on page 34)
[75] Farina, A. (2000). Simultaneous measurement of impulse response and distortion with a swept-sine technique. Journal of the Audio Engineering Society. (Cited on page 63)
[76] Farina, A., Glasgal, R., Armelloni, E., and Torger, A. (2001). Ambiophonic principles for the recording and reproduction of surround sound for music. Journal of the Audio Engineering Society. (Cited on pages 43 and 45)
[77] Favrot, S. and Buchholz, J. (2009). Validation of a loudspeaker-based room auralization system using speech intelligibility measures. In Audio Engineering Society Convention Papers, Preprint 7763. Praesens Verlag. 126th Audio Engineering Society Convention (AES126), 7–10 May 2009. (Cited on pages 21 and 99)
[78] Favrot, S., Marschall, M., Käsbach, J., Buchholz, J., and Weller, T. (2011). Mixed-order ambisonics recording and playback for improving horizontal directionality. In Proceedings of the Audio Engineering Society 131st Convention, 20–23 October 2011. (Cited on page 99)
[79] Favrot, S. E., Buchholz, J., and Dau, T. (2010). A loudspeaker-based room auralization system for auditory research. PhD thesis, Technical University of Denmark. (Cited on pages 1, 42, 43, 45, 56, 57, 120 and 127)
[80] Fichna, S., Biberger, T., Seeber, B. U., and Ewert, S. D. (2021). Effect of acoustic scene complexity and visual scene representation on auditory perception in virtual audio-visual environments. 2021 Immersive and 3D Audio: from Architecture to Automotive (I3DA). (Cited on page 42)
[81] Fintor, E., Aspöck, L., Fels, J., and Schlittmeier, S. (2021). The role of spatial separation of two talkers' auditory stimuli in the listener's memory of running speech: listening effort in a non-noisy conversational setting. International Journal of Audiology. (Cited on page 57)
[82] Fitzroy, D. (1959). Reverberation formula which seems to be more accurate with nonuniform distribution of absorption. The Journal of the Acoustical Society of America, 31(7):893–897. (Cited on page 34)
[83] Francis, A. L. and Love, J. (2020). Listening effort: Are we measuring cognition or affect, or both? WIREs Cognitive Science, 11(1):e1514. (Cited on page 98)
[84] Frank, M. (2014). Localization using different amplitude-panning methods in the frontal horizontal plane. In Proceedings of the EAA Joint Symposium on Auralization and Ambisonics 2014. (Cited on pages 45, 49 and 160)
[85] Frank, M. and Zotter, F. (2008). Localization experiments using different 2D ambisonics decoders. In 25th Tonmeistertagung – VDT International Convention, Leipzig. (Cited on pages 45 and 49)
[86] Franklin, W. S. (1903). Derivation of equation of decaying sound in a room and definition of open window equivalent of absorbing power. Phys. Rev. (Series I), 16:372–374. (Cited on page 34)
[87] Fraser, S., Gagné, J. P., Alepins, M., and Dubois, P. (2010). Evaluating the effort expended to understand speech in noise using a dual-task paradigm: The effects of providing visual speech cues. Journal of Speech, Language, and Hearing Research. (Cited on page 50)
[88] Furuya, H., Fujimoto, K., Young Ji, C., and Higa, N. (2001). Arrival direction of late sound and listener envelopment. Applied Acoustics, 62(2):125–136. (Cited on page 37)
[89] Gandemer, L., Parseihian, G., Bourdin, C., and Kronland-Martinet, R. (2018). Perception of surrounding sound source trajectories in the horizontal plane: A comparison of VBAP and basic-decoded HOA. Acta Acustica united with Acustica, pages 338–350. (Cited on pages 40, 57 and 122)
[90] Gelfand, S. (2004). Hearing: An Introduction to Psychological and Physiological Acoustics, Fourth Edition. Taylor & Francis. (Cited on page 76)
[91] Gerzon, M. A. (1985). Ambisonics in multichannel broadcasting and video. Journal of the Audio Engineering Society. (Cited on pages 23 and 58)
[92] Giguere, C. and Woodland, P. C. (1994). A computational model of the auditory periphery for speech and hearing research. I. Ascending path. The Journal of the Acoustical Society of America, 95(1):331–342. (Cited on page 8)
[93] Gil Carvajal, J., Cubick, J., Santurette, S., and Dau, T. (2016). Spatial hearing with incongruent visual or auditory room cues. Scientific Reports, 6. (Cited on pages 16 and 42)
[94] Glasgal, R. (2001). The ambiophone: derivation of a recording methodology optimized for ambiophonic reproduction. Journal of the Audio Engineering Society. (Cited on page 43)
[95] Glasgal, R. and Yates, K. (1995). Ambiophonics: Beyond Surround Sound to Virtual Sonic Reality. Ambiophonics Institute. (Cited on page 43)
[96] Gomes, L., Fonseca, W. D., de Carvalho, D. M. L., and Mareze, P. H. (2020). Rendering binaural signals for moving sources. In Reproduced Sound 2020. (Cited on page 20)
[97] Grimm, G., Ewert, S., and Hohmann, V. (2015a). Evaluation of spatial audio reproduction schemes for application in hearing aid research. Acta Acustica united with Acustica, 101(4):842–854. (Cited on pages 18, 21, 31, 41, 49, 53, 94, 117, 121, 160, 163, 177, 187, 190 and 191)
[98] Grimm, G., Kollmeier, B., and Hohmann, V. (2016a). Spatial acoustic scenarios in multichannel loudspeaker systems for hearing aid evaluation. Journal of the American Academy of Audiology, 27(7):557–566. (Cited on pages 56, 163 and 191)
[99] Grimm, G., Kollmeier, B., and Hohmann, V. (2016b). Spatial acoustic scenarios in multichannel loudspeaker systems for hearing aid evaluation. Journal of the American Academy of Audiology. (Cited on page 191)
[100] Grimm, G., Luberadzka, J., Herzke, T., and Hohmann, V. (2015b). Toolbox for acoustic scene creation and rendering (TASCAR): Render methods and research applications. Proceedings of the Linux Audio Conference. (Cited on page 42)
[101] Grimm, G., Luberadzka, J., and Hohmann, V. (2018). Virtual acoustic environments for comprehensive evaluation of model-based hearing devices. International Journal of Audiology. (Cited on pages 42 and 120)
[102] Grimm, G., Luberadzka, J., and Hohmann, V. (2019). A toolbox for rendering virtual acoustic environments in the context of audiology. Acta Acustica united with Acustica, 105:566–578. (Cited on pages 1, 42 and 57)
[103] Guastavino, C., Katz, B., Polack, J.-D., Levitin, D., and Dubois, D. (2004). Ecological validity of soundscape reproduction. Acta Acustica united with Acustica, 50. (Cited on pages 42 and 191)
[104] Guastavino, C. and Katz, B. F. G. (2004). Perceptual evaluation of multi-dimensional spatial audio reproduction. The Journal of the Acoustical Society of America, 116:1105–1115. (Cited on pages 57, 86 and 122)
[105] Guastavino, C., Larcher, V., Catusseau, G., and Boussard, P. (2007). Spatial audio quality evaluation: comparing transaural, ambisonics and stereo. In Proceedings of the 13th International Conference on Auditory Display, Montréal, Canada. Georgia Institute of Technology. (Cited on pages 86 and 122)
[106] Hacihabiboglu, H., De Sena, E., Cvetkovic, Z., Johnston, J., and Smith III, J. O. (2017). Perceptual spatial audio recording, simulation, and rendering: An overview of spatial-audio techniques based on psychoacoustics. IEEE Signal Processing Magazine, 34(3):36–54. (Cited on pages 20, 22 and 23)
[107] Hamdan, E. C. and Fletcher, M. D. (2022). A compact two-loudspeaker virtual sound reproduction system for clinical testing of spatial hearing with hearing-assistive devices. Frontiers in Neuroscience, 15. (Cited on pages 42, 47 and 49)
[108] Hammershøi, D. and Møller, H. (1992). Fundamentals of binaural technology. In Fundamentals of Binaural Technology. (Cited on pages 12 and 16)
[109] Hammershøi, D. and Møller, H. (2005). Binaural technique: basic methods for recording, synthesis, and reproduction. In Blauert, J., editor, Communication Acoustics, pages 223–254. Springer Berlin Heidelberg, Berlin, Heidelberg. (Cited on page 16)
[110] Harris, P., Nagy, S., and Vardaxis, N. (2018). Mosby's Dictionary of Medicine, Nursing and Health Professions – Revised 3rd ANZ Edition. Elsevier Health Sciences APAC. (Cited on page 7)
[111] Havelock, D. I., Kuwano, S., and Vorländer, M. (2008). Handbook of Signal Processing in Acoustics. Springer, New York. (Cited on page 98)
[112] Hazrati, O. and Loizou, P. C. (2012). The combined effects of reverberation and noise on speech intelligibility by cochlear implant listeners. International Journal of Audiology. (Cited on page 98)
[113] He, J. (2016). Spatial Audio Reproduction with Primary Ambient Extraction. SpringerBriefs in Electrical and Computer Engineering. Springer Singapore. (Cited on page 18)
[114] Hecker, S. (1984). Music for advertising effect. Psychology & Marketing, 1(3-4):3–8. (Cited on page 8)
[115] Hendrickx, E., Stitt, P., Messonnier, J.-C., Lyzwa, J.-M., Katz, B. F., and de Boishéraud, C. (2017). Improvement of externalization by listener and source movement using a "binauralized" microphone array. Journal of the Audio Engineering Society, 65(7/8):589–599. (Cited on page 23)
[116] Hendrikse, M. M. E., Llorach, G., Hohmann, V., and Grimm, G. (2019). Movement and gaze behavior in virtual audiovisual listening environments resembling everyday life. Trends in Hearing, 23. (Cited on pages 1 and 57)
[117] Hiyama, K., Komiyama, S., and Hamasaki, K. (2002). The minimum number of loudspeakers and its arrangement for reproducing the spatial impression of diffuse sound field. Journal of the Audio Engineering Society. (Cited on page 41)
[118] Hohmann, V., Paluch, R., Krueger, M., Meis, M., and Grimm, G. (2020). The virtual reality lab: Realization and application of virtual sound environments. Ear & Hearing, 41:31S–38S. (Cited on pages 1 and 57)
[119] Holman, J. A., Drummond, A., and Naylor, G. (2021). Hearing aids reduce daily-life fatigue and increase social activity: a longitudinal study. medRxiv. (Cited on pages 1 and 56)
[120] Holube, I., Fredelake, S., Vlaming, M., and Kollmeier, B. (2010). Development and analysis of an international speech test signal (ISTS). International Journal of Audiology, 49(12):891–903. (Cited on page 131)
[121] Holube, I., Haeder, K., Imbery, C., and Weber, R. (2016). Subjective listening effort and electrodermal activity in listening situations with reverberation and noise. Trends in Hearing. (Cited on pages 99 and 118)
[122] Hong, J. Y., He, J., Lam, B., Gupta, R., and Gan, W.-S. (2017). Spatial audio for soundscape design: Recording and reproduction. Applied Sciences, 7(6). (Cited on page 18)
[123] Hornsby, B. W. (2013). The effects of hearing aid use on listening effort and mental fatigue associated with sustained speech processing demands. Ear and Hearing, 34(5):523–534. (Cited on page 33)
[124] Howard, D. and Angus, J. (2009). Acoustics and Psychoacoustics, 4th Edition. Focal Press, Oxford. (Cited on page 73)
[125] Huisman, T., Ahrens, A., and MacDonald, E. (2021). Ambisonics sound source localization with varying amount of visual information in virtual reality. Frontiers in Virtual Reality, 2. (Cited on pages 16 and 49)
[126] International Telecommunications Union – Radiocommunication Sector (ITU-R) (2015). Methods for the subjective assessment of small impairments in audio systems. Technical report, International Telecommunications Union, Geneva. (Cited on pages 42, 61, 105 and 132)
[127] ISO (2009). 3382-1: Acoustics – Measurement of room acoustic parameters. Part 1: Performance spaces. ISO 3382-1:2009, ISO. (Cited on pages 33, 34, 35 and 70)
[128] Jäncke, L. (2008). Music, memory and emotion. Journal of Biology, 7(6):1–5. (Cited on page 8)
[129] Jin, C., Corderoy, A., Carlile, S., and van Schaik, A. (2004). Contrasting monaural and interaural spectral cues for human sound localization. The Journal of the Acoustical Society of America, 115(6):3124–3141. (Cited on page 11)
[130] Jot, J.-M., Wardle, S., and Larcher, V. (1998). Approaches to binaural synthesis. Journal of the Audio Engineering Society. (Cited on page 27)
[131] Kang, S. and Kim, S.-H. K. (1996). Realistic audio teleconferencing using binaural and auralization techniques. ETRI Journal, 18:41–51. (Cited on page 23)
[132] Katz, B. F. G. and Noisternig, M. (2014). A comparative study of interaural time delay estimation methods. The Journal of the Acoustical Society of America, 135(6):3530–3540. (Cited on page 70)
[133] Keet, V. (1968). The influence of early lateral reflections on the spatial impression. Proc. 6th Int. Cong. Acoust., Tokyo, 2. (Cited on page 39)
[134] Keidser, G., Naylor, G., Brungart, D. S., Caduff, A., Campos, J., Carlile, S., Carpenter, M. G., Grimm, G., Hohmann, V., Holube, I., Launer, S., Lunner, T., Mehra, R., Rapport, F., Slaney, M., and Smeds, K. (2020). The quest for ecological validity in hearing science: what it is, why it matters, and how to advance it. Ear and Hearing, 41(S1):5S–19S. (Cited on pages 53, 56, 98, 122 and 191)
[135] Kestens, K., Degeest, S., and Keppler, H. (2021). The effect of cognition on the aided benefit in terms of speech understanding and listening effort obtained with digital hearing aids: A systematic review. American Journal of Audiology, 30(1):190–210. (Cited on page 98)
[136] Kirsch, C., Poppitz, J., Wendt, T., van de Par, S., and Ewert, S. D. (2021). Computationally efficient spatial rendering of late reverberation in virtual acoustic environments. 2021 Immersive and 3D Audio: from Architecture to Automotive (I3DA). (Cited on page 42)
[137] Klatte, M., Lachmann, T., Meis, M., et al. (2010). Effects of noise and reverberation on speech perception and listening comprehension of children and adults in a classroom-like setting. Noise and Health, 12(49):270. (Cited on pages 1 and 191)
[138] Kleiner, M., Dalenbäck, B.-I., and Svensson, P. (1993). Auralization: an overview. Journal of the Audio Engineering Society, 41(11):861–875. (Cited on page 19)
[139] Klemenz, M. (2005). Sound synthesis of starting electric railbound vehicles and the influence of consonance on sound quality. Acta Acustica united with Acustica, 91(4):779–788. (Cited on page 20)
[140] Klockgether, S. and van de Par, S. (2016). Just noticeable differences of spatial cues in echoic and anechoic acoustical environments. The Journal of the Acoustical Society of America, 140(4):EL352–EL357. (Cited on pages 93 and 94)
[141] Kobayashi, M., Ueno, K., and Ise, S. (2015). The effects of spatialized sounds on the sense of presence in auditory virtual environments: A psychological and physiological study. Presence: Teleoperators and Virtual Environments, 24(2):163–174. (Cited on page 16)
[142] Koehnke, J. and Besing, J. (1996). A procedure for testing speech intelligibility in a virtual listening environment. Ear and Hearing, 17(3):211–217. (Cited on page 40)
[143] Koelewijn, T., Zekveld, A. A., Festen, J. M., and Kramer, S. E. (2012). Pupil dilation uncovers extra listening effort in the presence of a single-talker masker. Ear and Hearing, 33(2):291–300. (Cited on page 190)
[144] Kramer, S. E., Bhuiyan, T., Bramsløw, L., Fiedler, L., Graversen, C., Hadley, L. V., Innes-Brown, H., Naylor, G., Richter, M., Saunders, G. H., Versfeld, N. J., Wendt, D., Whitmer, W. M., and Zekveld, A. A. (2020). Innovative hearing aid research on ecological conditions and outcome measures: The HEAR-ECO project. (Cited on page 191)
[145] Kramer, S. E., Kapteyn, T. S., Festen, J. M., and Tobi, H. (1996). The relationships between self-reported hearing disability and measures of auditory disability. Audiology, 35(5):277–287. (Cited on page 56)
[146] Krokstad, A., Strom, S., and Sørsdal, S. (1968). Calculating the acoustical room response by the use of a ray tracing technique. Journal of Sound and Vibration, 8(1):118–125. (Cited on page 19)
[147] Krueger, M., Schulte, M., Brand, T., and Holube, I. (2017). Development of an adaptive scaling method for subjective listening effort. The Journal of the Acoustical Society of America. (Cited on page 50)
[148] Kuttruff, H. (2009). Room Acoustics, Fifth Edition. Taylor & Francis. (Cited on pages 19, 34, 68 and 127)
[149] Kwak, C., Han, W., Lee, J., Kim, J., and Kim, S. (2018). Effect of noise and reverberation on speech recognition and listening effort for older adults. Geriatrics and Gerontology International. (Cited on pages 50, 99 and 118)
[150] Laitinen, M.-V. and Pulkki, V. (2009). Binaural reproduction for directional audio coding. In 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 337–340. (Cited on page 41)
[151] Lau, M. K., Hicks, C., Kroll, T., and Zupancic, S. (2019). Effect of auditory task type on physiological and subjective measures of listening effort in individuals with normal hearing. Journal of Speech, Language, and Hearing Research. (Cited on pages 50 and 52)
[152] Lau, S.-T., Pichora-Fuller, M., Li, K., Singh, G., and Campos, J. (2016). Effects of hearing loss on dual-task performance in an audiovisual virtual reality simulation of listening while walking. Journal of the American Academy of Audiology, 27. (Cited on page 57)
[153] Letowski, T. and Letowski, S. (2011). Localization error: accuracy and precision of auditory localization. In Strumillo, P., editor, Advances in Sound Localization, chapter 4, pages 55–78. Intech, Oxford. (Cited on pages 9, 10 and 17)
[154] Levy, S. M. (2012). Section 9 – Calculations to determine the effectiveness and control of thermal and sound transmission. In Levy, S. M., editor, Construction Calculations Manual, pages 503–544. Butterworth-Heinemann, Boston. (Cited on page 60)
[155] Lindau, A. and Brinkmann, F. (2012). Perceptual evaluation of headphone compensation in binaural synthesis based on non-individual recordings. Journal of the Audio Engineering Society, 60(1/2):54–62. (Cited on page 14)
[156] Lindau, A., Kosanke, L., and Weinzierl, S. (2010). Perceptual evaluation of physical predictors of the mixing time in binaural room impulse responses. Journal of the Audio Engineering Society. (Cited on page 126)
[157] Lindemann, W. (1986). Extension of a binaural cross-correlation model by contralateral inhibition. I. Simulation of lateralization for stationary signals. The Journal of the Acoustical Society of America, 80(6):1608–1622. (Cited on page 45)
[158] Liu, Z., Fard, M., and Jazar, R. (2015). Development of an acoustic material database for vehicle interior trims. Technical report, SAE Technical Paper. (Cited on page 21)
[159] Llopis, H. S., Pind, F., and Jeong, C.-H. (2020). Development of an auditory virtual reality system based on pre-computed B-format impulse responses for building design evaluation. Building and Environment, 169:106553. (Cited on page 133)
[160] Llorach, G., Evans, A., Blat, J., Grimm, G., and Hohmann, V. (2016). Web-based live speech-driven lip-sync. In 2016 8th International Conference on Games and Virtual Worlds for Serious Applications (VS-GAMES), pages 1–4. (Cited on pages 1 and 57)
[161] Llorach, G., Grimm, G., Hendrikse, M. M., and Hohmann, V. (2018). Towards realistic immersive audiovisual simulations for hearing research. In Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia, pages 33–40. (Cited on pages 1, 18, 27, 40, 53, 56, 57, 58 and 191)
[162] Llorca-Bofí, J., Dreier, C., Heck, J., and Vorländer, M. (2022). Urban sound auralization and visualization framework: case study at IHTApark. Sustainability, 14(4). (Cited on page 20)
[163] Loizou, P. C. (2007). Speech Enhancement: Theory and Practice. CRC Press. (Cited on page 52)
[164] Lokki, T. and Savioja, L. (2008). Virtual acoustics. In Havelock, D., Kuwano, S., and Vorländer, M., editors, Handbook of Signal Processing in Acoustics, pages 761–771. Springer New York, New York, NY. (Cited on page 20)
[165] Long, M. (2014). Architectural Acoustics. Elsevier Science. (Cited on pages 16, 19 and 42)
[166] Lopez, J. J., Gutierrez, P., Cobos, M., and Aguilera, E. (2014). Sound distance perception comparison between Wave Field Synthesis and Vector Base Amplitude Panning. In ISCCSP 2014 – 6th International Symposium on Communications, Control and Signal Processing, Proceedings. (Cited on pages 21 and 121)
[167] Lovedee-Turner, M. and Murphy, D. (2018). Application of machine learning for the spatial analysis of binaural room impulse responses. Applied Sciences, 8(1). (Cited on page 15)
[168] Lund, K. D., Ahrens, A., and Dau, T. (2020). A method for evaluating audio-visual scene analysis in multi-talker environments. In Proceedings of the International Symposium on Auditory and Audiological Research, volume 7, pages 357–364. The Danavox Jubilee Foundation. International Symposium on Auditory and Audiological Research ISAAR2019. (Cited on page 42)
[169] Lundbeck, M., Grimm, G., Hohmann, V., Laugesen, S., and Neher, T. (2017). Sensitivity to angular and radial source movements as a function of acoustic complexity in normal and impaired hearing. Trends in Hearing, 21. (Cited on pages 33 and 57)
[170] Lyon, R. F. (2017). Human and Machine Hearing: Extracting Meaning from Sound. Cambridge University Press. (Cited on page 10)
[171] Magezi, D. A. (2015). Linear mixed-effects models for within-participant psychology experiments: an introductory tutorial and free, graphical user interface (LMMgui). Frontiers in Psychology, 6:2. (Cited on page 113)
[172] Malham, D. G. and Myatt, A. (1995). 3-D sound spatialization using ambisonic techniques. Computer Music Journal, 19(4):58–70. (Cited on page 29)
[173] Mansour, N., Marschall, M., May, T., Westermann, A., and Dau, T. (2021a). Speech intelligibility in a realistic virtual sound environment. The Journal of the Acoustical Society of America, 149(4):2791–2801. (Cited on page 99)
[174] Mansour, N., Westermann, A., Marschall, M., May, T., Dau, T., and Buchholz, J. (2021b). Guided ecological momentary assessment in real and virtual sound environments. The Journal of the Acoustical Society of America, 150(4):2695–2704. (Cited on pages 16 and 42)
[175] Marentakis, G., Zotter, F., and Frank, M. (2014). Vector-base and ambisonic amplitude panning: A comparison using pop, classical, and contemporary spatial music. Acta Acustica united with Acustica. (Cited on pages 57 and 86)
[176] Marrone, N., Mason, C. R., and Kidd, G. (2008). The effects of hearing loss and age on the benefit of spatial separation between multiple talkers in reverberant rooms. The Journal of the Acoustical Society of America, 124(5):3064–3075. (Cited on pages 16 and 40)
[177] Marschall, M. (2014). Capturing and reproducing realistic acoustic scenes for hearing research. PhD thesis, Technical University of Denmark. (Cited on pages 40, 53, 99 and 120)
[178] Masiero, B. (2012). Individualized Binaural Technology: Measurement, Equalization and Perceptual Evaluation. PhD thesis, RWTH Aachen University. (Cited on page 14)
[179] Masiero, B. and Fels, J. (2011). Perceptually robust headphone equalization for binaural reproduction. In Audio Engineering Society Convention 130. Audio Engineering Society. (Cited on page 22)
[180] Masiero, B. and Vorländer, M. (2011). Spatial audio reproduction methods for virtual reality. In 42º Congreso Español de Acústica – Encuentro Ibérico de Acústica – European Symposium on Environmental Acoustics and on Buildings Acoustically Sustainable, pages 1–12, Cáceres. (Cited on pages 23, 24, 58 and 86)
[181] Matthen, M. (2016). Effort and displeasure in people who are hard of hearing. Ear and Hearing, 37 Suppl 1. (Cited on page 57)
[182] May, T., van de Par, S., and Kohlrausch, A. (2011). A probabilistic model for robust localization based on a binaural auditory front-end. IEEE Transactions on Audio, Speech, and Language Processing, 19(1):1–13. (Cited on pages xix, 6, 138, 150, 156, 166, 167, 174 and 178)
[183] Meesawat, K. and Hammershøi, D. (2003). The time when the reverberation tail in a binaural room impulse response begins. In Audio Engineering Society Convention 115. Audio Engineering Society. (Cited on page 15)
[184] Menase, D. A., Richter, M., Wendt, D., Fiedler, L., and Naylor, G. (2022). Task-induced mental fatigue and motivation influence listening effort as measured by the pupil dilation in a speech-in-noise task. medRxiv. (Cited on page 33)
[185] Vorländer, M. (2008). Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality. Springer. (Cited on page 98)
[186] Miles, K., McMahon, C., Boisvert, I., Ibrahim, R., de Lissa, P., Graham, P., and Lyxell, B. (2017). Objective assessment of listening effort: Coregistration of pupillometry and EEG. Trends in Hearing. (Cited on pages 50, 51 and 118)
[187] Millington, G. (1932). A modified formula for reverberation. The Journal of the Acoustical Society of America, 4(1A):69–82. (Cited on page 34)
[188] Minnaar, P., Favrot, S., and Buchholz, J. (2010). Improving hearing aids through listening tests in a virtual sound environment. Hearing Journal, 63(10):40–44. (Cited on pages 1, 16, 42, 121 and 191)
[189] Møller, H., Sørensen, M. F., Hammershøi, D., and Jensen, C. B. (1995). Head-related transfer functions of human subjects. Journal of the Audio Engineering Society, 43(5):300–321. (Cited on page 12)
[190] Monaghan, J. J., Krumbholz, K., and Seeber, B. U. (2013). Factors affecting the use of envelope interaural time differences in reverberation. The Journal of the Acoustical Society of America, 133(4):2288–2300. (Cited on page 33)
[191] Moore, B. C. J. and Tan, C.-T. (2004). Development and validation of a method for predicting the perceived naturalness of sounds subjected to spectral distortion. Journal of the Audio Engineering Society, 52(9):900–914. (Cited on page 41)
[192] Moore, T. M. and Picou, E. M. (2018). A potential bias in subjective ratings of mental effort. Journal of Speech, Language, and Hearing Research. (Cited on pages 50, 51 and 118)
[193] Mueller, M. F., Kegel, A., Schimmel, S. M., Dillier, N., and Hofbauer, M. (2012). Localization of virtual sound sources with bilateral hearing aids in realistic acoustical scenes. The Journal of the Acoustical Society of America, 131(6):4732–4742. (Cited on page 16)
[194] Müller, S. and Massarani, P. (2001). Transfer-function measurement with sweeps. Journal of the Audio Engineering Society, 49:443–471. (Cited on pages 63 and 140)
[195] Murta, B. (2019). Plataforma para ensaios de percepção sonora com fontes distribuídas aplicável a dispositivos auditivos: perSONA (in Portuguese). PhD thesis, Federal University of Santa Catarina. (Cited on pages 1 and 57)
[196] Murta, B., Chiea, R., Mourão, G., Pinheiro, M. M., Cordioli, J., Paul, S., and Costa, M. (2019). CCi-MOBILE: Development of software-based tools for speech perception assessment and training with the hearing-impaired Brazilian population. In Conference on Implantable Auditory Prostheses (CIAP), Lake Tahoe, California, US. (Cited on page 18)
[197] Møller, H. (1992). Fundamentals of binaural technology. Applied Acoustics, 36(3-4):171–218. (Cited on pages 15 and 73)
[198] Nachbar, C., Zotter, F., Deleflie, E., and Sontacchi, A. (2011). AmbiX – a suggested ambisonics format. (Cited on page 103)
[199] Narbutt, M., Allen, A., Skoglund, J., Chinen, M., and Hines, A. (2018). AMBIQUAL – a full reference objective quality metric for ambisonic spatial audio. In 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. (Cited on page 27)
[200] Naugolnykh, K. A., Ostrovsky, L. A., Sapozhnikov, O. A., and Hamilton, M. F. (2000). Nonlinear wave processes in acoustics. (Cited on page 9)
[201] Naylor, G. M. (1993). Odeon – another hybrid room acoustical model. Applied Acoustics, 38(2-4):131–143. (Cited on page 68)
[202] Neuhoff, J. (2021). Ecological Psychoacoustics. Brill. (Cited on page 190)
[203] Neuman, A. C., Wroblewski, M., Hajicek, J., and Rubinstein, A. (2010). Combined effects of noise and reverberation on speech recognition performance of normal-hearing children and adults. Ear and Hearing. (Cited on pages 99 and 118)
[204] Nicola, P. and Chiara, V. (2019). Impact of background noise fluctuation and reverberation on response time in a speech reception task. Journal of Speech, Language, and Hearing Research, 62(11):4179–4195. (Cited on pages 50, 99 and 118)
[205] Nielsen, J. and Dau, T. (2011). The Danish hearing in noise test. International Journal of Audiology, 50:202–208. (Cited on pages 101 and 102)
[206] Nocke, C. and Mellert, V. (2002). Brief review on in situ measurement techniques of impedance or absorption. In Forum Acusticum, Sevilla. (Cited on page 21)
[207] Novo, P. (2005). Auditory virtual environments. In Blauert, J., editor, Communication Acoustics, pages 277–297. Springer Berlin Heidelberg, Berlin, Heidelberg. (Cited on page 57)
[208] Obleser, J., Wöstmann, M., Hellbernd, N., Wilsch, A., and Maess, B. (2012). Adverse listening conditions and memory load drive a common alpha oscillatory network. Journal of Neuroscience, 32(36):12376–12383. (Cited on page 111)
[209] Ohlenforst, B., Wendt, D., Kramer, S. E., Naylor, G., Zekveld, A. A., and Lunner, T. (2018). Impact of SNR, masker type and noise reduction processing on sentence recognition performance and listening effort as indicated by the pupil dilation response. Hearing Research. (Cited on pages 50 and 101)
[210] Ohlenforst, B., Zekveld, A. A., Jansma, E. P., Wang, Y., Naylor, G., Lorens, A., Lunner, T., and Kramer, S. E. (2017a). Effects of hearing impairment and hearing aid amplification on listening effort: A systematic review. Ear and Hearing, 38(3):267–281. (Cited on page 98)
[211] Ohlenforst, B., Zekveld, A. A., Lunner, T., Wendt, D., Naylor, G., Wang, Y., Versfeld, N. J., and Kramer, S. E. (2017b). Impact of stimulus-related factors and hearing impairment on listening effort as indicated by pupil dilation. Hearing Research, 351:68–79. (Cited on page 50)
[212] Oreinos, C. and Buchholz, J. (2014). Validation of realistic acoustic environments for listening tests using directional hearing aids. In 2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC), pages 188–192. (Cited on pages 41 and 120)
[213] Oreinos, C. and Buchholz, J. M. (2015). Objective analysis of ambisonics for hearing aid applications: Effect of listener's head, room reverberation, and directional microphones. The Journal of the Acoustical Society of America. (Cited on pages 18, 41, 53, 163 and 191)
[214] Palacino, J., Nicol, R., Emerit, M., and Gros, L. (2012). Perceptual assessment of binaural decoding of first-order ambisonics. In Acoustics 2012. (Cited on page 21)
[215] Parsehian, G., Gandemer, L., Bourdin, C., and Kronland-Martinet, R. (2015). Design and perceptual evaluation of a fully immersive three-dimensional sound spatialization system. In 3rd International Conference on Spatial Audio (ICSA 2015), Graz, Austria. (Cited on page 42)
[216] Paul, S. (2014). A fisiologia da audição como base para fenômenos auditivos (in Portuguese). In Proceedings of the 12th AES Brazil Conference, São Paulo, SP, 13–15 May 2014. (Cited on page 9)
[217] Pausch, F., Aspöck, L., Vorländer, M., and Fels, J. (2018). An extended binaural real-time auralization system with an interface to research hearing aids for experiments on subjects with hearing loss. Trends in Hearing. (Cited on pages 16, 44, 45, 120 and 121)
[218] Pausch, F., Behler, G., and Fels, J. (2020). Scalar - a surrounding spherical cap loudspeaker array for flexible generation and evaluation of virtual acoustic environments. Acta Acust., 4(5):19. (Cited on pages 1 and 57)

[219] Pausch, F. and Fels, J. (2019). Mobilab - a mobile laboratory for on-site listening experiments in virtual acoustic environments. bioRxiv. (Cited on pages 1 and 57)

[220] Pausch, F. and Fels, J. (2020). Localization performance in a binaural real-time auralization system extended to research hearing aids. Trends in Hearing, 24:1–18. (Cited on pages 1, 42 and 57)

[221] Pelzer, S., Masiero, B., and Vorländer, M. (2014). 3D Reproduction of Room Auralizations by Combining Intensity Panning, Crosstalk Cancellation and Ambisonics. Proceedings of the EAA Joint Symposium on Auralization and Ambisonics. (Cited on pages 44, 45, 86 and 127)

[222] Peng, Z. E. and Litovsky, R. Y. (2021). The role of interaural differences, head shadow, and binaural redundancy in binaural intelligibility benefits among school-aged children. Trends in Hearing, 25. (Cited on page 77)

[223] Petersen, E. B., Wöstmann, M., Obleser, J., Stenfelt, S., and Lunner, T. (2015). Hearing loss impacts neural alpha oscillations under adverse listening conditions. Frontiers in Psychology. (Cited on page 50)

[224] Pichora-Fuller, M. K., Kramer, S. E., Eckert, M. A., Edwards, B., Hornsby, B. W., Humes, L. E., Lemke, U., Lunner, T., Matthen, M., Mackersie, C. L., Naylor, G., Phillips, N. A., Richter, M., Rudner, M., Sommers, M. S., Tremblay, K. L., and Wingfield, A. (2016). Hearing impairment and cognitive energy: The framework for understanding effortful listening (FUEL). In Ear and Hearing. (Cited on pages 50, 51, 52, 57 and 118)

[225] Picou, E. M., Gordon, J., and Ricketts, T. A. (2016). The effects of noise and reverberation on listening effort in adults with normal hearing. Ear and Hearing. (Cited on pages 50 and 99)

[226] Picou, E. M., Moore, T. M., and Ricketts, T. A. (2017). The effects of directional processing on objective and subjective listening effort. Journal of Speech, Language, and Hearing Research. (Cited on pages 1 and 51)

[227] Picou, E. M., Ricketts, T., and Hornsby, B. (2013). How hearing aids, background noise, and visual cues influence objective listening effort. Ear and Hearing, 34:e52–e64. (Cited on pages 50 and 56)

[228]
Picou, E. M. and Ricketts, T. A. (2014). Increasing motivation changes subjective reports of listening effort and choice of coping strategy. International Journal of Audiology, 53(6):418–426. (Cited on page 50)

[229] Picou, E. M. and Ricketts, T. A. (2018). The relationship between speech recognition, behavioural listening effort, and subjective ratings. International Journal of Audiology. (Cited on pages 51 and 118)

[230] Pielage, H., Zekveld, A. A., Saunders, G. H., Versfeld, N. J., Lunner, T., and Kramer, S. E. (2021). The Presence of Another Individual Influences Listening Effort, But Not Performance. Ear & Hearing. (Cited on pages 40, 57, 82 and 190)

[231] Pieren, R. (2018). Auralization of Environmental Acoustical Sceneries: Synthesis of Road Traffic, Railway and Wind Turbine Noise. PhD thesis, Delft University of Technology. (Cited on page 20)

[232] Pieren, R., Heutschi, K., Wunderli, J. M., Snellen, M., and Simons, D. G. (2017). Auralization of railway noise: Emission synthesis of rolling and impact noise. Applied Acoustics, 127:34–45. (Cited on page 20)

[233] Pinheiro, J. C. and Bates, D. M. (2000). Linear mixed-effects models: basic concepts and examples. Mixed-effects Models in S and S-Plus, pages 3–56. (Cited on page 113)

[234] Plain, B., Pielage, H., Richter, M., Bhuiyan, T., Lunner, T., Kramer, S., and Zekveld, A. (2021). Social observation increases the cardiovascular response of hearing-impaired listeners during a speech reception task. Hearing Research, page 108334. (Cited on pages 57 and 190)

[235] Plinge, A., Schlecht, S. J., Thiergart, O., Robotham, T., Rummukainen, O., and Habets, E. A. P. (2018). Six-degrees-of-freedom binaural audio reproduction of first-order ambisonics with distance information. Journal of the Audio Engineering Society. (Cited on page 27)

[236] Poletti, M. A. (2005). Three-dimensional surround sound systems based on spherical harmonics. Journal of the Audio Engineering Society, 53(11):1004–1025. (Cited on page 31)

[237] Politis, A. (2016). Microphone Array Processing for Parametric Spatial Audio Techniques. Doctoral thesis, School of Electrical Engineering. (Cited on pages 130, 131 and 181)

[238] Politis, A., McCormack, L., and Pulkki, V. (2017). Enhancement of ambisonic binaural reproduction using directional audio coding with optimal adaptive mixing. In 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 379–383. (Cited on page 27)

[239] Pollow, M.
(2015). Directivity Patterns for Room Acoustical Measurements and Simulations. Aachener Beiträge zur Technischen Akustik. Logos Verlag Berlin GmbH. (Cited on pages 28 and 29)

[240] Portela, M. S. (2008). Caracterização de fontes sonoras e aplicação na auralização de ambientes. Master's thesis, Universidade Federal de Santa Catarina. (Cited on page 13)

[241] Pulkki, V. (1997). Virtual sound source positioning using vector base amplitude panning. Journal of the Audio Engineering Society, 45(6). (Cited on pages 23, 24, 26, 58, 93, 142, 179 and 190)

[242] Pulkki, V. and Karjalainen, M. (2015). Communication Acoustics: An Introduction to Speech, Audio and Psychoacoustics. Wiley. (Cited on pages 10, 17, 70, 73 and 143)

[243] Pulkki, V., Politis, A., Laitinen, M.-V., Vilkamo, J., and Ahonen, J. (2017). First-order directional audio coding (DirAC). In Parametric Time-Frequency Domain Spatial Audio, chapter 5, pages 89–140. John Wiley & Sons, Ltd. (Cited on pages 44, 45 and 127)

[244] Purdy, M. (1991). Listening and community: The role of listening in community formation. International Journal of Listening, 5(1):51–67. (Cited on page 8)

[245] Queiroz, M., Iazzetta, F., Kon, F., Gomes, M. H. A., Figueiredo, F. L., Masiero, B. S., Dias, L., Torres, M. H. C., and Thomaz, L. F. (2008). AcMus: an open, integrated platform for room acoustics research. J. Braz. Comput. Soc., 14(3):87–103. (Cited on page 36)

[246] Rayleigh, L. (1907). XII. On our perception of sound direction. (Cited on page 10)

[247] Reichardt, W., Alim, O. A., and Schmidt, W. (1975). Definition and basis of making an objective evaluation to distinguish between useful and useless clarity defining musical performances. Acta Acustica united with Acustica, 32(3):126–137. (Cited on page 36)

[248] Rennies, J., Brand, T., and Kollmeier, B. (2011). Prediction of the influence of reverberation on binaural speech intelligibility in noise and in quiet. The Journal of the Acoustical Society of America, 130(5):2999–3012. (Cited on page 40)

[249] Rennies, J., Schepker, H., Holube, I., and Kollmeier, B. (2014). Listening effort and speech intelligibility in listening situations affected by noise and reverberation. The Journal of the Acoustical Society of America. (Cited on page 50)

[250] Roffler, S. K. and
Butler, R. A. (1968). Factors that influence the localization of sound in the vertical plane. The Journal of the Acoustical Society of America, 43(6):1255–1259. (Cited on pages 10 and 33)

[251] Roginska, A. (2017). Binaural audio through headphones. In Immersive Sound, pages 88–123. Routledge. (Cited on pages 40 and 53)

[252] Romanov, M., Berghold, P., Frank, M., Rudrich, D., Zaunschirm, M., and Zotter, F. (2017). Implementation and evaluation of a low-cost head-tracker for binaural synthesis. Journal of the Audio Engineering Society. (Cited on page 23)

[253] Rose, J., Nelson, P., Rafaely, B., and Takeuchi, T. (2002). Sweet spot size of virtual acoustic imaging systems at asymmetric listener locations. The Journal of the Acoustical Society of America, 112(5):1992–2002. (Cited on pages 31 and 121)

[254] Rossing, T. D. (2007). Springer Handbook of Acoustics. Springer-Verlag Berlin Heidelberg, Stanford, CA, 2nd edition. (Cited on pages 16, 19, 33, 36, 38 and 98)

[255] Rudenko, O. and Soluian, S. (1975). The theoretical principles of nonlinear acoustics. Moscow Izdatel Nauka. (Cited on page 9)

[256] Rumsey, F. (2013). Spatial Audio. Focal Press, Burlington, MA, 2nd edition. (Cited on pages 30 and 98)

[257] Ruotolo, F., Maffei, L., Di Gabriele, M., Iachini, T., Masullo, M., Ruggiero, G., and Senese, V. P. (2013). Immersive virtual reality and environmental noise assessment: An innovative audio-visual approach. Environmental Impact Assessment Review, 41:10–20. (Cited on page 16)

[258] Sabine, W. (1922). Collected Papers on Acoustics. Harvard University Press. (Cited on page 34)

[259] Savioja, L., Huopaniemi, J., Lokki, T., and Väänänen, R. (1999). Creating interactive virtual acoustic environments. Journal of the Audio Engineering Society, 47:675–705. (Cited on pages 1 and 57)

[260] Schepker, H., Haeder, K., Rennies, J., and Holube, I. (2016). Perceived listening effort and speech intelligibility in reverberation and noise for hearing-impaired listeners. International Journal of Audiology. (Cited on pages 1 and 50)

[261] Schröder, D. (2011). Physically Based Real-Time Auralization of Interactive Virtual Environments. Aachener Beiträge zur Technischen Akustik. Logos Verlag Berlin. (Cited on page 28)

[262]
Schroeder, M. and Atal, B. (1963). Computer simulation of sound transmission in rooms. Proceedings of the IEEE, 51(3):536–537. (Cited on page 22)

[263] Schroeder, M., Atal, B., and Bird, C. (1962). Digital computers in room acoustics. Proc. 4th ICA, Copenhagen, M, 21. (Cited on page 19)

[264] Schroeder, M. R. (1965). New method of measuring reverberation time. The Journal of the Acoustical Society of America, 37(3):409–412. (Cited on page 144)

[265] Schroeder, M. R. (1979). Integrated-impulse method measuring sound decay without using impulses. The Journal of the Acoustical Society of America, 66(2):497–500. (Cited on page 144)

[266] Schröder, D., Pohl, A., Drechsler, S., Svensson, U. P., Vorländer, M., and Stephenson, U. M. (2013). OpenMat - management of acoustic material (meta-)properties using an open source database format. In Proceedings of the AIA-DAGA 2013. (Cited on page 21)

[267] Schröder, D., Wefers, F., Pelzer, S., Rausch, D., Vorlaender, M., and Kuhlen, T. (2010). Virtual reality system at RWTH Aachen University. In Proceedings of the International Symposium on Room Acoustics (ISRA). (Cited on page 56)

[268] Seeber, B. U., Baumann, U., and Fastl, H. (2004). Localization ability with bimodal hearing aids and bilateral cochlear implants. The Journal of the Acoustical Society of America, 116(3):1698–1709. (Cited on page 40)

[269] Seeber, B. U., Kerber, S., and Hafter, E. R. (2010). A system to simulate and reproduce audio-visual environments for spatial hearing research. Hearing Research, 260(1):1–10. (Cited on page 56)

[270] Seikel, J., King, D., and Drumright, D. (2015). Anatomy & Physiology for Speech, Language, and Hearing. Cengage Learning. (Cited on page 14)

[271] Sette, W. J. (1933). A new reverberation time formula. The Journal of the Acoustical Society of America, 4(3):193–210. (Cited on page 34)

[272] Shavit-Cohen, K. and Zion Golumbic, E. (2019). The dynamics of attention shifts among concurrent speech in a naturalistic multi-speaker virtual environment. Frontiers in Human Neuroscience, 13:386. (Cited on pages 1 and 57)

[273] Shojaei, E., Ashayeri, H., Jafari, Z., Dast, M., and Kamali, K. (2016). Effect of signal to noise ratio on the speech perception ability of older adults. Medical Journal of the Islamic Republic of Iran, 30:342. (Cited on pages 1 and 55)

[274] Silzle, A., Kosmidis, D., Felix Greco, G.,
Beer, D., and Betz, L. (2016). The influence of microphone directivity on the level calibration and equalization of 3D loudspeaker setups. In 29th Tonmeistertagung - VDT International Convention 2016. (Cited on page 21)

[275] Simon, L. S. R., Dillier, N., and Wüthrich, H. (2021). Comparison of 3D audio reproduction methods using hearing devices. Journal of the Audio Engineering Society, 68(12):899–909. (Cited on pages 21, 46, 93, 94 and 121)

[276] Simon, L. S. R., Wuethrich, H., and Dillier, N. (2017). Comparison of higher-order ambisonics, vector- and distance-based amplitude panning using a hearing device beamformer. In Proceedings of the 4th International Conference on Spatial Audio, Graz, Austria. (Cited on pages 20, 21, 23, 117, 121, 163 and 191)

[277] Simón Gálvez, M., Menzies, D., Fazi, F., de Campos, T., and Hilton, A. (2015). Listener tracking stereo for object based audio reproduction. In Tecniacustica 2016 (Valencia) - European Symposium in Virtual Acoustics and Ambisonics. (Cited on page 27)

[278] Skudrzyk, E. (1971). The Foundations of Acoustics: Basic Mathematics and Basic Acoustics. Springer-Verlag. (Cited on page 229)

[279] Solvang, A. (2008). Spectral impairment of two-dimensional higher order ambisonics. J. Audio Eng. Soc., 56(4):267–279. (Cited on page 94)

[280] Spandöck, F. (1934). Akustische Modellversuche. Annalen der Physik, 412(4):345–360. (Cited on page 19)

[281] Spors, S., Teutsch, H., Kuntz, A., and Rabenstein, R. (2004). Sound field synthesis. In Huang, Y. and Benesty, J., editors, Audio Signal Processing for Next-Generation Multimedia Communication Systems, pages 323–344. Springer US, Boston, MA. (Cited on page 31)

[282] Spors, S., Wierstorf, H., Raake, A., Melchior, F., Frank, M., and Zotter, F. (2013). Spatial sound with loudspeakers and its perception: A review of the current state. (Cited on pages 21, 23, 27, 40, 42 and 53)

[283] Stitt, P., Bertet, S., and Van Walstijn, M. (2013). Perceptual investigation of image placement with ambisonics for non-centred listeners. In Proc. of the 16th Int. Conference on Digital Audio Effects (DAFx-13), Maynooth, Ireland. (Cited on pages 21, 46 and 49)

[284] Strauss, H. (1998). Implementing Doppler shifts for virtual auditory environments. Journal of the Audio Engineering Society. (Cited on page 20)
[285] Strumillo, P. (2011). Advances in Sound Localization. InTech. (Cited on pages 9 and 10)

[286] Sudarsono, A. S., Lam, Y. W., and Davies, W. J. (2016). The effect of sound level on perception of reproduced soundscapes. Applied Acoustics, 110:53–60. (Cited on page 42)

[287] Søndergaard, P. and Majdak, P. (2013). The auditory modeling toolbox. In Blauert, J., editor, The Technology of Binaural Listening, pages 33–56. Springer, Berlin, Heidelberg. (Cited on page 138)

[288] Tenenbaum, R. A., Camilo, T. S., Torres, J. C. B., and Gerges, S. N. (2007). Hybrid method for numerical simulation of room acoustics with auralization: part 1 - theoretical and numerical aspects. Journal of the Brazilian Society of Mechanical Sciences and Engineering, 29:211–221. (Cited on page 68)

[289] Tremblay, P., Brisson, V., and Deschamps, I. (2020). Brain aging and speech perception: Effects of background noise and talker variability. NeuroImage, 227:117675. (Cited on pages 1 and 55)

[290] Treviño, J., Okamoto, T., Iwaya, Y., and Suzuki, Y. (2011). Evaluation of a new ambisonic decoder for irregular loudspeaker arrays using interaural cues. In Ambisonics Symposium. (Cited on page 94)

[291] Tu, W., Hu, R., Wang, H., and Chen, W. (2010). Measurement and analysis of just noticeable difference of interaural level difference cue. 2010 International Conference on Multimedia Technology, pages 1–3. (Cited on page 148)

[292] Van Wanrooij, M. M. and Van Opstal, A. J. (2004). Contribution of head shadow and pinna cues to chronic monaural sound localization. Journal of Neuroscience, 24(17):4163–4171. (Cited on page 11)

[293] Vorländer, M. (2007). Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality. RWTHedition. Springer Berlin Heidelberg. (Cited on pages 2, 14, 15, 18, 19, 20, 21, 22, 33, 40, 42, 53 and 121)

[294] Vorländer, M. (2008). Virtual Acoustics: Opportunities and limits of spatial sound reproduction for audiology. Haus des Hörens, Oldenburg. (Cited on page 56)

[295] Vorländer, M. (2014). Virtual acoustics. Archives of Acoustics, 39(3):307–318. (Cited on page 40)

[296] Wallach, H. (1938). On sound localization. The Journal of the Acoustical Society of America, 10(1):83–83. (Cited on page 10)

[297] Wang, D. and Brown, G. J. (2006). Binaural sound localization. In Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, pages 147–185. Wiley. (Cited on page 146)

[298] Wanner, L., Blat, J., Dasiopoulou,
S., Domínguez, M., Llorach, G., Mille, S., Sukno, F., Kamateri, E., Vrochidis, S., Kompatsiaris, I., André, E., Lingenfelser, F., Mehlmann, G., Stam, A., Stellingwerff, L., Vieru, B., Lamel, L., Minker, W., Pragst, L., and Ultes, S. (2016). Towards a multimedia knowledge-based agent with social competence and human interaction capabilities. In Proceedings of the 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction, MARMI '16, pages 21–26, New York, NY, USA. Association for Computing Machinery. (Cited on pages 1 and 57)

[299] Ward, D. B. and Abhayapala, T. D. (2001). Reproduction of a plane-wave sound field using an array of loudspeakers. IEEE Transactions on Speech and Audio Processing, 9(6):697–707. (Cited on pages 31, 77 and 86)

[300] Wendt, D., Dau, T., and Hjortkjær, J. (2016). Impact of background noise and sentence complexity on processing demands during sentence comprehension. Frontiers in Psychology. (Cited on page 51)

[301] Wendt, D., Hietkamp, R. K., and Lunner, T. (2017). Impact of noise and noise reduction on processing effort: A pupillometry study. Ear and Hearing. (Cited on pages 50 and 101)

[302] Wendt, D., Koelewijn, T., Książek, P., Kramer, S. E., and Lunner, T. (2018). Toward a more comprehensive understanding of the impact of masker type and signal-to-noise ratio on the pupillary response while performing a speech-in-noise test. Hearing Research, pages 1–12. (Cited on pages 50, 101 and 102)

[303] Westermann, A. and Buchholz, J. M. (2017). The effect of nearby maskers on speech intelligibility in reverberant, multi-talker environments. The Journal of the Acoustical Society of America, 141(3):2214–2223. (Cited on pages 42 and 99)

[304] Whitmer, W. M. and Akeroyd, M. A. (2013). The sensitivity of hearing-impaired adults to acoustic attributes in simulated rooms. Proceedings of Meetings on Acoustics, 19(1):015109. (Cited on pages 1, 18 and 50)

[305] Whitmer, W. M., Seeber, B. U., and Akeroyd, M. A. (2012). Apparent auditory source width insensitivity in older hearing-impaired individuals. The Journal of the Acoustical Society of America, 132(1):369–379. (Cited on pages 16, 18 and 40)

[306] Wightman, F. L. and Kistler, D. J. (1992). The dominant role of low-frequency interaural time differences in sound localization. The Journal of the Acoustical Society of America, 91(3):1648–1661. (Cited on page 10)

[307] Wightman, F. L. and Kistler, D. J. (1997). Monaural sound
localization revisited. The Journal of the Acoustical Society of America, 101(2):1050–1063. (Cited on page 11)

[308] Wilcox, R. (2004). Inferences based on a skipped correlation coefficient. Journal of Applied Statistics, 31(2):131–143. (Cited on page 116)

[309] Williams, G. (1999). Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic Press. (Cited on pages 28 and 229)

[310] Wisniewski, M. G., Thompson, E. R., and Iyer, N. (2017). Theta- and alpha-power enhancements in the electroencephalogram as an auditory delayed match-to-sample task becomes impossibly difficult. Psychophysiology, 54(12):1916–1928. (Cited on page 111)

[311] Wisniewski, M. G., Thompson, E. R., Iyer, N., Estepp, J. R., Goder-Reiser, M. N., and Sullivan, S. C. (2015). Frontal midline θ power as an index of listening effort. Neuroreport, 26(2):94–99. (Cited on page 111)

[312] Wong, G. S. K. (1986). Speed of sound in standard air. The Journal of the Acoustical Society of America, 79(5):1359–1366. (Cited on page 15)

[313] Wöstmann, M., Lim, S.-J., and Obleser, J. (2017). The Human Neural Alpha Response to Speech is a Proxy of Attentional Control. Cerebral Cortex, 27(6):3307–3317. (Cited on page 111)

[314] Xie, B. (2013). Head-Related Transfer Function and Virtual Auditory Display. J. Ross Publishing. (Cited on pages 22 and 70)

[315] Yost, W. (2013). Fundamentals of Hearing: An Introduction. Brill. (Cited on page 8)

[316] Zapata Rodriguez, V., Jeong, C.-H., Hoffmann, I., Cho, W.-H., Beldam, M.-B., and Harte, J. (2019). Acoustic conditions of clinic rooms for sound field audiometry. In Proceedings of the 23rd International Congress on Acoustics, pages 4654–4659. Deutsche Gesellschaft für Akustik. (Cited on pages 122 and 139)

[317] Zekveld, A., Kramer, S., and Festen, J. (2011). Cognitive load during speech perception in noise: The influence of age, hearing loss, and cognition on the pupil response. Ear and Hearing, 32:498–510. (Cited on pages 1 and 55)

[318] Zekveld, A. A. and Kramer, S. E. (2014). Cognitive
processing load across a wide range of listening conditions: Insights from pupillometry. Psychophysiology. (Cited on pages 50 and 112)

[319] Zekveld, A. A., Kramer, S. E., and Festen, J. M. (2010). Pupil response as an indication of effortful listening: The influence of sentence intelligibility. Ear and Hearing. (Cited on pages 50 and 118)

[320] Zhang, W., Samarasinghe, P., Chen, H., and Abhayapala, T. (2017). Surround by Sound: A Review of Spatial Audio Recording and Reproduction. Applied Sciences, 7(5):532. (Cited on pages 19, 20, 21, 27 and 40)

[321] Zobel, B. H., Wagner, A., Sanders, L. D., and Başkent, D. (2019). Spatial release from informational masking declines with age: Evidence from a detection task in a virtual separation paradigm. The Journal of the Acoustical Society of America, 146(1):548–566. (Cited on pages 16 and 40)

[322] Hládek, Ľ., Ewert, S. D., and Seeber, B. U. (2021). Communication conditions in virtual acoustic scenes in an underground station. (Cited on page 42)

[323] Şaher, K., Rindel, J. H., Nijs, L., and Van Der Voorden, M. (2005). Impacts of reverberation time, absorption location and background noise on listening conditions in multi source environment. In Forum Acusticum Budapest 2005: 4th European Congress on Acoustics. (Cited on page 50)

Appendix A. ITDs Ambisonics

Figure A.1 depicts ITDs for measurements with a listener (HATS manikin) in the center with Ambisonics (black line), in nine off-center position combinations accompanied by a second listener (KEMAR), and alone in those three off-center positions.

Figure A.1: ITD as a function of source angle in the Ambisonics virtualized setup. Top left: HATS displacement = 25 cm; top right: HATS displacement = 50 cm; bottom left: HATS displacement = 75 cm; bottom right: HATS displacement matching KEMAR displacement.

Appendix B. Delta ILD Ambisonics
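The ΔILD figures below, like the ITDs of Appendix A, are derived from pairs of ear signals recorded at the manikins. As an illustration of the basic estimators only, not the measurement pipeline used in this work, a broadband ITD can be read off the lag of the interaural cross-correlation and a broadband ILD off the RMS level difference; the function and signal names here are hypothetical:

```python
import numpy as np

def itd_ild(left, right, fs):
    """Estimate broadband ITD (seconds) and ILD (dB) from ear signals.

    ITD: lag of the interaural cross-correlation peak
         (negative lag = the left channel leads).
    ILD: RMS level difference, left relative to right.
    """
    n = len(left)
    xcorr = np.correlate(left, right, mode="full")   # lags -(n-1) .. (n-1)
    lag = int(np.argmax(np.abs(xcorr))) - (n - 1)
    itd = lag / fs
    rms = lambda x: np.sqrt(np.mean(np.square(x)))
    ild = 20 * np.log10(rms(left) / rms(right))
    return itd, ild

# Toy check: right ear delayed by 5 samples and attenuated by 6 dB.
fs = 48000
sig = np.random.default_rng(0).standard_normal(1024)
left = sig
right = np.roll(sig, 5) * 10 ** (-6 / 20)
itd, ild = itd_ild(left, right, fs)
print(round(itd * fs), round(ild, 1))  # -> -5 6.0
```

In practice the ILDs in Figures B.1 to B.3 are frequency dependent, so the level difference is evaluated per band rather than broadband as in this sketch.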
Figures B.1, B.2, and B.3 present the differences in ILD between center and off-center listener positions, utilizing 24 loudspeakers to render Ambisonics with a second listener present inside the loudspeaker ring. In the figures, the number following H indicates the position of the main listener, while the numbers after K indicate the position of the second listener.

Figure B.1: Differences in the ILD between the centered setup and off-center setups: HATS at 25 cm to the right with: KEMAR at 25 cm to the left (top); KEMAR at 50 cm to the left (middle); KEMAR at 75 cm to the left (bottom).

Figure B.2: Differences in the ILD between the centered setup and off-center setups: HATS at 50 cm to the right with: KEMAR at 25 cm to the left (top); KEMAR at 50 cm to the left (middle); KEMAR at 75 cm to the left (bottom).

Figure B.3: Differences in the ILD between the centered setup and off-center setups: HATS at 75 cm to the right with: KEMAR at 25 cm to the left (top); KEMAR at 50 cm to the left
(middle); KEMAR at 75 cm to the left (bottom).

Appendix C. Wave Equation and Spherical Harmonic Representation

Spherical harmonics (SH) represent the spatial variations of an orthogonal set of solutions of the Laplace equation (an orthonormal basis) when the solution is expressed in spherical coordinates, thus giving a representation of a space- and frequency-dependent signal as a weighted sum over spherical basis functions.

C.1 Wave Equation in Spherical Coordinates

Expressing the wave equation in spherical coordinates (r, θ, φ) [36], we have

\[
\frac{\partial^2 p}{\partial r^2} + \frac{2}{r}\frac{\partial p}{\partial r} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial p}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 p}{\partial\phi^2} - \frac{1}{c_0^2}\frac{\partial^2 p}{\partial t^2} = 0. \quad (C.1)
\]

C.2 Separation of the Variables

The differential-equation solution tool called separation of variables can be applied to Equation C.1, the solution being formulated as the product of three space-dependent factors and a time-dependent factor:

\[
p(r, \theta, \phi, t) = R(r)\,\Theta(\theta)\,\Phi(\phi)\,T(t). \quad (C.2)
\]
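Equation C.1 can be sanity-checked symbolically: an outgoing spherical wave e^{j(ωt - kr)}/r has no angular dependence, so only the radial and time terms of C.1 survive, and it must satisfy the equation whenever ω = c₀k. A minimal sketch with SymPy (symbol names are illustrative):

```python
import sympy as sp

r, t = sp.symbols('r t', positive=True)
c, k, w = sp.symbols('c k omega', positive=True)

# Outgoing spherical wave e^{j(wt - kr)}/r: no angular dependence,
# so only the radial and time terms of Eq. (C.1) remain.
p = sp.exp(sp.I * (w * t - k * r)) / r

residual = (sp.diff(p, r, 2) + (2 / r) * sp.diff(p, r)
            - sp.diff(p, t, 2) / c**2)

# The residual vanishes on the dispersion relation w = c k.
print(sp.simplify(residual.subs(w, c * k)))  # -> 0
```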
With the separation of the variables, according to Skudrzyk [278], there are four homogeneous differential equations:

\[
\frac{d^2\Phi}{d\phi^2} + m^2\,\Phi = 0, \quad (C.3a)
\]
\[
\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\,\frac{d\Theta}{d\theta}\right) + \left(n(n+1) - \frac{m^2}{\sin^2\theta}\right)\Theta = 0, \quad (C.3b)
\]
\[
\frac{1}{r^2}\frac{d}{dr}\left(r^2\,\frac{dR}{dr}\right) + \left(k^2 - \frac{n(n+1)}{r^2}\right)R = 0, \quad (C.3c)
\]
\[
\frac{1}{c^2}\frac{d^2T}{dt^2} + k^2\,T = 0. \quad (C.3d)
\]

With m and n integers, the general solutions to these equations are

\[
\Phi(\phi) = \Phi_1 e^{jm\phi} + \Phi_2 e^{-jm\phi}, \quad (C.4a)
\]
\[
\Theta(\theta) = \Theta_1 P_n^m(\cos\theta) + \Theta_2 Q_n^m(\cos\theta), \quad (C.4b)
\]
\[
R(r) = R_1 h_n^{(1)}(kr) + R_2 h_n^{(2)}(kr), \quad (C.4c)
\]
\[
T(t) = T_1 e^{j\omega t} + T_2 e^{-j\omega t}, \quad (C.4d)
\]

where h_n^{(1)}(x) and h_n^{(2)}(x) are the spherical Hankel functions of the first and second kind, which represent convergent and divergent waves depending on the sign convention agreed for time, and P_n^m(x) and Q_n^m(x) are the associated Legendre functions of the first and second kind. Because of the singularities of the associated Legendre functions of the second kind at the poles θ = 0 and θ = π, the term Θ₂ is set to zero, and since for simplification either the positive or the negative m can be used, the term Φ₂ is also set to zero. According to Williams [309], for there to be no singularities at the poles of the associated Legendre functions, the index n must be an integer. Still, considering causal systems, the term T₂ in C.4d equals zero given the convention used.

The associated Legendre functions of the first
type defined for positive degrees m are

\[
P_n^m(x) = (-1)^m \left(1 - x^2\right)^{m/2} \frac{d^m}{dx^m} P_n(x). \quad (C.5)
\]

Meanwhile, the functions for negative degrees -m are given by

\[
P_n^{-m}(x) = (-1)^m \frac{(n-m)!}{(n+m)!}\, P_n^m(x), \quad (C.6)
\]

P_n being the Legendre polynomial given by

\[
P_n(x) = \frac{1}{2^n\, n!} \frac{d^n}{dx^n}\left(x^2 - 1\right)^n. \quad (C.7)
\]

C.3 Spherical Harmonics

Equations C.4a and C.4b admit periodic solutions in the angular coordinates; combined, they are called spherical harmonics of order n and degree m, defined by

\[
Y_n^m(\theta, \phi) = \sqrt{\frac{2n+1}{4\pi}\,\frac{(n-m)!}{(n+m)!}}\; P_n^m(\cos\theta)\, e^{jm\phi}. \quad (C.8)
\]

The negative-degree SH functions are obtained through the relation

\[
Y_n^{-m}(\theta, \phi) = (-1)^m \left(Y_n^m(\theta, \phi)\right)^*, \quad (C.9)
\]

where * denotes the complex conjugate; this shows that only the phase changes between the positive and negative degrees of the function. Thus the magnitude is commonly expressed with the radius, and the phase in terms of a color scale, as in Figure 2.9.

Appendix D. Reverberation Time in Acoustic Simulation

The reverberation times for the classroom and the restaurant are presented in Figure D.1.
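These values come from the ODEON simulations; when a reverberation time is estimated from a measured or simulated impulse response instead, the standard route is Schroeder backward integration [264, 265], extrapolating a portion of the energy decay curve to the 60 dB span. A minimal sketch under that assumption (function and signal names are illustrative):

```python
import numpy as np

def rt60_schroeder(rir, fs):
    """RT60 from an impulse response via Schroeder backward integration,
    extrapolating the -5 dB to -35 dB decay range (T30)."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]     # Schroeder integral
    edc = 10 * np.log10(energy / energy[0])      # energy decay curve (dB)
    i5 = int(np.argmax(edc <= -5.0))             # first sample below -5 dB
    i35 = int(np.argmax(edc <= -35.0))
    return 2.0 * (i35 - i5) / fs                 # scale the -30 dB span to -60 dB

# Toy check: synthetic exponential decay with a known RT60 of 0.5 s.
fs = 8000
rt_true = 0.5
t = np.arange(fs) / fs                           # 1 s long
rir = np.exp(-np.log(10**3) * t / rt_true)       # amplitude falls 60 dB per rt_true
print(round(rt60_schroeder(rir, fs), 3))  # -> 0.5
```

For real (noisy) responses, the octave-band values of Figure D.1 would be obtained by band-filtering the impulse response before the backward integration.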
Figure D.1: Reverberation time in octave bands: (a) classroom; (b) restaurant.

Appendix E. Alpha Coefficients

Figures E.1, E.2, and E.3 present the absorption coefficients, as a function of frequency, introduced in the ODEON software to simulate the environments.

Figure E.1: Classroom alpha coefficients (ODEON software).

Figure E.2: Restaurant alpha coefficients (ODEON software).

Figure E.3: Anechoic room alpha coefficients.

Appendix F. Questionnaire

Questionnaire 1 | 1 TS_00    Date: ___ / ___ / 2019

The questionnaire was administered in Danish; in English translation:

How much effort did you make to hear the sentences?
(No effort / Low effort / Moderate effort / High effort / Very high effort)

How many of the words do you think you understood correctly?
(None / Less than half / Half / More than half / All)

How often did you have to give up on understanding the sentence?
(Never / Less than half of the time / Half of the time / More than half of the time / Always)