Abstract
Depending on the acoustic scenario, people with hearing loss are challenged to a different degree
than people with normal hearing in comprehending sound, especially speech. This happens
particularly during social interactions within a group, which often occur in environments with
low signal-to-noise ratios. This communication disruption can create a barrier for people to
acquire and develop communication skills as a child or to interact with society as an adult.
Hearing loss compensation aims to provide an opportunity to restore the auditory part of
socialization.
Technological and academic efforts have progressed toward a better understanding of the human
hearing system. Through new algorithms, miniaturization, and new materials, steadily improving
hardware with high-end software is being developed, offering new features and solutions to broad
and specific auditory challenges. The effort to deliver innovative solutions
to the complex phenomena of hearing loss encompasses tests, verifications, and validation in
various forms. As newer devices achieve their purpose, tests need increased sensitivity,
requiring conditions that can effectively assess these improvements.
Hearing research requires many levels of realism, from pure-tone assessment in small soundproof
booths to hundreds of loudspeakers combined with visual stimuli through projectors or
head-mounted displays, with light and movement control. Hearing aid research commonly
relies on loudspeaker setups to reproduce sound sources. In addition, auditory research can use
well-known auralization techniques to generate sound signals. These signals can be encoded to
carry more than sound pressure level information, adding spatial information about the
environment where that sound event happened or was simulated.
This work reviews physical acoustics, virtualization, and auralization concepts and their uses
in listening effort research. This knowledge, combined with the experiments executed during the
studies, aimed to provide a hybrid auralization method to be virtualized in four-loudspeaker
setups. Auralization methods are techniques used to encode spatial information into sounds. The
main methods were discussed and derived, observing their spatial sound characteristics and
trade-offs to be used in auditory tests with one or two participants. Two well-known
auralization techniques (Ambisonics and Vector-Based Amplitude Panning) were selected and
compared through a calibrated virtualization setup regarding spatial distortions in the binaural
cues. The choice of techniques was based on the need for loudspeaker reproduction with only a
small number of loudspeakers. Furthermore, the spatial cues were examined by adding a second
listener to the virtualized sound field. The outcome reinforced the literature on spatial
localization with these techniques: Ambisonics was less spatially accurate but provided greater
immersion than Vector-Based Amplitude Panning.
A combined study was designed to observe changes in listening effort due to different
signal-to-noise ratios and reverberation in a virtualized setup. This experiment aimed to produce
the correct sound field via a virtualized setup and assess listening effort via subjective
impression with a questionnaire, an objective physiological outcome from EEG, and behavioral
performance on word recognition. Nine levels of degradation were imposed on speech signals
presented against speech maskers spatially separated in the virtualized space, using the
first-order Ambisonics technique in a 24-loudspeaker setup. A high correlation between
participants’ performance and their responses on the questionnaire was observed. The results
showed that increased virtualized reverberation time negatively impacted both speech
intelligibility and listening effort.
A new hybrid auralization method was proposed, merging the investigated techniques, which
presented complementary spatial sound features. The method was built on room acoustics concepts
and a specific objective parameter derived from the room impulse response, called Center Time.
The binaural cues were verified with three different (simulated) rooms. As the
validation with test subjects was not possible due to the COVID-19 pandemic situation, a
psychoacoustic model was implemented to estimate the spatial accuracy of the method within a
four-loudspeaker setup. The same verification and model estimation were also performed with the
introduction of hearing aids. The results showed that it is
possible to use the hybrid method with four loudspeakers for audiological tests, subject to some
limitations. The setup can provide binaural cues with a maximum ambiguity angle of 30 degrees in
the horizontal plane for a centered listener.
Introduction
Individuals with normal hearing can often effortlessly comprehend complex listening scenarios
involving multiple sound sources, background noise, and echoes [226]. However, those with hearing
loss may find these situations particularly challenging [273, 289, 304, 317]. These environments
are commonly encountered in daily life, particularly during social events, and they can
negatively impact the communication abilities of individuals with hearing loss [137, 260]. The
difficulties associated with understanding complex listening scenarios can be a significant
barrier for individuals with hearing loss, leading to reduced participation in social
activities [16, 63, 119].
Several hearing research laboratories worldwide are developing systems to realistically simulate
challenging scenarios through virtualization, to better understand and help with these everyday
challenges [41, 79, 102, 116, 118, 160, 161, 188, 195, 218–220, 259, 272, 298]. The
virtualization of sound sources is a powerful tool for auditory research, capable of achieving a
high level of detail, but current methods use expensive, expansive technology [293]. In this
work, a new auralization method has been developed to achieve sound spatialization with a
reduced hardware requirement, making virtualization at the clinic level possible.
Key Chapters
Chapter 2: Literature Review
Examines previous work in virtualization and auralization, basic concepts of human sound
perception, room acoustics, and loudspeaker-based virtualization.
Chapter 3: Investigation of Binaural Cue Distortions
Compares VBAP and Ambisonics methods through a calibrated virtualization setup in terms of
spatial distortions and examines spatial cues with a second listener.
Chapter 4: Behavioral Study
Examines subjective effort within virtualized sound scenarios (first-order Ambisonics),
focusing on how signal-to-noise ratio (SNR) and reverberation affect listening effort in
speech-in-noise tasks.
Chapter 5: The Iceberg Method
Proposes a hybrid auralization method combining VBAP and Ambisonics for small reproduction
systems (four loudspeakers), evaluated with objective parameters and hearing aids.
Conclusion
Throughout this study, a new auralization method called Iceberg was conceptualized and compared
to well-known methods, including VBAP and first-order Ambisonics, using objective parameters. The
Iceberg method is innovative in that it uses Center Time (t_s) to find the transition point
between early and late reflections, splitting the Ambisonics impulse responses and distributing
them accordingly. VBAP is responsible for localization cues in this proposed method, while
Ambisonics contributes to the sense of immersion.
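To make that splitting step concrete: Center Time is the energetic center of gravity of the room impulse response h(t), a standard room-acoustics parameter (ISO 3382; Eq. (2.15) in the thesis),

\[ t_s = \frac{\int_0^{\infty} t\, h^2(t)\, \mathrm{d}t}{\int_0^{\infty} h^2(t)\, \mathrm{d}t}. \]

The sketch below shows one way the idea could look in code: a first-order (B-format) room impulse response is split at a transition point derived from t_s, the early part feeds a VBAP-panned path carrying the localization cues, and the late part is kept in Ambisonics format for immersion. This is a minimal reconstruction of the published idea, not the thesis implementation; the channel ordering (W, X, Y, Z), the placeholder impulse response and source signal, the known source azimuth, and all function names are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def center_time(h: np.ndarray, fs: int) -> float:
    """Center time t_s of an impulse response h: the energetic center of
    gravity in seconds (used here as the early/late transition point)."""
    t = np.arange(len(h)) / fs
    energy = h ** 2
    return float(np.sum(t * energy) / np.sum(energy))

def split_bformat_rir(rir: np.ndarray, fs: int):
    """Split a B-format RIR, shape (4, n) for channels (W, X, Y, Z),
    at the center time of its omnidirectional W channel."""
    k = int(center_time(rir[0], fs) * fs)
    return rir[:, :k], rir[:, k:]

def vbap_gains_2d(source_deg: float, spk_deg=(45.0, 135.0)) -> np.ndarray:
    """Two-loudspeaker 2D VBAP gains (Pulkki): solve p = g L, then normalize."""
    p = np.array([np.cos(np.radians(source_deg)), np.sin(np.radians(source_deg))])
    L = np.array([[np.cos(np.radians(a)), np.sin(np.radians(a))] for a in spk_deg])
    g = p @ np.linalg.inv(L)
    return g / np.linalg.norm(g)

fs = 48_000
# Placeholder B-format RIR with an exponential decay of roughly RT = 1 s.
rir = np.random.default_rng(1).standard_normal((4, fs)) * np.exp(-6.9 * np.arange(fs) / fs)
dry = np.random.default_rng(2).standard_normal(fs)  # placeholder source signal

early, late = split_bformat_rir(rir, fs)
# The source azimuth (60 degrees) is assumed known here; the thesis derives
# direction from the measured sound field rather than from a given angle.
gains = vbap_gains_2d(60.0)
early_path = np.outer(gains, fftconvolve(dry, early[0]))     # localization (VBAP)
late_path = np.stack([fftconvolve(dry, ch) for ch in late])  # immersion (Ambisonics)
```

For simplicity, only the omnidirectional channel drives the VBAP-panned early path here; the late B-format channels would still need an Ambisonics decoder matched to the four-loudspeaker layout before playback.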
In the center position, the Iceberg method was found to match the localization accuracy of the
other methods while also adding to the sense of immersion. A second listener added to the side
did not introduce undesired effects to the auralization. Additionally, it was found that
virtualization of sound sources with Ambisonics can impose limitations on a participant’s
behavior in a listening-in-noise test due to its sweet spot. However, these limitations can be
circumvented, and the approach extended to Iceberg, resulting in subjective responses that align
with behavioral performance in speech intelligibility tests while increasing localization
accuracy.
Iceberg: A loudspeaker-based room auralization method for auditory research
Sergio Luiz Aguirre
Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy
Hearing Sciences – Scottish Section, School of Medicine, University of Nottingham
Supervised by William M. Whitmer, Lars Bramsløw, & Graham Naylor
2022

There is no “nonspatial hearing” – Jens Blauert
Acknowledgements

Thank you, Obrigado, Gracias, Grazie, Tak skal du have, Dank u zeer & Danke sehr

Firstly, I would like to express my gratitude to my supervisors, Drs. Bill Whitmer, Lars
Bramsløw, and Graham Naylor, for their guidance, expertise, and remarkable patience throughout
this process. Your support and mentorship have been invaluable in helping me to fill my knowledge
gaps, tirelessly encouraging me to ask the right questions, and guiding me to produce
high-quality scientific research. I also thank Dr. Thomas Lunner for his initial guidance,
insightful questions, and comments.

Thank you to the special people at Eriksholm Research Centre in Denmark and the people at Hearing
Sciences – Scottish Section. Working with such fantastic top teams has been a joy and a
privilege. A special thank you to Jette, Michael, Bo, Niels, Claus, Dorothea, Sergi, Jeppe,
James, Lorenz, Johannes, and Hamish. Thanks to all the people involved in HEAR-ECO for their hard
work, especially Hidde, Beth, Patrycja, Tirdad, and Defne.

I am deeply grateful to my sweet wife, Lilian, for the love, encouragement, and support she has
given me throughout this journey. Her unwavering support has been an enduring seed of resilience
and inspiration. I cannot thank her enough for being such an integral part of my life. Thanks to
my friends Math and Gil, for always being there.
Special thanks to my former professors, Drs. Arcanjo Lenzi, William D’Andrea Fonseca, Eric
Brandão, Paulo Mareze, Stephan Paul, and Bruno Sanches Masiero, for stimulating critical thinking
and for all the support, knowledge, and encouragement. Thank you also to my former Oticon Medical
colleagues Simon Ziska Krogholt, Patrick Maas, Brian Skov, and Jens T. Balslev for the support.

I would like to thank you, my Professor, Professora Dra. Dinara Xavier Paixão. All of this is
possible because of you and your determination to create an official undergraduate course in
Acoustical Engineering in Brazil. This course is praised for forming remarkable professionals who
are recognized worldwide. It is not just my dream that you have made possible, but the countless
people for whom this course has been a life-changing experience. We know that this was a
collective effort, but your role was vital. Your way of showing that politics is a part of
everything, and that we need to be gentle but correct, made all the difference. Thank you. Muito
Obrigado.

I sincerely thank my friends and colleagues from my undergraduate studies in acoustical
engineering (UFSM/EAC) and the master’s (UFSC/LVA). Your support, happiness, patience, and
encouragement have been invaluable throughout this journey. Thank you for helping me to develop
my skills and knowledge and for being such a positive influence on my academic career. I am
deeply grateful for all you have done for me and look forward to continuing our professional
relationship.

I want to thank the oldest friends, Fabrício, André, Juliano, and the Panteon. I value the bond
and history that we share. Thank you for being such wonderful friends.

I want to express my heartfelt thanks to the Brazilian CNPq and the government (Lula/Dilma)
policies that support students with low income and from public schools. With their financial
assistance, I could pursue my studies and achieve my goals. I am deeply grateful for their
support and the opportunity to receive a quality education. I would also like to express my
gratitude to Marie Skłodowska-Curie Actions for their support of my doctoral education. Their
reference programme for doctoral education has provided me with invaluable resources and
opportunities, and I am extremely grateful for their support. Thank you for helping me achieve my
goals and being a valuable part of my academic journey.

I want to express my gratitude to all those who will read this thesis in the future. Your time
and attention are greatly appreciated. I wish you a good reading experience and hope that you
will find the ideas and research presented in this work to be both thought-provoking and
beneficial. Thank you again for considering this work.

Author’s Declaration

This thesis is the result of the author’s original research, which has not been previously
submitted for any other academic qualification. Chapter 4 is a collaborative work with Tirdad
Seifi-Ala, composed by the author. This project has received funding from the European Union’s
Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement
No 765329; the funder had no role in study design.

Sergio Luiz Aguirre
1.2 Aims and Scope

The overall objective of this research was to investigate parameters of sound virtualization
methods related to their localization accuracy, especially the perceptually based methods [39],
in optimal but also challenging conditions. Furthermore, an auralization method oriented to a
smaller setup, reducing the hardware requirements, is proposed. The specific objectives were:

• To investigate spatial distortions through binaural cue differences in two well-known
virtualization setups: Vector-Based Amplitude Panning (VBAP) and Ambisonics.
• To investigate the influence of a second listener inside the sound field (VBAP and Ambisonics).
• To evaluate the feasibility of a speech-in-noise test within Ambisonics-virtualized reverberant
rooms.
• To study the relation between reverberation, signal-to-noise ratio (SNR), and listening effort
in environments virtualized in first-order Ambisonics.
• To investigate the binaural cues, objective level, and reverberation time of a new auralization
method utilizing four loudspeakers.
• To investigate the influence of hearing aids on binaural cues and objective parameters within
virtualized scenes utilizing the new auralization method with an appropriate setup.

1.3 Contributions

The main contribution of this research to the scientific field of auditory perception is the
development of a new auralization method that addresses the current gap in the virtualization of
sound sources using a small number of loudspeakers. Specifically, this method aims to achieve
both good localization accuracy and a high level of immersion simultaneously, which has been a
challenge in previous approaches. Furthermore, the proposed method combines existing techniques
and can be implemented using readily available hardware, requiring a minimum of four
loudspeakers.
This technology makes it more accessible for audiologists and researchers to create realistic
listening scenarios for patients and participants while reducing the technical resources required
for implementation. Overall, this work represents a valuable contribution to the field of
auditory perception and has the potential to advance the understanding of spatial hearing and the
development of effective hearing solutions.

1.4 Organization of the Thesis

In Chapter 2, a review examines previous work carried out in several different areas concerning
virtualization and the auralization of sound sources. The chapter starts with an overview of the
basic concepts of human sound perception. Next, virtual acoustics are explored, reviewing the
generation of virtual acoustic environments using different rendering paradigms and methods. In
addition, relevant room acoustics concepts and objective parameters, and their relation to
hearing perception, are described. Finally, the review considers auralization and virtualization
as applied to auditory research. This review stresses the importance of virtual sound sources for
greater realism and ecological validity in auditory research, and the challenges of adequately
creating a virtual environment focused on auditory research.

Chapter 3 presents an investigation of binaural cue distortions in imperfect setups. First, the
methods are described, including the complete auralization of signals using two different methods
and the system’s calibration. The investigation first compares both auralization methods through
the same calibrated virtualization setup in terms of spatial distortions. Then the spatial cues
are examined with the addition of a second listener to the virtualized sound field. Both
investigations are performed with the primary listener on and off-center.

In Chapter 4, a behavioral study examines subjective effort within virtualized sound scenarios.
As the study was part of a collaborative project, only one auralization method was selected:
first-order Ambisonics. The aim was to examine how SNR and reverberation combine to affect effort
in a speech-in-noise task. The feasibility of using first-order Ambisonics was also examined;
however, the sound sources were well separated in space, and localization accuracy was not a
factor. An important aspect of the study was an auralization issue involving head movement
observed during pilot data collection. This issue led to a solution that allowed the study to
continue. The results verified the relationships between subjective effort and acoustic demand.
Furthermore, this issue led to the further investigation of the effect of off-center listening,
considered in both Chapter 3 and Chapter 5.

In Chapter 5, a hybrid method of auralization is proposed, combining the methods examined and
used in previous chapters: VBAP and Ambisonics. This method was designed to allow auralized
signals to be virtualized in a small reproduction system, thus providing better accessibility to
research within the virtualized sound field in clinics and research centers that do not have a
sizeable acoustic apparatus. The hybrid auralization method aims to unite the strengths of both
techniques: localization by VBAP and immersion by
The hybrid metho d con v olv es the desired signal with distinct parts of an Ambisonics-format impulse resp onse that c haracterizes the desired en vironmen t. The potential for generating auralizations for a repro duction system with at least four loudspeakers is demonstrated. The virtualization system was tested with three differen t scenarios. Parameters relev ant to the p erception of a scene, such as rev erb eration time, sound pressure lev el, and binaural cues, w ere ev aluated in differen t p ositions within the sp eak er arrange- men t. The effects of a second participan t inside the ring were also in vestigated. The ev aluated parameters w ere as exp ected, with the listener in the system’s cen ter (sw eet sp ot). How ever, deviations and issues at sp ecific presentation angles w ere identified that could b e impro v ed in future implementations. Such errors also need to b e further inv estigated as to their influence on the sub jec- tiv e perception of the scenario, whic h w as not performed due to the CO VID-19 pandemic. An alternativ e robustness assessment was p erformed offline, exam- ining the lo calization accuracy with a mo del prop osed b y May et al. [ 182 ] The metho d also prov ed effective for tests with hearing aids for listeners p ositioned in the cen ter of the sp eak er arrangement. Ho w ev er, the metho d p erformance considering hearing instruments with compression algorithms and adv anced signal pro cessing still needs to b e verified. Chapter 6 presents a general discussion of the feasibilit y of applying tests using the prop osed metho d and an o v erview of the pro cesses. In addition, the relev an t con tributions of the work are presen ted, as are the limitations and the suggestions for further impro v emen ts. Chapter 2 Literature Review 2.1 In tro duction The field of audiology is concerned with the study of hearing and hearing dis- orders, as w ell as the assessmen t and rehabilitation of individuals with hearing loss [ 110 ]. In this review chapter, w e will explore v arious topics related to h uman binaural hearing, spatial sound, and virtual acoustics to pro vide a comprehensiv e o v erview of the curren t state of kno wledge in these fields and highligh t their imp ortant contributions to our understanding of hearing and auditory p erception. First, we will delv e in to the intricacies of h uman binaural hearing. Next, w e will examine the concepts of spatial hearing, including the v arious binaural and monoaural cues that contribute to our ability to lo cal- ize sound in space. W e will also explore the head-related transfer function, whic h describ es the w ay that sounds are filtered as they tra vel from their source to the ear drum, as well as the sub jective asp ects of audible reflections. Next, we will turn our attention to spatial sound and virtual acoustics. W e will discuss the virtualization of sound, including the v arious methods used to achiev e this, suc h as auralization and virtual sound repro duction. W e will 7 Chapter 2. Human Binaural Hearing 8 also examine the differen t auralization paradigms used in auditory researc h, including binaural, panorama, vector-based amplitude panning, ambisonics, and sound field syn thesis. W e will then examine the role of ro om acoustics in virtualization, and auditory research, including the v arious parameters, used to describ e ro om acoustics, such as reverberation time, clarity and definition, cen ter time, and parameters related to spatiality . 
Finally, we will explore the use of loudspeaker-based virtualization in auditory research, including hybrid methods and sound source localization, as well as the assessment of listening effort.

2.2 Human Binaural Hearing

The engineering side of the listening process can be modeled in a simplified manner through two input blocks separated in space [92]. These inputs are limited in frequency and level and are followed by a signal processing chain that relates the medium transformations of the propagating wave from air to fluid and to electrical pulses [315].

Although this block modeling can be reasonably accurate for educational purposes, it falls short of capturing the true effect and importance of listening on our essence as human beings. The ability to feel and interpret the world through the sense of hearing, and to attribute meaning to sound events, enables humans to enrich their tangible world [56, 244]. For instance, a characteristic sound can evoke memories or trigger an alert [128]. A piece of music can bring tears to one's eyes or persuade someone to purchase more cereal [13, 114]. A person's voice can activate certain facial nerves, turning hidden teeth into a smile. These are some of the reasons why researchers and clinicians dedicate their lives to understanding the transformation of sound events into auditory events, with a scientific dedication focused on creating solutions and opening opportunities for more people to experience the sound they love and deserve - a dedication focused on people and their needs.

As the auditory system comprises two sensors, normal-hearing listeners can experience the benefits of comparing sounds autonomously, relating them to the space around them [21]. This constant signal comparison is the main principle of binaural hearing, where the differences between these sounds allow for the identification of the direction of a sound event, as well as the sensation of sound spatiality [9, 40]. Usually, these signals are assumed to be part of a linear and time-invariant system, which helps to study how humans interpret the information present in the different signals across the time and frequency domains. However, this assumption of linearity can fail when analyzing fast-moving sound sources, reflective surfaces, or sound propagating through disturbed air [200, 255]. Nonetheless, the advantages of quantifying and capturing the effect have led to significant progress in the hearing sciences.

2.2.1 Spatial Hearing Concepts

Identifying the direction of incidence of a sound source based on the audible waves received by the listener is defined as an act or process of human sound localization [285]. For research in acoustics, it is relevant to acknowledge that the receiver is, in general, a human being. The human hearing mechanism's main anatomical characteristic is the binaural system: there are two signal reception points (external ears positioned on opposite sides of the head). However, the whole set (torso, head, pinnae) can also modify, to some extent, the signal that reaches the two tympanic membranes [153, 216]. Human binaural hearing and its associated effects have been extensively reported by Blauert [38].
In addition to analyzing sound sources' spatial location, the central auditory system extracts real-time information from the sound signals related to the acoustic environment, such as its geometry and physical properties [153]. Another benefit is the possibility of separating and interpreting combined sounds, especially from sources in different directions [170, 242].

2.2.2 Binaural cues

The speed of sound propagation in air can be assumed to be finite and approximately constant, considering air as an approximately non-dispersive medium [18]. Thus, when the incidence is not directly frontal or from the rear, the wavefront travels along different paths to the ears, reaching them at different times. The time interval a sound takes to arrive at both ears is commonly expressed in the literature as the Interaural Time Difference (ITD) [39]. It is a crucial cue for the localization of low-frequency sounds [39, 153, 242]. Moreover, it is considered the primary localization cue [306]. For continuous pure tones and other periodic signals, the ITD can be expressed as the Interaural Phase Difference (IPD) [285].

On the other hand, most mammals' high-frequency sound source localization is based on a comparative analysis of the sound energy in each ear's frequency bands, the Interaural Level Difference (ILD). The so-called duplex theory surmises that ITD cues are the basis of low-frequency localization and ILD cues of high-frequency localization. Its authorship is assigned to Lord Rayleigh at the beginning of the last century [246]. These binaural cues are related to the azimuthal position. However, they are not as successful in explaining localization at elevated positions [37, 250]. An ambiguity in the binaural cues, caused by head symmetry and referred to as the cone of confusion [296], can hinder correct sound source localization. The cone of confusion is the imaginary cone extending sideways from each ear on which sound source locations create the same interaural differences (see Figure 2.1).

Figure 2.1: Two-dimensional representation of the cone of confusion.

Head movements are essential for resolving the ambiguous cues from sound sources located on the cone of confusion. As the person moves their head, they change the reference and the incidence angle, helping them to resolve the ambiguity. This change is reflected in the cues associated with the directional sound filtering caused by the human body's reflection, absorption, and diffraction.
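As a simple illustration of how the ITD can be quantified from the two ear signals (a minimal sketch, not a procedure used in this thesis), the lag that maximizes the interaural cross-correlation can serve as a broadband ITD estimate; the noise burst and 0.5 ms delay below are assumed toy inputs:

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the ITD in seconds as the lag that maximizes the
    interaural cross-correlation (negative lag: left ear leads)."""
    xcorr = np.correlate(left, right, mode="full")
    lags = np.arange(-(len(right) - 1), len(left))
    return lags[np.argmax(xcorr)] / fs

# Toy input: a 100 ms noise burst reaching the right ear 0.5 ms later.
fs = 48000
rng = np.random.default_rng(0)
source = rng.standard_normal(int(0.1 * fs))
delay = int(0.0005 * fs)                      # 0.5 ms = 24 samples
left = source
right = np.concatenate([np.zeros(delay), source[:-delay]])
print(estimate_itd(left, right, fs))          # approx. -0.0005 s
```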
2.2.3 Monaural cues

Monaural cues are related to the spatial impression, especially the localization of elevated sound sources. These cues give, to some extent, limited but crucial localization abilities to people with unilateral hearing loss [72, 307]. This type of cue is centered on instantaneous level comparison and frequency changes. As the level of a sufficiently continuous sound source changes, the approach or retreat of that source can be estimated. Furthermore, when head movements shape the frequency content, the disturbance, provided mainly by the pinnae, can help the listener to learn the position of a sound source [129, 292]. In addition, the importance of prior knowledge of the sound for the deconvolution process has also been investigated, revealing mixed results [307].

2.2.4 Head-related transfer function

The Head-Related Transfer Function (HRTF) describes the directional filtering of incoming sound due to human body parts such as the head and pinnae [189]. The free-field HRTF can be expressed as the division, in the frequency domain, of the impulse response measured at the entrance to the ear canal by the one measured at the center-of-head position with the head absent [108] (see Figure 2.2). HRTFs depend on the direction of incidence of the sound and are generally measured for a set of discrete incidence directions. Mathematical models can also generate individualized HRTFs based on anthropometric measures [52] or through geometric generalization [70].

Figure 2.2: A descriptive definition of the measured free-field HRTF for a given angle.

The reference system related to the head can be seen in Figure 2.3, where \(\beta\) is the elevation angle in the median plane and \(\phi\) is the angle defined in the horizontal plane.

Figure 2.3: Polar coordinate system related to head incidence angles, adapted from Portela [240].

Suppose the distance to the sound source exceeds 3 meters. In that case, the incident wave can be considered approximately planar, making the HRTFs nearly independent of the distance to the sound source [38]. Blauert [39] also explains two other types of HRTF, namely:

• Monaural Transfer Function (MTF): relates the sound pressure, at a measurement point in the ear canal, from a sound source at any position to the sound pressure measured at the same point with a sound source at a reference position (\(\phi = 0\) and \(\beta = 0\)). The MTF is given by
\[ \mathrm{MTF} = \frac{\left. p_i/p_1 \right|_{r,\phi,\beta,f}}{\left. p_i/p_1 \right|_{\phi=0^{\circ},\,\beta=0^{\circ},\,f}} , \tag{2.1} \]
where \(p_i\) can be \(p_1\), \(p_2\), \(p_3\), or \(p_4\):
– \(p_1\): sound pressure at the center-of-head position with the listener absent;
– \(p_2\): sound pressure at the entrance of the occluded ear canal;
– \(p_3\): sound pressure at the entrance to the ear canal;
– \(p_4\): eardrum sound pressure.

• Interaural Transfer Function (ITF): relates the sound pressures at corresponding measurement points in the two ear canals. The reference pressure is taken at the ear facing the sound source. The ITF can be obtained through
\[ \mathrm{ITF} = \frac{\left. p_i \right|_{\text{side opposite the source}}}{\left. p_i \right|_{\text{side facing the source}}} . \tag{2.2} \]

More considerable variations are seen above 200 Hz in HRTFs [293] because the head, torso, and shoulders begin to interfere significantly at frequencies up to approximately 1.5 kHz (mid frequencies). In addition, the pinna and the cavum conchae (the space inside the most inferior part of the helix; it forms the vestibule that leads into the external acoustic meatus [270]) distort frequencies greater than 2 kHz. HRTF measurements vary from person to person, as seen in Figure 2.4, where TS 1, TS 2, TS 3, and TS 4 represent the HRTFs of different people. When recording using mannequins or different people's ear canals (non-individualized HRTFs), the reproduction precision in terms of spatial location and realism tends to be diminished [51, 178]. This poorer precision arises because the transfer function differs for each individual, especially at high frequencies [155]. This dependence is related to the wavelength and the singular irregularity of each human being's ear canal [38].

Figure 2.4: Head-related transfer functions of four human test participants, frontal incidence, from Vorländer [293].
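In practice, the directional filtering described by an HRTF is imposed on a signal by convolution with its time-domain counterpart, the head-related impulse response (HRIR). The sketch below illustrates only the operation itself; the random 256-tap "HRIRs" are hypothetical stand-ins for responses that would normally be taken from a measured database:

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Impose the directional filtering of one incidence direction on a
    mono signal by convolving it with the left/right HRIR pair."""
    return np.stack([fftconvolve(mono, hrir_left),
                     fftconvolve(mono, hrir_right)])

# Hypothetical stand-ins: a noise burst and a 256-tap "HRIR" pair; in
# practice, the pair would come from a measured HRTF database for the
# desired azimuth and elevation.
rng = np.random.default_rng(0)
mono = rng.standard_normal(48000)
hrir_l = rng.standard_normal(256) * np.hanning(256)
hrir_r = rng.standard_normal(256) * np.hanning(256)
binaural = render_binaural(mono, hrir_l, hrir_r)   # shape (2, 48255)
```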
Binaural Impulse Response

A Binaural Room Impulse Response (BRIR) results from a measurement of the response of a room to excitation by an (ideally) impulsive sound [183]. A BRIR is composed of a sequence of sounds. Parameters such as the magnitude, the decay rate, the phase, and the time distribution are key to understanding how a BRIR can audibly characterize a room for human perception [167]. Although air contains a small portion of CO2, which is dispersive, the sound propagation velocity can be considered homogeneous in air (a non-dispersive medium) [312] for Room Impulse Responses (RIRs). The first sound from a source that reaches a receiver inside the room travels the shortest distance and is called the direct sound (DS). Usually, the following sounds result from reflections that travel longer paths, losing energy at each interaction and resulting in an exponential decay of magnitude.

The BRIR is proposed to collect the room information as a regular impulse response, although with two sensors separated as on a typical human head. Nowadays, BRIRs can be recorded with small microphones placed in the ear canals of a person or with microphones placed in mannequins [197]. A BRIR is the auditory time-domain representation of a source-receiver pair, defined by their positions, orientations, acoustic properties such as the directivity of the sound source, and the physical elements within the environment [38, 108]. The convolution of a BRIR with audio signals is a feasible task for modern computation, which allows the creation and manipulation of sounds even in real-time applications [62, 217]. Thus, it is possible to impose the spatial and reverberant characteristics of different spaces on a given sound [109].

2.2.5 Subjective aspects of an audible reflection

The impulse response is composed of the direct sound followed by a series of reflections (early and later reflections) [45, 165]. Essential knowledge of how the human auditory system processes the spectral and spatial information contained in the impulse response has been obtained through studies with simulated acoustic fields [6, 17, 93, 125, 141, 174, 176, 188, 193, 257, 305, 321]. The results of Barron's experiments, depicted in Figure 2.5, involved the reproduction of both a direct sound and a lateral reflection. These two auditory stimuli were manipulated in terms of their time delay and relative amplitude, with the goal of eliciting subjective impressions correlated with these factors. By varying the time between the direct sound and the reflection, as well as the relative amplitude of these stimuli, it was possible to better understand how these characteristics impact the overall auditory experience.

Figure 2.5: Audible effects of a single reflection arriving from the side (adapted from Rossing [254]).

The audibility threshold curve indicates that the reflection will be inaudible if the delay or the relative level is minimal. The reflection's subjective effect also depends on the direction of incidence of the sound source in the horizontal and vertical planes.
It is possible to note that for delays of up to 10 milliseconds, the relative level must be at least -20 dB for the reflection to be noticeable. The echo effect is typically observed at delays of more than 50 milliseconds, being an acoustic repetition with a high relative level, approximately the same energy as the direct sound. The coloring effect is associated with the significant change in the spectrum caused by the constructive and destructive interference of superposed sound waves. The image change happens when there are reflections with relative levels higher than the direct sound or with minimal delays. In this case, the subjective perception is that the sound source is at a different position in space than the one the visual system perceives.

2.3 Spatial Sound & Virtual Acoustics

The sound perceived by humans is identified and classified based on physical properties, such as intensity and frequency [242]. Human beings are equipped with two ears (two highly efficient sound sensors), enabling a real-time comparison of these properties between the captured sound signals [9]. The sounds and the dynamic interaction between sound sources, their positions and movements, and the physical interaction of the generated sound waves with the environment can be perceived by normal-hearing people, providing what is called spatial awareness [153]. That auditory spatial awareness includes the localization of the sound source, the estimation of distance, and the estimation of the size of the surrounding space [38, 305]. A person with hearing loss may lose this ability partially or entirely; spatial awareness is also tied to the listener's experience with the sound and the environment, motivation, or fatigue level [54, 304].

In the field of virtual acoustics, the ultimate goal is to generate a sound event that elicits a desired auditory sensation, creating a Virtual Sound Environment (VSE) [293]. In order to achieve this, it is necessary to synthesize or record the acoustic properties of the target scene and subsequently reproduce them in a manner that accurately reflects the original acoustic conditions [97]. This involves a careful consideration of the various factors that contribute to the overall auditory experience, including the spectral and spatial characteristics of the sound. By accurately recreating these properties, it is possible to create a highly immersive and realistic VSE that effectively conveys the intended auditory experience to the listener [196, 213, 293].

2.3.1 Virtualization

Nowadays, it is possible to create audio files containing information about sound properties related to a specific space [293]. For example, it is possible to encode information about the source and receiver positions, the transmission path, reflections at surfaces, and the amount of energy absorbed and scattered (e.g., with Odeon [59], a commercially available acoustics software). The sound field properties can be simulated, synthesized, or recorded in situ [113, 293]. These signals can be encoded and reproduced correctly in various reproduction systems [122, 161]. The creation of reproducible files containing such information is called auralization.
As different interpretations of the terms occur in the literature, in this thesis the virtualization process is considered to encompass both the auralization and the reproduction of a sound (recorded, simulated, or synthesized) that includes spatial properties.

2.3.1.1 Auralization

Auralization is a relatively recent procedure. The first studies were conducted in 1929, when Spandöck and colleagues tried to process signals measured in a scale-model room. After that, in 1934, Spandöck [280] succeeded in the first auralization, performed in an analog fashion, using ultrasonic signals from scale models recorded on magnetic tape. In 1962, Schroeder [263] incorporated the computing process into auralization. In 1968, Krokstad [146] developed the first acoustic room simulation software. The term auralization was introduced in the literature by Kleiner in 1993:

"Auralization is the process of rendering audible, by physical or mathematical modeling, the soundfield of a source in a space, in such a way as to simulate the binaural listening experience at a given position in the modeled space." (Kleiner [138])

In his book titled Auralization, published in 2008, Vorländer defined: "Auralization is the technique of creating audible sound files from numerical (simulated, measured, or synthesized) data." (Vorländer [293])

In this work, auralization is understood as a technique to create files that can be executed as perceivable sounds. An auralization method describes the technique; it can involve one or more auralization techniques. These sounds can then be virtualized (reproduced) via loudspeakers or headphones and provide audible information about a specific acoustical scene in a defined space, following Vorländer's definition. This definition was chosen to encourage the separation of the processes, as an auralized sound file can contain information that allows it to be decoded in different reproduction systems [320].

Auralization is consolidated in architectural acoustics [45, 148, 165, 254], and it is also emerging in environmental acoustics [19, 68, 69, 139, 162, 231, 232]. This technique allows a piece of audible information to be easily accessed and understood. It is also an integral part of the entertainment industry in games, movies, and virtual or mixed reality [320]. Knowing an environment's acoustic properties allows one to manipulate it or to add synthesized or recorded elements, leading the receiver to the desired auditory impression, including the sound's spatial distribution [62]. This process is also used in hearing research, allowing researchers to introduce more ecologically valid sound scenarios into their studies (see Section 2.3.4).

Sound spatiality, or the perception of sound waves arriving from various directions and the ability to locate them in space, is a crucial aspect of the auditory experience [40]. Auralization, which is analogous to visualization, involves the representation of sound fields and sources, the simulation of sound propagation, and the strategy to decode the result in the spatial reproduction setup [293]. That is typically achieved through three-dimensional computer models and digital signal processing techniques, which are applied to generate auralizations that can be reproduced via acoustic transducers [293].
The modeling paradigm used to create the spatial sensation can be perceptually or physically based [39, 106, 164, 276]. Multiple dimensions influence sound perception; the type of generation of the sound, the wind direction, the temperature, the movement of source and receiver, the space (size, shape, and content), the receiver's spatial sensitivity, and the source directivity are some examples. That implies the importance of physical effects such as Doppler shifts [96, 284, 293]. Furthermore, the review of room acoustics and psychoacoustics elements (see Section 2.3.3) supports the understanding of the auralization modeling procedure.

2.3.1.2 Reproduction

Sound signals containing the acoustic characteristics of a space can be reproduced either with binaural techniques (headphones or loudspeakers) or with multiple loudspeakers (multichannel techniques) [293]. Moreover, an acoustic model of a space can be implemented analytically or numerically, with a range of capable algorithms, commercial software, and tools available [49]. With that, it is also possible to measure micro and macro acoustic properties of materials in a laboratory or in situ [206] and to access databases of various coefficients and indexes for an extended catalog of materials [50, 71, 158, 266].

On the reproduction end of the virtualization process, factors such as frequency and level calibration, signal processing, and the frequency response of the hardware can significantly impact the accuracy of the final sound (e.g., the orientation/correction of the microphone when calibrating the system [274]). Depending on the chosen paradigm, a lack of attention to these details may disrupt an accurate description of the sound field, sound event, or sound sensation [214, 282, 283, 320]. Additionally, the quality of the stimuli may be compromised depending on the chosen reproduction technique, which is often tied to the hardware available [77, 166, 275, 276]. That can lead to undesired effects on the level of immersion and problems with the accuracy of sound localization and identification (e.g., source width, source separation, sound pressure level, and coloration and spatial confusion effects [97]). The process of building a VSE is called sound virtualization, which involves both the auralization and reproduction stages to create audible sound from a file. The main technical approaches or paradigms for reproducing auralized sound are Binaural, Panorama, and Sound Field Synthesis (Section 2.3.2). These paradigms can be distinguished by their output, which can be physically or perceptually motivated. For example, while binaural methods are treated separately, they can be intrinsically classified in a physically-motivated paradigm since their success relies on reproducing the correct physical signal at a specific point in the listener's auditory system, typically the entrance of the ear canal [106].

2.3.2 Auralization Paradigms

2.3.2.1 Binaural

Binaural hearing, which refers to the ability to perceive sound in a three-dimensional auditory space, is a fundamental concept in auditory research and has been extensively studied by researchers such as Blauert [40].
In the context of auralization, the term "binaural" refers to the specific paradigm that aims to reproduce the exact sound pressure of a sound event at the listener's eardrums. That can be achieved through the use of headphones or a pair of loudspeakers (known as transaural reproduction) [314]. However, when using distant loudspeakers, it is necessary to consider the interference that can occur between the sounds coming from each speaker. To mitigate this issue, techniques such as cross-talk cancellation (CTC) [60, 262] can be employed, which involve manipulating a set of filters to cancel out the distortions caused by the sound from one speaker reaching the other ear. Another form of binaural reproduction involves the use of closer loudspeakers that are nearfield compensated.

Binaural reproduction over headphones is commonly applied. It requires no extensive hardware (in simple setups that do not track the listener's head), providing a valid acoustic representation and spatial awareness [293]. A disadvantage of this method is its dependence on the accuracy of individualized HRTFs (as each human being has their own slightly different anatomical "filter set") [314]. With headphones, the movement of the listener's head can also be disruptive to immersion [179]; head tracking may be required [11, 115, 252], e.g., when movements are required or allowed in an experiment. Furthermore, a listener wearing a pair of headphones may not represent a realistic situation. For example, an experiment with a virtual auditory environment that represents a regular daily conversation with aged participants may lose the task's ecological validity. Also, headphones usually prevent the listener from wearing hearing devices. Figure 2.6 illustrates the main idea behind different binaural reproduction setups.

Figure 2.6: Binaural reproduction setups: headphones, transaural, and near-field transaural (adapted from Kang and Kim [131]).

2.3.2.2 Panorama

The Panorama paradigm encompasses auralization methods focused on delivering accurate ITDs and ILDs at the listener's position, also known as stereophonic techniques [106, 276]. The most well-known methods are based on amplitude panning [180], including low-order Ambisonics [91] and Vector-Based Amplitude Panning (VBAP) [241]. Higher-Order Ambisonics is an extension of the Ambisonics method, which is typically considered not a panning method but rather a sound field synthesis method (see Section 2.3.2.3). VBAP employs local panning by rendering sound using pairs or triplets of loudspeakers. In contrast, Ambisonics uses global panning to produce a single virtual source using all available loudspeakers [282].

Vector-Based Amplitude Panning: Vector-Based Amplitude Panning (VBAP) is a first-order approximation of the composition of emitted signals that creates virtual sources [241]. The virtualization process using VBAP is based on amplitude panning in two dimensions (variation in amplitude between the speakers), which is derived from the Law of Sines and the Law of Tangents (see Benesty et al. [23] for a derivation of these laws).
The original hypothesis of VBAP assumes that the speakers are arranged symmetrically, equidistant from the listener, and in the same horizontal plane. VBAP does not limit the number of usable speakers but uses a maximum of three simultaneously. The speakers are arranged on a reference circle (2D case) or sphere (3D case), and a limitation of the technique is that virtual sources cannot be created outside this region. VBAP is mainly used for the reproduction of synthetic sounds [180].

The formulation of the VBAP method (from Pulkki [241]) for two dimensions starts from the stereophonic configuration of two channels (see Figure 2.7). Reformulated on a vector base, it is built from the unit-length vectors \(\mathbf{l}_1 = [\,l_{11}\; l_{12}\,]^T\) and \(\mathbf{l}_2 = [\,l_{21}\; l_{22}\,]^T\), which point to the speakers, and the unit-length vector \(\mathbf{p} = [\,p_1\; p_2\,]^T\), which points to the virtual source and presents itself as a linear combination of the vectors \(\mathbf{l}_1\) and \(\mathbf{l}_2\). The notation \(T\) is used here to identify the matrix transposition.

Figure 2.7: Vector-based amplitude panning: 2D display of sound source positions and weights.

Consider the vector \(\mathbf{p}\):
\[ \mathbf{p} = g_1 \mathbf{l}_1 + g_2 \mathbf{l}_2 , \tag{2.3} \]
where \(g_1\) and \(g_2\) are the scalar gain factors to be calculated for positioning the vector relative to the virtual source. In matrix form,
\[ \mathbf{p}^T = \mathbf{g}\, L_{12} , \tag{2.4} \]
where \(\mathbf{g} = [\,g_1\; g_2\,]\) and \(L_{12} = [\,\mathbf{l}_1\; \mathbf{l}_2\,]^T\). The gains can be calculated by
\[ \mathbf{g} = \mathbf{p}^T L_{12}^{-1} = [\,p_1\; p_2\,] \begin{bmatrix} l_{11} & l_{12} \\ l_{21} & l_{22} \end{bmatrix}^{-1} . \tag{2.5} \]

The formulation is also expanded to three dimensions:
\[ \mathbf{p} = g_1 \mathbf{l}_1 + g_2 \mathbf{l}_2 + g_3 \mathbf{l}_3 , \tag{2.6} \]
and
\[ \mathbf{p}^T = \mathbf{g}\, L_{123} , \tag{2.7} \]
where \(g_1\), \(g_2\), and \(g_3\) are gain factors, \(\mathbf{g} = [\,g_1\; g_2\; g_3\,]\), and \(L_{123} = [\,\mathbf{l}_1\; \mathbf{l}_2\; \mathbf{l}_3\,]^T\). The detailed derivation can be found in [241]. The derivation can use triplets of loudspeakers and the three-dimensional system. Figure 2.8 presents an example of the loudspeaker distribution for the virtualization of a virtual source P using VBAP in three dimensions.

Figure 2.8: Diagram representing the placement of speakers in the VBAP technique (adapted from [241]).

Some factors collaborate so that methods based on amplitude panning are widely used in virtual audio applications, such as the low computational cost and flexibility in the speakers' placement.
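To make the gain computation concrete, the following minimal Python sketch solves Equation 2.5 numerically for a loudspeaker pair. It is an illustration rather than part of the original formulation; the loudspeaker angles of ±45° and the unit-energy normalization of the gains are assumptions made here for the example:

```python
import numpy as np

def vbap_2d_gains(source_az_deg, spk_az_deg=(45.0, -45.0)):
    """Solve g = p^T L12^{-1} (Eq. 2.5) for a loudspeaker pair and
    normalize the gains to constant total energy."""
    # Rows of L12 are the unit vectors l1, l2 pointing to the speakers.
    L12 = np.array([[np.cos(np.radians(a)), np.sin(np.radians(a))]
                    for a in spk_az_deg])
    # Unit vector p pointing to the virtual source.
    p = np.array([np.cos(np.radians(source_az_deg)),
                  np.sin(np.radians(source_az_deg))])
    g = p @ np.linalg.inv(L12)
    return g / np.linalg.norm(g)   # energy normalization (assumed here)

print(vbap_2d_gains(0.0))    # centered source: equal gains [0.707, 0.707]
print(vbap_2d_gains(45.0))   # source on speaker 1: gains [1.0, 0.0]
```

Note how only the active pair carries signal, and how a source aligned with one loudspeaker collapses onto that speaker alone, which is the local-panning behavior contrasted with Ambisonics below.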
Ambisonics: The original Ambisonics auralization method is an amplitude panning method that differs from the Vector-Based Amplitude Panning (VBAP) method in several ways. While VBAP only uses positive weights to pan sound across speakers, Ambisonics uses a combination of positive and negative weights to create a shift in frequency and amplitude. This results in a more homogeneous sound field, albeit with a broader virtual source. Additionally, Ambisonics has all loudspeakers active for any source position, while VBAP only activates specific speakers based on the desired source position [199].

One of the benefits of Ambisonics is its scalability for reproduction on different loudspeaker arrays and the ability to encode and decode the sound field during the recording and reproduction process [161]. This versatility is possible because Ambisonics signals can be directly recorded using an appropriate microphone array or simulated through numerical acoustic algorithms that model the directional sensitivity of the microphone array [5, 46, 59]. The signal can then be decoded and rendered in real time to different arrays with various numbers of loudspeakers. Hence, an Ambisonics decoder is a tool for converting an Ambisonics representation of a sound field into a multichannel audio format that can be reproduced over a given speaker setup [130, 235, 238]. In order to reproduce an Ambisonics signal, it must first be transformed, or "decoded," into a format compatible with a specific speaker configuration. Simple decoders consist of a frequency-independent weighting matrix [282]. It is also possible to reproduce the signal via headphones, which can be considered a specific speaker setup, by scaling it down to binaural signals [320]. Additionally, Ambisonics can enhance realism by tracking head movements and correcting the binaural signals utilizing HRTFs as filters [277]. This feature is particularly relevant in the recording and broadcasting industry, particularly with emerging technologies such as augmented reality (AR) [320]. In summary, an Ambisonics decoder transforms an Ambisonics representation of a sound field into a multichannel audio format that can be reproduced over a given loudspeaker setup, enabling the creation of immersive sound experiences.

According to Schröder [261], decomposition in spherical harmonics (SH) is a recent analysis tool widely used in the modeling of directivity patterns. Analogous to a Fourier transform in the frequency domain, SH decomposition in the spatial domain decomposes the signal into spherical functions (in the Fourier transform, the decomposition is into sine or cosine functions) weighted by the coefficients of the corresponding spherical harmonics. According to Pollow [239], it is commonly applied in multi-dimensional domain problems. However, the analytical requirements for cases with few dimensions (two in the case of the sound field) are considerably simplified. Manipulating the wave equation by separating variables is an essential tool here.

Appendix C shows the derivation of SH through the separation of variables of the wave equation in spherical coordinates (Equation C.1). The solutions to the linear wave equation in spherical coordinates expressed in the frequency domain (Helmholtz equation) are orthogonal basis functions \(Y_n^m(\theta, \phi)\), where \(n\) is the degree and \(m\) is the order. These angle-dependent functions are called spherical harmonics and can represent, for example, a sound field [309]. That is the core assumption of Ambisonics recording and reproduction. Figure 2.9 depicts SHs up to order N = 2.

Figure 2.9: Spherical harmonics \(Y_n^m(\theta, \phi)\). Rows correspond to degrees \(0 \leq n \leq 2\), columns to orders \(-n \leq m \leq n\) (adapted from Pollow [239]).

The four SH weights \(Y_n^m(\theta, \phi)\) that encode all the spatial audio information into a first-order Ambisonics file are given by
\[ B_n^m(t) = s(t)\, Y_n^m(\theta_s, \phi_s) , \tag{2.8} \]
where \(s(t)\) is the source signal in the time domain and \(Y_n^m(\theta_s, \phi_s)\) are the encoding coefficients for the source \(s(t)\).
Computed at first order in the B-format, the normalized components can be described as [172]:
\[
\begin{aligned}
W &= B_0^0 = S\, Y_0^0(\theta_S, \phi_S) = S\,(0.707) \\
X &= B_1^1 = S\, Y_1^1(\theta_S, \phi_S) = S \cos\theta_S \cos\phi_S \\
Y &= B_1^{-1} = S\, Y_1^{-1}(\theta_S, \phi_S) = S \sin\theta_S \cos\phi_S \\
Z &= B_1^0 = S\, Y_1^0(\theta_S, \phi_S) = S \sin\phi_S
\end{aligned} \tag{2.9}
\]

The resulting four-channel signals are equivalent to an omnidirectional microphone (W) and three orthogonal bi-directional (commonly called figure-of-eight) microphones (X, Y, and Z). The channels can represent the pressure and the particle velocity of a given sound (see Figure 2.10). It is possible to transcode and manipulate the generated signal to change its orientation with a matrix multiplication in signal processing. Also, it is possible to decode the same encoded signal to a single sound source, to headphones, or to a multichannel array.

Figure 2.10: B-format components: omnidirectional pressure component W, and the three velocity components X, Y, Z. Extracted from Rumsey [256].

The limitation of first-order Ambisonics is spatial precision, since it is only effective at a point centered within a defined area. This limitation can be overcome with higher-order components. Adding a set of higher-order components improves the directionality. However, increasing the number of components will also increase the number of loudspeakers required to play higher-order Ambisonics. That means a more accurate sound field representation if the order is increased. The number of channels N for a periphonic Ambisonics of order m is \(N = (m + 1)^2\) for 3D reproduction and \(N = 2m + 1\) for 2D [65].
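As an illustration of Equation 2.9 (not a system used in this work), the sketch below encodes a mono signal into first-order B-format and decodes it to a hypothetical square array of four loudspeakers with a basic sampling (projection) decoder; the decoder type, the speaker angles, and the gain scaling are all assumptions made for the example:

```python
import numpy as np

def encode_foa(s, az_deg, el_deg=0.0):
    """Encode a mono signal into first-order B-format (Eq. 2.9)."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    w = s * 0.707                      # omnidirectional component
    x = s * np.cos(az) * np.cos(el)    # front-back figure-of-eight
    y = s * np.sin(az) * np.cos(el)    # left-right figure-of-eight
    z = s * np.sin(el)                 # up-down figure-of-eight
    return np.stack([w, x, y, z])

def decode_square(bformat, spk_az_deg=(45.0, 135.0, -135.0, -45.0)):
    """Basic sampling (projection) decoder for a horizontal square array;
    the overall gain scaling is left arbitrary in this sketch."""
    w, x, y, _ = bformat               # horizontal decoding discards Z
    az = np.radians(np.asarray(spk_az_deg))
    return np.stack([0.707 * w + np.cos(a) * x + np.sin(a) * y
                     for a in az])

fs = 48000
s = np.sin(2 * np.pi * 440 * np.arange(0, 0.01, 1 / fs))
feeds = decode_square(encode_foa(s, az_deg=45.0))   # shape (4, 480)
```

For a source at 45°, the loudspeaker at 45° dominates while the opposite one receives a small negative gain, so all four speakers remain active: exactly the positive-and-negative-weight behavior that distinguishes Ambisonics from VBAP above.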
2.3.2.3 Sound Field Synthesis

The objective of the techniques from Sound Field Synthesis remains the same as that of the techniques from the perceptually-motivated paradigm: spatial sound field reproduction. The perceptually motivated techniques are centered on the psychoacoustic effects of summing binaural cues that lead the listener to perceive a virtual source. On the other side, the Sound Field Synthesis techniques rely on the physical reconstruction of the original or simulated sound field over a specific area. The main techniques are the extension of Ambisonics reproduction to higher orders, called Higher-Order Ambisonics (HOA), and Wave Field Synthesis (WFS) [24, 25]. The HOA extends the order of classical Ambisonics and, therefore, the number of sound sources arranged in a spherical array. As the Ambisonics order increases, the perceived sound source direction accuracy also increases, although more loudspeakers are required [97].

An important distinction can be made between Ambisonics and HOA. Given the truncation possibility in Ambisonics, the method is treated as a soft transition from a perceptually based method to a physically based one. Although HOA utilizes the same principle as Ambisonics, it is classified under the sound field synthesis paradigm (physically based) along with WFS. The HOA limitations are reported in the literature by several studies [26, 27, 64, 73, 236, 299], especially the aliasing in frequency that leads to pressure errors and the sweet spot size [253].

The WFS formulation relies on Huygens' principle: a propagating wavefront at any instant is shaped as the envelope of spherical waves emanating from every point on the wavefront at the prior instant [281]; the principle is illustrated in Figure 2.11.

Figure 2.11: Illustration of Huygens' principle for a propagating wavefront.

A conceptual difference between WFS and HOA is that for HOA the sound characteristics are described at a point (or small area) inside the array, while in WFS the sound pressure that must be known is on the border of the reproduction area. A review and comparison of both methods and their compromises in terms of spatial aliasing errors and noise amplification is presented in [65]. Their findings indicated similar constraints for both methods. However, they are both characterized by the requirement of large arrays of loudspeakers. The HOA has been found to have a higher limit on the size of the central area. In contrast, the WFS has limitations regarding the distortion of higher frequencies (aliasing), depending on the number of loudspeakers. Regardless, as the scope of this thesis is to work with a small number of loudspeakers, these methods will not be thoroughly discussed.

2.3.3 Room acoustics

Different aspects are taken into account when describing the hearing experience of a human being in a space, for example, the individuality of auditory training, familiarity with the space, personal preferences, mood, fatigue, culture, and the spoken language [10, 32, 123, 169, 184, 190, 250]. However, there are similarities in the expressions used between sample groups. Such an effect is attributed to the similarity of the auditory-cognitive mechanism of human beings [40].

In architectural acoustics and room acoustics, studies of sound properties assume that a sound source and a receiver in a given space form a linear and time-invariant (LTI) system [45]. Thus, a complete LTI characterization of each source-receiver pair can be expressed by its impulse response in the time domain or its transfer function in the frequency domain [45, 293].

2.3.3.1 Room acoustics parameters

Objective parameters are essential in acoustic projects and in compositions of statistical models that aim to predict the human interpretation of acoustic phenomena [254]. Objective parameters derived from the LTI impulse response aim to create metrics that quantify subjective descriptors from numerous experiments [254]. The calculation and measurement of many objective parameters are described in an appendix to the International Organization for Standardization (ISO) standard 3382 [127].

2.3.3.2 Reverberation Time

The reverberation time (RT) measures the time it takes for the impulse response's sound pressure level to decrease to one-millionth of its maximum value, equivalent to a decline of 60 dB; it is also often referred to as RT60 or T60. Note that the reverberation time measures how fast the decay of sound energy occurs and not how long the reverberation lasts in the environment, which depends on the sound source power and the background noise. The RT was the first parameter studied, modeled, and understood, related to several subjective aspects of the human hearing experience in a room.
Today, this is considered the most critical parameter, although it is not enough to describe human perception completely. Wallace C. Sabine [258] initially described it through mathematical relations obtained by an empirical method, later developing the theoretical bases together with W. S. Franklin [86]. The analytical form of the reverberation time obtained by Sabine is given by the expression
\[ T_{60} = 0.161\, \frac{V}{S\bar{\alpha}} \ \ [\mathrm{s}] \tag{2.10} \]
where \(V\) is the volume of the room and \(S\bar{\alpha}\) represents the amount of absorption present in the environment; the absorption unit is named the Sabin in honor of Sabine. Subsequent models improved the calculation of the reverberation time by considering the evolution of the energy density and the sound absorption carried out by the air [74], the specular reflection of each sound wave [187, 271], the propagation path [148], and the triaxial arrangement of the different absorption coefficients [14, 82], among others.

In addition to statistical theory, T60 can be obtained from the measurement of the impulse response. In measurements, the T60 is obtained considering limitations regarding the background noise level and the sound source's maximum sound pressure level. Thus, according to the ISO 3382 standard [127], the measurement's dynamic range must present the end of the decay at least 15 dB above the background noise and start 5 dB below the maximum. For example, the sound pressure level required to measure the T60 in a room with a background noise of 30 dB is 110 dB (30 + 15 + 60 + 5).

Linear behavior is noted by observing the square of the energy \(h^2(t)\) in the decay curve plotted in dB (see Figure 2.12). Thus, to reduce the dynamic range required for measurement, it is possible to estimate the T60 through other limits. The T60 is commonly mistaken for double the T30, which is not true. The T20 and T30 also correspond to the time the sound pressure level (SPL) inside the room takes to drop 60 dB, but estimated from measurements restricted to the ranges -5 dB to -25 dB and -5 dB to -35 dB, respectively. Therefore, a linear energy decay produces the relation T60 = T30 = T20.

Figure 2.12: Normalized room impulse response: example from a real room in the time domain (left) and in the time domain in dB (right).

The T20 is obtained as the decay rate by the linear least-squares regression of the measured decay curve, also called the Schroeder curve, in the range -5 dB to -25 dB. In comparison, the T30 is obtained when the curve fitting is carried out in the range between -5 dB and -35 dB [127].
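As a rough illustration of this estimation (a minimal sketch, not the measurement procedure used later in this work), the Schroeder curve can be obtained by backward integration of \(h^2(t)\), and T20/T30 follow from least-squares fits over the ranges just given; the synthetic decaying-noise impulse response below is an assumed stand-in for a measured one:

```python
import numpy as np

def schroeder_curve_db(ir):
    """Backward integration of h^2(t) (Schroeder curve), in dB,
    normalized to 0 dB at the start of the response."""
    edc = np.cumsum(ir[::-1] ** 2)[::-1]
    return 10 * np.log10(edc / edc[0])

def decay_time(ir, fs, lo_db, hi_db):
    """Extrapolated 60 dB decay time from a linear least-squares fit to
    the Schroeder curve between hi_db and lo_db (e.g., -5 and -25 dB)."""
    curve = schroeder_curve_db(ir)
    t = np.arange(len(ir)) / fs
    mask = (curve <= hi_db) & (curve >= lo_db)
    slope, _ = np.polyfit(t[mask], curve[mask], 1)   # dB per second
    return -60.0 / slope

# Synthetic decaying-noise IR with a known T60 of 0.5 s.
fs, t60 = 48000, 0.5
t = np.arange(0, 1.0, 1 / fs)
ir = np.random.default_rng(0).standard_normal(t.size) * 10 ** (-3 * t / t60)
print(decay_time(ir, fs, -25, -5))   # T20 estimate, close to 0.5 s
print(decay_time(ir, fs, -35, -5))   # T30 estimate, close to 0.5 s
```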
2.3.3.3 Clarity and Definition

The clarity and definition parameters express a balance between the energy that arrives earlier and later in the impulse response, which is related to human beings' particular ability to distinguish sounds in sequence [44, 45, 57, 247, 254]. With the first reflections arriving within the limits of 50 or 80 milliseconds, the tendency is for them to be integrated by the auditory system into the direct sound. Thus, if the first reflections contain relatively greater energy than the reverberant tail, the sound will be experienced as amplified. On the other hand, if the reverberant tail has more energy and is long enough, it will be perceived and mask the next direct sound. The limits of 50 and 80 milliseconds are defined in the literature as appropriate for optimizing speech and music, respectively [245, 247].

The Clarity defined in the ISO 3382 standard measures the ratio between the energy in the first reflections and the energy in the rest of the impulse response. Positive Clarity values, which are given in dB, mean more energy in the first reflections. Negative values indicate more energy in the reverberant tail. A null value indicates balance between the parts of the impulse response. The Clarity is given by
\[ C_{80} = 10 \log \left( \frac{\int_0^{80\,\mathrm{ms}} h^2(t)\,\mathrm{d}t}{\int_{80\,\mathrm{ms}}^{\infty} h^2(t)\,\mathrm{d}t} \right) \tag{2.11} \]
and
\[ C_{50} = 10 \log \left( \frac{\int_0^{50\,\mathrm{ms}} h^2(t)\,\mathrm{d}t}{\int_{50\,\mathrm{ms}}^{\infty} h^2(t)\,\mathrm{d}t} \right) \tag{2.12} \]

The Definition parameter, in turn, is presented on a linear scale and computes the ratio between the energy contained in the first reflections and the total energy of the impulse response. Values greater than 0.5 indicate that most of the impulse response's energy is contained in the first reflections. The Definition is given by
\[ D_{80} = \frac{\int_0^{80\,\mathrm{ms}} h^2(t)\,\mathrm{d}t}{\int_0^{\infty} h^2(t)\,\mathrm{d}t} \tag{2.13} \]
and
\[ D_{50} = \frac{\int_0^{50\,\mathrm{ms}} h^2(t)\,\mathrm{d}t}{\int_0^{\infty} h^2(t)\,\mathrm{d}t} \tag{2.14} \]

2.3.3.4 Center Time

The center time is a parameter analogous to the previous ones, measuring the balance between the energy contained in the early reflections and the energy of the reverberant tail. However, the center time is particularly interesting in pointing out what can be seen as the center of gravity of the squared impulse response. Moreover, the center time does not require a predefined transition boundary between the first reflections and the reverberant tail. Thus, the center time of an impulse response is defined by
\[ t_s = \frac{\int_0^{\infty} t\, h^2(t)\,\mathrm{d}t}{\int_0^{\infty} h^2(t)\,\mathrm{d}t} \tag{2.15} \]
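Since all three parameters are simple energy ratios of the sampled impulse response, they can be computed together; the sketch below is an assumed illustration (again using a synthetic decaying-noise response rather than a measured one):

```python
import numpy as np

def early_late_parameters(ir, fs, limit_ms=50):
    """Clarity (Eq. 2.12), Definition (Eq. 2.14), and center time
    (Eq. 2.15) computed from a sampled impulse response."""
    k = int(fs * limit_ms / 1000)      # sample index of the 50/80 ms limit
    e = ir.astype(float) ** 2          # squared impulse response h^2(t)
    t = np.arange(len(ir)) / fs
    clarity = 10 * np.log10(e[:k].sum() / e[k:].sum())   # dB
    definition = e[:k].sum() / e.sum()                   # linear, 0 to 1
    center_time = (t * e).sum() / e.sum()                # seconds
    return clarity, definition, center_time

# Synthetic decaying-noise IR as a stand-in for a measured response.
fs = 48000
t = np.arange(0, 1.0, 1 / fs)
ir = np.random.default_rng(1).standard_normal(t.size) * 10 ** (-3 * t / 0.5)
c50, d50, ts = early_late_parameters(ir, fs, limit_ms=50)
```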
2.3.3.5 Parameters related to spatiality

The relation between the human auditory spatiality sensation and the objective parameters derived from measurements is studied in detail in the literature [20, 21, 88]. These studies observe how the sound energy distribution is arranged in terms of direction and timing. The principal sensations and their related parameters are presented here for better understanding.

Apparent Source Width: The Apparent Source Width (ASW) is related to the impression of the sound source's size or how the source is distributed in space. An objective metric associated with ASW is the Lateral Energy Fraction (LEF), given by
\[ \mathrm{LEF} = \frac{\int_{5\,\mathrm{ms}}^{80\,\mathrm{ms}} h_b^2(t)\,\mathrm{d}t}{\int_0^{80\,\mathrm{ms}} h^2(t)\,\mathrm{d}t} \tag{2.16} \]
where \(h(t)\) is the impulse response measured with a microphone that has an omnidirectional sensitivity pattern and \(h_b(t)\) is the impulse response measured with a microphone that has bidirectional sensitivity (pressure gradient) at the same position as the omnidirectional one. Thus, this objective parameter represents the ratio between the lateral energy that reaches the receiver between 5 and 80 milliseconds (i.e., the energy contained in the early reflections, excluding the direct sound) and the total energy arriving from all directions between 0 and 80 milliseconds [21]. As low and mid frequencies make the dominant contributions to the LEF, this parameter is usually represented by the arithmetic mean of the octave-band values obtained between 125 Hz and 1000 Hz [45, 254].

Listener Envelopment: The Listener Envelopment (LEV) is related to the impression of being immersed in the room's reverberant field. In Bradley and Soulodre's experiments [44], the sense of envelopment was assessed with loudspeakers and test participants inside an anechoic room. The authors found the LEV to be associated with the ratio between the lateral energy and the total energy reaching the receiver. The lateral energy is contained in the impulse response measured with a bidirectional microphone after the first 80 milliseconds. The total energy is defined from the impulse response measured with an omnidirectional microphone, in free-field conditions, 10 meters away from the sound source, utilizing the same sound source at the same power. The ratio is called "Lateral Strength" (LG) and is given by
\[ \mathrm{LG} = \frac{\int_{80\,\mathrm{ms}}^{\infty} h_b^2(t)\,\mathrm{d}t}{\int_0^{\infty} h_{10}^2(t)\,\mathrm{d}t} \tag{2.17} \]

Interaural Cross-Correlation Coefficient: In his work, Keet [133] proposed an auditory-cognitive process relating the spatial impression to the comparison of the signals received by both ears. The cross-correlation function measures the degree of similarity of the signals. Therefore, the Interaural Cross-Correlation Coefficient (IACC) was incorporated as a third parameter related to the spatial impression. The IACC is defined as the absolute maximum value of the ratio between the cross-correlation function of the impulse responses collected from the left ear (\(h_L(t)\)) and the right ear (\(h_R(t)\)) and the total energy contained in each of them:
\[ \mathrm{IACC} = \max_{\tau} \frac{\left| \int_{t_1}^{t_2} h_L(t)\, h_R(t+\tau)\,\mathrm{d}t \right|}{\sqrt{\int_{t_1}^{t_2} h_L^2(t)\,\mathrm{d}t \int_{t_1}^{t_2} h_R^2(t)\,\mathrm{d}t}} \tag{2.18} \]
where \(\int_{t_1}^{t_2} h_L^2(t)\,\mathrm{d}t\) and \(\int_{t_1}^{t_2} h_R^2(t)\,\mathrm{d}t\) are the energies between the instants \(t_1\) and \(t_2\) in the impulse responses from the left and right ears; the expression \(\int_{t_1}^{t_2} h_L(t)\, h_R(t+\tau)\,\mathrm{d}t\) is the cross-correlation function between the impulse responses; and \(\tau\) is evaluated between -1 and +1 ms.
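A minimal sketch of Equation 2.18 follows (an illustration under assumed inputs, not an evaluation tool from this work); identical ear responses give the upper bound of 1, while uncorrelated noise stays low:

```python
import numpy as np

def iacc(h_left, h_right, fs, max_lag_ms=1.0):
    """IACC (Eq. 2.18): maximum absolute normalized cross-correlation of
    the two ear impulse responses within +/- 1 ms of lag."""
    max_lag = int(fs * max_lag_ms / 1000)
    norm = np.sqrt(np.sum(h_left ** 2) * np.sum(h_right ** 2))
    full = np.correlate(h_left, h_right, mode="full")
    center = len(h_right) - 1          # index of the zero-lag sample
    window = full[center - max_lag : center + max_lag + 1]
    return np.max(np.abs(window)) / norm

# Identical ear responses give IACC = 1; uncorrelated noise stays low.
rng = np.random.default_rng(2)
h = rng.standard_normal(4800)
print(iacc(h, h, fs=48000))                           # -> 1.0
print(iacc(h, rng.standard_normal(4800), fs=48000))   # -> well below 1
```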
2.3.4 Loudspeaker-based Virtualization in Auditory Research

Virtualization of sounds through the auralization of simulated environments has been used in architectural design to preview the sound behavior of rooms when changing the space design, or even to preview a completely new space before the building process [293]. As room acoustics simulation and the auralization process have evolved into a sound equivalent of the visual preview rendered from 3D models, they have found applications in research outside the architectural field as well [282, 320]. Lately, the virtualization of sound sources has been applied to extend the ecological validity of sound scenarios in auditory research [161]. Research that utilizes binaural virtualization with headphones is common in the auditory research literature [4, 38, 142, 248, 305]. Advantages may include, but are not limited to, the individual control of the stimuli reproduced in each ear, the smaller setup, and easier calibration [251]. Although binaural reproduction is a suitable method for some research questions, others may require a more complex test environment, especially research encompassing hearing aids.

In that regard, the use of loudspeakers can be associated with single-loudspeaker presentations, where a loudspeaker reproduces a single sound source positioned in space (e.g., [176, 230, 268, 321]), or with virtualization methods that manage auralized files to be perceived as single sources or complex environments [89, 177, 295].

For the virtualization of sound sources, the number of loudspeakers depends on the selected method of encoding and decoding the spatial information [293]. For example, a quadraphonic loudspeaker arrangement was found to be sufficient to reproduce a diffuse sound field with a perceptual spatial impression when constraining listener movements [117]. However, utilizing directional audio coding, Laitinen and Pulkki [150] found that an adequate reproduction of diffuse sound would require from 12 to 20 loudspeakers.

VBAP and HOA techniques were evaluated with different numbers of loudspeakers in simulations by Grimm et al. [97]. The perceptual localization error (PLE) was computed for arrays utilizing these techniques. Eight loudspeakers were estimated to be sufficient in terms of sound source localization. In the same work, Grimm et al. showed that the effects of virtualization with VBAP and HOA on hearing-aid beam patterns are present with fewer than 18 loudspeakers in a bandwidth of 4 kHz (spatial aliasing higher than the 5.7 dB criterion). However, the spectral distances, a weighted sum of the absolute differences in ripple and spectral slope between virtual and reproduced sound sources, were all very low, indicating high naturalness when compared to subjective data from Moore and Tan [191].

Aguirre [1] evaluated VBAP and its variation, Vector-Based Intensity Panning (VBIP), in terms of spatial accuracy with 30 normal-hearing participants within an array of eight loudspeakers. There was no significant difference among stimuli (speech, intermittent white noise, and continuous white noise) for either technique. An average PLE of around 4° was found, consistent with the values simulated by Grimm et al. [97].

Evaluating SNR benefits of hearing-aid beamformer algorithms within a spherical array with 41 loudspeakers, Oreinos and Buchholz [212] found similar results between the real environment and the auralized one. Reproduction errors in HOA reproduction for hearing aids were studied in [213]. Reverberation was found to reduce the time-averaged errors introduced by HOA, implying that the frequency limit of usable renderings with HOA can be extended in those environments.

Loudspeaker-based virtualization has been used in hearing research contexts evaluating normal-hearing listeners, hearing-impaired listeners, and hearing aid users through different methods [6-8, 30, 31, 55, 61, 80, 93, 102, 136, 168, 174, 188, 220, 303, 322]. Furthermore, some studies explored the ecological validity of the techniques with subjective responses based on psycho-linguistic measures, comparing in-situ scenarios with those virtualized in the laboratory [66, 103, 286].

The process of virtualizing sound sources using loudspeakers is complex [282] and requires a thorough understanding of physical acoustics, psychoacoustics, signal processing fundamentals, and proper calibration of software and hardware [126, 165, 293].
As a result, research centers have developed systems to establish reliable procedures for virtualizing sound sources for auditory testing. Examples of such systems include the transaural CTC system developed by Aspöck et al. [17]; the system with a spherical array of 42 loudspeakers capable of rendering scenarios using HOA up to fifth order and VBAP presented by Parsehian et al. [215]; and the Loudspeaker-Based Room Auralization (LoRa) system developed by Favrot [79], which is capable of rendering auditory scenes using pure HOA and a hybrid version with Nearest Speaker (NSP) and HOA. In addition, Grimm [100, 101] introduced the Toolbox for Acoustic Scene Creation and Rendering (TASCAR), which is capable of rendering perceptually plausible scenes in real time using VBAP and a 2D HOA implementation. A recent study by Hamdan and Fletcher [107] proposed, in 2022, a method using only two loudspeakers based on the transaural method with cross-talk cancellation. While this list is not exhaustive, these studies provide recommendations and guidelines for the field and highlight the importance of implementing reliable systems and verifying their sound fields objectively and subjectively to increase the ecological validity of auditory research and hearing aid development.

2.3.4.1 Hybrid Methods

Hybrid methods that combine the reproduction of direct sound and reverberation are not new, having been developed since at least the 1980s by the Ambiophonics group [42, 95]. They proposed the Ambiophonics method to reproduce concerts for one or two home listeners as if they were in the hall where the recording was performed. This method combined a crosstalk-cancelled stereo dipole with signals convolved with the IRs of the recorded spaces [76, 94]. The system aims to enhance the reproduction of recordings from existing systems (e.g., stereo and 5.1). The group also developed a new recording methodology called the Ambiophone. This method is a microphone arrangement composed of two head-spaced omnidirectional microphones covered by a baffle at the rear to favor room reflections from frontal directions.

In 2010, the Loudspeaker-Based Room Auralization (LoRa) method developed by Favrot [79] applied the hybrid concept using HOA and the nearest speaker (NSP) for the direct sound and early reflections. The method uses the envelope from simulated rooms to reduce the computational cost by multiplying it with uncorrelated noise. The scheme was originally conceptualized for a large spherical 69-loudspeaker array. Figure 2.13 depicts its system schematic.

Figure 2.13: LoRa implementation processing diagram. The multichannel RIR is derived in eight frequency bands and for each part of the input RIR (figure from Favrot [79]).

Pelzer et al. [221] presented a comparison between transaural or cross-talk cancellation (CTC), VBAP, and fourth-order Ambisonics and two new hybrid proposals: (1) direct sound and early reflections through CTC and late reflections with fourth-order Ambisonics, and (2) direct sound and early reflections through VBAP and late reflections with fourth-order Ambisonics. The hybrid methods were implemented in a single case without generalization to different simulations.
These methods were tested within a 24-loudspeaker array, with no statistically significant change in human localization performance caused by any of the methods. Pausch et al. [217] presented a method designed for investigations with subjects with hearing loss. The method mixes binaural techniques, to process components in complex simulated environments, and CTC, to present them over loudspeakers. At the same time, the head position can be tracked, allowing user interaction.

In 2017, Pulkki et al. [243] presented first-order Directional Audio Coding (DirAC), a technique for reproducing spatial sound over a standard stereo audio system. It is based on first-order Ambisonics channels, which encode the sound pressure and particle velocity at a listener's location to represent the sound field. These channels are transformed into a stereo audio signal using a frequency-dependent matrix, which preserves the spatial cues that are important for localizing sound sources. The method infers the direction of arrival of the sound source in order to virtualize it through amplitude panning, and it works with real-world recordings. The DirAC method is effective for various types of audio content, including music, speech, and sound effects. It can potentially improve the spatial realism of audio experiences over traditional stereo systems and has applications in myriad fields, including entertainment, gaming, and virtual reality.

Table 2.1 presents an overview of the listed methods and the techniques involved, their purpose, and their parameters.

Table 2.1: Non-exhaustive overview of hybrid auralization methods proposed in the literature. The A-B order of the techniques does not represent any order of significance.

Year | Method                      | Authors             | Technique A           | Technique B | Proposed Loudspeaker Number | Proposed to
1986 | Ambiophonics                | Farina et al. [76]  | Crosstalk Cancelation | Binaural    | 2   | Music Reproduction
2005 | DirAC                       | Pulkki et al. [243] | Ambisonics            | VBAP        | 2+  | Multiple applications
2010 | LoRa                        | Favrot [79]         | HOA                   | NSP         | 64  | -
2014 | -                           | Pelzer et al. [221] | Crosstalk Cancelation | HOA         | 24  | -
2014 | -                           | Pelzer et al. [221] | VBAP                  | HOA         | 24  | -
2018 | Extended Binaural Real-Time | Pausch et al. [217] | Binaural              | CTC         | 2   | Hearing Loss Investigations

2.3.4.2 Sound Source Localization

A comparison between VBAP and Ambisonics conducted by Frank [84] demonstrated a median deviation of the experimental results from the ideal localization curve of 2.35° ± 2.93° for VBAP and 1.05° ± 4.07° for third-order Ambisonics using a max-rE decoder. The setup was placed in a typical non-anechoic studio, with a regular array of 8 loudspeakers on a circle of 2.5 m radius and listening at the central position. The subjective results were obtained from 14 participants listening to pink noise. These experimental results were compared to a localization model (Lindemann [157]) based on ITD and ILD from impulse responses. The model results showed a deviation close to the standard deviation of the subjective listening results: 2.35° for VBAP and 3.37° for third-order Ambisonics using max-rE. Off-center measurements were pointed out by the author as necessary for future investigation.

Ambisonics in first, third, and fifth order was examined in another study by Frank and Zotter [85], with 15 normal-hearing participants, a 12-loudspeaker setup, and pink noise with interval attenuation as the stimulus.
This study investigated the effect of the listening position (centered and off-center) and the Ambisonics order. The results showed, for first-order rendering, a localization error of around 5° for the centered listener and 30° for the off-center position.

Also, Ambisonics in the first order with four loudspeakers and in the third order with eight loudspeakers was investigated by Stitt et al. [283]. This study was conducted in a non-reverberant environment to verify the effects of the off-center position and the Ambisonics order. The setup was a circular array with a 2.2-meter radius in a room with an RT of 0.095 s. Eighteen test participants listened to white noise bursts of 0.2 s. In this acoustically dry condition, the centered first-order median absolute error was around 10°, while in the off-center positions tested it was close to 30°. As expected, the error was lower in the third order, achieving a median absolute error of around 8° in the center and 11° off-center.

A study by Laurent et al. [275] investigated the effect of 3D audio reproduction artifacts on hearing devices, assessing ITD, ILD, and DI for HOA (third and fifth orders), VBAP, distance-based amplitude panning (DBAP), and multiple-direction amplitude panning (MDAP). The study was conducted in a non-anechoic room with 32 loudspeakers in a spherical configuration. The loudspeaker distance from the center was 1.5 m, except for the four loudspeakers at the top, which were at only 98 cm. This study investigated centered and off-center positions (10 and 20 cm). The results showed an expected limitation of Ambisonics in reproducing ITD because of spatial aliasing at high frequencies, according to the authors. In addition, they investigated an MVDR monaural beamformer, which did not reproduce the correct ITD, especially off-center. At the centered position, only DBAP could not correctly reproduce ITD. Ambisonics ITDs deteriorate more than those of VBAP at off-center positions. ILD errors in virtualized sound sources can make a system unreliable for testing hearing-impaired listeners with processing based on ILDs. In the experiment, the ILDs were less affected by beamforming processing in VBAP, and Ambisonics benefited from the max-rE decoding that maximizes the energy vector. However, the authors expect a better ILD representation from VBAP, as HOA has an aliasing frequency limitation.

Hamdan and Fletcher [107] present the development of a compact two-loudspeaker virtual sound reproduction system for clinical testing of spatial hearing with hearing-assistive devices. The system is based on the transaural method with cross-talk cancellation and is suitable for use in small, reverberant spaces, such as clinics and small research labs. The authors evaluated the system's performance regarding the accuracy of sound pressure reproduction in the frontal hemisphere and found that it could produce virtual sound fields up to 8 kHz. They suggest that tracking the listener's position could improve the system's performance. Overall, the authors believe this system is a promising tool for the clinical testing of spatial hearing with hearing-assistive devices.

Finally, a study by Bates et al. [22] evaluated second-order Ambisonics and VBAP localization errors in subjective listening tests and through ITD and IACF comparisons.
They presented the stimuli to a simultaneous set of nine listeners at different positions inside a concert room (around 1 s of RT). With 16 loudspeakers, they presented 1-second excerpts of speech (male and female), white noise, and music. The results indicate that the VBAP and Ambisonics techniques cannot consistently create spatially accurate virtual sources for a distributed audience in a reverberant environment. The off-center positions are compromised, depending on technique and stimulus. Depending on the stimulus, centered positions resulted in localization errors between 10° and 20°. Regarding the spatial distribution inside the ring, a bias away from the target image position and towards the nearer contributing loudspeaker is more present in Ambisonics than in VBAP. The authors mentioned that the room acoustics could also impact localization accuracy.

The number of variables across these previous studies and their contributions is massive, e.g., objective measures, technique variations, number of loudspeakers, loudspeaker distance, number of simultaneous listeners, reverberation time, and form of the array. Table 2.2 presents an overview of methods and estimated or measured localization errors.

Table 2.2: Overview of localization error estimates or measurements from loudspeaker-based virtualization systems using various auralization methods.

Study                     | Method                  | Error at center position                    | Error at off-center position                | LS number
Present Study (Iceberg)   | VBAP/Ambisonics         | 30° (max estimated), 7° (average estimated) | 30° (max estimated), 7° (average estimated) | 4
Frank [84]                | VBAP                    | 2.35° (average)                             | N/A                                         | 8
Frank [84]                | HOA (3rd order)         | 3.37° (average)                             | N/A                                         | 8
Zotter [85]               | Ambisonics              | 5° (median)                                 | 30° (median)                                | 12
Zotter [85]               | HOA (3rd order)         | 2° (median)                                 | 15° (median)                                | 12
Zotter [85]               | HOA (5th order)         | 1° (median)                                 | 10° (median)                                | 12
Stitt et al. [283]        | Ambisonics              | 10° (median)                                | 30° (median)                                | 8
Stitt et al. [283]        | HOA (3rd order)         | 8° (median)                                 | 11° (median)                                | 8
Bates et al. [22]         | Ambisonics (2nd order)  | 10° (mean)                                  | 20° (mean)                                  | 16
Bates et al. [22]         | VBAP                    | 10° (mean)                                  | 20° (mean)                                  | 16
Grimm et al. [97]         | HOA (3rd order)         | 2° (estimated)                              | 6° (estimated)                              | 8
Grimm et al. [97]         | VBAP                    | 4° (estimated)                              | 6° (estimated)                              | 8
Aguirre [1]               | VBAP                    | 4° (median)                                 | N/A                                         | 8
Hamdan and Fletcher [107] | CTC                     | 2° (max head displacement)                  | N/A                                         | 2
Huisman et al. [125]      | Ambisonics              | 30° (median)                                | N/A                                         | 4
Huisman et al. [125]      | HOA (3rd order)         | ≈15° (median)                               | N/A                                         | 8
Huisman et al. [125]      | HOA (5th order)         | ≈8° (median)                                | N/A                                         | 12
Huisman et al. [125]      | HOA (11th order)        | ≈5° (median)                                | N/A                                         | 24

2.4 Listening Effort Assessment

The everyday task of following a conversation, listening to a person's speech, or interacting with someone in a conversation may require additional effort in an unfavorable or challenging sound environment [227]. Listening effort is defined as "the deliberate allocation of mental resources to overcome obstacles in goal pursuit when carrying out a [listening] task" [224]. Studying aspects of listening effort related to different acoustic situations through reliable methods can lead to the development of solutions to reduce it, improving quality of life [304]. However, there is no consensus in the literature on the best method to measure listening effort.
Attempts to measure how much energy a person expends in a specific acoustic situation may rely on different paradigms. The literature contains objective measurements of physiological parameters associated with changes in effort, such as pupil dilation [151, 209, 211, 301, 302, 319], frequency-following responses (FFRs) from the brainstem and cortical electroencephalogram (EEG) activity from event-related potentials [28, 33], or alpha band oscillations [186, 223]. In addition, the behavioral perspective studies changes in response time in single-task [204] or dual-task paradigm tests, also assuming that these are related to changes in cognitive load in auditory tests [87, 225, 228]. In turn, subjective assessments of listening effort are performed through questionnaires [323] or effort scales [147, 149, 249, 260], and their results generally agree with performance metrics [192].

Although subjective measurements are intuitive and valid, they tend to be less accepted as an indication of the amount of listening effort because of differences between objective and subjective outcomes [151, 225]. For instance, Zekveld and Kramer [318] present evidence of disagreement between the physiological and the subjective measure, where young normal-hearing participants attributed high subjective effort to the most challenging conditions despite their smaller pupil dilation. The authors assumed that methodological aspects and the participants' tendency to give up were also related to pupil dilation at low levels of intelligibility. In a study on syntactic complexity and noise level in auditory effort, Wendt et al. [300] evaluated effort through self-rated effort and pupil dilation. They found both background noise and syntactic complexity reflected in their measurements. However, at high levels of intelligibility, the methods show different results. According to the authors, the explanation is that each measure represents a different aspect of effort. In turn, Picou et al. [226] and Picou and Ricketts [229], using response time in a dual task as a behavioral measure, found that subjective ratings of listening effort were correlated with performance rather than with listening effort. Interestingly, though, in this study a question about control was correlated with the response-time results. The varied outcomes from the subjective and objective paradigms proposed as proxies for listening effort can indicate that these methods are quantifying separate aspects of a complex global process [12, 224].

Another explanation suggests a bias in the subjective method due to heuristic strategies adopted by the participants to minimize effort [192]. The mentioned strategy would consist of replacing the question about the amount of effort spent with a more straightforward question related to how they performed in the task. Concomitantly, studies based on objective measurement paradigms also have divergent results. For example, even physiological measures sensitive to the spectral content of stimuli, such as pupil dilation and alpha power, are not always related and can be sensitive to different aspects of listening effort [186]. Even within the same paradigm, a different task may indicate that different aspects are being observed.
For example, Brown and Strand [53] analyzed the role of working memory as a weighting factor on listening effort. Although increasing background noise indeed increases listening effort as measured by the dual-task paradigm, the memory load was not affected. They also suggested that working memory and listening effort are related in the recall-based single task, unlike in the dual task. In Lau et al. [151], significant differences between sentence recognition and word recognition were found in pupil dilation measurements and in subjective ratings, although with no correlation between the objective and subjective measures. The demand for mental resources can also be affected by personal factors, such as fatigue and motivation [224].

At the same time, several physical-acoustical artifacts can degrade a sound, creating or leading to difficulties in everyday communication (increasing listening effort), especially in social situations. The masking noise, the spectral content of the noise, the signal-to-noise ratio (SNR), and the environment reverberation are examples of artifacts capable of smearing the temporal envelope cues [163]. Speech intelligibility was also assessed in a virtual environment consisting of a large spherical array of 64 loudspeakers reproducing Mixed-Order Ambisonics (MOA) [6], which presented Speech Reception Thresholds (SRTs) comparable to those of a real room in a co-located situation of masker and target. With a spatial separation of 30 degrees, the virtual environment led to an SRT benefit of 3 dB; it was argued that this benefit was not present in more reverberant or complex scenes, suggesting a stronger masking effect in more challenging scenes. SRTs for normal-hearing listeners and for hearing-impaired listeners using hearing aids were also investigated by [31]. A complex scenario (a reverberant cafeteria) and an anechoic situation were evaluated in a spherical array of 41 loudspeakers. The virtualization was provided by convolving the direct sound and early reflection parts of the RIR with the anechoic sentence and presenting the sound through the Nearest Speaker (NSP), while the late reflection part of the RIR was created through the directional envelope of each loudspeaker with uncorrelated noise.

The reviewed studies were conducted in laboratories, mainly taking advantage of spatial sound and virtual acoustics via loudspeaker or headphone reproduction. Thus, the complex nature of human auditory phenomena and the importance of reproducibility in hearing research highlight the need for innovative tools such as spatial sound [134]. Virtualized sound allows for realistic and controllable sound environments, enabling control over selected parameters and consistent reproduction of experiments [61, 161, 282, 293]. This technology can help hearing investigations become more true-to-life and reliable [134, 161, 251]. For example, it can be used to study listening effort and speech intelligibility using virtual sound sources to create ecologically valid and controlled environments [7, 177]. It can also enable the integration of virtual sound scenarios with ecological tasks involving multiple people, providing an ecologically valid assessment of the performance of hearing solutions that is more accessible than large field studies (e.g., in Bates et al. [22]).
Additionally, spatial audio enables the accessible investigation of the effects of spatial separation on binaural cues in different environments, the role of binaural hearing in spatial perception, and new hearing aid hardware and algorithms [61, 97, 213]. Overall, spatial sound and virtual acoustics in hearing research offer numerous benefits and represent a valuable tool for advancing our understanding of hearing and for developing effective hearing solutions.

2.5 Concluding Remarks

The literature review suggests a contrast between localization and immersion in auralization methods that virtualize sound using a low number of loudspeakers. Thus, there is a need for a method that can achieve useful performance on both localization and immersion with a small number of loudspeakers and that is reliable in rendering sound for listeners in the presence of another listener within the virtualized sound field. Previous methods, including hybrid approaches, have been developed using larger numbers of loudspeakers and different techniques for balancing energy. A recent study, from 2022, proposed a method using only two loudspeakers; however, it implemented a different auralization method and had its limitations. The method proposed in this study is innovative in using a room acoustic parameter called center time to calculate the energy balance of room impulse responses and in combining it with two known auralization methods.
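For reference, center time is a standard room acoustic parameter (defined in ISO 3382-1) corresponding to the temporal center of gravity of the squared room impulse response p(t):

\[
t_s = \frac{\int_0^{\infty} t\, p^2(t)\, \mathrm{d}t}{\int_0^{\infty} p^2(t)\, \mathrm{d}t}
\]

Because it summarizes, in a single number, where the energy of the impulse response is concentrated in time, it lends itself to splitting a RIR into early and late energy, which is presumably the role it plays in the energy balance of the proposed method.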
Chapter 3

Binaural cue distortions in virtualized Ambisonics and VBAP

3.1 Introduction

In acoustics, complex communication scenarios can involve simultaneous sound sources, distracting background noise, moving sound sources, sources without large spatial separation, and low signal-to-noise ratios. Although people with normal hearing can deal with most of these conditions in a relatively efficient way, people with hearing loss perform poorly [273, 289, 317]. Since social events are often a real example of complex communication, the interaction barriers make people avoid such events and sometimes ostracize themselves [16, 63]. That can be a factor in decreasing the quality of life of people with hearing problems.

In hearing research, innovative signal processing techniques, new devices, more powerful hardware, and updated parameter settings are continuously developed and evaluated. These technological improvements aspire to resolve communication problems in everyday situations for hearing aid users [227], increasing their socialization and quality of life [119]. Tests such as speech recognition in noise are developed and tailored to evaluate the human auditory response in everyday acoustic situations better than clinical tests based on pure-tone stimulation [145]. Even though the tasks are moving towards a more realistic representation, they still need to improve in ecological validity [134].

Auralization methods are designed to create files meant to be reproduced for a specific listener or a group of listeners; these files contain particular characteristics that try to mimic a recorded or digitally created sound scene according to the method. For the psychoacoustically based methods, the mathematical formulations that produce these characteristics focus on delivering accurate binaural cues. The listener position and physical obstacles, as well as listener movement, will impact distinct methods and cues differently.

A virtual sound environment (VSE) is an auralized sound field that can contain realistic elements, such as high background noise, high reverberation, and concomitant sound events from different directions; currently, it is possible to create a VSE employing loudspeaker arrays or headphones [61, 79, 294]. Furthermore, through a VSE it is also possible to let a participant wear, for example, a hearing aid during the test. Thus, the researcher can maintain control of the stimuli, the incidence direction, the signal-to-noise ratio (SNR), and other settings, while examining the hearing device performance in a more ecological situation [98, 161, 269].

Although novel technologies emerge and contribute to emulating sound sources and even entire complex sound scenes with human social interaction [267], these opportunities are often overlooked in auditory evaluations. Typically, tests are performed by observing only one individual within the laboratory [81, 89, 104, 152, 169, 175]. Furthermore, the systems are designed to acquire responses from a single individual at a time [41, 79, 102, 118, 195, 218-220, 259]. A reasonable explanation for this is the low cost and complexity of auralization through headphones. More complex techniques, like Wave Field Synthesis, do not limit the listener to a restricted spot [207], reproducing a complete sound field, although at the cost of a large number of sound sources in a specifically treated room.

Social situations can have an effect on people's listening effort [230, 234] and on their motivation to listen [181, 224]. In this context, social interactions have been simulated through avatars or audiovisual recordings in virtual environments, gaining space in auditory research [116, 160, 161, 272, 298]. Although this can be considered a significant asset, it also focuses on a single individual's responses to simulated social stimuli.

This scenario creates the ground for this study to investigate controlled acoustical changes in the VSE. This study assesses two main situations within a ring of loudspeakers virtualizing sound sources with Ambisonics and VBAP: (1) the displacement of the listener from the center (sweet spot), and (2) the effect of including a second simultaneous listener inside the ring. These topics can help in understanding the perception of sound in these specific virtualization methods, increasing the fundamental scientific basis for future hearing research applications. The changes to the sound field were observed in three major spatial cues: ITD, ILD, and IACC. These were explored by changing the listener's position and by including a second listener inside the ring of loudspeakers while measuring BRIRs. These metrics can describe the spatial perception of an auralized sound signal [47, 48], with ITD and ILD responsible for localization and IACC for perceived spaciousness and listener envelopment [44]. Therefore, these measurements can indicate the possibility of having a simultaneous second participant in any hearing test with virtualized, spatially distributed sound sources. Two different auralization techniques were used to virtualize the sound sources: vector-based amplitude panning (VBAP) [241] and Ambisonics [91].
Both techniques rely on the same receptor-dependent psychoacoustic paradigm to provide an auditory sense of immersion for those with normal hearing [161, 180]. These techniques aim to deliver the correct binaural cues to a point or area to create a realistic spatial sound impression, albeit through different mathematical formulations. This work investigates whether the techniques can provide an appropriate spatial impression for young normal-hearing listeners.

Hypothesis

The main research question is how scenarios auralized with VBAP and Ambisonics are affected when the listener is displaced from the center and when another listener is inside the ring. The hypothesis is that localization cues can be better provided by VBAP, especially in off-center positions, whereas Ambisonics can provide a better sense of immersiveness. Also, the second listener would impact Ambisonics-virtualized sound sources more than VBAP ones.

3.2 Methods

The experiment was conducted in two different locations. The first is a sound-treated test room at Hearing Sciences - Scottish Section in Glasgow (see Figure 3.1); the second is an anechoic test room at Eriksholm Research Centre (see Figure 3.2). This section presents the rooms' acoustic characterizations and the methods used in this experiment.

Figure 3.1: Hearing Sciences - Scottish Section test room.

Figure 3.2: Eriksholm test room.

3.2.1 Setups and system characterization

The experiment conducted in Glasgow took place in a large sound-proof audiometric booth (4.3 × 4.7 × 2.9 m; IAC Acoustics). An azimuthal circular array configuration of 24 loudspeakers (3.5-m diameter; 15° separation; Tannoy VX6) was used. The ceiling and walls were covered with 100-mm deep acoustic foam wedges to reduce reflections; the floor was carpeted with a foam underlay. The AD/DA audio interface was a Ferrofish Model A32, and the loudspeakers received signals amplified by ART SLA4 amplifiers. The reference microphone used to characterize the Glasgow test room was a 1/2" G.R.A.S. 40AD pressure-field microphone with a G.R.A.S. 26CA preamplifier. It was oriented 90 degrees vertically from the sound source.

At Eriksholm, an equivalent setup was fitted, this time in a fully anechoic room from IAC Acoustics. The room's outer dimensions are 6.7 × 5.8 × 4.9 m, and its inner dimensions, measured from the tips of the foam wedges, are 4.3 × 3.4 × 2.7 m. An azimuthal circular array configuration of 24 active loudspeakers (16 Genelec 8030A and 8 Genelec 8030C; 2.4-m diameter; 15° separation) was used. The AD/DA was a MOTU PCI-e 424 combined with a FireWire 24-channel audio extension. The reference microphone used to characterize the Eriksholm test room was a 1/2" B&K 4192 pressure-field microphone with a type 2669 preamplifier, supplied by a type 5935 power module. It was oriented 90 degrees vertically from the sound source. The signal acquisition and processing were performed entirely in Matlab 2020a using the ITA-Toolbox v.9 [29].

The technical setup was equivalent in both rooms: a B&K head and torso simulator (HATS) model 4128-C mannequin was used for the measurements, and a Knowles Electronics Mannequin for Acoustic Research (KEMAR) was used as a physical obstacle. Although technically both devices are head and torso simulators, in this thesis HATS will refer to the B&K 4128-C for simplicity. The sampling rate of the recordings was fixed at 48 kHz, resulting in an uncertainty of ±20 µs, therefore not compromising the final analysis.

3.2.1.1 Reverberation time

The reverberation time is one of the most critical objective parameters of a room [154]. The RT is characterized by the decay of the sound energy to 60 dB below its peak after the cessation of a sound source. The parameter is frequency-dependent and is associated with speech understanding, sound quality, and the subjective perception of the size of the room. For controlled environments, the values are fractions of seconds. The T60 for both rooms in third-octave bands is presented in Figure 3.3.

Figure 3.3: Reverberation time in third-octave bands up to 16 kHz.

The rooms' reverberation time T20 was measured using an arbitrarily chosen loudspeaker and the microphone setup described in Section 3.2.1. The measurement and analysis were performed in Matlab through the ITA-Toolbox software.
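As an illustration of this type of analysis, the following minimal Matlab sketch estimates T20 from a measured impulse response via Schroeder backward integration. It is a broadband simplification with an illustrative function name, not the ITA-Toolbox routine actually used (which also applies band filtering):

```matlab
function T20 = estimate_t20(h, fs)
% Estimate the reverberation time T20 from an impulse response h
% sampled at fs, using Schroeder backward integration.
edc = flipud(cumsum(flipud(h(:).^2)));   % backward-integrated energy
edc = 10*log10(edc / edc(1));            % energy decay curve in dB
t   = (0:numel(edc)-1).' / fs;           % time axis in seconds

% Linear fit of the decay between -5 dB and -25 dB (the T20 range),
% then extrapolation to the full 60 dB decay.
i1 = find(edc <= -5,  1, 'first');
i2 = find(edc <= -25, 1, 'first');
p  = polyfit(t(i1:i2), edc(i1:i2), 1);   % p(1) is the slope in dB/s
T20 = -60 / p(1);
end
```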
3.2.1.2 Early reflections

To ensure that there is no influence of the environment, Recommendation ITU-R BS.1116-3:2015 [126] determines that the magnitude of the first reflections should be at least 10 dB below the magnitude of the direct sound (∆SPL ≥ 10 dB). The SPL differences determined in the environments of this work met this requirement. Table 3.1 shows the difference in sound pressure level between the direct sound and the early reflections. The higher differences in the Eriksholm environment are consistent with its anechoic setup compared to the sound-treated booth in Glasgow, where the floor provides some energy to the reflections.

Table 3.1: Sound pressure level difference between direct sound and early reflections.

Angle [°] | ∆SPL Eriksholm [dB] | ∆SPL Glasgow [dB]
0   | -20.99 | -14.94
15  | -23.40 | -15.31
30  | -22.66 | -14.61
45  | -21.97 | -15.45
60  | -20.39 | -13.28
75  | -21.22 | -15.19
90  | -17.71 | -15.33
105 | -21.49 | -15.22
120 | -17.83 | -15.68
135 | -20.12 | -15.23
150 | -19.70 | -14.62
165 | -19.13 | -16.11
180 | -24.57 | -15.03
195 | -23.56 | -13.52
210 | -22.62 | -14.81
225 | -21.04 | -15.39
240 | -22.29 | -14.25
255 | -23.73 | -14.37
270 | -20.90 | -14.01
285 | -24.06 | -12.56
300 | -19.61 | -15.95
315 | -17.68 | -15.03
330 | -21.46 | -15.66
345 | -23.08 | -15.95

3.2.2 Procedure

The experiment studied how the presence of a second listener within a loudspeaker ring affects the spatial cues of the reproduced sound field. The data were collected through the HATS, and the second listener simultaneously inside the virtualized sound area was simulated through another mannequin (KEMAR), as shown in Figures 3.4 and 3.5.

Figure 3.4: HATS (with motion-tracking crown) and KEMAR inside the test room in Glasgow.

Figure 3.5: HATS and KEMAR inside the anechoic test room at Eriksholm.

Using the results for the reverberation time presented in Section 3.2.1, the appropriate length of a logarithmic sweep signal was calculated as approximately four times the largest value of T60, giving a sweep of 1.49 seconds. Also, a stop margin of 0.1 seconds was set to ensure the quality of the room impulse responses (RIRs) that were obtained [75, 194]. The frequency range of the sweep was from 50 Hz to 20 kHz.
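A minimal Matlab sketch of such an excitation signal, using the exponential (Farina-style) sweep formulation; the sweep used in the experiment was generated with the ITA-Toolbox, so this exact formulation is an assumption for illustration:

```matlab
fs = 48000;               % sampling rate used in the experiment
T  = 1.49;                % sweep length, about four times the largest T60
f1 = 50;  f2 = 20000;     % sweep frequency range in Hz
t  = (0:1/fs:T-1/fs).';

% Exponential (logarithmic) sweep.
L = T / log(f2/f1);
sweep = sin(2*pi*f1*L*(exp(t/L) - 1));

% Append a 0.1 s stop margin of silence so the full room decay is captured.
sweep = [sweep; zeros(round(0.1*fs), 1)];
```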
The position of the head has a significant effect on the signals that are measured. To have a reliable assessment of the absolute three-dimensional position of the HATS, its position was measured with a Vicon infrared tracking system with an accuracy of 0.5 mm in Glasgow. At Eriksholm, a laser tape measure was used to ensure the correct positions. In both experiments, the height of the microphones was set to match the geometrical center of the loudspeaker enclosures in all measurements.

The first position measured used the HATS in the center, without interference from another obstacle inside the ring, to provide a baseline. Figure 3.6a illustrates a set of positions used to study the influence of a second listener inside the ring while keeping the test subject in the center (the sweet spot). Three different positions for the KEMAR (50, 75, and 100 cm of separation) were measured with the HATS fixed at the center of the loudspeaker array. The data collected are from the microphones in the HATS ears; the KEMAR was only a physical obstacle to simulate a listener inside the ring. Figure 3.6b illustrates a different set of measured positions, maintaining a minimum separation of 50 cm between the centers of the heads. The purpose of these positions with the HATS off-center was to identify distortions caused by the decentralization of the subject and the effect of adding a listener within the circle of loudspeakers as a physical obstacle to the sound waves. The positioning was standardized so that movements along the x-axis to the left and right of the dummies were annotated as negative and positive, respectively.

Figure 3.6: HATS in gray, KEMAR in yellow. (a) Measured positions with the HATS centered and the KEMAR present in the room at different positions (three combinations). (b) Measured positions with the HATS at different positions and the KEMAR present in the room at different positions (nine combinations).

3.2.3 Calibration

To calibrate the HATS recordings, the adapter B&K UA-1546 was connected to the B&K 4231 calibrator. That provided a 97.1 dB SPL signal, which corresponds to 1.43 Pa, instead of 94 dB without the adapter. The recorded signal from each ear was used to calibrate the levels of all measurements. The calibration factors were calculated as:

\[
\alpha_{l,\mathrm{rms}} = \frac{1.43}{\mathrm{rms}\!\left(v_l(t)_{1\,\mathrm{kHz}}\right)}\ \frac{\mathrm{Pa}}{\mathrm{VFS}}, \tag{3.1a}
\]
\[
\alpha_{r,\mathrm{rms}} = \frac{1.43}{\mathrm{rms}\!\left(v_r(t)_{1\,\mathrm{kHz}}\right)}\ \frac{\mathrm{Pa}}{\mathrm{VFS}}, \tag{3.1b}
\]

where α_{l,rms} is the calibration factor for the left ear, α_{r,rms} is that for the right ear, v_l(t) is the calibrator signal recorded in the left ear, and v_r(t) is that for the right ear.

The sound pressure levels of the individual loudspeakers for the same file can differ depending on several factors (e.g., the amplification system's level). To balance that, a factor was then measured with a G.R.A.S. 1/2" pressure-field microphone recording a pistonphone's calibrated 1 kHz sound signal. The calibration factor α_rms was calculated from the root mean square (RMS) using:

\[
\alpha_{\mathrm{rms}} = \frac{10}{\mathrm{RMS}\!\left(v(t)_{1\,\mathrm{kHz}}\right)}\ \frac{\mathrm{Pa}}{\mathrm{VFS}}, \tag{3.2}
\]

where v(t)_{1 kHz} is the sinusoidal signal at 10 Pa recorded from the calibrator, in volts full scale (VFS).

The loudspeaker correction factor is calculated through an iterative process that starts by reproducing an RMS-scaled version of a pink noise signal at 70 dB SPL:

\[
\widetilde{\mathrm{pink}}(t) = \frac{\mathrm{pink}(t)}{\mathrm{rms}\!\left(\mathrm{pink}(t)\right)}\, 10^{\frac{70 - \mathrm{dBperV}}{20}}\, \Gamma_l, \tag{3.3}
\]

where Γ_l is the level factor for loudspeaker l, with initial value 1, and dBperV = 20 log10(α_rms / 20 µPa).

The pink noise signal is played through loudspeaker l and simultaneously recorded with the microphone as S_l(t); the SPL of the recorded signal is calculated as follows:

\[
\mathrm{SPL}_l\,[\mathrm{dB}] = 20 \log_{10}\!\left(\frac{S_l(t)\,[\mathrm{VFS}]\ \alpha_{\mathrm{rms}}\,[\mathrm{Pa/VFS}]}{20\,[\mu\mathrm{Pa}]}\right). \tag{3.4}
\]

Ten measurements are performed sequentially, with intervals of 1 second; a further iteration happens if the obtained SPL exceeds the tolerance of 0.5 dB on any of the measurements. A step of ±0.1 VFS is used to update Γ_l for the next iteration according to the SPL obtained.
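A condensed Matlab sketch of this iterative level adjustment for one loudspeaker; the play_and_record helper and the pinknoise generator stand in for the actual playback chain and are assumptions, as are the loop details beyond the tolerance and step stated above:

```matlab
% v_1kHz: recorded calibrator tone in VFS, assumed in the workspace.
alpha_rms = 10 / rms(v_1kHz);                % Pa per VFS, Eq. (3.2)
dBperV    = 20*log10(alpha_rms / 20e-6);     % dB SPL of one full-scale unit
target    = 70;                              % target level in dB SPL
Gamma     = 1;                               % initial level factor
s = pinknoise(5*48000);  s = s / rms(s);     % RMS-scaled pink noise

while true
    x   = s * 10^((target - dBperV)/20) * Gamma;      % Eq. (3.3)
    spl = zeros(10, 1);
    for k = 1:10                             % ten sequential measurements
        y      = play_and_record(x);         % hypothetical I/O helper
        spl(k) = 20*log10(rms(y)*alpha_rms / 20e-6);  % Eq. (3.4)
        pause(1);                            % 1 s interval between runs
    end
    if all(abs(spl - target) <= 0.5), break; end      % 0.5 dB tolerance
    Gamma = Gamma - 0.1*sign(mean(spl) - target);     % +/- 0.1 VFS step
end
```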
3.2.4 VBAP Auralization

In the first measurement, VBAP was the technique used to auralize the files. The first step in the signal processing was recording the 24 RIRs, one from each loudspeaker. Knowing the RT of the room, a sweep (50-20000 Hz) was created fulfilling the length requirement, in this case a logarithmic sweep of 1.49 seconds. After that, an inverse (minimum-phase) filter was created to compensate for the frequency responses of the different loudspeakers. The signal was then processed through the VBAP technique for the specified array of 24 loudspeakers. The output is a file with 24 channels containing the sweep signal appropriately weighted for the specific angle. The signal can be processed through a single channel (when the angle to be played is at a loudspeaker position) or through up to two combined channels when it is at a virtual loudspeaker position. Each channel was also convolved with the designed filter. The final (auralized) signal was used as the excitation in the transfer function measurement, where the receptors were the pair of microphones in the B&K HATS.
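To make the weighting step concrete, the following is a minimal sketch of the pairwise 2D VBAP gain computation for a horizontal ring, following Pulkki's formulation; the function name and the power normalization convention are illustrative, not the experiment's implementation:

```matlab
function g = vbap2d_gains(theta, ls_az)
% Pairwise 2D VBAP gains for a virtual source at azimuth theta (deg)
% over a horizontal ring with loudspeaker azimuths ls_az (deg).
g = zeros(numel(ls_az), 1);
p = [cosd(theta); sind(theta)];              % unit vector to the source

[az, idx] = sort(mod(ls_az(:), 360));        % sorted ring azimuths
N = numel(az);
for k = 1:N
    k2   = mod(k, N) + 1;                    % neighboring loudspeaker
    span = mod(az(k2) - az(k), 360);
    if mod(theta - az(k), 360) <= span       % arc containing theta
        L = [cosd(az(k)) cosd(az(k2));       % loudspeaker base vectors
             sind(az(k)) sind(az(k2))];
        w = L \ p;                           % invert the base: w = L^-1 p
        w = w / norm(w);                     % power normalization
        g(idx(k))  = w(1);
        g(idx(k2)) = w(2);
        return
    end
end
end
```

For the ring used here, vbap2d_gains(37.5, 0:15:345) returns nonzero gains only for the loudspeakers at 30° and 45°, while vbap2d_gains(30, 0:15:345) collapses to a single active channel, matching the single-channel versus two-channel behavior described above.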
3.2.5 Ambisonics Auralization

In the second measurement, at the Eriksholm test room, the files were auralized with first-order Ambisonics in a manner similar to the VBAP case. Some adaptations were required to process the excitation signal and acquire the impulse responses. In this case, the Ambisonics auralization process requires an encoded impulse response that contains the magnitude and the direction-of-incidence information for each instant of time. Such an RIR can be attained via computer simulation or recorded with a specific array of microphones. The ODEON software, version 12.15, was used to simulate the sound behavior in an anechoic environment and encode the impulse responses in first-order Ambisonics format around the listener.

The Odeon software is based on a hybrid numeric method [59]. In general, the image-source method, a deterministic method, is favored in the region of the first reflections up to an order predetermined by the user. Reflections of orders beyond the predetermined transition order are then calculated using ray tracing, a stochastic method [148, 201]. It is therefore possible to simulate the sound behavior from a 3D model description of the space and details of its acoustic properties. From the simulation result, any music or sound can be exported as if recorded inside that space at the given positions of source and receptor [288]. Another option is to export the room impulse response, which represents the sound behavior for the given source-receptor positions. In version 12 of the Odeon software, the RIR can also be exported as a BRIR and as first- or second-order Ambisonics.

The materials selected to compose the simulation, and their corresponding absorption coefficients used in the ODEON simulation, are listed in Appendix E. In total, 72 different RIRs (5 degrees of separation) were simulated for different source-receptor positions. The simulated source positions were at the same distance of 1.35 meters from the center as the loudspeakers in the anechoic room. These RIRs were convolved with the appropriate sweep signal, producing four-channel first-order Ambisonics sweep signals. These signals were then processed by a decoder for the loudspeaker array's specific positions, generating the auralized 24-channel files. The inverse filter procedure for each loudspeaker was applied, as well as the calibration of the sound pressure level across loudspeakers. The alpha factor was calculated as α_rms = 1 / rms(v(t)_{1 kHz}) Pa/VFS, since the recorded input was from a B&K type 4231 sound calibrator delivering 1 Pa. The equalized, convolved, decoded, and filtered sweep signals contain the simulated source-receptor sound distributions in magnitude, time, and space, as if recorded inside the simulated room. In this experiment, the simulated anechoic room had an absorption coefficient equal to one on all surfaces, simulating the anechoic condition.

The first-order Ambisonics setup was chosen given the possibility of exploring a reduction in the number of loudspeakers in future experiments and the possibility of generating it through validated software such as Odeon.
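As an illustration of the encoding and decoding stages described above, the sketch below encodes a mono signal into horizontal first-order B-format and decodes it to a regular ring. The W scaling of 1/sqrt(2) and the basic mode-matching decoder are assumed conventions for this sketch, not the exact Odeon and laboratory decoding chain:

```matlab
% Encode a mono signal s arriving from azimuth theta (degrees) into
% horizontal first-order B-format (W, X, Y).
fs = 48000; theta = 30;
s = randn(fs, 1);                    % placeholder mono signal
W = s / sqrt(2);                     % omnidirectional component
X = s * cosd(theta);                 % figure-of-eight, front-back axis
Y = s * sind(theta);                 % figure-of-eight, left-right axis

% Basic (mode-matching) decoder for a regular ring of N loudspeakers:
% feed_n = (2/N) * (W/sqrt(2) + X*cos(phi_n) + Y*sin(phi_n)).
N     = 24;
phi   = (0:N-1) * 360/N;             % loudspeaker azimuths, 15 deg apart
feeds = (2/N) * (W/sqrt(2) + X .* cosd(phi) + Y .* sind(phi));
```

Note that, unlike VBAP, every loudspeaker feed is generally nonzero, which anticipates the immersion-versus-localization contrast observed in the results below.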
3.3 Results

In this study, the performance of the system was evaluated by collecting and analyzing results based on the positions of a mannequin within the virtual sound field (i.e., center and off-center) and the conditions under which the system was tested (i.e., with and without the presence of a second head-and-torso simulator). The results are presented in terms of angles referenced counter-clockwise, which allows for a detailed analysis of the system's performance under various conditions. Through this analysis, it was possible to gain a comprehensive understanding of the system's capabilities and identify potential areas for improvement.

3.3.1 Analysis

The signals were played and simultaneously recorded; the recorded result carried the auditory spatial effects from the auralization and also the physical limitations imposed by the virtualization setup (e.g., the loudspeakers' frequency responses and the presence of loudspeakers inside the room). As the recorded sweep is longer than the original one, zero-padding was performed; in this process, zeroes are appended to the end of the time-domain signal, which nonetheless yields the equivalent convolution [242]. After that, it was possible to calculate the virtual environment's impulse response by dividing the recorded signal by the zero-padded version of the initial sweep, both in the frequency domain.

For both measurements, the interaural time difference was calculated by comparing the arrival times of the sound between the two channels of a binaural room impulse response (BRIR). There are different methods for ITD calculation [132, 314]. In this work, ITDs were estimated as the delay that corresponds to the maximum of the normalized interaural cross-correlation function (IACF). According to ISO 3382-1:2009 [127], the IACF is calculated as:

\[
\mathrm{IACF}_{t_1,t_2}(\tau) = \frac{\int_{t_1}^{t_2} p_L(t)\, p_R(t+\tau)\,\mathrm{d}t}{\sqrt{\int_{t_1}^{t_2} p_L^2(t)\,\mathrm{d}t \int_{t_1}^{t_2} p_R^2(t)\,\mathrm{d}t}}, \tag{3.5}
\]

where p_L(t) is the impulse response at the entrance of the left ear canal and p_R(t) is that for the right ear canal. The interaural cross-correlation coefficients, IACC [127], are given by:

\[
\mathrm{IACC}_{t_1,t_2} = \max \left| \mathrm{IACF}(\tau) \right|, \quad \text{for } -1\,\mathrm{ms} < \tau < 1\,\mathrm{ms}. \tag{3.6}
\]

Similarly, to calculate the interaural level difference (ILD), a fast Fourier transform (FFT) is applied to the time-domain impulse responses, the spectrum is divided into averaged octave bands, and the ratio in dB between the frequency magnitudes is calculated as the ILD:

\[
\mathrm{ILD}(n) = 20 \log_{10} \frac{\sqrt{\int p_R^n(t)^2\,\mathrm{d}t}}{\sqrt{\int p_L^n(t)^2\,\mathrm{d}t}}, \tag{3.7}
\]

where n is the given frequency band, p_R^n(t) is the bandpassed right-ear impulse response, and p_L^n(t) is that of the left channel.
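A condensed Matlab sketch of these three estimators operating on a single BRIR; the function name and the octave-band edges are illustrative, and the tenth-order 1 kHz low-pass Butterworth pre-filter anticipated in Section 3.3.2.1 is applied to the ITD estimate only:

```matlab
function [itd, iacc, ild] = brir_cues(pL, pR, fs)
maxlag = round(1e-3 * fs);                     % -1 ms < tau < 1 ms

% IACC: maximum of the normalized IACF, Eqs. (3.5)-(3.6).
[c, ~] = xcorr(pL, pR, maxlag, 'coeff');
iacc = max(abs(c));

% ITD: lag of the IACF maximum after a 10th-order low-pass
% Butterworth at 1 kHz (low-frequency dominance of the ITD).
[b, a] = butter(10, 1000/(fs/2));
[cf, lagsf] = xcorr(filter(b,a,pL), filter(b,a,pR), maxlag, 'coeff');
[~, i] = max(abs(cf));
itd = lagsf(i) / fs;                           % ITD in seconds

% ILD per octave band, Eq. (3.7): level ratio right/left in dB.
fc  = 1000 * 2.^(-4:4);                        % octave-band centers (Hz)
ild = zeros(size(fc));
for n = 1:numel(fc)
    edges = [fc(n)/sqrt(2), min(fc(n)*sqrt(2), 0.99*fs/2)];
    [bb, aa] = butter(4, edges/(fs/2), 'bandpass');
    ild(n) = 20*log10(rms(filter(bb,aa,pR)) / rms(filter(bb,aa,pL)));
end
end
```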
3.3.2 Centered position

In the centered-position configuration (Figure 3.6a), the listener remains at the ideal VSE position (the center), so as to focus on the effect of an added listener inside the loudspeaker ring. This framework can be valuable to auditory research, as it can be used to analyze group responses to interviews, arguments, collaborative work, social stress, or disputes between individuals in listening tasks.

The IACC for the frontal angle (0°) across frequencies is shown in Figure 3.7. High values indicate that the system delivers the same signal to both ears. Conversely, the drop in IACC values at high frequencies can indicate that Ambisonics may fail to render specific frequencies, affecting the octave-band analysis. The IACC values measured across all angles for VBAP and Ambisonics can be found in Figure 3.8. They indicate that Ambisonics tends to provide less lateralization at lower frequencies (constant and higher IACC values) and lower but constant values at high frequencies, possibly translating to blurred sound localization.

Figure 3.7: Interaural cross-correlation as a function of frequency in octave bands, frontal angle 0°.

Figure 3.8: Interaural cross-correlation for averaged octave bands in the Ambisonics and VBAP techniques, represented in polar coordinates.

That can happen due to a tilt in positioning the HATS or imprecision in the virtualization system. For example, a high-frequency sound wave at 8 kHz has a wavelength of approximately 4 cm, and one at 16 kHz of approximately 2 cm, which means that even a slight tilt can influence the high-frequency IACC. Furthermore, the inverse FIR filter applied was not the inverse of the broadband signal, but one filtered in third-octave bands. That decision was a signal processing compromise, as a broadband filter would only partially compensate for the loudspeakers' geometry or phase differences at high frequencies. This point can be further investigated as a way to improve the Ambisonics reproduction.

There is a relative increase of variation with frequency in the VBAP results, which is present to a lesser extent in the Ambisonics IACC results. That reveals a difficulty of Ambisonics in driving a good sense of localization, as a high coherence level indicates sound coming from the front or back [58]. At the same time, because Ambisonics activates all available loudspeakers to render the sound in the sweet-spot area, the sense of immersion is higher.

3.3.2.1 Centered ITD

The ITD results presented here were obtained after applying a tenth-order low-pass Butterworth filter (LPF). The filter's cutoff frequency was 1,000 Hz, approximating the low-frequency dominance of the ITD [38, 124, 197, 242].

Vector Based Amplitude Panning. The light blue line in Figure 3.9 shows the results for the ITD from the initial setup (HATS alone, centered). The system presented a magnitude peak in the time responses of approximately 650 µs, which corresponds to approximately 22 cm for a wave traveling at the velocity of sound propagation in air. This distance is comparable to the distance between the HATS microphones (19 cm). It is appropriate to note that the symmetry of the HATS is also visible in the HATS-alone results (triangles in Figure 3.9), providing reassurance about the quality of the data collected.

The HATS was kept in the center of the loudspeaker ring for the next set of measurements. A second listener's influence was then simulated by introducing a KEMAR and varying its position along the lateral axis (x-axis). The results are presented in Figure 3.9.

Figure 3.9: a) HATS alone at the center. b) Light blue line: HATS alone at the center. Black line: HATS centered and KEMAR at 0.5 m to the right. Blue line: HATS centered and KEMAR at 0.75 m to the right. Red line: HATS centered and KEMAR at 1 m to the right.

The ITD data obtained from this experiment show that the second mannequin (KEMAR) has an impact, as an obstacle, on the interaural time difference at the HATS in the center of the loudspeaker ring. In the closest position of the second listener (50 cm from the center), there is a reduction of the ITD values for angles between 285 and 305 degrees; the maximum difference is 50 µs. That effect is related to the insertion of the physical obstacle represented by the second listener: as the sound wave diffracts, different paths to the listener's ears are imposed, reducing the difference in the sound's arrival time between the ears. In principle, the effect should be centered at 270 degrees; however, the second listener was not perfectly aligned to the lateral of the centered listener. That was a limitation of the experiment, as the KEMAR was placed on an ordinary chair whose seat is not flat.

Ambisonics. The ITD results for the initial setup (HATS alone, centered) virtualized from the Ambisonics auralization are presented in Figure 3.11. The system showed a magnitude peak in the time responses of roughly 600 µs, 50 µs lower than with the VBAP method. Another characteristic of the Ambisonics ITDs is the flat behavior around the lateral angles, which is generated mainly by the chosen order of the Ambisonics auralization. In first order, the horizontal directivity is determined by the intersection of bidirectional (figure-of-eight) sensitivity patterns with an omnidirectional one, as illustrated in Figure 3.10. That can also limit the localization performance when utilizing first-order Ambisonics, even when reproduced through a higher number of loudspeakers.

Figure 3.10: Horizontal 2D Ambisonics directional sensitivity representation. The red line represents an omnidirectional pattern, the black line represents a bidirectional pattern oriented along the y-axis (null points at the sides), and the purple line is a bidirectional pattern oriented along the x-axis (null points at the front and the back).
The HATS was kept in the center of the loudspeaker ring, and a second listener's influence on the sound field was simulated by introducing a KEMAR at three different positions along the x-axis: 50, 75, and 100 cm to the left of the HATS (i.e., at 90°). The results are presented in Figure 3.11 by the black, blue, and red lines. The data clearly demonstrate that, as an obstacle, the second listener (KEMAR) does not influence the interaural time difference when using Ambisonics with the HATS at the center of the loudspeaker ring.

Figure 3.11: a) HATS alone at the center. b) Light blue line: HATS alone at the center. Black line: HATS centered and KEMAR at 0.5 m to the left. Blue line: HATS centered and KEMAR at 0.75 m to the left. Purple line: HATS centered and KEMAR at 1 m to the left.

3.3.2.2 Centered ILD

The effects at higher frequencies due to a second listener require the analysis of a different parameter: instead of the difference in the arrival time of the sound between the ears, the representative metric is the level difference between the ears. Absorption, reflection, and diffraction occur before the sound pressure signal reaches the eardrums; the torso, shoulders, outer ear, and pinna mechanically affect an incoming sound wave. These effects are angle- and frequency-dependent, as waves of different frequencies have different wavelengths [39, 40, 90]. The effects on ILD caused by the virtualization process were calculated as the differences between the reference ILDs, measured with the HATS alone and centered, and the ILDs measured with the HATS and a second mannequin (KEMAR). As a reference, Figure 3.12 presents the ILDs for each method from twelve different angles (30 degrees of separation) around the listener.

Figure 3.12: Interaural level differences as a function of octave-band center frequencies at twelve different angles around the central point.

There are differences between the ILDs calculated from the measurements with the two techniques in the energy of the averaged octave bands. However, the ILDs from VBAP present a more pronounced (more natural) dependence on the incidence angle than those from Ambisonics [222]. Furthermore, the ILD peak for Ambisonics is observed around 2000 Hz, which can be interpreted as the frequency limit for reproducing level differences between the ears when decoding through 24 loudspeakers [299]. A more comprehensive comparison between the techniques, with the HATS centered and alone, can be observed in the heatmap representation of Figure 3.13, including all 72 measured angles (5 degrees of separation). The homogeneity across angles in the Ambisonics measurements indicates that its ILD lacks precision as a binaural spatial cue. Localization accuracy in Ambisonics reproduction, especially at lateral angles, is highly dependent on its order (acquisition and reproduction) [27].

Figure 3.14 shows the energy difference across the octave bands for eight different incidence angles for both techniques, with and without the presence of the second mannequin. For both techniques, the strongest influence happens when the second mannequin is closest to the center.
Figure 3.13: Interaural level differences in averaged octave bands as a function of azimuth angle for a HATS Brüel and Kjær TYPE 4128-C in the horizontal plane.

The second listener is at the right in VBAP (270°), while in Ambisonics it is positioned to the left (90°).

Figure 3.14: Interaural level differences (octave bands) at angles around the central point, considering different displacements of the second listener.

The ILDs calculated from the measurements with the second mannequin present are not extensively different from the reference ILDs. The difference is proposed to be observed as a distortion parameter. These differences were calculated by subtracting the ILDs with the second mannequin from the specified center-alone reference ILDs. Ideally, all graphs should be black, indicating a full match (no difference between setups/positions) and thus no measured distortion.

Vector Based Amplitude Panning. Figure 3.15 presents the differences between ILDs calculated from the HATS centered (HC) and the configurations that combine the HATS centered with the KEMAR in one of the three positions (e.g., HC K-50 is the notation defined for HATS centered and KEMAR at 50 cm to the right). The sounds were auralized via VBAP for all 72 angles (5° spacing). The angles that correspond to loudspeaker locations (15° spacing) were reproduced directly by the physical loudspeaker at that angle.

Figure 3.15: VBAP discrepancies in ILD between HATS at the center and: (Top) HATS at the center plus KEMAR at 50 cm to the right; (Middle) HATS at the center plus KEMAR at 75 cm to the right; (Bottom) HATS at the center plus KEMAR at 100 cm to the right.

The differences at frequencies over 1 kHz are pronounced for angles to the right side of the centered HATS, 270-305° azimuth. Smaller effects can also be noted at other angles corresponding to virtual sound sources (where there is no loudspeaker and the sound source is produced via the auralization technique). These effects diminish as the second mannequin is positioned further away from the centered receptor, indicating a smaller acoustic shadow.

Figure 3.16: VBAP interaural level differences as a function of azimuth angle around the centered listener.

Figure 3.16 shows the ILD in six octave bands from impulse responses recorded with files auralized using VBAP. The HATS centered (HC) position refers to the HATS alone; it is compared with the configurations adding the second listener (KEMAR) at three different positions displaced 50, 75, and 100 cm from the center (K+50, K+75, and K+100, respectively). The mismatch is pronounced when the KEMAR is closer (blue line), especially at the angles blocked by the KEMAR. As the second listener blocks the sound wave, an acoustic shadow is created, which reduces the sound energy at the ear facing the sound source, decreasing the level difference between the ears. There is also a reduction in ILD for angles from 35 to 50 degrees. That can be related to the opposite effect, where the mannequin reflects part of the sound, increasing the level at the counter ear of the centered HATS. These findings support the interpretation that a substantial effect occurs on the ILDs for the KEMAR's closest position.
Ambisonics. Figure 3.17 presents the calculated differences between ILDs from the Ambisonics auralization with the same configurations (i.e., HC vs. HC K-50, HC vs. HC K-75, and HC vs. HC K-100). For convenience, the second mannequin was positioned to the left of the center (90°). The switch from right to left does not affect the comparison, as both the HATS and the Eriksholm test room are symmetric. Figure 3.18 shows the ILD in octave bands, highlighting that the strongest effect is at 8 kHz.

Figure 3.17: Ambisonics discrepancies in ILD between HATS at the center and: (Top) HATS at the center plus KEMAR at 50 cm to the left; (Middle) HATS at the center plus KEMAR at 75 cm to the left; (Bottom) HATS at the center plus KEMAR at 100 cm to the left.

The results demonstrate that including a second listener has a negligible effect on Ambisonics first-order ILDs. However, in the reference measurement (HATS alone), the ILDs did not adequately reproduce this spatial cue throughout the angles around the listener, given the observably minor ILD differences across angles, especially above 2 kHz.

Figure 3.18: Ambisonics interaural level differences as a function of azimuth angle around the centered listener.

3.3.3 Off-centered position

Being able to have a participant away from the center of the loudspeaker ring can be valuable for testing simultaneous participants or the influence of a particular physical apparatus (e.g., listening effort evaluated under the presence of another individual [230]). Auditory research that aims to test the influence of a particular noise, SNR, or noise direction on the interaction in participants' conversations can benefit from a setup that makes it possible to virtualize a sound scene and present it without spatial distortions. Measurements aiming to study the influence of off-center HATS displacement were performed in nine different configurations, with the HATS and the KEMAR independently displaced 25, 50, and 75 cm from the center, resulting in separations of 50, 75, 100, 125, and 150 centimeters (see Figure 3.6b).

The listening position is critical to the auralization techniques presented in this work, as they are derived and programmed to render the sound at the center of a loudspeaker array. Adding computing power for real-time processing could handle participant movements; although that can be considered, it was not within the scope of this part of the experiment, as such processing focuses on dynamics (head motion). The focus here is on the effects of sub-optimal positions and the influence of a second listener as an obstacle to the sound field.

3.3.3.1 Off-center ITD

The effects of off-center positioning on the sound's arrival time can affect the subjective perception of the sound incidence direction.

Vector Based Amplitude Panning. Observing the ITD results shown in Figures 3.19, 3.20, and 3.21, almost no influence of the second mannequin (KEMAR) can be noted, even with the HATS off-center. The ITD at off-center positions deviates from the ITD of the centered HATS in the same proportion regardless of the second listener's (KEMAR's) position. Nonetheless, Figure 3.22 shows that a pronounced effect appears when shifting the HATS off-center. When the displacement exceeds 25 cm, the spikes represent a difficulty of the vector-based amplitude panning process in generating the virtual sound sources.
The spiking behavior is expected, as the VBAP mathematical formulation is derived from a unit vector pointing to the center. In Figures 3.20 and 3.21, it is possible to observe more considerable distortions (sharp peaks crossing the reference line, in addition to being offset from it) in the ITD for the virtual sound sources. Such distortions increase as HATS is moved away from the central position. Sound sources reproduced using VBAP in this loudspeaker ring at these receptor positions would not be correctly interpreted in terms of direction by the listener. The ITD difference is greater when the sound sources are at angles close to the front or rear (0° and 180°) directions.

Figure 3.19: ITD as a function of source angle. Light blue line: HATS alone at the center. Black line: HATS at -25, KEMAR at +25. Blue line: HATS at -25, KEMAR at +50. Red line: HATS at -50, KEMAR at +75.

Figure 3.20: ITD as a function of source angle. Light blue line: HATS alone at the center. Black line: HATS at -50, KEMAR at +25. Blue line: HATS at -50, KEMAR at +50. Red line: HATS at -50, KEMAR at +75.

This effect is related to the HATS's physical displacement. The ITD results at lateral angles show a larger lobe toward the HATS's right ear (270°) and a sharpened lobe at the HATS's left ear (90°), reflecting the off-center displacement.

Figure 3.21: ITD as a function of source angle. Light blue line: HATS alone, centered. Black line: HATS at -75, KEMAR at +25. Blue line: HATS at -75, KEMAR at +50. Red line: HATS at -50, KEMAR at +75.

Figure 3.22: ITD as a function of source angle. Light blue line: HATS alone, centered. Black line: HATS at -25, KEMAR at +25. Blue line: HATS at -50, KEMAR at +50. Red line: HATS at -75, KEMAR at +75.

This effect occurs because HATS is not at the center of the ring (see Figure 3.23b), so the angles and separations between the loudspeakers are modified. The effect is even more apparent when looking only at the ITDs of the real sound sources (angles corresponding to loudspeaker locations), without the distortions created by the VBAP auralization (see Figure 3.23a).

Figure 3.23: (a) ITD for real sound sources. Light blue line: HATS alone, centered. Black line: HATS at -25, KEMAR at +25. Blue line: HATS at -50, KEMAR at +50. Red line: HATS at -75, KEMAR at +75. (b) Scheme of the HATS off-center position at -75 cm, facing the third loudspeaker.

Ambisonics

The VBAP method constructs the auditory spatial cues through one to three loudspeakers in this setup, usually in the same quadrant. Ambisonics, in contrast, uses all the available loudspeakers in the rendering process. Hence, sound localization benefits from VBAP auralization compared to Ambisonics due to the nature of the methods [104, 105, 175, 180, 221]. Furthermore, the ITD results observed for first-order Ambisonics reflect the method's limitation in sweet-spot size. Figure 3.24 shows the calculated ITD in three different configurations, H+25 K-25, H+50 K-50, and H+75 K-75, plus the center configuration for comparison. To improve readability, the ITD results for the remaining spatial configurations (which were similar across conditions) can be found in Appendix A. The expected size of the listening area is 20 cm when combining 24 loudspeakers to reproduce Ambisonics in a 2D horizontal array [299].
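The contrast between the two rendering strategies can be made concrete with the loudspeaker gains of a first-order 2D Ambisonics decode (a sketch assuming a plain "basic" sampling decoder on a uniform circle, without max-rE weighting; conventions vary between implementations):

```matlab
% First-order 2D Ambisonics gains on the 24-loudspeaker ring for a
% virtual source at 30 degrees azimuth.
N      = 24;                                % loudspeakers at 15 deg spacing
phi    = (0:N-1).' * 2*pi/N;                % loudspeaker azimuths [rad]
thetaS = deg2rad(30);                       % virtual source azimuth
g      = (1/N) * (1 + 2*cos(phi - thetaS)); % basic decode gains
% All 24 loudspeakers are driven (some with negative gain), so any
% obstacle or off-centre displacement perturbs the summed wavefront;
% 2D VBAP would instead drive only the two loudspeakers adjacent to
% the source direction.
```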
Figure 3.24: ITD as a function of source angle in the Ambisonics setup. Light blue line: HATS alone, centered. Black line: HATS at -25, KEMAR at +25. Blue line: HATS at -50, KEMAR at +50. Red line: HATS at -75, KEMAR at +75.

A displacement of 25 cm or greater puts the receptor outside the sweet spot. Therefore, it is possible to observe in Figure 3.24 that Ambisonics does not correctly virtualize this acoustic cue outside the center position, as the values remain mostly constant for the side being played.

3.3.3.2 Off-center ILD

ILDs can be highly sensitive to the listener's position in a virtualized sound field, given the shorter wavelengths involved. The composition of a virtualized sound wave is performed by simultaneously combining sounds from several sound sources, which requires a highly precise combination. This section investigates the ILD changes due to having the listener away from the optimal position while another listener is present, i.e., the influence on ILD when both the HATS and a second participant are away from the center.

A comparison of the ILD results across positions is shown in Figure 3.25; it presents, for both techniques, the calculated ILDs over frequency for eight incidence directions spaced 45° apart in azimuth, at three different positions plus the centered position as a reference. The pattern deviation as the receptor is moved from the center is not the same across techniques. As expected, the physical construction of the summed sound wave in Ambisonics, which relies on all loudspeakers, has a higher impact on ILDs than VBAP, which only combines a few sound sources from the same quadrant.

Figure 3.25: ILD as a function of frequency at different angles (line color) for VBAP (top row) and Ambisonics (bottom row) for symmetrical displacement in off-center setups.

For files auralized through VBAP, the discrepancies between the ILD measured with HATS in the center (optimal position) and at the other positions can be interpreted as acoustic artifacts capable of conveying the wrong localization of the sound source. Although the second listener did not have a primary influence, the observed displacement from the center affects the ILD pattern, especially at the higher frequencies. For Ambisonics, the listener position is critical. The ILD differences from center to off-center positions create artifacts that compromise ILD as a cue to sound localization at all tested positions.

Vector Based Amplitude Panning

The top row of Figure 3.25 shows the ILD screening at some of the incidence angles. A comprehensive visualization of ILDs across angles is presented in Figure 3.26 for the reference-centered (top) and off-centered positions. There is an effect on ILDs when moving the receptor from the center position and adding a second listener inside the loudspeaker ring. Although noticeable, the effect still preserves the pattern, allowing the differences to be interpreted as artifacts. The vertical zero-ILD lines indicate the frontal and rear angles (0° and 180°), where the sound should arrive at both ears with the same level. These vertical black lines are shifted as the listener is displaced from the center.
At 75 cm displacement, the lowest-value vertical line in Figure 3.26 appears at 35° (frontal) and 145° (rear).

Figure 3.26: VBAP setups: ILD at the centered position (top); ILD in off-center setups: HATS at 25 cm to the left with KEMAR at 25 cm to the right (middle top); HATS at 50 cm to the left with KEMAR at 50 cm to the right (middle bottom); HATS at 75 cm to the left with KEMAR at 75 cm to the right (bottom).

The differences between the ILD with HATS in the reference position (alone and at the center) and the configurations with HATS outside the center simultaneously with KEMAR are shown in Figures 3.27, 3.28 and 3.29.

Figure 3.27: VBAP differences in ILD between the centered-alone setup and off-center setups with KEMAR: HATS at 25 cm to the left with KEMAR at 25 cm to the right (top); KEMAR at 50 cm to the right (middle); KEMAR at 75 cm to the right (bottom).

Figure 3.28: VBAP differences in ILD between the centered setup and off-center VBAP setups: HATS at 50 cm to the left with KEMAR at 25 cm to the right (top); KEMAR at 50 cm to the right (middle); KEMAR at 75 cm to the right (bottom).

The acoustic field behavior outside the center of the ring at frequencies above 1 kHz presents significant ILD differences for the measured configurations, especially at angles corresponding to virtual sound sources. The ILD difference reaches up to 15 dB.

As with the ITD, the ILD data from HATS in the off-center position show the acoustic shadowing effect caused by KEMAR. It is possible to note that the closer KEMAR is positioned to HATS, the greater the ILD discrepancies around positions near 270°.

Figure 3.29: VBAP differences in ILD between the centered setup and off-center setups: HATS at 75 cm to the left with KEMAR at 25 cm to the right (top); KEMAR at 50 cm to the right (middle); KEMAR at 75 cm to the right (bottom).

This effect is due to diffraction and absorption of the sound by the second listener (KEMAR) and happens for both real (loudspeaker) and virtual sound source locations.

Ambisonics

Ambisonics presents a more considerable limitation regarding movement outside the center of the ring due to its nature. The sound composition requires a combination of amplitude and phase from all available loudspeakers, with the correct representation achieved only over an area at the center and without obstructions. The ILD in octave bands is shown in Figure 3.30. The low amplitude and homogeneity across frequencies demonstrate that Ambisonics is limited in rendering the proposed binaural cue, not appropriately delivering the level differences outside the center. The ILD differences from the off-center positions to the centered HATS are presented in Appendix B.

Figure 3.30: Ambisonics setups: ILD at the centered position (top); ILD in off-center setups: HATS at 25 cm to the left with KEMAR at 25 cm to the right (middle top); HATS at 50 cm to the left with KEMAR at 50 cm to the right (middle bottom); HATS at 75 cm to the left with KEMAR at 75 cm to the right (bottom).

3.4 Discussion

Once the listener is centered in the loudspeaker array, the second listener did not affect the auralization other than at the angles physically shadowed by the second listener. Thus, a second listener does not deteriorate the spatial cues for either auralization technique analyzed in this work.
For VBAP, discrepancies in ITD occurred only when the second listener was positioned 50 cm away, the closest position measured in this experiment. Differences in ILD for VBAP are also most notable at the second listener's closest position. Concurrently, Ambisonics did not present an apparent difference in ITD for a centered listener when a second listener was placed inside the ring. The difference in Ambisonics ILDs from the centered reference indicates an acoustic shadow (this time at the left angle of 90°) and an additional slight difference across other angles.

There is an apparent effect on ITD as the listener is moved out of the center. For VBAP, the peak magnitude remains practically the same, approximately 650 microseconds, while the zero-ITD point (sound reaching both ears simultaneously) is shifted. At 75 cm off-center to the left side, the difference in arrival time corresponds to a shift of approximately 30 degrees. That is in line with the setup, as the mannequin was placed in front of another loudspeaker. However, the Ambisonics ITDs demonstrate that the composition of magnitude and phase is not complete at off-center positions. The Ambisonics weights are calculated so that the sound waves from the loudspeakers interact at the center position and form a sound field representing a sound wave from a defined incidence angle. Moving the primary listener to the right makes the interaction between the loudspeakers inaccurate. In this case, the time difference becomes wrong because the low Ambisonics truncation order increases the aliasing effect, as can be similarly observed at third and fifth order in Laurent et al. [275]. The sound from the right mainly reaches the right ear and travels on to the left ear before the sound from the left side can travel the extra distance. That is an expected effect, since even the minimum displacement (25 cm) is larger than the expected reproducible area (around 20 cm) for this setup.

There was no difference observed as the second listener's (KEMAR) position was changed (25, 50, and 75 cm to the right of the center) in any VBAP measurement with HATS positioned to the left of the center. Considering that the ITD just noticeable difference (JND) in an anechoic condition is of the order of 10 to 20 microseconds [38, 140, 241], the ITD results with the off-center HATS position 25 cm to the left were a good approximation of the reference centered measurement. That means a listener placed at these positions would not be able to discern a difference in the direction of incidence relying only on the ITD cue. It is also worth considering that the JND in reverberant conditions is even higher [140] and that the artifact can be masked by reverberation [97], which would benefit the auralization process. The measurements with HATS positioned at 50 and 75 cm present peaks and crossover values across the line corresponding to the centered ITD, which indicate distortion problems at low frequencies regarding this spatial cue. A similar analysis of KEMAR's impact on ITDs from the Ambisonics virtualization cannot be achieved, since the ITD is not accurately rendered outside the sweet spot.

To perform the off-center ILD analysis, each interaural level difference result for a position combination (HATS and KEMAR) was subtracted from the result for HATS alone.
In the VBAP method, a shadow effect generated by the second listener is present as expected, mainly when the first listener is 25 or 50 cm left of center. However, the differences at high frequencies occur essentially at virtual sources, which indicates the difficulty of creating the virtual sound source impression outside the center position, independently of the second listener's presence [2].

Off-center positions did not allow accurate synthesis of the ILDs from the loudspeakers using Ambisonics. The method did not reproduce time or level differences accurately in these conditions, which could prevent achieving the correct spatial impression. That is in line with the literature: although higher Ambisonics orders are generally investigated, the complexity of accurately rendering high-frequency cues is present [279, 290], as is the off-center increase in accuracy obtained by increasing the Ambisonics order with a proper number of loudspeakers [275].

It should be noted that the current study did not measure changes in ITD and ILD for off-center listener positions without the presence of a second listener. Based on the effects of having the first listener off-center with a second listener present, coupled with the smaller changes with a second listener when the first listener is centered, it can be deduced from the current results that the off-center position has a degrading effect on the ITD and ILD. Considering that many simulations are limited by a "sweet spot" for the listener(s), the off-center position, as opposed to the presence of a second listener, is probably the greatest liability for multi-listener methods in hearing research.

3.5 Concluding Remarks

The more demanding the test requirements in terms of localization of the sound source (beyond left, right, front, and back), the more the researcher should move towards VBAP. In the case of fixed positions and a requirement for a greater sense of immersion, Ambisonics should be able to build more convincing sound scenarios.

The techniques do not affect the ILD and ITD acoustic cues at the central position for one test participant. The addition of a second listener within the ring also does not significantly affect these parameters at the three distances tested, except for the angles shadowed by the second listener. Thus, it is suitable to move towards subjective tests with a centered participant and an actor at the side. Although the second listener did not deteriorate the techniques, they present different performances in terms of spatial representation and notably produce a different sense of immersion. Thus, the purpose of the test being designed must be taken into account when defining the auralization method.

There is a clear degradation when two test subjects are simultaneously present, both in off-center positions, regardless of the distance of the second listener. The VBAP measurements showed increasing ITD differences with increasing distance from the center, and significant differences in ILD. These differences indicate the creation of acoustic artifacts, possibly generated by the method's difficulty in correctly virtualizing high frequencies outside the sweet spot. For the ITD parameter, the position displaced 25 cm from the center shows little difference or evidence of artifacts generated by virtualization errors.
At the same time, the other distances present significant differences and artifacts. The binaural cue analysis suggests that VBAP is less sensitive to the participants' positions than the Ambisonics setup. However, it is relevant to note that although the differences in the binaural cues denote differences in audio spatialization, reflected in the perceived angle of incidence of the sound, both techniques can be calibrated to reproduce the stimuli at a desired sound pressure level. That means an auralized sound can be reproduced at the correct sound pressure level even though its direction may not be correctly interpreted by the listener, as the binaural cues are not delivered appropriately.

Chapter 4

Subjective Effort within Virtualized Sound Scenarios

This experiment was a collaborative study (EcoEG [3]) with fellow HEAR-ECO PhD student Tirdad Seifi-Ala, also from the University of Nottingham, that combined the virtualization of sound sources and electroencephalography (EEG) to assess listening effort in ecologically valid conditions. Both students contributed equally to the study design, preparation, data collection, and interpretation. TSA additionally performed the data analysis; SA additionally performed the room simulations, stimuli preparation, software interface, and sound calibration.

As definitions can vary, this chapter uses the following terms:

• Simulation: numerical acoustic simulation of the spatial behavior of a sound in a defined space.

• Auralization: creation of a file that can be converted to a perceivable sound and contains spatial information.

• Sound Virtualization: reproduction of an auralized sound file through loudspeakers or headphones.

4.1 Introduction

Interest from researchers and clinicians in listening effort measures has grown recently [83, 135, 210], and the importance of studying listening effort in an ecologically valid sound environment follows the same trend [134]. The previous chapter discussed the feasibility and constraints of the virtualized sound field through binaural cues and the foreseeable effects on spatial impression and localization. This chapter investigates whether reverberation and the signal-to-noise ratio (SNR) are modeled in behavioral data, as a proxy of subjective listening effort in a virtualized sound environment.

Reverberation is the accumulation of energy reflections (sound) in an enclosed space that creates diffusion in its sound field [256]. Reverberation Time, in turn, is an objective parameter that represents the amount of time required for the energy of a sound source to dissipate to one-millionth of its value (60 dB) after the sound source has ceased [254]. This parameter was reviewed in Section 2.3.3.2. The remaining sound energy can blur auditory cues and rapid transitions between phonemes and can decrease the low-frequency modulation of a signal; it may compromise speech intelligibility [39, 112].

Since reverberation is a complex phenomenon, depending on space and frequency [111, 185], a wide range of physical-acoustical factors may limit some comparisons: for example, the reproduction method, the masker type, the position and number of sources, the SNR, the sound pressure level of the presentation, the reverberation time interval studied, and whether the simulated position is in a free or a diffuse sound field.
Like the methodologies, the findings regarding the influence of reverberation on listening effort can also vary across experiments.

Previous studies investigated the effect of reverberation on speech intelligibility and listening effort, observing variations across reverberation time, level, and population groups. For example, a correlation between age and reverberation was traced in work by Neuman et al. [203]. This study found that reverberation negatively impacts the SNR necessary to reach 50% speech recognition, and that this impact varies across ages, with the effect decreasing as age increases. The sensitivity of subjective measures and electrodermal activity was evaluated by Holube et al. [121]; the effect of reverberation was found statistically significant for the subjective measures but not for the electrodermal activity. A study by Picou et al. [225] used response time in a dual-task paradigm as a behavioral measure of listening effort; there was no significant effect on response time, either in the same SNR conditions or when comparing response times at equal performance scores. The impact on listening effort was studied by Kwak et al. [149] through subjective ratings, resulting in a significant effect of reverberation on ratings of listening effort and on sentence recognition performance. In Nicola and Chiara's study [204], the negative influence of reverberation on response time was considered indicative of an increase in listening effort; the study assessed the influence of reverberation and noise fluctuation on response time. The different methodologies applied in these studies and their participant groups must be carefully analyzed, as they can explain the different results.

Ambisonics arrangements (Mixed Order Ambisonics (MOA) [78, 177] and HOA) are already used in audiological studies [7, 77, 173, 303]. This study proposed a low-order (first-order) Ambisonics implementation. The low-order technique is more sensitive to the listener position [64, 65], which was also verified in this study. That can be seen as a counter-intuitive and non-conventional choice, although it was meant to assess low-order Ambisonics' feasibility in audiological studies and its constraints. This decision was a step towards confirming the feasibility of a listener in a centralized position, found in Chapter 3, observing its constraints, and further developing an auralization method with lower hardware requirements in Chapter 5.

Hypothesis

The main research question is how the auralized acoustic scenario, specifically the room and the SNR, increases auditory effort when virtualized. The hypothesis for the experiment is that a longer RT provided through sound virtualization and a lower SNR both lead to more significant listening effort. Reverberation time can influence normal-hearing and hearing-impaired people in different ways. For example, on average, hearing-impaired listeners experience more significant difficulties understanding speech in a reverberant condition than normal-hearing listeners, so they can suffer more from the strain of listening. As reverberation's effects on hearing-impaired listeners vary (see Chapter 2), this study employed only normal-hearing participants to investigate the effects of audio degradation.
To subjectively assess changes in hearing effort, a questionnaire was provided to participants, asking how much effort each condition demanded (described in Section 4.2). This investigation is the first step towards understanding the feasibility of including the simplified virtualization of sound sources in the expanding field of listening effort research.

4.2 Methods

This experiment was designed to gather data for two parallel analyses. The first was to evaluate differences in behavioral performance (speech recognition) and subjective impressions of listening effort driven by different scenarios, manipulating the room type and the signal-to-noise ratio (SNR). The second study compared physiological responses of the brain, as measures of listening effort, against the same behavioral performance. This chapter focuses on the experiment's first study (behavioral data vs. subjective impressions). Three rooms were chosen for this study: a classroom, a restaurant dining area, and an anechoic room.

For this experiment, a setup was developed to investigate the listening effort caused in nine different situations: three room simulations characterized by their reverberation time, and three SNRs. The setup was composed of four recorded talkers acting as maskers and one talker acting as the target. The talkers' positions were all spatially separated. The test paradigm involved the auditory presentation of Danish hearing in noise test (HINT) sentences [205] on top of four speech maskers, with participants recalling the words they could keep in memory after 2 seconds. The sound sources were spatially distributed, and the participant was informed that the target speech would always come from the front. The participants' responses were word scored (i.e., word-based speech intelligibility) by Danish-speaking clinicians.

The method in this study follows a four-talker babble setup similar to [209, 302], which investigated SNR and masker types using pupillometry as a proxy for listening effort. Also, a study by Wendt et al. [301] investigated the impact of noise and noise reduction through an equivalent setup. This method's innovation relies on using first-order Ambisonics to generate the reverberation based on ODEON-simulated rooms.

4.2.1 Participants

For the data collection, 18 normal-hearing native Danish-speaking adults (eight female), with an average age of 36.9 ± 11.2 years, gave written consent and participated in the test. One participant was positioned outside the sound field sweet spot, so their data were discarded, and the data of the other 17 participants were used for further analysis. Ethical approval for the study was obtained from the Research Ethics Committees of the Capital Region of Denmark. For each participant, the pure-tone average of air-conduction thresholds at 0.5, 1, 2 and 4 kHz (PTA4) was tested and confirmed below 25 dB HL.

4.2.2 Stimuli

The target stimulus consisted of simple Danish sentences spoken by a male speaker. The sentences were from the HINT in Danish [205] and were 1.3-1.8 s in duration. The masking signal consisted of four different speakers, two female and two male, reading a Danish-language newspaper [302]. The total duration of each masker recording was approximately 90 seconds.
The maskers' onset was 3 s before and offset 2 s after the target, resulting in a masker duration of 6.3-6.8 s. In each trial, the time segment used from each masker was randomized. In addition, the spatial position of each masker was randomized in each trial, but always interspersing male and female talkers. The overall maskers' equivalent continuous sound level Leq was set at 70 dB (64 dB per masker), and the target Leq was set at 62 dB, 67 dB, and 72 dB to generate three SNR conditions: -8, -3, and +2 dB. In this study, SNR was defined as the equivalent continuous sound level of the target signal relative to the competing masker Leq. The RTs of the anechoic and reverberant conditions were defined as the overall reverberation time obtained from the output of the simulation software (ODEON Software © v.12). The chosen reverberation time values aim to represent common everyday situations. The absorption coefficients and relative areas used to obtain the mentioned conditions are presented in Appendix E.

Five source positions (one target and four maskers) were created around a receptor in each simulated room. All positions were 1.35 m from the center of each room, where the receptor is located. The approach of creating two different rooms, instead of changing the parameters of a single room, was chosen to achieve a more natural sound field. That way, the absorption coefficients applied to the rooms' materials were kept close to real.

The virtualization of the proposed acoustic scenarios follows the path indicated in Figure 4.1. An acoustic simulation is performed to create the appropriate characteristics of the sound according to the room. The software calculates the amplitude and the incidence directions of the sound and its reflections arriving from specific sources at a receptor position inside the room. For each source-receptor combination, the software generates a room impulse response encoded in first-order Ambisonics in the AmbiX [198] format (a channel-order specification for Ambisonics auralization: the first four channels are WYZX, compared to WXYZ in the FuMa specification). The generated file was convolved with anechoic audio and decoded to the specific array of 24 loudspeakers.

Figure 4.1: Auralization procedure implemented to create mixed audible HINT sentences with 4 spatially separated talkers at the sides and back (maskers) and one target in front.
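A minimal sketch of the chain in Figure 4.1 is given below (the file names, the decoding matrix D, and the calibration gain calGain are placeholders, not the actual study materials): the simulated AmbiX RIR is convolved channel-wise with the anechoic sentence and then decoded to the 24 loudspeaker feeds.

```matlab
% Convolve an anechoic HINT sentence with a 4-channel first-order
% AmbiX room impulse response and decode to 24 loudspeaker channels.
[speech, fs] = audioread('hint_sentence.wav');     % anechoic target (mono)
foa = zeros(length(speech) + size(ambiRIR,1) - 1, 4);
for ch = 1:4                                       % AmbiX order: W, Y, Z, X
    foa(:, ch) = conv(speech, ambiRIR(:, ch));
end
feeds = foa * D.';        % D: 24-by-4 decoding matrix for this array
feeds = feeds * calGain;  % setup-specific calibration to the target Leq
audiowrite('target_24ch.wav', feeds, fs, 'BitsPerSample', 32);
```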
4.2.3 Apparatus

The experiment was set up in an anechoic room (IAC Acoustics) of 4.3 m × 3.4 m × 2.7 m (inner dimensions). The experimental setup consisted of a circular array of 24 loudspeakers positioned at 15° intervals in azimuth and at 1.35 m from the center. The target sound was reproduced at 0° (the participant's front); the maskers were auralized at ±90° and ±150° (Figure 4.2). The participant's position was monitored throughout the test with a laser line and a camera, ensuring they remained in the sweet spot. Stimuli were routed through a sound card (MOTU PCIe-424, with a FireWire connection to the MOTU Audio 24 I/O interface) and were played via 16 Genelec 8030A and 8 Genelec 8030C loudspeakers (Genelec Oy, Iisalmi, Finland), aligned in frequency and level. The BioSemi EEG device was used to collect the physiological data, which also helped to restrain participants' movement; the EEG data were not analyzed in this study.

Figure 4.2: Spatial setup of the experiment: test subjects attended to target (in blue) stimuli from a 0° angle in front. The masking talkers (in red) were presented at lateral ±90° and rear ±150° positions.

All enclosed spaces, including controlled audiological environments, have a certain degree of reverberation due to acoustically reflective surfaces, and background noise due to equipment. The levels of reverberation and background noise meet the criteria of Recommendation ITU-R BS.1116-3 [126] and are shown in Figures 4.3 and 4.4, respectively.

Figure 4.3: Reverberation time inside the anechoic room at Eriksholm Research Centre with the setup in place.

Figure 4.4: Eriksholm anechoic room: background noise, A-weighted. Loudspeakers and lights on, motorized chair off.

The parameters were measured with the setup (loudspeakers, motorized chair, and BioSemi EEG equipment) inside the room, positioned as in the experiment. Figure 4.5 shows the setup placed inside the anechoic room.

Figure 4.5: Setup inside the anechoic room (motorized chair, adjustable neck support and EEG equipment).

4.2.4 Auralization

Acoustic Scene Generation and Room Acoustic Simulation

To simulate the acoustic characteristics of the chosen scenarios, geometric models were created in the room acoustics software ODEON, and the Ambisonics Room Impulse Responses were then simulated using ODEON version 12 [59]. The absorption coefficients of the room surfaces are listed in Appendix E. All sentences were auralized in Ambisonics [15], truncated at first order and encoded to 24 channels. The analysis utilized the Institute of Technical Acoustics (ITA) Toolbox [29, 67]. The rooms were chosen as representative of realistic, not extreme, acoustic conditions. The simulated spaces were a classroom (9.46 m × 6.69 m × 3.00 m) with an overall RT of 0.5 seconds and a restaurant dining area (12.19 m × 7.71 m × 2.80 m) with an overall RT of 1.1 seconds. The distance between source and receptor was kept the same, 1.35 m, across rooms. Target and masker positions were simulated by selecting the appropriate simulated RIR to convolve, i.e., the source-receptor RIR corresponding to the desired reproduction angle.

Ambisonics Sweet Spot

In this study, two different metrics were used to assess the off-center performance of virtual sources auralized with first-order Ambisonics: the RT and the Sound Pressure Level (SPL). That is, whether the virtualized sound field delivered the correct amount of reverberation and the correct sound pressure level of each source, resulting in the appropriate signal-to-noise ratio, when the listener was not perfectly centered. To estimate the metrics at each position, a logarithmic sweep signal (50-20000 Hz, 2.73 s; FFT degree 18 at a 96 kHz sampling frequency) was generated and convolved with the Ambisonics first-order RIR calculated by ray tracing in ODEON for each modeled room. The simulated rooms presented overall theoretical reverberation times of 0, 0.5, and 1.1 s. These auralized files were encoded to 24 channels distributed in the horizontal plane. The files were then played inside the anechoic room and simultaneously recorded. Dividing, in the frequency domain, the recorded signal by the zero-padded initial signal (deconvolution) yields an impulse response (or binaural RIR (BRIR), when recorded with HATS) that represents the virtualized system, including the physical effects of the array and all calibration.
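The sweep measurement and deconvolution just described can be sketched as follows (chirp-based generation and the absence of spectral regularization are simplifying assumptions; a practical implementation would band-limit the division):

```matlab
% Logarithmic sweep (50 Hz - 20 kHz, 2^18 samples at 96 kHz ~ 2.73 s)
% and deconvolution of a recording made while the sweep was played.
fs = 96000;  n = 2^18;                % "FFT degree 18"
t  = (0:n-1).'/fs;
s  = chirp(t, 50, t(end), 20000, 'logarithmic');  % excitation sweep
% rec: the signal recorded in the room (longer than the sweep itself)
nfft = length(rec);                   % zero-pads the sweep to match rec
H = fft(rec, nfft) ./ fft(s, nfft);   % frequency-domain division
h = real(ifft(H));                    % IR (or BRIR when recorded on HATS)
```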
Reverberation Time

The RT was calculated with the ITA-Toolbox from the initial 20-dB decrease from peak level (T20) in the virtualized IRs. Figure 4.6 shows the overall RT results at the center position and when moving the receptor (manikin) towards the front.

Figure 4.6: Overall reverberation time (RT) as a function of receptor (head) position in the mid-sagittal plane re center (0 cm).

The results showed slightly greater RTs (0.58 and 1.16 s) than simulated in the ODEON software (0.5 and 1.1 s). However, this was expected, since there is equipment inside the anechoic room (e.g., a large chair and loudspeakers) that acts as reflective surfaces and was not present in the simulation. The results also showed no major effect on the energy decay for small head movements.
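The thesis used the ITA-Toolbox routine for T20; an equivalent plain-MATLAB computation via Schroeder backward integration is sketched here for reference (the -5 to -25 dB evaluation range is the standard T20 convention):

```matlab
% T20 from an impulse response h (sampling rate fs) via Schroeder
% backward integration and a linear fit of the decay.
edc   = flipud(cumsum(flipud(h(:).^2)));   % energy decay curve
edcdB = 10*log10(edc / max(edc));
t     = (0:numel(h)-1).'/fs;
i1 = find(edcdB <= -5,  1);                % start of evaluation range
i2 = find(edcdB <= -25, 1);                % end of evaluation range
p  = polyfit(t(i1:i2), edcdB(i1:i2), 1);   % decay slope [dB/s]
T20 = -60 / p(1);                          % extrapolated time to -60 dB [s]
```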
Sound Pressure Level

The sound pressure level was determined by convolving the target and masker sounds with the impulse responses collected across twelve positions, with horizontal displacements of 2.5, 5, and 10 cm and forward (mid-sagittal) displacements of 2.5 and 5 cm. The results are shown in Figure 4.7. The four speech talkers were individually convolved, and the equivalent sound pressure level was determined using the calibration factor. The measure is the average over 20 different sentences.

Figure 4.7: Sound pressure level virtualized through Ambisonics at different listener positions.

The changes in SPL as a function of off-center position do not follow a consistent pattern. The SPL changes were, however, mostly similar across the three simulated rooms, with the exception of three positions where the restaurant (1.1 s RT) differed by 1-1.5 dB (x = 2.5, y = 0; x = 0, y = 2.5; x = 10, y = 2.5). The center position is the optimal position for sound pressure level accuracy. To help obtain reliable data from the experiment, a neck rest as well as a video feed and a laser line were added to the setup after the first pilot test. The participants were asked to stay in contact with the neck rest at all times. The clinician was able to see the laser line at the participant's head throughout the test and could ask the participant to correct their posture at the start of each block or at any point of the session after a break. Figure 4.8 shows a participant positioned with all sensors connected. Another important finding was that, after adjusting the participant's position, the motorized chair should be unplugged; otherwise, the EEG data would be compromised.

Figure 4.8: Participant positioned for the test.

4.2.5 Procedure

There were 9 different conditions based on the SNR (+2 dB, -3 dB, -8 dB) and the reverberation time (0 s, 0.5 s, 1.1 s) of the sound. Each condition was presented in a separate block, and each block consisted of 20 sentences, so in total there were 9 blocks and 180 sentences presented to the participants in the main test. In addition, each participant went through a training round at the beginning, consisting of 20 sentences with different conditions. The procedure for each trial is illustrated in Figure 4.9. Each trial started with 2 s of silence (preparation), then 3 s of background noise, which served primarily as a baseline period for the separate EEG analysis. Then a HINT sentence was played, lasting 1.5 s on average, as the background noise continued. After the target sentence finished, the background noise continued for another 2 seconds, during which participants needed to maintain the words they had just listened to (maintenance), also serving primarily for the companion analysis of EEG responses re baseline. When the background noise stopped, the participants were instructed to repeat all the words within the sentence (recall). Listening effort reflected in alpha power changes in the maintenance phase has been investigated in [208, 310, 311, 313].

Figure 4.9: Trial design. For each trial, 20 in each block, there were 2 s of silence, then 3 s of masker (4 spatially separated talkers), then a Danish HINT sentence as the target stimulus in the presence of the continuing masker, then 2 additional seconds of masker, followed by silence, during which the participant repeated as many target words as they could understand and keep in memory.

Figure 4.10 shows the graphical user interface designed and implemented for this experiment. The 24-channel audio files were produced beforehand (offline), calibrated to the specific setup. Along with the audio presentation, the software also sent a series of triggers in sync with the presentation timings to the EEG software (ActiView, BioSemi) to mark the EEG measurement appropriately for the companion analysis.

Figure 4.10: Graphical user interface used to acquire the data from participants. Words are state buttons that alternate between green and red, being saved as 1 or 0, respectively.

4.2.6 Questionnaire

At the end of each block (SNR × room condition), a three-item questionnaire was presented to the participants; the English translation is shown in Table 4.1. The questionnaire was translated from Zekveld and Kramer [318] into Danish. The response to each question used a scale of 0 to 100 in integer units (Appendix F). The first question aimed to measure participants' estimation of their performance, referred to as "subjective intelligibility" in the rest of the text. The second question measured participants' perception of effort, referred to as "subjective effort". The third question measured how often participants gave up during the test, referred to as "subjective disengagement".

Table 4.1: The questionnaire for subjective ratings of performance, effort and engagement (English translation from Danish).

Question 1   How many words do you think you understood correctly?
Question 2   How much effort did you spend when listening to the sentences?
Question 3   How often did you give up trying to perceive the sentences?

4.2.7 Statistics

A linear mixed model [171, 233] (LMM) was used to investigate the effects of SNR and RT on performance and the questionnaire responses. The effects of SNR and RT on different alpha bands in the EEG power were also explored through an LMM in the collaborative analysis performed by Seifi-Ala. SNR and RT were fixed factors, while participants were random factors in the model. Implemented in MATLAB, the LMM syntax was Dependent ~ 1 + SNR*RT + (1|SubjectID), with Dependent being either performance or a questionnaire response. Both the SNR (-5, 0, 5) and RT (-0.53, -0.03, 0.56) levels were re-centered around zero for the model.
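In MATLAB's fitlme notation, the model above can be reproduced as follows (a sketch; the table and variable names are assumptions, and the Statistics and Machine Learning Toolbox is required):

```matlab
% Linear mixed model: fixed effects SNR, RT and their interaction,
% random intercept per participant. Predictors are re-centred first.
tbl.SNR = tbl.SNR - mean([-8 -3 2]);    % -> -5, 0, +5 dB
tbl.RT  = tbl.RT  - mean([0 0.5 1.1]);  % -> approx. -0.53, -0.03, 0.57 s
lme = fitlme(tbl, 'Performance ~ 1 + SNR*RT + (1|SubjectID)');
disp(lme.Coefficients)                  % fixed-effect estimates, SE, t, p
```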
4.3 Results

This section highlights the findings concerning the study's questions: the feasibility of a hearing-in-noise test virtualized in first-order Ambisonics, and the influence of degradation through SNR and reverberation on speech intelligibility.

The participants' behavioral performance (i.e., speech recognition accuracy) demonstrated significant effects of SNR (β = 5.98, SE = 0.30, t(158) = 19.67, p < 0.001) and RT (β = -31.17, SE = 1.78, t(158) = -17.49, p < 0.001) and a significant interaction between the two (β = 1.76, SE = 0.43, t(158) = 4.04, p < 0.001). Figure 4.11 presents the mean performance (percent correctly recalled words) as a function of SNR for each room. Less signal degradation, whether higher SNR or lower RT, led to higher performance accuracy.

Figure 4.11: Performance accuracy based on percentage of correctly recalled words as a function of SNR and RT (line color/shading). Error bars represent the standard error of the mean. Lines/symbols are staggered for legibility and do not indicate variation in SNR.

The statistical analysis of the results for subjective intelligibility (Figure 4.12), subjective effort (Figure 4.13), and subjective disengagement (Figure 4.14) is shown in Table 4.2. All measures show a significant interaction between SNR and RT. Lower signal degradation (higher SNR and lower RT) led to higher subjective estimation of intelligibility performance accuracy and to decreased reported effort and disengagement.

Table 4.2: Results of the linear mixed model based on the SNR and RT predictors for the questionnaire estimates (DF = 158).

SNR: subjective intelligibility β = 5.71 (SE = 0.42, t = 13.48, p < 0.001); subjective effort β = -5.60 (SE = 0.41, t = -13.57, p < 0.001); subjective disengagement β = -5.78 (SE = 0.48, t = -11.85, p < 0.001)
RT: subjective intelligibility β = -33.74 (SE = 2.47, t = -13.61, p < 0.001); subjective effort β = 23.58 (SE = 2.41, t = 9.76, p < 0.001); subjective disengagement β = 33.39 (SE = 2.85, t = 11.68, p < 0.001)
SNR × RT: subjective intelligibility β = 1.56 (SE = 0.60, t = 2.57, p = 0.010); subjective effort β = 1.50 (SE = 0.59, t = 2.54, p = 0.012); subjective disengagement β = -2.06 (SE = 0.69, t = -2.94, p = 0.003)

Figure 4.12: Subjective intelligibility as a function of SNR and RT (line color/shading). Error bars represent the standard error of the mean. Lines/symbols are staggered for legibility and do not indicate variation in SNR.

The subjective impressions of how much effort was required and how willing participants were to give up in each situation are presented in Figures 4.13 and 4.14, respectively.

Figure 4.13: Subjective effort as a function of SNR and RT (line color/shading). Error bars represent the standard error of the mean. Lines/symbols are staggered for legibility and do not indicate variation in SNR.

Figure 4.14: Subjective disengagement as a function of SNR and RT (line color/shading). Error bars represent the standard error of the mean. Lines/symbols are staggered for legibility and do not indicate variation in SNR.
The results show statistically significant contributions of reverberation and SNR to perceived performance, effort, and disengagement. As shown in Figures 4.12, 4.13, and 4.14, the self-report scales varied near-linearly with the signal degradations across conditions, agreeing generally with the behavioral data (see Figure 4.11).

The subjective effort is inversely related to the reverberation time: the more time the energy needs to dissipate in the environment, the greater the perceived effort. The results from all the self-report scale questions were highly correlated with performance. Pearson skipped correlations [308] revealed significant ρ coefficients (see Table 4.3).

Table 4.3: Pearson skipped correlations between performance and the self-reported questions.

performance vs subjective intelligibility: r = 0.95, CI [0.93, 0.96]
performance vs subjective effort: r = -0.79, CI [-0.84, -0.74]
performance vs subjective disengagement: r = -0.94, CI [-0.96, -0.92]

4.4 Discussion

This study presented an interesting challenge to the researchers. The pilot data pointed in the direction of the virtualization not rendering the correct sound, especially not the correct sound pressure level. The setup was retested and investigated at different positions, and the problem was identified: first-order Ambisonics rendering has a relatively small sweet spot. Thus, participants were monitored to be in the correct position during testing. The sweet spot capabilities in terms of correct overall SPL reproduction presented limitations of ±1 dB up to 5 cm, and ±3 dB relative to the target SPL up to 10 cm, off center. Although not testing the exact same Ambisonics implementation or using the same performance measure, the findings agree with literature observing contrasts caused by the reproduction method at similar distances from the center. As a reference, Grimm et al. [97] analyzed simulated Ambisonics environments with different numbers of loudspeakers, studying their influence on a representative hearing aid algorithm. It showed a decrease in SNR errors when increasing the number of loudspeakers and decreasing frequency: for a bandwidth of 2 kHz at the central listening position, 12 loudspeakers would be required for HOA; if 24 loudspeakers are available, the bandwidth at the central listening position would be 6 kHz. Laurent et al. [276] analyzed the reconstruction error to assess a rendering system's frequency capabilities. A KEMAR was fitted with a hearing aid, without processing, to collect the impulse responses. Regarding range, a third-order implementation with 29 loudspeakers decreased from 3,150 Hz at the center to 2,500 Hz when positioned 10 cm from the center.

Tests that involve separated sound sources, auralized and virtualized by loudspeaker setups, need to be verified in terms of sweet spot size for the specific sound parameters (e.g., RT and SPL). An off-centered or moving head can, in a first-order Ambisonics auralization, easily encounter a spot in space where, for example, the wave-field combination partially cancels one or more maskers, increasing the SNR even if the intended SNR is low (see Figure 4.7). At another off-center spot, it could also be possible to partially cancel the target. These distortions could profoundly impact the results and not represent what would be achieved in the real scenario being simulated.
For normal-hearing participants, a more psychoacoustically oriented auralization method such as lower-order Ambisonics can provide the desired acoustic impression, in terms of objective and subjective performance, when calibration is performed and the setup limitations (e.g., a very restricted sweet spot) are respected. An investigation of performance at off-center positions using hearing-impaired participants would be an important next step towards understanding a broad clinical application of this method.

Participants were tested at three different SNRs (-8, -3, +2 dB) and in three virtual rooms (with RTs of 0, 0.5, and 1.1 s). The more the manipulated signal was degraded (lower SNR and higher RT), the more demanding the listening conditions became, which lowered the participants' speech intelligibility. A questionnaire was used as a subjective measure of effort. Comprehensively, participants reported increased speech intelligibility, less cognitive effort, and less tendency to disengage when the signal degradation was diminished. That denotes that if they could recall the speech well, they perceived that they performed well and also spent less effort. The results from all three questions within the questionnaire were strongly correlated (either positively or negatively) with the participants' speech intelligibility. They changed significantly with both SNR and RT and with the interaction between them. When asked about subjective impressions of each block, the participants demonstrated having perceived the proposed signal degradation in both SNR and RT. That is in line with the studies from Zekveld et al. [319], Holube et al. [121], Neuman et al. [203], Kwak et al. [149], Nicola & Chiara [204], and Picou & Ricketts [229]. Furthermore, studies that cross objective measurements of physiological parameters associated in the literature with changes in effort can have divergent outcomes, as discussed in Chapter 2. From that discussion, it is speculated that these different methods, proposed to achieve a proxy for listening effort, are sensitive to separate aspects of a complex global process [12, 224]. Another explanation would be the participants' minimization of effort through heuristic strategies in the subjective method [192] and, lastly, the effect of working memory being related differently to different methods [53, 186]. A separate study by Tirdad Seifi-Ala from this combined experiment examined the correlation between the objective (physiological responses of the brain) and subjective paradigms.

4.5 Concluding Remarks

In this study, nine levels of degradation were imposed on speech signals over speech maskers separated in space and virtualized. Three different SNRs (-8, -3, +2 dB) and three different simulated rooms (with RTs of 0, 0.5, 1.1 s) were used to manipulate task demand. Speech intelligibility was assessed through a word-scored speech-in-noise test performed in a 24-loudspeaker setup utilizing first-order Ambisonics. The results showed a high correlation between participants' performance and their responses to questions about subjective intelligibility, effort, and disengagement. The main effects and interaction of SNR and RT were demonstrated for all questions.
Furthermore, it was observed that the reverberation time inside a room impacts both speech intelligibility and listening effort. This study demonstrated the possibility of virtualizing a combination of sound sources in low-order Ambisonics and extracting quality behavioral data.

Chapter 5

Iceberg: A Hybrid Auralization Method Focused on Compact Setups

5.1 Introduction

In everyday life, people usually wear their hearing devices in spaces very different from the laboratories' soundproof booths. Additionally, everyday sounds are more complex and different from the pure tones, words, and context-free phrases utilized in many hearing tests. Therefore, hearing research has increasingly aimed to include acoustic verisimilitude in auditory tests to make them more realistic and/or ecologically valid [61, 79, 101, 177, 212, 217]. Thus, researchers can evaluate new features and algorithms implemented in hearing devices and experiment with different fittings and treatments while maintaining repeatability and control.

One can utilize a particular auralization technique to create reproducible sound files for a listening area. These sounds attempt to mimic the acoustical characteristics of environments (from actual recordings or acoustic simulations). They can then be played through a set of loudspeakers or a pair of headphones, creating both the subjective impression and the objective representation of listening to the intended sound environment [293].

Through an auralization method, it is possible to create a sound file containing spatial information about the scene and a series of details about the configuration of the reproduction system [293]. The reproduction system includes, for example, the number of loudspeakers and their physical positions, the number of available audio channels, and the distance from the loudspeakers to the listening position. The size of the effective listening reproduction area, where the auditory spatial cues of the scene are most accurate, is usually called the "sweet spot" [253].

The spatialization accuracy is affected differently by different systems as well as by different auralization methods [65, 97, 166, 275, 276]. The auralization method can be decisive in the choice of reproduction system; for example, certain methods require certain numbers of loudspeakers [62, 217]. Consequently, the auralization method can be a limiting factor depending on the tests or experiments. A dedicated setup capable of handling different auralization methods with a large listening area [188] may require an excessive amount of funding and physical space. These requirements can be a limiting factor for conducting research and developing innovative treatments.

This chapter proposes a compact setup with a hybrid auralization method. It is characterized under several conditions (RTs, presence of a second listener, and listener position), considering the intended use in auditory evaluations as in the previous chapter. The setup aims to reproduce sound scenes maintaining spatial localization and creating an immersive sound environment from either a scenario in an actual room or virtual rooms created in acoustic software.

5.2 Iceberg, a Hybrid Auralization Method

The Iceberg auralization method combines two well-known methods: VBAP and Ambisonics.
In Chapter 3, the binaural cues of VBAP and Ambisonics were objectively evaluated. The VBAP method was found to render accurate cues at the center position, even with a second listener inside the array. That corroborates the use of VBAP to increase ecological validity in auditory tests [134]. On the other hand, Ambisonics delivered less precise localization cues, imposing more restrictions on the listener's position. The results are in line with the literature, which reports poor localization but high immersiveness for low-order Ambisonics [104, 105] and, conversely, lesser immersiveness and greater localization accuracy for VBAP [89, 104]. Therefore, the idea here is to provide an auralization in which the temporal and spectral features of the sounds are encoded through VBAP, while the spaciousness provided by the reverberation envelope is encoded through Ambisonics. This specific combination of auralization methods has also been chosen to decrease the number of loudspeakers necessary for a setup that accommodates regular hearing devices. At the same time, the setup may allow some degree of head movement without the need for tracking equipment. That is a countermeasure to overcome common limitations in ordinary auditory test spaces [316].

5.2.1 Motivation

The primary motivation for creating this auralization method was to test hearing aid users in typical situations, while wearing their hearing devices, in a small setup. Therefore, the method is loudspeaker-based, but the number of loudspeakers and the system complexity were also constraints. The theoretical support for combining these auralization methods and proposing the smaller virtualization setup is gathered from room acoustic parameters and psychoacoustic principles presented in the review and during this chapter. These parameters and principles led to a system able to use RIRs from simulated environments (spaces that may only exist in a computer) and recorded RIRs from real ones. The initial Iceberg focus is on tests that manipulate sound scenarios to evaluate speech intelligibility masked by noise from static positions, as tested with low-order Ambisonics in Chapter 4.

5.2.2 Method

The Iceberg method is a relatively easy-to-use algorithm that can be introduced to test environments with a simple calibration process. The virtualization system presented auralized files in a quadraphonic array with loudspeakers positioned at 0, 90, 180 and 270° (see Figure 5.1). Other horizontal setup arrangements can be implemented depending on the need, considering the system's angle rotation, frequency response, and the potential variation in localization accuracy. Although there is a minimum number of necessary loudspeakers (four), the method can be used to auralize files for setups with more loudspeakers. The presented algorithm was implemented in MATLAB (MathWorks).

Figure 5.1: Top view. Loudspeaker positions on the horizontal plane for virtualization with the proposed Iceberg method.

The proposed loudspeaker setup had a radius of 1.35 m. Other distances need to be evaluated regarding the system frequency response.
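As an illustration of the VBAP branch on this quadraphonic layout, the tangent-law gains for a virtual source between two loudspeakers reduce to a 2-by-2 linear system (a sketch following the vector formulation reviewed in Section 2.3.2.2; the 30° example angle is arbitrary):

```matlab
% 2D VBAP gains on the quadraphonic ring of Figure 5.1 for a virtual
% source at 30 degrees azimuth (active pair: loudspeakers at 0 and 90).
pair  = deg2rad([0 90]);               % the pair enclosing the source
theta = deg2rad(30);                   % desired source azimuth
L = [cos(pair); sin(pair)];            % unit vectors to the active pair
p = [cos(theta); sin(theta)];          % unit vector to the virtual source
g = L \ p;                             % unnormalised panning gains
g = g / norm(g);                       % constant-power normalisation
```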
5.2.2.1 Components

The proposed Iceberg method is a hybrid auralization method, a combination of VBAP and first-order Ambisonics; Section 2.3.2.2 reviews the derivation of both methods. Both techniques are based on amplitude panning. The main difference lies in the mathematical formulation of the gains applied to the amplitude of each sound source.

VBAP treats the reproduced sound as a unitary vector in a two- or three-dimensional plane (Equations 2.4 and 2.7, respectively). The weights applied to the amplitude of the signal at each loudspeaker are derived from the tangent law: a vector is traced from the listening position toward the desired source position using the nearest available loudspeakers as a base (Equation 2.3). On the other hand, Ambisonics utilizes all available loudspeakers to compose the sound field. The method combines the amplitudes of the sources, calculating their weights according to the sum of spherical harmonics (Equation 2.9) that represents the pressure field formed by the sound wave (Equation 2.8). While VBAP concentrates the energy between two loudspeakers in its 2D implementation, Ambisonics spreads it across all available loudspeakers. That leads to a more immersive experience with Ambisonics, while VBAP can better represent the sound source direction.
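As an illustration of the VBAP gain computation summarized above, the following minimal MATLAB sketch derives the pair gains by inverting the loudspeaker base (a generic textbook formulation of the tangent-law panning, not the exact code used in this work):

    % Minimal 2D VBAP sketch: gains for one virtual source from the active pair.
    srcAz = deg2rad(30);                 % desired virtual source azimuth
    spkAz = deg2rad([0 90]);             % azimuths of the two nearest loudspeakers
    L = [cos(spkAz); sin(spkAz)];        % 2x2 base, loudspeaker unit vectors as columns
    p = [cos(srcAz); sin(srcAz)];        % unit vector toward the virtual source
    g = L \ p;                           % solve L*g = p (tangent-law panning)
    g = g / norm(g);                     % normalize for constant power

The two entries of g are the amplitude weights for the active loudspeaker pair; sources outside the pair's arc would be assigned to a different pair.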
5.2.2.2 Energy Balance

The energy balance between the methods is calculated based on the Ambisonics first-order impulse response (see the example in Figure 5.2): on the left is the impulse response (not the decay of the squared value of the sound pressure signal); on the right, the 10·log10(h²(t)) curves for the different channels. Note that in these curves the maximum level is 0 dB, as the interest is in the time it takes for the power to drop by 60 dB. Also note that there is a small time gap between time 0 s and the instant when the energy of h(t) is maximum. This interval corresponds to the time it takes the sound wave to travel between the source and the receiver, and it allows an estimate of the distance between them. For recorded IRs, the gap also includes the system delay, which should be compensated.

The first-order Ambisonics format was chosen since the impulse response of an environment can easily be acquired with an Ambisonics first-order microphone array. Furthermore, commercially available acoustic software tools for simulating sound environments are capable of exporting impulse responses in Ambisonics format.

Figure 5.2: Normalized Ambisonics first-order RIR generated via ODEON software. Left panel depicts the waveform; right panel depicts the waveform in dB.

The system's design requires an RIR to be split into two parts. The first part contains the amount of energy to be delivered through VBAP. The second part will be computed through Ambisonics. From the reflectogram - the time representation of the latency and attenuation of the direct sound (DS), early reflections (ER), and late reflections (LR; see Figure 5.3) - it is possible to find the point in time representing the direct sound (the first peak) and then separate it from the rest of the RIR. Although splitting the RIR into DS and remainder may be the most straightforward method, the results achieved this way were initially perceived, in personal experience, as unnatural: a highlighted "dry" (non-reverberant) sound from a defined position followed by really distant, disconnected reverberation, counter to the aims of a more ecologically valid sound reproduction. Thus, in the proposed method the ER part was included with the DS part.

Figure 5.3: Reflectogram split into Direct Sound, Early, and Late Reflections.

The late reflections of an RIR refer to the signal wavefronts reflected and scattered several times across the different possible paths. These reflections overlap each other, and as time progresses, successive wavefronts interact with any surface, increasing reflection order, changing direction, and decreasing the remaining sound energy.

The literature provides a psychoacoustical approximation of the time point in a specific RIR at which the human auditory system can no longer distinguish single reflections due to reflection density [38]. Lindau [156] proposed a transition point in time (transition time, $t_m$) based on the mean free path length of the wavefront (Equation 5.1):

$$t_m = 20\,\frac{V}{S} + 12 \ \mathrm{[ms]}, \qquad (5.1)$$

where $V$ is the volume of the room in m³ and $S$ is the surface area inside the room in m². For example, for the restaurant dining area used later in this chapter (12.19 m × 7.71 m × 2.80 m), $V \approx 263$ m³ and $S \approx 299$ m², giving $t_m \approx 29.6$ ms.

The minimum order of reflections necessary to represent a uniform and isotropic sound field that leads to diffuse reverberation from an Image Source (IS) model is 3. That agrees with observations from Kuttruff [148] on the specular reflections' contribution to the diffuse energy in an RIR. This approach was implemented in a similar hybrid method by Pelzer et al. [221]. Another method, developed by Favrot [79], also uses the IS order information from simulated RIRs computed with ODEON software. Its IS reflection order information provides a point to obtain a segment of the file with the late reflections envelope, used by the system to deliver a hybrid multi-channel RIR. These methods consider RIRs and mix specific stimuli into the output, as does the proposed method. Other hybrid auralization methods such as DirAC [243] consider the recording of a sound event (in Ambisonics) and drive the reproduction based on an energy analysis spanning all sound source directions. Thus, DirAC is intended to work primarily with recorded scenes rather than convolutions with RIRs.

5.2.2.3 Iceberg proposition

The proposed Iceberg method, however, uses neither the $t_m$ method, which depends on the volume of the room and the simulated IS reflection order, nor the LR envelope time, derived from an IS simulation. Instead, a different parameter is proposed that allows generalizing to both recorded and simulated Ambisonics RIRs.

The parameters of clarity and definition are metrics to determine the early/late energy balance [43]. However, the fixed time of 50 or 80 milliseconds is not appropriate to represent the transition point (from early to late reflections) in every RIR, as the slope will differ and depend on many factors [45].
The transition point changes as the amount of energy and the decay distribution change from RIR to RIR. A similar parameter that is not fixed in time is the center time ($T_s$), given by Equation 2.15 (see Section 2.3.3). This parameter is also derived from the squared RIR, calculating the transition point from early to late reflections as the center of gravity of the RIR.

The method's name stems from this singularity of RIRs: they present a center of gravity in their power decay representation, similar to the physical blocks of frozen water called icebergs. For icebergs, the center of gravity is the equilibrium point between the force of gravity and the water's buoyancy [34]. This representation is translated to the Iceberg method as the transition point between early and late reflections of an RIR.

The process entails an RIR applied through multiplication in the frequency domain, equivalent to a convolution in the time domain, to a sound that can be virtualized through the system. The first action of the method's algorithm is the identification of the center time $T_s$ in the omnidirectional channel of the Ambisonics RIR. A schematic overview of the method is presented in Figure 5.4.

Figure 5.4: Iceberg's processing block diagram. The Ambisonics RIR is treated, split, and convolved with an input signal. A virtual auditory scene can be created by playing the multi-channel output signal with the appropriate setup.

Figure 5.5 shows an example of the RIR's omnidirectional input channel simulated through ODEON V.12 [59] for the simulated restaurant dining room used in Chapter 4, with 1.1 seconds of reverberation time.

Figure 5.5: Omnidirectional channel of an Ambisonics RIR for a simulated room. The blue line indicates the part before the calculated center time, hence indicated as the direct sound plus the early reflections. The orange line indicates the late reverberation part of the RIR.

Figure 5.6 presents an example of the Ambisonics RIR in the left column and the omnidirectional channel relative to the DS+ER in the middle column. The right-column graphs represent the late reflections part of the four-channel Ambisonics RIR.

Figure 5.6: First column: four-channel Ambisonics RIR. Middle column: omnidirectional channel (DS+ER part). Right column: four-channel Ambisonics RIR (LR part).

In sequence, the method first splits the RIR based on $T_s$. Then the direct sound and the early reflections are convolved with the signal to be reproduced; in this step, only the omnidirectional channel is used. Finally, the signal is processed with VBAP to provide its directional properties; the VBAP implementation used is from [237]. The VBAP output is two-channel panned audio that is sent to the channels of the corresponding loudspeakers. The output signal corresponds to the relative full scale of the panned signal if the provided Ambisonics RIR is normalized, or to the absolute value in the case of an un-normalized RIR. With normalized RIRs, calibration of a sound pressure level is required, and the reproduction level can be set according to the application needs.
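As a minimal sketch of the split step described above, assuming a four-channel B-format RIR on disk (hypothetical file and variable names), the center time can be computed directly as the energy centroid of the squared omnidirectional channel:

    % Minimal sketch: locate the center time Ts of the omnidirectional (W)
    % channel and split the Ambisonics RIR at that point.
    [rir, fs] = audioread('ambisonics_rir.wav'); % 4-channel first-order B-format RIR
    w  = rir(:, 1);                              % omnidirectional channel
    t  = (0:numel(w)-1).' / fs;                  % time vector [s]
    Ts = sum(t .* w.^2) / sum(w.^2);             % center time: centroid of h^2(t)
    k  = round(Ts * fs);                         % split index in samples
    dsEr = w(1:k);                               % DS + ER segment (goes to VBAP)
    lr   = rir(k+1:end, :);                      % LR segment, all channels (Ambisonics)

The split index k is reused below, when the two convolved parts are realigned and merged.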
Assuming a coherent sum between two loudspeakers that are set to reproduce the scaled signal at a predefined level, the proportion is computed as follows:

$$\mathrm{LS}_1 = 20\log_{10}\!\left(10^{\mathrm{level}/20}\,\sin^2\theta\right), \qquad (5.2a)$$
$$\mathrm{LS}_2 = 20\log_{10}\!\left(10^{\mathrm{level}/20}\,\cos^2\theta\right), \qquad (5.2b)$$

where the user sets the level in dB SPL and $\theta$ is the incidence angle. A level calibration similar to recording a pure tone from a calibrator with a microphone to find the system's $\alpha$ coefficient (as explained in Section 3.2.3) will allow playing the signal over each loudspeaker at the intended level. A frequency filter for each loudspeaker is also possible if the loudspeakers' FRF needs to be individually adjusted to achieve a flat(ter) response.

The second part of the impulse response is then convolved with the signal, with all four channels of the proposed quadraphonic system being used. First, an Ambisonics decoder matrix observing the loudspeakers' positions is created. Thus, the convolved signal is decoded from its B-format to the loudspeaker signals. The implementation utilized in the algorithm to create the decoder matrix and to decode the signal uses functions from Politis [237]. The separated signals are then merged, ready to be reproduced.

Figure 5.7 shows an example of an auralization of five seconds of the International Speech Test Signal (ISTS) [120]. The top graph is the original signal, and the mid-top graph is the signal convolved with the DS and ER of the omnidirectional channel of the Ambisonics RIR; the envelope is minimally affected by the ER. The mid-bottom graph shows the signal convolved with the LR part of the four channels and decoded from Ambisonics B-format. The diffuse nature of the Ambisonics-generated LR is evident in the smoother overall envelope. The bottom graph shows the result of the Iceberg method, the merged signal.

Figure 5.7: Iceberg method example. Top graph: original signal. Mid-top graph: DS+ER part (VBAP). Mid-bottom graph: LR part (Ambisonics). Bottom graph: merged signal (Iceberg).

This process provides an auralized file that should be reproduced through an equalized and calibrated setup.
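A compact sketch of the merge step, continuing from the split above: x is a mono input, i1/i2 and gPair are the active VBAP pair and its gains (e.g., from the earlier VBAP sketch), and the plain sampling decoder below, assuming W, X, Y, Z channel order, stands in for the Politis decoder actually used (normalization conventions vary across libraries):

    % Hedged sketch of the merge: VBAP-panned early part plus Ambisonics-
    % decoded late part, realigned by the split index k.
    N    = 4;
    az   = deg2rad([0 90 180 270]);                 % loudspeaker azimuths
    out  = zeros(numel(x) + size(rir,1) - 1, N);    % full convolution length
    early = conv(x, dsEr);                          % omni DS+ER convolved with input
    out(1:numel(early), [i1 i2]) = early * gPair.'; % route to the active VBAP pair
    D = [ones(N,1)/sqrt(2), cos(az.'), sin(az.'), zeros(N,1)]; % naive sampling decoder
    lrSpk = lr * D.';                               % B-format LR -> loudspeaker RIRs
    for i = 1:N
        late = conv(x, lrSpk(:, i));                % late part per loudspeaker
        out(k+1:k+numel(late), i) = out(k+1:k+numel(late), i) + late; % offset by k
    end

Decoding the RIR before the convolution is equivalent to decoding afterwards, since both operations are linear; decoding first keeps the scene at four convolutions.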
An equalization and calibration proposal is described in Section 5.2.3 and can be applied to similar setups with equivalent hardware. However, the results may vary depending on hardware quality, loudspeaker amplification, and frequency response. In this work, the electroacoustical requirements (Section 7.2.2) and the reference listening room (Section 8.2) from Recommendation ITU-R BS.1116-3 [126], Methods for the subjective assessment of small impairments in audio systems, were observed. The frequency-specific reverberation times were lower than in the Recommendation: 0.04 s from 0.2-4 kHz (0.08 s at 0.125 kHz), versus 0.18 s in the Recommendation. The anechoic characteristic of the room was intentionally chosen in this case to evaluate reverberation in the virtualization setup. A setup within a different space will have different room acoustic characteristics; the experimenter can compensate for the need for greater reverberation by controlling the input RIRs. The electroacoustical requirements for the loudspeakers are also relevant, as they aim to guarantee correct frequency reproduction or the possibility of compensating the frequency response with appropriate hardware. The room proportions are also essential when setting up a test environment, especially if the reproduction will include low frequencies affected by the room's eigentones (standing waves).

The address https://github.com/aguirreSL/HybridAuralization contains an example and the necessary resources to auralize files according to the Iceberg method.

This study utilized Ambisonics first-order impulse responses generated with the ODEON V.12 software. The choice was made for convenience, and it can be extended to any equivalent Ambisonics RIR, simulated or recorded. The resulting RIR from ODEON is normalized. With that, the user can play a sound at a different level (from the simulated one) without rerunning the simulation. As an option, the method can denormalize it (dividing the RIR by its corresponding factor provided in the ODEON grid [159]). The denormalized sound will then be auralized at the level simulated in ODEON (or equivalent software).

5.2.3 Setup Equalization & Calibration

The setup can include a calibration and equalization procedure, included in the MATLAB scripts, to ensure correct sound level reproduction as well as a flatter frequency response from the system's loudspeakers, avoiding additional undesired coloration artifacts.

First, a factor was calculated to transform the acquired signals from full scale to dB SPL. This step consists of recording a pure tone at a specific frequency (1 kHz) with a known input level of 1 Pa, and calculating a factor to convert the input from full scale (FS) to Pa. The term indirect refers to the fact that this calculated factor is applied to all frequencies, under the assumption that the setup (microphone, pre-amplifier, power supply, and AD/DA converter) has a flat frequency response in the audible frequency range. To calculate the conversion factor, a sound pressure calibrator (in this case the B&K 4231) was connected to the microphone (1/2" B&K 4192 pressure-field, with a type 2669 pre-amplifier supplied by a B&K 5935 power module). That provided a 93.98 dB SPL signal, which corresponds to 1 Pa. The calibration factor ($\alpha_{\mathrm{rms}}$) was calculated as in Equation 5.3. Although this step was not needed for the frequency equalization, it was convenient, as once the factor was measured, all the following measurements could be performed without the need to enter the room.

$$\alpha_{\mathrm{rms}} = \frac{1}{\mathrm{RMS}\!\left(v(t)_{1\,\mathrm{kHz}}\right)} \ \left[\frac{\mathrm{Pa}}{\mathrm{FS}}\right], \qquad (5.3)$$

The next step consists of equalizing the frequency response of each loudspeaker. An RIR of each loudspeaker was measured, and based on that an inverted FIR filter was individually created to be applied to the signals to be reproduced. The frequency response was converted to its third-octave version, normalized, and inverted to create a vector with 27 values from 50 Hz to 20 kHz. These vectors contain the correction values in the frequency domain and can be applied to any input signal. To apply the corrections, a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) was used in MATLAB to fit the values to the given input.

Figure 5.8 presents an example of the normalized third-octave moving-average response acquired with a loudspeaker (blue line), the same response acquired with a filtered signal (red line), and the filter frequency values obtained by inverting the original response (black line).

Figure 5.8: Loudspeaker normalized frequency response and inverted filter. Dotted lines represent ITU-R BS.1116-3 limits.
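A minimal sketch of these two steps, under assumed file and variable names (respDb holding the 27 normalized third-octave magnitudes measured for one loudspeaker):

    % Sketch: FS-to-Pa factor from a 1 kHz calibrator recording (Eq. 5.3),
    % then a PCHIP-interpolated inverse magnitude correction.
    fs = 96000;                                  % sampling rate used in this work
    v  = audioread('calibrator_1kHz.wav');       % recorded 93.98 dB SPL (1 Pa) tone
    alphaRms = 1 / rms(v);                       % [Pa/FS]
    dBperV   = 20*log10(alphaRms / 20e-6);       % FS-to-dB-SPL offset, used later

    f3    = 1000 * 2.^((-13:13)/3);              % 27 third-octave centers, ~50 Hz-20 kHz
    nfft  = 2^14;
    fGrid = (0:nfft/2) * fs / nfft;              % FFT bin frequencies
    fq    = min(max(fGrid, f3(1)), f3(end));     % clamp to avoid extrapolation
    corrDb = pchip(f3, -respDb, fq);             % inverse correction per bin [dB]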
Figures 5.9 and 5.10 show the moving average of each loudspeaker's normalized frequency response without and with the filter, respectively.

Figure 5.9: Loudspeaker normalized frequency responses (colored solid lines); dotted lines represent ITU-R BS.1116-3 limits.

Figure 5.10: Loudspeaker normalized frequency responses with frequency filter correction (colored solid lines); dotted lines represent ITU-R BS.1116-3 limits.

As the amplification of each active loudspeaker is individually controlled, it is possible that the same file could be reproduced at a different sound pressure level (if someone inadvertently changes the volume control directly on the loudspeaker, for example). Since $\alpha_{\mathrm{rms}}$ was already calculated, and it was thus possible to convert a signal from FS to Pa - and consequently to dB SPL and vice versa - the individual loudspeakers' SPLs were measured with a signal defined to be played at 70 dB SPL (Equation 5.4):

$$\mathrm{signal}(t) = \frac{\mathrm{signal}(t)}{\mathrm{rms}(\mathrm{signal}(t))}\; 10^{\frac{70-\mathrm{dBperV}}{20}}\; \Gamma_l, \qquad (5.4)$$

where $\Gamma_l$ is the level factor of loudspeaker $l$, with initial value 1, and $\mathrm{dBperV} = 20\log_{10}\frac{\alpha_{\mathrm{rms}}}{20\,\mu\mathrm{Pa}}$. The signal(t) was played through a loudspeaker $l$ and simultaneously recorded with the microphone as $S_l(t)$; the SPL of the recorded signal was calculated as follows:

$$\mathrm{SPL}_l = 20\log_{10}\!\left(\frac{\mathrm{RMS}(S_l(t))\,[\mathrm{FS}]\;\alpha_{\mathrm{rms}}\,[\mathrm{Pa/FS}]}{20\,[\mu\mathrm{Pa}]}\right) \ [\mathrm{dB}], \qquad (5.5)$$

Ten measurements were performed sequentially with each loudspeaker at intervals of 1 s; another iteration of measurements was performed if the measured SPL exceeded the tolerance of 0.5 dB on any of the measurements. A step of ±0.1 [FS] is used to update $\Gamma_l$ in the next iteration according to the SPL obtained.
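A sketch of this per-loudspeaker iteration, assuming alphaRms and dBperV from the calibration sketch and a hypothetical playRec(sig, l) wrapper that plays sig on loudspeaker l and returns the microphone recording:

    % Hedged sketch of the level adjustment loop (Eqs. 5.4 and 5.5).
    target = 70;  tol = 0.5;  Gamma = 1;        % dB SPL target, tolerance, level factor
    for it = 1:10
        sig = x / rms(x) * 10^((target - dBperV)/20) * Gamma;  % Eq. (5.4)
        s   = playRec(sig, l);                  % hypothetical play-and-record wrapper
        spl = 20*log10(rms(s) * alphaRms / 20e-6);             % Eq. (5.5)
        if abs(spl - target) <= tol, break; end
        Gamma = Gamma - sign(spl - target) * 0.1;              % +-0.1 FS step
    end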
5.3 System Characterization

The Iceberg auralization method in a four-loudspeaker system (the minimum required) was evaluated for its capability to reproduce the intended reverberation time and the appropriate binaural cues. This section describes the system setup and the conditions tested with the Iceberg method. The method's accuracy at the optimal and sub-optimal positions was considered in this characterization, as well as the impact of the RT. Furthermore, placing a second listener inside the ring was investigated to support a more ecological situation. Finally, a complementary study for those conditions was conducted with an aided mannequin to supplement the objective data, as the pandemic prevented subjective data collection.

The present study used the ITA-Toolbox [29] for signal acquisition and processing. To further enhance the accuracy of the localization estimates, a MATLAB implementation of the May and Kohlrausch [182] localization model from the Auditory Modeling Toolbox (AMT, https://www.amtoolbox.org) [287] was also employed. The May model is specifically designed to be robust against the detrimental effects of reverberation on localization performance, making it an ideal choice for supplementing the objective data gathered in the present study. Reverberation - the persistence of sound after its initial source has ceased - was a parameter in this test that could significantly distort the estimated location of a sound source. The May model accounts for reverberation's influence through frequency-dependent time delay parameters, enabling more accurate localization estimates in reverberant environments. By incorporating the model in our analysis, we supplemented the objective data gathered through signal processing with an additional layer of modeling that allowed a relative comparison with previous studies.

The main objective of an auralization method and its virtualization setup is to deliver appropriate spatial awareness to human listeners. The natural next step would be to verify and validate the method with listeners. Unfortunately, special conditions were in place during the course of this study; due to COVID-19 restrictions, validation tests with participants were not feasible. Section 5.5 extends the system verification and analysis to a targeted application in hearing aid research. Although it does not replace a validation based on subjective impressions, it can help understand and predict the system's behavior in a typical use case for hearing research, namely a user with hearing aids.

5.3.1 Experimental Setup

The proposed method was implemented, and the tests were conducted, at Eriksholm Research Centre in Denmark. The test environment was an anechoic room (IAC Acoustics) with inner dimensions of 4.3 m × 3.4 m × 2.7 m. Signals were routed through a sound card (MOTU PCIe-424) with a FireWire connection to the MOTU 24 I/O audio interface and played via Genelec 8030C loudspeakers (Genelec Oy, Iisalmi, Finland). The well-controlled sound environment was appropriate for the assessment of small impairments in audio systems, although the acoustic properties of the room exceed those of the sound booths and rooms commonly encountered in audiology clinics [316].

5.3.2 Virtualized RIRs & BRIRs

A set of 72 room impulse responses and 72 binaural room impulse responses was acquired through the system, separated by 5-degree angles around the center position, assuming x as the lateral axis and y as the front-back (mid-sagittal) axis of a person inside the ring. Moreover, the same number of RIRs and BRIRs was measured at off-center positions.

The virtualized RIRs and BRIRs were acquired utilizing a logarithmic sweep signal (50-20,000 Hz, 2.73 s; FFT degree 18, sample frequency 96 kHz) [194] as input. The signal was auralized at each angle with the Iceberg method for the same three spaces as in Chapter 4: a classroom (9.46 m × 6.69 m × 3.00 m) with an overall reverberation time (RT) of 0.5 s, a restaurant dining area (12.19 m × 7.71 m × 2.80 m) with an overall RT of 1.1 s, and an anechoic room (4.3 m × 3.4 m × 2.7 m) with an ideal overall RT of 0.0 s. All rooms were acoustically simulated in ODEON software V.12, which generated the Ambisonics RIRs representing each mentioned source-receiver configuration. The absorption coefficients of the room surfaces are listed in Appendix E.

The initial step in acquiring the RIRs and BRIRs was to auralize the sweep file with the Iceberg method to the desired positions (72 angles around the center) in the three different room conditions, and then play it through the four loudspeakers positioned at the front (0°), left (90°), back (180°), and right (270°) counter-clockwise angles. The auralized version of the sweep should correspond to the signal played in the virtual environment, as the reverberation added by the anechoic room is negligible. After that, the recorded file was deconvolved with a zero-padded version of the raw sweep (see Figure 5.11).
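The deconvolution step can be sketched as a frequency-domain division with a simple regularization floor (variable names assumed; dedicated toolbox routines were used in practice):

    % Sketch: recover the impulse response from the recorded sweep by spectral
    % division with the zero-padded reference sweep.
    S = fft([sweep; zeros(numel(recorded) - numel(sweep), 1)]); % zero-padded sweep
    Y = fft(recorded);                                          % recorded response
    reg = 1e-3 * max(abs(S))^2;                                 % regularization floor
    h = real(ifft(Y .* conj(S) ./ (abs(S).^2 + reg)));          % IR estimate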
The playback and recording utilized the maximum sampling rate supported by the AD/DA system (96,000 Hz), as the differences in time are on the µs scale. The step size in time, in microseconds, is therefore (1/96,000) × 1,000,000 = 10.42 µs. The created sweep duration was 2.731 s (FFT degree = 18).

Figure 5.11: BRIR/RIR acquisition flowchart: Iceberg auralization method.

A manikin with artificial pinnae (HATS model 4128-C; Brüel & Kjær) was used to record the binaural files. A second listener was also simulated during the tests with a different manikin (KEMAR; GRAS) (see Figure 5.12). The HATS recordings were calibrated as described in Section 3.2.3, following Equations 3.1a and 3.1b.

Figure 5.12: BRIR measurement setup: B&K HATS and KEMAR positioned inside the anechoic room.

5.3.3 Conditions

The auralized files were recorded under the following conditions:

• Optimal position (alone and centered)
• Optimal position (centered) accompanied by a second listener
• Off-center positions, alone

The position grid can be visualized in Figure 5.13.

Figure 5.13: Measurement positions: obtained through virtualized sound sources with the Iceberg method (VBAP and Ambisonics) in a four-loudspeaker setup.

The most accurate performance is theoretically expected at the optimal position: these techniques provide virtualization assuming the receiver (listener) is at the center of the loudspeaker ring [65, 241]. Adding a second listener to the reproduction area and/or moving the primary listener away from the center can challenge the system's ability to render the scene as intended. The following sections present and discuss the system's capability to reproduce Iceberg-auralized files by measuring the binaural cues and RT in different conditions.

5.3.4 Reverberation Time

A room's characteristic wave field pattern can affect the human perception of a reproduced sound. Room acoustics can alter attributes related to spatial perception. For example, a recorded sound has almost no chance of being correctly reproduced if the reproduction room has stronger reverberation than the recorded one. Also, reverberation overshoot can smear the perceived direction of a sound source, as early reflections would be heightened in this case [242].

The RT was calculated from impulse responses measured within the three virtualized environments (note that the simulated environments aimed at RTs of 0, 0.5, and 1.1 seconds). Reverberation time was calculated using the ITA-Toolbox, with the parameters set as follows: frequencies from 125 Hz to 16 kHz, one band per octave, and a threshold of 20 dB below maximum.
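For reference, the underlying evaluation can be sketched with a generic Schroeder backward integration (a stand-in for the ITA-Toolbox routine, with the 20 dB evaluation range mentioned above):

    % Generic sketch: reverberation time from a band-filtered IR h via the
    % Schroeder energy decay curve, fitted over -5 to -25 dB, scaled to 60 dB.
    edc   = flipud(cumsum(flipud(h.^2)));        % backward-integrated energy
    edcDb = 10*log10(edc / max(edc));
    i1 = find(edcDb <= -5, 1);                   % start of evaluation range
    i2 = find(edcDb <= -25, 1);                  % end of evaluation range (20 dB)
    p  = polyfit((i1:i2).' / fs, edcDb(i1:i2), 1);  % linear decay fit [dB/s]
    rt = -60 / p(1);                             % extrapolated RT [s]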
The reverberation time was shown to be stable in this virtualization setup. An overall RT of approximately 0.08 s can be observed for the anechoic simulation (0 s RT). That is most likely driven by the presence of the hardware inside the anechoic room: the loudspeakers and the wooden base for the chair, although covered with foam.

The overall reverberation time was measured without an omnidirectional sound source. To circumvent this limitation, the measurement was repeated utilizing all 24 loudspeakers as sound sources, one at a time. The overall RT in this case was taken as the maximum value across octave-band frequencies from 125 Hz to 16 kHz. Figure 5.14 presents boxplots of the measured values in relation to position inside the room. Rows represent the target RT (0, 0.5, and 1.1 s): the top row presents results without lateral displacement, the middle row the results for a lateral displacement of 2.5 cm from the center, and the bottom row the results for a lateral displacement of 5 cm from the center.

Figure 5.14: Reverberation time of the environments, measured with files produced with the Iceberg method and virtualized over four loudspeakers.

Table 5.1 presents the medians of the overall RT values. The virtualized environments' RTs tend to be stable and, for the measured conditions, within the just-noticeable difference (JND) of 5% [264, 265] across positions inside the room.

Table 5.1: Reverberation time in three virtualized environments at different positions inside the loudspeaker ring.

Position [cm]      Overall RT [s]
                   RT = 0    RT = 0.5    RT = 1.1
x=0.0; y=0.0       0.085     0.519       1.114
x=0.0; y=2.5       0.085     0.519       1.111
x=0.0; y=5.0       0.085     0.526       1.113
x=0.0; y=10.0      0.084     0.526       1.147
x=2.5; y=0.0       0.085     0.531       1.120
x=2.5; y=2.5       0.086     0.529       1.114
x=2.5; y=5.0       0.084     0.559       1.148
x=2.5; y=10.0      0.083     0.546       1.157
x=5.0; y=0.0       0.085     0.537       1.139
x=5.0; y=2.5       0.085     0.538       1.138
x=5.0; y=5.0       0.085     0.548       1.138
x=5.0; y=10.0      0.084     0.552       1.147

5.4 Main Results

This section presents the results based on the mannequin positions (center and off-center) and conditions (HATS alone and HATS with KEMAR), with angles referenced clockwise.

5.4.1 Centered Position

5.4.1.1 Interaural Time Difference

The blue line in Figure 5.15 represents the interaural time difference (ITD), filtered with a 1 kHz low-pass filter, virtualized through the proposed system.

Figure 5.15: Interaural time difference below 1 kHz as a function of azimuth angle for a HATS Brüel & Kjær type 4128-C in the horizontal plane, through the proposed Iceberg auralization method on a four-loudspeaker setup. The red line shows the ITD results with real loudspeakers (without virtualization). The blue and red shaded areas are the confidence intervals according to the sample rate. The black line represents the analytical ITD values.

Wang and Brown [297] defined the analytical ITD (black line in the figure; see Equation 5.6) considering a centered, perfect sphere of radius a = 10.5 cm and a sound propagation velocity c of 340 m/s, with θ the angle in radians:

$$\mathrm{ITD} = \frac{2a}{c}\,\sin(\theta) \qquad (5.6)$$

The maximum absolute difference found is 170 µs, representing a mismatch of around 15° at the given angle. The calculated average difference is 67 µs, representing a localization difference of around 7°.
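For comparison, the analytical value and a measured estimate can be sketched as follows (brir and fs assumed; the exact extraction procedure in this work may differ):

    % Sketch: analytical ITD (Eq. 5.6) vs. a cross-correlation estimate from a
    % BRIR low-passed at 1 kHz, as in Figure 5.15.
    a = 0.105; c = 340;                        % sphere radius [m], speed of sound [m/s]
    itdModel = (2*a/c) * sin(deg2rad(30));     % analytical ITD for a 30 deg source
    [b, aLp] = butter(4, 1000/(fs/2));         % 1 kHz low-pass filter
    L = filter(b, aLp, brir(:,1));             % left ear
    R = filter(b, aLp, brir(:,2));             % right ear
    [xc, lags] = xcorr(L, R, round(1e-3*fs));  % search within +-1 ms
    [~, iMax] = max(abs(xc));
    itdMeas = lags(iMax) / fs;                 % estimated ITD [s]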
Figure 5.16: Interaural time difference below 1 kHz as a function of azimuth angle for a HATS Brüel & Kjær type 4128-C in the horizontal plane, through the proposed Iceberg method on a four-loudspeaker setup, for three different reverberation time scenarios.

Three different simulated rooms were measured utilizing files generated via the Iceberg method for a four-loudspeaker setup, keeping the listener in the center position. Figure 5.16 presents the ITDs acquired with the Iceberg method for RT = 0 s (blue), RT = 0.5 s (red), and RT = 1.1 s (yellow), and the ITD for RT = 0 without virtualization (i.e., reproduced through real loudspeakers) in black. There were no substantial differences across the virtualized reverberation times. This was expected, though, as the direct sound drives the ITD.

5.4.1.2 Interaural Level Difference

Figure 5.17 shows the ILDs (calculated following Equation 3.7) across octave bands for angles around the center, spaced horizontally by 30° for better visualization. The ILDs were more affected than the ITDs, with a substantial reduction in ILD relative to the actual loudspeakers observed in the 2 kHz band. The ILD values have a similar pattern and magnitude for a significant part of the spectrum at most angles.

Figure 5.17: Iceberg interaural level differences as a function of octave-band center frequencies; separate lines for angles of incidence.

Figure 5.18 presents the ILD for both setups - real loudspeakers and the Iceberg method - in six octave bands as a function of azimuth, in steps of 15°. The top-right graph shows the 2 kHz band. It shows that, apart from the positions where there is an actual loudspeaker (i.e., 0°, 90°, 180°, and 270°), the differences are large, greater than 10 dB at some azimuth angles.

Figure 5.18: Iceberg ILD as a function of azimuth angle. Listener alone in the center.

Figure 5.19 shows the absolute difference in ILD between physical loudspeakers and the virtual loudspeakers created by the Iceberg method.

Figure 5.19: Iceberg method: absolute ILD differences as a function of azimuth angle.

Tu [291] measured just-noticeable differences (JNDs) in ILDs for normal-hearing participants using pure tones at different presentation levels. These JNDs can be used to estimate whether the differences between ILDs obtained with physical loudspeakers and with the Iceberg auralization setup would be perceived in a given frequency band. Figure 5.20 presents the values from Figure 5.19 minus the appropriate pure-tone ILD JNDs. Positive values, which exceed the JND, could be perceived as not intended; that is, a perceptible ILD deviation can cause spatial distortion [38]. The 2 kHz ILDs show up to 8 dB of divergence across most angles. ILDs in other frequency bands (1, 2, 8, and 16 kHz) also presented values that could relate to noticeable differences (up to 4 dB), but those are mostly limited to frontal ±30° angles.

The 2 kHz mismatch can be considered a flaw in the reproduction system. The effect on sound localization and on the subjective impression of complex sounds involving these frequencies needs further investigation as to the scale of spatial distortion. As the ITDs, and the ILDs at other frequencies, were relatively well preserved compared with the real-loudspeaker condition, it is possible this flaw at 2 kHz has a minimal effect, especially for lower-frequency stimuli. System reliability should be verified first for stimuli with peak energy in the 2 kHz band or for tasks requiring greater localization accuracy (e.g., with sound sources within ±30°).

Figure 5.20: Iceberg: absolute ILD differences over JND as a function of azimuth angles around the central point.
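The band-wise computation behind these figures can be sketched generically (Equation 3.7 applied per octave band; the filter design details here are assumptions):

    % Sketch: octave-band ILDs from a BRIR as band energy ratios in dB.
    fc = 1000 * 2.^(-3:4);                      % octave centers, 125 Hz to 16 kHz
    ild = zeros(size(fc));
    for i = 1:numel(fc)
        [b, a] = butter(3, fc(i)*[1/sqrt(2) sqrt(2)]/(fs/2));  % octave band-pass
        Lb = filter(b, a, brir(:,1));
        Rb = filter(b, a, brir(:,2));
        ild(i) = 10*log10(sum(Lb.^2) / sum(Rb.^2));            % band ILD [dB]
    end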
5.4.1.3 Azimuth Estimation

The frontal azimuth angle was estimated using the binaural model by May and Kohlrausch [182]. Each BRIR was convolved with 2.9 s of pink noise as input to the model. The mean of the azimuth estimates for each file is reported as the azimuth predicted by the model. Figure 5.21 presents the angles estimated with the May and Kohlrausch model for files auralized with the Iceberg method for an anechoic room and virtualized over the four-loudspeaker setup (blue curve), the angles estimated for binaural files acquired without virtualization with real loudspeakers (red curve), and the reference (dotted black).

Figure 5.21: Iceberg method: estimated azimuth angle (model by May and Kohlrausch [182]), HATS centered, and RT = 0 s.

The model's results are in line with the analysis of the binaural cues, supporting the assumption of the worst localization accuracy around ±30° (30° and 330° in Figure 5.21). Also, the virtualized sound tends to have more difficulty separating from the frontal angle (0°).
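The estimation step can be sketched as below; the AMT entry point is may2011, but the output field name here is an assumption, so treat this as illustrative rather than verbatim:

    % Hedged sketch: azimuth estimation with the May & Kohlrausch (2011) model
    % from the Auditory Modeling Toolbox (output field name assumed).
    amt_start;                                   % initialize the AMT
    x   = pinknoise(round(2.9*fs));              % 2.9 s pink noise probe
    bin = [conv(x, brir(:,1)), conv(x, brir(:,2))];  % binaural signal via the BRIR
    est = may2011(bin, fs);                      % frame-wise model output
    azHat = mean(est.azimuth(:));                % mean azimuth as the file estimate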
5.4.2 Off-Center Positions

Moving the primary listener off-center (displaced on both the x and y axes) is proposed to measure the impact of a person's head (and body) not being centered - such as when not fixated - on the system's ability to render the appropriate binaural cues.

5.4.2.1 Interaural Time Difference

Figure 5.22 presents the 72 measured ITDs around the listener (5° spacing) for four different placements: at the center and displaced forwards (y-axis) by 2.5, 5, and 10 cm. When displaced from the center position, the Iceberg method can cope with delivering a reasonable interaural time difference for frontal displacements of up to 5 cm, or for a simultaneous lateral and frontal misplacement of up to 2.5 cm. However, compared to the center position, the error increased dramatically with a 10 cm displacement for frontal angles (around ±45°), by up to 400 µs relative to the listener in the center.

Lateral displacement positions (2.5 and 5.0 cm) were also investigated. The ITD results for these displacements presented the same trend as seen without lateral displacement. Similar results were found when virtualizing the scenes with reverberation times of 0.5 and 1.1 seconds. All combined results are presented in Figure 5.23 to improve readability.

Figure 5.22: Iceberg ITD as a function of frontal displacement: centered listener in the proposed Iceberg method in a four-loudspeaker setup.

Figure 5.23: ITD in the Iceberg virtualized setup with listener displacement: listener positioned 2.5 cm off-center in the proposed Iceberg method in a four-loudspeaker setup.

ITDs were affected by frontal displacements depending on the amount of reverberation simulated. In the simulated dry condition, the squared behavior is present at 5 cm off-center; with mild reverberation, the effect only appears with a displacement of 10 cm; and the largest reverberation tested showed problems virtualizing sources at all off-center positions. The deviation is centered at ±45° in all conditions. Lateral movements were even more affected, as expected, delivering ITDs based on loudspeaker position (the squared shape) rather than on the circular placement of virtualized sound sources for displacements beyond 3.5 cm from the center (combining the lateral and frontal movements).

5.4.2.2 Interaural Level Difference

Figure 5.24 presents the difference between the ILDs measured at the center and the ILDs measured at different positions for a dry room simulation (RT = 0 s). The lateral displacement (x-axis) is ordered by row (top row = center, middle row = 2.5 cm, and bottom row = 5 cm to the right). The four columns correspond to frontal displacements (y-axis) of 0 (center), 2.5, 5, and 10 cm. Note that these ILD errors are additional to the previously discussed errors introduced by the simulation itself (with the listener at the center).

Figure 5.24: Difference in ILD as a function of azimuth angle for a HATS Brüel & Kjær type 4128-C in the horizontal plane, through the proposed Iceberg auralization method on a four-loudspeaker setup (RT = 0.0 s).

The ILDs are affected by listener displacement mostly in the mid frequencies and only at certain angles. A lateral displacement of 2.5 cm produces larger interference (up to 8 dB) for the left angles 40° and 130°. In contrast, at other angles, the ILD differences are lower than 3 dB. The 5 cm displacement also presents differences of up to 15 dB at these same angles and up to 8 dB contralaterally (220° and 320°). Frontal displacement follows a similar pattern, with more differences at some of the rear angles (130° and 220°). These particular differences indicate a relatively low impact on ILD cues using the Iceberg method in the simulated anechoic room (RT = 0 s).

Similar results in terms of affected angles were found when analyzing the ILDs for the same listener positions in simulated rooms with RT = 0.5 s (Figure 5.25) and RT = 1.1 s (Figure 5.26). These conditions are closer to everyday situations. The increased energy of the late reflections results in smaller magnitude differences in ILD, indicating slightly better performance for more realistic simulations.

Figure 5.25: Difference in ILD as a function of azimuth angle for a HATS Brüel & Kjær type 4128-C in the horizontal plane, through the proposed Iceberg method on a four-loudspeaker setup, RT = 0.5 s.

Figure 5.26: Difference in ILD as a function of azimuth angle for a HATS Brüel & Kjær type 4128-C in the horizontal plane, through the proposed Iceberg method on a four-loudspeaker setup, RT = 1.1 s.

5.4.2.3 Azimuth Estimation

Using the May et al. model again, in the same setup as in Section 5.4.1.3, to predict the localization of a sound source, Figure 5.27 presents the predicted source locations when moving the listener along the grid positions mentioned (x = 0, 2.5, and 5 cm; y = 0, 2.5, 5, and 10 cm). The different RTs are represented by the line colors in the graphs (blue = 0.0 s, red = 0.5 s, yellow = 1.1 s).
The results indicate that the system's spatial sound accuracy depends on the listener position. On the other hand, the error does not depend on the reverberation time. Lateral movements increase the error on the side that gets closer to the ear, while it is lessened on the contralateral side. Frontal movements increased the number of angles that do not deliver the correct source angle (a longer straight horizontal line around zero). The model estimates a maximum error of up to ≈30° for a listener within 3.5 cm of the center (combining lateral and frontal displacement).

Figure 5.27: Estimated frontal azimuth angle (model by May and Kohlrausch [182]) at different positions inside the loudspeaker ring as a function of the target angle.

The errors of the angles estimated at displaced positions, compared with those estimated at the center position, are lessened with increasing reverberation for the majority of the angles.

5.4.3 Centered, Accompanied by a Second Listener

The binaural cues were investigated with a second listener added to the scene while the first remained in the center (sweet spot). The second listener was positioned at three different lateral (x-axis) distances from the center:

• 50 cm (simulating shoulder to shoulder)
• 75 cm
• 100 cm

5.4.3.1 Interaural Time Difference

The upper row of Figure 5.28 shows the ITDs for the setup with the HATS alone at the center (blue line) and with a second listener positioned at the side at three different distances from the center, together with the reference. The ITDs in black were computed without virtualization as a reference.

Figure 5.28: ITDs and absolute ITD differences as a function of angle for multiple configurations with (colored lines) and without a second listener (black line).

There was a small difference (≈15 µs) when the second listener was placed at the closest position (50 cm), considering the rear and right angles. The absolute difference has a maximum of 201 µs, equivalent to approximately 15° in source position (see the bottom row of Figure 5.28).

5.4.3.2 Interaural Level Difference

Figure 5.29 presents the difference (ΔILD) between the ILDs computed from the BRIRs collected with and without a second listener inside the ring. The panel rows, top to bottom, show ΔILDs for simulated rooms with RTs of 0, 0.5, and 1.1 s. The columns represent the different distances between the centered and the second listener, from 50 to 100 cm, left to right.

Figure 5.29: Interaural level differences, averaged over octave bands, as a function of azimuth angle for a HATS Brüel & Kjær type 4128-C in the horizontal plane, through the proposed Iceberg method on a four-loudspeaker setup.

The results show that adding a second listener impacts the ILD at the angles shadowed by the second listener. The effect is more pronounced in the higher octave bands (8 kHz and 16 kHz), reaching approximately 14 dB, especially at the closest and farthest distances (50 cm and 100 cm). Although there is less impact of having a second listener at an intermediate distance (approximately 9 dB), it still produces noticeable ILD changes in the 4 kHz band. The ΔILD produced by the presence of a second listener is expected as a result of natural acoustic shadowing.
The analysis of the ILD around the listener is also important in the hemifield opposite to the second listener (i.e., 180-360°). Auralization methods that rely on the full set of loudspeakers to form the sound pressure field (e.g., Ambisonics) can present noise on the side with a free path, as a physical object prevents the sound wave from forming correctly at the center (sweet spot). Although the Iceberg method is partially composed of first-order Ambisonics, which requires all loudspeakers combined to form the appropriate auralization, the part performed by VBAP presents the sound only through the indicated quadrant, not requiring the other loudspeakers to be active. That extends the system's robustness given a limited number of loudspeakers and frequency limit (not being dependent on the Ambisonics order).

5.4.3.3 Azimuth Estimation

Figure 5.30 depicts the frontal azimuth angles estimated by the May et al. model for pink noise inputs of 2.9 seconds. The pink noise is convolved with the recorded BRIRs, which in turn were recorded utilizing files generated by the Iceberg auralization method in a four-loudspeaker setup. The columns in the top row present graphs with the average estimated angle for a centered listener accompanied by a second listener at 50 cm (light blue curve), 75 cm (red curve), and 100 cm (yellow curve), according to the room simulation (denoted by reverberation time). The shaded area corresponds to the standard deviation. The top-left graph also presents the estimated angles for the real-loudspeaker condition without virtualization (blue dotted line). Finally, the bottom-row graphs show the differences between the estimated azimuth angle and the target angle (the estimation error).

Figure 5.30: Top row = estimated localization error with the presence of a second listener; bottom row = difference to reference. Columns depict different RTs; line colors, different second-listener positions.

According to the model's results, this difference reveals that the sound created via the Iceberg method and virtualized via a four-loudspeaker setup gives consistent localization cues even with a side listener inside the ring at the described positions. The median error is 9.9°, and the standard deviation is 8.8°. On the other hand, the distribution of these differences makes clear that the four-loudspeaker setup has more difficulty accurately presenting the localization cues between the frontal loudspeakers, reaching mismatches of up to 27° at those positions (45° & 315°). This result is in line with simulations from Grimm et al. [97].

The values obtained with the low number of loudspeakers utilized (four) are in line with expectations from the literature [97], with an equivalent pattern [84]. Although this can raise a flag for experiments needing a more precise localization representation, the Iceberg method can improve simple setups' realism. Accordingly, it needs to be thoroughly investigated with subjective listening tests, especially at the lateral angles.

Figure 5.31 presents the absolute differences between the estimated angles of arrival for the centered position alone and for the centered position accompanied by a second listener at 50 cm, 75 cm, and 100 cm, in the three simulated rooms tested.
These differences reflect the estimated influence of having the second listener inside the ring.

Figure 5.31: Absolute difference to target in estimated localization, considering the presence of a second listener and the reverberation time.

The average error presents a slight increase with the proximity of the second listener. However, the effect is less perceivable at moderate RT. That fact suggests that the acoustic shadow is present. An ANOVA of the estimated absolute errors between the RT and KEMAR position groups was proposed. For the distribution with 8 degrees of freedom and 30 observations, the tabulated value of Snedecor's F distribution at p = 0.05 is 2.26. Thus, F-statistic values smaller than the tabulated value lead to retaining the null hypothesis that there is no significant difference between the group means of the absolute errors, H0: µi = µj. From the analysis, the F statistic (presented in Table 5.2) retains H0 for all groups.

Table 5.2: One-way ANOVA; columns are the absolute differences between estimated and reference angles for the different KEMAR positions and RTs.

Source     SS        df    MS        F        Prob>F
Columns    746.5     8     93.3142   1.3755   0.2062
Error      21980.9   324   67.8423
Total      22727.4   332

Therefore, there is no statistical difference between the KEMAR positions for any of the evaluated RTs. That suggests the method's stability in this setup, even with a second listener, for the reverberations and positions tested. The model has difficulty estimating the extreme lateral locations (90° and 270°), as even the actual loudspeakers could not reach this estimation. A comparison between the angles estimated with a listener alone in the center position, acquired only with actual loudspeakers (without virtualization), and with the listener in the center accompanied by a second listener, acquired from files virtualized with the Iceberg method, is presented in Figure 5.32.

Figure 5.32: Estimated error for RT = 0, considering the estimation with real loudspeakers as the basis.

The data analyzed in this section suggest that, by observing the indicated positions where the estimated difference can be significant - although comparable to the similar methods listed in Table 2.2 - experiments with equivalent requirements (e.g., Chapter 4) can benefit from applying the Iceberg method. The method will fairly reproduce sounds in the presence of a second listener and increase the sense of immersiveness while reproducing spatialized sound with only four loudspeakers. Subjective tests are needed to further investigate the system's spatial rendering performance.

5.5 Supplementary Test Results

A concern about virtualization processes is how reliable they are when an extra layer of signal processing is added to the experimental setup [97, 213] - that is, how sound acquisition through a hearing device microphone and its signal processing would be affected by virtualization relative to simple loudspeaker reproduction or a real-life situation [98]. This section describes a comparison of the binaural cues with and without hearing aids. The RIRs were collected at the same positions as presented in Section 5.4.1 and Section 5.4.2. Further, inspired by the study of Simon et al. [276], the robustness of the virtualization setup outside the sweet spot was evaluated.
Oticon Opn S1 MiniRITE hearing aids with open domes were coupled to each ear of the HATS manikin (see Figure 5.33). Modern hearing devices like these present a series of signal processing features that can affect the analysis depending on the brand or model. To ensure comparability of the results with other devices, such features were not enabled. The devices were programmed in the fitting software to compensate for the hearing loss of the N3 moderate standard audiogram [35]; the beamformer sensitivity was set to omnidirectional, and the noise reduction was set to off. The hearing levels of the audiogram are presented in Table 5.3. The open domes were chosen to give the virtualization system its most difficult signal-mix condition: the signal played through the system is not attenuated, as the ear is not occluded, while the amplified signal from the hearing device reaches the eardrum (microphone) 8.1 ms after the original signal.

Figure 5.33: HATS wearing the Oticon Opn S1 MiniRITE.

Table 5.3: Hearing level in dB according to the proposed standard audiograms for the flat and moderately sloping group [35].

                      Frequency [Hz]
Nº   Category   250  375  500  750  1000  1500  2000  3000  4000  6000
N3   Moderate   35   35   35   35   40    45    50    55    60    65

5.5.1 Centered Position (Aided)

The system was tested by measuring the BRIRs with the listener (manikin) in the center. The calculated binaural cues are presented for incidence angles separated by 15°, at 1.35 m from the listener.

5.5.1.1 Interaural Time Difference

The ITD results (see Figure 5.34) in the aided condition were very similar to those of the unaided condition (Section 5.4.1.1). The maximum absolute difference found is 170 µs, representing a mismatch of around 15° at the given angle (the same as the previously measured unaided ITD difference). The calculated average difference is 67 µs, representing a localization difference of around 7°.

Figure 5.34: Interaural time difference below 1 kHz with the proposed Iceberg method as a function of azimuth angle for a HATS Brüel & Kjær type 4128-C in the horizontal plane, wearing a pair of hearing aids in omnidirectional mode (blue line). The red line shows the ITD results with real loudspeakers (without virtualization). The black line represents the analytical ITD values.

Figure 5.34 depicts higher differences concentrated in specific regions: angles around ±30° to the front and back. The similarity to the unaided condition is expected, as the devices are not blocking the sound wave or increasing the path more at one ear than at the other (i.e., there is only a static group delay added to the system). Therefore the sound reaches the HATS microphones with hearing aids proportionally as in the previous unaided condition.

5.5.1.2 Interaural Level Difference

Figure 5.35 shows the effect on the ILD at the centered position. Although in the higher octave bands (8 and 16 kHz) the difference between the ILD of an aided HATS with real loudspeakers and that of an aided HATS using the Iceberg method is somewhat larger than in the unaided case (see Figure 5.18), the effect in the 2 kHz band is considerably smaller.
That can be due to the added delay in the signal, which can diminish the possible comb filtering introduced by the Iceberg method in this specific frequency region, especially for angles between two loudspeakers (where there is a larger distance between the real loudspeakers and the virtualized sound source).

Figure 5.35: Interaural level differences as a function of octave-band center frequencies. Angles around the central point.

5.5.1.3 Azimuth Estimation

Figure 5.36 presents the angles estimated using the May and Kohlrausch [182] model for files auralized with the Iceberg method and virtualized over the four-loudspeaker setup (blue curve), the angles estimated for binaural files acquired without virtualization with real loudspeakers (red curve), and the reference (dotted black). The model's results are in line with the analysis of the binaural cues, supporting the assumption of the worst localization accuracy around ±30° (30° and 330° in Figure 5.21). Some differences larger than the standard deviation are noted between different RTs, especially close to the lateral angles (90° and 270°). The results suggest that the added reverberation can negatively impact localization accuracy.

Figure 5.36: Iceberg method: estimated azimuth angle (model by May and Kohlrausch [182]), HATS centered and aided.

Also, according to the figure, the virtualized sound tends to have more difficulty separating from the frontal angle (0°), denoted by the flat lines from 30° up to 340°. Figure 5.37 depicts the boxplot of the absolute differences grouped by RT.

An ANOVA of the estimated absolute errors between the RT and position groups was proposed. For the distribution with 2 degrees of freedom and 30 observations, the tabulated value of Snedecor's F distribution at p = 0.05 is 3.32.

Figure 5.37: Absolute difference to target in estimated localization in the aided condition, considering different RTs.

Thus, F-statistic values greater than the tabulated value reject the null hypothesis that there is no significant difference between the group means of the absolute errors, H0: µi = µj. From the analysis, the F statistic (presented in Table 5.4) rejects H0, and the alternative hypothesis H1: µi ≠ µj is accepted (F = 5.68).

Table 5.4: One-way ANOVA; columns are the absolute differences between estimated and reference angles for the different positions and RTs.

Source     SS        df    MS        F      Prob>F
Columns    520.77    2     260.386   5.68   0.0045
Error      4947.29   108   45.808
Total      5468.06   110

To identify in which sets of means the discrepancy is statistically significant, Tukey's multiple comparison test was performed; the result is shown in Figure 5.38.

Figure 5.38: Tukey test to compare means in the aided condition. The group mean at RT = 1.1 s presented a significant difference from the mean of the RT = 0.0 s group.

This reflects a trend towards an increase in the estimated localization error when there is signal amplification through the hearing aid, which did not occur in the similar condition without the aid, seen in Section 5.4.1.3.
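Both this test and the one in Section 5.4.3.3 follow the usual MATLAB pattern, sketched here with an assumed errAbs matrix whose columns are the groups (positions × RTs):

    % Sketch: one-way ANOVA over absolute azimuth errors, then Tukey-Kramer
    % pairwise comparisons (Statistics and Machine Learning Toolbox).
    [p, tbl, stats] = anova1(errAbs, [], 'off');  % columns of errAbs = groups
    c = multcompare(stats, 'Display', 'off');     % Tukey-Kramer pairwise CIs
    % rows of c: [group i, group j, CI low, mean difference, CI high, p-value]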
5.5.2 Off-cen ter P ositions (Aided) The listener was mo v ed from the center p osition to simulate a displaced test participan t w earing hearing aids. The BRIRs w ere measured in the p ositions describ ed in Section 5.3.3 , and the results w ere analyzed in this section. 5.5.2.1 In teraural Time Difference Figure 5.39 presen ts the ITD for the different angles around the listener as the listener is displaced to different p ositions according to the sp ecified grid. Chapter 5. Supplemen tary T est Results 170 Figure 5.39: Inter aur al Time Differ enc es as a function of o ctave-b and c enter fr e- quencies. Angles ar ound the c entr al p oint. when it mo v es 5 cm to fron t it starts to blur more the correct ITD for the fron tal angles. Especially around ± 45 degrees in the fron tal hemisphere where the ITD indicates that the sound is coming from 90 º or 0 º angles. F urther than this distance, also the rear ± 45 are affected, p oin ting to the break of the panning illusion. Compared to the unaided condition (Section 5.4.2.1 ), this condition is slightly more sensitiv e to displacemen ts Although the ITD analysis is angle-dep enden t, the results in the T able 5.5 indicates that the displacement limitations can b e o v erall mapp ed to indi- cate the maximum distance. T able 5.5 sho ws the maxim um ITD difference according the displacement. Although the ITD analysis is angle-dep endent, the results The maxim um v alue difference can indicate the tendency of the ITD shap e to b e squared, representing no virtualization. That ma y help to iden tify displacemen t limitations can b e ov erall mapp ed to indicate the maxi- m um distance. The squared b eha vior o ccurs when the sound of one individual sp eak er is the main pressure con tribution, arriving to o early to one of HA TS Chapter 5. Supplemen tary T est Results 171 ears b ecause of the HA TS’s p osition. T able 5.5: Maximum ∆ITD relativ e to the center p osition according to displacement, lines refer to lateral displacement and columns refer to fron tal displacement. R T = 0.0 s Displacemen t [cm] 0.0 2.5 5.0 10.0 0.0 0 [ µ s] 88 [ µ s] 182 [ µ s] 374 [ µ s] 2.5 233 [ µ s] 239 [ µ s] 364 [ µ s] 472 [ µ s] 5 317 [ µ s] 353 [ µ s] 399[ µ s] 566 [ µ s] R T = 0.5 s Displacemen t [cm] 0.0 2.5 5.0 10.0 0.0 0 [ µ s] 97 [ µ s] 229 [ µ s] 386 [ µ s] 2.5 213 [ µ s] 157 [ µ s] 313 [ µ s] 472 [ µ s] 5 317 [ µ s] 282 [ µ s] 389 [ µ s] 566 [ µ s] R T = 1.1 s Displacemen t [cm] 0.0 2.5 5.0 10.0 0.0 0 [ µ s] 140 [ µ s] 299 [ µ s] 341 [ µ s] 2.5 236 [ µ s] 310 [ µ s] 372 [ µ s] 437 [ µ s] 5 283 [ µ s] 372 [ µ s] 380 [ µ s] 520 [ µ s] In this case frontal displacemen ts up to 2.5 centimeters are not presenting the square b eha vior and a maxim um ∆ITD, of 140 µ s (R T= 1.1 s), considering the centered p osition as a reference. Lateral mo v emen ts are more affected, starting to present the squared b eha vior in the transition angles b etw een the rear loudsp eak er and the right angle (230 º ) and the right loudsp eaker and the fron t (310 º ). This pattern seems not b e R T dep endent, whic h is exp ected due to ITD’s nature. 5.5.2.2 In teraural Lev el Difference Figure 5.40 presen ts the ILD, considering the simulation of an anechoic envi- ronmen t (R T = 0s), on 24 angles around the listener as the listener is displaced to different p ositions according to the sp ecified grid. Compared to the normal condition, although it presen ts the same pattern, the aided condition has lesser differences b et w een ILDs across more angles and frequencies. 
Chapter 5. Supplemen tary T est Results 172 Figure 5.40: Differ enc e in ILD as a function of azimuth angle for a B & K 4128-C. Ic eb er g metho d, horizontal plane in a 4 loudsp e akers setup (R T = 0.0 s). The differences w ere also lessened as the R T increased, as can b e seen in Figures 5.41 and 5.42 . This result shows that increasing the reverberation can p ositiv ely affect the ILD error in off cen ter p ositions (reducing the differences to the ILD in the center). Figure 5.41: Differ enc e in ILD as a function of azimuth angle for a HA TS Br¨ uel and Kjær TYPE 4128-C in the horizontal plane thr ough a pr op ose d Ic eb er g metho d on a four-loudsp e aker setup R T = 0.5 s. Chapter 5. Supplemen tary T est Results 173 Figure 5.42: Differ enc e in ILD as a function of azimuth angle for a HA TS Br¨ uel and Kjær TYPE 4128-C in the horizontal plane thr ough a pr op ose d Ic eb er g metho d on a four-loudsp e akers setup R T = 1.1 s. Chapter 5. Supplemen tary T est Results 174 5.5.2.3 Azim uth Estimation Figure 5.43 presents the estimated azim uth angle [ 182 ] according to the p osi- tion of the listener. The different R Ts are represented by the line colors in the graphs (blue = 0.0s, red = 0.5s, y ello w = 1.1s). The results demonstrate that the Iceb erg metho d presen t less accuracy in the repro ducing sound in fron tal angles, esp ecially ± 30 ◦ and ± 330 ◦ . The lateral discrepancy is smaller and also noted with real loudsp eak ers, what can imply that the mo del used has some difficult y to assess that region. Figure 5.43: Estimate d fr ontal azimuth angle at differ ent p ositions inside the loud- sp e aker ring as function of the tar get angle (aide d c ondition). Mo del by May and Kohlr ausch [ 182 ]. According to the mo del’s results, to an aided listener, the lo calization error is up to 30 degrees within a frontal or lateral displacement of 5 centimeters. In the case of 10 cm of displacemen t, the virtualization will fail, presen ting the squared b eha vior on the contralateral side. The increase of reverberation tends to main tain the maxim um error magnitude, although increasing the spread to more angles. That means the lateral side close to the loudsp eak er will presen t the sound source p osition less in the desired p osition but more in Chapter 5. Supplemen tary T est Results 175 the loudsp eak er’s physical p osition. Medium reverberations are less affected b y displacemen t, meaning extreme cases should drive extra care with listener p ositioning. Chapter 5. Discussion 176 5.6 Discussion This c hapter prop osed a new h ybrid auralization metho d (Iceb erg) for a vir- tualization setup comp osed of 4 loudsp eak ers at a 1.35 m distance from the cen ter. This setup is a relatively limited one intended as a feasible alternative to the m uc h more exp ensiv e and complicated arrangemen ts prop osed and used in the reviewed literature (See Section 2.3 ). The inno v ation factor of the Iceb erg metho d is the usage of a ro om acoustic parameter, called center time, used to compute the transition p oin t b et ween early and late reflections. The Iceb erg’s c hannel mixing and distribution au- tomation are generalized to any RIR collected or con verted in Am bisonics’ first order. Implemen ted in MA TLAB, the Iceb erg auralization algorithm can generate .w a v files that were virtualized in a setup with four loudsp eak ers (90-degree spacing around the listener). 
Three sim ulated sound scenarios were predefined and sim ulated using acoustic mo deling softw are generating RIRs in Ambisonics format. The setup provided appropriate rev erb eration times even when the listener was aw ay from the cen ter p osition. Regarding binaural cues, in the optimal p osition, the maximum deviation in ITD was 170 µ s, corresp onding to a shift of approximately 15 º for sources around ± 30 º in fron t and back of the centered listener. The considerable distance b etw een loudsp eak ers is the most likely cause of this deviation. In con trast with Chapter 3 , the Iceb erg metho d could not repro duce the ITDs with the same accuracy as VBAP in the sw eet sp ot p osition. How ever, it presen ted a b etter performance than Am bisonics. The high accuracy of VBAP can b e attributed to the n um b er of loudsp eakers, 24, which lessen their ph ysical distance, therefore, its maximum error. How ever, ev en with the bigger n umber Chapter 5. Discussion 177 of loudsp eakers, Am bisonics was truncated at first-order, thus not having the b enefit of more sound sources. There w ere also deviations in ILD, mainly at the same angles. Ho w ever, the ILD deviations were most significant in the 2 kHz o cta v e band. The actual effects of this difference in signals that encompass these frequencies should b e further inv estigated in v alidation tests. The ILDs also denoted patterns with b etter representation through VBAP than first-order Ambisonics in Sec- tion 3.3.2.2 . The Iceb erg metho d with four loudsp eakers presents a pattern with ILDs closer to actual loudspeakers than pure Am bisonics, but again not as accurate as VBAP in 24 channels. This is characterized mainly to a difference in the 2 kHz o cta ve band. That needs further in v estigation and consideration in exp erimen ts requiring ILDs accuracy at that frequency band. Ov erall the results for the binaural cues repro duction via the Iceb erg metho d in four loudsp eak ers are b etter than a pure Ambisonics first-order but worse than VBAP (considering 24 loudsp eakers). Therefore the Iceb erg metho d can b e considered an option when the num b er of loudsp eakers is limited or the need for a sense of realism is higher. The Iceberg metho d com bines relativ e accuracy with a sense of immersion. The maxim um estimated lo calization uncertaint y w as around 30 degrees to the Iceb erg metho d in the minimal configuration of four loudsp eak ers. The different amoun ts of reverberation tested did not impact the results. Although the estimated lo calization was imp erfect, the metho d’s p erformance was in line with similar VBAP implemen tations [ 97 ]. The results w ere similar to the aided condition, with ILDs indicating b etter cue repro duction in the 2 kHz o cta ve band. This improv ement w as not trans- lated in to a b etter-estimated angle, getting ab out the same results. A slight v ariation w as iden tified, and a statistically significant difference was found b e- t w een differen t R Ts, esp ecially in the lateral angles. This deviation needs to b e further ev aluated with other mo dels and also with sub jective v alidation, Chapter 5. Discussion 178 esp ecially as the mo del results presented unexpected results in these angles for non virtualized sound sources. A second listener was introduced at the side of the primary listener while main taining the listener in the optimal p osition to sim ulate a condition where there is a need for so cial interaction or presence in a test. 
In this case, the binaural cues pro vided b y the Iceberg auralization metho d virtualized in a four loudsp eak ers setup were compared to a baseline without virtualization (actual loudsp eak ers). Also, the mo del of Ma y and Kohlrausch [ 182 ] was applied to predict fron tal lo calization accuracy . Three distances w ere tested with the three sim ulated ro oms (different R Ts). There was the exp ected acoustical shado w at angles blo c k ed by the second listener but not to the remaining sound source lo cations around the listener. That can b e considered a measure of rendering robustness; the second listener did not break the virtualization of binaural cues by scram bling the sound pressure summation. Regarding sub-optimal p ositions to unaided HA TS, the Section 5.4.2 presented surprising results. The the virtualized effect was affected differen tly to the dis- placed HA TS according to the amount of R T. In the dry R T (0 s) displacemen ts up to 5 cm when mo ving forw ard did not present the undesired effect. The mild rev erb eration (0.5 s) got the undesired effect only with 10 cm from the cen ter and the large R T up to 5. The large rev erb eration (1.1 s) w as hea vily af- fected presenting the squared b eha vior on all off center displacemen ts. Lateral mo v emen ts w ere affected in similar wa y for all the R Ts tested, presen ting the effect on displacemen ts further than 3.5 cm from center. The ILDs presented the shadowing effect as exp ected, increasing the distortions with the distance and reducing it with the increase of rev erb eration. The combination came to an estimated azim uth angle in practice not dep enden t of R T and an error off ≈ 30 º with displacemen ts up to 2.5 cm. As the displacement from cen ter in- creases the maximum estimated error increases but also mo ved, meaning that Chapter 5. Discussion 179 the virtualization is affected but still would pro duce a virtualized effect. In the aided condition (Section 5.5.2 ) the off-center ITDs indicated a maxim um fron tal displacemen t on the Iceb erg metho d under 10 cm from the center in unaided condition and under 5 cm in aided condition. The ILDs were also im- pacted by distance, but to a low er extent, while the differen t R T affected less the ITDs and more the ILDs. The I LDs presen t a smearing b eha vior, low ering the error, with higher R T. That b eha vior suggests an equiv alen t comp ensation on the error predicted b y the model. Within that distance limit, the maxim um error predicted w as around 30 degrees for all R Ts, agreeing at the end with the non-aided condition. When the listener is a w ay from the center, the Iceb erg metho d virtualized using the four-loudsp eak er setup increases the deviations in binaural cues compared to the cues at those sound-source angles, with a near complete loss of gradient in cues ( i.e. , either zero or extreme v alues) o ccurring when the listener was 10 cm in front of the center p osition. ITDs for this condition rev ealed minor differences across the tested rev erb eration conditions. The v alues indicated that the files created by the metho d and repro duced on the four loudsp eak- ers configuration pro duce similar ITDs as the baseline condition, having the w eek p oin t in the 30 degrees. The absolute ITD v alues align with similar ex- p erimen ts found in the literature for VBAP configurations without a second listener [ 241 ]. 
The acoustic shadow is indicated by an increase of the Delta ITD difference around 270 degrees (left side), esp ecially to the closest p osition, similar to the finding with pure VBAP Chapter 3 . Also, the difference in ILDs (∆ILDs) sho w ed that the presence is well captured in higher frequencies. All R Ts conditions and positions demonstrated to capture differences in ILD to the left side of the mannequin. A ∆ILD is exp ected as a result of natural acoustic shado wing pro duced by the presence of a second listener. The b enefit of the Iceb erg metho d is that the VBAP is not limited in frequency b y aliasing in the higher frequencies and do es not require all to be activ e loudspeakers simultane- Chapter 5. Discussion 180 ously as pure Ambisonics. The wa y the division is done in the Iceb erg metho d brings the Ambisonics’ responsibility to the time domain, defining the metho d as more natural to physical presence b et w een loudsp eakers and the listener. That extends the system’s robustness with a limited num b er of loudsp eakers and frequency limit (not b eing dep endent on the Am bisonics order). The predicted error for a second listener compared to the Iceb erg baseline condition (listener cen tered alone). The method presen ts a deviation of around 10 degrees in all R Ts when the second listener is in the shoulder-to-shoulder situation, the closest p osition (50 cm). As the difference is at the second listener p osition, it is p ossible to argue that the Iceb erg energy balance is adv an tageous, not entirely dep ending on the four loudsp eak ers’ summation. Therefore, compared to VBAP or Am bisonics, the Iceberg method is a suitable option in terms of lo calization that adds the b enefit of immersiveness in a mo dest hardw are. 5.6.1 Sub jectiv e impressions The auralization was compared b y the author and his supervisor to VBAP and Am bisonics in sub jectiv e listening sessions. The experiment was not p erformed systematically , as the Co vid-19 emergency rules imp osed a series of restrictions and these impressions are the initial opinion. The sp eech signals w ere auralized via Iceb erg, VBAP , and Ambisonics and reproduced in an anechoic ro om. The ro oms were simulated in Odeon soft w are v.12 with reverberation time equiv a- len t to 0.5 and 1.3 seconds. Both agreed that the sound direction from VBAP is easily identifiable but with p o or immersiveness, as all the rev erb eration came from a sp ecific side (2 loudsp eak ers). Am bisonics offered a more immersive ex- p erience with all loudsp eak ers activ e simulta neously , but the lo calization was Chapter 5. Discussion 181 v ery difficult; a ”blurred” p osition seems to be a trending description. The Ice- b erg system provided a sound lo calization close to VBAP while main taining the immersiveness. The Iceb erg metho d, up on a trade-off on spatial lo calization, allo ws for the repro duction of sounds that can b e easily manipulated regarding sound-source direction, sound pressure lev el, reverberation time, and simultaneous sound sources. That makes it p ossible to create or repro duce sp ecific virtual sound scenarios with high repro ducibility . Thus, researc hers can conduct auditory tests with increased ecological v alidit y in spaces that usually do not count with n umerous loudsp eakers, as is common in clinics, universities, and small companies. Not withstanding these b enefits, some limitations challenged the metho d with a small n um b er of loudsp eak ers. 
These limitations imp ose some constrain ts on its use in terms of the spatial lo calization of sound sources. 5.6.2 Adv an tages and Limitations A fundamen tal adv antage of the prop osed Iceberg metho d is the minim um n um b er of loudsp eak ers required (four). F urthermore, its compatibilit y with an y RIR in low- or high-order Ambisonics already collected. That is p ossi- ble as an RIR in HOA can b e easily scaled do wn to first-order Ambisonics and its sound spatial prop erties comp osed with an y given sound via the al- gorithm [ 237 ]. F urthermore, an essential part of the metho d’s definition and an additional adv an tage is the automation of the definition of the amoun t of energy from the RIR that corresp onds to the sp ecific auralization tec hnique. That automation p erformed by the Central Time ro om acoustics parameter allo ws a smo oth transition b et ween the direct sound and early reflections p or- tion and the late reflections of the RIR, resulting in a p oten tially more natural sound while maintaining con trol ov er the incidence direction. Chapter 5. Concluding Remarks 182 The auralization metho d is designed for a virtualization setup of 4 loudsp eak- ers. How ever, it is p ossible to use it with more loudspeakers, reducing the ev en tual limitations on spatial accuracy . F urthermore, although not within the scop e of this thesis, the metho d, using the VBAP tec hnique, would allo w the p ossibilit y for dynamically moving sound sources around the listener. 5.6.3 Study limitations and F uture W ork The initial aim of this study was to in v estigate the correlation b et ween ob jec- tiv e parameters related to spatial sound, particularly those psyc hoacoustically motiv ated by auralization metho ds, and sub jective resp onses to these meth- o ds. How ever, due to the Co vid-19 pandemic, tests with participan ts w ere not possible due to the risk of infection as mandated by gov ernment rules. As a result, the study is limited to verifying ob jectiv e parameters. Therefore, section 5.5 was included to explore the system capabilities within a relev ant con text for hearing researc h, although without sub jective tests inv olving par- ticipan ts. In future work, structured v alidation with participan ts w ould be of v alue to the field, allowing for adjustmen ts and the measurement of the effectiv eness of this metho d in real-world auditory tests. Additionally , future implemen tations of this metho d could include improv ements such as guided sound source mov ements around the listener, with sim ultaneous up dates of VBAP and Ambisonics w eigh ts defined b y time constan ts, and the ability to pan with in tensity using techniques such as V ector-Based In tensity Panning (VBIP), whic h could b e tailored to specific cases with differen t loudspeaker ar- rangemen ts or stimuli frequency con tent and p oten tially merged with VBAP dep ending on the type of stimuli and sp ecific frequencies. Chapter 5. Concluding Remarks 183 5.7 Concluding Remarks T ests that require hearing aids can b e p erformed, considering some constraints, utilizing the prop osed Iceb erg metho d. These tests aimed to verify the impact of the auralization metho d through a simple setup (four loudsp eak ers) to the virtualized spatial impression by analyzing the binaural cues and their devi- ations from actual sound-sources loudsp eak ers. This is an imp ortan t step, although not discounting the imp ortance of v alidation with test participants. 
T o a centered listener, the verified deviation in binaural cues presented limi- tations of around 30 º degrees in lo calization (through ITD) with reasonably matc hing ILDs. The system’s reliabilit y is compromised as the listener is mo v ed out from the sweet sp ot, but less so than when unaided, p ossibly due to com b filtering or the addition of compression into the signal path. Small mo v emen ts up to 2.5 cm generated errors within a JND, meaning they likely w ould not be p erceiv ed as distortions or artifacts. Th us, tests with p eople that require sound sources p ositioned in spaces larger than 30 º can benefit from this Iceb erg metho d that incorp orates spatial a w areness and immersiv eness. Chapter 6 Conclusion Throughout the course of this study , a new auralization metho d called Iceb erg w as conceptualized and compared to well-kno wn metho ds, including VBAP and first-order Ambisonics, using ob jective parameters. The Iceberg metho d is inno v ativ e in that it uses TS to find the transition p oin t betw een early and late reflections in order to split the Am bisonics impulse resp onses and adequately distribute them. VBAP is resp onsible for lo calization cues in this prop osed metho d, while Am bisonics con tributes to the sense of immersion. In the cen ter p osition, the Iceb erg metho d was found to b e in line with the lo calization accuracy of other metho ds while also adding to the sense of immersion. Also, a second listener added to the side did not present undesired effects to the auralization. Additionally , it was found that virtualization of sound sources with Ambisonics can implicate limitations on a participant’s b eha vior due to its sw eet sp ot in a listening-in-noise test. How ever, these limitations can b e circum v en ted and extended to Iceb erg, resulting in sub jective resp onses that align with b ehavioral p erformance in sp eec h in telligibilit y tests and increasing the lo calization accuracy . 184 Chapter 6. Iceb erg 185 6.1 Iceb erg In the previous chapter, w e conducted a thorough analysis comparing the p erformance of the Iceb erg metho d to the results presented in Chapter 3 and the relev an t literature in Chapter 2 . This comparison included ev aluating the Iceb erg metho d’s p erformance at the cen ter p osition, at v arious off-center p ositions, and in the presence of a second listener. The results show ed that the Iceb erg metho d w as able to provide the designed ov erall reverberation times of 0 seconds, 0.5 seconds, and 1.1 seconds across all measured p ositions. Additionally , the differences b et ween the reverberation times were b elow the JND 5% threshold. When comparing v alues to the ones obtained with a HA TS in the center with- out virtualization, it is noteworth y that the Iceberg method uses 20 few er loud- sp eak ers than this VBAP and Ambisonics configuration. The Iceb erg metho d exhibited low er accuracy in repro ducing ITDs at the sweet sp ot p osition than VBAP , but it p erformed b etter than first-order Ambisonics. W e also observ ed detrimen tal deviations in ILDs, with v alues exceeding 4 dB, particularly at the same angles as the ITDs. The most significant ILD deviations o ccurred in the 2 kHz o cta v e band, which could influence the perceived localization accuracy . F urther inv estigation through v alidation tests is necessary to fully understand the exten t of these differences b et w een the metho ds. 
Regarding o v erall binaural cue repro duction, the Iceb erg metho d using four loudsp eak ers w as sup erior to pure first-order Ambisonics but less accurate than VBAP with 24 loudsp eak ers. The Iceb erg presented a maximum estimated lo calization er- ror of around 30 degrees for angles plus minus 40 degrees from the center while the listener is cen tered. Although this magnitude matc hes the similar metho ds in T able 2.2 , the binaural cues were p ointed to a lo wer estimate (around 15 degrees). Therefore further studies with p erceptual ev aluation are highly en- couraged. In the Aided condition, w e observed that the ITD w as not affected Chapter 6. Iceb erg 186 at the center p osition, and the ILD was closer to the VBAP condition with 24 loudsp eak ers. Ho wev er, this impro v ement w as not reflected in the mo del estimate, which still sho w ed maximum deviations of around ± 30 ◦ . A t off-cen ter positions, the Iceberg method sho wed slight v ariations in lo caliza- tion estimates, particularly in lateral angles, which were found to b e statisti- cally significant when comparing differen t reverberation times. This v ariation is likely due to the metho d’s spatial limitation, known as the sweet sp ot, as discussed in the Chapter 2 . When the reverberation time w as 0 s or 1.1 s, the sw eet sp ot was more limited in terms of displacement from the cen ter (up to 3.5 cm). This means that these conditions were more prone to breaking virtualization when sound sources were virtualized on the contralateral side of the displacemen t. In contrast, the mild condition (0.5 s) maintained this up to 5 cm. A sw eet sp ot is generally smaller in first-order Ambisonics compared to VBAP with a 24 loudsp eaker setup, as iden tified in Chapter 3. Ho w ev er, it is imp ortan t to note that ob jectiv e parameters may not alw a ys corresp ond directly to sub jective impressions. Despite this, the Iceb erg metho d with four loudsp eak ers was found to p erform similarly to VBAP (with 24 loudsp eakers) in terms of binaural cue repro duction. The mo del estimates also show ed that, within a com bined displacement of up to 3.5 cm in b oth lateral and fron tal directions, the maximum error w ould b e less than 30 degrees, indicating the presence of virtualization ( i.e. , the sound b eing physically comp osed of more than just the nearest sp eaker). It is therefore recommended to ev aluate this deviation further using other mo dels and sub jectiv e v alidation tests. The results in Section 5.4.3.3 , the condition with the listener in the center, sho w ed that the presence of a second listener did not negatively affect the p erformance of the Iceb erg metho d in all conditions of reverberation tested. No statistical difference in the means of estimated error w as identified when considering the three R T conditions and the three KEMAR positions. The Chapter 6. General Discussion 187 binaural cues errors follo w ed the same trend as the Alone version, meaning that ITDs p ointed to an error around 15 degrees, but with ILDs having absolute v alues with differences exceeding 4 dB (JND), which can probably explain the 30 º error estimated by the mo del in the w orst p osition ( i.e. , the angle of the virtualized sound source at ± 45 º ). Based on these results, the Iceb erg metho d can b e viable for virtualization setups with limited loudsp eakers or when a higher sense of realism is desired. 
6.2 General Discussion In this work, we explored the use of auralization methods in hearing research as a means of impro ving the ecological v alidit y of acoustic en vironments. The use of virtualized sound fields has b ecome increasingly p opular in lab oratory tests. Ho w ev er, it is essen tial to understand the limitations of these methods in order to ensure unbiased results [ 97 ]. Our literature review (Chapter 2 ) identified the need for auralization metho ds that can b e implemented in smaller-scale setups, and our initial ev aluations fo cused on the spatial accuracy of several fundamen tal auralization metho ds, as well as their p otential use in tasks in- v olving m ultiple listeners. A collab orativ e study allo w ed us to test one of these tec hniques with real participants, and our findings highligh ted both the limita- tions and p oten tial improv ements of using Am bisonics for conducting listening effort tests. Based on this exp erience and our kno wledge of room acoustics and auralization, w e prop osed a new hybrid metho d called Iceb erg, whic h combines the strengths of Am bisonics and VBAP and can b e implemented using just four loudsp eak ers. This prop osed metho d offers a lo w-cost option for auralization that could increase its adoption among researc hers worldwide. In Chapter 3 , the VBAP and Ambisonics auralization metho ds were ob jectively c haracterized and compared in terms of binaural cues for the center and off- Chapter 6. General Discussion 188 cen ter p ositions. This inv estigation provided a foundation for combining the metho ds and further highlighted the strengths of each tec hnique: lo calization in VBAP and immersiv eness in Am bisonics. Ob jective parameters extracted from BRIRs and RIRs were examined for a single listener and in the presence of a second listener in the ro om. The results show ed that the presence of a second listener did not significan tly impact the p erformance of VBAP . A t the same time, Am bisonics was less effectiv e in repro ducing the examined cues, esp ecially with a second listener present. This information w as crucial in developing the prop osed Iceb erg auralization metho d, whic h combines the strengths of b oth VBAP and Ambisonics to create a hybrid metho d suitable for use with simple setups such as four loudsp eakers. The results of the collab orativ e study describ ed in Chapter 4 demonstrate the feasibilit y of using a virtualization method to deliv er a hearing test with a certain lev el of spatial resolution and immersion across different ro om sim ula- tions and signal-to-noise ratios. This study suggests that virtualization meth- o ds hav e the p otential to provide realistic acoustic environmen ts for hearing tests, allowing researc hers to v ary the acoustic demands of a task and p o- ten tially impro ve ecological v alidity . Additionally , the significant correlation b et w een participants’ sub jective p erception of effort and their sp eec h recogni- tion p erformance highligh ts the imp ortance of considering listening effort in hearing researc h. Ho w ever, the limitations and p otential solutions identified in this study also highlight the need for further in v estigation into virtualization metho ds in hearing research, including dev eloping new auralization metho ds that address these limitations. 
In Chapter 5 , we presented the dev elopment of a new auralization metho d called Iceb erg, whic h was designed to be compatible with small-scale virtu- alization setups using only four loudsp eak ers. Previous hybrid metho ds that com bine Am bisonics and VBAP hav e b een developed, but the innov ative as- Chapter 6. General Discussion 189 p ect of the Iceb erg metho d is its approach to handling and combining the dif- feren t metho ds to virtualize sounds while delivering appropriate spatial cues. This feature is ac hiev ed b y iden tifying a transition p oin t in the RIR using the Cen tral Time parameter from the omnidirectional channel of an Ambisonics RIR. This automated pro cess allo ws the user to input any Ambisonics RIR, along with the desired presentation angle(s) and sound file(s), to b e auralized using the VBAP and Am bisonics metho ds merged into a final multi-c hannel .w a v file for presen tation o v er a four-loudsp eak er system. One of the b enefits of this approac h is that it do es not require any additional parameters, such as those generated by a simulation program, and can b e used with any Am bison- ics RIR, including those in higher-order format that m ust b e con v erted to an appropriate order for the n um b er of loudsp eak ers. Overall, the dev elopment of the Iceb erg metho d illustrates the p oten tial for adapting existing technology to meet the needs of smaller-scale virtualization setups while still deliv ering realistic spatial cues. This approac h could support the broader adoption of au- ralization in hearing researc h and encourage researc hers to utilize virtualized sound fields in their proto cols. 6.2.1 Iceb erg capabilities The auralization metho d proposed in this work combines the use of Am bisonics RIRs and VBAP to balance the acoustic energy in t wo spatial domains: the p erception of sound lo calization and the p erception of immersion. This results in a file that captures the c haracteristics of a giv en sound as if it were play ed in the desired environmen t. The metho d can b e repro duced with at least four loudsp eakers but is scalable to a more extensive arra y of loudsp eakers of an y size greater than four, theoretically increasing its efficiency . In addition, m ultiple sound sources can b e virtualized and merged at presen tation to create more complex environmen ts. The input to Iceb erg includes Ambisonics RIRs Chapter 6. General Discussion 190 corresp onding to sp ecific source-and-receptor p ositions and the sounds to b e virtualized, preferably recorded in (near) anechoic conditions. The method can pan the source around the listener, as the VBAP comp onen t is indep endent of Am bisonics. Ho wev er, it is recommended that RIRs b e generated for sp ecific angles when using ro om acoustic soft w are to generate the Am bisonics RIRs. One b enefit of this metho d is that it can repro duce sounds ab o v e the cut-off frequency asso ciated with lo wer-order Am bisonics due to its use of VBAP , whic h is initially not frequency limited [ 241 ]. VBAP is resp onsible for the deliv ery of b oth direct sound and early reflections. Additionally , the default prop erties are defined to work with normalized RIRs, enabling the researcher to sp ecify the sound pressure level of the auralized files. 6.2.2 Iceb erg & Second Join t Listener T esting with a second listener inside the loudsp eak er ring helps illuminate the p oten tial for this virtualization system in different tasks and h uman-in teraction situations [ 143 , 202 , 230 , 234 ]. 
A system that allo ws these tasks and situations needs to deliv er the appropriate sound prop erties for the sound to b e p er- ceiv ed as coming from the intended p osition [ 97 ]. Am bisonics w as sho wn to b e not effectiv e in this test, as the shadow caused by a second listener pre- v en ted higher frequency spatial information from b eing correctly presen ted, distorting the sound field (esp ecially in lo w-order Am bisonics). V ector-based solutions can hav e less impact as the sound is physically formed from t w o (or three in 3D setups) loudsp eak ers in the same quadrant. That means that the in terference will happen only at angles where the acoustic shado w of a physical ob ject would naturally in terfere in a non-virtualized repro duction. In Chap- ter 5 BRIRs w ere acquired with files generated b y the Iceb erg metho d and repro duced via a mo dest setup comp osed of four loudsp eak ers in the presence of a second listener. It could b e observed that it did not disturb the sound Chapter 6. General Discussion 191 field, as the (primary) listener in the center p osition received the appropri- ate binaural cues. The system designed to repro duce files virtualized with Iceb erg metho d managed to perform competitively with systems with more loudsp eak ers rendering pure metho ds (See T able 2.2 ). 6.2.3 Iceb erg: Listener W earing Hearing Aids Adding the p ossibilit y of allowing participants to use hearing devices is an- other crucial step in making auditory tests with auralized files accessible to more researc hers [ 134 , 144 ]. It has b een observ ed that hearing aid signals can influence the intelligibilit y and clarity of sp eec h in virtualized sound fields [ 7 , 97 , 99 , 103 , 161 , 188 , 213 , 276 ]. When the hearing aid signals are not appro- priately aligned with the characteristics of the virtualized sound field, listeners ma y struggle to comprehend sp oken w ords or sen tences [ 98 ]. This issue can b e exacerbated when the virtualized sound field includes noise or other distrac- tions that can interfere with sp eec h p erception or when the hearing aid signals fail to amplify or enhance the sp eec h signal to an adequate degree [ 98 , 137 ]. Supp ose the hearing aid signals are not correctly capturing the sound field and, therefore, not correcting it to the individual needs and preferences of the lis- tener. In that case, the listener ma y exp erience difficulty using the virtualized sound field comfortably and effectiv ely . Sw ept signals were auralized b y the Iceb erg metho d, pla yed through the system, recorded with a manikin wearing hearing aids, and deconv olved. The resulting BRIRs were analyzed in terms of binaural cues and compared to the same signals from actual loudsp eak ers. The lo calization error was estimated b y May and Kohlrausch’s probabilistic mo del for robust sound source lo calization based on a binaural auditory fron t end. This mo del estimates the lo cation of a sound source using binaural cues such as in teraural lev el differences and interaural time differences extracted from the signals received by the t w o ears. By combining these cues in a probabilistic Chapter 6. General Conclusion 192 framew ork, the mo del can robustly estimate the lo cation of the sound source, ev en in noisy or distracting environmen ts. Ev aluation of the mo del suggests its p oten tial for use in practical con texts such as in hearing aids or virtual re- alit y systems. 
Results obtained using the Iceberg metho d with an aided HA TS sho w ed similar p erformance to the unaided results with the listener p ositioned in the sweet sp ot, indicating suitable p erformance (see Section 5.5 ). 6.2.4 Iceb erg Limitations The virtualization system playing files auralized with the Iceb erg metho d has b een found to b e less effective outside of the sweet sp ot, as the binaural cues are not correctly rendered. This mismatc h, whic h o ccurs for more than 2.5 cm displacemen ts, can b e mitigated b y keeping the listener centered in the virtu- alized sound field. While this is a significant limitation, the metho d can still b e applied with simple measures suc h as a mo dest head restraint, reducing the setup requirements compared to other classical metho ds. One ma jor limitation of the Iceb erg metho d is its spatial resolution capabilities. It is recommended for scenarios with a minimum of 30 º of separation b etw een sound sources (it can b e low er if closer to loudsp eak ers, although it should b e c hec k ed for the error distribution). F urthermore, the distance to the sound source should b e equal to the radius of the loudsp eaker array , as Ambisonics and VBAP can- not define sources inside the array . VBAP can only pan b et w een physical sound sources. These limitations should b e considered when using the Iceb erg metho d to create virtualized sound fields. Chapter 6. General Conclusion 193 6.3 General Conclusion As computational capacit y increases, using more complex and natural sound scenarios in auditory research b ecomes feasible and desirable. This technology allo ws for testing new features, sensors, and algorithms in controlled condi- tions with increasing realism and ecological v alidity . Ev en clinical tests can b enefit from auralization, allo wing for in vestigations in different scenarios with v arying acoustics ( e.g. , in a sp eech-in-noise test). The spatial-cue p erformance of the Iceb erg auralization metho d, repro ducing files through a system of four loudsp eak ers, is mainly sufficien t for these t yp es of tests. It is essential to un- derstand the constrain ts of auralization metho ds, Iceb erg included, whic h are tied to the virtualization setup and should b e c hosen by researchers based on their needs and the av ailable hardw are. How ever, utilizing the Iceb erg, virtu- alization can b e conducted by auditory research groups that cannot afford or house exp ensive anec hoic cham b ers with tens or hundreds of loudsp eak ers and sophisticated hardw are and need more freedom than using headphones. The metho d presented in this w ork serves as an additional to ol for researc hers to consider. 6.4 Main Con tributions In this w ork, we ha v e presented a nov el auralization metho d called Iceb erg, designed to create virtualized sound scenarios for use in auditory research. The main contributions of this work are: 1. The dev elopment of a hybrid auralization metho d that combines t w o psyc hoacoustic virtualization metho ds to balance the energy of an RIR and output a m ulti-c hannel file for presentation. Chapter 6. Main Contributions 194 2. The implemen tation of an effectiv e, simple, and partially automated au- ralization metho d that allows for the creation of reasonably realistic vir- tualized sound scenarios with a mo dest setup. 3. 
The exploration of the use and limitations of auralization metho ds in auditory research, including the suggestion that the Iceb erg metho d has the p oten tial to b e a helpful to ol for testing new features, sensors, and algorithms in con trolled conditions with increasing realism and ecological v alidit y . 4. W e researc hed the limitations and feasibility of using Ambisonics in the con text of sp eech intelligibilit y with normal-hearing listeners. 5. Identifying the p oten tial for the Iceb erg metho d to b e applied in a range of practical contexts, including in hearing aids and virtual realit y sys- tems. Bibliograph y [1] Aguirre, S. L. (2017). Implemen ta¸ c˜ ao e a v alia¸ c˜ ao de um sistema de vir- tualiza¸ c˜ ao de fontes sonoras (in p ortuguese). Master, Programa de P´ os- Gradua¸ c˜ ao em Engenharia Mecˆ anica, Universidade F ederal de Santa Cata- rina. (Cite d on p ages 41 and 49 ) [2] Aguirre, S. L., Bramsløw, L., Lunner, T., and Whitmer, W. M. (2019). Spa- tial cue distortions within a virtualized sound field caused b y an additional listener. In Pr o c e e dings of the 23r d International Congr ess on A c oustics : in- te gr ating 4th EAA Eur or e gio 2019 , pages 6537–6544, Berlin, Germany . ICA In ternational Congress on Acoustics, Deutsche Gesellschaft f ¨ ur Akustik. (Cite d on p age 94 ) [3] Aguirre, S. L., Seifi-Ala, T., Bramsløw, L., Gra v ersen, C., Hadley , L. V., Na ylor, G., and Whitmer, W. M. (2021). Com bination study 3. h ttp: //hear- eco.eu/combination- study- 3/ . (accessed: 24.11.2021). (Cite d on p age 97 ) [4] Agus, T. R., Akero yd, M. A., Gatehouse, S., and W arden, D. (2009). In- formational masking in young and elderly listeners for sp eec h masked b y sim ultaneous speech and noise. The Journal of the A c oustic al So ciety of A meric a , 126(4):1926–1940. (Cite d on p age 40 ) [5] Ahnert F eistel Media Group (2011). Ease enhanced acoustic sim ulator for engineers. h ttps://www.afmg.eu/en/ ease- enhanced- acoustic- sim ulator- engineers . Last c hec ked on: No v 28, 2021. (Cite d on p age 27 ) [6] Ahrens, A., Marschall, M., and Dau, T. (2017). Measuring sp eec h intelligi- bilit y with sp eech and noise interferers in a loudsp eaker-based virtual sound en vironmen t. The Journal of the A c oustic al So ciety of Americ a , 141(5):3510– 3510. (Cite d on p ages 16 , 42 and 52 ) [7] Ahrens, A., Marsc hall, M., and Dau, T. (2019). Measuring and mo deling sp eec h intelligibilit y in real and loudsp eak er-based virtual sound en viron- men ts. He aring R ese ar ch , 377:307–317. (Cite d on p ages 42 , 53 , 99 and 191 ) 195 BIBLIOGRAPHY 196 [8] Ahrens, A., Marsc hall, M., and Dau, T. (2020). The effect of spatial energy spread on sound image size and sp eech intelligibilit y . The Journal of the A c oustic al So ciety of Americ a , 147(3):1368–1378. (Cite d on p age 42 ) [9] Akero yd, M. A. (2006). The psychoacoustics of binaural hearing. Interna- tional Journal of A udiolo gy , 45(sup1):25–33. (Cite d on p ages 9 and 17 ) [10] Alfandari Menase, D. (2022). Motivation and fatigue effe cts in pupil lo- metric me asur es of listening effort . PhD thesis, Univ ersit y of Nottingham. (Cite d on p age 33 ) [11] Algazi, V. R., Duda, R. O., and Thompson, D. M. (2004). Motion-track ed binaural sound. Journal of the Audio Engine ering So ciety , 52(11):1142– 1156. (Cite d on p age 23 ) [12] Alhanbali, S., Daw es, P ., Millman, R. E., and Munro, K. J. (2019). Mea- sures of Listening Effort Are Multidimensional. Ear and he aring . 
(Cite d on p ages 51 and 118 ) [13] Alp ert, M. I., Alp ert, J. I., and Maltz, E. N. (2005). Purc hase o ccasion influence on the role of m usic in advertising. Journal of business r ese ar ch , 58(3):369–376. (Cite d on p age 8 ) [14] Arau-Puchades, H. (1988). An improv ed rev erb eration formula. A cta A custic a unite d with A custic a , 65(4):163–180. (Cite d on p age 34 ) [15] Archon tis P olitis (2020). Higher Order Ambisonics (HOA) library. (Cite d on p age 107 ) [16] Arlinger, S. (2003). Negative consequences of uncorrected hearing loss—a review. International Journal of Audiolo gy , 42(sup2):17–20. (Cite d on p ages 1 and 55 ) [17] Asp¨ ock, L., P ausc h, F., Stienen, J., Berzb orn, M., Kohnen, M., F els, J., and V orl¨ ander, M. (2018). Application of virtual acoustic en vironmen ts in the scop e of auditory research. In XXVIII Enc ontr o da So cie dade Br asileir a de A c ´ ustic a, SOBRAC, Porto Ale gr e, Br azil . SOBRAC. (Cite d on p ages 16 and 42 ) [18] Atten b orough, K. (2007). Sound Pr op agation in the Atmospher e , pages 113–147. Springer New Y ork, New Y ork, NY. (Cite d on p age 10 ) [19] Baldan, S., Lac hambre, H., Delle Monache, S., and Boussard, P . (2015). Ph ysically informed car engine sound synthesis for virtual and augmented en vironmen ts. In 2015 IEEE 2nd VR Workshop on Sonic Inter actions for Virtual Envir onments (SIVE) , pages 1–6. IEEE. (Cite d on p age 20 ) BIBLIOGRAPHY 197 [20] Barron, M. (1971). The sub jective effects of first reflections in concert halls—the need for lateral reflections. Journal of Sound and Vibr ation , 15(4):475–494. (Cite d on p age 37 ) [21] Barron, M. and Marshall, A. (1981). Spatial impression due to early lateral reflections in concert halls: The deriv ation of a physical measure. Journal of Sound and Vibr ation , 77(2):211–232. (Cite d on p ages 9 , 37 and 38 ) [22] Bates, E., Kearney , G., F urlong, D., and Boland, F. (2007). Localization accuracy of adv anced spatialisation tec hniques in small concert halls. The Journal of the A c oustic al So ciety of Americ a , 121. (Cite d on p ages 47 , 49 and 53 ) [23] Benesty , J., Sondhi, M., and Huang, Y. (2008). Springer Handb o ok of Sp e e ch Pr o c essing . Springer Handb o ok of Sp eech Pro cessing. Springer- V erlag Berlin Heidelb erg. bibtex: Benesty2008. (Cite d on p age 24 ) [24] Berkhout, A. J. (1988). a holographic approac h to acoustic control. Jour- nal of the A udio Engine ering So ciety , 36(12):977–995. (Cite d on p age 31 ) [25] Berkhout, A. J., de V ries, D., and V ogel, P . (1993). Acoustic con trol b y w av e field syn thesis. The Journal of the A c oustic al So ciety of Americ a , 93(5):2764–2778. (Cite d on p age 31 ) [26] Bertet, S., Daniel, J., Parizet, E., and W arusfel, O. (2009). Influence of microphone and loudsp eaker setup on p erceiv ed higher order ambisonics repro duced sound field. Pr o c e e dings of A mbisonics Symp osium . cited By 3. (Cite d on p age 31 ) [27] Bertet, S., Daniel, J., Parizet, E., and W arusfel, O. (2013). In vestigation on lo calisation accuracy for first and higher order am bisonics repro duced sound sources. A cta A custic a unite d with A custic a , 99:642 – 657. (Cite d on p ages 31 and 77 ) [28] Bertoli, S. and Bo dmer, D. (2014). No v el sounds as a psychoph ysiological measure of listening effort in older listeners with and without hearing loss. Clinic al Neur ophysiolo gy . (Cite d on p age 50 ) [29] Berzb orn, M., Bomhardt, R., Klein, J., Rich ter, J.-G., and V orl¨ ander, M. (2017). 
The IT A-T o olb o x: An Op en Source MA TLAB T o olb o x for Acoustic Measuremen ts and Signal Pro cessing. In 43th A nnual German Congr ess on A c oustics, Kiel (Germany), 6 Mar 2017 - 9 Mar 2017 , v olume 43, pages 222–225. (Cite d on p ages 60 , 107 and 138 ) [30] Best, V., Kalluri, S., McLachlan, S., V alentine, S., Edw ards, B., and Carlile, S. (2010). A comparison of cic and bte hearing aids for three- dimensional localization of sp eec h. International Journal of Audiolo gy , 49(10):723–732. (Cite d on p age 42 ) BIBLIOGRAPHY 198 [31] Best, V., Keidser, G., Buc hholz, J. M., and F reeston, K. (2015). An exam- ination of sp eec h reception thresholds measured in a sim ulated reverberant cafeteria environmen t. International Journal of Audiolo gy . (Cite d on p ages 42 and 52 ) [32] Best, V., Marrone, N., Mason, C. R., and Kidd, G. (2012). The influence of non-spatial factors on measures of spatial release from masking. The Journal of the A c oustic al So ciety of A meric a , 131(4):3103–3110. bibtex: b est2012. (Cite d on p age 33 ) [33] Bidelman, G. M., Da vis, M. K., and Pridgen, M. H. (2018). Brainstem- cortical functional connectivity for sp eech is differentially challenged b y noise and reverberation. He aring R ese ar ch . (Cite d on p age 50 ) [34] Bigg, G. R. (2015). The scienc e of ic eb er gs , page 21–124. Cambridge Univ ersit y Press. (Cite d on p age 128 ) [35] Bisgaard, N., Vlaming, M. S. M. G., and Dahlquist, M. (2010). Stan- dard audiograms for the iec 60118-15 measuremen t pro cedure. T r ends in A mplific ation , 14(2):113–120. (Cite d on p ages 163 and 164 ) [36] Blackstock, D. (2000). F undamentals of Physic al A c oustics . A Wiley- In terscience publication. Wiley . (Cite d on p age 228 ) [37] Blauert, J. (1969). Sound lo calization in the median plane. A cta A custic a unite d with A custic a , 22(4):205–213. (Cite d on p age 10 ) [38] Blauert, J. (1997). Sp atial he aring: the psychophysics of human sound lo c alization . MIT press. (Cite d on p ages 9 , 13 , 14 , 16 , 18 , 40 , 73 , 93 , 126 and 149 ) [39] Blauert, J. (2005). Communic ation ac oustics . Springer-V erlag Berlin Hei- delb erg, 1 edition. (Cite d on p ages 2 , 3 , 10 , 13 , 20 , 76 and 98 ) [40] Blauert, J. (2013). The te chnolo gy of binaur al listening . Springer. (Cite d on p ages 9 , 20 , 22 , 33 and 76 ) [41] Blauert, J., Lehnert, H., Sahrhage, J., and Strauss, H. (2000). An interac- tiv e virtual-environmen t generator for psychoacoustic research. i: Arc hitec- ture and implementation. A cta A custic a unite d with A custic a , 86:94–102. (Cite d on p ages 1 and 57 ) [42] Bo ck, T. M. and Keele, Jr., D. B. D. (1986). The effects of in terau- ral crosstalk on stereo repro duction and minimizing interaural crosstalk in nearfield monitoring by the use of a physical barrier: part 1. Journal of the A udio Engine ering So ciety . (Cite d on p age 43 ) BIBLIOGRAPHY 199 [43] Bradley , J. S. (1986). Sp eec h intelligibilit y studies in classro oms. The Journal of the A c oustic al So ciety of Americ a , 80(3):846–854. (Cite d on p age 127 ) [44] Bradley , J. S. and Soulo dre, G. A. (1995). Ob jectiv e measures of listener en v elopmen t. The Journal of the A c oustic al So ciety of Americ a , 98(5):2590– 2597. (Cite d on p ages 36 , 38 and 58 ) [45] Brand˜ ao, E. (2018). A c´ ustic a de salas: Pr ojeto e mo delagem . Editora Bluc her, S˜ ao Paulo. (Cite d on p ages 16 , 19 , 33 , 36 , 38 and 128 ) [46] Brandao, E., Morgado, G., and F onseca, W. (2020). 
A ray tracing engine in tegrated with blender and with uncertain t y estimation: Description and initial results. Building A c oustics , 28:1–20. (Cite d on p age 27 ) [47] Breebaart, J., v an de P ar, S., Kohlrausc h, A., and Sc h uijers, E. (2004). High-qualit y parametric spatial audio co ding at low bitrates. Journal of the A udio Engine ering So ciety . (Cite d on p age 57 ) [48] Breebaart, J., V an de P ar, S., Kohlrausch, A., and Sch uijers, E. (2005). P arametric co ding of stereo audio. EURASIP Journal on A dvanc es in Signal Pr o c essing , pages 1–18. (Cite d on p age 57 ) [49] Brinkmann, F., Asp¨ oc k, L., Ac kermann, D., Lepa, S., V orl¨ ander, M., and W einzierl, S. (2019). A round robin on ro om acoustical simulation and auralization. The Journal of the A c oustic al So ciety of Americ a , 145(4):2746– 2760. (Cite d on p age 21 ) [50] Brinkmann, F., Asp¨ ock, L., Ac kermann, D., Op dam, R., V orl¨ ander, M., and W einzierl, S. (2021). A benchmark for ro om acoustical sim ulation. concept and database. Applie d A c oustics , 176:107867. (Cite d on p age 21 ) [51] Brinkmann, F., Lindau, A., and W einzierl, S. (2017). On the authen ticity of individual dynamic binaural synthesis. The Journal of the A c oustic al So ciety of Americ a , 142(4):1784–1795. (Cite d on p age 14 ) [52] Brown, C. and Duda, R. (1998). A structural mo del for binaural sound syn thesis. IEEE T r ansactions on Sp e e ch and A udio Pr o c essing , 6(5):476– 488. (Cite d on p age 12 ) [53] Brown, V. A. and Strand, J. F. (2019). Noise increases listening effort in normal-hearing young adults, regardless of working memory capacit y. L anguage, Co gnition and Neur oscienc e . (Cite d on p ages 51 and 118 ) BIBLIOGRAPHY 200 [54] Brungart, D. S., Cohen, J., Cord, M., Zion, D., and Kalluri, S. (2014). Assessmen t of auditory spatial a wareness in complex listening en vironments. The Journal of the A c oustic al So ciety of Americ a , 136(4):1808–1820. (Cite d on p age 18 ) [55] Buchholz, J. M. and Best, V. (2020). Sp eech detection and lo calization in a reverberant m ultitalk er environmen t b y normal-hearing and hearing- impaired listeners. The Journal of the A c oustic al So ciety of A meric a , 147(3):1469–1477. (Cite d on p age 42 ) [56] Byrnes, H. (1984). The role of listening comprehension: A theoretical base. F or eign language annals , 17(4):317. (Cite d on p age 8 ) [57] Campanini, S. and F arina, A. (2008). A new audacity feature: ro om ob jectiv e acustical parameters calculation mo dule. (Cite d on p age 36 ) [58] Choi, I., Shinn-Cunningham, B. G., Chon, S. B., and Sung, K.-m. (2008). Ob jectiv e measurement of p erceived auditory quality in m ultic hannel audio compression co ding systems. Journal of the Audio Engine ering So ciety , 56(1/2):3–17. (Cite d on p age 73 ) [59] Claus Lynge Christensen, Gry Bælum Nielsen, J. H. R. (2008). Danish acoustical so ciety round robin on ro om acoustic computer mo delling. https: //o deon.dk/learn/articles/auralisation/ . Last c hec k ed on: No v 28, 2021. (Cite d on p ages 18 , 27 , 68 , 107 and 129 ) [60] Co op er, D. H. and Bauck, J. L. (1989). prosp ects for transaural recording. Journal of the A udio Engine ering So ciety , 37(1/2):3–19. (Cite d on p age 22 ) [61] Cubick, J. and Dau, T. (2016). V alidation of a virtual sound en vironmen t system for testing hearing aids. A cta A custic a unite d with A custic a . 
(Cite d on p ages 42 , 53 , 56 and 120 ) [62] Cuev as-Ro dr ´ ıguez, M., Picinali, L., Gonz´ alez-T oledo, D., Garre, C., de la Rubia-Cuestas, E., Molina-T anco, L., and Rey es-Lecuona, A. (2019). 3d tune-in to olkit: An op en-source library for real-time binaural spatialisation. PloS one , 14(3):e0211899. (Cite d on p ages 16 , 20 and 121 ) [63] Cunningham, L. L. and T ucci, D. L. (2017). Hearing loss in adults. New England Journal of Me dicine , 377(25):2465–2473. (Cite d on p ages 1 and 55 ) [64] Daniel, J. (2000). R epr´ esentation de champs ac oustiques, applic ation ` a la tr ansmission et ` a la r epr o duction de sc ` enes sonor es c omplexes dans un c ontexte multim´ edia (In F r ench) . PhD thesis, Universit y of P aris VI. (Cite d on p ages 31 and 99 ) [65] Daniel, J. and Moreau, S. (2004). F urther study of sound field co ding with higher order ambisonics. In Audio Engine ering So ciety Convention 116 . (Cite d on p ages 30 , 32 , 99 , 121 and 142 ) BIBLIOGRAPHY 201 [66] Davies, W. J., Bruce, N. S., and Murph y , J. E. (2014). Soundscap e repro- duction and synthesis. A cta A custic a unite d with A custic a , 100(2):285–292. (Cite d on p age 42 ) [67] Dietrich, P ., Masiero, B., M ¨ uller-T rap et, M., Pollo w, M., and Scharrer, R. (2010). Matlab to olbox for the comprehension of acoustic measuremen t and signal pro cessing. In F ortschritte der A kustik – DA GA . (Cite d on p age 107 ) [68] Dreier, C. and V orl¨ ander, M. (2020). Psyc hoacoustic optimisation of air- craft noise-challenges and limits. In Inter-Noise and Noise-Con Congr ess and Confer enc e Pr o c e e dings , volume 261, pages 2379–2386. Institute of Noise Con trol Engineering. (Cite d on p age 20 ) [69] Dreier, C. and V orl¨ ander, M. (2021). Aircraft noise—auralization-based assessmen t of weather-dependent effects on loudness and sharpness. The Journal of the A c oustic al So ciety of Americ a , 149(5):3565–3575. (Cite d on p age 20 ) [70] Duda, R., Av endano, C., and Algazi, V. (1999). An adaptable ellip- soidal head mo del for the interaural time difference. In 1999 IEEE Interna- tional Confer enc e on A c oustics, Sp e e ch, and Signal Pr o c essing. Pr o c e e dings. ICASSP99 (Cat. No.99CH36258) , v olume 2, pages 965–968 vol.2. (Cite d on p age 12 ) [71] Dunne, R., Desai, D., and Heyns, P . S. (2021). Developmen t of an acoustic material prop ert y database and universal airflo w resistivity mo del. Applie d A c oustics , 173:107730. (Cite d on p age 21 ) [72] Eddins, D. A. and Hall, J. W. (2010). Binaural pro cessing and auditory asymmetries. In Gordon-Salant, S., F risina, R. D., Popper, A. N., and F ay , R. R., editors, The A ging A uditory System , pages 135–165. Springer New Y ork, New Y ork, NY. (Cite d on p age 11 ) [73] Epain, N., Guillon, P ., Kan, A., Kosobro dov, R., Sun, D., Jin, C., and V an Sc haik, A. (2010). Ob jective ev aluation of a three-dimensional sound field repro duction system. In Burgess, M., Dav ey , J., Don, C., and McMinn, T., editors, Pr o c e e dings of 20th International Congr ess on A c oustics, ICA 2010 , volume 2, pages 949–955. In ternational Congress on Acoustics (ICA). (Cite d on p age 31 ) [74] Eyring, C. F. (1930). Rev erb eration time in “dead” ro oms. The Journal of the A c oustic al So ciety of A meric a , 1(2A):168–168. (Cite d on p age 34 ) [75] F arina, A. (2000). Simultaneous measurement of impulse resp onse and distortion with a sw ept-sine tec hnique. Journal of The A udio Engine ering So ciety . 
(Cite d on p age 63 ) BIBLIOGRAPHY 202 [76] F arina, A., Glasgal, R., Armelloni, E., and T orger, A. (2001). am biophonic principles for the recording and repro duction of surround sound for music. journal of the audio engine ering so ciety . (Cite d on p ages 43 and 45 ) [77] F avrot, S. and Buchholz, J. (2009). V alidation of a loudsp eak er-based ro om auralization system using sp eec h intelligibil ity measures. In A udio Engine ering So ciety Convention Pap ers , volume Preprin t 7763, page 7763. Praesens V erlag. 126th Audio Engineering So ciet y Con v en tion, AES126 ; Conference date: 07-05-2009 Through 10-05-2009. (Cite d on p ages 21 and 99 ) [78] F avrot, S., Marsc hall, M., K¨ asbach, J., Buc hholz, J., and W eller, T. (2011). Mixed-order ambisonics recording and playbac k for improving hor- izon tal directionality . In Pr o c e e ding of the audio engine ering so ciety 131st c onvention . 131st AES Con v en tion ; Conference date: 20-10-2011 Through 23-10-2011. (Cite d on p age 99 ) [79] F avrot, S. E., Buchholz, J., and Dau, T. (2010). A loudsp e aker-b ase d r o om aur alization system for auditory r ese ar ch . phdthesis, T ec hnical Universit y of Denmark. (Cite d on p ages 1 , 42 , 43 , 45 , 56 , 57 , 120 and 127 ) [80] Fichna, S., Bib erger, T., Seeb er, B. U., and Ewert, S. D. (2021). Effect of acoustic scene complexit y and visual scene represen tation on auditory p erception in virtual audio-visual environmen ts. 2021 Immersive and 3D A udio: fr om Ar chite ctur e to Automotive (I3D A) . (Cite d on p age 42 ) [81] Fintor, E., Asp¨ ock, L., F els, J., and Sc hlittmeier, S. (2021). The role of spatial separation of t w o talk ers auditory stimuli in the listener’s memory of running sp eec h: listening effort in a non-noisy conv ersational setting. International Journal of A udiolo gy . (Cite d on p age 57 ) [82] Fitzroy , D. (1959). Reverberation form ula which seems to b e more accu- rate with non uniform distribution of absorption. The Journal of the A c ous- tic al So ciety of Americ a , 31(7):893–897. (Cite d on p age 34 ) [83] F rancis, A. L. and Lov e, J. (2020). Listening effort: Are we measuring cognition or affect, or b oth? WIREs Co gnitive Scienc e , 11(1):e1514. (Cite d on p age 98 ) [84] F rank, M. (2014). Lo calization using differen t amplitude-panning metho ds in the frontal hhorizontal plane. In Pr o c e e dings of the EAA Joint Symp osium on Aur alization and A mbisonics 2014 . (Cite d on p ages 45 , 49 and 160 ) [85] F rank, M. and Zotter, F. (2008). Lo calization exp erimen ts using differ- en t 2d am bisonics deco ders. In 25th T onmeistertagung-VDT International Convention, L eipzig . (Cite d on p ages 45 and 49 ) BIBLIOGRAPHY 203 [86] F ranklin, W. S. (1903). Deriv ation of equation of deca ying sound in a ro om and definition of op en windo w equiv alen t of absorbing p o w er. Phys. R ev. (Series I) , 16:372–374. (Cite d on p age 34 ) [87] F raser, S., Gagn ´ e, J. P ., Alepins, M., and Dub ois, P . (2010). Ev aluat- ing the effort exp ended to understand sp eech in noise using a dual-task paradigm: The effects of providing visual sp eech cues. Journal of Sp e e ch, L anguage, and He aring R ese ar ch . (Cite d on p age 50 ) [88] F uruya, H., F ujimoto, K., Y oung Ji, C., and Higa, N. (2001). Arriv al direc- tion of late sound and listener env elopment. Applie d A c oustics , 62(2):125– 136. (Cite d on p age 37 ) [89] Gandemer, L., Parseihian, G., Bourdin, C., and Kronland-Martinet, R. (2018). 
Perception of Surrounding Sound Source T ra jectories in the Horizon- tal Plane: A Comparison of VBAP and Basic-Deco ded HOA. A cta A custic a unite d with A custic a , pages 338–350. (Cite d on p ages 40 , 57 and 122 ) [90] Gelfand, S. and Gelfand, S. (2004). He aring: An Intr o duction to Psycho- lo gic al and Physiolo gic al A c oustics, F ourth Edition . T aylor & F rancis. (Cite d on p age 76 ) [91] Gerzon, M. A. (1985). Ambisonics in multic hannel broadcasting and video. AES: Journal of the A udio Engine ering So ciety . (Cite d on p ages 23 and 58 ) [92] Giguere, C. and W o o dland, P . C. (1994). A computational mo del of the auditory p eriphery for sp eec h and hearing researc h. i. ascending path. The Journal of the A c oustic al So ciety of Americ a , 95(1):331–342. (Cite d on p age 8 ) [93] Gil Carv a jal, J., Cubic k, J., Santurette, S., and Dau, T. (2016). Spatial hearing with incongruent visual or auditory ro om cues. Scientific R ep orts , 6. (Cite d on p ages 16 and 42 ) [94] Glasgal, R. (2001). the am biophone deriv ation of a recording methodology optimized for am biophonic repro duction. journal of the audio engine ering so ciety . (Cite d on p age 43 ) [95] Glasgal, R. and Y ates, K. (1995). Ambiophonics: Beyond Surr ound Sound to Virtual Sonic R e ality . Am biophonics Institute. (Cite d on p age 43 ) [96] Gomes, L., F onseca, W. D., de Carv alho, D. M. L., and Mareze, P . H. (2020). Rendering binaural signals for mo ving sources. In R epr o duc e d Sound 2020 . (Cite d on p age 20 ) BIBLIOGRAPHY 204 [97] Grimm, G., Ewert, S., and Hohmann, V. (2015a). Ev aluation of spatial audio repro duction schemes for application in hearing aid research. A cta A custic a unite d with A custic a , 101(4):842–854. (Cite d on p ages 18 , 21 , 31 , 41 , 49 , 53 , 94 , 117 , 121 , 160 , 163 , 177 , 187 , 190 and 191 ) [98] Grimm, G., Kollmeier, B., and Hohmann, V. (2016a). Spatial Acoustic Scenarios in Multic hannel Loudspeaker Systems for Hearing Aid Ev aluation. Journal of the A meric an A c ademy of Audiolo gy , 27(7):557–566. (Cite d on p ages 56 , 163 and 191 ) [99] Grimm, G., Kollmeier, B., and Hohmann, V. (2016b). Spatial Acoustic Scenarios in Multic hannel Loudspeaker Systems for Hearing Aid Ev aluation. Journal of the A meric an A c ademy of Audiolo gy . (Cite d on p age 191 ) [100] Grimm, G., Luberadzk a, J., Herzke, T., and Hohmann, V. (2015b). T o ol- b o x for acoustic scene creation and rendering (T ASCAR): Render metho ds and research applications. Pr o c e e dings of the Linux A udio Confer enc e . (Cite d on p age 42 ) [101] Grimm, G., Lub eradzk a, J., and Hohmann, V. (2018). Virtual acoustic en vironmen ts for comprehensive ev aluation of mo del-based hearing devices *. International Journal of Audiolo gy . (Cite d on p ages 42 and 120 ) [102] Grimm, G., Lub eradzk a, J., and Hohmann, V. (2019). A to olb o x for rendering virtual acoustic environmen ts in the context of audiology . A cta A custic a unite d with A custic a , 105:566–578. (Cite d on p ages 1 , 42 and 57 ) [103] Guastavino, C., Katz, B., P olack, J.-D., Levitin, D., and Dubois, D. (2004). Ecological v alidit y of soundscap e repro duction. A cta A custic a unite d with A custic a , 50. (Cite d on p ages 42 and 191 ) [104] Guastavino, C. and Katz, B. F. G. (2004). P erceptual ev aluation of m ulti-dimensional spatial audio repro duction. The Journal of the A c oustic al So ciety of Americ a , 116:1105–1115. 
(Cite d on p ages 57 , 86 and 122 ) [105] Guastavino, C., Larcher, V., Catusseau, G., and Boussard, P . (2007). Spatial audio quality ev aluation: comparing transaural, am bisonics and stereo. In Pr o c e e dings of the 13th International Confer enc e on Auditory Display. Montr´ eal Canada . Georgia Institute of T echnology . (Cite d on p ages 86 and 122 ) [106] Hacihabib oglu, H., De Sena, E., Cv etko vic, Z., Johnston, J., and Smith I I I, J. O. (2017). P erceptual spatial audio recording, simulation, and rendering: An ov erview of spatial-audio tec hniques based on psychoa- coustics. IEEE Signal Pr o c essing Magazine , 34(3):36–54. (Cite d on p ages 20 , 22 and 23 ) [107] Hamdan, E. C. and Fletcher, M. D. (2022). A compact tw o-loudsp eak er virtual sound reproduction system for clinical testing of spatial hearing with hearing-assistiv e devices. F r ontiers in Neur oscienc e , 15. (Cite d on p ages 42 , 47 and 49 ) BIBLIOGRAPHY 205 [108] Hammershøi, D. and Møller, H. (1992). F undamentals of binaural tech- nology . In F undamentals of Binaur al T e chnolo gy . (Cite d on p ages 12 and 16 ) [109] Hammershøi, D. and Møller, H. (2005). Binaural technique — basic metho ds for recording, syn thesis, and repro duction. In Blauert, J., edi- tor, Communic ation A c oustics , pages 223–254. Springer Berlin Heidelb erg, Berlin, Heidelb erg. (Cite d on p age 16 ) [110] Harris, P ., Nagy , S., and V ardaxis, N. (2018). Mosby’s Dictionary of Me dicine, Nursing and He alth Pr ofessions - R evise d 3r d Anz Edition . Else- vier Health Sciences Apac. (Cite d on p age 7 ) [111] Hav elo ck, D. I., Kuw ano, S., and V orlander, M. (2008). Handb o ok of signal pr o c essing in ac oustics . Springer, New Y ork. (Cite d on p age 98 ) [112] Hazrati, O. and Loizou, P . C. (2012). The combined effects of rev er- b eration and noise on sp eech in telligibility b y co c hlear implan t listeners. International Journal of A udiolo gy . (Cite d on p age 98 ) [113] He, J. (2016). Sp atial A udio R epr o duction with Primary Ambient Ex- tr action . SpringerBriefs in Electrical and Computer Engineering. Springer Singap ore. (Cite d on p age 18 ) [114] Heck er, S. (1984). Music for adv ertising effect. Psycholo gy & Marketing , 1(3-4):3–8. (Cite d on p age 8 ) [115] Hendrickx, E., Stitt, P ., Messonnier, J.-C., Lyzw a, J.-M., Katz, B. F., and de Boish´ eraud, C. (2017). Impro vemen t of externalization b y listener and source mov ement using a “binauralized” microphone arra y . Journal of the audio engine ering so ciety , 65(7/8):589–599. (Cite d on p age 23 ) [116] Hendrikse, M. M. E., Llorac h, G., Hohmann, V., and Grimm, G. (2019). Mo v emen t and gaze b eha vior in virtual audio visual listening environmen ts resem bling everyda y life. T r ends in He aring , 23. (Cite d on p ages 1 and 57 ) [117] Hiyama, K., Komiyama, S., and Hamasaki, K. (2002). The minimum n um b er of loudsp eak ers and its arrangement for reproducing the spatial impression of diffuse sound field. Journal of the Audio Engine ering So ciety . (Cite d on p age 41 ) [118] Hohmann, V., P aluch, R., Krueger, M., Meis, M., and Grimm, G. (2020). The virtual realit y lab: Realization and application of virtual sound envi- ronmen ts. Ear & He aring , 41:31S–38S. (Cite d on p ages 1 and 57 ) BIBLIOGRAPHY 206 [119] Holman, J. A., Drummond, A., and Na ylor, G. (2021). Hearing aids reduce daily-life fatigue and increase so cial activit y: a longitudinal study . me dRxiv . 
(Cite d on p ages 1 and 56 ) [120] Holub e, I., F redelak e, S., Vlaming, M., and Kollmeier, B. (2010). Devel- opmen t and analysis of an international sp eec h test signal (ists). Interna- tional Journal of A udiolo gy , 49(12):891–903. (Cite d on p age 131 ) [121] Holub e, I., Haeder, K., Imbery , C., and W eb er, R. (2016). Sub jective Listening Effort and Electro dermal Activity in Listening Situations with Rev erb eration and Noise. T r ends in he aring . (Cite d on p ages 99 and 118 ) [122] Hong, J. Y., He, J., Lam, B., Gupta, R., and Gan, W.-S. (2017). Spatial audio for soundscap e design: Recording and repro duction. Applie d Scienc es , 7(6). (Cite d on p age 18 ) [123] Hornsby , B. W. (2013). The effects of hearing aid use on listening effort and men tal fatigue asso ciated with sustained speech pro cessing demands. Ear and he aring , 34(5):523–534. (Cite d on p age 33 ) [124] How ard, D. and Angus, J. (2009). A c oustics and Psycho ac oustics 4th Edition . Oxford: F o cal Press, 4th edition. (Cite d on p age 73 ) [125] Huisman, T., Ahrens, A., and MacDonald, E. (2021). Ambisonics sound source lo calization with v arying amoun t of visual information in virtual realit y . F r ontiers in Virtual R e ality , 2. (Cite d on p ages 16 and 49 ) [126] International T elecommunications Union - Radio communications Sector (ITU-R) (2015). Metho ds for the sub jective assessmen t of small impair- men ts in audio systems. T ec hnical rep ort, In ternational T elecomm unications Union, Genev a. (Cite d on p ages 42 , 61 , 105 and 132 ) [127] ISO (2009). 3382-1: Acoustics - measuremen t of ro om acoustic parame- ters. part 1 : P erformance spaces. ISO 1:2009, ISO. (Cite d on p ages 33 , 34 , 35 and 70 ) [128] J¨ anck e, L. (2008). Music, memory and emotion. Journal of biolo gy , 7(6):1–5. (Cite d on p age 8 ) [129] Jin, C., Corderoy , A., Carlile, S., and v an Schaik, A. (2004). Contrasting monaural and interaural sp ectral cues for h uman sound lo calization. The Journal of the A c oustic al So ciety of Americ a , 115(6):3124–3141. (Cite d on p age 11 ) [130] Jot, J.-M., W ardle, S., and Larcher, V. (1998). approaches to binaural syn thesis. Journal of the Audio Engine ering So ciety . (Cite d on p age 27 ) BIBLIOGRAPHY 207 [131] Kang, S. and Kim, S.-H. K. (1996). Realistic audio teleconferencing using binaural and auralization techniques. Etri Journal , 18:41–51. (Cite d on p age 23 ) [132] Katz, B. F. G. and Noisternig, M. (2014). A comparative study of in ter- aural time delay estimation metho ds. The Journal of the A c oustic al So ciety of Americ a , 135(6):3530–3540. (Cite d on p age 70 ) [133] Keet, V. (1968). The influence of early lateral reflections on the spatial impression. Pr o c. 6th Int. Cong. A c oust., T okyo , 2. (Cite d on p age 39 ) [134] Keidser, G., Na ylor, G., Brungart, D. S., Caduff, A., Camp os, J., Carlile, S., Carp enter, M. G., Grimm, G., Hohmann, V., Holub e, I., Launer, S., Lunner, T., Mehra, R., Rapp ort, F., Slaney , M., and Smeds, K. (2020). The quest for ecological v alidity in hearing science: what it is, why it matters, and how to adv ance it. Ear and He aring , 41(S1):5S–19S. (Cite d on p ages 53 , 56 , 98 , 122 and 191 ) [135] Kestens, K., Degeest, S., and Keppler, H. (2021). The effect of cognition on the aided b enefit in terms of sp eech understanding and listening effort obtained with digital hearing aids: A systematic review. A meric an Journal of Audiolo gy , 30(1):190–210. 
(Cite d on p age 98 ) [136] Kirsch, C., Poppitz, J., W endt, T., v an de Par, S., and Ew ert, S. D. (2021). Computationally efficient spatial rendering of late rev erb eration in virtual acoustic environmen ts. 2021 Immersive and 3D A udio: fr om A r chite ctur e to Automotive (I3DA) . (Cite d on p age 42 ) [137] Klatte, M., Lachmann, T., Meis, M., et al. (2010). Effects of noise and rev erb eration on sp eec h p erception and listening comprehension of children and adults in a classro om-lik e setting. Noise and He alth , 12(49):270. (Cite d on p ages 1 and 191 ) [138] Kleiner, M., Dalenb¨ ack, B.-I., and Svensson, P . (1993). Auralization-an o v erview. Journal of the Audio Engine ering So ciety , 41(11):861–875. (Cite d on p age 19 ) [139] Klemenz, M. (2005). Sound synthesis of starting electric railb ound v ehi- cles and the influence of consonance on sound quality . A cta acustic a unite d with acustic a , 91(4):779–788. (Cite d on p age 20 ) [140] Klo ckgether, S. and v an de Par, S. (2016). Just noticeable differences of spatial cues in ec hoic and anec hoic acoustical environmen ts. The Journal of the A c oustic al So ciety of Americ a , 140(4):EL352–EL357. (Cite d on p ages 93 and 94 ) [141] Kobay ashi, M., Ueno, K., and Ise, S. (2015). The Effects of Spatial- ized Sounds on the Sense of Presence in Auditory Virtual Environmen ts: A Psyc hological and Physiological Study. Pr esenc e: T ele op er ators and Virtual Envir onments , 24(2):163–174. (Cite d on p age 16 ) BIBLIOGRAPHY 208 [142] Ko ehnke, J. and Besing, J. (1996). A procedure for testing sp eech intelli- gibilit y in a virtual listening environmen t. Ear and He aring , 17(3):211–217. cited By 59. (Cite d on p age 40 ) [143] Ko elewijn, T., Zekveld, A. A., F esten, J. M., and Kramer, S. E. (2012). Pupil dilation uncov ers extra listening effort in the presence of a single-talker mask er. Ear and He aring , 33(2):291–300. (Cite d on p age 190 ) [144] Kramer, S. E., Bh uiyan, T., Bramsløw, L., Fiedler, L., Gra versen, C., Hadley , L. V., Innes-Brown, H., Naylor, G., Rich ter, M., Saunders, G. H., V ersfeld, N. J., W endt, D., Whitmer, W. M., and Zekveld, A. A. (2020). Inno v ativ e hearing aid research on ecological conditions and outcome mea- sures: The hear-eco pro ject. (Cite d on p age 191 ) [145] Kramer, S. E., Kapteyn, T. S., F esten, J. M., and T obi, H. (1996). The relationships b et w een self-rep orted hearing disability and measures of auditory disability . Audiolo gy , 35(5):277–287. (Cite d on p age 56 ) [146] Krokstad, A., Strom, S., and Sørsdal, S. (1968). Calculating the acous- tical ro om resp onse by the use of a ra y tracing technique. Journal of Sound and Vibr ation , 8(1):118–125. (Cite d on p age 19 ) [147] Krueger, M., Sch ulte, M., Brand, T., and Holub e, I. (2017). Developmen t of an adaptive scaling metho d for sub jective listening effort. The Journal of the A c oustic al So ciety of Americ a . (Cite d on p age 50 ) [148] Kuttruff, H. (2009). R o om A c oustics, Fifth Edition . T a ylor & F rancis. (Cite d on p ages 19 , 34 , 68 and 127 ) [149] Kwak, C., Han, W., Lee, J., Kim, J., and Kim, S. (2018). Effect of noise and rev erb eration on sp eec h recognition and listening effort for older adults. Geriatrics and Ger ontolo gy International . (Cite d on p ages 50 , 99 and 118 ) [150] Laitinen, M.-V. and Pulkki, V. (2009). Binaural repro duction for di- rectional audio co ding. In 2009 IEEE Workshop on Applic ations of Signal Pr o c essing to A udio and A c oustics , pages 337–340. 
(Cite d on p age 41 ) [151] Lau, M. K., Hicks, C., Kroll, T., and Zupancic, S. (2019). Effect of auditory task type on ph ysiological and sub jective measures of listening effort in individuals with normal hearing. Journal of Sp e e ch, L anguage, and He aring R ese ar ch . (Cite d on p ages 50 and 52 ) [152] Lau, S.-T., Pic hora-F uller, M., Li, K., Singh, G., and Camp os, J. (2016). Effects of hearing loss on dual-task p erformance in an audio visual virtual re- alit y simulation of listening while walking. Journal of the A meric an A c ademy of Audiolo gy , 27. (Cite d on p age 57 ) BIBLIOGRAPHY 209 [153] Letowski, T. and Letowski, S. (2011). Lo calization error accuracy and precision of auditory lo calization. In Strumillo, P ., editor, A dvanc es in Sound L o c alization , chapter 4, pages 55–78. In tec h, Oxford. (Cite d on p ages 9 , 10 and 17 ) [154] Levy , S. M. (2012). Section 9 - calculations to determine the effec- tiv eness and con trol of thermal and sound transmission. In Levy , S. M., editor, Construction Calculations Manual , pages 503–544. Butterw orth- Heinemann, Boston. (Cite d on p age 60 ) [155] Lindau, A. and Brinkmann, F. (2012). p erceptual ev aluation of head- phone comp ensation in binaural synthesis based on non-individual record- ings. journal of the audio engine ering so ciety , 60(1/2):54–62. (Cite d on p age 14 ) [156] Lindau, A., Kosank e, L., and W einzierl, S. (2010). p erceptual ev aluation of ph ysical predictors of the mixing time in binaural room impulse responses. Journal of the A udio Engine ering So ciety . (Cite d on p age 126 ) [157] Lindemann, W. (1986). Extension of a binaural cross-correlation mo del b y contralateral inhibition. i. simulation of lateralization for stationary sig- nals. The Journal of the A c oustic al So ciety of Americ a , 80 6:1608–22. (Cite d on p age 45 ) [158] Liu, Z., F ard, M., and Jazar, R. (2015). Developmen t of an acoustic material database for vehicle in terior trims. T ec hnical report, SAE T ec hnical P ap er. (Cite d on p age 21 ) [159] Llopis, H. S., Pind, F., and Jeong, C.-H. (2020). Dev elopmen t of an auditory virtual realit y system based on pre-computed b-format im- pulse resp onses for building design ev aluation. Building and Envir onment , 169:106553. (Cite d on p age 133 ) [160] Llorach, G., Ev ans, A., Blat, J., Grimm, G., and Hohmann, V. (2016). W eb-based liv e speech-driv en lip-sync. In 2016 8th International Confer enc e on Games and Virtual Worlds for Serious Applic ations (VS-GAMES) , pages 1–4. (Cite d on p ages 1 and 57 ) [161] Llorach, G., Grimm, G., Hendrikse, M. M., and Hohmann, V. (2018). T ow ards Realistic Immersiv e Audiovisual Sim ulations for Hearing Research. In Pr o c e e dings of the 2018 Workshop on Audio-Visual Sc ene Understanding for Immersive Multime dia , pages 33–40. (Cite d on p ages 1 , 18 , 27 , 40 , 53 , 56 , 57 , 58 and 191 ) [162] Llorca-Bof ´ ı, J., Dreier, C., Heck, J., and V orl¨ ander, M. (2022). Urban sound auralization and visualization framew ork;case study at ih tapark. Sus- tainability , 14(4). (Cite d on p age 20 ) BIBLIOGRAPHY 210 [163] Loizou, P . C. (2007). Sp e e ch enhanc ement: the ory and pr actic e . CRC press. (Cite d on p age 52 ) [164] Lokki, T. and Savio ja, L. (2008). Virtual acoustics. In Ha velock, D., Ku w ano, S., and V orl¨ ander, M., editors, Handb o ok of Signal Pr o c essing in A c oustics , pages 761–771. Springer New Y ork, New Y ork, NY. (Cite d on p age 20 ) [165] Long, M. (2014). A r chite ctur al A c oustics . Elsevier Science. 
(Cite d on p ages 16 , 19 and 42 ) [166] Lop ez, J. J., Gutierrez, P ., Cob os, M., and Aguilera, E. (2014). Sound distance p erception comparison b etw een W a v e Field Synthesis and V ector Base Amplitude Panning. In ISCCSP 2014 - 2014 6th International Sym- p osium on Communic ations, Contr ol and Signal Pr o c essing, Pr o c e e dings . (Cite d on p ages 21 and 121 ) [167] Lov edee-T urner, M. and Murph y , D. (2018). Application of mac hine learning for the spatial analysis of binaural ro om impulse resp onses. Applie d Scienc es , 8(1). (Cite d on p age 15 ) [168] Lund, K. D., Ahrens, A., and Dau, T. (2020). A metho d for ev alu- ating audio-visual scene analysis in multi-talk er en vironmen ts. In Pr o c e e d- ings of the International Symp osium on Auditory and Audiolo gic al R ese ar ch , v olume 7, pages 357–364. The Danav ox Jubilee F oundation. International Symp osium on Auditory and Audiological Research ISAAR2019. (Cite d on p age 42 ) [169] Lundb eck, M., Grimm, G., Hohmann, V., Laugesen, S., and Neher, T. (2017). Sensitivity to Angular and Radial Source Mo v emen ts as a F unction of Acoustic Complexity in Normal and Impaired Hearing. T r ends in He aring , 21:2331–2165. (Cite d on p ages 33 and 57 ) [170] Lyon, R. F. (2017). Human and Machine He aring Extr acting Me aning fr om Sound . a. Cam bridge Universit y Press. (Cite d on p age 10 ) [171] Magezi, D. A. (2015). Linear mixed-effects models for within-participan t psyc hology exp eriments: an introductory tutorial and free, graphical user in terface (lmmgui). F r ontiers in Psycholo gy , 6:2. (Cite d on p age 113 ) [172] Malham, D. G. and Myatt, A. (1995). 3-d sound spatialization using am bisonic techniques. Computer Music Journal , 19(4):58–70. (Cite d on p age 29 ) [173] Mansour, N., Marschall, M., May , T., W estermann, A., and Dau, T. (2021a). Sp eech intelligibilit y in a realistic virtual sound environmen t. The Journal of the A c oustic al So ciety of Americ a , 149(4):2791–2801. (Cite d on p age 99 ) BIBLIOGRAPHY 211 [174] Mansour, N., W estermann, A., Marsc hall, M., Ma y , T., Dau, T., and Buc hholz, J. (2021b). Guided ecological momentary assessmen t in real and virtual sound en vironments. A c oustic al So ciety of A meric a. Journal , 150(4):2695–2704. (Cite d on p ages 16 and 42 ) [175] Marentakis, G., Zotter, F., and F rank, M. (2014). V ector-base and am- bisonic amplitude panning: A comparison using p op, classical, and contem- p orary spatial music. A cta A custic a unite d with A custic a . (Cite d on p ages 57 and 86 ) [176] Marrone, N., Mason, C. R., and Kidd, G. (2008). The effects of hearing loss and age on the b enefit of spatial separation b etw een multiple talk ers in reverberant ro oms. The Journal of the A c oustic al So ciety of A meric a , 124(5):3064–3075. (Cite d on p ages 16 and 40 ) [177] Marschall, M. (2014). Capturing and repro ducing realistic acoustic scenes for hearing research. PhD Thesis - T e chnic al University of Denmark . (Cite d on p ages 40 , 53 , 99 and 120 ) [178] Masiero, B. (2012). Individualize d Binaur al T e chnolo gy. Me asur ement, Equalization and Per c eptual Evaluation . PhD thesis, R WTH Aac hen Uni- v ersit y . (Cite d on p age 14 ) [179] Masiero, B. and F els, J. (2011). Perceptually robust headphone equal- ization for binaural repro duction. In Audio Engine ering So ciety Convention 130 . Audio Engineering So ciet y . (Cite d on p age 22 ) [180] Masiero, B. and V orlaender, M. (2011). Spatial Audio Repro duction Metho ds for Virtual Realit y. 
In 42 º Congr eso Esp a ˜ nol de A c ´ ustic a Encuentr o Ib ´ eric o de A c ´ ustic a - Eur op e an Symp osium on Envir onmental A c oustics and on Buildings A c oustic al ly Sustainable , pages 1–12, C´ aceres. (Cite d on p ages 23 , 24 , 58 and 86 ) [181] Matthen, M. (2016). Effort and displeasure in p eople who are hard of hearing. Ear and He aring , 37 Suppl 1. (Cite d on p age 57 ) [182] May , T., v an de P ar, S., and Kohlrausch, A. (2011). A probabilistic mo del for robust lo calization based on a binaural auditory fron t-end. IEEE T r ansactions on A udio, Sp e e ch, and L anguage Pr o c essing , 19(1):1–13. (Cite d on p ages xix , 6 , 138 , 150 , 156 , 166 , 167 , 174 and 178 ) [183] Meesaw at, K. and Hammershoi, D. (2003). The time when the reverber- ation tail in a binaural ro om impulse resp onse b egins. In Audio Engine ering So ciety Convention 115 . Audio Engineering So ciety . (Cite d on p age 15 ) [184] Menase, D. A., Ric hter, M., W endt, D., Fiedler, L., and Naylor, G. (2022). T ask-induced men tal fatigue and motiv ation influence listening effort as measured by the pupil dilation in a sp eech-in-noise task. me dRxiv . (Cite d on p age 33 ) BIBLIOGRAPHY 212 [185] Michael, V. and V orl¨ ander, M. (2008). A ur alization. F undamentals of A c oustics, Mo del ling, Simulation, Algorithms and A c oustic Virtual R e ality . Springer. (Cite d on p age 98 ) [186] Miles, K., McMahon, C., Boisvert, I., Ibrahim, R., de Lissa, P ., Gra- ham, P ., and Lyxell, B. (2017). Ob jective Assessmen t of Listening Effort: Coregistration of Pupillometry and EEG. T r ends in He aring . (Cite d on p ages 50 , 51 and 118 ) [187] Millington, G. (1932). A mo dified formula for reverberation. The Journal of the A c oustic al So ciety of A meric a , 4(1A):69–82. (Cite d on p age 34 ) [188] Minnaar, P ., F avrot, S., and Buc hholz, J. (2010). Improving hearing aids through listening tests in a virtual sound en vironment. He aring Journal , 63(10):40–44. (Cite d on p ages 1 , 16 , 42 , 121 and 191 ) [189] Møller, H., Sørensen, M. F., Hammershøi, D., and Jensen, C. B. (1995). Head-related transfer functions of h uman sub jects. Journal of the A udio Engine ering So ciety , 43(5):300–321. (Cite d on p age 12 ) [190] Monaghan, J. J., Krumbholz, K., and Seeber, B. U. (2013). F actors affecting the use of env elop e in teraural time differences in rev erb erationa). The Journal of the A c oustic al So ciety of A meric a , 133(4):2288–2300. bibtex: Monaghan2013. (Cite d on p age 33 ) [191] Mo ore, B. C. J. and T an, C.-T. (2004). dev elopmen t and v alidation of a metho d for predicting the perceived naturalness of sounds sub jected to sp ectral distortion. Journal of the Audio Engine ering So ciety , 52(9):900– 914. (Cite d on p age 41 ) [192] Mo ore, T. M. and Picou, E. M. (2018). A p otential bias in sub jective ratings of men tal effort. Journal of Sp e e ch, L anguage, and He aring R ese ar ch . (Cite d on p ages 50 , 51 and 118 ) [193] Mueller, M. F., Kegel, A., Sc himmel, S. M., Dillier, N., and Hofbauer, M. (2012). Localization of virtual sound sources with bilateral hearing aids in realistic acoustical scenes. The Journal of the A c oustic al So ciety of A meric a , 131(6):4732–4742. (Cite d on p age 16 ) [194] M ¨ uller, S. and Massarani, P . (2001). T ransfer-function measurement with sweeps. Journal of the A udio Engine ering So ciety , 49:443–471. (Cite d on p ages 63 and 140 ) [195] Murta, B. (2019). 
Plataforma p ar a ensaios de p er c ep¸ c˜ ao sonor a c om fontes distribu ´ ıdas aplic´ avel a disp ositivos auditivos: p erSONA (in Por- tuguese) . PhD thesis, F ederal Universit y of Santa Catarina. (Cite d on p ages 1 and 57 ) BIBLIOGRAPHY 213 [196] Murta, B., Chiea, R., Mour˜ ao, G., Pinheiro, M. M., Cordioli, J., P aul, S., and Costa, M. (2019). Cci-mobile: Dev elopment of soft w are based to ols for sp eec h p erception assessment and training with hearing impaired brazil- ian population. In CONFERENCE on Implantable A uditory Pr ostheses (CIAP), L ake T aho e, California, US . (Cite d on p age 18 ) [197] Møller, H. (1992). F undamentals of binaural tec hnology . Applie d A c ous- tics , 36(3-4):171–218. (Cite d on p ages 15 and 73 ) [198] Nach bar, C., Zotter, F., Deleflie, E., and Sontacc hi, A. (2011). Ambix – a suggested ambisonics format. (Cite d on p age 103 ) [199] Narbutt, M., Allen, A., Sk oglund, J., Chinen, M., and Hines, A. (2018). Am biqual - a full reference ob jective qualit y metric for ambisonic spatial audio. In 2018 T enth International Confer enc e on Quality of Multime dia Exp erienc e (QoMEX) , pages 1–6. (Cite d on p age 27 ) [200] Naugolnykh, K. A., Ostro vsky , L. A., Sap ozhnik ov, O. A., and Hamilton, M. F. (2000). Nonlinear wa ve pro cesses in acoustics. (Cite d on p age 9 ) [201] Naylor, G. M. (1993). Odeon—another h ybrid ro om acoustical mo del. Applie d A c oustics , 38(2-4):131–143. (Cite d on p age 68 ) [202] Neuhoff, J. (2021). Ec olo gic al psycho ac oustics . Brill. (Cite d on p age 190 ) [203] Neuman, A. C., W roblewski, M., Ha jicek, J., and Rubinstein, A. (2010). Com bined effects of noise and reverberation on sp eec h recognition p erfor- mance of normal-hearing c hildren and adults. Ear and He aring . (Cite d on p ages 99 and 118 ) [204] Nicola, P . and Chiara, V. (2019). Impact of Background Noise Fluc- tuation and Rev erb eration on Resp onse Time in a Sp eec h Reception T ask. Journal of Sp e e ch, L anguage, and He aring R ese ar ch , 62(11):4179–4195. (Cite d on p ages 50 , 99 and 118 ) [205] Nielsen, J. and Dau, T. (2011). The danish hearing in noise test. Inter- national journal of audiolo gy , 50:202–8. (Cite d on p ages 101 and 102 ) [206] No ck e, C. and Mellert, V. (2002). Brief review on in situ measurement tec hniques of imp edance or absorption. In F orum A custicum, Sevil la . (Cite d on p age 21 ) [207] Nov o, P . (2005). Auditory virtual en vironmen ts. In Blauert, J., edi- tor, Communic ation A c oustics , pages 277–297. Springer Berlin Heidelb erg, Berlin, Heidelb erg. (Cite d on p age 57 ) BIBLIOGRAPHY 214 [208] Obleser, J., W¨ ostmann, M., Hellb ernd, N., Wilsch, A., and Maess, B. (2012). Adverse listening conditions and memory load drive a common alpha oscillatory netw ork. Journal of Neur oscienc e , 32(36):12376–12383. (Cite d on p age 111 ) [209] Ohlenforst, B., W endt, D., Kramer, S. E., Naylor, G., Zekv eld, A. A., and Lunner, T. (2018). Impact of SNR, mask er type and noise reduction pro cessing on sen tence recognition p erformance and listening effort as indi- cated by the pupil dilation resp onse. He aring R ese ar ch . (Cite d on p ages 50 and 101 ) [210] Ohlenforst, B., Zekv eld, A. A., Jansma, E. P ., W ang, Y., Na ylor, G., Lorens, A., Lunner, T., and Kramer, S. E. (2017a). Effects of hearing impairmen t and hearing aid amplification on listening effort: A systematic review. Ear and he aring , 38(3):267—281. (Cite d on p age 98 ) [211] Ohlenforst, B., Zekveld, A. 
A., Lunner, T., W endt, D., Na ylor, G., W ang, Y., V ersfeld, N. J., and Kramer, S. E. (2017b). Impact of stimulus-related factors and hearing impairment on listening effort as indicated by pupil dilation. He aring R ese ar ch , 351:68–79. (Cite d on p age 50 ) [212] Oreinos, C. and Buc hholz, J. (2014). V alidation of realistic acoustic en vironmen ts for listening tests using directional hearing aids. In 2014 14th International Workshop on A c oustic Signal Enhanc ement (IW AENC) , pages 188–192. (Cite d on p ages 41 and 120 ) [213] Oreinos, C. and Buchholz, J. M. (2015). Ob jectiv e analysis of am bisonics for hearing aid applications: Effect of listener’s head, ro om reverberation, and directional microphones. The Journal of the A c oustic al So ciety of A mer- ic a . (Cite d on p ages 18 , 41 , 53 , 163 and 191 ) [214] Palacino, J., Nicol, R., Emerit, M., and Gros, L. (2012). P erceptual assessmen t of binaural deco ding of first-order am bisonics. In A c oustics 2012 . (Cite d on p age 21 ) [215] Parsehian, G., Gandemer, L., Bourdin, C., and Kronland Martinet, R. (2015). Design and p erceptual ev aluation of a fully immersiv e three- dimensional sound spatialization system. In 3r d International Confer enc e on Sp atial Audio (ICSA 2015) , Graz, Austria. (Cite d on p age 42 ) [216] Paul, S. (13-15 maio 2014). A fisiologia da audi¸ c˜ ao como base para fenˆ omenos auditiv os. In Pr o c e e dings of the 12th AES Br azil Confer enc e , S˜ ao Paulo, SP . (Cite d on p age 9 ) [217] Pausc h, F., Asp¨ oc k, L., V orl¨ ander, M., and F els, J. (2018). An Ex- tended Binaural Real-Time Auralization System With an Interface to Re- searc h Hearing Aids for Exp eriments on Sub jects With Hearing Loss. T r ends in He aring . (Cite d on p ages 16 , 44 , 45 , 120 and 121 ) BIBLIOGRAPHY 215 [218] Pausc h, F., Behler, G., and F els, J. (2020). Scalar - a surrounding spher- ical cap loudsp eak er arra y for flexible generation and ev aluation of virtual acoustic environmen ts. A cta A cust. , 4(5):19. (Cite d on p ages 1 and 57 ) [219] Pausc h, F. and F els, J. (2019). Mobilab – a mobile lab oratory for on-site listening exp erimen ts in virtual acoustic environmen ts. bioRxiv . (Cite d on p ages 1 and 57 ) [220] Pausc h, F. and F els, J. (2020). Lo calization p erformance in a binaural real-time auralization system extended to researc h hearing aids. T r ends in he aring , 24:1–18. (Cite d on p ages 1 , 42 and 57 ) [221] Pelzer, S., Masiero, B., and V orl¨ ander, M. (2014). 3D Repro duction of Ro om Auralizations b y Com bining Intensit y Panning, Crosstalk Cancella- tion and Ambisonics. Pr o c e e dings of the EAA Joint Symp osium on Aur al- ization and Ambisonics . (Cite d on p ages 44 , 45 , 86 and 127 ) [222] Peng, Z. E. and Litovsky , R. Y. (2021). The role of in teraural differences, head shado w, and binaural redundancy in binaural intelligibilit y b enefits among school-aged children. T r ends in He aring , 25. (Cite d on p age 77 ) [223] Petersen, E. B., W¨ ostmann, M., Obleser, J., Stenfelt, S., and Lunner, T. (2015). Hearing loss impacts neural alpha oscillations under adverse listening conditions. F r ontiers in Psycholo gy . (Cite d on p age 50 ) [224] Pichora-F uller, M. K., Kramer, S. E., Eck ert, M. A., Edwards, B., Hornsb y , B. W., Humes, L. E., Lemke, U., Lunner, T., Matthen, M., Mack- ersie, C. L., Naylor, G., Phillips, N. A., Rich ter, M., Rudner, M., Sommers, M. S., T rembla y , K. L., and Wingfield, A. (2016). 
Hearing impairment and cognitiv e energy: The framework for understanding effortful listening (FUEL). In Ear and He aring . (Cite d on p ages 50 , 51 , 52 , 57 and 118 ) [225] Picou, E. M., Gordon, J., and Ric ketts, T. A. (2016). The effects of noise and rev erb eration on listening effort in adults with normal hearing. Ear and He aring . (Cite d on p ages 50 and 99 ) [226] Picou, E. M., Mo ore, T. M., and Rick etts, T. A. (2017). The effects of directional pro cessing on ob jectiv e and sub jectiv e listening effort. Journal of Sp e e ch, L anguage, and He aring R ese ar ch . (Cite d on p ages 1 and 51 ) [227] Picou, E. M., Rick etts, T., and Hornsb y , B. (2013). Ho w hearing aids, bac kground noise, and visual cues influence ob jectiv e listening effort. Ear and He aring , 34:e52–e64. (Cite d on p ages 50 and 56 ) BIBLIOGRAPHY 216 [228] Picou, E. M. and Rick etts, T. A. (2014). Increasing motiv ation changes sub jectiv e rep orts of listening effort and c hoice of coping strategy . Interna- tional Journal of A udiolo gy , 53(6):418–426. (Cite d on p age 50 ) [229] Picou, E. M. and Rick etts, T. A. (2018). The relationship b etw een sp eech recognition, b ehavioural listening effort, and sub jective ratings. Interna- tional Journal of A udiolo gy . (Cite d on p ages 51 and 118 ) [230] Pielage, H., Zekveld, A. A., Saunders, G. H., V ersfeld, N. J., Lunner, T., and Kramer, S. E. (2021). The Presence of Another Individual Influences Listening Effort, But Not Performance. Ear & He aring . (Cite d on p ages 40 , 57 , 82 and 190 ) [231] Pieren, R. (2018). Aur alization of Envir onmental A c oustic al Sc eneries: Synthesis of R o ad T r affic, R ailway and Wind T urbine Noise . PhD thesis, Delft Universit y of T echnology . (Cite d on p age 20 ) [232] Pieren, R., Heutsc hi, K., W underli, J. M., Snellen, M., and Simons, D. G. (2017). Auralization of railwa y noise: Emission synthesis of rolling and impact noise. Applie d A c oustics , 127:34–45. (Cite d on p age 20 ) [233] Pinheiro, J. C. and Bates, D. M. (2000). Linear mixed-effects mo dels: basic concepts and examples. Mixe d-effe cts mo dels in S and S-Plus , pages 3–56. (Cite d on p age 113 ) [234] Plain, B., Pielage, H., Rich ter, M., Bhuiy an, T., Lunner, T., Kramer, S., and Zekveld, A. (2021). Social observ ation increases the cardio v ascular re- sp onse of hearing-impaired listeners during a sp eech reception task. He aring R ese ar ch , page 108334. (Cite d on p ages 57 and 190 ) [235] Plinge, A., Schlec ht, S. J., Thiergart, O., Rob otham, T., Rumm uk ainen, O., and Hab ets, E. A. P . (2018). six-degrees-of-freedom binaural audio repro duction of first-order am bisonics with distance information. Journal of the audio engine ering so ciety . (Cite d on p age 27 ) [236] Poletti, M. A. (2005). Three-dimensional surround sound systems based on spherical harmonics. journal of the audio engine ering so ciety , 53(11):1004–1025. (Cite d on p age 31 ) [237] Politis, A. (2016). Micr ophone arr ay pr o c essing for p ar ametric sp atial audio te chniques . Do ctoral thesis, School of Electrical Engineering. (Cite d on p ages 130 , 131 and 181 ) [238] Politis, A., McCormac k, L., and Pulkki, V. (2017). Enhancement of am bisonic binaural repro duction using directional audio co ding with opti- mal adaptiv e mixing. In 2017 IEEE Workshop on Applic ations of Signal Pr o c essing to A udio and A c oustics (W ASP AA) , pages 379–383. (Cite d on p age 27 ) BIBLIOGRAPHY 217 [239] Pollo w, M. (2015). 
Dir e ctivity Patterns for R o om A c oustic al Me asur e- ments and Simulations . Aac hener Beitr¨ age zur T echnisc hen Akustik. Logos V erlag Berlin GmbH. (Cite d on p ages 28 and 29 ) [240] Portela, M. S. (2008). Caracteriza¸ c˜ ao de fontes sonoras e aplica¸ c˜ ao na auraliza¸ c˜ ao de ambien tes. Mestrado, Universidade F ederal de Santa Cata- rina. (Cite d on p age 13 ) [241] Pulkki, V. (1997). Virtual sound source p ositioning using v ector base amplitude panning. Journal of the Audio Engine ering So ciety , 45(6). (Cite d on p ages 23 , 24 , 26 , 58 , 93 , 142 , 179 and 190 ) [242] Pulkki, V. and Karjalainen, M. (2015). Communic ation A c oustics: A n Intr o duction to Sp e e ch, Audio and Psycho ac oustics . Wiley . (Cite d on p ages 10 , 17 , 70 , 73 and 143 ) [243] Pulkki, V., Politis, A., Laitinen, M.-V., Vilk amo, J., and Ahonen, J. (2017). First-order directional audio co ding (dirac). In Par ametric Time- F r e quency Domain Sp atial Audio , c hapter 5, pages 89–140. John Wiley & Sons, Ltd. (Cite d on p ages 44 , 45 and 127 ) [244] Purdy , M. (1991). Listening and comm unit y: The role of listening in comm unit y formation. International Journal of Listening , 5(1):51–67. (Cite d on p age 8 ) [245] Queiroz, M., Iazzetta, F., Kon, F., Gomes, M. H. A., Figueiredo, F. L., Masiero, B. S., Dias, L., T orres, M. H. C., and Thomaz, L. F. (2008). Acm us: an op en, integrated platform for ro om acoustics research. J. Br az. Comput. So c. , 14(3):87–103. (Cite d on p age 36 ) [246] Rayleigh, L. (1907). Xii. on our p erception of sound direction. (Cite d on p age 10 ) [247] Reichardt, W., Alim, O. A., and Schmidt, W. (1975). Definition and basis of making an ob jective ev aluation to distinguish b et w een useful and useless clarit y defining m usical p erformances. A cta A custic a unite d with A custic a , 32(3):126–137. (Cite d on p age 36 ) [248] Rennies, J., Brand, T., and Kollmeier, B. (2011). Prediction of the influence of reverberation on binaural sp eec h intelligibilit y in noise and in quiet. The Journal of the A c oustic al So ciety of Americ a , 130(5):2999–3012. (Cite d on p age 40 ) [249] Rennies, J., Sc hepk er, H., Holub e, I., and Kollmeier, B. (2014). Listening effort and sp eec h in telligibility in listening situations affected by noise and rev erb eration. The Journal of the A c oustic al So ciety of Americ a . (Cite d on p age 50 ) BIBLIOGRAPHY 218 [250] Roffler, S. K. and Butler, R. A. (1968). F actors that influence the lo cal- ization of sound in the vertical plane. The Journal of the A c oustic al So ciety of Americ a , 43(6):1255–1259. (Cite d on p ages 10 and 33 ) [251] Roginsk a, A. (2017). Binaural audio through headphones. In Immersive Sound , pages 88–123. Routledge. (Cite d on p ages 40 and 53 ) [252] Romanov, M., Berghold, P ., F rank, M., Rudric h, D., Zaunsc hirm, M., and Zotter, F. (2017). Implementation and ev aluation of a low-cost head- trac k er for binaural syn thesis. Journal of the audio engine ering so ciety . (Cite d on p age 23 ) [253] Rose, J., Nelson, P ., Rafaely , B., and T akeuc hi, T. (2002). Sweet sp ot size of virtual acoustic imaging systems at asymmetric listener lo cations. The Journal of the A c oustic al So ciety of Americ a , 112(5):1992–2002. (Cite d on p ages 31 and 121 ) [254] Rossing, T. D. (2007). Springer Handb o ok of A c oustics . Springer Hand- b o ok of Acoustics. Springer-V erlag Berlin Heidelb erg, Stanford, CA, 2 edi- tion. (Cite d on p ages 16 , 19 , 33 , 36 , 38 and 98 ) [255] Rudenko, O. and Soluian, S. 
(1975). The theoretical principles of non- linear acoustics. Mosc ow Izdatel Nauka . (Cite d on p age 9 ) [256] Rumsey , F. (2013). Sp atial Audio . F o cal Press, Burlington, MA, 2 edi- tion. (Cite d on p ages 30 and 98 ) [257] Ruotolo, F., Maffei, L., Di Gabriele, M., Iachini, T., Masullo, M., Rug- giero, G., and Senese, V. P . (2013). Immersiv e virtual realit y and environ- men tal noise assessment: An inno v ative audio–visual approach. Envir on- mental Imp act Assessment R eview , 41:10–20. (Cite d on p age 16 ) [258] Sabine, W. (1922). Col le cte d Pap ers on A c oustics . Harv ard Univ ersity Press. (Cite d on p age 34 ) [259] Savio ja, L., Huopaniemi, J., Lokki, T., and V aananen, R. (1999). Cre- ating in teractiv e virtual acoustic environmen ts. Journal of the Audio Engi- ne ering So ciety , 47:675–705. (Cite d on p ages 1 and 57 ) [260] Schepk er, H., Haeder, K., Rennies, J., and Holub e, I. (2016). P er- ceiv ed listening effort and sp eec h in telligibilit y in rev erb eration and noise for hearing-impaired listeners. International Journal of A udiolo gy . (Cite d on p ages 1 and 50 ) [261] Schr¨ oder, D. (2011). Physic al ly Base d R e al-Time A ur alization of Inter- active Virtual Envir onments . Aac hener Beitr¨ age zur T echnisc hen Akustik. Logos V erlag Berlin. (Cite d on p age 28 ) BIBLIOGRAPHY 219 [262] Schroeder, M. and Atal, B. (1963). Computer sim ulation of sound trans- mission in ro oms. Pr o c e e dings of the IEEE , 51(3):536–537. (Cite d on p age 22 ) [263] Schroeder, M., Atal, B., and Bird, C. (1962). Digital computers in ro om acoustics. Pr o c. 4th ICA, Cop enhagen M , 21. (Cite d on p age 19 ) [264] Schroeder, M. R. (1965). New metho d of measuring rev erb eration time. The Journal of the A c oustic al So ciety of Americ a , 37(3):409–412. (Cite d on p age 144 ) [265] Schroeder, M. R. (1979). In tegrated impulse metho d measuring sound deca y without using impulses. The Journal of the A c oustic al So ciety of A meric a , 66(2):497–500. (Cite d on p age 144 ) [266] Schr¨ oder, D., Pohl, A., Drechsler, S., Svensson, U. P ., V orl¨ ander, M., and Stephenson, U. M. (2013). op enmat - managemen t of acoustic material (meta-)prop erties using an op en source database format. In Pr o c e e dings of the AIA-DA GA 2013 . (Cite d on p age 21 ) [267] Schr¨ oder, D., W efers, F., Pelzer, S., Rausc h, D., V orlaender, M., and Kuhlen, T. (2010). Virtual realit y system at rwth aac hen univ ersit y . In Pr o c e e dings of the International Symp osium on R o om A c oustics (ISRA) . (Cite d on p age 56 ) [268] Seeb er, B. U., Baumann, U., and F astl, H. (2004). Localization ability with bimo dal hearing aids and bilateral co c hlear implan ts. The Journal of the A c oustic al So ciety of Americ a , 116(3):1698–1709. (Cite d on p age 40 ) [269] Seeb er, B. U., Kerb er, S., and Hafter, E. R. (2010). A system to sim u- late and repro duce audio–visual en vironmen ts for spatial hearing researc h. He aring r ese ar ch , 260(1):1–10. (Cite d on p age 56 ) [270] Seikel, J., King, D., and Drumrigh t, D. (2015). A natomy & Physiolo gy for Sp e e ch, L anguage, and He aring . Cengage Learning. (Cite d on p age 14 ) [271] Sette, W. J. (1933). A new rev erb eration time form ula. The Journal of the A c oustic al So ciety of Americ a , 4(3):193–210. (Cite d on p age 34 ) [272] Shavit-Cohen, K. and Zion Golum bic, E. (2019). The dynamics of atten- tion shifts among concurrent sp eec h in a naturalistic multi-speaker virtual en vironmen t. 
F r ontiers in Human Neur oscienc e , 13:386. (Cite d on p ages 1 and 57 ) [273] Sho jaei, E., Ashay eri, H., Jafari, Z., Dast, M., and Kamali, K. (2016). Effect of signal to noise ratio on the sp eech p erception ability of older adults. Me dic al journal of the Islamic R epublic of Ir an , 30:342. (Cite d on p ages 1 and 55 ) BIBLIOGRAPHY 220 [274] Silzle, A., Kosmidis, D., F elix Greco, G., Beer, D., and Betz, L. (2016). The influence of microphone directivity on the lev el calibration and equal- ization of 3d loudsp eak ers setups. In 29th T onmeistertagung - VDT Inter- national Convention 2016 . (Cite d on p age 21 ) [275] Simon, L. S. R., Dillier, N., and W ¨ uthrich, H. (2021). Comparison of 3D audio repro duction metho ds using hearing devices. Journal of the Audio Engine ering So ciety , 68(12):899–909. (Cite d on p ages 21 , 46 , 93 , 94 and 121 ) [276] Simon, L. S. R., W uethrich, H., and Dillier, N. (2017). Comparison of higher-order ambisonics, vector- and distance-based amplitude panning using a hearing device b eamformer. In Pr o c e e dings of 4th International Confer enc e on Sp atial Audio, Gr az, A ustria . (Cite d on p ages 20 , 21 , 23 , 117 , 121 , 163 and 191 ) [277] Sim´ on G´ alvez, M., Menzies, D., F azi, F., de Campos, T., and Hilton, A. (2015). Listener tracking stereo for ob ject based audio repro duction. In T e cniacustic a 2016 (V alen cia)-Eur op e an Symp osium in Virtual A c oustics and Ambisonics . (Cite d on p age 27 ) [278] Skudrzyk, E. (1971). The foundations of ac oustics: b asic mathematics and b asic ac oustics . Springer-V erlag. (Cite d on p age 229 ) [279] Solv ang, A. (2008). Sp ectral impairmen t of t w o-dimensional higher order am bisonics. J. Audio Eng. So c , 56(4):267–279. (Cite d on p age 94 ) [280] Spand¨ ock, F. (1934). Akustisc he mo dellv ersuc he. Annalen der Physik , 412(4):345–360. (Cite d on p age 19 ) [281] Sp ors, S., T eutsc h, H., Kuntz, A., and Rab enstein, R. (2004). Sound field syn thesis. In Huang, Y. and Benesty , J., editors, Audio Signal Pr o c essing for Next-Gener ation Multime dia Communic ation Systems , pages 323–344. Springer US, Boston, MA. (Cite d on p age 31 ) [282] Sp ors, S., Wierstorf, H., Raak e, A., Melc hior, F., F rank, M., and Zotter, F. (2013). Spatial sound with loudsp eak ers and its p erception: A review of the current state. (Cite d on p ages 21 , 23 , 27 , 40 , 42 and 53 ) [283] Stitt, P ., Bertet, S., and V an W alstijn, M. (2013). P erceptual inv estiga- tion of image placement with ambisonics for non-cen tred listeners. In Pr o c. of the 16th Int. Confer enc e on Digital Audio Effe cts (DAFx-13), Mayno oth, Ir eland . (Cite d on p ages 21 , 46 and 49 ) [284] Strauss, H. (1998). Implemen ting doppler shifts for virtual auditory en vironmen ts. Journal of the Audio Engine ering So ciety . (Cite d on p age 20 ) BIBLIOGRAPHY 221 [285] Strumi l lo, P . (2011). A dvanc es in Sound L o c alization . a. InT ech. (Cite d on p ages 9 and 10 ) [286] Sudarsono, A. S., Lam, Y. W., and Da vies, W. J. (2016). The effect of sound lev el on p erception of repro duced soundscap es. Applie d A c oustics , 110:53–60. (Cite d on p age 42 ) [287] Søndergaard, P . and Ma jdak, P . (2013). The auditory mo deling to olb o x. In Blauert, J., editor, The T e chnolo gy of Binaur al Listening , pages 33–56. Springer, Berlin, Heidelb erg. (Cite d on p age 138 ) [288] T enenbaum, R. A., Camilo, T. S., T orres, J. C. B., and Gerges, S. N. (2007). 
Hybrid metho d for numerical simulation of ro om acoustics with au- ralization: part 1-theoretical and n umerical aspects. Journal of the Br azilian So ciety of Me chanic al Scienc es and Engine ering , 29:211–221. (Cite d on p age 68 ) [289] T rembla y , P ., Brisson, V., and Desc hamps, I. (2020). Brain aging and sp eec h p erception: Effects of background noise and talker v ariability . Neu- r oImage , 227:117675. (Cite d on p ages 1 and 55 ) [290] T revi ˜ no, J., Ok amoto, T., Iwa ya, Y., and Suzuki, Y. (2011). Ev aluation of a new ambisonic decoder for irregular loudsp eak er arrays using in teraural cues. In Ambisonics Symp osium . (Cite d on p age 94 ) [291] T u, W., Hu, R., W ang, H., and Chen, W. (2010). Measuremen t and analysis of just noticeable difference of in teraural level difference cue. 2010 International Confer enc e on Multime dia T e chnolo gy , pages 1–3. (Cite d on p age 148 ) [292] V an W anro oij, M. M. and V an Opstal, A. J. (2004). Contribution of head shado w and pinna cues to c hronic monaural sound lo calization. Journal of Neur oscienc e , 24(17):4163–4171. (Cite d on p age 11 ) [293] V orl¨ ander, M. (2007). A ur alization: F undamentals of A c oustics, Mo d- el ling, Simulation, Algorithms and A c oustic Virtual R e ality . R WTHedition. Springer Berlin Heidelb erg. (Cite d on p ages 2 , 14 , 15 , 18 , 19 , 20 , 21 , 22 , 33 , 40 , 42 , 53 and 121 ) [294] V orl¨ ander, M. (2008). Virtual Acoustics: Opp ortunities and limits of spatial sound repro duction for audiology . Hausdesho er ens-Oldenbur g . (Cite d on p age 56 ) [295] V orl¨ ander, M. (2014). Virtual acoustics. A r chives of A c oustics , v ol. 39(No 3):307–318. (Cite d on p age 40 ) [296] W allach, H. (1938). On sound lo calization. The Journal of the A c oustic al So ciety of Americ a , 10(1):83–83. (Cite d on p age 10 ) BIBLIOGRAPHY 222 [297] W ang, D. and Bro wn, G. J. (2006). Binaural sound lo calization. In Computational Auditory Sc ene A nalysis: Principles, A lgorithms, and Ap- plic ations , pages 147–185. Wiley . (Cite d on p age 146 ) [298] W anner, L., Blat, J., Dasiopoulou, S., Dom ´ ınguez, M., Llorac h, G., Mille, S., Sukno, F., Kamateri, E., V ro chidis, S., Kompatsiaris, I., Andr ´ e, E., Lin- genfelser, F., Mehlmann, G., Stam, A., Stellingw erff, L., Vieru, B., Lamel, L., Mink er, W., Pragst, L., and Ultes, S. (2016). T o w ards a m ultimedia kno wledge-based agen t with so cial comp etence and human interaction ca- pabilities. In Pr o c e e dings of the 1st International Workshop on Multime dia A nalysis and R etrieval for Multimo dal Inter action , MARMI ’16, page 21–26, New Y ork, NY, U SA. Asso ciation for Computing Machinery . (Cite d on p ages 1 and 57 ) [299] W ard, D. B. and Abhay apala, T. D. (2001). Repro duction of a plane- w a v e sound field using an arra y of loudsp eakers. IEEE T r ansactions on Sp e e ch and A udio Pr o c essing , 9(6):697–707. (Cite d on p ages 31 , 77 and 86 ) [300] W endt, D., Dau, T., and Hjortkjær, J. (2016). Impact of background noise and sen tence complexit y on pro cessing demands during sen tence com- prehension. F r ontiers in Psycholo gy . (Cite d on p age 51 ) [301] W endt, D., Hietk amp, R. K., and Lunner, T. (2017). Impact of noise and noise reduction on pro cessing effort: A pupillometry study. Ear and He aring . (Cite d on p ages 50 and 101 ) [302] W endt, D., Ko elewijn, T., Ksi¸ a ˙ zek, P ., Kramer, S. E., and Lunner, T. (2018). 
T ow ard a more comprehensive understanding of the impact of mask er type and signal-to-noise ratio on the pupillary resp onse while p er- forming a sp eech-in-noise test. He aring R ese ar ch , pages 1–12. (Cite d on p ages 50 , 101 and 102 ) [303] W estermann, A. and Buchholz, J. M. (2017). The effect of nearb y mask ers on sp eec h intelligibilit y in reverberant, multi-talk er environmen ts. The Journal of the A c oustic al So ciety of Americ a , 141(3):2214–2223. (Cite d on p ages 42 and 99 ) [304] Whitmer, W. M. and Ak eroyd, M. A. (2013). The sensitivity of hearing- impaired adults to acoustic attributes in simulated ro oms. Pr o c e e dings of Me etings on A c oustics , 19(1):015109. (Cite d on p ages 1 , 18 and 50 ) [305] Whitmer, W. M., Seeb er, B. U., and Akero yd, M. A. (2012). Apparen t auditory source width insensitivity in older hearing-impaired individuals. The Journal of the A c oustic al So ciety of Americ a , 132(1):369–379. (Cite d on p ages 16 , 18 and 40 ) [306] Wightman, F. L. and Kistler, D. J. (1992). The dominan t role of lo w- frequency interaural time differences in sound lo calization. The Journal of the A c oustic al So ciety of Americ a , 91(3):1648–1661. (Cite d on p age 10 ) BIBLIOGRAPHY 223 [307] Wightman, F. L. and Kistler, D. J. (1997). Monaural sound lo calization revisited. The Journal of the A c oustic al So ciety of Americ a , 101(2):1050– 1063. (Cite d on p age 11 ) [308] Wilcox, R. (2004). Inferences based on a skipp ed correlation co efficien t. Journal of Applie d Statistics , 31(2):131–143. (Cite d on p age 116 ) [309] Williams, G. (1999). F ourier A c oustics: Sound R adiation and Ne arfield A c oustic al Holo gr aphy . Academic Press. (Cite d on p ages 28 and 229 ) [310] Wisniewski, M. G., Thompson, E. R., and Iyer, N. (2017). Theta- and alpha-p o w er enhancemen ts in the electro encephalogram as an auditory de- la y ed match-to-sample task b ecomes imp ossibly difficult. Psychophysiolo gy , 54(12):1916–1928. (Cite d on p age 111 ) [311] Wisniewski, M. G., Thompson, E. R., Iy er, N., Estepp, J. R., Go der- Reiser, M. N., and Sulliv an, S. C. (2015). F rontal midline θ pow er as an index of listening effort. Neur or ep ort , 26(2):94—99. (Cite d on p age 111 ) [312] W ong, G. S. K. (1986). Sp eed of sound in standard air. The Journal of the A c oustic al So ciety of Americ a , 79(5):1359–1366. (Cite d on p age 15 ) [313] W¨ ostmann, M., Lim, S.-J., and Obleser, J. (2017). The Human Neu- ral Alpha Resp onse to Sp eec h is a Proxy of Atten tional Control. Cer ebr al Cortex , 27(6):3307–3317. (Cite d on p age 111 ) [314] Xie, B. (2013). He ad-r elate d tr ansfer function and virtual auditory dis- play . J. Ross Publishing. (Cite d on p ages 22 and 70 ) [315] Y ost, W. (2013). F undamentals of He aring: An Intr o duction . Brill. (Cite d on p age 8 ) [316] Zapata Ro driguez, V., Jeong, C.-H., Hoffmann, I., Cho, W.-H., Beldam, M.-B., and Harte, J. (2019). Acoustic conditions of clinic ro oms for sound field audiometry . In Pr o c e e dings of 23r d International Congr ess on A c ous- tics , pages 4654–59. Deutsc he Gesellsc haft f ¨ ur Akustik. 23rd International Congress on Acoustics , ICA 2019 ; Conference date: 09-09-2019 Through 13-09-2019. (Cite d on p ages 122 and 139 ) [317] Zekveld, A., Kramer, S., and F esten, J. (2011). Cognitive load during sp eec h p erception in noise: The influence of age, hearing loss, and cognition on the pupil resp onse. Ear and he aring , 32:498–510. (Cite d on p ages 1 and 55 ) BIBLIOGRAPHY 224 [318] Zekveld, A. 
A. and Kramer, S. E. (2014). Cognitive processing load across a wide range of listening conditions: Insights from pupillometry . Psy- chophysiolo gy . (Cite d on p ages 50 and 112 ) [319] Zekveld, A. A., Kramer, S. E., and F esten, J. M. (2010). Pupil resp onse as an indication of effortful listening: The influence of sentence intelligibilit y. Ear and He aring . (Cite d on p ages 50 and 118 ) [320] Zhang, W., Samarasinghe, P ., Chen, H., and Abhay apala, T. (2017). Sur- round b y Sound: A Review of Spatial Audio Recording and Repro duction. Applie d Scienc es , 7(5):532. (Cite d on p ages 19 , 20 , 21 , 27 and 40 ) [321] Zob el, B. H., W agner, A., Sanders, L. D., and Ba ¸ sken t, D. (2019). Spatial release from informational masking declines with age: Evidence from a de- tection task in a virtual separation paradigm. The Journal of the A c oustic al So ciety of Americ a , 146(1):548–566. (Cite d on p ages 16 and 40 ) [322] ˇ Lub o ˇ s Hl´ adek, Ewert, S. D., and Seeb er, B. U. (2021). Comm unication conditions in virtual acoustic scenes in an underground station. (Cite d on p age 42 ) [323] S ¸ aher, K., Rindel, J. H., Nijs, L., and V an Der V o orden, M. (2005). Impacts of reverberation time, absorption lo cation and background noise on listening conditions in multi source en vironmen t. In F orum A custicum Budap est 2005: 4th Eur op e an Congr ess on A custics . (Cite d on p age 50 ) App endix A ITDs Am bisonics Figure A.1 , depicts ITDs for measurements with a listener (HA TS manikin) in the cen ter with Ambisonics (blac k line), in nine off-center p ositions combi- nations accompanied by a second listener (KEMAR) and alone in those three off center p ositions. Figure A.1: ITD as a function of sour c e angle in Ambisonics virtualize d setup. T op left HA TS displac ement = 25 cm, top right HA TS displac ement = 50 cm, b ottom left HA TS displac ement = 75 cm, b ottom right HA TS displac ement matching KEMAR displac ement. 225 App endix B Delta ILD Am bisonics Figures B.1 , B.2 , and B.3 , presen t the differences in ILD b et w een center and off-cen ter listener p ositions utilizing 24 loudsp eak ers to render an Am bisonics with a second listener present inside the ring of loudsp eak ers. In the figures, the num b er follo wing H indicates the p osition of the main listener, while the n um b ers after K indicate the p osition of the second listener. Figure B.1: Differ enc es in the ILD b etwe en c enter e d setup and off-c enter setups: HA TS at 25 cm to the right with: KEMAR at 25 cm to the left (top); KEMAR at 50 cm to the left (midd le); KEMAR 75 cm to the left (b ottom). 226 227 Figure B.2: Differ enc es in the ILD b etwe en c enter e d setup and off- c enter setups: HA TS at 50 cm to the right with: KEMAR at 25 cm to the left (top); KEMAR at 50 cm to the left (midd le); KEMAR 75 cm to the left (b ottom). Figure B.3: Differ enc es in the ILD b etwe en c enter e d setup and off- c enter setups: HA TS at 75 cm to the right with: KEMAR at 25 cm to the left (top); KEMAR at 50 cm to the left (midd le); KEMAR 75 cm to the left (b ottom). App endix C W a v e Equation and Spherical Harmonic Represen tation Spherical harmonics (SH) represent spatial v ariations of an orthogonal set of solutions in the Laplace equation (orthonormal basis) when the solution is expressed in spherical co ordinates, th us giving the spatial represen tation of w eigh ted sums in spherical forms that represen ts a signal (space and frequency dep enden t). 
Appendix C

Wave Equation and Spherical Harmonic Representation

Spherical harmonics (SH) form an orthonormal set of solutions to the angular part of the Laplace equation when it is expressed in spherical coordinates. They provide the spatial representation of a space- and frequency-dependent signal as a weighted sum of spherical basis functions.

C.1 Wave Equation in Spherical Coordinates

Expressing the wave equation in spherical coordinates (r, θ, ϕ) [36], we have

\[
\frac{\partial^2 p}{\partial r^2}
+ \frac{2}{r}\frac{\partial p}{\partial r}
+ \frac{1}{r^2 \sin\theta}\frac{\partial}{\partial\theta}\!\left(\sin\theta\,\frac{\partial p}{\partial\theta}\right)
+ \frac{1}{r^2 \sin^2\theta}\frac{\partial^2 p}{\partial\phi^2}
- \frac{1}{c_0^2}\frac{\partial^2 p}{\partial t^2} = 0. \tag{C.1}
\]

C.2 Separation of the Variables

The separation-of-variables technique can be applied to Equation C.1 by writing the pressure as the product of three space-dependent factors and one time-dependent factor:

\[
p(r, \theta, \phi, t) = R(r)\,\Theta(\theta)\,\Phi(\phi)\,T(t). \tag{C.2}
\]

With the variables separated, according to Skudrzyk [278], four homogeneous differential equations result:

\begin{align}
\frac{d^2\Phi}{d\phi^2} + m^2\Phi &= 0, \tag{C.3a}\\
\frac{1}{\sin\theta}\frac{d}{d\theta}\!\left(\sin\theta\,\frac{d\Theta}{d\theta}\right) + \left[n(n+1) - \frac{m^2}{\sin^2\theta}\right]\Theta &= 0, \tag{C.3b}\\
\frac{1}{r^2}\frac{d}{dr}\!\left(r^2\,\frac{dR}{dr}\right) + \left[k^2 - \frac{n(n+1)}{r^2}\right]R &= 0, \tag{C.3c}\\
\frac{1}{c^2}\frac{d^2 T}{dt^2} + k^2 T &= 0, \tag{C.3d}
\end{align}

where m and n are integers. The general solutions to these equations are

\begin{align}
\Phi(\phi) &= \Phi_1 e^{jm\phi} + \Phi_2 e^{-jm\phi}, \tag{C.4a}\\
\Theta(\theta) &= \Theta_1 P_n^m(\cos\theta) + \Theta_2 Q_n^m(\cos\theta), \tag{C.4b}\\
R(r) &= R_1 h_n^{(1)}(kr) + R_2 h_n^{(2)}(kr), \tag{C.4c}\\
T(t) &= T_1 e^{j\omega t} + T_2 e^{-j\omega t}, \tag{C.4d}
\end{align}

where $h_n^{(1)}(x)$ and $h_n^{(2)}(x)$ are the spherical Hankel functions of the first and second kind, representing convergent and divergent waves depending on the sign convention adopted for time, and $P_n^m(x)$ and $Q_n^m(x)$ are the associated Legendre functions of the first and second kind.

Because the associated Legendre functions of the second kind are singular at the poles θ = 0 and θ = π, the term Θ₂ is set to zero; and since positive and negative degrees m are interchangeable, the term Φ₂ can also be set to zero for simplicity. According to Williams [309], the index n must be an integer for the associated Legendre functions of the first kind to remain free of singularities at the poles. Furthermore, for causal systems the term T₂ in Equation C.4d is zero, given the convention used.

The associated Legendre functions of the first kind, defined for positive degrees m, are

\[
P_n^m(x) = (-1)^m \left(1 - x^2\right)^{m/2} \frac{d^m}{dx^m} P_n(x). \tag{C.5}
\]

The functions for negative degrees −m are given by

\[
P_n^{-m}(x) = (-1)^m \frac{(n-m)!}{(n+m)!}\, P_n^m(x), \tag{C.6}
\]

with $P_n$ being the Legendre polynomial given by

\[
P_n(x) = \frac{1}{2^n\, n!} \frac{d^n}{dx^n}\left(x^2 - 1\right)^n. \tag{C.7}
\]

C.3 Spherical Harmonics

Equations C.4a and C.4b admit periodic solutions in the angular coordinates; combined, they are called spherical harmonics of order n and degree m, defined by

\[
Y_n^m(\theta, \phi) = \sqrt{\frac{2n+1}{4\pi}\,\frac{(n-m)!}{(n+m)!}}\; P_n^m(\cos\theta)\, e^{jm\phi}. \tag{C.8}
\]

The negative-degree SH functions are obtained through the relation

\[
Y_n^{-m}(\theta, \phi) = (-1)^m \left(Y_n^m(\theta, \phi)\right)^{*}, \tag{C.9}
\]

where * denotes the complex conjugate. Only the phase changes between the positive and negative degrees of the function; thus the magnitude is commonly plotted as the radius, with the phase shown as a color scale, as in Figure 2.9.
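Equations C.5 to C.9 are straightforward to evaluate numerically. The sketch below is a minimal illustration in Python with NumPy and SciPy (the helper names are assumptions, not code from this work): scipy.special.lpmv implements the associated Legendre functions with the Condon-Shortley phase of Equation C.5, negative degrees are mapped through Equation C.6, and the spherical harmonics of Equation C.8 are checked against a known closed form and the conjugation relation of Equation C.9.

    import numpy as np
    from scipy.special import lpmv, factorial

    def assoc_legendre(n, m, x):
        """P_n^m(x) with the (-1)^m factor of Equation C.5;
        negative degrees are obtained via Equation C.6."""
        if m < 0:
            return ((-1) ** (-m) * factorial(n + m) / factorial(n - m)
                    * lpmv(-m, n, x))
        return lpmv(m, n, x)

    def Y(n, m, theta, phi):
        """Spherical harmonic of order n and degree m (Equation C.8);
        theta is the colatitude and phi the azimuth, as in this appendix."""
        norm = np.sqrt((2 * n + 1) / (4 * np.pi)
                       * factorial(n - m) / factorial(n + m))
        return norm * assoc_legendre(n, m, np.cos(theta)) * np.exp(1j * m * phi)

    theta, phi = 0.7, 1.9    # colatitude, azimuth (radians)

    # Known closed form: Y_1^0 = sqrt(3 / (4 pi)) cos(theta).
    print(np.isclose(Y(1, 0, theta, phi),
                     np.sqrt(3 / (4 * np.pi)) * np.cos(theta)))    # True

    # Relation C.9: Y_n^{-m} = (-1)^m * conj(Y_n^m).
    n, m = 3, 2
    print(np.isclose(Y(n, -m, theta, phi),
                     (-1) ** m * np.conj(Y(n, m, theta, phi))))    # True

Because only the phase differs between degrees ±m, implementations often compute just the non-negative degrees explicitly and obtain the rest via Equation C.9.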
Appendix D

Reverberation Time in Acoustic Simulation

The reverberation times for the classroom and the restaurant are presented in Figure D.1.

Figure D.1: Reverberation time in octave bands: (a) classroom; (b) restaurant.

Appendix E

Alpha Coefficients

Figures E.1, E.2, and E.3 present the absorption coefficients per frequency band that were entered into the ODEON software to simulate the environments.

Figure E.1: Classroom alpha coefficients (ODEON software).

Figure E.2: Restaurant alpha coefficients (ODEON software).

Figure E.3: Anechoic room alpha coefficients.

Appendix F

Questionnaire

The subjective listening-effort questionnaire (TS_00, Date: ___ / ___ / 2019) was administered in Danish; an English translation of the three items and their response scales is given below.

1. Hvor meget anstrengte du dig for at høre sætningerne? (How much effort did it take you to hear the sentences?)
   Ingen anstrengelse (No effort) / Lav anstrengelse (Low effort) / Moderat anstrengelse (Moderate effort) / Høj anstrengelse (High effort) / Meget høj anstrengelse (Very high effort)

2. Hvor mange af ordene tror du, at du forstod korrekt? (How many of the words do you think you understood correctly?)
   Ingen (None) / Mindre end halvdelen (Less than half) / Halvdelen (Half) / Mere end halvdelen (More than half) / Alle (All)

3. Hvor ofte måtte du opgive at forstå sætningen? (How often did you have to give up on understanding the sentence?)
   Aldrig (Never) / Mindre end halvdelen af tiden (Less than half the time) / Halvdelen af tiden (Half the time) / Mere end halvdelen af tiden (More than half the time) / Altid (Always)