Abstract
Depending on the acoustic scenario, people with hearing loss face challenges on a
different scale than people with normal hearing when comprehending sound, especially
speech. This happens particularly during social interactions within a group, which often
occur in environments with low signal-to-noise ratios. This communication disruption can
create a barrier for people to acquire and develop communication skills as a child or to
interact with society as an adult.
Hearing loss compensation aims to provide an opportunity to restore the auditory part of
socialization.
Technological and academic efforts have progressed toward a better understanding of the
human hearing system. Through constant efforts to present new algorithms, miniaturization,
and new materials, constantly improving hardware with high-end software is being
developed, with new features and solutions to broad and specific auditory challenges. The
effort to deliver innovative solutions to the complex phenomena of hearing loss
encompasses tests, verifications, and validations in various forms. As newer devices
achieve their purpose, the tests need to increase in sensitivity, requiring conditions
that effectively assess the improvements.
Regarding realism, many levels are required in hearing research, from pure-tone assessment
in small soundproof booths to hundreds of loudspeakers combined with visual stimuli
through projectors or head-mounted displays, with light and movement control. Hearing-aid
research commonly relies on loudspeaker setups to reproduce sound sources. In addition,
auditory research can use well-known auralization techniques to generate sound signals.
These signals can be encoded to carry more than sound-pressure-level information, adding
spatial information about the environment where the sound event happened or was simulated.
This work reviews physical acoustics, virtualization, and auralization concepts and their
uses in listening-effort research. This knowledge, combined with the experiments executed
during the studies, aimed to provide a hybrid auralization method to be virtualized in
four-loudspeaker setups. Auralization methods are techniques used to encode spatial
information into sounds. The main methods were discussed and derived, observing their
spatial sound characteristics and trade-offs for use in auditory tests with one or two
participants. Two well-known auralization techniques (Ambisonics and Vector-Based
Amplitude Panning) were selected and compared through a calibrated virtualization setup
with regard to spatial distortions in the binaural cues. These techniques were chosen
because they rely on loudspeakers, albeit only a small number of them. Furthermore, the
spatial cues were examined by adding a second listener to the virtualized sound field. The
outcome reinforced the literature on spatial localization with these techniques, showing
Ambisonics to be less spatially accurate but more immersive than Vector-Based Amplitude
Panning.
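The amplitude-panning side of that comparison can be made concrete with a short sketch. In
2D VBAP, the gains of the two loudspeakers adjacent to a phantom source are found by
inverting the matrix whose columns are the loudspeakers' unit direction vectors. The
fragment below is a minimal illustration with hypothetical loudspeaker angles, not the
setups used in the studies.

```python
import math

def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Gains for a phantom source between two loudspeakers (2D VBAP).

    Solves p = g1*l1 + g2*l2 for the unit direction vectors, then
    normalizes so that g1^2 + g2^2 = 1 (constant perceived power).
    """
    def unit(deg):
        rad = math.radians(deg)
        return (math.cos(rad), math.sin(rad))

    p = unit(source_deg)
    l1, l2 = unit(spk1_deg), unit(spk2_deg)

    # Invert the 2x2 matrix whose columns are the loudspeaker vectors.
    det = l1[0] * l2[1] - l2[0] * l1[1]
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (p[1] * l1[0] - p[0] * l1[1]) / det

    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

# Source midway between loudspeakers at +/-45 degrees: equal gains.
g1, g2 = vbap_2d(0.0, -45.0, 45.0)
```

For a source midway between the two loudspeakers, both gains come out equal, preserving
constant power; a source exactly at a loudspeaker direction receives all the gain, which
is the behavior that gives VBAP its sharp localization.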
A combined study was defined to observe changes in listening effort due to different
signal-to-noise ratios and reverberation in a virtualized setup. This experiment aimed to
produce the correct sound field via a virtualized setup and to assess listening effort via
subjective impressions from a questionnaire, an objective physiological outcome from EEG,
and behavioral performance on word recognition. Nine levels of degradation were imposed on
speech signals presented over speech maskers separated in the virtualized space through
first-order Ambisonics in a setup with 24 loudspeakers. A high correlation between
participants' performance and their questionnaire responses was observed. The results
showed that increased virtualized reverberation time negatively impacts speech
intelligibility and listening effort.
A new hybrid auralization method was proposed, merging the investigated techniques that
presented complementary spatial sound features. The method was derived through room
acoustics concepts and a specific objective parameter derived from the room impulse
response called Center Time. The verification of the binaural cues was carried out with
three different (simulated) rooms. As validation with test subjects was not possible due
to the COVID-19 pandemic, a psychoacoustic model was implemented to estimate the spatial
accuracy of the method within a four-loudspeaker setup. The same verification and model
estimation were also performed with the introduction of hearing aids. The results showed
that the hybrid method with four loudspeakers can be considered for audiological tests,
within some limitations: the setup can provide binaural cues up to a maximum ambiguity
angle of 30 degrees in the horizontal plane for a centered listener.
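The binaural-cue verification mentioned above rests on standard estimators; a common way
to estimate the interaural time difference (ITD) is to take the lag that maximizes the
cross-correlation between the two ear signals. The sketch below applies that estimator to
a synthetic click pair; it illustrates the general technique only, not the exact analysis
pipeline of the studies.

```python
import numpy as np

def itd_from_pair(left, right, fs):
    """Estimate the interaural time difference (seconds) as the lag
    that maximizes the cross-correlation of the two ear signals.
    Positive values mean the left-ear signal arrives first."""
    corr = np.correlate(right, left, mode="full")
    lag = np.argmax(corr) - (len(left) - 1)  # right-ear delay in samples
    return lag / fs

fs = 48000
# Synthetic ear signals: the same click, 10 samples later at the right ear.
left = np.zeros(256)
left[50] = 1.0
right = np.zeros(256)
right[60] = 1.0
itd = itd_from_pair(left, right, fs)  # 10 samples at 48 kHz (~208 us)
```

In practice the ear signals would be band-limited binaural recordings or BRIRs rather than
ideal impulses, and the search is usually restricted to physically plausible lags (roughly
below 1 ms for a human head).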
Introduction
Individuals with normal hearing can often effortlessly comprehend complex listening
scenarios involving multiple sound sources, background noise, and echoes. However, those
with hearing loss may find these situations particularly challenging. Such environments
are commonly encountered in daily life, particularly during social events, and they can
negatively impact the communication abilities of individuals with hearing loss. The
difficulties associated with understanding complex listening scenarios can be a
significant barrier for individuals with hearing loss, leading to reduced participation in
social activities.
Several hearing research laboratories worldwide are developing systems to realistically
simulate challenging scenarios through virtualization, to better understand and help with
these everyday challenges. The virtualization of sound sources is a powerful tool for
auditory research, capable of achieving a high level of detail, but current methods use
expensive, expansive technology. In this work, a new auralization method has been
developed to achieve sound spatialization with reduced hardware requirements, making
virtualization at the clinical level possible.
Key Chapters
Chapter 2: Literature Review
Examines previous work in virtualization and auralization, basic concepts of human sound
perception, room acoustics, and loudspeaker-based virtualization.
Chapter 3: Investigation of Binaural Cue Distortions
Compares VBAP and Ambisonics methods through a calibrated virtualization setup in terms
of
spatial distortions and examines spatial cues with a second listener.
Chapter 4: Behavioral Study
Examines subjective effort within virtualized sound scenarios (first-order Ambisonics),
focusing on how signal-to-noise ratio (SNR) and reverberation affect listening effort in
speech-in-noise tasks.
Chapter 5: The Iceberg Method
Proposes a hybrid auralization method combining VBAP and Ambisonics for small
reproduction
systems (four loudspeakers), evaluated with objective parameters and hearing aids.
Conclusion
Throughout this study, a new auralization method called Iceberg was conceptualized and
compared to well-known methods, including VBAP and first-order Ambisonics, using objective
parameters. The Iceberg method is innovative in that it uses Center Time (TS) to find the
transition point between early and late reflections, in order to split the Ambisonics
impulse responses and distribute them adequately. In this proposed method, VBAP is
responsible for localization cues, while Ambisonics contributes to the sense of immersion.
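The splitting step can be sketched from the standard definition of center time,
t_s = ∫ t·p²(t) dt / ∫ p²(t) dt, the energy-weighted mean arrival time of the room impulse
response. The code below computes t_s for a synthetic decaying-noise RIR and splits the
response there; it is an illustration of the idea only, not the thesis's exact
implementation (which may, for example, reference time to the direct-sound arrival).

```python
import numpy as np

def center_time(rir, fs):
    """Center time t_s (seconds): energy-weighted mean arrival time of
    the room impulse response (first moment of the squared response)."""
    t = np.arange(len(rir)) / fs
    energy = rir ** 2
    return float(np.sum(t * energy) / np.sum(energy))

def split_at_center_time(rir, fs):
    """Split an RIR at t_s into an early part (panned with VBAP for
    localization) and a late part (rendered with Ambisonics for
    immersion), following the hybrid scheme sketched above."""
    k = int(round(center_time(rir, fs) * fs))
    return rir[:k], rir[k:]

fs = 48000
rng = np.random.default_rng(0)
# Synthetic RIR: exponentially decaying noise with a ~0.5 s tail.
t = np.arange(int(0.5 * fs)) / fs
rir = rng.standard_normal(t.size) * np.exp(-6.9 * t / 0.5)
early, late = split_at_center_time(rir, fs)
```

For an exponentially decaying response, t_s lands early in the decay, so most of the
direct sound and early reflections go to the localization branch while the diffuse tail
feeds the immersion branch.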
In the center position, the Iceberg method matched the localization accuracy of the other
methods while adding to the sense of immersion. A second listener added to the side did
not introduce undesired effects into the auralization. Additionally, it was found that
virtualizing sound sources with Ambisonics can impose limitations on a participant's
behavior in a listening-in-noise test, due to its sweet spot. However, these limitations
can be circumvented and the approach extended to Iceberg, yielding subjective responses
that align with behavioral performance in speech intelligibility tests while increasing
localization accuracy.
Iceberg: A loudspeaker-based room auralization method for auditory research
Sergio Luiz Aguirre
Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy
Hearing Sciences – Scottish Section, School of Medicine, University of Nottingham
Supervised by William M. Whitmer, Lars Bramsløw, & Graham Naylor
2022

There is no "nonspatial hearing" — Jens Blauert

Acknowledgements
Thank you, Obrigado, Gracias, Grazie, Tak skal du have, Dank u zeer & Danke sehr

Firstly, I would like to express my gratitude to my supervisors, Drs. Bill Whitmer, Lars
Bramsløw, and Graham Naylor, for their guidance, expertise, and remarkable patience
throughout this process. Your support and mentorship have been invaluable in helping me to
fill my knowledge gaps, tirelessly encouraging me to ask the right questions, and guiding
me to produce high-quality scientific research. I also thank Dr. Thomas Lunner for his
initial guidance, insightful questions, and comments.

Thank you to the special people at Eriksholm Research Centre in Denmark and the people at
Hearing Sciences – Scottish Section. Working with such fantastic top teams has been a joy
and a privilege. A special thank you to Jette, Michael, Bo, Niels, Claus, Dorothea, Sergi,
Jeppe, James, Lorenz, Johannes, and Hamish. Thanks to all the people involved in HEAR-ECO
for their hard work, especially Hidde, Beth, Patrycja, Tirdad, and Defne.

I am deeply grateful to my sweet wife, Lilian, for the love, encouragement, and support
she has given me throughout this journey. Her unwavering support has been an enduring seed
of resilience and inspiration. I cannot thank her enough for being such an integral part
of my life. Thanks to my friends Math and Gil, for always being there.

Special thanks to my former professors, Drs. Arcanjo Lenzi, William D'Andrea Fonseca, Eric
Brandão, Paulo Mareze, Stephan Paul, and Bruno Sanches Masiero, for stimulating critical
thinking and for all the support, knowledge, and encouragement. Thank you also to my
former Oticon Medical colleagues Simon Ziska Krogholt, Patrick Maas, Brian Skov, and Jens
T. Balslev for their support.

I would like to thank you, my Professor, Professora Dra. Dinara Xavier Paixão. All of this
is possible because of you and your determination to create an official undergraduate
course in Acoustical Engineering in Brazil. This course is praised for forming remarkable
professionals who are recognized worldwide. It is not just my dream that you have made
possible, but that of the countless people for whom this course has been a life-changing
experience. We know that this was a collective effort, but your role was vital. Your way
of showing that politics is a part of everything, and that we need to be gentle but
correct, made all the difference. Thank you. Muito Obrigado.

I sincerely thank my friends and colleagues from my undergraduate studies in acoustical
engineering (UFSM/EAC) and the master's (UFSC/LVA). Your support, happiness, patience, and
encouragement have been invaluable throughout this journey. Thank you for helping me to
develop my skills and knowledge and for being such a positive influence on my academic
career. I am deeply grateful for all you have done for me and look forward to continuing
our professional relationship.

I want to thank the oldest friends, Fabrício, André, Juliano, and the Panteon. I value the
bond and history that we share. Thank you for being such wonderful friends.

I want to express my heartfelt thanks to the Brazilian CNPq and the government
(Lula/Dilma) policies that support students with low income and from public schools. With
their financial assistance, I could pursue my studies and achieve my goals. I am deeply
grateful for their support and the opportunity to receive a quality education.

I would also like to express my gratitude to Marie Skłodowska-Curie Actions for their
support of my doctoral education. Their reference programme for doctoral education has
provided me with invaluable resources and opportunities, and I am extremely grateful for
their support. Thank you for helping me achieve my goals and being a valuable part of my
academic journey.

I want to express my gratitude to all those who will read this thesis in the future. Your
time and attention are greatly appreciated. I wish you a good reading experience and hope
that you will find the ideas and research presented in this work to be both
thought-provoking and beneficial. Thank you again for considering this work.

Author's Declaration
This thesis is the result of the author's original research. Chapter 4 is a collaboration
work with Tirdad Seifi-Ala. The author has composed it, and it has not been previously
submitted for any other academic qualification. This project has received funding from the
European Union's Horizon 2020 research and innovation programme under the Marie
Skłodowska-Curie grant agreement No 765329; the funder had no role in study design.

Sergio Luiz Aguirre

Contents
Abstract
Acknowledgements
Author's Declaration
Nomenclature
1 Introduction
  1.1 Motivations
  1.2 Aims and Scope
  1.3 Contributions
  1.4 Organization of the Thesis
2 Literature Review
  2.1 Introduction
  2.2 Human Binaural Hearing
    2.2.1 Spatial Hearing Concepts
    2.2.2 Binaural cues
    2.2.3 Monaural cues
    2.2.4 Head-related transfer function
    2.2.5 Subjective aspects of an audible reflection
  2.3 Spatial Sound & Virtual Acoustics
    2.3.1 Virtualization
      2.3.1.1 Auralization
      2.3.1.2 Reproduction
    2.3.2 Auralization Paradigms
      2.3.2.1 Binaural
      2.3.2.2 Panorama
      2.3.2.3 Sound Field Synthesis
    2.3.3 Room acoustics
      2.3.3.1 Room acoustics parameters
      2.3.3.2 Reverberation Time
      2.3.3.3 Clarity and Definition
      2.3.3.4 Center Time
      2.3.3.5 Parameters related to spatiality
    2.3.4 Loudspeaker-based Virtualization in Auditory Research
      2.3.4.1 Hybrid Methods
      2.3.4.2 Sound Source Localization
  2.4 Listening Effort Assessment
  2.5 Concluding Remarks
3 Binaural cue distortions in virtualized Ambisonics and VBAP
  3.1 Introduction
  3.2 Methods
    3.2.1 Setups and system characterization
      3.2.1.1 Reverberation time
      3.2.1.2 Early-reflections
    3.2.2 Procedure
    3.2.3 Calibration
    3.2.4 VBAP Auralization
    3.2.5 Ambisonics Auralization
  3.3 Results
    3.3.1 Analysis
    3.3.2 Centered position
      3.3.2.1 Centered ITD
      3.3.2.2 Centered ILD
    3.3.3 Off-centered position
      3.3.3.1 Off-center ITD
      3.3.3.2 Off-center ILD
  3.4 Discussion
  3.5 Concluding Remarks
4 Subjective Effort within Virtualized Sound Scenarios
  4.1 Introduction
  4.2 Methods
    4.2.1 Participants
    4.2.2 Stimuli
    4.2.3 Apparatus
    4.2.4 Auralization
    4.2.5 Procedure
    4.2.6 Questionnaire
    4.2.7 Statistics
  4.3 Results
  4.4 Discussion
  4.5 Concluding Remarks
5 Iceberg: A Hybrid Auralization Method Focused on Compact Setups
  5.1 Introduction
  5.2 Iceberg, a Hybrid Auralization Method
    5.2.1 Motivation
    5.2.2 Method
      5.2.2.1 Components
      5.2.2.2 Energy Balance
      5.2.2.3 Iceberg proposition
    5.2.3 Setup Equalization & Calibration
  5.3 System Characterization
    5.3.1 Experimental Setup
    5.3.2 Virtualized RIRs & BRIRs
    5.3.3 Conditions
    5.3.4 Reverberation Time
  5.4 Main Results
    5.4.1 Centered Position
      5.4.1.1 Interaural Time Difference
      5.4.1.2 Interaural Level Difference
      5.4.1.3 Azimuth Estimation
    5.4.2 Off-Center Positions
      5.4.2.1 Interaural Time Difference
      5.4.2.2 Interaural Level Difference
      5.4.2.3 Azimuth Estimation
    5.4.3 Centered Accompanied by a Second Listener
      5.4.3.1 Interaural Time Difference
      5.4.3.2 Interaural Level Difference
      5.4.3.3 Azimuth Estimation
  5.5 Supplementary Test Results
    5.5.1 Centered Position (Aided)
      5.5.1.1 Interaural Time Difference
      5.5.1.2 Interaural Level Difference
      5.5.1.3 Azimuth Estimation
    5.5.2 Off-center Positions (Aided)
      5.5.2.1 Interaural Time Difference
      5.5.2.2 Interaural Level Difference
      5.5.2.3 Azimuth Estimation
  5.6 Discussion
    5.6.1 Subjective impressions
    5.6.2 Advantages and Limitations
    5.6.3 Study limitations and Future Work
  5.7 Concluding Remarks
6 Conclusion
  6.1 Iceberg
  6.2 General Discussion
    6.2.1 Iceberg capabilities
    6.2.2 Iceberg & Second Joint Listener
    6.2.3 Iceberg: Listener Wearing Hearing Aids
    6.2.4 Iceberg Limitations
  6.3 General Conclusion
  6.4 Main Contributions
Bibliography
Appendices
A ITDs Ambisonics
B Delta ILD Ambisonics
C Wave Equation and Spherical Harmonic Representation
  C.1 Wave Equation in Spherical Coordinates
  C.2 Separation of the Variables
  C.3 Spherical Harmonics
D Reverberation time in Acoustic Simulation
E Alpha Coefficients
F Questionnaire

List of Tables
2.1 Non-exhaustive overview list of hybrid auralization methods proposed in the
    literature. The A-B order of the techniques does not represent any order of
    significance.
2.2 Overview of Localization Error Estimates or Measurements from Loudspeaker-Based
    Virtualization Systems Using Various Auralization Methods.
3.1 Sound pressure level difference between direct sound and early reflections
    ∆SPL [dB]
4.1 The questionnaire for subjective ratings of performance, effort and engagement
    (English translation from Danish)
4.2 Results of linear mixed model based on SNR and RT predictors estimates of the
    questionnaire.
4.3 Pearson skipped correlations between performance and self-reported questions.
5.1 Reverberation Time in three virtualized environments
5.2 One-way ANOVA, columns are absolute difference between estimated and reference
    angles for different KEMAR positions and RTs.
5.3 Hearing Level in dB according to the proposed Standard Audiograms
5.4 One-way ANOVA, columns are absolute difference between estimated and reference
    angles for different positions and RTs.
5.5 Maximum ITD according to displacement

List of Figures
2.1 Two-dimensional representation of the cone of confusion.
2.2 A descriptive definition of the measured free-field HRTF for a given angle.
2.3 Polar coordinate system related to head incidence angles
2.4 Head-related transfer functions of four human test participants, frontal incidence
2.5 Audible effects of a single reflection
2.6 Binaural reproduction setups
2.7 Vector-based amplitude panning: 2D display of sound sources positions and weights.
2.8 Diagram representing the placement of speakers in the VBAP technique
2.9 Spherical Harmonics Y_n^m(θ, φ).
2.10 B-format components: omnidirectional pressure component W, and the three velocity
     components X, Y, Z.
2.11 Illustration of Huygens' Principle of a propagating wavefront.
2.12 Normalized Room Impulse Response: example from a real room in the time domain
     (left), and in the time domain in dB (right).
2.13 LoRa implementation processing diagram
3.1 Hearing Sciences - Scottish Section Test Room.
3.2 Eriksholm Test Room.
3.3 Reverberation time in third of octave
3.4 HATS and KEMAR inside test room in Glasgow
3.5 HATS and KEMAR inside Eriksholm's Anechoic Room
3.6 Description of experiment's measured positions and mannequin placement
3.7 Interaural cross correlation - Frontal angle
3.8 Polar representation IACC
3.9 Interaural Time Difference by angle: VBAP accompanied
3.10 Ambisonics - directivity representation in 2D
3.11 Interaural Time Difference by angle: Ambisonics accompanied
3.12 Interaural Level Differences: VBAP and Ambisonics
3.13 Interaural Level Differences: averaged octave bands as a function of azimuth angle
     for a HATS Brüel and Kjær TYPE 4128-C in the horizontal plane.
3.14 Interaural Level Differences with additional listener (VBAP and Ambisonics)
3.15 Discrepancies in Interaural Level Differences (VBAP)
3.16 VBAP Interaural Level Differences as function of azimuth angle around the centered
     listener.
3.17 Discrepancies in Interaural Level Differences: Ambisonics
3.18 Ambisonics Interaural Level Differences as function of azimuth angle around the
     centered listener.
3.19 VBAP Off-center ITD HATS 25 cm
3.20 VBAP Off-center ITD HATS 50 cm
3.21 VBAP Off-center ITD HATS 75 cm
3.22 VBAP Off-center ITD displaced HATS
3.23 VBAP ITD considering real sound sources only
3.24 Ambisonics ITD as a function of source angle
3.25 ILD (VBAP and Ambisonics) in off-center setups
3.26 ILD centered setup and off-center VBAP setups
3.27 Differences in the ILD between centered and off-center (25 cm) in VBAP setups
3.28 Differences in the ILD between centered and off-center (50 cm) in VBAP setups
3.29 Differences in the ILD between centered and off-center (75 cm) in VBAP setups
3.30 ILD centered setup and off-center Ambisonics setups
4.1 Auralization procedure implemented to create mixed audible HINT sentences with 4
    spatially separated talkers at the sides and back (maskers) and one target in front.
4.2 Spatial setup of the experiment
4.3 Eriksholm Anechoic Room: Reverberation Time
4.4 Eriksholm Anechoic Room: Background noise
4.5 Experiment setup placed inside anechoic room.
4.6 Overall reverberation time (RT) as a function of receptor (head) position in the
    mid-sagittal plane re center (0 cm)
4.7 Sound pressure level Ambisonics virtualized setup
4.8 Participant positioned for the test.
4.9 Experiment's trial design
4.10 Graphic User Interface
4.11 Performance accuracy (word-scoring)
4.12 Self-reported subjective intelligibility
4.13 Self-reported subjective effort
4.14 Self-reported subjective disengagement
5.1 Top view. Loudspeakers position on
horizontal plane to virtu- alization with prop osed Iceb erg metho d. . . . . . . . . . . . . .
123 5.2 Normalized Ambisonics first-order RIR generated via ODEON soft w are. Left panel depicts
the w a v eform; right panel depicts the wa veform in dB. . . . . . . . . . . . . . . . . . . .
. . . . . 125 5.3 Reflectogram split into Direct Sound Early and Late Reflections. 126 5.4 Iceb
erg’s pro cessing Blo c k diagram. The Ambisonics RIR is treated, split, and con v olv ed to an
input signal. A virtual audi- tory scene can b e created by playing the multi-c hannel output
signal with the appropriate setup.. . . . . . . . . . . . . . . . . 129 5.5 Omnidirectional c
hannel of Ambisonics RIR for a sim ulated ro om. 129 5.6 RIR Ambisonics segments . . . . . . . .
. . . . . . . . . . . . . 130 5.7 Example of signal auralized with the Iceb erg metho d . . . .
. . 132 5.8 Loudsp eak er frequency resp onse comparison . . . . . . . . . . . 135 5.9 Loudsp
eak ers normalized frequency resp onse . . . . . . . . . . . 135 5.10 Loudsp eak ers normalized
frequency resp onse filtered . . . . . . . 136 5.11 BRIR/RIR acquisition flow chart: Iceb erg
auralization metho d. . 141 5.12 BRIR measurement setup: B&K HA TS and KEMAR p ositioned
inside the anechoic ro om. . . . . . . . . . . . . . . . . . . . . . . 141 5.13 Measuremen t p
ositions (grid) . . . . . . . . . . . . . . . . . . . 142 5.14 R T within Iceb erg virtualized
en vironmen t . . . . . . . . . . . . 144 5.15 Iceb erg Cen tered Interaural Time Difference . .
. . . . . . . . . 145 LIST OF FIGURES xix 5.16 Iceb erg Centered In teraural Time Difference R Ts
= 0, 0.5 and 1.1 s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 5.17
Iceb erg cen tered ILD . . . . . . . . . . . . . . . . . . . . . . . . 147 5.18 Iceb erg and
Real loudsp eak ers ILDs as a function of azimuth angle . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . 148 5.19 Iceb erg Heatmap Absolute ∆ ILD . . . . . . . . . . . . . . .
. 148 5.20 Iceb erg: Heatmap Absolute ∆ ILD min us JND . . . . . . . . . . 149 5.21 Iceb erg
metho d: Estimated azimuth angle . . . . . . . . . . . . 150 5.22 Iceb erg ITD F rontal
displacement . . . . . . . . . . . . . . . . . 152 5.23 Iceb erg ITD F rontal and lateral
displacement . . . . . . . . . . 152 5.24 Delta Interaural Level Differences R T = 0.0 s . . . .
. . . . . . 153 5.25 Delta Interaural Level Differences R T = 0.5 s . . . . . . . . . . 154 5.26
Delta Interaural Level Differences R T = 1.1 s . . . . . . . . . . 155 5.27 Estimated (mo del by
Ma y and Kohlrausc h [ 182 ]) frontal az- im uth angle at different p ositions inside the loudsp
eak er ring as function of the target angle. . . . . . . . . . . . . . . . . . . . . 156 5.28
ITD with second listener present . . . . . . . . . . . . . . . . . 157 5.29 Delta Interaural
Level Differences Cen tered+Second Listener . . 158 5.30 Estimated lo calization error with
presence of a second listener . 160 5.31 Difference to target in estimated lo calization with
presence of a second listener . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 5.32
Estimated error to R T=0 considering the estimation of real loudsp eak ers as basis. . . . . . .
. . . . . . . . . . . . . . . . . . 162 5.33 HA TS w earing a Oticon Hearing Device (righ t ear)
. . . . . . . 164 5.34 In teraural Time Difference Iceb erg metho d (aided) . . . . . . . . 165
5.35 Iceb erg metho d ILD (aided condition) . . . . . . . . . . . . . . 166 5.36 Azim uth angle
estimation (aided condition) . . . . . . . . . . . 167 5.37 Absolute difference in estimated
azimuth angle (aided condition) 168 5.38 T ukey test to compare means aided condition. . . . . .
. . . . . 169 5.39 Iceb erg metho d off center ITD (aided condition) . . . . . . . . . 170 5.40
Delta Interaural Level Differences Aided R T = 0.0 s . . . . . . . 172 5.41 Delta Interaural
Level Differences Aided R T = 0.5 s . . . . . . . 172 5.42 Delta Interaural Level Differences
Aided R T = 1.1 s . . . . . . . 173 5.43 Estimated fron tal azim uth angle on different p
ositions (aided condition) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 A.1
ITD as a function of source angle Ambisonics . . . . . . . . . . 225 B.1 Differences in the ILD
Am bisonics, 25 cm . . . . . . . . . . . . . 226 B.2 Differences in the ILD Am bisonics, 50 cm .
. . . . . . . . . . . . 227 B.3 Differences in the ILD Am bisonics, 75 cm . . . . . . . . . . . .
. 227 D.1 Reverberation time (a) Classro om (b) Restaurant . . . . . . . . 231 E.1 Classro om
alpha co efficien ts . . . . . . . . . . . . . . . . . . . . 232 E.2 Restaurant alpha co efficien ts
. . . . . . . . . . . . . . . . . . . . 233 E.3 Anechoic alpha co efficien ts . . . . . . . . . .
. . . . . . . . . . . 233 xx Nomenclature General Sym b ols C 50 Clarit y: the ratio b et ween
the first 50 ms of the RIR and from 50 ms to the end, Eq. (2.12), page 36 . C 80 Clarit y: the
ratio b etw een first 80 ms of the RIR and RIR from 80 ms to the end, Eq. (2.12), page 36 . D 50
Clarit y: ratio b et w een first RIR 50 ms of a RIR and the complete RIR, Eq. (2.14), page 37 . D
80 Clarit y: ratio b etw een first 80 ms of a RIR and the the complete RIR, Eq. (2.14), page 37 .
g Gain matrix, page 25 . h ( t ) Impulse resp onse energy in time domain, Eq. (2.10), page 35 .
h b ( t ) RIR measured with a pressure gradient microphone, Eq. (2.16), page 38 . h L ( t )
Impulse resp onses collected from the left ear, page 39 . h R ( t ) Impulse resp onses collected
from the right ear, page 39 . l 1 V ector from center p oint to c hannel 1, Eq. (2.3), page 25 .
l 2 V ector from center p oint to c hannel 2, Eq. (2.3), page 25 . L 12 Sp eak er P osition
Matrix (Channels), page 25 . m Am bisonics comp onen ts order, Eq. (2.9), page 30 . N n um b er
of necessary sources to Ambisonics repro duction, Eq. (2.9), page 30 . p V ector from center p
oint to virtual font, Eq. (2.3), page 25 . xxi p p-v alue for the t-statistic of the hypothesis
test that the corresp ond- ing co efficien t is equal to zero or not., page 114 . p n L ( t )
Bandpassed left impulse resp onse, Eq. (3.7), page 71 . p n R ( t ) Bandpassed righ t impulse
resp onse, Eq. (3.7), page 71 . p L ( t ) Impulse response at the en trance of the left ear
canal, Eq. (3.5), page 70 . p R ( t ) Impulse resp onse at the en trance of the right ear canal,
Eq. (3.5), page 70 . R T Rev erb eration time, page 34 . R T 60 Rev erb eration time, page 34 .
s ( t ) Arbitrary sound source signal, page 29 . S l ( t ) Time signal recorded with the set
microphone and the loudsp eaker l , page 66 . S E Standard error of the co efficien ts, page 114 .
T 20 Rev erb eration Time ( T 60 ) extrap olated from 25 dB of energy deca y, Eq. (2.10), page
35 . T 30 Rev erb eration Time ( T 60 ) extrap olated from 35 dB of energy deca y, Eq. (2.10),
page 35 . t Time, page 12 . t s Cen ter Time, Eq. (2.15), page 37 . T 60 Rev erb eration Time,
Eq. (2.10), page 34 . v ( t ) 1kHz Sin usoidal 1 k Hz signal recorded from the calibrator in
VFS, Eq. (3.2), page 66 . V V olume of the ro om, Eq. (2.10), page 34 . v l ( t ) Calibrator
signal recorded in the left ear, Eq. (3.1), page 65 . v r ( t ) Calibrator signal recorded in
the right ear, Eq. (3.1), page 65 . Greek Sym b ols α l, rms Calibration factor for the left
ear, Eq. (3.1), page 65 . xxii α r, rms Calibration factor for the right ear, Eq. (3.1), page 65
. ¯ α Av eraged Absorption Co efficient, Eq. (2.10), page 34 . Γ l Lev el factor to the loudsp
eaker l , Eq. (3.3), page 66 . ω Angular frequency, page 12 . ϕ Elev ation angle related to the
ears axis of the listener, page 29 . θ Azimuthal angle related to the ears axis of the listener,
page 29 . Mathematical Op erators and Con v en tions β Fixed-effects regression co efficient, page
114 . e Exp onen tial function, where e (1) ≈ 2 , 7182, page 12 . R In tegral, page 12 . j √ −
1, imaginary op erator, page 12 . τ Time delay, Eq. (2.18), page 39 . t x t-statistic for each
co efficien t to test the n ull hypothesis, page 114 . Y m n ( θ , ϕ ) Spherical harmonics
function of order n and degree m , Eq. (2.7), page 29 . L eq Equiv alen t contin uous sound lev
el, page 103 . max() F unction that returns the elemen t with the maximum v alue for a sequence
of num b ers, or for a vector, Eq. (2.18), page 39 . RMS() Ro ot mean square, Eq. (3.2), page 66
. Acron yms and Abbreviations 2D Tw o-dimensions in space, page 24 . 3D Three-dimensions in
space, page 24 . vs. F rom Latin V ersus is the past participle of vertar e . which means
“against” and “as opp osed or compared to., page 81 . AD/D A Analog-to-Digital Digital-to-Analog
conv erter, page 59 . AR Augmen ted reality, page 27 . xxiii ASW Apparen t Source Width, page 38
. BRIR Binaural ro om impulse resp onse, page 15 . CTC Cross-talk cancellation, page 23 . dB HL
Hearing Loss in decib els, page 102 . DBAP Distance-Based Amplitude Panning, page 47 . DS Direct
sound, page 15 . EcoEG Combination study num b er 3: Eco (Reverberation/Ecological) and EEG,
page 97 . EEG Electro encephalogram, page 50 . FFRs Brainstem frequency resp onses, page 50 .
FFT F ast F ourier transform, page 70 . FIR Finite Impulse Resp onse, page 71 . HA TS Head and
torso sim ulator, page 60 . HC B& K 4128 HA TS at cen ter p osition, page 79 . HC K+ X
B& K 4128 HA TS at cen ter p osition and KEMAR at X cm to the left, page 80 . HC K- X
B& K 4128 HA TS at center p osition and KEMAR at X cm to the righ t, page 79 . HEAR- ECO
Inno v ativ e Hearing Aid Researc h – Ecological Conditions and Out- come Measures, page 97 .
HINT Hearing in Noise T est, page 103 . HO A Higher Order Ambisonics, page 31 . HR TF
Head-Related T ransfer F unction, page 12 . IA CC Interaural Cross-Correlation Co efficien t, page
39 . IA CF In teraural cross-correlation function, Eq. (3.5), page 70 . ILD In teraural Level
Difference, page 10 . IPD In teraural Phase Difference, page 10 . xxiv ITD In teraural Time
Difference, page 10 . ITF Interaural T ransfer F unction, page 14 . JND Just noticeable
difference, page 93 . KEMAR Kno wles Electronics Manikin for Acoustic Research, page 60 . LEF
Lateral Energy F raction, Eq. (2.16), page 38 . LEV Listener Env elopment , page 39 . LG Lateral
Strength, Eq. (2.17), page 39 . LMM Linear Mixed-effect Mo del, page 113 . LPF Lo w-pass filter,
page 73 . L TI Linear and Time-Inv arian t System, page 33 . MD AP Multiple-Direction Amplitude
Panning, page 47 . MO A Mixed Order Ambisonics, page 52 . MTF Monaural T ransfer F unction, page
13 . NSP Nearest Sp eak er, page 52 . PLE P erceptual Lo calization Error, page 41 . PT A4 F our
bands pure tone audiometry, page 102 . RIRs Ro om impulse resp onse, page 15 . SH Spherical
Harmonics, page 28 . SPL Sound pressure lev el, page 35 . SR T Sp eec h Reception Threshold,
page 52 . VBAP V ector-Based Amplitude P anning, page 23 . VBIP V ector-Based In tensit y
Panning, page 41 . VFS V olts full scale, page 66 . VSE Virtual Sound Environmen t, page 18 . W
Omnidirectional channel, Eq. (2.9), page 29 . WFS W av e Field Syn thesis, page 31 . xxv LIST OF
FIGURES xxvi X Bi-directional pattern c hannel to wards the source, Eq. (2.9), page 29 . Y
Bi-directional pattern channel perp endicular to the source in az- im uth, Eq. (2.9), page 29 .
Z Bi-directional pattern channel p erp endicular to the source in elev a- tion, Eq. (2.9), page
29 . Chapter 1 In tro duction Individuals with normal hearing often can effortlessly comprehend
complex listening scenarios involving multiple sound sources, background noise, and echoes [226]. However, those with hearing loss may find these situations particularly challenging [273, 289, 304, 317]. These environments are commonly encountered in daily life, particularly during social events, and they can negatively impact the communication abilities of individuals with hearing loss [137, 260]. The difficulties associated with understanding complex listening scenarios can be a significant barrier for individuals with hearing loss, leading to reduced participation in social activities [16, 63, 119].

1.1 Motivations

Several hearing research laboratories worldwide are developing systems that realistically simulate challenging scenarios through virtualization, to better understand and help with these everyday challenges in people's lives [41, 79, 102, 116, 118, 160, 161, 188, 195, 218–220, 259, 272, 298]. The virtualization of sound sources is a powerful tool for auditory research, capable of achieving a high level of detail, but current methods use expensive, expansive technology [293]. In this work, a new auralization method has been developed to achieve sound spatialization with a reduced hardware requirement, making virtualization at the clinic level possible.

1.2 Aims and Scope

The main objective of this research was to investigate various parameters of sound virtualization methods related to their localization accuracy, with a focus on perceptually based methods [39], in their optimal but also in challenging conditions. Furthermore, a new auralization method oriented to a smaller setup is proposed to reduce hardware requirements. The specific objectives were:

• To investigate spatial distortions through binaural cue differences in two well-known virtualization setups: Vector-Based Amplitude Panning (VBAP) and Ambisonics.
• To investigate the influence of a second listener inside the sound field (VBAP and Ambisonics).
• To evaluate the feasibility of a speech-in-noise test within Ambisonics virtualized reverberant rooms.
• To study the relation between reverberation, signal-to-noise ratio (SNR), and listening effort in environments virtualized in first-order Ambisonics.
• To propose an auralization method using four loudspeakers, measure it, and analyze its binaural cues, objective level, and reverberation time against existing methods.
• To test and analyze the influence of acquiring signals with hearing aid microphones in virtualized scenes using the new auralization method with a four-loudspeaker setup.

1.3 Contributions

The main contribution of this research to
the scientific field of auditory perception is the development of a new auralization method that addresses the current gap in the virtualization of sound sources using a small number of loudspeakers. Specifically, this method aims to achieve both good localization accuracy and a high level of immersion simultaneously, which has been a challenge in previous approaches. Furthermore, the proposed method combines existing techniques and can be implemented using readily available hardware, requiring a minimum of four loudspeakers. This makes it more accessible for audiologists and researchers to create realistic listening scenarios for patients and participants while reducing the technical resources required for implementation. Overall, this work represents a valuable contribution to the field of auditory perception and has the potential to advance the understanding of spatial hearing and the development of effective hearing solutions.

1.4 Organization of the Thesis

In Chapter 2, a review examines previous work carried out in several different areas concerning virtualization and the auralization of sound sources. The chapter starts with an overview of the basic concepts of human sound perception. Next, virtual acoustics are explored, reviewing the generation of virtual acoustic environments using different rendering paradigms and methods. In addition, relevant room acoustics concepts and objective parameters, and their relation to hearing perception, are described. Finally, the review considers auralization and virtualization as applied to auditory research. This review stresses the importance of virtual sound sources for greater realism and ecological validity in auditory research, and the challenges of adequately creating a virtual environment focused on auditory research.

Chapter 3 presents an investigation of binaural cue distortions in imperfect setups. First, the methods are described, including the complete auralization of signals using two different methods and the calibration of the system. The investigation first compares both auralization methods through the same calibrated virtualization setup in terms of spatial distortions. Then the spatial cues are examined with the addition of a second listener to the virtualized sound field. Both investigations are performed with the primary listener on and off-center.

In Chapter 4, a behavioral study examines subjective effort within virtualized sound scenarios. As the study was part of a collaborative project, only one auralization method was selected: first-order Ambisonics. The aim was to examine how SNR and reverberation combine to affect effort in a speech-in-noise task. The feasibility of using first-order Ambisonics was also examined; however, the sound sources were well separated in space, and localization accuracy was not a factor. An important aspect of the study was an auralization issue involving head movement, observed during pilot data collection. This issue led to a solution that allowed the study to continue. The results verified the relationships between subjective effort and acoustic demand. Furthermore, this issue led to the further investigation of the effect of off-center listening, considered in both Chapter 3 and Chapter 5.

In Chapter 5, a hybrid method of auralization is proposed, combining the methods examined and used in previous chapters: VBAP and Ambisonics. This method was designed to allow auralized signals to be virtualized in a small reproduction system, thus providing better accessibility to research within the virtualized sound field in clinics and research centers that do not have a sizeable acoustic apparatus. The hybrid auralization method aims to unite the strengths of both techniques: localization by VBAP and immersion by Ambisonics. Both of these psychoacoustic strengths are related to the room's impulse response. The hybrid method convolves the desired signal with distinct parts of an Ambisonics-format impulse response that characterizes the desired environment. The potential for generating auralizations for a reproduction system with at least four loudspeakers is demonstrated. The virtualization system was tested with three different scenarios. Parameters relevant to the perception of a scene, such as reverberation time, sound pressure level, and binaural cues, were evaluated at different positions within the speaker arrangement. The effects of a second participant inside the ring were also investigated. The evaluated parameters were as expected with the listener in the system's center (sweet spot). However, deviations and issues at specific presentation angles were identified that could be improved in future implementations. Such errors also need to be further investigated as to their influence on the subjective perception of the scenario, which was not performed due to the COVID-19 pandemic. An alternative robustness assessment was performed offline, examining the localization accuracy with a model proposed by May et al. [182]. The method also proved effective for tests with hearing aids for listeners positioned in the center of the speaker arrangement. However, the method's performance with hearing instruments using compression algorithms and advanced signal processing still needs to be verified.

Chapter 6 presents a general discussion of the feasibility of applying tests using the proposed method and an overview of the processes. In addition, the relevant contributions of the work are presented, as are the limitations and suggestions for further improvements.

Chapter 2

Literature Review

2.1 Introduction

The field of
audiology is concerned with the study of hearing and hearing disorders, as well as the assessment and rehabilitation of individuals with hearing loss [110]. In this review chapter, we will explore various topics related to human binaural hearing, spatial sound, and virtual acoustics to provide a comprehensive overview of the current state of knowledge in these fields and highlight their important contributions to our understanding of hearing and auditory perception. First, we will delve into the intricacies of human binaural hearing. Next, we will examine the concepts of spatial hearing, including the various binaural and monaural cues that contribute to our ability to localize sound in space. We will also explore the head-related transfer function, which describes the way sounds are filtered as they travel from their source to the eardrum, as well as the subjective aspects of audible reflections. Next, we will turn our attention to spatial sound and virtual acoustics. We will discuss the virtualization of sound, including the various methods used to achieve it, such as auralization and virtual sound reproduction. We will also examine the different auralization paradigms used in auditory research, including binaural, panorama, vector-based amplitude panning, Ambisonics, and sound field synthesis. We will then examine the role of room acoustics in virtualization and auditory research, including the various parameters used to describe room acoustics, such as reverberation time, clarity and definition, center time, and parameters related to spatiality. Finally, we will explore the use of loudspeaker-based virtualization in auditory research, including hybrid methods and sound source localization, as well as the assessment of listening effort.
2.2 Human Binaural Hearing

The engineering side of the listening process can be modeled, in simplified form, through two input blocks separated in space [92]. These inputs are limited in frequency and level, and are followed by a signal processing chain that relates the medium transformations of the wave propagation from air to fluid and to electrical pulses [315]. Although this block modeling can be reasonably accurate for educational purposes, it falls short of capturing the true effect and importance of listening on our essence as human beings. The ability to feel and interpret the world through the sense of hearing, and to attribute meaning to sound events, enables humans to enrich their tangible world [56, 244]. For instance, a characteristic sound can evoke memories or trigger an alert [128]. A piece of music can bring tears to one's eyes or persuade someone to purchase more cereal [13, 114]. A person's voice can activate certain facial nerves, turning hidden teeth into a smile. These are some of the reasons why researchers and clinicians dedicate their lives to understanding the transformation of sound events into auditory events, with a scientific dedication focused on creating solutions and opening opportunities for more people to experience the sound they love and deserve: a dedication focused on people and their needs.

As the auditory system comprises two sensors, normal-hearing listeners can experience the benefits of comparing sounds autonomously, relating them to the space around them [21]. This constant signal comparison is the main principle of binaural hearing, where the differences between these sounds allow for the identification of the direction of a sound event, as well as the sensation of sound spatiality [9, 40]. Usually, these signals are assumed to be part of a linear and time-invariant system, which helps in studying how humans interpret the information present in the different signals across the time and frequency domains. However, this assumption of linearity can fail when analyzing fast sound sources, reflective surfaces, or sound propagating through disturbed air [200, 255]. Nonetheless, the advantages of quantifying and capturing the effect have led to significant progress in the hearing sciences.

2.2.1 Spatial Hearing Concepts

Identifying the direction of incidence of a sound source based on the audible waves received by the listener is defined as an act or process of human sound localization [285]. For research in acoustics, it is relevant to acknowledge that the receiver is, in general, a human being. The main anatomical characteristic of the human hearing mechanism is the binaural system: there are two signal reception points (external ears positioned on opposite sides of the head), although the whole set (torso, head, hearing pavilions) can also modify, to some extent, the signal that reaches the two tympanic membranes [153, 216]. Human binaural hearing and its associated effects have been extensively reported by Blauert [38].

In addition to analyzing sound sources' spatial location, the central auditory system extracts real-time information from the sound signals related to the acoustic environment, such as its geometry and physical properties [153]. Another benefit is the possibility of separating and interpreting combined sounds, especially from sources in different directions [170, 242].
2.2.2 Binaural cues

The speed of sound propagation in air can be assumed to be finite and approximately constant, since air is an approximately non-dispersive medium [18]. Thus, when the incidence is not directly frontal or rear, the wavefront travels through different paths to the ears, reaching them at different times. The time interval a sound takes to arrive at both ears is commonly expressed in the literature as the Interaural Time Difference (ITD) [39]. It is a crucial cue for sound source localization of low-frequency sounds [39, 153, 242]; moreover, it is considered the primary localization cue [306]. For continuous pure tones and other periodic signals, the ITD can be expressed as the Interaural Phase Difference (IPD) [285]. On the other hand, most mammals' high-frequency sound source localization is based on a comparative analysis of the sound energy in each ear's frequency bands, the Interaural Level Difference (ILD). The so-called duplex theory surmises that ITD cues are the basis of low-frequency sound localization and ILD cues of high-frequency localization; its authorship is assigned to Lord Rayleigh at the beginning of the last century [246].
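As a concrete illustration of these two cues, the sketch below estimates a broadband ITD as the lag that maximizes the interaural cross-correlation of the two ear signals, and a broadband ILD as the RMS level ratio between the ears. This is a minimal sketch, not the calibrated procedure used later in this work; the sampling rate, the synthetic test signal, and the function names are assumptions made for the example.

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the ITD (s) as the lag that maximizes the
    interaural cross-correlation of the two ear signals."""
    corr = np.correlate(left, right, mode="full")
    lags = np.arange(-(len(right) - 1), len(left))
    return lags[np.argmax(corr)] / fs

def estimate_ild(left, right):
    """Estimate the broadband ILD (dB) as the RMS level ratio
    between the left- and right-ear signals."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    return 20 * np.log10(rms(left) / rms(right))

# Synthetic example: noise reaching the left ear 0.5 ms earlier
# and 6 dB stronger than the right ear (a source on the left).
fs = 48000
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs // 10)
delay = int(0.0005 * fs)                  # 0.5 ms in samples
left = np.concatenate([noise, np.zeros(delay)])
right = np.concatenate([np.zeros(delay), noise]) * 10 ** (-6 / 20)

print(estimate_itd(left, right, fs))      # ≈ -0.0005 s: left ear leads
print(estimate_ild(left, right))          # ≈ 6 dB: left ear stronger
```

With the sign convention above, a negative lag means the left-ear signal leads, consistent with a leftward source.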
These binaural cues are related to the azimuthal position; however, they are not equally successful at explaining localization of elevated positions [37, 250]. An ambiguity in binaural cues, caused by head symmetry and referred to as the cone of confusion [296], can hinder correct sound source localization. The cone of confusion is the imaginary cone extended sideways from each ear on which sound source locations create the same interaural differences (see Figure 2.1).

Figure 2.1: Two-dimensional representation of the cone of confusion.

Head movements are essential for resolving the ambiguous cues from sound sources located on the cone of confusion. As a person moves their head, they change the reference and the incidence angle, helping them resolve the duality. This change is reflected in the cues associated with the directional sound filtering caused by the human body's reflection, absorption, and diffraction.

2.2.3 Monaural cues

Monaural cues are related to spatial impression, especially in the localization of elevated sound sources. These cues give, to some extent, limited but crucial localization abilities to people with unilateral hearing loss [72, 307]. This type of cue is centered on instantaneous level comparison and frequency changes. As the level of a sufficiently continuous sound source changes, the approach or retreat of that source can be estimated. Furthermore, when head movements shape the frequency content, the disturbance, mainly provided by the pinnae, can help the listener learn the position of a sound source [129, 292]. In addition, the importance of prior knowledge of the sound for the deconvolution process has also been investigated, revealing mixed results [307].
2.2.4 Head-related transfer function The Head-Related T ransfer F unction (HR TF) describ es the
directional filtering of incoming sound due to h uman bo dy parts suc h as the head and pinnae [
189 ]. The free-field HR TF can b e expressed as the division of the impulse resp onses in the
frequency domain measured at the en trance to the ear canal and the cen ter of the head but with
the head absen t [ 108 ] (see Figure 2.2 ). HR TFs dep end on the direction of incidence of the
sound and are generally measured for some discrete incidence directions. Mathematical mo dels
can also generate individualized HR TFs based on anthropometric measures [ 52 ] or through
geometric generalization [ 70 ]. Figure 2.2: A descriptive definition of the me asur e d fr e
e-field HR TF for a given angle. The referential system related to the head can b e seen in
Figure 2.3 , where β is the elev ation angle in the midplane, and ϕ is the angle defined in the
horizon tal plane. Chapter 2. Human Binaural Hearing 13 Figure 2.3: Polar c o or dinate system r
elate d to he ad incidenc e angles, adapte d fr om Portela [ 240 ]. Supp ose the distance to the
sound source exceeds 3 meters. In that case, it can b e considered approximately a plane w a ve,
thus making the previous HR TFs almost indep enden t of the distance to the sound source [ 38 ].
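Numerically, the free-field HRTF definition above amounts to an element-wise division of two spectra. A minimal NumPy sketch, using synthetic impulse responses as stand-ins for real measurements:

```python
import numpy as np

fs = 48_000                      # sampling rate (Hz), illustrative
n = 512                          # impulse response length in samples

# Synthetic stand-ins for measured impulse responses:
# h_ref: center of the head position, head absent (here: a pure delay)
# h_ear: entrance of the ear canal (here: delay plus a simple reflection)
h_ref = np.zeros(n); h_ref[40] = 1.0
h_ear = np.zeros(n); h_ear[40] = 0.9; h_ear[55] = 0.3

# Free-field HRTF: ratio of the two spectra (Section 2.2.4 definition)
H = np.fft.rfft(h_ear) / np.fft.rfft(h_ref)

freqs = np.fft.rfftfreq(n, d=1 / fs)
magnitude_db = 20 * np.log10(np.abs(H))
print(freqs.shape, magnitude_db.shape)   # one magnitude value per frequency bin
```

With real data, the reference measurement must have sufficient energy at all frequencies of interest, otherwise the division amplifies measurement noise; the pure-delay reference used here sidesteps that issue by construction.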
Blauert [39] also explains two other types of HRTF, namely:

• Monaural Transfer Function (MTF): relates the sound pressure, at a measurement point in the ear canal, from a sound source at any position to the sound pressure measured at the same point with a sound source at a reference position (ϕ = 0 and β = 0). The MTF is given by

MTF = (p_i / p_1)|_{r, ϕ, β, f} / (p_i / p_1)|_{ϕ = 0°, β = 0°, f},   (2.1)

where p_i can be p_1, p_2, p_3, or p_4:
– p_1: sound pressure at the center of the head position with the listener absent;
– p_2: sound pressure at the entrance of the occluded ear canal;
– p_3: sound pressure at the entrance of the open ear canal;
– p_4: eardrum sound pressure.

• Interaural Transfer Function (ITF): relates the sound pressures at corresponding measurement points in the two ear canals. The reference pressure is taken at the ear facing the sound source. The ITF can be obtained through

ITF = p_i (side opposite the source) / p_i (side facing the source).   (2.2)

Larger variations are seen above 200 Hz in HRTFs [293] because the head, torso, and shoulders begin to interfere significantly at frequencies up to approximately 1.5 kHz (mid frequencies). In addition, the pinna and the cavum conchae (the space inside the most inferior part of the helix; it forms the vestibule that leads into the external acoustic meatus [270]) distort frequencies greater than 2 kHz. HRTF measurements vary from person to person, as seen in Figure 2.4, where TS 1, TS 2, TS 3, and TS 4 represent the HRTFs of different people. When recordings use mannequins or other people's ear canals (non-individualized HRTFs), the reproduction precision in terms of spatial location and realism tends to be diminished [51, 178]. This poorer precision arises because the transfer function differs for each individual, especially at high frequencies [155]. This dependence is related to the wavelength and the singular irregularity of each human being's ear canal [38].

Figure 2.4: Head-related transfer functions of four human test participants, frontal incidence, from Vorländer [293].
Binaural Impulse Response

A Binaural Room Impulse Response (BRIR) results from measuring the response of a room to excitation by an (ideally) impulsive sound [183]. BRIRs are composed of a sequence of sounds. Parameters such as the magnitude, the decay rate, the phase, and the time distribution are the key to understanding how a BRIR can audibly characterize a room to human perception [167]. Although air contains a small portion of CO2, which is dispersive, the sound propagation velocity can be considered homogeneous in air (a non-dispersive medium) [312] for Room Impulse Responses (RIRs). The first sound from a source that reaches a receiver inside the room travels the shortest distance and is called the direct sound (DS). The following sounds usually result from reflections that travel a longer path, losing energy at each interaction and resulting in an exponential decay of magnitude. The BRIR is intended to collect the room information like a regular impulse response, but with two sensors separated like the ears of a typical human head. Nowadays, a BRIR can be recorded with small microphones placed in the ear canals of a person or with microphones placed in mannequins [197]. A BRIR is the auditory time representation of a source–receiver pair defined by its position, orientation, and acoustic properties such as the directionality of the sound source, as well as by the physical elements within the environment [38, 108]. The convolution of a BRIR with audio signals is a feasible task for modern computation, which allows the creation and manipulation of sounds even in real-time applications [62, 217]. Thus, it is possible to impose the spatial and reverberant characteristics of different spaces on a given sound [109].

2.2.5 Subjective aspects of an audible reflection

The impulse response is composed of the direct sound followed by a series of reflections (early and late reflections) [45, 165]. Essential knowledge of how the human auditory system processes the spectral and spatial information contained in the impulse response has been obtained through studies with simulated acoustic fields [6, 17, 93, 125, 141, 174, 176, 188, 193, 257, 305, 321]. The results of Barron's experiments, depicted in Figure 2.5, involved the reproduction of both a direct sound and a lateral reflection. These two auditory stimuli were manipulated in terms of their time delay and relative amplitude, with the goal of eliciting subjective impressions correlated with these factors. By varying the time between the direct sound and the reflection, as well as the relative amplitude of these stimuli, it was possible to better understand how these characteristics impact the overall auditory experience.

Figure 2.5: Audible effects of a single reflection arriving from the side (adapted from Rossing [254]).

The audibility threshold curve indicates that the reflection will be inaudible if the delay or the relative level is minimal. The reflection's subjective effect also depends on the direction of incidence of the sound source in the horizontal and vertical planes. It is possible to note that for delays of up to 10 milliseconds, the relative difference in level must be at least −20 dB for the reflection to be noticeable. The echo effect is typically observed at delays of more than 50 milliseconds, being an acoustic repetition with a high relative level, approximately the same energy as the direct sound. The coloring effect is associated with the significant change in the spectrum caused by the constructive and destructive interference of superposed sound waves. The image change happens when there are reflections with relative levels higher than the direct sound or with minimal delays. In this case, the subjective perception is that the sound source is at a different position in space than the visual system perceives.

2.3 Spatial Sound & Virtual Acoustics

The sound perceived by humans is identified and classified based on physical properties, such as intensity and frequency [242]. Human beings are equipped with two ears (two highly efficient sound sensors), enabling a real-time comparison of these properties between the captured sound signals [9]. The sounds and the dynamic interaction between sound sources, their positions and movements, and the physical interaction of the generated sound waves with the environment can be perceived by normal-hearing people, providing what is called spatial awareness [153]. That auditory spatial awareness includes the localization of the sound source, the estimation of distance, and the estimation of the size of the surrounding space [38, 305]. A person with hearing loss may lose this ability partially or entirely; spatial awareness is also tied to the listener's experience with the sound and the environment, motivation, and fatigue level [54, 304].
In the field of virtual acoustics, the ultimate goal is to generate a sound event that elicits a desired auditory sensation, creating a Virtual Sound Environment (VSE) [293]. In order to achieve this, it is necessary to synthesize or record the acoustic properties of the target scene and subsequently reproduce them in a manner that accurately reflects the original acoustic conditions [97]. This involves careful consideration of the various factors that contribute to the overall auditory experience, including the spectral and spatial characteristics of the sound. By accurately recreating these properties, it is possible to create a highly immersive and realistic VSE that effectively conveys the intended auditory experience to the listener [196, 213, 293].

2.3.1 Virtualization

Nowadays, it is possible to create audio files containing information about sound properties related to a specific space [293]. For example, it is possible to encode information about the source and receiver positions, the transmission path, reflections on surfaces, and the amount of energy absorbed and scattered (e.g., Odeon [59], a commercially available acoustical software). The sound field properties can be simulated, synthesized, or recorded in situ [113, 293]. These signals can be encoded and reproduced correctly in various reproduction systems [122, 161]. The creation of reproducible files containing such information is called auralization.
As different interpretations of the terms occur in the literature, in this thesis the virtualization process is considered to encompass both the auralization and the reproduction of a sound (recorded, simulated, or synthesized) that includes spatial properties.

2.3.1.1 Auralization

Auralization is a relatively recent procedure. The first studies were conducted in 1929, when Spandöck and colleagues tried to process signals measured in a scale-model room. After that, in 1934, Spandöck [280] succeeded in the first auralization, in the analog way, using ultrasonic signals of scale models recorded on magnetic tapes. In 1962, Schroeder [263] incorporated computer processing into auralization. In 1968, Krokstad [146] developed the first acoustic room simulation software. The term auralization was introduced in the literature by Kleiner in 1993:

“Auralization is the process of rendering audible, by physical or mathematical modeling, the sound field of a source in a space, in such a way as to simulate the binaural listening experience at a given position in the modeled space.” (Kleiner [138])

In his book titled Auralization, published in 2008, Vorländer defined:

“Auralization is the technique of creating audible sound files from numerical (simulated, measured, or synthesized) data.” (Vorländer [293])

In this work, auralization is understood as a technique to create files that can be rendered as perceivable sounds. An auralization method describes the technique; it can involve one or more auralization techniques. These sounds can then be virtualized (reproduced) via loudspeakers or headphones and provide audible information about a specific acoustical scene in a defined space, following Vorländer's definition. That definition was chosen to encourage the separation of the processes, as an auralized sound file can contain information that allows it to be decoded in different reproduction systems [320]. Auralization is consolidated in architectural acoustics [45, 148, 165, 254], and it is also emerging in environmental acoustics [19, 68, 69, 139, 162, 231, 232]. This technique allows a piece of audible information to be easily accessed and understood. It is also an integral part of the entertainment industry in games, movies, and virtual or mixed reality [320]. Knowing an environment's acoustic properties allows one to manipulate it or to add synthesized or recorded elements, leading the receiver to the desired auditory impression, including the sound's spatial distribution [62]. This process is also used in hearing research, allowing researchers to introduce more ecologically valid sound scenarios into their studies (see Section 2.3.4). Sound spatiality, or the perception of sound waves arriving from various directions and the ability to locate them in space, is a crucial aspect of the auditory experience [40].
Auralization, which is analogous to visualization, involves the representation of sound fields and sources, the simulation of sound propagation, and the strategy to decode them in the spatial reproduction setup [293]. That is typically achieved through three-dimensional computer models and digital signal processing techniques, which are applied to generate auralizations that can be reproduced via acoustic transducers [293]. The modeling paradigm used to create the spatial sensation can be perceptually or physically based [39, 106, 164, 276]. Multiple dimensions influence sound perception; the type of sound generation, the wind direction, the temperature, the movement of source and receiver, the space (size, shape, and content), the receiver's spatial sensitivity, and the source directivity are some examples. That implies the importance of physical effects such as Doppler shifts [96, 284, 293]. Furthermore, the review of room acoustics and psychoacoustics elements (see Section 2.3.3) corroborates the understanding of the auralization modeling procedure.

2.3.1.2 Reproduction

Sound signals containing the acoustic characteristics of a space can be reproduced either with binaural techniques (headphones or loudspeakers) or with multiple loudspeakers (multichannel techniques) [293]. Moreover, an acoustic model of a space can be analytically or numerically implemented, with a series of competent algorithms and commercial software and tools available [49]. With that, it is also possible to measure micro and macro acoustic properties of materials in a laboratory or in situ [206] and to access databases of various coefficients and indexes for an extended catalog of materials [50, 71, 158, 266]. On the reproduction end of the virtualization process, factors such as frequency and level calibration, signal processing, and the frequency response of the hardware can significantly impact the accuracy of the final sound (e.g., the orientation/correction of the microphone when calibrating the system [274]). Depending on the chosen paradigm, a lack of attention to these details may disrupt an accurate description of the sound field, sound event, or sound sensation [214, 282, 283, 320]. Additionally, the quality of the stimuli may be compromised depending on the chosen reproduction technique, which is often tied to the available hardware [77, 166, 275, 276]. That can lead to undesired effects on the level of immersion and to problems with the accuracy of sound localization and identification (e.g., source width, source separation, sound pressure level, and coloration and spatial confusion effects [97]). The process of building a VSE is called sound virtualization, which involves both the auralization and reproduction stages to create audible sound from a file. The main technical approaches or paradigms for reproducing auralized sound are Binaural, Panorama, and Sound Field Synthesis (Section 2.3.2). These paradigms can be distinguished by their output, which can be physically or perceptually motivated. For example, while binaural methods are treated apart, they can be intrinsically classified in a physically-motivated paradigm, since their success relies on reproducing the correct physical signal at a specific point in the listener's auditory system, typically the entrance of the ear canal [106].

2.3.2 Auralization Paradigms

2.3.2.1 Binaural

Binaural hearing, which refers to the ability to perceive sound in a three-dimensional auditory space, is a fundamental concept in auditory research and has been extensively studied by researchers such as Blauert [40]. In the context of auralization, the term “binaural” refers to the specific paradigm that aims to reproduce the exact sound pressure of a sound event at the listener's eardrums. That can be achieved through the use of headphones or a pair of loudspeakers (known as transaural reproduction) [314]. However, when using distant loudspeakers, it is necessary to consider the interference that can occur between the sounds coming from each speaker. To mitigate this issue, techniques such as cross-talk cancellation (CTC) [60, 262] can be employed, which involve manipulating a set of filters to cancel out the distortions caused by the sound from one speaker reaching the opposite ear. Another form of binaural reproduction involves the use of closer loudspeakers that are near-field compensated. Binaural methods over headphones are commonly applied. They require no extensive hardware (in simple setups that do not track the listener's head), providing a valid acoustic representation and spatial awareness [293]. A disadvantage of this method can be its dependence on the accuracy of individualized HRTFs (as each human being has their own slightly different anatomic “filter set”) [314].
Over headphones, moreover, the movement of the listener's head can be disruptive to the immersion [179]. It may require tracking the head's movement [11, 115, 252], e.g., when movements are required or allowed in an experiment. Furthermore, a listener wearing a pair of headphones may not represent a realistic situation. For example, an experiment with a virtual auditory environment that represents a regular daily conversation with aged participants may lose the task's ecological validity. Also, headphones usually prevent the listener from wearing hearing devices. Figure 2.6 illustrates the main idea behind different binaural reproduction setups.

Figure 2.6: Binaural reproduction setups: headphones, transaural, and near-field transaural (adapted from Kang and Kim [131]).

2.3.2.2 Panorama

The Panorama paradigm encompasses auralization methods focused on delivering accurate ITDs and ILDs at the listener's position, also known as stereophonic techniques [106, 276]. The most well-known methods are based on amplitude panning [180], including low-order Ambisonics [91] and Vector-Based Amplitude Panning (VBAP) [241]. Higher Order Ambisonics is an extension of the Ambisonics method that is typically considered not a panning method but rather a sound field synthesis method (see Section 2.3.2.3). VBAP employs local panning by rendering sound using pairs or triplets of loudspeakers. In contrast, Ambisonics uses global panning to produce a single virtual source using all available loudspeakers [282].

Vector-Based Amplitude Panning: Vector-Based Amplitude Panning (VBAP) is a first-order approximation of the composition of emitted signals that creates virtual sources [241]. The virtualization process using VBAP is based on amplitude panning in two dimensions (variation in amplitude between the speakers), which is derived from the Law of Sines and the Law of Tangents (see Benesty et al. [23] for a derivation of these laws). The original hypothesis of VBAP assumes that the speakers are arranged symmetrically, equidistant from the listener, and in the same horizontal plane. VBAP does not limit the number of usable speakers but uses a maximum of three simultaneously. The speakers are arranged on a reference circle (2D case) or sphere (3D case), and a limitation of the technique is that virtual sources cannot be created outside this region. VBAP is mainly used for the reproduction of synthetic sounds [180].
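Computationally, the 2-D case boils down to inverting a 2×2 matrix of loudspeaker direction vectors, as formalized next in Eqs. (2.3)-(2.5). A minimal sketch; the ±30° base angles are an illustrative stereo setup, and the constant-power normalization is one common choice, not prescribed by the derivation:

```python
import numpy as np

def vbap_gains_2d(source_az_deg, spk_az_deg=(-30.0, 30.0)):
    """2-D VBAP gains for a loudspeaker pair (after Pulkki's formulation).

    Solves p^T = g L for the gain vector g and rescales it for
    constant power. The +/-30 degree speaker angles are illustrative.
    """
    az = np.radians(source_az_deg)
    p = np.array([np.cos(az), np.sin(az)])            # unit vector to source
    L = np.array([[np.cos(np.radians(a)), np.sin(np.radians(a))]
                  for a in spk_az_deg])               # rows point to speakers
    g = p @ np.linalg.inv(L)                          # cf. Eq. (2.5)
    return g / np.linalg.norm(g)                      # constant-power scaling

print(np.round(vbap_gains_2d(0.0), 3))    # centered source -> equal gains
print(np.round(vbap_gains_2d(30.0), 3))   # source at a speaker -> that speaker only
```

A source outside the arc spanned by the pair would yield a negative gain here, which is exactly the limitation noted above: virtual sources cannot be created outside the region covered by the loudspeakers.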
The formulation of the VBAP method (from Pulkki [241]) for two dimensions starts from the stereophonic configuration of two channels (see Figure 2.7). It is reformulated on a vector base formed by the unit-length vectors l_1 = [l_11 l_12]^T and l_2 = [l_21 l_22]^T, which point to the speakers, and the unit-length vector p = [p_1 p_2]^T, which points to the virtual source and presents itself as a linear combination of the vectors l_1 and l_2. The notation ^T is used here to identify matrix transposition.

Figure 2.7: Vector-based amplitude panning: 2D display of sound source positions and weights.

Consider the vector p:

p = g_1 l_1 + g_2 l_2,   (2.3)

where g_1 and g_2 (scalars) are the gain factors to be calculated for positioning the vector relative to the virtual source. In matrix form,

p^T = g L_12,   (2.4)

where g = [g_1 g_2] and L_12 = [l_1 l_2]^T. The gains can be calculated by

g = p^T L_12^{-1} = [p_1 p_2] [l_11 l_12; l_21 l_22]^{-1}.   (2.5)

The formulation is also expanded to three dimensions:

p = g_1 l_1 + g_2 l_2 + g_3 l_3,   (2.6)

and

p^T = g L_123,   (2.7)

where g_1, g_2, and g_3 are gain factors, g = [g_1 g_2 g_3], and L_123 = [l_1 l_2 l_3]^T. The detailed derivation can be found in [241]. The derivation can use triangles and the three-dimensional system. Figure 2.8 presents an example of the loudspeaker distribution for the virtualization of a virtual source P using VBAP in three dimensions.

Figure 2.8: Diagram representing the placement of speakers in the VBAP technique, adapted from [241].

Some factors contribute to methods based on Amplitude Panorama being widely used in virtual audio applications, such as the low computational cost and the flexibility in the speakers' placement.

Ambisonics: The original Ambisonics auralization method is an amplitude panning method that differs from Vector-Based Amplitude Panning (VBAP) in several ways. While VBAP only uses positive weights to pan sound across speakers, Ambisonics uses a combination of positive and negative weights to create a shift in frequency and amplitude. This results in a more homogeneous sound field, albeit with a broader virtual source. Additionally, Ambisonics has all loudspeakers active for any source position, while VBAP only activates specific speakers based on the desired source position [199]. One of the benefits of Ambisonics is its scalability for reproduction on different loudspeaker arrays and the ability to encode and decode the sound field during the recording and reproduction process [161]. This versatility is possible because Ambisonics signals can be directly recorded using an appropriate microphone array or simulated through numerical acoustic algorithms that model the directional sensitivity of the microphone array [5, 46, 59]. The signal can then be decoded and rendered in real time to different arrays with various numbers of loudspeakers. Hence, an Ambisonics decoder is a tool for converting an Ambisonics representation of a sound field into a multichannel audio format that can be reproduced over a given speaker setup [130, 235, 238]. In order to reproduce an Ambisonics signal, it must first be transformed, or “decoded,” into a format compatible with a specific speaker configuration. Simple decoders consist of a frequency-independent weighting matrix [282]. It is also possible to reproduce the signal via headphones, which can be considered a specific speaker setup, by scaling it down to binaural signals [320]. Additionally, Ambisonics can enhance realism by tracking head movements and correcting the binaural signals using HRTFs as filters [277]. This feature is particularly relevant in the recording and broadcasting industry, especially with emerging technologies such as augmented reality (AR) [320].

According to Schröder [261], decomposition in spherical harmonics (SH) is a recent analysis technique widely used in the modeling of directivity patterns. Analogous to a Fourier transform in the frequency domain, SH decomposes the signal in the spatial domain into spherical functions (in the Fourier transform, the decomposition is into sine or cosine functions) weighted by the coefficients of the corresponding spherical harmonics. According to Pollow [239], it is commonly applied to multi-dimensional domain problems. However, the analytical requirements for cases with few dimensions (two in the case of the sound field) are considerably simplified. Manipulating the wave equation by separation of variables is an essential tool here. Appendix C shows the derivation of SH through the separation of variables of the wave equation in spherical coordinates (Equation C.1). The solutions to the linear wave equation in spherical coordinates expressed in the frequency domain (Helmholtz equation) are orthogonal basis functions Y_n^m(θ, ϕ), where n is the degree and m is the order. These angle-dependent functions are called spherical harmonics and can represent, for example, a sound field [309]. That is the core assumption of Ambisonics recording and reproduction.
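The orthogonality claim can be checked numerically for the lowest degrees. A minimal sketch with closed-form real-valued spherical harmonics up to degree 1 (written here in an orthonormal convention, one of several used by Ambisonics tools) and Monte-Carlo integration over the sphere:

```python
import numpy as np

rng = np.random.default_rng(0)

# Closed-form real spherical harmonics up to degree 1, orthonormal on the
# sphere, as functions of azimuth az and colatitude col:
sh = {
    (0, 0):  lambda az, col: np.full_like(az, 0.5 * np.sqrt(1 / np.pi)),
    (1, -1): lambda az, col: np.sqrt(3 / (4 * np.pi)) * np.sin(col) * np.sin(az),
    (1, 0):  lambda az, col: np.sqrt(3 / (4 * np.pi)) * np.cos(col),
    (1, 1):  lambda az, col: np.sqrt(3 / (4 * np.pi)) * np.sin(col) * np.cos(az),
}

# Uniform random points on the sphere for Monte-Carlo integration
n_pts = 200_000
az = rng.uniform(0, 2 * np.pi, n_pts)
col = np.arccos(rng.uniform(-1, 1, n_pts))   # uniform in cos(colatitude)

# Integral of Y_a * Y_b over the sphere ~ (4*pi / n) * sum(Y_a * Y_b);
# an orthonormal set yields the identity matrix.
keys = list(sh)
G = np.array([[np.mean(sh[a](az, col) * sh[b](az, col)) * 4 * np.pi
               for b in keys] for a in keys])
print(np.round(G, 1))       # approximately the 4x4 identity matrix
```

This orthogonality is what makes the SH coefficients of a measured sound field uniquely recoverable by projection, the operation underlying Ambisonics encoding.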
Figure 2.9 depicts SHs up to order N = 2.

Figure 2.9: Spherical harmonics Y_n^m(θ, ϕ). Rows correspond to orders 0 ≤ n ≤ 2, columns to degrees −n ≤ m ≤ n (adapted from Pollow [239]).

The four SH weights Y_n^m(θ, ϕ) that encode all the spatial audio information into a First-Order Ambisonics file are given by

B_n^m(t) = s(t) Y_n^m(θ_s, ϕ_s),   (2.8)

where s(t) is the source signal in the time domain and Y_n^m(θ_s, ϕ_s) are the encoding coefficients for the source s(t). Computed at first order in the B-format, the normalized components can be described as [172]:

W = B_0^0 = S Y_0^0(θ_S, ϕ_S) = S (0.707)
X = B_1^1 = S Y_1^1(θ_S, ϕ_S) = S cos θ_S cos ϕ_S
Y = B_1^{−1} = S Y_1^{−1}(θ_S, ϕ_S) = S sin θ_S cos ϕ_S
Z = B_1^0 = S Y_1^0(θ_S, ϕ_S) = S sin ϕ_S   (2.9)

The resulting four-channel signals are equivalent to an omnidirectional microphone (W) and three orthogonal bi-directional (commonly called figure-of-eight) microphones (X, Y, and Z). The channels can represent the pressure and the particle velocity of a given sound (see Figure 2.10). It is possible to transcode and manipulate the generated signal to change its orientation with a matrix multiplication in signal processing. Also, it is possible to decode the same encoded signal to a single sound source, headphones, or a multichannel array.
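Equation (2.9) translates directly into code. A minimal sketch that encodes a mono signal to first-order B-format and then decodes it to a single virtual microphone aimed at an arbitrary direction; the virtual-microphone decode and the example angles are illustrative choices, not part of the cited formulation:

```python
import numpy as np

def encode_foa(s, theta, phi):
    """Encode mono signal s to B-format (W, X, Y, Z) following Eq. (2.9).

    theta: source azimuth (rad), phi: source elevation (rad).
    """
    w = s * 0.707
    x = s * np.cos(theta) * np.cos(phi)
    y = s * np.sin(theta) * np.cos(phi)
    z = s * np.sin(phi)
    return np.stack([w, x, y, z])

def decode_virtual_mic(b, theta, phi, p=0.5):
    """Decode B-format to one virtual microphone aimed at (theta, phi).

    p blends omni (p=1) and figure-of-eight (p=0); p=0.5 gives a cardioid.
    This simple decode is an illustrative basic choice.
    """
    w, x, y, z = b
    direction = (np.cos(theta) * np.cos(phi) * x
                 + np.sin(theta) * np.cos(phi) * y
                 + np.sin(phi) * z)
    return p * w / 0.707 + (1 - p) * direction

s = np.sin(2 * np.pi * 440 * np.arange(4800) / 48_000)   # 0.1 s, 440 Hz tone
b = encode_foa(s, theta=np.radians(90), phi=0.0)          # source at the left
left = decode_virtual_mic(b, np.radians(90), 0.0)         # mic aimed at source
right = decode_virtual_mic(b, np.radians(-90), 0.0)       # mic aimed away
print(round(float(np.max(np.abs(left))), 2),
      round(float(np.max(np.abs(right))), 2))             # full level vs. null
```

Rotating the scene, as mentioned above, is a matrix multiplication applied to the (X, Y, Z) rows of `b`, leaving W untouched.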
Figure 2.10: B-format c omp onents: omnidir e ctional pr essur e c omp onent W, and the thr e e
velo city c omp onents X, Y, Z. Extr acte d fr om Rumsey [ 256 ]. The limitation of first-order
Ambisonics is spatial precision since it is only effec- tiv e at a point cen tered within a
defined area. This limitation can b e o vercome with higher-order comp onen ts. Adding a set of
higher-order comp onen ts im- pro v es the directionalit y . How ever, increasing the n um b er
of comp onents will also increase the num b er of loudsp eakers required to play higher-order
Am- bisonics. That means a more accurate sound field representation if the order is increased.
The num b er of channels N for a p eriphonic Am bisonics of order m order is N = ( m + 1) 2 for
3D repro duction and N = (2 m + 1) for 2D [ 65 ]. Chapter 2. Spatial Sound & Virtual
Acoustics 31 2.3.2.3 Sound Field Synthesis The ob jective of tec hniques from Sound Field Syn
thesis remains the same as in the techniques from the p erceptually-motiv ated paradigm: a
spatial sound field repro duction. The p erceptually motiv ated are centered on the
psychoacoustic effects of summing binaural cues that lead the listener to p erceiv e a virtual
source. On the other side, the Sound Field Synthesis techniques rely on the ph ysical
reconstruction of the original/simulated sound field to a sp ecific area. The main techniques are
the extension of Am bisonics repro duction to higher orders called Higher Order Am bisonics
(HOA) and the W a ve Field Syn thesis (WFS) [ 24 , 25 ]. The HOA extends the order of the
classical Ambisonics and, therefore, the number of sound sources arranged in a spherical array. As the Ambisonics order increases, the perceived sound source direction accuracy also increases, although more loudspeakers are required [97]. An important distinction can be made between Ambisonics and HOA. Given the truncation possibility in Ambisonics, the method is treated as a soft transition from a perceptually based method to a physically based one. Although HOA relies on the same principle as Ambisonics, it is classified as a sound field synthesis paradigm (physically based), along with WFS. The limitations of HOA are reported in the literature by several studies [26, 27, 64, 73, 236, 299], especially frequency aliasing, which leads to pressure errors, and the sweet spot size [253]. The WFS formulation relies on Huygens' principle: a propagating wavefront at any instant conforms to the envelope of the spherical waves emanating from every point on the wavefront at the prior instant [281]; the principle is illustrated in Figure 2.11.

Chapter 2. Spatial Sound & Virtual Acoustics

Figure 2.11: Illustration of Huygens' principle for a propagating wavefront.

A conceptual difference between WFS and HOA is that in HOA the sound field characteristics are described at a point (or small area) inside the array, while in WFS the sound pressure must be known on the border of the reproduction area. A review and comparison of both methods and their compromises in terms of spatial aliasing errors and noise amplification is presented in [65]. Their findings indicated similar constraints for both methods; however, both are characterized by the requirement of large loudspeaker arrays. HOA was found to allow a larger central listening area, whereas WFS is limited by distortion (aliasing) of the higher frequencies, depending on the number of loudspeakers. Regardless, as the scope of this thesis is to work with a small number of loudspeakers, these methods will not be discussed further.

2.3.3 Room acoustics
Different aspects are taken into account when describing the hearing experience of a human being in a space, for example, the individuality of auditory training, familiarity with the space, personal preferences, mood, fatigue, culture, and the spoken language [10, 32, 123, 169, 184, 190, 250]. However, there are similarities in the expressions used between sample groups. Such an effect is attributed to the similarity of the auditory-cognitive mechanism of human beings [40].

In architectural and room acoustics, studies of sound properties assume that a sound source and a receiver in a given space form a linear and time-invariant (LTI) system [45]. Thus, a complete LTI characterization of each source-receiver pair can be expressed by its impulse response in the time domain or its transfer function in the frequency domain [45, 293].

2.3.3.1 Room acoustics parameters

Objective parameters are essential in acoustic projects and in the composition of statistical models that aim to predict the human interpretation of acoustic phenomena [254]. Objective parameters derived from the LTI impulse response aim to create metrics that quantify subjective descriptors from numerous experiments [254]. The calculation and measurement of many objective parameters are described in an appendix to the International Organization for Standardization (ISO) standard 3382 [127].

2.3.3.2 Reverberation Time

The reverberation time (RT) measures the time it takes for the impulse response's sound pressure level to decrease to one-millionth of its maximum value, equivalent to a decline of 60 dB; it is also often referred to as RT60 or T60. Note that the reverberation time measures how fast the decay of sound energy occurs, not how long the reverberation lasts in the environment, which depends on the sound source power and the background noise. The RT was the first parameter studied, modeled, and understood in relation to several subjective aspects of the human hearing experience in a room. Today, it is considered the most critical parameter, although it is not sufficient to describe human perception completely. Wallace C. Sabine [258] initially described it through mathematical relations obtained by an empirical method, later developing the theoretical bases together with W. S. Franklin [86]. The analytical form of the reverberation time obtained by Sabine is given by:

T_{60} = 0.161 \, \frac{V}{S\bar{\alpha}} \quad \mathrm{[s]} \qquad (2.10)

where V is the volume of the room and S\bar{\alpha} represents the amount of absorption present in the environment; the unit of absorption is named the Sabin in honor of Sabine.
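Equation 2.10 translates directly into a few lines of code. The sketch below uses a hypothetical shoebox room with invented absorption coefficients; it illustrates Sabine's formula only, not a measurement procedure:

```python
def sabine_t60(volume_m3, surfaces):
    """Sabine reverberation time (Eq. 2.10).

    surfaces: iterable of (area_m2, absorption_coefficient) pairs, so that
    the sum of area * alpha is the total absorption S*alpha_bar in Sabins.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption  # seconds

# Hypothetical 10 m x 8 m x 3 m room (240 m^3) with assumed coefficients
room_surfaces = [
    (10 * 8 * 2, 0.05),  # floor and ceiling
    (10 * 3 * 2, 0.10),  # long walls
    (8 * 3 * 2, 0.10),   # short walls
]
t60 = sabine_t60(240.0, room_surfaces)  # about 2.06 s for these assumptions
```

Note that Sabine's relation assumes a diffuse field and relatively low average absorption, which is why the subsequent models cited below refine it.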
Subsequent models improved the calculation of the reverberation time by considering the evolution of the energy density and the sound absorption by the air [74], the specular reflection of each sound wave [187, 271], the propagation path [148], and the triaxial arrangement of different absorption coefficients [14, 82], among others.

In addition to statistical theory, T60 can be obtained from the measurement of the impulse response. In measurements, the T60 is obtained subject to limitations regarding the background noise level and the sound source's maximum sound pressure level. Thus, according to the ISO 3382 standard [127], the measurement's dynamic range must present the end of the decay at least 15 dB above the background noise and start 5 dB below the maximum. For example, the sound pressure level required to measure the T60 in a room with a background noise of 30 dB is 110 dB (30 + 15 + 60 + 5).

Linear behavior is noted by observing the square of the energy h^2(t) in the decay curve plotted in dB (see Figure 2.12). Thus, to reduce the dynamic range required for measurement, it is possible to estimate the T60 through other limits. The T60 is sometimes mistaken for twice the T30, which is incorrect: the T20 and T30 also correspond to the time the sound pressure level (SPL) inside the room takes to drop 60 dB, but estimated from measurement ranges restricted to -5 dB to -25 dB and -5 dB to -35 dB, respectively. Therefore, a linear energy decay produces the relation T60 = T30 = T20.

Figure 2.12: Normalized room impulse response: example from a real room in the time domain (left) and in the time domain in dB (right).
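The estimation from restricted decay ranges can be sketched numerically: backward-integrate the squared impulse response to obtain the decay curve in dB (the Schroeder curve) and fit a line over the chosen range. A minimal sketch with a synthetic noise-free exponential decay; the function names and test signal are illustrative, not taken from ISO 3382:

```python
import numpy as np

def schroeder_curve_db(h):
    """Backward-integrated energy decay curve (Schroeder curve) in dB."""
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10 * np.log10(energy / energy[0])

def decay_time(h, fs, lo_db, hi_db):
    """T60 estimate from a linear least-squares fit of the Schroeder curve
    between lo_db and hi_db (use -5/-25 for T20, -5/-35 for T30)."""
    edc = schroeder_curve_db(h)
    t = np.arange(len(h)) / fs
    mask = (edc <= lo_db) & (edc >= hi_db)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # decay rate in dB/s
    return -60.0 / slope                          # time for a 60 dB drop

# Synthetic exponential decay with a known T60 of 1.2 s
fs, t60 = 8000, 1.2
t = np.arange(int(fs * t60 * 1.5)) / fs
h = np.exp(-6.91 * t / t60)        # amplitude decays 60 dB over t60 seconds
t20 = decay_time(h, fs, -5, -25)   # close to 1.2 s
```

For an ideal exponential decay, the fit recovers the same value from the -5/-25 and -5/-35 ranges, which is the T60 = T30 = T20 relation stated above.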
The T20 is obtained as the decay rate from a linear least-squares regression of the measured decay curve, also called the Schroeder curve, in the range from -5 dB to -25 dB. In comparison, the T30 is obtained when the curve fit is carried out in the range between -5 dB and -35 dB [127].

2.3.3.3 Clarity and Definition

The clarity and definition parameters express a balance between the energy that arrives earlier and later in the impulse response, which is related to human beings' particular ability to distinguish sounds in sequence [44, 45, 57, 247, 254]. First reflections arriving within the limits of 50 or 80 milliseconds tend to be integrated by the auditory system into the direct sound. Thus, if the first reflections contain relatively more energy than the reverberant tail, the sound will be experienced as amplified. On the other hand, if the reverberant tail has more energy and is long enough, it will be perceived separately and mask the next direct sound. The limits of 50 and 80 milliseconds are defined in the literature as appropriate for optimizing speech and music, respectively [245, 247].

The clarity defined in the ISO 3382 standard measures the ratio between the energy in the first reflections and the energy in the rest of the impulse response. Positive clarity values, which are given in dB, mean more energy in the first reflections; negative values indicate more energy in the reverberant tail; a null value indicates balance between the two parts of the impulse response. The clarity is given by:

C_{80} = 10 \log \left( \frac{\int_0^{80\,\mathrm{ms}} h^2(t)\,\mathrm{d}t}{\int_{80\,\mathrm{ms}}^{\infty} h^2(t)\,\mathrm{d}t} \right) \qquad (2.11)

and

C_{50} = 10 \log \left( \frac{\int_0^{50\,\mathrm{ms}} h^2(t)\,\mathrm{d}t}{\int_{50\,\mathrm{ms}}^{\infty} h^2(t)\,\mathrm{d}t} \right) \qquad (2.12)

The definition parameter, in turn, is presented on a linear scale and computes the ratio between the energy contained in the first reflections and the total energy of the impulse response. Values greater than 0.5 indicate that most of the impulse response's energy is contained in the first reflections. The definition is given by:

D_{80} = \frac{\int_0^{80\,\mathrm{ms}} h^2(t)\,\mathrm{d}t}{\int_0^{\infty} h^2(t)\,\mathrm{d}t} \qquad (2.13)

and

D_{50} = \frac{\int_0^{50\,\mathrm{ms}} h^2(t)\,\mathrm{d}t}{\int_0^{\infty} h^2(t)\,\mathrm{d}t} \qquad (2.14)

2.3.3.4 Center Time

The center time is a parameter analogous to the previous ones, measuring the balance between the energy contained in the early reflections and the reverberant tail's energy. However, the center time is particularly interesting in pointing out what can be seen as the center of gravity of the squared impulse response. Moreover, the center time does not require a predefined transition point between the first reflections and the reverberant tail. Thus, the center time of an impulse response is defined by:

t_s = \frac{\int_0^{\infty} t\,h^2(t)\,\mathrm{d}t}{\int_0^{\infty} h^2(t)\,\mathrm{d}t} \qquad (2.15)

2.3.3.5 Parameters related to spatiality

The relation between the human auditory sensation of spatiality and the objective parameters derived from measurements is studied in detail in the literature [20, 21, 88]. These parameters observe how the sound energy distribution is arranged in terms of direction and timing. The principal sensations and their related parameters are presented here for better understanding.

Apparent Source Width

The Apparent Source Width (ASW) is related to the impression of the sound source's size, or how the source is distributed in space. An objective metric associated with ASW is the Lateral Energy Fraction (LEF), given by Equation 2.16:

\mathrm{LEF} = \frac{\int_{5\,\mathrm{ms}}^{80\,\mathrm{ms}} h_b^2(t)\,\mathrm{d}t}{\int_0^{80\,\mathrm{ms}} h^2(t)\,\mathrm{d}t} \qquad (2.16)

where h(t) is the impulse response measured with a microphone that has an omnidirectional sensitivity pattern and h_b(t) is the impulse response measured with a microphone that has a bidirectional (pressure-gradient) sensitivity pattern at the same position as the omnidirectional one. Thus, this objective parameter represents the ratio between the lateral energy that reaches the receiver between 5 and 80 milliseconds (i.e., the energy contained in the early reflections, excluding the direct sound) and the total energy arriving from all directions between 0 and 80 milliseconds [21]. As low and mid frequencies make the dominant contributions to the LEF, this parameter is usually represented by the arithmetic mean of the octave-band values obtained between 125 Hz and 1000 Hz [45, 254].

Listener Envelopment

The Listener Envelopment (LEV) is related to the impression of being immersed in the room's reverberant field. In Bradley and Soulodre's experiments [44], with test participants inside an anechoic room, the sense of envelopment was assessed with loudspeakers. The authors found the LEV to be associated with the ratio between the lateral energy and the total energy reaching the receiver. The lateral energy is contained in the impulse response measured with a bidirectional microphone after the first 80 milliseconds. The total energy is defined from the impulse response measured with an omnidirectional microphone, in free-field conditions, 10 meters away from the same sound source operating at the same power. The ratio is called the Lateral Strength (LG) and is given by:

\mathrm{LG} = \frac{\int_{80\,\mathrm{ms}}^{\infty} h_b^2(t)\,\mathrm{d}t}{\int_0^{\infty} h_{10}^2(t)\,\mathrm{d}t} \qquad (2.17)

Interaural Cross-Correlation Coefficient

In his work, Keet [133] proposed an auditory-cognitive process relating the spatial impression to a comparison of the signals received by the two ears. The cross-correlation function measures the degree of similarity of the signals. Therefore, the Interaural Cross-Correlation Coefficient (IACC) was incorporated as a third parameter related to the spatial impression. The IACC is defined as the absolute maximum value of the ratio between the cross-correlation function of the impulse responses collected at the left ear (h_L(t)) and the right ear (h_R(t)) and the total energy contained in each of them:

\mathrm{IACC} = \max \left| \frac{\int_{t_1}^{t_2} h_L(t)\,h_R(t+\tau)\,\mathrm{d}t}{\sqrt{\int_{t_1}^{t_2} h_L^2(t)\,\mathrm{d}t \,\int_{t_1}^{t_2} h_R^2(t)\,\mathrm{d}t}} \right| \qquad (2.18)

where \int_{t_1}^{t_2} h_L^2(t)\,\mathrm{d}t and \int_{t_1}^{t_2} h_R^2(t)\,\mathrm{d}t are the energies between the instants t_1 and t_2 in the impulse responses from the left and right ears; \int_{t_1}^{t_2} h_L(t)\,h_R(t+\tau)\,\mathrm{d}t is the cross-correlation function between the impulse responses; and \tau ranges between 0 and 1 ms.
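The energy-ratio parameters above map directly onto discrete sums over a sampled impulse response. The sketch below is illustrative (hypothetical function names, simple rectangular integration), not an ISO 3382 implementation:

```python
import numpy as np

def clarity_db(h, fs, limit_ms=80):
    """C50/C80 (Eqs. 2.11-2.12): early-to-late energy ratio in dB."""
    n = int(fs * limit_ms / 1000)
    return 10 * np.log10(np.sum(h[:n] ** 2) / np.sum(h[n:] ** 2))

def definition(h, fs, limit_ms=50):
    """D50/D80 (Eqs. 2.13-2.14): early-to-total energy ratio, linear scale."""
    n = int(fs * limit_ms / 1000)
    return np.sum(h[:n] ** 2) / np.sum(h ** 2)

def center_time(h, fs):
    """t_s (Eq. 2.15): center of gravity of the squared impulse response."""
    t = np.arange(len(h)) / fs
    return np.sum(t * h ** 2) / np.sum(h ** 2)

def iacc(h_left, h_right, fs, max_lag_ms=1.0):
    """IACC (Eq. 2.18): peak of the normalized interaural cross-correlation."""
    norm = np.sqrt(np.sum(h_left ** 2) * np.sum(h_right ** 2))
    best = 0.0
    for lag in range(int(fs * max_lag_ms / 1000) + 1):  # tau in [0, 1] ms
        n = len(h_left) - lag
        c = np.dot(h_left[:n], h_right[lag:lag + n])    # sum h_L(t) h_R(t+tau)
        best = max(best, abs(c))
    return best / norm
```

The IACC loop evaluates the cross-correlation only for lags between 0 and 1 ms, following the definition of tau given above; identical left and right impulse responses yield an IACC of 1.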
2.3.4 Loudspeaker-based Virtualization in Auditory Research

Virtualization of sounds through the auralization of simulated environments has been used in architectural design to preview the sound behavior of rooms when changing a space's design, or even to preview a completely new space before it is built [293]. As room acoustics simulation and the auralization process evolved into a sound equivalent of the visual preview rendered from 3D models, they have also found applications in research outside the architectural field [282, 320]. Lately, the virtualization of sound sources has been applied to extend the ecological validity of sound scenarios in auditory research [161].

Research that utilizes binaural virtualization with headphones is common in the auditory research literature [4, 38, 142, 248, 305]. Its advantages may include, but are not limited to, individual control of the stimuli reproduced in each ear, a smaller setup, and easier calibration [251]. Although binaural reproduction is a suitable method for some research questions, others may require a more complex test environment, especially research encompassing hearing aids.

In that regard, the use of loudspeakers can be associated with single-loudspeaker presentations, where one loudspeaker reproduces a single sound source positioned in space (e.g., [176, 230, 268, 321]), or with virtualization methods that manage auralized files to be perceived as single sources or complex environments [89, 177, 295].

For the virtualization of sound sources, the number of loudspeakers depends on the selected method of encoding and decoding the spatial information [293]. For example, a quadraphonic loudspeaker arrangement was found to be sufficient to reproduce a diffuse sound field with a convincing perceptual spatial impression when listener movements were constrained [117]. However, utilizing Directional Audio Coding, Laitinen and Pulkki [150] found that an adequate reproduction of diffuse sound would require 12 to 20 loudspeakers.

VBAP and HOA techniques were evaluated with different numbers of loudspeakers in simulations by Grimm et al. [97]. Perceptual localization error (PLE) was computed for arrays utilizing these techniques. Eight loudspeakers were estimated to be sufficient in terms of sound source localization. In the same work, Grimm et al. showed that the effects of virtualization with VBAP and HOA on hearing-aid beam patterns are present with fewer than 18 loudspeakers over a 4 kHz bandwidth (spatial aliasing above the 5.7 dB criterion).
However, the spectral distances, a weighted sum of the absolute differences in ripple and spectral slope between virtual and reproduced sound sources, were all very low, indicating high naturalness when compared to subjective data from Moore and Tan [191].

Aguirre [1] evaluated VBAP and its variation, Vector-Based Intensity Panning (VBIP), in terms of spatial accuracy with 30 normal-hearing participants within an array of eight loudspeakers. There was no significant difference among stimuli (speech, intermittent white noise, and continuous white noise) for either technique. An average PLE of around 4° was found, consistent with the values simulated by Grimm et al. [97].

Evaluating SNR benefits of HA beamformer algorithms within a spherical array of 41 loudspeakers, Oreinos and Buchholz [212] found similar results between the real environment and the auralized one. Reproduction errors in HOA reproduction for hearing aids were studied in [213]. Reverberation was found to reduce the time-averaged errors introduced by HOA, implying that the frequency limit of usable HOA renderings can be extended in such environments.

Loudspeaker-based virtualization has been used in hearing research contexts evaluating normal-hearing listeners, hearing-impaired listeners, and hearing aid users through different methods [6-8, 30, 31, 55, 61, 80, 93, 102, 136, 168, 174, 188, 220, 303, 322]. Furthermore, some studies explored the ecological validity of the techniques with subjective responses based on psycholinguistic measures, comparing in-situ conditions with those virtualized in the laboratory [66, 103, 286].

The process of virtualizing sound sources using loudspeakers is complex [282] and requires a thorough understanding of physical acoustics, psychoacoustics, signal processing fundamentals, and proper calibration of software and hardware [126, 165, 293]. As a result, research centers have developed systems to establish reliable procedures for virtualizing sound sources for auditory testing. Examples of such systems include the transaural CTC system developed by Aspöck et al. [17]; the system with a spherical array of 42 loudspeakers capable of rendering scenarios using HOA up to fifth order and VBAP, presented by Parsehian et al. [215]; and the Loudspeaker-Based Room Auralization (LoRa) system developed by Favrot [79], which can render auditory scenes using pure HOA and a hybrid version with Nearest Speaker (NSP) and HOA. In addition, Grimm [100, 101] introduced the Toolbox for Acoustic Scene Creation and Rendering (TASCAR), which can render perceptually plausible scenes in real time using VBAP and a 2D HOA implementation. A recent study by Hamdan and Fletcher [107], in 2022, proposed a two-loudspeaker method based on the transaural approach with cross-talk cancellation. While this list is not exhaustive, these studies provide recommendations and guidelines for the field and highlight the importance of implementing reliable systems and verifying their sound fields objectively and subjectively to increase the ecological validity of auditory research and hearing aid development.
2.3.4.1 Hybrid Methods

Hybrid methods that combine the reproduction of direct sound and reverberation are not new, having been developed since at least the 1980s by the Ambiophonics group [42, 95]. They proposed the Ambiophonics method to reproduce concerts for one or two home listeners as if they were in the hall where the recording was performed. This method combined crosstalk-canceled stereo dipole reproduction with signals convolved with the IRs of the recorded spaces [76, 94]. The system aims to enhance the reproduction of recordings from existing systems (e.g., stereo and 5.1). The group also developed a new recording methodology called the Ambiophone: a microphone arrangement composed of two head-spaced omnidirectional microphones covered by a baffle in the rear to favor room reflections from frontal directions.

In 2010, the Loudspeaker-Based Room Auralization (LoRa) method developed by Favrot [79] applied the hybrid concept using HOA and the nearest speaker (NSP) for the direct sound and early reflections. The method uses the envelope from simulated rooms to reduce the computational cost by multiplying it with uncorrelated noise. The scheme was originally conceptualized for a large spherical 69-loudspeaker array. Figure 2.13 depicts its system schematic.

Figure 2.13: LoRa implementation processing diagram. The multichannel RIR is derived in eight frequency bands and for each part of the input RIR (figure from Favrot [79]).

Pelzer et al. [221] presented a comparison between transaural, or cross-talk cancellation (CTC), VBAP, and fourth-order Ambisonics, alongside two new hybrid proposals: (1) direct sound and early reflections through CTC with late reflections through fourth-order Ambisonics, and (2) direct sound and early reflections through VBAP with late reflections through fourth-order Ambisonics. The hybrid methods were implemented in a single case without generalization to different simulations. These methods were tested within a 24-loudspeaker array, with no statistically significant change in human localization performance for any of the methods.

Pausch et al. [217] presented a method designed for investigations with subjects with hearing loss. The method mixes binaural techniques to process components in complex simulated environments and CTC to present them over loudspeakers. At the same time, the head position can be tracked, allowing user interaction.

In 2017, Pulkki et al. [243] presented the
first-order Directional Audio Coding (DirAC) method, a technique for reproducing spatial sound over a standard stereo audio system. It is based on first-order Ambisonic channels, which encode the sound pressure and particle velocity at a listener's location to represent the sound field. These channels are transformed into a stereo audio signal using a frequency-dependent matrix, which preserves the spatial cues that are important for localizing sound sources. The method infers the direction of arrival of the sound source in order to virtualize it through amplitude panning, and it works with real-world recordings. The DirAC method is effective for various types of audio content, including music, speech, and sound effects. It can potentially improve the spatial realism of audio experiences over traditional stereo systems and has applications in myriad fields, including entertainment, gaming, and virtual reality.
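The first-order (B-format) encoding that DirAC analyzes can be sketched as follows. This is a generic horizontal-only illustration of the pressure and particle-velocity channels, not Pulkki's DirAC processing itself, and the 1/sqrt(2) scaling of W is one common convention:

```python
import numpy as np

def encode_bformat_2d(signal, azimuth_deg):
    """Horizontal first-order Ambisonics (B-format) encoding of a mono signal.
    W carries sound pressure; X and Y carry the particle-velocity components."""
    az = np.radians(azimuth_deg)
    w = signal / np.sqrt(2.0)  # pressure channel, common -3 dB convention
    x = signal * np.cos(az)    # front-back velocity component
    y = signal * np.sin(az)    # left-right velocity component
    return w, x, y

# A source at 90 degrees (hard left) drives only the Y velocity channel
w, x, y = encode_bformat_2d(np.ones(4), 90.0)
```

DirAC then estimates, per time-frequency bin, the direction of arrival and diffuseness from the ratios of these channels before re-rendering the directional part by amplitude panning.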
Table 2.1 presents an overview of the listed methods, the techniques involved, their purpose, and their parameters.

Table 2.1: Non-exhaustive overview of hybrid auralization methods proposed in the literature. The A-B order of the techniques does not represent any order of significance.

Year | Method                      | Authors             | Technique A           | Technique B | Proposed loudspeaker number | Proposed application
1986 | Ambiophonics                | Farina et al. [76]  | Crosstalk Cancelation | Binaural    | 2   | Music reproduction
2005 | DirAC                       | Pulkki et al. [243] | Ambisonics            | VBAP        | 2+  | Multiple applications
2010 | LoRa                        | Favrot [79]         | HOA                   | NSP         | 64  | -
2014 | -                           | Pelzer et al. [221] | Crosstalk Cancelation | HOA         | 24  | -
2014 | -                           | Pelzer et al. [221] | VBAP                  | HOA         | 24  | -
2018 | Extended Binaural Real-Time | Pausch et al. [217] | Binaural              | CTC         | 2   | Hearing loss investigations

2.3.4.2 Sound Source Localization

A comparison between VBAP and Ambisonics
conducted by Frank [84] demonstrated a median deviation of experimental results from the ideal localization curve of 2.35° ± 2.93° for VBAP and 1.05° ± 4.07° for third-order Ambisonics using a max-rE decoder. The setup was placed in a typical non-anechoic studio with a regular array of 8 loudspeakers, listening at the central position of a 2.5 m radius circle. The subjective results came from 14 participants listening to pink noise. These experimental results were compared to a localization model (Lindemann [157]) based on ITD and ILD from impulse responses. The model results showed a deviation close to the standard deviation of the subjective listening tests: 2.35° for VBAP and 3.37° for third-order Ambisonics using max-rE. Off-center measurements were pointed out by the author as necessary for future investigation.

Ambisonics at first, third, and fifth order was examined in another study by Frank and Zotter [85], with 15 normal-hearing listeners, a 12-loudspeaker setup, and pink noise with interval attenuation as the stimulus. This study investigated the effect of listening position (centered and off-center) and order. The results showed, for first-order rendering, a localization error of around 5° for the centered listener and 30° for the off-center position.

Ambisonics at first order with four loudspeakers and at third order with eight loudspeakers was also investigated by Stitt et al. [283]. This study was conducted in a non-reverberant environment to verify the effects of off-center position and Ambisonics order. The setup was a circular array with a 2.2-meter radius and an RT of 0.095 s. Eighteen test participants listened to white noise bursts of 0.2 s. In this acoustically dry condition, the centered first-order median absolute error was around 10°, while at the off-center positions tested it was close to 30°. As expected, the error was lower at third order, achieving a median absolute error of around 8° at the center and 11° off-center.

A study by Laurent et al. [275] investigated the effect of 3D audio reproduction artifacts on hearing devices, assessing ITD, ILD, and DI for HOA (third and fifth orders), VBAP, distance-based amplitude panning (DBAP), and multiple-direction amplitude panning (MDAP). The study was conducted in a non-anechoic room with 32 loudspeakers in a spherical configuration. The loudspeaker distance from the center was 1.5 m, except for the four loudspeakers at the top, which were only 98 cm away. This study investigated centered and off-center positions (10 and 20 cm). According to the authors, the results presented an expected Ambisonics limitation in reproducing ITD because of spatial aliasing at high frequencies. In addition, they investigated an MVDR monaural beamformer, which did not reproduce the correct ITD, especially off-center. At the centered position, only DBAP could not correctly reproduce ITD. Ambisonics ITDs deteriorate more than VBAP ITDs at off-center positions. ILD errors in virtualized sound sources can make a system unreliable for testing HI with processing based on ILDs. In the experiment, the ILDs were less affected by beamforming processing in VBAP, and Ambisonics benefited from the max-rE decoding that maximizes the energy vector. However, the authors expect a better ILD representation from VBAP, as HOA has an aliasing frequency limitation.

Hamdan and Fletcher [107] present the development of a compact two-loudspeaker virtual sound reproduction system for clinical testing of spatial hearing with hearing-assistive devices. The system is based on the transaural method with cross-talk cancellation and is suitable for use in small, reverberant spaces, such as clinics and small research labs. The authors evaluated the system's performance regarding the accuracy of sound pressure reproduction in the frontal hemisphere and found that it could produce virtual sound fields up to 8 kHz. They suggest that tracking the listener's position could improve the system's performance. Overall, the authors believe this system is a promising tool for the clinical testing of spatial hearing with hearing-assistive devices.

Finally, a study by Bates et al. [22] evaluated second-order Ambisonics and VBAP localization errors in subjective listening tests and through ITD and IACF comparisons. They presented the stimuli to a simultaneous set of nine listeners at different positions inside a concert room (around 1 s of RT). With 16 loudspeakers, they used 1 second of speech (male and female), white noise, and music. The results indicate that VBAP and Ambisonics techniques cannot consistently create spatially accurate virtual sources for a distributed audience in a reverberant environment. The off-center positions are compromised depending on technique and stimulus. Depending on the stimulus, centered positions resulted in localization errors between 10° and 20°. In the spatial distribution inside the ring, a bias away from the target image position and towards the nearest contributing loudspeaker is more present for Ambisonics than for VBAP. The authors mentioned that the room acoustics could also impact localization accuracy.

The number of variables across these previous studies and their contributions is large, e.g., objective measures, technique variations, number of loudspeakers, loudspeaker distance, number of simultaneous listeners, reverberation time, and array form. Table 2.2 presents an overview of methods and estimated or measured localization errors.
Table 2.2: Overview of localization error estimates or measurements from loudspeaker-based virtualization systems using various auralization methods.

Method                                        | Error at center position                    | Error at off-center position                | Loudspeakers
Present study (Iceberg), VBAP/Ambisonics      | 30° (max estimated), 7° (average estimated) | 30° (max estimated), 7° (average estimated) | 4
Frank [84], VBAP                              | 2.35° (average)                             | N/A                                         | 8
Frank [84], HOA (3rd order)                   | 3.37° (average)                             | N/A                                         | 8
Zotter [85], Ambisonics                       | 5° (median)                                 | 30° (median)                                | 12
Zotter [85], HOA (3rd order)                  | 2° (median)                                 | 15° (median)                                | 12
Zotter [85], HOA (5th order)                  | 1° (median)                                 | 10° (median)                                | 12
Stitt et al. [283], Ambisonics                | 10° (median)                                | 30° (median)                                | 8
Stitt et al. [283], HOA (3rd order)           | 8° (median)                                 | 11° (median)                                | 8
Bates et al. [22], Ambisonics (2nd order)     | 10° (mean)                                  | 20° (mean)                                  | 16
Bates et al. [22], VBAP                       | 10° (mean)                                  | 20° (mean)                                  | 16
Grimm et al. [97], HOA (3rd order)            | 2° (estimated)                              | 6° (estimated)                              | 8
Grimm et al. [97], VBAP                       | 4° (estimated)                              | 6° (estimated)                              | 8
Aguirre [1], VBAP                             | 4° (median)                                 | N/A                                         | 8
Hamdan and Fletcher [107], CTC                | 2° (max head displacement)                  | N/A                                         | 2
Huisman et al. [125], Ambisonics              | 30° (median)                                | N/A                                         | 4
Huisman et al. [125], HOA (3rd order)         | ≈ 15° (median)                              | N/A                                         | 8
Huisman et al. [125], HOA (5th order)         | ≈ 8° (median)                               | N/A                                         | 12
Huisman et al. [125], HOA (11th order)        | ≈ 5° (median)                               | N/A                                         | 24

2.4 Listening Effort Assessment

The
regular task of following a conv ersation, listening to a p erson’s sp eec h, or in teracting
with someone in a conv ersation may require additional effort in an unfa v orable or c hallenging
sound en vironmen t [ 227 ]. The listening effort is defined as ”the delib erate allo cation of
mental resources to ov ercome obstacles in goal pursuit when carrying out a [listening] task” [
224 ]. Studying asp ects of the listening effort related to different acoustic situations through
reliable metho ds can lead to the developmen t of solutions to reduce it, improving the qualit y
of life [304]. However, there is no consensus in the literature on the best method to measure listening effort. Attempts to measure how much energy a person expends in a specific acoustic situation may rely on different paradigms. The literature reports objective measurements of physiological parameters associated with changes in effort, such as pupil dilation [151, 209, 211, 301, 302, 319], brainstem frequency-following responses (FFRs) and cortical electroencephalogram (EEG) activity from event-related potentials [28, 33], or alpha band oscillations [186, 223]. In addition, the behavioral perspective studies changes in response time in single-task [204] or dual-task paradigm tests, also assuming that they are related to changes in cognitive load in auditory tests [87, 225, 228]. In turn, subjective assessments of listening effort are performed through questionnaires [323] or effort scales [147, 149, 249, 260], and their results generally agree with performance metrics [192]. Although subjective measurements are intuitive and valid, they tend to be less accepted as an indication of the amount of listening effort because of differences between objective and subjective outcomes [151, 225]. For instance, Zekveld and Kramer [318] present evidence of disagreement between the physiological and the subjective measures, where young normal-hearing participants attributed high subjective effort to the most challenging conditions despite their smaller pupil dilation. The authors assumed that methodological aspects and the participants' tendency to drop out were also related to pupil dilation at low levels of intelligibility. In a study on syntactic complexity and noise level in auditory effort, Wendt et al. [300] evaluated effort through self-rated effort and pupil dilation. They found both background noise and syntactic complexity reflected in their measurements. However, at high levels of intelligibility, the methods show different results. According to the authors, the explanation is that each measure represents a different aspect of the effort. In turn, Picou et al. [226] and Picou and Ricketts [229], using response time in a dual task as a behavioral measure, found that subjective ratings of listening effort were correlated with performance rather than with listening effort. Interestingly, though, in this study a question about control was correlated with the response-time results. The varied outcomes from the subjective and objective paradigms proposed to achieve a proxy for listening effort can indicate that these methods are quantifying separate aspects of a complex global process [12, 224].
Another explanation suggests a bias in the subjective method due to the heuristic strategies adopted by the participants to minimize the effort [192]. The mentioned strategy would consist of replacing the question about the amount of effort spent with a more straightforward question related to how they performed in the task. Concomitantly, studies based on objective measurement paradigms also have divergent results. For example, even physiological measures sensitive to the spectral content of stimuli, such as pupil dilation and alpha power, are not always related and can be sensitive to different aspects of listening effort [186]. Even within the same paradigm, a different task may indicate that different aspects are being observed. For example, Brown and Strand [53] analyzed the role of working memory as a weighting factor on listening effort. Although increasing background noise indeed increases listening effort measured by the dual-task paradigm, the memory load was not affected. They also suggested that working memory and listening effort are related in the recall-based single task, unlike in the dual task. In Lau et al. [151], significant differences between sentence recognition and word recognition were found in pupil dilation measurements and in subjective ratings, although with no correlation between the objective and subjective measures. The demand for mental resources can also be affected by personal factors, such as fatigue and motivation [224].

At the same time, several physical-acoustical artifacts can degrade a sound, creating or leading to difficulties in everyday communication (increasing listening effort), especially in social situations. The masking noise, the spectral content of the noise, the Signal-to-Noise Ratio (SNR), and the environment reverberation are examples of artifacts capable of smearing the temporal envelope cues [163]. Speech intelligibility was also assessed in a virtual environment consisting of a large spherical array of 64 loudspeakers reproducing Mixed-Order Ambisonics (MOA) [6], which presented Speech Reception Thresholds (SRTs) comparable to a real room in a co-located situation of masker and target. With a spatial separation of 30 degrees, the virtual environment led to an SRT benefit of 3 dB; it was argued that this benefit was not present in more reverberant or complex scenes, suggesting the masking effect of more challenging scenes. SRTs for normal-hearing and hearing-impaired listeners using hearing aids were also investigated by [31]. A complex scenario (a reverberant cafeteria) and an anechoic situation were evaluated in a spherical array of 41 loudspeakers. The virtualization was provided by convolving the direct sound and early reflections of the RIR with the anechoic sentence and presenting the sound through the Nearest Speaker (NSP), while the late reflections of the RIR were created through the directional envelope of each loudspeaker with uncorrelated noise.

The reviewed studies were conducted in
laboratories, mainly taking advantage of spatial sound and virtual acoustics via loudspeaker or headphone reproduction. Thus, the complex nature of human auditory phenomena and the importance of reproducibility in hearing research highlight the need for innovative tools such as spatial sound [134]. Virtualized sound allows for realistic and controllable sound environments, enabling control over selected parameters and consistent reproduction of experiments [61, 161, 282, 293]. This technology can help hearing investigations become more true-to-life and reliable [134, 161, 251]. For example, it can be used to study listening effort and speech intelligibility using virtual sound sources to create ecologically valid and controlled environments [7, 177]. It can also enable the integration of virtual sound scenarios with ecological tasks involving multiple people, providing an ecologically valid assessment of the performance of hearing solutions that is more accessible than large field studies (e.g., in Bates et al. [22]). Additionally, spatial audio enables the accessible investigation of the effects of spatial separation on binaural cues in different environments, of the role of binaural hearing in spatial perception, and of new hearing aid hardware and algorithms [61, 97, 213]. Overall, spatial sound and virtual acoustics in hearing research offer numerous benefits and represent a valuable tool for advancing our understanding of hearing and developing effective hearing solutions.

2.5 Concluding Remarks

The literature review suggests a contrast between localization and immersion in auralization methods that virtualize sound using a low number of loudspeakers. Thus, there is a need for a method that can achieve useful performance in both localization and immersion with a small number of loudspeakers and that is reliable in rendering sound for a listener in the presence of another listener within the virtualized sound field.
Previous methods, including hybrid approaches, have been developed using a larger number of loudspeakers and different techniques for balancing energy. A recent study, from 2022, proposed a method using only two loudspeakers; however, it implemented a different auralization method and had its limitations. The method proposed in this study is innovative, using a room acoustic parameter called center time to calculate the energy balance of room impulse responses and combining it with two known auralization methods.

Chapter 3
Binaural cue distortions in virtualized Ambisonics and VBAP

3.1 Introduction

In acoustics, complex communication scenarios can involve simultaneous sound sources, distracting background noise, moving sound sources, sources without large spatial separation, and low signal-to-noise ratios. Although people with normal hearing can deal with most of these conditions in a relatively
efficient way, people with hearing loss perform poorly [273, 289, 317]. Since social events are often a real example of complex communication, the interaction barriers make people avoid them and sometimes ostracize themselves [16, 63]. That can be a factor in decreasing the quality of life of people with hearing problems.

In hearing research, innovative signal processing techniques, new devices, more powerful hardware, and updated parameter settings are continuously developed and evaluated.
These technological improvements aspire to resolve communication problems in everyday situations for hearing aid users [227], increasing their socialization and quality of life [119]. Tests such as speech recognition in noise are developed and tailored to evaluate the human auditory response in everyday acoustic situations better than clinical tests based on pure-tone stimulation [145]. Even though the tasks are moving towards a more realistic representation, they still need to improve their ecological validity [134].

Auralization methods are designed to create files meant to be reproduced for a specific listener or a group of listeners; these files contain particular characteristics that try to mimic a recorded or digitally created sound scene according to the method. The mathematical formulations that produce these characteristics for the psychoacoustically based methods focus on delivering accurate binaural cues. The listener position, physical obstacles, and the listener's movement will impact distinct methods and cues differently.

A VSE is an auralized sound field that can contain realistic elements, such as high background noise, high reverberation, and concomitant sound events from different directions; currently, it is possible to create a VSE employing loudspeaker arrays or headphones for the listener [61, 79, 294]. Furthermore, through a VSE it is also possible to enable a participant to wear, for example, a hearing aid during the test. Thus, the researcher can maintain control of the stimuli, the incidence direction, the signal-to-noise ratio (SNR), among other settings, while examining the hearing device performance in a more ecological situation [98, 161, 269].

Although novel technologies emerge and contribute to emulating sound sources and even entire complex sound scenes with humans' social interaction [267], these opportunities are often overlooked in auditory evaluations. Typically, tests are performed by observing only one individual within the laboratory [81, 89, 104, 152, 169, 175]. Furthermore, the systems are designed to acquire responses from a single individual at a time [41, 79, 102, 118, 195, 218-220, 259]. A reasonable explanation for this is the low cost and complexity of auralization through headphones. More complex techniques, like Wave Field Synthesis, do not limit the listener to a restricted spot [207], reproducing a complete sound field, although at the cost of a large number of sound sources in a specifically treated room.

Social situations can have an effect on people's listening effort [230, 234] and their motivation to listen [181, 224]. In this context, social interactions have been simulated through avatars or audiovisual recordings in virtual environments, gaining space in auditory research [116, 160, 161, 272, 298]. Although this can be considered a significant asset, it also focuses on a single individual's responses to simulated social stimuli. The scenario creates a
ground for this study to investigate controlled acoustical changes in the VSE. This study assesses two main situations within a ring of loudspeakers virtualizing sound sources with Ambisonics and VBAP: (1) the displacement of the listener from the center (sweet spot), and (2) the effect of including a second simultaneous listener inside the ring. These topics can help understand the perception of sound in these specific virtualization methods, increasing the fundamental scientific basis for future hearing research applications. The changes to the sound field were observed in three major spatial cues: ITD, ILD, and IACC. That was explored by changing the listener's position and including a second listener inside the ring of loudspeakers while measuring BRIRs. These metrics can describe the spatial perception of an auralized sound signal [47, 48], with ITD and ILD responsible for localization and IACC for perceived spaciousness and listener envelopment [44]. Therefore, these measurements can indicate the possibility of including a simultaneous second participant in any hearing test with virtualized, spatially distributed sound sources.

Two different auralization techniques were used to virtualize sound sources: vector-base amplitude panning (VBAP) [241] and Ambisonics [91]. Both techniques rely on the same receptor-dependent psychoacoustic paradigm to provide an auditory sense of immersion for those with normal hearing [161, 180]. These techniques aim to deliver the correct binaural cues to a point or area to create a realistic spatial sound impression, albeit through different mathematical formulations. This work investigates whether the techniques can provide an appropriate spatial impression for young normal-hearing listeners.

Hypothesis

The main research question is how scenarios auralized with VBAP and Ambisonics are affected when the listener is displaced from the center and when another listener is inside the ring. The hypothesis is that localization cues can be better provided by VBAP, especially in off-center positions. In contrast, Ambisonics can provide a better sense of immersiveness. Also, the second listener would impact Ambisonics-virtualized sound sources more than VBAP.

3.2 Methods

The experiment was conducted in two different locations. The first
one is a sound-treated test room at the Hearing Sciences - Scottish Section in Glasgow (see Figure 3.1); the second is an anechoic test room at Eriksholm Research Centre (see Figure 3.2). This section presents the rooms' acoustic characterizations and the methods used in this experiment.

Figure 3.1: Hearing Sciences - Scottish Section test room.

Figure 3.2: Eriksholm test room.

3.2.1 Setups and system characterization

The experiment conducted in Glasgow took place in a large soundproof audiometric booth (4.3 × 4.7 × 2.9 m; IAC Acoustics). An azimuthal circular array configuration of 24 loudspeakers (3.5-m diameter; 15° of separation; Tannoy VX6) was used. The ceiling and walls were covered with 100-mm deep acoustic foam wedges to reduce reflections; the floor was carpeted with a foam underlay. The AD/DA audio interface used was a Ferrofish Model A32. The loudspeakers received signals amplified by ART SLA4 amplifiers. The reference microphone used to characterize the Glasgow test room was a 1/2" G.R.A.S. 40AD pressure-field microphone set with a GRAS 26CA preamplifier. It was oriented 90 degrees vertically from the sound source.

At Eriksholm, an equivalent setup was fitted, this time in a full anechoic room from IAC Acoustics. The room's outer dimensions are 6.7 × 5.8 × 4.9 m, and its inner dimensions, measured from the tips of the foam wedges, are 4.3 × 3.4 × 2.7 m. An azimuthal circular array configuration of 24 active loudspeakers (16 Genelec 8030A and 8 Genelec 8030C; 2.4-m diameter; 15° of separation) was used. The AD/DA was a MOTU PCI-e 424 combined with a FireWire 24-channel audio extension. The reference microphone used to characterize the Eriksholm test room was a 1/2" B&K 4192 pressure-field microphone with a type 2669 preamplifier, supplied by a type 5935 power module. It was oriented 90 degrees vertically from the sound source. The signal acquisition and processing were performed entirely in Matlab 2020a using the ITA-Toolbox v.9 [29].

The technical setup was equivalent in both rooms: a B&K head and torso simulator (HATS) model 4128-C mannequin was used for the measurements, and a Knowles Electronics Mannequin for Acoustic Research (KEMAR) was used as a physical obstacle. Although technically both devices are head and torso simulators, in this thesis HATS will refer to the B&K 4128-C for simplicity. The sampling rate of the recordings was fixed at 48 kHz, resulting in an
uncertainty of ±20 µs, therefore not compromising the final analysis.

3.2.1.1 Reverberation time

The reverberation time is one of the most critical objective parameters of a room [154]. The decay of sound energy to 60 dB below its peak after the cessation of a sound source characterizes the RT. The parameter is frequency-dependent; it is associated with speech understanding, sound quality, and the subjective perception of the size of the room. For controlled environments, the values are fractions of seconds. The T60 for both rooms in third-octave bands is presented in Figure 3.3.

Figure 3.3: Reverberation time in third-octave bands up to 16 kHz.

The rooms' reverberation time T20 was measured using an arbitrarily chosen loudspeaker and the microphone setup described in Section 3.2.1. The measurement and analysis were performed in Matlab through the ITA-Toolbox software.

3.2.1.2 Early reflections

To ensure that there is no influence of the environment, Recommendation ITU-R BS.1116-3:2015 [126] determines that the magnitude of the first
reflections should be at least 10 dB below the magnitude of the direct sound (ΔSPL ≥ 10 dB). The SPL differences determined in the environments of this work met this requirement. Table 3.1 shows the difference in sound pressure level between the direct sound and the early reflections. The higher differences in the Eriksholm environment are consistent with its anechoic setup compared to the sound-treated booth in Glasgow, where the floor provides some energy to the reflections.

Table 3.1: Sound pressure level difference between direct sound and early reflections

Angle [°]   ΔSPL Eriksholm [dB]   ΔSPL Glasgow [dB]
0           -20.99                -14.94
15          -23.40                -15.31
30          -22.66                -14.61
45          -21.97                -15.45
60          -20.39                -13.28
75          -21.22                -15.19
90          -17.71                -15.33
105         -21.49                -15.22
120         -17.83                -15.68
135         -20.12                -15.23
150         -19.70                -14.62
165         -19.13                -16.11
180         -24.57                -15.03
195         -23.56                -13.52
210         -22.62                -14.81
225         -21.04                -15.39
240         -22.29                -14.25
255         -23.73                -14.37
270         -20.90                -14.01
285         -24.06                -12.56
300         -19.61                -15.95
315         -17.68                -15.03
330         -21.46                -15.66
345         -23.08                -15.95

3.2.2 Procedure

The experiment studied how the
presence of a second listener within a loudspeaker ring affects the spatial cues of the reproduced sound field. The data were collected through the HATS, and a second listener simultaneously inside the virtualized sound area was simulated through another mannequin (KEMAR), as shown in Figures 3.4 and 3.5.

Using the reverberation time results presented in Section 3.2.1, the appropriate length of a logarithmic sweep signal was calculated as approximately four times the highest T60 value, giving 1.49 seconds. Also, a stop margin of 0.1 seconds was set to ensure the quality of the room impulse responses (RIRs) that were obtained [75, 194]. The frequency range of the sweep was from 50 Hz to 20 kHz.

Figure 3.4: HATS (with motion-tracking crown) and KEMAR inside the test room in Glasgow.

Figure 3.5: HATS and KEMAR inside the anechoic test room at Eriksholm.

The position of the head has a significant effect on the measured signals. To have a reliable assessment of the absolute three-dimensional position of the HATS, its position was measured with a Vicon infrared tracking system with an accuracy of 0.5 mm in Glasgow. At Eriksholm, a laser tape measure was used to ensure the correct positions. The microphones' height in both experiments was set to match the geometrical center of the loudspeaker enclosures in all measurements.

The first position measured used the HATS in the center, without interference from another obstacle inside the ring, to provide a baseline. Figure 3.6a illustrates a set of positions used to study the influence of a second listener inside the ring while keeping the test subject in the center (the sweet spot). Three different positions for the KEMAR (50, 75, and 100 cm of separation) were measured with the HATS fixed at the center of the loudspeaker array. The data collected are from the microphones in the HATS ears; the KEMAR was only a physical obstacle to simulate a listener inside the ring. Figure 3.6b illustrates a different set of measured positions, maintaining a minimum separation of 50 cm between the centers of the heads. The purpose of these positions with the HATS off-center was to identify the presence of distortions caused by the decentralization of the subject and by the addition of a listener within the circle of loudspeakers as a physical obstacle to sound waves. The positioning was standardized so that movements along the x-axis to the left and right of the dummies were annotated as negative and
positive, respectively.

3.2.3 Calibration

To calibrate the HATS recordings, the adapter B&K UA-1546 was connected to the B&K 4231 calibrator. That provided a 97.1 dB SPL signal, which corresponds to 1.43 Pa, instead of 94 dB without the adapter. The recorded signal from each ear was used to calibrate the levels of all measurements. The calibration factors were calculated as:

\alpha_{l,\mathrm{rms}} = \frac{1.43}{\mathrm{rms}(v_l(t)_{1\,\mathrm{kHz}})} \, \frac{\mathrm{Pa}}{\mathrm{VFS}},    (3.1a)

\alpha_{r,\mathrm{rms}} = \frac{1.43}{\mathrm{rms}(v_r(t)_{1\,\mathrm{kHz}})} \, \frac{\mathrm{Pa}}{\mathrm{VFS}},    (3.1b)

where \alpha_{l,\mathrm{rms}} is the calibration factor for the left ear; \alpha_{r,\mathrm{rms}} is that for the right ear; v_l(t) is the calibrator signal recorded in the left ear; and v_r(t) is that for the right ear.

Figure 3.6: HATS in gray, KEMAR in yellow. a) Measured positions with the HATS centered and the KEMAR present in the room in different positions (three combinations). b) Measured positions with the HATS in different positions and the KEMAR present in the room in different positions (nine combinations).

The individual loudspeakers' sound pressure levels for the same file can differ depending on several factors (e.g., the amplification system's level). To balance that, a factor was measured for a GRAS 1/2" pressure-field microphone recording a pistonphone's calibrated 1 kHz sound signal. The calibration factor \alpha_{\mathrm{rms}} was calculated from the root mean square (RMS) using:

\alpha_{\mathrm{rms}} = \frac{10}{\mathrm{rms}(v(t)_{1\,\mathrm{kHz}})} \, \frac{\mathrm{Pa}}{\mathrm{VFS}},    (3.2)

where v(t)_{1\,\mathrm{kHz}} is the sinusoidal signal at 10 Pa recorded from the calibrator in volts full scale (VFS). The loudspeaker correction factor is calculated through an iterative process that starts by reproducing an RMS-scaled version of a pink noise signal at 70 dB SPL:

\mathrm{pink\_noise}(t) = \frac{\mathrm{pink\_noise}(t)}{\mathrm{rms}(\mathrm{pink\_noise}(t))} \; 10^{\frac{70 - \mathrm{dBperV}}{20}} \; \Gamma_l,    (3.3)

where \Gamma_l is the level factor for loudspeaker l, with initial value 1, and \mathrm{dBperV} = 20 \log_{10}(\alpha_{\mathrm{rms}} / 20\,\mu\mathrm{Pa}). The signal \mathrm{pink\_noise}(t) is played through a loudspeaker l and simultaneously recorded with the microphone as S_l(t); the SPL of the recorded signal is calculated as follows:

\mathrm{SPL}_l\,[\mathrm{dB}] = 20 \log_{10}\!\left(\frac{S_l(t)\,[\mathrm{VFS}] \; \alpha_{\mathrm{rms}}\,\frac{\mathrm{Pa}}{\mathrm{VFS}}}{20\,[\mu\mathrm{Pa}]}\right),    (3.4)

Ten measurements are performed sequentially, with intervals of 1 second; the next iteration happens if the SPL obtained exceeds the tolerance of 0.5 dB in any of the measurements. A step of ±0.1 VFS is set to update \Gamma_l for its next iteration according to the SPL obtained.

3.2.4 VBAP Auralization

In the
first measurement, VBAP was the technique used to auralize the files. The first step in the signal processing was recording the 24 RIRs, one from each loudspeaker. Knowing the RT of the room, a sweep (50-20000 Hz) was created fulfilling the length requirement; in this case, a logarithmic sweep of 1.49 seconds. After that, an inverse (minimum-phase) filter was created to compensate for the frequency responses of the different loudspeakers. The signal is then processed through the VBAP technique for the specified array of 24 loudspeakers. The output is a file with 24 channels containing the sweep signal appropriately weighted for the specific angle. The signal can be processed through a single channel (when the angle to be played is at a loudspeaker position) or through up to two combined channels when it is a virtual loudspeaker position. Each channel was also convolved with the designed filter. The final (auralized) signal was used as the excitation in the transfer function where the receptors were the pair of microphones in the B&K HATS.

3.2.5 Ambisonics Auralization

In the second
measurement, at the Eriksholm test room, the files were auralized with first-order Ambisonics similarly to VBAP. To be able to process the excitation signal and acquire the impulse responses, some adaptations were required. In this case, the Ambisonics auralization process requires an encoded impulse response that contains the magnitude and the direction-of-incidence information for each instance of time. This RIR can be attained via computer simulation or recorded with a specific array of microphones. The ODEON software version 12.15 was used to simulate the sound behavior in an anechoic environment and encode the impulse responses in first-order Ambisonics format around the listener. The Odeon software is based on a hybrid numeric method [59]. In general, the image-source method, which is deterministic, is favored in the region of the first reflections up to an order predetermined by the user. Then, reflections of orders beyond the predetermined transition order are calculated using ray tracing, a stochastic method [148, 201].
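As a didactic sketch of the deterministic part of such a hybrid method, the first-order image sources of a shoebox room can be computed by mirroring the source across each wall. The sketch below is a minimal illustration, not Odeon's implementation; the source position, receiver position, and reflection coefficient are arbitrary assumptions:

```python
import numpy as np

def first_order_images(src, room_dims):
    """Mirror the source across each of the six walls of a shoebox room.

    Returns the six first-order image-source positions; higher orders
    would be obtained by mirroring the images again.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room_dims[axis]):
            img = np.array(src, dtype=float)
            img[axis] = 2.0 * wall - img[axis]  # reflection across the wall plane
            images.append(img)
    return images

def delays_and_gains(images, receiver, c=343.0, beta=0.9):
    """Propagation delay (distance / c) and 1/r spreading, with one
    reflection attenuated by an arbitrary coefficient beta."""
    out = []
    for img in images:
        r = np.linalg.norm(np.asarray(receiver, dtype=float) - img)
        out.append((r / c, beta / r))
    return out

# Example: booth-like dimensions (4.3 x 4.7 x 2.9 m, as in Section 3.2.1)
imgs = first_order_images([2.0, 2.0, 1.5], [4.3, 4.7, 2.9])
taps = delays_and_gains(imgs, [2.5, 3.0, 1.2])
print(len(imgs))  # 6 first-order reflections
```

Each (delay, gain) pair corresponds to one tap of the early part of an RIR; the stochastic ray-tracing stage would then fill in the late reverberant tail.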
Therefore, it is possible to simulate the sound behavior from a 3D model description of the space and details of its acoustic properties. From that simulation result, any music or sound can be exported as if recorded inside that space at the given positions of source and receptor [288]. Another option is to export the room impulse response, which represents the sound behavior for the given source-receptor positions. In version 12 of the Odeon software, the RIR can also be exported as a BRIR or as first- or second-order Ambisonics.

The materials selected to compose the simulation, and their corresponding absorption coefficients used in the ODEON simulation, are listed in Appendix E. In total, 72 different RIRs (5 degrees of separation) were simulated for different source-receptor positions. The simulated source positions were at the same distance of 1.35 meters from the center as the loudspeakers in the anechoic room. These RIRs were convolved with the appropriate sweep signal, producing a four-channel first-order Ambisonics sweep signal. These signals were then processed by a decoder for the loudspeaker array's specific positions, generating the auralized 24-channel files. The inverse filter procedure was applied to each loudspeaker, as was the calibration of the sound pressure level across loudspeakers. The alpha factor was calculated as \alpha_{\mathrm{rms}} = \frac{1}{\mathrm{rms}(v(t)_{1\,\mathrm{kHz}})} \frac{\mathrm{Pa}}{\mathrm{VFS}}, since the recorded input was from a B&K type 4231 sound calibrator delivering 1 Pa. The equalized, convolved, decoded, and filtered sweep signals contain the simulated source-receptor sound distributions in magnitude, time, and space as if recorded inside the simulated room. In this experiment, the simulated room has an absorption coefficient equal to one on all surfaces, simulating the anechoic condition. The setup in first-order Ambisonics was chosen given the possibility of exploring a reduction in the number of loudspeakers in future experiments and the possibility of generating it through validated software such as Odeon.
3.3 Results

In this study, the performance of the system was evaluated by collecting and analyzing results based on the positions of a mannequin within the virtual sound field (i.e., center and off-center) and on the conditions under which the system was tested (i.e., with and without the presence of a second head-and-torso simulator). The results were presented in terms of angles referenced counter-clockwise, which allowed for a detailed analysis of the system's performance under various conditions. Through this analysis, it was possible to gain a comprehensive understanding of the system's capabilities and identify potential areas for improvement.

3.3.1 Analysis

The signals were played and simultaneously recorded; the recorded result carried the auditory spatial effects of the auralization and also the physical limitations given by the virtualization setup (e.g., the loudspeakers' frequency responses and the presence of loudspeakers inside the room). As the recorded sweep is longer than the original one, zero-padding was performed; in that process, zeros are appended to the end of the time-domain signal, which nonetheless yields the equivalent convolution [242]. After that, it was possible to calculate the virtual environment's impulse response by dividing the recorded signal by the zero-padded version of the initial sweep, both in the frequency domain.

For both measurements, the interaural time difference is calculated by comparing the sound's arrival time between the two channels of a binaural room impulse response (BRIR). There are different methods for ITD calculation [132, 314]. In this work, ITDs were estimated as the delay that corresponds to the maximum of the normalized interaural cross-correlation function (IACF). According to ISO 3382-1:2009 [127], the IACF is calculated as:

\mathrm{IACF}_{t_1,t_2}(\tau) = \frac{\int_{t_1}^{t_2} p_L(t)\, p_R(t+\tau)\, \mathrm{d}t}{\sqrt{\int_{t_1}^{t_2} p_L^2(t)\, \mathrm{d}t \int_{t_1}^{t_2} p_R^2(t)\, \mathrm{d}t}},    (3.5)

where p_L(t) is the impulse response at the entrance of the left ear canal and p_R(t) is that for the right canal. The interaural cross-correlation coefficients, IACC [127], are given by:

\mathrm{IACC}_{t_1,t_2} = \max |\mathrm{IACF}(\tau)|, \quad \text{for } -1\,\mathrm{ms} < \tau < 1\,\mathrm{ms}.    (3.6)

Similarly, to calculate the interaural level difference (ILD), a fast Fourier transform (FFT) is applied to the time-domain impulse responses, the spectrum is divided into averaged octave bands, and the ratio in dB between the frequency magnitudes is calculated as the ILD:

\mathrm{ILD}(n) = 20 \log_{10} \frac{\sqrt{\int p_R^n(t)^2\,\mathrm{d}t}}{\sqrt{\int p_L^n(t)^2\,\mathrm{d}t}},    (3.7)

where n is the given frequency band, p_R^n(t) is the band-passed right impulse response, and p_L^n(t) is that of the left channel.

3.3.2 Centered position

In the centered-position configuration (Figure 3.6a), the
listener remains at the ideal VSE position (center) to focus on the effect of an added listener inside the loudspeaker ring. This framework can be valuable to auditory research, as it can be used to analyze group responses to interviews, arguments, collaborative work, social stress, or disputes between individuals in listening tasks.

The IACC for the frontal angle (0°) across frequencies is shown in Figure 3.7. High values indicate that the system delivers the same signal to both ears. Conversely, the drop in IACC values at high frequencies can indicate that Ambisonics may fail to render specific frequencies, affecting the octave-band analysis. The IACC values measured across all angles for VBAP and Ambisonics can be found in Figure 3.8. They indicate that Ambisonics tends to provide less lateralization at lower frequencies (constant and higher IACC values) and lower but constant values at high frequencies, possibly translating to blurred sound localization.
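The cue extraction defined in Equations 3.5-3.7 can be sketched as follows. This is a simplified version that assumes a two-channel BRIR is already available, integrates over the whole response rather than a time window, and uses a synthetic impulse pair instead of measured data:

```python
import numpy as np

FS = 48_000  # measurement sampling rate (Section 3.2.1)

def itd_iacc(p_l, p_r, fs=FS, tau_max_ms=1.0):
    """ITD and IACC from the normalized IACF (Eqs. 3.5 and 3.6).

    The IACF is evaluated for lags within +/- 1 ms; the ITD is the lag
    of the maximum of |IACF| and the IACC is that maximum itself.
    """
    max_lag = int(fs * tau_max_ms / 1000)
    norm = np.sqrt(np.sum(p_l**2) * np.sum(p_r**2))
    lags = np.arange(-max_lag, max_lag + 1)
    iacf = np.array([np.sum(p_l * np.roll(p_r, -lag)) for lag in lags]) / norm
    best = np.argmax(np.abs(iacf))
    return lags[best] / fs, np.abs(iacf[best])

def ild(p_l, p_r):
    """Broadband ILD as the dB ratio of channel energies (Eq. 3.7
    without the octave-band splitting used in the thesis)."""
    return 20 * np.log10(np.sqrt(np.sum(p_r**2)) / np.sqrt(np.sum(p_l**2)))

# Toy BRIR: the right ear leads by 10 samples and is 6 dB stronger
p_l = np.zeros(1024); p_l[100] = 1.0
p_r = np.roll(p_l, -10) * 2.0
tau, coherence = itd_iacc(p_l, p_r)
print(round(abs(tau) * 1e6))     # 208 (10 samples at 48 kHz, in microseconds)
print(round(ild(p_l, p_r), 1))   # 6.0
```

On measured BRIRs, the ITD estimate would follow the low-pass filtering described in Section 3.3.2.1, and the ILD would be evaluated per octave band; the sign convention of the lag depends on how the correlation is defined.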
Figure 3.7: Interaural cross-correlation as a function of frequency in octave bands - frontal angle 0°.

Figure 3.8: Interaural cross-correlation for averaged octave bands in the Ambisonics and VBAP techniques, represented in polar coordinates.

That can happen due to a tilt in positioning the HATS or to imprecision in the virtualization system. For example, a high-frequency sound wave at 8 kHz has a wavelength of approximately 4 cm, and approximately 2 cm at 16 kHz, which means that even a slight tilt can influence high-frequency IACC. Furthermore, the inverse FIR filter applied was not the inverse of the broadband signal but was filtered in third-octave bands. That decision was a signal processing compromise, as a broadband filter would only partially compensate for the loudspeakers' geometry or phase differences at high frequencies. This point can be further investigated as a way to improve the Ambisonics reproduction.

There is a relative increase of variation with frequency in the VBAP results, which is present to a lesser extent in the Ambisonics IACC results. That reveals a difficulty for Ambisonics in driving a good sense of localization, as a high coherence level indicates sound coming from the front or back [58]. At the same time, because Ambisonics activates all available loudspeakers to render the sound in the sweet-spot area, the sense of immersion is higher.

3.3.2.1 Centered ITD

The ITD results presented were obtained
after a tenth-order low-pass Butterworth filter (LPF) was applied. The filter's cutoff frequency was 1,000 Hz, to approximate the low-frequency dominance in ITD [38, 124, 197, 242].

Vector Based Amplitude Panning

The light blue line in Figure 3.9 shows the ITD results for the initial setup (HATS alone, centered). The system presented a magnitude peak at a response time of approximately 650 µs, which corresponds to approximately 22 cm for a wave traveling at the velocity of sound propagation in air. This distance is comparable to the distance between the HATS microphones (19 cm). It is appropriate to note that the symmetry of the HATS is also present in the HATS-alone results (triangles in Figure 3.9), providing reassurance about the quality of the collected data.

The HATS was kept in the center of the loudspeaker ring for the next set of measurements. A second listener's influence was then simulated by introducing a KEMAR and varying its position along the lateral axis (x-axis). The results are presented in Figure 3.9.

Figure 3.9: a) HATS alone at center. b) Light blue line: HATS alone at center. Black line: HATS centered and KEMAR at 0.5 m to the right. Blue line: HATS centered and KEMAR at 0.75 m to the right. Red line: HATS centered and KEMAR at 1 m to the right.

The ITD data obtained from this experiment make it possible to see that the second mannequin (KEMAR) acts as an obstacle affecting the interaural time difference at the HATS in the center of the loudspeaker ring. At the closest position of the second listener (50 cm from the center), there is a reduction of the ITD values (angles between 285 and 305 degrees); the maximum difference is 50 µs. That effect is related to the insertion of the physical obstacle represented by the second listener. As the sound wave diffracts, different paths to the listener's ears are imposed, reducing the difference in the sound's arrival time between the ears. The effect should therefore be centered at 270 degrees; however, the second listener was not perfectly aligned with the lateral axis of the centered listener. That was a limitation of the experiment, as the KEMAR was placed on an ordinary chair whose bottom is not flat.

Ambisonics

The ITD results
for the initial setup (HATS alone, centered), virtualized via Ambisonics auralization, are presented in Figure 3.11. The system showed a magnitude peak at a response time of roughly 600 µs, 50 µs lower than with the VBAP method. Another characteristic of the Ambisonics ITDs is the flat behavior around the lateral angles, which is generated mainly by the chosen order of the Ambisonics auralization. In first order, the horizontal directivity is determined by the intersection of bi-directional (figure-eight) sensitivity patterns combined with an omnidirectional one, as illustrated in Figure 3.10. That can also limit the localization performance when utilizing first-order Ambisonics, even when reproduced through a higher number of loudspeakers.

Figure 3.10: Horizontal 2D Ambisonics directional sensitivity representation. The red line represents an omnidirectional pattern, the black line represents a bidirectional pattern, y-axis oriented (null points at the sides), and the purple line is a bidirectional pattern representation, x-axis oriented (null points at the front and the back).

The HATS was kept in the center of the loudspeaker ring, and a second listener's influence on the sound field was simulated by introducing a KEMAR at three different positions along the x-axis: 50, 75, and 100 cm to the left of the HATS (i.e., at 270°). The results are presented in Figure 3.11 by the black, blue, and red lines. The data clearly demonstrate that, as an obstacle, the second listener (KEMAR) does not influence the interaural time difference when using Ambisonics with the HATS at the center of the loudspeaker ring.

Figure 3.11: a) HATS alone at center. b) Light blue line: HATS alone at center. Black line: HATS centered and KEMAR at 0.5 m to the right. Blue line: HATS centered and KEMAR at 0.75 m to the right. Purple line: HATS centered and KEMAR at 1 m to the right.

3.3.2.2 Centered ILD

The effects at higher frequencies due to a second listener require
an analysis of a different parameter. Instead of studying the difference in the arrival time of the sound between the ears, the representative metric is the level difference between the ears. Effects such as absorption, reflection, and diffraction occur before the sound pressure signal reaches the eardrums: the torso, shoulders, outer ear, and pinna mechanically affect an incoming sound wave. These effects are angle- and frequency-dependent, as waves of different frequencies have different wavelengths [39, 40, 90].

The effects on ILD caused by the virtualization process were calculated as the differences between the reference ILDs, measured with the HATS alone and centered, and the ILDs measured with the HATS and a second mannequin (KEMAR). As a reference, Figure 3.12 presents the ILDs for each method at twelve different angles (30-degree separation) around the listener.

Figure 3.12: Interaural level differences as a function of octave-band center frequencies at twelve different angles around the central point.

There are differences between the ILDs calculated from measurements with the two techniques in the energy of the averaged octave bands. However, the ILDs from VBAP present a more significant, and more natural, dependence on incidence angle than those from Ambisonics [222]. Furthermore, the ILD peak for Ambisonics is observed around 2,000 Hz, which can be interpreted as the frequency limit for reproducing level differences between the ears when decoding through 24 loudspeakers [299]. A more comprehensive comparison between the techniques with the HATS centered alone can be observed in the heatmap representation of Figure 3.13, which includes all 72 measured angles (5-degree separation). The homogeneity across angles in the Ambisonics measurements indicates that its ILD lacks precision as a binaural spatial cue. Localization accuracy in Ambisonics reproduction, especially at lateral angles, is highly dependent on its order (acquisition and reproduction) [27].

Figure 3.13: Interaural level differences in averaged octave bands as a function of azimuth angle for a HATS Brüel and Kjær Type 4128-C in the horizontal plane.

Figure 3.14 shows the energy difference across the octave bands for eight different incidence angles for both techniques, with and without the presence of the second mannequin. In both techniques, the strongest influence happens when the second mannequin is closest to the center. The second listener is at the right angle in VBAP (270°), while in Ambisonics it is positioned to the left (90°).

Figure 3.14: Interaural level differences (octave bands) at angles around the central point considering different displacements of the second listener.

The ILDs calculated from the measurements with the second mannequin present are not extensively different from the reference ILD. The difference is proposed to be observed as a distortion parameter. These differences were calculated by subtracting the ILDs with the second mannequin from the specified center-alone reference ILD. Ideally, all graphs should be black for a full match (no difference between setups/positions), meaning no measured distortion.

Vector Based Amplitude Panning

Figure 3.15 presents the differences between the ILDs calculated from the HATS centered (HC) and the configurations that combine the HATS centered plus the KEMAR in one of the three positions (e.g., HC K-50 is the notation defined for HATS centered and KEMAR at 50 cm to the right). The sounds were auralized via VBAP for all 72 angles (5° spacing). The angles that correspond to loudspeaker locations (15° spacing) were reproduced directly by the physical loudspeaker at that angle.

Figure 3.15: VBAP discrepancies in ILD between HATS at the center and: (Top) HATS at the center plus KEMAR at 50 cm to the right, (Middle) HATS at the center plus KEMAR at 75 cm to the right, (Bottom) HATS at the center plus KEMAR at 100 cm to the right.

The differences at frequencies over 1 kHz are pronounced for angles to the right side of the centered HATS, 270-305° azimuth.
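The octave-band ILD of Equation (3.7) and the distortion parameter defined above (the ILD with the second mannequin minus the center-alone reference, all zeros meaning a full match) can be sketched as follows; the band centers and the filter design are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def ild_octave_bands(p_left, p_right, fs,
                     centers=(250, 500, 1000, 2000, 4000, 8000)):
    """Equation (3.7): per-band level ratio in dB between the
    right and left band-passed impulse responses."""
    ild = []
    for fc in centers:
        sos = butter(4, [fc / np.sqrt(2), fc * np.sqrt(2)],
                     btype="bandpass", fs=fs, output="sos")
        r = sosfiltfilt(sos, p_right)
        l = sosfiltfilt(sos, p_left)
        ild.append(20 * np.log10(np.sqrt(np.sum(r ** 2))
                                 / np.sqrt(np.sum(l ** 2))))
    return np.array(ild)

def ild_distortion(ild_with_kemar, ild_reference):
    """Distortion parameter: per-band difference from the
    center-alone reference ILD (zero means no measured distortion)."""
    return ild_with_kemar - ild_reference
```

For instance, a right-channel response with twice the amplitude of the left yields 20·log10(2) ≈ 6 dB in every band, and comparing a measurement against itself yields an all-zero distortion map.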
Smaller effects can also be noted at other angles that correspond to virtual sound sources (where there is no loudspeaker and the sound source is produced via the auralization technique). These effects diminish as the second mannequin's position moves farther from the centered receptor, indicating a smaller acoustic shadow.

Figure 3.16: VBAP interaural level differences as a function of azimuth angle around the centered listener.

Figure 3.16 shows the ILD in six octave bands from impulse responses recorded with files auralized using VBAP. The HATS-centered (HC) position refers to the HATS alone, and it is compared to the configurations adding the second listener (KEMAR) at three different positions, 50, 75, and 100 cm displaced from the center (K+50, K+75, and K+100, respectively). The mismatch is pronounced when the KEMAR is closer (blue line), especially at the angles blocked by the KEMAR. As the second listener blocks the sound wave, an acoustic shadow is created, which reduces the sound energy at the ear facing the sound source, decreasing the level difference between the ears. There is also a reduction in ILD for angles from 35 to 50 degrees. That can be related to the opposite effect, where the mannequin reflects part of the sound, increasing the level at the centered HATS's opposite ear. These findings support the interpretation that a substantial effect occurs on the ILDs at the KEMAR's closest position.

Ambisonics

Figure 3.17 presents the calculated differences between the ILDs from the Ambisonics auralization with the same configurations (i.e., HC vs. HC K-50, HC vs. HC K-75, and HC vs. HC K-100). For convenience, the second mannequin was positioned to the left of the center (90°). The switch from right to left does not affect the comparison, as both the HATS and the Eriksholm test room are symmetric. Figure 3.18 shows the ILD in octave bands, highlighting that the strongest effect is at 8 kHz.

Figure 3.17: Ambisonics discrepancies in ILD between HATS at the center and: (Top) HATS at the center plus KEMAR at 50 cm to the left, (Middle) HATS at the center plus KEMAR at 75 cm to the left, (Bottom) HATS at the center plus KEMAR at 100 cm to the left.
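As background for the first-order behavior discussed in this section, the sketch below illustrates 2D first-order Ambisonics encoding (one omnidirectional and two figure-eight components, as in Figure 3.10) and a basic sampling decoder for a uniform loudspeaker ring. The gain and normalization conventions are illustrative assumptions, not the exact decoder used in this work.

```python
import numpy as np

def encode_first_order_2d(azimuth_rad):
    """2D first-order B-format: an omnidirectional W component plus
    the two horizontal figure-eight components X and Y."""
    w = 1.0 / np.sqrt(2.0)       # common W scaling convention (assumed)
    x = np.cos(azimuth_rad)      # figure-eight, front-back oriented
    y = np.sin(azimuth_rad)      # figure-eight, left-right oriented
    return np.array([w, x, y])

def decode_first_order_2d(bformat, n_speakers=24):
    """Basic (sampling) decoder for a uniform ring: most loudspeakers
    receive signal for any source angle, which is consistent with
    first order engaging the whole array around the sweet spot."""
    angles = 2 * np.pi * np.arange(n_speakers) / n_speakers
    w, x, y = bformat
    gains = (np.sqrt(2.0) * w
             + 2.0 * (x * np.cos(angles) + y * np.sin(angles)))
    return gains / n_speakers
```

With this convention, the decoder gains for a source at azimuth θ follow (1 + 2 cos(θ − φ_i))/N per loudspeaker i, so the gains sum to one and peak at the loudspeaker nearest the source.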
The results demonstrate that including a second listener has a negligible effect on first-order Ambisonics ILDs. However, in the reference measurement (HATS alone), the ILDs did not adequately reproduce this spatial cue throughout the angles around the listener, given the observable minor ILD differences across angles, especially over 2 kHz.

Figure 3.18: Ambisonics interaural level differences as a function of azimuth angle around the centered listener.

3.3.3 Off-centered position

Being able to have the participant away from the center of the loudspeaker ring can be valuable for testing simultaneous participants or a particular
participants or a particular ph ysical apparatus’ influence ( e.g. Listening effort ev aluated
under presence of another individual [ 230 ]). Auditory research that aims to test the influence
of a particular noise, SNR, or the direction of the noise on the in teraction in participan ts’
con versation can benefit from a setup that would make it p ossible to virtualize a sound scene
and presen t it without spatial distortions. Mea- suremen ts aiming to study the influence of
off-center HA TS displacement w ere p erformed in nine differen t configurations: with HA TS and
KEMAR indep en- den tly displaced 25, 50, and 75 cm from the cen ter, resulting in separations
of 50, 75, 100, 125, and 150 cen timeters (See Figure 3.6b ). The listening p osition is
critical to the auralization pro cess tec hniques pre- sen ted in this w ork as they are derived
and programmed to render the sound in the center of a loudsp eaker arra y . Adding computer p o
w er to real-time pro- Chapter 3. Results 83 cessing could handle participan t mo vemen ts;
although that can b e considered, it w as not in this part of the exp erimen t scop e. Suc h pro
cessing fo cuses on dynamics (head motion). The fo cus here is the effects of sub-optimal p
ositions and the influence of a second listener as an obstacle to the sound field. 3.3.3.1 Off-cen
ter ITD The effects of off-center p ositioning on sound’s arriv al time can affect the sub- jectiv
e p erception of the sound incidence direction. V ector Based Amplitude P anning Observing the
ITD results shown in Figures 3.19, 3.20, and 3.21, almost no influence of the second mannequin (KEMAR) can be noted, even with the HATS off-center. The ITD at off-center positions deviates from the centered-HATS ITD in the same proportion regardless of the second listener's (KEMAR's) position. Nonetheless, Figure 3.22 shows that a pronounced effect appears when shifting the HATS off-center. When the displacement exceeds 25 cm, the spikes represent a difficulty of the vector-based amplitude panning process in generating the virtual sound sources. This behavior is expected, as the VBAP mathematical formulation is derived from a unitary vector pointing to the center.

Figure 3.19: ITD as a function of source angle. Light blue line: HATS alone at the center. Black line: HATS at -25, KEMAR at +25. Blue line: HATS at -25, KEMAR at +50. Red line: HATS at -50, KEMAR at +75.

Figure 3.20: ITD as a function of source angle. Light blue line: HATS alone at the center. Black line: HATS at -50, KEMAR at +25. Blue line: HATS at -50, KEMAR at +50. Red line: HATS at -50, KEMAR at +75.

In Figures 3.20 and 3.21 it is possible to observe more considerable distortions in the ITD for the virtual sound sources (sharp peaks crossing the reference line, in addition to being offset from it). Such distortions increase as the HATS is moved away from the central position. Sound sources reproduced using VBAP in this loudspeaker ring at these receptor positions would not be correctly interpreted in terms of direction by the listener. The ITD difference is greater when the sound sources are at angles close to the front or rear (0° and 180°) directions. This effect is related to the HATS's physical displacement. The ITD results at lateral angles present a larger lobe at the HATS's right ear (270°) and a sharpened lobe at the HATS's left ear (90°), which shows the off-center displacement. This effect occurs because the HATS is not at the center of the ring (see Figure 3.23b), and the angles and separations between the loudspeakers are modified. The effect is even more apparent when looking only at the ITDs of the real sound sources (angles corresponding to loudspeaker locations), without the distortions created by the VBAP auralization (see Figure 3.23a).

Figure 3.21: ITD as a function of source angle. Light blue line: HATS alone, centered. Black line: HATS at -75, KEMAR at +25. Blue line: HATS at -75, KEMAR at +50. Red line: HATS at -50, KEMAR at +75.

Figure 3.22: ITD as a function of source angle. Light blue line: HATS alone, centered. Black line: HATS at -25, KEMAR at +25. Blue line: HATS at -50, KEMAR at +50. Red line: HATS at -75, KEMAR at +75.

Figure 3.23: a) ITD for real sound sources. Light blue line: HATS alone, centered. Black line: HATS at -25, KEMAR at +25. Blue line: HATS at -50, KEMAR at +50. Red line: HATS at -75, KEMAR at +75. b) Scheme of the HATS off-center position at -75 cm, facing the third loudspeaker.

Ambisonics

The VBAP method
constructs the auditory spatial cues through one to three loudspeakers in this setup, usually in the same quadrant. Ambisonics, in contrast, uses all the available loudspeakers in the rendering process. Hence, sound localization benefits from VBAP auralization compared to Ambisonics due to the nature of the methods [104, 105, 175, 180, 221]. Furthermore, the ITD results observed for first-order Ambisonics reflect the method's limitation in sweet-spot size. Figure 3.24 shows the calculated ITD in three different configurations, H+25 K-25, H+50 K-50, and H+75 K-75, plus the center configuration for comparison. To improve readability, the ITD results for the remaining spatial configurations (which were similar across conditions) can be found in Appendix A. The expected size of the listening area is 20 cm when combining 24 loudspeakers to reproduce Ambisonics in a 2D horizontal matrix [299].

Figure 3.24: ITD as a function of source angle in the Ambisonics setup. Light blue line: HATS alone, centered. Black line: HATS at -25, KEMAR at +25. Blue line: HATS at -50, KEMAR at +50. Red line: HATS at -75, KEMAR at +75.

A displacement of 25 cm or greater puts the receptor outside the sweet spot. Therefore, it is possible to observe in Figure 3.24 that Ambisonics does not virtualize this acoustic cue correctly outside the center position, as the values remain mostly constant for the side being played.

3.3.3.2 Off-center ILD

ILDs can
be highly sensitive to the listener's position in a virtualized sound field, given the smaller wavelengths involved. The composition of a virtualized sound wave is performed by simultaneously combining sounds from several sound sources, which requires a highly precise combination. This section investigates the ILD changes due to having the listener away from the optimal position while another listener is present, i.e., the ILD behavior when the HATS and a second participant are away from the center.

A comparison of the ILD results across the positions is shown in Figure 3.25; for both techniques, it presents the calculated ILDs over frequency for eight incidence directions spaced 45 degrees apart in azimuth, at three different positions plus the centered position as a reference. The pattern deviation as the receptor is moved from the center is not the same across the techniques. As expected, the physical construction of the summed sound wave in Ambisonics, which relies on all loudspeakers, has a higher impact on the ILDs than VBAP, which combines only a few sound sources from the same quadrant.

Figure 3.25: ILD as a function of frequency at different angles (line color) for VBAP (top row) and Ambisonics (bottom row) for symmetrical displacement in off-center setups.

For files auralized through VBAP, the discrepancies between the ILD measured with the HATS in the center (optimal position) and at the other positions can be interpreted as acoustic artifacts capable of conveying a wrong localization of the sound source. Although the second listener did not have a primary influence, the observed displacement from the center affects the ILD pattern, especially at higher frequencies. For Ambisonics, the listener position is critical: the ILD differences from the center to the off-center positions create artifacts that compromise the ILD as a cue for sound localization at all tested positions.

Vector Based Amplitude Panning

The
top row of Figure 3.25 shows the ILD screening at some of the incidence angles. A comprehensive visualization of the ILDs across angles is presented in Figure 3.26 for the reference centered (top) and off-centered positions. There is an effect on the ILDs when moving the receptor from the center position and adding a second listener inside the loudspeaker ring. Although noticeable, the effect still preserves the pattern, allowing the differences to be interpreted as artifacts. The vertical zero-ILD lines indicate the frontal and rear angles (0° and 180°), where the sound should arrive at the ears with the same level. These vertical black lines shift as the listener is displaced from the center: at 75 cm displacement, the lowest-value vertical lines in Figure 3.26 appear at 35° (frontal) and 145° (rear).

Figure 3.26: VBAP setups: ILD at the centered position (top); ILD in off-center setups: HATS at 25 cm to the left with KEMAR at 25 cm to the right (middle top); HATS at 50 cm to the left with KEMAR at 50 cm to the right (middle bottom); HATS at 75 cm to the left with KEMAR at 75 cm to the right (bottom).

The differences between the ILD with the HATS in the reference position (alone and in the center) and the configurations with the HATS outside the center simultaneously with the KEMAR are shown in Figures 3.27, 3.28, and 3.29. The acoustic field behavior outside the center of the ring at frequencies above 1 kHz presents significant ILD differences for the measured configurations, especially at angles that are virtual sound sources. The ILD difference reaches up to 15 dB. As with the ITD, the ILD data from the HATS in the off-center position show the acoustic shadowing effect caused by the KEMAR.

Figure 3.27: VBAP differences in the ILD between the centered-alone and off-center-with-KEMAR setups: HATS at 25 cm to the left with: KEMAR at 25 cm to the right (top); KEMAR at 50 cm to the right (middle); KEMAR at 75 cm to the right (bottom).

Figure 3.28: VBAP differences in the ILD between the centered setup and 25 cm off-center VBAP setups: HATS at 50 cm to the left with: KEMAR at 25 cm to the right (top); KEMAR at 50 cm to the right (middle); KEMAR at 75 cm to the right (bottom).

Figure 3.29: VBAP differences in the ILD between the centered setup and off-center setups: HATS at 75 cm to the left with: KEMAR at 25 cm to the right (top); KEMAR at 50 cm to the right (middle); KEMAR at 75 cm to the right (bottom).
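The VBAP formulation referred to throughout this chapter, with gains derived from unit vectors pointing from the center toward the loudspeakers, can be sketched for the 2D pairwise case as follows; the loudspeaker layout and the power normalization are illustrative assumptions, not the exact implementation used in this work.

```python
import numpy as np

def vbap_2d_gains(source_az_deg, speaker_az_deg):
    """2D pairwise VBAP: find the loudspeaker pair enclosing the
    source direction, solve g = p @ inv(L) for the pair's unit-vector
    base matrix L, and normalize the gains to constant power."""
    p = np.array([np.cos(np.radians(source_az_deg)),
                  np.sin(np.radians(source_az_deg))])
    n = len(speaker_az_deg)
    gains = np.zeros(n)
    for i in range(n):
        j = (i + 1) % n
        # Rows of L are the unit vectors of the candidate pair
        L = np.array([[np.cos(np.radians(speaker_az_deg[i])),
                       np.sin(np.radians(speaker_az_deg[i]))],
                      [np.cos(np.radians(speaker_az_deg[j])),
                       np.sin(np.radians(speaker_az_deg[j]))]])
        g = p @ np.linalg.inv(L)
        if np.all(g >= -1e-9):      # source lies between this pair
            g = np.clip(g, 0.0, None)
            g /= np.linalg.norm(g)  # constant-power normalization
            gains[i], gains[j] = g
            return gains
    return gains
```

On a 24-loudspeaker ring with 15° spacing, a source aligned with a loudspeaker drives that loudspeaker alone, while a source halfway between two loudspeakers splits the power equally between them, which mirrors the real versus virtual source distinction made in the measurements above.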
The closer the KEMAR is positioned to the HATS, the greater the discrepancies in ILD that occur around positions near 270 degrees. This effect is due to the diffraction and absorption of the sound at the second listener (KEMAR) and happens for both real (loudspeaker) and virtual sound-source locations.

Ambisonics

Ambisonics presents a more considerable limitation regarding movement outside the center of the ring due to its nature. The sound composition requires a combination of amplitude and phase from all available loudspeakers, with the correct representation achieved only for an area at the center and without obstructions. The ILD in octave bands is shown in Figure 3.30. The low amplitude and homogeneity across frequencies demonstrate that Ambisonics is limited in rendering the proposed binaural cue, not appropriately delivering the level differences outside the center. The ILD differences from the off-center positions to the HATS centered are presented in Appendix B.

Figure 3.30: Ambisonics setups: ILD at the centered position (top); ILD in off-center setups: HATS at 25 cm to the left with KEMAR at 25 cm to the right (middle top); HATS at 50 cm to the left with KEMAR at 50 cm to the right (middle bottom); HATS at 75 cm to the left with KEMAR at 75 cm to the right (bottom).

3.4 Discussion

Once the listener is centered in the loudspeaker array, the second listener did not affect the auralization other than at the angles physically shadowed by the second listener. Thus, a second listener does not deteriorate the spatial cues in either auralization technique analyzed in this work. For VBAP, discrepancies in ITD only occurred when the second listener was positioned 50 cm away, the closest measured position in
this experiment. Also, the differences in ILD for VBAP are more notable at the second listener's closest position. Concurrently, Ambisonics did not present an apparent difference in ITD for a centered listener when a second listener was placed inside the ring. The difference in the Ambisonics ILDs from the centered reference indicates an acoustic shadow (this time at the left angle of 90 degrees) and an additional slight difference across the other angles.

There is an apparent effect on ITD as the listener is moved out of the center. For VBAP, the magnitude peak remains practically the same, approximately 650 microseconds, while the ITD zero value (sound reaching both ears simultaneously) is shifted. At 75 cm off-center to the left side, the difference in arrival time corresponds to a shift of approximately 30 degrees. That is in line with the setup, as the mannequin was placed in front of another loudspeaker. However, the Ambisonics ITDs demonstrate that the composition of magnitude and phase is not completed at off-centered positions. The Ambisonics weights are calculated so that the sound waves from the loudspeakers interact at the center position and form a sound field representing a sound wave from a defined incidence angle. Moving the primary listener to the right makes the interaction between the loudspeakers inaccurate. In this case, the time difference becomes wrong because the low Ambisonics truncation order increases the aliasing effect, as can be similarly observed for third and fifth orders in Laurent et al. [275]. The sound from the right mainly reaches the right ear and travels on to the left ear before the sound from the left side can travel the extra distance. That is an expected effect, since even the minimum displacement (25 cm) is larger than the expected reproducible area (around 20 cm) for this setup.

There was no difference observed as the second listener's (KEMAR's) position was changed (25, 50, and 75 cm to the right of the center) in any of the VBAP measurements with the HATS positioned to the left of the center. Considering that the ITD just-noticeable difference (JND) in anechoic conditions is of the order of 10 to 20 microseconds [38, 140, 241], the ITD results when the off-center HATS position was 25 cm to the left were a good approximation of the reference centered measurement. That means that a listener relying only on the ITD cue would not be able to discern a difference concerning the direction of incidence if placed at these positions. It is also worth considering that the JND in reverberant conditions is even higher [140] and that the artifact can be masked by reverberation [97], which would benefit the auralization process. The HATS measurements at the 50- and 75-cm positions present peaks and crossover values across the line that corresponds to the centered ITD, which indicate distortion problems at low frequencies regarding this spatial cue. A similar analysis of the KEMAR impact on ITDs from the Ambisonics virtualization cannot be achieved, since the ITD is not accurately rendered outside of the sweet spot.

To perform the off-center ILD analysis, each interaural level difference result for a position combination (HATS and KEMAR) was subtracted from the HATS-alone results. In the VBAP method, a shadow effect generated by the second listener is present, as expected, mainly when the first listener is 25 or 50 cm left of center. However, the differences at high frequencies are essentially at virtual sources, which indicates the difficulty of creating the virtual-sound-source impression outside the center position, independently of the second listener's presence [2]. Off-center positions did not allow the accurate synthesis of the ILDs from the loudspeakers using Ambisonics. The method did not reproduce time or level differences accurately in these conditions, which could lead to not achieving the correct spatial impression. That is in line with the literature: although generally investigating higher Ambisonics orders, studies report the complexity of accurately rendering high-frequency cues [279, 290], as well as the off-center increase in accuracy obtained by increasing the Ambisonics order with a proper number of loudspeakers [275].

It should be noted that the current study did not measure changes in ITD and ILD for off-center listener positions without the presence of a second listener. Based on the effects of having the first listener off-center with a second listener present, coupled with the smaller changes with a second listener when the first listener is centered, it can be deduced from the current results that the off-center position has a degrading effect on the ITD and ILD. Considering that many simulations are limited by a "sweet spot" for the listener(s), the off-center position, as opposed to the presence of a second listener, is probably the greatest liability for multi-listener methods in hearing research.

3.5 Concluding Remarks

The more demanding the test requirements in terms of localization of the sound source (beyond left, right, front, and back), the more the researcher should move towards VBAP. In the case of fixed positions and a requirement for a greater sense of immersion, Ambisonics should be able to build more convincing sound scenarios. The techniques do not affect the ILD and ITD acoustic cues at the central position for one test participant. The addition of a second listener within the ring also does not significantly affect these parameters at the three distances tested, except for the angles hidden by the second listener's shadow. Thus, it is suitable to move towards subjective tests with a centered participant and an actor on the side. Although the second listener did not deteriorate the techniques, they present different performances in terms of spatial representation and, notably, a different sense of immersion. Thus, the purpose of the test to be designed must be taken into account when defining the auralization method.

There is a clear degradation when two test subjects are simultaneously present, both in off-center positions, regardless of the distance of the second listener. The VBAP measurements showed increasing ITD differences with increasing distance from the center, and significant differences in ILD. These differences indicate the creation of acoustic artifacts, possibly generated by the method's difficulty in correctly virtualizing high frequencies outside the sweet spot. For the ITD parameter, the position displaced 25 cm from the center shows little difference or evidence of artifacts generated by virtualization errors.
At the same time, the other distances present significant differences and artifacts. The binaural
cues analysis suggests that VBAP is less sensitive to the participant p ositions than the
Ambisonics setup. Ho w ev er, it is relev ant to note that although the differences in the
binaural cues denote differences in audio spatialization, reflecting on the p erceived angle of
incidence of the sound, b oth techniques can b e calibrated to repro duce the stim uli at a
desired lev el of sound pressure. That means that an auralized sound can b e repro duced with
the correct sound pressure level although its direction ma y not b e correctly interpreted b y
the listener as their binaural cues are not b eing deliv ered appropriately . Chapter 4 Sub
jectiv e Effort within Virtualized Sound Scenarios This exp eriment was a collaborative study
(EcoEG [3]) with fellow HEAR-ECO PhD student Tirdad Seifi-Ala, also from the University of Nottingham, that combined the virtualization of sound sources and electroencephalography (EEG) to assess listening effort in ecologically valid conditions. Both students contributed equally to the study design, preparation, data collection and interpretation. TSA additionally performed the data analysis; SA additionally performed the room simulations, stimuli preparation, software interface and sound calibration. As definitions can vary, this chapter uses the following terms:

• Simulation: numerical acoustic simulation of the spatial behavior of a sound in a defined space.

• Auralization: creation of a file that can be converted to a perceivable sound and contains spatial information.

• Sound Virtualization: reproduction of an auralized sound file through loudspeakers or headphones.
4.1 Introduction

The interest from researchers and clinicians in listening effort measures has grown recently [83, 135, 210], and the importance of studying listening effort in an ecologically valid sound environment follows the same trend [134]. The previous chapter discussed the feasibility and constraints of the virtualized sound field through binaural cues and the foreseeable effects on spatial impression and localization. This chapter investigates whether reverberation and the signal-to-noise ratio (SNR) are reflected in behavioral data, used as a proxy for subjective listening effort, in a virtualized sound environment.

Reverberation is the accumulation of energy reflections (sound) in an enclosed space that creates diffusion in its sound field [256]. Reverberation time (RT), in turn, is an objective parameter that represents the amount of time required to dissipate the energy of a sound source by one-millionth of its value (60 dB) after the source has ceased [254]. This parameter was reviewed in Section 2.3.3.2. The remaining sound energy can blur the auditory cues and the rapid transitions between phonemes and decrease the low-frequency modulation of a signal; it may compromise speech intelligibility [39, 112]. Since reverberation is a complex phenomenon, depending on space and frequency [111, 185], a wide range of physical-acoustical factors may limit some comparisons: for example, the reproduction method, the masker type, the position and number of sources, the SNR, the sound pressure level of the presentation, the reverberation time interval studied, and whether the simulated position is in a free or a diffuse sound field. Like the methodologies, the findings in terms of reverberation's influence on listening effort can also vary across experiments.

Previous studies investigated the effect of reverberation on speech intelligibility and listening effort. Variations across reverberation time, level, and population groups were observed. For example, a correlation between age and reverberation was traced in work by Neuman et al. [203]. That study found that reverberation negatively impacts the SNR necessary to reach 50% speech recognition, and that this impact varies across ages, with the effect decreasing as age increases. The sensitivity of subjective measures and electrodermal
activities was evaluated by Holube et al. [121]; the effect of reverberation was found statistically significant for the subjective measures but not for the electrodermal activity. A study from Picou et al. [225] presents response time in a dual-task paradigm as a behavioral measure of listening effort. In their study, there was no significant effect on response time, neither within the same SNR conditions nor when comparing the response times of equal performance scores. The impact on listening effort was studied by Kwak et al. [149] through subjective ratings, resulting in a significant effect of reverberation both on ratings of listening effort and on sentence recognition performance. In Nicola and Chiara's study [204], the negative influence of reverberation on response time was considered indicative of an increase in listening effort; that study assessed the influence of reverberation and noise fluctuation on response time. The different methodologies applied in these studies and their groups of participants must be carefully analyzed, as they can explain the different results.

Ambisonics arrangements (Mixed Order Ambisonics (MOA) [78, 177] and HOA) are already used in audiological studies [7, 77, 173, 303]. This study proposed a low-order (first-order) Ambisonics implementation. The low-order technique is more sensitive to the listener position [64, 65], which was also verified in this study. That can be seen as a counter-intuitive and non-conventional choice, although it was meant to assess low-order Ambisonics' feasibility in audiological studies and its constraints. This decision was a step towards confirming the feasibility of a listener in a centralized position found in Chapter 3, observing its constraints, and further developing an auralization method with lower hardware
requirements in Chapter 5.

Hypothesis

The main research question is how the auralized acoustic scenario, specifically the room and the SNR, increases auditory effort when virtualized. The hypothesis for the experiment is that a longer RT provided through sound virtualization and a lower SNR both lead to more significant listening effort. Reverberation time can influence normal-hearing and hearing-impaired people in different ways. For example, on average, hearing-impaired listeners experience more significant difficulties understanding speech in a reverberant condition than normal-hearing listeners, so they can suffer more from the strain of listening. As reverberation's effects on hearing-impaired listeners vary (see Chapter 2), this study employed only normal-hearing participants to investigate the effects of audio degradation. To subjectively assess changes in hearing effort, a questionnaire was provided to participants, asking how much effort they experienced in each condition (described in Section 4.2). This investigation is the first step towards understanding the feasibility of including the simplified virtualization of sound sources in the expanding field of listening effort research.

4.2 Methods

This experiment was designed to
gather data for two parallel analyses: the first was to evaluate differences in behavioral performance (speech recognition) and subjective impressions of listening effort driven by different scenarios, manipulating the room type and the signal-to-noise ratio (SNR). The second study compared physiological responses of the brain, as measures of listening effort, to the same behavioral performance. This chapter focuses on the experiment's first study (behavioral data vs. subjective impressions). Three rooms were chosen for this study: a classroom, a restaurant dining area, and an anechoic room.

For this experiment, a setup was developed to investigate the listening effort caused in nine different situations: three room simulations, characterized by their reverberation time, and three SNRs. The setup was composed of four recorded talkers acting as maskers and one talker acting as the target. The talkers' positions were all spatially separated. The test paradigm involved the auditory presentation of Danish hearing in noise test (HINT) sentences [205] on top of four speech maskers, with the participant recalling the words they could keep in memory after 2 seconds. The sound sources were spatially distributed, and the participant was informed that the target speech always came from the front. The participants' responses were word scored (i.e., word-based speech intelligibility) by Danish-speaking clinicians. The method in this study follows a four-talker babble setup similar to [209, 302], which investigated SNR and masker types using pupillometry as a proxy for listening effort. Also, a study from Wendt et al. [301] investigated the impact of noise and noise reduction through an equivalent setup. This method's innovation relies on using first-order Ambisonics to generate the reverberation based on ODEON-simulated rooms.

4.2.1 Participants

For the data collection, 18 normal-hearing native Danish-speaking adults (eight females) with an average age of 36.9 ± 11.2 years gave written consent and initially participated in the test. One participant was placed outside the sound field sweet spot, so his data were discarded, and the data for the other 17 participants were used for further analysis. Ethical approval for the study was obtained from the Research Ethics Committees of the Capital Region of Denmark. For each participant, the pure-tone average of air-conduction thresholds at 0.5, 1, 2 and 4 kHz (PTA4) was tested and confirmed below 25 dB HL.

4.2.2 Stimuli

The target stimulus consisted of simple Danish
sentences spoken by a male speaker. The sentences were from the HINT in Danish [205] and were 1.3-1.8 s in duration. The masking signal consisted of four different speakers, two female and two male, reading a Danish-language newspaper [302]. The total duration of each of the masker recordings was approximately 90 seconds. The maskers' onset was 3 s before and offset was 2 s after the target, resulting in a masker duration of 6.3-6.8 s. In each trial, the time segment used from each masker was randomized. In addition, the spatial position of each masker was also randomized in each trial, but always interspersing male and female talkers. The overall maskers' equivalent continuous sound level Leq was set at 70 dB (64 dB for each masker, since four incoherent 64 dB sources sum to 64 + 10 log10(4) ≈ 70 dB), and the target Leq was set at 62 dB, 67 dB and 72 dB to generate three SNR conditions: -8, -3 and +2 dB. In this study, SNR was defined as the equivalent continuous sound level of the target signal relative to the competing masking Leq. The chosen reverberation conditions aimed to represent common everyday situations. The RTs of the anechoic and reverberant conditions studied were defined as the overall reverberation time obtained through the output of the simulation software (ODEON Software v.12). The absorption coefficients and relative areas used to obtain the mentioned conditions are presented in Appendix E.

Five source positions (one target and four maskers) were created around a receptor in each simulated room. All positions were 1.35 m from the center of each room, where the receptor is located. The approach of creating two different rooms, instead of changing the parameters of a single room, was chosen to achieve a more natural sound field. That way, the absorption coefficients applied to the rooms' materials were kept close to real. The virtualization of the proposed acoustic scenarios follows the path indicated in Figure 4.1. An acoustic simulation is performed to create the appropriate characteristics of the sound according to the room. The software calculates the amplitude and the incidence directions of sound and its reflections arriving from specific sources at a receptor position inside the room. For each source-receptor combination, the software generates a room impulse response that is encoded in first-order Ambisonics in the AmbiX [198] format (a channel-order specification for Ambisonics in which the first four channels are WYZX, compared to WXYZ in the FuMa specification). The generated file was convolved with anechoic audio and decoded to the specific array of 24 loudspeakers.
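As a sketch of these last two steps, the snippet below converts a first-order RIR from FuMa channel order to AmbiX, convolves it with a mono anechoic signal, and decodes it to a 24-loudspeaker ring. It is illustrative only: the decoder shown is a hypothetical basic (sampling) decoder, and the function names are ours, not the toolchain actually used in this experiment.

```python
import numpy as np
from scipy.signal import fftconvolve

def fuma_to_ambix_fo(rir):
    """First-order FuMa B-format (channels W, X, Y, Z, with W at -3 dB)
    to AmbiX (ACN channel order W, Y, Z, X with SN3D normalization)."""
    w, x, y, z = rir
    # Undo FuMa's 1/sqrt(2) weighting on W; at first order the
    # directional channels already share the same scaling.
    return np.stack([w * np.sqrt(2.0), y, z, x])

def auralize(anechoic, ambix_rir, decoder):
    """Convolve a mono anechoic signal with each channel of a 4-channel
    AmbiX room impulse response, then decode to loudspeaker feeds."""
    ambi = np.stack([fftconvolve(anechoic, ch) for ch in ambix_rir])
    return decoder @ ambi  # shape: (n_loudspeakers, n_samples)

# Hypothetical basic (sampling) decoder for a horizontal 24-speaker ring:
# row l holds the first-order spherical-harmonic values at loudspeaker l.
az = np.deg2rad(np.arange(24) * 15.0)
decoder = np.stack([np.ones_like(az), np.sin(az),
                    np.zeros_like(az), np.cos(az)], axis=1) / 24.0
```

A W-only impulse (an omnidirectional source) then decodes to equal gain on all 24 loudspeakers, which is a quick sanity check for such a chain.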
Figure 4.1: Auralization procedure implemented to create mixed audible HINT sentences with four spatially separated talkers at the sides and back (maskers) and one target in front.

4.2.3 Apparatus

The experiment was set up in an anechoic room (IAC Acoustics) with 4.3 m × 3.4 m × 2.7 m (inner dimensions). The experimental setup consisted of a circular array of 24 loudspeakers positioned at 15° intervals on the azimuth and at 1.35 m distance from the center. The target sound was reproduced at 0° (the participant's front); the maskers were auralized at ±90° and ±150° (Figure 4.2). The position of the participant during the whole test was monitored through a laser line and a camera, ensuring they remained in the sweet spot.
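For reference, the ring geometry just described can be written out numerically. The sketch below uses only the values stated above (24 loudspeakers, 15° spacing, 1.35 m radius); the helper `ring_index` is a hypothetical convenience for mapping the target and masker azimuths onto loudspeaker indices.

```python
import numpy as np

# Assumed from the text: 24 loudspeakers every 15 degrees on a 1.35 m ring.
N_SPK, STEP_DEG, RADIUS_M = 24, 15.0, 1.35
az_deg = np.arange(N_SPK) * STEP_DEG
xy = RADIUS_M * np.stack([np.cos(np.deg2rad(az_deg)),   # front axis
                          np.sin(np.deg2rad(az_deg))],  # left axis
                         axis=1)

def ring_index(angle_deg):
    """Index of the loudspeaker closest to a requested azimuth."""
    return int(round((angle_deg % 360.0) / STEP_DEG)) % N_SPK

target_idx = ring_index(0)                        # front
masker_idx = [ring_index(a) for a in (90, -90, 150, -150)]
```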
Stimuli were routed through a sound card (MOTU PCIe-424, with a FireWire connection to the MOTU 24 I/O audio interface) and were played via 16 Genelec 8030A and 8 Genelec 8030C loudspeakers (Genelec Oy, Iisalmi, Finland), aligned in frequency and level. The BioSemi EEG device was used to collect the physiological data, which helped to restrain participants' movement; the EEG data were not analyzed in this study.

Figure 4.2: Spatial setup of the experiment: test subjects attended to target (in blue) stimuli from a 0° angle in front. The masking talkers (in red) are presented at lateral ±90° and rear ±150° positions.

All enclosed spaces, including controlled audiological environments, have a certain degree of reverberation due to acoustically reflective surfaces and background noise due to equipment. The levels of reverberation and background noise meet the criteria of Recommendation ITU-R BS.1116-3 [126] and are shown in Figures 4.4 and 4.3, respectively.

Figure 4.3: Reverberation time inside the anechoic room at Eriksholm Research Centre with the setup in place.

Figure 4.4: Eriksholm anechoic room: A-weighted background noise. Loudspeakers and lights on, motorized chair off.

The parameters were measured with the setup (loudspeakers, motorized chair and BioSemi EEG equipment) inside the room and positioned as in the experiment. Figure 4.5 shows the setup placed inside the anechoic room.

Figure 4.5: Setup inside the anechoic room (motorized chair, adjustable neck support and EEG equipment).
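A side note on the level bookkeeping used in this chapter: band or source levels such as the background-noise levels in Figure 4.4 combine energetically, not arithmetically. A minimal sketch (the function name is ours) that also reproduces the masker calibration of Section 4.2.2, where four 64 dB maskers sum to approximately 70 dB overall:

```python
import math

def db_sum(levels_db):
    """Energetic sum of independent levels (e.g., A-weighted band levels
    of background noise, or incoherent sources) into one overall dB value."""
    return 10.0 * math.log10(sum(10.0 ** (l / 10.0) for l in levels_db))

# Four incoherent 64 dB maskers combine to about 70 dB overall:
total = db_sum([64, 64, 64, 64])
```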
4.2.4 Auralization

Acoustic Scene Generation and Room Acoustic Simulation

To simulate the acoustic characteristics of the chosen scenarios, geometric models were created in the room acoustics software ODEON. Next, the Ambisonics room impulse responses were simulated using ODEON software, version 12 [59]. The absorption coefficients of the room surfaces are listed in Appendix E. All sentences were auralized in Ambisonics [15], truncated at first order and encoded to 24 channels. The analysis utilized the Institute of Technical Acoustics (ITA) Toolbox [29, 67]. Rooms were chosen as representative of realistic, not extreme, acoustic conditions. The spaces simulated were a classroom (9.46 m × 6.69 m × 3.00 m) with an overall RT of 0.5 seconds, and a restaurant's dining area (12.19 m × 7.71 m × 2.80 m) with an overall RT of 1.1 seconds. The distance between source and receptor was kept the same, 1.35 m, across rooms. Target and masker positions were simulated by selecting the appropriate simulated RIR to convolve, i.e., the simulated source-receptor RIR that corresponds to the desired reproduction angle.

Ambisonics Sweet Spot

In this study, two different metrics were used to compare the off-center performance of virtual sources auralized with first-order Ambisonics: the RT and the sound pressure level (SPL). That is, whether the presented virtualized sound field delivered the correct amount of reverberation and also the correct sound pressure level of each source, resulting in the appropriate signal-to-noise ratio, when the listener was not perfectly centered. To estimate each position's metrics, a logarithmic sweep signal (50-20000 Hz, 2.73 s; FFT degree 18, sample frequency 96 kHz) was generated and convolved with the first-order Ambisonics RIR calculated by ray tracing in ODEON for each modeled room. The simulated rooms presented an overall theoretical reverberation time of 0, 0.5, and 1.1 s. These auralized files were encoded to 24 channels distributed on the horizontal axis. Subsequently, the files were played inside the anechoic room and simultaneously recorded. From the division in the frequency domain of the recorded signal by the zero-padded initial signal (deconvolution), the calculated impulse response (or binaural RIR (BRIR) when recorded with a HATS) represents the virtualized system, including the physical effects of the array and all calibration.

Reverberation Time

The RT was calculated
with the ITA-Toolbox from the initial 20 dB decrease from peak level (T20) in the virtualized IRs. Figure 4.6 shows the overall RT results at the center position and when moving the receptor (manikin) towards the front.

Figure 4.6: Overall reverberation time (RT) as a function of receptor (head) position in the mid-sagittal plane re center (0 cm).

The results showed slightly greater RTs (0.58 and 1.16 s) than what was simulated in the ODEON software (0.5 and 1.1 s). However, this was expected, since there is equipment inside the anechoic room (e.g., a large chair and loudspeakers) that can be considered reflective surfaces and was not present in the simulation. The results showed that there is no major effect on the energy decay for small head movements.

Sound Pressure Level

The sound pressure level was determined by convolving the target and masker sounds with the impulse responses collected across twelve positions, with horizontal displacements of 2.5, 5 and 10 cm and forward (mid-sagittal) displacements of 2.5 and 5 cm. The results are
shown in Figure 4.7. The four speech talkers were individually convolved, and the equivalent sound pressure level was determined using the calibration factor. The measure is the average over 20 different sentences.

Figure 4.7: Sound pressure level virtualized through Ambisonics at different listener positions.

The changes in SPL as a function of off-center position do not follow a consistent pattern. The SPL changes were, however, mostly similar across the three simulated rooms, with the exception of three positions where the restaurant (1.1 s RT) was 1-1.5 dB different (x = 2.5, y = 0; x = 0, y = 2.5; x = 10, y = 2.5). The center position is the optimal position for sound pressure level accuracy. To help obtain reliable, appropriate data from the experiment, a neck rest as well as a video feed and a laser line were added to the setup after the first pilot test. The participants were asked to stay in contact with the neck rest at all times. The clinician was able to see the laser line at the participant's head throughout the test and could ask the participant to quickly correct posture at the start of each block, or at any point of the session after the participant needed a break. Figure 4.8 shows a participant positioned with all sensors connected. Another important finding was that, after adjusting the participant's position, the motorized chair had to be unplugged; otherwise the EEG data would be compromised.

Figure 4.8: Participant positioned for the test.

4.2.5 Procedure

There were 9 different conditions based on the SNR (+2 dB, -3 dB, -8 dB) and the reverberation time (0 s, 0.5 s, 1.1 s) of the sound. Each condition was presented in a separate block, and each block consisted of 20 sentences, so in total there were 9 blocks and 180 sentences presented to the participants in the main test. In addition, each participant went through a training round at the beginning, consisting of 20 sentences with different conditions. The procedure for each trial is illustrated in Figure 4.9
. Each trial started with 2 s of silence (preparation), then 3 s of background noise, which served primarily as a baseline period for the separate EEG analysis. Then a HINT sentence was played as the background noise continued, for 1.5 s on average. After the target sentence finished, the background noise continued for another 2 seconds, during which participants needed to maintain the words they had just listened to (maintenance), also serving primarily for the companion analysis of EEG responses relative to baseline. When the background noise stopped, the participants were instructed to repeat all the words within the sentence (recall). Listening effort reflected in alpha-power changes in the maintenance phase has been investigated in [208, 310, 311, 313].

Figure 4.9: Trial design. For each trial, 20 in each block, there were 2 s of silence, then 3 s of masker (4 spatially separated talkers), then a Danish HINT sentence as target stimulus in the presence of the continuing masker, then 2 additional seconds of masker, followed by silence, when the participant repeated as many target words as they could understand and keep in memory.
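The trial timing above can be expressed as a small onset schedule. The sketch below is illustrative (function and key names are ours); for sentence durations of 1.3-1.8 s it reproduces the 6.3-6.8 s masker durations quoted in Section 4.2.2.

```python
def trial_schedule(sentence_s):
    """Phase onset times (s) for one trial, following the design above:
    2 s silence, 3 s masker-only baseline, the target sentence with the
    masker continuing, then 2 s maintenance, then recall in silence."""
    t = {}
    t["silence"] = 0.0
    t["masker_on"] = 2.0
    t["target_on"] = t["masker_on"] + 3.0
    t["target_off"] = t["target_on"] + sentence_s
    t["masker_off"] = t["target_off"] + 2.0   # end of maintenance
    t["recall"] = t["masker_off"]
    return t

sched = trial_schedule(1.5)
masker_dur = sched["masker_off"] - sched["masker_on"]   # 6.5 s here
```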
Figure 4.10 shows the graphical user interface designed and implemented for this experiment. The 24-channel audio files were produced beforehand (offline), being calibrated to the specific setup. Along with the audio presentation, the software also sent a series of triggers, in sync with the presentation timings, to the EEG software (ActiView, BioSemi) to mark the EEG measurement appropriately for the companion analysis.

Figure 4.10: Graphical user interface used to acquire the data from participants. Words are state buttons that alternate between green and red, being saved as 1 or 0, respectively.
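The word scoring produced by this interface reduces to a simple proportion over the saved 1/0 flags; a minimal sketch (names are illustrative):

```python
def block_score(trials):
    """Word-based intelligibility (%) for one block: `trials` holds, per
    sentence, the list of 1/0 button states saved by the interface."""
    flags = [f for trial in trials for f in trial]
    return 100.0 * sum(flags) / len(flags)

# e.g., two sentences with 4/5 and 3/5 words marked correct:
score = block_score([[1, 1, 0, 1, 1], [1, 0, 1, 0, 1]])   # 70.0
```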
4.2.6 Questionnaire

At the end of each block (SNR × room condition), a three-item questionnaire was presented to the participants; the English translation is shown in Table 4.1. The questionnaire was translated from Zekveld and Kramer [318] into Danish. The response to each question had a scale of 0 to 100 in integer units (Appendix F). The first question aimed to measure participants' estimation of their performance, referred to as "Subjective intelligibility" in the rest of the text. The second question was to measure participants' perception of effort, referred to as "Subjective effort". The third question was provided to measure how often participants gave up during the test, referred to as "Subjective disengagement".

Table 4.1: The questionnaire for subjective ratings of performance, effort and engagement (English translation from Danish).

Question 1: How many words do you think that you understood correctly?
Question 2: How much effort did you spend when listening to the sentences?
Question 3: How often did you give up trying to perceive the sentences?

4.2.7 Statistics

A linear mixed model (LMM) [171, 233] was used to investigate SNR and RT effects on performance and the questionnaire. The effects of SNR and RT on EEG power in different alpha bands were also explored through an LMM in the collaborative analysis performed by Seifi Ala. SNR and RT were fixed factors, while participants were random factors in the model. Implemented in MATLAB, the syntax for the LMM was Dependent ~ 1 + SNR*RT + (1|SubjectID), with Dependent being either performance or questionnaire. Both the SNR (-5, 0, 5) and RT (-0.53, -0.03, 0.56) levels were re-centered around zero for the model.

4.3 Results

This section highlights the findings concerning the study's questions: the feasibility of having a hearing-in-noise test virtualized in first-order Ambisonics, and the
influence of degradation through SNR and reverberation on speech intelligibility. The participants' behavioral performance (i.e., speech recognition accuracy) demonstrated significant effects of SNR (β = 5.98, SE = 0.30, t(158) = 19.67, p < 0.001) and RT (β = -31.17, SE = 1.78, t(158) = -17.49, p < 0.001) and a significant interaction between the two (β = 1.76, SE = 0.43, t(158) = 4.04, p < 0.001). Figure 4.11 presents the mean performance (percent correctly recalled words) as a function of SNR for each room. Less signal degradation, whether higher SNR or lower RT, led to higher performance accuracy.

Figure 4.11: Performance accuracy based on percentage of correctly recalled words as a function of SNR and RT (line color/shading). Error bars represent the standard error of the mean. Lines/symbols are staggered for legibility and do not indicate variation in SNR.

The statistical analysis of the results for subjective intelligibility (Figure 4.12), subjective effort (Figure 4.13), and subjective disengagement (Figure 4.14) is shown in Table 4.2. All the measures show a significant interaction between SNR and RT. Lower signal degradation (higher SNR and lower RT) led to a higher subjective estimation of intelligibility performance accuracy and to decreased reported effort and disengagement.

Table 4.2: Results of the linear mixed model based on SNR and RT predictor estimates of the questionnaire. DF = 158.

Predictor: SNR
  Subjective intelligibility: β = 5.71, SE = 0.42, t = 13.48, p < 0.001
  Subjective effort: β = -5.60, SE = 0.41, t = -13.57, p < 0.001
  Subjective disengagement: β = -5.78, SE = 0.48, t = -11.85, p < 0.001
Predictor: RT
  Subjective intelligibility: β = -33.74, SE = 2.47, t = -13.61, p < 0.001
  Subjective effort: β = 23.58, SE = 2.41, t = 9.76, p < 0.001
  Subjective disengagement: β = 33.39, SE = 2.85, t = 11.68, p < 0.001
Predictor: SNR × RT
  Subjective intelligibility: β = 1.56, SE = 0.60, t = 2.57, p = 0.010
  Subjective effort: β = 1.50, SE = 0.59, t = 2.54, p = 0.012
  Subjective disengagement: β = -2.06, SE = 0.69, t = -2.94, p = 0.003

Figure 4.12: Subjective intelligibility as a
function of SNR and RT (line color/shading). Error bars represent the standard error of the mean. Lines/symbols are staggered for legibility and do not indicate variation in SNR.

The subjective impressions of how much effort was required and how willing participants were to give up in each situation are presented in Figures 4.13 and 4.14, respectively.

Figure 4.13: Subjective effort as a function of SNR and RT (line color/shading). Error bars represent the standard error of the mean. Lines/symbols are staggered for legibility and do not indicate variation in SNR.

Figure 4.14: Subjective disengagement as a function of SNR and RT (line color/shading). Error bars represent the standard error of the mean. Lines/symbols are staggered for legibility and do not indicate variation in SNR.

The results show the statistically significant contributions of reverberation and SNR to perceived performance, effort and disengagement. From Figures 4.12, 4.13, and 4.14, the self-report scales varied near-linearly with the signal degradations across conditions, agreeing generally with the behavioral data (see Figure 4.11). The subjective effort increases with the reverberation time: the more time the energy needs to dissipate in the environment, the greater the perceived effort. The results from all the self-report scale questions were highly correlated with performance. Pearson skipped correlations [308] revealed significant correlation coefficients (see Table 4.3):

Table 4.3: Pearson skipped correlations between performance and self-reported questions.

performance vs subjective intelligibility:  r = 0.95,  CI [0.93, 0.96]
performance vs subjective effort:           r = -0.79, CI [-0.84, -0.74]
performance vs subjective disengagement:    r = -0.94, CI [-0.96, -0.92]

4.4 Discussion

This
study presented an interesting challenge to the researchers. The pilot data pointed in the direction of the virtualization not rendering the correct sound, especially not the correct sound pressure level. The setup was retested and investigated in different positions, and the problem was identified: the first-order Ambisonics rendering has a relatively small sweet spot. Thus, participants were monitored to be in the correct position during the testing. The sweet spot's capabilities in terms of correct overall SPL reproduction presented limitations of ±1 dB relative to the target SPL up to 5 cm, and ±3 dB up to 10 cm off center. Although the literature did not test this exact Ambisonics implementation, and used different performance measures, the findings agree with reported contrasts caused by the reproduction method at similar distances from the center. As a reference, Grimm et al. [97] analyzed simulated Ambisonics environments with different numbers of loudspeakers, studying their influence on a representative hearing aid algorithm. They showed a decrease in SNR errors when increasing the number of loudspeakers and decreasing frequency: for a bandwidth of 2 kHz at the central listening position, 12 loudspeakers would be required for HOA; with 24 loudspeakers, the bandwidth at the central listening position would be 6 kHz. Laurent et al. [276] analyzed the reconstruction error to assess the rendering system's frequency capabilities. A KEMAR was fitted with a hearing aid, without processing, to collect the impulse responses. Regarding range, a third-order implementation with 29 loudspeakers decreased from 3,150 Hz in the center to 2,500 Hz when positioned 10 cm from the center.

Tests that involve separated sound sources, auralized and virtualized by loudspeaker setups, need to be verified in terms of sweet spot size for the specific sound parameters (e.g., RT and SPL). An off-centered or moving head can, in a first-order Ambisonics auralization, easily encounter a spot in space where, for example, the wave-field combination may partially cancel one or more maskers, increasing the SNR even if the intended SNR is low (see Figure 4.7). At other off-center spots it could also be possible to partially cancel the target. These distortions could profoundly impact the results and not represent what would be achieved in the real scenario being simulated. For normal-hearing participants, a more psychologically oriented psychoacoustic auralization method such as lower-order Ambisonics can provide the desired acoustic impression, in terms of objective and subjective performance, when the calibration is performed and the setup limitations (e.g., a very restricted sweet spot) are respected. An investigation of performance in off-center positions using hearing-impaired participants would be an important next step towards understanding a broad clinical application of this method.
Participants were tested in three different SNRs (-8, -3, +2 dB) and three virtual rooms (with RTs of 0, 0.5, and 1.1 s). The more the manipulated signal was degraded (lower SNR and higher RT), the more demanding the listening conditions became, which lowered the participants' speech intelligibility. A questionnaire was used as a subjective measure of effort. Comprehensibly, participants reported increased speech intelligibility, less cognitive effort, and less tendency toward disengagement as the signal degradation diminished. That denotes that if they could recall the speech well, they perceived that they performed well and also spent less effort. The results from all three questions within the questionnaire were strongly correlated (either positively or negatively) with the speech intelligibility of the participants. They changed significantly with both SNR and RT and the interaction between them.

When asked about subjective impressions of each block, the participants demonstrated having perceived the proposed signal degradation in both SNR and RT. That is in line with the studies from Zekveld et al. [319], Holube et al. [121], Neuman et al. [203], Kwak et al. [149], Nicola & Chiara [204], and Picou & Ricketts [229]. Furthermore, studies that cross objective measurements of physiological parameters associated in the literature with changes in effort can have divergent outcomes, as discussed in Chapter 2. From that discussion, it is speculated that these different methods, proposed to achieve a proxy for listening effort, are sensitive to separate aspects of a complex global process [12, 224]. Another explanation would be the minimization of effort by the participant through heuristic strategies in the subjective method [192], and lastly, the effect of working memory being related differently to different methods [53, 186]. A separate study by Tirdad Seifi-Ala from this combined experiment examined the correlation between objective (physiological responses of the brain) and subjective paradigms.

4.5 Concluding Remarks

In
this study, nine levels of degradation were imposed on speech signals over speech maskers separated in space and virtualized. Three different SNRs (-8, -3, +2 dB) and three different simulated rooms (with RTs of 0, 0.5, 1.1 s) were used to manipulate task demand. Speech intelligibility was assessed through a word-scored speech-in-noise test performed in a 24-loudspeaker setup utilizing first-order Ambisonics. The results showed a high correlation between participants' performance and their responses to questions about subjective intelligibility, effort, and disengagement. The main effects and interaction of SNR and RT were demonstrated on all questions. Furthermore, it was observed that the reverberation time inside a room impacts both speech intelligibility and listening effort. This study demonstrated the possibility of virtualizing a combination of sound sources in low-order Ambisonics and extracting quality behavioral data.

Chapter 5
Iceberg: A Hybrid Auralization Method Focused on Compact Setups

5.1 Introduction

People usually wear their hearing
devices in spaces very different from the laboratories' soundproof booths in everyday life. Additionally, everyday sounds are more complex and different from the pure tones, words, and phrases without context utilized in many hearing tests. Therefore, hearing research has increasingly aimed to include acoustic verisimilitude in auditory tests to make them more realistic and/or ecologically valid [61, 79, 101, 177, 212, 217]. Thus, researchers can evaluate new features and algorithms implemented on hearing devices and experiment with different fittings and treatments while maintaining repeatability and control.

One can utilize a particular auralization technique to create reproducible sound files in a listening area. These sounds attempt to mimic the acoustical characteristics of environments (from actual recordings or acoustic simulations). They can then be played through a set of loudspeakers or a pair of headphones, creating both the subjective impression and the objective representation of listening to the intended sound environment [293].

Through an auralization method, it is possible to create a sound file containing spatial information about the scene and a series of details about the configuration of the reproduction system [293]. The reproduction system includes, for example, the number of loudspeakers and their physical positions, the number of audio channels available, and the distance from the loudspeakers to the listening position. The size of the effective listening reproduction area, where the auditory spatial cues of the scene are most accurate, is usually called the "sweet spot" [253]. Spatialization accuracy is affected differently by different systems as well as by auralization methods [65, 97, 166, 275, 276]. The auralization method can be decisive in the choice of reproduction system; for example, certain methods require certain numbers of loudspeakers [62, 217]. Consequently, the auralization method can be a limiting factor depending on the tests or experiments. A dedicated setup capable of handling different auralization methods with a large listening area [188] may require an excessive amount of funding and physical space. These requirements can be a limiting factor for conducting research and developing innovative treatments.

This chapter proposes a compact setup with a hybrid auralization method. It is characterized under some conditions (RTs, presence of a second listener, and listener position) by considering the intended use in auditory evaluations, as in the previous chapter. The setup aims to reproduce sound scenes maintaining spatial localization and creating an immersive sound environment from either a scenario in an actual room or virtual rooms created in acoustic software.

5.2 Iceberg: A Hybrid Auralization Method

The Iceberg auralization method combines two well-known methods: VBAP and Ambisonics.
In Chapter 3, VBAP and Ambisonics binaural cues were objectively evaluated. The VBAP method was found to render accurate cues in the center position, even with a second listener inside the array. That corroborates the use of VBAP to increase ecological validity in auditory tests [134]. On the other hand, Ambisonics delivered less precise localization cues, imposing more restrictions on the listener's position. The results are in line with literature presenting poor localization but high immersiveness for low-order Ambisonics [104, 105] and, conversely, lesser immersiveness and greater localization accuracy for VBAP [89, 104]. Therefore, the idea here is to provide an auralization in which the temporal and spectral features of the sounds are encoded through VBAP, while the spaciousness provided through the reverberation envelope is encoded through Ambisonics. This specific combination of auralization methods has also been considered to decrease the number of loudspeakers necessary for a setup that requires regular hearing devices. At the same time, the setup may allow some degree of head movement without the need for tracking equipment. That is a countermeasure to overcome common limitations in ordinary auditory test spaces [316].

5.2.1 Motivation

The primary motivation for creating this auralization method was to test hearing aid users in typical situations while wearing hearing devices in a small setup. Therefore, the
method is loudspeaker-based, but at the same time, the number of loudspeakers and the system complexity were also constraints. The theoretical support for combining these auralization methods and proposing the smaller virtualization setup is gathered from room acoustic parameters and psychoacoustic principles presented in the review and during this chapter. These parameters and principles led to a system able to use RIRs from simulated environments (spaces that may only exist in a computer) and RIRs recorded in real ones. The initial Iceberg focus is on tests that manipulate sound scenarios to evaluate speech intelligibility masked by noise from static positions, as tested with low-order Ambisonics in Chapter 4.

5.2.2 Method

The Iceberg method is a relatively easy-to-use algorithm that can be introduced to test environments with a simple calibration process. The virtualization system presented auralized files in a quadraphonic array with loudspeakers positioned at 0, 90, 180, and 270 degrees (see Figure 5.1). Other horizontal setup arrangements can be implemented depending on the need, considering the system's angle rotation, frequency response, and the potential variation in localization accuracy. Although there is a minimum number of necessary loudspeakers (four), the method can be used to auralize files for setups with a larger number of loudspeakers. The presented algorithm was implemented in MATLAB (MathWorks).

Figure 5.1: Top view. Loudspeaker positions on the horizontal plane for virtualization with the proposed Iceberg method.

The proposed loudspeaker setup had a radius of 1.35 m. Other distances need to be evaluated with regard to the system frequency response. The proposed Iceberg implementation derives an appropriate multi-channel audio signal with specific information from a sound and its reflections (incidence angle, sound energy, spatial and temporal distribution). These parameters can be encoded into a sound file with the reproduction setup's specific calibration values and positioning orientation.
Finally, the auralized file can be reproduced (virtualized) as spatial sound.

5.2.2.1 Components

The proposed Iceberg method is a hybrid auralization method, a combination of VBAP and first-order Ambisonics; Section 2.3.2.2 reviews the derivation of both methods. Both techniques are based on amplitude panning. The main difference is in the mathematical formulation of the gains applied to the amplitude of each sound source. VBAP treats the reproduced sound as a unitary vector in a two- or three-dimensional plane (Equations 2.4 and 2.7, respectively). The weights applied to the amplitude of the signal at each loudspeaker are derived from the tangent law. A vector is traced from the nearest available sources between the listening position and the desired source position (Equation 2.3). On the other hand, Ambisonics utilizes all available loudspeakers to compose the sound field. The method combines the amplitudes of the sources, calculating their weights according to the sum of spherical harmonics (Equation 2.9) that represents the pressure field formed by the sound wave (Equation 2.8). While VBAP concentrates the energy between two loudspeakers in its 2D implementation, Ambisonics spreads it across all available loudspeakers. That leads to a more immersive experience with Ambisonics, while VBAP can better represent the sound source direction.

5.2.2.2 Energy Balance

The energy balance between the methods is calculated based on the Ambisonics first-order impulse response (see the example in Figure 5.2): on the left is the impulse response (or decay curve), which is not the decay of the squared value of the sound pressure signal. On the right are the 10 log10(h^2(t)) curves for the different channels. Note that in these curves the maximum level is 0 dB, as the interest is in the time it takes for the power to drop by 60 dB.
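This reading of the normalized decay curves can be sketched in a few lines. The following is a Python illustration (not the MATLAB code used in this work); a synthetic exponential decay stands in for a measured RIR, and the function names are hypothetical.

```python
import math

def decay_curve_db(h):
    """Normalized 10*log10(h^2(t)) curve of an impulse response, peak at 0 dB."""
    energy = [x * x for x in h]
    peak = max(energy)
    return [10.0 * math.log10(e / peak) if e > 0 else float("-inf")
            for e in energy]

def time_to_drop(curve_db, fs, drop_db=60.0):
    """First time (s) after the peak at which the curve has fallen by drop_db."""
    start = curve_db.index(0.0)  # sample where the power peaks
    for n in range(start, len(curve_db)):
        if curve_db[n] <= -drop_db:
            return (n - start) / fs
    return None  # decay range insufficient to observe the full drop

# synthetic RIR: pure exponential decay whose power falls 60 dB in 0.5 s
fs, rt = 1000, 0.5
h = [math.exp(-3.0 * math.log(10) * n / (fs * rt)) for n in range(fs)]
curve = decay_curve_db(h)
t60 = time_to_drop(curve, fs)  # close to 0.5 s for this synthetic decay
```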
Also, note that there is a small time gap between time 0 s and the time at which the energy value of h(t) is maximum. This interval corresponds to the time it takes the sound wave to travel between the source and the receiver and allows an estimate of the distance between them. For recorded IRs, the gap also includes the system delay, which should be compensated. Basing the balance on the first-order Ambisonics response was the choice made here, since the impulse response of an environment can easily be acquired utilizing a first-order Ambisonics microphone array. Furthermore, it is possible to find commercially available acoustic software tools to simulate sound environments capable of exporting impulse responses in Ambisonics format.

Figure 5.2: Normalized Ambisonics first-order RIR generated via ODEON software. Left panel depicts the waveform; right panel depicts the waveform in dB.

The system's design requires an RIR to be split into two parts. The first part contains the amount of energy to be delivered through VBAP. The second part will be computed through Ambisonics. From the reflectogram, the time representation of the latency and attenuation of the direct sound (DS), early reflections (ER), and late reflections (LR; see Figure 5.3), it is possible to find the point in time representing the direct sound (the first peak) and then separate it correctly from the rest of the RIR. Although splitting the RIR into DS and remainder may be the most straightforward method, the achieved results were initially perceived in personal experience as unnatural: a highlighted "dry" (not reverberant) sound from a defined position followed by a very distant, disconnected reverberation, counter to the aims of a more ecologically valid sound reproduction. Thus, in the proposed method, the ER part was included with the DS part.

Figure 5.3: Reflectogram split into Direct Sound, Early, and Late Reflections.

The late reflections of an RIR refer to the signal wavefronts reflected and scattered several times across the different possible paths. These reflections overlap each other, and as time progresses, successive wavefronts interact with any surface, increasing reflection order, changing direction, and decreasing the remaining sound energy. The literature indicates a psychoacoustical approximation of the time point in a specific RIR when the human auditory system can no longer distinguish single reflections due to reflection density [38]. Lindau [156] proposed a transition point in time (transition time, t_m) based on the mean free path length of the wavefront (Equation 5.1):

t_m = 20 (V/S) + 12 [ms],   (5.1)
where V is the volume of the room in m^3 and S is the surface area inside the room in m^2.

The minimum necessary order of reflections to represent a uniform and isotropic sound field that leads to diffuse reverberation from an Image Source (IS) model is 3. That agrees with observations from Kuttruff [148] on the specular reflections' contribution to diffuse energy in an RIR. This approach was implemented in a similar hybrid method by Pelzer et al. [221]. Another method, developed by Favrot [79], also uses the IS order information from simulated RIRs computed with ODEON software. Its IS reflection order information provides a point for obtaining a segment of the file with the late reflections envelope, used by the system to deliver a hybrid multi-channel RIR. These methods consider the RIR and mix specific stimuli into the output, as does the proposed method. Other hybrid auralization methods such as DirAC [243] consider the recording of a sound event (in Ambisonics) and drive the reproduction based on an energy analysis spanning all sound source directions. Thus, DirAC is intended to work primarily with recorded scenes instead of convolutions with RIRs.

5.2.2.3 Iceberg proposition

The proposed Iceberg method, however, uses neither the t_m method, which is dependent on the volume of the room and the IS simulated reflection order, nor the LR envelope time, derived from an IS simulation. Instead, a different parameter is proposed that allows generalizing to both recorded and simulated Ambisonics RIRs. The parameters of clarity and definition are metrics to determine the early/late energy balance [43]. However, a fixed time of 50 or 80 milliseconds is not appropriate to represent the transition point (from early to late reflections) in every RIR, as the slope will differ and depend on many factors [45]. The transition point changes as the amount of energy and the decay distribution change from RIR to RIR. A similar parameter that is not time-fixed is the center time (Ts), given by Equation 2.15 (see Section 2.3.3). This parameter is also derived from the squared RIR, calculating the transition point from early to late reflections represented as the RIR's center of gravity. The method's name comes from this singularity of RIRs: they present a center of gravity in their power decay representation, similar to the physical blocks of frozen water called icebergs. For icebergs, the center of gravity is the equilibrium point between the force of gravity and the water buoyancy [34]. This representation is translated to the Iceberg method as the transition point between early and late reflections of an RIR.

The process entails an RIR applied through multiplication in the frequency domain, equivalent to a convolution in the time domain, to a sound that can be virtualized through the system. The first action of the method's algorithm is the identification of the center time Ts in the omnidirectional channel of the Ambisonics RIR. A schematic overview of the method is presented in Figure 5.4.

Figure 5.4: Iceberg's processing block diagram. The Ambisonics RIR is treated, split, and convolved with an input signal. A virtual auditory scene can be created by playing the multi-channel output signal with the appropriate setup.

Figure 5.5 shows an example of the RIR relative to the omnidirectional input channel simulated through ODEON V.12 [59] for the simulated restaurant dining room used in Chapter 4, with 1.1 seconds of reverberation time.

Figure 5.5: Omnidirectional channel of an Ambisonics RIR for a simulated room. The blue line indicates the part before the calculated Center Time, hence indicated as the direct sound plus the early reflections. The orange line indicates the late reverberation part of the RIR.
Figure 5.6 presents an example of the Ambisonics RIR in the left column and the omnidirectional channel relative to the DS+ER part in the middle column. The right-column graphs represent the four-channel late reflections part of the Ambisonics RIR.

Figure 5.6: First column: four-channel Ambisonics RIR. Middle column: omnidirectional channel (DS+ER part). Right column: four-channel Ambisonics RIR (LR part).

In sequence, the method first splits the RIR based on the Ts. Then the direct sound and the early reflections are convolved with the signal to be reproduced. In this step, only the omnidirectional channel is used. Finally, the signal is processed using VBAP to provide its directional properties. The VBAP method utilized was implemented in [237]. The VBAP output is two-channel panned audio that is sent to the channels of the corresponding loudspeakers. The output signal corresponds to the relative full scale of the panned signal if the provided Ambisonics RIR is normalized, or to the absolute value in the case of an un-normalized RIR. With normalized RIRs, calibration of a sound pressure level is required, and the reproduction level can be set according to the application needs. Assuming a coherent sum between two loudspeakers that are set to reproduce the scaled signal at a predefined level, a proportion is computed as follows:

LS1 = 20 log10(10^(level/20) * sin^2(theta)),   (5.2a)
LS2 = 20 log10(10^(level/20) * cos^2(theta)),   (5.2b)

where the user sets the level in dB SPL and theta is the incidence angle. A similar level calibration, recording a pure tone from a calibrator with a microphone to find the system's alpha coefficient (as explained in 3.2.3), will allow playing the signal over each loudspeaker at the intended level. A frequency filter for each loudspeaker is also possible if the loudspeakers' FRF needs to be individually adjusted to achieve a flat(ter) response.

The second part of the impulse response is then convolved with the signal, all four channels of the proposed quadraphonic system being used. First, an Ambisonics decoder matrix observing the loudspeakers' positions is created. Thus, the convolved signal is decoded from its B-format to A-format. The implementation utilized in the algorithm to create the decoder matrix and to decode the signal uses functions from the work of Politis [237]. The separated signals are then merged, being ready to be reproduced.

Figure 5.7 shows an example of an auralization of five seconds of the International Speech Test Signal (ISTS) [120]. The top graph is the original signal, and the mid-top graph is the signal convolved with the DS and ER of the omnidirectional channel of the Ambisonics RIR. The envelope is minimally affected by the ER. The mid-bottom graph shows the signal convolved with the LR part of the four channels and decoded from Ambisonics B-format. The diffuse nature of the Ambisonics-generated LR is evident in the smoother overall envelope. The bottom graph shows the result of the Iceberg method, the merged signal.

Figure 5.7: Iceberg method example. Top graph: original signal. Mid-top graph: DS+ER part (VBAP). Mid-bottom graph: LR part (Ambisonics). Bottom graph: merged signal (Iceberg).

This process provides an auralized file that should be reproduced through an equalized and calibrated setup. An equalization and calibration proposal is described in Section 5.2.3 and can be applied to similar setups with equivalent hardware. However, the results may vary depending on hardware quality, loudspeaker amplification, and frequency response. In this work, the electroacoustical requirements (7.2.2) and reference listening room (8.2) from Recommendation ITU-R 1116-3 [126], Methods for the subjective assessment of small impairments in audio systems, were observed. The frequency-specific reverberation times were lower than the Recommendation: 0.04 s from 0.2-4 kHz (0.08 s at 0.125 kHz), versus 0.18 s in the Recommendation. The anechoic characteristic of the room was intentionally chosen in this case to evaluate reverberation in the virtualization setup. A setup within a different space will have different room acoustic characteristics. The experimenter can compensate for the need for greater reverberation by controlling the input RIRs. The electroacoustical requirements for the loudspeakers are also relevant, as they aim to guarantee correct frequency reproduction or the possibility of compensating the frequency response with appropriate hardware. The room proportions are also essential when setting up a test environment, especially if the reproduction will include low frequencies affected by the room's
Eigentones (standing waves). The address https://github.com/aguirreSL/HybridAuralization contains an example and the necessary resources to auralize files according to this Iceberg method.

This study utilized Ambisonics first-order impulse responses generated with the ODEON V.12 software. The choice was made for convenience, and it can be extended to any equivalent Ambisonics RIR, simulated or recorded. The resulting RIR from ODEON is normalized. With that, the user can play a sound at a different level (from the simulated one) without rerunning the simulation, using the normalized version. As an option, the method can denormalize it (dividing the RIR by its corresponding factor provided in the ODEON grid [159]). The denormalized result will be auralized at the level simulated in ODEON (or equivalent software).

5.2.3 Setup Equalization & Calibration

The setup can include a calibration and equalization procedure, included in the MATLAB scripts, to ensure correct sound level reproduction and also a flatter frequency response from the system's loudspeakers, avoiding additional undesired coloration artifacts. First, a factor was calculated to transform the acquired signals from full scale to dB SPL. This step consists of recording a pure tone at a specific frequency (1 kHz) with a known input level of 1 Pa and calculating a factor to convert the input from full scale (FS) to Pa. The term indirect refers to the fact that this calculated factor is applied to all frequencies, under the assumption that the setup (microphone, pre-amplifier, power supply, and AD/DA converter) has a flat frequency response in the audible frequency range. To calculate the conversion factor, a sound pressure calibrator (in this case a B&K 4231) was connected to the microphone (a 1/2" B&K 4192 pressure-field microphone and a type 2669 pre-amplifier, supplied by a B&K 5935 power module). That provided a 93.98 dB SPL signal, which corresponds to 1 Pa.
The calibration factor (alpha_rms) was calculated as in Equation 5.3. Although this step was not needed for the frequency equalization, it was convenient because, once measured, all the following measurements could be performed without the need to enter the room.

alpha_rms = 1 / RMS(v(t)_1kHz)   [Pa/FS],   (5.3)

The next step consists of equalizing the frequency response of each loudspeaker. An RIR from each loudspeaker was measured and, based on that, an inverted FIR filter was individually created to be applied to the signals to be reproduced. The frequency response was converted to its third-octave version, normalized, and inverted to create a vector with 27 values from 50 Hz to 20 kHz. These vectors contained the correction values in the frequency domain and can be applied to any input signal. To apply these corrections, a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) was used in MATLAB to fit the values to the given input. Figure 5.8 presents an example of the normalized third-octave moving-average RIR acquired with a loudspeaker (blue line), the same RIR acquired with a signal that was filtered (red line), and the filter frequency values obtained with the inversion of the original RIR (black line).

Figure 5.8: Loudspeakers' normalized frequency response and inverted filter. Dotted lines represent ITU-R 1116-3 limits.

Figures 5.9 and 5.10 show the moving average of each loudspeaker's normalized frequency response without and with the filter, respectively.

Figure 5.9: Loudspeakers' normalized frequency response (colored solid lines); dotted lines represent ITU-R 1116-3 limits.

Figure 5.10: Loudspeakers' normalized frequency response with frequency filter correction (colored solid lines); dotted lines represent ITU-R 1116-3 limits.

As the amplification of each active loudspeaker is individually controlled, it is possible that the same file could be reproduced at a different sound pressure level (if someone inadvertently or accidentally changes the volume control directly on the loudspeaker, for example). Since alpha_rms was already calculated and it was possible to convert a signal from FS to Pa, and consequently to dB SPL, and vice versa, the individual loudspeakers' SPLs were measured with a signal defined to be played at 70 dB SPL (Equation 5.4):

signal(t) = signal(t) / RMS(signal(t)) * 10^((70 - dBperV)/20) * Gamma_l,   (5.4)

where Gamma_l is the level factor for loudspeaker l, with initial value 1, and dBperV = 20 log10(alpha_rms / 20u). The signal(t) was played through a loudspeaker l and simultaneously recorded with the microphone, S_l(t); the SPL of the recorded signal was calculated as follows:

SPL_l = 20 log10( S_l(t)[FS] * alpha_rms [Pa/FS] / 20 [uPa] )   [dB],   (5.5)

Ten measurements were performed sequentially with each loudspeaker at intervals of 1 s; another iteration of measurements was performed if the measured SPL exceeded the tolerance of 0.5 dB on any of the measurements. A step of +-0.1 [FS] is set to update Gamma_l in its next iteration according to the SPL obtained.

5.3 System Characterization

The Iceberg auralization method in a four-loudspeaker system (the minimum required) was evaluated for its capability to reproduce the intended reverberation time and the appropriate binaural cues. This section describes the system setup and the conditions experimented with utilizing
the Iceberg metho d. The metho d’s accuracy at the optimal and sub-optimal p ositions was
considered in this c haracterization as w ell the impact of the R T. F urthermore, placing a
second listener inside the ring was in vestigated to supp ort a more ecolog- ical situation. By
the end, a complementary study for those conditions was conducted with an aided mannequin to
supplement the ob jective data as the pandemic preven ted sub jective data collection. The
presen t study used the IT A-T o olb ox [ 29 ] for signal acquisition and pro cess- ing. T o
further enhanc e the accuracy of the lo calization estimates, a MA TLAB implemen tation of the
May and Kohlrausch [ 182 ] lo calization mo del from the Auditory Mo deling T o olb o x (AMT,
https://www.am to olb o x.org ) [ 287 ]) was also emplo y ed. The May mo del is sp ecifically
designed to b e robust against the detrimental effects of reverberation on lo calization p
erformance, making it an ideal choice for supplementing the ob jective data gathered in the
present study . The reverberation, or the p ersistence of sound after its initial source has
ceased, w as a parameter in this test that could significantly distort the estimated lo cation of
a sound source. The Ma y mo del accounts for reverbera- tion’s influence through frequency-dep
endent time dela y parameters, enabling more accurate lo calization estimates in reverberant en
vironments. By incorpo- rating the mo del in our analysis, we supplemen ted the ob jective data
gathered through signal pro cessing with an additional lay er of mo deling that allow ed a
relativ e comparison with previous studies. The main ob jective of an auralization metho d and
its virtualization setup is Chapter 5. System Characterization 139 to deliver appropriate
spatial a w areness to human listeners. The natural step for this would b e to verify and v
alidate the metho d. Unfortunately , special conditions were in place during the course of this
study; due to CO VID-19 re- strictions, v alidation tests with participants w ere not feasible.
Section 5.5 extends the system verification and analysis to a targeted application in hearing aid research. Although it does not replace a subjective impression validation and analysis, it can help understand and predict the system's behavior in a typical use case for hearing research, which is the user with hearing aids.

5.3.1 Experimental Setup

The proposed method was implemented, and the tests were conducted at Eriksholm Research Centre in Denmark. The test environment was an anechoic room (IAC Acoustics) with inner dimensions of 4.3 m × 3.4 m × 2.7 m. Signals were routed through a sound card (MOTU PCIe-424) with a FireWire 440 connection to the MOTU Audio 24 I/O interface and played via Genelec 8030C loudspeakers (Genelec Oy, Iisalmi, Finland). The well-controlled sound environment was appropriate for the assessment of small impairments in audio systems, although the acoustic properties of the room exceed those of the sound booths and rooms commonly encountered in audiology clinics [316].

5.3.2 Virtualized RIRs & BRIRs

A set of 72 room impulse responses (RIRs) and 72 binaural room impulse responses (BRIRs) was acquired through the system, spaced at 5-degree angles around the center position, assuming x as the lateral axis and y as the front-back (mid-sagittal) axis of a person inside the ring. Moreover, the same number of RIRs and BRIRs was measured at off-center positions.

The virtualized RIRs and BRIRs were acquired using a logarithmic sweep signal (50-20000 Hz, 2.73 s, FFT degree 18, sampling frequency 96 kHz) [194] as input. The signal was auralized to each angle with the Iceberg method for the same three spaces as in Chapter 4: a classroom (9.46 m × 6.69 m × 3.00 m) with an overall Reverberation Time (RT) of 0.5 s, a restaurant dining area (12.19 m × 7.71 m × 2.80 m) with an overall RT of 1.1 s, and an anechoic room (4.3 m × 3.4 m × 2.7 m) with an ideal overall RT of 0.0 s. All rooms were acoustically simulated in ODEON software V.12, which generated the Ambisonics RIRs representing each mentioned source-receiver configuration.
The absorption coefficients of the room surfaces are listed in Appendix E.

The initial step to acquire the RIRs and BRIRs was to auralize the sweep file with the Iceberg method to the desired positions (72 angles around the center) in the three room conditions, and then play it through the four loudspeakers positioned at the front (0°), left (90°), back (180°), and right (270°) counter-clockwise angles. The auralized version of the sweep should correspond to the signal played in the virtual environment, as the reverberation added by the anechoic room is negligible. After that, the recorded file was deconvolved with a zero-padded version of the raw sweep (see Figure 5.11). The playback and recording used the maximum sampling rate supported by the AD/DA system (96,000 Hz), as the differences in time are on the µs scale. The step size in time, given in microseconds, is therefore step size = (1/96,000) × 1,000,000 = 10.42 µs. The created sweep duration was 2.731 s (FFT degree = 18).

Figure 5.11: BRIR/RIR acquisition flowchart: Iceberg auralization method.

A manikin with artificial pinnae (HATS model 4128-C; Brüel & Kjær) was used to record the binaural files. Also, a second listener was simulated during the tests with a different manikin (KEMAR; GRAS) (see Figure 5.12). The HATS recordings were calibrated as described in Section 3.2.3, following Equations 3.1a and 3.1b.

Figure 5.12: BRIR measurement setup: B&K HATS and KEMAR positioned inside the anechoic room.
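The acquisition chain above (play the auralized sweep, record at the manikin, deconvolve with a zero-padded raw sweep) can be sketched as below. The exponential-sweep generator and the regularized spectral division are illustrative simplifications, not the ITA-Toolbox routines:

```python
import numpy as np

FS = 96_000            # AD/DA sampling rate [Hz]
N = 2 ** 18            # FFT degree 18 -> 262,144 samples (about 2.731 s)

def log_sweep(f1=50.0, f2=20_000.0, n=N, fs=FS):
    """Exponential (logarithmic) sine sweep from f1 to f2."""
    t = np.arange(n) / fs
    T = n / fs
    k = np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * T / k * (np.exp(t / T * k) - 1.0))

def deconvolve(recording, sweep):
    """Impulse response by regularized spectral division of the recording
    by a zero-padded version of the raw sweep."""
    n = len(recording) + len(sweep)      # zero-padding avoids circular wrap-around
    R = np.fft.rfft(recording, n)
    S = np.fft.rfft(sweep, n)
    eps = 1e-12                          # regularization outside the sweep band
    return np.fft.irfft(R * np.conj(S) / (np.abs(S) ** 2 + eps), n)

# Sanity check: a recording that is just the sweep delayed by 480 samples
# (5 ms) must deconvolve to an impulse peaking at sample 480. One sample
# corresponds to the step size 1/96,000 s = 10.42 us quoted above.
sweep = log_sweep()
delay = 480
recording = np.concatenate([np.zeros(delay), sweep])
ir = deconvolve(recording, sweep)
print(np.argmax(np.abs(ir)))
```

The zero-padding of both signals before the FFT is what prevents the circular convolution artifacts the text's zero-padded deconvolution step is meant to avoid.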
5.3.3 Conditions

The auralized files were then recorded under the following conditions:

• Optimal position (alone and centered)
• Optimal position (centered) accompanied by a second listener
• Off-center positions, alone

The position grid can be visualized in Figure 5.13.

Figure 5.13: Measurement positions: obtained through virtualized sound sources with the Iceberg method (VBAP and Ambisonics) in a four-loudspeaker setup.

The most accurate performance is theoretically expected at the optimal position.
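As Figure 5.13 notes, the Iceberg method combines VBAP with Ambisonics. Purely as an illustration of the VBAP part, a minimal two-dimensional pairwise gain computation for a four-loudspeaker ring can be sketched as follows (function names and structure are illustrative, not the thesis implementation):

```python
import numpy as np

SPEAKERS_DEG = [0, 90, 180, 270]   # front, left, back, right (counter-clockwise)

def vbap_gains(azimuth_deg):
    """2D pairwise amplitude panning: distribute a virtual source between
    the two loudspeakers whose arc contains the target azimuth."""
    az = azimuth_deg % 360
    for i in range(4):
        a, b = SPEAKERS_DEG[i], SPEAKERS_DEG[(i + 1) % 4]
        span = (b - a) % 360                       # 90 degrees for this ring
        if (az - a) % 360 <= span:
            # unit vectors of the active pair and of the target direction
            L = np.array([[np.cos(np.radians(a)), np.cos(np.radians(b))],
                          [np.sin(np.radians(a)), np.sin(np.radians(b))]])
            p = np.array([np.cos(np.radians(az)), np.sin(np.radians(az))])
            g = np.linalg.solve(L, p)
            g /= np.linalg.norm(g)                 # constant-power normalization
            gains = np.zeros(4)
            gains[i], gains[(i + 1) % 4] = g
            return gains
    raise ValueError("unreachable")

print(np.round(vbap_gains(0), 3))    # source at a real loudspeaker: all gain on it
print(np.round(vbap_gains(45), 3))   # midway: equal gains on front and left
```

Only the two loudspeakers of the active quadrant receive signal, which is the property discussed later in Section 5.4.3.2 when a second listener shadows part of the ring.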
These techniques provide virtualization assuming the receiver (listener) is in the center of the loudspeaker ring [65, 241]. Adding a second listener to the reproduction area and/or moving the primary listener away from the center can challenge the system's ability to render the scene as intended. The following sections present and discuss the system's capability to reproduce Iceberg-auralized files by measuring the binaural cues and the RT in different conditions.

5.3.4 Reverberation Time

A room's characteristic wave-field pattern can affect the human perception of a reproduced sound. Room acoustics can alter attributes related to spatial perception. For example, a recorded sound has almost no chance of being correctly reproduced if the reproduction room has stronger reverberation than the recording room. Also, reverberation overshoot can smear the perceived direction of a sound source, as early reflections would be heightened in this case [242].

The RT was calculated from impulse responses measured within the three virtualized environments (note that the simulated environments were aimed at RTs of 0, 0.5, and 1.1 seconds). Reverberation time was calculated using the ITA-Toolbox. The parameters were set as follows: frequency range from 125 Hz to 16 kHz, one band per octave, and a threshold of 20 dB below the maximum. The reverberation time was shown to be stable in this virtualization setup.
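The ITA-Toolbox performs this evaluation internally; the idea of fitting a 20 dB decay range can be illustrated with Schroeder backward integration (a T20-style sketch under assumed parameters, not the toolbox's actual routine):

```python
import numpy as np

FS = 96_000

def t20(ir, fs=FS):
    """RT via Schroeder backward integration: fit the energy decay curve
    between -5 dB and -25 dB and extrapolate to a 60 dB decay (T20)."""
    edc = np.cumsum((ir.astype(float) ** 2)[::-1])[::-1]   # energy decay curve
    edc_db = 10 * np.log10(edc / edc[0])
    t = np.arange(len(ir)) / fs
    mask = (edc_db <= -5) & (edc_db >= -25)                # 20 dB evaluation range
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)        # decay rate in dB/s
    return -60.0 / slope

# Synthetic check: exponentially decaying noise with a known RT of 0.5 s
# (the amplitude factor 6.91 = ln(1000) gives -60 dB at t = RT).
rt_true = 0.5
t = np.arange(int(FS * rt_true * 2)) / FS
ir = np.exp(-6.91 * t / rt_true) * np.random.default_rng(0).standard_normal(len(t))
print(round(t20(ir), 2))
```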
An overall RT of approximately 0.08 s can be observed for the anechoic simulation (0 s RT). That is most likely driven by the presence of hardware inside the anechoic room: the loudspeakers and the wooden base of the chair, although covered with foam. The overall reverberation time was measured without an omnidirectional sound source. To circumvent this limitation, the measurement was repeated using all 24 loudspeakers as sound sources, one at a time. The overall RT in this case was taken as the maximum value across frequencies in octave bands from 125 Hz to 16 kHz. Figure 5.14 presents box plots of the measured values in relation to the position inside the room. Rows represent the target RT (0, 0.5, and 1.1 s).
The top row presents results without lateral displacement, the middle row presents the results for a lateral displacement of 2.5 cm from the center, and the bottom row presents the results for a lateral displacement of 5 cm from the center.

Figure 5.14: Reverberation Time of the environments measured with files produced with the Iceberg method and virtualized over four loudspeakers.

Table 5.1 presents the medians of the overall RTs. It is therefore possible to notice that the virtualized environments' RTs tend to be stable and, for the measured conditions, under the just-noticeable difference (JND) of 5% [264, 265] across positions inside the room.

Table 5.1: Reverberation Time in three virtualized environments at different positions inside the loudspeaker ring. Values are the overall RT in seconds.

Position [cm]  | RT = 0 | RT = 0.5 | RT = 1.1
x=0.0; y=0.0   | 0.085  | 0.519    | 1.114
x=0.0; y=2.5   | 0.085  | 0.519    | 1.111
x=0.0; y=5.0   | 0.085  | 0.526    | 1.113
x=0.0; y=10.0  | 0.084  | 0.526    | 1.147
x=2.5; y=0.0   | 0.085  | 0.531    | 1.120
x=2.5; y=2.5   | 0.086  | 0.529    | 1.114
x=2.5; y=5.0   | 0.084  | 0.559    | 1.148
x=2.5; y=10.0  | 0.083  | 0.546    | 1.157
x=5.0; y=0.0   | 0.085  | 0.537    | 1.139
x=5.0; y=2.5   | 0.085  | 0.538    | 1.138
x=5.0; y=5.0   | 0.085  | 0.548    | 1.138
x=5.0; y=10.0  | 0.084  | 0.552    | 1.147

5.4 Main Results

This section presents the results based on the mannequin positions (center and off-center) and conditions (HATS alone and HATS with KEMAR), with angles referenced clockwise.

5.4.1 Centered Position

5.4.1.1 Interaural Time Difference

The blue line in Figure 5.15 represents the Interaural Time Difference (ITD), low-pass filtered at 1 kHz, virtualized through the proposed system.

Figure 5.15: Interaural Time Difference under 1 kHz as a function of azimuth angle for a HATS Brüel & Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg auralization method on a four-loudspeaker setup. The red line shows the ITD results with real loudspeakers (without virtualization). The blue and red shaded areas are the confidence intervals according to the sample rate. The black line represents the analytical ITD values.

Wang and Brown [297] defined the analytical ITD (black line in the figure; see Equation 5.6), considering a centered, perfect sphere of radius a = 10.5 cm, a sound propagation velocity c = 340 m/s, and θ the angle in radians:

ITD = (2a/c) · sin(θ)     (5.6)

The maximum absolute difference found is 170 µs, representing a mismatch of around 15° at the given angle. The calculated average difference is 67 µs, representing a difference of around 7° in localization.

Figure 5.16: Interaural Time Difference at 1 kHz as a function of azimuth angle for a HATS Brüel & Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg method on a four-loudspeaker setup in three different reverberation time scenarios.

Three different simulated rooms were measured using files generated via the Iceberg method for a four-loudspeaker setup, keeping the listener in the center position.
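Equation 5.6 also lets one translate the reported ITD errors (170 µs maximum, 67 µs average) into equivalent azimuth mismatches. A small sketch, inverting the equation around the front, where its slope is steepest:

```python
import numpy as np

A = 0.105   # radius of the spherical head model [m]
C = 340.0   # speed of sound [m/s]

def itd_us(theta_deg):
    """Analytical ITD of Equation 5.6, in microseconds."""
    return (2 * A / C) * np.sin(np.radians(theta_deg)) * 1e6

def itd_error_to_deg(err_us):
    """Azimuth mismatch equivalent to an ITD error, inverting Eq. 5.6
    around the front (theta = 0 deg), where the ITD slope is steepest."""
    return np.degrees(np.arcsin(err_us * 1e-6 * C / (2 * A)))

print(round(itd_us(90)))              # maximum ITD of the model, about 618 us
print(round(itd_error_to_deg(170)))   # about 16 deg (the text reports ~15 deg)
print(round(itd_error_to_deg(67)))    # about 6 deg (the text reports ~7 deg)
```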
Figure 5.16 presents the ITDs acquired with the Iceberg method for RT = 0 s (blue), RT = 0.5 s (red), and RT = 1.1 s (yellow), together with the ITD for RT = 0 obtained without virtualization (i.e., reproduced through real loudspeakers) in black. There were no substantial differences across the different virtualized reverberation times. This was expected, though, as the direct sound drives the ITD.

5.4.1.2 Interaural Level Difference

Figure 5.17 shows the ILDs (calculated following Equation 3.7) across octave bands for the angles around the center, horizontally spaced 30° for better visualization. The ILDs were more affected than the ITDs, with a substantial reduction in ILD relative to the actual loudspeakers observed in the 2 kHz band. The ILD values have a similar pattern and magnitude for a significant part of the spectrum at most angles.

Figure 5.17: Iceberg Interaural Level Differences as a function of octave-band center frequencies; separate lines for angles of incidence.
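Equation 3.7 is defined earlier in the thesis; for illustration only, a common band-energy formulation of an octave-band ILD from a BRIR is sketched below. This formulation is an assumption standing in for Eq. 3.7, not necessarily its exact definition:

```python
import numpy as np

FS = 96_000

def octave_band_ild(brir_left, brir_right, fs=FS,
                    centers=(125, 250, 500, 1000, 2000, 4000, 8000, 16000)):
    """ILD per octave band, taken as 10*log10 of the left-to-right energy
    ratio within each band (energies summed over rFFT bins)."""
    freqs = np.fft.rfftfreq(len(brir_left), 1 / fs)
    L = np.abs(np.fft.rfft(brir_left)) ** 2
    R = np.abs(np.fft.rfft(brir_right)) ** 2
    ilds = {}
    for fc in centers:
        band = (freqs >= fc / np.sqrt(2)) & (freqs < fc * np.sqrt(2))
        ilds[fc] = 10 * np.log10(L[band].sum() / R[band].sum())
    return ilds

# Toy check: a right-ear signal attenuated by 6 dB broadband should yield
# an ILD of about 6 dB in every octave band.
rng = np.random.default_rng(1)
left = rng.standard_normal(FS // 10)
right = left * 10 ** (-6 / 20)
print({fc: round(v, 1) for fc, v in octave_band_ild(left, right).items()})
```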
Figure 5.18 presents the ILDs for both setups, real loudspeakers and the Iceberg method, in six octave bands as a function of azimuth, in steps of 15°. The top-right corner graph shows the 2 kHz band. It shows that, apart from the positions where there is an actual loudspeaker (i.e., 0°, 90°, 180°, and 270°), the differences are large, greater than 10 dB at some azimuth angles.

Figure 5.18: Iceberg ILD as a function of azimuth angle. Listener alone in the center.

Figure 5.19 shows the absolute difference in ILD between physical loudspeakers and virtual loudspeakers created by the Iceberg method.

Figure 5.19: Iceberg method: absolute ILD differences as a function of azimuth angle.

Tu [291] measured just-noticeable differences (JNDs) in ILDs for normal-hearing participants using pure tones at different presentation levels. These JNDs can be used to estimate the perceptibility of the differences between the ILDs obtained with physical loudspeakers and with the Iceberg auralization setup, i.e., to analyze whether the ILD difference between setups would be perceived in a given frequency band. Figure 5.20 presents the values from Figure 5.19 minus the appropriate pure-tone ILD JND values. Positive values, which exceed the JND, could thus be perceived other than intended; that is, a perceptible ILD deviation can cause spatial distortion [38]. The 2-kHz ILDs show up to 8° of divergence across most angles. ILDs in other frequency bands (1, 2, 8, and 16 kHz) also presented values that could relate to noticeable differences (up to 4 dB), but those are mostly limited to frontal ±30° angles.

The 2 kHz mismatch can be considered a flaw in the reproduction system. Its effect on sound localization and on the subjective impression of complex sounds involving these frequencies needs further investigation regarding the scale of the spatial distortion. As the ITDs and the ILDs at other frequencies were relatively well preserved in the auralized system compared with the real loudspeakers, it is possible that this flaw at 2 kHz has a minimal effect, especially for lower-frequency stimuli. System reliability should be verified first for stimuli with peak energy in the 2 kHz band or for tasks requiring greater localization accuracy (e.g., with sound sources within ±30°).

Figure 5.20: Iceberg: absolute ILD differences over the JND as a function of azimuth angle around the central point.

5.4.1.3 Azimuth Estimation

The frontal azimuth angle was estimated using the binaural model by May and Kohlrausch [182]. Each BRIR was convolved with a pink noise of 2.9 s duration as input to the model. The mean of the azimuth estimates for each file is reported as the azimuth predicted by the model. Figure 5.21 presents the angles estimated with the May and Kohlrausch model for files auralized with the Iceberg method for an anechoic room and virtualized over the four-loudspeaker setup (blue curve), the angles estimated for binaural files acquired without virtualization with real loudspeakers (red curve), and the reference (dotted black).

Figure 5.21: Iceberg method: estimated azimuth angle (model by May and Kohlrausch [182]), HATS centered, and RT = 0 s.

The model's results are in line with the analysis of the binaural cues, supporting the assumption of the worst localization accuracy around ±30° (30° and 330° in Figure 5.21). Also, the virtualized sound tends to have more difficulty separating from the frontal angle (0°).

5.4.2 Off-Center Positions

Moving the primary listener off-center (displaced on both the x and y axes) is proposed to measure the impact of a person's head (and body) not being centered, such as when not fixated, on the system's ability to render the appropriate binaural cues.

5.4.2.1 Interaural Time Difference

Figure 5.22 presents 72 measured ITDs around the listener (5° spacing) in four different placements: at the center and displaced forwards (y-axis) by 2.5, 5, and 10 cm. When displaced from the center position, the Iceberg method can cope with delivering a reasonable interaural time difference for frontal displacements of up to 5 cm, or for a simultaneous lateral and frontal misplacement of up to 2.5 cm. However, compared to the center position, the error increased dramatically with a 10 cm displacement for frontal angles (around ±45°), by up to 400 µs relative to the listener in the center. Lateral displacement positions (2.5 and 5.0 cm) were also investigated. The ITD results for these displacements presented the same trend as seen without lateral displacement. Similar results were found when virtualizing the scenes with reverberation times of 0.5 and 1.1 seconds. All combined results are presented in Figure 5.23 to improve readability.

Figure 5.22: Iceberg ITD as a function of frontal displacement: centered listener in the proposed Iceberg method in a four-loudspeaker setup.

Figure 5.23: ITD in the Iceberg virtualized setup with listener displacement: listener positioned 2.5 cm off-center in the proposed Iceberg method in a four-loudspeaker setup.

ITDs were affected by frontal displacements depending on the amount of reverberation simulated. In the simulated dry condition, the squared behavior is present at 5 cm off-center; with mild reverberation, the effect only appears at a displacement of 10 cm; and the largest reverberation tested showed the problem of virtualizing sources at all off-center positions. The deviation is centered at ±45° in all conditions. Lateral movements were even more affected, as expected, delivering ITDs based on the loudspeaker positions (the squared shape) and not on the circular placement of the virtualized sound sources for displacements beyond 3.5 cm from the center (combining the lateral and frontal movements).

5.4.2.2 Interaural Level Difference

Figure 5.24 presents the difference between the ILDs measured at the center and the ILDs measured at different positions for a dry room simulation (RT = 0 s). The lateral displacement (x-axis) is ordered by rows (top row = center, middle row = 2.5 cm, and bottom row = 5 cm to the right). The four columns correspond to frontal displacements (y-axis) of 0 (center), 2.5, 5, and 10 cm. Note that these ILD errors are additional to the previously discussed errors introduced by the simulation itself (with the listener at the center).

Figure 5.24: Difference in ILD as a function of azimuth angle for a HATS Brüel & Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg auralization method on a four-loudspeaker setup (RT = 0.0 s).

The ILDs are affected by listener displacement mostly in the mid frequencies and only at certain angles. A lateral displacement of 2.5 cm produces larger interference (up to 8 dB) at the left angles 40° and 130°. In contrast, at other angles, the ILD differences are lower than 3 dB. The 5 cm displacement also presents differences of up to 15 dB at these same angles and up to 8 dB contralaterally (220° and 320°). Frontal displacement follows a similar pattern, with more differences at some of the rear angles (130° and 220°). These particular differences indicate a relatively low impact on ILD cues using the Iceberg method in the simulated anechoic room (RT = 0 s). Similar results in terms of affected angles were found when analyzing the ILDs for the same listener positions in simulated rooms with RT = 0.5 s (Figure 5.25) and RT = 1.1 s (Figure 5.26). These conditions are closer to everyday situations. The increased energy of the late reflections results in smaller magnitude differences in ILD, indicating slightly better performance for the more realistic simulations.

Figure 5.25: Difference in ILD as a function of azimuth angle for a HATS Brüel & Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg method on a four-loudspeaker setup, RT = 0.5 s.

Figure 5.26: Difference in ILD as a function of azimuth angle for a HATS Brüel & Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg method on a four-loudspeaker setup, RT = 1.1 s.

5.4.2.3 Azimuth Estimation

Using again the May et al. model, in the same setup as in Section 5.4.1.3, to predict the localization of a sound source, Figure 5.27 presents the predicted source locations when moving the listener along the grid positions mentioned (x = 0, 2.5, and 5 cm; y = 0, 2.5, 5, and 10 cm). The different RTs are represented by the line colors in the graphs (blue = 0.0 s, red = 0.5 s, yellow = 1.1 s). The results indicate that the system's spatial sound accuracy is dependent on the listener position. On the other hand, the error is not dependent on the reverberation time. Lateral movements increase the error on the side that is getting closer to the ear, while it is lessened on the contralateral side. Frontal movements increase the number of angles that do not deliver the correct source angle (the longer straight horizontal line around zero). The model estimates a maximum error of up to ≈ 30° for a listener within 3.5 cm of the center (combining lateral and frontal displacement).

Figure 5.27: Estimated (model by May and Kohlrausch [182]) frontal azimuth angle at different positions inside the loudspeaker ring as a function of the target angle.

The errors, comparing the angles estimated at displaced positions with those estimated at the center position, are lessened with increasing reverberation for the majority of the angles.

5.4.3 Centered, Accompanied by a Second Listener

The binaural cues were investigated while adding a second listener to the scene and maintaining the first in the center (sweet spot). The second listener was positioned at three different lateral (x-axis) distances to the left of the center:

• 50 cm (simulating shoulder to shoulder).
• 75 cm.
• 100 cm.

5.4.3.1 Interaural Time Difference

The upper row of Figure 5.28 shows the ITDs for the setup with the HATS alone at the center (blue line) and with a second listener positioned at the right side at three different distances from the center, together with the reference. The ITDs in black were computed with no virtualization, as a reference.

Figure 5.28: ITDs and absolute ITD differences as a function of angle for multiple configurations with (colored lines) and without a second listener (black line).

There was a small difference (≈ 15 µs) when the second listener was placed at the closest position (50 cm), considering the rear and right angles. The absolute difference has a maximum of 201 µs, equivalent to approximately 15° in the source position (see the bottom row of Figure 5.28).

5.4.3.2 Interaural Level Difference

Figure 5.29 presents the difference (∆ILD) between the ILDs computed from the BRIRs collected with and without a second listener inside the ring. The panel rows, top to bottom, show the ∆ILDs for simulated rooms with RTs of 0, 0.5, and 1.1 s. The columns represent the different distances between the centered and the second listener, from 50 to 100 cm, left to right.
Figure 5.29: Interaural level differences averaged over octave bands as a function of azimuth angle for a HATS Brüel & Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg method on a four-loudspeaker setup.

The results show that adding a second listener impacts the ILD at the angles shadowed by the second listener. The effect is more pronounced in the higher octave bands (8 kHz and 16 kHz), reaching approx. 14 dB, especially at the closest and farthest distances (50 cm and 100 cm). Although there is less impact of having a second listener at the intermediate distance (approx. 9 dB), it still produces noticeable ILD changes in the 4 kHz band. The ∆ILD produced by the presence of a second listener is expected as a result of natural acoustic shadowing. The analysis of the ILD around the listener is also important in the hemifield opposite to the second listener (i.e., 180-360°). Auralization methods that rely on the full set of loudspeakers to form the sound pressure (e.g., Ambisonics) can introduce errors on the side with a free path, as a physical object prevents the sound wave from forming accordingly at the center (sweet spot). Although the Iceberg method is partially composed of first-order Ambisonics, which requires all loudspeakers combined to form the appropriate auralization, the VBAP part presents the sound only through the indicated quadrant, not requiring the other loudspeakers to be active. That extends the system's robustness with a limited number of loudspeakers and a frequency limit (not being dependent on the Ambisonics order).

5.4.3.3 Azimuth Estimation

Figure 5.30 depicts the frontal azimuth angles estimated by the May et al. model for pink-noise inputs of 2.9 seconds. The pink noise was convolved with the recorded BRIRs, which in turn were recorded using files generated by the Iceberg auralization method in a four-loudspeaker setup. The columns in the top row present graphs with the average estimated angle for a centered listener accompanied by a second listener at 50 cm (light blue curve), 75 cm (red curve), and 100 cm (yellow curve), according to the room simulation (denoted by the reverberation time). The shaded area corresponds to the standard deviation. The top-left graph also presents the estimated angles for the real-loudspeaker condition without virtualization (blue dotted line). Finally, the bottom-row graphs show the differences between the estimated azimuth angle and the target angle (the estimation error).

Figure 5.30: Top row = estimated localization error with the presence of a second listener; bottom row = difference to the reference. Columns depict different RTs; line colors, different second-listener positions.

According to the model's results, this difference reveals that the sound created via the Iceberg method and virtualized via a four-loudspeaker setup gives consistent localization cues even with a side listener inside the ring at the described positions. The median error is 9.9°, and the standard deviation is 8.8°. On the other hand, the distribution of these differences made clear that the setup of four loudspeakers has more difficulty accurately presenting the localization cues between the frontal loudspeakers, reaching up to 27° of mismatch at these positions (45° and 315°). This result is in line with simulations from Grimm et al. [97]. The values obtained with the low number of speakers utilized (four) are in line with those expected from the literature [97], with an equivalent pattern [84]. Although this can raise a flag for experiments needing a more precise localization representation, the Iceberg method can improve simple setups' realism. Accordingly, it needs to be investigated thoroughly with subjective listening tests, especially at the lateral angles.

Figure 5.31 presents the absolute differences between the estimated angles of arrival for the centered position alone and for the centered position accompanied by a second listener at 50 cm, 75 cm, and 100 cm in the three simulated rooms tested. These differences reflect the estimated influence of having the second listener inside the ring.

Figure 5.31: Absolute difference to the target in estimated localization considering the presence of a second listener and the reverberation time.

The average error presents a slight increase with the proximity of the second listener. However, the effect is less perceivable at moderate RT. This suggests that an acoustic shadow is present. A one-way ANOVA of the estimated absolute errors across the RT and KEMAR-position groups was proposed. For the distribution with 8 degrees of freedom and a number of observations equal to 30, the tabulated value of Snedecor's F distribution at p = 0.05 is 2.26. Thus, F values smaller than the tabulated value support the null hypothesis that there is no significant difference between the means of the absolute errors of the groups of angles, H0: µi = µj. From the analysis, the F statistic (presented in Table 5.2) indicates that H0 is retained for all groups.

Table 5.2: One-way ANOVA; columns are the absolute differences between estimated and reference angles for the different KEMAR positions and RTs.

Source  | SS      | df  | MS      | F      | Prob > F
Columns | 746.5   | 8   | 93.3142 | 1.3755 | 0.2062
Error   | 21980.9 | 324 | 67.8423 |        |
Total   | 22727.4 | 332 |         |        |
Therefore, there is no statistical difference b et ween the KEMAR p ositions for an y of the ev
aluated R Ts. That suggests the metho d’s stability in this setup, ev en with a second listener
considering the rev erb erations and p ositions tested. The mo del has difficulty estimating the
extreme lateral lo cations (90 º and 270), as ev en the actual loudsp eakers could not reac h
this estimation. A comparison b et w een the estimated angles with a listener alone in the
center p osition ac- quired only with actual loudsp eak ers (without virtualization) and the
listener in the center accompanied by a second listener acquired from virtualized files with the
Iceb erg mo del is presented in Figure 5.32 . Figure 5.32: Estimate d err or to R T=0 c
onsidering the estimation of r e al loud- sp e akers as b asis. The data analyzed in this
section suggests that b y observing the indicated p o- Chapter 5. Supplemen tary T est Results
163 sitions where the estimated difference can b e significant, although comparable to similar
methods listed in T able 2.2 experiments with equiv alent require- men ts (e.g., Chapter 4 ) can
b enefit from applying the Iceb erg metho d. The metho d will fairly repro duce sounds with the
presence of a second listener and increase the sense of immersiveness while repro ducing
spatialized sound with only four loudsp eak ers. Sub jective tests are needed to inv estigate
further the system’s spatial rendering p erformance. 5.5 Supplemen tary T est Results A concern
about virtualization pro cesses is ho w reliable they are when an extra la y er of signal pro
cessing is added to the experimental setup [97, 213]; that is, how the sound
acquisition through a hearing device microphone and its signal processing
would be affected by virtualization relative to a simple loudspeaker
reproduction or the real-life situation [98]. This section describes a
comparison of the binaural cues with and without hearing aids. The RIRs were
collected in the same positions as presented in Section 5.4.1 and Section
5.4.2. Further, inspired by the study of Simon et al. [276], the robustness of
the virtualization setup outside the sweet spot was evaluated. Oticon Opn S1
Mini RITE hearing aids with open domes were coupled to each ear of the HATS
manikin (see Figure 5.33). Modern hearing devices like these present a series
of signal processing features that can affect the analysis depending on the
brand or model. To ensure comparability of the results with other devices,
specific features were not enabled. The devices were programmed in the fitting
software to compensate for the hearing loss of the N3 moderate standard
audiogram [35]; the beamformer sensitivity was set to omnidirectional, and the
noise reduction was set to off. The hearing level of the audiogram is
presented in Table 5.3. The open domes were chosen to set the virtualization
system's most difficult signal mix condition: the signal played through the
system is not attenuated, as the ear is not occluded. The amplified signal
from the hearing device is received at the eardrum (microphone) 8.1 ms after
the original signal.

Figure 5.33: HATS wearing the Oticon Opn S1 Mini RITE.

Table 5.3: Hearing level in dB according to the proposed Standard Audiograms
for the Flat and Moderately sloping group [35].

Nº  Category   Frequency [Hz]:  250  375  500  750  1000  1500  2000  3000  4000  6000
N3  Moderate                     35   35   35   35    40    45    50    55    60    65

5.5.1 Centered Position (Aided)

The system was tested by measuring the BRIR with the listener (manikin) in the
center. The calculated binaural cues are presented at incidence angles
separated by 15° at 1.35 m from the listener.
5.5.1.1 Interaural Time Difference

The ITD results (see Figure 5.34) in the aided condition were very similar to
the unaided condition (Section 5.4.1.1). The maximum absolute difference found
is 170 µs, representing a mismatch of around 15° at the given angle (the same
as the previously measured unaided ITD difference). The calculated average
difference is 67 µs, representing a difference of around 7° in localization.

Figure 5.34: Interaural Time Difference under 1 kHz with the proposed Iceberg
method as a function of azimuth angle for a HATS Brüel and Kjær TYPE 4128-C in
the horizontal plane wearing a pair of hearing aids in omnidirectional mode
(blue line). The red line is the ITD results with real loudspeakers (without
virtualization). The black line represents the analytical ITD values.

Figure 5.34 depicts higher differences concentrated in specific regions:
angles around ±30° to the front and back. The similarity to the unaided
condition is expected, as the devices are not blocking the sound wave or
increasing the path more in one ear than the other (i.e., there is only a
static group delay added to the system). Therefore, the sound reaches the HATS
microphones with hearing aids proportionally as in the previous unaided
condition.
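ITD values of this kind are commonly obtained by cross-correlating the
low-passed ear signals and taking the lag of the correlation peak. The sketch
below is our own minimal Python illustration of that generic procedure, not
the thesis analysis code (which was implemented in MATLAB); all function names
are ours:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, correlate, correlation_lags

def estimate_itd(left, right, fs, fmax=1000.0):
    """Estimate the ITD in seconds from a binaural signal pair.

    Both channels are low-pass filtered below fmax (ITD cues are most
    reliable under ~1 kHz); the lag of the cross-correlation peak is taken
    as the ITD. Positive values mean the left ear leads (right channel lags).
    """
    sos = butter(4, fmax, btype="lowpass", fs=fs, output="sos")
    l = sosfiltfilt(sos, left)
    r = sosfiltfilt(sos, right)
    lags = correlation_lags(len(r), len(l), mode="full")
    return lags[np.argmax(correlate(r, l, mode="full"))] / fs

# Hypothetical check: delay the right channel by 20 samples (~417 µs at 48 kHz)
fs, delay = 48000, 20
sig = np.random.default_rng(0).standard_normal(fs)
left = sig
right = np.concatenate([np.zeros(delay), sig[:-delay]])
itd = estimate_itd(left, right, fs)
print(f"estimated ITD = {itd * 1e6:.0f} µs")
```

At 48 kHz, the 170 µs maximum deviation reported above corresponds to a lag of
roughly eight samples, which is why low-pass filtering and long analysis
windows help stabilize the peak pick.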
5.5.1.2 Interaural Level Difference

Figure 5.35 shows the effect on the ILD at the centered position. Although in
the higher octave bands (8 and 16 kHz) the difference between the ILD of an
aided HATS with real loudspeakers and that of the aided HATS using the Iceberg
method is somewhat larger than in the unaided case (see Figure 5.18), the
effect in the 2 kHz band is considerably smaller. That can be due to the added
delay in the signal, which can diminish the possible comb filtering of the
Iceberg method in this specific frequency region, especially for the angles
between two loudspeakers (where there is a larger distance between the real
loudspeakers and the virtualized sound source).

Figure 5.35: Interaural Level Differences as a function of octave-band center
frequencies. Angles around the central point.
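Octave-band ILDs of this kind can in principle be computed by band-pass
filtering both ear signals into octave bands and taking the per-band RMS level
difference. The following is our own illustrative Python sketch, not the
thesis code; the band edges at fc/√2 and fc·√2 and the function names are
assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def octave_band_ild(left, right, fs, centers=(250, 500, 1000, 2000, 4000, 8000)):
    """Per-octave-band ILD: RMS level difference in dB (left re right)."""
    ilds = {}
    for fc in centers:
        lo, hi = fc / np.sqrt(2), fc * np.sqrt(2)  # assumed octave band edges
        sos = butter(4, [lo, min(hi, 0.45 * fs)], btype="bandpass",
                     fs=fs, output="sos")
        l = sosfiltfilt(sos, left)
        r = sosfiltfilt(sos, right)
        ilds[fc] = 20 * np.log10(np.sqrt(np.mean(l ** 2))
                                 / np.sqrt(np.mean(r ** 2)))
    return ilds

# Hypothetical check: right channel attenuated by 6 dB
fs = 48000
left = np.random.default_rng(1).standard_normal(fs)
right = left * 10 ** (-6 / 20)
ilds = octave_band_ild(left, right, fs)
print({fc: round(v, 2) for fc, v in ilds.items()})  # each band ≈ +6.0 dB
```

Because the filters are linear, a broadband level offset between the ears
shows up identically in every band, as the check illustrates.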
5.5.1.3 Azimuth Estimation

Figure 5.36 presents the angles estimated using May and Kohlrausch's [182]
model for files auralized with the Iceberg method and virtualized over the
four-loudspeaker setup (blue curve), the angles estimated for binaural files
acquired without virtualization with real loudspeakers (red curve), and the
reference (dotted black). The model's result is in line with the analysis of
the binaural cues, supporting the assumption of the worst localization
accuracy around ±30° (30° and 330° in Figure 5.21). Some differences bigger
than the standard deviation are noted between different RTs, especially close
to the lateral angles (90° and 270°). The results suggest that the added
reverberation can negatively impact localization accuracy.

Figure 5.36: Iceberg method: estimated azimuth angle (model by May and
Kohlrausch [182]), HATS centered and aided.

Also, according to the figure, the virtualized sound tends to have more
difficulty separating from the frontal angle (0°), denoted by the flat lines
from 30° up to 340°. Figure 5.37 depicts the boxplot diagram of the absolute
differences grouped by RT. An ANOVA analysis of the variance of the estimated
absolute errors between the RT and position groups was proposed. For the
distribution with 2 degrees of freedom and a number of observations equal to
30, the tabulated value of Snedecor's F distribution at p = 0.05 is equal to
3.32.

Figure 5.37: Absolute difference to target in estimated localization in aided
condition considering different RTs.

Thus, values greater than the tabulated F deny the null hypothesis that there
is no significant difference between the means of the absolute error of the
groups of angles, H0: µi = µj. From the analysis of the F statistic (presented
in Table 5.4), H0 is rejected and the alternative hypothesis H1: µi ≠ µj is
accepted (F = 5.68).

Table 5.4: One-way ANOVA; columns are the absolute differences between
estimated and reference angles for different positions and RTs.

Source    SS        df   MS        F      Prob > F
Columns    520.77     2  260.386   5.68   0.0045
Error     4947.29   108   45.808
Total     5468.06   110

To identify in which sets of means the discrepancy is statistically
significant, Tukey's multiple comparison test was performed, and the result is
shown in Figure 5.38.

Figure 5.38: Tukey test to compare means in aided condition. The group mean at
RT 1.1 s presented a significant difference from the mean of group RT 0.0 s.

This reflects a trend towards an increase in the estimated localization error
when there is signal amplification through the hearing aid, which did not
occur in the similar condition without the aid, seen in Section 5.4.1.3.

5.5.2 Off-center Positions
(Aided)

The listener was moved from the center position to simulate a displaced test
participant wearing hearing aids. The BRIRs were measured in the positions
described in Section 5.3.3, and the results are analyzed in this section.

5.5.2.1 Interaural Time Difference

Figure 5.39 presents the ITD for the different angles around the listener as
the listener is displaced to different positions according to the specified
grid.

Figure 5.39: Interaural Time Differences as a function of octave-band center
frequencies. Angles around the central point.

When the listener moves 5 cm to the front, the correct ITD for the frontal
angles starts to blur, especially around ±45 degrees in the frontal
hemisphere, where the ITD indicates that the sound is coming from 90° or 0°.
Beyond this distance, the rear ±45° are also affected, pointing to the break
of the panning illusion. Compared to the unaided condition (Section 5.4.2.1),
this condition is slightly more sensitive to displacements.

Although the ITD analysis is angle-dependent, the results in Table 5.5
indicate that the displacement limitations can be mapped overall to indicate
the maximum distance. Table 5.5 shows the maximum ITD difference according to
the displacement. The maximum difference can indicate the tendency of the ITD
shape to become squared, representing no virtualization, which may help to
identify displacement limitations. The squared behavior occurs when the sound
of one individual speaker is the main pressure contribution, arriving too
early at one of the HATS ears because of the HATS's position.

Table 5.5: Maximum ∆ITD relative to the center position according to
displacement; rows refer to lateral displacement and columns refer to frontal
displacement (both in cm).

RT = 0.0 s
Lateral \ Frontal   0.0       2.5       5.0       10.0
0.0                 0 µs       88 µs    182 µs    374 µs
2.5                 233 µs    239 µs    364 µs    472 µs
5.0                 317 µs    353 µs    399 µs    566 µs

RT = 0.5 s
Lateral \ Frontal   0.0       2.5       5.0       10.0
0.0                 0 µs       97 µs    229 µs    386 µs
2.5                 213 µs    157 µs    313 µs    472 µs
5.0                 317 µs    282 µs    389 µs    566 µs

RT = 1.1 s
Lateral \ Frontal   0.0       2.5       5.0       10.0
0.0                 0 µs      140 µs    299 µs    341 µs
2.5                 236 µs    310 µs    372 µs    437 µs
5.0                 283 µs    372 µs    380 µs    520 µs

In this case, frontal displacements up to 2.5 cm do not present the squared
behavior, with a maximum ∆ITD of 140 µs (RT = 1.1 s), considering the centered
position as the reference. Lateral movements are more affected, starting to
present the squared behavior at the transition angles between the rear
loudspeaker and the right side (230°) and between the right loudspeaker and
the front (310°). This pattern seems not to be RT-dependent, which is expected
due to the ITD's nature.

5.5.2.2 Interaural Level Difference

Figure 5.40 presents the ILD, considering the simulation of an anechoic
environment (RT = 0 s), at 24 angles around the listener as the listener is
displaced to different positions according to the specified grid. Compared to
the normal condition, although it presents the same pattern, the aided
condition has smaller differences between ILDs across more angles and
frequencies.

Figure 5.40: Difference in ILD as a function of azimuth angle for a B&K
4128-C. Iceberg method, horizontal plane in a four-loudspeaker setup
(RT = 0.0 s).

The differences were also lessened as the RT increased, as can be seen in
Figures 5.41 and 5.42. This result shows that increasing the reverberation can
positively affect the ILD error at off-center positions (reducing the
differences relative to the ILD in the center).

Figure 5.41: Difference in ILD as a function of azimuth angle for a HATS Brüel
and Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg
method on a four-loudspeaker setup, RT = 0.5 s.

Figure 5.42: Difference in ILD as a function of azimuth angle for a HATS Brüel
and Kjær TYPE 4128-C in the horizontal plane through the proposed Iceberg
method on a four-loudspeaker setup, RT = 1.1 s.
5.5.2.3 Azimuth Estimation

Figure 5.43 presents the estimated azimuth angle [182] according to the
position of the listener. The different RTs are represented by the line colors
in the graphs (blue = 0.0 s, red = 0.5 s, yellow = 1.1 s). The results
demonstrate that the Iceberg method presents less accuracy in reproducing
sound at frontal angles, especially ±30° (30° and 330°). The lateral
discrepancy is smaller and is also noted with real loudspeakers, which can
imply that the model used has some difficulty assessing that region.

Figure 5.43: Estimated frontal azimuth angle at different positions inside the
loudspeaker ring as a function of the target angle (aided condition). Model by
May and Kohlrausch [182].
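For intuition about how an ITD deviation maps to an azimuth error, a crude
estimate can be obtained by numerically inverting Woodworth's spherical-head
formula, ITD = (a/c)(sin θ + θ). This toy inversion is our own illustration
and is unrelated to the May and Kohlrausch model used for the figures (a
trained, auditory-model-based estimator); the head radius is an assumed value:

```python
import numpy as np

C = 343.0             # speed of sound [m/s]
HEAD_RADIUS = 0.0875  # assumed head radius [m]

def itd_woodworth(azimuth_deg):
    """Woodworth's frontal-plane ITD model: ITD = a/c * (sin θ + θ)."""
    th = np.radians(azimuth_deg)
    return HEAD_RADIUS / C * (np.sin(th) + th)

def azimuth_from_itd(itd, grid=np.linspace(-90, 90, 3601)):
    """Invert the model numerically: grid angle whose predicted ITD is closest."""
    return grid[np.argmin(np.abs(itd_woodworth(grid) - itd))]

est = azimuth_from_itd(itd_woodworth(30.0))
print(est)  # recovers ≈ 30°
```

Because the model is monotonic between -90° and 90°, the inversion is
unambiguous in the frontal hemisphere; front-back confusions, as seen in the
flat regions of Figure 5.43, cannot be resolved from the ITD alone.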
According to the model's results, for an aided listener the localization error
is up to 30 degrees within a frontal or lateral displacement of 5 centimeters.
In the case of 10 cm of displacement, the virtualization fails, presenting the
squared behavior on the contralateral side. The increase of reverberation
tends to maintain the maximum error magnitude, although it increases the
spread to more angles. That means the lateral side close to the loudspeaker
will present the sound source position less at the desired position and more
at the loudspeaker's physical position. Medium reverberation is less affected
by displacement, meaning extreme cases should drive extra care with listener
positioning.

5.6 Discussion

This chapter
proposed a new hybrid auralization method (Iceberg) for a virtualization setup
composed of four loudspeakers at a 1.35 m distance from the center. This setup
is relatively limited and is intended as a feasible alternative to the much
more expensive and complicated arrangements proposed and used in the reviewed
literature (see Section 2.3). The innovative factor of the Iceberg method is
the use of a room acoustic parameter, the center time, to compute the
transition point between early and late reflections. The Iceberg's channel
mixing and distribution automation are generalized to any RIR collected or
converted into first-order Ambisonics. Implemented in MATLAB, the Iceberg
auralization algorithm can generate .wav files that were virtualized in a
setup with four loudspeakers (90-degree spacing around the listener). Three
simulated sound scenarios were predefined and simulated using acoustic
modeling software, generating RIRs in Ambisonics format. The setup provided
appropriate reverberation times even when the listener was away from the
center position.

Regarding binaural cues, at the optimal position the maximum deviation in ITD
was 170 µs, corresponding to a shift of approximately 15° for sources around
±30° in front of and behind the centered listener. The considerable distance
between loudspeakers is the most likely cause of this deviation. In contrast
with Chapter 3, the Iceberg method could not reproduce the ITDs with the same
accuracy as VBAP at the sweet-spot position. However, it presented a better
performance than Ambisonics. The high accuracy of VBAP can be attributed to
the number of loudspeakers, 24, which lessens their physical distance and,
therefore, its maximum error. However, even with the bigger number of
loudspeakers, Ambisonics was truncated at first order, thus not having the
benefit of more sound sources.

There were also deviations in ILD, mainly at the same angles. However, the ILD
deviations were most significant in the 2 kHz octave band. The actual effects
of this difference on signals that encompass these frequencies should be
further investigated in validation tests. The ILDs also denoted patterns with
better representation through VBAP than first-order Ambisonics in Section
3.3.2.2. The Iceberg method with four loudspeakers presents a pattern with
ILDs closer to actual loudspeakers than pure Ambisonics, but again not as
accurate as VBAP with 24 channels. This is characterized mainly by a
difference in the 2 kHz octave band, which needs further investigation and
consideration in experiments requiring ILD accuracy at that frequency band.
Overall, the results for binaural cue reproduction via the Iceberg method with
four loudspeakers are better than pure first-order Ambisonics but worse than
VBAP (considering 24 loudspeakers). Therefore, the Iceberg method can be
considered an option when the number of loudspeakers is limited or the need
for a sense of realism is higher: it combines relative accuracy with a sense
of immersion.

The maximum estimated localization uncertainty was around 30 degrees for the
Iceberg method in the minimal configuration of four loudspeakers. The
different amounts of reverberation tested did not impact the results. Although
the estimated localization was imperfect, the method's performance was in line
with similar VBAP implementations [97]. The results were similar in the aided
condition, with ILDs indicating better cue reproduction in the 2 kHz octave
band. This improvement was not translated into a better estimated angle,
yielding about the same results. A slight variation was identified, and a
statistically significant difference was found between different RTs,
especially at the lateral angles. This deviation needs to be further evaluated
with other models and also with subjective validation, especially as the model
presented unexpected results at these angles for non-virtualized sound
sources.

A second
listener was introduced at the side of the primary listener, while maintaining
the primary listener in the optimal position, to simulate a condition where
there is a need for social interaction or presence in a test. In this case,
the binaural cues provided by the Iceberg auralization method virtualized in a
four-loudspeaker setup were compared to a baseline without virtualization
(actual loudspeakers). Also, the model of May and Kohlrausch [182] was applied
to predict frontal localization accuracy. Three distances were tested with the
three simulated rooms (different RTs). There was the expected acoustical
shadow at angles blocked by the second listener but not at the remaining
sound-source locations around the listener. That can be considered a measure
of rendering robustness; the second listener did not break the virtualization
of binaural cues by scrambling the sound pressure summation.

Regarding sub-optimal positions of the unaided HATS, Section 5.4.2 presented
surprising results. The virtualized effect was affected differently for the
displaced HATS according to the amount of RT. In the dry condition (RT = 0 s),
displacements up to 5 cm when moving forward did not present the undesired
effect. The mild reverberation (0.5 s) got the undesired effect only at 10 cm
from the center, and the large RT at up to 5 cm. The large reverberation
(1.1 s) was heavily affected, presenting the squared behavior at all
off-center displacements. Lateral movements were affected in a similar way for
all the RTs tested, presenting the effect at displacements further than 3.5 cm
from the center. The ILDs presented the shadowing effect as expected,
increasing the distortions with the distance and reducing them with the
increase of reverberation. The combination came to an estimated azimuth angle
in practice not dependent on RT and an error of ≈30° with displacements up to
2.5 cm. As the displacement from the center increases, the maximum estimated
error increases but also moves, meaning that the
virtualization is affected but would still produce a virtualized effect.

In the aided condition (Section 5.5.2), the off-center ITDs indicated a
maximum frontal displacement for the Iceberg method under 10 cm from the
center in the unaided condition and under 5 cm in the aided condition. The
ILDs were also impacted by distance, but to a lower extent, while the
different RTs affected the ITDs less and the ILDs more. The ILDs present a
smearing behavior, lowering the error, with higher RT. That behavior suggests
an equivalent compensation of the error predicted by the model. Within that
distance limit, the maximum error predicted was around 30 degrees for all RTs,
agreeing in the end with the non-aided condition.

When the listener is away from the center, the Iceberg method virtualized
using the four-loudspeaker setup increases the deviations in binaural cues
compared to the cues at those sound-source angles, with a near-complete loss
of gradient in the cues (i.e., either zero or extreme values) occurring when
the listener was 10 cm in front of the center position. ITDs for this
condition revealed minor differences across the tested reverberation
conditions. The values indicated that the files created by the method and
reproduced on the four-loudspeaker configuration produce similar ITDs as the
baseline condition, having the weak point at 30 degrees. The absolute ITD
values align with similar experiments found in the literature for VBAP
configurations without a second listener [241]. The acoustic shadow is
indicated by an increase of the ∆ITD around 270 degrees (left side),
especially at the closest position, similar to the finding with pure VBAP in
Chapter 3. Also, the difference in ILDs (∆ILDs) showed that the presence is
well captured at higher frequencies. All RT conditions and positions
demonstrated the capture of differences in ILD on the left side of the
mannequin. A ∆ILD is expected as a result of natural acoustic shadowing
produced by the presence of a second listener.

The benefit of the Iceberg method is that the VBAP part is not limited in
frequency by aliasing at higher frequencies and does not require all
loudspeakers to be active simultaneously, as pure Ambisonics does. The way the
division is done in the Iceberg method brings the Ambisonics responsibility to
the time domain, making the method more natural to the physical presence
between loudspeakers and the listener. That extends the system's robustness
with a limited number of loudspeakers and frequency limit (not being dependent
on the Ambisonics order). The predicted error for a second listener was
compared to the Iceberg baseline condition (listener centered, alone). The
method presents a deviation of around 10 degrees in all RTs when the second
listener is in the shoulder-to-shoulder situation, the closest position
(50 cm). As the difference is at the second listener position, it is possible
to argue that the Iceberg energy balance is advantageous, not entirely
depending on the four loudspeakers' summation. Therefore, compared to VBAP or
Ambisonics, the Iceberg method is a suitable option in terms of localization
that adds the benefit of immersiveness with modest hardware.

5.6.1 Subjective impressions

The auralization was compared by the author and his
supervisor to VBAP and Ambisonics in subjective listening sessions. The
experiment was not performed systematically, as the Covid-19 emergency rules
imposed a series of restrictions, and these impressions are initial opinions.
The speech signals were auralized via Iceberg, VBAP, and Ambisonics and
reproduced in an anechoic room. The rooms were simulated in Odeon software
v.12 with reverberation times equivalent to 0.5 and 1.3 seconds. Both agreed
that the sound direction from VBAP is easily identifiable but with poor
immersiveness, as all the reverberation came from a specific side (two
loudspeakers). Ambisonics offered a more immersive experience with all
loudspeakers active simultaneously, but the localization was very difficult; a
"blurred" position seems to be a trending description. The Iceberg system
provided a sound localization close to VBAP while maintaining the
immersiveness.

The Iceberg method, upon a trade-off in spatial localization, allows for the
reproduction of sounds that can be easily manipulated regarding sound-source
direction, sound pressure level, reverberation time, and simultaneous sound
sources. That makes it possible to create or reproduce specific virtual sound
scenarios with high reproducibility. Thus, researchers can conduct auditory
tests with increased ecological validity in spaces that usually do not count
on numerous loudspeakers, as is common in clinics, universities, and small
companies. Notwithstanding these benefits, some limitations challenge the
method with a small number of loudspeakers. These limitations impose some
constraints on its use in terms of the spatial localization of sound sources.

5.6.2 Advantages and Limitations

A
fundamental advantage of the proposed Iceberg method is the minimum number of
loudspeakers required (four), as well as its compatibility with any RIR
already collected in low- or high-order Ambisonics. That is possible because
an RIR in HOA can be easily scaled down to first-order Ambisonics and its
spatial sound properties composed with any given sound via the algorithm
[237]. Furthermore, an essential part of the method's definition, and an
additional advantage, is the automation of the definition of the amount of
energy from the RIR that corresponds to each specific auralization technique.
That automation, performed through the center time room acoustics parameter,
allows a smooth transition between the direct sound and early reflections
portion and the late reflections of the RIR, resulting in a potentially more
natural sound while maintaining control over the incidence direction.
The auralization method is designed for a virtualization setup of four
loudspeakers. However, it is possible to use it with more loudspeakers,
reducing the eventual limitations on spatial accuracy. Furthermore, although
not within the scope of this thesis, the method, using the VBAP technique,
would allow the possibility of dynamically moving sound sources around the
listener.

5.6.3 Study limitations and Future Work

The initial
aim of this study was to investigate the correlation between objective
parameters related to spatial sound, particularly those psychoacoustically
motivated by auralization methods, and subjective responses to these methods.
However, due to the Covid-19 pandemic, tests with participants were not
possible because of the risk of infection, as mandated by government rules. As
a result, the study is limited to verifying objective parameters. Therefore,
Section 5.5 was included to explore the system capabilities within a relevant
context for hearing research, although without subjective tests involving
participants. In future work, structured validation with participants would be
of value to the field, allowing for adjustments and the measurement of the
effectiveness of this method in real-world auditory tests. Additionally,
future implementations of this method could include improvements such as
guided sound-source movements around the listener, with simultaneous updates
of the VBAP and Ambisonics weights defined by time constants, and the ability
to pan with intensity using techniques such as Vector-Based Intensity Panning
(VBIP), which could be tailored to specific cases with different loudspeaker
arrangements or stimulus frequency content and potentially merged with VBAP
depending on the type of stimuli and specific frequencies.
5.7 Concluding Remarks

Tests that require hearing aids can be performed, considering some
constraints, utilizing the proposed Iceberg method. These tests aimed to
verify the impact of the auralization method through a simple setup (four
loudspeakers) on the virtualized spatial impression by analyzing the binaural
cues and their deviations from actual sound-source loudspeakers. This is an
important step, although not discounting the importance of validation with
test participants. For a centered listener, the verified deviation in binaural
cues presented limitations of around 30° in localization (through ITD) with
reasonably matching ILDs. The system's reliability is compromised as the
listener is moved out of the sweet spot, but less so than when unaided,
possibly due to comb filtering or the addition of compression into the signal
path. Small movements up to 2.5 cm generated errors within a JND, meaning they
likely would not be perceived as distortions or artifacts. Thus, tests with
people that require sound sources positioned in spaces larger than 30° can
benefit from this Iceberg method, which incorporates spatial awareness and
immersiveness.

Chapter 6

Conclusion

Throughout the course of this study, a new auralization
method called Iceberg was conceptualized and compared to well-known methods,
including VBAP and first-order Ambisonics, using objective parameters. The
Iceberg method is innovative in that it uses the center time (ts) to find the
transition point between early and late reflections in order to split the
Ambisonics impulse responses and adequately distribute them. VBAP is
responsible for localization cues in this proposed method, while Ambisonics
contributes to the sense of immersion. In the center position, the Iceberg
method was found to be in line with the localization accuracy of the other
methods while also adding to the sense of immersion. Also, a second listener
added at the side did not present undesired effects on the auralization.
Additionally, it was found that virtualization of sound sources with
Ambisonics can imply limitations on a participant's behavior due to its sweet
spot in a listening-in-noise test. However, these limitations can be
circumvented, and this extends to Iceberg, resulting in subjective responses
that align with behavioral performance in speech intelligibility tests and
increased localization accuracy.

6.1 Iceberg

In the
previous chapter, we conducted a thorough analysis comparing the performance
of the Iceberg method to the results presented in Chapter 3 and the relevant
literature in Chapter 2. This comparison included evaluating the Iceberg
method's performance at the center position, at various off-center positions,
and in the presence of a second listener. The results showed that the Iceberg
method was able to provide the designed overall reverberation times of 0
seconds, 0.5 seconds, and 1.1 seconds across all measured positions.
Additionally, the differences between the reverberation times were below the
5% JND threshold. When comparing values to the ones obtained with a HATS in
the center without virtualization, it is noteworthy that the Iceberg method
uses 20 fewer loudspeakers than the VBAP and Ambisonics configurations. The
Iceberg method exhibited lower accuracy in reproducing ITDs at the sweet-spot
position than VBAP, but it performed better than first-order Ambisonics. We
also observed detrimental deviations in ILDs, with values exceeding 4 dB,
particularly at the same angles as the ITDs. The most significant ILD
deviations occurred in the 2 kHz octave band, which could influence the
perceived localization accuracy. Further investigation through validation
tests is necessary to fully understand the extent of these differences between
the methods. Regarding overall binaural cue reproduction, the Iceberg method
using four loudspeakers was superior to pure first-order Ambisonics but less
accurate than VBAP with 24 loudspeakers.

The Iceberg method presented a maximum estimated localization error of around
30 degrees for angles within plus or minus 40 degrees from the center while
the listener was centered. Although this magnitude matches the similar methods
in Table 2.2, the binaural cues pointed to a lower estimate (around 15
degrees). Therefore, further studies with perceptual evaluation are highly
encouraged. In the aided condition, we observed that the ITD was not affected
at the center position, and the ILD was closer to the VBAP condition with 24
loudspeakers. However, this improvement was not reflected in the model
estimate, which still showed maximum deviations of around ±30°.

At off-center positions, the Iceberg method showed slight variations in
localization estimates, particularly at lateral angles, which were found to be
statistically significant when comparing different reverberation times. This
variation is likely due to the method's spatial limitation, known as the sweet
spot, as discussed in Chapter 2. When the reverberation time was 0 s or 1.1 s,
the sweet spot was more limited in terms of displacement from the center (up
to 3.5 cm). This means that these conditions were more prone to breaking
virtualization when sound sources were virtualized on the contralateral side
of the displacement. In contrast, the mild condition (0.5 s) maintained this
up to 5 cm. A sweet spot is generally smaller in first-order Ambisonics
compared to VBAP with a 24-loudspeaker setup, as identified in Chapter 3.
However, it is important to note that objective parameters may not always
correspond directly to subjective impressions. Despite this, the Iceberg
method with four loudspeakers was found to perform similarly to VBAP (with 24
loudspeakers) in terms of binaural cue reproduction. The model estimates also
showed that, within a combined displacement of up to 3.5 cm in both lateral
and frontal directions, the maximum error would be less than 30 degrees,
indicating the presence of virtualization (i.e., the sound being physically
composed of more than just the nearest speaker). It is therefore recommended
to evaluate this deviation further using other models and subjective
validation tests.

The results in Section 5.4.3.3, the condition with the listener in the center,
showed that the presence of a second listener did not negatively affect the
performance of the Iceberg method in any of the reverberation conditions
tested. No statistical difference in the means of estimated error was
identified when considering the three RT conditions and the three KEMAR
positions. The binaural cue
errors follo w ed the same trend as the Alone version, meaning that ITDs p ointed to an error
around 15 degrees, but with ILDs having absolute v alues with differences exceeding 4 dB (JND),
which can probably explain the 30 º error estimated by the mo del in the w orst p osition ( i.e.
, the angle of the virtualized sound source at ± 45 º ). Based on these results, the Iceb erg
metho d can b e viable for virtualization setups with limited loudsp eakers or when a higher
sense of realism is desired. 6.2 General Discussion In this work, we explored the use of
auralization methods in hearing research as a means of improving the ecological validity of acoustic environments. The use of virtualized sound fields has become increasingly popular in laboratory tests. However, it is essential to understand the limitations of these methods in order to ensure unbiased results [97]. Our literature review (Chapter 2) identified the need for auralization methods that can be implemented in smaller-scale setups, and our initial evaluations focused on the spatial accuracy of several fundamental auralization methods, as well as their potential use in tasks involving multiple listeners. A collaborative study allowed us to test one of these techniques with real participants, and our findings highlighted both the limitations and potential improvements of using Ambisonics for conducting listening effort tests. Based on this experience and our knowledge of room acoustics and auralization, we proposed a new hybrid method called Iceberg, which combines the strengths of Ambisonics and VBAP and can be implemented using just four loudspeakers. This proposed method offers a low-cost option for auralization that could increase its adoption among researchers worldwide.

In Chapter 3, the VBAP and Ambisonics auralization methods were objectively characterized and compared in terms of binaural cues for the center and off-center positions. This investigation provided a foundation for combining the methods and further highlighted the strengths of each technique: localization in VBAP and immersiveness in Ambisonics. Objective parameters extracted from BRIRs and RIRs were examined for a single listener and in the presence of a second listener in the room. The results showed that the presence of a second listener did not significantly impact the performance of VBAP. At the same time, Ambisonics was less effective in reproducing the examined cues, especially with a second listener present. This information was crucial in developing the proposed Iceberg auralization method, which combines the strengths of both VBAP and Ambisonics to create a hybrid method suitable for use with simple setups such as four loudspeakers.

The results of the collaborative study described in Chapter 4 demonstrate the feasibility of using a virtualization method to deliver a hearing test with a certain level of spatial resolution and immersion across different room simulations and signal-to-noise ratios. This study suggests that virtualization methods have the potential to provide realistic acoustic environments for hearing tests, allowing researchers to vary the acoustic demands of a task and potentially improve ecological validity. Additionally, the significant correlation between participants' subjective perception of effort and their speech recognition performance highlights the importance of considering listening effort in hearing research. However, the limitations and potential solutions identified in this study also highlight the need for further investigation into virtualization methods in hearing research, including developing new auralization methods that address these limitations.

In Chapter 5, we presented the development of a new auralization method called Iceberg, which was designed to be compatible with small-scale virtualization setups using only four loudspeakers. Previous hybrid methods combining Ambisonics and VBAP have been developed, but the innovative aspect of the Iceberg method is its approach to handling and combining the different methods to virtualize sounds while delivering appropriate spatial cues. This feature is achieved by identifying a transition point in the RIR using the Central Time parameter from the omnidirectional channel of an Ambisonics RIR. This automated process allows the user to input any Ambisonics RIR, along with the desired presentation angle(s) and sound file(s), to be auralized using the VBAP and Ambisonics methods merged into a final multi-channel .wav file for presentation over a four-loudspeaker system. One of the benefits of this approach is that it does not require any additional parameters, such as those generated by a simulation program, and it can be used with any Ambisonics RIR, including those in higher-order format, which must be converted to an order appropriate for the number of loudspeakers. Overall, the development of the Iceberg method illustrates the potential for adapting existing technology to meet the needs of smaller-scale virtualization setups while still delivering realistic spatial cues. This approach could support the broader adoption of auralization in hearing research and encourage researchers to utilize virtualized sound fields in their protocols.

6.2.1 Iceberg capabilities

The auralization method proposed in this
work combines the use of Ambisonics RIRs and VBAP to balance the acoustic energy in two spatial domains: the perception of sound localization and the perception of immersion. This results in a file that captures the characteristics of a given sound as if it were played in the desired environment. The method can be reproduced with as few as four loudspeakers but is scalable to any larger array, theoretically increasing its efficiency. In addition, multiple sound sources can be virtualized and merged at presentation to create more complex environments. The input to Iceberg includes Ambisonics RIRs corresponding to specific source-and-receiver positions and the sounds to be virtualized, preferably recorded in (near) anechoic conditions.
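The transition-point step at the heart of this pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions (48 kHz sampling rate, W channel first, hypothetical function names), not the released implementation: the Central Time Ts, i.e. the energy-weighted mean arrival time of the omnidirectional channel, splits the RIR into an early part (direct sound and early reflections, rendered with VBAP) and a late part (the reverberant tail, rendered with Ambisonics).

```python
import numpy as np

FS = 48_000  # sampling rate in Hz (assumed)

def central_time(omni_rir):
    """Central Time Ts: energy-weighted mean arrival time of an RIR, in s."""
    energy = omni_rir**2
    t = np.arange(len(omni_rir)) / FS
    return np.sum(t * energy) / np.sum(energy)

def split_rir(ambi_rir):
    """Split an Ambisonics RIR (channels x samples, W channel first) at Ts.

    Hypothetical helper: the early slice would feed the VBAP renderer and
    the late slice the Ambisonics renderer, before both are merged into
    one multi-channel file for playback.
    """
    ts = central_time(ambi_rir[0])   # Ts from the omnidirectional (W) channel
    cut = int(round(ts * FS))        # transition point in samples
    return ambi_rir[:, :cut], ambi_rir[:, cut:], ts
```

Because Ts is derived from the RIR itself, no extra simulation parameters are needed, which matches the property highlighted above: any (appropriately order-reduced) Ambisonics RIR can be fed in directly.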
The method can pan the source around the listener, as the VBAP component is independent of Ambisonics. However, it is recommended that RIRs be generated for specific angles when using room acoustic software to generate the Ambisonics RIRs. One benefit of this method is that it can reproduce sounds above the cut-off frequency associated with lower-order Ambisonics, owing to its use of VBAP, which is not inherently frequency limited [241]. VBAP is responsible for the delivery of both the direct sound and the early reflections. Additionally, the default properties are defined to work with normalized RIRs, enabling the researcher to specify the sound pressure level of the auralized files.

6.2.2 Iceberg & Second Joint Listener

Testing with a second listener inside the loudspeaker ring helps illuminate the potential of this virtualization system for different tasks and human-interaction situations [143, 202, 230, 234]. A system that allows these tasks and situations needs to deliver the appropriate sound properties for the sound to be perceived as coming from the intended position [97]. Ambisonics was shown to be ineffective in this test, as the shadow caused by a second listener prevented higher-frequency spatial information from being correctly presented, distorting the sound field (especially in low-order Ambisonics). Vector-based solutions can suffer less, as the sound is physically formed from two (or three, in 3D setups) loudspeakers in the same quadrant. This means that interference will happen only at angles where the acoustic shadow of a physical object would naturally interfere in a non-virtualized reproduction. In Chapter 5, BRIRs were acquired with files generated by the Iceberg method and reproduced via a modest setup composed of four loudspeakers in the presence of a second listener. It could be observed that the second listener did not disturb the sound field, as the (primary) listener in the center position received the appropriate binaural cues. The system designed to reproduce files virtualized with the Iceberg method managed to perform competitively with systems with more loudspeakers rendering pure methods (see Table 2.2).

6.2.3 Iceberg: Listener Wearing Hearing Aids

Adding the possibility of allowing
participants to use hearing devices is another crucial step in making auditory tests with auralized files accessible to more researchers [134, 144]. It has been observed that hearing aid signals can influence the intelligibility and clarity of speech in virtualized sound fields [7, 97, 99, 103, 161, 188, 213, 276]. When the hearing aid signals are not appropriately aligned with the characteristics of the virtualized sound field, listeners may struggle to comprehend spoken words or sentences [98]. This issue can be exacerbated when the virtualized sound field includes noise or other distractions that can interfere with speech perception, or when the hearing aid signals fail to amplify or enhance the speech signal to an adequate degree [98, 137]. If the hearing aid signals do not correctly capture the sound field, and therefore do not correct it to the individual needs and preferences of the listener, the listener may experience difficulty using the virtualized sound field comfortably and effectively.

Swept signals were auralized by the Iceberg method, played through the system, recorded with a manikin wearing hearing aids, and deconvolved. The resulting BRIRs were analyzed in terms of binaural cues and compared to the same signals from actual loudspeakers. The localization error was estimated by May and Kohlrausch's probabilistic model for robust sound source localization based on a binaural auditory front end. This model estimates the location of a sound source using binaural cues, such as interaural level differences and interaural time differences, extracted from the signals received by the two ears. By combining these cues in a probabilistic framework, the model can robustly estimate the location of the sound source, even in noisy or distracting environments. Evaluation of the model suggests its potential for use in practical contexts such as hearing aids or virtual reality systems. Results obtained using the Iceberg method with an aided HATS showed performance similar to the unaided results with the listener positioned in the sweet spot, indicating suitable performance (see Section 5.5).

6.2.4 Iceberg Limitations

The virtualization system playing files auralized with
the Iceberg method has been found to be less effective outside of the sweet spot, as the binaural cues are not correctly rendered. This mismatch, which occurs for displacements of more than 2.5 cm, can be mitigated by keeping the listener centered in the virtualized sound field. While this is a significant limitation, the method can still be applied with simple measures, such as a modest head restraint, reducing the setup requirements compared to other classical methods. One major limitation of the Iceberg method is its spatial resolution: it is recommended for scenarios with a minimum of 30° of separation between sound sources (the separation can be lower for sources closer to the loudspeakers, although the error distribution should then be checked). Furthermore, the distance to the sound source should be equal to the radius of the loudspeaker array, as Ambisonics and VBAP cannot define sources inside the array; VBAP can only pan between physical sound sources. These limitations should be considered when using the Iceberg method to create virtualized sound fields.

6.3 General Conclusion

As computational capacity increases, using more complex and natural sound scenarios in auditory research becomes feasible and desirable. This technology allows for testing new features, sensors, and algorithms in controlled conditions with increasing realism and ecological validity. Even clinical tests can benefit from auralization, allowing for investigations in different scenarios with varying acoustics (e.g., in a speech-in-noise test). The spatial-cue performance of the Iceberg auralization method, reproducing files through a system of four loudspeakers, is largely sufficient for these types of tests. It is essential to understand the constraints of auralization methods, Iceberg included, which are tied to the virtualization setup and should be weighed by researchers based on their needs and the available hardware. However, with Iceberg, virtualization can be conducted by auditory research groups that cannot afford or house expensive anechoic chambers with tens or hundreds of loudspeakers and sophisticated hardware, yet need more freedom than headphones provide. The method presented in this work serves as an additional tool for researchers to consider.

6.4 Main Contributions

In this work, we have presented a novel auralization method called Iceberg, designed to create virtualized sound scenarios for use in auditory research. The main contributions of this work are:

1. The development of a hybrid auralization method that combines two psychoacoustic virtualization methods to balance the energy of an RIR and output a multi-channel file for presentation.

2. The implementation of an effective, simple, and partially automated auralization method that allows for the creation of reasonably realistic virtualized sound scenarios with a modest setup.

3. The exploration of the use and limitations of auralization methods in auditory research, including the suggestion that the Iceberg method has the potential to be a helpful tool for testing new features, sensors, and algorithms in controlled conditions with increasing realism and ecological validity.

4. Research into the limitations and feasibility of using Ambisonics in the context of speech intelligibility with normal-hearing listeners.

5. Identification of the potential for the Iceberg method to be applied in a range of practical contexts, including hearing aids and virtual reality systems.

Bibliography

[1] Aguirre, S. L. (2017). Implementação e avaliação de um
sistema de virtualização de fontes sonoras (in Portuguese). Master's thesis, Programa de Pós-Graduação em Engenharia Mecânica, Universidade Federal de Santa Catarina. (Cited on pages 41 and 49)

[2] Aguirre, S. L., Bramsløw, L., Lunner, T., and Whitmer, W. M. (2019). Spatial cue distortions within a virtualized sound field caused by an additional listener. In Proceedings of the 23rd International Congress on Acoustics: integrating 4th EAA Euroregio 2019, pages 6537–6544, Berlin, Germany. ICA International Congress on Acoustics, Deutsche Gesellschaft für Akustik. (Cited on page 94)

[3] Aguirre, S. L., Seifi-Ala, T., Bramsløw, L., Graversen, C., Hadley, L. V., Naylor, G., and Whitmer, W. M. (2021). Combination study 3. http://hear-eco.eu/combination-study-3/. (accessed: 24.11.2021). (Cited on page 97)

[4] Agus, T. R., Akeroyd, M. A., Gatehouse, S., and Warden, D. (2009). Informational masking in young and elderly listeners for speech masked by simultaneous speech and noise. The Journal of the Acoustical Society of America, 126(4):1926–1940. (Cited on page 40)

[5] Ahnert Feistel Media Group (2011). EASE enhanced acoustic simulator for engineers. https://www.afmg.eu/en/ease-enhanced-acoustic-simulator-engineers. Last checked on: Nov 28, 2021. (Cited on page 27)

[6] Ahrens, A., Marschall, M., and Dau, T. (2017). Measuring speech intelligibility with speech and noise interferers in a loudspeaker-based virtual sound environment. The Journal of the Acoustical Society of America, 141(5):3510–3510. (Cited on pages 16, 42 and 52)

[7] Ahrens, A., Marschall, M., and Dau, T. (2019). Measuring and modeling speech intelligibility in real and loudspeaker-based virtual sound environments. Hearing Research, 377:307–317. (Cited on pages 42, 53, 99 and 191)

[8] Ahrens, A., Marschall, M., and Dau, T. (2020). The effect of spatial energy spread on sound image size and speech intelligibility. The Journal of the Acoustical Society of America, 147(3):1368–1378. (Cited on page 42)

[9] Akeroyd, M. A. (2006). The psychoacoustics of binaural hearing. International Journal of Audiology, 45(sup1):25–33. (Cited on pages 9 and 17)

[10] Alfandari Menase, D. (2022). Motivation and fatigue effects in pupillometric measures of listening effort. PhD thesis, University of Nottingham. (Cited on page 33)

[11] Algazi, V. R., Duda, R. O., and Thompson, D. M. (2004). Motion-tracked binaural sound. Journal of the Audio Engineering Society, 52(11):1142–1156. (Cited on page 23)

[12] Alhanbali, S., Dawes, P., Millman, R. E., and Munro, K. J. (2019). Measures of listening effort are multidimensional. Ear and Hearing. (Cited on pages 51 and 118)

[13] Alpert, M. I., Alpert, J. I., and Maltz, E. N. (2005). Purchase occasion influence on the role of music in advertising. Journal of Business Research, 58(3):369–376. (Cited on page 8)

[14] Arau-Puchades, H. (1988). An improved reverberation formula. Acta Acustica united with Acustica, 65(4):163–180. (Cited on page 34)

[15] Archontis Politis (2020). Higher Order Ambisonics (HOA) library. (Cited on page 107)

[16] Arlinger, S. (2003). Negative consequences of uncorrected hearing loss—a review. International Journal of Audiology, 42(sup2):17–20. (Cited on pages 1 and 55)

[17] Aspöck, L., Pausch, F., Stienen, J., Berzborn, M., Kohnen, M., Fels, J., and Vorländer, M. (2018). Application of virtual acoustic environments in the scope of auditory research. In XXVIII Encontro da Sociedade Brasileira de Acústica, SOBRAC, Porto Alegre, Brazil. SOBRAC. (Cited on pages 16 and 42)

[18] Attenborough, K. (2007). Sound Propagation in the Atmosphere, pages 113–147. Springer New York, New York, NY. (Cited on page 10)

[19] Baldan, S., Lachambre, H., Delle Monache, S., and Boussard, P. (2015). Physically informed car engine sound synthesis for virtual and augmented environments. In 2015 IEEE 2nd VR Workshop on Sonic Interactions for Virtual Environments (SIVE), pages 1–6. IEEE. (Cited on page 20)

[20]
Barron, M. (1971). The subjective effects of first reflections in concert halls—the need for lateral reflections. Journal of Sound and Vibration, 15(4):475–494. (Cited on page 37)

[21] Barron, M. and Marshall, A. (1981). Spatial impression due to early lateral reflections in concert halls: The derivation of a physical measure. Journal of Sound and Vibration, 77(2):211–232. (Cited on pages 9, 37 and 38)

[22] Bates, E., Kearney, G., Furlong, D., and Boland, F. (2007). Localization accuracy of advanced spatialisation techniques in small concert halls. The Journal of the Acoustical Society of America, 121. (Cited on pages 47, 49 and 53)

[23] Benesty, J., Sondhi, M., and Huang, Y. (2008). Springer Handbook of Speech Processing. Springer-Verlag Berlin Heidelberg. (Cited on page 24)

[24] Berkhout, A. J. (1988). A holographic approach to acoustic control. Journal of the Audio Engineering Society, 36(12):977–995. (Cited on page 31)

[25] Berkhout, A. J., de Vries, D., and Vogel, P. (1993). Acoustic control by wave field synthesis. The Journal of the Acoustical Society of America, 93(5):2764–2778. (Cited on page 31)

[26] Bertet, S., Daniel, J., Parizet, E., and Warusfel, O. (2009). Influence of microphone and loudspeaker setup on perceived higher order ambisonics reproduced sound field. Proceedings of Ambisonics Symposium. (Cited on page 31)

[27] Bertet, S., Daniel, J., Parizet, E., and Warusfel, O. (2013). Investigation on localisation accuracy for first and higher order ambisonics reproduced sound sources. Acta Acustica united with Acustica, 99:642–657. (Cited on pages 31 and 77)

[28] Bertoli, S. and Bodmer, D. (2014). Novel sounds as a psychophysiological measure of listening effort in older listeners with and without hearing loss. Clinical Neurophysiology. (Cited on page 50)

[29] Berzborn, M., Bomhardt, R., Klein, J., Richter, J.-G., and Vorländer, M. (2017). The ITA-Toolbox: An open source MATLAB toolbox for acoustic measurements and signal processing. In 43rd Annual German Congress on Acoustics, Kiel (Germany), 6 Mar 2017 – 9 Mar 2017, volume 43, pages 222–225. (Cited on pages 60, 107 and 138)

[30] Best, V., Kalluri, S., McLachlan, S., Valentine, S., Edwards, B., and Carlile, S. (2010). A comparison of CIC and BTE hearing aids for three-dimensional localization of speech. International Journal of Audiology, 49(10):723–732. (Cited on page 42)

[31] Best, V., Keidser, G., Buchholz, J. M., and Freeston, K. (2015). An examination of speech reception thresholds measured in a simulated reverberant cafeteria environment. International Journal of Audiology. (Cited on pages 42 and 52)

[32] Best, V., Marrone, N., Mason, C. R., and Kidd, G. (2012). The influence of non-spatial factors on measures of spatial release from masking. The Journal of the Acoustical Society of America, 131(4):3103–3110. (Cited on page 33)

[33] Bidelman, G. M., Davis, M. K., and Pridgen, M. H. (2018). Brainstem-cortical functional connectivity for speech is differentially challenged by noise and reverberation. Hearing Research. (Cited on page 50)

[34] Bigg, G. R. (2015). The science of icebergs, pages 21–124. Cambridge University Press. (Cited on page 128)

[35] Bisgaard, N., Vlaming, M. S. M. G., and Dahlquist, M. (2010). Standard audiograms for the IEC 60118-15 measurement procedure. Trends in Amplification, 14(2):113–120. (Cited on pages 163 and 164)

[36] Blackstock, D. (2000). Fundamentals of Physical Acoustics. A Wiley-Interscience publication. Wiley. (Cited on page 228)

[37] Blauert, J. (1969). Sound localization in the median plane. Acta Acustica united with Acustica, 22(4):205–213. (Cited on page 10)

[38] Blauert, J. (1997). Spatial hearing: the psychophysics of human sound localization. MIT Press. (Cited on pages 9, 13, 14, 16, 18, 40, 73, 93, 126 and 149)

[39] Blauert, J. (2005). Communication acoustics. Springer-Verlag Berlin Heidelberg, 1 edition. (Cited on pages 2, 3, 10, 13, 20, 76 and 98)

[40] Blauert, J. (2013). The technology of binaural listening. Springer. (Cited on pages 9, 20, 22, 33 and 76)

[41] Blauert, J., Lehnert, H., Sahrhage, J., and Strauss, H. (2000). An interactive virtual-environment generator for psychoacoustic research. I: Architecture and implementation. Acta Acustica united with Acustica, 86:94–102. (Cited on pages 1 and 57)

[42] Bock, T. M. and Keele, Jr., D. B. D. (1986). The effects of interaural crosstalk on stereo reproduction and minimizing interaural crosstalk in nearfield monitoring by the use of a physical barrier: part 1. Journal of the Audio
Engineering Society. (Cited on page 43)

[43] Bradley, J. S. (1986). Speech intelligibility studies in classrooms. The Journal of the Acoustical Society of America, 80(3):846–854. (Cited on page 127)

[44] Bradley, J. S. and Soulodre, G. A. (1995). Objective measures of listener envelopment. The Journal of the Acoustical Society of America, 98(5):2590–2597. (Cited on pages 36, 38 and 58)

[45] Brandão, E. (2018). Acústica de salas: Projeto e modelagem. Editora Blucher, São Paulo. (Cited on pages 16, 19, 33, 36, 38 and 128)

[46] Brandao, E., Morgado, G., and Fonseca, W. (2020). A ray tracing engine integrated with Blender and with uncertainty estimation: Description and initial results. Building Acoustics, 28:1–20. (Cited on page 27)

[47] Breebaart, J., van de Par, S., Kohlrausch, A., and Schuijers, E. (2004). High-quality parametric spatial audio coding at low bitrates. Journal of the Audio Engineering Society. (Cited on page 57)

[48] Breebaart, J., van de Par, S., Kohlrausch, A., and Schuijers, E. (2005). Parametric coding of stereo audio. EURASIP Journal on Advances in Signal Processing, pages 1–18. (Cited on page 57)

[49] Brinkmann, F., Aspöck, L., Ackermann, D., Lepa, S., Vorländer, M., and Weinzierl, S. (2019). A round robin on room acoustical simulation and auralization. The Journal of the Acoustical Society of America, 145(4):2746–2760. (Cited on page 21)

[50] Brinkmann, F., Aspöck, L., Ackermann, D., Opdam, R., Vorländer, M., and Weinzierl, S. (2021). A benchmark for room acoustical simulation. Concept and database. Applied Acoustics, 176:107867. (Cited on page 21)

[51] Brinkmann, F., Lindau, A., and Weinzierl, S. (2017). On the authenticity of individual dynamic binaural synthesis. The Journal of the Acoustical Society of America, 142(4):1784–1795. (Cited on page 14)

[52] Brown, C. and Duda, R. (1998). A structural model for binaural sound synthesis. IEEE Transactions on Speech and Audio Processing, 6(5):476–488. (Cited on page 12)

[53] Brown, V. A. and Strand, J. F. (2019). Noise increases listening effort in normal-hearing young adults, regardless of working memory capacity. Language, Cognition and Neuroscience. (Cited on pages 51 and 118)

[54] Brungart, D. S., Cohen, J., Cord, M., Zion, D., and Kalluri, S. (2014). Assessment of auditory spatial awareness in complex listening environments. The Journal of the Acoustical Society of America, 136(4):1808–1820. (Cited on page 18)

[55] Buchholz, J. M. and Best, V. (2020). Speech detection and localization in a reverberant multitalker environment by normal-hearing and hearing-impaired listeners. The Journal of the Acoustical Society of America, 147(3):1469–1477. (Cited on page 42)

[56] Byrnes, H. (1984). The role of listening comprehension: A theoretical base. Foreign Language Annals, 17(4):317. (Cited on page 8)

[57] Campanini, S. and Farina, A. (2008). A new Audacity feature: room objective acoustical parameters calculation module. (Cited on page 36)

[58] Choi, I., Shinn-Cunningham, B. G., Chon, S. B., and Sung, K.-m. (2008). Objective measurement of perceived auditory quality in multichannel audio compression coding systems. Journal of the Audio Engineering Society, 56(1/2):3–17. (Cited on page 73)

[59] Claus Lynge Christensen, Gry Bælum Nielsen, J. H. R. (2008). Danish acoustical society round robin on room acoustic computer modelling. https://odeon.dk/learn/articles/auralisation/. Last checked on: Nov 28, 2021. (Cited on pages 18, 27, 68, 107 and 129)

[60] Cooper, D. H. and Bauck, J. L. (1989). Prospects for transaural recording. Journal of the Audio Engineering Society, 37(1/2):3–19. (Cited on page 22)

[61] Cubick, J. and Dau, T. (2016). Validation of a virtual sound environment system for testing hearing aids. Acta Acustica united with Acustica. (Cited on pages 42, 53, 56 and 120)

[62] Cuevas-Rodríguez, M., Picinali, L., González-Toledo, D., Garre, C., de la Rubia-Cuestas, E., Molina-Tanco, L., and Reyes-Lecuona, A. (2019). 3D Tune-In Toolkit: An open-source library for real-time binaural spatialisation. PLoS ONE, 14(3):e0211899. (Cited on pages 16, 20 and 121)

[63] Cunningham, L. L. and Tucci, D. L. (2017). Hearing loss in adults. New England Journal of Medicine, 377(25):2465–2473. (Cited on pages 1 and 55)

[64] Daniel, J. (2000). Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia (in French). PhD thesis, University of Paris VI. (Cited on pages 31 and 99)

[65] Daniel, J. and Moreau, S. (2004). Further study of sound field coding with higher order ambisonics. In Audio Engineering Society Convention 116. (Cited on pages 30, 32, 99, 121 and 142)

[66] Davies, W. J., Bruce, N.
S., and Murph y , J. E. (2014). Soundscap e repro- duction and synthesis. A cta A custic a unite
d with A custic a , 100(2):285–292. (Cite d on p age 42 ) [67] Dietrich, P ., Masiero, B., M ¨
uller-T rap et, M., Pollo w, M., and Scharrer, R. (2010). Matlab to olbox for the comprehension
of acoustic measurement and signal processing. In Fortschritte der Akustik – DAGA. (Cited on page 107)
[68] Dreier, C. and Vorländer, M. (2020). Psychoacoustic optimisation of aircraft noise: challenges and limits. In Inter-Noise and Noise-Con Congress and Conference Proceedings, volume 261, pages 2379–2386. Institute of Noise Control Engineering. (Cited on page 20)
[69] Dreier, C. and Vorländer, M. (2021). Aircraft noise: auralization-based assessment of weather-dependent effects on loudness and sharpness. The Journal of the Acoustical Society of America, 149(5):3565–3575. (Cited on page 20)
[70] Duda, R., Avendano, C., and Algazi, V. (1999). An adaptable ellipsoidal head model for the interaural time difference. In 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), volume 2, pages 965–968. (Cited on page 12)
[71] Dunne, R., Desai, D., and Heyns, P. S. (2021). Development of an acoustic material property database and universal airflow resistivity model. Applied Acoustics, 173:107730. (Cited on page 21)
[72] Eddins, D. A. and Hall, J. W. (2010). Binaural processing and auditory asymmetries. In Gordon-Salant, S., Frisina, R. D., Popper, A. N., and Fay, R. R., editors, The Aging Auditory System, pages 135–165. Springer New York, New York, NY. (Cited on page 11)
[73] Epain, N., Guillon, P., Kan, A., Kosobrodov, R., Sun, D., Jin, C., and Van Schaik, A. (2010). Objective evaluation of a three-dimensional sound field reproduction system. In Burgess, M., Davey, J., Don, C., and McMinn, T., editors, Proceedings of the 20th International Congress on Acoustics, ICA 2010, volume 2, pages 949–955. International Congress on Acoustics (ICA). (Cited on page 31)
[74] Eyring, C. F. (1930). Reverberation time in "dead" rooms. The Journal of the Acoustical Society of America, 1(2A):168–168. (Cited on page 34)
[75] Farina, A. (2000). Simultaneous measurement of impulse response and distortion with a swept-sine technique. Journal of the Audio Engineering Society. (Cited on page 63)
[76] Farina, A., Glasgal, R., Armelloni, E., and Torger, A. (2001). Ambiophonic principles for the recording and reproduction of surround sound for music. Journal of the Audio Engineering Society. (Cited on pages 43 and 45)
[77] Favrot, S. and Buchholz, J. (2009). Validation of a loudspeaker-based room auralization system using speech intelligibility measures. In Audio Engineering Society Convention Papers, Preprint 7763. Praesens Verlag. 126th Audio Engineering Society Convention (AES126), 7–10 May 2009. (Cited on pages 21 and 99)
[78] Favrot, S., Marschall, M., Käsbach, J., Buchholz, J., and Weller, T. (2011). Mixed-order ambisonics recording and playback for improving horizontal directionality. In Proceedings of the Audio Engineering Society 131st Convention, 20–23 October 2011. (Cited on page 99)
[79] Favrot, S. E., Buchholz, J., and Dau, T. (2010). A loudspeaker-based room auralization system for auditory research. PhD thesis, Technical University of Denmark. (Cited on pages 1, 42, 43, 45, 56, 57, 120 and 127)
[80] Fichna, S., Biberger, T., Seeber, B. U., and Ewert, S. D. (2021). Effect of acoustic scene complexity and visual scene representation on auditory perception in virtual audio-visual environments. 2021 Immersive and 3D Audio: from Architecture to Automotive (I3DA). (Cited on page 42)
[81] Fintor, E., Aspöck, L., Fels, J., and Schlittmeier, S. (2021). The role of spatial separation of two talkers' auditory stimuli in the listener's memory of running speech: listening effort in a non-noisy conversational setting. International Journal of Audiology. (Cited on page 57)
[82] Fitzroy, D. (1959). Reverberation formula which seems to be more accurate with nonuniform distribution of absorption. The Journal of the Acoustical Society of America, 31(7):893–897. (Cited on page 34)
[83] Francis, A. L. and Love, J. (2020). Listening effort: Are we measuring cognition or affect, or both? WIREs Cognitive Science, 11(1):e1514. (Cited on page 98)
[84] Frank, M. (2014). Localization using different amplitude-panning methods in the frontal horizontal plane. In Proceedings of the EAA Joint Symposium on Auralization and Ambisonics 2014. (Cited on pages 45, 49 and 160)
[85] Frank, M. and Zotter, F. (2008). Localization experiments using different 2D ambisonics decoders. In 25th Tonmeistertagung – VDT International Convention, Leipzig. (Cited on pages 45 and 49)
[86] Franklin, W. S. (1903). Derivation of equation of decaying sound in a room and definition of open window equivalent of absorbing power. Phys. Rev. (Series I), 16:372–374. (Cited on page 34)
[87] Fraser, S., Gagné, J. P., Alepins, M., and Dubois, P. (2010). Evaluating the effort expended to understand speech in noise using a dual-task paradigm: The effects of providing visual speech cues. Journal of Speech, Language, and Hearing Research. (Cited on page 50)
[88] Furuya, H., Fujimoto, K., Young Ji, C., and Higa, N. (2001). Arrival direction of late sound and listener envelopment. Applied Acoustics, 62(2):125–136. (Cited on page 37)
[89] Gandemer, L., Parseihian, G., Bourdin, C., and Kronland-Martinet, R. (2018). Perception of surrounding sound source trajectories in the horizontal plane: A comparison of VBAP and basic-decoded HOA. Acta Acustica united with Acustica, pages 338–350. (Cited on pages 40, 57 and 122)
[90] Gelfand, S. (2004). Hearing: An Introduction to Psychological and Physiological Acoustics, Fourth Edition. Taylor & Francis. (Cited on page 76)
[91] Gerzon, M. A. (1985). Ambisonics in multichannel broadcasting and video. Journal of the Audio Engineering Society. (Cited on pages 23 and 58)
[92] Giguere, C. and Woodland, P. C. (1994). A computational model of the auditory periphery for speech and hearing research. I. Ascending path. The Journal of the Acoustical Society of America, 95(1):331–342. (Cited on page 8)
[93] Gil Carvajal, J., Cubick, J., Santurette, S., and Dau, T. (2016). Spatial hearing with incongruent visual or auditory room cues. Scientific Reports, 6. (Cited on pages 16 and 42)
[94] Glasgal, R. (2001). The ambiophone: derivation of a recording methodology optimized for ambiophonic reproduction. Journal of the Audio Engineering Society. (Cited on page 43)
[95] Glasgal, R. and Yates, K. (1995). Ambiophonics: Beyond Surround Sound to Virtual Sonic Reality. Ambiophonics Institute. (Cited on page 43)
[96] Gomes, L., Fonseca, W. D., de Carvalho, D. M. L., and Mareze, P. H. (2020). Rendering binaural signals for moving sources. In Reproduced Sound 2020. (Cited on page 20)
[97] Grimm, G., Ewert, S., and Hohmann, V. (2015a). Evaluation of spatial audio reproduction schemes for application in hearing aid research. Acta Acustica united with Acustica, 101(4):842–854. (Cited on pages 18, 21, 31, 41, 49, 53, 94, 117, 121, 160, 163, 177, 187, 190 and 191)
[98] Grimm, G., Kollmeier, B., and Hohmann, V. (2016a). Spatial acoustic scenarios in multichannel loudspeaker systems for hearing aid evaluation. Journal of the American Academy of Audiology, 27(7):557–566. (Cited on pages 56, 163 and 191)
[99] Grimm, G., Kollmeier, B., and Hohmann, V. (2016b). Spatial acoustic scenarios in multichannel loudspeaker systems for hearing aid evaluation. Journal of the American Academy of Audiology. (Cited on page 191)
[100] Grimm, G., Luberadzka, J., Herzke, T., and Hohmann, V. (2015b). Toolbox for acoustic scene creation and rendering (TASCAR): Render methods and research applications. Proceedings of the Linux Audio Conference. (Cited on page 42)
[101] Grimm, G., Luberadzka, J., and Hohmann, V. (2018). Virtual acoustic environments for comprehensive evaluation of model-based hearing devices. International Journal of Audiology. (Cited on pages 42 and 120)
[102] Grimm, G., Luberadzka, J., and Hohmann, V. (2019). A toolbox for rendering virtual acoustic environments in the context of audiology. Acta Acustica united with Acustica, 105:566–578. (Cited on pages 1, 42 and 57)
[103] Guastavino, C., Katz, B., Polack, J.-D., Levitin, D., and Dubois, D. (2004). Ecological validity of soundscape reproduction. Acta Acustica united with Acustica, 50. (Cited on pages 42 and 191)
[104] Guastavino, C. and Katz, B. F. G. (2004). Perceptual evaluation of multi-dimensional spatial audio reproduction. The Journal of the Acoustical Society of America, 116:1105–1115. (Cited on pages 57, 86 and 122)
[105] Guastavino, C., Larcher, V., Catusseau, G., and Boussard, P. (2007). Spatial audio quality evaluation: comparing transaural, ambisonics and stereo. In Proceedings of the 13th International Conference on Auditory Display, Montréal, Canada. Georgia Institute of Technology. (Cited on pages 86 and 122)
[106] Hacihabiboglu, H., De Sena, E., Cvetkovic, Z., Johnston, J., and Smith III, J. O. (2017). Perceptual spatial audio recording, simulation, and rendering: An overview of spatial-audio techniques based on psychoacoustics. IEEE Signal Processing Magazine, 34(3):36–54. (Cited on pages 20, 22 and 23)
[107] Hamdan, E. C. and Fletcher, M. D. (2022). A compact two-loudspeaker virtual sound reproduction system for clinical testing of spatial hearing with hearing-assistive devices. Frontiers in Neuroscience, 15. (Cited on pages 42, 47 and 49)
[108] Hammershøi, D. and Møller, H. (1992). Fundamentals of binaural technology. In Fundamentals of Binaural Technology. (Cited on pages 12 and 16)
[109] Hammershøi, D. and Møller, H. (2005). Binaural technique: basic methods for recording, synthesis, and reproduction. In Blauert, J., editor, Communication Acoustics, pages 223–254. Springer Berlin Heidelberg, Berlin, Heidelberg. (Cited on page 16)
[110] Harris, P., Nagy, S., and Vardaxis, N. (2018). Mosby's Dictionary of Medicine, Nursing and Health Professions – Revised 3rd ANZ Edition. Elsevier Health Sciences APAC. (Cited on page 7)
[111] Havelock, D. I., Kuwano, S., and Vorländer, M. (2008). Handbook of Signal Processing in Acoustics. Springer, New York. (Cited on page 98)
[112] Hazrati, O. and Loizou, P. C. (2012). The combined effects of reverberation and noise on speech intelligibility by cochlear implant listeners. International Journal of Audiology. (Cited on page 98)
[113] He, J. (2016). Spatial Audio Reproduction with Primary Ambient Extraction. SpringerBriefs in Electrical and Computer Engineering. Springer Singapore. (Cited on page 18)
[114] Hecker, S. (1984). Music for advertising effect. Psychology & Marketing, 1(3-4):3–8. (Cited on page 8)
[115] Hendrickx, E., Stitt, P., Messonnier, J.-C., Lyzwa, J.-M., Katz, B. F., and de Boishéraud, C. (2017). Improvement of externalization by listener and source movement using a "binauralized" microphone array. Journal of the Audio Engineering Society, 65(7/8):589–599. (Cited on page 23)
[116] Hendrikse, M. M. E., Llorach, G., Hohmann, V., and Grimm, G. (2019). Movement and gaze behavior in virtual audiovisual listening environments resembling everyday life. Trends in Hearing, 23. (Cited on pages 1 and 57)
[117] Hiyama, K., Komiyama, S., and Hamasaki, K. (2002). The minimum number of loudspeakers and its arrangement for reproducing the spatial impression of diffuse sound field. Journal of the Audio Engineering Society. (Cited on page 41)
[118] Hohmann, V., Paluch, R., Krueger, M., Meis, M., and Grimm, G. (2020). The virtual reality lab: Realization and application of virtual sound environments. Ear & Hearing, 41:31S–38S. (Cited on pages 1 and 57)
[119] Holman, J. A., Drummond, A., and Naylor, G. (2021). Hearing aids reduce daily-life fatigue and increase social activity: a longitudinal study. medRxiv. (Cited on pages 1 and 56)
[120] Holube, I., Fredelake, S., Vlaming, M., and Kollmeier, B. (2010). Development and analysis of an international speech test signal (ISTS). International Journal of Audiology, 49(12):891–903. (Cited on page 131)
[121] Holube, I., Haeder, K., Imbery, C., and Weber, R. (2016). Subjective listening effort and electrodermal activity in listening situations with reverberation and noise. Trends in Hearing. (Cited on pages 99 and 118)
[122] Hong, J. Y., He, J., Lam, B., Gupta, R., and Gan, W.-S. (2017). Spatial audio for soundscape design: Recording and reproduction. Applied Sciences, 7(6). (Cited on page 18)
[123] Hornsby, B. W. (2013). The effects of hearing aid use on listening effort and mental fatigue associated with sustained speech processing demands. Ear and Hearing, 34(5):523–534. (Cited on page 33)
[124] Howard, D. and Angus, J. (2009). Acoustics and Psychoacoustics, 4th Edition. Focal Press, Oxford. (Cited on page 73)
[125] Huisman, T., Ahrens, A., and MacDonald, E. (2021). Ambisonics sound source localization with varying amount of visual information in virtual reality. Frontiers in Virtual Reality, 2. (Cited on pages 16 and 49)
[126] International Telecommunications Union – Radiocommunication Sector (ITU-R) (2015). Methods for the subjective assessment of small impairments in audio systems. Technical report, International Telecommunications Union, Geneva. (Cited on pages 42, 61, 105 and 132)
[127] ISO (2009). 3382-1: Acoustics – Measurement of room acoustic parameters. Part 1: Performance spaces. ISO 3382-1:2009, ISO. (Cited on pages 33, 34, 35 and 70)
[128] Jäncke, L. (2008). Music, memory and emotion. Journal of Biology, 7(6):1–5. (Cited on page 8)
[129] Jin, C., Corderoy, A., Carlile, S., and van Schaik, A. (2004). Contrasting monaural and interaural spectral cues for human sound localization. The Journal of the Acoustical Society of America, 115(6):3124–3141. (Cited on page 11)
[130] Jot, J.-M., Wardle, S., and Larcher, V. (1998). Approaches to binaural synthesis. Journal of the Audio Engineering Society. (Cited on page 27)
[131] Kang, S. and Kim, S.-H. K. (1996). Realistic audio teleconferencing using binaural and auralization techniques. ETRI Journal, 18:41–51. (Cited on page 23)
[132] Katz, B. F. G. and Noisternig, M. (2014). A comparative study of interaural time delay estimation methods. The Journal of the Acoustical Society of America, 135(6):3530–3540. (Cited on page 70)
[133] Keet, V. (1968). The influence of early lateral reflections on the spatial impression. Proc. 6th Int. Cong. Acoust., Tokyo, 2. (Cited on page 39)
[134] Keidser, G., Naylor, G., Brungart, D. S., Caduff, A., Campos, J., Carlile, S., Carpenter, M. G., Grimm, G., Hohmann, V., Holube, I., Launer, S., Lunner, T., Mehra, R., Rapport, F., Slaney, M., and Smeds, K. (2020). The quest for ecological validity in hearing science: what it is, why it matters, and how to advance it. Ear and Hearing, 41(S1):5S–19S. (Cited on pages 53, 56, 98, 122 and 191)
[135] Kestens, K., Degeest, S., and Keppler, H. (2021). The effect of cognition on the aided benefit in terms of speech understanding and listening effort obtained with digital hearing aids: A systematic review. American Journal of Audiology, 30(1):190–210. (Cited on page 98)
[136] Kirsch, C., Poppitz, J., Wendt, T., van de Par, S., and Ewert, S. D. (2021). Computationally efficient spatial rendering of late reverberation in virtual acoustic environments. 2021 Immersive and 3D Audio: from Architecture to Automotive (I3DA). (Cited on page 42)
[137] Klatte, M., Lachmann, T., Meis, M., et al. (2010). Effects of noise and reverberation on speech perception and listening comprehension of children and adults in a classroom-like setting. Noise and Health, 12(49):270. (Cited on pages 1 and 191)
[138] Kleiner, M., Dalenbäck, B.-I., and Svensson, P. (1993). Auralization: an overview. Journal of the Audio Engineering Society, 41(11):861–875. (Cited on page 19)
[139] Klemenz, M. (2005). Sound synthesis of starting electric railbound vehicles and the influence of consonance on sound quality. Acta Acustica united with Acustica, 91(4):779–788. (Cited on page 20)
[140] Klockgether, S. and van de Par, S. (2016). Just noticeable differences of spatial cues in echoic and anechoic acoustical environments. The Journal of the Acoustical Society of America, 140(4):EL352–EL357. (Cited on pages 93 and 94)
[141] Kobayashi, M., Ueno, K., and Ise, S. (2015). The effects of spatialized sounds on the sense of presence in auditory virtual environments: A psychological and physiological study. Presence: Teleoperators and Virtual Environments, 24(2):163–174. (Cited on page 16)
[142] Koehnke, J. and Besing, J. (1996). A procedure for testing speech intelligibility in a virtual listening environment. Ear and Hearing, 17(3):211–217. (Cited on page 40)
[143] Koelewijn, T., Zekveld, A. A., Festen, J. M., and Kramer, S. E. (2012). Pupil dilation uncovers extra listening effort in the presence of a single-talker masker. Ear and Hearing, 33(2):291–300. (Cited on page 190)
[144] Kramer, S. E., Bhuiyan, T., Bramsløw, L., Fiedler, L., Graversen, C., Hadley, L. V., Innes-Brown, H., Naylor, G., Richter, M., Saunders, G. H., Versfeld, N. J., Wendt, D., Whitmer, W. M., and Zekveld, A. A. (2020). Innovative hearing aid research on ecological conditions and outcome measures: The HEAR-ECO project. (Cited on page 191)
[145] Kramer, S. E., Kapteyn, T. S., Festen, J. M., and Tobi, H. (1996). The relationships between self-reported hearing disability and measures of auditory disability. Audiology, 35(5):277–287. (Cited on page 56)
[146] Krokstad, A., Strom, S., and Sørsdal, S. (1968). Calculating the acoustical room response by the use of a ray tracing technique. Journal of Sound and Vibration, 8(1):118–125. (Cited on page 19)
[147] Krueger, M., Schulte, M., Brand, T., and Holube, I. (2017). Development of an adaptive scaling method for subjective listening effort. The Journal of the Acoustical Society of America. (Cited on page 50)
[148] Kuttruff, H. (2009). Room Acoustics, Fifth Edition. Taylor & Francis. (Cited on pages 19, 34, 68 and 127)
[149] Kwak, C., Han, W., Lee, J., Kim, J., and Kim, S. (2018). Effect of noise and reverberation on speech recognition and listening effort for older adults. Geriatrics and Gerontology International. (Cited on pages 50, 99 and 118)
[150] Laitinen, M.-V. and Pulkki, V. (2009). Binaural reproduction for directional audio coding. In 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 337–340. (Cited on page 41)
[151] Lau, M. K., Hicks, C., Kroll, T., and Zupancic, S. (2019). Effect of auditory task type on physiological and subjective measures of listening effort in individuals with normal hearing. Journal of Speech, Language, and Hearing Research. (Cited on pages 50 and 52)
[152] Lau, S.-T., Pichora-Fuller, M., Li, K., Singh, G., and Campos, J. (2016). Effects of hearing loss on dual-task performance in an audiovisual virtual reality simulation of listening while walking. Journal of the American Academy of Audiology, 27. (Cited on page 57)
[153] Letowski, T. and Letowski, S. (2011). Localization error: accuracy and precision of auditory localization. In Strumillo, P., editor, Advances in Sound Localization, chapter 4, pages 55–78. Intech, Oxford. (Cited on pages 9, 10 and 17)
[154] Levy, S. M. (2012). Section 9 – Calculations to determine the effectiveness and control of thermal and sound transmission. In Levy, S. M., editor, Construction Calculations Manual, pages 503–544. Butterworth-Heinemann, Boston. (Cited on page 60)
[155] Lindau, A. and Brinkmann, F. (2012). Perceptual evaluation of headphone compensation in binaural synthesis based on non-individual recordings. Journal of the Audio Engineering Society, 60(1/2):54–62. (Cited on page 14)
[156] Lindau, A., Kosanke, L., and Weinzierl, S. (2010). Perceptual evaluation of physical predictors of the mixing time in binaural room impulse responses. Journal of the Audio Engineering Society. (Cited on page 126)
[157] Lindemann, W. (1986). Extension of a binaural cross-correlation model by contralateral inhibition. I. Simulation of lateralization for stationary signals. The Journal of the Acoustical Society of America, 80(6):1608–1622. (Cited on page 45)
[158] Liu, Z., Fard, M., and Jazar, R. (2015). Development of an acoustic material database for vehicle interior trims. Technical report, SAE Technical Paper. (Cited on page 21)
[159] Llopis, H. S., Pind, F., and Jeong, C.-H. (2020). Development of an auditory virtual reality system based on pre-computed B-format impulse responses for building design evaluation. Building and Environment, 169:106553. (Cited on page 133)
[160] Llorach, G., Evans, A., Blat, J., Grimm, G., and Hohmann, V. (2016). Web-based live speech-driven lip-sync. In 2016 8th International Conference on Games and Virtual Worlds for Serious Applications (VS-GAMES), pages 1–4. (Cited on pages 1 and 57)
[161] Llorach, G., Grimm, G., Hendrikse, M. M., and Hohmann, V. (2018). Towards realistic immersive audiovisual simulations for hearing research. In Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia, pages 33–40. (Cited on pages 1, 18, 27, 40, 53, 56, 57, 58 and 191)
[162] Llorca-Bofí, J., Dreier, C., Heck, J., and Vorländer, M. (2022). Urban sound auralization and visualization framework: case study at IHTApark. Sustainability, 14(4). (Cited on page 20)
[163] Loizou, P. C. (2007). Speech Enhancement: Theory and Practice. CRC Press. (Cited on page 52)
[164] Lokki, T. and Savioja, L. (2008). Virtual acoustics. In Havelock, D., Kuwano, S., and Vorländer, M., editors, Handbook of Signal Processing in Acoustics, pages 761–771. Springer New York, New York, NY. (Cited on page 20)
[165] Long, M. (2014). Architectural Acoustics. Elsevier Science. (Cited on pages 16, 19 and 42)
[166] Lopez, J. J., Gutierrez, P., Cobos, M., and Aguilera, E. (2014). Sound distance perception comparison between Wave Field Synthesis and Vector Base Amplitude Panning. In ISCCSP 2014 – 6th International Symposium on Communications, Control and Signal Processing, Proceedings. (Cited on pages 21 and 121)
[167] Lovedee-Turner, M. and Murphy, D. (2018). Application of machine learning for the spatial analysis of binaural room impulse responses. Applied Sciences, 8(1). (Cited on page 15)
[168] Lund, K. D., Ahrens, A., and Dau, T. (2020). A method for evaluating audio-visual scene analysis in multi-talker environments. In Proceedings of the International Symposium on Auditory and Audiological Research, volume 7, pages 357–364. The Danavox Jubilee Foundation. International Symposium on Auditory and Audiological Research ISAAR2019. (Cited on page 42)
[169] Lundbeck, M., Grimm, G., Hohmann, V., Laugesen, S., and Neher, T. (2017). Sensitivity to angular and radial source movements as a function of acoustic complexity in normal and impaired hearing. Trends in Hearing, 21. (Cited on pages 33 and 57)
[170] Lyon, R. F. (2017). Human and Machine Hearing: Extracting Meaning from Sound. Cambridge University Press. (Cited on page 10)
[171] Magezi, D. A. (2015). Linear mixed-effects models for within-participant psychology experiments: an introductory tutorial and free, graphical user interface (LMMgui). Frontiers in Psychology, 6:2. (Cited on page 113)
[172] Malham, D. G. and Myatt, A. (1995). 3-D sound spatialization using ambisonic techniques. Computer Music Journal, 19(4):58–70. (Cited on page 29)
[173] Mansour, N., Marschall, M., May, T., Westermann, A., and Dau, T. (2021a). Speech intelligibility in a realistic virtual sound environment. The Journal of the Acoustical Society of America, 149(4):2791–2801. (Cited on page 99)
[174] Mansour, N., Westermann, A., Marschall, M., May, T., Dau, T., and Buchholz, J. (2021b). Guided ecological momentary assessment in real and virtual sound environments. The Journal of the Acoustical Society of America, 150(4):2695–2704. (Cited on pages 16 and 42)
[175] Marentakis, G., Zotter, F., and Frank, M. (2014). Vector-base and ambisonic amplitude panning: A comparison using pop, classical, and contemporary spatial music. Acta Acustica united with Acustica. (Cited on pages 57 and 86)
[176] Marrone, N., Mason, C. R., and Kidd, G. (2008). The effects of hearing loss and age on the benefit of spatial separation between multiple talkers in reverberant rooms. The Journal of the Acoustical Society of America, 124(5):3064–3075. (Cited on pages 16 and 40)
[177] Marschall, M. (2014). Capturing and reproducing realistic acoustic scenes for hearing research. PhD thesis, Technical University of Denmark. (Cited on pages 40, 53, 99 and 120)
[178] Masiero, B. (2012). Individualized Binaural Technology: Measurement, Equalization and Perceptual Evaluation. PhD thesis, RWTH Aachen University. (Cited on page 14)
[179] Masiero, B. and Fels, J. (2011). Perceptually robust headphone equalization for binaural reproduction. In Audio Engineering Society Convention 130. Audio Engineering Society. (Cited on page 22)
[180] Masiero, B. and Vorländer, M. (2011). Spatial audio reproduction methods for virtual reality. In 42º Congreso Español de Acústica – Encuentro Ibérico de Acústica – European Symposium on Environmental Acoustics and on Buildings Acoustically Sustainable, pages 1–12, Cáceres. (Cited on pages 23, 24, 58 and 86)
[181] Matthen, M. (2016). Effort and displeasure in people who are hard of hearing. Ear and Hearing, 37 Suppl 1. (Cited on page 57)
[182] May, T., van de Par, S., and Kohlrausch, A. (2011). A probabilistic model for robust localization based on a binaural auditory front-end. IEEE Transactions on Audio, Speech, and Language Processing, 19(1):1–13. (Cited on pages xix, 6, 138, 150, 156, 166, 167, 174 and 178)
[183] Meesawat, K. and Hammershøi, D. (2003). The time when the reverberation tail in a binaural room impulse response begins. In Audio Engineering Society Convention 115. Audio Engineering Society. (Cited on page 15)
[184] Menase, D. A., Richter, M., Wendt, D., Fiedler, L., and Naylor, G. (2022). Task-induced mental fatigue and motivation influence listening effort as measured by the pupil dilation in a speech-in-noise task. medRxiv. (Cited on page 33)
[185] Vorländer, M. (2008). Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality. Springer. (Cited on page 98)
[186] Miles, K., McMahon, C., Boisvert, I., Ibrahim, R., de Lissa, P., Graham, P., and Lyxell, B. (2017). Objective assessment of listening effort: Coregistration of pupillometry and EEG. Trends in Hearing. (Cited on pages 50, 51 and 118)
[187] Millington, G. (1932). A modified formula for reverberation. The Journal of the Acoustical Society of America, 4(1A):69–82. (Cited on page 34)
[188] Minnaar, P., Favrot, S., and Buchholz, J. (2010). Improving hearing aids through listening tests in a virtual sound environment. Hearing Journal, 63(10):40–44. (Cited on pages 1, 16, 42, 121 and 191)
[189] Møller, H., Sørensen, M. F., Hammershøi, D., and Jensen, C. B. (1995). Head-related transfer functions of human subjects. Journal of the Audio Engineering Society, 43(5):300–321. (Cited on page 12)
[190] Monaghan, J. J., Krumbholz, K., and Seeber, B. U. (2013). Factors affecting the use of envelope interaural time differences in reverberation. The Journal of the Acoustical Society of America, 133(4):2288–2300. (Cited on page 33)
[191] Moore, B. C. J. and Tan, C.-T. (2004). Development and validation of a method for predicting the perceived naturalness of sounds subjected to spectral distortion. Journal of the Audio Engineering Society, 52(9):900–914. (Cited on page 41)
[192] Moore, T. M. and Picou, E. M. (2018). A potential bias in subjective ratings of mental effort. Journal of Speech, Language, and Hearing Research. (Cited on pages 50, 51 and 118)
[193] Mueller, M. F., Kegel, A., Schimmel, S. M., Dillier, N., and Hofbauer, M. (2012). Localization of virtual sound sources with bilateral hearing aids in realistic acoustical scenes. The Journal of the Acoustical Society of America, 131(6):4732–4742. (Cited on page 16)
[194] Müller, S. and Massarani, P. (2001). Transfer-function measurement with sweeps. Journal of the Audio Engineering Society, 49:443–471. (Cited on pages 63 and 140)
[195] Murta, B. (2019). Plataforma para ensaios de percepção sonora com fontes distribuídas aplicável a dispositivos auditivos: perSONA (in Portuguese). PhD thesis, Federal University of Santa Catarina. (Cited on pages 1 and 57)
[196] Murta, B., Chiea, R., Mourão, G., Pinheiro, M. M., Cordioli, J., Paul, S., and Costa, M. (2019). CCi-MOBILE: Development of software-based tools for speech perception assessment and training with the hearing-impaired Brazilian population. In Conference on Implantable Auditory Prostheses (CIAP), Lake Tahoe, California, US. (Cited on page 18)
[197] Møller, H. (1992). Fundamentals of binaural technology. Applied Acoustics, 36(3-4):171–218. (Cited on pages 15 and 73)
[198] Nachbar, C., Zotter, F., Deleflie, E., and Sontacchi, A. (2011). AmbiX – a suggested ambisonics format. (Cited on page 103)
[199] Narbutt, M., Allen, A., Skoglund, J., Chinen, M., and Hines, A. (2018). AMBIQUAL – a full reference objective quality metric for ambisonic spatial audio. In 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. (Cited on page 27)
[200] Naugolnykh, K. A., Ostrovsky, L. A., Sapozhnikov, O. A., and Hamilton, M. F. (2000). Nonlinear wave processes in acoustics. (Cited on page 9)
[201] Naylor, G. M. (1993). Odeon – another hybrid room acoustical model. Applied Acoustics, 38(2-4):131–143. (Cited on page 68)
[202] Neuhoff, J. (2021). Ecological Psychoacoustics. Brill. (Cited on page 190)
[203] Neuman, A. C., Wroblewski, M., Hajicek, J., and Rubinstein, A. (2010). Combined effects of noise and reverberation on speech recognition performance of normal-hearing children and adults. Ear and Hearing. (Cited on pages 99 and 118)
[204] Nicola, P. and Chiara, V. (2019). Impact of background noise fluctuation and reverberation on response time in a speech reception task. Journal of Speech, Language, and Hearing Research, 62(11):4179–4195. (Cited on pages 50, 99 and 118)
[205] Nielsen, J. and Dau, T. (2011). The Danish hearing in noise test. International Journal of Audiology, 50:202–208. (Cited on pages 101 and 102)
[206] Nocke, C. and Mellert, V. (2002). Brief review on in situ measurement techniques of impedance or absorption. In Forum Acusticum, Sevilla. (Cited on page 21)
[207] Novo, P. (2005). Auditory virtual environments. In Blauert, J., editor, Communication Acoustics, pages 277–297. Springer Berlin Heidelberg, Berlin, Heidelberg. (Cited on page 57)
[208] Obleser, J., Wöstmann, M., Hellbernd, N., Wilsch, A., and Maess, B. (2012). Adverse listening conditions and memory load drive a common alpha oscillatory network. Journal of Neuroscience, 32(36):12376–12383. (Cited on page 111)
[209] Ohlenforst, B., Wendt, D., Kramer, S. E., Naylor, G., Zekveld, A. A., and Lunner, T. (2018). Impact of SNR, masker type and noise reduction processing on sentence recognition performance and listening effort as indicated by the pupil dilation response. Hearing Research. (Cited on pages 50 and 101)
[210] Ohlenforst, B., Zekveld, A. A., Jansma, E. P., Wang, Y., Naylor, G., Lorens, A., Lunner, T., and Kramer, S. E. (2017a). Effects of hearing impairment and hearing aid amplification on listening effort: A systematic review. Ear and Hearing, 38(3):267–281. (Cited on page 98)
[211] Ohlenforst, B., Zekveld, A. A., Lunner, T., Wendt, D., Naylor, G., Wang, Y., Versfeld, N. J., and Kramer, S. E. (2017b). Impact of stimulus-related factors and hearing impairment on listening effort as indicated by pupil dilation. Hearing Research, 351:68–79. (Cited on page 50)
[212] Oreinos, C. and Buchholz, J. (2014). Validation of realistic acoustic environments for listening tests using directional hearing aids. In 2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC), pages 188–192. (Cited on pages 41 and 120)
[213] Oreinos, C. and Buchholz, J. M. (2015). Objective analysis of ambisonics for hearing aid applications: Effect of listener's head, room reverberation, and directional microphones. The Journal of the Acoustical Society of America. (Cited on pages 18, 41, 53, 163 and 191)
[214] Palacino, J., Nicol, R., Emerit, M., and Gros, L. (2012). Perceptual assessment of binaural decoding of first-order ambisonics. In Acoustics 2012. (Cited on page 21)
[215] Parsehian, G., Gandemer, L., Bourdin, C., and Kronland-Martinet, R. (2015). Design and perceptual evaluation of a fully immersive three-dimensional sound spatialization system. In 3rd International Conference on Spatial Audio (ICSA 2015), Graz, Austria. (Cited on page 42)
[216] Paul, S. (2014). A fisiologia da audição como base para fenômenos auditivos (in Portuguese). In Proceedings of the 12th AES Brazil Conference, São Paulo, SP, 13–15 May 2014. (Cited on page 9)
[217] Pausch, F., Aspöck, L., Vorländer, M., and Fels, J. (2018). An extended binaural real-time auralization system with an interface to research hearing aids for experiments on subjects with hearing loss. Trends in Hearing. (Cited on pages 16, 44, 45, 120 and 121)
[218] Pausch, F., Behler, G., and Fels, J. (2020). Scalar - a surrounding spherical cap loudspeaker array for flexible generation and evaluation of virtual acoustic environments. Acta Acust., 4(5):19. (Cited on pages 1 and 57)

[219] Pausch, F. and Fels, J. (2019). Mobilab - a mobile laboratory for on-site listening experiments in virtual acoustic environments. bioRxiv. (Cited on pages 1 and 57)

[220] Pausch, F. and Fels, J. (2020). Localization performance in a binaural real-time auralization system extended to research hearing aids. Trends in Hearing, 24:1–18. (Cited on pages 1, 42 and 57)

[221] Pelzer, S., Masiero, B., and Vorländer, M. (2014). 3D Reproduction of Room Auralizations by Combining Intensity Panning, Crosstalk Cancellation and Ambisonics. Proceedings of the EAA Joint Symposium on Auralization and Ambisonics. (Cited on pages 44, 45, 86 and 127)

[222] Peng, Z. E. and Litovsky, R. Y. (2021). The role of interaural differences, head shadow, and binaural redundancy in binaural intelligibility benefits among school-aged children. Trends in Hearing, 25. (Cited on page 77)

[223] Petersen, E. B., Wöstmann, M., Obleser, J., Stenfelt, S., and Lunner, T. (2015). Hearing loss impacts neural alpha oscillations under adverse listening conditions. Frontiers in Psychology. (Cited on page 50)

[224] Pichora-Fuller, M. K., Kramer, S. E., Eckert, M. A., Edwards, B., Hornsby, B. W., Humes, L. E., Lemke, U., Lunner, T., Matthen, M., Mackersie, C. L., Naylor, G., Phillips, N. A., Richter, M., Rudner, M., Sommers, M. S., Tremblay, K. L., and Wingfield, A. (2016). Hearing impairment and cognitive energy: The framework for understanding effortful listening (FUEL). In Ear and Hearing. (Cited on pages 50, 51, 52, 57 and 118)

[225] Picou, E. M., Gordon, J., and Ricketts, T. A. (2016). The effects of noise and reverberation on listening effort in adults with normal hearing. Ear and Hearing. (Cited on pages 50 and 99)

[226] Picou, E. M., Moore, T. M., and Ricketts, T. A. (2017). The effects of directional processing on objective and subjective listening effort. Journal of Speech, Language, and Hearing Research. (Cited on pages 1 and 51)

[227] Picou, E. M., Ricketts, T., and Hornsby, B. (2013). How hearing aids, background noise, and visual cues influence objective listening effort. Ear and Hearing, 34:e52–e64. (Cited on pages 50 and 56)

[228]
Picou, E. M. and Ricketts, T. A. (2014). Increasing motivation changes subjective reports of listening effort and choice of coping strategy. International Journal of Audiology, 53(6):418–426. (Cited on page 50)

[229] Picou, E. M. and Ricketts, T. A. (2018). The relationship between speech recognition, behavioural listening effort, and subjective ratings. International Journal of Audiology. (Cited on pages 51 and 118)

[230] Pielage, H., Zekveld, A. A., Saunders, G. H., Versfeld, N. J., Lunner, T., and Kramer, S. E. (2021). The Presence of Another Individual Influences Listening Effort, But Not Performance. Ear & Hearing. (Cited on pages 40, 57, 82 and 190)

[231] Pieren, R. (2018). Auralization of Environmental Acoustical Sceneries: Synthesis of Road Traffic, Railway and Wind Turbine Noise. PhD thesis, Delft University of Technology. (Cited on page 20)

[232] Pieren, R., Heutschi, K., Wunderli, J. M., Snellen, M., and Simons, D. G. (2017). Auralization of railway noise: Emission synthesis of rolling and impact noise. Applied Acoustics, 127:34–45. (Cited on page 20)

[233] Pinheiro, J. C. and Bates, D. M. (2000). Linear mixed-effects models: basic concepts and examples. Mixed-effects Models in S and S-Plus, pages 3–56. (Cited on page 113)

[234] Plain, B., Pielage, H., Richter, M., Bhuiyan, T., Lunner, T., Kramer, S., and Zekveld, A. (2021). Social observation increases the cardiovascular response of hearing-impaired listeners during a speech reception task. Hearing Research, page 108334. (Cited on pages 57 and 190)

[235] Plinge, A., Schlecht, S. J., Thiergart, O., Robotham, T., Rummukainen, O., and Habets, E. A. P. (2018). Six-degrees-of-freedom binaural audio reproduction of first-order ambisonics with distance information. Journal of the Audio Engineering Society. (Cited on page 27)

[236] Poletti, M. A. (2005). Three-dimensional surround sound systems based on spherical harmonics. Journal of the Audio Engineering Society, 53(11):1004–1025. (Cited on page 31)

[237] Politis, A. (2016). Microphone Array Processing for Parametric Spatial Audio Techniques. Doctoral thesis, School of Electrical Engineering. (Cited on pages 130, 131 and 181)

[238] Politis, A., McCormack, L., and Pulkki, V. (2017). Enhancement of ambisonic binaural reproduction using directional audio coding with optimal adaptive mixing. In 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 379–383. (Cited on page 27)

[239] Pollow, M.
(2015). Directivity Patterns for Room Acoustical Measurements and Simulations. Aachener Beiträge zur Technischen Akustik. Logos Verlag Berlin GmbH. (Cited on pages 28 and 29)

[240] Portela, M. S. (2008). Caracterização de fontes sonoras e aplicação na auralização de ambientes. Master's thesis, Universidade Federal de Santa Catarina. (Cited on page 13)

[241] Pulkki, V. (1997). Virtual sound source positioning using vector base amplitude panning. Journal of the Audio Engineering Society, 45(6). (Cited on pages 23, 24, 26, 58, 93, 142, 179 and 190)

[242] Pulkki, V. and Karjalainen, M. (2015). Communication Acoustics: An Introduction to Speech, Audio and Psychoacoustics. Wiley. (Cited on pages 10, 17, 70, 73 and 143)

[243] Pulkki, V., Politis, A., Laitinen, M.-V., Vilkamo, J., and Ahonen, J. (2017). First-order directional audio coding (DirAC). In Parametric Time-Frequency Domain Spatial Audio, chapter 5, pages 89–140. John Wiley & Sons, Ltd. (Cited on pages 44, 45 and 127)

[244] Purdy, M. (1991). Listening and community: The role of listening in community formation. International Journal of Listening, 5(1):51–67. (Cited on page 8)

[245] Queiroz, M., Iazzetta, F., Kon, F., Gomes, M. H. A., Figueiredo, F. L., Masiero, B. S., Dias, L., Torres, M. H. C., and Thomaz, L. F. (2008). AcMus: an open, integrated platform for room acoustics research. J. Braz. Comput. Soc., 14(3):87–103. (Cited on page 36)

[246] Rayleigh, L. (1907). XII. On our perception of sound direction. (Cited on page 10)

[247] Reichardt, W., Alim, O. A., and Schmidt, W. (1975). Definition and basis of making an objective evaluation to distinguish between useful and useless clarity defining musical performances. Acta Acustica united with Acustica, 32(3):126–137. (Cited on page 36)

[248] Rennies, J., Brand, T., and Kollmeier, B. (2011). Prediction of the influence of reverberation on binaural speech intelligibility in noise and in quiet. The Journal of the Acoustical Society of America, 130(5):2999–3012. (Cited on page 40)

[249] Rennies, J., Schepker, H., Holube, I., and Kollmeier, B. (2014). Listening effort and speech intelligibility in listening situations affected by noise and reverberation. The Journal of the Acoustical Society of America. (Cited on page 50)

[250] Roffler, S. K. and
Butler, R. A. (1968). Factors that influence the localization of sound in the vertical plane. The Journal of the Acoustical Society of America, 43(6):1255–1259. (Cited on pages 10 and 33)

[251] Roginska, A. (2017). Binaural audio through headphones. In Immersive Sound, pages 88–123. Routledge. (Cited on pages 40 and 53)

[252] Romanov, M., Berghold, P., Frank, M., Rudrich, D., Zaunschirm, M., and Zotter, F. (2017). Implementation and evaluation of a low-cost head-tracker for binaural synthesis. Journal of the Audio Engineering Society. (Cited on page 23)

[253] Rose, J., Nelson, P., Rafaely, B., and Takeuchi, T. (2002). Sweet spot size of virtual acoustic imaging systems at asymmetric listener locations. The Journal of the Acoustical Society of America, 112(5):1992–2002. (Cited on pages 31 and 121)

[254] Rossing, T. D. (2007). Springer Handbook of Acoustics. Springer-Verlag Berlin Heidelberg, Stanford, CA, 2nd edition. (Cited on pages 16, 19, 33, 36, 38 and 98)

[255] Rudenko, O. and Soluian, S. (1975). The theoretical principles of nonlinear acoustics. Moscow Izdatel Nauka. (Cited on page 9)

[256] Rumsey, F. (2013). Spatial Audio. Focal Press, Burlington, MA, 2nd edition. (Cited on pages 30 and 98)

[257] Ruotolo, F., Maffei, L., Di Gabriele, M., Iachini, T., Masullo, M., Ruggiero, G., and Senese, V. P. (2013). Immersive virtual reality and environmental noise assessment: An innovative audio-visual approach. Environmental Impact Assessment Review, 41:10–20. (Cited on page 16)

[258] Sabine, W. (1922). Collected Papers on Acoustics. Harvard University Press. (Cited on page 34)

[259] Savioja, L., Huopaniemi, J., Lokki, T., and Väänänen, R. (1999). Creating interactive virtual acoustic environments. Journal of the Audio Engineering Society, 47:675–705. (Cited on pages 1 and 57)

[260] Schepker, H., Haeder, K., Rennies, J., and Holube, I. (2016). Perceived listening effort and speech intelligibility in reverberation and noise for hearing-impaired listeners. International Journal of Audiology. (Cited on pages 1 and 50)

[261] Schröder, D. (2011). Physically Based Real-Time Auralization of Interactive Virtual Environments. Aachener Beiträge zur Technischen Akustik. Logos Verlag Berlin. (Cited on page 28)

[262]
Schroeder, M. and Atal, B. (1963). Computer simulation of sound transmission in rooms. Proceedings of the IEEE, 51(3):536–537. (Cited on page 22)

[263] Schroeder, M., Atal, B., and Bird, C. (1962). Digital computers in room acoustics. Proc. 4th ICA, Copenhagen, M, 21. (Cited on page 19)

[264] Schroeder, M. R. (1965). New method of measuring reverberation time. The Journal of the Acoustical Society of America, 37(3):409–412. (Cited on page 144)

[265] Schroeder, M. R. (1979). Integrated-impulse method measuring sound decay without using impulses. The Journal of the Acoustical Society of America, 66(2):497–500. (Cited on page 144)

[266] Schröder, D., Pohl, A., Drechsler, S., Svensson, U. P., Vorländer, M., and Stephenson, U. M. (2013). OpenMat - management of acoustic material (meta-)properties using an open source database format. In Proceedings of the AIA-DAGA 2013. (Cited on page 21)

[267] Schröder, D., Wefers, F., Pelzer, S., Rausch, D., Vorlaender, M., and Kuhlen, T. (2010). Virtual reality system at RWTH Aachen University. In Proceedings of the International Symposium on Room Acoustics (ISRA). (Cited on page 56)

[268] Seeber, B. U., Baumann, U., and Fastl, H. (2004). Localization ability with bimodal hearing aids and bilateral cochlear implants. The Journal of the Acoustical Society of America, 116(3):1698–1709. (Cited on page 40)

[269] Seeber, B. U., Kerber, S., and Hafter, E. R. (2010). A system to simulate and reproduce audio-visual environments for spatial hearing research. Hearing Research, 260(1):1–10. (Cited on page 56)

[270] Seikel, J., King, D., and Drumright, D. (2015). Anatomy & Physiology for Speech, Language, and Hearing. Cengage Learning. (Cited on page 14)

[271] Sette, W. J. (1933). A new reverberation time formula. The Journal of the Acoustical Society of America, 4(3):193–210. (Cited on page 34)

[272] Shavit-Cohen, K. and Zion Golumbic, E. (2019). The dynamics of attention shifts among concurrent speech in a naturalistic multi-speaker virtual environment. Frontiers in Human Neuroscience, 13:386. (Cited on pages 1 and 57)

[273] Shojaei, E., Ashayeri, H., Jafari, Z., Dast, M., and Kamali, K. (2016). Effect of signal to noise ratio on the speech perception ability of older adults. Medical Journal of the Islamic Republic of Iran, 30:342. (Cited on pages 1 and 55)

[274] Silzle, A., Kosmidis, D., Felix Greco, G.,
Beer, D., and Betz, L. (2016). The influence of microphone directivity on the level calibration and equalization of 3D loudspeaker setups. In 29th Tonmeistertagung - VDT International Convention 2016. (Cited on page 21)

[275] Simon, L. S. R., Dillier, N., and Wüthrich, H. (2021). Comparison of 3D audio reproduction methods using hearing devices. Journal of the Audio Engineering Society, 68(12):899–909. (Cited on pages 21, 46, 93, 94 and 121)

[276] Simon, L. S. R., Wuethrich, H., and Dillier, N. (2017). Comparison of higher-order ambisonics, vector- and distance-based amplitude panning using a hearing device beamformer. In Proceedings of the 4th International Conference on Spatial Audio, Graz, Austria. (Cited on pages 20, 21, 23, 117, 121, 163 and 191)

[277] Simón Gálvez, M., Menzies, D., Fazi, F., de Campos, T., and Hilton, A. (2015). Listener tracking stereo for object based audio reproduction. In Tecniacustica 2016 (Valencia) - European Symposium in Virtual Acoustics and Ambisonics. (Cited on page 27)

[278] Skudrzyk, E. (1971). The Foundations of Acoustics: Basic Mathematics and Basic Acoustics. Springer-Verlag. (Cited on page 229)

[279] Solvang, A. (2008). Spectral impairment of two-dimensional higher order ambisonics. J. Audio Eng. Soc., 56(4):267–279. (Cited on page 94)

[280] Spandöck, F. (1934). Akustische Modellversuche. Annalen der Physik, 412(4):345–360. (Cited on page 19)

[281] Spors, S., Teutsch, H., Kuntz, A., and Rabenstein, R. (2004). Sound field synthesis. In Huang, Y. and Benesty, J., editors, Audio Signal Processing for Next-Generation Multimedia Communication Systems, pages 323–344. Springer US, Boston, MA. (Cited on page 31)

[282] Spors, S., Wierstorf, H., Raake, A., Melchior, F., Frank, M., and Zotter, F. (2013). Spatial sound with loudspeakers and its perception: A review of the current state. (Cited on pages 21, 23, 27, 40, 42 and 53)

[283] Stitt, P., Bertet, S., and Van Walstijn, M. (2013). Perceptual investigation of image placement with ambisonics for non-centred listeners. In Proc. of the 16th Int. Conference on Digital Audio Effects (DAFx-13), Maynooth, Ireland. (Cited on pages 21, 46 and 49)

[284] Strauss, H. (1998). Implementing Doppler shifts for virtual auditory environments. Journal of the Audio Engineering Society. (Cited on page 20)
[285] Strumillo, P. (2011). Advances in Sound Localization. InTech. (Cited on pages 9 and 10)

[286] Sudarsono, A. S., Lam, Y. W., and Davies, W. J. (2016). The effect of sound level on perception of reproduced soundscapes. Applied Acoustics, 110:53–60. (Cited on page 42)

[287] Søndergaard, P. and Majdak, P. (2013). The auditory modeling toolbox. In Blauert, J., editor, The Technology of Binaural Listening, pages 33–56. Springer, Berlin, Heidelberg. (Cited on page 138)

[288] Tenenbaum, R. A., Camilo, T. S., Torres, J. C. B., and Gerges, S. N. (2007). Hybrid method for numerical simulation of room acoustics with auralization: part 1 - theoretical and numerical aspects. Journal of the Brazilian Society of Mechanical Sciences and Engineering, 29:211–221. (Cited on page 68)

[289] Tremblay, P., Brisson, V., and Deschamps, I. (2020). Brain aging and speech perception: Effects of background noise and talker variability. NeuroImage, 227:117675. (Cited on pages 1 and 55)

[290] Treviño, J., Okamoto, T., Iwaya, Y., and Suzuki, Y. (2011). Evaluation of a new ambisonic decoder for irregular loudspeaker arrays using interaural cues. In Ambisonics Symposium. (Cited on page 94)

[291] Tu, W., Hu, R., Wang, H., and Chen, W. (2010). Measurement and analysis of just noticeable difference of interaural level difference cue. 2010 International Conference on Multimedia Technology, pages 1–3. (Cited on page 148)

[292] Van Wanrooij, M. M. and Van Opstal, A. J. (2004). Contribution of head shadow and pinna cues to chronic monaural sound localization. Journal of Neuroscience, 24(17):4163–4171. (Cited on page 11)

[293] Vorländer, M. (2007). Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality. RWTHedition. Springer Berlin Heidelberg. (Cited on pages 2, 14, 15, 18, 19, 20, 21, 22, 33, 40, 42, 53 and 121)

[294] Vorländer, M. (2008). Virtual Acoustics: Opportunities and limits of spatial sound reproduction for audiology. Haus des Hörens, Oldenburg. (Cited on page 56)

[295] Vorländer, M. (2014). Virtual acoustics. Archives of Acoustics, 39(3):307–318. (Cited on page 40)

[296] Wallach, H. (1938). On sound localization. The Journal of the Acoustical Society of America, 10(1):83–83. (Cited on page 10)

[297] Wang, D. and Brown, G. J. (2006). Binaural sound localization. In Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, pages 147–185. Wiley. (Cited on page 146)

[298] Wanner, L., Blat, J., Dasiopoulou,
S., Domínguez, M., Llorach, G., Mille, S., Sukno, F., Kamateri, E., Vrochidis, S., Kompatsiaris, I., André, E., Lingenfelser, F., Mehlmann, G., Stam, A., Stellingwerff, L., Vieru, B., Lamel, L., Minker, W., Pragst, L., and Ultes, S. (2016). Towards a multimedia knowledge-based agent with social competence and human interaction capabilities. In Proceedings of the 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction, MARMI '16, pages 21–26, New York, NY, USA. Association for Computing Machinery. (Cited on pages 1 and 57)

[299] Ward, D. B. and Abhayapala, T. D. (2001). Reproduction of a plane-wave sound field using an array of loudspeakers. IEEE Transactions on Speech and Audio Processing, 9(6):697–707. (Cited on pages 31, 77 and 86)

[300] Wendt, D., Dau, T., and Hjortkjær, J. (2016). Impact of background noise and sentence complexity on processing demands during sentence comprehension. Frontiers in Psychology. (Cited on page 51)

[301] Wendt, D., Hietkamp, R. K., and Lunner, T. (2017). Impact of noise and noise reduction on processing effort: A pupillometry study. Ear and Hearing. (Cited on pages 50 and 101)

[302] Wendt, D., Koelewijn, T., Książek, P., Kramer, S. E., and Lunner, T. (2018). Toward a more comprehensive understanding of the impact of masker type and signal-to-noise ratio on the pupillary response while performing a speech-in-noise test. Hearing Research, pages 1–12. (Cited on pages 50, 101 and 102)

[303] Westermann, A. and Buchholz, J. M. (2017). The effect of nearby maskers on speech intelligibility in reverberant, multi-talker environments. The Journal of the Acoustical Society of America, 141(3):2214–2223. (Cited on pages 42 and 99)

[304] Whitmer, W. M. and Akeroyd, M. A. (2013). The sensitivity of hearing-impaired adults to acoustic attributes in simulated rooms. Proceedings of Meetings on Acoustics, 19(1):015109. (Cited on pages 1, 18 and 50)

[305] Whitmer, W. M., Seeber, B. U., and Akeroyd, M. A. (2012). Apparent auditory source width insensitivity in older hearing-impaired individuals. The Journal of the Acoustical Society of America, 132(1):369–379. (Cited on pages 16, 18 and 40)

[306] Wightman, F. L. and Kistler, D. J. (1992). The dominant role of low-frequency interaural time differences in sound localization. The Journal of the Acoustical Society of America, 91(3):1648–1661. (Cited on page 10)

[307] Wightman, F. L. and Kistler, D. J. (1997). Monaural sound
localization revisited. The Journal of the Acoustical Society of America, 101(2):1050–1063. (Cited on page 11)

[308] Wilcox, R. (2004). Inferences based on a skipped correlation coefficient. Journal of Applied Statistics, 31(2):131–143. (Cited on page 116)

[309] Williams, G. (1999). Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic Press. (Cited on pages 28 and 229)

[310] Wisniewski, M. G., Thompson, E. R., and Iyer, N. (2017). Theta- and alpha-power enhancements in the electroencephalogram as an auditory delayed match-to-sample task becomes impossibly difficult. Psychophysiology, 54(12):1916–1928. (Cited on page 111)

[311] Wisniewski, M. G., Thompson, E. R., Iyer, N., Estepp, J. R., Goder-Reiser, M. N., and Sullivan, S. C. (2015). Frontal midline θ power as an index of listening effort. Neuroreport, 26(2):94–99. (Cited on page 111)

[312] Wong, G. S. K. (1986). Speed of sound in standard air. The Journal of the Acoustical Society of America, 79(5):1359–1366. (Cited on page 15)

[313] Wöstmann, M., Lim, S.-J., and Obleser, J. (2017). The Human Neural Alpha Response to Speech is a Proxy of Attentional Control. Cerebral Cortex, 27(6):3307–3317. (Cited on page 111)

[314] Xie, B. (2013). Head-Related Transfer Function and Virtual Auditory Display. J. Ross Publishing. (Cited on pages 22 and 70)

[315] Yost, W. (2013). Fundamentals of Hearing: An Introduction. Brill. (Cited on page 8)

[316] Zapata Rodriguez, V., Jeong, C.-H., Hoffmann, I., Cho, W.-H., Beldam, M.-B., and Harte, J. (2019). Acoustic conditions of clinic rooms for sound field audiometry. In Proceedings of the 23rd International Congress on Acoustics, pages 4654–4659. Deutsche Gesellschaft für Akustik. (Cited on pages 122 and 139)

[317] Zekveld, A., Kramer, S., and Festen, J. (2011). Cognitive load during speech perception in noise: The influence of age, hearing loss, and cognition on the pupil response. Ear and Hearing, 32:498–510. (Cited on pages 1 and 55)

[318] Zekveld, A. A. and Kramer, S. E. (2014). Cognitive
processing load across a wide range of listening conditions: Insights from pupillometry. Psychophysiology. (Cited on pages 50 and 112)

[319] Zekveld, A. A., Kramer, S. E., and Festen, J. M. (2010). Pupil response as an indication of effortful listening: The influence of sentence intelligibility. Ear and Hearing. (Cited on pages 50 and 118)

[320] Zhang, W., Samarasinghe, P., Chen, H., and Abhayapala, T. (2017). Surround by Sound: A Review of Spatial Audio Recording and Reproduction. Applied Sciences, 7(5):532. (Cited on pages 19, 20, 21, 27 and 40)

[321] Zobel, B. H., Wagner, A., Sanders, L. D., and Başkent, D. (2019). Spatial release from informational masking declines with age: Evidence from a detection task in a virtual separation paradigm. The Journal of the Acoustical Society of America, 146(1):548–566. (Cited on pages 16 and 40)

[322] Hládek, Ľ., Ewert, S. D., and Seeber, B. U. (2021). Communication conditions in virtual acoustic scenes in an underground station. (Cited on page 42)

[323] Şaher, K., Rindel, J. H., Nijs, L., and Van Der Voorden, M. (2005). Impacts of reverberation time, absorption location and background noise on listening conditions in multi source environment. In Forum Acusticum Budapest 2005: 4th European Congress on Acoustics. (Cited on page 50)

Appendix A. ITDs Ambisonics

Figure A.1 depicts ITDs for measurements with a listener (HATS manikin) in the center with Ambisonics (black line), in nine off-center position combinations accompanied by a second listener (KEMAR), and alone in those three off-center positions.

Figure A.1: ITD as a function of source angle in the Ambisonics virtualized setup. Top left: HATS displacement = 25 cm; top right: HATS displacement = 50 cm; bottom left: HATS displacement = 75 cm; bottom right: HATS displacement matching KEMAR displacement.

Appendix B. Delta ILD Ambisonics
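The ΔILD figures below, like the ITDs of Appendix A, are derived from pairs of ear signals recorded at the manikins. As an illustration of the basic estimators only, not the measurement pipeline used in this work, a broadband ITD can be read off the lag of the interaural cross-correlation and a broadband ILD off the RMS level difference; the function and signal names here are hypothetical:

```python
import numpy as np

def itd_ild(left, right, fs):
    """Estimate broadband ITD (seconds) and ILD (dB) from ear signals.

    ITD: lag of the interaural cross-correlation peak
         (negative lag = the left channel leads).
    ILD: RMS level difference, left relative to right.
    """
    n = len(left)
    xcorr = np.correlate(left, right, mode="full")   # lags -(n-1) .. (n-1)
    lag = int(np.argmax(np.abs(xcorr))) - (n - 1)
    itd = lag / fs
    rms = lambda x: np.sqrt(np.mean(np.square(x)))
    ild = 20 * np.log10(rms(left) / rms(right))
    return itd, ild

# Toy check: right ear delayed by 5 samples and attenuated by 6 dB.
fs = 48000
sig = np.random.default_rng(0).standard_normal(1024)
left = sig
right = np.roll(sig, 5) * 10 ** (-6 / 20)
itd, ild = itd_ild(left, right, fs)
print(round(itd * fs), round(ild, 1))  # -> -5 6.0
```

In practice the ILDs in Figures B.1 to B.3 are frequency dependent, so the level difference is evaluated per band rather than broadband as in this sketch.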
Figures B.1, B.2, and B.3 present the differences in ILD between center and off-center listener positions, utilizing 24 loudspeakers to render Ambisonics with a second listener present inside the loudspeaker ring. In the figures, the number following H indicates the position of the main listener, while the numbers after K indicate the position of the second listener.

Figure B.1: Differences in the ILD between the centered setup and off-center setups: HATS at 25 cm to the right with: KEMAR at 25 cm to the left (top); KEMAR at 50 cm to the left (middle); KEMAR at 75 cm to the left (bottom).

Figure B.2: Differences in the ILD between the centered setup and off-center setups: HATS at 50 cm to the right with: KEMAR at 25 cm to the left (top); KEMAR at 50 cm to the left (middle); KEMAR at 75 cm to the left (bottom).

Figure B.3: Differences in the ILD between the centered setup and off-center setups: HATS at 75 cm to the right with: KEMAR at 25 cm to the left (top); KEMAR at 50 cm to the left
(middle); KEMAR at 75 cm to the left (bottom).

Appendix C. Wave Equation and Spherical Harmonic Representation

Spherical harmonics (SH) represent the spatial variations of an orthogonal set of solutions of the Laplace equation (an orthonormal basis) when the solution is expressed in spherical coordinates, thus giving a representation of a space- and frequency-dependent signal as a weighted sum over spherical basis functions.

C.1 Wave Equation in Spherical Coordinates

Expressing the wave equation in spherical coordinates (r, θ, φ) [36], we have

\[
\frac{\partial^2 p}{\partial r^2} + \frac{2}{r}\frac{\partial p}{\partial r} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial p}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 p}{\partial\phi^2} - \frac{1}{c_0^2}\frac{\partial^2 p}{\partial t^2} = 0. \quad (C.1)
\]

C.2 Separation of the Variables

The differential-equation solution tool called separation of variables can be applied to Equation C.1, the solution being formulated as the product of three space-dependent factors and a time-dependent factor:

\[
p(r, \theta, \phi, t) = R(r)\,\Theta(\theta)\,\Phi(\phi)\,T(t). \quad (C.2)
\]
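Equation C.1 can be sanity-checked symbolically: an outgoing spherical wave e^{j(ωt - kr)}/r has no angular dependence, so only the radial and time terms of C.1 survive, and it must satisfy the equation whenever ω = c₀k. A minimal sketch with SymPy (symbol names are illustrative):

```python
import sympy as sp

r, t = sp.symbols('r t', positive=True)
c, k, w = sp.symbols('c k omega', positive=True)

# Outgoing spherical wave e^{j(wt - kr)}/r: no angular dependence,
# so only the radial and time terms of Eq. (C.1) remain.
p = sp.exp(sp.I * (w * t - k * r)) / r

residual = (sp.diff(p, r, 2) + (2 / r) * sp.diff(p, r)
            - sp.diff(p, t, 2) / c**2)

# The residual vanishes on the dispersion relation w = c k.
print(sp.simplify(residual.subs(w, c * k)))  # -> 0
```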
With the separation of the variables, according to Skudrzyk [278], there are four homogeneous differential equations:

\[
\frac{d^2\Phi}{d\phi^2} + m^2\,\Phi = 0, \quad (C.3a)
\]
\[
\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\,\frac{d\Theta}{d\theta}\right) + \left(n(n+1) - \frac{m^2}{\sin^2\theta}\right)\Theta = 0, \quad (C.3b)
\]
\[
\frac{1}{r^2}\frac{d}{dr}\left(r^2\,\frac{dR}{dr}\right) + \left(k^2 - \frac{n(n+1)}{r^2}\right)R = 0, \quad (C.3c)
\]
\[
\frac{1}{c^2}\frac{d^2T}{dt^2} + k^2\,T = 0. \quad (C.3d)
\]

With m and n integers, the general solutions to these equations are

\[
\Phi(\phi) = \Phi_1 e^{jm\phi} + \Phi_2 e^{-jm\phi}, \quad (C.4a)
\]
\[
\Theta(\theta) = \Theta_1 P_n^m(\cos\theta) + \Theta_2 Q_n^m(\cos\theta), \quad (C.4b)
\]
\[
R(r) = R_1 h_n^{(1)}(kr) + R_2 h_n^{(2)}(kr), \quad (C.4c)
\]
\[
T(t) = T_1 e^{j\omega t} + T_2 e^{-j\omega t}, \quad (C.4d)
\]

where h_n^{(1)}(x) and h_n^{(2)}(x) are the spherical Hankel functions of the first and second kind, which represent convergent and divergent waves depending on the sign convention agreed for time, and P_n^m(x) and Q_n^m(x) are the associated Legendre functions of the first and second kind. Because of the singularities of the associated Legendre functions of the second kind at the poles θ = 0 and θ = π, the term Θ₂ is set to zero, and since for simplification either the positive or the negative m can be used, the term Φ₂ is also set to zero. According to Williams [309], for there to be no singularities at the poles of the associated Legendre functions, the index n must be an integer. Still, considering causal systems, the term T₂ in C.4d equals zero given the convention used.

The associated Legendre functions of the first
type defined for positive degrees m are

\[
P_n^m(x) = (-1)^m \left(1 - x^2\right)^{m/2} \frac{d^m}{dx^m} P_n(x). \quad (C.5)
\]

Meanwhile, the functions for negative degrees -m are given by

\[
P_n^{-m}(x) = (-1)^m \frac{(n-m)!}{(n+m)!}\, P_n^m(x), \quad (C.6)
\]

P_n being the Legendre polynomial given by

\[
P_n(x) = \frac{1}{2^n\, n!} \frac{d^n}{dx^n}\left(x^2 - 1\right)^n. \quad (C.7)
\]

C.3 Spherical Harmonics

Equations C.4a and C.4b admit periodic solutions in the angular coordinates; combined, they are called spherical harmonics of order n and degree m, defined by

\[
Y_n^m(\theta, \phi) = \sqrt{\frac{2n+1}{4\pi}\,\frac{(n-m)!}{(n+m)!}}\; P_n^m(\cos\theta)\, e^{jm\phi}. \quad (C.8)
\]

The negative-degree SH functions are obtained through the relation

\[
Y_n^{-m}(\theta, \phi) = (-1)^m \left(Y_n^m(\theta, \phi)\right)^*, \quad (C.9)
\]

where * denotes the complex conjugate; this shows that only the phase changes between the positive and negative degrees of the function. Thus the magnitude is commonly expressed with the radius, and the phase in terms of a color scale, as in Figure 2.9.

Appendix D. Reverberation Time in Acoustic Simulation

The reverberation times for the classroom and the restaurant are presented in Figure D.1.
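These values come from the ODEON simulations; when a reverberation time is estimated from a measured or simulated impulse response instead, the standard route is Schroeder backward integration [264, 265], extrapolating a portion of the energy decay curve to the 60 dB span. A minimal sketch under that assumption (function and signal names are illustrative):

```python
import numpy as np

def rt60_schroeder(rir, fs):
    """RT60 from an impulse response via Schroeder backward integration,
    extrapolating the -5 dB to -35 dB decay range (T30)."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]     # Schroeder integral
    edc = 10 * np.log10(energy / energy[0])      # energy decay curve (dB)
    i5 = int(np.argmax(edc <= -5.0))             # first sample below -5 dB
    i35 = int(np.argmax(edc <= -35.0))
    return 2.0 * (i35 - i5) / fs                 # scale the -30 dB span to -60 dB

# Toy check: synthetic exponential decay with a known RT60 of 0.5 s.
fs = 8000
rt_true = 0.5
t = np.arange(fs) / fs                           # 1 s long
rir = np.exp(-np.log(10**3) * t / rt_true)       # amplitude falls 60 dB per rt_true
print(round(rt60_schroeder(rir, fs), 3))  # -> 0.5
```

For real (noisy) responses, the octave-band values of Figure D.1 would be obtained by band-filtering the impulse response before the backward integration.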
Figure D.1: Reverberation time in octave bands: (a) classroom; (b) restaurant.

Appendix E. Alpha Coefficients

Figures E.1, E.2, and E.3 present the absorption coefficients, as a function of frequency, introduced in the ODEON software to simulate the environments.

Figure E.1: Classroom alpha coefficients (ODEON software).

Figure E.2: Restaurant alpha coefficients (ODEON software).

Figure E.3: Anechoic room alpha coefficients.

Appendix F. Questionnaire

Questionnaire 1 | 1 TS_00    Date: ___ / ___ / 2019

The questionnaire was administered in Danish; in English translation:

How much effort did you make to hear the sentences?
(No effort / Low effort / Moderate effort / High effort / Very high effort)

How many of the words do you think you understood correctly?
(None / Less than half / Half / More than half / All)

How often did you have to give up on understanding the sentence?
(Never / Less than half of the time / Half of the time / More than half of the time / Always)