
1 Introduction

Virtual Reality (VR) has received much attention in recent decades due to its multiple applications in a broad range of fields: conductor training, military strategy, entertainment, blind rehabilitation, neurological testing, surgery training, etc. VR is an environment generated in the computer, which the user can operate and interact with in real time [1]. It also has many applications in the architectural field, such as designer training [2, 3] or citizen participation in the urban design process [4, 5]. A remarkable feature, however, is that the majority of architectural applications of VR rely only on visual aspects and pay little attention to acoustic ones, although other architecture-related fields, such as computer games, combine images with sound. This lack of attention to the acoustic aspects of VR architectural representations is of great interest to those who pursue an immersive architectural experience.

There are several ways to include audio in VR, and there has been extensive research on the development of these formats. The three most common and successful ones are Multi-channel audio, Object-Based audio [6] and Ambisonics [7]. Although each of these technologies presents its own advantages over the others, none of them is capable of capturing the soundscape of a place while allowing the free movement of the listener without, respectively, a large array of speakers, heavy CPU usage, or recordings along multiple paths. Firstly, not everybody has access to a large array of speakers. Secondly, the average personal computer is not powerful enough to support such rendering in a game-like application. Finally, recording along multiple paths would be too expensive and tedious for an architectural VR study.

This paper presents a set of criteria for the generation of a new VR audio format that addresses the shortcomings of the existing formats. This new format is based on the creation of a filter which contains the information of the place. On the basis of these criteria, it then describes the preparation of a set of case studies where the filter can be tested.

The last step in the experimental process will consist of the evaluation of this acoustic filter by architecture students and by generic users. This evaluation will show us the degree of satisfaction of both students and users, in a similar way to earlier studies [8]. The intention of those studies was to understand the city as a digital educational environment and to use it as an experimental frame for the implementation of new technologies both in the academic curriculum and in informal education.

2 Background

Architecture necessarily deals with use, place and technique [9]. These three basic components contain the questions that architecture must answer: how do you live in this architecture, where is this architecture located, and how is this architecture made. The study of these three components leads to the architectural discipline. It seems logical that if a representation of an architectural ambience aims to be immersive, it must represent the three basic components successfully. This means that the representation (and more exactly VR) must explain how you live there, where it is located and how it is made.

The first component (how do you live in this architecture) has always been represented by a wide range of methods commonly known as geometry. These methods consist of showing the facilities of the space to the user: the comfortable dimensions of the rooms, the suitable position of furniture or the convenient disposition of the walls between rooms. VR representations have developed ways to visualize these features by representing the geometry of a place in perspective in a convincing way. It has also been shown that viewing a graph in a virtual reality display is three times as effective as a 2D diagram [10].

The second component (where is this architecture located) is first addressed by the science of place called geography. This science is able to define the physical characteristics of a place and represent them through cartographic drawings. VR representations have developed ways for the visualization of and navigation through these spaces not only on a desktop but also with a head-mounted display [11]. Moreover, architectural models need to include landscape elements such as trees, mountains and other buildings in order to locate the represented architecture in an exact environment.

The third component (how is this architecture made) draws on a long tradition of drawing techniques that are able to explain the nature of the construction: materials, textures, colours, brightness, transparency or blurriness are some of the many characteristics of the built environment. VR representations have also developed ways to visualize these features by representing the qualities of an object under the effect of natural or artificial light.

As we can see, the three components need to be convincing in order to achieve an immersive environment. Nevertheless, we have so far presented architecture as a discipline dealing only with visual data, while use, place and technique also embrace haptic, tactile and acoustic factors [1]. In fact, critiques from within the discipline of architecture itself argue for the importance of the other senses, apart from sight. Juhani Pallasmaa [12], Ted Sheriden and Karen Van Lengen [13], Björn Hellström [14], Steven Holl and Rafael Pizarro [15] are some of the architects who have noted how the discipline of architecture may profit from considering the hidden realm of the auditory and the multisensory [16].

These reasons direct our attention to acoustic representation in VR. As Vorländer says, if the behaviour of an acoustic object or system is shown in a more complex way than numerically, including the creation of acoustic signals in the time or frequency domain, we talk about “simulation” and “auralization” [1].

The implementation of audio information from the surroundings is what is needed for an enhanced immersive experience of the represented architecture. For this purpose, several VR audio formats for auralization have been developed over the last decades: Multi-channel audio, Object-Based audio and Ambisonics. In this part of the article we explain the main features of these audio formats and point out the main strengths and weaknesses of each of them.

2.1 Multi-channel Audio

In the Multi-channel audio representation, the listener is located at the centre of the scene, surrounded by an array of speakers. The unit of information is the loudspeaker: each channel is associated with one loudspeaker. Sound reproduction is achieved by mixing the various channels over the speakers. In Multi-channel audio, the more channels, the greater the spatial sound capabilities. This has been the traditional sound representation for the past 50 years or more. The Stereo, 5.1 and 7.1 formats are horizontal multi-channel representations; 3D is obtained by adding elevated speakers, as in the 11.1 format, where 4 ceiling speakers are added to a 7.1 horizontal speaker layout. One of the main drawbacks of the Multi-channel audio representation is that it is dependent on the loudspeaker set-up, so one needs one mix for each type of set-up, whereas Object-Based and Ambisonics contents are independent of the loudspeaker set-up [17]. Another disadvantage is that a Multi-channel audio system needs an array of speakers and relies on their number: when the number of speakers is limited, the system becomes poor and offers few possibilities.

2.2 Object-Based Audio

In the Object-Based representation, the listener is located at the centre of the scene with headphones, surrounded by a number of virtual sound sources. The unit of information is the virtual sound source. The scene is made of several virtual sound sources together with information about their locations, their directivity patterns and the rendering environment (room size, reverberation parameters…). The 3D audio rendering is made by calculating the combination of all the sources, including the reverberation, at the listener's position. This is a great paradigm for creating content interactively, but it also uses a lot of CPU resources: the more complex (number of sound sources) and realistic (precision of the reverberation) the scene, the more CPU is needed [17]. Moreover, Object-Based audio has to render the environment according to the virtual model, which is not always the most exact approximation of reality. This last drawback is what our filter proposal is intended to solve, as we explain below.

2.3 Ambisonics

Unlike the two other representations, the Ambisonics format does not rely on the description of individual sound sources (speakers or objects) but instead represents the resulting sound field at the listener's position. The mathematical formalism used to describe the sound field is called spherical harmonics, and the unit of information is the number of components (or the order) of this spherical representation. The more components, or the higher the order, the more precision you get in the spatial representation of the scene. This paradigm is not new and has been used by a small community of sound professionals for several decades through the B-format, which is in fact a Higher Order Ambisonics representation at the 1st order. The representation of the resulting sound field at the listener's position is one of its main advantages when computing the information, but it can also be one of its main drawbacks, because it treats the listener's position as a fixed point. If one wants to record a whole place, one must make as many Ambisonics recordings as there are possible positions of the listener, and this can be a tedious task.

Ambisonics can be understood as a three-dimensional extension of M/S (mid/side) stereo, adding channels for height and depth. The resulting signal set is called B-format. Its component channels are labelled:

  • W: for the sound pressure (the M in M/S).

  • X: for the front-minus-back sound pressure gradient.

  • Y: for the left-minus-right (the S in M/S).

  • Z: for the up-minus-down sound pressure gradient.

The W signal corresponds to an omnidirectional microphone, whereas XYZ are the components that would be picked up by figure-of-eight microphones oriented along the three spatial axes.

The simplest Ambisonic panner (or encoder) takes a source signal S and two parameters, the horizontal angle θ and the elevation angle ϕ. The different gains of the Ambisonic components are the following:

$$ W = S \cdot \frac{1}{\sqrt 2 } $$
(1)
$$ X = S \cdot \cos \theta \cos \phi $$
(2)
$$ Y = S \cdot \sin \theta \cos \phi $$
(3)
$$ Z = S \cdot \sin \phi $$
(4)
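As an illustration, the encoding equations (1)–(4) can be implemented in a few lines. The following NumPy sketch (the function name is our own) encodes a mono signal into the four B-format channels:

```python
import numpy as np

def encode_first_order(signal, azimuth, elevation):
    """Encode a mono signal into first-order Ambisonics B-format (W, X, Y, Z).

    azimuth (theta) and elevation (phi) are given in radians, following
    the gains of Eqs. (1)-(4).
    """
    s = np.asarray(signal, dtype=float)
    w = s / np.sqrt(2.0)                          # omnidirectional component
    x = s * np.cos(azimuth) * np.cos(elevation)   # front-back figure-of-eight
    y = s * np.sin(azimuth) * np.cos(elevation)   # left-right figure-of-eight
    z = s * np.sin(elevation)                     # up-down figure-of-eight
    return w, x, y, z
```

For example, a source straight ahead (θ = 0, ϕ = 0) contributes only to the W and X channels.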

3 Methodology

In this part of the article we present the features of our new VR audio format. We should recall that the purpose of our work is to define a way of representing the soundscape of an architectural environment that can be implemented in a VR environment or used for the acoustic analysis of a place. For the definition of the Acoustic Filter, we need to carry out an experiment. The materials required are an anechoic chamber for recording the sound in a pure state; recording equipment consisting of a Zoom H6 recorder with its two built-in microphones and two Neumann KM 183 omnidirectional microphones; and a dodecahedral loudspeaker for reproducing the sound in the anechoic chamber and in the analyzed places. Additionally, we need audio editing software such as Adobe Audition and knowledge of the correlation principles of acoustic waves. Our experiment consists of several steps derived from the audio correlation technique.

First of all, a known and basic sound is reproduced in the anechoic chamber with the dodecahedral loudspeaker. This basic sound is recorded in the anechoic chamber with the Neumann KM 183 matched pair of omnidirectional microphones connected to the Zoom H6. As the environment in the anechoic chamber is free of ambient interference and additional noise, the recording contains the acoustic information of the pure basic sound.

Secondly, the pure basic sound is analyzed and its frequency range is verified to be complete.

Thirdly, the pure basic sound is reproduced in the case study place. This sound is reproduced with the same dodecahedral loudspeaker that was used in the anechoic chamber, in order to reproduce the basic sound under the same conditions as the first recording. The resulting sound is recorded in the case study place by the Neumann KM 183 matched pair of omnidirectional microphones connected to the Zoom H6. The recording can be made from several points in the case study place, simulating different positions of the listener.

Fourthly, the resulting sound is analyzed. The presence of the urban and architectural environment, the city sound and the position relative to the source modify the basic sound in a way that the resulting sound registers. Typically, the resulting sound is attenuated, coloured, enriched or delayed by the environmental agents.

Finally, by a cross-correlation process [18] we can compare the basic sound with the resulting sound. In order to extract the environmental features recorded in the resulting sound, we compute the difference between the resulting sound and the basic sound. This sonic difference contains only the features of the place, which we call the place sound.
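As a sketch of this comparison step, assuming the place behaves approximately as a linear time invariant system, the place sound can be estimated by a regularized spectral division of the two recordings. The function name and the regularization constant below are our own illustrative choices:

```python
import numpy as np

def estimate_place_response(basic, resulting, eps=1e-8):
    """Estimate the impulse response of the place from the two recordings.

    basic: the pure sound recorded in the anechoic chamber.
    resulting: the same sound recorded in the case study place.
    Returns h such that resulting ~= basic convolved with h.
    """
    n = len(basic) + len(resulting)   # zero-pad to avoid circular wrap-around
    B = np.fft.rfft(basic, n)
    R = np.fft.rfft(resulting, n)
    # regularized division R/B, well behaved where the basic sound has little energy
    H = R * np.conj(B) / (np.abs(B) ** 2 + eps)
    return np.fft.irfft(H, n)
```

With a broadband basic sound, the first samples of the returned array approximate the impulse response of the place.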

This place sound can be treated as a function [19] and constitutes the basis of our Acoustic Filter. The Acoustic Filter is able to reproduce the acoustic features of the place with any basic sound. This is why the Acoustic Filter is very useful in architectural VR: the different basic sounds that take place in an environment can be processed with the Acoustic Filter in order to obtain the impression of the sound in that place, without a complete virtualization of the acoustics in the model, as Object-Based audio requires, and without a prerecorded image of the environment acoustics, as Ambisonics requires. The design of this Acoustic Filter is therefore required, and in the next part of the article we introduce the basics of filter design (Fig. 1).
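Once the place sound has been obtained, auralizing any new dry (anechoic) sound in that place reduces to a convolution. A minimal sketch follows; the peak normalization is our own addition to avoid clipping and is not part of the format definition:

```python
import numpy as np

def apply_acoustic_filter(dry_sound, place_sound):
    """Auralize a dry sound in the measured place by convolving it
    with the place sound (the impulse response of the environment)."""
    wet = np.convolve(np.asarray(dry_sound, float),
                      np.asarray(place_sound, float))
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 1.0 else wet   # normalize only if clipping
```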

Fig. 1.

Graphic comparison of the two audio formats (Objects and Ambisonics) with the new Acoustic Filter: (a) first, the generic scenario is presented; (b) then the working process of Objects and Ambisonics; (c) the basic properties of Objects and Ambisonics and the main feature of the Acoustic Filter; (d) the resulting scenario for the Acoustic Filter.

3.1 The Case Study Places

Five urban places were selected as case studies. All of them are located in the old centre of Barcelona, near the cathedral. Each of them has a principal actor who is usually found there. Our intention is to relate the acoustic study to the type of music made in each place. For this reason, we are going to record the particular music played in each place so that the places can be compared. Here we describe the places and their principal actors:

  • The crossroads between Carrer del Bisbe and Carrer de Santa Llúcia. Its principal actor usually sings every weekend night. He is an opera tenor who is sometimes accompanied by other opera singers. His performance consists of the interpretation of opera arias and fragments over an orchestral backing played through a portable speaker. The orchestral accompaniment has no recorded voice, so our actor sings live in front of the spectators. The actor stands in Carrer de Santa Llúcia singing towards Carrer del Bisbe, where the circulation of people is not blocked.

  • The crossroads between Carrer del Bisbe and Carrer de la Pietat. Its principal actors vary from a guitar soloist, to a harpist, to a clarinetist accompanied by a guitar. Their position in front of a door in Carrer de la Pietat does not block the circulation of people.

  • Plaça de Sant Iu, in front of the East door of the Barcelona Cathedral. Its principal actors vary from a guitar soloist, to a harpist, to a clarinetist accompanied by a guitar. Their position against the Palau del Lloctinent façade does not block the circulation of people.

  • Plaça del Rei. Its principal actors are usually guitar soloists or violin soloists, but because of its spatial configuration some choir and band concerts have also been held there. They are usually placed on the stairs in the north corner, those that give access to the Santa Àgata chapel and the Tinell Room.

  • Plaça de Sant Jaume. The presence of the bells of the Generalitat and their musical performances every week gives a unique soundscape to this square and its surroundings. Moreover, Plaça de Sant Jaume is a usual venue for concerts and projections with music.

4 Audio Filter Design and Analysis: Basic Parameters

A filter, in general, is a system that rejects some parts of the processed object according to one or more attributes. For example, a gravel filter, or sieve, lets sand pass but stops stones. Similarly, a signal processing filter is a very broad concept, because it can be any system that processes a signal. The filtering concept is very important for our experiment; for this reason, a correct definition of the filter concept and its types is necessary (Fig. 2).

Fig. 2.

Frequency representation of the following ideal filters, from left to right: (a) low-pass; (b) high-pass; (c) band-stop; (d) bandpass. The filters are symmetric about the vertical axis.

We can define the classical types of filters, in a similar way that Ruiz and Duxans [20] do:

  • A low-pass filter allows low frequencies to pass and attenuates high frequencies.

  • A high-pass filter allows high frequencies to pass and attenuates low frequencies.

  • A band-stop filter is complementary to the bandpass filter: it eliminates a band of frequencies and allows the rest to pass.

  • A bandpass filter allows a band of frequencies to pass, eliminating the high and low frequencies.
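The four ideal (brick-wall) responses of Fig. 2 can be written down directly as frequency-domain masks. The following NumPy sketch (the function name and cut-off parameters are our own) returns |H(f)| for each type:

```python
import numpy as np

def ideal_filter_mask(freqs, kind, f1, f2=None):
    """Brick-wall (ideal) magnitude response |H(f)| of the four classical types.

    f1 is the cut-off frequency; f2 is the upper edge for band filters.
    """
    f = np.abs(np.asarray(freqs, dtype=float))  # magnitude response is even in f
    if kind == "low-pass":
        return (f <= f1).astype(float)
    if kind == "high-pass":
        return (f >= f1).astype(float)
    if kind == "bandpass":
        return ((f >= f1) & (f <= f2)).astype(float)
    if kind == "band-stop":
        return 1.0 - ((f >= f1) & (f <= f2)).astype(float)
    raise ValueError("unknown filter type: " + kind)
```

Note that the bandpass and band-stop masks sum to one at every frequency, which is exactly the complementarity described above.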

A classical digital filter can be defined as a system that modifies a digital signal so that an interval of frequencies is allowed to pass while the others are attenuated. It has one main objective: an output with certain desired features [20]. For our experiment, we need to study feasible digital filters. These must satisfy four properties: linearity, time invariance, causality, and stability:

  • If it satisfies linearity, we know that the filter does not modify the form of the signal: it may delay it, but it maintains the signal form. A linear filter, in discrete or continuous time, is one which satisfies the superposition property: if an input consists of the weighted sum of several signals, the output is simply the superposition (the weighted sum) of the responses of the filter to each of those signals [21].

  • If it satisfies time invariance, we know that the behavior and features of the filter are fixed in time. A filter is time invariant if a time shift in the input signal causes the same time shift in the output signal. The property of time invariance tells us that the responses of a time invariant system to time-shifted unit impulses are simply shifted versions of one another [21].

  • If it satisfies causality, we know that the filter output, at any moment, depends only on the input values at the present moment and in the past. Such a filter is sometimes called non-anticipative, because its output does not anticipate future values of the input [21].

  • If it satisfies stability, we know that small inputs to the filter lead to non-divergent responses. If the input to a stable filter is bounded (that is, if its magnitude does not grow without limit), the output is also bounded and cannot diverge [21].
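The first two properties can be verified numerically for any candidate filter. The short NumPy check below does so for a small FIR filter of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([0.5, 0.3, 0.2])        # impulse response of an example FIR filter

def filt(x):
    return np.convolve(x, h)

x1, x2 = rng.standard_normal(64), rng.standard_normal(64)
a, b = 2.0, -0.5

# Linearity: the response to a weighted sum is the weighted sum of the responses.
assert np.allclose(filt(a * x1 + b * x2), a * filt(x1) + b * filt(x2))

# Time invariance: delaying the input by d samples delays the output by d samples.
d = 5
assert np.allclose(filt(np.concatenate([np.zeros(d), x1]))[d:], filt(x1))
```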

In the next part of this section, we describe the characterization of a filter in terms of its impulse response and transfer function.

In the time domain, the impulse response relates the input and the output of a linear and time invariant (LTI) system [20].

In the case of digital systems, which is the domain we treat in our experiment (digital filters), we have to bear in mind that the output is the convolution sum of the input x[n] and the impulse response h[n]; informally, the output combines the input with the system's response and its echoes [21]. From now on we refer to a linear time invariant filter simply as an LTI system. In the case of filters, the impulse response and the transfer function determine the following concepts:

  • The gain (G(f)) is defined as the amplification of the output signal with respect to the input signal. If it is negative, it is called attenuation.

  • The amplitude response of a filter is defined as the modulus of the filter's frequency response.

  • The phase response of a filter is defined as the phase of the frequency response.

  • The order of a filter corresponds to the maximum delay (in samples) applied to the input signal or to previous outputs in order to calculate y[n].

  • The passband of a filter is the frequency range that the filter allows to pass from the input to the output with little attenuation.

  • The group delay measures, in samples for each frequency, the delay of the output signal with respect to the input signal. Hence, the overall delay experienced by a signal can be evaluated. If the group delay is constant, the phase is linear.
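Several of these quantities can be computed directly from the impulse response with an FFT. The sketch below (the function name is ours, and the group delay is approximated by finite differences of the unwrapped phase) illustrates the amplitude response, phase response and group delay:

```python
import numpy as np

def frequency_analysis(h, n_fft=512):
    """Amplitude response, phase response and group delay of an FIR filter,
    computed from its impulse response h."""
    H = np.fft.rfft(h, n_fft)
    omega = np.linspace(0.0, np.pi, len(H))          # normalized angular frequency
    amplitude = np.abs(H)                            # amplitude response |H|
    phase = np.unwrap(np.angle(H))                   # phase response
    group_delay = -np.diff(phase) / np.diff(omega)   # -d(phase)/d(omega), in samples
    return omega, amplitude, phase, group_delay
```

For the symmetric taps h = [0.25, 0.5, 0.25], the phase is linear and the group delay is constant at 1 sample, in line with the last point above.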

The definition of a filter as a finite difference equation allows us to make a clear distinction between two types of filters:

  • Non-recurrent filters: the impulse response has a finite number of non-zero samples; for this reason they are called FIR filters (finite impulse response).

  • Recurrent filters: the impulse response has an infinite number of non-zero samples; they are called IIR filters (infinite impulse response).
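The difference is easy to see numerically. Below, a small recurrent filter y[n] = x[n] + a·y[n−1] (a hypothetical example of ours) has the impulse response aⁿ, which never reaches exactly zero, whereas an FIR filter's impulse response is exactly zero after its last tap:

```python
import numpy as np

def iir_impulse_response(a, length):
    """Impulse response of the recurrent (IIR) filter y[n] = x[n] + a * y[n-1]."""
    y = np.zeros(length)
    prev = 0.0
    for n in range(length):
        x = 1.0 if n == 0 else 0.0   # unit impulse input
        prev = x + a * prev
        y[n] = prev
    return y

# FIR: the impulse response is just the taps, zero afterwards.
fir_h = np.convolve([1.0, 0.5, 0.25], [1.0] + [0.0] * 5)
# IIR: h[n] = a**n decays but never vanishes (here a = 0.5).
iir_h = iir_impulse_response(0.5, 8)
```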

The basis of the filter design technique assumes that measurements can be made in the reproduced sound field in order to compare the reproduced signals with the signals that are intended to be reproduced [22].

5 Filter Design

Having seen the basic parameters of filters, based on the system features, we now consider the design of our filter.

We have already seen that a feasible, linear, time invariant, causal and stable FIR filter has an impulse response of finite length L, and an output that depends only on input values, never on previous output values. If we want linear phase (no distortion of the waveform of the original signal), the impulse response must be symmetric or antisymmetric.
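Whether a given set of taps has linear phase can be checked numerically. The following sketch (our own helper, with an ad-hoc tolerance) fits a straight line to the unwrapped phase and inspects the residual:

```python
import numpy as np

def has_linear_phase(h, n_fft=1024, tol=1e-6):
    """Return True if the FIR taps h have (generalized) linear phase."""
    H = np.fft.rfft(h, n_fft)
    omega = np.linspace(0.0, np.pi, len(H))
    phase = np.unwrap(np.angle(H))
    keep = np.abs(H) > 1e-3          # ignore bins where the response vanishes
    coeffs = np.polyfit(omega[keep], phase[keep], 1)
    residual = phase[keep] - np.polyval(coeffs, omega[keep])
    return bool(np.max(np.abs(residual)) < tol)
```

Symmetric taps such as [1, 2, 3, 2, 1] pass the check; a non-symmetric impulse response such as [1, 2, 3, 4, 5] does not.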

The filter design should be flexible enough to be implemented in a Virtual Reality environment. In particular, a gamification of the case studies presented above is being developed. The final intention is to implement the Acoustic Filter in the VR environment in order to listen to the current soundscape of a place in its actual version and, afterwards, to listen to different urban proposals for the same place simply by changing the sound sources.

The filter design requires a thorough understanding and application of cross-correlation tools for audio filter production. The definitive Acoustic Filter design will be explored in future research.

6 Conclusion

The present study confirms that a new audio format for virtual reality is possible. Based on the initial hypothesis, it is now possible to state that current audio formats for virtual reality are not suitable for characterizing the acoustic features of the urban environment, because they either do not interact with the environment (Objects) or do not separate the environment from the sound sources (Ambisonics).

The current findings add substantial information to our understanding of the acoustic properties of a place. In particular, the proposed Acoustic Filter can be considered a sound filter containing the spatial information of the place. This finding suggests that the architect's conception of space can be approached not only through visual parameters, but also through acoustic ones.

Finally, an important limitation needs to be considered. If the proposed Acoustic Filter is to provide a complete acoustic comprehension of a public space, several measurements must be made in order to interpolate the results and predict a possible acoustic plan. This means that the more measurements are made, the better the resolution of the acoustic plan.
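As an illustration of such interpolation between measurement points, a simple inverse-distance weighting scheme (one of many possible choices, sketched here with hypothetical names) can estimate an acoustic value at an unmeasured point from the surrounding measurements:

```python
import numpy as np

def idw_interpolate(points, values, query, power=2.0):
    """Inverse-distance-weighted interpolation of measured values at a query point.

    points: (n, 2) array of measurement positions; values: n measured quantities.
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(points - np.asarray(query, dtype=float), axis=1)
    if np.any(d == 0):
        return float(values[np.argmin(d)])  # query coincides with a measurement
    w = 1.0 / d ** power                    # closer measurements weigh more
    return float(np.sum(w * values) / np.sum(w))
```

Denser measurement grids make any such interpolation more reliable, which is exactly the resolution trade-off noted above.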

Further research is required to complete the design and practical implementation of the Acoustic Filter. In particular, the next steps will address audio filter design by means of cross-correlation methods, and the field measurements in the places mentioned above will be carried out. With these objectives, we expect the experiment to yield the results we are looking for.