An Investigation of Choral Blending through Soundfield Capture, Acoustic Evaluation, and Perceptual Analysis Methods
Project Report
Published: April 14, 2025
Authors
Ying-Ying Zhang (McGill University) and Jithin Thilakan (Hochschule für Musik Detmold)
Introduction
When several musicians perform together, the resulting “blend” of their sounds represents the “fusion of multiple timbres into a single timbral image” [1]. Blend is often understood as the end goal of orchestration and performance; how and when blend occurs, as well as how musicians go about achieving it, has been the subject of dedicated study [2]. This project focuses on one subset of ensemble blend, the “often adjudicated but seldom researched” choral blend [3]. Singers “employ a built-in neurological instrument” to varying effect when they are performing solo or in a choir [4]. Blend is at once a verb (a collective action among musicians) and a noun (the “aesthetic product” of an ensemble’s performance) [5]. In addition to the direct sound musicians hear from themselves and other performers, reflected sound from the environment also functions as a strong aural feedback system. In this project we are also investigating how this feedback system affects the blending between musicians.
This investigation represents the culmination of a series of choral blend experiments conducted in the Multimedia Room (MMR) at McGill University during October 2022. In this experiment we investigated singers’ perception of choral blend, having them perform varied repertoire in several acoustic environments and ensemble spacings.
Our research questions were as follows:
What is the influence of room acoustics on choral blend?
Are there acoustic environments that singers find more "supportive" than others?
How do singers adapt their performance strategies to different acoustic environments?
Materials and Methods
In this experiment we tested three acoustic environments, created using a combination of the MMR’s passive acoustic banners and curtains and the active Meyer Constellation system: a “Dry” setting with all passive acoustic systems fully deployed and no auralized output from the Meyer Constellation system; the “Percussion” setting, which has a relatively short decay tail and an Early Reflection (ER) response adjusted by +1.5 dB; and the “Cathedral” setting, which has a long decay tail and was adjusted to have no ER and -2 dB on a perceptual brightness rating.
Room acoustic measurements were taken in the three environments to characterize their acoustic properties both objectively and subjectively, including Room Impulse Response (RIR) and Speech Transmission Index (STI) measurements. The RIRs were measured at multiple source-receiver locations, and room acoustic parameters representing aspects such as perceived reverberation and clarity were extracted in accordance with ISO 3382-1 [6]. Additionally, STI values, representing speech intelligibility, were estimated in accordance with the IEC 60268-16 standard. The three acoustic environments involved in this study exhibited distinct acoustic properties, including different reverberation times: 0.6 seconds for “Dry”, 1.3 seconds for “Percussion”, and 1.9 seconds for “Cathedral” (T30 values averaged over the 500 Hz and 1000 Hz octave bands).
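As an illustration of how a T30 value can be obtained from a measured RIR, the sketch below applies backward (Schroeder) integration and fits a line to the decay between -5 dB and -35 dB. It is a minimal example under stated assumptions, not the measurement chain used in this study: the function names are hypothetical, the input is assumed to be a single broadband or already band-filtered impulse response, and `synthetic_rir` is an idealized exponential decay used only for checking the estimator.

```python
import numpy as np


def schroeder_decay_db(rir):
    """Return the energy decay curve (EDC) of an impulse response in dB.

    Uses backward (Schroeder) integration of the squared impulse response,
    normalized so that the curve starts at 0 dB.
    """
    energy = np.asarray(rir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]
    # Guard against log10(0) in a fully silent tail.
    edc = np.maximum(edc, np.finfo(float).tiny)
    return 10.0 * np.log10(edc / edc[0])


def t30_from_rir(rir, fs):
    """Estimate T30 in seconds: fit a line to the EDC between -5 dB and
    -35 dB, then extrapolate that slope to a 60 dB decay."""
    decay_db = schroeder_decay_db(rir)
    t = np.arange(len(decay_db)) / fs
    fit_range = (decay_db <= -5.0) & (decay_db >= -35.0)
    if np.count_nonzero(fit_range) < 2:
        raise ValueError("impulse response does not decay by 35 dB")
    slope, _ = np.polyfit(t[fit_range], decay_db[fit_range], 1)
    return -60.0 / slope


def synthetic_rir(rt60, fs, duration):
    """Hypothetical test signal: an ideal exponential decay whose level
    drops 60 dB in rt60 seconds (no noise floor, no band filtering)."""
    t = np.arange(int(duration * fs)) / fs
    return np.exp(-3.0 * np.log(10.0) * t / rt60)
```

To reproduce a band-averaged figure like the ones quoted above, one would first filter the RIR into the 500 Hz and 1000 Hz octave bands, estimate T30 in each band, and average the two results; that filtering step is omitted here.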
In addition to the acoustic measurements described above, an acoustic camera was used to record the singers' performance, enabling a visual analysis of sound emission from the sources and of the room reflections (see Figure 1b). This device allows the intensity and localization of the sound sources and room reflections to be investigated in different frequency bands.
| Constellation or Room Setting | Comments/Modifications | Reverberation Time (T30) |
| --- | --- | --- |
| Dry (banners down and curtains closed) | No simulated acoustics | 0.6 s |
| Percussion | ER +1.5 dB | 1.3 s |
| Cathedral | ER -infinity, -2 dB perceptual brightness | 1.9 s |
Figure 1a: The acoustic camera
Figure 1b: Screen capture of the camera recording of a performance
To capture signals with the least amount of room response, we used the experimental close microphone technique described in our other work on close microphone techniques. While this anechoic capture system is ideal for separating sound signals, we also implemented spaced and soundfield capture systems. These systems captured varying amounts of the virtual acoustic environment to serve as a comparison point for the anechoic recordings.
Figure 2: A diagram showing the main, surround, height, and soundfield capture systems in the space of the MMR. Circles represent microphones with omnidirectional polar patterns, while the doubled circles represent the dual cardioid microphones used for the back heights. Please note that this is the same capture system used in the ODESSA IV recordings, which took place concurrently.
Microphone Setup
Our microphone setup was shared with other concurrent experiments and therefore had to be versatile. We used anechoic close microphones and spot microphones on each of our five sopranos, while the virtual acoustic environment was captured through Ambisonics, binaural, and spaced techniques.
The room itself was captured with a conventional channel-based method modeled on typical acoustic recording practice: a 7.1.4 spaced microphone technique. The seven-microphone surround layer was placed at a height of 3.16 m, and the height layer at 4.68 m. The primary sound image was captured by three spaced DPA 4006 microphones in the Left, Center, and Right positions. Two additional DPA 4006 outrigger microphones captured the extreme Left and Right aspects of the room, while a final pair at the back of the room captured the surround image. The front height capture system consisted of a pair of Schoeps MK 2H microphones hanging from the room’s ceiling grid. The MK 2H capsules are more sensitive at high frequencies than their standard MK 2 counterparts, which is sometimes valued in a height capture. In the rear height position, a pair of Sennheiser MKH 800 TWIN Nx dual-capsule microphones allows different polar patterns to be chosen in post-production, giving the engineer the opportunity to change the directivity of this microphone position.
For binaural capture, a Neumann KU 100 dummy head was placed directly under the Center position of the main microphone system at around ear height. Finally, we placed two Sennheiser Ambeo first-order Ambisonics microphones in high and low positions to capture two vertical perspectives on the auditory scene. Audio was recorded at 48 kHz/24-bit for compatibility with the Zylia Higher Order Ambisonics microphone (which was not used for this particular experiment, but was used in other recording sessions that week) and with certain spatial audio decoder systems.
Session video was recorded from two angles (left-looking and right-looking shots) in 4K resolution.
Experiment
Participants were asked to sing together to achieve a blended choral sound and to rate how easy or difficult this was to achieve. On a scale of 1-10, subjects rated how “Supportive” each spacing and acoustic environment felt while completing their group performance task. They were also invited to give overall impressions and comments on each acoustic environment, to rank the three acoustic settings in order of overall preference, and to describe which spacings they found difficult to perform in. Subjects were cycled through combinations of acoustic environments and inter-musician distances. Within each acoustic setting, we randomized the combination of works and spacings to ensure that all pieces were performed in all spacings and acoustic settings while minimizing the interaction between variables as much as possible.
Figure 3: Subjects performing an excerpt in mid spacing and the room’s dry acoustic setting (all banners and curtains fully deployed)
Distances:
Close spacing: 0.75 m between singers
Mid spacing: 1.5 m between singers
Wide spacing: 2.5 m between singers
Works:
Mozart, Laudate Dominum (soprano solo) from Vesperae solennes de confessore K. 339, bars 11-25 [7]
Händel, Lascia ch’io pianga (soprano aria) from Rinaldo, bars 1-8 [8]
Purcell, See We Assemble (choir soprano) from King Arthur, bars 1-8 [9]
These works were chosen by project supervisors Malte Kob and Martha de Francisco as passages that showcase a variety of musical articulation while being standard enough repertoire that all singers would have some familiarity with them.
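One way to build a randomized schedule of this kind is sketched below. It is an assumption-laden illustration rather than the procedure actually used: it fully crosses the three works with the three spacings and shuffles the resulting nine trials independently within each acoustic setting. The function name `build_schedule`, the `seed` parameter, and the fixed order of acoustic settings are all hypothetical.

```python
import itertools
import random

SETTINGS = ["Dry", "Percussion", "Cathedral"]
SPACINGS_M = {"Close": 0.75, "Mid": 1.5, "Wide": 2.5}  # metres between singers
WORKS = ["Laudate Dominum", "Lascia ch'io pianga", "See We Assemble"]


def build_schedule(seed=None):
    """Return {setting: [(work, spacing), ...]} with every work/spacing
    pair appearing exactly once per acoustic setting, in random order."""
    rng = random.Random(seed)  # a fixed seed makes the schedule reproducible
    schedule = {}
    for setting in SETTINGS:
        trials = list(itertools.product(WORKS, SPACINGS_M))
        rng.shuffle(trials)
        schedule[setting] = trials
    return schedule


if __name__ == "__main__":
    for setting, trials in build_schedule(seed=2022).items():
        print(setting)
        for work, spacing in trials:
            print(f"  {work} at {spacing} spacing ({SPACINGS_M[spacing]} m)")
```

Shuffling within each setting keeps the room configuration fixed for a block of trials, which avoids reconfiguring banners, curtains, and the active system between every excerpt.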
Results
Singer Perception
Out of the ranked distances and acoustic settings, the singers generally found the Cathedral (longest tail) setting at a Close spacing the easiest position to blend in, and the most supportive acoustic environment. This clip of “See We Assemble” is from what is collectively their highest rated spacing, acoustic environment, and musical excerpt.
Audio was taken from the Left and Right position of the Decca Tree.
The singers rated the dry/absorptive acoustic setting at the "Mid" singer spacing lowest in terms of supportiveness. One singer referred to this setting as the most "vulnerable" in her survey feedback. This clip shows their lowest rated spacing, acoustic environment, and musical excerpt.
Audio was taken from the Left and Right position of the Decca Tree.
An Outlier
While overall subjects preferred the combination of Close spacing and the “Cathedral” virtual acoustic environment, one singer preferred the combination of Wide spacing and the “Dry” or most absorptive acoustic setting, ranking it highest in her overall preferences. While this was unusual given that it is the exact opposite of the group’s overall preference, she stated that she was best able to distinguish between the voices of the other participants and pay attention to minute timing differences in this position and acoustic setting. Because this participant gave her highest rating to this position, the group’s overall pick for the room setting that gave them the least amount of support became the Mid spacing instead of Wide.
Observations and Feedback
When ranking acoustic environments in terms of preference at the end of the survey, the answers from all five singers very closely followed their supportiveness ratings. At the same time, one subject expressed that the concept of finding acoustic environments “Supportive” confused her. While the researchers gave a definition that included decreasing the effort of, or bolstering, her singing, this confusion was ultimately not fully resolved. This study revealed that a different approach may be needed when discussing room reverberation or auralization with performers, which has become an ongoing topic of inquiry.
Changes in Methodology
Originally, our experimental design placed the sopranos in a straight line to limit sight-line variables. Our subjects, however, requested a semi-circular arrangement, as they considered visual cues from each other an inseparable aspect of group performance. In support of this, the participants, without prompting, spontaneously nominated a conductor for their group performance. These visual variables therefore cannot be separated from our results. As our study progressed, we also held a short discussion among the singers and researchers on the use of vibrato in Händel’s Lascia ch’io pianga aria. While singing with vibrato would typically be appropriate for solo performance, in the end it was toned down to accommodate a more uniform "choral" sound. One soprano gave the feedback that she found straight-tone singing of this piece more difficult.
Future Work
While initial results show interesting trends among singer motivations and preferences in group performance, our ongoing line of inquiry is to compare the singers’ qualitative responses to quantitative signal processing analysis, as well as additional qualitative responses from listeners.
Signal Analysis
Room acoustic parameters such as clarity, reverberation, and intelligibility are being assessed for the acoustic environments involved in the study, and sound samples of individual singers singing the same musical piece in those acoustic environments are being extracted. Spectral, temporal, and loudness-based, musically oriented signal features (e.g. spectral centroid, formant position and strength, onset and offset time) will be extracted from the sound samples, and their correlation with the acoustic parameters will be investigated to see how the acoustic environment influences the performance of individual singers in achieving a blended sound.
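As an example of one of the features named above, the following sketch computes the spectral centroid, the magnitude-weighted mean frequency of a spectrum, for a single frame and as a frame-by-frame track. It is illustrative only: the function names, the Hann window, and the frame and hop sizes are assumptions, not the analysis settings of this project.

```python
import numpy as np


def spectral_centroid(frame, fs):
    """Magnitude-weighted mean frequency (Hz) of one audio frame."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))  # reduce spectral leakage
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    total = magnitude.sum()
    if total == 0.0:
        return 0.0  # silent frame: centroid is undefined, report 0 Hz
    return float((freqs * magnitude).sum() / total)


def centroid_track(signal, fs, frame_len=2048, hop=1024):
    """Spectral centroid per frame, for following timbre over time."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array(
        [spectral_centroid(signal[s:s + frame_len], fs) for s in starts]
    )
```

Applied to the close-microphone signal of each singer, a track like this could then be summarized (for example by its mean per excerpt) before being compared with the room acoustic parameters of each environment.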
Listener Analysis
Initial results from this study suggest that singers feeling acoustically supported may not directly translate to an improved musical performance, as their highest rated spacing and acoustic environment may not have been the best technical performance of the musical excerpt. We plan to conduct testing using a Multiple Stimuli with Hidden Reference and Anchor (MUSHRA) or similar expert-based evaluation methodology to examine this relationship. Additionally, to examine if and how capture systems influence the listener perception of choral blend, we will ask for ratings of various capture systems (from anechoic to soundfield) placed within the room along its reverb continuum. Results will be compared to objective reverb attributes extracted from the signal processing analysis to examine whether there is consistency in the perception of blend, and which objective factors this impression might relate to.
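If a MUSHRA design is adopted, listener post-screening is one step that lends itself to a short sketch. The example below assumes the commonly cited criterion from ITU-R BS.1534-3, under which a listener is excluded if they rate the hidden reference below 90 (on the 0-100 scale) in more than 15% of trials. The function names and the input layout are hypothetical, and the thresholds are exposed as parameters because the eventual test may use a different rule.

```python
def reference_failure_rate(ref_scores, threshold=90.0):
    """Fraction of trials in which the hidden reference scored below threshold."""
    if not ref_scores:
        raise ValueError("listener has no hidden-reference ratings")
    failures = sum(1 for score in ref_scores if score < threshold)
    return failures / len(ref_scores)


def screen_listeners(ref_ratings, threshold=90.0, max_failure_rate=0.15):
    """Return the listener IDs that pass post-screening.

    ref_ratings maps a listener ID to the list of scores (0-100) that the
    listener gave to the hidden reference, one score per trial.
    """
    return [
        listener
        for listener, scores in ref_ratings.items()
        if reference_failure_rate(scores, threshold) <= max_failure_rate
    ]
```

Only the hidden-reference check is shown; a complete screening procedure would also need to consider how listeners rate the anchor.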
Works Cited
[1] Sandell, G. J. (1995). Roles for spectral centroid and other factors in determining "blended" instrument pairings in orchestration. Music Perception, 13(2), 209-246.
[2] Lembke, S. A. (2014). When timbre blends musically: Perception and acoustics underlying orchestration and performance (Doctoral Dissertation, McGill University, Canada).
[3] Killian, J. N., & Basinger, L. (2007). Perception of choral blend among choral, instrumental, and nonmusic majors using the continuous response digital interface. Journal of Research in Music Education, 55(4), 313-325.
[4] Cook-Cunningham, S. L. (2019). The effects of musician's earplugs on acoustic and perceptual measures of choral and solo sound. Journal of Voice, 33(1), 87-95.
[5] Slimings, J. L. (2022). Choral blend: sound or sensation?: An interpretative phenomenological analysis of proto-professional singers’ perceptions of ensemble singing (Doctoral dissertation, The University of St Andrews).
[6] ISO 3382-1. Acoustics - Measurement of room acoustic parameters - Part 1: Performance spaces, Geneva: International Organization for Standardization, 2009.
[7] Mozart, W. (1780). Laudate Dominum [Praise the lord, all you nations], Vesperae solennes de confessore [Solemn Vespers for a Confessor] K. 339 [Vocal score].
[8] Händel, G. & Rossi, G. (1711). Lascia ch’io pianga [Let Me Weep], Rinaldo [Vocal score].
[9] Purcell, H. & Dryden, J. (1695). See We Assemble, King Arthur [Vocal score].