Computational aeroacoustics of human phonation

The current paper presents a CFD model of flow past vibrating vocal folds coupled to an acoustic solver, which calculates the sound sources from the flow field in a hybrid approach. The CFD model is based on the numerical solution of 3D Navier-Stokes equations on a time-dependent domain, solved by cell-centered finite volume method. To capture the fine turbulent scales important for the acoustic source calculations, the equations are discretized and solved on large computational meshes up to 3.2M elements. The CFD simulations were run in parallel using domain decomposition method and OpenMPI implementation of the MPI standard. Aeroacoustic simulations are calculated in a separate step by Lighthill’s acoustic analogy, which determines the acoustic sources based on the fluid field. This is done with the research code CFS++ which employs the finite element method (FEM).


Introduction
A large amount of the noise surrounding us in our daily life is generated by turbulent flows -to list a few examples we can mention e.g.flow-induced noise from airplanes, jet engines, cars, ventilators or air conditioning systems, and also sound produced by wind musical instruments or voice created by human vocal folds.The physics behind aerodynamically generated sound is quite complicated and still not fully understood.One way to analyze the generation of flow induced sound is to model it mathematically by partial differential equations and solve them with numerical simulation tools.Due to the advent of affordable highperformance computer resources, these methods currently attract increasing attention.
In the field of the aeroacoustics in human vocal tract, the computational methods gain even more importance, since direct flow and acoustic measurements within the vocal tract are invasive and therefore hardly possible.The current paper presents a hybrid method, in which a 3D CFD model for flow past the vibrating vocal folds is modeled by the incompressible Navier-Stokes equations and aeroacoustic source terms for the inhomogeneous wave equation are determined by Lighthill's acoustic analogy.

Computational aeroacoustics
A model problem in aeroacoustics is depicted in figure 1, which shows a flow around a bluff body creating a vortex street and turbulent wake.Due to the turbulent effects and unsteadiness of the flow field, periodic disturbances of the pressure field are created.Certain parts of these pressure a e-mail: sidlof@it.cas.czb e-mail: stefan.zoerner@tuwien.ac.at fluctuations propagate as acoustic waves and create flowinduced sound.One approach to computational aeroacoustics is the direct solution of the compressible Navier-Stokes equations, which capture both the fluid dynamic and the acoustic phenomena.However, the direct numerical simulation has specific drawbacks: -The aerodynamic and acoustic pressure fields have dramatically different length scales, the differences may be easily in the orders of 10 5 − 10 7 .This leads to very different requirements on the grid element size.-The computational domain Ω 2 for the solution of the acoustic field is in most practical cases much larger, than the fluid dynamic domain Ω 1 .-In many cases, the interaction of the fluid flow with the solid boundary Γ s leads to structural vibrations, which influence the fluid flow and act as an additional acoustic source.pressure and velocity fields using e.g.Lighthill's acoustic analogy [1], Ffowcs Williams-Hawkings analogy [2] or the Acoustic Perturbation Equations (APE) [3].

Human voice production
Human voice is created by expiration of air from the lungs through a narrow constriction called the glottis [4].This constriction is formed by the vocal folds, located in the larynx (see figure 2).The vocal folds (also called the vocal cords) are two symmetric soft tissue structures fixed between the thyroid cartilage and arytenoid cartilages; basically they are composed of the thyroarytenoid muscle and ligament covered by mucosa.Under certain conditions (subglottal pressure, glottal width, longitudinal tension in the thyroarytenoid and ligament) the vocal folds can start to oscillate and close the channel periodically, forming a pulsating jet.The dominant sound source in the glottis is associated with the oscillating volume flow, which determines the fundamental frequency of human voice [5].In some cases, especially when the vocal folds are abducted (e.g. in breathy phonation), the acoustic consequence of the dipole sound sources due to unsteady forces exerted by the vocal folds on the fluid may also grow important [6].The acoustic signal generated in the glottal region is further modulated by the vocal tract, radiated from the mouth, and perceived as voice.The concept of fluid-structure-acoustic interaction between the airflow, elastic vocal folds, sub-and supraglottal acoustic spaces relies on good knowledge of aerodynamics in the larynx.The aerodynamic effects in the larynx are, however, rather complex: this is caused mainly by the periodic closure of the glottal channel during vocal fold vibration and inherent unsteadiness of the airflow.Moreover, it is very complicated to perform any flow measurements in living subjects, since the laryngeal airways are not accessible to measuring instruments.This is why some features of the laryngeal flow are not yet fully understood.
From the fluid-mechanical point of view, human larynx can be seen as an approximately planar nozzle with time-varying clearance.However, the vocal fold vibration patterns also present longitudinal components [4,7], which disrupt the dominantly planar character of the geometry.In the convergent part the airflow accelerates.Near the narrowest cross-section, airflow separates due to adverse pressure gradient and forms a jet.The glottal jet, which is essentially planar, but might vary in intensity along the anteriorposterior axis in the proximity of the glottis, pulsates due to vocal fold oscillations.Further downstream of the glottis, the jet interacts with further laryngeal tissue structures (such as false vocal folds) and supraglottal large-scale vortical structures, which leads to a complex 3D flow field.
The purpose of this paper is to present a 3D CFD model of the glottal flow coupled with an acoustic solver based on acoustic wave equation, with acoustic sources obtained from the Lighthill's analogy, and to present the first results.

Previous works in the field
Due to the difficulties encountered in both in vivo and in vitro measurements, there have always been efforts to develop mathematical models of human phonation.The cornerstone of mathematical modeling of human voice production was set by Ishizaka & Flanagan [8], who developed a simple mass-spring model coupled to potential quasi-1D airflow.Although, this may seem a serious oversimplification of the real physiological situation, the model proved very useful and its variants (e.g., [9,10,11,7]) are widely used up to present.The main advantage of these models is that the equations may be solved either analytically, or using simple and very fast numerical methods for ordinary partial differential equations, making it possible to perform nearly real-time simulations on current computers.However, since these models do not provide much information on the glottal airflow and acoustics, increasing effort has been devoted to numerical solution of the 2D or 3D Navier-Stokes equations on computational domains approximating the glottal channel.These include studies with fixed vocal folds (e.g.[12,13,14,15]), forced vocal fold oscillation [16,17,6,18,19,20,21,22,23] and models with the airflow fully coupled to elastic tissue oscillations [24,25,26,27,28,29,30].Only a few of these computational studies [24,12,14,15,22] solve the flow field in 3D, and most of them on a static geometry.One of the most complex approaches to phonation modeling was recently published in the works of Zheng et.al.[31,32] and Seo & Mittal [33], who use the immersed boundary method for incompressible low-Mach number flow coupled to a finite element solver for the viscoelastic tissue to calculate the 3D flow field and flow-induced vibrations of the vocal folds including glottal closure and contact forces.Using the aerodynamic-acoustic splitting technique, the acoustic field is then calculated by solving linearized perturbed compressible equations.An extensive overview of the computational vocal fold models can be found in a recent review paper of Alipour et al. [34].
The Reynolds numbers found in airflow past vocal folds range from 1000 up to about 5000-10000 [35].This implies that the subglottal flow may be laminar.However, Neubauer et al. [36] showed that near the glottal region, transition from laminar to turbulent flow occurs and that the supraglottal flow field, dominated by separated jet flow and recirculation, is rather turbulent.Numerical computation of such transitory, highly unsteady airflow with massive separation is problematic.The Reynolds-Averaged Navier-Stokes (RANS) turbulence modeling is used rather marginally [17,12].Some recent papers [37,14,15] claim that RANS approach for modeling of human phonation is inadequate and employ Large Eddy Simulations (LES).However, still many authors [16,24,18,25,19,20,13,28,21,30]  (DNS) on a largely insufficient mesh.This approach was adopted within the current study, too.
From the computational models focused on the aeroacoustics of the human voice production, one of the first papers was published by Mc.Gowan [5].Zhao et al. [6] computed the acoustic radiation by direct simulation, and compared the results with Ffowcs Williams-Hawkings analogy.In a study based on LES-based computational techniques, Suh & Frankel published acoustic-analogy-based far-field sound predictions and compared them to direct simulations.Bae & Moon [19] used the perturbed compressible equations in an axisymmetric setup with acoustic sources computed from the incompressible Navier-Stokes equations.The authors conclude that the main portion of the phonation process is caused by dipole source mechanism due to the pulsating air jet, and that the voice quality is closely related to the vortical structure in the shear layer of the jet.

Computational modeling of the fluid dynamics 2.1 Governing equations
The typical Mach numbers of the fluid flow in human phonation are usually bellow Ma < 0.2, and so the fluid flow can be modeled by unsteady incompressible Navier-Stokes equations without introducing significant modeling error.In strong conservation form suitable for further finite volume discretization, the equations read where u, p and ρ are fluid velocity, pressure, and density, respectively, and ν is kinematic viscosity.
The geometry of the model vocal folds used in this study is based on the widely used parametric shape "M5" proposed by Scherer [38,39].The "M5" model is piecewise linear with rounded corners, and provides an easily parametrizable approximation of the vocal fold shape during oscillation.The 2D section of the computational domain is depicted in figure 3.In current simulation, two elementary cases were investigated for the sake of simplicity: convergent vocal folds (angle ψ = 20 o ) and a divergent configuration with ψ = −20 o .The thickness T V F = 7.3 mm of the vocal folds (physiologically, the inferior-superior dimension), height H = 5.5 mm (medial-lateral dimension, normal to the sagittal plane) and length L = 12 mm of the vocal folds (anterior-posterior dimension, normal to the coronal plane, not seen in figure 3) were set to match average physiological values found in male subjects.The subglottal channel length was set to T 0 = 0.4 T V F , the supraglottal channel length is T 1 = 5.4 T V F .In both configurations, the vocal folds vibrate in the medial-lateral direction symmetrically with one degree of freedom and frequency f = 100 Hz, leaving a minimum glottal gap g min = 2 * 0.1 mm and maximum glottal opening g max = 2 * 0.9 mm.
The boundary conditions imposed on the pressure and velocity fields try to mimic the real physiological conditions.The airflow in the model is driven by constant pressure difference between the inlet Γ in and outlet Γ out of the domain (i.e., constant lung pressure and zero relative pressure at outlet).At the fixed channel walls Γ wall and moving vocal fold surfaces Γ VF a Neumann condition ∂p/∂n = 0 was prescribed.The boundary conditions for the velocity field were set as follows: ∂u/∂n = 0 at Γ in , no-slip condition u = 0 at Γ wall .At Γ VF , the flow velocity is equal to the velocity of the moving vocal fold surface.Due to the presence of large vortical structures convected downstream of the glottis up to the outlet of the computational domain and consequent backflow to the domain destabilizing the numerical solution, it was necessary to introduce a stabilized outflow condition at Γ out : ∂u/∂n = 0 when velocity direction points outward of the domain, u = 0 otherwise.

Numerical solution
The numerical solution of the airflow in the glottal channel with moving walls was implemented in OpenFOAM [40], an object-oriented open-source set of libraries programmed in the C++ language.OpenFOAM is based on the finite volume method in cell-centered approach on arbitrarily unstructured meshes.
The incompressible Navier-Stokes (1) equations on a moving computational mesh were solved numerically using a modified PISO algorithm [41] with an extra loop of substep iterations and variable under-relaxation.The algorithm allows for automatic time stepping and is designed to achieve convergence for larger time-steps (i.e., large Courant numbers).The discretization schemes were as follows: first-order Euler implicit for the time derivative, total variation diminishing (TVD) scheme for the convective term, central differencing scheme (CDS) with explicit nonorthogonal correction for the diffusion term.
The computational domain changes in time due to vocal fold oscillations.Since the vocal folds do not collide and close the channel completely in current simulations, it is not necessary to introduce topological changes to the mesh and the mesh is simply deformed.The coordinates of the element vertices in a new timestep are found by solving Laplace equation for the mesh velocity w with spatially variable diffusivity γ: with boundary conditions given by zero mesh velocity at Γ in ∪ Γ out ∪ Γ wall , prescribed velocity on the moving vocal fold surfaces Γ V F and zero normal derivative on the front and back walls (here, the mesh is allowed to "slip" freely).The diffusivity γ decreases exponentially with distance from the vocal fold surfaces.Due to significant element distortion and consequent loss of element orthogonality between the moving vocal folds, one outer loop of 01085-p.3 nonorthogonal correctors was utilized within the modified PISO algorithm to guarantee stability of the computation.The resulting linear system for momentum was solved by bi-conjugate gradient method (PBiCG) with diagonalbased incomplete LU (DILU) preconditioning.For the pressure predictor and corrector steps, faster convergence was obtained using a geometric-algebraic multigrid (GAMG).The cell motion Laplace equation was solved by diagonal incomplete-Cholesky (DIC) preconditioned conjugate gradient method.
The meshes used in the computations were constructed from an unstructured isotropic triangular 2D mesh extruded into third dimension.The mesh with 2.15M elements, used for the calculations, is depicted in figure 4. The grid dependence was tested on six computational meshes, composed of 550k -3.2M prismatic elements.The characteristic element size in the x/y direction ranged from 0.23 mm up to 0.13 mm for the finest mesh used.Since no turbulence and no boundary layer model was used in the calculations, computed flow fields do not converge strictly with mesh refinement due to capture of finer and finer scales of turbulence, better resolution of the boundary layer and flow separation.However, the tests showed that when using elements smaller than ca.0.15 mm, the important global parameters of the flow field (such as the drag and lift forces on the vocal folds) do not change significantly when the mesh is further refined.See [42] for details on this issue.For running solvers in parallel, the OpenFOAM library uses the domain decomposition method and OpenMPI implementation of the MPI standard.To partition the mesh for the parallel computation, two different algorithms were tested: scotch [43] and metis [44].The quality of the mesh decomposition was compared in terms of subdomain element number (important for load balance) and number of processor faces (influencing processor communication).The results of the tests (see [42] for details) suggest that in this particular case, scotch produces better decomposition in terms of interprocessor communication, but is outperformed by metis in terms of load balance (at least for lower number of subdomains).However, the difference is not substantial and does not influence the final computational times significantly.
The CFD simulations were run on a computational cluster Hydra at the Technical University of Liberec -a heterogeneous cluster consisting of 12 Dell PowerEdge 1950 nodes (each one with two Intel Xeon 5140 dual-core processors, 1333 FSB, 4MB shared L2-cache) and 17 Sun Fire V20z nodes (two single-core AMD Opteron 252 processors per node), running Linux CentOS.Instead of using standard Infiniband interlinks, the nodes are interconnected only by 1Gbps network, which forms a serious bottleneck of the system.

Results
Before running computations on the model of the vocal folds, the accuracy of the code in the case of low Reynolds number separated channel flow was verified using a benchmark study by Turek and Schäfer [45].This paper summarizes the quantitative results of 17 research groups using various computational methods to solve a benchmark problem of cross-flow over a cylinder, particularly of a 2D unsteady flow over a fixed circular cylinder at Re = 100 with a parabolic inflow velocity condition.The following global quantities are tracked and compared between the results of all the research groups: frequency of separation (vortex shedding) f , Strouhal number S t = D f U , maximum drag and lift coefficient over one period [t 0 , t 0 + 1/ f ] and pressure difference ∆p(t) between the front and end point of the cylinder at t = t 0 + 1/2 f .The initial time t 0 corresponds to the state when c L is maximum, D is the cylinder diameter, U the mean inflow velocity and F D and F L are the drag and lift forces Here u t denotes the tangential velocity on the cylinder surface S , n x and n y are the components of the unit normal vector.The numerical results of the current algorithm calculated on a triangular mesh with 380k elements were compared against the benchmark data published in [45], which contain also some (incomplete) experimental data.The maximum difference was about 6% against the mean values published in the benchmark study.
As an input for further aeroacoustic computations, a transient CFD simulation for the convergent configuration (ψ = 20 o ) was run.The fluid dynamic fields were computed over 20 periods of vibration (t = 0.2 s).The simulation was run with a variable timestep, adjusted automatically by the solver to insure that the Courant-Friedrichs-Lewy (CFL) condition for the time step ∆t, element size ∆x and local velocity u is satisfied: Depending on the maximum local Courant number (occurring in the glottal gap and dependent on the glottal opening), the timestep ranged from 1 -5 • 10 −6 s on the mesh with 2.1 M elements.The airflow was driven by constant pressure difference ∆p = 300 Pa, corresponding to soft 01085-p.4The results of the transient simulation are demonstrated in figure 5.The velocity fields in the coronal midplane during four phases of one vocal fold vibration cycle are visualized by vorticity isocontours.The supraglottal flow field is dominated by a pulsating jet, which gradually develops during the vocal fold opening phase.Coherent turbulent structures are generated mainly in the shear layer of the jet, but unlike the benchmark problem of cylinder cross-flow, the vortex street is irregular and not perfectly periodic.The jet tends to skew from the symmetry plane due to interaction with large-scale vortices and recirculation flows.Further downstream, the jet loses its planar character, disperses and forms complex 3D structures.

Governing equations
Calculating the acoustic pressure p is characterized by the wave equation, which in index notation is given as In (7) ρ is the acoustic density.According to Lighthill [46], for a high Reynolds number the viscous stress may be neglected.The heat conduction may also be neglected since in regions of ambient temperature the contribution of heat conduction is of the same order as the viscous term.This leads to the approximation The fluid induced sound is a sound source given on the whole domain Ω.For the boundary terms of the vocal folds and trachea a homogeneous Neumann condition is used, which represents a hard reflecting wall.To prevent any reflection of acoustic waves imping the in-and outflow a perfectly matched layer (PML) is used as introduced in [47] and [48].

Numerical solution
The wave equation ( 6) is discretized using the Finite-Element Method (FEM).Thereby, equation ( 6) is multiplied by an appropriate test function and integrated over the whole domain Ω to acquire the weak formulation.The test function ψ is chosen from the Sobolev space with L 2 the space of square integrable functions and Dψ the first order weak derivatives of ψ.This gives us Applying Green's integral theorem on the second derivatives in space results in An advantage of the FEM regarding Lighthill's analogy is the order reduction of the spatial derivative of the Lighthill tensor due to the integration by parts.The boundary integrals over Γ wall ∪ Γ VF vanishes due to the homogeneous Neumann condition (9).The infinite-dimensional space (10) is replaced by finitedimensional subspace V h ⊂ H 1 and a basis {ϕ 1 , . . ., ϕ N } of V h is selected, with N the number of FE nodes in the computational domain.Thereby, the unknown pressure is approximated by The test functions are chosen from the set of basis functions, resulting in a matrix vector notation of ( 11) In ( 12) M M M denotes the mass matrix and K K K the stiffness matrix.The discrete acoustic pressure is denoted as p p p and its second time derivative by p p p, which is discretized by the Newmark scheme.The inhomogeneous part F F F is the discretized form of the right hand side of ( 11) and includes the Lighthill sources.
As a rule of thumb 20 linear finite elements should be used per wave length ( [49]).Since the maximum frequency encountered in this application is about 5 kHz the fine grid of the CFD computation is more than sufficient for acoustics.
For the PML, two additional regions are introduced at the front and at the end of the fluid domain, as shown in figure 6.Their purpose is to damp out any waves and prevent reflection at in-and outlet.This introduces a nonsymmetric term into the hyperbolic wave equation.Further specifics go beyond the scope of this paper-for details refer to [47] and [48].

Results
The results are taken at the monitoring points shown in figure 7 (at inflow, inside the glottis and at the outflow).In figure 8 the acoustic pressure in frequency domain is given.A dominant peak at 100 Hz and its harmonics is noticeable.The sound pressure level inside the glottis is significantly higher than at the other monitoring points, about 10 dB higher and for the harmonics even 15 dB.With higher frequency the amplitudes of the 100 Hz decrease until they become indistinguishable.In front of the glottis and at the outflow the last acoustic relevant harmonic is at 500 Hz.Furthermore, the frequency range from 100-200 Hz is up to 10 dB higher in amplitude than other frequency ranges and as high as the 5th harmonic in amplitude.

Discussion and Conclusions
An aeroacoustic model of human voice production has been developed.The CFD model, based on unsteady incompressible Navier-Stokes equations in 3D, is one-way coupled with an acoustic solver of the inhomogeneous wave equation with source terms calculated from the Lighthill tensor.The first results of the acoustic signal inside the flow domain prove the presence of the fundamental frequency (equal to the vocal fold vibration frequency) and its harmonics.Downstream turbulent structures in the flow field cause noise in hydrodynamic pressure and fluid velocity which exceed higher harmonic frequencies.For the acoustic signal this causes harmonics over 1 kHz to be indistinguishable since their amplitudes drop below the noise level.
The sound pressure level of the simulated sound is higher than expected, and reaches up to 130 dB.Lighthill himself makes clear in his famous paper [1] that this approach does not necessarily give correct results in the near field.Therefore, a different method based on the acoustic perturbation equations [3] is currently being tested.Additionally, the aim is to introduce a simplified model of the vocal tract and analyse the radiated sound signal.Due to this additional far field region, the findings should also make it possible to directly compare the perturbation approach with Lighthill's acoustic analogy.

Fig. 1 .
Fig. 1.A typical problem in aeroacoustics -sound generated by flow over a bluff body.

Fig. 2 .
Fig. 2. Schematic of the the human larynx (coronal section) use a "laminar" model, actually a Direct Numerical Simulation 01085-p.2

Fig. 3 .
Fig. 3. 2D (coronal) section of the computational domain with vocal folds in zero position.The length T 1 of the supraglottal channel is not in scale.

Fig. 4 .
Fig. 4. Computational mesh used for the solution of the fluid flow (left part of the domain): 2.15M prismatic elements.
pn y dS .

Fig. 5 .
Fig. 5. Flow field in the y-normal section (at glottis midline).Vorticity contours at four time instants during 12 th cycle of vocal fold vibration.

) 01085-p. 5 EPJ
Web of Conferences with the speed of sound c and T the Lighthill tensor T i j = ρ f u i u j Reynolds stress + τ i j Viscous stress + p − c 2 ρ δ i j Heat conduction .

Fig. 6 .
Fig. 6.Acoustic source domain with two PML domains at inflow and outflow.

Fig. 7 .Fig. 8 .
Fig.7.Acoustic pressure distribution in computational region.The three balls mark the observation points for the acoustics.