Ph.D. Public Defense

Interacting with Smart Audio Devices using Induced Structural Vibrations

Joseph (Tre) DiPassio III

Supervised by Mark Bocko and Michael Heilemann

Thursday, June 29, 2023
2 p.m.–3 p.m.

601 Computer Studies Building

Smart audio devices have risen to prominence in the last decade as advances in on-device and cloud computing have enhanced their reliability and scope. While developments in display technology have enabled a trend of compactness among recent electronics, a form factor trade-off persists in these devices where high-quality audio reproduction is a functional priority. This thesis investigates a method for simultaneously recording and reproducing sound using induced vibrations on elastic panels, such that a duplex audio interface can be embedded onto the screen of a device itself by affixing structural vibration sensors and actuators. A brief history of the technologies employed by modern smart audio devices to perform sound recording, direction of arrival estimation, and signal enhancement is given, and the proposed surface-based audio interface will be evaluated by its ability to perform these tasks with comparable accuracy and quality to commercially available devices. The intelligibility of speech recorded by structural vibration sensors affixed to elastic panels is measured using the speech transmission index. A word error rate metric is introduced to assess the reliability with which an automatic speech recognition system can transcribe the recordings made in this manner. Methods for crosstalk cancellation are developed for situations when the proposed interface is simultaneously recording and reproducing sound. Though the modal and resonant properties of the panel will degrade the recordings, the coloration of the recorded signal that is apparent in the panel’s transfer function is shown to be angularly dependent. An approach to estimating the direction of arrival of noise and speech sources is presented using deep neural networks trained with spectral features that reveal this angular dependence.
Techniques that reduce the computational complexity of the feature set are implemented by prioritizing the bandwidths containing the panel’s modes that exhibit the greatest variance in excitation by direction. Finally, methods for enhancing signals recorded by structural vibration sensors affixed to elastic panels that utilize directional information are proposed, and preliminary experimental results are reported.