Resonant V2 Plans!
There’s plenty of unfinished business
March 10, 2023
Personal significance
Resonant is a project that is very near and dear to my heart. It was my first significant hardware project and a pivotal moment in my personal development. During high school I felt very stuck because I had learned so much about coding but could not get a single project off the ground. Resonant was what validated, at least to myself, that I can create something cool.
Now I’m coming back to work on it to explore more about audio localization and signal processing. There was a lot left to learn and I’d like to make a second attempt now that I have the mathematical background to dive deeper.
Picking a language: Rust vs. alternatives
I worked a lot on the programming aspect of the project, and normally I pride myself on my code style, but our codebase barely worked. Looking back, it was a threading nightmare I was not equipped to handle. I wrote it in Python, which I really disliked doing. If I’m adding type annotations to variables in Python, I might as well use a language that will verify the code before running it. Additionally, we used NumPy as a hack to speed up array manipulation, when ideally we would work with actual arrays.
Nowadays I am a big fan of statically typed languages, but their tooling often feels much more archaic. That is why I really like Rust. It is more modern and has great utilities for programming (a package manager, built-in unit testing, and linting), and I really like the emphasis on error handling. I felt a lot more confident that if my code compiled, it would function as expected. I can hardly think of any runtime errors I experienced while using Rust.
Meanwhile, in C/C++ there’s all this friction associated with coding. I want to worry about programming, not about setting up my Makefile. And it feels like 90% of the errors only show up at runtime, which is a pretty big downside.
However, for all the niceties of Rust, there are some significant problems I discovered. I convinced my team for the NASA BIG Idea Challenge to use Rust, and it quickly became apparent what a disadvantage that put us at. When it comes to robotics, Rust has next to no tooling, while there are probably thousands of packages for C++ and ROS. And even though the developer experience for Rust is much nicer, there is a massive learning curve for actually coding in the language. I read in another article that it took the author about 4-6 months before they were coding productively. I am inclined to agree.
The rules of Rust are pretty esoteric compared to most other languages. That lack of transferable skill is something that concerns me as a student who is graduating college in a few years. I really enjoy coding in Rust, but C++ dominates industry and scientific applications.
As of today, Rust is useful in two scenarios.
- You are writing a standalone application with minimal dependencies
- You have an extreme need for thread safety
Rust is not useful if:
- You need to prototype
I would prefer to re-write Resonant in Rust, but I think I could benefit from experience using C++, even if it means I will kick myself later for doing so. I really wish there was a good alternative to C++ that wasn’t so painful to work with. Carbon seems really promising, but if I’m using another esoteric language, I might as well just use Rust.
Audio localization methods
One of the major downsides of Resonant V1 was that it could only localize a single source in 2D space. Anything of greater complexity was outside of our reach at the time.
I have researched different sound localization methods and tangential topics, but one paper by François Grondin and François Michaud summarizes the sound localization problem and its methods nicely (and also proposes a new method that they report is substantially better than existing ones). This is where I will be getting most of this information from.
Understanding the sound environment can be broken up into two problems:
- Sound Source Localization (SSL): estimating what position in space a sound is coming from, accomplished by direction of arrival estimation (DoA)
- Sound Source Tracking (SST): evaluating potential sounds from sound localization and pruning irrelevant ones
Sound Source Localization (SSL)
There is a lot of research on different sound localization algorithms, each with its own drawbacks. My high-level understanding is that there are two classes of algorithms.
One class searches for sources by sampling discrete points in space and plugging them into a delay-and-sum beamformer (just think of it as a function that scores higher if there is a source and lower if not). This is known as the Steered-Response Power Phase Transform (SRP-PHAT) method.
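To make the "sample directions and score them" idea concrete, here is a minimal Python sketch of a steered-response search. It is a simplification under loudly stated assumptions: it uses a plain time-domain delay-and-sum score without the PHAT weighting, works in 2D with a far-field source, rounds delays to whole samples, and tests on a noise-free simulated signal. The function names and the microphone layout are made up for illustration and do not come from the paper.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mics, azimuth_deg, fs):
    """Far-field arrival delay (whole samples) at each mic for one direction."""
    ux = math.cos(math.radians(azimuth_deg))
    uy = math.sin(math.radians(azimuth_deg))
    # A mic further along the source direction hears the wavefront earlier.
    return [round(-(x * ux + y * uy) / SPEED_OF_SOUND * fs) for x, y in mics]


def steered_power(signals, delays):
    """Undo the candidate delays, sum the channels, return the mean power."""
    n = len(signals[0])
    lo = max(0, -min(delays))
    hi = min(n, n - max(delays))
    power = 0.0
    for t in range(lo, hi):
        total = sum(sig[t + d] for sig, d in zip(signals, delays))
        power += total * total
    return power / (hi - lo)


def localize(signals, mics, fs, step_deg=5):
    """Score every candidate azimuth and keep the one with the most power."""
    def score(azimuth_deg):
        return steered_power(signals, steering_delays(mics, azimuth_deg, fs))
    return max(range(0, 360, step_deg), key=score)


def simulate(mics, azimuth_deg, fs, n=2048, seed=0):
    """Toy free-field simulation: each mic hears a delayed copy of white noise."""
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n)]
    delays = steering_delays(mics, azimuth_deg, fs)
    return [[source[t - d] if 0 <= t - d < n else 0.0 for t in range(n)]
            for d in delays]


# Hypothetical four-mic cross, 10 cm from the center, sampled at 48 kHz.
mics = [(0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]
fs = 48000
signals = simulate(mics, azimuth_deg=60, fs=fs)
estimate = localize(signals, mics, fs)
```

When the candidate direction matches the true one, the channels line up and add coherently, so the summed power is largest there. A real SRP-PHAT implementation would work in the frequency domain and whiten the cross-spectra first, which this sketch skips.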
I’m not so confident about the second class, but it appears to use linear algebra to estimate the source locations that minimize an error function. I have seen a paper on a least-squares solution that accomplishes this, but the popular ones are variations in the MUSIC and SAMV families.
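As a rough illustration of "estimate the location that minimizes an error function", the sketch below fits a 2D far-field direction to time differences of arrival (TDOAs) by least squares. This is not the method from the paper mentioned above, and it is not MUSIC or SAMV, which are subspace methods; it is only the simplest instance of the idea that I could write down. It assumes the TDOAs relative to the first microphone are already known, and all names here are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def doa_least_squares(mics, tdoas):
    """Estimate a 2D far-field azimuth (degrees) from TDOAs relative to mics[0].

    Each mic i gives one linear equation (p_i - p_0) . u = -c * tau_i in the
    unknown direction u; the 2x2 normal equations give the least-squares fit.
    """
    x0, y0 = mics[0]
    rows = [(x - x0, y - y0) for x, y in mics[1:]]
    rhs = [-SPEED_OF_SOUND * tau for tau in tdoas]
    # Normal equations (A^T A) u = A^T b, written out for two unknowns.
    sxx = sum(ax * ax for ax, _ in rows)
    sxy = sum(ax * ay for ax, ay in rows)
    syy = sum(ay * ay for _, ay in rows)
    bx = sum(ax * b for (ax, _), b in zip(rows, rhs))
    by = sum(ay * b for (_, ay), b in zip(rows, rhs))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("microphones are collinear; direction is ambiguous")
    ux = (syy * bx - sxy * by) / det
    uy = (sxx * by - sxy * bx) / det
    return math.degrees(math.atan2(uy, ux)) % 360.0


def ideal_tdoas(mics, azimuth_deg):
    """Noise-free TDOAs (seconds) relative to mics[0], for testing only."""
    ux = math.cos(math.radians(azimuth_deg))
    uy = math.sin(math.radians(azimuth_deg))
    x0, y0 = mics[0]
    return [-((x - x0) * ux + (y - y0) * uy) / SPEED_OF_SOUND
            for x, y in mics[1:]]


# Hypothetical 10 cm square array.
square = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
azimuth = doa_least_squares(square, ideal_tdoas(square, 30.0))
```

With more microphones than unknowns the system is overdetermined, which is where the least-squares fit earns its keep: noisy TDOAs no longer agree exactly, and the fit picks the direction with the smallest squared disagreement.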
These are both interesting in their own right and I plan to try implementing at least one algorithm of each type. All of these algorithms are capable of multi-source localization in 3D space.
Sound Source Tracking (SST)
I hadn’t investigated sound source tracking at all until today, so I can’t comment too much on the methods or their effectiveness. These are the ones I came across.
Viterbi search is a dynamic programming algorithm that determines the most likely sequence of states that resulted in the observed outcomes. It is related to the hidden Markov model (HMM), which describes a system that moves between hidden states with certain transition probabilities, while we only get to see observations that depend on those states.
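A minimal sketch of Viterbi search on a toy HMM follows. The two hidden states (a source on the "left" or "right") and every probability in the tables are invented for illustration; nothing here comes from the paper.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = log-probability of the best path ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Walk the back-pointers from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical model: the source rarely switches sides, and the detector
# reports the correct side ("L" or "R") 80% of the time.
states = ("left", "right")
start_p = {"left": 0.5, "right": 0.5}
trans_p = {"left": {"left": 0.9, "right": 0.1},
           "right": {"left": 0.1, "right": 0.9}}
emit_p = {"left": {"L": 0.8, "R": 0.2},
          "right": {"L": 0.2, "R": 0.8}}

path = viterbi(["L", "L", "R", "L", "L"], states, start_p, trans_p, emit_p)
```

In this example the lone "R" detection is treated as a misfire: switching sides twice is less likely under the model than one wrong observation, so the decoded path stays on the left throughout.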
Sequential Monte Carlo (also known as particle filtering) updates a posterior probability distribution. A posterior probability distribution is a model of your knowledge/belief about the state of the system after combining a prior belief with new evidence. In other words, it is how you update your assumptions (your prior) given some new information.
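Here is a minimal sketch of that predict, weight, and resample loop as a bootstrap particle filter tracking a single azimuth. The noise levels, particle count, and the assumption of Gaussian measurement noise are all made up for the example.

```python
import math
import random


def particle_filter(measurements, n_particles=500, meas_std=5.0,
                    drift_std=1.0, seed=0):
    """Bootstrap particle filter for one slowly drifting azimuth (degrees)."""
    rng = random.Random(seed)
    # Prior: no idea where the source is, so spread particles over [0, 180].
    particles = [rng.uniform(0.0, 180.0) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        # Predict: let every hypothesis drift a little.
        particles = [p + rng.gauss(0.0, drift_std) for p in particles]
        # Update: weight each hypothesis by how well it explains z.
        weights = [math.exp(-0.5 * ((z - p) / meas_std) ** 2)
                   for p in particles]
        # Resample: draw the next particle set from the weighted posterior.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        estimates.append(sum(particles) / n_particles)
    return estimates


# Hypothetical scenario: a source at 40 degrees, measured with 5 degrees of noise.
noise = random.Random(1)
true_azimuth = 40.0
measurements = [true_azimuth + noise.gauss(0.0, 5.0) for _ in range(20)]
estimates = particle_filter(measurements)
```

The particle cloud starts out as the uniform prior and, after each measurement, becomes a sample-based approximation of the posterior. The mean of the cloud serves as the running estimate.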
Kalman filtering has a similar purpose: estimating the state of a system as new information arrives. It appears to be able to combine multiple forms of feedback to improve the estimate, as a form of sensor fusion.
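The sensor fusion idea can be seen even in a scalar Kalman filter. The sketch below is a toy with a constant-state model and invented numbers; it is not the 3D filter from the paper.

```python
def kalman_step(mean, var, measurement, meas_var, process_var=0.0):
    """One predict/update cycle of a scalar Kalman filter (constant state)."""
    # Predict: the state is assumed constant, so only the uncertainty grows.
    var = var + process_var
    # Update: the gain says how much to trust the measurement over the prediction.
    gain = var / (var + meas_var)
    mean = mean + gain * (measurement - mean)
    var = (1.0 - gain) * var
    return mean, var


# Hypothetical fusion of two azimuth readings of the same source:
# a precise sensor says 42 degrees (variance 4), a noisy one says 38 (variance 16).
mean, var = 0.0, 1000.0  # vague prior
mean, var = kalman_step(mean, var, measurement=42.0, meas_var=4.0)
mean, var = kalman_step(mean, var, measurement=38.0, meas_var=16.0)
```

The fused estimate lands between the two readings but closer to the more precise one, and its variance ends up smaller than that of either sensor alone.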
Another algorithm is the joint probabilistic data-association filter, but I couldn’t find much information on it.
The Plan
I’m working with Rohan Menon once more on this project. We will be following the paper by François Grondin and François Michaud as a first attempt at experimenting with this problem.
Algorithm
Their solution combines variations of SRP-PHAT and Kalman filtering. Their version applies a refined search procedure for SRP-PHAT, which is generally more performant than the non-SRP methods. Additionally, they use a Kalman filter in Cartesian coordinates instead of spherical ones. They say a Kalman filter in spherical coordinates becomes inaccurate as the azimuth increases.
Afterwards we may experiment with some of the less optimal methods since it is a great learning exercise for math and signal processing.
Software and hardware
I have decided that I’m going to code the algorithms in C++ just so I can leave my Rust comfort zone and gain experience with modern systems programming.
Additionally, compared to V1, we will be doing much more work in simulation. In fact, it’s possible that the entire project will be done in simulation (pyroomacoustics looks like a great package!). Working with hardware is a pain in the neck, especially while prototyping algorithms and collecting accurate test data. We will see what happens as we progress, since having hardware is cool to show off.
Once the localization and tracking algorithms are established (or while we are still working on them), we might also investigate audio classification, since that was a significant aspect of our project.
Goals and conclusion
I would like to apply the math and programming I know in a way that is moderately useful and teaches me things along the way. I’m trying to figure this stuff out as I go. This is my first time learning about all of these things, so I hope you find my journey interesting.
Books I’m currently reading that relate to the project:
- A Tour of C++ by Bjarne Stroustrup (inventor of C++)
- Understanding Digital Signal Processing by Richard G. Lyons