That would be a bit like a lightfield camera, where you can edit the focusing parameters after the image has already been taken, but now with sound.
The thing about people taking pictures is they interrupt the flow of a live experience. My ex specifically bought a digital camera with a quiet boot up (low noise focus motor) so she could get shots of our friends before they knew she was taking pictures.
Being able to take the shot and work on focal length later would be less disruptive.
After writing a bunch of MATLAB code to find the bats, I handed it off and haven't heard back about whether they actually built the wind turbines or not.
As opposed to?
As a side-effect of the precision needed to spatially locate the mosquitoes, they could detect different wing beat frequencies that allowed target discrimination by sex and species.
This device has never been built, never been purchasable, and it is ALWAYS brought up whenever IV wants to talk about how cool they are.
And I say this as someone who loosely knows and was friends with a few people that worked there. They brought up this same invention when they were talking about their work. They eventually soured on the company, once they saw the actual sausage being made.
IV is a patent troll, shaking down people doing the real work of developing products.
They trot out this invention, and a handful of others, to appear like they are a public benefit. Never mind that most of these inventions don't really exist and have never been manufactured.
They hide the extent of their holdings, they hide the byzantine network of shell companies they use to mask their holdings, and they spend a significant amount of their money lobbying (bribing).
Why do they need to hide all of this?
Look at their front page, prominently featuring the "Autoscope", for fighting malaria. Fighting malaria sounds great, they're the good guys, right? Now do a bit of web searching to try to find out what the Autoscope is and where it's being used. It's vaporware: press-release articles going back 8 years.
Look at their "spinouts" page, and try to find any real substance at all on these companies. It is all gossamer, marketing speak with nothing behind it when you actually go looking for it.
Meanwhile, they hold a portfolio of more than 40,000 patents, and they siphon off billions from the real economy. Part of their "licensing agreement" is that you can't talk badly about them after they shake you down, or else the price goes up.
They are rent-seeking parasites.
It's very basic. The species identification is based on matching contours of the spectrogram against some template contour. The multilateration was, embarrassingly, done by brute force by generating a dense 3D grid. At the time, I didn't have any knowledge of Kalman filters or anything that could have been helpful for actually tracking the bats.
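For the curious, here's roughly what that brute-force approach looks like (a minimal numpy sketch with illustrative names and grid bounds, not the original MATLAB):

    import numpy as np

    C = 343.0  # speed of sound, m/s

    def locate(mics, tdoas, lo=-10.0, hi=10.0, step=0.1):
        # Brute-force multilateration: evaluate every point of a dense 3D
        # grid and keep the one whose predicted TDOAs (relative to mic 0)
        # best match the measured ones. Memory-hungry, but dead simple.
        # mics: (M, 3) positions in meters; tdoas: (M-1,) seconds.
        axis = np.arange(lo, hi, step)
        gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
        grid = np.stack([gx, gy, gz], axis=-1)                      # (N,N,N,3)
        dists = np.linalg.norm(grid[..., None, :] - mics, axis=-1)  # (N,N,N,M)
        pred = (dists[..., 1:] - dists[..., :1]) / C                # predicted TDOAs
        err = ((pred - tdoas) ** 2).sum(axis=-1)
        return grid[np.unravel_index(err.argmin(), err.shape)]

A Kalman filter (or even gradient descent started from the grid winner) would track far more efficiently, but as a first pass this works.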
Not having to deal with wiring that many individual boards, and all the days of headaches tracking down issues, is well worth it in my book.
It doesn't seem they were really able to benefit from it all that much, though: half of them arrived defective, and they had to do quite a lot of debugging to fix them.
Unfortunately the assembly/DFM didn't work out well, but with better design and foresight it should still be much less work than wiring them all manually.
That's ultra expensive gear.
If there were a speaker array around the screens too, you might be able to localize the audio for each person so that it seems like the sound is coming from where their head is on the screen.
Have a look at the "Meeting Owl" for example.
It works great up to a limit (around 5 m); beyond that you will need additional microphones closer to the speaker.
https://www.xmos.com/ a descendant of the Transputer.
I think Cisco had something similar in their large screen meeting room video conferencing systems that could do positional audio tracking of multiple people. Could be wrong, but I think that was at least 10 years or so ago, if not more.
I'm unsure if I'll age out of this problem, or if worse hearing will just recreate it at different thresholds.
I used this to locate an annoying squeal coming from some equipment at work once. And to confirm that it wasn't imaginary.
---
[1] On Android, I like these two:
Spectroid (https://play.google.com/store/apps/details?id=org.intoorbit....). If you use this, consider turning on the waterfall display in the settings.
Spectral Audio Analyzer (https://play.google.com/store/apps/details?id=radonsoft.net....). This has more color options for the waterfall display.
Very neat. I would be surprised if you aren't seeing some diminishing marginal returns from all those extra mics, but I guess you're trying to capture azimuth deltas that Echo devices don't really care about.
Some people reverse engineered the entire thing; it can be found on GitHub. And there's an adapter plate available for getting to the GPIOs.
For a less complex entry there are also Chinese FPGAs ("Sipeed" boards, which use a Gowin FPGA). They are quite capable and the IDE is free.
Make sure there is enough variation in microphone distances for this method to be effective.
What you want isn't microphone or beamforming tech; it's echo cancellation, the same as every piece of videoconferencing software uses.
You just need to feed the show audio and friend audio in, and apply echo cancellation to each.
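Something like this, assuming you have the reference signal available (a toy normalized-LMS adaptive filter, the family of algorithm most acoustic echo cancellers build on; the function name and parameters are illustrative):

    import numpy as np

    def cancel_echo(ref, mic, taps=256, mu=0.5, eps=1e-8):
        # Adaptively estimate how the reference ("show audio") leaks into
        # the mic signal, subtract that estimate, and keep the residual
        # (your friend's voice). Classic NLMS.
        w = np.zeros(taps)
        out = np.zeros(len(mic))
        for n in range(taps, len(mic)):
            x = ref[n - taps:n][::-1]         # most recent reference samples
            e = mic[n] - w @ x                # mic minus predicted echo
            w += mu * e * x / (x @ x + eps)   # normalized LMS weight update
            out[n] = e
        return out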
I'd test that with a CCD line sensor plus a wide-aperture lens, reading it out at 8 kHz. Then you have 128 audio pixels that can cover an entire city.
However, having multiple lasers at multiple different locations might be able to create an improved signal if all the signals are averaged, but it wouldn't really be due to the phase shifting that's used in beamforming.
Reminds me of the electronics adage: "all sensors are temperature sensors, some measure other things as well."
The earth's surface closer to the poles has less distance to travel for any rotation than the surface closer to the equator. As a result, the inertial navigation systems used for long-distance travel must be adjusted. IIRC, this is also the case for artillery firing computations.
https://www.oxts.com/blog/going-round-circles-earth-rotation...
Thought experiment: suppose I zeroed my IMU at the North Pole and traveled in a straight line away from the pole along longitude zero, following the guidance of the IMU. By the time I got to 45° latitude I'd be traveling westward at 1,180 kph (Mach 0.95) to keep the IMU at zero.
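The arithmetic checks out:

    import numpy as np

    OMEGA = 2 * np.pi / 86164.1   # Earth's sidereal rotation rate, rad/s
    R = 6_371_000                 # mean Earth radius, m

    # Ground speed of Earth's rotation at 45 degrees latitude:
    v = OMEGA * R * np.cos(np.radians(45))
    print(v * 3.6)    # ~1183 km/h
    print(v / 343)    # ~Mach 0.96 at sea level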
(But not all speakers are dynamic speakers.)
Just speculation based on the shared operating principle with condenser microphones.
I don't think that electrostatic loudspeakers all require bias power, so it's not quite as simple as using a dynamic loudspeaker backwards.
It is a neat idea, though. A big, flat-panel microphone would be interesting to play with.
Do you think Apple put a hidden microphone in their devices by pure accident?
They'd probably have to do that anyway.
I wanna say that’s a Bob Pease quote but I can’t find an attribution to it.
I've seen that in the electronics lab a few times: the "temporarily light-emitting diode".
Due to some law about entropy, efficient processes are necessarily reversible. That's why electric motors - some of the most efficient machines ever invented - are also generators.
You want an ordinary diode to allow current to flow easily when it senses light? Simple: shine a powerful laser at the plastic-encased diode and it will melt the plastic and liquify the metal, fusing it together and allowing current to flow again. See? You just needed to try harder.
The logical question came up more than once: "can we use photovoltaic cells as a light?" Pretty sure that'll work, too, but we didn't try, because stuff was expensive then and we didn't have any broken parts of cells at the time. They probably learned a few things on that day.
Why all solar panels are secretly LEDs (and all LEDs are secretly solar panels) - https://www.youtube.com/watch?v=6WGKz2sUa0w
Overfilled it and kinda had to do one 1600 m trip.
Fortunately it was manual, so I was able to stall it fairly swiftly in third gear with my foot on the brake.
Didn't seem to have any impact on the engine, as far as normal operation and how it sounded. I didn't do any internal inspection.
In practice most microphones measure the displacement of microscopic membranes, which are deformed by the air pressure. The next question then becomes how to measure microscopic movements of a tiny membrane. Turns out the membrane forms part of a capacitor and the electrical characteristics of capacitors depend on their geometry.
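To put numbers on that last sentence, a toy parallel-plate model (all values here are illustrative, not from any real capsule):

    EPS0 = 8.854e-12   # vacuum permittivity, F/m
    A = 1e-6           # 1 mm^2 membrane area, m^2
    Q = 1e-12          # fixed charge held on the plates, C

    def output_voltage(gap):
        # Parallel-plate capacitor: C = EPS0 * A / gap, and with fixed
        # charge V = Q / C = Q * gap / (EPS0 * A), so the output voltage
        # tracks the gap linearly.
        return Q * gap / (EPS0 * A)

    # A 0.1 um deflection of a 20 um gap shifts the output by ~11 mV:
    print(output_voltage(20e-6) - output_voltage(19.9e-6))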
There are at least 4 different types of microphones: condenser, which does in fact form part of a capacitor; dynamic, which is effectively a linear generator (a coil attached to the membrane); ribbon, where a thin conductive ribbon moving in a magnetic field generates the signal (a close cousin of the dynamic type); and piezoelectric, which is some black magic with crystals.
There are also some exotic principles like laser or radar microphones using interferometry.
For me I see a lot more dynamics than condensers, but I guess if you're talking about what's in, like, every single IoT thingamabob, then you might be right there.
I would find all of this to be in the realm of "I don't believe that any of this works at all" if I didn't have a lifetime of experience with the fruits of successfully functioning microphones.
Putting that aside: in an ideal gas, the speed of sound depends on the composition of the gas and on the temperature and, interestingly, does not depend on pressure; and pressure is the main way that altitude would affect the speed of sound. So measuring the speed of sound in air actually makes for a pretty good thermometer.
https://courses.lumenlearning.com/suny-physics/chapter/13-3-...
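In symbols, c = sqrt(gamma * R * T / M): only the adiabatic index gamma, the molar mass M, and the temperature T appear, with no pressure term. Inverting it gives the thermometer (a sketch, assuming dry air):

    import numpy as np

    GAMMA = 1.4    # adiabatic index of (mostly diatomic) air
    R = 8.314      # gas constant, J/(mol*K)
    M = 0.02897    # molar mass of dry air, kg/mol

    def speed_of_sound(T):
        return np.sqrt(GAMMA * R * T / M)   # T in kelvin

    def temperature(c):
        return M * c**2 / (GAMMA * R)       # inverted: read T off c

    print(speed_of_sound(293.15))           # ~343 m/s at 20 C
    print(temperature(343.2) - 273.15)      # ~20 C back out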
An ideal gas' pressure is a function of the number of particles per unit volume, its temperature, and nothing else. If you do anything involving adding or removing heat or changing the volume or pressure, you probably also need to know the specific heat at constant volume and the specific heat at constant pressure or, frequently, their ratio. That ratio is called the adiabatic index or the heat capacity ratio; it's written as gamma, and it's the last parameter in the speed of sound of an ideal gas. Interestingly, it doesn't vary all that much between different gases.
"The speed has a weak dependence on frequency and pressure in ordinary air, deviating slightly from ideal behavior."
"The speed of sound is raised by humidity. The difference between 0% and 100% humidity is about 1.5 m/s at standard pressure and temperature, but the size of the humidity effect increases dramatically with temperature."
"Slight" can matter significantly in an application like this.
This has little to do with the behavior of sound. The fraction of the air that consists of water vapor at 100% relative humidity is very small at cool temperatures and increases to 100% at 100 degrees C.
(Yes, water boils at the temperature at which air that is saturated with water vapor is all water vapor.)
But I may misunderstand your comment.
Turns out, not only can you measure temperature that way, but you can extrapolate the graph out to find absolute zero (IIRC my result was out by about 20 kelvin, which I think is pretty damn good for a high-school-garage project).
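The extrapolation works because c^2 is linear in temperature, so the zero crossing of the fit lands at absolute zero. A sketch with made-up (textbook-ish) measurements:

    import numpy as np

    T_c = np.array([0.0, 10.0, 20.0, 30.0])     # hypothetical readings, Celsius
    c = np.array([331.3, 337.3, 343.2, 349.0])  # measured speed of sound, m/s

    # c^2 = (GAMMA*R/M) * (T_c + T0) is a straight line in Celsius whose
    # root sits at -T0, i.e. absolute zero.
    slope, intercept = np.polyfit(T_c, c**2, 1)
    print(-intercept / slope)                   # ~ -273 C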
A corollary that's one of my rules to live by: Never measure anything over time without also measuring the ambient temperature.
Firstly it would be great if my phone + headphones could combine the microphones to this end. But what if all phones in the immediate vicinity could cooperate to provide high quality directional audio? (Assuming privacy issues could be addressed).
(Android's Live Transcribe is very good now but doesn't even try to separate which words are from different speakers.)
https://assets.amazon.science/da/c2/71f5f9fa49f585a4616e49d5...
10ms? That's a very long time. Phone clocks are much more accurate than that because they're synced to the atomic clocks in cell towers and GPS satellites.
Hell even NTP can do 1ms over the internet. AFAIK the only modern devices with >10ms inaccurate clocks by default are Windows desktops. I complained about that before because it screwed up my one-way latency measurements: https://github.com/microsoft/WSL/issues/6310
I solved that problem by RTFM and toggling some settings until I got the same accuracy as Linux: https://learn.microsoft.com/en-us/windows-server/networking/...
Anyway, I dunno why the math would be too complicated; GPUs are great at this kind of signal processing.
In 10 ms, sound travels about 3 meters, which is on the order of a room's size, and represents the range of time offsets we're talking about. This has nothing to do with the actual frequencies of the sound itself, or the rate of PCM-type sampling you need to record quality sound; that's a different issue, and doesn't have anything to do with synchronization between devices.
Regarding the math: A circular array is better than a grid (or random placement) because there's only one single math formula that's used to compare any mic to any other mic. With a grid array the number of unique formulas involved goes up as the square of the size of the array. And the mics at the 'center' of a grid are basically worthless, and offer no added value.
https://en.wikipedia.org/wiki/Cocktail_party_effect?wprov=sf...
Of course, you can improve on the rejection of off-axis sound by instead using a microphone with a more specialized polar pattern (e.g. a shotgun mic), but then you lose the property of the pattern being steerable merely by signal processing.
Lastly, such an array of dirt cheap pressure sensitive mic capsules with some clever computation behind them strikes me as the sort of thing you could throw Moore's law at, if you could justify the quantity. Whereas, Soundfield mics don't make much sense unless you're working with very precisely machined pressure-gradient capsules.
Still, I get the feeling it'll be a while yet before this technique starts looking viable for audio production work, but it's very interesting.
Apparently in loud situations like airplanes, audio illusions can make a sound appear to come from a different spot than it really does. And when you have a weight budget for sound-dampening material, it matters whether you hit the 80/20 sweet spot or not.
The one use case that is both creepy and interesting to me is recording a public space and then after the fact 'zooming in' to conversations between individuals.
Heck, train the model on the raw sensor data and you get the most awesome conference mics
I understand that the ICS-52000 is relatively low cost (about $2 apiece at 100-piece quantity), and there are even breakout boards available with 4 microphones, which can be chained to 8 or 16, like https://www.cdiweb.com/datasheets/notwired/ds-nw-aud-ics5200...
Then you can take a Jetson (or any I2S-capable hardware with a DSP or GPU on it) and chain 16 microphones per I2S port. It would seem a lot easier to assemble and program, compared to an FPGA setup.
The Orin has 6xI2S ports internally, so that would work up to 16*6 = 96 microphones, which is a good number. But it looks like maybe only 3 are brought out & on different dev board connectors [1]? As with a lot of design, the devil is in the details. An FPGA could be easier to configure if you need more than 96 microphones.
My notes:
* ICS-52000: $3.50, 20 kHz
* ICS-41350: $1.05, 40 kHz
* SPH0641LU4H-1: $1.45, 80 kHz+
[1] https://docs.nvidia.com/jetson/archives/r34.1/DeveloperGuide...
* I2S requires 3 pins instead of PDM's 2. However, the datasheet that you provided shows how you can daisy-chain microphones, which is really cool (even if not standard I2S). So that argument goes away.
* PDM gives you access to way higher sample rates, which in turn gives you more flexibility in choosing the delay for a delay-and-sum operation (see the sketch after this list). For example, if the PDM clock is 2 MHz, you could theoretically delay with a precision of 0.5 µs. In practice, you'll do that with lower precision, but with I2S, the clock will typically max out at 192 kHz.
* PDM microphones tend to be cheaper.
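A minimal delay-and-sum sketch to show where that clock-rate precision bites (geometry and names are illustrative):

    import numpy as np

    C = 343.0  # speed of sound, m/s

    def delay_and_sum(signals, mic_pos, direction, fs):
        # Delay each channel so a plane wave from `direction` lines up
        # across the array, then average. Delays quantize to whole
        # samples, so steering precision is 1/fs: ~0.5 us at a 2 MHz
        # PDM clock versus ~5 us at 192 kHz I2S.
        # signals: (M, L) samples; mic_pos: (M, 3) meters.
        direction = np.asarray(direction, float)
        direction /= np.linalg.norm(direction)
        delays = mic_pos @ direction / C        # per-mic delay, seconds
        delays -= delays.min()
        out = np.zeros(signals.shape[1])
        for sig, d in zip(signals, delays):
            n = int(round(d * fs))              # whole-sample quantization
            out[n:] += sig[:len(sig) - n]
        return out / len(signals)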
For something indoors, yes, I can see how low sampling frequency gets very limiting. And 192 microphones, that's really pushing it. Love it.
The $2/mic vs $0.5/mic argument is a fun one. You've obviously poured an enormous amount of engineering in there, involving PCB design, FPGA and network programming, writing custom CUDA kernels, signal processing, PyTorch; the list goes on. And you've had a 4090 plugged into your PC in 2023. Classic hobbit in a mithril vest ;)
Think Nyquist sampling rates, applied to space: you can't apply a low-pass filter just because you don't care about higher-order signals. That means that for any given audio environment, there will be some "spatial spectrum" of signal, and you need to sample it densely enough to avoid aliasing.
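Concretely, the spatial Nyquist criterion says element spacing must stay below half the shortest wavelength you care about:

    C = 343.0  # speed of sound, m/s

    for f in (1_000, 4_000, 8_000, 16_000):
        print(f"{f} Hz: max element spacing ~{100 * C / f / 2:.1f} cm")
    # 1 kHz -> ~17.2 cm, 8 kHz -> ~2.1 cm: the higher you go, the denser
    # the array has to be, and you can't pre-filter the room spatially.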
A radial pattern of linear arrays with exponential spacing should also be pretty close to optimal for the distribution of pairwise microphone distances, maximizing the gain from a fixed number of microphones.
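If anyone wants to play with that idea, a quick position generator (all parameters illustrative):

    import numpy as np

    def radial_exp_array(arms=8, per_arm=6, r0=0.01, growth=1.8):
        # `arms` spokes of mics whose radii grow exponentially: lots of
        # short baselines near the center (good for high frequencies),
        # a few long ones at the rim (good for low frequencies).
        radii = r0 * growth ** np.arange(per_arm)
        angles = 2 * np.pi * np.arange(arms) / arms
        return np.array([(r * np.cos(a), r * np.sin(a))
                         for a in angles for r in radii])

    mics = radial_exp_array()   # 48 positions, in meters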
This would be cool to mix with VR, so you could hear different conversations as you move around a virtual room