Laugh, but this probably does have some real world applications for Live Audio.
Digital live audio mixing is taking over, but it suffers one flaw compared to analog: latency. Humans adjust pretty easily to performing an action and hearing a delayed response (that's natural in our daily lives; basically think of it as echolocation). It's a bit like standing farther from a guitar amplifier, since sound takes roughly 1 ms to travel a foot. Singers have it the worst: there is zero latency from their voice to their own ear canal, so monitor systems try to stay analog as much as possible.
For digital audio links, every time you join them end-to-end or decode them, you add a bit of latency.
There are a few audio interconnects that run on Ethernet's physical layer (OSI Layer 1):
* AES50 is standardized; basically you can think of it as the 100Base-T of digital live audio. It's synchronously clocked with a predictable latency of roughly 62 µs per link. Pretty nice. Cat5e cables are dirt cheap and musicians are as destructive as feral cats, so it's a pretty good solution. Max length is 100 meters.
* Dante is also popular but actually rides on IP (Layer 3), so latency is variable; typical values are 1 ms to 10 ms. Max length is effectively unlimited, with a lot of asterisks.
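To put those per-link figures in perspective, here's a rough latency-budget sketch. The constants are my own assumptions (62.5 µs per AES50 hop at 48 kHz, Dante's common 1 ms device setting), not values from either spec; the "air equivalent" converts a delay into how far from a speaker you'd stand to hear the same lag.

```python
AES50_PER_HOP_US = 62.5        # assumed per-link AES50 latency at 48 kHz
DANTE_DEFAULT_US = 1000.0      # a common Dante device latency setting
SPEED_OF_SOUND_M_PER_S = 343.0

def aes50_chain_us(hops: int) -> float:
    """Total latency through a daisy chain of AES50 links."""
    return hops * AES50_PER_HOP_US

def air_equivalent_m(latency_us: float) -> float:
    """Distance from a speaker that produces the same delay in air."""
    return latency_us * 1e-6 * SPEED_OF_SOUND_M_PER_S

for hops in (1, 4, 8):
    us = aes50_chain_us(hops)
    print(f"{hops} AES50 hops: {us:.1f} us (~{air_equivalent_m(us):.2f} m of air)")
print(f"Dante default: {DANTE_DEFAULT_US:.0f} us (~{air_equivalent_m(DANTE_DEFAULT_US):.2f} m of air)")
```

Even eight AES50 hops come out to half a millisecond, i.e. less air delay than a wedge monitor a couple of feet away.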
FTA: 11 µs is _unbelievably good_ digital latency, and combined with near-unlimited length it's actually a pretty good value proposition for live audio. There may be a niche demand for a product like this: slap in some SFP adapters and transmit a channel of digital audio over whatever medium you like.
Although when designing audio systems for large venues, the further back a speaker stack is, the more delay you'll likely want to add to it so its sound arrives at the same time as the sound from the speakers closer to the stage; otherwise it can sound awful (like a strange echo): https://www.prosoundweb.com/why-wait-the-where-how-why-of-de...
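The alignment math for a delay stack is just distance over the speed of sound; a sketch (the 60 m distance and the optional Haas offset value are illustrative assumptions, not from the linked article):

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C

def align_delay_ms(distance_m: float, extra_ms: float = 0.0) -> float:
    """Delay to apply to a speaker `distance_m` behind the mains so both
    arrivals line up. `extra_ms` is an optional Haas-effect offset
    (commonly a few ms) so the mains still read as the sound source."""
    return distance_m / SPEED_OF_SOUND * 1000.0 + extra_ms

print(f"{align_delay_ms(60):.1f} ms")      # tower 60 m back: ~174.9 ms
print(f"{align_delay_ms(60, 10):.1f} ms")  # same, plus a 10 ms Haas offset
```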
So yes, for monitoring, or linking two far away places with near zero latency audio, but not for connecting speaker stacks in a venue :)
Things have probably changed since I last talked to my friends at a large state radio/tv broadcaster, but for long haul they used either MADI over fibre, or AES50 into boxes from NetInsight along with SDI for the video feeds. This works so well that you can put the input/output converters in a venue hosting a live music and do the program audio mix in a control room at broadcast HQ 100s of kilometers away.
At 100s of km, you'd be pushing the limits for actual live sound, though. Light covers about 300 km per millisecond, and ordinary fiber is roughly a third slower than that, so figure about 1 ms of round trip per 100 km. If a musician hears themselves through the monitors at much more latency than that, it could start to get distracting.
If the monitors are 3ft away from the musician, they're already looking at 3ms of latency just in the air between the monitor and their ear.
This is why you see headphones used in recording studios, I'm sure.
You see headphones used in recording studios because ambient sound (i.e. from a loudspeaker) has a habit of getting picked up by microphones.
As I understand it, the sound for the audience in the venue and the monitors for the artists were run locally by a separate mixer. The audio backhauled to HQ was for the live broadcast.
Latency is 1ms for a round-trip through 100km of fiber (200km total).
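A quick sanity check on that figure. Light in silica fiber propagates at roughly 2/3 c (the exact velocity factor varies by fiber; 0.67 here is an assumption), so 100 km one way is about 0.5 ms and the 200 km round trip about 1 ms:

```python
C_KM_PER_S = 299_792.458
FIBER_VELOCITY_FACTOR = 0.67   # typical for silica fiber (assumed)

def fiber_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay over `distance_km` of fiber."""
    one_way_s = distance_km / (C_KM_PER_S * FIBER_VELOCITY_FACTOR)
    return 2 * one_way_s * 1000.0

print(f"{fiber_rtt_ms(100):.2f} ms")  # ~1.00 ms
print(f"{fiber_rtt_ms(500):.2f} ms")  # several hundred km starts to bite
```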
I've recently been reading about T1 and E1 cables, which were used to transmit most calls inside and between telecom companies back in the day, and I was astonished that they transmitted data one sample at a time.
Unlike IP, those were synchronous, circuit-switched systems. You'd first use a signaling protocol (mostly SS7) to establish a call, reserving a particular timeslot on a particular link for it, and you'd then have an opportunity to transmit 8 bits of data on that timeslot 8000 times a second. There was no need for packet headers, as the timeslot you were transmitting on was enough to identify which call the byte belonged to.
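A toy model of that scheme, using E1-style framing as described above: 32 one-byte timeslots sent 8000 times a second, where a byte's position in the frame is the only thing identifying its call (slot numbers and sample values below are made up for illustration):

```python
N_SLOTS = 32  # an E1 frame carries 32 slots (slot 0 is framing/sync)

def mux(frame_no: int, calls: dict[int, bytes]) -> bytes:
    """Build one 32-byte frame: each call contributes the single sample
    it produced for this frame; idle slots carry a filler byte."""
    frame = bytearray([0xFF] * N_SLOTS)
    for slot, samples in calls.items():
        frame[slot] = samples[frame_no]
    return bytes(frame)

def demux(frame: bytes, slot: int) -> int:
    """The receiver recovers a call by slot position alone: no headers."""
    return frame[slot]

# Two 'calls' assigned slots 3 and 17, three samples each:
calls = {3: b"\x10\x11\x12", 17: b"\xa0\xa1\xa2"}
frames = [mux(i, calls) for i in range(3)]
print([demux(f, 3) for f in frames])   # [16, 17, 18]
print([demux(f, 17) for f in frames])  # [160, 161, 162]
```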
Because all data from a single call always took the same path, and everything was very tightly synchronized, there was also no variability in latency.
This basically eliminated any need for buffers, which are the main cause of latency in digital systems.
> This basically eliminated any need for buffers, which are the main cause of latency in digital systems.
You still need a buffer at each switching point, because the timeslots on each cable aren't likely to line up. But the buffer for each channel only needs to be 2 samples wide in the worst case where the timeslots overlap and you need to send from the buffer while receiving into the buffer.
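That two-sample buffer is just double buffering per channel: the incoming timeslot writes one half while the outgoing timeslot reads the other, so arbitrary slot misalignment never needs more than two samples of storage. A minimal sketch (my own modeling, one `receive`/`transmit` pair per 125 µs frame):

```python
class SlotBuffer:
    """Worst-case two-sample bounce buffer at a switching point."""

    def __init__(self) -> None:
        self.buf = [0, 0]
        self.wr = 0  # which half the next incoming sample lands in

    def receive(self, sample: int) -> None:
        self.buf[self.wr] = sample
        self.wr ^= 1  # flip halves each frame

    def transmit(self) -> int:
        return self.buf[self.wr]  # the half filled on the previous frame

buf = SlotBuffer()
out = []
for s in [10, 20, 30, 40]:
    buf.receive(s)           # incoming timeslot fills one half...
    out.append(buf.transmit())  # ...outgoing timeslot reads the other
print(out)  # [0, 10, 20, 30]
```

The channel emerges exactly one frame (125 µs at 8 kHz) later; the leading 0 is just the buffer's initial fill.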
Given the timeframe when T1/E1 were developed, a more accurate perspective is not that buffers were eliminated, it's that they were never created.
Didn't GSM (2G) work the same way, with dedicated regular timeslots per call? I don't know about 3G, but 4G finally introduced packetized voice data with VoLTE, and 5G cemented it.
The interesting point wasn't the timeslots, but their size.
Yes, 2G has fixed time slots, but a slot is used for a lot longer than a single (half?) sample.
2G (and all the standards after it) uses 20-millisecond frames.
It needs to send 8 kHz audio at much lower bitrates (~13 kbps instead of 64 kbps), and you can't do that with raw PCM if you want decent quality. That means lossy compression and a codec, and codecs need far more than a single sample to work well.
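The back-of-envelope for why a codec needs whole frames, using GSM full-rate as the example (its 260 bits per 20 ms frame is the standard figure; everything else below is simple arithmetic):

```python
SAMPLE_RATE = 8000   # Hz
FRAME_MS = 20        # GSM speech frame length

pcm_bps = SAMPLE_RATE * 8                        # raw 8-bit PCM
samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000
GSM_FR_BITS_PER_FRAME = 260                      # GSM full-rate payload
codec_bps = GSM_FR_BITS_PER_FRAME * 1000 // FRAME_MS

print(pcm_bps)           # 64000 -> the uncompressed channel
print(samples_per_frame) # 160 samples squeezed into each frame
print(codec_bps)         # 13000 -> the famous 13 kbps
```

One sample at a time can't be compressed like this; the codec models 160 samples at once, which is exactly why 20 ms of latency is baked in.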
CDMA was similar, not sure what their frame size was exactly, but it was somewhere in the vicinity.
> FTA: 11 µs is _unbelievably good_ digital latency, and combined with near-unlimited length it's actually a pretty good value proposition for live audio. There may be a niche demand for a product like this: slap in some SFP adapters and transmit a channel of digital audio over whatever medium you like.
Used to be you could get a PRI (ISDN/T1) phone line for this kind of work, but I think it's pretty doubtful you can keep it low-latency PRI end-to-end with modern telephony. You'd have to be OK with a single channel of 8-bit, 8 kHz µ-law, but that's not that bad; you could probably orchestrate multiple calls for multiple channels. Somewhere along the path it's going to get converted to SIP with 20 ms packets, and there goes your latency.
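That 8-bit µ-law channel gets away with so few bits via logarithmic companding (G.711's µ-law, µ = 255). The sketch below uses the continuous companding curve, not the exact segmented lookup table real codecs implement, but it shows the idea: small signals get fine quantization steps, loud ones coarse steps.

```python
import math

MU = 255.0

def ulaw_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to the companded domain [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def ulaw_expand(y: float) -> float:
    """Inverse of ulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

for x in (0.01, 0.1, 1.0):
    y = ulaw_compress(x)
    print(f"x={x:.2f} -> companded {y:.3f} -> back {ulaw_expand(y):.3f}")
```

A quiet 0.01 signal uses over a fifth of the companded range, which is why 8 bits sound acceptable for speech.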
Dante network latency can go as low as 125us.
Is there a mode I'm unaware of? I've never had Dante latency that low, let alone that predictable. 1ms-2ms is average with occasional spikes in my experience, and the more complex the network setup the worse it gets.
Is that in AES67 mode?
I don’t dabble much in low latency audio but from what I remember Dante tended to be about 1ms?
AES67 mode is unfortunately limited to 1ms or higher.