Audio Services over ATSC 3.0: A Proof of Concept
Liam Power
Introduction and Summary
Much attention is given to the improved picture and audio capabilities of ATSC 3.0’s linear audio-video services, and rightly so, as they represent a significant enhancement over the previous standard. However, ATSC 3.0 also contains other tools that enable broadcasters to bring new value-added services to their viewers, in large part thanks to the broad flexibility with which the standard was designed. In this paper, that flexibility is applied to a specific goal: the delivery of audio services. The initial requirements laid out were to develop a method of audio service carriage over ATSC 3.0 that delivers multiple audio programs at equal or better quality, tuning time, and receivability than analog FM radio, while using as little spectrum as possible. In particular, the goal was for these multiple audio programs to occupy less spectrum than a single analog FM radio channel.
In the process, several limitations of existing practices were discovered, which forced a closer look at the core capabilities of ATSC 3.0. Over the course of this paper, the decision-making process is outlined for each component of the service, utilizing many aspects of the standard.
- Section 1 presents the overall system architecture of the chosen design.
- Section 2 covers the encoding choices made to provide the best efficiency with the most client support.
- Section 3 discusses the various delivery layer options and the benefits and downsides of each.
- Section 4 examines the approach to the physical layer design, balancing the tradeoffs between spectrum usage and receivability, especially in a mobile environment.
- Section 5 outlines the real-world drive testing results as well as a comparison to other services in the market.
- Section 6 looks at some unique receiver features that were required to support this and examines some solutions currently on the market.
- Section 7 outlines some potential future developments with these services. It also discusses some modifications to the ATSC 3.0 specification that would allow for better implementation of audio services going forward, ensuring consumer receivers can efficiently process these.
Section 1: PoC Design Overview
Before getting into the fine details in later sections, this section will explain the chosen design for the proof-of-concept (PoC) stage of this project.
On the transmission side, the system does not use the traditional audio-only services design in the standard, for reasons that will be outlined later in this paper. Instead, services are distributed using Pro-MPEG RTP [1], which carries an MPEG transport stream (TS) encapsulated in a real-time transport protocol (RTP) stream. This allows forward error correction (FEC) to be utilized, while also avoiding the significant extra overhead that would be present in a pure MPEG TS. The encoder pulls the source material from an HTTP live streaming (HLS) feed of the audio source material being used (for the PoC, Stingray Radio services hosted on Stirr are the audio source), but nothing restricts it to this as a source. For example, tests have been conducted using an Icecast server hosted by FM radio station WTOP (Washington, DC, 103.5 MHz) as the origin instead. Once the material is encoded, the encoder sends the RTP multicast out to the Scheduler (see Figure 1). As these services have no specific major/minor channel number, 239.255.major.201-215 were used as the destination IPs, per 6.1 of [2].
Figure 1: Abstracted project architecture.
To identify these multicast streams to the ATSC 3.0 receiver dynamically, rather than hardcoding them, a schema was created for the UserDefined table per 6.8 of [2], an example of which can be seen in Appendix A. This gives the receiver a way to retrieve the service from the appropriate multicast stream, as well as to distinguish between multiple transmissions of the same service, for potential multiple frequency network (MFN) usage when crossing between markets. It functions in a manner very similar to a cut-down version of the service list table (SLT) in a traditional broadcast, as seen in Table 6.2 of [2].
All this is then bundled together in the transmission over the air. The physical characteristics, discussed later, are optimized for mobile reception to ensure continuous uninterrupted reception even at highway speeds of 70+ miles per hour.
On the receive side, an HDHomeRun Flex 4K using custom firmware courtesy of SiliconDust is utilized. The firmware allows it to process and list services in the channel lineup based on the info in the UserDefined table. Figure 1 provides a high-level view of this process: once the streams have been determined using the table, the HDHomeRun extracts the MPEG TS stream from the RTP stream and supplies it to the client device via HTTP. Simultaneously, it pulls down data sent via non-real time (NRT) transport, containing supplementary information such as thumbnails and song metadata that the client device can utilize. The NRT data also supplies the receiver with the information needed to host a web server, allowing the app interface to be used in the browser in addition to the native application on the client device.
When the user is connected to the same network as the HDHomeRun, for example via in-car Wi-Fi, they can then navigate to the hosted web page, or open the native app installed on their device. This then extracts the configuration from the local server, avoiding any need for an internet connection, and uses it to populate the channel listing. The user can then view audio services by selecting them, at which point the MPEG TS supplied by the HDHomeRun begins playing. Linear audio/video services are supplied in the normal manner for a consumer HDHomeRun, though they require adequate receive conditions to play, as the higher bitrate needs make it difficult to place them in a similarly receivable PLP to the audio services.
All this comes together to provide a seamless result to the user. From their perspective, they can drive all over the TV market and obtain mobile reception, selecting any audio service at will with a minimal “tuning time” between each service, as all virtual services are on the same physical channel.
Section 2: Encoding Choices
One of the earliest considerations was how best to handle the encoding of these audio services. Several audio codecs were considered, namely AC-4, xHE-AAC, HE-AACv2, HE-AACv1, and AAC-LC.
The first assessment was made towards client support. While AC-4 has many benefits, as outlined in other publications around ATSC 3.0, it is the most lacking in client support. This is not as much of an issue when it comes to watching traditional television services on a TV, as ATSC 3.0-capable sets all support AC-4, but when utilizing a gateway device, such as the HDHomeRun, it presents a greater problem. Many TVs, mobile devices, and computers do not have built-in AC-4 support, which means that to decode AC-4, a native app must be used with appropriate licensing and software for decoding. SiliconDust resolves this issue rather elegantly by sending the audio to their servers for transcoding to a compatible codec, then back to the receiver device for playback to the user, all live and on the fly [3]. This allows for a truly device-agnostic playback solution. Unfortunately, because an internet connection cannot be guaranteed in the automotive receive environment, all decoding would have to be done locally.
At the time of the project planning, xHE-AAC was missing client support on Windows, but was supported on Android, iOS, and macOS. However, since that time, Fraunhofer IIS has announced support for xHE-AAC on Windows 11 [4]. HE-AACv2, HE-AACv1, and AAC-LC all have broad support, and indeed are included in a great many standards.
The other primary consideration was bitrate efficiency. After narrowing down the possible codecs, several in-house listening tests were conducted, in addition to examining results from published MUSHRA tests of the various options.[1] Over the course of this testing, it was found that HE-AACv2 achieved the quality standard the project had set for stereo music at approximately 32 kbps, while HE-AACv1 required roughly 48 kbps, and AAC-LC required roughly 64 kbps. AC-4 was found to perform slightly better than HE-AACv1 at 48 kbps for this type of low-bitrate application, but due to encoder restrictions, could not be tested at bitrates below 48 kbps. xHE-AAC was able to achieve the requirements at 24 kbps. While xHE-AAC performed the best, HE-AACv2 was chosen for its device support advantage at the time, in exchange for a slightly higher bitrate.
Lastly, one note on a secondary benefit of utilizing xHE-AAC from a feature set perspective. xHE-AAC is designed with both spoken word and music in mind, which is a major divergence from historical trends, where codecs tended to optimize for one or the other when operating in a highly efficient environment. This allows a single unified codec to satisfy both use cases. While the demonstrated PoC in Las Vegas, Nevada (at the 2023 CES) was conducted using music services for licensing reasons, internal testing with news-talk radio station WTOP’s feed backed up Fraunhofer’s claims, with even 8 kbps mono speech audio sounding adequate. Upping the bitrate to 14 kbps resulted in what was subjectively described by the evaluation audience as “good” audio in an ad-hoc, non-MUSHRA conformant listening session.
Section 3: Delivery Layer
Once the codec had been settled on, the conversation turned to the transport method for the audio. Initially, the assumption had been that the services would be delivered using the audio-only service category in ATSC 3.0 described in 3.4 of [2] with the standard real-time object delivery over unidirectional transport (ROUTE) or MPEG media transport (MMT) method. However, this was unexpectedly found to present significant overhead issues, so other options were explored.
Initial testing with a 48 kbps HE-AACv2 stream resulted in significant overhead across all packager options, with scheduler input bitrates running 100-200 kbps above the theoretical media bitrate, thanks to the additional signaling, transport, and protocol needs. On a 3-5 Mbps video service, as is commonly deployed in markets today, this represents minimal overhead, at about 3%, but on a theoretical 48 kbps service, it amounts to 200% to 400% overhead. Such a rate would more than wipe out any gains from codec efficiency, and so other options had to be explored.
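To put these overhead figures in proportion, a quick calculation helps (the bitrates are those observed in testing; the function itself is only illustrative):

```python
def overhead_pct(media_kbps: float, extra_kbps: float) -> float:
    """Signaling/transport/protocol overhead as a percentage of the media bitrate."""
    return 100.0 * extra_kbps / media_kbps

# 100-200 kbps of overhead on a 48 kbps audio service dwarfs the payload:
print(round(overhead_pct(48, 100)))       # ~208%
print(round(overhead_pct(48, 200)))       # ~417%

# The same absolute overhead on a 3 Mbps video service is marginal:
print(round(overhead_pct(3000, 100), 1))  # ~3.3%
```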
One interesting item is the wide variance in bitrates across the three packagers tested: Digicap’s Digicaster-M, Enensys’s MediaCast, and Triveni’s Guidebuilder. One packager operated at nearly 100 kbps less overhead than the other two, with encoder and scheduler held equal. In further testing, a significant portion, though not all, of this was traced to variation in the default service layer signaling (SLS) repetition rates, all of which fall within the boundaries set by 4.10 of [5]. Simply reducing the SLS repetition rate would help reduce bitrate, but at the cost of tuning time, which was one of the key requirements. As a result, a different delivery solution was required.
Several alternatives were considered, keeping in mind the unidirectional nature of broadcast:
- QUIC Multicast [6]: BBC R&D had published some work [12] on integrating dynamic adaptive streaming over HTTP (DASH) with QUIC using a multicast extension to distribute content more efficiently via over-the-top (OTT). Unfortunately, this was constrained by two factors: the unlikeliness of a full implementation in the single quarter timeline allocated to this project, and initial findings of higher overhead than DASH via ROUTE.
- UFTP [7]: This is used in some cases to distribute files via one-way internet protocol (IP) satellite connections. Importantly for this project, it also does so with a relatively low overhead, at around 10%. However, as segment size decreases (to minimize the impact of loss and reduce tune time), overhead increases due to the increased number of files per second, so this was also ruled out early on.
- MPEG TS: Due to the simplicity of implementation, particularly on existing receivers, MPEG TS was next explored. While this worked very reliably, it ran into overhead issues similar to the ones with normal ATSC 3.0 methods, ending up around 150 kbps. This is in large part due to the inability to evenly divide a standard IP packet into 188-byte pieces, as well as the need for stuffing bytes.
- Muxed MPEG TS: Given the previous issues, the obvious next option was to multiplex the TS. This resulted in a massive drop in overhead, bringing it to roughly 60 kbps when muxed with 20 streams. While this was an excellent gain in efficiency, it brought its own challenges, as not all clients process muxed TS well. The other disadvantage was that the fewer the streams, the worse the overhead.
- RTP: Given its versatility, real time transport protocol (RTP) was another obvious candidate. After all, it underpins even ATSC 3.0. For this, the choice was made to drop the target bitrate to 32 kbps. As this was using full packets, the overhead on each was theoretically 3%, and that was backed up by real-world testing. While full packets meant more tuning delay, it was not significant by any means, as the maximum wait time would be about a third of a second at the bitrate in use. The downside, however, was inconsistent compatibility of clients unless a session description protocol (SDP) file was provided.
- Pro-MPEG RTP [1]: This fixed the SDP problem, as it’s effectively TS bundled within RTP, so clients handle it just fine. It also did so without an excessive increase in bitrate, at a total of 45 kbps, outperforming even the muxed TS, thanks to the RTP encapsulation avoiding some of the limitations of raw MPEG TS, as a packet can be assembled from fragments efficiently spread in RTP. As a bonus, it allowed for the integration of FEC, which added another 9 kbps.
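The packet-level arithmetic behind the 3% RTP figure and the roughly one-third-second maximum wait can be sketched as follows. Header sizes are the standard IPv4/UDP/RTP values; the 7-packets-per-datagram grouping shown is the common Pro-MPEG TS bundling, used here only to pick a representative payload size:

```python
IP_HDR, UDP_HDR, RTP_HDR, TS_PKT = 20, 8, 12, 188  # sizes in bytes

def rtp_overhead_pct(payload_bytes: int) -> float:
    # Per-datagram header overhead for RTP over UDP/IPv4.
    return 100.0 * (IP_HDR + UDP_HDR + RTP_HDR) / payload_bytes

def max_tune_wait_s(payload_bytes: int, media_kbps: float) -> float:
    # Worst case: a new listener waits for one full datagram of audio
    # to accumulate at the media rate before playback can start.
    return payload_bytes * 8 / (media_kbps * 1000)

payload = 7 * TS_PKT                           # 1316 bytes, Pro-MPEG style
print(round(rtp_overhead_pct(payload), 1))     # ~3.0% header overhead
print(round(max_tune_wait_s(payload, 32), 2))  # ~0.33 s at 32 kbps
```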
As a result, Pro-MPEG RTP ended up being the protocol of choice. This would also have a benefit later when it came to the receiver-to-client delivery method, as the stream delivered audio already in MPEG TS, for an easy transfer to a unicast HTTP “last mile.”
This is one of the major benefits of ATSC 3.0: the ability to inject regular user datagram protocol (UDP) multicast streams at the scheduler stage. This has already been used by a variety of groups to deliver educational data, public safety data, digital signage, and enhanced GPS, to name a few. It ensures that anything at the delivery layer not thought of in the specification can easily be added for custom solutions like the one described in this paper. Essentially, ATSC 3.0 becomes a unidirectional IP pipe able to be utilized just like it was part of a hardwired network, as the entire RF component can be abstracted away between the scheduler and receiver. The possibilities of this for data delivery are nigh on endless, and well worth exploring.
The downside, however, compared to the audio-only ATSC 3.0 linear service, is the loss of standardized signaling handled by every compatible receiver, as when a multicast is sent to the scheduler directly, the packager is not aware of it, and thus does not list it in the SLT or generate an SLS. Fortunately, ATSC 3.0 provides a table option that handles this precise situation: the UserDefined table, found in 6.8 of [2] and reproduced in Figure 2.
Figure 2: The full text of the UserDefined section of A/331.
Table 0xFF allows additional signaling to be added to the low-level signaling (LLS) as needed. This provides a way to easily distribute and dynamically update information to a receiver without needing to hardcode non-standard multicast addresses into the receiver. In the case of this project, it was used to distribute information similar to what would be listed in the SLT. An example can be seen in Appendix A, and the reasoning for each element and attribute is explained below:
- OMLRadio: This tells the receiver that anything within this element will be related to the audio services.
- name: This provides a friendly name to help with debugging.
- rsrv: This denotes an individual service element.
- name: Much like the short service name in the SLT, this provides the receiver with a name to display to the user when listing the service in the lineup.
- destIP: This provides the multicast destination IP, corresponding to one described in the link mapping table (LMT).
- destPort: This provides the multicast destination port, corresponding to one described in the LMT.
- mktid: This provides a globally unique ID for the service on the specific transmitter. This is currently composed by combining the broadcast stream ID (BSID) with the last octet of the IP. This is used to distinguish between transmissions when receiving services from two or more RF channels.
- srvid: Unlike mktid, which is globally unique to the transmission, this presents an ID that is shared across all transmissions that utilize the same media source. In combination with mktid, this allows the receiver to potentially transfer from one transmission to the next as the user travels through different markets. As reception quality begins to degrade from one transmitter, the receiver would be able to locate another transmission with a different mktid but the same srvid and attempt to switch to it.
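As a concrete illustration of the schema described above, the following sketch generates a fragment using these elements and attributes. The element and attribute names follow the list above, but the specific values (BSID, service name) and the exact mktid composition (here, BSID and last IP octet joined with a hyphen) are hypothetical; the actual schema is the one in Appendix A:

```python
import xml.etree.ElementTree as ET

def make_mktid(bsid: int, dest_ip: str) -> str:
    # Project convention: combine the BSID with the last octet of the
    # multicast IP. The hyphenated encoding here is an assumed format.
    last_octet = dest_ip.rsplit(".", 1)[1]
    return f"{bsid}-{last_octet}"

def build_radio_table(table_name: str, bsid: int, services: list) -> ET.Element:
    # OMLRadio wraps all audio-service entries; each rsrv is one service.
    root = ET.Element("OMLRadio", name=table_name)
    for svc in services:
        ET.SubElement(root, "rsrv",
                      name=svc["name"],
                      destIP=svc["destIP"],
                      destPort=str(svc["destPort"]),
                      mktid=make_mktid(bsid, svc["destIP"]),
                      srvid=svc["srvid"])
    return root

# Hypothetical BSID and service entry, for illustration only.
table = build_radio_table("RadioPoC", 1025, [
    {"name": "Service 1", "destIP": "239.255.1.201", "destPort": 5000,
     "srvid": "radio-1"},
])
print(ET.tostring(table, encoding="unicode"))
```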
Section 4: Physical Layer
Much like the scheduler injection and UserDefined table, the physical layer options are an example of just how flexible the ATSC 3.0 standard can be. There are enough options to allow fine tuning the transmission to almost any use case, and broadcasters already take full advantage of this for all manner of receive environment and payload combinations, resulting in a healthy variety of physical layer pipe (PLP) and subframe combinations across the country. Beyond ensuring the intended receiver will be able to successfully process the transmission, it also allows for a far more efficient transmission, as multiple use cases can be covered by a single RF channel, rather than needing to spread across multiple channels just to handle a different reception situation.
There are two primary groupings that allow tweaking of the physical configuration for the targeted receivability-versus-bitrate tradeoff for individual services: subframe and PLP. At the subframe level, deployments so far commonly adjust the guard interval, scattered pilot pattern, pilot boost, and fast Fourier transform (FFT) size. At the PLP level, modulation and code rate are the two primary levers. While others can be tweaked if needed, internal deployments focus on the above settings.
For this project, the spectrum allocation was 16.66% of the total capacity of the 6 MHz ATSC 3.0 signal, and the bitrate required was 15 services at 45 kbps each, for a total of 675 kbps. As will be seen later, the extra 10 kbps for the FEC was deemed unnecessary after field testing. The full 16.66% was also not needed, due to the efficiency at the encoding stage and delivery layer. This is what makes it so critical to plan for efficiency across the entire chain, rather than simply adjusting the physical layer to fit. Had the original service plan been maintained, it would have required approximately 3 Mbps, which would force severe compromises to receivability. All comparisons below are done relative to a baseline of 1.2 Mbps, the bitrate allocated to the PLP in the final scheme.[2]
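The capacity budget above works out as follows (service count and per-service rate from the text):

```python
services = 15
per_service_kbps = 45          # Pro-MPEG RTP rate per service (Section 3)

total_kbps = services * per_service_kbps
print(total_kbps)              # 675 kbps of audio payload

plp_kbps = 1200                # PLP allocation in the final scheme
print(plp_kbps - total_kbps)   # 525 kbps of headroom left in the PLP
```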
One note: due to the channel plan in Baltimore, the Cred_coeff of the preamble and subframes is set to 3 to avoid interference due to WBFF’s presence on physical channel 26, as this is hosted on WNUV, physical channel 25. This reduces the number of carriers by roughly 4.2%, resulting in a total occupied bandwidth of 5.589844 MHz, rather than the normal 5.832844 MHz. This therefore reduces the bitrates described in the preceding section when compared to a standard deployment.
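The roughly 4.2% carrier reduction can be checked directly from the two occupied-bandwidth figures quoted above:

```python
normal_bw_mhz = 5.832844   # standard occupied bandwidth
reduced_bw_mhz = 5.589844  # occupied bandwidth with Cred_coeff = 3

reduction_pct = 100 * (normal_bw_mhz - reduced_bw_mhz) / normal_bw_mhz
print(round(reduction_pct, 2))  # 4.17, i.e. roughly the stated 4.2%
```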
Subframe Configuration
The first setting considered is FFT size. FFT size dictates the available guard interval and scattered pilot pattern options, and so is the natural starting point. It is also the easiest, with three possible choices. 8k is used internally for mobile-targeted services, while 16k is used as an all-around option for generic service reception not targeted at mobile, but not exclusive to 256 QAM. 32k is currently utilized only for exclusively fixed targeted services. Holding all else equal, jumping to 16k would provide only an additional 145 kbps, or 12%, in available bitrate,[3] making 8k an acceptable tradeoff given the low bitrate needs, as automotive reception is a priority. Given that cars travel at relatively high speeds, it is key to keep symbols short so that the loss of any given symbol has a reduced impact. Halving the FFT size yields approximately twice the symbols in the same frame time (though also more capacity lost to the guard interval, which is a fixed number of samples per symbol), so a momentary interruption affects less of the payload.
Once FFT size has been determined, guard interval (GI) is the next item to review. At 8k, the GI options range from 192 to 2048 samples, corresponding to 27.78-296.30 μs. This one is interesting, as the initial theory internally had been to operate with a GI of 1536 samples as standard practice to avoid multipath issues. However, after a side conversation at an event put together by broadcast engineering consulting firm Meintel Sgrignoli and Wallace (MSW), where it was suggested this may be overcompensating, some field experimentation was conducted. The results lined up with this assertion, as the longest echo measured was still within the 512 sample range. These tests covered the following environments: highway driving, dense urban with tall buildings, indirect reception with no tower line-of-sight, and contour edge reception. One situation that would be expected to require a greater guard interval is a single-frequency network (SFN) deployment, where longer echoes are more likely. This insight avoided a dramatic decrease in available bitrate of 318 kbps, or 26.5%.[4]
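The sample counts above convert to durations via the baseband sample rate, which for a 6 MHz ATSC 3.0 channel is 6.912 Msamples/s (per A/322); the echo-distance figure is a derived illustration of what the 512-sample choice protects against:

```python
SAMPLE_RATE_MSPS = 6.912  # baseband sample rate for a 6 MHz channel (A/322)

def gi_duration_us(gi_samples: int) -> float:
    # Guard interval duration in microseconds.
    return gi_samples / SAMPLE_RATE_MSPS

def max_echo_km(gi_samples: int) -> float:
    # Longest path-length difference an echo can have while still
    # falling within the guard interval (light: ~0.2998 km/us).
    return gi_duration_us(gi_samples) * 0.2998

print(round(gi_duration_us(192), 2))   # 27.78 us
print(round(gi_duration_us(2048), 2))  # 296.3 us
print(round(gi_duration_us(512), 2))   # 74.07 us
print(round(max_echo_km(512), 1))      # ~22.2 km of echo protection at 512
```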
The last subframe configuration made was scattered pilots and pilot boost. There is an excellent paper, [8], which provides a great reference for optimizing the pilot choices. It has both theoretical information and simulated results to guide these decisions. At 8k/512, the scattered pilot density options are 12_4, 12_2, 6_4, and 6_2. As can be seen in Figure 3, there is a notable gain from the first level of pilot boosting at 8k FFT, sufficiently offsetting the reduction in strength on the rest of the transmission. As can also be seen, there is not a dramatic benefit for 12_2 vs 6_2 density. Given that the higher density of 6_2 would cost 11%,[5] or 137 kbps, it was not deemed sufficiently beneficial, resulting in an optimal configuration of 3.20 dB pilot boost on a 12_2 scattered pilot pattern.
PLP Configuration
While it is among the most impactful for reception, the PLP configuration is also one of the easier decisions to make, as the tradeoffs are very clear and broadly applicable. For signal constellation, there are four options commonly deployed in the field: QPSK, 16QAM, 64QAM, and 256QAM. Code rate varies from 2/15 to 13/15, with mandatory combinations varying depending on constellation and code length. At this point, it is generally a matter of determining the bitrate required for the service, then selecting the most receivable modulation and coding combination (modcod) that is able to supply that bitrate.
For the deployment to the WNUV transmitter, it was decided to leave significant room for future experimentation without necessitating a reconfiguration. This is what brought about the 1.2 Mbps bitrate number for the PLP containing the radio services, pre-hybrid time interleaver (HTI) configuration. For this, the mandatory option that provided the chosen bitrate with the best expected carrier-to-noise (C/N) ratio was QPSK 11/15, at 6.3 dB, per Annex B of [9]. In the truly optimized lab configuration, the chosen modcod would instead be QPSK 7/15, at 2.9 dB.
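The selection logic described above, picking the most rugged modcod that still meets the capacity requirement, can be sketched as follows. Only the two C/N figures quoted in this section are included; a real planner would load the full table from Annex B of [9], and the bits-per-cell values here are idealized, ignoring pilot, guard interval, and framing overhead:

```python
# Abbreviated modcod table: only the two entries discussed in the text.
# bits_per_cell = log2(M) * code_rate, an idealized capacity measure.
MODCODS = [
    {"name": "QPSK 7/15",  "cn_db": 2.9, "bits_per_cell": 2 * 7 / 15},
    {"name": "QPSK 11/15", "cn_db": 6.3, "bits_per_cell": 2 * 11 / 15},
]

def pick_modcod(required_bits_per_cell: float) -> dict:
    # Choose the lowest-C/N (most receivable) option meeting the requirement.
    viable = [m for m in MODCODS if m["bits_per_cell"] >= required_bits_per_cell]
    if not viable:
        raise ValueError("no modcod meets the capacity requirement")
    return min(viable, key=lambda m: m["cn_db"])

print(pick_modcod(1.0)["name"])  # QPSK 11/15 (7/15 gives only ~0.93 bits/cell)
print(pick_modcod(0.9)["name"])  # QPSK 7/15, the more rugged choice
```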
Section 5: Real-world Performance
After all the decisions discussed in Section 4 have been made, the next step is to project the anticipated reception area to set expectations for field testing. Those results can be seen in Figure 4 for the Baltimore market (used in the field test described below), with transmissions originating from station WNUV (539 MHz, RF channel 25). For seamless playback, the locations in blue are best suited.
Figure 4: The projected reception probability of the Baltimore designated market area (DMA) for the chosen configuration.
Drive test results are shown in Figure 5, including comparative results for analog FM radio stations WYPR and WBJC (discussed below). For WNUV, consistent reception without dropouts was achieved at highway speeds as far north as exit 10, going directly north on I-83, roughly 35 miles from the tower, as far south as Washington, DC going south on I-95, and as far east as the Delaware border going north on I-95. The latter two were used as stopping points because they represent the boundaries of the Baltimore DMA in those directions, rather than due to a loss of signal (all indicated by black-and-white stars on Figure 5). The reduced northern reception compared to the other two directions is primarily due to the steep peak along the Maryland-Pennsylvania border, which results in little to no line of sight after a few miles north.
Figure 5: A map of the highway drive test boundaries. The red star shows the WNUV tower location, while the black and white stars show the extent of WNUV reception for each test route. The blue hexagons show the dropout of WYPR, while the green hexagon shows the dropout of WBJC when it differed from WNUV.
Drive testing in the immediate Hunt Valley, Maryland area was also conducted to assess performance on non-highway speed streets, and near high buildings, wooded areas, and valleys. One example of a route is shown in Figure 6, though additional tests were conducted, such as in the rolling hills to the west, and downtown Baltimore to the south. At no point during these non-highway tests did the audio services experience a failure to decode.
Figure 6: An example of a non-highway drive test route. On this route, there were no dropouts in reception of audio services over WNUV.
For the FM radio comparison component of the evaluation, two analog radio stations were utilized: WYPR, 88.1 MHz, a 15.5 kW station, and WBJC, 91.5 MHz, a 50 kW station. Their contour maps can be seen in Figure 7. These were received via the built-in car radio and antenna in a 2010 Toyota Prius, while the ATSC 3.0 audio services were received using a 3.5-inch omnidirectional UHF stick antenna with a magnetic base affixed to the roof of the car, shown in Figure 8. WYPR began to experience breakup just south of the Pennsylvania border, with total loss about a mile over the border (see Figure 5). WBJC performed similarly to the audio services over WNUV, maintaining audible reception until mile 10, though it began to experience breakup a few miles before. East and west, WYPR fell out around Riverside and Laurel, respectively, while WBJC dropped out just over the Susquehanna but made it all the way to Washington. It must be noted that this is not a truly fair comparison, given the differing signal contours due to the densely packed mid-Atlantic FM radio spectrum.
Figure 7: The FCC contour maps for WYPR (left) and WBJC (right).
Figure 8: The antenna used in drive tests.
Section 6: Receiver Needs
One of the most interesting parts of this project was the receiver component. Even before it was decided to go down the UserDefined/RTP route, it was obvious the requirements of the project would necessitate a specific, and likely unorthodox, receiver. The key needs were as follows:
- Gateway: The receiver needs to be able to distribute the incoming services, both the specialized audio services and the regular AV services, to any client devices in the vehicle, such as phones or tablets.
- Low power: The receiver would potentially run off a battery, as it was required to be portable outside the vehicle for demonstration purposes, and had to run for a full 8 hours on an FAA-approved battery. The FAA restricts lithium ion batteries to 100 Wh, and as this system would have to be transported by plane for trade shows, it could not exceed this limit.
- Mobile reception: Not all receivers are equally capable when it comes to handling signals in a mobile environment, which is a key component of an automotive service.
- Low tuning time: The delay from the user pushing the button to hearing audio should be comparable to that experienced with analog FM radio.
- NRT handling: Metadata and the backend of the application are intended to be transported via NRT in the design, to avoid the need for any external transmission or installation.
- User interface: The native and web applications should both have a usable interface that allows easy access to all services.
- Compact: The full system should fit in a relatively small portable enclosure.
- UserDefined/Multicast injection: The receiver must be able to process the entries in the UserDefined table and use those to extract the data from the injected multicasts.
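The low-power requirement above translates directly into a power budget:

```python
faa_limit_wh = 100   # FAA limit for carry-on lithium ion batteries
runtime_h = 8        # required demonstration runtime

max_avg_power_w = faa_limit_wh / runtime_h
print(max_avg_power_w)  # 12.5 W average draw budget for the entire system
```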
Market Options
Due to these unorthodox requirements, the options for receivers were few. Most commercially available receivers are designed for a regular user, and so do not allow for easy external access to low-level items like NRT data or multicast streams. However, thanks to the efforts of SiliconDust, OpenZNet, and Digicap, a solution nonetheless arose. Digicap’s Autocaster product is incredibly flexible, allowing easy access to just about any component of the ATSC 3.0 signal, in particular the NRT files and any multicast listed in the LMT, found in 7.1.1 of [10], which can then be redirected as unicast or multicast onto the local network. This was critical to the initial efforts to get a basic prototype up and running, and allowed for rapid testing because it acts as a seamless gateway connecting the multicast network at the transmission end to the network at the receive side. This meant that any unidirectional delivery method that worked in the lab network would work just as well over the air to the Autocaster and the receiving network.
OpenZNet was responsible for the design and creation of the web server (used within the vehicle to connect the receiver to user devices) and the native application that tied everything together to create the user experience. This went through several design iterations. The final product assembles the app configuration via an easy-to-use website on the transmission side, then bundles it into a compressed package that is sent out to the receive side. This is used to populate the web interface served by the receiver to user devices, or the options in the native application if accessing that way. At that point, the player accesses the streams from the receiver. This last part sounds simple but was quite complex to make work in a consistent and responsive way, and was only successful due to some heroic efforts on their part.
Lastly, SiliconDust’s HDHomeRun formed the core of the final receiver design. Despite the short timeline, the SiliconDust team was able to make several key changes to the stock firmware to enable this project. These allowed the HDHomeRun to read the UserDefined table, retrieve the correct streams, continuously buffer those streams to reduce tune delay, and provide ‘virtual tuners’ so that multiple users in the vehicle can simultaneously consume the services over a single tuner, all without needing to rework the basic functionality of the device. The other major positive to the HDHomeRun was its excellent performance in a mobile environment. A wide variety of set-top box style receivers were tested in a moving vehicle, and the HDHomeRun was consistently able to retrieve a linear AV service placed in the same PLP as the audio services at higher speeds than any other set-top box assessed. Both the device and the development team are remarkably flexible and capable.
Receiver Challenges
The single greatest challenge, and one that as yet has no satisfactory solution, was AC-4 compatibility. Very few ATSC 3.0 gateway devices support AC-4, as mentioned earlier. While the audio services did not require AC-4, the regular linear AV services do, and part of the requirements given to us was the ability to play AV services. Because a web interface was required within the vehicle so that users without the app installed could connect to the receiver, an AC-4 decoder could not simply be supplied with a native app. And because there is no internet connection, the clever solution utilized by the HDHomeRun of transcoding on cloud servers would not work either.
As a result, several different methods were tried to provide AC-4 support to the receiver. The only successful one involved first a Raspberry Pi, then later a NUC (small form-factor PC) with sufficient compute power to continuously take in all the linear services, transcode them to HLS segments with AAC, and serve those HLS streams to the client device. While it worked in a technical sense, it was obviously very impractical, as it was neither low power nor compact. The band-aid in the short term was to utilize only recent Samsung products as client devices, which come with AC-4 support, and to integrate AC-4 decode into the native app.
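For illustration, the per-service transcode could be assembled along the lines below. The flags follow ffmpeg’s HLS muxer options, but note that stock ffmpeg builds lack an AC-4 decoder, so a build with AC-4 support is assumed; the URLs, paths, and bitrate are placeholders, not the project’s actual configuration.

```python
def hls_transcode_cmd(src_url: str, out_dir: str, bitrate: str = "96k") -> list:
    """Assemble an ffmpeg command that re-encodes one linear service's
    audio to AAC and publishes it as short, rolling HLS segments."""
    return [
        "ffmpeg",
        "-i", src_url,                    # e.g. a unicast redirect from the gateway
        "-c:v", "copy",                   # leave the video track untouched
        "-c:a", "aac",                    # AC-4 in, AAC out for broad client support
        "-b:a", bitrate,
        "-f", "hls",
        "-hls_time", "2",                 # short segments keep startup delay low
        "-hls_list_size", "5",
        "-hls_flags", "delete_segments",  # cap disk usage on the in-car server
        f"{out_dir}/index.m3u8",
    ]

cmd = hls_transcode_cmd("udp://127.0.0.1:5000", "/var/www/hls/service1")
```

Running one such process per linear service is exactly why the approach demanded NUC-class compute: every service is decoded and re-encoded continuously, whether or not anyone is watching.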
Tuning time was another major challenge. Initially, all types of services would take 3-5 seconds to begin playback. This is moderately acceptable for video services, but for audio services the target was to achieve a delay similar to analog FM radio. One key point is that delay was assessed only from user interaction to playback; latency from transmission to the ears was irrelevant. The initial solution was therefore to simply keep a continuous remux to HLS of all 15 services running. This made for near-instant tuning, as there would always be a segment ready to go; because transfers occur over a local network, each segment arrives in a fraction of a second and playback begins immediately. The downside of this approach, however, was instability, as it layered another single point of failure over each service. Fortunately, SiliconDust was able to devise a solution in which the HDHomeRun continuously buffers a short section of each audio service signaled in the UserDefined table as long as any audio service is tuned. After a short initial tune of 1-2 seconds, made faster by optimizations that prefer speed over initial stability (the opposite of what would be desired for video), tuning times drop to around half a second. Meanwhile, the team at OpenZNet was able to rapidly optimize their player to ensure as little delay as possible, dropping the video tuning time from 5 seconds down to 2-3 seconds.
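The HDHomeRun’s buffering logic lives in closed firmware, but the idea can be shown with a toy rolling buffer: keep the most recent second or two of packets for every signaled service, and hand the backlog to a newly tuned player so playback starts immediately. The `TuneBuffer` class below is invented for illustration only.

```python
from collections import deque

class TuneBuffer:
    """Rolling buffer of recent packets for one audio service (illustrative)."""

    def __init__(self, max_packets: int = 100):
        self._buf = deque(maxlen=max_packets)  # oldest entries fall off automatically

    def push(self, packet: bytes) -> None:
        self._buf.append(packet)

    def tune(self) -> list:
        """Hand a newly tuned player the buffered backlog for an instant start."""
        return list(self._buf)

buf = TuneBuffer(max_packets=3)
for i in range(5):
    buf.push(bytes([i]))
backlog = buf.tune()  # only the three most recent packets remain
```

Because every service in the UserDefined table gets its own small buffer, switching between audio channels costs only the local handoff, not a fresh acquisition from air.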
Section 7: Future Usage
This section discusses a series of potential next steps based on the findings above.
Physical Layer Fine Tuning
The most immediate and easiest alteration is to adjust the physical layer, as the current configuration outperforms the requirements. The highly robust configuration of QPSK 7/15 is unnecessary, given that QPSK 11/15 proved more than adequate in testing. Testing also found that the FEC component of the audio streams was unnecessary. Instead, the number of cells allocated to the PLP can be reduced to only those necessary for the radio service bitrate plus the metadata delivery, lessening the spectrum impact. This would drop usage from 16.66% to 10% of the usable channel bandwidth, an effective 600 kHz (though not literally, given the use of TDM). Divided into 15 channels, this provides 40 kHz per channel, roughly 20% of the spectrum used by analog FM radio, counting the full assigned channel bandwidth of 200 kHz rather than exclusively the audio component, and not considering the added efficiency of HD Radio. Further testing could reduce the robustness of the modcod even further, trading receivability margin for efficiency to find the optimum operating point.
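These figures can be checked with a few lines of arithmetic, treating the full 6 MHz channel as the 100% basis (which matches the 600 kHz effective figure above):

```python
# Reproduce the spectrum arithmetic for the reduced PLP allocation.
channel_khz = 6000          # full ATSC 3.0 channel (6 MHz)
plp_share = 0.10            # reduced PLP allocation
services = 15
fm_channel_khz = 200        # full assigned analog FM channel

effective_khz = channel_khz * plp_share            # 600 kHz effective (via TDM)
per_service_khz = effective_khz / services         # 40 kHz per audio channel
fraction_of_fm = per_service_khz / fm_channel_khz  # ~20% of one FM channel
```

The per-service figure is the one that matters commercially: fifteen audio programs together occupy less spectrum than three analog FM stations.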
MFN Service Deployment
Part of the table design included two sets of IDs, one for the service as a whole, and one for that specific transmission of the service. This could potentially be utilized to transition the listener between two broadcasts of the same service as they move from market to market, with only a momentary interruption. With the coverage results from this test, it would be theoretically possible to drive from Washington, DC to Portland, Maine while remaining in range of an ATSC 3.0 signal the entire time, assuming each market has a 3.0 deployment. This would fulfill a role similar to that of the SLT.Service.OtherBsid element, whose usage is explored by Sony in [11].
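A receiver-side handoff check might look like the sketch below. The entries follow the Appendix A format, where `srvid` identifies the service as a whole and `mktid` identifies one market’s transmission of it; the Baltimore entry is from Appendix A, while the second market’s `mktid` is hypothetical.

```python
from typing import Optional

def find_handoff(current_srvid: str, new_market_table: list) -> Optional[dict]:
    """Locate the same logical service in a newly received market's table.

    Matching on srvid (the service as a whole) rather than mktid (one
    market's transmission) lets the receiver retune to the new market's
    copy with only a momentary interruption.
    """
    for svc in new_market_table:
        if svc["srvid"] == current_srvid:
            return svc      # this market's copy of the same service
    return None             # not carried here; stay on the current signal

current = {"name": "Hit List", "mktid": "1409201", "srvid": "10001"}
next_market = [{"name": "Hit List", "mktid": "9999201", "srvid": "10001"}]
target = find_handoff(current["srvid"], next_market)
```

This is the same matching problem SLT.Service.OtherBsid addresses for linear services, applied here to the audio service IDs.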
Handheld Usage
Another major potential use case is the cellular phone. Given the ease of receivability, this would be eminently possible using a device like the Mark One, which has a built-in ATSC 3.0 tuner, or using a phone with a USB tuner such as Saankhya’s Yoga dongle attached. This would also eliminate complexity from the user experience, as no separate receiver would be needed, just the user’s phone. Testing of a video service in the same PLP had favorable results on the ONE Media Mark One, and so integration of the audio services is worth exploring.[6]
Receiver Market Gaps
As applications like datacasting have become more prominent using ATSC 3.0 technology, one item that has been noted is a lack of receivers designed with datacasting in mind. Simple file transfer via NRT is relatively common, and some receivers even incorporate application layer FEC (AL-FEC). However, some of the most interesting applications do not easily fit into basic file delivery, like those mentioned in Section 3. Most of these projects have required extensive custom work to integrate into a receiver, utilizing platforms like Sony’s USB ATSC 3.0 tuner dongle, Digicap’s Homecaster and Autocaster, and SiliconDust’s HDHomeRun as bases. ONE Media’s intention is to continue working with receiver manufacturers to better integrate these more advanced datacasting projects. Additionally, work on receivers with better mobile reception will help alleviate the need for highly robust configurations in the automotive reception environment, allowing for better spectral efficiency.
Standardization of Audio Services
While the current ATSC 3.0 specification allows for audio-only services, the overhead required to transmit them is simply too high to be feasible on a commercial scale, as mentioned earlier. Additionally, while AC-4 provides many advantages for television, there are other codecs that provide more efficient options in a bitrate-sensitive application like this, especially when it comes to highly compressible content like talk radio.
In addition to the bitrate concerns, a standardized signaling method designed around audio services would provide a clear target for receiver manufacturers, and allow for the easy incorporation of key features, such as song metadata, thumbnails, alerts, and news tickers.
One proposal is to make use of the Digital Radio Mondiale (DRM) standard [13], the development of which has been driven in large part by Fraunhofer IIS. While DRM is deployed as a more traditional digital radio application in countries around the world, there exists a DRM drop-in option that fits neatly into an ATSC 3.0 system, thanks to the option to inject multicast into the scheduler. The signaling, audio encoding, metadata, thumbnails, alerts, and tickers would all be handled by the DRM packager-equivalent, then bundled into a multicast stream. On the receiving side, DRM software would obtain the multicast stream from the ATSC 3.0 receiver and unpack it into the requisite components for the user experience. Incorporating DRM as the standard for audio delivery on ATSC 3.0 would solve many of the issues illustrated throughout this paper and provide a base of support given the existing global adoption. It is better to have a single unified audio services standard than to end up with a multitude of competing implementations brought about by a gap in the existing ATSC 3.0 standard.
References
1. Pro-MPEG Code of Practice #3 release 2: Transmission of Professional MPEG-2 Transport Streams Over IP Networks, July 2004.
2. ATSC Standard: Signaling, Delivery, Synchronization, and Error Protection, Doc. A/331:2023-02.
3. “HDHomeRun apps now support ATSC 3.0 audio on more devices, even when they lack the ATSC 3.0 audio codecs,” https://mailchi.mp/245e1f09021d/hdhomerun-apps-now-support-atsc-30-audio-on-more-devices
4. “xHE-AAC Audio Codec now in Windows 11,” https://www.audioblog.iis.fraunhofer.com/xhe-aac-windows11
5. ATSC Recommended Practice: Techniques for Signaling, Delivery, Synchronization, and Error Protection, Doc. A/351:2022-03.
6. Iyengar, J. and Thomson, M., “QUIC: A UDP-Based Multiplexed and Secure Transport,” Internet Engineering Task Force, RFC 9000.
7. Bush, D., “UFTP – Encrypted UDP based FTP with multicast.”
8. Garro, E., Gimenez, J., Park, S., and Gomez-Barquero, D., “Pilot configuration optimization for ATSC 3.0,” IEEE Transactions on Broadcasting, December 2016.
9. ATSC Recommended Practice: Guidelines for the Physical Layer Protocol, Doc. A/327:2022-12.
10. ATSC Standard: Link-Layer Protocol, Doc. A/330:2022-03.
11. Fay, L. and Clift, G., “ATSC 3.0 File Delivery to Multiple Markets.”
12. Pardue, L. and Bradbury, R., “Hypertext Transfer Protocol (HTTP) over multicast QUIC,” Internet Engineering Task Force Network Working Group, draft-pardue-quic-http-mcast-11.
13. Digital Radio Mondiale (DRM); System Specification, ETSI ES 201 980 V4.2.1 (2021-01).
Acknowledgments
- Marshall Behrmann and the team at ONE Media, for their invaluable work on this project.
- Azita Manson of OpenZNet for handling the front end under tight deadlines.
- Nick Kelsey of SiliconDust for his effort, ideas, and constant revisions of the HDHomeRun firmware.
- Gary Sgrignoli of MSW for his endless knowledge of the physical layer, as well as his willingness to provide feedback on any and all theories.
- Alexander Zink of Fraunhofer IIS for his assistance with all things AAC.
Appendix A: UserDefined Audio Service Example
Below is an example of the UserDefined table described in Section 3. This one is for the Baltimore market, and so uses 54 for the major channel number and 1409 for the BSID, as it is transmitted on WNUV.
<?xml version="1.0" encoding="UTF-8"?>
<UDS>
<OMLradio name="Baltimore Radio">
<rsrv name="Hit List" destIP="239.255.54.201" destPort="5000" mktid="1409201" srvid="10001" />
<rsrv name="Classic Rock" destIP="239.255.54.202" destPort="5000" mktid="1409202" srvid="10002" />
<rsrv name="Greatest Hits" destIP="239.255.54.203" destPort="5000" mktid="1409203" srvid="10003" />
<rsrv name="Alternative" destIP="239.255.54.204" destPort="5000" mktid="1409204" srvid="10004" />
<rsrv name="Pop Adult" destIP="239.255.54.205" destPort="5000" mktid="1409205" srvid="10005" />
<rsrv name="Hip Hop" destIP="239.255.54.206" destPort="5000" mktid="1409206" srvid="10006" />
<rsrv name="Hot Country" destIP="239.255.54.207" destPort="5000" mktid="1409207" srvid="10007" />
<rsrv name="Flashback 70s" destIP="239.255.54.208" destPort="5000" mktid="1409208" srvid="10008" />
<rsrv name="Everything 80s" destIP="239.255.54.209" destPort="5000" mktid="1409209" srvid="10009" />
<rsrv name="Latin Pop" destIP="239.255.54.210" destPort="5000" mktid="1409210" srvid="10010" />
<rsrv name="Karaoke" destIP="239.255.54.211" destPort="5000" mktid="1409211" srvid="10011" />
<rsrv name="Naturescape" destIP="239.255.54.212" destPort="5000" mktid="1409212" srvid="10012" />
<rsrv name="Qello Concerts" destIP="239.255.54.213" destPort="5000" mktid="1409213" srvid="10013" />
<rsrv name="Classica" destIP="239.255.54.214" destPort="5000" mktid="1409214" srvid="10014" />
<rsrv name="Djazz" destIP="239.255.54.215" destPort="5000" mktid="1409215" srvid="10015" />
</OMLradio>
</UDS>
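A receiver application would flatten this table into a list of tune targets. The following is a minimal sketch using Python’s standard library, with a trimmed one-service copy of the table inlined for self-containment; the field meanings are as described in Section 3.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the Appendix A table (one service shown).
doc = """<UDS>
  <OMLradio name="Baltimore Radio">
    <rsrv name="Hit List" destIP="239.255.54.201" destPort="5000"
          mktid="1409201" srvid="10001" />
  </OMLradio>
</UDS>"""

root = ET.fromstring(doc)
services = [
    {
        "name": s.get("name"),              # display name for the channel list
        "addr": (s.get("destIP"), int(s.get("destPort"))),  # multicast tune target
        "mktid": s.get("mktid"),            # this market's transmission of the service
        "srvid": s.get("srvid"),            # the service as a whole, for MFN handoff
    }
    for s in root.iter("rsrv")
]
```

Keeping both IDs in the flattened record is what makes the market-to-market handoff described in Section 7 possible without re-parsing the table.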
[1] Multiple Stimuli with Hidden Reference and Anchor (MUSHRA) is a methodology for conducting a codec listening test to evaluate the perceived quality.
[2] Note that the 1.2 Mbps baseline represents the final bitrate that results from the 16.66% spectrum allocation combined with the physical layer choices.
[3] 12% when compared to the final PLP bitrate of 1.2 Mbps.
[4] 26.5% when compared to the final PLP bitrate of 1.2 Mbps.
[5] 11% when compared to the final PLP bitrate of 1.2 Mbps.
[6] See https://www.tvtechnology.com/news/one-medias-atsc-30-smartphone-becomes-a-reality for information on the Mark One smartphone.