The HTTP Live Streaming protocol, better known as HLS, was originally created by Apple for the launch of iPhone OS 3.0. Since its introduction in 2009, Apple has maintained the official HLS protocol specification as an informational IETF Internet-Draft, updating it at least every six months to maintain its status as an active IETF working document. (Despite its now seven-year stint as an IETF working document, HLS has never been officially ratified as a standard by any industry organization, so it remains a de facto standard in the streaming world.)
The following table summarizes the key revisions Apple has made to the official HLS specification to date:
| HLS Protocol Version | IETF Draft Version | Publish Date | Tags, Attributes or Features Introduced or Removed * |
|---|---|---|---|
| 1 | 00 | 5/1/2009 | #EXTINF, #EXT-X-TARGETDURATION, #EXT-X-MEDIA-SEQUENCE, #EXT-X-KEY, #EXT-X-PROGRAM-DATE-TIME, #EXT-X-ALLOW-CACHE, #EXT-X-STREAM-INF, #EXT-X-ENDLIST |
| 3 | 05 | 11/19/2010 | floating point segment durations |
| 4 | 07 | 9/30/2011 | #EXT-X-BYTERANGE, #EXT-X-MEDIA, #EXT-X-STREAM-INF, #EXT-X-I-FRAMES-ONLY, #EXT-X-I-FRAME-STREAM-INF, alternate audio/video renditions |
| 5 | 09 | 9/22/2012 | #EXT-X-MAP, X-TIMESTAMP-MAP, KEYFORMAT, KEYFORMATVERSIONS, WebVTT subtitles, sample AES encryption |
| 6 | 12 | 10/14/2013 | #EXT-X-DISCONTINUITY-SEQUENCE, #EXT-X-START, PROGRAM-ID, CEA-608 service channels |
| 7 | 14 | 10/14/2014 | #EXT-X-SESSION-DATA, #EXT-X-ALLOW-CACHE, CEA-708 service channels |
| | 15 | 4/15/2015 | FRAME-RATE, E-AC-3 codec support |
| | 19 | 4/4/2016 | #EXT-X-DATERANGE, SCTE-35 signaling |
* Some attributes and features may have been omitted for conciseness.
Starting in 2010, the HLS specification introduced the concept of protocol versioning, expressed through the EXT-X-VERSION tag, in an effort to manage HLS client compatibility. What's been surprising, however, is just how often implementers misunderstand HLS protocol versioning. Over the course of working with HLS I've heard many customers, partners and vendors say things like:
“Our HLS client is only v3 compatible. Can you give us an HLS playlist version without WebVTT subtitles so it doesn’t break our client?”
“This HLS playlist says it’s v3 but we found it contains #EXT-X-INDEPENDENT-SEGMENTS tag which is a v6 feature. You need to change your declared version to 6.”
“Our HLS content has multiple audio languages so you must package it as v4.”
And if you take a look at the table above, these statements seem to make sense. WebVTT subtitles were introduced in version 5, so they shouldn't be present in a v3 playlist, right? The #EXT-X-INDEPENDENT-SEGMENTS tag was introduced in v6, so it shouldn't be included in a v3 playlist, right? Multiple audio languages require v4 packaging because that's when alternate renditions were introduced, right?
This common misinterpretation of the HLS spec is rooted in the assumption that the purpose of HLS protocol versions is to ensure certain features work only on clients which support them. But that is not the case. The purpose of HLS protocol versions is to ensure certain features don’t break older clients, which is a fundamentally different problem.
Nearly every communications protocol aims to minimize compatibility issues between clients and services as the protocol evolves over time. The HLS protocol achieves this forward compatibility goal by setting out two essential client implementation requirements (Section 6.3.1: General Client Responsibilities):
Clients MUST ensure […] that the EXT-X-VERSION tag, if present, specifies a protocol version supported by the client; if either check fails, the client MUST NOT attempt to use the Playlist, or unintended behavior could occur.
To support forward compatibility, when parsing Playlists, Clients MUST:
- ignore any unrecognized tags.
- ignore any Attribute/value pair with an unrecognized AttributeName.
- ignore any tag containing an attribute/value pair of type enumerated-string whose AttributeName is recognized but whose AttributeValue is not recognized, unless the definition of the attribute says otherwise.
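To make the three rules concrete, here is a minimal Python sketch of a tag parser for a hypothetical older client that understands alternate audio renditions but predates subtitles and session data. The `KNOWN_TAGS` table, the `parse_tag` function and the sample playlist are illustrative assumptions rather than anything defined by the spec, and the attribute-list regex is deliberately simplified.

```python
import re

# Hypothetical knowledge of an older client (illustration only): it supports
# alternate audio but predates subtitles and session data. Each known tag maps
# to the attributes the client understands; for an enumerated-string attribute
# the entry is the set of values it recognizes, None means "any value".
KNOWN_TAGS = {
    "#EXTM3U": {},
    "#EXT-X-MEDIA": {"TYPE": {"AUDIO"}, "GROUP-ID": None, "NAME": None,
                     "LANGUAGE": None, "URI": None},
    "#EXT-X-STREAM-INF": {"BANDWIDTH": None, "AUDIO": None},
}

# NAME=value pairs; a quoted-string value may itself contain commas.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_tag(line):
    """Return (tag, attributes) for a usable tag line, or None to ignore it."""
    tag, _, rest = line.partition(":")
    if tag not in KNOWN_TAGS:
        return None  # rule 1: ignore any unrecognized tag
    known = KNOWN_TAGS[tag]
    attrs = {}
    for name, value in ATTR_RE.findall(rest):
        if name not in known:
            continue  # rule 2: ignore an attribute with an unrecognized name
        if known[name] is not None and value not in known[name]:
            return None  # rule 3: recognized enumerated attribute, unknown value
        attrs[name] = value
    return tag, attrs


master = """#EXTM3U
#EXT-X-SESSION-DATA:DATA-ID="com.example.title",VALUE="Example"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="English",LANGUAGE="en",URI="audio/en.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",URI="subs/en.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=1280000,AUDIO="aud",SUBTITLES="subs"
video/720p.m3u8"""

for line in master.splitlines():
    if line.startswith("#"):
        print(parse_tag(line))
```

With this sample, the #EXT-X-SESSION-DATA tag and the TYPE=SUBTITLES rendition both come back as `None`, and the SUBTITLES attribute is silently dropped from #EXT-X-STREAM-INF, so this hypothetical client still plays the video with English audio.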
These basic tenets allow Apple to introduce new HLS features in the form of new tags (e.g. #EXT-X-DISCONTINUITY, #EXT-X-START) or new attributes (e.g. VIDEO, AUDIO, SUBTITLES, FRAME-RATE) without breaking old clients. They also make the protocol extensible, since organizations can add proprietary tags (e.g. #EXT-X-SCTE35 for ad break signaling) while remaining compatible with existing HLS clients.
The challenge with designing any protocol over a long period of time, of course, is handling breaking changes. What happens when you need to introduce a feature that fundamentally alters previously established concepts or understandings? A well-implemented HLS client is expected to ignore any tags or attributes it doesn't recognize, so the only time a compliant client needs to be warned about something new is when the change contradicts existing assumptions and expectations. This is where protocol versioning comes in. Rather than increment the protocol version every time a new HLS feature is introduced, Apple only increments it when a change breaks backwards compatibility.
For example, until November 2010 the HLS spec only allowed segment durations (#EXTINF values) to be integers. When it became evident that greater precision was required, Apple updated the spec (draft 05) to allow floating-point values for #EXTINF. Since doing so changed the previous definition of #EXTINF, it also risked breaking any existing HLS client that enforced the old integer requirement. So, to shield old clients from this breaking change, Apple required any HLS playlist that uses floating-point segment durations to declare protocol version 3. If implemented correctly, a v2 client should refuse to play a v3 playlist because it can't guarantee successful playback. And if a v2 client does support floating-point #EXTINF values… well, then it should declare itself v3 compatible.
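The client-side check is simple. The sketch below assumes a hypothetical client that implements protocol version 2; `declared_version`, `can_use` and `SUPPORTED_VERSION` are illustrative names, and a playlist without an #EXT-X-VERSION tag is treated as version 1, since such a playlist must be version-1 compatible.

```python
# Hypothetical client that implements HLS protocol version 2 (integer EXTINF only).
SUPPORTED_VERSION = 2


def declared_version(playlist_text):
    """Return the playlist's EXT-X-VERSION, or 1 if the tag is absent."""
    for line in playlist_text.splitlines():
        if line.startswith("#EXT-X-VERSION:"):
            return int(line.split(":", 1)[1])
    return 1  # no EXT-X-VERSION tag: the playlist must be version-1 compatible


def can_use(playlist_text, supported=SUPPORTED_VERSION):
    """Section 6.3.1: refuse any playlist whose declared version is unsupported."""
    return declared_version(playlist_text) <= supported


v3_playlist = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:9.009,
segment0.ts
#EXT-X-ENDLIST"""

# Same playlist with an integer duration and no version tag (version 1).
v1_playlist = v3_playlist.replace("#EXT-X-VERSION:3\n", "").replace("9.009", "9")

print(can_use(v3_playlist))  # False: the v2 client must not use this playlist
print(can_use(v1_playlist))  # True
```

Note that the client never inspects the #EXTINF values themselves; the declared version alone tells it whether to back off.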
So what are these backwards-compatibility-breaking features that force HLS version increments? Fortunately, the HLS specification provides an unambiguous answer to this question in Section 7: Protocol Version Compatibility, from which we can compile this table:
| If the M3U8 playlist uses… | You must declare at least version… |
|---|---|
| IV attribute of the EXT-X-KEY tag | 2 |
| Floating-point EXTINF duration values | 3 |
| EXT-X-BYTERANGE tag or EXT-X-I-FRAMES-ONLY tag | 4 |
| KEYFORMAT and KEYFORMATVERSIONS attributes of the EXT-X-KEY tag | 5 |
| EXT-X-MAP tag | 5 |
| EXT-X-MAP tag in a playlist that does not contain EXT-X-I-FRAMES-ONLY | 6 |
| "SERVICE" values for the INSTREAM-ID attribute of the EXT-X-MEDIA tag | 7 |
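Because these rules are so mechanical, a packager could compute the minimum version directly from the playlist text. The following Python sketch is a hypothetical helper (`minimum_version` is not part of any library) that applies the Section 7 rules with simplified string matching. It does not validate the playlist, and it could be fooled by attribute names that appear inside quoted values.

```python
import re


def minimum_version(playlist_text):
    """Return the lowest EXT-X-VERSION the Section 7 rules require (simplified)."""
    lines = [line.strip() for line in playlist_text.splitlines()]
    i_frames_only = "#EXT-X-I-FRAMES-ONLY" in lines
    required = 1
    for line in lines:
        if line.startswith("#EXT-X-KEY:"):
            if re.search(r"[:,]IV=", line):
                required = max(required, 2)  # IV attribute
            if re.search(r"[:,]KEYFORMAT(?:VERSIONS)?=", line):
                required = max(required, 5)  # KEYFORMAT / KEYFORMATVERSIONS
        elif line.startswith("#EXTINF:"):
            duration = line[len("#EXTINF:"):].split(",", 1)[0]
            if "." in duration:
                required = max(required, 3)  # floating-point duration
        elif line.startswith("#EXT-X-BYTERANGE:") or line == "#EXT-X-I-FRAMES-ONLY":
            required = max(required, 4)  # byte ranges or I-frame-only playlist
        elif line.startswith("#EXT-X-MAP:"):
            # EXT-X-MAP needs 5 in an I-frame-only playlist, 6 anywhere else.
            required = max(required, 5 if i_frames_only else 6)
        elif line.startswith("#EXT-X-MEDIA:") and 'INSTREAM-ID="SERVICE' in line:
            required = max(required, 7)  # CEA-708 service channels
    return required


fmp4_media = """#EXTM3U
#EXT-X-VERSION:6
#EXT-X-TARGETDURATION:10
#EXT-X-MAP:URI="init.mp4"
#EXTINF:9.009,
segment0.m4s
#EXT-X-ENDLIST"""

keyed_media = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-KEY:METHOD=AES-128,URI="key.bin",IV=0x00000000000000000000000000000001
#EXTINF:10,
segment0.ts
#EXT-X-ENDLIST"""

subtitled_master = """#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",URI="subs/en.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=1280000,SUBTITLES="subs"
video/720p.m3u8"""

print(minimum_version(fmp4_media))        # 6
print(minimum_version(keyed_media))       # 2
print(minimum_version(subtitled_master))  # 1
```

The result is only a floor: nothing stops a server from declaring a higher version than the rules require.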
When determining which HLS protocol version to declare, this is the only table that matters. It doesn't matter which version of the HLS spec introduced a particular feature. You only need to increment your declared HLS version if your playlist contains any of the tags, attributes or features listed above.
This is why an HLS playlist that uses a feature such as WebVTT subtitles doesn't need to be declared as version 5. In fact, it doesn't even need to be declared any higher than version 1. Why is that? Well, to define WebVTT subtitles in a master playlist, one must use the #EXT-X-MEDIA tag. Since the #EXT-X-MEDIA tag is not considered a backwards-compatibility-breaking tag, it poses no risk to clients that don't support it. A truly spec-compliant client that doesn't support #EXT-X-MEDIA is expected to ignore all unknown tags and proceed without them. Therefore, a playlist containing #EXT-X-MEDIA definitions of WebVTT subtitles is not obligated to "warn" a client about potential compatibility issues.
Note that this doesn't guarantee at all that a client consuming such a playlist, regardless of its declared version number, will be able to correctly ingest, process and render WebVTT subtitles. Similarly, stating #EXT-X-VERSION:4 does not guarantee that a compatible client will be able to switch between multiple audio languages just because they're present in the playlist. The goal of HLS versioning isn't to ensure fully-featured playback but to prevent catastrophic playback failure, and in that respect failing to render subtitles or switch audio languages isn't considered catastrophic. This may seem like an odd distinction, and the HLS spec sort of acknowledges it, using alternate media renditions as an example: "The EXT-X-MEDIA tag and the AUDIO, VIDEO and SUBTITLES attributes of the EXT-X-STREAM-INF tag are backward compatible to protocol version 1, but playback on older clients may not be desirable. A server MAY consider indicating a EXT-X-VERSION of 4 or higher in the Master Playlist but is not required to do so."
In other words: HLS versioning can guarantee that your content will play back, but it can’t guarantee that the experience will be perfect.