Smooth Streaming Architecture

As described in the previous two posts, Smooth Streaming is Microsoft’s implementation of HTTP-based adaptive streaming, which is a hybrid media delivery method. It acts like streaming, but is in fact based on HTTP progressive download. The HTTP downloads are performed in a series of small chunks, allowing the media to get easily and cheaply cached along the edge of the network, closer to the end users. Providing multiple encoded bitrates of the same media source also allows Silverlight clients to seamlessly and dynamically switch between bitrates depending on network conditions and CPU power. The resulting end user experience is one of reliable, consistent playback without stutter, buffering or “last mile” congestion. In one word: Smooth.

In this post we’ll take a closer look at how Smooth Streaming works: format, server, and client.

 

Smooth Streaming Format

Smooth Streaming is the first Microsoft media format in over a decade to use a file format other than ASF. It is based on the ISO/IEC 14496-12 ISO Base Media File Format specification, better known as the MP4 file specification. Why MP4 and not ASF? Well, there are several reasons:

  • MP4 is a lightweight container format with less overhead than ASF
  • MP4 is easier to parse in managed (.NET) code than ASF
  • MP4 is based on a widely used standard, making 3rd party adoption and support more straightforward
  • MP4 was architected with H.264 video codec support in mind, and we’re counting on H.264 support in Smooth Streaming and Silverlight 3 (ASF can also contain H.264 video, but it’s not as straightforward as with MP4)
  • MP4 was designed to natively support payload fragmentation within the file

There are actually two parts to the Smooth Streaming format: the wire format and the disk file format. In Smooth Streaming a video is recorded in full length to the disk as a single file (one file per encoded bitrate), but it’s transferred to the client as a series of small file chunks. The wire format defines the structure of the chunks that get sent by IIS to the client, whereas the file format defines the structure of the contiguous file on disk. Fortunately, the MP4 specification allows MP4 to be internally organized as a series of fragments, which means that in Smooth Streaming the wire format is a direct subset of the file format.

What are these MP4 “fragments” that I speak of? The basic unit of an MP4 file is called a “box.” These boxes can contain both data and metadata. The MP4 specification allows for various ways to organize data and metadata boxes within a file. In most media scenarios it is considered useful to have the metadata written before the data so that a player client application can have more information about the video/audio it’s about to play before it plays it. However, in live streaming scenarios it is often not possible to write the metadata upfront about the whole data stream because it’s simply not fully known yet. Furthermore, less upfront metadata means less overhead, which can lead to shorter startup times. For these reasons the MP4 ISO Base Media File Format specification was designed to allow MP4 boxes to be organized in a fragmented manner, where the file can be written “as you go” as a series of short metadata/data box pairs, rather than one long metadata/data pair. The Smooth Streaming file format heavily leverages this aspect of the MP4 file specification, to the point where at Microsoft we often interchangeably refer to Smooth Streaming files as “Fragmented MP4 files” or “(f)MP4.”

Here is a high-level overview of what a Smooth Streaming file looks like on the inside:

Smooth Streaming File Format


In a nutshell, the file starts with file-level metadata (‘moov’) which generically describes the file, but the bulk of the payload is actually contained in the fragment boxes, which also carry more accurate fragment-level metadata (‘moof’) and media data (‘mdat’). (The diagram only shows 2 fragments, but a typical Smooth Streaming file has one fragment for every 2 seconds of video/audio.) Closing the file is an ‘mfra’ index box which allows easy and accurate seeking within the file.
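To make the box layout a bit more concrete, here is a minimal Python sketch that walks the top-level boxes of an MP4-style byte stream. It relies only on the generic ISO box header (a 32-bit big-endian size followed by a four-character type). The helper names `iter_top_level_boxes` and `make_box` and the synthetic sample bytes are my own illustration, not part of any Smooth Streaming tool, and the payloads are just zero padding.

```python
import io
import struct


def iter_top_level_boxes(stream):
    """Yield (box_type, offset, size) for each top-level box in the stream."""
    stream.seek(0, io.SEEK_END)
    end = stream.tell()
    offset = 0
    while offset + 8 <= end:
        stream.seek(offset)
        size, box_type = struct.unpack(">I4s", stream.read(8))
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            (size,) = struct.unpack(">Q", stream.read(8))
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the file.
            size = end - offset
        if size < header:
            raise ValueError(f"corrupt box header at offset {offset}")
        yield box_type.decode("ascii"), offset, size
        offset += size


def make_box(box_type, payload=b""):
    """Build a box with a plain 32-bit size (hypothetical helper for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# Synthetic stand-in for a file with two fragments; the payloads are dummy bytes.
sample = b"".join([
    make_box("moov", b"\x00" * 16),
    make_box("moof", b"\x00" * 8),
    make_box("mdat", b"\x00" * 32),
    make_box("moof", b"\x00" * 8),
    make_box("mdat", b"\x00" * 32),
    make_box("mfra", b"\x00" * 12),
])

layout = [box_type for box_type, _, _ in iter_top_level_boxes(io.BytesIO(sample))]
print(layout)
```

Pointing the same function at a real file opened in binary mode should list its top-level boxes in order, although real files may contain additional boxes that this post doesn't cover.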

When a Silverlight client requests a video time slice from the IIS Smooth Streaming server, the server simply seeks to the appropriate starting fragment in the MP4 file and then lifts the fragment out of the file and sends it over the wire to the client. This is why we refer to the fragments as the “wire format.” This technique greatly enhances the efficiency of the IIS server because it requires no remuxing or rewriting overhead.

Here is what an MP4 fragment looks like in more detail:

Smooth Streaming Wire Format


We say that the Smooth Streaming format is based on the MP4 file format because even though we’re following the ISO specification, we specify our own box organization schema and some custom boxes. In order to differentiate Smooth Streaming files from “vanilla” MP4 files, we use new file extensions: *.ismv (video+audio) and *.isma (audio only). I keep forgetting to ask the IIS Media team what the acronyms exactly stand for, but my best guess would be “IIS Smooth Streaming Media Video (Audio)”.

 

Smooth Streaming Media Assets

A typical Smooth Streaming media asset therefore consists of the following files:

  • MP4 files containing video/audio
    • *.ismv – contains video and audio, or only video
      • 1 ISMV file per encoded video bitrate
    • *.isma – contains only audio
      • In videos with audio, the audio track can be muxed into an ISMV file instead of a separate ISMA file
  • Server manifest file
    • *.ism
    • Describes the relationships between media tracks, bitrates and files on disk
    • Only used by the IIS Smooth Streaming server – not by client
  • Client manifest file
    • *.ismc
    • Describes to the client the available streams, codecs used, bitrates encoded, video resolutions, markers, captions, etc.
    • It’s the first file delivered to the client

Both manifest file formats are based on XML. The server manifest file format is based specifically on the SMIL 2.0 XML format specification.

A folder containing a single Smooth Streaming media asset might look something like this:

A typical folder containing a Smooth Streaming media asset


In this particular case the audio track is contained in the NBA_3000000.ismv file.

 

Smooth Streaming Manifest Files

The Smooth Streaming Wire/File Format specification defines the manifest XML language as well as the MP4 box structure. Because the manifests are based on XML they are highly extensible. Among the features already included in the current Smooth Streaming format specification is support for:

  • VC-1, WMA, H.264 and AAC codecs
  • Text streams
  • Multi-language audio tracks
  • Alternate video and audio tracks (e.g. multiple camera angles, director’s commentary, etc.)
  • Multiple hardware profiles (e.g. same bitrates targeted at different playback devices)
  • Script commands, markers/chapters, captions
  • Client manifest Gzip compression
  • URL obfuscation
  • Live encoding and streaming

For an example of a Smooth Streaming On-Demand Server Manifest file, see here.

For an example of a Smooth Streaming Client Manifest file, see here.

 

Smooth Streaming Playback: Bringing It All Home

Microsoft’s adaptive streaming prototype (used for NBC Olympics 2008) relied on physically chopping up long video files into small file chunks. In order to retrieve the chunks from the web server, the player client simply needed to download files in a logical sequence: 00001.vid, 00002.vid, 00003.vid, etc.

As I’ve explained in this and previous posts, Smooth Streaming uses a more sophisticated file format and server design. The videos are no longer split up into thousands of file chunks, but are instead “virtually” split up into fragments (typically 1 fragment per video GOP) and stored within a single contiguous MP4 file. This implies two significant changes in server and client design too:

  1. The server must be able to translate URL requests into exact byte range offsets within the MP4 file, and
  2. The client can request chunks in a more developer-friendly manner, such as by timecode instead of by index number

The first thing a Silverlight client requests from the Smooth Streaming server is the *.ismc client manifest. The manifest tells it which codecs were used to compress the content (so that the Silverlight runtime can initialize the correct decoder and build the playback pipeline), which bitrates and resolutions are available, and a list of all the available chunks and either their start times or durations.

With IIS7 Smooth Streaming, a client is expected to request fragments in the form of RESTful URLs:

http://video.foo.com/NBA.ism/QualityLevels(400000)/Fragments(video=610275114)
http://video.foo.com/NBA.ism/QualityLevels(64000)/Fragments(audio=631931065)

The values passed in the URL represent the encoded bitrate (e.g. 400000) and the fragment start offset (e.g. 610275114) expressed in an agreed-upon time unit (usually 100 ns). These values are known from the client manifest.
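As a rough illustration of the URL scheme above, here is a small Python sketch that builds and parses a fragment URL and converts the start offset to seconds. The function names and the regular expression are my own, and the 100 ns timescale is an assumption based on the "usually" above; a real client would take the timescale from the client manifest.

```python
import re

# Assumed timescale: 100 ns units, i.e. 10,000,000 ticks per second.
TICKS_PER_SECOND = 10_000_000

_FRAGMENT_RE = re.compile(r"/QualityLevels\((\d+)\)/Fragments\((\w+)=(\d+)\)$")


def fragment_url(base, bitrate, track, start_ticks):
    """Build a fragment request URL in the form shown above."""
    return f"{base}/QualityLevels({bitrate})/Fragments({track}={start_ticks})"


def parse_fragment_url(url):
    """Return (bitrate, track, start_ticks) extracted from a fragment URL."""
    match = _FRAGMENT_RE.search(url)
    if match is None:
        raise ValueError(f"not a fragment URL: {url}")
    bitrate, track, start_ticks = match.groups()
    return int(bitrate), track, int(start_ticks)


url = fragment_url("http://video.foo.com/NBA.ism", 400000, "video", 610275114)
bitrate, track, start_ticks = parse_fragment_url(url)
start_seconds = start_ticks / TICKS_PER_SECOND
print(url)
print(bitrate, track, start_seconds)
```

Under that assumed timescale, the example offset works out to roughly 61 seconds into the video.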

Upon receiving a request like this, the IIS7 Smooth Streaming component looks up the quality level (bitrate) in the corresponding *.ism server manifest and maps it to a physical *.ismv or *.isma file on disk. It then goes and reads the appropriate MP4 file, and based on its ‘tfra’ index box figures out which fragment box (‘moof’ + ‘mdat’) corresponds to the requested start time offset. It then extracts the said fragment box and sends it over the wire to the client as a standalone file. This is a particularly important part of the overall design because the sent fragment/file can now be automatically cached further down the network, potentially saving the origin server from sending the same fragment/file again to another client requesting the same RESTful URL.
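The lookup described above can be sketched in a few lines of Python. This is only a conceptual model, not the IIS implementation: the dictionary standing in for the server manifest, the simplified index of (start time, byte offset) pairs standing in for the ‘tfra’ box, the file name NBA_400000.ismv and all the numbers are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical stand-in for the *.ism server manifest: bitrate -> file on disk.
SERVER_MANIFEST = {
    3000000: "NBA_3000000.ismv",
    400000: "NBA_400000.ismv",
}

# Hypothetical, simplified stand-in for each file's index:
# sorted (fragment start time in 100 ns units, byte offset of the 'moof' box).
FRAGMENT_INDEX = {
    "NBA_400000.ismv": [(0, 1024), (20000000, 250000), (40000000, 498000)],
    "NBA_3000000.ismv": [(0, 1024), (20000000, 1800000), (40000000, 3590000)],
}


def locate_fragment(bitrate, start_ticks):
    """Map (bitrate, start time) to (file path, byte offset of the fragment)."""
    path = SERVER_MANIFEST[bitrate]  # raises KeyError for an unknown quality level
    index = FRAGMENT_INDEX[path]
    times = [time for time, _ in index]
    i = bisect_left(times, start_ticks)
    if i == len(times) or times[i] != start_ticks:
        raise LookupError(f"no fragment starts at {start_ticks}")
    return path, index[i][1]


location = locate_fragment(400000, 20000000)
print(location)
```

Because the same URL always maps to the same bytes, any HTTP cache between the origin and the client can hold on to the response.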

As you can see, requesting chunks of video/audio from the server is easy. But what about dynamic bitrate switching that makes adaptive streaming so effective? This part of the Smooth Streaming experience is implemented entirely in client-side Silverlight application code – the server plays no part in the bitrate switching process. The client-side code looks at chunk download times, buffer fullness, rendered frame rates, and other factors – and based on them decides when to request higher or lower bitrates from the server. Remember, if during the encoding process we ensure that all bitrates of the same source are perfectly frame aligned (same length GOPs, no dropped frames), then switching between bitrates is completely seamless – and Smooth.
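To give a feel for what such client-side logic might look like, here is a deliberately simple Python sketch of a switching rule that looks at measured throughput, buffer fullness and the ratio of dropped frames. It is not the Silverlight heuristics code; the function name and every threshold in it are assumptions picked purely for illustration.

```python
def next_bitrate(bitrates, current, throughput_bps, buffer_seconds, dropped_frame_ratio):
    """Return the bitrate to request next, moving at most one level at a time."""
    levels = sorted(bitrates)
    i = levels.index(current)

    # Step down when the network cannot sustain the current bitrate, the buffer
    # is nearly empty, or the CPU is dropping a lot of frames (assumed thresholds).
    if throughput_bps < current or buffer_seconds < 2.0 or dropped_frame_ratio > 0.25:
        return levels[max(i - 1, 0)]

    # Step up only with comfortable headroom and a healthy buffer (assumed thresholds).
    if i + 1 < len(levels) and throughput_bps > 1.5 * levels[i + 1] and buffer_seconds > 8.0:
        return levels[i + 1]

    return current


levels = [400000, 1000000, 3000000]
step_up = next_bitrate(levels, 1000000, 5_000_000, 10.0, 0.0)
step_down = next_bitrate(levels, 1000000, 800_000, 10.0, 0.0)
hold = next_bitrate(levels, 1000000, 2_000_000, 10.0, 0.0)
print(step_up, step_down, hold)
```

A production heuristic would weigh many more signals, but the overall shape (measure, compare against thresholds, request a different quality level in the next URL) stays the same.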

In my next blog post: Encoding For Smooth Streaming

Posted in H.264, Internet Information Services, Silverlight, Smooth Streaming

The Birth of Smooth Streaming

In my last post I talked about the history of multi-bitrate streaming and how we got from RTSP and HTTP streaming back to HTTP download as the primary media web distribution mechanism. In this post we’ll take a closer look at how adaptive streaming differs from traditional streaming (e.g. RTSP) and plain progressive download.

 

Traditional Streaming

Let’s start by first taking a look at RTSP as an example of a traditional streaming protocol. RTSP is defined as a stateful protocol. This means that from the first time a client connects to the streaming server until the time it disconnects from the streaming server, the server keeps track of a client’s state. The client communicates its state to the server by issuing it commands such as PLAY, PAUSE or TEARDOWN (the first two are obvious; the last one is used to disconnect from the server and close the streaming session).

Once a session between the client and the server has been established, the server begins sending down the media as a steady stream of small packets (the format of these packets is known as RTP). The size of a typical RTP packet is 1452 bytes, which means that in a video stream encoded at 1 Mbps each packet carries only about 11 msec of video. In RTSP the packets can be transmitted over either UDP or TCP transports – the latter is preferred in cases where firewalls or proxies block UDP packets, but can also lead to increased latency (TCP packets get re-sent until received).
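The per-packet figure above is simple arithmetic, shown here as a short Python sketch. The function name is my own, and the calculation treats all 1452 bytes as media payload, ignoring packet header overhead.

```python
def packet_duration_ms(packet_bytes, bitrate_bps):
    """Milliseconds of media carried by one packet at the given encoded bitrate."""
    return packet_bytes * 8 / bitrate_bps * 1000


ms_per_packet = packet_duration_ms(1452, 1_000_000)
print(round(ms_per_packet, 1))
```

At 1 Mbps that comes to a little under 12 milliseconds per packet, so the server has to send on the order of 86 packets for every second of video.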

RTSP is an example of a traditional streaming protocol


 

HTTP, on the other hand, is known as a stateless protocol. If an HTTP client requests some data, the server will respond by sending down the data, but it won’t remember the client or its state. Every HTTP request is handled as a completely standalone one-time session.

Windows Media Services supports streaming over both RTSP and HTTP. Now, you may ask yourself, “But if HTTP is a stateless protocol, how can it be used for streaming?” WMS uses a modified version of HTTP known as MS-WMSP, which uses standard HTTP for transfer of data and messages but also maintains session states, thus effectively turning it into a streaming protocol like RTSP/TCP. Windows Media Services has also supported RTSP streaming since 2003 (9 Series), over both UDP and TCP. Its implementation of the protocol is publicly documented as MS-RTSP.

The important things to remember about traditional streaming protocols like RTSP and WMS-HTTP are that:

  1. The server sends the data packets to the client at a real-time rate only – that is, the bit rate at which the media is encoded (e.g. a 500 kbps encoded video is streamed to the client at approx. 500 kbps)
  2. The server only sends ahead enough data packets to fill the client buffer. The client buffer is typically between 1 and 10 seconds (WMP and Silverlight default buffer length is 5 seconds). This means that if you pause a streamed video and wait 10 minutes – still only ~5 seconds of video will have downloaded to your client in that time.

Progressive Download

The other most common form of media delivery on the Web today is progressive download. Progressive download is nothing more than just a plain, ordinary file download from an HTTP web server like IIS or Apache. It is supported by Silverlight, Flash, WMP, and nearly every other media player and platform under the sun. The term “progressive” likely stems from the fact that most player clients allow the media file to be played back while the download is still in progress – before the entire file has been fully written to disk (typically to the browser cache). Clients that support the HTTP 1.1 specification can also seek to positions in the media file that haven’t been downloaded yet by performing byte range requests to the web server (assuming it also supports HTTP 1.1).
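Here is a minimal Python sketch of the byte range request mentioned above, using only the standard library. The URL is a placeholder and the helper name is my own. The snippet only builds the request object, so it doesn't touch the network.

```python
from urllib.request import Request


def range_request(url, first_byte, last_byte=None):
    """Build an HTTP request for a byte range; an open end means 'to end of file'."""
    end = "" if last_byte is None else str(last_byte)
    return Request(url, headers={"Range": f"bytes={first_byte}-{end}"})


# Placeholder URL: ask for everything from the 1,000,000th byte onward.
request = range_request("http://example.com/video.wmv", 1_000_000)
range_header = request.get_header("Range")
print(range_header)
```

Passing the request to `urllib.request.urlopen` would perform the download. A server that honors the header answers with status 206 (Partial Content), while one that ignores it sends the whole file with a plain 200.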

The most popular video sharing websites on the Web today almost exclusively use progressive download:  YouTube, Vimeo, MySpace, MSN Soapbox – and even the rather misnamed Silverlight Streaming Service. (See my previous blog posts for a list of reasons why HTTP download is becoming increasingly popular in the online delivery of media.)

Unlike streaming servers, which rarely send more than 10 seconds of media data to the client at a time, HTTP web servers keep the data flowing until the download is complete. This leads to the user experience that we have by now grown very accustomed to thanks to YouTube – if you pause a YouTube video at the beginning of playback and wait, eventually the entire video will have downloaded to your browser cache, allowing you to smoothly play the whole video without any hiccups. There is a downside to this behavior as well – if 30 seconds into a fully downloaded 10-minute video you decide you don’t like it and quit the video, both you and your content provider have just wasted 9 minutes and 30 seconds’ worth of bandwidth. In an effort to mitigate this problem, IIS7 Media Pack 1.0 provides a cool feature called Bit Rate Throttling which allows content providers to throttle the download bitrate in order to reduce costs. But that’s another story…

 

Adaptive Streaming

Speaking of misnomers… Here’s another one: Adaptive Streaming. Guess what? It’s not really streaming in the classic sense at all.

Adaptive streaming is really a hybrid delivery method. It acts like streaming but is in fact based on HTTP progressive download. A more technically accurate name for adaptive streaming might be “A Series of Progressive Downloads of Variable Sized Video Fragments,” but even the most determined marketing experts would have a hard time selling that one. 🙂

A very important thing to remember about adaptive streaming is that there doesn’t really exist a standard implementation for it today because it’s just an advanced download concept, rather than a new protocol. This is why we talk about both Microsoft Smooth Streaming and Move Networks Adaptive Stream as examples of adaptive streaming, even though they use mutually incompatible codecs, formats and encryption schemes. They both rely on HTTP as the transport protocol and perform the media download as a long series of very small progressive downloads, rather than one big progressive download.

In a prototypical adaptive streaming implementation, the video/audio source is cut up into many short segments (“chunks”) and encoded to the desired delivery format. Chunks are typically 2-4 seconds in length. On the video codec level this typically means that each chunk is cut along video GOP boundaries (each chunk starts with a key frame) and has no dependencies on past or future chunks/GOPs. This allows every chunk to later be decoded completely independently from other chunks.

The encoded chunks are then hosted on a regular HTTP web server. A client requests the chunks from the web server in a linear fashion and downloads them using plain HTTP progressive download. As the chunks are downloaded to the client, the client plays back the sequence of chunks in linear order. Because the chunks were carefully encoded without any gaps or overlaps between them, the chunks play back as a seamless video.

The “adaptive” part of the solution comes into play when the video/audio source is encoded at multiple bitrates, generating multiple chunks of various sizes for each 2-4 seconds of video. The client now has the option to choose between chunks of different sizes. Because web servers usually deliver data as fast as network bandwidth allows them, the client can easily estimate user bandwidth and decide to download bigger or smaller chunks ahead of time. The size of the playback/download buffer is fully customizable.
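As a back-of-the-envelope illustration of that bandwidth estimate, here is a Python sketch that derives throughput from one chunk download and then picks the highest bitrate that fits. The function names and the 0.8 safety factor are my own assumptions; real players typically smooth the estimate over several chunks.

```python
def estimate_throughput_bps(chunk_bytes, download_seconds):
    """Throughput observed while downloading one chunk, in bits per second."""
    return chunk_bytes * 8 / download_seconds


def pick_bitrate(bitrates, throughput_bps, safety=0.8):
    """Highest bitrate within a safety margin of the throughput, else the lowest."""
    affordable = [b for b in sorted(bitrates) if b <= throughput_bps * safety]
    return affordable[-1] if affordable else min(bitrates)


available = [400000, 1000000, 3000000]

# A 500,000-byte chunk that arrived in 1 second suggests about 4 Mbps of bandwidth.
fast = pick_bitrate(available, estimate_throughput_bps(500_000, 1.0))

# A 100,000-byte chunk that took 2 seconds suggests only about 400 kbps.
slow = pick_bitrate(available, estimate_throughput_bps(100_000, 2.0))
print(fast, slow)
```

The safety margin is there because a chunk that merely keeps pace with playback leaves nothing to refill the buffer with.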

Adaptive streaming is a hybrid media delivery method


 

Adaptive streaming, like other forms of HTTP delivery, offers the following advantages to the content provider:

  • It’s cheaper to deploy because adaptive streaming can utilize any generic HTTP caches/proxies and doesn’t require specialized servers at every node
  • It offers better scalability and reach, reducing “last mile” issues because it can dynamically adapt to inferior network conditions as it gets closer to the user’s home
  • It lets the audience adapt to the content, rather than requiring the content providers to guess which bitrates are most likely to be accessible to their audience

It also offers the following benefits for the end user:

  • Fast start-up and seek times because start-up/seeking can be initiated on the lowest bitrate before moving up to a higher bitrate
  • No buffering, no disconnects, no playback stutter (as long as the user meets the minimum bitrate requirement)
  • Seamless bitrate switching based on network conditions and CPU capabilities
  • A generally consistent, smooth playback experience

 

Microsoft Adaptive Streaming Prototype: NBC Olympics 2008

We first prototyped an implementation of HTTP-based adaptive streaming as part of the NBC Olympics 2008 project. In order to deliver the desired level of quality in a short period of time, we took the most basic adaptive streaming implementation approach. We had NBC’s Digital Rapids and Anystream encoders produce multiple WMV files of different bitrates/resolutions for each source. The encoders didn’t employ any new encoding tricks but merely followed strict encoding guidelines (closed GOP, fixed length GOP, VC-1 entry point headers) which ensured exact frame alignment across the various bitrates of the same video. Then we ran the WMV files through a post-processing tool which physically split each WMV file into thousands of 2-second chunks (files). The rest of the solution consisted of simply uploading the chunks to Limelight’s origin web servers (running Apache) and then building a Silverlight player that would download the chunks and play them in sequence. Simple!

The good news: Our implementation worked great for the end users. We were able to offer a better-than-WMS streaming experience while using just simple HTTP download!

The bad news: CDN operators lost many hours (days?) managing millions of tiny files in their systems. Imagine: if every 2 seconds of video is split into a separate file and this is repeated for 5 available bitrates, you end up with 150 files for every minute of video. That’s 13,500 files for a 90-minute soccer game!
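The file counts above are easy to reproduce, and to rerun for other chunk lengths or bitrate ladders, with a tiny Python sketch. The function name is just for illustration.

```python
def chunk_file_count(duration_minutes, chunk_seconds=2, bitrates=5):
    """Number of chunk files when every chunk of every bitrate is its own file."""
    chunks_per_minute = 60 // chunk_seconds
    return duration_minutes * chunks_per_minute * bitrates


files_per_minute = chunk_file_count(1)
files_per_game = chunk_file_count(90)
print(files_per_minute, files_per_game)
```

With one contiguous file per bitrate, the same 90-minute game would need only 5 media files.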

So despite NBC Olympics being a huge success for Silverlight and HTTP-based adaptive streaming, it quickly became apparent we had to go back to the drawing board on this one.

 

At Last! Smooth Streaming!

The IIS Media team soon took charge of turning the NBC Olympics adaptive streaming solution into a real server product. Its official name – IIS7 Smooth Streaming, an extension for Internet Information Services 7.0.

The IIS Media team redesigned the content creation and delivery aspect of the prototype solution in order to fix the file management issues while still keeping all the advantages of the original solution. The new design eschewed the one-file-per-chunk approach in favor of a single contiguous file for each encoded bitrate. The file format of choice: MPEG-4.

Smooth Streaming server uses the ISO Base Media File Format (MPEG-4 Part 12, ISO/IEC 14496-12), on which the MP4 file format is based, as its disk (storage) and wire (transport) format. Specifically, the Smooth Streaming specification defines each chunk/GOP as an MPEG-4 Movie Fragment (moof) and stores it within a contiguous MP4 file for easy random access. One MP4 file is expected for each bitrate. When the client requests a specific source time segment from the IIS server, the server dynamically finds the appropriate Movie Fragment box within the contiguous MP4 file and sends it over the wire as a standalone file, thus ensuring full cacheability downstream.

In other words, with Smooth Streaming file chunks are created virtually upon client request, but the actual video is stored on disk as a single full-length file per encoded bitrate.

 

Smooth Streaming Availability

Smooth Streaming server support will ship as part of the next edition of IIS7 Media Pack, a free download for Windows Server 2008. A technology preview of IIS7 Smooth Streaming Server is already available now through Akamai. They are calling this service Akamai AdaptiveEdge Streaming for Microsoft Silverlight. A demo of the service is available at http://www.smoothhd.com.

On the content creation end, creation of on-demand Smooth Streaming-compatible video is already possible with the latest Expression Encoder 2 SP1. Note that you’ll need to purchase the full version of Expression Encoder 2 in order to get Smooth Streaming encoding support – it’s not included in the “Express” trial version. As a helper tool for encoding to multiple-bitrate formats such as Smooth Streaming, I recommend my Smooth Streaming Calculator. (More about Smooth Streaming encoding with Expression Encoder to come soon.)

In addition, we are already working with a number of encoding ISVs on enabling support for the Smooth Streaming format in their professional encoding products.

 

Smooth Streaming Playback:

You probably already know that Silverlight 2 supports playback of Smooth Streaming sources (if you don’t, go to http://www.smoothhd.com). But how does it do it?

Despite popular belief, Silverlight doesn’t actually feature native support for any particular adaptive streaming technology – Microsoft’s, Netflix’s or Move Networks’ for example. Smooth Streaming support in Silverlight is implemented via the MediaStreamSource API. This API allows developers to implement their own media transport methods (instead of relying on MediaElement‘s native transport methods) while still leveraging Silverlight’s native decoders and renderers. In other words, Silverlight support for Smooth Streaming is provided entirely in .NET code: the parsing of the MPEG-4 file format, the HTTP download, the bitrate switching heuristics, etc. This allows developers to modify and fine-tune the client adaptive streaming code as needed, instead of waiting for the next Silverlight release and hoping it magically fixes every customer scenario.

The most challenging part of Smooth Streaming Silverlight client development is the heuristics module which determines when and how to switch bitrates. Elementary stream switching functionality requires the ability to swiftly adapt to changing network conditions while never falling too far behind, but that’s often not enough to deliver a great experience. One must also consider: What if the user has enough bandwidth but doesn’t have enough CPU power to consume the high bitrates/resolutions? What happens when the video is paused or hidden in the background (e.g. minimized browser window)? What if the resolution of the best available video stream is actually larger than the screen resolution, thus wasting bandwidth? How large should the download buffer window be? How does one ensure seamless rollover to new media assets such as ads? As any web application developer will tell you, there’s much more to building a good player than just setting a source URL for the media element.

Fortunately for those who prefer not to write such code from scratch, there are already two options available for adding Smooth Streaming support to your Silverlight application:

  1. Expression Encoder 2 SP1 templates
    Every Silverlight 2 player template included with Expression Encoder 2 SP1 includes a ready Smooth Streaming module as well as complete source code (which can be modified and used freely). The Smooth Streaming object (named AdaptiveStreaming.dll) can be easily integrated into any Silverlight project. See James Clarke’s blog for additional Expression Encoder tips & tricks.
  2. Open Video Player (OVP)
    The Akamai-led Open Video Player Initiative is an open-source community project that strives to provide a best-of-breed video player platform for Silverlight and Flash. The Silverlight version of the Open Video Player provides integrated support for Smooth Streaming playback, and is in fact the video player used by Akamai on SmoothHD.com and many of their customer sites.

  These templates provide great out-of-the-box Smooth Streaming experiences while also allowing developers to continue innovating and fine-tuning the client code.

 

In my next blog post:  The Smooth Streaming Format

Posted in Expression Encoder, Internet Information Services, Olympics, Silverlight, Smooth Streaming

A Brief History of Multi-Bitrate Streaming

It’s been a while since I last posted. Probably the most significant news from Microsoft regarding Silverlight media since IBC (when we announced H.264/AAC support in Silverlight 3) has been on the IIS and Expression fronts. The Expression Encoder team released Service Pack 1 for Encoder 2 in October, which really should’ve been called a 2.5 release considering all the new feature additions: Silverlight 2 templates; simple H.264/AAC encoding for devices; publishing to servers via WebDAV; and most importantly, on-demand encoding to the new Smooth Streaming format. Read more about the SP1 release on Expression Encoder Team’s and James Clarke’s blogs.

The IIS Media team, meanwhile, released IIS Media Pack 1.0, a free download for IIS7 (Windows Server 2008) which adds media features to IIS. The first media pack release features bitrate throttling and web playlists. But the big news came on October 28th at Digital Hollywood when we announced that Media Services 2.0 for IIS7 will feature a new HTTP-based adaptive streaming technology named Smooth Streaming. In order to promote the new technology we launched a showcase website in partnership with Akamai Technologies – SmoothHD.com. Akamai will also offer the first media delivery service based on Smooth Streaming, named Akamai AdaptiveEdge Streaming for Microsoft Silverlight. The better the technology, the longer the product name. 🙂

For more details on IIS media extensions, visit Chris Knowlton’s, Vishal Sood’s, and John Bocharov’s excellent blogs.

So with all this talk of adaptive streaming going on lately, it’s highly likely that you find yourself a little lost and confused. What is adaptive streaming? What is Smooth Streaming? Is it related to MBR streaming? Is it backwards compatible with Windows Media? Does it work with Windows Media Services? Is it supported by Silverlight? Where does Move Networks fit into all this?

So before I dive into all the technical details of the new technology, let’s take a step back first and take a look at Microsoft’s history of multi-bitrate streaming and how we got to Smooth Streaming.

Adaptive Streaming Confusion

 

The first effort to adapt streams to client conditions was called “stream thinning”, which Microsoft introduced as part of NetShow Services 3.0 and Windows Media Player 6.1 in 1998. Stream thinning automatically detected deteriorating network conditions and decreased the video frame rate in response. In dire network conditions the client could even suspend video playback entirely and stream only audio.

The earliest form of multiple bit rate streaming from Microsoft was introduced in 1999, as part of Windows Media Technologies 4.0 (in Windows NT 4.0) together with Windows Media Player 6.4. The ASF file format allows storage of multiple video and audio tracks inside a single file, and the Windows Media streaming protocols support switching streams during broadcast. This technology is most commonly referred to as Multiple Bit Rate ASF, or simply MBR.

In 2002 Microsoft released the Windows Media 9 Series products. Windows Media Services 9 (in Windows Server 2003) and Windows Media Player 9 introduced an improved MBR technology dubbed Intelligent Streaming. Intelligent Streaming combined bandwidth detection, stream thinning, MBR ASF, and better image handling in Windows Media Player. Intelligent Streaming, of course, still required the media to be encoded as MBR ASF files with a tool such as Windows Media Encoder 9.

While the technology itself was well designed, its implementations suffered from numerous shortcomings. It was limited to streaming only (no progressive download), and only from Windows Media servers. The encoders never required the multiple video streams to be temporally aligned, let alone keyframe aligned, which made switching between streams difficult to do in a seamless fashion. Because the media was streamed, and streaming protocols function at constant rates, it was almost impossible to accurately predict overall client bandwidth – particularly in a timely fashion. By the time poor network conditions were detected, it was usually already too late – the player often went through several iterations of re-buffering before finally downgrading the bitrate. WMP’s heuristics code never fully lived up to its potential. MBR streaming worked great for some people, and less than stellar for others.

Meanwhile… One of the trends that emerged in the streaming media industry over the past several years has been a steady shift away from classic streaming protocols (RTSP, MMS, RTMP, etc.) and back towards plain HTTP download. Need proof? Just visit YouTube.

Why is that? There are several strong reasons for this industry trend:

  • Web download services are typically cheaper than media streaming services offered by CDNs and hosting providers.
  • Media protocols often run into trouble getting around firewalls and routers because they are commonly based on UDP sockets over unusual port numbers. HTTP-based media delivery has no such problems because every firewall and router knows to let through HTTP downloads over port 80.
  • HTTP delivery doesn’t require special proxies or caches. Any web cache will do.
  • It’s much easier and cheaper to move HTTP data to the edge of the network, closer to the end users.

Even though streaming protocols were designed with media delivery in mind, the fact of the matter is that the Internet was built on HTTP and optimized for HTTP delivery. So instead of trying to adapt the entire Internet to streaming protocols – why not just adapt media delivery to the Internet instead?

Move Networks, a strategic Microsoft Silverlight partner, seized the opportunity early on by providing exceptional end-to-end HTTP-based media delivery services, and pioneering a new hybrid form of media streaming called adaptive streaming. They proved again and again in 2008 that HTTP-based media delivery can be done successfully on a large scale – both on-demand (ABC, ESPN, Discovery, etc.) and live (Democratic National Convention). Microsoft demonstrated this again during the NBC Olympics, when it prototyped its own HTTP-based adaptive streaming solution.

So what exactly is adaptive streaming and how is it different than classic streaming? Stay tuned for my next blog post. 🙂

Posted in Expression Encoder, Internet Information Services, Silverlight, Smooth Streaming | 6 Comments

H.264 and AAC support coming in Silverlight v.Next

I’m on a mini pre-IBC vacation this week so I was caught a little off-guard when I noticed that the big IBC announcement that we had been working on for months now – went out this morning. I didn’t expect it’d go out before Thursday or Friday. 🙂

Anyway, here’s the big news:

We will be adding support for H.264 and AAC-LC decoding to the next version of Silverlight (post v2). This is in response to the loud and clear customer demand for H.264/AAC that we’ve been hearing since Silverlight 1.0, the general convergence of the video industry around H.264, and a continuation of Microsoft’s own investment into the MPEG-4 standard.

Here’s the official Microsoft press release, framed as Q&A with Silverlight’s Scott Guthrie:

http://www.microsoft.com/presspass/features/2008/sep08/09-09silverlight.mspx

The Q&A does a very good job of answering some of the questions regarding the seemingly complex relationship between H.264, VC-1 and Windows Media, but one answer in particular bears repeating:

Addition of H.264 support in Silverlight does not mark a departure from VC-1 or Windows Media, but instead serves to enhance and expand the existing video/audio format ecosystem. The idea is to give users more choice and allow Silverlight to adapt to their existing workflows, rather than forcing it the other way around. I’ve seen too many people over the past few years get bogged down in “VC-1 vs H.264 codec wars” and I always found such obsessions to be very counterproductive. Both codecs are efficient enough to deliver excellent video quality at similar bitrates, so the question of which one to use should really be answered with “whichever one best fits your workflow and project constraints.” By adding H.264 support to Silverlight, we hope to get to a point where Silverlight is codec agnostic and customers can spend their valuable time focusing on end-to-end media delivery.

We will be showing a technology preview of H.264/AAC playback at IBC in Amsterdam (September 12-16), for which we partnered with Inlet Technologies to produce the demo content. The exact technical details of “what” and “how” are still being worked out, but one thing I can say for sure is that MP4 file progressive download will definitely be supported.


Posted in H.264, Silverlight | 5 Comments

DNC powered by Silverlight and Move

After all the Olympics madness I nearly forgot the “other” big Silverlight video event happening right now: The Democratic National Convention.

Check it out at:

http://gallery1.demconvention.com/

The DNC website is providing live streaming HD video coverage of the convention using Silverlight 2 Beta 2 and Move Networks plugins. This is a great example of Move providing their own adaptive streaming technology and integrating it with Silverlight. I don’t have the exact encoding specs available, but it looks like video is being streamed in true HD – at least 720p in full screen, as far as I can tell. It’s really amazing video quality – and it’s live!

Posted in Silverlight | Comments Off on DNC powered by Silverlight and Move

An inside look at NBC Olympics video player

It’s the second week of the Beijing 2008 Olympics and though the press coverage of the NBC Olympics website has been more than thorough, one thing that hasn’t been fully explained is – what exactly are you watching when exploring the different parts of the NBCO video player – and what kind of quality should you expect anyway?

Let’s begin by explaining the 2 video player user interfaces and the plugins that power each.

User Interface

The NBC Olympics video player is available in 2 flavors: Standard and Enhanced. The Standard player UI is what you get when you first launch the video player. The Enhanced player UI is what you get when you click on the “Enhanced” button in the lower right corner of the Standard player.

The Standard player has a video rectangle of size 592×336 (roughly a 16:9 aspect ratio) and can be experienced with either WMP or Silverlight plugins. As explained in earlier posts, if you are running Windows OS + Internet Explorer or Firefox browser + WMP9 or better (ideally WMP11), you can choose to use the WMP plugin instead of the Silverlight plugin to view video by choosing “Watch without Plugin” when prompted to install Silverlight. The video streams available in the Standard player are identical regardless of whether you’re using WMP or Silverlight. The bitrate of those streams never exceeds 650 kbps in the Standard player.

The Enhanced player is only available to those who have installed the Silverlight plugin. It provides a more interactive experience and features a larger video window, as well as higher resolution and higher bitrate video streams (for some content). The video rectangle is 848×480 (also roughly 16:9 aspect ratio).

Video and Audio Codecs

All video on the NBC Olympics website is encoded as VC-1 Advanced Profile in CBR mode at various bitrates (described below).

All audio is encoded as WMA 10 Professional audio at 48 kbps, 44.1 kHz, stereo. The special Low Bitrate (LBR) mode of the WMA Professional codec offers improved fidelity over the more commonly used WMA Standard codec and is comparable with HE-AAC quality.

Content Categories

The content is generally divided into 2 categories: Live/Rewind and Highlights/Encore.

Live video (and its archived counterpart Rewind) is encoded on site in Beijing, then beamed back to New York and distributed to homes via CDNs. It comes in 2 bitrates and sizes:

  • 592×336 at 600 kbps
  • 320×176 at 300 kbps

Higher bitrates aren’t offered for Live streams because NBC’s link from Beijing to New York has a fixed bandwidth and needs to sustain many simultaneous live streams (1 Mbps per event, and there can be as many as 30 events happening at the same time). In addition, delivering more than 1 Mbps of video around the world without losing packets all over the place or running into last-mile bottlenecks is still incredibly difficult, even in 2008.

Highlights/Encore video is content produced and encoded by NBC in New York. It typically features highlights, previews, recaps, interviews – so generally anything that’s not a full rewind of an event. It comes in 4 bitrates:

  • 320×176 at 350 kbps
  • 424×240 at 600 kbps
  • 592×336 at 1050 kbps
  • 848×480 at 1450 kbps

As mentioned above, the higher bitrates are only available in the Silverlight-exclusive Enhanced player interface. The Standard player is only able to consume the first 2 lower-bitrate streams.

In addition to all the bitrates and resolutions mentioned above, all content is available for thumbnail-sized Picture-In-Picture viewing. PiP video is always encoded as 128×96 at 50 kbps and half the source framerate.

This means that the minimum bandwidth needed to view the highest quality video + PiP is 1550 kbps (1450 video + 48 audio + 50 PiP) in perfect conditions. In reality, you probably need at least 100 kbps overhead on top of that in order to compensate for Internet unreliability.
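To make the arithmetic above easy to re-run for the other streams, here is a minimal Python sketch. The bitrates are the ones quoted in this post; the function name and its parameters are made up for illustration and are not part of any player code.

```python
# Bitrates in kbps, taken from the figures quoted in this post.
AUDIO_KBPS = 48
PIP_KBPS = 50
HIGHLIGHTS_VIDEO_KBPS = [350, 600, 1050, 1450]

def required_kbps(video_kbps, with_pip=True, headroom_kbps=0):
    """Total bandwidth for one video stream plus audio, optional PiP and headroom."""
    total = video_kbps + AUDIO_KBPS
    if with_pip:
        total += PIP_KBPS
    return total + headroom_kbps

top = max(HIGHLIGHTS_VIDEO_KBPS)
print(required_kbps(top))                     # 1548, i.e. roughly 1550 kbps
print(required_kbps(top, headroom_kbps=100))  # 1648 with 100 kbps of headroom
```

The same function with `with_pip=False` gives 648 kbps for a 600 kbps video stream, which matches the roughly 650 kbps ceiling quoted for the Standard player.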

Much of the press coverage of the NBC Olympics website has referred to the video content as being “HD quality.” The definition of “HD” for television has always been pretty clear: you need at least 1280×720 to call something “HD.” Unfortunately, the definition of HD video on the web has been far more ambiguous. It’s the YouTube effect. Once you get used to watching 320×240 poorly compressed video for so long, anything above that suddenly starts looking like Digital Cinema. 🙂 Whether or not you choose to think of 848×480 video as HD is up to you. I personally wouldn’t, but then again – it’s my job to be nitpicky about video quality.

Streaming Methods

Finally, there’s the actual delivery of the content. Two basic methods of streaming are used on the NBC Olympics website.

All Live video – regardless of which plugin is consuming it – is streamed via WMS HTTP streaming protocol from Windows Server 2008 servers running Windows Media Services. The same streaming method is also used for all delivery to the WMP plugin. If you’re using the WMP plugin, you always get the WMS stream, regardless of content type.

As mentioned before, the Silverlight-powered Enhanced player has several features that make it a superior experience to the Standard player and WMP plugin. One of them is its ability to seamlessly switch between streams of different bitrates and resolutions during playback to dynamically match the user’s bandwidth and CPU power. This feature, often referred to generically as Adaptive Streaming, is something that Microsoft developed for NBC based on Silverlight 2’s MediaStreamSource interface. NBC’s website does not utilize the Move Networks adaptive streaming technology, as has been widely rumored. Silverlight 2 supports hooks to multiple adaptive streaming approaches, including Move’s – but in this particular case Microsoft provided the solution.

The easiest way to recognize that you’re watching an adaptively streamed video while in the Enhanced player is by seeking to another point in the video. If the player is using adaptive streaming, you will see the video start up very quickly without a buffering notification and the resolution will briefly drop. After a few seconds the blurry video will get sharper, and then sharper again… and then sharper again, your bandwidth allowing, of course.
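The "start low, then get sharper" behavior can be illustrated with a small Python sketch. To be clear, this is not the heuristics code Microsoft built for NBC, which isn't described here; it is a hypothetical illustration that assumes the player picks from the four Highlights/Encore bitrates and moves up one step at a time. The `safety` margin and both function names are invented for the example.

```python
# Video bitrates (kbps) of the Highlights/Encore streams listed in this post.
LADDER_KBPS = [350, 600, 1050, 1450]
AUDIO_KBPS = 48

def pick_bitrate(measured_kbps, safety=0.8):
    """Return the highest video bitrate that fits a fraction of measured bandwidth.

    `safety` is a made-up margin, not a documented player setting.
    """
    budget = measured_kbps * safety - AUDIO_KBPS
    fitting = [b for b in LADDER_KBPS if b <= budget]
    return max(fitting) if fitting else min(LADDER_KBPS)

def step_up(current_kbps, measured_kbps):
    """Drop immediately if needed, but climb at most one step per call.

    `current_kbps` must be one of the values in LADDER_KBPS.
    """
    target = pick_bitrate(measured_kbps)
    if target <= current_kbps:
        return target
    return LADDER_KBPS[LADDER_KBPS.index(current_kbps) + 1]

# After a seek: start at the lowest bitrate and climb while bandwidth allows.
rate = min(LADDER_KBPS)
for _ in range(4):
    print(rate)
    rate = step_up(rate, measured_kbps=2000)
```

With a steady 2000 kbps connection the loop prints 350, 600, 1050 and 1450 in turn, which mirrors the blurry-then-sharper progression described above. A real player would also factor in CPU load and buffer state, which this sketch ignores.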

Summary – Setting expectations

Here’s the best quality you can expect for NBC Olympics video, as well as minimum requirements:

  • Live/Rewind content:
    • Either plugin: 592×336 at 650 kbps
  • Highlights/Encore content:
    • WMP plugin:  424×240 at 650 kbps
    • Silverlight plugin and Enhanced player:  848×480 at 1500 kbps

Posted in Olympics, Silverlight | 14 Comments

Moonlight 2.0 – Help wanted

Jason Perlow of ZDNet wrote an article very supportive of Silverlight and its handling of the NBC Olympics website, but that’s not the reason why I bring it up here. Jason points out something perhaps less obvious but far more interesting that bears repeating:

We need Silverlight on Linux – and we need your help to make it happen.

Jason writes:

Yeah, it would have been nice to be able to watch the Olympics event playbacks and live feeds on Linux using Moonlight.  But right now, Moonlight only supports Silverlight 1.0 apps, and NBCOlympics.com is implemented using 2.0. As Novell’s chief Mono/Moonlight developer, Miguel de Icaza told me several weeks ago before the NBCOlympics content launch, “Work on this has started, but it will take a lot of work. And sadly, there are very few people willing to contribute to make this happen on time.”

That’s incredibly disappointing to hear, because here’s just a sample of the type of feedback I’ve been seeing from Linux users regarding NBC’s use of Silverlight:

“It’s infuriating to be summarily left out just because I choose to use a superior OS, Linux, instead of the crap M$ puts out. Oh well, I guess NBC doesn’t care how many viewers–and, yes, we ARE viewers as well, not just people online–they’re alienating by their idiotic decision to go with a Micro$oft only application.”  – from a comment on an LA Times blog

“I triple boot. XP, Vista and Ubuntu. I refuse to boot into XP or Vista to watch this online. If they don’t care about Linux users then I will return the favor and find alternatives.” – from a comment on Digg

“Nevertheless, NBC’s official stance is to support Internet Explorer and Firefox for Windows and the Mac, but there is no Linux support. This seems absolutely foolish. How hard is it cater to users of Firefox on Linux?” – from a blog post on OStatic

So on one hand we have Linux users who are infuriated because they feel left out, and on the other hand we have a statement from the Moonlight project leader indicating there is not enough interest and support in the OSS community to deliver a solution. What’s going on here? Some might say it’s just a case of Linux users growing too complacent, or use this as an example of a counterproductive anti-Microsoft bias in the OSS community – but an article on the not-very-subtly named Boycott Novell site hints there might be more to it than meets the eye. As it turns out, the OSS community doesn’t seem to like Novell very much either.

My own opinion is: if you want it, help make it happen. Miguel is waiting for your e-mail. 

I’d love to hear everyone’s opinion on this, especially from Linux users and active OSS contributors.

Posted in Linux, Silverlight | 12 Comments

Why no full screen mode in the NBC Olympics player?

A very spirited discussion of the pros and cons of the NBCO player’s (lack of) full screen mode is taking place over on the Silverlight forums:

http://silverlight.net/forums/p/22318/80644.aspx

Microsoft’s Tom Taylor has provided some context and explanation for the controversial design decision.

Posted in Olympics, Silverlight | Comments Off on Why no full screen mode in the NBC Olympics player?

NBC Olympics video without Silverlight?

There have been a lot of rumors on the Internet regarding the NBC Olympics website’s plug-in requirements and OS/browser support, so I thought I’d shed some more light on exactly what is and isn’t supported.

The NBCO website specifically lists the plug-ins required for experiencing all sections of the website: http://www.nbcolympics.com/pluginsneeded.html

As you can see, only Flash is a requirement for access to the main (non-video) site. To access the video content on the website, Silverlight 2 Beta 2 plug-in or Windows Media Player are required. The various Silverlight supported platforms were outlined in my previous post, but what about WMP support?

WMP “fallback” mode was a key part of the NBC Olympics player design from the very start. Because the Silverlight 1 and 2 media pipeline is built on top of the Windows Media format (ASF) and codecs (VC-1, WMV8, WMV7, WMA9 Standard & Pro, MP3), any media content produced for Silverlight is also backwards compatible with WMP – so it only made sense to re-use the same streams for WMP as a “fallback” option in case certain users didn’t wish to install Silverlight 2 (after all, it is a beta) or simply couldn’t install it due to non-admin restrictions or due to being on an unsupported Windows OS.

Now, I know some may immediately ask, “Well, if you wanted to provide a fallback option, why not just use Flash as an alternative?” All business politics aside, it’s important to understand that creating all the content in duplicate (and trust me, there’s A LOT of content being produced for these Olympic games) would’ve been extremely inefficient with regards to both time and cost. Not only would all content need to be encoded twice, but the bandwidth of NBC’s direct link from Beijing to New York would need to be doubled, and NBC would need to deploy twice the number of encoders and servers, etc, etc, etc. Supporting Flash would’ve also doubled the engineering cost of designing, implementing and testing the video player application. Speaking from a purely engineering perspective, I can say that getting this project off the ground and to this final stage was an incredibly ambitious undertaking with just Windows Media alone. Supporting a whole additional set of formats, codecs and RIA technology would’ve been nothing short of impossible.

But back to WMP support…

At the moment WMP support is limited to Windows OS only, and Internet Explorer and Firefox browsers only. The good news is that any Windows OS running WMP9 or later ought to work – and that includes even ancient Windows 98SE, Millennium and 2000 systems. However, if using the WMP fallback mode, it is definitely recommended that you use WMP11 (XP, Vista and WS2008) for optimal video and audio playback quality. If you are using WMP9 or WMP10, upon visiting the NBC Olympics video player page for the first time you may be prompted with an ActiveX security dialog asking you to install “wvc1dmo.cab” or “Windows Media Audio Codec.” It is safe to install these updates – and actually required to make the NBCO video player work with WMP9/WMP10. If you don’t get any security dialog prompt, but can’t see any video either – your browser’s security settings might be blocking the ActiveX install prompts. In that case you can install the necessary video codec update manually from http://support.microsoft.com/kb/942423.

It has also been suggested that Mac PPC users might be able to get the WMP fallback solution working for them by installing Flip4Mac WMV. Unfortunately, at the time of this writing that solution is not working, partially due to an apparent incompatibility between the NBCO player JavaScript code and Flip4Mac/Safari/Firefox. If this changes at any point, I’ll make sure to post about it immediately.

Finally, for those who do have a choice of installing Silverlight instead of using WMP, what advantage does Silverlight bring to the table? For starters, all WMP-targeted video streams are limited to 650 kbps, whereas the Silverlight plug-in can take advantage of higher-bitrate and higher-resolution video streams, all the way up to 1500 kbps. Furthermore, all WMP playback is single-bitrate only with no dynamic/adaptive stream-switching capability. The Silverlight-based player, on the other hand, can use adaptive streaming (dynamic bitrate switching) for most NBCO content that’s not Live or Rewind. So the short answer to the question of what advantage Silverlight has over WMP is: better video quality and more reliable streaming methods.

Posted in Olympics, Silverlight | 8 Comments

NBC Olympics 24/7

After more than 8 months of planning and development, full NBC Olympics online video coverage is underway!

Anyone living in the U.S. can watch live and archived streaming video of nearly every Olympic event for free by visiting:

http://www.nbcolympics.com/video/index.html

Only United States residents have access to NBC’s video streams because NBC owns the broadcast rights to the 2008 Summer Olympics only for the United States. The International Olympic Committee sells broadcast rights to only one broadcaster per country, so if you’re in Canada – you have to watch CBC; if you’re in the UK – you have to watch the BBC, and so on. I’m sure NBC wouldn’t have minded selling ads globally if the IOC had let them. 😉

The NBC Olympics website doesn’t require (as some news reports and blogs have stated) Microsoft Silverlight to run, but is optimized for it. This is even stated on the NBC Olympics website. The Silverlight 2 Beta 2 plugin needed for the optimal (and intended) rich web experience is a minimal download and a very light-weight install. And despite being dubbed a “Beta 2,” it’s actually easily one of the most stable Microsoft products I’ve seen in years. Here’s the list of Silverlight 2 supported operating systems and browsers:

  • Windows Vista: Internet Explorer 7 or better, Firefox 1.5 or better
  • Windows XP SP2 or SP3: Internet Explorer 6 or better, Firefox 1.5 or better
  • Mac OS X 10.4.8+ (Intel only): Firefox 1.5 or better, Safari 2 or better

Microsoft’s Silverlight website also mentions the following platforms as being supported by Silverlight 2, but I guess NBC isn’t explicitly supporting them, so proceed at your own risk: Windows 2000 w/ IE6, Windows Server 2003 w/ IE6+ or Firefox 1.5+. And if Silverlight 2 works on Vista, one would also assume it works on Windows Server 2008 too.

A list of Frequently Asked Questions is also available on the NBC Olympics Video website.

Finally, here are a few articles that describe the project and showcase its highlights:

http://news.cnet.com/8301-13860_3-10002909-56.html
http://seattlepi.nwsource.com/business/373888_msftoly07.html?source=mypi

MSDN’s Channel 9 also has a 20-minute interview with fellow Silverlight evangelist Eric Schmidt, who talks about the various features of the NBCO video player and what it took to pull off this massive effort:

http://channel9.msdn.com/shows/Continuum/Building-NBCOlympicscom-with-Silverlight/

Posted in Olympics, Silverlight | 1 Comment