The Birth of Smooth Streaming

In my last post I talked about the history of multi-bitrate streaming and how we got from RTSP and HTTP streaming back to HTTP download as the primary media web distribution mechanism. In this post we’ll take a closer look at how adaptive streaming differs from traditional streaming (i.e. RTSP) and plain progressive download.

Traditional Streaming

Let’s start by first taking a look at RTSP as an example of a traditional streaming protocol. RTSP is defined as a stateful protocol. This means that from the first time a client connects to the streaming server until the time it disconnects from the streaming server, the server keeps track of the client’s state. The client communicates its state to the server by issuing commands such as PLAY, PAUSE or TEARDOWN (the first two are self-explanatory; the last one is used to disconnect from the server and close the streaming session).

Once a session between the client and the server has been established, the server begins sending down the media as a steady stream of small packets (the format of these packets is known as RTP). The size of a typical RTP packet is 1452 bytes, which means that in a video stream encoded at 1 Mbps each packet carries only about 11 ms of video. In RTSP the packets can be transmitted over either UDP or TCP transports – the latter is preferred in cases where firewalls or proxies block UDP packets, but it can also lead to increased latency (lost TCP packets are re-sent until they are received).
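The 11 ms figure is simple arithmetic: bits per packet divided by bits per second. A minimal Python sketch, assuming all 1452 bytes count as media payload (RTP, UDP and IP header overhead is ignored, so the true figure is slightly lower):

```python
def packet_duration_ms(packet_bytes: int, bitrate_bps: float) -> float:
    """Milliseconds of media carried by one packet at a constant bitrate."""
    return packet_bytes * 8 / bitrate_bps * 1000


def packets_per_second(packet_bytes: int, bitrate_bps: float) -> float:
    """Packets the server must send each second to keep pace with real time."""
    return bitrate_bps / (packet_bytes * 8)


# 1452-byte packets carrying a 1 Mbps video stream (header overhead ignored)
print(f"{packet_duration_ms(1452, 1_000_000):.1f} ms per packet")  # 11.6 ms per packet
print(f"{packets_per_second(1452, 1_000_000):.0f} packets per second")  # 86 packets per second
```

In other words, a single 1 Mbps stream means the server emits roughly 86 packets every second, for every connected client.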

RTSP is an example of a traditional streaming protocol

HTTP, on the other hand, is a stateless protocol. If an HTTP client requests some data, the server will respond by sending down the data, but it won’t remember the client or its state. Every HTTP request is handled as a completely standalone, one-time session.

Windows Media Services supports streaming over both RTSP and HTTP. Now, you may ask yourself, “But if HTTP is a stateless protocol, how can it be used for streaming?” WMS uses a modified version of HTTP known as MS-WMSP, which uses standard HTTP for transfer of data and messages but also maintains session states, thus effectively turning it into a streaming protocol like RTSP/TCP. Windows Media Services has also supported RTSP streaming since 2003 (9 Series), over both UDP and TCP. Its implementation of the protocol is publicly documented as MS-RTSP.

The important things to remember about traditional streaming protocols like RTSP and WMS-HTTP are:

  1. The server sends the data packets to the client at a real-time rate only - that is, the bit rate at which the media is encoded (e.g. a 500 kbps encoded video is streamed to the client at approx. 500 kbps)
  2. The server only sends ahead enough data packets to fill the client buffer. The client buffer is typically between 1 and 10 seconds (WMP and Silverlight default buffer length is 5 seconds). This means that if you pause a streamed video and wait 10 minutes – still only ~5 seconds of video will have downloaded to your client in that time.
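The two rules above can be modeled in a few lines. This is a toy model, not how Windows Media Services is implemented: time advances in one-second ticks, packetization and network delay are ignored, and the function names are made up for illustration.

```python
def server_send_budget(sent_s: float, played_s: float, buffer_s: float = 5.0, tick_s: float = 1.0) -> float:
    """Seconds of media the server may send during the next tick.

    Rule 1: real-time pacing, so at most tick_s of media per tick_s of wall clock.
    Rule 2: never send further ahead of the playhead than the client buffer.
    """
    room = max(0.0, played_s + buffer_s - sent_s)
    return min(tick_s, room)


def media_sent_while_paused(pause_s: int, buffer_s: float = 5.0) -> float:
    """Media delivered while the player sits paused at the very start."""
    sent = 0.0
    for _ in range(pause_s):  # one iteration per second of wall-clock time
        sent += server_send_budget(sent, played_s=0.0, buffer_s=buffer_s)
    return sent


print(media_sent_while_paused(600))  # 10 minutes paused -> 5.0 seconds of media
```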

Progressive Download

The other most common form of media delivery on the Web today is progressive download. Progressive download is nothing more than a plain, ordinary file download from an HTTP web server like IIS or Apache. It is supported by Silverlight, Flash, WMP, and nearly every other media player and platform under the sun. The term “progressive” likely stems from the fact that most player clients allow the media file to be played back while the download is still in progress - before the entire file has been fully written to disk (typically to the browser cache). Clients that support the HTTP/1.1 specification can also seek to positions in the media file that haven’t been downloaded yet by performing byte range requests to the web server (assuming it also supports HTTP/1.1).
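A seek in progressive download boils down to a Range header. The sketch below only builds the request with Python's standard urllib.request; it assumes a constant-bitrate file with a fixed-size header, which is a simplification (real players map time to bytes using the file's index), and the URL is a placeholder.

```python
import urllib.request


def seek_request(url: str, seek_s: float, bitrate_bps: float, header_bytes: int = 0) -> urllib.request.Request:
    """Build an HTTP/1.1 byte range request that starts near seek_s.

    Assumes a constant-bitrate file whose media data begins right after
    header_bytes of header; real players consult the file's index instead.
    """
    offset = header_bytes + int(seek_s * bitrate_bps / 8)
    # "bytes=N-" asks for everything from byte N to the end of the file
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})


req = seek_request("http://example.com/video.wmv", seek_s=120, bitrate_bps=500_000, header_bytes=4096)
print(req.get_header("Range"))  # bytes=7504096-
# urllib.request.urlopen(req) would send it; a range-capable server answers 206 Partial Content
```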

The most popular video sharing websites on the Web today almost exclusively use progressive download:  YouTube, Vimeo, MySpace, MSN Soapbox – and even the rather misnamed Silverlight Streaming Service. (See my previous blog posts for a list of reasons why HTTP download is becoming increasingly popular in the online delivery of media.)

Unlike streaming servers which rarely send more than 10 seconds of media data to the client at a time, HTTP web servers keep the data flowing until the download is complete. This leads to the user experience that we have by now grown very accustomed to thanks to YouTube – if you pause a YouTube video at the beginning of playback and wait, eventually the entire video will have downloaded to your browser cache, allowing you to smoothly play the whole video without any hiccups. There is a downside to this behavior as well – if 30 seconds into a fully downloaded 10-minute video you decide you don’t like it and quit the video, both you and your content provider have just wasted 9.5 minutes’ worth of bandwidth. In an effort to mitigate this problem, IIS7 Media Pack 1.0 provides a cool feature called Bit Rate Throttling which allows content providers to throttle the download bitrate in order to reduce costs. But that’s another story…
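To put numbers on the wasted-bandwidth scenario, here is a back-of-the-envelope model. The 20 Mbps connection and the 1.2x throttle ratio are illustrative assumptions; the throttle parameter is a generic rate cap and does not describe how IIS Bit Rate Throttling is actually configured.

```python
def delivered_media_s(elapsed_s, throughput_bps, bitrate_bps, duration_s, throttle=None):
    """Seconds of media downloaded after elapsed_s of wall-clock time.

    throttle: optional cap on the send rate, as a multiple of the encoded
    bitrate (1.2 = 20% faster than real time). None = send as fast as the
    connection allows, like a plain web server.
    """
    rate = throughput_bps if throttle is None else min(throughput_bps, throttle * bitrate_bps)
    return min(duration_s, elapsed_s * rate / bitrate_bps)


def wasted_mb(quit_after_s, throughput_bps, bitrate_bps, duration_s, throttle=None):
    """Megabytes downloaded but never watched when the viewer quits early."""
    got = delivered_media_s(quit_after_s, throughput_bps, bitrate_bps, duration_s, throttle)
    unwatched_s = max(0.0, got - quit_after_s)
    return unwatched_s * bitrate_bps / 8 / 1e6


# 10-minute, 1 Mbps video on a 20 Mbps connection; viewer quits after 30 seconds
print(f"{wasted_mb(30, 20e6, 1e6, 600):.2f} MB")                # 71.25 MB
print(f"{wasted_mb(30, 20e6, 1e6, 600, throttle=1.2):.2f} MB")  # 0.75 MB
```

Capping the send rate just above the encoded bitrate keeps the download ahead of playback while bounding what an early quit can waste.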

Adaptive Streaming

Speaking of misnomers… Here’s another one: Adaptive Streaming. Guess what? It’s not really streaming in the classic sense at all.

Adaptive streaming is really a hybrid delivery method. It acts like streaming but is in fact based on HTTP progressive download. A more technically accurate name for adaptive streaming might be “A Series of Progressive Downloads of Variable Sized Video Fragments,” but even the most determined marketing experts would have a hard time selling that one. :)

A very important thing to remember about adaptive streaming is that there doesn’t really exist a standard implementation for it today because it’s just an advanced download concept, rather than a new protocol. This is why we talk about both Microsoft Smooth Streaming and Move Networks Adaptive Stream as examples of adaptive streaming, even though they use mutually incompatible codecs, formats and encryption schemes. They both rely on HTTP as the transport protocol and perform the media download as a long series of very small progressive downloads, rather than one big progressive download.

In a prototypical adaptive streaming implementation, the video/audio source is cut up into many short segments (“chunks”) and encoded to the desired delivery format. Chunks are typically 2-4 seconds in length. On the video codec level this typically means that each chunk is cut along video GOP boundaries (each chunk starts with a key frame) and has no dependencies on past or future chunks/GOPs. This allows every chunk to later be decoded completely independently from other chunks.
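Here is a sketch of the cutting rule, working on frame metadata rather than an actual bitstream. The function name and the (timestamp, is_keyframe) representation are hypothetical; a real segmenter operates on the encoded file.

```python
from typing import List, Tuple

Frame = Tuple[float, bool]  # (timestamp in seconds, is_keyframe)


def chunk_at_keyframes(frames: List[Frame], target_s: float = 2.0) -> List[List[Frame]]:
    """Group frames into chunks of roughly target_s seconds.

    A chunk may only begin on a keyframe, so with closed GOPs every chunk
    can be decoded without looking at its neighbours.
    """
    chunks: List[List[Frame]] = []
    current: List[Frame] = []
    for ts, is_key in frames:
        if not current and not is_key:
            raise ValueError(f"chunk would start on a non-keyframe at t={ts}")
        # Close the current chunk at the first keyframe past the target length
        if is_key and current and ts - current[0][0] >= target_s:
            chunks.append(current)
            current = []
        current.append((ts, is_key))
    if current:
        chunks.append(current)
    return chunks


# 6 seconds of 30 fps video with a keyframe every second
frames = [(i / 30, i % 30 == 0) for i in range(180)]
chunks = chunk_at_keyframes(frames)
print([(len(c), c[0][0]) for c in chunks])  # [(60, 0.0), (60, 2.0), (60, 4.0)]
```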

The encoded chunks are then hosted on a regular HTTP web server. A client requests the chunks from the web server in a linear fashion and downloads them using plain HTTP progressive download. As the chunks are downloaded to the client, the client plays back the sequence of chunks in linear order. Because the chunks were carefully encoded without any gaps or overlaps between them, the chunks play back as a seamless video.
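The client side of this loop is equally plain. In the sketch below the URL layout is invented for illustration, and fetch is injected so the example runs without a network; in practice it would wrap an HTTP GET.

```python
import math
from typing import Callable, Iterator, List


def chunk_urls(base_url: str, bitrate_bps: int, duration_s: float, chunk_s: float = 2.0) -> List[str]:
    """Chunk URLs for one bitrate, in playback order.

    The '<base>/<bitrate>/chunk_NNNNN' naming is hypothetical; every
    adaptive streaming system defines its own URL scheme.
    """
    count = math.ceil(duration_s / chunk_s)
    return [f"{base_url}/{bitrate_bps}/chunk_{i:05d}" for i in range(count)]


def download_in_order(urls: List[str], fetch: Callable[[str], bytes]) -> Iterator[bytes]:
    """Fetch chunks strictly one after another so they can be played in sequence.

    fetch is any url -> bytes callable, e.g. a wrapper around urllib.request.urlopen.
    """
    for url in urls:
        yield fetch(url)


def fake_fetch(url: str) -> bytes:
    return url.encode()  # stand-in for a real HTTP GET


urls = chunk_urls("http://example.com/match", 1_500_000, duration_s=9)
print(urls[-1])  # http://example.com/match/1500000/chunk_00004
print(len(list(download_in_order(urls, fake_fetch))))  # 5
```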

The “adaptive” part of the solution comes into play when the video/audio source is encoded at multiple bitrates, generating multiple chunks of various sizes for each 2-4 seconds of video. The client now has the option to choose between chunks of different sizes. Because web servers usually deliver data as fast as network bandwidth allows them, the client can easily estimate user bandwidth and decide to download bigger or smaller chunks ahead of time. The size of the playback/download buffer is fully customizable.
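Bandwidth estimation falls out of timing each chunk download. A minimal sketch, assuming an exponentially weighted moving average and a 0.8 safety margin; both are my choices for illustration, not the heuristics Smooth Streaming actually ships with.

```python
from typing import Optional, Sequence


class BandwidthEstimator:
    """Exponentially weighted moving average of measured chunk throughput."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha  # weight given to the newest sample
        self.estimate_bps: Optional[float] = None

    def add_sample(self, chunk_bytes: int, seconds: float) -> float:
        sample = chunk_bytes * 8 / seconds
        if self.estimate_bps is None:
            self.estimate_bps = sample
        else:
            self.estimate_bps = self.alpha * sample + (1 - self.alpha) * self.estimate_bps
        return self.estimate_bps


def pick_bitrate(available_bps: Sequence[int], estimate_bps: float, safety: float = 0.8) -> int:
    """Highest bitrate within safety * estimate; the lowest one if nothing fits."""
    fitting = [b for b in available_bps if b <= safety * estimate_bps]
    return max(fitting) if fitting else min(available_bps)


ladder = [350_000, 700_000, 1_500_000, 2_500_000]
est = BandwidthEstimator()
print(pick_bitrate(ladder, est.add_sample(1_000_000, 2.0)))  # 4 Mbps measured -> 2500000
print(pick_bitrate(ladder, est.add_sample(1_000_000, 8.0)))  # dips to ~3.1 Mbps -> 1500000
```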

Adaptive streaming is a hybrid media delivery method

Adaptive streaming, like other forms of HTTP delivery, offers the following advantages to the content provider:

  • It’s cheaper to deploy because adaptive streaming can utilize any generic HTTP caches/proxies and doesn’t require specialized servers at every node
  • It offers better scalability and reach, reducing “last mile” issues because it can dynamically adapt to inferior network conditions as it gets closer to the user’s home
  • It lets the audience adapt to the content, rather than requiring the content providers to guess which bitrates are most likely to be accessible to their audience

It also offers the following benefits for the end user:

  • Fast start-up and seek times because start-up/seeking can be initiated on the lowest bitrate before moving up to a higher bitrate
  • No buffering, no disconnects, no playback stutter (as long as the user meets the minimum bitrate requirement)
  • Seamless bitrate switching based on network conditions and CPU capabilities
  • A generally consistent, smooth playback experience
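The fast start-up and seamless switching benefits come from how the client walks the bitrate ladder. A hypothetical policy: start on the lowest rung, climb one rung per chunk once the buffer is healthy, and drop straight down when bandwidth falls. The thresholds are arbitrary assumptions.

```python
from typing import Sequence


def next_level(current: int, ladder_bps: Sequence[int], estimate_bps: float, buffer_s: float,
               min_buffer_s: float = 4.0, safety: float = 0.8) -> int:
    """Index of the ladder rung to use for the next chunk (ladder sorted ascending).

    Falls to the best affordable rung at once when bandwidth drops, but climbs
    only one rung per chunk and only when the buffer holds at least min_buffer_s.
    """
    affordable = [i for i, b in enumerate(ladder_bps) if b <= safety * estimate_bps]
    best = affordable[-1] if affordable else 0
    if best < current:
        return best  # bandwidth dropped: switch down immediately to avoid a stall
    if best > current and buffer_s >= min_buffer_s:
        return current + 1  # healthy buffer: step up gradually
    return current


ladder = [350_000, 700_000, 1_500_000, 2_500_000]
level, history = 0, []  # start on the lowest rung for a fast start
for buffer_s in [0, 2, 4, 6, 8, 10]:  # buffer filling up chunk by chunk
    level = next_level(level, ladder, estimate_bps=4e6, buffer_s=buffer_s)
    history.append(level)
print(history)  # [0, 0, 1, 2, 3, 3]
```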

Microsoft Adaptive Streaming Prototype: NBC Olympics 2008

We first prototyped an implementation of HTTP-based adaptive streaming as part of the NBC Olympics 2008 project. In order to deliver the desired level of quality in a short period of time, we took the most basic adaptive streaming implementation approach. We had NBC’s Digital Rapids and Anystream encoders produce multiple WMV files of different bitrates/resolutions for each source. The encoders didn’t employ any new encoding tricks but merely followed strict encoding guidelines (closed GOP, fixed length GOP, VC-1 entry point headers) which ensured exact frame alignment across the various bitrates of the same video. Then we ran the WMV files through a post-processing tool which physically split each WMV file into thousands of 2-second chunks (files). The rest of the solution consisted of simply uploading the chunks to Limelight’s origin web servers (running Apache) and then building a Silverlight player that would download the chunks and play them in sequence. Simple!
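The point of the strict encoding guidelines is that keyframes land on exactly the same frames in every bitrate, so a chunk boundary in one file is a chunk boundary in all of them. A hypothetical check of that property, assuming keyframe positions have already been extracted as frame numbers:

```python
from typing import Dict, List, Optional


def find_misaligned(keyframes: Dict[str, List[int]]) -> Optional[str]:
    """Return the first rendition whose keyframe positions differ from the first
    rendition's, or None if every rendition lines up frame for frame.

    Positions are frame numbers, so exact comparison is safe.
    """
    names = list(keyframes)
    reference = keyframes[names[0]]
    for name in names[1:]:
        if keyframes[name] != reference:
            return name
    return None


# 30 fps, fixed 2-second closed GOPs -> a keyframe every 60 frames in every rendition
fixed = list(range(0, 600, 60))
good = {"3000k": fixed, "1500k": fixed, "600k": fixed}
# An encoder inserting an extra keyframe on a scene change breaks the alignment
bad = {"3000k": fixed, "1500k": fixed, "600k": sorted(fixed + [95])}
print(find_misaligned(good), find_misaligned(bad))  # None 600k
```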

The good news: Our implementation worked great for the end users. We were able to offer a better-than-WMS streaming experience while using just simple HTTP download!

The bad news: CDN operators lost many hours (days?) managing millions of tiny files in their systems. Imagine: if every 2 seconds of video is split into a separate file and this is repeated for 5 available bitrates, you end up with 150 files for every minute of video. That’s 13,500 files for a 90-minute soccer game!
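The file-count arithmetic, as a small function (assuming one file per chunk per bitrate and no separate audio files):

```python
import math


def chunk_file_count(minutes: float, chunk_s: float = 2.0, bitrates: int = 5) -> int:
    """Files produced when every chunk of every bitrate is its own file."""
    chunks_per_bitrate = math.ceil(minutes * 60 / chunk_s)
    return chunks_per_bitrate * bitrates


print(chunk_file_count(1))   # 150
print(chunk_file_count(90))  # 13500
print(chunk_file_count(90, chunk_s=4.0))  # 6750
```

By contrast, keeping one file per bitrate leaves you with 5 files no matter how long the game runs.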

So despite NBC Olympics being a huge success for Silverlight and HTTP-based adaptive streaming, it quickly became apparent we had to go back to the drawing board on this one.

At Last! Smooth Streaming!

The IIS Media team soon took charge of turning the NBC Olympics adaptive streaming solution into a real server product. Its official name – IIS7 Smooth Streaming, an extension for Internet Information Services 7.0.

The IIS Media team redesigned the content creation and delivery aspect of the prototype solution in order to fix the file management issues while still keeping all the advantages of the original solution. The new design eschewed the one-file-per-chunk approach in favor of a single contiguous file for each encoded bitrate. The file format of choice: MPEG-4.

Smooth Streaming server uses the ISO Base Media File Format (MPEG-4 Part 12, ISO/IEC 14496-12) as its disk (storage) and wire (transport) format. Specifically, the Smooth Streaming specification defines each chunk/GOP as an MPEG-4 Movie Fragment (moof) and stores it within a contiguous MP4 file for easy random access. One MP4 file is expected per bitrate. When the client requests a specific source time segment from the IIS server, the server dynamically finds the appropriate Movie Fragment box within the contiguous MP4 file and sends it over the wire as a standalone file, thus ensuring full cacheability downstream.
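Locating Movie Fragments inside such a file only requires walking the box headers: every ISO BMFF box starts with a 32-bit big-endian size and a four-character type. A minimal sketch using Python's struct module; the tiny sample file is fabricated, with empty dummy boxes standing in for real ones.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0, end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, offset, size) for each box at one level of an ISO BMFF file.

    Handles the 64-bit form (size == 1, real size follows the type) and
    size == 0 (box runs to the end of the data).
    """
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:
            size = end - pos
        if size < header:
            raise ValueError(f"corrupt box header at offset {pos}")
        yield box_type.decode("latin-1"), pos, size
        pos += size


def box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size field (used here only to fake a file)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Dummy file: header boxes, then two fragments, each a moof box followed by its mdat box
sample = (box(b"ftyp", b"isom\x00\x00\x00\x00") + box(b"moov")
          + box(b"moof") + box(b"mdat", bytes(16))
          + box(b"moof") + box(b"mdat", bytes(8)))
print([(t, off, size) for t, off, size in iter_boxes(sample) if t == "moof"])
# [('moof', 24, 8), ('moof', 56, 8)]
```

Note that a fragment on disk is a moof box followed by the mdat box holding its samples, so a server cutting out a fragment would return both.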

In other words, with Smooth Streaming file chunks are created virtually upon client request, but the actual video is stored on disk as a single full-length file per encoded bitrate.
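Conceptually, serving a "virtual" chunk is an index lookup plus a byte slice. The index below is hypothetical (this post doesn't describe how IIS builds or stores its own), and the start times are assumed to be in 100-nanosecond units.

```python
import bisect
from typing import List, NamedTuple


class Fragment(NamedTuple):
    start: int   # fragment start time, e.g. in 100-ns units
    offset: int  # byte offset of the fragment's moof box in the per-bitrate file
    size: int    # bytes spanning the moof box and its mdat box


def serve_fragment(file_bytes: bytes, index: List[Fragment], start_time: int) -> bytes:
    """Cut one fragment out of the full-length file on demand.

    Assumes the client asks for a fragment's exact start time; anything else
    is treated as "not found" (an HTTP server would answer 404).
    """
    starts = [f.start for f in index]  # index must be sorted by start time
    i = bisect.bisect_left(starts, start_time)
    if i == len(index) or index[i].start != start_time:
        raise KeyError(start_time)
    frag = index[i]
    return file_bytes[frag.offset:frag.offset + frag.size]


# Hypothetical 10-byte header followed by two fragments of 20 and 15 bytes
file_bytes = b"H" * 10 + b"A" * 20 + b"B" * 15
index = [Fragment(0, 10, 20), Fragment(20_000_000, 30, 15)]
print(serve_fragment(file_bytes, index, 20_000_000) == b"B" * 15)  # True
```

Because a given start time always maps to the same bytes, each fragment URL behaves like an ordinary static file to downstream caches.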

Smooth Streaming Availability

Smooth Streaming server support will ship as part of the next edition of IIS7 Media Pack, a free download for Windows Server 2008. A technology preview of IIS7 Smooth Streaming Server is already available now through Akamai. They are calling this service Akamai AdaptiveEdge Streaming for Microsoft Silverlight. A demo of the service is available at http://www.smoothhd.com.

On the content creation end, creation of on-demand Smooth Streaming-compatible video is already possible with the latest Expression Encoder 2 SP1. Note that you’ll need to purchase the full version of Expression Encoder 2 in order to get Smooth Streaming encoding support – it’s not included in the “Express” trial version. As a helper tool for encoding to multiple-bitrate formats such as Smooth Streaming, I recommend my Smooth Streaming Calculator. (More about Smooth Streaming encoding with Expression Encoder to come soon.)

In addition, we are already working with a number of encoding ISVs on enabling support for the Smooth Streaming format in their professional encoding products.

Smooth Streaming Playback

You probably already know that Silverlight 2 supports playback of Smooth Streaming sources (if you don’t, go to http://www.smoothhd.com). But how does it do it?

Contrary to popular belief, Silverlight doesn’t actually feature native support for any particular adaptive streaming technology – Microsoft’s, Netflix’s or Move Networks’ for example. Smooth Streaming support in Silverlight is implemented via the MediaStreamSource API. This API allows developers to implement their own media transport methods (instead of relying on MediaElement’s native transport methods) while still leveraging Silverlight’s native decoders and renderers. In other words, Silverlight support for Smooth Streaming is provided entirely in .NET code: the parsing of the MPEG-4 file format, the HTTP download, the bitrate switching heuristics, etc. This allows developers to modify and fine-tune the client adaptive streaming code as needed, instead of waiting for the next Silverlight release and hoping it magically fixes every customer scenario.

The most challenging part of Smooth Streaming Silverlight client development is the heuristics module which determines when and how to switch bitrates. Elementary stream switching functionality requires the ability to swiftly adapt to changing network conditions while never falling too far behind, but that’s often not enough to deliver a great experience. One must also consider: What if the user has enough bandwidth but doesn’t have enough CPU power to consume the high bitrates/resolutions? What happens when the video is paused or hidden in the background (e.g. a minimized browser window)? What if the resolution of the best available video stream is actually larger than the screen resolution, thus wasting bandwidth? How large should the download buffer window be? How does one ensure seamless rollover to new media assets such as ads? As any web application developer will tell you, there’s much more to building a good player than just setting a source URL for the media element.
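Here is a sketch of how a few of those questions can turn into rules. Everything in it is an assumption for illustration: the Rendition type, the 10% dropped-frame threshold, and comparing only heights against the screen. A real heuristics module would also have to handle paused or hidden players, buffer sizing and ad transitions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rendition:
    bitrate_bps: int
    height: int  # vertical resolution in pixels


def choose_rendition(renditions: List[Rendition], estimate_bps: float, screen_height: int,
                     dropped_frame_ratio: float = 0.0, current: Optional[Rendition] = None,
                     safety: float = 0.8, max_dropped: float = 0.1) -> Rendition:
    """Pick a rendition under bandwidth, screen-size and CPU constraints."""
    ladder = sorted(renditions, key=lambda r: r.bitrate_bps)
    ok = [r for r in ladder if r.bitrate_bps <= safety * estimate_bps]  # network budget
    ok = [r for r in ok if r.height <= screen_height]  # skip streams taller than the display
    if current is not None and dropped_frame_ratio > max_dropped:
        ok = [r for r in ok if r.bitrate_bps < current.bitrate_bps]  # CPU is falling behind
    return ok[-1] if ok else ladder[0]


ladder = [Rendition(350_000, 240), Rendition(700_000, 360), Rendition(1_500_000, 540),
          Rendition(2_500_000, 720), Rendition(4_000_000, 1080)]
pick = choose_rendition(ladder, estimate_bps=10e6, screen_height=720)
print(pick)  # Rendition(bitrate_bps=2500000, height=720)
print(choose_rendition(ladder, 10e6, 720, dropped_frame_ratio=0.25, current=pick))
# Rendition(bitrate_bps=1500000, height=540)
```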

Fortunately for those who prefer not to write such code from scratch, there are already two options available for adding Smooth Streaming support to your Silverlight application:

  1. Expression Encoder 2 SP1 templates
    Every Silverlight 2 player template included with Expression Encoder 2 SP1 includes a ready-made Smooth Streaming module as well as complete source code (which can be modified and used freely). The Smooth Streaming object (named AdaptiveStreaming.dll) can be easily integrated into any Silverlight project. See James Clarke’s blog for additional Expression Encoder tips & tricks.
  2. Open Video Player (OVP)
    The Akamai-led Open Video Player Initiative is an open-source community project that strives to provide a best-of-breed video player platform for Silverlight and Flash. The Silverlight version of the Open Video Player provides integrated support for Smooth Streaming playback, and is in fact the video player used by Akamai on SmoothHD.com and many of their customer sites.

  Both options provide great out-of-the-box Smooth Streaming experiences while also allowing developers to continue innovating and fine-tuning the client code.

In my next blog post: The Smooth Streaming Format

About Alex Zambelli

Alex is a Principal Product Manager at iStreamPlanet Co. in Redmond, Washington. Prior to his current job he was a Technical Evangelist for Microsoft Media Platform at Microsoft Corporation. He specializes in video streaming, adaptive HTTP streaming, VC-1 and H.264 video, and video processing best practices.
This entry was posted in Expression Encoder, Internet Information Services, Olympics, Silverlight, Smooth Streaming.

31 Responses to The Birth of Smooth Streaming

  1. Pingback: .net DEvHammer : Everything You Ever Wanted To KNow About Streaming Video, but Were Afraid To Ask, and Then Some…

  2. Pingback: Weekly Web Nuggets #50 : Code Monkey Labs

  3. Pingback: De los medios y de lo digital : Smooth Streaming ya está en beta

  4. Pingback: Nigel Parker's Outside Line : IIS Smooth Streaming Available Now

  5. Pingback: Hiroshi Okunushi's Blog

  6. gdc says:

    All this makes sense for files, but what about live AV? Is this suggesting that a live stream is also chunked up much like mobile TV applications on phones that do not support RTP? Extending further, what about 2-way and multi-party communication?

  7. GDC:
    Yes, live streams can be chunked up like this, too.
    As for peer-to-peer communication, I don’t think there’d be much advantage in doing it in the form of HTTP progressive download since P2P communication puts a heavier emphasis on realtime communication (very low latency). The true power of Smooth Streaming is in its scalability – the ability to cache content along the network edge, allowing it to serve a very large number of clients. That advantage would be mostly lost in two-way communication.

  8. Yue Chen says:

    Hi Alex,

    Thanks for the wonderful post about Smooth Streaming. My question is whether Smooth Streaming also needs to be deployed on the CDN edge servers. Since most CDN vendors deploy Linux+Apache+Squid-based caching servers, I would assume their problem with dealing with thousands of video segment files is still unresolved. Is there any way to make a Smooth Streaming-like scheme available on a Linux+Apache-based system? Thanks.

    Regards,

    Yue

    • Hi Yue,

      Smooth Streaming (IIS Media Services) only needs to be deployed on the origin servers. Downstream edge servers can run whatever OS they wish – Linux, Windows, hardware – as long as they have the ability to cache HTTP responses. HTTP caches don’t need to be specially configured to handle Smooth Streaming. All they need to be able to recognize is that a unique HTTP response matches a unique HTTP request. The rest are just bytes.

  9. Pingback: ReLabs » Blog Archive

  10. Pingback: Ezequiel Jadib’s Blog » Live Smooth Streaming: How-to: Start, Stop & Shutdown a Publishing Point Programmatically

  11. vps says:

    Great job and excellent work. Thank you.

  12. Jay Bhalod says:

    Will this be supported by Windows Mobile? If so, from which version onwards? What would be the limitations in terms of resolution/bitrate of the video content? Any points? Thanks

  13. Pingback: Michael Wolf

  14. Pingback: Michael Wolf

  15. saivert says:

    So this is for IIS only? Are there plans for Smooth streaming for other web server daemons?

    refer to http://smoothhd.code-shop.com/

  16. Thanks for this useful post…

  17. cmstream says:

    Here is an alternative solution for multiplatform Smooth Streaming

    http://cmstream.net/adaptive-streaming.aspx

  18. Jake says:

    I am going to attempt to post a comment regarding Netflix. I am watching movies online on my Dell computer with a 2.4 GHz Pentium 4, an MX4000 128 MB NVIDIA card, and 512 MB RAM. Not the best, but the problem is that the player buffers 10 seconds max, then progressively depletes itself without compensating for the play time. Then it stops and buffers again. The process continues for the entire movie.
    Could the problem be Netflix, SL3, or me? I have a Cricket mobile broadband connection. I tried everything and don’t want to waste my $. Tell me where the problem is so I can fix it, please.

  19. Jake says:

    This website is great. I just hope someone intelligent responds before my trial version of Netflix expires, otherwise bye-bye Netflix.

    • Jake, you’d probably be better off posting a comment at http://blog.netflix.com. Netflix built their own Silverlight player, their own adaptive streaming platform, their own encoder… everything. They’re the only ones who can tell you what’s going on with their player.

  20. Michael says:

    So why have 2 products that deliver streaming from Microsoft, IIS with extensions and Windows Media Services? Will Smooth Streaming be rolled into Windows Media Services 2008? When should I use Windows Media Services and when should I use the new extension to IIS for streaming?

    • Windows Media Services uses traditional RTSP and WMS HTTP stateful protocol streaming, whereas IIS Media Services is entirely HTTP based. WMS is better suited for small networks, controlled networks, enterprise environments, etc. IISMS, on the other hand, is what you want to use if you’re, say, trying to deliver live HD video to 50,000 concurrent users over the Internet.

      Furthermore, WMS is based exclusively on the ASF file format, whereas IIS is largely format agnostic (e.g. for progressive download) though Smooth Streaming itself is MP4-based.

  21. server says:

    thank you alex..

  22. domain kaydi says:

    Thank you for this article. I’ve looked at the end.

  23. Carine says:

    Hi Alex,

    Thanks for this interesting article. My question is: is there any adaptive streaming solution for RTP/RTSP, which is the most used protocol in IPTV streaming?

    • @Carine: Not for RTSP, but IIS Media Services 4.0 (currently in Beta) does support Smooth Streaming over IP Multicast (UDP). Also, good old Windows Media Services supports streaming multi-bitrate video over RTSP, though unfortunately that’s only supported in the WMP client but not in Silverlight.

      If you’re streaming over the Internet, HTTP adaptive is always going to be more efficient and scalable than RTSP. If you’re streaming over intranets, Multicast Smooth Streaming will be a better choice.

  24. abdullah says:

    Hi, I use the pushencoder tool to push the content from an external hard disk connected to the IIS server via USB port onto the publishing point. The test is successful. I can stream the video smoothly. When I disconnect the external hard disk from the IIS server, I can still watch the movie. I was expecting not to be able to do so. I disabled the archive media option in the publishing point settings. Even if I do that, the IIS server caches some content under C:\inetpub\media\archives\Default Web Site. Maybe this makes it possible to watch? I cleared the browser cache, and I can still watch. ~~~DOES THIS MEAN ALL THE ON-DEMAND CONTENT WILL/SHOULD RESIDE ON THE IIS SERVER? If that is the case, there is a scalability problem, isn’t there?~~~ Before I push the content, the state of the publishing point is “starting”. OK. During the push, it is “started”. OK. After the pushencoder command does its job, the status is “stopped”. But I can still watch the movie? And what does “Wait status is 0” mean, which is displayed at the end of the pushencoder result logs? And if I deselect both the “Archive media” and “Allow client connections” checkboxes under Publishing Point Details/Advanced Settings, I get the following error in my browser: “Failed to download manifest: d”. It doesn’t work…
    Thanks a lot!

    • @Abdullah:
      I haven’t tried this myself, but it does sound like Smooth content is getting cached by IIS. I wouldn’t say that’s necessarily a scalability issue because that cache buffer isn’t infinitely large – I am guessing that if you stream some more content and then try to go back to the first content at some later time it will no longer be available because it will have been wiped out from the cache.

      In the case of live streaming, live content will remain available for DVR access as long as the pub point is in a “Started” or “Stopped” state. The moment you shut down the pub point, the content will no longer be available. If you enabled the Archive option, the archive copy will be indexed and ready as soon as the live pub point enters a Shutdown state.

      Yes, “Allow client connections” needs to be enabled if you want to allow playback from the pub point. The option exists because there are indeed certain scenarios where client access might not be desirable, such as when creating “ingest” pub points. Those are pub points whose role is only to ingest streams (from an encoder, for example) and then push them to other origin pub points (on the same or different server). An example of this would be when you want to push the same video stream from the encoder to multiple pub points: rather than creating multiple outbound connections from the encoder, you can use just a single uplink connection and then multiply the streams on the server itself.

  25. Eddie says:

    Hi Alex,

    Is there a way to play the chunks without silverlight? E.g. downloading all the chunks of a selected bit-rate.

    • It’d be fairly straightforward to write a .NET app that parses the manifest for stream info and timestamps and then issues a series of HTTP requests which download all the video/audio chunks of the highest bitrate. In fact, you could probably do it with just JScript and wget.

      The tricky part would be writing an app (or finding some existing code – I’m thinking something like ffmpeg) which then muxes those chunks together into a single MP4 file. A lot of MP4 parsers don’t implement the fragmented format, so you’d probably need to also rearrange the media data into a single box for greater compatibility, in order to get a vanilla MP4 file.
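A rough Python sketch of the first half (building the fragment URLs from the manifest) follows. It assumes the client manifest layout as I understand it: StreamIndex elements with a Url template containing {bitrate} and {start time}, QualityLevel elements with a Bitrate attribute, and c elements carrying durations (d) and optional start times (t) in 100-ns units. The sample manifest is trimmed and made up, the base URL is a placeholder, and muxing the downloaded fragments is not attempted.

```python
import xml.etree.ElementTree as ET
from typing import List

# Trimmed, made-up manifest; real ones carry more attributes (codec data, audio streams, ...)
SAMPLE_MANIFEST = """<SmoothStreamingMedia MajorVersion="2" MinorVersion="0" Duration="60000000">
  <StreamIndex Type="video" Chunks="3" QualityLevels="2"
               Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Index="0" Bitrate="1500000" FourCC="WVC1"/>
    <QualityLevel Index="1" Bitrate="700000" FourCC="WVC1"/>
    <c d="20000000"/>
    <c d="20000000"/>
    <c d="20000000"/>
  </StreamIndex>
</SmoothStreamingMedia>"""


def fragment_urls(manifest_xml: str, base_url: str, stream_type: str = "video") -> List[str]:
    """Fragment URLs for the highest-bitrate quality level of one stream."""
    root = ET.fromstring(manifest_xml)
    for stream in root.iter("StreamIndex"):
        if stream.get("Type") != stream_type:
            continue
        bitrate = max(int(q.get("Bitrate")) for q in stream.iter("QualityLevel"))
        template = stream.get("Url")
        urls: List[str] = []
        t = 0
        for c in stream.iter("c"):
            t = int(c.get("t", t))  # an explicit start time overrides the running total
            path = template.replace("{bitrate}", str(bitrate)).replace("{start time}", str(t))
            urls.append(f"{base_url}/{path}")
            t += int(c.get("d"))  # next fragment starts where this one ends
        return urls
    raise ValueError(f"no {stream_type} stream in manifest")


for url in fragment_urls(SAMPLE_MANIFEST, "http://example.com/game.ism"):
    print(url)
# http://example.com/game.ism/QualityLevels(1500000)/Fragments(video=0)
# http://example.com/game.ism/QualityLevels(1500000)/Fragments(video=20000000)
# http://example.com/game.ism/QualityLevels(1500000)/Fragments(video=40000000)
```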