Smooth Streaming Architecture

As described in the previous two posts, Smooth Streaming is Microsoft’s implementation of HTTP-based adaptive streaming, which is a hybrid media delivery method. It acts like streaming, but is in fact based on HTTP progressive download. The HTTP downloads are performed in a series of small chunks, allowing the media to get easily and cheaply cached along the edge of the network, closer to the end users. Providing multiple encoded bitrates of the same media source also allows Silverlight clients to seamlessly and dynamically switch between bitrates depending on network conditions and CPU power. The resulting end user experience is one of reliable, consistent playback without stutter, buffering or “last mile” congestion. In one word: Smooth.

In this post we’ll take a closer look at how Smooth Streaming works: format, server, and client.

 

Smooth Streaming Format

Smooth Streaming is the first Microsoft media format in over a decade to use a file format other than ASF. It is based on the ISO/IEC 14496-12 ISO Base Media File Format specification, better known as the MP4 file specification. Why MP4 and not ASF? Well, there are several reasons:

  • MP4 is a lightweight container format with less overhead than ASF
  • MP4 is easier to parse in managed (.NET) code than ASF
  • MP4 is based on a widely used standard, making 3rd party adoption and support more straightforward
  • MP4 was architected with H.264 video codec support in mind, and we’re counting on H.264 support in Smooth Streaming and Silverlight 3 (ASF can also contain H.264 video, but it’s not as straightforward as with MP4)
  • MP4 was designed to natively support payload fragmentation within the file

There are actually two parts to the Smooth Streaming format: the wire format and the disk file format. In Smooth Streaming a video is recorded in full length to the disk as a single file (one file per encoded bitrate), but it’s transferred to the client as a series of small file chunks. The wire format defines the structure of the chunks that get sent by IIS to the client, whereas the file format defines the structure of the contiguous file on disk. Fortunately, the MP4 specification allows MP4 to be internally organized as a series of fragments, which means that in Smooth Streaming the wire format is a direct subset of the file format.

What are these MP4 “fragments” that I speak of? The basic unit of an MP4 file is called a “box.” These boxes can contain both data and metadata. The MP4 specification allows for various ways to organize data and metadata boxes within a file. In most media scenarios it is considered useful to have the metadata written before the data so that a player client application can have more information about the video/audio it’s about to play before it plays it. However, in live streaming scenarios it is often not possible to write the metadata upfront about the whole data stream because it’s simply not fully known yet. Furthermore, less upfront metadata means less overhead, which can lead to shorter startup times. For these reasons the MP4 ISO Base Media File Format specification was designed to allow MP4 boxes to be organized in a fragmented manner, where the file can be written “as you go” as a series of short metadata/data box pairs, rather than one long metadata/data pair. The Smooth Streaming file format heavily leverages this aspect of the MP4 file specification, to the point where at Microsoft we often interchangeably refer to Smooth Streaming files as “Fragmented MP4 files” or “(f)MP4.”

Here is a high-level overview of what a Smooth Streaming file looks like on the inside:

Smooth Streaming File Format


In a nutshell, the file starts with file-level metadata (‘moov’) which generically describes the file, but the bulk of the payload is actually contained in the fragment boxes, which also carry more accurate fragment-level metadata (‘moof’) and media data (‘mdat’). (The diagram only shows 2 fragments, but a typical Smooth Streaming file has one fragment for every 2 seconds of video/audio.) Closing the file is an ‘mfra’ index box which allows easy and accurate seeking within the file.
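The layout above can be sketched in code. Every MP4 box begins with a 4-byte big-endian length (which includes the 8-byte header) followed by a 4-byte ASCII type. Here’s a minimal illustration – not production code, and the box payloads are dummy bytes rather than real ‘moov’/‘moof’ structures – that builds a fragmented layout in memory and walks its top-level boxes:

```python
import struct

def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    # The size field counts the 8-byte header plus the payload
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

def walk_boxes(data: bytes):
    """Yield (type, size) for each top-level MP4 box."""
    pos = 0
    while pos < len(data):
        size, = struct.unpack_from(">I", data, pos)
        box_type = data[pos + 4:pos + 8].decode("ascii")
        yield box_type, size
        pos += size

# Synthetic Smooth Streaming-style layout: file metadata, two fragments, index
fmp4 = (
    make_box(b"moov", b"\x00" * 32) +
    make_box(b"moof", b"\x00" * 16) + make_box(b"mdat", b"\x00" * 64) +
    make_box(b"moof", b"\x00" * 16) + make_box(b"mdat", b"\x00" * 64) +
    make_box(b"mfra", b"\x00" * 24)
)

print([t for t, _ in walk_boxes(fmp4)])
# → ['moov', 'moof', 'mdat', 'moof', 'mdat', 'mfra']
```

Note how each ‘moof’/‘mdat’ pair is self-delimiting, which is exactly what lets the server lift a fragment out of the file without rewriting anything.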

When a Silverlight client requests a video time slice from the IIS Smooth Streaming server, the server simply seeks to the appropriate starting fragment in the MP4 file, lifts the fragment out of the file, and sends it over the wire to the client. This is why we refer to the fragments as the “wire format.” This technique greatly enhances the efficiency of the IIS server because it requires no remuxing or rewriting overhead.

Here is what an MP4 fragment looks like in more detail:

Smooth Streaming Wire Format


We say that the Smooth Streaming format is based on the MP4 file format because even though we’re following the ISO specification, we specify our own box organization schema and some custom boxes. In order to differentiate Smooth Streaming files from “vanilla” MP4 files, we use new file extensions: *.ismv (video+audio) and *.isma (audio only). I keep forgetting to ask the IIS Media team what the acronyms exactly stand for, but my best guess would be “IIS Smooth Streaming Media Video (Audio)”.

 

Smooth Streaming Media Assets

A typical Smooth Streaming media asset therefore consists of the following files:

  • MP4 files containing video/audio
    • *.ismv – contains video and audio, or only video
      • 1 ISMV file per encoded video bitrate
    • *.isma – contains only audio
      • In videos with audio, the audio track can be muxed into an ISMV file instead of a separate ISMA file
  • Server manifest file
    • *.ism
    • Describes the relationships between media tracks, bitrates and files on disk
    • Only used by the IIS Smooth Streaming server – not by the client
  • Client manifest file
    • *.ismc
    • Describes to the client the available streams, codecs used, bitrates encoded, video resolutions, markers, captions, etc.
    • It’s the first file delivered to the client

Both manifest file formats are based on XML. The server manifest file format is based specifically on the SMIL 2.0 XML format specification.
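For illustration, a server manifest might look roughly like the following sketch. This is hand-written to match the SMIL 2.0-based schema described above; the exact attribute names and values emitted by real encoding tools may differ:

```xml
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <meta name="clientManifestRelativePath" content="NBA.ismc" />
  </head>
  <body>
    <switch>
      <!-- one video entry per encoded bitrate -->
      <video src="NBA_3000000.ismv" systemBitrate="3000000" />
      <video src="NBA_2000000.ismv" systemBitrate="2000000" />
      <!-- audio track muxed into one of the ISMV files -->
      <audio src="NBA_3000000.ismv" systemBitrate="64000" />
    </switch>
  </body>
</smil>
```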

A folder containing a single Smooth Streaming media asset might look something like this:

A typical folder containing a Smooth Streaming media asset


In this particular case the audio track is contained in the NBA_3000000.ismv file.

 

Smooth Streaming Manifest Files

The Smooth Streaming Wire/File Format specification defines the manifest XML language as well as the MP4 box structure. Because the manifests are based on XML they are highly extensible. Among the features already included in the current Smooth Streaming format specification is support for:

  • VC-1, WMA, H.264 and AAC codecs
  • Text streams
  • Multi-language audio tracks
  • Alternate video and audio tracks (e.g. multiple camera angles, director’s commentary, etc.)
  • Multiple hardware profiles (e.g. same bitrates targeted at different playback devices)
  • Script commands, markers/chapters, captions
  • Client manifest Gzip compression
  • URL obfuscation
  • Live encoding and streaming

For an example of a Smooth Streaming On-Demand Server Manifest file, see here.

For an example of a Smooth Streaming Client Manifest file, see here.

 

Smooth Streaming Playback: Bringing It All Home

Microsoft’s adaptive streaming prototype (used for NBC Olympics 2008) relied on physically chopping up long video files into small file chunks. In order to retrieve the chunks from the web server, the player client simply needed to download files in a logical sequence: 00001.vid, 00002.vid, 00003.vid, etc.

As I’ve explained in this and previous posts, Smooth Streaming uses a more sophisticated file format and server design. The videos are no longer split up into thousands of file chunks, but are instead “virtually” split up into fragments (typically 1 fragment per video GOP) and stored within a single contiguous MP4 file. This implies two significant changes in server and client design:

  1. The server must be able to translate URL requests into exact byte range offsets within the MP4 file, and
  2. The client can request chunks in a more developer-friendly manner, such as by timecode instead of by index number

The first thing a Silverlight client requests from the Smooth Streaming server is the *.ismc client manifest. The manifest tells it which codecs were used to compress the content (so that the Silverlight runtime can initialize the correct decoder and build the playback pipeline), which bitrates and resolutions are available, and a list of all the available chunks and either their start times or durations.
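To make the manifest’s role concrete, here’s a sketch that parses a simplified, hand-written client manifest – real *.ismc files carry many more attributes and stream indexes than shown – and recovers the available bitrates plus the chunk start times by summing the chunk durations:

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written client manifest for illustration only
ISMC = """
<SmoothStreamingMedia MajorVersion="2" MinorVersion="0" Duration="60000000">
  <StreamIndex Type="video" Chunks="3" TimeScale="10000000"
               Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Index="0" Bitrate="3000000" MaxWidth="1280" MaxHeight="720"/>
    <QualityLevel Index="1" Bitrate="400000" MaxWidth="400" MaxHeight="224"/>
    <c d="20000000"/>
    <c d="20000000"/>
    <c d="20000000"/>
  </StreamIndex>
</SmoothStreamingMedia>
"""

root = ET.fromstring(ISMC)
video = root.find("StreamIndex[@Type='video']")
bitrates = [int(q.get("Bitrate")) for q in video.findall("QualityLevel")]

# Chunk start times are the running sum of the 'd' (duration) attributes
starts, t = [], 0
for c in video.findall("c"):
    starts.append(t)
    t += int(c.get("d"))

print(bitrates)   # → [3000000, 400000]
print(starts)     # → [0, 20000000, 40000000]
```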

With IIS7 Smooth Streaming, a client is expected to request fragments in the form of RESTful URLs:

http://video.foo.com/NBA.ism/QualityLevels(400000)/Fragments(video=610275114)

http://video.foo.com/NBA.ism/QualityLevels(64000)/Fragments(audio=631931065)

The values passed in the URL represent the encoded bitrate (e.g. 400000) and the fragment start offset (e.g. 610275114) expressed in an agreed-upon time unit (usually 100 ns). Both values are known from the client manifest.
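As a sketch, a client could assemble these request URLs like so (the helper function and its names are my own illustration, not part of any SDK):

```python
def fragment_url(base_url: str, stream: str, bitrate: int, start_time: int) -> str:
    """Build a Smooth Streaming RESTful fragment request URL.

    start_time is in the manifest's time scale (typically 100-ns units)
    and must match a chunk start time listed in the client manifest.
    """
    return f"{base_url}/QualityLevels({bitrate})/Fragments({stream}={start_time})"

url = fragment_url("http://video.foo.com/NBA.ism", "video", 400000, 610275114)
print(url)
# → http://video.foo.com/NBA.ism/QualityLevels(400000)/Fragments(video=610275114)
```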

Upon receiving a request like this, the IIS7 Smooth Streaming component looks up the quality level (bitrate) in the corresponding *.ism server manifest and maps it to a physical *.ismv or *.isma file on disk. It then reads the appropriate MP4 file and, based on its ‘tfra’ index box, figures out which fragment box (‘moof’ + ‘mdat’) corresponds to the requested start time offset. It then extracts that fragment box and sends it over the wire to the client as a standalone file. This is a particularly important part of the overall design because the sent fragment/file can now be automatically cached further down the network, potentially saving the origin server from sending the same fragment/file again to another client requesting the same RESTful URL.
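The index lookup step can be illustrated with a small sketch. The values below are made up for illustration; the real ‘tfra’ box stores a (time, byte offset) pair per fragment, and Smooth Streaming requires an exact timestamp match:

```python
import bisect

# Hypothetical index mimicking the (time, byte offset) pairs a 'tfra'
# box provides; these values are invented for illustration.
TFRA = [
    (0,          1_024),
    (20_000_000, 310_500),
    (40_000_000, 622_300),
    (60_000_000, 935_800),
]

def fragment_offset(start_time: int) -> int:
    """Map a requested fragment start time to its byte offset in the file.

    An unindexed timestamp raises; a real server would answer HTTP 404.
    """
    times = [t for t, _ in TFRA]
    i = bisect.bisect_left(times, start_time)
    if i == len(times) or times[i] != start_time:
        raise KeyError("no fragment starts at this timestamp")
    return TFRA[i][1]

print(fragment_offset(40_000_000))  # → 622300
```

From that offset the server would read the ‘moof’ + ‘mdat’ pair (up to the next fragment’s offset) and send it as-is.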

As you can see, requesting chunks of video/audio from the server is easy. But what about dynamic bitrate switching that makes adaptive streaming so effective? This part of the Smooth Streaming experience is implemented entirely in client-side Silverlight application code – the server plays no part in the bitrate switching process. The client-side code looks at chunk download times, buffer fullness, rendered frame rates, and other factors – and based on them decides when to request higher or lower bitrates from the server. Remember, if during the encoding process we ensure that all bitrates of the same source are perfectly frame aligned (same length GOPs, no dropped frames), then switching between bitrates is completely seamless – and Smooth.
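A toy version of such a heuristic might look like this. It is my own illustration – the actual Silverlight client heuristics are more sophisticated and weigh additional factors like rendered frame rate:

```python
def pick_bitrate(available: list[int], throughput_bps: float,
                 buffer_seconds: float) -> int:
    """Pick the highest bitrate that fits within a safety fraction of
    measured throughput; drop to the minimum when the buffer runs low."""
    if buffer_seconds < 2.0:           # buffer nearly empty: play it safe
        return min(available)
    safe = 0.8 * throughput_bps        # leave headroom for variance
    candidates = [b for b in available if b <= safe]
    return max(candidates) if candidates else min(available)

rates = [400_000, 800_000, 1_500_000, 3_000_000]
print(pick_bitrate(rates, 2_500_000, 10.0))  # → 1500000
print(pick_bitrate(rates, 2_500_000, 1.0))   # → 400000
```

Because all bitrates are frame aligned, the client can apply a decision like this at every chunk boundary and splice the streams together seamlessly.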

In my next blog post: Encoding For Smooth Streaming

About Alex Zambelli

Alex is a Principal Product Manager at iStreamPlanet Co. in Redmond, Washington. Prior to his current job he was a Technical Evangelist for Microsoft Media Platform at Microsoft Corporation. He specializes in video streaming, adaptive HTTP streaming, VC-1 and H.264 video, and video processing best practices.
This entry was posted in H.264, Internet Information Services, Silverlight, Smooth Streaming. Bookmark the permalink.

62 Responses to Smooth Streaming Architecture

  1. Stenka says:

    Hi Alex and thanks for the informative posts.

    I’d like to inquire: is it possible to get “near realtime” streaming with Windows Media?

    I’ve been doing WM encoding for a while and the fastest connection seems to be straight to the encoder, where I’ve achieved only a few seconds.

    On the other hand we have used Live Meeting also and the real-time experience is almost there (~1 sec to here from west coast U.S., I presume).

    Would you elaborate on how you would go about achieving the fastest connection to the client, please.
    Thanks

  2. Pingback: IIS 7 Smooth Streaming Beta launched « John Deutscher

  3. Hi Stenka,

    Latency in live Windows Media streaming can be reduced to 2-4 seconds with the right set up. Ben Waggoner’s blog post – http://on10.net/blogs/benwagg/Low-Latency-webcasting-with-Windows-Media-and-Siverlight/ – explains how to configure the encoder, WMS server and client (WMP or Silverlight) to deliver low latency streams. Note, however, that adding WMS proxies and CDNs into the workflow will add additional latency.

    Windows Media wasn’t really designed for real-time communications, so it’s not possible to reduce latency below 2-4 seconds.

    Live Meeting was designed for realtime communications. Realtime communications tools such as Office Communicator and Live Meeting are specifically tuned for those scenarios.

  4. Pingback: Expression Encoder : IIS Smooth Streaming server component beta released

  5. Pingback: Nigel Parker's Outside Line : IIS Smooth Streaming Available Now

  6. Pingback: Hiroshi Okunushi's Blog ?? : ?IIS7? IIS7 ??????????????????????3?

  7. afh3 says:

    Alex,
    I’m not having much luck figuring out how to read the metadata (present during the Expression Encoder 2 SP1 session) back out of the smooth-streaming files it creates. Using the Expression Encoder 2 SP1 SDK, I can get the metadata back out of pretty much every kind of media file produced by the encoder – except the ismv types.

    I’ve got a nice little Silverlight 2 video player built, but I need to provide things like the title and description for each playlist item in the initparms as strings that I generate myself – as opposed to being able to pull these very same fields from the ismv files that were created from encoder sessions that already had those values set as metadata in the job.

    What I have is a common dump-folder for the output of Expression Encoder 2 (SP1) smooth-streaming files that resides in an IIS7 site configured to stream them to my player app. It all works perfectly as it is, but I would really like to be able to automate the very last piece which would be to have my player’s playlist item’s title and description properties extracted from the ismv files. That way, whatever values the encoding-user put into the metadata on the encoder-job would appear in the gallery panel of my Silverlight player. If I had this, I could build the custom initparms to pass to my Silverlight usercontrol and it would be displayed next to the thumbnails just like it is now – only I wouldn’t have to constantly update an XML file (as I do now) with that information every time a new file gets encoded.

    Can this be done?

    I know I could do this by creating a front-end for the encoder with the SDK, and then writing those metadata values to my existing XML-file, but I’d rather let the users have the nice Encoder 2 interface rather than something horrible like I’d likely come up with (heh).

    Thanks for any insight you may have.
    -afh3

    • Hi afh3,

      I thought EE2 wrote Smooth Streaming metadata into the client manifest (*.ismc), but I could be wrong. Would you mind sending me an e-mail (my address is listed in the “About” section)? I’ll put you in touch with some folks on the Expression Encoder team and we’ll get this sorted out. Even if it turns out EE2 SP1 doesn’t have the functionality you need, it’ll still be useful to discuss needed features for v3/v4.

      Alex

  8. Pingback: Progressive download CPU consumption « Maxim Fridental

  9. Pingback: Ezequiel Jadib’s Blog » Live Smooth Streaming: How-to: Start, Stop & Shutdown a Publishing Point Programmatically

  10. Pingback: Silverlight smooth streaming and HTTP « Sharovatov’s Weblog

  11. Tim Weiss says:

    How would you go about inserting captions into a set of Smooth Streams that have already been created?

  12. Hrry says:

    I am trying to do Live Streaming from Encoder 2, using Server 2008, Windows Media Services 2008, and IIS7. My problem is I can’t get the video to show on my webpage, using Expression Web 2. Can you show me how to set up a simple webpage and connect all these together? Thank you, please answer ASAP.

  13. Pingback: Silverlight’s smooth Streaming – Increases the Popularity « Revealing the secret of RIA World

  14. Pingback: Maxim Fridental » Blog Archive » Progressive download CPU consumption

  15. Sushil says:

    Hi Alex,

    Can you point me to some resource on how live ad insertion is done or for that matter how to provide text streams dynamically.
    The scenario is I have a pre-encoded smooth streaming content and need to append metadata dynamically. e.g. Announcements etc., without having to encode the content again.

    Thanks in advance.
    – Sushil

    • Hi Sushil,

      I recommend you look at http://www.codeplex.com/Wikipage?ProjectName=smf because the Silverlight Media Framework is already enabled for interpreting ad insert metadata.

      If you’d like to insert ad markers into pre-encoded content, you could do it by appending ad stream metadata to the client manifest (*.ismc), like this:

      The string within the F tags is just a base64 encoded string formatted in a way that makes sense to your client.

  16. Pingback: Scott Hanselman's Computer Zen - Installing and Setting Up and Encoding for IIS 7 Smooth Streaming and Silverlight

  17. Pingback: Developit » Installing and Setting Up and Encoding for IIS 7 Smooth Streaming and Silverlight

  18. Duncan Smart says:

    Saw this over at Scott Hanselman’s blog… any idea why HTTP byte range requests weren’t used? e.g:

    GET /blah HTTP/1.1
    Range: bytes=0-999

    HTTP/1.0 206 Partial Content
    Content-Length: 1000
    Content-Range: bytes 0-999/10000

    This is what the iPhone/iPod Touch does: no special software required on the server – it’s plain old HTTP 1.1.

    • Hi Duncan,

      Byte range requests weren’t used because:
      1) they’re not universally cacheable
      2) at the time of Silverlight 2 (when Smooth Streaming was being developed) not all browser stacks supported HTTP 1.1 yet

  19. Sushil says:

    Hi Alex,

    Thanks for your answer.
    I got the method you were explaining, but in my scenario, if I don’t know what announcements/ads to place in advance and also at what position (e.g. I need to put markers for user comments in a text stream, which are stored in a db), what should I do?

    I basically need a way in which every time a user starts a video I am able to create the text stream dynamically. At the same time I don’t want to generate the entire ismc file at runtime.

    Please help.

    • Are you still talking about on-demand video, or live video too?

      If you don’t know what ads/comments you will need to insert ahead of time, then insert markers which correspond to some lookup scheme (e.g. DoubleClick calls).

  20. Sushil says:

    Yes I meant about on-demand content.
    My only point was, as the player is pulling video & audio chunks based on bandwidth, CPU & seek position etc., it could have helped me to serve metadata based on clients’ capabilities. And the way to do it was to generate the text stream dynamically.
    Thanks anyways.

  21. David says:

    Hi Alex, I’m trying to use gzip compression in IIS smooth streaming. I have configured the compression function on IIS 7.0, and added the corresponding HTTP response header. But I’m still a little confused about whether I need to prepare compressed files on IIS7. What do I do with the manifest file? You know, the client requests the manifest file by the URL “http://host/program.ism/manifest”, but there is actually no file named manifest, just program.ismc instead. So do I need to compress program.ismc as program.ismc.gzip on IIS7? Or can IIS7 do the compressing work on demand?

    • GZIP compression in IIS is dynamic – you don’t need to manually zip/unzip anything. The entire process ought to be completely transparent. Just enable Dynamic Compression in IIS7 (you must remember to install the feature – it’s not a default feature) and then use Fiddler2 or HTTPWatch to monitor your Silverlight player’s HTTP traffic. The headers in the outgoing HTTP requests should contain “Accept-Encoding: gzip, deflate” and the headers in the HTTP server responses should contain “Content-Encoding: gzip”. If you don’t see compression applied to video and audio MP4 chunks, don’t worry, gzip compression on compressed video/audio data isn’t very useful anyway since the content is already highly compressed. The true benefit is in client manifest compression, since uncompressed client manifests can often exceed 500 kbytes for long content.

  22. Mark says:

    Hi Alex,

    This is very useful documentation that covers VOD and partially the LIVE part of smooth streaming.

    Thank you, this is very useful.

    The problem for me is that I can’t find documentation for the LIVE part, where the encoder is pushing data to the IIS.

    When testing the “IIS Smooth Streaming Player Development Kit – Beta 2” I see that the push encoder is sending manifests for each stream. Is that all? Send a manifest before streaming the stream with the specification that you posted here? How do I find the place to post the specific stream? Is it a random string? “http://localhost/live/stream.isml/Streams(4996-stream5)”

    This is very vague. There is no documentation for that.

    It would be wonderful if you could help me understand how it works.

    Kind Regards,
    Mark

  23. gperetz says:

    Hi Alex. Maybe you can help me.
    I am trying to simulate a live smooth streaming client. I’m having problems calculating the time of the fragments that do not appear in the manifest. I was trying to analyze the fragment moof box to get the duration and add it to the fragment time. The problem is it appears to be incorrect (or maybe should be used in a different manner) for the audio stream. The video stream duration brings me to the next fragment just fine. However, with the audio stream it often misses the time (by a very small difference, sometimes just 1 increment) and so the server returns 404. I don’t see that behaviour with the Silverlight client – it always gets status 200 from the server, and requests the manifest only at the beginning of the play. More details: I am getting a moof box – inside the traf box I check the tfhd box and trun box. The tfhd box (defaults) has no values in it. The trun box has SampleSize, SampleFlags and SampleDuration present, and shows 6 or 7 samples. What I did was to sum the SampleDuration of the samples, and got the fragment duration. Did I miss anything?

  24. Alon says:

    Thanks for this excellent explanation.
    Looking at the client manifest file example in the link above I noticed that the video framents and audio fragments are not aligned (different durations).
    Does this imply each fragment contains just video or just audio or is that simply due to some “non uniform interleaving” of video and audio data within mdat of each fragment?
    Can you clarify?
    Thanks,
    Alon

    • @Alon: The physical fragment on disk can contain both audio and video (as is the case with content encoded with Expression Encoder for example), but the HTTP client request must be made separately for each so the fragment that gets sent over the wire only contains either audio or video.
      The video and audio fragments are allowed to have different durations (especially since audio fragments tend to be smaller than video fragments) because the A/V samples don’t need to be multiplexed before they’re handed off to the decoder.

  25. Clark says:

    Hi Alex,

    Does Stream Server send the “moov” (Movie Metadata) box to the client? If not, is it possible for client to request it?

    Thanks!

  26. Vlad says:

    Hi Alex,

    I just wanted to say thank you, it is really educational for me. I think it’s awesome technology. Is it difficult to implement Smooth Streaming on Linux-based portable devices? Also, how do you encrypt samples for PlayReady? Is it the Envelope mode, and done on a fragment basis?

    Regards,
    Vlad

    • It’s possible to implement Smooth Streaming on Linux devices – the SS Client Porting Kit will be made available soon.
      Encrypting samples with PlayReady works very similarly to the way it works in ASF; there’s no need to use the envelope mode. The most recent versions of the IIS Media Services SDKs come with sample encryption code.

  27. TVo says:

    What is the technical difference between a Smooth VoD client vs. Smooth Live client?

    • In a live scenario the client downloads the latest manifest at startup and then internally continues to build a running copy of the live manifest, based on metadata contained in downloaded chunks. This avoids having to periodically re-download the manifest. The only time the client re-downloads the manifest is if the user performs a seek operation or restarts the stream.

      Also, in most VOD manifests (though it’s not a requirement) the chunks are described as “index + duration”, whereas in live manifests they are described by timestamps (which can be 0-based or timecode based).

  28. Tomas says:

    Hello Alex,

    I have a question regarding the manifest file which is built by the IIS. In my manifest file I have this:


    ———————-

    It starts normally, but after a while (approximately 600 lines) it stops continuing in the correct way.

    Instead of continuing with 0000 at the end, it continues with 66666, like this.

    what may be the problem?

    how is this value calculated?

    Thank you?

  29. sametT says:

    Hi Alex,

    I read your article, little part of your article:

    Among the features already included in the current Smooth Streaming format specification is support for:

    * VC-1, WMA, H.264 and AAC codecs
    * Text streams
    * Multi-language audio tracks
    ……
    ….

    How can I encode multi-language video files to the smooth streaming format? I use Expression Encoder 4 Pro full version but it does not support multi-language video files.

  30. Joe says:

    Hi,

    As asked previously above, how do I insert markers in a live stream? It would appear that EE4 doesn’t allow this, as it greys out the script menu when choosing to encode Smooth. Is this limitation also in the EE SDK? Do I have to insert at the server? How do Inlet and the SNF people manage this?

    • Correct, EE4 doesn’t support live text stream insertion.

      Most professional live encoders which support Smooth Streaming (e.g. Inlet Spinnaker) support text streams. Exactly what kind of text streams – ad markers, captions, etc – depends on the encoder, so I recommend you inquire with the particular vendor.

      If you’re an encoder developer, you might be interested in the Smooth Streaming Format SDK (http://www.iis.net/download/SmoothFormatSDK), which supports muxing text streams to Smooth. This is similar to what was done for SNF and Vancouver Olympics: the ad insertion was performed by an app separate from the encoder; its only function was to create text stream chunks containing ad markers and push them to the live publishing point (the same one that the video encoder was connected to from a different location) on the IIS ingest server.

  31. deion says:

    Does the Smooth Streaming Format SDK have the ability to send the outputted files to a smooth streaming server? I saw online someone said that you get the server manifest and pass it to the publishing point. However, I haven’t got it to send to the server. Does the Format SDK for smooth streaming support pushing to the server?

    • I was just about to reply back and tell you to send me an e-mail, but I see you’ve already got a hold of John D – which is exactly who I would’ve directed you to, too. :)

  32. Mykhaylo says:

    Hi Alex,

    Thanks for comprehensible overview of Smooth Streaming technology.

    1) This overview states that one of the reasons of choosing MP4 but not ASF as a base format for Smooth Streaming format is that MP4 is easier to parse in managed code than ASF. From my experience I can say that it is true for non-managed code (C, C++), too. I am just curious what are “managed code” specific reasons to say that MP4 is easier to parse than ASF ?

    2) The text says that values passed in RESTfull URLs are taken from the client’s manifest, in particular, the fragment start value offset. Is it possible to specify the value that are not taken from the manifest file ? For example, a player’s client asks to do a time seek to a position that are not aligned to any offset specified in the manifest. Or a player has to do this alignment ?

    Thanks,

    Mykhaylo.

    • Mykhaylo,

      1) You’re right, MP4 is easier to parse in native code too. My point was really that the MP4 format structure is easier to parse than ASF in general, and therefore can be done efficiently even in managed code (which is not as optimized as native code).

      2) The player can only request timestamps which are listed in the client manifest, as these also correspond to timestamps indexed in the ‘mfra’ box.

  33. Soroush says:

    Hi Alex,
    Could you confirm if Smooth Streaming is supported or not on Window Phone 7? here is a discussion about this http://www.wowzamedia.com/forums/showthread.php?11599-Windows-Phone-7-SmoothStreaming

    Thanks

    • Soroush,
      Yes, Smooth Streaming is supported on WP7. Both the SSME and SMF are available for WP7.

      I looked at the forum and I think a part of the confusion might be that people are expecting Smooth Streaming to work in IE on WP7. That’s not the case. Silverlight only exists as an app runtime platform on WP7, not as a browser plugin. In order to play back Smooth Streams on WP7 one has to build an app and deploy it to the phone. There’s currently no generic Smooth Streaming player app available in the Zune Marketplace, but there’s definitely nothing stopping any developer from writing one and submitting it to the marketplace.

      When encoding Smooth Streaming for WP7, you should also follow these guidelines: http://blogs.iis.net/vsood/archive/2010/12/04/iis-smooth-streaming-encoding-for-windows-phone-7.aspx

  34. abdutr says:

    Hi Alex,

    I see that default fragment size is 2 seconds.
    During transcoding, if I enable Scene Change Detection, the output file will have GOPs that are non-uniform in size. This will result in fragments of different sizes – sometimes less than 2 seconds, sometimes more.
    What do you think?

    Thanks

    • That’s correct. If the encoder is not multi-bitrate aware and Scene Change Detection is enabled, then it will not be able to align GOPs across all bitrates, because each stream encoder instance will insert I-frames independently at different times.
      That’s why we require the Scene Change Detection (aka Adaptive GOP) feature to be disabled in Smooth Streaming encoding *unless* the underlying encoder SDK is multi-bitrate aware and knows how to handle GOP alignment across streams. So far I’m aware of only one SDK that has that ability, and that’s the Smooth Streaming Encoder SDK. In Expression Encoder, that’s the SDK that gets used when you use the 2-pass VBR mode.

  35. Charles says:

    Hi Alex,

    I am a developer and I have been doing some extensive work on Smooth Streaming using Silverlight Media Framework. The problem I’m facing now how to protect content with using DRM (e.g. Play Ready), we are a startup and cannot afford the licence cost.

    Any suggestions?

    Thank you.

  36. Charles says:

    Correction: I meant “protect content WITHOUT using DRM” in the last post.

  37. David Mo says:

    I have an urgent problem with live smooth streaming: when I get the last fragment start time from the manifest and then send a request, I get the last fragment in the manifest (meaning it is the latest fragment).
    So I add the last fragment duration to the timestamp (I also tried to get the information from the tfxd box) to calculate the next start time.
    But then the problem appears! I find the last fragment duration is wrong; no matter how and when you get the manifest, the last fragment’s timestamp + duration != the next fragment!
    E.g., request the URL:
    http://ispmix1.edgesuite.net/MIX/MSFT-MIX10-100315.isml/Manifest
    and get the manifest (the earlier part is omitted):


    Request the URL again (just a few seconds later) and get the following manifest:


    You can see that 30746114029080 – 30746094009120 = 20019960, not 20020000!
    So when I request the next fragment after the last one, the server gives me a 404.
    What is the reason, and what should I do?
    Why can Silverlight get the accurate start time?
    I hope I can get help soon!
    Best wishes!
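    The arithmetic above can be checked directly (values are the ones quoted in this comment; Smooth Streaming timestamps are in 100-nanosecond units, so 20020000 is 2.002 seconds):

```python
# Values taken from the two manifest snapshots described above (100-ns units).
last_start  = 30746094009120   # t attribute of the last chunk, first manifest
reported_d  = 20020000         # d attribute of that chunk (2.002 s)
actual_next = 30746114029080   # start of the next chunk, second manifest

# The observed gap is 20019960, not the advertised 20020000:
print(actual_next - last_start)               # 20019960

# So predicting the next start as t + d overshoots by 40 units (4 microseconds),
# and requesting that nonexistent start time yields a 404:
print(last_start + reported_d - actual_next)  # 40
```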

  38. David Mo says:

    Here is the content which couldn’t appear just now.
    Posting again!
    E.g., request the URL:
    http://ispmix1.edgesuite.net/MIX/MSFT-MIX10-100315.isml/Manifest
    and get the manifest (the earlier part is omitted):


    Request the URL again (just a few seconds later) and get the following manifest:


    You can see that 30746114029080 – 30746094009120 = 20019960, not 20020000!

  39. Abdullah says:

    Hi Alex,

    I have an 8-layer, 1-audio Smooth Streaming asset.
    In the server manifest (.ism file), the audio src points to the highest-bitrate .ismv file. Does that mean the audio exists only in that layer?
    However, according to MediaInfo analysis, all 8 .ismv files have audio inside.
    Can you please elaborate?

    Thanks

    • Abdullah:
      Each encoder can choose to mux the video and audio differently. There’s no one “correct” way, as long as the *.ism manifest correctly references a valid audio stream. For example, the content you describe was probably encoded with Expression Encoder, which writes the audio stream into every ISMV file. On the other hand, the Inlet Spinnaker encoder muxes the audio stream only with the first-ranked video stream, and Anystream Agility writes audio to a separate ISMA file. All are valid options.

      Theoretically a single ISMV file could contain multiple video streams and audio streams, so that all streams of an asset are contained in a single file (like we did back in the Windows Media days with ASF files).
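      Not part of the original reply, but as an illustration: one way to verify which ISMV files actually carry audio (as MediaInfo does) is to walk the MP4 box tree looking for an ‘hdlr’ box whose handler type is ‘soun’. A minimal sketch, handling only 32-bit box sizes and the common container boxes:

```python
import struct

CONTAINERS = {b"moov", b"trak", b"mdia"}  # boxes whose payload is more boxes

def has_audio_track(data: bytes) -> bool:
    """True if the MP4 byte stream contains an 'hdlr' box of type 'soun'."""
    pos = 0
    while pos + 8 <= len(data):
        size, = struct.unpack(">I", data[pos:pos + 4])
        box_type = data[pos + 4:pos + 8]
        if size < 8:  # size 0 (to end of file) / 1 (64-bit) not handled here
            break
        body = data[pos + 8:pos + size]
        # hdlr payload: version/flags (4) + pre_defined (4) + handler_type (4)
        if box_type == b"hdlr" and body[8:12] == b"soun":
            return True
        if box_type in CONTAINERS and has_audio_track(body):
            return True
        pos += size
    return False

def box(box_type: bytes, payload: bytes) -> bytes:
    """Build a test box: 32-bit big-endian size, 4-byte type, payload."""
    return struct.pack(">I", len(payload) + 8) + box_type + payload

# Synthetic moov with a single audio track, just to exercise the walker:
hdlr = box(b"hdlr", b"\x00" * 8 + b"soun" + b"\x00" * 12)
moov = box(b"moov", box(b"trak", box(b"mdia", hdlr)))
print(has_audio_track(moov))   # True
```

To check a real file, pass `open("video.ismv", "rb").read()` (or just the leading moov portion) to `has_audio_track`; real ISMV files may use 64-bit box sizes, which this sketch deliberately skips.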

  40. Frédéric says:

    Hi Alex,

    Thank you for your very helpful explanations!

    Do you happen to know when the SS client porting kit will be available?
    Could you explain how the client gets (or builds) the initialization segment (ftyp + moov)?

  41. Pingback: links for 2011-09-24 « Donghai Ma

  42. rohan says:

    Hi Alex,

    I had a few questions and would appreciate if you can provide some guidance.

    1) Is there an API from Microsoft with which the Smooth Streaming fragments can be pre-created from an MP4 file? I.e., is there any way to do Smooth Streaming without using IIS?

    2) How is PlayReady DRM to be used here? I.e., can a PlayReady-encrypted file be used as input to Smooth Streaming, or is it that once the fragments are created, each fragment has to be DRM-encrypted?

    Thank you-

    • Hi Rohan,

      1) Microsoft provides a Transform Manager task which can convert GOP-aligned MP4s into Smooth Streaming assets. There is no Microsoft-supported way of serving Smooth Streaming video without IIS.

      2) DRM encryption is applied on a sample level, rather than a fragment or file level. I also recommend exploring Transform Manager in this case, as it provides a DRM task too.

  43. aviad says:

    I’m looking for tools that can make sense of the Smooth Streaming files [.ism, .ismc, .ismv] without going through IIS:
    1) a player that can play from local files
    2) a probing tool that can print out the packets in an .ismv file
    3) a transmuxer that can take an .ismv file as input and produce a simple .mov file without moof and Microsoft-specific atoms

    Are you aware of any such tools existing?

    The “IIS Smooth Streaming Format SDK API” works great for muxing SSMF files. Is there an equivalent API for demuxing?

    • Hi Aviad,

      1) If you install Expression Encoder (even the free version), it comes with an ISMV DirectShow source filter. That’ll allow you to play ISMV files locally with any DShow-based player, including WMP.
      2) I hear M4Scene’s SceneScope is an excellent tool for probing into MP4-based formats. http://www.m4scene.com/product/show?class=SceneScope
      3) Hmm, I’m not aware of one, unfortunately. Though that sounds like it would make a good ffmpeg or MP4Box project for someone. The ISMV format, or the PIFF format as we officially call it, is actually compatible with the UltraViolet CFF format and the ISO Base Media File container described in MPEG-DASH, so supporting this would definitely help beyond just Smooth Streaming.