IIS Smooth Streaming Technical Overview
Author: Alex Zambelli, Media Technology Evangelist, Microsoft Corporation, March 2009
Contents
Introduction
A Brief History of Multiple Bit Rate Streaming
  Windows Media Stream Thinning, MBR and Intelligent Streaming
  The Shift to HTTP-Based Delivery
Content Delivery Techniques
  Traditional Streaming
  Progressive Download
  HTTP-Based Adaptive Streaming
Introducing IIS Smooth Streaming
  Smooth Streaming Playback with Silverlight
  Smooth Streaming Architecture
  Introducing the Smooth Streaming Format
  Smooth Streaming Disk File Format
  Smooth Streaming Wire File Format
  Smooth Streaming Media Assets
  Smooth Streaming Manifest Files
  Smooth Streaming Playback
Summary
For More Information
Legal Notice
Introduction
In October 2008, Microsoft announced that Internet Information Services (IIS) 7.0 would feature a new HTTP-based adaptive streaming extension: Smooth Streaming. To promote the new technology, Microsoft also announced an initiative with Akamai and launched a showcase Web site, SmoothHD.com. Other content delivery networks (CDNs) are expected to announce support for Smooth Streaming in the future.
Smooth Streaming dynamically detects local bandwidth and CPU conditions and seamlessly switches, in near real time, the video quality of a media file that a player receives. Consumers with high-bandwidth connections can experience high definition (HD) quality streaming, while others with lower bandwidth receive the appropriate stream for their connectivity. Consumers across the board can therefore enjoy a compelling, uninterrupted streaming experience, and media companies no longer need to cater to the lowest common denominator quality level within their audience base. This enables companies to boost brand awareness and advertising revenues by extending average viewing times through higher quality true HD (720p or higher) experiences. They can also benefit from unprecedented network scalability using distributed HTTP-based Web servers and offer better quality to more customers.
To understand how Smooth Streaming works and what its benefits are, we must first take a look at the history of multiple bit rate media streaming on the Web.
MBR ASF, and better image handling in Windows Media Player. Intelligent Streaming, of course, still required the media to be encoded as MBR ASF files with a tool such as Windows Media Encoder 9 Series. While the technology itself was well designed, its implementations suffered from shortcomings. It was limited to streaming only (no progressive download), and only from Windows Media servers. The encoders did not require that the multiple video streams be temporally aligned, let alone key-frame aligned, which made switching between streams difficult to do in a seamless fashion. Because the media was streamed, and streaming protocols function at constant rates, it was almost impossible to accurately predict overall client bandwidth, particularly in a timely fashion. By the time poor network conditions were detected, it was usually already too late: the Player often went through several iterations of re-buffering before finally downgrading the bit rate.
Even though streaming protocols are designed with media delivery in mind, the fact of the matter is that the Internet was built on HTTP and optimized for HTTP delivery. This raised the question, "Why not adapt media delivery to the Internet instead of trying to adapt the entire Internet to streaming protocols?" Move Networks proved on several occasions in 2008 that HTTP-based media delivery could be done successfully on a large scale, both on-demand and live, such as with the broadcast of the Democratic National Convention, using Microsoft Silverlight as the client framework. Microsoft demonstrated this again during coverage of the 2008 Beijing Summer Olympic Games, when it prototyped its own HTTP-based adaptive streaming platform.
Traditional Streaming
RTSP (Real-Time Streaming Protocol) is a good example of a traditional streaming protocol. RTSP is defined as a stateful protocol, which means that from the first time a client connects to the streaming server until the time it disconnects from the streaming server, the server keeps track of the client's state. The client communicates its state to the server by issuing commands such as PLAY, PAUSE or TEARDOWN (the first two are obvious; the last one is used to disconnect from the server and close the streaming session). After a session between the client and the server has been established, the server begins sending the media as a steady stream of small packets (the format of these packets is known as RTP). The size of a typical RTP packet is 1452 bytes, which means that in a video stream encoded at 1 megabit per second (Mbps), each packet carries approximately 11 milliseconds of video. In RTSP the packets can be transmitted over either UDP or TCP transports; the latter is preferred when firewalls or proxies block UDP packets, but can also lead to increased latency (TCP packets are re-sent until received).
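The packet-duration figure above follows from simple arithmetic. The sketch below reproduces it, assuming that 1 Mbps means 1,000,000 bits per second and that the whole 1452-byte packet counts as media payload (header overhead is ignored); the function name is illustrative only.

```python
def packet_duration_ms(packet_bytes: int, bit_rate_bps: int) -> float:
    """Milliseconds of media carried by one packet at a constant bit rate.

    Assumption: every byte of the packet is media payload.
    """
    return packet_bytes * 8 / bit_rate_bps * 1000


# A 1452-byte packet in a stream encoded at 1 Mbps.
print(round(packet_duration_ms(1452, 1_000_000), 1))  # 11.6
```

The result, about 11.6 ms, matches the "approximately 11 milliseconds" quoted in the text.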
HTTP, on the other hand, is known as a stateless protocol. If an HTTP client requests some data, the server responds by sending the data, but it won't remember the client or its state. Each HTTP request is handled as a completely standalone one-time session.
Windows Media Services supports streaming over both RTSP and HTTP. But if HTTP is a stateless protocol, how can it be used for streaming? Windows Media Services uses a modified version of HTTP officially known as MS-WMSP (known in Windows Media Services as the Windows Media HTTP Streaming Protocol, or more commonly just as Windows Media HTTP). MS-WMSP uses standard HTTP for transfer of data and messages but also maintains session states, effectively turning it into a streaming protocol like RTSP. Windows Media Services has also supported RTSP streaming since 2003 (in Windows Media Services 9 Series) over both UDP and TCP. Its implementation of the protocol is publicly documented as MS-RTSP. Silverlight only supports HTTP-based delivery from Windows Media Services.
The most important things to remember about traditional streaming protocols such as RTSP and Windows Media HTTP (MS-WMSP) are:
- The server sends the data packets to the client at a real-time rate only, that is, the bit rate at which the media is encoded. For example, a video encoded at 500 kilobits per second (kbps) is streamed to clients at approximately 500 kbps.
- The server only sends ahead enough data packets to fill the client buffer. The client buffer is typically between 1 and 10 seconds (Windows Media Player and Silverlight default buffer length is 5 seconds). This means that if you pause a streamed video and wait 10 minutes, still only approximately 5 seconds of video will have downloaded to the client in that time.
Other examples of traditional streaming protocols include Adobe Systems' proprietary Real Time Messaging Protocol (RTMP) and RealNetworks' RTSP over Real Data Transport (RDT) protocol. The Dynamic Streaming stream-switching feature in the Adobe Flash Platform is based on the RTMP protocol and is, therefore, considered a traditional streaming method, not adaptive streaming.
Progressive Download
Another common form of media delivery on the Web today is progressive download, which is nothing more than a simple file download from an HTTP Web server. Progressive download is supported by most media players and platforms, including Adobe Flash, Silverlight, and Windows Media Player. The term "progressive" stems from the fact that most player clients allow the media file to be played back while the download is still in progress, before the entire file has been fully written to disk (typically to the Web browser cache). Clients that support the HTTP 1.1 specification can also seek to positions in the media file that haven't been downloaded yet by performing byte range requests to the Web server (assuming that it also supports HTTP 1.1). Popular video sharing Web sites today, including YouTube, Vimeo, MySpace, and MSN Soapbox, almost exclusively use progressive download. Unlike streaming servers that rarely send more than 10 seconds of media data to the client at a time, HTTP Web servers keep the data flowing until the download is complete. If you pause a progressively downloaded video at the beginning of playback and then wait, the entire video will eventually have downloaded to your browser cache, allowing you to smoothly play the whole video without any hiccups.
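To make the byte range mechanism concrete, here is a minimal sketch that builds (but does not send) an HTTP/1.1 range request with Python's standard library. The URL and the helper name are hypothetical, and mapping a playback position to a byte offset is left to the player, which would need an index of the media file.

```python
import urllib.request
from typing import Optional


def build_range_request(url: str, first_byte: int,
                        last_byte: Optional[int] = None) -> urllib.request.Request:
    """Build an HTTP GET that asks the server for only part of a file."""
    if first_byte < 0 or (last_byte is not None and last_byte < first_byte):
        raise ValueError("invalid byte range")
    # An open-ended range ("bytes=N-") asks for everything from N to the end.
    end = "" if last_byte is None else str(last_byte)
    return urllib.request.Request(url, headers={"Range": f"bytes={first_byte}-{end}"})


# Hypothetical URL: request everything from byte 5,000,000 onward.
request = build_range_request("http://example.com/video.wmv", 5_000_000)
print(request.get_header("Range"))  # bytes=5000000-
```

A server that honors the header answers with status 206 (Partial Content) and only the requested bytes; a server that ignores it simply returns the whole file.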
There is a downside to this behavior as well: if, 30 seconds into a fully downloaded 10-minute video, you decide that you don't like it and quit the video, both you and your content provider have just wasted 9 minutes and 30 seconds' worth of bandwidth. To try to mitigate this problem, IIS 7.0 provides a cool extension called Bit Rate Throttling, which allows content providers to reduce costs by throttling the download bit rate in exactly the same way that a streaming server would.
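The cost of that wasted download can be estimated with a little arithmetic. The sketch below assumes a 1 Mbps encoding and, for the throttled case, a download that stays about 5 seconds ahead of playback; both numbers are illustrative assumptions rather than documented Bit Rate Throttling settings.

```python
from typing import Optional


def wasted_megabytes(duration_s: float, watched_s: float, bit_rate_bps: int,
                     downloaded_s: Optional[float] = None) -> float:
    """Megabytes that were downloaded but never watched.

    downloaded_s is how many seconds of media had been fetched when the
    viewer quit; by default the whole file is assumed to be downloaded.
    """
    if downloaded_s is None:
        downloaded_s = duration_s
    unwatched_s = max(0.0, min(downloaded_s, duration_s) - watched_s)
    return unwatched_s * bit_rate_bps / 8 / 1_000_000


# 10-minute video, viewer quits after 30 seconds, assumed 1 Mbps encoding.
print(wasted_megabytes(600, 30, 1_000_000))                   # 71.25
# Same viewer, but the download was held about 5 seconds ahead of playback.
print(wasted_megabytes(600, 30, 1_000_000, downloaded_s=35))  # 0.625
```

Under these assumptions the unthrottled download wastes roughly 71 MB per abandoned view, against well under 1 MB when the download rate tracks playback.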
Adaptive streaming, like other forms of HTTP delivery, offers the following advantages over traditional streaming to the content distributor:
- It's cheaper to deploy because adaptive streaming can use generic HTTP caches/proxies and doesn't require specialized servers at each node.
- It offers better scalability and reach, reducing "last mile" issues because it can dynamically adapt to inferior network conditions as it gets closer to the user's home.
- It lets the audience adapt to the content, rather than requiring content providers to guess which bit rates are most likely to be accessible to their audience.
It also offers the following benefits for the user:
- Fast start-up and seek times because start-up/seeking can be initiated on the lowest bit rate before moving to a higher bit rate.
- No buffering, no disconnects, no playback stutter (as long as the user meets the minimum bit rate requirement).
- Seamless bit rate switching based on network conditions and CPU capabilities.
- A generally consistent, smooth playback experience.
Microsoft created a prototype implementation of HTTP-based adaptive streaming for the NBC 2008 Beijing Summer Olympic Games Web site. To meet the project's rapid development schedule, this implementation was very straightforward. NBC used Digital Rapids and Anystream encoders to produce multiple Windows Media Video (WMV) files of different bit rates/resolutions for each source. The encoders didn't employ any new encoding tricks but merely followed strict encoding guidelines (closed GOP, fixed-length GOP, VC-1 entry point headers, and so on) which ensured exact frame alignment across the various bit rates of the same video. These WMV files were run through a post-processing tool that physically split each WMV file into thousands of 2-second chunks (files). The rest of the solution consisted of uploading the chunks to the CDN's Web servers and then building a Silverlight player that would download the chunks and play them in sequence.
With this implementation, NBC and Microsoft were able to offer a better-than-WMS streaming experience while using just simple HTTP download, with increased average content viewing times that directly translated to better advertising and monetization opportunities. However, CDN operators lost many hours managing the millions of tiny files in their systems. Imagine: if every 2 seconds of video is split into a separate file and this is repeated for 5 available bit rates, you end up with 150 files for each minute of video. That's 13,500 files for a 90-minute soccer game! So despite the NBC Olympics site being a huge success for Silverlight and HTTP-based adaptive streaming, it quickly became apparent that to productize this solution and offer improved file-management benefits, elementary design changes were required.
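The file counts quoted above can be checked with a short calculation. This is only a sketch of the arithmetic; the function name is illustrative, and it assumes every bit rate is split into chunks of the same length.

```python
import math


def chunk_file_count(duration_min: float, chunk_s: float, bit_rates: int) -> int:
    """Number of chunk files when each bit rate is split into fixed-length chunks."""
    chunks_per_bit_rate = math.ceil(duration_min * 60 / chunk_s)
    return chunks_per_bit_rate * bit_rates


print(chunk_file_count(1, 2, 5))   # 150 files for one minute of video
print(chunk_file_count(90, 2, 5))  # 13500 files for a 90-minute game
```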
On the content-creation end, encoding on-demand Smooth Streaming-compatible video is already possible with Expression Encoder 2 Service Pack 1 (SP1). Note that you'll need to purchase the full version of Expression Encoder 2 to get Smooth Streaming encoding support; it's not included in the "Express" trial version. We recommend the Smooth Streaming Multi Bit Rate Calculator as a helper tool for encoding to multiple bit rate formats such as Smooth Streaming. In addition, Microsoft is working with a number of encoding independent software vendors (ISVs) to enable support for the Smooth Streaming format in their professional encoding products.
As any Web application developer will tell you, there's much more to building a good player than just setting a source URL for the media element. Fortunately for those who prefer not to write such code from scratch, there are two options already available for adding Smooth Streaming support to your Silverlight application:
- Expression Encoder 2 SP1 templates. The Silverlight 2 player templates included with Expression Encoder 2 SP1 include a ready-to-use Smooth Streaming module and complete source code (which, if needed, allows fine-tuning of the switching heuristics to adjust to particular needs of a given network or device). The Smooth Streaming object (AdaptiveStreaming.dll) can be easily integrated into any Silverlight project. See James Clarke's Weblog for additional Expression Encoder tips & tricks.
- Open Video Player (OVP). The Akamai-led Open Video Player Initiative is an open-source community project that strives to provide a best-of-breed video player platform for Silverlight and Flash. The Silverlight version of the Open Video Player provides integrated support for Smooth Streaming playback, and is the video player used by Akamai on SmoothHD.com and many of their customer sites.
There are actually two parts to the Smooth Streaming format: the wire format, and the disk file format. In Smooth Streaming, a video is recorded in full length to the disk as a single file (one file per encoded bit rate), but it's transferred to the client as a series of small file chunks. The wire format defines the structure of the chunks that are sent by IIS to the client, whereas the file format defines the structure of the contiguous file on disk, enabling better file management. Fortunately, the MP4 specification allows MP4 to be internally organized as a series of fragments, which means that in Smooth Streaming the wire format is a direct subset of the file format.
In a nutshell, the file starts with file-level metadata ('moov') that generically describes the file, but the bulk of the payload is actually contained in the fragment boxes that also carry more accurate fragment-level metadata ('moof') and media data ('mdat'). (The diagram in Figure 3 only shows 2 fragments, but a typical Smooth Streaming file has a fragment for each 2 seconds of video/audio.) Closing the file is an 'mfra' index box that allows easy and accurate seeking within the file.
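The box layout described above can be illustrated by walking the top-level boxes of a file. The sketch below relies on the standard ISO base media box header (a 32-bit big-endian size followed by a four-character type, with size 1 signalling a 64-bit size and size 0 meaning "to end of file"). The sample data is synthetic, with empty placeholder payloads, and is not a real Smooth Streaming file; the helper names are illustrative.

```python
import struct
from typing import Iterator, Tuple


def iter_top_level_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level box in the buffer."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:  # a 64-bit size follows the type field
            if offset + 16 > len(data):
                raise ValueError("truncated 64-bit box header")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:  # box runs to the end of the data
            size = len(data) - offset
        if size < header_len:
            raise ValueError(f"invalid box size {size} at offset {offset}")
        yield box_type.decode("ascii", "replace"), offset, size
        offset += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size header (used for the synthetic sample only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic layout mirroring the description: metadata, two fragments, index.
sample = b"".join(make_box(t, b"\x00" * 4)
                  for t in (b"moov", b"moof", b"mdat", b"moof", b"mdat", b"mfra"))
print([name for name, _, _ in iter_top_level_boxes(sample)])
# ['moov', 'moof', 'mdat', 'moof', 'mdat', 'mfra']
```

Each 'moof' + 'mdat' pair found this way corresponds to one fragment; parsing the contents of the boxes themselves is beyond this sketch.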
Within the guidelines of the MP4 ISO Base Media File Format specification, the Smooth Streaming format uses a custom box organization schema and some custom boxes. To differentiate Smooth Streaming files from "vanilla" MP4 files, we use new file extensions: *.ismv (video+audio) and *.isma (audio only).
The *.ismc client manifest file describes the available streams to the client: the codecs used, bit rates encoded, video resolutions, markers, captions, etc. It's the first file delivered to the client.
The manifest files are XML-formatted files. The server manifest file format is based specifically on the SMIL 2.0 XML format specification. A folder containing a single Smooth Streaming presentation might look something like the following:
Figure 5. A folder containing a Smooth Streaming encoded presentation. In this particular case, the audio track is contained in the NBA_3000000.ismv file.
Sample Smooth Streaming on-demand server manifest files (.ism) and Smooth Streaming client manifest files (.ismc) are included in the IIS Smooth Streaming Beta Sample Content for IIS Smooth Streaming Beta.
The first thing a player client requests from the Smooth Streaming server is the *.ismc client manifest. The manifest tells it which codecs were used to compress the content (so that the client runtime can initialize the correct decoder and build the playback pipeline), which bit rates and resolutions are available, and a list of the available chunks (with either their start times or durations). With IIS Smooth Streaming, clients request fragments in the form of a RESTful URL: http://video.foo.com/NBA.ism/QualityLevels(400000)/Fragments(video=610275114)
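As a rough illustration, a client could assemble such a URL from values it has read out of the client manifest. The helper names below are hypothetical, and the conversion to seconds assumes the commonly used 100-nanosecond time unit.

```python
TICKS_PER_SECOND = 10_000_000  # assumed time unit: 100 nanoseconds


def fragment_url(presentation_url: str, bit_rate: int,
                 stream: str, start_ticks: int) -> str:
    """Build the RESTful URL for one fragment of one quality level."""
    return (f"{presentation_url}/QualityLevels({bit_rate})"
            f"/Fragments({stream}={start_ticks})")


def ticks_to_seconds(ticks: int) -> float:
    """Convert a fragment start offset to seconds under the assumed time unit."""
    return ticks / TICKS_PER_SECOND


url = fragment_url("http://video.foo.com/NBA.ism", 400000, "video", 610275114)
print(url)
# http://video.foo.com/NBA.ism/QualityLevels(400000)/Fragments(video=610275114)
print(round(ticks_to_seconds(610275114), 2))  # 61.03
```

Under that assumption, the fragment in the example URL starts about 61 seconds into the presentation.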
The values passed in the URL represent encoded bit rate (400000) and the fragment start offset (610275114) expressed in an agreed-upon time unit (usually 100 nanoseconds (ns)). These values are known from the client manifest. After receiving the client request, IIS Smooth Streaming looks up the quality level (bit rate) in the corresponding *.ism server manifest and maps it to a physical *.ismv or *.isma file on disk. It then reads the appropriate MP4 file, and based on its 'tfra' index box, figures out which fragment box ('moof' + 'mdat') corresponds to the requested start time offset. It then extracts the fragment box and sends it over the wire to the client as a standalone file. This is a particularly important part of the overall design because the sent fragment/file can now be automatically cached further down the network, potentially saving the origin server from sending the same fragment/file again to another client that requests the same RESTful URL.
Requesting chunks of video/audio from the server is easy. But what about the bit-rate switching that makes adaptive streaming so effective? This part of the Smooth Streaming experience is implemented entirely in client-side Silverlight application code; the server plays no part in the bit-rate switching process. The client-side code looks at chunk download times, buffer fullness, rendered frame rates, and other factors, and decides when to request higher or lower bit rates from the server. Remember, if during the encoding process we ensure that all bit rates of the same source are perfectly frame-aligned (same length GOPs, no dropped frames, etc.), then switching between bit rates is completely seamless.
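The kind of client-side decision described above can be sketched as follows. This is a deliberately simplified, hypothetical heuristic, not the logic used by the Silverlight client: the thresholds, the function name, and the way the three inputs are combined are all assumptions made for illustration.

```python
from typing import Sequence

SAFETY_MARGIN = 0.8        # assumed: use only 80% of the measured throughput
LOW_BUFFER_S = 4.0         # assumed: below this, never switch up
MAX_DROPPED_RATIO = 0.2    # assumed: above this, the CPU is struggling


def choose_bit_rate(available: Sequence[int], current: int, chunk_s: float,
                    download_s: float, buffer_s: float,
                    dropped_ratio: float = 0.0) -> int:
    """Pick the bit rate (bits per second) to request for the next chunk."""
    if download_s <= 0:
        raise ValueError("download time must be positive")
    rates = sorted(available)
    # Throughput estimated from the last chunk: bits received / time taken.
    throughput = current * chunk_s / download_s
    affordable = [r for r in rates if r <= throughput * SAFETY_MARGIN]
    target = affordable[-1] if affordable else rates[0]
    if buffer_s < LOW_BUFFER_S:
        # A nearly empty buffer is no time to gamble on a higher bit rate.
        target = min(target, current)
    if dropped_ratio > MAX_DROPPED_RATIO:
        # Too many dropped frames: step down one level regardless of bandwidth.
        lower = [r for r in rates if r < current]
        target = min(target, lower[-1] if lower else rates[0])
    return target


rates = [400_000, 800_000, 1_500_000, 3_000_000]
print(choose_bit_rate(rates, 800_000, 2.0, 0.5, buffer_s=8.0))  # 1500000
print(choose_bit_rate(rates, 800_000, 2.0, 4.0, buffer_s=8.0))  # 400000
```

In the first call a 2-second chunk arrived in half a second, so the sketch steps up; in the second the same chunk took 4 seconds, so it drops to the lowest bit rate. Because all bit rates are frame-aligned, the next chunk can simply be requested at the chosen quality level.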
Summary
Smooth Streaming is Microsoft's implementation of HTTP-based adaptive streaming, which is a hybrid media delivery method. It acts like streaming, but is based on HTTP progressive download. The HTTP downloads are performed in a series of small chunks, allowing the media to be easily and cheaply cached along the edge of the network, closer to clients. Providing multiple encoded bit rates of the same media source also allows clients to seamlessly and dynamically switch between bit rates depending on network conditions and CPU power. The resulting user experience is one of reliable, consistent playback without stutter, buffering or "last mile" congestion. In one word: Smooth.
Legal Notice
This is a preliminary document and may be changed substantially prior to final commercial release of the software described herein. The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. © 2009 Microsoft Corporation. All rights reserved. Microsoft, Expression, Silverlight, the Silverlight logo, Windows, the Windows logo, Windows Media, and Windows NT are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.