Access to content from everywhere is one of the growing
challenges for the products and services being introduced
today: from home, from work, from the street, we want to see
photos, play music, watch news or movies.
Even if local and portable storage of A/V data is a partial
answer to this demand for ubiquity, some use cases still
require live access to multimedia programs: news and sports,
video telephony or conferencing.
Moreover, the competition around fast Internet access using
xDSL or IP over cable technology is about to make old cable-TV
technology outdated sooner than expected.
From provider to customer, we are all
concerned by the emergence of this technology: multimedia
streaming over IP networks.
We will look at what is involved in
building a working streaming system:
- Protocols and Standards
- Clocks and Synchronization
Working groups of the IETF (Internet Engineering Task Force)
have been researching the problem of transmitting multimedia
content over IP networks since the early 1990s. While some
older protocols have been abandoned, a trio of communication
protocols has been established to allow streaming of nearly
any type of content to one or multiple recipients.
These protocols are:
- RTSP: the Real-Time Streaming Protocol
- RTP: the Real-time Transport Protocol
- RTCP: the RTP Control Protocol
Everything begins with RTSP. This protocol, which runs on
top of TCP, is responsible for managing the streaming session.
It is what we could call the Internet remote control.
It is the protocol that lets the client find out which
streams are available at a location, get the stream parameters
(codecs used, bit rate, content, …) and, most of all, it is the
protocol that lets the client PLAY, PAUSE and STOP (TEARDOWN
in RTSP terms): just what you generally expect from a remote
control. Stream descriptions exchanged over RTSP are written
in a stream description language called SDP (Session
Description Protocol).
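To give a feel for this "remote control" dialogue, here is a minimal sketch in Python of how RTSP requests could be formatted as text. The helper name `build_rtsp_request`, the URL `rtsp://example.com/stream` and the session identifier are hypothetical. A real client would also open the TCP connection, read the replies, and send a SETUP request before PLAY.

```python
def build_rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request as text (hypothetical helper).

    RTSP is text based, much like HTTP: a request line, header
    lines, then an empty line. CSeq numbers each request.
    """
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


# Illustrative URL, not a real server.
url = "rtsp://example.com/stream"

# Ask for the stream description, which the server returns as SDP.
describe = build_rtsp_request("DESCRIBE", url, 1, {"Accept": "application/sdp"})

# Start playback. The session id is made up here; in practice it
# comes from the server's reply to an earlier SETUP request.
play = build_rtsp_request("PLAY", url, 3, {"Session": "12345678"})

print(describe)
```

Note that nothing in these messages carries audio or video: they only describe and control the session.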
Although the name RTSP contains "streaming",
it is not RTSP that "streams" the data.
RTP is the protocol used to transport the multimedia stream.
Because multimedia is inherently real-time, and because in
real-time systems data that arrives late is no longer useful,
RTP is built upon UDP. Using UDP, RTP sends a packet to the
network but cannot guarantee that the packet will reach its
destination.
That is why RTP adds information
in front of the content it encapsulates, so that the recipient
can check whether a packet has been lost or has arrived
in the wrong order. The multimedia content
transported by an RTP packet is called the payload.
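The information added in front of the payload can be sketched as follows. This Python example decodes the 12-byte fixed RTP header as laid out in RFC 3550 and uses the 16-bit sequence number to spot loss or reordering. The function names `parse_rtp_header` and `classify` are illustrative, and the sample packet is built by hand rather than captured from a real stream.

```python
import struct


def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header (layout per RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,          # always 2 for current RTP
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,   # identifies the payload format
        "sequence": seq,             # increases by 1 per packet
        "timestamp": timestamp,      # media clock, see later sections
        "ssrc": ssrc,                # identifies the stream source
    }


def classify(previous_seq, seq):
    """Compare a sequence number with the previous one (mod 65536)."""
    delta = (seq - previous_seq) & 0xFFFF
    if delta == 1:
        return "in order"
    if delta == 0:
        return "duplicate"
    if delta < 0x8000:
        return f"{delta - 1} packet(s) missing"
    return "out of order"


# Hand-built sample: version 2, payload type 26, sequence 1000.
sample = struct.pack("!BBHII", 0x80, 26, 1000, 90000, 0x1234) + b"payload"
header = parse_rtp_header(sample)
print(header["payload_type"], header["sequence"])
print(classify(1000, 1003))
```

The payload offset is not computed here because optional CSRC entries and header extensions can shift it beyond byte 12.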
Some payload formats are well known and as such have been
allocated a legacy (static) payload identifier. These include
audio payloads such as G.721, G.722 and GSM, and video payloads
such as JPEG and H.261. These legacy formats do not require
a complex description in SDP.
Other A/V formats use a dynamic payload
identifier and require a more complex description in the SDP.
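The difference between static and dynamic identifiers can be illustrated with a short Python sketch. The table lists a few static payload type numbers as assigned in RFC 3551. The `a=rtpmap` line used for the dynamic case, and the helper names `parse_rtpmap` and `lookup_payload`, are examples chosen for illustration only.

```python
# A few static payload type numbers (RFC 3551): name and clock rate.
STATIC_PAYLOAD_TYPES = {
    3: ("GSM", 8000),
    9: ("G722", 8000),
    26: ("JPEG", 90000),
    31: ("H261", 90000),
}


def parse_rtpmap(line):
    """Parse an SDP 'a=rtpmap:<pt> <encoding>/<clock rate>' line."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    pt_text, encoding = line[len(prefix):].split(" ", 1)
    parts = encoding.strip().split("/")
    return int(pt_text), parts[0], int(parts[1])


def lookup_payload(payload_type, rtpmap_lines=()):
    """Return (encoding name, clock rate) for a payload type, or None."""
    dynamic = {}
    for line in rtpmap_lines:
        pt, name, rate = parse_rtpmap(line)
        dynamic[pt] = (name, rate)
    if payload_type in dynamic:
        return dynamic[payload_type]
    return STATIC_PAYLOAD_TYPES.get(payload_type)


# Static type: known without any extra description.
print(lookup_payload(31))

# Dynamic type (96 to 127): meaningless without the SDP mapping.
print(lookup_payload(96))
print(lookup_payload(96, ["a=rtpmap:96 H264/90000"]))
```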
1. TCP is one of the most common communication protocols
of the IP galaxy. TCP (Transmission Control Protocol)
allows session-oriented (or connection-oriented)
communications. That is, it is the protocol dedicated to
point-to-point communication, where each party knows who
it is talking to and checks that the information reaches
the recipient. It is more like a phone call: you know who
you are talking to and you always check that the
person on the other end has understood what you just told
them. Most major IP applications use TCP as their base
protocol (FTP, HTTP, …).
2. UDP is the other widely used communication protocol,
but it is generally less well known. UDP (User Datagram
Protocol) is a connectionless protocol. That means that UDP
does not perform a handshake; it is more like a push
protocol: just shouting into the tube without checking
whether there is someone’s ear at the other end. Because
it does not use a handshake, it is a little faster to
manage than TCP, but the delivery of messages
(datagrams) is not guaranteed.
The last protocol, RTCP, is used to monitor
the session. It is mainly used to feed the streaming server
with reception statistics from the client. The server may then
decide to use these statistics (such as the number of lost
packets, the reception delay, …) to adapt its strategy.
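As a rough illustration of such statistics, the Python sketch below estimates packet loss from sequence numbers, in the spirit of what a receiver reports over RTCP, and then applies a made-up server policy. The function names and the thresholds in `adapt_bitrate` are assumptions for the example; the protocol does not prescribe any particular adaptation strategy.

```python
def reception_stats(first_seq, highest_seq, packets_received):
    """Estimate loss from the sequence numbers seen by the client.

    Simplification: sequence numbers are assumed not to have
    wrapped around 65535 during the observed interval.
    """
    expected = highest_seq - first_seq + 1
    lost = max(expected - packets_received, 0)
    fraction = lost / expected if expected > 0 else 0.0
    return {"expected": expected, "lost": lost, "fraction_lost": fraction}


def adapt_bitrate(current_bps, fraction_lost):
    """Hypothetical server policy with arbitrary thresholds."""
    if fraction_lost > 0.05:
        return current_bps * 3 // 4      # heavy loss: back off by 25%
    if fraction_lost < 0.01:
        return current_bps * 105 // 100  # clean link: probe 5% higher
    return current_bps                   # otherwise keep the rate


stats = reception_stats(first_seq=1000, highest_seq=1099, packets_received=95)
print(stats)
print(adapt_bitrate(1_000_000, stats["fraction_lost"]))
```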
The diagram below shows an overview of the
relations between protocols:
There are two main, related problems here:
- Ensuring that the client player
plays the multimedia stream at the correct speed
- Ensuring that the different parts of a
multimedia stream (audio and video, for instance) stay
synchronized
The answer is quite simple: all RTP packets
contain a header with a timestamp. The timestamp of the first
packet is chosen randomly or may start at 0. Each subsequent
packet is then stamped with an integer value that gives the
clock time at which the packet should be played or rendered.
RTP streams audio and video on separate
channels, and because they are based on different sampling
rates, different reference clocks may be used to timestamp
the audio and video packets.
For instance, ADPCM and G.72x audio payloads are timestamped
using the sampling frequency as the reference clock, whereas
MPEG audio or video use a fixed 90 kHz reference clock.
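A small worked example may help here. The Python sketch below computes how much the timestamp advances from one packet to the next for two clocks. The packet duration (20 ms of 8 kHz audio) and the frame rate (25 frames per second, so 40 ms per frame) are assumptions chosen for the example, as are the function names.

```python
def timestamp_increment(clock_rate_hz, duration_ms):
    """Number of clock ticks covered by one packet or frame."""
    return clock_rate_hz * duration_ms // 1000


def next_timestamp(current, increment):
    """RTP timestamps are 32-bit values and wrap around."""
    return (current + increment) & 0xFFFFFFFF


# Audio sampled at 8 kHz, one packet every 20 ms.
audio_step = timestamp_increment(8000, 20)

# Video on the fixed 90 kHz clock, one frame every 40 ms (25 fps).
video_step = timestamp_increment(90000, 40)

print(audio_step, video_step)

# Starting near the top of the 32-bit range shows the wraparound.
print(next_timestamp(0xFFFFFFF0, audio_step))
```

The two streams cover the same real time with very different tick counts, which is why their timestamps cannot be compared directly.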
So how can the player keep audio and video synchronized
when they are timestamped using different reference clocks?
As part of RTCP, the RTP Control Protocol, the server
regularly delivers synchronization information that relates
each channel's reference clock to an absolute reference clock
common to the whole system.
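This mapping can be sketched in Python as follows. In RTCP, a sender report pairs an RTP timestamp with the absolute time of the same instant; given one such pair per channel, the player can convert any RTP timestamp to absolute time. The function name `rtp_to_wallclock` and all numeric values are illustrative, and absolute time is simplified to plain seconds.

```python
def rtp_to_wallclock(rtp_ts, clock_rate_hz, sr_rtp_ts, sr_wallclock_s):
    """Map an RTP timestamp to absolute time using one sender report.

    sr_rtp_ts and sr_wallclock_s describe the same instant, once in
    the channel's own clock and once in absolute seconds.
    """
    # Signed 32-bit difference, so the mapping survives wraparound.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_wallclock_s + delta / clock_rate_hz


# Audio channel: 8 kHz clock, report says tick 16000 == t = 1000.0 s.
audio_time = rtp_to_wallclock(20000, 8000, 16000, 1000.0)

# Video channel: 90 kHz clock, report says tick 900000 == t = 1000.0 s.
video_time = rtp_to_wallclock(945000, 90000, 900000, 1000.0)

# Both land on the same absolute instant, so they play together.
print(audio_time, video_time)
```

Once both channels are expressed on the common clock, the player can delay whichever stream is ahead until the two line up.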