Voice Over IP: Protocols and Standards: Rakesh Arora
Abstract
This paper first discusses the key issues that keep Voice over IP (VOIP) from becoming
popular with users. It then discusses the protocols and standards that exist today and are
required for VOIP products from different vendors to interoperate. The main focus is on
the signaling protocols H.323 and SIP (Session Initiation Protocol). Some hardware
standards for Internet telephony are also discussed.
TABLE OF CONTENTS
• 1. Introduction
o 1.1 Main Issues
• 2. H.323 Standard
o 2.1 Components of H.323
o 2.2 H.323 Protocol Stack
o 2.3 Definitions
o 2.4 Control and Signaling in H.323
o 2.5 Call Setup in H.323
• 3. Session Initiation Protocol (SIP)
o 3.1 Components of SIP
o 3.2 SIP Messages
o 3.3 Overview of SIP Operation
o 3.4 Sample SIP operation
• 4. Comparison of H.323 with SIP
• 5. Supporting Protocols
o 5.1 Media Gateway Control Protocol
o 5.2 RTP and RTCP
o 5.3 Real Time Streaming Protocol
o 5.4 Resource Reservation Protocol
o 5.5 Session Description Protocol
o 5.6 Session Announcement Protocol
• 6. Hardware Standards
o 6.1 SCBus
o 6.2 S.100
• 7. Summary
• Appendix A: Functions of the key protocols and standards
• References
• List of Acronyms
1. INTRODUCTION
Voice over IP (VOIP) uses the Internet Protocol (IP) to transmit voice as packets over an
IP network. VOIP can therefore be carried on any data network that uses IP, such as the
Internet, intranets and Local Area Networks (LANs). The voice signal is digitized,
compressed, converted to IP packets and then transmitted over the IP network. Signaling
protocols are used to set up and tear down calls, to carry the information required to
locate users and to negotiate capabilities. One of the main motivations for Internet
telephony is its very low cost. Some other motivations are:
1.1 Main Issues
For VOIP to become popular, some key issues need to be resolved. Some of these issues
stem from the fact that IP was designed for transporting data, while others have arisen
because vendors are not conforming to the standards. The key issues are discussed below
[Munch98]:
• Quality of voice
Because IP was designed for carrying data, it provides only best-effort service and
no real-time guarantees. For voice communication over IP to be acceptable to
users, the delay must stay below a threshold value; the IETF (Internet Engineering
Task Force) is working on this aspect. Voice quality can be improved with echo
cancellation, packet prioritization (giving higher priority to voice packets) or
forward error correction [Micom].
• Interoperability
In a public network environment, products from different vendors need to operate
with each other if voice over IP is to become common among users. To achieve
interoperability, standards are being devised and the most common standard for
VOIP is the H.323 standard, which is described in the next section.
• Security
This problem exists because on the Internet anyone can capture packets meant for
someone else. Some security can be provided by using encryption and tunneling.
The common tunneling protocol is the Layer 2 Tunneling Protocol (L2TP) and the
common encryption mechanism is Secure Sockets Layer (SSL).
• Scalability
Researchers are working to provide the same quality over IP as normal telephone
calls at a much lower cost, so VOIP systems have great potential for high growth
rates. VOIP systems need to be flexible enough to grow to a large user market and
to allow a mix of private and public services.
2. H.323 STANDARD
H.323 is the ITU-T (International Telecommunication Union, Telecommunication
Standardization Sector) standard that vendors should comply with when providing Voice
over IP service. This recommendation provides the
technical requirements for voice communication over LANs while assuming that no
Quality of Service (QoS) is being provided by LANs. It was originally developed for
multimedia conferencing on LANs, but was later extended to cover Voice over IP. The
first version was released in 1996 while the second version of H.323 came into effect in
January 1998. The standard encompasses both point to point communications and
multipoint conferences. The products and applications of different vendors can
interoperate if they abide by the H.323 specification.
2.1 Components of H.323
H.323 defines four logical components: terminals, gateways, gatekeepers and Multipoint
Control Units (MCUs). Terminals, gateways and MCUs are known as endpoints. These
components are discussed below [DataBeam]:
2.1.1 Terminals
Terminals are the LAN client endpoints that provide real-time, two-way communications.
All H.323 terminals have to support H.245, Q.931, Registration, Admission and Status
(RAS) and the Real-time Transport Protocol (RTP). H.245 is used for negotiating the use
of channels, Q.931 is required for call signaling and call setup, RTP carries the voice
packets, and RAS is used for interacting with the gatekeeper. These protocols are
discussed later in the paper. H.323 terminals may also include T.120 data conferencing
protocols, video codecs and support for MCUs. An H.323 terminal can communicate with
another H.323 terminal, an H.323 gateway or an MCU.
2.1.2 Gateways
An H.323 gateway is an endpoint on the network that provides real-time, two-way
communications between H.323 terminals on the IP network and other ITU terminals on
a switched-circuit network, or to another H.323 gateway. Gateways act as "translators":
they translate between different transmission formats, e.g. from H.225 to H.221, and can
also translate between audio and video codecs. The gateway is the interface between the
PSTN and the Internet: it takes voice from the circuit-switched PSTN and places it on the
public Internet, and vice versa. Gateways are optional in that terminals on a single LAN
can communicate with each other directly. When terminals on a network need to
communicate with an endpoint in some other network, they do so via gateways using the
H.245 and Q.931 protocols.
2.1.3 Gatekeepers
The gatekeeper is the most vital component of the H.323 system and performs the duties
of a "manager". It acts as the central point for all calls within its zone (a zone is the
aggregation of the gatekeeper and the endpoints registered with it) and provides services
to the registered endpoints. Some of the functions that gatekeepers provide are listed
below [DataBeam] [H.323]:
2.1.4 Multipoint Control Units (MCU)
The MCU is an endpoint on the network that provides the capability for three or more
terminals and gateways to participate in a multipoint conference. The MCU consists of a
mandatory Multipoint Controller (MC) and optional Multipoint Processors (MP). The
MC determines the common capabilities of the terminals by using H.245, but it does not
perform the multiplexing of audio, video and data. The multiplexing of media streams is
handled by the MP under the control of the MC. The following figure (Fig 1) shows the
interaction between all the H.323 components.
2.2 H.323 Protocol Stack
The following figure (Fig 2) shows the H.323 protocol stack. The audio, video and
registration packets use the unreliable User Datagram Protocol (UDP), while the data and
control application packets use the reliable Transmission Control Protocol (TCP) as the
transport protocol. All of these protocols except T.120 are described in this paper. The
T.120 protocol defines the data conferencing part [Toga99].
Fig 2. The protocol stack of H.323
2.3 Definitions
2.3.1 Zone
The collection of a gatekeeper and the endpoints registered with it is called a zone.
For each H.323 entity, a network address is assigned and this address uniquely identifies
the H.323 entity on the network. An endpoint may use different network addresses for
different channels within the same call.
The alias address provides an alternate method of addressing the endpoint. It could be an
email address, a telephone number or something similar. An endpoint may have one or
more alias addresses associated with it, and each alias address is unique within a zone.
2.4 Control and Signaling in H.323
H.323 provides three control protocols: H.225.0/Q.931 call signaling, H.225.0 RAS and
H.245 media control. H.225.0/Q.931 provides the signaling for call control. To establish
a call from a source to a receiver host, the H.225.0 RAS (Registration, Admission and
Status) channel is used. After the call has been established, H.245 is used to negotiate
the media streams.
The RAS channel is used for communication between the endpoints and the gatekeeper.
Since RAS messages are sent over UDP (an unreliable channel), the standard recommends
timeouts and retry counts for these messages. The procedures defined by the RAS
channel are [H.323]:
Gatekeeper discovery
This is the process that an endpoint uses to determine the gatekeeper with which it should
register. The endpoint normally multicasts a Gatekeeper Request (GRQ) message asking
for its gatekeeper. One or more gatekeepers may respond with a Gatekeeper Confirmation
(GCF) message, thereby indicating their willingness to be the gatekeeper for that
endpoint. The response includes the transport address of the gatekeeper's RAS channel. A
gatekeeper that does not want the endpoint to register with it can send a Gatekeeper
Reject (GRJ) message. If more than one gatekeeper responds with GCF, the endpoint may
choose among them and register with the one it selects. If no gatekeeper responds within
a timeout interval, the endpoint may retransmit the GRQ.
Endpoint Registration
This is the process by which an endpoint joins a zone and informs the gatekeeper of its
transport and alias addresses. Endpoints usually register with the gatekeeper that was
identified through the discovery process. An endpoint shall send a Registration Request
(RRQ) to the gatekeeper's RAS channel transport address. The endpoint has the network
address of the gatekeeper from the gatekeeper discovery process and uses the well-known
RAS channel TSAP identifier. The gatekeeper shall respond with either a Registration
Confirmation (RCF) or a Registration Reject (RRJ). The gatekeeper shall ensure that each
alias address translates uniquely to a single transport address. An endpoint may cancel
its registration by sending an Unregister Request (URQ) message to the gatekeeper, and
the gatekeeper shall respond with an Unregister Confirmation (UCF) message. Likewise, a
gatekeeper may cancel the registration of an endpoint by sending a URQ message to the
endpoint, and the endpoint shall respond with a UCF message.
Endpoint Location
An endpoint or gatekeeper that has an alias address for an endpoint and would like to
determine its contact information may issue a Location Request (LRQ) message. The
gatekeeper with which the requested endpoint is registered shall respond with a
Location Confirmation (LCF) message containing the contact information of the endpoint
or of the endpoint's gatekeeper. All gatekeepers with which the requested endpoint is not
registered shall return a Location Reject (LRJ) if they received the LRQ on the RAS
channel.
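The request/confirm/reject pattern shared by these RAS procedures can be sketched in a few
lines of Python. This is only an illustration, not part of H.323: the Gatekeeper class, the
discover function, the retry count and the addresses are all hypothetical, and the
"multicast" is simulated by calling each gatekeeper in turn.

```python
from dataclasses import dataclass

# Request -> (confirm, reject) message pairs named in the text above.
RAS_REPLIES = {
    "GRQ": ("GCF", "GRJ"),  # gatekeeper discovery
    "RRQ": ("RCF", "RRJ"),  # endpoint registration
    "LRQ": ("LCF", "LRJ"),  # endpoint location
}


@dataclass
class Gatekeeper:
    """Hypothetical stand-in for a gatekeeper reachable by multicast."""
    name: str
    ras_address: str      # transport address of its RAS channel (made-up value)
    accepts: bool = True  # whether it is willing to serve the endpoint

    def on_grq(self):
        """Answer a GRQ with GCF plus the RAS address, or with GRJ."""
        if self.accepts:
            return ("GCF", self.ras_address)
        return ("GRJ", None)


def discover(gatekeepers, max_tries=3):
    """Simulate multicasting GRQ and retransmitting when nobody confirms.

    Returns the RAS address of the first gatekeeper that answered GCF,
    or None if every attempt ended without a confirmation.
    """
    for _ in range(max_tries):
        replies = [gk.on_grq() for gk in gatekeepers]
        confirmed = [addr for kind, addr in replies if kind == "GCF"]
        if confirmed:
            return confirmed[0]  # the endpoint may choose any; take the first
    return None
```

With one rejecting and one accepting gatekeeper, discover returns the address of the
accepting one; the endpoint would then send its RRQ to that address.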
The call signaling channel is used to carry H.225 control messages. In networks that do
not contain a gatekeeper, call signaling messages are passed directly between the calling
and called endpoints using the Call Signaling Transport Addresses. It is assumed that the
calling endpoint knows the Call Signaling Transport Address of the called endpoint and
thus can communicate directly. In networks that do contain a gatekeeper, the initial
admission message exchange takes place between the calling endpoint and the gatekeeper
using the gatekeeper's RAS channel transport address. The call signaling is done over
TCP (a reliable channel).
H.245 is the media control protocol that H.323 systems utilize after the call establishment
phase has been completed. H.245 is used to negotiate and establish all of the media
channels carried by RTP/RTCP. The functions offered by H.245 are [Toga99]:
3. SESSION INITIATION PROTOCOL (SIP)
3.1 Components of SIP
A user agent is an end system acting on behalf of a user. It has two parts: a client and a
server. The client portion is called the User Agent Client (UAC) and the server portion is
called the User Agent Server (UAS). The UAC is used to initiate a SIP request, while the
UAS is used to receive requests and return responses on behalf of the user.
There are three types of servers within a network. A registration server receives updates
concerning the current locations of users. A proxy server, on receiving a request, forwards
it to the next-hop server, which has more information about the location of the called
party. A redirect server, on receiving a request, determines the next-hop server and
returns its address to the client instead of forwarding the request.
3.2 SIP Messages
SIP defines a number of messages, which are used for communication between the client
and the SIP server. These messages are:
3.3 Overview of SIP Operation
Callers and callees are identified by SIP addresses. When making a SIP call, a caller first
needs to locate the appropriate server and send it a request. The caller can reach the
callee either directly or indirectly through redirect servers. The Call ID field in the
SIP message header uniquely identifies each call. The way the protocol performs its
operations is briefly discussed below [RFC2543].
The SIP hosts are identified by a SIP URL which is of the form sip:username@host. A
SIP address can either designate an individual or a whole group.
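As a small illustration of the sip:username@host form, the following Python sketch splits
such an address into its two parts. It handles only the simple form given above; the
function name parse_sip_url is made up for this example, and real SIP URLs can carry
ports, parameters and other fields that are ignored here.

```python
def parse_sip_url(url):
    """Split 'sip:username@host' into (username, host).

    Only the basic form from the text is supported; anything else
    raises ValueError.
    """
    scheme, sep, rest = url.partition(":")
    if sep != ":" or scheme.lower() != "sip":
        raise ValueError("not a SIP URL: %r" % url)
    user, at, host = rest.partition("@")
    if at != "@" or not user or not host:
        raise ValueError("expected sip:username@host, got %r" % url)
    return user, host
```

For example, parse_sip_url("sip:alice@example.com") yields the pair ("alice",
"example.com"); the host part is what the client would resolve to find a SIP server.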
The client can either send the request to a SIP proxy server or send it directly to the
IP address and port corresponding to the Uniform Resource Identifier (URI).
Once the host part of the Request URI has been resolved to a SIP server, the client can
send requests to that server. A request together with the responses triggered by that
request make up a SIP transaction. The requests can be sent through reliable TCP or
through unreliable UDP.
A successful SIP invitation consists of two requests: an INVITE followed by an ACK. The
INVITE request asks the callee to join a particular conference or establish a two party
conversation. After the callee has agreed to participate in the call, the caller confirms that
it has received that response by sending an ACK request. The INVITE request contains a
session description that provides the called party with enough information to join the
session. If the callee wishes to accept the call, it responds to the invitation by returning a
similar session description.
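The INVITE/ACK pair can be sketched as plain text messages. The Python below is a
simplified illustration only: build_request is a hypothetical helper, the addresses and
Call ID are made up, and headers that a real SIP stack must send (such as Via) are
deliberately left out.

```python
def build_request(method, uri, call_id, cseq, sender, body=""):
    """Build a simplified SIP request as a string.

    Only a handful of headers are emitted; a real implementation needs
    more (for example Via). An empty body gives Content-Length: 0.
    """
    lines = [
        f"{method} {uri} SIP/2.0",
        f"From: <{sender}>",
        f"To: <{uri}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} {method}",
    ]
    if body:
        # The session description travels as the message body.
        lines.append("Content-Type: application/sdp")
    lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


# The caller invites the callee, then acknowledges the callee's response.
invite = build_request("INVITE", "sip:bob@example.org", "abc123", 1,
                       "sip:alice@example.com", body="v=0\r\n")
ack = build_request("ACK", "sip:bob@example.org", "abc123", 1,
                    "sip:alice@example.com")
```

Both requests carry the same Call ID, which is what ties them to one call; only the
INVITE carries a session description.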
A callee may change its location over time. These locations can be dynamically
registered with the SIP server. When the SIP server is queried about the location of a
callee, it returns a list of possible locations. A Location Server in the SIP system actually
generates the list and passes it to the SIP server.
Sometimes we may need to change the parameters of an existing session. This is done by
re-issuing the INVITE message using the same Call ID but a new body to convey the
new information.
3.4 Sample SIP Operation
This section gives a basic example of a SIP operation in which a client invites a
participant to a call. A SIP client creates an INVITE message for [email protected], which
is normally sent to a proxy server. This proxy server tries to obtain the IP address of the
SIP server that handles requests for the requested domain. The proxy server consults a
Location Server to determine this next-hop server. The Location Server is a non-SIP server
that stores information about the next-hop servers for different users. On getting the IP
address of the next-hop server, the proxy server forwards the INVITE to it. After the User
Agent Server (UAS) has been reached, it sends a response back to the proxy server. The
proxy server in turn sends a response back to the client. The client then confirms that it
has received the response by sending an ACK. The exchange of messages is shown in the
figure below (Fig 3). In this case, we assumed that the client's INVITE request was sent
to a proxy server. If it had instead been sent to a redirect server, the redirect server
would return the IP address of the next-hop server to the client, and the client would
then communicate directly with the UAS [Schulzrinne99b].
Fig 3. Example of a SIP operation
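The difference between the proxy and redirect cases can be made concrete with a toy
trace of who sends what to whom. This Python sketch is an illustration under stated
assumptions: the LOCATIONS table stands in for the Location Server, the user and address
are invented, and routing the final ACK through the proxy is an assumption, since the
text does not say which path it takes.

```python
# Hypothetical Location Server data: user -> address of the next-hop server.
LOCATIONS = {"bob@example.org": "192.0.2.20"}


def via_proxy(user):
    """Message trace when the INVITE is sent to a proxy server."""
    next_hop = LOCATIONS[user]  # the proxy consults the Location Server
    return [
        ("client", "proxy", "INVITE"),
        ("proxy", next_hop, "INVITE"),    # proxy forwards the request
        (next_hop, "proxy", "response"),
        ("proxy", "client", "response"),
        ("client", "proxy", "ACK"),       # assumed to follow the same path
    ]


def via_redirect(user):
    """Message trace when the INVITE is sent to a redirect server."""
    next_hop = LOCATIONS[user]
    return [
        ("client", "redirect", "INVITE"),
        ("redirect", "client", f"next hop is {next_hop}"),  # no forwarding
        ("client", next_hop, "INVITE"),   # client contacts the server itself
        (next_hop, "client", "response"),
        ("client", next_hop, "ACK"),
    ]
```

In the proxy trace every message passes through the proxy, while in the redirect trace
the redirect server drops out after returning the next-hop address.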
5. SUPPORTING PROTOCOLS
SIP works in conjunction with RSVP (Resource Reservation Protocol), RTP (Real-time
Transport Protocol), RTSP (Real Time Streaming Protocol), SAP (Session Announcement
Protocol) and SDP (Session Description Protocol). RTP/RTCP is used for transporting
real-time data, RSVP for reserving resources, RTSP for controlled delivery of streams,
SAP for advertising multimedia sessions and SDP for describing multimedia sessions.
H.323 also works in conjunction with RTP and RTCP (RTP Control Protocol). Present-day
voice gateways are usually composed of two parts: the signaling gateway and the media
gateway. The signaling gateway communicates with the media gateway using MGCP (Media
Gateway Control Protocol). MGCP can interoperate with both SIP and H.323 [Huitema99].
The following figure (Fig 4) shows the signaling and transport protocols required for
delivering voice over IP [Schulzrinne99b].
Fig 4. Signaling protocols SIP and H.323 with some of their supporting protocols
5.1 Media Gateway Control Protocol (MGCP)
MGCP is a protocol that defines communication between call control elements (Call
Agents) and telephony gateways. Call Agents are also known as Media Gateway
Controllers. It is a control protocol that allows a central coordinator to monitor events
in IP phones and gateways and to instruct them to send media to specific addresses. It
resulted from the merger of the Simple Gateway Control Protocol and Internet Protocol
Device Control. The call control intelligence is located outside the gateways and is
handled by external call control elements, the Call Agents. MGCP assumes that these Call
Agents will synchronize with each other to send coherent commands to the gateways under
their control. It is a master/slave protocol in which the gateways are expected to execute
commands sent by the Call Agents. It introduces the concepts of connections and
endpoints for establishing voice paths between two participants, and the concepts of
events and signals for establishing and tearing down calls. Because MGCP emphasizes
simplicity and reliability and allows programming difficulties to be concentrated in the
Call Agents, it will enable service providers to develop reliable and cheap local access
systems [IDMGCP] [Huitema99].
A call agent may ask to be notified about certain events occurring in an endpoint, such as
off-hook, on-hook and dialed digits, and may request that a certain signal be applied to an
endpoint, such as dial tone, busy tone or ringing. Events and signals are grouped in
packages that are supported by a particular type of endpoint; e.g., one package may
support a certain group of events and signals for analog access lines.
Connections are created by the call agent on each endpoint that will be involved in the
call. When the two endpoints are located on gateways that are managed by the same call
agent, the creation is done via the following three steps [IDMGCP]:
• The Call Agent asks the first gateway to create a connection on the first endpoint.
The response sent by the gateway includes a session description that contains
pertinent information required by third parties to be able to send packets to the
new connection that has been created.
• The Call Agent then sends the session description of the first gateway to the
second gateway and asks it to create a connection on the second endpoint. The
second gateway responds by sending its own session description.
• The Call Agent uses a modify connection command to provide this second session
description to the first endpoint. Now communication can occur in both
directions.
When the two endpoints are located on gateways that are managed by different call
agents, the two call agents shall exchange information through a call agent to call agent
signaling protocol in order to synchronize the creation of the connection on the two
endpoints.
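The three-step creation for endpoints under a single call agent can be sketched as
follows. This Python is a toy model, not MGCP itself: the Gateway class, its method
names and the dictionary used as a "session description" are all hypothetical stand-ins
for the real commands and SDP bodies.

```python
class Gateway:
    """Toy gateway holding one connection on one endpoint."""

    def __init__(self, address):
        self.address = address
        self.remote = None  # session description of the peer, once known

    def create_connection(self, remote=None):
        """Create the connection and return this side's session description."""
        self.remote = remote
        return {"address": self.address}  # stand-in for an SDP body

    def modify_connection(self, remote):
        """Tell an existing connection where to send its packets."""
        self.remote = remote


def connect(gw1, gw2):
    """Run the three steps as seen from the call agent."""
    sd1 = gw1.create_connection()            # step 1: first endpoint
    sd2 = gw2.create_connection(remote=sd1)  # step 2: second endpoint gets sd1
    gw1.modify_connection(remote=sd2)        # step 3: first endpoint gets sd2
    return sd1, sd2
```

After connect returns, each gateway holds the other's session description, which is the
point at which communication can occur in both directions.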
5.1.4 Commands
MGCP implements the media gateway control interface as a set of transactions. Each
transaction is composed of a command and a mandatory response. There are eight types of
commands [Huitema99]:
5.2 RTP and RTCP (Real-time Transport Protocol and RTP Control Protocol)
RTP supports the transfer of real-time media (audio and video) over packet switched
networks. It is used by both SIP and H.323. The transport protocol must allow the
receiver to detect any losses in packets and also provide timing information so that the
receiver can correctly compensate for delay jitter. The RTP header contains information
that assists the receiver in reconstructing the media, as well as information specifying
how the codec bitstreams are broken up into packets. RTP does not reserve resources in
the network; instead, it provides information so that the receiver can recover in the
presence of loss and jitter. RTP provides the following functions [Chunlei97] [RFC1889]:
• Sequencing: The sequence number in the RTP packet is used for detecting lost
packets.
• Payload Identification: In the Internet, it is often necessary to change the encoding
of the media dynamically to adjust to changing bandwidth availability. To provide
this functionality, a payload identifier is included in each RTP packet to describe
the encoding of the media.
• Frame Indication: Video and audio are sent in logical units called frames. A frame
marker bit is provided to indicate the beginning and end of a frame.
• Source Identification: A multicast session has many participants, so an identifier
is required to determine the originator of each frame. The Synchronization Source
(SSRC) identifier is provided for this purpose.
• Intramedia Synchronization: To compensate for the different delay jitter of
packets within the same stream, RTP provides timestamps, which are needed by
the play-out buffers.
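The fields behind these functions sit in the 12-byte fixed RTP header defined in
[RFC1889]. The following Python sketch unpacks that header and counts gaps in a run of
sequence numbers. The function names are chosen for this example; header extensions and
contributing-source lists are not handled, and count_lost assumes packets arrive in
order.

```python
import struct


def parse_rtp_header(packet):
    """Unpack the 12-byte fixed RTP header into a dict of its main fields."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,          # top two bits of the first byte
        "marker": b1 >> 7,           # frame indication bit
        "payload_type": b1 & 0x7F,   # payload identification
        "sequence": seq,             # sequencing / loss detection
        "timestamp": timestamp,      # intramedia synchronization
        "ssrc": ssrc,                # source identification
    }


def count_lost(seqs):
    """Count missing packets between consecutive in-order sequence numbers.

    Sequence numbers are 16 bits wide, so differences wrap modulo 65536.
    """
    lost = 0
    for prev, cur in zip(seqs, seqs[1:]):
        lost += (cur - prev - 1) % 65536
    return lost
```

A receiver would feed the sequence field of each arriving packet into something like
count_lost and use the timestamp field to schedule play-out.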
RTCP is a control protocol that works in conjunction with RTP. In an RTP session,
participants periodically send RTCP packets to exchange useful information such as QoS
reports. The additional services that RTCP provides to the participants are:
• QoS feedback: RTCP is used to report the quality of service. The information
provided includes the number of lost packets, the round-trip time and the jitter;
this information is used by the sources to adjust their data rate.
• Session Control: By the use of the BYE packet, RTCP allows participants to
indicate that they are leaving a session.
• Identification: Information such as email address, name and phone number is
included in the RTCP packets so that all the users can know the identities of the
other users in that session.
• Intermedia Synchronization: Even though video and audio are normally sent over
different streams, we need to synchronize them at the receiver so that they play
together. RTCP provides the information that is required for synchronizing the
streams.
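As an example of the QoS figures carried in RTCP reports, the interarrival jitter
estimate specified in [RFC1889] is a running average updated on every packet:
J = J + (|D| - J)/16, where D is the change in transit time between two consecutive
packets. The Python sketch below implements that update; the function name and the
(arrival_time, rtp_timestamp) tuple layout are choices made for this example, and both
values are assumed to be in the same units.

```python
def update_jitter(jitter, prev, cur):
    """Return the new jitter estimate after one more packet.

    prev and cur are (arrival_time, rtp_timestamp) pairs for two
    consecutive packets, expressed in the same units.
    """
    # D: how much the spacing at the receiver differs from the spacing
    # at the sender.
    d = (cur[0] - prev[0]) - (cur[1] - prev[1])
    # Move the estimate 1/16 of the way toward the latest deviation.
    return jitter + (abs(d) - jitter) / 16.0
```

Packets that arrive with exactly the sender's spacing leave the estimate unchanged,
while a packet delayed by 16 units raises an estimate of zero to one.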
5.3 Real Time Streaming Protocol (RTSP)
RTSP, the Real Time Streaming Protocol, is a client-server protocol that provides control
over the delivery of real-time media streams. It provides "VCR-style" remote control
functionality for audio and video streams, like pause, fast forward, reverse, and absolute
positioning. It provides the means for choosing delivery channels (such as UDP, multicast
UDP and TCP), and delivery mechanisms based upon RTP. RTSP establishes and controls
streams of continuous audio and video media between the media servers and the clients.
A media server provides playback or recording services for the media streams while a
client requests continuous media data from the media server. RTSP acts as the "network
remote control" between the server and the client. It supports the following operations:
[Chunlei97] [RFC2326]
• Retrieval of media from media server: The client can request a presentation
description, and ask the server to setup a session to send the requested data. The
server can either multicast the presentation or send it to the client using unicast.
• Invitation of a media server to a conference: The media server can be invited to
the conference to play back media or to record a presentation.
• Addition of media to an existing presentation: The server or the client can notify
each other about any additional media that has become available.
Features of RTSP include:
• RTSP is an application level protocol with syntax and operations similar to HTTP,
but works for audio and video. It uses URLs like those in HTTP.
• An RTSP server needs to maintain states, using SETUP, TEARDOWN and other
methods.
• Unlike HTTP, in RTSP both servers and clients can issue requests.
• RTSP is implemented on multiple operating system platforms and it allows
interoperability between clients and servers from different manufacturers.
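Because RTSP requests are HTTP-like text, the "remote control" commands can be sketched
as simple strings. The Python below is illustrative only: rtsp_request is a hypothetical
helper, the URL, session identifier and port numbers are made up, and no network I/O or
response handling is shown.

```python
def rtsp_request(method, url, cseq, session=None, extra=None):
    """Format one RTSP/1.0 request as text.

    session is the identifier the server handed out in its SETUP reply;
    extra is an optional dict of additional headers.
    """
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    if session is not None:
        lines.append(f"Session: {session}")
    for name, value in (extra or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


url = "rtsp://example.com/audio"  # hypothetical stream
# SETUP establishes server-side state; later requests refer to it by session.
setup = rtsp_request("SETUP", url, 1,
                     extra={"Transport": "RTP/AVP;unicast;client_port=4588-4589"})
play = rtsp_request("PLAY", url, 2, session="12345")
pause = rtsp_request("PAUSE", url, 3, session="12345")
teardown = rtsp_request("TEARDOWN", url, 4, session="12345")
```

The Session header is what makes the server stateful: PLAY, PAUSE and TEARDOWN all refer
back to the state created by SETUP.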
5.4 Resource Reservation Protocol (RSVP)
Network delay and Quality of Service are the biggest obstacles to voice-data
convergence. The most promising solution to this problem is RSVP, developed by the
IETF. RSVP can prioritize specific IP traffic streams and guarantee their latency. It
enables a packet-switched network to emulate a more deterministic circuit-switched
voice network. With the advent of RSVP, VOIP has become a reality today: with RSVP
enabled, voice communication with tolerable delay can be accomplished on a data
network. RSVP requests will generally result in resources being reserved in each node
along the data path. RSVP requests resources in only one direction; it therefore treats a
sender as logically distinct from a receiver, although the same application process may
act as both a sender and a receiver at the same time. RSVP is not itself a routing
protocol; it is designed to operate with current and future unicast and multicast routing
protocols. In order to efficiently accommodate large groups, dynamic group membership
and heterogeneous receiver requirements, RSVP makes receivers responsible for
requesting a specific QoS. A QoS request from a receiver host application is passed to the
local RSVP process. The RSVP protocol then carries the request to all the nodes along
the reverse data path to the data source. RSVP has the following attributes [RFC2205]:
• It is receiver oriented
• It supports both unicast and multicast
• It maintains soft state in routers and hosts, providing graceful support for dynamic
membership changes
• It provides transparent operation through routers that do not support it
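The "soft state" attribute means that a reservation stays in a node only as long as it
keeps being refreshed, so membership changes clean themselves up without explicit
teardown. The Python sketch below illustrates the idea only: SoftStateTable, its method
names and the lifetime value are hypothetical, and RSVP's actual message formats and
timer rules are not reproduced.

```python
class SoftStateTable:
    """Toy per-node reservation table with expiring entries."""

    def __init__(self, lifetime):
        self.lifetime = lifetime  # how long an entry survives without refresh
        self._expires = {}        # flow id -> time at which the entry lapses

    def refresh(self, flow, now):
        """Install the reservation for a flow, or extend an existing one."""
        self._expires[flow] = now + self.lifetime

    def active(self, now):
        """Drop lapsed entries and return the flows still reserved."""
        self._expires = {f: t for f, t in self._expires.items() if t > now}
        return sorted(self._expires)
```

A receiver that leaves the group simply stops refreshing, and its reservation disappears
from every node once the lifetime has passed.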
5.5 Session Description Protocol (SDP)
SDP is intended for describing multimedia sessions for purposes such as session
announcement and session invitation. Its purpose is to convey information about the
media streams in a multimedia session so that the recipients of a session description can
participate in the session. SDP includes the following information [RFC2327]:
• Session name and purpose
• Address and port number
• Start and stop times
• Information to receive those media
• Information about the bandwidth to be used by the conference
• Contact information for the person responsible for the session
The above information is conveyed in a simple textual format. When a call is set up using
SIP, the INVITE message contains an SDP body describing the session parameters
acceptable to the calling party. The response from the callee includes an SDP body
describing the capabilities of the callee. In general, SDP must convey enough information
for a recipient to join a session and must announce the resources to be used to any
non-participants that may need to know. The media information that SDP conveys is the
type of media (audio or video), the transport protocol (RTP, UDP etc.) and the media
format (MPEG video, H.263 video etc.).
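The textual format consists of lines of the form type=value. The Python sketch below
parses a small session description with one audio stream. SAMPLE_SDP is an invented
example (the user name, addresses and port are made up), and parse_sdp is a minimal
reader written for this illustration, not a complete SDP parser.

```python
# A made-up session description: version, origin, session name,
# connection address, start/stop times and one audio stream over RTP.
SAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=Example call",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
]) + "\r\n"


def parse_sdp(text):
    """Collect SDP lines by type; media lines are split into their fields."""
    session = {"media": []}
    for line in text.splitlines():
        if not line:
            continue
        key, _, value = line.partition("=")
        if key == "m":
            # m=<media type> <port> <transport protocol> <format list>
            media, port, transport, *formats = value.split()
            session["media"].append({
                "media": media,
                "port": int(port),
                "transport": transport,
                "formats": formats,
            })
        else:
            session.setdefault(key, []).append(value)
    return session
```

Parsing SAMPLE_SDP yields the session name under "s" and a single media entry whose
type, port, transport protocol and format correspond to the media information listed
above.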
5.6 Session Announcement Protocol (SAP)
This protocol is used for advertising multicast conferences and other multicast
sessions. A SAP announcer periodically multicasts an announcement packet to a
well-known multicast address and port (port number 9875). A SAP listener learns of the
multicast scopes using the Multicast Scope Zone Announcement Protocol and listens on
the well-known SAP address and port for those scopes. There is no rendezvous
mechanism: the SAP announcer is not aware of the presence or absence of any SAP
listeners. A SAP announcement is multicast with the same scope as the session it is
announcing, ensuring that the recipients of the announcement can also be potential
recipients of the session being advertised. If a session uses addresses in multiple
administrative scope ranges, the announcer must send identical copies of the
announcement to each administrative scope range. Multiple announcers may announce a
single session, as an aid to robustness in the face of packet loss and failure of one or
more announcers. The time period between repetitions of an announcement is chosen
such that the total bandwidth used by all announcements on a single SAP group remains
below a preconfigured limit. Each announcer is expected to listen to other announcements
in order to determine the total number of sessions being announced on a particular
group. SAP is intended to announce the existence of long-lived wide-area multicast
sessions and involves a large startup delay before a complete set of announcements is
heard by a listener. In order to reduce the delays inherent in SAP, it is recommended that
proxy caches be deployed. A SAP proxy is expected to listen to all SAP groups in its
scope and to maintain an up-to-date list of all announced sessions along with the time
each announcement was last received. SAP also contains mechanisms for ensuring the
integrity of session announcements, for authenticating the origin of an announcement and
for encrypting such announcements [IDSAP].
6. HARDWARE STANDARDS
Some hardware standards for computer telephony have come up over the past few years.
They attempt to provide interoperability among the telephony products from different
vendors. Two of these standards (SCBus and S.100) are discussed below:
6.1 SCBus
The SCBus is a high speed digital TDM (Time Division Multiplexing) bus developed for
computer telephony. It is a standalone component of SCSA (Signal Computing System
Architecture) that makes it easier to build more scalable systems using devices from
multiple vendors. It provides tight integration of hardware resources from different
vendors. The features provided by SCBus include [SCSA]:
The SCBus standard has been endorsed by the American National Standards Institute
(ANSI), and telephony products from several vendors are based on it [Jain98].
6.2 S.100
The S.100 standard has been endorsed by the Enterprise Computer Telephony Forum (ECTF).
7. SUMMARY
In this paper, we discussed the signaling protocols H.323 (ITU-T standard) and SIP (IETF
standard). We compared the two protocols and noted that although H.323 currently has a
larger share of the market, SIP is a much better protocol given its simplicity and
scalability. We also discussed MGCP, a gateway protocol whereby the Call Agent controls
the signaling gateway. Both H.323 and SIP need real-time protocols that do the actual
transport: RTP and RTCP are used for real-time transport and control, and RTSP provides
controlled delivery of media streams. We also saw some protocols that are used in
conjunction with SIP to advertise the session (SAP) and to describe the session (SDP).
RSVP is used to reserve resources in the network and thereby provide some Quality of
Service. Finally, we discussed two hardware standards, SCBus and S.100.
A table summarizing the key protocols and standards can be found in Appendix A.
References
[H.323] ITU, "Packet Based Multimedia Communications Systems", Feb 1998, 125
pages
This describes the H.323 standard in detail
[IDMGCP] "Media Gateway Control Protocol (MGCP)", August 1999, 111 pages,
ftp://www.ietf.org/internet-drafts/draft-huitema-megaco-mgcp-v0r1-05.txt
This Internet Draft gives a detailed explanation of the MGCP protocol
[Schulzrinne98] H Schulzrinne, J Rosenberg, "A Comparison of SIP and H.323 for
Internet Telephony", July 1998, 4 pages, Proc NOSSDAV'98
http://www.cs.columbia.edu/~hgs/papers/Schu9807_Comparison.pdf
This paper compares the ITU's standard with the IETF standard.
[Rizzetto99] D Rizzetto, C Catania, "A Voice over IP Service Architecture for Integrated
Communications", IEEE Internet Computing May/June 1999 pg 53-62
http://computer.org/internet/ic1999/w3053abs.htm
This paper proposes an architecture that integrates the circuit-switched communications
with the Internet.
[Jain98] Raj Jain, "Voice over IP: Issues and Challenges", Nortel, Canada, August 14,
1998 and Southwestern Bell, Atlanta, October 21, 1998, 42 slides,
http://www.cse.ohio-state.edu/~jain/talks/voip.htm
This slideshow gives a good introduction to VOIP and its standards
[Micom] Micom, "Voice/Fax over IP: Internet, Intranet and Extranet", 47 pages,
http://www.saintrochtree.com/cgi-bin/go2.pl?http://www.micom.com/WhitePapers/whtpaper.pdf
This white paper gives us an introduction to VOIP
[Chunlei97] Chunlei Liu, "Multimedia Over IP: RSVP, RTP, RTCP, RTSP", Jan 1998, 23
pages, http://www.cse.ohio-state.edu/~jain/cis788-97/ip_multimedia/index.htm
This survey paper explains the RSVP, RTSP and RTP protocols.
[Jones99] R Jones, J Cruz, "Carrier Class Voice over IP" , August 1999, 9 pages,
http://www.digital.com/info/LIW0PF/
This white paper gives us a brief introduction of Internet telephony
[RFC1889] "RTP: A Transport Protocol for Real-time applications", Jan 1996, 65 pages
http://www.ietf.org/rfc/rfc1889.txt
This RFC describes RTP, the real-time end-to-end transport protocol
Books
[Black99] Uyless Black, "Voice over IP", 1999, 328 pages, Prentice Hall
This book explains the protocols for VOIP
[Goralski99] W Goralski, M Kolon, "IP Telephony", 1999, 468 pages, McGraw Hill
This book gives a good overview of Voice over IP.
List of Acronyms
SIP Session Initiation Protocol
ITU International Telecommunication Union
SAP Session Announcement Protocol
MGCP Media Gateway Control Protocol
SDP Session Description Protocol
RSVP Resource Reservation Protocol
RTP Real Time Transport Protocol
RTCP RTP Control Protocol
MCU Multipoint Control Unit
UAS User Agent Server
UAC User Agent Client
RAS Registration, Admission and Status
TSAP Transport layer Service Access Point