How RTP (Real-time Transport Protocol) Works in VoIP


The Real-time Transport Protocol (RTP) was standardized in 1996. It allows audio and video data to be transmitted in real time. RTP provides end-to-end transport functions for real-time applications over multicast or unicast network services, which is why it is widely used for interactive audio and video conferencing.

In practical terms, RTP relies on other protocols. In the TCP/IP architecture, it normally runs over UDP. Unlike transport protocols such as TCP, RTP is implemented as an integral part of the application. Encapsulating RTP packets in UDP brings certain constraints, particularly for error correction: any lost or damaged packet is simply ignored and discarded. RTP favors the continuity and timely playback of sound and images over the integrity of the transported data.

The role of RTP is to provide a uniform way to transmit data subject to real-time constraints. To this end, RTP adds timestamps and sequence numbers to the various multimedia streams (audio, video, etc.), lets the receiver monitor the arrival of packets, and identifies the type of information transported. However, the protocol cannot reserve network resources, provide reliability, or guarantee delivery times.
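The fields mentioned above (sequence number, timestamp, payload type, source identifier) sit in a 12-byte fixed header at the start of every RTP packet. The following is a minimal sketch of building and reading that header in Python; the names `RtpHeader`, `pack_rtp`, and `unpack_rtp` are illustrative assumptions, and header extensions and padding are deliberately not handled.

```python
import struct
from typing import NamedTuple, Tuple

RTP_VERSION = 2
# Fixed header: 2 flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC,
# all in network byte order.
_FIXED = struct.Struct("!BBHII")


class RtpHeader(NamedTuple):
    payload_type: int      # 7 bits, identifies the media encoding
    sequence: int          # 16 bits, increases by one per packet sent
    timestamp: int         # 32 bits, sampling instant in clock-rate units
    ssrc: int              # 32 bits, synchronization source identifier
    marker: bool = False   # 1 bit, meaning depends on the payload format


def pack_rtp(header: RtpHeader, payload: bytes) -> bytes:
    """Serialize a header (no CSRC list, extension, or padding) plus payload."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(header.marker) << 7) | (header.payload_type & 0x7F)
    return _FIXED.pack(
        byte0,
        byte1,
        header.sequence & 0xFFFF,
        header.timestamp & 0xFFFFFFFF,
        header.ssrc & 0xFFFFFFFF,
    ) + payload


def unpack_rtp(packet: bytes) -> Tuple[RtpHeader, bytes]:
    """Parse the fixed header and return (header, payload)."""
    if len(packet) < _FIXED.size:
        raise ValueError("packet shorter than the 12-byte RTP header")
    byte0, byte1, seq, ts, ssrc = _FIXED.unpack_from(packet)
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("unsupported RTP version")
    csrc_count = byte0 & 0x0F
    offset = _FIXED.size + 4 * csrc_count  # skip the optional CSRC list
    header = RtpHeader(byte1 & 0x7F, seq, ts, ssrc, bool(byte1 >> 7))
    return header, packet[offset:]
```

For instance, `pack_rtp(RtpHeader(0, 7, 160, 0x1234), b"abc")` produces a 15-byte datagram that `unpack_rtp` turns back into the same header and payload.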

For example, in a multimedia session, each medium is carried in a separate RTP session. This allows the server to tailor the stream to the bandwidth of the recipients. The source identifier and the timestamps of the samples then make it possible to synchronize the streams. RTP can be used alone, but it can also be combined with the Real-Time Transport Control Protocol (RTCP).
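To make the role of the timestamp concrete, the sketch below converts RTP timestamps into elapsed media time. The function name and the clock rates (8000 Hz for narrowband audio, 90000 Hz for video) are assumptions chosen for illustration. Aligning two separate streams in practice also relies on the timing information carried in RTCP sender reports, which this sketch leaves out.

```python
def media_time_seconds(timestamp: int, first_timestamp: int, clock_rate: int) -> float:
    """Elapsed media time since the first packet of a stream, in seconds."""
    # RTP timestamps are 32 bits wide and start at a random offset, so
    # subtract the first timestamp seen and wrap the difference modulo 2**32.
    ticks = (timestamp - first_timestamp) & 0xFFFFFFFF
    return ticks / clock_rate


# Hypothetical values: 160 audio ticks at 8 kHz and 3000 video ticks at 90 kHz.
audio_t = media_time_seconds(1_000_160, 1_000_000, 8_000)
video_t = media_time_seconds(503_000, 500_000, 90_000)
```

Each stream is measured against its own clock, which is why the receiver needs the clock rate of the payload type before it can compare audio and video positions.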

Various techniques offer guarantees on the allocation of network resources, but they give applications few temporal guarantees on the delivery of information; jitter, for example, remains poorly controlled. The Real-Time Transport Protocol (RTP) and its companion, the Real-Time Transport Control Protocol (RTCP), were proposed as a standard to address this problem. Respectively, they transport and control data streams that have real-time properties.


More concisely, RTP is based on the following principles:

• Provide multimedia applications with a temporal reference that is not provided by any existing communication protocol.

• Provide multimedia applications with a transport protocol adapted to their structure.

• Provide multimedia applications with feedback on network behavior so that they can evaluate the properties of the network (error rate, delay, jitter, etc.).

Despite its robustness and its reliable transfer (lost data is retransmitted), TCP is unfortunately ill-suited to real-time flows, because retransmissions add delay. UDP is used instead: it is a simpler protocol, with neither error correction nor resequencing of packets on arrival. To compensate, two additional protocols are used on top of UDP: RTP and RTCP. They use different ports to reference the applications running on the two communicating machines (local and remote): by convention, RTP uses an even port and RTCP the next higher odd port.
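The even/odd port convention can be sketched in a few lines. The helper name `rtp_rtcp_ports` is an assumption, and the choice to round an odd base port down to the next lower even number follows the recommendation in RFC 3550; many modern deployments negotiate ports explicitly or multiplex RTP and RTCP on one port instead.

```python
from typing import Tuple


def rtp_rtcp_ports(base_port: int) -> Tuple[int, int]:
    """Return the (RTP, RTCP) UDP port pair derived from a base port."""
    if not 1024 <= base_port <= 65535:
        raise ValueError("base port must be an unprivileged UDP port")
    rtp_port = base_port & ~1   # clear the lowest bit to get an even port
    rtcp_port = rtp_port + 1    # the next higher odd port carries RTCP
    return rtp_port, rtcp_port
```

With this sketch, both `rtp_rtcp_ports(5004)` and `rtp_rtcp_ports(5005)` yield the pair `(5004, 5005)`.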

Figure: RTSP, RTP, RTCP, and TCP in the OSI model


RTP can be described as a UDP add-in that adds valuable information to each transmitted packet: a sequence number (used to put received packets back in order) and a timestamp (used to restore the time base). The receiver thus knows when each packet was sent and, by comparing the transit times of several packets of the same exchange, can measure the variation in the time spent in the network and compensate for it.
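Comparing the transit times of successive packets is how a receiver estimates jitter. The sketch below uses the smoothed interarrival jitter formula from RFC 3550, J = J + (|D| - J) / 16, where D is the change in transit time between two packets. The class name is an assumption, arrival times are taken from a local clock in seconds, and 32-bit timestamp wraparound is ignored for brevity.

```python
class JitterEstimator:
    """Running interarrival jitter estimate, in RTP timestamp units."""

    def __init__(self, clock_rate: int) -> None:
        self.clock_rate = clock_rate
        self.jitter = 0.0
        self._prev_transit = None

    def update(self, rtp_timestamp: int, arrival_seconds: float) -> float:
        # Express the arrival time in the same units as the RTP timestamp.
        arrival_ticks = arrival_seconds * self.clock_rate
        # Sender and receiver clocks are not synchronized, so the absolute
        # transit value is meaningless; only its change between packets matters.
        transit = arrival_ticks - rtp_timestamp
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            self.jitter += (d - self.jitter) / 16.0
        self._prev_transit = transit
        return self.jitter
```

As a usage example, with an 8000 Hz clock and packets sent every 160 ticks (20 ms), a packet that arrives 10 ms later than expected changes the transit time by 80 ticks and raises the estimate from 0 to 5 ticks.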

RTP is managed at the application level. Its purpose is to provide end-to-end transport functions for real-time applications over multicast or unicast network services (audio conferencing, interactive video/video broadcasting, audio/simulation). The primary role of RTP is to add sequence numbers to packets so that the voice or video information can be reconstructed even if the underlying network changes the packet order. From a technical point of view, RTP makes it possible to:

– Reconstitute the time base of the audio, video, and real-time data streams in general.
– Quickly detect packet loss and inform the source in a time compatible with the service.
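Both capabilities rest on the 16-bit sequence number. As a minimal sketch (the function names are assumptions, and a real jitter buffer would work incrementally rather than on a complete list), the following code puts received packets back in order and lists the sequence numbers that never arrived, taking the wrap from 65535 back to 0 into account.

```python
from typing import List, Tuple


def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a in 16-bit sequence-number space."""
    return ((a - b + 0x8000) & 0xFFFF) - 0x8000


def reorder_and_find_gaps(
    packets: List[Tuple[int, bytes]]
) -> Tuple[List[Tuple[int, bytes]], List[int]]:
    """Sort (sequence, payload) pairs and report missing sequence numbers."""
    if not packets:
        return [], []
    first_seq = packets[0][0]
    # Sort by distance from the first packet seen, so that 0 sorts after
    # 65535 when the counter wraps around.
    ordered = sorted(packets, key=lambda p: seq_delta(p[0], first_seq))
    missing: List[int] = []
    for (prev_seq, _), (cur_seq, _) in zip(ordered, ordered[1:]):
        gap = seq_delta(cur_seq, prev_seq)
        # A gap of 1 is the normal case; anything larger means lost packets.
        missing.extend((prev_seq + k) & 0xFFFF for k in range(1, gap))
    return ordered, missing
```

Given packets that arrive with sequence numbers 65534, 0, 65535, 2, the sketch returns them in the order 65534, 65535, 0, 2 and reports sequence number 1 as missing.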

Like any other telecommunication protocol, RTP has certain limitations:

– RTP does not act at the level of the routers.
– It does not control QoS.
– It does not reserve resources. This is the role of RSVP, a signaling protocol that allows applications to reserve resources on the Internet.
– RTP does not guarantee the delivery of packets.
– RTP does not provide automatic retransmission of missing packets.

RTCP is the companion control protocol of RTP. Its primary role is to report network behavior back to the source (that is, to provide information about the quality of the network) by distributing statistics gathered by the participants in an RTP session. Technically, this is called feedback.
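One of the statistics a receiver reports is packet loss. The sketch below follows the loss calculation described in RFC 3550 for receiver reports: a cumulative loss count plus a "fraction lost" for the most recent interval, encoded as an 8-bit fixed-point value (loss ratio times 256). The class and method names are assumptions, and the caller is expected to supply the number of packets expected (derived from sequence numbers) and actually received.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ReceptionStats:
    """Loss statistics kept by a receiver between two RTCP reports."""

    expected_prior: int = 0   # packets expected as of the previous report
    received_prior: int = 0   # packets received as of the previous report

    def report(self, expected: int, received: int) -> Tuple[int, int]:
        """Return (cumulative_lost, fraction_lost) for a new report."""
        cumulative_lost = expected - received
        expected_interval = expected - self.expected_prior
        received_interval = received - self.received_prior
        lost_interval = expected_interval - received_interval
        self.expected_prior = expected
        self.received_prior = received
        if expected_interval == 0 or lost_interval <= 0:
            # Duplicates can make the interval loss negative; report zero.
            fraction_lost = 0
        else:
            fraction_lost = (lost_interval << 8) // expected_interval
        return cumulative_lost, fraction_lost
```

For example, if 100 packets were expected and 90 received since the session began, the first report carries a cumulative loss of 10 and a fraction lost of 25, which is roughly 10% expressed out of 256.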


The source can use this feedback, which arrives in real time, to adapt the type of coding to the level of available resources. RTCP also carries identification information about the participants in an RTP session, so everyone can know how many participants are part of the conference. Finally, it adjusts its own transmission rate: the participant count is used to scale the rate at which reports are sent, so that the session can accommodate everyone wishing to join the event.
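The way the report rate scales with the number of participants can be illustrated with a simplified calculation. The sketch below assumes the defaults recommended in RFC 3550 (RTCP limited to 5% of the session bandwidth, with a minimum interval of 5 seconds); the function and parameter names are hypothetical, and the full algorithm, which also splits bandwidth between senders and receivers and randomizes the interval, is omitted.

```python
def rtcp_report_interval(
    participants: int,
    avg_rtcp_packet_bytes: float,
    session_bandwidth_bps: float,
    rtcp_fraction: float = 0.05,   # share of session bandwidth given to RTCP
    min_interval_s: float = 5.0,   # lower bound on the reporting interval
) -> float:
    """Simplified time between RTCP reports from one participant, in seconds."""
    if participants < 1 or session_bandwidth_bps <= 0:
        raise ValueError("need at least one participant and positive bandwidth")
    rtcp_bandwidth_bps = session_bandwidth_bps * rtcp_fraction
    # All participants share the RTCP bandwidth, so each one must wait
    # longer between reports as the group grows.
    interval = participants * avg_rtcp_packet_bytes * 8 / rtcp_bandwidth_bps
    return max(min_interval_s, interval)
```

Under these assumptions, a 64 kbit/s session with 100-byte reports keeps the 5-second minimum for a handful of participants, while 1000 participants would each report roughly every 250 seconds, so the total control traffic stays bounded.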

The confidentiality of media streams is achieved by encryption. Because the data compression used with the payload formats described in this profile is applied end-to-end, encryption can be performed after compression, so the two operations do not conflict. A potential denial-of-service threat exists for data encodings whose compression techniques impose a non-uniform computational load on the receiving end. In extreme cases, an attacker can inject pathological datagrams into the stream that are complex to decode and overload the recipient. As with any IP-based protocol, a receiver may also be overloaded in some circumstances simply by receiving too many packets, whether desired or unwanted.

Thus, software that captures and analyzes RTP traffic can be regarded as IP network monitoring software. In addition to its capture and frame analysis capabilities, it may offer audio conversion options. Such tools show, for example, how an attacker with access to a LAN can easily sniff the traffic, analyze the RTP streams, and listen to a particular conversation. This type of software exists for both Windows and Linux environments. In both cases, a graphical interface makes it easier to save and work with the captured data.