Linux Networking And Useful Tips for Real-Time Applications : What is EPOLL? Epoll vs Poll vs Select call ? And How to implement UDP server in Linux using EPOLL?

Monday 27 May 2013

As the number of internet users grows day by day, supporting them requires more efficient HTTP servers.

A common problem in HTTP server scalability is how to ensure that the server handles a large number of connections simultaneously without degrading performance.

An event-driven approach is often implemented in high-performance network servers to multiplex a large number of concurrent connections over a few server processes.

In event-driven servers it is important that the server focuses on connections that can be serviced without blocking its main process.

What is EPOLL?

===========

epoll is the Linux I/O event notification facility: it monitors multiple file descriptors and reports which of them are ready for I/O.

Select Vs poll Vs Epoll

==================

The Epoll event mechanism is designed to scale to larger numbers of connections than select and poll.

One of the problems with select and poll is that in a single call they must both inform the kernel of all of the events of interest and obtain new events.

This can result in large overheads, particularly in environments with large numbers of connections and relatively few new events occurring.

If your server application is network-intensive (for example, thousands of concurrent connections and/or a high connection rate), you should get really serious about performance.

This situation is often called the c10k problem. Under such high load, a network server built on select() or poll() does little useful work and mostly wastes precious CPU cycles.

c10k Problem

===========

Suppose that there are 10,000 concurrent connections. Typically, only a small number of file descriptors among them, say 10, are ready to read.

The remaining 9,990 file descriptors are still copied and scanned on every select()/poll() call, for no benefit.
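The cost described above can be sketched with a small experiment. This is an illustration only, not part of the original server: the helper name count_ready_with_poll, the constant NUM_FDS, and the use of pipes as stand-ins for idle client connections are all assumptions made for the example.

```c
#include <poll.h>
#include <unistd.h>

#define NUM_FDS 64   /* hypothetical number of mostly idle "clients" */

/* Creates NUM_FDS pipes, makes exactly `active` of them readable, and
 * returns how many descriptors poll() reports as ready (-1 on error).
 * The whole pollfd array is handed to the kernel and then scanned again
 * in user space, no matter how few entries are actually ready. */
int count_ready_with_poll(int active)
{
    int pipes[NUM_FDS][2];
    struct pollfd fds[NUM_FDS];
    int i, created = 0, ready = 0, rc = -1;

    for (i = 0; i < NUM_FDS; i++) {
        if (pipe(pipes[i]) == -1)
            goto out;
        created++;
        fds[i].fd = pipes[i][0];
        fds[i].events = POLLIN;
        fds[i].revents = 0;
    }

    /* Only the first `active` pipes receive data */
    for (i = 0; i < active && i < NUM_FDS; i++)
        if (write(pipes[i][1], "x", 1) != 1)
            goto out;

    if (poll(fds, NUM_FDS, 0) == -1)
        goto out;

    /* User space must walk every entry to find the ready ones */
    for (i = 0; i < NUM_FDS; i++)
        if (fds[i].revents & POLLIN)
            ready++;
    rc = ready;
out:
    for (i = 0; i < created; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    return rc;
}
```

Calling count_ready_with_poll(3) reports three ready descriptors, but both the kernel and the final loop still touch all 64 entries to find them.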

Another example:

The cost of Epoll is closer to the number of file descriptors that actually have events on them.

If you're monitoring 200 file descriptors, but only 100 of them have events on them, then you're (very roughly) only paying for those 100 active file descriptors.

This is where epoll tends to offer one of its major advantages over select. If you have a thousand clients that are mostly idle, then with select you're still paying for all one thousand of them. However, with epoll, it's like you've only got a few: you're only paying for the ones that are active at any given time.

All this means that epoll will lead to less CPU usage for most workloads.
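The 200-descriptor example can be reproduced with a short sketch. It assumes pipes as stand-ins for client connections; the helper name count_ready_with_epoll and the constant NUM_WATCHED are hypothetical, and epoll_create1(0) is used here in place of the older epoll_create().

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

#define NUM_WATCHED 200  /* matches the 200-descriptor example above */

/* Registers NUM_WATCHED pipe read ends with epoll, makes `active` of them
 * readable, and returns the number of events epoll_wait() hands back
 * (-1 on error). Only ready descriptors appear in the result array. */
int count_ready_with_epoll(int active)
{
    int pipes[NUM_WATCHED][2];
    struct epoll_event ev, events[NUM_WATCHED];
    int i, created = 0, rc = -1;
    int epfd = epoll_create1(0);

    if (epfd == -1)
        return -1;

    for (i = 0; i < NUM_WATCHED; i++) {
        if (pipe(pipes[i]) == -1)
            goto out;
        created++;
        memset(&ev, 0, sizeof(ev));
        ev.events = EPOLLIN;            /* level-triggered read interest */
        ev.data.fd = pipes[i][0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipes[i][0], &ev) == -1)
            goto out;
    }

    /* Only the first `active` pipes receive data */
    for (i = 0; i < active && i < NUM_WATCHED; i++)
        if (write(pipes[i][1], "x", 1) != 1)
            goto out;

    /* The interest list lives in the kernel, so nothing is re-submitted */
    rc = epoll_wait(epfd, events, NUM_WATCHED, 0);
out:
    for (i = 0; i < created; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    close(epfd);
    return rc;
}
```

With 100 active pipes out of 200, epoll_wait() fills in exactly 100 entries, and the caller never has to look at the idle half.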

Time Complexity

=============

Select -> O(n) Epoll -> O(1)

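Unlike select, which is O(n) in the number of watched file descriptors, epoll is O(1) with respect to that number: the cost of epoll_wait grows with the number of ready descriptors instead. This means that it scales well as the number of watched file descriptors increases.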

select uses a linear search through the list of watched file descriptors, which causes its O(n) behaviour, whereas epoll uses callbacks in the kernel file structure.

Another fundamental difference of epoll is that it can be used in an edge-triggered, as opposed to level-triggered, fashion.

This means that you receive “hints” when the kernel believes the file descriptor has become ready for I/O, as opposed to being told “I/O can be carried out on this file descriptor”.

Number of Supported Clients is a Limitation of the Select Call

==============================================

With the select() call, the maximum number of clients a server can handle is 1024 (1k).

In other words, the server is able to handle only 1024 clients, after which new connections fail.

Raising the per-process limit on open files from 1024 to 100000 did not help: connections still failed at 1024.

select limitation

select fails beyond 1024 fds because FD_SETSIZE, the capacity of an fd_set, is fixed at 1024.
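This explains why raising the open-file limit did not help: the limit is in the fd_set type itself, not in the process. A minimal guard, assuming a hypothetical helper name fits_in_fd_set, might look like this:

```c
#include <sys/select.h>

/* Returns 1 if fd can legally be stored in an fd_set, 0 otherwise.
 * Passing a descriptor numbered FD_SETSIZE or higher to FD_SET() is
 * undefined behaviour, whatever the per-process open-file limit is. */
int fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Adds fd to the set only when it fits. Returns 0 on success, -1 when
 * the descriptor number is too large for select() to monitor. */
int checked_fd_set(int fd, fd_set *set)
{
    if (!fits_in_fd_set(fd))
        return -1;
    FD_SET(fd, set);
    return 0;
}
```

A server that accepts a connection whose descriptor number fails this check has to reject it or switch to poll or epoll, which carry no such ceiling.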

As a natural progression poll was tried next to overcome max open fd issue.

poll limitation

poll solves the max fd issue. But as the number of concurrent clients started increasing, performance dropped drastically.

The poll implementation performs O(n) work internally on every call, so performance drops as the number of fds increases.

epoll

Epoll solved both problems and gave awesome performance.

Triggering modes

=============

Edge Triggered Mode
Level Triggered Mode

Epoll provides both edge-triggered and level-triggered modes.

In edge-triggered mode, a call to epoll_wait will return only when a new event is enqueued with the epoll object, while in level-triggered mode, epoll_wait will return as long as the condition holds.

For instance, if a pipe, registered with epoll, has received data, a call to epoll_wait will return, signaling the presence of data to be read.

Suppose the reader only consumed part of data from the buffer. In level-triggered mode, further calls to epoll_wait will return immediately, as long as the pipe's buffer contains data to be read.

In edge-triggered mode, however, epoll_wait will return only once new data is written to the pipe.
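The pipe scenario above can be turned into a small experiment. This is a sketch under stated assumptions: the function name second_wait_after_partial_read is hypothetical, and a zero timeout is used so that the edge-triggered case returns immediately instead of blocking.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Writes 10 bytes into a pipe, takes the first notification, reads only
 * 5 bytes, then asks epoll again with a zero timeout.
 * `extra_flags` is 0 for level-triggered or EPOLLET for edge-triggered.
 * Returns the result of the second epoll_wait(), or -1 on a setup error. */
int second_wait_after_partial_read(uint32_t extra_flags)
{
    int p[2], epfd, rc = -1;
    struct epoll_event ev, fired;
    char buf[5];

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1)
        goto close_all;

    if (write(p[1], "0123456789", 10) != 10)
        goto close_all;
    if (epoll_wait(epfd, &fired, 1, 0) != 1)      /* first notification */
        goto close_all;
    if (read(p[0], buf, sizeof(buf)) != 5)        /* consume only half */
        goto close_all;

    /* 5 bytes are still buffered: level-triggered reports them again,
     * edge-triggered stays silent until new data is written. */
    rc = epoll_wait(epfd, &fired, 1, 0);
close_all:
    close(epfd);
close_pipe:
    close(p[0]);
    close(p[1]);
    return rc;
}
```

Passing 0 yields one event on the second wait, while passing EPOLLET yields none even though half of the data is still sitting in the pipe.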

To understand this better:

When an FD becomes read or write ready, you might not necessarily want to read (or write) all the data immediately.

Level-triggered epoll will keep nagging you as long as the FD remains ready, whereas edge-triggered epoll won't bother you again until the FD becomes ready again, which in practice means after you have hit EAGAIN (so it's more complicated to code around, but can be more efficient depending on what you need to do).

Say you're writing from a resource to an FD. If you register your interest for that FD becoming write ready as level-triggered, you'll get constant notification that the FD is still ready for writing.

If the resource isn't yet available, that's a waste of a wake-up, because you can't write any more anyway.

If you were to add it as edge-triggered instead, you'd get notification that the FD was write ready once, then when the other resource becomes ready you write as much as you can.

Then if write(2) returns EAGAIN, you stop writing and wait for the next notification.
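The write-until-EAGAIN pattern described above might look roughly like the following. The helper names set_nonblocking and write_until_eagain are hypothetical, and the sketch assumes the descriptor has already been registered for EPOLLOUT with EPOLLET elsewhere.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Puts fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Writes up to `len` bytes from `data` to a non-blocking fd and stops
 * early when the kernel buffer is full. Returns the number of bytes
 * written, or -1 on an unexpected error. *would_block is set to 1 when
 * the loop stopped on EAGAIN/EWOULDBLOCK, meaning the caller should wait
 * for the next write-ready notification before continuing. */
ssize_t write_until_eagain(int fd, const char *data, size_t len,
                           int *would_block)
{
    size_t off = 0;

    *would_block = 0;
    while (off < len) {
        ssize_t n = write(fd, data + off, len - off);

        if (n > 0) {
            off += (size_t)n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;                       /* interrupted, just retry */
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            *would_block = 1;               /* buffer full, stop here */
            break;
        }
        return -1;                          /* real error or zero write */
    }
    return (ssize_t)off;
}
```

The caller would keep the unwritten tail (starting at the returned offset) and resume from there when epoll_wait() next reports the descriptor as writable.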

The same applies for reading, because you might not want to pull all the data into user space before you're ready to process it (and thus have to buffer it, and so on). With edge-triggered epoll you get told once when the FD is ready to read, and you can then remember that and do the actual reading "as and when".

EPOLL SYSTEM Calls

==================

The Epoll interface consists of three system calls:

int epoll_create(int size);

Creates an epoll object and returns its file descriptor. size is obsolete since kernel 2.6.8 but must be greater than zero for backwards compatibility.

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

Controls (configures) which file descriptors are watched by this object, and for which events. op can be EPOLL_CTL_ADD, EPOLL_CTL_MOD or EPOLL_CTL_DEL.

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

Waits for any of the events registered for with epoll_ctl, until at least one occurs or the timeout elapses. Returns the occurred events in events, up to maxevents at once.
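The three calls can be exercised together on a single pipe. The following is a minimal sketch rather than production code; the function name epoll_lifecycle_demo and the tag value 42 are made up for the example.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Exercises epoll_create(), epoll_ctl() (ADD, MOD, DEL) and epoll_wait()
 * on one pipe. Returns 0 if every step behaved as the comments say,
 * -1 otherwise. */
int epoll_lifecycle_demo(void)
{
    int p[2], epfd, rc = -1;
    struct epoll_event ev, fired;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create(1);          /* size is ignored but must be > 0 */
    if (epfd == -1)
        goto close_pipe;

    /* ADD: watch the read end for input */
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1)
        goto close_all;

    /* Nothing written yet, so a zero timeout reports no events */
    if (epoll_wait(epfd, &fired, 1, 0) != 0)
        goto close_all;

    /* After a write, the read end is reported with the data we attached */
    if (write(p[1], "x", 1) != 1)
        goto close_all;
    if (epoll_wait(epfd, &fired, 1, 0) != 1 || fired.data.fd != p[0])
        goto close_all;

    /* MOD: replace the user data attached to the descriptor */
    ev.events = EPOLLIN;
    ev.data.u64 = 42;
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == -1)
        goto close_all;
    if (epoll_wait(epfd, &fired, 1, 0) != 1 || fired.data.u64 != 42)
        goto close_all;

    /* DEL: the byte is still unread, but the fd is no longer watched */
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], NULL) == -1)
        goto close_all;
    if (epoll_wait(epfd, &fired, 1, 0) != 0)
        goto close_all;

    rc = 0;
close_all:
    close(epfd);
close_pipe:
    close(p[0]);
    close(p[1]);
    return rc;
}
```

Note that data in struct epoll_event is a union, so the descriptor and the 64-bit tag cannot be stored at the same time; a real server usually stores either the fd or a pointer to its own per-connection state.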

UDP SERVER IMPLEMENTED USING EPOLL

==========================================

#include <stdio.h>          // for printf() and fprintf()
#include <sys/socket.h>     // for socket(), bind(), and connect()
#include <arpa/inet.h>      // for sockaddr_in and inet_ntoa()
#include <stdlib.h>         // for atoi() and exit()
#include <string.h>         // for memset()
#include <unistd.h>         // for close()
#include <fcntl.h>          // for fcntl()
#include <errno.h>
#include <sys/epoll.h>

#define MAX_EVENTS 100

#define BUFFSIZE 5096

unsigned char buf[BUFFSIZE];

/*
* Dump Data
*/
void dumpData(unsigned char *data, unsigned int len)
{
unsigned int uIndx;

if(data)
    {
      for(uIndx=0; uIndx<len; ++uIndx)
        {
          if(uIndx%32 == 0)
            {
              printf("\n%4u:", uIndx);
            }
          if(uIndx%4 == 0)
            {
              printf(" ");
            }
          printf("%02x", data[uIndx]);
        }
    }
printf(" Length of Bytes: %u\n", len);
printf("\n");
}

/*
* make_socket_non_blocking :
*   This Function makes socket as Non blocking
*/
static int make_socket_non_blocking(int sockFd)
{
int getFlag, setFlag;

getFlag = fcntl(sockFd, F_GETFL, 0);

if(getFlag == -1)
{
    perror("fcntl");
    return -1;
}

/* Set the Flag as Non Blocking Socket */
getFlag |= O_NONBLOCK;

/* F_SETFL (not F_GETFL) is the command that actually applies the flags */
setFlag = fcntl(sockFd, F_SETFL, getFlag);

if(setFlag == -1)
{
    perror("fcntl");
    return -1;
}

return 0;
}

/*
* Main Routine
*/
int main()
{
int i, length, receivelen;

/* Socket Parameters */
int sockFd;
int optval = 1;   // Socket Option Always = 1

/* Server Address */
struct sockaddr_in serverAddr, receivesocket;

/* Epoll File Descriptor */
int epollFd;

/* EPOLL Event structures */
struct epoll_event ev;
struct epoll_event events[MAX_EVENTS];
int numEvents;

int ctlFd;
// Step 1: First Create UDP Socket

/* Create UDP socket
   * socket(protocol_family, sock_type, IPPROTO_UDP);
   */
sockFd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

/* Check socket is successful or not */
if (sockFd == -1)
{
    perror("socket");
    return -1;
}

// Step 2: Make Socket as Non Blocking Socket.
//         To handle multiple clients Asynchronously, required to
//         configure socket as Non Blocking socket

/* Make Socket as Non Blocking Socket */
if (make_socket_non_blocking(sockFd) == -1)
{
    return -1;
}

// Step 3: Set socket options
//    One can set different sock Options as RE-USE ADDR,
//    BROADCAST etc.

/* In this Program, the socket is set to RE-USE ADDR
   * So this gives flexibility to other sockets to BIND to the
      same port Num */

if(setsockopt(sockFd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval))== -1)
{
     perror("setsockopt");
     return -1;
}

// Step 4: Bind to the Receive socket
/* Bind to its own port Num ( Listen on Port Number) */

/* Setup the addresses */

   /* my address or Parameters
     ( These are required for Binding the Port and IP Address )
      Bind to my own port and Address */
memset(&receivesocket, 0, sizeof(receivesocket));
receivesocket.sin_family = AF_INET;
receivesocket.sin_addr.s_addr = htonl(INADDR_ANY);
receivesocket.sin_port = htons(2905);

receivelen = sizeof(receivesocket);

/* Bind the my Socket */
if (bind(sockFd, (struct sockaddr *) &receivesocket, receivelen) < 0)
{
    perror("bind");
    return -1;
}

// EPOLL Implementation Starts
// Step 5: Create Epoll Instance
             /* The size argument is ignored since kernel 2.6.8 but must be > 0 */

epollFd = epoll_create(6);

if(epollFd == -1)
{
     perror("epoll_create");
     return -1;
}

/* Add the udp Sock File Descriptor to Epoll Instance */
ev.data.fd = sockFd;

/* Events are Read Only and Edge-Triggered */
ev.events = EPOLLIN | EPOLLET;

// Step 6: control interface for an epoll descriptor
/* EPOLL_CTL_ADD
      Register the target file descriptor fd on the epoll instance
      referred to by the file descriptor epfd and
      associate the event event with the internal file linked to fd.
*/

/* Add the sock Fd to the EPOLL */
ctlFd = epoll_ctl (epollFd, EPOLL_CTL_ADD, sockFd, &ev);

if (ctlFd == -1)
{
    perror ("epoll_ctl");
    return -1;
}

// Step 7: Start the Event Loop using epoll_wait() in while Loop.

/* Event Loop */
while(1)
{
     /* Wait for events.
      * int epoll_wait(int epfd, struct epoll_event *events, int
      * maxevents, int timeout);
      * Specifying a timeout of -1 makes epoll_wait() wait
      * indefinitely.
      */

     /* epoll_wait() blocks indefinitely since the timeout is -1 */
     numEvents = epoll_wait(epollFd, events, MAX_EVENTS, -1);

     for (i = 0; i < numEvents; i++)
     {
       if ((events[i].events & EPOLLERR) ||
           (events[i].events & EPOLLHUP) ||
           (!(events[i].events & EPOLLIN)))
        {
           /* An error has occurred on this fd, or the socket is not
            * ready for reading (why were we notified then?)
            */
           fprintf (stderr, "epoll error\n");
           close (events[i].data.fd);
           continue;
        }
       /* We have data on the fd waiting to be read. Read and
        * display it. We must read whatever data is available
        * completely, as we are running in edge-triggered mode
        * and won't get a notification again for the same data.
        */
       else if ( (events[i].events & EPOLLIN) &&
           (sockFd == events[i].data.fd) )
       {
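         while (1)
         {

           memset(buf, 0, BUFFSIZE);
           /* Receive the data from the other system */
           length = recvfrom(sockFd, buf, BUFFSIZE, 0, NULL, NULL);
           if (length < 0)
            {
                if (errno == EAGAIN || errno == EWOULDBLOCK)
                  {
                    /* Socket is drained; go back to epoll_wait() */
                    break;
                  }
                if (errno == EINTR)
                  {
                    /* Interrupted by a signal; retry the read */
                    continue;
                  }
                perror("recvfrom");
                return -1;
            }

           else if(length == 0)
             {
               /* A zero-length UDP datagram is valid; keep draining */
               printf( " Received an empty datagram\n");
             }
           else
             {
               /* Print The data */
               printf("Recvd Byte length : %d", length);
               dumpData(buf, length);
             }
          }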
       }
     }
}

close( sockFd );
close( epollFd );
return 0;
}
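The edge-triggered drain loop at the heart of the server can be checked in isolation without a second machine. The sketch below is an assumption-laden test harness, not part of the original program: the function name drain_udp_loopback is hypothetical, it sends to 127.0.0.1 on a kernel-chosen port instead of port 2905, and it gives up after a one-second timeout.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends `count` small datagrams to a non-blocking UDP socket on loopback,
 * then waits for edge-triggered notifications and drains the socket until
 * EAGAIN each time. Returns the number of datagrams read, -1 on error. */
int drain_udp_loopback(int count)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct epoll_event ev, fired;
    char buf[64];
    int rx, tx, epfd, flags, i, got = 0, rc = -1;

    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    epfd = epoll_create1(0);
    if (rx == -1 || tx == -1 || epfd == -1)
        goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) == -1)
        goto done;

    flags = fcntl(rx, F_GETFL, 0);
    if (flags == -1 || fcntl(rx, F_SETFL, flags | O_NONBLOCK) == -1)
        goto done;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = rx;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, rx, &ev) == -1)
        goto done;

    for (i = 0; i < count; i++)
        if (sendto(tx, "ping", 4, 0, (struct sockaddr *)&addr,
                   sizeof(addr)) != 4)
            goto done;

    while (got < count) {
        if (epoll_wait(epfd, &fired, 1, 1000) != 1)
            break;                          /* timeout or error: give up */
        for (;;) {                          /* drain until EAGAIN */
            ssize_t n = recvfrom(rx, buf, sizeof(buf), 0, NULL, NULL);

            if (n >= 0) {
                got++;
                continue;
            }
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;
            goto done;
        }
    }
    rc = got;
done:
    if (rx != -1) close(rx);
    if (tx != -1) close(tx);
    if (epfd != -1) close(epfd);
    return rc;
}
```

Several datagrams that arrive before the first epoll_wait() produce a single edge-triggered notification, which is why the inner loop must keep reading until EAGAIN rather than reading once per event.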

==============================================================================
UDP CLIENT -> udpclient.c
==============================================================================
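#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/socket.h>     // for socket(), bind(), sendto() and setsockopt()
#include <arpa/inet.h>      // for sockaddr_in and inet_addr()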

#define BUFFSIZE 5096
#define MAX_LEN 100000

int sendlen, receivelen;
int received = 0, i,count, rcvCnt=0, sentCnt=0;
unsigned char buffer[BUFFSIZE];
struct sockaddr_in receivesocket;
struct sockaddr_in sendsocket;
int sock;
unsigned int ch;
unsigned int noOfTimes;

int sendUDPData();

int main(int argc, char *argv[]) {

    int ret = 0;
int optval = 1;

    /* Create the UDP socket */
    if ((sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP)) < 0) {
       perror("socket");
       return -1;
    }

    /* my address */
    memset(&receivesocket, 0, sizeof(receivesocket));
    receivesocket.sin_family = AF_INET;
    receivesocket.sin_addr.s_addr = htonl(INADDR_ANY);
    receivesocket.sin_port = htons(2905);

    receivelen = sizeof(receivesocket);

    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval)) == -1)
    {
       perror("setsockopt");
       return -1;
    }
    if (bind(sock, (struct sockaddr *) &receivesocket, receivelen) < 0)
    {
       perror("bind");
       return -1;
    }
    /* destination (server) address; change the IP to match your server */
    memset(&sendsocket, 0, sizeof(sendsocket));
    sendsocket.sin_family = AF_INET;
    sendsocket.sin_addr.s_addr = inet_addr("10.12.7.95");
    sendsocket.sin_port = htons(2905);

   do
    {
       printf("\n");
       printf(" Enter your choice:\t");
       printf(" 1. Send UDP Data \n");
       printf(" 2. exit \n");
       if (scanf("%u", &ch) != 1)
           break;   /* stop on EOF or non-numeric input */
       printf("\n");

       switch(ch)
       {

           case 1:
                   printf("Enter the Length of the Payload \n");
                   scanf("%d", &sendlen);
                   printf("Enter How many times you want to send data \n");
                   scanf("%u", &noOfTimes);
                   sendUDPData();
                   break;

           case 2:
                   /* Exit: the loop condition below ends the program */
                   break;

           default:
                  printf("Invalid Choice\n");
                  break;
       }
    }while(ch!=2);
return 0;
}
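int sendUDPData()
{
    unsigned int count;

    /* Reject lengths that would overflow the global buffer */
    if (sendlen < 0 || sendlen > BUFFSIZE)
    {
        fprintf(stderr, "Payload length must be between 0 and %d\n", BUFFSIZE);
        return -1;
    }

    /* Fill the payload with the character '1' (0x31) */
    memset(buffer, 0x31, sendlen);

    for (count = 0; count < noOfTimes; count++)
    {
        if (sendto(sock, buffer, sendlen, 0, (struct sockaddr *) &sendsocket,
                   sizeof(sendsocket)) != sendlen)
        {
            perror("sendto");
            return -1;
        }
    }

    printf("\n");
    return 0;
}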
