Revisiting epoll

select, poll, and epoll are three different I/O event notification mechanisms. select is the simplest, poll comes next, and epoll is a bit trickier, mainly because its edge-triggered and level-triggered modes make it easy to fall into traps.

Generally speaking, network programming is full of landmines, so don't casually write network applications directly against the socket API; it is safer to use a networking library such as ACE, Boost, etc.

Still, when something breaks or needs tuning, you ultimately have to know what is going on underneath. Of the three, epoll is the slightly more complicated one, so it is worth studying its system documentation carefully:
http://linux.die.net/man/4/epoll

Modern Linux systems provide three system calls for epoll:
epoll_create 
epoll_ctl
epoll_wait

An epoll set is connected to a file descriptor created by epoll_create(2). Interest for certain file descriptors is then registered via epoll_ctl(2). Finally, the actual wait is started by epoll_wait(2).
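A minimal sketch of how those three calls fit together (the wrapper function and the single-descriptor setup here are illustrative, not from the man page):

#include <sys/epoll.h>
#include <unistd.h>

/* Minimal sketch: create an epoll instance, register one descriptor,
 * and wait once for it to become readable. `sock` is assumed to be an
 * already-created socket descriptor. */
static int wait_for_readable(int sock)
{
    int epfd = epoll_create(1);              /* size hint; ignored since Linux 2.6.8 */
    if (epfd < 0)
        return -1;

    struct epoll_event ev = {0};
    ev.events  = EPOLLIN;                    /* interest: readability */
    ev.data.fd = sock;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev) < 0) {
        close(epfd);
        return -1;
    }

    struct epoll_event ready;
    int n = epoll_wait(epfd, &ready, 1, -1); /* block until the fd is ready */
    close(epfd);
    return n;
}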

What is the difference between edge-triggered (ET) and level-triggered (LT)? The epoll event distribution interface is able to behave both as Edge Triggered ( ET ) and Level Triggered ( LT ). The difference between ET and LT event distribution mechanism can be described as follows. Suppose that this scenario happens:
  1. The file descriptor that represents the read side of a pipe ( RFD ) is added inside the epoll device.
  2. Pipe writer writes 2Kb of data on the write side of the pipe.
  3. A call to epoll_wait(2) is done that will return RFD as ready file descriptor.
  4. The pipe reader reads 1Kb of data from RFD.
  5. A call to epoll_wait(2) is done.
If the RFD file descriptor has been added to the epoll interface using the EPOLLET flag, the call to epoll_wait(2) done in step 5 will probably hang despite the available data still present in the file input buffer; meanwhile the remote peer might be expecting a response based on the data it already sent.

In edge-triggered mode, the epoll_wait in step 5 may hang: even though data is still sitting in the input buffer, no new event is reported, while the remote peer that sent the data may be waiting for a response to it.

The reason for this is that Edge Triggered event distribution delivers events only when events happen on the monitored file descriptor. So, in step 5 the caller might end up waiting for some data that is already present inside the input buffer.

In the above example, an event on RFD will be generated because of the write done in step 2, and the event is consumed in step 3. Since the read operation done in step 4 does not consume the whole buffer data, the call to epoll_wait(2) done in step 5 might block indefinitely.

The epoll interface, when used with the EPOLLET flag ( Edge Triggered ) should use non-blocking file descriptors to avoid having a blocking read or write starve the task that is handling multiple file descriptors. 

Edge triggering must be used with non-blocking descriptors. Edge triggering is the recommended mode because it does not keep firing for the same unchanged state; it only notifies you when the state changes.
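As a sketch of what "non-blocking" means in practice (this helper is not part of the man page; the name simply matches the setnonblocking() call used in the step-2 code further down), the descriptor can be switched with fcntl(2):

#include <fcntl.h>

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
static int setnonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}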

The suggested way to use epoll as an Edge Triggered ( EPOLLET ) interface is below, and possible pitfalls to avoid follow.

Points to note:

1) with non-blocking file descriptors 
— that is, always use non-blocking file descriptors

2) by going to wait for an event only after read(2) or write(2) return EAGAIN
— that is, only go back to waiting for events after read/write has returned EAGAIN

On the contrary, when used as a Level Triggered interface, epoll is by all means a faster poll(2), and can be used wherever the latter is used since it shares the same semantics. 

Since even with the Edge Triggered epoll multiple events can be generated upon receipt of multiple chunks of data, the caller has the option to specify the EPOLLONESHOT flag, to tell epoll to disable the associated file descriptor after the receipt of an event with epoll_wait(2).

When the EPOLLONESHOT flag is specified, it is the caller's responsibility to rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.
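A sketch of that rearm step (the helper name and the EPOLLIN | EPOLLET interest mask here are just assumptions for illustration):

#include <sys/epoll.h>

/* After a one-shot event has been handled, the fd is disabled in the epoll
 * set; rearm it so epoll_wait(2) can report it again. */
static int rearm_oneshot(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events  = EPOLLIN | EPOLLET | EPOLLONESHOT;  /* same interest as when added */
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}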

The typical flow is:
1) Add the fd into the epoll watch set

int fdEpoll = epoll_create(MAX_FD_SIZE); // the size parameter is ignored since Linux 2.6.8, but must be greater than 0
if (fdEpoll < 0)
    return -1;

int sock;                                // assume sock already holds a valid socket descriptor
struct epoll_event evt;
memset(&evt, 0, sizeof(evt));
evt.events  = EPOLLIN;
evt.data.fd = sock;

int nRet = epoll_ctl(fdEpoll, EPOLL_CTL_ADD, sock, &evt);
if (nRet < 0)
    return -2;



2) Wait for the interesting events to trigger
struct epoll_event ev, *events;   /* events must point to an array of maxevents entries */

for (;;) {
    nfds = epoll_wait(kdpfd, events, maxevents, -1);

    for (n = 0; n < nfds; ++n) {
        if (events[n].data.fd == listener) {
            client = accept(listener, (struct sockaddr *) &local, &addrlen);
            if (client < 0) {
                perror("accept");
                continue;
            }
            setnonblocking(client);
            ev.events  = EPOLLIN | EPOLLET;
            ev.data.fd = client;
            if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0) {
                fprintf(stderr, "epoll set insertion error: fd=%d\n", client);
                return -1;
            }
        } else {
            do_use_fd(events[n].data.fd);
        }
    }
}



FAQ
----------------------
Q. Do I need to continuously read/write an fd until EAGAIN when using the EPOLLET flag ( Edge Triggered behaviour ) ?
A. No you don't. Receiving an event from epoll_wait(2) should suggest to you that such file descriptor is ready for the requested I/O operation. You have simply to consider it ready until you will receive the next EAGAIN. When and how you will use such file descriptor is entirely up to you. Also, the condition that the read/write I/O space is exhausted can be detected by checking the amount of data read/write from/to the target file descriptor. For example, if you call read(2) by asking to read a certain amount of data and read(2) returns a lower number of bytes, you can be sure to have exhausted the read I/O space for such file descriptor. Same is valid when writing using the write(2) function.


Caveats

--------------------------

In edge-triggered mode, the safe approach is to keep reading all the way until read returns -1 with errno == EAGAIN (everything has been read) or read returns 0 (the connection has been closed).
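A sketch of that read-until-drained loop for a non-blocking descriptor (the helper name is made up and buffer handling is simplified):

#include <errno.h>
#include <unistd.h>

/* Read everything currently available on a non-blocking fd (edge-triggered usage).
 * Returns 0 when drained, 1 when the peer closed the connection, -1 on error. */
static int drain_fd(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            /* ... consume the n bytes in buf ... */
            continue;
        }
        if (n == 0)
            return 1;                   /* peer closed the connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;                   /* drained; go back to epoll_wait */
        if (errno == EINTR)
            continue;                   /* interrupted, retry */
        return -1;                      /* real error */
    }
}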
Of course, if you know how much data is in the buffer and a read asking for a certain amount returns fewer bytes than requested, you can also conclude that the readable data has been exhausted; the amount available can be queried with ioctl's FIONREAD command.
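A sketch of querying the readable byte count with FIONREAD (the helper name is illustrative):

#include <sys/ioctl.h>

/* Returns the number of bytes currently readable on `fd`, or -1 on error. */
static int bytes_readable(int fd)
{
    int avail = 0;
    if (ioctl(fd, FIONREAD, &avail) < 0)
        return -1;
    return avail;
}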
Starting with Linux 2.6.17, EPOLLRDHUP can be used to detect an orderly shutdown of the socket by the peer; note that this is not EPOLLHUP (which indicates an abnormal hang-up).
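A sketch of using EPOLLRDHUP (the helper name is illustrative; EPOLLRDHUP requires Linux 2.6.17 or later):

#include <sys/epoll.h>

/* Register `sock` for read events plus peer-shutdown notification. */
static int add_with_rdhup(int epfd, int sock)
{
    struct epoll_event ev = {0};
    ev.events  = EPOLLIN | EPOLLET | EPOLLRDHUP;   /* EPOLLRDHUP: Linux >= 2.6.17 */
    ev.data.fd = sock;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev);
}

/* In the event loop, a returned event with EPOLLRDHUP set means the peer has
 * closed (or shut down) its writing end:
 *     if (events[n].events & EPOLLRDHUP) { ... drain remaining data, then close ... }
 */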