Nick Mathewson [Fri, 10 Feb 2012 16:24:51 +0000 (11:24 -0500)]
In the kqueue backend, do not report EBADF as an EV_READ
We were doing this because of (correct) reports that NetBSD gives an
EBADF when you try to add the write side of a pipe for which the
read side has been closed. But on most kqueue platforms, that
doesn't happen, and on *all* kqueue platforms, reporting a
nonexistent fd (which we usually have if we have seen EBADF) as
readable tends to give programs a case of the vapors.
Nicholas Marriott wrote the original patch here; I did the comment
fixes.
Nick Mathewson [Mon, 6 Feb 2012 17:24:49 +0000 (12:24 -0500)]
Loop on filtering SSL reads until we are blocked or exhausted.
This is not a perfect fix, but it's much much better than the
current buggy behavior, which could lead to filtering SSL
connections that just stopped reading.
Based on ideas by Maseeb Abdul Qadir and Mark Ellzey.
Nick Mathewson [Mon, 23 Jan 2012 22:59:16 +0000 (17:59 -0500)]
Fix a list corruption bug when using event_reinit() with signals present
While re-adding all the events, event_reinit() could add a signal
event, which could then cause evsig_add() to add the
base->sig.ev_signal event. Later on its merry path through
base->eventqueue, event_reinit() would find that same event and give
it to event_io_add a second time. This would make the ev_io_next
list for that fd become circular. Ouch!
Nick Mathewson [Mon, 9 Jan 2012 21:44:53 +0000 (16:44 -0500)]
Fix a race condition in the dns/bufferevent_connect_hostname test.
As originally written, the test would only pass if the accept()
callbacks for the evconnlistener were all invoked before the last of
the CONNECTED/ERROR callbacks for the connecting/resolving bufferevent
had its call to event_base_loopexit() complete. But this was only
accidentally true in 2.0, and might not be true at all in 2.1 where
we schedule event_base_once() callbacks more aggressively.
Nick Mathewson [Mon, 9 Jan 2012 16:49:41 +0000 (11:49 -0500)]
Make evconnlistener work around bug in older Linux when getting nmapped
Older Linuxes sometimes respond to some nmap probes by having accept()
return a success but with socklen 0. That can lead to confusing behavior
when you go to process the sockaddr.
Nick Mathewson [Mon, 5 Dec 2011 20:02:27 +0000 (15:02 -0500)]
Be absolutely sure to clear pncalls before leaving event_signal_closure
I thought we'd fixed the cases where this could come up, but
apparently having an event_base_break() happen while processing
signal events could get us in trouble.
Found by Remi Gacogne. Sourceforge issue 3451433 .
Mark Ellzey [Thu, 17 Nov 2011 16:59:41 +0000 (11:59 -0500)]
Avoid spinning on OpenSSL reads
Previously, if some sender were generating data to read on an
OpenSSL connection as fast as we could process it, we could easily
wind up looping on an openssl do_read operation without ever
considering other sockets.
The difference between this and the original method in
consider_reading() is that it only loops for a single completed
*frame* instead of looping until fd is drained or an error condition
was triggered.
Mark Ellzey [Mon, 14 Nov 2011 15:24:07 +0000 (10:24 -0500)]
Avoid potential SSL read spinlocks
OpenSSL bufferevents with deferred callbacks enabled under high load will
spinlock in the function consider_reading(). This loop continues until all
data has been read.
Because of this condition; openssl bufferevents will never return back into
event_base_loop() until SSL_read has determined data is no longer ready.
As of yet I have not found a reason why this while loop exists, so this patch
just swaps out while for if.
If needed I can write same code which would trigger this effect; optionally
libevhtp has a test.c program which can be run with the following flags:
The return data will include the number of times the readcb got data and the
length of that read.
Without this patch, you are likely to see a small amount of "bytes read....",
otherwise the "bytes read..." return data should show much more reasonable
numbers.
Leonid Evdokimov [Wed, 19 Oct 2011 18:38:37 +0000 (22:38 +0400)]
Empty DNS reply with OK status is another way to say NODATA.
Sometimes DNS reply has nothing but query section. It does not look like
error, so it should be treated as NODATA with TTL=0 as soon as there is
no SOA record to deduce negative TTL from.
Nick Mathewson [Thu, 29 Sep 2011 13:30:04 +0000 (09:30 -0400)]
Prefer mmap to sendfile unless a DRAINS_TO_FD flag is set. Allows add_file to work with SSL.
The sendfile() implementation for evbuffer_add_file is potentially more
efficient, but it has a problem: you can only use it to send bytes over
a socket using sendfile(). If you are writing bytes via SSL_send() or
via a filter, or if you need to be able to inspect your buffer, it
doesn't work.
As an easy fix, this patch disables the sendfile-based implementation of
evbuffer_add_file on an evbuffer unless the user sets a new
EVBUFFER_FLAG_DRAINS_TO_FD flag on that evbuffer, indicating that the
evbuffer will not be inspected, but only written out via
evbuffer_write(), evbuffer_write_atmost(), or drained with stuff like
evbuffer_drain() or evbuffer_add_buffer(). This flag is off by
default, except for evbuffers used for output on bufferevent_socket.
In the future, it could be interesting to make a best-effort file
segment implementation that tries to send via sendfile, but mmaps on
demand. That's too much complexity for a stable release series, though.
Nick Mathewson [Wed, 24 Aug 2011 22:41:35 +0000 (18:41 -0400)]
Make IOCP rate-limiting group support stricter and less surprising.
Previously, we wouldn't decrement read/write buckets because of IOCP
reads and writes until those reads and writes were complete. That's
not so bad on the per-connection front. But for group limits, the
old approach makes us launch a huge amount of reads and writes
whenever the group limit becomes positive, and then decrement the
limit to a hugely negative number as they complete.
With this patch, we decrement our read buckets whenever we launch an
IOCP read or write, based on the maximum that tried to read or
write. Later, when the operations finish, we re-increment the
bucket based on the portion of the request that couldn't finish.