In dnsdist, we try to reuse TCP connections to downstream servers
as much as possible. However, when sending the size of a new
query, we didn't properly handle a connection being closed by the
downstream server.
Turns out, writing tests actually helps find bugs, who
would have thought?
downstream_failures++;
goto retry;
}
-
- writen2WithTimeout(dsock, query, qlen, ds->tcpSendTimeout);
-
+
+ try {
+ writen2WithTimeout(dsock, query, qlen, ds->tcpSendTimeout);
+ }
+ catch(const runtime_error& e) {
+ vinfolog("Downstream connection to %s died on us, getting a new one!", ds->getName());
+ close(dsock);
+ sockets[ds->remote]=dsock=setupTCPDownstream(ds->remote);
+ downstream_failures++;
+ goto retry;
+ }
+
if(!getNonBlockingMsgLen(dsock, &rlen, ds->tcpRecvTimeout)) {
vinfolog("Downstream connection to %s died on us phase 2, getting a new one!", ds->getName());
close(dsock);