Post by Brian Utterback
Assuming you could find a way to dump the array, doesn't this just give
you a list of ports whose connections are currently in CLOSE_WAIT?
Wouldn't netstat give you the same info?
Instead of setting the array value to 1, you could set it to the value
of walltimestamp. That way when you dumped it out, you would have the
time it went into CLOSE_WAIT, which would give you an indication of
which ones were in the state the longest. I wonder if you could get an
aggregation to work here? Hmm.
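If the system's DTrace tcp provider is available, something along those
lines might look like the sketch below. (The tcp:::state-change probe,
the TCP_STATE_CLOSE_WAIT constant, and the tcps_* members are all
assumptions here; the provider isn't present on every release.)

/* Minimal sketch: stamp each local port as it enters CLOSE_WAIT,
 * and keep a per-port count as an aggregation. */
tcp:::state-change
/args[3]->tcps_state == TCP_STATE_CLOSE_WAIT/
{
        /* walltimestamp records when the transition happened, so a
         * later dump shows which ports have been stuck the longest */
        closewait[args[3]->tcps_lport] = walltimestamp;
        @byport[args[3]->tcps_lport] = count();
}

That answers the aggregation question at least in part: @byport prints
automatically when dtrace exits, though dumping the closewait array
itself still needs an explicit clause, since D has no way to iterate an
associative array.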
In about 99 and 44/100ths percent of the cases I've looked at in the
past, what appears to be a "leak" is actually something exacerbated by
the OS.
What I usually see is that the application opens a socket (via socket()
or accept()), does some work, and then closes the socket normally.
Unbeknownst to the application, part of that "work" involved a fork(),
perhaps buried in a library somewhere. (The free fork() given out to
users of syslog() employing LOG_CONS was once a possible cause, but
there are others.)
The fork() logic duplicates all of the open file descriptors, and the
code calling fork() in this case doesn't "know" that there are
descriptors it shouldn't be copying, so it can't easily close them
afterwards. It's the new process -- possibly completely unknown to the
main application -- that's still holding the socket open; once the
remote end closes its side, the socket drops into CLOSE_WAIT and stays
there until that last stray descriptor goes away.
For that reason, I think any CLOSE_WAIT diagnostic function should at
least track the fork() descriptor duplication and allow you to trace
back to the application that "leaked" descriptors by way of creating new
processes.
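A crude starting point, assuming Solaris where fork() enters the kernel
as fork1() (the probe name and the parent-side return convention vary
by release, so treat both as assumptions):

/* Sketch: log each fork and the user stack that requested it, so a
 * descriptor stranded in CLOSE_WAIT can be traced back to the code
 * path that created the inheriting child.  In the parent, arg0 is
 * the new child's pid; the child-side firing returns 0. */
syscall::fork1:return
/arg0 > 0/
{
        printf("%s (pid %d) forked child %d\n", execname, pid, arg0);
        ustack();
}

Matching the child pids reported here against ports later found stuck
in CLOSE_WAIT would still be a manual step, but even the stack alone
usually points at the library that buried the fork().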
(Would be nice to have something like z/OS's FCTLCLOFORK or the
sometimes-discussed Linux FD_DONTINHERIT flag.)
--
James Carlson 42.703N 71.076W <carlsonj-dlRbGz2WjHhmlEb+***@public.gmane.org>