Critical: File descriptor and goroutine leak in HandleConnection causing "too many open files" under load #56

Closed
opened 2025-12-31 15:22:21 +00:00 by bagas · 0 comments
Owner

Problem

HandleConnection leaks file descriptors and goroutines, causing "too many open files" errors under stress testing.

Root Cause

The function only waits for ONE goroutine to finish instead of both:

<-done  // Only waits for first goroutine
// defer cleanup runs while second goroutine still active

This leaves the second goroutine running and prevents proper cleanup of SSH channels and file descriptors.
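
The pattern described above can be reproduced in isolation. The following is a minimal sketch, not the project's actual code: `leakyRelay` and `stragglers` are hypothetical names, `net.Pipe` stands in for the real SSH channel and TCP socket, and the counter exists only to make the leftover goroutine visible.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync/atomic"
)

// leakyRelay copies in both directions but, like the buggy code, waits for
// only ONE of the two copy goroutines. It returns how many copy goroutines
// are still running at the moment it returns.
func leakyRelay(a, b io.ReadWriter) int32 {
	var active atomic.Int32
	active.Add(2)
	done := make(chan struct{}, 2)

	go func() {
		io.Copy(a, b) // b -> a
		active.Add(-1)
		done <- struct{}{}
	}()
	go func() {
		io.Copy(b, a) // a -> b
		active.Add(-1)
		done <- struct{}{}
	}()

	<-done // only waits for the first goroutine to finish
	return active.Load()
}

// stragglers simulates a client that hangs up while the server side stays
// open, then reports how many goroutines leakyRelay left behind.
func stragglers() int32 {
	clientEnd, relayA := net.Pipe()
	relayB, serverEnd := net.Pipe()
	defer serverEnd.Close() // tidy up the demo after measuring

	clientEnd.Close() // a -> b copy sees EOF and exits; b -> a stays blocked
	return leakyRelay(relayA, relayB)
}

func main() {
	fmt.Println("copy goroutines still running:", stragglers())
}
```

In this sketch the remaining goroutine is blocked in a read on the side that never closed, which is the state in which it keeps its channel and file descriptor alive.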

Evidence

Debug logs show channels accumulating, with buffered data stuck in a half-closed state:

#1011 35.---.---.--- (t4 [forwarded-tcpip] r1003 i3/0 o0/0 e[closed]/0 fd 1017/1017/-1 sock 1017 cc -1 io 0x00/0x00)
#1012 35.---.---.--- (t4 [forwarded-tcpip] r993 i3/0 o0/0 e[closed]/0 fd 1018/1018/-1 sock 1018 cc -1 io 0x00/0x00)
#1013 35.---.---.--- (t4 [forwarded-tcpip] r1012 i3/0 o0/0 e[closed]/0 fd 1019/1019/-1 sock 1019 cc -1 io 0x00/0x00)
#1014 35.---.---.--- (t4 [forwarded-tcpip] r998 i3/0 o0/0 e[closed]/0 fd 1020/1020/-1 sock 1020 cc -1 io 0x00/0x00)
#1015 35.---.---.--- (t4 [forwarded-tcpip] r1015 i3/0 o0/0 e[closed]/0 fd 1021/1021/-1 sock 1021 cc -1 io 0x00/0x00)
#1016 35.---.---.--- (t4 [forwarded-tcpip] r1016 i3/0 o0/0 e[closed]/0 fd 1022/1022/-1 sock 1022 cc -1 io 0x00/0x00)
#1017 35.---.---.--- (t4 [forwarded-tcpip] r1017 i3/0 o0/0 e[closed]/0 fd 1023/1023/-1 sock 1023 cc -1 io 0x00/0x00)

File descriptors reached 1023+ before the process was killed.

Fix

Wait for both goroutines:
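
A minimal sketch of one way to do this, assuming the two goroutines are `io.Copy` loops between the SSH channel and the forwarded socket. The names `relay` and `demo` are hypothetical, `net.Pipe` stands in for the real connections, and closing both sides after the first direction finishes is an assumption of this sketch (it gives up half-close semantics in exchange for guaranteed cleanup), not necessarily what the project does.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// relay copies in both directions and returns only after BOTH copy
// goroutines have exited.
func relay(a, b io.ReadWriteCloser) {
	done := make(chan struct{}, 2) // buffered so neither sender can block

	go func() {
		io.Copy(a, b) // b -> a
		done <- struct{}{}
	}()
	go func() {
		io.Copy(b, a) // a -> b
		done <- struct{}{}
	}()

	<-done // first direction finished
	// Closing both ends unblocks the goroutine still waiting in Read.
	a.Close()
	b.Close()
	<-done // second direction finished; nothing is left running
}

// demo sends one message through relay, hangs up the client side, and
// reports the payload seen by the server and whether relay returned.
func demo() (string, bool) {
	clientEnd, relayA := net.Pipe()
	relayB, serverEnd := net.Pipe()

	finished := make(chan struct{})
	go func() {
		relay(relayA, relayB)
		close(finished)
	}()
	go func() {
		clientEnd.Write([]byte("ping"))
		clientEnd.Close()
	}()

	buf := make([]byte, 4)
	if _, err := io.ReadFull(serverEnd, buf); err != nil {
		return "", false
	}
	select {
	case <-finished:
		return string(buf), true
	case <-time.After(2 * time.Second):
		return string(buf), false
	}
}

func main() {
	payload, ok := demo()
	fmt.Println(payload, ok)
}
```

A `sync.WaitGroup` with `wg.Add(2)` and `wg.Wait()` would work equally well in place of the second `<-done`; the important part is that any deferred cleanup runs only after both goroutines have returned.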

Impact

  • Severity: Critical - causes production crashes
  • All forwarded-tcpip connections affected
bagas self-assigned this 2025-12-31 15:22:21 +00:00
bagas closed this issue 2025-12-31 17:22:55 +00:00
Reference: bagas/tunnel-please#56