Improve udp tunnel stability #406
}
}

func (t *Tunnel) getSSHNoWait(ctx context.Context) ssh.Conn {
Isn't getSSH equivalent when c != nil?

Yes, you are right.
Removing the wait may also cause high CPU usage, so I will revert this change.
return u.Errorf("inbound-udpchan: %w", err)
u.Errorf("inbound-udpchan: %w", err)
// wait for a short while for the ssh connection
time.Sleep(time.Duration(100) * time.Millisecond)
Instead of sleeping, can we use channels to know when the SSH connection is ready?

The sleep was added because, when the connection is lost, getSSH() returns nil immediately, and the runOutbound() loop then takes up a lot of memory and causes the process to be killed.

Ohh, the loop spins? I haven’t seen that happen before, though that sounds like a bug, where the real fix might be to block with a waitgroup.

After the connection is lost, activatingConnWait() cannot block, because t.activatingConn.DoneAll() has been called.
Removing DoneAll() fixes the issue.
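For reference, the "use channels instead of sleeping" idea from this thread could look roughly like the sketch below. It is not code from this PR: `readySignal`, `set`, `wait`, and `runDemo` are hypothetical names, and a plain string stands in for `ssh.Conn` so the example has no external dependencies. The waiter blocks on a channel that is closed once a connection is available, and it also gives up when the context is done, so it neither spins nor sleeps.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// readySignal is a hypothetical helper (not from the PR). A string stands in
// for ssh.Conn to keep the sketch dependency-free.
type readySignal struct {
	mu    sync.Mutex
	ready chan struct{} // closed once conn has been set
	conn  string
	done  bool
}

func newReadySignal() *readySignal {
	return &readySignal{ready: make(chan struct{})}
}

// set stores the connection and wakes every waiter; later calls are ignored.
func (r *readySignal) set(conn string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.done {
		return
	}
	r.conn, r.done = conn, true
	close(r.ready)
}

// wait blocks until set has been called or ctx is done.
func (r *readySignal) wait(ctx context.Context) (string, bool) {
	select {
	case <-r.ready:
		r.mu.Lock()
		defer r.mu.Unlock()
		return r.conn, true
	case <-ctx.Done():
		return "", false
	}
}

// runDemo sets the connection from another goroutine after a short delay
// while the caller blocks in wait.
func runDemo() (string, bool) {
	r := newReadySignal()
	go func() {
		time.Sleep(10 * time.Millisecond)
		r.set("conn-1")
	}()
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return r.wait(ctx)
}

func main() {
	conn, ok := runDemo()
	fmt.Println(conn, ok)
}
```

This one-shot version cannot block again after a disconnect; that would need the channel to be replaced when the connection is lost.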
}

func (t *Tunnel) Close() {
	t.activatingConn.DoneAll()
Does this need a lock/unlock and a nil check?

activatingConn is a waitGroup. It is a value type and thread-safe, so there is no need for a lock or a nil check.

Ah yeah, that's right, it's a custom wait group (which might be a bad idea in itself, but anyway).
It should block while disconnected/connecting, and it should not block while connected.
DoneAll here would stop the blocking, but Tunnel Done means closed?
How does this improve UDP? Sorry, I think I'm missing something.

In the original code, when the connection is lost, DoneAll is called in the BindSSH method, after which activatingConnWait() can no longer block.
This leads to high resource usage by runOutbound in udpListener: calling u.getUDPChan(ctx) immediately returns nil instead of blocking.
Close was added to release resources correctly in server mode, because DoneAll was removed from BindSSH.
In client mode, there is no need to call DoneAll because the same tunnel is always used.
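To make the intended behaviour concrete (block while disconnected or connecting, do not block while connected, and block again after the connection is lost), here is a minimal sketch of a re-armable gate. It is an illustration only and not the project's custom wait group: `connGate`, `Open`, `Reset`, `Wait`, `tryWait`, and `gateDemo` are hypothetical names, and the 20 ms timeout is an arbitrary value chosen for the demo.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// connGate is a hypothetical stand-in for the custom wait group discussed
// above. Wait blocks while the gate is closed and returns at once when open.
type connGate struct {
	mu     sync.Mutex
	ch     chan struct{} // closed while the gate is open
	isOpen bool
}

func newConnGate() *connGate {
	return &connGate{ch: make(chan struct{})}
}

// Open marks the connection as ready and releases all current waiters.
func (g *connGate) Open() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !g.isOpen {
		g.isOpen = true
		close(g.ch)
	}
}

// Reset marks the connection as lost so that later Wait calls block again.
func (g *connGate) Reset() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.isOpen {
		g.isOpen = false
		g.ch = make(chan struct{})
	}
}

// Wait blocks until the gate is open or ctx is done. It reports whether the
// gate was open.
func (g *connGate) Wait(ctx context.Context) bool {
	g.mu.Lock()
	ch := g.ch
	g.mu.Unlock()
	select {
	case <-ch:
		return true
	case <-ctx.Done():
		return false
	}
}

// tryWait waits on the gate with a short timeout (arbitrary demo value).
func tryWait(g *connGate) bool {
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	return g.Wait(ctx)
}

// gateDemo walks through disconnected, connected, and disconnected again.
func gateDemo() [3]bool {
	g := newConnGate()
	var out [3]bool
	out[0] = tryWait(g) // disconnected: the wait times out
	g.Open()
	out[1] = tryWait(g) // connected: the wait returns immediately
	g.Reset()
	out[2] = tryWait(g) // connection lost: the wait blocks again
	return out
}

func main() {
	fmt.Println(gateDemo())
}
```

With a gate like this, a loop such as runOutbound would park in `Wait` during a disconnect rather than repeatedly receiving nil, which is the behaviour the removal of DoneAll from BindSSH is meant to restore.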
#405