Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cbtls_info() to report "Need to write more data" when called on exit … #5342

Open
wants to merge 1 commit into
base: v3.2.x
Choose a base branch
from

Conversation

martinsta
Copy link

…of a handshake function

The current code does seem to miss to cater for the cases when a TLS handshake exits and there is data left that needs either to be extracted from the BIO and/or sent to the other side. Differentiating and logging such cases within cbtls_info() might aid in further debugging any stalled handshakes followed up by timing out requests we experience with the up to date 3.2.x head.

…of a handshake function

The current code does seem to miss to cater for the cases when a TLS handshake exits and there is data left that needs either to be extracted from the BIO and/or sent to the other side. Differentiating and logging such cases within cbtls_info() might aid in further debugging any stalled handshakes followed up by timing out requests we experience with the up to date 3.2.x head.
@alandekok
Copy link
Member

It may also be useful to check for SSL_want_write() in src/main/tls_listen.c A sample patch is attached.

It may also be good to apply a similar patch to proxy_tls_read() or `proxy_tls_send()

patch.gz

alandekok added a commit that referenced this pull request Jun 18, 2024
@alandekok
Copy link
Member

Perhaps this patch might help, too? I haven't tried it though.

patch.gz

@martinsta
Copy link
Author

Thanks for having a look at this. I don't seem to get more info with patches applied but the server seems to stall some time after attempting a RadSec connection to a home_server. I'm providing more info from the previous non-patched run. This is an "SSL_connect() returned WANT_WRITE" with "(TLS) has 0 bytes in the buffer" situation. TLS handshake magic?

AFAIU, for a BIO_read() to be called from tls_handshake_send() there currently have to be some un-encrypted data in 'clean_in'. The condition in tls_handshake_send() looks like this:
if (ssn->clean_in.used > 0) {

excerpt from a debug log follows:

...
Tue Jun 18 10:40:13 2024 : Debug: (41) proxy: Trying to allocate ID (0/2)
Tue Jun 18 10:40:13 2024 : Debug: (41) proxy: Trying to open a new listener to the home server
Tue Jun 18 10:40:13 2024 : Debug: (TLS) Trying new outgoing proxy connection to proxy (0.0.0.0, 0) -> home_server (a.b.c.d, 2083)
Tue Jun 18 10:40:13 2024 : Debug: Requiring Server certificate
Tue Jun 18 10:40:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Handshake state [PINIT] - before SSL initialization (0)
Tue Jun 18 10:40:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Handshake state [PINIT] - Client before SSL initialization (0)
Tue Jun 18 10:40:13 2024 : Debug: (0) (TLS) RADIUS/TLS - send TLS 1.2 Handshake, ClientHello
Tue Jun 18 10:40:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Client : Need to write more data: SSLv3/TLS write client hello
Tue Jun 18 10:40:13 2024 : Debug: Opened new proxy socket 'proxy (e.f.g.h, 60419) -> home_server (a.b.c.d, 2083)'
Tue Jun 18 10:40:13 2024 : Debug: (41) proxy: Trying to allocate ID (1/2)
Tue Jun 18 10:40:13 2024 : Debug: (41) proxy: request is now in proxy hash
Tue Jun 18 10:40:13 2024 : Debug: (41) proxy: allocating destination a.b.c.d port 2083 - Id 70
Tue Jun 18 10:40:13 2024 : Debug: (41) Proxying request to home server a.b.c.d port 2083 (TLS) timeout 30.000000
Tue Jun 18 10:40:13 2024 : Debug: (41) Sent Access-Request Id 70 from e.f.g.h:60419 to a.b.c.d:2083 length 1988
...
Tue Jun 18 10:40:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Client : Need to write more data: SSLv3/TLS write client hello
Tue Jun 18 10:40:13 2024 : Debug: (TLS) SSL_connect() returned WANT_WRITE
Tue Jun 18 10:40:13 2024 : Debug: (41) (TLS) has 0 bytes in the buffer
Tue Jun 18 10:40:13 2024 : Debug: (41) (TLS) Saving 1988 bytes of RADIUS traffic for later (total 1988)
Tue Jun 18 10:40:13 2024 : Debug: Thread 3 waiting to be assigned a request
Tue Jun 18 10:40:13 2024 : Debug: Waking up in 0.3 seconds.
Tue Jun 18 10:40:13 2024 : Debug: Listening on proxy (e.f.g.h, 60419) -> home_server (a.b.c.d, 2083)
Tue Jun 18 10:40:13 2024 : Debug: Waking up in 0.3 seconds.
Tue Jun 18 10:40:13 2024 : Debug: (41) Expecting proxy response no later than 29.670994 seconds from now
Tue Jun 18 10:40:43 2024 : Debug: (41) No proxy response, giving up on request and marking it done
Tue Jun 18 10:40:43 2024 : Proxy: Marking home server a.b.c.d port 2083 as zombie (it has not responded in 30.000000 seconds).
Tue Jun 18 10:40:43 2024 : ERROR: (41) Failing proxied request for user "test@exampe.com", due to lack of any response from home server a.b.c.d port 2083
...

@alandekok
Copy link
Member

alandekok commented Jun 19, 2024 via email

@martinsta
Copy link
Author

On Wed, Jun 19, 2024 at 10:57:53AM -0700, Alan DeKok wrote:
Yes. There appears to be data in SSL which we need to write out.

I've attached a patch from the current v3.2.x branch. It should replace the previous patch for src/main/tls_listen.c

I hope this works. Looking at the code and OpenSSL, it should be the correct fix,

Thanks, I'd be happy to test the new patch. I'm afraid the attachment's gone astray.

@alandekok
Copy link
Member

This patch should work.

patch.gz

@martinsta
Copy link
Author

Thanks for another round. With the patch applied and BIO_read() getting called the server reports "Debug: (TLS) SSL_connect() writing 0 bytes to the network".

What's that? SSL_connect() left in handshake, SSL_get_error() returns WANT_WRITE and there is no data available to read from BIO? Does it need to retry? Or is this a different king of issue?

30 seconds later the original request timeouts, gets rejected, the server then logs "Debug: Proxy SSL socket has data to read" (possibly also in relation to another request) and locks up.

excerpt from a debug log follows:

Wed Jun 19 23:12:13 2024 : Debug: (TLS) Trying new outgoing proxy connection to proxy (0.0.0.0, 0) -> home_server (a.b.c.d, 2083)
Wed Jun 19 23:12:13 2024 : Debug: Requiring Server certificate
Wed Jun 19 23:12:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Handshake state [PINIT] - before SSL initialization (0)
Wed Jun 19 23:12:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Handshake state [PINIT] - Client before SSL initialization (0)
Wed Jun 19 23:12:13 2024 : Debug: (0) (TLS) RADIUS/TLS - send TLS 1.2 Handshake, ClientHello
Wed Jun 19 23:12:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Client : Need to write more data: SSLv3/TLS write client hello
Wed Jun 19 23:12:13 2024 : Debug: Opened new proxy socket 'proxy (e.f.g.h, 36579) -> home_server (a.b.c.d, 2083)'
Wed Jun 19 23:12:13 2024 : Debug: (22) proxy: Trying to allocate ID (1/2)
Wed Jun 19 23:12:13 2024 : Debug: (22) proxy: request is now in proxy hash
Wed Jun 19 23:12:13 2024 : Debug: (22) proxy: allocating destination a.b.c.d port 2083 - Id 54
Wed Jun 19 23:12:13 2024 : Debug: (22) Proxying request to home server a.b.c.d port 2083 (TLS) timeout 30.000000
Wed Jun 19 23:12:13 2024 : Debug: (22) Sent Access-Request Id 54 from e.f.g.h:36579 to a.b.c.d:2083 length 1988
...
Wed Jun 19 23:12:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Client : Need to write more data: SSLv3/TLS write client hello
Wed Jun 19 23:12:13 2024 : Debug: (TLS) SSL_connect() returned WANT_WRITE
Wed Jun 19 23:12:13 2024 : Debug: (TLS) SSL_connect() writing 0 bytes to the network
Wed Jun 19 23:12:13 2024 : Debug: (22) (TLS) has 0 bytes in the buffer
Wed Jun 19 23:12:13 2024 : Debug: (22) (TLS) Saving 1988 bytes of RADIUS traffic for later (total 1988)
Wed Jun 19 23:12:13 2024 : Debug: Thread 2 waiting to be assigned a request
Wed Jun 19 23:12:13 2024 : Debug: Listening on proxy (e.f.g.h, 36579) -> home_server (a.b.c.d, 2083)
Wed Jun 19 23:12:13 2024 : Debug: (22) Expecting proxy response no later than 29.671495 seconds from now
Wed Jun 19 23:12:43 2024 : Debug: (22) No proxy response, giving up on request and marking it done
Wed Jun 19 23:12:43 2024 : Proxy: Marking home server a.b.c.d port 2083 as zombie (it has not responded in 30.000000 seconds).
Wed Jun 19 23:12:43 2024 : ERROR: (22) Failing proxied request for user "test@example.com", due to lack of any response from home server a.b.c.d port 2083
...
Wed Jun 19 23:12:43 2024 : Debug: Proxy SSL socket has data to read

@alandekok
Copy link
Member

alandekok commented Jun 20, 2024 via email

martinsta added a commit to martinsta/freeradius-server that referenced this pull request Jul 19, 2024
…sync parameter

fd is currently missing O_NONBLOCK property that should be set via nested fr_socket_client_tcp() ->  fr_nonblock() -> fcntl() calls.

this will fix the issue FreeRADIUS#5342 

FreeRADIUS#5342
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants