Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getaddrinfo: improve the service/port resolution #524

Merged
merged 1 commit into from
Aug 28, 2024

Conversation

Henkru
Copy link
Contributor

@Henkru Henkru commented Aug 3, 2024

Hello,

While experimenting with the wasm32-wasip2 target and CPython, I discovered an issue with the getaddrinfo() implementation: it fails to resolve the provided service into a port number, causing sin_port to always be set to 0. This issue leads to failures in network-related functions that rely on getaddrinfo(), such as Python's urllib3 library, which passes the result directly to connect(). This results in connection attempts using a port value of 0, which naturally fails.

Minimal example to reproduce the problem

#include <arpa/inet.h>
#include <netdb.h>
#include <stdio.h>

int main(void) {
    struct addrinfo *res = NULL;
    getaddrinfo("google.com", "443", NULL, &res);

    for (struct addrinfo *i = res; i != NULL; i = i->ai_next) {
        char str[INET6_ADDRSTRLEN];
        if (i->ai_addr->sa_family == AF_INET) {
            struct sockaddr_in *p = (struct sockaddr_in *)i->ai_addr;
            int port = ntohs(p->sin_port);
            printf("%s: %i\n", inet_ntop(AF_INET, &p->sin_addr, str, sizeof(str)), port);
        } else if (i->ai_addr->sa_family == AF_INET6) {
            struct sockaddr_in6 *p = (struct sockaddr_in6 *)i->ai_addr;
            int port = ntohs(p->sin6_port);
            printf("%s: %i\n", inet_ntop(AF_INET6, &p->sin6_addr, str, sizeof(str)), port);
        }
    }

    return 0;
}
$ /opt/wasi-sdk/bin/clang -target wasm32-wasip2 -o foo foo.c
$ wasmtime run -S allow-ip-name-lookup=y foo
216.58.211.238: 0
2a00:1450:4026:808::200e: 0

Expected output:

216.58.211.238: 443
2a00:1450:4026:808::200e: 443

Root Cause

The root cause is that getaddrinfo() does not correctly translate the provided service into a port number. As described in the getaddrinfo() man page, the function should:

service sets the port in each returned address structure. If
this argument is a service name (see services(5)), it is
translated to the corresponding port number. This argument can
also be specified as a decimal number, which is simply converted
to binary. If service is NULL, then the port number of the
returned socket addresses will be left uninitialized.

Proposed Fix

This pull request addresses the issue by implementing the following behavior for getaddrinfo():

  • If the service is NULL, the port number in the returned socket addresses remains uninitialized.
  • The value is converted to an integer and validated if the service is numeric.

The PR does not currently add support for translating named services into port numbers because getservbyname() has not been implemented. In cases where a named service is provided, the EAI_NONAME error code is returned.

Previously, the `getaddrinfo` function did not resolve service names into port
numbers, resulting in `sin_port` and `sin6_port` being hard-coded to `0`.

This commit adds support for `getaddrinfo` to parse the service string as
a port number if it is provided in numeric form. However, this commit does not
include support for resolving service names into port numbers, as the
getservbyname() function has not been implemented (yet). In this case
the `EAI_NONAME` error is returned.
Copy link
Collaborator

@abrown abrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me but can @dicej or @badeend confirm this works as intended? It fixes what may be a gap in #488.

@dicej
Copy link
Contributor

dicej commented Aug 28, 2024

LGTM. I only implemented enough of getaddrinfo to get the relevant CPython asyncio tests passing, plus some basic Rust and C tests. I'm not surprised there were (and probably still are) missing pieces.

Thanks, @Henkru!

@abrown abrown merged commit 1b19fc6 into WebAssembly:main Aug 28, 2024
8 checks passed
@Henkru
Copy link
Contributor Author

Henkru commented Aug 28, 2024

Thanks @abrown and @dicej for checking it out.

I'm also considering to implement the getservbyname() function to support service name resolution. Typically, this function reads the service-to-port mapping from the /etc/services file, which may not be feasible in this context. One possible approach is to include the mapping as a static array within the library (Wasmer has also adopted that method).
However, based on the services file on my system, here's my rough estimate for the size of the lookup table:

Number of unique service names: 4968
Total bytes required for the names: ~50kB
Total bytes needed to encode the port and protocol pair (fits into one `uint32_t`): ~20kB

A 70 kB increase feels heavy for me. While compression could reduce this size, it would add complexity when updating the table and obscure its contents.

Alternatively, the implementation could attempt to read the services file, and the runtime either maps it or not. However, this approach doesn't seem like an ideal solution either.

Any thoughts?

@badeend
Copy link
Contributor

badeend commented Aug 28, 2024

I'd start out with a static array containing a dozen-or-so of the most commonly used protocols (http, ftp, ssh, etc) as the baseline.
And maybe in the future add a build flag to embed the full 70kB mapping file.

@badeend
Copy link
Contributor

badeend commented Aug 28, 2024

If you want an arbitrary sample, I did look into this a while ago and landed on this shortlist:

IANA service name Port Remarks
http 80
https 443
domain 53 DNS
ftp 21
ftp-data 20
ftps 990
ftps-data 989
smtp 25
submissions 465 SMTP (Implicit TLS)
submission 587 SMTP
pop3 110
pop3s 995 POP3 (Implicit TLS)
imap 143
imaps 993 IMAP (Implicit TLS)
ssh 22
telnet 23
ntp 123

@dicej
Copy link
Contributor

dicej commented Aug 28, 2024

Note that wasi-libc currently splits numerical parsing and stringification code across multiple libraries, using weak symbols to allow application code to pick which implementation they want at link time. We could do the same thing for service mapping, which would save the developer from having to build their own custom wasi-libc from source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants