Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Meta] HTTP interfaces rework. #48

Open
azjezz opened this issue Jul 6, 2021 · 29 comments
Open

[Meta] HTTP interfaces rework. #48

azjezz opened this issue Jul 6, 2021 · 29 comments

Comments

@azjezz
Copy link

azjezz commented Jul 6, 2021

This is a meta issue to track the rework that should be done for these interfaces and the problems they have;

@fredemmott
Copy link
Contributor

Similarly to Env, we're likely to want to revisit this consistently as part of changes to entrypoint APIs as a whole - CLI, http, xbox etc.

This repository is likely to be a good place for experimenting with the http-specific parts of any proposal for that, prior to making them actually part of how entrypoints work

@azjezz
Copy link
Author

azjezz commented Jul 6, 2021

Things i have encountered in the past few days, which should be fixed in the rework of HTTP interfaces.

  1. an outgoing response should have a read+write body.
  2. an incoming response should have a read only body, and support chunked responses.
  3. an outgoing request should also support uploaded files, since an HTTP client could be able to upload files.

@azjezz
Copy link
Author

azjezz commented Jul 6, 2021

as per the discussion in #39:

  • headers, query parameters, and parsed body should all have type of dict<string, vec<string>> or vec<(string ,string)> to allow for duplicates, and be consistent with each other.

@fredemmott
Copy link
Contributor

an outgoing response should have a read+write body.

Probably should usually have a read-write body, but streaming responses may have a write-only body by necessity

... though, with IO\pipe() + concurrent , perhaps true read-write isn't necessary. If it is, it needs to be SeekableReadWriteHandle, not just ReadWriteHandle

@azjezz
Copy link
Author

azjezz commented Jul 6, 2021

SeekableReadWriteHandle

actually yes, it needs to also be seekable, which is what i did here: https://github.com/nuxed/nuxed/blob/develop/src/Http/Message/IResponse.hack

@azjezz
Copy link
Author

azjezz commented Jul 6, 2021

this interfaces currently works perfectly for outgoing responses, but i'm having troubles using it for incoming responses ( with http client ), hence why i think it's better to split outgoing messages and incoming messages

@fredemmott
Copy link
Contributor

uploaded files

There needs to be /a way/ to access them. I'm not sure if that belongs here, or in the frameworks

  • multipart makes this complicated
  • if here, it definitely needs to support lazy-loading
  • even the list of file-names must be lazy-loaded
  • fetching a full list of file names requires storing the files somewhere, as post data is only rewindable if it's been buffered, which may be undesirable

If this is part of the interfaces, that means that a default buffering method (e.g. file, ram) and strategy (e.g. limit to nGB/seconds) must be provided by all implementations.

The alternative would be just to provide the ReadHandle for the raw data, and let frameworks provide/let users choose between concrete implementations

@fredemmott
Copy link
Contributor

That was about server requests; client requests are simpler. Probably should support either string content or ReadHandle content

@azjezz
Copy link
Author

azjezz commented Jul 6, 2021

ReadHandle for the raw data

you mean raw data as in the whole request or post headers?

i think post headers is a good idea, but i would expect the handle to be rewindable

@fredemmott
Copy link
Contributor

All non-header data.

The handle being rewindable implicitly requires local buffering, which can be undesirable when dealing with large files (e.g. video uploads)

@azjezz
Copy link
Author

azjezz commented Jul 6, 2021

If this is part of the interfaces, that means that a default buffering method (e.g. file, ram) and strategy (e.g. limit to nGB/seconds) must be provided by all implementations.

or by HHVM, using the exact same way they are handled now, but instead of a global variable $_FILES, maybe HH\Server\get_request_files(): vec<RequestFileShape>.

this worked greatly so far in both PHP and HHVM, the server will upload the files to a temporary file each, and provide the path, name, and errors.

The handle being rewindable implicitly requires local buffering, which can be undesirable when dealing with large files (e.g. video uploads)

rewinding is also helpful in many cases.

E.g: a tool is built to deal with requests in some way where it reads the body, my framework implements the request interface with an additional method Request::toJson<T>(): Awaitable<T> , the user then calls $request->toJson<T>(); and later passes the request to the library they installed, which tries to read the body ( maybe doing some logging ), which fails.

@fredemmott
Copy link
Contributor

fredemmott commented Jul 6, 2021

A general alternative to using SeekableReadWriteHandle for server response bodies is to take (Awaitable<void> $producer, ReadHandle $reader)

There is then no need for seekable (edit: to implement middleware), but it's very verbose:

list($next_r, $next_w) = IO\pipe();
$produce_next = async { concurrent {
  await $producer;
  await async { while(true) {
    try {
      $chunk = await $read->readAllowPartialSuccessAsync();
      if ($chunk === '') {
        $next_w->close();
        break;
      }
      await $next_w->writeAllAsync($chunk);
  }};
 }};
 $response->setBody($next_producer, $next_r);

This allows true streaming with middleware and minimal/controllable buffering

@fredemmott
Copy link
Contributor

If this is part of the interfaces, that means that a default buffering method (e.g. file, ram) and strategy (e.g. limit to nGB/seconds) must be provided by all implementations.

or by HHVM, using the exact same way they are handled now, but instead of a global variable $_FILES, maybe HH\Server\get_request_files(): vec<RequestFileShape>.

this worked greatly so far in both PHP and HHVM, the server will upload the files to a temporary file each, and provide the path, name, and errors.

Because of this implicit buffering/storage, we are unable to use $_FILES internally. It is currently provided as a convenience for external users, but it is not usable as a template for a canonical replacement.

@fredemmott
Copy link
Contributor

I understand that rewinding is definitely useful - but the requirement for storage/buffering is also a cost which makes it completely unusable for some problems - both large uploads, and slow 'uploads' (e.g. push notifications via long-poll, or a hypothetical websocket implementation in hack).

Rewinding might be suitable for a higher level framework (or an additional, optional request/response type here), but it can't be a requirement for an interface that is meant to cover 100% of use cases.

While it definitely should optimize for the common cases, we still want to be able to support everything here, similarly to HSL IO.

@azjezz
Copy link
Author

azjezz commented Jul 6, 2021

but it can't be a requirement for an interface that is meant to cover 100% of use cases.

I don't think it would cover 100% of uses cases regardless, there's always going to be edge cases, in which case, hhvm should be able to a way to access the request as-is ( full, including request line and headers ) in the current context as a read handle.

@azjezz
Copy link
Author

azjezz commented Jul 6, 2021

but the requirement for storage/buffering is also a cost which makes it completely unusable for some problems - both large uploads, and slow 'uploads' (e.g. push notifications via long-poll, or a hypothetical websocket implementation in hack).

wouldn't the "large uploads" problem be gone if each entry in vec<RequestFileShape> had it's own read handle that is not rewindable?

@fredemmott
Copy link
Contributor

Also, while that alternative to rewindable I suggested is pretty horrible, it's straightforward to abstract over nicely in frameworks; for example:

function render_my_page(IO\WriteHandle $body_writer):Awaitable<void> { /* do stuff */ }

... and in the framework:

list($r, $w) = IO\pipe();
$producer = render_my_page($w);
return new_response($producer, $r);

it's more normal to pass a function(): Awaitable<void> than a raw Awaitable<void> to avoid dangling awaitables, but it doesn't significantly change that

@fredemmott
Copy link
Contributor

fredemmott commented Jul 6, 2021

wouldn't the "large uploads" problem be gone if each entry in vec had it's own read handle that is not rewindable?

No - the filenames in a multipart upload are interspersed with the data; you can know the first filename without reading the whole thing, but you can't read the second filename (or content type) until you've fully read the first file. https://stackoverflow.com/questions/913626/what-should-a-multipart-http-request-with-multiple-files-look-like has example upload raw post data that shows this

@fredemmott
Copy link
Contributor

fredemmott commented Jul 6, 2021

$producer thing: let's make that a bit more concrete:

class ServerResponse {

// ...

function withStreamedBody(Awaitable<void> $producer, IO\ReadHandle $body): this {
 // ...
}

function withStringBody(string $foo): this {
  list($r, $w) = IO\pipe();
  return $this->withStreamedBody(
    async { await $w->writeAllAsync($foo); },
    $r,
  );
}
// ...
}

@azjezz
Copy link
Author

azjezz commented Jul 6, 2021

does HHVM currently do that?

also, what about having a different body type instead of a handle? we can have ResponseBody or MessageBody to deal with all of this which it covers the underlying implementation?

@fredemmott
Copy link
Contributor

does HHVM currently do that?

I'm not sure what 'that' is; if it's storing/caching uploaded files, yes.

If it's the withStringBody thing I put there, no: print() etc in HTTP requests is currently blocking

also, what about having a different body type instead of a handle? we can have ResponseBody or MessageBody to deal with all of this which it covers the underlying implementation?

That splits the problem in two, but both parts still need addressing (e.g. 'how does middleware work for streaming responses' is a valid question); not against it, but AIUI that decision will be primarily driven by user-friendliness rather than implementation

@fredemmott
Copy link
Contributor

Generator<string> would be pretty close to the produce/handle thing, but more familiar. The main downsides of directly doing Generator are:

  • generator performance isn't great at the moment
  • it's an awkward api with the AsyncGenerator class instead of direclty yielding from async functions

@azjezz
Copy link
Author

azjezz commented Jul 6, 2021

That splits the problem in two

yes .. but at least we can then deal with them separately, e.g a response body cannot contain multiple files, or both a file, and data.

so it makes sense to separate the concepts from each other.

but AIUI that decision will be primarily driven by user-friendliness rather than implementation

yes, personally i started thinking maybe something like how rocket deals request handler parameter might be better for the user

ref: https://rocket.rs/v0.4/guide/requests/#


Other things to keep in mind:

  • How are 1xx responses handled? currently they are not supported by HHVM, so we can't send early hint responses
  • How are Link headers going to work? should they be treated as normal headers, or should we introduce a new object similar to PSR-13 ( https://github.com/php-fig/link )

@azjezz
Copy link
Author

azjezz commented Jul 6, 2021

also, should we keep HTTP/2 push in mind?

@fredemmott
Copy link
Contributor

fredemmott commented Jul 6, 2021

so it makes sense to separate the concepts from each other.

May do :) If it's workable to consider one an abstraction on top of the other, that also makes sense instead

also, should we keep HTTP/2 push in mind?

Yep. This is supported when using proxygen on FB builds, and I don't see any major reason for these functions to be FB-only (minor reason: they're global functions without an appropriate prefix/namespace).

For my reference: server_push_resource(), server_push_resource_body()

@azjezz
Copy link
Author

azjezz commented Jul 6, 2021

Similarly to Env, we're likely to want to revisit this consistently as part of changes to entrypoint APIs as a whole - CLI, http, xbox etc.

I think we can make something that would also be extremely helpful and make it super easy to create a long running server in Hack.

let's assume the new API acts as follow:

@HttpEntryPoint
async function main(
  HttpContext $ctx
): Awaitable<void> {

}

$ctx could hold information about the request, and be used to send responses, push responses .. etc.

HHVM then could add a new function: HH\HTTP\spawn_context(CloseableReadWriteHandle $connection): Awaitable<DisposableHttpContext>

that would be used as follow:

while (true) {
  $connection = await $server->nextConnectionAsync();
  await using $context = HTTP\spawn_context($connection);
  // handle request, send response .. etc using $context, the same way you would do with an entry point
  // as DisposableHttpContext is a sub-type of `HttpContext`.
  ...
}

the newly added function would parse request, and send response accordingly, and close the connection if it should be closed, not sure how 'keep-alive' would be handled in this case tho 🤔

@fredemmott
Copy link
Contributor

HTTP\spawn_context()

I don't see us allowing spawning of a new context that is specifically an HTTP context; however, something similar to this should be supported for XBox requests - including the not-yet-present ability to pass in the request input/output/error file descriptors for that.

It would be possible to construct an HTTPContext object within a handler, but, assuming we expose this stuff through read-only global functions too, the global functions would indicate an xbox - not http - request

@lexidor
Copy link
Contributor

lexidor commented Jul 9, 2021

also, should we keep HTTP/2 push in mind?

Just adding context. I do not wish to argue for or against http server push. Chromium intends to drop http server push support, from what I have gathered. There are other browsers out there (hi Firefox and Safari) and non web browser http clients which do support http server push.
Intent to Remove: HTTP/2 and gQUIC server push

@azjezz
Copy link
Author

azjezz commented Jul 10, 2021

Google being google :) but we should support the standards, not what browsers implement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants