Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(outbound): Route TLS connections by SNI #3160

Merged
merged 13 commits into from
Oct 18, 2024
Merged

feat(outbound): Route TLS connections by SNI #3160

merged 13 commits into from
Oct 18, 2024

Conversation

zaharidichev
Copy link
Member

@zaharidichev zaharidichev commented Aug 29, 2024

This is a draft PR that wires together some of the recently introduced changes to the proxy in order to deliver TLS routing functionality. The changes are:

This change adds a TLS routing stack that has the following properties:

  • it always expects that SNI is present
  • uses tls routing information provided by the discovery API to perform routing based on SNI
  • does not concern itself with terminating TLS by simply proxies the encrypted stream

Signed-off-by: Zahari Dichev zaharidichev@gmail.com

Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a merge strategy perspective, I think we'll probably want to split out PRs for from the leaves up to the root. That is, we can easily add the tls/route crate in an independent change with its own tests. Then we can add client-policy changes that incorporate those changes, and then we can add the outbound tls stack with tests, and then finally wire it into the outbound protocol router.

We are going to have to wire this into the ingress stack as well, presumably? Or what do we expect to happen in the case where an ingress makes an upstream HTTPS call?

@@ -0,0 +1,14 @@
[package]
name = "linkerd-tls-route"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably introduce this crate in a distinct change. It should include module-level tests like the http-route crate.

}

#[derive(Clone, Debug, Default)]
pub struct DetectSniServer<T, N> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this differ from the middleware in linkerd/tls/src/server.rs? Can they be unified? Is there any reason that this middleware can't be defined in the linkerd-tls crate? That should be our default--it's easier to to test/change modules when they are decoupled from the app. That is, if the app isn't compiling, it can still be possible to test and make changes to this stuff.

let (sni, io) = detect.await.map_err(|_| SniDetectionTimeoutError)??;
let sni = sni.ok_or(NoSniFoundError)?;

println!("detected SNI: {:?}", sni);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should definitely be a debug log and not a println.

linkerd/app/outbound/src/protocol.rs Show resolved Hide resolved
@@ -192,7 +192,7 @@ where
}

/// Peek or buffer the provided stream to determine an SNI value.
async fn detect_sni<I>(mut io: I) -> io::Result<(Option<ServerName>, DetectIo<I>)>
pub async fn detect_sni<I>(mut io: I) -> io::Result<(Option<ServerName>, DetectIo<I>)>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I asked above, is it possible to define the TLS servers within this module so we don't need to make these details public?

p.fill_backends(&mut backends);
}
Protocol::Tls(_) => {
unreachable!("no API support for TLS protocol yet");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not possible to fill the route backends here? Our tls types should include enough information to do that, right?

linkerd/io/src/sensor.rs Show resolved Hide resolved
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This reverts commit d909be7.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This reverts commit 6c07804.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
@zaharidichev zaharidichev changed the base branch from main to zd/split-tls-detection September 23, 2024 11:50
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Base automatically changed from zd/split-tls-detection to main September 25, 2024 15:36
olix0r and others added 2 commits October 10, 2024 18:52
* tls: Avoid InsertParam parameter.

We don't actually use InsertParam all that much--only in the TLS server (which
is obviously why it was included here). This change removes the InsertParam in
favor of using a tuple, generally reducing boilerplate.

It turns out that the TLS stack already has a map_target to handle turning the
tuple-target into a Tls type, so it shouldn't be needed.

* tls: Remove ExtractParam from detect_sni

Similarly, we don't actually care about extracting a timeout from the target.
Using an ExtractParam causes needless boilerplate.

This change updates the stack module to simply take a timeout parameter at
construction time.

* tls: Make the detect_tls module private

We now only need to export the NewDetectSni type. The module reexport is not
necessary.

* tls: Reorganize NewDetectRequiredSni under the server module

Because the DetectTls and DetectSni types are so similar -- and implemented in
the context of a server inspecting a provided connection (and not a client
establishing a TLS connection), this change reorganizes the module:

* The DetectSni types are renamed to DetectRequiredSni to better reflect their
  purpose and difference from the DetectTls type.
* The detect_sni module is renamed and moved to server::required_sni. This
  module is private and the relevant types are reexported from the server
  module.
@zaharidichev zaharidichev marked this pull request as ready for review October 10, 2024 16:05
@zaharidichev zaharidichev requested a review from a team as a code owner October 10, 2024 16:05
Copy link
Collaborator

@cratelyn cratelyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i took another read through this work, it's looking very good! i have some nitpicks i noticed, but these are all small details. nice work @zaharidichev!

Comment on lines 40 to 42
linkerd-http-route = { path = "../../http/route" }
linkerd-tls-route = { path = "../../tls/route" }
linkerd-identity = { path = "../../identity" }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we sort this?


/// A backend dispatcher explicitly fails all requests.
#[derive(Debug, thiserror::Error)]
#[error("{0}")]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could use

Suggested change
#[error("{0}")]
#[error(transparent)]

if we are delegating directly to the inner string.

Comment on lines 14 to 24
// === impl Backend ===

impl<T: Clone> Clone for Backend<T> {
fn clone(&self) -> Self {
Self {
route_ref: self.route_ref.clone(),
concrete: self.concrete.clone(),
}
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we move this below the MatchedRoute<T>, Route<T>, and error structures below, just to match the general convention of // === impl T === blocks coming after struct/type definitions?

Comment on lines 48 to 50
}

impl<N> NewDetectRequiredSni<N> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
impl<N> NewDetectRequiredSni<N> {
}
// === impl NewDetectRequiredSni ===
impl<N> NewDetectRequiredSni<N> {

Comment on lines 69 to 72
}

// === impl DetectSni ===

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
// === impl DetectSni ===
}
// === impl DetectRequiredSni ===

Comment on lines +18 to +19
#[error("SNI detection timed out")]
pub struct SniDetectionTimeoutError;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional idea:

if this was a newtype around https://docs.rs/tokio/latest/tokio/time/error/struct.Elapsed.html, we could be extra-defensive about confirming that we've encountered a timeout. (contd. below... 👇)

// Detect the SNI from a ClientHello (or timeout).
let detect = time::timeout(self.timeout, detect_sni(io));
Box::pin(async move {
let (res, io) = detect.await.map_err(|_| SniDetectionTimeoutError)??;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let (res, io) = detect.await.map_err(|_| SniDetectionTimeoutError)??;
let (res, io) = detect.await.map_err(SniDetectionTimeoutError)??;

(contd. from above..) and then this could wrap the timeout error, making sure we don't ever discard some other error we might care about.

// === impl Outbound ===

impl<C> Outbound<C> {
/// Builds a stack that proxies tls connections.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Builds a stack that proxies tls connections.
/// Builds a stack that proxies TLS connections.

Fail { message: Arc<str> },
}

/// A backend dispatcher explicitly fails all requests.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A backend dispatcher explicitly fails all requests.
/// A backend dispatcher that explicitly fails all requests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that suggestion fixes a typo, i believe, but is this comment affixed to the wrong structure? it seems like this is an error returned by the fail service created below in this statement:

          let fail = svc::ArcNewService::new(|message: Arc<str>| {
                svc::mk(move |_| futures::future::ready(Err(DispatcherFailed(message.clone()))))
            });

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
@olix0r olix0r changed the title TLS Routing stack feat(outbound): Route TLS connections by SNI Oct 17, 2024
Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
@zaharidichev zaharidichev merged commit 73e96dd into main Oct 18, 2024
15 checks passed
@zaharidichev zaharidichev deleted the zd/tls-stack branch October 18, 2024 11:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants