Security considerations for branch-based planner #604

chanwit · 2023-05-16T13:48:39Z

chanwit
May 16, 2023
Maintainer

Evaluate alternative approaches for communicating with Git providers (e.g., webhooks vs. polling) and their security posture
Propose an approach and justify it
Propose an architecture that supports the chosen approach

Related issue #595
Part of #527

chanwit · 2023-05-16T13:50:42Z

chanwit
May 16, 2023
Maintainer Author

[RFC] Polling mechanism for the branch-based planner

Motivation

To track changes effectively in the branch-based planner, we need a reliable and secure way for the Kubernetes cluster to interact with GitHub. Two possible approaches are webhooks and a polling mechanism.

Webhooks are a common choice because of their real-time nature. However, they do introduce a considerable security risk. To use webhooks, a publicly accessible endpoint is required, which exposes the Kubernetes cluster to the outside world. While security measures can be put in place to protect this endpoint (such as using secure tunneling or authentication mechanisms), the exposure itself presents a risk. An attacker could potentially exploit vulnerabilities in the webhook receiver or use the exposed endpoint for DDoS attacks.

A polling mechanism, on the other hand, does not require exposing an endpoint. Instead, the Kubernetes cluster reaches out to GitHub at regular intervals to check for any changes, such as PR creation, PR description changes, or PR comment changes. This method reduces the attack surface and is generally considered more secure.

Portability of Polling Mechanism to Other Git Providers

One of the key advantages of using a polling mechanism is the ease of porting this implementation to other Git providers such as GitLab, Bitbucket, etc. With the polling mechanism, we can standardize the way we interact with the Git providers, reducing the complexity that comes with dealing with different webhook systems or APIs.

Webhook support varies widely among Git providers. Some providers might not support webhooks at all, or they might have different ways of setting up and securing webhooks. Moreover, the structure and content of webhook messages can also differ significantly between providers, which means the code to parse and handle these messages would need to be written and maintained for each provider.

In contrast, a polling mechanism works largely the same way for any Git provider. We make requests to the provider's API to fetch the required data, which is typically exposed through similar REST APIs across providers. We can therefore adapt our polling code to a new provider by changing the API endpoints and adjusting for differences in the API responses.

In addition, the polling mechanism lets us control the rate of requests, which helps us stay within the rate limits imposed by different providers. With webhooks, the rate of incoming messages is determined by the activity on the Git provider: a burst of events could overwhelm our system, and any API calls made while handling those events could push us past the rate limits.

Proposed solution

We propose implementing a polling mechanism in which the Kubernetes cluster initiates all requests to GitHub. This removes the need to expose the cluster to external entities, reducing security risk.

However, the polling mechanism is not without drawbacks. It introduces latency: an event may wait up to one full polling interval before it is noticed. A high polling frequency also increases the load on the GitHub API and consumes more of the rate limit. These concerns can be mitigated by configuring the polling frequency according to how urgently updates are needed.

Examples

Example 1. Polling new PR and file names

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/google/go-github/v52/github"
	"golang.org/x/oauth2"
)

type PRState struct {
	Number    int
	Title     string
	FileNames []string
}

var previousPRState = make(map[int]PRState)

func main() {
	ctx := context.Background()
	ts := oauth2.StaticTokenSource(
		&oauth2.Token{AccessToken: "..."}, // Replace with your GitHub token.
	)
	tc := oauth2.NewClient(ctx, ts)

	client := github.NewClient(tc)

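	for {
		// Replace owner and repo.
		// With nil options, only the first page of open PRs is returned.
		prs, _, err := client.PullRequests.List(ctx, "weaveworks", "tf-controller", nil)
		if err != nil {
			// Nothing to iterate on error; fall through to the sleep and retry.
			fmt.Println("listing pull requests:", err)
		}

		for _, pr := range prs {
			// Only track PRs to the main branch.
			// Getters are nil-safe, unlike dereferencing the pointer fields directly.
			if pr.GetBase().GetRef() != "main" {
				continue
			}

			files, _, err := client.PullRequests.ListFiles(ctx, "weaveworks", "tf-controller", pr.GetNumber(), nil)
			if err != nil {
				// Leave the PR unrecorded so it is retried on the next poll.
				fmt.Println("listing files for PR", pr.GetNumber(), ":", err)
				continue
			}
			var fileNames []string
			for _, file := range files {
				fileNames = append(fileNames, file.GetFilename())
			}

			currentPRState := PRState{
				Number:    pr.GetNumber(),
				Title:     pr.GetTitle(),
				FileNames: fileNames,
			}

			if _, ok := previousPRState[currentPRState.Number]; !ok {
				// Handle new PR, examine branch name, file paths, etc.
				fmt.Println(currentPRState)
			}

			// Update the stored PR state.
			previousPRState[currentPRState.Number] = currentPRState
		}

		time.Sleep(10 * time.Minute) // Poll every 10 minutes.
	}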

}

Example 2. Polling new comments

package main

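import (
	"context"
	"log"
	"time"

	"github.com/google/go-github/v52/github"
	"golang.org/x/oauth2"
)

// PRCommentState is the last observed state of a PR comment.
type PRCommentState struct {
	ID     int64
	Author string
	Body   string
}

var previousPRCommentState = make(map[int64]PRCommentState)

func main() {
	ctx := context.Background()
	ts := oauth2.StaticTokenSource(
		&oauth2.Token{AccessToken: "..."}, // Replace with your GitHub token.
	)
	tc := oauth2.NewClient(ctx, ts)

	client := github.NewClient(tc)

	targetAuthor := "chanwit" // Replace with target username.

	for {
		// Replace owner and repo.
		// With nil options, only the first page of open PRs is returned.
		prs, _, err := client.PullRequests.List(ctx, "weaveworks", "tf-controller", nil)
		if err != nil {
			// Nothing to iterate on error; fall through to the sleep and retry.
			log.Printf("listing pull requests: %v", err)
		}

		for _, pr := range prs {
			// Conversation comments on a PR are issue comments;
			// review comments on the diff are not included here.
			comments, _, err := client.Issues.ListComments(ctx, "weaveworks", "tf-controller", pr.GetNumber(), nil)
			if err != nil {
				log.Printf("listing comments for PR #%d: %v", pr.GetNumber(), err)
				continue
			}

			for _, comment := range comments {
				// Getters are nil-safe, unlike dereferencing the pointer fields directly.
				if comment.GetUser().GetLogin() != targetAuthor {
					continue
				}

				currentPRCommentState := PRCommentState{
					ID:     comment.GetID(),
					Author: comment.GetUser().GetLogin(),
					Body:   comment.GetBody(),
				}

				if prevState, ok := previousPRCommentState[currentPRCommentState.ID]; ok {
					// Check if there's a change.
					if prevState.Body != currentPRCommentState.Body {
						// Handle the comment change.
						log.Printf("comment %d was edited", currentPRCommentState.ID)
					}
				} else {
					// Handle the new comment.
					log.Printf("new comment %d", currentPRCommentState.ID)
				}

				// Update the stored comment state.
				previousPRCommentState[currentPRCommentState.ID] = currentPRCommentState
			}
		}

		time.Sleep(10 * time.Minute) // Poll every 10 minutes.
	}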
}

Rate Limit Consideration

The GitHub API enforces a rate limit to control the number of requests a client can make in a given period of time to ensure fair usage. For authenticated requests, you can make up to 5,000 requests per hour.

In the context of our polling mechanism, this means we need to plan the frequency of our polling requests carefully. Making requests too frequently could quickly exhaust the limit and result in the client being temporarily blocked from making additional requests.

For example, if we poll GitHub every minute, we could make up to 60 requests per hour for each unique item we are polling. If we are monitoring 100 pull requests, this will total 6,000 requests per hour, exceeding the rate limit. Therefore, we might need to increase our polling interval or reduce the number of items we are monitoring.

Also, it's a good practice to handle the X-RateLimit-Remaining HTTP header in the API response, which indicates the number of requests that you can make before hitting the limit. If the remaining limit is low, we can decide to pause or slow down requests until the limit is reset.

To effectively manage GitHub API rate limits and maintain the security of access tokens, we recommend storing the GitHub token in a Kubernetes Secret and referencing that Secret in the Terraform Custom Resource (CR).
