Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(metrics): init Metrics in app init, cleanup setup code to one b… #1167

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 5 additions & 4 deletions .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
# push will run on every pushed commit to any branch (so this will rerun the tests
# once a branch gets merged to main in addition to any new commits on any branch)
on: push

name: CI
on:
push:
pull_request:
types: [opened, reopened]
Comment on lines +1 to -5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be edge cases where the "automatic merge" messes something up, and the CI succeeds on the "branch run", but fails on the "PR run". In that case you have to update your branch manually and fix the conflict that git didn't see. It has happened to me a handful of times, not sure it's worth running the CI twice on every PR, but 🤷‍♀️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it happen in other repos? which automatic merge are you referring to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can happen in any repo

i can't find documentation about it but this is my understanding from what i observed:
when 2 checks run (for push and for PR) they're not necessarily running on the same code. The "push" check runs on the branch's code, and the "PR" check runs on the code that would be on master after merging the PR. If there are any changes on master that aren't on the branch, it performs a merge automatically and runs the CI on that. It's the same code that would end up on master if you choose to merge without updating your branch first - git/github finds no conflict and merges. Occasionally it does it wrong, and it can be caught by the "PR" CI check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be protected against people merging without first updating their branch, at which point the push should run again. I do remember issues in the past vaguely... but I can't find any documentation in GH Actions that leads me to believe it's necessary to have both of these. Unless you have a strong opinion, I would rather not run everything twice and just get feedback on the code on main when the branch-creator merges it in and pushes those commits.


concurrency:
group: '${{ github.workflow }} @ ${{ github.event.pull_request.head.label || github.head_ref || github.ref }}'
cancel-in-progress: true
Expand Down
4 changes: 2 additions & 2 deletions .secrets.baseline
Original file line number Diff line number Diff line change
Expand Up @@ -121,7 +121,7 @@
"filename": ".github/workflows/ci.yaml",
"hashed_secret": "3e26d6750975d678acb8fa35a0f69237881576b0",
"is_verified": false,
"line_number": 13
"line_number": 14
}
],
"deployment/scripts/postgresql/postgresql_init.sql": [
Expand Down Expand Up @@ -422,5 +422,5 @@
}
]
},
"generated_at": "2024-07-25T17:19:58Z"
"generated_at": "2024-07-26T20:11:41Z"
}
33 changes: 33 additions & 0 deletions deployment/scripts/metrics/setup_prometheus
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
#!/usr/bin/env bash
# This script is called by:
# UWSGI during startup, this gets moved to the /fence directory in the Dockerfile
# - It prepares the prometheus_multiproc_dir folder to store the metrics from separate uwsgi workers (per PID)
# run.py
# - So local runs setup necessary environment vars and folders for prometheus metrics
# Test framework in conftest
# - So test runs setup necessary environment vars and folders for prometheus metrics

# Usage:
# ./script_name.sh [DIR] [true]

# Default directory if no argument is provided
DIR=${1:-/var/tmp/uwsgi_flask_metrics}

# Determine whether to wipe the directory (default is to wipe)
SETUP_DIR=${2:-true}

set -ex

if [[ "$SETUP_DIR" == "true" ]]; then
echo "setting up $PROMETHEUS_MULTIPROC_DIR. clearing existing files, ensuring it exists, chmod 755"
rm -Rf "$DIR"
mkdir -p "$DIR"
chmod 755 "$DIR"
fi

if id -u nginx &>/dev/null; then
chown "$(id -u nginx)":"$(id -g nginx)" "$DIR"
fi

export PROMETHEUS_MULTIPROC_DIR="$DIR"
echo "PROMETHEUS_MULTIPROC_DIR is $PROMETHEUS_MULTIPROC_DIR"
3 changes: 1 addition & 2 deletions deployment/uwsgi/uwsgi.ini
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,7 @@ pythonpath = /usr/local/src/*
# metrics setup
stats = 127.0.0.1:9191
stats-http = true
env = prometheus_multiproc_dir=/var/tmp/uwsgi_flask_metrics
exec-asap = /fence/clear_prometheus_multiproc /var/tmp/uwsgi_flask_metrics
exec-asap = /fence/setup_prometheus /var/tmp/uwsgi_flask_metrics

# Initialize application in worker processes, not master. This prevents the
# workers from all trying to open the same database connections at startup.
Expand Down
9 changes: 7 additions & 2 deletions fence/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@
)

from fence.auth import logout, build_redirect_url
from fence.metrics import metrics
from fence.metrics import Metrics
from fence.blueprints.data.indexd import S3IndexedFileLocation
from fence.blueprints.login.utils import allowed_login_redirects, domain
from fence.errors import UserError
Expand Down Expand Up @@ -97,6 +97,11 @@ def app_init(
logger.info(
f"Prometheus metrics are{'' if config['ENABLE_PROMETHEUS_METRICS'] else ' NOT'} enabled."
)
if config["ENABLE_PROMETHEUS_METRICS"]:
# Initialize the Metrics instance
app.metrics = Metrics()
else:
app.metrics = Metrics(enabled=False)
Avantol13 marked this conversation as resolved.
Show resolved Hide resolved


def app_sessions(app):
Expand Down Expand Up @@ -207,7 +212,7 @@ def metrics_endpoint():
/!\ There is no authz control on this endpoint!
In cloud-automation setups, access to this endpoint is blocked at the revproxy level.
"""
data, content_type = metrics.get_latest_metrics()
data, content_type = flask.current_app.metrics.get_latest_metrics()
return flask.Response(data, content_type=content_type)


Expand Down
3 changes: 1 addition & 2 deletions fence/blueprints/data/indexd.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,6 @@
from fence.resources.ga4gh.passports import sync_gen3_users_authz_from_ga4gh_passports
from fence.resources.audit.utils import enable_audit_logging
from fence.utils import get_valid_expiration_from_request
from fence.metrics import metrics

from . import multipart_upload
from ...models import AssumeRoleCacheAWS, query_for_user, query_for_user_by_id
Expand Down Expand Up @@ -210,7 +209,7 @@ def _log_signed_url_data_info(
f"acl={acl} authz={authz} bucket={bucket} user_sub={user_sub} client_id={client_id}"
)

metrics.add_signed_url_event(
current_app.metrics.add_signed_url_event(
action,
protocol,
acl,
Expand Down
3 changes: 1 addition & 2 deletions fence/blueprints/login/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@
from fence.blueprints.login.redirect import validate_redirect
from fence.config import config
from fence.errors import UserError
from fence.metrics import metrics

logger = get_logger(__name__)

Expand Down Expand Up @@ -134,7 +133,7 @@ def get(self):

def post_login(self, user=None, token_result=None, **kwargs):
prepare_login_log(self.idp_name)
metrics.add_login_event(
flask.current_app.metrics.add_login_event(
user_sub=flask.g.user.id,
idp=self.idp_name,
fence_idp=flask.session.get("fence_idp"),
Expand Down
66 changes: 36 additions & 30 deletions fence/metrics.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,10 @@
- The new method should call the `_increment_counter` and/or `_set_gauge` methods with the
appropriate metric name and labels.
- Call the new method from the code where relevant, for example:
from fence.metric import metrics
metrics.add_login_event(...)
from flask import current_app
current_app.metrics.add_login_event(...)
- Add unit tests to the `tests/test_metrics` file.
"""


import os
import pathlib

Expand All @@ -26,37 +24,49 @@
CONTENT_TYPE_LATEST,
)

from fence.config import config


logger = get_logger(__name__)


class Metrics:
class Metrics(object):
"""
Class to handle Prometheus metrics
Attributes:
registry (CollectorRegistry): Prometheus registry
metrics (dict): Dictionary to store Prometheus metrics
"""

def __init__(self, prometheus_dir="/var/tmp/uwsgi_flask_metrics"):
pathlib.Path(prometheus_dir).mkdir(parents=True, exist_ok=True)
os.environ["PROMETHEUS_MULTIPROC_DIR"] = prometheus_dir
def __init__(self, enabled=True, prometheus_dir="/var/tmp/uwsgi_flask_metrics"):
"""
Create a metrics class.

Args:
enabled (bool): If this is false, the class functions will be no-ops (no operations), effectively
doing nothing. This is the behavior when metrics are disabled. Why? So application code
doesn't have to check, it always tries to log a metric.
"""
self.enabled = enabled
if enabled:
paulineribeyre marked this conversation as resolved.
Show resolved Hide resolved
pathlib.Path(prometheus_dir).mkdir(parents=True, exist_ok=True)
os.environ["PROMETHEUS_MULTIPROC_DIR"] = prometheus_dir

logger.info(
f"PROMETHEUS_MULTIPROC_DIR is {os.environ['PROMETHEUS_MULTIPROC_DIR']}"
)

self._registry = CollectorRegistry()
multiprocess.MultiProcessCollector(self._registry)
self._metrics = {}
self._registry = CollectorRegistry()
multiprocess.MultiProcessCollector(self._registry)
self._metrics = {}

# set the descriptions of new metrics here. Descriptions not specified here
# will default to the metric name.
self._counter_descriptions = {
"gen3_fence_presigned_url": "Fence presigned urls",
"gen3_fence_login": "Fence logins",
}
self._gauge_descriptions = {
"gen3_fence_presigned_url_size": "Fence presigned urls",
}
# set the descriptions of new metrics here. Descriptions not specified here
# will default to the metric name.
self._counter_descriptions = {
"gen3_fence_presigned_url": "Fence presigned urls",
"gen3_fence_login": "Fence logins",
}
self._gauge_descriptions = {
"gen3_fence_presigned_url_size": "Fence presigned urls",
}

def get_latest_metrics(self):
"""
Expand All @@ -67,8 +77,8 @@ def get_latest_metrics(self):
"""
# When metrics gathering is not enabled, the metrics endpoint should not error, but it should
# not return any data.
if not config["ENABLE_PROMETHEUS_METRICS"]:
return "", None
if not self.enabled:
return "", CONTENT_TYPE_LATEST

return generate_latest(self._registry), CONTENT_TYPE_LATEST

Expand Down Expand Up @@ -125,7 +135,7 @@ def add_login_event(self, user_sub, idp, fence_idp, shib_idp, client_id):
"""
Record a login event
"""
if not config["ENABLE_PROMETHEUS_METRICS"]:
if not self.enabled:
return
self._increment_counter(
"gen3_fence_login",
Expand Down Expand Up @@ -164,7 +174,7 @@ def add_signed_url_event(
"""
Record a signed URL event
"""
if not config["ENABLE_PROMETHEUS_METRICS"]:
if not self.enabled:
return
self._increment_counter(
"gen3_fence_presigned_url",
Expand Down Expand Up @@ -193,7 +203,3 @@ def add_signed_url_event(
},
size_in_kibibytes,
)


# Initialize the Metrics instance
metrics = Metrics()
51 changes: 29 additions & 22 deletions poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ botocore = "*"
cached_property = "^1.5.1"
cdiserrors = "<2.0.0"
cdislogging = "^1.0.0"
cdispyutils = "^2.0.1"
cdispyutils = {git = "https://github.com/uc-cdis/cdis-python-utils", rev = "feat/metrics"}
paulineribeyre marked this conversation as resolved.
Show resolved Hide resolved
flask = ">=3.0.0"
cryptography = ">=42.0.5"
flask-cors = ">=3.0.3"
Expand Down
Loading
Loading