Add feature to notify about missing log files #15

Merged
merged 29 commits into from
Jul 17, 2024
Changes from 18 commits
Commits
fa24d29
Create e-mail configuration for collection (to be replaced in the future …
May 10, 2024
78c1261
Create temporal reference for the datetime represented by "two days a…
May 10, 2024
c8c08b8
Create exception to report the case in which only one configuration …
May 10, 2024
e90fea8
Create filter to get the valid log files for a given date
May 10, 2024
235e54b
Create task that sends an e-mail message (to notify about missing logs)
May 10, 2024
2aaa254
Create task that checks for missing log files and notifies by e-mail (…
May 10, 2024
1831783
Change default URL of local configuration to 0.0.0.0
rafaelpezzuto May 24, 2024
9478ce1
Create utility to get a range of dates
rafaelpezzuto May 25, 2024
e390a4a
Replace method that gets a date from a string with a numeric method
rafaelpezzuto May 25, 2024
277a1e9
Add class method to get the number of required files
rafaelpezzuto May 25, 2024
211a206
Add class method to get the number of existing files
rafaelpezzuto May 25, 2024
6f61b0e
Create model to hold the missing-files report
rafaelpezzuto May 25, 2024
73998b6
Fix imports
rafaelpezzuto May 25, 2024
fe74e3f
Clean up choices and add data for the new model that stores the report…
rafaelpezzuto May 25, 2024
512fe2a
Replace task that generates log reports with 3 smaller tasks
rafaelpezzuto May 25, 2024
5053f61
Create snippet model for the missing-logs report
rafaelpezzuto May 25, 2024
aa82517
Remove unnecessary import
rafaelpezzuto May 25, 2024
0469776
Merge branch 'scieloorg:main' into impl/report-missing-dates
pitangainnovare Jun 1, 2024
92eceea
Replace the terms existing and required with found and expected, respective…
rafaelpezzuto Jun 17, 2024
4501c20
Improve comparisons when counting existing, extra, and ok files
rafaelpezzuto Jun 17, 2024
3ce8410
Mark messages for translation (when raising exceptions)
rafaelpezzuto Jun 17, 2024
13e1ccd
Create property that returns the list of acron2 and use it in the task
rafaelpezzuto Jun 17, 2024
fba9583
Rename other variables that use the terms required and existing
rafaelpezzuto Jun 17, 2024
3015aa1
Configure send to use the application's e-mail environment variable (…
rafaelpezzuto Jun 17, 2024
ff37340
Show acron2 in the collection listing
rafaelpezzuto Jun 17, 2024
8e28675
Change the term Core to Usage in template
rafaelpezzuto Jun 17, 2024
46e5da2
Fix method that gets the number of required files for a date. …
rafaelpezzuto Jun 17, 2024
c6f5718
Mark other terms for translation
rafaelpezzuto Jun 17, 2024
941c8a8
Rename another occurrence of the term existing to found
rafaelpezzuto Jun 17, 2024
2 changes: 1 addition & 1 deletion config/settings/local.py
@@ -33,7 +33,7 @@
# ADMIN
# ------------------------------------------------------------------------------
# https://docs.wagtail.org/en/stable/reference/settings.html#wagtailadmin-base-url
-WAGTAILADMIN_BASE_URL = 'https://usage.scielo.org'
+WAGTAILADMIN_BASE_URL = 'http://0.0.0.0:8009'

# WhiteNoise
# ------------------------------------------------------------------------------
14 changes: 11 additions & 3 deletions log_manager/choices.py
@@ -28,12 +28,14 @@
COLLECTION_CONFIG_TYPE_DIRECTORY_PROCESSED_DATA = 'PRO'
COLLECTION_CONFIG_TYPE_DIRECTORY_METRICS = 'MTS'
COLLECTION_CONFIG_TYPE_FILES_PER_DAY = 'DAY'
COLLECTION_CONFIG_TYPE_EMAIL = 'EMA'

COLLECTION_CONFIG_TYPE = [
(COLLECTION_CONFIG_TYPE_DIRECTORY_LOGS, _('Logs')),
(COLLECTION_CONFIG_TYPE_DIRECTORY_PROCESSED_DATA, _('Processed Data')),
(COLLECTION_CONFIG_TYPE_DIRECTORY_METRICS, _('Metrics')),
(COLLECTION_CONFIG_TYPE_FILES_PER_DAY, _('Files per Day')),
(COLLECTION_CONFIG_TYPE_EMAIL, _('E-mail')),
]


@@ -52,6 +54,12 @@
]


-TEMPORAL_REFERENCE_YESTERDAY = 'yesterday'
-TEMPORAL_REFERENCE_LAST_WEEK = 'last week'
-TEMPORAL_REFERENCE_LAST_MONTH = 'last month'
COLLECTION_LOG_FILE_DATE_COUNT_OK = 'OK'
COLLECTION_LOG_FILE_DATE_COUNT_MISSING_FILES = 'MIS'
COLLECTION_LOG_FILE_DATE_COUNT_EXTRA_FILES = 'EXT'

COLLECTION_LOG_FILE_DATE_COUNT = [
(COLLECTION_LOG_FILE_DATE_COUNT_OK, _("OK")),
(COLLECTION_LOG_FILE_DATE_COUNT_MISSING_FILES, _("Missing Files")),
(COLLECTION_LOG_FILE_DATE_COUNT_EXTRA_FILES, _("Extra files")),
]
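
As a rough illustration of how the three status codes above might be assigned, the following minimal sketch compares a found count against an expected count. The function name `classify_log_file_count` is hypothetical and the codes are repeated as literals only to keep the example self-contained; in this PR the comparison is done inside `CollectionLogFileDateCount.create_or_update`.

```python
# Status codes mirroring the constants added to log_manager/choices.py.
STATUS_OK = "OK"
STATUS_MISSING_FILES = "MIS"
STATUS_EXTRA_FILES = "EXT"


def classify_log_file_count(found: int, expected: int) -> str:
    """Return the status code for one collection and date (hypothetical helper)."""
    if found < expected:
        return STATUS_MISSING_FILES
    if found > expected:
        return STATUS_EXTRA_FILES
    return STATUS_OK
```

Keeping the comparison in a small pure function like this would make it easy to unit test without touching the database.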
3 changes: 3 additions & 0 deletions log_manager/exceptions.py
@@ -12,3 +12,6 @@ class UndefinedApplicationConfigError(Exception):

class UndefinedCollectionConfigError(Exception):
...

class MultipleCollectionConfigError(Exception):
...
145 changes: 144 additions & 1 deletion log_manager/models.py
@@ -1,6 +1,7 @@
from datetime import datetime

from django.db import models
from django.db.models import Q
from django.db.utils import IntegrityError
from django.utils.translation import gettext_lazy as _
from wagtail.admin.panels import FieldPanel
@@ -12,7 +13,11 @@
from tracker.models import UnexpectedEvent

from . import choices
-from .exceptions import LogFileAlreadyExistsError
+from .exceptions import (
+    LogFileAlreadyExistsError,
+    MultipleCollectionConfigError,
+    UndefinedCollectionConfigError,
+)


class ApplicationConfig(CommonControlField):
@@ -155,6 +160,23 @@ def filter_by_collection_and_config_type(cls, collection_acron2, config_type, is
config_type=config_type,
is_enabled=is_enabled
)

    @classmethod
    def get_number_of_required_files_by_day(cls, collection_acron2, date, is_enabled=True):
        files_by_day = cls.objects.filter(
            collection__acron2=collection_acron2,
            start_date__lte=date,
            config_type=choices.COLLECTION_CONFIG_TYPE_FILES_PER_DAY,
            is_enabled=is_enabled,
        )

        if files_by_day.count() > 1:
            raise MultipleCollectionConfigError("ERROR. Please, keep only one configuration enabled for the FILES_BY_DAY attribute.")

        if files_by_day.count() == 0:
            raise UndefinedCollectionConfigError("ERROR. Please, add an Application Configuration for the FILES_BY_DAY attribute.")

        return int(files_by_day.get().value)
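
The zero / one / many check above can be sketched without the ORM. This is only an illustration under stated assumptions: `pick_single_value` is a hypothetical helper, and the two exception classes are local stand-ins for the ones declared in log_manager/exceptions.py.

```python
class UndefinedCollectionConfigError(Exception):
    """Local stand-in for the exception in log_manager/exceptions.py."""


class MultipleCollectionConfigError(Exception):
    """Local stand-in for the exception in log_manager/exceptions.py."""


def pick_single_value(values):
    """Return the only configured value as an int, or raise (hypothetical helper)."""
    values = list(values)  # evaluate the source once instead of counting twice
    if len(values) > 1:
        raise MultipleCollectionConfigError("Keep only one configuration enabled.")
    if not values:
        raise UndefinedCollectionConfigError("Add a configuration first.")
    return int(values[0])
```

Materializing the matches once avoids the two separate `count()` calls plus the final `get()`, which would each be a query in the ORM version.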

def __str__(self):
return f'{self.value}'
@@ -201,11 +223,132 @@ def create(cls, user, log_file, date):
obj.save()

return obj

    @classmethod
    def filter_by_collection_and_date(cls, collection_acron2, date):
        return cls.objects.filter(
            ~Q(log_file__status__in=[
                choices.LOG_FILE_STATUS_CREATED,
                choices.LOG_FILE_STATUS_INVALIDATED
            ]),
            log_file__collection__acron2=collection_acron2,
            date=date,
        )

    @classmethod
    def get_number_of_existing_files_for_date(cls, collection_acron2, date):
        return cls.objects.filter(
            ~Q(log_file__status__in=[
                choices.LOG_FILE_STATUS_CREATED,
                choices.LOG_FILE_STATUS_INVALIDATED
            ]),
            log_file__collection__acron2=collection_acron2,
            date=date,
        ).count()

def __str__(self):
return f'{self.log_file.path}-{self.date}'


class CollectionLogFileDateCount(CommonControlField):
    collection = models.ForeignKey(
        Collection,
        verbose_name=_('Collection'),
        on_delete=models.DO_NOTHING,
        null=False,
        blank=False,
    )

    date = models.DateField(
        _('Date'),
        null=False,
        blank=False,
    )

    year = models.IntegerField(
        _('Year'),
        max_length=4,
        null=False,
        blank=False,
    )

    month = models.IntegerField(
        _('Month'),
        max_length=2,
        null=False,
        blank=False,
    )

    existing_log_files = models.IntegerField(

Member

@pitangainnovare instead of existing, use found; it is more explicit

        verbose_name=_('Number of Existing Valid Log Files'),
        max_length=8,
        default=0,
    )

    required_log_files = models.IntegerField(

Member

@pitangainnovare instead of required, use expected; it is more explicit

        verbose_name=_('Number of Required Valid Log Files'),
        max_length=8,
        blank=True,
        null=True,
    )

    status = models.CharField(
        verbose_name=_('Status'),
        choices=choices.COLLECTION_LOG_FILE_DATE_COUNT,
        max_length=3,
    )

    @classmethod
    def create_or_update(cls, user, collection, date, required_log_files, existing_log_files):

Member

@pitangainnovare also adopt the get, create, IntegrityError pattern here. If the tasks run concurrently, records will be duplicated

Contributor Author

The following snippet guarantees that there is only one record per date and collection:

[image]

        obj, created = cls.objects.get_or_create(
            collection=collection,
            date=date,
            month=date.month,
            year=date.year,
        )

        if not created:
            obj.updated_by = user
            obj.updated = datetime.utcnow()
        else:
            obj.creator = user
            obj.created = datetime.utcnow()

        obj.required_log_files = required_log_files
        obj.existing_log_files = existing_log_files

        if existing_log_files < required_log_files:
            obj.status = choices.COLLECTION_LOG_FILE_DATE_COUNT_MISSING_FILES
        elif existing_log_files > required_log_files:
            obj.status = choices.COLLECTION_LOG_FILE_DATE_COUNT_EXTRA_FILES
        else:
            obj.status = choices.COLLECTION_LOG_FILE_DATE_COUNT_OK

        try:
            obj.save()
            return obj
        except IntegrityError:
            ...

    class Meta:
        ordering = ['-date']
        verbose_name = _("Collection Log File Date Count")
        unique_together = (
            'collection',
            'date',
        )

    panels = [
        AutocompletePanel('collection'),
        FieldPanel('date'),
        FieldPanel('year'),
        FieldPanel('month'),
        FieldPanel('existing_log_files'),
        FieldPanel('required_log_files'),
        FieldPanel('status'),
    ]
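
The get / create / IntegrityError pattern raised in the review of `create_or_update` can be sketched outside Django. The example below is a minimal illustration that uses the standard-library `sqlite3` module as a stand-in for the ORM; the table name `date_count` and the function `get_or_create_count` are hypothetical. The point it tries to show is that a unique constraint on (collection, date), like the `unique_together` declared above, is what turns a concurrent duplicate insert into an `IntegrityError` that the caller can recover from.

```python
import sqlite3


def get_or_create_count(conn, collection, date):
    """Return (row_id, created) for a (collection, date) pair.

    Try to get the row; if it is absent, try to create it; if another
    writer created it first, the UNIQUE constraint raises IntegrityError
    and we fall back to a second get.
    """
    query = "SELECT id FROM date_count WHERE collection = ? AND date = ?"
    row = conn.execute(query, (collection, date)).fetchone()
    if row:
        return row[0], False
    try:
        cur = conn.execute(
            "INSERT INTO date_count (collection, date) VALUES (?, ?)",
            (collection, date),
        )
        conn.commit()
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        row = conn.execute(query, (collection, date)).fetchone()
        return row[0], False


# In-memory database with the uniqueness rule on (collection, date).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE date_count ("
    "id INTEGER PRIMARY KEY, collection TEXT, date TEXT, "
    "UNIQUE (collection, date))"
)
first = get_or_create_count(conn, "scl", "2024-05-10")
second = get_or_create_count(conn, "scl", "2024-05-10")
```

Here the first call creates the row and the second call finds it, so both return the same id.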


class LogFile(CommonControlField):
hash = models.CharField(_("Hash MD5"), max_length=32, null=True, blank=True, unique=True)

95 changes: 80 additions & 15 deletions log_manager/tasks.py
@@ -1,6 +1,8 @@
import logging
import os

from django.conf import settings
from django.core.mail import send_mail
from django.contrib.auth import get_user_model
from django.utils.translation import gettext as _

@@ -20,21 +22,20 @@


@celery_app.task(bind=True, name=_('Discover Logs'))
-def task_discover(self, collection_acron2, is_enabled=True, temporal_reference=None, from_date=None, user_id=None, username=None):
+def task_discover(self, collection_acron2, is_enabled=True, days_to_go_back=None, from_date=None, user_id=None, username=None):
    """
    Task to discover logs.

    Parameters:
        collection_acron2 (str): Acronym of the collection.
        is_enabled (boolean)
-        temporal_reference (str, optional): Temporal reference for filtering logs (e.g., 'yesterday', 'last week', 'last month').
+        days_to_go_back (int, optional): Number of days to count backward from the current date (e.g., 1 for yesterday, 7 for a week ago).
        from_date (str, optional): Specific date from which logs should be considered (format: 'YYYY-MM-DD').
        user_id
        username

    Raises:
        UndefinedCollectionConfigError: If there is no configuration for the logs directory.
-        InvalidTemporaReferenceError: If the provided temporal reference is invalid.
        InvalidDateFormatError: If the provided date format is invalid.

    Returns:
@@ -54,11 +55,8 @@ def task_discover(self, collection_acron2, is_enabled=True, temporal_reference=N
    if len(app_config_log_file_formats) == 0:
        raise exceptions.UndefinedApplicationConfigError('ERROR. Please, add a Application Config for each of the supported log file formats.')

-    if temporal_reference:
-        try:
-            obj_from_date = utils.temporal_reference_to_datetime(temporal_reference)
-        except ValueError:
-            raise exceptions.InvalidTemporaReferenceError('ERROR. The supported temporal references are: yesterday, last week, and last month.')
+    if days_to_go_back:
+        obj_from_date = utils.get_date_offset_from_today(days=days_to_go_back)
    elif from_date:
        try:
            obj_from_date = utils.formatted_text_to_datetime(from_date)
@@ -75,7 +73,7 @@ def task_discover(self, collection_acron2, is_enabled=True, temporal_reference=N
file_path = os.path.join(root, name)
file_ctime = utils.timestamp_to_datetime(os.stat(file_path).st_ctime)

-if not (temporal_reference or from_date) or file_ctime > obj_from_date:
+if not (days_to_go_back or from_date) or file_ctime > obj_from_date:
task_create_log_file.apply_async(args=(collection_acron2, file_path, user_id, username))
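
The body of `utils.get_date_offset_from_today` is not part of this diff, so the following is only a guess at its behaviour: a minimal sketch, assuming it returns a `datetime` at midnight of the day that lies `days` days before today, so that it can be compared with `file_ctime`. The name `days_back_to_datetime` and the `today` parameter are hypothetical and exist only to make the example testable.

```python
from datetime import date, datetime, timedelta


def days_back_to_datetime(days, today=None):
    """Return midnight of the day `days` days before `today` (assumed semantics)."""
    today = today or date.today()
    target = today - timedelta(days=days)
    # Combine with 00:00:00 so the result is comparable with other datetimes.
    return datetime.combine(target, datetime.min.time())
```

Note that the task tests `if days_to_go_back:`, so a value of 0 is treated the same as `None` and falls through to `from_date`.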


@@ -134,12 +132,79 @@ def task_validate_log(self, log_file_hash, user_id=None, username=None):

log_file.save()

-# TODO:
-# Create a method that get all log files related to a collection and a period of time (start and end dates)
-# In detail:
-# Look at the LogFileDate table to get all the log_file,date pairs about that collection and dates
-# Look at the CollectionConfig table to get the number of valid log files expected per day
-# Generate a report informing the dates that there are missing files

@celery_app.task(bind=True, name=_('Check Missing Logs for Date'))
def task_check_missing_logs_for_date(self, collection_acron2, date, user_id=None, username=None):
    user = _get_user(self.request, username=username, user_id=user_id)
    collection = models.Collection.objects.get(acron2=collection_acron2)
    n_required_files = models.CollectionConfig.get_number_of_required_files_by_day(collection_acron2=collection_acron2, date=date)
    n_existing_logs = models.LogFileDate.get_number_of_existing_files_for_date(collection_acron2=collection_acron2, date=date)

    models.CollectionLogFileDateCount.create_or_update(
        user=user,
        collection=collection,
        date=date,
        required_log_files=n_required_files,
        existing_log_files=n_existing_logs,
    )


@celery_app.task(bind=True, name=_('Check Missing Logs for Date Range'))
def task_check_missing_logs_for_date_range(self, start_date, end_date, collection_acron2=None, user_id=None, username=None):

Member

@pitangainnovare Use collection_acron2 as a list, not as a string

    acron2_list = [c.acron2 for c in models.Collection.objects.iterator()] if not collection_acron2 else [c.strip() for c in collection_acron2.split(',')]

Member

@pitangainnovare create a property on Collection that returns the list of acron2

    for acron2 in acron2_list:
        for date in utils.date_range(start_date, end_date):
            logging.info(f'CHECKING missings logs for collection {acron2} and date {date}')
            task_check_missing_logs_for_date.apply_async(args=(acron2, date, user_id, username))
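
The two helpers this task relies on can be sketched as follows. The implementation of `utils.date_range` is not shown in this diff, so the inclusive end date below is an assumption, and `parse_acron2_list` is a hypothetical name for the comma-splitting done inline in the task.

```python
from datetime import date, timedelta


def date_range(start, end):
    """Yield each date from start to end, inclusive (assumed behaviour)."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def parse_acron2_list(raw):
    """Split a comma-separated string such as 'scl, arg' into ['scl', 'arg']."""
    # Unlike the inline expression in the task, empty items are dropped here.
    return [item.strip() for item in raw.split(",") if item.strip()]
```

Accepting a real list for `collection_acron2`, as the review suggests, would make the parsing step unnecessary.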


@celery_app.task(bind=True, name=_('Log Files Count Status Report'))
def task_log_files_count_status_report(self, collection_acron2, user_id=None, username=None):
    col = models.Collection.objects.get(acron2=collection_acron2)
    subject = _(f'Log Files Report for {col.main_name}')

    message = _(f'Dear collection {col.main_name},\n\nThis message is to inform you of the results of the Usage Log Validation service.\n\nHere are the results:\n\n')

    missing = models.CollectionLogFileDateCount.objects.filter(status=choices.COLLECTION_LOG_FILE_DATE_COUNT_MISSING_FILES)
    extra = models.CollectionLogFileDateCount.objects.filter(status=choices.COLLECTION_LOG_FILE_DATE_COUNT_EXTRA_FILES)
    ok = models.CollectionLogFileDateCount.objects.filter(status=choices.COLLECTION_LOG_FILE_DATE_COUNT_OK)

Comment on lines +168 to +170

Member

robertatakenaka Jun 12, 2024

@pitangainnovare Instead of these 3 statements, use:

from django.db.models import Count
items = models.CollectionLogFileDateCount.objects.values('status', 'collection').annotate(total=Count('id'))

This will give the result:

>>> {"status": "ok", "total": 10, "collection": "x"}
>>> {"status": "missing", "total": 30, "collection": "x"}
>>> {"status": "extra", "total": 50, "collection": "x"}

    if ok.exists() > 0:

Member

@pitangainnovare doesn't exists return a bool?

Contributor Author

Yes.

        message += _(f'There are {ok.count()} dates with correct log files.\n')

    if missing.exists() > 0:
        message += _(f'There are {missing.count()} missing log files.\n')

    if extra.exists() > 0:
        message += _(f'There are {extra.count()} extra log files.\n')

    if missing.exists() > 0 or extra.exists() > 0:
        message += _(f'Please check the script that shares the logs.\n')

    message += _(f'You can view the complete report results at {settings.WAGTAILADMIN_BASE_URL}/admin/snippets/log_manager/collectionlogfiledatecount/?collection={col.pk}>.')

    task_send_message.apply_async(args=(subject, message, collection_acron2, user_id, username))
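
The grouping proposed in the review can be illustrated without a database. The sketch below uses `collections.Counter` as an ORM-free stand-in for `values(...).annotate(total=Count('id'))`; the function names and the (status, collection) row shape are assumptions made for the example. It also relies on plain truthiness rather than `exists() > 0`. The three queries in the task filter only by status, so the sketch assumes its rows have already been restricted to one collection.

```python
from collections import Counter


def count_by_status(rows):
    """Count rows per status code; each row is a (status, collection) pair."""
    return Counter(status for status, _collection in rows)


def build_summary(counts):
    """Build the plain-text summary from a status -> total mapping."""
    lines = []
    if counts.get("OK"):
        lines.append(f"There are {counts['OK']} dates with correct log files.")
    if counts.get("MIS"):
        lines.append(f"There are {counts['MIS']} dates with missing log files.")
    if counts.get("EXT"):
        lines.append(f"There are {counts['EXT']} dates with extra log files.")
    if counts.get("MIS") or counts.get("EXT"):
        lines.append("Please check the script that shares the logs.")
    return "\n".join(lines)
```

A single grouped count replaces the three separate queries and the repeated `exists()` / `count()` pairs.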


@celery_app.task(bind=True, name=_('Send message'))
def task_send_message(self, subject, message, collection_acron2, user_id=None, username=None):
    col_configs = models.CollectionConfig.filter_by_collection_and_config_type(
        collection_acron2=collection_acron2,
        config_type=choices.COLLECTION_CONFIG_TYPE_EMAIL,
    )
    if col_configs.count() == 0:
        raise exceptions.UndefinedCollectionConfigError("ERROR. Please, add an Application Configuration for the EMAIL attribute.")

Member

@pitangainnovare prepare this to use translatable messages

    recipient_list = [cc.value for cc in col_configs]

    send_mail(
        subject=subject,
        message=message,
        from_email='log_manager@scielo.org',

Member

@pitangainnovare use an environment variable

        recipient_list=recipient_list
    )
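
One way to follow the review request about the sender address is to read it from the environment with a fallback. This is a minimal sketch, not the change made later in the PR: the variable name `LOG_MANAGER_FROM_EMAIL` and the helper `resolve_from_email` are hypothetical, and the default is the address hard-coded in the diff above.

```python
import os

DEFAULT_FROM_EMAIL = "log_manager@scielo.org"  # address hard-coded in the diff


def resolve_from_email(environ=None, var_name="LOG_MANAGER_FROM_EMAIL"):
    """Return the sender address from the environment, else the default."""
    environ = os.environ if environ is None else environ
    # An unset or empty variable falls back to the default address.
    return environ.get(var_name) or DEFAULT_FROM_EMAIL
```

Passing the mapping in as a parameter keeps the helper easy to test without modifying the real process environment.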


@celery_app.task(bind=True, name=_('Parse Logs'), timelimit=-1)
def task_parse_logs(self, collection_acron2, user_id=None, username=None):