fix sorting of None values in NMPVCD.do_persist(). #1321

Merged

ernstleierzopf
Contributor

Make sure these boxes are checked before submitting your Pull Request. Thank you.

Must haves

Fixes #1314

Submission specific

  • This PR introduces breaking changes
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

Describe changes:

    -PersistenceUtil.store_json(self.persistence_file_name, sorted(list(self.known_values_set)))
    +try:
    +    PersistenceUtil.store_json(self.persistence_file_name, sorted(list(self.known_values_set),
    +        key=lambda L: tuple(el if el is not None else b'-' for el in L)))
Contributor

I do not understand why the lambda expression is necessary here. When there is None in the list, sorting will fail.

    >>> x = [b'a', None, b'1']
    >>> sorted(list(x), key=lambda L: tuple(el if el is not None else b'-' for el in L))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 1, in <lambda>
    TypeError: 'NoneType' object is not iterable

Also, replacing None with '-' may be problematic if the value observed in the log lines can itself be '-' (which is not unusual, e.g. in Apache access logs). It is therefore important that None is persisted, not some string replacement.
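The `b'-'` substitution lives only in the sort key, so the persisted tuples still contain None. It does, however, make None and a genuine `b'-'` compare as equal, so their relative order falls back to input order (Python's sort is stable). A minimal sketch of a collision-free key, assuming each known value is a tuple of `bytes` or `None`: every element is wrapped in a `(flag, value)` pair so None always sorts before any bytes value. `dash_key`, `none_first_key` and `sort_known_values` are hypothetical names for illustration, not part of the merged change.

```python
from typing import Iterable, List, Optional, Tuple

ValueTuple = Tuple[Optional[bytes], ...]


def dash_key(values: ValueTuple) -> Tuple[bytes, ...]:
    # The key expression from this PR: None is compared as if it were b'-'.
    return tuple(el if el is not None else b"-" for el in values)


def none_first_key(values: ValueTuple) -> Tuple[Tuple[int, bytes], ...]:
    # Hypothetical alternative: None -> (0, b''), bytes v -> (1, v).
    # None therefore sorts before every bytes value and never ties with b'-'.
    return tuple((0, b"") if el is None else (1, el) for el in values)


def sort_known_values(known_values: Iterable[ValueTuple]) -> List[ValueTuple]:
    # The key only decides the order; the returned tuples still contain None.
    return sorted(known_values, key=none_first_key)


if __name__ == "__main__":
    # With the dash key, None and a logged '-' compare as equal.
    print(dash_key((b"GET", None)) == dash_key((b"GET", b"-")))  # True
    known = {(b"GET", b"-"), (b"GET", None), (b"GET", b"200")}
    print(sort_known_values(known))
    # [(b'GET', None), (b'GET', b'-'), (b'GET', b'200')]
```

Because the two tuples tie under the dash key, their order in the persisted file would depend on set iteration order, which varies with hash randomization; the flagged key makes it deterministic.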

Contributor Author

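Thank you for pointing out that this code needs a comment.
The lambda function is only used as a sort key, so that tuples containing None values can be sorted. Without it, sorting fails as soon as two tuples share their leading elements and one has None where the other has bytes, because Python 3 cannot compare None with bytes.
The data itself is still stored unchanged as JSON (None becomes null).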
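To check that the key changes only the order and not the stored values, here is a small sketch using the values from the REPL test in this comment. It assumes bytes are decoded as UTF-8 before JSON encoding, because the standard `json` module cannot encode `bytes`; how `PersistenceUtil.store_json` actually handles bytes is not shown here, and `dash_key` / `to_jsonable` are hypothetical helpers.

```python
import json
from typing import List, Optional, Tuple

ValueTuple = Tuple[Optional[bytes], ...]


def dash_key(values: ValueTuple) -> Tuple[bytes, ...]:
    # Same expression as the lambda in this PR: None -> b'-' for comparison only.
    return tuple(el if el is not None else b"-" for el in values)


def to_jsonable(rows: List[ValueTuple]) -> list:
    # Assumption: bytes are decoded to str here because json cannot encode bytes;
    # the real PersistenceUtil.store_json may encode them differently.
    return [[el.decode("utf-8") if el is not None else None for el in row] for row in rows]


known_values_set = {(b"ddd ", b"25538"), (b" pid=", b"25537"), (b"ddd ", None)}
ordered = sorted(known_values_set, key=dash_key)

# The key never replaces the data: the None entry is still None after sorting.
assert (b"ddd ", None) in ordered
print(json.dumps(to_jsonable(ordered)))
# [[" pid=", "25537"], ["ddd ", null], ["ddd ", "25538"]]
```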

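known_values_set has a different data structure from the list in your example: it holds tuples of values, so the key function receives a whole tuple rather than a single value. This case is covered in NewMatchPathValueComboDetectorTest.py, lines 168-180.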
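The difference between the two data shapes can be reproduced in a short sketch, assuming known_values_set holds tuples of `bytes` or `None` as in the REPL test in this comment. `dash_key` is just a hypothetical name for the PR's lambda, and the flat list is the example from the review comment.

```python
from typing import Optional, Tuple

ValueTuple = Tuple[Optional[bytes], ...]


def dash_key(values: ValueTuple) -> Tuple[bytes, ...]:
    # The key expression from this PR, applied to one tuple of known values.
    return tuple(el if el is not None else b"-" for el in values)


# Shape used by known_values_set: a set of tuples (one tuple per value combination).
known_values_set = {(b"ddd ", b"25538"), (b" pid=", b"25537"), (b"ddd ", None)}

try:
    # Two tuples tie on b"ddd ", so the next comparison is None < bytes.
    sorted(known_values_set)
except TypeError as exc:
    print("plain sort fails:", exc)

print(sorted(known_values_set, key=dash_key))

# Shape from the review comment: a flat list. The key then receives None itself
# and cannot iterate over it, which raises TypeError while computing the keys.
try:
    sorted([b"a", None, b"1"], key=dash_key)
except TypeError as exc:
    print("flat list fails:", exc)
```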

Here is a simple test:

    Python 3.8.10 (default, Jul 29 2024, 17:02:10)
    [GCC 9.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x = {(b"ddd ", b"25538"), (b" pid=", b"25537"), (b"ddd ", None)}
    >>> sorted(list(x), key=lambda L: tuple(el if el is not None else b'-' for el in L))
    [(b' pid=', b'25537'), (b'ddd ', None), (b'ddd ', b'25538')]

I don't think there is any bug in the code.

Contributor

landauermax left a comment

Looks good to me

landauermax merged commit 687fb1f into ait-aecid:development on Aug 1, 2024
1 check passed
Development

Successfully merging this pull request may close these issues.

NewMatchPathValueComboDetector: Error in persistence when missing values occur
2 participants