Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docstring checker #237

Merged
merged 21 commits into from
Oct 19, 2023
Merged

Docstring checker #237

merged 21 commits into from
Oct 19, 2023

Conversation

pauladkisson
Copy link
Member

Adds test_docstrings.py to tests directory to traverse through all modules, classes, methods, and functions to ensure that they have docstrings (skips dataclass methods).

@pauladkisson pauladkisson changed the base branch from main to pydocstyle September 7, 2023 19:17
@CodyCBakerPhD
Copy link
Member

Here was my implementation and comparison script for reference and comparison (both must be run in the same environment for the module methods to have the same address in memory)

(2) is your implementation
(1) is mine

Inclusion of top-level package __init__, magic methods overridden in custom classes, and custom __dir__ masking are all up for debate ATM

import inspect
import os
import importlib
from pathlib import Path
from types import ModuleType, FunctionType
from typing import List, Iterable

import nwbinspector


def traverse_class(cls, objs):
    """Traverse a class and its methods and append them to objs."""
    predicate = lambda x: inspect.isfunction(x) or inspect.ismethod(x)
    for name, obj in inspect.getmembers(cls, predicate=predicate):
        objs.append(obj)


def traverse_module(module, objs):
    """Traverse all classes and functions in a module and append them to objs."""
    objs.append(module)
    predicate = lambda x: inspect.isclass(x) or inspect.isfunction(x) or inspect.ismethod(x)
    for name, obj in inspect.getmembers(module, predicate=predicate):
        parent_package = obj.__module__.split(".")[0]
        if parent_package != "nwbinspector":  # avoid traversing external dependencies
            continue
        objs.append(obj)
        if inspect.isclass(obj):
            traverse_class(obj, objs)


def traverse_package(package, objs):
    """Traverse all modules and subpackages in a package to append all members to objs."""
    for child in os.listdir(package.__path__[0]):
        if child.startswith(".") or child == "__pycache__":
            continue
        elif child.endswith(".py"):
            module_name = child[:-3]
            module = importlib.import_module(f"{package.__name__}.{module_name}")
            traverse_module(module, objs)
        elif Path(child).is_dir():  # subpackage - I did change this one line b/c error otherwise when hit a .json
            subpackage = importlib.import_module(f"{package.__name__}.{child}")
            traverse_package(subpackage, objs)


def traverse_class_2(class_object: type, parent: str) -> List[FunctionType]:
    """Traverse the class dictionary and return the methods overridden by this module."""
    class_functions = list()
    for attribute_name, attribute_value in class_object.__dict__.items():
        if isinstance(attribute_value, FunctionType) and attribute_value.__module__.startswith(parent):
            class_functions.append(attribute_value)
    return class_functions


def traverse_module_2(module: ModuleType, parent: str) -> Iterable[FunctionType]:
    """Traverse the module directory and return all submodules, classes, and functions defined along the way."""
    local_modules_classes_and_functions = list()

    for name in dir(module):
        if name.startswith("__") and name.endswith("__"):  # skip all magic methods
            continue

        object_ = getattr(module, name)

        if isinstance(object_, ModuleType) and object_.__package__.startswith(parent):
            submodule = object_

            submodule_functions = traverse_module_2(module=submodule, parent=parent)

            local_modules_classes_and_functions.append(submodule)
            local_modules_classes_and_functions.extend(submodule_functions)
        elif isinstance(object_, type) and object_.__module__.startswith(parent):  # class
            class_object = object_

            class_functions = traverse_class_2(class_object=class_object, parent=parent)

            local_modules_classes_and_functions.append(class_object)
            local_modules_classes_and_functions.extend(class_functions)
        elif isinstance(object_, FunctionType) and object_.__module__.startswith(parent):
            function = object_

            local_modules_classes_and_functions.append(function)

    return local_modules_classes_and_functions


list_1 = list()
traverse_package(package=nwbinspector, objs=list_1)

list_2 = traverse_module_2(module=nwbinspector, parent="nwbinspector")

# Analyze and compare - note that for set comparison, the lists must have been run in the same kernel
# to give all imports the same address in memory
unique_list_1 = set(list_1)
unique_list_2 = set(list_2)

found_by_2_and_not_by_1 = unique_list_2 - unique_list_1

# Summary: A series of nested submodules under `checks` and `tools`; some various private functions scattered around
# not really clear why Paul's missed these

found_by_1_and_not_by_2 = unique_list_1 - unique_list_2

# Summary: All of these are bound methods of the Enum's (Importance/Severity) or JSONEncoder
# and are not methods that we actually override in the codebase (they strictly inherit)
# It did, however, find the outermost package __init__ (does that really need a docstring though?)

Base automatically changed from pydocstyle to main September 7, 2023 20:23
@CodyCBakerPhD
Copy link
Member

It occurred to me as per that realization with Heberto on the other PR - when we actually check whether or not the detected method/class/module has a docstring, we actually want to check if any of its parents have docstrings too

@pauladkisson
Copy link
Member Author

It occurred to me as per that realization with Heberto on the other PR - when we actually check whether or not the detected method/class/module has a docstring, we actually want to check if any of its parents have docstrings too

inspect.getdoc actually already takes care of that (see docs)

@CodyCBakerPhD
Copy link
Member

inspect.getdoc actually already takes care of that (see docs)

Very nice!

@pauladkisson
Copy link
Member Author

Ok, since Cody's implementation is a bit cleaner and catches a few missed functions, I think that's what we should proceed with.

I am skipping all of the magic methods for now, maybe we could include them, but it's probably simplest not to.

I did make some minor tweaks to the organization of the functions (separated traverse_module into traverse_package and traverse_module) based on commonly used definitions of packages and modules in python see here.

Once we're satisfied with this test we can remove the compare_docstring_testers.py script and merge it in!

@CodyCBakerPhD
Copy link
Member

The test file is now super clean, readable, and simple!

Next step, can you move it to a 'common_workflows' folder on https://github.com/catalystneuro/.github?

Also there's a merge conflict here now; I'd also remove the comparison script before merge

@pauladkisson
Copy link
Member Author

Next step, can you move it to a 'common_workflows' folder on https://github.com/catalystneuro/.github?
Also there's a merge conflict here now; I'd also remove the comparison script before merge

Done and done!

@CodyCBakerPhD CodyCBakerPhD merged commit 88372b5 into main Oct 19, 2023
13 checks passed
@CodyCBakerPhD CodyCBakerPhD deleted the docstring_checker branch October 19, 2023 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants