Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Link custom type aliases (numeric, etc) in docstrings to their definitions #1693

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kandersolar
Copy link
Member

@kandersolar kandersolar commented Mar 9, 2023

  • Closes [DOC] napoleon type aliases for generic types not enabled #1229
  • I am familiar with the contributing guidelines
  • Tests added
  • Updates entries in docs/sphinx/source/reference for API changes.
  • Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • Pull request is nearly complete and ready for detailed review.
  • Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

Enabling this feature required setting napoleon_preprocess_types = True (sphinx-doc/sphinx#10963) which messes up the rendering for docstrings that are technically malformed. See the rendering of the model parameter in pvlib.irradiance.perez as an example. Leaving this as a draft until I've surveyed the docs build to assess the damage.

@echedey-ls
Copy link
Contributor

echedey-ls commented Mar 13, 2023

TL;DR: I've checked some tools (numpydoc, numpydoc-validation (wrapper of numpydoc) and pydocstyle) to check if any of them can spot these issues. None of them does. I conclude the best chance we have is to modify existing tools (numpydoc exactly) or just revise it.
I also talk about one of my main constraints when doing my first PR, that might be of help.

Intro to the numpydoc tool

(You can skip this part, this is me finding what we can do with what is already invented. Keeping for the sake of explaining a bit of the tools and research process)
Maybe is this numpydoc tool of help?
py -m numpydoc pvlib.irradiance.perez --validate
There's also a tool that was done based on that module to recursively check on docstring objects, pretty nice IMO. To be fair, I'm only checking how the former works.
I did test pydocstyle but that isn't reporting this kind of typos. [After writing edit: Neither is numpydoc, but we can tweak that)

>>> py -m numpydoc pvlib.irradiance.perez --validate
C:\Users\**\pvlib-python\venv\lib\site-packages\numpydoc\docscrape.py:460: UserWarning: potentially wrong underline length...
Returns
-------- in
Determine diffuse irradiance from the sky on a tilted surface using
one of the Perez models.... in the docstring of perez in C:\Users\**\pvlib-python\pvlib\irradiance.py.
  warn(msg)
pvlib.irradiance.perez:GL03:Double line break found; please use only one blank line to separate sections or paragraphs, and do not leave blank lines at the end of docstrings
pvlib.irradiance.perez:SS06:Summary should fit in a single line
pvlib.irradiance.perez:PR01:Parameters {'return_components'} not documented
pvlib.irradiance.perez:PR02:Unknown parameters {'default=False)', 'return_components: bool (optional'}
pvlib.irradiance.perez:PR09:Parameter "surface_tilt" description should finish with "."
pvlib.irradiance.perez:PR08:Parameter "solar_zenith" description should start with a capital letter
pvlib.irradiance.perez:PR09:Parameter "airmass" description should finish with "."
pvlib.irradiance.perez:PR06:Parameter "model" type should use "str" instead of "string"
pvlib.irradiance.perez:PR10:Parameter "return_components" requires a space before the colon separating the parameter name and type
pvlib.irradiance.perez:PR04:Parameter "default=False)" has no type
pvlib.irradiance.perez:SA01:See Also section not found
pvlib.irradiance.perez:EX01:No examples section found

Many of the errors are due to the lack of a whitespace between the parameter/return name and the colon.

My view on the current status of docs

One of the issues that I felt is pretty discouraging when contributing for the first time was writing the docstring. Although you can go to the numpy style, the sphinx guidelines and so on, it's mandatory to build locally (at least) to check it isn't a mess. And when you compare between other functions, there are many inconsistencies, so I did end up doing it based on other docstrings and by trial and error.

I think that enforcing these specific rules or just raising warnings (can be a subset of the rules) with some of these tools would make it very easy to write docstrings.
Btw, this is a broad topic to talk on a PR possibly, but I did not find an issue talking about that. Maybe I should open one or talk in the google groups thread that I will open in shortly.

Some ideas about what/how to improve:

  • Talk about the numpydoc --validate tool in contributing.rst
  • Add CI/CD actions with numpydoc-validation
  • On both cases, the chosen package should be added to the 'doc' extra requires of the setup.py

Solution(s)

After some testing, tries to fix these errors and so on, I found out expressions like the pvlib.irradiance.perez string "bool (optional, default False)" are not checked by the validator.

  1. However, numpydoc already provides a broad and intuitive API to work with the parser, so we can regex over the section "parameters" or "return" values from the NumpyDocString class and/or modify the validate func as we need, in order to find all docstrings problems like this one.

    That could make another good GSoC project I think...
    I'll keep you updated on that, although I begin taking exams, so my availability is decreasing.

  2. Ofc, I think we all can visually inspect the whole docs, but that might be a lot of work put into a small thing like this and is prone to miss the typos. IMO, this is also a good idea, if it is a shared work among everyone.

@kandersolar
Copy link
Member Author

Thanks @echedey-ls for looking into docstring linters in support of this PR! Too bad none of these linters turned out to be directly applicable to this specific PR. Adding custom regexes is an interesting idea, and potentially a good one in the long term, but I doubt we could assemble a comprehensive set of rules that finds all of our errors without doing a manual survey of our docs anyway. So I'd say just inspecting the docs manually is the way to go here, even though it will be tedious.

Btw, this is a broad topic to talk on a PR possibly, but I did not find an issue talking about that. Maybe I should open one

Yes please! I'm interested in the idea of adding a docstring linter to our CI jobs outside the context of this PR.

@adriesse
Copy link
Member

In principle I would prefer to have relatively simple and hard to misunderstand terminology that doesn't require a link to a definition for the majority of users. But if this works, why not.

@kandersolar kandersolar mentioned this pull request Jul 19, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOC] napoleon type aliases for generic types not enabled
3 participants