2021 Standards Review
Original report: https://app.box.com/file/805769808927
The goal of coding standards is to ensure that software is safe, secure, reliable, testable, maintainable and portable. A set of well-documented standards provides guidance to developers on how to write their code, which helps maintain consistency across the codebase and reduces the effort required to understand and maintain individual elements of the code. However, there also need to be checks in place to ensure the standards are being followed, and standards should be reviewed regularly to ensure that they remain appropriate and are being applied consistently.
Over the past five years of development, a number of coding standards have been developed for IDAES; however, most of these have been poorly documented and only informally enforced. A summary of the existing coding standards is as follows:
- Naming conventions for thermophysical properties. This is documented in the IDAES User Guide and covers a large number of thermophysical and reaction-related properties, and additional properties have been added as required. Compliance with these standard names is loosely enforced by the need for interaction between unit models and thermophysical property models.
- Standard module for universal constants. A module containing a number of common thermophysical constants with Pyomo units has been developed and is documented in the IDAES User Guides; however, developer awareness of it is probably limited, and a number of models and tools in the codebase likely still use hard-coded constants.
- All code should comply with the PEP 8 style guide. This is currently strongly encouraged, but not well documented and only informally enforced through manual review of code during pull requests.
- All public classes and modules should have Google-style doc-strings to explain their use and functionality. These doc-strings are used to construct the technical specifications section of the user guide. However, this standard is not documented, and the only check is that the doc-string parses successfully when the documentation is built; this does not check for the presence (or absence), quality or style of doc-strings (other supported doc-string styles would also pass).
- All code should have documentation which is included in the IDAES User Guides. However, this is enforced only through manual review of pull requests, and documentation is missing for many modules. Additionally, the current documentation focuses more on technical specifications than on providing guidance to a new user, and the organization of the documentation can make it difficult to find the documentation for a given tool.
- All code should come with tests which are part of the automatic testing suite, including both simple unit tests (does the code run without errors?) and verification tests (does the code produce the correct answer?). This is currently only enforced through manual review of pull requests supported by automatic reports of test coverage.
- Standard formats for storing data. A number of standard formats for storing specific types of data have been defined, primarily based on the JSON file type. However, these are not well documented and do not cover all types of data required by the IDAES toolset. In a number of cases, data has been stored in an ad hoc fashion by individual developers in a variety of different formats, and a standard form for this data should be developed. Currently defined formats include:
- DMF "resource" (JSON) schema
- Flowsheet JSON export format
- Visualizer JSON import/export formats
- Use of Pyomo CONFIG blocks for defining, documenting and validating arguments to models.
- Paths and configuration. A standard organization for the codebase has been developed which determines where a model or tool belongs; however, this has not been documented and the logic behind the structure may not be immediately apparent in some cases.
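As a sketch of the doc-string standard above, the following shows a Google-style doc-string on a hypothetical helper function (the function name and signature are illustrative, not part of the IDAES API), together with the weakest automated check that could enforce it: a simple presence test. Tools such as pydocstyle can additionally validate style.

```python
import inspect

def heat_duty(flow_mol, cp_mol, delta_t):
    """Compute the sensible heat duty for a single stream.

    Args:
        flow_mol: Molar flow rate (mol/s).
        cp_mol: Molar heat capacity (J/mol/K).
        delta_t: Temperature change across the unit (K).

    Returns:
        Heat duty (W).
    """
    return flow_mol * cp_mol * delta_t

# Minimal automated check: fail if the doc-string is missing entirely.
# This does not validate Google style; a style checker would be needed for that.
assert inspect.getdoc(heat_duty) is not None
```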
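The JSON-based storage formats listed above can be illustrated with a hedged sketch; the field names below are purely illustrative and are not the actual DMF or flowsheet schemas. The key points a standard format should pin down are an explicit schema version (so old files can be detected and migrated) and a stable serialized form (so files diff cleanly under version control).

```python
import json

# Illustrative record only; these field names are NOT the real DMF/flowsheet schemas.
record = {
    "schema_version": "1.0",
    "name": "heater_flowsheet",
    "data": {"heat_duty": 1.5e6, "units": "W"},
}

# sort_keys and a fixed indent give a stable, diff-friendly serialized form.
text = json.dumps(record, indent=2, sort_keys=True)
loaded = json.loads(text)

assert loaded == record                   # round-trip is lossless
assert loaded["schema_version"] == "1.0"  # readers can branch on version
```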
As mentioned previously, there are two key aspects to a successful set of standards: documentation, so that developers can understand what is expected, and enforcement, to ensure that the standards are followed. Currently, IDAES is lacking in both of these aspects.
The first obvious area of improvement is documentation of the standards; without documentation it is difficult or impossible for developers to understand what is required, which all but guarantees that the standards will not be followed. A single, centralized location linking to all standards should be created which developers can use as a go-to point for finding and reviewing standards. GitHub provides a wiki for every repository, so it is proposed to use the idaes-pse wiki (https://github.com/IDAES/idaes-pse/wiki) as the central location for the IDAES standards documentation.
Next, it is important to have mechanisms in place to ensure that all standards are being followed. Currently, IDAES relies on manual checking of all code submitted to the repository via the pull request review process. However, this is completely dependent on the efforts of the reviewers and their knowledge of the standards. Where possible, automated tests should be implemented to enforce standards, such as the following:
- Coding style. Tools exist for automatically checking code for syntax errors and coding style. The IDAES release management team has been working to apply these tools as part of the continuous integration testing procedures. However, this has identified a large number of issues with the current code base (including some false positives), so deployment of these tools will need to be rolled out gradually as the existing issues are fixed.
- Testing of code. Tools exist for checking the test coverage of the code base (based on the number of lines of code hit by tests), and the continuous integration system can be set to fail if a PR would cause the test coverage to decrease significantly. This should be enabled for the IDAES repositories to ensure a gradual improvement in test coverage. Other opportunities for automatic enforcement of standards may exist and should be investigated.
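The coverage gate described above reduces to a simple comparison; the threshold and percentages below are illustrative, not actual CI settings.

```python
def coverage_gate(base_pct, pr_pct, max_drop=0.5):
    """Pass (True) unless the PR drops line coverage by more than max_drop points.

    Assumed inputs: coverage percentages reported by a tool such as Coverage.py
    for the base branch and the PR branch.
    """
    return (base_pct - pr_pct) <= max_drop

assert coverage_gate(82.0, 82.3)      # coverage improved: pass
assert coverage_gate(82.0, 81.7)      # small dip within tolerance: pass
assert not coverage_gate(82.0, 80.0)  # significant drop: fail the check
```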
Standards should be regularly reviewed, both to ensure that they are still suitable for the project (or identify those that need to be updated) and to ensure that they are being applied in the code base. A formal process should be established to do this, with regularly scheduled reviews of the standards.
A number of areas where standards are lacking or absent have also been identified which need to be addressed.
Testing of code is critical for ensuring that the code is safe and reliable to use and for identifying unforeseen side-effects of changes to the code base. However, writing good tests is something of an art form, and documented standards are required to teach new developers how to run the IDAES test suite and how to write good tests for new code. Within IDAES, there are a number of types of tests that are required: functionality tests, which ensure that the code runs cleanly and creates the expected model objects; verification tests, which ensure that the models give the expected results when solved; and robustness tests, which ensure that the models can be applied and solved across a wide range of operating conditions. In all cases, it is necessary to document the expectations for testing of models as well as how to write good tests (and, conversely, how not to write tests).

In terms of model verification, there are currently an insufficient number of tests. Whilst most of the models in the IDAES libraries were verified during development, in many cases this was only done for a limited number of conditions and not formally documented in the form of a test. A standard documenting the minimum expectations for verification tests needs to be developed, which would describe:
- A minimum number of data points to check and the range which these should cover
- Acceptable data sources and how these should be documented in tests
- Verification against commercial codes should be discouraged due to licensing issues. E.g. the Aspen Plus user license prohibits use for benchmarking other codes.
If possible, an automated method for checking verification coverage of models should be developed to help identify areas requiring improvement.
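As a hedged sketch of what such a verification standard might require, the test below checks a simple correlation (the Antoine equation for water, with commonly tabulated coefficients) against several reference points spanning its valid range; in a real test the data source for each point would be cited explicitly, per the standard above.

```python
import math

def antoine_psat(T_degC, A=8.07131, B=1730.63, C=233.426):
    """Water saturation pressure (mmHg) via the Antoine equation, valid ~1-100 degC."""
    return 10 ** (A - B / (T_degC + C))

# (temperature degC, reference Psat mmHg); points span the valid range.
# A real verification test would document the source of each reference value.
REFERENCE_DATA = [(25.0, 23.76), (60.0, 149.4), (100.0, 760.0)]

for T, p_ref in REFERENCE_DATA:
    assert math.isclose(antoine_psat(T), p_ref, rel_tol=0.02), (T, p_ref)
```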
Robustness testing is another area which has been neglected in the testing of IDAES models (at least in a formal sense). This would be partially addressed by verification testing of models across a wide range of conditions, but a more formal process for robustness testing should be implemented. The IDAES toolset already includes a model robustness tool, which should be applied to all models in the libraries as part of the regular testing process.

In order to be useful, tests need to be run regularly so that issues are identified as they arise, which requires documentation on how, when and why different types of tests should be run. As IDAES aims to be a cross-platform tool, this also requires testing on multiple operating systems and versions of Python. In addition, there is a need to test both the latest development version of the tools and the most recent release, to ensure that both remain functional. This results in a large number of different test workflows as part of the IDAES continuous integration checks, each with different parameters, and the logic behind which combinations of parameters are run needs to be documented.

Finally, test coverage and quality should be included as part of the regular standards review process. This should include processes for identifying areas of weak coverage, both in terms of simple line coverage and quality of coverage, and approaches for correcting these deficiencies.
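A robustness sweep of the kind described above can be sketched as follows; the model here is a trivial stand-in (a real sweep would build and solve an actual IDAES model at each point), and the grid of conditions is illustrative.

```python
def toy_model(temperature_K, pressure_Pa):
    """Stand-in for a model solve; raises on nonphysical conditions."""
    if temperature_K <= 0 or pressure_Pa <= 0:
        raise ValueError("nonphysical condition")
    return pressure_Pa / (8.314 * temperature_K)  # ideal-gas molar density

def robustness_sweep(model, temperatures, pressures):
    """Evaluate the model over a grid of conditions; report success rate and failures."""
    failures = []
    for T in temperatures:
        for P in pressures:
            try:
                model(T, P)
            except Exception:
                failures.append((T, P))
    total = len(temperatures) * len(pressures)
    return 1.0 - len(failures) / total, failures

# One deliberately nonphysical temperature to show how failures are reported.
rate, failed = robustness_sweep(toy_model, [-10.0, 300.0, 450.0], [1.0e4, 1.0e5])
assert len(failed) == 2          # both pressures fail at the nonphysical T
assert abs(rate - 4 / 6) < 1e-12
```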
Currently, the IDAES documentation focuses primarily on the deeper technical specification of the tools and models and less on how these would be used in practice. Users have already begun requesting more examples and simple tutorial documentation, and have also said that they find it difficult to find documentation for some tools. Additionally, some tools and models, especially the utility functions, have no documentation at all.
Based on this, it is likely that we need to rethink how we write and organize our documentation, with a greater focus on examples and demonstrations of the tools in use. A useful source of guidance on writing technical documentation is the Diátaxis framework (https://diataxis.fr/). Notably, it distinguishes four types of documentation (tutorials, how-to guides, reference and explanation) which should be kept clearly separated; this closely parallels the experience and feedback we have received so far.
An important consideration here, however, is that writing good documentation (which would involve rewriting much of the existing documentation) is a significant amount of work, which we have likely not budgeted for in our FWPs. However, this work is critical if we want to see widespread adoption of the IDAES tools, and we should begin planning how to improve our documentation within the limits of the available resources.
Now that IDAES is fully open-source and gaining users, it is necessary to be aware of the effects of making changes to the codebase and of the need to maintain some level of backwards compatibility. Making changes that break users' existing workflows and models will generally alienate users, even if such changes occur infrequently. As a result, it is necessary to be much more careful when making updates to the code or adding new features, as once something appears in a release of the tools it is much harder to change. Thus, a set of formal change management standards is required, covering the development of new tools and models, changes to existing features, and expectations of backwards compatibility.
IDAES has not developed a standard for the expectations of backwards compatibility between major and minor version releases. Whilst there has been an informal attempt to remain backwards compatible through the use of deprecation warnings on tools and features that are changing, there has been no formal decision on how long these should remain in place or when deprecated features should be removed. This is especially important given the need to keep examples and documentation current for both the most recent release and the latest development version of the tools (not to mention previous releases). A documented standard for what changes are allowable between major and minor releases should be developed, along with common mitigation procedures. This should include processes for deprecating old features and standards for how long deprecated features must be maintained.
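The deprecation pattern discussed above can be sketched with the standard library; the function names and the retention period mentioned in the comment are hypothetical, not actual IDAES APIs or policy.

```python
import warnings

def scale_value(value):
    """New, preferred API (hypothetical name)."""
    return 1.0 / value

def old_scale_value(value):
    """Deprecated alias; a formal policy might keep it for, e.g., two minor releases."""
    warnings.warn(
        "old_scale_value is deprecated; use scale_value instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return scale_value(value)

# The old name keeps working, but callers are warned to migrate.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_scale_value(4.0)

assert result == 0.25
assert caught and issubclass(caught[0].category, DeprecationWarning)
```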
A standard should be developed for a formal Request for Comment process that should be followed for any new tool and feature, or any significant change to the code base. As part of this process the following issues need to be considered:
- Documented process by which a new change is proposed and how it is approved or rejected by the development team. This can draw on the experience of the Pyomo team, who currently use GitHub issues to propose and discuss changes through “PEP issues”.
- Effects on backwards compatibility. Any change that would affect backwards compatibility needs to be considered, and steps put in place to minimize (or at least justify) this. This could include deprecation warnings and maintaining old APIs/tools or delaying such changes until the release of a new major version of IDAES.
- API and where code should be located within the code base. In the past, the code base and API have grown somewhat organically, which has resulted in some tools that should be in common use and easily accessible being buried deep in the code base (e.g. the generic property framework). Experience has already shown that moving code around after the fact is a significant disruption, so it is important to take the time to decide on the correct location for code before it is implemented.
- The IDAES toolset is also highly interdependent, with many tools needing to work closely together. A frequent example is the need for new models and tools to support the IDAES scaling tools. The formal RFC process should include discussion of all the key interactions with other tools so that these are thought out ahead of time and included in the initial design.
As an open-source project, it is hoped that IDAES will see contributions from external users. However, external contributions come with a number of additional concerns that need to be kept in mind, including:
- Ownership and copyright of the external code. External users may not be aware of the copyright and licensing issues associated with contributing code to an open-source repository. In some cases, an external user may not be aware that they do not own the copyright to the code they are proposing to contribute, which can cause issues in the future if the code needs to be removed. Similarly, care must be taken to ensure that proprietary data does not inadvertently end up in the code base.
- External contributions may have additional copyright notices, funding declarations or disclaimers that must be maintained.
- External contributions may also vary significantly in maturity and contributed code should be clearly separated based on this. This suggests that there should be multiple locations for storing code based on maturity, ranging from low maturity code through to full inclusion in the core code base.
Much of this depends upon the external user proposing the contribution, as it is not possible for the IDAES development team to be aware of all this information. However, there needs to be a process in place to ensure that the external user provides this information and that it is reviewed before the code is accepted. It is also likely that contributed code will not initially meet the IDAES coding standards, so the acceptance process should include a plan for bringing external code up to the IDAES standards before it is accepted.
For 2021, the key areas for improvement will be documentation of standards and improving testing of the code base.
- A wiki page will be developed for organizing and linking to standards documents, which will be publicized to the IDAES Development team.
- Automatic checking of coding style and test coverage will be implemented as part of the IDAES continuous integration checks.
- Test coverage of the IDAES Code base will be increased, starting with those areas with low test coverage (the surrogates and apps folders). This is an end-of-year milestone in the IDAES Core FWP.
- Verification and robustness testing of the generic model library will be increased. A formal plan for how to proceed with this will be developed and implemented by the end of the year.