Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Develop #83

Merged
merged 16 commits into from
Jun 20, 2024
Merged

Develop #83

merged 16 commits into from
Jun 20, 2024

Conversation

kavanase
Copy link
Member

No description provided.

Copy link

coderabbitai bot commented Jun 19, 2024

Walkthrough

This update focuses on refining and expanding the doped library, mainly through improvements in parameter naming, clarification of function documentation, and enhancing code functionality. Key highlights include changes in band gap reference handling, modifications in the parameter structures for certain functions, and introduction of new functions for better defect and thermodynamic analysis. Moreover, updates were made to research paper citations and their context within the documentation.

Changes

File / Path Change Summary
README.md Updated research paper title and publication venue.
docs/index.rst Changed research paper title citation.
doped/analysis.py Updated parameter from bulk_band_gap_path to bulk_band_gap_vr in function.
doped/core.py Clarified usage of VBM eigenvalue in functions formation_energy and equilibrium_concentration.
doped/generation.py Modified supercell_gen_kwargs parameter initialization and handling.
doped/thermodynamics.py Various updates to vbm parameter handling, addition of calculate_md5 function.
doped/utils/symmetry.py Refactored logic for lattice mappings and transformations.
tests/data/CsPbCl3_supercell_POSCAR Added new file for atomic coordinates in supercell structure.
tests/test_analysis.py Reflected parameter changes and updated assertions in test functions.
tests/test_generation.py Added new structure initialization and test cases for supercell generation.
tests/test_thermodynamics.py Updated imports, method calls, and assertions for DOS calculations.

Poem

🐇
In the depths of code, I find delight,
With band gaps clear and parameters bright.
Supercells dance, atoms in a row,
Thermoelectric dreams in an ordered flow.
Pure symmetry, with matrixes so grand,
A rabbit's joy in this code-filled land. 🌟

Warning

Review ran into problems

Problems (1)
  • Git: Failed to clone repository. Please contact CodeRabbit support.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Outside diff range and nitpick comments (8)
tests/test_thermodynamics.py (2)

23-23: Consider organizing imports according to PEP 8 standards: standard library imports, third-party imports, and local application/library specific imports should be grouped separately.


Line range hint 2099-2120: The loop iterating over bulk_dos variants should explicitly handle different input types (string vs. DOS object) to avoid runtime errors. Additionally, the assertion for the mean of quenched Fermi levels should have a clear comment explaining its purpose.

for i, bulk_dos in enumerate([k10_dos_vr_path, get_vasprun(k10_dos_vr_path, parse_dos=True)]):
    print(f"Testing k10 DOS with thermo for {'str input' if i == 0 else 'DOS object input'}")
    quenched_fermi_levels = []
    for anneal_temp in self.reduced_anneal_temperatures:
        gap_shift = belas_linear_fit(anneal_temp) - 1.5
        try:
            (
                fermi_level,
                e_conc,
                h_conc,
                conc_df,
            ) = self.defect_thermo.get_quenched_fermi_level_and_concentrations(
                bulk_dos=bulk_dos,
                limit="Te-rich",
                annealing_temperature=anneal_temp,
                delta_gap=gap_shift,
            )
            quenched_fermi_levels.append(fermi_level)
        except Exception as e:
            print(f"Error processing bulk_dos: {str(e)}")
            continue

    # Ensure the Fermi levels are within expected bounds
    # This checks the consistency of Fermi level calculations across different DOS calculations.
    assert np.isclose(np.mean(quenched_fermi_levels[6:8]), 0.31825, atol=1e-3), "Fermi levels out of expected range"
doped/core.py (1)

Line range hint 1-3000: Optimize the handling of exceptions and error messages.

Throughout the file, there are multiple instances where exceptions are caught broadly (e.g., except Exception). This practice can obscure the source of errors and make debugging more difficult. Refine the exception handling by catching more specific exceptions. Additionally, enhance the error messages to provide more context about the error, helping in quicker resolution and better maintainability.

tests/test_analysis.py (5)

228-230: Consider adding docstrings to the setUp method to explain its purpose and the resources it sets up for the tests.


Line range hint 232-236: The tearDown method should include error handling to ensure that file operations do not cause the test to fail unexpectedly. Use try-except blocks around file operations.

- shutil.move(
+ try:
+     shutil.move(
- f"{self.v_Cd_m2_path}/OUTCAR.gz", f"{self.v_Cd_m2_path}/hidden_otcr.gz")  # use FNV
+ f"{self.v_Cd_m2_path}/OUTCAR.gz", f"{self.v_Cd_m2_path}/hidden_otcr.gz")  # use FNV
+ except Exception as e:
+     print(f"Error during file operation: {e}")

Line range hint 238-242: The test method test_parsing_cdte lacks a docstring. Adding a brief description of what the test aims to verify would improve code readability and maintainability.


Line range hint 244-252: The method test_kumagai_order is testing the robustness of the Kumagai defect correction against different orderings of atoms in the input files. However, there's no assertion here to verify the outcome of the test. Consider adding assertions to ensure that the parsed energies or corrections meet expected values.

+ # Example assertion to verify the energy correction
+ expected_correction = <expected_value>
+ assert np.isclose(parsed_v_cd_m2_orig.corrections['kumagai_charge_correction'], expected_correction, atol=1e-3)

Line range hint 254-258: The method test_freysoldt_order is testing the Freysoldt defect correction parser. Similar to the previous method, it lacks assertions to verify the correctness of the outcomes. It's crucial to add assertions to confirm that the corrections are as expected.

+ # Example assertion to verify the energy correction
+ expected_correction = <expected_value>
+ assert np.isclose(parsed_v_cd_m2_orig.corrections['freysoldt_charge_correction'], expected_correction, atol=1e-3)
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between eaa077c and 015cb5f.

Files ignored due to path filters (2)
  • examples/CdTe/CdTe_prim_k181818_NKRED_2_vasprun.xml.gz is excluded by !**/*.gz
  • tests/data/remote_baseline_plots/CdTe_LZ_Te_rich_concentrations.png is excluded by !**/*.png
Files selected for processing (11)
  • README.md (1 hunks)
  • docs/index.rst (1 hunks)
  • doped/analysis.py (16 hunks)
  • doped/core.py (2 hunks)
  • doped/generation.py (4 hunks)
  • doped/thermodynamics.py (31 hunks)
  • doped/utils/symmetry.py (3 hunks)
  • tests/data/CsPbCl3_supercell_POSCAR (1 hunks)
  • tests/test_analysis.py (8 hunks)
  • tests/test_generation.py (5 hunks)
  • tests/test_thermodynamics.py (6 hunks)
Files skipped from review due to trivial changes (1)
  • docs/index.rst
Additional context used
LanguageTool
README.md

[style] ~8-~8: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’. (HYPHENATED_LY_ADVERB_ADJECTIVE)
Context: ...ducible, user-friendly yet powerful and fully-customisable manner. Tutorials showing the code fun...


[style] ~15-~15: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’. (HYPHENATED_LY_ADVERB_ADJECTIVE)
Context: ...ures All features and functionality are fully-customisable: - Supercell Generation: Generate a...


[style] ~23-~23: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’. (HYPHENATED_LY_ADVERB_ADJECTIVE)
Context: ...ections, etc. - Python Interface: Fully-customisable, modular Python API. Plug-and-play w/...


[uncategorized] ~54-~54: If this is a compound adjective that modifies the following noun, use a hyphen. (EN_COMPOUND_ADJECTIVE_INTERNAL)
Context: ...i.org/10.21105/joss.06433). Journal of Open Source Software 9 (96), 6433, 2024 ## `S...

Additional comments not posted (16)
tests/data/CsPbCl3_supercell_POSCAR (1)

1-143: The POSCAR file format appears correct with proper lattice vectors and atomic positions specified.

README.md (1)

71-71: The updated title and publication venue for W. Dou et al.'s research paper are correctly formatted and match the provided summary.

doped/utils/symmetry.py (1)

Line range hint 304-326: Refine the lattice matrix sorting logic in _rotate_and_get_supercell_matrix.
[REFACTOR_SUGGESTion]

-                possible_mappings,
-                key=lambda x: (
-                    _lattice_matrix_sorting_func(x[1].T @ x[2]),
-                    _lattice_matrix_sorting_func(x[2]),
-                    _lattice_matrix_sorting_func(x[1]),
-                ),
+                possible_mappings, key=lambda x: (
+                    _lattice_matrix_sorting_func(x[1].T @ x[2]),  # R*S
+                    _lattice_matrix_sorting_func(x[2]),  # S
+                    _lattice_matrix_sorting_func(x[1]),  # R
+                ),

The sorting function for lattice matrix mappings has been refined. The refactor makes the sorting criteria clearer by explicitly commenting on what each part of the tuple represents (R*S, S, R). This enhances readability and maintainability.

doped/core.py (1)

Line range hint 1-3000: Review the implementation of defect-specific classes.

The Vacancy, Substitution, and Interstitial classes are well-implemented, providing specific functionalities for different types of defects. The use of inheritance and overriding of methods is correctly applied, ensuring that each defect type behaves as expected while still leveraging shared functionalities from the Defect class.

doped/generation.py (3)

1204-1212: The default settings for supercell_gen_kwargs are well-defined and clear.

However, consider documenting these default values in the class docstring for better visibility to users.


1268-1270: The unpacking of supercell_gen_kwargs using ** directly in the function call is efficient.

This approach keeps the function call clean and leverages Python's unpacking feature effectively.


1277-1277: Check the logical condition for structure size comparison.

#!/bin/bash
# Description: Verify the logic that checks if the input structure size meets the requirements.

# Test: Search for the function usage and ensure the logic is correctly implemented across the codebase.
rg --type python $'input_min_image_distance >= specified_min_image_distance and (primitive_structure * supercell_matrix).num_sites >= self.structure.num_sites'

Ensure that the comparison logic aligns with the intended functionality and that there are no edge cases overlooked.

doped/analysis.py (3)

373-373: Update parameter to accept both string and Vasprun object types for flexibility.

This change enhances the flexibility of the defect_entry_from_paths function by allowing the bulk_band_gap_vr parameter to accept either a string path or a Vasprun object directly. This should simplify the function's usage in different contexts where either the path or the object might already be available, reducing the need for redundant loading operations.

Also applies to: 414-426


465-465: Ensure consistent handling of bulk_band_gap_vr across different classes.

The consistent update across different initializers in the DefectsParser class to support the bulk_band_gap_vr parameter is a good practice. This ensures that wherever the class is instantiated, there is flexibility in specifying the bulk band gap data, which is crucial for accurate defect analysis.

Also applies to: 533-551


1181-1182: Ensure dynamic descriptions in progress bars during multiprocessing.

The dynamic setting of descriptions in the progress bars during multiprocessing parsing provides better visibility into the current state of processing, which is particularly useful in debugging or monitoring long-running parsing operations.

doped/thermodynamics.py (1)

452-461: Clarify the usage of the vbm parameter in the DefectThermodynamics class constructor.
[REFACTOR_Suggestion]

                VBM eigenvalue to use as Fermi level reference point for analysis.
                If None (default), will use ``"vbm"`` from the ``calculation_metadata``
                dict attributes of the parsed ``DefectEntry`` objects, which by default
                is taken from the bulk supercell VBM (unless ``bulk_band_gap_vr`` is set
                during defect parsing).
                Note that ``vbm`` should only affect the reference for the Fermi level
                values output by ``doped`` (as this VBM eigenvalue is used as the zero
                reference), thus affecting the position of the band edges in the defect
                formation energy plots and doping window / dopability limit functions,
                and the reference of the reported Fermi levels.
tests/test_generation.py (5)

694-695: Ensure that the file path is correctly set and consider adding error handling for file loading.


1472-1472: Addition of interstitial coordinates appears consistent and correct.


1587-1587: Consistent addition of interstitial coordinates.


3273-3284: The testing for defect generation is well-implemented. Consider clarifying the comment about adjusting differences to make outputs equal.


3332-3385: Robust testing for handling unrecognized keyword arguments. Good use of specific error messages to ensure clarity in test failures.
[APROVED]

@@ -1048,7 +1048,7 @@ def __init__(
interstitial_coords: Optional[list] = None,
generate_supercell: bool = True,
charge_state_gen_kwargs: Optional[dict] = None,
supercell_gen_kwargs: Optional[dict] = None,
supercell_gen_kwargs: Optional[dict[str, Union[int, float, bool]]] = None,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure proper type hinting for the supercell_gen_kwargs parameter.

- supercell_gen_kwargs: Optional[dict[str, Union[int, float, bool]]] = None,
+ supercell_gen_kwargs: Optional[Dict[str, Union[int, float, bool]]] = None,

Please import Dict from typing to maintain consistency and clarity in type annotations.

Committable suggestion was skipped due to low confidence.

Comment on lines +923 to +930
VBM eigenvalue to use as Fermi level reference point for calculating
formation energy. If ``None`` (default), will use ``"vbm"`` from the
``calculation_metadata`` dict attribute if present -- which corresponds
to the VBM of the `bulk supercell` calculation by default, unless
``bulk_band_gap_vr`` is set during defect parsing).
fermi_level (float):
Value corresponding to the electron chemical potential,
referenced to the VBM. Default is 0 (i.e. the VBM).
Value corresponding to the electron chemical potential, referenced
to the VBM eigenvalue. Default is 0 (i.e. the VBM).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the documentation to clarify the new behavior of VBM and Fermi level calculation.

The documentation for the formation_energy method has been updated to reflect changes in handling the VBM eigenvalue as the reference point for calculating formation energy. Ensure that the new behavior is clearly documented, especially the conditions under which the VBM eigenvalue is used or not used.


Ensure consistency in the handling of VBM across different methods.

The handling of the VBM eigenvalue as a reference point for calculations is crucial for accurate defect analysis. It appears in both the formation_energy and equilibrium_concentration methods. Consistency in its usage across these methods is essential for the integrity of the calculations. Consider refactoring to centralize the handling of VBM to ensure consistency and reduce potential errors.

Also applies to: 1113-1117

Comment on lines +1113 to +1117
VBM eigenvalue to use as Fermi level reference point for calculating
the formation energy. If ``None`` (default), will use ``"vbm"`` from the
``calculation_metadata`` dict attribute if present -- which corresponds
to the VBM of the `bulk supercell` calculation by default, unless
``bulk_band_gap_vr`` is set during defect parsing).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify the VBM handling in the equilibrium_concentration method.

Similar to the formation_energy method, the equilibrium_concentration method documentation has been updated to reflect changes in handling the VBM eigenvalue. It's crucial to ensure that these changes are accurately described and that the conditions under which the VBM eigenvalue is used or not used are clear.

Comment on lines 359 to 370
transformation_matrix = target_struct.lattice.matrix @ np.linalg.inv(prim_struct.lattice.matrix)
if not np.allclose(np.rint(transformation_matrix), transformation_matrix, atol=1e-3):
raise ValueError # if non-integer transformation matrix

return prim_struct, transformation_matrix
return prim_struct, np.rint(transformation_matrix)

except ValueError: # if non-integer transformation matrix
prim_struct, transformation_matrix = _rotate_and_get_supercell_matrix(prim_struct, target_struct)

if np.allclose(np.rint(transformation_matrix), transformation_matrix, atol=1e-3):
return prim_struct, np.rint(transformation_matrix)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check for non-integer transformation matrices in _get_supercell_matrix_and_possibly_rotate_prim.

-        transformation_matrix = target_struct.lattice.matrix @ np.linalg.inv(prim_struct.lattice.matrix)
-        if not np.allclose(np.rint(transformation_matrix), transformation_matrix, atol=1e-3):
-            raise ValueError  # if non-integer transformation matrix
+        transformation_matrix = np.dot(target_struct.lattice.matrix, np.linalg.inv(prim_struct.lattice.matrix))
+        if not np.all(np.isclose(transformation_matrix, np.rint(transformation_matrix), atol=1e-3)):
+            raise ValueError("Non-integer transformation matrix detected.")

The matrix multiplication has been changed from @ to np.dot for consistency with other numpy operations in the codebase. Additionally, the condition to check for non-integer values now explicitly uses np.all with np.isclose to ensure that all elements meet the criterion, enhancing the robustness of the check. The error raised now includes a message to clarify the reason for the exception.

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
transformation_matrix = target_struct.lattice.matrix @ np.linalg.inv(prim_struct.lattice.matrix)
if not np.allclose(np.rint(transformation_matrix), transformation_matrix, atol=1e-3):
raise ValueError # if non-integer transformation matrix
return prim_struct, transformation_matrix
return prim_struct, np.rint(transformation_matrix)
except ValueError: # if non-integer transformation matrix
prim_struct, transformation_matrix = _rotate_and_get_supercell_matrix(prim_struct, target_struct)
if np.allclose(np.rint(transformation_matrix), transformation_matrix, atol=1e-3):
return prim_struct, np.rint(transformation_matrix)
transformation_matrix = np.dot(target_struct.lattice.matrix, np.linalg.inv(prim_struct.lattice.matrix))
if not np.all(np.isclose(transformation_matrix, np.rint(transformation_matrix), atol=1e-3)):
raise ValueError("Non-integer transformation matrix detected.")
return prim_struct, np.rint(transformation_matrix)
except ValueError: # if non-integer transformation matrix
prim_struct, transformation_matrix = _rotate_and_get_supercell_matrix(prim_struct, target_struct)
if np.allclose(np.rint(transformation_matrix), transformation_matrix, atol=1e-3):
return prim_struct, np.rint(transformation_matrix)

Comment on lines +1959 to +1963
cls.fermi_dos = get_fermi_dos(
os.path.join(cls.CdTe_EXAMPLE_DIR, "CdTe_prim_k181818_NKRED_2_vasprun.xml.gz")
)
cls.anneal_temperatures = np.arange(200, 1401, 50)
cls.reduced_anneal_temperatures = np.arange(200, 1401, 100) # for quicker testing
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fermi_dos attribute initialization should handle potential exceptions if the file path or file format is incorrect. Consider adding error handling around these operations.

- cls.fermi_dos = get_fermi_dos(
+ try:
+     cls.fermi_dos = get_fermi_dos(
          os.path.join(cls.CdTe_EXAMPLE_DIR, "CdTe_prim_k181818_NKRED_2_vasprun.xml.gz")
+     )
+ except Exception as e:
+     cls.fermi_dos = None
+     print(f"Failed to load fermi DOS: {str(e)}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cls.fermi_dos = get_fermi_dos(
os.path.join(cls.CdTe_EXAMPLE_DIR, "CdTe_prim_k181818_NKRED_2_vasprun.xml.gz")
)
cls.anneal_temperatures = np.arange(200, 1401, 50)
cls.reduced_anneal_temperatures = np.arange(200, 1401, 100) # for quicker testing
try:
cls.fermi_dos = get_fermi_dos(
os.path.join(cls.CdTe_EXAMPLE_DIR, "CdTe_prim_k181818_NKRED_2_vasprun.xml.gz")
)
except Exception as e:
cls.fermi_dos = None
print(f"Failed to load fermi DOS: {str(e)}")
cls.anneal_temperatures = np.arange(200, 1401, 50)
cls.reduced_anneal_temperatures = np.arange(200, 1401, 100) # for quicker testing

Comment on lines +598 to +601
if bulk_band_gap_vr and not isinstance(bulk_band_gap_vr, Vasprun):
self.bulk_band_gap_vr = get_vasprun(bulk_band_gap_vr, parse_projected_eigen=False)
else:
self.bulk_band_gap_vr = bulk_band_gap_vr
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Properly handle the type of bulk_band_gap_vr when setting the attribute.

- if bulk_band_gap_vr and not isinstance(bulk_band_gap_vr, Vasprun):
-     self.bulk_band_gap_vr = get_vasprun(bulk_band_gap_vr, parse_projected_eigen=False)
- else:
-     self.bulk_band_gap_vr = bulk_band_gap_vr
+ self.bulk_band_gap_vr = get_vasprun(bulk_band_gap_vr, parse_projected_eigen=False) if isinstance(bulk_band_gap_vr, str) else bulk_band_gap_vr

The handling of bulk_band_gap_vr should be streamlined to avoid unnecessary checks and to ensure that the Vasprun object is correctly handled regardless of its initial form. The suggested change simplifies the logic, making it more readable and less prone to errors.

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if bulk_band_gap_vr and not isinstance(bulk_band_gap_vr, Vasprun):
self.bulk_band_gap_vr = get_vasprun(bulk_band_gap_vr, parse_projected_eigen=False)
else:
self.bulk_band_gap_vr = bulk_band_gap_vr
self.bulk_band_gap_vr = get_vasprun(bulk_band_gap_vr, parse_projected_eigen=False) if isinstance(bulk_band_gap_vr, str) else bulk_band_gap_vr

Comment on lines 1350 to 1384
def _parse_fermi_dos(
self, bulk_dos: Union[str, Vasprun, FermiDos], skip_check: bool = False
) -> FermiDos:
if isinstance(bulk_dos, FermiDos):
fdos = bulk_dos
# most similar settings to Vasprun.eigenvalue_band_properties:
fdos_vbm = fdos.get_cbm_vbm(tol=1e-4, abs_tol=True)[1] # tol 1e-4 is lowest possible, as VASP
fdos_band_gap = fdos.get_gap(tol=1e-4, abs_tol=True) # rounds the DOS outputs to 4 dp

if isinstance(bulk_dos, str):
bulk_dos = get_vasprun(bulk_dos, parse_dos=True) # converted to fdos in next block

if isinstance(bulk_dos_vr, str):
bulk_dos_vr = get_vasprun(bulk_dos_vr, parse_dos=True)
if isinstance(bulk_dos, Vasprun): # either supplied Vasprun or parsed from string there
fdos_band_gap, _cbm, fdos_vbm, _ = bulk_dos.eigenvalue_band_properties
fdos = get_fermi_dos(bulk_dos)

fermi_dos_band_gap, _cbm, fermi_dos_vbm, _ = bulk_dos_vr.eigenvalue_band_properties
if abs(fermi_dos_vbm - self.vbm) > 0.1:
if abs(fdos_vbm - self.vbm) > 0.05 and not skip_check:
warnings.warn(
f"The VBM eigenvalue of the bulk DOS calculation ({fermi_dos_vbm:.2f} eV, with band "
f"gap of {fermi_dos_band_gap:.2f} eV) differs from that of the bulk supercell "
f"calculations ({self.vbm:.2f} eV, with band gap of {self.band_gap:.2f} eV) by more "
f"than 0.1 eV. If this is only due to slight differences in kpoint sampling for the "
f"bulk DOS vs defect supercell calculations, and consistent functional settings "
f"(LHFCALC, AEXX etc) were used, then the eigenvalue references should be consistent "
f"and this warning can be ignored. If not, then this could lead to inaccuracies in "
f"the predicted Fermi level. Note that the Fermi level will be referenced to the VBM "
f"of the bulk supercell (i.e. DefectThermodynamics.vbm)"
f"The VBM eigenvalue of the bulk DOS calculation ({fdos_vbm:.2f} eV, band gap = "
f"{fdos_band_gap:.2f} eV) differs by >0.05 eV from `DefectThermodynamics.vbm/gap` "
f"({self.vbm:.2f} eV, band gap = {self.band_gap:.2f} eV; which are taken from the bulk "
f"supercell calculation by default, unless `bulk_band_gap_vr` is set during defect "
f"parsing). If this is only due to differences in kpoint sampling for the bulk DOS vs "
f"supercell calculations, then you should use the `bulk_band_gap_vr` option during "
f"defect parsing to set the bulk band gap and VBM eigenvalue "
f"(`DefectThermodynamics.gap/vbm`) to the correct values (though the absolute values of "
f"predictions should not be affected as the eigenvalue references in the calculations "
f"are consistent, just the reported Fermi levels will be referenced to "
f"`DefectThermodynamics.vbm` which may not be the exact VBM position here).\n"
f"Otherwise if this is due to changes in functional settings (LHFCALC, AEXX etc), then "
f"the calculations should be redone with consistent settings to ensure accurate "
f"predictions.\n"
f"Note that the Fermi level will be always referenced to `DefectThermodynamics.vbm`!"
)
return FermiDos(bulk_dos_vr.complete_dos, nelecs=get_nelect_from_vasprun(bulk_dos_vr))
return fdos
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider consolidating the conditions for checking the instance types of bulk_dos.

def _parse_fermi_dos(
    self, bulk_dos: Union[str, Vasprun, FermiDos], skip_check: bool = False
) -> FermiDos:
    if isinstance(bulk_dos, FermiDos):
        fdos = bulk_dos
        # most similar settings to Vasprun.eigenvalue_band_properties:
        fdos_vbm = fdos.get_cbm_vbm(tol=1e-4, abs_tol=True)[1]  # tol 1e-4 is lowest possible, as VASP
        fdos_band_gap = fdos.get_gap(tol=1e-4, abs_tol=True)  # rounds the DOS outputs to 4 dp
    if isinstance(bulk_dos, str):
        bulk_dos = get_vasprun(bulk_dos, parse_dos=True)  # converted to fdos in next block
    if isinstance(bulk_dos, Vasprun):  # either supplied Vasprun or parsed from string there
        fdos_band_gap, _cbm, fdos_vbm, _ = bulk_dos.eigenvalue_band_properties
        fdos = get_fermi_dos(bulk_dos)
    if abs(fdos_vbm - self.vbm) > 0.05 and not skip_check:
        warnings.warn(
            f"The VBM eigenvalue of the bulk DOS calculation ({fdos_vbm:.2f} eV, band gap = "
            f"{fdos_band_gap:.2f} eV) differs by >0.05 eV from `DefectThermodynamics.vbm/gap` "
            f"({self.vbm:.2f} eV, band gap = {self.band_gap:.2f} eV; which are taken from the bulk "
            f"supercell calculation by default, unless `bulk_band_gap_vr` is set during defect "
            f"parsing). If this is only due to differences in kpoint sampling for the bulk DOS vs "
            f"supercell calculations, then you should use the `bulk_band_gap_vr` option during "
            f"defect parsing to set the bulk band gap and VBM eigenvalue "
            f"(`DefectThermodynamics.gap/vbm`) to the correct values (though the absolute values of "
            f"predictions should not be affected as the eigenvalue references in the calculations "
            f"are consistent, just the reported Fermi levels will be referenced to "
            f"`DefectThermodynamics.vbm` which may not be the exact VBM position here).\n"
            f"Otherwise if this is due to changes in functional settings (LHFCALC, AEXX etc), then "
            f"the calculations should be redone with consistent settings to ensure accurate "
            f"predictions.\n"
            f"Note that the Fermi level will be always referenced to `DefectThermodynamics.vbm`!"
        )
    return fdos

Committable suggestion was skipped due to low confidence.

Comment on lines 55 to 69
def calculate_md5(file_path):
"""
Calculate the MD5 hash of a file.

Used for checking if the pimitive cell DOS ``vasprun.xml(.gz)``
file has already been parsed, to avoid wasting time re-parsing.

https://www.geeksforgeeks.org/finding-md5-of-files-recursively-in-directory-in-python/
"""
hasher = hashlib.md5()
with open(file_path, "rb") as f:
for chunk in iter(lambda: f.read(4096), b""):
hasher.update(chunk)
return hasher.hexdigest()

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a more detailed docstring to calculate_md5.

def calculate_md5(file_path):
    """
    Calculate the MD5 hash of a file.
+   This function is useful for verifying the integrity of files by comparing their hashes.
+   It reads the file in chunks to handle large files efficiently.
    Used for checking if the primitive cell DOS ``vasprun.xml(.gz)``
    file has already been parsed, to avoid wasting time re-parsing.
    https://www.geeksforgeeks.org/finding-md5-of-files-recursively-in-directory-in-python/
    """
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def calculate_md5(file_path):
"""
Calculate the MD5 hash of a file.
Used for checking if the pimitive cell DOS ``vasprun.xml(.gz)``
file has already been parsed, to avoid wasting time re-parsing.
https://www.geeksforgeeks.org/finding-md5-of-files-recursively-in-directory-in-python/
"""
hasher = hashlib.md5()
with open(file_path, "rb") as f:
for chunk in iter(lambda: f.read(4096), b""):
hasher.update(chunk)
return hasher.hexdigest()
def calculate_md5(file_path):
"""
Calculate the MD5 hash of a file.
This function is useful for verifying the integrity of files by comparing their hashes.
It reads the file in chunks to handle large files efficiently.
Used for checking if the primitive cell DOS ``vasprun.xml(.gz)``
file has already been parsed, to avoid wasting time re-parsing.
https://www.geeksforgeeks.org/finding-md5-of-files-recursively-in-directory-in-python/
"""

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Outside diff range and nitpick comments (3)
doped/thermodynamics.py (3)

Line range hint 1371-1475: Refactor the get_equilibrium_fermi_level method to reduce complexity.

The get_equilibrium_fermi_level method is quite complex and handles multiple scenarios and configurations. Consider breaking this method into smaller, more focused methods to improve readability and maintainability. For example, handling different types of bulk_dos inputs could be separated into distinct methods or functions.


Line range hint 1527-1868: Enhance documentation for complex methods.

The method get_quenched_fermi_level_and_concentrations is complex and multifunctional, which is great for flexibility but can be challenging to understand quickly. Enhancing the documentation with examples or more detailed explanations of the parameters and their interactions could greatly aid users in utilizing this method effectively.


3121-3130: Standardize documentation format.

The documentation for the get_fermi_dos function should follow the same formatting as other docstrings in the file for consistency. This includes a more detailed parameter description and potentially a small example or a more detailed return description.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 015cb5f and edbed6d.

Files ignored due to path filters (2)
  • tests/data/remote_baseline_plots/CdTe_LZ_Te_rich_Fermi_levels.png is excluded by !**/*.png
  • tests/data/remote_baseline_plots/CdTe_LZ_Te_rich_concentrations.png is excluded by !**/*.png
Files selected for processing (6)
  • CHANGELOG.rst (1 hunks)
  • docs/conf.py (1 hunks)
  • doped/thermodynamics.py (29 hunks)
  • doped/utils/symmetry.py (3 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_generation.py (12 hunks)
Files skipped from review due to trivial changes (2)
  • docs/conf.py
  • pyproject.toml
Files skipped from review as they are similar to previous changes (2)
  • doped/utils/symmetry.py
  • tests/test_generation.py
Additional comments not posted (3)
CHANGELOG.rst (3)

8-8: Ensure the description clearly explains the improvements made to the reverse-supercell-matrix determination.


6-7: Ensure the description accurately reflects the changes discussed in the linked pull request.

#!/bin/bash
# Description: Verify the accuracy of the link and the description of the changes.

# Test: Check if the link is reachable and contains relevant content.
curl -o /dev/null --silent --head --fail https://github.com/materialsproject/pymatgen/pull/3879

9-11: Confirm the refactor from bulk_band_gap_path to bulk_band_gap_vr is consistently applied across the codebase.

#!/bin/bash
# Description: Verify the consistency of the refactor across the codebase.

# Test: Search for any remaining instances of the old parameter name.
rg --type rst --type py 'bulk_band_gap_path'

Comment on lines +435 to +444
VBM eigenvalue to use as Fermi level reference point for analysis.
If None (default), will use ``"vbm"`` from the ``calculation_metadata``
dict attributes of the parsed ``DefectEntry`` objects, which by default
is taken from the bulk supercell VBM (unless ``bulk_band_gap_vr`` is set
during defect parsing).
Note that ``vbm`` should only affect the reference for the Fermi level
values output by ``doped`` (as this VBM eigenvalue is used as the zero
reference), thus affecting the position of the band edges in the defect
formation energy plots and doping window / dopability limit functions,
and the reference of the reported Fermi levels.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify the description of the vbm parameter in the __init__ method.

The description for the vbm parameter in the __init__ method is slightly misleading. It mentions that the vbm is used as a reference for the Fermi level, which might confuse users into thinking it alters the Fermi level directly. It would be clearer to state that it adjusts the reference point for Fermi level calculations in the defect formation energy plots.

@@ -463,7 +469,7 @@ def __init__(
el_refs (dict):
Dictionary of elemental reference energies for the chemical potentials.
vbm (float):
VBM eigenvalue in the bulk supercell, to use as Fermi level reference.
VBM energy to use as Fermi level reference point for analysis.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure consistency in the parameter description across the class.

The description of the vbm parameter in the class attributes is slightly different from that in the __init__ method. It's crucial to maintain consistency to avoid confusion. Consider aligning the descriptions to ensure that they convey the same information.

Comment on lines +1349 to +1365
if abs(fdos_vbm - self.vbm) > 0.05 and not skip_check:
warnings.warn(
f"The VBM eigenvalue of the bulk DOS calculation ({fermi_dos_vbm:.2f} eV, with band "
f"gap of {fermi_dos_band_gap:.2f} eV) differs from that of the bulk supercell "
f"calculations ({self.vbm:.2f} eV, with band gap of {self.band_gap:.2f} eV) by more "
f"than 0.1 eV. If this is only due to slight differences in kpoint sampling for the "
f"bulk DOS vs defect supercell calculations, and consistent functional settings "
f"(LHFCALC, AEXX etc) were used, then the eigenvalue references should be consistent "
f"and this warning can be ignored. If not, then this could lead to inaccuracies in "
f"the predicted Fermi level. Note that the Fermi level will be referenced to the VBM "
f"of the bulk supercell (i.e. DefectThermodynamics.vbm)"
f"The VBM eigenvalue of the bulk DOS calculation ({fdos_vbm:.2f} eV, band gap = "
f"{fdos_band_gap:.2f} eV) differs by >0.05 eV from `DefectThermodynamics.vbm/gap` "
f"({self.vbm:.2f} eV, band gap = {self.band_gap:.2f} eV; which are taken from the bulk "
f"supercell calculation by default, unless `bulk_band_gap_vr` is set during defect "
f"parsing). If this is only due to differences in kpoint sampling for the bulk DOS vs "
f"supercell calculations, then you should use the `bulk_band_gap_vr` option during "
f"defect parsing to set the bulk band gap and VBM eigenvalue "
f"(`DefectThermodynamics.gap/vbm`) to the correct values (though the absolute values of "
f"predictions should not be affected as the eigenvalue references in the calculations "
f"are consistent, just the reported Fermi levels will be referenced to "
f"`DefectThermodynamics.vbm` which may not be the exact VBM position here).\n"
f"Otherwise if this is due to changes in functional settings (LHFCALC, AEXX etc), then "
f"the calculations should be redone with consistent settings to ensure accurate "
f"predictions.\n"
f"Note that the Fermi level will be always referenced to `DefectThermodynamics.vbm`!"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle potential discrepancies in VBM values gracefully.

The method _parse_fermi_dos issues a warning if there is a significant difference between the VBM values from different sources. This is good, but it might be beneficial to provide a mechanism or recommendation on how to resolve such discrepancies, such as recalculating the values or adjusting parameters to ensure consistency.

Comment on lines +1230 to +1232
to the VBM (using ``self.vbm``, which is the VBM of the `bulk supercell`
calculation by default). If None (default), set to the mid-gap Fermi
level (E_g/2).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify the default behavior of the fermi_level parameter.

The fermi_level parameter in the get_equilibrium_concentrations method defaults to the mid-gap Fermi level, but this isn't explicitly stated in the parameter's description. It's helpful to explicitly state default behaviors in the documentation to improve clarity and user understanding.

Comment on lines +1333 to +1367
def _parse_fermi_dos(
self, bulk_dos: Union[str, Vasprun, FermiDos], skip_check: bool = False
) -> FermiDos:
if isinstance(bulk_dos, FermiDos):
fdos = bulk_dos
# most similar settings to Vasprun.eigenvalue_band_properties:
fdos_vbm = fdos.get_cbm_vbm(tol=1e-4, abs_tol=True)[1] # tol 1e-4 is lowest possible, as VASP
fdos_band_gap = fdos.get_gap(tol=1e-4, abs_tol=True) # rounds the DOS outputs to 4 dp

if isinstance(bulk_dos_vr, str):
bulk_dos_vr = get_vasprun(bulk_dos_vr, parse_dos=True)
if isinstance(bulk_dos, str):
bulk_dos = get_vasprun(bulk_dos, parse_dos=True) # converted to fdos in next block

fermi_dos_band_gap, _cbm, fermi_dos_vbm, _ = bulk_dos_vr.eigenvalue_band_properties
if abs(fermi_dos_vbm - self.vbm) > 0.1:
if isinstance(bulk_dos, Vasprun): # either supplied Vasprun or parsed from string there
fdos_band_gap, _cbm, fdos_vbm, _ = bulk_dos.eigenvalue_band_properties
fdos = get_fermi_dos(bulk_dos)

if abs(fdos_vbm - self.vbm) > 0.05 and not skip_check:
warnings.warn(
f"The VBM eigenvalue of the bulk DOS calculation ({fermi_dos_vbm:.2f} eV, with band "
f"gap of {fermi_dos_band_gap:.2f} eV) differs from that of the bulk supercell "
f"calculations ({self.vbm:.2f} eV, with band gap of {self.band_gap:.2f} eV) by more "
f"than 0.1 eV. If this is only due to slight differences in kpoint sampling for the "
f"bulk DOS vs defect supercell calculations, and consistent functional settings "
f"(LHFCALC, AEXX etc) were used, then the eigenvalue references should be consistent "
f"and this warning can be ignored. If not, then this could lead to inaccuracies in "
f"the predicted Fermi level. Note that the Fermi level will be referenced to the VBM "
f"of the bulk supercell (i.e. DefectThermodynamics.vbm)"
f"The VBM eigenvalue of the bulk DOS calculation ({fdos_vbm:.2f} eV, band gap = "
f"{fdos_band_gap:.2f} eV) differs by >0.05 eV from `DefectThermodynamics.vbm/gap` "
f"({self.vbm:.2f} eV, band gap = {self.band_gap:.2f} eV; which are taken from the bulk "
f"supercell calculation by default, unless `bulk_band_gap_vr` is set during defect "
f"parsing). If this is only due to differences in kpoint sampling for the bulk DOS vs "
f"supercell calculations, then you should use the `bulk_band_gap_vr` option during "
f"defect parsing to set the bulk band gap and VBM eigenvalue "
f"(`DefectThermodynamics.gap/vbm`) to the correct values (though the absolute values of "
f"predictions should not be affected as the eigenvalue references in the calculations "
f"are consistent, just the reported Fermi levels will be referenced to "
f"`DefectThermodynamics.vbm` which may not be the exact VBM position here).\n"
f"Otherwise if this is due to changes in functional settings (LHFCALC, AEXX etc), then "
f"the calculations should be redone with consistent settings to ensure accurate "
f"predictions.\n"
f"Note that the Fermi level will be always referenced to `DefectThermodynamics.vbm`!"
)
return FermiDos(bulk_dos_vr.complete_dos, nelecs=get_nelect_from_vasprun(bulk_dos_vr))
return fdos
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider enhancing error handling in the _parse_fermi_dos method.

The method _parse_fermi_dos could benefit from additional error handling. For instance, if the bulk_dos input is neither a str nor a Vasprun object, the method might fail without a clear error message. Adding a type check and a clear error message would enhance the robustness and user-friendliness of the method.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between edbed6d and 1fd036f.

Files selected for processing (3)
  • doped/thermodynamics.py (31 hunks)
  • tests/test_generation.py (12 hunks)
  • tests/test_thermodynamics.py (8 hunks)
Files skipped from review as they are similar to previous changes (2)
  • tests/test_generation.py
  • tests/test_thermodynamics.py
Additional comments not posted (5)
doped/thermodynamics.py (5)

435-444: Consider improving the clarity of the vbm parameter description in the __init__ method.


472-472: Ensure consistency in the description of the vbm parameter across the class.


1230-1232: Clarify the default behavior of the fermi_level parameter.


1349-1365: Handle potential discrepancies in VBM values gracefully in the _parse_fermi_dos method.


1333-1367: Consider enhancing error handling in the _parse_fermi_dos method.

@kavanase kavanase merged commit a696129 into main Jun 20, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant