Fix missing values caused by logging (#60)
* Fix missing values caused by logging.

* Fixed for non-scientific notation features as well

* Updated changelog
wfondrie authored Jul 18, 2022
1 parent bca014d commit 3a638c0
Showing 2 changed files with 5 additions and 2 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
# Changelog for mokapot

+## Unreleased
+### Fixed
+- The PepXML parser would sometimes try and log transform features with `0`'s, resulting in missing values.
+
## [0.8.1] - 2022-06-24

### Added
3 changes: 1 addition & 2 deletions mokapot/parsers/pepxml.py
@@ -337,7 +337,7 @@ def _log_features(col, features):

# Detect columns written in scientific notation and log them:
# This is specifically needed to preserve precision.
-if col.str.contains("e").any() and (col.astype(float) >= 0).all():
+if col.str.contains("e").any() and (col.astype(float) > 0).all():
split = col.str.split("e", expand=True)
root = split.loc[:, 0]
root = root.astype(float)
@@ -370,6 +370,5 @@ def _log_features(col, features):
col[~zero_idx] = np.log10(col[~zero_idx])
col[zero_idx] = col[~zero_idx].min() - 1
LOGGER.info(" - log-transformed the '%s' feature.", col.name)
-return np.log10(col)

return col
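
To illustrate why the removed `return np.log10(col)` produced missing values, here is a minimal sketch of the zero-handling step shown in the second hunk. The helper name `log_nonnegative` and the sample data are hypothetical and not part of mokapot; only the three-line masking pattern is taken from the diff. Logging the already-transformed column a second time turns the zero placeholder (a negative number) into `NaN`.

```python
import numpy as np
import pandas as pd


def log_nonnegative(col: pd.Series) -> pd.Series:
    """Hypothetical helper mirroring the masking pattern in the diff."""
    col = col.astype(float).copy()
    zero_idx = col == 0
    # Log only the nonzero entries; log10(0) would be -inf.
    col.loc[~zero_idx] = np.log10(col.loc[~zero_idx])
    # Place zeros one log unit below the smallest logged value.
    col.loc[zero_idx] = col.loc[~zero_idx].min() - 1
    return col


raw = pd.Series([0.0, 1.0, 100.0], name="example_feature")
fixed = log_nonnegative(raw)

# The old code effectively returned np.log10 of the transformed column.
# Negative placeholders become NaN and logged values of 0 become -inf.
with np.errstate(divide="ignore", invalid="ignore"):
    relogged = np.log10(fixed)

print(fixed.tolist())
print(relogged.isna().sum())
```

The same reasoning applies to the first hunk: a column containing an exact `0` passes a `>= 0` check but cannot be log-transformed, so the stricter `> 0` guard sends such columns down the zero-aware path instead.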
