Merge pull request #68 from Arcadia-Science/ter/update-docs
Add more details to the dev docs and update README
taylorreiter authored Jan 31, 2023
2 parents 3cf1ade + 83b9088 commit 9ceaa18
Showing 3 changed files with 61 additions and 78 deletions.
40 changes: 6 additions & 34 deletions README.Rmd
@@ -46,38 +46,10 @@ library(sourmashconsumr)
```

The sourmashconsumr package contains a variety of functions to work with the outputs of the sourmash python package.
`read*` functions read and parse the outputs of sourmash into data frames.
`plot*` functions visualize the sourmash outputs.
`from_*to_*` functions convert sourmash data frames into the formats of other popular R packages (e.g. phyloseq).
sourmashconsumr also contains a few utility functions that work with other datatypes (at this time, mostly upset data frames).
See a complete list of functions below, sorted by sourmash output that the function operates on:

* signatures (output by `sourmash sketch` or `sourmash compute`):
* `read_signature()`
* upset plots: `from_signatures_to_upset_df()`, `plot_signatures_upset()`
* rarefaction plots for signatures sketched from reads: `from_signatures_to_rarefaction_df()`, `plot_signatures_rarefaction()`
* sourmash compare csv:
* `read_compare()`
* MDS plot: `make_compare_mds()`, `plot_compare_mds()`
* heatmap: `plot_compare_heatmap()`
* sourmash taxonomy annotate csv:
* `read_taxonomy_annotate()`
* taxonomy agglomeration: `tax_glom_taxonomy_annotate()`
* upset plot: `from_taxonomy_annotate_to_upset_inputs()`, `plot_taxonomy_annotate_upset()`
* sankey plot: `plot_taxonomy_annotate_sankey()`
* time series alluvial plot: `plot_taxonomy_annotate_ts_alluvial()`
* detect the presence of multiple strains of a single species in a metagenome: `from_taxonomy_annotate_to_multi_strains()`
* to metacoder: `from_taxonomy_annotate_to_metacoder()`
* to phyloseq: `from_taxonomy_annotate_to_phyloseq()`
* sourmash gather csv:
* `read_gather()`
* barchart: `plot_gather_classified()`
* upset plot: `from_gather_to_upset_df()`, `plot_gather_upset()`
* upset utilities:
* `from_list_to_upset_df()`
* `from_upset_df_to_intersection_members()`
* `from_upset_df_to_intersection_summary()`
* `from_upset_df_to_intersections()`
The table below summarizes which sourmash outputs the sourmashconsumr package operates on and the functions that are available.
For a complete list of functions in the sourmashconsumr package, see the [documentation](https://arcadia-science.github.io/sourmashconsumr/reference/index.html).

![](https://i.imgur.com/UfuiAhw.png){width=750px}

## Developer documentation

@@ -86,8 +58,8 @@ For more information on how to contribute, see the [developer documentation](dev

## Citation

sourmashconsumr doesn't have a citation yet, but sourmash does!
If you use sourmash in your work, please cite: [Brown and Irber (2016), doi:10.21105/joss.00027.](https://joss.theoj.org/papers/10.21105/joss.00027)
* If you use sourmashconsumr in your work, please cite [DOI: 10.57844/arcadia-1896-ke33](https://arcadia-research.pubpub.org/pub/resource-sourmashconsumr).
* If you use sourmash in your work, please cite [DOI: 10.21105/joss.00027](https://joss.theoj.org/papers/10.21105/joss.00027).

If you'd like more information on how sourmash works, please see the following publications:

54 changes: 11 additions & 43 deletions README.md
@@ -42,46 +42,13 @@ library(sourmashconsumr)
```

The sourmashconsumr package contains a variety of functions to work with
the outputs of the sourmash python package. `read*` functions read and
parse the outputs of sourmash into data frames. `plot*` functions
visualize the sourmash outputs. `from_*to_*` functions convert sourmash
data frames into the formats of other popular R packages
(e.g. phyloseq). sourmashconsumr also contains a few utility functions
that work with other datatypes (at this time, mostly upset data frames).
See a complete list of functions below, sorted by sourmash output that
the function operates on:

- signatures (output by `sourmash sketch` or `sourmash compute`):
- `read_signature()`
- upset plots: `from_signatures_to_upset_df()`,
`plot_signatures_upset()`
- rarefaction plots for signatures sketched from reads:
`from_signatures_to_rarefaction_df()`,
`plot_signatures_rarefaction()`
- sourmash compare csv:
- `read_compare()`
- MDS plot: `make_compare_mds()`, `plot_compare_mds()`
- heatmap: `plot_compare_heatmap()`
- sourmash taxonomy annotate csv:
- `read_taxonomy_annotate()`
- taxonomy agglomeration: `tax_glom_taxonomy_annotate()`
- upset plot: `from_taxonomy_annotate_to_upset_inputs()`,
`plot_taxonomy_annotate_upset()`
- sankey plot: `plot_taxonomy_annotate_sankey()`
- time series alluvial plot: `plot_taxonomy_annotate_ts_alluvial()`
- detect the presence of multiple strains of a single species in a
metagenome: `from_taxonomy_annotate_to_multi_strains()`
- to metacoder: `from_taxonomy_annotate_to_metacoder()`
- to phyloseq: `from_taxonomy_annotate_to_phyloseq()`
- sourmash gather csv:
- `read_gather()`
- barchart: `plot_gather_classified()`
- upset plot: `from_gather_to_upset_df()`, `plot_gather_upset()`
- upset utilities:
- `from_list_to_upset_df()`
- `from_upset_df_to_intersection_members()`
- `from_upset_df_to_intersection_summary()`
- `from_upset_df_to_intersections()`
the outputs of the sourmash python package. The table below summarizes
which sourmash outputs the sourmashconsumr package operates on and the
functions that are available. For a complete list of functions in the
sourmashconsumr package, see the
[documentation](https://arcadia-science.github.io/sourmashconsumr/reference/index.html).

<img src="https://i.imgur.com/UfuiAhw.png" width="750" />

## Developer documentation

@@ -92,9 +59,10 @@ the [developer documentation](devdoc.md).

## Citation

sourmashconsumr doesn’t have a citation yet, but sourmash does! If you
use sourmash in your work, please cite: [Brown and Irber (2016),
doi:10.21105/joss.00027.](https://joss.theoj.org/papers/10.21105/joss.00027)
- If you use sourmashconsumr in your work, please cite [DOI:
10.57844/arcadia-1896-ke33](https://arcadia-research.pubpub.org/pub/resource-sourmashconsumr).
- If you use sourmash in your work, please cite [DOI:
10.21105/joss.00027](https://joss.theoj.org/papers/10.21105/joss.00027).

If you’d like more information on how sourmash works, please see the
following publications:
45 changes: 44 additions & 1 deletion devdoc.md
@@ -102,13 +102,55 @@ load_all()
```

You can then open the file you want to change and make changes.

### Adding or changing functions

Core package functions can be found in the `R` folder.
Functions are grouped by action (e.g. `read_sourmash_outputs.R` contains all of the `read_*()` functions) or by data type (e.g. functions that involve plotting the output of sourmash taxonomy annotate are in the `plot_taxonomy_annotate.R` file).
Tests and test files are located in the `tests/testthat` folder.

#### Naming functions

The functions in the sourmashconsumr package follow a naming scheme.
Functions that are exported (i.e. user-facing) are named by the action completed by the function, the sourmash output type they act on, and, if relevant, a description of the action taken.

+ **Action words**:
+ `read`
+ `plot`
+ `from`
+ **sourmash output types**:
+ `signature`
+ `compare_csv`
+ `gather`
+ `taxonomy_annotate`
+ **example actions**:
+ `to_metacoder`
+ `upset`
+ `heatmap`
+ `mds`
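
The scheme above can be illustrated with a small sketch. The helper `compose_function_name()` is hypothetical and exists only for this example; it is not part of sourmashconsumr. It simply joins an action word, an output type, and an optional description with underscores. Note that the exported names it reproduces use `compare` for the output type.

```r
# Hypothetical helper for illustration only; it is not part of sourmashconsumr.
# Joins the supplied name components with underscores.
compose_function_name <- function(action, output_type, description = NULL) {
  # c() drops a NULL description, so two-part names work as well
  parts <- c(action, output_type, description)
  paste(parts, collapse = "_")
}

compose_function_name("read", "gather")
# "read_gather"
compose_function_name("plot", "compare", "heatmap")
# "plot_compare_heatmap"
compose_function_name("from", "taxonomy_annotate", "to_metacoder")
# "from_taxonomy_annotate_to_metacoder"
```

Each result matches the name of an existing exported function, which is a quick way to check that a proposed name fits the pattern.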

Functions that are not exported do not follow a naming scheme but strive to be fully descriptive of their actions, and when possible use the **sourmash output types** to make it clear what type of data the internal function operates on.

+ **examples of internal functions**
+ `check_compare_df_sample_col_and_move_to_rowname()`
+ `check_and_edit_names_in_signatures_df()`
+ `check_uniform_parameters_in_signatures_df()`
+ `make_agglom_cols()`
+ `make_expression()`
+ `get_scaled_for_max_hash()`
+ `pivot_wider_taxonomy_annotate()`

#### Other things

For the `plot*` functions, sourmashconsumr tries not to put anything in the `theme()` layer so that users can control the look of their plot without conflicts.
This isn't always doable, but we strive to follow this rule.

Layers that label the plot should be controlled by a boolean argument `label` so that users can turn off the default labels and re-add them with their own styling (italics, size, etc.).
When a function has a `label` argument, the `@param` string documenting it should include an example of how to add the default labels back to the plot, as some of these are esoteric and would be hard to work out without an example.
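
As a rough sketch of this convention, the function below is a made-up example, not a sourmashconsumr function. The name `plot_example_mds()` and the columns `sample_name`, `MDS1`, and `MDS2` are assumptions chosen for illustration. It adds no `theme()` layer, gates the labelling layer behind `label`, and documents in `@param` how to add the labels back.

```r
library(ggplot2)

#' Plot an MDS data frame (hypothetical example, not a sourmashconsumr function)
#'
#' @param mds_df A data frame with columns `sample_name`, `MDS1`, and `MDS2`.
#' @param label Boolean. Whether to label points with sample names.
#'   To re-add the default labels yourself, use
#'   `plt + ggplot2::geom_text(ggplot2::aes(label = sample_name), vjust = -1)`.
plot_example_mds <- function(mds_df, label = TRUE) {
  # No theme() layer here, so users can style the plot without conflicts
  plt <- ggplot(mds_df, aes(x = MDS1, y = MDS2)) +
    geom_point()
  if (label) {
    plt <- plt + geom_text(aes(label = sample_name), vjust = -1)
  }
  plt
}

mds_df <- data.frame(
  sample_name = c("a", "b", "c"),
  MDS1 = c(0.1, -0.3, 0.2),
  MDS2 = c(0.4, 0.0, -0.2)
)

# Turn the default labels off, then re-add them with custom styling
plt <- plot_example_mds(mds_df, label = FALSE) +
  geom_text(aes(label = sample_name), fontface = "italic", size = 3, vjust = -1)
```

With `label = FALSE` the returned plot has only the point layer, so the user's own `geom_text()` call is the only source of labels.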

### Tests

The sourmashconsumr package uses unit tests to make sure that code changes don't break existing functions.
Tests and test files are located in the `tests/testthat` folder.
To run all tests, you can use the `devtools::test()` function:

```
@@ -139,6 +181,7 @@ The vignette can be built using the following command:
devtools::build_rmd("vignettes/sourmashconsumr.Rmd")
```

If you make changes to the vignette, make sure you build it locally and that your changes appear how you want them to before you push changes to GitHub.
The html file should not be pushed to GitHub.

### GitHub README