Commit

docs
briney committed Oct 18, 2024
1 parent 8aa00c4 commit 6eb7eb6
Showing 2 changed files with 63 additions and 12 deletions.
2 changes: 1 addition & 1 deletion docs/source/modules/io.rst
@@ -58,9 +58,9 @@ write

"FASTA", :ref:`to_fasta() <to-fasta>`, "supports ``Sequence`` or ``Pair`` objects"
"FASTQ", :ref:`to_fastq() <to-fastq>`, "supports ``Sequence`` or ``Pair`` objects"
"AIRR", :ref:`to_airr() <to-airr>`, "only supports ``Sequence`` objects"
"Parquet", :ref:`to_parquet() <to-parquet>`, "supports ``Sequence`` or ``Pair`` objects"
"CSV", :ref:`to_csv() <to-csv>`, "supports ``Sequence`` or ``Pair`` objects"
"AIRR", :ref:`to_airr() <to-airr>`, "only supports ``Sequence`` objects"


convert
73 changes: 62 additions & 11 deletions docs/source/modules/read_sequences.rst
@@ -3,17 +3,68 @@
read/parse sequences
==============================


.. csv-table::
   :header: "format", "function", "notes"
   :widths: 15, 10, 30

   "FASTA/Q", :ref:`read_fastx() <read-fastx>`, "also supports gzipped files"
   "FASTA", :ref:`read_fasta() <read-fasta>`, "also supports gzipped files"
   "FASTQ", :ref:`read_fastq() <read-fastq>`, "also supports gzipped files"
   "AIRR", :ref:`read_airr() <read-airr>`, "only supports ``Sequence`` objects"
   "Parquet", :ref:`read_parquet() <read-parquet>`, "supports ``Sequence`` or ``Pair`` objects"
   "CSV", :ref:`read_csv() <read-csv>`, "supports ``Sequence`` or ``Pair`` objects"
``abutils`` provides functions for reading/parsing sequence data from a variety of commonly
used file formats. The primary differences between ``read`` and ``parse`` functions are:

- ``read`` functions read an entire file into memory and return a list of ``Sequence`` objects.
- ``parse`` functions yield ``Sequence`` objects one at a time.

Thus, ``parse`` functions are generally more memory-efficient for large files, while ``read``
functions may be more convenient for smaller files or quick prototyping.
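
The distinction between the two styles can be illustrated without ``abutils`` at all. The
following is a minimal sketch in plain Python, not the library's implementation:
``parse_records()`` and ``read_records()`` are hypothetical helpers that work on simple
``(id, sequence)`` tuples rather than ``Sequence`` objects. It only shows why a generator
holds a single record at a time while a list holds every record at once.

.. code-block:: python

   import io
   from typing import Iterator, List, TextIO, Tuple


   def parse_records(handle: TextIO) -> Iterator[Tuple[str, str]]:
       """Lazily yield (id, sequence) tuples from FASTA-formatted text (hypothetical helper)."""
       name, chunks = None, []
       for line in handle:
           line = line.strip()
           if not line:
               continue
           if line.startswith(">"):
               # a new header closes the previous record, if there was one
               if name is not None:
                   yield name, "".join(chunks)
               name, chunks = line[1:], []
           else:
               chunks.append(line)
       # emit the final record once the input is exhausted
       if name is not None:
           yield name, "".join(chunks)


   def read_records(handle: TextIO) -> List[Tuple[str, str]]:
       """Eagerly collect every record into a list (hypothetical helper)."""
       return list(parse_records(handle))


   fasta = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n"

   # "read" style: all records are in memory at once
   records = read_records(io.StringIO(fasta))

   # "parse" style: only the record currently being processed is held
   first = next(parse_records(io.StringIO(fasta)))

In this sketch the eager function is just a thin wrapper around the lazy one, which is a
common way to offer both interfaces; whether ``abutils`` is structured this way is not stated
here.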

.. note::

   ``read_fastx()`` and ``parse_fastx()`` are the most flexible and can read/parse either
   FASTA or FASTQ files. This is particularly useful when building pipelines in which users
   may want to process both file types or when the format of the source file may not be known
   in advance.
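
As a rough illustration of how format-agnostic handling can work in general, the sketch below
inspects the first non-blank line of the input: FASTA records begin with ``>`` and FASTQ
records begin with ``@``. ``sniff_fastx_format()`` is a hypothetical helper written for this
example, and it is an assumption that any detection of this kind resembles what
``read_fastx()`` does internally.

.. code-block:: python

   import io
   from typing import TextIO


   def sniff_fastx_format(handle: TextIO) -> str:
       """Return "fasta" or "fastq" based on the first non-blank line (hypothetical helper)."""
       for line in handle:
           stripped = line.strip()
           if not stripped:
               continue
           if stripped.startswith(">"):
               return "fasta"
           if stripped.startswith("@"):
               return "fastq"
           # the first non-blank line is neither kind of header
           break
       raise ValueError("input does not look like FASTA or FASTQ")


   fasta_format = sniff_fastx_format(io.StringIO(">seq1\nACGT\n"))
   fastq_format = sniff_fastx_format(io.StringIO("@seq1\nACGT\n+\nIIII\n"))

A pipeline could use a check like this to pick a format-specific reader, although calling
``read_fastx()`` or ``parse_fastx()`` directly avoids the need for it.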

.. code-block:: python

   import abutils

   # read the entire file into memory as a list of Sequence objects
   sequences = abutils.io.read_fastx("sequences.fastq")

   # parse the file one record at a time
   for sequence in abutils.io.parse_fastq("sequences.fastq"):
       print(sequence)


.. table::
   :align: left
   :widths: 10, 12, 24
   :width: 100%

   +----------+--------------------------------------+----------------------------------------+
   | format   | function                             | notes                                  |
   +==========+======================================+========================================+
   | FASTA/Q  | :ref:`read_fastx() <read-fastx>`     | returns a list of ``Sequence`` objects |
   |          |                                      |                                        |
   +          +--------------------------------------+----------------------------------------+
   |          | :ref:`parse_fastx() <parse-fastx>`   | yields single ``Sequence`` objects     |
   |          |                                      |                                        |
   +----------+--------------------------------------+----------------------------------------+
   | FASTA    | :ref:`read_fasta() <read-fasta>`     | returns a list of ``Sequence`` objects |
   |          |                                      |                                        |
   +          +--------------------------------------+----------------------------------------+
   |          | :ref:`parse_fasta() <parse-fasta>`   | yields single ``Sequence`` objects     |
   |          |                                      |                                        |
   +----------+--------------------------------------+----------------------------------------+
   | FASTQ    | :ref:`read_fastq() <read-fastq>`     | returns a list of ``Sequence`` objects |
   |          |                                      |                                        |
   +          +--------------------------------------+----------------------------------------+
   |          | :ref:`parse_fastq() <parse-fastq>`   | yields single ``Sequence`` objects     |
   |          |                                      |                                        |
   +----------+--------------------------------------+----------------------------------------+
   | AIRR     | :ref:`read_airr() <read-airr>`       | only supports ``Sequence`` objects     |
   |          |                                      |                                        |
   +----------+--------------------------------------+----------------------------------------+
   | Parquet  | :ref:`read_parquet() <read-parquet>` | supports ``Sequence`` or ``Pair``      |
   |          |                                      | objects                                |
   +----------+--------------------------------------+----------------------------------------+
   | CSV      | :ref:`read_csv() <read-csv>`         | supports ``Sequence`` or ``Pair``      |
   |          |                                      | objects                                |
   +----------+--------------------------------------+----------------------------------------+


