fix --iop for slow5 and update slow5lib
hasindu2008 committed Jun 28, 2022
1 parent 1a46b75 commit b2f5921
Showing 19 changed files with 188 additions and 66 deletions.
13 changes: 9 additions & 4 deletions README.md
@@ -4,10 +4,10 @@ An optimised re-implementation of the *index*, *call-methylation* and *eventalig

First, the reads have to be indexed using `f5c index`. Then, invoke `f5c call-methylation` to detect methylated cytosine bases. Finally, you may use `f5c meth-freq` to obtain methylation frequencies. Alternatively, invoke `f5c eventalign` to perform event alignment. The results are almost the same as from nanopolish except a few differences due to floating point approximations.

*Full Documentation* : [https://hasindu2008.github.io/f5c/docs/overview](https://hasindu2008.github.io/f5c/docs/overview)
*Latest release* : [https://github.com/hasindu2008/f5c/releases/latest](https://github.com/hasindu2008/f5c/releases/latest)
*Pre-print* : [https://doi.org/10.1101/756122](https://www.biorxiv.org/content/10.1101/756122v1)
*Publication* : [https://doi.org/10.1186/s12859-020-03697-x](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03697-x)
*Full Documentation* : [https://hasindu2008.github.io/f5c/docs/overview](https://hasindu2008.github.io/f5c/docs/overview)<br/>
*Latest release* : [https://github.com/hasindu2008/f5c/releases/latest](https://github.com/hasindu2008/f5c/releases/latest)<br/>
*Pre-print* : [https://doi.org/10.1101/756122](https://www.biorxiv.org/content/10.1101/756122v1)<br/>
*Publication* : [https://doi.org/10.1186/s12859-020-03697-x](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03697-x)<br/>

[![GitHub Downloads](https://img.shields.io/github/downloads/hasindu2008/f5c/total?logo=GitHub)](https://github.com/hasindu2008/f5c/releases)
[![BioConda Install](https://img.shields.io/conda/dn/bioconda/f5c?label=BioConda)](https://anaconda.org/bioconda/f5c)
@@ -55,6 +55,11 @@ On OS X : brew install hdf5
- If you skip `scripts/install-hts.sh` and `./configure`, hdf5 will be compiled locally. It is a good option if you cannot install hdf5 library system wide. However, building hdf5 takes ages.
- *f5c* from version 0.8.0 onwards by default requires vector instructions (SSSE3 or higher for Intel/AMD and neon for ARM) for builtin *slow5lib*. If your processor is an ancient processor with no such vector instructions, invoke make as `make no_simd=1`.
- You can optionally enable *zstd* support for builtin *slow5lib* when building *f5c* by invoking `make zstd=1`. This requires __zstd 1.3 development libraries__ installed on your system (*libzstd1-dev* package for *apt*, *libzstd-devel* for *yum/dnf* and *zstd* for *homebrew*).
- On Mac M1, or on any system where `./configure` cannot find the hdf5 libraries installed through the package manager, you can specify their location as *LDFLAGS=-L/path/to/shared/lib/ CPPFLAGS=-I/path/to/headers/*. For example, on Mac M1:
```
./configure LDFLAGS=-L/opt/homebrew/lib/ CPPFLAGS=-I/opt/homebrew/include/
make
```
- Instructions to build a docker image and conda installation are detailed [here](https://hasindu2008.github.io/f5c/docs/misc-install).
- Other uncommon building options are detailed [here](https://hasindu2008.github.io/f5c/docs/building).
- An SIMD accelerated version contributed by [@dkhyland](https://github.com/dkhyland) is available in the [*simd* branch](https://github.com/hasindu2008/f5c/tree/simd).
1 change: 1 addition & 0 deletions slow5lib/.gitignore
@@ -76,6 +76,7 @@ dist/
pyslow5.egg-info/
docs/doxygen
examples/xample.slow5.idx
/.eggs

# testing output
*actual*
9 changes: 5 additions & 4 deletions slow5lib/Makefile
@@ -17,7 +17,7 @@ endif
ifeq ($(zstd_local),)
else
CFLAGS += -DSLOW5_USE_ZSTD
CPPFLAGS += -I $(zstd_local)
CPPFLAGS += -I $(zstd_local)
endif
BUILD_DIR = lib

@@ -77,11 +77,12 @@ test: slow5lib

pyslow5:
make clean
rm -rf *.so python/pyslow5.cpp build/lib.* build/temp.*
make -C $(SVB)
rm -rf *.so python/pyslow5.cpp build/lib.* build/temp.* build/bdist.* sdist pyslow5.egg-info dist
python3 setup.py build
cp build/lib.*/*.so ./
LD_LIBRAY_PATH=$LD_LIBRAY_PATH:$(SVB) python3 < python/example.py
python3 < python/example.py
python3 setup.py sdist


test-prep: slow5lib
gcc test/make_blow5.c -Isrc src/slow5.c src/slow5_press.c -lm -lz src/slow5_idx.c src/slow5_misc.c -o test/bin/make_blow5 -g
20 changes: 15 additions & 5 deletions slow5lib/include/slow5/slow5.h
@@ -296,6 +296,7 @@ struct slow5_file_meta {
const char *pathname; ///< file path
int fd; ///< file descriptor
uint64_t start_rec_offset; ///< offset (in bytes) of the first SLOW5 record (skipping the SLOW5 header; used for indexing)
char *fread_buffer; ///< buffer for fread
};
typedef struct slow5_file_meta slow5_file_meta_t;

@@ -504,18 +505,21 @@ double *slow5_aux_get_double_array(const slow5_rec_t *read, const char *field, u
char *slow5_aux_get_string(const slow5_rec_t *read, const char *field, uint64_t *len, int *err);
uint8_t *slow5_aux_get_enum_array(const slow5_rec_t *read, const char *field, uint64_t *len, int *err);


/****** Writing SLOW5 files ******.
* This is just around the corner.
* However, this is being procrastinated until someone requests. If anyone is interested please open a GitHub issue.
***/


/**************************************************************************************************
*** Low-level API *******************************************************************************
**************************************************************************************************/

/*
IMPORTANT: The low-level API is not yet stable. Subject to changes in the future.
Function proptotypes can be changed without notice or completely removed
So do NOT use these functions in your code
these functions are used by slow5tools and pyslow5 - so any change to a function here means slow5tools and pyslow5 must be fixed
IMPORTANT: The low-level API is not yet finalised or documented, until someone requests.
If anyone is interested, please open a GitHub issue, rather than trying to figure out from the code.
Function prototypes can be changed without notice or completely removed. So do NOT use these functions in your code.
these functions are used by slow5tools and pyslow5 - so any change to a function here means slow5tools and pyslow5 must be fixed.
*/

/**
@@ -714,6 +718,12 @@ static inline ssize_t slow5_eof_print(void) {
//returns null if no attributes
const char **slow5_get_hdr_keys(const slow5_hdr_t *header,uint64_t *len);

//gets the list of read ids from the SLOW5 index
//the list of read ids is a pointer and must not be freed by user
//*len will have the number of read ids
//NULL will be returned in case of error
char **slow5_get_rids(const slow5_file_t *s5p, uint64_t *len);

//get the pointer to auxilliary field names
char **slow5_get_aux_names(const slow5_hdr_t *header,uint64_t *len);
//get the pointer to auxilliary field types
6 changes: 5 additions & 1 deletion slow5lib/include/slow5/slow5_defs.h
@@ -39,8 +39,12 @@ SOFTWARE.
extern "C" {
#endif

/* This is for internal use only - do not use any of the following directly unless they are in the API documentation
The API documentation is available at https://hasindu2008.github.io/slow5tools/
*/

// library version
#define SLOW5_LIB_VERSION "0.3.0"
#define SLOW5_LIB_VERSION "0.5.1"

// maximum file version supported by this library - independent of slow5 library version above
// if updating change all 4 below
1 change: 1 addition & 0 deletions slow5lib/include/slow5/slow5_error.h
@@ -44,6 +44,7 @@ SOFTWARE.
extern "C" {
#endif /* _cplusplus */

/* This is for internal use only - do not use any of the following directly*/

/* Debug and verbosity */

6 changes: 3 additions & 3 deletions slow5lib/include/slow5/slow5_press.h
@@ -10,9 +10,9 @@
**************************************************************************************************/

/*
IMPORTANT: The low-level API is not yet stable. Subject to changes in the future.
Function proptotypes can be changed without notice or completely removed
So do NOT use these functions in your code
IMPORTANT: The low-level API is not yet finalised or documented.
If anyone is interested, please open a GitHub issue, rather than trying to figure out from the code.
Function prototypes can be changed without notice or completely removed. So do NOT use these functions in your code.
these functions are used by slow5tools and pyslow5 - so any change to a function here means slow5tools and pyslow5 must be fixed
*/

93 changes: 76 additions & 17 deletions slow5lib/src/slow5.c
@@ -43,6 +43,12 @@ SOFTWARE.
#include "slow5_misc.h"
#include "klib/ksort.h"


/* IMPORTANT: The comments in this are NOT the API documentation
The API documentation is available at https://hasindu2008.github.io/slow5tools/
The comments here are for internal use and do not rely on them. Open a GitHub issue for any questions.
*/

KSORT_INIT(str_slow5, ksstr_t, ks_lt_str)


@@ -65,6 +71,8 @@ KSORT_INIT(str_slow5, ksstr_t, ks_lt_str)
#define SLOW5_AUX_ARRAY_CAP_INIT (256) /* Initial capacity for parsing auxiliary array: 2^8 */
#define SLOW5_AUX_ARRAY_STR_CAP_INIT (1024) /* Initial capacity for storing auxiliary array string: 2^10 */

#define SLOW5_FSTREAM_BUFF_SIZE (131072) /* buffer size for freads and fwrites */

static inline void slow5_free(struct slow5_file *s5p);
static int slow5_rec_aux_parse(char *tok, char *read_mem, uint64_t offset, size_t read_size, struct slow5_rec *read, enum slow5_fmt format, struct slow5_aux_meta *aux_meta);
static inline khash_t(slow5_s2a) *slow5_rec_aux_init(void);
@@ -117,11 +125,29 @@ struct slow5_file *slow5_init(FILE *fp, const char *pathname, enum slow5_fmt for
slow5_errno = SLOW5_ERR_UNK;
return NULL;
}

char *fread_buff = (char *)calloc(SLOW5_FSTREAM_BUFF_SIZE, sizeof(char));
if (!fread_buff) {
SLOW5_MALLOC_ERROR();
slow5_errno = SLOW5_ERR_MEM;
return NULL;
}

if (setvbuf(fp, fread_buff, _IOFBF, SLOW5_FSTREAM_BUFF_SIZE) != 0){
SLOW5_WARNING("Could not set a large buffer for file stream of '%s': %s.", pathname, strerror(errno));
free(fread_buff);
fread_buff = NULL;
}
else {
SLOW5_LOG_DEBUG("Buffer for file stream of '%s' was set to %d.", pathname, SLOW5_FSTREAM_BUFF_SIZE);
}

// TODO Attempt to determine from magic number

slow5_press_method_t method;
struct slow5_hdr *header = slow5_hdr_init(fp, format, &method);
if (!header) {
free(fread_buff);
SLOW5_ERROR("Parsing slow5 header of file '%s' failed.", pathname);
return NULL;
}
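
The buffering change in this hunk can be sketched in isolation. What follows is a minimal, self-contained illustration rather than the library's code: `attach_big_buffer` and `roundtrip` are hypothetical helper names, and the 131072-byte size is borrowed from `SLOW5_FSTREAM_BUFF_SIZE` above. It relies only on standard C `setvbuf`, which should be called before any other operation on the stream, and it assumes the buffer has to stay allocated until the stream is closed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of slow5lib): give `fp` a fully buffered
 * stdio buffer of `size` bytes. Returns the buffer on success so the caller
 * can free it after fclose(), or NULL if allocation or setvbuf() failed,
 * in which case the stream keeps its default buffering. */
char *attach_big_buffer(FILE *fp, size_t size) {
    assert(fp != NULL && size > 0);

    char *buf = (char *) calloc(size, sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }
    if (setvbuf(fp, buf, _IOFBF, size) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Hypothetical round trip that exercises the helper: write `text` to a
 * temporary file through the large buffer, then read it back into `out`.
 * Returns 0 on success and -1 on any failure. */
int roundtrip(const char *text, char *out, size_t out_size) {
    FILE *fp = tmpfile();
    if (fp == NULL) {
        return -1;
    }
    char *buf = attach_big_buffer(fp, 131072);
    int ret = -1;
    if (fputs(text, fp) != EOF) {
        rewind(fp); /* flushes the pending write and seeks to the start */
        if (fgets(out, (int) out_size, fp) != NULL) {
            ret = 0;
        }
    }
    fclose(fp); /* close the stream first ... */
    free(buf);  /* ... then release its buffer; free(NULL) is a no-op */
    return ret;
}
```

The ordering at the end mirrors why the diff stores the buffer in `meta.fread_buffer` and frees it in `slow5_free`: the stream keeps using the caller-supplied buffer until it is closed.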
@@ -130,23 +156,27 @@ struct slow5_file *slow5_init(FILE *fp, const char *pathname, enum slow5_fmt for
if (!s5p) {
SLOW5_MALLOC_ERROR();
slow5_hdr_free(header);
free(fread_buff);
slow5_errno = SLOW5_ERR_MEM;
return NULL;
}

s5p->fp = fp;
s5p->format = format;
s5p->header = header;
s5p->meta.fread_buffer = fread_buff;

s5p->compress = slow5_press_init(method);
if (!s5p->compress) {
free(fread_buff);
slow5_hdr_free(header);
free(s5p);
return NULL;
}

if ((s5p->meta.fd = fileno(fp)) == -1) {
SLOW5_ERROR("Obtaining file descriptor with fileno() failed: %s.", strerror(errno));
free(fread_buff);
slow5_press_free(s5p->compress);
slow5_hdr_free(header);
free(s5p);
@@ -157,6 +187,7 @@ struct slow5_file *slow5_init(FILE *fp, const char *pathname, enum slow5_fmt for
s5p->meta.pathname = pathname;
if ((s5p->meta.start_rec_offset = ftello(fp)) == -1) {
SLOW5_ERROR("Obtaining file offset with ftello() failed: %s.", strerror(errno));
free(fread_buff);
slow5_press_free(s5p->compress);
slow5_hdr_free(header);
free(s5p);
@@ -338,6 +369,7 @@ static inline void slow5_free(struct slow5_file *s5p) {
slow5_press_free(s5p->compress);
slow5_hdr_free(s5p->header);
slow5_idx_free(s5p->index);
free(s5p->meta.fread_buffer);
free(s5p);
}
}
@@ -787,9 +819,10 @@ void *slow5_hdr_to_mem(struct slow5_hdr *header, enum slow5_fmt format, slow5_pr
if (value != NULL) {
len_to_cp = strlen(value);

//special case for "."
if (strlen(value)==0) {
len_to_cp++;
// special case for SLOW5_ASCII_MISSING_CHAR
bool is_empty = (len_to_cp == 0);
if (is_empty) {
len_to_cp ++;
}

// Realloc if necessary
@@ -799,35 +832,32 @@ void *slow5_hdr_to_mem(struct slow5_hdr *header, enum slow5_fmt format, slow5_pr
SLOW5_MALLOC_CHK(mem);
}

if (strlen(value)==0) { //special case for "."
memcpy(mem + len, ".", len_to_cp);
if (is_empty) { // special case for SLOW5_ASCII_MISSING_CHAR
mem[len] = SLOW5_ASCII_MISSING_CHAR;
} else {
memcpy(mem + len, value, len_to_cp);
}
len += len_to_cp;
}
else{
// I added this here - hasindu
} else {
// Realloc if necessary
if (len + 1 >= cap) { // +1 for "."
if (len + 1 >= cap) { // +1 for SLOW5_ASCII_MISSING_CHAR
cap *= 2;
mem = (char *) realloc(mem, cap * sizeof *mem);
SLOW5_MALLOC_CHK(mem);
}
mem[len] = '.';
mem[len] = SLOW5_ASCII_MISSING_CHAR;
++ len;
}
} else {
// Realloc if necessary
if (len + 2 >= cap) { // +2 for . and SLOW5_SEP_COL_CHAR
if (len + 2 >= cap) { // +2 for SLOW5_ASCII_MISSING_CHAR and SLOW5_SEP_COL_CHAR
cap *= 2;
mem = (char *) realloc(mem, cap * sizeof *mem);
SLOW5_MALLOC_CHK(mem);
}
mem[len] = SLOW5_SEP_COL_CHAR;
++ len;
mem[len] = '.';
++ len;
mem[len + 1] = SLOW5_ASCII_MISSING_CHAR;
len += 2;
}
}
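
The missing-value handling in this hunk can be illustrated with a small stand-alone sketch. It is not the library's `slow5_hdr_to_mem`: `append_value` and `MISSING_CHAR` are hypothetical names, with `MISSING_CHAR` standing in for `SLOW5_ASCII_MISSING_CHAR` and assumed to be `'.'`, as in the lines the macro replaces. Unlike the hunk, which doubles the capacity once, the sketch loops until the value fits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MISSING_CHAR '.' /* assumed stand-in for SLOW5_ASCII_MISSING_CHAR */

/* Hypothetical helper: append `value` to the growable buffer `*mem`
 * (current length `*len`, capacity `*cap`). A NULL or empty value is
 * written as a single MISSING_CHAR. No terminating '\0' is written.
 * Returns 0 on success, -1 if realloc() fails (buffer left untouched). */
int append_value(char **mem, size_t *len, size_t *cap, const char *value) {
    assert(mem != NULL && *mem != NULL && len != NULL && cap != NULL && *cap > 0);

    size_t value_len = (value != NULL) ? strlen(value) : 0;
    int is_empty = (value_len == 0);
    size_t len_to_cp = is_empty ? 1 : value_len;

    /* Grow by doubling until the bytes fit with room to spare */
    while (*len + len_to_cp >= *cap) {
        size_t new_cap = *cap * 2;
        char *tmp = (char *) realloc(*mem, new_cap);
        if (tmp == NULL) {
            return -1;
        }
        *mem = tmp;
        *cap = new_cap;
    }

    if (is_empty) {
        (*mem)[*len] = MISSING_CHAR;
    } else {
        memcpy(*mem + *len, value, len_to_cp);
    }
    *len += len_to_cp;
    return 0;
}
```

Computing `is_empty` once, as the new lines in the hunk do, avoids a second `strlen` call and keeps the length that was reserved and the length that is copied in agreement.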

@@ -2227,6 +2257,35 @@ int slow5_get(const char *read_id, struct slow5_rec **read, struct slow5_file *s
return 0;
}



//gets the list of read ids from the SLOW5 index
//the list of read ids is a pointer and must not be freed by user
//*len will have the number of read ids
//NULL will be returned in case of error
char **slow5_get_rids(const slow5_file_t *s5p, uint64_t *len) {

if (!s5p->index) {
/* index not loaded */
SLOW5_ERROR("%s", "No slow5 index has been loaded.");
slow5_errno = SLOW5_ERR_NOIDX;
*len=0;
return NULL;
}

if(!s5p->index->ids){
SLOW5_ERROR("%s", "No read ID list in the index.");
slow5_errno = SLOW5_ERR_OTH;
*len=0;
return NULL;
}

*len = s5p->index->num_ids;
return s5p->index->ids;

}
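
The contract described in the comments on `slow5_get_rids` (a borrowed list that the caller must not free, a count returned through `*len`, and `NULL` on error) can be sketched without the library. Everything named `mock_*` below is a hypothetical stand-in that models only the two index fields the function reads; it is not slow5lib's actual struct layout. The sketch also assumes the intent is for `*len` to be 0 on every error path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the slow5lib index and file structs,
 * reduced to the fields this sketch needs. */
struct mock_idx {
    char **ids;       /* list of read ids, owned by the index */
    uint64_t num_ids; /* number of entries in `ids` */
};
struct mock_file {
    struct mock_idx *index; /* NULL until an index has been loaded */
};

/* Returns the index's own id list (the caller must not free it) and stores
 * the number of ids in *len. Returns NULL with *len == 0 when no index is
 * loaded or the index has no id list. */
char **mock_get_rids(const struct mock_file *fp, uint64_t *len) {
    assert(fp != NULL && len != NULL);

    *len = 0; /* set the output first so every early return leaves it defined */
    if (fp->index == NULL || fp->index->ids == NULL) {
        return NULL;
    }
    *len = fp->index->num_ids;
    return fp->index->ids;
}
```

With the real library, the error message in the function above indicates that the index has to be loaded before the call, presumably with `slow5_idx_load`, and the returned pointers stay valid only while that index remains loaded.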


/*
* decompress record if s5p has a compression method then parse to read
* set mem to decompressed mem and bytes to new bytes
@@ -3598,7 +3657,7 @@ void *slow5_rec_to_mem(struct slow5_rec *read, struct slow5_aux_meta *aux_meta,
for (uint64_t i = 0; i < aux_meta->num; ++ i) {
struct slow5_rec_aux_data aux_data = { 0 };

bool hacky_malloc_flag = 0;
bool malloc_flag = false;

khint_t pos = kh_get(slow5_s2a, read->aux_map, aux_meta->attrs[i]);
if (pos != kh_end(read->aux_map)) {
@@ -3607,7 +3666,7 @@ void *slow5_rec_to_mem(struct slow5_rec *read, struct slow5_aux_meta *aux_meta,
aux_data.len = 1;
aux_data.bytes = SLOW5_AUX_TYPE_META[aux_meta->types[i]].size;
aux_data.data = (uint8_t *) malloc(aux_data.bytes);
hacky_malloc_flag = 1;
malloc_flag = true;
slow5_memcpy_null_type(aux_data.data, aux_meta->types[i]);
}

@@ -3633,7 +3692,7 @@ void *slow5_rec_to_mem(struct slow5_rec *read, struct slow5_aux_meta *aux_meta,
}
curr_len += aux_data.bytes;

if(hacky_malloc_flag){
if (malloc_flag) {
free(aux_data.data);
}
}
7 changes: 4 additions & 3 deletions slow5lib/src/slow5_extra.h
@@ -13,9 +13,10 @@ extern "C" {
**************************************************************************************************/

/*
These functions are not to be exposed to the public
Used for slow5tools
Any change to a function prototype here means slow5tools must be fixed
IMPORTANT: The low-level API is not yet finalised or documented and is only for internal use.
If anyone is interested, please open a GitHub issue, rather than trying to figure out from the code.
Function prototypes can be changed without notice or completely removed. So do NOT use these functions in your code.
these functions are used by slow5tools and pyslow5 - so any change to a function here means slow5tools and pyslow5 must be fixed.
*/

// slow5 file