Submit new COS version
bolunfeng committed Aug 30, 2017
1 parent e5562d2 commit bca1624
Showing 4 changed files with 85 additions and 10 deletions.
52 changes: 47 additions & 5 deletions README.md
@@ -2,7 +2,7 @@

Powered by GTXLab of Genetalks.

Technical preview download URL: https://github.com/Genetalks/gtz/archive/0.2.2g_tech_preview.tar.gz
Technical preview download URL: https://github.com/Genetalks/gtz/archive/0.2.2h_tech_preview.tar.gz

[中文说明](https://github.com/Genetalks/gtz/blob/master/README_chs.md "Markdown").

@@ -12,7 +12,7 @@ GTX Compressor is a fastq compressor and also can be used as a generic data comp

GTX Compressor compresses FASTQ files with 33 quality values (NA12878_1.fastq), approximately 200GB in size, to 19% of the original size in less than 13 minutes on an AWS R4.8xlarge machine (or a server of the same configuration), at a speed of more than 256MB/s. For FASTQ data produced by X10 with only **7 quality values, GTX Compressor can reach a compression ratio of 5.5%.**

**GTX Compressor provides a "Directly compress to the cloud" function**. For commercial use, users not only need to store the massive data generated by gene sequencing locally, but also need to transfer the data to the cloud quickly and reliably. GTX Compressor can compress fastq files and concurrently transfer the compressed data to the Amazon AWS S3 platform or Ali cloud OSS platform, with the same compression speed and compression ratio as local compression. Over an ordinary 100Mbits Internet line, GTX Compressor can compress a 200GB Fastq file directly to the cloud in just 30 minutes.
**GTX Compressor provides a "Directly compress to the cloud" function**. For commercial use, users not only need to store the massive data generated by gene sequencing locally, but also need to transfer the data to the cloud quickly and reliably. GTX Compressor can compress fastq files and concurrently transfer the compressed data to the Amazon AWS S3 platform, Ali cloud OSS platform or Tencent cloud COS platform, with the same compression speed and compression ratio as local compression. Over an ordinary 100Mbits Internet line, GTX Compressor can compress a 200GB Fastq file directly to the cloud in just 30 minutes.

## System highlights

@@ -39,7 +39,7 @@ The download package contains two tar.gz packages for the ubuntu version and the

```
USAGE:
./gtz [--list] [-e <string>] [-f] [--endpoint <string>] [--timeout <string>]
./gtz [--list] [-e <string>] [-f] [--endpoint <string>] [--appid <string>] [--timeout <string>]
[--secret-access-key <string>] [--access-key-id <string>] [-b
<string>] [-s <string>] [-c] [-n <string>] [-l <string>] [-i]
[-d] [--delete] [-a] [-g <number>] [-o <string>] [--] [--version]
@@ -53,6 +53,7 @@ General Options Instruction:
- \-\- access-key-id: Specifies the cloud platform user ID
- \-\- secret-access-key: Specifies the cloud platform user key
- \-\- endpoint: Specifies the access domain name and data center of the Ali cloud OSS platform
- \-\- appid: Specifies the user ID (APPID) of the Tencent cloud COS platform

Compression Option Description:
- -f, \-\- force
@@ -80,7 +81,10 @@ export access_key_id=xxxxxx

export secret_access_key=xxxxxx

export endpoint=xxxxxx (Only set when transferring to OSS)
export endpoint=xxxxxx (Only set when transferring to OSS or COS)

export appid=xxxxxx (Only set when transferring to COS)
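
The variable requirements above can be checked before starting a long upload. The following is a minimal sketch, not part of gtz: `missing_cloud_env` is a hypothetical helper name, and it assumes that `access_key_id` and `secret_access_key` are needed for every cloud destination, which the README implies but does not state outright.

```shell
# Hypothetical helper, not part of gtz.
# Prints the names of the variables that are still unset for a destination
# URL and returns non-zero if any are missing.
# Assumption: access_key_id and secret_access_key are needed for s3://,
# oss:// and cos://; endpoint for oss:// and cos://; appid for cos:// only.
missing_cloud_env() (
    case "$1" in
        s3://*)  required="access_key_id secret_access_key" ;;
        oss://*) required="access_key_id secret_access_key endpoint" ;;
        cos://*) required="access_key_id secret_access_key endpoint appid" ;;
        *)       required="" ;;  # local path: nothing to check
    esac
    missing=""
    for name in $required; do
        eval "value=\${$name:-}"   # read the variable whose name is in $name
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    echo "${missing# }"            # space-separated list, empty if complete
    [ -z "$missing" ]
)
```

A possible use is `missing_cloud_env cos://gtz/out.gtz || echo "set the variables listed above first"`. The function body runs in a subshell, so its working variables do not leak into the calling shell.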


### Compression examples

@@ -91,6 +95,16 @@ Direct compression to Ali OSS:
or

zcat source.fastq.gz | ./gtz -o oss://gt-compress/out.gtz


Direct compression to Tencent COS:

./gtz -o cos://gtz/out.gtz source.fastq (or source.fastq.gz; gtz supports recompressing fastq.gz files)

or

zcat source.fastq.gz | ./gtz -o cos://gt-compress/out.gtz


Direct compression to AWS S3

@@ -116,11 +130,14 @@ Massive small files (<500MB each) compression:

tar -cf - ./you_dir_or_file | gtz -o /dest.gtz

- Direct compression to AWS S3 or Aliyun OSS:
- Direct compression to AWS S3, Aliyun OSS or Tencent COS:

tar -cf - ./you_dir_or_file | gtz -o s3://bucket/dest.gtz

tar -cf - ./you_dir_or_file | gtz -o oss://bucket/dest.gtz
tar -cf - ./you_dir_or_file | gtz -o cos://bucket/dest.gtz


- Direct decompression:

@@ -129,6 +146,9 @@ Massive small files (<500MB each) compression:
gtz -c -d s3://bucket/dest.gtz | tar -xf -

gtz -c -d oss://bucket/dest.gtz | tar -xf -
gtz -c -d cos://bucket/dest.gtz | tar -xf -


Notice: For large files (500MB or more) or directories full of large files, especially fastq or fastq.gz files, we suggest using GTZ to compress and package them directly; it will be faster.
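
The 500MB guideline can be turned into a small pre-check. This is only a sketch: `suggest_mode` and `THRESHOLD_BYTES` are hypothetical names, and reading 500MB as 500 * 1024 * 1024 bytes is an assumption, since the README does not define the unit precisely.

```shell
# Hypothetical helper, not part of gtz.
# Assumption: 500MB is taken as 500 * 1024 * 1024 bytes.
THRESHOLD_BYTES=$((500 * 1024 * 1024))

# Prints "direct" for files at or above the threshold (compress with gtz
# directly) and "tar" for smaller files (pack with tar and pipe into gtz).
suggest_mode() (
    size=$(wc -c < "$1") || exit 1   # fails if the file cannot be read
    size=$((size))                   # normalize any padding in wc output
    if [ "$size" -ge "$THRESHOLD_BYTES" ]; then
        echo direct
    else
        echo tar
    fi
)
```

For example, `suggest_mode source.fastq` could decide between `./gtz -o out.gtz source.fastq` and the `tar -cf - ... | gtz -o ...` pipeline shown earlier.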

@@ -137,6 +157,8 @@ Notice: Large size files (500MB or more) or the directory full of Large size fil

./gtz -a -o oss://gtz/out.gtz /A/source2.fastq # -a denotes append mode

./gtz -a -o cos://gtz/out.gtz /A/source2.fastq # -a denotes append mode

./gtz -a -o s3://gtz/out.gtz /A/source2.fastq # -a denotes append mode

./gtz -a -o gtz/out.gtz /A/source2.fastq # -a denotes append mode
@@ -146,6 +168,8 @@ Notice: Large size files (500MB or more) or the directory full of Large size fil

./gtz_0.2.0_ubuntu_release/gtz --list -d oss://gtz/out.gtz

./gtz_0.2.0_ubuntu_release/gtz --list -d cos://gtz/out.gtz

./gtz_0.2.0_ubuntu_release/gtz --list -d s3://gtz/out.gtz

./gtz_0.2.0_ubuntu_release/gtz --list -d gtz/out.gtz
@@ -173,6 +197,24 @@ Direct decompression from Ali OSS
./gtz -c -e source.fastq -d oss://gtz/out.gtz | gzip -c > source.gz


Direct decompression from Tencent COS

./gtz -d cos://gtz/out.gtz

Decompress several files separately:

# -e denotes the target files to decompress, separated by ":"
./gtz -e source.fastq:/A/source2.fastq -d cos://gtz/out.gtz

Decompress the target files to a pipe:

# -c denotes output files to the console; -e denotes the target decompression file.
./gtz -c -e source.fastq -d cos://gtz/out.gtz > myfile.txt

or

./gtz -c -e source.fastq -d cos://gtz/out.gtz | gzip -c > source.gz
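
When the list of files to extract comes from a script, the ":"-separated value for `-e` can be built rather than typed. A minimal sketch, assuming no file name itself contains ":"; `join_by_colon` is a hypothetical helper, not part of gtz.

```shell
# Hypothetical helper, not part of gtz.
# Joins its arguments with ":" to form the value passed to -e.
# Assumption: no file name contains ":" itself.
join_by_colon() (
    IFS=:                 # "$*" joins arguments with the first character of IFS
    printf '%s\n' "$*"
)

# Print the resulting command line without running gtz.
echo ./gtz -e "$(join_by_colon source.fastq /A/source2.fastq)" -d cos://gtz/out.gtz
```

Dropping the leading `echo` would run the command, provided `./gtz` exists and the cloud variables are set.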

Direct decompression from AWS S3

./gtz -d s3://gtz/out.gtz
43 changes: 38 additions & 5 deletions README_chs.md
@@ -2,7 +2,7 @@

Powered by GTXLab of Genetalks.

Technical preview download URL: https://github.com/Genetalks/gtz/archive/0.2.2g_tech_preview.tar.gz
Technical preview download URL: https://github.com/Genetalks/gtz/archive/0.2.2h_tech_preview.tar.gz


[English Manual](https://github.com/Genetalks/gtz/blob/master/README.md "Markdown").
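@@ -13,7 +13,7 @@ GTX Compressor is developed by the GTX Lab of Genetalks for large-scale data (

On an AWS C4.8xlarge machine (or a server of the same configuration), GTX Compressor can **compress a FASTQ file with 33 quality values (NA12878_1.fastq), nearly 200GB in size, to 19% of its original size within 13 minutes, at a speed of more than 114MB/s**; for FASTQ data with only **7 quality values, such as X10 data, the compression ratio can reach 5.5%**

**GTX Compressor provides a "direct compression to the cloud" function**. In commercial use, users not only need to store the massive data produced by sequencing locally, but also urgently need the ability to transfer the data to the cloud quickly and reliably. The data compression engine of GTX Compressor allows users to compress fastq files and store them directly on the Amazon AWS platform or the Ali cloud OSS platform, while keeping the same compression speed and compression efficiency as local compression. Over an ordinary 100Mbits Internet line, a 200GB Fastq file can be reliably compressed to the cloud in just 30 minutes.
**GTX Compressor provides a "direct compression to the cloud" function**. In commercial use, users not only need to store the massive data produced by sequencing locally, but also urgently need the ability to transfer the data to the cloud quickly and reliably. The data compression engine of GTX Compressor allows users to compress fastq files and store them directly on the Amazon AWS platform, the Ali cloud OSS platform or the Tencent cloud COS platform, while keeping the same compression speed and compression efficiency as local compression. Over an ordinary 100Mbits Internet line, a 200GB Fastq file can be reliably compressed to the cloud in just 30 minutes.

## System highlights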

@@ -46,7 +46,7 @@ On an AWS C4.8xlarge machine (or a server of the same configuration), GTX Compressor can **

```
USAGE:
./gtz [--list] [-e <string>] [-f] [--endpoint <string>] [--timeout <string>]
./gtz [--list] [-e <string>] [-f] [--endpoint <string>] [--appid <string>] [--timeout <string>]
[--secret-access-key <string>] [--access-key-id <string>] [-b
<string>] [-s <string>] [-c] [-n <string>] [-l <string>] [-i]
[-d] [--delete] [-a] [-g <number>] [-o <string>] [--] [--version]
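@@ -61,7 +61,9 @@ USAGE:
- \-\-version: prints the version number of the gt_compress program
- \-\-access-key-id : specifies the cloud platform user ID
- \-\-secret-access-key: specifies the cloud platform user key
- \-\-endpoint : specifies the access domain name and data center of the Ali cloud OSS platform
- \-\-endpoint : specifies the access domain name and data center of the Ali cloud OSS platform or the Tencent cloud COS platform
- \-\-appid : specifies the user ID of the Tencent cloud COS platform


Compression option description: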

@@ -93,7 +95,10 @@ export access_key_id=xxxxxx

export secret_access_key=xxxxxx

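export endpoint=xxxxxx (this environment variable only needs to be set when uploading to OSS)
export endpoint=xxxxxx (this environment variable only needs to be set when uploading to OSS or COS)

export appid=xxxxxx (this environment variable only needs to be set when uploading to COS)


### Compression examples

@@ -104,6 +109,14 @@ export endpoint=xxxxxx (this environment variable only needs to be set when uploading to OSS)
or
# zcat pipes the fastq data into gtz for compression; the fastq stream decompressed by zcat is stored in out.gtz under the file name stdin
zcat source.fastq.gz | ./gtz -o oss://gt-compress/out.gtz
Direct compression to Tencent COS:

./gtz -o cos://gtz/out.gtz source.fastq (or source.fastq.gz; gtz supports recompressing fastq.gz files)

or
# zcat pipes the fastq data into gtz for compression; the fastq stream decompressed by zcat is stored in out.gtz under the file name stdin
zcat source.fastq.gz | ./gtz -o cos://gt-compress/out.gtz

Direct compression to AWS S3: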

@@ -132,6 +145,8 @@ export endpoint=xxxxxx (this environment variable only needs to be set when uploading to OSS)

tar -cf - ./you_dir_or_file | gtz -o oss://bucket/dest.gtz

tar -cf - ./you_dir_or_file | gtz -o cos://bucket/dest.gtz

Transfer back and unpack directly:

gtz -c -d s3://bucket/dest.gtz | tar -xf -
@@ -142,6 +157,8 @@ export endpoint=xxxxxx (this environment variable only needs to be set when uploading to OSS)

./gtz -a -o oss://gtz/out.gtz /A/source2.fastq # -a denotes append mode

./gtz -a -o cos://gtz/out.gtz /A/source2.fastq # -a denotes append mode

./gtz -a -o s3://gtz/out.gtz /A/source2.fastq # -a denotes append mode

./gtz -a -o gtz/out.gtz /A/source2.fastq # -a denotes append mode
@@ -150,6 +167,8 @@ export endpoint=xxxxxx (this environment variable only needs to be set when uploading to OSS)

./gtz_0.2.0_ubuntu_release/gtz --list -d oss://gtz/out.gtz

./gtz_0.2.0_ubuntu_release/gtz --list -d cos://gtz/out.gtz

./gtz_0.2.0_ubuntu_release/gtz --list -d s3://gtz/out.gtz

./gtz_0.2.0_ubuntu_release/gtz --list -d gtz/out.gtz
@@ -170,6 +189,20 @@ export endpoint=xxxxxx (this environment variable only needs to be set when uploading to OSS)
or
./gtz -c -e source.fastq -d oss://gtz/out.gtz | gzip -c > source.gz

Decompression from Tencent COS:

./gtz -d cos://gtz/out.gtz

Or extract several files separately:
# -e denotes the files to extract; separate the file names with ":"
./gtz -e source.fastq:/A/source2.fastq -d cos://gtz/out.gtz

Or extract one file to a pipe:
# -c denotes output to the console; -e denotes extracting one of the files
./gtz -c -e source.fastq -d cos://gtz/out.gtz > myfile.txt
or
./gtz -c -e source.fastq -d cos://gtz/out.gtz | gzip -c > source.gz

Decompression from AWS S3:

./gtz -d s3://gtz/out.gtz
Binary file not shown.
Binary file not shown.
