Merge pull request #23 from aoki-h-jp/feature/1.1.0/renewal
Feature/1.1.0/renewal
aoki-h-jp authored Jan 4, 2025
2 parents 4185226 + f1f6626 commit 01999ee
Showing 7 changed files with 280 additions and 114 deletions.
34 changes: 0 additions & 34 deletions .github/workflows/Formatter.yml

This file was deleted.

42 changes: 24 additions & 18 deletions README.md
@@ -1,18 +1,20 @@
# binance-bulk-downloader

[![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/release/python-3110//)
[![Format code](https://github.com/aoki-h-jp/binance-bulk-downloader/actions/workflows/Formatter.yml/badge.svg?branch=main)](https://github.com/aoki-h-jp/binance-bulk-downloader/actions/workflows/Formatter.yml)
[![pytest](https://github.com/aoki-h-jp/binance-bulk-downloader/actions/workflows/pytest.yaml/badge.svg)](https://github.com/aoki-h-jp/binance-bulk-downloader/actions/workflows/pytest.yaml)

## Python library for bulk downloading Binance historical data

A Python library to efficiently and concurrently download historical data files from Binance. Supports all asset types (spot, USDT-M, COIN-M, options) and all data frequencies.

## Installation

```bash
pip install git+https://github.com/aoki-h-jp/binance-bulk-downloader
pip install binance-bulk-downloader
```

## Usage

### Download all klines 1m data (USDT-M futures)

```python
@@ -41,6 +43,7 @@ downloader.run_download()
```
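For reference, a minimal sketch of the intended usage, since the example block above is collapsed in this diff view. The constructor keyword arguments (data_type, data_frequency, asset, timeperiod_per_file) are assumed from the `__init__` shown in downloader.py further below, so treat this as an illustration rather than the exact README example:

```python
# Hypothetical sketch; parameter names are assumed from downloader.py in this PR,
# not copied from the collapsed README example.
from binance_bulk_downloader.downloader import BinanceBulkDownloader

# Download all 1-minute klines for USDT-M futures.
downloader = BinanceBulkDownloader(
    data_type="klines",
    data_frequency="1m",
    asset="um",                   # USDT-M futures
    timeperiod_per_file="daily",  # daily or monthly archive files
)
downloader.run_download()
```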

### Other examples

Please see the /example directory.

```bash
@@ -54,25 +57,26 @@ python -m pytest
```

## Available data types

✅: Implemented and tested. ❌: Not available on Binance.

### by data_type

| data_type | spot | um | cm | options |
| :------------------ | :--: | :--: | :--: | :-----: |
| aggTrades |||||
| bookDepth |||||
| bookTicker |||||
| fundingRate |||||
| indexPriceKlines |||||
| klines |||||
| liquidationSnapshot |||||
| markPriceKlines |||||
| metrics |||||
| premiumIndexKlines |||||
| trades |||||
| BVOLIndex |||||
| EOHSummary |||||
| data_type | spot | um | cm | options |
| :------------------ | :--: | :--: | :--: | :-----: |
| aggTrades |||||
| bookDepth |||||
| bookTicker |||||
| fundingRate |||||
| indexPriceKlines |||||
| klines |||||
| liquidationSnapshot |||||
| markPriceKlines |||||
| metrics |||||
| premiumIndexKlines |||||
| trades |||||
| BVOLIndex |||||
| EOHSummary |||||

### by data_frequency (klines, indexPriceKlines, markPriceKlines, premiumIndexKlines)

@@ -96,14 +100,16 @@ python -m pytest
| 1mo |||||

## If you want to report a bug or request a feature

Please create an issue on this repository!

## Disclaimer

This project is for educational purposes only. You should not construe any such information or other material as legal,
tax, investment, financial, or other advice. Nothing contained here constitutes a solicitation, recommendation,
endorsement, or offer by me or any third party service provider to buy or sell any securities or other financial
instruments in this or in any other jurisdiction in which such solicitation or offer would be unlawful under the
securities laws of such jurisdiction.

Under no circumstances will I be held responsible or liable in any way for any claims, damages, losses, expenses, costs,
or liabilities whatsoever, including, without limitation, any direct or indirect damages for loss of profits.
or liabilities whatsoever, including, without limitation, any direct or indirect damages for loss of profits.
176 changes: 117 additions & 59 deletions binance_bulk_downloader/downloader.py
@@ -11,12 +11,15 @@

# import third-party libraries
import requests
from rich import print
from rich.console import Console
from rich.progress import track
from rich.panel import Panel

# import my libraries
from binance_bulk_downloader.exceptions import (
BinanceBulkDownloaderDownloadError, BinanceBulkDownloaderParamsError)
BinanceBulkDownloaderDownloadError,
BinanceBulkDownloaderParamsError,
)


class BinanceBulkDownloader:
@@ -130,47 +133,55 @@ def __init__(
self._timeperiod_per_file = timeperiod_per_file
self.marker = None
self.is_truncated = True
self.downloaded_list = []
self.downloaded_list: list[str] = []
self.console = Console()

def _check_params(self) -> None:
"""
Check params
:return: None
"""
if (
self._data_type
not in self._DATA_TYPE_BY_ASSET[self._asset][self._timeperiod_per_file]
):
# Check asset type first
if self._asset not in self._ASSET + self._FUTURES_ASSET + self._OPTIONS_ASSET:
raise BinanceBulkDownloaderParamsError(
f"data_type must be {self._DATA_TYPE_BY_ASSET[self._asset][self._timeperiod_per_file]}."
f"asset must be {self._ASSET + self._FUTURES_ASSET + self._OPTIONS_ASSET}."
)

# Check time period
if self._timeperiod_per_file not in ["daily", "monthly"]:
raise BinanceBulkDownloaderParamsError(
"timeperiod_per_file must be daily or monthly."
)

# Check data frequency
if self._data_frequency not in self._DATA_FREQUENCY:
raise BinanceBulkDownloaderParamsError(
f"data_frequency must be {self._DATA_FREQUENCY}."
)

if self._asset not in self._ASSET + self._FUTURES_ASSET + self._OPTIONS_ASSET:
# Check if asset exists in DATA_TYPE_BY_ASSET
if self._asset not in self._DATA_TYPE_BY_ASSET:
raise BinanceBulkDownloaderParamsError(
f"asset must be {self._ASSET + self._FUTURES_ASSET + self._OPTIONS_ASSET}."
f"asset {self._asset} is not supported."
)

if self._timeperiod_per_file not in ["daily", "monthly"]:
# Check if timeperiod exists for the asset
asset_data = self._DATA_TYPE_BY_ASSET.get(self._asset, {})
if self._timeperiod_per_file not in asset_data:
raise BinanceBulkDownloaderParamsError(
f"timeperiod_per_file must be daily or monthly."
f"timeperiod {self._timeperiod_per_file} is not supported for {self._asset}."
)

if not self._data_type in self._DATA_TYPE_BY_ASSET.get(self._asset, None).get(
self._timeperiod_per_file, None
):
# Check data type
valid_data_types = asset_data.get(self._timeperiod_per_file, [])
if self._data_type not in valid_data_types:
raise BinanceBulkDownloaderParamsError(
f"data_type must be {self._DATA_TYPE_BY_ASSET[self._asset][self._timeperiod_per_file]}."
f"data_type must be one of {valid_data_types}."
)

# Check 1s frequency restriction
if self._data_frequency == "1s":
if self._asset == "spot":
pass
else:
if self._asset != "spot":
raise BinanceBulkDownloaderParamsError(
f"data_frequency 1s is not supported for {self._asset}."
)
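To illustrate how the reordered checks behave, here is a rough sketch. It assumes the constructor simply stores these keyword arguments and that calling the internal `_check_params()` directly mirrors what `_download()` does:

```python
# Illustrative only; argument names and values are assumptions based on the
# checks above, and _check_params() is normally called from _download().
from binance_bulk_downloader.downloader import BinanceBulkDownloader
from binance_bulk_downloader.exceptions import BinanceBulkDownloaderParamsError

downloader = BinanceBulkDownloader(
    data_type="klines",
    data_frequency="1s",          # 1s klines are published for spot only
    asset="um",                   # USDT-M futures, so this combination is invalid
    timeperiod_per_file="daily",
)
try:
    downloader._check_params()
except BinanceBulkDownloaderParamsError as e:
    print(f"Rejected: {e}")       # e.g. "data_frequency 1s is not supported for um."
```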
@@ -183,7 +194,7 @@ def _get_file_list_from_s3_bucket(self, prefix, marker=None, is_truncated=False)
:param is_truncated: is truncated
:return: list of files
"""
print(f"[bold blue]Get file list[/bold blue]: " + prefix)
self.console.print(Panel(f"Getting file list: {prefix}", style="blue"))
params = {"prefix": prefix, "max-keys": 1000}
if marker:
params["marker"] = marker
@@ -254,50 +265,95 @@ def _download(self, prefix) -> None:
:param prefix: s3 bucket prefix
:return: None
"""
self._check_params()
zip_destination_path = os.path.join(self._destination_dir, prefix)
csv_destination_path = os.path.join(
self._destination_dir, prefix.replace(".zip", ".csv")
)
try:
self._check_params()
zip_destination_path = os.path.join(self._destination_dir, prefix)
csv_destination_path = os.path.join(
self._destination_dir, prefix.replace(".zip", ".csv")
)

# Make directory if not exists
if not os.path.exists(os.path.dirname(zip_destination_path)):
os.makedirs(os.path.dirname(zip_destination_path))
# Make directory if not exists
if not os.path.exists(os.path.dirname(zip_destination_path)):
try:
os.makedirs(os.path.dirname(zip_destination_path))
except (PermissionError, OSError) as e:
self.console.print(
f"Directory creation error: {str(e)}", style="red"
)
raise BinanceBulkDownloaderDownloadError from e

# Don't download if already exists
if os.path.exists(csv_destination_path):
print(f"[yellow]Already exists: {csv_destination_path}[/yellow]")
return
# Don't download if already exists
if os.path.exists(csv_destination_path):
self.console.print(
f"Already exists: {csv_destination_path}", style="yellow"
)
return

url = f"{self._BINANCE_DATA_DOWNLOAD_BASE_URL}/{prefix}"
print(f"[bold blue]Downloading {url}[/bold blue]")
try:
response = requests.get(url, zip_destination_path)
print(f"[green]Downloaded: {url}[/green]")
except requests.exceptions.HTTPError:
print(f"[red]HTTP Error: {url}[/red]")
return None
url = f"{self._BINANCE_DATA_DOWNLOAD_BASE_URL}/{prefix}"
self.console.print(Panel(f"Downloading: {url}", style="blue"))

with open(zip_destination_path, "wb") as file:
for chunk in response.iter_content(chunk_size=8192):
file.write(chunk)
try:
response = requests.get(url)
response.raise_for_status()
self.console.print(f"Downloaded: {url}", style="green")
except (
requests.exceptions.RequestException,
requests.exceptions.HTTPError,
requests.exceptions.ConnectionError,
requests.exceptions.Timeout,
) as e:
self.console.print(f"Download error: {str(e)}", style="red")
raise BinanceBulkDownloaderDownloadError from e

try:
unzipped_path = "/".join(zip_destination_path.split("/")[:-1])
with zipfile.ZipFile(zip_destination_path) as existing_zip:
existing_zip.extractall(
csv_destination_path.replace(csv_destination_path, unzipped_path)
)
print(f"[green]Unzipped: {zip_destination_path}[/green]")
except BadZipfile:
print(f"[red]Bad Zip File: {zip_destination_path}[/red]")
os.remove(zip_destination_path)
print(f"[green]Removed: {zip_destination_path}[/green]")
raise BinanceBulkDownloaderDownloadError
try:
with open(zip_destination_path, "wb") as file:
for chunk in response.iter_content(chunk_size=8192):
file.write(chunk)
except OSError as e:
self.console.print(f"File write error: {str(e)}", style="red")
raise BinanceBulkDownloaderDownloadError from e

try:
unzipped_path = "/".join(zip_destination_path.split("/")[:-1])
with zipfile.ZipFile(zip_destination_path) as existing_zip:
existing_zip.extractall(
csv_destination_path.replace(
csv_destination_path, unzipped_path
)
)
self.console.print(
f"Unzipped: {zip_destination_path}", style="green"
)
except BadZipfile as e:
self.console.print(f"Bad Zip File: {zip_destination_path}", style="red")
if os.path.exists(zip_destination_path):
os.remove(zip_destination_path)
self.console.print(
f"Removed: {zip_destination_path}", style="green"
)
raise BinanceBulkDownloaderDownloadError from e
except OSError as e:
self.console.print(f"Unzip error: {str(e)}", style="red")
if os.path.exists(zip_destination_path):
os.remove(zip_destination_path)
self.console.print(
f"Removed: {zip_destination_path}", style="green"
)
raise BinanceBulkDownloaderDownloadError from e

# Delete zip file
os.remove(zip_destination_path)
print(f"[green]Removed: {zip_destination_path}[/green]")
# Delete zip file
try:
os.remove(zip_destination_path)
self.console.print(f"Removed: {zip_destination_path}", style="green")
except OSError as e:
self.console.print(f"File removal error: {str(e)}", style="red")
raise BinanceBulkDownloaderDownloadError from e

except Exception as e:
if not isinstance(e, BinanceBulkDownloaderDownloadError):
self.console.print(f"Unexpected error: {str(e)}", style="red")
raise BinanceBulkDownloaderDownloadError from e
raise

@staticmethod
def make_chunks(lst, n) -> list:
@@ -314,7 +370,9 @@ def run_download(self):
Download concurrently
:return: None
"""
print(f"[bold blue]Downloading {self._data_type}[/bold blue]")
self.console.print(
Panel(f"Starting download for {self._data_type}", style="blue bold")
)

while self.is_truncated:
file_list_generator = self._get_file_list_from_s3_bucket(
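Since `_download()` now wraps HTTP, filesystem, and zip failures in `BinanceBulkDownloaderDownloadError`, a caller can handle any failed file in one place. A minimal sketch, assuming the constructor arguments shown above and assuming the error propagates out of `run_download()` rather than being swallowed internally:

```python
# Hedged sketch; constructor arguments are assumed from downloader.py above,
# and error propagation from run_download() is an assumption, not confirmed here.
from binance_bulk_downloader.downloader import BinanceBulkDownloader
from binance_bulk_downloader.exceptions import BinanceBulkDownloaderDownloadError

downloader = BinanceBulkDownloader(
    data_type="klines",
    data_frequency="1h",
    asset="spot",
    timeperiod_per_file="monthly",
)
try:
    downloader.run_download()
except BinanceBulkDownloaderDownloadError:
    # Network errors, bad zip archives, and file write/permission errors
    # all surface as this single exception type.
    print("Download failed; see the console output above for the cause.")
```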
4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,4 +1,4 @@
requests~=2.28.2
setuptools~=68.1.2
requests~=2.32.0
setuptools~=70.0.0
rich~=10.16.2
pytest~=4.6.11
2 changes: 1 addition & 1 deletion setup.py
@@ -2,7 +2,7 @@

setup(
name="binance-bulk-downloader",
version="1.0.4",
version="1.1.0",
description="A Python library to efficiently and concurrently download historical data files from Binance. Supports all asset types (spot, futures, options) and all frequencies.",
install_requires=["requests", "rich", "pytest"],
author="aoki-h-jp",
1 change: 1 addition & 0 deletions test/prefix/file.zip
@@ -0,0 +1 @@
dummy content