
Data Acquisition


In this study, we use the following datasets:

  1. Cyclic transaction dataset: the dataset made available in *Cyclic Arbitrage in Decentralized Exchange Markets*. It contains information about arbitrage cycles that were exploited on DEXes.
  2. Uniswap rates preceding cyclic transactions dataset: a dataset gathered in this study. It contains the exchange rates and gas prices preceding each cycle (600 transactions for each token-pair Uniswap pool).

To obtain these datasets, please follow the instructions below:

  1. Run the script download_uniswap_cycles.sh. It downloads the Cyclic transaction dataset.
  2. Download the Uniswap rates preceding cyclic transactions dataset that was posted on Kaggle here. These data were previously fetched from the Bitquery platform (using the script rates_from_Bitquery.py).
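Put together, the two steps might look like the following sketch. Note that `<owner>/<dataset>` is a placeholder for the Kaggle slug linked above (not filled in here), and the `kaggle` CLI with a configured API token is assumed:

```shell
# Step 1: fetch the Cyclic transaction dataset.
bash download_uniswap_cycles.sh

# Step 2: fetch the Uniswap rates dataset from Kaggle
# (replace the placeholder slug with the one linked above).
kaggle datasets download -d <owner>/<dataset> -p data/ --unzip
```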

IZAR EPFL Cluster

Note: if you have access to the IZAR EPFL cluster, the simplest way to get the datasets is to copy our data directory, which was made publicly available under /scratch/izar/lgiordan/dex-cyclic-arbitrage/data/. Use the following bash command to get it:

```shell
cp -a /scratch/izar/lgiordan/dex-cyclic-arbitrage/data/ /home/$USER/dex-cyclic-arbitrage/data/
```

This assumes you previously cloned our git repository at the root of your home directory.

Data sample

In case you do not want to download and process the entire dataset, you can directly download a CSV sample of the liquid data on Kaggle. This dataset is meant to be saved in /data/liquid. Once downloaded, you can run the script build_embedding_features.py and follow the Data Processing instructions. This sample contains the first 1,000,000 rows of the original data. Please note that the results of the analysis (embedding extraction, clustering, prediction) will differ when using the subsample.
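If you prefer to build a similar subsample yourself, a minimal sketch is shown below. It assumes the original data is a single CSV file with a header row; the file names passed in are up to you:

```python
import csv
from itertools import islice

def take_sample(src_path, dst_path, n_rows=1_000_000):
    """Copy the header plus the first n_rows data rows of a CSV file."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))             # header row
        writer.writerows(islice(reader, n_rows))  # first n_rows data rows
```

Streaming the rows with `islice` avoids loading the full file into memory, which matters for the original multi-gigabyte export.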

Notes

The sanity check folder contains the steps undertaken to compare the fetched data with data available on Etherscan, in order to validate our scripts.
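As an illustration of this kind of consistency check, a minimal sketch could compare the transaction hashes present in the fetched data with a reference export. Note that the column name `tx_hash`, the function name, and both file arguments are assumptions for illustration, not the actual sanity-check code:

```python
import csv

def missing_hashes(fetched_csv, reference_csv, key="tx_hash"):
    """Return the transaction hashes present in the reference export
    but absent from the fetched data (empty set means consistent)."""
    def load(path):
        with open(path, newline="") as f:
            return {row[key] for row in csv.DictReader(f)}
    return load(reference_csv) - load(fetched_csv)
```

An empty result would indicate that every reference transaction was captured by the fetching scripts.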