In this study, we propose to use the following datasets:
- Cyclic transaction dataset: dataset made available in Cyclic Arbitrage in Decentralized Exchange Markets. It contains information about arbitrage cycles that were exploited on DEXes.
- Uniswap rates preceding cyclic transaction dataset: dataset gathered in this study. It contains the rates and gas prices preceding cycles (600 transactions for each token-pair Uniswap pool).
To obtain these datasets, please follow the instructions below:
- Run the script download_uniswap_cycles.sh. It downloads the Cyclic transaction dataset.
- Download the Uniswap rates preceding cyclic transaction dataset that was posted on Kaggle here. These data were previously fetched using the Bitquery platform (using the script rates_from_Bitquery.py).
Note: if you have access to the IZAR EPFL cluster, the simplest way to get the datasets is to copy our data directory, which was made publicly available under the following folder: scratch/izar/lgiordan/data/. Use the following bash command to get it:
cp -a scratch/izar/lgiordan/dex-cyclic-arbitrage/data/ home/$user/dex-cyclic-arbitrage/data/
This command assumes that you previously cloned our git repository in the root of your home directory.
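If a Python alternative to the copy command is preferred, the following is a minimal sketch using only the standard library. The function name `copy_data_dir` is hypothetical and not part of the repository. It is not an exact equivalent of `cp -a`: file timestamps and permission bits are kept, but ownership is not, and an existing destination directory is merged into rather than nested. It assumes Python 3.8 or later (for `dirs_exist_ok`).

```python
import shutil
from pathlib import Path


def copy_data_dir(source, destination):
    """Copy the directory tree at `source` into `destination`.

    Hypothetical helper for illustration; not part of the repository.
    """
    source = Path(source)
    destination = Path(destination)
    if not source.is_dir():
        raise FileNotFoundError(f"source directory not found: {source}")
    # copytree copies files with shutil.copy2 by default, which preserves
    # timestamps and permission bits. symlinks=True copies links as links.
    # dirs_exist_ok=True merges into an existing destination directory.
    shutil.copytree(source, destination, symlinks=True, dirs_exist_ok=True)
    return destination
```

The two arguments would be the source and destination paths used in the bash command above.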
If you do not want to download and process the entire set of data, you can directly download a CSV sample of liquid data on Kaggle. This dataset is meant to be saved in /data/liquid. Once downloaded, you can run the script build_embedding_features.py and follow the Data Processing instructions.
This sample dataset contains the first 1_000_000 rows of the original data. Please note that the results of the analysis (embedding extraction, clustering, prediction) will be different when using the subsample.
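To illustrate what "the first N rows" of a CSV file means in practice, here is a minimal sketch that reads only the header and the first `n_rows` data rows using the standard library. The function name `read_first_rows` is hypothetical, and this is not necessarily how the Kaggle sample was produced; the actual file names under /data/liquid are not assumed here.

```python
import csv
from itertools import islice


def read_first_rows(csv_path, n_rows):
    """Return (header, rows) with at most `n_rows` data rows from a CSV file.

    Hypothetical helper for illustration; not part of the repository.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)  # None if the file is empty
        # islice stops after n_rows, so the rest of the file is never read.
        rows = list(islice(reader, n_rows))
    return header, rows
```

Because the reader stops early, this approach can be used to inspect a very large file without loading it entirely into memory.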
The sanity check folder contains the steps undertaken to compare the fetched data with the data available on Etherscan, in order to check the validity of our scripts.