All public blockchains are essentially large networks. Analyzing Onchain data most likely involves network analysis. The existing visualizations on common data platforms like Dune currently have difficulty describing the relationships between nodes on blockchains.
Let's use the controversial FTX "hacker" address (0x59ABf3837Fa962d6853b4Cc0a19513AA031fd32b) as an example to do some network analysis (we won't debate whether it's a hacker or the Panama government). We'll look at where the ETH from this address went (we'll examine the 2-hop relationships outgoing from this address).
Tools used in the process:
- Dune: get raw data between addresses and do initial processing
- Python
- Networkx: python package for creating, manipulating and studying complex networks. Allows storing networks in standardized and non-standardized data formats, generating various random and classic networks, analyzing network structure, building network models, designing new network algorithms, drawing networks, etc.
- More info: https://networkx.org/
- Plotly: great package for visualizations, can generate interactive HTML files. Has a complementary frontend framework called DASH that is very user-friendly for data analysts without advanced engineering skills.
- More info: https://plotly.com/
- Pandas: most commonly used Python package for working with data, provides many functions and methods to enable quick and convenient data manipulation.
- More info: https://pandas.pydata.org/
- Networkx: python package for creating, manipulating and studying complex networks. Allows storing networks in standardized and non-standardized data formats, generating various random and classic networks, analyzing network structure, building network models, designing new network algorithms, drawing networks, etc.
- Etherscan API: calculating ETH Balance on Dune is too tedious, requiring pulling all data each time. We can simply get Balance from the Etherscan API.
The process can be broadly divided into the following steps:
- Get raw data from Dune
- Process relationships between nodes and handle various attribute data needed for drawing the network graph (pos, label, color, size etc.) using Networkx
- Visualize the network graph using Plotly
The SQL is quite complex so I won't go into detail, so feel free to check the URL for details if interested:
-
Get data with relationships between all relevant addresses with SQL: https://dune.com/queries/1753177
- from: sender of the transaction
- to: receiver of the transaction
- transfer_eth_balance: total ETH transferred between two
- transfer_eth_count: total number of ETH transfers between two accounts
-
Get list of all addresses and associated labels via SQL: https://dune.com/queries/2430347
- address: all addresses involved in this network analysis
- level_type: level in the network for all addresses involved (Core, Layer One, Layer Two)
- account_type: is a regular EOA, exchange, or smart contract
- label: useful aggregated info for the address into a label for subsequent visualization in python
II. Read local files into DataFrames using pandas and supplement with Balance column from Etherscan API
- Download Dune data locally (either via Dune API or copy-paste) and read into pandas from local files
## Change path to your own local file path
df_target_label = pd.read_csv(u'YOUR FILE PATH/graph_raw_label.csv')
df_target_relation = pd.read_csv(u'YOUR FILE PATH/graph_relation.csv')
## Get list of all addresses to query API
address_list = list(df_target_label.address.values)
balance_list = []
print(address_list)
- Get Balance data for all addresses via Etherscan API and write to DataFrame
while len(address_list) > 0:
for address in address_list:
api_key = "your_api_key"
try:
response = requests.get(
"https://api.etherscan.io/api?module=account&action=balance&address=" + address + "&tag=latest&apikey=" + api_key
)
# Parse the JSON response
response_json = json.loads(response.text)
# Get balance info from response
eth_balance = response_json["result"]
eth_balance = int(eth_balance)/(1E18)
balance_list.append((address,eth_balance))
address_list.remove(address)
time.sleep(1)
print(eth_balance)
except:
print('Error')
print('List Length:'+str(len(address_list)))
df_balance = pd.DataFrame(balance_list, columns=['address', 'Balance'])
df_target_label = df_target_label.merge(df_balance,left_on=['address'],right_on=['address'],how='left')
print('end')
- Add Balance to DataFrame, create Balance_level column (label based on Balance size) to control Node size in network graph later
## Define a function to return different labels based on value size, similar to CASE Statement in SQL
def get_balance_level(x):
if x == 0:
output = 'Small'
elif x > 0 and x < 1000:
output = 'Medium'
elif x > 1000 and x < 10000:
output = 'Large'
else:
output = 'Huge'
return output
df_target_label['Balance_level'] = df_target_label['Balance'].round(2).apply(lambda x: get_balance_level(x))
df_target_label['Balance'] = df_target_label['Balance'].round(2).astype('string')
df_target_label['label'] = df_target_label['label']+' | '+ df_target_label['Balance'] +' ETH'
def drew_graph(df_target_relation,df_target_label):
def add_node_base_data(df_target_relation):
df_target_relation = df_target_relation
node_list = list(set(df_target_relation['from_address'].to_list()+df_target_relation['to_address'].to_list()))
edges = list(set(df_target_relation.apply(lambda x: (x.from_address, x.to_address), axis=1).to_list()))
G.add_nodes_from(node_list)
G.add_edges_from(edges)
return node_list,edges
def add_node_attributes(df_target_label,df_key_list,df_vlaue_list,color_list):
for node, (n,p) in zip(G.nodes(), pos.items()):
G.nodes[node]['pos'] = p
G.nodes[node]['color'] = '#614433'
for id,label,layer_type,Balance_level in list(set(df_target_label.apply(lambda x: (x.address, x.label, x.level_type,x.Balance_level), axis=1).to_list())):
if node==id:
G.nodes[node]['label']=label
if Balance_level=='Large':
G.nodes[node]['size']=40
elif Balance_level=='Medium':
G.nodes[node]['size']=20
elif Balance_level=='Small':
G.nodes[node]['size']=10
elif Balance_level=='Huge':
G.nodes[node]['size']=80
for x,y,z in zip(df_key_list,df_vlaue_list,color_list):
target_list = df_target_label[df_target_label[x]==y]['address'].values.tolist()
if len(target_list)>0:
for id in target_list:
if id==node and G.nodes[node]['color']=='#614433':
G.nodes[node]['color'] = z
############### Draw all edges
def get_edge_trace(G):
xtext=[]
ytext=[]
edge_x = []
edge_y = []
for edge in G.edges():
x0, y0 = G.nodes[edge[0]]['pos']
x1, y1 = G.nodes[edge[1]]['pos']
xtext.append((x0+x1)/2)
ytext.append((y0+y1)/2)
edge_x.append(x0)
edge_x.append(x1)
edge_x.append(None)
edge_y.append(y0)
edge_y.append(y1)
edge_y.append(None)
xtext.append((x0+x1)/2)
ytext.append((y0+y1)/2)
edge_trace = go.Scatter(
x=edge_x, y=edge_y,
line=dict(width=0.5, color='#333'),
hoverinfo='none',
mode='lines')
eweights_trace = go.Scatter(x=xtext,y= ytext, mode='text',
marker_size=0.5,
text=[0.45, 0.7, 0.34],
textposition='top center',
hovertemplate='weight: %{text}<extra></extra>')
return edge_trace, eweights_trace
def get_node_trace(G):
node_x = []
node_y = []
for node in G.nodes():
x, y = G.nodes[node]['pos']
node_x.append(x)
node_y.append(y)
node_trace = go.Scatter(
x=node_x, y=node_y,
mode='markers',
hoverinfo='text',
marker=dict(
color=[],
colorscale = px.colors.qualitative.Plotly,
size=10,
line_width=0))
return node_trace
###############Define Graph
G = nx.Graph()
###############Add Nodes and Edges to the graph
node_list = add_node_base_data(df_target_relation)[0]
edges = add_node_base_data(df_target_relation)[1]
eweights_trace = add_node_base_data(df_target_relation)[1]
###############choose layout and get the pos of the relevant node
pos = nx.fruchterman_reingold_layout(G)
df_key_list = [ 'level_type' ,'account_type' , 'account_type' , 'account_type' ]
df_vlaue_list = [ 'Core' , 'EOA' , 'Cex Address' , 'Contract Address']
color_list = [ '#109947' ,'#0031DE' , '#F7F022' , '#E831D6' ]
###############Add label, Size, color attributes to node
add_node_attributes(df_target_label,df_key_list,df_vlaue_list,color_list)
edge_trace, eweights_trace = get_edge_trace(G)
node_trace = get_node_trace(G)
###############Write node_text, node_size, node_color into list
node_text = []
node_size = []
node_color = []
for node in G.nodes():
x = G.nodes[node]['label']
y = G.nodes[node]['size']
z = G.nodes[node]['color']
node_text.append(x)
node_size.append(y)
node_color.append(z)
# Set label, size, color
node_trace.marker.color = node_color
node_trace.marker.size =node_size
node_trace.text = node_text
fig_target_id=go.Figure()
fig_target_id.add_trace(edge_trace)
fig_target_id.add_trace(node_trace)
fig_target_id.update_layout(
height=1000,
width=1000,
xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
showlegend=False,
hovermode='closest',
)
return fig_target_id
fig = drew_graph(df_target_relation,df_target_label)
fig.show()
fig.write_html(u'YOUR FILE PATH/FTX_Accounts_Drainer.html')
print('end')
Check out the interactive version at this URL: https://pro0xbi.github.io/FTX_Accounts_Drainer.html
-
Node colors
- Green is the FTX "hacker" address
- Blue are normal EOA accounts that had large transfers (>100ETH) with it
- Yellow are Exchange addresses (FTX)
- Red are smart contract addresses
-
Node size
- Larger nodes indicate larger balances for that address. The largest nodes have balances >10,000 ETH
We can see that among all addresses associated with the FTX "hacker", there are still at least 12 addresses holding >10,000 ETH, meaning at least 120,000 ETH have not been sold by the "hacker".
Sixdegree
is a professional onchain data analysis team Our mission is to provide users with accurate onchain data charts, analysis, and insights. We are committed to popularizing onchain data analysis. By building a community and writing tutorials, among other initiatives, we train onchain data analysts, output valuable analysis content, promote the community to build the data layer of the blockchain, and cultivate talents for the broad future of blockchain data applications. Welcome to the community exchange!
- Website: sixdegree.xyz
- Email: contact@sixdegree.xyz
- Twitter: twitter.com/SixdegreeLab
- Dune: dune.com/sixdegree
- Github: https://github.com/SixdegreeLab