Commit

Merge pull request #19 from recohut/stage
sparsh-ai authored Feb 12, 2022
2 parents 3b4b08e + a01cc40 commit ebc6f69
Showing 22 changed files with 22 additions and 0 deletions.
1 change: 1 addition & 0 deletions _notebooks/2022-01-26-bert2bert-seq-attack.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-bitcoin-rl-agent.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-deepwalk.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-eda-ml-latest.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-gcegnn-tmall.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-ieee-transformer.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-karateclub-deepwalk.ipynb
@@ -0,0 +1 @@
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"2022-01-26-karateclub-deepwalk.ipynb","provenance":[{"file_id":"https://github.com/recohut/nbs/blob/main/raw/T382881%20%7C%20DeepWalk%20on%20Karateclub.ipynb","timestamp":1644674258500}],"collapsed_sections":[],"authorship_tag":"ABX9TyOMPZtcX/9Sdf/et33/tq3d"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"cV7sn6S27MxX"},"source":["# DeepWalk on Karateclub"]},{"cell_type":"markdown","metadata":{"id":"tlZsvlad7QZz"},"source":["## Codebase"]},{"cell_type":"code","metadata":{"id":"mrdu9eT1xSi3"},"source":["import numpy as np\n","import networkx as nx\n","from gensim.models.word2vec import Word2Vec\n","\n","import random\n","from functools import partial\n","from typing import List, Callable\n","\n","import random"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"OEEWYVkBXCok"},"source":["class RandomWalker:\n"," \"\"\"\n"," Class to do fast first-order random walks.\n"," Args:\n"," walk_length (int): Number of random walks.\n"," walk_number (int): Number of nodes in truncated walk.\n"," \"\"\"\n","\n"," def __init__(self, walk_length: int, walk_number: int):\n"," self.walk_length = walk_length\n"," self.walk_number = walk_number\n","\n"," def do_walk(self, node):\n"," \"\"\"\n"," Doing a single truncated random walk from a source node.\n"," Arg types:\n"," * **node** *(int)* - The source node of the random walk.\n"," Return types:\n"," * **walk** *(list of strings)* - A single truncated random walk.\n"," \"\"\"\n"," walk = [node]\n"," for _ in range(self.walk_length - 1):\n"," nebs = [node for node in self.graph.neighbors(walk[-1])]\n"," if len(nebs) > 0:\n"," walk = walk + random.sample(nebs, 1)\n"," walk = [str(w) for w in walk]\n"," return walk\n","\n"," def do_walks(self, graph):\n"," \"\"\"\n"," Doing a fixed number of truncated random walk from every node in the graph.\n"," Arg 
types:\n"," * **graph** *(NetworkX graph)* - The graph to run the random walks on.\n"," \"\"\"\n"," self.walks = []\n"," self.graph = graph\n"," for node in self.graph.nodes():\n"," for _ in range(self.walk_number):\n"," walk_from_node = self.do_walk(node)\n"," self.walks.append(walk_from_node)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"q0HzTaFsXbYV"},"source":["class Estimator(object):\n"," \"\"\"Estimator base class with constructor and public methods.\"\"\"\n","\n"," seed: int\n","\n"," def __init__(self):\n"," \"\"\"Creating an estimator.\"\"\"\n"," pass\n","\n"," def fit(self):\n"," \"\"\"Fitting a model.\"\"\"\n"," pass\n","\n"," def get_embedding(self):\n"," \"\"\"Getting the embeddings (graph or node level).\"\"\"\n"," pass\n","\n"," def get_memberships(self):\n"," \"\"\"Getting the membership dictionary.\"\"\"\n"," pass\n","\n"," def get_cluster_centers(self):\n"," \"\"\"Getting the cluster centers.\"\"\"\n"," pass\n","\n"," def _set_seed(self):\n"," \"\"\"Creating the initial random seed.\"\"\"\n"," random.seed(self.seed)\n"," np.random.seed(self.seed)\n","\n"," @staticmethod\n"," def _ensure_integrity(graph: nx.classes.graph.Graph) -> nx.classes.graph.Graph:\n"," \"\"\"Ensure walk traversal conditions.\"\"\"\n"," edge_list = [(index, index) for index in range(graph.number_of_nodes())]\n"," graph.add_edges_from(edge_list)\n","\n"," return graph\n","\n"," @staticmethod\n"," def _check_indexing(graph: nx.classes.graph.Graph):\n"," \"\"\"Checking the consecutive numeric indexing.\"\"\"\n"," numeric_indices = [index for index in range(graph.number_of_nodes())]\n"," node_indices = sorted([node for node in graph.nodes()])\n","\n"," assert numeric_indices == node_indices, \"The node indexing is wrong.\"\n","\n"," def _check_graph(self, graph: nx.classes.graph.Graph) -> nx.classes.graph.Graph:\n"," \"\"\"Check the Karate Club assumptions about the graph.\"\"\"\n"," self._check_indexing(graph)\n"," graph = 
self._ensure_integrity(graph)\n","\n"," return graph\n","\n"," def _check_graphs(self, graphs: List[nx.classes.graph.Graph]):\n"," \"\"\"Check the Karate Club assumptions for a list of graphs.\"\"\"\n"," graphs = [self._check_graph(graph) for graph in graphs]\n","\n"," return graphs"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"0L38D0rBXcx0"},"source":["class DeepWalk(Estimator):\n"," r\"\"\"An implementation of `\"DeepWalk\" <https://arxiv.org/abs/1403.6652>`_\n"," from the KDD '14 paper \"DeepWalk: Online Learning of Social Representations\".\n"," The procedure uses random walks to approximate the pointwise mutual information\n"," matrix obtained by pooling normalized adjacency matrix powers. This matrix\n"," is decomposed by an approximate factorization technique.\n"," Args:\n"," walk_number (int): Number of random walks. Default is 10.\n"," walk_length (int): Length of random walks. Default is 80.\n"," dimensions (int): Dimensionality of embedding. Default is 128.\n"," workers (int): Number of cores. Default is 4.\n"," window_size (int): Matrix power order. Default is 5.\n"," epochs (int): Number of epochs. Default is 1.\n"," learning_rate (float): HogWild! learning rate. Default is 0.05.\n"," min_count (int): Minimal count of node occurrences. Default is 1.\n"," seed (int): Random seed value. 
Default is 42.\n"," \"\"\"\n","\n"," def __init__(\n"," self,\n"," walk_number: int = 10,\n"," walk_length: int = 80,\n"," dimensions: int = 128,\n"," workers: int = 4,\n"," window_size: int = 5,\n"," epochs: int = 1,\n"," learning_rate: float = 0.05,\n"," min_count: int = 1,\n"," seed: int = 42,\n"," ):\n","\n"," self.walk_number = walk_number\n"," self.walk_length = walk_length\n"," self.dimensions = dimensions\n"," self.workers = workers\n"," self.window_size = window_size\n"," self.epochs = epochs\n"," self.learning_rate = learning_rate\n"," self.min_count = min_count\n"," self.seed = seed\n","\n"," def fit(self, graph: nx.classes.graph.Graph):\n"," \"\"\"\n"," Fitting a DeepWalk model.\n"," Arg types:\n"," * **graph** *(NetworkX graph)* - The graph to be embedded.\n"," \"\"\"\n"," self._set_seed()\n"," graph = self._check_graph(graph)\n"," walker = RandomWalker(self.walk_length, self.walk_number)\n"," walker.do_walks(graph)\n","\n"," model = Word2Vec(\n"," walker.walks,\n"," hs=1,\n"," alpha=self.learning_rate,\n"," iter=self.epochs,\n"," size=self.dimensions,\n"," window=self.window_size,\n"," min_count=self.min_count,\n"," workers=self.workers,\n"," seed=self.seed,\n"," )\n","\n"," num_of_nodes = graph.number_of_nodes()\n"," self._embedding = [model.wv[str(n)] for n in range(num_of_nodes)]\n","\n"," def get_embedding(self) -> np.array:\n"," r\"\"\"Getting the node embedding.\n"," Return types:\n"," * **embedding** *(Numpy array)* - The embedding of nodes.\n"," \"\"\"\n"," return np.array(self._embedding)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"1o-5Qfx7YlCd"},"source":["## Run 1"]},{"cell_type":"code","metadata":{"id":"-Wc-fN97Xf4k"},"source":["g = nx.newman_watts_strogatz_graph(100, 20, 0.05)\n","\n","model = DeepWalk()\n","\n","model.fit(g)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"EoNhhW12Ymol"},"source":["## Run 
2"]},{"cell_type":"code","metadata":{"id":"4tz6dV3uXtV6"},"source":["def test_deepwalk():\n"," \"\"\"\n"," Testing the DeepWalk class.\n"," \"\"\"\n"," model = DeepWalk()\n","\n"," graph = nx.watts_strogatz_graph(100, 10, 0.5)\n","\n"," model.fit(graph)\n","\n"," embedding = model.get_embedding()\n","\n"," assert embedding.shape[0] == graph.number_of_nodes()\n"," assert embedding.shape[1] == model.dimensions\n"," assert type(embedding) == np.ndarray\n","\n"," model = DeepWalk(dimensions=32)\n","\n"," graph = nx.watts_strogatz_graph(150, 10, 0.5)\n","\n"," model.fit(graph)\n","\n"," embedding = model.get_embedding()\n","\n"," assert embedding.shape[0] == graph.number_of_nodes()\n"," assert embedding.shape[1] == model.dimensions\n"," assert type(embedding) == np.ndarray"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"a6wF7h7OYRNd"},"source":["test_deepwalk()"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"r2YmkQOXYn4F"},"source":["## Run 3"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"LjvtbL89Yi9U","executionInfo":{"status":"ok","timestamp":1633184529325,"user_tz":-330,"elapsed":651,"user":{"displayName":"Sparsh Agarwal","photoUrl":"https://lh3.googleusercontent.com/a/default-user=s64","userId":"13037694610922482904"}},"outputId":"be54738f-0c50-4f60-f213-07f204f0d525"},"source":["graph = nx.gnm_random_graph(100, 1000)\n","\n","model = DeepWalk()\n","print(model.dimensions)\n","\n","model = DeepWalk(dimensions=64)\n","print(model.dimensions)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["128\n","64\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"sSMXr962Ysjg","executionInfo":{"status":"ok","timestamp":1633184601920,"user_tz":-330,"elapsed":1098,"user":{"displayName":"Sparsh 
Agarwal","photoUrl":"https://lh3.googleusercontent.com/a/default-user=s64","userId":"13037694610922482904"}},"outputId":"f58a6b10-0b1d-469d-8fb8-b019d95ba45e"},"source":["model.fit(graph)\n","embedding = model.get_embedding()\n","embedding"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[-0.04410676, -0.42963982, -0.17487745, ..., -0.16505624,\n"," -0.26968855, 0.10201965],\n"," [-0.01194742, -0.34208277, -0.14449309, ..., -0.10044529,\n"," -0.28312832, 0.08283068],\n"," [-0.00925825, -0.39878452, -0.1702522 , ..., -0.18669832,\n"," -0.28607824, 0.03604598],\n"," ...,\n"," [-0.00596869, -0.34330964, -0.14082992, ..., -0.16173632,\n"," -0.21714237, 0.02814825],\n"," [ 0.03509096, -0.36868513, -0.19861251, ..., -0.13171035,\n"," -0.28190404, 0.0490713 ],\n"," [ 0.02423354, -0.37686494, -0.16582331, ..., -0.08367265,\n"," -0.24089183, 0.08942277]], dtype=float32)"]},"metadata":{},"execution_count":13}]},{"cell_type":"markdown","metadata":{"id":"GLQMr5rWYjZV"},"source":["## Run 4"]},{"cell_type":"code","metadata":{"id":"qqvX1zQ57VZQ"},"source":[""],"execution_count":null,"outputs":[]}]}
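
The walk-generation step of the notebook above can be read in isolation. The following is a minimal sketch, assuming a plain adjacency dict in place of the NetworkX graph and using only the standard library; `truncated_walk` and `generate_walks` are hypothetical names that do not appear in the notebook. As in the notebook's `RandomWalker` code, `walk_length` is the number of nodes in each walk and `walk_number` is the number of walks started from every node, and nodes are emitted as strings so the walks can serve as Word2Vec sentences.

```python
import random
from typing import Dict, List


def truncated_walk(adj: Dict[int, List[int]], start: int,
                   walk_length: int, rng: random.Random) -> List[str]:
    """Return one first-order random walk of at most `walk_length` nodes."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:
            # Dead end: the walk cannot be extended any further.
            break
        # Uniformly pick the next node among the current node's neighbors.
        walk.append(rng.choice(neighbors))
    # Word2Vec expects tokens, so node ids are converted to strings.
    return [str(node) for node in walk]


def generate_walks(adj: Dict[int, List[int]], walk_length: int,
                   walk_number: int, seed: int = 42) -> List[List[str]]:
    """Start `walk_number` walks from every node of the graph."""
    rng = random.Random(seed)
    walks = []
    for node in adj:
        for _ in range(walk_number):
            walks.append(truncated_walk(adj, node, walk_length, rng))
    return walks


# Toy graph (an assumption for illustration): a 4-node cycle with self-loops,
# mirroring the self-loops the notebook adds in `_ensure_integrity`.
adj = {
    0: [0, 1, 3],
    1: [1, 0, 2],
    2: [2, 1, 3],
    3: [3, 2, 0],
}

walks = generate_walks(adj, walk_length=5, walk_number=2)
print(len(walks))  # 8 walks: 4 nodes x 2 walks per node
```

The resulting list of string walks plays the role of `walker.walks` in the notebook's `DeepWalk.fit`, where it is passed to `Word2Vec` as the training corpus.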
1 change: 1 addition & 0 deletions _notebooks/2022-01-26-lambda-learner.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-mc.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-music-seq-data.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-reinforce.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-retail.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-social.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-stock-agent.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-tagnn.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-tfrs-olist.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-topk-reinforce.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-transformer4rec-xlnet.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-travel-optim.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-trend-news.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-twostage-retail.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _notebooks/2022-01-26-word2vec.ipynb

Large diffs are not rendered by default.

0 comments on commit ebc6f69