
Releases: dmlc/dgl

v1.0.1

21 Feb 07:15

What's new

  • Enable dgl.sparse on Mac and Windows.
  • Fixed several bugs.

v1.0.0

30 Jan 07:07

v1.0.0 release is a new milestone for DGL. 🎉🎉🎉

New Package: dgl.sparse

In this release, we introduced a brand new package, dgl.sparse, which allows DGL users to build GNNs in the sparse-matrix paradigm. We provided Google Colab tutorials on the dgl.sparse package, ranging from getting started with the sparse APIs to building different types of GNN models (graph diffusion, hypergraph, and Graph Transformer), plus 10+ examples of commonly used models in the GitHub code base.

NOTE: this feature is currently only available on Linux.
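
Below is a minimal sketch of the sparse-matrix paradigm. The constructor and helper names (spmatrix, diag, and the @ operator for sparse-dense products) follow the tutorials mentioned above, but treat them as assumptions and check the package documentation for the exact API.

import torch
import dgl.sparse as dglsp

# Hedged sketch: build a small adjacency matrix from COO indices.
indices = torch.tensor([[0, 1, 2],
                        [1, 2, 0]])
A = dglsp.spmatrix(indices, shape=(3, 3))   # sparse adjacency matrix
X = torch.randn(3, 4)                       # dense node features

deg = A @ torch.ones(3)                     # node degrees via SpMV
D_inv_sqrt = dglsp.diag(deg ** -0.5)        # diagonal normalization matrix
H = D_inv_sqrt @ A @ D_inv_sqrt @ X         # GCN-style propagation via SpMM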

New Additions

  • A new example of SEAL+NGNN for OGBL datasets (#4550, #4772)
  • Add DeepWalk module (#4562)
  • A new example of BiPointNet for modelnet40 dataset (#4434)
  • Add Transformers related modules: Metapath2vec (#4660), LaplacianPosEnc (#4750), DegreeEncoder (#4742), ToLevi (#4884), BiasedMultiheadAttention (#4916), PathEncoder (#4956), GraphormerLayer (#4959), SpatialEncoder & SpatialEncoder3d (#4991)
  • Add Graph Positional Encoding Ops: double_radius_node_labeling (#4513), shortest_dist (#4799); see the sketch after this list
  • Add a new sampling algorithm: (La)yer-Neigh(bor) sampling (#4668)
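
As a quick illustration of the new positional-encoding ops, here is a hedged sketch of dgl.shortest_dist; the root argument and return conventions are assumptions based on #4799.

import torch
import dgl

# Hedged sketch: unweighted shortest (BFS) distances for positional encoding.
g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])))
dist = dgl.shortest_dist(g)                 # (N, N) all-pairs distance matrix
dist_from_0 = dgl.shortest_dist(g, root=0)  # distances from node 0 only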

System Enhancement

  • Support PyTorch CUDA Stream (#4503)
  • Support canonical edge types in HeteroGraphConv (#4440)
  • Reduce Memory Consumption in Distributed Training Example (#4558)
  • Improve the performance of is_unibipartite (#4556)
  • Add options for padding and eigenvalues in Laplacian positional encoding transform (#4628)
  • Reduce startup overhead for dist training (#4735)
  • Add Heterogeneous Graph support for GNNExplainer (#4401)
  • Enable sampling with edge masks on homogeneous graph (#4748)
  • Enable save and load for Distributed Optimizer (#4752)
  • Add edge-wise message passing operators u_op_v (#4801)
  • Support bfloat16 (bf16) (#4648)
  • Accelerate CSRSliceMatrix<kDGLCUDA, IdType> by leveraging hashmap (#4924)
  • Decouple size of node/edge data files from nodes/edges_per_chunk entries in the metadata.json for Distributed Graph Partition Pipeline (#4930)
  • Canonical etypes are always used during partition and loading in distributed DGL (#4777, #4814)
  • Add parquet support for node/edge data in Distributed Partition Pipeline (#4933)

Deprecation & Cleanup

Dependency Update

Starting from this release, we will drop support for CUDA 10.1 and 11.0. On Windows, we will further drop support for CUDA 10.2.

Linux: CentOS 7+ / Ubuntu 18.04+

| PyTorch ver. \ CUDA ver. | 10.2 | 11.3 | 11.6 | 11.7 |
|---|---|---|---|---|
| 1.12 | ✅ | ✅ | ✅ |  |
| 1.13 |  |  | ✅ | ✅ |

Windows: Windows 10+/Windows server 2016+

| PyTorch ver. \ CUDA ver. | 11.3 | 11.6 | 11.7 |
|---|---|---|---|
| 1.12 | ✅ | ✅ |  |
| 1.13 |  | ✅ | ✅ |

Bugfixes

  • Fix a bug related to EdgeDataLoader (#4497)
  • Fix graph structure corruption with transform (#4753)
  • Fix a bug causing UVA cannot work on old GPUs (#4781)
  • Fix NN modules crashing with non-FP32 inputs (#4829)

Installation

The installation URL and conda repository have changed for CUDA packages. Please use the following:

# If you installed dgl-cuXX pip wheel or dgl-cudaXX.X conda package, please uninstall them first.
pip install dgl -f https://data.dgl.ai/wheels/repo.html   # for CPU
pip install dgl -f https://data.dgl.ai/wheels/cuXX/repo.html   # for CUDA, XX = 102, 113, 116 or 117
conda install dgl -c dglteam   # for CPU
conda install dgl -c dglteam/label/cuXX   # for CUDA, XX = 102, 113, 116 or 117

v0.9.1

20 Sep 07:13

v0.9.1 is a minor release with the following update:

Distributed Graph Partitioning Pipeline

DGL now supports partitioning and preprocessing graph data using multiple machines. At its core is a new data format called Chunked Graph Data Format (CGDF), which stores graph data by chunks. The new pipeline processes data chunks in parallel, which not only reduces the memory requirement of each machine but also significantly accelerates the entire procedure. For a random graph with 1B nodes/5B edges, using a cluster of 8 AWS EC2 x1e.4xlarge instances (16 vCPU, 488GB RAM each), the new pipeline reduces the running time to 2.7 hours and cuts the cost by 3.7x. Read the feature highlight blog for more details.

To get started with this new feature, check out the new user guide chapter.

New Additions

System Enhancement

  • Two new APIs dgl.use_libxsmm and dgl.is_libxsmm_enabled to enable/disable Intel LibXSMM. (#4455)
  • Added a new option exclude_self to exclude self-loop edges for dgl.knn_graph. The API now supports creating a batch of KNN graphs (see the sketch after this list). (#4389)
  • The distributed training program launched by DGL will now report an error when any trainer/server fails.
  • Speed up DataLoader by adding CPU affinity support. (#4126)
  • Enable graph partition book to support canonical edge types. (#4343)
  • Improve the performance of CUDA SpMMCSr (#4363)
  • Add CUDA Weighted Neighborhood Sampling (#4064)
  • Enable UVA for Weighted Samplers (#4314)
  • Allow adding data to self-loops created by AddSelfLoop or add_self_loop (#4261)
  • Add CUDA Weighted Random Walk Sampling (#4243)
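
To illustrate the dgl.knn_graph changes mentioned above, here is a hedged sketch; the exclude_self flag and the batched (3-D) input behavior are assumptions based on the description of #4389.

import torch
import dgl

# Hedged sketch of the updated dgl.knn_graph.
x = torch.randn(32, 3)                      # 32 points in 3-D space
g = dgl.knn_graph(x, 5, exclude_self=True)  # k=5 neighbors, drop self-loop edges

xb = torch.randn(4, 32, 3)                  # a batch of 4 point clouds
bg = dgl.knn_graph(xb, 5)                   # assumed to return a batched DGLGraph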

Deprecation & Cleanup

  • Removed the already deprecated AsyncTransferer class. The functionality has been incorporated into DGL DataLoader. (#4505)
  • Removed the already deprecated num_servers and num_workers arguments of dgl.distributed.initialize (see the sketch after this list). (#4284)
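
A hedged before/after sketch of the cleaned-up call; 'ip_config.txt' is a placeholder path.

import dgl

# Before (deprecated arguments, no longer accepted):
#   dgl.distributed.initialize('ip_config.txt', num_servers=1, num_workers=4)
# After:
dgl.distributed.initialize('ip_config.txt')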

Dependency Update

Starting from this release, we will drop support for CUDA 10.1 and 11.0. On Windows, we will further drop support for CUDA 10.2.

Linux: CentOS 7+ / Ubuntu 18.04+

| PyTorch ver. \ CUDA ver. | 10.2 | 11.1 | 11.3 | 11.5 | 11.6 |
|---|---|---|---|---|---|
| 1.9 | ✅ | ✅ |  |  |  |
| 1.10 | ✅ | ✅ | ✅ |  |  |
| 1.11 | ✅ |  | ✅ | ✅ |  |
| 1.12 | ✅ |  | ✅ |  | ✅ |

Windows: Windows 10+/Windows server 2016+

| PyTorch ver. \ CUDA ver. | 11.1 | 11.3 | 11.5 | 11.6 |
|---|---|---|---|---|
| 1.9 | ✅ |  |  |  |
| 1.10 | ✅ | ✅ |  |  |
| 1.11 |  | ✅ | ✅ |  |
| 1.12 |  | ✅ |  | ✅ |

Bugfixes

  • Fix a crash bug due to incorrect dtype in dgl.to_block() (#4487)
  • Fix a bug related to unpinning when tensoradaptor is not available (#4450)
  • Fix a bug related to pinning empty tensors and graphs (#4393)
  • Remove duplicate entries of CUB submodule (#4499)
  • Fix broken static_assert (#4342)
  • A bunch of fixes in edge_softmax_hetero (#4336)
  • Fix the default value of num_bases in RelGraphConv module (#4321)
  • Fix etype check in DistGraph.edge_subgraph (#4322)
  • Fix incorrect _bias and bias usage (#4310)
  • Enable DistGraph.find_edge() to work with str or tuple of str (#4319)
  • Fix a numerical bug related to SparseAdagrad. (#4253)

v0.9.0

18 Jul 15:43

This is a major update with several new features, including a graph prediction pipeline in DGL-Go, cuGraph support, mixed precision support, and more.

Starting from 0.9 we also ship arm64 builds for Linux and OSX.

DGL-Go

DGL-Go now supports training GNNs for graph property prediction tasks. It includes two popular GNN models: Graph Isomorphism Network (GIN) and Principal Neighborhood Aggregation (PNA). For example, to train a GIN model on the ogbg-molpcba dataset, first generate a YAML configuration file using the command:

dgl configure graphpred --data ogbg-molpcba --model gin

which generates the following configuration file. Users can then adjust it manually.

version: 0.0.2
pipeline_name: graphpred
pipeline_mode: train
device: cpu                     # Torch device name, e.g., cpu or cuda or cuda:0
data:
    name: ogbg-molpcba
    split_ratio:                # Ratio to generate data split, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
model:
    name: gin
    embed_size: 300             # Embedding size
    num_layers: 5               # Number of layers
    dropout: 0.5                # Dropout rate
    virtual_node: false         # Whether to use virtual node
general_pipeline:
    num_runs: 1                 # Number of experiments to run
    train_batch_size: 32        # Graph batch size when training
    eval_batch_size: 32         # Graph batch size when evaluating
    num_workers: 4              # Number of workers for data loading
    optimizer:
        name: Adam
        lr: 0.001
        weight_decay: 0
    lr_scheduler:
        name: StepLR
        step_size: 100
        gamma: 1
    loss: BCEWithLogitsLoss
    metric: roc_auc_score
    num_epochs: 100             # Number of training epochs
    save_path: results          # Directory to save the experiment results

Alternatively, users can fetch model recipes with pre-defined hyperparameters for reproducing the original experiments.

dgl recipe get graphpred_pcba_gin.yaml

To launch training:

dgl train --cfg graphpred_ogbg-molpcba_gin.yaml

Another addition is a new command to run inference with a trained model on another dataset. For example, the following shows how to apply the GIN model trained on ogbg-molpcba to ogbg-molhiv.

# Generate an inference configuration file from a saved experiment checkpoint
dgl configure-apply graphpred --data ogbg-molhiv --cpt results/run_0.pth

# Apply the trained model for inference
dgl apply --cfg apply_graphpred_ogbg-molhiv_gin.yaml

It will save the model predictions to a CSV file.

Mixed Precision

DGL is compatible with the PyTorch Automatic Mixed Precision (AMP) package for mixed precision training, reducing both training time and GPU memory consumption. This feature requires PyTorch 1.6+ and Python 3.7+.

By wrapping the forward pass with torch.cuda.amp.autocast(), PyTorch automatically selects the appropriate data type for each op and tensor. Half-precision tensors are memory efficient, and most operators on half-precision tensors are faster because they leverage GPU Tensor Cores.

import torch.nn.functional as F
from torch.cuda.amp import autocast

def forward(g, feat, label, mask, model):
    with autocast(enabled=True):
        logit = model(g, feat)
        loss = F.cross_entropy(logit[mask], label[mask])
        return loss

Small gradients in float16 format have underflow problems (they flush to zero). PyTorch provides a GradScaler module to address this issue. It multiplies the loss by a factor and invokes the backward pass on the scaled loss to prevent underflow. It then unscales the computed gradients before the optimizer updates the parameters. The scale factor is determined automatically.

from torch.cuda.amp import GradScaler

scaler = GradScaler()

def backward(scaler, loss, optimizer):
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

Putting everything together, we have the example below.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.cuda.amp import GradScaler
from dgl.data import RedditDataset
from dgl.nn import GATConv
from dgl.transforms import AddSelfLoop

class GAT(nn.Module):
    def __init__(self, in_feats, num_classes, num_hidden=256, num_heads=2):
        super().__init__()
        self.conv1 = GATConv(in_feats, num_hidden, num_heads, activation=F.elu)
        self.conv2 = GATConv(num_hidden * num_heads, num_classes, num_heads)

    def forward(self, g, h):
        h = self.conv1(g, h).flatten(1)   # concatenate attention heads
        h = self.conv2(g, h).mean(1)      # average attention heads
        return h

device = torch.device('cuda')

transform = AddSelfLoop()
data = RedditDataset(transform=transform)   # pass the transform by keyword

g = data[0]
g = g.int().to(device)
train_mask = g.ndata['train_mask']
feat = g.ndata['feat']
label = g.ndata['label']
in_feats = feat.shape[1]

model = GAT(in_feats, data.num_classes).to(device)
model.train()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
scaler = GradScaler()

# forward() and backward() are the AMP helpers defined in the snippets above.
for epoch in range(100):
    optimizer.zero_grad()
    loss = forward(g, feat, label, train_mask, model)
    backward(scaler, loss, optimizer)

Thanks to @nv-dlasalle, @ndickson-nvidia, @yaox12, and others for their support!

cuGraph Interface

The RAPIDS cuGraph library provides a collection of GPU accelerated algorithms for graph analytics, such as centrality computation and community detection. According to its documentation, “the latest NVIDIA GPUs (RAPIDS supports Pascal and later GPU architectures) make graph analytics 1000x faster on average over NetworkX”.

To install cuGraph, we recommend following the practice below.

conda install mamba -n base -c conda-forge

mamba create -n dgl_and_cugraph -c dglteam -c rapidsai-nightly -c nvidia -c pytorch -c conda-forge cugraph pytorch torchvision torchaudio cudatoolkit=11.3 dgl-cuda11.3 tqdm

conda activate dgl_and_cugraph

DGL now interoperates with cuGraph by allowing conversion between a DGLGraph object and a cuGraph graph object, making it possible for DGL users to access the efficient graph analytics implementations in cuGraph. For example, users can perform community detection on a graph with the Louvain method available in cuGraph.

import cugraph

from dgl.data import CoraGraphDataset

dataset = CoraGraphDataset()
g = dataset[0].to('cuda')
cugraph_g = g.to_cugraph()
cugraph_g = cugraph_g.to_undirected()
parts, modularity_score = cugraph.louvain(cugraph_g)

The community membership of nodes from parts['partition'] can then be used as auxiliary node labels or node features.

If you have modified the structure of a cuGraph graph object or loaded graph data with cuGraph, you can also convert it to a DGLGraph object.

import dgl
g = dgl.from_cugraph(cugraph_g)

Credits to @VibhuJawa!

Arm64 builds

Linux AArch64 and OSX M1 (arm64) are now supported. One can install them as usual with pip and conda:

pip install dgl-cuXX -f https://data.dgl.ai/wheels/repo.html
conda install -c dglteam dgl-cudaXX.X   # currently not available for OSX M1

Quality-of-life updates

System optimizations

  • Enable using UVA and FP16 with SparseAdam Optimizer (#3885, @nv-dlasalle )
  • Enable USE_EPOLL by default in distributed training (#4167)
  • Optimize the use of alternative streams in dataloader (#4177, @yaox12 )
  • Redirect AllocWorkspace to PyTorch's allocator if available (#4199, @yaox12 )

Bug fixes

Misc

  • Test pipeline for distributed training (#4122 , @Kh4L)

v0.8.2

30 May 15:06

This is a minor release with the following updates.

Test AArch64 Build

A 0.8.2 test build for AArch64 can be installed with:

pip install dgl -f https://data.dgl.ai/wheels-test/repo.html   # or dgl-cuXX for CUDA

New Modules

  • Graph Isomorphism Network with Edge Features (#3934)
  • dgl.transforms.FeatMask for randomly dropping out dimensions of all node/edge features (#3968, @RecLusIve-F)
  • dgl.transforms.RowFeatNormalizer for normalization of all node/edge features (#3968, @RecLusIve-F); both are shown in the sketch after this list
  • Label propagation module (#4017)
  • Directional graph network layer (#4017)
  • Datasets for developing GNN explainability approaches (#3982)
  • dgl.transforms.SIGNDiffusion for augmenting input node features (#3982)
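
Below is a hedged sketch of the two new feature transforms; the parameter names (p, node_feat_names) are assumptions based on the module descriptions.

import torch
import dgl
import dgl.transforms as T

# Hedged sketch: randomly mask feature dimensions, then row-normalize them.
g = dgl.rand_graph(100, 400)
g.ndata['feat'] = torch.randn(100, 16)

transform = T.Compose([
    T.FeatMask(p=0.3, node_feat_names=['feat']),
    T.RowFeatNormalizer(node_feat_names=['feat']),
])
g = transform(g)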

Quality-of-life Updates

  • Allow HeteroLinear with/without bias (#3970, @ksadowski13)
  • Allow selection of “socket” for RPC backend in distributed training (#3951)
  • Enable specification of maximum number of trials for socket backend in DistDGL (#3977)
  • Added floating-point conversion functions to dgl.transforms.functional (#3890, @ndickson-nvidia)
  • Improve the warning message when Tensoradapter is not found (#4055)
  • Add sanity check for in_edges/out_edges on empty graphs (#4050)

System Optimization

  • Improved graph batching on GPU for Graph DataLoaders (#3895, @ayasar70)
  • CPU DataLoader affinitization (#3723 @daniil-sizov)
  • Memory consumption optimization on index shuffling in dataloader (#3980)
  • Remove unnecessary induced vertices in edge subgraph (#3978, @yaox12)
  • Change the curandState and launch dimension of GPU neighbor sampling kernel (#3990, @paoxiaode)

Bug fixes

v0.8.1

17 Apr 05:19

This is a minor release that includes the following model updates, optimizations, new features and bug fixes.

Model update

  • nn.GroupRevRes from Training Graph Neural Networks with 1000 layers [#3842]
  • transforms.LaplacianPositionalEncoding from Graph Neural Networks with Learnable Structural and Positional Representations [#3869]
  • transforms.RWPositionalEncoding from Graph Neural Networks with Learnable Structural and Positional Representations [#3869]
  • dataloading.SAINTSampler from GraphSAINT [#3879]; see the sketch after this list
  • nn.EGNNConv from E(n) Equivariant Graph Neural Networks [#3901]
  • nn.PNAConv from the baselines of E(n) Equivariant Graph Neural Networks [#3901]
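
As a quick illustration of the new GraphSAINT sampler, here is a hedged sketch; the mode/budget arguments and the use of dummy iteration indices with the DataLoader are assumptions based on #3879.

import torch
import dgl

# Hedged sketch: node-budget GraphSAINT subgraph sampling.
g = dgl.rand_graph(1000, 5000)
g.ndata['feat'] = torch.randn(1000, 16)

sampler = dgl.dataloading.SAINTSampler(mode='node', budget=100)
# Each "index" is just an iteration counter; one subgraph is sampled per iteration.
dataloader = dgl.dataloading.DataLoader(g, torch.arange(50), sampler, batch_size=1)
for subg in dataloader:
    pass  # train on the sampled subgraph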

Example update

Feature update (new functionalities, interface changes, etc.)

  • Radius graph: construct a graph by connecting points within a given distance (see the sketch after this list). [#3829 @ksadowski13]
    • It uses torch.cdist, so the space complexity is O(N^2).
  • Added a get_attention parameter in GlobalAttentionPooling. [#3837 @decoherencer]
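
A hedged sketch of the radius-graph construction described above; the function name dgl.radius_graph and its arguments follow #3829 but should be checked against the API reference.

import torch
import dgl

# Hedged sketch: connect every pair of points within a given distance.
x = torch.randn(64, 3)          # 64 points in 3-D space
g = dgl.radius_graph(x, 0.5)    # edges between points closer than 0.5 (O(N^2) memory)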

Quality of life update

  • Example to train with multi-GPU with PyTorch Lightning. [#3863]
  • Multi-GPU inference with UVA. [#3827 @nv-dlasalle]
  • Enable UVA sampling with CPU indices to save GPU memory. [#3892]
  • Set stacklevel=2 for DGL-raised warnings. [#3816]
  • Pure GPU example of GraphSAGE, with both node classification and link prediction. [#3796 @nv-dlasalle, #3856 @Kh4L]
  • Tensoradapter DLPack 0.6 compatibility / PyTorch 1.11 support. [#3803]

System optimization

  • Enable UVA for PinSAGE and RandomWalk. [#3857 @yaox12]
  • METIS partitioning with communication volume minimization, which reduces the communication volume by 13.4% compared with edge-cut minimization on ogbn-products. [#3821 @chwan1016]
  • Change parameter of curand_init for reducing GPU latency [#3794 @paoxiaode]

Bug fixes

v0.8.0post2

03 Apr 08:38

This is a bugfix release with the following quality-of-life updates and bug fixes:

Quality-of-life updates

  • Python 3.10 support.
  • PyTorch 1.11 support.
  • CUDA 11.5 support on Linux. Please install with
    pip install dgl-cu115 -f https://data.dgl.ai/wheels/repo.html  # if using pip
    conda install dgl-cuda11.5 -c dglteam  # if using conda
    
  • Compatibility to DLPack 0.6 in tensoradapter (#3803) for PyTorch 1.11
  • Set stacklevel=2 for dgl_warning (#3816)
  • Support custom datasets in DataLoader that are not necessarily tensors (#3810 @yinpeiqi )

Bug fixes

  • Pass ntype/etype into partition book when node/edge_split (#3828)
  • Fix multi-GPU RGCN example (#3871 @yaox12)
  • Send rpc messages blockingly in case of congestion (#3867). Note that this fix may cause a speed regression in distributed DGL training. We are still investigating the root cause of the underlying issue in #3881.
  • Fix CopyToSharedMem assuming that all relation graphs are homogeneous (#3841)
  • Fix HAN example crashing with CUDA (#3841)
  • Fix UVA sampling crash without specifying prefetching features (#3862)
  • Fix documentation display issue of node/edge_split (#3858)
  • Fix device mismatch error in GraphSAGE distributed training example under multi-node multi-GPU (#3870)
  • Use torch.distributed.algorithms.join.Join to deal with uneven training sets in distributed training (#3870)
  • Dataloader documentation fixes (#3886)
  • Remove redundant reference of networkx package in pagerank.py (#3888 @AzureLeon1 )
  • Make source build work for systems where the default is Python 2 (#3718)
  • Fix UVA sampling with partially specified node types (#3897)

v0.8.0post1

08 Mar 07:51

This is a quick post-release with critical bug fixes:

  • Fix incorrect name when fetching data in the sparse optimizer #3808
  • Fix DataLoader not working with heterogeneous graphs on multiple GPUs #3801
  • Fix error in heterogeneous graph partitioning when the graph is unidirectional bipartite #3793

v0.8.0

01 Mar 15:45

v0.8.0 is a major release with many new features, system improvements and fixes. Read the blog for the highlighted features.

Major features

Mini-batch Sampling Pipeline Update

Enabled CUDA UVA-based optimization and feature prefetching for all built-in graph samplers (up to 4x speedup compared to v0.7). Users can now specify the features to prefetch and turn on UVA optimization in dgl.dataloading.Sampler and dgl.dataloading.DataLoader.

g = ...                             # some DGLGraph data
train_nids = ...                    # training node IDs
sampler = dgl.dataloading.MultiLayerNeighborSampler(
    fanouts=[10, 15],
    prefetch_node_feats=['feat'],   # prefetch node feature 'feat'
    prefetch_labels=['label'],      # prefetch node label 'label'
)
dataloader = dgl.dataloading.DataLoader(
    g, train_nids, sampler,
    device='cuda:0',     # perform sampling on GPU 0
    batch_size=1024,
    shuffle=True,
    use_uva=True         # turn on UVA optimization
)

We have done a major refactor of the sampling components to make it easier to implement new graph samplers. We added a new base class dgl.dataloading.Sampler with one abstract method sample for overriding, as well as new APIs dgl.set_src_lazy_features, dgl.set_dst_lazy_features, dgl.set_node_lazy_features, and dgl.set_edge_lazy_features for customizing prefetching rules. The code below shows the new user experience.

class NeighborSampler(dgl.dataloading.Sampler):
    def __init__(self,
                 fanouts : list[int],
                 prefetch_node_feats: list[str] = None,
                 prefetch_edge_feats: list[str] = None,
                 prefetch_labels: list[str] = None):
        super().__init__()
        self.fanouts = fanouts
        self.prefetch_node_feats = prefetch_node_feats
        self.prefetch_edge_feats = prefetch_edge_feats
        self.prefetch_labels = prefetch_labels

    def sample(self, g, seed_nodes):
        output_nodes = seed_nodes
        subgs = []
        for fanout in reversed(self.fanouts):
            # Sample a fixed number of neighbors of the current seed nodes.
            sg = g.sample_neighbors(seed_nodes, fanout)
            # Convert this subgraph to a message flow graph.
            sg = dgl.to_block(sg, seed_nodes)
            seed_nodes = sg.srcdata[dgl.NID]
            subgs.insert(0, sg)
        input_nodes = seed_nodes

        # handle prefetching
        dgl.set_src_lazy_features(subgs[0], self.prefetch_node_feats)
        dgl.set_dst_lazy_features(subgs[-1], self.prefetch_labels)
        for subg in subgs:
            dgl.set_edge_lazy_features(subg, self.prefetch_edge_feats)

        return input_nodes, output_nodes, subgs

Related documentation:

We thank Xin Yao (@yaox12 ) and Dominique LaSalle (@nv-dlasalle ) from NVIDIA and David Min (@davidmin7 ) from UIUC for their contributions.

DGL-Go

DGL-Go is a new command line tool for users to get started with training, using and studying Graph Neural Networks (GNNs). Data scientists can quickly apply GNNs to their problems, whereas researchers will find it useful to customize their experiments.

The initial release includes:

  • Four commands, dgl train, dgl recipe, dgl configure and dgl export.
  • 3 training pipelines for node prediction using full graph training, link prediction using full graph training and node prediction using neighbor sampling.
  • 5 node encoding models: gat, gcn, gin, sage, sgc; 3 edge encoding models: bilinear, dot-product, element-wise.
  • 10 datasets including custom dataset in CSV format.

NN Modules

We have accelerated dgl.nn.RelGraphConv and dgl.nn.HGTConv by up to 36x and 12x compared with the baselines from v0.7 and PyG, and shortened the implementation of dgl.nn.RelGraphConv by 3x (from 200 lines to 64).

Breaking change: dgl.nn.RelGraphConv no longer accepts 1-D integer tensor representing node IDs during forward. Please switch to torch.nn.Embedding to explicitly represent trainable node embeddings.
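
Here is a hedged migration sketch; the graph, feature size, and relation types below are made up for illustration. The idea is to keep an explicit nn.Embedding table and feed its output to the module instead of a raw node-ID tensor.

import torch
import torch.nn as nn
import dgl
from dgl.nn import RelGraphConv

g = dgl.rand_graph(100, 400)
etypes = torch.randint(0, 3, (g.num_edges(),))   # hypothetical relation type per edge

node_emb = nn.Embedding(g.num_nodes(), 16)       # explicit trainable node embeddings
conv = RelGraphConv(16, 8, num_rels=3)
h = conv(g, node_emb(g.nodes()), etypes)         # pass embeddings, not raw node IDs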

Below are the new NN modules added to v0.8:

A new edge_weight argument has been added to several GNN modules to support training on weighted graphs. A new user guide chapter 5.5 explains how to use edge weights in your GNN model.
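
For example, a hedged sketch with GraphConv (one of the modules that accepts the new argument; the graph and weights below are illustrative):

import torch
import dgl
from dgl.nn import GraphConv

g = dgl.rand_graph(100, 400)
feat = torch.randn(100, 16)
ew = torch.rand(g.num_edges())       # one scalar weight per edge

conv = GraphConv(16, 8, norm='both')
h = conv(g, feat, edge_weight=ew)    # edge weights are applied during message passing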

Graph Dataset and Transforms

Renamed the old dgl.transform package to dgl.transforms to follow PyTorch's namespace convention. All DGL datasets now accept an extra transform keyword argument for data augmentation and transformation:

import dgl
import dgl.transforms as T
t = T.Compose([
    T.AddSelfLoop(),
    T.GCNNorm(),
])
dataset = dgl.data.CoraGraphDataset(transform=t)
g = dataset[0]  # graph and features will be transformed automatically

Added 16 graph data transform modules.


0.7.2

08 Nov 04:09

0.7.2 Release Notes

This is a patch release targeting CUDA 11.3 and PyTorch 1.10. It contains (1) distributed training on heterogeneous graphs, and (2) bug fixes and code reorganization commits. The performance impact should be minimal.

To install with CUDA 11.3 support, run either

pip install dgl-cu113 -f https://data.dgl.ai/wheels/repo.html

or

conda install -c dglteam dgl-cuda11.3

Distributed Training on Heterogeneous Graphs

We have made the interface of distributed sampling on heterogeneous graphs consistent with the single-machine code. Please refer to https://github.com/dmlc/dgl/blob/0.7.x/examples/pytorch/rgcn/experimental/entity_classify_dist.py for the new code.

Other fixes

  • [Bugfix] Fix bugs of farthest_point_sampler (#3327, @sangyx)
  • [Bugfix] Fix sparse embeddings for PyTorch < 1.7 #3291 (#3333)
  • Fixes bug in hg.update_all causing crash #3312 (#3345, @sanchit-misra)
  • [Bugfix] Add PYTHONPATH in server launch. (#3352)
  • [CPU][Sampling][Performance] Improve sampling on the CPU. (#3274, @nv-dlasalle)
  • [Performance, CPU] Rewriting OpenMP pragmas into parallel_for (#3171, @tpatejko)
  • [Build] Fix OpenMP header inclusion for Mac builds (#3325)
  • [Performance] improve coo2csr space complexity when row is not sorted (#3326)
  • [BugFix] initialize data if null when converting from row sorted coo to csr (#3360)
  • fix broadcast tensor dim in dgl.broadcast_nodes (#3351, @jwyyy)
  • [BugFix] fix typo in fakenews dataset variable name (#3363, @kayzliu)
  • [Doc] Added md5sum info for OGB-LSC dataset (#3332, @msharmavikram)
  • [Feature] Graceful handling of exceptions thrown within OpenMP blocks (#3353)
  • Fix torch import in example (#3372, @jwyyy)
  • [Distributed] Allow user to pass-in extra env parameters when launching a distributed training task. (#3375)
  • [BugFix] extract gz into target dir (#3389)
  • [Model] Refine GraphSAINT (#3328 @ljh1064126026 )
  • [Bug] check dtype before convert to gk (#3414)
  • [BugFix] add count_nonzero() into SA_Client (#3417)
  • [Bug] Do not skip graphconv even no edge exists (#3416)
  • Fix edge ID exclusion when both g and g_sampling are specified in EdgeDataLoader (#3322)
  • [Bugfix] three bugs related to using DGL as a subdirectory(third_party) of another project. (#3379, @yuanzexi )
  • [PyTorch][Bugfix] Use uint8 instead of bool in pytorch to be compatible with nightly version (#3406, #3454, @nv-dlasalle)
  • [Fix] Use ==/!= to compare constant literals (str, bytes, int, float, tuple) (#3415, @cclauss)
  • [Bugfix][Pytorch] Fix model save and load bug of stgcn_wave (#3303, @HaoWei-TomTom )
  • [BugFix] Avoid Memory Leak Issue in PyTorch Backend (#3386, @chwan-rice )
  • [Fix] Split nccl sparse push into two groups (#3404, @nv-dlasalle )
  • [Doc] remove duplicate papers (#3393, @chwan-rice )
  • Fix GINConv backward #3437 (#3440)
  • [bugfix] Fix compilation with CUDA 11.5's CUB (#3468, @nv-dlasalle )
  • [Example][Performance] Enable faster validation for pytorch graphsage example (#3361, @nv-dlasalle )
  • [Doc] Evaluation Tutorial for Link Prediction (#3463)