
Releases: dmlc/dgl

v1.0.1

21 Feb 07:15

What's new

  • Enable dgl.sparse on Mac and Windows.
  • Fixed several bugs.

v1.0.0

30 Jan 07:07

v1.0.0 release is a new milestone for DGL. 🎉🎉🎉

New Package: dgl.sparse

In this release, we introduced a brand new package, dgl.sparse, which allows DGL users to build GNNs in the sparse-matrix paradigm. We provided Google Colab tutorials on the dgl.sparse package, ranging from getting started with the sparse APIs to building different types of GNN models (graph diffusion, hypergraph, and Graph Transformer), plus 10+ examples of commonly used models in the GitHub code base.

NOTE: this feature is currently only available on Linux.
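
Below is a minimal sketch of the sparse-matrix paradigm. The constructor and helper names (spmatrix, diag, and the @ operator for sparse-dense products) follow the tutorials mentioned above, but treat them as assumptions and check the package documentation for the exact API.

import torch
import dgl.sparse as dglsp

# Hedged sketch: build a small adjacency matrix from COO indices.
indices = torch.tensor([[0, 1, 2],
                        [1, 2, 0]])
A = dglsp.spmatrix(indices, shape=(3, 3))   # sparse adjacency matrix
X = torch.randn(3, 4)                       # dense node features

deg = A @ torch.ones(3)                     # node degrees via SpMV
D_inv_sqrt = dglsp.diag(deg ** -0.5)        # diagonal normalization matrix
H = D_inv_sqrt @ A @ D_inv_sqrt @ X         # GCN-style propagation via SpMM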

New Additions

  • A new example of SEAL+NGNN for OGBL datasets (#4550, #4772)
  • Add DeepWalk module (#4562)
  • A new example of BiPointNet for modelnet40 dataset (#4434)
  • Add Transformers related modules: Metapath2vec (#4660), LaplacianPosEnc (#4750), DegreeEncoder (#4742), ToLevi (#4884), BiasedMultiheadAttention (#4916), PathEncoder (#4956), GraphormerLayer (#4959), SpatialEncoder & SpatialEncoder3d (#4991)
  • Add Graph Positional Encoding Ops: double_radius_node_labeling (#4513), shortest_dist (#4799); see the sketch after this list
  • Add a new sampling algorithm: (La)yer-Neigh(bor) sampling (#4668)
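
As a quick illustration of the new positional-encoding ops, here is a hedged sketch of dgl.shortest_dist; the root argument and return conventions are assumptions based on #4799.

import torch
import dgl

# Hedged sketch: unweighted shortest (BFS) distances for positional encoding.
g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])))
dist = dgl.shortest_dist(g)                 # (N, N) all-pairs distance matrix
dist_from_0 = dgl.shortest_dist(g, root=0)  # distances from node 0 only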

System Enhancement

  • Support PyTorch CUDA Stream (#4503)
  • Support canonical edge types in HeteroGraphConv (#4440)
  • Reduce Memory Consumption in Distributed Training Example (#4558)
  • Improve the performance of is_unibipartite (#4556)
  • Add options for padding and eigenvalues in Laplacian positional encoding transform (#4628)
  • Reduce startup overhead for dist training (#4735)
  • Add Heterogeneous Graph support for GNNExplainer (#4401)
  • Enable sampling with edge masks on homogeneous graph (#4748)
  • Enable save and load for Distributed Optimizer (#4752)
  • Add edge-wise message passing operators u_op_v (#4801)
  • Support bfloat16 (bf16) (#4648)
  • Accelerate CSRSliceMatrix<kDGLCUDA, IdType> by leveraging hashmap (#4924)
  • Decouple size of node/edge data files from nodes/edges_per_chunk entries in the metadata.json for Distributed Graph Partition Pipeline (#4930)
  • Canonical etypes are always used during partition and loading in distributed DGL (#4777, #4814)
  • Add parquet support for node/edge data in Distributed Partition Pipeline (#4933)

Deprecation & Cleanup

Dependency Update

Starting from this release, we will drop support for CUDA 10.1 and 11.0. On Windows, we will further drop support for CUDA 10.2.

Linux: CentOS 7+ / Ubuntu 18.04+

| PyTorch ver. \ CUDA ver. | 10.2 | 11.3 | 11.6 | 11.7 |
|---|---|---|---|---|
| 1.12 | ✅ | ✅ | ✅ |  |
| 1.13 |  |  | ✅ | ✅ |

Windows: Windows 10+/Windows server 2016+

| PyTorch ver. \ CUDA ver. | 11.3 | 11.6 | 11.7 |
|---|---|---|---|
| 1.12 | ✅ | ✅ |  |
| 1.13 |  | ✅ | ✅ |

Bugfixes

  • Fix a bug related to EdgeDataLoader (#4497)
  • Fix graph structure corruption with transform (#4753)
  • Fix a bug causing UVA cannot work on old GPUs (#4781)
  • Fix NN modules crashing with non-FP32 inputs (#4829)

Installation

The installation URL and conda repository have changed for CUDA packages. Please use the following:

# If you installed dgl-cuXX pip wheel or dgl-cudaXX.X conda package, please uninstall them first.
pip install dgl -f https://data.dgl.ai/wheels/repo.html   # for CPU
pip install dgl -f https://data.dgl.ai/wheels/cuXX/repo.html   # for CUDA, XX = 102, 113, 116 or 117
conda install dgl -c dglteam   # for CPU
conda install dgl -c dglteam/label/cuXX   # for CUDA, XX = 102, 113, 116 or 117

v0.9.1

20 Sep 07:13

v0.9.1 is a minor release with the following update:

Distributed Graph Partitioning Pipeline

DGL now supports partitioning and preprocessing graph data using multiple machines. At its core is a new data format called Chunked Graph Data Format (CGDF), which stores graph data by chunks. The new pipeline processes data chunks in parallel, which not only reduces the memory requirement of each machine but also significantly accelerates the entire procedure. For a random graph with 1B nodes/5B edges, using a cluster of 8 AWS EC2 x1e.4xlarge instances (16 vCPU, 488GB RAM each), the new pipeline reduces the running time to 2.7 hours and cuts the cost by 3.7x. Read the feature highlight blog for more details.

To get started with this new feature, check out the new user guide chapter.

New Additions

System Enhancement

  • Two new APIs dgl.use_libxsmm and dgl.is_libxsmm_enabled to enable/disable Intel LibXSMM. (#4455)
  • Added a new option exclude_self to exclude self-loop edges for dgl.knn_graph. The API now supports creating a batch of KNN graphs (see the sketch after this list). (#4389)
  • The distributed training program launched by DGL will now report an error when any trainer/server fails.
  • Speed up DataLoader by adding CPU affinity support. (#4126)
  • Enable graph partition book to support canonical edge types. (#4343)
  • Improve the performance of CUDA SpMMCSr (#4363)
  • Add CUDA Weighted Neighborhood Sampling (#4064)
  • Enable UVA for Weighted Samplers (#4314)
  • Allow adding data to self-loops created by AddSelfLoop or add_self_loop (#4261)
  • Add CUDA Weighted Random Walk Sampling (#4243)
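
To illustrate the dgl.knn_graph changes mentioned above, here is a hedged sketch; the exclude_self flag and the batched (3-D) input behavior are assumptions based on the description of #4389.

import torch
import dgl

# Hedged sketch of the updated dgl.knn_graph.
x = torch.randn(32, 3)                      # 32 points in 3-D space
g = dgl.knn_graph(x, 5, exclude_self=True)  # k=5 neighbors, drop self-loop edges

xb = torch.randn(4, 32, 3)                  # a batch of 4 point clouds
bg = dgl.knn_graph(xb, 5)                   # assumed to return a batched DGLGraph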

Deprecation & Cleanup

  • Removed the already deprecated AsyncTransferer class. The functionality has been incorporated into DGL DataLoader. (#4505)
  • Removed the already deprecated num_servers and num_workers arguments of dgl.distributed.initialize (see the sketch after this list). (#4284)
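
A hedged before/after sketch of the cleaned-up call; 'ip_config.txt' is a placeholder path.

import dgl

# Before (deprecated arguments, no longer accepted):
#   dgl.distributed.initialize('ip_config.txt', num_servers=1, num_workers=4)
# After:
dgl.distributed.initialize('ip_config.txt')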

Dependency Update

Starting from this release, we will drop support for CUDA 10.1 and 11.0. On Windows, we will further drop support for CUDA 10.2.

Linux: CentOS 7+ / Ubuntu 18.04+

| PyTorch ver. \ CUDA ver. | 10.2 | 11.1 | 11.3 | 11.5 | 11.6 |
|---|---|---|---|---|---|
| 1.9 | ✅ | ✅ |  |  |  |
| 1.10 | ✅ | ✅ | ✅ |  |  |
| 1.11 | ✅ |  | ✅ | ✅ |  |
| 1.12 | ✅ |  | ✅ |  | ✅ |

Windows: Windows 10+/Windows server 2016+

| PyTorch ver. \ CUDA ver. | 11.1 | 11.3 | 11.5 | 11.6 |
|---|---|---|---|---|
| 1.9 | ✅ |  |  |  |
| 1.10 | ✅ | ✅ |  |  |
| 1.11 |  | ✅ | ✅ |  |
| 1.12 |  | ✅ |  | ✅ |

Bugfixes

  • Fix a crash bug due to incorrect dtype in dgl.to_block() (#4487)
  • Fix a bug related to unpinning when tensoradaptor is not available (#4450)
  • Fix a bug related to pinning empty tensors and graphs (#4393)
  • Remove duplicate entries of CUB submodule (#4499)
  • Fix broken static_assert (#4342)
  • A bunch of fixes in edge_softmax_hetero (#4336)
  • Fix the default value of num_bases in RelGraphConv module (#4321)
  • Fix etype check in DistGraph.edge_subgraph (#4322)
  • Fix incorrect _bias and bias usage (#4310)
  • Enable DistGraph.find_edge() to work with str or tuple of str (#4319)
  • Fix a numerical bug related to SparseAdagrad. (#4253)

v0.9.0

18 Jul 15:43

This is a major update with several new features, including a graph prediction pipeline in DGL-Go, cuGraph support, mixed precision support, and more.

Starting from 0.9 we also ship arm64 builds for Linux and OSX.

DGL-Go

DGL-Go now supports training GNNs for graph property prediction tasks. It includes two popular GNN models: Graph Isomorphism Network (GIN) and Principal Neighborhood Aggregation (PNA). For example, to train a GIN model on the ogbg-molpcba dataset, first generate a YAML configuration file using the command:

dgl configure graphpred --data ogbg-molpcba --model gin

which generates the following configuration file. Users can then adjust it manually.

version: 0.0.2
pipeline_name: graphpred
pipeline_mode: train
device: cpu                     # Torch device name, e.g., cpu or cuda or cuda:0
data:
    name: ogbg-molpcba
    split_ratio:                # Ratio to generate data split, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
model:
    name: gin
    embed_size: 300             # Embedding size
    num_layers: 5               # Number of layers
    dropout: 0.5                # Dropout rate
    virtual_node: false         # Whether to use virtual node
general_pipeline:
    num_runs: 1                 # Number of experiments to run
    train_batch_size: 32        # Graph batch size when training
    eval_batch_size: 32         # Graph batch size when evaluating
    num_workers: 4              # Number of workers for data loading
    optimizer:
        name: Adam
        lr: 0.001
        weight_decay: 0
    lr_scheduler:
        name: StepLR
        step_size: 100
        gamma: 1
    loss: BCEWithLogitsLoss
    metric: roc_auc_score
    num_epochs: 100             # Number of training epochs
    save_path: results          # Directory to save the experiment results

Alternatively, users can fetch model recipes with pre-defined hyperparameters for reproducing the original experiments.

dgl recipe get graphpred_pcba_gin.yaml

To launch training:

dgl train --cfg graphpred_ogbg-molpcba_gin.yaml

Another addition is a new command to run inference with a trained model on another dataset. For example, the following shows how to apply the GIN model trained on ogbg-molpcba to ogbg-molhiv.

# Generate an inference configuration file from a saved experiment checkpoint
dgl configure-apply graphpred --data ogbg-molhiv --cpt results/run_0.pth

# Apply the trained model for inference
dgl apply --cfg apply_graphpred_ogbg-molhiv_gin.yaml

It will save the model predictions to a CSV file.

Mixed Precision

DGL is compatible with the PyTorch Automatic Mixed Precision (AMP) package for mixed precision training, reducing both training time and GPU memory consumption. This feature requires PyTorch 1.6+ and Python 3.7+.

By wrapping the forward pass with torch.cuda.amp.autocast(), PyTorch automatically selects the appropriate data type for each op and tensor. Half-precision tensors are memory efficient, and most operators on half-precision tensors are faster because they leverage GPU Tensor Cores.

import torch.nn.functional as F
from torch.cuda.amp import autocast

def forward(g, feat, label, mask, model):
    with autocast(enabled=True):
        logit = model(g, feat)
        loss = F.cross_entropy(logit[mask], label[mask])
        return loss

Small gradients in float16 format have underflow problems (they flush to zero). PyTorch provides a GradScaler module to address this issue. It multiplies the loss by a factor and invokes the backward pass on the scaled loss to prevent underflow. It then unscales the computed gradients before the optimizer updates the parameters. The scale factor is determined automatically.

from torch.cuda.amp import GradScaler

scaler = GradScaler()

def backward(scaler, loss, optimizer):
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

Putting everything together, we have the example below.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.cuda.amp import GradScaler
from dgl.data import RedditDataset
from dgl.nn import GATConv
from dgl.transforms import AddSelfLoop

class GAT(nn.Module):
    def __init__(self, in_feats, num_classes, num_hidden=256, num_heads=2):
        super().__init__()
        self.conv1 = GATConv(in_feats, num_hidden, num_heads, activation=F.elu)
        self.conv2 = GATConv(num_hidden * num_heads, num_classes, num_heads)

    def forward(self, g, h):
        h = self.conv1(g, h).flatten(1)   # concatenate attention heads
        h = self.conv2(g, h).mean(1)      # average attention heads
        return h

device = torch.device('cuda')

transform = AddSelfLoop()
data = RedditDataset(transform=transform)   # pass the transform by keyword

g = data[0]
g = g.int().to(device)
train_mask = g.ndata['train_mask']
feat = g.ndata['feat']
label = g.ndata['label']
in_feats = feat.shape[1]

model = GAT(in_feats, data.num_classes).to(device)
model.train()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
scaler = GradScaler()

# forward() and backward() are the AMP helpers defined in the snippets above.
for epoch in range(100):
    optimizer.zero_grad()
    loss = forward(g, feat, label, train_mask, model)
    backward(scaler, loss, optimizer)

Thanks to @nv-dlasalle, @ndickson-nvidia, @yaox12, and others for their support!

cuGraph Interface

The RAPIDS cuGraph library provides a collection of GPU accelerated algorithms for graph analytics, such as centrality computation and community detection. According to its documentation, “the latest NVIDIA GPUs (RAPIDS supports Pascal and later GPU architectures) make graph analytics 1000x faster on average over NetworkX”.

To install cuGraph, we recommend following the practice below.

conda install mamba -n base -c conda-forge

mamba create -n dgl_and_cugraph -c dglteam -c rapidsai-nightly -c nvidia -c pytorch -c conda-forge cugraph pytorch torchvision torchaudio cudatoolkit=11.3 dgl-cuda11.3 tqdm

conda activate dgl_and_cugraph

DGL now interoperates with cuGraph by allowing conversion between a DGLGraph object and a cuGraph graph object, making it possible for DGL users to access the efficient graph analytics implementations in cuGraph. For example, users can perform community detection on a graph with the Louvain method available in cuGraph.

import cugraph

from dgl.data import CoraGraphDataset

dataset = CoraGraphDataset()
g = dataset[0].to('cuda')
cugraph_g = g.to_cugraph()
cugraph_g = cugraph_g.to_undirected()
parts, modularity_score = cugraph.louvain(cugraph_g)

The community membership of nodes from parts['partition'] can then be used as auxiliary node labels or node features.

If you have modified the structure of a cuGraph graph object or loaded graph data with cuGraph, you can also convert it to a DGLGraph object.

import dgl
g = dgl.from_cugraph(cugraph_g)

Credits to @VibhuJawa!

Arm64 builds

Linux AArch64 and OSX M1 (arm64) are now supported. One can install them as usual with pip and conda:

pip install dgl-cuXX -f https://data.dgl.ai/wheels/repo.html
conda install -c dglteam dgl-cudaXX.X   # currently not available for OSX M1

Quality-of-life updates

System optimizations

  • Enable using UVA and FP16 with SparseAdam Optimizer (#3885, @nv-dlasalle )
  • Enable USE_EPOLL by default in distributed training (#4167)
  • Optimize the use of alternative streams in dataloader (#4177, @yaox12 )
  • Redirect AllocWorkspace to PyTorch's allocator if available (#4199, @yaox12 )

Bug fixes

Misc

  • Test pipeline for distributed training (#4122 , @Kh4L)

v0.8.2

30 May 15:06

This is a minor release with the following updates.

Test AArch64 Build

A 0.8.2 test build for AArch64 can be installed with:

pip install dgl -f https://data.dgl.ai/wheels-test/repo.html   # or dgl-cuXX for CUDA

New Modules

  • Graph Isomorphism Network with Edge Features (#3934)
  • dgl.transforms.FeatMask for randomly dropping out dimensions of all node/edge features (#3968, @RecLusIve-F)
  • dgl.transforms.RowFeatNormalizer for normalization of all node/edge features (#3968, @RecLusIve-F); both are shown in the sketch after this list
  • Label propagation module (#4017)
  • Directional graph network layer (#4017)
  • Datasets for developing GNN explainability approaches (#3982)
  • dgl.transforms.SIGNDiffusion for augmenting input node features (#3982)
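
Below is a hedged sketch of the two new feature transforms; the parameter names (p, node_feat_names) are assumptions based on the module descriptions.

import torch
import dgl
import dgl.transforms as T

# Hedged sketch: randomly mask feature dimensions, then row-normalize them.
g = dgl.rand_graph(100, 400)
g.ndata['feat'] = torch.randn(100, 16)

transform = T.Compose([
    T.FeatMask(p=0.3, node_feat_names=['feat']),
    T.RowFeatNormalizer(node_feat_names=['feat']),
])
g = transform(g)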

Quality-of-life Updates

  • Allow HeteroLinear with/without bias (#3970, @ksadowski13)
  • Allow selection of “socket” for RPC backend in distributed training (#3951)
  • Enable specification of maximum number of trials for socket backend in DistDGL (#3977)
  • Added floating-point conversion functions to dgl.transforms.functional (#3890, @ndickson-nvidia)
  • Improve the warning message when Tensoradapter is not found (#4055)
  • Add sanity check for in_edges/out_edges on empty graphs (#4050)

System Optimization

  • Improved graph batching on GPU for Graph DataLoaders (#3895, @ayasar70)
  • CPU DataLoader affinitization (#3723 @daniil-sizov)
  • Memory consumption optimization on index shuffling in dataloader (#3980)
  • Remove unnecessary induced vertices in edge subgraph (#3978, @yaox12)
  • Change the curandState and launch dimension of GPU neighbor sampling kernel (#3990, @paoxiaode)

Bug fixes

v0.8.1

17 Apr 05:19

This is a minor release that includes the following model updates, optimizations, new features and bug fixes.

Model update

  • nn.GroupRevRes from Training Graph Neural Networks with 1000 layers [#3842]
  • transforms.LaplacianPositionalEncoding from Graph Neural Networks with Learnable Structural and Positional Representations [#3869]
  • transforms.RWPositionalEncoding from Graph Neural Networks with Learnable Structural and Positional Representations [#3869]
  • dataloading.SAINTSampler from GraphSAINT [#3879]; see the sketch after this list
  • nn.EGNNConv from E(n) Equivariant Graph Neural Networks [#3901]
  • nn.PNAConv from the baselines of E(n) Equivariant Graph Neural Networks [#3901]
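
As a quick illustration of the new GraphSAINT sampler, here is a hedged sketch; the mode/budget arguments and the use of dummy iteration indices with the DataLoader are assumptions based on #3879.

import torch
import dgl

# Hedged sketch: node-budget GraphSAINT subgraph sampling.
g = dgl.rand_graph(1000, 5000)
g.ndata['feat'] = torch.randn(1000, 16)

sampler = dgl.dataloading.SAINTSampler(mode='node', budget=100)
# Each "index" is just an iteration counter; one subgraph is sampled per iteration.
dataloader = dgl.dataloading.DataLoader(g, torch.arange(50), sampler, batch_size=1)
for subg in dataloader:
    pass  # train on the sampled subgraph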

Example update

Feature update (new functionalities, interface changes, etc.)

  • Radius graph: construct a graph by connecting points within a given distance (see the sketch after this list). [#3829 @ksadowski13]
    • It uses torch.cdist, so the space complexity is O(N^2).
  • Added a get_attention parameter in GlobalAttentionPooling. [#3837 @decoherencer]
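
A hedged sketch of the radius-graph construction described above; the function name dgl.radius_graph and its arguments follow #3829 but should be checked against the API reference.

import torch
import dgl

# Hedged sketch: connect every pair of points within a given distance.
x = torch.randn(64, 3)          # 64 points in 3-D space
g = dgl.radius_graph(x, 0.5)    # edges between points closer than 0.5 (O(N^2) memory)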

Quality of life update

  • Example to train with multi-GPU with PyTorch Lightning. [#3863]
  • Multi-GPU inference with UVA. [#3827 @nv-dlasalle]
  • Enable UVA sampling with CPU indices to save GPU memory. [#3892]
  • Set stacklevel=2 for DGL-raised warnings. [#3816]
  • Pure GPU example of GraphSAGE, with both node classification and link prediction. [#3796 @nv-dlasalle, #3856 @Kh4L]
  • Tensoradapter DLPack 0.6 compatibility / PyTorch 1.11 support. [#3803]

System optimization

  • Enable UVA for PinSAGE and RandomWalk. [#3857 @yaox12]
  • METIS partitioning with communication volume minimization, which reduces the communication volume by 13.4% compared with edge-cut minimization on ogbn-products. [#3821 @chwan1016]
  • Change parameter of curand_init for reducing GPU latency [#3794 @paoxiaode]

Bug fixes

v0.8.0post2

03 Apr 08:38

This is a bugfix release with the following quality-of-life updates and bug fixes:

Quality-of-life updates

  • Python 3.10 support.
  • PyTorch 1.11 support.
  • CUDA 11.5 support on Linux. Please install with
    pip install dgl-cu115 -f https://data.dgl.ai/wheels/repo.html  # if using pip
    conda install dgl-cuda11.5 -c dglteam  # if using conda
    
  • Compatibility to DLPack 0.6 in tensoradapter (#3803) for PyTorch 1.11
  • Set stacklevel=2 for dgl_warning (#3816)
  • Support custom datasets in DataLoader that are not necessarily tensors (#3810 @yinpeiqi )

Bug fixes

  • Pass ntype/etype into partition book when node/edge_split (#3828)
  • Fix multi-GPU RGCN example (#3871 @yaox12)
  • Send rpc messages blockingly in case of congestion (#3867). Note that this fix may cause a speed regression in distributed DGL training. We are still investigating the root cause of the underlying issue in #3881.
  • Fix CopyToSharedMem assuming that all relation graphs are homogeneous (#3841)
  • Fix HAN example crashing with CUDA (#3841)
  • Fix UVA sampling crash without specifying prefetching features (#3862)
  • Fix documentation display issue of node/edge_split (#3858)
  • Fix device mismatch error in GraphSAGE distributed training example under multi-node multi-GPU (#3870)
  • Use torch.distributed.algorithms.join.Join to deal with uneven training sets in distributed training (#3870)
  • Dataloader documentation fixes (#3886)
  • Remove redundant reference of networkx package in pagerank.py (#3888 @AzureLeon1 )
  • Make source build work for systems where the default is Python 2 (#3718)
  • Fix UVA sampling with partially specified node types (#3897)

v0.8.0post1

08 Mar 07:51

This is a quick post-release with critical bug fixes:

  • Fix incorrect name when fetching data in the sparse optimizer #3808
  • Fix DataLoader not working with heterogeneous graphs on multiple GPUs #3801
  • Fix error in heterogeneous graph partitioning when the graph is unidirectional bipartite #3793

v0.8.0

01 Mar 15:45

v0.8.0 is a major release with many new features, system improvements and fixes. Read the blog for the highlighted features.

Major features

Mini-batch Sampling Pipeline Update

Enabled CUDA UVA-based optimization and feature prefetching for all built-in graph samplers (up to 4x speedup compared to v0.7). Users can now specify the features to prefetch and turn on UVA optimization in dgl.dataloading.Sampler and dgl.dataloading.DataLoader.

g = ...                             # some DGLGraph data
train_nids = ...                    # training node IDs
sampler = dgl.dataloading.MultiLayerNeighborSampler(
    fanouts=[10, 15],
    prefetch_node_feats=['feat'],   # prefetch node feature 'feat'
    prefetch_labels=['label'],      # prefetch node label 'label'
)
dataloader = dgl.dataloading.DataLoader(
    g, train_nids, sampler,
    device='cuda:0',     # perform sampling on GPU 0
    batch_size=1024,
    shuffle=True,
    use_uva=True         # turn on UVA optimization
)

We have done a major refactor of the sampling components to make it easier to implement new graph samplers. We added a new base class dgl.dataloading.Sampler with one abstract method sample for overriding, as well as new APIs dgl.set_src_lazy_features, dgl.set_dst_lazy_features, dgl.set_node_lazy_features, and dgl.set_edge_lazy_features for customizing prefetching rules. The code below shows the new user experience.

class NeighborSampler(dgl.dataloading.Sampler):
    def __init__(self,
                 fanouts : list[int],
                 prefetch_node_feats: list[str] = None,
                 prefetch_edge_feats: list[str] = None,
                 prefetch_labels: list[str] = None):
        super().__init__()
        self.fanouts = fanouts
        self.prefetch_node_feats = prefetch_node_feats
        self.prefetch_edge_feats = prefetch_edge_feats
        self.prefetch_labels = prefetch_labels

    def sample(self, g, seed_nodes):
        output_nodes = seed_nodes
        subgs = []
        for fanout in reversed(self.fanouts):
            # Sample a fixed number of neighbors of the current seed nodes.
            sg = g.sample_neighbors(seed_nodes, fanout)
            # Convert this subgraph to a message flow graph.
            sg = dgl.to_block(sg, seed_nodes)
            seed_nodes = sg.srcdata[dgl.NID]
            subgs.insert(0, sg)
        input_nodes = seed_nodes

        # handle prefetching
        dgl.set_src_lazy_features(subgs[0], self.prefetch_node_feats)
        dgl.set_dst_lazy_features(subgs[-1], self.prefetch_labels)
        for subg in subgs:
            dgl.set_edge_lazy_features(subg, self.prefetch_edge_feats)

        return input_nodes, output_nodes, subgs

Related documentation:

We thank Xin Yao (@yaox12 ) and Dominique LaSalle (@nv-dlasalle ) from NVIDIA and David Min (@davidmin7 ) from UIUC for their contributions.

DGL-Go

DGL-Go is a new command line tool for users to get started with training, using and studying Graph Neural Networks (GNNs). Data scientists can quickly apply GNNs to their problems, whereas researchers will find it useful to customize their experiments.

The initial release includes:

  • Four commands, dgl train, dgl recipe, dgl configure and dgl export.
  • 3 training pipelines for node prediction using full graph training, link prediction using full graph training and node prediction using neighbor sampling.
  • 5 node encoding models: gat, gcn, gin, sage, sgc; 3 edge encoding models: bilinear, dot-product, element-wise.
  • 10 datasets including custom dataset in CSV format.

NN Modules

We have accelerated dgl.nn.RelGraphConv and dgl.nn.HGTConv by up to 36x and 12x compared with the baselines from v0.7 and PyG, and shortened the implementation of dgl.nn.RelGraphConv by 3x (from 200 lines to 64).

Breaking change: dgl.nn.RelGraphConv no longer accepts 1-D integer tensor representing node IDs during forward. Please switch to torch.nn.Embedding to explicitly represent trainable node embeddings.
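
Here is a hedged migration sketch; the graph, feature size, and relation types below are made up for illustration. The idea is to keep an explicit nn.Embedding table and feed its output to the module instead of a raw node-ID tensor.

import torch
import torch.nn as nn
import dgl
from dgl.nn import RelGraphConv

g = dgl.rand_graph(100, 400)
etypes = torch.randint(0, 3, (g.num_edges(),))   # hypothetical relation type per edge

node_emb = nn.Embedding(g.num_nodes(), 16)       # explicit trainable node embeddings
conv = RelGraphConv(16, 8, num_rels=3)
h = conv(g, node_emb(g.nodes()), etypes)         # pass embeddings, not raw node IDs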

Below are the new NN modules added to v0.8:

A new edge_weight argument has been added to several GNN modules to support training on weighted graphs. A new user guide chapter 5.5 explains how to use edge weights in your GNN model.
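
For example, a hedged sketch with GraphConv (one of the modules that accepts the new argument; the graph and weights below are illustrative):

import torch
import dgl
from dgl.nn import GraphConv

g = dgl.rand_graph(100, 400)
feat = torch.randn(100, 16)
ew = torch.rand(g.num_edges())       # one scalar weight per edge

conv = GraphConv(16, 8, norm='both')
h = conv(g, feat, edge_weight=ew)    # edge weights are applied during message passing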

Graph Dataset and Transforms

Renamed the old dgl.transform package to dgl.transforms to follow PyTorch's namespace convention. All DGL datasets now accept an extra transform keyword argument for data augmentation and transformation:

import dgl
import dgl.transforms as T
t = T.Compose([
    T.AddSelfLoop(),
    T.GCNNorm(),
])
dataset = dgl.data.CoraGraphDataset(transform=t)
g = dataset[0]  # graph and features will be transformed automatically

Added 16 graph data transform modules.


0.7.2

08 Nov 04:09

0.7.2 Release Notes

This is a patch release targeting CUDA 11.3 and PyTorch 1.10. It contains (1) distributed training on heterogeneous graphs, and (2) bug fixes and code reorganization commits. The performance impact should be minimal.

To install with CUDA 11.3 support, run either

pip install dgl-cu113 -f https://data.dgl.ai/wheels/repo.html

or

conda install -c dglteam dgl-cuda11.3

Distributed Training on Heterogeneous Graphs

We have made the interface of distributed sampling on heterogeneous graphs consistent with the single-machine code. Please refer to https://github.com/dmlc/dgl/blob/0.7.x/examples/pytorch/rgcn/experimental/entity_classify_dist.py for the new code.

Other fixes

  • [Bugfix] Fix bugs of farthest_point_sampler (#3327, @sangyx)
  • [Bugfix] Fix sparse embeddings for PyTorch < 1.7 #3291 (#3333)
  • Fixes bug in hg.update_all causing crash #3312 (#3345, @sanchit-misra)
  • [Bugfix] Add PYTHONPATH in server launch. (#3352)
  • [CPU][Sampling][Performance] Improve sampling on the CPU. (#3274, @nv-dlasalle)
  • [Performance, CPU] Rewriting OpenMP pragmas into parallel_for (#3171, @tpatejko)
  • [Build] Fix OpenMP header inclusion for Mac builds (#3325)
  • [Performance] improve coo2csr space complexity when row is not sorted (#3326)
  • [BugFix] initialize data if null when converting from row sorted coo to csr (#3360)
  • fix broadcast tensor dim in dgl.broadcast_nodes (#3351, @jwyyy)
  • [BugFix] fix typo in fakenews dataset variable name (#3363, @kayzliu)
  • [Doc] Added md5sum info for OGB-LSC dataset (#3332, @msharmavikram)
  • [Feature] Graceful handling of exceptions thrown within OpenMP blocks (#3353)
  • Fix torch import in example (#3372, @jwyyy)
  • [Distributed] Allow user to pass-in extra env parameters when launching a distributed training task. (#3375)
  • [BugFix] extract gz into target dir (#3389)
  • [Model] Refine GraphSAINT (#3328 @ljh1064126026 )
  • [Bug] check dtype before convert to gk (#3414)
  • [BugFix] add count_nonzero() into SA_Client (#3417)
  • [Bug] Do not skip graphconv even no edge exists (#3416)
  • Fix edge ID exclusion when both g and g_sampling are specified in EdgeDataLoader (#3322)
  • [Bugfix] three bugs related to using DGL as a subdirectory(third_party) of another project. (#3379, @yuanzexi )
  • [PyTorch][Bugfix] Use uint8 instead of bool in pytorch to be compatible with nightly version (#3406, #3454, @nv-dlasalle)
  • [Fix] Use ==/!= to compare constant literals (str, bytes, int, float, tuple) (#3415, @cclauss)
  • [Bugfix][Pytorch] Fix model save and load bug of stgcn_wave (#3303, @HaoWei-TomTom )
  • [BugFix] Avoid Memory Leak Issue in PyTorch Backend (#3386, @chwan-rice )
  • [Fix] Split nccl sparse push into two groups (#3404, @nv-dlasalle )
  • [Doc] remove duplicate papers (#3393, @chwan-rice )
  • Fix GINConv backward #3437 (#3440)
  • [bugfix] Fix compilation with CUDA 11.5's CUB (#3468, @nv-dlasalle )
  • [Example][Performance] Enable faster validation for pytorch graphsage example (#3361, @nv-dlasalle )
  • [Doc] Evaluation Tutorial for Link Prediction (#3463)