forked from microsoft/HMNet

Official Implementation of "A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining"


Rawan19/HMNet

 
 


HMNet

This repository is a fork of the official code for Microsoft's HMNet paper, presented at EMNLP 2020. It is implemented in PyTorch. The related paper to cite is:

@Article{zhu2020a,
author = {Zhu, Chenguang and Xu, Ruochen and Zeng, Michael and Huang, Xuedong},
title = {A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining},
year = {2020},
month = {November},
url = {https://www.microsoft.com/en-us/research/publication/end-to-end-abstractive-summarization-for-meetings/},
journal = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing},
}

What's new?

Inference for a single example

We modified the code to run inference on a single example. The prediction (the summary) is stored in a summary.txt file.

We also added a preprocessing script that transforms a Microsoft Teams meeting transcription file into the AMI jsonl format.

Steps:

  1. Install the requirements.
  2. If you have a transcript for a meeting, paste it into preprocess.py.
  3. Edit name_role_dict in preprocess.py so that it contains the names of the meeting participants (this should be extracted automatically in future versions).
  4. Run preprocess.py. It produces the jsonl.gzip file that serves as input to the model and stores it at ExampleRawData/meeting_summarization/AMI_proprec/test/test_raw2.jsonl.gzip
  5. Add the pretrained model to the repo and update its path in the conf file (currently the AMI-finetuned model is used).
  6. Run this line: !python PyLearn.py evaluate ExampleConf/conf_eval_hmnet_AMI (the leading ! is for notebook cells; omit it in a regular shell). The summary is written to 'summary.txt'.
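The preprocessing in steps 2 to 4 can be sketched roughly as follows. This is not the repository's preprocess.py; it is a minimal illustration under loud assumptions: the transcript is plain text with one "Speaker Name: utterance" turn per line, and the record fields (id, meeting, speaker, role, utt, word, summary) are a guess at an AMI-style layout. The names in name_role_dict, the roles, and the function names are all hypothetical, and the format the model actually expects may need further fields, so check preprocess.py before relying on this.

```python
import gzip
import json
import re

# Hypothetical speaker-to-role mapping, in the spirit of name_role_dict.
name_role_dict = {"Alice": "PM", "Bob": "ID"}

# Assumed transcript layout: one "Speaker Name: utterance" turn per line.
TURN_RE = re.compile(r"^\s*([^:]+):\s*(.+?)\s*$")


def transcript_to_record(transcript, roles, meeting_id="meeting_0"):
    """Convert a plain-text transcript into one record (assumed schema)."""
    turns = []
    for line in transcript.splitlines():
        match = TURN_RE.match(line)
        if match is None:
            continue  # skip blank lines and anything that is not a speaker turn
        speaker, text = match.groups()
        speaker = speaker.strip()
        turns.append({
            "speaker": speaker,
            "role": roles.get(speaker, "unknown"),
            "utt": {"word": text.split()},  # naive whitespace tokenization
        })
    # The summary is left empty because it is what the model should produce.
    return {"id": meeting_id, "meeting": turns, "summary": []}


def write_jsonl_gz(records, path):
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
```

A call such as `write_jsonl_gz([transcript_to_record(text, name_role_dict)], "test_raw2.jsonl.gzip")` would then produce a single-meeting file with the name used in step 4.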

Finetune HMNet

It is recommended to run our model inside a Docker container:

Build docker image

cd Docker
sudo docker build . -t hmnet

Run container from image

sudo nvidia-docker run -it hmnet /bin/bash

Get the pretrained HMNet ready at ExampleInitModel/HMNet-pretrained. Please see document.

Finetune on AMI dataset

CUDA_VISIBLE_DEVICES="0,1,2,3" mpirun -np 4 --allow-run-as-root python PyLearn.py train ExampleConf/conf_hmnet_AMI
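The command above lists four GPUs and starts four MPI processes. When a different number of GPUs is available, the two values presumably need to change together. The sketch below builds the same command line from a list of GPU ids. It assumes one process per visible GPU, which matches the command above but is not stated by the source, and the helper name build_train_command is hypothetical.

```python
import os
import shlex


def build_train_command(gpu_ids, conf_path="ExampleConf/conf_hmnet_AMI"):
    """Return (env, argv) for an mpirun launch with one process per listed GPU."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    argv = [
        "mpirun", "-np", str(len(gpu_ids)), "--allow-run-as-root",
        "python", "PyLearn.py", "train", conf_path,
    ]
    return env, argv


env, argv = build_train_command([0, 1, 2, 3])
# Print the equivalent shell line for inspection.
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], shlex.join(argv))
# To launch for real: subprocess.run(argv, env=env, check=True)
```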

The training log, model, and settings can be found at ExampleConf/conf_hmnet_AMI_conf~/run_1

Data paths

  • ExampleRawData/meeting_summarization/AMI_proprec: The preprocessed AMI dataset. The *.json files point to the path to each split. Each folder (train, dev or test) contains the compressed chunks of data in the format for infinibatch.

  • ExampleRawData/meeting_summarization/ICSI_proprec: Same as above for ICSI dataset.

  • ExampleInitModel/transfo-xl-wt103: Here we only used the vocabulary from Transformer-XL, provided by Huggingface.
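To check what a data chunk contains before training or evaluation, it can help to peek at its first record. The sketch below assumes that a chunk is a gzip-compressed text file with one JSON object per line, which the .jsonl.gzip file name in this README suggests but does not guarantee. The helper name peek_chunk is hypothetical.

```python
import gzip
import json
from itertools import islice


def peek_chunk(path, limit=1):
    """Return up to `limit` JSON records from the start of a gzip JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in islice(handle, limit) if line.strip()]


# Example (path taken from the inference steps in this README):
# records = peek_chunk(
#     "ExampleRawData/meeting_summarization/AMI_proprec/test/test_raw2.jsonl.gzip"
# )
# print(sorted(records[0].keys()))
```

Only the first `limit` lines are read, so this stays cheap even for large chunks.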

Evaluation

Step 1: specify the model path

In ExampleConf/conf_eval_hmnet_AMI, for the line

PYLEARN_MODEL ###

Replace ### with the actual checkpoint path, given relative to the location of this configuration file.
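Because the path must be relative to the configuration file rather than to the working directory, it is easy to get wrong by hand. The sketch below computes that relative path and substitutes it for the ### placeholder. It assumes that the key and the value sit on the same line, separated by whitespace. The function name set_model_path and the checkpoint path in the usage lines are hypothetical.

```python
import os


def set_model_path(conf_text, conf_file, checkpoint):
    """Replace the ### placeholder on the PYLEARN_MODEL line with a relative path."""
    # Relative path from the directory that holds the conf file to the checkpoint.
    rel = os.path.relpath(checkpoint, start=os.path.dirname(conf_file) or ".")
    lines = []
    for line in conf_text.splitlines():
        if line.split()[:1] == ["PYLEARN_MODEL"] and "###" in line:
            line = line.replace("###", rel, 1)
        lines.append(line)
    return "\n".join(lines) + "\n"


# Usage with a hypothetical checkpoint location:
# conf_file = "ExampleConf/conf_eval_hmnet_AMI"
# with open(conf_file, encoding="utf-8") as fh:
#     updated = set_model_path(fh.read(), conf_file, "ExampleInitModel/AMI-finetuned/model")
# with open(conf_file, "w", encoding="utf-8") as fh:
#     fh.write(updated)
```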

Step 2: run the evaluate pipeline

CUDA_VISIBLE_DEVICES="0,1,2,3" mpirun -np 4 --allow-run-as-root python PyLearn.py evaluate ExampleConf/conf_eval_hmnet_AMI

The decoding results can be found at ExampleConf/conf_eval_hmnet_AMI_conf~/run_1

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Security

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include Microsoft, Azure, DotNet, AspNet, Xamarin, and our GitHub organizations.

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets Microsoft's definition of a security vulnerability, please report it to us as described below.

Reporting Security Issues

Please do not report security vulnerabilities through public GitHub issues.

Instead, please report them to the Microsoft Security Response Center (MSRC) at https://msrc.microsoft.com/create-report.

If you prefer to submit without logging in, send email to secure@microsoft.com. If possible, encrypt your message with our PGP key; please download it from the Microsoft Security Response Center PGP Key page.

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at microsoft.com/msrc.

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

  • Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
  • Full paths of source file(s) related to the manifestation of the issue
  • The location of the affected source code (tag/branch/commit or direct URL)
  • Any special configuration required to reproduce the issue
  • Step-by-step instructions to reproduce the issue
  • Proof-of-concept or exploit code (if possible)
  • Impact of the issue, including how an attacker might exploit the issue

This information will help us triage your report more quickly.

If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our Microsoft Bug Bounty Program page for more details about our active programs.

Preferred Languages

We prefer all communications to be in English.

Policy

Microsoft follows the principle of Coordinated Vulnerability Disclosure.
