Chaoxin Visualization Project

This is a Python project that uses jieba to segment Chinese sentences into words.
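As a rough illustration of jieba-based segmentation, the sketch below wraps `jieba.lcut`, which returns a list of tokens for a sentence. The function name `segment_sentence` and the optional `cut` parameter are hypothetical and are not taken from this repository; the project's own code in src may be organized differently.

```python
from typing import Callable, List, Optional


def segment_sentence(
    sentence: str,
    cut: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Segment a sentence into tokens, dropping whitespace-only tokens.

    `cut` is a hypothetical hook for swapping in another tokenizer;
    when it is omitted, jieba.lcut (jieba's default, precise mode) is used.
    """
    if cut is None:
        import jieba  # imported lazily so the helper loads without jieba

        cut = jieba.lcut
    return [token for token in cut(sentence) if token.strip()]
```

Calling `segment_sentence("我来到北京清华大学")` returns a list of words; the exact split depends on jieba's dictionary and version, so no output is shown here.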

File Structure

In the folder Named Entity Recognition (NER):

│  README.md
│
├─data
│      AfterPreprocessed.csv
│      AfterSegmentation_space0.1.txt
│      cleaned.csv
│      corpus.txt
│      highlight.csv
│      highlight.csv_bak
│      highlight.csv_bak2
│      stopwords.txt
│      Tag.csv
│
├─LINE
│      concatenate.cpp
│      distance.cpp
│      line.cpp
│      normalize.cpp
│      reconstruct.cpp
│      train.sh
│
├─src
│      LDA.py
│      NER.py
│      jieba_example.py
│      topic_model.ipynb
│      track_network.py
│
└─tf_model
        model-00010.param
        model-00010.tag
        model-00010.twords
        model-00010.wordmap
        model-00010.zvalue

data folder: contains all the datasets.
LINE folder: an implementation of the LINE algorithm.
src folder: source code for the LDA algorithm (LDA.py), a topic model in TensorFlow (topic_model.ipynb), the main functions (track_network.py and NER.py), and an example of using the jieba library (jieba_example.py).
tf_model folder: TensorFlow model files.

How to use

To view the topic model visualization, open topic_model.ipynb with Jupyter Notebook.
NER.py contains the main functions for preprocessing the dataset: segmentation, cleaning, and tagging.
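The cleaning step could look something like the sketch below, which removes stopwords from text that has already been segmented. It assumes one stopword per line (as data/stopwords.txt might contain) and space-separated tokens; both formats are assumptions, and the helper names `load_stopwords` and `clean_tokens` are hypothetical rather than functions from NER.py.

```python
from typing import Iterable, List, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a stopword set from lines, assuming one stopword per line."""
    return {line.strip() for line in lines if line.strip()}


def clean_tokens(segmented_line: str, stopwords: Set[str]) -> List[str]:
    """Split a space-separated line into tokens and drop the stopwords."""
    return [token for token in segmented_line.split() if token not in stopwords]
```

A file object can be passed directly to `load_stopwords`, for example `open("data/stopwords.txt", encoding="utf-8")`; the UTF-8 encoding is also an assumption.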

Acknowledgement

Topic Model

"How to Extract Topics from Massive Text with Python?" (如何用 Python 从海量文本抽取主题？)

Mickeypeng/Chaoxin-Visualization-Project
