brazilian-blog-dataset

Collection of Brazilian Blogspot Posts

Authors: Henrique D. P. dos Santos, Vinicius Woloszyn, and Renata Vieira

Abstract: Diary-like content expressing authors' personal experiences and sentiments over a variety of topics is generated every day and made available on the Internet. This rich content can be used in several ways for psychological analysis and knowledge discovery regarding human-related issues. This paper presents the creation of a Brazilian Portuguese corpus, built from blog posts, for personal story analysis and detection. We present an analysis of psycholinguistic categories across personal story and non-story posts, discussing their similarities and differences. We also study the use of these psycholinguistic categories as classification features. Then we describe the evaluation of several machine learning approaches and the process of applying them to identify personal stories on the basis of our dataset. Finally, we investigate the main topic-related polarity of personal narrative posts.

Keywords: Corpus, Natural Language Processing, Personal Story, Psycholinguistic, Social Media.

Full text, Slides, BibTeX

Complete Reference: Henrique D. P. dos Santos, Vinicius Woloszyn, and Renata Vieira. 2017. Portuguese Personal Story Analysis and Detection in Blogs. In Proceedings of WI ’17, Leipzig, Germany, August 23-26, 2017, 7 pages. DOI: 10.1145/3106426.3106517

Basic Statistics

https://github.com/heukirne/brazilian-blog-dataset/blob/master/blogs_stats.ipynb

Countries Stats

https://github.com/heukirne/brazilian-blog-dataset/blob/master/countries.json

Blogset-BR Dataset

http://www.inf.pucrs.br/linatural/blogset-br (4.7 GB, 7.4M posts)

Personal Story Annotated Posts

https://github.com/heukirne/brazilian-blog-dataset/raw/master/corpus.csv.gz (1K Posts)
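
One possible way to inspect the annotated posts after downloading `corpus.csv.gz` is sketched below, using only the Python standard library. The column layout is not documented in this README, so the sketch assumes only that the file is a UTF-8 CSV with a header row. The function name `load_posts` and its `limit` parameter are hypothetical and not part of this repository.

```python
import csv
import gzip


def load_posts(path, limit=None):
    """Read rows from a gzip-compressed CSV file into a list of dicts.

    Assumptions (not confirmed by this README): the file is UTF-8 encoded
    and its first row is a header naming the columns.
    """
    rows = []
    # newline="" lets the csv module handle line breaks inside quoted
    # fields, which blog post bodies are likely to contain.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for index, row in enumerate(csv.DictReader(handle)):
            if limit is not None and index >= limit:
                break
            rows.append(row)
    return rows
```

Calling `load_posts("corpus.csv.gz", limit=5)` and printing `list(rows[0].keys())` shows which columns the file actually provides before any further processing. If a post body exceeds the `csv` module's default field size limit, `csv.field_size_limit` can be used to raise it.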

PUCRS NLP Group

This project belongs to the NLP Group at PUCRS, Brazil.
