Indonesian Cryptocurrency Tweets
Similar to https://github.com/SokKanaTorajd/gemastik21, but without topic modelling.
Uses the same dataset from https://www.kaggle.com/wijatama/indonesiancryptotweets
Combines Sastrawi's stopwords, Mas Devid's stopwords, and my own extra stopwords.
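Combining stopword lists amounts to taking the union of their words after normalizing case and whitespace. The project itself does this in R. The following is only a minimal Python sketch of the idea, not the author's code. The three lists are tiny placeholders, not the real contents of the Sastrawi or Mas Devid files, and the "extra" words are hypothetical.

```python
# Sketch: merge several stopword lists into one set (union).
# The sample lists below are placeholders, not the real stopword files.

def merge_stopwords(*lists):
    """Return the union of all lists, with words lowercased and stripped."""
    merged = set()
    for words in lists:
        for w in words:
            w = w.strip().lower()
            if w:  # skip empty lines from the source files
                merged.add(w)
    return merged


sastrawi = ["yang", "dan", "di"]   # placeholder sample
devid = ["dan", "ke", "dari"]      # placeholder sample
extra = ["rt", "amp"]              # hypothetical custom additions
stopwords = merge_stopwords(sastrawi, devid, extra)
print(sorted(stopwords))  # ['amp', 'dan', 'dari', 'di', 'ke', 'rt', 'yang']
```

Using a set means words that appear in more than one list (here "dan") are kept only once.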
- Make sure RStudio and Gephi are installed. Gephi can be downloaded from https://gephi.org
- Download the dataset.
- Install the required packages (e.g. nurandi/kataDasar) and load the libraries.
- Import the dataset and the stopword lists.
- Preprocess the tweets: remove duplicate tweets, lowercase and strip the text, tokenize it, then remove English (EN) and Indonesian (ID) stopwords.
- Rejoin the tokens, then remove tweets that contain fewer than 3 words.
- Create and filter bigrams, keeping only those that appear more than 10 times.
- Split each bigram into a source word and a target word.
- Load the libraries required to create the network.
- Create the network and save it as a GraphML file.
- Open Gephi, load the GraphML file, and explore whichever visualizations you like.
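The steps above (preprocessing, bigram filtering, source/target split, GraphML export) can be sketched end to end as follows. The project itself does this in R; this is only a Python illustration using the standard library, not the author's code. The regex tokenizer, the placeholder stopword set, the sample tweets, and the file name `bigram_network.graphml` are assumptions. Bigrams are treated as directed edges (first word to second word) weighted by frequency; the README does not say whether its network is directed.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Placeholder for the combined EN + ID stopword set (assumption).
STOPWORDS = {"the", "is", "yang", "dan", "di", "rt"}


def preprocess(tweets, stopwords, min_words=3):
    """Deduplicate, lowercase, strip, tokenize, drop stopwords, rejoin,
    and keep only tweets that still have at least `min_words` words."""
    cleaned = []
    for text in dict.fromkeys(tweets):            # remove exact duplicates, keep order
        text = text.lower().strip()
        tokens = re.findall(r"[a-z0-9]+", text)   # naive tokenizer (assumption)
        tokens = [t for t in tokens if t not in stopwords]
        if len(tokens) >= min_words:              # drop tweets with fewer than 3 words
            cleaned.append(" ".join(tokens))      # rejoin the tokens
    return cleaned


def count_bigrams(docs):
    """Count adjacent word pairs across all documents."""
    counts = Counter()
    for doc in docs:
        words = doc.split()
        counts.update(zip(words, words[1:]))
    return counts


def bigram_edges(counts, min_count=10):
    """Keep bigrams seen more than `min_count` times as (source, target, weight)."""
    return [(src, tgt, n) for (src, tgt), n in counts.items() if n > min_count]


def write_graphml(edges, path):
    """Write a directed, weighted edge list as a minimal GraphML file."""
    root = ET.Element("graphml", {"xmlns": "http://graphml.graphdrawing.org/xmlns"})
    ET.SubElement(root, "key", {"id": "weight", "for": "edge",
                                "attr.name": "weight", "attr.type": "int"})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})
    for word in sorted({w for src, tgt, _ in edges for w in (src, tgt)}):
        ET.SubElement(graph, "node", {"id": word})
    for src, tgt, weight in edges:
        edge = ET.SubElement(graph, "edge", {"source": src, "target": tgt})
        ET.SubElement(edge, "data", {"key": "weight"}).text = str(weight)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


# Tiny made-up sample; the real input is the Kaggle tweet dataset.
tweets = [
    "Harga bitcoin naik lagi hari ini",
    "harga bitcoin naik tajam",
    "Harga bitcoin naik lagi hari ini",       # duplicate, removed
    "beli bitcoin",                           # fewer than 3 words, removed
    "harga bitcoin turun dan ethereum naik",
]
docs = preprocess(tweets, STOPWORDS)
counts = count_bigrams(docs)
# min_count=1 only because the sample is tiny; the README's rule is "more than 10".
edges = bigram_edges(counts, min_count=1)
write_graphml(edges, "bigram_network.graphml")
print(edges)  # [('harga', 'bitcoin', 3), ('bitcoin', 'naik', 2)]
```

With the real dataset, calling `bigram_edges(counts)` applies the default threshold of more than 10 occurrences, and the resulting `.graphml` file can be opened in Gephi.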
Learned from: Ujang Fahmi and Text Cleaning Bahasa Indonesia