Retweet decay #22

Open
meilinshi opened this issue Aug 8, 2019 · 9 comments

@meilinshi
Contributor

meilinshi commented Aug 8, 2019

  • Look at the retweet decay for an individual tweet, maybe starting with the most retweeted tweet.
  • Then add a few more highly retweeted tweets, and a few rarely retweeted tweets in a different color (so you'd have a few lines for each color).
  • Then try to figure out how to show the tweet decay for all of the tweets. The final figure should communicate the "average" lifespan of a tweet about soil health.
  • Label key events (the Soil Health Partnership meeting in January, etc.). What proportion of the January tweets use the conference hashtag?

Events:

  • Soil Health Partnership Summit
    Jan.18th - 19th, 2018, with hashtag #SHPSummit18
    Jan.15th - 16th, 2019, with hashtag #SoilSummit19

  • Soil Health Institute Annual Meeting
    Aug.1st - 3rd, 2018
    Jul.16th - Jul.18th, 2019
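
As a rough sketch of the hashtag-proportion question above: the snippet below computes the share of tweets in a given month whose text contains a conference hashtag. The record layout (`created_at`, `text`), the function name `hashtag_share`, and the sample tweets are all assumptions made for illustration; they are not taken from the project's dataset. The match is a simple case-insensitive substring test, so it would also count longer hashtags that begin with the same characters.

```python
from datetime import datetime


def hashtag_share(tweets, hashtag, year, month):
    """Return the fraction of tweets in the given month whose text contains the hashtag."""
    tag = hashtag.lower()
    # Keep only tweets posted in the requested month.
    in_month = [
        t for t in tweets
        if t["created_at"].year == year and t["created_at"].month == month
    ]
    if not in_month:
        return 0.0
    # Case-insensitive substring match on the tweet text.
    tagged = [t for t in in_month if tag in t["text"].lower()]
    return len(tagged) / len(in_month)


# Hypothetical records, for illustration only.
tweets = [
    {"created_at": datetime(2019, 1, 15, 9, 0), "text": "Kicking off #SoilSummit19"},
    {"created_at": datetime(2019, 1, 16, 14, 30), "text": "Cover crops panel #soilsummit19"},
    {"created_at": datetime(2019, 1, 20, 8, 0), "text": "Soil health matters"},
    {"created_at": datetime(2019, 1, 28, 8, 0), "text": "No-till results are in"},
    {"created_at": datetime(2019, 2, 2, 8, 0), "text": "Late recap #SoilSummit19"},
]

share = hashtag_share(tweets, "#SoilSummit19", 2019, 1)
print(share)
```

Because data were not collected every day, a share computed this way only describes the tweets that were actually captured.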

@meilinshi
Contributor Author

meilinshi commented Aug 16, 2019

I've tried plotting time series for some of the conference tweets.

Some concerns:

  • We don't collect data every day, so there's definitely missing data.
  • This is done manually: I find the conference hashtags first and then query each hashtag.

SHS 2018 during Jan.18-19
image
image

SHS 2019 during Jan.15-16
image
image

@swood-ecology
Contributor

swood-ecology commented Aug 22, 2019

  1. Drop 2018 figures because of missing data on the second day of the conference.
  2. Look at retweet decay for 5 most retweeted tweets from 2019 conference and 10 most retweeted tweets overall, across all search terms.

@meilinshi
Copy link
Contributor Author

meilinshi commented Aug 29, 2019

5 most retweeted tweets decay SHS 2019
image

10 most retweeted tweets overall
image

Single tweet decay
image

@swood-ecology
Contributor

I like the single tweet decay because you're able to track what happens for a specific tweet event. With the bar plots it's hard to tell what's leading to the values of the bars--is it because one tweet is retweeted several times, or several are retweeted once? I wonder if you could generate the single-tweet figure (with the line that connects all the points) for all of the tweets. Then it would probably look like a cloud of lines but that could give a sense of the average decay envelope.
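
One possible way to build the per-tweet lines and an average envelope is sketched below. It assumes that, for each original tweet, the timestamps of its retweets are already known; `decay_curve`, `mean_curve`, and the sample timestamps are hypothetical names and data, not part of the project's code. Each curve is the number of retweets per hourly bin since the original tweet, and the mean curve averages those bins across tweets.

```python
from collections import Counter
from datetime import datetime, timedelta


def decay_curve(original_time, retweet_times, bin_hours=1):
    """Count retweets per time bin, measured in hours since the original tweet."""
    counts = Counter()
    for rt in retweet_times:
        hours = (rt - original_time).total_seconds() / 3600
        if hours < 0:
            continue  # a retweet dated before its original suggests a mismatch; skip it
        counts[int(hours // bin_hours)] += 1
    if not counts:
        return []
    # Fill empty bins with zeros so the curve is a contiguous series.
    return [counts.get(b, 0) for b in range(max(counts) + 1)]


def mean_curve(curves):
    """Average several per-tweet curves bin by bin, padding shorter ones with zeros."""
    if not curves:
        return []
    length = max(len(c) for c in curves)
    return [
        sum(c[i] if i < len(c) else 0 for c in curves) / len(curves)
        for i in range(length)
    ]


# Hypothetical timestamps, for illustration only.
t0 = datetime(2019, 1, 15, 9, 0)
t1 = datetime(2019, 1, 15, 13, 0)
curve_a = decay_curve(t0, [t0 + timedelta(minutes=30),
                           t0 + timedelta(minutes=45),
                           t0 + timedelta(hours=2, minutes=10)])
curve_b = decay_curve(t1, [t1 + timedelta(minutes=20)])
envelope = mean_curve([curve_a, curve_b])
print(curve_a, curve_b, envelope)
```

Plotting every per-tweet curve as a thin, semi-transparent line with the mean curve drawn on top would give the "cloud of lines" view while still showing a typical decay shape.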

@meilinshi
Contributor Author

I've tried plotting the decay for the top 5 most retweeted tweets, with the limit set to number > 2 to avoid a long tail.

Screen Shot 2019-09-04 at 11 20 47 AM

@swood-ecology
Contributor

swood-ecology commented Sep 4, 2019

That's great. You might even limit it at >10 to shorten the tail even more and see the dynamics up to the first few days. You could take off the legend too because that'll get crazy when you add in more tweets.

@meilinshi
Contributor Author

I changed the x-axis unit to hours and set the limit to number > 5 for the top three most retweeted tweets.
All three show different patterns.

Screen Shot 2019-09-05 at 11 22 59 AM
Screen Shot 2019-09-05 at 11 23 06 AM
Screen Shot 2019-09-05 at 11 23 14 AM

@swood-ecology
Contributor

swood-ecology commented Sep 5, 2019

When looking at figures 2 and 3 (two tweets, three tweets), you can't tell which point corresponds with which tweet. Presumably each point is the sum of the RTs for the top 2 and 3 tweets? I think what we need is to visualize each tweet as a separate line, like in the prior figure, but to start adding more and more lines. For instance, instead of just the top 5 RTs, how about all of the tweets during the SHP2019 meeting?

@meilinshi
Contributor Author

Screen Shot 2019-09-11 at 12 17 37 PM
I'm having a hard time finding, in a systematic way, which tweets in the dataset are the retweets of a given tweet. This is especially hard when the noRT dataset contains tweets with the same content from different users, which can lead to a negative number when calculating day_since.

At the same time, I'm filtering out the non-SHS19 tweets by setting a time limit and also applying a retweet_count > 1 limit. That's why the figure above doesn't seem to have many entries; it includes 45 noRT and 80 RT tweets in total.
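
One way to avoid a negative day_since when matching on text, sketched here under assumptions: group tweets by their text (after stripping a leading "RT @user: " prefix) and treat the earliest tweet in each group as the original, so every other tweet in the group is measured from it. The record layout, the helper names `normalize` and `days_since_original`, and the sample tweets are hypothetical. Exact text matching will miss retweets whose text was truncated, so if the collected data carries an ID linking a retweet to its original, joining on that ID would likely be more reliable.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the conventional "RT @user: " prefix at the start of a retweet's text.
RT_PREFIX = re.compile(r"^RT @\w+:\s*")


def normalize(text):
    """Strip a leading 'RT @user: ' prefix and surrounding whitespace."""
    return RT_PREFIX.sub("", text).strip()


def days_since_original(tweets):
    """Group tweets by normalized text; the earliest tweet in each group is the original.

    Returns a dict mapping each text to the days elapsed between the original
    and every later tweet with the same text, so no value can be negative.
    """
    groups = defaultdict(list)
    for t in tweets:
        groups[normalize(t["text"])].append(t)

    result = {}
    for text, group in groups.items():
        group.sort(key=lambda t: t["created_at"])
        origin = group[0]["created_at"]
        result[text] = [
            (t["created_at"] - origin).total_seconds() / 86400 for t in group[1:]
        ]
    return result


# Hypothetical records, for illustration only.
tweets = [
    {"user": "b", "text": "RT @a: Healthy soil, healthy farm",
     "created_at": datetime(2019, 1, 15, 12, 0)},
    {"user": "a", "text": "Healthy soil, healthy farm",
     "created_at": datetime(2019, 1, 15, 0, 0)},
    {"user": "c", "text": "Healthy soil, healthy farm",
     "created_at": datetime(2019, 1, 16, 0, 0)},
]

day_since = days_since_original(tweets)
print(day_since)
```

Note that this treats a same-text tweet from a different user as a repost of the earliest one, which may or may not be the intended interpretation for the noRT dataset.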
