# *Network Macroscopes: Data Mining Methodologies for Social Media and Digital Content*
## Authors: Aaron Beveridge and Nicholas M. Van Horn
## Introduction
- What is this book about?
- Who is the audience?
- How/why is the book structured the way it is (discuss outline)?
- How can people get the most out of this book?
    - Use MassMine
    - See companion tutorials at <www.massmine.org>
## Chapter 1: Understanding Networks
### Types
### Natures
- Common knowledge (need better title)
- Metcalfe's Law
- Traveling Salesman Problem
- Dunbar number
- Others
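Metcalfe's Law and the Dunbar number pull in opposite directions, and a little arithmetic makes the tension concrete. The Python sketch below is illustrative only: it counts the n(n - 1)/2 possible pairwise links among n users (the usual basis for Metcalfe's Law) and compares that with an upper bound on links if each user keeps at most about 150 stable ties, the commonly cited estimate of Dunbar's number. The function names and the cap parameter are hypothetical.

```python
"""Illustrative sketch: Metcalfe's Law vs. a Dunbar-style cap on ties."""

# Commonly cited estimate of Dunbar's number (stable relationships per person).
DUNBAR_LIMIT = 150


def possible_links(n: int) -> int:
    """Distinct pairwise connections among n users: n(n - 1) / 2."""
    return n * (n - 1) // 2


def dunbar_capped_links(n: int, limit: int = DUNBAR_LIMIT) -> int:
    """Upper bound on links if no user keeps more than `limit` ties.

    Each link uses one tie at each of its two endpoints, so the number
    of links is at most n * min(limit, n - 1) / 2.
    """
    return n * min(limit, n - 1) // 2


if __name__ == "__main__":
    for n in (10, 150, 1_000, 1_000_000):
        print(f"{n:>9,} users: {possible_links(n):>15,} possible links, "
              f"at most {dunbar_capped_links(n):>11,} under the cap")
```

Possible links grow quadratically with network size, while the capped bound grows only linearly once a network passes about 150 users.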
### Terminologies
### Lifecycles
### Laws and Limitations
## Chapter 2: Building Macroscopes
### Overview and History
- Text as Data
- Brief History of NLP
    - Turing test
    - Before/After Bag of Words
    - Word Embeddings and GPT-3
### Natural Language Processing
- Tokenization
- Zipf's Law and Word Frequencies
- N-Grams and Correlations
- Topic Modeling
- Word Embeddings
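Tokenization, word frequencies, and n-grams can be sketched in a few lines. This is a minimal illustration assuming plain English text and a deliberately naive regular-expression tokenizer; it is not MassMine's tokenizer or any particular NLP library's, and `tokenize`, `ngrams`, and `rank_frequency` are hypothetical helper names.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters, digits, and apostrophes.

    Deliberately naive: punctuation, emoji, and whitespace act as separators.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Contiguous n-token sequences, in order (n=2 gives bigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rank_frequency(tokens: list[str]) -> list[tuple[int, str, int]]:
    """(rank, word, count) rows, most frequent word first.

    Zipf's Law predicts count to be roughly proportional to 1 / rank in a
    large corpus; a toy sentence is far too small to show that reliably.
    """
    ranked = Counter(tokens).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat. The dog sat on the log.")
    print(tokens)
    print(rank_frequency(tokens)[:3])  # "the" ranks first with 4 occurrences
    print(Counter(ngrams(tokens, 2)).most_common(2))
```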
### Information Extraction
- Text to Speech
- Search
- Sentiment
- Emoji
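Lexicon-based scoring is one simple way to treat sentiment and emoji together. This sketch assumes a tiny, hypothetical lexicon invented for illustration; real research would rely on a validated lexicon or model rather than these hand-picked scores.

```python
import re

# Hypothetical toy lexicon, invented for illustration only.
WORD_SCORES = {"love": 2, "great": 2, "good": 1, "bad": -1, "awful": -2, "hate": -2}
EMOJI_SCORES = {
    "\U0001F600": 1,   # grinning face
    "\U0001F621": -2,  # pouting (angry) face
}


def lexicon_sentiment(text: str) -> int:
    """Sum lexicon scores for the words and emoji found in `text`.

    Words are matched as lowercase tokens; emoji are counted as substrings,
    because the word-token regex below would otherwise discard them.
    """
    words = re.findall(r"[a-z']+", text.lower())
    word_score = sum(WORD_SCORES.get(word, 0) for word in words)
    emoji_score = sum(text.count(emoji) * score for emoji, score in EMOJI_SCORES.items())
    return word_score + emoji_score


if __name__ == "__main__":
    print(lexicon_sentiment("I love this, great day \U0001F600"))  # 2 + 2 + 1 = 5
    print(lexicon_sentiment("awful service \U0001F621\U0001F621"))  # -2 - 2 - 2 = -6
```

A scorer this simple ignores negation and sarcasm: "not good" still scores +1, which is one reason sentiment results need careful interpretation.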
## Chapter 3: Audience and Circulation: Artifacts, Locations, and Demographics
### Overview of Audience and Circulation
This chapter introduces the concepts of "audience" and "circulation" through the lens of exploratory data analysis (EDA). By introducing metadata types and their functions across various digital networks and social media platforms, this chapter explains how to...
### Documents
- Titles
- Authors
### Demographics
- Names
- Profiles
### Locations
- City, State, Country
- Geolocation
- IP Address
- Document/page metadata
- User metadata
### Others
- Others
## Chapter 4: Defining Trends
### Innovations
### Games
### Graphs
This chapter tells the story of "trends" in digital networks. It starts with a very simple definition of a trend drawn from Diffusion of Innovations, explaining the S-shaped model of diffusion and the "Justin Bieber" problem. It then complicates this definition with discussions of "contagion" effects and of the problem of defining trends within individual networks.
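One common way to formalize the S-shaped model of diffusion is the logistic function, where the cumulative share of adopters at time t is F(t) = 1 / (1 + exp(-r(t - m))), with midpoint m and steepness r. Treat this as an illustrative assumption rather than the book's model; the function names and parameter values are hypothetical.

```python
import math


def adoption_share(t: float, midpoint: float, rate: float) -> float:
    """Cumulative share of eventual adopters at time t (logistic S-curve).

    midpoint: time at which half of the eventual adopters have adopted.
    rate: steepness of the curve; larger values mean a faster take-off.
    """
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


def adoption_rate(t: float, midpoint: float, rate: float) -> float:
    """New adoption per unit time: the logistic derivative, rate * F * (1 - F).

    It peaks at the midpoint, which is when a trend looks most "viral".
    """
    share = adoption_share(t, midpoint, rate)
    return rate * share * (1.0 - share)


if __name__ == "__main__":
    # Hypothetical parameters: half of adopters by t = 10, moderate steepness.
    for t in range(0, 21, 4):
        print(f"t={t:>2}  adopted={adoption_share(t, 10, 0.8):.3f}  "
              f"new={adoption_rate(t, 10, 0.8):.3f}")
```

The slow start, rapid middle, and saturating tail of the curve correspond to early adopters, the majority, and laggards.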
Next, the chapter considers "trends" from the perspective of game theory, explaining fundamental issues related to information cascades, the unintended consequences of "datafication" (the 'like' button, for example), and other issues, such as the common practice of 'gaming' social network algorithms. The goal of this chapter is to keep complicating the idea of "trends" and to let the concept evolve as we work our way from Diffusion of Innovations to game theory and finally to graphs and related issues.
For graph theory, we discuss strong and weak ties, triadic closure, and other known phenomena resulting from agent-based modeling. We can also bring in ideas such as Metcalfe's Law and Dunbar's number to provide competing perspectives on how we make sense of patterns and trends in digital networks.
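Triadic closure is easy to check on a small graph. The sketch assumes an undirected network stored as adjacency sets; it lists "open triads" (two people who share a contact but are not tied to each other), which triadic closure predicts will tend to close, and computes a node's local clustering coefficient. The toy names and helper functions are hypothetical.

```python
from itertools import combinations


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets: each tie is stored in both directions."""
    adj: dict[str, set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def open_triads(adj: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Triples (a, b, c) where b is tied to a and c, but a and c are not tied.

    Triadic closure predicts these open triads tend to close over time:
    two people with a contact in common are more likely to become connected.
    """
    triads = []
    for b, neighbors in adj.items():
        for a, c in combinations(sorted(neighbors), 2):
            if c not in adj[a]:
                triads.append((a, b, c))
    return triads


def clustering_coefficient(adj: dict[str, set[str]], node: str) -> float:
    """Share of a node's neighbor pairs that are themselves tied (0.0 if < 2 neighbors)."""
    pairs = list(combinations(sorted(adj[node]), 2))
    if not pairs:
        return 0.0
    closed = sum(1 for a, c in pairs if c in adj[a])
    return closed / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy network.
    adj = build_graph([("ana", "ben"), ("ben", "cai"), ("ana", "cai"), ("cai", "dee")])
    print(open_triads(adj))                    # cai bridges to dee: two open triads
    print(clustering_coefficient(adj, "cai"))  # 1 of 3 neighbor pairs is tied
```

Here "cai" acts as a bridge to "dee"; in strong/weak tie terms, such bridging ties are where new information tends to enter a tightly clustered group.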
**Chapter 4 Notes**

- What is a trend?
- Diffusion of innovations terminologies
- S-shaped curve of innovation
- The "newness" problem
- How "game theory" complicates any simple notions of "trends" and virality
- Weak/strong ties
- Triadic closure
- Others?
- Graphs, agents, and black boxes
- Others?
- Metcalfe's Law
- Dunbar's number
- Information Cascades and Feedback Loops
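Information cascades can be simulated with a common textbook simplification of sequential "herding." Each agent gets a private signal that is correct with some probability, sees all earlier public choices, and follows the crowd once one option leads by two or more; otherwise it follows its own signal. This sketch hard-codes that decision rule rather than deriving it from Bayesian updating, and all names and parameters are hypothetical.

```python
import random
from typing import Optional


def simulate_cascade(n_agents: int, signal_accuracy: float,
                     good_option: bool = True, seed: Optional[int] = None) -> list[bool]:
    """Sequential accept (True) / reject (False) decisions in a simple herding model.

    Decision rule for each agent:
    - if accepts lead rejects by 2 or more, accept regardless of the signal;
    - if rejects lead accepts by 2 or more, reject regardless of the signal;
    - otherwise, follow the private signal.
    Once a lead of 2 appears, later choices add no new information, so the
    cascade locks in, even when it is wrong.
    """
    rng = random.Random(seed)
    choices: list[bool] = []
    lead = 0  # accepts minus rejects among earlier choices
    for _ in range(n_agents):
        signal = good_option if rng.random() < signal_accuracy else not good_option
        if lead >= 2:
            choice = True
        elif lead <= -2:
            choice = False
        else:
            choice = signal
        choices.append(choice)
        lead += 1 if choice else -1
    return choices


if __name__ == "__main__":
    # How often do 70%-accurate private signals still produce a wrong cascade?
    runs = [simulate_cascade(50, 0.7, seed=s) for s in range(1_000)]
    wrong = sum(1 for r in runs if not r[-1])
    print(f"{wrong} of {len(runs)} runs ended with agents rejecting the better option")
```

Because the first two or three choices can fix the outcome for everyone after them, small early accidents (or deliberate 'gaming') can dominate what looks like a trend.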
## Chapter 5: Network Confidence
### Overview of Network Confidence
This is a literature and methods review chapter. It raises a key question and suggests areas of study connected to "network confidence." It also provides a transition to the "mixed methods" chapter, as many mixed methods studies build on the questions raised about network confidence.
### Spam and Misinformation
### Advertising and Manipulation
### Free Speech and Moderation
### Hate Speech and Censorship
###
This chapter brings the book full circle. In another iteration of this chapter, we could have framed it around questions of "network health." How do we study the health of networks? What do healthy networks look like? We may not yet be able to provide a foundational ideal that defines how a healthy network functions or what its key features are, but we can certainly identify many of the current pollutants (the contaminants filling our networks with garbage) and begin defining the problem.
## Chapter 6: Mixed Methods
### Overview of Mixed Methods
This chapter provides an overview of methodologies that do not strictly collect or analyze data *from* a particular network, but instead create *new* or *simulated* data to study social media and digital content.
### Case Studies
### Surveys
### Qualitative Coding
### Modeling
## Chapter 7: Attention Ecology
XXX TRASH XXX
XXX Chapter 8: Research Ethics
XXX this chapter combines the previously separated chapters: "Best Practices" and "Research Ethics" XXX
This chapter explains the underlying rationale for MassMine: the use of Application Programming Interfaces (APIs) to collect data from digital networks. Building on a basic description of API scraping, as compared to web scraping and web crawling, it explains the benefits and limitations of API scraping for interdisciplinary research.
**Chapter Notes**

- Being a Good Citizen
- What is Data Mining?
- A Brief History of Internet Research
- Web crawling vs. web scraping vs. APIs vs. Semantic Web
- Big Data vs. Data Science
- Best Practices
    - Avoid Pitfalls
        - Accidental DDoS
        - Unintended Doxing
    - Respect robots.txt
    - Understand API limitations and developer guidelines for each network you research
        - Twitter "no benchmarking" example
    - Track the evolving ethics and legalities surrounding data mining (discussed at length in chapter 3)
    - Follow scientific best practices (the subject of this book, also philosophy of interdisciplinary science chapter 8)
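Respecting robots.txt and avoiding an accidental DDoS can both be built into a collection loop. This sketch uses Python's standard `urllib.robotparser` to skip disallowed paths and to pause between requests, honoring a site's Crawl-delay when one is given. The robots.txt content, user-agent string, and URLs are hypothetical, and the sketch only decides what to fetch; the request itself is left to the caller.

```python
import time
import urllib.robotparser
from typing import Callable, Iterable, Iterator

# Hypothetical robots.txt content. For a real site you would instead call
# parser.set_url("https://<site>/robots.txt") and then parser.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "ExampleResearchBot"  # hypothetical user-agent string


def build_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt rules from a string (no network access needed)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_urls(urls: Iterable[str],
                parser: urllib.robotparser.RobotFileParser,
                default_delay: float = 1.0,
                sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield only the URLs robots.txt allows, pausing between them.

    The pause uses the site's Crawl-delay when one is given, otherwise
    `default_delay`, so a collection loop cannot hammer a server.
    """
    delay = parser.crawl_delay(USER_AGENT) or default_delay
    first = True
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # disallowed path: skip it entirely
        if not first:
            sleep(delay)
        first = False
        yield url  # the caller performs the actual request here


if __name__ == "__main__":
    candidates = ["https://example.org/a", "https://example.org/private/x",
                  "https://example.org/b"]
    for url in polite_urls(candidates, build_parser(ROBOTS_TXT)):
        print("would fetch", url)
```

APIs usually enforce their own rate limits, documented in each platform's developer guidelines; robots.txt governs crawlers of ordinary web pages, so a careful project checks both.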
XXX
This chapter addresses the many ethical and ownership issues surrounding data mining as it relates to interdisciplinary research. Focusing on two primary areas, privacy rights and intellectual property, it examines key issues related to data collection, anonymization, and ownership.
XXX

- Privacy rights: why they matter
- Respond to this faulty logic:
    - "If you're not doing anything wrong, then you shouldn't have anything to hide."
- The limitations of "anonymous" data
- Intellectual property law and web scraping
- Open access
- The importance of the open data movement for interdisciplinary data mining
- Replicating studies/methods
- Archiving
- Data loss
- Legal issues involved with sharing data collected from APIs
- Database, storage, and archiving approaches
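The limitations of "anonymous" data show up even in a routine step like hashing usernames. This sketch, using Python's standard `hashlib`, `hmac`, and `secrets` modules, shows that an unsalted hash can be reversed by anyone with a list of candidate handles, and that a keyed hash (HMAC with a secret key) blocks that attack but remains pseudonymization rather than anonymization. The handles and helper names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, Optional


def naive_pseudonym(username: str) -> str:
    """Unsalted SHA-256 of a username: looks anonymous, but is not."""
    return hashlib.sha256(username.encode("utf-8")).hexdigest()


def reidentify(pseudonym: str, candidates: Iterable[str]) -> Optional[str]:
    """Dictionary attack: hash each publicly known username and compare."""
    for name in candidates:
        if naive_pseudonym(name) == pseudonym:
            return name
    return None


def keyed_pseudonym(username: str, key: bytes) -> str:
    """HMAC-SHA256 under a secret key that is never shared with the data.

    Without the key, outsiders cannot rerun the dictionary attack above.
    It is still pseudonymization, not anonymization: one user always maps to
    one token, so their linked posts, timestamps, or locations may still
    identify them.
    """
    return hmac.new(key, username.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    handles = ["bob_example", "alice_example"]  # hypothetical public handles
    print(reidentify(naive_pseudonym("alice_example"), handles))  # alice_example
    key = secrets.token_bytes(32)
    print(reidentify(keyed_pseudonym("alice_example", key), handles))  # None
```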