\chapter{A brief history of data science}
\label{chap:history}
\glsresetall
\chapterprecishere{``Begin at the beginning,'' the King said gravely, ``and
go on till you come to the end: then stop.''\par\raggedleft--- \textup{Lewis
Carroll}, Alice in Wonderland}
There are many points of view about the beginning of data science. For the sake of
contextualization, I separate the topic into two approaches: the history of the term itself
and a broad timeline of data-driven sciences, highlighting the important figures in each
age.
I believe that the history of the term is important for understanding the context of the
discipline. Over the years, the term has been used to label quite different fields of
study. Before presenting my view on the term, I present the views of two important
figures in the history of data science: Peter Naur and William Cleveland.
Moreover, studying the main facts and figures in the history of data-driven sciences
enables us to understand the evolution of the field and hopefully guides us in evolving it
further. Besides, history also teaches us how to avoid repeating the same mistakes.
Most of the important theories and methods in data science have been developed
simultaneously across different fields, such as statistics, computer science, and engineering.
The history of data-driven sciences is long and rich. I present a timeline of the ages of
data handling and the most important markers of data analysis.
I do not intend to provide a comprehensive history of data science. I aim to provide
enough context to support the development of the material in the following chapters,
sometimes avoiding directions that are not relevant in the context of inductive learning.
\begin{mainbox}{Chapter remarks}
\boxsubtitle{Contents}
\startcontents[chapters]
\printcontents[chapters]{}{1}{}
\vspace{1em}
\boxsubtitle{Context}
\begin{itemize}
\itemsep0em
\item The term ``data science'' is recent and has been used to label rather different
fields.
\item The history of data-driven sciences is long and rich.
\item Many important theories and methods in data science have been developed
simultaneously in different fields.
\item The history of data-driven sciences is important to understand the evolution of
the field.
\end{itemize}
\boxsubtitle{Objectives}
\begin{itemize}
\itemsep0em
\item Understand the history of the term ``data science.''
\item Understand the major milestones in the history of data-driven sciences.
\item Identify important figures in the history of data-driven sciences.
\end{itemize}
\boxsubtitle{Takeaways}
\begin{itemize}
\itemsep0em
\item We have evolved both in terms of theory and application of data-driven sciences.
\item There is no consensus on the definition of data science (including which fields
it encompasses).
\item However, there is enough evidence to support data science as a new science.
\end{itemize}
\end{mainbox}
{}
\clearpage
\section{The term ``data science''}
The term data science is recent and has been used to label rather different fields of
study. In the following, I emphasize the history of a few notable usages of the term.
\def\naurds{(0,0) circle (20mm)}
\def\naurcs{(0:5mm) circle (15mm)}
\def\naurde{(0:40mm) circle (15mm)}
\colorlet{circle edge}{black!50}
\colorlet{circle area}{black!20}
\tikzset{filled/.style={fill=circle area, draw=circle edge, thick},
outline/.style={draw=circle edge, thick}}
\paragraph{Peter Naur (1928 -- 2016)}
The term ``data science'' itself was coined in the 1960s by Peter Naur (/naʊə/). Naur was
a Danish computer scientist and mathematician who made significant contributions to the
field of computer science, including his work on the development of programming
languages\footnote{He is best remembered as a contributor, with John Backus, to the
\gls{bnf} notation used in describing the syntax for most programming
languages.}.
His ideas and concepts laid the groundwork for the way we think about programming and data
processing today.
Naur disliked the term computer science and suggested it be called datalogy or data
science. In the 1960s, the subject was practiced in Denmark under Peter
Naur's term datalogy, which means the science of data and data processes.
He coined this term to emphasize the importance of data as a fundamental component of
computer science and to encourage a broader perspective on the field that included
data-related aspects. At that time, the field was primarily centered on programming
techniques, but Naur's concept broadened the scope to recognize the intrinsic role of data
in computation.
In his book\footfullcite{Naur1974}, ``Concise Survey of Computer Methods'', he
starts from the concept that \emph{data} is ``a representation of facts or ideas in a
formalised manner capable of being communicated or manipulated by some
process.''\footnote{I. H. Gould (ed.): ‘IFIP guide to concepts and terms in data
processing’, North-Holland Publ. Co., Amsterdam, 1971.} Note however that his view of the
science only ``deals with data [\dots] while the relation of data to what they represent
is delegated to other fields and sciences.''
\begin{figurebox}[label=fig:naur]{Naur's view of data science.}
\centering
\begin{tikzpicture}
\begin{scope}
\clip \naurds;
\fill[filled] \naurcs;
\end{scope}
\draw[outline] \naurds node(ds) {};
\draw[outline] \naurcs node {Computer science};
\draw[outline] \naurde node {Domain expertise};
\node[anchor=north,above] at (0,2) {Data science};
\end{tikzpicture}
\tcblower
For Naur, data science studies the techniques to deal
with data, but he delegates the meaning of data to other fields.
\end{figurebox}
It is interesting to see the central role he gave to data in the field of computer
science. His view that the relation of data to what they represent is delegated to other
fields and sciences is still debatable today. Some scientists argue that data science
should focus on the techniques to deal with data, while others argue that data science
should encompass the whole business domain. A depiction of Naur's view of data science is
shown in \cref{fig:naur}.
\def\clevelandds{(0,0) circle (20mm)}
\def\clevelandst{(0:-5mm) circle (15mm)}
\def\clevelandde {(2,1) circle (15mm)}
\def\clevelandcs {(2,-1) circle (15mm)}
\paragraph{William Cleveland (born 1943)}
In 2001, a prominent statistician used the term ``data science'' in his work to describe a
new discipline that comes from his ``plan to enlarge the major areas of technical work of
the field of statistics\footfullcite{Cleveland2001}.''
In 2014, that work was republished\footnote{W. S. Cleveland.
Data Science: An Action Plan for the Field of Statistics. Statistical Analysis and Data
Mining, 7:414–417, 2014. reprinting of 2001 article in ISI Review, Vol 69.}.
He advocates the expansion of statistics beyond theory into technical areas; because this
would significantly change the field, it warranted a new name.
As a result, William Swain Cleveland II is credited with defining data science as the term
is most commonly used today. He is a highly influential figure in the fields of statistics, machine
learning, data visualization, data analysis for multidisciplinary studies, and high
performance computing for deep data analysis.
\begin{figurebox}[label=fig:cleveland]{Cleveland's view of data science.}
\centering
\begin{tikzpicture}
\begin{scope}
\clip \clevelandds;
\fill[filled] \clevelandst;
\fill[filled] \clevelandde;
\fill[filled] \clevelandcs;
\end{scope}
\draw[outline] \clevelandds node(ds) {};
\draw[outline] \clevelandst node {Statistics};
\draw[outline] \clevelandde node {Domain expertise};
\draw[outline] \clevelandcs node {Computer science};
\node[anchor=north,above] at (0,2) {Data science};
\end{tikzpicture}
\tcblower
For Cleveland, data science is the ``modern'' statistics,
where it is enlarged by computer science and domain expertise.
\end{figurebox}
In his view, data science is the ``modern'' statistics, where it is enlarged by computer
science methods and domain expertise. An illustration of Cleveland's view of data science
is shown in \cref{fig:cleveland}. It is important to note that Cleveland never defined an
explicit list of computer science fields and business domains that should be included in
the new discipline. The main idea is that statistics should rely on computational methods
and that domain expertise should be considered in the analysis.
\paragraph{Buzzword or a new science?}
Be aware that scientific literature has no consensus on the definition of data science, and it is still considered
by some to be a buzzword\footnote{Press, Gil. ``Data Science: What's The Half-Life of a
Buzzword?''. Forbes. Available at
\href{https://www.forbes.com/sites/gilpress/2013/08/19/data-science-whats-the-half-life-of-a-buzzword/}%
{forbes.com/sites/gilpress/2013/08/19/data-science-whats-the-half-life-of-a-buzzword}.}.
Most of the usages of the term in literature and in the media are either a rough
reference to a set of data-driven techniques or a marketing strategy. Naur
(\cref{fig:naur}) and Cleveland (\cref{fig:cleveland}) are among the few that try to
carefully define the term. However, neither of them sees data science as an
independent field of study, but rather as an enlarged scope of an existing science. I disagree;
the social and economic demand for data-driven solutions has led to an evolution in our
understanding of the challenges we are facing. As a result, we see many ``data
scientists'' being hired and many ``data science'' degree programs emerging.
In \cref{chap:data}, I dare to provide (yet another) definition for the term. I
argue that its object of study can be precisely established to support it as a new
science.
\section{Timeline and historical markers}
\textcite{Kelleher2018}\footfullcite{Kelleher2018} provides an interesting timeline of data-driven methods and
influential figures in the field. I reproduce it here with some changes, including
some omissions and additions. On the subject of data analysis, I include some of the
exceptional remarks
from \textcite{Vapnik1999b}\footfullcite{Vapnik1999b}.
I first address data handling --- in which I include data sources, collection, organization,
storage, and transformation --- and then data analysis and knowledge extraction.
\subsection{Timeline of data handling}
\label{sub:time-handling}
The importance of collecting and organizing data goes without saying. Data fuels analysis and
decision making. In the following, I present some of the most important milestones in the history
of data handling.
\begin{figurebox}[label=fig:data-handling-history]{Timeline of the ages of data handling.}
\centering
\begin{tikzpicture}
\draw (0,0) -- (8,0);
\foreach \x in {0,1,...,8} {
\draw (\x,-0.1) -- (\x,0.1);
}
\foreach \x/\y/\z in {%
0/Pre-digital Age/{3,800 BC -- \nth{18} c.},
2/Digital Age/{1890 -- 1960},
4/Formal Age/{1970s},
6/Integrated Age/{1980 -- 1990},
8/Ubiquitous Age/{2000 -- present}} {
\node[anchor=north] at (\x,-0.1) {\footnotesize\y};
\node[anchor=south] at (\x,0.1) {\footnotesize\z};
}
\end{tikzpicture}
\end{figurebox}
\Cref{fig:data-handling-history} illustrates the proposed timeline. Ages are not strict
boundaries, but rather periods where some important events happened. Also, observe that
the timescale is not linear. The Pre-digital Age is the longest period, and one could
divide it into smaller periods. My choices of ages and their boundaries are motivated by
didactic reasons.
\subsubsection{Pre-digital age}
We can consider the earliest records of data collection to be the notches on sticks and
bones used to keep track of the passing of time. The Lebombo bone, a baboon fibula with
notches, is probably the earliest known mathematical object. It was found in the Lebombo
Mountains located between South Africa and Eswatini and is estimated to be more
than 40,000 years old. It is conjectured to be a tally stick, but its exact purpose is
unknown. Its 29 notches suggest that it may have been used as a lunar phase counter.
However, since it is broken at one end, the 29 notches may or may not be the total
number\footfullcite{Beaumont2013}.
Another important milestone in the history of data collection is the record of
demographic data. One of the first known censuses was conducted in 3,800 BC in the Babylonian
Empire. It was ordered to assess the population and resources of
the empire. Records were stored on clay tiles\footfullcite{Grajalez2013}.
Since the advent of the early forms of writing, humanity's ability to record events and information
has increased significantly. The first known written records date back to around 3,500 BC, the
Sumerian archaic (pre-cuneiform) writing. This writing system was used to represent
commodities using clay tokens and to record transactions\footfullcite{Ifrah1998}.
``Data storage'' was also a challenge. Some important devices that increased our capacity
to register textual information are the Sumerian clay tablets (3,500 BC), the Egyptian
papyrus (3,000 BC), the Roman wax tablets (100 BC), the codex
(100 AD), the Chinese paper (200 AD), the printing press (1440), and the typewriter (1868).
% Talvez citar na parte de análise de dados
% Other mechanisms were also developed to store information in a more structured way. Some
% important devices are
% the abacus (2,700 BC), the Antikythera mechanism (150 -- 100 BC), the
% Chinese South Pointing Chariot (260 AD), the Pascaline (1642), the Jacquard loom (1801),
% the Babbage Difference Engine (1822), the Babbage Analytical Engine (1837).
Besides those improvements in unstructured data storage, at least in the Western and
Middle Eastern world, there were no significant advances in structured data collection
until the \nth{19} century. (Research on an Eastern timeline seems much harder to perform;
unfortunately, I have left it out of this book.)
I consider a major influential figure in the history of data
collection to be Florence Nightingale (1820 -- 1910). She was a passionate statistician
and probably the first person to use statistics to influence public and official
opinion. The meticulous records she kept during the Crimean War
(1853 -- 1856) provided the evidence that saved lives --- a great part of the mortality came from lack
of sanitation. She was also a pioneer in the use of
statistical graphics to present data in a way that was easy to understand. She is
credited with developing a form of the pie chart now known as the polar area
diagram. She also reformed healthcare in the United Kingdom and
is considered the founder of modern nursing; a great part of that work consisted of collecting
data in a standardized way so that conclusions could be drawn quickly\footfullcite{Grajalez2013}.
\subsubsection{Digital age}
In the modern period, several devices were developed to store digital\footnote{Digital
means the representation of information in (finite) discrete form. The term comes from the Latin
digitus, meaning finger, because it is the natural way to count using fingers. Digital
here does not mean electronic.}
information. One particular device that is important for data collection is the punched
card. It is a piece of stiff paper that contains digital information represented by the
presence or absence of holes in predefined positions. The information can be read by a
mechanical or electrical device called a card reader. The earliest famous usage of
punched cards was by Basile Bouchon (1725) to control looms. Most of the advances until
the 1880s were about the automation of machines (data processing) using hand-punched cards, and not
particularly specialized for data collection.
However, the 1890 census in the United States was the first to use machine-readable
punched cards to tabulate data. Processing 1880 census data took eight years, so the
Census Bureau contracted Herman Hollerith (1860 -- 1929) to design and build a tabulating
machine. He founded the Tabulating Machine Company in 1896, which later merged with other
companies to become \gls{ibm} in 1924. Later
models of the tabulating machine were widely used for business applications such as
accounting and inventory control. Punched card technology remained a prevalent method of
data processing for several decades until more advanced electronic computers were
developed in the mid-\nth{20} century.
The invention of the digital computer is responsible for a revolution in data handling.
The amount of information we can capture and store increased exponentially. ENIAC (1945) was
the first electronic general-purpose computer. It was Turing-complete, digital, and
capable of being reprogrammed to solve a full range of computing problems.
It had 20 words of internal memory, each capable of storing a 10-digit decimal integer number.
Programs and data were entered by setting switches and inserting punched cards.
For the 1950 census, the United States Census Bureau used the
UNIVAC I (Universal Automatic Computer I), the first commercially produced computer in the
United States\footnote{Read more in \url{https://www.census.gov/history/www/innovations/}.}.
It goes without saying that digital computers have become much more powerful and
sophisticated since then. The data collection process has been easily automated and
scaled to a level that was unimaginable before. However, ``where'' to store data is
not the only challenge; ``how'' to store it is also one. The next period of
history addresses this issue.
\subsubsection{Formal age}
In 1970, Edgar Frank Codd (1923 -- 2003), a British computer scientist,
published a paper entitled ``A Relational Model
of Data for Large Shared Data Banks''\footfullcite{Codd1970}. In this paper, he introduced
the concept of a relational model for database management.
A relational model organizes data in tables (relations) where each row represents a record
and each column represents an attribute of the record. The tables are related by common
fields. Codd showed means to organize the tables of a relational database to minimize
data redundancy and improve data integrity. \Cref{sec:normalization} provides more details
on the topic.
His work was a breakthrough in the field of data management. The standardization of
relational databases led to the development of \gls{sql} in 1974.
SQL is a domain-specific language used in programming and designed for managing data held
in a \gls{rdbms}.
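
As a minimal sketch of the relational idea, the snippet below builds a single table in an
in-memory SQLite database and queries it declaratively with SQL; the table, columns, and
values are made up for illustration.
\begin{verbatim}
import sqlite3

# A minimal illustration of the relational model: rows are records,
# columns are attributes, and queries are declarative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (region TEXT, year INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO census VALUES (?, ?, ?)",
    [("North", 1890, 1200000), ("South", 1890, 900000), ("North", 1900, 1500000)],
)

# SQL states *what* to retrieve, not *how* to retrieve it.
for region, population in conn.execute(
        "SELECT region, population FROM census WHERE year = ?", (1890,)):
    print(region, population)
\end{verbatim}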
As relational databases became widespread, a new challenge rapidly emerged: how to aggregate data from different
sources. Once data is stored in a relational database, it is easy to query and manage
it. However, data is usually stored in different databases, and it is not always possible
to directly combine them.
\subsubsection{Integrated age}
The solution to this problem was the development of the \gls{etl}
process. \gls{etl} is a process in data warehousing responsible for extracting data from
several sources, transforming it into a format that can be analyzed, and loading it into a
data warehouse.
The concept of data warehousing dates back to the late 1980s when IBM researchers Barry
Devlin and Paul Murphy developed the ``business data warehouse.''
Two major figures in the history of \gls{etl} are Ralph Kimball (born 1944) and Bill Inmon (born
1945), both American computer scientists. Although they
differ in their approaches, they both agree that data warehousing is the foundation for
\gls{bi} and analytics, and that data warehouses should be designed to
be easy to understand and fast to query for business users.
A famous debate between Kimball and Inmon is the top-down versus bottom-up approach to
data warehousing. Inmon's approach is top-down, where the data warehouse is designed
first and then the data marts\footnote{A data mart is a specialized subset of a data
warehouse that is designed to serve the needs of a specific business unit, department, or
functional area within an organization.} are created from the data warehouse. Kimball's
approach is bottom-up, where the data marts are created first and then the data warehouse
is created from the data marts.
One of the earliest and most famous case studies of the implementation of a data warehouse
is that of Walmart. In the early 1990s, Walmart faced the challenge of managing and
analyzing vast amounts of data from its stores across the United States. The company
needed a solution that would enable comprehensive reporting and analysis to support
decision-making processes. The solution was to implement a data warehouse that would
integrate data from various sources and provide a single source of truth for the
organization.
\subsubsection{Ubiquitous age}
The last and current period of history is the ubiquitous age. It is characterized by the
proliferation of data sources.
The ubiquity of data generation and the evolution of data-centric technologies have been
made possible by a multitude of figures across various domains.
\begin{itemize}
\itemsep0em
\item Vinton Gray Cerf (born 1943) and Robert Elliot Kahn (born 1938), often referred to
as the ``Fathers of the Internet,'' developed the TCP/IP protocols, which are
fundamental to internet communication.
\item Tim Berners-Lee (born 1955), credited with inventing the World Wide Web, laid the
foundation for the massive data flow on the internet.
\item Steven Paul Jobs (1955 -- 2011) and Stephen Wozniak (born 1950), from Apple Inc.,
and William Henry Gates III (born 1955), from Microsoft Corporation, were responsible
for the introduction of personal computers, leading to the democratization of data
generation.
\item Lawrence Edward Page (born 1973) and Sergey Mikhailovich Brin (born 1973), the
founders of Google, transformed how we access and search for information.
\item Mark Elliot Zuckerberg (born 1984), the co-founder of Facebook, played a crucial
role in the rise of social media and the generation of vast amounts of user-generated
content.
\end{itemize}
In terms of data handling, this change of landscape has brought about the
development of new technologies and techniques for data storage and processing,
especially NoSQL databases and distributed computing frameworks.
NoSQL databases are non-relational databases that can store and process large volumes of
unstructured, semi-structured, and structured data. They are highly scalable and
flexible, making them ideal for big data applications.
Some authors argue that the rise of big data is characterized by the five V's of big data:
Volume, Velocity, Variety, Veracity, and Value. The amount of data generated is massive,
the speed at which data is generated is high, the types of data generated are diverse, the
quality of data generated is questionable, and the value of data generated is high.
Once massive amounts of unstructured data became available, the need for new data
processing techniques arose. The development of distributed computing frameworks such as
Apache Hadoop and Apache Spark enabled the processing of massive amounts of data in a
distributed manner.
Douglass Read Cutting and Michael Cafarella, the developers of Apache Hadoop,
proposed the \gls{hdfs} and MapReduce, which are the
cornerstones of the Hadoop framework, in 2006. Hadoop's distributed storage and
processing capabilities enabled organizations to handle and analyze massive volumes of
data.
Currently, Google holds a patent for
MapReduce\footfullcite{Dean2008}.
However, their framework inherits from the architecture proposed in the thesis of
\textcite{Hillis1985}\footfullcite{Hillis1985}.
MapReduce is not particularly novel, but its simplicity and scalability made it popular.
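
As a toy illustration of the programming model (not of Hadoop itself), the sketch below
counts words with an explicit map phase, a shuffle that groups intermediate pairs by key,
and a reduce phase; the document contents are made up.
\begin{verbatim}
from collections import defaultdict

def map_phase(document):
    # Emit one (key, value) pair per word.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate all values associated with a single key.
    return key, sum(values)

documents = ["data science", "science of data", "data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'data': 3, 'science': 2, 'of': 1}
\end{verbatim}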
Nowadays, another important topic is \gls{iot}. IoT is a system of
interrelated computing devices that communicate with each other over the internet.
The devices can be anything from cellphones, coffee makers, washing machines, headphones,
lamps, and wearable devices to almost anything else you can think of. The reality of IoT has increased the
challenges of data handling, especially in terms of data storage and processing.
In summary, we currently live in a world where data is ubiquitous and comes in many
different forms. The challenge is to collect, store, and process this data in a way that
is meaningful and useful, also respecting privacy and security.
\subsection{Timeline of data analysis}
\label{sub:time-analysis}
The way we think about data and knowledge extraction has evolved significantly over the
years. In the following, I present some of the most important milestones in the history
of data analysis and knowledge extraction.
\begin{figurebox}[label=fig:data-analysis-history]{Timeline of the ages of data analysis.}
\centering
\begin{tikzpicture}
\draw (0,0) -- (8,0);
\foreach \x in {0,1,...,8} {
\draw (\x,-0.1) -- (\x,0.1);
}
\foreach \x/\y/\z in {%
0/Summary statistics/{3,800 BC -- \nth{16} c.},
4/Probability advent/{\nth{17} c. -- \nth{19} c.},
7/Learning from data/{\nth{20} c. -- present}} {
\node[anchor=north] at (\x,-0.1) {\footnotesize\y};
\node[anchor=south] at (\x,0.1) {\footnotesize\z};
}
\end{tikzpicture}
\end{figurebox}
\Cref{fig:data-analysis-history} illustrates the proposed timeline. I consider changes of
ages to be smooth transitions, not strict boundaries. The theoretical advances are
slower than the technological ones --- the latter influence data handling more than data
analysis --- so not much has changed since the beginning of the \nth{20} century.
\subsubsection{Summary statistics}
The earliest known records of systematic data analysis date back to the first censuses.
The term \emph{statistics} itself refers to the analysis of data \emph{about the state},
including demographics and economics. That early (and simplest) form of statistical
analysis is called \emph{summary statistics}, which consists of describing data in terms
of its central tendencies (e.g. the arithmetic mean) and variability (e.g. the range).
\subsubsection{Probability advent}
From the seventeenth century onwards, however, the foundations of modern probability theory were
laid out. Important figures in the development of probability theory are Blaise Pascal (1623
-- 1662), Pierre de Fermat (1607 -- 1665), Christiaan Huygens (1629 -- 1695), and Jacob
Bernoulli (1655 -- 1705).
These foundational methods brought to life the field of statistical inference. In the
following years, important results were achieved.
\paragraph{Bayes' rule}
Reverend Thomas Bayes (1701 -- 1761) was an English statistician, philosopher, and
presbyterian minister. He is known for formulating a specific case of the theorem that
bears his name: Bayes' theorem. The theorem is used to calculate conditional
probabilities using an algorithm (his Proposition 9, published in 1763) that uses evidence to calculate
limits on an unknown parameter.
Bayes' rule is the foundation of learning from evidence, since it allows us to
calculate the probability of an event based on prior knowledge of conditions that might be
related to the event. Classifiers based on Naïve Bayes --- the application of Bayes'
theorem with strong independence assumptions between the known variables --- are likely to have
been used since the second half of the eighteenth century.
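
In modern notation, for a hypothesis $h$ and observed evidence $e$, Bayes' rule reads
\[
  P(h \mid e) = \frac{P(e \mid h)\, P(h)}{P(e)},
\]
where $P(h)$ is the prior probability of the hypothesis, $P(e \mid h)$ is the likelihood
of the evidence under the hypothesis, and $P(h \mid e)$ is the updated (posterior)
probability.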
\paragraph{Gauss' method of least squares}
Johann Carl Friedrich Gauss (1777 -- 1855) was a German mathematician and physicist who made
significant contributions to many fields in mathematics and sciences. Circa 1794, he
developed the method of least squares for calculating the orbit of Ceres to minimize the
impact of measurement error\footnote{The method was first published by Adrien-Marie
Legendre (1752 -- 1833) in 1805, but Gauss claimed in 1809 that he
had been using it since circa 1794.}.
The method of least squares marked the beginning of the field of regression analysis. It
marked a shift toward finding solutions of systems of equations --- especially overdetermined
systems --- using data instead of theoretical models.
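
In modern notation, given observations $(x_i, y_i)$, $i = 1, \dots, n$, of an
overdetermined linear system, the method chooses the parameters that minimize the sum of
squared residuals,
\[
  \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \bigl( y_i - x_i^\top \beta \bigr)^2 ,
\]
so that no single (noisy) equation needs to be satisfied exactly.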
\paragraph{Playfair's data visualization}
Another change in the way we analyze data was the development of data visualization. Data
visualization is the graphical representation of information and data.
William Playfair (1759 -- 1823) was a secret agent on behalf of Great Britain during its
war with France in the 1790s. He invented several types of diagrams between the 1780s and
1800s, such as the line, area, and bar charts for economic data, and the pie chart and circle
graph to show proportions.
\subsubsection{Learning from data}
In the twentieth century and beyond, new advances were made in the field of statistics.
The development of learning machines enabled new techniques for data analysis.
The recent advances in computation and data storage are crucial for the large-scale
application of these techniques.
This era is characterized by a change of focus from trying to fit data to a theoretical
model to trying to extract knowledge from data. The main goal is to develop algorithms
that can learn from data with minimal human intervention.
\paragraph{Fisher's discriminant analysis}
In the 1930s, Sir Ronald Aylmer Fisher (1890 -- 1962), a British polymath, developed
discriminant analysis\footnote{\url{https://digital.library.adelaide.edu.au/dspace/bitstream/2440/15227/1/138.pdf}},
which was initially used to find linear functions that solve the problem of separating two or
more classes of objects\footnote{After Rosenblatt's work, however, it was also used for
inductive inference (classification). As a curiosity, Fisher's paper
introduced the famous Iris data set.}.
The method is based on the concept of the Fisher discriminant, which is a linear
combination of variables. The method can be used not only for classification but also for
dimensionality reduction.
By tackling the problem of assessing the importance of the variables for a particular task, Fisher's
work increased the understanding of the importance of feature selection in data analysis.
% See \cref{sub:fisher} for more details about the technique.
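
In its two-class form, the method seeks the projection direction $w$ that maximizes the
ratio of the between-class separation to the within-class scatter,
\[
  J(w) = \frac{\bigl( w^\top m_1 - w^\top m_2 \bigr)^2}{w^\top (S_1 + S_2)\, w},
\]
where $m_1$ and $m_2$ are the class means and $S_1$ and $S_2$ are the class scatter
matrices.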
\paragraph{Shannon's information theory}
The field that studies the quantification, storage, and communication of information was
originally established by the works of Harry Nyquist (1889 -- 1976) and Ralph Hartley
(1888 -- 1970) in the 1920s, and Claude Shannon (1916 -- 2001) in the 1940s.
Information theory brought many important concepts to the field of data analysis, such as
entropy, mutual information, and information gain. This theory is the foundation of
several machine learning algorithms.
Information theory sees data as a sequence of symbols that can be compressed and
transmitted. The theory is used to quantify the amount of information in a data set.
It also changed dominant paradigms in the field of statistics, such as the use of
likelihood functions and the Bayesian approach.
% See \cref{sub:shannon} for more details about the theory.
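
For instance, the entropy of a discrete random variable $X$ with probability mass
function $p$ quantifies the average information (in bits) conveyed by one observation of $X$:
\[
  H(X) = -\sum_{x} p(x) \log_2 p(x) .
\]
Intuitively, rare outcomes carry more information than common ones, and a deterministic
variable carries none.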
\paragraph{K-Nearest Neighbors}
In 1951, Evelyn Fix (1904 -- 1965) and Joseph Lawson Hodges Jr. (1922 -- 2000) wrote a
technical report entitled ``Discriminatory Analysis, Nonparametric Discrimination:
Consistency Properties.'' In this report, they proposed the k-nearest neighbors algorithm,
a non-parametric method used for classification and regression. The algorithm
marks a shift from the traditional parametric --- and purely statistical --- methods
to non-parametric ones.
It also shows how intuitive models can be used to solve complex problems. The k-nearest
neighbors algorithm is based on the idea that objects that are similar are likely to be in
the same class.
% See \cref{sub:knn} for more details about the technique.
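
The sketch below is a deliberately naive illustration of the idea for classification: the
predicted class is the majority class among the $k$ closest training points (here under
the Euclidean distance); the data and labels are made up.
\begin{verbatim}
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(points, labels, query, k=3):
    # Rank the training points by their distance to the query point.
    ranked = sorted(zip(points, labels),
                    key=lambda pair: euclidean(pair[0], query))
    # Majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
labels = ["a", "a", "b", "b"]
print(knn_predict(points, labels, query=(0.2, 0.1)))  # prints: a
\end{verbatim}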
\paragraph{Rosenblatt's perceptron}
In the 1960s, Frank Rosenblatt (1928 -- 1971) developed the perceptron, the first model of
a learning machine. While the idea of a mathematical neuron was not new, he was the first
to describe the model as a program, showing the ability of the perceptron to learn simple
tasks such as the logical operations AND and OR.
This work was the foundation of the field of artificial neural networks. The ``training''
of the perceptron was a breakthrough in the field of learning machines, drawing attention
to the field of artificial intelligence.
% In their famous book entitled Perceptrons: An Introduction to Computational Geometry,
% Minsky and Papert show that a perceptron can't solve the XOR problem. This contributed
% to the first AI winter, resulting in funding cuts for neural networks. However, now we
% know that a multilayer perceptron can solve the XOR problem easily.
A few years later, the 1969 book ``Perceptrons: An Introduction to Computational Geometry'' by
Marvin Minsky and Seymour Papert drew attention to the limitations of the
perceptron\footnote{Although Rosenblatt was aware of the limitations of the perceptron
and was probably working on solutions, he died in 1971.}. They showed that a single-layer
perceptron was limited to linearly separable problems, which led to a decline in the
interest in neural networks.
This fact contributed to the first AI winter, resulting in funding cuts for neural
network research.
See \cref{sub:perceptron} for more details about the technique.
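
To make the idea of ``training'' concrete, the sketch below applies the classical
perceptron update rule to learn the logical AND; it is an illustration of the rule, not
Rosenblatt's original formulation or hardware.
\begin{verbatim}
# Training data for the logical AND, with labels in {0, 1}.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w = [0, 0]  # weights
b = 0       # bias
lr = 1      # learning rate

for _ in range(10):  # a few passes over the data suffice here
    for (x1, x2), target in samples:
        prediction = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        error = target - prediction
        # Perceptron rule: move the decision boundary only on mistakes.
        w[0] += lr * error * x1
        w[1] += lr * error * x2
        b += lr * error

print([1 if w[0] * x1 + w[1] * x2 + b > 0 else 0 for (x1, x2), _ in samples])
# prints: [0, 0, 0, 1]
\end{verbatim}
A single-layer unit like this one, however, cannot learn the XOR function, which is
precisely the limitation highlighted by Minsky and Papert.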
\paragraph{Hunt inducing trees}
In 1966, \citeauthor{Hunt1966}'s book\footfullcite{Hunt1966} described a way to induce decision trees from
data. The algorithm is based on the concept of information entropy and is a precursor of
\citeauthor{Quinlan1986}'s ID3 algorithm\footfullcite{Quinlan1986} and its variations.
These algorithms gave rise to the field of decision trees, a popular family of methods for
classification and regression.
Trees are intuitive models that can be easily interpreted by humans. They are based on
symbolic rules that can be used to explain the decision-making process.
% See \cref{sub:tree} for more details about the technique.
\paragraph{Empirical risk minimization principle}
Although many learning machines were developed up to the 1960s, they did not significantly
advance the understanding of the general problem of learning from data. Between
the 1960s and 1986 --- before the backpropagation algorithm was proposed --- the field of practical
data analysis was basically stagnant. The main reason for that was the lack of a
theoretical framework to support the development of new learning machines.
However, these years were not completely unfruitful. As early as 1968, Vladimir Vapnik
(born 1936)
and Alexey Chervonenkis (1938 -- 2014) developed the fundamental concepts of VC entropy
and VC dimension for data classification problems. As a result, a novel inductive
principle was proposed: the \gls{erm} principle.
This principle is the foundation of statistical learning theory.
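
Informally, given a sample $(x_1, y_1), \dots, (x_n, y_n)$ and a loss function $L$, the
\gls{erm} principle selects, within a fixed class of functions, the one that minimizes the
empirical risk
\[
  R_{\mathrm{emp}}(f) = \frac{1}{n} \sum_{i=1}^{n} L\bigl( y_i, f(x_i) \bigr),
\]
using it as a computable proxy for the unknown expected risk over the data distribution;
the VC theory characterizes when this substitution is justified.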
% \paragraph{Algorithmic complexity}
%
% Solomonoff, Kolmogorov, and Chaitin proposed the first learning model based on
% algorithmic complexity.
\paragraph{Resurgence of neural networks}
In 1986, researchers independently developed a method to optimize the coefficients of a
multi-layer neural
network\footfullcite{LeCun1986,Rumelhart1986}. The method is called backpropagation and
is the foundation of the resurgence of neural networks. The technique enabled the
training of artificial neural networks that can solve problems that are not linearly separable.
This rebirth of neural networks happened in a scenario very different from the 1960s.
The availability of data and computational power fueled a new approach to the problem of
learning from data. The new approach preferred simple algorithms and
intuitive models over theoretical ones, boosting areas such as bioinspired computing and
evolutionary computation.
\paragraph{Ensembles}
Following the new approach, ensemble methods were developed. Based on ideas of
boosting\footfullcite{Schapire1990} and bagging\footfullcite{Breiman1996}, ensemble
methods combine multiple learning machines to improve the performance of the individual
machines.
The difference between boosting and bagging is the way the ensemble is built. In
boosting, the ensemble is built sequentially, where each new model tries to correct the
errors of the previous models. In bagging, the ensemble is built in parallel, where each
model is trained independently on small variations of the data. The most famous bagging
ensemble method is the random forest\footfullcite{Ho1995}, while XGBoost, a gradient
boosting method\footfullcite{Friedman2001}, has been extensively used in machine learning
competitions.
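
As a minimal sketch of the bagging idea (independent of any particular library), the code
below trains several copies of a deliberately weak base learner, each on a bootstrap
sample of the data, and combines them by majority vote; the base learner, data, and labels
are made up for the example.
\begin{verbatim}
import random
from collections import Counter

def fit_stump(X, y):
    # A deliberately weak base learner: threshold on the first feature.
    t = sum(row[0] for row in X) / len(X)
    above = Counter(yi for row, yi in zip(X, y) if row[0] > t)
    below = Counter(yi for row, yi in zip(X, y) if row[0] <= t)
    hi = above.most_common(1)[0][0] if above else y[0]
    lo = below.most_common(1)[0][0] if below else y[0]
    return lambda x: hi if x[0] > t else lo

def bagging_fit(fit_base, X, y, n_models=25, seed=0):
    # Bagging: each model sees a bootstrap sample (drawn with replacement).
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, x):
    # Majority vote over the individual models.
    return Counter(m(x) for m in models).most_common(1)[0][0]

X = [(0,), (1,), (2,), (3,), (6,), (7,), (8,), (9,)]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
models = bagging_fit(fit_stump, X, y)
print(bagging_predict(models, (1,)), bagging_predict(models, (8,)))  # expected: a b
\end{verbatim}
Boosting would instead train the models sequentially, reweighting or refitting on the
mistakes of the previous ones.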
\paragraph{Support vector machines}
In 1995, \citeauthor{Cortes1995}\footfullcite{Cortes1995} proposed the \gls{svm} algorithm, a
learning machine based on the VC theory and the \gls{erm} principle. Based on Cover's
theorem\footfullcite{Cover1965}, they developed a method that finds the optimal hyperplane
that separates two classes of data in a high-dimensional space with the maximum margin.
The resulting method led to practical and efficient learning machines.
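
In its hard-margin form, for data $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, the
optimal hyperplane $(w, b)$ solves
\[
  \min_{w, b} \; \frac{1}{2} \| w \|^2
  \quad \mbox{subject to} \quad
  y_i \bigl( w^\top x_i + b \bigr) \geq 1 \mbox{ for all } i,
\]
since maximizing the margin $2 / \| w \|$ is equivalent to minimizing $\| w \|^2$. Soft
margins and kernel functions extend this formulation to noisy and non-linearly separable
data.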
\paragraph{Deep learning revolution}
Although the idea of neural networks with multiple layers had been around since the 1960s,
only in the late 2000s did the field of deep learning catch the attention of the scientific
community by achieving state-of-the-art results in computer vision and natural language
processing. Yoshua Bengio, Geoffrey
Hinton, and Yann LeCun are recognized for their conceptual and engineering
breakthroughs in the field, winning the 2018 Turing Award\footnote{\url{https://awards.acm.org/about/2018-turing}}.
% \paragraph{Knowledge discovery in databases}
%
% 1990s, Fayyad, Piatesky-Shapiro, and Smyth.
% Developed the KDD process, which is the foundation of data mining.
% Data mining is the process of discovering patterns in large data sets involving methods at
% the intersection of machine learning, statistics, and database systems.
% \paragraph{Generative deep models}
%
% Nowadays, generative deep models are a hot topic in machine learning. They are a class of
% statistical models that can generate new data instances. They are used in unsupervised
% learning to discover hidden structures in unlabeled data (e.g. clustering), and in
% supervised learning to generate new synthetic data instances. The most famous generative
% models are the generative transformers and generative adversarial networks.
\paragraph{LUSI learning theory}
In the 2010s, \citeauthor{Vapnik2015}\footfullcite{Vapnik2015} proposed the \gls{lusi}
principle, which is an extension of the statistical learning theory. The \gls{lusi}
theory is based on the concept of statistical invariants, which are properties of
the data that are preserved under transformations. The theory is the foundation of the
learning from intelligent teachers paradigm. They regard the \gls{lusi} theory as the
next step in the evolution of learning theory, calling it the ``complete statistical
theory of learning.''
% vim: set spell spelllang=en: