---
layout: default
title: CS6240 Large-scale Parallel Data Processing
---
<div class="post">
<div>
<h5>Class</h5>
<p>
Time: Tuesday and Thursday 1:30-3:10pm from May 7 to Aug 15, Full Summer 2024<br>
Location: Churchill Hall 103 (in-person only)<br>
</p>
</div>
<div>
<h5>People</h5>
<p>
Instructor: <br>
<a href="https://zixuanczx.github.io/">Zixuan Chen</a><br>
Office hours: Wednesday 1:00-3:00pm (Zoom link on Canvas)<br><br>
TA: <br>
<a href="https://harivilasp.github.io/">Hari Vilas Panjwani</a><br>
Office hours: Friday 9:00-11:00am (Zoom link on Canvas)
</p>
</div>
<div>
<h5>Topics</h5>
<ul>
<li>
An overview of the big-data processing landscape
<ul>
<li>We will discuss some trends and challenges and briefly survey alternative approaches.</li>
</ul>
</li>
<li>
Distributed algorithms for processing big data
<ul>
<li>We will cover a variety of fundamental problems and design patterns,
including join computation, graph algorithms, information retrieval and data mining techniques,
and analyze how they can be implemented in a scalable manner.</li>
<li>In addition to implementing these algorithms,
we will discuss how to evaluate the performance and scalability of parallel programs using parallelization measures.</li>
</ul>
</li>
<li>
Parallel data processing tools
<ul>
<li>We will work with and discuss features/limitations of Hadoop MapReduce and Spark.</li>
<li>We will cover HBase, Hive and Spark libraries including Spark SQL, Spark Streaming, MLlib and GraphX.</li>
<li>We will use the Amazon Cloud to run the code, but may work with a different provider if necessary.
(Our goal is to provide a real-world commercial-cloud experience at minimal cost, ideally zero, for each student.)</li>
</ul>
</li>
</ul>
</div>
<div>
<h5>Structure</h5>
<p>The course consists of lectures, weekly readings and self-check quizzes, four homework assignments, one group project and an exam.
</p>
<p class="subtitle">Lecture</p>
<p>
Lectures focus on the most difficult, interesting, and relevant material.
More interaction is expected during lectures, e.g., group activities and guided problem-solving.
</p>
<p class="subtitle">Participation</p>
<p>
You are expected to attend every lecture, but attendance is not checked.
Asking/answering questions and posting relevant information in the discussion boards is encouraged,
and a small participation bonus will be added to the overall grade of active students.
</p>
<p class="subtitle">Weekly Readings</p>
<p>
Weekly readings provide the background knowledge, terminology, and examples you need to understand and apply fundamental course concepts.
You must complete/view all assigned readings, presentations, and demonstrations included in the lessons.
All materials should be completed by the specified due dates.
</p>
<p class="subtitle">Self-checks</p>
<p>
When available, complete the self-checks on the online lecture material. They are designed to
enhance your understanding of, and ability to correctly apply, the concepts covered in the weekly readings and presentations.
Getting a few questions wrong does not result in any deduction from your final grade, unless it looks like you are guessing.
Note that you must complete the self-check for a module by midnight on the Sunday before the module is discussed.
</p>
<p class="subtitle">Homework/Project</p>
<p>
You will complete multiple homework assignments, which give you the opportunity to write code and practice the concepts you learn,
as well as a project that tackles a more complex problem.
More information about these assignments and the course project is available in Canvas.
</p>
<p class="subtitle">Exam</p>
<p>
You will complete an exam designed to test your understanding of the course concepts.
The exam is closed-book, i.e., you cannot bring any materials.
However, you need to bring a pen or pencil for writing and a device that can take photos and upload your solutions online.
Students must be present in the lecture room for the exam.
Exceptions are possible for students with disabilities who provide an official letter from the appropriate Northeastern office
at the beginning of the semester.
</p>
</div>
<div>
<h5>Acknowledgement</h5>
<p>
The course was designed and has been taught by
<a href="https://www.khoury.northeastern.edu/home/mirek/">Prof. Mirek Riedewald</a>.
We reuse most of the original reading materials and lecture slides.
Many thanks to previous instructors of the course, Prof. Mirek Riedewald and
<a href="https://ntzia.github.io/">Nikolaos Tziavelis</a>, for their help!
</p>
</div>
</div>