---
layout: default
title: CS6240 Large-scale Parallel Data Processing
---
<div class="post">
<div>
<h5>How to Succeed in This Course</h5>
<p>
This is an advanced graduate course on an evolving topic. It is therefore essential that
you go through the online material carefully and methodically, attend the lectures, and
participate in online discussions. Homework is designed to help you understand the
material and prepare for the exam. The following often works well:
</p>
<ul>
<li>
When going through the online material, make notes about questions you have or
about material you find difficult to understand. Then share these questions through
the online forum or in class.
</li>
<li>
When you get a question in a check-your-knowledge quiz wrong, or are not sure
about the answer, go back to the corresponding online material and try to find the
answer.
</li>
<li>
Start working on homework assignments as soon as they come out. This way you
have time to ask questions and get help.
</li>
</ul>
</div>
<div>
<h5>Is This the Right Course for You?</h5>
<p class="subtitle">Programming Language</p>
<p>
This really is an algorithms course at heart. You will write plenty of code, but the main
emphasis is on learning how to approach big-data analysis problems. You will need solid
Java or Python programming skills to succeed, and we do not teach Java or Python
basics in this course. You do not need advanced Scala skills; you should be able to pick up
what you need on the fly with reasonable effort.
</p><p>
If you believe that programming in Java or Scala presents an insurmountable barrier
for you, contact the instructor during the first week of classes to find a solution. It is
possible to program in other languages, but we generally cannot promise any
support for them, so you may be on your own if you get stuck. In the past, students have
completed their homework successfully using Python for both MapReduce and
Spark. Python is well supported in Spark, and the programs often look similar to
those written in Scala.
</p>
<p class="subtitle">Challenges you may face</p>
<ul>
<li>We are learning about novel techniques that are only partially understood and explored by the research community.
Hence, in many cases there are no "certain truths". At times we might find better solutions that could be publishable in a research paper.</li>
<li>We are working with complex cutting-edge software from the open-source community.
This means that there will be bugs, missing documentation, and at times simply inexplicable behavior.
Hadoop and Spark also keep changing their APIs,
so some code you find in books or on the web might be outdated or use deprecated features.</li>
<li>When dealing with big data in a complex environment such as MapReduce/Spark and the cloud,
developing and debugging code differs from traditional settings.
Sometimes a task might appear easy but turn out to be much harder and more time-consuming (or the other way round).</li>
</ul>
<p>
You should only take this course if you are prepared to deal with such issues and are willing to put in extra time when necessary.
Do not take this course if you want a well-polished and well-tested course without any uncertainty.
If you are genuinely interested in the topic and are ready to work around the inevitable frustrations, then this will be a rewarding experience.
</p>
</div>
</div>