-
Notifications
You must be signed in to change notification settings - Fork 4
/
harvard-cs050.malloynb
203 lines (191 loc) · 6.8 KB
/
harvard-cs050.malloynb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
>>>markdown
# Harvard CS 050 SQL Problems
Harvard's beginnning computer science class teaches a section on SQL. The class uses the IMDB as a data set for the class. Here are the questions and answers in Malloy
Below is the semantic data model used to answer all the questions
>>>malloy
source: people is table('duckdb:data/names.parquet')
source: principals is table('duckdb:data/principals.parquet') {
join_one: people is people on nconst = people.nconst
}
source: movies is table('duckdb:data/titles.parquet') + {
join_many: principals on tconst = principals.tconst
measure:
title_count is count()
}
>>>markdown
## Question 1: Movie Titles after 2008
List the titles of all movies released in 2008. Your query should output a table with a single column for the title of each movie.
>>>malloy
run: movies -> {
where: startYear = 2008
project: primaryTitle
}
>>>markdown
## Question 2: When was Emma Stone Born?
Write a query to determine the birth year of Emma Stone. Your query should output a table with a single column and a single row (not including the header) containing Emma Stone’s birth year.
>>>malloy
run: people -> {
where: primaryName = 'Emma Stone'
project: birthYear
}
>>>markdown
## Question 3: Emma Stone Titles
You may assume that there is only one person in the database with the name Emma Stone. Write a query to list the titles of all movies with a release date on or after 2018, in alphabetical order. Your query should output a table with a single column for the title of each movie.
>>>malloy
run: movies -> {
where: principals.people.primaryName = 'Emma Stone'
and startYear >= 2018
group_by: primaryTitle
order_by: 1
}
>>>markdown
## Question 4: Movies with 10 rating
Movies released in 2018 should be included, as should movies with release dates in the future. Write a query to determine the number of movies with an IMDb rating of 10.0. Your query should output a table with a single column and a single row (not including the header) containing the number of movies with a 10.0 rating.
>>>malloy
query: movies -> {
where: averageRating = 10
aggregate: title_count
nest: by_title is {
group_by: primaryTitle
}
}
>>>markdown
## Question 5: Harry Potter Movies
Write a Query to list the titles and release years of
all Harry Potter movies, in chronological order.
Your query should output a table with two columns, one
for the title of each movie and one for the release year
of each movie.
You may assume that the title of all Harry Potter
movies will begin with the words “Harry Potter”,
and that if a movie title begins with the words
“Harry Potter”, it is a Harry Potter movie.
>>>malloy
query: movies -> {
where: primaryTitle ~ 'Harry Potter%'
project: primaryTitle, startYear
order_by: 2
}
>>>markdown
Question 6: Average Rating in 2012
Write a query to determine the average rating of all movies
released in 2012.
Your query should output a table with a single column and
a single row (not including the header) containing the
average rating.
>>>malloy
query: movies -> {
where: startYear = 2012
aggregate: average_rating is averageRating.avg()
}
>>>markdown
## Question 7: Toy Story Who?
Write a query to list the names of all people
who starred in Toy Story.
Your query should output a table with a single column for
the name of each person. You may assume that there is only
one movie in the database with the title Toy Story.
>>>malloy
query: movies -> {
where: primaryTitle = 'Toy Story'
group_by: principals.people.primaryName
order_by: 1
}
>>>markdown
## Question 8: Who's Who 2004
Write a query to list the names of all people who starred
in a movie released in 2004, ordered by birth year.
Your query should output a table with a single column
for the name of each person.
People with the same birth year may be listed in any order.
No need to worry about people who have no birth year listed,
so long as those who do have a birth year are listed
in order. If a person appeared in more than one movie in 2004,
they should only appear in your results once.
>>>malloy
query: movies -> {
where: startYear = 2004
group_by:
principals.people.primaryName
principals.people.birthYear
order_by: 1
}
>>>markdown
## Question 9: High Directors
Write SQL query to list the names of all people who have
directed a movie that received a rating of at least 9.0.
Your query should output a table with a single column for
the name of each person. If a person directed more than
one movie that received a rating of at least 9.0, they
should only appear in your results once.
>>>malloy
query: movies -> {
where: averageRating >= 9
and principals.category = 'director'
group_by:
principals.people.primaryName
}
>>>markdown
## Question 10: Chadwick Boseman?
Write a Query to list the titles of the five highest
rated movies (in order) that Chadwick Boseman starred
in, starting with the highest rated.
Your query should output a table with a single column
for the title of each movie.
You may assume that there is only one person in the
database with the name Chadwick Boseman.
>>>malloy
query: movies -> {
where: principals.people.primaryName = 'Chadwick Boseman'
group_by:
primaryTitle
averageRating
order_by: 2 desc
limit: 5
}
>>>markdown
## Question 11: Johnny and Helena?
Write a SQL query to list the titles of all
movies in which both Johnny Depp and Helena Bonham
Carter starred.
Your query should output a table with a single column
for the title of each movie.
You may assume that there is only one person in the
database with the name Johnny Depp.
You may assume that there is only one person in
the database with the name Helena Bonham Carter.
>>>malloy
query: johnnys_movies is movies ->{
where: principals.people.primaryName = 'Johnny Depp'
project: tconst
}
query: movies -> {
join_one: x is from(->johnnys_movies) with tconst
where: x.tconst != null
and principals.people.primaryName = 'Helena Bonham Carter'
group_by: primaryTitle
}
>>>markdown
## Question 12: One Degree of Kevin Bacon
Write a SQL query to list the names of all people who
starred in a movie in which Kevin Bacon also starred.
Your query should output a table with a single column
for the name of each person.
There may be multiple people named Kevin Bacon in
the database. Be sure to only select the Kevin Bacon
born in 1958.
Kevin Bacon himself should not be included in
the resulting list.
>>>malloy
query: kevins_movies is movies ->{
where: principals.people.primaryName = 'Kevin Bacon'
and principals.people.birthYear = 1958
project: tconst
}
query: movies -> {
join_one: x is from(->kevins_movies) with tconst
where: x.tconst != null
and principals.people.primaryName != 'Kevin Bacon'
group_by: principals.people.primaryName
aggregate: title_count
}