From 4ea1c36044f76bf794400a1d8735451a56097988 Mon Sep 17 00:00:00 2001
From: Aron Culotta
Date: Tue, 15 Dec 2015 10:50:22 -0600
Subject: [PATCH] set dates

---
 README.md | 71 +++++++++++++++++++++++++++---
 admin/Resources.md => Resources.md | 0
 admin/Schedule.md => Schedule.md | 52 +++++++++++-----------
 admin/README.md | 3 --
 admin/Syllabus.md | 65 ---------------------------
 5 files changed, 90 insertions(+), 101 deletions(-)
 rename admin/Resources.md => Resources.md (100%)
 rename admin/Schedule.md => Schedule.md (57%)
 delete mode 100644 admin/README.md
 delete mode 100644 admin/Syllabus.md

diff --git a/README.md b/README.md
index 67817ac..7018579 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,67 @@
-### CS 429: Information Retrieval
-**Spring 2015**
+**See the [Schedule](Schedule.md) for a detailed list of readings and due dates.**
 
-This repository contains class files for CS 429: Information Retrieval, taught at the [Illinois Institute of Technology](http://cs.iit.edu) by [Aron Culotta](http://cs.iit.edu/~culotta).
+### Overview
 
-The contents are organized as follows:
+- **Course:** CS 429: Information Retrieval
+- **Instructor:** [Dr. Aron Culotta](http://cs.iit.edu/~culotta)
+- **Meetings:** 3:15 - 4:30 pm T/R Room TBA
+- **E-mail:** culotta at cs.iit.edu
+- **Phone:** 312-567-5261
+- **Office Hours:** T/R 10:00 a.m. - 11:00 a.m.
+- **Office:** Stuart Hall 229B
+- **TA:** TBA
 
-- [`admin`](admin): the syllabus and related resources
-- [`assignments`](assignments): instructions and code for homework assignments
-- [`lectures`](lectures): class notes
+**Description:** Overview of fundamental issues of information retrieval with theoretical foundations. The information-retrieval techniques and theory, covering both effectiveness and run-time performance of information-retrieval systems are covered. The focus is on algorithms and heuristics used to find documents relevant to the user request and to find them fast. The course covers the architecture and components of the search engine such as parser, stemmer, index builder, and query processor. The students learn the material by building a prototype of such a search engine. Prerequisites: CS 331 or CS 401; requires strong programming knowledge. 3-0-3 (C) (T)
+
+**Textbook:** [*Introduction to Information Retrieval*](http://nlp.stanford.edu/IR-book/), Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press. 2008.
+
+You can use the [electronic version](http://nlp.stanford.edu/IR-book/) of this book.
+
+### Grading
+
+- 250 points - [Assignments](../assignments) (5 @ 50 points each)
+- 100 points - Midterm
+- 100 points - Final
+- 32 points - 1 quiz 50 points - Quizzes / In-class assignments (5 @ 10 points each)
+- **682 total points** 700 total points
+
+| **Percent** | **Grade** |
+|-------------|-----------|
+| 100-90 | A |
+| 89-80 | B |
+| 79-70 | C |
+| 69-60 | D |
+| < 60 | E |
+
+**Academic Integrity**
+
+- Please read IIT's [Academic Honesty Policy](http://www.iit.edu/student_affairs/handbook/information_and_regulations/code_of_academic_honesty.shtml)
+- All work you turn in must be done by you alone.
+- All violations will be reported to `academichonesty@iit.edu`.
+- The first violation will result in a failing grade for that assignment/test. The second will result in a failing grade for the course.
+
+
+**Late Submission Policy**
+
+- Late assignments will **not** be accepted, unless:
+  - There is an unavoidable medical, family, or other emergency, **and**
+  - You notify me **prior** to the due date
+
+### Course Outcomes
+
+1. Explain the information retrieval storage methods (Inverted Index and Signature Files)
+2. Explain retrieval models, such as Boolean model, Vector Space model, Probabilistic model, Inference Networks, and Neural Networks.
+3. Explain retrieval utilities such as Stemming, Relevance Feedback, N-gram, Clustering, and Thesauri, and Parsing and Token recognition.
+4. Design and implement a search engine prototype using the storage methods, retrieval models and utilities.
+5. Apply the research ideas into their experiments in building a search engine prototype
+
+
+### Program Outcomes
+
+- a. An ability to apply knowledge of computing and mathematics appropriate to the discipline.
+- c. An ability to design, implement and evaluate a computer-based system, process, component, or program to meet desired needs.
+- d. An ability to function effectively on teams to accomplish a common goal.
+- f. An ability to communicate effectively with a range of audiences.
+- i. An ability to use current techniques, skills, and tools necessary for computing practices.
+- j. An ability to apply mathematical foundations, algorithmic principles, and computer science theory in the modeling and design of computer-based systems in a way that demonstrates comprehension of the tradeoffs involved in design choices.
+- k. An ability to apply design and development principles in the construction of software systems of varying complexity.
diff --git a/admin/Resources.md b/Resources.md
similarity index 100%
rename from admin/Resources.md
rename to Resources.md
diff --git a/admin/Schedule.md b/Schedule.md
similarity index 57%
rename from admin/Schedule.md
rename to Schedule.md
index 6fe1c58..7796dd3 100644
--- a/admin/Schedule.md
+++ b/Schedule.md
@@ -1,41 +1,41 @@
 | Date | Topic | Readings | Due | Lecture |
 | ----- |----------------------------------|--------------------------------------------------------|-----|----
 ||**Part I: Indexing**|
-| 1/13 | Boolean Search | [Ch1](http://nlp.stanford.edu/IR-book/pdf/01bool.pdf) | |[L01](../lectures/lec01)
-| 1/15 | Indexing I: stemming/stopping | [Ch2](http://nlp.stanford.edu/IR-book/pdf/02voc.pdf) | |[L02](../lectures/lec02)
-| 1/20 | Indexing II: phrases, skip lists, position | [Ch2](http://nlp.stanford.edu/IR-book/pdf/02voc.pdf) | | [L03](../lectures/lec03)
-| 1/22 | Dictionaries | [Ch3](http://nlp.stanford.edu/IR-book/pdf/03dict.pdf) | [A0](../assignments/assignment0) | [L04](../lectures/lec04)
-| 1/27 | Scalable indexing | [Ch4](http://nlp.stanford.edu/IR-book/pdf/04const.pdf) | | [L05](../lectures/lec05)
-| 1/29 | Index compression | [Ch5](http://nlp.stanford.edu/IR-book/pdf/05comp.pdf) | | [L06](../lectures/lec06)
+| 1/12 | Boolean Search | [Ch1](http://nlp.stanford.edu/IR-book/pdf/01bool.pdf) | |[L01](../lectures/lec01)
+| 1/14 | Indexing I: stemming/stopping | [Ch2](http://nlp.stanford.edu/IR-book/pdf/02voc.pdf) | |[L02](../lectures/lec02)
+| 1/19 | Indexing II: phrases, skip lists, position | [Ch2](http://nlp.stanford.edu/IR-book/pdf/02voc.pdf) | | [L03](../lectures/lec03)
+| 1/21 | Dictionaries | [Ch3](http://nlp.stanford.edu/IR-book/pdf/03dict.pdf) | [A0](../assignments/assignment0) | [L04](../lectures/lec04)
+| 1/26 | Scalable indexing | [Ch4](http://nlp.stanford.edu/IR-book/pdf/04const.pdf) | | [L05](../lectures/lec05)
+| 1/28 | Index compression | [Ch5](http://nlp.stanford.edu/IR-book/pdf/05comp.pdf) | | [L06](../lectures/lec06)
 || **Part II: Ranking** |
-| 2/03 | Vector space model | [Ch6](http://nlp.stanford.edu/IR-book/pdf/06vect.pdf) | | [L07](../lectures/lec07)
-| 2/05 | Scoring for search |[Ch7](http://nlp.stanford.edu/IR-book/pdf/07system.pdf)| [A1](../assignments/assignment1) (now due 2/6) | [L08](../lectures/lec08) | [A1](../assignments/assignment1)
-| 2/10 | Evaluation | [Ch8](http://nlp.stanford.edu/IR-book/pdf/08eval.pdf) | | [L09](../lectures/lec09)
-| 2/12 | Query Expansion | [Ch9](http://nlp.stanford.edu/IR-book/pdf/09expand.pdf)| | [L10](../lectures/lec10)
-| 2/17 | Probabilistic IR | [Ch11](http://nlp.stanford.edu/IR-book/pdf/11prob.pdf) | | [L11](../lectures/lec11)
-| 2/19 | Probabilistic IR | [Ch11](http://nlp.stanford.edu/IR-book/pdf/11prob.pdf) | [A2](../assignments/assignment2) | [L12](../lectures/lec12)
-| 2/24 | Language Models | [Ch12](http://nlp.stanford.edu/IR-book/pdf/12lmodel.pdf) | | [L13](../lectures/lec13)
-| 2/26 | Language Models | [Ch12](http://nlp.stanford.edu/IR-book/pdf/12lmodel.pdf) | | [L14](../lectures/lec14)
+| 2/02 | Vector space model | [Ch6](http://nlp.stanford.edu/IR-book/pdf/06vect.pdf) | | [L07](../lectures/lec07)
+| 2/04 | Scoring for search |[Ch7](http://nlp.stanford.edu/IR-book/pdf/07system.pdf)| [A1](../assignments/assignment1) (now due 2/6) | [L08](../lectures/lec08) | [A1](../assignments/assignment1)
+| 2/09 | Evaluation **(Aron travels)** | [Ch8](http://nlp.stanford.edu/IR-book/pdf/08eval.pdf) | | [L09](../lectures/lec09)
+| 2/11 | Query Expansion **(Aron travels)** | [Ch9](http://nlp.stanford.edu/IR-book/pdf/09expand.pdf)| | [L10](../lectures/lec10)
+| 2/16 | Probabilistic IR **(Aron travels)** | [Ch11](http://nlp.stanford.edu/IR-book/pdf/11prob.pdf) | | [L11](../lectures/lec11)
+| 2/18 | Probabilistic IR | [Ch11](http://nlp.stanford.edu/IR-book/pdf/11prob.pdf) | [A2](../assignments/assignment2) | [L12](../lectures/lec12)
+| 2/23 | Language Models | [Ch12](http://nlp.stanford.edu/IR-book/pdf/12lmodel.pdf) | | [L13](../lectures/lec13)
+| 2/25 | Language Models | [Ch12](http://nlp.stanford.edu/IR-book/pdf/12lmodel.pdf) | | [L14](../lectures/lec14)
 || **Part III: Classification**|
-| 3/03 | Naive Bayes | [Ch13](http://nlp.stanford.edu/IR-book/pdf/13bayes.pdf)| | [L15](../lectures/lec15)
-| 3/05 | Logistic Regression | [Ch14](http://nlp.stanford.edu/IR-book/pdf/14vcat.pdf) | [A3](../assignments/assignment3)
-| 3/10 | **Midterm** | |
-| 3/12 | KNN | [Ch14](http://nlp.stanford.edu/IR-book/pdf/14vcat.pdf) | | [L16](../lectures/lec16/bayes.pdf)
-| 3/05 | Logistic Regression | [Ch14](http://nlp.stanford.edu/IR-book/pdf/14vcat.pdf) | | [L17](../lectures/lec17)
+| 3/01 | Naive Bayes | [Ch13](http://nlp.stanford.edu/IR-book/pdf/13bayes.pdf)| | [L15](../lectures/lec15)
+| 3/03 | Logistic Regression | [Ch14](http://nlp.stanford.edu/IR-book/pdf/14vcat.pdf) | [A3](../assignments/assignment3)
+| 3/08 | **Midterm** | |
+| 3/10 | KNN | [Ch14](http://nlp.stanford.edu/IR-book/pdf/14vcat.pdf) | | [L16](../lectures/lec16/bayes.pdf)
+| 3/15 | **Spring Break** | |
 | 3/17 | **Spring Break** | |
-| 3/19 | **Spring Break** | |
+| 3/22 | Logistic Regression **(Aron travels)** | [Ch14](http://nlp.stanford.edu/IR-book/pdf/14vcat.pdf) | | [L17](../lectures/lec17)
 | 3/24 | Logistic Regression | [Ch15](http://nlp.stanford.edu/IR-book/pdf/15svm.pdf) |
-| 3/26 | Bias/Variance | Handouts |
+| 3/29 | Bias/Variance | Handouts |
 ||**Part IV: Clustering**|
 | 3/31 | Learning to Rank | [Ch16](http://nlp.stanford.edu/IR-book/pdf/16flat.pdf) |
-| 4/02 | K-Means | [Ch16](http://nlp.stanford.edu/IR-book/pdf/16flat.pdf) | [A4](../assignments/assignment4) |
+| 4/05 | K-Means | [Ch16](http://nlp.stanford.edu/IR-book/pdf/16flat.pdf) | [A4](../assignments/assignment4) |
 | 4/07 | EM | [Ch18](http://nlp.stanford.edu/IR-book/pdf/18lsi.pdf) | | [L22](../lectures/lec22)
-| 4/09 | Word Clustering | Handouts | | [L23](../lectures/lec23)
+| 4/12 | Word Clustering | Handouts | | [L23](../lectures/lec23)
 ||**Part V: Web Search**|
 | 4/14 | Web search overview | [Ch19](http://nlp.stanford.edu/IR-book/pdf/19web.pdf) |
-| 4/16 | PageRank | [Ch21](http://nlp.stanford.edu/IR-book/pdf/21link.pdf) | [A5](../assignments/assignment5)
+| 4/19 | PageRank | [Ch21](http://nlp.stanford.edu/IR-book/pdf/21link.pdf) | [A5](../assignments/assignment5)
 | 4/21 | PageRank | [Ch21](http://nlp.stanford.edu/IR-book/pdf/21link.pdf) | | [L26](../lectures/lec26)
-| 4/23 | Web Crawling | [Ch20](http://nlp.stanford.edu/IR-book/pdf/20crawl.pdf)|
+| 4/26 | Web Crawling | [Ch20](http://nlp.stanford.edu/IR-book/pdf/20crawl.pdf)|
 | 4/28 | Review | | [A6](../assignments/assignment6)
-| 4/30 | **Final Exam** | |
+| TBA | **Final Exam** | |
diff --git a/admin/README.md b/admin/README.md
deleted file mode 100644
index c93d6f9..0000000
--- a/admin/README.md
+++ /dev/null
@@ -1,3 +0,0 @@
-- [Syllabus](Syllabus.md)
-- [Schedule](Schedule.md)
-- [Resources](Resources.md)
diff --git a/admin/Syllabus.md b/admin/Syllabus.md
deleted file mode 100644
index e5ffa60..0000000
--- a/admin/Syllabus.md
+++ /dev/null
@@ -1,65 +0,0 @@
-### Overview
-
-- **Course:** CS 429: Information Retrieval
-- **Instructor:** [Dr. Aron Culotta](http://cs.iit.edu/~culotta)
-- **Meetings:** 3:15 - 4:30 pm T/R Stuart 238
-- **E-mail:** culotta at cs.iit.edu
-- **Phone:** 312-567-5261
-- **Office Hours:** T/R 10:00 a.m. - 11:00 a.m.
-- **Office:** Stuart Hall 229B
-- **TA:** Junzhe Zheng (jzheng9 at hawk.iit.edu). Office hours: Wed 1-2pm (room TBD)
-
-**Description:** Overview of fundamental issues of information retrieval with theoretical foundations. The information-retrieval techniques and theory, covering both effectiveness and run-time performance of information-retrieval systems are covered. The focus is on algorithms and heuristics used to find documents relevant to the user request and to find them fast. The course covers the architecture and components of the search engine such as parser, stemmer, index builder, and query processor. The students learn the material by building a prototype of such a search engine. Prerequisites: CS 331 or CS 401; requires strong programming knowledge. 3-0-3 (C) (T)
-
-**Textbook:** [*Introduction to Information Retrieval*](http://nlp.stanford.edu/IR-book/), Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press. 2008.
-
-You can use the [electronic version](http://nlp.stanford.edu/IR-book/) of this book.
-
-### Grading
-
-- 350 points - [Assignments](../assignments) (7 @ 50 points each)
-- 100 points - Midterm
-- 200 points - Final
-- 32 points - 1 quiz 50 points - Quizzes / In-class assignments (5 @ 10 points each)
-- **682 total points** 700 total points
-
-| **Percent** | **Grade** |
-|-------------|-----------|
-| 100-90 | A |
-| 89-80 | B |
-| 79-70 | C |
-| 69-60 | D |
-| < 60 | E |
-
-**Academic Integrity**
-
-- Please read IIT's [Academic Honesty Policy](http://www.iit.edu/student_affairs/handbook/information_and_regulations/code_of_academic_honesty.shtml)
-- All work you turn in must be done by you alone, except for the group project.
-- All violations will be reported to `academichonesty@iit.edu`.
-- The first violation will result in a failing grade for that assignment/test. The second will result in a failing grade for the course.
-
-
-**Late Submission Policy**
-
-- Late assignments will **not** be accepted, unless:
-  - There is an unavoidable medical, family, or other emergency.
-  - You notify me **prior** to the due date
-
-### Course Outcomes
-
-1. Explain the information retrieval storage methods (Inverted Index and Signature Files)
-2. Explain retrieval models, such as Boolean model, Vector Space model, Probabilistic model, Inference Networks, and Neural Networks.
-3. Explain retrieval utilities such as Stemming, Relevance Feedback, N-gram, Clustering, and Thesauri, and Parsing and Token recognition.
-4. Design and implement a search engine prototype using the storage methods, retrieval models and utilities.
-5. Apply the research ideas into their experiments in building a search engine prototype
-
-
-### Program Outcomes
-
-- a. An ability to apply knowledge of computing and mathematics appropriate to the discipline.
-- c. An ability to design, implement and evaluate a computer-based system, process, component, or program to meet desired needs.
-- d. An ability to function effectively on teams to accomplish a common goal.
-- f. An ability to communicate effectively with a range of audiences.
-- i. An ability to use current techniques, skills, and tools necessary for computing practices.
-- j. An ability to apply mathematical foundations, algorithmic principles, and computer science theory in the modeling and design of computer-based systems in a way that demonstrates comprehension of the tradeoffs involved in design choices.
-- k. An ability to apply design and development principles in the construction of software systems of varying complexity.