Applied Machine Learning


2019 Fall CS5785 Cornell Tech

Fall 2019 is here! Here's what you need to know.

Course Description

Learn and apply key concepts of modeling, analysis and validation from Machine Learning, Data Mining and Signal Processing to analyze and extract meaning from data. Implement algorithms and perform experiments on images, text, audio and mobile sensor measurements. Gain working knowledge of supervised and unsupervised techniques including classification, regression, clustering, feature selection, association rule mining and dimensionality reduction.

Prerequisites

CS 2800 or equivalent plus experience programming with Python or Matlab, or permission of the instructor.

Instructor

Prof. Nathan Kallus
Office hours: Thursdays, 12:20PM - 1:20PM, in Bloomberg Center 363.

Teaching Assistants

Xiaojie Mao
Office hours: Tuesdays, 1:00PM - 2:00PM, in Bloomberg Center 301; Wednesdays, 4:00PM - 5:00PM, in Bloomberg Center 497.

Yichun Hu
Office hours: Mondays, 10:00AM - 11:00AM, in Bloomberg Center 338; Wednesdays, 4:00PM - 5:00PM, in Bloomberg Center 497.

Room & Time

Tuesdays and Thursdays, 11:00 AM - 12:15 PM, in Bloomberg Center 131.

Links: CMS for homework submission, Slack for discussions.

Textbooks (Available for free)

Required:
T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition), Springer-Verlag, 2008.
Recommended:
L. Wasserman, All of Statistics, Springer, 2004.
G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer, 2013.
Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin, Learning from Data, AMLBook, 2012.
P. Harrington, Machine Learning in Action, Manning, 2012.
H. Daumé III, A Course in Machine Learning, v0.8.

Course Requirements and Grading

  • Grade Breakdown: Your grade will be determined by the assignments (40%), one prelim (20%), a final exam (30%), and participation including in-class quizzes (10%).

  • Homework: There will be four assignments and an “assignment 0” for environment setup. Each assignment will have a due date for completion. Half of the points of the lowest-scoring assignment, as well as homework 0, will count as extra credit, meaning the homework score is calculated as (0.1*hw0 + hw1 + hw2 + hw3 + 1.5*hw4) / 4.

  • Late Policy: Each student has a total of one slip day that may be used without penalty. Beyond that, any late submission incurs a penalty of 20% of the homework score for each late day.

  • External Code: Unless otherwise specified, you are allowed to use well-known libraries such as scikit-learn, scikit-image, numpy, scipy, etc. in the assignments. Any code referenced or copied from public sources (for example GitHub, Wikipedia, or blogs) must be properly cited in your submission. Some assignments do NOT allow any of the libraries above; please refer to the individual HW instructions for details.

  • Collaboration: You are encouraged (but not required) to work in groups of no more than 2 students on each assignment. Please indicate the name of your collaborator at the top of each assignment and cite any references you used (including articles, books, code, websites, and personal communications). If you’re not sure whether to cite a source, err on the side of caution and cite it. You may submit just one writeup for the group. Remember not to plagiarize: all solutions must be written by members of the group.

  • Quizzes: There will be surprise in-class quizzes to make sure you attend class and pay attention.

  • Prelim: October 22 in class. The exam is closed book but you are allowed to bring one sheet of written notes (Letter size, two-sided). You are allowed to use a calculator.

  • Final Exam: December 2 through December 9. The final exam is take-home and open-internet, but it must be completed by your own group, with thorough citations of all references used. The Kaggle competition for the final exam is available here. Our baseline solution is available here.
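
The grading arithmetic in the list above can be sketched in a few lines of Python. This is only an illustration, not an official grade calculator. It assumes every score is on a 0-100 scale, that the late penalty is linear in the number of penalized days and cannot push a score below zero, and that the homework formula is applied exactly as written, with hw4 as the assignment weighted 1.5. The function names are hypothetical.

```python
def homework_score(hw0, hw1, hw2, hw3, hw4):
    """Homework score using the formula stated in the syllabus.

    Because of the extra credit terms, the result can exceed 100.
    """
    return (0.1 * hw0 + hw1 + hw2 + hw3 + 1.5 * hw4) / 4


def late_adjusted(score, days_late, slip_days_left=1):
    """Apply the late policy to one homework score.

    Assumption: each late day beyond the remaining slip days removes
    20% of the original score, and the result is floored at zero.
    """
    penalized_days = max(0, days_late - slip_days_left)
    return max(0.0, score * (1 - 0.20 * penalized_days))


def course_grade(homework, prelim, final, participation):
    """Weighted total: assignments 40%, prelim 20%, final 30%, participation 10%."""
    return 0.40 * homework + 0.20 * prelim + 0.30 * final + 0.10 * participation


if __name__ == "__main__":
    hw = homework_score(hw0=100, hw1=80, hw2=90, hw3=70, hw4=60)
    print("homework score:", hw)
    print("course grade:", course_grade(hw, prelim=70, final=80, participation=100))
    print("90 points, two days late, one slip day:", late_adjusted(90, days_late=2))
```

Note that a student scoring 100 on every assignment would receive a homework score above 100 under this formula, which is the effect of the extra credit.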