2019 Fall CS5785 Cornell Tech

Learn and apply key concepts of modeling, analysis and validation from Machine Learning, Data Mining and Signal Processing to analyze and extract meaning from data. Implement algorithms and perform experiments on images, text, audio and mobile sensor measurements. Gain working knowledge of supervised and unsupervised techniques including classification, regression, clustering, feature selection, association rule mining and dimensionality reduction.

**Prerequisites:** CS 2800 or equivalent, plus experience programming in Python or Matlab, or permission of the instructor.

Prof. Nathan Kallus

Office hours: Thursdays, 12:20PM - 1:20PM, in Bloomberg Center 363.

Xiaojie Mao

Office hours: Tuesdays, 1:00PM - 2:00PM, in Bloomberg Center 301; Wednesdays, 4:00PM - 5:00PM, in Bloomberg Center 497.

Yichun Hu

Office hours: Mondays, 10:00AM - 11:00AM, in Bloomberg Center 338; Wednesdays, 4:00PM - 5:00PM, in Bloomberg Center 497.

**Lectures:** Tuesdays and Thursdays, 11:00 AM - 12:15 PM, in Bloomberg Center 131.

**Links:** CMS for homework submission, Slack for discussions.

**Required:**

T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition), Springer-Verlag, 2008.

**Recommended:**

L. Wasserman, All of Statistics, Springer, 2004.

G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer, 2013.

Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin, Learning from Data, AMLBook, 2012.

P. Harrington, Machine Learning in Action, Manning, 2012.

H. Daumé III, A Course in Machine Learning, v0.8.

**Grade Breakdown:** Your grade will be determined by the assignments (40%), one prelim (20%), a final exam (30%), and participation, including in-class quizzes (10%).

**Homework:** There will be four assignments and an “assignment 0” for environment setup. Each assignment will have a due date for completion. Half of the points of the lowest-scoring assignment, as well as homework 0, will count as extra credit, meaning the points received for homework are calculated as (0.1*hw0 + hw1 + hw2 + hw3 + 1.5*hw4) / 4.

**Late Policy:** Each student has a total of **one** slip day that may be used without penalty. Beyond that, any late submission is subject to a penalty of 20% of the homework score for each late day.

**External Code:** Unless otherwise specified, you are allowed to use well-known libraries such as *scikit-learn, scikit-image, numpy, scipy,* etc. in the assignments. Any reference to or copy of public code repositories must be properly cited in your submission (examples include *GitHub, Wikipedia, blogs*). In some assignments you are NOT allowed to use any of the libraries above; please refer to the individual homework instructions for details.

**Collaboration:** You are encouraged (but not required) to work in groups of no more than 2 students on each assignment. Please indicate the name of your collaborator at the top of each assignment and cite any references you used (including articles, books, code, websites, and personal communications). If you’re not sure whether to cite a source, err on the side of caution and cite it. You may submit just one writeup for the group. Remember not to plagiarize: all solutions must be written by members of the group.

**Quizzes:** There will be surprise in-class quizzes to make sure you attend and pay attention in class.

**Prelim: October 22** in class. The exam is closed book, but you are allowed to bring one sheet of written notes (Letter size, two-sided). You are allowed to use a calculator.

**Final Exam: December 2 through December 9.** The final exam is take-home and open-internet, but must be done by your own group with thorough citations of all references used. The Kaggle competition for the final exam is available here. Our baseline solution is available here.
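As a rough illustration of the grading arithmetic described under Grade Breakdown, Homework and Late Policy, here is a minimal Python sketch. It is not an official grade calculator. The function names are hypothetical, scores are assumed to be fractions in [0, 1], and the homework formula is applied literally as written. The late penalty is read as deducting 20% of the earned score per penalized day, floored at zero, which is one possible interpretation of the policy.

```python
def homework_score(hw0, hw1, hw2, hw3, hw4):
    """Homework component, using the syllabus formula literally.

    Assumption: each argument is a fraction in [0, 1]. Because of the
    extra credit terms, the result can exceed 1.
    """
    return (0.1 * hw0 + hw1 + hw2 + hw3 + 1.5 * hw4) / 4


def apply_late_penalty(score, late_days, slip_days_used=0):
    """Apply the late policy to a single homework score.

    Assumption: each late day not covered by a slip day removes 20% of
    the earned score, and the result never drops below zero.
    """
    penalized_days = max(0, late_days - slip_days_used)
    return score * max(0.0, 1.0 - 0.2 * penalized_days)


def course_grade(homework, prelim, final, participation):
    """Weighted total: assignments 40%, prelim 20%, final 30%, participation 10%."""
    return 0.40 * homework + 0.20 * prelim + 0.30 * final + 0.10 * participation


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    hw3_late = apply_late_penalty(0.9, late_days=2, slip_days_used=1)
    hw = homework_score(hw0=1.0, hw1=0.8, hw2=0.85, hw3=hw3_late, hw4=0.95)
    print(course_grade(homework=hw, prelim=0.75, final=0.8, participation=1.0))
```

For the authoritative calculation, including how the lowest-scoring assignment is selected, defer to the course staff.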