Big Data, Large Scale Machine Learning
Term: Spring 2013
Class Time: Tuesdays 5:00 to 6:50 pm
Classroom location: Warren Weaver Hall, Room 109 (note change of room)
Teaching Assistant: Xiang Zhang xiang.zhang AT nyu.edu
News
2013-05-09: Assignment 3 is released. It is an optional assignment.
2013-02-04: CHANGE OF CLASSROOM: lectures will now take place in the auditorium WWH 109
2013-01-30: videos of first lecture available
2013-01-28: first lecture today at 5:00 pm, Warren Weaver Hall, Room 101
2013-01-28: students will have access to a cluster of 100 nodes (8 cores each) running Linux and Hadoop.
Course Material
Prerequisites
This course is for people interested in automatically extracting knowledge from large amounts of data. Students should have some prior knowledge or experience with basic machine learning methods.
You must have taken a machine learning course at the undergraduate or graduate level prior to taking this course, or have industry experience with machine learning.
Required skills:
knowledge of basic methods in machine learning, such as linear classifiers, logistic regression, K-Means clustering, and principal component analysis.
some proficiency in C programming will be assumed, although many of the assignments will use dynamic/scripting programming languages.
knowledge of basic concepts in probability and statistics: probability distributions and probability density functions, conditional probabilities, marginalization, and Bayes' theorem.
basic knowledge of linear algebra and multivariate calculus: solving linear systems, eigenvalues/eigenvectors, least-squares minimization, gradients, Jacobians, and Hessians.
Syllabus
Introduction
Online methods for linear models
Online methods for nonlinear models
L-BFGS
Boosted Decision Trees and Stumps
MapReduce/AllReduce
Hadoop
Parallelization of learning algorithms: OpenMP, CUDA, OpenCL
Inverted Indices & Predictive Indexing
Feature Hashing
Locality-Sensitive Hashing & Linear Dimensionality Reduction
Nonlinear Dimensionality Reduction
Feature Learning
Handling Many Classes, Class Embedding
Active Learning
Exploration and Learning
Evaluation
Evaluation will be a combination of programming assignments and a final project.