
Big Data, Large Scale Machine Learning

Course Information


  • 2013-05-09: Assignment 3 is released. It is an optional assignment.
  • 2013-04-08: Assignment 2 is ready.
  • 2013-02-26: Assignment 1 is out!
  • 2013-02-12: Public enrollment is now available in the Piazza discussion group. Access codes and NYU email addresses are no longer required!
  • 2013-02-04: CHANGE OF CLASSROOM: lectures will now take place in the auditorium WWH 109
  • 2013-01-30: videos of first lecture available
  • 2013-01-28: first lecture today at 5:00 pm, Warren Weaver Hall, Room 101
    • topics: linear representation, on-line gradient descent and improvements thereof.
  • 2013-01-28: students will have access to a cluster of 100 nodes (8 cores each) running Linux and Hadoop.

Course Material


This course is for people interested in automatically extracting knowledge from large amounts of data. Students should have some prior knowledge or experience with basic machine learning methods.

You must have taken a machine learning course at the undergraduate or graduate level prior to taking this course, or have industry experience with machine learning.

Required skills:

  • knowledge of basic methods in machine learning such as linear classifiers, logistic regression, K-Means clustering, and principal components analysis.
  • although many of the assignments will use dynamic/scripting programming languages, some proficiency in C programming will be assumed.
  • knowledge of basic concepts in probability and statistics: probability distributions and probability density functions, conditional probabilities, marginalization, Bayes' theorem
  • basic knowledge of linear algebra and multivariate calculus: solving linear systems, eigenvalues/eigenvectors, least-squares minimization, gradient, Jacobian, and Hessian.


  • Introduction
  • Online methods for linear models
  • Online methods for nonlinear models
  • Boosted Decision Trees and stumps
  • MapReduce/AllReduce
  • Hadoop
  • Parallelization of learning algorithms: OpenMP, CUDA, OpenCL
  • Inverted Indices & Predictive Indexing
  • Feature Hashing
  • Locality-Sensitive Hashing & Linear Dimensionality Reduction
  • Nonlinear Dimensionality Reduction
  • Feature Learning
  • Handling Many Classes, class embedding
  • Active Learning
  • Exploration and Learning


Evaluation will be a combination of programming assignments and a final project.

Last modified: 2013/05/09 19:18 by xz558