Big Data Software
This page contains information about the software used during the course.
Lua is a simple scripting language with a straightforward syntax and a very compact, efficient interpreter/compiler. It is widely used as a scripting language in the computer game industry. Torch extends Lua with an extensive numerical library and various facilities for machine learning and computer vision.
Torch has computational back-ends for multicore/multi-CPU machines (using Intel/AVX and OpenMP), NVidia GPUs (using CUDA), and ARM CPUs (using the Neon instruction set).
Many research projects at the CILVR Lab are built with Torch.
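To give a flavor of the Torch/Lua syntax, here is a minimal sketch (assuming a working Torch installation, as described below) that builds a random matrix and multiplies it by a vector:
require 'torch'
-- create a 3x3 matrix of Gaussian random numbers
local A = torch.randn(3, 3)
-- create a length-3 vector of ones
local x = torch.ones(3)
-- matrix-vector product
local y = torch.mv(A, x)
print(y)
Save this to a file and run it with the Lua interpreter shipped with Torch.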
Quick Installation of Torch on Ubuntu and Mac OSX
Type the following command in your shell:
sudo curl -s https://raw.github.com/clementfarabet/torchinstall/master/install-all | bash
and type your sudo password when prompted.
This command downloads, compiles, and installs Torch (or updates an existing installation), along with all the prerequisite packages (including Homebrew on Mac OS) and the commonly-used optional Torch packages. Note that the script only works on Ubuntu and Mac OS.
If you don't have sudo privileges, you can run the above command without the leading “sudo”, but you will need to install the required packages manually beforehand.
For more details go to https://github.com/clementfarabet/torchinstall.
Manual installation on Linux and Mac OSX
Go to http://www.torch.ch/manual/install/index and follow the instructions.
Then go to http://www.torch.ch/manual/packages to see a list of optional packages, and install them with:
torch-pkg install <package-name>
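For example, assuming the image package appears in that list, it can be installed with:
torch-pkg install image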
Installation on Windows
Torch7 is not yet fully supported on Windows. The simplest option is to download this Ubuntu virtual machine, which includes a pre-compiled, pre-installed version of Torch. The virtual machine requires VirtualBox.
Torch/Lua On-line resources
Torch Tutorial for Machine Learning
Vowpal Wabbit / All Reduce
Vowpal Wabbit is a library for fast machine learning, by John Langford.
VW is the essence of speed in machine learning, able to learn from terafeature datasets with ease. Via parallel learning, it can exceed the throughput of any single machine's network interface when doing linear learning, a first amongst learning algorithms.
We primarily use the wiki on GitHub, which provides several useful starting points.
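As an illustrative command-line sketch (assuming VW is installed; train.vw and test.vw are hypothetical files in VW's input format), a linear model can be trained and then applied to held-out data:
vw -d train.vw -f model.vw
vw -d test.vw -t -i model.vw -p predictions.txt
Here -f saves the learned regressor, -t disables learning at test time, -i loads the saved model, and -p writes the predictions to a file.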
Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
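One of the simple programming models Hadoop supports is Hadoop Streaming, which runs the map and reduce steps as ordinary executables reading stdin and writing stdout. A sketch of launching such a job (the jar path, the HDFS input/output paths, and the mapper.py/reducer.py scripts are hypothetical and version-dependent):
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -input /user/hduser/input \
    -output /user/hduser/output \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py -file reducer.py
Each mapper emits key<TAB>value lines; Hadoop groups the values by key and feeds each group to a reducer.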