This repository will host all source code and scripts for the Data Algorithms book. The book provides a set of MapReduce algorithms, which are implemented using:
- Java/MapReduce Hadoop 2.5.0
- Java/Spark 1.0.2 (will upgrade to 1.1.0 in the next few days)
Please note that this is a work in progress...
- Title: Data Algorithms
- Author: Mahmoud Parsian
- Publisher: O'Reilly Media
- All source code, libraries, and build scripts are posted here
- Shell scripts for running Spark/Hadoop programs will be posted (soon!)
Software | Version |
---|---|
Java | JDK7 |
Hadoop | 2.5.0 |
Spark | 1.0.2 |
Ant | 1.9.4 |
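
One way to compare an installation against the versions listed above is to check that each tool is on the PATH and then ask it for its version. The following is a minimal sketch and not part of the repository; `have_tool` is a hypothetical helper name. Spark is left out because how it reports its version depends on how it was installed.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report where each required tool lives, or that it is missing.
for tool in java hadoop ant; do
    if have_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done

# Each tool prints its own version; compare the output with the table above:
#   java -version
#   hadoop version
#   ant -version
```
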

Name | Description |
---|---|
README.md | The file you are reading now |
README_lib.md | Notes on the lib directory (must read before building) |
src | Source files for MapReduce/Hadoop/Spark |
lib | Required jar files |
build.xml | The ant build script |
dist | The ant build's output directory |
LICENSE | License for using this repository |
misc | Miscellaneous files for this repository |
setenv | Example of how to set your environment variables before building |
Before you build, you should read README_lib.md.
Apache Ant 1.9.4 is used for building the project.
- To clean up: `ant clean`
- To build: `ant` (the build creates the `<install-dir>/dist/data_algorithms_book.jar` file)
- To check your build environment: `ant myenv`
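
After a build, it can be useful to confirm that the jar was actually created. The following is a minimal sketch under the assumption that the build output lands in `dist/data_algorithms_book.jar` under the install directory, as described above; `check_jar` is a hypothetical helper and not part of the repository.

```shell
# Hypothetical helper: report whether the build produced the expected jar.
# The first argument is the install directory (the repository root).
check_jar() {
    jar_path="$1/dist/data_algorithms_book.jar"
    if [ -f "$jar_path" ]; then
        echo "found: $jar_path"
    else
        echo "missing: $jar_path"
        return 1
    fi
}

# Typical use from the install directory (left commented out so that
# loading this sketch does not start a build):
#   ant clean && ant && check_jar "$PWD"
```
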
To run programs, you have to make sure that your CLASSPATH contains all of the following JAR files:
- `<install-dir>/dist/data_algorithms_book.jar`
- all jar files in the `<install-dir>/lib/` directory
Make sure that you use the full path for all jar files. This is how you can set up your CLASSPATH in a Linux bash environment:

    # Replace <install-dir> with the full path of your installation.
    BOOK_HOME=<install-dir>
    export CLASSPATH=.:$BOOK_HOME/dist/data_algorithms_book.jar
    # Append every jar file found under lib/ to the CLASSPATH.
    jars=$(find "$BOOK_HOME/lib" -name '*.jar')
    for j in $jars ; do
        export CLASSPATH=$CLASSPATH:$j
    done

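
A mistyped path in the CLASSPATH usually shows up only later, as a class-not-found error at run time. As a rough sketch (not part of the repository), the entries can be checked up front; `missing_classpath_entries` is a hypothetical helper that prints every entry that does not exist on disk.

```shell
# Hypothetical helper: print each classpath entry that does not exist.
# The first argument is a colon-separated classpath string.
missing_classpath_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
        # Skip empty entries and the current directory.
        if [ -z "$entry" ] || [ "$entry" = "." ]; then
            continue
        fi
        if [ ! -e "$entry" ]; then
            echo "$entry"
        fi
    done
}

# Check the current CLASSPATH; no output means every entry was found.
missing_classpath_entries "${CLASSPATH:-}"
```
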
Please send me an email: [email protected]
Thank you!