forky-mcforkface/skale

Development has stopped, and this project is now archived.

High performance distributed data processing and machine learning.

Skale provides a high-level API in JavaScript and an optimized parallel execution engine on top of Node.js.

Features

  • Pure JavaScript implementation of a Spark-like engine
  • Multiple data sources: filesystems, databases, cloud (S3, Azure)
  • Multiple data formats: CSV, JSON, columnar (Parquet)...
  • 50 high-level operators to build parallel apps
  • Machine learning: scalable classification, regression, clustering
  • Runs interactively in a Node.js REPL
  • Docker ready, with a simple local mode or a full distributed mode
  • Very fast, see benchmark

Quickstart

npm install skale

Word count example:

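var sc = require('skale').context();

sc.textFile('/my/path/*.txt')
  .flatMap(line => line.split(' '))   // split each line into words
  .map(word => [word, 1])             // emit a [word, 1] pair per word
  .reduceByKey((a, b) => a + b, 0)    // sum the counts for each word
  .count(function (err, result) {     // count the resulting [word, total] pairs
    if (err) throw err;
    console.log(result);              // number of distinct words
    sc.end();
  });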
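
To make the data flow of the pipeline above concrete, here is a minimal sketch of the same flatMap / map / reduceByKey steps on an in-memory array, in plain JavaScript. It does not use skale at all; the function name `wordCounts` and the sample input are hypothetical and only illustrate what each stage produces.

```javascript
// Plain JavaScript illustration of the word count steps; not skale code.
// wordCounts is a hypothetical helper name chosen for this sketch.
function wordCounts(lines) {
  const pairs = lines
    .flatMap(line => line.split(' '))  // one array entry per word
    .map(word => [word, 1]);           // a [word, 1] pair per word

  // Equivalent of reduceByKey((a, b) => a + b, 0): start each key at 0
  // and add every value seen for that key.
  const counts = new Map();
  for (const [word, n] of pairs) {
    counts.set(word, (counts.has(word) ? counts.get(word) : 0) + n);
  }
  return counts;
}

const sampleCounts = wordCounts(['a b a', 'b c']);
console.log([...sampleCounts.entries()]); // per-word totals
console.log(sampleCounts.size);           // number of distinct words
```

In this sketch the final `size` plays the role of the `count` action in the example above: it reports how many distinct keys remain after the reduction, not the total number of words.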

Local mode

In local mode, worker processes are forked automatically and communicate with the app through a child process IPC channel. This is the simplest way to operate, and it lets the app use all available cores on the machine.

To run in local mode, just execute your app script:

node my_app.js

or with debug traces:

SKALE_DEBUG=2 node my_app.js

Distributed mode

In distributed mode, a cluster server process and worker processes must be started before the app is started. Processes communicate with each other via raw TCP or via websockets.

To run in distributed cluster mode, first start a cluster server on server_host:

./bin/server.js

On each worker host, start a worker controller process that connects to the server:

./bin/worker.js -H server_host

Then run your app, setting the cluster server host in the environment:

SKALE_HOST=server_host node my_app.js

The same with debug traces:

SKALE_HOST=server_host SKALE_DEBUG=2 node my_app.js
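
As a rough illustration of how the two environment variables used above select a run configuration, the following sketch summarizes them from an app's point of view. It is not part of skale: `describeRun` is a hypothetical helper, and it assumes that an unset SKALE_HOST means local mode and that SKALE_DEBUG is a numeric level, as the commands above suggest.

```javascript
// Hypothetical helper, not a skale API: summarizes the run configuration
// implied by the SKALE_HOST and SKALE_DEBUG environment variables.
function describeRun(env) {
  const host = env.SKALE_HOST;
  const debug = Number(env.SKALE_DEBUG || 0);
  return {
    // Assumption: no SKALE_HOST means the app runs in local mode.
    mode: host ? 'distributed' : 'local',
    host: host || null,
    // Assumption: SKALE_DEBUG is a numeric level; fall back to 0 otherwise.
    debug: Number.isNaN(debug) ? 0 : debug,
  };
}

console.log(describeRun(process.env));
```

Logging such a summary at the top of an app script can help confirm which cluster server, if any, the app is expected to reach before any dataset is created.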

Resources

Authors

The original authors of skale are Cedric Artigue and Marc Vertes.

List of all contributors

License

Apache-2.0

Credits

The logo icon, made by Smashicons from www.flaticon.com, is licensed under CC 3.0 BY.
