Kmeans

This is a scala implementation of the K-Means clustering algorithm written in a functional way i.e. using functional programming. The algorithm is well-known, a good explanation here. As a short summary it uses alternating iterations of assigning each datapoint to its closest centroid and re-evaluating the centroid position, which is the mean of all its data points.

Initialising Centroids

Currently one method of initialising the centroids is used, called RUNFP and discussed here. The idea behind this method is to make the candidate centroids as far apart as possible. The method selects the first k points to be centroids initially. It then updates these k candidate centroids according to the following rule applied to each of the k candidate centroids. If the candidate centroid is further from the nearest existing centroid than the distance between the two closest existing centroids then use the candidate centroid instead of the closer of the two to the existing centroids.

Example

An example based on the Old Faithful geyser is in the example directory. Running clustering on this dataset produces the output shown below. A Plotter object is included to plot the output of clustering in 2 dimensions like this example.

Usage

The application takes command line arguments:

The format of file to process (csv only currently)
The filepath to the datafile
Whether the file has a header
Whether to plot the output

To run the app via sbt:

sbt run csv example/old_faithful.csv True True

To compile an uber jar with all dependencies use sbt assembly. This allows you to run on a system without scala (e.g. with java) like below

java -jar Kmeans-assembly-0.2.jar csv example/old_faithful.csv True True

The sbt assembly command will generate the uber jar at target/scala-2.11/Kmeans-assembly-0.2.jar in your project directory.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
example		example
project		project
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
build.sbt		build.sbt
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Kmeans

Initialising Centroids

Example

Usage

About

Releases

Packages

Languages

pagrim/Kmeans-functional

Folders and files

Latest commit

History

Repository files navigation

Kmeans

Initialising Centroids

Example

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages