GitHub - e-XpertSolutions/go-cluster: k-modes and k-prototypes clustering algorithms implementation in Go

go-cluster

Go implementation of clustering algorithms: k-modes and k-prototypes.

The k-modes algorithm is very similar to the well-known k-means clustering algorithm. The difference is how the distance is computed. k-means most commonly uses the Euclidean distance between two vectors. While that works well for numerical, continuous data, it is not suitable for categorical data, because there is no meaningful numeric distance between values like ‘Europe’ and ‘Africa’. This is why k-modes uses the Hamming distance between vectors, which counts how many elements of the two vectors differ. It is a good alternative to one-hot encoding when a feature has a large number of categories. K-prototypes is used to cluster mixed data (both categorical and numerical).
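
To make the distance concrete, here is a minimal, standalone sketch of a plain (unweighted) Hamming distance over dictionary-encoded vectors. It does not use the go-cluster package; the function name `hammingDistance` and the sample encodings are assumptions chosen for illustration only.

```go
package main

import "fmt"

// hammingDistance counts the positions at which two equally long,
// dictionary-encoded vectors hold different values.
// Hypothetical helper for illustration; not part of go-cluster.
func hammingDistance(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, fmt.Errorf("length mismatch: %d != %d", len(a), len(b))
	}
	var d float64
	for i := range a {
		if a[i] != b[i] {
			d++
		}
	}
	return d, nil
}

func main() {
	// Assumed encoding: continent {Europe: 1, Africa: 2},
	// colour {blue: 1, red: 2, green: 3}, size {small: 1, large: 2}.
	x := []float64{1, 1, 2} // Europe, blue, large
	y := []float64{2, 1, 2} // Africa, blue, large

	d, err := hammingDistance(x, y)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(d) // prints 1: only the continent differs
}
```

The numeric codes carry no order here: the distance only asks whether two codes are equal, so encoding ‘Europe’ as 1 and ‘Africa’ as 2 does not make them "closer" than any other pair.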

The implementation is based on the papers HUANG97, HUANG98 and CAO09, and is partially inspired by a Python implementation of the same algorithms, KMODES.

Installation

go get github.com/e-XpertSolutions/go-cluster/v2

Usage

This is a basic configuration and usage example for the KModes and KPrototypes algorithms. For more information, please refer to the documentation.

package main

import (
    "fmt"
    "github.com/e-XpertSolutions/go-cluster/cluster"
)

func main() {

    //categorical input data must first be dictionary-encoded to numbers - for example, the
    //values "blue", "red", "green" can be encoded as 1,2,3
    //lineNumber, columnNumber, rawData (and their new* counterparts) are placeholders
    //that must be defined by the caller
    data := cluster.NewDenseMatrix(lineNumber, columnNumber, rawData)
    newData := cluster.NewDenseMatrix(newLineNumber, newColumnNumber, newRawData)


    //input parameters for the algorithm

    //distance and initialization functions may be chosen from the package or one may use 
    //custom functions with proper arguments
    distanceFunction := cluster.WeightedHammingDistance
    initializationFunction := cluster.InitCao

    //number of clusters and maximum number of iterations 
    clustersNumber := 5
    maxIteration := 20

    //weight vector - used to set importance of the features, bigger number means greater 
    //contribution to the cost function
    //vector must be of the same length as the number of features in dataset
    //it is not compulsory, if 'nil' then all features are treated equally (weight = 1)  
    weights := []float64{1,1,2}
    wvec := [][]float64{weights}

    //path to the file where the model will be saved to or loaded from using SaveModel(), LoadModel()
    //if there is no need to load or save the model, it can be set to an empty string
    path := "km.txt"

    //KModes algorithm
    //initialization
    km := cluster.NewKModes(distanceFunction, initializationFunction, clustersNumber, 1,
        maxIteration, wvec, path)


    //training
    //after training it is possible to access clusters centers vectors and computed labels
    //using km.ClusterCentroids and km.Labels
    err := km.FitModel(data)
    if err != nil {
        fmt.Println(err)
    }

    //predicting labels for new data
    newLabels, err := km.Predict(newData)
    if err != nil {
        fmt.Println(err)
    }
    fmt.Println(newLabels)


    //KPrototypes algorithm
    //it needs two more parameters than k-modes:
    //categorical - vector with numbers indicating columns with categorical features
    //gamma - float number, importance of cost contribution for numerical values
    categorical := []int{1} // means that only column number one contains categorical data
    gamma := 0.2 //cost from distance function for numerical data will be multiplied by 0.2

    //initialization
    kp := cluster.NewKPrototypes(distanceFunction, initializationFunction, categorical, 
    clustersNumber, 1, maxIteration, wvec, gamma, "km.txt")

    //training
    err = kp.FitModel(data) //err is already declared above, so plain assignment is used
    if err != nil {
        fmt.Println(err)
    }

    //predicting labels for new data
    newLabelsP, err := kp.Predict(newData)
    if err != nil {
        fmt.Println(err)
    }
    fmt.Println(newLabelsP)
}
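
The usage example above assumes that categorical values have already been dictionary-encoded to numbers. One way that step could look is sketched below. The helper `encodeColumn` is hypothetical and not part of go-cluster; it assigns codes in order of first appearance, starting at 1, which is only one of many valid encodings.

```go
package main

import "fmt"

// encodeColumn maps each distinct string in values to a numeric code,
// assigned in order of first appearance and starting at 1.
// It returns the encoded column and the dictionary that was built.
// Hypothetical helper for illustration; not part of go-cluster.
func encodeColumn(values []string) ([]float64, map[string]float64) {
	dict := make(map[string]float64)
	encoded := make([]float64, len(values))
	for i, v := range values {
		code, ok := dict[v]
		if !ok {
			code = float64(len(dict) + 1)
			dict[v] = code
		}
		encoded[i] = code
	}
	return encoded, dict
}

func main() {
	colours := []string{"blue", "red", "green", "red"}
	encoded, dict := encodeColumn(colours)

	fmt.Println(encoded)       // prints [1 2 3 2]
	fmt.Println(dict["green"]) // prints 3
}
```

If this approach is used, the dictionary returned for the training data should also be used to encode any new data passed for prediction, so that the same category always maps to the same number.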

Contributing

Contributions are greatly appreciated. The project follows the typical GitHub pull request model for contribution.

License

The sources are released under a BSD 3-Clause License. The full terms of that license can be found in the LICENSE file of this repository.

References

[HUANG97] Huang, Z.: Clustering large data sets with mixed numeric and categorical values, Proceedings of the First Pacific Asia Knowledge Discovery and Data Mining Conference, Singapore, pp. 21-34, 1997.

[HUANG98] Huang, Z.: Extensions to the k-means algorithm for clustering large data sets with categorical values, Data Mining and Knowledge Discovery 2(3), pp. 283-304, 1998.

[CAO09] Cao, F., Liang, J., Bai, L.: A new initialization method for categorical data clustering, Expert Systems with Applications 36(7), pp. 10223-10228, 2009.

[KMODES] Python implementation of k-modes: https://github.com/nicodv/kmodes
