After wonderful feedback on my previous post on Scikit-learn from the guys at /r/MachineLearning, I decided to collect the list of machine learning libraries into this separate note. Let me know if there’s a library that should be included here.
Python
- Scikit-learn: comprehensive and easy to use. I wrote a whole article on why I like this library.
- PyBrain: neural networks are one thing that is missing from Scikit-learn, but this module makes up for it.
- nltk: really useful if you’re doing anything NLP or text mining related.
- Theano: efficient computation of mathematical expressions using a GPU. Excellent for deep learning.
- Pylearn2: machine learning toolbox built on top of Theano, in very early stages of development.
- MDP (Modular toolkit for Data Processing): a framework that is useful when setting up workflows.
Java
- Spark: Apache’s new upstart, supposedly up to a hundred times faster than Hadoop. It now includes MLlib, which contains a good selection of machine learning algorithms, including classification, clustering and recommendation generation. Currently undergoing rapid development. You can write Spark code in Python as well as in JVM languages.
- Mahout: Apache’s machine learning framework built on top of Hadoop. This looks promising, but comes with all the baggage and overhead of Hadoop.
- Weka: this is a Java based library with a graphical user interface that allows you to run experiments on small datasets. This is great if you restrict yourself to playing around to get a feel for what is possible with machine learning. However, I would avoid using this in production code at all costs: the API is very poorly designed, the algorithms are not optimised for production use and the documentation is often lacking.
- Mallet: another Java based library with an emphasis on document classification. I’m not so familiar with this one, but if you have to use Java this is bound to be better than Weka.
- JSAT: stands for “Java Statistical Analysis Tool”. It was created by Edward Raff and was born out of his frustration with Weka (I know the feeling). Looks pretty cool.
.NET
- Accord.NET: this seems to be pretty comprehensive, and comes recommended by primaryobjects on Reddit. There is perhaps a slight slant towards image processing and computer vision, as it builds on the popular library AForge.NET for this purpose.
- Another option is to use one of the Java libraries compiled to .NET using IKVM. I have used this approach with success in production.
C++
- Vowpal Wabbit: designed for very fast learning and released under a BSD license, this comes recommended by terath on Reddit.
- MultiBoost: a fast C++ framework implementing some boosting algorithms as well as some cascades (like the Viola-Jones cascades). It’s mainly focused on AdaBoost.MH, so it is multi-class/multi-label.
- Shogun: large machine learning library with a focus on kernel methods and support vector machines. Bindings to Matlab, R, Octave and Python.
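To give a feel for the boosting idea behind a library like MultiBoost, here is a minimal pure-Python sketch of plain binary AdaBoost with decision stumps. It is not AdaBoost.MH and it is not MultiBoost's API: the function names (`train_stump`, `adaboost`, `predict`) and the toy data layout are assumptions made up for this illustration.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump.

    A stump predicts `sign` when x[feature] >= threshold, else `-sign`.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                preds = [sign if x[j] >= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, x):
    _, j, thr, sign = stump
    return sign if x[j] >= thr else -sign


def adaboost(X, y, rounds=10):
    """Train an ensemble of (alpha, stump) pairs; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the log below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified examples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single threshold can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
```

On this toy data a single stump cannot get every point right, but the weighted vote of three stumps can, which is the point of boosting weak learners.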
General
- LibSVM and LibLinear: these are C libraries for support vector machines; there are also bindings or implementations for many other languages. These are the libraries used for support vector machine learning in Scikit-learn.
Conclusion
This article is a work in progress, so please send me your comments or criticisms!