A Gentle Introduction to Scikit
If you are a Python programmer, or you are looking for a robust library to bring machine learning into a production system, then a library you will want to seriously consider is scikit-learn.
In this post you will get an overview of the scikit-learn library and useful references of where you can learn more.
Where did it come from?
Scikit-learn was initially developed by David Cournapeau as a Google Summer of Code project in 2007.
Later, Matthieu Brucher joined the project and started to use it as a part of his thesis work. In 2010 INRIA got involved, and the first public release (v0.1 beta) was published in late January 2010.
The project now has more than 30 active contributors and has had paid sponsorship from INRIA, Google, Tinyclues and the Python Software Foundation.
Scikit-learn Homepage
What is scikit-learn?
Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python.
It is licensed under a permissive simplified BSD license and is distributed with many Linux distributions, encouraging academic and commercial use.
The library is built upon SciPy (Scientific Python), which must be installed before you can use scikit-learn. This stack includes:
NumPy: Base n-dimensional array package
SciPy: Fundamental library for scientific computing
Matplotlib: Comprehensive 2D/3D plotting
IPython: Enhanced interactive console
Sympy: Symbolic mathematics
Pandas: Data structures and analysis
Extensions or modules for SciPy are conventionally named SciKits. As such, the module that provides learning algorithms is named scikit-learn.
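As a quick check that the core of this stack is in place, a minimal sketch along the following lines can help. It assumes NumPy, SciPy and scikit-learn are already installed in the active environment, and it only reports whichever versions it finds.

```python
# Report the installed versions of the core packages scikit-learn relies on.
# Assumption: all three packages are installed in the current environment.
import numpy
import scipy
import sklearn

versions = {module.__name__: module.__version__ for module in (numpy, scipy, sklearn)}
for name, version in versions.items():
    print(name, version)
```

If any of the imports fails, that package needs to be installed before the examples in this post will run.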
The vision for the library is a level of robustness and support required for use in production systems. This means a deep focus on concerns such as ease of use, code quality, collaboration, documentation and performance.
Although the interface is Python, C libraries are leveraged for performance, such as NumPy for array and matrix operations, LAPACK and LibSVM, along with the careful use of Cython.
What are the features?
The library is focused on modeling data. It is not focused on loading, manipulating and summarizing data. For these features, refer to NumPy and Pandas.
Screenshot taken from a demo of the mean-shift clustering algorithm
Some popular groups of models provided by scikit-learn include:
Clustering: for grouping unlabeled data, with algorithms such as KMeans.
Cross Validation: for estimating the performance of supervised models on unseen data.
Datasets: for test datasets and for generating datasets with specific properties for investigating model behavior.
Dimensionality Reduction: for reducing the number of attributes in data for summarization, visualization and feature selection, with methods such as principal component analysis.
Ensemble methods: for combining the predictions of multiple supervised models.
Feature extraction: for defining attributes in image and text data.
Feature selection: for identifying meaningful attributes from which to create supervised models.
Parameter Tuning: for getting the most out of supervised models.
Manifold Learning: for summarizing and depicting complex multi-dimensional data.
Supervised Models: a vast array not limited to generalized linear models, discriminant analysis, naive Bayes, lazy methods, neural networks, support vector machines and decision trees.
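To give a feel for how several of these groups share the same interface, here is a minimal sketch that applies dimensionality reduction, clustering and a supervised model to the bundled Iris data. The parameter choices (2 components, 3 clusters, fixed random seeds) are illustrative assumptions, not recommendations, and the sketch assumes a reasonably recent scikit-learn release.

```python
# Three different model groups, one consistent fit/transform/predict interface.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

# 150 rows, 4 numeric attributes, 3 class labels
X, y = load_iris(return_X_y=True)

# Dimensionality reduction: project the 4 attributes onto 2 components.
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group the rows into 3 clusters without using the labels.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Supervised model: fit on labeled data, then predict.
classifier = DecisionTreeClassifier(random_state=0).fit(X, y)
labels = classifier.predict(X)

print(X_2d.shape, clusters.shape, labels.shape)
```

Each object is created with its settings, fit to data, and then asked to transform or predict, which is what makes it easy to swap one algorithm for another.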
Example: Classification and Regression Trees
I want to give you an example to show you how easy it is to use the library.
In this example, we use the Classification and Regression Trees (CART) decision tree algorithm to model the Iris flower dataset.
This dataset ships with the library as an example dataset and is loaded first. The classifier is fit to the data, and predictions are then made on the same training data.
Finally, a classification report and a confusion matrix are printed.
# Sample Decision Tree Classifier
from sklearn import datasets
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
# load the iris dataset
dataset = datasets.load_iris()
# fit a CART model to the data
model = DecisionTreeClassifier()
model.fit(dataset.data, dataset.target)
print(model)
# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
Running this example produces the following output, showing you the details of the trained model, the skill of the model according to some common metrics and a confusion matrix.
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best')
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        50
          1       1.00      1.00      1.00        50
          2       1.00      1.00      1.00        50

avg / total       1.00      1.00      1.00       150

[[50  0  0]
 [ 0 50  0]
 [ 0  0 50]]
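The perfect scores above come from evaluating the model on the same data it was trained on, so they are optimistic. As a rough sketch of how performance on unseen data could be estimated instead, the cross-validation tools mentioned earlier can be applied to the same model. This assumes a scikit-learn release that provides the sklearn.model_selection module; the choice of 5 folds and the fixed random seed are illustrative assumptions.

```python
# Estimate accuracy on unseen data with 5-fold cross validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fixed seed so repeated runs build the same trees
model = DecisionTreeClassifier(random_state=0)

# Each fold is held out once while the model is fit on the other four.
scores = cross_val_score(model, X, y, cv=5)

print(scores)
print("mean accuracy: %.3f" % scores.mean())
```

The mean of the per-fold scores is usually a more realistic guide to how the model will behave on new data than the training-set report.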
Who is using it?
The scikit-learn testimonials page lists Inria, Mendeley, wise.io, Evernote, Telecom ParisTech and AWeber as users of the library.
If this is a small sample of the companies that have spoken publicly about their use, then there are very likely tens to hundreds of larger organizations using the library.
It has good test coverage and managed releases, and it is suitable for prototype and production projects alike.
Resources
If you are interested in learning more, check out the Scikit-Learn homepage, which includes documentation and related resources.
You can get the code from the GitHub repository, and releases are historically available on the SourceForge project.
Documentation
I recommend starting out with the quick-start tutorial and flicking through the user guide and example gallery for algorithms that interest you.
Ultimately, scikit-learn is a library and the API reference will be the best documentation for getting things done.
Quick Start Tutorial http://scikit-learn.org/stable/tutorial/basic/tutorial.html
User Guide http://scikit-learn.org/stable/user_guide.html
API Reference http://scikit-learn.org/stable/modules/classes.html
Example Gallery http://scikit-learn.org/stable/auto_examples/index.html
Papers
If you are interested in more information about how the project started and its vision, there are some papers you may want to check out.
Scikit-learn: Machine Learning in Python (2011)
API design for machine learning software: experiences from the scikit-learn project (2013)
Books
If you are looking for a good book, I recommend “Building Machine Learning Systems with Python”. It’s well written and the examples are interesting.
Learning scikit-learn: Machine Learning in Python (2013)
Building Machine Learning Systems with Python (2013)
Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data (2014)