6.3. Preprocessing data

The sklearn.preprocessing package provides several common utility functions and transformer classes that change raw feature vectors into a representation better suited to downstream estimators.

 

In general, learning algorithms benefit from standardization of the data set. If some outliers are present in the set, robust scalers or transformers are more appropriate. The behaviors of the different scalers, transformers, and normalizers on a dataset containing marginal outliers are highlighted in the example "Compare the effect of different scalers on data with outliers".
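As a rough illustration of why a robust scaler can help when outliers are present, the sketch below compares StandardScaler with RobustScaler on a made-up single-feature dataset. The data values and variable names are assumptions chosen for illustration; they do not come from the original text.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Hypothetical toy data: five typical values and one extreme outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

# StandardScaler uses the mean and standard deviation, both of which
# are pulled strongly toward the outlier.
X_standard = StandardScaler().fit_transform(X)

# RobustScaler uses the median and the interquartile range, which are
# far less sensitive to a single extreme value.
X_robust = RobustScaler().fit_transform(X)

# Spread (max - min) of the five typical values after each scaling.
spread_standard = X_standard[:5].max() - X_standard[:5].min()
spread_robust = X_robust[:5].max() - X_robust[:5].min()

print("spread of inliers, StandardScaler:", spread_standard)
print("spread of inliers, RobustScaler:  ", spread_robust)
```

With StandardScaler the typical values are squeezed into a very narrow range because the outlier inflates the standard deviation; with RobustScaler they keep a spread of order one.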


6.3.1. Standardization, or mean removal and variance scaling

Standardization of datasets is a common requirement for many machine learning estimators implemented in scikit-learn; they might behave badly if the individual features do not roughly resemble standard normally distributed data, that is, Gaussian with zero mean and unit variance.

 

In practice we often ignore the shape of the distribution and just transform the data to center it by removing the mean value of each feature, then scale it by dividing non-constant features by their standard deviation.
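The centering and scaling just described can be sketched by hand with NumPy. This is a minimal sketch rather than the library's implementation; the fourth, constant column is an assumption added here only to show why constant features are left undivided.

```python
import numpy as np

# The first three columns are a small example matrix; the fourth column is
# a hypothetical constant feature added for illustration.
X = np.array([[1.0, -1.0,  2.0, 7.0],
              [2.0,  0.0,  0.0, 7.0],
              [0.0,  1.0, -1.0, 7.0]])

mean = X.mean(axis=0)   # per-feature mean
std = X.std(axis=0)     # per-feature population standard deviation (ddof=0)

# A constant feature has zero standard deviation; divide it by 1 instead
# so that it is only centered, not scaled.
safe_std = np.where(std == 0.0, 1.0, std)

X_scaled = (X - mean) / safe_std

print(X_scaled.mean(axis=0))  # per-feature means after scaling
print(X_scaled.std(axis=0))   # per-feature standard deviations after scaling
```

After this transformation every non-constant column has zero mean and unit variance, and the constant column becomes all zeros.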

 

For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the l1 and l2 regularizers of linear models) assume that all features are centered around zero and have variances of the same order of magnitude. If a feature has a variance that is orders of magnitude larger than the others, it might dominate the objective function and prevent the estimator from learning from the other features as expected.
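To make the dominance effect concrete: the RBF kernel depends on the squared Euclidean distance between samples, so a feature with a much larger variance contributes almost all of that distance. The sketch below measures each feature's share of the total squared pairwise distance before and after standardization. The synthetic data and the helper `distance_share` are hypothetical, introduced only for this illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: feature 0 is on a unit scale, feature 1 is on a
# scale roughly 1000 times larger.
X = np.column_stack([rng.normal(0.0, 1.0, 200),
                     rng.normal(0.0, 1000.0, 200)])


def distance_share(X):
    """Fraction of the total squared pairwise distance due to each feature."""
    diffs = X[:, None, :] - X[None, :, :]          # shape (n, n, n_features)
    per_feature = (diffs ** 2).sum(axis=(0, 1))    # summed over all pairs
    return per_feature / per_feature.sum()


share_raw = distance_share(X)
share_scaled = distance_share(StandardScaler().fit_transform(X))

print("raw data:   ", share_raw)
print("scaled data:", share_scaled)
```

On the raw data the large-scale feature accounts for nearly all of the distance, so a distance-based kernel would effectively ignore the other feature; after standardization both features contribute equally.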

 

The preprocessing module provides the StandardScaler utility class, which is a quick and easy way to perform the following operation on an array-like dataset:

from sklearn import preprocessing
import numpy as np
X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])
scaler = preprocessing.StandardScaler().fit(X_train)
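Once fitted, the scaler stores the per-feature statistics and can apply the same transformation to any array with the same number of features. The following sketch repeats the fit so that it runs on its own, then inspects the `mean_` and `scale_` attributes and calls `transform`. The array `X_new` is a made-up sample used only for illustration.

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[1., -1.,  2.],
                    [2.,  0.,  0.],
                    [0.,  1., -1.]])
scaler = preprocessing.StandardScaler().fit(X_train)

# Statistics learned from the training data.
print(scaler.mean_)    # per-feature mean
print(scaler.scale_)   # per-feature standard deviation used for scaling

# Apply the learned transformation to the training data.
X_scaled = scaler.transform(X_train)
print(X_scaled.mean(axis=0))  # per-feature means after scaling
print(X_scaled.std(axis=0))   # per-feature standard deviations after scaling

# The same statistics are reused for unseen data (hypothetical sample).
X_new = np.array([[-1., 1., 0.]])
X_new_scaled = scaler.transform(X_new)
print(X_new_scaled)
```

Fitting on the training set and reusing the fitted scaler on new data keeps both on the same scale, which is the usual reason to prefer the class over a one-off manual computation.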

Source: https://www.icode9.com/content-4-880951.html