6.3. Preprocessing data

6.3.1. Standardization, or mean removal and variance scaling

from sklearn import preprocessing
import numpy as np
X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])
scaler = preprocessing.StandardScaler().fit(X_train)

The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators.

Standardization of datasets is a common requirement for many machine learning estimators implemented in scikit-learn; they might behave badly if the individual features do not roughly resemble standard normally distributed data, that is, Gaussian with zero mean and unit variance.

In practice we often ignore the shape of the distribution and just transform the data to center it by removing the mean value of each feature, then scale it by dividing non-constant features by their standard deviation.
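The centering and scaling just described can be sketched by hand with NumPy. This is a minimal illustration, not scikit-learn's own implementation: the helper name `standardize` and the toy array are assumptions made for the example, and the population standard deviation (`ddof=0`) is used.

```python
import numpy as np


def standardize(X):
    """Center each column on its mean and scale it by its standard deviation.

    Hypothetical helper for illustration only; constant columns are centered
    but left unscaled to avoid dividing by zero.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)   # per-feature mean
    std = X.std(axis=0)     # per-feature population standard deviation (ddof=0)
    # Replace zero standard deviations with 1 so constant features stay at 0.
    safe_std = np.where(std == 0, 1.0, std)
    return (X - mean) / safe_std


# Toy data: rows are samples, columns are features.
X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

X_scaled = standardize(X)
print(X_scaled.mean(axis=0))  # each value should be close to 0
print(X_scaled.std(axis=0))   # each value should be close to 1
```

Guarding against a zero standard deviation is what "dividing non-constant features" amounts to in practice: a constant column ends up as all zeros rather than as NaN.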

The preprocessing module provides the StandardScaler utility class, which is a quick and easy way to perform this centering and scaling on an array-like dataset.
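As a short sketch of how StandardScaler might be used end to end, the snippet below fits a scaler on a small 3×3 training array, applies it, and checks the result. The variable names are chosen for illustration; `mean_` and `scale_` are the fitted per-feature statistics exposed by the scaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small training set: rows are samples, columns are features.
X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])

# fit() learns the per-feature mean and standard deviation from X_train.
scaler = StandardScaler().fit(X_train)
print(scaler.mean_)    # per-feature means learned during fit
print(scaler.scale_)   # per-feature standard deviations used for scaling

# transform() applies the learned centering and scaling.
X_scaled = scaler.transform(X_train)
print(X_scaled.mean(axis=0))  # each value should be close to 0
print(X_scaled.std(axis=0))   # each value should be close to 1
```

Because the statistics are stored on the fitted object, the same `scaler.transform` call can later be applied to held-out data so that it is scaled with the training set's mean and standard deviation.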
