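I've been entering machine learning competitions lately, and while studying top competitors' source code I noticed that two pandas functions come up constantly. It took me a while to understand them. For now I'm jotting down only what I've actually used in competitions; I'll write a more thorough version after the competition ends.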
1. groupby
```
# Assumes numpy and pandas were imported earlier: import numpy as np; import pandas as pd
In [35]: df = pd.DataFrame({'key1':['a', 'a', 'b', 'b', 'a'],
    ...:                    'key2':['one', 'two', 'one', 'two', 'one'],
    ...:                    'data1':np.random.randn(5),
    ...:                    'data2':np.random.randn(5)})

In [36]: df
Out[36]:
  key1 key2     data1     data2
0    a  one -1.400763  0.494059
1    a  two  1.303229 -2.396705
2    b  one -0.482499 -1.590093
3    b  two -0.902582 -0.909068
4    a  one -0.628412  1.724196

# Count key1 within each (key1, key2) group and name the result column TotalNumber
In [100]: df.groupby(['key1','key2'])['key1'].count().reset_index(name='TotalNumber')
Out[100]:
  key1 key2  TotalNumber
0    a  one            2
1    a  two            1
2    b  one            1
3    b  two            1
```
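Here key1 and key2 together serve as the grouping keys, and key1 is then counted within each group (I used something similar in the competition). Since key1 is never null inside a group, this amounts to the number of rows for each (key1, key2) combination.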
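Passing a dict of new column names to `agg` on a single column is rejected by pandas 1.0 and later, so here is a minimal sketch of two equivalent ways to get the same per-group counts in current pandas. The fixed `data1` values are an assumption that replaces `np.random.randn` only so the result is reproducible. `size()` counts every row in a group, while `'count'` skips NaN values; with no missing data they agree.

```python
import pandas as pd

# Same key columns as above; data1 is fixed (assumed values) instead of random.
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# size() counts all rows per (key1, key2) group, including rows containing NaN.
by_size = df.groupby(['key1', 'key2']).size().reset_index(name='TotalNumber')

# Named aggregation (pandas >= 0.25): output_name=(input_column, function).
# 'count' counts only the non-null data1 values in each group.
by_count = (df.groupby(['key1', 'key2'], as_index=False)
              .agg(TotalNumber=('data1', 'count')))

print(by_size)
print(by_count)
```

`as_index=False` keeps the group keys as ordinary columns, which is convenient when the counts are later merged back onto another table.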
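The agg function also comes up often; it is usually chained after groupby to apply one or more aggregation functions to each group.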
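As a sketch of typical `agg` usage, assuming a small made-up DataFrame: a list of function names produces one output column per function, and named aggregation (pandas 0.25+) lets each output column choose its own input column and function.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'data2': [10.0, 20.0, 30.0, 40.0, 50.0],
})

grouped = df.groupby('key1')

# A list of functions on one column -> one result column per function.
stats = grouped['data1'].agg(['mean', 'max', 'count'])

# Named aggregation: a different function per column, with chosen output names.
summary = grouped.agg(data1_mean=('data1', 'mean'),
                      data2_sum=('data2', 'sum'))

print(stats)
print(summary)
```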
2. merge
```
# Assumes pandas was imported earlier: import pandas as pd
In [89]: left = pd.DataFrame({'key1':['foo','foo','bar'],'key2':['one','one','two'],'lval':[1,2,3]})

In [90]: right = pd.DataFrame({'key1':['foo','foo','bar','bar'],'key2':['one','one','one','two'],'rval':[4,5,6,7]})

In [91]: left
Out[91]:
  key1 key2  lval
0  foo  one     1
1  foo  one     2
2  bar  two     3

In [92]: right
Out[92]:
  key1 key2  rval
0  foo  one     4
1  foo  one     5
2  bar  one     6
3  bar  two     7

In [93]: left.merge(right,on=['key1','key2'],how='left')
Out[93]:
  key1 key2  lval  rval
0  foo  one     1     4
1  foo  one     1     5
2  foo  one     2     4
3  foo  one     2     5
4  bar  two     3     7
```
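Here key1 and key2 together serve as the join keys, and the join type is left: every row of the left DataFrame is kept, and only the matching rows of the right DataFrame are brought in (a left row with no match would get NaN in rval). Because the key pair (foo, one) appears twice on each side, every pairing is produced, giving 2 × 2 = 4 rows for it.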
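To see how the `how` parameter changes the result, here is a sketch that reruns the same `left`/`right` merge with each join type and `indicator=True`, which adds a `_merge` column marking every row as `left_only`, `right_only` or `both`. In this data every left key has a match, so `left` and `inner` coincide, while `right` and `outer` also keep the unmatched right row (bar, one) with NaN in `lval`.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'],
                     'key2': ['one', 'one', 'two'],
                     'lval': [1, 2, 3]})
right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'],
                      'key2': ['one', 'one', 'one', 'two'],
                      'rval': [4, 5, 6, 7]})

# Merge on both keys with each join type; indicator=True adds a '_merge' column.
results = {how: left.merge(right, on=['key1', 'key2'], how=how, indicator=True)
           for how in ['left', 'right', 'inner', 'outer']}

for how, merged in results.items():
    print(how, len(merged))

# Rows that exist only in right: (bar, one), whose lval is NaN.
outer = results['outer']
print(outer[outer['_merge'] == 'right_only'])
```

Checking the `_merge` column like this is a quick way to confirm how many rows actually found a partner before using the merged table.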