打开APP
userphoto
未登录

开通VIP,畅享免费电子书等14项超值服

开通VIP
Python 数据分析:让你像写 Sql 语句一样,使用 Pandas 做数据分析

一、加载数据

Copy
import pandas as pdimport numpy as npurl = ('https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv')tips = pd.read_csv(url)output = tips.head()

Output:

Copy
total_bill tip sex smoker day time size0 16.99 1.01 Female No Sun Dinner 21 10.34 1.66 Male No Sun Dinner 32 21.01 3.50 Male No Sun Dinner 33 23.68 3.31 Male No Sun Dinner 24 24.59 3.61 Female No Sun Dinner 4

二、SELECT 的使用方式

sql 语句: SELECT total_bill, tip, smoker, time FROM tips LIMIT 5;

Copy
output = tips[['total_bill', 'tip', 'smoker', 'time']].head(5)

Output:

Copy
total_bill tip smoker time0 16.99 1.01 No Dinner1 10.34 1.66 No Dinner2 21.01 3.50 No Dinner3 23.68 3.31 No Dinner4 24.59 3.61 No Dinner

三、WHERE 的使用方式

1. 举个栗子

sql 语句: SELECT * FROM tips WHERE time = 'Dinner' LIMIT 5;

Copy
output = tips[tips['time'] == 'Dinner'].head(5)# 或者output = tips.query("time == 'Dinner'").head(5)

Output:

Copy
total_bill tip sex smoker day time size0 16.99 1.01 Female No Sun Dinner 21 10.34 1.66 Male No Sun Dinner 32 21.01 3.50 Male No Sun Dinner 33 23.68 3.31 Male No Sun Dinner 24 24.59 3.61 Female No Sun Dinner 4

2. 比较运算符:等于 ==、 大于 >、 大于等于 >=、小于等于 <=、不等于 !=

2.1 等于 ==

sql 语句:SELECT * FROM tips WHERE time = 'Dinner';

Copy
output = tips[(tips['time'] == 'Dinner')]

2.2 大于 >

sql 语句:SELECT * FROM tips WHERE tip > 5.00;

Copy
output = tips[(tips['tip'] > 5.00)]

2.3 大于等于 >=

sql 语句:SELECT * FROM tips WHERE tip >= 5.00;

Copy
output = tips[(tips['size'] >= 5)]

2.4 小于等于 <=

sql 语句:SELECT * FROM tips WHERE tip <= 5.00;

Copy
output = tips[(tips['size'] <= 5)]

2.5 不等于 !=

sql 语句:SELECT * FROM tips WHERE tip <> 5.00;

Copy
output = tips[(tips['size'] != 5)]

3. 逻辑运算符:且 &、或 |、非 -

3.1 且 &

sql 语句:SELECT * FROM tips WHERE time = 'Dinner' AND tip > 5.00;

Copy
output = tips[(tips['time'] == 'Dinner') & (tips['tip'] > 5.00)]

3.2 或 |

sql 语句:SELECT * FROM tips WHERE size >= 5 OR total_bill > 45;

Copy
output = tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)]

3.3 非 -

sql 语句:SELECT * FROM tips WHERE not (size <> 5 AND size > 4);

Copy
output = df[-((df['size'] != 5) & (df['size'] > 4))]

4. Null 的判断

这里重新定义一个包含 NaN 数据的 DataFrame

Copy
frame = pd.DataFrame({ 'col1': ['A', 'B', np.NaN, 'C', 'D'], 'col2': ['F', np.NaN, 'G', 'H', 'I'] })output = frame

Output:

Copy
col1 col20 A F1 B NaN2 NaN G3 C H4 D I

4.1 判断列是 Null

sql 语句:SELECT * FROM frame WHERE col2 IS NULL;

Copy
output = frame[frame['col2'].isna()]

Output:

Copy
col1 col21 B NaN

4.2 判断列不是 Null

sql 语句:SELECT * FROM frame WHERE col1 IS NOT NULL;

Copy
output = frame[frame['col1'].notna()]

Output:

Copy
col1 col20 A F1 B NaN3 C H4 D I

5. In、Like 操作

5.1 In

sql 语句:SELECT * FROM tips WHERE siez in (5, 6);

Copy
output = tips[tips['size'].isin([2, 5])]

5.2 Like

sql 语句:SELECT * FROM tips WHERE time like 'Din%';

Copy
output = tips[tips.time.str.contains('Din*')]

四、GROUP BY 的使用方式

sql 语句:SELECT sex, count(*) FROM tips GROUP BY sex;

Copy
output = tips.groupby('sex').size()# 获取相应的结果output['Male']output['Female']
Copy
output = tips.groupby('sex').count()# 获取相应的结果output['tip']['Female']
Copy
output = tips.groupby('sex')['total_bill'].count()# 获取相应的结果output['Male']output['Female']

sql 语句:SELECT day, AVG(tip), COUNT(*) FROM tips GROUP BY day;

Copy
output = tips.groupby('day').agg({'tip': np.mean, 'day': np.size})# 获取相应的结果output['day']['Fri']output['tip']['Fri']

sql 语句:SELECT smoker, day, COUNT(*), AVG(tip) FROM tips GROUP BY smoker, day;

Copy
output = tips.groupby(['smoker', 'day']).agg({'tip': [np.size, np.mean]})# 获取相应的结果output['tip']['size']['No']['Fri']

sql 语句:SELECT tip, count(distinct sex) FROM tips GROUP BY tip;

Copy
output = tips.groupby('tip').agg({'sex': pd.Series.nunique})

五、JOIN 连接的使用方式

定义两个 DataFrame。

Copy
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'], 'value': np.random.randn(4)})

1. 内连接 Inner Join

sql 语句:SELECT * FROM df1 INNER JOIN df2 ON df1.key = df2.key;

Copy
output = pd.merge(df1, df2, on='key')# 或indexed_df2 = df2.set_index('key')pd.merge(df1, indexed_df2, left_on='key', right_index=True)

2. 左连接 Left Outer Join

sql 语句:SELECT * FROM df1 LEFT OUTER JOIN df2 ON df1.key = df2.key;

Copy
output = pd.merge(df1, df2, on='key', how='left')# 或output = df1.join(df2, on='key', how='left')

3. 右连接 Right Join

sql 语句:SELECT * FROM df1 RIGHT OUTER JOIN df2 ON df1.key = df2.key;

Copy
output = pd.merge(df1, df2, on='key', how='right')

4. 全连接 Full Join

sql 语句:SELECT * FROM df1 FULL OUTER JOIN df2 ON df1.key = df2.key;

Copy
output = pd.merge(df1, df2, on='key', how='outer')

五、UNION 的使用方式

Copy
df1 = pd.DataFrame({'city': ['Chicago', 'San Francisco', 'New York City'], 'rank': range(1, 4)})df2 = pd.DataFrame({'city': ['Chicago', 'Boston', 'Los Angeles'], 'rank': [1, 4, 5]})

sql 语句:SELECT city, rank FROM df1 UNION ALL SELECT city, rank FROM df2;

Copy
output = pd.concat([df1, df2])

sql 语句:SELECT city, rank FROM df1 UNION SELECT city, rank FROM df2;

Copy
output = pd.concat([df1, df2]).drop_duplicates()

六、与 SQL 等价的其他语法

1. 去重 Distinct

sql 语句:SELECT DISTINCT sex FROM tips;

Copy
output = tips.drop_duplicates(subset=['sex'], keep='first', inplace=False)

2. 修改列别名 As

sql 语句:SELECT total_bill AS total, sex AS xes FROM tips;

Copy
output = tips.rename(columns={'total_bill': 'total', 'sex': 'xes'}, inplace=False)

3. Limit 与 Offset

sql 语句:SELECT * FROM tips ORDER BY tip DESC LIMIT 10 OFFSET 5;

Copy
output = tips.nlargest(10 + 5, columns='tip').tail(10)

4. 每个 Group 的前几行

sql 语句:

Copy
SELECT * FROM ( SELECT t.*, ROW_NUMBER() OVER(PARTITION BY day ORDER BY total_bill DESC) AS rn FROM tips t)WHERE rn < 3ORDER BY day, rn;
Copy
output = tips.assign(rn=tips.sort_values(['total_bill'], ascending=False). groupby(['day']).cumcount() + 1). query('rn < 3'). sort_values(['day', 'rn'])

七、Update 的使用方式

sql 语句:UPDATE tips SET tip = tip*2 WHERE tip < 2;

Copy
output = tips.loc[tips['tip'] < 2, 'tip'] *= 2

八、Delete 的使用方式

sql 语句:DELETE FROM tips WHERE tip > 9;

Copy
output = tips = tips.loc[tips['tip'] <= 9]

九、参考文章

本站仅提供存储服务,所有内容均由用户发布,如发现有害或侵权内容,请点击举报
打开APP,阅读全文并永久保存 查看更多类似文章
猜你喜欢
类似文章
【热】打开小程序,算一算2024你的财运
PANDAS  Comparison with SQL
SQL思维快速上手使用Pandas
【Python实战】Pandas:让你像写SQL一样做数据分析(一)
下载股票的历史日交易数据并存入数据库
使用python查询oracle并保存为excel的两种方法
与常用SQL语句对应的pandas函数或方法有哪些?
更多类似文章 >>
生活服务
热点新闻
分享 收藏 导长图 关注 下载文章
绑定账号成功
后续可登录账号畅享VIP特权!
如果VIP功能使用有故障,
可点击这里联系客服!

联系客服