Accurately counting lines of code per GitHub contributor

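GitHub can report how many lines of code each contributor to a repository has written. At our company's annual party there was a special "Code God Award" for the engineer who contributed the most code in the past year. According to GitHub's statistics, the winner committed 1.1 million lines last year. That number is staggering; no one person can write that much code. Curious, I dug into it and found that the figure included many third-party libraries he had committed, which GitHub counted as well, along with code he had merged. Is there a way to strip out this invalid data and get the real contribution? After checking the GitHub API and combining it with git commands, it turns out there is. Here is the code: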

# copy this script to your target repo
# run python github-stats.py to collect data
import os
import re
import sys

import requests

# get token from cmd line
tk = sys.argv[1]
user_stats = {"dummy": {"additions": 0, "deletions": 0, "total": 0}}

# query github api for last year's commits
payload = {'since': '2013-01-01T00:00:00Z', 'until': '2014-01-01T00:00:00Z'}
# GitHub no longer accepts access_token as a query parameter,
# so the token is sent in the Authorization header instead
auth = {'Authorization': 'token ' + tk}


def is_merge(commit_sha):
    # -s suppresses the diff, so only the one-line summary is searched
    cmd = "git show -s --oneline " + commit_sha
    title = os.popen(cmd).read()
    return re.search("Merge", title) is not None


def collect_stats(commit_list):
    for m in commit_list:
        # skip merge commits
        if is_merge(m['sha']):
            continue
        # author name as recorded in the local git history
        git_show_command = "git show -s --format=%an " + m['sha']
        user = os.popen(git_show_command).read().strip(' \t\n\r')
        # diff parent -> commit, so insertions/deletions are not swapped
        git_diff_command = "git diff --shortstat " + m['sha'] + "^ " + m['sha']
        data = os.popen(git_diff_command).read()
        ins_data = 0
        del_data = 0
        r_ins = re.search(r"(\d+) insertion", data)
        if r_ins is not None:
            ins_data = int(r_ins.group(1))
        r_del = re.search(r"(\d+) deletion", data)
        if r_del is not None:
            del_data = int(r_del.group(1))
        # ignore commits that change more than 5000 lines, but log them
        if ins_data + del_data > 5000:
            print(user)
            print('ins:' + str(ins_data))
            print('del:' + str(del_data))
            ins_data = 0
            del_data = 0
        if user in user_stats:
            stats = user_stats[user]
            stats['additions'] += ins_data
            stats['deletions'] += del_data
            stats['total'] += (ins_data + del_data)
        else:
            user_stats[user] = {'additions': ins_data, 'deletions': del_data,
                                'total': ins_data + del_data}


r = requests.get("https://api.github.com/repos/cocos2d/cocos2d-x/commits",
                 params=payload, headers=auth)
collect_stats(r.json())
print(user_stats)

# follow the rel="next" links in the Link header until the last page
pattern = re.compile("<(\\S+)>; rel=\"next\"")
print(r.headers['X-RateLimit-Remaining'])
# .get avoids a KeyError when the response has no Link header (single page)
result = pattern.search(r.headers.get('link', ''))
while result is not None:
    next_url = result.group(1)
    r = requests.get(next_url, headers=auth)
    collect_stats(r.json())
    print(r.headers.get('link'))
    result = pattern.search(r.headers.get('link', ''))
    print(r.headers['X-RateLimit-Remaining'])
    print(user_stats)
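The code is also available on GitHub: https://github.com/heliclei/githubtools/blob/master/github-stats.py
The script skips any single commit that changes more than 5000 lines, and it also skips merge commits. First clone the repository you want to analyze, then copy the script into the local git repository. Remember to change the following line to the URL of that repository: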
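The pagination step in the script can be looked at in isolation. The following is a minimal sketch, not part of the original script: the helper name `next_page_url` and the sample header value are made up for illustration, and the header is assumed to follow the `<url>; rel="next"` shape that the script's regular expression expects.

```python
import re

# Same pattern as the script: capture the URL that is tagged rel="next".
NEXT_LINK = re.compile(r'<(\S+)>; rel="next"')


def next_page_url(link_header):
    """Return the URL of the next page, or None when there is none.

    `link_header` is the raw value of the HTTP Link header (may be None).
    """
    if not link_header:
        return None
    match = NEXT_LINK.search(link_header)
    return match.group(1) if match else None


# Hypothetical header value, for illustration only.
sample = ('<https://api.github.com/repositories/1/commits?page=2>; rel="next", '
          '<https://api.github.com/repositories/1/commits?page=9>; rel="last"')
print(next_page_url(sample))
print(next_page_url(None))
```

Returning `None` for a missing header lets a single-page result end the loop cleanly instead of raising an error.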

https://api.github.com/repos/cocos2d/cocos2d-x/commits
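To make the two filters concrete, here is a small sketch that works on plain strings and dictionaries, so it can be tried without a repository. The function names are hypothetical. `has_multiple_parents` is an alternative merge check, not the one the script uses; it assumes each commit object returned by the API carries a `parents` list.

```python
import re

LINE_LIMIT = 5000  # same threshold as the script


def parse_shortstat(text):
    """Extract (insertions, deletions) from `git diff --shortstat` output."""
    ins = re.search(r"(\d+) insertion", text)
    dels = re.search(r"(\d+) deletion", text)
    return (int(ins.group(1)) if ins else 0,
            int(dels.group(1)) if dels else 0)


def countable_lines(text, limit=LINE_LIMIT):
    """Return (insertions, deletions), or (0, 0) for an oversized commit."""
    ins, dels = parse_shortstat(text)
    if ins + dels > limit:
        return (0, 0)
    return (ins, dels)


def has_multiple_parents(commit):
    """Alternative merge check (assumption): more than one parent commit."""
    return len(commit.get("parents", [])) > 1


print(countable_lines(" 3 files changed, 120 insertions(+), 15 deletions(-)"))
print(countable_lines(" 40 files changed, 6000 insertions(+), 2 deletions(-)"))
print(has_multiple_parents({"sha": "abc", "parents": [{"sha": "p1"}, {"sha": "p2"}]}))
```

Checking the parent count avoids false matches on ordinary commits whose message happens to contain the word "Merge", at the cost of relying on the API response rather than the local clone.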
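A GitHub token can be generated with the script from the previous post.
Run: python github-stats.py xxxxxxxxxxxxxgithub-oauth-tokenxxxxxxxxxxxxxxxxxxx
PS: After filtering, the cocos2d-x "code god" still contributed more than 100,000 lines of code last year, which is very impressive, but the figure is no longer as surreal as 1.1 million lines.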
