I. Requirements Analysis and Approach

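**1. Requirement:** Senior management wants to know how much code the R&D team commits each month. I personally do not think code volume reflects quality, but that is the reality. Measuring quality by quantity is a poor method that only fuels more internal competition.

**2. Approach:**

Tool: Gitstats, one of several repository statistics tools. It reports on committers, commit counts, files changed, lines of code, and comment volume over time, and can also produce simple statistics per file type. It is very convenient and well suited to code analysis for small teams. There are other good repository statistics tools as well. What I find unfriendly about such tools is that the code has to be cloned before it can be analyzed, which does not suit a setup with a very large number of projects.

Development: Python 3.x. When there are many projects, branches, and users, first analyze each project and generate its own report, then merge the results into one overall Excel report.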
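
The "report per project first, then merge" idea can be sketched as below. This is only an illustration under assumptions: the function name `merge_reports` and the dictionary layout (author email mapped to `additions`/`deletions` counts) are made up for the example and are not taken from the scripts in this post.

```python
from collections import defaultdict


def merge_reports(per_project_reports):
    """Sum per-author line counts across several per-project reports.

    Each report is assumed to be a dict: author email -> {"additions": int, "deletions": int}.
    """
    totals = defaultdict(lambda: {"additions": 0, "deletions": 0, "total": 0})
    for report in per_project_reports:
        for email, stats in report.items():
            entry = totals[email]
            entry["additions"] += stats["additions"]
            entry["deletions"] += stats["deletions"]
            # "total" follows the usual convention: added lines plus deleted lines
            entry["total"] += stats["additions"] + stats["deletions"]
    return dict(totals)


# Hypothetical sample data for two projects
project_a = {"dev1@example.com": {"additions": 120, "deletions": 30}}
project_b = {
    "dev1@example.com": {"additions": 10, "deletions": 5},
    "dev2@example.com": {"additions": 40, "deletions": 0},
}
merged = merge_reports([project_a, project_b])
print(merged["dev1@example.com"])
```

Keying on the author email rather than the display name is a design choice: the same person often commits under several display names, while the email is usually more stable.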

II. Implementation

**1. Method 1: generate one CSV report per project first, then merge them into a single CSV**

    #!/usr/bin/env python
    # coding=utf-8
    import datetime
    import json
    import os
    import threading

    import requests

    # GitLab server root URL (placeholder)
    git_root_url = "http://blog.csdn.net/"
    # Access token (placeholder)
    git_token = "blog.csdn.net"
    # Directory where the reports are stored
    export_path = "./dist"
    # Reporting period: start date
    t_from = "2021-06-01"
    # Reporting period: end date
    t_end = "2021-07-01"
    # Reporting period as datetime objects
    date_from = datetime.datetime.strptime(t_from, '%Y-%m-%d')
    date_end = datetime.datetime.strptime(t_end, '%Y-%m-%d')
    # Lock protecting the state shared between worker threads
    lock = threading.RLock()

    user_unknown = {}
    user_email_alias_mapping = {}
    user_email_name_mapping = {}


    class GitlabApiCountTrueLeTrue:
        """Worker class."""

        # All commit ids seen so far, used for de-duplication.
        # Duplicates can be caused by merges between branches.
        total_commit_map = {}
        # Final data set: project id -> ProjectInfo
        totalMap = {}

        def get_projects(self):
            """Fetch all repositories and generate the reports."""
            threads = []
            # Fetch the repositories on the server; one thread is created per repository
            for i in range(1, 3):
                # Works against the production GitLab, but not every project is
                # returned: only two pages are requested and GitLab caps per_page at 100
                url = ('%s/api/v4/projects'
                       '?private_token=%s&per_page=1000&page=%d&order_by=last_activity_at'
                       % (git_root_url, git_token, i))
                r1 = requests.get(url)  # send the GET request
                r2 = r1.json()  # parse the JSON response
                print(r2)
                for r3 in r2:
                    value = r3['default_branch']
                    last_active_time = r3['last_activity_at']
                    if value is None:
                        continue
                    days = date_from - datetime.datetime.strptime(
                        last_active_time, '%Y-%m-%dT%H:%M:%S.%fZ')
                    # Skip projects whose last activity is earlier than the start date
                    if days.days > 1:
                        continue
                    project_info = ProjectInfo()
                    project_info.project_id = r3['id']
                    project_info.name = r3['name']
                    project_info.project_desc = r3['description']
                    project_info.project_url = r3['web_url']
                    project_info.path = r3['path']
                    # Prepare the worker thread
                    t = threading.Thread(target=self.get_branches, args=(r3['id'], project_info))
                    threads.append(t)
            # Start all threads one by one
            for t in threads:
                t.start()
            # Wait for all threads to finish
            for t in threads:
                t.join()
            final_commit_map = {}
            for key, project in self.totalMap.items():
                for author_email, detail in project.commit_map.items():
                    exist_detail = final_commit_map.get(detail.author_email)
                    if exist_detail is None:
                        final_commit_map[detail.author_email] = detail
                    else:
                        exist_detail.total += detail.total
                        exist_detail.additions += detail.additions
                        exist_detail.deletions += detail.deletions
            write_to_csv("%s/GitStatic_%s/%s_%s.csv" % (export_path, t_from, 'total', t_from),
                         final_commit_map, "extra")

        def get_branches(self, project_id, project_info):
            """Fetch all branches of a repository and merge their commits into one map.

            :param project_id: project id
            :param project_info: ProjectInfo object of the project
            """
            print("Entered thread: %d, project id %d, %s" %
                  (threading.get_ident(), project_id, project_info.project_url))
            url = '%s/api/v4/projects/%s/repository/branches?private_token=%s' % (
                git_root_url, project_id, git_token)
            print("start get branch list %d,url=%s" % (project_id, url))
            r1 = requests.get(url)  # send the GET request
            r2 = r1.json()  # parse the JSON response
            if not r2:
                return
            # Branch map: key is the branch name, value is a sub-map keyed by committer email
            branch_map = {}
            # Fetch the commits of the master branch explicitly
            detail_map = self.get_commits(project_id, project_info.project_url, 'master')
            print("get commits finish project_id=%d branch master" % project_id)
            if detail_map:
                branch_map['master'] = detail_map
            for r3 in r2:
                branch_name = r3['name']
                if branch_name is None:
                    continue
                # Skip branches that have already been merged
                if r3['merged']:
                    continue
                detail_map = self.get_commits(project_id, project_info.project_url, branch_name)
                if not detail_map:
                    continue
                # Store the result in the map
                branch_map[branch_name] = detail_map
                print("get commits finish project_id=%d branch %s" % (project_id, branch_name))
            print("all branch commits finish %d " % project_id)
            final_commit_map = {}
            # Walk the branch map and aggregate by committer email
            for key, value_map in branch_map.items():
                for author_email, detail in value_map.items():
                    exist_detail = final_commit_map.get(detail.author_email)
                    if exist_detail is None:
                        final_commit_map[detail.author_email] = detail
                    else:
                        exist_detail.total += detail.total
                        exist_detail.additions += detail.additions
                        exist_detail.deletions += detail.deletions
            if not final_commit_map:
                return
            project_info.commit_map = final_commit_map
            # totalMap is shared by all threads, so update it under the lock
            with lock:
                self.totalMap[project_info.project_id] = project_info
            # After aggregation, write the result to a CSV named after the project path and id
            write_to_csv("%s/GitStatic_%s/project/%s_%d.csv" % (
                export_path, t_from, project_info.path, project_info.project_id),
                final_commit_map, project_info.project_url)

        def get_commits(self, project_id, project_url, branch_name):
            """Fetch the commits of one branch of one repository, then walk each
            commit to build the statistics for that branch.

            :param project_id: project id
            :param project_url: web URL of the project
            :param branch_name: branch name
            :return: map of committer email -> CommitDetails, or None if there are no commits
            """
            since_date = date_from.strftime('%Y-%m-%dT%H:%M:%S.%fZ')
            until_date = date_end.strftime('%Y-%m-%dT%H:%M:%S.%fZ')
            url = ('%s/api/v4/projects/%s/repository/commits'
                   '?page=1&per_page=1000&ref_name=%s&since=%s&until=%s&private_token=%s'
                   % (git_root_url, project_id, branch_name, since_date, until_date, git_token))
            r1 = requests.get(url)  # send the GET request
            r2 = r1.json()  # parse the JSON response
            if not r2:
                return
            print('start get_commits,projectID=%d,branch=%s,url=%s' % (project_id, branch_name, url))
            detail_map = {}
            for r3 in r2:
                commit_id = r3['id']
                if commit_id is None:
                    continue
                # De-duplicate commits; the check and the update happen under the
                # lock because the map is shared by all threads
                with lock:
                    if commit_id in self.total_commit_map:
                        continue
                    self.total_commit_map[commit_id] = commit_id
                # Fetch the details of this single commit
                detail = get_commit_detail(project_id, commit_id)
                if detail is None:
                    continue
                if detail.total > 5000:
                    # A single commit of more than 5000 lines is probably generated
                    # code (scaffolding and the like), so it is ignored
                    continue
                # Unrelated to the main flow: reports commits whose author account is irregular
                if detail.author_email in user_unknown:
                    print("email %s projectid= %d,branchname,%s,url=%s" % (
                        detail.author_email, project_id, branch_name, project_url))
                # Aggregate the commit data by email
                exist_detail = detail_map.get(detail.author_email)
                if exist_detail is None:
                    detail_map[detail.author_email] = detail
                else:
                    exist_detail.total += detail.total
                    exist_detail.additions += detail.additions
                    exist_detail.deletions += detail.deletions
            return detail_map


    def get_commit_detail(project_id, commit_id):
        """Fetch the information of a single commit.

        :param project_id: project id
        :param commit_id: commit id
        :return: a CommitDetails object, or None for merge commits and commits without stats
        """
        url = '%s/api/v4/projects/%s/repository/commits/%s?private_token=%s' \
              % (git_root_url, project_id, commit_id, git_token)
        r1 = requests.get(url)  # send the GET request
        r2 = r1.json()  # parse the JSON response
        # print(json.dumps(r2, ensure_ascii=False))
        author_name = r2['author_name']
        author_email = r2['author_email']
        stats = r2['stats']
        if 'Merge branch' in r2['title']:
            return
        if stats is None:
            return
        # Map alias emails and names to the canonical ones
        temp_mail = user_email_alias_mapping.get(author_email)
        if temp_mail is not None:
            author_email = temp_mail
        temp_name = user_email_name_mapping.get(author_email)
        if temp_name is not None:
            author_name = temp_name
        details = CommitDetails()
        details.additions = stats['additions']
        details.deletions = stats['deletions']
        details.total = stats['total']
        details.author_email = author_email
        details.author_name = author_name
        return details


    def make_dir_safe(file_path):
        """Utility: create the directory a file will be written to if it does not exist.

        :param file_path: file path or directory path
        """
        if file_path.endswith("/"):
            folder_path = file_path
        else:
            folder_path = file_path[0:file_path.rfind('/') + 1]
        if folder_path:
            # exist_ok avoids an error when several threads create the same directory
            os.makedirs(folder_path, exist_ok=True)


    def write_to_csv(file_path, final_commit_map, extra):
        """Utility: write the result to a CSV file.

        :param file_path: file path
        :param final_commit_map: map of committer email -> CommitDetails
        :param extra: header of the extra column
        """
        make_dir_safe(file_path)
        with open(file_path, 'w', encoding='utf-8') as out:
            title = '%s,%s,%s,%s,%s,%s' % (
                "Committer email", "Committer name", "Total lines",
                "Lines added", "Lines deleted", extra)
            out.write(title + "\n")
            for key, value in final_commit_map.items():
                var = '%s,%s,%s,%s,%s' % (
                    value.author_email, value.author_name, value.total,
                    value.additions, value.deletions)
                out.write(var + '\n')


    class CommitDetails(json.JSONEncoder):
        """Structure holding commit statistics."""
        author_name = None
        author_email = None
        additions = 0
        deletions = 0
        total = 0


    class ProjectInfo(json.JSONEncoder):
        """Structure holding project information."""
        project_id = None
        project_desc = None
        project_url = None
        path = None
        name = None
        commit_map = None


    if __name__ == '__main__':
        gitlab4 = GitlabApiCountTrueLeTrue()
        gitlab4.get_projects()
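
The comment in Method 1 notes that not every project is returned. A likely cause is pagination: the script asks for `per_page=1000` and only two pages, while GitLab limits a page to 100 items. Below is a minimal sketch of walking every page instead. It assumes the server uses offset pagination and exposes the next page number in the `X-Next-Page` response header; `fetch_all` and `gitlab_page_fetcher` are hypothetical helper names, not part of the original script.

```python
import json
import urllib.request


def fetch_all(fetch_page):
    """Collect the items of every page.

    fetch_page(page) must return (items, next_page); next_page is the next
    page number as a string, or "" / None when there are no more pages.
    """
    items, page = [], "1"
    while page:
        batch, page = fetch_page(page)
        items.extend(batch)
    return items


def gitlab_page_fetcher(base_url, token, path, per_page=100):
    """Build a fetch_page callable for a GitLab list endpoint (hypothetical helper)."""
    def fetch_page(page):
        sep = "&" if "?" in path else "?"
        url = "%s/api/v4/%s%sper_page=%d&page=%s" % (
            base_url.rstrip("/"), path, sep, per_page, page)
        # The token is sent as a header instead of a query parameter
        req = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp), resp.headers.get("X-Next-Page", "")
    return fetch_page


# Offline demonstration with a fake two-page source
fake_pages = {"1": ([{"id": 1}, {"id": 2}], "2"), "2": ([{"id": 3}], "")}
all_items = fetch_all(lambda page: fake_pages[page])
print(len(all_items))
```

Against a real server the call would look like `fetch_all(gitlab_page_fetcher(git_root_url, git_token, "projects?order_by=last_activity_at"))`. Separating the loop from the HTTP call keeps the paging logic testable without network access.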

**2. Method 2: analyze every project in code and aggregate directly into a single CSV.**

    #!/usr/bin/env python
    # coding=utf-8
    import gitlab
    import pandas as pd

    # Server URL and token are placeholders
    gl = gitlab.Gitlab('http://blog.csdn.net/', private_token='blog.csdn.net', timeout=60, api_version='4')

    # Reporting period (ISO 8601, two-digit day)
    start_time = '2021-06-01T00:00:00Z'
    end_time = '2021-07-01T23:00:00Z'


    def get_gitlab():
        """Collect one record per commit through the GitLab API."""
        list2 = []
        projects = gl.projects.list(owned=True, all=True)
        num = 0
        for project in projects:
            num += 1
            print("Checked %d projects" % num)
            for branch in project.branches.list():
                commits = project.commits.list(all=True, query_parameters={
                    'since': start_time, 'until': end_time, 'ref_name': branch.name})
                for commit in commits:
                    # The list endpoint has no stats, so fetch each commit individually
                    com = project.commits.get(commit.id)
                    pro = {}
                    try:
                        # print(project.path_with_namespace, com.author_name, com.stats["total"])
                        pro["projectName"] = project.path_with_namespace
                        pro["authorName"] = com.author_name
                        pro["branch"] = branch.name
                        pro["additions"] = com.stats["additions"]
                        pro["deletions"] = com.stats["deletions"]
                        pro["commitNum"] = com.stats["total"]
                        list2.append(pro)
                    except Exception:
                        print("An error occurred, please check")
        return list2


    def data():
        """Aggregate the records by project, author and branch."""
        ret = {}
        for ele in get_gitlab():
            # A tuple key avoids collisions that plain string concatenation could cause
            key = (ele["projectName"], ele["authorName"], ele["branch"])
            if key not in ret:
                ret[key] = ele
                ret[key]["commitTotal"] = 1
            else:
                ret[key]["additions"] += ele["additions"]
                ret[key]["deletions"] += ele["deletions"]
                ret[key]["commitNum"] += ele["commitNum"]
                ret[key]["commitTotal"] += 1
        list1 = []
        for key, v in ret.items():
            # Rename the keys to the column titles used in the report
            v["Project"] = v.pop("projectName")
            v["Developer"] = v.pop("authorName")
            v["Branch"] = v.pop("branch")
            v["Lines added"] = v.pop("additions")
            v["Lines deleted"] = v.pop("deletions")
            v["Total lines changed"] = v.pop("commitNum")
            v["Commits"] = v["commitTotal"]
            list1.append(v)
        print(list1)
        return list1


    def csv(csvName):
        """Write the aggregated data to a CSV file."""
        df = pd.DataFrame(data(), columns=["Project", "Developer", "Branch", "Lines added",
                                           "Lines deleted", "Total lines changed", "Commits"])
        # utf_8_sig adds a BOM so that Excel detects the encoding
        df.to_csv(csvName, index=False, encoding="utf_8_sig")


    if __name__ == "__main__":
        csv("./gitlab.csv")
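
Method 2 collects commits branch by branch, so a commit reachable from several branches is recorded once per branch. If that is not wanted, commits can be de-duplicated by id, as Method 1 does. The sketch below shows the idea on plain dictionaries; `unique_commits` and the input layout (branch name mapped to a list of commit dicts with an `id` key) are assumptions made for the example.

```python
def unique_commits(commits_by_branch):
    """Yield (branch, commit) pairs, keeping only the first sighting of each commit id."""
    seen = set()
    for branch, commits in commits_by_branch.items():
        for commit in commits:
            if commit["id"] in seen:
                # Already counted on an earlier branch
                continue
            seen.add(commit["id"])
            yield branch, commit


# Hypothetical sample: commit "b" is reachable from both branches
sample = {
    "master": [{"id": "a"}, {"id": "b"}],
    "dev": [{"id": "b"}, {"id": "c"}],
}
pairs = [(branch, commit["id"]) for branch, commit in unique_commits(sample)]
print(pairs)
```

With this approach a shared commit is attributed to whichever branch is visited first, so the per-branch column of the report becomes order-dependent; the per-developer totals are what de-duplication makes more accurate.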

III. Results

1. Result of Method 1:

2. Result of Method 2:

## You can also add an email-sending feature. See my other blog posts and add that module yourself.
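
As a rough idea of what such an email module could look like, here is a minimal sketch using only the standard library. It is not the module from the author's other posts; the function names, subject line, addresses, and SMTP host are placeholders chosen for the example.

```python
import smtplib
from email.message import EmailMessage


def build_report_mail(sender, recipients, csv_bytes, filename="gitlab.csv"):
    """Build an email with the CSV report attached (nothing is sent here)."""
    msg = EmailMessage()
    msg["Subject"] = "Monthly GitLab code contribution report"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("The monthly code contribution report is attached.")
    # Attach the CSV as a generic binary file
    msg.add_attachment(csv_bytes, maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg


def send_mail(msg, host, port=25):
    """Send the message through an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


# Build a message from in-memory sample data; sending is left to the caller
mail = build_report_mail("stats@example.com", ["boss@example.com"],
                         b"Project,Developer\nabc,dev1\n")
print(mail["Subject"])
```

In practice the CSV bytes would come from reading `./gitlab.csv` in binary mode, and `send_mail(mail, "smtp.example.com")` would be called with the real server, which may also require TLS and a login.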
