
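Velodrome is a dashboard, monitoring and metrics stack for Kubernetes developer productivity: a set of components for monitoring developer productivity. It should be fairly generic and could be used for other repos with small changes.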

The code that talks to GitHub would also be fairly easy to reuse in a GitHub robot.


  • Grafana stack: the front end (all open-source components; only configuration is stored here)

    • InfluxDB: save precalculated metrics
    • Prometheus: save poll-based metrics
    • Grafana: display graphs based on these metrics
    • nginx: proxy all of these services in a single URL
  • SQL database: contains a copy of the issues, events, and PRs in GitHub repositories. It is used for calculating statistics about developer productivity.
    • Fetcher: fetches GitHub data and stores it in a SQL database (most of the Go code lives here; it calls the GitHub SDK to pull the data)
    • SQL Proxy: SQL Proxy deployment to Cloud SQL (configuration only)
    • Transform: transforms SQL (the GitHub db) into valuable metrics
  • Other monitoring tools
    • token-counter: monitors rate-limit usage of your GitHub tokens



  • Issues (including pull-requests)
  • Events (associated to issues)
  • Comments (regular comments and review comments)


  • Compute average time-to-resolution for an issue/pull-request
  • Compute time between label creation/removal: lgtm'd, merged
  • Break down based on specific flags (size, priority, ...)
// ClientInterface describes what a client should be able to do
type ClientInterface interface {
	RepositoryName() string
	FetchIssues(last time.Time, c chan *github.Issue)
	FetchIssueEvents(issueID int, last *int, c chan *github.IssueEvent)
	FetchIssueComments(issueID int, last time.Time, c chan *github.IssueComment)
	FetchPullComments(issueID int, last time.Time, c chan *github.PullRequestComment)
}


// 1. Entry point
// main -> cobra.Command(root) -> runProgram -> UpdateIssues

// 2. UpdateIssues (test-infra/velodrome/fetcher/issues.go)
// Calls the client's FetchIssues; the issue models are passed over a channel
go client.FetchIssues(latest, c)
for issue := range c {
	// 2.1 NewIssue(..)
	UpdateComments(*issue.Number, issueOrm.IsPR, db, client)
	// and find if we have new events
	UpdateIssueEvents(*issue.Number, db, client)
}

// 2.2 UpdateComments (test-infra/velodrome/fetcher/comments.go)
func UpdateComments(issueID int, pullRequest bool, db *gorm.DB, client ClientInterface) {
	latest := findLatestCommentUpdate(issueID, db, client.RepositoryName())
	updateIssueComments(issueID, latest, db, client)
	if pullRequest {
		updatePullComments(issueID, latest, db, client)
	}
}

func updateIssueComments(issueID int, latest time.Time, db *gorm.DB, client ClientInterface) {
	c := make(chan *github.IssueComment, 200)
	go client.FetchIssueComments(issueID, latest, c)
	for comment := range c {
		commentOrm, err := NewIssueComment(issueID, comment, client.RepositoryName())
		...
	}
}

func updatePullComments(issueID int, latest time.Time, db *gorm.DB, client ClientInterface) {
	c := make(chan *github.PullRequestComment, 200)
	go client.FetchPullComments(issueID, latest, c)
	for comment := range c {
		commentOrm, err := NewPullComment(issueID, comment, client.RepositoryName())
		...
	}
}

// 2.3 UpdateIssueEvents (test-infra/velodrome/fetcher/issue-events.go)
func UpdateIssueEvents(issueID int, db *gorm.DB, client ClientInterface) {
	...
	c := make(chan *github.IssueEvent, 500)
	go client.FetchIssueEvents(issueID, latest, c)
	for event := range c {
		eventOrm, err := NewIssueEvent(event, issueID, client.RepositoryName())
		...
	}
}
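
The fetcher functions above all share one shape: a producer goroutine pushes items onto a buffered channel and the caller ranges over it. Below is a minimal, self-contained sketch of that pattern. `fakeIssue`, `fetchIssues` and `updateIssues` are hypothetical stand-ins, and the sketch assumes the producer closes the channel when it is done, which is what lets the `range` loop end.

```go
package main

import "fmt"

// fakeIssue stands in for *github.Issue; it is a hypothetical type.
type fakeIssue struct{ Number int }

// fetchIssues plays the role of client.FetchIssues: it sends every item
// on c and closes the channel so the consumer's range loop terminates.
func fetchIssues(numbers []int, c chan<- *fakeIssue) {
	defer close(c)
	for _, n := range numbers {
		c <- &fakeIssue{Number: n}
	}
}

// updateIssues mirrors the shape of UpdateIssues: start the producer in a
// goroutine, then handle each issue as it arrives.
func updateIssues(numbers []int) []int {
	c := make(chan *fakeIssue, 200)
	go fetchIssues(numbers, c)
	var seen []int
	for issue := range c {
		// Real code would store the issue and then fetch its comments and events.
		seen = append(seen, issue.Number)
	}
	return seen
}

func main() {
	fmt.Println(updateIssues([]int{1, 2, 3}))
}
```

The buffer lets the producer run ahead of the database writes without the two sides having to move in lockstep.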



GitHub data in SQL --> transform --> metrics

func (config *transformConfig) run(plugin plugins.Plugin) error {
	...
	// Turn issue and comment data into points -> InfluxDB
	go Dispatch(plugin, influxdb, fetcher.IssuesChannel, fetcher.EventsCommentsChannel)

	ticker := time.Tick(time.Hour / time.Duration(config.frequency))
	for {
		// Fetch new events from MySQL, push it to plugins
		if err := fetcher.Fetch(mysqldb); err != nil {
			return err
		}
		// Push the processed batch of points to InfluxDB in bulk
		if err := influxdb.PushBatchPoints(); err != nil {
			return err
		}
		if config.once {
			break
		}
		// Minimum interval between two runs
		<-ticker
	}
	return nil
}

// Dispatch receives a channel for each type of event and dispatches the events to the plugin.
func Dispatch(plugin plugins.Plugin, DB *InfluxDB, issues chan sql.Issue, eventsCommentsChannel chan interface{}) {
	for {
		var points []plugins.Point
		select {
		case issue, ok := <-issues:
			if !ok {
				return
			}
			points = plugin.ReceiveIssue(issue)
		case event, ok := <-eventsCommentsChannel:
			if !ok {
				return
			}
			switch event := event.(type) {
			case sql.IssueEvent:
				points = plugin.ReceiveIssueEvent(event)
			case sql.Comment:
				points = plugin.ReceiveComment(event)
			default:
				glog.Fatal("Received invalid object: ", event)
			}
		}

		for _, point := range points {
			if err := DB.Push(point.Tags, point.Values, point.Date); err != nil {
				glog.Fatal("Failed to push point: ", err)
			}
		}
	}
}


A plugin needs to implement the Plugin interface:

type Plugin interface {
	ReceiveIssue(sql.Issue) []Point
	ReceiveComment(sql.Comment) []Point
	ReceiveIssueEvent(sql.IssueEvent) []Point
}
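
To make the interface concrete, here is a minimal sketch of a plugin that counts issues. The `Issue`, `Comment`, `IssueEvent` and `Point` types are simplified local stand-ins for the `sql` and `plugins` types; only the `Tags`, `Values` and `Date` fields of a point appear in the original code, and the other field names are assumptions. `issueCounter` is a hypothetical plugin, not one from the repository.

```go
package main

import (
	"fmt"
	"time"
)

// Simplified stand-ins for sql.Issue, sql.Comment, sql.IssueEvent and plugins.Point.
type Issue struct {
	ID        int
	CreatedAt time.Time
}
type Comment struct{ IssueID int }
type IssueEvent struct{ IssueID int }
type Point struct {
	Tags   map[string]string
	Values map[string]interface{}
	Date   time.Time
}

type Plugin interface {
	ReceiveIssue(Issue) []Point
	ReceiveComment(Comment) []Point
	ReceiveIssueEvent(IssueEvent) []Point
}

// issueCounter emits one point per issue with a running total and
// ignores comments and events.
type issueCounter struct{ total int }

func (p *issueCounter) ReceiveIssue(i Issue) []Point {
	p.total++
	return []Point{{
		Tags:   map[string]string{"type": "issue"},
		Values: map[string]interface{}{"count": p.total},
		Date:   i.CreatedAt,
	}}
}
func (p *issueCounter) ReceiveComment(Comment) []Point       { return nil }
func (p *issueCounter) ReceiveIssueEvent(IssueEvent) []Point { return nil }

func main() {
	var p Plugin = &issueCounter{}
	p.ReceiveIssue(Issue{ID: 1})
	points := p.ReceiveIssue(Issue{ID: 2})
	fmt.Println(points[0].Values["count"])
}
```

Returning `nil` for the event types a plugin does not care about is enough for a dispatcher that simply loops over the returned points.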

The plugin that gets run is authorFilter (test-infra/velodrome/transform/plugins/count.go):

// test-infra/velodrome/transform/plugins/count.go
// Several plugins are wrapped into a single one
func NewCountPlugin(runner func(Plugin) error) *cobra.Command {
	stateCounter := &StatePlugin{}
	eventCounter := &EventCounterPlugin{}
	commentsAsEvents := NewFakeCommentPluginWrapper(eventCounter)
	commentCounter := &CommentCounterPlugin{}
	authorLoggable := NewMultiplexerPluginWrapper(
		commentsAsEvents,
		commentCounter,
	)
	authorLogged := NewAuthorLoggerPluginWrapper(authorLoggable)
	fullMultiplex := NewMultiplexerPluginWrapper(authorLogged, stateCounter)

	fakeOpen := NewFakeOpenPluginWrapper(fullMultiplex)
	typeFilter := NewTypeFilterWrapperPlugin(fakeOpen)
	authorFilter := NewAuthorFilterPluginWrapper(typeFilter)
	...
}
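
The wrapping above is a decorator chain: each wrapper is itself a plugin that forwards to the plugin it wraps. The sketch below shows the idea with two hypothetical wrappers. The interface is cut down to one method for brevity, and the behaviour given to `authorFilter` (dropping issues from ignored authors) is an assumption for illustration, not a description of the real wrapper.

```go
package main

import "fmt"

type Issue struct{ Author string }
type Point struct{ Name string }

// Plugin is reduced to a single method here to keep the sketch short.
type Plugin interface {
	ReceiveIssue(Issue) []Point
}

// named is a leaf plugin that emits one point carrying its own name.
type named struct{ name string }

func (n named) ReceiveIssue(Issue) []Point { return []Point{{Name: n.name}} }

// multiplexer fans each issue out to every wrapped plugin and
// concatenates the points they return.
type multiplexer struct{ plugins []Plugin }

func (m multiplexer) ReceiveIssue(i Issue) []Point {
	var points []Point
	for _, p := range m.plugins {
		points = append(points, p.ReceiveIssue(i)...)
	}
	return points
}

// authorFilter drops issues from ignored authors before they reach the
// wrapped plugin (assumed behaviour).
type authorFilter struct {
	ignored map[string]bool
	next    Plugin
}

func (a authorFilter) ReceiveIssue(i Issue) []Point {
	if a.ignored[i.Author] {
		return nil
	}
	return a.next.ReceiveIssue(i)
}

// newPipeline composes the wrappers from the inside out.
func newPipeline() Plugin {
	return authorFilter{
		ignored: map[string]bool{"some-bot": true},
		next:    multiplexer{plugins: []Plugin{named{"state"}, named{"events"}}},
	}
}

func main() {
	p := newPipeline()
	fmt.Println(len(p.ReceiveIssue(Issue{Author: "alice"})))
	fmt.Println(len(p.ReceiveIssue(Issue{Author: "some-bot"})))
}
```

Because every wrapper satisfies the same interface, the caller only ever sees the outermost plugin.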


=> Kubernetes Aggregated Failures

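The front end and back end of testgrid, which collects metrics for the Jenkins tests and shows them as a grid, a very intuitive layout.
The front end is configurable: config.yaml lists all the tests.
For example, 1.6-1.7-kubectl-skew is one of the dashboards. It has several tabs, each of which is a test group, such as gce-1.6-1-7-cvm.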

The testgrid site is configured by [config.yaml].
Updates to the config are automatically tested and pushed to production.

Testgrid is composed of:

  • A list of test groups that contain results for a job over time.
  • A list of dashboards that are composed of tabs that display a test group
  • A list of dashboard groups of related dashboards.
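
The relationship between these three lists can be sketched as plain Go structs plus a consistency check. The struct and field names here are assumptions made for illustration and are not taken from config.yaml; the dashboard and test group names are the ones mentioned earlier in these notes.

```go
package main

import "fmt"

// Hypothetical in-memory model; the field names are not taken from config.yaml.
type TestGroup struct{ Name string }
type Tab struct {
	Name      string
	TestGroup string // name of the test group this tab displays
}
type Dashboard struct {
	Name string
	Tabs []Tab
}
type DashboardGroup struct {
	Name       string
	Dashboards []string // names of related dashboards
}
type Config struct {
	TestGroups      []TestGroup
	Dashboards      []Dashboard
	DashboardGroups []DashboardGroup
}

// validate reports every tab that points at an unknown test group.
func validate(c Config) []string {
	known := map[string]bool{}
	for _, g := range c.TestGroups {
		known[g.Name] = true
	}
	var problems []string
	for _, d := range c.Dashboards {
		for _, t := range d.Tabs {
			if !known[t.TestGroup] {
				problems = append(problems,
					fmt.Sprintf("%s/%s: unknown test group %q", d.Name, t.Name, t.TestGroup))
			}
		}
	}
	return problems
}

func main() {
	c := Config{
		TestGroups: []TestGroup{{Name: "gce-1.6-1-7-cvm"}},
		Dashboards: []Dashboard{{
			Name: "1.6-1.7-kubectl-skew",
			Tabs: []Tab{{Name: "gce", TestGroup: "gce-1.6-1-7-cvm"}},
		}},
	}
	fmt.Println(len(validate(c)))
}
```

A check of this kind is the sort of thing an automated config test could run before a change is pushed.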


Test scripts: Python scripts that the test jobs invoke.

Test jobs are composed of two things:
1) A scenario to test
2) Configuration options for the scenario.

Three example scenarios are:

  • Unit tests
  • Node e2e tests
  • e2e tests

Example configurations are:

  • Parallel tests on gce
  • Build all platforms

The assumption is that each scenario will be called a variety of times with
different configuration options. For example at the time of this writing there
are over 300 e2e jobs, each run with a slightly different set of options.



The source is whatever created the issue; for example, FlakyJobReporter files issues for flaky Jenkins jobs.

queue health

This app monitors the submit queue and charts its state over time.

It does this with two components:

  • a poller, which polls the current state of the queue and appends it to a
    historical log.
  • a grapher, which gets the historical log and renders it into charts.


This is the most interesting app in the repo: it can serve as a robot that handles GitHub commands, and its plugin design is worth studying.

  • cmd/hook is the most important piece. It is a stateless server that listens
    for GitHub webhooks and dispatches them to the appropriate handlers.
  • cmd/plank is the controller that manages jobs running in k8s pods.
  • cmd/jenkins-operator is the controller that manages jobs running in Jenkins.
  • cmd/sinker cleans up old jobs and pods.
  • cmd/splice regularly schedules batch jobs.
  • cmd/deck presents a nice view of recent jobs.
  • cmd/phony sends fake webhooks.
  • cmd/tot vends incrementing build numbers.
  • cmd/horologium starts periodic jobs when necessary.
  • cmd/mkpj creates ProwJobs.


The front end and back end of deck, which displays rece… prow jobs (a third party resource).


The core component: it listens for GitHub webhooks and dispatches them, mostly by handing them to plugins.

k8s Bot Commands

k8s-ci-robot and k8s-merge-robot understand several commands. They should all be uttered on their own line, and they are case-sensitive.

| Command | Implemented By | Who can run it | Description |
| --- | --- | --- | --- |
| /approve | mungegithub approvers | owners | approve all the files for which you are an approver |
| /approve no-issue | mungegithub approvers | owners | approve when a PR doesn't have an associated issue |
| /approve cancel | mungegithub approvers | owners | removes your approval on this pull-request |
| /area [label1 label2 ...] | prow label | anyone | adds an area/<> label(s) if it exists |
| /remove-area [label1 label2 ...] | prow label | anyone | removes an area/<> label(s) if it exists |
| /assign [@userA @userB @etc] | prow assign | anyone | Assigns specified people (or yourself if no one is specified). Target must be a kubernetes org member. |
| /unassign [@userA @userB @etc] | prow assign | anyone | Unassigns specified people (or yourself if no one is specified). Target must already be assigned. |
| /cc [@userA @userB @etc] | prow assign | anyone | Request review from specified people (or yourself if no one is specified). Target must be a kubernetes org member. |
| /uncc [@userA @userB @etc] | prow assign | anyone | Dismiss review request for specified people (or yourself if no one is specified). Target must already have had a review requested. |
| /close | prow close | authors and assignees | closes the issue/PR |
| /reopen | prow reopen | authors and assignees | reopens a closed issue/PR |
| /hold | prow hold | anyone | adds the do-not-merge/hold label |
| /hold cancel | prow hold | anyone | removes the do-not-merge/hold label |
| /joke | prow yuks | anyone | tells a bad joke, sometimes |
| /kind [label1 label2 ...] | prow label | anyone | adds a kind/<> label(s) if it exists |
| /remove-kind [label1 label2 ...] | prow label | anyone | removes a kind/<> label(s) if it exists |
| /lgtm | prow lgtm | assignees | adds the lgtm label |
| /lgtm cancel | prow lgtm | authors and assignees | removes the lgtm label |
| /ok-to-test | prow trigger | kubernetes org members | allows the PR author to /test all |
| /test all, /test <some-test-name> | prow trigger | anyone on trusted PRs | runs tests defined in config.yaml |
| /retest | prow trigger | anyone on trusted PRs | reruns failed tests |
| /priority [label1 label2 ...] | prow label | anyone | adds a priority/<> label(s) if it exists |
| /remove-priority [label1 label2 ...] | prow label | anyone | removes a priority/<> label(s) if it exists |
| /sig [label1 label2 ...] | prow label | anyone | adds a sig/<> label(s) if it exists |
| @kubernetes/sig-<some-github-team> | prow label | kubernetes org members | adds the corresponding sig label |
| /remove-sig [label1 label2 ...] | prow label | anyone | removes a sig/<> label(s) if it exists |
| /release-note | prow releasenote | authors and kubernetes org members | adds the release-note label |
| /release-note-action-required | prow releasenote | authors and kubernetes org members | adds the release-note-action-required label |
| /release-note-none | prow releasenote | authors and kubernetes org members | adds the release-note-none label |
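
The two rules stated above the table (commands sit on their own line and are case-sensitive) map naturally onto a multi-line regular expression. This is a minimal sketch for the /lgtm command only; `lgtmRe` and `parseLGTM` are hypothetical names, and the pattern is an assumption, not the one used by the real plugin.

```go
package main

import (
	"fmt"
	"regexp"
)

// lgtmRe matches "/lgtm" or "/lgtm cancel" alone on a line. (?m) makes ^ and $
// match at line boundaries; there is no (?i), so matching is case-sensitive.
var lgtmRe = regexp.MustCompile(`(?m)^/lgtm( cancel)?\s*$`)

// parseLGTM reports whether the comment contains the command and whether it
// is the cancel form. Only the first occurrence is considered.
func parseLGTM(body string) (found, cancel bool) {
	m := lgtmRe.FindStringSubmatch(body)
	if m == nil {
		return false, false
	}
	return true, m[1] != ""
}

func main() {
	fmt.Println(parseLGTM("Looks good.\n/lgtm"))
	fmt.Println(parseLGTM("/lgtm cancel"))
	fmt.Println(parseLGTM("please /lgtm this"))
	fmt.Println(parseLGTM("/LGTM"))
}
```

Anchoring on line boundaries is what keeps a command quoted in the middle of a sentence from triggering the bot.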
// An example of handing an event over to the plugins
func (s *Server) handleGenericComment(ce *github.GenericCommentEvent, log *logrus.Entry) {
	for p, h := range s.Plugins.GenericCommentHandlers(ce.Repo.Owner.Login, ce.Repo.Name) {
		go func(p string, h plugins.GenericCommentHandler) {
			pc := s.Plugins.PluginClient
			pc.Logger = log.WithField("plugin", p)
			pc.Config = s.ConfigAgent.Config()
			pc.PluginConfig = s.Plugins.Config()
			if err := h(pc, *ce); err != nil {
				pc.Logger.WithError(err).Error("Error handling GenericCommentEvent.")
			}
		}(p, h)
	}
}

// List of enabled plugins: one blank import (_ "...") per plugin


// Plugin types: a plugin basically implements one or more of the handlers below
genericCommentHandlers     = map[string]GenericCommentHandler{}
issueHandlers              = map[string]IssueHandler{}
issueCommentHandlers       = map[string]IssueCommentHandler{}
pullRequestHandlers        = map[string]PullRequestHandler{}
pushEventHandlers          = map[string]PushEventHandler{}
reviewEventHandlers        = map[string]ReviewEventHandler{}
reviewCommentEventHandlers = map[string]ReviewCommentEventHandler{}
statusEventHandlers        = map[string]StatusEventHandler{}

// Take lgtm as an example: it mainly checks whether the "lgtm (cancel)" is valid,
// assigns the issue, and adds or removes the lgtm label; if something is wrong
// it also creates a comment
func init() {
	plugins.RegisterIssueCommentHandler(pluginName, handleIssueComment)
	plugins.RegisterReviewEventHandler(pluginName, handleReview)
	plugins.RegisterReviewCommentEventHandler(pluginName, handleReviewComment)
}
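
The registration pattern above (a package-level map per handler type, filled from each plugin's `init`) can be sketched in a few lines. Everything here is a simplified stand-in: the event type, the handler signature and `handlersFor` are assumptions, and the real handlers also receive a plugin client.

```go
package main

import "fmt"

// IssueCommentEvent is a hypothetical, trimmed-down event payload.
type IssueCommentEvent struct{ Body string }

// IssueCommentHandler is what a plugin registers for issue comments.
type IssueCommentHandler func(IssueCommentEvent) error

var issueCommentHandlers = map[string]IssueCommentHandler{}

// RegisterIssueCommentHandler records a handler under the plugin's name.
func RegisterIssueCommentHandler(name string, h IssueCommentHandler) {
	issueCommentHandlers[name] = h
}

// handlersFor returns the handlers of the plugins enabled for a repo,
// skipping names that no plugin registered.
func handlersFor(enabled []string) map[string]IssueCommentHandler {
	hs := map[string]IssueCommentHandler{}
	for _, name := range enabled {
		if h, ok := issueCommentHandlers[name]; ok {
			hs[name] = h
		}
	}
	return hs
}

// In a real plugin package this would be that package's own init function,
// triggered by a blank import.
func init() {
	RegisterIssueCommentHandler("lgtm", func(e IssueCommentEvent) error {
		fmt.Println("lgtm plugin saw:", e.Body)
		return nil
	})
}

func main() {
	for name, h := range handlersFor([]string{"lgtm", "not-registered"}) {
		if err := h(IssueCommentEvent{Body: "/lgtm"}); err != nil {
			fmt.Println(name, "failed:", err)
		}
	}
}
```

With this layout, adding a plugin takes one blank import plus an entry in the per-repo list of enabled plugins.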


// Handles (un)assign xx and (un)cc; assigns the issue to xxx
func init() {
	plugins.RegisterIssueCommentHandler(pluginName, handleIssueComment)
	plugins.RegisterIssueHandler(pluginName, handleIssue)
	plugins.RegisterPullRequestHandler(pluginName, handlePullRequest)
}


// Checks out the code with git and runs golint on the changed code
func init() {
	plugins.RegisterIssueCommentHandler(pluginName, handleIC)
}




do-not-merge/hold label


The most heavily used plugin; it implements many of the commands, such as area, sig, kind...




// Handles /retest and /ok-to-test
func init() {
	plugins.RegisterIssueCommentHandler(pluginName, handleIssueComment)
	plugins.RegisterPullRequestHandler(pluginName, handlePullRequest)
	plugins.RegisterPushEventHandler(pluginName, handlePush)
}

// Handling of PRs
func handlePR(c client, trustedOrg string, pr github.PullRequestEvent) error {
	author := pr.PullRequest.User.Login
	switch pr.Action {
	case github.PullRequestActionOpened:
		// ismember -> buildAll
		// else welcome
	case github.PullRequestActionReopened, github.PullRequestActionSynchronize:
		// if trusted -> buildAll
	case github.PullRequestActionLabeled:
		// When a PR is LGTMd, if it is untrusted then build it once.
	}
	return nil
}

// buildAll -> CreateProwJob

// Handling of the retest comment:
// collect the failed entries from the statuses -> a presubmit structure -> prowjob
// See GitHub's status API
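
The retest step described in the last comment can be sketched as a filter over commit statuses. The `status` and `presubmit` types are simplified, hypothetical stand-ins; the state names ("error", "failure", "pending", "success") are the ones GitHub's status API uses, and treating both "failure" and "error" as failed is an assumption of this sketch.

```go
package main

import "fmt"

// status is a simplified commit status: a context name plus its state.
type status struct {
	Context string
	State   string
}

// presubmit is a hypothetical job definition that reports under a context.
type presubmit struct {
	Name    string
	Context string
}

// presubmitsToRerun returns the presubmits whose status context is
// currently failing.
func presubmitsToRerun(all []presubmit, statuses []status) []presubmit {
	failed := map[string]bool{}
	for _, s := range statuses {
		if s.State == "failure" || s.State == "error" {
			failed[s.Context] = true
		}
	}
	var rerun []presubmit
	for _, p := range all {
		if failed[p.Context] {
			rerun = append(rerun, p)
		}
	}
	return rerun
}

func main() {
	jobs := []presubmit{
		{Name: "unit", Context: "ci/unit"},
		{Name: "e2e", Context: "ci/e2e"},
	}
	statuses := []status{
		{Context: "ci/unit", State: "success"},
		{Context: "ci/e2e", State: "failure"},
	}
	for _, p := range presubmitsToRerun(jobs, statuses) {
		fmt.Println("rerun:", p.Name)
	}
}
```

Each selected presubmit would then be turned into a ProwJob, which is the last arrow in the comment above.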


starts periodic jobs when necessary.


The controller that manages jobs running in Jenkins; it basically syncs the state of each Jenkins job to the status of the corresponding prow job.


A command for creating ProwJobs by hand.


The controller that manages jobs running in k8s pods; it syncs the state of each pod to the status of the corresponding prow job.
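
A minimal sketch of such a sync step, mapping a pod phase to a job state. The pod phases are the standard Kubernetes ones; the job state names and the mapping itself are assumptions chosen for illustration and may not match what plank actually does.

```go
package main

import "fmt"

type podPhase string
type jobState string

// Standard Kubernetes pod phases.
const (
	podPending   podPhase = "Pending"
	podRunning   podPhase = "Running"
	podSucceeded podPhase = "Succeeded"
	podFailed    podPhase = "Failed"
	podUnknown   podPhase = "Unknown"
)

// Hypothetical job states for this sketch.
const (
	statePending jobState = "pending"
	stateSuccess jobState = "success"
	stateFailure jobState = "failure"
	stateError   jobState = "error"
)

// syncState derives the job state from the phase of its pod. Anything the
// controller cannot interpret is reported as an error (assumed policy).
func syncState(phase podPhase) jobState {
	switch phase {
	case podSucceeded:
		return stateSuccess
	case podFailed:
		return stateFailure
	case podPending, podRunning:
		return statePending
	default:
		return stateError
	}
}

func main() {
	for _, p := range []podPhase{podPending, podRunning, podSucceeded, podFailed, podUnknown} {
		fmt.Println(p, "->", syncState(p))
	}
}
```

A controller would run this mapping in a loop over all jobs and write back only the states that changed.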


cleans up old jobs and pods.


regularly schedules batch jobs.






Planter is a container + wrapper script for your bazel builds.
It will run a docker container as the current user that can run bazel builds
in your $PWD.


A deprecated system that predates prow.




Status Context Migrator
The migratestatus tool is a maintenance utility used to safely switch a repo from one status context to another.
For example, if there is a context named "CI tests" that needs to be moved to "CI tests v2", this tool can be used to copy every "CI tests" status into a "CI tests v2" status context and then mark every "CI tests" context as passing and retired. This ensures that no PRs are ever passing when they shouldn't be, and it doesn't block PRs that should be passing. The copy and retire phases can be run separately, or together at once in move mode.
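
The copy, retire and move phases can be sketched over a simple map from context name to state for one commit. The `statuses` type and the function names are hypothetical, and skipping the copy when the destination context already exists is an assumption of this sketch rather than documented behaviour of the tool.

```go
package main

import "fmt"

// statuses maps a context name to its state for one commit.
type statuses map[string]string

// copyStatus copies the state of context from into context to, unless to
// already has a status (assumed policy).
func copyStatus(s statuses, from, to string) {
	state, ok := s[from]
	if !ok {
		return
	}
	if _, exists := s[to]; !exists {
		s[to] = state
	}
}

// retire marks the old context as passing so it no longer blocks the PR.
func retire(s statuses, old string) {
	if _, ok := s[old]; ok {
		s[old] = "success"
	}
}

// move runs the copy phase followed by the retire phase.
func move(s statuses, from, to string) {
	copyStatus(s, from, to)
	retire(s, from)
}

func main() {
	s := statuses{"CI tests": "failure"}
	move(s, "CI tests", "CI tests v2")
	fmt.Println(s["CI tests"], s["CI tests v2"])
}
```

Copying before retiring is what keeps a failing PR failing: the failure is carried over to the new context before the old one is marked as passing.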


LogExporter is a tool that runs post-test on our kubernetes test clusters.
It does the job of computing the set of logfiles to be exported (based on the
node type (master/node), cloud provider, and the node's system services), and
then actually exports them to the GCS path provided to it.





Kubetest is the interface for launching and running e2e tests.



Kubernetes Extract Tests/Transform/Load Engine

This collects test results scattered across a variety of GCS buckets,
stores them in a local SQLite database, and outputs newline-delimited JSON files
for import into BigQuery.


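Deprecated; see …

Jenkins jobs are no longer created directly; prow jobs are used instead.

The gubernator front end, probably the most important front end in test-infra. The links in status checks, such as …, also come from here.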


