Data engineers are usually more familiar with development tools like Git than data analysts, data scientists and ML engineers are. Over the last couple of years, as more non-engineering jobs have come to involve writing code, adoption of source control systems like Git has risen. Despite that rise, many of the new adopters are not yet proficient with it.

Non-engineering teams often run into trouble with Git when their employer doesn’t pay enough attention to training or when they don’t get enough support from their engineering counterparts. The everyone-on-their-own model is not sustainable. Training is a very important part of a team’s development and growth.

Non-engineering teams have a different mindset when it comes to writing code. I have tried to summarise the ideas in another article. Here, we’ll talk not about the mistakes you can make by running the wrong Git command but about more fundamental ones, like working directly on the master branch, making very large commits and so on.

When master is your only branch

If you are part of a team and the only branch you have is master, you’ve essentially missed the whole concept behind Git (or version control systems in general). In this case, everyone commits to a local master branch and then pushes to the remote master, only to find a zillion conflicts. Avoid that and use branching; that’s where the fun lies.

knittl wrote a great summary on Stack Overflow of why you should not use the master branch for development:


the master branch should represent the ‘stable’ history of your code. use branches to experiment with new features, implement them, and when they have matured enough you can merge them back to master.

that way code in master will almost always build without problems, and can be mostly used directly for releases.
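
One way to keep day-to-day work off master is to have a small check refuse direct commits to it. The following is only a minimal sketch, not a standard tool: the names `PROTECTED_BRANCHES`, `current_branch`, `is_protected` and `guard_message` are hypothetical, and the set of protected branch names is an assumption you would adapt to your own repository. The only Git call it relies on is `git rev-parse --abbrev-ref HEAD`, which prints the name of the checked-out branch.

```python
import subprocess
from typing import Optional

# Branches treated as "stable"; this set is an assumption for the sketch.
PROTECTED_BRANCHES = {"master", "main"}


def current_branch() -> str:
    """Ask Git for the name of the currently checked-out branch."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True,
        text=True,
        check=True,  # raise if this is not a Git repository
    )
    return result.stdout.strip()


def is_protected(branch: str) -> bool:
    """Return True when direct commits to this branch should be refused."""
    return branch in PROTECTED_BRANCHES


def guard_message(branch: str) -> Optional[str]:
    """Return a warning for protected branches, or None if the commit may proceed."""
    if is_protected(branch):
        return (
            f"You are on '{branch}'. Create a branch first, "
            "for example: git checkout -b feature/my-change"
        )
    return None
```

If you wanted to enforce this locally, a script saved as `.git/hooks/pre-commit` could call `guard_message(current_branch())`, print the message and exit with a non-zero status, which makes Git abort the commit.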


Not following a branching methodology

Once you have understood the basic concept of branches and are able to create branches and merge them into master, the next issue you’ll encounter is branch management. One of the most popular branching methodologies is gitflow. I have written about it here from a Data Engineer’s perspective. The same principles apply to anyone whose work involves writing SQL queries or something similar, such as Python scripts using pandas, numpy, scipy and so on.

Gitflow has three levels of branching: master, develop and feature. There are additional branch types beyond these three (release and hotfix branches), but you don’t have to get into them in the beginning. Read more about it here.
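
To make the three levels concrete, here is a rough sketch of where each kind of branch gets merged in the simplified model described above. The function name `merge_target` and the `feature/` naming prefix are assumptions made for illustration, and the sketch deliberately ignores the release and hotfix branches that full gitflow adds.

```python
from typing import Optional

# Assumed naming convention for feature branches in this sketch.
FEATURE_PREFIX = "feature/"


def merge_target(branch: str) -> Optional[str]:
    """Return the branch that `branch` is merged into in the simplified model.

    feature/* -> develop -> master; master is the top level and has no target.
    """
    if branch.startswith(FEATURE_PREFIX):
        return "develop"
    if branch == "develop":
        return "master"
    if branch == "master":
        return None
    raise ValueError(f"'{branch}' does not fit the three-level model")
```

For example, `merge_target("feature/add-sales-report")` returns `"develop"`, so new work never lands on master directly; it reaches master only once develop is considered stable.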

Committing a thousand files in a single commit

Technically, this isn’t a mistake; it is just a very bad practice. Apart from the initial commit in your repo, each commit should contain only changes that can be clearly described in a single-line commit message. “Committing all uncommitted changes” is not a good commit message for a thousand new files.

Commit small, commit often.


Go by this rule and you’ll be fine. The idea is to push the smallest complete unit of work. There’s a great article covering this in depth.
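
As a rough illustration of what “small” might mean in practice, the sketch below flags a commit that looks too large and groups the changed files by top-level folder to suggest how it could be split. The helper names and the thresholds (20 files, 72 characters for the subject line) are arbitrary assumptions, not Git rules. The list of paths is assumed to come from something like `git diff --cached --name-only`, which prints the staged file names.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List

# Arbitrary thresholds chosen for this sketch; tune them for your team.
MAX_FILES = 20
MAX_SUBJECT_LENGTH = 72


def group_by_top_level(paths: List[str]) -> Dict[str, List[str]]:
    """Group changed files by top-level folder as candidate smaller commits."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        # Files in the repository root are grouped under ".".
        key = parts[0] if len(parts) > 1 else "."
        groups[key].append(path)
    return dict(groups)


def commit_warnings(subject: str, paths: List[str]) -> List[str]:
    """Return human-readable warnings for a commit that looks too big."""
    warnings = []
    if len(subject) > MAX_SUBJECT_LENGTH:
        warnings.append("Subject line is long; can the change be described more simply?")
    if len(paths) > MAX_FILES:
        warnings.append(f"{len(paths)} files changed; consider splitting the commit.")
    return warnings
```

Grouping by folder is only one heuristic. The real test is still whether each commit can be described by a single clear line.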

Here are a couple of other great articles that I enjoyed and found useful:


  • Git Commit Best Practices — Perforce

  • Git Best Practices — Seth Robertson

Conclusion

Like other widely adopted technologies such as SQL, Python and JavaScript, Git is probably something you can’t do without if you have anything to do with writing code, in whatever capacity. You’ll definitely have to deal with version control systems, so it is best to spend some time understanding what Git is and how it works.

You probably won’t need the fancier and trickier Git commands initially; you’ll get by fine with just the basic ones. There are several articles like this which talk about the mistakes you can make by running the wrong Git commands. Go over some of these articles (1, 2, 3, 4, 5) and you should be sorted.


