Python Coding Guidelines

12/14/07
Written by Rob Knight for the Cogent project

Table of Contents

Why have coding guidelines?
What should I call my variables?
What are the naming conventions?
How do I organize my modules (source files)?
How should I write comments?
How should I format my code?
How should I write my unit tests?
Are there any handy Python idioms I should be using? (separate document)

Why have coding guidelines?

As project size increases, consistency increases in importance. This project will start with isolated tasks, but we will integrate the pieces into shared libraries as they mature. Unit testing and a consistent style are critical to having trusted code to integrate. Also, guesses about names and interfaces will be correct more often.

Good code is useful to have around. Code written to these standards should be useful for teaching purposes, and also to show potential employers during interviews. Most people are reluctant to show code samples: having good code that you've written and tested will put you well ahead of the crowd. Also, reusable components make it much easier to change requirements and perform new analyses.

What should I call my variables?

Choose the name that people will most likely guess. Make it descriptive, but not too long: curr_record is better than c, or curr, or current_genbank_record_from_database. Part of the reason for having coding guidelines is so that everyone is more likely to guess the same way. Who knows: in a few months, the person doing the guessing might be you.

Good names are hard to find. Don't be afraid to change names except when they are part of interfaces that other people are also using. It may take some time working with the code to come up with reasonable names for everything: if you have unit tests, it's easy to change names safely, especially with global search and replace.

Use singular names for individual things, plural names for collections. For example, you'd expect self.Name to hold something like a single string, but self.Names to hold something that you could loop through like a list or dict. Sometimes the decision can be tricky: is self.Index an int holding a position, or a dict holding records keyed by name for easy lookup? If you find yourself wondering these things, the name should probably be changed to avoid the problem: try self.Position or self.LookUp.

Don't make the type part of the name. You might want to change the implementation later. Use Records rather than RecordDict or RecordList, etc. Don't use Hungarian Notation either (i.e. where you prefix the name with the type).

Make the name as precise as possible. If the variable is the name of the input file, call it infile_name, not input or file (which you shouldn't use anyway, since they shadow Python built-in names), and not infile (because that looks like it should be a file object, not just its name).

Use result to store the value that will be returned from a method or function. Use data for input in cases where the function or method acts on arbitrary data (e.g. sequence data, or a list of numbers, etc.) unless a more descriptive name is appropriate.
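A minimal sketch of the data/result convention. The function normalize is hypothetical (it is not part of the original guidelines); it simply acts on arbitrary numeric input called data and builds its return value in result.

```python
def normalize(data):
    """Returns list of data scaled so that the items sum to 1.

    data: sequence of numbers whose sum is nonzero.
    """
    total = float(sum(data))
    # result holds exactly what the function will return
    result = [item / total for item in data]
    return result
```

For example, normalize([1, 1, 2]) returns [0.25, 0.25, 0.5].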

One-letter variable names should only occur in math functions or as loop iterators with limited scope. Limited scope covers things like for k in keys: print(k), where k survives only a line or two. Loop iterators should refer to the variable that they're looping through: for k in keys, i in items, or for key in keys, item in items. If the loop is long or there are several 1-letter variables active in the same scope, rename them.

Limit your use of abbreviations. A few well-known abbreviations are OK, but you don't want to come back to your code in 6 months and have to figure out what sptxck2 is. It's worth it to spend the extra time typing species_taxon_check_2, but that's still a horrible name: what's check number 1? Far better to go with something like taxon_is_species_rank that needs no explanation, especially if the variable is only used once or twice.

Acceptable abbreviations. The following abbreviations can be considered well-known and used with impunity:

full	abbreviated

alignment	aln
archaeal	arch
auxiliary	aux
bacterial	bact
citation	cite
current	curr
database	db
dictionary	dict
directory	dir
end of file	eof
eukaryotic	euk
frequency	freq
expected	exp
index	idx
input	in
maximum	max
minimum	min
mitochondrial	mt
number	num
observed	obs
original	orig
output	out
phylogeny	phylo
previous	prev
protein	prot
record	rec
reference	ref
sequence	seq
standard deviation	stdev
statistics	stats
string	str
structure	struct
temporary	temp
taxonomic	tax
variance	var

Always use from module import Name, Name2, Name3... syntax instead of import module or from module import *. This is more efficient, reduces typing in the rest of the code, and makes it much easier to see name collisions and to replace implementations.

What are the naming conventions?

Summary of Naming Conventions
Type	Convention	Example
function	action_with_underscores	find_all
variable	noun_with_underscores	curr_index
constant	NOUN_ALL_CAPS	ALLOWED_RNA_PAIRS
class	MixedCaseNoun	RnaSequence
public property	MixedCaseNoun	IsPaired
private property	_noun_with_leading_underscore	_is_updated
public method	mixedCaseExceptFirstWordVerb	stripDegenerate
private method	_verb_with_leading_underscore	_check_if_paired
really private data	__two_leading_underscores	__delegator_object_ref
parameters that match properties	SameAsProperty	def __init__(self, data, Alphabet=None)
factory function	MixedCase	InverseDict
module	lowercase_with_underscores	unit_test
global variables	gMixedCaseWithLeadingG	no examples in evo - should be rare!

It is important to follow the naming conventions because they make it much easier to guess what a name refers to. In particular, it should be easy to guess what scope a name is defined in, what it refers to, whether it's OK to change its value, and whether its referent is callable. The following rules provide these distinctions.

lowercase_with_underscores for modules and internal variables (including function/method parameters). Exception: in __init__, any parameters that will be used to initialize properties of the object should have exactly the same spelling, including case, as the property. This lets you use a dict with the right field names as **kwargs to initialize the data easily.

MixedCase for classes and public properties, and for factory functions that act like additional constructors for a class.

mixedCaseExceptFirstWord for public methods and functions.

_lowercase_with_leading_underscore for private functions, methods, and properties.

__lowercase_with_two_leading_underscores for private properties and functions that must not be overwritten by a subclass.

CAPS_WITH_UNDERSCORES for named constants.

gMixedCase (i.e. mixed case prefixed with 'g') for globals. Globals should be used extremely rarely and with caution, even if you sneak them in using the Singleton pattern or some similar system.

Underscores can be left out if the words read OK run together. infile and outfile rather than in_file and out_file; infile_name and outfile_name rather than in_file_name and out_file_name or infilename and outfilename (getting too long to read effortlessly).
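A minimal sketch that puts several of these conventions in one place. RnaSequence, its methods, and the factory function RnaFromRecord are hypothetical, loosely built from the examples in the table; stripDegenerate here simply drops every character other than A, C, G and U, which is an assumption made for illustration. The factory function shows the **kwargs trick that works because the __init__ parameters are spelled exactly like the properties.

```python
ALLOWED_RNA_PAIRS = frozenset([('A', 'U'), ('U', 'A'), ('G', 'C'),
                               ('C', 'G'), ('G', 'U'), ('U', 'G')])


class RnaSequence(object):
    """Holds RNA sequence data, with an optional Name and Alphabet."""

    def __init__(self, data, Name='', Alphabet=None):
        """Returns new RnaSequence. Name and Alphabet become properties."""
        self.Name = Name              # public property: MixedCase
        self.Alphabet = Alphabet      # parameter spelled like the property
        self._data = str(data)        # private property: leading underscore
        self.__history = []           # mangled to self._RnaSequence__history

    def stripDegenerate(self):
        """Returns new RnaSequence with all characters except ACGU removed."""
        kept = ''.join(base for base in self._data if base in 'ACGU')
        return RnaSequence(kept, Name=self.Name, Alphabet=self.Alphabet)

    def _check_if_paired(self, first, second):
        """Returns True if first and second can form a base pair."""
        return (first, second) in ALLOWED_RNA_PAIRS


def RnaFromRecord(data, record):
    """Factory function: builds an RnaSequence from a dict of properties."""
    # Works only because the dict keys match the __init__ parameter names
    return RnaSequence(data, **record)
```

For example, RnaFromRecord('ACGNU', {'Name': 's1', 'Alphabet': 'ACGU'}) gives an object whose Name is 's1'. The double-underscore attribute is stored under the mangled name _RnaSequence__history, which is why a subclass cannot accidentally overwrite it.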

How do I organize my modules (source files)?

The first line of each file should be #!/usr/bin/env python. This makes it possible to run the file as a script invoking the interpreter implicitly, e.g. in a CGI context.

Next should be the docstring with a description. If the description is long, the first line should be a short summary that makes sense on its own, separated from the rest by a blank line.

All code, including import statements, should follow the docstring. Otherwise, the docstring will not be recognized by the interpreter, and you will not have access to it in interactive sessions (i.e. through obj.__doc__) or when generating documentation with automated tools.
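The same rule applies to functions and classes as to modules: only a string literal that is the first statement becomes __doc__. A small sketch with two hypothetical functions:

```python
def documented():
    """Returns 1. This string is the docstring."""
    return 1


def not_documented():
    value = 1
    """This string comes after code, so it is not a docstring."""
    return value
```

Here documented.__doc__ is the string, while not_documented.__doc__ is None. The shebang line does not count, because it is a comment rather than a statement.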

Import built-in modules first, followed by third-party modules, followed by any changes to the path and your own modules. In particular, additions to the path and names of your modules are likely to change rapidly: keeping them in one place makes them easier to find.

Next should be authorship information. This information should follow this format:

__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
               "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

Status should typically be one of "Prototype", "Development", or "Production". __maintainer__ should be the person who will fix bugs and make improvements if imported. __credits__ differs from __author__ in that __credits__ includes people who reported bug fixes, made suggestions, etc. but did not actually write the code.

Example of module structure:

#!/usr/bin/env python

"""Provides NumberList and FrequencyDistribution, classes for statistics.

NumberList holds a sequence of numbers, and defines several statistical
operations (mean, stdev, etc.) FrequencyDistribution holds a mapping from
items (not necessarily numbers) to counts, and defines operations such as
Shannon entropy and frequency normalization.
"""

from math import sqrt, log, e
from random import choice, random
from Utils import indices

__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
               "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"


class NumberList(list):
    pass    #much code deleted


class FrequencyDistribution(dict):
    pass    #much code deleted


if __name__ == '__main__':
    #code to execute if called from command-line
    pass    #do nothing - code deleted
#use this either for a simple example of how to use the module,
#or when the module can meaningfully be called as a script.

How should I write comments?

Always update the comments when the code changes. Incorrect comments are far worse than no comments, since they are actively misleading.

Comments should say more than the code itself. Examine your comments carefully: they may indicate that you'd be better off rewriting your code (in particular, renaming your variables and getting rid of the comment). Above all, don't scatter magic numbers and other constants that have to be explained through your code. It's far better to use variables whose names are self-documenting, especially if you use the same constant more than once. Also, think about making constants into class or instance data, since it's all too common for 'constants' to need to change or to be needed in several methods.

Wrong: win_size -= 20    # decrement win_size by 20
OK: win_size -= 20    # leave space for the scroll bar
Right:
    self._scroll_bar_size = 20
    win_size -= self._scroll_bar_size

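Use comments starting with #, not strings, inside blocks of code. Python ignores real comments, whereas a bare string inside a block is an expression statement: readers and tools can mistake it for a docstring, and you are relying on the interpreter to discard it rather than on the language to ignore it.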

Start each method, class and function with a docstring using triple double quotes ("""). The docstring should start with a 1-line description that makes sense by itself (many automated documentation tools, and IDEs, display this line). This should be followed by a blank line, followed by descriptions of the parameters (if any). Finally, add any more detailed information, such as a longer description, notes about the algorithm, detailed notes about the parameters, etc. If there is a usage example, it should appear at the end. Make sure any descriptions of parameters have the correct spelling, case, etc.

For example:

def __init__(self, data, Name='', Alphabet=None):
    """Returns new Sequence object with specified data, Name, Alphabet.

    data: The sequence data. Should be a sequence of characters.

    Name: Arbitrary label for the sequence. Should be string-like.

    Alphabet: Set of allowed characters. Should support 'for x in y'
        syntax. None by default.

    Note: if Alphabet is None, performs no validation.
    """

Always update the docstring when the code changes. Like outdated comments, outdated docstrings can waste a lot of time.

How should I format my code?

Use 4 spaces for indentation. Do not use tabs (set your editor to convert tabs to spaces). Tabs display differently across editors and platforms, and mixing tabs with spaces can cause indentation errors. Several people have been bitten by this already.

Lines must not be longer than 79 characters. Long lines are inconvenient in some editors, can be confusing when broken up for printing, and make code snippets difficult to email (especially if your email client or the recipient's 'helpfully' wraps lines automatically). Use \ for line continuation. Note that there cannot be any whitespace after the \.
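A short sketch of why whitespace after the backslash matters. The variable names are hypothetical; compile() is used so that the deliberately broken source raises SyntaxError without breaking the rest of the file.

```python
# A backslash as the very last character of a line continues the statement.
total_length = 10 + \
    20

# The same source with a space after the backslash is a syntax error.
bad_source = "total_length = 10 + \\ \n    20\n"
try:
    compile(bad_source, '<example>', 'exec')
    trailing_space_ok = True
except SyntaxError:
    trailing_space_ok = False
```

After this runs, total_length is 30 and trailing_space_ok is False.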

Blank lines should be used to highlight class and method definitions. Separate class definitions by two blank lines. Separate methods by one blank line.

Be consistent with the use of whitespace around operators. Inconsistent whitespace makes it harder to see at a glance what is grouped together.

Good: ((a+b)*(c+d))
OK: ((a + b) * (c + d))
Bad: ( (a+ b) *(c +d ))

Don't put whitespace immediately inside parentheses or brackets, or between a name and the bracket that indexes or slices it. Whitespace here makes it harder to see what's associated.

Good: (a+b), d[k]
Bad: ( a+b ), d [k], d[ k]

How should I write my unit tests?

Every line of code must be tested. For scientific work, bugs don't just mean unhappy users who you'll never actually meet: they mean retracted publications and ended careers. It is critical that your code be fully tested before you draw conclusions from results it produces.

Tests are the opportunity to invent the interfaces you wish you had. Write the test for a method before you write the method: often, this helps you figure out what you would want to call it and what parameters it should take. Think of the tests as a story about what you wish the interface looked like. It's OK to write the tests a few methods at a time, and to change them as your ideas about the interface change. However, you shouldn't change them once you've told other people what the interface is.

Never treat prototypes as production code. It's fine to write prototype code without tests to try things out, but when you've figured out the algorithm and interfaces you must rewrite it with tests to consider it finished. Often, this helps you decide what interfaces and functionality you actually need and what you can get rid of.

Write a little at a time. For production code, write a couple of tests, then a couple of methods, then a couple more tests, then a couple more methods, then maybe change some of the names or generalize some of the functionality. If you have a huge amount of code where 'all you have to do is write the tests', you're probably closer to 30% done than 90%. Testing vastly reduces the time spent debugging, since whatever went wrong has to be in the code you wrote since you last ran the test suite.

Always run the test suite when you change anything. Even if a change seems trivial, it will only take a couple of seconds to run the tests and then you'll be sure. This can eliminate long and frustrating debugging sessions where the change turned out to have been made long ago, but didn't seem significant at the time.

Use the unittest framework with tests in a separate file for each module. Name the test file test_module_name.py. Keeping the tests separate from the code reduces the temptation to change the tests when the code doesn't work, and makes it easy to verify that a completely new implementation presents the same interface (behaves the same) as the old.

Use evo.unit_test if you are doing anything with floating point numbers or permutations (use assertFloatEqual). Do not try to compare floating point numbers using assertEqual if you value your sanity. assertFloatEqualAbs and assertFloatEqualRel can specifically test for absolute and relative differences if the default behavior is not giving you what you want. Similarly, assertEqualItems, assertSameItems, etc. can be useful when testing permutations.
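evo.unit_test is not assumed to be available here, so this sketch substitutes the standard library: unittest's assertAlmostEqual with delta for an absolute tolerance, and math.isclose (Python 3.5+) with rel_tol for a relative one, roughly playing the roles of assertFloatEqualAbs and assertFloatEqualRel. The test class is hypothetical.

```python
from math import isclose
from unittest import TestCase


class FloatComparisonTests(TestCase):
    """Why exact equality is the wrong test for floating point results."""

    def test_exact_equality_fails(self):
        """0.1 + 0.2 should not compare exactly equal to 0.3"""
        self.assertNotEqual(0.1 + 0.2, 0.3)

    def test_absolute_tolerance(self):
        """0.1 + 0.2 should equal 0.3 within an absolute tolerance"""
        self.assertAlmostEqual(0.1 + 0.2, 0.3, delta=1e-12)

    def test_relative_tolerance(self):
        """1e10 + 1 should equal 1e10 within a relative tolerance of 1e-9"""
        self.assertTrue(isclose(1e10 + 1, 1e10, rel_tol=1e-9))
```

An absolute tolerance suits values near zero; a relative tolerance suits values whose magnitude varies widely.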

Test the interface of each class in your code by defining at least one TestCase with the name ClassNameTests. This should contain tests for everything in the public interface.

If the class is complicated, you may want to define additional TestCase classes named ClassNameTests_test_type. These might subclass ClassNameTests in order to share setUp methods, etc.

Tests of private methods should be in a separate TestCase called ClassNameTests_private. Private methods may change if you change the implementation. It is not required that test cases for private methods pass when you change things (that's why they're private, after all), though it is often useful to have these tests for debugging.

Test all the methods in your class. You should assume that any method you haven't tested has bugs. The convention for naming tests is test_method_name. Any leading and trailing underscores on the method name can be ignored for the purposes of the test; however, all tests must start with the literal substring test for unittest to find them. If the method is particularly complex, or has several discretely different cases you need to check, use test_method_name_suffix, e.g. test_init_empty, test_init_single, test_init_wrong_type, etc. for testing __init__.
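A quick way to see the naming rule in action is to ask unittest which methods it will collect. The NumberListTests class below is a hypothetical stand-in with empty bodies; TestLoader.getTestCaseNames returns the sorted names that start with the 'test' prefix.

```python
from unittest import TestCase, TestLoader


class NumberListTests(TestCase):
    """Tests of a hypothetical NumberList class (bodies elided)."""

    def test_init_empty(self):
        """NumberList.__init__ should accept an empty sequence"""

    def test_init_wrong_type(self):
        """NumberList.__init__ should raise error on values that fail float()"""

    def test_mean(self):
        """NumberList.mean should return the arithmetic mean"""

    def check_helper(self):
        """Not collected: the name does not start with 'test'."""


collected = TestLoader().getTestCaseNames(NumberListTests)
```

collected is ['test_init_empty', 'test_init_wrong_type', 'test_mean']; check_helper is silently skipped, which is exactly the mistake the prefix rule guards against.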

Write good docstrings for all your test methods. When you run the test with the -v command-line switch for verbose output, the docstring for each test will be printed along with ...OK or ...FAILED on a single line. It is thus important that your docstring is short and descriptive, and makes sense in this context.
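The text that unittest's verbose runner shows comes from TestCase.shortDescription(), which returns the first line of the test method's docstring. A minimal sketch with a hypothetical test:

```python
from unittest import TestCase


class NumberListTests(TestCase):
    """Tests of a hypothetical NumberList class."""

    def test_var_small(self):
        """NumberList.var should raise ValueError on empty or 1-item list

        Only the first line appears in verbose (-v) output.
        """


description = NumberListTests('test_var_small').shortDescription()
```

description is "NumberList.var should raise ValueError on empty or 1-item list", so that first line has to carry the whole message.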

Good docstrings:
NumberList.var should raise ValueError on empty or 1-item list
NumberList.var should match values from R if list has >2 items
NumberList.__init__ should raise error on values that fail float()
FrequencyDistribution.var should match corresponding NumberList var

Bad docstrings:
var should calculate variance    #lacks class name, not descriptive
Check initialization of a NumberList    #doesn't say what's expected
Tests of the NumberList initialization.    #ditto

Module-level functions should be tested in their own TestCase, called modulenameTests. Even if these functions are simple, it's important to check that they work as advertised.

It is much more important to test several small cases that you can check by hand than a single large case that requires a calculator. Don't trust spreadsheets for numerical calculations -- use R instead!

Make sure you test all the edge cases: what happens when the input is None, or '', or 0, or negative? What happens at values that cause a conditional to go one way or the other? Does incorrect input raise the right exceptions? Can your code accept subclasses or superclasses of the types it expects? What happens with very large input?
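A sketch of what these edge-case checks can look like. The mean function and the MyList subclass are hypothetical, written only so the tests have something to exercise; the behaviour chosen (ValueError on None or empty input) is an assumption for illustration.

```python
from unittest import TestCase


def mean(data):
    """Returns arithmetic mean of data (hypothetical function under test)."""
    if not data:
        raise ValueError("mean() needs at least one item")
    return sum(data) / float(len(data))


class MyList(list):
    """Subclass used to check that mean() accepts list subclasses."""


class MeanTests(TestCase):
    """Edge-case tests for the hypothetical mean function."""

    def test_mean_bad_input(self):
        """mean should raise ValueError on None and on empty input"""
        self.assertRaises(ValueError, mean, [])
        self.assertRaises(ValueError, mean, '')
        self.assertRaises(ValueError, mean, None)

    def test_mean_zero_and_negative(self):
        """mean should handle zero and negative values"""
        self.assertEqual(mean([0]), 0)
        self.assertEqual(mean([-1, 1]), 0)
        self.assertEqual(mean([-2, -4]), -3)

    def test_mean_subclass(self):
        """mean should accept subclasses of list"""
        self.assertEqual(mean(MyList([1, 2, 3])), 2)

    def test_mean_large(self):
        """mean should handle large input"""
        self.assertEqual(mean(range(1000001)), 500000)
```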

To test permutations, check that the original and shuffled version are different, but that the sorted original and sorted shuffled version are the same. Make sure that you get different permutations on repeated runs and when starting from different points.
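A minimal sketch of these permutation checks, using the standard library's random.shuffle as the function under test. With 20 items, the chance that a correct shuffle returns the original order (or repeats another shuffle exactly) is 1/20!, about 4e-19, so these tests are not meaningfully flaky.

```python
import random
from unittest import TestCase


class ShuffleTests(TestCase):
    """Tests of random.shuffle as a permutation function."""

    def test_shuffle_changes_order(self):
        """random.shuffle should change order but keep the same items"""
        original = list(range(20))
        shuffled = original[:]
        random.shuffle(shuffled)
        # Can fail by chance only with probability 1/20! (about 4e-19)
        self.assertNotEqual(shuffled, original)
        self.assertEqual(sorted(shuffled), sorted(original))

    def test_shuffle_varies_between_runs(self):
        """random.shuffle should give different orders on repeated calls"""
        first = list(range(20))
        second = list(range(20))
        random.shuffle(first)
        random.shuffle(second)
        self.assertNotEqual(first, second)
```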

To test random choices, figure out how many of each choice you expect in a large sample (say, 1000 or a million) using the binomial distribution or its normal approximation. Run the test several times and check that you're within, say, 3 standard deviations of the mean.

Example of a unittest test module structure:

#!/usr/bin/env python

"""Tests NumberList and FrequencyDistribution, classes for statistics."""

from unittest import TestCase, main #use my unittestfp instead for floating point
from statistics import NumberList, FrequencyDistribution

__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
               "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"


class NumberListTests(TestCase):    #must remember to subclass TestCase
    """Tests of the NumberList class."""

    def setUp(self):
        """Define a few standard NumberLists."""
        self.Null = NumberList()            #test empty init
        self.Empty = NumberList([])         #test init with empty sequence
        self.Single = NumberList([5])       #single item
        self.Zero = NumberList([0])         #single, False item
        self.Three = NumberList([1,2,3])    #multiple items
        self.ZeroMean = NumberList([1,-1])  #items nonzero, mean zero
        self.ZeroVar = NumberList([1,1,1])  #items nonzero, mean nonzero,
                                            #variance zero
        #etc. These objects shared by all tests, and created new each time
        #a method starting with the string 'test' is called (i.e. the same
        #object does not persist between tests: rather, you get separate
        #copies).

    def test_mean_empty(self):
        """NumberList.mean() should raise ValueError on empty object"""
        for empty in (self.Null, self.Empty):
            self.assertRaises(ValueError, empty.mean)

    def test_mean_single(self):
        """NumberList.mean() should return item if only 1 item in list"""
        for single in (self.Single, self.Zero):
            self.assertEqual(single.mean(), single[0])

    #other tests of mean

    def test_var_failures(self):
        """NumberList.var() should raise ZeroDivisionError if <2 items"""
        for small in (self.Null, self.Empty, self.Single, self.Zero):
            self.assertRaises(ZeroDivisionError, small.var)

    #other tests of var

    #tests of other methods


class FrequencyDistributionTests(TestCase):
    pass    #much code deleted

#tests of other classes


if __name__ == '__main__':    #run tests if called from command-line
    main()

This article was reposted from sinojelly's 51CTO blog. Original link: http://blog.51cto.com/sinojelly/278154. For reprint permission, please contact the original author.
