Python packages for descriptive analysis: Patsy, a package for describing statistical models

http://patsy.readthedocs.io/en/latest/overview.html

What Patsy is for: handling factors in linear models, for example in analysis of variance (ANOVA)

You describe a model with a formula string, and Patsy takes care of building appropriate matrices. Furthermore, it:

Allows data transformations to be specified using arbitrary Python code: instead of x, we could have written log(x), (x > 0), or even log(x) if x > 1e-5 else log(1e-5),

Provides a range of convenient options for coding categorical variables, including automatic detection and removal of redundancies,

Knows how to apply ‘the same’ transformation used on original data to new data, even for tricky transformations like centering or standardization (critical if you want to use your model to make predictions),

Has an incremental mode to handle data sets which are too large to fit into memory at one time,

Provides a language for symbolic, human-readable specification of linear constraint matrices,

Has a thorough test suite (>97% statement coverage) and solid underlying theory, allowing it to correctly handle corner cases that even R gets wrong, and

Features a simple API for integration into statistical packages.
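
A minimal sketch of three of the features listed above: a transformation written as ordinary Python code, automatic coding of a categorical variable, and reuse of a stateful transform (center) on new data. The column names x and group and all data values are made up for illustration.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices

# Hypothetical toy data; "x" is numeric, "group" is a list of strings.
train = {
    "x": np.array([1.0, 2.0, 3.0, 4.0]),
    "group": ["a", "b", "a", "b"],
}

# np.log(x) is plain Python evaluated by Patsy; center() is a stateful
# transform that remembers the training mean; "group" holds strings, so
# Patsy treats it as categorical and drops the redundant level.
design = dmatrix("np.log(x) + center(x) + group", train)
print(design.design_info.column_names)

# Reusing the stored design_info applies the training mean (2.5) to new
# data instead of re-centering the new data around its own mean.
new = {"x": np.array([10.0]), "group": ["a"]}
(new_design,) = build_design_matrices([design.design_info], new)
print(np.asarray(new_design))
```

The center(x) column of new_design is 10 - 2.5, because the mean was memorized from the training data.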

What Patsy cannot do: fit or analyze models. It only provides a high-level interface for describing statistical models.

What Patsy won’t do is, well, statistics — it just lets you describe models in general terms. It doesn’t know or care whether you ultimately want to do linear regression, time-series analysis, or fit a forest of decision trees, and it certainly won’t do any of those things for you — it just gives a high-level language for describing which factors you want your underlying model to take into account. It’s not suitable for implementing arbitrary non-linear models from scratch; for that, you’ll be better off with something like Theano, SymPy, or just plain Python. But if you’re using a statistical package that requires you to provide a raw model matrix, then you can use Patsy to painlessly construct that model matrix; and if you’re the author of a statistics package, then I hope you’ll consider integrating Patsy as part of your front-end.
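
As a rough illustration of this division of labor, the sketch below lets Patsy build the matrices and leaves the fitting to NumPy's least-squares solver. The data is invented (generated from y = 1 + 2x) so that the result is easy to check.

```python
import numpy as np
from patsy import dmatrices

# Hypothetical data that follows y = 1 + 2*x exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
data = {"x": x, "y": 1.0 + 2.0 * x}

# Patsy only builds the response and design matrices...
y_mat, X_mat = dmatrices("y ~ x", data)

# ...the fitting itself is done by whatever numerical code you choose.
coef, *_ = np.linalg.lstsq(np.asarray(X_mat), np.asarray(y_mat), rcond=None)
print(dict(zip(X_mat.design_info.column_names, coef.ravel())))
```

Any other solver or statistics package that accepts a raw model matrix could take the place of np.linalg.lstsq here.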

Patsy’s goal is to become the standard high-level interface to describing statistical models in Python, regardless of what particular model or library is being used underneath.

Patsy formulas can call user-defined functions
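
A small sketch of a user-defined function inside a formula. The function name double and the data are hypothetical; the point is only that Patsy evaluates the formula in the calling namespace, so functions defined there can be used.

```python
import numpy as np
from patsy import dmatrix


def double(values):
    """Hypothetical user-defined transform: multiply every entry by two."""
    return 2 * values


data = {"x": np.array([1.0, 2.0, 3.0])}

# double() is looked up in the namespace where dmatrix() is called.
custom = dmatrix("double(x)", data)
print(custom.design_info.column_names)
print(np.asarray(custom))
```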

I() makes + mean arithmetic addition

Arithmetic transformations are also possible, but you’ll need to “protect” them by wrapping them in I(), so that Patsy knows that you really do want + to mean addition:

In [23]: dmatrix("I(x1 + x2)", data)  # compare to "x1 + x2"
Out[23]: DesignMatrix with shape (8, 2)
  Intercept  I(x1 + x2)
          1     1.66083
          1     0.81076
          1     1.12278
          1     3.69517
          1     2.62860
          1    -0.85560
          1     1.39395
          1     0.18232
  Terms:
    'Intercept' (column 0)
    'I(x1 + x2)' (column 1)

Note that Patsy converts the data to its standard representation only after evaluating the transformations in the formula. With NumPy arrays, + therefore adds element-wise, but with plain Python lists it concatenates:

In [24]: dmatrix("I(x1 + x2)", {"x1": np.array([1, 2, 3]), "x2": np.array([4, 5, 6])})
Out[24]: DesignMatrix with shape (3, 2)
  Intercept  I(x1 + x2)
          1           5
          1           7
          1           9
  Terms:
    'Intercept' (column 0)
    'I(x1 + x2)' (column 1)

In [25]: dmatrix("I(x1 + x2)", {"x1": [1, 2, 3], "x2": [4, 5, 6]})
Out[25]: DesignMatrix with shape (6, 2)
  Intercept  I(x1 + x2)
          1           1
          1           2
          1           3
          1           4
          1           5
          1           6
  Terms:
    'Intercept' (column 0)
    'I(x1 + x2)' (column 1)
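
One possible way to guard against the six-row result in In [25] is to convert every column to a NumPy array before handing the data to dmatrix. The helper name as_array_data is hypothetical and not part of Patsy; this is only a sketch.

```python
import numpy as np
from patsy import dmatrix


def as_array_data(data):
    """Hypothetical helper: convert every column to a NumPy array."""
    return {name: np.asarray(values) for name, values in data.items()}


list_data = {"x1": [1, 2, 3], "x2": [4, 5, 6]}

# list + list concatenates, so the matrix gets six rows
concatenated = dmatrix("I(x1 + x2)", list_data)
# array + array adds element-wise, so the matrix keeps three rows
summed = dmatrix("I(x1 + x2)", as_array_data(list_data))

print(concatenated.shape, summed.shape)
```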

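# ---------------------------------------------------------------
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

def anova_statsmodels():
    '''Run a one-way ANOVA with statsmodels and return the F statistic.'''
    # Get the data
    data = pd.read_csv('galton.csv')
    # sex is a categorical variable, so it is wrapped in C()
    anova_results = anova_lm(ols('height ~ C(sex)', data).fit())
    print('\nANOVA with "statsmodels" ------------------------------')
    print(anova_results)
    # The first row of the F column is the F statistic for C(sex)
    return anova_results['F'].iloc[0]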
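
To see what the C(sex) term in the formula above asks Patsy to build, the sketch below codes a small categorical column on its own. The values "M" and "F" are an assumed stand-in; the actual contents of galton.csv are not shown in this post.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical stand-in for the "sex" column of galton.csv.
sample = {"sex": ["M", "F", "F", "M"]}

# C() marks the variable as categorical. With an intercept present, Patsy
# uses treatment coding and drops the first level ("F") as the reference.
coded = dmatrix("C(sex)", sample)
print(coded.design_info.column_names)
print(np.asarray(coded))
```

The single indicator column is what lets the linear model compare the mean height of the two groups.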
