
The Last Line of Defense in Clinical Research (Part 4): Implementing Propensity Score Matching (PSM) in Python

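Installment No. 25 walked through propensity score matching (PSM) in SPSS. That workflow is enough for a basic clinical study or a paper, but SPSS's limitations become obvious once you move on to 1:2 or 1:N matching with replacement. This installment therefore covers, step by step, how to run PSM in the scripting language Python.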

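Before running anything, Python needs numpy, scipy, pandas, scikit-learn, and the PSM package ctmatching, distributed as ctmatching-0.0.6-source.zip (md5). See installment No. 24 for the installation steps and details. The data file is again re78, the one used in installment No. 25.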

Open a Python session and start by importing the packages:

import pandas as pd

import numpy as np

from ctmatching import psm, load_re78

Load the re78 dataset:

control, treatment = load_re78()

The len function shows that the treatment group (treatment) has 185 observations and the control group (control) has 429.

len(control)

Out[78]: 429

len(treatment)

Out[79]: 185

Use the help function to read the documentation for psm: help(psm)

psm(control, treatment, use_col=None, stratify_order=None, independent=True, k=1)

Propensity score matching main function.

If you want to know the inside of the psm algorithm, check

:func:`stratified_matching`, :func:`non_stratified_matching`,

:func:`non_repeat_index_matching`, :func:`independent_index_matching`.

otherwise, just read the parameters' definition.

Suppose we have m1 control samples, m2 treatment samples. Sample is n-dimension vector.

:param control: control group sample data, m1 x n matrix. Example::

[[c1_1, c1_2, ..., c1_n], # c means control

[c2_1, c2_2, ..., c2_n],

...,

[cm1_1, cm1_2, ..., cm1_n],]

:type control: numpy.ndarray

:param treatment: treatment group sample data, m2 x n matrix. Example::

[[t1_1, t1_2, ..., t1_n], # t means treatment

[t2_1, t2_2, ..., t2_n],

...,

[tm1_1, tm1_2, ..., tm1_n],]

:type treatment: numpy.ndarray

:param use_col: (default None, use all) list of column index. Example::

[0, 1, 4, 6, 7, 9] # use first, second, fifth, ... columns

:type use_col: list or numpy.ndarray

:param stratify_order: (default None, use normal nearest neighbor)

list of list. Example::

# for input data has 6 columns

# first feature has highest priority

# [second, third, forth] features' has second highest priority by mean of euclidean distance

# fifth feature has third priority, ...

[[0], [1, 2, 3], [4], [5]]

:type stratify_order: list of list

:param independent: (default True), if True, same treatment sample could be matched to different control sample.

:type independent: Boolean

:param k: (default 1) Number of samples selected from control group.

:type k: int

:returns selected_control_index: all control sample been selected for

entire treatment group.

:returns selected_control_index_for_each_treatment: selected control sample for each treatment sample.

selected_control_index: selected control sample index. Example (k = 3)::

(m2 * k)-length array: [7, 120, 43, 54, 12, 98, ..., 71, 37, 14]

selected_control_index_for_each_treatment: selected control sample index for each treatment sample. Example (k = 3)::

# for treatment[0], we have control[7], control[120], control[43]

# matched by mean of stratification.

[[7, 120, 43],

[54, 12, 98],

...,

[71, 37, 14],]

:raises InputError: if the input parameters are not legal.

:raises NotEnoughControlSampleError: if don't have sufficient data for independent index matching.

Dizzy yet? Let's keep it simple.

selected_control, selected_control_each_treatment = psm(

control, treatment, use_col=[1,2,3,4,5,6], stratify_order=None,

independent=True, k=2)

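psm takes six key arguments: the control group (control), the treatment group (treatment), the columns to match on (use_col), the matching priority (stratify_order), independent, and k. With independent=True each treatment sample is matched on its own, so one control sample can be picked for several treatment samples, i.e. multiple matching with replacement (in the results below, PSID27 and PSID6 are matched to both NSW183 and NSW184). k is the number of control samples matched to each treatment sample; it is set to 2 here, which gives 1:2 matching.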
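To make use_col, independent=True and k=2 more concrete, here is a minimal sketch of k-nearest-neighbor matching with replacement in plain Python. It is not ctmatching's implementation: the helper knn_match_with_replacement is hypothetical, the distance is an unscaled Euclidean distance (an assumption; the library may scale or weight columns differently), and the control pool is limited to the four control rows that appear in the results printed at the end of this article, not the full 429.

```python
import math

# Rows copied from the results printed at the end of this article.
treatment = [
    ["NSW183", 1, 35, 9, 1, 0, 1, 1, 13602.43, 13830.64, 12803.97],
    ["NSW184", 1, 35, 8, 1, 0, 1, 1, 13732.07, 17976.15, 3786.628],
    ["NSW185", 1, 33, 11, 1, 0, 1, 1, 14660.71, 25142.24, 4181.942],
]
# Only the four control rows shown in those results (assumption: a toy pool).
control = [
    ["PSID27", 0, 36, 9, 1, 0, 1, 1, 13256.4, 8457.484, 0.0],
    ["PSID6", 0, 37, 9, 1, 0, 1, 1, 13685.48, 12756.05, 17833.2],
    ["PSID380", 0, 34, 12, 1, 0, 1, 0, 0.0, 0.0, 18716.88],
    ["PSID293", 0, 31, 12, 1, 0, 1, 0, 0.0, 42.96774, 11023.84],
]


def knn_match_with_replacement(control, treatment, use_col, k):
    """Hypothetical helper: for every treated row, return the indices of the
    k closest control rows (Euclidean distance on the use_col columns).
    Each treated row is handled independently, so a control can be reused."""
    matches = []
    for t in treatment:
        distances = []
        for j, c in enumerate(control):
            d = math.sqrt(sum((t[col] - c[col]) ** 2 for col in use_col))
            distances.append((d, j))
        distances.sort()  # closest control first
        matches.append([j for _, j in distances[:k]])
    return matches


matched = knn_match_with_replacement(
    control, treatment, use_col=[1, 2, 3, 4, 5, 6], k=2)

for t, idx in zip(treatment, matched):
    print(t[0], "->", [control[j][0] for j in idx])
```

On this toy pool the sketch picks the same pairs that psm reports for these three treated records, which fits the "normal nearest neighbor" default mentioned in the help text, but it should not be read as a description of the library's internals.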

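selected_control holds the indices of all control samples selected for the whole treatment group; selected_control_each_treatment holds, for each treatment sample, the indices of the control samples matched to it.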
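The relation between the two return values can be sketched as follows. This is an illustration only: it uses the record IDs from the results printed at the end of this article in place of the integer indices that psm actually returns, and it assumes, following the "(m2 * k)-length array" wording in the help text, that the flat result keeps repeated controls.

```python
from collections import Counter

# Per-treatment matches, written with record IDs instead of integer indices
# (assumption for readability; psm returns indices into control).
per_treatment = [
    ["PSID27", "PSID6"],     # matched to NSW183
    ["PSID27", "PSID6"],     # matched to NSW184
    ["PSID380", "PSID293"],  # matched to NSW185
]

# Flattening the per-treatment lists gives one entry per (treatment, match).
flat = [cid for row in per_treatment for cid in row]

# With replacement, the same control can appear more than once.
reuse = Counter(flat)
reused = sorted(cid for cid, n in reuse.items() if n > 1)

print(len(flat), "matches,", len(reuse), "distinct controls")
print("reused controls:", reused)
```

Counting reuse this way is a quick check of how heavily a with-replacement match leans on a few control records.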

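A nested for loop then prints the matches. The outer loop walks treatment and selected_control_each_treatment in parallel, so treatment_sample is one treated record and index is the list of control indices matched to it. The inner loop looks up each matched control record, control[index[i]] (i takes the values 0 and 1, because k=2).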

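# Walk the treated records and their matched control indices in parallel.
for treatment_sample, index in zip(treatment, selected_control_each_treatment):
    print(treatment_sample)
    print("matches")
    # k=2, so every treated record has two matched controls.
    for i in range(2):
        print(control[index[i]])
    # Separator line between treated records, as seen in the results.
    print("=======================================")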

The results are shown below. Once the matches are in hand they can feed further comparative analysis, which is not listed here.

[u'NSW183', 1, 35, 9, 1, 0, 1, 1, 13602.43, 13830.64, 12803.97]

matches

[u'PSID27', 0, 36, 9, 1, 0, 1, 1, 13256.4, 8457.484, 0.0]

[u'PSID6', 0, 37, 9, 1, 0, 1, 1, 13685.48, 12756.05, 17833.2]

=======================================

[u'NSW184', 1, 35, 8, 1, 0, 1, 1, 13732.07, 17976.15, 3786.628]

matches

[u'PSID27', 0, 36, 9, 1, 0, 1, 1, 13256.4, 8457.484, 0.0]

[u'PSID6', 0, 37, 9, 1, 0, 1, 1, 13685.48, 12756.05, 17833.2]

=======================================

[u'NSW185', 1, 33, 11, 1, 0, 1, 1, 14660.71, 25142.24, 4181.942]

matches

[u'PSID380', 0, 34, 12, 1, 0, 1, 0, 0.0, 0.0, 18716.88]

[u'PSID293', 0, 31, 12, 1, 0, 1, 0, 0.0, 42.96774, 11023.84]
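As a pointer to what the follow-up comparison might look like, the sketch below contrasts each treated record's outcome with the mean outcome of its matched controls. It rests on two assumptions: that the last column of each record is the outcome re78 (the usual layout of this dataset, not confirmed in this article), and that three records are enough to show the arithmetic. They are far too few to say anything about the treatment effect.

```python
# Last-column values copied from the three treated records printed above
# (assumption: this column is the outcome re78).
treated_outcome = [12803.97, 3786.628, 4181.942]

# Last-column values of the two controls matched to each treated record.
matched_control_outcome = [
    [0.0, 17833.2],       # PSID27, PSID6 for NSW183
    [0.0, 17833.2],       # PSID27, PSID6 for NSW184
    [18716.88, 11023.84], # PSID380, PSID293 for NSW185
]

# Per-record difference: treated outcome minus mean of its matched controls.
differences = []
for y_t, y_cs in zip(treated_outcome, matched_control_outcome):
    differences.append(y_t - sum(y_cs) / len(y_cs))

# Averaging the differences gives a matched-sample estimate of the effect
# on the treated; here it is purely illustrative.
mean_difference = sum(differences) / len(differences)

for d in differences:
    print(round(d, 2))
print("mean difference:", round(mean_difference, 2))
```

On the full matched sample the same calculation would run over all 185 treated records, ideally after checking that the matched covariates are balanced between the two groups.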
