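Implementing DID (Difference-in-Differences) in Stata: The Full Workflow

I'll skip the theory here; below is a DID policy-evaluation project I worked on previously.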

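1. Background

This project studies the effect of the environmental interview system (环境约谈制度) on PM2.5 emissions, using a five-period panel of 55 prefecture-level cities for 2014-2018. The policy takes effect in 2016; 5 cities form the treatment group and 50 the control group. Six control variables are used: GDP, population density, fixed-asset investment, industrial structure, fiscal status, and a temperature index.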
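For orientation, the design above can be written as a fixed-effects DID specification. This is only a sketch matching the `xtreg ..., fe` call used later in the code; the symbols α, β, γ and ε are notation assumed here, not taken from the original project.

```latex
\[
\mathrm{PM2.5}_{it} = \alpha_i + \beta_1\,\mathrm{time}_t
  + \beta_2\,(\mathrm{treat}_i \times \mathrm{time}_t)
  + X_{it}'\gamma + \varepsilon_{it}
\]
```

Here \alpha_i is the city fixed effect (which absorbs treat_i), time_t equals 1 from 2016 onward, treat_i equals 1 for the five treated cities, X_{it} collects the six controls, and \beta_2 is the DID effect of interest.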

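2. Preparation

2.1 Software

Stata 16 SE

2.2 External packages to install

ssc install asdoc, replace // export results to Word
ssc install estout, replace // produce three-line regression tables
ssc install parmest, replace // export regression parameters and statistics; installs only on Stata 16
ssc install coefplot, replace // visualize regression coefficients
ssc install dpplot, replace // draw density plots
ssc install diff, replace // difference-in-differences estimation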

2.3 Data overview

Province	City	id	year	PM2.5	GDP	POP	INV	FIN	TEM	IND
Shanxi	Changzhi	1	2014	50.1565	1331.14	340.443	12456466	1363251	11.8174	0.26493
Anhui	Anqing	2	2014	54.3724	1544.32	537.6	13947517	1056537	16.4748	0.18293
Henan	Shangqiu	3	2014	79.8493	1697.64	824	14938866	1007474	15.0777	0.209
Shandong	Jining	4	2014	78.9295	3800.06	593.57	25381801	3341968	14.2175	0.15236
Shaanxi	Xianyang	5	2014	50.4538	2077.34	495.684	24413518	852961	13.0141	0.19124

The table shows the first 5 rows of the data. id is the city identifier and year is the year. Cities with id = 1, 2, 3, 4, 5 form the treatment group; all others are controls. pm25 is the outcome variable, and gdp through ind are the control variables.
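The group and period definitions above can be checked on a toy panel before touching the real data. The sketch below builds an empty 55-city, 5-year panel (no real values; only id and year are generated) and constructs the same time, treat and did dummies used in the analysis code.

```stata
* Toy panel mirroring the design: 55 cities x 5 years (2014-2018)
clear
set obs 55
gen id = _n
expand 5
bysort id: gen year = 2013 + _n
xtset id year

* Same dummy definitions as in the analysis code
gen time  = (year >= 2016) & !missing(year)   // post-policy period
gen treat = (id <= 5) & !missing(id)          // treated cities
gen did   = time * treat                      // interaction term

* did equals 1 only for treated cities observed in 2016-2018
tabulate treat time
count if did == 1
```

Running the same `tabulate treat time` on the real data is a quick way to confirm that the panel is balanced and that the treated cells are the ones intended.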

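3. Full code

Below is all the code used for this DID evaluation:

* Change the working directory
cd "XXX" // replace with your own working directory
* Load the data
use data.dta, clear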
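* Declare the panel structure
xtset id year
* Descriptive statistics
asdoc sum pm25 gdp pop inv fin tem ind
* Generate the dummy variables
gen time = (year>=2016) & !missing(year)
gen treat = (id<=5) & !missing(id)
gen did = time*treat
* DID estimation 1 (treat is time-invariant, so the fixed effects absorb it)
asdoc xtreg pm25 time treat did gdp pop inv fin tem ind, fe
* DID estimation 2
diff pm25, t(treat) p(time) cov(gdp pop inv fin tem ind)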
* Parallel trends test
gen period = year - 2016
forvalues i = 2(-1)1 {
    gen pre_`i' = (period == -`i' & treat == 1)
}
gen current = (period == 0 & treat == 1)
forvalues j = 1(1)2 {
    gen time_`j' = (period == `j' & treat == 1)
}
* Stata drops collinear terms automatically
xtreg pm25 time treat pre_* current time_* i.year, fe
est sto reg
coefplot reg, keep(pre_* current time_*) vertical recast(connect) yline(0) xline(5,lp(dash)) ytitle("Policy effect") xtitle("Period (pre_*: before the policy, current: policy year, time_*: after the policy)")
* Placebo test 1: move the policy date earlier
gen time_new = (year>=2015) & !missing(year)
gen treat_new = (id<=5) & !missing(id)
gen did_new = time_new*treat_new
xtreg pm25 time_new treat_new did_new ind inv fin tem pop gdp if year<=2016, vce(robust)
* Placebo test 2: randomly assign the treatment and control groups
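use id.dta, clear
sample 1
save "temp.dta", replace
use data.dta, clear
xtset id year
merge 1:1 id year using "temp.dta"
cap drop treat
gen treat = (_merge==3)
drop _merge
* data.dta was reloaded, so rebuild the policy-period dummy if it is missing
cap gen time = (year>=2016) & !missing(year)
save "placebo_did.dta", replace
* $absorb and $cluster are globals holding the fixed-effects and clustering options; define them beforehand (undefined globals expand to nothing)
xtreg pm25 ind inv fin tem pop gdp c.treat#c.time, $absorb $cluster
parmest, format(estimate min95 max95 %8.2f p %8.3f) saving("temp1.dta", replace)
use "temp1.dta", clear
keep if parm=="c.treat#c.time"
save "simulations.dta", replace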
* 400 replications
qui {
forvalues i = 1(1)400 {
    use id.dta, clear
    sample 1
    save "temp.dta", replace
    use data.dta, clear
    merge 1:1 id year using "temp.dta"
    cap drop treat
    gen treat = (_merge==3)
    cap gen time = (year>=2016) & !missing(year)
    reg pm25 ind inv fin tem pop gdp c.treat#c.time, $absorb $cluster
    parmest, format(estimate min95 max95 %8.2f p %8.3f) saving("temp1.dta", replace)
    use temp1.dta, clear
    keep if parm=="c.treat#c.time"
    cap append using "simulations.dta"
    save "simulations.dta", replace
}
}
dpplot estimate,xline(0,lc(black*0.5)  ) xline(-14.414,lc(red*0.5) lp(dash) )  xlabel(-20(10)20)   xtitle("Treatment Effect",size(*0.8)) ytitle("Density",size(*0.8))  note("") caption("")
* 800 replications
qui {
forvalues i = 1(1)800 {
    use id.dta, clear
    sample 1
    save "temp.dta", replace
    use data.dta, clear
    merge 1:1 id year using "temp.dta"
    cap drop treat
    gen treat = (_merge==3)
    cap gen time = (year>=2016) & !missing(year)
    reg pm25 ind inv fin tem pop gdp c.treat#c.time, $absorb $cluster
    parmest, format(estimate min95 max95 %8.2f p %8.3f) saving("temp1.dta", replace)
    use temp1.dta, clear
    keep if parm=="c.treat#c.time"
    cap append using "simulations.dta"
    save "simulations.dta", replace
}
}
dpplot estimate,xline(0,lc(black*0.5)  ) xline(-14.414,lc(red*0.5) lp(dash) )  xlabel(-20(10)20)   xtitle("Treatment Effect",size(*0.8)) ytitle("Density",size(*0.8))  note("") caption("")
* 1200 replications
qui {
forvalues i = 1(1)1200 {
    use id.dta, clear
    sample 1
    save "temp.dta", replace
    use data.dta, clear
    merge 1:1 id year using "temp.dta"
    cap drop treat
    gen treat = (_merge==3)
    cap gen time = (year>=2016) & !missing(year)
    reg pm25 ind inv fin tem pop gdp c.treat#c.time, $absorb $cluster
    parmest, format(estimate min95 max95 %8.2f p %8.3f) saving("temp1.dta", replace)
    use temp1.dta, clear
    keep if parm=="c.treat#c.time"
    cap append using "simulations.dta"
    save "simulations.dta", replace
}
}
dpplot estimate,xline(0,lc(black*0.5)  ) xline(-14.414,lc(red*0.5) lp(dash) )  xlabel(-20(10)20)   xtitle("Treatment Effect",size(*0.8)) ytitle("Density",size(*0.8))  note("") caption("")
* 1600 replications
qui {
forvalues i = 1(1)1600 {
    use id.dta, clear
    sample 1
    save "temp.dta", replace
    use data.dta, clear
    merge 1:1 id year using "temp.dta"
    cap drop treat
    gen treat = (_merge==3)
    cap gen time = (year>=2016) & !missing(year)
    reg pm25 ind inv fin tem pop gdp c.treat#c.time, $absorb $cluster
    parmest, format(estimate min95 max95 %8.2f p %8.3f) saving("temp1.dta", replace)
    use temp1.dta, clear
    keep if parm=="c.treat#c.time"
    cap append using "simulations.dta"
    save "simulations.dta", replace
}
}
dpplot estimate,xline(0,lc(black*0.5)  ) xline(-14.414,lc(red*0.5) lp(dash) )  xlabel(-20(10)20)   xtitle("Treatment Effect",size(*0.8)) ytitle("Density",size(*0.8))  note("") caption("")

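Note: in the second placebo test, id.dta is a file containing only the individual identifier id and the year identifier year, extracted from data.dta. data.dta is the sample dataset.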
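One way to build id.dta as described in the note is sketched below. To keep the example self-contained, it first creates a toy stand-in for data.dta in a temporary file; the pm25 values are simulated and the tempfile names `data' and `ids' are placeholders assumed here. In the real do-file, `data' would be data.dta and `ids' would be id.dta.

```stata
* Toy stand-in for data.dta (simulated values; the real file has more variables)
clear
set seed 1
set obs 55
gen id = _n
expand 5
bysort id: gen year = 2013 + _n
gen pm25 = 50 + 10*runiform()
tempfile data ids
save `data'

* Extract only the identifiers, as the note describes
use `data', clear
keep id year
isid id year          // stop if id-year does not uniquely identify rows
save `ids'
describe
```

The `isid` check matters because the placebo code merges on id and year; duplicates in either file would make that merge unreliable.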

4. Results

4.1 Descriptive statistics

4.2 DID regression

The estimate for the interaction term is not statistically significant.
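To make "not significant" concrete, the t statistic and p-value of the interaction term can be read from the stored results after `xtreg`. The sketch below uses a simulated toy panel (the outcome and the assumed effect of -5 are invented for illustration, not the project's data), so its numbers say nothing about the real estimate.

```stata
* Toy panel with a simulated outcome (assumed true effect: -5)
clear
set seed 12345
set obs 55
gen id = _n
expand 5
bysort id: gen year = 2013 + _n
xtset id year
gen time  = year >= 2016
gen treat = id <= 5
gen did   = time * treat
gen pm25  = 60 - 5*did + rnormal(0, 8)

* Fixed-effects DID; treat is omitted because the city effects absorb it
xtreg pm25 time did, fe

* t statistic and two-sided p-value for the interaction term
scalar t_did = _b[did] / _se[did]
scalar p_did = 2 * ttail(e(df_r), abs(scalar(t_did)))
display "did = " _b[did] "  t = " scalar(t_did) "  p = " scalar(p_did)
```

The same three lines after the real `xtreg ... did ..., fe` call give the p-value to compare against the usual 0.1, 0.05 and 0.01 thresholds.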

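4.3 Difference-in-differences estimator

4.4 Parallel trends test

4.5 Placebo tests

The first test moves the policy date forward to 2015 and restricts the sample to 2014-2016. If the coefficient on the interaction term is not significant, the placebo test is passed, meaning the DID estimate is not driven by other factors.

The second test randomly assigns the treatment and control groups, re-estimates the model, and plots the density of the resulting estimates.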
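Besides inspecting the density plot, the placebo estimates can be summarized as an empirical p-value: the share of placebo estimates at least as extreme as the baseline one. The sketch below assumes that -14.414, the reference line in the dpplot calls, is the baseline estimate, and it uses simulated draws as a stand-in for simulations.dta.

```stata
* Toy stand-in for simulations.dta: 400 placebo estimates (simulated, assumed values)
clear
set seed 2024
set obs 400
gen estimate = rnormal(0, 6)

* Baseline estimate; assumed to be the -14.414 reference line from the dpplot calls
scalar b_true = -14.414

* Share of placebo estimates at least as large in absolute value
count if abs(estimate) >= abs(scalar(b_true))
scalar p_emp = r(N) / _N
display "empirical p-value = " scalar(p_emp)
```

With the real results, load simulations.dta instead of generating estimate; a small share suggests the baseline estimate is unlikely to arise from random assignment alone.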

Data download:

Link: https://pan.baidu.com/s/15GvtWDExtCoxgQhapD3CWA

Extraction code: uhpo

If you spot any errors, corrections are welcome!