Notes on Data Analysis in Market Research (7): Structural Equation Modeling

Structural Equation Modeling

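Structural equation modeling (SEM) has several drawbacks:
it demands a substantial mathematical background;
building a model is laborious and full of pitfalls;
how much the results can explain depends heavily on data quality and on the analytical angle.

At the same time, SEM has solid theoretical grounding for questionnaire data and for problems such as measuring product involvement and modeling repeat purchase, and it answers such questions well. This summary therefore describes, in general and concise terms, the usual steps and methods for applying SEM.

SEM subsumes the following methods:
regression analysis;
factor analysis (confirmatory and exploratory);
t-tests;
analysis of variance;
comparison of factor means across groups;
interaction models;
experimental designs.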

Specifically, SEM has the following features:

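Handles multiple dependent variables simultaneously
Allows both independent and dependent variables to contain measurement error (traditional methods such as regression assume the independent variables are measured without error)
Estimates the factor structure and the relationships among factors simultaneously
Allows more flexible measurement models
Estimates the fit of the model as a whole (useful for comparing different models)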
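Here is a concrete version of the measurement-error point. When a predictor is observed with error, ordinary least squares shrinks its slope toward zero by roughly the predictor's reliability. The base-R sketch below simulates this. The true slope (0.8), the error variances and the sample size are arbitrary assumptions. The "corrected" slope divides by a reliability computed from the true scores, which are unobservable in practice. SEM avoids the problem differently: it measures the latent predictor with several items instead of treating one noisy item as error-free.

```r
# Sketch: measurement error in a predictor attenuates an OLS slope.
# All values are simulated; slope and error variances are illustrative assumptions.
set.seed(42)
n <- 10000
x_true <- rnorm(n)                        # latent predictor, variance 1
y <- 0.8 * x_true + rnorm(n, sd = 0.5)    # true slope = 0.8
x_obs <- x_true + rnorm(n, sd = 1)        # observed with error variance 1 (reliability about 0.5)

slope_true <- unname(coef(lm(y ~ x_true))[2])
slope_obs  <- unname(coef(lm(y ~ x_obs))[2])
# Reliability needs the true scores, which real data never provide
reliability <- var(x_true) / var(x_obs)

print(round(c(true = slope_true, observed = slope_obs,
              corrected = slope_obs / reliability), 3))
```

The observed-predictor slope comes out near 0.4, about half the true 0.8, matching the 0.5 reliability.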

Measuring and evaluating the PIES CFA model

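The PIES (product involvement and enthusiasm scale) latent factor model uses 4 unobservable latent variables. Together they assess how involved a person is with a product's features, or with the product category as part of their personal image.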

Data description and problem definition

library(car)  # provides some() and scatterplotMatrix()
semdf <- read.csv("http://r-marketing.r-forge.r-project.org/data/rintro-chapter10pies.csv")
some(semdf,5)

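The data come from a questionnaire with 11 observed variables (i1 to i11). Each is a discrete rating from 1 to 7. The items map onto 3 latent variables:
general involvement (general, i1 to i3);
feature involvement (feature, i4 to i7);
image involvement (image, i8 to i11).
These three in turn load on an overall latent variable, pies (product involvement and enthusiasm).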

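# "=~" reads "is measured by": it links a latent variable to its items
strctr <- "general =~ i1 + i2 + i3
           feature =~ i4 + i5 + i6 + i7
           image   =~ i8 + i9 + i10 + i11
           pies    =~ general + feature + image"
library(RColorBrewer)
# Scatterplot matrix of the first two items of each factor, with histograms on the diagonal
scatterplotMatrix(semdf[, c(1, 2, 4, 5, 8, 9)], diagonal = list(method = "histogram"),
                  col = brewer.pal(3, "Paired"), ellipse = TRUE)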

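For readability, only the first two items of each latent factor are plotted. The scatterplot matrix shows that correlations within each group are clearly higher than correlations between groups, and all of them are positive.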
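You can check the within-versus-between pattern numerically by averaging the off-diagonal correlations inside and across item groups. The sketch below does this on simulated data, not the PIES data. The data follow a hypothetical three-factor structure with a shared higher-order factor, and use two items per factor to mirror the plot. `mean_block_cor` is a hypothetical helper. With the real data you would pass the item columns of `semdf` and their group labels instead.

```r
# Sketch: average item correlations within vs. between latent factors.
# Data are simulated from a hypothetical 3-factor structure, not the PIES data.
set.seed(1)
n <- 2000
g <- rnorm(n)                                             # shared higher-order factor
factors <- sapply(1:3, function(k) 0.7 * g + rnorm(n, sd = 0.7))
groups <- rep(1:3, each = 2)                              # two items per factor
items <- sapply(groups, function(k) factors[, k] + rnorm(n, sd = 0.8))

# Hypothetical helper: mean off-diagonal correlation within and between groups
mean_block_cor <- function(x, groups) {
  r <- cor(x)
  same <- outer(groups, groups, "==")
  off_diag <- upper.tri(r)
  c(within  = mean(r[same & off_diag]),
    between = mean(r[!same & off_diag]))
}

res <- mean_block_cor(items, groups)
print(round(res, 3))
```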

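The first step is therefore a PIES CFA (confirmatory factor analysis). It establishes the relationships between the items and the latent variables, and between the latent variables and the overall latent variable.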

PIES CFA model

library(lavaan)
ps_mdl <- cfa(strctr, data = semdf)
summary(ps_mdl, fit.measures = TRUE)
library(semPlot)
semPaths(ps_mdl, what = "est")

Output:

lavaan (0.5-23.1097) converged normally after  41 iterations

  Number of observations                          3600

  Estimator                                         ML
  Minimum Function Test Statistic              287.649
  Degrees of freedom                                41
  P-value (Chi-square)                           0.000

Model test baseline model:

  Minimum Function Test Statistic             9920.901
  Degrees of freedom                                55
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.975
  Tucker-Lewis Index (TLI)                       0.966

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -52885.888
  Loglikelihood unrestricted model (H1)     -52742.064

  Number of free parameters                         25
  Akaike (AIC)                              105821.776
  Bayesian (BIC)                            105976.494
  Sample-size adjusted Bayesian (BIC)       105897.056

Root Mean Square Error of Approximation:

  RMSEA                                          0.041
  90 Percent Confidence Interval          0.036  0.045
  P-value RMSEA <= 0.05                          1.000

Standardized Root Mean Square Residual:

  SRMR                                           0.030

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  general =~
    i1                1.000
    i2                0.948    0.042   22.415    0.000
    i3                1.305    0.052   25.268    0.000
  feature =~
    i4                1.000
    i5                1.168    0.037   31.168    0.000
    i6                0.822    0.033   25.211    0.000
    i7                1.119    0.036   31.022    0.000
  image =~
    i8                1.000
    i9                0.963    0.028   34.657    0.000
    i10               0.908    0.027   33.146    0.000
    i11               0.850    0.027   31.786    0.000
  pies =~
    general           1.000
    feature           0.875    0.057   15.355    0.000
    image             0.932    0.060   15.628    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .i1                0.657    0.020   33.498    0.000
   .i2                0.796    0.022   35.967    0.000
   .i3                0.463    0.022   21.479    0.000
   .i4                0.657    0.019   33.973    0.000
   .i5                0.554    0.019   28.588    0.000
   .i6                0.779    0.021   37.701    0.000
   .i7                0.533    0.018   29.199    0.000
   .i8                0.640    0.020   32.071    0.000
   .i9                0.476    0.016   29.501    0.000
   .i10               0.560    0.017   32.697    0.000
   .i11               0.599    0.017   34.500    0.000
    general           0.089    0.015    5.858    0.000
    feature           0.256    0.018   14.538    0.000
    image             0.375    0.022   17.165    0.000
    pies              0.248    0.021   11.570    0.000

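Interpreting the output:
Comparative Fit Index (CFI): the comparative fit index, where values closer to 1 are better.
RMSEA: the root mean square error of approximation, where values closer to 0 are better.
Both measure how well the model reproduces the observed covariances. Common rules of thumb are CFI ≥ 0.95 and RMSEA ≤ 0.06.
Latent Variables: the loadings (path coefficients) between each latent variable and its items. The first item of each factor is fixed at 1.000 to set the factor's scale, so it has no standard error.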
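You can recompute both indices from the chi-square statistics in the output, which shows what they measure:
CFI compares the model's misfit (chi-square minus df) with that of the baseline model, in which all items are uncorrelated.
RMSEA scales the misfit per degree of freedom and per observation.

The sketch uses the standard textbook formulas, and TLI is included for completeness. lavaan's internals may differ in small details, such as using N rather than N - 1.

```r
# Sketch: recompute CFI, TLI and RMSEA from the chi-square values printed above.
chisq_m <- 287.649;  df_m <- 41     # user model
chisq_b <- 9920.901; df_b <- 55     # baseline (independence) model
n <- 3600

cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_b - df_b, chisq_m - df_m, 0)
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)
# Textbook RMSEA divides by N - 1; using N changes only later decimals here
rmsea <- sqrt(max(chisq_m - df_m, 0) / (df_m * (n - 1)))

print(round(c(CFI = cfi, TLI = tli, RMSEA = rmsea), 3))
# CFI 0.975, TLI 0.966, RMSEA 0.041
```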

Path diagram:

Overall, the model fits well (CFI = 0.975, TLI = 0.966, RMSEA = 0.041, SRMR = 0.030).
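Fit is judged by how closely the covariance matrix implied by the estimates matches the sample covariance matrix. For this hierarchical CFA the implied matrix is Λ(ΓψΓ' + Ζ)Λ' + Θ, where:
Λ holds the item loadings;
Γ holds the loadings of the three factors on pies;
ψ is the variance of pies;
Ζ holds the factor disturbances;
Θ holds the item residual variances.

The sketch below rebuilds the matrix from the rounded estimates in the output above, so it matches lavaan only to about three decimals. The symbol names are standard notation chosen for this sketch. The sketch also shows where the 41 degrees of freedom come from.

```r
# Sketch: model-implied covariance of the hierarchical PIES CFA,
# built from the rounded unstandardized estimates in the lavaan output.
items <- paste0("i", 1:11)
first <- c("general", "feature", "image")

# Lambda: item loadings on the three first-order factors
Lambda <- matrix(0, 11, 3, dimnames = list(items, first))
Lambda[1:3,  "general"] <- c(1.000, 0.948, 1.305)
Lambda[4:7,  "feature"] <- c(1.000, 1.168, 0.822, 1.119)
Lambda[8:11, "image"]   <- c(1.000, 0.963, 0.908, 0.850)

# Gamma: first-order factor loadings on pies; psi: variance of pies
Gamma <- matrix(c(1.000, 0.875, 0.932), 3, 1, dimnames = list(first, "pies"))
psi   <- 0.248
Zeta  <- diag(c(0.089, 0.256, 0.375))                  # factor disturbances
Theta <- diag(c(0.657, 0.796, 0.463, 0.657, 0.554, 0.779,
                0.533, 0.640, 0.476, 0.560, 0.599))    # item residual variances

Phi   <- Gamma %*% (psi * t(Gamma)) + Zeta             # covariance of first-order factors
Sigma <- Lambda %*% Phi %*% t(Lambda) + Theta
dimnames(Sigma) <- list(items, items)
print(round(Sigma[1:4, 1:4], 3))

# Degrees of freedom = unique variances/covariances - free parameters
p <- 11
n_moments <- p * (p + 1) / 2   # 66
n_free <- 25                   # "Number of free parameters" in the output
print(n_moments - n_free)      # 41
```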

Revisiting the PIES CFA model

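The three latent factors were specified in advance. The results are already fairly good, but two questions remain: what if we did not know this structure, and might a better one exist? Answering them requires comparing alternative relationships between the items and the latent factors.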

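# One-factor model: all 11 items load on a single factor
strctr_1 <- "pies =~ i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + i10 + i11"
ps_mdl_1 <- cfa(strctr_1, data = semdf)
semPaths(ps_mdl_1, what = "est")

# Single-level three-factor model with no higher-order factor;
# "0.1*" fixes each factor covariance at 0.1 (nearly uncorrelated) instead of estimating it
strctr_2 <- "General =~ i1 + i2 + i3
Feature =~ i4 + i5 + i6 + i7
Image   =~ i8 + i9 + i10 + i11
General ~~ 0.1*Feature
General ~~ 0.1*Image
Feature ~~ 0.1*Image"
ps_mdl_2 <- cfa(strctr_2, data = semdf)
semPaths(ps_mdl_2, what = "est")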
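The three candidate models differ in how many parameters they estimate, and that fixes their degrees of freedom. The 11 items give 66 unique variances and covariances. The sketch counts parameters by hand under lavaan's default identification, in which the first loading of each factor is fixed to 1. `count_df` is a hypothetical helper. Covariances fixed at 0.1 are not estimated, so they add no parameters.

```r
# Sketch: free parameters and degrees of freedom of the three candidate
# models, assuming marker-variable identification (first loading fixed to 1).
p <- 11
moments <- p * (p + 1) / 2                       # 66 unique variances/covariances

# Hypothetical helper: total free parameters and resulting df
count_df <- function(free_loadings, residuals, factor_vars, factor_covs) {
  k <- free_loadings + residuals + factor_vars + factor_covs
  c(free = k, df = moments - k)
}

models <- rbind(
  one_factor   = count_df(10, 11, 1, 0),         # pies =~ i1 ... i11
  three_fixed  = count_df(8, 11, 3, 0),          # covariances fixed at 0.1, not estimated
  hierarchical = count_df(8 + 2, 11, 3 + 1, 0)   # + 2 pies loadings and Var(pies)
)
print(models)
```

Both alternative models end up with 22 free parameters and 44 degrees of freedom. The hierarchical model has 25 and 41.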

Path diagram of the one-factor model:

Path diagram of the single-level three-factor model:

Comparing the three models:

library(semTools)
compareFit(ps_mdl,ps_mdl_1,ps_mdl_2)
summary(ps_mdl_2, fit.measures = T)

################### Nested Model Comparison #########################
                                chi  df      p delta.cfi
pies.fit - pies.fit.NH3      222.43   3  <.001    0.0222
pies.fit.NH3 - pies.fit.NH1 2774.50   0  <.001    0.2812

#################### Fit Indices Summaries ##########################
                chisq df pvalue   cfi   tli         aic         bic rmsea  srmr
pies.fit.NH1 3284.581 44  .000†  .672  .589  108812.709  108948.860  .143  .102
pies.fit.NH3  510.078 44  .000†  .953  .941  106038.205  106174.356  .054  .078
pies.fit      287.649 41  .000† .975† .966† 105821.776† 105976.494† .041† .030†

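The model names in this output differ from the object names in the code:
pies.fit is the original hierarchical model (ps_mdl);
pies.fit.NH3 is the single-level three-factor model with fixed covariances (ps_mdl_2);
pies.fit.NH1 is the one-factor model (ps_mdl_1).

Every fit index favors the hierarchical model, with † marking the best value in each column. It has the highest CFI and TLI and the lowest AIC, BIC, RMSEA and SRMR. The single-level three-factor model comes second, and the one-factor model fits poorly (CFI .672, RMSEA .143).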
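The nested comparison in the table is a chi-square difference (likelihood-ratio) test. Its first row can be reproduced from the chi-square values in the fit summary. As compareFit does, the sketch treats pies.fit.NH3 as a restricted model nested in pies.fit, with 3 fewer free parameters.

```r
# Sketch: chi-square difference test between pies.fit.NH3 (restricted)
# and pies.fit (hierarchical), using the values in the fit table.
chisq_restricted <- 510.078; df_restricted <- 44   # pies.fit.NH3
chisq_full       <- 287.649; df_full       <- 41   # pies.fit

delta_chisq <- chisq_restricted - chisq_full
delta_df    <- df_restricted - df_full
p_value     <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)

cat(sprintf("delta chisq = %.3f, delta df = %d, p = %.3g\n",
            delta_chisq, as.integer(delta_df), p_value))
```

The difference of about 222.43 on 3 degrees of freedom matches the first row of the nested comparison. A significant result means the restrictions in pies.fit.NH3 make the fit significantly worse.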

Structural equation model

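The covariance-based structural equation model addresses questions such as:
How strongly is customers' perception of product quality related to their satisfaction with the product?
Is product quality more important than the value customers perceive in the product?
What most strongly influences customers' stated intention to buy again?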
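The paths between latent variables answer questions of this kind. An effect can be direct, or it can run through intermediate variables, for example from quality to value to satisfaction. For a recursive model with direct-effect matrix B, total effects are (I - B)^-1 - I, and indirect effects are the total minus the direct ones.

The sketch below uses the five latent factors of this section's model, but every coefficient in it is a hypothetical placeholder, not an estimate. To compare, say, the total effects of quality and value on satisfaction, substitute the fitted values, for example from `summary(sem_mdl, standardized = TRUE)`.

```r
# Sketch: direct, indirect and total effects among latent variables.
# ALL path coefficients below are HYPOTHETICAL placeholders, not estimates.
lv <- c("Quality", "Cost", "Value", "CSat", "Repeat")
B <- matrix(0, 5, 5, dimnames = list(lv, lv))   # B[to, from] = direct effect

B["Value",  "Quality"] <- 0.5    # hypothetical
B["CSat",   "Quality"] <- 0.3    # hypothetical
B["Value",  "Cost"]    <- -0.2   # hypothetical
B["Repeat", "Cost"]    <- -0.1   # hypothetical
B["CSat",   "Value"]   <- 0.4    # hypothetical
B["Repeat", "CSat"]    <- 0.6    # hypothetical

# Recursive (acyclic) model: total effects = (I - B)^-1 - I
I5 <- diag(5)
total <- solve(I5 - B) - I5
dimnames(total) <- list(lv, lv)
indirect <- total - B

print(round(total, 3))
```

With these placeholder values, quality's total effect on satisfaction is 0.3 + 0.5 × 0.4 = 0.5. Of that, 0.2 runs through value.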

Data description and problem definition


semdf2 <- read.csv("http://r-marketing.r-forge.r-project.org/data/rintro-chapter10sat.csv")
some(semdf2,5)

The data contain 15 survey items, 3 for each of 5 latent factors:
quality;
value;
customer satisfaction (csat);
cost;
repeat purchase (repeat).

Their relationships are as follows:

The model is specified from these relationships as follows:

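# Latent names are capitalized because "repeat" is a reserved word in R;
# 0*Cost fixes the Quality -> Cost path at 0
strctr_3 <- "Quality =~ CSat + Value + q1 + q2 + q3 + 0*Cost
             Cost    =~ Value + Repeat + c1 + c2 + c3
             Value   =~ CSat + v1 + v2 + v3
             CSat    =~ Repeat + cs1 + cs2 + cs3
             Repeat  =~ r1 + r2 + r3"
# std.lv = TRUE sets each latent scale by fixing its (residual) variance to 1
sem_mdl <- sem(strctr_3, data = semdf2, std.lv = TRUE)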
summary(sem_mdl, fit.measures = T)

The results are as follows:

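In the same way, you can adjust the model structure slightly to obtain candidate models. Comparing them shows how reliable the chosen model is.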
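Candidate models need not be nested. In that case information criteria are the usual yardstick, and a lower AIC or BIC is preferred. The sketch has two parts:
it recomputes AIC, BIC and the sample-size adjusted BIC of the PIES model from its log-likelihood (25 free parameters, N = 3600);
it ranks the three PIES CFA candidates by the AIC values in the fit table.

The same steps would apply to tweaked versions of the satisfaction model. `info_criteria` is a hypothetical helper.

```r
# Sketch: information criteria from a log-likelihood, then ranking candidates.
# Hypothetical helper; k = free parameters, n = sample size
info_criteria <- function(loglik, k, n) {
  c(AIC   = -2 * loglik + 2 * k,
    BIC   = -2 * loglik + k * log(n),
    SABIC = -2 * loglik + k * log((n + 2) / 24))   # sample-size adjusted BIC
}
ic <- info_criteria(-52885.888, k = 25, n = 3600)
print(round(ic, 3))   # matches the printed values up to rounding of the log-likelihood

# Rank the PIES candidates by the AIC/BIC values from the fit table
fits <- data.frame(
  model = c("pies.fit.NH1", "pies.fit.NH3", "pies.fit"),
  aic   = c(108812.709, 106038.205, 105821.776),
  bic   = c(108948.860, 106174.356, 105976.494)
)
fits$delta_aic <- fits$aic - min(fits$aic)   # 0 = preferred model
print(fits[order(fits$aic), ])
```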