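I need to run an exploratory factor analysis in Python and compute a score for each observation, assuming there is only one latent factor. sklearn.decomposition.FactorAnalysis() seems to be the way to go, but unfortunately the documentation and the example (the only one I could find) don't make it clear enough to me how to get this done.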
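For reference, this is the standard one-factor model that "one latent factor" refers to (textbook notation, not taken from the post). x_i is observation i as a vector of p variables, λ is the vector of loadings, f_i is the scalar factor value of observation i, and Ψ is the diagonal matrix of uniquenesses, with f_i and ε_i independent. A per-observation "score" is an estimate of f_i. Shown below is the regression (Thomson) estimator and its equivalent Sherman-Morrison form, which only requires inverting the diagonal Ψ.

```latex
\begin{aligned}
x_i &= \mu + \lambda f_i + \varepsilon_i, \qquad f_i \sim \mathcal{N}(0, 1), \quad \varepsilon_i \sim \mathcal{N}(0, \Psi), \quad \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p) \\
\Sigma &= \lambda \lambda^\top + \Psi \\
\hat{f}_i &= \lambda^\top \Sigma^{-1} (x_i - \mu) = \frac{\lambda^\top \Psi^{-1} (x_i - \mu)}{1 + \lambda^\top \Psi^{-1} \lambda}
\end{aligned}
```

Under this Gaussian model the same expression is also the posterior mean E[f_i | x_i].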

I have the following test file (test.csv) with 41 observations of 29 variables:

49.6,34917,24325.4,305,101350,98678,254.8,276.9,47.5,1,3,5.6,3.59,11.9,0,97.5,97.6,8,10,100,0,0,96.93,610.1,100,1718.22,6.7,28,5
275.8,14667,11114.4,775,75002,74677,30,109,9.1,1,0,6.5,3.01,8.2,1,97.5,97.6,8,8,100,0,0,100,1558,100,2063.17,5.5,64,5
2.3,9372.5,8035.4,4.6,8111,8200,8.01,130,1.2,0,5,0,3.33,6.09,1,97.9,97.9,8,8,67.3,342.3,0,99.96,18.3,53,1457.27,4.8,8,4
7.10,13198.0,13266.4,1.1,708,695,6.1,80,0.4,0,4,0,3.1,8.2,1,97.8,97.9,8,8,45,82.7,0,99.68,4.5,80,1718.22,13.8,0,3
1.97,2466.7,2900.6,19.7,5358,5335,10.1,23,0.5,0,2,0,3.14,8.2,0,97.3,97.2,9,9,74.5,98.2,0,99.64,79.8,54,1367.89,6.4,12,4
2.40,2999.4,2218.2,0.80,2045,2100,8.9,10,1.5,1,3,0,2.82,8.6,0,97.4,97.2,8,8,47.2,323.8,0,99.996,13.6,24,1249.67,2.7,12,3
0.59,4120.8,5314.5,0.54,14680,13688,14.9,117,1.1,0,3,0,2.94,3.4,0,97.6,97.7,8,8,11.8,872.6,0,100,9.3,52,1251.67,14,14,2
0.72,2067.7,2364,3,367,298,7.2,60,2.5,0,12,0,2.97,10.5,0,97.5,97.6,8,8,74.7,186.8,0,99.13,12,57,1800.45,2.7,4,2
1.14,2751.9,3066.8,3.5,1429,1498,7.7,9,1.6,0,3,0,2.86,7.7,0,97.6,97.8,8,9,76.7,240.1,0,99.93,13.6,60,1259.97,15,8,3
1.29,4802.6,5026.1,2.7,7859,7789,6.5,45,1.9,0,3,0,2.5,8.2,0,98,98,8,8,34,297.5,0,99.95,10,30,1306.44,8.5,0,4
0.40,639.0,660.3,1.3,23,25,1.5,9,0.1,0,1,0,2.5,8.2,0,97.7,97.8,8,8,94.2,0,0,100,4.3,50,1565.44,19.2,0,4
0.26,430.7,608.1,2,33,28,2.5,7,0.4,0,6,0,2.5,8.2,0,97.4,97.4,8,8,76.5,0,0,98.31,8,54,1490.08,0,0,4
4.99,2141.2,2357.6,3.60,339,320,8.1,7,0.2,0,8,0,2.5,5.9,0,97.3,97.4,8,8,58.1,206.3,0,99.58,13.2,95,1122.92,14.2,8,2
0.36,1453.7,1362.2,3.50,796,785,3.7,9,0.1,0,9,0,2.5,13.6,0,98,98.1,8,8,91.4,214.6,0,99.74,7.5,53,1751.98,11.5,0,2
0.36,1657.5,2421.1,2.8,722,690,8.1,8,0.4,0,1,0,2.5,8.2,0,97.2,97.3,11,12,37.4,404.2,0,99.98,10.9,35,1772.33,10.2,8,3
1.14,5635.2,5649.6,3,2681,2530,5.4,20,0.3,0,1,0,3.1,8.2,0,97.7,97.8,8,11,50.1,384.7,0,99.02,11.6,27,1306.08,16,0,2
0.6,1055.9,1487.9,1.3,69,65,2.5,6,0.4,0,8,0,2.5,8.2,0,97.9,97.7,8,11,63,137.9,0,99.98,5.1,48,1595.06,0,0,4
0.08,795.3,1174.7,1.40,85,76,2.2,7,0.2,0,0,0,2.5,8.2,0,97.4,97.5,8,8,39.3,149.3,0,98.27,5.1,52,1903.9,8.1,0,2
0.90,2514.0,2644.4,2.6,1173,1104,5.5,43,0.8,0,10,0,2.5,13.6,0,97.5,97.5,8,10,58.7,170.5,0,80.29,10,34,1292.72,4,0,2
0.27,870.4,949.7,1.8,252,240,2.2,31,0.2,0,1,0,2.5,8.2,0,97.5,97.6,8,8,64.5,0,0,100,6.6,29,1483.18,9.1,0,3
0.41,1295.1,2052.3,2.60,2248,2135,6.0,12,0.8,0,4,0,2.7,8.2,0,97.7,97.7,8,8,71.1,261.3,0,91.86,4.6,21,1221.71,9.4,0,4
1.10,3544.2,4268.9,2.1,735,730,6.6,10,1.7,0,14,0,2.5,8.2,0,97.7,97.8,8,8,52,317.2,0,99.62,9.8,46,1271.63,14.2,0,3
0.22,899.3,888.2,1.80,220,218,3.6,7,0.5,0,1,0,2.5,8.2,0,97.2,97.5,8,8,22.5,0,0,70.79,10.6,32,1508.02,0,0,4
0.24,1712.8,1735.5,1.30,41,35,5.4,7,0.5,0,1,0,3.28,8.2,0,97.8,97.8,9,10,16.6,720.2,0,99.98,4.3,53,1324.46,0,4,2
0.2,558.4,631.9,1.7,65,64,2.5,7,0.2,0,5,0,2.5,8.2,0,97.7,97.5,8,8,60.7,0,0,99.38,6.1,52,1535.08,0,0,2
0.21,599.9,1029,1.1,69,70,3.7,85.7,0.1,0,12,0,2.5,8.2,0,97.4,97.5,8,8,48.6,221.2,0,100,5.4,40,1381.44,25.6,0,2
0.10,131.3,190.6,1.6,28,25,2.9,7,0.3,0,3,0,2.5,8.2,0,97.7,97.8,8,8,58.9,189.4,0,99.93,6.9,42,1525.58,17.4,0,3
0.44,3881.4,5067.3,0.9,2732,2500,11.2,10,1.5,0,5,0,2.67,8.2,0,97.4,97.3,8,11,14.5,1326.2,0,99.06,3.7,31,1120.54,10.3,10,2
0.18,1024.8,1651.3,1.01,358,345,4.6,35,0.3,0,2,0,2.5,8.2,0,97.8,97.9,8,10,15.9,790.2,0,100,4.3,48,1531.04,10.5,0,3
0.46,682.9,784.2,1.8,103,109,2.2,8,0.4,0,4,0,2.5,8.2,0,97.8,97.9,8,8,82.7,166.3,0,99.96,6.4,44,1373.6,13.5,0,2
0.12,370.4,420.0,1.10,28,25,3.4,10,0.1,0,6,0,2.57,8.2,0,97.6,97.8,8,11,51.6,120,0,99.85,8.1,40,1297.94,0,0,3
0.03,552.4,555.1,0.8,54,49,3.5,10,0.4,0,0,0,2.5,8.2,0,97.4,97.6,8,10,33.6,594.5,0,100,3.2,41,1184.34,6.6,0,3
0.21,1256.5,2434.8,0.9,1265,1138,6.3,20,1.3,0,2,0,2.6,8.2,0,98,97.9,8,9,20.1,881,0,99.1,3.9,31,1265.93,7.8,0,3
0.09,320.6,745.7,1.10,37,25,2.7,8,0.3,0,9,0,2.5,8.2,0,98,97.8,8,8,49.2,376.4,0,99.95,4.3,39,1285.11,0,0,3
0.08,452.7,570.9,1,18,20,4.7,9,0.6,0,2,0,2.45,8.2,0,97.1,97.1,8,8,19.9,1103.8,0,99.996,2.9,22,1562.61,21.9,0,3
0.13,967.9,947.2,1,74,65,4.0,25,1.4,0,6,0,2.5,8.2,0,98,98,9,11,30.1,503.1,0,99.999,3.4,55,1269.33,0,0,2
0.07,495.0,570.3,1.2,27,30,4.3,7,0.5,0,12,0,3.62,8.2,0,98.2,98.2,15,13,29.8,430.5,0,99.7,4.9,40,1461.79,14.6,0,2
0.17,681.9,537.4,1.1,113,120,2.9,12,0.4,0,8,0,2.5,8.2,0,98.2,98.3,8,8,24,74.3,0,100,5,43,1290.16,0,0,3
0.05,639.7,898.2,0.40,9,12,3.0,7,0.1,0,1,0,2.5,8.2,0,97.6,97.8,15,11,11.9,1221.1,0,99.996,1.7,40,1372,7,0,4
0.65,2067.8,2084.2,2.50,414,398,7.3,6,0.7,0,4,0,2.16,8.2,0,97.8,97.9,12,12,60.1,146.3,0,99.96,10.4,44,1059.68,7.4,0,2
0.12,804.4,1416.4,3.30,579,602,4.2,7,1.8,0,1,0,2.5,8.2,0,98.1,98.3,8,10,8.9,2492.3,0,95.4,2.2,34,1345.76,7,0,2

Using code I wrote based on the official example and this post, I get strange results. Code:

from sklearn import decomposition, preprocessing
# sklearn.cross_validation was removed in scikit-learn 0.20; cross_val_score now lives in sklearn.model_selection
from sklearn.model_selection import cross_val_score
import numpy as np

data = np.genfromtxt('test.csv', delimiter=',')

def compute_scores(X):
    # note: len(X) is the number of rows (41 observations), not the number of variables (29)
    n_components = np.arange(0, len(X), 1)
    X = preprocessing.scale(X)  # data normalisation attempt
    pca = decomposition.PCA()
    fa = decomposition.FactorAnalysis(n_components=1)
    pca_scores, fa_scores = [], []
    # cross-validate both models for each candidate number of components
    for n in n_components:
        pca.n_components = n
        fa.n_components = n
        # pca_scores.append(np.mean(cross_val_score(pca, X)))  # if I attempt to compute pca_scores I get an error
        fa_scores.append(np.mean(cross_val_score(fa, X)))
    print(pca_scores, fa_scores)

compute_scores(data)

Code output:

[],
[-947738125363.77405,
-947738145459.86035,
-947738159924.70471,
-947738174662.89746,
-947738206142.62854,
-947738179314.44739,
-947738220921.50684,
-947738223447.3678,
-947738277298.33545,
-947738383772.58606,
-947738415104.84912,
-947738406361.44482,
-947738394379.30359,
-947738456528.69275,
-947738501001.14319,
-947738991338.98291,
-947739381280.06506,
-947739389033.33557,
-947739434992.48047,
-947739549511.2655,
-947739355699.70959,
-947739879828.51514,
-947739898216.39099,
-947739905804.71033,
-947739902618.47791,
-947738564594.54639,
-948816122907.87366,
-947744046601.55029,
-947738624937.61292,
-947738625325.73486,
-947738626111.14441,
-947738624973.92188,
-947738625200.06946,
-947738625568.65027,
-947738625528.69666,
-947738625359.41992,
-947738624906.67529,
-947738625652.12439,
-947739509002.01868,
-947738625426.81946,
-947738625380.45837]
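The values above are not factor scores. Without a scoring argument, cross_val_score calls the estimator's own score method on each held-out fold, and FactorAnalysis.score is the average log-likelihood of those rows under the fitted model: one model-selection number per fold, not one value per observation. The sketch below uses synthetic data (all names in it are illustrative) to check that score is the mean of score_samples, and to show one plausible way to get hugely negative values like the ones above: a column that is constant in the training rows gets its noise variance clamped to a tiny floor, so a held-out row with a different value in that column receives an enormous negative log-likelihood. test.csv has several mostly-zero columns that could be constant within a training split; that this is what happened in the original run is an assumption, not something verified.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 40 rows, 3 columns driven by one factor.
f = rng.normal(size=(40, 1))
X = f @ np.array([[1.0, 0.8, 0.6]]) + 0.5 * rng.normal(size=(40, 3))

# score() is the mean of the per-row log-likelihoods returned by score_samples();
# this is the number cross_val_score reports for each fold by default.
fa = FactorAnalysis(n_components=1).fit(X)
mean_ll = fa.score(X)
per_row_ll = fa.score_samples(X)
print(mean_ll, per_row_ll.mean())

# Hypothetical "fold": a 4th column that is all zeros in the training rows but
# 1.0 in the held-out rows, like a sparse column that happens to be constant
# within one training split.
train = np.hstack([X[:30], np.zeros((30, 1))])
test = np.hstack([X[30:], np.ones((10, 1))])
fa_fold = FactorAnalysis(n_components=1).fit(train)
print(fa_fold.noise_variance_[-1])  # clamped to a tiny floor
print(fa_fold.score(test))          # enormous negative held-out log-likelihood
```

Either way, a cross-validated log-likelihood is useful for choosing the number of factors, not for obtaining the per-observation scores this question is after.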

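This is far from the expected result. Here is R code for the same task on the same data. Its output looks right (the results are close to those of an IBM program that can perform FA):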

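# The right-hand sides of the assignments below were lost when this page was
# scraped; they are reconstructed from context and may differ from the original.
data <- read.csv('test.csv', header = FALSE)
col_names <- names(data)
drops <- c()
# collect the names of columns with zero standard deviation (constant columns)
for (name in col_names){
  st_dev <- sd(data[[name]], na.rm = TRUE)
  if (st_dev == 0){
    drops <- c(drops, name)
  }
}
# drop the constant columns, then rows with missing values, and fit one factor
da_nal <- data[, !(names(data) %in% drops)]
factanal(na.omit(da_nal), factors = 1, scores = 'regression')$scores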

The output of this code is:

       Factor1
1   4.89102190
2   3.65004187
3   0.14628700
4  -0.20255897
5  -0.01565570
6  -0.16438863
7   0.40835986
8  -0.25823984
9  -0.20813064
10  0.09390067
11 -0.28891296
12 -0.28882753
13 -0.26624358
14 -0.25202275
15 -0.25181326
16 -0.15653679
17 -0.28702281
18 -0.28865654
19 -0.23251509
20 -0.28066125
21 -0.18714387
22 -0.24969113
23 -0.28302552
24 -0.28712610
25 -0.29196529
26 -0.28659988
27 -0.29502523
28 -0.15802910
29 -0.27440118
30 -0.29083667
31 -0.29548220
32 -0.29461059
33 -0.23594859
34 -0.29654336
35 -0.29759659
36 -0.29085001
37 -0.29539071
38 -0.29234303
39 -0.29702103
40 -0.27595130
41 -0.27184361

So I'd like to get similar results in Python (I know I won't get exactly the same numbers), but I don't know how.
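The R script does two things before calling factanal: the loop over col_names drops every column whose standard deviation is zero, and na.omit drops rows with missing values. A rough NumPy counterpart is sketched below; the function name prepare and the toy array are made up for illustration. One detail that matters when comparing numbers: R's scale() divides by the sample standard deviation (n - 1), while sklearn.preprocessing.scale divides by the population standard deviation (n), so the two standardizations differ by a constant factor.

```python
import numpy as np

def prepare(data, ddof=1):
    """Hypothetical helper: drop constant columns and NaN rows, then standardize.

    ddof=1 divides by the sample standard deviation, as R's scale() does;
    ddof=0 matches sklearn.preprocessing.scale.
    """
    # Drop columns whose standard deviation (ignoring NaNs) is zero,
    # mirroring the R loop that collects them in `drops`.
    keep = np.nanstd(data, axis=0) > 0
    data = data[:, keep]
    # Drop rows that contain any NaN, mirroring R's na.omit().
    data = data[~np.isnan(data).any(axis=1)]
    # Center and scale each remaining column.
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=ddof)

toy = np.array([
    [1.0, 0.0, 2.0],
    [2.0, 0.0, np.nan],
    [3.0, 0.0, 4.0],
    [4.0, 0.0, 8.0],
])
Z = prepare(toy)
print(Z.shape)  # (3, 2): constant middle column and the NaN row removed
```

On the real file this would be something like prepare(np.genfromtxt('test.csv', delimiter=',')). The order (columns first, then rows) follows the R script.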

Solution:

It seems I have figured out how to get the scores.

from sklearn import decomposition, preprocessing
import numpy as np

data = np.genfromtxt('rangir_test.csv', delimiter=',')
data = data[~np.isnan(data).any(axis=1)]  # drop rows that contain NaN
data_normal = preprocessing.scale(data)
fa = decomposition.FactorAnalysis(n_components=1)
fa.fit(data_normal)
# note: score_samples returns each row's log-likelihood under the model, not its factor score
for score in fa.score_samples(data_normal):
    print(score)

Unfortunately, the output (see below) is very different from that of factanal(). Any advice on decomposition.FactorAnalysis() would be greatly appreciated.

Scikit-learn score output:

-69.8587183816
-116.353511148
-24.1529840248
-36.5366398005
-7.87165586175
-24.9012815104
-23.9148486368
-10.047780535
-4.03376369723
-7.07428842783
-7.44222705099
-6.25705487929
-13.2313513762
-13.3253819521
-9.23993173528
-7.141616656
-5.57915693405
-6.82400483045
-15.0906961724
-3.37447211233
-5.41032267015
-5.75224753811
-19.7230390792
-6.75268922909
-4.04911793705
-10.6062761691
-3.17417070498
-9.95916350005
-3.25893428094
-3.88566777358
-3.30908856716
-3.58141292341
-3.90778368669
-4.01462493538
-11.6683969455
-5.30068548445
-24.3400870389
-7.66035331181
-13.8321672858
-8.93461397086
-17.4068326999
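The numbers above come from score_samples, which returns the log-likelihood of each row under the fitted model rather than the row's factor score. That is why they are all negative and look nothing like the factanal column. In scikit-learn the per-observation factor scores come from transform (or fit_transform), which returns the posterior mean of the latent factor for each row. The sketch below is illustrative only: it uses synthetic data (f_true and loadings_true are made-up names) in place of the standardized CSV, so substitute data_normal from the code above to run it on the real file. It checks transform against the one-factor regression formula computed from the fitted components_ (loadings) and noise_variance_ (uniquenesses).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import scale

rng = np.random.default_rng(42)

# Synthetic stand-in for the standardized CSV: 41 rows, 5 variables driven
# by one latent factor plus noise (illustrative only).
f_true = rng.normal(size=(41, 1))
loadings_true = np.array([[0.9, 0.8, 0.7, 0.6, 0.5]])
X = scale(f_true @ loadings_true + 0.5 * rng.normal(size=(41, 5)))

fa = FactorAnalysis(n_components=1, random_state=0).fit(X)

# Per-observation factor scores: posterior mean E[f | x], one column per factor.
scores = fa.transform(X)  # shape (41, 1)

# The same numbers from the one-factor regression formula:
# f_hat = lambda^T Psi^{-1} (x - mu) / (1 + lambda^T Psi^{-1} lambda)
lam = fa.components_[0]   # loadings, shape (5,)
psi = fa.noise_variance_  # uniquenesses, shape (5,)
w = lam / psi
manual = (X - fa.mean_) @ w / (1.0 + lam @ w)

print(np.allclose(scores[:, 0], manual))
# Log-likelihoods (what score_samples returns) are a different quantity:
print(fa.score_samples(X)[:3])
```

Even with transform, expect scores that are close to, not identical with, factanal(..., scores = 'regression'). The estimation routines differ, implementations may invert the sample rather than the model-implied correlation matrix, and the sign of a single factor is arbitrary, so the whole column may come out negated relative to R's.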

Tags: python, r, scikit-learn, factor-analysis
