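Annotated official MIMIC-IV SQL and a first exploratory analysis (1): number of patients recorded for each disease
Notes:
MIMIC is a well-known clinical database, and its schema and SQL query conventions make a good template for building a clinical database. This official SQL script computes the Charlson comorbidity index, and to do so it flags a range of chronic diseases. The SQL is followed by a short Python analysis of the query result, which gives a first impression of how common each chronic disease is in the database.
Takeaways:
MIMIC stores diagnoses in long (narrow) form, one row per diagnosis code; the query shows how to turn this into a wide table in a single step.
One disease maps to many codes.
The query shows what WITH temporary tables (common table expressions) are for.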
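The three points above can be tried on a toy table. The sketch below is not the official script: it uses Python's built-in sqlite3 module instead of PostgreSQL, a made-up five-row diagnoses table, and only two of the disease flags. It keeps the same pattern, though: a WITH block splits the codes by ICD version, and MAX(CASE WHEN ... THEN 1 ELSE 0 END) with GROUP BY collapses many code rows into one wide row per admission.

```python
import sqlite3

# Toy narrow table: one row per (admission, diagnosis code). All values are made up.
rows = [
    (1, 9, "41071"),   # ICD-9 410.x  -> myocardial infarction
    (1, 9, "4280"),    # ICD-9 428.x  -> congestive heart failure
    (2, 10, "I214"),   # ICD-10 I21.x -> myocardial infarction
    (2, 10, "I2101"),  # a second MI code on the same admission
    (3, 10, "J449"),   # matches neither of the two flags below
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE diagnoses_icd (hadm_id INTEGER, icd_version INTEGER, icd_code TEXT)"
)
conn.executemany("INSERT INTO diagnoses_icd VALUES (?, ?, ?)", rows)

query = """
WITH diag AS (
    SELECT hadm_id,
           CASE WHEN icd_version = 9  THEN icd_code END AS icd9_code,
           CASE WHEN icd_version = 10 THEN icd_code END AS icd10_code
    FROM diagnoses_icd
)
SELECT hadm_id,
       MAX(CASE WHEN SUBSTR(icd9_code, 1, 3) IN ('410', '412')
                  OR SUBSTR(icd10_code, 1, 3) IN ('I21', 'I22')
                THEN 1 ELSE 0 END) AS myocardial_infarct,
       MAX(CASE WHEN SUBSTR(icd9_code, 1, 3) = '428'
                  OR SUBSTR(icd10_code, 1, 3) IN ('I43', 'I50')
                THEN 1 ELSE 0 END) AS congestive_heart_failure
FROM diag
GROUP BY hadm_id
ORDER BY hadm_id
"""
wide = conn.execute(query).fetchall()
conn.close()

# One tuple per admission: (hadm_id, myocardial_infarct, congestive_heart_failure)
for row in wide:
    print(row)
```

Admission 2 has two codes that both match the myocardial infarction prefixes, yet it still produces a single row with the flag set to 1; that is the effect of MAX over the group.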
-- THIS SCRIPT IS AUTOMATICALLY GENERATED. DO NOT EDIT IT DIRECTLY.
--DROP TABLE IF EXISTS charlson; CREATE TABLE charlson AS
-- ------------------------------------------------------------------
-- This query extracts Charlson Comorbidity Index (CCI) based on the recorded ICD-9 and ICD-10 codes.
--
-- Reference for CCI:
-- (1) Charlson ME, Pompei P, Ales KL, MacKenzie CR. (1987) A new method of classifying prognostic
-- comorbidity in longitudinal studies: development and validation. J Chronic Dis; 40(5):373-83.
--
-- (2) Charlson M, Szatrowski TP, Peterson J, Gold J. (1994) Validation of a combined comorbidity
-- index. J Clin Epidemiol; 47(11):1245-51.
--
-- Reference for ICD-9-CM and ICD-10 Coding Algorithms for Charlson Comorbidities:
-- (3) Quan H, Sundararajan V, Halfon P, et al. Coding algorithms for defining Comorbidities in ICD-9-CM
-- and ICD-10 administrative data. Med Care. 2005 Nov; 43(11): 1130-9.
-- ------------------------------------------------------------------
WITH diag AS -- temporary table (CTE) holding the diagnosis codes
(
    SELECT
        hadm_id -- hospital admission ID
        , CASE WHEN icd_version = 9 THEN icd_code ELSE NULL END AS icd9_code
        , CASE WHEN icd_version = 10 THEN icd_code ELSE NULL END AS icd10_code
    FROM mimic_hosp.diagnoses_icd diag
)
, com AS -- second temporary table, built on the previous one
(
    SELECT
        ad.hadm_id

        -- Myocardial infarction
        -- SUBSTR(string, start, length): starting at character 1, take the first 3 characters;
        -- the meaning of each code is in the d_icd_diagnoses table
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) IN ('410','412')
            OR SUBSTR(icd10_code, 1, 3) IN ('I21','I22')
            OR SUBSTR(icd10_code, 1, 4) = 'I252'
            THEN 1 ELSE 0 END) AS myocardial_infarct

        -- Congestive heart failure
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) = '428'
            OR SUBSTR(icd9_code, 1, 5) IN ('39891','40201','40211','40291','40401','40403','40411','40413','40491','40493')
            OR SUBSTR(icd9_code, 1, 4) BETWEEN '4254' AND '4259'
            OR SUBSTR(icd10_code, 1, 3) IN ('I43','I50')
            OR SUBSTR(icd10_code, 1, 4) IN ('I099','I110','I130','I132','I255','I420','I425','I426','I427','I428','I429','P290')
            THEN 1 ELSE 0 END) AS congestive_heart_failure

        -- Peripheral vascular disease
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) IN ('440','441')
            OR SUBSTR(icd9_code, 1, 4) IN ('0930','4373','4471','5571','5579','V434')
            OR SUBSTR(icd9_code, 1, 4) BETWEEN '4431' AND '4439'
            OR SUBSTR(icd10_code, 1, 3) IN ('I70','I71')
            OR SUBSTR(icd10_code, 1, 4) IN ('I731','I738','I739','I771','I790','I792','K551','K558','K559','Z958','Z959')
            THEN 1 ELSE 0 END) AS peripheral_vascular_disease

        -- Cerebrovascular disease
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) BETWEEN '430' AND '438'
            OR SUBSTR(icd9_code, 1, 5) = '36234'
            OR SUBSTR(icd10_code, 1, 3) IN ('G45','G46')
            OR SUBSTR(icd10_code, 1, 3) BETWEEN 'I60' AND 'I69'
            OR SUBSTR(icd10_code, 1, 4) = 'H340'
            THEN 1 ELSE 0 END) AS cerebrovascular_disease

        -- Dementia
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) = '290'
            OR SUBSTR(icd9_code, 1, 4) IN ('2941','3312')
            OR SUBSTR(icd10_code, 1, 3) IN ('F00','F01','F02','F03','G30')
            OR SUBSTR(icd10_code, 1, 4) IN ('F051','G311')
            THEN 1 ELSE 0 END) AS dementia

        -- Chronic pulmonary disease
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) BETWEEN '490' AND '505'
            OR SUBSTR(icd9_code, 1, 4) IN ('4168','4169','5064','5081','5088')
            OR SUBSTR(icd10_code, 1, 3) BETWEEN 'J40' AND 'J47'
            OR SUBSTR(icd10_code, 1, 3) BETWEEN 'J60' AND 'J67'
            OR SUBSTR(icd10_code, 1, 4) IN ('I278','I279','J684','J701','J703')
            THEN 1 ELSE 0 END) AS chronic_pulmonary_disease

        -- Rheumatic disease
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) = '725'
            OR SUBSTR(icd9_code, 1, 4) IN ('4465','7100','7101','7102','7103','7104','7140','7141','7142','7148')
            OR SUBSTR(icd10_code, 1, 3) IN ('M05','M06','M32','M33','M34')
            OR SUBSTR(icd10_code, 1, 4) IN ('M315','M351','M353','M360')
            THEN 1 ELSE 0 END) AS rheumatic_disease

        -- Peptic ulcer disease
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) IN ('531','532','533','534')
            OR SUBSTR(icd10_code, 1, 3) IN ('K25','K26','K27','K28')
            THEN 1 ELSE 0 END) AS peptic_ulcer_disease

        -- Mild liver disease
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) IN ('570','571')
            OR SUBSTR(icd9_code, 1, 4) IN ('0706','0709','5733','5734','5738','5739','V427')
            OR SUBSTR(icd9_code, 1, 5) IN ('07022','07023','07032','07033','07044','07054')
            OR SUBSTR(icd10_code, 1, 3) IN ('B18','K73','K74')
            OR SUBSTR(icd10_code, 1, 4) IN ('K700','K701','K702','K703','K709','K713','K714','K715','K717','K760','K762','K763','K764','K768','K769','Z944')
            THEN 1 ELSE 0 END) AS mild_liver_disease

        -- Diabetes without chronic complication
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 4) IN ('2500','2501','2502','2503','2508','2509')
            OR SUBSTR(icd10_code, 1, 4) IN ('E100','E101','E106','E108','E109','E110','E111','E116','E118','E119','E120','E121','E126','E128','E129','E130','E131','E136','E138','E139','E140','E141','E146','E148','E149')
            THEN 1 ELSE 0 END) AS diabetes_without_cc

        -- Diabetes with chronic complication
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 4) IN ('2504','2505','2506','2507')
            OR SUBSTR(icd10_code, 1, 4) IN ('E102','E103','E104','E105','E107','E112','E113','E114','E115','E117','E122','E123','E124','E125','E127','E132','E133','E134','E135','E137','E142','E143','E144','E145','E147')
            THEN 1 ELSE 0 END) AS diabetes_with_cc

        -- Hemiplegia or paraplegia
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) IN ('342','343')
            OR SUBSTR(icd9_code, 1, 4) IN ('3341','3440','3441','3442','3443','3444','3445','3446','3449')
            OR SUBSTR(icd10_code, 1, 3) IN ('G81','G82')
            OR SUBSTR(icd10_code, 1, 4) IN ('G041','G114','G801','G802','G830','G831','G832','G833','G834','G839')
            THEN 1 ELSE 0 END) AS paraplegia

        -- Renal disease
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) IN ('582','585','586','V56')
            OR SUBSTR(icd9_code, 1, 4) IN ('5880','V420','V451')
            OR SUBSTR(icd9_code, 1, 4) BETWEEN '5830' AND '5837'
            OR SUBSTR(icd9_code, 1, 5) IN ('40301','40311','40391','40402','40403','40412','40413','40492','40493')
            OR SUBSTR(icd10_code, 1, 3) IN ('N18','N19')
            OR SUBSTR(icd10_code, 1, 4) IN ('I120','I131','N032','N033','N034','N035','N036','N037','N052','N053','N054','N055','N056','N057','N250','Z490','Z491','Z492','Z940','Z992')
            THEN 1 ELSE 0 END) AS renal_disease

        -- Any malignancy, including lymphoma and leukemia, except malignant neoplasm of skin
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) BETWEEN '140' AND '172'
            OR SUBSTR(icd9_code, 1, 4) BETWEEN '1740' AND '1958'
            OR SUBSTR(icd9_code, 1, 3) BETWEEN '200' AND '208'
            OR SUBSTR(icd9_code, 1, 4) = '2386'
            OR SUBSTR(icd10_code, 1, 3) IN ('C43','C88')
            OR SUBSTR(icd10_code, 1, 3) BETWEEN 'C00' AND 'C26'
            OR SUBSTR(icd10_code, 1, 3) BETWEEN 'C30' AND 'C34'
            OR SUBSTR(icd10_code, 1, 3) BETWEEN 'C37' AND 'C41'
            OR SUBSTR(icd10_code, 1, 3) BETWEEN 'C45' AND 'C58'
            OR SUBSTR(icd10_code, 1, 3) BETWEEN 'C60' AND 'C76'
            OR SUBSTR(icd10_code, 1, 3) BETWEEN 'C81' AND 'C85'
            OR SUBSTR(icd10_code, 1, 3) BETWEEN 'C90' AND 'C97'
            THEN 1 ELSE 0 END) AS malignant_cancer

        -- Moderate or severe liver disease
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 4) IN ('4560','4561','4562')
            OR SUBSTR(icd9_code, 1, 4) BETWEEN '5722' AND '5728'
            OR SUBSTR(icd10_code, 1, 4) IN ('I850','I859','I864','I982','K704','K711','K721','K729','K765','K766','K767')
            THEN 1 ELSE 0 END) AS severe_liver_disease

        -- Metastatic solid tumor
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) IN ('196','197','198','199')
            OR SUBSTR(icd10_code, 1, 3) IN ('C77','C78','C79','C80')
            THEN 1 ELSE 0 END) AS metastatic_solid_tumor

        -- AIDS/HIV
        , MAX(CASE WHEN
            SUBSTR(icd9_code, 1, 3) IN ('042','043','044')
            OR SUBSTR(icd10_code, 1, 3) IN ('B20','B21','B22','B24')
            THEN 1 ELSE 0 END) AS aids
    FROM mimic_core.admissions ad
    LEFT JOIN diag
        ON ad.hadm_id = diag.hadm_id
    GROUP BY ad.hadm_id
)
, ag AS
(
    SELECT
        hadm_id
        , age
        , CASE WHEN age <= 40 THEN 0
               WHEN age <= 50 THEN 1
               WHEN age <= 60 THEN 2
               WHEN age <= 70 THEN 3
               ELSE 4 END AS age_score
    FROM mimic_derived.age
)
-- the main query follows
SELECT
    ad.subject_id
    , ad.hadm_id
    , ag.age_score
    , myocardial_infarct
    , congestive_heart_failure
    , peripheral_vascular_disease
    , cerebrovascular_disease
    , dementia
    , chronic_pulmonary_disease
    , rheumatic_disease
    , peptic_ulcer_disease
    , mild_liver_disease
    , diabetes_without_cc
    , diabetes_with_cc
    , paraplegia
    , renal_disease
    , malignant_cancer
    , severe_liver_disease
    , metastatic_solid_tumor
    , aids
    -- Calculate the Charlson Comorbidity Score using the original
    -- weights from Charlson, 1987.
    , age_score
    + myocardial_infarct + congestive_heart_failure + peripheral_vascular_disease
    + cerebrovascular_disease + dementia + chronic_pulmonary_disease
    + rheumatic_disease + peptic_ulcer_disease
    + GREATEST(mild_liver_disease, 3*severe_liver_disease)
    + GREATEST(2*diabetes_with_cc, diabetes_without_cc)
    + GREATEST(2*malignant_cancer, 6*metastatic_solid_tumor)
    + 2*paraplegia + 2*renal_disease + 6*aids
    AS charlson_comorbidity_index
FROM mimic_core.admissions ad
LEFT JOIN com -- reference the temporary table
    ON ad.hadm_id = com.hadm_id
LEFT JOIN ag -- reference the temporary table
    ON com.hadm_id = ag.hadm_id
;
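The scoring arithmetic at the end of the query can be checked by hand. The sketch below restates it in Python for a single admission; the function names age_score and charlson_index and the dictionary input are illustrative choices, not part of MIMIC. Python's max plays the role of SQL's GREATEST, so that when both the mild and the severe form of a condition are flagged, only the heavier weight counts.

```python
def age_score(age):
    """Age points as in the `ag` block: 0 up to 40, then +1 per decade, capped at 4."""
    for points, upper in enumerate((40, 50, 60, 70)):
        if age <= upper:
            return points
    return 4


def charlson_index(age, flags):
    """Charlson index from an age and a dict of 0/1 disease flags (missing keys count as 0)."""
    def f(name):
        return flags.get(name, 0)

    return (
        age_score(age)
        + f("myocardial_infarct") + f("congestive_heart_failure")
        + f("peripheral_vascular_disease") + f("cerebrovascular_disease")
        + f("dementia") + f("chronic_pulmonary_disease")
        + f("rheumatic_disease") + f("peptic_ulcer_disease")
        # max() mirrors GREATEST(): paired conditions contribute only the larger weight
        + max(f("mild_liver_disease"), 3 * f("severe_liver_disease"))
        + max(2 * f("diabetes_with_cc"), f("diabetes_without_cc"))
        + max(2 * f("malignant_cancer"), 6 * f("metastatic_solid_tumor"))
        + 2 * f("paraplegia") + 2 * f("renal_disease") + 6 * f("aids")
    )


# Made-up example: a 65-year-old with an MI, both diabetes flags and both liver flags.
example = {
    "myocardial_infarct": 1,
    "diabetes_without_cc": 1,
    "diabetes_with_cc": 1,
    "mild_liver_disease": 1,
    "severe_liver_disease": 1,
}
score = charlson_index(65, example)
print(score)  # 3 (age) + 1 (MI) + 3 (liver) + 2 (diabetes) = 9
```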
Preliminary analysis
import pandas as pd
df=pd.read_csv('charlson_comm.csv')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 523740 entries, 0 to 523739
Data columns (total 21 columns):
 #   Column                       Non-Null Count   Dtype
---  ------                       --------------   -----
 0   subject_id                   523740 non-null  int64
 1   hadm_id                      523740 non-null  int64
 2   age_score                    523740 non-null  int64
 3   myocardial_infarct           523740 non-null  int64
 4   congestive_heart_failure     523740 non-null  int64
 5   peripheral_vascular_disease  523740 non-null  int64
 6   cerebrovascular_disease      523740 non-null  int64
 7   dementia                     523740 non-null  int64
 8   chronic_pulmonary_disease    523740 non-null  int64
 9   rheumatic_disease            523740 non-null  int64
 10  peptic_ulcer_disease         523740 non-null  int64
 11  mild_liver_disease           523740 non-null  int64
 12  diabetes_without_cc          523740 non-null  int64
 13  diabetes_with_cc             523740 non-null  int64
 14  paraplegia                   523740 non-null  int64
 15  renal_disease                523740 non-null  int64
 16  malignant_cancer             523740 non-null  int64
 17  severe_liver_disease         523740 non-null  int64
 18  metastatic_solid_tumor       523740 non-null  int64
 19  aids                         523740 non-null  int64
 20  charlson_comorbidity_index   523740 non-null  int64
dtypes: int64(21)
memory usage: 83.9 MB
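columns = df.columns[2:-1]  # age_score plus the 17 disease flags
ls_patients_num = []
for col in columns:
    # count how many rows take each value (0/1 for the flags, 0-4 for age_score)
    value = df[col].value_counts()
    ls_patients_num.append(value)
# print(ls_patients_num)
df2 = pd.DataFrame(ls_patients_num)
print(df2)
0 1 2 3 4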
age_score 155046.0 56457.0 84097.0 89482.0 138658.0
myocardial_infarct 487040.0 36700.0 NaN NaN NaN
congestive_heart_failure 455882.0 67858.0 NaN NaN NaN
peripheral_vascular_disease 492644.0 31096.0 NaN NaN NaN
cerebrovascular_disease 494102.0 29638.0 NaN NaN NaN
dementia 511610.0 12130.0 NaN NaN NaN
chronic_pulmonary_disease 439659.0 84081.0 NaN NaN NaN
rheumatic_disease 510475.0 13265.0 NaN NaN NaN
peptic_ulcer_disease 517983.0 5757.0 NaN NaN NaN
mild_liver_disease 489477.0 34263.0 NaN NaN NaN
diabetes_without_cc 440969.0 82771.0 NaN NaN NaN
diabetes_with_cc 487871.0 35869.0 NaN NaN NaN
paraplegia 516222.0 7518.0 NaN NaN NaN
renal_disease 454858.0 68882.0 NaN NaN NaN
malignant_cancer 469985.0 53755.0 NaN NaN NaN
severe_liver_disease 512935.0 10805.0 NaN NaN NaN
metastatic_solid_tumor 501534.0 22206.0 NaN NaN NaN
aids 520348.0 3392.0 NaN NaN NaN
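Note: column 1 is the number of records flagged with the condition. The query returns one row per hospital admission (hadm_id), so these are admission counts, and a patient admitted several times is counted each time.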
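To count distinct patients rather than rows, one option is to collapse the table to one row per subject_id before summing. The sketch below shows the idea on a made-up five-row frame with two of the flag columns; on the real data the same two lines would be applied to df and the full list of disease columns. Counting a patient as having a condition if any of their admissions is flagged is an assumption of this sketch.

```python
import pandas as pd

# Made-up rows shaped like the query output: one row per hospital admission.
toy = pd.DataFrame({
    "subject_id": [100, 100, 101, 102, 102],
    "hadm_id": [1, 2, 3, 4, 5],
    "myocardial_infarct": [1, 1, 0, 0, 0],
    "renal_disease": [0, 1, 0, 1, 0],
})
flag_cols = ["myocardial_infarct", "renal_disease"]

# Flags are 0/1, so a column sum is the number of flagged admissions.
admissions_with_flag = toy[flag_cols].sum()

# A patient counts once if any of their admissions carries the flag.
patients_with_flag = toy.groupby("subject_id")[flag_cols].max().sum()

summary = pd.DataFrame({
    "admissions": admissions_with_flag,
    "patients": patients_with_flag,
})
print(summary)
```

In the toy data, patient 100 has two admissions with a myocardial infarction flag, so the admission count is 2 while the patient count is 1.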