说明:
MIMIC的SQL查询确实够复杂,阻碍了很多想利用数据库做研究的人,也让人对已经做出来的研究的确实性产生怀疑 。
今天是几个官方SQL简单基础的查询,循序渐进, 相信能够更好地掌握。

1.SELECT 需要你知道列的名字;FROM需要你知道表格的名字;WHERE 限定查询的范围;GROUPBY 确定索引。表格位于hosp模块中。

-- begin query that extracts the data--本查询的目的是查询两个酶的表达
SELECTMAX(subject_id) AS subject_id--MAX是从多条记录中取一条,最新的。, MAX(hadm_id) AS hadm_id, MAX(charttime) AS charttime, le.specimen_id-- convert from itemid into a meaningful column, MAX(CASE WHEN itemid = 51002 THEN value ELSE NULL END) AS troponin_i--这里的MAX一方面是求最大值,另一方面是多条记录转单条记录的操作, MAX(CASE WHEN itemid = 51003 THEN value ELSE NULL END) AS troponin_t--CASE WHEN 这里是纵向转横向的一个操作,很多地方都见过。, MAX(CASE WHEN itemid = 50911 THEN valuenum ELSE NULL END) AS ck_mb
FROM mimic_hosp.labevents le
WHERE le.itemid IN
(-- 51002, -- Troponin I (troponin-I is not measured in MIMIC-IV)-- 52598, -- Troponin I, point of care, rare/poor quality51003, -- Troponin T50911  -- Creatinine Kinase, MB isoenzyme
)
GROUP BY le.specimen_id
--按照每个样本分组,查询结果是specimen_id的值是唯一的,其他列可能有重复值。
--让我想起索引的操作,一系列的数据中是不是总是要有个作为索引呢?
;

2.另外一个完全相似的查询语句例子,位于hosp模块中, 表格是labevents

SELECTMAX(subject_id) AS subject_id, MAX(hadm_id) AS hadm_id, MAX(charttime) AS charttime, le.specimen_id-- convert from itemid into a meaningful column, MAX(CASE WHEN itemid = 51221 THEN valuenum ELSE NULL END) AS hematocrit, MAX(CASE WHEN itemid = 51222 THEN valuenum ELSE NULL END) AS hemoglobin, MAX(CASE WHEN itemid = 51248 THEN valuenum ELSE NULL END) AS mch, MAX(CASE WHEN itemid = 51249 THEN valuenum ELSE NULL END) AS mchc, MAX(CASE WHEN itemid = 51250 THEN valuenum ELSE NULL END) AS mcv, MAX(CASE WHEN itemid = 51265 THEN valuenum ELSE NULL END) AS platelet, MAX(CASE WHEN itemid = 51279 THEN valuenum ELSE NULL END) AS rbc, MAX(CASE WHEN itemid = 51277 THEN valuenum ELSE NULL END) AS rdw, MAX(CASE WHEN itemid = 52159 THEN valuenum ELSE NULL END) AS rdwsd, MAX(CASE WHEN itemid = 51301 THEN valuenum ELSE NULL END) AS wbc
FROM mimic_hosp.labevents le
--经常用到的表格也就那么几个,hosp模块中的itemid(列)常作为限定条件,
--而valuenum(列)和value(列)常被取出作为分析目标。
WHERE le.itemid IN
(51221, -- hematocrit51222, -- hemoglobin51248, -- MCH51249, -- MCHC51250, -- MCV51265, -- platelets51279, -- RBC51277, -- RDW52159, -- RDW SD51301  -- WBC)
AND valuenum IS NOT NULL--非空,去除缺失值
-- lab values cannot be 0 and cannot be negative
AND valuenum > 0 --去除异常值
GROUP BY le.specimen_id--建立数据索引
;

3.位于icu模块中,表格是chartevents
用到了avg这个函数, 不再是MAX, 所以连续性指标, 什么指标求平均,什么指标求MAX, 恐怕要根据医学知识来判断。分类的指标一般用MAX

-- This query pivots the vital signs for the entire patient stay.
-- Vital signs include heart rate, blood pressure, respiration rate, and temperature 这里是生命体征的测量, 求的是平均值
selectce.subject_id, ce.stay_id, ce.charttime, AVG(case when itemid in (220045) and valuenum > 0 and valuenum < 300 then valuenum else null end) as heart_rate, AVG(case when itemid in (220179,220050) and valuenum > 0 and valuenum < 400 then valuenum else null end) as sbp, AVG(case when itemid in (220180,220051) and valuenum > 0 and valuenum < 300 then valuenum else null end) as dbp, AVG(case when itemid in (220052,220181,225312) and valuenum > 0 and valuenum < 300 then valuenum else null end) as mbp, AVG(case when itemid = 220179 and valuenum > 0 and valuenum < 400 then valuenum else null end) as sbp_ni, AVG(case when itemid = 220180 and valuenum > 0 and valuenum < 300 then valuenum else null end) as dbp_ni, AVG(case when itemid = 220181 and valuenum > 0 and valuenum < 300 then valuenum else null end) as mbp_ni, AVG(case when itemid in (220210,224690) and valuenum > 0 and valuenum < 70 then valuenum else null end) as resp_rate, ROUND(AVG(case when itemid in (223761) and valuenum > 70 and valuenum < 120 then (valuenum-32)/1.8 -- converted to degC in valuenum callwhen itemid in (223762) and valuenum > 10 and valuenum < 50  then valuenum else null end), 2) as temperature, MAX(CASE WHEN itemid = 224642 THEN value ELSE NULL END) AS temperature_site--分类, AVG(case when itemid in (220277) and valuenum > 0 and valuenum <= 100 then valuenum else null end) as spo2--SELECT内去除异常值, AVG(case when itemid in (225664,220621,226537) and valuenum > 0 then valuenum else null end) as glucoseFROM mimic_icu.chartevents ce --chawhere ce.stay_id IS NOT NULLand ce.itemid in(220045, -- Heart Rate225309, -- ART BP Systolic225310, -- ART BP Diastolic225312, -- ART BP Mean220050, -- Arterial Blood Pressure systolic220051, -- Arterial Blood Pressure diastolic220052, -- Arterial Blood Pressure mean220179, -- Non Invasive Blood Pressure systolic220180, -- Non Invasive Blood Pressure diastolic220181, -- Non Invasive Blood Pressure mean220210, -- Respiratory Rate224690, -- Respiratory Rate (Total)220277, -- SPO2, peripheral-- GLUCOSE, both lab and fingerstick225664, -- Glucose finger stick220621, -- Glucose (serum)226537, -- Glucose (whole blood)-- TEMPERATURE223762, -- "Temperature Celsius"223761,  -- "Temperature Fahrenheit"224642 -- Temperature Site-- 226329 -- Blood Temperature CCO (C)
)--这些编码能不能用在我们自建的数据库中?
group by ce.subject_id, ce.stay_id, ce.charttime
;

SQL如何计算公式,临时表的使用,

-- This query extracts the serum creatinine baselines of adult patients on each hospital admission.
-- The baseline is determined by the following rules:
--     i. if the lowest creatinine value during this admission is normal (<1.1), then use the value
--     ii. if the patient is diagnosed with chronic kidney disease (CKD), then use the lowest creatinine value during the admission, although it may be rather large.
--     iii. Otherwise, we estimate the baseline using the Simplified MDRD Formula:
--          eGFR = 186 × Scr^(-1.154) × Age^(-0.203) × 0.742Female
--     Let eGFR = 75. Scr = [ 75 / 186 / Age^(-0.203) / (0.742Female) ] ^ (1/-1.154)
WITH p as
(SELECT ag.subject_id, ag.hadm_id, ag.age, p.gender, CASE WHEN p.gender='F' THEN POWER(75.0 / 186.0 / POWER(ag.age, -0.203) / 0.742, -1/1.154)ELSE POWER(75.0 / 186.0 / POWER(ag.age, -0.203), -1/1.154)END AS MDRD_estFROM `physionet-data.mimic_derived.age` agLEFT JOIN `physionet-data.mimic_core.patients` pON ag.subject_id = p.subject_idWHERE ag.age >= 18
)
, lab as
(SELECT hadm_id, MIN(creatinine) AS scr_minFROM `physionet-data.mimic_derived.chemistry`GROUP BY hadm_id
)
, ckd as --建立一个临时表, 用诊断限定查询的范围
(SELECT hadm_id, MAX(1) AS CKD_flagFROM mimic_hosp.diagnoses_icd --另外一个常用的表格,其中的icd_code和icd_version常结合使用。WHERE (SUBSTR(icd_code, 1, 3) = '585'--String数据的处理AND icd_version = 9)OR (SUBSTR(icd_code, 1, 3) = 'N18'AND icd_version = 10)GROUP BY 1
)
--主查询的功能是罗列查询的列,连接各个表格,较复杂的都在临时表处理好
SELECT p.hadm_id, p.gender, p.age, lab.scr_min, COALESCE(ckd.ckd_flag, 0) AS ckd, p.MDRD_est, CASE WHEN lab.scr_min<=1.1 THEN scr_minWHEN ckd.ckd_flag=1 THEN scr_minELSE MDRD_est END AS scr_baseline
FROM p
LEFT JOIN lab
ON p.hadm_id = lab.hadm_id
LEFT JOIN ckd
ON p.hadm_id = ckd.hadm_id
;

单位转换;
一列中又不同来源的数据(单位不同),分开处理,再合并。
去除超范围的值。
icu模块中重要的表格叫做chartevents,其中的itemid(列)和valuenum是最重要的列。

-- prep height
WITH ht_in AS
(SELECT c.subject_id, c.stay_id, c.charttime-- Ensure that all heights are in centimeters 单位的一个转换,为了后续的合并, ROUND(c.valuenum * 2.54, 2) AS height, c.valuenum as height_origFROM `physionet-data.mimic_icu.chartevents` cWHERE c.valuenum IS NOT NULL-- Height (measured in inches)AND c.itemid = 226707
)
, ht_cm AS--两个临时表用逗号分隔
(SELECT c.subject_id, c.stay_id, c.charttime-- Ensure that all heights are in centimeters, ROUND(c.valuenum, 2) AS heightFROM `physionet-data.mimic_icu.chartevents` cWHERE c.valuenum IS NOT NULL-- Height cmAND c.itemid = 226730
)
-- merge cm/height, only take 1 value per charted row
, ht_stg0 AS
(SELECTCOALESCE(h1.subject_id, h1.subject_id) as subject_id--coalesce()的英文意思是合并的作用是:返回传入的参数中第一个非null的值, COALESCE(h1.stay_id, h1.stay_id) AS stay_id, COALESCE(h1.charttime, h1.charttime) AS charttime, COALESCE(h1.height, h2.height) as heightFROM ht_cm h1FULL OUTER JOIN ht_in h2ON h1.subject_id = h2.subject_idAND h1.charttime = h2.charttime
)
--主查询
SELECT subject_id, stay_id, charttime, height
FROM ht_stg0
WHERE height IS NOT NULL--去掉缺失值
-- filter out bad heights 过滤有可能出错的数据
AND height > 120 AND height < 230;
selectstay_id, charttime, sum(urineoutput) as urineoutput
from
(select-- patient identifiersoe.stay_id, oe.charttime-- volumes associated with urine output ITEMIDs-- note we consider input of GU irrigant as a negative volume-- GU irrigant volume in usually has a corresponding volume out-- so the net is often 0, despite large irrigant volumes, casewhen oe.itemid = 227488 and oe.value > 0 then -1*oe.valueelse oe.valueend as urineoutputfrom `physionet-data.mimic_icu.outputevents` oewhere itemid in(226559, -- Foley226560, -- Void226561, -- Condom Cath226584, -- Ileoconduit226563, -- Suprapubic226564, -- R Nephrostomy226565, -- L Nephrostomy226567, -- Straight Cath226557, -- R Ureteral Stent226558, -- L Ureteral Stent227488, -- GU Irrigant Volume In227489  -- GU Irrigant/Urine Volume Out)
) uo
group by stay_id, charttime
;
-- For reference, some common unit conversions:
-- 10^9/L == K/uL == 10^3/uL
WITH blood_diff AS
(
SELECTMAX(subject_id) AS subject_id--分类变量都用MAX, MAX(hadm_id) AS hadm_id, MAX(charttime) AS charttime, le.specimen_id-- create one set of columns for percentages, and one set of columns for counts-- we harmonize all count units into K/uL == 10^9/L-- counts have an "_abs" suffix, percentages do not-- absolute counts, MAX(CASE WHEN itemid in (51300, 51301, 51755) THEN valuenum ELSE NULL END) AS wbc, MAX(CASE WHEN itemid = 52069 THEN valuenum ELSE NULL END) AS basophils_abs-- 52073 in K/uL, 51199 in #/uL, MAX(CASE WHEN itemid = 52073 THEN valuenum WHEN itemid = 51199 THEN valuenum / 1000.0 ELSE NULL END) AS eosinophils_abs-- 51133 in K/uL, 52769 in #/uL, MAX(CASE WHEN itemid = 51133 THEN valuenum WHEN itemid = 52769 THEN valuenum / 1000.0 ELSE NULL END) AS lymphocytes_abs-- 52074 in K/uL, 51253 in #/uL, MAX(CASE WHEN itemid = 52074 THEN valuenum WHEN itemid = 51253 THEN valuenum / 1000.0 ELSE NULL END) AS monocytes_abs, MAX(CASE WHEN itemid = 52075 THEN valuenum ELSE NULL END) AS neutrophils_abs-- convert from #/uL to K/uL, MAX(CASE WHEN itemid = 51218 THEN valuenum / 1000.0 ELSE NULL END) AS granulocytes_abs-- percentages, equal to cell count / white blood cell count, MAX(CASE WHEN itemid = 51146 THEN valuenum ELSE NULL END) AS basophils, MAX(CASE WHEN itemid = 51200 THEN valuenum ELSE NULL END) AS eosinophils, MAX(CASE WHEN itemid in (51244, 51245) THEN valuenum ELSE NULL END) AS lymphocytes, MAX(CASE WHEN itemid = 51254 THEN valuenum ELSE NULL END) AS monocytes, MAX(CASE WHEN itemid = 51256 THEN valuenum ELSE NULL END) AS neutrophils-- other cell count percentages, MAX(CASE WHEN itemid = 51143 THEN valuenum ELSE NULL END) AS atypical_lymphocytes, MAX(CASE WHEN itemid = 51144 THEN valuenum ELSE NULL END) AS bands, MAX(CASE WHEN itemid = 52135 THEN valuenum ELSE NULL END) AS immature_granulocytes, MAX(CASE WHEN itemid = 51251 THEN valuenum ELSE NULL END) AS metamyelocytes, MAX(CASE WHEN itemid = 51257 THEN valuenum ELSE NULL END) AS nrbc-- utility flags which determine whether imputation is possible, CASE-- WBC is availableWHEN MAX(CASE WHEN itemid in (51300, 51301, 51755) THEN valuenum ELSE NULL END) > 0-- and we have at least one percentage from the diff-- sometimes the entire diff is 0%, which looks like bad dataAND SUM(CASE WHEN itemid IN (51146, 51200, 51244, 51245, 51254, 51256) THEN valuenum ELSE NULL END) > 0THEN 1 ELSE 0 END AS impute_absFROM mimic_hosp.labevents le
WHERE le.itemid IN
(51146, -- basophils52069, -- Absolute basophil count51199, -- Eosinophil Count51200, -- Eosinophils52073, -- Absolute Eosinophil count51244, -- Lymphocytes51245, -- Lymphocytes, Percent51133, -- Absolute Lymphocyte Count52769, -- Absolute Lymphocyte Count51253, -- Monocyte Count51254, -- Monocytes52074, -- Absolute Monocyte Count51256, -- Neutrophils52075, -- Absolute Neutrophil Count51143, -- Atypical lymphocytes51144, -- Bands (%)51218, -- Granulocyte Count52135, -- Immature granulocytes (%)51251, -- Metamyelocytes51257,  -- Nucleated Red Cells-- wbc totals measured in K/uL51300, 51301, 51755-- 52220 (wbcp) is percentage-- below are point of care tests which are extremely infrequent and usually low quality-- 51697, -- Neutrophils (mmol/L)-- below itemid do not have data as of MIMIC-IV v1.0-- 51536, -- Absolute Lymphocyte Count-- 51537, -- Absolute Neutrophil-- 51690, -- Lymphocytes-- 52151, -- NRBC
)
AND valuenum IS NOT NULL
-- differential values cannot be negative
AND valuenum >= 0
GROUP BY le.specimen_id
)
--以下是主查询
SELECT
subject_id, hadm_id, charttime, specimen_id, wbc
-- impute absolute count if percentage & WBC is available
, ROUND(CASEWHEN basophils_abs IS NULL AND basophils IS NOT NULL AND impute_abs = 1THEN basophils * wbcELSE basophils_abs
END, 4) AS basophils_abs
, ROUND(CASEWHEN eosinophils_abs IS NULL AND eosinophils IS NOT NULL AND impute_abs = 1THEN eosinophils * wbcELSE eosinophils_abs
END, 4) AS eosinophils_abs
, ROUND(CASEWHEN lymphocytes_abs IS NULL AND lymphocytes IS NOT NULL AND impute_abs = 1THEN lymphocytes * wbcELSE lymphocytes_abs
END, 4) AS lymphocytes_abs
, ROUND(CASEWHEN monocytes_abs IS NULL AND monocytes IS NOT NULL AND impute_abs = 1THEN monocytes * wbcELSE monocytes_abs
END, 4) AS monocytes_abs
, ROUND(CASEWHEN neutrophils_abs IS NULL AND neutrophils IS NOT NULL AND impute_abs = 1THEN neutrophils * wbcELSE neutrophils_abs
END, 4) AS neutrophils_abs, basophils
, eosinophils
, lymphocytes
, monocytes
, neutrophils-- impute bands/blasts?
, atypical_lymphocytes
, bands
, immature_granulocytes
, metamyelocytes
, nrbc
FROM blood_diff
;

MIMIC-iv官方SQL查询标注(简单基础篇)相关推荐

  1. MIMIC数据库官方SQL查询标注和初步分析--sofa评分(2-19)

    说明: 'SOFA'的意思是序贯器官衰竭评分表. 本SQL的目的就是查出计算Sepsis3的相关的资料. 展示了MIMIC查询的一种策略,即在官方SQL的基础上进一步进行查询,本SQL是在SOFA.s ...

  2. 剖析SQL Server 2005查询通知之基础篇

    剖析SQL Server 2005查询通知之基础篇 注:这是本人转载的文章. 摘要 在本系列文章中,我们将深入探讨如何把.NET 2.0和SQL Server 2005的查询通知特征联合起来,以便通知 ...

  3. SQL常用语句(基础篇)

    SQL常用语句(基础篇) 说明:创建数据库 CREATE DATABASE database-name 说明:删除数据库 drop database dbname 说明:备份sql server -创 ...

  4. MIMIC-iv数据库官方SQL查询语句标注和初步探索性分析(一)--多种疾病的收录的人数

    说明: MIMIC 是知名的临床数据库,库的结构和SQL查询的数据的方法是很好的一个建立临床数据库的模板.这个官方的SQL为了进行charlson 评分,查询了多种慢性疾病.SQL代码后有对查询到的结 ...

  5. SQL注入葵花宝典(基础篇)

    基础篇重在一些基本的概念和步骤 高级篇重在于一些技巧 许多刚入门的小菜们可能会经常听到SQL注入这个词汇, 但更多人的可能对SQL并不怎么了解,只是感觉是一种 很神秘的******手段,所以我就想系统 ...

  6. 如何以正确地姿势AK SQL查询50题(精华篇)

    前言------ AK: ALL Killed 这学期学数据库系统概论,作为一个初学者,必须先熟练掌握SQL的基本查询语句,在完成了老师的SQL50题之后,我把我自己地AK姿势记录下来, 便于回忆之余 ...

  7. mysql的经典sql语句大全_经典SQL语句大全_基础篇_提升篇_技巧篇_开发经典篇

    一.基础 1.说明:创建数据库 CREATE DATABASE database-name 2.说明:删除数据库 drop database dbname 3.说明:备份sql server --- ...

  8. mysql sql注入工具_基础篇——SQL注入(工具注入)

    注入工具 上一篇介绍了SQL注入漏洞以及一些手工注入方法,本篇介绍一下注入工具 1.sqlmap sqlmap属于比较经典的一个注入工具,功能强大,还自带一些绕过参数 使用方法: sqlmap.py ...

  9. MS SQL Server 数据库(基础篇)

    SQL Server数据库 1.0 数据库及数据库系统 1.1数据库是什么 1.2 数据库系统(DBS) 1.3 数据库管理系统(DBMS) 1.4 数据库的作用 1.5 应用程序 1.6 注 2.0 ...

  10. sql查询文章上一篇下一篇

    遇到显示文章时显示上一篇.下一篇的id.title提示需求,经常就是取到当前文章,然后再根据发布时间分别取到上一篇.下一篇文章,然是这样要交互三次数据库,想写一个sql一次执行出来,原来写的是 SEL ...

最新文章

  1. 解决虚拟机 Virtualbox 中 linux 共享文件夹只读问题
  2. python walk_Python os.walk()方法
  3. linux配置静态ip
  4. 你们公司内部有WiKi么
  5. 基于Redis优化首页轮播图查询
  6. 如何校验内存数据的一致性,DynamicExpresso 算是帮上大忙了
  7. 在.Net Core WebAPI下给Swagger增加导出离线文档功能
  8. 拿高薪的web前端都是怎么学的?
  9. Atitit.mysql oracle with as模式临时表模式 CTE 语句的使用,减少子查询的结构性 mssql sql server..
  10. 从AWS到阿里云: 产品体系差异分析 | 凌云时刻
  11. 中巨伟业加密芯片—SMEC98SP (SE98)在产品授权生产防复制的应用方法
  12. 修复SSH Weak Algorithms Supported漏洞
  13. CentOS下安装EDM工具
  14. 考研英语 词根词缀单词71-80
  15. 清除APP 数据的时候出现Crash的情况分析
  16. Qt实现 QOpenGL绘制彩色三角形
  17. android 6.0 拨号界面,【Dialer】android6.0拨号界面分析一
  18. 火星惊现「神秘入口」?好奇号发现的这道门,玉兔二号看着挺眼熟
  19. android 实现视频播放功能,android开发之简单视频播放器(VideoView)
  20. Python制作简易计算器(GUI)---Tkinter

热门文章

  1. 扔掉代码,程序员月薪达到了10k+
  2. codeforces竞赛1141题解
  3. application/json和application/x-www-form-urlencoded使用选择
  4. python 自动登录网站_Python使用selenium实现网页用户名 密码 验证码自动登录功能...
  5. 增强无线路由器信号的方法与技巧
  6. PC端微信的文件夹都是些什么
  7. 做产品也要造概念,讲故事,用优雅的措辞美化自己
  8. linux给目录分配空间,Linux 分配/home的磁盘空间给根目录
  9. 求告知pycharm的这条边界线怎么设置
  10. TI AWR1642学习笔记7 TSW1400采集原始数据