Storm Event Imagery Dataset: Data Overview
The Storm EVent ImagRy (SEVIR) dataset is a collection of spatiotemporally aligned satellite (GOES-16) and radar (NEXRAD weather radar) imagery of weather events. It is hosted on the AWS Registry of Open Data, from which it can be downloaded.
Download: https://registry.opendata.aws/sevir/
Tutorial: https://nbviewer.jupyter.org/github/MIT-AI-Accelerator/eie-sevir/blob/master/examples/SEVIR_Tutorial.ipynb
SEVIR Challenges (BETA): https://sevir.mit.edu
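Data format: each event is a 4-hour sequence sampled at 5-minute intervals, covering a 384 km x 384 km patch over the United States and observed by five sensor types.
The five sensor types are:
Three GOES-16 imager channels (vis, ir069, ir107), lightning flashes detected by the GOES-16 Geostationary Lightning Mapper (GLM), and ground-based radar data (vil). All types except lght are raster (image) types, stored as L x L x 49 arrays, where L is the image size and 49 is the number of time steps (240 / 5 + 1 = 49).
lght is stored as an N x 5 matrix with one row per flash, holding the time and location of each flash.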
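As a quick check of the time dimension: a 4-hour window sampled every 5 minutes includes both endpoints, so it has 240 / 5 + 1 frames. The sketch below computes each frame's offset in minutes from the first frame; `frame_offsets` is a hypothetical helper, not part of the SEVIR code.

```python
def frame_offsets(duration_min=240, step_min=5):
    """Minute offset of every frame from the first frame (hypothetical helper)."""
    # Both endpoints (0 and duration_min) are frames, hence duration / step + 1 frames.
    return list(range(0, duration_min + 1, step_min))

offsets = frame_offsets()
print(len(offsets))  # 49
print(offsets[-1])   # 240
```

For example, frame index 30, used in the plotting code further below, lies 150 minutes after the first frame.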
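NOAA Storm Events Database
Many events in SEVIR can be augmented with data collected by the National Weather Service (NWS). About 20% of SEVIR cases were selected based on "storm events" logged by the NWS,
which are archived at https://www.ncdc.noaa.gov/stormevents. This database includes additional information, such as the type of severe weather (e.g. high wind, tornado, hail).
For example, below is an entry from the NWS Storm Events Database associated with the event shown in Figure 1.
The entry's text describes the time, location and other details of the event.
Every event in SEVIR has an id. Ids starting with S are associated with NWS storm events; ids starting with R belong to randomly sampled events unrelated to NWS storm events.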
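Because the prefix encodes how an event was selected, it can be checked in code. `event_source` below is a hypothetical helper, not part of SEVIR; it also accepts bytes, since h5py returns ids such as `b'S728503'` (see the h5py example later in this article).

```python
def event_source(event_id):
    """Classify a SEVIR event id by its prefix (hypothetical helper)."""
    # h5py may hand back ids as bytes, e.g. b'S728503'
    if isinstance(event_id, bytes):
        event_id = event_id.decode("ascii")
    if event_id.startswith("S"):
        return "storm"   # linked to an NWS storm event
    if event_id.startswith("R"):
        return "random"  # randomly sampled, not linked to NWS
    raise ValueError(f"unexpected SEVIR event id: {event_id!r}")

print(event_source(b"S728503"))  # storm
print(event_source("S858968"))   # storm
```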
Downloading SEVIR
If you have the AWS command-line tools installed, you can download the whole dataset directly with:
aws s3 sync --no-sign-request s3://sevir .
To download a specific modality, e.g. vil, run:
aws s3 cp --no-sign-request s3://sevir/CATALOG.csv CATALOG.csv
aws s3 sync --no-sign-request s3://sevir/data/vil .
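When several modalities are needed, the same two commands can be generated per modality. The sketch below is a hypothetical wrapper (`sevir_sync_commands` and `fetch` are not part of SEVIR) that only assembles the `aws s3` commands shown above and runs them through `subprocess` when `dry_run=False`. Syncing into `<dest>/data/<modality>` instead of `.` is an assumption, chosen so the files end up matching the `DATA_PATH/vil/2018/...` paths used in the code examples later in this article.

```python
import shlex
import subprocess

def sevir_sync_commands(modalities, dest="."):
    """Build AWS CLI commands for CATALOG.csv plus the selected modalities (hypothetical helper)."""
    cmds = [["aws", "s3", "cp", "--no-sign-request",
             "s3://sevir/CATALOG.csv", f"{dest}/CATALOG.csv"]]
    for modality in modalities:
        # Assumed layout: <dest>/data/<modality>/<year>/SEVIR_*.h5
        cmds.append(["aws", "s3", "sync", "--no-sign-request",
                     f"s3://sevir/data/{modality}", f"{dest}/data/{modality}"])
    return cmds

def fetch(modalities, dest=".", dry_run=True):
    """Print each command; execute it only when dry_run is False (needs the AWS CLI on PATH)."""
    for cmd in sevir_sync_commands(modalities, dest):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

fetch(["vil", "lght"])  # dry run: only prints the three commands
```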
Data description:
SEVIR consists of two parts:
- Catalog: A CSV file with rows describing the metadata of an event
- Data Files: A set of HDF5 files containing events for a certain sensor type
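Note: additional storm event ids may overlap with a 4-hour SEVIR event, but the catalog provides only one such id. The others can be found by comparing the time and location of storm events with the metadata of the SEVIR event.
All projections in SEVIR are (currently) Lambert Azimuthal Equal-Area. This may change if future data use a different projection.
H5 file structure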
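A rough version of that comparison is sketched below. `storm_overlaps` is a hypothetical helper: the bounding box uses the catalog's corner columns (`llcrnrlat`, `llcrnrlon`, `urcrnrlat`, `urcrnrlon`), but a lat/lon box only approximates the projected square patch, and how `time_utc` relates to the 4-hour window is not stated here, so the caller passes the window start and end explicitly. All values in the usage lines are made up.

```python
from datetime import datetime, timedelta

def storm_overlaps(window_start, window_end, bbox, storm_time, storm_lat, storm_lon):
    """Rough check that a storm report falls inside a SEVIR event (hypothetical helper).

    bbox is (llcrnrlat, llcrnrlon, urcrnrlat, urcrnrlon) from the catalog row.
    The lat/lon box only approximates the projected patch, so hits near edges need care.
    """
    llat, llon, ulat, ulon = bbox
    in_time = window_start <= storm_time <= window_end
    in_space = llat <= storm_lat <= ulat and llon <= storm_lon <= ulon
    return in_time and in_space

# Made-up 4-hour window and bounding box, for illustration only
start = datetime(2018, 6, 1, 12, 0)
end = start + timedelta(hours=4)
bbox = (45.0, -95.0, 48.5, -90.0)
print(storm_overlaps(start, end, bbox, datetime(2018, 6, 1, 14, 30), 46.7867, -92.1005))  # True
print(storm_overlaps(start, end, bbox, datetime(2018, 6, 1, 17, 0), 46.7867, -92.1005))   # False
```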
data
+-- img_type
| +-- year
File names
SEVIR_<IMG_TYPE>_<EVENT_TYPE>_<YEAR>_<START>_<END>.h5
where
<IMG_TYPE> is the image or sensor type (vil, vis, ir069, etc.),
<EVENT_TYPE> is either STORMEVENTS, if the events are linked with NWS storm events, RANDOMEVENTS, if the events are randomly sampled, or ALLEVENTS, if the two types are mixed into the same file.
<YEAR>_<START>_<END> denotes the year, start date and end date of the events in the file.
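The naming scheme can be parsed with a regular expression. The sketch below is a hypothetical helper (`parse_sevir_filename` is not part of SEVIR); it assumes `<START>` and `<END>` are four-digit MMDD strings, as in the example file name used in this article.

```python
import re

# SEVIR_<IMG_TYPE>_<EVENT_TYPE>_<YEAR>_<START>_<END>.h5 (START/END assumed to be MMDD)
FILENAME_RE = re.compile(
    r"SEVIR_(?P<img_type>[A-Z0-9]+)_(?P<event_type>STORMEVENTS|RANDOMEVENTS|ALLEVENTS)"
    r"_(?P<year>\d{4})_(?P<start>\d{4})_(?P<end>\d{4})\.h5$"
)

def parse_sevir_filename(name):
    """Split a SEVIR data file name into its fields (hypothetical helper)."""
    m = FILENAME_RE.search(name)
    if m is None:
        raise ValueError(f"not a SEVIR data file name: {name!r}")
    info = m.groupdict()
    # File names use upper case (VIL) while the H5 datasets use lower case (vil)
    info["img_type"] = info["img_type"].lower()
    return info

print(parse_sevir_filename("SEVIR_VIL_STORMEVENTS_2018_0101_0630.h5"))
# {'img_type': 'vil', 'event_type': 'STORMEVENTS', 'year': '2018', 'start': '0101', 'end': '0630'}
```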
Each file contains the ids of the events, as well as the event data. For example, a summary of one such file is:
$ h5ls SEVIR_VIL_STORMEVENTS_2018_0101_0630.h5
id                       Dataset {793}
vil                      Dataset {793, 384, 384, 49}
All raster types in SEVIR (vis, ir069, ir107 and vil) store the data as 4D tensors with shape N x L x L x T, where N is the number of events in the file, L x L is the image size (see the patch size column in Table 1), and T is the number of time steps in the image sequence.
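Since every patch spans 384 km, the pixel size follows directly from L. The sketch below does that arithmetic and shows how the 4D tensor is indexed. Only two patch sizes appear in this article (384 for vil in the h5ls output above, and the 48 x 48 lightning grid used by `lght_to_grid` below); the vis, ir069 and ir107 sizes are assumptions to be checked against Table 1. The zero array is a stand-in for an H5 dataset.

```python
import numpy as np

PATCH_KM = 384
# Assumed patch sizes L (verify against Table 1); vil and lght match values used in this article
PATCH_SIZE = {"vis": 768, "ir069": 192, "ir107": 192, "vil": 384, "lght": 48}

for img_type, L in PATCH_SIZE.items():
    print(f"{img_type}: {L}x{L} pixels, {PATCH_KM / L:g} km per pixel")

# Raster datasets are indexed as [event, row, col, time]; uint8 just keeps this stand-in small
vil_file = np.zeros((1, 384, 384, 49), dtype=np.uint8)
event = vil_file[0]       # one event: L x L x T
frame = event[:, :, 30]   # one 5-minute frame: L x L
print(event.shape, frame.shape)  # (384, 384, 49) (384, 384)
```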
Lightning Data
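Working with the data
This section contains code examples for working with SEVIR data. SEVIR data can be downloaded and accessed from local storage or through Amazon S3 cloud storage. This tutorial assumes the data has already been downloaded to a local or network file system. The examples below require the following variables to be set:
# Enter path to the SEVIR data location
DATA_PATH = '../data'
CATALOG_PATH = '../CATALOG.csv'

# On some Linux systems setting file locking to false is also necessary:
import os
os.environ["HDF5_USE_FILE_LOCKING"] = 'FALSE'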
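The following code shows how to open an H5 data file directly with the h5py module, without using the catalog, and visualize frames of a single event of one image type: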
import os
import h5py  # needs conda/pip install h5py
import matplotlib.pyplot as plt

file_index = 0
with h5py.File('%s/vil/2018/SEVIR_VIL_STORMEVENTS_2018_0101_0630.h5' % DATA_PATH, 'r') as hf:
    event_id = hf['id'][file_index]
    vil = hf['vil'][file_index]

print('Event ID:', event_id)
print('Image shape:', vil.shape)

fig, axs = plt.subplots(1, 4, figsize=(10, 5))
axs[0].imshow(vil[:, :, 10])
axs[1].imshow(vil[:, :, 20])
axs[2].imshow(vil[:, :, 30])
axs[3].imshow(vil[:, :, 40])
plt.show()

Event ID: b'S728503'
Image shape: (384, 384, 49)
The following code shows how to extract events observed by all four image types (vis, vil, ir069 and ir107):
import h5py  # needs conda/pip install h5py
import matplotlib.pyplot as plt
import pandas as pd

# Read catalog
catalog = pd.read_csv(CATALOG_PATH, parse_dates=['time_utc'], low_memory=False)

# Desired image types
img_types = set(['vis', 'ir069', 'ir107', 'vil'])

# Group by event id, and filter to only events that have all desired img_types
events = catalog.groupby('id').filter(lambda x: img_types.issubset(set(x['img_type']))).groupby('id')
event_ids = list(events.groups.keys())
print('Found %d events matching' % len(event_ids), img_types)

# Grab a sample event and view catalog entries
sample_event = events.get_group(event_ids[-1])
print('Sample Event:', event_ids[-1])
sample_event
Found 12739 events matching {'ir069', 'ir107', 'vis', 'vil'}
Sample Event: S858968
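The `groupby('id').filter(...)` line above keeps only the catalog rows whose id has every desired image type. To make that logic explicit, here is a pure-Python equivalent over `(id, img_type)` pairs; `events_with_all_types` is a hypothetical helper and the rows are invented for illustration.

```python
from collections import defaultdict

def events_with_all_types(rows, wanted):
    """Ids whose catalog rows cover every wanted img_type (hypothetical helper)."""
    seen = defaultdict(set)
    for event_id, img_type in rows:
        seen[event_id].add(img_type)
    # Same test as img_types.issubset(set(x['img_type'])) in the pandas version
    return sorted(eid for eid, types in seen.items() if wanted <= types)

# Made-up catalog rows, for illustration only
rows = [
    ("S1", "vis"), ("S1", "ir069"), ("S1", "ir107"), ("S1", "vil"),
    ("R1", "vil"), ("R1", "lght"),
]
print(events_with_all_types(rows, {"vis", "ir069", "ir107", "vil"}))  # ['S1']
```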
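This DataFrame contains one row for each image type in SEVIR that corresponds to the listed id. The following code uses the rows of sample_event to extract the image data for each type: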
def read_data(sample_event, img_type, data_path=DATA_PATH):
    """
    Reads single SEVIR event for a given image type.

    Parameters
    ----------
    sample_event   pd.DataFrame
        SEVIR catalog rows matching a single ID
    img_type   str
        SEVIR image type
    data_path  str
        Location of SEVIR data

    Returns
    -------
    np.array
       LxLx49 tensor containing event data
    """
    fn = sample_event[sample_event.img_type == img_type].squeeze().file_name
    fi = sample_event[sample_event.img_type == img_type].squeeze().file_index
    with h5py.File(data_path + '/' + fn, 'r') as hf:
        data = hf[img_type][fi]
    return data


vis = read_data(sample_event, 'vis')
ir069 = read_data(sample_event, 'ir069')
ir107 = read_data(sample_event, 'ir107')
vil = read_data(sample_event, 'vil')

# plot a frame from each img_type
fig, axs = plt.subplots(1, 4, figsize=(10, 5))
frame_idx = 30
axs[0].imshow(vis[:, :, frame_idx]), axs[0].set_title('VIS')
axs[1].imshow(ir069[:, :, frame_idx]), axs[1].set_title('IR 6.9')
axs[2].imshow(ir107[:, :, frame_idx]), axs[2].set_title('IR 10.7')
axs[3].imshow(vil[:, :, frame_idx]), axs[3].set_title('VIL')
Lightning
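To include lightning alongside the other data, we first need to convert it to a raster type so that it can be displayed with the other image types. The following code does this by converting the flash data into flash counts per pixel for each 5-minute frame: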
import numpy as np

def lght_to_grid(data):
    """
    Converts SEVIR lightning data stored in Nx5 matrix to an LxLx49 tensor representing
    flash counts per pixel per frame

    Parameters
    ----------
    data  np.array
       SEVIR lightning event (Nx5 matrix)

    Returns
    -------
    np.array
       LxLx49 tensor containing pixel counts
    """
    FRAME_TIMES = np.arange(-120.0, 125.0, 5) * 60  # in seconds
    out_size = (48, 48, len(FRAME_TIMES))
    if data.shape[0] == 0:
        return np.zeros(out_size, dtype=np.float32)

    # filter out points outside the grid
    x, y = data[:, 3], data[:, 4]
    m = np.logical_and.reduce([x >= 0, x < out_size[0], y >= 0, y < out_size[1]])
    data = data[m, :]
    if data.shape[0] == 0:
        return np.zeros(out_size, dtype=np.float32)

    # Filter/separate times
    # compute z coordinate based on bin location times
    t = data[:, 0]
    z = np.digitize(t, FRAME_TIMES) - 1
    z[z == -1] = 0  # special case: frame 0 uses lght from frame 1

    x = data[:, 3].astype(np.int64)
    y = data[:, 4].astype(np.int64)

    k = np.ravel_multi_index(np.array([y, x, z]), out_size)
    n = np.bincount(k, minlength=np.prod(out_size))
    return np.reshape(n, out_size).astype(np.float32)


def read_lght_data(sample_event, data_path=DATA_PATH):
    """
    Reads lght data from SEVIR and maps flash counts onto a grid

    Parameters
    ----------
    sample_event   pd.DataFrame
        SEVIR catalog rows matching a single ID
    data_path  str
        Location of SEVIR data

    Returns
    -------
    np.array
       LxLx49 tensor containing pixel counts for selected event
    """
    fn = sample_event[sample_event.img_type == 'lght'].squeeze().file_name
    event_id = sample_event[sample_event.img_type == 'lght'].squeeze().id
    # lght files store one dataset per event, named by the event id
    with h5py.File(data_path + '/' + fn, 'r') as hf:
        data = hf[event_id][:]
    return lght_to_grid(data)


lght = read_lght_data(sample_event)

# include lightning counts in plot
fig,axs = plt.subplots(1,5,figsize=(14,5))
frame_idx = 30
axs[0].imshow(vis[:,:,frame_idx]), axs[0].set_title('VIS')
axs[1].imshow(ir069[:,:,frame_idx]), axs[1].set_title('IR 6.9')
axs[2].imshow(ir107[:,:,frame_idx]), axs[2].set_title('IR 10.7')
axs[3].imshow(vil[:,:,frame_idx]), axs[3].set_title('VIL')
axs[4].imshow(lght[:,:,frame_idx]), axs[4].set_title('Lightning')
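To see what the space/time binning in `lght_to_grid` does, the sketch below repeats its steps on four invented flashes. The column layout (time in column 0, pixel x and y in columns 3 and 4) is taken from `lght_to_grid`; columns 1 and 2 are not used for gridding and are left as NaN here. `np.digitize` assigns a flash at time t to frame k when `FRAME_TIMES[k] <= t < FRAME_TIMES[k+1]`, and `astype(np.int64)` truncates the fractional pixel coordinates.

```python
import numpy as np

FRAME_TIMES = np.arange(-120.0, 125.0, 5) * 60  # frame boundaries in seconds, as in lght_to_grid
OUT_SIZE = (48, 48, len(FRAME_TIMES))

# Made-up flashes: column 0 = time (s), columns 3, 4 = pixel x, y; columns 1-2 unused here
flashes = np.array([
    [30.0,   np.nan, np.nan, 10.2, 5.7],
    [299.0,  np.nan, np.nan, 10.9, 5.1],
    [7200.0, np.nan, np.nan, 47.5, 0.0],
    [600.0,  np.nan, np.nan, 60.0, 3.0],  # x >= 48: outside the grid, dropped
])

t, x, y = flashes[:, 0], flashes[:, 3], flashes[:, 4]
inside = (x >= 0) & (x < OUT_SIZE[0]) & (y >= 0) & (y < OUT_SIZE[1])
t, x, y = t[inside], x[inside].astype(np.int64), y[inside].astype(np.int64)

z = np.digitize(t, FRAME_TIMES) - 1  # frame index of each flash
z[z == -1] = 0                       # times before the first boundary go to frame 0

k = np.ravel_multi_index((y, x, z), OUT_SIZE)  # flat index into the (row, col, frame) grid
grid = np.bincount(k, minlength=np.prod(OUT_SIZE)).reshape(OUT_SIZE)
print(z)                # [24 24 48]
print(grid[5, 10, 24])  # 2
print(grid.sum())       # 3
```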
The examples above use the default colormap to display the data. To display color-mapped images, the colormaps are available in the repo at https://github.com/MIT-AI-Accelerator/eie-sevir.
import sys
sys.path.append('../') # add sevir module to path
from sevir.display import get_cmap
# Get colormaps for encoded types
vis_cmap,vis_norm,vis_vmin,vis_vmax = get_cmap('vis',encoded=True)
ir069_cmap,ir069_norm,ir069_vmin,ir069_vmax = get_cmap('ir069',encoded=True)
ir107_cmap,ir107_norm,ir107_vmin,ir107_vmax = get_cmap('ir107',encoded=True)
vil_cmap,vil_norm,vil_vmin,vil_vmax = get_cmap('vil',encoded=True)
lght_cmap,lght_norm,lght_vmin,lght_vmax = get_cmap('lght',encoded=True)

fig,axs = plt.subplots(1,5,figsize=(14,5))
frame_idx = 30
axs[0].imshow(vis[:,:,frame_idx],cmap=vis_cmap,norm=vis_norm,vmin=vis_vmin,vmax=vis_vmax), axs[0].set_title('VIS')
axs[1].imshow(ir069[:,:,frame_idx],cmap=ir069_cmap,norm=ir069_norm,vmin=ir069_vmin,vmax=ir069_vmax), axs[1].set_title('IR 6.9')
axs[2].imshow(ir107[:,:,frame_idx],cmap=ir107_cmap,norm=ir107_norm,vmin=ir107_vmin,vmax=ir107_vmax), axs[2].set_title('IR 10.7')
axs[3].imshow(vil[:,:,frame_idx],cmap=vil_cmap,norm=vil_norm,vmin=vil_vmin,vmax=vil_vmax), axs[3].set_title('VIL')
axs[4].imshow(lght[:,:,frame_idx],cmap=lght_cmap,norm=lght_norm,vmin=lght_vmin,vmax=lght_vmax), axs[4].set_title('Lightning')
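Working with SEVIR events
In many applications it is necessary to know the latitude/longitude of the pixels in a SEVIR patch. The catalog provides the information needed to georeference the data.
Currently, all patches in SEVIR use the same map projection: a Lambert azimuthal equal-area projection centered at (lat, lon) = (38, -98). The projection details are stored in the proj column: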
sample_event.proj.values[0]
import re
import numpy as np

class LaeaProjection():
    """
    Translates between pixel coordinates (X,Y) and Lat lon coordinates for a given SEVIR event
    projected in laea.

    Parameters
    ----------
    event  pd.Series
        One row from SEVIR catalog
    """
    def __init__(self, event):
        self.proj = event.proj
        # Parse the projection centre and sphere radius from the proj string
        self.lat0 = float(re.search(r'\+lat_0=([+-]?\d+)', self.proj).group(1))
        self.lon0 = float(re.search(r'\+lon_0=([+-]?\d+)', self.proj).group(1))
        self.R = float(re.search(r'\+a=(\d+)', self.proj).group(1))
        self.llcrnlat = event.llcrnrlat
        self.llcrnlon = event.llcrnrlon
        # Projection coordinates of the lower-left corner, i.e. pixel coordinate (0, 0)
        self.refX, self.refY = self.forward(self.llcrnlon, self.llcrnlat, pixel=False)
        # Pixel size in metres
        self.binX = event.width_m / event.size_x
        self.binY = event.height_m / event.size_y

    def forward(self, lon, lat, pixel=True):
        """
        Maps lat/lon to pixel x,y. For projection coordinates instead of pixel, set pixel=False.
        """
        sind = lambda t: np.sin(t * np.pi / 180)
        cosd = lambda t: np.cos(t * np.pi / 180)
        k = self.R * np.sqrt(2 / (1 + sind(self.lat0) * sind(lat) + cosd(self.lat0) * cosd(lat) * cosd(lon - self.lon0)))
        x = k * cosd(lat) * sind(lon - self.lon0)
        y = k * (cosd(self.lat0) * sind(lat) - sind(self.lat0) * cosd(lat) * cosd(lon - self.lon0))
        if pixel:
            x = (x - self.refX) / self.binX
            y = (y - self.refY) / self.binY
        return x, y

    def inverse(self, x, y, pixel=True):
        """
        Maps pixel coordinates to (lon,lat) position. If passing projection coordinates, set pixel=False.
        """
        if pixel:
            x = x * self.binX + self.refX
            y = y * self.binY + self.refY
        # Normalise by the radius without modifying the caller's arrays in place
        x = x / self.R
        y = y / self.R
        sind = lambda t: np.sin(t * np.pi / 180)
        cosd = lambda t: np.cos(t * np.pi / 180)
        rho = np.sqrt(x * x + y * y)
        c = 2 * np.arcsin(0.5 * rho)
        sinc = np.sin(c)
        cosc = np.cos(c)
        lat = 180 / np.pi * np.arcsin(cosc * sind(self.lat0) + y * sinc * cosd(self.lat0) / rho)
        # arctan2 keeps the longitude in the correct quadrant
        lon = self.lon0 + 180 / np.pi * np.arctan2(x * sinc, rho * cosd(self.lat0) * cosc - y * sind(self.lat0) * sinc)
        return lon, lat
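The formulas in `LaeaProjection` are the spherical Lambert azimuthal equal-area equations. The standalone sketch below re-implements them without the pixel offset so that they can be checked with a round trip (forward, then inverse); its inverse uses `np.arctan2` to pick the correct longitude quadrant. The function names are hypothetical, and the radius `R=6370997` m is an assumed value: the class above reads the real one from the `+a=` term of the catalog's `proj` string.

```python
import numpy as np

def laea_forward(lon, lat, lon0=-98.0, lat0=38.0, R=6370997.0):
    """Spherical Lambert azimuthal equal-area projection: degrees -> metres (hypothetical helper)."""
    lon, lat, lon0, lat0 = map(np.radians, (lon, lat, lon0, lat0))
    dlon = lon - lon0
    k = R * np.sqrt(2.0 / (1.0 + np.sin(lat0) * np.sin(lat) + np.cos(lat0) * np.cos(lat) * np.cos(dlon)))
    x = k * np.cos(lat) * np.sin(dlon)
    y = k * (np.cos(lat0) * np.sin(lat) - np.sin(lat0) * np.cos(lat) * np.cos(dlon))
    return x, y

def laea_inverse(x, y, lon0=-98.0, lat0=38.0, R=6370997.0):
    """Inverse of laea_forward: metres -> degrees. Undefined exactly at the centre (rho == 0)."""
    lon0, lat0 = np.radians(lon0), np.radians(lat0)
    x, y = np.asarray(x, dtype=float) / R, np.asarray(y, dtype=float) / R
    rho = np.hypot(x, y)
    c = 2.0 * np.arcsin(0.5 * rho)
    lat = np.arcsin(np.cos(c) * np.sin(lat0) + y * np.sin(c) * np.cos(lat0) / rho)
    lon = lon0 + np.arctan2(x * np.sin(c), rho * np.cos(lat0) * np.cos(c) - y * np.sin(lat0) * np.sin(c))
    return np.degrees(lon), np.degrees(lat)

x, y = laea_forward(-92.1005, 46.7867)  # Duluth, MN
lon, lat = laea_inverse(x, y)
print(np.allclose([lon, lat], [-92.1005, 46.7867]))  # True
```

The projection centre (-98, 38) maps to (0, 0), which is why the class subtracts the lower-left corner (`refX`, `refY`) before dividing by the pixel size.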
Applying it to an event
proj = LaeaProjection( sample_event[sample_event.img_type=='vil'].squeeze() )
X,Y = np.meshgrid(np.arange(vil.shape[0]),np.arange(vil.shape[0]))
lons,lats = proj.inverse(X,Y)

# Plot with origin='lower' so up corresponds to north.
fig,ax=plt.subplots(1,3,figsize=(15,5))
ax[0].imshow(vil[:,:,20],origin='lower')
la = ax[1].imshow(lats,origin='lower')
fig.colorbar(la,ax=ax[1])
ax[1].set_title('Pixel Latitudes')
lo = ax[2].imshow(lons,origin='lower')
fig.colorbar(lo,ax=ax[2])
ax[2].set_title('Pixel Longitudes')
This particular domain overlaps with northeastern Minnesota. We can add markers for particular locations by converting from lat/lon coordinates into the image pixel coordinates:
lat, lon = 46.7867, -92.1005  # Duluth, MN
x, y = proj.forward(lon, lat)
print('x=%f,y=%f' % (x, y))

# Plot with origin='lower' so up corresponds to north.
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.imshow(vil[:, :, 20], origin='lower')
ax.plot(x, y, linestyle='none', marker="o", markersize=16, alpha=0.6, c="red")
ax.text(x - 30, y - 30, 'Duluth, MN', color='r')
x=200.021527,y=79.057562
Use Basemap instead
# Note: Requires basemap module to run
import warnings
warnings.filterwarnings('ignore')
from mpl_toolkits.basemap import Basemap  # conda/pip install basemap basemap-data-hires

# Create Basemap Object
s = sample_event[sample_event.img_type == 'vil'].squeeze()
fig, ax = plt.subplots(1, 1, figsize=(7, 7))
m = Basemap(llcrnrlat=s.llcrnrlat, llcrnrlon=s.llcrnrlon,
            urcrnrlat=s.urcrnrlat, urcrnrlon=s.urcrnrlon,
            width=s.width_m, height=s.height_m,
            lat_0=38, lon_0=-98,
            projection='laea', resolution='i', ax=ax)
m.drawstates(color='w')
m.drawcoastlines(color='w')
m.drawcountries(color='w')
m.imshow(vil[:, :, 24])

lat, lon = 46.7867, -92.1005  # Duluth, MN
x, y = m(lon, lat)  # will be in projection coordinates
m.plot(x, y, linestyle='none', marker="o", markersize=16, alpha=0.6, c="red")
ax.text(x - 30e3, y - 30e3, 'Duluth, MN', color='r')
- NOAA's Storm Event Database: https://www.ncdc.noaa.gov/stormevents/ftp.jsp