If You Work in “Small Science,” Are You Leveraging Data Repositories?

If you’re a scientist, especially one performing a lot of your research alone, you probably have more than one spreadsheet of important data that you just haven’t gotten around to writing up yet. Maybe you never will. Sitting idle on a hard drive, that “dark data” could prove very useful to someone in the future (or even someone in the present), especially as our climate and society change.

What are you going to do with those files? How are you going to preserve them?

If you’re like me, maybe you’ve felt the terror of losing data every time you moved your files to a new computer or moved your research to a new job. Did you remember to back up that spreadsheet from your brilliant pet project from 7 years ago? If you did back it up, are you sure you backed up the most recent version? It’s sobering to imagine other people have gone through this and lost potentially valuable species records, survey data, and field observations.

Digital Data Repositories to the Rescue

In the years before I returned to graduate school, I worked for a science nonprofit on Nantucket Island, Massachusetts, and this problem haunted me all the time. Over nearly a decade there, I accumulated spreadsheets filled with very localized ecological data, but had no way to organize, save, and share them. Fortunately, a solution is emerging in the form of digital repositories backed by robust metadata schemes and indexing services. Importantly, some of these repositories are accessible to everyone, and no university affiliation is required.

In May 2020, Meghan Mitchell, Christopher Tillman Neal, and I launched a digital repository for the Nantucket Biodiversity Initiative (NBI). The repository stores and protects environmental and ecological research data from around Nantucket, with a focus on projects funded by NBI. Visit the Nantucket Biodiversity Digital Repository and browse through the files to learn about bat counts, spider surveys, sandplain grassland research, and much more.

A snapping turtle on Nantucket. Over half of Nantucket Island is conservation land, and scientific species inventories date back to the late 1800s. There is a wealth of information that would benefit from being published to a repository. Photo: Andrew Mckenna-Foster

We used Zenodo, a free platform that allows anyone to upload research-related files. Zenodo stores the files forever, makes them searchable on the internet, and even gives them a digital object identifier (DOI). However, uploading your files to a repository is the easy part of the solution; to make data useful far into the future, it is crucial to follow the core principles of data publishing and sharing. Data uploaded with no context becomes one more piece of junk in the vastness of the internet.

Documenting Data Is Difficult but Absolutely Essential

Published data should be FAIR: Findable, Accessible, Interoperable, and Reusable. In practice, this means:

Describing the data with a solid description, useful keywords, and author information (metadata)
Using a standard metadata scheme so that the information can be easily shared
Uploading the files in an open format (like CSV)
Licensing the data so that people and machines will understand how the data can be used

That is only the bare minimum. While Zenodo and other free repository platforms like figshare and Dataverse simplify this process, it still requires work and planning.
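
To make the checklist concrete, here is a minimal Python sketch of a pre-publication check. It is not the NBI workflow: the field names (`title`, `description`, `creators`, `keywords`, `license`), the short list of open formats, and the example record are all assumptions chosen for illustration.

```python
from pathlib import PurePath

# Assumed field names, modelled on common repository metadata forms.
REQUIRED_FIELDS = ("title", "description", "creators", "keywords", "license")
# Illustrative list of open formats; a real workflow would maintain its own.
OPEN_FORMATS = {".csv", ".txt", ".json"}


def check_record(metadata, filenames):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat a missing key and an empty value the same way.
        if not metadata.get(field):
            problems.append(f"missing metadata field: {field}")
    for name in filenames:
        if PurePath(name).suffix.lower() not in OPEN_FORMATS:
            problems.append(f"not an open format: {name}")
    return problems


# Made-up example record with the license deliberately left blank.
record = {
    "title": "Example bat count survey",
    "description": "Nightly acoustic bat counts at three sites.",
    "creators": [{"name": "Doe, Jane"}],
    "keywords": ["bats", "acoustic survey"],
    "license": "",
}
problems = check_record(record, ["bat_counts.csv", "bat_counts.xlsx"])
for problem in problems:
    print(problem)
```

A check like this only catches omissions. Whether a description is actually useful to a future reader still takes a human curator.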

The meat of our project was working with NBI to create a workflow that curates and applies metadata to all reports and datasets before publication. If you want to set up a repository for yourself or your organization, this is where you should focus most of your energy. We built a documentation site on GitHub that describes the process in detail and is free to copy.
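
One small piece of such a workflow might be keeping keywords consistent. The sketch below is hypothetical and not taken from our documentation site: the `PREFERRED_TERMS` table and the `curate_keywords` helper are invented for illustration. It maps free-text keywords onto a short controlled list and sets aside anything unrecognized for a curator to review.

```python
# Hypothetical controlled vocabulary: each accepted spelling maps to one preferred term.
PREFERRED_TERMS = {
    "bat": "bats",
    "bats": "bats",
    "spider": "spiders",
    "spiders": "spiders",
    "sandplain grassland": "sandplain grassland",
}


def curate_keywords(raw_keywords):
    """Split keywords into preferred terms and ones that need manual review."""
    accepted, unknown = [], []
    for raw in raw_keywords:
        # Lowercase and collapse repeated whitespace before looking the term up.
        key = " ".join(raw.lower().split())
        term = PREFERRED_TERMS.get(key)
        if term is None:
            unknown.append(raw)
        elif term not in accepted:
            accepted.append(term)
    return accepted, unknown


accepted, unknown = curate_keywords(["Bat", "bats", "Sandplain  Grassland", "moths"])
print("accepted:", accepted)
print("needs review:", unknown)
```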

So, What Are the Outcomes?

The repository is growing as we curate and upload reports and data going back to 2005. More importantly,

NBI now has a permanent, accessible, and shareable library of the research it has supported.
Researchers who work on or near Nantucket now have a way to publish their data and reports.
People looking for data and information for the area can now browse current and past research. Importantly, they can cite any information they use, giving authors the credit they deserve.
I can sleep at night knowing the data I spent years collecting has a permanent home.

A summary of the repository as of August 2020. We use Zenodo’s API to harvest metadata from the Nantucket Biodiversity Digital Repository for visualization using Python. These charts are only possible because the workflow we designed controls how keywords are assigned.
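
For readers who want to try something similar, here is a rough Python sketch of harvesting keywords for a summary chart. It is not our actual script. It assumes Zenodo's public records search endpoint accepts a `communities` parameter and returns records under `hits.hits`, each with a `metadata.keywords` list. The community identifier you pass to `fetch_records` is a placeholder you would need to supply, and the sample payload below is made up so the counting step can run offline.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://zenodo.org/api/records"


def fetch_records(community, size=25):
    """Fetch one page of records for a community (network call; not run below)."""
    query = urlencode({"communities": community, "size": size})
    with urlopen(f"{API_URL}?{query}", timeout=30) as response:
        return json.load(response)


def count_keywords(payload):
    """Count keywords across records, ignoring case and surrounding spaces."""
    counts = Counter()
    for hit in payload.get("hits", {}).get("hits", []):
        # Records without keywords are skipped rather than treated as errors.
        for keyword in hit.get("metadata", {}).get("keywords", []):
            counts[keyword.strip().lower()] += 1
    return counts


# Made-up payload in the assumed response shape.
sample = {
    "hits": {
        "hits": [
            {"metadata": {"title": "Bat counts", "keywords": ["Bats", "survey"]}},
            {"metadata": {"title": "Spider survey", "keywords": ["spiders", "Survey "]}},
            {"metadata": {"title": "Untagged report"}},
        ]
    }
}
keyword_counts = count_keywords(sample)
print(keyword_counts.most_common(1))
```

The resulting counts can be passed to any plotting library. They are only meaningful if keywords were assigned consistently in the first place.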

As NBI continues to support research and add files to this repository, publishing the raw data, not just a project report, will be especially important. With that data in hand, researchers in 10, 50, or 100 years will be able to reproduce and directly compare data from species surveys, population surveys, and management regimes.

The Repository Is Already Being Used

The icing on the cake is that since the repository became operational, it has already proven useful: I recently shared a dataset on Nantucket tarantulas with another spider researcher who was looking for a way to cite our observations.

I hope you consider publishing your data whenever possible and choose to follow the FAIR principles. The open science community is growing rapidly and offers numerous resources for anyone getting started. I am always open to questions and collaborations, so please contact me if you’re interested in working together.

Translated from: https://medium.com/swlh/if-you-work-in-small-science-are-you-leveraging-data-repositories-357cabfc2326
