Introducing the New ABCs of Data

In 2020, it’s simply not enough to collect data about your company to be “data-driven”; to stay relevant, you must also know how to apply it. Underlying this evolution from gut-based decision making to data-driven analytics is a critical need to reason about this data intelligently across your business.


In many ways, the data industry is at a similar stage to where software engineering (and, more specifically, Developer Operations, or DevOps) was about a decade ago. Data teams are only now coming to understand the importance of automated tooling, eliminating data downtime, and, perhaps most importantly, ensuring high data reliability. In fact, over the past few years, we’ve found that the best data organizations are applying a software engineering mindset to maintain their competitive edge.

In this article, we walk through the ABCs of modern data teams, providing an overview of the top terms and concepts today’s data teams need to know:


Data analytics give teams across a company insights into how their functional organizations are performing. Image courtesy of Franki Chamaki on Unsplash.

Data Analytics

(n.) [Pronounced: an-l-it-ks]. See: Data Analyst. An emerging discipline for the collection, integration, analysis, and presentation of large sets of information to generate business intelligence. Data analytics allows functional areas across a company to make smarter decisions with their data.

Data Analyst

(n.) See: Data Analytics. A data team member responsible for supporting the data scientist and data engineer. Data analyst roles vary depending on the industry and the size of the company, but analysts are typically responsible for performing data modeling, identifying patterns in data, and designing and creating reports.

Business Intelligence (BI)

(n.) See: Data Analytics. Methods and technologies that gather, store, and analyze organizational data to help companies make better decisions. Generally speaking, business intelligence refers to the output of data analytics (i.e., intelligence for your business).

Like a library catalog, a data catalog tells you where your data is stored and how to access it. Image courtesy of Dollar Gill on Unsplash.

Data Catalog

(n.) [Pronounced: dah-tuh kat-l-awg] An inventory of metadata that gives users the information necessary to evaluate data accessibility, health, and location. Many modern data catalogs are self-serve, making it easy for data teams to pull information about their data and regulate who has access to it.

Data Downtime

(n.) Etymology: Coined by data reliability company Monte Carlo. Periods of time when data is partial, erroneous, missing, or otherwise inaccurate. Data downtime is caused by bad data, data anomalies, and other issues that can corrupt otherwise good data pipelines.

Data Engineer

(n.) A data team member responsible for preparing data. Data engineers import, clean, and manipulate raw data; develop, test, and maintain infrastructure; marry systems together; and conduct database administration. Data engineers are increasingly adopting best practices from software engineers (particularly DevOps teams) to collaborate on data management and automate flows between data managers and downstream data consumers.

Extract, Transform, Load (ETL)

(n.) The general procedure of copying raw data from one or more sources into a destination system that presents the data differently, via:

  1. Extract: find the data source, make a copy of the data, then load the data into memory

  2. Transform: reorganize the data into a form that suits the needs of the end user (reporting, dashboards, ML) by cleaning, formatting, summarizing, joining records from multiple sources, and more

  3. Load: move the data into a destination (data warehouse, data lake, etc.)
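The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: the sample CSV, the `revenue_by_region` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file, API, or database.
RAW_CSV = """order_id,region,amount
1,east,10.50
2,west,4.25
3,east,7.00
"""


def extract(raw_text):
    """Extract: copy the source data into memory as a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: clean the types and summarize revenue per region."""
    totals = {}
    for row in rows:
        region = row["region"].strip().lower()
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(summary, conn):
    """Load: write the summarized rows into a destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_region (region TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO revenue_by_region VALUES (?, ?)", summary)
    conn.commit()


def run_etl(raw_text, conn):
    """Run the three steps in order and read back what was loaded."""
    load(transform(extract(raw_text)), conn)
    query = "SELECT region, total FROM revenue_by_region ORDER BY region"
    return conn.execute(query).fetchall()
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` runs all three steps against a throwaway database. Keeping each step in its own function makes it easier to test and swap out one stage without touching the others.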

This data flow diagram depicts the flow of data in and out of the National Cancer Registration and Analysis Service. Image courtesy of Wikimedia Commons, under the GNU Free Documentation License.

Data Flow Diagram

(n.) A visual means of representing the path of data throughout its lifecycle, often spanning the ETL process across different solutions or steps. This video from SmartDraw does an amazing job of explaining what data flow diagrams are and how to design them.

Data Governance

(n.) [Pronounced: dah-tuh guhv-er-nuhns] The process of managing the availability, usability, and security of data in an organization, frequently based on internal policies and external regulations regarding the application of said data. A buzzy term in the data world thanks to GDPR, CCPA, and other important pieces of legislation around data compliance.

Data Hub

(n.) A type of data architecture that collects data from multiple sources, similar to a data lake. Unlike a data lake, however, a data hub homogenizes data and may be able to serve data via various formats.

Data Ingestion

(n.) [Pronounced: dah-tuh ĭn-jĕs′chən] The process of acquiring and importing data for use, either immediately or in the future. Data can be ingested either in real time or in batches.

Data Join

(n.) The process of combining two data sets, side by side, by matching rows on at least one column that both data sets share.
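As a rough illustration of the definition above, the sketch below joins two small data sets on a shared column using plain Python. The `inner_join` helper and the sample `customers` and `orders` records are hypothetical; real pipelines would more likely use SQL or a dataframe library, and this shows only one kind of join (an inner join).

```python
def inner_join(left, right, key):
    """Combine rows from two data sets that share the same value in `key`."""
    # Index the right-hand rows by key so each left row can find its matches.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[key], []):
            # Merge side by side; on a column-name clash the right-hand value wins.
            joined.append({**row, **match})
    return joined


# Hypothetical sample data sharing the "customer_id" column.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"customer_id": 1, "total": 30},
    {"customer_id": 1, "total": 12},
    {"customer_id": 3, "total": 99},
]

joined_rows = inner_join(customers, orders, "customer_id")
```

Only customer 1 appears in both data sets, so `joined_rows` holds two rows, one per matching order. Customer 2 (no orders) and the order for customer 3 (no customer record) are dropped, which is the defining behavior of an inner join.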

Knowledge

(n.) See: Business Intelligence. The outcome of applying ETL to data as a means of generating actionable and understandable insights based on raw information. The Data, Information, Knowledge, and Wisdom (DIKW) hierarchy elaborates on the distinction and correlation between data and knowledge. A data team is responsible for transforming data into knowledge for use by their broader enterprise.

Data Lake

(n.) A vast pool of raw data whose purpose is not yet defined, usually stored as object blobs or files.

Database Management System (DBMS)

(n.) [Pronounced: dah-tuh-beys man-ij-muhnt sis-tuhm] A software application or package designed to manage data in a database, including the data’s format, field names, record structure, and file structure. DBMSs come in a variety of flavors depending on the users’ industry or discipline.

Data Mart

(n.) A form of a data warehouse that focuses on a single functional area (e.g., sales, finance, or marketing).

Data Mesh

(n.) [Pronounced: dah-tuh mesh] Etymology: originated in Zhamak Dehghani’s landmark ThoughtWorks article on the distributed data mesh. A type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-service design. Relies on ensuring universal data reliability at all points of the architecture and all stages of the data life cycle.

Data Observability

Image courtesy of author.

(n.) [Pronounced: dah-tuh uhb-zur-vuh-buh-luh-tee] Etymology: Inspired by the software engineering practice of observability. An organization’s ability to fully understand the health of its data over its entire life cycle and to surface data downtime incidents as soon as they arise. This includes the ability to understand the five pillars of data observability:

  • Freshness: how up-to-date data tables are and the cadence at which tables are updated

  • Distribution: whether the data’s values fall within an acceptable range and format

  • Volume: completeness of data tables and insight on the health of data sources

  • Schema: changes in organization of data and health of data ecosystem

  • Lineage: which upstream and downstream ingestors are impacted and which teams are generating data and accessing it
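To make the pillars above more concrete, here is a minimal Python sketch of what checks for the first four might look like. The function names, thresholds, and the sample "orders" snapshot are all assumptions made for illustration; lineage is left out because it requires metadata about upstream and downstream dependencies that a single table snapshot does not carry.

```python
from datetime import datetime, timedelta


def check_freshness(last_updated, now, max_age):
    """Freshness: was the table updated within the expected cadence?"""
    return now - last_updated <= max_age


def check_distribution(values, low, high):
    """Distribution: do all non-null values fall inside the accepted range?"""
    return all(low <= v <= high for v in values if v is not None)


def check_volume(row_count, expected, tolerance=0.2):
    """Volume: is the row count within `tolerance` of the expected count?"""
    return abs(row_count - expected) <= tolerance * expected


def check_schema(columns, expected_columns):
    """Schema: report columns that were added or removed."""
    return {
        "added": sorted(set(columns) - set(expected_columns)),
        "removed": sorted(set(expected_columns) - set(columns)),
    }


# Hypothetical snapshot of an "orders" table.
now = datetime(2020, 9, 1, 12, 0)
report = {
    "fresh": check_freshness(datetime(2020, 9, 1, 9, 0), now, timedelta(hours=6)),
    "distribution_ok": check_distribution([12.0, 7.5, None, -3.0], 0, 1000),
    "volume_ok": check_volume(row_count=700, expected=1000),
    "schema_drift": check_schema(
        ["id", "amount", "currency"], ["id", "amount", "region"]
    ),
}
```

In this snapshot the table is fresh (updated three hours ago against a six-hour limit), but the negative amount fails the distribution check, the row count is 30% below expectation, and the schema check reports that `currency` was added while `region` was removed.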

Data Operations (DataOps)

(n.) A discipline that merges data engineering and data science to support an organization’s data needs, much in the same way developer operations (DevOps) helped scale the software engineering field (version control, iterative agile development, collaboration, etc.). Automation is playing an increasingly important part in DataOps practice, particularly in addressing data downtime, akin to how automated tools help DevOps teams ensure high application uptime.

Data Platform

(n.) A central repository for all data, handling the collection, cleansing, transformation, and application of data to generate business insights. A must-have for scalability and sustainability in large data organizations.

Data quality issues spare no organization, and very often DataOps teams are responsible for resolving them. Image courtesy of Monte Carlo.

Data Quality

(n.) The health of data at any stage in its life cycle. Data teams can measure data quality through a simple KPI that calculates data downtime. Data quality issues can happen at any stage of the data pipeline.
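The definition above does not spell out the KPI, so the following Python sketch shows just one plausible way to calculate it: total the hours during which data was down and express that as a share of the reporting period. The formula, the function names, and the incident log are assumptions for illustration, not a definition taken from this article.

```python
from datetime import datetime


def downtime_hours(incidents):
    """Sum the hours between when each incident began and when it was resolved."""
    return sum((end - start).total_seconds() / 3600 for start, end in incidents)


def downtime_ratio(incidents, period_start, period_end):
    """Fraction of the reporting period during which data was down."""
    period_hours = (period_end - period_start).total_seconds() / 3600
    return downtime_hours(incidents) / period_hours


# Hypothetical incident log: (began, resolved) pairs that do not overlap.
incidents = [
    (datetime(2020, 9, 3, 8, 0), datetime(2020, 9, 3, 14, 0)),    # 6 hours
    (datetime(2020, 9, 10, 0, 0), datetime(2020, 9, 10, 18, 0)),  # 18 hours
]
ratio = downtime_ratio(incidents, datetime(2020, 9, 1), datetime(2020, 10, 1))
```

Here the two incidents add up to 24 hours of downtime across a 720-hour month. The sketch assumes incidents never overlap; overlapping incidents would need to be merged first to avoid double counting.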

Data QA Testing

(n.) See: Data Quality. The maintenance of a desired level of data quality for a given service or product.

Data Reliability

(n.) [Pronounced: dah-tuh ri-lahy-uh-bil-i-tee] Having full confidence in data’s accuracy and consistency over its entire life cycle. In short: if data is not reliable, it cannot be trusted. Modern data organizations rely on data reliability to increase revenue, save time, make smart decisions with their data, and ensure customer trust.

Data Scientist

(n.) A data team member responsible for analyzing and interpreting data. Data scientists provide insights and answers to key business questions via quantitative means. Increasingly, data scientists are tasked with building ML algorithms to make predictions about the business.

Data Source

(n.) The location from which data originates (file, API feed, database, SaaS application, etc.).

Data Table

(n.) A way of displaying data in a grid-like format of rows and columns, generally organized in relation to X and Y axes.

User Interface (UI)

(n.) [Pronounced: yoo-zer in-ter-feys] The means through which a user and a computer system interact. In the context of data analytics, a UI presents an easy-to-digest way for consumers to understand data, insights, and knowledge in a given data store.

Johns Hopkins’ COVID-19 Dashboard is one of the most well-known data visualizations of 2020. Image courtesy of Clay Banks on Unsplash.

Data Visualization

(n.) [Pronounced: dah-tuh vi-zhoo-uh-lai-zei-shn] The graphic representation of data, often incorporating images that communicate relationships between data points. Data lineage is a helpful and increasingly popular form of data visualization for mapping connections between upstream and downstream data sources, e.g., when investigating data downtime.

Data Warehouse

(n.) A central repository for structured, filtered data that has already been processed, often for a specific purpose.

X-value

(n.) See: Data Table. The horizontal value in a pair of coordinates, whose value is determined by measuring parallel to the x-axis.

Y-value

(n.) See: Data Table. The vertical value in a pair of coordinates, whose value is determined by measuring parallel to the y-axis.

Data Zone

(n.) Not to be confused with the “danger zone,” data zone refers to sub-sections of a data lake that correspond to the format of the data (e.g., raw, structured, or transformed).

What’s next for data teams?

Data is a rapidly evolving space, presenting a wealth of opportunities when it comes to leveraging data your company can actually trust. Image courtesy of Tomasz Frankowski on Unsplash.

We anticipate that over the next decade, the data industry will witness explosive growth in the DataOps field. Much in the same way that New Relic, Datadog, and Honeycomb have taken the site reliability and observability fields by storm, DataOps is ripe for its own movement centered around the core concepts of data reliability and observability.

As organizations generate more and more data, data infrastructure and workflows will only increase in complexity, requiring data teams that can ensure data trust across your entire company.


This article was written by Barr Moses and Alexa Grabell.


Translated from: https://medium.com/analytics-vidhya/introducing-the-new-abcs-of-data-8f4f3b6418b6
