Database-Agnostic Querying Is Unavoidable at Scale

As the amount of data managed by an organization grows, the difficulty of managing and querying that data grows as well: more databases, more query languages, more gotchas to keep in mind and avoid. Beyond “fits on my laptop” data scale, database-agnostic querying is a necessary technology that allows engineers to focus on the data they need, instead of the data access mechanisms at work behind the scenes. In our previous post, we covered how Kensho uses GraphQL as a database query language via our open-source GraphQL compiler project. We’ll now dig into why this kind of database-agnostic querying is unavoidable at scale, and how solving it made GraphQL compiler an essential productivity booster for Kensho engineers.


Database-agnostic querying is the Holy Grail of data infrastructure: it’s an abstraction that allows querying users to focus on the “what” of their query while completely ignoring its “how” and “where.” Users see a single unified representation of all the data available to them and get answers to their questions (the “what”) by writing declarative queries in a single easy-to-use query language.

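The “what”-only style described above can be made concrete with a small sketch. The `@filter` and `@output` directives below follow graphql-compiler’s documented query syntax; the `Company` type and its edge are hypothetical placeholders, not a real Kensho schema.

```python
# A database-agnostic query: it names the data wanted (the "what") and says
# nothing about which database holds it or how to fetch it. The @filter and
# @output directives come from graphql-compiler's query syntax; the schema
# names here are illustrative only.
DATABASE_AGNOSTIC_QUERY = """
{
    Company {
        name @filter(op_name: "=", value: ["$company_name"])
        out_Company_FoundedBy {
            name @output(out_name: "founder_name")
        }
    }
}
"""

# Runtime parameters travel separately from the query text, so the same
# compiled query can be reused for any company.
query_parameters = {"company_name": "Kensho"}
```

The same query text can then be compiled to whichever backend actually stores the data, leaving the “how” and “where” to the system.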

Beneath this database-agnostic abstraction, a sophisticated system keeps track of the details around every piece of data:


  • The kind of database that houses the data: relational, graph, time-series, key-value, etc.;
  • The particular database instance, shard, or replica that has the data;
  • The particular flavor and version of the database that has the data: PostgreSQL 12.3, Microsoft SQL Server 2017, OrientDB 2.2.37, etc.

Given a database-agnostic query, the system must first locate the data being queried (the “where”) and then decide how to most efficiently retrieve the data in question (the “how”). Doing this well is obviously a tremendous challenge!


So what makes database-agnostic querying a challenge worth solving? And in particular, why would a 150-person company like Kensho decide to invest heavily into solving it?


Cross-database querying is unavoidable at scale, and unaided humans are generally awful at it.


Two facts combine to give us the answer: cross-database querying is unavoidable at scale, and unaided humans are generally awful at it. Therefore, a good solution would have immense leverage: it would help many people, and help a lot.


Before we dig into the specific benefits Kensho derives from GraphQL compiler, let’s examine our two premises in more detail.


Premise #1: Cross-database querying is unavoidable at scale.

There comes a point in many companies’ lifecycle where it is no longer reasonable to place all of the company’s data into a single database.


If your company is lucky, this is merely a problem of having a large volume of relatively homogeneous data: it’s simply too big for a single database, so splitting up schemas or sharding large tables across multiple databases may be required. Suddenly, querying isn’t so easy anymore: what used to be a simple single-database SQL SELECT query may now need to be a cross-database operation, adding operational complexity and new failure modes, and threatening productivity and correctness. There are plenty of examples of companies building data infrastructure that addresses this problem by abstracting away the physical data storage layer: Google’s BigTable, Dropbox’s Edgestore, or Facebook’s Tao.


When a table grows too large to reside in a single database, it can be “sharded” across multiple databases: different rows are assigned to different databases. Each shard is responsible for storing and querying across the rows assigned to it. This distributed nature of the table’s data makes queries significantly more complex.
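The sharding scheme described in the caption above can be sketched in a few lines. This is a minimal illustration of hash-based sharding, not any particular system’s implementation; the shard count and row keys are made up.

```python
import zlib

# A minimal sketch of hash-based sharding: each row is assigned to a shard by
# hashing its key, and any query that can't be routed by key must fan out to
# every shard and merge the results.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    """Deterministically route a row key to one shard (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS

def insert(key: str, row: dict) -> None:
    shards[shard_for(key)][key] = row

def get(key: str) -> dict:
    # A single-row lookup by key touches exactly one shard...
    return shards[shard_for(key)].get(key)

def scan_all(predicate) -> list:
    # ...but a query without the shard key must visit every shard and merge
    # the results -- the added complexity the caption describes.
    results = []
    for shard in shards:
        results.extend(row for row in shard.values() if predicate(row))
    return results
```

The fan-out in `scan_all` is exactly why “a simple single-database SELECT” stops being simple once a table is sharded.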

The more difficult case is when a company has heterogeneous data: not just relational data, but also time-series, graphs, geospatial data, images, audio files, XML files, PDFs, machine learning models that are gigabytes in size, and metadata linking all of these together. In this case, splitting up the data is a necessity since it’s essentially impossible to make a database system sufficiently good at all of these use cases simultaneously. Faced with this problem, companies generally adopt a “best tool for the job” approach, placing relational data in SQL databases, time-series in specialized time-series databases, large files in Amazon S3, etc.


Different databases are optimized for managing different kinds of data. Despite the different storage locations, data is usually interconnected and must be used in a cross-database context. For example, a time series is much more useful when linked to the entity to which it belongs (e.g., in a SQL or graph database) and to the primary source documents that corroborate its numbers (e.g., in Amazon S3).

Kensho faces both problems at once. Not only do we have every kind of heterogeneous data mentioned above, but we have A LOT of it — especially after being acquired by S&P Global and thereby getting access to data curated over the span of 100+ years! Just to give you a sense of scale:


  • Our largest SQL database system is easily dozens of terabytes in size and contains hundreds of thousands of tables and views.


  • Parsing the schema generated from the most popular such logical database requires almost a full minute of Python CPU time to validate and convert into a GraphQL schema object. That resulting schema object consumes over 500MB of memory.


  • For that schema, executing the standard GraphQL introspection query (e.g., as used by the popular GraphiQL editor tool) takes over a minute of Python CPU time using the Python GraphQL library and in response produces a 55MB JSON payload. When delivered to GraphiQL in a web browser, the entire UI freezes up for over a second while GraphiQL parses and loads that absolutely gigantic result object.


Across Kensho, S&P Global, and all its other subsidiaries, we store data in over 30 kinds of data systems from various vendors — usually more than one major version of each. Just in the SQL space, we run at least three major versions of PostgreSQL, two major versions of Microsoft SQL Server, some Oracle instances, dozens of terabytes’ worth of Amazon Aurora nodes, and countless SQLite databases scattered across servers and individual engineers’ computers. Data systems are constantly being consolidated, migrated, and upgraded, and as we ingest ever more data sets and acquire new companies with new technology stacks, the number of systems tends to go up.


Just a few of the data storage systems we use across Kensho, S&P Global, and its other subsidiaries. Frequently, we run more than one major version of each system concurrently. At our scale, data systems are constantly being consolidated, migrated, and upgraded.

All this goes to show how unavoidable cross-database querying really is at this scale. In the short term, workarounds such as hand-written, special-cased logic for particular cross-database use cases can act as a bandaid for the problem. However, such workarounds are by definition technical debt and thus cause ever-increasing drag on organization-wide productivity, so investing in a comprehensive cross-database solution is best in the long run.


Premise #2: Unaided humans are awful at cross-database querying.

If you are a data engineer, right now you may be thinking: “Databases are my job. I am not awful, I am great at my job!” And you are right! In terms of writing the best queries, between an automated system and an expert human, always place your bet on the human — that is, if considering the particular database at which the human is an expert.


However, most humans aren’t experts for even one database, and restricting access to your company’s data to only experts is wildly impractical. Any solution that relies on humans always writing only good queries ultimately fails at scale — the data eventually outgrows humans’ ability to keep up.


Kensho is far from the first company to make this observation. Dropbox points out the same thing in their Edgestore blog post:


“As we rapidly added both users and features, we soon ended up with multiple, independent databases; […] Having to write SQL and interact with MySQL directly impacted developer productivity.” — Dropbox


Dropbox found writing a single SQL dialect to be a productivity bottleneck. Just imagine what happens to engineer productivity when queries must span the 30 different kinds of data systems we have at Kensho and S&P!


Since Kensho engineers can write queries in a database-agnostic fashion, they can remain productive even without knowing:


  • how recursive JOINs can be implemented in SQL using recursive common table expressions (CTEs), and that the “obvious” recursive CTE approach makes queries excessively slow by hindering the query optimizer in most SQL flavors except PostgreSQL v12;


  • that Microsoft SQL Server does not have an array aggregation operator, but that a sophisticated workaround via some obscure XML functionality, together with some post-processing, can produce equivalent behavior;


  • that OrientDB versions before 3.0 may produce incorrect query outputs due to incorrectly tracked query dependencies, and the gloriously simple workaround that can be automatically applied to affected queries;


  • that the OrientDB query planner in the 2.2 releases may cause excessively slow query performance by failing to consider existing indexes for operators like CONTAINS, and that a sophisticated query transformation can ensure indexes are used and thereby recover the lost performance;


  • that OrientDB does not support further traversals after crossing an “optional” edge, and the clever “union of queries” workaround for this, or even


  • the syntax differences among even the simplest operations across all the major SQL databases, despite the fact that they all implement “the same SQL standard.”

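The last point in the list above is easy to demonstrate: even string concatenation, about the simplest operation there is, differs across major SQL dialects. The dialect names and the emit function below are illustrative, not the compiler’s actual API.

```python
# Even trivial operations need per-dialect handling: standard SQL uses ||,
# SQL Server uses +, and MySQL treats || as logical OR by default, so CONCAT
# is needed. A compiler encodes these differences once, so no human has to.
def emit_concat(dialect: str, left: str, right: str) -> str:
    if dialect in ("postgresql", "sqlite", "oracle"):
        return f"{left} || {right}"        # SQL-standard concatenation operator
    if dialect == "mssql":
        return f"{left} + {right}"         # SQL Server overloads +
    if dialect == "mysql":
        return f"CONCAT({left}, {right})"  # || is logical OR by default
    raise ValueError(f"unknown dialect: {dialect}")
```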

Even the most straightforward recursive SQL query is massively complex and full of opportunities to accidentally hurt query performance. With GraphQL compiler, recursive queries remain simple without sacrificing performance. Everyone is a database expert when GraphQL compiler writes their queries!

This minefield of query problems only grows as datasets get larger and cross-database queries become needed. In the absence of a powerful database-agnostic querying system, systems start to come apart at the seams, and more and more engineering time is spent simply gluing broken things back together. A few examples of what this might look like, loosely inspired from adventures in Kensho’s early days:


  • The website search team is desperately trying to speed up search index build times, and is attempting to parallelize their indexing queries. Lacking the ability to adjust query execution based on real-time database CPU and network load statistics, they make an “educated guess” for the parallelism factor to use — fingers crossed that it doesn’t crash the database! Let’s hope they remembered to make sure their new queries still hit indexes correctly!
  • One of the product teams is dismayed to discover that their backend got a lot slower over the course of a week, even though the endpoint results are unchanged and no updates to that code path were deployed. The backend queries database A for some data, then cross-references the result against database B. After much debugging, they discover that another team loaded some historical data into database A — this new data has increased the query result from database A that the backend cross-references against database B, even though nothing in database B will match this new data. The backend code is rewritten to query database B first and then cross-reference against database A.
  • And so on, across the entire organization.
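The query-ordering trap in the second example can be sketched directly: cross-referencing A-then-B costs one lookup per row A returns, so unrelated data growth in A slows the endpoint even though B never changed. All data below is made up for illustration.

```python
# Cross-referencing two databases: the cost is driven by whichever side is
# iterated first, so a fixed human-chosen order becomes a liability as soon
# as the data shapes change underneath it.
def cross_reference(first: list, second_index: set) -> tuple:
    """Return the matching keys plus the number of lookups performed."""
    lookups = 0
    matches = []
    for key in first:
        lookups += 1
        if key in second_index:
            matches.append(key)
    return matches, lookups

# Database A after another team loaded 1000 rows of historical data:
database_a = ["k1", "k2"] + [f"hist{i}" for i in range(1000)]
database_b = {"k1", "k2"}  # unchanged, and nothing matches the new rows

# A first: 1002 lookups for the same 2 matches.
matches_ab, cost_ab = cross_reference(database_a, database_b)
# B first: only 2 lookups, because B is the small, stable side.
matches_ba, cost_ba = cross_reference(sorted(database_b), set(database_a))
```

Both orders return identical results; only the cost differs, which is exactly why the right order is a moving target best chosen by machinery rather than frozen into code.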

This is not anyone’s happy place, database expert or not.


These kinds of scaling-related problems aren’t unique to Kensho, either. For example, Dropbox points out a related scaling problem as motivation for Edgestore, as they realized they kept making database sharding decisions and inevitably outgrew them each time:


“Reactively (re)sharding individual databases as they hit capacity limits was cumbersome; conversely, setting up new database clusters for every use case added a lot of overhead.” — Dropbox


In fact, these are all real-life Kobayashi Maru scenarios, situations where defeat is certain no matter our course of action. In our examples above, the best parallelism factor, optimal query order, or ideal sharding setup for any point in time is a function of things that are constantly changing: the volume of data in each database, the current CPU utilization and network load, the particular version of the database executing the query, which indexes are available to help, etc. Any choice here could be right or wrong, depending on the circumstances — so any fixed choice made here by a human at one point in time is a liability, a future “epic debugging story”, or perhaps even an outage waiting to happen.


Since relying on humans for these decisions is guaranteed to eventually be a losing strategy, we thought it best to teach the machines instead.


How database-agnostic querying helps Kensho

Rather than teaching every engineer at Kensho how to navigate these minefields on their own, we decided to “teach” our code instead. We figured out the most common set of query functionality our engineers relied on and ensured the GraphQL compiler’s query semantics were powerful enough to fulfill their needs. We now teach our users a single query language that works in the same predictable way regardless of the underlying database type and version.


Everyone wins when products are built atop database-agnostic queries! Products can be rapidly built atop a common query platform with access to all data. Meanwhile, database engineers can freely decide to move data around or switch to a completely new database type without breaking products. Every query that reaches those systems is guaranteed to apply all best practices. As best practices evolve, users’ queries are automatically recompiled — everything “just works.”

Along the way, every time we discovered a query performance problem, an unexpected database limitation, or an outright database bug, we added a compilation rule in the GraphQL compiler that ensured we never hit that issue again. Rather than just fixing each immediate problem one at a time (“this query is making the product slow”), the compiler let us banish entire classes of problems at a time: “this query structure makes queries slow, always avoid it.” This gave us immense leverage!


We started with five rules, grew to ten, then to twenty and beyond, and without realizing it, went through a phase of emergence of complexity beyond any human’s capacity. Within a few months, the compiler was simply better at writing queries — better than even the group of humans that designed and implemented the rules the compiler was merely following. It was more consistent, it never forgot or misinterpreted a rule, it never had an “off day” and slipped up — and if it ever did make a mistake, that mistake became a valuable test case, was quickly corrected, and never ever came back.


As we scale our use of the compiler to more and larger databases, its advantages over even expert humans are only growing. Here are a few examples we hope to cover in more detail in future posts:


  • GraphQL compiler is now able to estimate query costs, and use those estimates to automatically parallelize, paginate, and (soon) order cross-database queries while maintaining a predictable impact on index use. As data shapes and sizes change, the compiler’s query execution decisions change with them to keep query performance in the sweet spot.

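The cost-estimate-driven pagination described above can be sketched as a tiny planning step that runs before query execution. The estimator inputs and the per-page budget below are invented for illustration; the post does not describe the compiler’s actual cost model.

```python
import math

# A hedged sketch of cost-based query planning: given an estimated result
# size, pick a page count and a parallelism factor *before* executing, and
# re-derive both whenever the estimates change -- no frozen "educated guess".
def plan_pages(estimated_rows: int, rows_per_page_budget: int) -> int:
    """Choose how many pages to split a query into."""
    return max(1, math.ceil(estimated_rows / rows_per_page_budget))

def plan_parallelism(pages: int, db_cpu_utilization: float) -> int:
    """Run fewer pages concurrently when the database is already busy."""
    headroom = max(0.0, 1.0 - db_cpu_utilization)
    return max(1, min(pages, int(headroom * 10)))
```

Because the plan is recomputed from fresh estimates, growing data or a loaded database changes the execution strategy automatically instead of silently degrading it.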

  • Large normalized schemas are unwieldy and difficult to navigate, so the compiler allows querying users to use macros to reshape their perception of the schemas to fit their needs, without requiring any changes to the underlying data or databases. Do you wish the Company table directly pointed to the “current CEO” row of the Person table, instead of having to constantly query for “the company’s officer with title "CEO" whose role start date is in the past, and whose end date either doesn’t exist or is still in the future?” Then just define that macro, and “current CEO” appears as a relation between Company and Person in your view of the schema, even if Company and Person live in different databases!

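Conceptually, a macro like the “current CEO” one above is a rewrite applied before compilation: the user writes a short edge name, and the compiler expands it into the longer filtered traversal. The macro name and its expansion below are hypothetical illustrations of that idea, not Kensho’s real schema or the compiler’s macro API.

```python
# A toy sketch of macro-edge expansion: a pre-compilation pass rewrites
# user-facing macro edges into the full traversals they stand for, so the
# underlying data and databases never need to change.
MACRO_EDGES = {
    "out_Company_CurrentCEO": (
        'out_Company_Officer @filter(op_name: "=", value: ["$ceo_title"])'
    ),
}

def expand_macro_edges(query: str) -> str:
    """Rewrite macro edge names into their expansions before compiling."""
    for name, expansion in MACRO_EDGES.items():
        query = query.replace(name, expansion)
    return query
```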

  • We are even working on a way to integrate non-database sources of data, such as file systems, APIs, and machine learning models, into our database-agnostic querying universe. To integrate a new source of data using this approach, one would simply have to describe its schema and implement four simple functions that become the backbone of a provably-efficient query interpreter over that data set, allowing all the same query capabilities as when querying any other database.

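To make the “four simple functions” idea concrete, here is a minimal in-memory adapter exposing a dataset through four primitives. The function names and the toy dataset are my own illustration of the concept; the post does not name the compiler’s actual interpreter interface.

```python
# A hedged sketch of a data-source adapter: describe the schema, implement
# four primitives, and a generic interpreter can run queries over the data.
DATA = {
    "Person": [
        {"name": "Ada", "friend_ids": [1]},
        {"name": "Grace", "friend_ids": [0]},
    ],
}

class InMemoryAdapter:
    def get_records_of_type(self, type_name):
        """1) Enumerate handles for all records of a given schema type."""
        return list(range(len(DATA.get(type_name, []))))

    def project_property(self, type_name, handle, field):
        """2) Read one field off one record."""
        return DATA[type_name][handle].get(field)

    def project_neighbors(self, type_name, handle, edge):
        """3) Follow an edge from one record to its neighbor handles."""
        return DATA[type_name][handle].get(edge, [])

    def can_coerce_to_type(self, type_name, handle, subtype):
        """4) Check whether a record is usable as a given subtype."""
        return subtype == type_name  # no inheritance in this toy dataset
```

The same four primitives could wrap a file system, an HTTP API, or a model’s outputs, which is what would let such sources join the database-agnostic querying universe.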

If you’ve reached this point, thank you! We appreciate you reading this far, and we are sure you must have many questions, like “Why does GraphQL compiler not use standard GraphQL semantics?” and “How can you represent all those kinds of databases in a single database-agnostic way?” These are questions that we’ll explore in subsequent blog posts, so please stay tuned! In the meantime, here’s a demo repository where you can experiment with GraphQL compiler’s flavor of database-agnostic querying.


Our team is incredibly proud of the work we’ve done so far on GraphQL compiler, but our journey has just begun. Join us on this adventure by reaching out on Twitter, using GraphQL compiler for your projects, contributing to it on GitHub, or getting paid to work on it as a member of the Kensho team!


Thanks to Bojan Serafimov, Caroline Gerenyi, Joshua Pan, Julian Goetz, Leon Wu, Melissa Whitehead, Michaela Shtilman-Minkin, Pedro Mantica, Selene Chew, and to all GraphQL compiler contributors on GitHub!


Translated from: https://blog.kensho.com/database-agnostic-querying-is-unavoidable-at-scale-18895f6df2f0
