

SQL is a powerful language, and it is part of most of the tech stacks you’ll work with. For a developer, the use of SQL might be limited to inserting and retrieving data, but for data analysts, data scientists and data engineers, it is usually much more than that. SQL gives you direct access to the database, and a whole lot of analytics can be done right there, without pulling the data out and loading it into pandas or PySpark. Of course, what you can do within the database is limited by the resources available to it.

From what I have observed over the years, people who work with a statistical programming language like R, Julia or Python tend to do almost everything in that language, even though some of that work can be done more efficiently in SQL. Apart from the basic selects, inserts, updates, joins and subqueries, SQL has a lot of advanced features for data analysis that we don’t exploit often enough.

There’s a post on KDNuggets that bills itself as the last guide you’d need for data analysis. It’s a well-written guide, but I think it definitely is NOT the last guide you’d need for data analysis. You’ll need to know more. I’d say that the Medium post you’re reading right now is also not the last guide you’ll need to be great at SQL. It only covers a few neglected, underused but powerful features of SQL. Let’s go over some of them.

Hierarchical Queries

Enterprise relational databases like Oracle started supporting storage and retrieval of hierarchical data a long time ago. Before MySQL 8 was released, MySQL was probably one of the few databases that didn’t offer a straightforward way of querying hierarchical data. I have had to refer to this article by Mike Hillyer many times over the last couple of years while implementing hierarchical storage in MySQL. It’s a great read.

Hierarchical data is everywhere if you think about it: categories, sub-categories and further sub-categories of products, organizational hierarchies, animal and plant species, family trees and so on. Plain SQL features aren’t enough to query hierarchical data efficiently, because you need one more subquery for every level of the hierarchy, which results in a lot of subqueries (and confusion). In MySQL 5.7 or earlier, you’d use session variables to do hierarchical queries; in MySQL 8 or later, and in other databases, you’d use recursive common table expressions.
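To make the recursive common table expression idea concrete, here is a minimal sketch. It is not taken from any of the posts mentioned above: the `category` table, its columns and the sample rows are made up for illustration, and it runs against an in-memory SQLite database through Python’s built-in `sqlite3` module so that it can be tried without a server. The same `WITH RECURSIVE` shape should carry over to MySQL 8 or later, although details such as string concatenation differ there (MySQL uses `CONCAT()` rather than `||`).

```python
import sqlite3

# In-memory database, so the sketch needs no setup.
conn = sqlite3.connect(":memory:")

# Hypothetical adjacency-list table: every row points at its parent.
conn.executescript("""
    CREATE TABLE category (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES category(id)  -- NULL marks a root
    );
    INSERT INTO category (id, name, parent_id) VALUES
        (1, 'Electronics', NULL),
        (2, 'Televisions', 1),
        (3, 'Portable Electronics', 1),
        (4, 'LCD', 2),
        (5, 'MP3 Players', 3),
        (6, 'Flash', 5);
""")

# Anchor member: the chosen root row.
# Recursive member: children of any row already in the result.
RECURSIVE_CTE = """
WITH RECURSIVE subtree (id, name, depth, path) AS (
    SELECT id, name, 0, name
    FROM category
    WHERE id = ?
    UNION ALL
    SELECT c.id, c.name, s.depth + 1, s.path || ' > ' || c.name
    FROM category AS c
    JOIN subtree AS s ON c.parent_id = s.id
)
SELECT id, name, depth, path
FROM subtree
ORDER BY path;
"""


def fetch_subtree(connection, root_id):
    """Return (id, name, depth, path) for root_id and all its descendants."""
    return connection.execute(RECURSIVE_CTE, (root_id,)).fetchall()


rows = fetch_subtree(conn, 1)
for _id, name, depth, _path in rows:
    # Indent each category by its depth to show the tree shape.
    print("  " * depth + name)
```

The query never mentions how deep the tree is. The recursive member keeps joining children onto rows that are already in `subtree` until no new rows appear, which is what a fixed stack of subqueries cannot do.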

Hierarchical data is everywhere: product categories and subcategories, organizational hierarchies, family trees and so on.


I’ll give you some context on this. An ex-colleague called me up one day and asked me how to run a hierarchical query on MySQL 5.7. Up to and including that version, MySQL did not support common table expressions. So, here’s what the query would look like. Let’s now talk about CTEs.


