观前提示:
-
这个图先自行背诵至滚瓜烂熟。
文档中论述题目解题思路时没有特殊情况都按照上面的程序执行顺序为准,个别题也会以其他更自然的顺序讲解。
-
本文的知识点整理仅起提示作用,一些不常用功能、参数可能会遗漏,如需了解详细用法请自行百度。
-
有的题涉及多个知识点,则取按重要性排名的前2-3个知识点归入相应分类(同一道题可能在本文出现多次)。
-
本文持续更新,更新顺序基本是由易到难。
-
有些明显有上下级关系的分类,则只取高级分类归入。如差集必然涉及子查询,则只归入差集。自联结是普通联结中的特殊情况则不再归入普通联结。
-
很多复杂的题写子查询套太多层,就统一使用临时表了。
文章目录
- 0. 常用聚合函数
- 1. distinct去重
- 第176题
- 第177题
- 第597题
- 第1045题
- 第1113题
- 第1141题
- 第1149题
- 第1919题
- 第2082题
- 第2205题
- 第2356题
- 2. where 筛选
- 普通用法
- 第584题
- 第1132题
- 第1148题
- 第1495题
- 第2026题
- (列1, 列2, ...) in 语句
- 第184题
- 第512题
- 第585题
- 第614题
- 第1070题
- 第1164题
- 第1174题
- not in 语句(找差集)
- 第183题
- 第607题
- 第1083题
- 第1084题
- 第1264题
- 第1350题
- 第1412题
- 第1607题
- 第1978题
- and 同时满足多个条件
- 第196题
- 第620题
- 第1757题
- 第2020题
- 第2230题
- or 满足多个条件中的一个条件即可
- 第595题
- 3. 联结
- 3.1 笛卡尔积
- 第1280题
- 第1783题(列转行/宽转长)
- 第1990题(查固定字段)
- 3.2 内外联结
- 内联结
- 第1073题
- 第1251题
- 第1571题
- 第1587题
- 第1623题
- 第1731题
- 第1919题
- 第2329题
- 第2339题
- 外联结
- 第175题
- 第262题
- 第577题
- 第580题
- 第618题
- 第1068题
- 第1098题
- 第1132题
- 第1158题
- 第1241题
- 第1364题(四表联查)
- 第1378题
- 第1407题
- 第1421题
- 第1581题
- 第1677题
- 第1715题
- 第1809题
- 第1988题
- 第2142题
- 3.3 自连接
- 第180题
- 第181题
- 第196题
- 第197题
- 第570题
- 第603题
- 第612题
- 第613题
- 第626题
- 第1097题
- 第1270题
- 第1285题(连续区间)
- 第1321题
- 第1459题
- 第1747题
- 第1843题
- 第1875题
- 第1951题
- 第1939题
- 第1949题
- 第1951题
- 第2084题
- 第2228题
- 4. union 求并集(常用于列转行/宽转长)
- 第602题
- 第1205题
- 第1212题
- 第1264题
- 第1341题
- 第1435题
- 第1501题
- 第1555题
- 第1699题
- 第1789题
- 第1783题
- 第1795题
- 第1811题
- 第1841题
- 第1907题
- 第1949题
- 第1965题
- 第1990题(查固定字段)
- 5. 子查询/嵌套查询
- 第176题
- 第177题
- 第597题
- 第1076题
- 第1082题
- 第1107题
- 第1112题
- 第1126题
- 第1174题
- 第1241题
- 第1303题
- 第1468题
- 第1821题
- 第1831题
- 第1867题
- 第2112题
- 第2142题
- 6. 字符串相关函数
- 第1543题
- 第1565题
- 第1667题
- 第1683题
- 7. 时间相关函数
- 第1141题
- 第1142题
- 第1454题(连续区间)
- 第1479题
- 第1843题
- 第1853题
- 第1890题
- 第1939题
- 第2298题
- 8. group by 分组
- 单字段分组
- 第511题
- 第586题
- 第619题
- 第1069题
- 第1075题
- 第1194题
- 第1211题
- 第1322题
- 第1327题
- 第1484题
- 第1571题
- 第1633题
- 第1729题
- 第1821题
- 第1890题
- 多字段联合分组
- 第1050题
- 第1149题
- 第1193题
- 第1543题
- 第1693题
- 第1699题
- 第1741题
- 第1951题
- 9. having 筛选
- 第182题
- 第586题
- 第596题
- 第1050题
- 第1076题
- 第1501题
- 第1511题
- 第1587题
- 第1867题
- 第2041题
- 10. if/case 语句
- 简单的case语句(即只能判断是否相等)
- 第578题
- 第610题
- 第627题
- 第1083题
- 第1126题
- 第1173题
- 第1179题(行转列/长转宽)
- 第1211题
- 第1393题
- 第1445题
- 第1479题
- 第1511题
- 第1661题
- 第1699题
- 第1777题(行转列/长转宽)
- 第1783题(列转行/宽转长)
- 第1934题
- 第2308题
- 可搜索的case语句
- 第608题
- 第1294题
- 第1398题
- 第1440题
- 第1468题
- 第1873题
- 第1907题
- 第2051题
- 第2072题
- 第2298题
- 11. 窗口函数
- 第176题
- 第177题
- 第178题
- 第185题
- 第534题
- 第550题
- 第569题
- 第571题
- 第579题
- 第613题
- 第618题
- 第1076题
- 第1077题
- 第1112题
- 第1204题
- 第1225题
- 第1285题
- 第1308题
- 第1369题
- 第1412题
- 第1454题(连续区间)
- 第1532题
- 第1549题
- 第1596题
- 第1709题
- 第1811题
- 第1831题
- 第1875题
- 第1951题
- 第2066题
- 第2112题
- 第2159题
- 第2175题
- 第2228题
- 第2292题
- 第2308题
- 第2314题
- 第2324题
- 第2346题
- 12. 限定谓词
- 第574题
- 第586题
- 第1076题
- 第1082题
- 第1355题
- 第1867题
- 13. like模糊匹配和regexp 正则表达式
- 第1517题
- 第1527题
- 14. 临时表
- 第601题(找连续区间用编号法,同1285题)
- 第1164题
- 第1949题
- 第2238题
- 15. recursive递归
- 第571题
- 第1613题
- 第1651题
- 第1767题
0. 常用聚合函数
注意:由于基本每道题都在用聚合函数,就不把题总结进来了。
功能 | 函数 |
---|---|
求和 | sum([distinct]列名) |
计数(可去重) | count([distinct] 列名) |
求平均值 | avg([distinct]列名) |
求最大值 | max(列名) |
求最小值 | min(列名) |
字段拼接(可去重、指定顺序、指定分隔符) | group_concat([distinct] 列名1 [order by 列名2 asc/desc] [separator 分隔符]) |
1. distinct去重
功能 | 用法 |
---|---|
单字段去重 | distinct 列名 |
多字段去重 | distinct 列名1, 列名2 |
使用聚合函数时去重 | 聚合函数(distinct 列名) |
第176题
方法1:子查询中distinct 去重、order by 排序、limit + offset 输出第二位、主查询再查一次把空值变成null。
SELECT(SELECT DISTINCTSalaryFROMEmployeeORDER BY Salary DESCLIMIT 1 OFFSET 1) AS SecondHighestSalary
;
方法2:子查询中dense_rank() 排序、主查询中where 筛选、sum()或者max()或者min()聚合函数把空值变成null。
select max(salary) as SecondHighestSalaryfrom (select salary,dense_rank() over(order by salary desc) as rk from employee) twhere rk = 2;
第177题
类似第176题。用方法1要先set n = n - 1。
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGINset n = n - 1;RETURN (# Write your MySQL query statement below.select salaryfrom ( select distinct salary from employeeorder by salary desc) tlimit 1 offset n);
END
第597题
方法一:先distinct + 多字段得到去重后的临时表,再count()对临时表中的数据计数,无法直接count(distinct + 多字段)。
方法二:为了解决无法直接count(distinct + 多字段)的问题可以把多字段用concat()拼成单字段。
select round(coalesce((t1.acc/t2.req), 0), 2) as accept_ratefrom (select count(distinct concat(requester_id, accepter_id)) as acc,'yyy' as tagfrom RequestAccepted) t1 inner join(select count(distinct concat(sender_id, send_to_id)) as req,'yyy' as tagfrom FriendRequest) t2 on t1.tag = t2.tag;
第1045题
group by 按用户id分组、子查询中得到所有产品的数量、having + count(distinct) 去重计数筛选出数量=所有产品数量的用户id。
select customer_idfrom customer
group by customer_idhaving count(distinct product_key) = (select count(*)from product);
第1113题
where 筛选,group by 分组,count(distinct) 统计。
select extra as report_reason,count(distinct post_id) as report_countfrom actionswhere action = 'report'and action_date = '2019-07-04'
group by extra;
第1141题
where + datediff 筛选、group by 分组、count(distinct) 计数。
select activity_date as day,count(distinct user_id) as active_usersfrom activitywhere datediff('2019-07-27', activity_date) < 30 and datediff('2019-07-27', activity_date) >= 0
group by activity_date;
第1149题
子查询中先将表去重一次,排除看同一本书的情况。主查询中按viewer_id和view_date 联合分组、having+ count() 筛选出看了2篇及以上文章的人、最后distinct去重是排除同一个人多天读多本书的情况~~(太学霸了)~~。
select distinct viewer_id as id from (select distinct article_id,viewer_id,view_datefrom views) t
group by viewer_id, view_datehaving count(*) >= 2
order by id;
第1919题
先给每个user1找朋友(user2):listens表和friendship表内联结,此时id大的人被去掉(因为没有必要同时出现2配5和5配2这样相同的组合)、再内联结listens表看每个user2当天听的和user1相同的歌、group by 按照user1_id、user2_id和day三字段联合分组、having + count(distinct)统计不同的歌曲数,大于3即可。最后select的时候也要distinct,因为有的user1和user2可能有很多天都听了同样的多首歌,会被筛出多次。
select distinct t1.user_id as user1_id,t3.user_id as user2_idfrom listens t1 inner join friendship t2on t1.user_id = t2.user1_idinner join listens t3on t2.user2_id = t3.user_id and t1.day = t3.dayand t1.song_id = t3.song_id
group by t1.user_id, t3.user_id, t1.dayhaving count(distinct t1.song_id) >= 3;
第2082题
where 筛选、count(distinct)计数。
select count(distinct customer_id) as rich_countfrom storewhere amount > 500;
第2205题
where 筛选、count(distinct)计数。
CREATE FUNCTION getUserIDs(startDate DATE, endDate DATE, minAmount INT) RETURNS INT
BEGINRETURN (# Write your MySQL query statement below.select count(distinct user_id) as user_cntfrom purchaseswhere amount >= minAmountand time_stamp between startDate and endDate);
END
第2356题
count + distinct不重复计数。
select teacher_id,count(distinct subject_id) as cnt from teacher
group by teacher_id;
2. where 筛选
功能 | 用法 |
---|---|
单条件筛选 | where 条件语句 |
多条件筛选(与) | where 条件1 and 条件2 |
多条件筛选(或) | where 条件1 or 条件2 |
目标字段在/不在子查询结果中 | where 字段 in / not in (子查询) |
目标元组在/不在子查询结果中 | where (字段1, 字段2) in / not in (子查询) |
普通用法
第584题
coalesce() 处理null值、where 筛选。
select namefrom customerwhere coalesce(referee_id, 0) <> 2;
第1132题
actions表 left join removals表 on post_id相同、where留下extra = 'spam’的数据(这里必须用where筛选,不能写在on后面)、round() + 100 * count(distinct t2.post_id) / count(distinct t1.post_id)得到每天的百分比(这里去重是因为有可能同一个垃圾广告被举报多次,就会有多条记录,但是实际上这只是1条广告,也只被删了1次)、再把前面的查询作为子查询,查出每天删除比率的平均值。
select round(avg(d_ratio), 2) as average_daily_percentfrom ( select 100 * count(distinct t2.post_id) / count(distinct t1.post_id) as d_ratiofrom actions t1 left join removals t2on t1.post_id = t2.post_idwhere t1.extra = 'spam'group by t1.action_date) t;
第1148题
where 筛选、distinct 去重、order by 排序。
select distinct author_id as idfrom viewswhere author_id = viewer_id
order by author_id asc;
第1495题
inner join 两表联结、where + 筛选条件。
select distinct t2.title as titlefrom (select content_idfrom tvprogramwhere date_format(program_date, "%Y-%m") = '2020-06') t1 inner join (select title,content_idfrom contentwhere kids_content = 'Y'and content_type = 'Movies') t2on t1.content_id = t2.content_id;
第2026题
where + 筛选条件。
select problem_idfrom problemswhere likes / (likes + dislikes) < 0.6
order by problem_id;
(列1, 列2, …) in 语句
第184题
先查出各部门最高工资表,再筛选出(部门,工资) in 各部门最高工资表中的员工。
select t2.name as department,t1.name as employee,t1.salaryfrom employee t1 inner join department t2 on t1.departmentId = t2.idwhere (t1.departmentId, salary)in ( select departmentId,max(salary)from employeegroup by departmentId);
第512题
先查出各player最早登录日期表,再筛选出(玩家id, 登陆日期) in 最早登录日期表中的玩家。
select player_id,device_idfrom activity t1where (player_id, t1.event_date) in ( select player_id,min(event_date)from activitygroup by player_id)
第585题
思路是“2015年和别人相同”、“坐标唯一”两个条件分开看,再用where in 取交集。
临时temp表中两个insurance表自连接,将2015年投资和别人一样的数据筛选出来、注意这样一个人可能联上了多条数据,所以最后要distinct去重。子查询中group by按经纬度多字段分组、having + count() 筛选出唯一坐标的pid(注意一般用了分组之后是要聚合的,但因为这次pid是唯一的,就不聚合了)。主查询从temp表里筛选出pid在子查询中的数据(即坐标唯一的数据),并sum()求和。
with temp as (select distinct t1.PID,t1.TIV_2016from insurance t1 inner join insurance t2on t1.PID <> t2.PIDand t1.TIV_2015 = t2.TIV_2015
)select round(sum(TIV_2016), 2) as TIV_2016from tempwhere PID in ( select PIDfrom insurancegroup by LAT, LONhaving count(*) = 1);
第614题
利用where in (子查询) 筛选出followee列中在follower列中的人、分组计数、排序(题目让按字典顺序排,其实就是按followee字母顺序排)
select followee as follower,count(*) as numfrom followwhere followee in (select follower from follow)
group by followee
order by followee;
第1070题
子查询中group by按product_id分组得到每个product_id和他对应的min(year)、主查询中用where (product_id, year) in 子查询筛选出需要的数据。
select product_id,year as first_year,quantity,pricefrom saleswhere (product_id, year) in ( select product_id,min(year)from salesgroup by product_id);
第1164题
方法1:子查询中分组聚合查出日期在8月16号前的max(日期),即16号前的最近改动日期(这里没法连着new_price一起查哈!分组后的数据是不会自动一一对应的)、再套一层查询用where (id, 日期) in 子查询,查出id和new_price、将之前的查询结果外联结到一个distinct id的子查询表中、这样联上了的就按联上的价格取,没联上的就是10元。
方法2:既然原表没有初始10元这个数据,我们就自己加上去。创建临时表:原表union all 查出每个id原价10元的子查询(包含distinct id、10、初始日期’0000-00-00’),之后涉及from的全部用这个临时表、where筛选出16号之前的数据、同方法1先查出id和max(日期)、再套一层查询得到id和价格。
with temp as (select product_id,new_price,change_datefrom productsunion all select distinct product_id,10,'0000-00-00'from products
)select product_id,new_price as price from tempwhere (product_id, change_date) in ( select product_id,max(change_date) as change_datefrom tempwhere change_date <= '2019-08-16'group by product_id);
第1174题
子查询查出用户首次购买数据、主查询统计order_date = pref_date的数据。这里注意一个小技巧:按条件计数,是即时订单+1,计划订单+0,可以用sum(case when),也可以直接写sum(条件),此时条件满足时自动把布尔值转换成1计算,不满足转换成0计算。
select round(100 * sum(order_date = customer_pref_delivery_date) / count(*), 2) as immediate_percentagefrom delivery where (customer_id, order_date) in ( select customer_id,min(order_date) as first from deliverygroup by customer_id);
not in 语句(找差集)
第183题
子查询出订了东西的用户、where not in 查出不订的用户。
select Name as Customersfrom Customers t1where t1.Id not in (select CustomerId from Orders);
第607题
子查询出向’red’销售过的人、再用where not in找差集。
select namefrom salespersonwhere sales_id not in (select sales_idfrom orders t1 natural join company t2where t2.name = 'RED');
第1083题
方法一:分别where筛选出买S8和iphone的人,再not in 找差集。
select distinct buyer_idfrom sales t1 left join product t2on t1.product_id = t2.product_idwhere buyer_id not in (select buyer_idfrom sales t1 left join product t2on t1.product_id = t2.product_idwhere product_name = 'iPhone')and product_name = 'S8';
方法二:用sum(case when)的方法绕开子查询达到找差集目的,具体做法是求和时遇到“S8”就+1,遇到“iphone”就-1,只买“S8”的人的求和结果就是1,都买了的人就是0。
select buyer_idfrom sales t1 left join product t2on t1.product_id = t2.product_idwhere product_name in ('S8', 'iPhone')
group by buyer_idhaving sum(distinct case product_name when 'S8' then 1 else -1 end) = 1;
第1084题
子查询出在目标时间范围以外卖过的产品、where not in找差集。
select product_id,product_namefrom productwhere product_id not in (select distinct product_idfrom saleswhere sale_date > '2019-03-31'or sale_date < '2019-01-01')
第1264题
子查询中将friendship原表和交换左右顺序的原表union all取并集、套一层查询从第二列找到user_id=1的朋友、主查询中查distinct page_id,因为有的人喜欢同一个页面、where进行筛选,条件是user_id在上面子查询的朋友中且page_id不等于1号已经喜欢的page_id(要再用个子查询得到1号喜欢的page_id)。
select distinct page_id as recommended_pagefrom likeswhere user_id in (select user2_idfrom (select user1_id,user2_idfrom friendshipunion all select user2_id,user1_idfrom friendship) t1where user1_id = 1)and page_id not in (select page_idfrom likeswhere user_id = 1);
第1350题
not in 找差集。
select id,namefrom studentswhere department_id not in (select id from departments);
第1412题
临时temp表中用min()、max()窗口函数找到每次考试最高最低分,再顺手求个当前学生分数和最高/最低分的分差、套一层查询筛选出两个分差乘积为0的id,这一步也就是在找考了最高分或最低分的id。
主查询中where筛选出在exam表(即参加了考试),但不在temp表(即没考到最高或最低分)中的人。
with temp as (select distinct student_idfrom (select student_id,score - min(score) over(partition by exam_id order by score asc) as d1,score - max(score) over(partition by exam_id order by score desc) as d2from exam) t where d1 * d2 = 0
)select student_id,student_namefrom studentwhere student_id in (select student_id from exam)and student_id not in (select student_idfrom temp);
第1607题
not in 找差集。
select seller_namefrom seller t1where t1.seller_id not in (select seller_id from orderswhere year(sale_date) = '2020')
order by seller_name;
第1978题
not in 找差集。
select employee_idfrom employeeswhere salary < 30000and manager_id not in (select employee_id from employees)
order by employee_id;
and 同时满足多个条件
第196题
inner join自连接、delete删除。
delete t1 from Person t1 inner join Person t2 on t1.email = t2.email and t1.id > t2.id;
第620题
where A and B。
select id,movie,description,ratingfrom cinemawhere description <> 'boring'and id % 2 = 1
order by rating desc;
第1757题
where A and B。
select product_idfrom productswhere low_fats = 'Y' and recyclable = 'Y';
第2020题
natural join 联结两表、where 筛选。
select count(*) as accounts_countfrom subscriptions t1 natural join streams t2where year(end_date) = '2021'and year(stream_date) != '2021';
第2230题
where A and B。
CREATE PROCEDURE getUserIDs(startDate DATE, endDate DATE, minAmount INT)
BEGIN# Write your MySQL query statement below.select distinct user_id from purchaseswhere amount >= minAmountand time_stamp between startDate and endDateorder by user_id;
END
or 满足多个条件中的一个条件即可
第595题
where 条件1 or 条件2。
select name,population,areafrom worldwhere area >= 3000000 or population >= 25000000;
3. 联结
3.1 笛卡尔积
笛卡尔积是理解所有联结的基础。笛卡尔积就是不管表1和表2有没有关系,将他们的数据强行组合在一起。例如表1有2排,表2有3排,笛卡尔积则有2×3=6排。
所有联结的执行方式都是先产生两表的笛卡尔积,再在此基础上进行筛选出需要的数据。
笛卡尔积用于得到两表行与行之间的全部排列组合情况。
功能 | 用法 |
---|---|
产生两个表的笛卡尔积(不能跟筛选条件) | from 表1, 表2 |
产生两个表的笛卡尔积(能跟筛选条件) | from 表1 cross join 表2 |
注意:
(1)inner join / join 不加筛选条件也得到笛卡尔积,但不建议这样写,表述不清。left join / right join 不能不跟筛选条件。
(2)之所以使用笛卡尔积就是为了得到排列组合的所有情况,否则还不如使用inner join,所以原则上来说笛卡尔积后面是不跟筛选条件的。
(3)由于本人水平原因,很多题使用了先笛卡尔积再筛选的写法,类似先while True再break,其实都可以优化成inner join,大家不要盲目照搬。
第1280题
from 表1 cross join 表2得到笛卡尔积,再left join表3、group by分组、count()计数。
这里注意2点:① 因为表1和表2无关所以才适合用cross join,否则一般不用;② 一般不能两个join连着写最后再写on条件,但这次因为前面是cross join本来就不需要跟筛选条件所以没关系。
select t1.student_id,t1.student_name,t2.subject_name,count(t3.subject_name) as attended_examsfrom students t1 cross join subjects t2 left join examinations t3on t1.student_id = t3.student_idand t2.subject_name = t3.subject_name
group by t1.student_id, t2.subject_name
order by t1.student_id, t2.subject_name;
第1783题(列转行/宽转长)
方法1:用多个union all 将championships表的数据汇总到一列并将列名改成player_id、natural join players 表、group by 分组、count() 统计。
select player_id,player_name,count(*) as grand_slams_countfrom ( select wimbledon as player_idfrom championshipsunion allselect fr_open as player_idfrom championships t1union allselect US_open as player_idfrom championshipsunion allselect Au_open as player_idfrom championships) t1 natural join players t2
group by player_id;
方法2:两张表生成笛卡尔积,此时每个球员都对应一份championships表的全部数据、group by 分组、sum(case when) 筛选统计。
SELECT player_id,player_name, SUM(IF(player_id=Wimbledon,1,0)+IF(player_id=Fr_open,1,0)+IF(player_id=US_open,1,0)+IF(player_id=Au_open,1,0)) AS grand_slams_count
FROM Championships
JOIN Players
GROUP BY player_id
HAVING grand_slams_count>0
第1990题(查固定字段)
union all枚举创建platform表和experiment_name表、cross join得到这两个表笛卡尔积、left join experiments表+group by+count(*) 计数。
select t1.platform,t2.experiment_name,count(t3.experiment_name) as num_experimentsfrom (select 'Android' as platform union allselect 'IOS' as platform union allselect 'Web' as platform) t1 cross join(select 'Reading' as experiment_name union allselect 'Sports' as experiment_name union allselect 'Programming' as experiment_name) t2 left joinexperiments t3on t1.platform = t3.platformand t2.experiment_name = t3.experiment_name
group by platform, experiment_name;
3.2 内外联结
功能 | 用法 |
---|---|
内联结 | 表1 inner join 表2 on 条件 |
自然联结 | 表1 natural join 表2 |
左外联结 | 表1 left join 表2 on 条件 |
右外联结 | 表1 right join 表2 on 条件 |
注意:
(1)natural join是一种特殊的inner join,区别是:
① natural join会选择外键进行连接,没有外键时自动选择两表相同字段进行联结,而无需设置on条件。
② natural join后两表相同的字段即被合并,而普通的inner join不会自动合并。
例如:表1有id、性别2列,表2有id、姓名2列。
inner join后的表有4列:表1id、表1性别、表2id、表2姓名。
natural join后的表只有3列:id、表1性别、表2姓名。
(2)通过1的例子可以发现,inner join(外联结也一样)后经常会出现重复的列名,为了避免提示ambiguous错误,建议任何联结时都给表标t1、t2别名,且使用任何列名时都以t1.列名、t2.列名的形式书写。
(3)外联结一般用于保留联结时联不上的地方产生的null值。即使用外联结以外的联结方式并跟上筛选条件时,未联上/选上的行会被自动排除,但使用外联结时则会保留并显示null。
(4)如果涉及多个表联结,表1 join 表2后马上跟on条件,不要都join完了最后再来慢慢写筛选条件。
(5)两表内联结时如果要联结的列名相同,则可以用on 表1.列名 = 表2.列名,还可以用using(列名)。区别是第一种用法不会合并用于联结的同名列,而using()会合并,效果类似natural join,使用using()时前面不要写on。
(6)内联结时把筛选条件写在on里和联完后的where里对查询结果是没有影响的,但外联结就有影响。原因是:在外联结中,筛选条件写在on里,没联上的数据也会保留null值,而筛选条件写在where里,没选上的数据会直接消失。例如:
表1
id | 姓名 | 出生日期 |
---|---|---|
1 | 张三 | 1997-8-22 |
2 | 李四 | null |
3 | 王五 | 1997-8-20 |
表2
id | 性别 |
---|---|
1 | 男 |
2 | 女 |
表1 left join 表2 on 表1.id = 表2.id and t1.出生日期 is not null
t1.id | t1.姓名 | t2.id | t2.性别 |
---|---|---|---|
1 | 张三 | 1 | 男 |
2 | 李四 | null | null |
3 | 王五 | null | null |
注意观察出生日期是null的李四还在表里,只是没跟她的性别联上而已。如果后续我们进行一个count()操作,会发现数出来有3人。
表1 left join 表2 on 表1.id = 表2.id where t1.出生日期 is not null
t1.id | t1.姓名 | t2.id | t2.性别 |
---|---|---|---|
1 | 张三 | 1 | 男 |
3 | 王五 | null | null |
出生日期是null的李四直接消失。如果后续我们进行一个count()操作,会发现数出来不是上一个情况的3人,而是2人。
内联结
第1073题
inner join 两表联结、group by 分组、avg()算平均数。
第1251题
inner join 内联结、group by分组、round()和sum()等函数计算数据。
select t1.product_id,round(sum(t2.units * t1.price) / sum(t2.units), 2) as average_pricefrom Prices t1 inner join UnitsSold t2on t2.purchase_date between t1.start_date and t1.end_dateand t1.product_id = t2.product_id
group by product_id;
第1571题
inner join 两表联结、group by 分组、sum()求和。
select name as WAREHOUSE_NAME,sum(units * v_pro) as VOLUMEfrom warehouse t1 inner join (select product_id,width * length * height as v_profrom products) t2on t1.product_id = t2.product_id
group by t1.name;
第1587题
inner join 两表联结、group by 分组、sum()求和、having 筛选。
第1623题
一边inner join一边筛选。
select t1.student_name as member_A,t2.student_name as member_B,t3.student_name as member_Cfrom schoola t1 inner join schoolb t2on t1.student_name <> t2.student_name and t1.student_id <> t2.student_idinner join schoolc t3on t2.student_name <> t3.student_nameand t1.student_name <> t3.student_nameand t1.student_id <> t3.student_idand t3.student_id <> t2.student_id;
第1731题
inner join 把经理名字联到reports_to的id后面、group by 按经理id分组、count() 计数、avg() 求平均值、round() 四舍五入。
select t1.reports_to as employee_id,t2.name,count(t1.reports_to) as reports_count,round(avg(t1.age), 0) as average_agefrom employees t1 inner join employees t2on t1.reports_to = t2.employee_id
group by t1.reports_to
order by t2.employee_id;
第1919题
先给每个user1找朋友(user2):listens表和friendship表内联结,此时id大的人被去掉(因为没有必要同时出现2配5和5配2这样相同的组合)、再内联结listens表看每个user2当天听的和user1相同的歌、group by 按照user1_id、user2_id和day三字段联合分组、having + count(distinct)统计不同的歌曲数,大于3即可。最后select的时候也要distinct,因为有的user1和user2可能有很多天都听了同样的多首歌,会被筛出多次。
select distinct t1.user_id as user1_id,t3.user_id as user2_idfrom listens t1 inner join friendship t2on t1.user_id = t2.user1_idinner join listens t3on t2.user2_id = t3.user_id and t1.day = t3.dayand t1.song_id = t3.song_id
group by t1.user_id, t3.user_id, t1.dayhaving count(distinct t1.song_id) >= 3;
第2329题
inner join两表联结、group by按用户id分组、sum(销量*单价)得到每个人总开销、按要求排序。
select user_id,sum(quantity * price) as spendingfrom Sales t1 inner join Product t2using(product_id)
group by user_id
order by spending desc, user_id asc;
第2339题
inner join两表联结、on筛选出不同的配对。
select t1.team_name as home_team,t2.team_name as away_team from teams t1 inner join teams t2on t1.team_name <> t2.team_name;
外联结
注意:为了方便起见,本文档所有能用左/右外联结的地方全部用左外联结记录,实际做题时各人视情况而定。
第175题
left join 左外联结。
select firstName,lastName,city,statefrom Person t1 left join Address t2on t1.personId = t2.personId;
第262题
子查询中先查出被禁的用户和司机,减少数据量、trips表 left join 子查询,保留连不上的地方以达到求差集目的(这种做法比where not in 找差集效率更高)、where继续筛选日期、group by 按日期分组、聚合函数统计、别名的时候记得打反引号,因为不打反引号时不允许出现空格。
select request_at as `Day`,round(1 - sum(status='completed') /count(*), 2) as `Cancellation Rate`from trips t1 left join (select users_idfrom userswhere banned = 'Yes') t2on t1.client_id = t2.users_idor t1.driver_id = t2.users_idwhere t2.users_id is nulland request_at between '2013-10-01' and '2013-10-03'
group by request_at;
第577题
left join 左外联结、coalesce() 处理null。
select t1.name,t2.bonusfrom employee t1 left join bonus t2 on t1.empid = t2.empidwhere coalesce(t2.bonus, 0) < 1000;
第580题
部门表left join 学生表、group by 分组、count() 计数、coalesce() 处理null、order by排序。
select dept_name,coalesce(count(t2.dept_id), 0) as student_numberfrom department t1 left join student t2on t1.dept_id = t2.dept_id
group by t1.dept_id
order by student_number desc, dept_name asc;
第618题
第一个子查询查美洲学生,后面的学生都没美洲学生多,所以可以用left join联上去。给每个子查询都用row_number()窗口函数编号,方便后续按照编号联结。
select America,Asia,Europefrom (select name as America,row_number() over(order by name) as rkfrom studentwhere continent = 'America') t1 left join(select name as Asia,row_number() over(order by name) as rkfrom studentwhere continent = 'Asia') t2on t1.rk = t2.rkleft join(select name as Europe,row_number() over(order by name) as rkfrom studentwhere continent = 'Europe') t3on t1.rk = t3.rk;
第1068题
left join 左外联结。
select `product_name`,`year`,`price`from sales t1 left join product t2 on t1.product_id = t2.product_id;
第1098题
这道题筛选条件比较多,如果先外联结再一口气筛选,可能会出现不知道把筛选条件写在on后面还是where后面或者一不小心写错的情况。所以思路是先通过子查询筛选一遍,再对筛选后的表外联结,这样条理比较清晰。
books表先用where筛掉日期不合规的数据、orders表也用where筛掉日期不合规的数据、两个查询结果left join、group by 分组、having + sum() 统计和筛选。
select t1.book_id,t1.namefrom (select book_id,namefrom bookswhere available_from <= '2019-05-23') t1 left join (select book_id,quantityfrom orderswhere dispatch_date >= '2018-06-23') t2on t1.book_id = t2.book_id
group by t1.book_idhaving coalesce(sum(t2.quantity), 0) < 10;
第1132题
actions表 left join removals表 on post_id相同、where留下extra = 'spam’的数据(这里必须用where筛选,不能写在on后面)、round() + 100 * count(distinct t2.post_id) / count(distinct t1.post_id)得到每天的百分比(这里去重是因为有可能同一个垃圾广告被举报多次,就会有多条记录,但是实际上这只是1条广告,也只被删了1次)、再把前面的查询作为子查询,查出每天删除比率的平均值。
select round(avg(d_ratio), 2) as average_daily_percentfrom ( select 100 * count(distinct t2.post_id) / count(distinct t1.post_id) as d_ratiofrom actions t1 left join removals t2on t1.post_id = t2.post_idwhere t1.extra = 'spam'group by t1.action_date) t;
第1158题
left join 左外联结、group by 分组、where筛选日期、year()截年份、count()计数。
select user_id as buyer_id,join_date,count(buyer_id) as orders_in_2019from users t1 left join (select buyer_idfrom orderswhere year(order_date) = '2019') t2 on t1.user_id = t2.buyer_id
group by user_id;
第1241题
子查询+where筛选查出两个表(帖子表、评论表)、帖子表left join评论表、count计数。
select t1.sub_id as post_id,count(t2.sub_id) as number_of_commentsfrom (select distinct sub_id, parent_id from submissionswhere parent_id is null) t1 left join(select distinct sub_id, parent_id from submissionswhere parent_id is not null) t2on t1.sub_id = t2.parent_id
group by t1.sub_id
order by post_id;
第1364题(四表联查)
invoices表内联结customers表、left join contacts表以统计联系人数量、再left join customers表(这里是用邮箱一一对应联结的)以统计可信联系人数量、group by分组、count()统计。
select t1.invoice_id,t2.customer_name,t1.price,count(t3.contact_name) as contacts_cnt,count(t4.customer_name) as trusted_contacts_cntfrom invoices t1 inner join customers t2on t1.user_id = t2.customer_idleft join contacts t3on t2.customer_id = t3.user_idleft join customers t4on t3.contact_email = t4.email
group by t1.invoice_id
order by t1.invoice_id;
第1378题
left join 左外联结。
select unique_id,namefrom employees t1 left join employeeuni t2 on t1.id = t2.id;
第1407题
left join 左外联结、group by 分组、sum() 统计、coalesce 处理null、order by 排序。
select name, coalesce(sum(distance), 0) as travelled_distancefrom Rides t1 right join users t2 on t1.user_id = t2.id
group by t1.user_id
order by travelled_distance desc, name asc;
第1421题
left join 左外联结。
select t2.id,t2.year,coalesce(npv, 0) as npvfrom npv t1 right join queries t2on t1.id = t2.idand t1.year = t2.year;
第1581题
left join 左外联结、where 筛选、group by 分组、count() 计数。
select customer_id,count(*) as count_no_transfrom visits t1 left join transactions t2 on t1.visit_id = t2.visit_idwhere t2.amount is null
group by customer_id;
第1677题
product表 left join 左外联结 invoice表、sum() 求和、coalesce() 把null变成0。
select name,coalesce(sum(rest), 0) as rest,coalesce(sum(paid), 0) as paid,coalesce(sum(canceled), 0) as canceled,coalesce(sum(refunded), 0) as refundedfrom product t1 left join invoice t2on t1.product_id = t2.product_id
group by t1.product_id
order by name;
第1715题
left join 左外联结把chests表拼到boxes表上、sum()求和。
select sum(t1.apple_count + coalesce(t2.apple_count, 0)) as apple_count,sum(t1.orange_count + coalesce(t2.orange_count, 0)) as orange_countfrom boxes t1 left join chests t2on t1.chest_id = t2.chest_id;
第1809题
left join 左外联结、where筛选。
select session_idfrom playback t1 left join ads t2on t1.customer_id = t2.customer_idand t2.timestamp between t1.start_time and t1.end_timewhere t2.timestamp is null;
第1988题
left join 左外联结、where筛选、group by分组、min()求能接受的最低分、coalesce处理null值。
select school_id,coalesce(min(score), -1) as scorefrom schools t1 left join exam t2on t1.capacity >= t2.student_count
group by school_id;
第2142题
子查询中将乘客表和公交车表内联结、筛选出乘客能坐的所有车(即后于乘客到达的车)、group by按乘客id分组、min()选出时间最早的车(即乘客真正坐的车)。主查询中公交车表 left join 子查询、group by按公交车id分组、count()计数、order by排序。
select t3.bus_id,count(t.tm) as passengers_cntfrom buses t3 left join ( select min(t2.arrival_time) as tmfrom passengers t1 inner join buses t2on t1.arrival_time <= t2.arrival_timegroup by passenger_id) ton t3.arrival_time = t.tm
group by t3.bus_id
order by bus_id asc;
3.3 自连接
用法:from 学生表 t1, 学生表 t2
表示把学生表自己和自己联结产生笛卡尔积并别名t1和t2以区别笛卡尔积中的数据。
注意:很多自连接的题都可以用lag() 窗口函数做,由于作者刚开始编辑本文时水平有限并没有写入lag() 窗口函数的方法。同理,一些lag() 窗口函数的题也可以用自连接做,对于这些题就不列全方法了,大家有兴趣可以自行思考
第180题
自连接再筛选
select distinct t1.num as ConsecutiveNumsfrom logs t1 inner join logs t2on t1.id + 1 = t2.id and t1.num = t2.numinner join logs t3on t2.id + 1 = t3.idand t1.num = t3.num;
第181题
两个employee表inner join 自联结,再筛选。
select t1.name as 'Employee'from Employee t1 inner join Employee t2on t1.salary > t2.salary and t1.managerId = t2.id;
第196题
inner join自连接、delete删除。
delete t1 from Person t1 inner join Person t2 on t1.email = t2.email and t1.id > t2.id;
第197题
两个原表自连接制造笛卡尔积、datediff() 求日期差、where 筛选。
select t2.idfrom Weather t1 inner join Weather t2on datediff(t2.recordDate, t1.recordDate) = 1and t1.Temperature < t2.Temperature;
第570题
两个原表inner join自连接、group by 分组、having + count(*) 筛选。
select t2.namefrom employee t1 inner join employee t2on t1.managerid = t2.id
group by t1.manageridhaving count(*) >= 5;
第603题
两个原表inner join自连接再筛选。还可能用到abs()绝对值函数。
select distinct t1.seat_idfrom cinema t1 inner join cinema t2on abs(t1.seat_id - t2.seat_id) = 1 and t1.free = 1 and t2.free = 1
order by t1.seat_id;
第612题
两个原表inner join 自连接、筛选出与表1点不同的表2点、按照题目公式计算表1表2点的距离(需要用到power()求平方、sqrt()开方、min()求最短距离、round()保留2位小数)。
select round(min(sqrt(power((t1.x - t2.x),2) + power((t1.y - t2.y), 2))), 2) as shortestfrom point2D t1 inner join point2D t2on t1.x <> t2.xor t1.y <> t2.y;
第613题
方法一:两表inner join 自连接再筛选。计算出任意两点间的距离,再min()找出最小的距离。
方法二:由于最小距离只存在于相邻两个点中,所以没必要算出全部任意两点间的距离。先用窗口函数rank排序+编号。再左外联结拼出一张同时有某点和它右一个点的坐标的临时表,并select出这个临时表中相邻两点的距离。最后min()找出最小距离。
select distinct min(dis) as shortestfrom (select abs(t2.x - t1.x) as disfrom ( select x, rank() over(order by x) as idfrom point) t1 left join ( select x,rank() over(order by x) as idfrom point) t2on t1.id + 1 = t2.id) t3;
第626题
两个原表inner join 自连接,再筛选。
select t1.id,t2.studentfrom seat t1 inner join seat t2on (t1.id % 2 = 1 and (t1.id + 1 = t2.id or (t1.id = (select max(id) from seat) and t1.id = t2.id)))or (t1.id % 2 = 0 and t1.id - 1 = t2.id)
order by t1.id;
第1097题
子查询中查出每个人的首次登陆日期、left join自连接再接上每个人第二天登陆的行(没有就保留空)、group by按照安装日期分组、count(安装日期)统计安装人数、count(表2的player_id) / count(安装日期)得到留存率,最后四舍五入一下。
select install_dt,count(t1.install_dt) as installs,round(count(t2.player_id) / count(t1.install_dt), 2) as Day1_retention from ( select min(event_date) as install_dt,player_idfrom activitygroup by player_id) t1 left join activity t2on t1.install_dt + 1 = t2.event_dateand t1.player_id = t2.player_id
group by install_dt;
第1270题
方法1:重复3次employees 表都用 inner join 联起来、where 筛选第三层manager_id = 1的人(不管是哪层的人最终第三层的manager_id都会回到1)。
select t1.employee_idfrom employees t1
inner join employees t2 on t1.manager_id = t2.employee_id
inner join employees t3 on t2.manager_id = t3.employee_idwhere t3.manager_id = 1and t1.employee_id <> 1;
方法2:union 把每个层级的人联到一起。
第1285题(连续区间)
方法1:两个logs表cross join自联结、起点-1不在logs表中,终点+1不在logs表中,且终点值大于起点值,借此筛选出起点和终点、由于用了笛卡尔积,每个起点都对应全部终点,所以再用group by和min() 函数选出终点。
select t1.log_id as start_id,min(t2.log_id) as end_idfrom logs t1 inner join logs t2on t1.log_id <= t2.log_idwhere (t1.log_id - 1) not in (select log_id from logs)and (t2.log_id + 1) not in (select log_id from logs)
group by start_id
order by start_id;
方法2:基本原理:连续区间中的数与某一基准等差数列的差值是固定的。例如123567与123456一一对应相减的差是000111,可借此差值对区间进行分类。
log_id - row_number() over() 得到差值、借之前的差值分组、每组最小的为start,最大的为end。
第1321题
原表先对visited_on去重(不然后面连接时会连多次)再自连接、连接条件是datediff(t2日期, t1日期) between 0 and 6。这样仍然不满足题意,题目意思是必须连续7天才计算一次,像例如4号这种连上了1,2,3三天的应该筛掉,所以可以加个having count(distinct)排除掉连上的日期不足7天的数据,也可以用where筛选掉日期不足最小日期(用子查询得到)+6的数据。
select t1.visited_on,sum(t2.amount) as amount,round(sum(t2.amount)/7, 2) as average_amountfrom (select distinct visited_on from customer) t1 inner join customer t2on datediff(t1.visited_on, t2.visited_on) between 0 and 6
group by t1.visited_onhaving count(distinct t2.visited_on) = 7
order by t1.visited_on;
第1459题
用大炮发射法(即1连2、3,2连3)将两个points表按照inner join自连接、筛选出x和y坐标都不一样的数据、计算面积、order by 排序。
select t1.id as p1,t2.id as p2,abs((t1.x_value - t2.x_value) * (t1.y_value - t2.y_value)) as areafrom points t1 inner join points t2on t1.id < t2.idand t1.x_value <> t2.x_valueand t1.y_value <> t2.y_value
order by area desc, p1, p2;
第1747题
两个loginfo表自连接、筛选出表2登录时间between表1登录和下线时间的数据、还要加个distinct因为有的账户出问题可能不止一次。
select distinct t1.account_idfrom loginfo t1 inner join loginfo t2on t1.account_id = t2.account_idand t1.ip_address <> t2.ip_addressand t2.login between t1.login and t1.logout;
第1843题
临时表temp中将transactions表和accounts表inner join,on id相等、where筛选出所有‘creditor’数据、group by 按账户id和年月(需要用date_format取)联合分组、sum() 计算每人每月总收入、having 筛选出总收入 > 最大收入的数据。主查询中将两个temp表inner join自连接、on id相等且月份差1,求月份差用period_diff()、这样留下的人都是可疑的,只不过有的人可能连续多个月可疑,就会被选出多次,所以还要来个distinct。
with temp as (select t1.account_id,sum(t1.amount) as all_amount,t2.max_income,date_format(t1.day, "%Y%m") as dfrom transactions t1 inner join accounts t2on t1.account_id = t2.account_idwhere type = 'Creditor'group by t1.account_id, dhaving all_amount > max_income
)select distinct t1.account_idfrom temp t1 inner join temp t2on t1.account_id = t2.account_idand period_diff(t2.d, t1.d) = 1;
第1875题
两个原表用inner join按照工资相同id不同自连接以排除掉工资独一无二的员工、dense_rank() 窗口函数按工资排出顺序、distinct去除联结后重复出现的数据、order by 排序。
select distinct t1.employee_id,t1.name,t1.salary,dense_rank() over (order by t1.salary) as team_idfrom employees t1 inner join employees t2on t1.salary = t2.salaryand t1.employee_id <> t2.employee_id
order by team_id asc, employee_id asc
第1951题
子查询中2个ralations表自连接让拥有相同follower的用户处在同行方便之后计数、联结筛选条件是follower_相等且联上来的t2表user_id > 原来t1表的user_id(不然1,7和7,1会算成两种情况)、group by联合分组、rank() 窗口函数以count(*)排序编号、主查询中筛选出排1名的。
select user1_id,user2_idfrom ( select t1.user_id as user1_id,t2.user_id as user2_id,rank() over(order by count(*) desc) as rankingfrom relations t1 inner join relations t2on t1.follower_id = t2.follower_idand t1.user_id < t2.user_idgroup by t1.user_id, t2.user_id) twhere ranking = 1;
第1939题
两个confirmations表inner join自联结、on或者where后面筛选出user_id相等且time_stamp相差1天的数据、可使用timestampdiff()、date_add()、unix_timestamp转换日期格式再相减等。
select distinct t1.user_idfrom confirmations t1 inner join confirmations t2on t1.user_id = t2.user_idand t1.time_stamp < t2.time_stampand unix_timestamp(t2.time_stamp) - unix_timestamp(t1.time_stamp) <= 24*60*60;
第1949题
由于每个人的朋友散布在user1_id和user2_id两列中,所以先用union all将原表和交换两列左右顺序后的新表进行并集,以便后续找某个id的朋友时只需要看第二列即可,再用临时表将上述查询结果保存下来取名为temp以免重复写。原表t1 inner join temp表 t2,将t1.user1_id的朋友联过来、继续inner join temp表t3,将t1.user2_id和前面联的t1.user1_id的共同朋友联过来、group by分组、count(t3中的id)计数、having筛选。
with temp as ( select user1_id,user2_id from friendshipunion all select user2_id,user1_idfrom friendship)select t1.user1_id,t1.user2_id,count(t3.user2_id) as common_friendfrom friendship t1 inner join temp t2on t1.user1_id = t2.user1_idinner join temp t3on t1.user2_id = t3.user1_idand t2.user2_id = t3.user2_id
group by t1.user1_id, t1.user2_idhaving common_friend >= 3;
第1951题
子查询中2个ralations表自连接让拥有相同follower的用户处在同行方便之后计数、联结筛选条件是follower_相等且联上来的t2表user_id > 原来t1表的user_id(不然1,7和7,1会算成两种情况)、group by联合分组、rank() 窗口函数以count(*)排序编号、主查询中筛选出排1名的。
select user1_id,user2_idfrom ( select t1.user_id as user1_id,t2.user_id as user2_id,rank() over(order by count(*) desc) as rankingfrom relations t1 inner join relations t2on t1.follower_id = t2.follower_idand t1.user_id < t2.user_idgroup by t1.user_id, t2.user_id) twhere ranking = 1;
第2084题
left join 左外联结把orders表重复两次(注意此时可能本来只有1条的数据也会被复制多次,所以select后要加distinct)、筛选出要保留的数据。
select distinct t1.order_id,t1.customer_id,t1.order_typefrom orders t1 left join orders t2on t1.customer_id = t2.customer_id and t1.order_type <> t2.order_typewhere t2.order_type is nullor (t1.order_type = 0 and t2.order_type = 1);
第2228题
方法1:两个原表自连接、筛选条件是表2的id和表1id相同,且表2日期大于等于表1日期,且表2purchase_id不等于表1purchase_id、group by 按表1user_id和表1purchase_date联合分组、取得min(表2purchase_date),至此终于找到每一天对应的最近的下一次购买日期、datediff求差值、写进having里筛选。
select distinct t1.user_idfrom purchases t1 inner join purchases t2on t1.user_id = t2.user_idand t2.purchase_date >= t1.purchase_dateand t1.purchase_id <> t2.purchase_id
group by t1.user_id, t1.purchase_datehaving datediff(min(t2.purchase_date), t1.purchase_date) <= 7
order by user_id
方法2:子查询中用lag()窗口函数按user_id分区,purchase_date和purchase_id排序,将每次购买日期的上次购买日期接在每次购买日期后面。主查询中datediff算差值、where筛选。
select distinct user_idfrom (select user_id,purchase_date,lag(purchase_date, 1, '2000-01-01') over(partition by user_id order by purchase_date, purchase_id) as last_datefrom purchases) twhere datediff(purchase_date, last_date) <= 7
order by user_id
4. union 求并集(常用于列转行/宽转长)
功能 | 用法 |
---|---|
将两个表上下合并 | 表1 union all 表2 |
将两个表上下合并+去重 | 表1 union 表2 |
注意:
(1)合并时两个表列数必须相同。列名可以不同,但输出结果以最上方的表的列名为准。
(2)union all直接把两表上下拼一起,而union会将两个表中一整排都相同的地方进行去重。union all性能优于union,一般没有去重的要求都使用union all。
第602题
子查询中将requester_id和accepter_id两列并成1列、主查询中group by分组、count() 计数、order by排序、limit 1输出第1名。
select requester_id as id,count(*) as numfrom ( select requester_idfrom requestacceptedunion allselect accepter_idfrom requestaccepted) t
group by requester_id
order by num desclimit 1
第1205题
临时表中chargeback表 left join transaction表,将chargeback表对应的国家和金额等信息补全(补到和事务表一样5列)、union all到transactions表上,组成一个完整的事务表。在完整的事务表中按日期的年月和国家多字段分组、挨个统计需要的数据,这里又用到了sum(条件语句)。
with temp as (select t1.trans_id as id,t2.country,'chargeback' as state,t2.amount,t1.trans_datefrom chargebacks t1 left join transactions t2on t1.trans_id = t2.idunion allselect id,country,state,amount,trans_datefrom transactionswhere state = 'approved'
)select left(trans_date, 7) as month,country,sum(state = 'approved') as approved_count,sum(if(state = 'approved', amount, 0)) as approved_amount,sum(state = 'chargeback') as chargeback_count,sum(if(state = 'chargeback', amount, 0)) as chargeback_amountfrom temp
group by month, country;
第1212题
临时temp表中将原表数据和主客交换(记得队伍id和得分都要交换)后的新表数据union all到一起,这样的效果是原本要知道所有队伍的得分情况需要host_team和guest_team两列一起看,因为一个队伍有时是主场有时是客场,但现在只需要看host_team 1列即可,方便后续group by分组。主查询中将teams表left join temp表、group by按id分组、sum(条件语句)统计得分、order by排序。
with temp as (select host_team,guest_team,host_goals,guest_goalsfrom matchesunion allselect guest_team,host_team,guest_goals,host_goalsfrom matches
)select team_id,team_name,sum(case when host_goals > guest_goals then 3when host_goals = guest_goals then 1else 0 end) as num_pointsfrom teams t1 left join temp t2on t1.team_id = t2.host_team
group by team_id
order by num_points desc, team_id asc;
第1264题
子查询中将friendship原表和交换左右顺序的原表union all取并集、套一层查询从第二列找到user_id=1的朋友、主查询中查distinct page_id,因为有的人喜欢同一个页面、where进行筛选,条件是user_id在上面子查询的朋友中且page_id不等于1号已经喜欢的page_id(要再用个子查询得到1号喜欢的page_id)。
select distinct page_id as recommended_pagefrom likeswhere user_id in (select user2_idfrom (select user1_id,user2_idfrom friendshipunion all select user2_id,user1_idfrom friendship) t1where user1_id = 1)and page_id not in (select page_idfrom likeswhere user_id = 1);
第1341题
两个子查询分别查出最多点评的用户和最高平均分的电影、union all(两个子查询列数要对的上,列名先改后改都可以,并不影响union)。这里子查询需要注意:要么用窗口函数排序,主查询中选排1的数据,要么用limit 1,但是用limit 1的话要给两个子查询套括号,否则无法使用union。
select resultsfrom ( select t2.name as results,rank() over(order by count(t1.user_id) desc, t2.name asc) as rkfrom movierating t1 inner join users t2on t1.user_id = t2.user_idgroup by t1.user_id
union allselect t4.title as results,rank() over(order by avg(t3.rating) desc, t4.title asc) as rkfrom movierating t3 inner join movies t4on t3.movie_id = t4.movie_idand t3.created_at between '2020-02-01' and '2020-02-29'group by t3.movie_id) twhere rk = 1;
第1435题
先用嵌套查询分别筛选出不同时间段的表,并count(*)算出数量,最后两两union all,得到结果表。
select case when duration < 300 then '[0-5>' when duration < 600 then '[5-10>'when duration < 900 then '[10-15>'else '15 or more' end as bin,count(*) as totalfrom sessions
group by bin;
第1501题
子查询将calls表拆成2份并union all到一起,第一份是caller_id + duration,第二份是callee_id + duration,这是因为一通电话实际上对于呼出和呼入用户的国家来说要各计算一次,即使国家相同也是一样。inner join联好人名表,国家表、group by 分组、having中用avg() > 全世界电话平均时长来筛选、这里还要用子查询计算一次全世界的电话平均时长。
select t3.name as countryfrom ( select caller_id,durationfrom callsunion allselect callee_id,durationfrom calls) t1 inner join person t2on t1.caller_id = t2.id inner join country t3on left(t2.phone_number, 3) = t3.country_code
group by t3.namehaving avg(t1.duration) > (select avg(duration) from calls);
第1555题
临时表中将paid_by列和 - amount列作为一个表,paid_to列和amount列作为一个表,两表union all,得到所有人的财产变化表、财产变化表外面套一层按user_id(之前要别名一下)分组 + sum() 求出每个人财产变化总额,临时表结束。原表left join临时表、求出所有人现在的财产(记得coalesce()处理null值)、if(再算一次财产总值) 判断是否破产。
with temp as (select user_id,sum(change_amount) as amountfrom ( select paid_by as user_id,-amount as change_amountfrom transactionsunion allselect paid_to as user_id,amount as change_amountfrom transactions) tgroup by user_id
)select t1.user_id,t1.user_name,t1.credit + coalesce(t2.amount, 0) as credit,if(t1.credit + coalesce(t2.amount, 0) < 0, 'Yes', 'No') as credit_limit_breachedfrom users t1 left join temp t2on t1.user_id = t2.user_id
第1699题
方法1:子查询中用case when 给from_id和to_id交换顺序、主查询中group by 联合分组、count()和sum() 统计。
SELECT person1,person2, count(*) call_count, sum(duration) total_duration
FROM (
SELECT IF(from_id>to_id, to_id, from_id) person1, IF(from_id>to_id,from_id,to_id) person2, duration
FROM calls
) c
GROUP BY person1, person2
方法2:子查询中用union all 分别把顺序对的数据、顺序不对但交换过的数据拼一起、后续同方法1。
select from_id as person1,to_id as person2,count(*) as call_count,sum(duration) as total_durationfrom ( select from_id,to_id,durationfrom callswhere from_id < to_idunion allselect to_id as from_id,from_id as to_id,durationfrom callswhere from_id > to_id) t
group by from_id, to_id;
第1789题
只有1个部门的员工和有多个部门的员工分开筛选,最后union。
select employee_id,department_idfrom employee group by employee_id having count(*) = 1unionselect employee_id,department_idfrom employee where primary_flag = 'Y'group by employee_id;
第1783题
方法1:用多个union all 将championships表的数据汇总到一列并将列名改成player_id、natural join players 表、group by 分组、count() 统计。
方法2:两张表生成笛卡尔积,此时每个球员都对应一份championships表的全部数据、group by 分组、sum(case when) 筛选统计。
第1795题
用枚举法分别select出不同商店的数据,最后再union all。
select product_id,'store1' as store,store1 as pricefrom productswhere store1 is not nullunion all
select product_id,'store2' as store,store2 as pricefrom productswhere store2 is not nullunion all
select product_id,'store3' as store,store3 as pricefrom productswhere store3 is not null;
第1811题
临时表temp中将原比赛表的3列转成1列方便统计每个人的获奖情况、找连续区间用编号法,算出contest_id和row_number() 编号的差值方便后续按差值diff分组。主查询中先将得了3次金奖的人选出并union all 连续3次得奖的人、group by按照id和diff联合分组,having + count() 计数选出连续得奖3次的人。再套一层查询natural join user表得到名字和邮箱、distinct去重以免有些又连续3次获奖又得3次金奖的人被算多。
with temp as (select contest_id,player,contest_id - row_number() over(partition by player order by contest_id) as difffrom ( select contest_id,gold_medal as playerfrom contestsunion allselect contest_id,silver_medalfrom contestsunion allselect contest_id,bronze_medalfrom contests) t
)select distinct name,mailfrom ( select gold_medal as user_idfrom contestsgroup by gold_medal having count(*) >= 3union allselect playerfrom tempgroup by player, diffhaving count(*) >= 3) t natural join users;
第1841题
和1212题不能说非常相似,只能说一模一样。
with temp as (select home_team_id,away_team_id,home_team_goals,away_team_goalsfrom matchesunion all select away_team_id,home_team_id,away_team_goals,home_team_goalsfrom matches
)select team_name,count(home_team_id) as matches_played,sum(case when home_team_goals > away_team_goals then 3when home_team_goals = away_team_goals then 1else 0 end) as points,sum(home_team_goals) as goal_for,sum(away_team_goals) as goal_against,sum(home_team_goals - away_team_goals) as goal_difffrom teams t1 inner join temp t2on t1.team_id = t2.home_team_id
group by team_id
order by points desc, goal_diff desc, team_name asc;
第1907题
sum(case when)计数、3种工资都统计完了之后union all。
select 'Low Salary' as category,sum(if(income < 20000, 1, 0)) as accounts_countfrom accounts
union allselect 'Average Salary' as category,sum(if(income between 20000 and 50000, 1, 0)) as accounts_countfrom accounts
union allselect 'High Salary' as category,sum(if(income > 50000, 1, 0)) as accounts_countfrom accounts;
第1949题
由于每个人的朋友散布在user1_id和user2_id两列中,所以先用union all将原表和交换两列左右顺序后的新表进行并集,以便后续找某个id的朋友时只需要看第二列即可,再用临时表将上述查询结果保存下来取名为temp以免重复写。原表t1 inner join temp表 t2,将t1.user1_id的朋友联过来、继续inner join temp表t3,将t1.user2_id和前面联的t1.user1_id的共同朋友联过来、group by分组、count(t3中的id)计数、having筛选。
with temp as ( select user1_id,user2_id from friendshipunion all select user2_id,user1_idfrom friendship)select t1.user1_id,t1.user2_id,count(t3.user2_id) as common_friendfrom friendship t1 inner join temp t2on t1.user1_id = t2.user1_idinner join temp t3on t1.user2_id = t3.user1_idand t2.user2_id = t3.user2_id
group by t1.user1_id, t1.user2_idhaving common_friend >= 3;
第1965题
先union all两个表求id的并集、再group by按id分组、having + count(*)筛选出只出现过一次的id、order by。
select employee_idfrom ( select employee_idfrom employeesunion allselect employee_idfrom salaries) as t
group by employee_idhaving count(*) = 1
order by employee_id;
第1990题(查固定字段)
union all枚举创建固定字段的platform表和experiment_name表、cross join得到这两个表笛卡尔积、left join experiments表+group by+count(*) 计数。
select t1.platform,t2.experiment_name,count(t3.experiment_name) as num_experimentsfrom (select 'Android' as platform union allselect 'IOS' as platform union allselect 'Web' as platform) t1 cross join(select 'Reading' as experiment_name union allselect 'Sports' as experiment_name union allselect 'Programming' as experiment_name) t2 left joinexperiments t3on t1.platform = t3.platformand t2.experiment_name = t3.experiment_name
group by platform, experiment_name;
5. 子查询/嵌套查询
注意:由于差集题肯定涉及子查询,所以写进差集分类的题不再写进子查询分类。
第176题
方法1:子查询中distinct 去重、order by 排序、limit + offset 输出第二位、主查询再查一次把空值变成null。
SELECT(SELECT DISTINCTSalaryFROMEmployeeORDER BY Salary DESCLIMIT 1 OFFSET 1) AS SecondHighestSalary;
方法2:子查询中dense_rank() 排序、主查询中where 筛选、sum()或者max()或者min()聚合函数把空值变成null。
select max(salary) as SecondHighestSalaryfrom (select salary,dense_rank() over(order by salary desc) as rk from employee) twhere rk = 2;
第177题
类似第176题。用方法1要先set n = n - 1。
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGINset n = n - 1;RETURN (# Write your MySQL query statement below.select salaryfrom ( select distinct salary from employeeorder by salary desc) tlimit 1 offset n);
END
第597题
方法一:先distinct + 多字段得到去重后的临时表,再count()对临时表中的数据计数,无法直接count(distinct + 多字段)。
方法二:为了解决无法直接count(distinct + 多字段)的问题可以把多字段用concat()拼成单字段。
select round(coalesce((t1.acc/t2.req), 0), 2) as accept_ratefrom (select count(distinct concat(requester_id, accepter_id)) as acc,'yyy' as tagfrom RequestAccepted) t1 inner join(select count(distinct concat(sender_id, send_to_id)) as req,'yyy' as tagfrom FriendRequest) t2 on t1.tag = t2.tag;
第1076题
方法1:子查询中group by按照id分组、count(*)计算人数、主查询中再用group by分组、having筛选、count( * )>=all(子查询中人数)。
select project_idfrom ( select project_id,count(project_id) as numfrom projectgroup by project_id) twhere num >= all( select count(project_id)from projectgroup by project_id)
方法2:子查询中group by按照id分组、用rank()函数按照count(*)排序(这里比较重要的是rank里是可以用聚合函数的)、主查询中where筛选排序第1名的(rank和dense_rank都可以排出并列排名)。
select project_idfrom ( select project_id,rank() over(order by count(*) desc) as numfrom projectgroup by project_id) twhere num = 1;
第1082题
子查询+group by建立临时表得到每人总销售额、主表用having筛选出sum()总销售额>=all临时表里销售额的销售者id。
select seller_idfrom sales
group by seller_idhaving sum(price) >= all( select sum(price)from salesgroup by seller_id);
第1107题
子查询中按user_id分组得到每个人首次登陆日期、主查询中在子查询的基础上继续筛选出日期在90天内的数据并按日期分组。
select login_date,count(user_id) as user_countfrom ( select user_id,min(activity_date) as login_datefrom trafficwhere activity = 'login'group by user_idhaving datediff('2019-06-30', login_date) <= 90) t
group by login_date;
第1112题
方法1:子查询中查出max()分数、inner join联上原表,条件是id一样、grade一样,则不是最高分的没联上就筛掉了、分组并输出数据(因为有并列第一,所以course_id要输出min()最小值)。
select t1.student_id,min(t1.course_id) as course_id,t1.gradefrom enrollments t1 inner join ( select student_id,max(grade) as bestfrom enrollmentsgroup by student_id) t2on t1.student_id = t2.student_idand t1.grade = t2.best
group by student_id
order by student_id;
方法2:子查询中rank()窗口函数排序、主查询中筛选rank = 1。
第1126题
子查询中group by分组,avg()聚合求出每种event的平均数、主查询中原表natural join子查询、group by按business_id分组、having + sum(case when)统计并筛选。
select business_idfrom events t1 natural join ( select event_type,avg(occurences) as estdfrom eventsgroup by event_type) t2
group by business_idhaving sum(case when occurences > estd then 1 else 0 end) >= 2;
第1174题
子查询查出用户首次购买数据、主查询统计order_date = pref_date的数据。这里注意一个小技巧:按条件计数,是即时订单+1,计划订单+0,可以用sum(case when),也可以直接写sum(条件),此时条件满足时自动把布尔值转换成1计算,不满足转换成0计算。
select round(100 * sum(order_date = customer_pref_delivery_date) / count(*), 2) as immediate_percentagefrom delivery where (customer_id, order_date) in ( select customer_id,min(order_date) as first from deliverygroup by customer_id);
第1241题
子查询+where筛选查出两个表(帖子表、评论表)、帖子表left join评论表、count计数。
第1303题
子查询出每个团队人数,再join到原表上。
select employee_id,team_sizefrom employee t1 inner join ( select team_id,count(*) as team_sizefrom employeegroup by team_id) t2on t1.team_id = t2.team_id;
第1468题
子查询中分组max()聚合查出每个公司的最高工资、主查询中原表natural join子表、case when分条件计算税后工资。
select company_id,employee_id,employee_name,round(case when cstd < 1000 then salarywhen cstd between 1000 and 10000 then salary * 0.76else salary * 0.51 end, 0) as salaryfrom salaries t1 natural join ( select company_id,max(salary) as cstdfrom salariesgroup by company_id) t2;
第1821题
子查询筛选出对应年份数据、group by 单字段分组、having筛选出正收入数据。
select customer_idfrom (select customer_id,revenue from customerswhere `year` = '2021') t2
group by customer_idhaving sum(revenue) > 0;
第1831题
方法1:子查询中rank() over() 排出每天amount的名次、主查询中得到排第1名的数据。
方法2:子查询中group by 分组、max() 求最大值、inner join 到原表上、筛选出原表中日期等于子查询日期且amount = 当日最大值的数据。
select transaction_idfrom transactions t1 inner join( select date(day) as day,max(amount) as firstfrom transactionsgroup by date(day)) t2on date(t1.day) = t2.dayand t1.amount = t2.first
order by transaction_id;
第1867题
子查询中查出每个order_id分组下quantity的平均值、主查询中再次group by 分组并用having 筛选出max(quantity) > all子查询平均值的数据。
第2112题
孙查询中union all把出发机场和到达机场列联结起来、子查询中分组聚合求飞机数并按飞机数用rank() 窗口函数排序、父查询中筛选出排第1的。
select airport_idfrom ( select airport_id,rank() over(order by sum(flights_count) desc) as numfrom ( select departure_airport as airport_id,flights_countfrom flights union all select arrival_airport as airport_id,flights_count from flights) t1group by airport_id) t2where num = 1;
第2142题
子查询中将乘客表和公交车表内联结、筛选出乘客能坐的所有车(即后于乘客到达的车)、group by按乘客id分组、min()选出时间最早的车(即乘客真正坐的车)。主查询中公交车表 left join 子查询、group by按公交车id分组、count()计数、order by排序。
select t3.bus_id,count(t.tm) as passengers_cntfrom buses t3 left join ( select min(t2.arrival_time) as tmfrom passengers t1 inner join buses t2on t1.arrival_time <= t2.arrival_timegroup by passenger_id) ton t3.arrival_time = t.tm
group by t3.bus_id
order by bus_id asc;
6. 字符串相关函数
功能 | 函数 |
---|---|
字母大写 | upper(字符串) |
字母小写 | lower(字符串) |
字符串切割(切左n个字符) | left(字符串, n) |
字符串切割(切右n个字符) | right(字符串, n) |
字符串切割(从第n个开始切) | substring(字符串, n) |
字符串拼接 | concat(字符串1, 字符串2, …) |
将字符串用指定字符连接 | concat_ws(‘连接符号’, 字符串1, 字符串2, …) |
统计字符串长度 | char_length(字符串) |
去除字符串左右空格 | trim(字符串) |
重复目标字符串n次 | repeat(字符串, n) |
第1543题
trim()去空格、lower()改小写、date_format() 或者left() 取年月、group by 联合分组、count()计数。
select lower(trim(product_name)) as product_name,left(sale_date, 7) as sale_date,count(*) as totalfrom sales
group by lower(trim(product_name)), left(sale_date, 7)
order by product_name, sale_date;
第1565题
left()取日期中的年和月、group by分组、count()统计
select left(order_date, 7) as month,count(*) as order_count,count(distinct customer_id) as customer_countfrom orderswhere invoice > 20
group by month;
第1667题
left() 取首字母,配合upper() 将字母大写,substring() 取后面,配合lower() 将字母小写,最后concat()组合。
select user_id,concat(upper(left(name, 1)), lower(substring(name, 2))) as namefrom users
order by user_id;
第1683题
char_length()统计字符串长度、where筛选。
select tweet_idfrom tweetswhere char_length(content) > 15;
7. 时间相关函数
功能 | 函数 |
---|---|
从日期中提取年份数值 | year(日期) |
从日期中提取月份数值 | month(日期) |
从日期中提取日数值 | day(日期) |
日期格式化 | date_format(日期, ‘占位符’) |
计算两个日期的天数差 | datediff(大日期, 小日期) |
日期差值(单位:需要指定) | timestampdiff(输出单位, 小日期, 大日期) |
给日期加上n个单位的时间(如1 day) | date_add(日期, interval n type) |
给日期减去n个单位的时间(如1 day) | date_sub(日期, interval n type) |
得到对应的时间戳 | unix_timestamp(‘yyyy-MM-dd’) |
给日期加上一定的月份 | period_add(日期, n) |
计算两个日期的月份差 | period_diff(日期1, 日期2) |
计算星期几(范围0-6,周一开始算) | weekday(日期) |
计算星期几(范围1-7,周日开始算) | dayofweek(日期) |
注意:period函数的日期参数必须是’YYYYMM’或’YYMM’格式(可以用date_format转格式),且period_add函数输出的是’YYYYMM’格式。这个函数专门用来计算月份差,尤其是对于如2020年12月和2021年1月这种跨年的两个时间点能方便地计算出相差1个月。
附录:date_format的所有占位符
%W 星期名字(Sunday……Saturday)
%D 有英语前缀的月份的日期(1st, 2nd, 3rd, 等等。)
%Y 年, 数字, 4 位
%y 年, 数字, 2 位
%a 缩写的星期名字(Sun……Sat)
%d 月份中的天数, 数字(00……31)
%e 月份中的天数, 数字(0……31)
%m 月, 数字(01……12)
%c 月, 数字(1……12)
%b 缩写的月份名字(Jan……Dec)
%j 一年中的天数(001……366)
%H 小时(00……23)
%k 小时(0……23)
%h 小时(01……12)
%I 小时(01……12)
%l 小时(1……12)
%i 分钟, 数字(00……59)
%r 时间,12 小时(hh:mm:ss [AP]M)
%T 时间,24 小时(hh:mm:ss)
%S 秒(00……59)
%s 秒(00……59)
%p AM或PM
%w 一个星期中的天数(0=Sunday ……6=Saturday )
%U 星期(0……52), 这里星期天是星期的第一天
%u 星期(0……52), 这里星期一是星期的第一天
%% 一个文字“%”
第1141题
where + datediff 筛选、group by 分组、count(distinct) 计数。
select activity_date as day,count(distinct user_id) as active_usersfrom activitywhere datediff('2019-07-27', activity_date) < 30 and datediff('2019-07-27', activity_date) >= 0
group by activity_date;
第1142题
方法1:子查询里where + datediff() 筛选、group by 分组、count() 计数、主查询里avg() 算平均值。
select round(coalesce(avg(num), 0), 2) as average_sessions_per_userfrom ( select count(distinct session_id) as numfrom activitywhere activity_date between '2019-06-28' and '2019-07-27'group by user_id) t;
方法2:where + datediff() 筛选、直接count(distinct session_id) / count(distinct user_id) 算平均值。
SELECT IFNULL(ROUND(COUNT(DISTINCT session_id) / COUNT(DISTINCT user_id), 2), 0) AS average_sessions_per_user
FROM Activity
WHERE DATEDIFF('2019-07-27', activity_date) < 30
第1454题(连续区间)
临时表对原表按照id和login_date联合去重,防止1个人1天登录多次。子查询中row_number()窗口函数编号准备找连续区间,但是这里有个问题,即原表是没有编号的,只能用日期减,即再窗口函数外面再套一个date_sub() 函数,给差别名为diff。主查询里将子查询right join到accounts表中(注意这本来inner join更好,采取right join的原因是之后就不用排序了)、按id和diff联合分组、having + count(*) 筛选连续天数大于等于5天的人、最后select时要去重,因为有的人这段时间连续登陆5天以上,后面哪段时间又连续登陆5天以上。
with temp as (select distinct id,login_datefrom logins
)select distinct t1.id,t2.namefrom (select id,date_add(login_date, interval -row_number() over(partition by id order by login_date) day) as difffrom temp) t1 right join accounts t2on t1.id = t2.id
group by t1.id, diffhaving count(*) >= 5;
第1479题
weekday取星期数,case when判断,sum()求和。
select item_category as category,sum(case when weekday(order_date) = 0 then quantity else 0 end) as monday,sum(case when weekday(order_date) = 1 then quantity else 0 end) as Tuesday,sum(case when weekday(order_date) = 2 then quantity else 0 end) as Wednesday,sum(case when weekday(order_date) = 3 then quantity else 0 end) as Thursday,sum(case when weekday(order_date) = 4 then quantity else 0 end) as Friday,sum(case when weekday(order_date) = 5 then quantity else 0 end) as Saturday,sum(case when weekday(order_date) = 6 then quantity else 0 end) as Sundayfrom items t1 left join orders t2on t1.item_id = t2.item_id
group by item_category
order by item_category;
第1843题
临时表temp中将transactions表和accounts表inner join,on id相等、where筛选出所有‘creditor’数据、group by 按账户id和年月(需要用date_format取)联合分组、sum() 计算每人每月总收入、having 筛选出总收入 > 最大收入的数据。主查询中将两个temp表inner join自连接、on id相等且月份差1,求月份差用period_diff()、这样留下的人都是可疑的,只不过有的人可能连续多个月可疑,就会被选出多次,所以还要来个distinct。
with temp as (select t1.account_id,sum(t1.amount) as all_amount,t2.max_income,date_format(t1.day, "%Y%m") as dfrom transactions t1 inner join accounts t2on t1.account_id = t2.account_idwhere type = 'Creditor'group by t1.account_id, dhaving all_amount > max_income
)select distinct t1.account_idfrom temp t1 inner join temp t2on t1.account_id = t2.account_idand period_diff(t2.d, t1.d) = 1;
第1853题
date_format格式化日期。
select concat_ws(', ', dayname(day), concat(monthname(day), ' ', day(day)), year(day)) as dayfrom days;
第1890题
group by 单字段分组、year()取时间年份、max()。
select user_id,max(time_stamp) as last_stampfrom loginswhere year(time_stamp) = '2020'
group by user_id;
第1939题
两个confirmations表inner join自联结、on或者where后面筛选出user_id相等且time_stamp相差1天的数据、可使用timestampdiff()、date_add()、unix_timestamp转换日期格式再相减等。
select distinct t1.user_idfrom confirmations t1 inner join confirmations t2on t1.user_id = t2.user_idand t1.time_stamp < t2.time_stampand unix_timestamp(t2.time_stamp) - unix_timestamp(t1.time_stamp) <= 24*60*60;
第2298题
weekday()从日期中取得星期几、sum(case when)分类统计、when里面写法是weekday(日期) in (元组),而不是一个个去枚举week(day)=0、=1、=2的情况。
select sum(case when weekday(submit_date) in (5, 6) then 1 else 0 end) as weekend_cnt,sum(case when weekday(submit_date) in (5, 6) then 0 else 1 end) as working_cntfrom tasks;
8. group by 分组
功能 | 用法 |
---|---|
单字段分组 | group by 字段名(即列名) |
多字段联合分组 | group by 字段名1, 字段名2, … |
注意:
(1)单字段分组是把这个字段相同的数据全部归为一组,即聚合成1排。多字段联合分组是把这些字段都相同的数据归为一组。
(2)分组优先度仅次于联结、where筛选。
(3)分组的执行方式讲解:例表学生表:
姓名 | 性别 | 成绩 |
---|---|---|
张三 | 男 | 90 |
李四 | 女 | 80 |
王五 | 男 | 70 |
group by 性别 -> 性别列只剩下男、女
-> 女姓名列 -> 只有李四,如果要输出肯定也只输出李四
-> 男姓名列 -> 同时有张三和李四两排数据 -> 如果不聚合,则输出男组的第一排即张三
-> 女成绩列 -> 只有个80分,如果要输出肯定也只输出80
-> 男成绩列 -> 同时有90和70两排数据 -> 使用avg()平均数函数聚合,则输出80
总结:分组列不用管,其他列如果不聚合就默认输出它那组的第一排,如果聚合了就把组内目标列的数据进行聚合。当然,我们进行分组操作肯定是为了进行聚合,所以一般不聚合的列根本都不会输出。
这里补充一点:如果不分组直接聚合,那么被聚合的列理所应当输出聚合后的结果,但是没聚合的列则输出第一排数据,而不是和聚合列配对的同一排的数据。例如还是上面的表,想查询最低分,直接select 姓名, min(成绩),得到的是张三, 70,结果完全错误。
单字段分组
第511题
group by 分组、min() 求最小日期。
select player_id, min(event_date) as first_loginfrom Activity
group by player_id;
第586题
方法1:group by 分组、order by + count(*)排序、limit限制只输出第一名。(这个方法不能有并列第一,当然本题明说了没有并列第一)
select customer_numberfrom orders
group by customer_number
order by count(*) desclimit 1;
方法2:group by 分组、having筛选出count(*) >= all(子查询出每个customer的count( *)) 。
select customer_number from Orders
group by customer_numberhaving count(*) >= all( select count(order_number)from ordersgroup by customer_number);
第619题
子查询中group by 分组、count()计数、having筛选、主查询中max()找最大值。
select max(num) as numfrom ( select num, count(*) as cishufrom mynumbersgroup by numhaving cishu = 1) t2;
第1069题
group by 分组、sum()求和。
select product_id,sum(quantity) as total_quantityfrom sales
group by product_id;
第1075题
两表inner join 联结、group by 分组、avg() 计算平均年龄、round() 四舍五入。
select project_id,round(avg(experience_years), 2) as average_yearsfrom project t1 inner join employee t2on t1.employee_id = t2.employee_id
group by project_id;
第1194题
先将matches表的两个选手列和得分列拆成两张表,上下union all、套一层查询inner join联上players表、再按player_id分组、sum()求每人得的总分、order by按题目要求排好序、再套一层查询使用group by按group_id分组、由于group by的特性,如果不聚合,输出的就是每组的第一排数据,即刚才排好序的每组内得分最高的player。
select group_id,player_idfrom ( select t2.group_id,t2.player_id,sum(score) as scorefrom ( select first_player as player,first_score as score from matchesunion allselect second_player,second_scorefrom matches) t1 inner join players t2on t1.player = t2.player_idgroup by playerorder by score desc, player_id asc) t
group by group_id;
第1211题
group by 分组、avg()求平均数、sum()和case when配合统计目标数据的数量。
select query_name,round(avg(rating / position), 2) as quality,round(100 * sum(case when rating < 3 then 1 else 0 end) / count(*), 2) as poor_query_percentagefrom queries
group by query_name;
第1322题
group by 分组、sum(条件)(条件成立时为true,求和时按1算,否则按0算)统计个数。
select ad_id,coalesce(round(100 * sum(action='Clicked') / sum(action<>'Ignored'), 2), 0) as ctrfrom ads
group by ad_id
order by ctr desc,ad_id asc;
第1327题
left join 两表联结将表id和表名对应起来、where去除不满足日期条件的数据、group by 分组、sum计算总下单数量、最后having筛选。
select product_name,sum(unit) as unitfrom orders t1 left join products t2on t1.product_id = t2.product_idwhere left(order_date, 7) = '2020-02'
group by t1.product_idhaving unit >= 100;
第1484题
group by 分组、count(distinct) 计数、group_concat 字符串拼接。
select sell_date,count(distinct product) as num_sold,group_concat(distinct product order by product asc separator ',') as productsfrom activities
group by sell_date;
第1571题
inner join 两表联结、group by 分组、sum()求和。
select name as WAREHOUSE_NAME,sum(units * v_pro) as VOLUMEfrom warehouse t1 inner join (select product_id,width * length * height as v_profrom products) t2on t1.product_id = t2.product_id
group by t1.name;
第1633题
子查询求用户总数、group by 分组、count()/总数求百分比。
select contest_id,round(100 * count(*) / n, 2) as percentagefrom register, (select count(*) as n from users) t
group by contest_id
order by percentage desc, contest_id asc;
第1729题
group by 分组、count()计数。
select user_id,count(*) as followers_countfrom followers
group by user_id
order by user_id;
第1821题
子查询筛选出对应年份数据、group by 分组、having筛选出正收入数据。
第1890题
group by 分组、year()取时间年份、max()。
select user_id,max(time_stamp) as last_stampfrom loginswhere year(time_stamp) = '2020'
group by user_id;
多字段联合分组
group by 列1, 列2, 列3, …。
第1050题
group by 联合分组、having + count(*) 筛选。
select actor_id,director_idfrom actordirector
group by actor_id, director_idhaving count(*) >= 3;
第1149题
子查询中先将表去重一次,排除看同一本书的情况。主查询中按viewer_id和view_date 联合分组、having+ count() 筛选出看了2篇及以上文章的人、最后distinct去重是排除同一个人多天读多本书的情况~~(太学霸了)~~。
select distinct viewer_id as id from (select distinct article_id,viewer_id,view_datefrom views) t
group by viewer_id, view_datehaving count(*) >= 2
order by id;
第1193题
group by 按country和trans_date的年月部分联合分组、count() + sum(if) 聚合需要的数据。
select date_format(trans_date, '%Y-%m') as month,country,count(*) as trans_count,sum(state = 'approved') as approved_count,sum(amount) as trans_total_amount,sum(if(state='approved', amount, 0)) as approved_total_amountfrom transactions
group by country, month;
第1543题
trim()去空格、lower()改小写、date_format() 或者left() 取年月、group by 联合分组、count()计数。
select lower(trim(product_name)) as product_name,left(sale_date, 7) as sale_date,count(*) as totalfrom sales
group by lower(trim(product_name)), left(sale_date, 7)
order by product_name, sale_date;
第1693题
group by 多字段联合分组、distinct去重。
select date_id,make_name,count(distinct lead_id) as unique_leads,count(distinct partner_id) as unique_partnersfrom DailySales
group by date_id, make_name;
第1699题
方法1:子查询中用case when 给from_id和to_id交换顺序、主查询中group by 联合分组、count()和sum() 统计。
SELECT person1,person2, count(*) call_count, sum(duration) total_duration
FROM (
SELECT IF(from_id>to_id, to_id, from_id) person1, IF(from_id>to_id,from_id,to_id) person2, duration
FROM calls
) c
GROUP BY person1, person2
方法2:子查询中用union all 分别把顺序对的数据、顺序不对但交换过的数据拼一起、后续同方法1。
select from_id as person1,to_id as person2,count(*) as call_count,sum(duration) as total_durationfrom ( select from_id,to_id,durationfrom callswhere from_id < to_idunion allselect to_id as from_id,from_id as to_id,durationfrom callswhere from_id > to_id) t
group by from_id, to_id;
第1741题
group by 多字段联合分组、sum()。
select event_day as day,emp_id,sum(out_time) - sum(in_time) as total_timefrom employees
group by event_day, emp_id;
第1951题
子查询中2个ralations表自连接让拥有相同follower的用户处在同行方便之后计数、联结筛选条件是follower_相等且联上来的t2表user_id > 原来t1表的user_id(不然1,7和7,1会算成两种情况)、group by联合分组、rank() 窗口函数以count(*)排序编号、主查询中筛选出排1名的。
select user1_id,user2_idfrom ( select t1.user_id as user1_id,t2.user_id as user2_id,rank() over(order by count(*) desc) as rankingfrom relations t1 inner join relations t2on t1.follower_id = t2.follower_idand t1.user_id < t2.user_idgroup by t1.user_id, t2.user_id) twhere ranking = 1;
9. having 筛选
注意:从使用情景上来说,where 是group by 分组前用,having 是group by 分组后用。从用法上来说,where 后不能跟聚合函数,因为执行where 语句时都还没分组。having 后可以跟聚合函数,因为此时组已经分好了。
第182题
group by 分组、having count()筛选。
select Emailfrom Person
group by Emailhaving count(Email) > 1;
第586题
方法1:group by 分组、order by + count(*)排序、limit限制只输出第一名。(这个方法不能有并列第一,当然本题明说了没有并列第一)
方法2:group by 分组、having筛选出count(*) >= all(子查询出每个customer的count( *)) 。
第596题
group by 分组、having + count(*) 筛选。
select classfrom courses
group by classhaving count(*) >= 5;
第1050题
group by 联合分组、having + count(*) 筛选。
select actor_id,director_idfrom actordirector
group by actor_id, director_idhaving count(*) >= 3;
第1076题
方法1:子查询中group by按照id分组、count(*)计算人数、主查询中再用group by分组、having筛选、count( * )>=all(子查询中人数)。
select project_idfrom ( select project_id,count(project_id) as numfrom projectgroup by project_id) twhere num >= all( select count(project_id)from projectgroup by project_id)
方法2:子查询中group by按照id分组、用rank()函数按照count(*)排序(这里比较重要的是rank里是可以用聚合函数的)、主查询中where筛选排序第1名的(rank和dense_rank都可以排出并列排名)。
select project_idfrom ( select project_id,rank() over(order by count(*) desc) as numfrom projectgroup by project_id) twhere num = 1;
第1501题
子查询将calls表拆成2份并union all到一起,第一份是caller_id + duration,第二份是callee_id + duration,这是因为一通电话实际上对于呼出和呼入用户的国家来说要各计算一次,即使国家相同也是一样。inner join联好人名表,国家表、group by 分组、having中用avg() > 全世界电话平均时长来筛选、这里还要用子查询计算一次全世界的电话平均时长。
select t3.name as countryfrom ( select caller_id,durationfrom callsunion allselect callee_id,durationfrom calls) t1 inner join person t2on t1.caller_id = t2.id inner join country t3on left(t2.phone_number, 3) = t3.country_code
group by t3.namehaving avg(t1.duration) > (select avg(duration) from calls);
第1511题
inner join联结3表、group by分组、having和case when配合筛选。
select t3.customer_id,t3.namefrom orders t1 inner join product t2 inner join customers t3on t1.product_id = t2.product_idand t1.customer_id = t3.customer_id
group by t1.customer_idhaving sum(case when left(order_date, 7) = '2020-06' then t1.quantity * t2.price end) >= 100and sum(case when left(order_date, 7) = '2020-07' then t1.quantity * t2.price end) >= 100;
第1587题
inner join 两表联结、group by 分组、sum()求和、having 筛选。
第1867题
子查询中查出每个order_id分组下quantity的平均值、主查询中再次group by 分组并用having 筛选出max(quantity) > all子查询平均值的数据。
select order_idfrom ordersdetails
group by order_idhaving max(quantity) > all( select avg(quantity)from ordersdetailsgroup by order_id);
第2041题
natural join 联结两表、group by 按面试id分组、where 筛选工作经验2年以上、having + sum() 筛选总分>15。
select candidate_idfrom rounds t1 natural join candidateswhere years_of_exp >= 2
group by interview_idhaving sum(score) > 15;
10. if/case 语句
功能 | 用法 |
---|---|
如果满足条件则输出A,否则输出B | if(条件, A, B) |
如果目标列下的值等于目标值则输出A,否则输出B | case 目标列 when 目标值 then A else B end |
如果满足条件1则输出A,满足条件2则输出B,否则C | case when 条件1 then A when 条件2 then B else C end |
注意:由于if和case对于只有1个条件的情景用法是一样的,为了统一起见本文档只用case when,实际情况根据个人习惯而定。
简单的case语句(即只能判断是否相等)
第578题
group by 分组、order by 里计算回答率(使用sum(条件语句))、limit 1。
select question_id as survey_logfrom surveylog
group by question_id
order by sum(action='answer') / sum(action='show') desc, question_id asclimit 1;
第610题
case when。
select x,y,z,if(y + z > x and y + x > z and x + z > y, 'Yes', 'No') as trianglefrom triangle;
第627题
update、case when。
update salary set sex = case sex when 'm' then 'f' else 'm' end;
第1083题
方法一:分别where筛选出买S8和iphone的人,再not in 找差集。
select distinct buyer_idfrom sales t1 left join product t2on t1.product_id = t2.product_idwhere buyer_id not in (select buyer_idfrom sales t1 left join product t2on t1.product_id = t2.product_idwhere product_name = 'iPhone')and product_name = 'S8';
方法二:用sum(case when)的方法绕开子查询达到找差集目的,具体做法是求和时遇到“S8”就+1,遇到“iphone”就-1,只买“S8”的人的求和结果就是1,都买了的人就是0。
select buyer_idfrom sales t1 left join product t2on t1.product_id = t2.product_idwhere product_name in ('S8', 'iPhone')
group by buyer_idhaving sum(distinct case product_name when 'S8' then 1 else -1 end) = 1;
第1126题
子查询中group by分组,avg()聚合求出每种event的平均数、主查询中原表natural join子查询、group by按business_id分组、having + sum(case when)统计并筛选。
select business_idfrom events t1 natural join ( select event_type,avg(occurences) as estdfrom eventsgroup by event_type) t2
group by business_idhaving sum(case when occurences > estd then 1 else 0 end) >= 2;
第1173题
sum()配合case语句只统计目标数量。还可以直接sum(条件),当条件为true时+1,条件为false时不加。
select round(100 * sum(order_date = customer_pref_delivery_date) / count(*), 2) as immediate_percentagefrom delivery;
第1179题(行转列/长转宽)
group by 分组、sum + case when 统计数据(每个月都要写一遍,写12遍)。
select id,sum(case `month` when 'Jan' then revenue else null end) as Jan_Revenue,sum(case `month` when 'Feb' then revenue else null end) as Feb_Revenue,sum(case `month` when 'Mar' then revenue else null end) as Mar_Revenue,sum(case `month` when 'Apr' then revenue else null end) as Apr_Revenue,sum(case `month` when 'May' then revenue else null end) as May_Revenue,sum(case `month` when 'Jun' then revenue else null end) as Jun_Revenue,sum(case `month` when 'Jul' then revenue else null end) as Jul_Revenue,sum(case `month` when 'Aug' then revenue else null end) as Aug_Revenue,sum(case `month` when 'Sep' then revenue else null end) as Sep_Revenue,sum(case `month` when 'Oct' then revenue else null end) as Oct_Revenue,sum(case `month` when 'Nov' then revenue else null end) as Nov_Revenue,sum(case `month` when 'Dec' then revenue else null end) as Dec_Revenuefrom department
group by id;
第1211题
group by 单字段分组、avg()求平均数、sum()和case when配合统计目标数据的数量。
select query_name,round(avg(rating / position), 2) as quality,round(100 * sum(case when rating < 3 then 1 else 0 end) / count(*), 2) as poor_query_percentagefrom queries
group by query_name;
第1393题
case给买进金额加负号、sum()求和。
select stock_name,sum(case operation when 'Buy' then -price else price end) as capital_gain_lossfrom stocks
group by stock_name;
第1445题
group by 分组、sum(case when) 黄金组合做同列减法、order by 排序。
select sale_date,sum(case fruit when 'oranges' then -sold_num else sold_num end) as difffrom sales
group by sale_date
order by sale_date;
第1479题
weekday取星期数,case when判断,sum()求和。
select item_category as category,sum(case when weekday(order_date) = 0 then quantity else 0 end) as monday,sum(case when weekday(order_date) = 1 then quantity else 0 end) as Tuesday,sum(case when weekday(order_date) = 2 then quantity else 0 end) as Wednesday,sum(case when weekday(order_date) = 3 then quantity else 0 end) as Thursday,sum(case when weekday(order_date) = 4 then quantity else 0 end) as Friday,sum(case when weekday(order_date) = 5 then quantity else 0 end) as Saturday,sum(case when weekday(order_date) = 6 then quantity else 0 end) as Sundayfrom items t1 left join orders t2on t1.item_id = t2.item_id
group by item_category
order by item_category;
第1511题
inner join联结3表、group by分组、having和case when配合筛选。
select t3.customer_id,t3.namefrom orders t1 inner join product t2 inner join customers t3on t1.product_id = t2.product_idand t1.customer_id = t3.customer_id
group by t1.customer_idhaving sum(case when left(order_date, 7) = '2020-06' then t1.quantity * t2.price end) >= 100and sum(case when left(order_date, 7) = '2020-07' then t1.quantity * t2.price end) >= 100;
第1661题
group by 分组、sum(case when) 黄金组合做同列减法求出总运行时间再除以count()进程数。
select machine_id,round(2 * sum(case when activity_type = 'start' then -timestamp else timestamp end) / count(process_id), 3) as processing_timefrom activity
group by machine_id
第1699题
方法1:子查询中用case when 给from_id和to_id交换顺序、主查询中group by 联合分组、count()和sum() 统计。
SELECT person1,person2, count(*) call_count, sum(duration) total_duration
FROM (
SELECT IF(from_id>to_id, to_id, from_id) person1, IF(from_id>to_id,from_id,to_id) person2, duration
FROM calls
) c
GROUP BY person1, person2
方法2:子查询中用union all 分别把顺序对的数据、顺序不对但交换过的数据拼一起、后续同方法1。
select from_id as person1,to_id as person2,count(*) as call_count,sum(duration) as total_durationfrom ( select from_id,to_id,durationfrom callswhere from_id < to_idunion allselect to_id as from_id,from_id as to_id,durationfrom callswhere from_id > to_id) t
group by from_id, to_id;
第1777题(行转列/长转宽)
group by 分组、聚合函数(sum()和max()都可以)+ case when 统合数据。
select product_id,sum(if(store='store1', price, null)) as store1,sum(if(store='store2', price, null)) as store2,sum(if(store='store3', price, null)) as store3from products
group by product_id;
第1783题(列转行/宽转长)
方法1:用多个union all 将championships表的数据汇总到一列并将列名改成player_id、natural join players 表、group by 分组、count() 统计。
select player_id,player_name,count(*) as grand_slams_countfrom ( select wimbledon as player_idfrom championshipsunion allselect fr_open as player_idfrom championships t1union allselect US_open as player_idfrom championshipsunion allselect Au_open as player_idfrom championships) t1 natural join players t2
group by player_id;
方法2:两张表生成笛卡尔积,此时每个球员都对应一份championships表的全部数据、group by 分组、sum(case when) 筛选统计。
SELECT player_id,player_name, SUM(IF(player_id=Wimbledon,1,0)+IF(player_id=Fr_open,1,0)+IF(player_id=US_open,1,0)+IF(player_id=Au_open,1,0)) AS grand_slams_count
FROM Championships
JOIN Players
GROUP BY player_id
HAVING grand_slams_count>0
第1934题
left join 左外联结、group by 分组、avg(case when) 把’confirmed’转成1,其他转成0然后求平均数。
select t1.user_id,round(avg(case action when 'confirmed' then 1 else 0 end), 2) as confirmation_ratefrom signups t1 left join confirmations t2on t1.user_id = t2.user_id
group by t1.user_id;
第2308题
仔细审题可以发现,每种性别内部的user_id都是排好序的,即输出表中第一次出现的female、other、male的id一定是他们这个性别按id排序排第1的id。可以看做按性别分区后按id编号,以这个编号排序的结果。如果编号相同就按照female、other、male排序。所以思路是先rank()窗口函数按性别分区(partition by)、编号,然后按题目要求的性别顺序打上1、2、3三种tag。最后在主查询中order by排序。
select user_id,genderfrom (select user_id,gender,rank() over(partition by gender order by user_id) as rk,case gender when 'female' then 1when 'other' then 2else 3 end as tagfrom genders) t
order by rk, tag;
可搜索的case语句
第608题
case when + 多条件、去null值。
select id,case when p_id is null then 'Root' wen id not in ( select p_id from treewhere p_id is not null) then 'Leaf'else 'Inner' end as `type`from tree;
第1294题
left join 两表联结、where筛选日期、group by分组、case when和avg计算平均天气并返回结果。
select country_name,case when avg(weather_state) <= 15 then 'Cold' when avg(weather_state) >= 25 then 'Hot'else 'Warm' end as weather_typefrom weather t1 left join countries t2on t1.country_id = t2.country_id where left(t1.day, 7) = '2019-11'
group by t1.country_id;
第1398题
natural join 两表联结、group by 分组、having + sum(distinct case when) 统计并筛选出只买了A和B却没有买C的用户。
具体做法是:买A的用case when标记成1,B的2,C的10,其他0,这样求和时只买了A和B却没买C的人一定是3,由于加了distinct,所以不会出现买多个A或B的人被重复计算的情况。
select customer_id,customer_namefrom customers t1 natural join orders t2
group by customer_idhaving sum(distinct case when product_name = 'A' then 1 when product_name = 'B' then 2 when product_name = 'C' then 10 else 0 end) = 3;
第1440题
SQL里没有eval(),只有用case when。
inner join 把variables表联到expressions表上,联两次,一次是为了确定不等式左边,一次是右边、case when分别写出operator是>、<、=的三种情况、注意上一步可能计算不等式出来的结果是0和1,所以还要套一个case when或者if转成true和false。
select left_operand,operator,right_operand,if(case operator when '>' then t2.value > t3.value when '<' then t2.value < t3.valueelse t2.value = t3.value end, 'true', 'false') as valuefrom expressions t1 inner join variables t2on t1.left_operand = t2.nameinner join variables t3on t1.right_operand = t3.name;
第1468题
子查询中分组max()聚合查出每个公司的最高工资、主查询中原表natural join子表、case when分条件计算税后工资。
select company_id,employee_id,employee_name,round(case when cstd < 1000 then salarywhen cstd between 1000 and 10000 then salary * 0.76else salary * 0.51 end, 0) as salaryfrom salaries t1 natural join ( select company_id,max(salary) as cstdfrom salariesgroup by company_id) t2;
第1873题
case when + 多条件。
select employee_id,case when employee_id % 2 = 1 and left(name, 1) <> 'M' then salary else 0 end as bonusfrom employees;
第1907题
sum(case when)计数、3种工资都统计完了之后union all。
select 'Low Salary' as category,sum(if(income < 20000, 1, 0)) as accounts_countfrom accounts
union allselect 'Average Salary' as category,sum(if(income between 20000 and 50000, 1, 0)) as accounts_countfrom accounts
union allselect 'High Salary' as category,sum(if(income > 50000, 1, 0)) as accounts_countfrom accounts;
第2051题
3表left join 联结(需要筛选)、group by 分组、case when 分类、分类标准里用两次count() 计算次数,买东西次数/来店次数的比值作分类标准。使用聚合函数时null会被剩下来,可以最后用else处理null的情况。
select t1.member_id,t1.name,case when count(t3.visit_id) / count(t2.visit_id) < 0.5 then "Silver" when count(t3.visit_id) / count(t2.visit_id) >= 0.8 then "Diamond" when count(t3.visit_id) / count(t2.visit_id) < 0.8 and count(t3.visit_id) / count(t2.visit_id) >= 0.5 then "Gold" else "Bronze" end as categoryfrom members t1 left join visits t2on t1.member_id = t2.member_idleft join purchases t3on t2.visit_id = t3.visit_id
group by t1.member_id;
第2072题
case when + 多条件。
select case when (select count(*) as result from newyork where score >= 90) > (select count(*) as result from california where score >= 90)then 'New York University'when (select count(*) as result from newyork where score >= 90) < (select count(*) as result from californiawhere score >= 90) then 'California University'else 'No Winner' end as winner;
第2298题
weekday()从日期中取得星期几、sum(case when)分类统计、when里面写法是weekday(日期) in (元组),而不是一个个去枚举week(day)=0、=1、=2的情况。
select sum(case when weekday(submit_date) in (5, 6) then 1 else 0 end) as weekend_cnt,sum(case when weekday(submit_date) in (5, 6) then 0 else 1 end) as working_cntfrom tasks;
11. 窗口函数
功能 | 函数 |
---|---|
按照顺序编号,1、2、3、4 | row_number() over([partition by] … [order by] …) |
跳跃排序,1、2、2、4 | rank() over([partition by] … [order by] …) |
连续排序,1、2、2、3 | dense_rank() over([partition by] … [order by] …) |
求和(不写order by 全加,写了只加到本行之前) | sum() over([partition by] … [order by] …) |
从col开始数,取前面offset行,如果没有就取到default值 | lag(col, offset, default) over([partition by] … [order by] …) |
从col开始数,取后面offset行,如果没有就取到default值 | lead(col, offset, default) over([partition by] … [order by] …) |
按列名2分区后求列名1的最小值,并将结果跟在本区所有行后面 | min(列名1) over(partition by 列名2) |
按列名2分区后求列名1的最大值,并将结果跟在本区所有行后面 | max(列名1) over(partition by 列名2) |
窗口函数聚合时的范围限定在前/后N范围的数据 | range N preceding/following |
窗口函数聚合时的范围限定在前/后N行数据 | rows N preceding/following |
注意区分rows 和 range,例:
有4行数据:2,3,4,7月。现在想对每个月的最近3月数据进行聚合,即对4月来说是2、3、4月,对7月来说是5、6、7月。
写rows 2 preceding,对于7月行会取到3、4、7月三行数据。
写range 2 preceding,对于7月行则只会取到7月一行数据,因为没有5、6月数据。
在这个案例中写range才是合适的。
第176题
方法1:子查询中distinct 去重、order by 排序、limit + offset 输出第二位、主查询再查一次把空值变成null。
SELECT(SELECT DISTINCTSalaryFROMEmployeeORDER BY Salary DESCLIMIT 1 OFFSET 1) AS SecondHighestSalary
;
方法2:子查询中dense_rank() 排序、主查询中where 筛选、sum()或者max()或者min()聚合函数把空值变成null。
select max(salary) as SecondHighestSalaryfrom (select salary,dense_rank() over(order by salary desc) as rk from employee) twhere rk = 2;
第177题
类似第176题。用方法1要先set n = n - 1。
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGINset n = n - 1;RETURN (# Write your MySQL query statement below.select salaryfrom ( select distinct salary from employeeorder by salary desc) tlimit 1 offset n);
END
第178题
dense_rank()。
select score,dense_rank() over(order by score desc) as 'rank'from scores;
第185题
子查询中inner join 联表(题目要求就是没联上的地方不要了,而不用显示0)、dense_rank() 排个序、主查询中按序号筛选前3。
select department,employee,salaryfrom (select t2.name as department,t1.name as employee,t1.salary,dense_rank() over(partition by t2.id order by t1.salary desc) as rkfrom employee t1 inner join department t2on t1.departmentid = t2.id) twhere rk <= 3;
第534题
sum() 窗口函数。同1308、2066题。
select player_id,event_date,sum(games_played) over(partition by player_id order by event_date) as games_played_so_farfrom activity;
第550题
下次这种题目举例只给1个考文字理解的出题者能不能先打死啊?tnnd的我看了1小时题才看明白
子查询或者临时表中使用min()窗口函数按player_id分区求出初次登陆日期,再一口气求出当次日期与初次登陆日期差值,别名为diff,很明显每个player_id只有1个diff为1的行,即第二天登陆那行。主查询中count() 子查询/临时表里的where diff = 1的行、除以count(distinct player_id) from 原表、再套个select来查这个商,别名为fraction,注意如果就是为了计算,可以不写from。
with temp as (select player_id,event_date - min(event_date) over(partition by player_id) as difffrom activity
)select round((select count(*)from tempwhere diff = 1) / (select count(distinct player_id) from activity), 2) as fraction;
第569题
方法1:临时temp表中将每个公司的中位数计算出来(需要union all一次,因为偶数会对应两个中位数)。子查询中row_number() 窗口函数编号、联上temp表中的中位数、联上的地方就是结果。
with temp as (select company,round((count(*) + 1) / 2, 0) as midfrom employeegroup by companyunion allselect company,case when count(*) % 2 = 0 then count(*) / 2 endfrom employeegroup by company
)select id,t1.company,salaryfrom (select id,company,salary,row_number() over(partition by company order by salary) as rkfrom employee) t1 inner join temp t2on t1.company = t2.companyand t1.rk = t2.mid;
方法2:正反序找中位数。子查询中row_number() 窗口函数编正反序、count()窗口函数将总数算在后面、主查询中筛选出正反序都大于等于总数/2的数据。
select id,t1.company,salaryfrom (select id,company,salary,row_number() over(partition by company order by salary, id) as rk1,row_number() over(partition by company order by salary desc, id desc) as rk2,count(*) over(partition by company) as cntfrom employee) t1where rk1 >= cnt / 2and rk2 >= cnt / 2;
第571题
临时temp表中用recursive递归达到“解压数字”效果。子查询中用row_number()窗口函数分别按正反序编号、count()窗口函数加一列总数、主查询中筛选出中位数、avg求平均值。
with recursive temp as(select num,frequencyfrom numbersunion allselect num,frequency - 1from tempwhere frequency > 1
)select round(avg(num), 1) as medianfrom (select num,row_number() over(order by num) as rk1,row_number() over(order by num desc) as rk2,count(*) over() as cntfrom temp) t where rk1 >= cnt / 2and rk2 >= cnt / 2
第579题
子查询中rank()窗口函数排序,方便一会儿剔除最大月份、sum()窗口函数求累计和,这里加上range 2 preceding表示求本月和最近的前两个月的累计和、主查询中筛选出排序大于1的数据、order by排序。
select id,month,salaryfrom (select id,month,rank() over(partition by id order by month desc) as rk,sum(salary) over(partition by id order by month range 2 preceding) as salaryfrom employee) twhere rk > 1
order by id asc, month desc;
第613题
方法一:两表inner join 自连接再筛选。计算出任意两点间的距离,再min()找出最小的距离。
方法二:由于最小距离只存在于相邻两个点中,所以没必要算出全部任意两点间的距离。先用窗口函数rank排序+编号。再左外联结拼出一张同时有某点和它右一个点的坐标的临时表,并select出这个临时表中相邻两点的距离。最后min()找出最小距离。
select distinct min(dis) as shortestfrom (select abs(t2.x - t1.x) as disfrom ( select x, rank() over(order by x) as idfrom point) t1 left join ( select x,rank() over(order by x) as idfrom point) t2on t1.id + 1 = t2.id) t3;
第618题
第一个子查询查美洲学生,后面的学生都没美洲学生多,所以可以用left join联上去。给每个子查询都用row_number()窗口函数编号,方便后续按照编号联结。
select America,Asia,Europefrom (select name as America,row_number() over(order by name) as rkfrom studentwhere continent = 'America') t1 left join(select name as Asia,row_number() over(order by name) as rkfrom studentwhere continent = 'Asia') t2on t1.rk = t2.rkleft join(select name as Europe,row_number() over(order by name) as rkfrom studentwhere continent = 'Europe') t3on t1.rk = t3.rk;
第1076题
方法1:子查询中group by按照id分组、count(*)计算人数、主查询中再用group by分组、having筛选、count( * )>=all(子查询中人数)。
select project_idfrom ( select project_id,count(project_id) as numfrom projectgroup by project_id) twhere num >= all( select count(project_id)from projectgroup by project_id)
方法2:子查询中group by按照id分组、用rank()函数按照count(*)排序(这里比较重要的是rank里是可以用聚合函数的)、主查询中where筛选排序第1名的(rank和dense_rank都可以排出并列排名)。
select project_idfrom ( select project_id,rank() over(order by count(*) desc) as numfrom projectgroup by project_id) twhere num = 1;
第1077题
子查询中rank() 窗口函数排序、主查询中筛选排1名的。
select project_id,employee_idfrom (select project_id,employee_id,rank() over(partition by project_id order by experience_years desc) as rankingfrom project t1 natural join employee t2) twhere ranking = 1;
第1112题
方法1:子查询中查出max()分数、inner join联上原表,条件是id一样、grade一样,则不是最高分的没联上就筛掉了、分组并输出数据(因为有并列第一,所以course_id要输出min()最小值)。
select t1.student_id,min(t1.course_id) as course_id,t1.gradefrom enrollments t1 inner join ( select student_id,max(grade) as bestfrom enrollmentsgroup by student_id) t2on t1.student_id = t2.student_idand t1.grade = t2.best
group by student_id
order by student_id;
方法2:子查询中rank()窗口函数排序、主查询中筛选rank = 1。
第1204题
方法同2066题以及1308题,sum+窗口函数求累计和,不赘述。最后多一个order by 和limit 1。
select person_namefrom (select person_name,sum(weight) over(order by turn) as afrom queue) t where a <= 1000
order by a desclimit 1;
第1225题
思路是分别找出失败任务和成功任务再union all,下以失败任务为例讲解。先where筛选出2019年数据、date_sub()配合row_number()编号法找差值别名为diff(编号法用过很多次了不细讲了)、主查询中按diff分组查出min(fail_date)和max(fail_date)作为start和end日期。
select 'failed' as period_state,min(fail_date) as start_date,max(fail_date) as end_datefrom (select fail_date,date_sub(fail_date, interval row_number() over(order by fail_date) day) as diff from failedwhere year(fail_date) = 2019) tgroup by diff
union allselect 'succeeded' as period_state,min(success_date) as start_date,max(success_date) as end_datefrom (select success_date,date_sub(success_date, interval row_number() over(order by success_date) day) as diff from succeededwhere year(success_date) = 2019) tgroup by difforder by start_date;
第1285题
方法1:两个logs表cross join自联结、起点-1不在logs表中,终点+1不在logs表中,且终点值大于起点值,借此筛选出起点和终点、由于用了笛卡尔积,每个起点都对应全部终点,所以再用group by和min() 函数选出终点。
select t1.log_id as start_id,min(t2.log_id) as end_idfrom logs t1 inner join logs t2on t1.log_id <= t2.log_idwhere (t1.log_id - 1) not in (select log_id from logs)and (t2.log_id + 1) not in (select log_id from logs)
group by start_id
order by start_id;
方法2:基本原理:连续区间中的数与某一基准等差数列的差值是固定的。例如123567与123456一一对应相减的差是000111,可借此差值对区间进行分类。
log_id - row_number() over() 得到差值、借之前的差值分组、每组最小的为start,最大的为end。
第1308题
sum() over() 窗口函数、order by 排序。和2066题差不多。
select gender,day,sum(score_points) over(partition by gender order by day) as totalfrom scores
order by gender, day;
第1369题
子查询中窗口函数rank()排序,count()计数、主查询中筛选出排第二的数据或者计数只有1的数据。
select username,activity,startdate,enddatefrom (select username,activity,startdate,enddate,rank() over(partition by username order by startdate desc) as rk,count(*) over(partition by username) as cntfrom useractivity) twhere rk = 2or cnt = 1;
第1412题
临时temp表中用min()、max()窗口函数找到每次考试最高最低分,再顺手求个当前学生分数和最高/最低分的分差、套一层查询筛选出两个分差乘积为0的id,这一步也就是在找考了最高分或最低分的id。
主查询中where筛选出在exam表(即参加了考试),但不在temp表(即没考到最高或最低分)中的人。
with temp as (select distinct student_idfrom (select student_id,score - min(score) over(partition by exam_id order by score asc) as d1,score - max(score) over(partition by exam_id order by score desc) as d2from exam) t where d1 * d2 = 0
)select student_id,student_namefrom studentwhere student_id in (select student_id from exam)and student_id not in (select student_idfrom temp);
第1454题(连续区间)
临时表对原表按照id和login_date联合去重,防止1个人1天登录多次。子查询中row_number()窗口函数编号准备找连续区间,但是这里有个问题,即原表是没有编号的,只能用日期减,即再窗口函数外面再套一个date_sub() 函数,给差别名为diff。主查询里将子查询right join到accounts表中(注意这本来inner join更好,采取right join的原因是之后就不用排序了)、按id和diff联合分组、having + count(*) 筛选连续天数大于等于5天的人、最后select时要去重,因为有的人这段时间连续登陆5天以上,后面哪段时间又连续登陆5天以上。
with temp as (select distinct id,login_datefrom logins
)select distinct t1.id,t2.namefrom (select id,date_add(login_date, interval -row_number() over(partition by id order by login_date) day) as difffrom temp) t1 right join accounts t2on t1.id = t2.id
group by t1.id, diffhaving count(*) >= 5;
第1532题
子查询中rank() over() 按用户id分区+排序、主查询中筛选出排名<=3的数据并natural join上原来的orders表和customers表、最后order by排序。
select name as customer_name,customer_id,order_id,order_datefrom (select order_id,rank() over(partition by customer_id order by order_date desc) as rk from orders) t1 natural join orders t2 natural join customers t3where rk <= 3
order by customer_name, customer_id, order_date desc;
第1549题
子查询中rank窗口排序按product_id分区,order_date降序排序、主查询中筛选排第1的数据。
select product_name,product_id,order_id,order_datefrom (select t2.product_name,t1.product_id,t1.order_id,order_date,rank() over(partition by t1.product_id order by t1.order_date desc) as rankingfrom orders t1 natural join products t2) twhere ranking = 1
order by product_name, product_id, order_id;
第1596题
子查询中group by customer_id、product_id联合分组、rank() over(partition by customer_id order by count(*))、主查询中having 筛选出排名1的数据。这道题需要注意的两点是:1. 先按两个字段联合分组再在窗口函数中按单个字段分区;2. 窗口函数中还可以写聚合函数。
select customer_id,product_id,product_namefrom ( select customer_id,product_id,rank() over(partition by customer_id order by count(*) desc) as rankingfrom ordersgroup by customer_id, product_id) t1 natural join products t2where ranking = 1;
第1709题
方法1(开窗):子查询中用lead() 窗口函数把每个日期的下一次日期拼到后面、主查询中日期做差、分组聚合。
SELECTuser_id,MAX(DATEDIFF(next_day, visit_date)) AS biggest_window
FROM (SELECTuser_id,visit_date,LEAD(visit_date, 1, '2021-1-1') OVER (PARTITION BY user_id ORDER BY visit_date) AS next_dayFROM UserVisits
) tmp
GROUP BY user_id
ORDER BY user_id
方法2(不开窗):子查询中原表left join自连接、筛选条件是两表user_id相等且表2日期大于表1日期、两表日期相减(这里还要用coalesce处理null值)、主查询中group by分组、max() 聚合。
select user_id,max(diff) as biggest_windowfrom ( select t1.user_id,datediff(min(coalesce(t2.visit_date, '2021-1-1')), t1.visit_date) as difffrom uservisits t1 left join uservisits t2on t1.user_id = t2.user_idand t1.visit_date < t2.visit_dategroup by t1.user_id, t1.visit_date) t
group by user_id
order by user_id;
第1811题
临时表temp中将原比赛表的3列转成1列方便统计每个人的获奖情况、找连续区间用编号法,算出contest_id和row_number() 编号的差值方便后续按差值diff分组。主查询中先将得了3次金奖的人选出并union all 连续3次得奖的人、group by按照id和diff联合分组,having + count() 计数选出连续得奖3次的人。再套一层查询natural join user表得到名字和邮箱、distinct去重以免有些又连续3次获奖又得3次金奖的人被算多。
with temp as (select contest_id,player,contest_id - row_number() over(partition by player order by contest_id) as difffrom ( select contest_id,gold_medal as playerfrom contestsunion allselect contest_id,silver_medalfrom contestsunion allselect contest_id,bronze_medalfrom contests) t
)select distinct name,mailfrom ( select gold_medal as user_idfrom contestsgroup by gold_medal having count(*) >= 3union allselect playerfrom tempgroup by player, diffhaving count(*) >= 3) t natural join users;
第1831题
方法1:子查询中rank() over() 排出每天amount的名次、主查询中得到排第1名的数据。
方法2:子查询中group by 分组、max() 求最大值、inner join 到原表上、筛选出原表中日期等于子查询日期且amount = 当日最大值的数据。
select transaction_idfrom transactions t1 inner join( select date(day) as day,max(amount) as firstfrom transactionsgroup by date(day)) t2on date(t1.day) = t2.dayand t1.amount = t2.first
order by transaction_id;
第1875题
两个原表用inner join按照工资相同id不同自连接以排除掉工资独一无二的员工、dense_rank() 窗口函数按工资排出顺序、distinct去除联结后重复出现的数据、order by 排序。
select distinct t1.employee_id,t1.name,t1.salary,dense_rank() over (order by t1.salary) as team_idfrom employees t1 inner join employees t2on t1.salary = t2.salaryand t1.employee_id <> t2.employee_id
order by team_id asc, employee_id asc
第1951题
子查询中2个ralations表自连接让拥有相同follower的用户处在同行方便之后计数、联结筛选条件是follower_相等且联上来的t2表user_id > 原来t1表的user_id(不然1,7和7,1会算成两种情况)、group by联合分组、rank() 窗口函数以count(*)排序编号、主查询中筛选出排1名的。
select user1_id,user2_idfrom ( select t1.user_id as user1_id,t2.user_id as user2_id,rank() over(order by count(*) desc) as rankingfrom relations t1 inner join relations t2on t1.follower_id = t2.follower_idand t1.user_id < t2.user_idgroup by t1.user_id, t2.user_id) twhere ranking = 1;
第2066题
方法1:sum(case when) over(partition by … order by …) 改数据符号并求和。sum() + 窗口函数刚好可以满足这道题的要求。
select account_id,day,sum(case type when 'Deposit' then amount else -amount end) over(partition by account_id order by day) as balancefrom transactions
order by account_id, day;
方法2:transactions 表重复联结2次、筛选只留下表1日期>=表2日期的数据,即对于7号只留7号数据,对于9号则需要7号和9号两天的数据、sum(case when)求和。
第2112题
孙查询中union all把出发机场和到达机场列联结起来、子查询中分组聚合求飞机数并按飞机数用rank() 窗口函数排序、父查询中筛选出排第1的。
select airport_idfrom ( select airport_id,rank() over(order by sum(flights_count) desc) as numfrom ( select departure_airport as airport_id,flights_countfrom flights union all select arrival_airport as airport_id,flights_count from flights) t1group by airport_id) t2where num = 1;
第2159题
由于一个表中不可能排2个序,所以使用两个子查询分别排序。按照1列升序和2列降序打编号+排序,打编号用row_number() 窗口函数、两个子查询按编号inner join、主查询中查出表1的1列和表2的2列。
select t1.first_col,t2.second_colfrom ( select first_col,row_number() over(order by first_col asc) as tagfrom data) t1 inner join ( select second_col,row_number() over(order by second_col desc) as tagfrom data) t2using(tag);
第2175题
子查询中先把变化后的新分数作为新列添加原表上、套一层查询用row_number窗口函数计算原排名和新排名、再套一层函数计算排名差。注意这个排名差设置了非负性,不能直接减,要么用if语句、要么用cast() 函数做类型转换再减。
select team_id,name,if(rk1 > rk2, rk1 - rk2, -(rk2 - rk1)) as rank_difffrom (select team_id,name,row_number() over(order by points desc, name asc) as rk1,row_number() over(order by new_points desc, name asc) as rk2from (select team_id,name,points,points + points_change as new_pointsfrom teampoints natural join pointschange) t0) t1
cast(列名 as 数据类型)
第2228题
方法1:两个原表自连接、筛选条件是表2的id和表1id相同,且表2日期大于等于表1日期,且表2purchase_id不等于表1purchase_id、group by 按表1user_id和表1purchase_date联合分组、取得min(表2purchase_date),至此终于找到每一天对应的最近的下一次购买日期、datediff求差值、写进having里筛选。
select distinct t1.user_idfrom purchases t1 inner join purchases t2on t1.user_id = t2.user_idand t2.purchase_date >= t1.purchase_dateand t1.purchase_id <> t2.purchase_id
group by t1.user_id, t1.purchase_datehaving datediff(min(t2.purchase_date), t1.purchase_date) <= 7
order by user_id
方法2:子查询中用lag()窗口函数按user_id分区,purchase_date和purchase_id排序,将每次购买日期的上次购买日期接在每次购买日期后面。主查询中datediff算差值、where筛选。
select distinct user_idfrom (select user_id,purchase_date,lag(purchase_date, 1, '2000-01-01') over(partition by user_id order by purchase_date, purchase_id) as last_datefrom purchases) twhere datediff(purchase_date, last_date) <= 7
order by user_id
第2292题
创建临时表,按product_id和purchase_date中的年份联合分组、having+count() 筛选出卖出3份及以上的数据。临时表再套一层查询,使用lag() 窗口函数按product_id分区,将每年的上一次年份接到后面、再套一层查询筛选出两个年份相差1年,即相邻的数据、distinct去重。
with temp as (select product_id,year(purchase_date) as pyearfrom ordersgroup by product_id, pyearhaving count(*) >= 3
)select distinct product_idfrom (select product_id,pyear - lag(pyear, 1, '0000') over(partition by product_id order by pyear) as difffrom temp) twhere diff = 1;
第2308题
仔细审题可以发现,每种性别内部的user_id都是排好序的,即输出表中第一次出现的female、other、male的id一定是他们这个性别按id排序排第1的id。可以看做按性别分区后按id编号,以这个编号排序的结果。如果编号相同就按照female、other、male排序。所以思路是先rank()窗口函数按性别分区(partition by)、编号,然后按题目要求的性别顺序打上1、2、3三种tag。最后在主查询中order by排序。
select user_id,genderfrom (select user_id,gender,rank() over(partition by gender order by user_id) as rk,case gender when 'female' then 1when 'other' then 2else 3 end as tagfrom genders) t
order by rk, tag;
第2314题
子查询中排序、主查询中筛选第一名。
select city_id,day,degree from (select city_id,day,degree,rank() over(partition by city_id order by degree desc, day asc) as rk from weather) twhere rk = 1;
第2324题
子查询中按user_id和product_id联合分组、sum求出总开销、rank窗口函数按照总开销排序、主查询中筛选第一名。
select user_id,product_idfrom ( select user_id,product_id,rank() over(partition by user_id order by sum(quantity * price) desc) as rkfrom sales t1 inner join product t2using(product_id)group by user_id, product_id) t
where rk = 1;
第2346题
子查询中rank窗口函数排序、count窗口函数计数、主查询中计算百分数。
select student_id,department_id,round(coalesce((rk - 1) * 100 / (cnt - 1), 0), 2) as percentagefrom (select student_id,department_id,rank() over(partition by department_id order by mark desc) as rk,count(student_id) over(partition by department_id) as cntfrom students) t
12. 限定谓词
功能 | 用法 |
---|---|
筛选大于或小于子查询中全部结果的数据 | where/having + 字段名 > 或 < all(子查询) |
筛选大于或小于子查询中任意结果的数据 | where/having + 字段名 > 或 < any(子查询) |
注意:子查询中只能输出1列,不然报错,哪怕是横着的1排也不行。
第574题
方法1:子查询中分组count()计算出每个候选人票数、主查询中再次分组计数、having > all(子查询)筛选。
select namefrom vote t1 inner join candidate t2on t1.candidateid = t2.id
group by t1.candidateidhaving count(*) >= all( select count(*)from votegroup by candidateid);
方法2:子查询中分组count()计算出每个候选人票数、order by 排序、limit 1输出第一名、主查询中只查name。
第586题
方法1:group by 分组、order by + count(*)排序、limit限制只输出第一名。(这个方法不能有并列第一,当然本题明说了没有并列第一)
方法2:group by 分组、having筛选出count(*) >= all(子查询出每个customer的count( *)) 。
第1076题
方法1:子查询中group by按照id分组、count(*)计算人数、主查询中再用group by分组、having筛选、count( * )>=all(子查询中人数)。
select project_idfrom ( select project_id,count(project_id) as numfrom projectgroup by project_id) twhere num >= all( select count(project_id)from projectgroup by project_id)
方法2:子查询中group by按照id分组、用rank()函数按照count(*)排序(这里比较重要的是rank里是可以用聚合函数的)、主查询中where筛选排序第1名的(rank和dense_rank都可以排出并列排名)。
select project_idfrom ( select project_id,rank() over(order by count(*) desc) as numfrom projectgroup by project_id) twhere num = 1;
第1082题
子查询+group by建立临时表得到每人总销售额、主表用having筛选出sum()总销售额>=all临时表里销售额的销售者id。
select seller_idfrom sales
group by seller_idhaving sum(price) >= all( select sum(price)from salesgroup by seller_id);
第1355题
方法1:子表分组+count(*)统计出每个活动人数、主表用having+count( *)筛选、筛选条件是> and < any 子表统计出来的人数。这里用any而不用all的原因是,只要在子表中找到1个活动的人数比主表人数多,那主表人数肯定就不是最大值了,同理子表中只要找到1个活动人数比主表少,那主表人数肯定就不是最小值了,两个条件加起来就能筛选出中间值。
select activityfrom friends
group by activityhaving count(*) > any( select count(*)from friendsgroup by activity)and count(*) < any( select count(*)from friendsgroup by activity);
方法2:with临时表中rank() over(order by count(*))按人数排序、主查询中筛选排名不等于1且< max(排名) 的数据。
with temp as ( select activity,rank() over(order by count(*)) as rankingfrom friendsgroup by activity)select activityfrom tempwhere ranking not in (select max(ranking)from temp)and ranking <> 1;
第1867题
子查询中查出每个order_id分组下quantity的平均值、主查询中再次group by 分组并用having 筛选出max(quantity) > all子查询平均值的数据。
select order_idfrom ordersdetails
group by order_idhaving max(quantity) > all( select avg(quantity)from ordersdetailsgroup by order_id);
13. like模糊匹配和regexp 正则表达式
功能 | 用法 |
---|---|
模糊查询 | where 列名 like ‘…’ |
正则查询 | where 列名 regexp ‘…’ |
第1517题
正则匹配。
select user_id,name,mailfrom userswhere mail regexp('^[a-z][-a-z_./0-9]*@leetcode[.]com$');
第1527题
like 模糊匹配。
select patient_id,patient_name,conditionsfrom patientswhere conditions like 'DIAB1%' or conditions like '% DIAB1%';
14. 临时表
用法:with 表名 as (查询语句)。之后要使用时就用表名。
第601题(找连续区间用编号法,同1285题)
临时表中筛选出people >= 100的数据,再用row_number()编号,计算id和编号的差值diff,方便后续根据diff分组。子查询中根据diff,group by 分组、having + count() 筛选出连续3次及以上的diff、主查询中筛选出diff in 子查询的数据。
with temp as (select id,visit_date,people,id - row_number() over(order by id) as difffrom stadiumwhere people >= 100)
select id,visit_date,peoplefrom tempwhere diff in ( select difffrom tempgroup by diffhaving count(*) >= 3)
第1164题
方法1:子查询中分组聚合查出日期在8月16号前的max(日期),即16号前的最近改动日期(这里没法连着new_price一起查哈!分组后的数据是不会自动一一对应的)、再套一层查询用where (id, 日期) in 子查询,查出id和new_price、将之前的查询结果外联结到一个distinct id的子查询表中、这样联上了的就按联上的价格取,没联上的就是10元。
方法2:既然原表没有初始10元这个数据,我们就自己加上去。创建临时表:原表union all 查出每个id原价10元的子查询(包含distinct id、10、初始日期’0000-00-00’),之后涉及from的全部用这个临时表、where筛选出16号之前的数据、同方法1先查出id和max(日期)、再套一层查询得到id和价格。
with temp as (select product_id,new_price,change_datefrom productsunion all select distinct product_id,10,'0000-00-00'from products
)select product_id,new_price as price from tempwhere (product_id, change_date) in ( select product_id,max(change_date) as change_datefrom tempwhere change_date <= '2019-08-16'group by product_id);
第1949题
由于每个人的朋友散布在user1_id和user2_id两列中,所以先用union all将原表和交换两列左右顺序后的新表进行并集,以便后续找某个id的朋友时只需要看第二列即可,再用临时表将上述查询结果保存下来取名为temp以免重复写。原表t1 inner join temp表 t2,将t1.user1_id的朋友联过来、继续inner join temp表t3,将t1.user2_id和前面联的t1.user1_id的共同朋友联过来、group by分组、count(t3中的id)计数、having筛选。
with temp as ( select user1_id,user2_id from friendshipunion all select user2_id,user1_idfrom friendship)select t1.user1_id,t1.user2_id,count(t3.user2_id) as common_friendfrom friendship t1 inner join temp t2on t1.user1_id = t2.user1_idinner join temp t3on t1.user2_id = t3.user1_idand t2.user2_id = t3.user2_id
group by t1.user1_id, t1.user2_idhaving common_friend >= 3;
第2238题
方法1:with创建去重后的driver_id临时表、临时表left join原表,将driver_id和passenger_id相等的地方联结起来、分组后count()计数。
with temp as(select distinct driver_id from rides
)select t1.driver_id,count(t2.ride_id) as cntfrom temp t1 left join rides t2on t1.driver_id = t2.passenger_id
group by t1.driver_id;
方法2:和方法1类似,但不创建临时表。由于没有创建临时表去重,每个相同的driver_id都对应同样的多个passenger_id。例如7号当了2次司机3次乘客,这样联结就变成了每次当司机的数据都连上了3次乘客数据,count()数出来就一共当了6次乘客。所以还要再用子查询查出每个人当司机的次数,联上原表后除以这个次数。
15. recursive递归
第571题
临时temp表中用recursive递归达到“解压数字”效果。子查询中用row_number()窗口函数分别按正反序编号、count()窗口函数加一列总数、主查询中筛选出中位数、avg求平均值。
with recursive temp as(select num,frequencyfrom numbersunion allselect num,frequency - 1from tempwhere frequency > 1
)select round(avg(num), 1) as medianfrom (select num,row_number() over(order by num) as rk1,row_number() over(order by num desc) as rk2,count(*) over() as cntfrom temp) t where rk1 >= cnt / 2and rk2 >= cnt / 2
第1613题
with + recursive 递归创建临时表、从临时表中查出不在customers表中,且小于customers表最大id的数据。
with recursive temp as (select 1 as n union allselect n + 1 from tempwhere n < 100
)select n as idsfrom tempwhere n not in (select customer_idfrom customers)and n < (select max(customer_id)from customers);
第1651题
临时boxes表中创建1到10的序号列。主查询中将rides表left join到boxes表上,连接条件是月份不能超过n 2个月,顺便把2020年以外的数据剔除、再left join AcceptedRides表、group by分组、计算对应数据。
with recursive boxes as (select 1 as nunion allselect n + 1from boxes where n <= 9
)select n as month,coalesce(round(sum(ride_distance) / 3, 2), 0) as average_ride_distance,coalesce(round(sum(ride_duration) / 3, 2), 0) as average_ride_durationfrom boxes t1 left join rides t2on month(t2.requested_at) between t1.n and t1.n + 2and year(t2.requested_at) = 2020left join acceptedrides t3on t2.ride_id = t3.ride_id
group by t1.n
order by t1.n;
第1767题
with recursive创建临时temp表、递归条件是从subtasks_count开始递减,减到1为止。temp表left join executed表,连不上的地方就是未完成任务(这么做比where not in找差集效率高很多)。
with recursive temp as (select task_id,subtasks_countfrom tasksunion allselect task_id,subtasks_count - 1 as subtasks_countfrom tempwhere subtasks_count >= 2
)select t1.task_id,subtasks_count as subtask_idfrom temp t1 left join executed t2on t1.task_id = t2.task_idand t1.subtasks_count = t2.subtask_idwhere t2.task_id is null;