【力扣刷题总结】数据库题目按知识点分类总结

article/2025/10/9 2:27:01

观前提示:

  1. 这个图先自行背诵至滚瓜烂熟。

    img

    文档中论述题目解题思路时没有特殊情况都按照上面的程序执行顺序为准,个别题也会以其他更自然的顺序讲解。

  2. 本文的知识点整理仅起提示作用,一些不常用功能、参数可能会遗漏,如需了解详细用法请自行百度。

  3. 有的题涉及多个知识点,则取按重要性排名的前2-3个知识点归入相应分类(同一道题可能在本文出现多次)。

  4. 本文持续更新,更新顺序基本是由易到难。

  5. 有些明显有上下级关系的分类,则只取高级分类归入。如差集必然涉及子查询,则只归入差集。自联结是普通联结中的特殊情况则不再归入普通联结。

  6. 很多复杂的题写子查询套太多层,就统一使用临时表了。

文章目录

  • 0. 常用聚合函数
  • 1. distinct去重
        • 第176题
        • 第177题
        • 第597题
        • 第1045题
        • 第1113题
        • 第1141题
        • 第1149题
        • 第1919题
        • 第2082题
        • 第2205题
        • 第2356题
  • 2. where 筛选
      • 普通用法
        • 第584题
        • 第1132题
        • 第1148题
        • 第1495题
        • 第2026题
      • (列1, 列2, ...) in 语句
        • 第184题
        • 第512题
        • 第585题
        • 第614题
        • 第1070题
        • 第1164题
        • 第1174题
      • not in 语句(找差集)
        • 第183题
        • 第607题
        • 第1083题
        • 第1084题
        • 第1264题
        • 第1350题
        • 第1412题
        • 第1607题
        • 第1978题
      • and 同时满足多个条件
        • 第196题
        • 第620题
        • 第1757题
        • 第2020题
        • 第2230题
      • or 满足多个条件中的一个条件即可
        • 第595题
  • 3. 联结
    • 3.1 笛卡尔积
        • 第1280题
        • 第1783题(列转行/宽转长)
        • 第1990题(查固定字段)
    • 3.2 内外联结
      • 内联结
        • 第1073题
        • 第1251题
        • 第1571题
        • 第1587题
        • 第1623题
        • 第1731题
        • 第1919题
        • 第2329题
        • 第2339题
      • 外联结
        • 第175题
        • 第262题
        • 第577题
        • 第580题
        • 第618题
        • 第1068题
        • 第1098题
        • 第1132题
        • 第1158题
        • 第1241题
        • 第1364题(四表联查)
        • 第1378题
        • 第1407题
        • 第1421题
        • 第1581题
        • 第1677题
        • 第1715题
        • 第1809题
        • 第1988题
        • 第2142题
    • 3.3 自连接
        • 第180题
        • 第181题
        • 第196题
        • 第197题
        • 第570题
        • 第603题
        • 第612题
        • 第613题
        • 第626题
        • 第1097题
        • 第1270题
        • 第1285题(连续区间)
        • 第1321题
        • 第1459题
        • 第1747题
        • 第1843题
        • 第1875题
        • 第1951题
        • 第1939题
        • 第1949题
        • 第1951题
        • 第2084题
        • 第2228题
  • 4. union 求并集(常用于列转行/宽转长)
        • 第602题
        • 第1205题
        • 第1212题
        • 第1264题
        • 第1341题
        • 第1435题
        • 第1501题
        • 第1555题
        • 第1699题
        • 第1789题
        • 第1783题
        • 第1795题
        • 第1811题
        • 第1841题
        • 第1907题
        • 第1949题
        • 第1965题
        • 第1990题(查固定字段)
  • 5. 子查询/嵌套查询
        • 第176题
        • 第177题
        • 第597题
        • 第1076题
        • 第1082题
        • 第1107题
        • 第1112题
        • 第1126题
        • 第1174题
        • 第1241题
        • 第1303题
        • 第1468题
        • 第1821题
        • 第1831题
        • 第1867题
        • 第2112题
        • 第2142题
  • 6. 字符串相关函数
        • 第1543题
        • 第1565题
        • 第1667题
        • 第1683题
  • 7. 时间相关函数
        • 第1141题
        • 第1142题
        • 第1454题(连续区间)
        • 第1479题
        • 第1843题
        • 第1853题
        • 第1890题
        • 第1939题
        • 第2298题
  • 8. group by 分组
      • 单字段分组
        • 第511题
        • 第586题
        • 第619题
        • 第1069题
        • 第1075题
        • 第1194题
        • 第1211题
        • 第1322题
        • 第1327题
        • 第1484题
        • 第1571题
        • 第1633题
        • 第1729题
        • 第1821题
        • 第1890题
      • 多字段联合分组
        • 第1050题
        • 第1149题
        • 第1193题
        • 第1543题
        • 第1693题
        • 第1699题
        • 第1741题
        • 第1951题
  • 9. having 筛选
        • 第182题
        • 第586题
        • 第596题
        • 第1050题
        • 第1076题
        • 第1501题
        • 第1511题
        • 第1587题
        • 第1867题
        • 第2041题
  • 10. if/case 语句
      • 简单的case语句(即只能判断是否相等)
        • 第578题
        • 第610题
        • 第627题
        • 第1083题
        • 第1126题
        • 第1173题
        • 第1179题(行转列/长转宽)
        • 第1211题
        • 第1393题
        • 第1445题
        • 第1479题
        • 第1511题
        • 第1661题
        • 第1699题
        • 第1777题(行转列/长转宽)
        • 第1783题(列转行/宽转长)
        • 第1934题
        • 第2308题
      • 可搜索的case语句
        • 第608题
        • 第1294题
        • 第1398题
        • 第1440题
        • 第1468题
        • 第1873题
        • 第1907题
        • 第2051题
        • 第2072题
        • 第2298题
  • 11. 窗口函数
        • 第176题
        • 第177题
        • 第178题
        • 第185题
        • 第534题
        • 第550题
        • 第569题
        • 第571题
        • 第579题
        • 第613题
        • 第618题
        • 第1076题
        • 第1077题
        • 第1112题
        • 第1204题
        • 第1225题
        • 第1285题
        • 第1308题
        • 第1369题
        • 第1412题
        • 第1454题(连续区间)
        • 第1532题
        • 第1549题
        • 第1596题
        • 第1709题
        • 第1811题
        • 第1831题
        • 第1875题
        • 第1951题
        • 第2066题
        • 第2112题
        • 第2159题
        • 第2175题
        • 第2228题
        • 第2292题
        • 第2308题
        • 第2314题
        • 第2324题
        • 第2346题
  • 12. 限定谓词
        • 第574题
        • 第586题
        • 第1076题
        • 第1082题
        • 第1355题
        • 第1867题
  • 13. like模糊匹配和regexp 正则表达式
        • 第1517题
        • 第1527题
  • 14. 临时表
        • 第601题(找连续区间用编号法,同1285题)
        • 第1164题
        • 第1949题
        • 第2238题
  • 15. recursive递归
        • 第571题
        • 第1613题
        • 第1651题
        • 第1767题


0. 常用聚合函数

注意:由于基本每道题都在用聚合函数,就不把题总结进来了。

功能函数
求和sum([distinct]列名)
计数(可去重)count([distinct] 列名)
求平均值avg([distinct]列名)
求最大值max(列名)
求最小值min(列名)
字段拼接(可去重、指定顺序、指定分隔符)group_concat([distinct] 列名1 [order by 列名2 asc/desc] [separator 分隔符])

1. distinct去重

功能用法
单字段去重distinct 列名
多字段去重distinct 列名1,
列名2
使用聚合函数时去重聚合函数(distinct 列名)

第176题

方法1:子查询中distinct 去重、order by 排序、limit + offset 输出第二位、主查询再查一次把空值变成null。

SELECT(SELECT DISTINCTSalaryFROMEmployeeORDER BY Salary DESCLIMIT 1 OFFSET 1) AS SecondHighestSalary
;

方法2:子查询中dense_rank() 排序、主查询中where 筛选、sum()或者max()或者min()聚合函数把空值变成null。

select  max(salary) as SecondHighestSalaryfrom  (select  salary,dense_rank() over(order by salary desc) as rk from  employee) twhere  rk = 2;

第177题

类似第176题。用方法1要先set n = n - 1。

CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGINset n = n - 1;RETURN (# Write your MySQL query statement below.select  salaryfrom  (  select  distinct salary from  employeeorder by  salary desc) tlimit  1 offset n);
END

第597题

方法一:先distinct + 多字段得到去重后的临时表,再count()对临时表中的数据计数,无法直接count(distinct + 多字段)。

方法二:为了解决无法直接count(distinct + 多字段)的问题可以把多字段用concat()拼成单字段。

select  round(coalesce((t1.acc/t2.req), 0), 2) as accept_ratefrom  (select  count(distinct concat(requester_id, accepter_id)) as acc,'yyy' as tagfrom  RequestAccepted) t1 inner join(select  count(distinct concat(sender_id, send_to_id)) as req,'yyy' as tagfrom  FriendRequest) t2 on  t1.tag = t2.tag;

第1045题

group by 按用户id分组、子查询中得到所有产品的数量、having + count(distinct) 去重计数筛选出数量=所有产品数量的用户id。

  select  customer_idfrom  customer 
group by  customer_idhaving  count(distinct product_key) = (select  count(*)from  product);

第1113题

where 筛选,group by 分组,count(distinct) 统计。

  select  extra as report_reason,count(distinct post_id) as report_countfrom  actionswhere  action = 'report'and  action_date = '2019-07-04'
group by  extra;

第1141题

where + datediff 筛选、group by 分组、count(distinct) 计数。

  select  activity_date as day,count(distinct user_id) as active_usersfrom  activitywhere  datediff('2019-07-27', activity_date) < 30 and datediff('2019-07-27', activity_date) >= 0
group by  activity_date;

第1149题

子查询中先将表去重一次,排除看同一本书的情况。主查询中按viewer_id和view_date 联合分组、having+ count() 筛选出看了2篇及以上文章的人、最后distinct去重是排除同一个人多天读多本书的情况~~(太学霸了)~~。

  select  distinct viewer_id as id  from  (select  distinct article_id,viewer_id,view_datefrom  views) t
group by  viewer_id, view_datehaving  count(*) >= 2
order by  id;

第1919题

先给每个user1找朋友(user2):listens表和friendship表内联结,此时id大的人被去掉(因为没有必要同时出现2配5和5配2这样相同的组合)、再内联结listens表看每个user2当天听的和user1相同的歌、group by 按照user1_id、user2_id和day三字段联合分组、having + count(distinct)统计不同的歌曲数,大于3即可。最后select的时候也要distinct,因为有的user1和user2可能有很多天都听了同样的多首歌,会被筛出多次。

  select  distinct t1.user_id as user1_id,t3.user_id as user2_idfrom  listens t1 inner join friendship t2on  t1.user_id = t2.user1_idinner join listens t3on  t2.user2_id = t3.user_id and  t1.day = t3.dayand  t1.song_id = t3.song_id
group by  t1.user_id, t3.user_id, t1.dayhaving  count(distinct t1.song_id) >= 3;

第2082题

where 筛选、count(distinct)计数。

  select  count(distinct customer_id) as rich_countfrom  storewhere  amount > 500;

第2205题

where 筛选、count(distinct)计数。

CREATE FUNCTION getUserIDs(startDate DATE, endDate DATE, minAmount INT) RETURNS INT
BEGINRETURN (# Write your MySQL query statement below.select  count(distinct user_id) as user_cntfrom  purchaseswhere  amount >= minAmountand  time_stamp between startDate and endDate);
END

第2356题

count + distinct不重复计数。

  select  teacher_id,count(distinct subject_id) as cnt from  teacher
group by  teacher_id;

2. where 筛选

功能用法
单条件筛选where 条件语句
多条件筛选(与)where 条件1 and 条件2
多条件筛选(或)where 条件1 or 条件2
目标字段在/不在子查询结果中where 字段 in / not in (子查询)
目标元组在/不在子查询结果中where (字段1, 字段2) in / not in (子查询)

普通用法

第584题

coalesce() 处理null值、where 筛选。

select  namefrom  customerwhere  coalesce(referee_id, 0) <> 2;

第1132题

actions表 left join removals表 on post_id相同、where留下extra = 'spam’的数据(这里必须用where筛选,不能写在on后面)、round() + 100 * count(distinct t2.post_id) / count(distinct t1.post_id)得到每天的百分比(这里去重是因为有可能同一个垃圾广告被举报多次,就会有多条记录,但是实际上这只是1条广告,也只被删了1次)、再把前面的查询作为子查询,查出每天删除比率的平均值。

select  round(avg(d_ratio), 2) as average_daily_percentfrom  (  select  100 * count(distinct t2.post_id) / count(distinct t1.post_id) as d_ratiofrom  actions t1 left join removals t2on  t1.post_id = t2.post_idwhere  t1.extra = 'spam'group by  t1.action_date) t;

第1148题

where 筛选、distinct 去重、order by 排序。

  select  distinct author_id as idfrom  viewswhere  author_id = viewer_id
order by  author_id asc;

第1495题

inner join 两表联结、where + 筛选条件。

select  distinct t2.title as titlefrom  (select  content_idfrom  tvprogramwhere  date_format(program_date, "%Y-%m") = '2020-06') t1 inner join (select  title,content_idfrom  contentwhere  kids_content = 'Y'and  content_type = 'Movies') t2on  t1.content_id = t2.content_id;

第2026题

where + 筛选条件。

  select  problem_idfrom  problemswhere  likes / (likes + dislikes) < 0.6
order by  problem_id;

(列1, 列2, …) in 语句

第184题

先查出各部门最高工资表,再筛选出(部门,工资) in 各部门最高工资表中的员工。

  select  t2.name as department,t1.name as employee,t1.salaryfrom  employee t1 inner join department t2 on t1.departmentId = t2.idwhere  (t1.departmentId, salary)in  (  select  departmentId,max(salary)from  employeegroup by  departmentId);

第512题

先查出各player最早登录日期表,再筛选出(玩家id, 登陆日期) in 最早登录日期表中的玩家。

select  player_id,device_idfrom  activity t1where  (player_id, t1.event_date) in (  select  player_id,min(event_date)from  activitygroup by  player_id)

第585题

思路是“2015年和别人相同”、“坐标唯一”两个条件分开看,再用where in 取交集。

临时temp表中两个insurance表自连接,将2015年投资和别人一样的数据筛选出来、注意这样一个人可能联上了多条数据,所以最后要distinct去重。子查询中group by按经纬度多字段分组、having + count() 筛选出唯一坐标的pid(注意一般用了分组之后是要聚合的,但因为这次pid是唯一的,就不聚合了)。主查询从temp表里筛选出pid在子查询中的数据(即坐标唯一的数据),并sum()求和。

with temp as (select  distinct t1.PID,t1.TIV_2016from  insurance t1 inner join insurance t2on  t1.PID <> t2.PIDand  t1.TIV_2015 = t2.TIV_2015
)select  round(sum(TIV_2016), 2) as TIV_2016from  tempwhere  PID in (  select  PIDfrom  insurancegroup by  LAT, LONhaving  count(*) = 1);

第614题

利用where in (子查询) 筛选出followee列中在follower列中的人、分组计数、排序(题目让按字典顺序排,其实就是按followee字母顺序排)

  select  followee as follower,count(*) as numfrom  followwhere  followee in (select follower from follow)
group by  followee
order by  followee;

第1070题

子查询中group by按product_id分组得到每个product_id和他对应的min(year)、主查询中用where (product_id, year) in 子查询筛选出需要的数据。

select  product_id,year as first_year,quantity,pricefrom  saleswhere  (product_id, year) in (  select  product_id,min(year)from  salesgroup by  product_id);

第1164题

方法1:子查询中分组聚合查出日期在8月16号前的max(日期),即16号前的最近改动日期(这里没法连着new_price一起查哈!分组后的数据是不会自动一一对应的)、再套一层查询用where (id, 日期) in 子查询,查出id和new_price、将之前的查询结果外联结到一个distinct id的子查询表中、这样联上了的就按联上的价格取,没联上的就是10元。

方法2:既然原表没有初始10元这个数据,我们就自己加上去。创建临时表:原表union all 查出每个id原价10元的子查询(包含distinct id、10、初始日期’0000-00-00’),之后涉及from的全部用这个临时表、where筛选出16号之前的数据、同方法1先查出id和max(日期)、再套一层查询得到id和价格。

with temp as (select  product_id,new_price,change_datefrom  productsunion all select  distinct product_id,10,'0000-00-00'from  products
)select  product_id,new_price as price from  tempwhere  (product_id, change_date) in (  select  product_id,max(change_date) as change_datefrom  tempwhere  change_date <= '2019-08-16'group by  product_id);

第1174题

子查询查出用户首次购买数据、主查询统计order_date = pref_date的数据。这里注意一个小技巧:按条件计数,是即时订单+1,计划订单+0,可以用sum(case when),也可以直接写sum(条件),此时条件满足时自动把布尔值转换成1计算,不满足转换成0计算。

select  round(100 * sum(order_date = customer_pref_delivery_date) / count(*), 2) as immediate_percentagefrom  delivery where  (customer_id, order_date) in (  select  customer_id,min(order_date) as first from  deliverygroup by  customer_id);

not in 语句(找差集)

第183题

子查询出订了东西的用户、where not in 查出不订的用户。

select  Name as Customersfrom  Customers t1where  t1.Id not in (select CustomerId from Orders);

第607题

子查询出向’red’销售过的人、再用where not in找差集。

select  namefrom  salespersonwhere  sales_id not in (select  sales_idfrom  orders t1 natural join company t2where  t2.name = 'RED');

第1083题

方法一:分别where筛选出买S8和iphone的人,再not in 找差集。

select  distinct buyer_idfrom  sales t1 left join product t2on  t1.product_id = t2.product_idwhere  buyer_id not in (select  buyer_idfrom  sales t1 left join product t2on  t1.product_id = t2.product_idwhere  product_name = 'iPhone')and  product_name = 'S8';

方法二:用sum(case when)的方法绕开子查询达到找差集目的,具体做法是求和时遇到“S8”就+1,遇到“iphone”就-1,只买“S8”的人的求和结果就是1,都买了的人就是0。

  select  buyer_idfrom  sales t1 left join product t2on  t1.product_id = t2.product_idwhere  product_name in ('S8', 'iPhone')
group by  buyer_idhaving  sum(distinct case product_name when 'S8' then 1 else -1 end) = 1;

第1084题

子查询出在目标时间范围以外卖过的产品、where not in找差集。

select  product_id,product_namefrom  productwhere  product_id not in (select  distinct product_idfrom  saleswhere  sale_date > '2019-03-31'or  sale_date < '2019-01-01') 

第1264题

子查询中将friendship原表和交换左右顺序的原表union all取并集、套一层查询从第二列找到user_id=1的朋友、主查询中查distinct page_id,因为有的人喜欢同一个页面、where进行筛选,条件是user_id在上面子查询的朋友中且page_id不等于1号已经喜欢的page_id(要再用个子查询得到1号喜欢的page_id)。

select  distinct page_id as recommended_pagefrom  likeswhere  user_id in (select  user2_idfrom  (select  user1_id,user2_idfrom  friendshipunion all select  user2_id,user1_idfrom  friendship) t1where  user1_id = 1)and  page_id not in (select  page_idfrom  likeswhere  user_id = 1);

第1350题

not in 找差集。

select  id,namefrom  studentswhere  department_id not in (select id from departments);

第1412题

临时temp表中用min()、max()窗口函数找到每次考试最高最低分,再顺手求个当前学生分数和最高/最低分的分差、套一层查询筛选出两个分差乘积为0的id,这一步也就是在找考了最高分或最低分的id。

主查询中where筛选出在exam表(即参加了考试),但不在temp表(即没考到最高或最低分)中的人。

with temp as (select  distinct student_idfrom  (select  student_id,score - min(score) over(partition by exam_id order by score asc) as d1,score - max(score) over(partition by exam_id order by score desc) as d2from  exam) t where  d1 * d2 = 0    
)select  student_id,student_namefrom  studentwhere  student_id in (select  student_id from  exam)and  student_id not in (select  student_idfrom  temp);

第1607题

not in 找差集。

  select  seller_namefrom  seller t1where  t1.seller_id not in (select  seller_id from  orderswhere  year(sale_date) = '2020')
order by  seller_name;

第1978题

not in 找差集。

  select  employee_idfrom  employeeswhere  salary < 30000and  manager_id not in (select  employee_id from  employees)
order by  employee_id;

and 同时满足多个条件

第196题

inner join自连接、delete删除。

delete t1 from Person t1 inner join Person t2 on t1.email = t2.email and t1.id > t2.id;

第620题

where A and B。

  select  id,movie,description,ratingfrom  cinemawhere  description <> 'boring'and  id % 2 = 1
order by  rating desc;

第1757题

where A and B。

select  product_idfrom  productswhere  low_fats = 'Y' and  recyclable = 'Y';

第2020题

natural join 联结两表、where 筛选。

select  count(*) as accounts_countfrom  subscriptions t1 natural join streams t2where  year(end_date) = '2021'and  year(stream_date) != '2021';

第2230题

where A and B。

CREATE PROCEDURE getUserIDs(startDate DATE, endDate DATE, minAmount INT)
BEGIN# Write your MySQL query statement below.select  distinct user_id from  purchaseswhere  amount >= minAmountand  time_stamp between startDate and endDateorder by  user_id;
END

or 满足多个条件中的一个条件即可

第595题

where 条件1 or 条件2。

select  name,population,areafrom  worldwhere  area >= 3000000 or population >= 25000000;

3. 联结

preview


3.1 笛卡尔积

笛卡尔积是理解所有联结的基础。笛卡尔积就是不管表1和表2有没有关系,将他们的数据强行组合在一起。例如表1有2排,表2有3排,笛卡尔积则有2×3=6排。

所有联结的执行方式都是先产生两表的笛卡尔积,再在此基础上进行筛选出需要的数据。

笛卡尔积用于得到两表行与行之间的全部排列组合情况。

功能用法
产生两个表的笛卡尔积(不能跟筛选条件)from 表1, 表2
产生两个表的笛卡尔积(能跟筛选条件)from 表1 cross join 表2

注意:

(1)inner join / join 不加筛选条件也得到笛卡尔积,但不建议这样写,表述不清。left join / right join 不能不跟筛选条件。

(2)之所以使用笛卡尔积就是为了得到排列组合的所有情况,否则还不如使用inner join,所以原则上来说笛卡尔积后面是不跟筛选条件的。

(3)由于本人水平原因,很多题使用了先笛卡尔积再筛选的写法,类似先while True再break,其实都可以优化成inner join,大家不要盲目照搬。


第1280题

from 表1 cross join 表2得到笛卡尔积,再left join表3、group by分组、count()计数。

这里注意2点:① 因为表1和表2无关所以才适合用cross join,否则一般不用;② 一般不能两个join连着写最后再写on条件,但这次因为前面是cross join本来就不需要跟筛选条件所以没关系。

  select  t1.student_id,t1.student_name,t2.subject_name,count(t3.subject_name) as attended_examsfrom  students t1 cross join subjects t2 left join examinations t3on  t1.student_id = t3.student_idand  t2.subject_name = t3.subject_name
group by  t1.student_id, t2.subject_name
order by  t1.student_id, t2.subject_name;

第1783题(列转行/宽转长)

方法1:用多个union all 将championships表的数据汇总到一列并将列名改成player_id、natural join players 表、group by 分组、count() 统计。

  select  player_id,player_name,count(*) as grand_slams_countfrom  (   select  wimbledon as player_idfrom  championshipsunion allselect  fr_open as player_idfrom  championships t1union allselect  US_open as player_idfrom  championshipsunion allselect  Au_open as player_idfrom  championships) t1 natural join players t2
group by  player_id;

方法2:两张表生成笛卡尔积,此时每个球员都对应一份championships表的全部数据、group by 分组、sum(case when) 筛选统计。

SELECT player_id,player_name,      SUM(IF(player_id=Wimbledon,1,0)+IF(player_id=Fr_open,1,0)+IF(player_id=US_open,1,0)+IF(player_id=Au_open,1,0)) AS grand_slams_count
FROM Championships 
JOIN Players 
GROUP BY player_id
HAVING grand_slams_count>0

第1990题(查固定字段)

union all枚举创建platform表和experiment_name表、cross join得到这两个表笛卡尔积、left join experiments表+group by+count(*) 计数。

  select  t1.platform,t2.experiment_name,count(t3.experiment_name) as num_experimentsfrom  (select  'Android' as platform union allselect  'IOS' as platform union allselect  'Web' as platform) t1 cross join(select  'Reading' as experiment_name union allselect  'Sports' as experiment_name union allselect  'Programming' as experiment_name) t2 left joinexperiments t3on  t1.platform = t3.platformand  t2.experiment_name = t3.experiment_name
group by  platform, experiment_name;

3.2 内外联结

功能用法
内联结表1 inner join 表2 on 条件
自然联结表1 natural join 表2
左外联结表1 left join 表2 on 条件
右外联结表1 right join 表2 on 条件

注意:

(1)natural join是一种特殊的inner join,区别是:

① natural join会选择外键进行连接,没有外键时自动选择两表相同字段进行联结,而无需设置on条件。

② natural join后两表相同的字段即被合并,而普通的inner join不会自动合并。

例如:表1有id、性别2列,表2有id、姓名2列。

inner join后的表有4列:表1id、表1性别、表2id、表2姓名。

natural join后的表只有3列:id、表1性别、表2姓名。

(2)通过1的例子可以发现,inner join(外联结也一样)后经常会出现重复的列名,为了避免提示ambiguous错误,建议任何联结时都给表标t1、t2别名,且使用任何列名时都以t1.列名、t2.列名的形式书写。

(3)外联结一般用于保留联结时联不上的地方产生的null值。即使用外联结以外的联结方式并跟上筛选条件时,未联上/选上的行会被自动排除,但使用外联结时则会保留并显示null。

(4)如果涉及多个表联结,表1 join 表2后马上跟on条件,不要都join完了最后再来慢慢写筛选条件。

(5)两表内联结时如果要联结的列名相同,则可以用on 表1.列名 = 表2.列名,还可以用using(列名)。区别是第一种用法不会合并用于联结的同名列,而using()会合并,效果类似natural join,使用using()时前面不要写on。

(6)内联结时把筛选条件写在on里和联完后的where里对查询结果是没有影响的,但外联结就有影响。原因是:在外联结中,筛选条件写在on里,没联上的数据也会保留null值,而筛选条件写在where里,没选上的数据会直接消失。例如:

表1

id姓名出生日期
1张三1997-8-22
2李四null
3王五1997-8-20

表2

id性别
1
2

表1 left join 表2 on 表1.id = 表2.id and t1.出生日期 is not null

t1.idt1.姓名t2.idt2.性别
1张三1
2李四nullnull
3王五nullnull

注意观察出生日期是null的李四还在表里,只是没跟她的性别联上而已。如果后续我们进行一个count()操作,会发现数出来有3人。


表1 left join 表2 on 表1.id = 表2.id where t1.出生日期 is not null

t1.idt1.姓名t2.idt2.性别
1张三1
3王五nullnull

出生日期是null的李四直接消失。如果后续我们进行一个count()操作,会发现数出来不是上一个情况的3人,而是2人。


内联结

第1073题

inner join 两表联结、group by 分组、avg()算平均数。

第1251题

inner join 内联结、group by分组、round()和sum()等函数计算数据。

  select  t1.product_id,round(sum(t2.units * t1.price) / sum(t2.units), 2) as average_pricefrom  Prices t1 inner join UnitsSold t2on  t2.purchase_date between t1.start_date and t1.end_dateand  t1.product_id = t2.product_id
group by  product_id;

第1571题

inner join 两表联结、group by 分组、sum()求和。

  select  name as WAREHOUSE_NAME,sum(units * v_pro) as VOLUMEfrom  warehouse t1 inner join (select  product_id,width * length * height as v_profrom  products) t2on  t1.product_id = t2.product_id
group by  t1.name;

第1587题

inner join 两表联结、group by 分组、sum()求和、having 筛选。

第1623题

一边inner join一边筛选。

select  t1.student_name as member_A,t2.student_name as member_B,t3.student_name as member_Cfrom  schoola t1 inner join schoolb t2on  t1.student_name <> t2.student_name and  t1.student_id <> t2.student_idinner join schoolc t3on  t2.student_name <> t3.student_nameand  t1.student_name <> t3.student_nameand  t1.student_id <> t3.student_idand  t3.student_id <> t2.student_id;

第1731题

inner join 把经理名字联到reports_to的id后面、group by 按经理id分组、count() 计数、avg() 求平均值、round() 四舍五入。

  select  t1.reports_to as employee_id,t2.name,count(t1.reports_to) as reports_count,round(avg(t1.age), 0) as average_agefrom  employees t1 inner join employees t2on  t1.reports_to = t2.employee_id
group by  t1.reports_to
order by  t2.employee_id;

第1919题

先给每个user1找朋友(user2):listens表和friendship表内联结,此时id大的人被去掉(因为没有必要同时出现2配5和5配2这样相同的组合)、再内联结listens表看每个user2当天听的和user1相同的歌、group by 按照user1_id、user2_id和day三字段联合分组、having + count(distinct)统计不同的歌曲数,大于3即可。最后select的时候也要distinct,因为有的user1和user2可能有很多天都听了同样的多首歌,会被筛出多次。

  select  distinct t1.user_id as user1_id,t3.user_id as user2_idfrom  listens t1 inner join friendship t2on  t1.user_id = t2.user1_idinner join listens t3on  t2.user2_id = t3.user_id and  t1.day = t3.dayand  t1.song_id = t3.song_id
group by  t1.user_id, t3.user_id, t1.dayhaving  count(distinct t1.song_id) >= 3;

第2329题

inner join两表联结、group by按用户id分组、sum(销量*单价)得到每个人总开销、按要求排序。

  select  user_id,sum(quantity * price) as spendingfrom  Sales t1 inner join Product t2using(product_id) 
group by  user_id
order by  spending desc, user_id asc;

第2339题

inner join两表联结、on筛选出不同的配对。

select  t1.team_name as home_team,t2.team_name as away_team from  teams t1 inner join teams t2on  t1.team_name <> t2.team_name;

外联结

注意:为了方便起见,本文档所有能用左/右外联结的地方全部用左外联结记录,实际做题时各人视情况而定。

第175题

left join 左外联结。

select  firstName,lastName,city,statefrom  Person t1 left join Address t2on  t1.personId = t2.personId;

第262题

子查询中先查出被禁的用户和司机,减少数据量、trips表 left join 子查询,保留连不上的地方以达到求差集目的(这种做法比where not in 找差集效率更高)、where继续筛选日期、group by 按日期分组、聚合函数统计、别名的时候记得打反引号,因为不打反引号时不允许出现空格。

  select  request_at as `Day`,round(1 - sum(status='completed') /count(*), 2) as `Cancellation Rate`from  trips t1 left join (select  users_idfrom  userswhere  banned = 'Yes') t2on  t1.client_id = t2.users_idor  t1.driver_id = t2.users_idwhere  t2.users_id is nulland  request_at between '2013-10-01' and '2013-10-03'
group by  request_at;

第577题

left join 左外联结、coalesce() 处理null。

select  t1.name,t2.bonusfrom  employee t1 left join bonus t2 on  t1.empid = t2.empidwhere  coalesce(t2.bonus, 0) < 1000;

第580题

部门表left join 学生表、group by 分组、count() 计数、coalesce() 处理null、order by排序。

  select  dept_name,coalesce(count(t2.dept_id), 0) as student_numberfrom  department t1 left join student t2on  t1.dept_id = t2.dept_id
group by  t1.dept_id
order by  student_number desc, dept_name asc;

第618题

第一个子查询查美洲学生,后面的学生都没美洲学生多,所以可以用left join联上去。给每个子查询都用row_number()窗口函数编号,方便后续按照编号联结。

select  America,Asia,Europefrom  (select  name as America,row_number() over(order by name) as rkfrom  studentwhere  continent = 'America') t1 left join(select  name as Asia,row_number() over(order by name) as rkfrom  studentwhere  continent = 'Asia') t2on  t1.rk = t2.rkleft join(select  name as Europe,row_number() over(order by name) as rkfrom  studentwhere  continent = 'Europe') t3on  t1.rk = t3.rk;

第1068题

left join 左外联结。

select  `product_name`,`year`,`price`from  sales t1 left join product t2 on  t1.product_id = t2.product_id;

第1098题

这道题筛选条件比较多,如果先外联结再一口气筛选,可能会出现不知道把筛选条件写在on后面还是where后面或者一不小心写错的情况。所以思路是先通过子查询筛选一遍,再对筛选后的表外联结,这样条理比较清晰。

books表先用where筛掉日期不合规的数据、orders表也用where筛掉日期不合规的数据、两个查询结果left join、group by 分组、having + sum() 统计和筛选。

  select  t1.book_id,t1.namefrom  (select  book_id,namefrom  bookswhere  available_from <= '2019-05-23') t1 left join (select  book_id,quantityfrom  orderswhere  dispatch_date >= '2018-06-23') t2on  t1.book_id = t2.book_id
group by  t1.book_idhaving  coalesce(sum(t2.quantity), 0) < 10;

第1132题

actions表 left join removals表 on post_id相同、where留下extra = 'spam’的数据(这里必须用where筛选,不能写在on后面)、round() + 100 * count(distinct t2.post_id) / count(distinct t1.post_id)得到每天的百分比(这里去重是因为有可能同一个垃圾广告被举报多次,就会有多条记录,但是实际上这只是1条广告,也只被删了1次)、再把前面的查询作为子查询,查出每天删除比率的平均值。

select  round(avg(d_ratio), 2) as average_daily_percentfrom  (  select  100 * count(distinct t2.post_id) / count(distinct t1.post_id) as d_ratiofrom  actions t1 left join removals t2on  t1.post_id = t2.post_idwhere  t1.extra = 'spam'group by  t1.action_date) t;

第1158题

left join 左外联结、group by 分组、where筛选日期、year()截年份、count()计数。

  select  user_id as buyer_id,join_date,count(buyer_id) as orders_in_2019from  users t1 left join (select  buyer_idfrom  orderswhere  year(order_date) = '2019') t2 on  t1.user_id = t2.buyer_id
group by  user_id;

第1241题

子查询+where筛选查出两个表(帖子表、评论表)、帖子表left join评论表、count计数。

  select  t1.sub_id as post_id,count(t2.sub_id) as number_of_commentsfrom  (select  distinct sub_id, parent_id from  submissionswhere  parent_id is null) t1 left join(select  distinct sub_id, parent_id from  submissionswhere  parent_id is not null) t2on  t1.sub_id = t2.parent_id  
group by  t1.sub_id
order by  post_id;

第1364题(四表联查)

invoices表内联结customers表、left join contacts表以统计联系人数量、再left join customers表(这里是用邮箱一一对应联结的)以统计可信联系人数量、group by分组、count()统计。

  select  t1.invoice_id,t2.customer_name,t1.price,count(t3.contact_name) as contacts_cnt,count(t4.customer_name) as trusted_contacts_cntfrom  invoices t1 inner join customers t2on  t1.user_id = t2.customer_idleft join contacts t3on  t2.customer_id = t3.user_idleft join customers t4on  t3.contact_email = t4.email
group by  t1.invoice_id
order by  t1.invoice_id;

第1378题

left join 左外联结。

select  unique_id,namefrom  employees t1 left join employeeuni t2 on t1.id = t2.id;

第1407题

left join 左外联结、group by 分组、sum() 统计、coalesce 处理null、order by 排序。

  select  name, coalesce(sum(distance), 0) as travelled_distancefrom  Rides t1 right join  users t2 on t1.user_id = t2.id 
group by  t1.user_id
order by  travelled_distance desc, name asc;

第1421题

left join 左外联结。

select  t2.id,t2.year,coalesce(npv, 0) as npvfrom  npv t1 right join queries t2on  t1.id = t2.idand  t1.year = t2.year;

第1581题

left join 左外联结、where 筛选、group by 分组、count() 计数。

  select  customer_id,count(*) as count_no_transfrom  visits t1 left join transactions t2 on t1.visit_id = t2.visit_idwhere  t2.amount is null
group by  customer_id;

第1677题

product表 left join 左外联结 invoice表、sum() 求和、coalesce() 把null变成0。

  select  name,coalesce(sum(rest), 0) as rest,coalesce(sum(paid), 0) as paid,coalesce(sum(canceled), 0) as canceled,coalesce(sum(refunded), 0) as refundedfrom  product t1 left join invoice t2on  t1.product_id = t2.product_id
group by  t1.product_id
order by  name;

第1715题

left join 左外联结把chests表拼到boxes表上、sum()求和。

select  sum(t1.apple_count + coalesce(t2.apple_count, 0)) as apple_count,sum(t1.orange_count + coalesce(t2.orange_count, 0)) as orange_countfrom  boxes t1 left join chests t2on  t1.chest_id = t2.chest_id;

第1809题

left join 左外联结、where筛选。

select  session_idfrom  playback t1 left join ads t2on  t1.customer_id = t2.customer_idand  t2.timestamp between t1.start_time and t1.end_timewhere  t2.timestamp is null;

第1988题

left join 左外联结、where筛选、group by分组、min()求能接受的最低分、coalesce处理null值。

  select  school_id,coalesce(min(score), -1) as scorefrom  schools t1 left join exam t2on  t1.capacity >= t2.student_count
group by  school_id;

第2142题

子查询中将乘客表和公交车表内联结、筛选出乘客能坐的所有车(即后于乘客到达的车)、group by按乘客id分组、min()选出时间最早的车(即乘客真正坐的车)。主查询中公交车表 left join 子查询、group by按公交车id分组、count()计数、order by排序。

  select  t3.bus_id,count(t.tm) as passengers_cntfrom  buses t3 left join (  select  min(t2.arrival_time) as tmfrom  passengers t1 inner join buses t2on  t1.arrival_time <= t2.arrival_timegroup by  passenger_id) ton  t3.arrival_time = t.tm
group by  t3.bus_id
order by  bus_id asc;

3.3 自连接

用法:from 学生表 t1, 学生表 t2

表示把学生表自己和自己联结产生笛卡尔积并别名t1和t2以区别笛卡尔积中的数据。

注意:很多自连接的题都可以用lag() 窗口函数做,由于作者刚开始编辑本文时水平有限并没有写入lag() 窗口函数的方法。同理,一些lag() 窗口函数的题也可以用自连接做,对于这些题就不列全方法了,大家有兴趣可以自行思考


第180题

自连接再筛选

select  distinct t1.num as ConsecutiveNumsfrom  logs t1 inner join logs t2on  t1.id + 1 = t2.id and  t1.num = t2.numinner join logs t3on  t2.id + 1 = t3.idand  t1.num = t3.num;

第181题

两个employee表inner join 自联结,再筛选。

select  t1.name as 'Employee'from  Employee t1 inner join Employee t2on  t1.salary > t2.salary and t1.managerId = t2.id;

第196题

inner join自连接、delete删除。

delete t1 from Person t1 inner join Person t2 on t1.email = t2.email and t1.id > t2.id;

第197题

两个原表自连接制造笛卡尔积、datediff() 求日期差、where 筛选。

select  t2.idfrom  Weather t1 inner join Weather t2on  datediff(t2.recordDate, t1.recordDate) = 1and  t1.Temperature < t2.Temperature;

第570题

两个原表inner join自连接、group by 分组、having + count(*) 筛选。

  select  t2.namefrom  employee t1 inner join employee t2on  t1.managerid = t2.id
group by  t1.manageridhaving  count(*) >= 5;

第603题

两个原表inner join自连接再筛选。还可能用到abs()绝对值函数。

  select  distinct t1.seat_idfrom  cinema t1 inner join cinema t2on  abs(t1.seat_id - t2.seat_id) = 1 and t1.free = 1 and t2.free = 1
order by  t1.seat_id;

第612题

两个原表inner join 自连接、筛选出与表1点不同的表2点、按照题目公式计算表1表2点的距离(需要用到power()求平方、sqrt()开方、min()求最短距离、round()保留2位小数)。

select  round(min(sqrt(power((t1.x - t2.x),2) + power((t1.y - t2.y), 2))), 2) as shortestfrom  point2D t1 inner join point2D t2on  t1.x <> t2.xor  t1.y <> t2.y;

第613题

方法一:两表inner join 自连接再筛选。计算出任意两点间的距离,再min()找出最小的距离。

方法二:由于最小距离只存在于相邻两个点中,所以没必要算出全部任意两点间的距离。先用窗口函数rank排序+编号。再左外联结拼出一张同时有某点和它右一个点的坐标的临时表,并select出这个临时表中相邻两点的距离。最后min()找出最小距离。

select  distinct min(dis) as shortestfrom  (select  abs(t2.x - t1.x) as disfrom  (  select  x, rank() over(order by x) as idfrom  point) t1 left join (  select  x,rank() over(order by x) as idfrom  point) t2on  t1.id + 1 = t2.id) t3;

第626题

两个原表inner join 自连接,再筛选。

  select  t1.id,t2.studentfrom  seat t1 inner join seat t2on  (t1.id % 2 = 1 and (t1.id + 1 = t2.id or (t1.id = (select max(id) from seat) and t1.id = t2.id)))or  (t1.id % 2 = 0 and t1.id - 1 = t2.id)
order by  t1.id;

第1097题

子查询中查出每个人的首次登陆日期、left join自连接再接上每个人第二天登陆的行(没有就保留空)、group by按照安装日期分组、count(安装日期)统计安装人数、count(表2的player_id) / count(安装日期)得到留存率,最后四舍五入一下。

  select  install_dt,count(t1.install_dt) as installs,round(count(t2.player_id) / count(t1.install_dt), 2) as Day1_retention from  (  select  min(event_date) as install_dt,player_idfrom  activitygroup by  player_id) t1 left join activity t2on  t1.install_dt + 1 = t2.event_dateand  t1.player_id = t2.player_id
group by  install_dt;

第1270题

方法1:重复3次employees 表都用 inner join 联起来、where 筛选第三层manager_id = 1的人(不管是哪层的人最终第三层的manager_id都会回到1)。

    select  t1.employee_idfrom  employees t1 
inner join  employees t2 on t1.manager_id = t2.employee_id 
inner join  employees t3 on t2.manager_id = t3.employee_idwhere  t3.manager_id = 1and  t1.employee_id <> 1;

方法2:union 把每个层级的人联到一起。

第1285题(连续区间)

方法1:两个logs表cross join自联结、起点-1不在logs表中,终点+1不在logs表中,且终点值大于起点值,借此筛选出起点和终点、由于用了笛卡尔积,每个起点都对应全部终点,所以再用group by和min() 函数选出终点。

  select  t1.log_id as start_id,min(t2.log_id) as end_idfrom  logs t1 inner join logs t2on  t1.log_id <= t2.log_idwhere  (t1.log_id - 1) not in (select  log_id from  logs)and  (t2.log_id + 1) not in (select  log_id from  logs)
group by  start_id
order by  start_id;

方法2:基本原理:连续区间中的数与某一基准等差数列的差值是固定的。例如123567与123456一一对应相减的差是000111,可借此差值对区间进行分类。

log_id - row_number() over() 得到差值、借之前的差值分组、每组最小的为start,最大的为end。

第1321题

原表先对visited_on去重(不然后面连接时会连多次)再自连接、连接条件是datediff(t2日期, t1日期) between 0 and 6。这样仍然不满足题意,题目意思是必须连续7天才计算一次,像例如4号这种连上了1,2,3三天的应该筛掉,所以可以加个having count(distinct)排除掉连上的日期不足7天的数据,也可以用where筛选掉日期不足最小日期(用子查询得到)+6的数据。

  select  t1.visited_on,sum(t2.amount) as amount,round(sum(t2.amount)/7, 2) as average_amountfrom  (select  distinct visited_on from  customer) t1 inner join customer t2on  datediff(t1.visited_on, t2.visited_on) between 0 and 6
group by  t1.visited_onhaving  count(distinct t2.visited_on) = 7
order by  t1.visited_on;

第1459题

用大炮发射法(即1连2、3,2连3)将两个points表按照inner join自连接、筛选出x和y坐标都不一样的数据、计算面积、order by 排序。

  select  t1.id as p1,t2.id as p2,abs((t1.x_value - t2.x_value) * (t1.y_value - t2.y_value)) as areafrom  points t1 inner join points t2on  t1.id < t2.idand  t1.x_value <> t2.x_valueand  t1.y_value <> t2.y_value
order by  area desc, p1, p2;

第1747题

两个loginfo表自连接、筛选出表2登录时间between表1登录和下线时间的数据、还要加个distinct因为有的账户出问题可能不止一次。

select  distinct t1.account_idfrom  loginfo t1 inner join loginfo t2on  t1.account_id = t2.account_idand  t1.ip_address <> t2.ip_addressand  t2.login between t1.login and t1.logout;

第1843题

临时表temp中将transactions表和accounts表inner join,on id相等、where筛选出所有‘creditor’数据、group by 按账户id和年月(需要用date_format取)联合分组、sum() 计算每人每月总收入、having 筛选出总收入 > 最大收入的数据。主查询中将两个temp表inner join自连接、on id相等且月份差1,求月份差用period_diff()、这样留下的人都是可疑的,只不过有的人可能连续多个月可疑,就会被选出多次,所以还要来个distinct。

with temp as (select  t1.account_id,sum(t1.amount) as all_amount,t2.max_income,date_format(t1.day, "%Y%m") as dfrom  transactions t1 inner join accounts t2on  t1.account_id = t2.account_idwhere  type = 'Creditor'group by  t1.account_id, dhaving  all_amount > max_income
)select  distinct t1.account_idfrom  temp t1 inner join temp t2on  t1.account_id = t2.account_idand  period_diff(t2.d, t1.d) = 1;

第1875题

两个原表用inner join按照工资相同id不同自连接以排除掉工资独一无二的员工、dense_rank() 窗口函数按工资排出顺序、distinct去除联结后重复出现的数据、order by 排序。

  select  distinct t1.employee_id,t1.name,t1.salary,dense_rank() over (order by t1.salary) as team_idfrom  employees t1 inner join employees t2on  t1.salary = t2.salaryand  t1.employee_id <> t2.employee_id
order by  team_id asc, employee_id asc

第1951题

子查询中2个ralations表自连接让拥有相同follower的用户处在同行方便之后计数、联结筛选条件是follower_相等且联上来的t2表user_id > 原来t1表的user_id(不然1,7和7,1会算成两种情况)、group by联合分组、rank() 窗口函数以count(*)排序编号、主查询中筛选出排1名的。

select  user1_id,user2_idfrom  (  select  t1.user_id as user1_id,t2.user_id as user2_id,rank() over(order by count(*) desc) as rankingfrom  relations t1 inner join relations t2on  t1.follower_id = t2.follower_idand  t1.user_id < t2.user_idgroup by  t1.user_id, t2.user_id) twhere  ranking = 1;

第1939题

两个confirmations表inner join自联结、on或者where后面筛选出user_id相等且time_stamp相差1天的数据、可使用timestampdiff()、date_add()、unix_timestamp转换日期格式再相减等。

select  distinct t1.user_idfrom  confirmations t1 inner join confirmations t2on  t1.user_id = t2.user_idand  t1.time_stamp < t2.time_stampand  unix_timestamp(t2.time_stamp) - unix_timestamp(t1.time_stamp) <= 24*60*60;

第1949题

由于每个人的朋友散布在user1_id和user2_id两列中,所以先用union all将原表和交换两列左右顺序后的新表进行并集,以便后续找某个id的朋友时只需要看第二列即可,再用临时表将上述查询结果保存下来取名为temp以免重复写。原表t1 inner join temp表 t2,将t1.user1_id的朋友联过来、继续inner join temp表t3,将t1.user2_id和前面联的t1.user1_id的共同朋友联过来、group by分组、count(t3中的id)计数、having筛选。

with temp as (   select  user1_id,user2_id from  friendshipunion all select  user2_id,user1_idfrom  friendship)select  t1.user1_id,t1.user2_id,count(t3.user2_id) as common_friendfrom  friendship t1 inner join temp t2on  t1.user1_id = t2.user1_idinner join temp t3on  t1.user2_id = t3.user1_idand  t2.user2_id = t3.user2_id
group by  t1.user1_id, t1.user2_idhaving  common_friend >= 3;

第1951题

子查询中2个ralations表自连接让拥有相同follower的用户处在同行方便之后计数、联结筛选条件是follower_相等且联上来的t2表user_id > 原来t1表的user_id(不然1,7和7,1会算成两种情况)、group by联合分组、rank() 窗口函数以count(*)排序编号、主查询中筛选出排1名的。

select  user1_id,user2_idfrom  (  select  t1.user_id as user1_id,t2.user_id as user2_id,rank() over(order by count(*) desc) as rankingfrom  relations t1 inner join relations t2on  t1.follower_id = t2.follower_idand  t1.user_id < t2.user_idgroup by  t1.user_id, t2.user_id) twhere  ranking = 1;

第2084题

left join 左外联结把orders表重复两次(注意此时可能本来只有1条的数据也会被复制多次,所以select后要加distinct)、筛选出要保留的数据。

select  distinct t1.order_id,t1.customer_id,t1.order_typefrom  orders t1 left join orders t2on  t1.customer_id = t2.customer_id and  t1.order_type <> t2.order_typewhere  t2.order_type is nullor  (t1.order_type = 0 and t2.order_type = 1);

第2228题

方法1:两个原表自连接、筛选条件是表2的id和表1id相同,且表2日期大于等于表1日期,且表2purchase_id不等于表1purchase_id、group by 按表1user_id和表1purchase_date联合分组、取得min(表2purchase_date),至此终于找到每一天对应的最近的下一次购买日期、datediff求差值、写进having里筛选。

  select  distinct t1.user_idfrom  purchases t1 inner join purchases t2on  t1.user_id = t2.user_idand  t2.purchase_date >= t1.purchase_dateand  t1.purchase_id <> t2.purchase_id
group by  t1.user_id, t1.purchase_datehaving  datediff(min(t2.purchase_date), t1.purchase_date) <= 7
order by  user_id

方法2:子查询中用lag()窗口函数按user_id分区,purchase_date和purchase_id排序,将每次购买日期的上次购买日期接在每次购买日期后面。主查询中datediff算差值、where筛选。

  select  distinct user_idfrom  (select  user_id,purchase_date,lag(purchase_date, 1, '2000-01-01') over(partition by user_id order by purchase_date, purchase_id) as last_datefrom  purchases) twhere  datediff(purchase_date, last_date) <= 7
order by  user_id

4. union 求并集(常用于列转行/宽转长)

功能用法
将两个表上下合并表1 union all 表2
将两个表上下合并+去重表1 union 表2

注意:

(1)合并时两个表列数必须相同。列名可以不同,但输出结果以最上方的表的列名为准。

(2)union all直接把两表上下拼一起,而union会将两个表中一整排都相同的地方进行去重。union all性能优于union,一般没有去重的要求都使用union all。


第602题

子查询中将requester_id和accepter_id两列并成1列、主查询中group by分组、count() 计数、order by排序、limit 1输出第1名。

  select  requester_id as id,count(*) as numfrom  (   select  requester_idfrom  requestacceptedunion allselect  accepter_idfrom  requestaccepted) t
group by  requester_id
order by  num desclimit  1

第1205题

临时表中chargeback表 left join transaction表,将chargeback表对应的国家和金额等信息补全(补到和事务表一样5列)、union all到transactions表上,组成一个完整的事务表。在完整的事务表中按日期的年月和国家多字段分组、挨个统计需要的数据,这里又用到了sum(条件语句)。

with temp as (select  t1.trans_id as id,t2.country,'chargeback' as state,t2.amount,t1.trans_datefrom  chargebacks t1 left join transactions t2on  t1.trans_id = t2.idunion allselect  id,country,state,amount,trans_datefrom  transactionswhere  state = 'approved'
)select  left(trans_date, 7) as month,country,sum(state = 'approved') as approved_count,sum(if(state = 'approved', amount, 0)) as approved_amount,sum(state = 'chargeback') as chargeback_count,sum(if(state = 'chargeback', amount, 0)) as chargeback_amountfrom  temp
group by  month, country;

第1212题

临时temp表中将原表数据和主客交换(记得队伍id和得分都要交换)后的新表数据union all到一起,这样的效果是原本要知道所有队伍的得分情况需要host_team和guest_team两列一起看,因为一个队伍有时是主场有时是客场,但现在只需要看host_team 1列即可,方便后续group by分组。主查询中将teams表left join temp表、group by按id分组、sum(条件语句)统计得分、order by排序。

with temp as (select  host_team,guest_team,host_goals,guest_goalsfrom  matchesunion allselect  guest_team,host_team,guest_goals,host_goalsfrom  matches
)select  team_id,team_name,sum(case when host_goals > guest_goals then 3when host_goals = guest_goals then 1else 0 end) as num_pointsfrom  teams t1 left join temp t2on  t1.team_id = t2.host_team
group by  team_id
order by  num_points desc, team_id asc;

第1264题

子查询中将friendship原表和交换左右顺序的原表union all取并集、套一层查询从第二列找到user_id=1的朋友、主查询中查distinct page_id,因为有的人喜欢同一个页面、where进行筛选,条件是user_id在上面子查询的朋友中且page_id不等于1号已经喜欢的page_id(要再用个子查询得到1号喜欢的page_id)。

select  distinct page_id as recommended_pagefrom  likeswhere  user_id in (select  user2_idfrom  (select  user1_id,user2_idfrom  friendshipunion all select  user2_id,user1_idfrom  friendship) t1where  user1_id = 1)and  page_id not in (select  page_idfrom  likeswhere  user_id = 1);

第1341题

两个子查询分别查出最多点评的用户和最高平均分的电影、union all(两个子查询列数要对的上,列名先改后改都可以,并不影响union)。这里子查询需要注意:要么用窗口函数排序,主查询中选排1的数据,要么用limit 1,但是用limit 1的话要给两个子查询套括号,否则无法使用union。

   select  resultsfrom  (  select  t2.name as results,rank() over(order by count(t1.user_id) desc, t2.name asc) as rkfrom  movierating t1 inner join users t2on  t1.user_id = t2.user_idgroup by  t1.user_id
union allselect  t4.title as results,rank() over(order by avg(t3.rating) desc, t4.title asc) as rkfrom  movierating t3 inner join movies t4on  t3.movie_id = t4.movie_idand  t3.created_at between '2020-02-01' and '2020-02-29'group by  t3.movie_id) twhere  rk = 1;

第1435题

先用嵌套查询分别筛选出不同时间段的表,并count(*)算出数量,最后两两union all,得到结果表。

  select  case when duration < 300 then '[0-5>' when duration < 600 then '[5-10>'when duration < 900 then '[10-15>'else '15 or more' end as bin,count(*) as totalfrom  sessions
group by  bin;

第1501题

子查询将calls表拆成2份并union all到一起,第一份是caller_id + duration,第二份是callee_id + duration,这是因为一通电话实际上对于呼出和呼入用户的国家来说要各计算一次,即使国家相同也是一样。inner join联好人名表,国家表、group by 分组、having中用avg() > 全世界电话平均时长来筛选、这里还要用子查询计算一次全世界的电话平均时长。

  select  t3.name as countryfrom  (   select  caller_id,durationfrom  callsunion allselect  callee_id,durationfrom  calls) t1 inner join person t2on  t1.caller_id = t2.id inner join country t3on  left(t2.phone_number, 3) = t3.country_code
group by  t3.namehaving  avg(t1.duration) > (select  avg(duration) from calls);

第1555题

临时表中将paid_by列和 - amount列作为一个表,paid_to列和amount列作为一个表,两表union all,得到所有人的财产变化表、财产变化表外面套一层按user_id(之前要别名一下)分组 + sum() 求出每个人财产变化总额,临时表结束。原表left join临时表、求出所有人现在的财产(记得coalesce()处理null值)、if(再算一次财产总值) 判断是否破产。

with temp as (select  user_id,sum(change_amount) as amountfrom  (   select  paid_by as user_id,-amount as change_amountfrom  transactionsunion allselect  paid_to as user_id,amount as change_amountfrom  transactions) tgroup by  user_id    
)select  t1.user_id,t1.user_name,t1.credit + coalesce(t2.amount, 0) as credit,if(t1.credit + coalesce(t2.amount, 0) < 0, 'Yes', 'No') as credit_limit_breachedfrom  users t1 left join temp t2on  t1.user_id = t2.user_id

第1699题

方法1:子查询中用case when 给from_id和to_id交换顺序、主查询中group by 联合分组、count()和sum() 统计。

SELECT person1,person2, count(*) call_count, sum(duration) total_duration 
FROM (
SELECT IF(from_id>to_id, to_id, from_id) person1, IF(from_id>to_id,from_id,to_id) person2, duration  
FROM calls 
) c 
GROUP BY person1, person2

方法2:子查询中用union all 分别把顺序对的数据、顺序不对但交换过的数据拼一起、后续同方法1。

  select  from_id as person1,to_id as person2,count(*) as call_count,sum(duration) as total_durationfrom  (   select  from_id,to_id,durationfrom  callswhere  from_id < to_idunion allselect  to_id as from_id,from_id as to_id,durationfrom  callswhere  from_id > to_id) t
group by  from_id, to_id;

第1789题

只有1个部门的员工和有多个部门的员工分开筛选,最后union。

   select  employee_id,department_idfrom  employee group by  employee_id having  count(*) = 1unionselect  employee_id,department_idfrom  employee where  primary_flag = 'Y'group by  employee_id;

第1783题

方法1:用多个union all 将championships表的数据汇总到一列并将列名改成player_id、natural join players 表、group by 分组、count() 统计。

方法2:两张表生成笛卡尔积,此时每个球员都对应一份championships表的全部数据、group by 分组、sum(case when) 筛选统计。

第1795题

用枚举法分别select出不同商店的数据,最后再union all。

select  product_id,'store1' as store,store1 as pricefrom  productswhere  store1 is not nullunion  all
select  product_id,'store2' as store,store2 as pricefrom  productswhere  store2 is not nullunion  all
select  product_id,'store3' as store,store3 as pricefrom  productswhere  store3 is not null;

第1811题

临时表temp中将原比赛表的3列转成1列方便统计每个人的获奖情况、找连续区间用编号法,算出contest_id和row_number() 编号的差值方便后续按差值diff分组。主查询中先将得了3次金奖的人选出并union all 连续3次得奖的人、group by按照id和diff联合分组,having + count() 计数选出连续得奖3次的人。再套一层查询natural join user表得到名字和邮箱、distinct去重以免有些又连续3次获奖又得3次金奖的人被算多。

with temp as (select  contest_id,player,contest_id - row_number() over(partition by player order by contest_id) as difffrom  (   select  contest_id,gold_medal as playerfrom  contestsunion allselect  contest_id,silver_medalfrom  contestsunion allselect  contest_id,bronze_medalfrom  contests) t
)select  distinct name,mailfrom  (   select  gold_medal as user_idfrom  contestsgroup by  gold_medal having  count(*) >= 3union allselect  playerfrom  tempgroup by  player, diffhaving  count(*) >= 3) t natural join users;

第1841题

和1212题不能说非常相似,只能说一模一样。

with temp as (select  home_team_id,away_team_id,home_team_goals,away_team_goalsfrom  matchesunion all select  away_team_id,home_team_id,away_team_goals,home_team_goalsfrom  matches
)select  team_name,count(home_team_id) as matches_played,sum(case when home_team_goals > away_team_goals then 3when home_team_goals = away_team_goals then 1else 0 end) as points,sum(home_team_goals) as goal_for,sum(away_team_goals) as goal_against,sum(home_team_goals - away_team_goals) as goal_difffrom  teams t1 inner join temp t2on  t1.team_id = t2.home_team_id
group by  team_id
order by  points desc, goal_diff desc, team_name asc;

第1907题

sum(case when)计数、3种工资都统计完了之后union all。

   select  'Low Salary' as category,sum(if(income < 20000, 1, 0)) as accounts_countfrom  accounts
union allselect  'Average Salary' as category,sum(if(income between 20000 and 50000, 1, 0)) as accounts_countfrom  accounts
union allselect  'High Salary' as category,sum(if(income > 50000, 1, 0)) as accounts_countfrom  accounts;

第1949题

由于每个人的朋友散布在user1_id和user2_id两列中,所以先用union all将原表和交换两列左右顺序后的新表进行并集,以便后续找某个id的朋友时只需要看第二列即可,再用临时表将上述查询结果保存下来取名为temp以免重复写。原表t1 inner join temp表 t2,将t1.user1_id的朋友联过来、继续inner join temp表t3,将t1.user2_id和前面联的t1.user1_id的共同朋友联过来、group by分组、count(t3中的id)计数、having筛选。

with temp as (   select  user1_id,user2_id from  friendshipunion all select  user2_id,user1_idfrom  friendship)select  t1.user1_id,t1.user2_id,count(t3.user2_id) as common_friendfrom  friendship t1 inner join temp t2on  t1.user1_id = t2.user1_idinner join temp t3on  t1.user2_id = t3.user1_idand  t2.user2_id = t3.user2_id
group by  t1.user1_id, t1.user2_idhaving  common_friend >= 3;

第1965题

先union all两个表求id的并集、再group by按id分组、having + count(*)筛选出只出现过一次的id、order by。

  select  employee_idfrom  (   select  employee_idfrom  employeesunion allselect  employee_idfrom  salaries) as t
group by  employee_idhaving  count(*) = 1
order by  employee_id;

第1990题(查固定字段)

union all枚举创建固定字段的platform表和experiment_name表、cross join得到这两个表笛卡尔积、left join experiments表+group by+count(*) 计数。

  select  t1.platform,t2.experiment_name,count(t3.experiment_name) as num_experimentsfrom  (select  'Android' as platform union allselect  'IOS' as platform union allselect  'Web' as platform) t1 cross join(select  'Reading' as experiment_name union allselect  'Sports' as experiment_name union allselect  'Programming' as experiment_name) t2 left joinexperiments t3on  t1.platform = t3.platformand  t2.experiment_name = t3.experiment_name
group by  platform, experiment_name;

5. 子查询/嵌套查询

注意:由于差集题肯定涉及子查询,所以写进差集分类的题不再写进子查询分类。

第176题

方法1:子查询中distinct 去重、order by 排序、limit + offset 输出第二位、主查询再查一次把空值变成null。

SELECT(SELECT DISTINCTSalaryFROMEmployeeORDER BY Salary DESCLIMIT 1 OFFSET 1) AS SecondHighestSalary;

方法2:子查询中dense_rank() 排序、主查询中where 筛选、sum()或者max()或者min()聚合函数把空值变成null。

select  max(salary) as SecondHighestSalaryfrom  (select  salary,dense_rank() over(order by salary desc) as rk from  employee) twhere  rk = 2;

第177题

类似第176题。用方法1要先set n = n - 1。

CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGINset n = n - 1;RETURN (# Write your MySQL query statement below.select  salaryfrom  (  select  distinct salary from  employeeorder by  salary desc) tlimit  1 offset n);
END

第597题

方法一:先distinct + 多字段得到去重后的临时表,再count()对临时表中的数据计数,无法直接count(distinct + 多字段)。

方法二:为了解决无法直接count(distinct + 多字段)的问题可以把多字段用concat()拼成单字段。

select  round(coalesce((t1.acc/t2.req), 0), 2) as accept_ratefrom  (select  count(distinct concat(requester_id, accepter_id)) as acc,'yyy' as tagfrom  RequestAccepted) t1 inner join(select  count(distinct concat(sender_id, send_to_id)) as req,'yyy' as tagfrom  FriendRequest) t2 on  t1.tag = t2.tag;

第1076题

方法1:子查询中group by按照id分组、count(*)计算人数、主查询中再用group by分组、having筛选、count( * )>=all(子查询中人数)。

select  project_idfrom  ( select  project_id,count(project_id) as numfrom  projectgroup by  project_id) twhere  num >= all(  select  count(project_id)from  projectgroup by  project_id)

方法2:子查询中group by按照id分组、用rank()函数按照count(*)排序(这里比较重要的是rank里是可以用聚合函数的)、主查询中where筛选排序第1名的(rank和dense_rank都可以排出并列排名)。

select  project_idfrom  (  select  project_id,rank() over(order by count(*) desc) as numfrom  projectgroup by  project_id) twhere  num = 1;

第1082题

子查询+group by建立临时表得到每人总销售额、主表用having筛选出sum()总销售额>=all临时表里销售额的销售者id。

  select  seller_idfrom  sales
group by  seller_idhaving  sum(price) >= all(  select  sum(price)from  salesgroup by  seller_id);

第1107题

子查询中按user_id分组得到每个人首次登陆日期、主查询中在子查询的基础上继续筛选出日期在90天内的数据并按日期分组。

  select  login_date,count(user_id) as user_countfrom  (  select  user_id,min(activity_date) as login_datefrom  trafficwhere  activity = 'login'group by  user_idhaving  datediff('2019-06-30', login_date) <= 90) t  
group by  login_date;

第1112题

方法1:子查询中查出max()分数、inner join联上原表,条件是id一样、grade一样,则不是最高分的没联上就筛掉了、分组并输出数据(因为有并列第一,所以course_id要输出min()最小值)。

  select  t1.student_id,min(t1.course_id) as course_id,t1.gradefrom  enrollments t1 inner join (  select  student_id,max(grade) as bestfrom  enrollmentsgroup by  student_id) t2on  t1.student_id = t2.student_idand  t1.grade = t2.best
group by  student_id
order by  student_id;

方法2:子查询中rank()窗口函数排序、主查询中筛选rank = 1。

第1126题

子查询中group by分组,avg()聚合求出每种event的平均数、主查询中原表natural join子查询、group by按business_id分组、having + sum(case when)统计并筛选。

  select  business_idfrom  events t1 natural join (  select  event_type,avg(occurences) as estdfrom  eventsgroup by  event_type) t2
group by  business_idhaving  sum(case when occurences > estd then 1 else 0 end) >= 2;

第1174题

子查询查出用户首次购买数据、主查询统计order_date = pref_date的数据。这里注意一个小技巧:按条件计数,是即时订单+1,计划订单+0,可以用sum(case when),也可以直接写sum(条件),此时条件满足时自动把布尔值转换成1计算,不满足转换成0计算。

select  round(100 * sum(order_date = customer_pref_delivery_date) / count(*), 2) as immediate_percentagefrom  delivery where  (customer_id, order_date) in (  select  customer_id,min(order_date) as first from  deliverygroup by  customer_id);

第1241题

子查询+where筛选查出两个表(帖子表、评论表)、帖子表left join评论表、count计数。

第1303题

子查询出每个团队人数,再join到原表上。

select  employee_id,team_sizefrom  employee t1 inner join (  select  team_id,count(*) as team_sizefrom  employeegroup by  team_id) t2on  t1.team_id = t2.team_id;

第1468题

子查询中分组max()聚合查出每个公司的最高工资、主查询中原表natural join子表、case when分条件计算税后工资。

select  company_id,employee_id,employee_name,round(case when cstd < 1000 then salarywhen cstd between 1000 and 10000 then salary * 0.76else salary * 0.51 end, 0) as salaryfrom  salaries t1 natural join (  select  company_id,max(salary) as cstdfrom  salariesgroup by  company_id) t2;

第1821题

子查询筛选出对应年份数据、group by 单字段分组、having筛选出正收入数据。

  select  customer_idfrom  (select  customer_id,revenue from  customerswhere  `year` = '2021') t2
group by  customer_idhaving  sum(revenue) > 0;

第1831题

方法1:子查询中rank() over() 排出每天amount的名次、主查询中得到排第1名的数据。

方法2:子查询中group by 分组、max() 求最大值、inner join 到原表上、筛选出原表中日期等于子查询日期且amount = 当日最大值的数据。

  select  transaction_idfrom  transactions t1 inner join(  select  date(day) as day,max(amount) as firstfrom  transactionsgroup by  date(day)) t2on  date(t1.day) = t2.dayand  t1.amount = t2.first
order by  transaction_id;

第1867题

子查询中查出每个order_id分组下quantity的平均值、主查询中再次group by 分组并用having 筛选出max(quantity) > all子查询平均值的数据。

第2112题

孙查询中union all把出发机场和到达机场列联结起来、子查询中分组聚合求飞机数并按飞机数用rank() 窗口函数排序、父查询中筛选出排第1的。

select  airport_idfrom  (  select  airport_id,rank() over(order by sum(flights_count) desc) as numfrom  (   select  departure_airport as airport_id,flights_countfrom  flights union all  select  arrival_airport as airport_id,flights_count from  flights) t1group by  airport_id) t2where  num = 1;

第2142题

子查询中将乘客表和公交车表内联结、筛选出乘客能坐的所有车(即后于乘客到达的车)、group by按乘客id分组、min()选出时间最早的车(即乘客真正坐的车)。主查询中公交车表 left join 子查询、group by按公交车id分组、count()计数、order by排序。

  select  t3.bus_id,count(t.tm) as passengers_cntfrom  buses t3 left join (  select  min(t2.arrival_time) as tmfrom  passengers t1 inner join buses t2on  t1.arrival_time <= t2.arrival_timegroup by  passenger_id) ton  t3.arrival_time = t.tm
group by  t3.bus_id
order by  bus_id asc;

6. 字符串相关函数

功能函数
字母大写upper(字符串)
字母小写lower(字符串)
字符串切割(切左n个字符)left(字符串, n)
字符串切割(切右n个字符)right(字符串, n)
字符串切割(从第n个开始切)substring(字符串, n)
字符串拼接concat(字符串1, 字符串2, …)
将字符串用指定字符连接concat_ws(‘连接符号’, 字符串1, 字符串2, …)
统计字符串长度char_length(字符串)
去除字符串左右空格trim(字符串)
重复目标字符串n次repeat(字符串, n)

第1543题

trim()去空格、lower()改小写、date_format() 或者left() 取年月、group by 联合分组、count()计数。

  select  lower(trim(product_name)) as product_name,left(sale_date, 7) as sale_date,count(*) as totalfrom  sales
group by  lower(trim(product_name)), left(sale_date, 7)
order by  product_name, sale_date;

第1565题

left()取日期中的年和月、group by分组、count()统计

  select  left(order_date, 7) as month,count(*) as order_count,count(distinct customer_id) as customer_countfrom  orderswhere  invoice > 20
group by  month;

第1667题

left() 取首字母,配合upper() 将字母大写,substring() 取后面,配合lower() 将字母小写,最后concat()组合。

  select  user_id,concat(upper(left(name, 1)), lower(substring(name, 2))) as namefrom  users
order by  user_id;

第1683题

char_length()统计字符串长度、where筛选。

select  tweet_idfrom  tweetswhere  char_length(content) > 15;

7. 时间相关函数

功能函数
从日期中提取年份数值year(日期)
从日期中提取月份数值month(日期)
从日期中提取日数值day(日期)
日期格式化date_format(日期, ‘占位符’)
计算两个日期的天数差datediff(大日期, 小日期)
日期差值(单位:需要指定)timestampdiff(输出单位, 小日期, 大日期)
给日期加上n个单位的时间(如1 day)date_add(日期, interval n type)
给日期减去n个单位的时间(如1 day)date_sub(日期, interval n type)
得到对应的时间戳unix_timestamp(‘yyyy-MM-dd’)
给日期加上一定的月份period_add(日期, n)
计算两个日期的月份差period_diff(日期1, 日期2)
计算星期几(范围0-6,周一开始算)weekday(日期)
计算星期几(范围1-7,周日开始算)dayofweek(日期)

注意:period函数的日期参数必须是’YYYYMM’或’YYMM’格式(可以用date_format转格式),且period_add函数输出的是’YYYYMM’格式。这个函数专门用来计算月份差,尤其是对于如2020年12月和2021年1月这种跨年的两个时间点能方便地计算出相差1个月。

附录:date_format的所有占位符

%W 星期名字(Sunday……Saturday)  
%D 有英语前缀的月份的日期(1st, 2nd, 3rd, 等等。)  
%Y 年, 数字, 4%y 年, 数字, 2%a 缩写的星期名字(Sun……Sat)  
%d 月份中的天数, 数字(00……31)  
%e 月份中的天数, 数字(0……31)  
%m 月, 数字(01……12)  
%c 月, 数字(1……12)  
%b 缩写的月份名字(Jan……Dec)  
%j 一年中的天数(001……366)  
%H 小时(00……23)  
%k 小时(0……23)  
%h 小时(01……12)  
%I 小时(01……12)  
%l 小时(1……12)  
%i 分钟, 数字(00……59)  
%r 时间,12 小时(hh:mm:ss [AP]M)  
%T 时间,24 小时(hh:mm:ss)  
%S 秒(00……59)  
%s 秒(00……59)  
%p AM或PM  
%w 一个星期中的天数(0=Sunday ……6=Saturday )  
%U 星期(0……52), 这里星期天是星期的第一天  
%u 星期(0……52), 这里星期一是星期的第一天  
%% 一个文字“%

第1141题

where + datediff 筛选、group by 分组、count(distinct) 计数。

  select  activity_date as day,count(distinct user_id) as active_usersfrom  activitywhere  datediff('2019-07-27', activity_date) < 30 and datediff('2019-07-27', activity_date) >= 0
group by  activity_date;

第1142题

方法1:子查询里where + datediff() 筛选、group by 分组、count() 计数、主查询里avg() 算平均值。

select  round(coalesce(avg(num), 0), 2) as average_sessions_per_userfrom  (  select  count(distinct session_id) as numfrom  activitywhere  activity_date between '2019-06-28' and '2019-07-27'group by  user_id) t;

方法2:where + datediff() 筛选、直接count(distinct session_id) / count(distinct user_id) 算平均值。

SELECT IFNULL(ROUND(COUNT(DISTINCT session_id) / COUNT(DISTINCT user_id), 2), 0) AS average_sessions_per_user
FROM Activity
WHERE DATEDIFF('2019-07-27', activity_date) < 30

第1454题(连续区间)

临时表对原表按照id和login_date联合去重,防止1个人1天登录多次。子查询中row_number()窗口函数编号准备找连续区间,但是这里有个问题,即原表是没有编号的,只能用日期减,即再窗口函数外面再套一个date_sub() 函数,给差别名为diff。主查询里将子查询right join到accounts表中(注意这本来inner join更好,采取right join的原因是之后就不用排序了)、按id和diff联合分组、having + count(*) 筛选连续天数大于等于5天的人、最后select时要去重,因为有的人这段时间连续登陆5天以上,后面哪段时间又连续登陆5天以上。

with temp as (select  distinct id,login_datefrom  logins
)select  distinct t1.id,t2.namefrom  (select  id,date_add(login_date, interval -row_number() over(partition by id order by login_date) day) as difffrom  temp) t1 right join accounts t2on  t1.id = t2.id
group by  t1.id, diffhaving  count(*) >= 5;

第1479题

weekday取星期数,case when判断,sum()求和。

  select  item_category as category,sum(case when weekday(order_date) = 0 then quantity else 0 end) as monday,sum(case when weekday(order_date) = 1 then quantity else 0 end) as Tuesday,sum(case when weekday(order_date) = 2 then quantity else 0 end) as Wednesday,sum(case when weekday(order_date) = 3 then quantity else 0 end) as Thursday,sum(case when weekday(order_date) = 4 then quantity else 0 end) as Friday,sum(case when weekday(order_date) = 5 then quantity else 0 end) as Saturday,sum(case when weekday(order_date) = 6 then quantity else 0 end) as Sundayfrom  items t1 left join orders t2on  t1.item_id = t2.item_id
group by  item_category
order by  item_category;

第1843题

临时表temp中将transactions表和accounts表inner join,on id相等、where筛选出所有‘creditor’数据、group by 按账户id和年月(需要用date_format取)联合分组、sum() 计算每人每月总收入、having 筛选出总收入 > 最大收入的数据。主查询中将两个temp表inner join自连接、on id相等且月份差1,求月份差用period_diff()、这样留下的人都是可疑的,只不过有的人可能连续多个月可疑,就会被选出多次,所以还要来个distinct。

with temp as (select  t1.account_id,sum(t1.amount) as all_amount,t2.max_income,date_format(t1.day, "%Y%m") as dfrom  transactions t1 inner join accounts t2on  t1.account_id = t2.account_idwhere  type = 'Creditor'group by  t1.account_id, dhaving  all_amount > max_income
)select  distinct t1.account_idfrom  temp t1 inner join temp t2on  t1.account_id = t2.account_idand  period_diff(t2.d, t1.d) = 1;

第1853题

date_format格式化日期。

select  concat_ws(', ', dayname(day), concat(monthname(day), ' ', day(day)), year(day)) as dayfrom  days;

第1890题

group by 单字段分组、year()取时间年份、max()。

  select  user_id,max(time_stamp) as last_stampfrom  loginswhere  year(time_stamp) = '2020'
group by  user_id;

第1939题

两个confirmations表inner join自联结、on或者where后面筛选出user_id相等且time_stamp相差1天的数据、可使用timestampdiff()、date_add()、unix_timestamp转换日期格式再相减等。

select  distinct t1.user_idfrom  confirmations t1 inner join confirmations t2on  t1.user_id = t2.user_idand  t1.time_stamp < t2.time_stampand  unix_timestamp(t2.time_stamp) - unix_timestamp(t1.time_stamp) <= 24*60*60;

第2298题

weekday()从日期中取得星期几、sum(case when)分类统计、when里面写法是weekday(日期) in (元组),而不是一个个去枚举week(day)=0、=1、=2的情况。

select  sum(case when weekday(submit_date) in (5, 6) then 1 else 0 end) as weekend_cnt,sum(case when weekday(submit_date) in (5, 6) then 0 else 1 end) as working_cntfrom  tasks;

8. group by 分组

功能用法
单字段分组group by 字段名(即列名)
多字段联合分组group by 字段名1, 字段名2, …

注意:

(1)单字段分组是把这个字段相同的数据全部归为一组,即聚合成1排。多字段联合分组是把这些字段都相同的数据归为一组。

(2)分组优先度仅次于联结、where筛选。

(3)分组的执行方式讲解:例表学生表:

姓名性别成绩
张三90
李四80
王五70

group by 性别 -> 性别列只剩下男、女

-> 女姓名列 -> 只有李四,如果要输出肯定也只输出李四

-> 男姓名列 -> 同时有张三和李四两排数据 -> 如果不聚合,则输出男组的第一排即张三

-> 女成绩列 -> 只有个80分,如果要输出肯定也只输出80

-> 男成绩列 -> 同时有90和70两排数据 -> 使用avg()平均数函数聚合,则输出80

总结:分组列不用管,其他列如果不聚合就默认输出它那组的第一排,如果聚合了就把组内目标列的数据进行聚合。当然,我们进行分组操作肯定是为了进行聚合,所以一般不聚合的列根本都不会输出。

这里补充一点:如果不分组直接聚合,那么被聚合的列理所应当输出聚合后的结果,但是没聚合的列则输出第一排数据,而不是和聚合列配对的同一排的数据。例如还是上面的表,想查询最低分,直接select 姓名, min(成绩),得到的是张三, 70,结果完全错误。


单字段分组

第511题

group by 分组、min() 求最小日期。

  select  player_id, min(event_date) as first_loginfrom  Activity
group by  player_id;

第586题

方法1:group by 分组、order by + count(*)排序、limit限制只输出第一名。(这个方法不能有并列第一,当然本题明说了没有并列第一)

  select  customer_numberfrom  orders
group by  customer_number
order by  count(*) desclimit  1;

方法2:group by 分组、having筛选出count(*) >= all(子查询出每个customer的count( *)) 。

  select  customer_number from  Orders
group by  customer_numberhaving  count(*) >= all(  select  count(order_number)from  ordersgroup by customer_number);

第619题

子查询中group by 分组、count()计数、having筛选、主查询中max()找最大值。

select  max(num) as numfrom  (  select  num, count(*) as cishufrom  mynumbersgroup by  numhaving  cishu = 1) t2;

第1069题

group by 分组、sum()求和。

  select  product_id,sum(quantity) as total_quantityfrom  sales
group by  product_id;

第1075题

两表inner join 联结、group by 分组、avg() 计算平均年龄、round() 四舍五入。

  select  project_id,round(avg(experience_years), 2) as average_yearsfrom  project t1 inner join employee t2on  t1.employee_id = t2.employee_id
group by  project_id;

第1194题

先将matches表的两个选手列和得分列拆成两张表,上下union all、套一层查询inner join联上players表、再按player_id分组、sum()求每人得的总分、order by按题目要求排好序、再套一层查询使用group by按group_id分组、由于group by的特性,如果不聚合,输出的就是每组的第一排数据,即刚才排好序的每组内得分最高的player。

  select  group_id,player_idfrom  (  select  t2.group_id,t2.player_id,sum(score) as scorefrom  (   select  first_player as player,first_score as score from  matchesunion allselect  second_player,second_scorefrom  matches) t1 inner join players t2on  t1.player = t2.player_idgroup by  playerorder by  score desc, player_id asc) t 
group by  group_id;

第1211题

group by 分组、avg()求平均数、sum()和case when配合统计目标数据的数量。

  select  query_name,round(avg(rating / position), 2) as quality,round(100 * sum(case when rating < 3 then 1 else 0 end) / count(*), 2) as poor_query_percentagefrom  queries
group by  query_name;

第1322题

group by 分组、sum(条件)(条件成立时为true,求和时按1算,否则按0算)统计个数。

  select  ad_id,coalesce(round(100 * sum(action='Clicked') / sum(action<>'Ignored'), 2), 0) as ctrfrom  ads
group by  ad_id
order by  ctr desc,ad_id asc;

第1327题

left join 两表联结将表id和表名对应起来、where去除不满足日期条件的数据、group by 分组、sum计算总下单数量、最后having筛选。

  select  product_name,sum(unit) as unitfrom  orders t1 left join products t2on  t1.product_id = t2.product_idwhere  left(order_date, 7) = '2020-02'
group by  t1.product_idhaving  unit >= 100;

第1484题

group by 分组、count(distinct) 计数、group_concat 字符串拼接。

select  sell_date,count(distinct product) as num_sold,group_concat(distinct product order by product asc separator ',') as productsfrom  activities
group by  sell_date;

第1571题

inner join 两表联结、group by 分组、sum()求和。

  select  name as WAREHOUSE_NAME,sum(units * v_pro) as VOLUMEfrom  warehouse t1 inner join (select  product_id,width * length * height as v_profrom  products) t2on  t1.product_id = t2.product_id
group by  t1.name;

第1633题

子查询求用户总数、group by 分组、count()/总数求百分比。

  select  contest_id,round(100 * count(*) / n, 2) as percentagefrom  register, (select  count(*) as n from users) t
group by  contest_id
order by  percentage desc, contest_id asc;

第1729题

group by 分组、count()计数。

  select  user_id,count(*) as followers_countfrom  followers
group by  user_id
order by  user_id;

第1821题

子查询筛选出对应年份数据、group by 分组、having筛选出正收入数据。

第1890题

group by 分组、year()取时间年份、max()。

  select  user_id,max(time_stamp) as last_stampfrom  loginswhere  year(time_stamp) = '2020'
group by  user_id;

多字段联合分组

group by 列1, 列2, 列3, …。

第1050题

group by 联合分组、having + count(*) 筛选。

  select  actor_id,director_idfrom  actordirector
group by  actor_id, director_idhaving  count(*) >= 3;

第1149题

子查询中先将表去重一次,排除看同一本书的情况。主查询中按viewer_id和view_date 联合分组、having+ count() 筛选出看了2篇及以上文章的人、最后distinct去重是排除同一个人多天读多本书的情况~~(太学霸了)~~。

  select  distinct viewer_id as id  from  (select  distinct article_id,viewer_id,view_datefrom  views) t
group by  viewer_id, view_datehaving  count(*) >= 2
order by  id;

第1193题

group by 按country和trans_date的年月部分联合分组、count() + sum(if) 聚合需要的数据。

  select  date_format(trans_date, '%Y-%m') as month,country,count(*) as trans_count,sum(state = 'approved') as approved_count,sum(amount) as trans_total_amount,sum(if(state='approved', amount, 0)) as approved_total_amountfrom  transactions
group by  country, month;

第1543题

trim()去空格、lower()改小写、date_format() 或者left() 取年月、group by 联合分组、count()计数。

  select  lower(trim(product_name)) as product_name,left(sale_date, 7) as sale_date,count(*) as totalfrom  sales
group by  lower(trim(product_name)), left(sale_date, 7)
order by  product_name, sale_date;

第1693题

group by 多字段联合分组、distinct去重。

  select  date_id,make_name,count(distinct lead_id) as unique_leads,count(distinct partner_id) as unique_partnersfrom  DailySales
group by  date_id, make_name;

第1699题

方法1:子查询中用case when 给from_id和to_id交换顺序、主查询中group by 联合分组、count()和sum() 统计。

SELECT person1,person2, count(*) call_count, sum(duration) total_duration 
FROM (
SELECT IF(from_id>to_id, to_id, from_id) person1, IF(from_id>to_id,from_id,to_id) person2, duration  
FROM calls 
) c 
GROUP BY person1, person2

方法2:子查询中用union all 分别把顺序对的数据、顺序不对但交换过的数据拼一起、后续同方法1。

  select  from_id as person1,to_id as person2,count(*) as call_count,sum(duration) as total_durationfrom  (   select  from_id,to_id,durationfrom  callswhere  from_id < to_idunion allselect  to_id as from_id,from_id as to_id,durationfrom  callswhere  from_id > to_id) t
group by  from_id, to_id;

第1741题

group by 多字段联合分组、sum()。

  select  event_day as day,emp_id,sum(out_time) - sum(in_time) as total_timefrom  employees
group by  event_day, emp_id;

第1951题

子查询中2个ralations表自连接让拥有相同follower的用户处在同行方便之后计数、联结筛选条件是follower_相等且联上来的t2表user_id > 原来t1表的user_id(不然1,7和7,1会算成两种情况)、group by联合分组、rank() 窗口函数以count(*)排序编号、主查询中筛选出排1名的。

select  user1_id,user2_idfrom  (  select  t1.user_id as user1_id,t2.user_id as user2_id,rank() over(order by count(*) desc) as rankingfrom  relations t1 inner join relations t2on  t1.follower_id = t2.follower_idand  t1.user_id < t2.user_idgroup by  t1.user_id, t2.user_id) twhere  ranking = 1;

9. having 筛选

注意:从使用情景上来说,where 是group by 分组前用,having 是group by 分组后用。从用法上来说,where 后不能跟聚合函数,因为执行where 语句时都还没分组。having 后可以跟聚合函数,因为此时组已经分好了。


第182题

group by 分组、having count()筛选。

  select  Emailfrom  Person
group by  Emailhaving  count(Email) > 1;

第586题

方法1:group by 分组、order by + count(*)排序、limit限制只输出第一名。(这个方法不能有并列第一,当然本题明说了没有并列第一)

方法2:group by 分组、having筛选出count(*) >= all(子查询出每个customer的count( *)) 。

第596题

group by 分组、having + count(*) 筛选。

  select  classfrom  courses
group by  classhaving  count(*) >= 5;

第1050题

group by 联合分组、having + count(*) 筛选。

  select  actor_id,director_idfrom  actordirector
group by  actor_id, director_idhaving  count(*) >= 3;

第1076题

方法1:子查询中group by按照id分组、count(*)计算人数、主查询中再用group by分组、having筛选、count( * )>=all(子查询中人数)。

select  project_idfrom  ( select  project_id,count(project_id) as numfrom  projectgroup by  project_id) twhere  num >= all(  select  count(project_id)from  projectgroup by  project_id)

方法2:子查询中group by按照id分组、用rank()函数按照count(*)排序(这里比较重要的是rank里是可以用聚合函数的)、主查询中where筛选排序第1名的(rank和dense_rank都可以排出并列排名)。

select  project_idfrom  (  select  project_id,rank() over(order by count(*) desc) as numfrom  projectgroup by  project_id) twhere  num = 1;

第1501题

子查询将calls表拆成2份并union all到一起,第一份是caller_id + duration,第二份是callee_id + duration,这是因为一通电话实际上对于呼出和呼入用户的国家来说要各计算一次,即使国家相同也是一样。inner join联好人名表,国家表、group by 分组、having中用avg() > 全世界电话平均时长来筛选、这里还要用子查询计算一次全世界的电话平均时长。

  select  t3.name as countryfrom  (   select  caller_id,durationfrom  callsunion allselect  callee_id,durationfrom  calls) t1 inner join person t2on  t1.caller_id = t2.id inner join country t3on  left(t2.phone_number, 3) = t3.country_code
group by  t3.namehaving  avg(t1.duration) > (select  avg(duration) from calls);

第1511题

inner join联结3表、group by分组、having和case when配合筛选。

  select  t3.customer_id,t3.namefrom  orders t1 inner join product t2 inner join customers t3on  t1.product_id = t2.product_idand  t1.customer_id = t3.customer_id
group by  t1.customer_idhaving  sum(case when left(order_date, 7) = '2020-06' then t1.quantity * t2.price end) >= 100and  sum(case when left(order_date, 7) = '2020-07' then t1.quantity * t2.price end) >= 100;

第1587题

inner join 两表联结、group by 分组、sum()求和、having 筛选。

第1867题

子查询中查出每个order_id分组下quantity的平均值、主查询中再次group by 分组并用having 筛选出max(quantity) > all子查询平均值的数据。

  select  order_idfrom  ordersdetails
group by  order_idhaving  max(quantity) > all(  select  avg(quantity)from  ordersdetailsgroup by  order_id);

第2041题

natural join 联结两表、group by 按面试id分组、where 筛选工作经验2年以上、having + sum() 筛选总分>15。

  select  candidate_idfrom  rounds t1 natural join candidateswhere  years_of_exp >= 2
group by  interview_idhaving  sum(score) > 15;

10. if/case 语句

功能用法
如果满足条件则输出A,否则输出Bif(条件, A, B)
如果目标列下的值等于目标值则输出A,否则输出Bcase 目标列 when 目标值 then A else B end
如果满足条件1则输出A,满足条件2则输出B,否则Ccase when 条件1 then A
when 条件2 then B
else C end

注意:由于if和case对于只有1个条件的情景用法是一样的,为了统一起见本文档只用case when,实际情况根据个人习惯而定。


简单的case语句(即只能判断是否相等)

第578题

group by 分组、order by 里计算回答率(使用sum(条件语句))、limit 1。

  select  question_id as survey_logfrom  surveylog 
group by  question_id
order by  sum(action='answer') / sum(action='show') desc, question_id asclimit  1;

第610题

case when。

select  x,y,z,if(y + z > x and y + x > z and x + z > y, 'Yes', 'No') as trianglefrom  triangle;

第627题

update、case when。

update salary set sex = case sex when 'm' then 'f' else 'm' end;  

第1083题

方法一:分别where筛选出买S8和iphone的人,再not in 找差集。

select  distinct buyer_idfrom  sales t1 left join product t2on  t1.product_id = t2.product_idwhere  buyer_id not in (select  buyer_idfrom  sales t1 left join product t2on  t1.product_id = t2.product_idwhere  product_name = 'iPhone')and  product_name = 'S8';

方法二:用sum(case when)的方法绕开子查询达到找差集目的,具体做法是求和时遇到“S8”就+1,遇到“iphone”就-1,只买“S8”的人的求和结果就是1,都买了的人就是0。

  select  buyer_idfrom  sales t1 left join product t2on  t1.product_id = t2.product_idwhere  product_name in ('S8', 'iPhone')
group by  buyer_idhaving  sum(distinct case product_name when 'S8' then 1 else -1 end) = 1;

第1126题

子查询中group by分组,avg()聚合求出每种event的平均数、主查询中原表natural join子查询、group by按business_id分组、having + sum(case when)统计并筛选。

  select  business_idfrom  events t1 natural join (  select  event_type,avg(occurences) as estdfrom  eventsgroup by  event_type) t2
group by  business_idhaving  sum(case when occurences > estd then 1 else 0 end) >= 2;

第1173题

sum()配合case语句只统计目标数量。还可以直接sum(条件),当条件为true时+1,条件为false时不加。

select  round(100 * sum(order_date = customer_pref_delivery_date) / count(*), 2) as immediate_percentagefrom  delivery;

第1179题(行转列/长转宽)

group by 分组、sum + case when 统计数据(每个月都要写一遍,写12遍)。

select  id,sum(case `month` when 'Jan' then revenue else null end) as Jan_Revenue,sum(case `month` when 'Feb' then revenue else null end) as Feb_Revenue,sum(case `month` when 'Mar' then revenue else null end) as Mar_Revenue,sum(case `month` when 'Apr' then revenue else null end) as Apr_Revenue,sum(case `month` when 'May' then revenue else null end) as May_Revenue,sum(case `month` when 'Jun' then revenue else null end) as Jun_Revenue,sum(case `month` when 'Jul' then revenue else null end) as Jul_Revenue,sum(case `month` when 'Aug' then revenue else null end) as Aug_Revenue,sum(case `month` when 'Sep' then revenue else null end) as Sep_Revenue,sum(case `month` when 'Oct' then revenue else null end) as Oct_Revenue,sum(case `month` when 'Nov' then revenue else null end) as Nov_Revenue,sum(case `month` when 'Dec' then revenue else null end) as Dec_Revenuefrom  department
group by  id;

第1211题

group by 单字段分组、avg()求平均数、sum()和case when配合统计目标数据的数量。

  select  query_name,round(avg(rating / position), 2) as quality,round(100 * sum(case when rating < 3 then 1 else 0 end) / count(*), 2) as poor_query_percentagefrom  queries
group by  query_name;

第1393题

case给买进金额加负号、sum()求和。

  select  stock_name,sum(case operation when 'Buy' then -price else price end) as capital_gain_lossfrom  stocks
group by  stock_name;

第1445题

group by 分组、sum(case when) 黄金组合做同列减法、order by 排序。

  select  sale_date,sum(case fruit when 'oranges' then -sold_num else sold_num end) as difffrom  sales
group by  sale_date
order by  sale_date;

第1479题

weekday取星期数,case when判断,sum()求和。

  select  item_category as category,sum(case when weekday(order_date) = 0 then quantity else 0 end) as monday,sum(case when weekday(order_date) = 1 then quantity else 0 end) as Tuesday,sum(case when weekday(order_date) = 2 then quantity else 0 end) as Wednesday,sum(case when weekday(order_date) = 3 then quantity else 0 end) as Thursday,sum(case when weekday(order_date) = 4 then quantity else 0 end) as Friday,sum(case when weekday(order_date) = 5 then quantity else 0 end) as Saturday,sum(case when weekday(order_date) = 6 then quantity else 0 end) as Sundayfrom  items t1 left join orders t2on  t1.item_id = t2.item_id
group by  item_category
order by  item_category;

第1511题

inner join联结3表、group by分组、having和case when配合筛选。

  select  t3.customer_id,t3.namefrom  orders t1 inner join product t2 inner join customers t3on  t1.product_id = t2.product_idand  t1.customer_id = t3.customer_id
group by  t1.customer_idhaving  sum(case when left(order_date, 7) = '2020-06' then t1.quantity * t2.price end) >= 100and  sum(case when left(order_date, 7) = '2020-07' then t1.quantity * t2.price end) >= 100;

第1661题

group by 分组、sum(case when) 黄金组合做同列减法求出总运行时间再除以count()进程数。

  select  machine_id,round(2 * sum(case when activity_type = 'start' then -timestamp else timestamp end) / count(process_id), 3) as processing_timefrom  activity 
group by  machine_id

第1699题

方法1:子查询中用case when 给from_id和to_id交换顺序、主查询中group by 联合分组、count()和sum() 统计。

SELECT person1,person2, count(*) call_count, sum(duration) total_duration 
FROM (
SELECT IF(from_id>to_id, to_id, from_id) person1, IF(from_id>to_id,from_id,to_id) person2, duration  
FROM calls 
) c 
GROUP BY person1, person2

方法2:子查询中用union all 分别把顺序对的数据、顺序不对但交换过的数据拼一起、后续同方法1。

  select  from_id as person1,to_id as person2,count(*) as call_count,sum(duration) as total_durationfrom  (   select  from_id,to_id,durationfrom  callswhere  from_id < to_idunion allselect  to_id as from_id,from_id as to_id,durationfrom  callswhere  from_id > to_id) t
group by  from_id, to_id;

第1777题(行转列/长转宽)

group by 分组、聚合函数(sum()和max()都可以)+ case when 统合数据。

  select  product_id,sum(if(store='store1', price, null)) as store1,sum(if(store='store2', price, null)) as store2,sum(if(store='store3', price, null)) as store3from  products
group by  product_id;

第1783题(列转行/宽转长)

方法1:用多个union all 将championships表的数据汇总到一列并将列名改成player_id、natural join players 表、group by 分组、count() 统计。

  select  player_id,player_name,count(*) as grand_slams_countfrom  (   select  wimbledon as player_idfrom  championshipsunion allselect  fr_open as player_idfrom  championships t1union allselect  US_open as player_idfrom  championshipsunion allselect  Au_open as player_idfrom  championships) t1 natural join players t2
group by  player_id;

方法2:两张表生成笛卡尔积,此时每个球员都对应一份championships表的全部数据、group by 分组、sum(case when) 筛选统计。

SELECT player_id,player_name,      SUM(IF(player_id=Wimbledon,1,0)+IF(player_id=Fr_open,1,0)+IF(player_id=US_open,1,0)+IF(player_id=Au_open,1,0)) AS grand_slams_count
FROM Championships 
JOIN Players 
GROUP BY player_id
HAVING grand_slams_count>0

第1934题

left join 左外联结、group by 分组、avg(case when) 把’confirmed’转成1,其他转成0然后求平均数。

  select  t1.user_id,round(avg(case action when 'confirmed' then 1 else 0 end), 2) as confirmation_ratefrom  signups t1 left join confirmations t2on  t1.user_id = t2.user_id
group by  t1.user_id;

第2308题

仔细审题可以发现,每种性别内部的user_id都是排好序的,即输出表中第一次出现的female、other、male的id一定是他们这个性别按id排序排第1的id。可以看做按性别分区后按id编号,以这个编号排序的结果。如果编号相同就按照female、other、male排序。所以思路是先rank()窗口函数按性别分区(partition by)、编号,然后按题目要求的性别顺序打上1、2、3三种tag。最后在主查询中order by排序。

  select  user_id,genderfrom  (select  user_id,gender,rank() over(partition by gender order by user_id) as rk,case gender when 'female' then 1when 'other' then 2else 3 end as tagfrom  genders) t
order by  rk, tag;

可搜索的case语句

第608题

case when + 多条件、去null值。

select  id,case when p_id is null then 'Root' wen id not in (  select  p_id from  treewhere  p_id is not null) then 'Leaf'else 'Inner' end as `type`from  tree;

第1294题

left join 两表联结、where筛选日期、group by分组、case when和avg计算平均天气并返回结果。

select  country_name,case when avg(weather_state) <= 15 then 'Cold' when avg(weather_state) >= 25 then 'Hot'else 'Warm' end as weather_typefrom  weather t1 left join countries t2on  t1.country_id = t2.country_id where  left(t1.day, 7) = '2019-11'  
group by  t1.country_id;

第1398题

natural join 两表联结、group by 分组、having + sum(distinct case when) 统计并筛选出只买了A和B却没有买C的用户。

具体做法是:买A的用case when标记成1,B的2,C的10,其他0,这样求和时只买了A和B却没买C的人一定是3,由于加了distinct,所以不会出现买多个A或B的人被重复计算的情况。

  select  customer_id,customer_namefrom  customers t1 natural join orders t2
group by  customer_idhaving  sum(distinct case when product_name = 'A' then 1 when product_name = 'B' then 2 when product_name = 'C' then 10 else 0 end) = 3;

第1440题

SQL里没有eval(),只有用case when。

inner join 把variables表联到expressions表上,联两次,一次是为了确定不等式左边,一次是右边、case when分别写出operator是>、<、=的三种情况、注意上一步可能计算不等式出来的结果是0和1,所以还要套一个case when或者if转成true和false。

select  left_operand,operator,right_operand,if(case operator when '>' then t2.value > t3.value when '<' then t2.value < t3.valueelse t2.value = t3.value end, 'true', 'false') as valuefrom  expressions t1 inner join variables t2on  t1.left_operand = t2.nameinner join variables t3on  t1.right_operand = t3.name;

第1468题

子查询中分组max()聚合查出每个公司的最高工资、主查询中原表natural join子表、case when分条件计算税后工资。

select  company_id,employee_id,employee_name,round(case when cstd < 1000 then salarywhen cstd between 1000 and 10000 then salary * 0.76else salary * 0.51 end, 0) as salaryfrom  salaries t1 natural join (  select  company_id,max(salary) as cstdfrom  salariesgroup by  company_id) t2;

第1873题

case when + 多条件。

select  employee_id,case when employee_id % 2 = 1 and left(name, 1) <> 'M' then salary else 0 end as bonusfrom  employees;

第1907题

sum(case when)计数、3种工资都统计完了之后union all。

   select  'Low Salary' as category,sum(if(income < 20000, 1, 0)) as accounts_countfrom  accounts
union allselect  'Average Salary' as category,sum(if(income between 20000 and 50000, 1, 0)) as accounts_countfrom  accounts
union allselect  'High Salary' as category,sum(if(income > 50000, 1, 0)) as accounts_countfrom  accounts;

第2051题

3表left join 联结(需要筛选)、group by 分组、case when 分类、分类标准里用两次count() 计算次数,买东西次数/来店次数的比值作分类标准。使用聚合函数时null会被剩下来,可以最后用else处理null的情况。

  select  t1.member_id,t1.name,case when count(t3.visit_id) / count(t2.visit_id) < 0.5 then "Silver" when count(t3.visit_id) / count(t2.visit_id) >= 0.8 then "Diamond" when count(t3.visit_id) / count(t2.visit_id) < 0.8 and count(t3.visit_id) / count(t2.visit_id) >= 0.5 then "Gold" else "Bronze" end as categoryfrom  members t1 left join visits t2on  t1.member_id = t2.member_idleft join purchases t3on  t2.visit_id = t3.visit_id
group by  t1.member_id;

第2072题

case when + 多条件。

select  case when (select  count(*) as result from  newyork where  score >= 90) > (select  count(*) as result from  california where  score >= 90)then 'New York University'when (select  count(*) as result from  newyork where  score >= 90) < (select  count(*) as result from  californiawhere  score >= 90) then 'California University'else 'No Winner' end as winner;

第2298题

weekday()从日期中取得星期几、sum(case when)分类统计、when里面写法是weekday(日期) in (元组),而不是一个个去枚举week(day)=0、=1、=2的情况。

select  sum(case when weekday(submit_date) in (5, 6) then 1 else 0 end) as weekend_cnt,sum(case when weekday(submit_date) in (5, 6) then 0 else 1 end) as working_cntfrom  tasks;

11. 窗口函数

img

功能函数
按照顺序编号,1、2、3、4row_number() over([partition by] … [order by] …)
跳跃排序,1、2、2、4rank() over([partition by] … [order by] …)
连续排序,1、2、2、3dense_rank() over([partition by] … [order by] …)
求和(不写order by 全加,写了只加到本行之前)sum() over([partition by] … [order by] …)
从col开始数,取前面offset行,如果没有就取到default值lag(col, offset, default) over([partition by] … [order by] …)
从col开始数,取后面offset行,如果没有就取到default值lead(col, offset, default) over([partition by] … [order by] …)
按列名2分区后求列名1的最小值,并将结果跟在本区所有行后面min(列名1) over(partition by 列名2)
按列名2分区后求列名1的最大值,并将结果跟在本区所有行后面max(列名1) over(partition by 列名2)
窗口函数聚合时的范围限定在前/后N范围的数据range N preceding/following
窗口函数聚合时的范围限定在前/后N行数据rows N preceding/following

注意区分rows 和 range,例:

有4行数据:2,3,4,7月。现在想对每个月的最近3月数据进行聚合,即对4月来说是2、3、4月,对7月来说是5、6、7月。

写rows 2 preceding,对于7月行会取到3、4、7月三行数据。

写range 2 preceding,对于7月行则只会取到7月一行数据,因为没有5、6月数据。

在这个案例中写range才是合适的。


第176题

方法1:子查询中distinct 去重、order by 排序、limit + offset 输出第二位、主查询再查一次把空值变成null。

SELECT(SELECT DISTINCTSalaryFROMEmployeeORDER BY Salary DESCLIMIT 1 OFFSET 1) AS SecondHighestSalary
;

方法2:子查询中dense_rank() 排序、主查询中where 筛选、sum()或者max()或者min()聚合函数把空值变成null。

select  max(salary) as SecondHighestSalaryfrom  (select  salary,dense_rank() over(order by salary desc) as rk from  employee) twhere  rk = 2;

第177题

类似第176题。用方法1要先set n = n - 1。

CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGINset n = n - 1;RETURN (# Write your MySQL query statement below.select  salaryfrom  (  select  distinct salary from  employeeorder by  salary desc) tlimit  1 offset n);
END

第178题

dense_rank()。

select  score,dense_rank() over(order by score desc) as 'rank'from  scores;

第185题

子查询中inner join 联表(题目要求就是没联上的地方不要了,而不用显示0)、dense_rank() 排个序、主查询中按序号筛选前3。

select  department,employee,salaryfrom  (select  t2.name as department,t1.name as employee,t1.salary,dense_rank() over(partition by t2.id order by t1.salary desc) as rkfrom  employee t1 inner join department t2on  t1.departmentid = t2.id) twhere  rk <= 3;

第534题

sum() 窗口函数。同1308、2066题。

select  player_id,event_date,sum(games_played) over(partition by player_id order by event_date) as games_played_so_farfrom  activity;

第550题

下次这种题目举例只给1个考文字理解的出题者能不能先打死啊?tnnd的我看了1小时题才看明白

子查询或者临时表中使用min()窗口函数按player_id分区求出初次登陆日期,再一口气求出当次日期与初次登陆日期差值,别名为diff,很明显每个player_id只有1个diff为1的行,即第二天登陆那行。主查询中count() 子查询/临时表里的where diff = 1的行、除以count(distinct player_id) from 原表、再套个select来查这个商,别名为fraction,注意如果就是为了计算,可以不写from。

with temp as (select  player_id,event_date - min(event_date) over(partition by player_id) as difffrom  activity
)select  round((select  count(*)from  tempwhere  diff = 1) / (select  count(distinct player_id) from  activity), 2) as fraction;

第569题

方法1:临时temp表中将每个公司的中位数计算出来(需要union all一次,因为偶数会对应两个中位数)。子查询中row_number() 窗口函数编号、联上temp表中的中位数、联上的地方就是结果。

with temp as (select  company,round((count(*) + 1) / 2, 0) as midfrom  employeegroup by  companyunion allselect  company,case when count(*) % 2 = 0 then count(*) / 2 endfrom  employeegroup by  company
)select  id,t1.company,salaryfrom  (select  id,company,salary,row_number() over(partition by company order by salary) as rkfrom  employee) t1 inner join temp t2on  t1.company = t2.companyand  t1.rk = t2.mid;

方法2:正反序找中位数。子查询中row_number() 窗口函数编正反序、count()窗口函数将总数算在后面、主查询中筛选出正反序都大于等于总数/2的数据。

select  id,t1.company,salaryfrom  (select  id,company,salary,row_number() over(partition by company order by salary, id) as rk1,row_number() over(partition by company order by salary desc, id desc) as rk2,count(*) over(partition by company) as cntfrom  employee) t1where  rk1 >= cnt / 2and  rk2 >= cnt / 2;

第571题

临时temp表中用recursive递归达到“解压数字”效果。子查询中用row_number()窗口函数分别按正反序编号、count()窗口函数加一列总数、主查询中筛选出中位数、avg求平均值。

with recursive temp as(select  num,frequencyfrom  numbersunion allselect  num,frequency - 1from  tempwhere  frequency > 1
)select  round(avg(num), 1) as medianfrom  (select  num,row_number() over(order by num) as rk1,row_number() over(order by num desc) as rk2,count(*) over() as cntfrom  temp) t where  rk1 >= cnt / 2and  rk2 >= cnt / 2

第579题

子查询中rank()窗口函数排序,方便一会儿剔除最大月份、sum()窗口函数求累计和,这里加上range 2 preceding表示求本月和最近的前两个月的累计和、主查询中筛选出排序大于1的数据、order by排序。

  select  id,month,salaryfrom  (select  id,month,rank() over(partition by id order by month desc) as rk,sum(salary) over(partition by id order by month range 2 preceding) as salaryfrom  employee) twhere  rk > 1
order by  id asc, month desc;

第613题

方法一:两表inner join 自连接再筛选。计算出任意两点间的距离,再min()找出最小的距离。

方法二:由于最小距离只存在于相邻两个点中,所以没必要算出全部任意两点间的距离。先用窗口函数rank排序+编号。再左外联结拼出一张同时有某点和它右一个点的坐标的临时表,并select出这个临时表中相邻两点的距离。最后min()找出最小距离。

select  distinct min(dis) as shortestfrom  (select  abs(t2.x - t1.x) as disfrom  (  select  x, rank() over(order by x) as idfrom  point) t1 left join (  select  x,rank() over(order by x) as idfrom  point) t2on  t1.id + 1 = t2.id) t3;

第618题

第一个子查询查美洲学生,后面的学生都没美洲学生多,所以可以用left join联上去。给每个子查询都用row_number()窗口函数编号,方便后续按照编号联结。

select  America,Asia,Europefrom  (select  name as America,row_number() over(order by name) as rkfrom  studentwhere  continent = 'America') t1 left join(select  name as Asia,row_number() over(order by name) as rkfrom  studentwhere  continent = 'Asia') t2on  t1.rk = t2.rkleft join(select  name as Europe,row_number() over(order by name) as rkfrom  studentwhere  continent = 'Europe') t3on  t1.rk = t3.rk;

第1076题

方法1:子查询中group by按照id分组、count(*)计算人数、主查询中再用group by分组、having筛选、count( * )>=all(子查询中人数)。

select  project_idfrom  (   select  project_id,count(project_id) as numfrom  projectgroup by  project_id) twhere  num >= all(  select  count(project_id)from  projectgroup by  project_id)

方法2:子查询中group by按照id分组、用rank()函数按照count(*)排序(这里比较重要的是rank里是可以用聚合函数的)、主查询中where筛选排序第1名的(rank和dense_rank都可以排出并列排名)。

select  project_idfrom  (  select  project_id,rank() over(order by count(*) desc) as numfrom  projectgroup by  project_id) twhere  num = 1;

第1077题

子查询中rank() 窗口函数排序、主查询中筛选排1名的。

select  project_id,employee_idfrom  (select  project_id,employee_id,rank() over(partition by project_id order by experience_years desc) as rankingfrom  project t1 natural join employee t2) twhere  ranking = 1;

第1112题

方法1:子查询中查出max()分数、inner join联上原表,条件是id一样、grade一样,则不是最高分的没联上就筛掉了、分组并输出数据(因为有并列第一,所以course_id要输出min()最小值)。

  select  t1.student_id,min(t1.course_id) as course_id,t1.gradefrom  enrollments t1 inner join (  select  student_id,max(grade) as bestfrom  enrollmentsgroup by  student_id) t2on  t1.student_id = t2.student_idand  t1.grade = t2.best
group by  student_id
order by  student_id;

方法2:子查询中rank()窗口函数排序、主查询中筛选rank = 1。

第1204题

方法同2066题以及1308题,sum+窗口函数求累计和,不赘述。最后多一个order by 和limit 1。

  select  person_namefrom  (select  person_name,sum(weight) over(order by turn) as afrom  queue) t where  a <= 1000
order by  a desclimit  1;

第1225题

思路是分别找出失败任务和成功任务再union all,下以失败任务为例讲解。先where筛选出2019年数据、date_sub()配合row_number()编号法找差值别名为diff(编号法用过很多次了不细讲了)、主查询中按diff分组查出min(fail_date)和max(fail_date)作为start和end日期。

   select  'failed' as period_state,min(fail_date) as start_date,max(fail_date) as end_datefrom  (select  fail_date,date_sub(fail_date, interval row_number() over(order by fail_date) day) as diff from  failedwhere  year(fail_date) = 2019) tgroup by  diff
union allselect  'succeeded' as period_state,min(success_date) as start_date,max(success_date) as end_datefrom  (select  success_date,date_sub(success_date, interval row_number() over(order by success_date) day) as diff from  succeededwhere  year(success_date) = 2019) tgroup by  difforder by  start_date;

第1285题

方法1:两个logs表cross join自联结、起点-1不在logs表中,终点+1不在logs表中,且终点值大于起点值,借此筛选出起点和终点、由于用了笛卡尔积,每个起点都对应全部终点,所以再用group by和min() 函数选出终点。

  select  t1.log_id as start_id,min(t2.log_id) as end_idfrom  logs t1 inner join logs t2on  t1.log_id <= t2.log_idwhere  (t1.log_id - 1) not in (select  log_id from  logs)and  (t2.log_id + 1) not in (select  log_id from  logs)
group by  start_id
order by  start_id;

方法2:基本原理:连续区间中的数与某一基准等差数列的差值是固定的。例如123567与123456一一对应相减的差是000111,可借此差值对区间进行分类。

log_id - row_number() over() 得到差值、借之前的差值分组、每组最小的为start,最大的为end。

第1308题

sum() over() 窗口函数、order by 排序。和2066题差不多。

  select  gender,day,sum(score_points) over(partition by gender order by day) as totalfrom  scores
order by  gender, day;

第1369题

子查询中窗口函数rank()排序,count()计数、主查询中筛选出排第二的数据或者计数只有1的数据。

   select  username,activity,startdate,enddatefrom  (select  username,activity,startdate,enddate,rank() over(partition by username order by startdate desc) as rk,count(*) over(partition by username) as cntfrom  useractivity) twhere  rk = 2or  cnt = 1;

第1412题

临时temp表中用min()、max()窗口函数找到每次考试最高最低分,再顺手求个当前学生分数和最高/最低分的分差、套一层查询筛选出两个分差乘积为0的id,这一步也就是在找考了最高分或最低分的id。

主查询中where筛选出在exam表(即参加了考试),但不在temp表(即没考到最高或最低分)中的人。

with temp as (select  distinct student_idfrom  (select  student_id,score - min(score) over(partition by exam_id order by score asc) as d1,score - max(score) over(partition by exam_id order by score desc) as d2from  exam) t where  d1 * d2 = 0    
)select  student_id,student_namefrom  studentwhere  student_id in (select  student_id from  exam)and  student_id not in (select  student_idfrom  temp);

第1454题(连续区间)

临时表对原表按照id和login_date联合去重,防止1个人1天登录多次。子查询中row_number()窗口函数编号准备找连续区间,但是这里有个问题,即原表是没有编号的,只能用日期减,即再窗口函数外面再套一个date_sub() 函数,给差别名为diff。主查询里将子查询right join到accounts表中(注意这本来inner join更好,采取right join的原因是之后就不用排序了)、按id和diff联合分组、having + count(*) 筛选连续天数大于等于5天的人、最后select时要去重,因为有的人这段时间连续登陆5天以上,后面哪段时间又连续登陆5天以上。

with temp as (select  distinct id,login_datefrom  logins
)select  distinct t1.id,t2.namefrom  (select  id,date_add(login_date, interval -row_number() over(partition by id order by login_date) day) as difffrom  temp) t1 right join accounts t2on  t1.id = t2.id
group by  t1.id, diffhaving  count(*) >= 5;

第1532题

子查询中rank() over() 按用户id分区+排序、主查询中筛选出排名<=3的数据并natural join上原来的orders表和customers表、最后order by排序。

  select  name as customer_name,customer_id,order_id,order_datefrom  (select  order_id,rank() over(partition by customer_id order by order_date desc) as rk from  orders) t1 natural join orders t2 natural join customers t3where  rk <= 3
order by  customer_name, customer_id, order_date desc;

第1549题

子查询中rank窗口排序按product_id分区,order_date降序排序、主查询中筛选排第1的数据。

  select  product_name,product_id,order_id,order_datefrom  (select  t2.product_name,t1.product_id,t1.order_id,order_date,rank() over(partition by t1.product_id order by t1.order_date desc) as rankingfrom  orders t1 natural join products t2) twhere  ranking = 1
order by  product_name, product_id, order_id;

第1596题

子查询中group by customer_id、product_id联合分组、rank() over(partition by customer_id order by count(*))、主查询中having 筛选出排名1的数据。这道题需要注意的两点是:1. 先按两个字段联合分组再在窗口函数中按单个字段分区;2. 窗口函数中还可以写聚合函数。

select  customer_id,product_id,product_namefrom  (  select  customer_id,product_id,rank() over(partition by customer_id order by count(*) desc) as rankingfrom  ordersgroup by  customer_id, product_id) t1 natural join products t2where  ranking = 1;

第1709题

方法1(开窗):子查询中用lead() 窗口函数把每个日期的下一次日期拼到后面、主查询中日期做差、分组聚合。

SELECTuser_id,MAX(DATEDIFF(next_day, visit_date)) AS biggest_window
FROM (SELECTuser_id,visit_date,LEAD(visit_date, 1, '2021-1-1') OVER (PARTITION BY user_id ORDER BY visit_date) AS next_dayFROM UserVisits
) tmp
GROUP BY user_id
ORDER BY user_id

方法2(不开窗):子查询中原表left join自连接、筛选条件是两表user_id相等且表2日期大于表1日期、两表日期相减(这里还要用coalesce处理null值)、主查询中group by分组、max() 聚合。

  select  user_id,max(diff) as biggest_windowfrom  (  select  t1.user_id,datediff(min(coalesce(t2.visit_date, '2021-1-1')), t1.visit_date) as difffrom  uservisits t1 left join uservisits t2on  t1.user_id = t2.user_idand  t1.visit_date < t2.visit_dategroup by  t1.user_id, t1.visit_date) t
group by  user_id
order by  user_id;

第1811题

临时表temp中将原比赛表的3列转成1列方便统计每个人的获奖情况、找连续区间用编号法,算出contest_id和row_number() 编号的差值方便后续按差值diff分组。主查询中先将得了3次金奖的人选出并union all 连续3次得奖的人、group by按照id和diff联合分组,having + count() 计数选出连续得奖3次的人。再套一层查询natural join user表得到名字和邮箱、distinct去重以免有些又连续3次获奖又得3次金奖的人被算多。

with temp as (select  contest_id,player,contest_id - row_number() over(partition by player order by contest_id) as difffrom  (   select  contest_id,gold_medal as playerfrom  contestsunion allselect  contest_id,silver_medalfrom  contestsunion allselect  contest_id,bronze_medalfrom  contests) t
)select  distinct name,mailfrom  (   select  gold_medal as user_idfrom  contestsgroup by  gold_medal having  count(*) >= 3union allselect  playerfrom  tempgroup by  player, diffhaving  count(*) >= 3) t natural join users;

第1831题

方法1:子查询中rank() over() 排出每天amount的名次、主查询中得到排第1名的数据。

方法2:子查询中group by 分组、max() 求最大值、inner join 到原表上、筛选出原表中日期等于子查询日期且amount = 当日最大值的数据。

  select  transaction_idfrom  transactions t1 inner join(  select  date(day) as day,max(amount) as firstfrom  transactionsgroup by  date(day)) t2on  date(t1.day) = t2.dayand  t1.amount = t2.first
order by  transaction_id;

第1875题

两个原表用inner join按照工资相同id不同自连接以排除掉工资独一无二的员工、dense_rank() 窗口函数按工资排出顺序、distinct去除联结后重复出现的数据、order by 排序。

  select  distinct t1.employee_id,t1.name,t1.salary,dense_rank() over (order by t1.salary) as team_idfrom  employees t1 inner join employees t2on  t1.salary = t2.salaryand  t1.employee_id <> t2.employee_id
order by  team_id asc, employee_id asc

第1951题

子查询中2个ralations表自连接让拥有相同follower的用户处在同行方便之后计数、联结筛选条件是follower_相等且联上来的t2表user_id > 原来t1表的user_id(不然1,7和7,1会算成两种情况)、group by联合分组、rank() 窗口函数以count(*)排序编号、主查询中筛选出排1名的。

select  user1_id,user2_idfrom  (  select  t1.user_id as user1_id,t2.user_id as user2_id,rank() over(order by count(*) desc) as rankingfrom  relations t1 inner join relations t2on  t1.follower_id = t2.follower_idand  t1.user_id < t2.user_idgroup by  t1.user_id, t2.user_id) twhere  ranking = 1;

第2066题

方法1:sum(case when) over(partition by … order by …) 改数据符号并求和。sum() + 窗口函数刚好可以满足这道题的要求。

  select  account_id,day,sum(case type when 'Deposit' then amount else -amount end) over(partition by account_id order by day) as balancefrom  transactions 
order by  account_id, day;  

方法2:transactions 表重复联结2次、筛选只留下表1日期>=表2日期的数据,即对于7号只留7号数据,对于9号则需要7号和9号两天的数据、sum(case when)求和。

第2112题

孙查询中union all把出发机场和到达机场列联结起来、子查询中分组聚合求飞机数并按飞机数用rank() 窗口函数排序、父查询中筛选出排第1的。

select  airport_idfrom  (  select  airport_id,rank() over(order by sum(flights_count) desc) as numfrom  (   select  departure_airport as airport_id,flights_countfrom  flights union all  select  arrival_airport as airport_id,flights_count from  flights) t1group by  airport_id) t2where  num = 1;

第2159题

由于一个表中不可能排2个序,所以使用两个子查询分别排序。按照1列升序和2列降序打编号+排序,打编号用row_number() 窗口函数、两个子查询按编号inner join、主查询中查出表1的1列和表2的2列。

select  t1.first_col,t2.second_colfrom  (  select  first_col,row_number() over(order by first_col asc) as tagfrom  data) t1 inner join (  select  second_col,row_number() over(order by second_col desc) as tagfrom  data) t2using(tag);

第2175题

子查询中先把变化后的新分数作为新列添加原表上、套一层查询用row_number窗口函数计算原排名和新排名、再套一层函数计算排名差。注意这个排名差设置了非负性,不能直接减,要么用if语句、要么用cast() 函数做类型转换再减。

select  team_id,name,if(rk1 > rk2, rk1 - rk2, -(rk2 - rk1)) as rank_difffrom  (select  team_id,name,row_number() over(order by points desc, name asc) as rk1,row_number() over(order by new_points desc, name asc) as rk2from  (select  team_id,name,points,points + points_change as new_pointsfrom  teampoints natural join pointschange) t0) t1

cast(列名 as 数据类型)

第2228题

方法1:两个原表自连接、筛选条件是表2的id和表1id相同,且表2日期大于等于表1日期,且表2purchase_id不等于表1purchase_id、group by 按表1user_id和表1purchase_date联合分组、取得min(表2purchase_date),至此终于找到每一天对应的最近的下一次购买日期、datediff求差值、写进having里筛选。

  select  distinct t1.user_idfrom  purchases t1 inner join purchases t2on  t1.user_id = t2.user_idand  t2.purchase_date >= t1.purchase_dateand  t1.purchase_id <> t2.purchase_id
group by  t1.user_id, t1.purchase_datehaving  datediff(min(t2.purchase_date), t1.purchase_date) <= 7
order by  user_id

方法2:子查询中用lag()窗口函数按user_id分区,purchase_date和purchase_id排序,将每次购买日期的上次购买日期接在每次购买日期后面。主查询中datediff算差值、where筛选。

  select  distinct user_idfrom  (select  user_id,purchase_date,lag(purchase_date, 1, '2000-01-01') over(partition by user_id order by purchase_date, purchase_id) as last_datefrom  purchases) twhere  datediff(purchase_date, last_date) <= 7
order by  user_id

第2292题

创建临时表,按product_id和purchase_date中的年份联合分组、having+count() 筛选出卖出3份及以上的数据。临时表再套一层查询,使用lag() 窗口函数按product_id分区,将每年的上一次年份接到后面、再套一层查询筛选出两个年份相差1年,即相邻的数据、distinct去重。

with temp as (select  product_id,year(purchase_date) as pyearfrom  ordersgroup by  product_id, pyearhaving  count(*) >= 3
)select  distinct product_idfrom  (select  product_id,pyear - lag(pyear, 1, '0000') over(partition by product_id order by pyear) as difffrom  temp) twhere  diff = 1;

第2308题

仔细审题可以发现,每种性别内部的user_id都是排好序的,即输出表中第一次出现的female、other、male的id一定是他们这个性别按id排序排第1的id。可以看做按性别分区后按id编号,以这个编号排序的结果。如果编号相同就按照female、other、male排序。所以思路是先rank()窗口函数按性别分区(partition by)、编号,然后按题目要求的性别顺序打上1、2、3三种tag。最后在主查询中order by排序。

  select  user_id,genderfrom  (select  user_id,gender,rank() over(partition by gender order by user_id) as rk,case gender when 'female' then 1when 'other' then 2else 3 end as tagfrom  genders) t
order by  rk, tag;

第2314题

子查询中排序、主查询中筛选第一名。

select  city_id,day,degree from  (select  city_id,day,degree,rank() over(partition by city_id order by degree desc, day asc) as rk from  weather) twhere  rk = 1;

第2324题

子查询中按user_id和product_id联合分组、sum求出总开销、rank窗口函数按照总开销排序、主查询中筛选第一名。

select  user_id,product_idfrom  (  select  user_id,product_id,rank() over(partition by user_id order by sum(quantity * price) desc) as rkfrom  sales t1 inner join product t2using(product_id)group by  user_id, product_id) t
where  rk = 1;

第2346题

子查询中rank窗口函数排序、count窗口函数计数、主查询中计算百分数。

select  student_id,department_id,round(coalesce((rk - 1) * 100 / (cnt - 1), 0), 2) as percentagefrom  (select  student_id,department_id,rank() over(partition by department_id order by mark desc) as rk,count(student_id) over(partition by department_id) as cntfrom  students) t

12. 限定谓词

功能用法
筛选大于或小于子查询中全部结果的数据where/having + 字段名 > 或 < all(子查询)
筛选大于或小于子查询中任意结果的数据where/having + 字段名 > 或 < any(子查询)

注意:子查询中只能输出1列,不然报错,哪怕是横着的1排也不行。


第574题

方法1:子查询中分组count()计算出每个候选人票数、主查询中再次分组计数、having > all(子查询)筛选。

  select  namefrom  vote t1 inner join candidate t2on  t1.candidateid = t2.id
group by  t1.candidateidhaving  count(*) >= all(  select  count(*)from  votegroup by  candidateid);

方法2:子查询中分组count()计算出每个候选人票数、order by 排序、limit 1输出第一名、主查询中只查name。

第586题

方法1:group by 分组、order by + count(*)排序、limit限制只输出第一名。(这个方法不能有并列第一,当然本题明说了没有并列第一)

方法2:group by 分组、having筛选出count(*) >= all(子查询出每个customer的count( *)) 。

第1076题

方法1:子查询中group by按照id分组、count(*)计算人数、主查询中再用group by分组、having筛选、count( * )>=all(子查询中人数)。

select  project_idfrom  ( select  project_id,count(project_id) as numfrom  projectgroup by  project_id) twhere  num >= all(  select  count(project_id)from  projectgroup by  project_id)

方法2:子查询中group by按照id分组、用rank()函数按照count(*)排序(这里比较重要的是rank里是可以用聚合函数的)、主查询中where筛选排序第1名的(rank和dense_rank都可以排出并列排名)。

select  project_idfrom  (  select  project_id,rank() over(order by count(*) desc) as numfrom  projectgroup by  project_id) twhere  num = 1;

第1082题

子查询+group by建立临时表得到每人总销售额、主表用having筛选出sum()总销售额>=all临时表里销售额的销售者id。

  select  seller_idfrom  sales
group by  seller_idhaving  sum(price) >= all(  select  sum(price)from  salesgroup by  seller_id);

第1355题

方法1:子表分组+count(*)统计出每个活动人数、主表用having+count( *)筛选、筛选条件是> and < any 子表统计出来的人数。这里用any而不用all的原因是,只要在子表中找到1个活动的人数比主表人数多,那主表人数肯定就不是最大值了,同理子表中只要找到1个活动人数比主表少,那主表人数肯定就不是最小值了,两个条件加起来就能筛选出中间值。

  select  activityfrom  friends
group by  activityhaving  count(*) > any(  select  count(*)from  friendsgroup by  activity)and  count(*) < any(  select  count(*)from  friendsgroup by  activity);

方法2:with临时表中rank() over(order by count(*))按人数排序、主查询中筛选排名不等于1且< max(排名) 的数据。

with temp as (  select  activity,rank() over(order by count(*)) as rankingfrom  friendsgroup by  activity)select  activityfrom  tempwhere  ranking not in (select  max(ranking)from  temp)and  ranking <> 1;

第1867题

子查询中查出每个order_id分组下quantity的平均值、主查询中再次group by 分组并用having 筛选出max(quantity) > all子查询平均值的数据。

  select  order_idfrom  ordersdetails
group by  order_idhaving  max(quantity) > all(  select  avg(quantity)from  ordersdetailsgroup by  order_id);

13. like模糊匹配和regexp 正则表达式

功能用法
模糊查询where 列名 like ‘…’
正则查询where 列名 regexp ‘…’

第1517题

正则匹配。

select  user_id,name,mailfrom  userswhere  mail regexp('^[a-z][-a-z_./0-9]*@leetcode[.]com$');

第1527题

like 模糊匹配。

select  patient_id,patient_name,conditionsfrom  patientswhere  conditions like 'DIAB1%' or  conditions like '% DIAB1%';

14. 临时表

用法:with 表名 as (查询语句)。之后要使用时就用表名。


第601题(找连续区间用编号法,同1285题)

临时表中筛选出people >= 100的数据,再用row_number()编号,计算id和编号的差值diff,方便后续根据diff分组。子查询中根据diff,group by 分组、having + count() 筛选出连续3次及以上的diff、主查询中筛选出diff in 子查询的数据。

with temp as (select  id,visit_date,people,id - row_number() over(order by id) as difffrom  stadiumwhere  people >= 100)
select  id,visit_date,peoplefrom  tempwhere  diff in (  select  difffrom  tempgroup by  diffhaving  count(*) >= 3)

第1164题

方法1:子查询中分组聚合查出日期在8月16号前的max(日期),即16号前的最近改动日期(这里没法连着new_price一起查哈!分组后的数据是不会自动一一对应的)、再套一层查询用where (id, 日期) in 子查询,查出id和new_price、将之前的查询结果外联结到一个distinct id的子查询表中、这样联上了的就按联上的价格取,没联上的就是10元。

方法2:既然原表没有初始10元这个数据,我们就自己加上去。创建临时表:原表union all 查出每个id原价10元的子查询(包含distinct id、10、初始日期’0000-00-00’),之后涉及from的全部用这个临时表、where筛选出16号之前的数据、同方法1先查出id和max(日期)、再套一层查询得到id和价格。

with temp as (select  product_id,new_price,change_datefrom  productsunion all select  distinct product_id,10,'0000-00-00'from  products
)select  product_id,new_price as price from  tempwhere  (product_id, change_date) in (  select  product_id,max(change_date) as change_datefrom  tempwhere  change_date <= '2019-08-16'group by  product_id);

第1949题

由于每个人的朋友散布在user1_id和user2_id两列中,所以先用union all将原表和交换两列左右顺序后的新表进行并集,以便后续找某个id的朋友时只需要看第二列即可,再用临时表将上述查询结果保存下来取名为temp以免重复写。原表t1 inner join temp表 t2,将t1.user1_id的朋友联过来、继续inner join temp表t3,将t1.user2_id和前面联的t1.user1_id的共同朋友联过来、group by分组、count(t3中的id)计数、having筛选。

with temp as (   select  user1_id,user2_id from  friendshipunion all select  user2_id,user1_idfrom  friendship)select  t1.user1_id,t1.user2_id,count(t3.user2_id) as common_friendfrom  friendship t1 inner join temp t2on  t1.user1_id = t2.user1_idinner join temp t3on  t1.user2_id = t3.user1_idand  t2.user2_id = t3.user2_id
group by  t1.user1_id, t1.user2_idhaving  common_friend >= 3;

第2238题

方法1:with创建去重后的driver_id临时表、临时表left join原表,将driver_id和passenger_id相等的地方联结起来、分组后count()计数。

with temp as(select  distinct driver_id from  rides
)select  t1.driver_id,count(t2.ride_id) as cntfrom  temp t1 left join rides t2on  t1.driver_id = t2.passenger_id
group by  t1.driver_id;

方法2:和方法1类似,但不创建临时表。由于没有创建临时表去重,每个相同的driver_id都对应同样的多个passenger_id。例如7号当了2次司机3次乘客,这样联结就变成了每次当司机的数据都连上了3次乘客数据,count()数出来就一共当了6次乘客。所以还要再用子查询查出每个人当司机的次数,联上原表后除以这个次数。

15. recursive递归

第571题

临时temp表中用recursive递归达到“解压数字”效果。子查询中用row_number()窗口函数分别按正反序编号、count()窗口函数加一列总数、主查询中筛选出中位数、avg求平均值。

with recursive temp as(select  num,frequencyfrom  numbersunion allselect  num,frequency - 1from  tempwhere  frequency > 1
)select  round(avg(num), 1) as medianfrom  (select  num,row_number() over(order by num) as rk1,row_number() over(order by num desc) as rk2,count(*) over() as cntfrom  temp) t where  rk1 >= cnt / 2and  rk2 >= cnt / 2

第1613题

with + recursive 递归创建临时表、从临时表中查出不在customers表中,且小于customers表最大id的数据。

with recursive temp as (select  1 as n union allselect  n + 1 from  tempwhere  n < 100
)select  n as idsfrom  tempwhere  n not in (select  customer_idfrom  customers)and  n < (select  max(customer_id)from  customers);

第1651题

临时boxes表中创建1到10的序号列。主查询中将rides表left join到boxes表上,连接条件是月份不能超过n 2个月,顺便把2020年以外的数据剔除、再left join AcceptedRides表、group by分组、计算对应数据。

with recursive boxes as (select  1 as nunion allselect  n + 1from  boxes where  n <= 9
)select  n as month,coalesce(round(sum(ride_distance) / 3, 2), 0) as average_ride_distance,coalesce(round(sum(ride_duration) / 3, 2), 0) as average_ride_durationfrom  boxes t1 left join rides t2on  month(t2.requested_at) between t1.n and t1.n + 2and  year(t2.requested_at) = 2020left join acceptedrides t3on  t2.ride_id = t3.ride_id
group by  t1.n
order by  t1.n;

第1767题

with recursive创建临时temp表、递归条件是从subtasks_count开始递减,减到1为止。temp表left join executed表,连不上的地方就是未完成任务(这么做比where not in找差集效率高很多)。

with recursive temp as (select  task_id,subtasks_countfrom  tasksunion allselect  task_id,subtasks_count - 1 as subtasks_countfrom  tempwhere  subtasks_count >= 2
)select  t1.task_id,subtasks_count as subtask_idfrom  temp t1 left join executed t2on  t1.task_id = t2.task_idand  t1.subtasks_count = t2.subtask_idwhere  t2.task_id is null;

http://chatgpt.dhexx.cn/article/Z7YIlPLW.shtml

相关文章

数据库题目之数据库设计

一、选择题 1、在数据库设计中&#xff0c;用E-R图来描述信息结构但不涉及信息在计算机中的表示&#xff0c;它是数据库设计的 阶段。 A&#xff0e;需求分析 B&#xff0e;概念设计 C&#xff0e;逻辑设计 D&#xff0e;物理设计 【答案&#xff1a;】B 2、在关系…

数据库常考大题

宝子们心心念念的数据库大题来啦~ 我已经整理好啦~你们根据目录找对应题目就好啦&#xff0c;嘿嘿~姐姐是不是还有点小贴心呢&#xff1f; &#x1f6f8; &#x1f30f; &#x1f318; • . ✯✯ ★ * &#x1f6f0; . • ★ • ▁▂▃▄▅▆▇▇▆▅▄▃▁▂▃. 目录…

33道数据库题目

33道数据库题目 准备三张表题目 准备三张表 emp–员工表 dept–部门表 salgrade–工资等级表 题目 取得每个部门最高薪水的人员名称 (方法1) select ename, sal, deptno from emp where sal in (select max(sal) from emp group by deptno);(方法2&#xff09; select e.…

50道数据库SQL练习题(深入理解各关键字的作用)

目录 表结构创建表练习题1、查询“001”课程比“002”课程成绩高的所有学生的学号2、查询所有同学的学号、姓名、选课数、总成绩3、查询平均成绩大于60分的同学的 学号和平均成绩4、查询姓“葛”的老师的个数5、查询没学过“五木”老师课的同学的学号、姓名6、查询学过“101”并…

数据库复习题选择题+判断题+填空题(考试续命必备

数据库复习题选择题判断题填空题(考试续命必备 一些选择题 1、从计算机数据管理的角度看&#xff0c;信息就是数据&#xff0c;数据就是信息。&#xff08;B &#xff09; A、对 B、错 (描述事物的符号记录称为数据 数据库的数据项之间无联系&#xff0c;记录之间存在联系。…

【课后习题】高等数学第七版上第二章 导数与微分 第五节 函数的微分

习题2-5 1. 已知 y x 3 − x yx^3-x yx3−x, 计算在 x 2 x2 x2 处当 Δ x \Delta x Δx 分别等于 1 , 0.1 , 0.01 1,0.1,0.01 1,0.1,0.01 时的 Δ y \Delta y Δy 及 d y \mathrm{d} y dy. 2. 设函数 y f ( x ) yf(x) yf(x) 的图形如下, 试在图(a)、(b)、&#xff0…

高等数学同济七版课后习题答案

高等数学同济七版课后习题答案上册下册,习题全解指南。 一、《高等数学》(第七版)下册习题全解 第八章 向量代数与空间解析几何 下载地址: 链接:https://pan.baidu.com/s/185C8RB4Y9pYO84V4Rup1Wg 提取码:p0o8 习题8-1 向量及其线性运算 习题8-2 数量积 向量积 *混合积…

高等数学(第七版)同济大学 习题10-3 (后6题)个人解答

高等数学&#xff08;第七版&#xff09;同济大学 习题10-3&#xff08;后6题&#xff09; 函数作图软件&#xff1a;Mathematica 10. 利 用 球 面 坐 标 计 算 下 列 三 重 积 分 &#xff1a; \begin{aligned}&10. \ 利用球面坐标计算下列三重积分&#xff1a;&\end…

高等数学(第七版)同济大学 习题1-8 个人解答

高等数学&#xff08;第七版&#xff09;同济大学 习题1-8 函数作图软件&#xff1a;Mathematica 1. 设 y f ( x ) 的图形如图 1 − 39 所示&#xff0c;试指出 f ( x ) 的全部间断点&#xff0c;并对可去间断点补充或 修改函数值的定义&#xff0c;使它成为连续点。 \begin{…

浅谈一下前端单元测试

关于单元测试这个概念&#xff0c;我想很多前端的小伙伴都知道&#xff0c;但是却并不一定能描述清楚。由于我开始接触单元测试还是在四个月前&#xff0c;当时也只是做了一些纯函数的单元测试。所以在这里只能说浅谈一下前端单元测试。 什么是单元测试&#xff1f; 我理解的…

web前端测试要点

【说明】 JS压缩&#xff1a; 目的&#xff1a; 1、减少JS代码容量&#xff0c;增加下载速度和执行速度&#xff1b; 2、压缩后的JS代码不具备可识性&#xff0c;在一定程度上达到加密效果&#xff0c;防止被人轻易使用。 常规Javascript压缩的原理&#xff1a; 1、压缩多余的…

为什么必须执行前端测试?

对于网站的真实前端测试&#xff0c;必须在不同的设备和浏览器(具有多个版本)上检查功能和性能。在不同浏览器、浏览器版本和操作系统上评估网站的过程称为跨浏览器测试&#xff0c;它被认为是每个前端网站测试计划的重要组成部分&#xff0c;用于评估你能够通过无缝UI和UX取悦…

前端接口测试

背景 由于需求不断更改&#xff0c;项目前后端分离后&#xff0c;后端开发人员经常需要改字段加字段&#xff0c;也有可能删掉接口&#xff0c;这时候前端如果没有相对应的调整&#xff0c;就容易造成bug。为了解决这个问题&#xff0c;我们需要对接口进行测试。 测试被调用接…

前端测试方法

最近在学校的《系统分析与设计》一课的大作业上&#xff0c;由于我担任的是测试工程师的角色&#xff0c;因此小小的研究了一些前端和后端的测试到底要怎么做。本文着重于前端测试方法。 1. 什么是测试&#xff1f; 我把测试定义成&#xff1a;是一段检测你的应用代码&#xf…

前端测试都要测什么

单元测试&#xff0c;Unit Testing&#xff0c;简称 UT&#xff0c;是指对软件中的最小可测试单元进行检查和验证&#xff0c;这是最低级别的测试活动&#xff0c;前端开发中单元可以是一个 function 也可以是一个 class&#xff0c;也可以是一个组件。对他们的输出做断言检查&…

前端测试开发工具--mock 的使用

目录 1. 背景 2. Mock是什么 3. Mock能做什么 4. Mock实现方式 5. Mock市面上常见的解决方案 6. Python下unittest.mock使用 1. 背景 在实际产品开发过程中&#xff0c;某个服务或前端依赖一个服务接口&#xff0c;该接口可能依赖多个底层服务或模块&#xff0c;或第三方…

浅谈前端测试

浅谈前端测试 浅谈 TDD 和 BDD TDD Test Driven Development (测试驱动开发) 一种使用自动化单元测试来推动软件设计并强制依赖关系解耦的技术。使用这种做法的结果是一套全面的单元测试&#xff0c;可随时运行&#xff0c;以提供软件可以正常工作的反馈。大概的流程是先针对…

前端测试介绍

测试,作为软件工程的一项重要环节,用来保证项目的正确性,完整性,安全性和可靠性。 前端测试是前端工程化的重要环节,根据测试的粒度可以分为单元测试,功能测试(E2E测试),集成测试。 前端测试框架 单元测试 - Mocha - Jasmine - Jest 断言库 - chai - Jest - …

前端测试接口,POSTMAN一键调试

前端测试接口&#xff0c;POSTMAN一键调试 当我们在开发中&#xff0c;遇到接口有问题时&#xff0c;一般都会找后端battle一下&#xff0c;在这之前我们需要先确认问题&#xff0c;当我们浏览器不方便调试时&#xff0c;我们一般会借助postman&#xff0c;因为比较方便也比较…

【测试数据准备-绕过后端,前端测试】

提示&#xff1a;文章写完后&#xff0c;目录可以自动生成&#xff0c;如何生成可参考右边的帮助文档 文章目录 前言一、 Charles 简单介绍二、本文使用场景三、Charles 通过修改后端返回值&#xff0c;在前端展示。1.工具安装和注册2.使用端点功能修改接口返参数 总结 前言 测…