Sunday, September 13, 2020

Recursive CTE | Print total number of unique hackers who made at least one submission each day

/*
https://www.hackerrank.com/challenges/15-days-of-learning-sql/problem
Query to print total number of unique hackers who made at least one submission each day (starting on the first day of the contest), and find the hacker_id and name of the hacker who made maximum number of submissions each day.
Solution/Approach:
This question comprises of two subproblems:
1. Name and Id of hacker who did max number of submissions on given day
2. Count of hackers who submitted on given day and all previous days
To solve first problem, we can use ROW_NUMBER function of SQL Server. Using this function we can partition data by submission date and then order by count in descending and hackerId ascending orders. Ordering by hackerId is required because problem statement says if more than one such hacker has a maximum number of submissions, print the lowest hacker_id.
To solve second problem, we can use recursive CTE. Here for each day we need to lookup data from all previous days. For day 1 it's count of all distinct hackers of that day, this will also be base case of our recursive CTE. Next we need to build on base case by joining next day data with previous day's. Limiting condition of our recursive query will be last day of our input data set.
*/
--Query
--This temp table holds data that will help us solve problem 1
SELECT s.submission_date, s.hacker_id, h.name, ROW_NUMBER() over (partition by submission_date ORDER BY COUNT(*) desc, s.hacker_id) rowNum
INTO #tmpSubmissions
FROM Submissions s
JOIN Hackers h
ON s.hacker_id = h.hacker_id
GROUP BY s.submission_date, s.hacker_id, h.name
ORDER BY s.submission_date, s.hacker_id
DECLARE @startDate datetime = (SELECT min(submission_date) FROM #tmpSubmissions)
DECLARE @endDate datetime = (SELECT DATEADD(day, 1, max(submission_date)) FROM #tmpSubmissions)
--Recursive CTE
;with cteResult AS (
--Base CTE | Here we fetch data of day 1
SELECT submission_date, t.hacker_id, name
FROM #tmpSubmissions t
where t.submission_date = @startDate
UNION ALL
--Here we are joining current day data with previous data data already present in CTE to find hackers from current day with that have submitted on previous day as well
SELECT t.submission_date, t.hacker_id, t.name
FROM #tmpSubmissions t
JOIN cteResult c
ON t.submission_date = DATEADD(day, 1, c.submission_date) and t.hacker_id = c.hacker_id
--Limiting condition
WHERE t.submission_date < @endDate
)
SELECT c.submission_date, count(*), t.hacker_id, t.name
FROM cteResult c
left JOIN (
SELECT *
FROM #tmpSubmissions
where rowNum = 1
) t
ON c.submission_date = t.submission_date
GROUP BY c.submission_date, t.hacker_id, t.name
ORDER BY c.submission_date, t.hacker_id
DROP TABLE #tmpSubmissions
/*
--Sample Data
create table Hackers (hacker_id int, name varchar(100))
create table Submissions (submission_date datetime, submission_id int, hacker_id int, score int)
insert into Hackers values
(15758, 'Rose'),
(20703, 'Angela'),
(36396, 'Frank'),
(38289, 'Patrick'),
(44065, 'Lisa'),
(53473, 'Kimberly'),
(62529, 'Bonnie'),
(79722, 'Michael')
insert into Submissions values
('2016-03-01', 8494, 20703, 0),
('2016-03-01', 22403, 53473, 15),
('2016-03-01', 23965, 79722, 60),
('2016-03-01', 30173, 36396, 70),
('2016-03-02', 34928, 20703, 0),
('2016-03-02', 38740, 15758, 60),
('2016-03-02', 42769, 79722, 25),
('2016-03-02', 44364, 79722, 60),
('2016-03-03', 45440, 20703, 0),
('2016-03-03', 49050, 36396, 70),
('2016-03-03', 50273, 79722, 5)
*/

About Me

My photo
Delhi, India
Fun, music, travel and nature loving, always smiling, computer addict!!