Monday, 28 December 2020

[SQL Server] A few things you need to consider when you use views in SQL Server - Part 2.

A few things you need to consider when you use views in SQL Server - Part 2. Aggregation.



Please check Part 1. More joins for the preparations.


One of the advantages of using a view is that you can hide code from the end-user. It means users do not need to care about the structure of views or table relations.

But what about aggregation queries in the view.


Example query(GROUP BY)


There is a view using GROUP BY as below.
CREATE OR ALTER VIEW dbo.vw_PostCount_per_Users
AS
SELECT
	COUNT(DISTINCT p.Id) AS PostCount
	, u.Id As UserID, u.AboutMe, u.Age, u.CreationDate AS UserCreationDate, u.DisplayName
FROM
	dbo.Posts p LEFT JOIN dbo.Users u ON p.OwnerUserId = u.Id
GROUP BY
	u.Id , u.AboutMe, u.Age, u.CreationDate, u.DisplayName
;


Only TOP 1000?


Lots of people use "SELECT TOP 1000 Rows"  using Object Explorer in order to see what's inside.


/****** Script for SelectTopNRows command from SSMS  ******/
SELECT TOP (1000) [PostCount]
      ,[UserID]
      ,[AboutMe]
      ,[Age]
      ,[UserCreationDate]
      ,[DisplayName]
  FROM [StackOverflow].[dbo].[vw_PostCount_per_Users]


Looks ok, right?

But maybe it will take forever.


Then, what about TOP 10? 

SELECT TOP (10) [PostCount]
      ,[UserID]
      ,[AboutMe]
      ,[Age]
      ,[UserCreationDate]
      ,[DisplayName]
  FROM [StackOverflow].[dbo].[vw_PostCount_per_Users]

It will not return the result very quickly as well.


What's happening?


Let's compare the execution plan.
SELECT TOP (1000) [PostCount]
      ,[UserID]
      ,[AboutMe]
      ,[Age]
      ,[UserCreationDate]
      ,[DisplayName]
  FROM [StackOverflow].[dbo].[vw_PostCount_per_Users]


SELECT TOP (10) [PostCount]
      ,[UserID]
      ,[AboutMe]
      ,[Age]
      ,[UserCreationDate]
      ,[DisplayName]
  FROM [StackOverflow].[dbo].[vw_PostCount_per_Users]


You might be noticed that it has several problems.
 1. It's 50 vs 50. No big differences.
 2. They all uses "Clustered Index Scan", meaning no index was being used. 
   -- dbo.Posts has 46947633 Rows.


The main reason it became a heavy query is that SQL server cannot return the result to you until GROUP BY operation is finished.




How to solve this issue?


Modifying the query to use the proper index can solve this issue.
For e.g. 
SELECT TOP 1000 * FROM dbo.vw_PostCount_per_Users 
WHERE DisplayName LIKE 'Kevin%';


Let's compare this one vs the previous one (without where clause - no index usage).



But, ultimately, you need to change this view to a stored procedure in order to enforce to use indexes.


No comments:

Post a Comment

(KOR)☁️ AWS Solutions Architect Associate 자격증 취득 여정

  !저의 경험 , 팁 , 그리고 SAA-C03 자격증 합격을 위한 노하우 공유! 조금 늦은 포스팅이지만 , 꼭 공유하고 싶었던 이야기입니다 . 회사 내 주변 동료들이 자주 이렇게 말하곤 했습니다 . “ 님 실...