Monday, 28 December 2020

[SQL Server] A few things you need to consider when you use views in SQL Server - Part 2.

A few things you need to consider when you use views in SQL Server - Part 2. Aggregation.



Please check Part 1. More joins for the preparations.


One of the advantages of using a view is that you can hide code from the end-user. It means users do not need to care about the structure of views or table relations.

But what about aggregation queries in the view.


Example query(GROUP BY)


There is a view using GROUP BY as below.
CREATE OR ALTER VIEW dbo.vw_PostCount_per_Users
AS
SELECT
	COUNT(DISTINCT p.Id) AS PostCount
	, u.Id As UserID, u.AboutMe, u.Age, u.CreationDate AS UserCreationDate, u.DisplayName
FROM
	dbo.Posts p LEFT JOIN dbo.Users u ON p.OwnerUserId = u.Id
GROUP BY
	u.Id , u.AboutMe, u.Age, u.CreationDate, u.DisplayName
;


Only TOP 1000?


Lots of people use "SELECT TOP 1000 Rows"  using Object Explorer in order to see what's inside.


/****** Script for SelectTopNRows command from SSMS  ******/
SELECT TOP (1000) [PostCount]
      ,[UserID]
      ,[AboutMe]
      ,[Age]
      ,[UserCreationDate]
      ,[DisplayName]
  FROM [StackOverflow].[dbo].[vw_PostCount_per_Users]


Looks ok, right?

But maybe it will take forever.


Then, what about TOP 10? 

SELECT TOP (10) [PostCount]
      ,[UserID]
      ,[AboutMe]
      ,[Age]
      ,[UserCreationDate]
      ,[DisplayName]
  FROM [StackOverflow].[dbo].[vw_PostCount_per_Users]

It will not return the result very quickly as well.


What's happening?


Let's compare the execution plan.
SELECT TOP (1000) [PostCount]
      ,[UserID]
      ,[AboutMe]
      ,[Age]
      ,[UserCreationDate]
      ,[DisplayName]
  FROM [StackOverflow].[dbo].[vw_PostCount_per_Users]


SELECT TOP (10) [PostCount]
      ,[UserID]
      ,[AboutMe]
      ,[Age]
      ,[UserCreationDate]
      ,[DisplayName]
  FROM [StackOverflow].[dbo].[vw_PostCount_per_Users]


You might be noticed that it has several problems.
 1. It's 50 vs 50. No big differences.
 2. They all uses "Clustered Index Scan", meaning no index was being used. 
   -- dbo.Posts has 46947633 Rows.


The main reason it became a heavy query is that SQL server cannot return the result to you until GROUP BY operation is finished.




How to solve this issue?


Modifying the query to use the proper index can solve this issue.
For e.g. 
SELECT TOP 1000 * FROM dbo.vw_PostCount_per_Users 
WHERE DisplayName LIKE 'Kevin%';


Let's compare this one vs the previous one (without where clause - no index usage).



But, ultimately, you need to change this view to a stored procedure in order to enforce to use indexes.


Sunday, 6 December 2020

[SQL Server] A few things you need to consider when you use views in SQL Server - Part 1.

A few things you need to consider when you use views in SQL Server - Part 1. More joins


View in SQL Server

View in SQL Server is great when it comes to the security, readability and simplicity.
But in terms of performance, you need to consider several factors. 


Preparation

I use StackOverflow database 

You can download from here(Thanks to Brent): https://www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent/


Creating indexes

Create indexes in order to check the performance clearly and save time for the result.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
USE StackOverflow
GO

CREATE INDEX IX_Posts_OwnerUserId ON dbo.Posts(OwnerUserId);

CREATE INDEX IX_Comments_PostID ON dbo.Comments(PostID);

CREATE INDEX IX_Votes_PostID ON dbo.Votes(PostID);

CREATE INDEX IX_Comments_UserId ON dbo.Comments(UserId);


Creating a view

Create a view for retrieving multiple columns across related tables at once.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
USE StackOverflow
GO

CREATE OR ALTER VIEW dbo.vw_PostInformation
AS
SELECT
	p.Id AS PostId, p.Title AS PostTitle, p.AnswerCount AS PostAnswerCount, p.Body AS PostBody
	, p.ClosedDate AS PostClosedDate, p.CreationDate AS PostCreationDate
	, p.LastEditDate AS PostLastEditDate, p.ViewCount AS PostViewCount
	, p.OwnerUserId AS PostOwnerUserId, u.DisplayName AS UserDisplayName, u.CreationDate AS UserCreationDate
	, c.Id AS CommentID, c.CreationDate AS CommentCreationDate, c.Score AS CommentScore, c.Text AS CommentText
	, v.BountyAmount AS VoteBountyAmount, v.VoteTypeId, v.CreationDate AS VoteCreationDate
FROM
	dbo.Posts p JOIN dbo.Users u ON p.OwnerUserId = u.Id
	JOIN dbo.Comments c ON p.Id = c.PostId
	JOIN dbo.Votes v ON p.Id = v.PostId
;


Example query of the view.

1
2
SELECT * FROM dbo.vw_PostInformation
WHERE PostID = 33


When you need only a few columns of data?

You may select lots of information using the view.

But if you select only a few columns from a few tables, then it becomes a different story.


For e.g. if you only select data from "Posts" table and "Users" table.

Executions

Queries

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
-- 01. With view
SELECT v.PostId, v.PostBody, v.PostCreationDate, v.UserDisplayName, v.UserCreationDate
FROM dbo.vw_PostInformation AS v
WHERE PostID = 33
GROUP BY v.PostId, v.PostBody, v.PostCreationDate, v.UserDisplayName, v.UserCreationDate

-- 02. Without View
SELECT
	p.Id AS PostId, p.Body AS PostBody, p.CreationDate AS PostCreationDate
	, u.DisplayName AS UserDisplayName, u.CreationDate AS UserCreationDate
FROM
	dbo.Posts p JOIN dbo.Users u ON p.OwnerUserId = u.Id
WHERE
	P.Id = 33
;


Result


Looks like the same.

But, when you use a view, you need to use "GROUP BY" for this case in order to have unique values otherwise you will get lots of duplicate rows.


Execution plan comarison

Let's take a look at the execution plans.



View uses Index seek(Luckily not scans, at least) for "Comments" and "Votes" table which we don't need for this query. Of course, using view is more expensive.


I/O comparison

Now, let's take a look at I/O using "SET STATISTICS IO ON".

Queries

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
SET STATISTICS IO ON;
-- 01. With view
PRINT '01. With View'
SELECT v.PostId, v.PostBody, v.PostCreationDate, v.UserDisplayName, v.UserCreationDate
FROM dbo.vw_PostInformation AS v
WHERE PostID = 33
GROUP BY v.PostId, v.PostBody, v.PostCreationDate, v.UserDisplayName, v.UserCreationDate

-- 02. Without View
PRINT '02. Without View'
SELECT
	p.Id AS PostId, p.Body AS PostBody, p.CreationDate AS PostCreationDate
	, u.DisplayName AS UserDisplayName, u.CreationDate AS UserCreationDate
FROM
	dbo.Posts p JOIN dbo.Users u ON p.OwnerUserId = u.Id
WHERE
	P.Id = 33
;
SET STATISTICS IO OFF;


Result

01. With View

(1 row affected)
Table 'Votes'. Scan count 4, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Comments'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Users'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)
02. Without View

(1 row affected)
Table 'Users'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

As you can see, it reads from "Votes" and "Comments" tables when using a view even though you didn't use any data from those tables.


Therefore, using a view can read additional pages because of the structure.








(KOR)☁️ AWS Solutions Architect Associate 자격증 취득 여정

  !저의 경험 , 팁 , 그리고 SAA-C03 자격증 합격을 위한 노하우 공유! 조금 늦은 포스팅이지만 , 꼭 공유하고 싶었던 이야기입니다 . 회사 내 주변 동료들이 자주 이렇게 말하곤 했습니다 . “ 님 실...