搜尋本站文章

2010-09-08

對啟用「資料壓縮(Data Compression)」的資料表,執行大量匯入、修改作業之建議

「資料壓縮(Data Compression)」包含了:「頁面壓縮(Page Compression)」、「資料列壓縮(Row Compression)」。


您可以針對每個資料分割個別設定資料分割資料表和索引的壓縮。

假若需要對這類壓縮資料表執行大量匯入、修改等作業,經過測試,建議:使用分段式的作法,將可以大幅縮減維護時間。

所謂的分段式,其作法流程如下:
1. 先對資料表執行解壓縮作業。
2. 再執行大量的匯入、修改等作業。
3. 最後,對資料表執行壓縮作業。


方式耗用時間(秒)
 採取分段式作法,對啟用壓縮資料表的執行大量的 UPDATE 作業4
 直接對啟用壓縮資料表的執行大量的 UPDATE 作業43

相差快10倍的時間。應採用分段式。


請參考以下的範例程式碼:
-- 建立範例資料表:DC01 與 DC02
USE tempdb
GO
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DC01]') AND type in (N'U'))
DROP TABLE [dbo].[DC01]
GO
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DC02]') AND type in (N'U'))
DROP TABLE [dbo].[DC02]
GO
SELECT  TransactionID+0 TransactionID,ProductID, ReferenceOrderID, ReferenceOrderLineID, TransactionDate, 
 TransactionType, Quantity, ActualCost, ModifiedDate
 INTO DC01
FROM AdventureWorks2008.Production.TransactionHistory
GO
--
DECLARE @CNT INT =1
WHILE @CNT<=2
BEGIN
 INSERT DC01
  SELECT  *
  FROM DC01
 
 SET @CNT +=1
END

--
SELECT *
 INTO DC02
FROM DC01

--02 查詢範例資料表的使用空間(初估)資訊:DC01 與 DC02
SELECT a3.name AS N'結構描述', a2.name AS N'資料表', a1.rows AS N'資料列筆數', 
 (a1.reserved + ISNULL(a4.reserved,0))* 8 AS N'已保留(KB)', a1.data * 8 AS '資料使用空間(KB)',
 (CASE WHEN (a1.used + ISNULL(a4.used,0)) > a1.data THEN (a1.used + ISNULL(a4.used,0)) - a1.data ELSE 0 END) * 8 AS N'索引(KB)',
 (CASE WHEN (a1.reserved + ISNULL(a4.reserved,0)) > a1.used THEN (a1.reserved + ISNULL(a4.reserved,0)) - a1.used ELSE 0 END) * 8 AS N'未使用(KB)'
FROM (
 SELECT ps.object_id, SUM (CASE WHEN (ps.index_id < 2) THEN row_count ELSE 0 END ) AS [rows],
  SUM (ps.reserved_page_count) AS reserved,
  SUM (CASE WHEN (ps.index_id < 2) THEN (ps.in_row_data_page_count + ps.lob_used_page_count + ps.row_overflow_used_page_count)
   ELSE (ps.lob_used_page_count + ps.row_overflow_used_page_count) END) AS data,
  SUM (ps.used_page_count) AS used
 FROM sys.dm_db_partition_stats ps
 GROUP BY ps.object_id) AS a1
LEFT OUTER JOIN 
 (SELECT it.parent_id, SUM(ps.reserved_page_count) AS reserved, SUM(ps.used_page_count) AS used
  FROM sys.dm_db_partition_stats ps INNER JOIN sys.internal_tables it ON (it.object_id = ps.object_id)
  WHERE it.internal_type IN (202,204)
  GROUP BY it.parent_id) AS a4 ON (a4.parent_id = a1.object_id)
INNER JOIN sys.all_objects a2  ON ( a1.object_id = a2.object_id ) 
INNER JOIN sys.schemas a3 ON (a2.schema_id = a3.schema_id)
WHERE a2.type <> N'S' and a2.type <> N'IT'
ORDER BY 3 DESC
GO
請參考下圖所示:

--03 啟用壓縮:頁面壓縮(PAGE compression):DC01 與 DC02
ALTER TABLE DC01 
 REBUILD
WITH (DATA_COMPRESSION = PAGE);
GO
ALTER TABLE DC02
 REBUILD
WITH (DATA_COMPRESSION = PAGE);
GO

--04 查詢資料表的壓縮狀態
/*
data_compression 資料行,表示每個資料分割的壓縮狀態:
0 = NONE
1 = ROW
2 = PAGE
*/
SELECT OBJECT_NAME(object_id) N'資料表', MAX(rows) N'資料列筆數', MAX(data_compression) N'壓縮狀態',
 MAX(data_compression_desc) N'每個資料分割的壓縮狀態'
FROM sys.partitions
WHERE object_id>100
GROUP BY OBJECT_NAME(object_id)
ORDER BY 3 DESC,2 DESC
請參考下圖所示:

--05 查詢範例資料表的使用空間(初估)資訊:DC01 與 DC02
SELECT a3.name AS N'結構描述', a2.name AS N'資料表', a1.rows AS N'資料列筆數', 
 (a1.reserved + ISNULL(a4.reserved,0))* 8 AS N'已保留(KB)', a1.data * 8 AS '資料使用空間(KB)',
 (CASE WHEN (a1.used + ISNULL(a4.used,0)) > a1.data THEN (a1.used + ISNULL(a4.used,0)) - a1.data ELSE 0 END) * 8 AS N'索引(KB)',
 (CASE WHEN (a1.reserved + ISNULL(a4.reserved,0)) > a1.used THEN (a1.reserved + ISNULL(a4.reserved,0)) - a1.used ELSE 0 END) * 8 AS N'未使用(KB)'
FROM (
 SELECT ps.object_id, SUM (CASE WHEN (ps.index_id < 2) THEN row_count ELSE 0 END ) AS [rows],
  SUM (ps.reserved_page_count) AS reserved,
  SUM (CASE WHEN (ps.index_id < 2) THEN (ps.in_row_data_page_count + ps.lob_used_page_count + ps.row_overflow_used_page_count)
   ELSE (ps.lob_used_page_count + ps.row_overflow_used_page_count) END) AS data,
  SUM (ps.used_page_count) AS used
 FROM sys.dm_db_partition_stats ps
 GROUP BY ps.object_id) AS a1
LEFT OUTER JOIN 
 (SELECT it.parent_id, SUM(ps.reserved_page_count) AS reserved, SUM(ps.used_page_count) AS used
  FROM sys.dm_db_partition_stats ps INNER JOIN sys.internal_tables it ON (it.object_id = ps.object_id)
  WHERE it.internal_type IN (202,204)
  GROUP BY it.parent_id) AS a4 ON (a4.parent_id = a1.object_id)
INNER JOIN sys.all_objects a2  ON ( a1.object_id = a2.object_id ) 
INNER JOIN sys.schemas a3 ON (a2.schema_id = a3.schema_id)
WHERE a2.type <> N'S' and a2.type <> N'IT'
ORDER BY 3 DESC
GO

請參考下圖所示:



EX1 直接對啟用壓縮資料表的執行大量的 UPDATE 作業:DC01

--EX1 直接對啟用壓縮資料表的執行大量的 UPDATE 作業:DC01
--01
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS 
SET STATISTICS TIME ON
GO

UPDATE DC01
SET TransactionType='A'
GO

/*
SQL Server parse and compile time: 
   CPU time = 16 ms, elapsed time = 25 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 37206 ms,  elapsed time = 43283 ms.
*/
請參考下圖所示:

SQL Server 2008 支援針對資料表、索引和索引檢視表,以資料列和頁面格式進行磁碟內存儲存壓縮。

可以觀察到,直接對啟用壓縮資料表的執行大量的 UPDATE 作業,耗用了 43 多秒的時間。


EX2. 採取分段式作法,對啟用壓縮資料表的執行大量的 UPDATE 作業:DC02

--EX2. 採取分段式作法,對啟用壓縮資料表的執行大量的 UPDATE 作業:DC02
/*
1. 先對資料表 DC02 執行解壓縮作業。
2. 再執行大量的匯入、修改等作業。
3. 最後,對資料表 DC02 執行壓縮作業。
*/
--01 執行解壓縮作業:DC02
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS 
SET STATISTICS TIME ON
GO
ALTER TABLE DC02 
 REBUILD
WITH (DATA_COMPRESSION = NONE);
GO

--02 執行大量的 UPDATE 作業:DC02
UPDATE DC02
SET TransactionType='A'

--03 執行壓縮作業:DC02
ALTER TABLE DC02 
 REBUILD
WITH (DATA_COMPRESSION = PAGE);
GO

/*
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 15 ms.

 SQL Server Execution Times:
   CPU time = 1046 ms,  elapsed time = 924 ms.

 SQL Server Execution Times:
   CPU time = 1046 ms,  elapsed time = 992 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 2 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 952 ms,  elapsed time = 1178 ms.

(453772 個資料列受到影響)
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 2 ms.

 SQL Server Execution Times:
   CPU time = 4555 ms,  elapsed time = 2391 ms.

 SQL Server Execution Times:
   CPU time = 4555 ms,  elapsed time = 2395 ms.

*/
請參考下圖所示:

可以觀察到,採取分段式作法,對啟用壓縮資料表的執行大量的 UPDATE 作業,耗用約 4 秒多的時間。


參考資料:
建立壓縮資料表及索引
http://msdn.microsoft.com/zh-tw/library/cc280449.aspx

Creating Compressed Tables and Indexes
http://msdn.microsoft.com/en-us/library/cc280449.aspx