Indexes are used as entry points for memory-optimized tables.Data in memory is quickly read by index to locate in-memory data.
A hash index consists of a collection of buckets organized in an array. A hash function maps index keys to corresponding buckets in the hash index. The following figure shows three index keys that are mapped to three different buckets in the hash index. For illustration purposes the hash function name is f(x).
The hashing function used for hash indexes has the following characteristics:
- SQL Server has one hash function that is used for all hash indexes.
- The hash function is deterministic. The same index key is always mapped to the same bucket in the hash index.
- Multiple index keys may be mapped to the same hash bucket.
- The hash function is balanced, meaning that the distribution of index key values over hash buckets typically follows a Poisson distribution.
Poisson distribution is not an even distribution. Index key values are not evenly distributed in the hash buckets. For example, a Poisson distribution of n distinct index keys over n hash buckets results in approximately one third empty buckets, one third of the buckets containing one index key, and the other third containing two index keys. A small number of buckets will contain more than two keys.
If two index keys are mapped to the same hash bucket, there is a hash collision. A large number of hash collisions can have a performance impact on read operations.
The in-memory hash index structure consists of an array of memory pointers. Each bucket maps to an offset in this array. Each bucket in the array points to the first row in that hash bucket. Each row in the bucket points to the next row, thus resulting in a chain of rows for each hash bucket, as illustrated in the following figure.
The figure has three buckets with rows. The second bucket from the top contains the three red rows. The fourth bucket contains the single blue row. The bottom bucket contains the two green rows. These could be different versions of the same row.
Create In-Memory OLTP table
USE [AdventureWorks2014] SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [Sales].[SalesOrderDetail_imoltp4] ( [SalesOrderID] [int] NOT NULL, [SalesOrderDetailID] [int] IDENTITY(1,1) NOT NULL, [CarrierTrackingNumber] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [OrderQty] [smallint] NOT NULL, [ProductID] [int] NOT NULL, [SpecialOfferID] [int] NOT NULL, [UnitPrice] [money] NOT NULL, [UnitPriceDiscount] [money] NOT NULL CONSTRAINT [DF_SalesOrderDetail_UnitPriceDiscount4] DEFAULT ((0.0)), [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_SalesOrderDetail_ModifiedDate4] DEFAULT (getdate()), INDEX [IX_SalesOrderDetail_ProductId_4] NONCLUSTERED ( [ProductID] ASC ), CONSTRAINT [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID_IN4] PRIMARY KEY NONCLUSTERED HASH ( [SalesOrderID], [SalesOrderDetailID] )WITH ( BUCKET_COUNT = 128) )WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA ) GO
Created PRIMARY KEY NONCLUSTERED HASH Index with BUCKET_COUNT=128. BUCKET_COUNT indicates the no of buckets in the hash index.In this case we have 128 buckets.The max value we can have for BUCKET_COUNT=1,073,741,824
CONSTRAINT [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID_IN4] PRIMARY KEY NONCLUSTERED HASH ( [SalesOrderID], [SalesOrderDetailID] )WITH ( BUCKET_COUNT = 128)
using Dynamic Management view we can monitor how BUCKET_COUNT is used.sys.dm_db_xtp_hash_index_stats
select * from sys.dm_db_xtp_hash_index_stats
HASH Index Statistics
Execute the below code to see how BUCKET_COUNT is used.
go SELECT object_name(hs.object_id) AS 'object name', i.name as 'index name', i.index_id, hs.total_bucket_count, hs.empty_bucket_count, floor((cast(empty_bucket_count as float)/total_bucket_count) * 100) AS 'empty_bucket_percent', hs.avg_chain_length, hs.max_chain_length FROM sys.dm_db_xtp_hash_index_stats AS hs JOIN sys.indexes AS i ON hs.object_id=i.object_id AND hs.index_id=i.index_id go
As you can see PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID_IN4 Primary Key Hash Index has a total of 128 BUCKET_COUNT, and as we insert the records empty_bucket_count will decrease, max_chain_length column will show no of records in each bucket.
After inserting few millions of records all the BUCKET_COUNT where used and each BUKCET has 90156 rows