In this article we will go over how to read Azure Blob Storage data from SQL Server (SSMS):
- Set up an Azure Storage account
- Grant permission to access the storage container
- Connect to Azure Storage from SSMS (Storage Account and Account Key required)
- Install the PolyBase feature on the server (if it is not already installed)
- PolyBase connectivity configuration
- Create a Master Key
- Create a Database Scoped Credential
- Create an External Data Source
- Create an External File Format
- Create an External Table
- Accessing an External Table
- Accessing External Tables with joins
Set up an Azure Storage account
Log in to the Azure Portal and click [Create a Resource +]. In the search box, type [Storage] and select [Storage account] under Featured.
Create the storage account and secure access to it. Save the storage account keys safely locally; you can find the keys under Settings –> Access Keys.
Under Blob Service –> Containers, create a storage container.
Grant permission to access the Storage container
Secure your container (read about storage security: https://docs.microsoft.com/en-us/azure/storage/common/storage-security-guide).
To access the Azure Blob Storage locally, download and install Microsoft Azure Storage Explorer.
Connect Azure Storage Explorer to the Azure Storage account. You can connect using an Azure account, a connection string, a shared access signature (SAS) URI, or a storage account name and key.
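For the connection-string option, an Azure Storage connection string has the following shape (the placeholders in angle brackets stand for your own account name and key):

```
DefaultEndpointsProtocol=https;AccountName=<storage_account_name>;AccountKey=<storage_account_key>;EndpointSuffix=core.windows.net
```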
Connect to Azure Storage from SQL Server Management Studio (Storage Account and Account Key required)
In Object Explorer, select Azure Storage and provide the Storage Account and Account Key (use secure endpoints (HTTPS) for the connection).
Once authenticated successfully, you will be connected to the Azure Storage account.
Install the PolyBase feature on the server (if it is not already installed)
To access Azure Blob Storage from SQL Server, you need the SQL Server PolyBase feature installed. Read here: How to install the SQL Server PolyBase feature?
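Before installing, you can check whether PolyBase is already present on the instance using the documented `SERVERPROPERTY` flag:

```sql
-- Returns 1 if PolyBase is installed, 0 otherwise
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
```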
PolyBase Connectivity Configuration
Read more about Hadoop connectivity settings and their corresponding supported Hadoop data sources here
I am using Option 7, which includes accessing Azure Blob Storage from Windows Server:
- Option 7: Hortonworks 2.1, 2.2, 2.3, 2.4, 2.5, and 2.6 on Linux
- Option 7: Hortonworks 2.1, 2.2, and 2.3 on Windows Server
- Option 7: Azure blob storage (WASB[S])
```sql
sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
GO
RECONFIGURE;
GO
```
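Note that changing 'hadoop connectivity' requires a restart of SQL Server (and the PolyBase services) to take effect. You can confirm the configured and running values afterwards:

```sql
-- Check the configured value and the value currently in use
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'hadoop connectivity';
```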
Create Master Key
Provide a strong password.
```sql
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'P@ssw0rd';

-- Verify the key was created
SELECT * FROM sys.key_encryptions;
```
Create Database Scoped Credential
Create a Database Scoped Credential for Azure Storage connectivity in your user database.

```sql
USE [USER DATABASE NAME]
GO
CREATE DATABASE SCOPED CREDENTIAL [Credential_Name]
WITH IDENTITY = 'Azure Storage Account',
     SECRET = 'Storage Key';
GO
```
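You can confirm the credential exists through the standard catalog view (the secret itself is never exposed):

```sql
-- Database scoped credentials in the current database
SELECT name, credential_identity, create_date
FROM sys.database_scoped_credentials;
```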
Create External Data Source
Creates an external data source for PolyBase. Read more here
```sql
USE [USER DATABASE NAME]
GO
CREATE EXTERNAL DATA SOURCE [DataSourceName]
WITH (
    TYPE = HADOOP,
    LOCATION = N'wasbs://containername@storage_account_name.blob.core.windows.net',
    CREDENTIAL = [Azure_Database_Scoped_Credential]
);
GO
```
Create External File Format
Creates an External File Format object defining external data stored in Hadoop, Azure Blob Storage, or Azure Data Lake Store. Creating an external file format is a prerequisite for creating an External Table. By creating an External File Format, you specify the actual layout of the data referenced by an external table.
PolyBase supports the following file formats:
- Delimited Text
- Hive RCFile
- Hive ORC
Read more about External File Format and options here
```sql
USE [USER DATABASE NAME]
GO
CREATE EXTERNAL FILE FORMAT [FILE FORMAT NAME]
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = N',',
        USE_TYPE_DEFAULT = False
    )
);
GO
```
Create External Table
To access data stored in Azure Blob Storage or Hadoop, you need to create an external table for each of the files.
Read more about External Table Format and options here
```sql
USE [USER DATABASE NAME]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER OFF
GO
CREATE EXTERNAL TABLE [dbo].[EXTERNAL TABLE NAME]
(
    [COLUMN NAME 1] [varchar](50) NULL,
    [COLUMN NAME 2] [varchar](100) NULL,
    [COLUMN NAME 3] [bit] NULL,
    [COLUMN NAME 4] [int] NULL
)
WITH (
    DATA_SOURCE = [DATA SOURCE NAME],
    LOCATION = N'/FILENAME.csv',
    FILE_FORMAT = [FILE FORMAT NAME],
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 1
);
GO
```
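Once created, you can list the external table together with the data source and file format it is wired to, using the standard catalog views:

```sql
-- External tables and the objects they reference
SELECT t.name  AS external_table,
       t.location,
       ds.name AS data_source,
       ff.name AS file_format
FROM sys.external_tables AS t
JOIN sys.external_data_sources AS ds ON t.data_source_id = ds.data_source_id
JOIN sys.external_file_formats AS ff ON t.file_format_id = ff.file_format_id;
```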
Accessing External Table
External tables can be accessed like normal tables, but they are read-only: INSERT, UPDATE, and DELETE are not allowed. I suggest not using SELECT * while accessing the blob data, as Azure has a limitation on row size; exceeding it produces an error like:

```
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". 107093; Row size exceeds the defined Maximum DMS row size: [42136 bytes] is larger than the limit of [32768 bytes]
```
```sql
SELECT [COLUMN NAME 1], [COLUMN NAME 2], [COLUMN NAME 3]
FROM [EXTERNAL_TABLE_NAME];
```
Accessing External Table with Joins
Joins used for native SQL Server tables work for external tables as well.
```sql
USE [USER DATABASE NAME]
GO
SELECT AUT.[COLUMN NAME 1],
       AUT.[COLUMN NAME 2],
       AUT.[COLUMN NAME 3],
       AUT.[COLUMN NAME 4],
       TAT.[COLUMN NAME 5]
FROM [EXTERNAL TABLE NAME 1] AUT
INNER JOIN [EXTERNAL TABLE NAME 2] TAT
    ON AUT.[COLUMN NAME 1] = TAT.[COLUMN NAME 1];
```