Reading Azure Blob Storage Data – SQL Server 2017 – Polybase

In this article we will go over on how to Read Azure blob Storage Data from SQL Server (SSMS)

  • Setup Azure Storage account
  • Grant permission to access the Storage container.
  • Connect to Azure Storage from SSMS ( Storage Account and Account Key Required).
  • Install Polybase feature on the server.(If do not exists)
  • PolyBase Connectivity Configuration.
  • Create Master Key
  • Create Database Scoped Credential
  • Create External Data Source
  • Create External File Format
  • Create External Table
  • Accessing External Table
  • Accessing External Table with Joins

Setup Azure Storage account

Login to Azure Portal and Click on [Crete a Resource +]. In the search box type [Storage] and [Select Storage account] under Featured.

Create Storage Account and secure access to it.Save the Storage account keys safely locally , you can find the keys Under Settings–> Access Keys

Under Blob Service–>Containers, Create a storage container

Grant permission to access the Storage container

Secure your container (Read about Storage Security : https://docs.microsoft.com/en-us/azure/storage/common/storage-security-guide)

Access the Azure Blob Storage locally, download and install  Microsoft Azure Storage Explorer

Connect Azure Storage Explorer to the Azure Storage. You can connect to Azure Storage using Azure Account,Using a connection String or a shared access signature URI or Use a storage name and key.

Connect to Azure Storage from SQL Server Management Studio ( Storage Account and Account Key Required).

Under Object Explorer ,Select Azure Storage and provide the Storage Account and Account Key (Use secure endpoints (HTTPS) for connection.

Once authenticated successfully, you will be connected to the Azure Storage Account

Install Polybase feature on the server.(If do not exists)

To Access Azure Blob Storage from SQL Server , you require SQL Server Polybase feature installed. Read here on : How to Install SQL Server Ploybase feature?

PolyBase Connectivity Configuration

Read more about  Hadoop connectivity settings and their corresponding supported Hadoop data sources here 

I am using Option 7: Accessing Azure Blob Storage on Windows Server

  • Option 7: Hortonworks 2.1, 2.2, 2.3, 2.4, 2.5, and 2.6 on Linux
  • Option 7: Hortonworks 2.1, 2.2, and 2.3 on Windows Server
  • Option 7: Azure blob storage (WASB[S])

sp_configure @configname = 'hadoop connectivity', @configvalue = 7;  
GO  

RECONFIGURE  
GO

Create Master Key

Provide a strong password.

CREATE MASTER KEY ENCRYPTION BY PASSWORD='P@ssw0rd';

Query Keys

SELECT * FROM sys.key_encryptions

Create Database Scoped Credential

Create Database Scoped Credential for Azure Storage Connectivity against your user database.

USE [USER DATABASE NAME]
GO
CREATE DATABASE SCOPED CREDENTIAL  Credential__Name
WITH IDENTITY = 'Azure Storage Account',
    SECRET = 'Storage Key'
;

Create External Data Source

Creates an external data source for PolyBase. Read more here

USE [USER DATABASE NAME]
GO


CREATE EXTERNAL DATA SOURCE [DataSourceName] WITH (TYPE = HADOOP, LOCATION = N'wasbs://containername@storage_account_name.blob.core.windows.net', CREDENTIAL = [Azure_Database_Scoped_Credential])
GO

Create External File Format

Creates an External File Format object defining external data stored in Hadoop, Azure Blob Storage, or Azure Data Lake Store. Creating an external file format is a prerequisite for creating an External Table. By creating an External File Format, you specify the actual layout of the data referenced by an external table.

PolyBase supports the following file formats:

  • Delimited Text
  • Hive RCFile
  • Hive ORC
  • Parquet

Read more about External File Format and options here

USE [USER DATABASE NAME]
GO

CREATE EXTERNAL FILE FORMAT [FILE FORMAT NAME] WITH 
(
FORMAT_TYPE = DELIMITEDTEXT, 
FORMAT_OPTIONS 
  (
   FIELD_TERMINATOR = N',', 
   USE_TYPE_DEFAULT = False
  )
)
GO

Create External Table

To access the data stored in Azure Blob Storage or Hadoop you need to Create an external table for each of the files.

Read more about External Table Format and options here

USE [USER DATABASE NAME]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER OFF
GO

CREATE EXTERNAL TABLE [dbo].[EXTERNAL TABLE NAME]
(
  [COLUMN NAME] [varchar](50) NULL,
  [COLUMN NAME] [varchar](100) NULL,
  [COLUMN NAME] [bit] NULL,
  [COLUMN NAME] [int] NULL	
)
WITH 
    (
     DATA_SOURCE = [DATA SOURCE NAME],
     LOCATION = N'/FILENAME.csv',
     FILE_FORMAT = [FILE FORMAT NAME],
     REJECT_TYPE = VALUE,
     REJECT_VALUE = 1
    )
GO

Accessing External Table

External Tables can be accessed as normal tables.External tables are just for reading.Insert,Update and Delete are not allowed. I suggest not to use “** while accessing the blob data as azure has limiation on Row Size

Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". 107093;Row size exceeds the defined Maximum DMS row size: [42136 bytes] is larger than the limit of [32768 bytes]
SELECT [COLUMN NAME 1],[COLUMN NAME 2] , [COLUMN NAME 3] FROM EXTERNAL_TABLE_NAME

Accessing External Table with Joins

Join used for native sql server tables are used for external tables as well.

USE [USER DATABASE NAME]
GO
SELECT AUT.[COLUMN NAME 1],AUT.[COLUMN NAME 2],AUT.[COLUMN NAME 3],AUT.[COLUMN NAME 4],TAT.[COLUMN NAME 5],
 FROM [EXTERNAL TABLE NAME 1 ] AUT 
        INNER JOIN 
      [EXTERNAL TABLE NAME 2] TAT 
        ON
      AUT.[COLUMN NAME 1]=TAT.[COLUMN NAME 1]

 

Leave a Reply

Your email address will not be published. Required fields are marked *