flowgraph_example

SAP HANA Smart Data Access, SAP HANA Smart Data Integration and SAP HANA Smart Data Quality

Introduction

SAP HANA is a column-based relational database management system built on new in-memory technology. All the data is kept in-memory and since disk access is slower than memory access, it’s ten to a thousand times faster compared to traditional disk-optimized databases. That enables the processing of high volumes of real time data, whether analytical or transactional.

It provides multiple different layers that cover the entire enterprise solution within a single installation, so we can’t say it is just a database, it’s also a platform.

SAP HANA can be used in a classical 3-tier or 2-tier architecture. The 3-tier architecture consists of a client, an application server and a database layer and the 2-tier architecture has an application server built into the SAP HANA Extended Application Services (SAP HANA XS). Apart from a full featured application server, SAP HANA XS also embeds a Web server and a development environment within the SAP HANA appliance itself. SAP HANA XS isn’t a separate software, it’s actually an extension that is tightly integrated into SAP HANA.

It is available as an on-premise appliance, in the Cloud, as Platform-as-a-Service (PaaS) or as a hybrid model that combines cloud and on-premise instances.

SAP HANA Smart Data Access is a feature that provides the possibility to virtually access data from various remote sources without the need for previous movement or replication of data to SAP HANA. Virtual tables’ data types are mapped to SAP HANA data types and they are handled as SAP HANA local tables. After the connection to a remote source has been established, SQL queries can be run against virtual tables and can be combined with other existing physical SAP HANA tables or other SDA sources. One of the use cases for SAP HANA Smart Data Access is archiving cold data in an alternate source and keeping active hot and warm data in SAP HANA.

SAP HANA Smart Data Integration and Smart Data Quality features together provide an integrated ETL mechanism and therefore replace the need for a separate ETL tool. They were launched for the first time in Service Pack 09 and further enhanced in following releases.

SAP HANA Smart Data Integration capabilities are used for analytics, data migration and batch or real-time integration from multiple sources. Real-time replication is also available and supported by a built-in RSS adapter. Smart Data Integration transformations include basic and advanced SQL-oriented transformations, transformations that handle the data movement lifecycle and transformations for code execution.

Advanced Smart Data Quality transformations support data quality functionalities like parsing, standardizing and enriching the information on dimensional attributes as well as global address cleansing, and geocode enrichment for countries worldwide.

Flowgraph is a development object used for creating data flows with Smart Data Integration and Smart Data Quality transformations. Modeling of flowgraph starts with its creation and concludes with activation and execution of generated task or stored procedure.

In the example, I will show how to establish a remote connection to an SQL Server database, create a virtual table, create a flowgraph, transform and join data from virtual tables and store it into a SAP HANA table.

Example

  1. Creating a Remote Source

The first step towards establishing a remote connection is creating and configuring an ODBC connection from SAP HANA to a remote database. The prerequisite for this action are previously installed drivers on the SAP HANA system for the desired database.

In SAP HANA Studio, there are two ways to create a remote source, by GUI or using an SQL statement. Creating a remote data source with GUI starts with expanding the Provisioning folder, then right clicking the Remote Sources and selecting New Remote Source… option from the context menu. After selecting the appropriate adapter from the drop down menu, the connection and authentication details should be entered. The connection details for the MSSQL source used in this example are shown in the picture below.

 

MSSQL connection details
MSSQL connection details

Once the remote source connection setup has been successfully completed, the data source is listed under remote sources and users and tables are visible under it.

  1. Creating a virtual table

Virtual tables can also be created via GUI or in the SQL Console. This time I will use the following SQL statement to create an virtual table for Employee dimension:

CREATE VIRTUAL TABLE MSSQL_Employee AT MSSQL.AdventureWorksDW.dbo.Employee

After creating all of the needed virtual tables, in this case Sales Territory dimension, we can move on to creating a Flowgraph Model. Both virtual tables used for the purposes of this example are from the AdventureWorks DW sample database.

Data previews of the Employee and the Sales Territory virtual tables are shown on the pictures below.

 

employee_table
Employee table
sales_territory
Sales Territory table
  1. Creating a Flowgraph Model

In the Project Explorer view of SAP HANA Development perspective, we can right-click on the existing project and choose New->Other->Flowgraph Model placed under SAP HANA->Database Development folder. We have to name our flowgraph and, in this case, we will select the Flowgraph for Activation as Task Plan option.

Task plan can be run in batch or in real-time mode, but stored procedure can only run in batch mode.

The new flowgraph with an empty container opens and we can now start transforming data. It’s important that the right Target Schema is selected. The Node Palette offers a set of nodes – functional elements, each of which represents a different functionality – data source, data sink or operator.

In this simple example I have used the Data Source, Filter, Join, Data Sink (Template Table) Pallete elements.

Template table is a target table that is created matching its input columns that are product of previous transformations.

flowgraph_example
Flowgraph Example

The first two Data Source nodes represent the MSSQL_D_Employee and MSSQL_D_SalesTerritory virtual tables. The Data Source MSSQL_D_Employee is followed by a Filter node used to limit the rows to those that meet the “SalesPersonFlag”=1 and “Status”=’Current’ conditions.

The Join node joins (inner join) both virtual tables by SalesTerritoryKey and is used to filter out unnecessary columns. Afterwards, the output rows will be written into an Employee_SalesTerritory template table in SAP HANA. To run the described action we need to activate our flowgraph and then execute it as a HANA task.

After this action has been completed, the template table Employee_SalesTerritory is created and loaded with data (data preview is shown on the picture below).

employe_sales_territory
Employee_SalesTerritory table

Conclusion

As an in-memory solution and due to its powerful capabilities that fully surpass traditional RDBMS’, SAP HANA enables business of the future.

SAP HANA Smart Data Access, SAP Smart Data Integration and SAP Smart Data Quality features simplify IT environment and accelerate performance considering everything is done under the same platform and is powered by SAP HANA’s strong capabilities.

2 thoughts on “SAP HANA Smart Data Access, SAP HANA Smart Data Integration and SAP HANA Smart Data Quality

  • Good article!!
    How we can use sdi for virtual tables but with explicit filter.
    lets say source has data for year 2015,2016,2017. I want to access data virtually but it should only 2017 when I does data preview or query on virtual table.
    I need to know using sdi is this possible ? I do not want to store data in hana it should be virtual only

    • Thanks! You Can filter virtual table for a specific year in a mapping, or you can create a view in the database and pull it through sda.

Leave a Reply

Your email address will not be published. Required fields are marked *