Streamlining Data Integration with the Azure SAP CDC Connector

The Azure SAP CDC Connector is a powerful tool that simplifies data integration between SAP and Azure services.

It supports real-time data replication, which is essential for businesses that require up-to-the-minute insights.

With the Azure SAP CDC Connector, you can replicate data from SAP systems to Azure Synapse Analytics, Azure Data Lake Storage, and other supported services.

This enables you to make data-driven decisions quickly and efficiently.

The connector uses Change Data Capture (CDC) technology to capture changes made to SAP data in real time, reducing latency and increasing data accuracy.

This is particularly useful for businesses that require high-speed data processing and analysis.

Prerequisites

To use the SAP CDC capabilities in Azure Data Factory, you'll need to set up SAP systems to use the SAP Operational Data Provisioning (ODP) framework. This is a crucial step before moving forward.

You'll also need a self-hosted integration runtime for the SAP CDC connector. This will allow you to connect to your SAP systems and extract data.

Next, you'll need to set up an SAP CDC linked service. This will enable you to define the connection to your SAP systems and configure the data extraction process.

Knowing how to debug issues with the SAP CDC connector is also a must. To resolve problems that arise, you may need to send self-hosted integration runtime logs to Microsoft.

Lastly, it's essential to be familiar with monitoring data extractions on SAP systems. This will help you troubleshoot any issues and ensure that your data is being extracted correctly.

Here are the prerequisites in a nutshell:

  • Set up SAP systems to use the SAP Operational Data Provisioning (ODP) framework.
  • Set up a self-hosted integration runtime for the SAP CDC connector.
  • Set up an SAP CDC linked service (a sketch follows this list).
  • Debug issues with the SAP CDC connector by sending self-hosted integration runtime logs to Microsoft.
  • Be familiar with monitoring data extractions on SAP systems.
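
To make the linked service step more concrete, here's a minimal sketch of what an SAP CDC linked service definition might look like in Data Factory JSON. The connector's linked service type is SapOdp; the server, system number, client, user, and subscriber values below are illustrative placeholders, so check the Microsoft documentation for the exact schema for your connection type.

{
  "name": "SapCdcLinkedService",
  "properties": {
    "type": "SapOdp",
    "typeProperties": {
      "server": "sap-app-server.example.com",
      "systemNumber": "00",
      "clientId": "100",
      "language": "EN",
      "userName": "SAPUSER",
      "password": { "type": "SecureString", "value": "<password>" },
      "subscriberName": "ADF_SUBSCRIBER"
    },
    "connectVia": {
      "referenceName": "SelfHostedIntegrationRuntime",
      "type": "IntegrationRuntimeReference"
    }
  }
}

The subscriber name you choose here is the value you'll later enter in ODQMON when monitoring extractions, so pick something recognizable.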

SAP Capabilities

The SAP CDC connector can connect to all SAP systems that support ODP, including SAP ECC, SAP S/4HANA, SAP BW, and SAP BW/4HANA.

This connector is the core of the SAP CDC capabilities, and it can work directly at the application layer or indirectly via an SAP Landscape Transformation Replication Server (SLT) as a proxy.

The SAP CDC connector extracts data that includes not only physical tables but also logical objects created using tables, such as SAP Advanced Business Application Programming (ABAP) Core Data Services (CDS) views.

It's worth noting that the SAP CDC connector doesn't rely on watermarking to extract SAP data, whether fully or incrementally.

The SAP CDC capabilities in Data Factory use the SAP Operational Data Provisioning (ODP) framework to replicate the delta in an SAP source dataset.

Configuring the SAP CDC Connector

To extract data from SAP, a self-hosted integration runtime is required, which you install on an on-premises computer or a virtual machine with a line of sight to your SAP source systems or your SLT server.

The SAP CDC connector uses the SAP ODP framework to extract various data source types, including SAP extractors, ABAP CDS views, InfoProviders and InfoObjects datasets, and SAP application tables.

Staging storage must also be configured in the mapping data flow activity: the self-hosted integration runtime writes the data it extracts from SAP to this staging location, and the data flow then processes it from there.
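
As a rough sketch, staging is declared on the data flow activity in the pipeline definition. The names below (the data flow reference, the staging linked service AdlsGen2Staging, and the folder path) are hypothetical:

{
  "name": "RunSapCdcDataFlow",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataflow": {
      "referenceName": "SapCdcDataFlow",
      "type": "DataFlowReference"
    },
    "staging": {
      "linkedService": {
        "referenceName": "AdlsGen2Staging",
        "type": "LinkedServiceReference"
      },
      "folderPath": "sap-cdc/staging"
    }
  }
}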

The SAP ODP context, ODP name, and run mode can be parameterized in the Source options of the source transformation, while key columns are provided as an array of double-quoted strings, for example ["VBELN", "POSNR"]. Valid run-mode values are listed below, and a sketch of concrete parameter values follows the list.

Here are some valid parameter values for the run mode:

  • fullAndIncrementalLoad (a full load on the first run, then incremental changes)
  • fullLoad (a full load on every run)
  • incrementalLoad (incremental changes only)
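
To make this concrete, here's a hypothetical set of values a pipeline might pass into data flow parameters for these source options. The parameter names are ones you define yourself, ABAP_CDS is one of the standard ODP contexts, and Z_MY_CDS_VIEW stands in for your own ODP name:

{
  "sapOdpContext": "ABAP_CDS",
  "sapOdpName": "Z_MY_CDS_VIEW",
  "runMode": "fullAndIncrementalLoad",
  "keyColumns": ["VBELN", "POSNR"]
}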

In the Optimize tab, a source partitioning scheme can be defined via parameters to improve performance for full or initial loads. Two steps are typically required: defining the partitioning scheme as JSON, and ingesting it into the mapping data flow through a data flow parameter. Both are covered in the next section.

Data Partitioning

To define a source partitioning scheme in Azure Data Factory, you provide a JSON array of partition definitions, each of which is an array of individual filter conditions.

Each filter condition is a JSON object aligned with SAP's selection options, with properties fieldName, sign, option, low, and high; the format is similar to dynamic DTP filters in SAP BW. For example, a JSON definition with two partitions looks like this:

[
  [
    { "fieldName": "VBELN", "sign": "I", "option": "EQ", "low": "0000001000" }
  ],
  [
    { "fieldName": "GJAHR", "sign": "I", "option": "BT", "low": "2011", "high": "2015" }
  ]
]

The first partition selects the single document number 0000001000; the second selects the fiscal years 2011 through 2015 ("BT" stands for "between").

Azure Data Factory doesn't check these conditions for overlap, so it's up to you to ensure partition conditions don't overlap.

Partition conditions can be complex, combining multiple elementary filter conditions. Following SAP's selection-option semantics, including conditions on a field are combined with OR; excluding conditions are likewise combined with OR and then negated; and the resulting blocks are combined with AND. For example:

("BUKRS" = '1000' OR "BUKRS" = '1010') AND ("GJAHR" BETWEEN '2010' AND '2025') AND NOT ("GJAHR" = '2021' OR "GJAHR" = '2023')

Make sure to use the SAP internal format for low and high values, including leading zeroes, and express calendar dates as eight-character strings in the format "YYYYMMDD".

To ingest the partitioning scheme into a mapping data flow, create a data flow parameter, such as "sapPartitions", and convert the JSON into a string using the @string() function:

@string(
  [
    [
      { "fieldName": "VBELN", "sign": "I", "option": "EQ", "low": "0000001000" }
    ],
    [
      { "fieldName": "GJAHR", "sign": "I", "option": "BT", "low": "2011", "high": "2015" }
    ]
  ]
)

In the Optimize tab of the source transformation, select the Partition type "Source" and enter the data flow parameter in the Partition conditions property.

Troubleshooting and Monitoring

To monitor data extractions on SAP systems, start by running the ODQMON transaction code in the SAP Logon tool on your SAP source system. This will show you all registered subscriber processes in the operational delta queue (ODQ).

The SAP CDC connector in Data Factory reads delta changes from the SAP ODP framework, which are recorded in ODQ tables. You can check these tables to see whether the number of records provided on the SAP side matches the number of rows transferred by Data Factory. This helps you determine whether the issue is related to Data Factory or to the SAP-side configuration.

To delete an ODQ subscription, select the subscription and select the Delete icon. This will remove unconsumed data packages from the ODQ and stop SAP systems from tracking the subscription state.

Troubleshoot Delta Changes

Troubleshoot delta changes by first checking ODQMON to see if the number of records provided on the SAP side matches the number of rows transferred by Data Factory.

If the numbers match, the issue likely stems from an incorrect or missing configuration on the SAP side.

The SAP CDC connector in Data Factory reads delta changes from the SAP ODP framework, where deltas are recorded in ODQ tables.

A typical symptom is that mapping data flows finish without errors, yet the data isn't delivered correctly. In such scenarios, check whether the number of rows transferred by Data Factory matches the number of records provided on the SAP side.

Monitor Data Extractions

Monitoring data extractions is a crucial step in ensuring the smooth operation of your data flows.

To monitor data extractions on SAP systems, you can use the ODQMON transaction code in the SAP Logon tool.

You can see all registered subscriber processes in the operational delta queue (ODQ) by entering the value of the Subscriber name property from your SAP CDC linked service in the Subscriber field and selecting All in the Request Selection dropdown.

Subscriber processes represent data extractions from Azure Data Factory mapping data flows that use your SAP CDC linked service.

For each ODQ subscription, you can look at details to see all full and delta extractions.

To delete an ODQ subscription, select the subscription and then select the Delete icon.

Deleting ODQ subscriptions is essential once the Data Factory mapping data flows that extract SAP data are no longer needed: it removes unconsumed data packages from the ODQ and stops SAP systems from tracking the subscription state.

The SAP Architecture

The SAP architecture is a crucial part of how the Azure SAP CDC connector works. On the SAP side, the SAP ODP connector invokes the ODP API over standard Remote Function Call (RFC) modules to extract full and delta raw SAP data.

The SAP ODP connector is used to extract various data source types, including SAP extractors, ABAP CDS views, InfoProviders and InfoObjects datasets in SAP BW and SAP BW/4HANA, and SAP application tables when using an SAP LT replication server (SLT) as a proxy.

The SAP data sources are providers, which run on SAP systems to produce either full or incremental data in an operational delta queue (ODQ). The mapping data flow source is a subscriber of the ODQ.

Because providers are decoupled from subscribers, any SAP documentation that covers provider configuration applies equally when Data Factory is the subscriber.
