Resilient pipelines are designed to gracefully handle:
- Interruptions to the running pipeline.
- Constraints in external systems, such as API limits and quotas.
- Errors interacting with external systems.
- Modifications to the inputs of the pipeline.
- Source schema changes.
This article discusses strategies and techniques for building pipelines that operate continuously with minimal user intervention.
Idempotent Pipelines
Pipelines in Syncari may encounter interruptions due to factors such as machine failure or a new release of the Syncari product, which are typical in a software environment. To mitigate potential data loss, Syncari ensures that the pipeline's watermark remains unchanged during interruptions. Consequently, the next pipeline run may process the same data again, making idempotent pipelines necessary. An idempotent pipeline yields the same result whether it processes the same data once or multiple times.
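To make the property concrete, here is a toy illustration in plain Python (not Syncari-specific): a normalization step that produces the same output no matter how many times it is applied to the same record.

```python
# Idempotence in miniature: applying the same normalization to the same
# record twice yields the same result as applying it once.
def normalize(record: dict) -> dict:
    normalized = dict(record)
    normalized["email"] = normalized.get("email", "").strip().lower()
    return normalized

rec = {"email": "  Jane@Example.COM "}
assert normalize(rec) == normalize(normalize(rec))  # f(x) == f(f(x))
```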
Two factors may affect pipeline idempotence:
Side Effects due to Actions
Typically, Actions are idempotent, but this may not always be the case, especially with Custom Actions. Consider a Custom Action that creates a record in another system. If this Action is called multiple times, subsequent calls may fail because the record has already been created.
There are two approaches to address this issue. First, you can design the pipeline to ignore this error and continue. Alternatively, you can use another Custom Action to verify that the record exists before invoking the initial Custom Action. These two approaches are only demonstrations of how Action failures can be addressed; the particular approach you adopt will vary depending on the specific Action being used and the business problem that the pipeline aims to solve.
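To make the second approach concrete, here is a minimal sketch, assuming a hypothetical external REST API with a /records endpoint; the base URL, endpoint, and field names are placeholders, not Syncari's Custom Action interface:

```python
import requests

BASE_URL = "https://api.example.com"  # hypothetical external system

def create_record_idempotently(external_id: str, payload: dict) -> None:
    # Check whether the record already exists before creating it, so that
    # re-running the same sync cycle does not raise a duplicate error.
    resp = requests.get(f"{BASE_URL}/records", params={"externalId": external_id})
    resp.raise_for_status()
    if resp.json():  # record already exists; nothing to do
        return
    resp = requests.post(f"{BASE_URL}/records",
                         json={"externalId": external_id, **payload})
    resp.raise_for_status()
```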
Complex Pipelines
Complex pipelines with long sync cycles pose two problems:
- The likelihood of a pipeline being interrupted mid-sync cycle is higher.
- The pipeline lags, because interruptions slow the watermark's forward progress.
We can avoid this by splitting a single pipeline into multiple pipelines. A common tactic is to have the pipeline update another entity as part of its data processing; the pipeline associated with that entity can then handle some of the necessary business logic and run independently of the first pipeline, reducing the first pipeline's sync cycle duration. Note that splitting pipelines is not always feasible and depends heavily on the specific business use case that the pipeline is designed to address.
API Limits and Quotas
Certain Synapses have defined API limits and quotas for integrations. When Pipelines exceed these limits, read or write operations may fail, potentially causing slowdowns or even stalling the pipeline altogether. These errors can become particularly noticeable when multiple Pipelines are simultaneously reading from or writing to the same Synapse, causing a rapid increase in API usage against the set limit.
Reading from Sources
One approach to this problem is to ensure that multiple pipelines are not reading from the source Synapse at the same time. This can be achieved by assigning a different schedule to each pipeline; schedules can be configured on the Source nodes of the pipelines. For instance, in the following example, a source schedule is set to trigger every 5 minutes starting at 1 minute past the hour (the expression 1/5 * * * *). Another source could be scheduled at 2 minutes past the hour using the expression 2/5 * * * *. Staggering the schedules this way ensures that the pipelines never read from the source Synapse at the same time.
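As an illustration of the staggering idea, here is a minimal sketch (plain Python, not a Syncari feature) that computes non-overlapping cron expressions of the form used above for pipelines sharing one source Synapse:

```python
# "M/I * * * *" fires every I minutes starting at minute M past the hour,
# matching the expression style shown above. Distinct offsets guarantee
# the pipelines never poll the source Synapse in the same minute.
def staggered_cron(num_pipelines: int, interval_minutes: int = 5) -> list:
    if num_pipelines >= interval_minutes:
        raise ValueError("not enough minute offsets to stagger every pipeline")
    return [f"{offset}/{interval_minutes} * * * *"
            for offset in range(1, num_pipelines + 1)]

print(staggered_cron(3))  # ['1/5 * * * *', '2/5 * * * *', '3/5 * * * *']
```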
Using schedules on the sources can slow data processing and, in some cases, produce inconsistent results, particularly when you are unifying data from multiple sources. It's advisable to use this solution only when you encounter API limits that cannot be raised.
Writing to Destinations
API limits can also be hit when writing data from pipelines to destination Synapses. One way to avoid this is to restrict the number of API calls made within an hour via the "Sync Rate Limit" configuration, which can be found by editing the Synapse configuration as shown below. Note that although this setting lives on the Synapse, the rate limit itself is applied per pipeline. To determine the appropriate value, divide the Synapse's API limit by the number of pipelines writing to it. For example, if four pipelines write to the same destination Synapse and the Synapse's API limit is 3600 requests per hour, set the "Sync Rate Limit" to 900 records per hour.
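The arithmetic behind that example, as a minimal sketch (plain Python, not a Syncari API):

```python
# Split the destination Synapse's hourly API quota evenly across the
# pipelines that write to it to pick a "Sync Rate Limit" value.
def sync_rate_limit(api_limit_per_hour: int, num_pipelines: int) -> int:
    return api_limit_per_hour // num_pipelines

print(sync_rate_limit(3600, 4))  # 900 records per hour per pipeline
```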
Input Handling
Another aspect of building resilient pipelines is the ability to handle a variety of inputs. We discuss a few approaches below. While these methods may not be universally applicable, it is useful to keep them in mind as you build pipelines.
- Case-insensitive comparisons - Whenever you use the "Equals" operator in a function like Decision, Lookup Syncari Record, or Attach Record, consider using the "Equals Ignore Case" operator if you expect text with varying cases.
- Handle the fallback case - Consider incorporating a fallback case in the Decision Condition when configuring Decision functions. For instance, when evaluating the possible values of an input such as a picklist, it may be useful to include an extra Decision to manage any unforeseen picklist values.
- Handle null or blank values - You can handle null/blank values either with appropriate Decision functions or by rejecting or accepting such values in the Syncari field configuration, as shown below. A sketch of all three patterns follows this list.
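Syncari's functions are configured in the UI, but the three defensive patterns above look like this in code form; a minimal sketch in plain Python, with hypothetical status values chosen for illustration:

```python
from typing import Optional

def classify_status(status: Optional[str]) -> str:
    # Handle null or blank values first.
    if status is None or status.strip() == "":
        return "unknown"
    # Case-insensitive comparison (the "Equals Ignore Case" idea).
    normalized = status.strip().lower()
    if normalized == "active":
        return "active"
    if normalized == "inactive":
        return "inactive"
    # Fallback case for unforeseen picklist values.
    return "needs_review"

assert classify_status("ACTIVE") == "active"
assert classify_status(None) == "unknown"
assert classify_status("Pending") == "needs_review"
```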
Handling Schema Changes
If the source schema regularly introduces new fields in an entity and you wish to seamlessly synchronize them with Syncari and onward to another destination Synapse, consider using "Schema Sync". This ensures that you do not have to manually add mappings in the Pipeline for new fields. Below is an example of how to configure Schema Sync; here, the Salesforce Account entity is mapped to the Snowflake Account entity through the Syncari Account entity.
Note that Syncari automatically synchronizes other changes to the source schema, such as deleted fields and modified fields.