Similar to Faros Managed Sources, Airbyte connections can be run in either full sync mode or incremental sync mode. A full sync completely refreshes the data each time the Airbyte connection runs, while an incremental sync updates only new and changed data. Depending on the Airbyte deployment option you chose, one of the following applies:
To run an Airbyte connection incrementally, there needs to be a notion of state to determine what should be updated. For users of the CLI, this takes the form of a state file. An Airbyte connection will do an incremental sync if it can find a matching state file.
- Append the argument --state myConnectionNameState.json to your CLI invocation
- Keep this file name the same for each run of that Airbyte connection
- Make sure each Airbyte connection has its own state file
- Always run your Airbyte connection in the folder that houses your state file
- Make sure you persist the file somewhere permanent, e.g. an S3 bucket
- If you wish to do a full sync, delete the state file
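The naming and full-sync notes above can be sketched as two small shell helpers. This is only an illustration: state_file_for and force_full_sync are hypothetical names, not part of the Airbyte CLI, and the `<connection name>State.json` pattern simply mirrors the example file name used in this section.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical and not part of the Airbyte CLI.

# Derive a stable, per-connection state file name,
# e.g. myConnectionName -> myConnectionNameState.json
state_file_for() {
  printf '%sState.json\n' "$1"
}

# Force the next run to be a full sync by deleting the local state file.
# rm -f does not fail if the file is already absent.
force_full_sync() {
  rm -f -- "$(state_file_for "$1")"
}
```

A run would then pass `--state "$(state_file_for myConnectionName)"` to the CLI, so the name stays the same from run to run. If the state file is also persisted remotely, the remote copy would need to be removed as well (for S3, with aws s3 rm), otherwise the next download restores it.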
For example, when using AWS S3:
# Download the current state file
$ aws s3 cp s3://<my-bucket>/myConnectionNameState.json ./myConnectionNameState.json
# Run your connector with the --state argument
$ ./airbyte-local-cli.sh ... --state myConnectionNameState.json
# Persist the updated state file in the bucket
$ aws s3 cp ./myConnectionNameState.json s3://<my-bucket>/myConnectionNameState.json
As with running the CLI command manually, a scheduled Airbyte connection will do an incremental sync if it can find a matching state file.
- Follow naming conventions and notes from the section above
- In your scheduled job, before the Airbyte connection runs, make sure you pull the file back into the execution folder and that its name matches the CLI state argument
- In your scheduled job, after the Airbyte connection has run, make sure you persist the file somewhere permanent, e.g. an S3 bucket
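A scheduled job following these notes might wrap the three steps in one function, as in the minimal sketch below. The function name run_incremental_sync and the variables AIRBYTE_CLI and STATE_BUCKET are hypothetical; only aws s3 cp and the --state argument come from this page. The sketch also assumes that any failed download means no state has been saved yet, and that running without a state file results in a full sync, as the notes above suggest.

```shell
#!/bin/sh
# Sketch of a scheduled-job wrapper. Function and variable names are hypothetical;
# only `aws s3 cp` and the --state argument come from the steps above.
set -eu

AIRBYTE_CLI="${AIRBYTE_CLI:-./airbyte-local-cli.sh}"
STATE_BUCKET="${STATE_BUCKET:-s3://<my-bucket>}"

run_incremental_sync() {
  state_file="$1"   # e.g. myConnectionNameState.json
  shift             # remaining arguments are passed to the CLI unchanged

  # 1. Pull the last persisted state into the execution folder.
  #    Assumption: any download failure is treated as "no saved state yet".
  #    A stricter job might distinguish a missing object from other errors.
  aws s3 cp "$STATE_BUCKET/$state_file" "./$state_file" \
    || echo "No saved state for $state_file; continuing without it" >&2

  # 2. Run the connection, using the same state file name on every run.
  "$AIRBYTE_CLI" "$@" --state "$state_file"

  # 3. Persist the updated state for the next run.
  #    With set -e, this step is skipped if the CLI run above failed.
  aws s3 cp "./$state_file" "$STATE_BUCKET/$state_file"
}
```

The job would call `run_incremental_sync myConnectionNameState.json ...`, where `...` stands for the connector arguments you already pass to the CLI. Skipping the upload after a failed run is a design choice in this sketch: it keeps the last known good state in the bucket.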
When setting up the streams for your connection, set the sync mode. Choose between Full refresh and Incremental. The Destination can be set to Append for both.
Some streams may not support incremental syncs, which means those particular models will always be fully synced. Leave those as they are and set all streams that support it to incremental mode.
If you ever want to force a full sync of your data, select the refresh your data option in your connection. This will do a complete re-sync of all data, then continue updating data as determined by the sync modes of your source.