Airbyte Deployment Options
After completing the above, you will have successfully pulled data from your source into your Faros graph. As you move forward with your Faros implementation, you will want to keep syncing your source data into Faros. This can be done in the following ways:
1. Manually re-run the commands from your machine when you want fresh data
We recommend starting here. This option requires the least overhead and works well for initially getting data into Faros.
2. Embed the commands in a scheduled job
If you have an existing scheduling/orchestration system, or are comfortable implementing one yourself, add your airbyte-local-cli command to a scheduled job. This will sync your data on a regular cadence. Note: for incremental syncs, you will need to manage the state file yourself.
3. Set up an Airbyte Server to schedule it for you
This solution offers better visibility, built-in state management, and a UI for managing and maintaining several Airbyte connections in one place. Because of the overhead required to set up the server, it is recommended only for production implementations. An Airbyte Server can run either on a single host/VM (e.g., EC2) or on Kubernetes.
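For option 2, the state-file note can be illustrated with a small wrapper that a scheduled job would call. This is a minimal sketch, not part of the Faros or Airbyte tooling: `run_sync`, its parameters, and the `.bak` backup naming are hypothetical, and the sync command is passed in as-is (use the same command you ran manually). It assumes one conservative policy: if a run fails, roll the state file back to its pre-run contents so the next run resumes from the last successful sync.

```python
import shutil
import subprocess
from pathlib import Path


def run_sync(command, state_file, runner=subprocess.run):
    """Run one sync and protect the state file against failed runs.

    command    -- full sync command as a list of strings (placeholder:
                  supply the command you already run by hand)
    state_file -- path where that command reads and writes its state
    runner     -- injectable for testing; defaults to subprocess.run
    Returns True if the command exited with status 0, else False.
    """
    state_file = Path(state_file)
    # Hypothetical naming convention for the snapshot: "<name>.bak".
    backup = state_file.with_name(state_file.name + ".bak")

    # Snapshot the last known-good state before the sync can modify it.
    had_state = state_file.exists()
    if had_state:
        shutil.copy2(state_file, backup)

    result = runner(command, check=False)
    if result.returncode == 0:
        return True

    # Failed run: restore the snapshot (assumed policy, see lead-in).
    if had_state:
        shutil.copy2(backup, state_file)
    return False
```

A scheduler such as cron or an existing orchestrator could then invoke a script that calls `run_sync([...], "state.json")` at the desired cadence and raises an alert when it returns `False`.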
Summary comparison
Airbyte deployment option | Infrastructure | Orchestration / scheduling | Incremental sync state management | Log storage | Web UI & API | Setup complexity |
---|---|---|---|---|---|---|
Manually execute airbyte-local-cli | Your machine | None | Custom | Custom | No | Simple |
Scheduled job to execute airbyte-local-cli | Source and destination containers on a single host/VM (e.g., EC2) | Custom | Custom | Custom | No | Simple |
Airbyte Server on a single host/VM | All containers on a single host/VM (e.g., EC2) | Airbyte | Database | Internal | Yes | Moderate |
Airbyte Server on Kubernetes | Containers on custom/managed k8s cluster | Airbyte | Database | Internal or external | Yes | Complex |