Airbyte Deployment Options

After completing the steps above, you will have pulled data from your source into your Faros graph. As you move forward with your Faros implementation, you will want to keep syncing your source data into Faros. This can be done in the following ways:

1. Manually re-run the commands from your machine when you want fresh data

We recommend starting with this option. It requires the least overhead and is a good way to get data into Faros initially.

2. Embed the commands in a scheduled job

If you have an existing scheduling/orchestration system, or are comfortable implementing one yourself, add your airbyte-local-cli command to a scheduled job. This will sync your data on a regular cadence. Note: for incremental syncs, you will need to manage the state file yourself.

3. Set up an Airbyte Server to schedule it for you

This solution offers better visibility, built-in state management, and a helpful UI for managing and maintaining several Airbyte connections in one place. Because of the overhead required to set up the server, it is only recommended for production implementations. An Airbyte server can run either on a single host/VM (e.g., EC2) or on Kubernetes.
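For option 2, the state-file note is the part that tends to go wrong in practice. The following is a minimal sketch of one conservative way to handle it, assuming your sync command reads and updates a single state file on its own. The helper name `run_sync`, the `.bak` backup suffix, and the rollback policy are illustrative assumptions, not part of airbyte-local-cli.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Sequence


def run_sync(command: Sequence[str], state_path: Path) -> int:
    """Run one sync command and keep the state file consistent.

    `command` is whatever you already run by hand (hypothetical here);
    it is expected to read and update `state_path` itself.
    """
    backup = state_path.with_name(state_path.name + ".bak")
    had_state = state_path.exists()
    if had_state:
        # Snapshot the last known-good state before the run touches it.
        shutil.copy2(state_path, backup)

    result = subprocess.run(list(command))

    if result.returncode != 0:
        # Failed run: roll back so the next run resumes from the last good state.
        if had_state:
            shutil.copy2(backup, state_path)
        else:
            state_path.unlink(missing_ok=True)
    return result.returncode
```

Your scheduler (cron, an orchestrator task, and so on) would call `run_sync` with the same command you run manually and the path to its state file, and treat a non-zero return value as a failed sync. Rolling back on failure means the next run may re-read some records, which is usually preferable to skipping them.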

Summary comparison

| Airbyte deployment option | Infrastructure | Orchestration / Scheduling | Incremental syncs state management | Logs storage | Web UI & API | Setup complexity |
| --- | --- | --- | --- | --- | --- | --- |
| Manually execute airbyte-local-cli | Your machine | None | Custom | Custom | No | Simple |
| Scheduled job to execute airbyte-local-cli | Source and destination containers on a single host/VM (e.g., EC2) | Custom | Custom | Custom | No | Simple |
| Airbyte Server on a single host/VM | All containers on a single host/VM (e.g., EC2) | Airbyte | Database | Internal | Yes | Moderate |
| Airbyte Server on Kubernetes | Containers on custom/managed k8s cluster | Airbyte | Database | Internal or external | Yes | Complex |