Importing Data in Hybrid Deployment Mode
Introduction to Airbyte
Faros uses the open-source Airbyte tool to extract data from your source system and push it into your Faros cloud instance. Airbyte constructs a complete pipeline from three pieces: an Airbyte connection, an Airbyte source, and an Airbyte destination.
- Airbyte connection - This combines the provided destination and source into a complete pipeline. The name you provide for your connection is also used in Faros as the data's origin.
- Airbyte source - This part is responsible for connecting to your source system.
- Airbyte destination - This part is responsible for transforming data to match the Faros canonical schema and then pushing that data into Faros. All of your Airbyte sources will use the same Faros destination, each with a different configuration.
A Note on Terms:
Because hybrid mode leverages Airbyte, some Airbyte terms refer to different concepts than the Faros terms of the same name.
- A Faros source in the SaaS app is equivalent to an Airbyte connection in hybrid mode. Both import data from an external system into Faros.
- A Faros connection represents the authentication to that external system. In hybrid mode, this is a part of the Airbyte source.
Airbyte Connections
Supported Airbyte Sources
- Some specific sources are wrapped in a Faros Feeds source. These are more robust and better fit the scale and use case of Faros than their generic, community-maintained Airbyte counterparts.
- Supported Airbyte sources and the Faros destination can be found in our airbyte-connectors repository.
- It's also possible to develop and run your own source and extend the Faros destination.
Airbyte Destinations
All Faros connections leverage the same Faros destination, each with a different configuration.
Naming Conventions
The source container naming convention is the lowercased vendor name with an airbyte- prefix and a -source suffix, e.g., for VictorOps it's airbyte-victorops-source, for SquadCast it's airbyte-squadcast-source, etc. You can find all the available connectors on our Docker Hub org.
The Faros Feeds source is airbyte-faros-feeds-source.
The universal destination container is airbyte-faros-destination.
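The naming convention above can be sketched as a small shell helper. This is only an illustration: the function name `faros_source_image` is hypothetical, the `farosai/` prefix is taken from the docker commands used elsewhere on this page, and the helper only lowercases its input, so vendor names containing spaces would need extra handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any Faros tooling): derive a source image
# name from a vendor name by lowercasing it and adding the prefix and suffix.
faros_source_image() {
  local vendor_lower
  vendor_lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf 'farosai/airbyte-%s-source\n' "$vendor_lower"
}

faros_source_image VictorOps   # prints farosai/airbyte-victorops-source
faros_source_image SquadCast   # prints farosai/airbyte-squadcast-source
```

Check the result against the Docker Hub org before relying on it, since not every vendor necessarily has a connector.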
Help Commands
To better understand which arguments the Faros Airbyte source/destination takes, you can run the spec-pretty and airbyte-local-cli-wizard commands.
spec-pretty shows a table describing the arguments your source/destination needs.
airbyte-local-cli-wizard walks you through the arguments and outputs them exactly as you should enter them when using the Airbyte Local CLI.
docker run farosai/airbyte-victorops-source:latest spec-pretty
docker run -it farosai/airbyte-victorops-source:latest airbyte-local-cli-wizard
Faros Feeds Source
For the specialized Faros Feeds wrapper source, running spec-pretty will display all of the sources it wraps as well as each of their parameters.
Running spec-pretty --feed <source-name> will print out only the subset of parameters required to run that particular source.
docker pull farosai/airbyte-faros-feeds-source
docker run farosai/airbyte-faros-feeds-source spec-pretty
docker run farosai/airbyte-faros-feeds-source spec-pretty --feed buildkite
Similarly, airbyte-local-cli-wizard will show a list of wrapped sources to choose from and walk you through the selected source arguments. Alternatively, you can use airbyte-local-cli-wizard --feed <source-name> when the source name is known.
docker pull farosai/airbyte-faros-feeds-source
docker run -it farosai/airbyte-faros-feeds-source airbyte-local-cli-wizard
docker run -it farosai/airbyte-faros-feeds-source airbyte-local-cli-wizard --feed buildkite
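The help commands above follow a regular pattern: the interactive wizard is run with `-it`, and `--feed` is added when targeting one source wrapped by Faros Feeds. A minimal sketch of that pattern follows, assuming a hypothetical helper named `faros_help_cmd` that only prints the docker command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the docker command for a help command.
# usage: faros_help_cmd <spec-pretty|airbyte-local-cli-wizard> <image> [feed]
faros_help_cmd() {
  local help_command="$1" image="$2" feed="${3:-}"
  local cmd="docker run"
  # The wizard prompts for input, so it gets -it as in the examples above.
  if [ "$help_command" = "airbyte-local-cli-wizard" ]; then
    cmd="$cmd -it"
  fi
  cmd="$cmd $image $help_command"
  # --feed narrows the Faros Feeds source to a single wrapped source.
  if [ -n "$feed" ]; then
    cmd="$cmd --feed $feed"
  fi
  printf '%s\n' "$cmd"
}

faros_help_cmd spec-pretty farosai/airbyte-faros-feeds-source buildkite
faros_help_cmd airbyte-local-cli-wizard farosai/airbyte-victorops-source:latest
```

The two calls print the same commands shown earlier in this section, which makes it easy to review a command before pasting it into a terminal.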
Running your Airbyte Connection
You can see examples of the most common Airbyte connections here.
- Download our Airbyte Local CLI script. Make sure you have the required bash, docker, jq, and tee installed.
- Choose the source you want to connect to:
  - Pull the Faros Feeds source image and run the spec-pretty command to see if your source is in the list of supported sources for the Faros Feeds source. If so, use airbyte-local-cli-wizard and spec-pretty with the --feed flag to learn which arguments are needed.
  - If it's not supported in the Faros Feeds source, find it in our list of supported sources. Run the help commands to understand which arguments you'll want to include for your source.
- Add the name of your source to the list of source parameters you discovered above. Construct the following:
  --src farosai/airbyte-<SOURCE_NAME>-source:latest --src.argument1 <value> --src.argument2 <value>
- Copy the Faros destination and connection arguments:
  --dst farosai/airbyte-faros-destination:latest \
  --dst.edition_configs.edition cloud \
  --dst.edition_configs.api_url https://prod.api.faros.ai \
  --dst.edition_configs.api_key $FAROS_API_KEY \
  --dst.edition_configs.graphql_api 'v2' \
  --dst.edition_configs.graph default \
  --dst-stream-prefix 'mysource__bitbucket-server__' \
  --connection-name 'mysource'
- Combine all your arguments and run your Airbyte connection! Below is an example of a completed bitbucket-server command:
  ./airbyte-local.sh \
    --src farosai/airbyte-bitbucket-server-source:latest \
    --src.server_url $BITBUCKET_SERVER_URL \
    --src.token $BITBUCKET_TOKEN \
    --src.projects '["project-1","project-2"]' \
    --src.repositories '["project-1/repo-1","project-1/repo-2"]' \
    --src-docker-options '--env NODE_TLS_REJECT_UNAUTHORIZED=0' \
    --src.cutoff_days 90 \
    --dst farosai/airbyte-faros-destination:latest \
    --dst.edition_configs.edition cloud \
    --dst.edition_configs.api_url https://prod.api.faros.ai \
    --dst.edition_configs.api_key $FAROS_API_KEY \
    --dst.edition_configs.graphql_api 'v2' \
    --dst.edition_configs.graph default \
    --dst-stream-prefix 'mysource__bitbucket-server__' \
    --connection-name 'mysource'
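When the same connection is run repeatedly, it can help to keep the source arguments and the destination arguments in separate bash arrays and combine them at the end. The sketch below reuses the flags from the bitbucket-server example. The variable names are illustrative, and the stream prefix pattern (connection name, source name, double underscores) is inferred from that single example rather than from a documented rule.

```shell
#!/usr/bin/env bash
# Sketch only: variable names are illustrative; flags come from the example above.

connection="mysource"            # also used in Faros as the data's origin
source_name="bitbucket-server"   # fills in farosai/airbyte-<SOURCE_NAME>-source

src_image="farosai/airbyte-${source_name}-source:latest"
# Assumption: prefix pattern inferred from 'mysource__bitbucket-server__'.
stream_prefix="${connection}__${source_name}__"

# Source arguments: the image plus the arguments reported by the help commands.
src_args=(
  --src "$src_image"
  --src.server_url "$BITBUCKET_SERVER_URL"
  --src.token "$BITBUCKET_TOKEN"
  --src.cutoff_days 90
)

# Destination and connection arguments, as in the destination step above.
dst_args=(
  --dst farosai/airbyte-faros-destination:latest
  --dst.edition_configs.edition cloud
  --dst.edition_configs.api_url https://prod.api.faros.ai
  --dst.edition_configs.api_key "$FAROS_API_KEY"
  --dst.edition_configs.graphql_api v2
  --dst.edition_configs.graph default
  --dst-stream-prefix "$stream_prefix"
  --connection-name "$connection"
)

# Dry run: summarize the invocation without printing any secrets.
echo "connection=${connection} image=${src_image} prefix=${stream_prefix}"

# To run for real, replace the echo with:
#   ./airbyte-local.sh "${src_args[@]}" "${dst_args[@]}"
```

Arrays keep values such as JSON lists or tokens intact as single arguments, which avoids the quoting problems that come with building one long command string.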
Incremental Syncs
The majority of sources support incremental syncs, a more efficient way of synchronizing data between the source system and Faros that considers only the records updated or deleted since the last source run.
All you need to do is pass a state file to the source invocation command with the --state flag. In the example below, we fetch the state file from an S3 bucket, run the sync with that state file, and finally upload the updated state file back to the S3 bucket:
aws s3 cp s3://<bucket>/state.json state.json
./airbyte-local.sh <args> --state state.json
aws s3 cp state.json s3://<bucket>/state.json
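Run as three separate commands, the upload still happens if the sync fails partway. One possible safeguard is to chain the steps so that each runs only if the previous one succeeded. The sketch below assumes a hypothetical wrapper named `sync_with_state`, a configured AWS CLI, and a state file that already exists in the bucket.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: download state, run the sync with it, then upload the
# updated state. Each step runs only if the previous one succeeded.
# usage: sync_with_state <s3-uri-of-state-file> <sync command and its args...>
sync_with_state() {
  local state_uri="$1"
  shift
  aws s3 cp "$state_uri" state.json &&
    "$@" --state state.json &&
    aws s3 cp state.json "$state_uri"
}

# Example call, with the same placeholders as above:
#   sync_with_state "s3://<bucket>/state.json" ./airbyte-local.sh <args>
```

With this chaining, a failed sync leaves the copy in S3 untouched, so the next run starts again from the last state that was uploaded.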
Read more on how to manage the state between source syncs in the Incremental Syncs (advanced) chapter.