Importing Data in Hybrid Deployment Mode

Introduction to Airbyte

Faros leverages the open-source Airbyte tool to extract data from your source system and push it into your Faros cloud instance. Airbyte constructs a complete pipeline by combining three pieces: an Airbyte connection, an Airbyte source, and an Airbyte destination.

  • Airbyte connection - This combines the provided source and destination into a complete pipeline. The name you provide for your connection is also used in Faros as the data's origin.
  • Airbyte source - This part is responsible for connecting to your source system.
  • Airbyte destination - This part is responsible for transforming data to match the Faros canonical schema and then pushing that data into Faros. All of your Airbyte sources use the same Faros destination, each with its own configuration.
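
For orientation, here is a minimal sketch of how these three pieces map onto the Airbyte Local CLI flags used throughout this guide (the source image name and connection name are placeholders, and the required source and destination arguments are omitted):

# --src selects the Airbyte source image that connects to your source system
# --dst selects the universal Faros destination image
# --connection-name names the Airbyte connection; it also becomes the data's origin in Faros
./airbyte-local.sh \
  --src farosai/airbyte-<SOURCE_NAME>-source:latest \
  --dst farosai/airbyte-faros-destination:latest \
  --connection-name 'mysource'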

📘

A Note on Terms:

Because hybrid mode leverages Airbyte, some Airbyte terms refer to different concepts than their Faros counterparts:

  • A Faros source in the SaaS app is equivalent to an Airbyte connection in hybrid mode. Both import data from an external system into Faros.
  • A Faros connection represents the authentication to that external system. In hybrid mode, this is part of the Airbyte source.

Airbyte Connections

Supported Airbyte Sources

  1. Some specific sources are wrapped in a Faros Feeds source. These are more robust and better suited to the scale and use cases of Faros than their generic, community-maintained Airbyte counterparts.
  2. The supported Airbyte sources and the Faros destination can be found in our airbyte-connectors repository.
  3. It's also possible to develop and run your own source and extend the Faros destination.

Airbyte Destinations

All of your Airbyte connections leverage the same Faros destination, each with its own configuration.

Naming Conventions

The source container naming convention is the lowercased vendor name with an airbyte- prefix and a -source suffix, e.g. for VictorOps it's airbyte-victorops-source, for SquadCast it's airbyte-squadcast-source, etc. You can find all the available connectors in our Docker Hub org.

The Faros Feeds source is airbyte-faros-feeds-source.

The universal destination container is airbyte-faros-destination.
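
Putting the convention together, you can pull any of these images from the farosai Docker Hub org; for example (VictorOps is shown purely as an illustration):

# A vendor-specific source, following the airbyte-<vendor>-source convention
docker pull farosai/airbyte-victorops-source:latest

# The Faros Feeds wrapper source
docker pull farosai/airbyte-faros-feeds-source:latest

# The universal Faros destination
docker pull farosai/airbyte-faros-destination:latest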

Help Commands

To better understand which arguments the Faros Airbyte source/destination takes, you can run the spec-pretty and airbyte-local-cli-wizard commands.

spec-pretty shows a table describing the arguments your source/destination needs.

airbyte-local-cli-wizard walks you through the arguments and outputs them exactly as you should enter them when using the Airbyte Local CLI.

docker run farosai/airbyte-victorops-source:latest spec-pretty
docker run -it farosai/airbyte-victorops-source:latest airbyte-local-cli-wizard

Faros Feeds Source

For the specialized Faros Feeds wrapper source, running spec-pretty will display all of the sources it wraps, as well as each of their parameters.

Running spec-pretty --feed <source-name> will print out only the subset of parameters required to run that particular source.

docker pull farosai/airbyte-faros-feeds-source
docker run farosai/airbyte-faros-feeds-source spec-pretty
docker run farosai/airbyte-faros-feeds-source spec-pretty --feed buildkite

Similarly, airbyte-local-cli-wizard will show a list of wrapped sources to choose from and walk you through the selected source's arguments. Alternatively, you can use airbyte-local-cli-wizard --feed <source-name> when the source name is already known.

docker pull farosai/airbyte-faros-feeds-source
docker run -it farosai/airbyte-faros-feeds-source airbyte-local-cli-wizard
docker run -it farosai/airbyte-faros-feeds-source airbyte-local-cli-wizard --feed buildkite

Running your Airbyte Connection

You can see examples of the most common Airbyte connections here.

  1. Download our Airbyte Local CLI script. Make sure you have the required tools installed: bash, docker, jq, and tee.

  2. Choose the source you want to connect to

    1. Pull the Faros Feeds source image and run the spec-pretty command to see if your source is in the list of supported sources for the Faros Feeds source. If so, use airbyte-local-cli-wizard and spec-pretty with the --feed flag to learn which arguments are needed.
    2. If it's not supported in the Faros Feeds source, find it in our list of supported sources. Run the help commands to understand which arguments you'll want to include for your source.
  3. Add the name of your source to the list of source parameters you discovered above. Construct the following:

     --src farosai/airbyte-<SOURCE_NAME>-source:latest 
     --src.argument1 <value>  
     --src.argument2 <value>
    
  4. Copy the Faros destination and connection arguments

     --dst farosai/airbyte-faros-destination:latest \
     --dst.edition_configs.edition cloud \
     --dst.edition_configs.api_url https://prod.api.faros.ai \
     --dst.edition_configs.api_key $FAROS_API_KEY \
     --dst.edition_configs.graphql_api 'v2' \
     --dst.edition_configs.graph default \
     --dst-stream-prefix 'mysource__bitbucket-server__' \
     --connection-name 'mysource'
    
  5. Combine all your arguments and run your Airbyte connection!
    Below is an example of a completed bitbucket-server command

    ./airbyte-local.sh \
     --src farosai/airbyte-bitbucket-server-source:latest \
     --src.server_url $BITBUCKET_SERVER_URL \
     --src.token $BITBUCKET_TOKEN \
     --src.projects '["project-1","project-2"]' \
     --src.repositories '["project-1/repo-1","project-1/repo-2"]' \
     --src-docker-options '--env NODE_TLS_REJECT_UNAUTHORIZED=0' \
     --src.cutoff_days 90 \
     --dst farosai/airbyte-faros-destination:latest \
     --dst.edition_configs.edition cloud \
     --dst.edition_configs.api_url https://prod.api.faros.ai \
     --dst.edition_configs.api_key $FAROS_API_KEY \
     --dst.edition_configs.graphql_api 'v2' \
     --dst.edition_configs.graph default \
     --dst-stream-prefix 'mysource__bitbucket-server__' \
     --connection-name 'mysource'
    

Incremental Syncs

The majority of sources support incremental syncs, a more efficient way of synchronizing data between the source system and Faros that considers only records updated or deleted since the last source run.

All you need to do is provide a state file to the source invocation command via the --state flag. In the example below, we fetch the state file from an S3 bucket, invoke the sync with the provided state file, and finally upload the updated state file back to the S3 bucket:

# Download the state file from the previous run
aws s3 cp s3://<bucket>/state.json state.json

# Run the sync with the provided state file
./airbyte-local.sh <args> --state state.json

# Upload the updated state file for the next run
aws s3 cp state.json s3://<bucket>/state.json

Read more on how to manage the state between source syncs in the Incremental Syncs (advanced) chapter.