# Sink
Sinks define the destination for the metadata being extracted. A recipe must specify at least one sink, and may specify multiple sinks; this prevents you from having to create duplicate recipes for the same job. The examples below show correct usage for HTTP and Kafka sinks.
## How Sinks Work
Each record flowing through the pipeline contains an Entity and its Edges. When a sink receives a record, it is responsible for handling both:
- Entity: the core metadata (`urn`, `type`, `name`, `description`, `source`, `properties`).
- Edges: relationships such as ownership (`owned_by`) and lineage (`derived_from`/`generates`), each linking a source URN to a target URN.
How sinks handle edges depends on the destination. For example, the Compass sink sends entities and edges together so that Compass can build its relationship graph. The Kafka and HTTP sinks serialize the entire record (entity + edges) as JSON and send it to the configured endpoint.
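As an illustration of what "entity + edges as JSON" can look like, here is a hedged sketch in Python. The field names follow the description above; the exact key names in the real serialized output may differ.

```python
import json

# Hypothetical record shape based on the fields described above;
# the exact field names in the real JSON output may differ.
record = {
    "entity": {
        "urn": "urn:postgres:my-db:table:users",
        "type": "table",
        "name": "users",
        "description": "User accounts table",
        "source": "postgres",
        "properties": {"row_count": 1000},
    },
    "edges": [
        # Ownership edge: links a source URN to a target URN.
        {"relation": "owned_by",
         "source": "urn:postgres:my-db:table:users",
         "target": "urn:user:data-team"},
        # Lineage edge.
        {"relation": "derived_from",
         "source": "urn:postgres:my-db:table:users",
         "target": "urn:postgres:my-db:table:raw_users"},
    ],
}

# A Kafka or HTTP sink would serialize the full record like this.
payload = json.dumps(record)
```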
For details on the data model, see Source.
## Writing the sinks part of your recipe
```yaml
sinks: # required - at least 1 sink defined
  - name: http
    config:
      method: POST
      url: "https://example.com/metadata"
  - name: kafka
    config:
      brokers: localhost:9092
      topic: "target-topic"
      key_path: ".urn"
```

| Key | Description | Requirement |
|---|---|---|
| `name` | The name of the sink | required |
| `config` | Sink-specific configuration; different sinks require different options | optional, depends on sink |
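The "at least one named sink" rule can be sketched as a small validation in Python. This is illustrative only (the recipe is shown as an already-parsed dictionary), not the tool's actual validation code:

```python
# Illustrative check of the recipe rules above; not the actual implementation.
def validate_sinks(recipe: dict) -> None:
    """Raise if the recipe does not define at least one named sink."""
    sinks = recipe.get("sinks")
    if not sinks:
        raise ValueError("recipe must define at least one sink")
    for sink in sinks:
        if "name" not in sink:
            raise ValueError("each sink requires a 'name'")

recipe = {
    "sinks": [
        {"name": "http", "config": {"method": "POST", "url": "https://example.com/metadata"}},
        {"name": "kafka", "config": {"brokers": "localhost:9092", "topic": "target-topic"}},
    ]
}
validate_sinks(recipe)  # passes: two sinks, both named
```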
## Available Sinks
- Console

Print metadata to stdout.

```yaml
sinks:
  - name: console
```
- Compass

Upload metadata to Compass, Raystack's metadata catalog service. The Compass sink sends each entity along with its edges, which are upserted uniformly via the UpsertEdge API.

```yaml
sinks:
  - name: compass
    config:
      host: https://compass.example.com
```
- File

Sinks metadata to a file in JSON or YAML format, as per the defined config.

```yaml
sinks:
  - name: file
    config:
      path: "./dir/sample.yaml"
      format: "yaml"
```
- Google Cloud Storage

Sinks JSON data in NDJSON format to a Google Cloud Storage bucket.

```yaml
sinks:
  - name: gcs
    config:
      project_id: google-project-id
      url: gcs://bucket_name/target_folder
      object_prefix: github-users
      service_account_base64: <base64 encoded service account key>
```
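The `service_account_base64` value is the service account JSON key, base64-encoded. One way to produce it (a generic encoding sketch, not specific to this tool; the inline key bytes are a stand-in for a real key file's contents):

```python
import base64

def encode_service_account(key_json: bytes) -> str:
    """Base64-encode raw service account key JSON for the
    service_account_base64 config field."""
    return base64.b64encode(key_json).decode("ascii")

# Example with an inline placeholder key instead of reading a real file:
encoded = encode_service_account(b'{"type": "service_account"}')
```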
- HTTP

Sinks metadata to an HTTP destination as per the config defined. The full record (entity + edges) is serialized as JSON.

```yaml
sinks:
  - name: http
    config:
      method: POST
      success_code: 200
      url: https://example.com/v1/metadata
      headers:
        Header-1: value11,value12
```
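The effect of this config can be pictured with Python's `urllib`: the method, URL, and headers map directly onto the outgoing request. This is a sketch of equivalent client behavior, not the sink's actual implementation:

```python
import json
import urllib.request

# Mirror of the HTTP sink config above.
config = {
    "method": "POST",
    "url": "https://example.com/v1/metadata",
    "headers": {"Header-1": "value11,value12"},
}

# The full record (entity + edges) is serialized as the JSON body.
body = json.dumps({"entity": {"urn": "urn:example:1"}, "edges": []}).encode()

req = urllib.request.Request(
    config["url"],
    data=body,
    method=config["method"],
    headers={**config["headers"], "Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send it; a response status matching
# success_code (200 here) is treated as a successful sink.
```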
- Kafka

Publish metadata as JSON messages to a Kafka topic. Supports message keying for partition control.

```yaml
sinks:
  - name: kafka
    config:
      brokers: "localhost:9092"
      topic: metadata-topic
      key_path: ".urn"
```
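`key_path` selects which field of the JSON record becomes the Kafka message key, so records with the same key (e.g. the same `.urn`) land on the same partition. The extraction can be sketched as follows, assuming a simple jq-style dot path; the real implementation may differ:

```python
def extract_key(record: dict, key_path: str) -> bytes:
    """Follow a jq-style dot path (e.g. ".urn" or ".entity.urn")
    and return the value as bytes for use as the message key."""
    value = record
    for part in key_path.lstrip(".").split("."):
        value = value[part]
    return str(value).encode()

record = {"urn": "urn:postgres:my-db:table:users", "name": "users"}
key = extract_key(record, ".urn")
# Messages with the same key hash to the same partition,
# preserving per-entity ordering in the topic.
```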
- Stencil

Upload metadata of a given schema format to an existing namespace_id in Stencil. Supported formats: json, avro.

```yaml
sinks:
  - name: stencil
    config:
      host: https://stencil.com
      namespace_id: myNamespace
      format: json
```
- S3

Sinks NDJSON data to an Amazon S3 bucket. Supports S3-compatible stores (MinIO, etc.) via the endpoint config.

```yaml
sinks:
  - name: s3
    config:
      bucket_url: s3://bucket-name/target-folder
      region: us-east-1
      object_prefix: github-users
      access_key_id: <access-key-id>
      secret_access_key: <secret-access-key>
```
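NDJSON (newline-delimited JSON) simply means one JSON object per line, which is the format the object storage sinks write. The format itself can be sketched as:

```python
import json

records = [
    {"urn": "urn:github:user:alice", "type": "user"},
    {"urn": "urn:github:user:bob", "type": "user"},
]

# NDJSON: each record serialized as one JSON object on its own line.
ndjson = "\n".join(json.dumps(r) for r in records)

# Reading it back is the reverse: parse line by line.
parsed = [json.loads(line) for line in ndjson.splitlines()]
```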
- Azure Blob

Sinks NDJSON data to an Azure Blob Storage container.

```yaml
sinks:
  - name: azure_blob
    config:
      storage_account_url: https://myaccount.blob.core.windows.net
      container_name: my-container
      object_prefix: github-users
      account_key: <account-key>
```
## Serializer
By default, metadata is serialized to JSON before sinking. To send it in a different format, define a serializer in the sink config.
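For example, a serializer entry might sit under the sink's config like the following. The serializer keys shown here are hypothetical and only illustrate the shape; consult the sink's reference for the actual fields.

```yaml
sinks:
  - name: kafka
    config:
      brokers: "localhost:9092"
      topic: metadata-topic
      serializer:   # hypothetical keys, for illustration only
        type: proto
```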
## Custom Sink
Meteor has built-in sinks such as Kafka and HTTP that can be used directly. It also allows creating custom sinks to keep recipes DRY, which is useful when you find yourself sinking metadata from multiple sources to the same place.
### Sample Custom Sink
- central_metadata_store_sink.yaml

```yaml
name: central-metadata-store # unique sink name as an ID
sink:
  - name: http
    config:
      method: PUT
      url: "https://metadata-store.com/metadata"
```

More info about available sinks can be found here.
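Since the `name` field acts as the custom sink's unique ID, a recipe would presumably refer to it by that name. The referencing syntax below is a hedged sketch, not confirmed by this page:

```yaml
# Illustrative recipe fragment referencing the custom sink by its ID.
sinks:
  - name: central-metadata-store
```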