# Sink
Sinks define the destination for the metadata being extracted. A recipe must specify at least one sink, and may specify multiple sinks; this prevents you from having to create duplicate recipes for the same job. The examples below show correct usage for HTTP and Kafka sinks.
## How Sinks Work
Each record flowing through the pipeline contains an Entity and its Edges. When a sink receives a record, it is responsible for handling both:
- Entity: the core metadata (`urn`, `type`, `name`, `description`, `source`, `properties`).
- Edges: relationships such as ownership (`owned_by`) and lineage (`derived_from`/`generates`), each linking a source URN to a target URN.
How sinks handle edges depends on the destination. For example, the Compass sink sends entities and edges together so that Compass can build its relationship graph. The Kafka and HTTP sinks serialize the entire record (entity + edges) as JSON and send it to the configured endpoint.
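As an illustration of what "entity + edges as JSON" can look like, here is a hedged sketch in Python. The field names follow the description above; the exact key names in the real serialized output may differ.

```python
import json

# Hypothetical record shape based on the fields described above;
# the exact field names in the real JSON output may differ.
record = {
    "entity": {
        "urn": "urn:postgres:my-db:table:users",
        "type": "table",
        "name": "users",
        "description": "User accounts table",
        "source": "postgres",
        "properties": {"row_count": 1000},
    },
    "edges": [
        # Ownership edge: links a source URN to a target URN.
        {"relation": "owned_by",
         "source": "urn:postgres:my-db:table:users",
         "target": "urn:user:data-team"},
        # Lineage edge.
        {"relation": "derived_from",
         "source": "urn:postgres:my-db:table:users",
         "target": "urn:postgres:my-db:table:raw_users"},
    ],
}

# A Kafka or HTTP sink would serialize the full record like this.
payload = json.dumps(record)
```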
For details on the data model, see Source.
## Writing the sinks part of your recipe
```yaml
sinks: # required - at least 1 sink defined
  - name: http
    config:
      method: POST
      url: "https://example.com/metadata"
  - name: kafka
    config:
      brokers: localhost:9092
      topic: "target-topic"
      key_path: ".urn"
```

| Key | Description | Requirement |
|---|---|---|
| `name` | The name of the sink | required |
| `config` | Sink-specific configuration; different sinks require different options | optional, depends on sink |
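The "at least one named sink" rule can be sketched as a small validation in Python. This is illustrative only (the recipe is shown as an already-parsed dictionary), not the tool's actual validation code:

```python
# Illustrative check of the recipe rules above; not the actual implementation.
def validate_sinks(recipe: dict) -> None:
    """Raise if the recipe does not define at least one named sink."""
    sinks = recipe.get("sinks")
    if not sinks:
        raise ValueError("recipe must define at least one sink")
    for sink in sinks:
        if "name" not in sink:
            raise ValueError("each sink requires a 'name'")

recipe = {
    "sinks": [
        {"name": "http", "config": {"method": "POST", "url": "https://example.com/metadata"}},
        {"name": "kafka", "config": {"brokers": "localhost:9092", "topic": "target-topic"}},
    ]
}
validate_sinks(recipe)  # passes: two sinks, both named
```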
## Available Sinks
- Console

Print metadata to stdout.

```yaml
sinks:
  - name: console
```
- Compass

Upload metadata to Compass, Raystack's metadata catalog service. The Compass sink sends each entity along with its edges, which are upserted uniformly via the UpsertEdge API.

```yaml
sinks:
  - name: compass
    config:
      host: https://compass.example.com
```
- File

Sinks metadata to a file in JSON or YAML format, as per the defined config.

```yaml
sinks:
  - name: file
    config:
      path: "./dir/sample.yaml"
      format: "yaml"
```
- Google Cloud Storage

Sinks JSON data in NDJSON format to a Google Cloud Storage bucket.

```yaml
sinks:
  - name: gcs
    config:
      project_id: google-project-id
      url: gcs://bucket_name/target_folder
      object_prefix: github-users
      service_account_base64: <base64 encoded service account key>
```
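The `service_account_base64` value is the service account JSON key, base64-encoded. One way to produce it (a generic encoding sketch, not specific to this tool; the inline key bytes are a stand-in for a real key file's contents):

```python
import base64

def encode_service_account(key_json: bytes) -> str:
    """Base64-encode raw service account key JSON for the
    service_account_base64 config field."""
    return base64.b64encode(key_json).decode("ascii")

# Example with an inline placeholder key instead of reading a real file:
encoded = encode_service_account(b'{"type": "service_account"}')
```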
- HTTP

Sinks metadata to an HTTP destination as per the config defined. The full record (entity + edges) is serialized as JSON.

```yaml
sinks:
  - name: http
    config:
      method: POST
      success_code: 200
      url: https://example.com/v1/metadata
      headers:
        Header-1: value11,value12
```
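The effect of this config can be pictured with Python's `urllib`: the method, URL, and headers map directly onto the outgoing request. This is a sketch of equivalent client behavior, not the sink's actual implementation:

```python
import json
import urllib.request

# Mirror of the HTTP sink config above.
config = {
    "method": "POST",
    "url": "https://example.com/v1/metadata",
    "headers": {"Header-1": "value11,value12"},
}

# The full record (entity + edges) is serialized as the JSON body.
body = json.dumps({"entity": {"urn": "urn:example:1"}, "edges": []}).encode()

req = urllib.request.Request(
    config["url"],
    data=body,
    method=config["method"],
    headers={**config["headers"], "Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send it; a response status matching
# success_code (200 here) is treated as a successful sink.
```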
- Kafka

Publish metadata as JSON messages to a Kafka topic. Supports message keying for partition control.

```yaml
sinks:
  - name: kafka
    config:
      brokers: "localhost:9092"
      topic: metadata-topic
      key_path: ".urn"
```
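`key_path` selects which field of the JSON record becomes the Kafka message key, so records with the same key (e.g. the same `.urn`) land on the same partition. The extraction can be sketched as follows, assuming a simple jq-style dot path; the real implementation may differ:

```python
def extract_key(record: dict, key_path: str) -> bytes:
    """Follow a jq-style dot path (e.g. ".urn" or ".entity.urn")
    and return the value as bytes for use as the message key."""
    value = record
    for part in key_path.lstrip(".").split("."):
        value = value[part]
    return str(value).encode()

record = {"urn": "urn:postgres:my-db:table:users", "name": "users"}
key = extract_key(record, ".urn")
# Messages with the same key hash to the same partition,
# preserving per-entity ordering in the topic.
```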
- Stencil

Upload metadata of a given schema format to an existing namespace_id in Stencil. Supported formats: json, avro.

```yaml
sinks:
  - name: stencil
    config:
      host: https://stencil.com
      namespace_id: myNamespace
      format: json
```
- S3

Sinks NDJSON data to an Amazon S3 bucket. Supports S3-compatible stores (MinIO, etc.) via the endpoint config.

```yaml
sinks:
  - name: s3
    config:
      bucket_url: s3://bucket-name/target-folder
      region: us-east-1
      object_prefix: github-users
      access_key_id: <access-key-id>
      secret_access_key: <secret-access-key>
```
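NDJSON (newline-delimited JSON) simply means one JSON object per line, which is the format the object storage sinks write. The format itself can be sketched as:

```python
import json

records = [
    {"urn": "urn:github:user:alice", "type": "user"},
    {"urn": "urn:github:user:bob", "type": "user"},
]

# NDJSON: each record serialized as one JSON object on its own line.
ndjson = "\n".join(json.dumps(r) for r in records)

# Reading it back is the reverse: parse line by line.
parsed = [json.loads(line) for line in ndjson.splitlines()]
```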
- Azure Blob

Sinks NDJSON data to an Azure Blob Storage container.

```yaml
sinks:
  - name: azure_blob
    config:
      storage_account_url: https://myaccount.blob.core.windows.net
      container_name: my-container
      object_prefix: github-users
      account_key: <account-key>
```
## Serializer
By default, metadata is serialized to JSON before sinking. To send it in a different format, define a serializer in the sink config.
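For example, a serializer entry might sit under the sink's config like the following. The serializer keys shown here are hypothetical and only illustrate the shape; consult the sink's reference for the actual fields.

```yaml
sinks:
  - name: kafka
    config:
      brokers: "localhost:9092"
      topic: metadata-topic
      serializer:   # hypothetical keys, for illustration only
        type: proto
```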
## Custom Sink
Meteor has built-in sinks such as Kafka and HTTP that can be used directly. It also allows creating custom sinks to keep recipes DRY, which is useful when you find yourself sinking metadata from multiple sources to the same place.
### Sample Custom Sink
- central_metadata_store_sink.yaml

```yaml
name: central-metadata-store # unique sink name as an ID
sink:
  - name: http
    config:
      method: PUT
      url: "https://metadata-store.com/metadata"
```

More info about available sinks can be found here.
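Since the `name` field acts as the custom sink's unique ID, a recipe would presumably refer to it by that name. The referencing syntax below is a hedged sketch, not confirmed by this page:

```yaml
# Illustrative recipe fragment referencing the custom sink by its ID.
sinks:
  - name: central-metadata-store
```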