Extractors
Extractors are plugins that pull metadata from data sources. Each extractor connects to a source system, discovers resources, and emits Records containing entities and edges.
To use an extractor, add a source block to your recipe:

    source:
      name: bigquery
      scope: my-project
      config:
        project_id: my-gcp-project

Supported Extractors
| Extractor | Entity Types | Edges | Source |
|---|---|---|---|
| bigquery | table | derived_from | BigQuery API |
| bigtable | table | — | Bigtable Admin API |
| cassandra | table | — | CQL |
| clickhouse | table | — | ClickHouse SQL |
| couchdb | table | — | CouchDB HTTP API |
| csv | table | — | Local filesystem |
| dbt | model, source | derived_from, owned_by | dbt manifest.json |
| elastic | table | — | Elasticsearch API |
| mariadb | table | — | MariaDB SQL |
| mongodb | table | — | MongoDB driver |
| mssql | table | — | MS SQL Server |
| mysql | table | — | MySQL SQL |
| oracle | table | — | Oracle SQL |
| postgres | table | — | PostgreSQL SQL |
| presto | table | — | Presto SQL |
| redshift | table | — | Redshift SQL |
| snowflake | table | — | Snowflake SQL |
| grafana | dashboard | — | Grafana HTTP API |
| metabase | dashboard | derived_from | Metabase HTTP API |
| redash | dashboard | — | Redash HTTP API |
| superset | dashboard | — | Superset HTTP API |
| tableau | dashboard | derived_from, owned_by | Tableau GraphQL API |
| kafka | topic, consumer_group | consumed_by | Kafka admin client |
| confluence | document | child_of, owned_by | Confluence REST API |
| notion | document | child_of, owned_by | Notion API |
| github | user, repository, team, document | member_of, owned_by, belongs_to | GitHub REST API |
| gsuite | user | — | Google Admin SDK |
| gcs | bucket | — | GCS API |
| optimus | job | derived_from, generates, owned_by | Optimus gRPC API |
| application_yaml | application | derived_from, generates, owned_by | Local YAML file |
| openapi | api | — | OpenAPI/protobuf files or URLs |
| http | (script-defined) | (script-defined) | Any HTTP API |
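
Every extractor in the table above is selected the same way, by putting its name in the source block of the recipe. As a minimal sketch, a recipe that reads from PostgreSQL instead of BigQuery might look like the following. The scope value and the connection_url key are assumptions made for illustration; each extractor defines its own config keys, so check the documentation of the extractor you use for the exact names.

```yaml
# Illustrative only: the scope label and the config key are assumptions,
# not confirmed by this page.
source:
  # extractor name, taken from the Extractor column of the table
  name: postgres
  # hypothetical scope label
  scope: my-postgres
  config:
    # assumed key name; placeholder host, database and credentials
    connection_url: "postgres://user:password@localhost:5432/mydb"
```

Only the name and the contents of config change between extractors; the surrounding shape of the source block stays the same as in the BigQuery example.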
Entity Types
Each extractor emits one or more entity types. All entities share the same flat structure; only the type field distinguishes them. See Metadata Models for the full schema.
| Entity Type | Description | Extractors |
|---|---|---|
| table | Database tables, views, indices, collections | bigquery, bigtable, cassandra, clickhouse, couchdb, csv, elastic, mariadb, mongodb, mssql, mysql, oracle, postgres, presto, redshift, snowflake |
| dashboard | Visualisation dashboards and their charts | grafana, metabase, redash, superset, tableau |
| topic | Message bus topics | kafka |
| consumer_group | Kafka consumer groups | kafka |
| user | User accounts | github, gsuite |
| repository | Source code repositories | github |
| team | Teams within an organisation | github |
| document | Documentation pages and wiki content | confluence, github, notion |
| bucket | Cloud storage containers | gcs |
| model | dbt transformation models | dbt |
| source | dbt external source definitions | dbt |
| job | Scheduled data transformation tasks | optimus |
| application | Services and applications | application_yaml |
| api | API schemas (OpenAPI specs, protobuf definitions) | openapi |
Edge Types
Edges represent relationships between entities. Not all extractors emit edges; the Edges column of the Supported Extractors table shows which ones do.
| Edge Type | Meaning | Extractors |
|---|---|---|
| derived_from | Entity depends on / reads from target (upstream dependency) | bigquery, dbt, metabase, tableau, optimus, application_yaml |
| generates | Entity produces / writes to target (downstream output) | optimus, application_yaml |
| owned_by | Entity is owned by a user or team | confluence, dbt, github, notion, optimus, application_yaml, tableau |
| member_of | User belongs to an org or team | github |
| belongs_to | Entity belongs to a parent entity | github |
| child_of | Entity is a child of another entity (e.g. sub-page) | confluence, notion |
| consumed_by | Consumer group reads from a topic | kafka |
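
To make the direction of each edge concrete, here is a sketch of what a single Record for an optimus job could contain. The field names (urn, source_urn, target_urn) and every URN value are hypothetical and chosen only for illustration; see Metadata Models for the actual schema.

```yaml
# Hypothetical record layout: field names and URNs are assumptions
# made for illustration, not the real schema.
entity:
  urn: "urn:optimus:my-project:job:daily_orders"
  type: job
  name: daily_orders
edges:
  # derived_from: the job reads from this table (upstream dependency)
  - type: derived_from
    source_urn: "urn:optimus:my-project:job:daily_orders"
    target_urn: "urn:bigquery:my-project:table:raw.orders"
  # generates: the job writes to this table (downstream output)
  - type: generates
    source_urn: "urn:optimus:my-project:job:daily_orders"
    target_urn: "urn:bigquery:my-project:table:mart.daily_orders"
  # owned_by: the job is owned by this user
  - type: owned_by
    source_urn: "urn:optimus:my-project:job:daily_orders"
    target_urn: "urn:user:alice@example.com"
```

In this sketch the emitting entity is always the source of the edge, so derived_from points at what the job reads and generates points at what it writes.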