Extractors

Extractors are plugins that pull metadata from data sources. Each extractor connects to a source system, discovers resources, and emits Records containing entities and edges.

To use an extractor, add a source block to your recipe:

source:
  name: bigquery
  scope: my-project
  config:
    project_id: my-gcp-project

Supported Extractors

| Extractor | Entity Types | Edges | Source |
| --- | --- | --- | --- |
| bigquery | table | derived_from | BigQuery API |
| bigtable | table | | Bigtable Admin API |
| cassandra | table | | CQL |
| clickhouse | table | | ClickHouse SQL |
| couchdb | table | | CouchDB HTTP API |
| csv | table | | Local filesystem |
| dbt | model, source | derived_from, owned_by | dbt manifest.json |
| elastic | table | | Elasticsearch API |
| mariadb | table | | MariaDB SQL |
| mongodb | table | | MongoDB driver |
| mssql | table | | MS SQL Server |
| mysql | table | | MySQL SQL |
| oracle | table | | Oracle SQL |
| postgres | table | | PostgreSQL SQL |
| presto | table | | Presto SQL |
| redshift | table | | Redshift SQL |
| snowflake | table | | Snowflake SQL |
| grafana | dashboard | | Grafana HTTP API |
| metabase | dashboard | derived_from | Metabase HTTP API |
| redash | dashboard | | Redash HTTP API |
| superset | dashboard | | Superset HTTP API |
| tableau | dashboard | derived_from, owned_by | Tableau GraphQL API |
| kafka | topic, consumer_group | consumed_by | Kafka admin client |
| confluence | document | child_of, owned_by | Confluence REST API |
| notion | document | child_of, owned_by | Notion API |
| github | user, repository, team, document | member_of, owned_by, belongs_to | GitHub REST API |
| gsuite | user | | Google Admin SDK |
| gcs | bucket | | GCS API |
| optimus | job | derived_from, generates, owned_by | Optimus gRPC API |
| application_yaml | application | derived_from, generates, owned_by | Local YAML file |
| openapi | api | | OpenAPI/protobuf files or URLs |
| http | (script-defined) | (script-defined) | Any HTTP API |
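
As a rough illustration of how the source name in a recipe relates to the list of supported extractors, the sketch below checks a recipe (represented here as a plain Python dict that mirrors the YAML example) against the extractor names. The helper `source_name` and the dict representation are assumptions made for this example; they are not part of the tool itself, which may validate recipes differently.

```python
# Extractor names copied from the list of supported extractors.
SUPPORTED_EXTRACTORS = {
    "bigquery", "bigtable", "cassandra", "clickhouse", "couchdb", "csv",
    "dbt", "elastic", "mariadb", "mongodb", "mssql", "mysql", "oracle",
    "postgres", "presto", "redshift", "snowflake", "grafana", "metabase",
    "redash", "superset", "tableau", "kafka", "confluence", "notion",
    "github", "gsuite", "gcs", "optimus", "application_yaml", "openapi",
    "http",
}


def source_name(recipe):
    """Return the recipe's source name, or raise if it is not a known extractor.

    Hypothetical helper for illustration only.
    """
    source = recipe.get("source") or {}
    name = source.get("name")
    if name not in SUPPORTED_EXTRACTORS:
        raise ValueError(f"unknown extractor: {name!r}")
    return name


# The recipe from the example above, written as a dict.
recipe = {
    "source": {
        "name": "bigquery",
        "scope": "my-project",
        "config": {"project_id": "my-gcp-project"},
    }
}

checked = source_name(recipe)
```

A check like this catches a misspelled extractor name before a recipe is run.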

Entity Types

Each extractor emits one or more entity types. All entities share the same flat structure — the type field distinguishes them. See Metadata Models for the full schema.

| Entity Type | Description | Extractors |
| --- | --- | --- |
| table | Database tables, views, indices, collections | bigquery, bigtable, cassandra, clickhouse, couchdb, csv, elastic, mariadb, mongodb, mssql, mysql, oracle, postgres, presto, redshift, snowflake |
| dashboard | Visualisation dashboards and their charts | grafana, metabase, redash, superset, tableau |
| topic | Message bus topics | kafka |
| consumer_group | Kafka consumer groups | kafka |
| user | User accounts | github, gsuite |
| repository | Source code repositories | github |
| team | Teams within an organisation | github |
| document | Documentation pages and wiki content | confluence, github, notion |
| bucket | Cloud storage containers | gcs |
| model | dbt transformation models | dbt |
| source | dbt external source definitions | dbt |
| job | Scheduled data transformation tasks | optimus |
| application | Services and applications | application_yaml |
| api | API schemas (OpenAPI specs, protobuf definitions) | openapi |
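
To make the "same flat structure, distinguished by type" idea concrete, here is a minimal sketch in Python. The `Entity` class and its fields (`urn`, `type`, `name`, `properties`) are illustrative assumptions, not the documented schema; refer to Metadata Models for the real field names.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Field names are assumptions for illustration, not the real schema.
    urn: str
    type: str  # e.g. "table", "dashboard", "topic"
    name: str
    properties: dict = field(default_factory=dict)


def group_by_type(entities):
    """Group entity names by the value of their type field."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity.type].append(entity.name)
    return dict(groups)


# Every entity has the same shape; only the type value differs.
entities = [
    Entity("urn:example:orders", "table", "orders"),
    Entity("urn:example:sales", "dashboard", "sales overview"),
    Entity("urn:example:customers", "table", "customers"),
]

grouped = group_by_type(entities)
```

Because there is one shape for all entities, consumers can handle a new entity type by branching on `type` rather than by adding a new data structure.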

Edge Types

Edges represent relationships between entities. Not all extractors emit edges; the Edges column of the Supported Extractors table lists what each extractor emits.

| Edge Type | Meaning | Extractors |
| --- | --- | --- |
| derived_from | Entity depends on / reads from target (upstream dependency) | bigquery, dbt, metabase, tableau, optimus, application_yaml |
| generates | Entity produces / writes to target (downstream output) | optimus, application_yaml |
| owned_by | Entity is owned by a user or team | confluence, dbt, github, notion, optimus, application_yaml, tableau |
| member_of | User belongs to an org or team | github |
| belongs_to | Entity belongs to a parent entity | github |
| child_of | Entity is a child of another entity (e.g. sub-page) | confluence, notion |
| consumed_by | Consumer group reads from a topic | kafka |
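
As an illustration of how edge types can be used, the sketch below follows derived_from edges to collect everything upstream of an entity. The `Edge` tuple with `source`, `target`, and `type` fields, the `upstream` helper, and the identifiers in the sample data are all hypothetical; the real edge representation may differ.

```python
from collections import namedtuple

# Hypothetical edge shape: `source` is the entity the edge starts from,
# `target` is the entity it points at, `type` is the edge type.
Edge = namedtuple("Edge", ["source", "target", "type"])


def upstream(urn, edges):
    """Return every entity reachable from urn via derived_from edges."""
    seen = set()
    stack = [urn]
    while stack:
        current = stack.pop()
        for edge in edges:
            if (
                edge.type == "derived_from"
                and edge.source == current
                and edge.target not in seen
            ):
                seen.add(edge.target)
                stack.append(edge.target)
    return seen


edges = [
    Edge("dashboard:sales", "table:report", "derived_from"),
    Edge("table:report", "table:raw_orders", "derived_from"),
    Edge("table:report", "user:alice", "owned_by"),
]

lineage = upstream("dashboard:sales", edges)
```

Only derived_from edges are followed here, so the owned_by edge to `user:alice` does not appear in the result. The same traversal over generates edges would give the downstream outputs instead.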