API

This page covers the Go API surface of Apache Iceberg Go. New to the project? Walk through the Getting Started tutorial first; the recipes here assume you already have a catalog and a table.

For configuration knobs (catalog options, FileIO credentials, table properties), see Configuration. For predicate construction details, see Row Filter Syntax and Expression DSL.

Catalog

catalog.Catalog is the entry point for everything: namespaces, tables, views.

Constructing a catalog

REST

import (
    "context"

    "github.com/apache/iceberg-go/catalog/rest"
)

cat, err := rest.NewCatalog(context.Background(), "rest", "http://localhost:8181",
    rest.WithOAuthToken("your-token"))

SQL (SQLite, Postgres, MySQL, Oracle, MSSQL)

import (
    "database/sql"

    "github.com/apache/iceberg-go"
    sqlcat "github.com/apache/iceberg-go/catalog/sql"
    "github.com/uptrace/bun/driver/sqliteshim"
)

db, err := sql.Open(sqliteshim.ShimName, "file:catalog.db")
// handle err
cat, err := sqlcat.NewCatalog("default", db, sqlcat.SQLite, iceberg.Properties{
    "warehouse": "file:///tmp/warehouse",
})

Glue

import (
    "context"

    "github.com/apache/iceberg-go/catalog/glue"
    "github.com/aws/aws-sdk-go-v2/config"
)

awsCfg, err := config.LoadDefaultConfig(context.TODO())
// handle err
cat := glue.NewCatalog(glue.WithAwsConfig(awsCfg))

Hive

import (
    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/catalog/hive"
)

cat, err := hive.NewCatalog(iceberg.Properties{},
    hive.WithURI("thrift://localhost:9083"),
    hive.WithWarehouse("s3://my-bucket/warehouse"))

Hadoop

import (
    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/catalog/hadoop"
)

cat, err := hadoop.NewCatalog("default", "file:///tmp/warehouse", iceberg.Properties{})

Via the registry

catalog.Load picks the backend from the type property (falling back to the uri scheme) and also reads your ~/.iceberg-go.yaml. This is useful when the backend is selected at runtime:

import (
    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/catalog"
    _ "github.com/apache/iceberg-go/catalog/rest" // blank import so the "rest" type is registered
)

cat, err := catalog.Load(ctx, "default", iceberg.Properties{
    "type":      "rest",
    "uri":       "http://localhost:8181",
    "warehouse": "s3://my-bucket/warehouse",
})

Namespaces

ns := table.Identifier{"sales"}

err := cat.CreateNamespace(ctx, ns, iceberg.Properties{"owner": "data-team"})

namespaces, err := cat.ListNamespaces(ctx, nil) // []table.Identifier
exists, err := cat.CheckNamespaceExists(ctx, ns)
props, err := cat.LoadNamespaceProperties(ctx, ns)
summary, err := cat.UpdateNamespaceProperties(ctx, ns,
    []string{"deprecated"}, // removals
    iceberg.Properties{"owner": "platform-team"}, // updates
)
err = cat.DropNamespace(ctx, ns)

table.Identifier is []string; use catalog.ToIdentifier("sales", "orders") (or catalog.ToIdentifier("sales.orders")) to build one from string parts.
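
To make the two call forms concrete, here is a minimal standalone sketch of the splitting behavior described above. The identifier type and the toIdentifier helper are hypothetical stand-ins written with the standard library only; they are not the library's implementation, and the empty-string handling in particular is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// identifier is a local stand-in with the same []string shape as
// table.Identifier; it is not the library type.
type identifier []string

// toIdentifier is a hypothetical helper: a single argument is split on ".",
// while multiple arguments are taken as already-separated parts.
func toIdentifier(parts ...string) identifier {
	if len(parts) == 1 {
		if parts[0] == "" {
			return nil // assumption: an empty string yields an empty identifier
		}
		return strings.Split(parts[0], ".")
	}
	return parts
}

func main() {
	fmt.Println(toIdentifier("sales", "orders")) // [sales orders]
	fmt.Println(toIdentifier("sales.orders"))    // [sales orders]
}
```

Both calls produce the same two-part identifier, which is why the dotted form is convenient for values read from flags or config files.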

Tables

Defining a schema

import "github.com/apache/iceberg-go"

schema := iceberg.NewSchema(1,
    iceberg.NestedField{ID: 1, Name: "id", Type: iceberg.PrimitiveTypes.Int64, Required: true},
    iceberg.NestedField{ID: 2, Name: "name", Type: iceberg.PrimitiveTypes.String, Required: false},
    iceberg.NestedField{ID: 3, Name: "active", Type: iceberg.PrimitiveTypes.Bool, Required: false},
)

For nested types use &iceberg.StructType{...}, &iceberg.ListType{...}, or &iceberg.MapType{...}. Use iceberg.NewSchemaWithIdentifiers(id, identifierIDs, fields...) to mark identifier columns.

Create

import (
    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/catalog"
)

ident := catalog.ToIdentifier("sales", "orders")

tbl, err := cat.CreateTable(ctx, ident, schema,
    catalog.WithLocation("s3://my-bucket/sales/orders"),
    catalog.WithProperties(iceberg.Properties{"owner": "data-team"}),
)

The optional catalog.WithPartitionSpec, catalog.WithSortOrder, and catalog.WithStagedUpdates options are also available.

Load, exists, list, drop, rename

tbl, err := cat.LoadTable(ctx, ident)

exists, err := cat.CheckTableExists(ctx, ident)

for ident, err := range cat.ListTables(ctx, table.Identifier{"sales"}) {
    if err != nil { /* ... */ }
    fmt.Println(ident)
}

err = cat.DropTable(ctx, ident)

renamed, err := cat.RenameTable(ctx,
    catalog.ToIdentifier("sales", "orders"),
    catalog.ToIdentifier("sales", "orders_v2"))

ListTables returns an iter.Seq2[table.Identifier, error] that streams results.

Inspecting metadata

tbl.Identifier()         // table.Identifier
tbl.Location()           // string
tbl.MetadataLocation()   // string (path of current metadata.json)
tbl.Metadata()           // table.Metadata
tbl.Schema()             // *iceberg.Schema (current)
tbl.Schemas()            // map[int]*iceberg.Schema
tbl.Spec()               // iceberg.PartitionSpec
tbl.SortOrder()          // table.SortOrder
tbl.Properties()         // iceberg.Properties

if snap := tbl.CurrentSnapshot(); snap != nil {
    fmt.Println(snap.SnapshotID, snap.TimestampMs, snap.Summary)
}

// All snapshots
for _, snap := range tbl.Metadata().Snapshots() {
    fmt.Println(snap.SnapshotID)
}

// Stream all manifest files across all snapshots
for mf, err := range tbl.AllManifests(ctx) {
    if err != nil { /* ... */ }
    fmt.Println(mf.FilePath())
}

Reading data

(t Table) Scan(opts ...ScanOption) *Scan returns a scan that you can resolve into Arrow data.

Streaming record batches

import "github.com/apache/iceberg-go/table"

scan := tbl.Scan()
arrowSchema, batches, err := scan.ToArrowRecords(ctx)
if err != nil { /* ... */ }

fmt.Println(arrowSchema)
for batch, err := range batches {
    if err != nil { /* ... */ }
    fmt.Printf("batch with %d rows\n", batch.NumRows())
    batch.Release()
}

ToArrowRecords returns the Arrow schema together with an iter.Seq2[arrow.RecordBatch, error], so batches are streamed and only one needs to be in memory at a time. Always call batch.Release() once you are done with a batch to free its Arrow buffers.

Materializing as an Arrow Table

arrowTbl, err := tbl.Scan().ToArrowTable(ctx)
if err != nil { /* ... */ }
defer arrowTbl.Release()
fmt.Printf("%d rows in %d cols\n", arrowTbl.NumRows(), arrowTbl.NumCols())

Projection and filters

import (
    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/table"
)

scan := tbl.Scan(
    table.WithSelectedFields("id", "name"),
    table.WithRowFilter(
        iceberg.NewAnd(
            iceberg.GreaterThanEqual(iceberg.Reference("id"), int64(100)),
            iceberg.IsIn(iceberg.Reference("region"), "us-east", "us-west"),
        ),
    ),
    table.WithLimit(1000),
    table.WithCaseSensitive(true),
)

For the predicate vocabulary, see Row Filter Syntax.

Time travel

// By snapshot ID
scan := tbl.Scan(table.WithSnapshotID(snap.SnapshotID))

// As of a timestamp (milliseconds since epoch)
scan = tbl.Scan(table.WithSnapshotAsOf(time.Now().Add(-24*time.Hour).UnixMilli()))
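
As a rough mental model, an as-of lookup picks the most recent snapshot committed at or before the given timestamp. The following standalone sketch illustrates that selection under this assumption; the snapshot struct and the snapshotAsOf function are hypothetical and do not show how the library resolves the snapshot internally.

```go
package main

import (
	"fmt"
	"time"
)

// snapshot is a hypothetical, trimmed-down view of a table snapshot.
type snapshot struct {
	SnapshotID  int64
	TimestampMs int64 // commit time, milliseconds since the Unix epoch
}

// snapshotAsOf returns the latest snapshot whose commit time is at or before
// tsMs. The boolean is false when every snapshot is newer than tsMs.
func snapshotAsOf(snaps []snapshot, tsMs int64) (snapshot, bool) {
	var best snapshot
	found := false
	for _, s := range snaps {
		if s.TimestampMs <= tsMs && (!found || s.TimestampMs > best.TimestampMs) {
			best, found = s, true
		}
	}
	return best, found
}

func main() {
	base := time.Date(2026, time.January, 1, 0, 0, 0, 0, time.UTC)
	snaps := []snapshot{
		{SnapshotID: 1, TimestampMs: base.UnixMilli()},
		{SnapshotID: 2, TimestampMs: base.Add(1 * time.Hour).UnixMilli()},
		{SnapshotID: 3, TimestampMs: base.Add(2 * time.Hour).UnixMilli()},
	}
	// Ask for the table state 90 minutes after the first commit.
	s, ok := snapshotAsOf(snaps, base.Add(90*time.Minute).UnixMilli())
	fmt.Println(s.SnapshotID, ok) // 2 true
}
```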

Reading from a branch or tag

scan, err := tbl.Scan().UseRef("audit-branch")
if err != nil { /* ... */ }

arrowTbl, err := scan.ToArrowTable(ctx)

Iterating tasks for custom processing

If you need finer control (custom file readers, distributed scan planning), plan the files yourself and then read the resulting tasks:

scan := tbl.Scan(table.WithRowFilter(myFilter))
tasks, err := scan.PlanFiles(ctx)
if err != nil { /* ... */ }

arrowSchema, batches, err := scan.ReadTasks(ctx, tasks)

Writing data

The shortcut methods on Table open a transaction, perform the write, and commit. Use NewTransaction directly when you need to combine multiple operations.

Append

import (
    "github.com/apache/arrow-go/v18/arrow/array"
)

// From a streaming array.RecordReader
newTbl, err := tbl.Append(ctx, recordReader, nil /* snapshot props */)

// From an in-memory Arrow Table; batchSize controls the rolling writer
newTbl, err = tbl.AppendTable(ctx, arrowTbl, 1024, nil)

Overwrite

import (
    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/table"
)

// Replace all data
newTbl, err := tbl.Overwrite(ctx, recordReader, nil)

// Replace only rows matching a filter
newTbl, err = tbl.Overwrite(ctx, recordReader, nil,
    table.WithOverwriteFilter(
        iceberg.EqualTo(iceberg.Reference("date"), "2026-01-01"),
    ),
)

OverwriteTable is the arrow.Table variant.

Delete

newTbl, err := tbl.Delete(ctx,
    iceberg.LessThan(iceberg.Reference("id"), int64(100)),
    nil, /* snapshot props */
)

Add existing files

When you already have data files (e.g. produced by another writer), register them without rewriting:

txn := tbl.NewTransaction()
err := txn.AddFiles(ctx, []string{
    "s3://my-bucket/sales/orders/data/file-1.parquet",
    "s3://my-bucket/sales/orders/data/file-2.parquet",
}, nil /* snapshot props */, false /* ignoreDuplicates */)
if err != nil { /* ... */ }

newTbl, err := txn.Commit(ctx)

ReplaceDataFiles(ctx, filesToDelete, filesToAdd, snapshotProps) and ReplaceDataFilesWithDataFiles(ctx, filesToDelete, dataFilesToAdd, snapshotProps, opts...) are also available on *Transaction for swapping files atomically.

Transactions

Group writes and metadata changes into one atomic snapshot:

txn := tbl.NewTransaction()

if err := txn.Delete(ctx,
    iceberg.LessThan(iceberg.Reference("date"), "2026-01-01"), nil); err != nil {
    /* ... */
}
if err := txn.Append(ctx, recordReader, nil); err != nil {
    /* ... */
}
if err := txn.SetProperties(iceberg.Properties{"commit.user": "data-pipeline"}); err != nil {
    /* ... */
}

newTbl, err := txn.Commit(ctx)

To target a specific branch:

txn := tbl.NewTransactionOnBranch("staging")

Commit automatically retries when it hits a conflict (ErrCommitFailed); tune this behavior with the commit.retry.* table properties.
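
If you want a similar policy in your own code, for example around a larger read-modify-write loop, retry-on-conflict can be sketched as follows. This is an illustration only, not the library's retry logic: errCommitFailed, errValidation, and retryCommit are local stand-ins, and the doubling backoff is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Local stand-ins for illustration; they are not the library's error values.
var (
	errCommitFailed = errors.New("commit failed")     // retryable conflict
	errValidation   = errors.New("validation failed") // not retryable
)

// retryCommit is a hypothetical helper. It calls commit up to attempts times,
// doubling the wait after each conflict, and returns the number of calls made
// together with the final error. Errors other than conflicts are returned
// immediately.
func retryCommit(attempts int, wait time.Duration, commit func() error) (int, error) {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = commit(); err == nil {
			return i, nil
		}
		if !errors.Is(err, errCommitFailed) {
			return i, err
		}
		if i < attempts {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return attempts, err
}

func main() {
	calls := 0
	n, err := retryCommit(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("attempt %d: %w", calls, errCommitFailed)
		}
		return nil
	})
	fmt.Println(n, err) // 3 <nil>

	n, err = retryCommit(5, 10*time.Millisecond, func() error { return errValidation })
	fmt.Println(n, err) // 1 validation failed
}
```

Checking with errors.Is lets wrapped conflict errors be retried while anything else fails fast.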

Schema and partition evolution

Schema evolution

import (
    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/table"
)

txn := tbl.NewTransaction()

err := table.NewUpdateSchema(txn, true /* caseSensitive */, false /* allowIncompatibleChanges */).
    AddColumn([]string{"tip"}, iceberg.PrimitiveTypes.Float64, "Tip in dollars", false, nil).
    RenameColumn([]string{"name"}, "full_name").
    DeleteColumn([]string{"deprecated_field"}).
    Commit()
if err != nil { /* ... */ }

newTbl, err := txn.Commit(ctx)

Reorder fields with MoveFirst, MoveBefore, or MoveAfter. Set allowIncompatibleChanges to true to permit type narrowing or making optional columns required.

Partition evolution

us := table.NewUpdateSpec(txn, true /* caseSensitive */)
us.AddField("event_time", iceberg.DayTransform{}, "event_day") // sourceColName, transform, partitionFieldName
us.RemoveField("legacy_partition")
if err := us.Commit(); err != nil { /* ... */ }

AddField calls can be chained. AddIdentity(sourceCol) is a shortcut for an identity transform, and RenameField(name, newName) renames an existing partition field.

Available transforms (root iceberg package): IdentityTransform{}, YearTransform{}, MonthTransform{}, DayTransform{}, HourTransform{}, BucketTransform{NumBuckets: N}, TruncateTransform{Width: W}.
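
For intuition about what these transforms compute, here is a small sketch of day and integer truncate following the definitions in the Iceberg table spec (days since 1970-01-01, and rounding down to a multiple of the width). The function names are hypothetical and this is not the library's code; bucket is omitted because it depends on a 32-bit Murmur3 hash.

```go
package main

import (
	"fmt"
	"time"
)

// microsPerDay is the number of microseconds in one day.
const microsPerDay = int64(24 * time.Hour / time.Microsecond)

// floorDiv divides rounding toward negative infinity, so instants before the
// epoch land on the correct (negative) day.
func floorDiv(a, b int64) int64 {
	q := a / b
	if a%b != 0 && (a < 0) != (b < 0) {
		q--
	}
	return q
}

// dayFromMicros sketches the day transform for a timestamp stored as
// microseconds since the Unix epoch: the result is days since 1970-01-01.
func dayFromMicros(us int64) int32 {
	return int32(floorDiv(us, microsPerDay))
}

// truncateInt sketches the truncate transform for integers: v is rounded down
// to the nearest multiple of width, including for negative values.
func truncateInt(v, width int64) int64 {
	return v - ((v%width)+width)%width
}

func main() {
	ts := time.Date(2026, time.January, 1, 12, 30, 0, 0, time.UTC)
	fmt.Println(dayFromMicros(ts.UnixMicro()))            // 20454
	fmt.Println(truncateInt(17, 10), truncateInt(-1, 10)) // 10 -10
}
```

All rows whose event_time falls on the same UTC day map to the same partition value, which is what lets a scan prune whole days.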

Snapshots and refs

Inspecting

if snap := tbl.CurrentSnapshot(); snap != nil {
    fmt.Println(snap.SnapshotID, snap.TimestampMs, snap.Summary)
}
snap := tbl.SnapshotByID(snapshotID)
named := tbl.SnapshotByName("audit")

for _, s := range tbl.Metadata().Snapshots() {
    fmt.Println(s.SnapshotID, s.Summary)
}

Branches and tags

The CLI's branch create and tag create commands (see CLI) are the most ergonomic surface today. Programmatically, ref creation goes through Catalog.CommitTable with a SetSnapshotRef update:

import "github.com/apache/iceberg-go/table"

snap := tbl.CurrentSnapshot()
if snap == nil { /* table has no snapshots yet, so there is nothing to reference */ }
update := table.NewSetSnapshotRefUpdate(
    "audit",                  // ref name
    snap.SnapshotID,
    table.BranchRef,          // or table.TagRef
    0,                        // maxRefAgeMs (0 = unset)
    0,                        // maxSnapshotAgeMs (0 = unset)
    0,                        // minSnapshotsToKeep (0 = unset)
)
reqs := []table.Requirement{
    table.AssertTableUUID(tbl.Metadata().TableUUID()),
    table.AssertRefSnapshotID("audit", nil), // ref must not already exist
}

_, _, err := cat.CommitTable(ctx, tbl.Identifier(), reqs, []table.Update{update})

The constants table.MainBranch, table.BranchRef, and table.TagRef live in table/refs.go. A higher-level builder is on the roadmap.

Expiration and rollback

// Expire snapshots older than the table's retention properties
err := txn.ExpireSnapshots(/* options */)

// Roll back to a previous snapshot
err = txn.RollbackToSnapshot(targetSnapshotID)

Tune retention with the min-snapshots-to-keep, max-snapshot-age-ms, and max-ref-age-ms table properties (see Configuration).
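
One way to picture how an age limit and a minimum count could interact is sketched below for a single linear history: the newest minKeep snapshots are always kept, and older ones are dropped once they exceed the age limit. The snapshot struct and the expiredSnapshots function are hypothetical, and the library's actual expiration rules (including per-ref handling) may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot is a hypothetical, trimmed-down view of a table snapshot.
type snapshot struct {
	SnapshotID  int64
	TimestampMs int64
}

// expiredSnapshots returns the snapshots a simple retention policy would
// drop: anything older than maxAgeMs, except the newest minKeep snapshots.
// The result is ordered newest first.
func expiredSnapshots(snaps []snapshot, nowMs, maxAgeMs int64, minKeep int) []snapshot {
	sorted := append([]snapshot(nil), snaps...) // do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].TimestampMs > sorted[j].TimestampMs
	})
	var expired []snapshot
	for i, s := range sorted {
		if i < minKeep {
			continue // always retained, regardless of age
		}
		if nowMs-s.TimestampMs > maxAgeMs {
			expired = append(expired, s)
		}
	}
	return expired
}

func main() {
	snaps := []snapshot{{1, 100}, {2, 200}, {3, 300}, {4, 400}}
	for _, s := range expiredSnapshots(snaps, 500, 150, 1) {
		fmt.Println("expire", s.SnapshotID)
	}
}
```

With now=500 and a 150 ms limit, snapshot 4 survives on both counts while snapshots 3, 2, and 1 are reported as expired.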

Maintenance

Orphan file cleanup

import (
    "time"

    "github.com/apache/iceberg-go/table"
)

result, err := tbl.DeleteOrphanFiles(ctx,
    table.WithFilesOlderThan(72*time.Hour),
    table.WithDryRun(false),
    table.WithMaxConcurrency(8),
)
if err != nil { /* ... */ }
fmt.Printf("removed %d files\n", len(result.DeletedFiles))

Also see table.WithLocation, table.WithDeleteFunc, table.WithPrefixMismatchMode, table.WithEqualSchemes, and table.WithEqualAuthorities in table/orphan_cleanup.go.

Compaction (rewrite data files)

import "github.com/apache/iceberg-go/table"

txn := tbl.NewTransaction()
result, err := txn.RewriteDataFiles(ctx, groups /* []table.CompactionTaskGroup */, table.RewriteDataFilesOptions{})
if err != nil { /* ... */ }

newTbl, err := txn.Commit(ctx)
fmt.Printf("rewrote %d files into %d (%d -> %d bytes)\n",
    result.RemovedDataFiles, result.AddedDataFiles, result.BytesBefore, result.BytesAfter)

The table/compaction subpackage provides bin-packing planning. The iceberg compact analyze and iceberg compact run CLI commands wrap the same machinery; see CLI.
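
To illustrate what bin-packing planning means, the sketch below groups file sizes into bins of at most a target size using first-fit decreasing. It is a generic illustration with a hypothetical planGroups function; it is not the algorithm or API of the table/compaction package.

```go
package main

import (
	"fmt"
	"sort"
)

// planGroups is a hypothetical first-fit-decreasing packer. It sorts file
// sizes from largest to smallest and places each one into the first group
// that still has room under target. A file larger than target gets a group
// of its own.
func planGroups(sizes []int64, target int64) [][]int64 {
	sorted := append([]int64(nil), sizes...) // keep the caller's slice intact
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })

	var groups [][]int64
	var totals []int64 // running byte total per group
	for _, sz := range sorted {
		placed := false
		for i := range groups {
			if totals[i]+sz <= target {
				groups[i] = append(groups[i], sz)
				totals[i] += sz
				placed = true
				break
			}
		}
		if !placed {
			groups = append(groups, []int64{sz})
			totals = append(totals, sz)
		}
	}
	return groups
}

func main() {
	// Sizes in MiB, packed toward 100 MiB output files.
	fmt.Println(planGroups([]int64{20, 60, 30, 50, 40}, 100)) // [[60 40] [50 30 20]]
}
```

Each resulting group would then be rewritten into one larger file, which is the effect the before and after counters in the result describe.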

Views

Views are created and loaded through catalogs that support them (REST, Hive, SQL):

import (
    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/table"
    "github.com/apache/iceberg-go/view"
)

// Create
v, err := view.CreateView(
    ctx,
    "my-catalog",
    table.Identifier{"analytics", "monthly_orders"},
    schema,
    "SELECT month, sum(amount) FROM orders GROUP BY month",
    table.Identifier{"sales"},                      // default namespace for unqualified names
    "s3://my-bucket/views/monthly_orders",
    iceberg.Properties{},
)

// Inspect
v.CurrentVersion()      // *view.Version
v.CurrentSchema()       // *iceberg.Schema
v.Versions()            // []*view.Version
v.Schemas()             // map[int]*iceberg.Schema
v.Properties()          // iceberg.Properties

view.New(ident, meta, metadataLocation) constructs a view from already-loaded metadata; view.NewFromLocation(ctx, ident, metadataLocation, fsysFactory) loads metadata from disk or object storage.

Iceberg ↔ Arrow types

When iceberg-go converts an Iceberg schema to Arrow (e.g. for the scanner output) or vice versa, the type mapping is:

Iceberg type      Arrow type
----------------  ----------------------------------------------------------
boolean           arrow.FixedWidthTypes.Boolean
int               arrow.PrimitiveTypes.Int32
long              arrow.PrimitiveTypes.Int64
float             arrow.PrimitiveTypes.Float32
double            arrow.PrimitiveTypes.Float64
decimal(p, s)     arrow.Decimal128Type{Precision: p, Scale: s}
date              arrow.FixedWidthTypes.Date32
time              arrow.FixedWidthTypes.Time64us
timestamp         &arrow.TimestampType{Unit: arrow.Microsecond} (no zone)
timestamptz       arrow.FixedWidthTypes.Timestamp_us (UTC zone)
timestamp_ns      &arrow.TimestampType{Unit: arrow.Nanosecond} (no zone)
timestamptz_ns    arrow.FixedWidthTypes.Timestamp_ns (UTC zone)
string            arrow.BinaryTypes.String
binary            arrow.BinaryTypes.Binary
fixed[L]          &arrow.FixedSizeBinaryType{ByteWidth: L}
uuid              arrow.FixedWidthTypes.UUID (extension type)
struct<...>       arrow.StructOf(...)
list<E>           arrow.ListOf(E) (or LargeListOf if useLargeTypes)
map<K, V>         arrow.MapOf(K, V)
variant           arrow.ExtensionType for Variant

Helpers in table/arrow_utils.go:

  • SchemaToArrowSchema(sc *iceberg.Schema, nameMapping NameMapping, useLargeTypes, includeRowLineage bool) (*arrow.Schema, error)
  • VisitArrowSchema[T](sc *arrow.Schema, visitor ArrowSchemaVisitor[T]) (T, error)

For a writer-side schema (Arrow → Iceberg), the scanner and writers handle conversion automatically as long as your Arrow schema is compatible with the table schema.