Seq iteration#1213

Merged
micahscopes merged 25 commits into argotorg:master from micahscopes:seq-iteration
Jan 19, 2026

Conversation

@micahscopes
Collaborator

@micahscopes micahscopes commented Jan 13, 2026

Implements generic iteration via a Seq<T> trait, following Sean's proposal in #1207. Also adds .. range operator syntax and fixes const generics in impl target types.

Changes

Parser/HIR

  • Range operator - adds .. as ArithBinOp::Range, parsed and lowered through HIR
  • Formatter and codegen adjusted to print/emit the new expression form

Core library (library/core/)

  • Seq<T> trait - fn len(self) -> usize and fn get(self, i: usize) -> T
  • Range / DynRange - range types with Seq implementations
  • Array Seq impl - impl<T, const N: usize> Seq<T> for [T; N]
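The trait surface above is small enough to model directly. A hypothetical Python sketch of the Seq<T> contract (method names seq_len/seq_get are illustrative; the Fe trait uses len/get):

```python
from abc import ABC, abstractmethod

class Seq(ABC):
    """Illustrative model of Fe's Seq<T>: a length plus random access."""
    @abstractmethod
    def seq_len(self) -> int: ...
    @abstractmethod
    def seq_get(self, i: int): ...

class ArraySeq(Seq):
    """Models impl<T, const N: usize> Seq<T> for [T; N]."""
    def __init__(self, items):
        self._items = tuple(items)

    def seq_len(self) -> int:
        return len(self._items)

    def seq_get(self, i: int):
        return self._items[i]

xs = ArraySeq([10, 20, 30])
print([xs.seq_get(i) for i in range(xs.seq_len())])  # [10, 20, 30]
```

Anything satisfying this two-method contract can drive a for-loop; that is the whole interface the compiler resolves.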

Type checking (hir/analysis/ty/)

  • For-loop Seq resolution - loops typed via Seq trait lookup instead of hardcoded array/range cases
  • ForLoopSeq - stores pre-resolved Callable for len/get so MIR can emit direct calls
  • Inference hygiene - snapshot/rollback when probing candidate impls to prevent pollution
  • Effect preservation - resolved effect args carried through synthesized method calls

MIR lowering

  • Seq-based desugaring - for x in seq becomes index loop using resolved Seq::len/Seq::get
  • Continue correctness - continue now targets increment block, not condition (fixes infinite loop bug)
  • CallOrigin.expr - now Option<ExprId> to represent synthesized calls without fake expr IDs
  • Fallback paths remain for arrays/ranges when Seq resolution unavailable
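The continue fix is easy to see in a direct model. A hypothetical Python rendering of the desugared loop (all names illustrative): continue must land on the increment step, not back on the condition check, or the index never advances:

```python
def run_for_loop(items, body):
    """Model of the Seq-based desugaring: `for x in seq { body }`
    becomes an index loop over len()/get().  `body` returns
    "continue", "break", or None."""
    i = 0
    n = len(items)                 # stands in for Seq::len
    while i < n:                   # condition block
        signal = body(items[i])    # stands in for Seq::get + loop body
        if signal == "break":
            break
        # Both normal fall-through and `continue` land here, on the
        # increment block.  Jumping back to the condition instead
        # would skip the increment and loop forever.
        i += 1                     # increment block

seen = []
run_for_loop([1, 2, 3, 4],
             lambda x: seen.append(x) or ("continue" if x % 2 else None))
print(seen)  # [1, 2, 3, 4]
```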

Const generics fix

  • Impl target scoping - Impl::lower_ast and ImplTrait::lower_ast now enter preliminary scope first, lower generics, then lower target type
  • Enables impl<T, const N: usize> ... for [T; N] patterns that previously failed resolution

Tests

  • const_generic_impl_target_array.fe / const_generic_impl_regression.fe - uitests for the const generic fix
  • for_continue.fe / for_continue.mir.snap - MIR snapshot for continue-in-for correctness

Questions / follow-up

@micahscopes
Collaborator Author

micahscopes commented Jan 14, 2026 via email

@micahscopes
Collaborator Author

Still polishing a few details here

@micahscopes
Collaborator Author

@Turupawn I just did another push, hopefully this is pretty close to what gets merged

@sbillig sbillig marked this pull request as ready for review January 16, 2026 17:18
@micahscopes
Collaborator Author

@sbillig @Turupawn ah wait, just kidding, a few more things to do. I'll let you know when it's ready to look at.

@micahscopes micahscopes marked this pull request as draft January 16, 2026 17:18

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67f9eec23d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1038 to +1042
// Add end field initialization
if let Some(info) = self.field_access_info(range_ty, FieldIndex::Ident(end_ident)) {
inits.push((
MirProjectionPath::from_projection(Projection::Field(info.field_idx)),
end_value,

P1: Avoid storing Known range bounds into ZST fields

In lower_range_expr you always initialize the end field with end_value, but Range bounds are zero-sized when they’re Known (Known’s Repr is () in library/core/src/range.fe). For a range like n..10, the type is Range<Unknown, Known<10>>, whose runtime layout only includes start; initializing end still emits a usize store at the end-field offset, which is beyond the allocated size and can clobber adjacent memory. This should skip initialization when the bound field type is zero-sized (or synthesize a unit) so Known bounds don’t emit stores.


@micahscopes micahscopes marked this pull request as ready for review January 16, 2026 19:41
@micahscopes
Collaborator Author

Okay, better now! cc @Turupawn @sbillig

@sbillig
Collaborator

sbillig commented Jan 16, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec06f01b77

Comment on lines +1882 to +1886
origin: ValueOrigin::PlaceRef(place.clone()),
repr: ValueRepr::Word,
});
// Emit a load instruction for the start value
self.assign(None, None, Rvalue::Load { place });

P2: Avoid emitting load without a destination

The range fallback currently emits Rvalue::Load with dest = None, but the Yul emitter rejects this (emit_assign_inst errors on “load without destination”), so any for loop that hits lower_for_range (e.g., when Seq resolution fails or core Seq isn’t available during error recovery) will crash during codegen. Please load into a temp local (as done elsewhere) or otherwise avoid generating load-only assignments.



// Extract the end value from the range
let end_value =
if let Some(info) = self.field_access_info(range_ty, FieldIndex::Ident(end_ident)) {
Collaborator

Better to unwrap here and remove the fallback. Same goes for the 'start' case above.

Comment on lines +38 to +40
fn len(self) -> usize {
E - S
}
Collaborator

@sbillig sbillig Jan 16, 2026

This should use saturating_sub (though sadly that doesn't exist yet), or we should add a hir check to ensure E >= S at least for the sugar syntax. (or we just make an issue to modify this when saturating_sub is available).

Comment on lines +1634 to +1648
if let Some(elem_local) = elem_local {
let idx_for_get = self.builder.body.alloc_value(ValueData {
ty: usize_ty,
origin: ValueOrigin::Local(idx_local),
repr: ValueRepr::Word,
});
let elem_value = self.emit_seq_get_call(
iterable_value,
idx_for_get,
elem_ty,
&seq_info.get_callable,
&seq_info.get_effect_args,
);

self.assign(None, Some(elem_local), Rvalue::Value(elem_value));
Collaborator

This doesn't work for pattern bindings, eg for Point { x, y } in points.

I think all you need to do is:

// remove the `if let Some(elem_local)` guard
let idx_for_get = ...
let elem_value = ...
self.bind_pat_value(pat, elem_value)
// carry on with the `// Execute the body` code below.

Collaborator

@sbillig sbillig left a comment

Looks good, pending minor issues above.

Update check_for to resolve element types via Seq<T> trait lookup
rather than the placeholder Iterator reference. Arrays remain special
cased since const generics in array type positions aren't fully
supported yet.

Range expressions (a..b) now type check to DynRange. Both
operands are verified to be usize, and the result is resolved
from core::range::DynRange.

Desugar range expressions `start..end` in MIR lowering to allocate
and initialize a DynRange struct with start and end fields, rather
than leaving them as Range binops that panic at codegen.

Desugar for-loops into while loops:

- For ranges (DynRange): initialize loop var to start, loop while < end
- For arrays: create hidden index, bind element each iteration

Both cases increment by 1 and use standard loop control flow with
proper continue/break targets registered.

Remove the bifurcated Const<N>/Dyn type system in favor of a simple
approach that matches issue argotorg#1207:

- Remove Const<N> and Dyn marker types from seq.fe
- Remove Range<N> (const-length), keep only Range with start/end
- Simplify Seq trait to return usize directly (no Len associated type)
- Update type checker and MIR comments from DynRange to Range

This defers const-vs-dynamic optimization to the backend rather than
encoding it in the type system, reducing complexity significantly.

Previously, `impl<const N: usize> Trait for [T; N]` failed because the
target type was lowered before entering the impl scope, so const generic
params weren't available for array length expressions.

Fix by entering the impl scope with a preliminary ID first, lowering
generic params, then lowering the target type. The final ID is swapped
in before lowering nested items like methods.

This enables the array Seq implementation in core library.

Resolve Seq::len and Seq::get at type-check time and store in
ForLoopSeq for MIR lowering. Desugars `for x in seq` to an
index-based while loop calling the resolved trait methods.

- snapshot/rollback when probing Seq impls to prevent inference pollution
- route `continue` to increment block to avoid infinite loops
- preserve effect args on synthesized Seq method calls
- Prevent unification-table panics when resolving associated types via impls

- Constrain record init generics from expected type without inference pollution

- Add uitests for assoc-type struct fields and for-loop over array literal

Replace the simple Range struct with a generic Range<S: Bound, E: Bound>
that encodes bound constness in the type system:

- Range<Known<0>, Known<10>> for fully const ranges (0 words)
- Range<Known<0>, Unknown> for const start, runtime end (1 word)
- Range<Unknown, Unknown> for fully dynamic ranges (2 words)

The type checker now detects integer literals in range expressions
and constructs the appropriate Range type, enabling zero-cost
abstractions when bounds are compile-time known.
@micahscopes micahscopes merged commit 6cf9ef3 into argotorg:master Jan 19, 2026
5 checks passed
@micahscopes
Collaborator Author

@Turupawn here's some info on where things landed and some ideas for integrating #1211:

Seq-Based Iteration: Architecture Report & Future Integration Guide

Summary

The seq-iteration branch implements a unified iteration system where all for-loops flow through the Seq<T> trait. Ranges are "just another Seq implementor," and the ForLoopSeq metadata serves as the stable contract between type checking and MIR lowering.

This report covers the current architecture, considerations for integrating codegen-focused features, and some longer-term design explorations.


Part 1: Current Architecture

1.1 Unified Iteration via Seq + ForLoopSeq

The central design decision: all for .. in .. loops follow the same path.

for x in iterable { body }
         |
         v
  HIR type-checking resolves Seq<T> implementation
         |
         v
  ForLoopSeq captures resolved len/get callables
         |
         v
  MIR desugars via lower_for_seq() -> while-loop

There's no range-specific for-loop lowering — Range implements Seq like any other sequence type. This simplifies the compiler and means new Seq implementors automatically work with for-loops.

Pattern binding is handled uniformly: the loop element always flows through bind_pat_value, so destructuring patterns like for Point { x, y } in points work correctly.

Reference: crates/mir/src/lower/expr.rs:1535, crates/hir/src/analysis/ty/ty_check/stmt.rs
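Both properties can be exercised with a toy model. A hypothetical Python sketch (all names illustrative): a range and an array drive the same generic loop, and element binding goes through a single destructuring step, mirroring bind_pat_value:

```python
def seq_iterate(seq_len, seq_get, bind):
    """One driver for every sequence: resolved len/get callables
    plus a pattern-binding step."""
    for i in range(seq_len()):
        bind(seq_get(i))

# A range is "just another Seq implementor".
start, end = 2, 5
collected = []
seq_iterate(lambda: max(end - start, 0), lambda i: start + i, collected.append)

# Destructuring, e.g. `for Point { x, y } in points`.
points = [(1, 2), (3, 4)]
xs = []
def bind_point(p):
    x, y = p          # the pattern-binding step
    xs.append(x)
seq_iterate(lambda: len(points), lambda i: points[i], bind_point)

print(collected, xs)  # [2, 3, 4] [1, 3]
```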

1.2 Range Type System

The Range type uses type-level encoding for memory optimization:

pub trait Bound {
    type Repr  // () for Known (ZST), usize for Unknown
}

pub struct Known<const N: usize> {}
pub struct Unknown {}

pub struct Range<S: Bound, E: Bound> {
    pub start: <S as Bound>::Repr,
    pub end: <E as Bound>::Repr,
}

Start      End        Size     Example
Known<0>   Known<10>  0 words  0..10 (from literals)
Known<0>   Unknown    1 word   0..n
Unknown    Known<10>  1 word   n..10
Unknown    Unknown    2 words  start..end

Note: "word" here means layout.word_size_bytes from TargetDataLayout. Fe's memory layout is packed bytes (not Solidity-style 32-byte slots), though usize is word-sized on EVM.

Layout invariants around ZST fields are enforced in MIR lowering and transforms, preventing out-of-bounds writes.

Reference: crates/mir/src/layout.rs, crates/mir/src/lower/expr.rs, crates/mir/src/transform.rs
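Under these rules, the size of a Range value is just the sum of its bounds' representations. A hypothetical Python helper reproducing the table above (one word per Unknown bound, zero per Known):

```python
def range_size_in_words(start_known: bool, end_known: bool) -> int:
    """Known bounds have Repr = () (zero-sized); Unknown bounds
    have Repr = usize (one word)."""
    return (0 if start_known else 1) + (0 if end_known else 1)

assert range_size_in_words(True, True) == 0    # 0..10 from literals
assert range_size_in_words(True, False) == 1   # 0..n
assert range_size_in_words(False, True) == 1   # n..10
assert range_size_in_words(False, False) == 2  # start..end
```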

1.3 Range::len() Semantics

Range::len() clamps to 0 when end < start, making behavior well-defined without requiring saturating_sub. Conceptually:

fn len(self) -> usize {
    if end > start { end - start } else { 0 }
}

This logic is implemented across the four Seq impls (const/const, const/runtime, runtime/const, runtime/runtime), each accessing bounds appropriately for its type signature.

Reference: library/core/src/range.fe
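A hypothetical Python rendering of the clamping rule, to make the end < start case concrete:

```python
def range_len(start: int, end: int) -> int:
    """Mirrors Range::len(): clamp to 0 instead of underflowing
    when end < start (no saturating_sub needed)."""
    return end - start if end > start else 0

print(range_len(3, 7))  # 4
print(range_len(7, 3))  # 0  (plain subtraction would underflow on usize)
print(range_len(5, 5))  # 0  (empty range)
```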

1.4 Core Type Resolution

Range type discovery is centralized via resolve_core_range_types(), which returns the Range, Known, and Unknown types in one call. This avoids scattering path strings throughout the codebase.

Reference: crates/hir/src/analysis/ty/corelib.rs:76

1.5 TargetDataLayout

Codegen now receives a TargetDataLayout, providing a path toward multi-backend support. Size-dependent decisions (like inline vs data section thresholds) can query the layout rather than assuming EVM-specific values:

// Query the layout
let word_size = layout.word_size_bytes;

// Rather than hardcoding
// if size > 32 { ... }

Reference: crates/codegen/src/yul/emitter/module.rs


Part 2: Integrating Codegen Features

2.1 Loop Unrolling

The natural integration point is lower_for() in crates/mir/src/lower/expr.rs:1535, where ForLoopSeq is available.

A subtlety worth noting: Known<const N> doesn't necessarily mean the trip count is a compile-time literal. The const parameter could itself be a generic that requires monomorphization to resolve. Today, Known<N> is inferred only from integer literals (no const arithmetic yet), so a conservative first implementation can key off the same literal detection used for bound inference.

Semantic considerations: Unrolled loops need to preserve break/continue/return semantics and respect the effect args captured in ForLoopSeq. The resolved callables represent the actual methods that should be invoked.

An alternative structure: Implementing unrolling as a MIR transform pass (post-lowering) rather than during lowering has some appeal — it separates semantic lowering from optimization, and makes it easier to gate by optimization level.
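As a sanity model of the semantics unrolling must preserve, here is a hypothetical Python sketch comparing looped and unrolled execution of the same body over a compile-time trip count (names illustrative; real unrolling would operate on MIR, not source):

```python
def looped(n, body, state):
    """Reference semantics: run the body n times."""
    for i in range(n):
        state = body(state, i)
    return state

def unrolled_3(body, state):
    """What an unroller would emit for a Known trip count of 3:
    the body repeated with the index substituted as a constant."""
    state = body(state, 0)
    state = body(state, 1)
    state = body(state, 2)
    return state

body = lambda acc, i: acc + i * i
assert looped(3, body, 0) == unrolled_3(body, 0) == 5
```

The equivalence check is the obligation: an unrolled loop must be observationally identical to the index loop it replaces, including break/continue/return paths and effect args.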

2.2 Data Regions

The concept of storing large constant arrays/strings in data sections (rather than inlining them) is valuable for EVM targets.

Backend neutrality: The MIR representation benefits from being backend-agnostic. A DataRegionId + bytes representation lets each backend choose its own labeling and emission scheme. Avoid baking Yul label strings directly into MIR — this is the main portability footgun.

Existing infrastructure: Fe already has code region plumbing (CodeRegionOffset/CodeRegionLen intrinsics) that drives dataoffset()/datasize() patterns. Data region work can follow a similar style rather than building parallel machinery.

Reference: crates/codegen/src/yul/emitter/module.rs
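A hypothetical Python sketch of the backend-neutral shape: the IR carries only opaque ids plus bytes, and each backend maps ids to its own labels at emission time (all names illustrative):

```python
class DataRegions:
    """Backend-neutral registry: the IR stores only (DataRegionId, bytes)."""
    def __init__(self):
        self._regions: list[bytes] = []

    def intern(self, data: bytes) -> int:
        self._regions.append(data)
        return len(self._regions) - 1   # the DataRegionId

    def get(self, region_id: int) -> bytes:
        return self._regions[region_id]

def yul_label(region_id: int) -> str:
    # Labeling is a backend decision, applied at emission time,
    # never baked into the IR itself.
    return f"data_region_{region_id}"

regions = DataRegions()
rid = regions.intern(b"\x01\x02\x03")
print(yul_label(rid), len(regions.get(rid)))  # data_region_0 3
```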

Design Note: Core/Std Relevance

Data regions are the simplest instance of a broader "data lives somewhere" story (code section, calldata, memory, storage, etc.). Even if the immediate work is EVM/Yul-focused, it's worth shaping the implementation so it doesn't pre-commit Fe's future core/std interfaces.

Concretely: keep the compiler's representation backend-neutral, keep size/threshold policy layout-driven, and avoid encoding backend naming/labeling decisions into MIR. This keeps the door open for future library-level "region/handle" traits without forcing the design today.

Future-proofing checklist (for any data-region PR):

  • MIR carries DataRegionId + bytes (or equivalent), not Yul label strings / dataoffset("...") names
  • Inline-vs-section (or similar) decisions consult TargetDataLayout / target policy (no hardcoded 32)
  • Reuse the existing "code region" collection/emission pattern where possible (avoid parallel bespoke machinery)
  • Keep any core/std hooks centralized (avoid scattered path/name checks), so refactors don't become invasive later

Part 3: Design Exploration — DataLoc/StaticSeq

Note: This section explores a potential future direction. It's speculative and implies compiler intrinsics and language-surface decisions that would need their own design process.

3.1 The Pattern

Following the Known/Unknown pattern, data storage location could be encoded at the type level:

pub trait DataLoc {
    type Handle  // () for compile-time resolved locations
}

pub struct Inline {}    // Small data in bytecode
pub struct CodeData {}  // Large data in data section

impl DataLoc for Inline { type Handle = () }
impl DataLoc for CodeData { type Handle = () }

3.2 StaticSeq Sketch
pub struct StaticSeq<T, const N: usize, L: DataLoc> {
    _handle: <L as DataLoc>::Handle  // ZST for both Inline and CodeData
}

impl<T, const N: usize> Seq<T> for StaticSeq<T, N, CodeData> {
    fn len(self) -> usize { N }
    fn get(self, i: usize) -> T { /* intrinsic: codecopy + load */ }
}

The compiler could automatically select Inline vs CodeData based on size thresholds (queried from TargetDataLayout), similar to how range bounds are detected from literals.

3.3 Considerations

This design implies compiler intrinsics for loading elements from code/data sections, and ties into backend capabilities that vary across targets. It also intersects with ongoing discussions about core/std "memory region" interfaces and contract desugaring plumbing; treat DataLoc/StaticSeq as a sketch for possible future surfaces, not a requirement for implementing data regions.


Part 4: Integration Summary

The seq-iteration branch establishes Seq/ForLoopSeq as the semantic contract for iteration. Performance features like unrolling and data regions can layer on top of this contract.

A few patterns emerge from the current design:

  • Uniformity: Range flows through the same Seq machinery as arrays, keeping the compiler simple
  • Type-level encoding: Compile-time knowledge (bounds, locations) lives in types with ZST representations
  • Backend abstraction: TargetDataLayout and backend-neutral MIR structures support future multi-target work
  • Centralized helpers: resolve_core_range_types() and similar functions reduce path-string duplication
  • Non-interference: Data regions should land as backend-neutral IR + layout-driven policy, leaving any new core/std "region" traits as an explicit follow-on design step

Part 5: Key Files Reference

Layer         File                                          Purpose
Core Library  library/core/src/seq.fe                       Seq trait
              library/core/src/range.fe                     Range with Known/Unknown, clamped len()
HIR           crates/hir/src/analysis/ty/ty_check/stmt.rs   ForLoopSeq, Seq resolution
              crates/hir/src/analysis/ty/corelib.rs:76      resolve_core_range_types()
MIR           crates/mir/src/lower/expr.rs:1535             lower_for_seq() (integration point)
              crates/mir/src/layout.rs                      Layout with ZST handling
Codegen       crates/codegen/src/yul/emitter/module.rs      TargetDataLayout, code region machinery
