Comby


Recent Posts

  • Find and replace with type information
  • Deconstructing programs for compiler fuzzing
  • Whatever You Want syntax for rewriting code
  • A simple program reducer for any language
  • Comby website gets a refresh

Find and replace with type information

August 31, 2022

Rijnard

Eventually the realization will hit that you’re just trying to approximate what a compiler usually does, but with a regex pattern […] and all you really wanted was type information.

One of the earliest code changes I made with comby removed redundant code in the Go compiler. After proposing changes, I soon came to the sad realization that some of them turned out to be just plain wrong.1 What I missed is that a value’s type could affect whether the code was really redundant or not.

The mechanical, syntactic change removes a redundant nil check and goes like this:

Before

if x != nil {
  for ... := range x {
    ...
  }
}

After

for ... := range x {
  ...
}

The tricky bit is that this change is only valid if x is a slice or map, and not if x is a pointer or channel.2 If we only have the syntax to go on, there really is no way to know if the change is valid. Ever since then I’ve wanted a way to pull type information into comby.

There’s now an experimental way to query type information with comby that solves this kind of issue. The main idea is to write a rule that pattern matches against type information in hover tooltips (i.e., type-on-hover data).

where match :[x].lsif.hover {
  | ... map... -> true
  | ... []... -> true
  | _ -> false
}

This kind of rule gives enough flexibility to avoid false positives, filtering and changing only code where type-on-hover suggests we have a slice (like []string syntax in Go) or map (like map[string]string) on our hands.

A few things need to fall in place for this to work, and only a couple of open source projects and languages are supported right now. It’s meant to be an early interface, implemented to generically support future language and project expansions. Read on to learn what’s possible at the frontier of simpler find and replace with types.

Extending comby with type information

When I think about adding type information to comby I want to keep things simple and easy. Anything too heavyweight isn’t a good fit, because language-specific tools exist for automating complex changes to the nth degree (clang for C) and I don’t have time to integrate or maintain such toolchains directly in comby. To that end, I really just want comby to use an external service that answers: What is the type of the value at this location in the file?

Type-information-as-a-service is ideal because then I can keep the convenience of matching syntax easily, and outsource accessing language-specific semantic properties as needed. The concept is to expose a property on any matched variable that corresponds to type information (if available). It might look something like this:

func parse(:[args]) where :[args].type == string

For practical reasons, :[args].type would just be a simple string in comby. In other words, it’s just a string representation of a type (like what you’d see in a hover tooltip). So, calling this a “type” property is a bit too simplistic because in reality types are expressions. It isn’t practical to represent types as such in comby today, which is why I’ve chosen to use a different name, .lsif.hover, instead (more on this later). Incorporating more of the type system to represent terms accurately would be welcome, and perhaps one day comby will expose a more precise .type notion. For now we can still make progress on the find-and-replace-with-types problem in the leading example and many others. You can skip ahead to what this ends up looking like in comby today, but it might make more sense if I tell you where I’m pulling the type information from first.

Where can I find type information as a service?

I know of only two high-level solutions resembling a service that conveniently provides type information today, and which aim to eventually support all languages. The first is language server implementations of LSP (Language Server Protocol). Language servers expose all kinds of information, and typically include type information in hover tooltips found in editors. With LSP there is no authoritative, centralized language server that you can just query. Instead, language servers are configured for specific projects and either run locally on your machine or get managed by an organization.

The second is a central, public-facing service maintained on Sourcegraph.com that exposes hover information similar to the LSP format (disclaimer: I work at Sourcegraph, but my day job is unrelated to this service). The key difference is that Sourcegraph automatically processes popular repositories on GitHub (currently about 45K) with support for TypeScript, Go, Java, Scala, and Kotlin. It’s also possible to process your own projects (locally, in CI, or GitHub Actions) and upload the data. Once that happens, you can efficiently browse or query type information for variables on hover.

Hover information in Sourcegraph

This is exactly the kind of information I need to tell whether a matched variable is a slice, map, channel, or pointer. For supported projects, you just fire a GraphQL request to get the hover content.
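
To make that concrete, here’s a minimal Python sketch of building such a request. The GraphQL query shape below is an assumption modeled loosely on Sourcegraph’s public API, not necessarily what comby itself sends:

```python
import json

# Hypothetical query shape for fetching hover info at a file position.
# Field names here are assumptions; check the Sourcegraph GraphQL docs.
HOVER_QUERY = """
query Hover($repo: String!, $path: String!, $line: Int!, $char: Int!) {
  repository(name: $repo) {
    commit(rev: "HEAD") {
      blob(path: $path) {
        lsif {
          hover(line: $line, character: $char) { markdown { text } }
        }
      }
    }
  }
}
"""

def build_hover_request(repo: str, path: str, line: int, char: int) -> str:
    """Serialize the JSON payload that would be POSTed to a GraphQL
    endpoint like https://sourcegraph.com/.api/graphql."""
    return json.dumps({
        "query": HOVER_QUERY,
        "variables": {"repo": repo, "path": path, "line": line, "char": char},
    })
```

Posting this payload (with an access token, if the instance requires one) would return hover text for the position, which is the string a rule can then match against.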

I went with this second option for two reasons.

Managing language servers is too difficult. Some years ago I had a prototype where comby starts up language servers and queries them. It was a pain to manage and configure servers, especially on various open source projects. Although I succeeded at getting things up and running for about 5 languages and various projects, I didn’t like how brittle things were and discontinued the effort. In principle there’s nothing stopping an ambitious developer from generically integrating comby with LSP (or even restricting it to one language or project). I just don’t want to write or maintain that kind of thing.

Network requests to central services are easy. Ultimately I implemented generic support in comby to pull hover information from any central service exposing type information over HTTP (less hassle for me), and the default service with the best support for that right now is Sourcegraph.com.

Using comby with types today

Type information is accessed in code hovers via :[x].lsif.hover syntax. This syntax resolves hover information at the end position of :[x] when comby succeeds at retrieving this information from an external service. Until generic code intelligence services become more definitive, inspecting hover information is currently the most direct way to access type information from a central service. The lsif part is a reference to LSIF, a schema related to LSP and SCIP.

Go example redux, this time without disappointment. Adding a rule to check type-on-hover information solves the false positive issues from before.

where match :[x].lsif.hover {
  | ... map... -> true
  | ... []... -> true
  | _ -> false
}

I settled on this rule based on the type-on-hover information I saw for potential matches in popular Go repositories. Here are some matches and non-matches I found by using the new :[x].lsif.hover ability.

✅ struct field ExtraHeaders map[string]string — we want to match this, it’s a map!

Hover information in Sourcegraph for the rclone project

  • This is a file in github.com/rclone/rclone. You can browse it and see the redundant nil check for yourself ↗

✅ struct field ImportStateReturn []*InstanceState — we want to match this, it’s a [] slice!

  • This is a redundant nil check in github.com/hashicorp/terraform

❌ var hashCh chan *blockTxHashes — we don’t want to match this, it’s a channel type chan!

  • This valid nil check lives in github/ethereum/go-ethereum

You can use other patterns to achieve similar things, with deeper or looser precision. For example, you could write a regex pattern to match undesired cases where the string contains chan, and then explicitly exclude such matches.

  • | ...:[~\bchan\b]... -> false
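
To make the matching logic concrete, here is a small Python sketch of the same classification, assuming hover strings look like Go type-on-hover text. It mirrors the rule’s intent; it is not comby’s implementation:

```python
import re

def should_rewrite(hover: str) -> bool:
    """Accept hover text that looks like a map or slice type; explicitly
    reject channels with a word-boundary check, as in the rule."""
    if re.search(r"\bchan\b", hover):  # the explicit chan exclusion
        return False
    # the '... map...' and '... []...' arms of the match rule
    return "map[" in hover or "[]" in hover
```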

Expand to see the full command line invocation I ran on each project

First clone each repository and change to its root directory; then, with a recent version of comby installed, run this command. See the footnotes if you want more info on running things yourself.

COMBY_MATCH="$(cat <<"MATCH"
if :[x] != nil {
  for :[e] := range :[x] {
    :[body]
  }
}
MATCH
)"
COMBY_REWRITE="$(cat <<"REWRITE"
for :[e] := range :[x] {
  :[body]
}
REWRITE
)"
COMBY_RULE="$(cat <<"RULE"
where match :[x].lsif.hover {
  | ... map... -> true
  | ... []... -> true
  | _ -> false
}
RULE
)"
comby "$COMBY_MATCH" "$COMBY_REWRITE" -rule "$COMBY_RULE" .go

This behavior is admittedly an approximation to get type information. For example, .lsif.hover may also contain doc strings and matching may need to work around this extraneous data. But (I assert) it is often good enough for various type-sensitive changes. You can also use :[x].lsif.hover in the rewrite output.

This will let you inspect the information exposed by :[x].lsif.hover. In fact, that’s exactly what I did to decide how to match on type information for my rule. I just substituted it in a Go comment in the output.

Hover information substituted in output template

Expand to see sample match and rewrite templates for inspecting hover info

Match

if :[x] != nil {
  for :[e] := range :[x] {
    :[body]
  }
}

Rewrite

if :[x] != nil {
  /* hover info is: :[x].lsif.hover */
  for :[e] := range :[x] {
    :[body]
  }
}

Which projects and languages does this work for?

If a repository is processed with precise type-on-hover on Sourcegraph.com, accessing that information via comby should just work. This means you can use tools to process and upload this information with your own TypeScript, Java, Scala, Kotlin, or Go projects to Sourcegraph and comby will be able to access hover information for them. These tools encode type information and other data using SCIP, which is related to the Language Server Index Format (LSIF/LSP) but with efficiency in mind.

If you happen to work at a company with a running Sourcegraph instance that indexes any of these languages, you’re in luck: setting LSIF_SERVER=https://your-instance/.api/graphql when you run comby will enable it to get hover information from your instance.3

See more at the end of the post if you want to try things out yourself.4

When is type information actually useful for find and replace?

I’ve gotten this question a couple of times from my developer friends, and often they’re able to nod their heads that accessing this kind of type information is theoretically useful, but they really want to see concrete examples. The reality is that a vast number of changes that simplify expressions, modernize functions, or apply lint-like fixes require type information.

The connection might not be obvious because accessing type information is typically buried deeper in a linter or compiler, after the initial layer of syntax concerns (parsing), and tightly coupled to the language. So to give a sense, here are some diverse ways where explicitly pulling in type information supercharges code search and refactoring.

  • We’ve covered a Go example, which is a code simplification implemented in staticcheck. Here are many more code simplifications with the same flavor; the last time I counted, at least 18 of the 35 checks require type information to ensure correctness.
  • In Java, try-catch-finally blocks can be replaced with more succinct try-with-resources blocks that automatically close resources, like files. The catch? We need to know whether an object implements the AutoCloseable interface. This, too, is type information we need to refactor correctly.

  • Equality operators == on generic values can lead to Bad Things™. This is a well-known pitfall in JavaScript and found in other languages too. See this post on the “perils of polymorphic compare” in OCaml, and type coercion simplifications for Erlang in Section 3.2 of this paper. In statically typed languages like OCaml, you can use equality operators for a specific type like Int.equal or String.equal instead. But when you look at an expression like x == y, how do you know if it’s safe to replace the expression with something like Int.(x == y) or String.(x == y) or something else? That’s right, you need something to tell you the type of x or y.

  • Types make static analysis more precise and help remove false positives in bug reports. Analyzers like Semgrep and PMD include ways to reference type information for checks like "only report a potential SQL injection if the argument passed is of type string (int is safe)" or "only report a violation if a loop index type is a float".

  • Reducers like creduce and comby-reducer automatically transform a program in valid ways to produce smaller source code that triggers the same runtime behavior as the original. Reducers can use type information to more precisely guarantee valid programs, or use type information to make the program smaller. For example, creduce will remove the & operator in C code depending on the expression and its type (see the complete description here).

In these examples, types can sometimes be inferred by matching on existing syntax definitions, but things will quickly become intractable as you try to follow the type propagation in and across functions. Eventually the realization will hit that you’re basically just trying to approximate what a compiler usually does, but with a regex pattern. Really all you want is for the information to exist, and a convenient way to look it up.

A more general view on integrating type information

I’ve long wanted an interface that effectively decouples accessing general semantic properties from patterns that match syntax. This stems from the fact that there are virtually no mature tools today where you can access deep semantic properties without needing to write code or interface with heavyweight tooling like clang (the exception might be IntelliJ structural search-and-replace). By deep I mean not just locally scoped type information or primitives. Although the lsif.hover data today is a string representation, it is a deep, project-global representation of the underlying type. As I’ve continued developing comby I’ve realized that it just makes sense to decouple language-general syntax pattern matching and language-specific semantic properties. It works well to compose syntax pattern matching and type information for those lightweight code changes or checks in the examples above.

The leading thought is that semantic properties don’t need to be restricted to just type information. Precise “rename symbol” operations in editors (say, to rename a function and update its calls) rely on semantic relations over definitions and references to work. Similar to exposing type information, comby could also expose information and operations on “all references for the syntax matched at this point” or resolve "definitions for this matched syntax". Because comby is a more general syntactic rewriter than editor find-and-replace prompts, these semantic properties could unlock easier ways to do things like instrument code (e.g., add a debug statement after every reference) or selectively emit information to build domain-specific semantic language models.

The idea of exposing semantic information alongside syntactic changes is not new. The closest and most cogent discussion on this topic I’ve found uses an apt term: semantic selectors. Unfortunately the paper is paywalled, but here’s an excerpt:

Program manipulation systems provide access to static semantic information through semantic selectors which yield attributes associated with nodes. For example, a manipulation system for a statically scoped language will supply a semantic selector that yields the defining occurrence of a given name (i.e., the point where the name is defined).

Language Design for Program Manipulation by Merks et al., IEEE Transactions on Software Engineering, 1992. ↗

Strikingly, tools today still don’t expose semantic information like this without heaps of complexity and difficulty (think about installing and running language-specific toolchains, or writing analyses using them). The current trend of language-server-like tools is the closest effort to a simpler semantic interface, but it is heavily influenced by code editor workflows. I’m hopeful for an ecosystem with broader semantic selectors as a service that expose deep information for all languages. Find and replace with types is one of the best reasons I’ve found for doing that so far.


Footnotes

1. Original Go PR

2. The underlying semantics for a range operation differs based on the type of x. When x is a nil slice or map, no check is needed, and the range operation is effectively a no-op (my guess is this behavior is probably implemented for convenience). But when x is a pointer or channel, a missing check could lead to a null dereference or blocking, respectively.

3. Please note that nothing in this post is a Sourcegraph-endorsed or maintained feature; it’s a personally maintained experimental feature in comby added by me, the author.

4. Some quick usage notes if you decide to try the things in this post:
  • You need to use the latest docker distribution or build comby from source. This is because the latest version is not yet available for Linux or via brew on macOS.
  • You still need to clone the target repository locally (like github.com/rclone/rclone), then run comby at the root of the project.
  • I’m not sure which projects are automatically indexed, but comby will notify you if a lsif.hover property isn’t available.
  • For help with Sourcegraph stuff visit their Discord. For comby things visit the gitter channel.
  • Using comby to access hover info is experimental and subject to change.


Deconstructing programs for compiler fuzzing

April 11, 2022

Rijnard

I don’t want to write a grammar specification for fuzzing compilers.

I want a shot at crashing the Rust compiler within 24 hours of fuzzing.

I happen to have test programs for the compiler. Let’s just get fuzzing already.

Fuzzing is the process of generating varied inputs to feed into a program, where the objective is to discover inputs that crash the program. These crash-inducing inputs often reveal subtle bugs. Fuzzing compilers adds an interesting twist because they expect rather structured inputs (like valid Rust source code) that encode a lot of logical complexity for compilers to reason about (think computational constructs like control flow, recursion, and type checking).

This post is about creatively using source code in compiler tests to exercise interesting compiler behavior quickly (and hopefully crash the compiler sooner than you would otherwise!). The big idea is to first deconstruct example source programs, then do some criss-cross applesauce that recombines program fragments to generate new inputs. In essence: use concrete programs to create a combinatorial space of potentially valid and unique programs that fuzzing can generate and explore over time.

There’s a related paper I co-authored ↗ with data that shows that yes, these techniques can likely surface bugs sooner. In this post I focus on the part of that work that I personally found most interesting.

A bit of background on fuzzing compilers. We know compilers expect programs that conform to a language grammar. Exploiting knowledge of a language grammar is a well-known way to try and exercise interesting parts of a compiler [1]. The basic idea there is that you can use a grammar to generate various programs that might end up causing the compiler to run into some corner case and crash (yay!). Using a grammar means you’re more likely to generate valid programs. This matters because your favorite fuzzer (like AFL++) might otherwise end up spending a lot of time in shallow, less interesting parts of a compiler (think parse errors and error handling for obviously invalid programs) before discovering paths that lead to deeper compiler logic (say code generation and optimization). But it can be a lot of effort to write a grammar specification, or turn it into an input generator. Just generally annoying. Can we sidestep getting into all that?

Another way to approach this is to say “Well, I know fuzzing compilers means trying structured inputs that conform to the underlying grammar. I happen to have programs from the compiler’s test suite. Let’s get fuzzing already.” And yes, that’s a grand way to start off: grab the already-valid programs, optionally have the fuzzer trim the corpus based on some feedback from instrumentation, and off you go. And now we get to the part where I tell you “Hey hang on, before you run off, we can get a bit more creative with those tests. It’s cheap and pretty easy to get started, and you might just find bugs that much quicker. Have you heard of comby-decomposer?”

Deconstructing programs with comby-decomposer

Existing programs in a compiler’s test suite are typically hand-crafted by expert compiler writers to test phases like parsing, type checking, optimization, and code generation. These tests are extremely valuable for exercising broad functionality of a compiler, and it only makes sense to include them in the fuzzing corpus if we’re going to hunt for corner cases. Tuning the initial corpus can reasonably stop here, and it’s fairly “safe” to just rely on feedback-driven fuzzers to synthesize valid, interesting, and structured input over time. ↗ But we have an opportunity to be a bit smarter when it comes to compiler fuzzing.

Let’s not only start off with just the initial programs, but also use those initial programs to create new, unique combinations of likely-valid and never-before-seen programs. This is what I created comby-decomposer for, a tool that chops up program source code in basically any way you’d like. Let’s look at an example using this test program inspired by the Solidity contract language.

function f(uint256 arg) public {
  g(notfound);
}

One interesting way to chop this up is to recognize that valid and varied expressions exist between balanced delimiters like (...) or blocks delineated by {...}. We don’t need to know all the details of the Solidity grammar. Just by eyeballing the concrete test program, we can intuit that there are interesting terms or expressions inside general syntax like parentheses and braces.

comby-decomposer takes a comby pattern like (:[1]) and then splits the input program into two sets with respect to this pattern. First, a set of templates: the original program templatized with respect to holes in the input pattern (you can think of this as un-substituting matches). Second, the set of fragments matching holes in the input pattern. We can use any comby pattern; in this example, let’s use the input patterns (:[1]) and {:[1]}. Here’s the diagram of terms matched between the delimiters, and the two output sets generated (templates left, fragments right).

This output gives a lot of freedom for building new programs: any of these fragments may be substituted into any of the templates (and there’s a server included in comby-decomposer to generate these inputs randomly and on-demand). The combinations for this simple program aren’t very interesting or necessarily valid. This simple example also shows just one hole created in templates, but comby-decomposer will create templates with multiple holes too, or holes for nested terms. The point is to apply this idea to an entire test suite, yielding a combinatorial space of potentially unique and valid programs to explore over time.
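
The core mechanics can be sketched in a few lines of Python. This is a simplification under stated assumptions (single-character delimiters, one hole per template), not the real comby-decomposer, which reuses comby’s language-aware parsing and supports multiple holes:

```python
import random

def decompose(source, open_d, close_d, hole=":[1]"):
    """Split source into (templates, fragments) w.r.t. one delimiter pair.
    Each balanced span yields a template (the span's contents replaced by
    a hole) and the fragment that was carved out."""
    stack, spans = [], []
    for i, ch in enumerate(source):
        if ch == open_d:
            stack.append(i)
        elif ch == close_d and stack:
            spans.append((stack.pop() + 1, i))  # contents between delimiters
    templates = [source[:s] + hole + source[e:] for s, e in spans]
    fragments = [source[s:e] for s, e in spans]
    return templates, fragments

def generate(templates, fragments, hole=":[1]"):
    """Recombine: substitute a random fragment into a random template,
    the criss-cross applesauce step that yields new candidate programs."""
    return random.choice(templates).replace(hole, random.choice(fragments), 1)
```

Running decompose on the Solidity example above with the ( and ) delimiters yields the fragments uint256 arg and notfound, plus one template per carved-out span.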

Fuzz results: what difference does comby-decomposer make?

comby-decomposer supplements the hybrid fuzzing approach explained in the paper I mentioned before to generate more varied inputs earlier during fuzzing. The main takeaway for comby-decomposer is that using the tool with hybrid strategies generally found more bugs within a 24-hour window than approaches without it (we ran 210 days’ worth of fuzzing to create a controlled experiment, and we say “more bugs” by counting unique bugs according to ground truth developer fixes, not fuzzer bucketing, which is inaccurate). In the extreme, using comby-decomposer found on average 7 bugs in the Zig compiler whereas all other approaches found fewer than 3 on average. Many of the unique findings are interesting, in the sense that they trigger crashes in code analysis and generation ↗. Some crashing programs found exclusively by comby-decomposer are quite large, even after running comby-reducer. These likely trigger more complex reasoning in the compiler:

var frame: ?anyframe = null;
        export fn a() void {
            _ = async rangeSum(10);
        }
        fn rangeSum(x: i32) i32 {
            suspend {
while(1==1)
                frame = @frame();
            }
            if (x == 0) return 0;
            var child = rangeSumIndirect;
            return child + 1;
        }
        fn rangeSumIndirect(x: i32) i32 {
            if (x == 0) return 0;
            var child = rangeSum;
            return child + 1;
        }

Let’s crash the Rust compiler?

Anecdotally, I took comby-decomposer for a spin on the latest Rust compiler while writing this post. It found one crashing input in just 20 hours; here’s the reduced version:

extern "" {
async fn partial_init() -> u32 {
async fn hello_world() {}
}
}

Unfortunately (for me) this bug was already found and reported by fuzz-rustc just a couple of days prior. But I take solace knowing that there’s a sporting chance of finding crashing inputs for the Rust compiler in less than 24 hours. For this fuzzing campaign I ran on just a single core and decomposed only the more recent regression tests (around 200 of them). This is because recent regressions are juicier for testing, in case a fix is incomplete. The 200 tests yielded around 1,600 templates and 1,300 fragments.

Running it yourself

👉To do program decomposition, check out the main comby-decomposer project.

👉For fuzzing, check out this modified AFL fuzzer for compilers that works with the comby-decomposer server to request new inputs on-demand.

👉Have other grand ideas to fuzz compilers and wondering if this could be useful? Post in the issue tracker or DM @rvtond.

When to choose comby-decomposer

There are two fundamentally practical ideas that make comby-decomposer one of the best tools for breaking up programs today:

  • You can generate well-formed templates and fragments for practically any language because it builds on comby to parse the program. It even decomposes nested syntax (a practically impossible task for regex-based engines). Want templates and fragments for Kotlin? Just find some Kotlin programs and tell comby-decomposer they’re Kotlin files.

  • You can decompose programs with respect to any pattern that comby recognizes. You don’t have to decompose with respect to a pattern like (:[1]) in the example. Maybe you’re fuzzing a domain where you only want to decompose with respect to numbers, so you could specify a regular expression pattern like [0-9]+. Since comby supports regular expression matching too, you’d just write this as :[~[0-9]+] (see the docs for more ↗).
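
As a rough Python sketch of the idea (hypothetical, not comby’s implementation), decomposing with respect to a regex means carving out every match as a fragment and leaving a hole in its place:

```python
import re

def decompose_regex(source, pattern, hole=":[1]"):
    """Produce (templates, fragments) w.r.t. every regex match: each match
    becomes a fragment, paired with a template holding a hole in its place."""
    templates, fragments = [], []
    for m in re.finditer(pattern, source):
        templates.append(source[:m.start()] + hole + source[m.end():])
        fragments.append(m.group())
    return templates, fragments
```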

comby-decomposer was developed in part to make it extremely easy to generate inputs and fuzz compilers for domain-specific applications (Solidity and Move for smart contracts) and up-and-coming languages like Zig. Other tools exist to generate programs for testing and fuzzing (CSmith anyone?) and maybe they’re more suitable or mature for your language. But none are as dead simple and as broadly language-general as comby makes possible. So that’s why comby-decomposer exists now.

Whatever You Want syntax for rewriting code

April 25, 2021

Rijnard

We have color themes for our IDEs. We should have syntax themes for our tools.

Search and Replace commands could look like this.

swap($1, $2) → swap($2, $1)

swap(α, β) → swap(β, α)

swap(🐵, 🍌) → swap(🍌, 🐵)

With Whatever You Want (WYW) syntax, it’s really up to you.


Like many pattern-matching languages, comby recognizes some reserved syntax that has special meaning for matching input. Think of a grep pattern like .*. Here the reserved syntax . means “match any character” and * means "repeat matching the previous expression zero or more times". In the rest of this post I’ll just call this reserved syntax a metasyntax. comby predominantly uses a metasyntax that looks like this:

:[var]

which roughly means "match anything within well-balanced syntax for some language L and bind those contents to variable var". There are some variations of this syntax for other matching behaviors (you can find the reference here) but that’s not important for this post.

In some sense, syntax variety is the spice of programming languages. If you’re designing a new language you have to pick something, and you’re probably going to balance picking some familiar syntax (a la “parentheses seem reasonable for arithmetic expressions, we’re not barbarians”) but then also sprinkle in some unique things so that your grammar isn’t accidentally isomorphic to Java. Your decisions may retain the loyalty of early adopters (make famously stuck with tabs instead of spaces to not disrupt the early dozen or so users). Or you might dismay academics on twitter. Whatever you decide, it’s really your privilege.

I didn’t really like settling on a metasyntax for comby. It’s difficult to anticipate what the consequences of settling on a syntax are. Maybe I’m designing myself into a corner and it’ll make future syntax extensions impossible. Maybe the syntax will cause an Angular programmer to break out in hives. I just wanted it to be minimal, and I wanted it to be easy to assign matched contents to variables. Not something like (?P<name>pattern) syntax of named groups in some regular expression dialects.

Since then, I’ve wanted to experiment with other syntax. For example, I noticed that the gofmt tool shows an example using Greek unicode letters as variables.

gofmt -r 'α[β:len(α)] -> α[β:]'

And so I latched onto this idea of being less fastidious about syntax design.

Yes, allowing different language syntax can lead to fragmented and inconsistent tooling. But who am I to deprive you from shooting yourself in the foot, O mighty developer? Besides, if the tooling is malleable, maybe we can shape it to be in line and consistent with pervading opinions.

I factored out the metasyntax definition in comby to support Whatever You Want syntax. With WYW syntax it’s possible to attach the native matching behavior (semantics) implemented in comby to basically arbitrary syntax. Syntax definitions are defined in JSON. The rest of this post showcases some WYW definitions: the reasonable, the laughable, and the deplorable.

Default syntax definition

To ground things, the examples below run a transformation on the input swap(x, y) and rewrite it to swap(y, x). Here’s an invocation of a swap transformation in the default metasyntax.

echo 'swap(x, y)' | \
comby -stdin \
'swap(:[1], :[2])' 'swap(:[2], :[1])'

produces

@|-1,1 +1,1 ============================================================
-|swap(x, y)
+|swap(y, x)

Below is the JSON definition for this metasyntax. Entries correspond to the matching behavior described in the reference. No need to dwell on this format; this blog post is really just about enjoying the show. Have a look at the detailed page for defining your own metasyntax if you want particulars after reading.

Expand for JSON definition

{
  "syntax": [
    [ "Hole", [ "Everything" ], [ "Delimited", ":[", "]" ] ],
    [ "Hole", [ "Expression" ], [ "Delimited", ":[", ":e]" ] ],
    [ "Hole", [ "Alphanum" ],   [ "Delimited", ":[[", "]]" ] ],
    [ "Hole", [ "Non_space" ],  [ "Delimited", ":[", ".]" ] ],
    [ "Hole", [ "Line" ],       [ "Delimited", ":[", "\\n]" ] ],
    [ "Hole", [ "Blank" ],      [ "Delimited", ":[ ", "]" ] ],
    [ "Regex", ":[", "~", "]" ]
  ],
  "identifier":
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
}

Dolla dolla bills

Dolla syntax is an entirely reasonable alternative syntax where $ prefixes alphanumeric variables.

echo 'swap(x, y)' | \
comby -stdin -custom-metasyntax dolla.json \
'swap($1, $2)' 'swap($2, $1)'

This is a common and familiar syntax in Bash, Perl, PHP, and similar languages. It is also a bit more terse than the default comby syntax. You might ask: "What if I want to match the literal syntax $var? Won’t that conflict with this metasyntax?" See the section on escaping later in this post if you’re interested in this corner case.

JSON definition of Dolla metasyntax


{
  "syntax": [
    [ "Hole", [ "Everything" ], [ "Delimited", "$*", null ] ],
    [ "Hole", [ "Expression" ], [ "Delimited", "$", null ] ],
    [ "Hole", [ "Alphanum" ],   [ "Delimited", "$$", null ] ],
    [ "Regex", "$", "~", "." ]
  ],
  "identifier":
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
}

Lambda syntax

To use Greek letters like the α[β:] syntax shown earlier, I made it possible to associate any unicode string with the matching behaviors that comby supports. This is legitimately useful for avoiding syntax conflicts, and also very terse.

echo 'swap(x, y)' | \
comby -stdin -custom-metasyntax lambda.json \
'swap(α, β)' 'swap(β, α)'

And if we wanted to match any kind of function name, not just swap, we could use a variable like λ.

echo 'swap(x, y)' | \
comby -stdin -custom-metasyntax lambda.json \
'λ(α, β)' 'λ(β, α)'

I like this. The only challenge is that I can’t easily find these symbols on my keyboard.

JSON definition of Lambda metasyntax

{
  "syntax": [
    [ "Hole", [ "Everything" ], [ "Delimited", "place-holder", null ] ],
    [ "Hole", [ "Everything" ],
        [ "Reserved_identifiers",
            [ "Γ", "Δ", "Θ", "Λ", "Ξ", "Π", "Σ", "Φ", "Ψ", "Ω" ]
        ]
    ],
    [ "Hole", [ "Expression" ],
        [ "Reserved_identifiers",
            [
                "α", "β", "γ", "δ", "ε", "ζ", "η", "θ", "ι", "κ", "λ",
                "μ", "ξ", "π", "ρ", "ς", "σ", "τ", "υ", "φ", "χ", "ψ",
                "ω"
            ]
        ]
    ]
  ],
  "identifier":
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
}

Emoji Movie

I did say any unicode string, so emojis count.

echo 'swap(x, y)' | \
comby -stdin -custom-metasyntax emoji.json \
'swap(🐵, 🍌)' 'swap(🍌, 🐵)'

When a variable is used multiple times in a pattern, the contents it matches must be textually equal. For example, if we wanted to check that the two arguments of an add function are equal, we could write a transformation like this:

echo 'add(x, x)' | \
comby -stdin \
'add(:[v], :[v])' 'multiply(2, :[v])'
@|-1,1 +1,1 ============================================================
-|add(x, x)
+|multiply(2, x)
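This "repeated variable implies equal matches" behavior is loosely analogous to backreferences in regular expressions. Here is a rough sketch in Python (an analogy only; comby's variables match structurally, not by regex):

```python
import re

# \1 refers back to the text captured by the first group,
# so the pattern only matches when both arguments are equal.
pattern = r'add\((\w+), \1\)'

print(re.sub(pattern, r'multiply(2, \1)', 'add(x, x)'))  # multiply(2, x)
print(re.sub(pattern, r'multiply(2, \1)', 'add(x, y)'))  # add(x, y), unchanged
```

Unlike a backreference, a comby variable can bind multi-line, nested content, but the equality relation on repeated occurrences is the same idea.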

This equals relation still holds when the syntax is redefined, so we can equivalently write the transformation in our preferred emoji syntax.

echo 'add(x, x)' | \
comby -stdin -custom-metasyntax emoji.json \
'add(😂, 😂)' 'multiply(2, 😂)'
@|-1,1 +1,1 ============================================================
-|add(x, x)
+|multiply(2, x)

JSON definition of Emoji metasyntax

{
  "syntax": [
    [ "Hole", [ "Everything" ], [ "Delimited", "place-holder", null ] ],
    [ "Hole", [ "Everything" ],
        [ "Reserved_identifiers",
            [ "❤️", "💙", "💚", "💜", "💛", "🧡" ]
        ]
    ],
    [ "Hole", [ "Expression" ],
        [ "Reserved_identifiers",
            [ "😂", "🚡", "😃", "😬", "😈", "🐵", "✋", "😻", "🍌" ]
        ]
    ]
  ],
  "identifier":
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
}

Wutspace

Any unicode string. So of course whitespace is allowed. If whitespace conflicts with some other part of the pattern, you’ll have to find an escape mechanism to match whitespace literally. This is WYW syntax: your syntax, your problems 🙂. If we use whitespace as identifiers, we’ll need unique ones when we don’t want matches to be equal. So we’ll choose <space> and <space><space> to demo the swap example. We’re also going to remove spaces in our input, because they introduce ambiguity. But anyway, you end up in a world where this can make some kind of sense.

echo 'swap(x,y)' | \
comby -stdin -custom-metasyntax wutspace.json \
'swap( ,  )' 'swap(  , )'

JSON definition of Wutspace metasyntax

{
  "syntax": [
    // Currently a placeholder is needed if we only care about Reserved_identifiers, to avoid trickery.
    // Order is significant.
    [ "Hole", [ "Everything" ], [ "Delimited", "place-holder", null ] ],
    [ "Hole", [ "Everything" ],
        [ "Reserved_identifiers",
            [ "  ", " " ]
        ]
    ]
  ],
  "identifier":
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
}

suǝɹɐd

That’s parens upside down. Because it’s (un)naturally possible to define inverted parentheses as variable delimiters.

echo 'swap(x, y)' | \
comby -stdin -custom-metasyntax inverted-parens.json \
'swap()1(, )2()' 'swap()2(, )1()'

JSON definition of inverted parens metasyntax

{
  "syntax": [
   [ "Hole", [ "Everything" ], [ "Delimited", ")", "(" ] ],
   [ "Hole", [ "Expression" ], [ "Delimited", "))", "((" ] ]
  ],
  "identifier":
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
}

Dangling

For maximum confusion, just define a prefix ) for variables. Comby won’t confuse that metasyntax with well-balanced parentheses because metasyntax takes priority during parsing.

echo 'swap(x, y)' | \
comby -stdin -custom-metasyntax dangling.json \
'swap()1, )2)' 'swap()2, )1)'

JSON definition of Dangling metasyntax

{
  "syntax": [
   [ "Hole", [ "Everything" ], [ "Delimited", ")", null ] ],
   [ "Hole", [ "Expression" ], [ "Delimited", "))", null ] ]
  ],
  "identifier":
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
}

Summary

There you have it. Maybe you’d like to define a templating language with {{var}} or something else; it’s Whatever You Want. For the curious reader, and to characterize some of the technical details more concretely, I cover some more thoughts on related tools and techniques below.

On syntax definitions

Language workbenches define programming languages to the nth degree: not only the syntax, but also static and dynamic semantics, and more. A prime example is Spoofax. SDF3 is Spoofax’s interface for defining language syntax. Generally, syntax definition here defines a grammar of input syntax to recognize, and parsing the input yields a concrete syntax tree or abstract syntax tree representation. This tree is later interpreted or evaluated by another program to perform some computation.

The WYW metasyntax definition in comby operates rather differently. It only lets you define a comparatively simple syntax for some fields in a JSON format (you’re not defining a grammar here, just some parts of it). And then, those definitions are only associated with preexisting computations that comby implements (roughly, language-aware context-free matching). Indeed, comby is built on the premise that no explicit parse tree or abstract syntax tree is needed at all to perform its syntax-tree rewriting. The comby metasyntax definitions directly parameterize matching behavior and there is no in-between tree representation. Because there is no in-between tree representation, the use cases and goals of comby metasyntax definitions are simpler and rather less powerful than syntax definition frameworks found in language workbenches.

Macros are a well-known abstraction in languages that can (re)define syntax; Racket’s define-syntax and Rust macros are notable examples. Macro definitions differ from the comby metasyntax definitions in this post in that macro systems can generally introduce arbitrary computation. With comby, the metasyntax definitions only parameterize existing match behaviors.

On escaping

What happens if you define a metasyntax where $x is a variable but you want to match the literal string $x? One way to avoid this is to change the metasyntax, but that can be inconvenient. Comby deals with syntax conflicts in a generic way by allowing an escape hatch via regular expression matching. Because regular expression syntax recognizes escape sequences (e.g., \. to match . literally) we can just match conflicting syntax literally via regular expression escaping. This way, you get embedded regular expression matching (should you choose to make use of it) and a built-in escape hatch for those rare cases of conflicting syntax. As far as corner cases for quoting and escape hatches go, I find it’s simpler to just piggyback on an established regular expression language that uses \-escaping rather than expose an entirely new interface that requires you to invent a custom escape syntax.
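The same trick in miniature, using Python's re module (an illustration of the escaping idea, not comby's implementation): escaping turns conflicting metacharacters into literal matches.

```python
import re

# re.escape turns every regex metacharacter into a literal match,
# so '$x' matches the two characters '$' and 'x' rather than being
# read as an end-of-line anchor followed by 'x'.
literal = re.escape('$x')        # -> '\\$x'

match = re.search(literal, 'swap($x, $y)')
print(match.group(0))            # $x
```

Piggybacking on the regex engine's own escape rules means no second, tool-specific quoting convention to learn.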

To make this work, comby just needs a metasyntax definition for embedding regular expressions. By default, comby uses :[<variable>~<regex>] for embedding regular expressions. We could define an alternative metasyntax like $<variable>~<regex>. The syntax definition lets you customize the prefix ($), the separator (~), and the suffix (here, .). The suffix is important so that comby can recognize when to stop scanning this metasyntax, since the <regex> part may contain any character allowed by the regular expression language. Once this part is defined, you can match $var literally with a pattern like $1~$var..

echo 'swap($x, $y)' | \
comby -stdin -custom-metasyntax dolla.json \
'swap($1~$x., $2)' 'swap($2, $1)' # matches the first $x literally
@|-1,1 +1,1 ============================================================
-|swap($x, $y)
+|swap($y, $x)

As far as search-and-replace tools go, it’s worth mentioning a related idea in sed. It’s a lesser-known fact that sed accepts arbitrary characters to delimit search and replace patterns. Most online examples use /:

$ echo 'http' | sed 's/http/https/'
> https

But you can really choose any character. This is especially useful when the input contains /, which you’d need to escape otherwise. Instead of

$ echo 'http://' | sed 's/http:\/\//https:\/\//'
> https://

You could use a ~:

$ echo 'http://' | sed 's~http://~https://~'
> https://

A simple program reducer for any language

March 26, 2021

Rijnard

Rijnard

I’ve been fuzzing compilers for less familiar languages like Solidity and Move (the language for Diem smart contracts), and up-and-coming languages like Zig. More on that fuzzing effort here. A fun part of that is looking at programs that crash the compilers. Reducing those programs to submit in bug reports… less so. Thing is, I want to submit programs that are small and comprehensible for maintainers, and doing things by hand got tedious after about 5 reports. There are tools for reducing programs out there, but none really checked the boxes I wanted. So I wrote comby-reducer to check those boxes, and things became a lot more fun:

  • Works on basically any language syntax or structured format (including e.g., DSLs)
  • Syntax-aware (not just regex), but without strictly requiring grammar definitions
  • Easy to define and include/exclude transformations (declarative, no scripting)
  • Easy to see what it did (so I can tweak transformations)
  • Easy to install and invoke
  • Fast on large inputs? Most of the crashing inputs are small, so this is not as important

Basically, something dead simple that allows easy tweaking for transforming syntax.

How it works

comby-reducer is built using comby, which supports a lightweight way of rewriting syntactic structures of a program’s parse tree, like expressions and function blocks. Comby is language-aware and understands basic syntax of code, strings, and comment syntax for many languages. Absent language-specific parsers, comby uses a generic parser that works well for syntactic constructs in DSLs and less mainstream languages. You can learn more about comby here.

comby-reducer uses a JavaScript library transpiled from comby's core parser engine, and a couple of functions for transforming a program to a fixed point.
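The fixed-point idea itself is simple: keep applying transformations until none of them changes the program. Here is a simplified sketch in Python, with a regex stand-in for comby's language-aware matcher (the names and the regex transformation are illustrative, not comby-reducer's actual code):

```python
import re

def reduce_to_fixed_point(program, transforms):
    """Apply each transform repeatedly until no transform changes the program.

    Assumes transforms shrink (or at least never grow) the program,
    so the loop terminates.
    """
    changed = True
    while changed:
        changed = False
        for transform in transforms:
            result = transform(program)
            if result != program:
                program = result
                changed = True
    return program

def delete_paren_content(program):
    # Blank out non-empty innermost parens. This regex only handles
    # flat contents; comby's matcher handles balanced nesting properly.
    return re.sub(r'\(([^()]+)\)', '()', program)

print(reduce_to_fixed_point('f(g(1), h(2))', [delete_paren_content]))
# f(g(), h())
```

Each pass blanks whatever innermost parenthesized content remains, and the loop stops once a pass changes nothing.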

Let’s move on to examples. If you want, you can learn more about how program reducers work by checking out the resources at the end of this post.

A program reduction tour

Some fuzzing found this program crashed the Move compiler.

module M {
    resource struct R {}
    struct Cup<T> {}

    fun t0(x8: u8, x64: u64, x128: u128) {
continue;
        (false as u8);
        (true as u128);

        (() as u64);
        (1,1
);

        (0 as bool);
        (0 as address);
        R = (0 as R);
        (0 as Cup<u8>);
        (0 as ());
        (0 as (u64, u8));

        (x"1234" as u64);
    }
}

comby-reducer reduces the program to something I’d be happy to submit in a bug report:

module M {
    resource struct R {}
    fun t0() {
        R = ();
    }
}

Move syntax is similar to Rust syntax and, like many languages, uses parentheses () and braces {} to delineate and nest expressions that correspond to an underlying parse tree. comby-reducer understands these syntactic constructs well, and can transform content inside balanced parentheses and braces (but won’t get confused if special characters like ( appear inside strings or comments).

This program crashes the Zig compiler:

const BitField = packed struct {
            a: u3,
            b: u3,
            c: u2,
        };

        fn foo(error.Hi
) u3 {
            return bar(&bit_field.b);
        }

        fn bar(x: *const u3) u3 {
            return x.*;
        }

        export fn entry() usize { return @sizeOf(@TypeOf(foo)); }

and reduces to another happy candidate for a bug report:

        fn foo(error.Hi) u3 {}
        export fn entry() usize { return @sizeOf(@TypeOf(foo)); }

And here’s a reduced Solidity smart contract:

contract C {
    function f(uint256[] calldata x, uint256 s, uint256 e) external returns (uint256) {
        (x[s:e]).length;
    }
}

Expand to see original Solidity program

pragma experimental ABIEncoderV2;
contract C {
    function f(uint256[] calldata x, uint256 s, uint256 e) external returns (uint256) {
        (x[s:e]).length;
    }
    function f(uint256[] calldata x, uint256 s, uint256 e, uint256 ss, uint256 ee) external returns (uint256) {
        return uint256[](x[s:e][ss:ee]).length;
    }
    function f_s_only(uint256[] calldata x, uint256 s) external returns (uint256) {
        return uint256[](x[s:]).length;
    }
    function f_e_only(uint256[] calldata x, uint256 e) external returns (uint256) {
        return uint256[](x[:e]).length;
    }
    function g(uint256[] calldata x, uint256 s, uint256 e, uint256 idx) external returns (uint256) {
 111111111111111111256[](x[s:e])[idx];
    }
    function gg(uint256[] calldata x, uint256 s, uint256 e, uint256 idx) external returns (uint256) {
        return x[s:e][idx];
    }
    function gg_s_only(uint256[] calldata x, uint256 s, uint256 idx) external returns (uint256) {
        return x[s:][idx];
    }
    function gg_e_only(uint256[] calldata x, uint256 e, uint256 idx) external returns (uint256) {
        return x[:e][idx];
    }
}

Declaring transformations

comby-reducer uses around 20 transformations to produce the above. These are in the repo, but you can also see a sample of them by expanding the tab below to get a sense of things. Transformations are defined using comby syntax, and we’ll walk through some of them.

Some simple reduction transformations

[delete_paren_content]
match='(:[1])'
rewrite='()'
rule='where nested'

[delete_brace_content]
match='{:[1]}'
rewrite='{}'
rule='where nested'

# Helps put blank bodies across newlines on the same line for line deletion.
[blank_brace]
match='{ }'
rewrite='{}'

[delete_line]
match=':[x\n]'
rewrite=''

[delete_string_content]
match='":[x]"'
rewrite='""'

[remove_first_paren_element]
match='(:[1],:[2])'
rewrite='(:[2])'

[delete_paren_contents]
match='(:[1])'
rewrite='()'
rule='where nested'

This transformation matches any content between balanced parentheses (including newlines) and deletes the content. The :[1] is a variable that binds to matched content. By default, comby-reducer will try to apply this transformation at the top-level of a file, wherever it sees (...). The rule='where nested' tells comby-reducer that it should also attempt to reduce nested matches of (...) inside other matched (...). In general, parentheses are a common syntax to nest expressions in programs, so it makes sense to add rule='where nested'.

Another transformation preserves the first comma-separated element inside parentheses:

[remove_first_paren_element]
match='(:[1],:[2])'
rewrite='(:[2])'

Program syntax often uses call-like or function-like forms that comma-separate parameters or arguments inside parentheses. This transformation attempts to remove elements in such syntax. This transform doesn’t have a rule part, since it might not be as fruitful to attempt nested reductions inside of :[1] or :[2]. But we could easily add it.
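A regex approximation of this element-dropping transformation looks like the following (illustrative only; comby binds :[1] and :[2] structurally, whereas this regex only handles flat argument lists):

```python
import re

def remove_first_paren_element(program):
    # Drop the first comma-separated element inside flat parens,
    # keeping the remaining elements. Nested parens are not handled.
    return re.sub(r'\(\s*([^(),]+),\s*([^()]*)\)', r'(\2)', program)

print(remove_first_paren_element('f(a, b, c)'))  # f(b, c)
```

Applied repeatedly, this whittles an argument list down one element at a time, which is exactly the behavior a reducer wants when probing which arguments matter.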

The default transformations aren’t very language-specific and worked well to reduce the languages I dealt with (Solidity, Move, Zig) to small human-comprehensible programs. Of course, you can define your own transformations with the desired level of syntactic specificity. E.g., we might remove for loops in JavaScript with:

[remove_js_style_for_loop]
match='''
for (:[1]) {
  :[2]
}
'''
rewrite=''

Observing and replaying reduction

While running comby-reducer, I noticed that programs would often reduce to structures like:

function foo() {
}

After some sequence of transformations, a block might end up empty, but contain whitespace. To crunch these, I added the transformation

[blank_brace]
match='{ }'
rewrite='{}'

This transformation deletes contiguous whitespace (including newlines) between braces. In turn, this often leads to a reduction to something like func () {}, which then ends up being removed by a transformation that deletes lines. Nice!
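The sequencing can be mimicked with two regex passes in Python (regex stand-ins for the blank_brace and delete_line templates; illustrative only, not comby's matcher):

```python
import re

src = "function foo() {\n}\n"

# blank_brace analogue: collapse a whitespace-only brace body onto one line.
step1 = re.sub(r"\{\s*\}", "{}", src)
print(repr(step1))   # 'function foo() {}\n'

# delete_line analogue: remove a whole line, here the now-trivial definition.
step2 = re.sub(r"^.*\{\}\n", "", step1)
print(repr(step2))   # ''
```

The point is the composition: one transformation normalizes the syntax into a shape that a second, simpler transformation can then eliminate entirely.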

Because I had an easy way to introduce new transformations, I wanted more insight into how transformations behaved. This helped me understand what I could add to help reduction along, or even just discover transformations that would improve formatting.

For example, I noticed one Zig program reduced to:

        fn foo(error.Hi
) u3 {}
        export fn entry() usize { return @sizeOf(@TypeOf(foo)); }

Here I just wanted to group the first pair of parentheses on one line, like fn foo(error.Hi) u3 {}. So I added something that would match all content up to trailing whitespace within balanced parentheses:

[delete_trailing_paren_whitespace]
match='(:[1] )'
rewrite='(:[1])'

To make this process a bit more snazzy, I added a way to replay transformations. comby-reducer takes a --record argument, and the output can be replayed with comby-reduce-replay. This makes it possible to step through the process. Here’s an example where I manually step through the Zig reduction.

Have a look at the transformation steps below.

Expand to see HTML rendering of transformation steps

------ 000.step 2021-03-26 04:50:53.499923-04:00
++++++ 001.step 2021-03-26 04:50:54.572018-04:00
@|-1,16 +1,16 ============================================================
 |
 |const BitField = packed struct {
 |            a: u3,
 |            b: u3,
 |            c: u2,
 |        };
 |
 |        fn foo(error.Hi) u3 {
!|            return bar(&bit_field.b);
 |        }
 |
 |        fn bar(x: *const u3) u3 {
 |            return x.*;
 |        }
 |
 |        export fn entry() usize { return @sizeOf(@TypeOf(foo)); }
------ 001.step 2021-03-26 04:50:54.572018-04:00
++++++ 002.step 2021-03-26 04:50:55.616110-04:00
@|-1,16 +1,16 ============================================================
 |
 |const BitField = packed struct {
 |            a: u3,
 |            b: u3,
 |            c: u2,
 |        };
 |
 |        fn foo(error.Hi) u3 {
 |            return bar();
 |        }
 |
!|        fn bar(x: *const u3) u3 {
 |            return x.*;
 |        }
 |
 |        export fn entry() usize { return @sizeOf(@TypeOf(foo)); }
------ 002.step 2021-03-26 04:50:55.616110-04:00
++++++ 003.step 2021-03-26 04:50:57.132244-04:00
@|-1,16 +1,12 ============================================================
 |
!|const BitField = packed struct {
!|            a: u3,
!|            b: u3,
!|            c: u2,
!|        };
 |
 |        fn foo(error.Hi) u3 {
 |            return bar();
 |        }
 |
 |        fn bar() u3 {
 |            return x.*;
 |        }
 |
 |        export fn entry() usize { return @sizeOf(@TypeOf(foo)); }
------ 003.step 2021-03-26 04:50:57.132244-04:00
++++++ 004.step 2021-03-26 04:50:57.748298-04:00
@|-1,12 +1,10 ============================================================
 |
 |const BitField = packed struct {};
 |
!|        fn foo(error.Hi) u3 {
!|            return bar();
!|        }
 |
 |        fn bar() u3 {
 |            return x.*;
 |        }
 |
 |        export fn entry() usize { return @sizeOf(@TypeOf(foo)); }
------ 004.step 2021-03-26 04:50:57.748298-04:00
++++++ 005.step 2021-03-26 04:50:58.356352-04:00
@|-1,10 +1,8 ============================================================
 |
 |const BitField = packed struct {};
 |
 |        fn foo(error.Hi) u3 {}
 |
!|        fn bar() u3 {
!|            return x.*;
!|        }
 |
 |        export fn entry() usize { return @sizeOf(@TypeOf(foo)); }
------ 005.step 2021-03-26 04:50:58.356352-04:00
++++++ 006.step 2021-03-26 04:50:59.464450-04:00
@|-1,8 +1,6 ============================================================
 |
 |const BitField = packed struct {};
 |
 |        fn foo(error.Hi) u3 {}
 |
-|        fn bar() u3 {}
-|        
 |        export fn entry() usize { return @sizeOf(@TypeOf(foo)); }
------ 007.step 2021-03-26 04:51:01.000586-04:00
++++++ 008.step 2021-03-26 04:51:01.596638-04:00
@|-1,5 +1,4 ============================================================
-|const BitField = packed struct {};
 |
 |        fn foo(error.Hi) u3 {}
 |
 |        export fn entry() usize { return @sizeOf(@TypeOf(foo)); }

You can find out more about replaying in the repo.

Comparison to afl-tmin

It’s useful to compare some reductions with afl-tmin, a readily available reducer for our afl-instrumented Solidity and Move compilers. Because afl-tmin exploits coverage information, it can be very effective at minimizing the size of inputs irrespective of format (it works for text inputs like our crashing programs, and for binary formats like JPEG). At the same time, this catch-all coverage-guided approach sacrifices some ability to exploit properties of the input domain. For bug reports, it’s useful to reduce around certain syntax like parentheses, or otherwise preserve syntactic elements, formatting, or symbol names. comby-reducer makes it easy to tailor the process to the input. For example, the Move program reduced by comby-reducer gives

module M {
    resource struct R {}
    fun t0() {
        R = ();
    }
}

while afl-tmin gives

module M{resource struct R{}struct C{}fun t(x:u){();();();R=();()}}

The afl-tmin variety has some redundant syntax, eagerly deletes whitespace that helps with readability, and renames the original function t0 to t. These are not necessarily bad, but there isn’t much room for tweaking.

Another example produced by comby-reducer

module M {
    struct Box3<T1, T2, T3> {}
    fun cpy<C: copyable>() {}
    fun t3<U, C: copyable>() {
        cpy(Box3<U, U> {});
    }
}

versus afl-tmin:

module M{struct S{}resource struct C{}struct o{}struct Bo00<>{}fun h(r:R){}fun y(r:R){}fun t(){}fun t<>(){}fun t(){();(Bo00<U>{})}}

Original program for the above reductions

module M {
  struct S {}
  resource struct Coin {}
  struct Box<T> {}
  struct Box3<T1, T2, T3> {}

  fun both<R: resource, C: copyable>(r: R, c: C) {
      abort 0
  }

  fun cpy<C: copyable>(c: C) {
      abort 0
  }

  fun rsrc<R: resource>(r: R) {
      abort 0
  }


  fun t0() {
      both(S{}, Coin{});
      both(0, Coin{})
  }

  fun t1<R: resource, C: copyable>() {
      both(Box<C> {}, Box<R> {})
  }

  fun t2<R: resource, C: copyable>() {
      rsrc(Box3<C, C, C> {});

      cpy(Box3<R, C, C> {});
      cpy(Box3<C, R, C> {});
      cpy(Box3<C, C, R> {});

      cpy(Box3<C, R, R> {});
      cpy(Box3<R, C, R> {});
      cpy(Box3<R, R, C> {});

      cpy(Box3<R, R, R> {});
  }

  fun t3<U, C: copyable>() {
      cpy(Box3<U, C, C> {});
      cpy(Box3<C, U, C> {});
      cpy(Box3<C, C, U> {});

      cpy(Box3<C, U, U> {});
      cpy( C,Box3<U, U> {});
      cpy(Box3<U, U, C> {});

      cpy(Box3<U, U, U> {});
  }
}

Try it

See the GitHub repository for usage examples and more technical details. Note that comby-reducer is new, developed for simplicity, and not yet battle-tested; feel free to post issues in the GitHub issue tracker. If you want help writing transformations in comby syntax, post in the Gitter channel.

Learn more

There are a lot of reducer tools and techniques out there. The academic in me would love to explore and compare these more deeply, but the engineer in me doesn’t have the time. 🙂

So instead, here are some related tools and topics that you might want to explore further.

  • The Reducing chapter in The Fuzzing Book provides a deeper explanation of reducers, and particularly grammar-based reduction. Here comby-reducer can be seen as a coarse, syntax-only grammar-based reducer that doesn’t need you to write or understand the grammar, nor write any scripts or visitors to hook into a parse tree. Instead, transformations are specified declaratively and appeal to intuitive notions around syntax common to many languages.

  • This deep-dive article on test-case reduction, which includes various references to existing tools.

  • A favorite tool mentioned here is C-reduce (which also works well on non-C languages). It is an especially nice resource for explaining interestingness tests which can further guide how and when the input is transformed.

Comby website gets a refresh

June 10, 2020

Rijnard

Rijnard

The old website started off nice and simple and kept everything about comby on one page. As it grew, things became unwieldy, and I wanted to partition off the usage documentation and "other things". Another piece of feedback was that the landing page wasn’t working for some (“what is this thing?”).

So over the past couple of weeks I’ve been building a new site piece by piece. I think the result gets rid of a couple of thorns and I like the refresh. I was inspired by the Sorbet website and ended up using Docusaurus; it’s cool.

© 2022 @rvtond · Get started · Docs · Projects & Talks · Blog · Twitter