Daniel Imfeld's blog - Notes

JS Sidecar

Mon, 05 Aug 2024 00:00:00 GMT

The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling - A Survey

Tue, 09 Jul 2024 00:00:00 GMT

The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey
Introduction
- The paper defines agents as a system that uses planning, loops, reflection, and other control structures, as well as leveraging the model's reasoning abilities to accomplish a task.
- This paper focuses mostly on the difference between single-agent vs. multiple-agent architectures.
- The multiple agent architectures are then subdivided into vertical and horizontal architectures.
- Each agent has a persona which is basically the system prompt as well as the tools that the agent has access to.
  - In addition to the instructions for the task, the persona may define a specific role such as an expert coder or a manager or a reviewer and so on.
- Tools of course are external function calls that the model can request, such as editing a document, searching the web, or other actions that the model is not able to do inside its computations.
- The paper defines single-agent architectures as those powered by a single language model that performs all the tasks on its own, with no feedback from other models or agents. There may be feedback from humans, though.
- In a multi-agent setup, each agent typically has a different persona.
- In a vertical multi-agent architecture, one agent acts as the leader and the other agents report to it. There can be multiple levels of hierarchy as well, but the main distinction is the clear division of labor between the different sub-agents.
- In a horizontal architecture, the agents are all more or less equal and are part of a single discussion about the task. Communication between agents is shared between all of the agents. And agents can volunteer themselves to complete certain tasks or call tools.
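The definitions above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Persona` class, the `recipients` function, and the example tool names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    """An agent persona: a system prompt plus the tools the agent may call."""
    name: str
    system_prompt: str
    tools: list[str] = field(default_factory=list)


def recipients(architecture: str, sender: str, agents: list[str], leader: str) -> list[str]:
    """Return which agents see a message sent by `sender`."""
    if architecture == "horizontal":
        # Shared discussion: every agent except the sender sees the message.
        return [a for a in agents if a != sender]
    if architecture == "vertical":
        # The leader directs the workers; workers report only to the leader.
        if sender == leader:
            return [a for a in agents if a != leader]
        return [leader]
    raise ValueError(f"unknown architecture: {architecture}")


# Hypothetical three-agent team.
team = [
    Persona("manager", "You coordinate the team."),
    Persona("coder", "You are an expert coder.", ["edit_document"]),
    Persona("reviewer", "You review the coder's work.", ["search_web"]),
]
names = [p.name for p in team]

vertical = recipients("vertical", "coder", names, leader="manager")
horizontal = recipients("horizontal", "coder", names, leader="manager")
print(vertical)    # ['manager']
print(horizontal)  # ['manager', 'reviewer']
```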
Key Considerations
- Reasoning
  - Reasoning is basically the same thing that we humans do where we think critically about a problem, understand how it fits into the world around us, and make a decision.
  - For a model, reasoning is what allows it to go beyond its training data and learn new tasks or make decisions under new circumstances.
- Planning
  - Planning is an application of reasoning.
  - There are five major approaches to it: task decomposition, multi-plan selection, external module-aided planning, reflection and refinement, and memory-augmented planning. See Understanding the Planning of LLM Agents.
  - Most agents have a dedicated planning step, which they run before executing any actions. There are many ways to do this. The paper particularly calls out Graph-enhanced Large Language Models in Asynchronous Plan Reasoning, AKA "Plan like a graph," and Tree of Thought as examples which allow the agent to execute multiple steps in parallel.
    - Although my recollection of Tree of Thought was that it was more about trying different permutations of problem solving and not so much about planning.
- Tool Calling
  - Tool calling goes hand in hand with reasoning and is what really allows the model to make effective and informed decisions.
  - Many agents use some iterative process of planning, reasoning, tool calling, and then breaking up the task into further sub steps with more planning and so on.
  - But some papers point out that single agent architectures often have trouble with these long chains of subtasks.
    - https://arxiv.org/pdf/2403.03031
    - https://arxiv.org/pdf/2401.17464
Single Agent Architectures
- Proper planning and self-correction are paramount here.
- A big risk with single agent architectures is that because they don't have any external method of automatically correcting themselves, they may get stuck in an infinite loop where they reason the same step over and over again with the same result.
- ReAct
  - ReAct: Synergizing Reasoning and Acting in Language Models was one of the first single agent methods designed to improve over single-step prompting. In ReAct, which stands for Reason plus Act, the agent cycles through thinking about a task, performing an action based on that thought, and observing the output.
  - Aside from improved reliability, one big advantage of this method over previous single-prompt methods was that the sequence of thoughts and actions is all there to see, so it's easier to figure out how the model arrived at its conclusion.
  - But ReAct is susceptible to the infinite loops mentioned above.
- RAISE
  - RAISE, as described in From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models, stands for Reasoning and Acting through Scratch pad and Examples. There's no 'I' but I guess they thought RAISE sounded better than RASPE or something.
  - It's based on the ReAct method, but adds the scratchpad for short-term storage and a data set of similar previous examples for long-term storage.
    - NOTE How does this work?
  - One interesting problem that the RAISE paper found was that agents would often exceed their defined roles, such as an agent with a sales role that ended up writing Python code. The authors also cited problems with hallucinations and difficulty understanding complex logic.
- Reflexion
  - Reflexion: Language Agents with Verbal Reinforcement Learning
  - Reflexion is a method in which the agent is asked to reflect on its own performance using certain metrics, such as the success state and whether the current trajectory matches the agent's desired task.
  - NOTE look at the paper to determine more about these
  - Some limitations cited by the authors
    - Reflexion is prone to falling into non-optimal local minima
    - The agent's memory is simply stored in the model's context with a sliding window and so older important items may be forgotten
- AutoGPT + P
  - AutoGPT+P: Affordance-based Task Planning with Large Language Models
  - AutoGPT+P is a technique specifically designed for use in robotics. It uses computer vision to detect the objects present in a scene, and can then use four tools to try to complete its task.
    - Plan Tool
    - Partial Plan Tool
    - Suggest Alternative Tool
    - Explore Tool
  - The model also works in concert with a traditional planning tool using PDDL or planning domain definition language. This planner helps with translating the model's instructions into things that the robot is actually able to do given its physical limitations.
  - As with many of the above approaches, it does have some problems such as sometimes choosing the wrong tools or getting stuck in loops. And at least as described in the paper, there's no opportunity for human interaction such as the agent asking for clarification or the human interrupting if the robot starts to do something wrong.
- LATS
  - Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
  - LATS is an algorithm based on Monte Carlo Tree Search. You can read the link for more details, but basically, it's inspired by Monte Carlo simulation in which you do a bunch of random runs to get a better idea of the probability space and the best action to take.
  - But as you can imagine, doing a bunch of random runs down a tree with language models can be very slow and expensive. Also, the paper doesn't tackle particularly complex scenarios.
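To make the ReAct cycle described above concrete, here is a minimal sketch of the thought, action, observation loop. The `policy` callable is a hypothetical stand-in for a language model call, and the repeated-action check is my own addition to illustrate one way of guarding against the infinite loops mentioned above; it is not part of the ReAct paper.

```python
def react(policy, tools, task, max_steps=10):
    """Run a thought -> action -> observation loop until the policy finishes."""
    transcript = [f"Task: {task}"]
    seen = set()
    for _ in range(max_steps):
        # A real agent would prompt an LLM with the transcript here.
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            transcript.append(f"Answer: {arg}")
            return arg, transcript
        if (action, arg) in seen:
            # Same action with the same input again: the agent is likely stuck.
            transcript.append("Stopped: repeated action detected")
            return None, transcript
        seen.add((action, arg))
        observation = tools[action](arg)
        transcript.append(f"Action: {action}({arg})")
        transcript.append(f"Observation: {observation}")
    return None, transcript


def scripted_policy(transcript):
    """Toy policy: call the tool once, then finish with what it observed."""
    prefix = "Observation: "
    for line in reversed(transcript):
        if line.startswith(prefix):
            return "I have the result.", "finish", line[len(prefix):]
    return "I should add these numbers.", "add", "40+2"


def stuck_policy(transcript):
    """Toy policy that never makes progress."""
    return "Let me try adding again.", "add", "40+2"


tools = {"add": lambda arg: str(sum(int(x) for x in arg.split("+")))}

answer, log = react(scripted_policy, tools, "What is 40+2?")
stuck_answer, stuck_log = react(stuck_policy, tools, "What is 40+2?")
print(answer)         # 42
print(stuck_log[-1])  # Stopped: repeated action detected
```

The transcript is the part that makes ReAct easy to inspect: every thought, action, and observation is kept in order.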
Multi Agent Architectures
- Common themes with multi-agent architectures
  - Leadership of agent teams
  - Dynamic creation of agent teams between stages
  - Information sharing between team members
- Embodied LLM Agents Learn to Cooperate in Organized Teams
  - This method uses a hybrid approach that is mostly a horizontal team, but has a leader agent over the rest of the team.
  - They found that teams with a leader finished their tasks about 10% faster. Without a leader, the agents spend about half of their time giving orders to each other, whereas with a single designated leader, the leader spends 60% of its messages giving directions and the other agents can focus more on actually exchanging useful information.
- DyLAN
  - Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization
  - DyLAN is a dynamic team method which uses elimination rounds to remove the agents that have contributed the least to the task.
  - Team Optimization
    - Each agent is asked to rank the other agents’ results
    - These ratings are aggregated
    - An “Agent Importance Score” is calculated
    - Low-performing agents are removed from the system
- AgentVerse
  - AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
  - AgentVerse uses a four-stage process
    - Recruitment, which uses a “recruiter” agent to generate personas for a set of agents to work on this iteration, based on the current goal state.
    - Collaborative decision-making between the agents.
      - This can be vertical or horizontal arrangement, depending on the task.
    - Independent action execution by each agent
      - Each agent uses a ReAct loop with up to 10 iterations to get to the desired output
    - Evaluation of how close the current state is to the goal.
  - This process can be repeated until the goal is reached.
  - One important finding here is that agent feedback is not always reliable.
    - Even if an agent’s feedback is not valid, the receiving agent may incorporate it anyway.
- MetaGPT
  - MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
  - MetaGPT focuses on using structured outputs to communicate between agents instead of plain text in order to reduce unproductive chatter and inefficiencies, such as "how are you? I'm fine".
  - It also implements a message bus which allows agents to publish their information to a common place but only listen to information that is relevant to them.
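The DyLAN team optimization steps above can be sketched roughly as follows. Note the assumption: this uses a plain average of peer ratings as the score, which is a simplification for illustration and not the paper's actual Agent Importance Score calculation. The function names and sample ratings are hypothetical.

```python
def importance_scores(ratings):
    """Average the peer ratings each agent received.

    ratings[rater][ratee] is the score `rater` gave to `ratee`'s result.
    """
    totals, counts = {}, {}
    for rater, given in ratings.items():
        for ratee, score in given.items():
            if ratee == rater:
                continue  # ignore self-ratings
            totals[ratee] = totals.get(ratee, 0.0) + score
            counts[ratee] = counts.get(ratee, 0) + 1
    return {agent: totals[agent] / counts[agent] for agent in totals}


def prune_team(ratings, keep):
    """Keep the `keep` highest-scoring agents (ties broken by name)."""
    scores = importance_scores(ratings)
    ranked = sorted(scores, key=lambda agent: (-scores[agent], agent))
    return ranked[:keep]


# Hypothetical ratings from one round.
ratings = {
    "a": {"b": 4, "c": 1},
    "b": {"a": 5, "c": 2},
    "c": {"a": 3, "b": 4},
}
scores = importance_scores(ratings)
survivors = prune_team(ratings, keep=2)
print(scores)     # {'b': 4.0, 'c': 1.5, 'a': 4.0}
print(survivors)  # ['a', 'b']
```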
Discussion
- Single agent patterns tend to work best with a narrowly defined list of tools and well defined processes. They're also easier to implement because there's only one agent and set of tools, and they don't face the limitations of multi-agent systems like poor feedback from other agents or unrelated chatter from other team members. But they are more likely to get stuck in loops and fail to make progress if they find themselves in a situation that does not match their reasoning strengths.
- Multi-agent architectures work best when feedback from different personas helps to accomplish the task, such as drafting a document and then reviewing or proofreading it. They're also useful for performing parallel execution when there are distinct independent subtasks. Multi-agent architecture is particularly advantageous when no examples of the task have been provided.
- Feedback can be very helpful, but it's not a panacea. The AgentVerse paper notes a case where an agent gave invalid feedback to another agent, but it was still incorporated. Similarly, human feedback may conflict with the desired behavior of the agent, but because the language models tend to be willing to please, they may incorporate it anyway.
- Information Sharing
  - Information sharing in a horizontal multi-agent system is very useful, but also has issues. For example, agents can too closely simulate a human when assigned a persona and start asking the other agents small-talk questions such as "how are you?" Agents may also be exposed to information that is irrelevant to their particular task, so systems that allow subscribing or filtering incoming information can be helpful for keeping an agent on task.
  - Vertical architectures tend to not have as many of these issues, but can encounter problems when the managing agent does not send enough information to its team for them to do the job. The paper recommends using prompting techniques to help with this.
- Careful design of the system prompt for the persona can help to keep an agent on task and reduce the amount of unnecessary chatter between agents.
- Dynamic team creation, where agents are brought in and out of the system, can be a big help because it keeps irrelevant agents from adding noise to a particular stage of the problem.
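As a rough illustration of the subscribe-and-filter idea from the information sharing notes above, here is a minimal publish/subscribe bus passing structured messages. The class, topic names, and agent names are hypothetical; this is not MetaGPT's actual API.

```python
from collections import defaultdict


class MessageBus:
    """Agents publish structured messages and only receive subscribed topics."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> agent names
        self.inboxes = defaultdict(list)       # agent name -> received messages

    def subscribe(self, agent, topic):
        self._subscribers[topic].append(agent)

    def publish(self, sender, topic, payload):
        # Structured message instead of free-form chat text.
        message = {"from": sender, "topic": topic, "payload": payload}
        for agent in self._subscribers[topic]:
            if agent != sender:
                self.inboxes[agent].append(message)


bus = MessageBus()
bus.subscribe("engineer", "design")
bus.subscribe("qa", "code")

bus.publish("architect", "design", {"modules": ["api", "db"]})
bus.publish("engineer", "code", {"file": "api.py"})

print(bus.inboxes["engineer"])  # only the design message
print(bus.inboxes["qa"])        # only the code message
```

Because the QA agent never subscribed to the design topic, the architect's message does not reach it, which is the kind of filtering that keeps an agent on task.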
Limitations
- Evaluating agents is difficult and there are not very many good standard benchmarks.
- Many papers introduce their own benchmarks alongside a new agent system, which makes it difficult to compare agent systems beyond those tested in that particular paper.
- Many agent evals are complex and require manual scoring, which can be tedious, limits the size of the evaluation set, and adds the possibility of evaluator bias. The complexity of agents also leads to a lot more variation in their outputs, so it's more difficult to properly determine if an agent's answer is correct or not.
- As with language model evaluations, data set contamination is a problem, where the tasks that the agents are trying to work on can be found in their training data.
- Many standard benchmarks designed for language model testing, such as MMLU and GSM8K, are not applicable to agents because they don't really exercise an agent's ability to reason beyond what you would find in a single call to a language model.
- Some agent eval systems use simpler answers such as yes or no, which are easier to evaluate, but this limits the real world applicability of the eval, where most tasks require more complex answers. More complex benchmarks that use logic puzzles or video games come closer, but even in those cases it's questionable how much it translates to the real world, where tasks are less well-defined and data is dirtier.
- The paper mentions WildBench and SWE-bench as a couple of benchmarks that use real-world data, though WildBench doesn't seem to be designed for agent testing.

Aviator CLI Cheatsheet

Fri, 24 May 2024 00:00:00 GMT

Aviator is a tool for managing stacked branches and PRs. Their CLI can be used separately from the rest of the product. Here are some useful commands for it:
av stack sync rebases branches within the current stack and pushes
av stack sync --trunk is like the above, but also brings the stack up to date with the trunk branch
av stack sync --prune deletes branches for merged PRs
av stack sync --all --trunk --prune brings everything in the repository up to date, deletes merged branches, etc.
av stack submit creates and syncs PRs for every branch in the current stack
av pr create creates a PR for the current branch
av stack branch creates a new branch off of the current branch
av stack branch-commit -b creates a new branch and commits the currently staged files (see --help on this one for more options)
av stack switch is a stack-aware TUI for switching branches.
av stack tree shows the same diagram used by switch but only prints it.
av stack reorder moves commits between branches in a stack, and can also merge and split branches.
Moving branches between stacks requires a few commands
- git checkout
- git rebase
- av stack sync --parent

Chronicle

Fri, 29 Mar 2024 00:00:00 GMT

Ramus

Fri, 29 Mar 2024 00:00:00 GMT

AWS VPC Configuration

Fri, 01 Mar 2024 00:00:00 GMT

AWS sets up a bunch of convenient things in the default VPC for an account, which you may need to recreate when making a new VPC. Here's how to do that in Terraform.

First, the VPC and subnets:

resource "aws_vpc" "my_app" {
  cidr_block = var.my_app_cidr
  tags = {
    Name = "my_app"
  }
}

resource "aws_subnet" "my_app" {
  vpc_id = aws_vpc.my_app.id
  cidr_block = var.my_app_cidr
  # Set appropriately for your needs
  map_public_ip_on_launch = true
  availability_zone = var.az
  tags = {
    Name = "my_app"
  }
}

For tasks that access the internet, you also need an internet gateway and a routing table to use it.

You can also use a NAT gateway or something, but this is the simplest and cheapest way to go.
Here we'll also set up a VPC endpoint to link directly into S3, which saves egress charges for going through public routes.

resource "aws_internet_gateway" "my_app" {
  vpc_id = aws_vpc.my_app.id

  tags = {
    Name = "my_app"
  }
}

resource "aws_route_table" "my_app" {
  vpc_id = aws_vpc.my_app.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.my_app.id
  }

  tags = {
    Name = "my_app"
  }
}

Finally, we set up a VPC endpoint to communicate directly with S3, without needing to go through the public internet. This can improve performance and save money.

resource "aws_vpc_endpoint" "my_app_s3" {
  vpc_id       = aws_vpc.my_app.id
  service_name = "com.amazonaws.us-west-2.s3"

  tags = {
    Name = "my_app_s3"
  }
}

resource "aws_vpc_endpoint_route_table_association" "my_app_s3" {
  vpc_endpoint_id = aws_vpc_endpoint.my_app_s3.id
  route_table_id = aws_route_table.my_app.id
}

Build Docker Containers from a Monorepo

Thu, 29 Feb 2024 00:00:00 GMT

A .dockerignore file is very important here to reduce the amount of context that needs to be sent to the builder process. Note that the .dockerignore file requires prepending **/ to any match that you want to apply outside of the root directory.
```
**/node_modules
**/*.js
**/*.ts
**/apps
**/target
!apps/my-app
apps/my-app/target
```

Then you will want to build with the context directory being the root of the monorepo

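#!/bin/bash
# Note: --ignorefile is a Podman/Buildah flag. With Docker (BuildKit), name the
# file Dockerfile.dockerignore next to the Dockerfile, or put .dockerignore in
# the context root, and drop the flag.
docker build \
  -t "$NAME" \
  -f Dockerfile \
  --ignorefile .dockerignore \
  ../../ # The monorepo root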

Building for Rust
If your application is built in Rust, then you can use cargo-chef to help speed up the builds, but the default recipe needs a few directory tweaks if your application has dependencies elsewhere in the monorepo.
Something like this works well:

# This Dockerfile works with cargo chef, which prebuilds dependencies in a
# separate Docker image to speed up builds.

FROM lukemathwalker/cargo-chef:latest-rust-1 AS chef
RUN apt-get update && apt-get install -y pkg-config libssl-dev
WORKDIR /app/apps/my-app

FROM chef AS planner
COPY ./libs/some-lib /app/libs/some-lib
COPY ./apps/my-app /app/apps/my-app
RUN cargo chef prepare --recipe-path recipe.json

FROM chef AS builder
COPY --from=planner /app/apps/my-app/recipe.json recipe.json
# Build dependencies - this is the caching Docker layer!
COPY ./libs/some-lib /app/libs/some-lib
RUN cargo chef cook --release --recipe-path recipe.json
# Build application
COPY ./apps/my-app /app/apps/my-app
RUN cargo build --release --bin my-app

FROM debian:bookworm-slim AS runtime
RUN apt-get update && apt-get install -y pkg-config libssl-dev ca-certificates
RUN update-ca-certificates
COPY --from=builder /app/apps/my-app/target/release/my-app /usr/local/bin
ENTRYPOINT ["/usr/local/bin/my-app"]

If building from Mac or Windows, you may also see benefits from increasing the amount of RAM available to the Docker VM. For example, the default VM from Podman uses only 2GB and while the Rust compiler can run within those boundaries, it's very slow. Using a 16 or 32GB VM can speed up the compilation by an order of magnitude.

Setting up AWS Fargate on ECS

Thu, 29 Feb 2024 00:00:00 GMT

This mostly focuses on using Fargate for one-off jobs. Configuration is in Terraform.
Set up a VPC for your Cluster if you need to.
ECS Cluster
- You need to create an ECS cluster but it doesn't need any configuration beyond being created.
```
resource "aws_ecs_cluster" "pipeline" {
  name = "pipeline"
}
```
Running on ARM
- Add "runtimePlatform": { "cpuArchitecture": "ARM64" } to your task definition.

Roles

A task can have a task role and an execution role.
The task role is given to your containers.
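The execution role is used by ECS itself (the container agent and Fargate infrastructure) to launch your containers, for example to pull images from ECR and send logs to CloudWatch.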
Both of these need an "assume role policy" that allows ECS to assume those roles.

Terraform for the roles. This also sets up extra S3 permissions on the task execution role in case you are sending your logs to S3 (see below).

data "aws_iam_policy_document" "ecs_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    effect = "Allow"
    principals {
      type = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }

    condition {
      test = "ArnLike"
      variable = "aws:SourceArn"
      values = [
        format("arn:aws:ecs:%s:%s:*", var.aws_region, var.aws_account_id)
      ]
    }

    condition {
      test = "StringEquals"
      variable = "aws:SourceAccount"
      values = [
        var.aws_account_id
      ]
    }

  }
}

resource "aws_iam_role" "my_task_execution_role" {
  name = "fargate_my_task_execution"

  assume_role_policy = data.aws_iam_policy_document.ecs_assume_role.json

  inline_policy {
    name = "s3_put"
    policy = jsonencode({
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:*"
          ],
          "Resource": [
            "arn:aws:s3:::my-app-logs",
            "arn:aws:s3:::my-app-logs/*"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "ecr:GetAuthorizationToken",
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "ecr:BatchCheckLayerAvailability",
            "ecr:BatchGetImage",
            "ecr:DescribeImages",
            "ecr:DescribeRepositories",
            "ecr:GetDownloadUrlForLayer",
            "ecr:GetRepositoryPolicy",
            "ecr:ListImages"
          ],
          "Resource": [
            format("arn:aws:ecr:%s:%s:repository/*", var.aws_region, var.aws_account_id)
          ]
        }
      ]
    })
  }
}

resource "aws_iam_role" "my_task_role" {
  name = "fargate_my_task"

  assume_role_policy = data.aws_iam_policy_document.ecs_assume_role.json

  inline_policy {
    name = "s3_put"
    policy = jsonencode({
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": [
          "s3:*"
        ],
        "Resource": "*"
      }]
    })
  }
}

output "my_task_role_arn" {
  value = aws_iam_role.my_task_role.arn
}

output "my_task_execution_role_arn" {
  value = aws_iam_role.my_task_execution_role.arn
}

Sending Logs to S3

CloudWatch is the AWS-recommended way to ship logs out, but you can also send them to S3.
First you'll need roles like the ones in the previous section to give S3 permissions.

Then create your S3 bucket.

This configuration also autodeletes the files after 90 days.

resource "aws_s3_bucket" "my_app_logs" {
  bucket = "my-app-logs"
}

resource "aws_s3_bucket_lifecycle_configuration" "my_app_logs" {
  bucket = aws_s3_bucket.my_app_logs.id
  rule {
    id = "delete-old-logs"
    expiration {
      days = 90
    }
    status = "Enabled"
  }
}

Then set up your task definition like so.

{
  "taskRoleArn": "ARN of the task role above",
  "executionRoleArn": "ARN of execution role above",
  "requiresCompatibilities": ["FARGATE"],
  "containerDefinitions": [
    // Only needed for custom log routing.
    {
      "name": "log_router",
      "essential": true,
      "image": "amazon/aws-for-fluent-bit:stable",
      "firelensConfiguration": {
        "type": "fluentbit"
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "firelens-container",
          "awslogs-region": "us-west-2",
          "awslogs-create-group": "true",
          "awslogs-stream-prefix": "firelens"
        }
      },
      "memoryReservation": 50
    },
    {
      "name": "my_app",
      "image": "acctnum.dkr.ecr.region.amazonaws.com/image:tag",
      "portMappings": [],
      "essential": true,
      "logConfiguration": {
        "logDriver": "awsfirelens",
        "options": {
          "Name": "s3",
          // Optional, to customize key format which can be useful for batch jobs
          "s3_key_format":
            "/%Y/%m/%Y-%m-%d-my-app-$TAG-%S",
          "region": "your-aws-region",
          "bucket": "my-app-logs",
          // Rotate to a new log file when this size is reached.
          // For persistent services this should be much smaller.
          // For batch jobs I set it large since it's nice to have
          // all the logs for a run in one file.
          "total_file_size": "10G",
          "upload_timeout": "1m",
          "retry_limit": "2"
        }
      }
    }
  ]
}

The logConfiguration options are documented at https://docs.fluentbit.io/manual/pipeline/outputs/s3

Filigree

Sat, 09 Dec 2023 00:00:00 GMT

Filigree is a web app framework based on Rust with SvelteKit or Htmx, which includes common components such as authentication and background workers, and also makes it easier to set up data models and generate SQL, endpoints, and so on to use them.
Task List
- Up Next
  - Figure out what to do with this project
    - I'm still not getting something I really like with SQL generation, but ORMs are still too inflexible. What's the proper solution here?
    - Maybe
      - Rework auth to use JWT method
      - Autogenerate Drizzle table definitions for database so that the frontend code can just use Drizzle for database access.
      - Maybe use GraphQL for complex object fetches?
  - Make test objects easier to manage
    - Instead of a one-size-fits-all bootstrapping we should just do it with a custom function per file, which will speed up the tests overall a bit, and also greatly simplify testing of through models where we need to make sure that the IDs all line up between the different objects.
- Soon
  - Investigate making it easier to auto-forward page actions through SvelteKit to the API
  - Finish tests for through relationships
    - Do this after Make test objects easier to manage
  - Evaluate faster API authentication methods between SvelteKit Backend and Rust Backend
    - Basically to make it work better to make multiple API calls from a single request without hitting the database every time to look up the user/roles/etc.
    - Instead of just a session ID, have an option to generate a short-lived JWT or similar token that contains all the auth info. Store the JWT in the cookie, on each request check the expiration time and get a new one if getting close to expiring.
    - Might also need some hook for client-side requests that can do the renewal if they talk directly to the API.
  - Use docker compose with separate containers for API and SvelteKit parts
    - More trouble than it's worth to force the two parts into a single container
  - More flexible configuration
    - Switch from clap to config or something for the serve command's config
    - Main impetus here is to make it possible to read configuration from ConfigMap-sourced volumes in K8S and similar systems.
- Later/Maybe
  - Simplify permissions system when using custom auth
    - Since we aren't creating the roles and users, we need a way to make sure that people have reasonable permissions when they start.
    - Perhaps just a function that adds default permissions or something like that when we first see a user.
  - organization_members needs to be extendable like users is
    - This is so that there's a place to attach per-org metadata and fields
    - I think the proper thing to do here is that once we fully support joining tables (see the Jul 22nd, 2024 note), we can just put this into the system as a joining table.
  - Improve API error printing
    - Some errors use source and error_stack's default printer ignores it. Should walk the source chain as well. This seems difficult though without using nightly features
  - Metrics tracking and alerts
    - Consistently high CPU, RAM usage, etc.
    - Might just use a service for this, need to see
  - Option to only allow connections from certain IP ranges
    - eg https://www.cloudflare.com/ips/ when using Cloudflare
  - Helpers for retrying requests and other operations
    - rate limit retry-after
    - exponential retry with jitter, etc. for regular retryable failures
    - ability to not retry for certain errors
    - Maybe just fork backon for this?
  - QoL Issues Encountered
    - Conflict in src/main.rs on first run
      - Maybe detect if this is the first run by looking at absence of .state directory and ask if it's ok to just replace it
  - Add auto-bootstrap option
    - When appropriate environment variables are set, allows the API to do the bootstrapping itself when it first starts up
  - Ops helpers
    - This isn't part of the core framework, but a set of tools for automatically doing ops-related tasks.
    - Back up database to cloud storage on a regular basis
    - Deploy on Git push
    - Preview environments?
    - Set up Postgres slow statement tracking and create a report on what was slow
    - Drop partitions of old data
  - Frontend
    - SvelteKit: Error page
    - Add dependencies to package.json like we do for the Rust side
    - App shell with nav bar, etc.
    - Password reset page
    - Profile management page
    - Manage your organization
    - Default Model editor page
    - Default Model list page
    - Admin: invite users to app
    - Admin: administer organizations
    - 404 page
    - TS Type Enhancements
      - Permission names per model
  - HTMX support improvements
    - Debugging option to show a toast whenever a livereload finishes or an htmx error occurs
    - move client-side code from SBBP into Filigree for reuse by other projects
    - theming, dark/light mode with persistence via localStorage and header script to inject dark class
      - DaisyUI provides a lot of this already
    - Island architecture support for Svelte components
      - Each of these should be its own entry point in the Vite config, and from there we can use the manifest support to get the necessary tags
    - Flash/toast support
      - Can probably be done with HX-Trigger header, or maybe an extension
      - Adapter function for actions to have errors return a toast
    - Implement fancy dropdown menu in Alpine with floating-ui
      - https://github.com/awcodes/alpine-floating-ui may work well
      - https://alpinejs.dev/plugins/anchor is the official one, seems to support fewer options
    - Improve livereload performance
      - Not sure how much opportunity there is here but it feels like it can be made 20% faster
      - Dioxus is working on a solution that I should be able to adapt once it's public.
  - Models Backend
    - Move a lot of the auth queries like add_permissions_to_role into the template system, so they can be placed on the model and/or customized
      - Everything in filigree/src/users except add_user_email_login which is used by the OAuth system and so is harder to move.
    - Generate endpoints for grandchild objects
      - Probably need to make this generally recursive, which may be difficult to do via the current template system. Might need a macro that can call itself or some fancier fragment rendering.
      - I think what we want to do here is for each item in children to potentially have its own children, instead of recalculating each one independently. Also need some way to walk up the stack of children here for URL generation (or do we? Probably it doesn't matter to have the grandparent in the URL itself).
    - Make it easier for differing get/list model types to be used interchangeably
      - This matters more for HTMX mode
      - Basically we might have a function that renders something for a particular object, but it gets difficult when the input can be either the Get or List result types when they are different.
      - One option may be to generate a type of view struct which can take a reference to either type and then generate a bunch of methods for the shared fields which can return either value. This is a lot of boilerplate, but it's all autogenerated, and these aren't used too often.
    - Allow joining models to have their own extra data fields
    - For runtime queries, create query builders that use SeaQuery to add the auth clause automatically
      - Add an enum of columns in the table
      - Allow adding any population of data or id at runtime
      - Each child population can be seen as a column as well (but probably don't do this, it should )
      - Add functions for "all base columns" to drop into the query
    - tests for list endpoint filtering/pagination/sorting
    - Models with non-writable, but required, fields don't generate valid tests
      - This can happen when the normal "create" workflow happens in some other way than calling a REST create endpoint.
      - Also consider detecting this and just not generating an "insert" script that can't work.
        
        - The criteria is a model that has at least one field which
          - is not fixed
          - is not nullable
          - has no default (SQL or Rust)
          - is not writable by owner
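      - A minimal sketch of that detection, assuming a simplified field description. `FieldInfo` and its flags are hypothetical stand-ins for the real model config, not Filigree's actual types:
        ```rust
        /// Hypothetical, simplified view of one model field.
        #[derive(Debug, Clone)]
        pub struct FieldInfo {
            pub name: &'static str,
            /// Mirrors the "fixed" notion in the criteria above.
            pub fixed: bool,
            pub nullable: bool,
            /// True if there is either a SQL default or a Rust default.
            pub has_default: bool,
            pub owner_writable: bool,
        }

        impl FieldInfo {
            /// A field blocks a plain insert when nothing can supply its value:
            /// not fixed, not nullable, no default, and not writable by the owner.
            pub fn blocks_insert(&self) -> bool {
                !self.fixed && !self.nullable && !self.has_default && !self.owner_writable
            }
        }

        /// Names of the fields that make a generated "insert" test impossible.
        /// An empty result means the insert script can be generated as usual.
        pub fn insert_blockers(fields: &[FieldInfo]) -> Vec<&'static str> {
            fields
                .iter()
                .filter(|f| f.blocks_insert())
                .map(|f| f.name)
                .collect()
        }
        ```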
    - Support file uploads in multipart form submissions
      - A lot of the underlying framework for this is ready but the code itself needs to be written. Main issue is that the current FormOrJson can't be used in that case. Instead we probably need something that even on a JSON file looks like a multipart submission, but just doesn't return any files.
    - Test populated get and list when update_with_parent is false
    - Models can define extra permissions
  - CLI / Template System
    - Bug: CLI doesn't throw an error if a field is defined twice
    - Use toml_edit to allow adding features to Cargo.toml
    - When generating migrations, read in the actual app migrations and ignore tables that we don't care about.
      - This lets us see things like model tables having been removed if, say, a migration file was deleted since it did the wrong thing.
      - Do need to figure out how to handle extra columns added in the model via manual SQL migrations here.
    - Command to print config and diff files against last gen and current config gen
    - Consider using something like sea_query to generate queries instead of a bunch of fixed queries
      - Will add flexibility, and could make a lot of things simpler. Need to see if there's still a good way to pre-generate queries when running the CLI
    - Data-driven default org contents, even if just a JSON blob to start
      - This can piggyback off of the fixture system
      - Considering not doing this since it may be simpler to just do it in a function.
    - Some way to easily add hooks to CRUD operations
      - Developer should probably just edit the generated code here.
      - Maybe add stub functions to each model to allow adding pre/post save hooks
    - Make it easy for additional queries to use the permissions macros
      - Probably moving away from SQL and toward Rust code for permissions checks so this won't matter as much
      - Can suggest using sqlweld here
      - See if there's a path to using a query builder like sea_query instead, maybe as an addition to the existing thing.
    - Autogenerate the belongs_to field if necessary when the inverse has relationship is detected without a through table.
      - Currently this returns an error instead
  - Authentication and Registration
    - Make organization_members a real model instead of just a table
      - Once joining tables work properly
    - Make SessionBackend work with custom auth
      - It mostly works but when the tables are in another schema then it doesn't. Need a less-fixed query method here.
    - Rate limiting for login-related endpoints
      - https://docs.rs/governor/latest/governor/
      - https://lib.rs/crates/tower_governor
      - https://github.com/brandur/redis-cell if i need this to be distributed
        
        DragonflyDB is a Redis-compatible store that has this built in
    - Passwordless login should include a code that can be pasted into the UI to log in instead.
    - Password requirements
      - length
      - mix of character types
    - Prevent reuse of recent passwords
    - Expire passwords that need a rehash due to upgraded standards
    - Option to expire passwords after a certain amount of time
      - This isn't recommended practice but some enterprise clients require this in order to sign with you.
    - Passkeys
      - https://passkeys.dev/
      - https://github.com/1Password/passkey-rs
    - RSA timecode Authenticator app
    - Authenticator Account recovery codes
    - Account self-deletion
    - Optional TOS acceptance at registration
    - Service accounts
    - Record info about sessions (geoip, user-agent, etc.)
    - Allow viewing other sessions for your user, and potentially logging them out
    - Invite users to your team
    - Invite users to the app, but not in your team
    - Link a new OAuth login that may not have a matching email to an existing account
    - When logging in via OAuth, fetch all known emails from the OAuth provider and see if any of them match an existing account if we haven't seen the account before.
      - Need to handle the case where there are multiple emails and they match different accounts. This probably requires an interstitial after the login page to select the matching account.
    - Waitlist functionality
    - Put auth code behind an adapter so it can be swapped out for a 3rd-party service
  - Permissions and Authorization
    - Project-based Access Control
    - Consider separating permissions checks from the SQL queries themselves
      - This has the upside that it would simplify a lot of checking for permissions and make more complex permissions easier
      - Upsides
        - SQL templates become simpler
        - Much easier to implement ABAC
        - Easier to differentiate authz errors from missing objects
      - Downsides
        - Requires remembering to actually check for permissions, though that's alleviated some by the template system since the checks can be placed in the access functions.
        - We also end up reading objects multiple times in update/delete situations, though this isn't that big of a deal. For project-based permissions we could potentially load all the projects though would have to see about that.
      - Should probably do this as permissions are becoming more and more complex as more options are added.
      - Maybe consider something like casbin or biscuits too. Feels like overkill at this point for most things though, probably more appropriate for an API-first product.
      - list query would still need permissions checks in the SQL
    - API Key lookup query needs to change to make permissions a subset of the user's permissions when inherit_user_permissions is false
      - This way if a user has some permissions removed, they can't bypass those limits using an old API key which still had those permissions
      - This is actually a good use case for casbin or biscuit, though could still be implemented without it.
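      - Without casbin or biscuit, the core of this is a set intersection. A minimal sketch, assuming permissions are plain strings and that a key which inherits user permissions simply gets the user's current set (both are assumptions, and the function name is hypothetical):
        ```rust
        use std::collections::BTreeSet;

        /// Compute the permissions an API key may actually use right now.
        pub fn effective_api_key_permissions(
            inherit_user_permissions: bool,
            key_permissions: &BTreeSet<String>,
            user_permissions: &BTreeSet<String>,
        ) -> BTreeSet<String> {
            if inherit_user_permissions {
                // Assumption: an inheriting key mirrors the user's current set.
                user_permissions.clone()
            } else {
                // The key keeps only what the user still has, so a permission
                // removed from the user can't be used through an old key.
                key_permissions
                    .intersection(user_permissions)
                    .cloned()
                    .collect()
            }
        }
        ```
        In the real lookup this would probably be done inside the SQL query rather than in Rust.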
    - Feature flags
    - Add "all" model permissions so that some roles have permissions on everything even when new models are added.
      - This needs some more thought though. Some models like user, org, etc don't want to have this blanket rule applied. Might be better to just do this via migrations.
      - Better: Have some models use global org-wide permissions of just "read," "write," etc. and only specific models have their own set of permissions.
    - Ability to apply limits on how many objects are created
      - There's a lot of flexibility here, since the limits could be something simple like number of objects, or some derived metric such as "number of minutes" for an audio-based model.
      - The source for the limits could also come from a number of different places, most commonly pricing plan but could be other stuff such as number of community contributions, upvotes, etc.
      - So the question is what's the proper framework to make it easy to write code that accomplishes all this.
        
        - Make it easy to fetch the limits data.
        - Make it easy to enforce rules around these limits.
    - API call quotas
      - e.g. 3000 calls per month, 20 calls per hour
      - This should be able to group endpoints by categories where they all share a quota
      - Might need to do this with a combination of event tracking and a rate limiter
    - Usage-based pricing
    - Ability to add rate limiting on any endpoint
      - keyed by org
      - keyed by user
      - endpoints can share a bucket
      - Local tracking is fine at first, should eventually support doing it in Redis or similar to support multiple API servers.
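      - A sketch of the local-tracking version as an in-memory token bucket, using only the standard library. `LimitKey`, `LocalRateLimiter`, and the bucket names are hypothetical. A real setup would more likely use a crate such as governor, and this is not how that crate works internally:
        ```rust
        use std::collections::HashMap;

        /// Who the limit applies to.
        #[derive(Debug, Clone, PartialEq, Eq, Hash)]
        pub enum LimitKey {
            Org(String),
            User(String),
        }

        struct Bucket {
            tokens: f64,
            last_refill_secs: f64,
        }

        /// In-memory token-bucket limiter. Endpoints that pass the same
        /// `bucket` name share one allowance per key.
        pub struct LocalRateLimiter {
            capacity: f64,
            refill_per_sec: f64,
            buckets: HashMap<(LimitKey, &'static str), Bucket>,
        }

        impl LocalRateLimiter {
            pub fn new(capacity: u32, refill_per_sec: f64) -> Self {
                Self {
                    capacity: capacity as f64,
                    refill_per_sec,
                    buckets: HashMap::new(),
                }
            }

            /// Returns true if the request is allowed. `now_secs` is passed in
            /// (e.g. seconds since server start) so the logic is easy to test.
            pub fn check(&mut self, key: LimitKey, bucket: &'static str, now_secs: f64) -> bool {
                let capacity = self.capacity;
                let refill_per_sec = self.refill_per_sec;
                let entry = self.buckets.entry((key, bucket)).or_insert(Bucket {
                    tokens: capacity,
                    last_refill_secs: now_secs,
                });
                // Refill based on elapsed time, capped at the bucket capacity.
                let elapsed = (now_secs - entry.last_refill_secs).max(0.0);
                entry.tokens = (entry.tokens + elapsed * refill_per_sec).min(capacity);
                entry.last_refill_secs = now_secs;
                if entry.tokens >= 1.0 {
                    entry.tokens -= 1.0;
                    true
                } else {
                    false
                }
            }
        }
        ```
        Moving this to Redis later would mean replacing the `HashMap` with a shared store while keeping the same `check` interface.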
    - Ability for admins to assume permissions of other users
      - This needs to be audited as well even if nothing else is audited
  - Workers
    - Ability to run worker tasks in some other process/VM/FaaS
      - Need to figure out configuration for this... this might be better implemented in the task queue itself
    - Outbox pattern support for transactional starting of background jobs
      - Not important for now since Effectum only runs in-process but at some point it will have a server mode and then it will make more sense.
  - Block Storage and File Uploads
    - Option for Uploads via signed URL
    - Generate tests for file upload endpoints
    - When retain_file_on_delete is set, optionally add an entry to a table that lists orphaned files.
      - The idea being that we might want to hold on to orphaned files for some period of time, and then delete them later.
    - Image statistics on upload
      - format, width, height
        
        Can detect on upload using code from Pic Store
    - Generate image thumbnail and blurhash
      - Best done in background worker
    - Potentially need a way to upload a file before the parent object exists. Will probably wait for a real use case on this one.
      - Could make the parent ID nullable here and then add an endpoint to "claim" a file for a particular parent object
      - Also could just do the Github style thing and upload them to a publicly available bucket. We don't maintain a file database entry in this case, and just return the public URL for whatever use it is.
    - Allow specifying storage provider presets from env vars, not just in config
  - Notifications
    - Notifications: Other types of notifications?
    - Notifications: Internal notifications
      - e.g. Discord webhook notifying myself about events
      - This should have notification templates and also destinations for the notifications
  - Event logging
    - Should support arbitrary types of events and also allow setting up queries that filter on various event sequences for a user/org
  - Workflow System
    - Build the workflow system
      - A lot from Ergo can be reused
    - State chart implementation
    - Workflows can "sleep" by enqueueing a scheduled task that will continue running when it wakes up
    - Endpoint to send an event to a workflow
    - Define workflow actions
    - Ability to define workflows in the app config
    - Scripting for Workflows
      - Piggyback on Ergo but maybe switch to QuickJS or txiki.js (saghul/txiki.js), a tiny JavaScript runtime
  - Hosting
    - Not sure if I'll do any of this, it's really better handled by a reverse proxy
    - LetsEncrypt support
    - TLS/HTTPS support
    - HTTP -> HTTPS redirect
  - User model list/get functions need to return users which are in multiple orgs, and whose current org is a different one
  - Easy setup for commercialization
  - Collect and export metrics to some tool
    - Looks like Honeycomb supports Prometheus format
  - Basic Analytics
  - More complex workflow/funnel analytics
  - Data Hooks and Similar
    - Delete log table for possibility of undelete later
    - Option for auditing on all operations
  - Password confirmation flow for "dangerous" actions
  - User profile photos
  - Copy all tests in test-app into filigree crate where it makes sense
    - Some tests are more convenient to test in the test app, but rely on minimal scaffolded functionality. Should add a minimal server/migration setup in filigree crate tests so that they can run there as well.
  - Support SQLite
    - Not sure how much work it will be to support both Postgres and SQLite, could turn out to be a big hassle due to differing feature support and syntax, but I'll see at some point
    - Things I use that are different in SQLite
      - AL JOIN (it supports them, just not with this syntax)
      - No native uuid and timestamptz types
      - No array_agg - this is probably the biggest one
      - jsonb functions are a bit different
      - parameter binding syntax is different
- Done
  - Use a build script to inject the current filigree version into the value used by add_deps — Aug 6th, 2024
    - Minor thing but makes the release process a bit smoother. Right now this has to be updated manually.
  - When using "string auth IDs" or custom auth in general, make it a different field that references the user/organization/role ID in the external system, and continue using a normal ObjectId type internally — Jul 22nd, 2024
    - This simplifies a whole bunch of code that otherwise has to go back and forth depending on the actual underlying type of the auth object.
  - Joining tables — Jul 22nd, 2024
  - Model should have functions for adding/updating/removing child objects — Jul 17th, 2024
  - Rework query template system — Jul 16th, 2024
    - Create more of the query template stuff in the Rust code instead of in Tera
      - This will make certain things a lot simpler especially when dealing with binding parameters
      - When generating a static query, keep a mapping of binding parameter to position, and have some method for the template system to generate a bunch of bindings from an object.
        
        This allows the query system to call a single function with data generated along with the bindings, which adds the relevant parameter bindings and doesn't require us to keep the two places in sync.
    - Remove _permission from all the structs
      - This removes the ability to have the custom Serialize impl that only serializes fields visible to the Owner if the permission is write.
      - How useful is it anyway to have fields which are only viewable by the owner? Maybe it doesn't matter.
    - Remove permissions checks from queries for global auth scope cases where we already know the permissions from the roles
      - project-level and object-level still need it
    - Change all the query functions into objects on a struct. Consider making the struct into a thing that takes a reference to the AuthInfo.
      - This mostly just makes the typing experience nicer since instead of post::queries::create you call Post::create.
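    - A minimal sketch of the "mapping of binding parameter to position" idea from the rework above. The `Bindings` type is hypothetical and only covers Postgres-style `$N` placeholders:
      ```rust
      /// Tracks which named parameter was assigned which positional placeholder.
      #[derive(Default)]
      pub struct Bindings {
          names: Vec<String>,
      }

      impl Bindings {
          /// Return the `$N` placeholder for `name`, reusing the position if the
          /// name was already seen while building the query text.
          pub fn placeholder(&mut self, name: &str) -> String {
              let index = match self.names.iter().position(|n| n == name) {
                  Some(existing) => existing,
                  None => {
                      self.names.push(name.to_string());
                      self.names.len() - 1
                  }
              };
              format!("${}", index + 1)
          }

          /// Parameter names in the order their values must be bound.
          pub fn order(&self) -> &[String] {
              &self.names
          }
      }
      ```
      The query text and the bind order come from the same object, so the two can't drift apart.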
  - Make user/org/role/etc. models optional — May 26th, 2024
  - Allow user/org/role IDs to be strings when using custom auth — May 26th, 2024
  - Add more test apps — May 24th, 2024
    - Add more as big new features are added
    - These should mostly be .gitignored except for the files that filigree doesn't generate
  - Bug: Unique fields need to be unique per org, not globally — May 23rd, 2024
  - Pluggable auth system — May 22nd, 2024
    - Support talking to some other auth microservice, external service, using JWTs or Biscuits, etc.
    - This requires
      - A new function on the AuthQueries trait that can take a request Parts instead of a pre-parsed API key or session ID
        
        Or maybe just change the api_key to pass the raw Bearer token, and make it easy to unpack that into an API key for standard uses
      - trait functions should return some error type other than an sqlx::Error. An AuthError is probably the way to go or maybe even just start using Report there so we can save the internal error
  - Interactive bootstrap command — Apr 26th, 2024
  - Command to list all the environment variables that will be read along with their defaults if not provided
    - Might be a bit complex but worth it since there can be a lot
  - Bug: when deleting a model the files under src/model/NAME/ are not removed.
    - Probably need to read the .state directory to see what files we generated before but aren't being generated this time
    - Also don't delete files which have been edited
  - Anonymous user mode
    - Anonymous users get a default Authed with a specific org and user ID created just for the anonymous user.
  - Allow specifying a user ID for anonymous users to impersonate. This lets them have specific permissions, see certain objects, etc. — Apr 16th, 2024
  - Add init command that runs Cargo init, create-svelte, and creates basic config.toml? — Apr 16th, 2024
  - Add a command that creates a new Postgres user, password, and database for your project — Apr 16th, 2024
  - Support maud/htmx as a SvelteKit alternative for less complicated apps — Apr 16th, 2024
  - Switch out deprecated Jaeger tracing crate for opentelemetry-otlp — Apr 15th, 2024
  - When a type and its populated type are the same, don't generate e.g "ListAndPopulatedListResult", just "ListResult" — Apr 15th, 2024
  - htmx: manifest alterations need to be a vite plugin so they work in dev mode — Apr 12th, 2024
  - htmx: live reload — Apr 12th, 2024
  - htmx: asset pipeline config to bundle CSS and scripts, compress assets, etc. — Apr 11th, 2024
  - htmx: build JS with vite or esbuild, read from file manifest to get list of static files with hash — Apr 11th, 2024
  - htmx: Simple bundling with vite — Apr 11th, 2024
  - htmx: Read vite manifest to support hashed values — Apr 11th, 2024
  - htmx: Instantiate manifest object in app, set up watcher in dev mode — Apr 11th, 2024
  - htmx: obfuscate error middleware should only run on JSON responses (although maybe do HTML obfuscation too) — Apr 10th, 2024
  - htmx: Potentially a new extractor that lives alongside FormOrJson that always returns the form — Apr 10th, 2024
  - htmx: Customize / route in pages — Apr 10th, 2024
  - htmx: Tailwind CSS support — Apr 10th, 2024
  - Auth failures need to redirect to login page instead of just returning a 404 error — Apr 9th, 2024
    - Errors in general need to do that
  - Skip rendering sveltekit files when it is not enabled — Apr 8th, 2024
  - Remove "sync types" and TS generation code when not using sveltekit — Apr 8th, 2024
  - htmx: Page wrapper that gets standard tag contents for all — Apr 8th, 2024
  - htmx: Allow defining pages and actions — Apr 8th, 2024
    - This works similarly to the existing endpoints stuff, but is made for HTML apps
    - Autogenerate a rust module for each path and its actions
    - Should this be in a nested directory structure or just by filenames? Maybe nested directories
  - Backend Error reporting — Apr 7th, 2024
    - Almost done, just need to hook up the error reporting interface into FiligreeServerState so that I can report errors outside of the automatic system.
  - Frontend error reporting — Apr 7th, 2024
  - Generate JS code to talk to the API
    - This is done for custom endpoint functions, not yet for CRUD
  - Add configuration to enable trace export — Apr 6th, 2024
  - Add SQL functions to go between an object ID text and UUID forms. — Apr 5th, 2024
    - Should have these left over from Ergo
  - Frontend: Generate client-side validation — Apr 5th, 2024
    - This is mostly done with the generated zod schema, but some things are needed such as checking against create or update schema as appropriate
    - Right now the UI doesn't update right on a client-side validation failure
  - Set it up to work in production — Apr 5th, 2024
    - Dockerfile
    - handleFetch hook
    - Rust side has a fallback route which directs data to the SvelteKit server
      - Most uses will also have a reverse proxy in front but this will let it work without that
  - Command to print all the environment variables that the generated application will check — Apr 2nd, 2024
  - Specify custom endpoints per model — Mar 29th, 2024
    - This generates an endpoint function, a route, path/query/body parameters, input/output/query structures, and corresponding Typescript functions and interfaces to use it.
  - Allow omitting certain normal fields on list — Mar 27th, 2024
    - e.g. if there's a large JSON field not needed for the list page we should omit it.
  - Easy generation of TS types from Rust types — Mar 25th, 2024
  - Background jobs v1 — Mar 23rd, 2024
    - Define background jobs in config
      - Option to set recurrence in config
    - Global and per-job concurrency settings
      - Actually do this via multiple worker configurations. Can configure workers equal to your concurrency desires.
    - Generate a payload structure for the background job
      - No need to define the contents in the config, the user can just fill in what they need
    - Generate a helper to enqueue a background job
    - Generate a stub payload and function that runs the job
    - Hook it all up
      - Add queue to ServerState
      - Make environment variables
  - Option to allow setting an object ID when creating an object — Mar 22nd, 2024
    - This can help with scenarios where we want to create an object and potentially write it to the database later, but have the ID to use beforehand.
  - Generate Typescript interfaces for models — Mar 19th, 2024
  - Generate tests for child object url endpoints — Mar 17th, 2024
  - File model type — Mar 15th, 2024
    - Files are a separate model, and when they need to be referenced by another model they can just form a parent-child relationship.
    - Add a function for upload
      - where should this function go? probably a new file which handles upload, delete, and whatever else might be necessary
    - Add an endpoint to process an upload directly
    - Delete file when model is deleted
    - Delete files when parent model is deleted
    - Special file configuration
      - upload path template
      - a bucket from storage.bucket to store an upload
      - permissions for download or not
    - Extra optional fields
      - hash
      - file size
      - original filename
  - Initial Block Storage Support — Mar 10th, 2024
    - Block storage abstraction
      - use object_store crate for the underlying interface
    - Storage configuration
      - The idea here is that each storage location can be configured but the details should be completely configurable at runtime.
        
        Or should it always be environment based? Probably yes actually except maybe some basics like bucket name, endpoint, etc.
      - something like STORAGE_${location_name}_BUCKET and so on for each provider
    - Use storage configuration in template
    - Stream request body into object storage
    - Stream multipart files into object storage
    - Allow examining data as it streams through
    - Stream file from object storage as Response body
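    - A small sketch of the `STORAGE_${location_name}_BUCKET` naming idea from the storage configuration notes. The normalization rule and the "environment overrides config" precedence are assumptions, and both function names are hypothetical:
      ```rust
      /// Build the env var name for one setting of a storage location.
      pub fn storage_env_var(location_name: &str, setting: &str) -> String {
          // Assumption: names are upper-cased and non-alphanumerics become '_'.
          let clean = |s: &str| -> String {
              s.chars()
                  .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_uppercase() } else { '_' })
                  .collect()
          };
          format!("STORAGE_{}_{}", clean(location_name), clean(setting))
      }

      /// Resolve a setting, letting the environment override the config file.
      /// `env` is injected so this can be tested without touching the process
      /// environment; in an app it could be `|k| std::env::var(k).ok()`.
      pub fn resolve_setting(
          location_name: &str,
          setting: &str,
          config_value: Option<&str>,
          env: impl Fn(&str) -> Option<String>,
      ) -> Option<String> {
          env(&storage_env_var(location_name, setting))
              .or_else(|| config_value.map(str::to_string))
      }
      ```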
  - Create endpoints should return children when doing update_with_parent — Feb 29th, 2024
  - ~~Store .gen files in a tar file instead~~ - Cancelled Feb 28th, 2024
    - This reduces the huge number of files in the state directory, but makes merge conflicts more difficult to deal with.
  - Parent/child model relationships — Feb 27th, 2024
    - Configuration
    - Add fields for belongs_to relationships to model structures
    - Generate composite primary key for joining tables and update queries appropriately
    - handle has_one relationships in fetch queries with ID fetch
    - handle has_one relationships in fetch queries with data fetch
    - handle has_many relationships in fetch queries with ID fetch
    - handle has_many relationships in fetch queries with data fetch
    - Allow populating references fields on fetch when configured
    - Updates for has_one relationships with data payload
    - Updates for has_many relationships with data payload
    - Generate CRUD endpoints for children of a particular object
      - This is basically the same as the normal endpoint, but with the parent ID specified in the URL instead of in the payload.
      - When it's not a many relationship, omit the child ID parameter
    - Populated query functions
      - Call populated functions from endpoints when appropriate
    - SQL Queries for child objects
      - List by parent ID
        
        This could be done as just another filter on the list query instead of a new query template
      - insert/update multiple objects
      - Delete by parent ID where ID is not in the set of IDs just inserted/deleted
    - Ensure that each model is not referenced by more than one non-through has relationship
    - Sort CREATE TABLE statements so that parents come first
    - First round of Tests
      - Create a test model with children
        
        - Post
          - has one Poll
          - has many Comments
          - has many Reactions
      - populated queries with has_one
      - populated queries with has_many
      - insert with child object
      - update with child object
    - create and update functions should return the resulting objects
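    - The "sort CREATE TABLE statements so that parents come first" step is a topological sort. A minimal sketch, assuming the relationships are available as a map from each table to the tables it references (the function name and table names are illustrative only):
      ```rust
      use std::collections::{BTreeMap, BTreeSet};

      /// Order tables so every parent appears before its children.
      /// `parents` maps each table to the tables it references.
      /// Returns None if the references contain a cycle.
      pub fn sort_parents_first(parents: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
          let mut remaining: BTreeSet<&str> = parents.keys().copied().collect();
          let mut ordered: Vec<String> = Vec::new();
          while !remaining.is_empty() {
              // A table is ready once none of its parents are still waiting.
              // A self-reference does not block the table from being created.
              let ready: Vec<&str> = remaining
                  .iter()
                  .copied()
                  .filter(|table| {
                      parents[table]
                          .iter()
                          .all(|p| !remaining.contains(p) || p == table)
                  })
                  .collect();
              if ready.is_empty() {
                  return None; // every remaining table waits on another one
              }
              for table in ready {
                  remaining.remove(table);
                  ordered.push(table.to_string());
              }
          }
          Some(ordered)
      }
      ```
      Using `BTreeMap` and `BTreeSet` keeps the output order stable between runs, which matters when the result is written into a migration file.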
  - Test passwordless login in Glance — Feb 14th, 2024
  - Test login with password in Glance — Feb 13th, 2024
  - Frontend: Handle validation errors from server — Feb 11th, 2024
  - On validation failure, return the sent data
    - This helps when SvelteKit has forwarded the request straight to the server, since it allows Kit to return the form data back again to repopulate the form
  - Command to bootstrap initial org and user — Feb 11th, 2024
    - This should have the ability to take admin user's email from either the command line or the environment
  - Initial OAuth login support — Feb 10th, 2024
    - Update create_new_user to implement UserCreator trait
    - Allow creating a user without an email address
    - Look for client id/secret environment variables for each provider and create the ones that are found — Feb 3rd, 2024
    - Hook up oauth code to endpoints — Feb 3rd, 2024
    - Link to existing account when it's a new OAuth login but a known email — Feb 2nd, 2024
    - Client-side code for OAuth login — Feb 9th, 2024
  - Login page — Feb 10th, 2024
  - Easier way to add new fields to base models than redefining the whole model — Jan 29th, 2024
  - Integrate into Glance
  - User Creation updates — Jan 27th, 2024
    - Add default role field to org, for users who are added without any set of roles
    - Add roles field to email_invites
  - Validation for form submissions — Jan 22nd, 2024
  - Use hosts list in post-login redirect processing
  - JSON validation that checks multiple fields
    - do this using axum-jsonschema
    - Derive schemars::JsonSchema on every structure that might need to be validated
    - Might want to add additional validation in the future with an extension to this
  - CORS configuration — Jan 17th, 2024
    - Host allow list for apps that want to lock to the app's host
    - Use CorsLayer::permissive() on /api for apps that want to expose the API to browsers
  - Initial SQL migration should only be written once — Jan 17th, 2024
  - New models should be added in new SQL migrations — Jan 17th, 2024
  - Column changes to models should generate new SQL migrations — Jan 17th, 2024
    - sql-migration-sim crate is written, just need diff functionality
    - Add index tracking to sql-migration-sim
  - New runs diff against previous run and only apply diffed changes — Jan 16th, 2024
    - When running the CLI, save the generated templates (post-formatter) to a state directory
    - On each run, diff the generated templates against the previous state in that directory
    - Apply that diff to the actual output file
      - Need a patch style that matches against nearby lines instead of just using line numbers.
    - Point out conflicts as they occur, probably use git style merge conflict markers
    - diffy supports three-way merge so that's also worth looking at
    - This system allows the user to make changes to the generated files but still update the config.
    - Need a command to rewrite a particular file
    - Need a command to just regenerate the state cache, for when changing formatter settings
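    - A coarse, file-level sketch of the three-way decision behind this. The real system works on diffs within a file (and diffy has a line-level three-way merge), so this only shows the shape of the logic. `MergeOutcome` and `reconcile` are hypothetical names:
      ```rust
      /// Result of reconciling one generated file.
      #[derive(Debug, PartialEq, Eq)]
      pub enum MergeOutcome {
          /// Write this content to the output file.
          Write(String),
          /// Leave the user's file as it is.
          KeepUserFile,
          /// Both sides changed differently; needs conflict markers or review.
          Conflict,
      }

      /// `previous_gen` is the saved state from the last run, `new_gen` is this
      /// run's template output, and `on_disk` is the file as the user has it.
      pub fn reconcile(previous_gen: &str, new_gen: &str, on_disk: &str) -> MergeOutcome {
          if new_gen == previous_gen || on_disk == new_gen {
              // The generator output did not change, or the file already matches it.
              MergeOutcome::KeepUserFile
          } else if on_disk == previous_gen {
              // The user never edited the file, so the new output can replace it.
              MergeOutcome::Write(new_gen.to_string())
          } else {
              MergeOutcome::Conflict
          }
      }
      ```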
  - Ability to make signup open to the public or admin-only — Jan 12th, 2024
  - signup endpoint to create a new org that also creates default roles, adds a user, etc. — Jan 12th, 2024
  - ability to sign up via passwordless email auth if enabled — Jan 12th, 2024
  - Password forgot/reset endpoints — Jan 12th, 2024
  - Passwordless email auth — Jan 11th, 2024
  - Email template system — Jan 11th, 2024
  - List all permissions in a file with descriptions — Jan 9th, 2024
  - For models with global permissions, add permissions check in route layer before hitting database — Jan 9th, 2024
  - MVP Basic email sending, enough to do password management and such — Jan 9th, 2024
  - Integrate with email sending services — Jan 9th, 2024
  - Endpoints for retrieving and updating your own user — Jan 7th, 2024
  - Auto-add appropriate dependencies to Cargo.toml — Jan 7th, 2024
  - Tests for users without proper permissions for each action — Jan 6th, 2024
  - Generate tests for models — Jan 6th, 2024
  - Tests for login/logout — Jan 5th, 2024
  - Generate fixtures for tests — Jan 5th, 2024
  - Write functions needed to bootstrap test data — Jan 4th, 2024
    - These should basically be the version that the actual app will use, just deferring all the surrounding functionality that isn't needed to write basic tests
    - Add org
    - Add roles
    - Add user to org with roles/permissions
    - Add new org (using all the above)
    - Add API key for user
  - Password login/logout endpoints — Dec 29th, 2023
  - Get server to start — Dec 27th, 2023
  - Add permissions checks to endpoints — Dec 27th, 2023
  - Add app-specific AuthInfo object — Dec 23rd, 2023
  - Add AuthQueries implementation — Dec 23rd, 2023
  - Select queries need to return the current permission level on each object for project/object-level permissions — Dec 22nd, 2023
  - Add middleware layer to do permissions check — Dec 22nd, 2023
  - Add authz middleware to do generic predicate check — Dec 22nd, 2023
  - query to load user/roles/org for auth info — Dec 21st, 2023
  - Generate router for all model endpoints — Dec 21st, 2023
  - Create endpoints for the model — Dec 20th, 2023
  - Bearer token auth middleware — Dec 20th, 2023
  - Session cookie auth middleware — Dec 20th, 2023
  - Figure out querystring filter parameters — Dec 18th, 2023
  - Pagination — Dec 18th, 2023
  - Rename team to organization
  - Create Rust structures for each model
  - Bootstrap CLI utility
  - Figure out main config format
  - Figure out model definition
  - Create team, user, role, permissions, etc. tables
  - Create SQL migration
  - Create SQL queries to read/write the model
Setup
- This isn't really full documentation, just notes on how I set things up. More of this will be automated and rough edges smoothed out over time.
- Initial Setup
  - Create a git repository
  - Create a Rust project in it
  - Create a SvelteKit project, add filigree-web as a dependency
  - Add the filigree directory with config.toml and other files
  - Create a rustfmt.toml
  - Delete the default src/main.rs
  - Run filigree to write the source code
  - Add this to the vite config
    - ```
      server: {
        proxy: {
          '/api': {
            target: 'http://localhost:7823',
          },
        },
      },
      ```
- Set up Database
  - Create Database and Database User
    - cat /dev/urandom | head | shasum for an easy random password
    - ```
      -- Create the user and database
      create role USER login password 'PASSWORD';
      create database THE_DB with owner USER;
      ```
- Set DATABASE_URL in your .env
- cd YOUR_API_DIRECTORY && sqlx migrate run
- Bootstrap the Database
  - ```
    cargo run --release -- db bootstrap \
      --email YOUR_EMAIL \
      --password 'PASSWORD' \
      --name 'YOUR NAME' \
      --org-name 'ORG NAME'
    ```
  - The only required option is email. Without a password you can log in via OAuth or through a passwordless email login.
  - You can also supply a prehashed password by using the --password-hash argument instead, and hashes can be created by running cargo run --release -- util hash-password YOURPASSWORD.
Authorization
- Concepts
  - Organization
  - User
    - Users can potentially be in multiple organizations
  - Role
    - Basically a collection of permissions
  - Group
    - This is similar to Role
    - Might not do this, but it can be useful to make this different when more advanced sharing options are created.
  - Actor - all the things (users, roles, groups) that can have permissions on objects
    - When looking at permissions we gather up all the actors that apply to the current user, and use that whole list for permissions checks
- The permissions table is then a list of the permissions each actor has.
- Permissions
  - These should be definable per model once more than one is supported.
  - Tables
    - Global permissions
      - A many-to-many mapping of actor to permission
    - Object permissions
      - A mapping of actor to object and permission
      - For a project-based model, the projects are also just objects
  - Models
    - Role-based Access Control (RBAC)
      - Each role has permissions on whether it can read/write/admin certain types of objects.
      - This is the first one we'll support
      - Might be useful to have two sets of roles, some which can have the global permissions and are only created by administrators, and others which are generic groups of people for managing sharing of projects and objects
        
        Need to think about how much this needs to be in the data model vs. just something in the UI. Probably just a single boolean flag is sufficient.
    - Resource-based access control
      - Read/write/admin on individual objects
      - Each role/user has specific permissions for each object
      - More complex, need to make sure the relevant entries get added for each object when it is created
      - Best for a model where objects are not shared by default.
    - Project-based Access Control
      - This combines with the above concepts but is a middle level of granularity compared to the all-or-nothing of plain RBAC and the often-overly-granular resource-based system.
      - A project acts as a container for objects, and actors can have permissions on the project instead of the individual objects.
      - Will probably implement this after RBAC, just need additional functionality for project management and ability to
      - Both individuals and roles can be added to projects
    - Attribute-based Access Control (ABAC)
      - Objects have attributes and actors
    - Public sharing
      - Some sites are built around publicly sharing your stuff.
      - This is basically the same as the resource-based access control model, but with the ability to share with an "anyone" user, and then all the permissions checks check for that actor ID as well.
      - The main change here is that the organization ID checks need to be relaxed when looking for publicly-shared objects.
  - Types of default permissions
    - creator only until shared
    - Creator gets write access, everyone else gets read access
    - everyone can see, only creator can write
    - Everyone in the project can see
    - Everyone in the project can edit
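
The actor-based check described above can be sketched roughly like this. It is a minimal in-memory illustration, not Filigree's actual implementation: the `User` class, the `"anyone"` actor ID, and the set-based stand-ins for the global and object permission tables are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical actor ID representing the "anyone" user for public sharing.
ANYONE = "anyone"


@dataclass
class User:
    id: str
    roles: list = field(default_factory=list)
    groups: list = field(default_factory=list)


def actors_for(user: User) -> set:
    """Gather every actor ID (user, roles, groups) that applies to the user."""
    return {user.id, *user.roles, *user.groups}


def has_permission(
    user: User,
    permission: str,
    object_id: Optional[str],
    global_perms: set,   # set of (actor, permission)
    object_perms: set,   # set of (actor, object_id, permission)
    allow_public: bool = False,
) -> bool:
    """Check global permissions first, then permissions on the specific object."""
    actors = actors_for(user)
    if allow_public:
        # Public sharing: also check the "anyone" actor.
        actors.add(ANYONE)
    if any((actor, permission) in global_perms for actor in actors):
        return True
    if object_id is None:
        return False
    return any((actor, object_id, permission) in object_perms for actor in actors)
```

In a project-based model the same function would work by passing the project's ID as `object_id`, since projects are also just objects.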
Model relationships
- Types
  - parent-child one to many
  - parent-child one to one
  - Many to one
    - Usually this isn't really a parent-child paradigm so much as related objects. This is actually similar to ActiveRecord belongs_to on the parent object, but without the meaning implied by the word "belongs".
    - e.g. Image uploads can all reference a single conversion configuration, but any image can be switched to a different configuration.
    - The main thing that we want here is the ability to easily fetch the associated objects instead of just their IDs
  - Many to many
    - Almost always best implemented with a join table
- Configuration keywords
  - has - For one-to-one or one-to-many parent-child relationships.
  - has with through for many-to-many relationships
    - In this case through is the name of the model that holds the joins. Making it a full-fledged model allows adding other information to the relationship, if applicable.
    - Need a special mode for through models that will omit the normal id field since its primary key will be a composite of the IDs of the two other models.
  - references for many-to-one relationships
    - This is actually accomplished by the existing references member on a model field definition.
    - Added 2 extra fields that allow setting the option to automatically populate the field on list or get
- Where does the linking field actually go?
  - When there's a linking model, it goes into that model's table, of course.
  - Otherwise for a has relationship it goes into the child object. This is slightly inefficient in a true has_one case but those are rare in practice. In general those turn into either a complex JSON field inside the model itself, or it would be a has_many.
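
To make the placement rules above concrete, here is a small sketch of deciding which table gets the linking columns. The `Relationship` class and the `<name>_id` column naming are assumptions for illustration only, not the real config format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Relationship:
    kind: str    # "has", "has_through", or "references"
    model: str   # the model the relationship is declared on
    target: str  # the other model
    through: Optional[str] = None  # join model name for many-to-many


def linking_fields(rel: Relationship) -> dict:
    """Return {table_name: [linking columns]} for a relationship."""
    if rel.kind == "has_through":
        if rel.through is None:
            raise ValueError("has_through needs a through model")
        # The join model holds both IDs, which together form its primary key.
        return {rel.through: [f"{rel.model}_id", f"{rel.target}_id"]}
    if rel.kind == "has":
        # The child row points back at its parent.
        return {rel.target: [f"{rel.model}_id"]}
    if rel.kind == "references":
        # Many-to-one: the declaring model stores the ID of the shared object.
        return {rel.model: [f"{rel.target}_id"]}
    raise ValueError(f"unknown relationship kind: {rel.kind}")
```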
Pricing Plan Support
- Add pricing tiers and permissions that go along with those tiers - this should mostly be data driven
  - Need ability for numeric values as well (e.g. max 3 projects, 500MB storage, up to 60 minutes of something, etc.)
- Need ability to do grandfathering of pricing tiers and permissions
  - Essentially this means that both price plans and permissions sets should be versioned, and separately.
  - And there should be the ability to update existing plans and all previous versions
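
One possible shape for grandfathering, sketched under the assumption that each subscription is pinned to the plan version it signed up on. All class and field names here are hypothetical, and the sketch only covers plan versions; permission sets would be versioned separately in the same way.

```python
from dataclasses import dataclass


@dataclass
class PlanVersion:
    plan: str
    version: int
    price_cents: int
    limits: dict  # numeric limits, e.g. {"max_projects": 3}


@dataclass
class Subscription:
    org_id: str
    plan: str
    version: int  # pinned when the organization subscribed


def limit_for(sub, catalog, key):
    """Look up a numeric limit on the plan version the subscription is pinned to."""
    return catalog[(sub.plan, sub.version)].limits.get(key)


def within_limit(sub, catalog, key, current_usage):
    limit = limit_for(sub, catalog, key)
    # A missing limit is treated as unlimited in this sketch.
    return limit is None or current_usage < limit
```

Updating an old version in place (for example raising a limit for everyone still on version 1) is then just an edit to that one catalog entry.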
Workflows
- Workflows are basically state machine definitions which can be instantiated with some data, linked to users/orgs/etc.
- Workflows can perform some set of predefined actions, which may include sending emails, notifying a user, spawning other workflows, etc.
- Workflows can define transitions to run after a certain amount of time
- Consider a scripting engine or something for checking state when transitioning (perhaps just reuse a bunch of code from Ergo for this)
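
A rough sketch of what a workflow definition with event-driven and timed transitions could look like. Everything here (class names, the `now` timestamps passed in by the caller) is a made-up illustration of the idea, not a design decision.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Transition:
    target: str
    event: Optional[str] = None            # fire when this event arrives
    after_seconds: Optional[float] = None  # or fire after this long in the state
    action: Optional[Callable[[dict], None]] = None  # e.g. send an email


@dataclass
class Workflow:
    transitions: dict  # state name -> list of Transition
    state: str
    data: dict = field(default_factory=dict)
    entered_at: float = 0.0

    def _apply(self, transition: Transition, now: float) -> None:
        if transition.action:
            transition.action(self.data)
        self.state = transition.target
        self.entered_at = now

    def send(self, event: str, now: float) -> bool:
        """Handle an external event. Returns True if a transition fired."""
        for t in self.transitions.get(self.state, []):
            if t.event is not None and t.event == event:
                self._apply(t, now)
                return True
        return False

    def tick(self, now: float) -> bool:
        """Fire the first timed transition whose delay has elapsed."""
        for t in self.transitions.get(self.state, []):
            if t.after_seconds is not None and now - self.entered_at >= t.after_seconds:
                self._apply(t, now)
                return True
        return False
```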
Old Notes
- This should be made modular somehow so it's easy to maintain and add new stuff.
  - Modules define dependencies
- Config
  - formatters to use
  - SQL dialect (postgres or sqlite)
- Model definition
  - Write in TOML
  - Each field has
    - SQL type
    - nullable (default not null)
    - Rust type (can be inferred from the SQL type, but can be overridden)
  - Define Indexes and primary key
    - indexes can maybe just be defined in raw sql
  - Can have child relationships, one-to-one and one-to-many
  - Can have JSON fields, which deserialize to either JSON values or other Rust structs
    - The rust structs can be defined in the config as strings or something like that
  - List of other query functions to generate (find by field, etc.)
  - Permissions
    - Automatic checking on all operations
    - Ability to define which permissions are needed for what
  - some kind of hook on update? or is this better done by just changing the generated code myself
- Consider some kind of codegen for even easier crud endpoint generation
  - Need ability to generate migrations that add/remove columns too
    - this can definitely be manual to start though.
- Authentication
  - Need a user/account mapping to support multiple auth methods per user.
- Scrapped attempt to switch to SeaORM
  - Starting to try this out is what led to Rework query template system — Jul 16th, 2024
  - Problems
    - exec_with_returning only supports returning a single model, this isn't an issue with raw sqlx
    - find_with_related only supports a single related model, not an issue with the current query building
  - But the nice things were...
    - Easy to add auth query with a wrapper
    - Easy dynamic query construction
    - ActiveModel is pretty nice
    - Easier to decide at runtime which things to populate
    - Templates were easier to deal with, no more gnarly templates with a bunch of ifs generating SQL or functions to call queries.
    - Partial Models were nice

PDF Extraction

Sat, 02 Dec 2023 00:00:00 GMT

A full pipeline that includes the below techniques and more: VikParuchuri/marker: Convert PDF to markdown quickly with high accuracy
tesseract is one of the leading OCR applications/libraries
- Usually want PSM mode 4 (single column of text) or 6 (single uniform block of text)
- Tesseract Page Segmentation Modes (PSMs) Explained: How to Improve Your OCR Accuracy - PyImageSearch
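
A minimal sketch of picking the PSM when calling Tesseract from Python. It assumes the `tesseract` command-line binary is installed and on the PATH; the helper names are made up for illustration.

```python
import subprocess


def tesseract_command(image_path: str, psm: int = 4) -> list:
    """Build a tesseract CLI invocation that prints the recognized text to stdout."""
    if not 0 <= psm <= 13:
        raise ValueError("Tesseract page segmentation modes run from 0 to 13")
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def ocr_image(image_path: str, psm: int = 4) -> str:
    """Run tesseract on an image file and return the recognized text."""
    result = subprocess.run(
        tesseract_command(image_path, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `ocr_image("page.png", psm=6)` would treat the page as a single uniform block of text.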

OCR often works best with some preprocessing

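import cv2

# Convert a BGR image (as loaded by cv2.imread) to grayscale
def get_grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Binarize a grayscale image, letting Otsu's method pick the threshold
def thresholding(image):
    return cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]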

Deskew

Table extraction
- Multi-Column Table OCR - PyImageSearch
- Microsoft Table Transformer (TATR)

SBBP

Sun, 19 Nov 2023 00:00:00 GMT

Preview a CSV In-Browser

Fri, 10 Nov 2023 00:00:00 GMT

This uses Papa Parse to parse the first few rows of a CSV to preview it before uploading.

<script lang="ts">
  import Papa from 'papaparse';
  
  let columns: string[] | undefined;
  let error: string | undefined;
  let previewRows: Array<Record<string, string>> = [];
  
  async function openFile(file: File|undefined) {
    if(!file) {
      return;
    }
    
    Papa.parse(file, {
      header: true,
      preview: 3, // To read first 3 data rows
      skipEmptyLines: 'greedy', // Well-formed CSV files don't need this
      complete(parsed) {
        if(parsed.errors?.[0]) {
          error = parsed.errors[0].message;
        } else if(parsed.meta.fields) {
          columns = parsed.meta.fields;
          previewRows = parsed.data;
        }
      }
    })
  }
</script>

<input 
  type="file"
  name="csvfile"
  accept=".csv,text/csv"
  on:change={(e) => openFile(e.currentTarget.files?.[0])} 
/>

PromptBox

Tue, 31 Oct 2023 00:00:00 GMT

This utility allows maintaining libraries of LLM prompt templates which can be filled in and submitted from the command line.
Github, Website

A sample prompt. Each of the options below becomes a CLI flag which can fill in the template.

description = "Summarize some files"

# This can also be template_path to read from another file.
template = '''
Create a {{style}} summary of the below files
which are on the topic of {{topic}}. The summary should be about {{ len }} sentences long.

{% for f in file -%}
File {{ f.filename }}:
{{ f.contents }}


{%- endfor %}
'''

[model]
# These model options can also be defined in a config file to apply to the whole directory of templates.
model = "gpt-3.5-turbo"
temperature = 0.7
# Also supports top_p, frequency_penalty, presence_penalty, stop, and max_tokens

[options]
len = { type = "int", description = "The length of the summary", default = 4 }
topic = { type = "string", description = "The topic of the summary" }
style = { type = "string", default = "concise" }
file = { type = "file", array = true, description = "The files to summarize" }
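
To illustrate how the `[options]` table can turn into CLI flags, here is a small Python sketch using `argparse`. This is not PromptBox's own code, just a rough picture of the mapping; the `TYPES` table and `build_parser` helper are hypothetical.

```python
import argparse

# Hypothetical mapping from template option types to Python parsers.
# "file" options are kept as paths here; reading the contents would happen later.
TYPES = {"string": str, "int": int, "file": str}


def build_parser(options: dict) -> argparse.ArgumentParser:
    """Create one --flag per template option."""
    parser = argparse.ArgumentParser()
    for name, spec in options.items():
        kwargs = {"help": spec.get("description")}
        if spec["type"] == "bool":
            # Bool options are always optional flags.
            kwargs["action"] = "store_true"
        else:
            kwargs["type"] = TYPES[spec["type"]]
            if spec.get("array"):
                # Array options can be passed multiple times.
                kwargs["action"] = "append"
            if "default" in spec:
                kwargs["default"] = spec["default"]
        parser.add_argument(f"--{name}", **kwargs)
    return parser


# The options from the sample template above, as a Python dict.
options = {
    "len": {"type": "int", "description": "The length of the summary", "default": 4},
    "topic": {"type": "string", "description": "The topic of the summary"},
    "style": {"type": "string", "default": "concise"},
    "file": {"type": "file", "array": True, "description": "The files to summarize"},
}
```

Parsing `--topic rust --file a.txt --file b.txt` with this parser fills in the defaults for `len` and `style` and collects both files into a list.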

Task List
- Up Next
  - Switch request layer to use Chronicle
- Soon
  - Verbose mode should print token stats at end
  - List command
    - List all templates in a directory
    - should also take a filter
    - Short mode for completion
    - by default print template name and description in a table
  - Show command to output the information from a template
  - "run" command detection is fragile
  - Add tools definitions to templates
    - This lets us run a script and get a JSON output
- Later/Maybe
  - Save all invocations in a database? (will do as part of Chronicle switch)
  - Allow templates to reference partials in same directory
  - Allow templates to reference partials in parent template directories
  - Define ChatGPT functions in the prompt? Probably skip this, more appropriate for some other project
  - bash/zsh Autocomplete template names
  - Can we autocomplete options as well once the template name is present?
  - Recall previous invocations
  - Option to trim context in the middle with a message or something like that
- Done
  - Pass images to Ollama — v0.3.0 Dec 13th, 2023
  - Support for GPT4 Vision — v0.3.0 Dec 13th, 2023
    - Can pass images in base64
  - Support OpenRouter — v0.2.0 Dec 8th, 2023
    - OpenRouter offers an OpenAI compatible API which is probably the easiest way to add this.
  - Set prompt format, context length, etc. per model - v0.2.0 Dec 8th, 2023
    - Done specifically for Together right now, can expand this to generic host at some point
    - Support standard formats and allow custom formats too
    - Needed for some providers that don't apply the template for you, or who don't provide accurate info about context length and other things.
    - This can be defined in the model definition once it can be an object (see Support multiple hosts - v0.2.0 Dec 8th, 2023).
  - Support together.xyz model host - v0.2.0 Dec 8th, 2023
    - Fetch model info from https://api.together.xyz/models/info
    - Short term cache on the model info
    - Get the config from the model info to determine how to format the prompt, stop tokens, etc.
      - Some of the configs here actually don't include things like system prompt...
        
        Maybe just build templates into the tool and let them be specified in the config somehow
      - Looks like context length is missing from some models as well
    - max_tokens seems to have a very small default, need to set this higher to be useful
  - Support multiple hosts - v0.2.0 Dec 8th, 2023
    - Allow defining hosts beyond the built-in hosts
      - API url
      - Request Format (currently just OpenAI and Ollama format but probably more in the future)
      - Environment variable that holds the API key
    - Ability to configure the default host for non-GPT-3.5/4 models (whereas now Ollama is the default)
    - Need a way to specify in the model name which host to use
      - Actually the way to do this is to allow the model name to be either a string or a { host: Option, model: String } structure.
    - Tests
      - config file overrides specific fields of built-in hosts
        
        e.g. host.openai.api_key = "DIFFERENT_VAR_NAME"
      - Adding new hosts from the config file
      - Use default provider when none is specified
      - Set default_host to something else
      - Complain when default_host refers to a nonexistent host
      - Alias handling
        
        Alias can be a full model spec
        
        Model can be a full model spec and also reference an alias which is a full model spec. Should fetch the alias from model and merge the remaining fields together in this case
  - Testing
    - stop at the top_level config
    - Resolution of model options between different configs
    - Don't require a config in every directory
    - Malformed configs raise an error
    - Malformed templates throw an error
    - templates resolved in order from the current directory
    - Look under ./promptbox.toml and ./promptbox/promptbox.toml
    - Prompts can be in subdirectories
    - Prepend
    - Append
    - Prepend and append
    - all types of arguments
    - Bool arguments are always optional
    - required arguments (switch from required to optional)
    - Array arguments
    - Template model options should override config model options
    - make sure it works to invoke with command-line options and template options at the same time
    - system prompt, embedded and in separate file
    - json mode
  - Handle 429 from OpenAI — v0.1.2 Dec 4th, 2023
  - Chop off too-large context, option to keep beginning or end — v0.1.1 Dec 1st, 2023
    - Should also be able to specify which inputs to slice off i.e. keep the fixed template intact but remove some of the piped-in input
    - Ideally have per-model context values.
      - Ollama can get this from the API.
      - OpenAI has few enough models that we can do some basic pattern matching to make a guess
      - But need ability to specify a lower cap too, e.g. maybe we never actually want to send 128K tokens to GPT4
  - Token counter functionality — v0.1.1 Nov 30th, 2023
  - Set up CI and distribution — Nov 21st, 2023
  - Streaming support for openai — Nov 14th, 2023
  - Append any additional positional arguments
  - Append input from stdin
  - Support format="json"
  - Streaming support for ollama
  - Integrate with ollama
  - Option type to paste a file contents in, and allow wildcards for array files
  - Send request to model
  - Move the main command to be a "run" subcommand
  - Basic functionality
  - Define CLI options in template file
  - Help output always shows openai_key (maybe due to .env?)
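
The "string or `{ host, model }` structure" idea and the alias merging from the multiple-hosts notes above could look something like this. It is a sketch under assumptions: the function name is made up, and the merge precedence (fields set explicitly on the model spec win over the alias) is my guess at the intended behavior.

```python
def resolve_model(spec, aliases: dict, default_host: str) -> dict:
    """Normalize a model spec to a dict with at least "host" and "model" keys."""
    # A bare string is shorthand for a spec with only the model name.
    if isinstance(spec, str):
        spec = {"model": spec}
    spec = dict(spec)

    alias = aliases.get(spec.get("model"))
    if alias is not None:
        if isinstance(alias, str):
            alias = {"model": alias}
        # Start from the alias, then merge in the remaining fields of the spec.
        merged = dict(alias)
        merged.update({k: v for k, v in spec.items() if k != "model" and v is not None})
        spec = merged

    # Fall back to the default host when none was specified anywhere.
    spec.setdefault("host", default_host)
    return spec
```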

New Project Checklist

Tue, 03 Oct 2023 00:00:00 GMT

A lot of this applies mostly to library projects that are more likely to draw outside contributions.
General Tasks
- Set up a justfile to run common tasks
  - ```
  _default:
    @just --list
```

JS Projects

husky to install Git hooks
For public projects, set up Github PR template with a note to add a changeset.
lint-staged to help with running prettier on commit
- Sample .lintstagedrc file
changesets for changelog generation
- Add @changesets/changelog-github for including Github PR info in changelogs
- Add these package.json scripts
  - ```
  "changeset": "changeset",
  "generate-changelog": "source ghtoken; GITHUB_TOKEN=${GITHUB_TOKEN} changeset version",
```
- Install changeset-bot.

ESLint

Disable some annoying rules which are enabled by common config packs

rules: {
    'prefer-const': 'off',
    '@typescript-eslint/explicit-module-boundary-types': 'off',
    '@typescript-eslint/ban-ts-comment': 'off',
    '@typescript-eslint/ban-types': 'off',
    '@typescript-eslint/no-empty-function': 'off',
    '@typescript-eslint/no-non-null-assertion': 'off',
    '@typescript-eslint/no-unused-vars': 'warn',
 }

Prettier

Install prettier-plugin-tailwindcss

{
  "useTabs": false,
  "singleQuote": true,
  "trailingComma": "es5",
  "printWidth": 100,
  "plugins": ["prettier-plugin-svelte", "prettier-plugin-tailwindcss"],
  "overrides": [{ "files": "*.svelte", "options": { "parser": "svelte" } }]
}

For now, update the package.json format and lint scripts to work around a bug in Prettier 3.0. (Note: this seems to be fixed, but need to confirm)
- ```
    "lint": "prettier --ignore-unknown --check './**/*' && eslint .",
    "format": "prettier --ignore-unknown --write './**/*'"
```

Rust Projects

Set up rustfmt.toml the way I like it

edition = "2021"
imports_granularity = "Crate"
group_imports = "StdExternalCrate"

Glance

Fri, 29 Sep 2023 00:00:00 GMT

Time Series Forecasting Libraries

Tue, 26 Sep 2023 00:00:00 GMT

Extracting Contacts from an iOS backup

Thu, 21 Sep 2023 00:00:00 GMT

This is tricky since the format is not well-documented and there are a bunch of files with just random hexadecimal strings as names.
On Mac, the backups are stored at ~/Library/Application Support/MobileSync/Backup. For some reason the contents of this directory aren't visible from the terminal, but are in Finder. If you copy the backup into another directory then you can use the backup in Terminal as well.
Extracting Contacts
- The address book file is in the backup at 31/31bb7ba8914766d4ba40d6dfb6113c8b614be442. This is a SQLite3 database.
- The main tables you care about here are ABPerson and ABMultiValue. To do a simple extraction of names and phone/email you can use a query like this.
  - ```
  select ABPerson.last, ABPerson.first, ABMultiValue.value
  from ABPerson,ABMultiValue
  where ABMultiValue.record_id=ABPerson.ROWID order by Last, First
```
Locating Files
- The Manifest.db is a database of all the files. You can run a query like this to find a file.
  - ```
  SELECT fileID, relativePath
  FROM Files
  WHERE relativePath like '%Address%';
```
- From there, the first two digits of the fileID indicate the directory to look into, and the entire fileID is then the filename in that directory.
- The pypi page for the iOSbackup Python package also has a list of commonly needed files and their fileID values.
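
The lookup above is easy to script. A minimal sketch using only the standard library; the function names are made up, and the table and column names are the ones from the query above.

```python
import sqlite3
from pathlib import Path


def backup_file_path(backup_dir, file_id: str) -> Path:
    """The first two characters of the fileID name the subdirectory."""
    return Path(backup_dir) / file_id[:2] / file_id


def find_files(manifest_db, pattern: str) -> list:
    """Return (fileID, relativePath) rows whose relativePath matches a LIKE pattern."""
    conn = sqlite3.connect(manifest_db)
    try:
        return conn.execute(
            "SELECT fileID, relativePath FROM Files WHERE relativePath LIKE ?",
            (pattern,),
        ).fetchall()
    finally:
        conn.close()
```

For example, `find_files(backup / "Manifest.db", "%Address%")` followed by `backup_file_path(backup, file_id)` gives the on-disk location of each match.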

Buzzy

Thu, 07 Sep 2023 00:00:00 GMT

Experimental AI bot to talk with my kids and answer questions.
The ecosystem around voice-based chats has improved a lot since I started this project. On hold for now, but I'll probably start this up again later in 2024 and just use more services instead of trying to do it all on CPU.
https://github.com/dimfeld/buzzy
Task List
- Up Next
- Soon
  - Basic intent detection
  - Run web searches and generate an answer from the results.
  - read system message from a file
  - record llm pipeline actions for later analysis
- Later
  - Check out whisper-turbo for in-browser voice recognition
  - optional configuration to use better models that require GPU
- Done
  - set up basic ChatGPT workflow
  - Voice recognition
  - Basic TTS
  - Websocket-based communication
  - Stream results back to client
System Prompt Example
- You are Buzzy, an AI bot that answers questions for children. Your answers should be appropriate for a smart six year old boy, but also don't dumb your answers down too much.
Ideas
- Do I want to do a RAG-based conversation learning/memory?
- Decide whether to send back previous chat messages based on how much time has passed.
- Voice recognition to ask questions
  - Choice:
    - Decision: nvidia/stt_en_fastconformer_transducer_large
    - Options:
      - Run Whisper in browser?
      - Whisper on server?
        
        Too slow when running on CPU
      - nvidia/stt_en_fastconformer_transducer_large
        
        runs fast (~500ms for short passages) and works well
  - Needs to run ok on just CPU
  - Seems to work best to just send the audio in one big chunk to the server.
  - Huggingface Voice Recognition Leaderboard
- TTS to say responses
  - How easy is it to use one of the new models for this?
  - Choice:
    - Decision: Mimic 3
    - Options:
      - Bark seems promising
        
        https://github.com/suno-ai/bark
        
        https://github.com/serp-ai/bark-with-voice-clone
        
        https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer
        
        Very slow on CPU, RTF of 6-8
      - XTTS
        
        https://github.com/coqui-ai/TTS/blob/dev/docs/source/models/xtts.md
        
        Sounds the best, but has RTF of 4-5 on CPU
      - Mimic 3
        
        This turns out to be the only solution that both sounds good and runs in realtime on CPU
        
        Running with lengthScale 1.2 slows it down a bit and seems to give best results
- Some kind of 3D avatar that's like a dinosaur or robot or something?
- Screensaver mode that does a photo carousel
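
For reference on the RTF numbers in the TTS comparison above: assuming RTF here means the real-time factor in its usual sense, it is the time taken to synthesize divided by the duration of the audio produced, so anything above 1 is slower than real time. A tiny sketch with hypothetical helper names:

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Time spent generating the audio divided by how long the audio plays."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def runs_in_realtime(synthesis_seconds: float, audio_seconds: float) -> bool:
    """True when the audio is produced at least as fast as it plays back."""
    return real_time_factor(synthesis_seconds, audio_seconds) <= 1.0
```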
Intent Detection
- DeBERTa-v3-base-mnli-fever-anli seems to work well for this at first try. Haven't really exercised it significantly yet though. The creator of that model now also has deberta-v3-large-zeroshot-v1.
- Tasks
  - Figure out if something is a question that can be answered by searching the web and/or wikipedia
  - How many days until...
  - Show me pictures of...
- When doing web search and intent detection, we also need to detect whether a query builds on the previous queries or not.
  - "No, the blue one" has no context but the context is probably in the previous message.
  - Small models don't seem to do great on this but gpt-3.5-turbo-instruct does well.
    - ```
    Assistant: {assistant's last message}
    
    User: {user's question}
    
    Does the user's question ask for clarification on the assistant's statement? Only answer yes or no.
```
- For this we can probably just pair up the last assistant message with the latest query, since they tend to include all the necessary info again in every message. Then use GPT to create a proper search for it.
- Do we even need to do the detection? Will it work to just ask GPT to make a search for the query?
  - Takes some tweaking of the prompt but this seems to work well.
  - ```
  This is an excerpt of a chat with an assistant and a user:
  
  Assistant: {assistant's last message}
  
  User: {user's question}
  
  What would be a good web search to answer the user's question? If the question is asking for clarification on the assistant's statement, then the web search should account for that. If it is a new line of questioning, then ignore the assistant's statement. Respond only with the web search and nothing else.
  
  Web search:
```
Web Search
- Use Brave Search API to do web searches to answer questions
  - Should searches be an intent, or should we run searches for anything that doesn't return another intent?
  - Maybe also wikipedia/wikidata?

Smelter

Tue, 27 Jun 2023 00:00:00 GMT

Huggingface Transformers

Sun, 14 May 2023 00:00:00 GMT

source: https://huggingface.co/learn/nlp-course/chapter0/1?fw=pt

Pipelines

from transformers import pipeline

classifier = pipeline("sentiment-analysis")

Full list of premade pipelines: https://huggingface.co/docs/transformers/main_classes/pipelines
Pipelines take a model argument. Some models also specify a particular pipeline to use, and in that case you can omit the pipeline name.

generator = pipeline("text-generation", model="distilgpt2")

Mask model for filling in words

from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

Pipeline Implementation

Each pipeline is just running the few steps needed to run a model. For example, a BERT sequence classification pipeline may do something like this:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

text_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

# Tokenize the input
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(text_inputs, padding=True, truncation=True, return_tensors="pt")

# Run the model
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)

# Softmax to convert from logits to probabilities
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
# predictions == tensor([[4.0195e-02, 9.5980e-01],
#        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward>)

model.config.id2label
# {0: 'NEGATIVE', 1: 'POSITIVE'}
# So the first input is positive, second input is negative.
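
The last two steps (softmax, then mapping the highest probability to a label) don't need torch to understand. A plain-Python sketch, with hypothetical function names and made-up logits in the usage note:

```python
import math


def softmax(logits: list) -> list:
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: list, id2label: dict) -> tuple:
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]
```

With `id2label = {0: "NEGATIVE", 1: "POSITIVE"}`, `classify([-1.5, 1.5], id2label)` picks `"POSITIVE"` because index 1 has the larger logit.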

Specific Model Types
- In addition to AutoModel the transformers library provides classes for specific model types, such as BertConfig and BertModel. In most inference-only cases AutoModel is fine though.
- When not training a model from scratch, you will usually preload a specific config.
- ```
model = BertModel.from_pretrained("bert-base-cased")
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
```
- from_pretrained will also download the weights and related files, if needed.
- When you have performed additional training, you can use model.save_pretrained(filename) to save the config and the weights to disk.
Tokenizers
- Tokenizers can both go from raw text to token IDs with tokenizer.tokenize and tokenizer.convert_tokens_to_ids, and back from token IDs to words again with tokenizer.decode, which will both convert IDs to tokens and combine subword tokens into full words.
- The padding token ID can be retrieved from tokenizer.pad_token_id.
- Attention masks can be used to tell the model to ignore certain tokens. Usually the mask is 0 at the positions of the padding tokens and 1 everywhere else.
- ```
ids = torch.tensor([...])
attention_mask = torch.tensor([...])

outputs = model(ids, attention_mask=attention_mask)
```
- Depending on the tensor framework in use, you can ask the tokenizer for different types of tensors.
- ```
inputs = tokenizer(texts, return_tensors="pt") # PyTorch
inputs = tokenizer(texts, return_tensors="tf") # TensorFlow
inputs = tokenizer(texts, return_tensors="np") # NumPy
```
- Tokenizers support all the standard configuration
  - truncation=True to truncate inputs longer than model context. max_length=16 to use a custom truncation length
  - padding=True to pad inputs to the same length
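
To show what padding, truncation, and the attention mask amount to, here is a plain-Python sketch of the bookkeeping. It is not the tokenizer's implementation; it assumes right-side padding, and the token IDs in the usage note are arbitrary.

```python
from typing import Optional


def pad_batch(sequences: list, pad_token_id: int, max_length: Optional[int] = None) -> dict:
    """Truncate, pad to the longest sequence, and build the matching attention mask."""
    if max_length is not None:
        sequences = [seq[:max_length] for seq in sequences]
    width = max(len(seq) for seq in sequences)

    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = width - len(seq)
        input_ids.append(list(seq) + [pad_token_id] * padding)
        # 1 for real tokens, 0 for padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * padding)
    return {"input_ids": input_ids, "attention_mask": attention_mask}
```

For example, `pad_batch([[101, 7592, 102], [101, 102]], pad_token_id=0)` pads the second sequence with one `0` and gives it the mask `[1, 1, 0]`.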

Training

To train a model, you tokenize your inputs and then add an additional labels property which is a tensor with the expected answer for each one.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequences = ["string1", "string2", "etc"]
batch = tokenizer(sequences, padding=True, truncation=True, return_tensors="pt")

# The answers for each of the above
batch["labels"] = torch.tensor([1, 0, 1])

# Single step, usually a whole training loop would go here.
# torch.optim.AdamW stands in for the deprecated AdamW class that transformers used to export.
optimizer = torch.optim.AdamW(model.parameters())
loss = model(**batch).loss
loss.backward()
optimizer.step()

Because bert-base-uncased is not originally set up for sequence classification, the library will discard the original model head and add a new randomly-weighted head for sequence classification.
Datasets loaded through the datasets library usually come already split into training, validation, and test sets.
dataset.features describes the feature names and types, including (when applicable) the descriptions of what each label number actually means

You can use dataset.map to tokenize while keeping all the data in the much more efficient Apache Arrow format. It also does multiprocessing and caches results.

e.g. for a BERT next-sentence prediction:

from transformers import DataCollatorWithPadding

def tokenize(row):
  return tokenizer(row["sentence1"], row["sentence2"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

# Using the collator to pad this way per batch is more efficient than padding everything to the max length across all items
collator = DataCollatorWithPadding(tokenizer=tokenizer)
batch_size = 512
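samples = tokenized["train"][:batch_size]
# Drop the raw string columns first, since the collator can only build tensors from numeric fields
samples = {k: v for k, v in samples.items() if k not in ("sentence1", "sentence2")}
batch = collator(samples)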
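
The efficiency claim about per-batch padding is easy to check with a bit of arithmetic. A sketch that counts padding tokens both ways; the function names are invented and the lengths in the usage note are made up.

```python
def padding_tokens_global(lengths: list) -> int:
    """Padding needed when every item is padded to the longest item overall."""
    longest = max(lengths)
    return sum(longest - n for n in lengths)


def padding_tokens_per_batch(lengths: list, batch_size: int) -> int:
    """Padding needed when each batch is padded only to its own longest item."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += sum(max(batch) - n for n in batch)
    return total
```

With token lengths `[5, 7, 6, 30, 28, 32]` and a batch size of 3, global padding adds 84 padding tokens while per-batch padding adds only 9.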

With that set up, you can start your training loop using the Trainer class, which handles all the batching, gradient descent, etc.

import numpy as np
import evaluate
from transformers import TrainingArguments, Trainer, AutoModelForSequenceClassification

# Can also pass `push_to_hub=True` to automatically push to Huggingface Hub when done.
# Newer transformers releases call the second argument `eval_strategy`.
training_args = TrainingArguments("directory-to-save-to", evaluation_strategy="epoch")

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# A function to report metrics at the end of each `evaluation_strategy` from the TrainingArguments
def compute_metrics(eval_preds):
    # `dataset_args` is a placeholder for the same arguments that loaded the dataset
    metric = evaluate.load(*dataset_args)
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    # This can be skipped if using a DataCollatorWithPadding since that's the default when omitted.
    data_collator=collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()

Full training loop example at https://huggingface.co/learn/nlp-course/chapter3/4?fw=pt
Once the model has finished training, the Trainer will let you run the model.

predictions = trainer.predict(tokenized["validation"])
# { predictions: [predicted logits for each row], label_ids: [correct answers], metrics }
