Projects

Check out what I've been working on.

Passive Logging for Computer-Use Modeling

Built a non-invasive app that passively collects behavioral data for computer-use models. Explores what counts as ground truth in computer-use data and how models reason about user intent.

See demo

Automated Prompt Optimization

Built an extension to promptimizer that optimizes prompts for AI apps across different task types. Achieves ~200% improvement in accuracy on binary classification and open-ended text generation tasks. Includes extensive work on prompt engineering techniques.
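The basic loop behind automated prompt optimization can be sketched as greedy hill-climbing over prompt variants scored on a labeled dev set. This is a minimal illustration under stated assumptions, not promptimizer's API or this project's actual method: `propose` is a hypothetical stand-in for an LLM that rewrites prompts, `model` is a hypothetical stand-in for the app being optimized, and the toy functions exist only to make the sketch runnable.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected label)
Model = Callable[[str, str], str]  # (prompt, input) -> predicted label


def accuracy(prompt: str, model: Model, dataset: Sequence[Example]) -> float:
    """Fraction of dev examples the model labels correctly under `prompt`."""
    correct = sum(model(prompt, x) == y for x, y in dataset)
    return correct / len(dataset)


def optimize_prompt(
    seed: str,
    propose: Callable[[str], List[str]],
    model: Model,
    dataset: Sequence[Example],
    rounds: int = 3,
) -> Tuple[str, float]:
    """Greedy hill-climbing: keep a candidate only if it beats the current best."""
    best, best_score = seed, accuracy(seed, model, dataset)
    for _ in range(rounds):
        for candidate in propose(best):
            score = accuracy(candidate, model, dataset)
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score


# Hypothetical stand-ins: a "model" that only answers well when told the label set,
# and a "proposer" that appends one fixed instruction to the prompt.
def toy_model(prompt: str, text: str) -> str:
    if "Answer positive or negative." in prompt:
        return "positive" if "good" in text else "negative"
    return "unsure"


def toy_propose(prompt: str) -> List[str]:
    return [prompt + " Answer positive or negative."]


dev = [("good movie", "positive"), ("bad movie", "negative")]
print(optimize_prompt("Classify the review.", toy_propose, toy_model, dev))
```

Accuracy fits the binary classification case; for open-ended text generation, `accuracy` would need to be swapped for some other scoring function, such as a rubric or an LLM judge.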

LLM Evaluation Uncertainty

Conformal prediction for LLM-as-judge systems to quantify evaluation uncertainty. Includes LLM-as-judge evaluation using Claude Sonnet 4, several conformal prediction strategies, and interactive visualizations of the resulting uncertainty estimates.
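One common strategy, split conformal prediction, can be sketched as follows, assuming the judge emits a numeric score (here an assumed 1 to 5 rating) and a held-out calibration set has human labels. The absolute-residual nonconformity score, the function names, and the toy data are illustrative choices, not necessarily the strategies used in the project. If calibration and test items are exchangeable, the resulting interval contains the human label with probability at least 1 - alpha (marginally, over items).

```python
import math
from typing import Optional, Sequence, Tuple


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Return the finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # Split conformal uses the k-th smallest score with k = ceil((n + 1)(1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval must be unbounded.
        return math.inf
    return sorted(scores)[k - 1]


def calibrate(judge: Sequence[float], human: Sequence[float], alpha: float = 0.1) -> float:
    """Nonconformity score = absolute gap between judge score and human label."""
    residuals = [abs(h - j) for j, h in zip(judge, human)]
    return conformal_quantile(residuals, alpha)


def predict_interval(
    judge_score: float,
    qhat: float,
    lo: Optional[float] = None,
    hi: Optional[float] = None,
) -> Tuple[float, float]:
    """Interval around a new judge score, optionally clipped to the rating scale."""
    low, high = judge_score - qhat, judge_score + qhat
    if lo is not None:
        low = max(low, lo)
    if hi is not None:
        high = min(high, hi)
    return low, high


# Toy calibration set on an assumed 1-5 scale (illustrative numbers only).
judge = [3, 4, 2, 5, 1, 3, 4, 2, 5, 3]
human = [3, 5, 2, 4, 1, 2, 4, 3, 5, 3]
qhat = calibrate(judge, human, alpha=0.1)
print(qhat, predict_interval(5, qhat, lo=1, hi=5))  # prints: 1 (4, 5)
```

Lowering alpha widens the interval, and with only ten calibration items any alpha below roughly 0.09 already makes it unbounded, which is one reason calibration set size matters in practice.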

Lightweight eBPF Runtime for SmallSat Operations

Research on lightweight eBPF-based runtime systems for multi-tenant satellite operations. Addresses the challenges of managing hundreds of commodity SmallSats with heterogeneous hardware and intermittent communication links, focusing on isolation and security in resource-constrained space environments.