Building at the intersection of ML systems, embedded firmware, and distributed infrastructure. Currently researching kernel optimization for neural network workloads.
Machine Learning • Operating Systems • Distributed Systems • Computer Networks • Data Structures & Algorithms • Parallel Computing • Embedded Control • Computer Architecture
Programmer (Unity C#), Developer & Audio Engineer. Built game prototypes in collaborative team environments using industry-standard tools.
A simplified electronic trading engine written in pure C for Linux. It accepts buy and sell limit orders from stdin or a file, maintains an in-memory order book, and matches orders using price-time priority.
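As a rough illustration of price-time priority (not the project's C code), the Python sketch below keeps each side of the book in a heap keyed by price and then arrival sequence. The `OrderBook` class, its `submit` method, and the use of integer tick prices are all assumptions made for this example.

```python
import heapq
from itertools import count


class OrderBook:
    """Hypothetical limit order book using price-time priority."""

    def __init__(self):
        self._seq = count()  # arrival counter: breaks ties at the same price
        self.bids = []       # entries: (-price, seq, qty, order_id), best bid first
        self.asks = []       # entries: (price, seq, qty, order_id), best ask first

    def submit(self, side, price, qty, order_id):
        """Match an incoming limit order; return (resting_id, incoming_id, price, qty) trades."""
        is_buy = side == "buy"
        own, opposite = (self.bids, self.asks) if is_buy else (self.asks, self.bids)
        trades = []
        while qty > 0 and opposite:
            key, seq, rest_qty, rest_id = opposite[0]
            rest_price = key if is_buy else -key
            crosses = price >= rest_price if is_buy else price <= rest_price
            if not crosses:
                break
            fill = min(qty, rest_qty)
            trades.append((rest_id, order_id, rest_price, fill))  # trade at resting price
            qty -= fill
            if fill == rest_qty:
                heapq.heappop(opposite)
            else:
                # Partial fill: the resting order keeps its place in the queue.
                heapq.heapreplace(opposite, (key, seq, rest_qty - fill, rest_id))
        if qty > 0:
            # Unfilled remainder rests on the book.
            key = -price if is_buy else price
            heapq.heappush(own, (key, next(self._seq), qty, order_id))
        return trades
```

With two asks resting at the same price, an incoming buy fills the earlier one first, and any remainder of the later one stays at the front of its price level.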
Interactive tool modeling Transformer inference scaling. Adjust layers, sequence length, precision, and hardware to visualize latency, attention cost, and KV cache behavior in real time.
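The kind of estimate such a tool might plot can be sketched with two standard back-of-the-envelope formulas: KV cache size and the FLOPs of the attention score and value products. The function names and default values below are assumptions for illustration, not taken from the tool itself.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values for every layer and token.

    The leading 2 accounts for storing both K and V; bytes_per_elem=2
    assumes fp16/bf16 precision.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


def attention_matmul_flops(layers, seq_len, d_model):
    """Approximate FLOPs for QK^T and attention-times-V over a full sequence.

    Each product costs about 2 * seq_len^2 * d_model per layer (no causal
    masking savings assumed), so the cost grows quadratically with seq_len.
    """
    return 4 * layers * seq_len ** 2 * d_model


if __name__ == "__main__":
    size = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=4096)
    print(f"KV cache: {size / 2**30:.1f} GiB")
```

Doubling the sequence length doubles the KV cache but quadruples the attention matmul cost, which is the scaling behavior the visualization is meant to expose.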
Explore tokenization, attention patterns, and internal representations of large language models with real-time visual feedback.
Built an AI agent for RTL verification that reads Verilog, generates testbenches, runs simulations, parses results, and reports pass/fail automatically.
Benchmarked matmul and convolution kernels across CPU/GPU. Achieved 35% training speedup via memory access optimization and tensor-level tuning.
Embedded C firmware with FSM control logic, timers, interrupts, GPIO, and event-driven state management for real-time sensor-driven behavior.
Python toolkit to parse, validate, and analyze structured system logs. Modular architecture for reusable validation logic across datasets.
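One way to keep validation logic reusable is to write each rule as a small function over a parsed record and compose them per dataset. The sketch below assumes a hypothetical `timestamp LEVEL component: message` line format; the regex, rule names, and level set are illustrative, not the toolkit's actual schema.

```python
import re
from datetime import datetime

# Assumed line format: "<ISO timestamp> <LEVEL> <component>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<component>\S+): (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match the format."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def valid_timestamp(record):
    """Rule: the timestamp must be ISO 8601 as accepted by datetime.fromisoformat."""
    try:
        datetime.fromisoformat(record["ts"])
        return True
    except ValueError:
        return False


def known_level(record):
    """Rule: the level must be one of an assumed fixed set."""
    return record["level"] in {"DEBUG", "INFO", "WARN", "ERROR"}


def validate(record, rules):
    """Apply each rule and return the names of the rules that failed."""
    return [rule.__name__ for rule in rules if not rule(record)]
```

A different dataset can reuse `validate` unchanged and simply pass a different list of rules.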
Raft consensus-based KV store with linearizable reads/writes, leader election, log replication, and snapshotting for fault tolerance.
Low-level packet capture tool that decodes Ethernet, IP, and TCP/UDP headers, with real-time traffic visualization and filtering by protocol, port, and address.
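The header decoding step can be illustrated with Python's `struct` module, following the standard Ethernet II, IPv4, and TCP/UDP layouts. This is a minimal sketch that assumes untagged Ethernet frames and IPv4 only; the function names and returned dictionaries are made up for the example.

```python
import socket
import struct


def parse_ethernet(frame):
    """Decode the 14-byte Ethernet II header (no VLAN tag assumed)."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": dst.hex(":"),
        "src": src.hex(":"),
        "ethertype": ethertype,  # 0x0800 means IPv4
        "payload": frame[14:],
    }


def parse_ipv4(packet):
    """Decode the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _checksum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    return {
        "version": ver_ihl >> 4,
        "header_len": header_len,
        "ttl": ttl,
        "protocol": proto,  # 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "payload": packet[header_len:total_len],
    }


def parse_ports(segment):
    """TCP and UDP both start with 16-bit source and destination ports."""
    return struct.unpack("!HH", segment[:4])
```

Filtering by protocol, port, or address then reduces to comparing these decoded fields against the user's criteria.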
User-space allocator with first-fit, best-fit, and buddy strategies. Benchmarked fragmentation and throughput against glibc malloc.
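The difference between the placement strategies can be shown on a toy free list of `(offset, size)` blocks. This is a simplified Python model, not the allocator's implementation: it ignores headers, alignment, and coalescing, and all names are hypothetical.

```python
def first_fit(free_blocks, size):
    """Index of the first free block large enough, or None."""
    for i, (_, block_size) in enumerate(free_blocks):
        if block_size >= size:
            return i
    return None


def best_fit(free_blocks, size):
    """Index of the smallest free block large enough, or None."""
    best = None
    for i, (_, block_size) in enumerate(free_blocks):
        if block_size >= size and (best is None or block_size < free_blocks[best][1]):
            best = i
    return best


def allocate(free_blocks, size, choose):
    """Carve `size` bytes from the block picked by `choose`; return its offset."""
    i = choose(free_blocks, size)
    if i is None:
        return None
    offset, block_size = free_blocks[i]
    if block_size == size:
        free_blocks.pop(i)  # exact fit: the block is fully consumed
    else:
        free_blocks[i] = (offset + size, block_size - size)  # keep the remainder
    return offset


def buddy_block_size(size):
    """Buddy allocators round each request up to the next power of two."""
    return 1 << (size - 1).bit_length()
```

For a 40-byte request against blocks of 100, 40, and 60 bytes, first-fit splits the 100-byte block while best-fit consumes the 40-byte block exactly, which is the kind of fragmentation difference a benchmark would measure.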