work
things i've built
these are projects i built to answer questions i actually had. each one is live. you can use them right now.
financial earnings research agent
ask natural language questions about real company earnings. and get answers cited back to the exact sentence in the actual SEC filing.
what it does
type something like "what were Apple's red flags in their latest 10-Q?" and the agent fetches the real filing from SEC EDGAR, chunks and indexes it into a vector database, and answers with citations. follow-up questions work. it remembers the company from the session, so you don't have to repeat yourself.
the interesting parts
i built the agentic loop from scratch. no LangChain. it's a while True loop: the model runs, decides what to call, you execute the tool, feed the result back, repeat. if it returns stop, you're done. once you see it that clearly, the magic becomes mechanics.
the hardest part wasn't the code. it was the system prompt. without strict prompting, the model would confidently answer from training data and completely bypass the retrieval pipeline. i had to engineer it to cite every claim by chunk ID and refuse outside knowledge explicitly. took a lot of iteration to make it refuse cleanly instead of hedging.
what i learned
building without LangChain forced me to understand what an agent actually is. it's just a loop where the LLM decides what to call next. and prompt engineering isn't decoration. it's load-bearing. the difference between a reliable system and a confident but wrong one is almost entirely in the prompt.
nyc yellow taxi analytics pipeline
2.96 million taxi trips, cleaned properly, loaded into postgres, and queryable in plain english.
what it does
a modular ETL pipeline that ingests january 2024 NYC taxi data, audits it for 5 distinct defect classes, removes 8.17% of bad rows with documented rationale, and loads clean data into PostgreSQL. the dashboard shows revenue KPIs, pickup zone performance, and payment breakdowns. there's also a text-to-SQL interface where you can ask questions in plain english.
the interesting parts
the data quality audit was where most of the real thinking happened. bad data isn't random. it comes in patterns. negative fares, trips with dropoff before pickup, passenger counts above the legal NYC limit of 6. finding each pattern and deciding whether to remove or investigate it is real data engineering, not just df.dropna().
the text-to-SQL interface uses a JSON output contract: ok / clarify / refuse / error. it sounds like over-engineering. it's actually the only way to make it reliable. the model gets the schema, domain knowledge, and explicit rules for what to return in each case. vague prompts return confident wrong SQL. schema-aware prompts with output contracts return SQL you can actually run.
what i learned
documenting why you removed data matters as much as removing it. and the text-to-SQL project taught me that LLM reliability is almost entirely a prompting problem. the model can write correct SQL. you just have to tell it exactly what correct means.
superstore sales dashboard
started as "let me practice SQL." ended with a finding that a business's entire margin problem was caused by broken discount policy.
what it does
an interactive BI dashboard on the kaggle superstore dataset. sales trends, regional and category performance, discount analysis, customer profitability, and a 12-month prophet forecast. everything queryable through filters, all built on SQLite with real SQL: CTEs, window functions, basket analysis.
the interesting parts
the margin calculation bug is the thing i always tell people about. my first result showed Binders at -20% margin. sounds alarming. switching from AVG(profit/sales) to SUM(profit)/SUM(sales) showed +14.86%. same data, different formula, completely opposite conclusion. that's when i understood why SQL fluency actually matters.
the discount analysis was the most revealing part. the business makes 29.5% margin on undiscounted sales. the overall 12.47% is almost entirely explained by discounting above the break-even threshold of ~25%. the most surprising finding: volume didn't increase with heavier discounts at all. 3.74 units per order at 20% discount vs 3.96 at 80%. the business was giving away margin for nothing.
what i learned
the wrong aggregation gives you the wrong business story. and sometimes the most interesting finding isn't what you set out to find. i started this to practice SQL and ended up discovering a broken pricing policy. the data had the answer, i just had to ask the right question.
sleep health & lifestyle dashboard
what actually predicts whether someone has a sleep disorder. and can you build an interactive predictor around it?
what it does
an exploratory ML dashboard on 374 people's sleep and lifestyle data. filter by occupation or gender and watch the charts update live. or go to the predict page, enter your own numbers, and see what the model says about your sleep disorder risk. with real-time probability bars.
the interesting parts
R² = 0.91 on a single train/test split. looked incredible. then i ran 5-fold cross-validation and got 0.60 with high variance. the first result was a lucky split. that's a lesson you remember. always use cross-validation before celebrating.
58% of the Sleep_Disorder column was NaN. looks alarming. but after checking the dataset description, NaN meant "no diagnosis". not missing data. a small distinction with a big impact on the model. the stress correlation at r = -0.90 was stronger than i expected too. it basically drowns out everything else as a predictor.
what i learned
never trust a single train/test split. the cross-validation humbled me. also: always check what missing data actually means before you fill or drop it. and engineers being the least stressed occupation in the dataset was a genuine surprise. i assumed the opposite. always check the data before writing conclusions.