Hi, I am

Tanmay Tiwari

Aerospace Engineer / AI & Data Science


Available for collaboration · Kothri Kalan, IN · 2026

01 / Work

Selected Projects

ML / AI

🌲

Student Productivity Classifier

Python · Scikit-learn · Random Forest


AEROSPACE

🌡️

ISA Atmospheric Calculator

Python · NumPy · Matplotlib


LIVE

🎵

Vibe-Tunes

Streamlit · Pandas · UX


DATA VIZ

📊

Habit Tracker

Python · OOP · Matplotlib


TOOL

📝

Utility Scripts (To-Do)

Python · CLI


LIVE

💰

Expense Tracker

FastAPI · Streamlit · MySQL


02 / Blog

Project Case Studies

CASE STUDY 01 · MACHINE LEARNING · 2026

Student Productivity Classifier

A Random Forest classifier that predicts productivity tiers — Low, Medium, High — from four lifestyle features. My first complete end-to-end ML project.


What it does

This project is a supervised machine-learning classifier that takes four numerical lifestyle inputs (average sleep hours, study hours, screen time, and exercise frequency) and predicts the user's productivity tier: Low, Medium, or High. The underlying model is a Random Forest: an ensemble of decision trees, each trained on a bootstrap sample of the data, whose combined vote is more robust than any single tree's prediction.

The pipeline runs the full ML lifecycle locally: data ingestion, a stratified train-test split, model fitting, evaluation, and prediction on unseen inputs. There is no web interface yet; the project lives as a Jupyter-style notebook plus supporting Python modules.

Why I built it

I wanted a project that would force me to walk through every stage of a real machine-learning workflow, not just the modeling step. Most introductory ML tutorials hand you clean data and a fit-predict-score loop. I wanted the messier reality: deciding what counts as a feature, choosing how to stratify the train-test split, picking an algorithm appropriate to the data size, and reading what the trained model actually tells you about the underlying domain.

The topic — student productivity — was deliberate. As an engineering student preparing for both academic exams and defense services, I am genuinely curious about which daily habits actually correlate with productive output. Building the model was a way to ask that question with code instead of intuition.

How I built it

The implementation walks through several deliberate stages. First, I structured the input data with four features and one target label — three productivity classes encoded as integers. Next came exploratory analysis in Pandas: distribution checks, missing-value audits, and basic correlation inspection between each feature and the target.

For modeling, I chose Scikit-learn's RandomForestClassifier over a single decision tree because Random Forests handle small datasets with mixed feature scales gracefully, resist overfitting through bagging, and expose feature-importance scores that turn the model into an interpretability tool, not just a prediction engine. The data was split using stratified sampling so each productivity class was proportionally represented in both train and test sets — critical when class sizes are uneven.

Evaluation went beyond raw accuracy. I read the confusion matrix to see which classes the model confused with which, since misclassifying "High" as "Low" is a different kind of error from confusing two adjacent tiers. Finally, I extracted feature-importance rankings to surface which lifestyle inputs the trees were actually leaning on.
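A compact sketch of the split, fit, and evaluate steps above, using scikit-learn's train_test_split with stratify, RandomForestClassifier, and confusion_matrix. The data is synthetic and the labeling rule is invented purely so the example runs end to end; the feature names are assumptions standing in for the project's four inputs, not its actual dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

FEATURES = ["sleep_hours", "study_hours", "screen_time", "exercise_per_week"]

# Synthetic stand-in data: 300 students, four lifestyle features (hypothetical)
rng = np.random.default_rng(42)
n = 300
X = np.column_stack([
    rng.normal(7.0, 1.2, n),    # average sleep hours
    rng.normal(4.0, 1.5, n),    # study hours
    rng.normal(5.0, 2.0, n),    # screen time hours
    rng.integers(0, 7, n),      # exercise sessions per week
])
# Invented labeling rule, used only to make the demo learnable.
# Cutting at the 50th and 85th percentiles gives uneven classes
# (0 = Low, 1 = Medium, 2 = High), which is where stratification matters.
score = 0.5 * X[:, 1] + 0.3 * X[:, 0] - 0.4 * X[:, 2] + 0.2 * X[:, 3]
y = np.digitize(score, np.quantile(score, [0.50, 0.85]))

# Stratified split: each class keeps roughly the same share in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
print(classification_report(y_test, y_pred, labels=[0, 1, 2],
                            target_names=["Low", "Medium", "High"]))

# Impurity-based importances, normalized to sum to 1
for name, imp in sorted(zip(FEATURES, model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:>18}: {imp:.3f}")
```

Reading cm row by row shows whether mistakes stay between adjacent tiers (Low vs Medium) or jump across (Low vs High). Impurity-based importances can overstate features with many distinct values, so sklearn.inspection.permutation_importance is a common cross-check.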

What I learned

Three lessons shaped the rest of my ML work. Feature engineering matters more than algorithm choice at this scale: on a small dataset, thoughtful feature design and a clean train-test split did more for the model than hyperparameter tuning. Interpretability is not optional: a classifier you cannot explain is a classifier you cannot trust, and Random Forest's feature-importance output is one of the cleanest interpretability handles in scikit-learn. Evaluation is multidimensional: accuracy is a single number, while the confusion matrix and per-class precision/recall tell you what kinds of mistakes the model is actually making.

Real-world relevance

The specific use case — student productivity prediction — is modest. But the architecture generalizes to any problem where you want to predict a categorical outcome from a small set of numerical lifestyle, sensor, or operational inputs: fatigue prediction in pilots, fitness-readiness scoring for athletes, classification of engineering-system health states from sensor telemetry. The pattern of "ingest features → stratified split → Random Forest → interpret importance" is identical across all of these.

For me personally, the model became a baseline I will return to in aerospace contexts: any time I want to classify a discrete physical or behavioral state from a small feature set, Random Forest is now my default starting point.

Stack

Python Scikit-learn Pandas NumPy Matplotlib Jupyter

CASE STUDY 02 · AEROSPACE × COMPUTATION · 2026

ISA Atmospheric Calculator

A Python implementation of the International Standard Atmosphere from 0 to 86 km. Computes temperature, pressure, and density across every atmospheric layer, with Matplotlib visualizations validated against published reference tables.


What it does

The tool models the International Standard Atmosphere (ISA) — the reference atmosphere agreed upon by aviation, aerospace, and meteorological bodies as a baseline for engineering calculations. Given any altitude between sea level and 86 kilometers, the program returns three values: temperature, atmospheric pressure, and air density at that altitude.

The atmosphere is divided into seven layers, each with its own lapse rate (how temperature changes with altitude). The program detects which layer your input altitude falls in, applies the correct lapse-rate equation for that layer, and computes the corresponding thermodynamic state. It then plots the full vertical profile — temperature, pressure, and density vs. altitude — using Matplotlib.

Why I built it

I was studying aerospace fundamentals and kept hitting the same problem: every aerodynamics, propulsion, and flight-mechanics formula assumes you already know the atmospheric conditions at your flight altitude. The textbook ISA tables give you discrete values at fixed altitudes, but the real calculations want continuous values at any altitude in between. Looking up and interpolating tables by hand is slow and error-prone.

I wanted a tool that I owned, that I understood from first principles, and that I could trust because I had implemented every line of the underlying physics myself. Buying or installing a black-box atmospheric library would not have taught me anything. Writing one from scratch forced me to actually understand each layer transition, each governing equation, and why the ISA is constructed the way it is.

How I built it

The structure follows the physics directly. I encoded the seven atmospheric layers as a data structure with each layer's base altitude, base temperature, base pressure, and lapse rate. The core function then takes an input altitude, identifies which layer contains it, and applies one of two governing equations depending on whether that layer's lapse rate is zero (isothermal) or non-zero.

For layers with a non-zero lapse rate, temperature is computed linearly, and pressure follows the standard barometric formula derived from hydrostatic equilibrium combined with the ideal gas law. For isothermal layers, pressure decays exponentially. Density falls out of pressure and temperature via the ideal gas equation.
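Written out, the two cases look like this. Here h is geopotential altitude (the form in which the ISA layer table is defined), subscript b marks a layer's base values, L_b is the layer's lapse rate dT/dh, g_0 = 9.80665 m/s² is standard gravity, and R = 287.05287 J/(kg·K) is the specific gas constant of dry air. The first line is hydrostatic equilibrium with the ideal gas law substituted for density; integrating it with linear T gives the power law, and with constant T gives the exponential.

```latex
\begin{aligned}
\frac{dp}{dh} &= -\rho\,g_0 = -\frac{p\,g_0}{R\,T} \\
T(h) &= T_b + L_b\,(h - h_b) \\
p(h) &= p_b \left(\frac{T(h)}{T_b}\right)^{-g_0/(R\,L_b)} && (L_b \neq 0) \\
p(h) &= p_b \exp\!\left(-\frac{g_0\,(h - h_b)}{R\,T_b}\right) && (L_b = 0) \\
\rho(h) &= \frac{p(h)}{R\,T(h)}
\end{aligned}
```

As a check, the troposphere starts at T_b = 288.15 K and p_b = 101325 Pa with L_b = -0.0065 K/m; at h = 11,000 m the gradient form gives T = 216.65 K and p ≈ 22,632 Pa, the published tropopause values.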

I implemented the math in vectorized NumPy so the same code path computes single values or full altitude sweeps without modification. Visualization uses Matplotlib's standard plotting interface, with each thermodynamic quantity rendered on its own subplot for clarity.
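A condensed sketch of that layer table and vectorized evaluation, assuming the standard ISA constants (288.15 K and 101325 Pa at sea level, the seven published layer bases and lapse rates up to 84,852 m geopotential) and altitude in meters. The names LAYERS, TABLE, isa, and geometric_to_geopotential are illustrative, not the project's identifiers.

```python
import math

import numpy as np

G0 = 9.80665          # standard gravity, m/s^2
R = 287.05287         # specific gas constant of dry air, J/(kg*K)
R_EARTH = 6356766.0   # effective Earth radius for geopotential conversion, m
H_TOP = 84852.0       # geopotential top of the model (about 86 km geometric), m

# (base geopotential altitude in m, lapse rate dT/dh in K/m) for the seven layers
LAYERS = [
    (0.0, -0.0065),
    (11000.0, 0.0),
    (20000.0, 0.0010),
    (32000.0, 0.0028),
    (47000.0, 0.0),
    (51000.0, -0.0028),
    (71000.0, -0.0020),
]


def _build_table(t0=288.15, p0=101325.0):
    """Propagate T and p upward so each layer knows its base state."""
    rows, t_b, p_b = [], t0, p0
    for i, (h_b, lapse) in enumerate(LAYERS):
        rows.append((h_b, lapse, t_b, p_b))
        h_next = LAYERS[i + 1][0] if i + 1 < len(LAYERS) else H_TOP
        if lapse == 0.0:                      # isothermal: exponential decay
            p_b *= math.exp(-G0 * (h_next - h_b) / (R * t_b))
        else:                                 # gradient: linear T, power-law p
            t_next = t_b + lapse * (h_next - h_b)
            p_b *= (t_next / t_b) ** (-G0 / (R * lapse))
            t_b = t_next
    return np.array(rows)                     # columns: h_b, lapse, T_b, p_b


TABLE = _build_table()


def geometric_to_geopotential(z):
    """Convert geometric altitude z (m) to geopotential altitude h (m)."""
    z = np.asarray(z, dtype=float)
    return R_EARTH * z / (R_EARTH + z)


def isa(h):
    """Temperature (K), pressure (Pa), density (kg/m^3) at geopotential h (m).

    Every step is elementwise, so a scalar query and a full altitude
    sweep go through the same code path.
    """
    h = np.asarray(h, dtype=float)
    if np.any((h < 0.0) | (h > H_TOP)):
        raise ValueError("altitude must be between 0 and 84852 m geopotential")
    # Layer index: the highest layer base that is not above h
    idx = np.searchsorted(TABLE[:, 0], h, side="right") - 1
    h_b, lapse, t_b, p_b = (TABLE[idx, k] for k in range(4))
    dh = h - h_b
    t = t_b + lapse * dh
    iso = lapse == 0.0
    safe_lapse = np.where(iso, 1.0, lapse)    # avoid dividing by zero in the unused branch
    p_grad = p_b * (t / t_b) ** (-G0 / (R * safe_lapse))
    p_iso = p_b * np.exp(-G0 * dh / (R * t_b))
    p = np.where(iso, p_iso, p_grad)
    rho = p / (R * t)                         # ideal gas law
    return t, p, rho
```

Calling isa(np.linspace(0, 84852, 1000)) returns three 1,000-element arrays ready for plotting. Because layer boundaries are looked up with searchsorted rather than chains of if statements, an altitude sitting exactly on a boundary lands in the upper layer, where dh = 0 reproduces the base values, so both sides of each transition agree.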

Validation was the most important step. I compared my output against published ISA reference tables at standard altitudes — sea level, 11 km (tropopause), 20 km, 32 km, 47 km, 51 km, and 71 km — to confirm the model matched established values within rounding tolerance.

What I learned

This was my first project where the code was almost entirely a transcription of physics into Python, and that taught me how engineering software is supposed to be written. Boundary conditions are everything — most of the bugs I hit were not algorithmic, they were off-by-one errors at layer transitions where two governing equations meet. Validation against authoritative sources is non-negotiable — without checking against published ISA tables, I would have shipped a model with subtle errors and not known. NumPy vectorization is a force multiplier — writing the math once in a vectorized form let the same code handle a single-altitude query or a thousand-point profile sweep with no logic change.

Real-world relevance

The ISA model is the foundation underneath nearly every aerospace calculation that involves air. It feeds into: aircraft performance modeling (lift, drag, and engine thrust all depend on local air density), missile and rocket trajectory simulation, UAV altitude planning, weather-balloon and high-altitude payload design, and the calibration of flight-test instrumentation. Any pilot, aerospace engineer, or atmospheric scientist works on top of this model whether they realize it or not.

For me, this project is the foundation layer of a longer arc. I plan to extend it into flight-envelope visualization, propulsion performance estimation at altitude, and eventually a small computational tool for aircraft mission profiling.

Stack

Python NumPy Matplotlib Atmospheric Physics Engineering Mathematics

CASE STUDY 03 · RECOMMENDATION SYSTEMS · 2025

Vibe-Tunes

A mood-to-music recommendation engine. Maps emotional states onto audio feature dimensions like tempo, valence, and energy, then surfaces matching tracks through a Streamlit interface.


What it does

Vibe-Tunes is a small but complete recommendation system. The user selects an emotional state — for example, "calm," "energetic," or "focused" — and the application returns a curated list of music tracks that match that mood, drawing from a dataset annotated with audio-feature scores.

Under the hood, each emotional state is mapped to a target zone in a multi-dimensional feature space defined by audio attributes: tempo (BPM), valence (musical positivity), energy, danceability, and acousticness. The recommendation engine then filters the track dataset for entries whose feature vectors fall within that target zone.

Why I built it

This project came out of a basic frustration: existing music recommenders optimize for what you have listened to in the past, not what you actually want to feel right now. If I am about to sit down for a focused study session, my listening history from yesterday's gym workout is the wrong signal to recommend from. I wanted a system where the input is the state I want to be in, not the state I have been in.

From a learning standpoint, the project was also my entry into end-to-end application development: moving beyond standalone scripts into something with a real user interface and persistent state, delivered as a web app. It was my milestone project for the Codebasics Gen AI bootcamp.

How I built it

The architecture has three layers. The data layer is a Pandas DataFrame of tracks, each row carrying its audio-feature vector and metadata. The logic layer contains the mood-to-feature mapping — a dictionary that defines, for example, that "calm" corresponds to low tempo, high acousticness, and moderate valence. The mapping was hand-tuned based on documented audio-feature conventions used by mainstream music recommendation services. The presentation layer is a Streamlit app that exposes mood selection through a dropdown and renders the filtered recommendations as cards.

For the filtering itself, I used Pandas boolean indexing with tolerance windows around each target feature value — not exact matches, since real audio features sit on a continuous scale. The system also handles edge cases: empty result sets fall back to a "closest match" mode using a weighted distance calculation across feature dimensions.
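A minimal sketch of the tolerance-window filter with a weighted-distance fallback, assuming a DataFrame with tempo, valence, energy, and acousticness columns. The mood targets, tolerances, weights, and track rows are invented for illustration; they are not the project's hand-tuned mapping.

```python
import pandas as pd

# Hypothetical mood profiles: target value per audio feature
MOOD_TARGETS = {
    "calm":      {"tempo": 75,  "valence": 0.5, "energy": 0.25, "acousticness": 0.8},
    "energetic": {"tempo": 128, "valence": 0.7, "energy": 0.85, "acousticness": 0.1},
    "focused":   {"tempo": 100, "valence": 0.4, "energy": 0.45, "acousticness": 0.5},
}
TOLERANCE = {"tempo": 15, "valence": 0.2, "energy": 0.2, "acousticness": 0.25}
WEIGHTS = {"tempo": 1.0, "valence": 1.0, "energy": 1.5, "acousticness": 1.0}
SCALE = {"tempo": 60.0, "valence": 1.0, "energy": 1.0, "acousticness": 1.0}  # puts BPM on a ~0-1 scale


def recommend(tracks: pd.DataFrame, mood: str, k: int = 5):
    """Return (matching tracks, mode) where mode is 'window' or 'closest'."""
    target = MOOD_TARGETS[mood]
    # Boolean indexing: keep rows within the tolerance window on every feature
    mask = pd.Series(True, index=tracks.index)
    for feat, value in target.items():
        mask &= (tracks[feat] - value).abs() <= TOLERANCE[feat]
    hits = tracks[mask]
    if not hits.empty:
        return hits.head(k), "window"
    # Fallback: weighted Euclidean distance in the scaled feature space
    dist = sum(WEIGHTS[f] * ((tracks[f] - v) / SCALE[f]) ** 2 for f, v in target.items()) ** 0.5
    return tracks.assign(distance=dist).nsmallest(k, "distance"), "closest"


tracks = pd.DataFrame({
    "title": ["Rain Study", "Gym Anthem", "Lo-fi Loop", "Night Drive"],
    "tempo": [72, 130, 98, 110],
    "valence": [0.45, 0.75, 0.40, 0.30],
    "energy": [0.20, 0.90, 0.40, 0.70],
    "acousticness": [0.85, 0.05, 0.55, 0.20],
})
```

Scaling tempo before computing distance matters: raw BPM differences are tens of units while the other features live between 0 and 1, so without SCALE the fallback would rank almost purely by tempo.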

What I learned

This project taught me three things I now consider non-negotiable for any user-facing system. Domain modeling comes first — the recommendation quality depended almost entirely on how thoughtfully the mood-to-feature mapping was constructed, not on the cleverness of the filter logic. UI is a design decision, not a leftover — Streamlit removed almost all the technical friction of building an interface, but the user-flow choices (single dropdown vs. sliders, card layout vs. list) materially changed how the recommendations felt. Edge cases define product quality — handling the empty-result case gracefully is what separates a demo from a usable tool.

Real-world relevance

The pattern Vibe-Tunes implements — map a high-level user intent onto a feature space, filter or rank items within that space, expose results through a clean interface — is the same pattern that underlies content recommendation across nearly every domain: music, video, e-commerce, news. The same architecture extends to non-entertainment contexts as well: matching pilots to suitable training simulators based on skill-feature vectors, recommending engineering case studies to students based on their topic-interest profile, surfacing maintenance procedures based on detected fault signatures.

Stack

Python Streamlit Pandas NumPy Feature-Space Logic

CASE STUDY 04 · BEHAVIORAL ANALYTICS · 2026

Habit Tracker

A modular Python framework for tracking habits, computing streaks and completion rates, and visualizing behavioral trends. Built with strict separation between logging, analytics, and visualization layers.


What it does

The Habit Tracker is a Python framework that records daily check-ins for a configurable set of habits and then computes three categories of behavioral insight: current streak length, overall completion rate, and trend visualization over time. Streaks tell you about momentum. Completion rates tell you about consistency. Trend plots tell you about direction — whether your discipline is improving, flat, or decaying.

The tool runs locally as a Python module. There is no web interface; the focus was deliberately on the underlying computation and architecture rather than the surface UI.

Why I built it

I had two reasons, and they reinforced each other. The first was personal: as a defense aspirant maintaining a long-running discipline routine — physical training, study blocks, technical practice — I wanted a system that measured my consistency objectively, not by my own optimistic estimation.

The second reason was architectural. I had been writing single-file Python scripts for most of my prior projects, and I wanted to deliberately build something that separated concerns into independent modules. The habit-tracker problem is small enough to fit in one file but rich enough to benefit from real modular design — making it the right project to practice object-oriented structuring on.

How I built it

The architecture is three modules, each with one job. The logging module handles the data layer — adding a habit, recording a check-in for a given date, and persisting state. The analytics module takes the logged data and computes derived metrics: current streak, longest streak, completion rate over a configurable window, and rolling averages. The visualization module consumes the analytics output and renders Matplotlib plots — a calendar-heatmap-style streak view and a line plot showing completion-rate trend.

The design rule I enforced was that each module only knew about the layer directly below it. Visualization never reads raw logs — it only consumes analytics. Analytics never writes — it only reads logs and computes. This means I can swap any layer's implementation independently. If I later want to move the logging layer from a local file to a database, the analytics and visualization code does not change at all.
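A small sketch of the logging/analytics split and the streak and completion-rate metrics, assuming check-ins are stored as a set of datetime.date values per habit and kept in memory rather than in a file. The class and function names are hypothetical. One design assumption to note: current_streak returns 0 when today has no check-in yet, whereas some trackers let a streak survive until the end of the day.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class HabitLog:
    """Logging layer: the only code that writes check-ins (in memory here)."""
    checkins: dict = field(default_factory=dict)

    def add_habit(self, name: str) -> None:
        self.checkins.setdefault(name, set())

    def check_in(self, name: str, day: date) -> None:
        self.checkins[name].add(day)

    def days(self, name: str) -> frozenset:
        # Hand analytics an immutable copy so it cannot write back
        return frozenset(self.checkins[name])


# Analytics layer: pure functions that only read dates and compute metrics
def current_streak(days: frozenset, today: date) -> int:
    """Consecutive checked-in days ending at `today` (0 if today is missing)."""
    streak, d = 0, today
    while d in days:
        streak += 1
        d -= timedelta(days=1)
    return streak


def longest_streak(days: frozenset) -> int:
    best = 0
    for d in days:
        if d - timedelta(days=1) not in days:   # d is the first day of a run
            length = 1
            while d + timedelta(days=length) in days:
                length += 1
            best = max(best, length)
    return best


def completion_rate(days: frozenset, end: date, window: int = 7) -> float:
    """Fraction of the `window` days ending at `end` that have a check-in."""
    start = end - timedelta(days=window - 1)
    return sum(start <= d <= end for d in days) / window


# Usage: a visualization layer would call only the analytics functions
log = HabitLog()
log.add_habit("run")
today = date(2026, 1, 10)
for offset in (0, 1, 2, 4, 5, 6, 7):
    log.check_in("run", today - timedelta(days=offset))
run_days = log.days("run")
```

Swapping the in-memory dict for a file or database only changes HabitLog; the analytics functions take a set of dates and never see where it came from.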

What I learned

The technical lesson was about architecture before features. The first version I wrote was a single 200-line script that did everything. When I refactored it into three modules, the total line count grew slightly, but the cognitive load of each module shrank dramatically. Adding the trend-plotting feature later took an hour because I only had to touch the visualization module. If I had stayed in the single-file structure, the same feature would have required understanding the entire codebase to add safely.

The personal lesson was harder. Once I had honest data on my own consistency, I could not lie to myself about it. The first three weeks of data showed that I was substantially less consistent than I had assumed. The tool worked exactly as intended — and the discomfort that produced was its actual value.

Real-world relevance

The architecture generalizes to any time-series behavioral analytics problem where you record discrete events over time and want to derive consistency, momentum, and trend metrics. Beyond personal habit tracking, the same pattern applies to: flight-hour logging and currency tracking for pilots, training-program adherence monitoring for athletes, preventive-maintenance compliance tracking for fleet operators, and onboarding-progress dashboards in technical training programs.

The deeper point is that any organization that depends on consistent behavior — defense forces, flight schools, R&D teams — needs tooling that measures that consistency objectively. This project was my first attempt at building that kind of tooling.

Stack

Python Matplotlib Object-Oriented Design Modular Architecture

CASE STUDY 05 · PYTHON FUNDAMENTALS · 2025

Utility Scripts — Python CLI To-Do

A lightweight command-line task manager. Zero external dependencies, persistent file storage, clean state management. My first Python project where the goal was discipline, not features.


What it does

A small command-line task manager written in pure Python. Add tasks, mark them complete, delete them, list everything. Tasks persist to disk between runs through a plain-text file — close the terminal, reopen tomorrow, your list is still there.

No web UI, no database, no dependencies beyond the Python standard library. The entire project is a few well-organized modules.

Why I built it

I needed a project that would force me to think about state management rather than just syntax. Most Python tutorials at the beginner level teach loops, lists, and functions in isolation — they never make you persist data, handle edge cases, or design a clean interface between modules. A to-do list demands all three.

I also wanted something I'd actually use. Building a tool you personally rely on is the strongest motivator to keep iterating until it's good.

How I built it

The structure separates data handling from user interaction. One module owns reading and writing the persistent file. Another module owns parsing CLI arguments and printing output. Neither layer reaches into the other's responsibilities — the CLI never touches the file directly; the data module never prints anything.

Persistence is plain text, one task per line, with simple delimiters. Could have used JSON or a database, but for this scale plain text wins on simplicity and human-readability — you can open the file in any editor and see your tasks.
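A minimal standard-library sketch of that split, assuming a line format of a done flag (1 or 0), a | delimiter, then the task text; the actual delimiter, file name, and argument parsing in the project may differ. argparse subcommands stand in for the CLI module here.

```python
import argparse
from pathlib import Path

DELIM = "|"   # assumed line format: "<1 or 0>|<task text>"


# ---- data module: reads and writes the task file, never prints ----
def load_tasks(path):
    file = Path(path)
    if not file.exists():
        return []                                  # first run: empty list
    tasks = []
    for line in file.read_text(encoding="utf-8").splitlines():
        if line.strip():
            done, text = line.split(DELIM, 1)      # split once so text may contain "|"
            tasks.append({"done": done == "1", "text": text})
    return tasks


def save_tasks(path, tasks):
    lines = [("1" if t["done"] else "0") + DELIM + t["text"] for t in tasks]
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")


# ---- CLI module: parses arguments and prints, delegates persistence ----
def main(argv=None, path="tasks.txt"):
    parser = argparse.ArgumentParser(description="Minimal to-do list")
    sub = parser.add_subparsers(dest="cmd", required=True)
    sub.add_parser("add").add_argument("text")
    sub.add_parser("done").add_argument("number", type=int)
    sub.add_parser("delete").add_argument("number", type=int)
    sub.add_parser("list")
    args = parser.parse_args(argv)

    tasks = load_tasks(path)
    if args.cmd == "list":
        for i, task in enumerate(tasks, start=1):
            print(f"{i}. [{'x' if task['done'] else ' '}] {task['text']}")
        return
    if args.cmd in ("done", "delete") and not 1 <= args.number <= len(tasks):
        print(f"No task #{args.number}")           # edge case: bad task number
        return
    if args.cmd == "add":
        tasks.append({"done": False, "text": args.text})
    elif args.cmd == "done":
        tasks[args.number - 1]["done"] = True
    elif args.cmd == "delete":
        tasks.pop(args.number - 1)
    save_tasks(path, tasks)


# Saved as todo.py, the script would end with:
# if __name__ == "__main__":
#     main()
```

Typical use would be python todo.py add "Revise thermodynamics", then python todo.py list. The data functions never print and main never opens the file itself, which is the boundary described above.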

What I learned

Three things shaped every Python project I built afterward. Persistence design is a real decision: JSON vs SQLite vs plain text isn't just preference, since each trades simplicity against flexibility differently. Module boundaries matter even in tiny projects: the moment I split data from CLI, adding features like task priority became one-file changes instead of refactors. Zero-dependency code ages well: this project still runs on any Python install today, with no pip install and no version-compatibility worries.

Real-world relevance

The pattern (CLI parses input → data module persists state → display layer renders output) is a small version of the structure behind larger Python command-line tools such as pip, black, and pytest, which also keep argument parsing separate from their core logic. Building one yourself, even at toy scale, demystifies how those tools work internally.

Stack

Python CLI File I/O Zero Dependencies

CASE STUDY 06 · FULL-STACK · FASTAPI + STREAMLIT + MYSQL · 2026

Expense Tracker

My first full-stack app. A 5-day summer sprint that took me from "what is an API?" to shipping a real working tool with FastAPI, Streamlit, and MySQL.


The Spark

A few weeks into my Gen AI + Data Science bootcamp at Codebasics, I hit the project module. The brief was simple: build an end-to-end expense tracker. Backend, frontend, database, tests — the works.

I had two options. Follow the video tutorials line-by-line, or build the whole thing myself and only check the videos when truly stuck. I picked option two. Summer vacation, 10+ free hours a day, no excuses. This is what I learned.

What I Built

A web app where users can add, update, and delete daily expenses by date, view category-wise spending with a Plotly pie chart, see monthly trends with a bar chart, and get analytics for any custom date range.

Tech stack: Streamlit (frontend) → FastAPI (backend) → MySQL (database), with Pydantic for validation, pytest for tests, and Python's logging module writing to a server.log file.
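For the logging piece, a minimal sketch of a module-level logger that appends to server.log, assuming a hypothetical helper named get_logger; the project's actual logger setup and format string are not shown here.

```python
import logging


def get_logger(name, log_file="server.log", level=logging.DEBUG):
    """Return a logger that appends timestamped records to log_file."""
    logger = logging.getLogger(name)
    if not logger.handlers:                   # avoid duplicate handlers on re-import
        handler = logging.FileHandler(log_file, encoding="utf-8")
        handler.setFormatter(logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(level)
    return logger


# Hypothetical use inside a database helper:
#   logger = get_logger("db_helper")
#   logger.info("fetch_expenses_for_date called with %s", expense_date)
```

Passing arguments to logger.info separately, rather than building an f-string, defers formatting until a handler actually emits the record.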

The architecture is the standard three-layer setup:

Streamlit UI  ──HTTP──>  FastAPI  ──SQL──>  MySQL
  (port 8501)            (port 8000)        (port 3306)

Clean separation. Each layer can change without breaking the others.

The Layers, Bottom-Up

1. The Database Layer. The heart of db_helper1.py is a custom context manager:

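from contextlib import contextmanager
import mysql.connector

@contextmanager
def get_db_cursor(commit=False):
    connection = mysql.connector.connect(...)    # connection arguments elided
    cursor = connection.cursor(dictionary=True)  # rows come back as dicts
    try:
        yield cursor
        if commit:
            connection.commit()                  # persist writes only on success
    except Exception:
        connection.rollback()                    # discard partial writes on error
        raise
    finally:
        cursor.close()                           # runs even if the with block raised
        connection.close()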

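That yield is the magic. Code before it runs as setup and code after it runs as cleanup. For the cleanup to run even when the with block crashes, the yield has to sit inside a try/finally; otherwise the exception surfaces at the yield and the close() calls after it are skipped. Every CRUD function now becomes a 3-liner instead of 10 lines of boilerplate.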

Two non-negotiable rules I learned the hard way: always use %s placeholders, never f-strings, for SQL parameters — f-strings open the door to SQL injection. And commit=True for any write; forget it and your INSERTs vanish silently when the connection closes. Spent an hour on this bug before it clicked.
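Both rules can be seen in a runnable analogue that swaps MySQL for Python's built-in sqlite3, so it needs no server. Two differences to keep in mind: sqlite3 uses ? placeholders where mysql-connector uses %s, and sqlite3.Row stands in for dictionary=True. The schema, file name, and the commit parameter on insert_expense are assumptions made for the demo, modeled on the names mentioned in this case study.

```python
import sqlite3
from contextlib import contextmanager

DB_PATH = "expenses_demo.db"   # hypothetical local file standing in for the MySQL server


@contextmanager
def get_db_cursor(commit=False, db_path=DB_PATH):
    connection = sqlite3.connect(db_path)
    connection.row_factory = sqlite3.Row       # dict-like rows, like dictionary=True
    cursor = connection.cursor()
    try:
        yield cursor
        if commit:
            connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        cursor.close()
        connection.close()                     # uncommitted changes are discarded here


def create_table(db_path=DB_PATH):
    with get_db_cursor(commit=True, db_path=db_path) as cursor:
        cursor.execute(
            "CREATE TABLE IF NOT EXISTS expenses ("
            "id INTEGER PRIMARY KEY, expense_date TEXT, amount REAL, "
            "category TEXT, notes TEXT)"
        )


def insert_expense(expense_date, amount, category, notes, commit=True, db_path=DB_PATH):
    with get_db_cursor(commit=commit, db_path=db_path) as cursor:
        # Values travel separately from the SQL text, so a note such as
        # "x'); DROP TABLE expenses; --" is stored as data, never executed.
        cursor.execute(
            "INSERT INTO expenses (expense_date, amount, category, notes) "
            "VALUES (?, ?, ?, ?)",
            (expense_date, amount, category, notes),
        )


def fetch_expenses_for_date(expense_date, db_path=DB_PATH):
    with get_db_cursor(db_path=db_path) as cursor:
        cursor.execute("SELECT * FROM expenses WHERE expense_date = ?", (expense_date,))
        return [dict(row) for row in cursor.fetchall()]
```

Calling insert_expense(..., commit=False) reproduces the silent-vanishing bug: sqlite3's close() does not commit, so that row never reaches the file, while the parameterized insert stores the injection-style string verbatim.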

2. The API Layer. FastAPI made this surprisingly painless. Four endpoints, each maybe 5 lines:

@app.get("/expenses/{expense_date}", response_model=List[Expense])
def get_expenses(expense_date: date):
    return db_helper1.fetch_expenses_for_date(expense_date)

The response_model=List[Expense] line does three things at once: validates the output, strips fields not in the model, and auto-documents the response shape in /docs. That last part is huge — FastAPI gives you a full Swagger UI for free at http://127.0.0.1:8000/docs. I used it more than Postman.

One trap that cost me an afternoon: I included expense_date in my Expense Pydantic model. The frontend didn't send the date in the body (it's in the URL), so every POST returned a 422 validation error. Lesson: input models and output models often need to differ.
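One way to express that split is a sketch with Pydantic. The model names ExpenseIn and ExpenseOut, and the notes field, are hypothetical; only amount and category echo inputs mentioned in this write-up.

```python
from datetime import date

from pydantic import BaseModel, ValidationError


class ExpenseIn(BaseModel):
    """Request body item: no expense_date, since the date is in the URL path."""
    amount: float
    category: str
    notes: str = ""


class ExpenseOut(ExpenseIn):
    """Response item: the stored row also carries its date."""
    expense_date: date


# Hypothetical wiring in the FastAPI app:
#   @app.post("/expenses/{expense_date}")
#   def add_or_update_expense(expense_date: date, expenses: List[ExpenseIn]): ...
#   @app.get("/expenses/{expense_date}", response_model=List[ExpenseOut])

# A body item without a date validates against the input model...
body_item = ExpenseIn(amount=250.0, category="Food")
# ...while the same payload fails a model that requires the date,
# which is the mismatch that surfaces as a 422 in FastAPI.
try:
    ExpenseOut(amount=250.0, category="Food")
    missing_date_rejected = False
except ValidationError:
    missing_date_rejected = True
```

Inheriting ExpenseOut from ExpenseIn keeps the shared fields in one place, so the two models can only drift on the fields that are genuinely different.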

3. The Frontend. Streamlit is wild. Pure Python, no HTML, and you get a working UI. The whole add/update form is a single with st.form(): block wrapping a loop that builds 5 input rows. The bug that taught me about Streamlit's quirks:

key=f"amount_{i}"               # ❌ widgets froze on old data
key=f"amount_{selected_date}_{i}"  # ✅ fresh widgets per date

Newer Streamlit versions cache widget state by key. When I switched dates, the widgets kept showing the previous date's values. Including selected_date in the key makes each date's form independent. My instructor's tutorial code (recorded on an older Streamlit) didn't need this — a real-world reminder that libraries drift, and you'll always be patching small things.

4. Tests. pytest with a conftest.py that adds the project root to sys.path:

import os, sys
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

This single line let me write from backend import db_helper1 in test files without fuss. Pytest auto-discovers conftest.py — no imports needed.

The Bugs That Taught Me the Most

Bug	Real Lesson
TypeError: object of type 'NoneType' has no len()	Functions that print but don't return are useless to callers.
422 Unprocessable Entity	Pydantic models for input vs output should often be different.
Widget state stuck on old date	Library versions matter. Tutorials get outdated.
JSONDecodeError from frontend	I was calling Streamlit's port (8501) instead of FastAPI's (8000).
ModuleNotFoundError: No module named 'mysql'	Two Pythons on the machine, library installed in the wrong one. Use a venv.

Each one stung in the moment. Each one is a permanent skill upgrade.

What "Done" Looked Like

After 5 days:
All four endpoints working, with tests passing in pytest.
A full UI with three tabs: Add/Update, Category Analytics, Monthly Analytics.
Logging to server.log on every operation.
Credentials in .env, a .env.example template for others, and .gitignore keeping secrets out of the repo.
A README with screenshots, tech stack, setup steps, and an API endpoint table.
Everything pushed to GitHub with a clean commit history.

The project is at the point where someone can clone it, run pip install -r requirements.txt, import the SQL dump, and have it working in under five minutes. That portability is the real deliverable.

What I'd Do Differently

Authentication. Right now every user shares one database. Adding JWT auth + a user_id column on expenses is the obvious next step.
Edit individual rows. My current POST endpoint deletes all expenses for a date and re-inserts them. It works, but a proper UPDATE would be cleaner.
Deploy it. Streamlit Cloud for the frontend, Render or Railway for the backend. Want a real shareable link.

The Bigger Takeaway

The reason building from scratch matters isn't just "you learn more." It's that you build the muscle of being stuck and getting unstuck. Tutorials let you skip that. Real projects don't.

By the end, I wasn't just copying patterns — I could read an error trace, hypothesize where it came from, fix it, and explain why. That's the gap between "watched a course" and "can ship."

If you're early in your bootcamp and tempted to follow along passively: don't. Pick a project, close the video tab, and figure it out. Come back to the videos when you genuinely need them. It's slower in the short term and dramatically faster in the long term.