Resume Helper: Technical Documentation

December 17, 2025
Python · Architecture · LLM · Privacy

Abstract

This document describes Resume Helper, an AI-powered resume building application I built while learning about software architecture and LLM integration. The project explores how to structure a Python application using layered architecture, support multiple AI providers without vendor lock-in, and protect user privacy when working with cloud AI services.

The goal of this document is to share the thinking process behind the design decisions: what problems I encountered, what options I considered, and why I chose specific approaches. The complete source code is available on GitHub for those who want to dive into the implementation details.

Keywords: Layered Architecture, LLM Integration, Privacy, Python, Gradio, Resume Builder

App Demo: https://www.youtube.com/watch?v=SQgfXfSYLac

GitHub Repository: https://github.com/gibbenergy/Resume_Helper


Table of Contents

  1. Introduction
  2. Architecture Overview
  3. Why Layered Architecture?
  4. Two-Step AI Workflow
  5. Privacy Design and Cost Savings
  6. Supporting Multiple AI Providers
  7. Why Not LangChain?
  8. Technology Stack Decisions
  9. Application Tracker System
  10. Cost Tracking
  11. Lessons Learned
  12. References

1. Introduction

1.1 The Problem I Wanted to Solve

When I was job hunting, I noticed a few frustrations:

  1. Tailoring resumes is tedious: Every job posting is different, and customizing my resume for each one took hours.

  2. Privacy concerns with AI tools: I wanted to use ChatGPT or Claude to help optimize my resume, but that meant sending my full name, phone number, email, and home address to cloud servers. That felt uncomfortable.

  3. Vendor lock-in: Most AI tools I found only worked with one provider (usually OpenAI). What if I wanted to try a cheaper option? Or run everything locally?

  4. Tracking applications is messy: I was using spreadsheets to track where I applied, interview stages, and follow-ups. It was getting hard to manage.

1.2 What I Built

Resume Helper tries to address these issues:

  • Privacy-first approach: Personal information is stripped out before sending anything to AI, then restored only for the final document
  • Multiple AI providers: Works with OpenAI, Anthropic, Google, Groq, Perplexity, xAI, and local models via Ollama
  • Built-in application tracking: A simple system to track job applications, interview stages, and related documents
  • Cost visibility: Shows how much each AI operation costs so there are no surprises

1.3 Document Purpose

This document focuses on the why behind decisions rather than the how of implementation. For code details, please refer to the GitHub repository. Here I want to explain:

  • Why I organized the code the way I did
  • What trade-offs I considered
  • What I learned along the way

2. Architecture Overview

2.1 High-Level System View

The application has several main components that work together:

flowchart TB
    Browser["🌐 Browser"]

    Tabs["🖥️ UI: 10 Tabs"]
    RW["⚙️ Resume Workflows"]
    AW["📋 App Tracking"]
    Schema["📦 Data Schemas"]

    LLM["🤖 AI Provider"]
    Privacy["🔒 Privacy"]
    Cost["💰 Cost Tracker"]
    DB["💾 Database"]
    PDF["📄 PDF Gen"]

    SQLite[("🗄️ SQLite")]
    Local["🦙 Ollama"]
    Cloud["☁️ Cloud AI"]

    Browser <--> Tabs
    Tabs <--> RW
    Tabs <--> AW
    RW <--> Schema
    AW <--> Schema
    RW <--> LLM
    RW <--> Privacy
    LLM <--> Cost
    AW <--> DB
    DB <--> SQLite
    LLM --> Local
    LLM --> Cloud

    style Tabs fill:#1a5276,stroke:#64ffda,color:#fff
    style RW fill:#1e8449,stroke:#64ffda,color:#fff
    style AW fill:#1e8449,stroke:#64ffda,color:#fff
    style Schema fill:#b9770e,stroke:#64ffda,color:#fff
    style LLM fill:#7d3c98,stroke:#64ffda,color:#fff
    style Privacy fill:#7d3c98,stroke:#64ffda,color:#fff
    style Cost fill:#7d3c98,stroke:#64ffda,color:#fff
    style DB fill:#7d3c98,stroke:#64ffda,color:#fff
    style PDF fill:#7d3c98,stroke:#64ffda,color:#fff

2.2 What Each Part Does

Component             | Responsibility
----------------------|------------------------------------------------------------------
User Interface        | Display forms, tables, buttons. Collect user input. Show results.
Business Logic        | Coordinate operations. Decide what happens when the user clicks a button.
Data Schemas          | Define what resume and application data looks like. Validate inputs.
AI Provider Interface | Talk to various AI services through a single interface.
Privacy Manager       | Remove/restore personal information before/after AI calls.
Cost Tracker          | Calculate and log how much each AI operation costs.
Database Access       | Store and retrieve job application data.
PDF Generator         | Create resume and cover letter PDFs from HTML templates.

3. Why Layered Architecture?

3.1 The Problem with "Quick and Dirty" Code

When I started, I wrote code the simplest way possible: UI code directly calling AI APIs, database queries mixed with display logic. It worked, but as the project grew, I ran into issues:

graph TD
    title["⚠️ Initial Approach - Problems"]
    UI["UI Code"]
    DB["Database Code"]
    API["API Code"]

    title ~~~ UI
    UI <--> DB
    DB <--> API
    UI <--> API

    style title fill:none,stroke:none,color:#e74c3c
    style UI fill:#922b21,stroke:#e74c3c,color:#fff
    style DB fill:#922b21,stroke:#e74c3c,color:#fff
    style API fill:#922b21,stroke:#e74c3c,color:#fff

What went wrong:

  • Changing how I stored data meant updating UI code too
  • Testing AI logic required running the whole application
  • Adding a new AI provider meant changes in multiple places
  • It was hard to remember where things were

3.2 Organizing by Responsibility

I reorganized the code so each part has one job:

graph TB
    P["🖥️ Presentation Layer"]
    A["⚙️ Application Logic"]
    D["📦 Domain Layer"]
    I["🔌 Infrastructure"]

    P -->|"calls"| A
    A -->|"uses"| D
    A -->|"uses"| I

    style P fill:#1a5276,stroke:#64ffda,color:#fff
    style A fill:#1e8449,stroke:#64ffda,color:#fff
    style D fill:#b9770e,stroke:#64ffda,color:#fff
    style I fill:#7d3c98,stroke:#64ffda,color:#fff

This follows Clean Architecture principles, where dependencies point inward (a code sketch follows the list):

  • 🖥️ Presentation Layer: The UI that users interact with. Handles displaying data and capturing user input. Knows nothing about databases or AI providers.

  • ⚙️ Application Logic: Orchestrates the workflows. When user clicks "Generate Resume", this layer coordinates the steps: validate input, call AI, save results. Contains no UI code or database queries.

  • 📦 Domain Layer: The core business rules and data structures. Defines what a Resume or Application looks like. Has zero dependencies on external systems.

  • 🔌 Infrastructure: Talks to the outside world: AI APIs, database, file system. Can be swapped without touching business logic.
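
To make this concrete, here is a minimal sketch (with hypothetical names, not the actual project code) of how the layers stay separate in Python: the workflow depends only on an abstract provider, and only the infrastructure class knows the real API.

from dataclasses import dataclass
from typing import Protocol


@dataclass
class Resume:                      # Domain layer: pure data, no external dependencies
    summary: str
    skills: list[str]


class AIProvider(Protocol):        # Boundary the application layer depends on
    def complete(self, prompt: str) -> str: ...


class TailorResumeWorkflow:        # Application layer: orchestration only
    def __init__(self, provider: AIProvider) -> None:
        self.provider = provider

    def run(self, resume: Resume, job_posting: str) -> str:
        prompt = f"Tailor this resume to the job:\n{resume.summary}\n---\n{job_posting}"
        return self.provider.complete(prompt)


class LiteLLMProvider:             # Infrastructure: the only place that talks to the API
    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> str:
        from litellm import completion
        response = completion(model=self.model,
                              messages=[{"role": "user", "content": prompt}])
        return response.choices[0].message.content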

3.3 How This Helped

Before                                        | After
----------------------------------------------|---------------------------------------------------
Change database → update UI code too          | Change database → only update the repository
Test AI logic → run the full app              | Test AI logic → run just the workflow
Add new AI provider → hunt through codebase   | Add new provider → update one file
"Where's the code for X?" → search everywhere | "Where's the code for X?" → check the right layer

3.4 Project Structure Mapping

Here's how the folders map to each layer:

Layer            | Folders
-----------------|----------------------------------------------------------------------------------------------------
Presentation     | presentation/ui/tabs/, app.py
Application      | workflows/, presentation/operations/
Domain           | models/
Infrastructure   | infrastructure/providers/, infrastructure/repositories/, infrastructure/generators/, infrastructure/frameworks/
Shared Utilities | utils/

4. Two-Step AI Workflow

4.1 The Problem with Raw User Input

Users enter resume data in various ways:

  • Inconsistent formatting
  • Verbose descriptions with filler words
  • Mixed bullet points and paragraphs
  • Irrelevant details that don't help the application

If we send this raw data directly to AI for resume tailoring, two problems occur:

  1. Wasted tokens: Filler content means more input tokens (cost) and more output tokens
  2. Diluted attention: LLMs have limited context windows. Irrelevant information competes for the model's attention, potentially reducing output quality

4.2 The Two-Step Solution

Instead of one big AI call, we split the workflow into two steps:

flowchart TB
    Input["📝 Raw Input"]
    Clean["🤖 AI: Extract & Clean"]
    Structured["✅ Clean Data"]
    Resume["📄 Tailor Resume"]
    Cover["💼 Cover Letter"]
    Suggest["💡 Suggestions"]

    Input --> Clean
    Clean --> Structured
    Structured --> Resume
    Structured --> Cover
    Structured --> Suggest

    style Input fill:#922b21,stroke:#e74c3c,color:#fff
    style Clean fill:#1a5276,stroke:#64ffda,color:#fff
    style Structured fill:#1e8449,stroke:#27ae60,color:#fff
    style Resume fill:#1a5276,stroke:#64ffda,color:#fff
    style Cover fill:#1a5276,stroke:#64ffda,color:#fff
    style Suggest fill:#1a5276,stroke:#64ffda,color:#fff

4.3 Why This Design?

Step 1 Benefits:

  • Removes irrelevant content ONCE, not three times
  • Produces a clean baseline that all subsequent operations use
  • Focuses AI attention on key achievements and skills
  • Standardizes data format so Step 2 prompts can be simpler

Step 2 Benefits:

  • Three operations can run in parallel (faster); see the sketch after this list
  • Each receives focused, concise input (better quality)
  • Smaller input tokens for each call (lower cost)
  • Easier to debug since each operation is independent
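
Here is a minimal sketch of that fan-out (the function bodies are stand-ins for real AI calls, and the current app actually runs these sequentially, see Section 11.2):

from concurrent.futures import ThreadPoolExecutor


def tailor_resume(clean_data: dict, job: str) -> str:
    return "tailored resume"       # stand-in for an AI call

def write_cover_letter(clean_data: dict, job: str) -> str:
    return "cover letter"          # stand-in for an AI call

def suggest_improvements(clean_data: dict, job: str) -> str:
    return "suggestions"           # stand-in for an AI call


def run_step_two(clean_data: dict, job: str) -> dict:
    # All three operations read the same cleaned Step 1 output,
    # so they are independent and can run in parallel.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            "resume": pool.submit(tailor_resume, clean_data, job),
            "cover_letter": pool.submit(write_cover_letter, clean_data, job),
            "suggestions": pool.submit(suggest_improvements, clean_data, job),
        }
        return {name: future.result() for name, future in futures.items()}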

4.4 Token Savings Example

Consider a resume with verbose descriptions:

Metric                              | Without Cleanup          | With Two-Step
------------------------------------|--------------------------|----------------------------------
Raw input                           | 2,000 tokens             | 2,000 tokens
After Step 1 cleanup                | -                        | 800 tokens
Step 2 input (per operation)        | 2,000 tokens             | 800 tokens
Total input (Step 1 + 3 operations) | 3 × 2,000 = 6,000 tokens | 2,000 + (800 × 3) = 4,400 tokens
Savings per application             | -                        | ~1,600 tokens (~27%)

Over 100 job applications, this adds up significantly.


5. Privacy Design and Cost Savings

5.1 The Concern

When you ask an AI to "make my resume better," you typically send:

  • Your full name
  • Email address
  • Phone number
  • Home address
  • LinkedIn, GitHub URLs
  • Sometimes citizenship status

This data goes to cloud servers. Even if providers promise not to train on it, I wanted to minimize what gets sent.

5.2 The Approach: Sanitize, Process, Restore

The idea is simple: before sending anything to AI, remove personal information. After getting the AI response, put it back.

flowchart LR
    Input["📄 Resume"]
    PM1["🔒 Strip Info"]
    Personal["👤 Personal Info<br/>(stored locally)"]
    Clean["📝 Work Content"]
    AI["☁️ Cloud AI"]
    Results["📊 AI Results"]
    PM2["🔒 Restore Info"]
    Output["📄 Final Output"]

    Input --> PM1
    PM1 -->|"remove"| Personal
    PM1 -->|"send"| Clean
    Clean --> AI
    AI -->|"resume, cover letter,<br/>suggestions, skill gaps"| Results
    Results --> PM2
    Personal -->|"append back"| PM2
    PM2 --> Output

    style Personal fill:#922b21,stroke:#e74c3c,color:#fff
    style Clean fill:#1e8449,stroke:#27ae60,color:#fff
    style AI fill:#1a5276,stroke:#64ffda,color:#fff
    style PM1 fill:#7d3c98,stroke:#64ffda,color:#fff
    style PM2 fill:#7d3c98,stroke:#64ffda,color:#fff
    style Results fill:#1a5276,stroke:#64ffda,color:#fff
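
A minimal sketch of the pattern (field names are illustrative, not the project's exact schema):

PERSONAL_FIELDS = ["full_name", "email", "phone", "address",
                   "linkedin_url", "github_url", "portfolio_url"]


def sanitize(resume: dict) -> tuple[dict, dict]:
    """Split the resume into AI-safe content and locally kept personal info."""
    personal = {k: v for k, v in resume.items() if k in PERSONAL_FIELDS}
    clean = {k: v for k, v in resume.items() if k not in PERSONAL_FIELDS}
    return clean, personal


def restore(ai_result: dict, personal: dict) -> dict:
    """Merge the stored personal info back into the AI output."""
    return {**ai_result, **personal}


def call_ai(clean: dict) -> dict:
    return {**clean, "summary": clean["summary"] + " (tailored)"}   # stand-in for the provider call


my_resume = {"full_name": "Jane Doe", "email": "jane@example.com",
             "summary": "Engineer with 5 years of Python experience.",
             "skills": ["Python", "SQL"]}

clean, personal = sanitize(my_resume)      # personal info never leaves the machine
final = restore(call_ai(clean), personal)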

5.3 What Gets Removed

The Privacy Manager removes these fields before any AI call:

  • Name prefix (Dr., Mr., Ms.)
  • Full name
  • Email
  • Phone
  • Current address / Location
  • Citizenship
  • LinkedIn URL
  • GitHub URL
  • Portfolio URL

The AI only sees your work descriptions, skills, and achievements: the professional content that actually needs optimization.

5.4 Cost Savings from Privacy

Removing personal information also saves money:

Field Removed  | Approximate Tokens Saved
---------------|-------------------------
Full name      | 2-5 tokens
Email          | 5-10 tokens
Phone          | 5-8 tokens
Address        | 10-20 tokens
LinkedIn URL   | 10-15 tokens
GitHub URL     | 8-12 tokens
Portfolio URL  | 8-15 tokens
Total per call | ~50-85 tokens

This seems small, but consider:

  • Each job application has 3-4 AI operations (analysis, tailoring, cover letter, suggestions)
  • 100 job applications = 300-400 AI calls
  • Token savings: 50-85 tokens x 400 calls = 20,000-34,000 tokens saved
  • At typical cloud AI pricing (~$0.03/1K input tokens): $0.60-$1.00 saved

More importantly, this is data that doesn't need to leave your computer.

5.5 Validation Step

Before sending anything, there's a validation step that checks the sanitized data to make sure no personal fields slipped through. If any are found, the operation stops and logs a warning.
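
A minimal sketch of that check, assuming the sanitized payload and the locally stored personal values from the earlier sketch:

import json
import logging

logger = logging.getLogger(__name__)


def validate_sanitized(clean: dict, personal: dict) -> bool:
    """Return False (and warn) if any stored personal value still appears in the payload."""
    payload = json.dumps(clean)
    leaked = [field for field, value in personal.items() if value and str(value) in payload]
    if leaked:
        logger.warning("Sanitization failed; personal fields found: %s", leaked)
        return False                 # the caller stops the operation here
    return True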

5.6 The Local Alternative: Ollama

For those who want complete privacy, the application supports Ollama, a way to run AI models entirely on your own computer:

graph TB
    App["🖥️ Resume Helper"]

    App --> Cloud
    App --> Local

    subgraph Cloud ["☁️ CLOUD OPTION"]
        C1["📚 High Context Window"]
        C2["🧠 Strong Reasoning"]
        C3["📰 Current Information"]
        C4["💸 Pay per Token"]
    end

    subgraph Local ["🦙 LOCAL OPTION - Ollama"]
        L1["💰 FREE"]
        L2["🔒 100% Private"]
        L3["📴 Works Offline"]
        L4["⚠️ Needs ~12GB VRAM"]
    end

    style C1 fill:#1a5276,stroke:#64ffda,color:#fff
    style C2 fill:#1a5276,stroke:#64ffda,color:#fff
    style C3 fill:#1a5276,stroke:#64ffda,color:#fff
    style C4 fill:#b9770e,stroke:#f39c12,color:#fff
    style L1 fill:#1e8449,stroke:#27ae60,color:#fff
    style L2 fill:#1e8449,stroke:#27ae60,color:#fff
    style L3 fill:#1e8449,stroke:#27ae60,color:#fff
    style L4 fill:#b9770e,stroke:#f39c12,color:#fff

Trade-offs with Ollama:

  • Zero data leaves your computer
  • No API keys or accounts needed
  • No usage costs
  • Works offline
  • Requires ~14GB disk space
  • Slower than cloud APIs (depends on your hardware)

6. Supporting Multiple AI Providers

6.1 The Problem

The AI landscape evolves at breakneck speed. New models launch every few months, each claiming better reasoning, larger context windows, or lower costs. Competition among providers is fierce:

  • GPT-5 pushes boundaries, then Claude 4 responds with stronger reasoning
  • Gemini 3 offers massive context, Groq delivers lightning-fast inference
  • A local model via Ollama might be "good enough" for simple tasks at zero cost

Users deserve choice. The best model today might not be the best tomorrow. Locking into a single provider means missing out when a competitor releases something better, cheaper, or faster.

Platform neutrality is the solution. The application should not care which AI powers it. Users pick the best tool for their needs at any moment, without code changes.

The challenge? Each provider has a completely different API format. Without abstraction, the code becomes a mess of if-else statements.

6.2 The Solution: LiteLLM + Wrapper

I chose LiteLLM, an open-source library that provides a unified interface to 100+ LLM providers. Then I wrapped it in a provider class that handles:

  • Provider switching at runtime
  • Model selection
  • Special handling for different provider quirks (like Ollama not supporting JSON mode)
  • Parsing responses from "reasoning" models that include their thinking

Diagram: LiteLLM as a single hub in front of GPT-5, Claude 4, Gemini 3, Ollama, Groq, and Grok. Providers can be switched anytime, and cost is tracked per call.
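
In practice the unified interface means only the model string changes between providers. A minimal sketch (the model names are examples, not a fixed list):

from litellm import completion


def ask(model: str, prompt: str) -> str:
    response = completion(model=model,
                          messages=[{"role": "user", "content": prompt}])
    return response.choices[0].message.content


# Same call, different backends, picked at runtime from a dropdown:
# ask("gpt-4o", prompt)                                # OpenAI
# ask("anthropic/claude-3-5-sonnet-20241022", prompt)  # Anthropic
# ask("groq/llama-3.1-8b-instant", prompt)             # Groq
# ask("ollama/llama3", prompt)                         # local via Ollama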

6.3 Supported Providers

Provider   | Type  | Notes
-----------|-------|------------------------------------
Ollama     | Local | Free, private, requires disk space
OpenAI     | Cloud | GPT-5, GPT-4o, etc.
Anthropic  | Cloud | Claude models
Google     | Cloud | Gemini models
Groq       | Cloud | Fast inference
Perplexity | Cloud | Search-augmented
xAI        | Cloud | Grok models

6.4 How Provider Switching Works

flowchart LR
    User["👤 User"]
    Select["🔄 Select Provider"]
    Process["⚙️ AI Process"]
    Track["💰 Cost Logged"]
    Result["📄 Result"]

    User -->|"dropdown"| Select
    Select -->|"any provider"| Process
    Process --> Track
    Track --> Result

    style Select fill:#1e8449,stroke:#27ae60,color:#fff
    style Track fill:#b9770e,stroke:#f39c12,color:#fff
    style Result fill:#1a5276,stroke:#64ffda,color:#fff

6.5 Why This Matters

  • No lock-in: If OpenAI raises prices, switch to Groq with one dropdown change
  • Try before you commit: Test different providers to see which works best for your use case
  • Local option: Use Ollama when privacy matters most or you don't want to pay
  • Future-proof: New providers can be added without changing application logic

7. Why Not LangChain?

7.1 What is LangChain?

LangChain is a popular framework for building LLM applications. It provides chains, agents, memory management, and abstractions over different providers.

7.2 Why I Chose Not to Use It

This was a deliberate decision after trying it initially:

Consideration | LangChain                                             | Direct Approach
--------------|-------------------------------------------------------|---------------------------
Complexity    | Multiple abstraction layers                           | Straightforward API calls
Debugging     | Hard to trace through chains                          | Clear, linear flow
Dependencies  | Heavy (100+ packages)                                 | Minimal
Speed         | Extra layers (LangChain → LiteLLM → API) add latency  | Direct API calls via LiteLLM
Token usage   | Chain prompts add overhead                            | Single prompt

7.3 The Key Insight: Speed Matters

The longest operation in this application is AI processing. Every abstraction layer adds latency. With LangChain, you have:

  • Abstraction within LangChain itself (chains, templates, parsers)
  • Another abstraction layer between LangChain and LiteLLM
  • Then LiteLLM to the actual API

That's multiple layers of indirection, each adding processing time. Modern LLM APIs are already well designed; there's no need to overengineer with extra abstraction layers that only slow things down.

Modern LLMs (GPT-5, GPT-4o, Claude Opus, Gemini) have become very good at following instructions. In 2022, you might have needed complex prompt chaining to get consistent outputs. In 2025, you can often just ask clearly and get good results.

For this application, I found that:

  • A well-written prompt works better than a complex chain
  • JSON mode handles structured outputs reliably (see the sketch after the diagram)
  • Direct API calls via LiteLLM are faster than adding LangChain on top

graph TD
    subgraph framework ["❌ Framework: 6 Steps"]
        A1[Input] --> B1[Template] --> C1[Chain] --> D1[Parser] --> E1[Memory] --> F1[Result]
    end

    subgraph direct ["✅ Direct: 4 Steps"]
        A2[Input] --> B2[Prompt] --> C2[API] --> D2[Result]
    end

    style A1 fill:#922b21,stroke:#e74c3c,color:#fff
    style B1 fill:#922b21,stroke:#e74c3c,color:#fff
    style C1 fill:#922b21,stroke:#e74c3c,color:#fff
    style D1 fill:#922b21,stroke:#e74c3c,color:#fff
    style E1 fill:#922b21,stroke:#e74c3c,color:#fff
    style F1 fill:#922b21,stroke:#e74c3c,color:#fff

    style A2 fill:#1e8449,stroke:#27ae60,color:#fff
    style B2 fill:#1e8449,stroke:#27ae60,color:#fff
    style C2 fill:#1e8449,stroke:#27ae60,color:#fff
    style D2 fill:#1e8449,stroke:#27ae60,color:#fff
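
As an example of "just ask clearly," here is a minimal sketch of a direct call with JSON mode (the prompt text and field names are illustrative):

import json
from litellm import completion

resume_text = "Senior engineer. Python, SQL, AWS. Led a team of four."   # sample input

response = completion(
    model="gpt-4o",
    response_format={"type": "json_object"},   # JSON mode: the model must emit valid JSON
    messages=[{
        "role": "user",
        "content": 'Return JSON with keys "skills" (list of strings) and '
                   f'"summary" (one sentence) for this resume:\n{resume_text}',
    }],
)
data = json.loads(response.choices[0].message.content)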

7.4 When LangChain Makes Sense

To be fair, LangChain is valuable for:

  • Complex multi-step agent workflows
  • Applications needing conversation memory
  • RAG (Retrieval Augmented Generation) systems
  • Rapid prototyping with many different tools

For a focused application like this, the simpler approach worked better.


8. Technology Stack Decisions

8.1 Overview

Component      | Choice     | Alternatives Considered | Why This Choice
---------------|------------|-------------------------|-----------------------------------------------
Language       | Python     | -                       | Rich AI/ML ecosystem, personal familiarity
UI Framework   | Gradio     | Streamlit, Flask+React  | Hugging Face integration, future LLM model support
AI Integration | LiteLLM    | Direct SDKs, LangChain  | Unified interface, lightweight
Database       | SQLite     | PostgreSQL, MongoDB     | Personal use, small-medium data, simple setup
ORM            | SQLAlchemy | Raw SQL, Peewee         | Widely used, good documentation
PDF Generation | Playwright | WeasyPrint, ReportLab   | Simpler setup, accurate CSS rendering

8.2 Why Gradio?

I considered several options for the UI:

Streamlit: A strong alternative with good data app support.

Flask + React: More control, but significantly more work. I didn't need a full SPA.

Gradio: Originally designed for ML demos, but it works well for form-heavy applications, with built-in components for tables, file uploads, and tabs. The key advantage is Gradio's integration with the Hugging Face ecosystem, which opens possibilities for future LLM model integration and easy sharing via Hugging Face Spaces.

Trade-offs:

  • Less flexible than custom frontend
  • Styling options are limited

8.3 Why SQLite?

For a personal desktop application that stores job applications:

  • Personal use case: This is a single-user tool, not a multi-user web service
  • Small to medium data size: Even with hundreds of job applications, the database stays manageable
  • No server setup: Users don't need to install or configure PostgreSQL or MySQL
  • Simplicity: No complicated database configuration or administration
  • Single file: Database is just a .db file, easy to backup or move
  • Built into Python: No additional dependencies needed

SQLite handles the expected data volume well. A typical job search might involve 100-500 applications, which SQLite manages easily.

8.4 Why Playwright for PDFs?

Generating PDFs from HTML was surprisingly tricky:

WeasyPrint: Faster rendering and lighter weight. However, it requires installation beyond Python (system-level dependencies like cairo, pango, etc.), and the setup can be involved depending on the operating system.

ReportLab: Low-level, requires building PDFs programmatically.

Playwright: Uses a real browser engine (Chromium), so CSS renders exactly as expected. Setup is simple: one command (playwright install chromium) and it works. The downside is that it's heavier, since it downloads browser binaries.
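
A minimal sketch of the approach (the template content here is illustrative):

from playwright.sync_api import sync_playwright

html = "<html><body><h1>Jane Doe</h1><p>Software Engineer</p></body></html>"

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.set_content(html)                       # render the resume HTML in Chromium
    page.pdf(path="resume.pdf", format="A4",     # page.pdf() is Chromium-only
             print_background=True)
    browser.close()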

For resume PDFs where appearance matters and cross-platform setup simplicity is important, Playwright's approach was the better trade-off.

8.5 Why a Classic Template?

The application uses a single classic resume template by design, not as a limitation.

The ATS Reality: Most resumes today are processed by ATS (Applicant Tracking Systems) that use AI parsers to store and retrieve candidate information. These systems rely on OCR (Optical Character Recognition) and multimodal AI to extract structured data from resumes.

Why Classic Works: The classic template is widely recognized and tested. It uses:

  • Standard section headers (Education, Experience, Skills)
  • Simple, linear layout
  • Standard fonts and formatting
  • Clear visual hierarchy

This ensures that company OCR systems and multimodal AI can read the resume correctly, avoiding parsing errors that could hurt job opportunities.

The Trade-off: While multiple templates might look more appealing visually, ATS compatibility takes priority. A beautifully designed resume that fails to parse correctly in an ATS system is worse than a simple, parseable one.


9. Application Tracker System

9.1 Why Build This?

I was tracking applications in a spreadsheet, but it got messy:

  • Which companies had I applied to?
  • What stage was each application in?
  • When should I follow up?
  • Where did I put that tailored resume?

Building a tracker into the app solved these problems and let me attach documents to each application.

9.2 Data Model

flowchart LR
    subgraph app ["📋 APPLICATION"]
        A1["Company + Position"]
        A2["Status + Priority"]
        A3["Match Score"]
        A4["Interview Pipeline"]
    end

    subgraph doc ["📎 DOCUMENTS"]
        D1["Resume PDF"]
        D2["Cover Letter"]
        D3["Notes"]
    end

    app -->|"1 : many"| doc

    style A1 fill:#1a5276,stroke:#64ffda,color:#fff
    style A2 fill:#1a5276,stroke:#64ffda,color:#fff
    style A3 fill:#1a5276,stroke:#64ffda,color:#fff
    style A4 fill:#1a5276,stroke:#64ffda,color:#fff
    style D1 fill:#1e8449,stroke:#27ae60,color:#fff
    style D2 fill:#1e8449,stroke:#27ae60,color:#fff
    style D3 fill:#1e8449,stroke:#27ae60,color:#fff

Each Application record stores the job details (company, position, URL), tracking info (status like "Applied" or "Interview Scheduled", priority level), an AI-generated match score, and the interview pipeline stages. Multiple Documents can be attached to each application: the tailored resume, generated cover letter, and any notes. This one-to-many relationship keeps everything organized per job opportunity.
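
A minimal sketch of that relationship in SQLAlchemy (the column set is illustrative, not the project's exact schema):

from sqlalchemy import ForeignKey, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


class Application(Base):
    __tablename__ = "applications"
    id: Mapped[str] = mapped_column(primary_key=True)    # hash of the job URL (see 9.4)
    company: Mapped[str]
    position: Mapped[str]
    status: Mapped[str] = mapped_column(default="Applied")
    documents: Mapped[list["Document"]] = relationship(back_populates="application")


class Document(Base):
    __tablename__ = "documents"
    id: Mapped[int] = mapped_column(primary_key=True)
    application_id: Mapped[str] = mapped_column(ForeignKey("applications.id"))
    kind: Mapped[str]                                    # "resume", "cover_letter", "notes"
    path: Mapped[str]
    application: Mapped["Application"] = relationship(back_populates="documents")


engine = create_engine("sqlite:///applications.db")      # the whole database is one file
Base.metadata.create_all(engine)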

9.3 Interview Pipeline

Each application can track multiple interview rounds:

stateDiagram-v2
    [*] --> Applied

    Applied --> PhoneScreen: Scheduled
    PhoneScreen --> Technical: Passed
    Technical --> Panel: Passed
    Panel --> ManagerRound: Passed
    ManagerRound --> FinalRound: Passed

    FinalRound --> Offer: Accepted!
    FinalRound --> Rejected: Declined

    Applied --> Rejected: No response
    PhoneScreen --> Rejected: Failed
    Technical --> Rejected: Failed
    Panel --> Rejected: Failed
    ManagerRound --> Rejected: Failed

9.4 Why Hash Job URLs?

Each application gets a unique ID by hashing the job URL (a sketch follows the list). This means:

  • Same job URL = Same ID = Prevents duplicates
  • No need for auto-increment IDs that might conflict
  • Deterministic: same input always gives same ID
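
A minimal sketch (the exact hash and normalization the project uses may differ):

import hashlib


def application_id(job_url: str) -> str:
    normalized = job_url.strip().rstrip("/").lower()     # trivial differences shouldn't duplicate
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


assert application_id("https://jobs.example.com/123") == \
       application_id("https://jobs.example.com/123/")   # same job, same ID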

10. Cost Tracking

10.1 Why Track Costs?

LLM APIs charge per token (a token is roughly three-quarters of an English word). Without visibility:

  • You might get a surprise bill
  • You can't compare provider costs
  • You don't know which operations are expensive

10.2 How It Works

flowchart TB
    subgraph "Each AI Call"
        Call[API Call]
        Response[Response includes token counts]
    end

    subgraph "Cost Tracking"
        Tracker[Cost Tracker]
        Pricing[LiteLLM Pricing Data]
        Calculate[Calculate Cost]
        Log[Log to File]
        Display[Show in UI]
    end

    Call --> Response
    Response --> Tracker
    Pricing --> Calculate
    Tracker --> Calculate
    Calculate --> Log
    Log --> Display

The cost tracker (sketched after this list):

  1. Intercepts the response from each AI call
  2. Extracts token counts (prompt tokens, completion tokens)
  3. Looks up pricing from LiteLLM's pricing database
  4. Calculates cost and adds to running total
  5. Persists to a JSON file
  6. Displays in the UI
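
A minimal sketch of steps 1-5, using LiteLLM's built-in cost lookup (the log file name is an assumption):

import json
from litellm import completion, completion_cost

response = completion(model="gpt-4o",
                      messages=[{"role": "user", "content": "Say hello."}])

usage = response.usage                                  # prompt/completion token counts
cost = completion_cost(completion_response=response)    # looks up LiteLLM's pricing data

entry = {"model": "gpt-4o",
         "prompt_tokens": usage.prompt_tokens,
         "completion_tokens": usage.completion_tokens,
         "cost_usd": round(cost, 6)}

with open("cost_log.json", "a") as f:                   # append one JSON record per call
    f.write(json.dumps(entry) + "\n")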

10.3 Cost Estimates

Operation               | Typical Cost (GPT-5 / GPT-4o)
------------------------|------------------------------
Resume tailoring        | $0.01-0.03
Job analysis            | $0.01-0.02
Cover letter            | $0.02-0.04
Suggestions             | $0.01-0.02
Per application (all 4) | $0.05-0.11
100 applications        | $5-11
With Ollama             | $0.00

Having visibility helps decide when to use local models instead.


11. Lessons Learned

11.1 What Worked Well

Layered architecture: Initial setup took longer, but the Clean Architecture structure makes the code easier to modify, scale, organize, and debug. Because each layer has clear responsibilities, it's straightforward to locate issues and make changes without affecting other parts of the system. The structure also supports business-oriented features: cost tracking, for example, shows how the architecture can serve real applications where understanding operational costs and resource usage is critical for decision-making.

Two-step workflow: Cleaning data first, then running parallel operations saved tokens and improved quality. The AI gets focused input instead of verbose noise.

Privacy by design: Building privacy protection from the start was much easier than it would have been to add later. The sanitize/merge pattern is simple and reliable.

Schema definitions: Defining data structures upfront prevented many bugs. When AI returns inconsistent field names, the schema engine normalizes them.

Direct API calls: Skipping LangChain kept things simple. Debugging is straightforward. I can log exactly what goes to the AI and what comes back.

11.2 What I'd Do Differently

More async processing: Currently, AI calls are synchronous. For operations that don't depend on each other (like analyzing multiple sections), parallel calls would be faster.

Better error handling: Some edge cases could be handled more gracefully, especially network errors during AI calls.

11.3 Trade-offs I Accepted

Choice                 | Gained              | Gave Up
-----------------------|---------------------|------------------------------
Gradio over custom UI  | Fast development    | Full styling control
SQLite over PostgreSQL | Simple deployment   | Multi-user support
Synchronous AI calls   | Simpler code        | Speed for parallel operations
No LangChain           | Simplicity, control | Framework features

12. References

  1. Martin, R. C. (2017). Clean Architecture: A Craftsman's Guide to Software Structure and Design. Prentice Hall.

  2. LiteLLM Documentation. https://docs.litellm.ai/

  3. Gradio Documentation. https://www.gradio.app/docs/

  4. SQLAlchemy Documentation. https://docs.sqlalchemy.org/

  5. Ollama. https://ollama.com/


Last Updated: December 2025