Abstract
This document describes Resume Helper, an AI-powered resume building application I built while learning about software architecture and LLM integration. The project explores how to structure a Python application using layered architecture, support multiple AI providers without vendor lock-in, and protect user privacy when working with cloud AI services.
The goal of this document is to share the thinking process behind the design decisions: what problems I encountered, what options I considered, and why I chose specific approaches. The complete source code is available on GitHub for those who want to dive into the implementation details.
Keywords: Layered Architecture, LLM Integration, Privacy, Python, Gradio, Resume Builder
App Demo: https://www.youtube.com/watch?v=SQgfXfSYLac
GitHub Repository: https://github.com/gibbenergy/Resume_Helper
Table of Contents
- Introduction
- Architecture Overview
- Why Layered Architecture?
- Two-Step AI Workflow
- Privacy Design and Cost Savings
- Supporting Multiple AI Providers
- Why Not LangChain?
- Technology Stack Decisions
- Application Tracker System
- Cost Tracking
- Lessons Learned
- References
1. Introduction
1.1 The Problem I Wanted to Solve
When I was job hunting, I noticed a few frustrations:
- Tailoring resumes is tedious: Every job posting is different, and customizing my resume for each one took hours.
- Privacy concerns with AI tools: I wanted to use ChatGPT or Claude to help optimize my resume, but that meant sending my full name, phone number, email, and home address to cloud servers. That felt uncomfortable.
- Vendor lock-in: Most AI tools I found only worked with one provider (usually OpenAI). What if I wanted to try a cheaper option? Or run everything locally?
- Tracking applications is messy: I was using spreadsheets to track where I applied, interview stages, and follow-ups. It was getting hard to manage.
1.2 What I Built
Resume Helper tries to address these issues:
- Privacy-first approach: Personal information is stripped out before sending anything to AI, then restored only for the final document
- Multiple AI providers: Works with OpenAI, Anthropic, Google, Groq, Perplexity, xAI, and local models via Ollama
- Built-in application tracking: A simple system to track job applications, interview stages, and related documents
- Cost visibility: Shows how much each AI operation costs so there are no surprises
1.3 Document Purpose
This document focuses on the why behind decisions rather than the how of implementation. For code details, please refer to the GitHub repository. Here I want to explain:
- Why I organized the code the way I did
- What trade-offs I considered
- What I learned along the way
2. Architecture Overview
2.1 High-Level System View
The application has several main components that work together:
flowchart TB
Browser["🌐 Browser"]
Tabs["🖥️ UI: 10 Tabs"]
RW["⚙️ Resume Workflows"]
AW["📋 App Tracking"]
Schema["📦 Data Schemas"]
LLM["🤖 AI Provider"]
Privacy["🔒 Privacy"]
Cost["💰 Cost Tracker"]
DB["💾 Database"]
PDF["📄 PDF Gen"]
SQLite[("🗄️ SQLite")]
Local["🦙 Ollama"]
Cloud["☁️ Cloud AI"]
Browser <--> Tabs
Tabs <--> RW
Tabs <--> AW
RW <--> Schema
AW <--> Schema
RW <--> LLM
RW <--> Privacy
LLM <--> Cost
AW <--> DB
DB <--> SQLite
LLM --> Local
LLM --> Cloud
style Tabs fill:#1a5276,stroke:#64ffda,color:#fff
style RW fill:#1e8449,stroke:#64ffda,color:#fff
style AW fill:#1e8449,stroke:#64ffda,color:#fff
style Schema fill:#b9770e,stroke:#64ffda,color:#fff
style LLM fill:#7d3c98,stroke:#64ffda,color:#fff
style Privacy fill:#7d3c98,stroke:#64ffda,color:#fff
style Cost fill:#7d3c98,stroke:#64ffda,color:#fff
style DB fill:#7d3c98,stroke:#64ffda,color:#fff
style PDF fill:#7d3c98,stroke:#64ffda,color:#fff
2.2 What Each Part Does
| Component | Responsibility |
|---|---|
| User Interface | Display forms, tables, buttons. Collect user input. Show results. |
| Business Logic | Coordinate operations. Decide what happens when user clicks a button. |
| Data Schemas | Define what resume and application data looks like. Validate inputs. |
| AI Provider Interface | Talk to various AI services through a single interface. |
| Privacy Manager | Remove/restore personal information before/after AI calls. |
| Cost Tracker | Calculate and log how much each AI operation costs. |
| Database Access | Store and retrieve job application data. |
| PDF Generator | Create resume and cover letter PDFs from HTML templates. |
3. Why Layered Architecture?
3.1 The Problem with "Quick and Dirty" Code
When I started, I wrote code the simplest way possible: UI code directly calling AI APIs, database queries mixed with display logic. It worked, but as the project grew, I ran into issues:
graph TD
title["⚠️ Initial Approach - Problems"]
UI["UI Code"]
DB["Database Code"]
API["API Code"]
title ~~~ UI
UI <--> DB
DB <--> API
UI <--> API
style title fill:none,stroke:none,color:#e74c3c
style UI fill:#922b21,stroke:#e74c3c,color:#fff
style DB fill:#922b21,stroke:#e74c3c,color:#fff
style API fill:#922b21,stroke:#e74c3c,color:#fff
What went wrong:
- Changing how I stored data meant updating UI code too
- Testing AI logic required running the whole application
- Adding a new AI provider meant changes in multiple places
- It was hard to remember where things were
3.2 Organizing by Responsibility
I reorganized the code so each part has one job:
graph TB
P["🖥️ Presentation Layer"]
A["⚙️ Application Logic"]
D["📦 Domain Layer"]
I["🔌 Infrastructure"]
P -->|"calls"| A
A -->|"uses"| D
A -->|"uses"| I
style P fill:#1a5276,stroke:#64ffda,color:#fff
style A fill:#1e8449,stroke:#64ffda,color:#fff
style D fill:#b9770e,stroke:#64ffda,color:#fff
style I fill:#7d3c98,stroke:#64ffda,color:#fff
This follows Clean Architecture principles where dependencies point inward:
- 🖥️ Presentation Layer: The UI that users interact with. Handles displaying data and capturing user input. Knows nothing about databases or AI providers.
- ⚙️ Application Logic: Orchestrates the workflows. When the user clicks "Generate Resume", this layer coordinates the steps: validate input, call AI, save results. Contains no UI code or database queries.
- 📦 Domain Layer: The core business rules and data structures. Defines what a Resume or Application looks like. Has zero dependencies on external systems.
- 🔌 Infrastructure: Talks to the outside world: AI APIs, database, file system. Can be swapped without touching business logic.
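To make the dependency direction concrete, here is a minimal sketch, using hypothetical class names (AIProvider, TailorResumeWorkflow, LiteLLMProvider) rather than the repository's actual ones, of how the application layer can depend on an abstraction that the infrastructure layer implements:

```python
from abc import ABC, abstractmethod

import litellm


class AIProvider(ABC):
    """Abstraction the application layer depends on; no HTTP or SDK details here."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...


class TailorResumeWorkflow:
    """Application logic: orchestrates steps, contains no UI code or SQL."""

    def __init__(self, provider: AIProvider):
        self.provider = provider  # injected, so it can be swapped or mocked in tests

    def run(self, resume_text: str, job_posting: str) -> str:
        prompt = f"Tailor this resume:\n{resume_text}\n\nFor this job:\n{job_posting}"
        return self.provider.complete(prompt)


class LiteLLMProvider(AIProvider):
    """Infrastructure: the only class that talks to the outside world."""

    def __init__(self, model: str):
        self.model = model  # e.g. "openai/gpt-4o" or "ollama/llama3"

    def complete(self, prompt: str) -> str:
        response = litellm.completion(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
```

The wiring happens once at startup; because the workflow only knows about the abstraction, swapping providers (or faking one in a test) never touches the application layer.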
3.3 How This Helped
| Before | After |
|---|---|
| Change database --> Update UI code | Change database --> Only update repository |
| Test AI logic --> Run full app | Test AI logic --> Run just the workflow |
| Add new AI provider --> Hunt through codebase | Add new provider --> Update one file |
| "Where's the code for X?" --> Search everywhere | "Where's the code for X?" --> Check the right layer |
3.4 Project Structure Mapping
Here's how the folders map to each layer:
| Layer | Folders |
|---|---|
| Presentation | presentation/ui/tabs/, app.py |
| Application | workflows/, presentation/operations/ |
| Domain | models/ |
| Infrastructure | infrastructure/providers/, infrastructure/repositories/, infrastructure/generators/, infrastructure/frameworks/ |
| Shared Utilities | utils/ |
4. Two-Step AI Workflow
4.1 The Problem with Raw User Input
Users enter resume data in various ways:
- Inconsistent formatting
- Verbose descriptions with filler words
- Mixed bullet points and paragraphs
- Irrelevant details that don't help the application
If we send this raw data directly to AI for resume tailoring, two problems occur:
- Wasted tokens: Filler content means more input tokens (cost) and more output tokens
- Diluted attention: LLMs have limited context windows. Irrelevant information competes for the model's attention, potentially reducing output quality
4.2 The Two-Step Solution
Instead of one big AI call, we split the workflow into two steps:
flowchart TB
Input["📝 Raw Input"]
Clean["🤖 AI: Extract & Clean"]
Structured["✅ Clean Data"]
Resume["📄 Tailor Resume"]
Cover["💼 Cover Letter"]
Suggest["💡 Suggestions"]
Input --> Clean
Clean --> Structured
Structured --> Resume
Structured --> Cover
Structured --> Suggest
style Input fill:#922b21,stroke:#e74c3c,color:#fff
style Clean fill:#1a5276,stroke:#64ffda,color:#fff
style Structured fill:#1e8449,stroke:#27ae60,color:#fff
style Resume fill:#1a5276,stroke:#64ffda,color:#fff
style Cover fill:#1a5276,stroke:#64ffda,color:#fff
style Suggest fill:#1a5276,stroke:#64ffda,color:#fff
4.3 Why This Design?
Step 1 Benefits:
- Removes irrelevant content ONCE, not three times
- Produces a clean baseline that all subsequent operations use
- Focuses AI attention on key achievements and skills
- Standardizes data format so Step 2 prompts can be simpler
Step 2 Benefits:
- Three operations can run in parallel (faster)
- Each receives focused, concise input (better quality)
- Smaller input tokens for each call (lower cost)
- Easier to debug since each operation is independent
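As a rough sketch of this orchestration (the function names and the provider.complete_json helper are illustrative, not the repository's actual API):

```python
def clean_resume(provider, raw_resume: str) -> dict:
    """Step 1: one AI call that extracts and normalizes the resume content."""
    prompt = (
        "Extract the work experience, skills, and achievements from this resume. "
        "Remove filler words and return concise JSON.\n\n" + raw_resume
    )
    return provider.complete_json(prompt)


def run_two_step(provider, raw_resume: str, job_posting: str) -> dict:
    # Step 1 runs once, so the verbose raw input is only paid for a single time.
    structured = clean_resume(provider, raw_resume)

    # Step 2: each operation receives the same compact, structured baseline.
    return {
        "resume": provider.complete_json(
            f"Tailor this resume data to the job:\n{structured}\n\n{job_posting}"),
        "cover_letter": provider.complete_json(
            f"Write a cover letter from this data:\n{structured}\n\n{job_posting}"),
        "suggestions": provider.complete_json(
            f"Suggest resume improvements for this job:\n{structured}\n\n{job_posting}"),
    }
```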
4.4 Token Savings Example
Consider a resume with verbose descriptions:
| Metric | Without Cleanup | With Two-Step |
|---|---|---|
| Raw input | 2,000 tokens | 2,000 tokens |
| After Step 1 cleanup | - | 800 tokens |
| Step 2 input (per operation) | 2,000 tokens | 800 tokens |
| Total input (cleanup + 3 operations) | 6,000 tokens | 2,000 + (800 x 3) = 4,400 tokens |
| Savings per application | - | ~1,600 tokens (~27%) |
Over 100 job applications, this adds up significantly.
5. Privacy Design and Cost Savings
5.1 The Concern
When you ask an AI to "make my resume better," you typically send:
- Your full name
- Email address
- Phone number
- Home address
- LinkedIn, GitHub URLs
- Sometimes citizenship status
This data goes to cloud servers. Even if providers promise not to train on it, I wanted to minimize what gets sent.
5.2 The Approach: Sanitize, Process, Restore
The idea is simple: before sending anything to AI, remove personal information. After getting the AI response, put it back.
flowchart LR
Input["📄 Resume"]
PM1["🔒 Strip Info"]
Personal["👤 Personal Info<br/>(stored locally)"]
Clean["📝 Work Content"]
AI["☁️ Cloud AI"]
Results["📊 AI Results"]
PM2["🔒 Restore Info"]
Output["📄 Final Output"]
Input --> PM1
PM1 -->|"remove"| Personal
PM1 -->|"send"| Clean
Clean --> AI
AI -->|"resume, cover letter,<br/>suggestions, skill gaps"| Results
Results --> PM2
Personal -->|"append back"| PM2
PM2 --> Output
style Personal fill:#922b21,stroke:#e74c3c,color:#fff
style Clean fill:#1e8449,stroke:#27ae60,color:#fff
style AI fill:#1a5276,stroke:#64ffda,color:#fff
style PM1 fill:#7d3c98,stroke:#64ffda,color:#fff
style PM2 fill:#7d3c98,stroke:#64ffda,color:#fff
style Results fill:#1a5276,stroke:#64ffda,color:#fff
5.3 What Gets Removed
The Privacy Manager removes these fields before any AI call:
- Name prefix (Dr., Mr., Ms.)
- Full name
- Phone
- Current address / Location
- Citizenship
- LinkedIn URL
- GitHub URL
- Portfolio URL
The AI only sees your work descriptions, skills, and achievements: the professional content that actually needs optimization.
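A minimal sketch of the sanitize/restore pattern, assuming the resume is held as a dict and using illustrative field and function names:

```python
PERSONAL_FIELDS = [
    "prefix", "full_name", "email", "phone", "address",
    "citizenship", "linkedin_url", "github_url", "portfolio_url",
]


def sanitize(resume: dict) -> tuple[dict, dict]:
    """Split the resume into AI-safe content and locally kept personal info."""
    personal = {k: resume[k] for k in PERSONAL_FIELDS if k in resume}
    safe = {k: v for k, v in resume.items() if k not in PERSONAL_FIELDS}
    return safe, personal


def restore(ai_result: dict, personal: dict) -> dict:
    """Merge the personal fields back into the AI output for the final document."""
    return {**ai_result, **personal}


# Usage (call_ai and resume_data are placeholders):
# safe, personal = sanitize(resume_data)   # personal info stays on this machine
# ai_result = call_ai(safe)                # only work content goes to the cloud
# final = restore(ai_result, personal)
```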
5.4 Cost Savings from Privacy
Removing personal information also saves money:
| Field Removed | Approximate Tokens Saved |
|---|---|
| Full name | 2-5 tokens |
| Email | 5-10 tokens |
| Phone | 5-8 tokens |
| Address | 10-20 tokens |
| LinkedIn URL | 10-15 tokens |
| GitHub URL | 8-12 tokens |
| Portfolio URL | 8-15 tokens |
| Total per call | ~50-85 tokens |
This seems small, but consider:
- Each job application has 3-4 AI operations (analysis, tailoring, cover letter, suggestions)
- 100 job applications = 300-400 AI calls
- Token savings: 50-85 tokens x 400 calls = 20,000-34,000 tokens saved
- At typical cloud AI pricing (~$0.03/1K input tokens): $0.60-$1.00 saved
More importantly, this is data that doesn't need to leave your computer.
5.5 Validation Step
Before sending anything, there's a validation step that checks the sanitized data to make sure no personal fields slipped through. If any are found, the operation stops and logs a warning.
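A sketch of what that check can look like, reusing the field split from the previous snippet (the repository's actual validation may differ):

```python
import logging


def validate_sanitized(safe: dict, personal: dict) -> bool:
    """Return False (and log a warning) if any personal value is still present."""
    payload = str(safe).lower()
    leaked = [field for field, value in personal.items()
              if value and str(value).lower() in payload]
    if leaked:
        logging.warning("Personal fields leaked into AI payload: %s", leaked)
        return False
    return True
```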
5.6 The Local Alternative: Ollama
For those who want complete privacy, the application supports Ollama, a way to run AI models entirely on your own computer:
graph TB
App["🖥️ Resume Helper"]
App --> Cloud
App --> Local
subgraph Cloud ["☁️ CLOUD OPTION"]
C1["📚 High Context Window"]
C2["🧠 Strong Reasoning"]
C3["📰 Current Information"]
C4["💸 Pay per Token"]
end
subgraph Local ["🦙 LOCAL OPTION - Ollama"]
L1["💰 FREE"]
L2["🔒 100% Private"]
L3["📴 Works Offline"]
L4["⚠️ Needs ~12GB VRAM"]
end
style C1 fill:#1a5276,stroke:#64ffda,color:#fff
style C2 fill:#1a5276,stroke:#64ffda,color:#fff
style C3 fill:#1a5276,stroke:#64ffda,color:#fff
style C4 fill:#b9770e,stroke:#f39c12,color:#fff
style L1 fill:#1e8449,stroke:#27ae60,color:#fff
style L2 fill:#1e8449,stroke:#27ae60,color:#fff
style L3 fill:#1e8449,stroke:#27ae60,color:#fff
style L4 fill:#b9770e,stroke:#f39c12,color:#fff
Trade-offs with Ollama:
- Zero data leaves your computer
- No API keys or accounts needed
- No usage costs
- Works offline
- Requires ~14GB disk space
- Slower than cloud APIs (depends on your hardware)
6. Supporting Multiple AI Providers
6.1 The Problem
The AI landscape evolves at breakneck speed. New models launch every few months, each claiming better reasoning, larger context windows, or lower costs. Competition among providers is fierce:
- GPT-5 pushes boundaries, then Claude 4 responds with stronger reasoning
- Gemini 3 offers massive context, Groq delivers lightning-fast inference
- A local model via Ollama might be "good enough" for simple tasks at zero cost
Users deserve choice. The best model today might not be the best tomorrow. Locking into a single provider means missing out when a competitor releases something better, cheaper, or faster.
Platform neutrality is the solution. The application should not care which AI powers it. Users pick the best tool for their needs at any moment, without code changes.
The challenge? Each provider has a completely different API format. Without abstraction, the code becomes a mess of if-else statements.
6.2 The Solution: LiteLLM + Wrapper
I chose LiteLLM, an open-source library that provides a unified interface to 100+ LLM providers. Then I wrapped it in a provider class that handles:
- Provider switching at runtime
- Model selection
- Special handling for different provider quirks (like Ollama not supporting JSON mode)
- Parsing responses from "reasoning" models that include their thinking
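The core of such a wrapper is small. A sketch using LiteLLM's provider-prefixed model names (the class and method names are illustrative, and the reasoning handling assumes models that wrap their thinking in <think> tags):

```python
import re

import litellm


class ProviderClient:
    def __init__(self, provider: str, model: str):
        # LiteLLM routes on a "provider/model" string, e.g. "openai/gpt-4o",
        # "anthropic/claude-3-5-sonnet-20241022", or "ollama/llama3".
        self.model = f"{provider}/{model}"

    def complete(self, prompt: str, want_json: bool = False) -> str:
        kwargs = {}
        # Not every backend supports JSON mode, so only request it where it works.
        if want_json and not self.model.startswith("ollama/"):
            kwargs["response_format"] = {"type": "json_object"}
        response = litellm.completion(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            **kwargs,
        )
        return self._strip_reasoning(response.choices[0].message.content)

    @staticmethod
    def _strip_reasoning(text: str) -> str:
        # Some reasoning models prepend their thinking inside <think> tags; drop it.
        return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
```

Switching providers at runtime then amounts to constructing a new ProviderClient from the dropdown selection.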
6.3 Supported Providers
| Provider | Type | Notes |
|---|---|---|
| Ollama | Local | Free, private, requires disk space |
| OpenAI | Cloud | GPT-5, GPT-4o, etc. |
| Anthropic | Cloud | Claude models |
| Cloud | Gemini models | |
| Groq | Cloud | Fast inference |
| Perplexity | Cloud | Search-augmented |
| xAI | Cloud | Grok models |
6.4 How Provider Switching Works
flowchart LR
User["👤 User"]
Select["🔄 Select Provider"]
Process["⚙️ AI Process"]
Track["💰 Cost Logged"]
Result["📄 Result"]
User -->|"dropdown"| Select
Select -->|"any provider"| Process
Process --> Track
Track --> Result
style Select fill:#1e8449,stroke:#27ae60,color:#fff
style Track fill:#b9770e,stroke:#f39c12,color:#fff
style Result fill:#1a5276,stroke:#64ffda,color:#fff
6.5 Why This Matters
- No lock-in: If OpenAI raises prices, switch to Groq with one dropdown change
- Try before you commit: Test different providers to see which works best for your use case
- Local option: Use Ollama when privacy matters most or you don't want to pay
- Future-proof: New providers can be added without changing application logic
7. Why Not LangChain?
7.1 What is LangChain?
LangChain is a popular framework for building LLM applications. It provides chains, agents, memory management, and abstractions over different providers.
7.2 Why I Chose Not to Use It
This was a deliberate decision after trying it initially:
| Consideration | LangChain | Direct Approach |
|---|---|---|
| Complexity | Multiple abstraction layers | Straightforward API calls |
| Debugging | Hard to trace through chains | Clear, linear flow |
| Dependencies | Heavy (100+ packages) | Minimal |
| Speed | Extra abstraction layers (LangChain → LiteLLM → API) add latency | Direct API calls via LiteLLM |
| Token usage | Chain prompts add overhead | Single prompt |
7.3 The Key Insight: Speed Matters
The longest operation in this application is AI processing. Every abstraction layer adds latency. With LangChain, you have:
- Abstraction within LangChain itself (chains, templates, parsers)
- Another abstraction layer between LangChain and LiteLLM
- Then LiteLLM to the actual API
That's multiple layers of indirection, each adding processing time. Modern LLM APIs are already sufficient and well-designed. There's no need to overengineer with additional abstraction layers that reduce speed.
Modern LLMs (GPT-5, GPT-4o, Claude Opus, Gemini) have become very good at following instructions. In 2022, you might have needed complex prompt chaining to get consistent outputs. In 2025, you can often just ask clearly and get good results.
For this application, I found that:
- A well-written prompt works better than a complex chain
- JSON mode handles structured outputs reliably
- Direct API calls via LiteLLM are faster than adding LangChain on top
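As a concrete example, the direct path in the diagram below boils down to a handful of lines (a sketch; the prompt, model, and JSON keys are illustrative):

```python
import json

import litellm


def analyze_job(job_posting: str) -> dict:
    # One clear prompt plus JSON mode, instead of a template/chain/parser stack.
    response = litellm.completion(
        model="openai/gpt-4o",
        messages=[{
            "role": "user",
            "content": "Return JSON with keys 'required_skills' and 'nice_to_have' "
                       "for this job posting:\n\n" + job_posting,
        }],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```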
graph TD
subgraph framework ["❌ Framework: 6 Steps"]
A1[Input] --> B1[Template] --> C1[Chain] --> D1[Parser] --> E1[Memory] --> F1[Result]
end
subgraph direct ["✅ Direct: 4 Steps"]
A2[Input] --> B2[Prompt] --> C2[API] --> D2[Result]
end
style A1 fill:#922b21,stroke:#e74c3c,color:#fff
style B1 fill:#922b21,stroke:#e74c3c,color:#fff
style C1 fill:#922b21,stroke:#e74c3c,color:#fff
style D1 fill:#922b21,stroke:#e74c3c,color:#fff
style E1 fill:#922b21,stroke:#e74c3c,color:#fff
style F1 fill:#922b21,stroke:#e74c3c,color:#fff
style A2 fill:#1e8449,stroke:#27ae60,color:#fff
style B2 fill:#1e8449,stroke:#27ae60,color:#fff
style C2 fill:#1e8449,stroke:#27ae60,color:#fff
style D2 fill:#1e8449,stroke:#27ae60,color:#fff
7.4 When LangChain Makes Sense
To be fair, LangChain is valuable for:
- Complex multi-step agent workflows
- Applications needing conversation memory
- RAG (Retrieval Augmented Generation) systems
- Rapid prototyping with many different tools
For a focused application like this, the simpler approach worked better.
8. Technology Stack Decisions
8.1 Overview
| Component | Choice | Alternatives Considered | Why This Choice |
|---|---|---|---|
| Language | Python | - | Rich AI/ML ecosystem, personal familiarity |
| UI Framework | Gradio | Streamlit, Flask+React | Hugging Face integration, future LLM model support |
| AI Integration | LiteLLM | Direct SDKs, LangChain | Unified interface, lightweight |
| Database | SQLite | PostgreSQL, MongoDB | Personal use, small-medium data, simple setup |
| ORM | SQLAlchemy | Raw SQL, Peewee | Widely used, good documentation |
| PDF Generation | Playwright | WeasyPrint, ReportLab | Simpler setup, accurate CSS rendering |
8.2 Why Gradio?
I considered several options for the UI:
Streamlit: A strong alternative with good data app support.
Flask + React: More control, but significantly more work. I didn't need a full SPA.
Gradio: Originally designed for ML demos, but works well for form-heavy applications. Built-in components for tables, file uploads, and tabs. The key advantage is Gradio's integration with the Hugging Face ecosystem, which opens possibilities for future LLM model integration and easy sharing via Hugging Face Spaces.
Trade-offs:
- Less flexible than custom frontend
- Styling options are limited
8.3 Why SQLite?
For a personal desktop application that stores job applications:
- Personal use case: This is a single-user tool, not a multi-user web service
- Small to medium data size: Even with hundreds of job applications, the database stays manageable
- No server setup: Users don't need to install or configure PostgreSQL or MySQL
- Simplicity: No complicated database configuration or administration
- Single file: Database is just a .db file, easy to back up or move
- Built into Python: No additional dependencies needed
SQLite handles the expected data volume well. A typical job search might involve 100-500 applications, which SQLite manages easily.
8.4 Why Playwright for PDFs?
Generating PDFs from HTML was surprisingly tricky:
WeasyPrint: Faster rendering and lighter weight. However, it requires separate installation beyond Python (system-level dependencies like cairo, pango, etc.), and setup can be complicated depending on the operating system.
ReportLab: Low-level, requires building PDFs programmatically.
Playwright: Uses a real browser engine (Chromium), so CSS renders exactly as expected. The setup is simpler: just one command (playwright install chromium) and it works. Downside is it's heavier (downloads browser binaries).
For resume PDFs where appearance matters and cross-platform setup simplicity is important, Playwright's approach was the better trade-off.
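A sketch of the HTML-to-PDF step with Playwright's synchronous API (the HTML content and margins here are placeholders):

```python
from playwright.sync_api import sync_playwright


def html_to_pdf(html: str, output_path: str) -> None:
    """Render an HTML resume in headless Chromium and print it to PDF."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(html, wait_until="networkidle")
        # page.pdf() is Chromium-only, which is the one browser we install.
        page.pdf(path=output_path, format="A4",
                 margin={"top": "15mm", "bottom": "15mm"})
        browser.close()


# html_to_pdf("<h1>Jane Doe</h1><p>Experience ...</p>", "resume.pdf")
```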
8.5 Why a Classic Template?
The application uses a single classic resume template by design, not as a limitation.
The ATS Reality: Most resumes today are processed by Applicant Tracking Systems (ATS) that use AI parsers to store and retrieve candidate information. These systems rely on OCR (Optical Character Recognition) and multimodal AI to extract structured data from resumes.
Why Classic Works: The classic template is widely recognized and tested. It uses:
- Standard section headers (Education, Experience, Skills)
- Simple, linear layout
- Standard fonts and formatting
- Clear visual hierarchy
This ensures that company OCR systems and multimodal AI can read the resume correctly, avoiding parsing errors that could hurt job opportunities.
The Trade-off: While multiple templates might look more appealing visually, ATS compatibility takes priority. A beautifully designed resume that fails to parse correctly in an ATS system is worse than a simple, parseable one.
9. Application Tracker System
9.1 Why Build This?
I was tracking applications in a spreadsheet, but it got messy:
- Which companies had I applied to?
- What stage was each application in?
- When should I follow up?
- Where did I put that tailored resume?
Building a tracker into the app solved these problems and let me attach documents to each application.
9.2 Data Model
flowchart LR
subgraph app ["📋 APPLICATION"]
A1["Company + Position"]
A2["Status + Priority"]
A3["Match Score"]
A4["Interview Pipeline"]
end
subgraph doc ["📎 DOCUMENTS"]
D1["Resume PDF"]
D2["Cover Letter"]
D3["Notes"]
end
app -->|"1 : many"| doc
style A1 fill:#1a5276,stroke:#64ffda,color:#fff
style A2 fill:#1a5276,stroke:#64ffda,color:#fff
style A3 fill:#1a5276,stroke:#64ffda,color:#fff
style A4 fill:#1a5276,stroke:#64ffda,color:#fff
style D1 fill:#1e8449,stroke:#27ae60,color:#fff
style D2 fill:#1e8449,stroke:#27ae60,color:#fff
style D3 fill:#1e8449,stroke:#27ae60,color:#fff
Each Application record stores the job details (company, position, URL), tracking info (status like "Applied" or "Interview Scheduled", priority level), an AI-generated match score, and the interview pipeline stages. Multiple Documents can be attached to each application: the tailored resume, generated cover letter, and any notes. This one-to-many relationship keeps everything organized per job opportunity.
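A sketch of that one-to-many relationship with SQLAlchemy (column names are illustrative, not necessarily the repository's exact schema):

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class Application(Base):
    __tablename__ = "applications"
    id = Column(String, primary_key=True)      # hash of the job URL (see 9.4)
    company = Column(String, nullable=False)
    position = Column(String, nullable=False)
    status = Column(String, default="Applied")
    match_score = Column(Integer)
    documents = relationship("Document", back_populates="application")


class Document(Base):
    __tablename__ = "documents"
    id = Column(Integer, primary_key=True, autoincrement=True)
    application_id = Column(String, ForeignKey("applications.id"))
    kind = Column(String)                       # "resume", "cover_letter", "note"
    path = Column(String)
    application = relationship("Application", back_populates="documents")


# sqlalchemy.create_engine("sqlite:///resume_helper.db") + Base.metadata.create_all(...)
# is all the "server setup" SQLite needs.
```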
9.3 Interview Pipeline
Each application can track multiple interview rounds:
stateDiagram-v2
[*] --> Applied
Applied --> PhoneScreen: Scheduled
PhoneScreen --> Technical: Passed
Technical --> Panel: Passed
Panel --> ManagerRound: Passed
ManagerRound --> FinalRound: Passed
FinalRound --> Offer: Accepted!
FinalRound --> Rejected: Declined
Applied --> Rejected: No response
PhoneScreen --> Rejected: Failed
Technical --> Rejected: Failed
Panel --> Rejected: Failed
ManagerRound --> Rejected: Failed
9.4 Why Hash Job URLs?
Each application gets a unique ID by hashing the job URL. This means:
- Same job URL = Same ID = Prevents duplicates
- No need for auto-increment IDs that might conflict
- Deterministic: same input always gives same ID
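A sketch of deterministic IDs derived from the job URL (the exact normalization and hash length in the repository may differ):

```python
import hashlib


def application_id(job_url: str) -> str:
    """The same URL always hashes to the same ID, so re-adding a job is a no-op."""
    normalized = job_url.strip().lower().rstrip("/")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


# application_id("https://example.com/jobs/123") always returns the same string.
```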
10. Cost Tracking
10.1 Why Track Costs?
LLM APIs charge per token (roughly per word). Without visibility:
- You might get a surprise bill
- You can't compare provider costs
- You don't know which operations are expensive
10.2 How It Works
flowchart TB
subgraph "Each AI Call"
Call[API Call]
Response[Response includes token counts]
end
subgraph "Cost Tracking"
Tracker[Cost Tracker]
Pricing[LiteLLM Pricing Data]
Calculate[Calculate Cost]
Log[Log to File]
Display[Show in UI]
end
Call --> Response
Response --> Tracker
Pricing --> Calculate
Tracker --> Calculate
Calculate --> Log
Log --> Display
The cost tracker:
- Intercepts the response from each AI call
- Extracts token counts (prompt tokens, completion tokens)
- Looks up pricing from LiteLLM's pricing database
- Calculates cost and adds to running total
- Persists to a JSON file
- Displays in the UI
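A sketch of the per-call bookkeeping using LiteLLM's bundled pricing helper (litellm.completion_cost); the log file name and entry structure are illustrative:

```python
import json

import litellm

LOG_PATH = "cost_log.json"  # illustrative location


def track_cost(response) -> float:
    """Extract token counts from a LiteLLM response, price them, and log the call."""
    usage = response.usage
    cost = litellm.completion_cost(completion_response=response)
    entry = {
        "model": response.model,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "cost_usd": round(cost, 6),
    }
    try:
        with open(LOG_PATH) as f:
            log = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        log = []
    log.append(entry)
    with open(LOG_PATH, "w") as f:
        json.dump(log, f, indent=2)
    return cost
```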
10.3 Cost Estimates
| Operation | Typical Cost (GPT-5 / GPT-4o) |
|---|---|
| Resume tailoring | $0.01-0.03 |
| Job analysis | $0.01-0.02 |
| Cover letter | $0.02-0.04 |
| Suggestions | $0.01-0.02 |
| Per application (all 4) | $0.05-0.11 |
| 100 applications | $5-11 |
| With Ollama | $0.00 |
Having visibility helps decide when to use local models instead.
11. Lessons Learned
11.1 What Worked Well
Layered architecture: Initial setup took longer, but the Clean Architecture structure makes the code easier to modify, scale, organize, and debug. Because each layer has clear responsibilities, it is straightforward to locate issues and make changes without affecting other parts of the system. The structure also supports business workflows well: features like cost tracking and cost awareness show how the architecture can be tailored to real business applications, where understanding operational costs and resource usage is critical for decision-making.
Two-step workflow: Cleaning data first, then running parallel operations saved tokens and improved quality. The AI gets focused input instead of verbose noise.
Privacy by design: Building privacy protection from the start was much easier than it would have been to add later. The sanitize/merge pattern is simple and reliable.
Schema definitions: Defining data structures upfront prevented many bugs. When AI returns inconsistent field names, the schema engine normalizes them.
Direct API calls: Skipping LangChain kept things simple. Debugging is straightforward. I can log exactly what goes to the AI and what comes back.
11.2 What I'd Do Differently
More async processing: Currently, AI calls are synchronous. For operations that don't depend on each other (like analyzing multiple sections), parallel calls would be faster.
Better error handling: Some edge cases could be handled more gracefully, especially network errors during AI calls.
11.3 Trade-offs I Accepted
| Choice | Gained | Gave Up |
|---|---|---|
| Gradio over custom UI | Fast development | Full styling control |
| SQLite over PostgreSQL | Simple deployment | Multi-user support |
| Synchronous AI calls | Simpler code | Speed for parallel operations |
| No LangChain | Simplicity, control | Framework features |
12. References
- Martin, R. C. (2017). Clean Architecture: A Craftsman's Guide to Software Structure and Design. Prentice Hall.
- LiteLLM Documentation. https://docs.litellm.ai/
- Gradio Documentation. https://www.gradio.app/docs/
- SQLAlchemy Documentation. https://docs.sqlalchemy.org/
- Ollama. https://ollama.com/
Last Updated: December 2025