Vibes vs. Verifiable Data
“You’ve built AI stuff before,” a friend said. “Can you make something that tells me the best place to live on $2000/mo? Like, actually data-backed?”
They didn’t want influencer lists or anecdotal Reddit threads, but an analysis with real numbers: grocery cost, internet, quality of life, all pulled fresh — not from someone who asked ChatGPT. That kicked off the challenge: could we build an agent that gives digital nomads a real, data-backed answer?
The problem is that LLMs are everywhere right now — and sure, they’re great at reasoning, summarizing, and pattern matching — but they are not Wikipedia. They don’t have a regularly updated-and-maintained database of facts; they’re just improv actors with good recall.
But when tightly controlled, LLMs are great at interpreting and reasoning over real-world data — if and only if you give them that data and don't let them make things up. That’s the key shift that makes this work: stop thinking in terms of how you ask ChatGPT questions, and instead turn your LLMs into focused, deterministic tools that make sense of the world without imagining it.
So I built an agent to do structured, repeatable evaluations of cities using live cost-of-living data pulled via Bright Data’s SERP API, at scale, from sources like Numbeo, Expatistan, Speedtest, and more, then ran it through a tightly constrained agentic Perceive → Reason → Reflect loop with an LLM for reasoning.
Input:
> node cost-of-living.js 2000
Output:
# The Verdict
**Winner: Bangkok**
After analyzing 5 cities, 3 cities fit within the $2,000/month budget. Bangkok emerges as the best choice, offering excellent value for remote workers with $994 left over each month.
**Key Findings:**
- **3 out of 5 cities** fit within the $2,000 budget
- **Most affordable**: Bangkok
- **Best remote work infrastructure**: Bangkok
- **Fastest internet**: Lisbon
## Cities Within Budget
These 3 cities fit comfortably within your $2,000/month budget:
| City | Monthly Cost | Money Left Over | Budget Used | Remote Work Score | Efficiency Score |
|------|--------------|----------------|-------------|-------------------|------------------|
| **Bangkok, Thailand** | $1,006 | $994 | 50% | 82/100 | 70/100 |
| **Bali, Indonesia** | $1,316 | $684 | 66% | 72/100 | 58/100 |
| **Lisbon, Portugal** | $1,975 | $25 | 99% | 61/100 | 43/100 |
## Cities Over Budget
These cities exceed your $2,000/month budget:
| City | Monthly Cost | Over Budget By | Budget Used |
|------|--------------|----------------|-------------|
| **Berlin, Germany** | $2,035 | $35 | 102% |
| **Austin, United States** | $3,142 | $1,142 | 157% |
## ...rest of the report here, including a complete cost breakdown and city-by-city analysis...
You can find it here under an MIT license.
Of course, scoring a city’s “remote work suitability” involves making judgment calls: What matters more — cost of rent, or internet speed? What’s the right way to normalize across currencies and local purchasing power? I made some assumptions — mathematical and otherwise — so of course this blog post about a weekend hobby project is by no means meant to be definitive.
But said assumptions are transparent, the process is repeatable, and I think this code serves as a good base you can work off of. Let’s get to it.
The Tech Stack
@ai-sdk/openai & ai — Vercel’s AI SDK for OpenAI integration. Comes with a generateObject() function for structured AI responses with schema validation, so we can always get consistent JSON outputs from GPT models. This is the core of our AI-powered cost analysis.
zod — TypeScript-first schema validation library, needed for the above. This makes sure the AI’s responses match the expected data structures we want, and prevents runtime errors from malformed AI outputs. Essential for reliable cost data extraction.
node-fetch — Simple, lightweight HTTP client for making API requests in Node.js. We use this for all our web scraping and API calls.
https-proxy-agent — Lets us make HTTP requests through proxy servers. Essential for routing search queries through Bright Data’s proxy infrastructure without getting blocked by Google’s anti-bot measures.
dotenv — Pretty standard. Loads environment variables from .env files. We use this to securely store API keys and proxy credentials.
Use your package manager of choice to get core dependencies:
npm install @ai-sdk/openai ai zod node-fetch https-proxy-agent dotenv
You’ll also need accounts for:
- Some sort of LLM — I’m using OpenAI, but bring your own.
- Bright Data (for the SERP API, which routes requests through a proxy network to avoid Google’s anti-bot blocking)
FAQs
1. Why Not Just Scrape Numbeo?
Why not just scrape a website like Numbeo for CITY_NAME and be done with it? They distill everything down to a single score for readers already, right?
Not exactly. This data isn’t consistent between cities at all. Some pages include full breakdowns of categories like rent, groceries, utilities, etc. — others are missing entire sections. Worse, the update frequency and source transparency vary wildly.
If you’re benchmarking cities head-to-head, inconsistency is fatal. That’s why this agent fetches multiple sources and interprets them dynamically, rather than anchoring everything to a brittle scraper.
2. What mathematical assumptions are you making?
My math is understandably rusty, but here’s what I’m doing. Basically, it’s a set of rules grounded in tunable, transparent equations designed for comparing global cities for remote workers. Here’s how it breaks down:
A. PPP Adjustments
To compare costs across countries, we normalize all values using Purchasing Power Parity (PPP) factors from 2019 World Bank data (I hardcoded these to reduce complexity, but I recommend pulling the latest figures from the World Bank API, or a similar data source, when you run this agent). The adjustment formula is:
adjusted_cost = original_cost × ppp_factor
Example: If rent in Berlin is $1000/month and Germany’s PPP factor is 0.721, then:
$1,000 × 0.721 = $721
Essentially, this means $1000 in Berlin feels like spending $721 in the U.S.
I’m assuming PPP factors haven’t changed drastically post-2019 and that they’re still directionally accurate for consumer expenses like rent, groceries, and transit.
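The adjustment is a one-liner. Here’s a minimal sketch; the Germany factor matches the example above, but treat the lookup table as illustrative, not as live World Bank data:

```javascript
// Illustrative PPP factors only; the agent hardcodes 2019 World Bank values
const PPP_FACTORS = {
  Germany: 0.721
};

function adjustForPPP(originalCostUSD, pppFactor) {
  // adjusted_cost = original_cost × ppp_factor, rounded to cents
  return Math.round(originalCostUSD * pppFactor * 100) / 100;
}

adjustForPPP(1000, PPP_FACTORS.Germany); // $1,000 in Berlin ≈ $721 of U.S. spending
```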
B. Remote Work Suitability Score
We blend cost of living and internet quality into a single, weighted score:
total_score = 0.7 × cost_score + 0.3 × internet_score
The cost component is normalized on a linear scale from $500–$3000/month:
cost_score = max(0, min(100, 100 − ((monthly_cost − 500) / 2500) × 100))
This prioritizes affordability and assumes diminishing returns on cost efficiency above $3000/month. So, $500/month = 100 pts, $1750/month = 50 pts, $3000/month = 0 pts.
C. Internet Speed Scoring
We apply a logarithmic curve to reflect how humans perceive network improvements: going from 5→25 Mbps feels dramatic; going from 50→100 Mbps, much less so for most people.
internet_score = 100 × (log(speed + 10) − log(10)) / (log(110) − log(10))
This establishes diminishing returns above ~50 Mbps. It also avoids giving perfect scores to countries that inflate bandwidth with peak-only figures.
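Put together, sections B and C boil down to a few lines of JavaScript. This is a direct transcription of the formulas above; the function names are mine:

```javascript
function costScore(monthlyCost) {
  // Linear scale: $500/mo => 100 pts, $1,750/mo => 50 pts, $3,000/mo => 0 pts
  return Math.max(0, Math.min(100, 100 - ((monthlyCost - 500) / 2500) * 100));
}

function internetScore(speedMbps) {
  // Logarithmic curve: big gains at low speeds, diminishing returns above ~50 Mbps
  return (100 * (Math.log(speedMbps + 10) - Math.log(10))) / (Math.log(110) - Math.log(10));
}

function remoteWorkScore(monthlyCost, speedMbps) {
  // 70% affordability, 30% connectivity
  return 0.7 * costScore(monthlyCost) + 0.3 * internetScore(speedMbps);
}
```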
D. Confidence Thresholds
We use a low-latency AI model to extract data. The extraction confidence is treated as a proxy for data quality:
- 60%+ = “reliable enough to use”
- 70%+ = “high confidence”
- <50% = discard or retry
These thresholds control when the agent exits or loops again, and are based on observed hallucination rates during extraction and validation.
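As a sketch, the thresholds amount to a simple classification step. The label names here are mine; the repo encodes these as numeric checks rather than a named function:

```javascript
// Map an extraction confidence percentage to an action, per the thresholds above
function classifyExtraction(confidencePct) {
  if (confidencePct < 50) return 'discard_or_retry'; // below the floor
  if (confidencePct >= 70) return 'high_confidence';
  if (confidencePct >= 60) return 'usable';          // reliable enough to use
  return 'borderline';                               // 50–59: kept, but suspect
}
```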
E. Search Result Quality
We score SERP data quality before reasoning begins:
quality_score = (organic_hits × 5) + (numbeo_score) + (structured_bonus)
Where:
- Organic hits = up to 10 results × 5 pts (max 50)
- Structured bonus = +20 pts for sources like Knowledge Graphs
- Numbeo/Expatistan = +10 pts per match (max 30)
This gives the agent a way to quantify how strong the raw input is before it commits to reasoning or reflecting.
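In code, the heuristic looks roughly like this. The input shape is my assumption; the point values and caps mirror the bullets above:

```javascript
// Score SERP quality before reasoning: organic volume + trusted sources + structure
function qualityScore(serp) {
  const organicPts = Math.min(serp.organicHits, 10) * 5;           // up to 50
  const trustedPts = Math.min(serp.trustedSourceMatches * 10, 30); // Numbeo/Expatistan, max 30
  const structuredPts = serp.hasStructuredData ? 20 : 0;           // Knowledge Graph etc.
  return organicPts + trustedPts + structuredPts;
}
```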
3. Why Bright Data for SERP?
This project depends on interpreting real-time search data across dozens of cities, categories, and query variations. That means geo-targeting, rotating IPs, and handling the frequently-changing schema of Google’s SERP HTML. Bright Data’s SERP API solves that cleanly: it gives us real-time results in normalized JSON, abstracts away proxy headaches, and works reliably even with high volume and global targets.
Mine is just a one-off weekend project, but this kind of infrastructure unlocks real product-grade use cases:
- City Intelligence APIs — Aggregate city cost profiles, normalize them with PPP, and serve as JSON to dev teams, HR tools, or global relocation services.
- SaaS for Remote Hiring — Let companies compare compensation benchmarks against real local prices for where their talent lives (or wants to move).
- Onboarding Tools for Nomads — Personalized location suggestions based on speed, cost, and lifestyle goals — powered by live search data, not static lists.
All of those require fresh, interpretable, geo-aware search data. Bright Data gives us the substrate. The agent gives us structure. Together, they make it possible to turn messy public information into real commercial potential.
Setting Up Bright Data’s SERP API
If you don’t have a Bright Data account yet, you can sign up for free. Adding a payment method will grant you a $5 credit to get started — no charges upfront.
1. Sign in to Bright Data
Log in to your Bright Data account and you’ll be on the dashboard.
2. Creating a Proxy Zone
- From there, find the My Zones page, go to the SERP API section and click Get Started.
- If you were already using an active Bright Data proxy, you don’t need to create a new zone; just click Add in the top-right corner of this page.
3. Assign a name to your SERP API zone
- Choose a meaningful name, as it cannot be changed once created.
4. Click “Add” and Verify Your Account
- If you haven’t verified your account yet, you’ll be prompted to add a payment method at this stage.
- First-time users receive a $5 bonus credit, so you can test the service without any upfront costs.
There are additional ways to configure the SERP API you just created, but we won’t need to think about those just yet.
Once that’s done, copy the following values from the “Overview” tab into your .env file.
# Bright Data SERP API Configuration
BRIGHT_DATA_CUSTOMER_ID=hl_xxxxxxx
BRIGHT_DATA_ZONE=xxxxxx
BRIGHT_DATA_PASSWORD=xxxxxxxxxxxxx
# OpenAI
OPENAI_API_KEY="sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Throw in an OpenAI (or whatever LLM-as-a-service you’re using) key here too, while we have the file open.
Designing a Realistic Agent Loop
My plan was to build a multi-agent system grounded in a classic cognitive architecture: Perceive → Reason → Reflect. Each phase is implemented as a distinct, debuggable step with clear inputs and outputs.
The loop is scoped tightly to a single city. Each agent runs independently, with its own state (or context, if you will), memory, and retries.
Let’s walk through what’s actually happening inside that loop:
Perceive
This is the first step you’ll see in the Lifecycle section of the diagram:
Use Memory to Select Strategy → Build Google Search Query → Check Cache → Fetch via SERP API.
The agent begins by deciding how to search. First off, it pulls from memory — previous success/failure data stored per strategy and category — to pick the most promising retrieval method. That could be a Numbeo-focused strategy, a multi-source query, or even a Reddit-local approach.
Once the strategy is chosen, the agent constructs a Google search query, checks whether results for that city-query pair are cached, and if not, issues a live request via Bright Data’s SERP API.
This is the agent’s sensory input: a slice of the open web, filtered and scoped.
Reason
Once the raw content is in, we move to the Reasoning phase.
Extract Costs using LLM → Compute PPP and Remote Work Score.
This phase takes noisy search results (structured JSON from SERP is better than raw HTML, of course, but it still contains data we might not need) and transforms them into structured insights. Each cost category is processed through a low-latency, tightly constrained LLM extraction step, and most importantly, validated using a strict Zod schema. Any hallucinated or incomplete data fails hard.
Valid entries are adjusted for purchasing power parity, scored with a weighted formula (I went with 70% raw cost of living, 30% internet, but feel free to roll your own), and logged with confidence metrics for future learning.
The result is a structured, transparent snapshot of each city’s remote-work suitability.
Reflect
Finally, the agent reviews its own work to get better.
Evaluate Confidence and Completeness → Decide Retry or Exit.
This is the gatekeeper step. Did this agent get good data across a broad range of categories like rent, grocery, utilities, or internet infrastructure for this city? How confident is it in the source quality and extraction accuracy? How sure is it about the comparative analysis it performed on the city data?
I’m setting a moderately high bar for success here: 75% confidence, 80% category completeness, and no individual category falling below a 50% floor. If the results meet those thresholds (or if retrying would be pointless e.g. max iterations reached, or confidence is too low to recover), the agent exits.
Otherwise, it adapts. It analyzes which cost categories underperformed, adjusts its search strategy (e.g. “try_local_sources” or “expand_search_terms”, more on those later), and re-enters the loop with a new plan.
After three total passes, or a hard failure, it exits — success or not — with all performance data logged for future runs.
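The exit decision above can be sketched as a small predicate. Field names echo the goals and state described here; the repo’s real checks may differ:

```javascript
// Reflect gate: exit when goals are met, or when retrying is pointless
function shouldExit(state, goals) {
  const goalsMet =
    state.confidence >= goals.confidence_target &&      // 75%+
    state.completeness >= goals.completeness_target &&  // 80%+ of categories
    state.lowest_category_confidence >= 50;             // no category below the floor
  const retryPointless =
    state.iteration >= state.max_iterations ||          // three passes max
    state.confidence < goals.min_acceptable_confidence; // too low to recover
  return goalsMet || retryPointless;
}
```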
This loop makes each agent self-contained, predictable, and easy to reason about. Contrary to what techbros might have you believe, there’s no actual “intelligence”, no emergent behavior here — just structured iteration over (mostly) unreliable input, with strict guardrailing to make things as deterministic as we can make them. And because every step is visible in logs or memory, you can debug a bad result without guessing what the agent was “thinking.”
And there’s no hidden state beyond what you choose to persist.
My approach isn’t exactly something you can build a million dollar SaaS off of, but it works for a proof-of-concept/tutorial/weekend project thing I was making. It trades the flashy speculative autonomy of general purpose LLMs for heavily “guardrailed” utility — which, hot take alert, is actually what you want in a real-world tool, in my opinion.
Coding Our Agent
Let’s start at the beginning — our main() function.
async function main(customBudget = null) {
try {
const budget = customBudget || CONFIG.monthlyBudgetUSD;
const startTime = Date.now();
// Step 1: Launch autonomous agents for each city in parallel
// Each agent will independently gather and analyze cost data for their assigned city
const agentPromises = CONFIG.cities.map(async (city, index) => {
try {
// Good practice: stagger agent launches to avoid overwhelming external APIs
await delay(index * 1000);
// Execute one complete agent cycle:
// Perceive(search Google)→ Reason(uses AI)→ Reflect (learn from the process)
const context = await agentTick(city); // this function inits and then adds to context in each phase
return { context, error: null };
} catch (error) {
// Agent failed - capture error but don't crash the entire analysis
return { context: null, error: error.message };
}
});
// Step 2: Wait for all agents to complete their analysis
const agentResults = await Promise.all(agentPromises);
// Filter out failed agents - we can still generate insights from partial data!
const successfulAgents = agentResults.filter(result => result.context !== null);
// Fail fast if no agents succeeded
if (successfulAgents.length === 0) {
throw new Error('No cities were successfully analyzed');
}
// Step 3: Transform agent contexts into structured analysis data
// Each context contains raw search results + the summary and insights that our LLM reasoned out from that data
// Don't sweat this. This is just mapping structured JSON into another JSON that makes it easy for the next step.
const cityAnalyses = successfulAgents
.map(result => contextToAnalysis(result.context))
.filter(analysis => analysis !== null);
// Step 4: Generate cross-city comparative insights
// No AI here, this just does a lot of math
const comparativeAnalysis = generateComparativeAnalysis(cityAnalyses, budget);
// Step 5: Take analysis + results of math from Step 4 and create a human-readable report
const markdownReport = generateMarkdownReport(cityAnalyses, comparativeAnalysis);
// Step 6: Save that to filesystem
const reportPath = saveMarkdownReport(markdownReport, CONFIG.dataDir);
// Step 7: Return structured results for programmatic use
return {
summary: {
total_cities: CONFIG.cities.length,
successful_agents: successfulAgents.length,
execution_time_ms: Date.now() - startTime,
budget_used: budget
},
city_analyses: cityAnalyses,
markdown_report: markdownReport,
report_path: reportPath
};
} catch (error) {
throw new Error(`Analysis pipeline failed: ${error.message}`);
}
}
So here’s what should be immediately apparent here:
- We’re not running one agent, but a whole fleet of agents in parallel — one for each city — and each does its thing with that agentic loop we talked about (we’ll quantify this as one agent tick, and put it in an appropriately named `agentTick()` function).
- Each agent has a `context` which serves as its memory and state container throughout — each agent gets initialized with this `context` when `agentTick()` runs. This is an agent’s “working memory” that persists all the data it gathers and all the conclusions it draws, so it can make increasingly informed decisions without manual supervision as it progresses through its analysis cycle.
- Our meta-analysis layer — `generateComparativeAnalysis()` — takes all the LLM-generated city analyses and produces rankings across multiple dimensions — cheapest to most expensive by category, best PPP-adjusted value, highest remote work scores, fastest internet speeds, and highest data quality. No AI is involved here, just math, and it produces structured data that we can simply format into markdown for our final result.
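One helper main() uses that isn’t shown above: delay() is just a promisified setTimeout.

```javascript
// Promisified setTimeout, used to stagger agent launches and pace API calls
function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
```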
What would such a context look like? I’m going to design it like this, but feel free to add anything else you might like (tracing/logging comes to mind immediately as a potential improvement).
function createAgentContext(cityObj) {
return {
// Basic city info
city: cityObj.name,
country: cityObj.country,
// Phase results
perception: null,
reasoning: null,
// Error tracking
errors: [],
// Agent state
state: {
iteration: 0,
max_iterations: 3,
confidence: 0,
completeness: 0,
goals_met: false
},
// Goals
goals: {
confidence_target: 75,
completeness_target: 0.8, // 80% of categories covered = consider city complete
min_acceptable_confidence: 50 // 50% confidence achieved = consider city acceptable
},
// Memory for learning
/**
* Agent uses memory to:
* - Avoid repeating failed strategies within the same session
* - Prefer previously successful strategies for specific categories
* - Adapt its approach across iterations (up to 3 per session)
*/
memory: {
attempted_strategies: [],
failed_strategies_by_category: {}, // per-category failures
successful_strategies_by_category: {}, // per-category successes
category_patterns: {}
},
// Metadata
metadata: {
started_at: new Date().toISOString(),
completed_at: null,
execution_time_ms: null
}
};
}
Each phase of an agent has access to this common context, and this gets added to and changed throughout the cycle.
Of course, this being the Node.js ecosystem, there’s probably a library out there that can do this for me per agent, but we’re learning here, so we’re just gonna roll everything from scratch.
Next, we’ll cover our core agent lifecycle — the Perceive → Reason → Reflect loop — starting with Perception.
Step 1: Perception
// PERCEIVE: Gather cost-of-living data for this city using adaptive strategies
async function perceive(context) {
try {
// Check cache first - no need to redo search if we have recent (e.g. within the last 7 days) SERP data for this city
if (hasCachedData({ name: context.city, country: context.country }, CONFIG.cacheDir, CONFIG.cacheExpiryDays)) {
const cachedPerception = loadCachedData({ name: context.city, country: context.country }, CONFIG.cacheDir);
if (cachedPerception) {
context.perception = cachedPerception;
return context;
}
}
const categorySearchResults = {};
// Process each category with adaptive strategy selection
for (const category of CONFIG.costCategories) {
try {
// STEP 1: Strategy-Level Learning (Strategy Selection)
// Select search strategy based on context (rent, groceries, internet, etc) and memory
const strategy = selectStrategy(context, category);
// Based on that strategy, build a targeted search query for this city + category
let searchQuery = strategy.buildQuery(context.city, context.country, category);
// STEP 2: Query-Level Learning (Adaptations)
// Apply adaptations to modify the query based on reflect phase suggestions
if (context.state.current_adaptations?.includes('expand_search_terms')) {
// Make search broader with additional terms
searchQuery += ` OR "${category.displayName} price" OR "${category.displayName} budget" OR "${category.displayName} expense"`;
}
if (context.state.current_adaptations?.includes('try_local_sources')) {
// Add local community sources to search query
searchQuery += ` site:reddit.com OR site:expat.com OR site:nomadlist.com`;
}
// STEP 3: Execute the search query with Bright Data SERP API
const rawSearchData = await fetchWithBrightDataProxy(searchQuery, CONFIG);
// STEP 4: Based on that collected SERP data, build a perception of the category
// Don't sweat this, this is another structured object/JSON builder for the next step
const categoryPerception = buildPerceptionFromSearchData(rawSearchData, context.city, category.name, CONFIG.maxResults);
// STEP 5: Apply strategy-specific adjustments and confidence modifiers
categoryPerception.strategy_used = strategy.name;
// Apply confidence modifier based on strategy performance
const { getStrategyStats } = require('./context');
const strategyStats = getStrategyStats(context, strategy.name, category.name);
if (strategyStats.success_rate < 0.5 && strategyStats.attempts > 2) {
categoryPerception.metadata.confidence_modifier = -0.2; // Lower confidence for underperforming strategies
} else if (strategyStats.success_rate > 0.8 && strategyStats.attempts > 1) {
categoryPerception.metadata.confidence_modifier = 0.1; // Higher confidence for proven strategies
} else {
categoryPerception.metadata.confidence_modifier = 0; // Neutral for new or average strategies
}
categorySearchResults[category.name] = categoryPerception;
await delay(CONFIG.delayBetweenRequests);
} catch (categoryError) {
addError(context, 'perception', `Category search failed: ${categoryError.message}`, category.name);
// Record strategy failure for future learning
const { recordStrategy } = require('./context');
const strategy = selectStrategy(context, category);
recordStrategy(context, strategy.name, category.name, false, 0);
}
}
// Package all category searches into complete city perception
const perception = {
timestamp: new Date().toISOString(),
city: context.city,
country: context.country,
search_strategy: 'adaptive_agentic',
category_searches: categorySearchResults
};
// Save SERP data to cache for future runs
saveCacheData({ name: context.city, country: context.country }, perception, CONFIG.cacheDir, CONFIG.cacheExpiryDays);
context.perception = perception;
// Update agent state based on search results
const completeness = Object.keys(categorySearchResults).length / CONFIG.costCategories.length;
updateState(context, { completeness });
return context;
} catch (error) {
addError(context, 'perception', `Perception failure: ${error.message}`);
throw error;
}
}
1. Smart Caching Check
First, the agent checks if it already has recent data cached. No point in burning API calls if we just searched for Austin’s cost data yesterday, after all. This is basic efficiency, but it also demonstrates agent memory — the agent remembers what it learned before.
For this, I just save SERP data for each city to the filesystem as JSON using saveCacheData, and retrieve it at the beginning of each run using hasCachedData, but if you have a better solution (such as a dedicated caching library), feel free to plug that in here.
2. Adaptive Strategy Selection
Here’s where it gets interesting. For each cost category (rent, groceries, utilities, etc.), the agent doesn’t just use the same search approach every time. Instead, it calls selectStrategy() which looks at:
- What type of data we’re searching for (rent data might need different sources than internet speeds)
- What worked before — if Numbeo gave us great rent data last time, prioritize that.
- What failed before — if a strategy consistently returns junk, avoid it.
You’ll find this in strategies.js in the repo:
// Dynamic Strategy Selection - Multiple search approaches
// each strategy has a confidence modifier (how much to trust results from this source)
const SEARCH_STRATEGIES = {
numbeo_focused: {
name: 'numbeo_focused',
description: 'Focus on Numbeo.com data',
confidence_modifier: 1.0,
buildQuery: (city, country, category) =>
`${category.displayName} cost ${city} ${country} site:numbeo.com`,
sites: ['numbeo.com']
},
expatistan_focused: {
name: 'expatistan_focused',
description: 'Focus on Expatistan.com data',
confidence_modifier: 0.9,
buildQuery: (city, country, category) =>
`${category.displayName} price ${city} ${country} site:expatistan.com`,
sites: ['expatistan.com']
},
// more strategies here
};
// The brains of the operation.
// Selects a strategy based on a given category (rent, groceries, internet, etc)
function selectStrategy(context, category) {
const categoryName = category.name;
// Step 1: Before we start, get memory of what worked and what didn't for this category
const categoryFailedStrategies = context.memory.failed_strategies_by_category[categoryName] || [];
const categorySuccessfulStrategies = context.memory.successful_strategies_by_category[categoryName] || [];
// Step 2: Filter out strategies that have already failed for this category
const availableStrategies = Object.values(SEARCH_STRATEGIES).filter(strategy =>
!categoryFailedStrategies.includes(strategy.name)
);
// Step 3: If we've exhausted all strategies, fall back to our most general one
if (availableStrategies.length === 0) {
return SEARCH_STRATEGIES.multi_source; // a generic multi-search is my most general strategy; if yours is something else, put that in here
}
// Step 4: Strategy selection logic - prioritize what worked before
let selectedStrategy;
// First, try strategies that worked well for this category in the past
const successfulStrategy = availableStrategies.find(strategy =>
categorySuccessfulStrategies.includes(strategy.name)
);
if (successfulStrategy) {
selectedStrategy = successfulStrategy;
}
// We don't have past data? Then let's start with our most comprehensive strategy
else if (context.state.iteration === 0) {
selectedStrategy = SEARCH_STRATEGIES.multi_source;
}
// Okay, got it. That strategy doesn't work. Let's iterate! For subsequent attempts, try specialized strategies in order of reliability
else {
const strategyOrder = [
SEARCH_STRATEGIES.numbeo_focused,
SEARCH_STRATEGIES.reddit_local,
SEARCH_STRATEGIES.government_stats,
SEARCH_STRATEGIES.expatistan_focused,
SEARCH_STRATEGIES.expat_forums
];
const nextStrategy = strategyOrder.find(strategy =>
availableStrategies.includes(strategy)
);
selectedStrategy = nextStrategy || availableStrategies[0];
}
return selectedStrategy;
}
The selectStrategy() function is where we see something resembling true agentic learning in action. Our agent will make intelligent decisions based on its accumulated experience.
A. Memory Consultation
First, the agent consults its memory about this specific category. It asks itself:
- “What strategies have I tried before for finding rent data in this city?”
- “Which ones gave me good results?”
- “Which ones completely failed and I should avoid?”
This is episodic memory — the agent remembers specific experiences and uses them to inform future decisions.
B. Smart Filtering
The agent then filters out strategies that have already proven ineffective. If searching government statistics for grocery prices consistently returned nothing useful, that strategy gets eliminated from consideration. This prevents the agent from repeatedly making the same mistakes.
This is negative learning — learning what NOT to do is just as valuable as learning what works.
C. Prioritized Decision Making
Now comes the intelligent selection process. First Priority: Try strategies that worked well before for this category, regardless of city. If Numbeo gave great rent data last time, awesome, start there again. This is positive reinforcement learning, and it’ll work for whatever sources you decide to use.
Then, the most common case: for brand new searches (iteration 0), start with the most comprehensive strategy (multi_source) to cast a wide net.
Finally, for retry attempts, systematically work through specialized strategies in order of general reliability — Numbeo first (most structured data), then Reddit (community insights), then government stats (official but often outdated), etc. Of course, you can define these yourself.
D. Graceful Degradation
If the agent has exhausted all strategies for a category, it falls back to the most general approach rather than giving up entirely. This ensures the agent always has a “Plan B” even in difficult situations.
Our goal for this part should be to create a learning curve where agents get better at finding information over time. Early searches might be broad and inefficient, but as the agent builds up experience, it develops specialized knowledge about the best sources for different types of cost data in different cities.
If we get this right, each agent essentially develops its own “search expertise” — knowing that Austin rent data is best found on Numbeo, but Austin internet speeds are better sourced from Reddit discussions, for example.
This accumulated wisdom makes each subsequent search more targeted and effective.
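The memory helpers that perceive() pulls in from context.js — recordStrategy() and getStrategyStats() — can be sketched like this. The repo’s versions also track a confidence value per attempt; this sketch keeps just the essentials:

```javascript
// Record a strategy outcome for a category, and keep per-pair attempt stats
function recordStrategy(context, strategyName, categoryName, success) {
  const bucket = success
    ? context.memory.successful_strategies_by_category
    : context.memory.failed_strategies_by_category;
  bucket[categoryName] = bucket[categoryName] || [];
  if (!bucket[categoryName].includes(strategyName)) {
    bucket[categoryName].push(strategyName);
  }
  // Track attempts/successes per strategy+category for success-rate stats
  const key = `${strategyName}:${categoryName}`;
  const stats = (context.memory.category_patterns[key] =
    context.memory.category_patterns[key] || { attempts: 0, successes: 0 });
  stats.attempts += 1;
  if (success) stats.successes += 1;
}

function getStrategyStats(context, strategyName, categoryName) {
  const stats = context.memory.category_patterns[`${strategyName}:${categoryName}`];
  if (!stats) return { attempts: 0, success_rate: 0 };
  return { attempts: stats.attempts, success_rate: stats.successes / stats.attempts };
}
```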
This is strategy-level learning — the agent learns where to look for different types of information. For complete learning, we also need query-level learning — adapting the query string itself, on the fly, based on the Reflect phase. Let’s cover that next.
3. Query-Level Adaptations
Even after picking a strategy, the agent can modify its search query based on previous Reflection phases. If the agent realized it needed broader search terms or local community sources, it adapts the actual search string:
- `expand_search_terms` → adds synonyms and alternative phrasings
- `try_local_sources` → includes Reddit, expat forums, nomad communities
For this, we can just look in the context for this agent.
if (context.state.current_adaptations?.includes('expand_search_terms')) {
// Make search broader with additional terms
// TODO: use LLM call for this
searchQuery += ` OR "${category.displayName} price" OR "${category.displayName} budget" OR "${category.displayName} expense"`;
}
// other adaptation strategies here
This will make more sense when we cover the Reflect phase. For now, just know that our agent uses both strategy-level learning, as seen before (the agent learns where to search for certain categories), and query-level learning (the agent learns and adapts how to search within those chosen sources).
4. Execute & Learn
Execution here would be the actual act of searching the web, using the chosen search strategy. As mentioned before, this project uses the Bright Data SERP API, which provides structured access to Google Search results, returned as raw HTML or parsed JSON (we’re choosing the latter).
We can’t just hit websites directly — that breaks in the real world. Pages change markup constantly. IP bans, regional content blocks, and anti-bot defenses make direct scraping brittle and non-reproducible — and none of that gets better at scale. Bright Data solves this by acting like a real user on a real browser, through a proxy network, and gives you back Google results (or results from a number of other search engines, but we’re going with Google here) in a predictable format. For our agent, this is essentially Google-search-as-a-service, and it lets us focus on interpretation instead of arms races.
const { HttpsProxyAgent } = require('https-proxy-agent');
const fetch = require('node-fetch');
async function fetchWithBrightDataProxy(searchQuery, config) {
try {
const proxyUrl = `http://brd-customer-${config.customerId}-zone-${config.zone}:${config.password}@${config.proxyHost}:${config.proxyPort}`;
const agent = new HttpsProxyAgent(proxyUrl, {
rejectUnauthorized: false
});
const searchUrl = `https://www.google.com/search?q=${encodeURIComponent(searchQuery)}&num=${config.maxResults}&brd_json=1`;
const response = await fetch(searchUrl, {
method: 'GET',
agent: agent,
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
'Accept': 'application/json, text/html, */*',
'Accept-Encoding': 'gzip, deflate, br'
}
});
const responseText = await response.text();
if (!response.ok) {
throw new Error(`HTTP error! Status: ${response.status} - ${response.statusText}`);
}
let data;
try {
data = JSON.parse(responseText);
return data;
} catch (parseError) {
if (responseText.trim().startsWith('<!DOCTYPE') || responseText.trim().startsWith('<html')) {
throw new Error('Received HTML instead of JSON - proxy may not be working correctly');
} else {
throw new Error('Response is not valid JSON');
}
}
} catch (error) {
console.error('❌ Search request failed:', error.message);
throw error;
}
}
Each category (e.g. rent, groceries, internet) gets its own tailored search query, built using an adaptive strategy system that can evolve over time. Once the query is constructed (with all the right city/country/category context baked in), it’s handed off to Bright Data’s SERP API, which executes the search on real Google results in real time.
What matters is that this step is live, and not reliant on the LLM’s training data. Every execution of perceive() brings in fresh results (fresh within the last 7 days, anyway).
After executing the search query and receiving raw SERP data from Google, the agent needs to transform this data into a structured perception and apply learning mechanisms for future improvement.
5. Building Structured Perceptions
The raw search results from Google come in a complex JSON with organic results, knowledge graphs, and various metadata. The buildPerceptionFromSearchData() function transforms this into a clean, standardized perception object:
This function extracts the most relevant information:
- Organic search results: Titles, descriptions, and links from the top search results
- Knowledge graph data: Structured facts and descriptions when available
- Data quality metrics: Counts of results, presence of authoritative sources, relevance scores
The resultant perception object provides a clean, consistent structure that the reasoning phase can work with, regardless of the variability in Google’s raw response format.
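Here’s a rough sketch of what that transformation might look like. The field names beyond those described above are assumptions, not the repo’s exact code:

```javascript
// Sketch of buildPerceptionFromSearchData(); exact field names are assumptions
function buildPerceptionFromSearchData(searchData, strategyUsed) {
  const organic = (searchData.organic || []).map(r => ({
    title: r.title || '',
    description: r.description || '',
    link: r.link || ''
  }));
  return {
    strategy_used: strategyUsed,
    sources: {
      google_search: { organic, knowledge_graph: searchData.knowledge || null }
    },
    metadata: {
      result_count: organic.length,
      // Treat well-known cost-of-living sites as authoritative
      has_authoritative_source: organic.some(r =>
        /numbeo|expatistan|speedtest/i.test(r.link))
    }
  };
}
```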
6. Strategy Performance Tracking and Confidence Adjustment
This is where the agent’s learning mechanism kicks in. The system tracks how well each search strategy performs for different categories and cities, then adjusts confidence levels accordingly:
Arriving at these required some mathematical assumptions; I simply eyeballed the values:
- Underperforming strategies (success rate < 50% after multiple attempts) get a -0.2 confidence penalty
- High-performing strategies (success rate > 80% with proven track record) get a +0.1 confidence boost
- New or average strategies remain neutral with 0 adjustment
This confidence modifier value becomes crucial in the next phase — Reasoning, where the agent must decide how much weight to give to information from different search strategies. Data from consistently successful strategies will be trusted more, while data from failing strategies will be treated with skepticism.
💡 All this code for “learning” is crucial because without it, the agent would treat all search results equally, even when certain strategies consistently fail for specific categories or cities. The confidence modifier allows the agent to become more discerning over time, improving the overall quality of its cost-of-living analysis.
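In code, the adjustment rule boils down to something like this. The -0.2/+0.1 thresholds come from the rules above; the minimum-attempts cutoff of 3 is my own assumption, since the post only says “multiple attempts”:

```javascript
// Sketch: confidence modifier from tracked (strategy, category) stats.
// Penalty/boost values are from the post; min attempts of 3 is assumed.
function getConfidenceModifier(stats) {
  if (!stats || stats.attempts < 3) return 0;   // new strategy: neutral
  const successRate = stats.successes / stats.attempts;
  if (successRate < 0.5) return -0.2;           // underperforming: penalty
  if (successRate > 0.8) return 0.1;            // proven performer: boost
  return 0;                                     // average: neutral
}
```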
7. State Update
Finally, the agent updates its internal state/context with a “completeness” score based on how much useful data it found. This feeds into the next phase where the agent decides if it needs to search more or if it’s confident enough to proceed.
Each city agent develops its own “search personality” based on what works best for each category in its particular location and data landscape.
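A minimal version of that completeness update might look like this; the field names are assumed from the context shape used elsewhere in this post:

```javascript
// Sketch: derive a completeness score from how many categories yielded results
function updateCompleteness(context, categorySearches) {
  const perceptions = Object.values(categorySearches);
  const withData = perceptions.filter(p => p && p.metadata && p.metadata.result_count > 0);
  context.state.completeness = perceptions.length > 0
    ? Math.round((withData.length / perceptions.length) * 100)
    : 0;
  return context.state.completeness;
}
```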
Step 2: Reasoning
This is where raw search data gets transformed into actionable cost-of-living insights through AI analysis and economic calculations.
// REASONING - AI analysis with confidence tracking
async function reason(context) {
if (!context.perception) {
const error = new Error('No perception data available for reasoning');
addError(context, 'reasoning', error.message);
throw error;
}
try {
const costCategories = [];
let totalConfidence = 0;
let categoriesProcessed = 0;
// Process each category
for (const category of CONFIG.costCategories) {
const categoryPerception = context.perception.category_searches[category.name];
if (!categoryPerception) {
continue;
}
try {
// STEP 1: Extract cost data from search results using AI
const result = await extractCostData(categoryPerception, category, context.city, context.country);
// STEP 2: Apply strategy confidence modifier from perception phase
const strategyModifier = categoryPerception.metadata.confidence_modifier || 0;
result.confidence = Math.max(0, Math.min(100, result.confidence + (strategyModifier * 100)));
result.strategy_used = categoryPerception.strategy_used;
// STEP 3: Record strategy performance for future learning
const { recordStrategy } = require('./context');
const isSuccess = result.confidence >= 60; // Consider 60%+ confidence as success
recordStrategy(context, result.strategy_used, category.name, isSuccess, result.confidence);
costCategories.push(result);
totalConfidence += result.confidence;
categoriesProcessed++;
await delay(500);
} catch (categoryError) {
addError(context, 'reasoning', `Category analysis failed: ${categoryError.message}`, category.name);
}
}
const averageConfidence = categoriesProcessed > 0 ? totalConfidence / categoriesProcessed : 0;
// STEP 4: Calculate purchasing power parity adjustments
const basicCosts = {};
costCategories.forEach(cat => {
if (cat.usd_amount && cat.usd_amount > 0) {
// Transform "Monthly Rent" -> "monthly_rent", "Internet" -> "internet", etc.
basicCosts[cat.category.toLowerCase().replace(/\s+/g, '_')] = cat.usd_amount;
}
});
const pppResults = calculatePPPAdjustedCosts(basicCosts, context.country, CONFIG.ppp);
// STEP 5: Generate final analysis and summary
const remoteWorkScore = calculateRemoteWorkScore({
cost_analysis: {
cost_categories: costCategories,
ppp_analysis: pppResults
}
}, CONFIG.remoteWorkWeights);
const remoteWorkerSummary = await generateRemoteWorkSummary(
context.city,
context.country,
costCategories,
averageConfidence,
remoteWorkScore,
pppResults
);
const reasoning = {
timestamp: new Date().toISOString(),
city: context.city,
country: context.country,
cost_analysis: {
cost_categories: costCategories,
ppp_analysis: pppResults,
overall_assessment: {
data_availability: averageConfidence > 70 ? 'good' : averageConfidence > 50 ? 'limited' : 'poor',
source_reliability: averageConfidence > 70 ? 'high' : averageConfidence > 50 ? 'medium' : 'low',
summary: remoteWorkerSummary
},
total_confidence: averageConfidence
},
remote_work_score: remoteWorkScore
};
context.reasoning = reasoning;
updateState(context, { confidence: averageConfidence });
return context;
} catch (error) {
addError(context, 'reasoning', `Reasoning failure: ${error.message}`);
// Create fallback reasoning
context.reasoning = {
timestamp: new Date().toISOString(),
city: context.city,
country: context.country,
cost_analysis: {
cost_categories: [],
ppp_analysis: {
original: {},
ppp_adjusted: {},
ppp_factor: CONFIG.ppp.find(p => p.country === context.country)?.value || 1.0
},
overall_assessment: {
data_availability: 'poor',
source_reliability: 'low',
summary: `${context.city} analysis failed due to data collection issues.`
},
total_confidence: 0
},
remote_work_score: 0,
error: error.message
};
return context;
}
}
1. AI-Powered Cost Extraction
For each category (rent, groceries, transportation, etc.), the agent sends the search results to an AI model that acts as a specialized data analyst:
// STEP 1: Extract cost data from search results using AI
const result = await extractCostData(categoryPerception, category, context.city, context.country);
The AI model examines titles, descriptions, and content from the search results, looking for specific cost information. It’s prompted to:
- Identify actual prices and costs mentioned in the text
- Distinguish between different types of costs (monthly rent vs. purchase prices)
- Extract numerical values and convert them to USD
- Assess the reliability of the information based on source quality
We give the AI no creative freedom.
async function extractCostData(categoryPerception, category, cityName, countryName) {
if (!process.env.OPENAI_API_KEY) {
throw new Error('OPENAI_API_KEY environment variable is not set');
}
const searchData = categoryPerception.sources.google_search;
const formattedResults = JSON.stringify(searchData, null, 2);
let prompt = `You are a cost analysis assistant. Extract ${category.displayName} cost data for ${cityName}, ${countryName} from these targeted search results.
City: "${cityName}, ${countryName}"
Target Category: "${category.displayName}"
Search Results:
${formattedResults}
Extract the following information:
1. Find the most relevant and recent cost data for ${category.displayName}
2. Extract the amount and currency
3. Convert to USD if possible
4. Identify the source and reliability
5. Provide context (e.g., city center vs suburbs, monthly vs daily)
6. Assign a confidence score as an INTEGER from 0 to 100 (IMPORTANT: This must be a whole number percentage like 85, not a decimal like 0.85)`;
if (category.name === 'internet') {
prompt += `
INTERNET SPECIFIC REQUIREMENTS:
7. Extract internet speed in Mbps (look for download speeds)
8. Rate internet reliability on a scale of 0-100 based on user reviews/reports
9. Determine if fiber internet is widely available (true/false)
10. Look for monthly internet package costs (not daily or hourly rates)
INTERNET DATA SOURCES: Prioritize Numbeo, Speedtest.net data, ISP websites, and user reviews.`;
}
prompt += `
Focus on credible sources like Numbeo, Expatistan, or official city data.
CONFIDENCE SCORING GUIDE (return as INTEGER 0-100):
- 90-100: Recent data from Numbeo/Expatistan with clear pricing
- 70-89: Reliable source but older data or less specific location
- 50-69: General estimates or less reliable sources
- 30-49: Rough estimates or poor source quality
- 0-29: Very unreliable or no data found
EXAMPLE CONFIDENCE VALUES: 85, 72, 91, 43 (NOT 0.85, 0.72, 0.91, 0.43)
Return structured data for this specific category only.`;
const baseSchema = {
category: z.literal(category.displayName),
amount: z.number().nullable(),
currency: z.string().nullable(),
usd_amount: z.number().nullable(),
source: z.string().nullable(),
context: z.string().nullable(),
confidence: z.number().min(0).max(100),
notes: z.string().nullable()
};
if (category.name === 'internet') {
baseSchema.internet_speed_mbps = z.number().nullable();
baseSchema.internet_reliability_score = z.number().min(0).max(100).nullable();
baseSchema.fiber_availability = z.boolean().nullable();
}
const result = await generateObject({
model: openai('gpt-4o-mini'),
schema: z.object(baseSchema),
prompt: prompt
});
return result.object;
}
The AI must return data in exactly this structure. It cannot add extra fields, change data types, or provide unstructured responses. The Zod baseSchema used in generateObject() enforces this.
Also, note how instead of letting the AI guess how confident it should be, we provide explicit criteria.
prompt += `
CONFIDENCE SCORING GUIDE (return as INTEGER 0-100):
- 90-100: Recent data from Numbeo/Expatistan with clear pricing
- 70-89: Reliable source but older data or less specific location
- 50-69: General estimates or less reliable sources
- 30-49: Rough estimates or poor source quality
- 0-29: Very unreliable or no data found
EXAMPLE CONFIDENCE VALUES: 85, 72, 91, 43 (NOT 0.85, 0.72, 0.91, 0.43)`;
2. Strategy Confidence Integration
Remember those confidence modifiers we calculated in the Perception phase? Now they come into play:
// STEP 2: Apply strategy confidence modifier from perception phase
const strategyModifier = categoryPerception.metadata.confidence_modifier || 0;
result.confidence = Math.max(0, Math.min(100, result.confidence + (strategyModifier * 100)));
If a search strategy has been consistently successful, it boosts the confidence in the extracted data. If it’s been failing, it reduces confidence. This creates a feedback loop where the agent becomes more discerning about which data to trust.
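For instance, a failing strategy’s -0.2 modifier knocks 20 points off whatever confidence the extraction returned:

```javascript
// Worked example of the clamp above: a -0.2 strategy modifier on 70% confidence
const strategyModifier = -0.2;
let confidence = 70;
confidence = Math.max(0, Math.min(100, confidence + strategyModifier * 100));
console.log(confidence); // 50
```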
3. Learning Loop Completion
The agent records whether this reasoning attempt was successful, completing the learning cycle:
// STEP 3: Record strategy performance for future learning
const isSuccess = result.confidence >= 60; // Consider 60%+ confidence as success
recordStrategy(context, result.strategy_used, category.name, isSuccess, result.confidence);
This data feeds back into the strategy selection process, helping the agent make better choices in future searches.
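For reference, recordStrategy() (in context.js) could be as simple as a per-(strategy, category) tally. This is a sketch with an assumed storage shape, not the repo’s exact code:

```javascript
// Sketch of recordStrategy(); the strategy_history storage shape is assumed
function recordStrategy(context, strategy, category, success, confidence) {
  context.strategy_history = context.strategy_history || {};
  const key = `${strategy}:${category}`;
  const stats = context.strategy_history[key] || { attempts: 0, successes: 0, avg_confidence: 0 };
  // Running average of confidence across attempts
  stats.avg_confidence = (stats.avg_confidence * stats.attempts + confidence) / (stats.attempts + 1);
  stats.attempts += 1;
  if (success) stats.successes += 1;
  context.strategy_history[key] = stats;
  return stats;
}
```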
4. Economic Adjustments
There’s just one hiccup — raw costs don’t tell the full story. A $500 rent might be expensive in Thailand, but dirt cheap in San Francisco. The agent therefore applies purchasing power parity (PPP) adjustments:
// STEP 4: Calculate purchasing power parity adjustments
const pppResults = calculatePPPAdjustedCosts(basicCosts, context.country, CONFIG.ppp);
PPP adjustments normalize costs based on the local economy, giving you a better sense of what those costs actually mean in terms of purchasing power.
function calculatePPPAdjustedCosts(costs, country, pppData) {
const pppFactor = getPPPFactor(country, pppData); // look this up from a PPP dataset (e.g. World Bank) or an API
const adjustedCosts = {};
Object.entries(costs || {}).forEach(([key, value]) => {
if (typeof value === 'number' && value > 0 && !isNaN(value)) {
adjustedCosts[key] = value * pppFactor;
} else {
adjustedCosts[key] = 0;
}
});
return {
original: costs || {},
ppp_adjusted: adjustedCosts,
ppp_factor: pppFactor,
explanation: pppFactor < 1 ? 'Lower costs due to higher purchasing power' :
pppFactor > 1 ? 'Higher costs due to lower purchasing power' :
'USD baseline'
};
}
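To make that concrete, here’s an illustrative run. The PPP factor is invented for the example, and getPPPFactor() is sketched here since the real lookup isn’t shown:

```javascript
// Illustrative PPP lookup and adjustment; the 0.4 factor is made up
function getPPPFactor(country, pppData) {
  return pppData.find(p => p.country === country)?.value || 1.0;
}
const pppData = [{ country: 'Thailand', value: 0.4 }];
const factor = getPPPFactor('Thailand', pppData);
const nominalRent = 500;                      // USD, as found in search results
const pppAdjustedRent = nominalRent * factor; // what that spend "feels like" locally
console.log(pppAdjustedRent); // 200
```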
5. Final Analysis and Scoring
Finally, the agent synthesizes everything into a comprehensive analysis:
// STEP 5: Generate final analysis and summary
const remoteWorkScore = calculateRemoteWorkScore(/* ... */);
const remoteWorkerSummary = await generateRemoteWorkSummary(/* ... */);
This includes:
- Remote work suitability score: Based on internet costs, living costs, and other factors
- Overall assessment: Data quality, reliability, and cost level categorization
- AI-generated summary: Human-readable executive summary of the city’s cost of living data (cleaned).
Here’s how the remote work score is calculated (the mathematical assumptions here were covered at the start of this post):
function calculateRemoteWorkScore(analysis, weights) {
let totalScore = 0;
let totalWeight = 0;
// Cost of Living Score = 70% of total score
const costCategories = analysis.cost_analysis.cost_categories.filter(c =>
['1BR Apartment Rent', 'Monthly Groceries', 'Monthly Utilities'].includes(c.category) && c.usd_amount
);
if (costCategories.length > 0) {
const totalMonthlyCost = costCategories.reduce((sum, c) => sum + c.usd_amount, 0);
const costScore = Math.max(0, Math.min(100, 100 - ((totalMonthlyCost - 500) / 2500) * 100));
totalScore += costScore * weights.cost_of_living;
totalWeight += weights.cost_of_living;
}
// Internet Quality Score = 30% of total score
const internetData = analysis.cost_analysis.cost_categories.find(c => c.category === 'Internet Speed & Cost');
if (internetData && internetData.internet_speed_mbps) {
const speed = internetData.internet_speed_mbps;
const minSpeed = 10;
const maxSpeed = 100;
const internetScore = Math.max(5, Math.min(100,
100 * (Math.log(speed + minSpeed) - Math.log(minSpeed)) / (Math.log(maxSpeed + minSpeed) - Math.log(minSpeed))
));
totalScore += internetScore * weights.internet_quality;
totalWeight += weights.internet_quality;
}
return totalWeight > 0 ? Math.round(totalScore / totalWeight) : 0;
}
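To make the weighting concrete, here’s the arithmetic for a hypothetical city with $1,500/month in core costs and 50 Mbps internet. The inputs are invented for illustration; the formulas are the ones from the function above:

```javascript
// Worked example of the scoring formulas above; inputs are invented
const weights = { cost_of_living: 0.7, internet_quality: 0.3 };
const totalMonthlyCost = 1500; // rent + groceries + utilities, USD
const costScore = Math.max(0, Math.min(100, 100 - ((totalMonthlyCost - 500) / 2500) * 100)); // 60
const speed = 50, minSpeed = 10, maxSpeed = 100;
const internetScore = Math.max(5, Math.min(100,
  100 * (Math.log(speed + minSpeed) - Math.log(minSpeed)) / (Math.log(maxSpeed + minSpeed) - Math.log(minSpeed))
)); // ~74.7
const totalWeight = weights.cost_of_living + weights.internet_quality; // 1.0
const finalScore = Math.round((costScore * weights.cost_of_living + internetScore * weights.internet_quality) / totalWeight);
console.log(finalScore); // 64
```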
The executive summary generation is the second of our two LLM-enhanced functions, and it demonstrates our crucial principle for building reliable AI agents: constrained, data-driven AI generation rather than open-ended creativity.
async function generateRemoteWorkSummary(cityName, countryName, costCategories, averageConfidence, remoteWorkScore, pppResults) {
const summaryPrompt = `You are a remote work consultant. Based on the cost analysis data for ${cityName}, ${countryName}, write a concise summary (2-3 sentences) that helps remote workers understand this city's suitability.
Cost Data Summary:
- Average confidence: ${averageConfidence.toFixed(1)}%
- Categories analyzed: ${costCategories.length}
- Remote work score: ${remoteWorkScore}/100
- PPP factor: ${pppResults.ppp_factor.toFixed(3)}
- Country: ${countryName}
Key costs found:
${costCategories.map(cat => `- ${cat.category}: ${cat.usd_amount ? `$${cat.usd_amount}` : 'Not found'} (${cat.confidence}% confidence)`).join('\n')}
Write a practical summary that mentions:
1. Overall affordability/value proposition
2. Key strengths for remote workers (internet, costs, lifestyle)
3. Data reliability context
4. Specific appeal (e.g., time zones, infrastructure, cost savings)
Example format: "Lisbon offers a strong balance for remote workers, with affordable rent, reliable internet, and high quality-of-life scores. Based on a confidence score of 78.3%, it is considered a high-potential location for remote living, especially for Europeans seeking time zone alignment."
Keep it concise and actionable for someone deciding where to live remotely.`;
try {
const summaryResult = await generateObject({
model: openai('gpt-4o-mini'),
schema: z.object({
summary: z.string().describe("Remote worker-focused summary of the city's suitability")
}),
prompt: summaryPrompt
});
return summaryResult.object.summary;
} catch (error) {
return `${cityName} shows ${averageConfidence > 70 ? 'strong' : averageConfidence > 50 ? 'moderate' : 'limited'} data availability for remote work planning. With a ${remoteWorkScore}/100 remote work score and ${pppResults.ppp_factor.toFixed(3)} PPP factor, it ${remoteWorkScore > 70 ? 'appears well-suited' : remoteWorkScore > 50 ? 'offers mixed potential' : 'may present challenges'} for remote workers based on available cost and infrastructure data.`;
}
}
Notice what’s happening here:
- We’re grounding the data strictly. Every piece of information in the prompt comes from actual search results and calculations
- We have explicit constraints in place. The AI is told exactly what to focus on and how to structure the response
- We use Zod with the AI SDK’s generateObject() to ensure consistent, parseable results
- When the AI fails, there’s a deterministic fallback based on the actual data
💡 The ChatGPT-style “Tell me about living costs in Bangkok for remote workers” would be entirely the wrong approach to take here. The AI would hallucinate costs, make up statistics, work off of outdated and/or biased training data, and provide generic advice.
With the agentic style, the prompt instead reads: “Based on these specific search results showing $400 rent and $25 internet costs, with 85% confidence from reliable sources, synthesize a remote work assessment.” The AI can only work with the provided data and must structure its response according to strict guidelines.
Perfect? No. But now we’re trading in the “wow factor” of general-purpose AI for predictable, grounded, auditable output — far more useful in production systems.
What makes this reasoning phase powerful is how it closes the learning loop. Every analysis:
- Feeds back into strategy selection for future searches
- Adjusts confidence based on past performance
- Improves over time as the agent learns which approaches work best
The result is an agent that gets better at finding and analyzing living cost data with each city it processes.
Step 3: Reflect
The final phase, Reflection, is where we make our agent autonomous: before proceeding, we force it to evaluate its own performance and decide whether to discard all data and try again with different strategies.
// REFLECTION - Self-assessment and adaptation
async function reflect(context) {
const evaluation = evaluateGoals(context);
if (evaluation.goals_met) {
return { should_continue: false, adaptations: [] };
}
if (!evaluation.should_retry) {
return { should_continue: false, adaptations: [] };
}
// Analyze results for adaptation
const results = context.reasoning?.cost_analysis?.cost_categories || [];
const adaptationAnalysis = adaptSearchBasedOnResults(context, results);
return {
should_continue: true,
adaptations: adaptationAnalysis.suggested_adaptations,
needs_retry: adaptationAnalysis.needs_retry
};
}
1. Goal Evaluation
// in agent.js
async function reflect(context) {
const evaluation = evaluateGoals(context);
if (evaluation.goals_met) {
return { should_continue: false, adaptations: [] };
}
// evaluateGoals() can be found in context.js
// Evaluate whether the agent has met its analysis goals and determines if retry is needed
function evaluateGoals(context) {
const confidence_met = context.state.confidence >= context.goals.confidence_target;
const completeness_met = context.state.completeness >= context.goals.completeness_target;
const min_acceptable = context.state.confidence >= context.goals.min_acceptable_confidence;
context.state.goals_met = confidence_met && completeness_met;
return {
confidence_met,
completeness_met,
min_acceptable,
goals_met: context.state.goals_met,
should_retry: !context.state.goals_met && context.state.iteration < context.state.max_iterations && min_acceptable
};
}
The agent first asks itself: “Did I accomplish what I set out to do?” It evaluates success based on concrete criteria:
- Did I find cost data for most categories?
- Is the data reliable enough to make decisions?
- Do I have enough information about this city?
If the goals are met, the agent stops — no need to waste time and API calls on additional searches.
2. Try, try, try again.
if (!evaluation.should_retry) {
return { should_continue: false, adaptations: [] };
}
But if our bar isn’t met, the agent has to make an intelligent decision:
- Has it already tried multiple times? (Avoid infinite loops)
- Is the data quality improving? (Don’t retry if searches keep failing)
- Are there different strategies to try? (Only retry if there are new approaches)
This prevents the agent from getting stuck in unproductive loops while still allowing for intelligent persistence.
3. Adapt Strategy Choice
Remember how we used learning data in Perceive? Here’s where that comes from.
// in agent.js
// Analyze results for adaptation
const results = context.reasoning?.cost_analysis?.cost_categories || [];
const adaptationAnalysis = adaptSearchBasedOnResults(context, results);
return {
should_continue: true,
adaptations: adaptationAnalysis.suggested_adaptations,
needs_retry: adaptationAnalysis.needs_retry
};
// adaptSearchBasedOnResults() can be found in strategies.js
// Determines if we need to retry based on confidence
function adaptSearchBasedOnResults(context, results) {
// FYI: you should have per agent tracing here in production
// Analyze results to determine if retry is needed
const lowConfidenceCategories = results.filter(r => r.confidence < 60);
const highConfidenceCategories = results.filter(r => r.confidence > 80);
// Great, now we have a list of low confidence categories and high confidence categories identified
return {
needs_retry: lowConfidenceCategories.length > results.length * 0.5, // More than 50% low confidence
suggested_adaptations: generateAdaptations(context, lowConfidenceCategories)
};
}
// Generate adaptations based on the results of the search
// This is a simple, static list of recommendations that the agent can use to improve the query (expand search terms, try alternative approaches, try local sources, etc)
// In production, you should have a more dynamic list of adaptations
function generateAdaptations(context, lowConfidenceCategories) {
const adaptations = [];
// If many categories failed, try broader search
if (lowConfidenceCategories.length > 2) {
adaptations.push('expand_search_terms');
}
// If we have low confidence categories, try alternative approaches
if (lowConfidenceCategories.length > 0) {
adaptations.push('alternative_category_approach');
adaptations.push('try_local_sources');
}
return adaptations;
}
When the agent decides to retry, it doesn’t just repeat the same searches. It analyzes what went wrong and suggests specific adaptations:
Here are some adaptive strategies I’m using; feel free to augment these with your own:
- expand_search_terms: If searches were too narrow, broaden them with synonyms
- try_local_sources: If general sources failed, focus on local forums and communities
- alternative_category_approach: Can we use a different name for this category? Would an adjacent category fulfill the same purpose?
- adjust_location_specificity: If city-specific searches failed, try country-level or region-level searches
…and saves that to the Agent context.
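A sketch of that save step might look like this. The current_adaptations field matches what the Perceive phase reads earlier in this post; the function name is my own:

```javascript
// Sketch: persist reflection output so the next perceive() pass can adapt
function applyReflection(context, reflection) {
  if (reflection.should_continue) {
    context.state.current_adaptations = reflection.adaptations;
    context.state.iteration += 1;
  }
  return context;
}
```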
💡 You could even expand this to a shared memory for your entire fleet of agents if you want; it just wouldn’t be very useful in this case, because what an agent learns about sources and search queries while researching Bali, Indonesia won’t necessarily apply to Austin, TX.
Anyway, the Reflection phase is critical because it creates a powerful feedback mechanism that affects all the other phases:
- Search → Find some data, miss some categories
- Reason → Extract what’s available, note low confidence in missing areas
- Reflect → “I’m missing rent data, and my internet searches were too generic”
- Adapt → Try rent-specific sites and add internet speed terms
- Search again → With better strategies based on what was learned
Without reflection, an agent is just an expensive API wrapper. With reflection, it:
- knows when it’s succeeding or failing
- can change strategies on the fly based on hard data
- is efficient, stopping when goals are met and avoiding unnecessary work
- is self-improving, as each iteration teaches it better approaches for future cities
This is what separates it from a chatbot or simple automation. You need that persistence and adaptability to make it genuinely useful for complex, real-world tasks where the first attempt often isn’t enough.
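Tying it all together, the top-level loop implied by this architecture looks roughly like the following. The phase functions are injected as parameters here so the sketch stays self-contained; the real code calls perceive(), reason(), and reflect() directly:

```javascript
// Sketch of the Perceive → Reason → Reflect loop; phase functions are injected
async function runCityAgent(context, phases) {
  const { perceive, reason, reflect } = phases;
  while (context.state.iteration < context.state.max_iterations) {
    await perceive(context);                    // gather live SERP data
    await reason(context);                      // LLM extraction + scoring
    const reflection = await reflect(context);  // self-assessment
    if (!reflection.should_continue) break;     // goals met or retries exhausted
    context.state.current_adaptations = reflection.adaptations;
    context.state.iteration += 1;
  }
  return context;
}
```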
So that’s everything. Again, this project is on GitHub under an MIT license; feel free to check it out. Since this is a complex project, I’ve split it into modules.
The Virtues of Handcuffing LLMs
While this was a weekend project, the architecture naturally scales into real product use cases — anything that has a need for fresh, structured input, and LLM interpretation on top. That’s where this stack shines.
The heart of this agent is live web data. Without fresh search results across cost categories, countries, and query styles, the loop would either stall or hallucinate. Bright Data’s SERP API solves this elegantly and gives me live Google data, structured and as expansive as I could want. If you need a SERP API, check out their pricing here.
That reliability meant I could focus on orchestrating the agent loop instead of babysitting proxies or fixing brittle scrapers.
Oh, and handcuffing the model (e.g. strict schemas, deterministic flows, scoped tasks) doesn’t just improve reliability — it creates observability. You can track failures, identify weak strategies, and learn iteratively. That feedback loop can’t exist if you’re treating the model like a magic oracle.