TL;DR — Silent failures break data pipelines. This post shows how Effect-TS enables typed errors, safe resource management, declarative retry logic, and composable pipelines to build predictable, fault-tolerant web data ingestion systems at scale.

It annoys me to no end that production web data pipelines rarely fail catastrophically. Instead, batch jobs “succeed” with incomplete data — silently corrupting downstream analytics, triggering retry storms that lead to IP bans, or letting one bad edge case crash a large nightly job.
I’m currently rebuilding the web data ingestion pipeline I’m responsible for at work: aggregation and analysis from 100+ upstream sources daily, hundreds of items per batch, with strict consistency requirements. Over time, I stopped trying to paper over failures with more logging + more retries, and started looking for a way to make system behavior explicit and easier to reason about.
That search eventually led me to Effect (formerly Effect-TS) — a TypeScript effect system for modeling side effects, failures, and resource lifecycles directly in the type system.
Effect didn’t make my life easier in the sense of “fewer lines of code.” What it changed was how I thought about failure in TypeScript systems. Instead of treating network errors, rate limits, and partial responses as things to catch and move on from, Effect pushes you to model these failure modes explicitly and decide ahead of time how the system should respond.
Reliability engineering isn’t about building systems that never fail. It’s about building systems where failure is expected, understood, and bounded — so it doesn’t cascade into larger outages or silent data corruption.
In this post, I’ll walk through what that style of reliability engineering looks like in practice: using Effect-TS with typed errors, resource management, and declarative retries to build a fault-tolerant web data ingestion pipeline whose behavior is predictable under real-world failure.
All of the code in this post lives in a public Effect-TS web scraping and data ingestion repository on GitHub:
https://github.com/sixthextinction/effect-ts-scraping
What is Effect-TS?
At a practical level, Effect lets you describe work without running it yet.
An Effect value represents an operation that might perform I/O, might fail, and might depend on some environment…but none of that happens until you explicitly run it.
The crucial thing to realize is that Effect doesn’t just describe what the operation does. It also encodes what the operation produces on success, what it can fail with, and, optionally, what it depends on.
And ALL of that information lives in the TypeScript type system.
This might not sound like a big deal at first, but it changes when decisions get made — and, as it turns out, that matters a lot.
In a typical TypeScript codebase, a data-fetching function looks like this:
async function fetchHtml(url: string): Promise<string> {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status}`);
  }
  return await res.text();
}
// and then...
const promise = fetchHtml("https://example.com");
What most people don’t realize is that Promises in JavaScript are eager. As soon as that line runs, the request has already started. Even if you never await the promise, the network request is already in flight, side effects have already happened, and yes — failures may already be occurring.
Now compare that to an Effect-based version:
// first define the error
class NetworkError extends Data.TaggedError('NetworkError')<{
  url: string;
}> {}

// then, do this.
const fetchHtml = (url: string): Effect.Effect<string, NetworkError, never> =>
  Effect.tryPromise({
    try: async () => {
      const res = await fetch(url);
      if (!res.ok) {
        throw new Error();
      }
      return await res.text();
    },
    catch: () => new NetworkError({ url }),
  });
Effect is lazy, not eager. With Effect, just doing const effect = fetchHtml("https://example.com"); does nothing. It's simply data: a description of a computation. Nothing runs until you explicitly say so, by calling a runner like this:
Effect.runPromise(fetchHtml("https://example.com"));
Because the work doesn't start until you run that, you can still alter how it should behave — retries, timeouts, cancellations and more, attached before execution, not bolted on afterward.
Instead of discovering failure modes at runtime (or more likely, encoding them in comments/conventions) you’re forced to confront them at design time.
const program = pipe(
  fetchHtml("https://example.com"),
  Effect.retry(retryPolicy),    // add retry logic
  Effect.timeout("10 seconds")  // add a timeout
  // anything else
);
// still nothing has run
// until...
Effect.runPromise(program);
That’s why Effect works so well for hostile I/O like data ingestion. You’re deciding ahead of time how the system behaves when failures do happen. And with it, cross-cutting concerns (retries, rate limits, cleanup) can go on top without refactoring core logic.
Also, look at the Effect version’s type — Effect<string, NetworkError> — this is a machine-checkable contract that tells you, precisely:
- this operation performs effects
- it produces a string on success
- it can fail with NetworkError
- it CANNOT fail with any other expected error
Compare that to the vanilla TS type signature ((url: string) => Promise<string>), where you cannot tell:
- what errors might be thrown
- whether they’re retryable
- whether this is safe to call multiple times
- whether this does I/O or just compute
All of that information exists only in comments, conventions, or someone’s head (or you only find out by running it and reacting.)
All this is why Effect feels like the TypeScript framework that you didn’t know you needed.
How Effect Changed How I Design for Failure
When the mental model of Effect clicked for me, I realized that if I could describe behavior before anything ran, then I wasn't just deciding what happens on success; I was deciding how the operation behaves under every condition. That includes failures, obviously, but it also includes retries, slowdowns, and backpressure.
That’s where my thinking about data ingestion started to change. Most failures in a data ingestion pipeline are expected. None of them behave like typical fix-and-forget bugs:
- networks are slow or unavailable
- upstream APIs rate-limit you
- data formats change without notice
- some batches succeed while others fail
What's different about these failures is that they're maddeningly partial, and frequent. A job can succeed just enough to look healthy while quietly producing incomplete or stale data.
That’s not a correctness problem so much as a reliability problem. Once I started using Effect more deliberately, I noticed that it actually pushed me away from reacting to failures after the fact, and toward making those decisions up front. So instead of adding another retry or another catch, while designing I had to decide:
- What kinds of failures do I expect to see in production?
- Which of these should be retried, and which shouldn’t?
- When should the pipeline slow down instead of pushing harder?
- When is failing fast the correct behavior?*
*This one is slightly debatable, but let's throw it in there because it's an adjacent problem anyway
Because these questions are now part of the TypeScript type system itself, those decisions end up close to the code that triggers them. There's less room for "we'll handle it later" logic that never quite materializes, because Effect forces the conversation.
Designing a Web Data Pipeline with Effect
The first concrete step was obvious: I needed to enumerate what actually breaks in my pipeline, and decide how each case should behave.
So I sat down and boiled my Puppeteer-based ingestion pipeline down to its real failure modes:
- Network timeouts. Transient. These should be retried with backoff.
- Rate limits. Expected. These require slowing down.
- IP blocks. Fatal without proxy rotation; but with the right infrastructure (as was my case), just another retryable case.
- CAPTCHAs. Not a logic problem. For me, this is handled entirely by the proxy layer, and is also retryable without any code on my part.
- Schema changes. The site changed and selectors broke. This isn’t transient — it’s a logic error and should fail fast.
Traditional error handling lumps all of these into “something went wrong, throw an exception.” Effect lets you model them as distinct failure types, which means you can build infrastructure that handles them systematically. And that’s exactly where we’re going to start.
For reference, here’s the code for the full pipeline: https://github.com/sixthextinction/effect-ts-scraping/blob/main/full-pipeline.ts
Failure as a First-Class Concept (Tagged Errors)
The first thing I do is write down every failure mode I expect to see, and give each one a name. Each of these errors represents something meaningfully different from an operational perspective.
class NetworkError extends Data.TaggedError('NetworkError')<{
  message: string;
  url: string;
  cause?: unknown;
}> {}

class TimeoutError extends Data.TaggedError('TimeoutError')<{
  message: string;
  url: string;
  timeout: number;
}> {}

class RateLimitError extends Data.TaggedError('RateLimitError')<{
  message: string;
  url: string;
  retryAfter?: number;
}> {}

class IPBlockError extends Data.TaggedError('IPBlockError')<{
  message: string;
  url: string;
  proxyId?: string;
}> {}

class ParseError extends Data.TaggedError('ParseError')<{
  message: string;
  cause?: unknown;
}> {}

// Puppeteer-level failures (used by launchBrowser later)
class BrowserError extends Data.TaggedError('BrowserError')<{
  message: string;
  cause?: unknown;
}> {}

// the union of every failure the pipeline can produce
// (used by isRetryableError and the pipeline signatures below)
type ScrapingError =
  | NetworkError
  | TimeoutError
  | RateLimitError
  | IPBlockError
  | ParseError
  | BrowserError;
What’s this Data.TaggedError? That’s something Effect provides us. Basically, it’s a premade error class that automatically gets a _tag field — a string literal that acts as a discriminant.
This _tag field gives us type-safe error handling. TypeScript can distinguish between different error types at compile time, and you can use functions like Effect.catchTag to handle specific errors without losing type information.
You technically can do this with vanilla TypeScript (discriminated unions), but it’ll be a pain. Yes, you can catch generic Error objects and use instanceof checks — but TypeScript can’t always narrow them correctly. Effect’s tagged errors give you precise type narrowing. When you catch a NetworkError, for example, TypeScript 100% knows it has a url property. When you catch a RateLimitError, TypeScript knows it might have a retryAfter property. This makes error handling both type-safe and composable (not to mention 500% less annoying to write code for. 😅)
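For a quick taste, here's a minimal sketch of what tag-based handling looks like, reusing the simplified fetchHtml and NetworkError from earlier (resilientFetch and the fallback value are mine, not the repo's):
// catchTag matches on the _tag discriminant, so the handler sees the full typed error
// (TypeScript knows e.url exists here)
const resilientFetch = (url: string) =>
  pipe(
    fetchHtml(url),
    Effect.catchTag('NetworkError', (e) =>
      Effect.succeed(`<!-- failed to fetch ${e.url}, substituting a placeholder -->`)
    )
  );
Once the NetworkError is handled, it disappears from the error channel of the resulting type, so the compiler itself records that this failure mode has been dealt with.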
Anyway, when modeling complex pipelines, this has to be your first step because once you have typed errors, you can decide which ones are retryable and which aren’t:
const retryableErrors = [
  'NetworkError',
  'TimeoutError',
  'RateLimitError',
  'IPBlockError',
  'BrowserError',
] as const;

const isRetryableError = (error: ScrapingError): boolean =>
  retryableErrors.includes(error._tag as any);
So a ParseError means your HTML selectors broke. That's not a network problem, so retrying won't help. A TimeoutError, on the other hand, is transient: that's when you retry with backoff.
Browser Logic — Side Effects Without the Pain
My pipeline uses Puppeteer to handle dynamic, JS-based websites.
For this, we'll use the Effect interface (this Effect is a type within the Effect-TS/effect library we're working with). Instead of letting Puppeteer leak browser state all over the codebase, everything related to it lives in these Effects.
The Effect interface is the quintessential part of the Effect-TS library — a description of a workflow or operation that is lazily executed. Here's what it looks like:
Effect<Success, Error, Requirements>
where Success represents the type of what is returned on success, Error represents the same for a failure, and Requirements represents the type of the dependencies you need to provide.
We've talked about the main difference before — unlike Promises, Effects are lazy. They don't run until executed. This opens up a lot of opportunities for composition, cancellation, and resource management.
Using Effect, the lowest-level operation we need is…simply launching a browser. Makes sense that this should be our Step 1:
// STEP 1: Actually launch the browser
// tryPromise converts a Promise-returning function into an Effect
// Errors are caught and converted to typed errors (BrowserError)
const launchBrowser = (
  proxyConfig: ProxyConfig
): Effect.Effect<Browser, BrowserError, never> =>
  Effect.tryPromise({
    try: async () => {
      process.env['NODE_TLS_REJECT_UNAUTHORIZED'] = '0'; // disable SSL validation for Bright Data proxy
      return await puppeteer.launch({
        headless: true,
        ignoreHTTPSErrors: true, // ignore SSL certificate errors
        args: [
          '--no-sandbox',
          '--disable-setuid-sandbox',
          `--proxy-server=${proxyConfig.host}:${proxyConfig.port}`, // proxy host:port (credentials set via page.authenticate later)
        ],
      });
    },
    catch: (error: unknown) =>
      new BrowserError({
        message: 'Failed to launch browser with Bright Data proxy',
        cause: error,
      }),
  });
The never in the Requirements position means the effect doesn’t require any external dependencies or context to run.
Effect.tryPromise converts a Promise-returning function into an Effect (here, puppeteer.launch). Any thrown error gets mapped into a typed failure — since I provide a catch function here, everything maps to an error of type BrowserError.
Your proxy config can live in a separate object like so. Using a proxy is technically optional, but I already had access to residential proxies and that handles the messy parts for me — fingerprinting, CAPTCHA solving, IP rotation, and geo-targeting — so in my pipeline, my Puppeteer instances behave like a real user instead of getting blocked immediately, with no extra code on my part.
// Bright Data HTTP Proxy configuration (from env vars or .env file)
// You'll get these values from your dashboard when you sign up
const BRIGHT_DATA_CONFIG = {
  customerId: process.env.BRIGHT_DATA_CUSTOMER_ID,
  zone: process.env.BRIGHT_DATA_ZONE,
  password: process.env.BRIGHT_DATA_PASSWORD,
  proxyHost: 'brd.superproxy.io',
  proxyPort: 33335,
};

// Validate configuration
if (!BRIGHT_DATA_CONFIG.customerId || !BRIGHT_DATA_CONFIG.zone || !BRIGHT_DATA_CONFIG.password) {
  throw new Error(
    'Bright Data configuration missing. Set BRIGHT_DATA_CUSTOMER_ID, BRIGHT_DATA_ZONE, and BRIGHT_DATA_PASSWORD environment variables or add them to .env file'
  );
}

interface ProxyConfig {
  host: string;
  port: number;
  username: string;
  password: string;
}

const buildProxyConfig = (): ProxyConfig => {
  const username = `brd-customer-${BRIGHT_DATA_CONFIG.customerId}-zone-${BRIGHT_DATA_CONFIG.zone}`;
  return {
    host: BRIGHT_DATA_CONFIG.proxyHost,
    port: BRIGHT_DATA_CONFIG.proxyPort,
    username,
    password: BRIGHT_DATA_CONFIG.password!,
  };
};
Those proxy config values are just the credentials you get when you set up a proxy to use.
With Puppeteer, you use proxies via page.authenticate(), which is where we’ll use this in the next step.
Alright, so as of now, we have a Puppeteer instance up and running. Next, we need navigation and content extraction. We’ll use Effect.acquireUseRelease to do this:
// STEP 2: Go to page, extract content.
const navigatePageAndGetContent = (
  browser: Browser,         // this was returned as a result of what we did in Step 1
  url: string,              // the URL to go to
  proxyConfig: ProxyConfig, // we already set this up earlier
  timeout: number           // use your own value, in ms
) =>
  Effect.acquireUseRelease(
    // acquire: create the page
    // use: navigate and get content
    // release: always close the page, even on error
  );
This acquireUseRelease is Effect's version of try / finally. You use it when describing real-world operations that work with external resources (database connections, network stuff, etc.) that must be acquired, used properly, and released when no longer needed (even if an error occurs).
It always involves a 3-step process. For us, this will involve:
- Acquire: open a page in Puppeteer using a proxy that we authenticate
- Use: navigate, check status codes, return HTML
- Release: close the page, even if something failed
You don’t have to explicitly remember to do cleanup — the structure enforces it.
Let’s look at all of those steps in detail.
// STEP 2: Go to page, extract content.
// Effect.acquireUseRelease manages resource lifecycle: acquire, use, and release
// Ensures cleanup happens even if errors occur (like try/finally)
// See: https://effect.website/docs/resource-management/introduction
const navigatePageAndGetContent = (
  browser: Browser,
  url: string,
  proxyConfig: ProxyConfig,
  timeout: number = 10000
): Effect.Effect<string, BrowserError | TimeoutError | IPBlockError | RateLimitError | NetworkError> =>
  Effect.acquireUseRelease(
    // STEP 2.1: acquire: create the page
    Effect.tryPromise({
      try: async () => {
        const page = await browser.newPage();
        await page.authenticate({ username: proxyConfig.username, password: proxyConfig.password }); // authenticate with Bright Data proxy
        return page;
      },
      catch: (error: unknown) =>
        new BrowserError({
          message: 'Failed to create page or authenticate',
          cause: error,
        }),
    }),
    // STEP 2.2: use: navigate and get content
    (page) =>
      Effect.tryPromise({
        try: async () => {
          const response = await page.goto(url, {
            waitUntil: 'networkidle2', // use 'load' if 'networkidle2' fails - proxies can have background requests that never stop
            timeout: timeout,
          });
          // check for HTTP errors that indicate blocks/rate limits
          if (response) {
            const status = response.status();
            if (status === 429) {
              throw new RateLimitError({
                message: `Rate limited: ${url}`,
                url,
              });
            }
            if (status === 403) {
              throw new IPBlockError({
                message: `IP blocked: ${url}`,
                url,
              });
            }
            if (status >= 400) {
              throw new NetworkError({
                message: `HTTP error ${status}: ${url}`,
                url,
              });
            }
          }
          return await page.content();
        },
        catch: (error: unknown) => {
          if (error instanceof RateLimitError || error instanceof IPBlockError) {
            return error;
          }
          if (error instanceof NetworkError) {
            return error;
          }
          if (error instanceof Error && error.message.includes('timeout')) {
            return new TimeoutError({
              message: `Navigation timeout after ${timeout}ms`,
              url,
              timeout,
            });
          }
          return new BrowserError({
            message: 'Failed to navigate or get content',
            cause: error,
          });
        },
      }),
    // STEP 2.3: release: always close the page, even on error
    (page) =>
      pipe(
        Effect.tryPromise({
          try: async () => await page.close(),
          catch: () => new Error('Failed to close page'),
        }),
        Effect.catchAll(() => Effect.void) // ignore close errors
      )
  );
Effect's pipe (seen here in Step 2.3) composes functions left-to-right, passing the output of one as input to the next. It makes Effect operations readable instead of nested: while reading, you start with Effect.tryPromise, then apply Effect.catchAll.
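For comparison, here's roughly what Step 2.3 looks like without pipe, with the same calls nested inside-out (closePage is my own naming for this sketch; Page comes from puppeteer):
// equivalent, but nested: catchAll wraps tryPromise
const closePage = (page: Page) =>
  Effect.catchAll(
    Effect.tryPromise({
      try: async () => await page.close(),
      catch: () => new Error('Failed to close page'),
    }),
    () => Effect.void // ignore close errors
  );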
Great, we now have a Puppeteer instance, and we can use it to go visit a page, extract the content we want, and close the page. Now we just have to bring it all together i.e. manage the browser lifecycle.
This should also be on an acquire → use → release cycle, this time on a browser level rather than a page level.
const scrapeUrl = (
  url: string,
  options?: { timeout?: number }
): Effect.Effect<
  string,
  BrowserError | TimeoutError | IPBlockError | RateLimitError | NetworkError,
  never
> => {
  const proxyConfig = buildProxyConfig();
  const timeout = options?.timeout || 10000;
  // Bright Data automatically rotates IPs on each request,
  // so retrying after an IP block gets a fresh IP
  return Effect.acquireUseRelease(
    // STEP 1: ACQUIRE -- launch browser
    launchBrowser(proxyConfig),
    // STEP 2: USE -- navigate and extract HTML
    (browser) =>
      navigatePageAndGetContent(browser, url, proxyConfig, timeout),
    // STEP 3: RELEASE -- always clean up the browser
    (browser) =>
      pipe(
        Effect.tryPromise({
          try: async () => await browser.close(),
          catch: () => new Error('Failed to close browser'),
        }),
        Effect.catchAll(() => Effect.void)
      )
  );
};
💡 In production you will usually also want to split the fetch → parse → exit cycle into fetch → persist raw → parse → persist parsed, so you can debug raw HTML later or parallelize parsing.
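Here's a minimal sketch of what that split could look like. Nothing below is from the repo: persistRawHtml, the raw/ directory, and reusing BrowserError for persistence failures are all assumptions for illustration.
import { promises as fs } from 'node:fs';

// write the raw HTML to disk, then pass it through unchanged for parsing
const persistRawHtml = (url: string) => (html: string) =>
  Effect.tryPromise({
    try: async () => {
      await fs.mkdir('raw', { recursive: true });
      await fs.writeFile(`raw/${encodeURIComponent(url)}.html`, html);
      return html;
    },
    catch: (error: unknown) =>
      new BrowserError({ message: 'Failed to persist raw HTML', cause: error }), // reusing BrowserError for brevity; a dedicated error type would be cleaner
  });

// usage: fetch -> persist raw -> parse
const fetchPersistParse = (url: string) =>
  pipe(
    scrapeUrl(url, { timeout: 30000 }),
    Effect.flatMap(persistRawHtml(url)),
    Effect.flatMap(parseHtml)
  );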
Again, each step is expressed as a function returning an Effect, and cleanup is guaranteed, even if navigation or retries fail.
At the end of this stage we have the basic Puppeteer loop: visiting a dynamic page, extracting its HTML, and cleaning up after ourselves.
But there’s more to do — namely, making all of the above work with parsing (business logic), and retry behavior + rate limiting (cross-cutting concerns.)
Retries That Understand Why Something Failed
Effect makes retry behavior declarative via its Schedule API.
// remember we defined which errors were retryable in step 1
// here, first, we define HOW we should schedule retries…
const retryPolicy = pipe(
  Schedule.exponential(Duration.seconds(1)),
  Schedule.intersect(Schedule.recurs(3))
);
This says: "retry with exponential backoff starting at 1 second, up to 3 times."
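One variation worth considering (my assumption, not something the repo does): add jitter so a fleet of workers doesn't retry in lockstep against the same upstream.
// same policy, but each delay is randomly jittered
const jitteredRetryPolicy = pipe(
  Schedule.exponential(Duration.seconds(1)),
  Schedule.jittered,
  Schedule.intersect(Schedule.recurs(3))
);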
But the schedule alone isn’t enough. The system also needs to know which failures deserve a retry. Luckily, we already know those.
//… and then actually retry the ones which are retryable
// Effect<A, E, R> is just the Effect ecosystem’s convention/shorthand for Effect<Success, Error, Requirements>
const retryIfRetryable = <A, E extends ScrapingError, R>(effect: Effect.Effect<A, E, R>) =>
  Effect.retry(effect, {
    schedule: retryPolicy,
    while: (error) => isRetryableError(error), // only keep retrying while the error is one of our retryable types
  });
The while predicate is the key. Instead of retrying blindly, the system checks the error type and only retries when it makes sense. If you hit a ParseError (not in our list of retryable errors), the pipeline fails immediately — which makes sense, no point in hammering a broken CSS selector.
This is where our tagged errors from Step 1 pay off. The retry logic doesn’t inspect strings or guess intent. It operates entirely on types.
We’ll use retryIfRetryable at the very end, when we’re bringing all parts of our pipeline together.
Rate Limiting as a Declarative Policy
Rate limiting is really backpressure rather than error handling. That is, you don't want to wait until you get rate-limited to slow down; you want to prevent it in the first place.
For this tutorial, we can keep rate limiting intentionally boring. Because even our rate limiter is an Effect (Effect.Effect), it composes cleanly with the retries and resource management above.
const withSimpleRateLimit = <A, E, R>(effect: Effect.Effect<A, E, R>) =>
  pipe(
    Effect.sleep(Duration.millis(100)),
    Effect.flatMap(() => effect)
  );
This simply introduces a delay before the effect runs. Just like retries, we’ll use withSimpleRateLimit at the very end when composing the pipeline.
Of course, if you wanted to, you could go all out — you can build production-grade rate limiting because Effect provides primitives like Ref (keep track of some form of state), Queue (a lightweight in-memory queue), and Schedule (which we used in the previous step).
💡 I’m not going to go into detail on building a full rate limiter with Effect because that’s way too much cognitive load for just a blogpost. The point isn’t to build perfect throttling — it’s to show that backpressure can be a first-class part of the pipeline in Effect.
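Still, to give a flavor of those primitives, here's a minimal sketch of Ref-based request spacing. Everything in it (makeSpacedRunner, minGapMs) is hypothetical and not part of the repo; it only shows how mutable state can live safely inside an Effect.
// Ref and Duration are exported from 'effect', alongside Effect
// the Ref holds the timestamp of the last request; each run sleeps just long
// enough to keep a minimum gap between requests (races between fibers ignored for brevity)
const makeSpacedRunner = (minGapMs: number) =>
  Effect.map(Ref.make(0), (lastRun) =>
    <A, E, R>(effect: Effect.Effect<A, E, R>) =>
      Effect.gen(function* () {
        const prev = yield* Ref.get(lastRun);
        const now = yield* Effect.sync(() => Date.now());
        const wait = Math.max(0, minGapMs - (now - prev));
        if (wait > 0) {
          yield* Effect.sleep(Duration.millis(wait));
        }
        yield* Ref.set(lastRun, Date.now());
        return yield* effect;
      })
  );
You'd build the runner once (it is itself an Effect) and then wrap each scrape with the function it yields.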
Parsing as Its Own Failure Domain
Finally, remember that while parsing logic is synchronous, it can still fail. We haven't accounted for those failures yet.
This is our HTML parsing step to get the data we need. Instead of letting parse errors throw, it's better to wrap the logic in Effect.try — this is similar to Effect.tryPromise from earlier, but ONLY for synchronous functions that may throw (like the cheerio parsing here).
// This is just your scraping logic with selectors for the data you want
// For this one we just get h1s and spans.
interface ScrapingResult {
  title: string;
  spans: string[];
  url: string;
}

// Effect.try wraps synchronous logic that may throw
// Errors are caught and converted to typed errors (ParseError)
const parseHtml = (html: string): Effect.Effect<ScrapingResult, ParseError, never> =>
  Effect.try({
    try: () => {
      const $ = cheerio.load(html);
      const title = $('h1').text().trim();
      const spans = $('span')
        .map((_i: number, el: any) => $(el).text().trim())
        .get()
        .filter((s: string) => s.length > 0);
      return {
        title,
        spans,
        url: TARGET_URL,
      };
    },
    catch: (error: unknown) =>
      new ParseError({
        message: 'Failed to parse HTML',
        cause: error,
      }),
  });
This completes our error modeling — now parsing failures are distinct from network failures and both are handled properly. If parsing breaks, all our retries stop. That should be intentional — and so we made it so.
Composing the Pipeline
Up to this point, we’ve built individual pieces in isolation:
- scrapeUrl knows how to fetch HTML safely
- retryIfRetryable knows when to retry
- withSimpleRateLimit enforces basic throttling
- parseHtml turns raw HTML into structured data as per our domain logic (we provide the selectors we need)
Now we compose them into a single pipeline.
const scrapeWithRetry = (): Effect.Effect<ScrapingResult, ScrapingError> =>
  pipe(
    // fetch HTML (launches and uses Puppeteer)
    scrapeUrl(TARGET_URL, { timeout: 30000 }),
    // apply rate limiting
    withSimpleRateLimit,
    // retry transient failures with backoff
    retryIfRetryable,
    // parse the HTML into structured data
    Effect.flatMap(parseHtml)
  );
Read this top to bottom.
- We start by fetching HTML with scrapeUrl (which instantiates and uses Puppeteer)
- That operation is rate-limited
- If it fails with a retryable error, it’s retried with backoff
- If it succeeds, we move on to parsing
- If parsing fails, the whole Effect fails immediately (as it should)
There are no callbacks here, no try/catch, and no manual error propagation. Control flow is handled by the Effect runtime.
Crucially, this function does not run anything yet. It only describes what should happen.
💡 I’ve skipped observability for this blog post, but in production, you should add more detailed logging, save retry metrics/failures, tracing per URL or proxy etc. Effect makes this very easy with its Logging APIs.
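As a taste (a sketch, not code from the repo), tapping structured logs onto the composed pipeline could look like this:
// log success/failure details without changing the pipeline's behavior
const observedScrape = pipe(
  scrapeWithRetry(),
  Effect.tap((result) =>
    Effect.logInfo(`parsed "${result.title}" with ${result.spans.length} spans from ${result.url}`)
  ),
  Effect.tapError((error) =>
    Effect.logWarning(`scrape failed with ${error._tag}`)
  )
);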
Executing the Pipeline
So far, we’ve built a description of a workflow. To actually execute it, we need to define what happens at the edges of the system (and this is where we finally use scrapeWithRetry.)
That’s what the final program does. This is again a pipe, so these happen sequentially:
const program = pipe(
  scrapeWithRetry(),
  // Log success
  Effect.tap((result: ScrapingResult) =>
    pipe(
      Console.log('Scraping successful!'),
      Effect.flatMap(() =>
        Console.log(JSON.stringify(result, null, 2))
      )
    )
  ),
  // Handle all failures in one place
  Effect.catchAll((error: ScrapingError) =>
    pipe(
      Console.error('Pipeline failed:', error),
      Effect.flatMap(() =>
        Effect.sync(() => process.exit(1))
      )
    )
  )
);
Finally, here’s how you run this.
// This is the entry point that ACTUALLY kicks off the entire pipeline
Effect.runPromise(program).catch((error: unknown) => {
  console.error('Unhandled error:', error);
  process.exit(1);
});
The .catch() wrapper here handles any truly unexpected errors that escape the Effect system (really shouldn't happen, but it's defensive programming).
This is the moment where everything becomes “real” — the browser is launched, requests are made, retries happen, resources are acquired and released, and logs are written.
Until this line runs, nothing has executed.
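If you'd rather not rely on the .catch() wrapper at all, an alternative entry point (a sketch, not the repo's code) is Effect.runPromiseExit, which resolves to an Exit value so failures come back as data instead of thrown exceptions:
import { Cause, Exit } from 'effect';

const main = async () => {
  const exit = await Effect.runPromiseExit(program);
  if (Exit.isFailure(exit)) {
    console.error('Pipeline did not complete:\n' + Cause.pretty(exit.cause));
    process.exit(1);
  }
};

main();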
That mental separation — describing a workflow first, then running it explicitly — is one of the key reasons Effect works well for systems like this. You can reason about behavior before anything touches the network.
Why This Matters for Production Data Pipelines
So what did we build? Our pipeline has a few important properties:
- Our errors are part of the type system and MUST be handled or intentionally propagated, and structured errors make any debugging or observability WAY easier
- All of our resource lifecycles are enforced by construction
- Our retry behavior is declarative, composable, and constrained by error types. In general, all cross-cutting concerns (retries, rate limits, cleanup) compose without refactoring core logic
- All failure modes are explicit and discoverable at compile time
- Any concurrency is safer by default, especially around shared resources
That structure is what makes this production pipeline evolvable. You can add ingestion sources, tune retries, adjust rate limits, or add observability without turning the code into a pile of special cases.
This system will always, always fail predictably, with enough context to debug what went wrong and why.
This kind of setup pays off when you're scraping at scale, your failures have real business impact, and you need debuggability and auditability — in short, when you're building something a team will maintain.
Most scraping tutorials stop at "how to fetch a lot of HTML without getting blocked." That's not the hard part. The hard part is building something you don't have to babysit. Effect-TS gives you a way to model failure honestly, plus a lot of ready-made, first-class APIs for the parts that application code should never have to build from scratch, and you can add proxies to handle CAPTCHAs and general unblocking. It's a solid foundation to work off of.
It’s way more difficult, absolutely — Effect’s learning curve is more like a cliff wall — but it’s also way more reliable. And when a system runs unattended in production, that reliability is what actually matters.
Full source code: https://github.com/sixthextinction/effect-ts-scraping