Everyone wants more money, whether to grow a business or fund new ventures. But securing the right investors or clients isn’t about luck; it’s about strategically finding and reaching the right leads.
The challenge? Even for companies with dedicated sales and marketing teams, lead generation remains an uphill battle. Traditional methods — cold outreach, LinkedIn searches, and static lead lists — are often time-consuming, inaccurate, and yield low conversion rates. To stay ahead, businesses need data-driven lead-generation strategies that leverage real-time business intelligence.
So, where do we find these leads? Where are the decision-makers and high-value prospects hiding?
That’s where ZoomInfo and Crunchbase come in. These platforms contain information on companies, decision-makers, and funding activities, allowing businesses to identify high-value leads with precision. However, manually sifting through this data isn’t scalable.
That’s why I built an automated lead generation tool that extracts, processes, and ranks leads using ZoomInfo and Crunchbase datasets from Bright Data. This tool helps businesses focus on the most promising prospects by analyzing key factors like funding rounds, employee growth, and industry relevance.
What You’ll Learn
In this article, I’ll walk you through:
✅ Accessing clean and structured ZoomInfo and Crunchbase datasets.
✅ Smart filtering and scoring strategies to identify high-quality leads.
✅ Automating lead retrieval and integrating with CRMs.
✅ Lessons learned from building a data-driven lead generation system.
By the end, you’ll have a clear blueprint to build your own AI-powered lead generation system — one that delivers high-value leads in real-time and eliminates the guesswork in prospecting.
Let’s get started! 🚀
Understanding the Data Sources
To build an effective business lead generation tool, I needed data sources that provide rich insights into companies and key decision-makers. For this project, I went with two of the most influential business intelligence datasets: ZoomInfo and Crunchbase. Let’s break down what each platform offers and how it contributes to lead generation.
ZoomInfo Dataset — Comprehensive B2B Data
ZoomInfo is a top-tier B2B data provider offering extensive intelligence on companies, leadership teams, funding history, and more. Key attributes include:
🔹 Company Information: Name, description, industry, website, headquarters
🔹 Financial Data: Revenue, total funding amount, funding rounds
🔹 Leadership & Contacts: CEO, leadership team, top contacts
🔹 Employee Breakdown: Total employees, department-wise breakdown
🔹 Technology & Products: Tech stack, products owned
🔹 Market Positioning: Similar companies, popular searches
How ZoomInfo Data Helps in Lead Generation
✅ Identify high-growth companies based on revenue and funding rounds
✅ Target decision-makers (CEOs, VPs, Directors) with accurate contact details
✅ Find companies using specific technologies that match your offerings
✅ Analyze organizational structure to prioritize high-value leads
Crunchbase Dataset — Market & Investment Insights
Crunchbase focuses on company funding, investments, and market trends. It’s particularly valuable for tracking startups and venture-backed firms. Key attributes include:
🔹 Company Overview: Name, industries, headquarters, operating status
🔹 Financial Data: Funding rounds, investors, investment stage
🔹 Employee & Leadership Data: Founders, leadership hires
🔹 Market Presence: Monthly visits, web traffic, technologies used
How Crunchbase Data Enhances Lead Generation
✅ Track startups with recent funding rounds
✅ Identify companies at the right funding stage for targeted outreach
✅ Discover key investors and decision-makers in emerging markets
Combining ZoomInfo & Crunchbase for Maximum Impact
By merging ZoomInfo’s company intelligence with Crunchbase’s funding data, I created a highly accurate lead-scoring system prioritizing:
✅ Fast-growing companies with recent funding
✅ Companies hiring for leadership positions (indicating expansion)
✅ Businesses using specific technologies aligned with our solutions
✅ High-traffic companies with strong media presence
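To make the prioritization concrete, here is a minimal scoring sketch. It is an illustration under stated assumptions, not the exact scoring used in the tool: the field names (`last_funding_date`, `technologies`, `open_leadership_roles`, `monthly_visits`), the weights, and the thresholds are all hypothetical and would need to be mapped to the columns in your own datasets.

```python
from datetime import date

# Hypothetical weights; tune them against your own conversion data.
WEIGHTS = {
    "recent_funding": 40,
    "tech_match": 30,
    "hiring_leaders": 20,
    "traffic": 10,
}


def score_lead(lead: dict, target_tech: set[str], today: date) -> int:
    """Score one lead (a plain dict) against the four criteria listed above."""
    score = 0

    # Recent funding: a round closed within the last 365 days (assumed cutoff).
    last_round = lead.get("last_funding_date")  # assumed to be a date or None
    if last_round is not None and (today - last_round).days <= 365:
        score += WEIGHTS["recent_funding"]

    # Technology match: any overlap with the technologies we sell into.
    techs = {t.lower() for t in lead.get("technologies", [])}
    if techs & {t.lower() for t in target_tech}:
        score += WEIGHTS["tech_match"]

    # Leadership hiring: at least one open leadership role (assumed field).
    if lead.get("open_leadership_roles", 0) > 0:
        score += WEIGHTS["hiring_leaders"]

    # Traffic: an assumed threshold of 100,000 monthly visits.
    if lead.get("monthly_visits", 0) >= 100_000:
        score += WEIGHTS["traffic"]

    return score
```

With a pandas DataFrame, a score like this could be computed per row with something like `df.apply(lambda row: score_lead(row.to_dict(), {"aws"}, date.today()), axis=1)` and then used to sort the leads.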
Now, how do I get this data to build the lead generation tool?
How I Got the ZoomInfo and Crunchbase Data from Bright Data
Before diving into the tool’s development, I needed clean, structured, and enriched datasets. Instead of scraping these platforms manually (which is complex and often against their policies), I used Bright Data’s ready-made datasets.
Why Bright Data?
Bright Data provides:
✅ Pre-cleaned and structured datasets: no need for data wrangling
✅ Real-time and historical data: ensuring up-to-date insights
✅ API access: seamlessly integrating with any data pipeline
👉 You can find fresh, ready-to-use, structured datasets from your favorite websites here.
To obtain the datasets:
1. Sign up on Bright Data and access their dataset marketplace from the dashboard.
2. Search for ZoomInfo and Crunchbase datasets, respectively — both covering company profiles, financials, and decision-makers.
3. Download the data in CSV format from Bright Data’s dashboard. (An API option is also available, which helps with future updates and scalability. You can also download sample data to try it out.)
With the data in hand, it was time to build the lead generation system.
Building the Lead Generation Tool
Step 1: Setting Up the Project
I’ll use Python for data processing and Streamlit for a simple UI.
1. Create the project folder:
mkdir lead-gen-tool
cd lead-gen-tool
2. Set up a virtual environment:
python -m venv venv
Activate the environment:
- Windows:
venv\Scripts\activate
- macOS/Linux:
source venv/bin/activate
3. Install Dependencies:
pip install pandas streamlit plotly langchain_community
The key dependencies include:
- Streamlit — for building the UI
- Pandas — for handling dataset operations
- Plotly — for visualizing data
- LangChain (Ollama) — for AI-driven company analysis
4. Define the Project Structure:
lead-gen-tool/
│── data/
│ ├── zoominfo_data.csv
│ ├── crunchbase_data.csv
│── app.py
Step 2: Installing and Running the Ollama Phi3 Model Locally
I wanted the tool to generate insights about each company’s growth potential and market position, and to provide actionable recommendations. To achieve this, I used the Phi3 model through Ollama. Why Phi3? It is free and runs locally. Ollama also offers other free models, such as DeepSeek-R1, Llama 3.3, and Gemma.
1. Install Ollama
Ollama provides a simple CLI tool to run large language models (LLMs) locally. Install it by following the instructions for your operating system:
- Windows (PowerShell):
iwr -useb https://ollama.ai/install.ps1 | iex
- Linux (Curl):
curl -fsSL https://ollama.ai/install.sh | sh
- macOS (Homebrew):
brew install ollama
2. Download the Phi3 Model:
ollama pull phi3
3. Run the Ollama Model:
ollama run phi3
This starts an interactive chat session on your terminal where you can input text prompts and receive AI-generated responses.
NB: Make sure Ollama is running locally whenever you run your code. Otherwise, the app won’t be able to query the AI model.
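One way to catch this early is to ping the local Ollama server before the app tries to use the model. The sketch below assumes Ollama's default address, `http://localhost:11434`; the helper name `ollama_is_running` is hypothetical and not part of the original project.

```python
import urllib.request

# Ollama's local server listens on port 11434 by default.
OLLAMA_URL = "http://localhost:11434"


def ollama_is_running(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib.error.URLError.
        return False
```

In the Streamlit app, a check such as `if not ollama_is_running(): st.error("Start Ollama first")` followed by `st.stop()` would show a clear message instead of a failed analysis later on.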
Step 3: Implementing the Lead Generation Tool
Now, let’s break down the core functionalities.
A. Loading and Preprocessing the Data
First, I loaded the datasets from CSV files and cleaned them.
import pandas as pd
import streamlit as st

@st.cache_data
def load_data():
    # Paths follow the project structure defined in Step 1
    zoominfo_df = pd.read_csv("data/zoominfo_data.csv")
    crunchbase_df = pd.read_csv("data/crunchbase_data.csv")
    # Standardize column names
    zoominfo_df.columns = zoominfo_df.columns.str.strip().str.lower()
    crunchbase_df.columns = crunchbase_df.columns.str.strip().str.lower()
    # Handle missing columns
    required_cols = ["name", "industry", "revenue", "employees", "headquarters", "website", "total_funding_amount"]
    for col in required_cols:
        if col not in zoominfo_df.columns:
            zoominfo_df[col] = "Unknown"
    # Coerce funding to numeric; values that cannot be parsed become 0
    zoominfo_df["total_funding_amount"] = pd.to_numeric(
        zoominfo_df["total_funding_amount"], errors="coerce"
    ).fillna(0)
    return zoominfo_df, crunchbase_df
✅ What this does:
- Reads CSV files
- Standardizes column names to lowercase
- Ensures required columns exist
- Converts the total_funding_amount column to numeric
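`load_data()` returns both DataFrames, but the snippets in this article mostly work with the ZoomInfo one. If you want to combine the two sources, a join on a normalized company name is one simple option. This is a sketch under assumptions: it presumes both files expose a `name` column, and the helpers `normalize_name` and `merge_sources` are hypothetical. Name matching is fuzzy by nature, so joining on a website domain may be more reliable if both datasets provide one.

```python
import pandas as pd


def normalize_name(names: pd.Series) -> pd.Series:
    """Lowercase names and collapse punctuation so 'Acme, Inc.' matches 'ACME Inc'."""
    return (
        names.astype(str)
        .str.lower()
        .str.replace(r"[^a-z0-9]+", " ", regex=True)
        .str.strip()
    )


def merge_sources(zoominfo_df: pd.DataFrame, crunchbase_df: pd.DataFrame) -> pd.DataFrame:
    """Left-join Crunchbase columns onto ZoomInfo rows by normalized name."""
    left = zoominfo_df.assign(name_key=normalize_name(zoominfo_df["name"]))
    right = crunchbase_df.assign(name_key=normalize_name(crunchbase_df["name"]))
    # Keep one Crunchbase row per key so the join cannot duplicate ZoomInfo rows.
    right = right.drop_duplicates("name_key")
    # indicator=True adds a "_merge" column showing which rows found a match.
    return left.merge(
        right, on="name_key", how="left", suffixes=("_zi", "_cb"), indicator=True
    )
```

Rows where `_merge` equals `left_only` had no Crunchbase match, which is worth checking before trusting any funding-based ranking.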
B. Filtering Leads Based on User Input
The aim here is to filter companies by industry, employee size, and revenue range.
# Load the cached datasets
zoominfo_df, crunchbase_df = load_data()
# Sidebar filters
st.sidebar.header("Filters")
industries = sorted(zoominfo_df["industry"].dropna().unique())
selected_industries = st.sidebar.multiselect("Select Industries", industries)
employee_ranges = ["0-50", "51-200", "201-1000", "1000+"]
selected_employee_range = st.sidebar.selectbox("Employee Size", employee_ranges)
revenue_ranges = ["$0-1M", "$1M-10M", "$10M-50M", "$50M+"]
selected_revenue_range = st.sidebar.selectbox("Revenue Range", revenue_ranges)
✅ What this does:
- Users can filter leads based on industry, employee size, and revenue range.
C. Displaying the Filtered Leads
Filtered leads are displayed in a table, and users can download the results.
filtered_df = zoominfo_df.copy()
# Apply the industry filter (the employee and revenue selections are not applied in this snippet)
if selected_industries:
    filtered_df = filtered_df[filtered_df["industry"].isin(selected_industries)]
st.dataframe(filtered_df[["name", "industry", "revenue", "employees", "headquarters", "website", "total_funding_amount"]])
# Export button
csv_data = filtered_df.to_csv(index=False)
st.download_button("Export Filtered Leads", csv_data, "filtered_leads.csv", "text/csv")
✅ What this does:
- Displays leads based on selected filters
- Allows users to export filtered leads
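The sidebar also collects an employee size and a revenue range, but the snippet above only applies the industry filter. One possible way to apply the range selections is sketched below. It assumes the `employees` and `revenue` columns hold plain numbers, or strings that pandas can coerce to numbers; the helpers `parse_range` and `filter_by_range` are hypothetical. If your dataset stores revenue as text such as "$5 Million", you would need to convert it first.

```python
import math

import pandas as pd

_MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def _to_number(token: str) -> float:
    """Convert '51', '$10M', or '1M' to a float."""
    token = token.strip().lstrip("$")
    multiplier = 1
    if token and token[-1].upper() in _MULTIPLIERS:
        multiplier = _MULTIPLIERS[token[-1].upper()]
        token = token[:-1]
    return float(token) * multiplier


def parse_range(label: str) -> tuple[float, float]:
    """Turn a label like '51-200', '$1M-10M', or '1000+' into (low, high)."""
    label = label.strip()
    if label.endswith("+"):
        return _to_number(label[:-1]), math.inf
    low, high = label.split("-", 1)
    return _to_number(low), _to_number(high)


def filter_by_range(df: pd.DataFrame, column: str, label: str) -> pd.DataFrame:
    """Keep rows whose numeric value in `column` falls inside the labeled range."""
    low, high = parse_range(label)
    values = pd.to_numeric(df[column], errors="coerce")
    # between() is inclusive on both ends; rows that fail to parse are dropped.
    return df[values.between(low, high)]
```

In the app, this could be chained after the industry filter, for example `filtered_df = filter_by_range(filtered_df, "employees", selected_employee_range)`. Because both ends are inclusive, a value sitting exactly on a shared boundary such as $1M matches two adjacent revenue ranges.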
D. AI-Powered Company Analysis
I used LangChain (Ollama) to analyze a company’s market position. Ollama is open-source, so you can run it for free.
from langchain_community.llms import Ollama

# Connect to the locally running Phi3 model
llm = Ollama(model="phi3")

def analyze_company(company_data, llm):
    # Build a prompt from the selected company's row
    prompt = f"""
    Analyze the following company data:
    {company_data.to_dict()}
    Focus on:
    1. Growth potential
    2. Market position
    3. Investment opportunities
    4. Key strengths and weaknesses
    """
    try:
        return llm.invoke(prompt)
    except Exception as e:
        return f"Error generating analysis: {e}"

# Company selector
selected_company = st.selectbox("Select a company to analyze", filtered_df["name"].unique())
if selected_company:
    # Use the first row that matches the selected name
    company_data = filtered_df[filtered_df["name"] == selected_company].iloc[0]
    if st.button("Generate Analysis"):
        with st.spinner("Analyzing company data..."):
            analysis = analyze_company(company_data, llm)
        st.write(analysis)
✅ What this does:
- Uses Ollama’s AI model to analyze a company
- Provides insights into growth potential, investment opportunities, and weaknesses
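Passing the entire row to the model can include placeholder values such as "Unknown" or empty cells, which a small local model may treat as facts. A possible refinement is to build the prompt from a short list of fields and skip anything missing. This is only a sketch: `PROMPT_FIELDS` and `build_prompt` are hypothetical names, and the field list simply mirrors the columns used earlier in this article.

```python
import math

# Hypothetical selection of fields worth sending to the model.
PROMPT_FIELDS = [
    "name",
    "industry",
    "revenue",
    "employees",
    "headquarters",
    "total_funding_amount",
]


def build_prompt(company: dict, fields: list[str] = PROMPT_FIELDS) -> str:
    """Build an analysis prompt from known fields, skipping missing values."""
    lines = []
    for field in fields:
        value = company.get(field)
        if value is None or value == "Unknown":
            continue  # placeholder or absent value
        if isinstance(value, float) and math.isnan(value):
            continue  # empty cell read by pandas as NaN
        lines.append(f"- {field}: {value}")
    facts = "\n".join(lines)
    return (
        "Analyze the following company data:\n"
        f"{facts}\n"
        "Focus on growth potential, market position, investment opportunities, "
        "and key strengths and weaknesses. "
        "If a field is missing, say so instead of guessing."
    )
```

It could be used as `llm.invoke(build_prompt(company_data.to_dict()))` in place of the inline f-string.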
E. Visualizing Market Insights
I used Plotly to visualize industry trends.
import plotly.express as px
st.subheader("Industry Distribution")
industry_dist = filtered_df["industry"].value_counts()
fig = px.pie(values=industry_dist.values, names=industry_dist.index)
st.plotly_chart(fig)
st.subheader("Funding Distribution")
fig = px.box(filtered_df, y="total_funding_amount", x="industry")
st.plotly_chart(fig)
✅ What this does:
- Displays a pie chart of industry distribution
- Shows funding distribution per industry
The complete code for this project is available on my GitHub; check it out and fine-tune it to meet your needs. Let me know how you used it in the comments.
Results and Lessons Learned
Results
- The tool successfully helps me discover business leads in the sectors I’m interested in.
- Incorporating AI-generated insights provided valuable company analysis.
- The visualizations offered a better market overview.
Lessons Learned
- Data standardization is crucial for consistency.
- Pre-built datasets saved time compared to web scraping.
- AI models run through Ollama, such as Phi3, need fine-tuning for better insights. (You can also try OpenAI or DeepSeek.)
Conclusion
By combining real-time data from ZoomInfo and Crunchbase with AI-powered insights, I built a scalable lead generation tool that helps businesses identify and target high-value prospects. With this approach, sales teams can focus on leads with the highest conversion potential rather than wasting time on outdated or irrelevant data.
If you’re looking for clean, structured, and enriched datasets for lead generation, Bright Data is an excellent choice. Their marketplace offers ready-made datasets and APIs that can easily be integrated into any workflow, which will, in turn, drive growth and innovation.