Apify Actor
Apify Actors are cloud programs for web scraping, crawling, and data extraction. They automate data gathering from the web so that users can extract, process, and store information. Typical uses include scraping e-commerce sites for product details, monitoring price changes, and gathering search engine results. Actors store the structured data they collect in Apify Datasets, where it can be managed and exported in formats such as JSON, CSV, or Excel for further analysis or use.
Overview
This notebook walks you through using Apify Actors with LangChain to automate web scraping and data extraction. The langchain-apify package integrates Apify's cloud-based tools with LangChain agents, enabling efficient data collection and processing for AI applications.
Setup
This integration lives in the langchain-apify package. The package can be installed using pip.
%pip install langchain-apify
Prerequisites
- Apify account: Register for a free Apify account.
- Apify API token: Learn how to get your API token in the Apify documentation.
import os
os.environ["APIFY_API_TOKEN"] = "your-apify-api-token"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
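The placeholder strings above are fine for a quick test, but real tokens are better kept out of the notebook. Below is a minimal sketch of one alternative, assuming an interactive session: it reuses a variable that is already set and prompts for it otherwise. The helper name ensure_env is hypothetical and is not part of langchain-apify.

```python
import getpass
import os


def ensure_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper (not part of langchain-apify): if the variable is
    unset or empty, prompt for it without echoing and store it for this process.
    """
    value = os.environ.get(name)
    if not value:
        value = getpass.getpass(f"Enter {name}: ")
        os.environ[name] = value
    return value


# Usage in a notebook cell (prompts only when the variable is missing):
# ensure_env("APIFY_API_TOKEN")
# ensure_env("OPENAI_API_KEY")
```

Because the value is written only to os.environ, it lasts for the current process and is not saved with the notebook.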
Instantiation
Here we instantiate the ApifyActorsTool so that we can call the RAG Web Browser Apify Actor. This Actor provides web browsing functionality for AI and LLM applications, similar to the web browsing feature in ChatGPT. Any Actor from the Apify Store can be used in this way.
from langchain_apify import ApifyActorsTool
tool = ApifyActorsTool("apify/rag-web-browser")
Invocation
The ApifyActorsTool takes a single argument, run_input: a dictionary that is passed as the run input to the Actor. The run input schema is documented in the input section of the Actor's details page. See the RAG Web Browser input schema.
tool.invoke({"run_input": {"query": "what is apify?", "maxResults": 2}})
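The call above nests the Actor's input under a run_input key. As a small illustration, the sketch below builds that dictionary in a reusable way. The helper build_payload is hypothetical, and the field names query and maxResults are simply the ones used in the call above; other Actors define different input fields, so check the Actor's input schema first.

```python
from typing import Any


def build_payload(query: str, max_results: int = 2) -> dict[str, Any]:
    """Build the argument for tool.invoke() for the RAG Web Browser Actor.

    Hypothetical helper: it only wraps the Actor input in the
    {"run_input": {...}} shape shown in the invocation above.
    """
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    return {"run_input": {"query": query, "maxResults": max_results}}


payload = build_payload("what is apify?", max_results=2)
print(payload)

# With `tool` from the Instantiation section, the call would then be:
# tool.invoke(payload)
```

Keeping the payload construction separate from the call makes it easier to validate inputs before an Actor run is started.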