Tool Usage: Web Scraping

Overview

Web scraping extracts and interprets web page content so that users can retrieve and analyze information in a more targeted, customized way. This section implements that scenario with DataInterpreter.

Example: Using the Tool to Obtain Table Data from a Static Web Page

Task

Retrieve information about papers containing the keywords multiagent and large language model from iclr-2024-statistics.

Code

examples/di/crawl_webpage.py

python examples/di/crawl_webpage.py

Execution Results

Mechanism Explained

Use the tool function scrape_web_playwright from metagpt.tools.libs.web_scraping to obtain the web page's HTML and inner text. This tool function is a wrapper around the browser automation library Playwright.
Use BeautifulSoup to retrieve the table with the id paperlist, and load it into a pandas DataFrame.
Read the DataFrame's column names to locate the title column, then filter the rows whose titles match the keywords multiagent or large language model. Save the filtered data to filtered_papers.csv.
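
The parsing and filtering steps above can be sketched in isolation. This is a minimal illustration, not the code that DataInterpreter generates: it swaps BeautifulSoup and pandas for Python's standard-library html.parser and csv modules so that it runs without extra dependencies, and it skips the scrape_web_playwright call by working on an inline HTML string. The names TableExtractor, filter_rows, to_csv, and SAMPLE_HTML are hypothetical, the sample rows are made-up placeholders rather than real ICLR data, and treating a title as a match when it contains either keyword is an assumption.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of the <table> whose id equals table_id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0      # greater than 0 while inside the target table
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth and tag == "tr":
            self._row = []
        elif self.depth and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self.depth -= 1


def filter_rows(html, table_id="paperlist",
                keywords=("multiagent", "large language model")):
    """Return (header, rows) for rows whose title contains any keyword."""
    parser = TableExtractor(table_id)
    parser.feed(html)
    if not parser.rows:
        return [], []
    header, *body = parser.rows
    # Locate the title column by name instead of hard-coding its position.
    title_idx = next(i for i, name in enumerate(header) if "title" in name.lower())
    matched = [
        row for row in body
        if len(row) > title_idx
        and any(k in row[title_idx].lower() for k in keywords)
    ]
    return header, matched


def to_csv(header, rows):
    """Serialize the header and rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder page content; the first table checks that other tables are ignored.
SAMPLE_HTML = """
<table id="other"><tr><th>Title</th></tr><tr><td>Large Language Model decoy</td></tr></table>
<table id="paperlist">
  <tr><th>#</th><th>Title</th><th>Status</th></tr>
  <tr><td>1</td><td>Multiagent Planning with Graphs</td><td>Poster</td></tr>
  <tr><td>2</td><td>Diffusion for Images</td><td>Reject</td></tr>
  <tr><td>3</td><td>Scaling a Large Language Model</td><td>Oral</td></tr>
</table>
"""

header, papers = filter_rows(SAMPLE_HTML)
csv_text = to_csv(header, papers)
print(csv_text)
```

In the actual example, the HTML would come from scrape_web_playwright and the CSV text would be written to filtered_papers.csv. Matching is case-insensitive here because title capitalization is unlikely to be uniform across a paper list.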
