Apify

We're making the web more programmable.

Pinned

  1. crawlee-python Public

    Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Wo…

    Python, 4k stars, 254 forks

  2. crawlee Public

    Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, an…

    TypeScript, 15.2k stars, 638 forks

  3. proxy-chain Public

    Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining.

    JavaScript, 839 stars, 140 forks

  4. apify-sdk-js Public

    Apify SDK monorepo

    TypeScript, 119 stars, 31 forks

  5. got-scraping Public

    HTTP client made for scraping based on got.

    TypeScript, 526 stars, 40 forks

  6. fingerprint-suite Public

    Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

    TypeScript, 911 stars, 95 forks

Repositories

Showing 10 of 128 repositories
  • apify-cli Public

    The Apify command-line interface helps you create, develop, build, and run Apify Actors, and manage the Apify cloud platform.

    TypeScript, 121 stars, 18 forks, 35 open issues (1 issue needs help), 7 pull requests, updated Sep 30, 2024
  • apify-shared-js Public

    Utilities and constants shared across Apify projects.

    TypeScript, 12 stars, Apache-2.0, 10 forks, 4 open issues, 2 pull requests, updated Sep 30, 2024
  • crawlee Public

    Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

    TypeScript, 15,173 stars, Apache-2.0, 638 forks, 112 open issues (1 issue needs help), 13 pull requests, updated Sep 30, 2024
  • actor-vector-database-integrations Public

    Transfer data from Apify Actors to vector databases (Chroma, Milvus, Pinecone, PostgreSQL (PG-Vector), Qdrant, and Weaviate).

    Python, 2 stars, Apache-2.0, 4 forks, 0 open issues, 0 pull requests, updated Sep 30, 2024
  • openapi Public

    An OpenAPI specification for the Apify API.

    JavaScript, 2 stars, MIT, 0 forks, 16 open issues, 3 pull requests, updated Sep 30, 2024
  • airbyte Public Forked from airbytehq/airbyte

    Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.

    Python, 0 stars, 4,074 forks, 0 open issues, 0 pull requests, updated Sep 30, 2024
  • keboola-ex-apify Public

    Apify extractor for Keboola Connection.

    JavaScript, 0 stars, Apache-2.0, 0 forks, 5 open issues, 1 pull request, updated Sep 30, 2024
  • crawlee-python Public

    Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

    Python, 4,017 stars, Apache-2.0, 254 forks, 68 open issues, 4 pull requests, updated Sep 30, 2024
  • fingerprint-suite Public

    Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

    TypeScript, 911 stars, Apache-2.0, 95 forks, 18 open issues, 12 pull requests, updated Sep 30, 2024
  • workflows Public

    Apify's reusable GitHub workflows.

    6 stars, 3 forks, 2 open issues, 3 pull requests, updated Sep 29, 2024