Apify

128 repositories

crawlee
Public
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping
TypeScript • Apache License 2.0 • 638 • 15k • 112 • 14 • Updated Sep 30, 2024
crawlee-python
Public
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
python crawler scraper automation web-crawler headless scraping crawling pip web-scraping
Python • Apache License 2.0 • 254 • 4k • 69 • 5 • Updated Sep 30, 2024
apify-shared-js
Public
Utilities and constants shared across Apify projects.
TypeScript • Apache License 2.0 • 10 • 12 • 4 • 2 • Updated Sep 30, 2024
apify-cli
Public
Apify command-line interface helps you create, develop, build and run Apify actors, and manage the Apify cloud platform.
command-line headless-chrome puppeteer serveless apify
TypeScript • 18 • 121 • 35 • 7 • Updated Sep 30, 2024
actor-vector-database-integrations
Public
Transfer data from Apify Actors to vector databases (Chroma, Milvus, Pinecone, PostgreSQL (PG-Vector), Qdrant, and Weaviate)
Python • Apache License 2.0 • 4 • 2 • 0 • 0 • Updated Sep 30, 2024
openapi
Public
An OpenAPI specification for the Apify API.
JavaScript • MIT License • 0 • 2 • 16 • 3 • Updated Sep 30, 2024
airbyte
Public
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Python • Other • 4k • 0 • 0 • 0 • Updated Sep 30, 2024
keboola-ex-apify
Public
Apify extractor for Keboola Connection
JavaScript • Apache License 2.0 • 0 • 0 • 5 • 1 • Updated Sep 30, 2024
fingerprint-suite
Public
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
scraping fingerprinting playwright typescript puppeteer
TypeScript • Apache License 2.0 • 95 • 911 • 18 • 12 • Updated Sep 30, 2024
workflows
Public
Apify's reusable GitHub workflows
3 • 6 • 2 • 3 • Updated Sep 29, 2024
apify-docs
Public
This project is the home of Apify's documentation.
API Blueprint • Apache License 2.0 • 73 • 26 • 64 • 24 • Updated Sep 27, 2024
actor-whitepaper
Public
This whitepaper describes a new concept for building serverless microapps called Actors, which are easy to develop, share, integrate, and build upon. Actors are a reincarnation of the UNIX philosophy for programs running in the cloud.
0 • 0 • 7 • 4 • Updated Sep 26, 2024
apify-client-js
Public
Apify API client for JavaScript / Node.js.
JavaScript • Apache License 2.0 • 27 • 65 • 16 • 5 • Updated Sep 25, 2024
apify-sdk-python
Public
The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.
automation scraping apify python sdk
Python • Apache License 2.0 • 12 • 115 • 9 • 1 • Updated Sep 25, 2024
apify-actor-docker
Public
Base Docker images for Apify actors.
Dockerfile • Apache License 2.0 • 22 • 69 • 9 • 2 • Updated Sep 24, 2024
actor-templates
Public
This project is the 🏠 home of Apify actor template projects to help users quickly get started.
Python • 15 • 25 • 7 • 1 • Updated Sep 24, 2024
rag-web-browser
Public
Retrieve website content from the top Google Search Results Pages (SERPs)
scraper crawling serp llm
0 • 0 • 0 • 1 • Updated Sep 24, 2024
apify-haystack
Public
The official integration for Apify and Haystack 2.0
apify rag haystack-ai
Python • Apache License 2.0 • 0 • 1 • 0 • 0 • Updated Sep 23, 2024
apify-client-python
Public
Apify API client for Python
api client scraping apify python
Python • Apache License 2.0 • 11 • 46 • 8 • 0 • Updated Sep 23, 2024
homebrew-tap
Public
A Homebrew tap for Apify tools
Ruby • 1 • 8 • 0 • 3 • Updated Sep 19, 2024
actor-aws-costs-to-slack
Public
TBD
TypeScript • MIT License • 0 • 0 • 0 • 1 • Updated Sep 18, 2024
apify-sdk-js
Public
Apify SDK monorepo
actor apify nodejs javascript typescript sdk
TypeScript • Apache License 2.0 • 31 • 119 • 9 • 7 • Updated Sep 13, 2024
proxy-chain
Public
Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining.
javascript-library headless-chrome proxy-server proxychains
JavaScript • Apache License 2.0 • 140 • 839 • 9 • 11 • Updated Sep 12, 2024
release-pr-action
Public
This action simplifies the creation of release PRs.
JavaScript • Apache License 2.0 • 0 • 0 • 1 • 0 • Updated Sep 12, 2024
idcac
Public
I Don't Care About Cookies extension compiled for use with Playwright/Puppeteer
JavaScript • GNU General Public License v3.0 • 0 • 8 • 0 • 1 • Updated Sep 9, 2024
actor-monorepo-example
Public
An example repository with multiple Apify Actors sharing code between each other.
JavaScript • 5 • 1 • 1 • 1 • Updated Sep 6, 2024
docs-search-modal
Public
Custom Algolia search modal for Apify Documentation.
TypeScript • MIT License • 1 • 0 • 0 • 2 • Updated Sep 5, 2024
docusaurus-plugin-typedoc-api
Public
Apify's fork of `docusaurus-plugin-typedoc-api`, customized for our Python documentation.
TypeScript • 25 • 0 • 0 • 0 • Updated Sep 4, 2024
apify-zapier-integration
Public
Apify integration for Zapier
api zapier web-scraping apify
JavaScript • Apache License 2.0 • 1 • 8 • 5 • 0 • Updated Aug 27, 2024
echo-standby-actor
Public
An example Actor using Standby mode
Dockerfile • 0 • 0 • 0 • 1 • Updated Aug 14, 2024