diff --git a/docs/.gitbook/assets/Screenshot 2024-11-27 at 14.43.50.png b/docs/.gitbook/assets/Screenshot 2024-11-27 at 14.43.50.png new file mode 100644 index 000000000..d379a0b4a Binary files /dev/null and b/docs/.gitbook/assets/Screenshot 2024-11-27 at 14.43.50.png differ diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 5d0057d46..1820935db 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -12,12 +12,18 @@ * [SaaS vs Self-Host](self-hosting/saas-vs-self-host.md) * [Install Reacher in 20min](self-hosting/install.md) * [Licensing](self-hosting/licensing.md) -* [IP Health Maintenance](self-hosting/interactive-blocks.md) +* [Scaling for Production](self-hosting/scaling-for-production.md) * [Proxies](self-hosting/proxies.md) -* [Bulk Verification](self-hosting/bulk.md) +* [Bulk Verification (v0.7)](self-hosting/bulk.md) * [Debugging Reacher](self-hosting/debugging-reacher.md) -* [Docker Environment Variables](self-hosting/docker-environment-variables.md) +* [Docker Environment Variables (v0.7)](self-hosting/docker-environment-variables.md) ## Advanced -* [OpenAPI](advanced/openapi.md) +* [OpenAPI](advanced/openapi/README.md) + * [/v0/check\_email](advanced/openapi/v0-check_email.md) + * [/v1/check\_email](advanced/openapi/v1-check_email.md) + * [/v1/bulk](advanced/openapi/v1-bulk.md) +* [Migrations](advanced/migrations/README.md) + * [Migrating from 0.7 to 0.10 (beta)](advanced/migrations/migrating-from-0.7-to-0.10-beta.md) + * [Reacher Configuration (v0.10)](advanced/migrations/reacher-configuration-v0.10.md) diff --git a/docs/advanced/migrations/README.md b/docs/advanced/migrations/README.md new file mode 100644 index 000000000..6cdd1e432 --- /dev/null +++ b/docs/advanced/migrations/README.md @@ -0,0 +1,3 @@ +# Migrations + +* [migrating-from-0.7-to-0.10-beta.md](migrating-from-0.7-to-0.10-beta.md "mention") diff --git a/docs/advanced/migrations/migrating-from-0.7-to-0.10-beta.md b/docs/advanced/migrations/migrating-from-0.7-to-0.10-beta.md new file mode 
100644 index 000000000..d1ac5c808 --- /dev/null +++ b/docs/advanced/migrations/migrating-from-0.7-to-0.10-beta.md @@ -0,0 +1,22 @@ +# Migrating from 0.7 to 0.10 (beta) + +{% hint style="info" %} +v0.10 is currently still in `beta` phase. +{% endhint %} + +Reacher v0.10 introduces the `/v1/*` endpoints, namely: + +* `/v1/check_email`: Performs a single email verification while respecting the optional throttle and concurrency settings set in [reacher-configuration-v0.10.md](reacher-configuration-v0.10.md "mention"). +* `/v1/bulk`, `/v1/bulk/{job_id}`, `/v1/bulk/{job_id}/results`: Create a bulk verification job, and query its progress and status. Docs coming soon. + +The `/v0/check_email` endpoint **DOES NOT** change, neither in API nor in behavior. More specifically, even if you specify throttle and concurrency settings in the newly introduced Reacher Configuration, they will not be taken into account by the `/v0/check_email` endpoint, which will perform email verification as soon as it receives the request. + +## Environment Variables + +With the introduction of [reacher-configuration-v0.10.md](reacher-configuration-v0.10.md "mention"), some of the Environment Variables have changed names. + +
+| Old name | New name | Description |
+| --- | --- | --- |
+| `RCH_HTTP_HOST` | `RCH__HTTP_HOST` | The host name to bind the HTTP server to. |
+| `PORT` | `RCH__HTTP_PORT` | The port to bind the HTTP server to, often populated by the cloud provider. |
+| `RCH_SENTRY_DSN` | `RCH__SENTRY_DSN` | If set, bug reports will be sent to this Sentry DSN. |
+| `RCH_HEADER_SECRET` | `RCH__HEADER_SECRET` | If set, then all HTTP requests must have the `x-reacher-secret` header set to this value. This is used to protect the backend against public unwanted HTTP requests. |
+| `RCH_FROM_EMAIL` | `RCH__FROM_EMAIL` | Email to use in the `<MAIL FROM:>` SMTP step. Can be overwritten by each API request's `from_email` field. |
+| `RCH_HELLO_NAME` | `RCH__HELLO_NAME` | Name to use in the `<EHLO>` SMTP step. Can be overwritten by each API request's `hello_name` field. |
+| `RCH_SMTP_TIMEOUT` | `RCH__SMTP_TIMEOUT` | Timeout for each SMTP connection. |
+| `RCH_WEBDRIVER_ADDR` | `RCH__WEBDRIVER_ADDR` | Set to a running WebDriver process endpoint (e.g. `http://localhost:9515`) to use a headless navigator to password recovery pages to check Yahoo and Hotmail/Outlook addresses. We recommend `chromedriver` as it allows parallel requests. |
+| For Bulk Verification: | | |
+| `RCH_ENABLE_BULK` | `RCH__WORKER__ENABLE` | |
+| `DATABASE_URL` | `RCH__WORKER__POSTGRES__DB_URL` | [Bulk] Database connection string for storing results and task queue |
+| `RCH_DATABASE_MAX_CONNECTIONS` | Removed | [Bulk] Connections created for the database pool |
+| `RCH_MINIMUM_TASK_CONCURRENCY` | Removed | [Bulk] Minimum number of concurrent running tasks below which more tasks are fetched |
+| `RCH_MAXIMUM_CONCURRENT_TASK_FETCH` | Removed | [Bulk] Maximum number of tasks fetched at once |
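The renames in the table above can be sketched as a small shell helper, which may be handy when auditing an existing v0.7 deployment script. This is only an illustration: the function name `migrate_env_name` and the `REMOVED` marker are hypothetical and not part of Reacher, and the helper covers variable names only (the table does not say whether any values need to change).

```bash
# Hypothetical helper (not shipped with Reacher): map a v0.7 environment
# variable name to its v0.10 name, following the migration table.
# Prints "REMOVED" for variables dropped in v0.10, and prints names that
# are not listed in the table unchanged.
migrate_env_name() {
  case "$1" in
    PORT) echo "RCH__HTTP_PORT" ;;
    RCH_ENABLE_BULK) echo "RCH__WORKER__ENABLE" ;;
    DATABASE_URL) echo "RCH__WORKER__POSTGRES__DB_URL" ;;
    RCH_DATABASE_MAX_CONNECTIONS|RCH_MINIMUM_TASK_CONCURRENCY|RCH_MAXIMUM_CONCURRENT_TASK_FETCH)
      echo "REMOVED" ;;
    RCH_HTTP_HOST|RCH_SENTRY_DSN|RCH_HEADER_SECRET|RCH_FROM_EMAIL|RCH_HELLO_NAME|RCH_SMTP_TIMEOUT|RCH_WEBDRIVER_ADDR)
      # These variables only gain a second underscore after the RCH prefix.
      echo "RCH__${1#RCH_}" ;;
    *) echo "$1" ;;
  esac
}

migrate_env_name RCH_HELLO_NAME   # prints RCH__HELLO_NAME
migrate_env_name PORT             # prints RCH__HTTP_PORT
```

Running the helper over the variable names used in a v0.7 `docker run` command gives a quick list of what to rename and what to drop.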
+ +## Bulk Verification + +The `/v0/bulk` endpoints are deprecated, in favor of a RabbitMQ-based queue system. Docs for `/v1/bulk` endpoints are coming soon. diff --git a/docs/advanced/migrations/reacher-configuration-v0.10.md b/docs/advanced/migrations/reacher-configuration-v0.10.md new file mode 100644 index 000000000..4dd59d190 --- /dev/null +++ b/docs/advanced/migrations/reacher-configuration-v0.10.md @@ -0,0 +1,178 @@ +# Reacher Configuration (v0.10) + +{% hint style="info" %} +This configuration has been introduced in v0.10, which is still in `beta`. For the stable 0.7 version, please see [docker-environment-variables.md](../../self-hosting/docker-environment-variables.md "mention"). +{% endhint %} + +Previously, in v0.7, configuration was done solely via environment variables. Given the growing amount of configurable parameters, we now offer a file-based configuration too, on top of environment variables. + +You can find below the exhaustive list of configurations, as well as their corresponding environment variable. + +```toml +# Backend configuration. + +# Name to identify the backend. +# +# Env variable: RCH__BACKEND_NAME +backend_name = "backend-dev" + +# Host to bind the backend to. +# +# Env variable: RCH__HTTP_HOST +http_host = "127.0.0.1" + +# Port for the backend. +# +# Env variable: RCH__HTTP_PORT +http_port = 8080 + +# Shared secret between a trusted client and the backend, required in the +# `x-reacher-secret` header of all incoming requests. +# +# Env variable: RCH__HEADER_SECRET +# header_secret = "my-secret" + +# Name to use during the EHLO/HELO command in the SMTP conversation. +# Ideally, this should match the reverse DNS of the server's IP address. +# +# Env variable: RCH__HELLO_NAME +hello_name = "localhost" + +# Email to use during the MAIL FROM command in the SMTP conversation. +# Ideally, the domain of this email should match the "hello_name" above. 
+# +# Env variable: RCH__FROM_EMAIL +from_email = "hello@localhost" + +# Address of the Chrome WebDriver server for headless email verifications. +# +# Env variable: RCH__WEBDRIVER_ADDR +webdriver_addr = "http://localhost:9515" + +# Timeout for each SMTP connection, in seconds. Leaving it commented out will +# not set a timeout, i.e. the connection will wait indefinitely. +# +# Env variable: RCH__SMTP_TIMEOUT +# smtp_timeout = 45 + +# Uncomment the lines below to route all SMTP verification requests +# through a specified proxy. Note that the proxy must be a SOCKS5 proxy to work +# with the SMTP protocol. This proxy will not be used for headless +# verifications. +# +# The username and password are optional and only needed if the proxy requires +# authentication. +# +# Env variables: +# - RCH__PROXY__HOST +# - RCH__PROXY__PORT +# - RCH__PROXY__USERNAME +# - RCH__PROXY__PASSWORD +# +# [proxy] +# host = "my.proxy.com" +# port = 1080 +# username = "my-username" +# password = "my-password" + +# Verification method to use for each email provider. Available methods are: +# "smtp", "headless", and "api". Note that not all methods are supported by +# all email providers. +[verif_method] +# Gmail currently only supports the "smtp" method. +# +# Env variable: RCH__VERIF_METHOD__GMAIL +gmail = "smtp" +# Hotmail B2B currently only supports the "smtp" method. +# +# Env variable: RCH__VERIF_METHOD__HOTMAILB2B +hotmailb2b = "smtp" +# Hotmail B2C supports both "headless" and "smtp" methods. The "headless" +# method is recommended. +hotmailb2c = "headless" +# Yahoo supports both "headless" and "smtp" methods. The "headless" method is +# recommended. +yahoo = "headless" + +[worker] +# Enable the worker to consume emails from the RabbitMQ queues. If set, the +# RabbitMQ configuration below must be set as well. +# +# Env variable: RCH__WORKER__ENABLE +enable = true + +# RabbitMQ configuration. 
+[worker.rabbitmq] +# Env variable: RCH__WORKER__RABBITMQ__URL +url = "amqp://guest:guest@localhost:5672" + +# Queues to consume emails from. By default, the worker consumes from all +# queues. +# +# To consume from only a subset of queues, comment out the line `queues = "all"` +# and specify the queues you want to consume from. +# +# Below is the exhaustive list of queue names that the worker can consume from: +# - "check.gmail": subscribe exclusively to Gmail emails. +# - "check.hotmailb2b": subscribe exclusively to Hotmail B2B emails. +# - "check.hotmailb2c": subscribe exclusively to Hotmail B2C emails. +# - "check.yahoo": subscribe exclusively to Yahoo emails. +# - "check.everything_else": subscribe to all emails that are not Gmail, Yahoo, or Hotmail. +# +# Env variable: RCH__WORKER__RABBITMQ__QUEUES +# +# queues = ["check.gmail", "check.hotmailb2b", "check.hotmailb2c", "check.yahoo", "check.everything_else"] +queues = "all" + +# Number of concurrent emails to verify for this worker across all queues. +# +# Env variable: RCH__WORKER__RABBITMQ__CONCURRENCY +concurrency = 20 + +# Throttle the maximum number of requests per second, per minute, per hour, and +# per day for this worker. +# All fields are optional; comment them out to disable the limit. +# +# Important: these throttle configurations only apply to /v1/* endpoints, and +# not to the previous /v0/check_email endpoint. The latter endpoint always +# executes the verification immediately, regardless of the throttle settings. +# +# Env variables: +# - RCH__WORKER__THROTTLE__MAX_REQUESTS_PER_SECOND +# - RCH__WORKER__THROTTLE__MAX_REQUESTS_PER_MINUTE +# - RCH__WORKER__THROTTLE__MAX_REQUESTS_PER_HOUR +# - RCH__WORKER__THROTTLE__MAX_REQUESTS_PER_DAY +[worker.throttle] +# max_requests_per_second = 20 +# max_requests_per_minute = 100 +# max_requests_per_hour = 1000 +# max_requests_per_day = 20000 + +# Postgres configuration. Currently, a Postgres database is required to store +# the results of the verifications. 
This might change in the future, allowing +# for pluggable storage. +[worker.postgres] +# Env variable: RCH__WORKER__POSTGRES__DB_URL +db_url = "postgresql://localhost/reacherdb" + +# Optional Sentry DSN. If set, all errors will be sent to Sentry. +# +# Env variable: RCH__SENTRY_DSN +# sentry_dsn = "" +``` + +## Usage with Docker + +You can continue using environment variables with Docker. For example, to overwrite the EHLO/HELO name, simply run: + +```bash +docker run -e RCH__HELLO_NAME=my.company.com -p 8080:8080 reacherhq/backend:beta +``` + +However, if you prefer to pass in a local `backend_config.toml` file instead, run: + +```bash +docker run -e RUST_LOG=reacher=debug -v /path/to/local/backend_config.toml:./backend_config.toml -p 8080:8080 reacherhq/backend:beta +``` + +We recommend passing in `-e RUST_LOG=reacher=debug`, at least on first run, as the debug logs will show the final configuration parsed by Reacher. diff --git a/docs/advanced/openapi.md b/docs/advanced/openapi.md deleted file mode 100644 index 6b2ccaf70..000000000 --- a/docs/advanced/openapi.md +++ /dev/null @@ -1,7 +0,0 @@ -# OpenAPI - -Below is the OpenAPI specification of Reacher's API endpoints. 
- -{% swagger src="https://raw.githubusercontent.com/reacherhq/check-if-email-exists/master/backend/openapi.json" path="/check_email" method="post" %} -[https://raw.githubusercontent.com/reacherhq/check-if-email-exists/master/backend/openapi.json](https://raw.githubusercontent.com/reacherhq/check-if-email-exists/master/backend/openapi.json) -{% endswagger %} diff --git a/docs/advanced/openapi/README.md b/docs/advanced/openapi/README.md new file mode 100644 index 000000000..b1b187e71 --- /dev/null +++ b/docs/advanced/openapi/README.md @@ -0,0 +1,6 @@ +# OpenAPI + +* [v0-check\_email.md](v0-check_email.md "mention") +* [v1-check\_email.md](v1-check_email.md "mention") +* [v1-bulk.md](v1-bulk.md "mention") + diff --git a/docs/advanced/openapi/v0-check_email.md b/docs/advanced/openapi/v0-check_email.md new file mode 100644 index 000000000..6dd113b1c --- /dev/null +++ b/docs/advanced/openapi/v0-check_email.md @@ -0,0 +1,6 @@ +# /v0/check\_email + +{% swagger src="https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json" path="/v0/check_email" method="post" %} +[https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json](https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json) +{% endswagger %} + diff --git a/docs/advanced/openapi/v1-bulk.md b/docs/advanced/openapi/v1-bulk.md new file mode 100644 index 000000000..c582a5724 --- /dev/null +++ b/docs/advanced/openapi/v1-bulk.md @@ -0,0 +1,18 @@ +# /v1/bulk + +{% hint style="info" %} +This endpoint is available starting from Reacher v0.10. 
+{% endhint %} + +{% swagger src="https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json" path="/v1/bulk" method="post" %} +[https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json](https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json) +{% endswagger %} + +{% swagger src="https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json" path="/v1/bulk/{job_id}" method="get" %} +[https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json](https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json) +{% endswagger %} + +{% swagger src="https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json" path="/v1/bulk/{job_id}/results" method="get" %} +[https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json](https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json) +{% endswagger %} + diff --git a/docs/advanced/openapi/v1-check_email.md b/docs/advanced/openapi/v1-check_email.md new file mode 100644 index 000000000..6b98018d1 --- /dev/null +++ b/docs/advanced/openapi/v1-check_email.md @@ -0,0 +1,10 @@ +# /v1/check\_email + +{% hint style="info" %} +This endpoint is available starting from Reacher v0.10. 
+{% endhint %} + +{% swagger src="https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json" path="/v1/check_email" method="post" %} +[https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json](https://raw.githubusercontent.com/reacherhq/check-if-email-exists/refs/heads/master/backend/openapi.json) +{% endswagger %} + diff --git a/docs/getting-started/is-reachable.md b/docs/getting-started/is-reachable.md index bc2245307..b4e29fa58 100644 --- a/docs/getting-started/is-reachable.md +++ b/docs/getting-started/is-reachable.md @@ -127,4 +127,4 @@ The full response contains more details about the email verification. It is prov ``` -You can also check the [openapi.md](../advanced/openapi.md "mention")specification. +You can also check the [openapi](../advanced/openapi/ "mention")specification. diff --git a/docs/self-hosting/docker-environment-variables.md b/docs/self-hosting/docker-environment-variables.md index cc68f4f2c..eaef55b04 100644 --- a/docs/self-hosting/docker-environment-variables.md +++ b/docs/self-hosting/docker-environment-variables.md @@ -1,4 +1,8 @@ -# Docker Environment Variables +# Docker Environment Variables (v0.7) + +{% hint style="info" %} +This page only applies to Reacher version 0.7. For the `beta` v0.10 version, see [reacher-configuration-v0.10.md](../advanced/migrations/reacher-configuration-v0.10.md "mention"). +{% endhint %} Reacher's software is available on [Docker Hub](https://hub.docker.com/r/reacherhq/backend/tags). You can get started using the default parameters: diff --git a/docs/self-hosting/install.md b/docs/self-hosting/install.md index cd9fa87f2..470d83c0e 100644 --- a/docs/self-hosting/install.md +++ b/docs/self-hosting/install.md @@ -6,33 +6,27 @@ Reacher is built with self-hosting as a primary feature, giving you full control You can run this tutorial without a Commercial License, as a Free Trial. 
Read more about the Free Trial in [licensing.md](licensing.md "mention"). {% endhint %} -## Choosing the right infrastructure +## Tutorial Scope: Install Reacher on a single server -The first choice to make is the cloud provider to run Reacher. Reacher is **stateless** by design, meaning you can deploy multiple containers, each running a separate instance of Reacher, to perform email verifications in parallel. This architecture enables easy horizontal scaling. +Reacher is **stateless** by design, meaning you can deploy multiple containers, each running a separate instance of Reacher, to perform email verifications in parallel. This architecture enables easy horizontal scaling. -{% hint style="info" %} -If you enable [bulk.md](bulk.md "mention"), all Reacher instances will need to connect to the same Postgres database. They can still run independently, ensuring scalability and parallel processing. -{% endhint %} - -However, for larger volume, it's important to manage the reputation of the IP addresses used for verifications, as poor IP will greatly decrease the quality of the results. Read more about [interactive-blocks.md](interactive-blocks.md "mention"). - -To get started, using dedicated servers with fixed IPs is recommended, as this allows you to maintain control over the IPs' reputation. Alternatively, you can also outsource the IP health maintenance to a 3rd party, by using [proxies.md](proxies.md "mention"). +However, for the sake of this tutorial, we will install Reacher on a single dedicated server. This allows minimal setup to get Reacher working, and ensures that the chosen cloud provider allows outgoing port 25 requests. -The tutorial below assumes a single dedicated server. Make sure that your cloud provider allows outgoing requests on port 25. +If you're interested in ideas for a production deployment setup, skip to [scaling-for-production.md](scaling-for-production.md "mention"). ## Step-by-Step Tutorial 1. Install Docker on your server. 
You can follow [Docker's guide](https://docs.docker.com/engine/install/) for your OS. -2. Run Reacher's [Docker image](https://hub.docker.com/r/reacherhq/backend): +2. Run Reacher's latest (v0.7) [Docker image](https://hub.docker.com/r/reacherhq/backend): ```bash -docker run -p 8080:8080 reacherhq/backend:latest +docker run -p 8080:8080 reacherhq/backend:latest # v0.7 ``` You should see the following output: ```bash -2024-09-19T12:58:32.918254Z INFO reacher: Running Reacher version="0.7.0" +2024-09-19T12:58:32.918254Z INFO reacher: Running Reacher version="0.10.0" Starting ChromeDriver 124.0.6367.78 (a087f2dd364ddd58b9c016ef1bf563d2bc138711-refs/branch-heads/6367@{#954}) on port 9515 Only local connections are allowed. Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe. @@ -42,30 +36,21 @@ ChromeDriver was started successfully. Advanced users can set additional [docker-environment-variables.md](docker-environment-variables.md "mention"). -3. Verify an email locally, from the shell of your server. +3. Make sure that you can verify an email remotely by running the following command from your local machine. ```bash curl -X POST \ -H'Content-Type: application/json' \ -d'{"to_email":"amaury@reacher.email"}' \ - http://localhost:8080/v0/check_email + http://:8080/v0/check_email ``` {% hint style="warning" %} If this step hangs for a long time, or returns a JSON result with `is_reachable="unknown"`, it generally means that port 25 is restricted. See [debugging-reacher.md](debugging-reacher.md "mention")on how to fix this. {% endhint %} -4. Make sure that you can verify an email remotely by running the following command from your local machine. - -```bash -curl -X POST \ - -H'Content-Type: application/json' \ - -d'{"to_email":"amaury@reacher.email"}' \ - http://:8080/v0/check_email -``` - -5. If you see a JSON output with an `is_reachable` field, then you're set, congratulations! :tada: +4. 
If you see a JSON output with an `is_reachable` field, then you're set, congratulations! :tada: ## Troubleshooting -If you have any issue in one of the steps above, you can try [debugging-reacher.md](debugging-reacher.md "mention")yourself, or send me an email [amaury](https://app.gitbook.com/u/F1LnsqPFtfUEGlcILLswbbp5cgk2 "mention"). +If you have any issue in one of the steps above, you can try [debugging-reacher.md](debugging-reacher.md "mention")yourself, or send me an email [amaury@reacher.email](https://app.gitbook.com/u/F1LnsqPFtfUEGlcILLswbbp5cgk2 "mention"). diff --git a/docs/self-hosting/interactive-blocks.md b/docs/self-hosting/interactive-blocks.md deleted file mode 100644 index 80652c4bc..000000000 --- a/docs/self-hosting/interactive-blocks.md +++ /dev/null @@ -1,29 +0,0 @@ -# IP Health Maintenance - -When running Reacher at scale, managing your **IP health** is crucial to maintaining high email verification accuracy and avoiding potential issues such as blacklisting. Since Reacher connects to email servers to verify addresses, the health of the IPs making these connections directly affects the success rate of your verification requests. - -Here are some best practices for maintaining IP health. - -## **Rotate IP Addresses** - -Constantly making requests from the same IP can lead to throttling or blacklisting, especially if you're performing high-volume verifications. Using an IP rotation service or proxy pools can help distribute the load and reduce the risk of any single IP being blocked. - -## Monitor IP Reputation - -Regularly check the reputation of the IP addresses you're using. Tools like Sender Score can help monitor your IP reputation. If one or more of your IPs gets flagged, it can impact the accuracy of email verifications, as some servers may block connections from poor IPs. - -## **Use Dedicated IPs** - -Consider using dedicated IPs for email verification. 
Shared IPs can sometimes have degraded reputations due to the behavior of other users sharing the same IP range. Dedicated IPs ensure that you maintain control over your reputation. - -## Warm Up IPs - -If you’re starting with a fresh set of IPs, begin with a lower verification volume and gradually increase the load. This process, known as **IP warming**, helps avoid immediate blacklisting and allows you to build a solid reputation with email servers over time. - -## Throttle Verification Requests - -Sending too many requests from a single IP in a short period can trigger rate limits or spam filters on email servers. Implement rate-limiting to space out requests and ensure smoother interactions with email servers. - -## Too Much Work? Use Proxies! - -Deploying Reacher generally requires a one-time setup with minimal ongoing maintenance. However, managing IP addresses can be more time-consuming over the long run. If you prefer to avoid the complexities of IP management, you can route all SMTP requests through a third-party service that handles IP health and reputation management for you. Read more about [proxies.md](proxies.md "mention"). diff --git a/docs/self-hosting/proxies.md b/docs/self-hosting/proxies.md index f29034538..f044bea5d 100644 --- a/docs/self-hosting/proxies.md +++ b/docs/self-hosting/proxies.md @@ -1,5 +1,7 @@ # Proxies +Maintaining a good IP reputation is hard. Reacher integrates seamlessly with SOCKS5 proxies. + ## What is a SOCKS5 Proxy? A **SOCKS5 proxy** is a flexible proxy protocol that supports various types of traffic, including SMTP. When using it for email verifications, the reputation of the **proxy’s IP** is what matters, not your own IP. The proxy handles requests on your behalf, which helps protect your actual IP while ensuring verifications go through successfully. This is crucial for maintaining deliverability and avoiding issues like blacklisting. 
diff --git a/docs/self-hosting/scaling-for-production.md b/docs/self-hosting/scaling-for-production.md new file mode 100644 index 000000000..77c3f245b --- /dev/null +++ b/docs/self-hosting/scaling-for-production.md @@ -0,0 +1,67 @@ +# Scaling for Production + +{% hint style="info" %} +The architecture detailed below is currently only available on the v0.10 `beta` Docker version. +{% endhint %} + +Reacher's stateless design allows for efficient horizontal scaling. We propose here a queue-based architecture to handle more than 10 million email verifications per month. + +The architecture contains 4 components: + +
+| Component | Description | Docker image |
+| --- | --- | --- |
+| HTTP server | Receives incoming email verification requests and posts them into the queue. | `reacherhq/backend:beta` |
+| RabbitMQ | Reacher uses a reliable, mature and open-source queue implementation. | `rabbitmq:4.0-management` |
+| Workers | One or more consumers of the queue, which perform the actual email verification task. | `reacherhq/backend:beta` |
+| Storage | A place to store all results; currently only Postgres is supported. | `postgres:14` |
+ +Note that Reacher provides the same Docker image `reacherhq/backend` which can act as both a **Worker** and a **HTTP server**. + +

+_Figure: Reacher architecture for scaling_

+ +With this architecture, it's possible to horizontally scale the number of workers, while making sure that the individual IPs don't get blacklisted. To do so, we propose to start with two types of workers. + +### Common Configuration to both workers + +To enable the above worker architecture, set the following parameters in [reacher-configuration-v0.10.md](../advanced/migrations/reacher-configuration-v0.10.md "mention"): + +* `worker.enable`: true +* `worker.rabbitmq.url`: Points to the URL of the RabbitMQ instance. +* `worker.postgres.db_url`: A Postgres database to store the email verification results. + +### 1st worker type: SMTP worker using Proxy + +These workers will consume all emails that should be verified through SMTP. Currently, this includes all emails except Hotmail B2C and Yahoo emails, which are best verified using a headless navigator. Since maintaining IP addresses is hard, we recommend using a proxy; see [proxies.md](proxies.md "mention"). + +Assuming your proxy has `N` available IP addresses, we recommend spawning the same number `N` of workers, each with the config below: + +* `worker.rabbitmq.queues`: `["check.gmail","check.hotmailb2b","check.everything_else"]`. The SMTP workers will listen to these queues. +* `proxy.{host,port}`: Set a proxy to route all SMTP requests through. You can optionally pass in `username` and `password` if required. +* `worker.rabbitmq.concurrency`: 10. +* `worker.throttle.max_requests_per_minute`: 100. +* `worker.throttle.max_requests_per_day`: 10000. This is the recommended number of verifications per IP per day. Assuming there are `N` IP addresses and `N` workers, each worker should perform up to 10000 verifications per day. + +You can scale up the number `N` as much as you need. Remember, the rule of thumb is 10000 verifications per IP per day. For example, if you're aiming for 10 million verifications per month, we recommend 33 or 34 IPs. 
+ +``` +10,000,000 emails per month / 30 days ≈ 333,333 emails per day; 333,333 / 10,000 per IP per day ≈ 33.3, i.e. 33 or 34 IPs +``` + +Refer to [reacher-configuration-v0.10.md](../advanced/migrations/reacher-configuration-v0.10.md "mention") to see how to set these settings. + +### 2nd worker type: Headless worker + +These workers will consume all emails that are best verified using a headless browser. The idea behind this verification method is to spawn a headless browser that navigates to the email provider's password recovery page, and to parse the page's response after the email address is entered. This method currently works well for Hotmail B2C and Yahoo emails. + +To spawn such a worker, provide the config: + +* `worker.rabbitmq.queues`: `["check.hotmailb2c","check.yahoo"]`. These are the emails that are best verified using headless. +* `worker.throttle.max_requests_per_minute`: 100 + +Refer to [reacher-configuration-v0.10.md](../advanced/migrations/reacher-configuration-v0.10.md "mention") to see how to set these settings. + +## Understanding the architecture with Docker Compose + +We do not recommend using Docker Compose for a high-volume production setup. However, for understanding the architecture, the different Docker images, as well as how to configure the workers, this [`docker-compose.yaml`](../../docker-compose.yaml) file can be useful. + +## More questions? + +Contact [amaury@reacher.email](https://app.gitbook.com/u/F1LnsqPFtfUEGlcILLswbbp5cgk2 "mention") if you have more questions about this architecture, such as: + +* deploying on Kubernetes (Ansible playbook, Pulumi) +* more specialized workers (e.g. Gmail and Hotmail B2B workers can be separated)
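
For capacity planning, the sizing rule of thumb from the SMTP worker section (10000 verifications per IP per day) can be sketched as a small shell function. This is a minimal sketch, not part of Reacher: the function name `ips_needed` and its optional arguments are hypothetical, and it assumes a 30-day month and rounds up to a whole number of IPs.

```bash
# Hypothetical helper (not shipped with Reacher): estimate how many proxy
# IPs, and therefore SMTP workers, a monthly verification volume requires.
#   $1: verifications per month
#   $2: verifications per IP per day (defaults to the 10000 rule of thumb)
#   $3: days per month (defaults to 30)
ips_needed() {
  monthly=$1
  per_ip_per_day=${2:-10000}
  days=${3:-30}
  monthly_capacity_per_ip=$((days * per_ip_per_day))
  # Integer ceiling division: round up to a whole number of IPs.
  echo $(( (monthly + monthly_capacity_per_ip - 1) / monthly_capacity_per_ip ))
}

ips_needed 10000000   # prints 34
```

Rounding up gives the more conservative end of the "33 or 34 IPs" range quoted for 10 million verifications per month; pass a different second argument to model a stricter or looser per-IP daily limit.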