#25
Fail2Ban won't help much here, since it only blocks login attempts after a freely configurable number of failed attempts. Most of the traffic, however, does not come from login attempts but from ordinary page requests. It could also be a deliberate overload attempt meant to take the forum offline.
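For context, that classic login-protection use is just a jail like the one below; a minimal sketch, and the thresholds are only placeholder values, not a recommendation:

# /etc/fail2ban/jail.local
# ban an address for one hour after 5 failed logins within 10 minutes
[sshd]
enabled  = true
maxretry = 5
findtime = 600
bantime  = 3600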
Anyway, I asked ChatGPT: Using "Fail2Ban with custom filters", how to recognize and stop crawlers and scrapers?
Code:
Excellent question — and an advanced one 👏
Fail2Ban is traditionally used to block brute-force login attempts or repeated bad HTTP requests, but it can be extended very effectively to recognize and stop web crawlers / scrapers by using custom filters and jail rules on your web server logs (Apache or Nginx). Let's go step-by-step so you can build a reliable and non-destructive setup.

🧩 1. General Concept
Fail2Ban works like this:
- It monitors log files (e.g. /var/log/nginx/access.log).
- It applies regex filters to detect bad patterns (excessive requests, fake user agents, 404 floods, etc.).
- Once the threshold is reached (e.g. maxretry = 10 in findtime = 60s), it bans the offending IP (by default using iptables or firewalld).

🕵️ 2. What Defines a Crawler / Scraper?
Scrapers tend to show one or more of these traits:
- Suspicious user-agents (empty, fake, or known bad bots).
- Excessive request rate from a single IP.
- Repeated 404/403s (trying random URLs).
- Requests for disallowed paths (/wp-admin, /phpmyadmin, /etc/passwd, etc.).
- Ignoring robots.txt.
Fail2Ban can catch all of these with different filters.

⚙️ 3. Example: Basic Anti-Scraper Jail (Nginx)

3.1 Create the filter /etc/fail2ban/filter.d/nginx-noscrape.conf

[Definition]
failregex = <HOST> -.*"(GET|POST).*HTTP.*" 200
ignoreregex =

But this alone bans everything (too broad). Let's refine. Example detecting excessive requests with bad user-agents or too fast access (tuned via log tailing):

[Definition]
failregex = ^<HOST> -.*"(GET|POST).*HTTP/.*" 200 .*"(?:-|curl|wget|python|scrapy|bot|spider|crawler|httpclient|libwww|nikto|scan|Go-http-client).*"$
ignoreregex =

That matches requests containing those keywords in the User-Agent field.

3.2 Create the jail entry in /etc/fail2ban/jail.local

[nginx-noscrape]
enabled = true
port = http,https
filter = nginx-noscrape
logpath = /var/log/nginx/access.log
maxretry = 20
findtime = 60
bantime = 86400
action = iptables-multiport[name=NoScrape, port="http,https"]

This bans any IP that hits 20 bad user-agent requests in 60 seconds.

🧠 4. Example: Detect Floods / Too Many Requests (regardless of UA)
You can also catch excessive requests (rate-limit style):

Filter /etc/fail2ban/filter.d/nginx-flood.conf

[Definition]
failregex = ^<HOST> -.*"(GET|POST).*HTTP/.*"

Jail /etc/fail2ban/jail.local

[nginx-flood]
enabled = true
port = http,https
filter = nginx-flood
logpath = /var/log/nginx/access.log
maxretry = 100
findtime = 10
bantime = 600

That means "if any IP makes more than 100 requests in 10 seconds, ban it for 10 minutes". This is crude but effective against aggressive crawlers that ignore rate limits.

🪤 5. Detect Repeated 404s (Path Probing)

[nginx-404]
enabled = true
port = http,https
filter = nginx-404
logpath = /var/log/nginx/access.log
maxretry = 10
findtime = 60
bantime = 3600

Filter (/etc/fail2ban/filter.d/nginx-404.conf):

[Definition]
failregex = ^<HOST> -.*"(GET|POST).*" 404
ignoreregex =

🧱 6. Combine with Nginx/Apache Hints
You can:
- Add limit_req_zone in Nginx to rate-limit per IP (lightweight, fast).
- Add custom log formats to expose user-agents or country (for fine control).
- Use GeoIP to ignore trusted countries or whitelists.

🪶 7. Optional: Ignore Good Bots
You can whitelist good bots (Google, Bing, etc.) via DNS or regex:

ignoreip = 127.0.0.1/8 ::1 66.249.0.0/16 40.77.167.0/24

or in the filter:

ignoreregex = "(Googlebot|Bingbot|YandexBot)"

🧩 8. Testing Your Filters
Run:

fail2ban-regex /var/log/nginx/access.log /etc/fail2ban/filter.d/nginx-noscrape.conf

This shows which lines match and helps tune your regex.
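Point 6 above mentions limit_req_zone without showing it; a minimal sketch of such a per-IP rate limit in Nginx follows, where the zone name "perip", the 10m shared-memory size and the 10 r/s rate are placeholder values:

# nginx.conf, inside the http {} block: one rate-limit state zone keyed by client IP
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

server {
    location / {
        # allow short bursts of up to 20 queued requests, reject the excess (503 by default)
        limit_req zone=perip burst=20 nodelay;
    }
}

Rejected requests are also reported in the Nginx error log, so they can in turn feed a Fail2Ban jail.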
🧱 9. Optional: Combined Smart Detection
Some admins merge metrics (404 + high rate + bad UA) in a single jail for smarter detection. Example composite filter (in Fail2Ban, additional patterns go on indented continuation lines under a single failregex):

failregex = ^<HOST> -.*"(GET|POST).*" (403|404|444|499)
            ^<HOST> -.*"(GET|POST).*HTTP/.*" 200 .*"(?:curl|wget|python|bot|scrapy|Go-http-client).*"$

🧰 10. Maintenance Tips
- Keep bantime long enough but not too aggressive (1 day is typical for bots).
- Review /var/log/fail2ban.log weekly.
- Whitelist known partners or CDNs.
- Combine with Cloudflare / ModSecurity for layered protection.

Would you like me to show a ready-to-use bundle of filters (noscrape + 404 + flood) in one jail.local that you can drop directly into a Debian/Ubuntu Fail2Ban install?
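One practical complement to the maintenance tips: bans can be reviewed and reverted with fail2ban-client, which matters when a legitimate visitor gets caught. The jail name and the IP below are only examples:

# list active jails and the addresses they currently ban
sudo fail2ban-client status
sudo fail2ban-client status nginx-noscrape

# lift a ban that turned out to be a false positive
sudo fail2ban-client set nginx-noscrape unbanip 203.0.113.42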
And I followed with, "Yes, show me that."

Code:
Perfect 👍 — here’s a ready-to-use anti-crawler / anti-scraper setup for Fail2Ban that you can copy directly to /etc/fail2ban/jail.local.
This setup has three protection layers:
- noscrape → catches bad user-agents
- 404 → catches scanners and broken scrapers
- flood → catches too many requests per second
All work together and are tuned to minimize false positives while effectively stopping bad bots.

🧱 /etc/fail2ban/jail.local

# ===============================
# Anti-Crawler / Anti-Scraper Jails
# ===============================

[DEFAULT]
# Whitelist localhost and your internal networks
ignoreip = 127.0.0.1/8 ::1
bantime = 86400   # 1 day ban
findtime = 60
maxretry = 5
backend = auto
banaction = iptables-multiport

# ===============================
# 1. Bad User-Agent Scrapers
# ===============================
[nginx-noscrape]
enabled = true
filter = nginx-noscrape
logpath = /var/log/nginx/access.log
port = http,https
maxretry = 20
findtime = 60
bantime = 86400
action = iptables-multiport[name=NoScrape, port="http,https"]
ignorecommand =
# You can increase `maxretry` if you have legitimate API traffic

# ===============================
# 2. Repeated 404s / Path Probing
# ===============================
[nginx-404]
enabled = true
filter = nginx-404
logpath = /var/log/nginx/access.log
port = http,https
maxretry = 10
findtime = 60
bantime = 3600
action = iptables-multiport[name=No404, port="http,https"]

# ===============================
# 3. Request Flooding (rate-based)
# ===============================
[nginx-flood]
enabled = true
filter = nginx-flood
logpath = /var/log/nginx/access.log
port = http,https
maxretry = 100
findtime = 10
bantime = 600
action = iptables-multiport[name=Flood, port="http,https"]

🧩 /etc/fail2ban/filter.d/nginx-noscrape.conf

[Definition]
# Detect common scraping tools or fake/bad bots in User-Agent
failregex = ^<HOST> -.*"(GET|POST).*HTTP/.*" .*"(?:-|curl|wget|python|httpclient|libwww|nikto|scan|scrapy|Go-http-client|java|php|perl|ruby|bot|crawler|spider).*"$
ignoreregex = "(Googlebot|Bingbot|YandexBot|DuckDuckBot|AhrefsBot|Applebot)"

This bans clients using clearly automated or fake agents, but ignores major search engines.

🧩 /etc/fail2ban/filter.d/nginx-404.conf

[Definition]
# Detect repeated 404 responses from same IP
failregex = ^<HOST> -.*"(GET|POST).*" 404
ignoreregex =

🧩 /etc/fail2ban/filter.d/nginx-flood.conf

[Definition]
# Detect excessive requests (any response code)
failregex = ^<HOST> -.*"(GET|POST).*HTTP/.*"
ignoreregex =

🔍 Test Before Activating
Run:

sudo fail2ban-regex /var/log/nginx/access.log /etc/fail2ban/filter.d/nginx-noscrape.conf

That shows which log entries match your filters; crucial to avoid banning good users.

Then enable and start Fail2Ban:

sudo systemctl enable fail2ban
sudo systemctl restart fail2ban
sudo fail2ban-client status

You'll see jails like:

Jail list: nginx-noscrape, nginx-404, nginx-flood

To inspect bans:

sudo fail2ban-client status nginx-noscrape

⚙️ Optional Tweaks
- For Apache, just change logpath to /var/log/apache2/access.log and the filters will still work (format compatible).
- If your log format differs, adjust the regex (especially if you use JSON or custom fields).
- Add your whitelist under [DEFAULT] → ignoreip.

Would you like me to extend this setup to log and email ban notifications (showing IP, user-agent, and request pattern)? It helps to analyze who's hitting your site and why.
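On the e-mail notifications the answer offers at the end: Fail2Ban already ships mail-enabled action shortcuts, so a minimal sketch only needs a few [DEFAULT] settings. The addresses below are placeholders, and a working local MTA (sendmail/postfix) is assumed:

# /etc/fail2ban/jail.local, in the [DEFAULT] section
destemail = admin@example.com
sender    = fail2ban@example.com
mta       = sendmail
# ban + e-mail including a whois report and the matching log lines
action    = %(action_mwl)s

With %(action_mw)s instead, you get the whois report without the log excerpt.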
Kas