防爬虫策略构
2025-07-02
22
参考资料
防爬虫策略构
验证码机制
关键操作:在敏感操作前插入CAPTCHA验证
示例代码:
from flask import Flask, request import random app = Flask(__name__) @app.route('/search') def search(): if 'captcha' not in request.cookies: return '''<form action="/verify"> <img src="/captcha_img"> <input name="captcha"> <button>Submit</button> </form>''' return "Search results..." @app.route('/captcha_img') def captcha_img(): # 生成4位随机验证码图片 code = ''.join(random.choices('ABCDEFGHJKLMNPQRSTUVWXYZ23456789', k=4)) # 实际实现应使用图形库生成图片 return code
请求频率限制
关键配置:Nginx限流设置
示例配置:
http { limit_req_zone $binary_remote_addr zone=search_limit:10m rate=10r/m; server { location /api/ { limit_req zone=search_limit burst=20 nodelay; proxy_pass http://backend; } } }
动态页面技术
关键实现:AJAX动态加载
示例代码:
<div id="product-data"></div> <script> fetch('/api/product/123') .then(r => r.json()) .then(data => { document.getElementById('product-data').innerHTML = ` <h2>${data.name}</h2> <p>Price: $${data.price}</p> `; }); </script>
行为分析检测
关键检测点:鼠标移动轨迹分析
示例代码:
let mousePath = []; document.addEventListener('mousemove', (e) => { mousePath.push({x: e.clientX, y: e.clientY, t: Date.now()}); }); function analyzeBehavior() { // 检测直线移动(非人类行为) const straightLine = mousePath.every((p,i) => i === 0 || Math.abs(p.x - mousePath[i-1].x) < 5); if(straightLine && mousePath.length > 10) { fetch('/log/bot', {method: 'POST', body: JSON.stringify(mousePath)}); } } setInterval(analyzeBehavior, 5000);
数据混淆技术
关键方法:CSS类名随机化
示例实现:
import hashlib def generate_class(product_id): return 'p_' + hashlib.md5( f'salt_{product_id}'.encode() ).hexdigest()[:8] # 输出示例 print(generate_class(123)) # 输出类似 p_a1b2c3d4
接口令牌验证
关键流程:动态令牌生成
示例代码:
import time import hmac SECRET_KEY = b'your_secret_key' def generate_token(): timestamp = int(time.time() / 300) # 5分钟有效期 return hmac.new(SECRET_KEY, str(timestamp).encode(), 'sha256').hexdigest() def verify_token(token): current = generate_token() previous = hmac.new(SECRET_KEY, str(int(time.time()/300)-1).encode(), 'sha256').hexdigest() return token in (current, previous)