Rack::Attack: protection from abusive clients

I'm excited to introduce Rack::Attack, a Ruby Rack middleware for throttling abusive requests. We depend on it to keep Kickstarter fast and reliable.

If you've looked at web server logs, you know there are some weird clients out there. Malicious scripts probe for exploits. Scrapers download the same page dozens of times each second, or request the 10,000th page of comments for a post with only 2 comments.

Tackling each curious anomaly that threatens your site's reliability saps developer productivity and happiness. Rack::Attack lets you throttle abusive requests with just a few lines of code. Check out the README for more details about how it works. Seriously, the README does a great job explaining how to use it. Okay, I'm going to assume you've skimmed the README. Moving on.

What kind of requests do we throttle?

We limit the number of requests each IP address can make in a short time period, like this:

Rack::Attack.throttle('ip', limit: x, period: y) do |req|
  req.ip
end

Pro tip: to allow occasional bursts, scale the limit and period up by the same multiple. Instead of limit: 1, period: 1 (1 req/s), use limit: 10, period: 10. The long-term average still can't exceed 1 req/s.
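
To see why the larger window tolerates bursts, here is a rough sketch in plain Ruby. It assumes a simple fixed-window counter, where every request in the same window shares one counter. FixedWindowThrottle and the sample IP are made up for illustration. This is not Rack::Attack's actual code.

```ruby
# Simplified fixed-window counter. FixedWindowThrottle is a hypothetical
# class for illustration; it is not part of Rack::Attack.
class FixedWindowThrottle
  def initialize(limit:, period:)
    @limit = limit
    @period = period
    @counts = Hash.new(0)
  end

  # Returns true if the request made at time `now` (in seconds) is allowed.
  def allow?(key, now)
    window = now.to_i / @period # requests in the same window share a counter
    @counts[[key, window]] += 1
    @counts[[key, window]] <= @limit
  end
end

strict = FixedWindowThrottle.new(limit: 1, period: 1)
bursty = FixedWindowThrottle.new(limit: 10, period: 10)

# One client sends 5 requests within the same second (t = 0).
strict_allowed = 5.times.count { strict.allow?("192.0.2.1", 0) }
bursty_allowed = 5.times.count { bursty.allow?("192.0.2.1", 0) }

puts strict_allowed # 1
puts bursty_allowed # 5
```

Both configurations cap each window at an average of one request per second, but only the second lets a normal client fire off a handful of requests at once. One caveat of fixed windows: a burst that straddles a window boundary can briefly exceed the limit, even though each window is still capped.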

Typical visitors never come close to our limit. But aggressive scrapers often do. Of course we graph it.

[Graph: throttle]

Those shark fin-shaped spikes are our database thanking us.

For the security of our users, we have a stricter throttle for login attempts. This makes it very time-consuming for attackers to guess users' passwords.

# Throttle logins per ip
Rack::Attack.throttle("login_ip", limit: x, period: y) { |req|
  req.ip if req.post? && req.path == "/session"
}
# Throttle logins per email param (regardless of ip)
Rack::Attack.throttle("login_email", limit: x, period: y) { |req|
  req.params['email'].presence if req.post? && req.path == "/session"
}
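
To put a rough number on "time-consuming", here is a back-of-the-envelope calculation. The limit, period, and guess count are hypothetical; this post doesn't disclose the real values. seconds_to_try is a made-up helper, not part of Rack::Attack.

```ruby
# Hypothetical numbers throughout; the real limit and period are not given.
# Approximates how long a throttled attacker needs for `guesses` attempts
# as (number of throttle windows needed) * (window length in seconds).
def seconds_to_try(guesses, limit:, period:)
  windows = (guesses.to_f / limit).ceil
  windows * period
end

# Suppose 5 login attempts are allowed per 60 seconds, and the attacker
# wants to try a 10,000-entry password list against one account.
hours = seconds_to_try(10_000, limit: 5, period: 60) / 3600.0
puts hours.round(1) # 33.3
```

Without a throttle, the same list could be tried in minutes. The per-email throttle matters here because it applies even when the attacker spreads requests across many IPs.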

We also use the IPCat Ruby library to detect requests from well-known datacenters. You could block login attempts from datacenters with this:

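# Block login attempts from known datacenter IPs
# (newer Rack::Attack releases name this method `blocklist`)
Rack::Attack.blacklist('bad_login_ip') { |req|
  req.post? && req.path == "/session" && IPCat.datacenter?(req.ip)
}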
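
If you're curious what a datacenter check boils down to, it is essentially "does this IP fall inside a known range?" Here is a minimal sketch using Ruby's standard IPAddr class. It is not IPCat's implementation. The ranges are reserved documentation blocks standing in for a real datacenter list, and datacenter_ip? is a hypothetical helper.

```ruby
require "ipaddr"

# Hypothetical ranges (reserved documentation blocks), standing in for a
# real list of datacenter networks.
DATACENTER_RANGES = [
  IPAddr.new("192.0.2.0/24"),
  IPAddr.new("198.51.100.0/24")
].freeze

# Returns true if `ip` falls inside any listed range.
# Unparseable input is treated as "not a datacenter".
def datacenter_ip?(ip)
  addr = IPAddr.new(ip)
  DATACENTER_RANGES.any? { |range| range.include?(addr) }
rescue IPAddr::InvalidAddressError
  false
end

puts datacenter_ip?("192.0.2.17")  # true
puts datacenter_ip?("203.0.113.5") # false
```

A linear scan is fine for a handful of ranges. A real list with thousands of entries would probably want a sorted structure or a prefix tree instead.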

Easily graph requests

Rack::Attack can also track requests without blocking them. On Feb 14, we launched our iPhone app, and wanted an easy way to monitor the HTTP requests it generates. Since the app uses a special header, it was simple to track with Rack::Attack:

Rack::Attack.track("ios_app") { |req|
  req.env.key?("HTTP_OUR_CUSTOM_HEADER")
}
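
A note on that env key: Rack exposes request headers CGI-style, so a header name is upcased, its dashes become underscores, and it gets an HTTP_ prefix. The sketch below shows the mapping. rack_env_key is a hypothetical helper (not part of Rack or Rack::Attack), and "Our-Custom-Header" is just the placeholder header name implied by the example above.

```ruby
# rack_env_key is a hypothetical helper for illustration only.
# It maps an HTTP header name to the key Rack uses in the request env.
def rack_env_key(header_name)
  key = header_name.upcase.tr("-", "_")
  # Content-Type and Content-Length are stored without the HTTP_ prefix.
  %w[CONTENT_TYPE CONTENT_LENGTH].include?(key) ? key : "HTTP_#{key}"
end

# A stand-in for the env hash Rack would build for a request.
env = { rack_env_key("Our-Custom-Header") => "1" }

puts rack_env_key("Our-Custom-Header")  # HTTP_OUR_CUSTOM_HEADER
puts env.key?("HTTP_OUR_CUSTOM_HEADER") # true
```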

We are very happy with how it went:

[Graph: iPhone launch]

We rely on Rack::Attack to let developers quickly track and throttle requests. It helps keep our site reliable, so we can spend more energy building better features. We're glad to make it publicly available to the open source community.

Comments

    1. Creator Sebastien Barre on May 2, 2013

      Would wholesale blocking all login attempts from known datacenters affect someone who is proxying their traffic through a VPS or a cloud server?

      Sometimes when I'm on an untrusted network (on the road, airports, etc) I use an SSH tunnel to proxy my traffic through one of my Rackspace boxes.

      Or do you only start to block after a certain number of failed attempts?

    2. Creator Aaron Suggs on May 2, 2013

      @Sebastien,

      Well, what we do is a bit more nuanced than the example above. ;-)

      That said, it's a best practice to configure your proxy to set the X-Forwarded-For header with the source IP.

    3. Creator Carl on May 2, 2013

      @Aaron

      That's entirely irrelevant. A lot of people don't trust their local network security, given that they use wifi. As such, they tunnel *all* TCP traffic via a remote server with an encrypted TCP tunnel. I do, for instance. Blocking non-abusive access from well-known data centers is a good way to block people who care about their network security from using your site.

    4. Creator Michael Dopheide on May 2, 2013

      @Carl,

      You probably aren't tunneling your traffic through the same datacenters they're likely blocking (unless you're using TOR, in which case you're using it wrong). After all, you were able to access the site. I'm guessing they're referring to those that are popular for spammers, bots, etc that are part of public and private blacklists. If you're tunneling through any of those, you should probably stop anyway.

      @Aaron,

      I think this is great and probably an easier entry into throttling for a lot of site admins over iptables (mentioned in the README) or doing it through the network hardware. Plus with the move to the cloud services, a lot of people won't even have access to the network layer.

    5. Creator Carl on May 3, 2013

      @Michael

      Of course I'm not. I have to pay to use a tiny data center that no one knows about, instead of using a free EC2 micro instance, because everyone blocks that. And they'd block the data center I'm using if they knew about it. Because that's just how bad the policy is.

    6. Creator Sebastien Barre on May 4, 2013

      @Michael I actually looked at the datacenters file they reference in the article and it includes Rackspace and others, so it's not just a database of "bad" datacenters, it's just a list of all major/known ones.

      @Aaron made the good point that configuring your proxy is a workable solution, so I guess I just have to look into how I do that when I'm tunnelling all my traffic through an SSH proxy...