I would like to introduce my friend and colleague, Dmitry Gordiyevsky. We have worked together for a long time on both sides of the ocean: first in Ukraine and now in the U.S. Recently Dmitry helped me protect my site from spam bots, and I asked him to describe in detail the plug-in he developed.

All of us have probably run into spam bots, or at least heard about them, at some point. I have been battling spam comments made by bots for a long time, and this drove me to the point where I decided to stop them once and for all.

It all started with my frustration at cleaning out a pigsty of a forum filled to the brim with spam comments. I was deleting more than a hundred spam messages a day. That prompted me to install one of the CAPTCHA modules as the first line of defense. This helped reduce the wave of electronic manure, but not for long.

As you might already know, the CAPTCHA method most often encountered on the web today works something like this: the user is shown an image of a string of text distorted in some way, and has to recognize and enter the text to pass. Once the user has passed validation, they may go on to perform some action (e.g. leave a comment, register for an account, etc.).

Unfortunately, spam bots are developing very quickly and today can recognize CAPTCHA images nearly as well as humans can. According to some sources, bots can successfully recognize as many as 85% of the CAPTCHAs they encounter. As a result, CAPTCHAs are becoming more and more difficult to read even for a human, sometimes making people abandon their attempt to solve a CAPTCHA and hit that dreaded back button.

To many developers, the next evolutionary step in protection against spam bots is an algorithm that is relatively easy for humans to solve, extremely difficult for bots to solve, and easy to install. Many software engineers have come to the conclusion that such a system should exploit the fact that people are much better at recognizing pictures than computers will be in the foreseeable future.

Developers who try to design an image-based anti-spam system keep popping up on the internet today. Even Google attempted to build one some time ago. But most of these systems have an Achilles’ heel of some sort. If a system is nearly impossible to break, it tends to be difficult to install. It may even be too difficult for humans to solve. If a system is easy for humans to solve, it may also be quite easily broken by a computer. Case in point: a system where the user has to pick out the one different object among a number of otherwise identical objects (e.g. the task is to choose the cat among nine images, where there is one picture of a cat and eight identical pictures of a dog).
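To see why the cat-among-dogs puzzle is weak, here is a minimal Python sketch of how a bot might beat it without recognizing anything at all. It assumes the eight identical pictures are served as byte-for-byte identical files, which may not hold on a real site (a real attack might need a fuzzier image comparison). The function name and the placeholder image data are made up for illustration.

```python
import hashlib
from collections import Counter


def find_odd_one_out(images: list[bytes]) -> int:
    """Return the index of the only image whose content is unique."""
    # Fingerprint each image by hashing its raw bytes.
    digests = [hashlib.sha256(data).hexdigest() for data in images]
    counts = Counter(digests)
    # The "different" picture is the one whose fingerprint occurs exactly once.
    unique = [i for i, d in enumerate(digests) if counts[d] == 1]
    if len(unique) != 1:
        raise ValueError("expected exactly one unique image")
    return unique[0]


# Hypothetical puzzle: eight identical "dog" images and one "cat" image.
dog = b"placeholder bytes for the dog picture"
cat = b"placeholder bytes for the cat picture"
puzzle = [dog] * 4 + [cat] + [dog] * 4

print(find_odd_one_out(puzzle))  # index of the cat picture
```

The bot never needs to know what a cat is; it only needs to notice which picture is not a duplicate.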

Having analyzed past attempts at creating a system that would meet all three of the requirements set out above, namely:

  1. Easy to solve for humans

  2. Impossible to solve for computers

  3. Easy to install

I was able to come up with an algorithm that can stop the entire current generation of spam bots. I did it by adding an extra layer of complexity to the problem of choosing images.

I named my program “aRe yoU a Human?” or “pRove yoU are a Human”, or RUH for short. The most important aspect of the algorithm underlying RUH is that it asks people to select two images that are not identical or even similar, but that belong to the same category or subcategory of objects. For example, consider the following illustration:

As you can see in Fig. 1, the two most closely related objects in the set of nine are the jets and the vessel: both belong to the ‘Machinery’ main category. In Fig. 2, although there are three images of animals (the ‘Animals’ main category), the dogs are the better choice because they belong to the ‘Dogs’ subcategory of ‘Animals’. The RUH database can hold any number of images that fall into any number of categories and subcategories. That is what makes this algorithm extremely effective at stopping the current generation of spam bots.
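The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual RUH code: it assumes every image is labeled with a path from main category to subcategory, and that the expected answer is the pair whose paths share the most leading levels. The file names and every label other than ‘Machinery’, ‘Animals’ and ‘Dogs’ are hypothetical.

```python
from itertools import combinations

# Hypothetical labels: image name -> (main category, subcategory).
FIG_2_LIKE_PUZZLE = {
    "dog_a.jpg": ("Animals", "Dogs"),
    "dog_b.jpg": ("Animals", "Dogs"),
    "horse.jpg": ("Animals", "Horses"),
    "jet.jpg": ("Machinery", "Aircraft"),
    "apple.jpg": ("Food", "Fruit"),
}


def shared_depth(path_a, path_b):
    """Count how many leading category levels two paths have in common."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth


def closest_pair(puzzle):
    """Return the pair of image names sharing the deepest category, or None."""
    best_pair, best_depth = None, 0
    for (name_a, path_a), (name_b, path_b) in combinations(puzzle.items(), 2):
        depth = shared_depth(path_a, path_b)
        if depth > best_depth:
            best_pair, best_depth = (name_a, name_b), depth
    return best_pair


def is_correct(puzzle, first, second):
    """Check a visitor's two selections, ignoring the order they were clicked."""
    answer = closest_pair(puzzle)
    return answer is not None and {first, second} == set(answer)


print(closest_pair(FIG_2_LIKE_PUZZLE))
print(is_correct(FIG_2_LIKE_PUZZLE, "dog_a.jpg", "horse.jpg"))
```

In this sketch the two dogs win over a dog and the horse because they agree at both levels, while a puzzle like Fig. 1, where the related images agree only on ‘Machinery’, would be decided at the main-category level. A real puzzle generator would also have to make sure exactly one pair is the closest.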

But this is all talk without any statistical data to back up the theory. I was excited to test my creation, so I developed a standalone build of RUH, which I have now also made into a WordPress plug-in. At first I installed RUH on the forum that was being overloaded with spam. The spam simply disappeared. Then I asked my friend and colleague Anatoly Milner to install the RUH WordPress plug-in on his blog (in Russian).

The results from running RUH on Anatoly’s website couldn’t have made me happier. After a month of exposure to the internet, RUH had served 2400 puzzles, of which 1600 drew an attempt to select the two most closely related pictures, and only 50 of those attempts were made by actual humans. Spam is now a thing of the past for Anatoly and his blog.

Inspired by the results, I am in the process of releasing my plug-in to the world. In the near future RUH will appear in the official WordPress library of plug-ins.

If you wish to follow the development of RUH or participate in the discussion, you can find more information on my blog (in English).

10 comments on “No bots allowed!”

  1. Dmitry, we have already discussed your plug-in in the Russian part of the blog. Here is an additional question: what do you think of this technique, nobotsallowed.com?

    • Anatoliy, thank you for the link; it is an interesting technique.
      The only disadvantage, IMO, is that it tells the user what to do in plain English, which might be a weak spot. As I already mentioned, my method doesn’t use any meaningful words that could serve as a clue to how to solve the puzzle.

      As for me, it seems that I will have to rename my project, unfortunately…

      • Please take a look at the typical comments in the Twitter stream on this site: «I hate any intellectual captcha. If I see it, I never send a comment and leave the site». What do you think about this issue?

        • Anatoliy, I agree with Artyom that most of the tweets are about reCAPTCHA (i.e. text-based CAPTCHA). Still, to be more accommodating to people, the plug-in might be installed only for initial registration, leaving comments either unprotected or choosing another, ‘lighter’ CAPTCHA for them.

          • …the plug-in might be installed only for initial registration, leaving comments either unprotected or choosing another, ‘lighter’ CAPTCHA for them

            Dmitry, I think you are absolutely right. As for comment protection, that is for site owners to decide. In many cases an anti-spam plug-in like Akismet may be enough for this purpose.

  2. Artyom:

    Hello Anatoliy,

    IMHO, and I think we can agree on this, some sort of spam protection on a blog is a necessary evil nowadays if we want to allow people to comment on it.

    And of course, there are bound to be people who do not want to solve any CAPTCHAs at all. And there are even those who say they do not want to answer any ‘intellectual captchas’. In this case, we just need to let those particular readers go: if they can’t even solve a trivial puzzle, what can they possibly bring to the discussion?

    I think the bottom line is that, as blog owners or developers, we cannot please 100% of our readers or users (much like in real life), so our goal is to get a good percentage of people to like our blog or software so that we can be profitable or happy or both. Even if we’re not making a profit from this, we’re still filtering out not only spam bots, but also people with bad puzzle-solving skills.

    Also, I looked at the website nobotsallowed dot com and the Twitter feed seemed to have users mostly complaining about the traditional CAPTCHAs with distorted text. And obviously it is a collection of tweets that are made to convince the visitor to consider the nobotsallowed dot com product more seriously. Essentially a marketing device. A lot of the tweets are dated more than 2 days ago, and are on a loop. In other words they are not real time tweets, and repeat after about a minute.

    So, to summarize, I strongly believe that even with some people hating to solve any kind of captchas, even intellectual ones, they (CAPTCHAs) are not going away any time soon. The remaining question is: how to maximize the percentage of people who are willing to try, and use RUH?

    • Thank you for the comment, Artyom! Attractiveness is the keystone of any web site. I hope that my RUH is aesthetically pleasing to everybody :) As for intelligence… Well, there is a ‘golden rule’: the simpler a defense is, the greater the chance it will be broken. Did I find the ‘golden mean’? I don’t know. Let the users decide.

    • Hi Artem, I do agree with you, but… only in general. Maybe I’m not that smart, but as a regular user, unfortunately, I don’t like intellectual captchas either :). On the other hand, as a blogger, I very much admire any captcha and other anti-spam and anti-bot tools. That is why I told Dmitry about those readers’ opinions. If you want to work in a market, you must know the opinions of all of its consumers. And I am glad Dmitry indirectly agrees (above) with my point of view.

  3. Hi there, would you mind sharing which blog platform you’re using?

    I’m planning to start my own blog soon but I’m having a hard time choosing between BlogEngine/Wordpress/B2evolution and Drupal. The reason I ask is because your design seems different from most blogs and I’m looking for something unique. P.S. My apologies for being off-topic but I had to ask!

    Feel free to visit my website; marry