Anonymizers as a replacement for proxies. Checking proxy validity
I stumbled on an interesting post titled "And a little more about the Google Hack", in which the author describes using web proxies (an anonymizer, for example the site ) instead of public proxies to bypass Google's captcha.
I liked this way of using proxies too, so I decided to write a script that collects a list of public web proxies and checks whether they work.
Advantages of anonymizers over "classic" public proxies
- Unlike public proxies, web proxies rarely die and are almost always available online
- Web proxies are usually faster than public proxies or Tor
- An anonymizer not only hides your IP address but, depending on its settings, can also hide cookies, the user-agent and other "tails"
- Teaching your own program to work through a web proxy is easy: it is enough to pass the encoded target URL to the anonymizer's interface
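To make "pass the encoded URL to the anonymizer" concrete, here is a minimal sketch of building a request URL. It assumes a PHProxy instance with the "Base64" option enabled (target URL Base64-encoded in the q parameter) and a Glype instance with "Encode URL" turned off (target simply percent-encoded in the u parameter). The helpers build_proxy_url and uri_escape_simple are hypothetical, not part of the author's script.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Percent-encode everything except RFC 3986 unreserved characters,
# so '+', '/' and '=' produced by Base64 survive the query string.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Hypothetical helper: build the URL that asks an anonymizer to fetch $target.
# 'phproxy' is assumed to expect Base64 in ?q=, 'glype' a plain URL in ?u=.
sub build_proxy_url {
    my ($proxy, $target, $engine) = @_;
    if ($engine eq 'phproxy') {
        # The empty second argument stops encode_base64 from appending "\n".
        return $proxy . '?q=' . uri_escape_simple(encode_base64($target, ''));
    }
    if ($engine eq 'glype') {
        return $proxy . '?u=' . uri_escape_simple($target);
    }
    die "Unknown engine: $engine\n";
}

print build_proxy_url('http://proxy.example/index.php',
                      'http://www.google.com', 'phproxy'), "\n";
```

Escaping the Base64 string matters because a literal "+" in a query string is usually decoded as a space on the server side.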
What can an anonymizer (web proxy) be used for?
- together with a search engine results parser, to bypass the captcha the search engine issues when it receives a large number of requests from one address
- by sending requests to the target site through different proxies, you can cheat visit counters (this hypothesis needs verification)
- in scripts that post to forums or leave comments on sites
- in any other parsing where there is a risk of being banned (for example, when parsing the nakolesah.ru directory site, which I mentioned earlier)
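Most of these uses come down to spreading requests across several anonymizers so that no single address sends too many. A minimal round-robin sketch, assuming you already have a list of working proxies; make_rotator and the proxy addresses are hypothetical.

```perl
use strict;
use warnings;

# Return a closure that hands out anonymizers in round-robin order,
# so consecutive requests leave from different addresses.
sub make_rotator {
    my @proxies = @_;
    die "Empty proxy list\n" unless @proxies;
    my $i = 0;
    return sub {
        my $proxy = $proxies[$i];
        $i = ($i + 1) % @proxies;    # wrap around at the end of the list
        return $proxy;
    };
}

# Hypothetical anonymizer addresses, for illustration only.
my $next_proxy = make_rotator(
    'http://proxy-one.example/index.php',
    'http://proxy-two.example/browse.php',
);
print $next_proxy->(), "\n" for 1 .. 3;
```

Round-robin is the simplest policy; a random pick or dropping a proxy after a failed request are equally reasonable choices.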
Collecting a list of public web proxies
Building and testing the proxy list is handled by a Perl script. Some fragments of it are shown below, and the full text, as usual, is available for download in the "Soft" section (it will be updated there).
To run the script in list-collection mode, pass the value google or ajax via the -i option:
anocheck.pl -i google
Option values:
- google - search for public web proxies by parsing Google web search results. The resulting list is fairly large, but there is a chance of getting a captcha or a temporary ban
- ajax - the proxy list is obtained by querying the Google AJAX Search API. It returns only 8 results, but there is no captcha.
In my opinion, the best way to use this script is to compile the primary proxy list with the google option and then pass the resulting list file to the check.
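The post does not show how anocheck.pl reads the -i option, so here is a minimal sketch of how such dispatch could look with the core Getopt::Long module: google or ajax means "collect a new list", anything else is treated as a file to check. The parse_args helper and the 'collect'/'check' mode names are hypothetical.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

my $USAGE = "Usage: anocheck.pl -i google|ajax|FILE\n";

# Hypothetical helper: decide what to do from the -i value.
sub parse_args {
    my @args = @_;
    my $input;
    GetOptionsFromArray(\@args, 'i=s' => \$input) or die $USAGE;
    die $USAGE unless defined $input;
    # Search modes build a fresh list; any other value is a list file to verify.
    return ('collect', $input) if $input eq 'google' || $input eq 'ajax';
    return ('check', $input);
}

my ($mode, $value) = parse_args('-i', 'proxy.txt');
print "$mode: $value\n";
```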
To find proxies running on the PHProxy and Glype engines, use the following search queries:
    # 1 - on the PHProxy engine
    my $phproxy_sreq = '"Rotate13" "Base64" "Strip" inurl:index.php?q=';
    # 2 - on the Glype engine
    my $glype_sreq = '"Encode URL" "Allow Cookies" "Remove Scripts" inurl:browse.php?u=';
The script then parses Google's results and adds the web proxy addresses it finds to the list:
    # 1 - look for anonymizers on the PHProxy engine
    while ($source =~ m#<h3 class="r"><a href="(https?://w{0,3}\.?[\w-]+\.[a-z]{2,4}[/\w-]*/index\.php)\?q#ig) {
        $proxy_list->{$1}++;
    }
    # 2 - look for anonymizers on the Glype engine
    while ($source =~ m#<h3 class="r"><a href="(https?://w{0,3}\.?[\w-]+\.[a-z]{2,4}[/\w-]*/browse\.php)\?u#ig) {
        $proxy_list->{$1}++;
    }
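To see what the two patterns actually capture, they can be wrapped in a small function and run on a sample of result markup. This is a sketch: extract_proxies is a hypothetical name, and the `<h3 class="r">` markup is what the fragment above expects from Google, which may no longer match current result pages.

```perl
use strict;
use warnings;

# Apply the PHProxy and Glype patterns from the fragment above to a page
# of search results; returns a hashref of anonymizer URL => hit count.
sub extract_proxies {
    my ($source) = @_;
    my %proxy_list;
    my @patterns = (
        # PHProxy: .../index.php?q=
        qr#<h3 class="r"><a href="(https?://w{0,3}\.?[\w-]+\.[a-z]{2,4}[/\w-]*/index\.php)\?q#i,
        # Glype: .../browse.php?u=
        qr#<h3 class="r"><a href="(https?://w{0,3}\.?[\w-]+\.[a-z]{2,4}[/\w-]*/browse\.php)\?u#i,
    );
    for my $re (@patterns) {
        while ($source =~ m/$re/g) {
            $proxy_list{$1}++;
        }
    }
    return \%proxy_list;
}

my $sample = '<h3 class="r"><a href="http://www.someproxy.com/index.php?q=abc">Proxy</a></h3>';
print "$_\n" for sort keys %{ extract_proxies($sample) };
```

Note that `[\w-]+\.[a-z]{2,4}` only accepts hosts of the form name.tld (optionally with www.), so an anonymizer on a subdomain such as proxy.site.ru is skipped.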
Checking proxy validity
Besides building a web proxy list, the script can check an existing list for validity. To do this, pass the name of the file containing the proxy list via the -i option:
anocheck.pl -i proxy.txt
The mechanism for checking the proxies is not very complicated (I borrowed the idea from the post mentioned in the first paragraph): each anonymizer found is asked to open Google's home page, and the response is parsed to see whether it has the correct title. If the title is present, the proxy is considered working; otherwise it is moved to the list of non-working ones:
    foreach my $proxy_url (keys %$proxy_list) {
        # The empty second argument keeps encode_base64 from appending "\n" to the URL
        my $response = $ua->get($proxy_url . '?q=' . encode_base64('http://www.google.com', ''));
        # warn "Error: " . $response->status_line . "\n" unless $response->is_success;
        if ($response->decoded_content =~ m#<title>Google</title>#) {
            printf("%-45s %10s", $proxy_url, "\x1b[32m [OK] \x1b[0m\n");
        } else {
            printf("%-45s %10s", $proxy_url, "\x1b[31m [ERROR] \x1b[0m\n");
            # Move the non-working proxy to the bad list
            push(@bad_proxy, $proxy_url);
            delete($proxy_list->{$proxy_url});
        }
    }
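The same check can be packaged as a function that splits a list into working and broken anonymizers. This sketch swaps in the core HTTP::Tiny module for LWP::UserAgent, uses a 15-second timeout (my assumption), and accepts an optional fetch callback so it can be tried offline. Like the fragment above, it sends ?q= to every proxy; Glype instances (browse.php?u=) would need the u parameter instead. check_proxies is a hypothetical name.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use MIME::Base64 qw(encode_base64);

# Ask each anonymizer to open Google's home page and sort it into the
# good or bad list depending on whether the expected <title> comes back.
# $fetch is an optional code ref (url -> html) used instead of real HTTP.
sub check_proxies {
    my ($proxies, $fetch) = @_;
    unless ($fetch) {
        my $http = HTTP::Tiny->new(timeout => 15);
        $fetch = sub {
            my $res = $http->get($_[0]);
            return $res->{success} ? $res->{content} : '';
        };
    }
    my $target = encode_base64('http://www.google.com', '');    # no trailing "\n"
    my (@good, @bad);
    for my $proxy_url (@$proxies) {
        my $html = $fetch->($proxy_url . '?q=' . $target) // '';
        if ($html =~ m#<title>Google</title>#) { push @good, $proxy_url }
        else                                   { push @bad,  $proxy_url }
    }
    return (\@good, \@bad);
}
```

Called as `my ($good, $bad) = check_proxies(\@list);`, it performs real requests; passing a stub callback as the second argument makes it easy to test the classification logic without network access.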
Results of the proxy list check
The check produces two files (named good.txt and bad.txt by default) containing, respectively, the proxies that passed and failed validation.
As mentioned above, valid proxies can be plugged into a parser, and invalid ones can be rechecked from time to time (the list of valid proxies is not overwritten but appended to). In general, how you use the web proxies you find depends on your own ideas, and I wish you plenty of good ones. Goodbye!
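Appending to the valid list instead of overwriting it can be done with a small merge that keeps existing entries and skips duplicates. A minimal sketch, assuming one proxy URL per line; merge_into_file is a hypothetical helper, and the demo writes to a temporary directory rather than a real good.txt.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Add newly validated proxies to a list file without overwriting it:
# existing lines are kept in order, duplicates are skipped.
# Returns the number of proxies in the file afterwards.
sub merge_into_file {
    my ($file, @new) = @_;
    my (%seen, @all);
    if (-e $file) {
        open my $in, '<', $file or die "Cannot read $file: $!\n";
        while (my $line = <$in>) {
            chomp $line;
            next unless length $line;
            push @all, $line unless $seen{$line}++;
        }
        close $in;
    }
    push @all, grep { !$seen{$_}++ } @new;
    open my $out, '>', $file or die "Cannot write $file: $!\n";
    print {$out} "$_\n" for @all;
    close $out or die "Cannot close $file: $!\n";
    return scalar @all;
}

# Demo on a throwaway file.
my $dir   = tempdir(CLEANUP => 1);
my $count = merge_into_file("$dir/good.txt", 'http://proxy-one.example/index.php');
print "good.txt now holds $count proxies\n";
```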
Filed under: Internet, Coding, Search Engines
8 comments 


Excellent article, I'm sure it will be helpful to many. Unfortunately, my technical level doesn't allow me to use it.
Using some ready-made program is easy: download it, run it, get the result.
Dmitry, do you think it can be used for posting from different accounts on the same blogging platform?
I think it is possible. Checking is not complicated: just go through the anonymizer manually and perform the necessary actions.
Thank you for mentioning my blog.
By the way, your site is interesting; I've subscribed.
And thank you for your kind words; glad to make your acquaintance too.
This is great! Good article!
I always use the site dostupest.ru; with others you can catch viruses.