A parser for the nakolesah.ru database

Grabbing nakolesah.ru

An example parser for the nakolesah.ru site

Phew, I have finally brought the nakolesah parser to a sane state, and it now grabs the tire selection by car. For those who care, a link to the script is at the end of the post.

Some of it will still have to change. I am not too happy with the current logic, which is based on GET requests (whereas the browser gets all its information by calling an ASP script and passing various parameters in a POST request). I use POST only at the very end. Ideally the script should copy the browser's behaviour completely, but I did not have the time to dig into that properly.

I also do not like the crutch in the form of a function that rewrites car model names. While parsing nakolesah I ran into a problem (relevant only for GET requests): the names of car makes and modifications in the drop-down lists differ from those used in the page addresses. For example:

    # Map a model name from the drop-down list to the form used in page URLs.
    sub TransformModel ($$) {
        my ($brand, $car_model) = @_;
        # Most makes drop hyphens; the listed makes keep them for the next step.
        $car_model =~ s/-//g if $brand !~ /Saab|Jaguar|Nissan|Honda|Citroen|MG|Mercedes|Mazda|Ford/i;
        $car_model =~ s/[-+]/_/g if $brand !~ /Citroen/i;
        if ($brand =~ /Nissan/i) {
            $car_model =~ s/Z/350z/i;
            $car_model =~ s/GT_R/GTR/i;
        }
        # One-off exceptions for individual models.
        $car_model = 'navigaror_1' if $brand =~ m#Lincoln#i and $car_model eq 'Navigator';
        $car_model = 'Du%D1%81ato' if $brand =~ m#Fiat#i and $car_model =~ /dusato/i;
        if ($brand =~ /Chery/i) {
            $car_model = 'c_eastar' if $car_model eq 'CrossEastar';
            $car_model = $brand . '_' . $car_model if $car_model =~ /kimo|qq\d?/i;
        }
        return $car_model;
    }

A full export takes about 12 hours in sequential mode (the script runs in a single thread; the client did not need multithreading, and I had no time to bolt it on just for fun). If anyone decides to download and parse the site, I suggest doing what I did: make four copies of the script and split the car makes into four groups (the nakolesah database holds 61 makes at the moment). You can use the ready-made split that I left in the code:

    # next if $brand !~ /Rover|FAW|Volkswagen|Ferrari|Jaguar|Smart|Suzuki|gaz|Bentley|Peugeot|Pontiac|Honda|Maybach|vaz|Infiniti|Buick|Subaru/i;
    # next if $brand !~ /Lancia|Opel|Daihatsu|Hummer|Kia|Fiat|Nissan|Saturn|Mini|Hyundai|Renault|Citroen|Lincoln|Chevrolet|Dodge/i;
    # next if $brand !~ /Chery|Mazda|Ford|uaz|Acura|Porsche|Lotus|Volvo|Toyota|Skoda|Cadillac|Scion|Saab|Mercury|Daewoo/i;
    # next if $brand !~ /Chrysler|BMW|Isuzu|MG|Mercedes|GMC|Seat|Maserati|Mitsubishi|Jeep|Lexus|Audi|Lifan|Geely/i;

In each of the four copies, uncomment one of the ranges. It is better to give the script files different names, because by default the output goes to a file named script_name.xml (although you can also pass the output file name with a switch at launch).
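The same split could also be driven by a command-line argument instead of four hand-edited copies. The following is only a minimal sketch under assumptions: the `--group` switch, the `brand_in_group` helper, and the short brand list in `main` are hypothetical and are not part of the published script; only the four brand groups are taken from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

# The four brand groups from the post, one regex per group
# (/x lets the alternation be wrapped across lines).
my @GROUPS = (
    qr/Rover|FAW|Volkswagen|Ferrari|Jaguar|Smart|Suzuki|gaz|Bentley|
       Peugeot|Pontiac|Honda|Maybach|vaz|Infiniti|Buick|Subaru/ix,
    qr/Lancia|Opel|Daihatsu|Hummer|Kia|Fiat|Nissan|Saturn|
       Mini|Hyundai|Renault|Citroen|Lincoln|Chevrolet|Dodge/ix,
    qr/Chery|Mazda|Ford|uaz|Acura|Porsche|Lotus|Volvo|Toyota|
       Skoda|Cadillac|Scion|Saab|Mercury|Daewoo/ix,
    qr/Chrysler|BMW|Isuzu|MG|Mercedes|GMC|Seat|Maserati|
       Mitsubishi|Jeep|Lexus|Audi|Lifan|Geely/ix,
);

# Hypothetical helper: returns 1 if $brand belongs to group $n (1..4), else 0.
sub brand_in_group {
    my ($brand, $n) = @_;
    die "group must be 1..4\n" unless defined $n && $n =~ /^[1-4]$/;
    return $brand =~ $GROUPS[$n - 1] ? 1 : 0;
}

# Hypothetical driver: shows which brands one instance would process.
sub main {
    my $group = 1;
    GetOptions('group=i' => \$group) or die "usage: $0 --group N\n";
    for my $brand (qw(Acura Audi Honda Nissan)) {   # stand-in for the real brand loop
        next unless brand_in_group($brand, $group);
        print "would process $brand\n";
    }
}

main() unless caller;
```

Four instances started with `--group 1` through `--group 4` would then cover all 61 makes between them, each writing to its own output file.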

Along the way I wrote a small script to validate the output of the nakolesah.ru parser, and once again enjoyed the beauty of Perl regular expressions:

    m|<(\w+)\s?\w*=?"?\w*"?>\s*</\1>$|ig

A single line checks whether the tags are filled in (that is, whether everything was downloaded) and handles tags both with and without attributes. The validator for the nakolesah.ru export can be downloaded together with the parser.
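To illustrate what that expression catches, here is a small sketch that applies it line by line to a made-up XML fragment. The `empty_tags` helper and the tag names in the sample are assumptions for the demonstration; the real validator and the real export format may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the names of tags with no content,
# checking each line with the regex from the post.
sub empty_tags {
    my ($xml) = @_;
    my @empty;
    for my $line (split /\n/, $xml) {
        push @empty, $1 if $line =~ m|<(\w+)\s?\w*=?"?\w*"?>\s*</\1>$|i;
    }
    return @empty;
}

# Made-up sample: three empty tags, one of them with an attribute.
my $sample = <<'XML';
<car>
<brand>Acura</brand>
<model name="CL"></model>
<tire></tire>
<disk>   </disk>
</car>
XML

print join(',', empty_tags($sample)), "\n";
```

For this sample the script prints `model,tire,disk`: the filled `brand` tag is skipped, while empty tags are reported whether or not they carry an attribute or contain only whitespace.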

For fun, a few statistics (in case I ever feel like getting nostalgic :) ):

  • the clean database in XML (no blank lines):

      $ wc -l nakolesah.ru_full_base_4.12.2009.xml
      550657 nakolesah.ru_full_base_4.12.2009.xml

  • 577 car models

As promised, here is the link to download the parser/grabber for the nakolesah.ru site (the output validator is in the same archive): nakolesah.ru_parser + validator

Good luck to everyone!

Filed under: Internet, Coding

Comments

18 comments on "A parser for the nakolesah.ru database"

  1. sberkut writes:

    Good day! Apparently they changed the design and the sizes are no longer parsed. Could you fix this, paid or free? Thank you)

    • dimio writes:

      The sizes of what exactly? Let's get specific right away, so it will be easier to understand what is going on.

      • sberkut writes:

        The script picks up the cars perfectly, but it does not pick up the matching wheel and tire sizes, so the resulting XML looks like this:

        ....

        • dimio writes:

          I cannot say what the problem is; for me all the information was exported normally.

          • sberkut writes:

            For me the redirect does not work. It prints:

            Use of uninitialized value $redir_url in concatenation (.) or string at /home/digbox/data/www/digbox.ru/cgi-bin/nakolesah_ru_parser.pl line 152.

            Could you help me figure it out? :)

            • dimio writes:

              Does it fail right away, on the first run? Add the following at line 152:

              print $response->content, "\n";
              exit;

              and let me know the result.

              • sberkut writes:

                It outputs the following:

                1|#||4|54|pageRedirect||%2fselect%2ftiresbyauto%2facura%2fcl%2f2003%2f32i.aspx|

                As I understand it, the redirect URL should be recognized here, but it is not :(

                • dimio writes:

                  This one is not hard to fix. The script simply failed to recognize the redirect link, because the format in which it is returned has changed.
                  In line 150, replace the search pattern:

                  my $redir_url = $1 if $response->content =~ m#/([\wA-nk-I\.\s\(\),%-]+)\|$#i;

                  with

                  my $redir_url = $1 if $response->content =~ m#\|\|([\wA-nk-I\.\s\(\),%-]+)\|$#i;
                  • sberkut writes:

                    Thank you very much, it worked)

                  • sberkut writes:

                    No, I spoke too soon ( it still does not pull the sizes out and saves the same thing as before (

                    • dimio writes:

                      Most likely it is not only the format of the links that has changed, but also the way the tire/wheel information is returned. To restore the parser, a lot would have to be changed in the page-parsing function.

  2. Cry writes:

    I can share a corrected parser, the database, or a parser of my own writing ... skype:
    cry.int
  3. Vipertp writes:

    Has anyone managed to fix the parser? Please help.
    icq: 308037667
    skype: viperstp

  4. Rock'n'roll writes:

    Could someone please explain why the sizes are not pulled out and what should be changed in the code?

    • dimio writes:

      Above, someone left his contact details and wrote that he has fixed everything for the current conditions.

  5. Alexander writes:

    Hello, if anyone has a parser in PHP, please share it ((( My ICQ is 202 716, and our site runs on the DLE engine (it is in PHP)
