Parser for the nakolesah.ru database

Scraping nakolesah.ru

An example of a parser for the nakolesah.ru site

Phew, I finally got the nakolesah parser into a working state and scraped the tire-selection-by-car database. For those interested, a link to the script is at the end of the post.

Some things in it still need changing. I don't really like the current logic, which is based on GET requests, whereas the browser gets all its information by calling an ASP script and passing the various parameters in a POST request. I use POST only at the very end. I ought to try copying the browser's behavior, but I didn't have time to dig into it.
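To make the GET/POST difference concrete, here is a minimal sketch. It assumes the core HTTP::Tiny module (this post does not say which HTTP library the script uses), and the endpoint and parameter names are made up for illustration. Nothing is actually sent over the network.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical endpoint and parameter names, for illustration only.
my $base = 'http://example.com/search.asp';
my $http = HTTP::Tiny->new;

# An array reference keeps the parameters in the order given.
my $query = $http->www_form_urlencode([ brand => 'Nissan', model => '350z' ]);

# GET: the parameters travel in the page address.
my $get_url = "$base?$query";
print "GET  $get_url\n";

# POST: the same string would go into the request body instead, e.g.
#   my $res = $http->post_form($base, { brand => 'Nissan', model => '350z' });
print "POST $base (body: $query)\n";
```

With GET, every quirk in the site's naming ends up in the address you have to build yourself, which is where the trouble described next comes from.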

I also don't like the crutch of a function that rewrites car model names. While parsing nakolesah I ran into a problem (relevant only for GET requests): the names of car brands and modifications differ between the drop-down lists and the page address. For example:

  sub TransformModel ($$) {
      my ($brand, $car_model) = @_;
      # For most brands, drop hyphens (the replacement is garbled in the source; deletion is assumed).
      $car_model =~ s/-//g if $brand !~ /Saab|Jaguar|Nissan|Honda|Citroen|MG|Mercedes|Mazda|Ford/i;
      # Any remaining '-' or '+' becomes '_', except for Citroen.
      $car_model =~ s/[-+]/_/g if $brand !~ /Citroen/i;

      if ($brand =~ /Nissan/i) {
          $car_model =~ s/Z/350z/i;
          $car_model =~ s/GT_R/GTR/i;
      }

      $car_model = 'navigaror_1' if $brand =~ m#Lincoln#i and $car_model eq 'Navigator';
      # %D1%81 is a URL-encoded Cyrillic "с".
      $car_model = 'Du%D1%81ato' if $brand =~ m#Fiat#i and $car_model =~ /dusato/i;

      if ($brand =~ /Chery/i) {
          $car_model = 'c_eastar' if $car_model eq 'CrossEastar';
          $car_model = $brand . '_' . $car_model if $car_model =~ /kimo|qq\d?/i;
      }
      return $car_model;
  }
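One way to make the one-off special cases less of a crutch would be a lookup table. The sketch below is only an illustration of that idea, not the code from the archive: the `%OVERRIDE` table and the `lookup_model` helper are hypothetical names, and only two of the exceptions from `TransformModel` are carried over.

```perl
use strict;
use warnings;

# Hypothetical override table: brand (lowercase) => { drop-down name => URL name }.
my %OVERRIDE = (
    lincoln => { 'Navigator'   => 'navigaror_1' },
    chery   => { 'CrossEastar' => 'c_eastar' },
);

sub lookup_model {
    my ($brand, $car_model) = @_;
    my $by_brand = $OVERRIDE{ lc $brand } || {};
    # Fall back to the unchanged name when no override is listed.
    return exists $by_brand->{$car_model} ? $by_brand->{$car_model} : $car_model;
}

print lookup_model('Lincoln', 'Navigator'),   "\n";
print lookup_model('Chery',   'CrossEastar'), "\n";
print lookup_model('Audi',    'A4'),          "\n";
```

The regex-based rules (hyphens, plus signs) would still need code, but the exact-name exceptions could then be added without touching the logic.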

A full scrape takes about 12 hours in sequential mode (it runs in a single thread: the customer doesn't need multithreading, and I had no time to add it just for fun). If you need to speed up the scraping and parsing, I'd suggest running, for example, four copies of the script and splitting the list of car makes into four groups accordingly (the nakolesah database contains 61 makes at the moment). You can use the ready-made split that I left in the code:

  # next if $brand !~ /Rover|FAW|Volkswagen|Ferrari|Jaguar|Smart|Suzuki|gaz|Bentley|Peugeot|Pontiac|Honda|Maybach|vaz|Infiniti|Buick|Subaru/i;
  # next if $brand !~ /Lancia|Opel|Daihatsu|Hummer|Kia|Fiat|Nissan|Saturn|Mini|Hyundai|Renault|Citroen|Lincoln|Chevrolet|Dodge/i;
  # next if $brand !~ /Chery|Mazda|Ford|uaz|Acura|Porsche|Lotus|Volvo|Toyota|Skoda|Cadillac|Scion|Saab|Mercury|Daewoo/i;
  # next if $brand !~ /Chrysler|BMW|Isuzu|MG|Mercedes|GMC|Seat|Maserati|Mitsubishi|Jeep|Lexus|Audi|Lifan|Geely/i;
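If the list of makes changes, the groups could also be computed instead of hand-written. This is a rough sketch under my own assumptions: the `split_brands` helper and the short sample list are hypothetical, and the makes are dealt out round-robin rather than in the grouping shown above.

```perl
use strict;
use warnings;

# Deal a list of brands into $n groups, round-robin.
sub split_brands {
    my ($n, @brands) = @_;
    my @groups = map { [] } 1 .. $n;
    push @{ $groups[ $_ % $n ] }, $brands[$_] for 0 .. $#brands;
    return @groups;
}

# Sample list only; the real script would use all 61 makes.
my @brands = qw(Audi BMW Chery Dodge Fiat Ford Honda Kia Mazda);
my @groups = split_brands(4, @brands);

# Each copy of the script keeps only its own group (here: group 0).
my $mine = join '|', map { quotemeta($_) } @{ $groups[0] };
my $re   = qr/^(?:$mine)$/i;

for my $brand (@brands) {
    next if $brand !~ $re;
    print "$brand\n";   # prints Audi, Fiat, Mazda
}
```

The group index could then be passed on the command line, so all four copies run the same unmodified file.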

In each of the four copies, uncomment the group you need. It's best to give the output files different names, because by default the output goes to a file named script_name.xml (though you can pass the output file name with a switch at launch).
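Choosing the output file at launch might look roughly like the sketch below. It uses the core Getopt::Long module; the `--output` switch name, the `output_name` helper, and the exact form of the default name are my assumptions, not necessarily what the script in the archive does.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use File::Basename qw(basename);

# Returns the output file name for a given script name and argument list.
sub output_name {
    my ($script, @args) = @_;
    my $out = basename($script) . '.xml';   # assumed default: <script name>.xml
    GetOptionsFromArray(\@args, 'output=s' => \$out)
        or die "Usage: $script [--output FILE]\n";
    return $out;
}

print output_name('parser.pl'), "\n";
print output_name('parser.pl', '--output', 'group1.xml'), "\n";
```

In a real script the call would be `output_name($0, @ARGV)`, so each of the four copies can be started with its own `--output` value.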

Along the way I wrote a small script to validate the output of the nakolesah.ru parser, and once again rejoiced at the beauty of Perl regular expressions:

  m|<(\w+)\s?\w*=?"?\w*"?>\s*</\1>$|ig

This one line checks whether tags are filled (all of them should be), and it understands tags both with and without attributes. The validator for the nakolesah.ru scrape results can be downloaded together with the parser.
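Here is a small sketch of how such a check could be applied line by line. The `find_empty_tags` helper and the sample XML lines are made up for illustration; only the pattern itself comes from the validator.

```perl
use strict;
use warnings;

# Matches an opening tag (with at most one simple attribute) that is
# immediately followed, apart from whitespace, by its closing tag.
my $empty_tag = qr{<(\w+)\s?\w*=?"?\w*"?>\s*</\1>$}i;

# Returns the 1-based numbers of lines that contain an empty tag.
sub find_empty_tags {
    my (@lines) = @_;
    my @bad;
    for my $i (0 .. $#lines) {
        push @bad, $i + 1 if $lines[$i] =~ $empty_tag;
    }
    return @bad;
}

# Made-up sample lines.
my @sample = (
    '<brand>Nissan</brand>',
    '<model></model>',
    '<tire size="205">   </tire>',
    '<year>2009</year>',
);

print join(',', find_empty_tags(@sample)), "\n";   # prints 2,3
```

The backreference `\1` is what ties the closing tag to the opening one, so a filled tag never matches.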

For fun, a few statistics (something to get nostalgic about later :) ):

  • clean database in XML (without blank lines):
      $ wc -l nakolesah.ru_full_base_4.12.2009.xml
      550657 nakolesah.ru_full_base_4.12.2009.xml
  • 577 car models

As promised, here is the link to download the nakolesah.ru parser/grabber (the output validator is in the same archive): nakolesah.ru_parser + validator

Good luck!
