Sony Vaio Specification Tables
Hi, I would like to extract the notebook specification information from this page: It is essentially two columns of information, but split across several tables of different lengths (row counts). If you help me with this I will send you a few bucks via PayPal :). Cheers
I wrote this code some time ago. It isn't complete (it doesn't reach the specification tables yet), but at least it's a start. Sorry I forgot to post it earlier:
require 'rubygems'
require 'scrubyt'

# Log what the extractor is doing (handy while tuning the XPaths)
Scrubyt.logger = Scrubyt::Logger.new

sony_data = Scrubyt::Extractor.define do
  # Product page to scrape
  fetch 'http://vaio.sony.co.uk/view/ShowProduct.action?product=VGN-AR51SU&site=voe_en_GB_cons&category=VN+AR+Series'

  # Product image: read the src attribute as the image URL
  product_img "//div[@class='bgProdImg']" do
    product_img_url 'src', :type => :attribute
  end

  # Model name (h1) and tagline (h2)
  product_name "//div[@class='col2']" do
    product_name_title "//h1"
    product_name_subtitle "//h2"
  end

  # The first four bullet points of the feature list
  product_description "//ul[@class='ulType1']" do
    product_description_1 "//li[1]"
    product_description_2 "//li[2]"
    product_description_3 "//li[3]"
    product_description_4 "//li[4]"
  end
end

# Print the extracted data as XML (indent level 1)
sony_data.to_xml.write($stdout, 1)
Here is the output:
<product_img>
 <product_img_url>http://sp.sony-europe.com/media/47/23070</product_img_url>
</product_img>
<product_name>
 <product_name_title>VGN-AR51SU</product_name_title>
 <product_name_subtitle>Full HD Entertainment notebook featuring Blu-ray Disc™ Drive</product_name_subtitle>
</product_name>
<product_description>
 <product_description_1>Integrated Blu-ray Disc™ Drive</product_description_1>
 <product_description_2>WUXGA X-black LCD with Double Lamp Technology</product_description_2>
 <product_description_3>Integrated hybrid digital/analog TV tuner and remote control</product_description_3>
 <product_description_4>Maximum features of Windows Vista® Ultimate combined with the power of latest Intel® Centrino® Duo processor technology</product_description_4>
</product_description>
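For the part the question actually asks about (two columns of specs split across several tables with different row counts), one approach is to ignore the table boundaries entirely: walk every table row in document order and keep each row that has exactly two cells as a label/value pair. The sketch below does that with Ruby's REXML rather than scrubyt. The markup, the `specs` class name and the `extract_specs` helper are all hypothetical; the real page's HTML structure is not shown here, so the XPath would need adjusting.

```ruby
require 'rexml/document'

# Hypothetical, simplified markup standing in for the real page.
# The real table classes and layout are NOT known; adjust the XPath to match.
SAMPLE = <<~HTML
  <div>
    <table class="specs">
      <tr><td colspan="2">Multimedia</td></tr>
      <tr><td>Optical drive</td><td>Blu-ray Disc Drive</td></tr>
      <tr><td>TV tuner</td><td>Hybrid digital/analog</td></tr>
    </table>
    <table class="specs">
      <tr><td>Display</td><td>WUXGA X-black LCD</td></tr>
    </table>
  </div>
HTML

# Hypothetical helper: collect label => value pairs from every two-cell row,
# regardless of which table the row is in or how many rows that table has.
def extract_specs(markup)
  doc = REXML::Document.new(markup)
  specs = {}
  REXML::XPath.each(doc, '//table//tr') do |row|
    # Element#text returns the unescaped text of the cell (nil if empty)
    cells = REXML::XPath.match(row, './td').map { |td| td.text.to_s.strip }
    # Skip section headings (one cell) and anything else that isn't a pair
    next unless cells.size == 2
    label, value = cells
    specs[label] = value
  end
  specs
end

extract_specs(SAMPLE).each { |label, value| puts "#{label}: #{value}" }
```

REXML only accepts well-formed markup, so on a real, messy HTML page an HTML parser such as Nokogiri (`Nokogiri::HTML`) would be the more practical choice; the row-walking logic stays the same.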