rstehwien
I’m trying to learn scRUBYt by pulling out all of the WizKids miniature images. How can I filter the links returned below down to only those whose href contains `releaseid=`? For example, I want both `<a href="figuregallery.asp?releaseid=11">` and `<a href="figuregallery.asp?releaseid=99">`.
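Independently of scRUBYt, the filter itself is just a test on each extracted `href`. Below is a minimal plain-Ruby sketch, assuming the hrefs have already been pulled out into an array of strings. The `release_hrefs` helper is hypothetical (it is not part of scRUBYt), and it uses the standard `uri` library to look for a query parameter named exactly `releaseid` rather than doing a bare substring match.

```ruby
require 'uri'

# Hypothetical helper (not part of scRUBYt): keep only hrefs whose query
# string contains a parameter named exactly "releaseid".
def release_hrefs(hrefs)
  hrefs.select do |href|
    begin
      query = URI.parse(href).query
      # Links without a query string (e.g. "/index.asp") cannot match.
      !query.nil? && URI.decode_www_form(query).any? { |key, _value| key == 'releaseid' }
    rescue URI::InvalidURIError, ArgumentError
      false # skip hrefs that are not valid URIs or contain malformed escapes
    end
  end
end

links = [
  'figuregallery.asp?releaseid=11',
  'figuregallery.asp?releaseid=99',
  'figuregallery.asp?unitid=2414',
  '/index.asp'
]
p release_hrefs(links)
# prints ["figuregallery.asp?releaseid=11", "figuregallery.asp?releaseid=99"]
```

Note that the page-navigation links from step 4 below also carry `releaseid=11`, so they would pass this test too. If only the release-selection links are wanted, a stricter check (for example, requiring `releaseid` to be the only parameter) may be needed.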
web_data = Scrubyt::Extractor.define do
  fetch 'http://www.wizkidsgames.com/heroclix/dc/figuregallery.asp'

  release_links "//td" do
    link "//a" do
      url "href", :type => :attribute
    end
  end

  # 1. On the first page, find lines like this for each release:
  # <td align="center"><font class="body"><a href="figuregallery.asp?releaseid=11"><img alt="Hypertime" src="/images/releases/release_Hypertime.gif" border="0"></a><br>Hypertime</font></td>
  # 2. On the pages those links point to, find lines like this for each character:
  # <tr><td class="tdbody"><a href="figuregallery.asp?unitid=2414">Aquaman</a></td><td class="tdbody">Rookie</td><td class="tdbody">Hypertime</td>
  # 3. The desired character info looks like this:
  # <td colspan="2" align="center"><img src="/images/figures/Rotating/HDHT/HDHT_052_rot01.jpg" border="0" name="imgBase" id="Img1">
  # <tr><td class="tdheader">Name</td><td class="tdbody">Aquaman</td></tr>
  # <tr><td class="tdheader">Collector's Number</td><td class="tdbody">052</td></tr>
  # 4. Loop back to step 2 for each page-navigation link that looks like this:
  # <a href="figuregallery.asp?action=showsearchresults&Output=0&Flight=0&Aquatic=0&DialType=0&Retired=0&Strength=0&AttackQty=0&GamePlayTips=&DialCountComp=0&DialCountVal=0&ClickCountComp=0&ClickCountVal=0&StatType1=0&StatType2=0&StatType3=0&StatType4=0&StatType1Comp=0&StatType2Comp=0&StatType3Comp=0&StatType4Comp=0&StatType1Val=&StatType2Val=&StatType3Val=&StatType4Val=&PointValComp=0&PointVal=0&RangeComp=0&RangeVal=0&Rarity=&UAType1=0&UAType2=0&UAType1Comp=0&UAType2Comp=0&UAType1Val=&UAType2Val=&FrontArcComp=0&FrontArcVal=0&RearArcComp=0&RearArcVal=0&Ability1=0&Ability2=0&Ability3=0&Ability1Comp=0&Ability2Comp=0&Ability3Comp=0&USLID1=0&USLID2=0&USLID1Comp=0&USLID2Comp=0&USLID1Val=&USLID2Val=&keyword=&searchtype=0&sort=0&factionid=0&releaseid=11&p=2"> 2</a>
end

web_data.to_xml.write($stdout, 1)
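Most of the four-step crawl described in the comments comes down to telling the three kinds of `figuregallery.asp` links apart and resolving them against the gallery URL so they can be fetched. Below is a plain-Ruby sketch using only the standard `uri` library. The `classify_href` and `absolute_url` helpers are hypothetical, and the rules they apply (`action=showsearchresults` plus `p` marks a page link, `unitid` marks a character link, `releaseid` alone marks a release link) are assumptions inferred from the sample hrefs above. They are not anything scRUBYt provides.

```ruby
require 'uri'

BASE_URL = 'http://www.wizkidsgames.com/heroclix/dc/figuregallery.asp'

# Hypothetical helper: decide which crawl step a figuregallery.asp href
# belongs to, based only on the query parameters seen in the sample links.
def classify_href(href)
  query = URI.parse(href).query
  return :other if query.nil?

  params = URI.decode_www_form(query).to_h
  if params['action'] == 'showsearchresults' && params.key?('p')
    :page    # step 4: page-navigation link (it also carries releaseid)
  elsif params.key?('unitid')
    :unit    # step 2: link to a single character's detail page
  elsif params.key?('releaseid')
    :release # step 1: link to a release's character list
  else
    :other
  end
end

# Hypothetical helper: resolve a relative href against the gallery page.
def absolute_url(href)
  URI.join(BASE_URL, href).to_s
end

%w[
  figuregallery.asp?releaseid=11
  figuregallery.asp?unitid=2414
  figuregallery.asp?action=showsearchresults&releaseid=11&p=2
].each do |href|
  puts "#{classify_href(href)}: #{absolute_url(href)}"
end
```

Checking for the page-link case first matters, because those links also contain `releaseid` and would otherwise be mistaken for release links.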