0.2.8 is out!


 
Avatar scrubber 437 posts

Just uploaded to rubyforge.

Of course bug reports, comments etc. warmly welcome!

 
Avatar Kevinf 1 post

Just updated my gem. I'm a bit of a noob, and after updating, the new dependencies were not downloaded automagically via the gem command. I had to manually go out and get ParseTree and ruby2ruby. Is this always the case, or is this something you can specify in the gem?
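For reference, RubyGems does let a gem declare its runtime dependencies in its specification, and the gem command can then resolve them at install time. A minimal sketch follows, assuming the two dependency names mentioned above; the summary text and the absence of version constraints are placeholders, and this is not scRUBYt's actual gemspec.

```ruby
require 'rubygems'

# Illustrative specification only, not the real scRUBYt gemspec.
spec = Gem::Specification.new do |s|
  s.name    = 'scrubyt'
  s.version = '0.2.8'
  s.summary = 'Placeholder summary for illustration'

  # Declared dependencies are recorded in the gem's metadata so that
  # the installer knows which other gems are required.
  s.add_dependency 'ParseTree'
  s.add_dependency 'ruby2ruby'
end

puts spec.dependencies.map(&:name).inspect
```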

 
Avatar noah 17 posts

I had to do the same, Kevin.

to_hash works well for me so far. However, I can no longer get extractors to export to a file using the export() method.

Here are the release notes:
Notes:
Based on the great feedback received through the forum, we have managed to fix the most bugs ever. There are also some nice new features, as well as a lot of improvements to the older ones (mainly detail pages, which were a bit clumsy in 0.2.6).

Changes:
[NEW] download pattern: download the file pointed to by the
      parent pattern
[NEW] checking checkboxes
[NEW] basic authentication support
[NEW] default values for missing elements
[NEW] possibility to resolve relative paths against a custom url
[NEW] first simple version of to_csv and to_hash
[NEW] complete rewrite of the exporting system (Credit: Neelance)
[NEW] first version of smart regular expressions: they are constructed
      from examples, just as regular expressions (Credit: Neelance)
[NEW] Possibility to click the n-th link
[FIX] Clicking on links using scRUBYt's advanced example lookup
[NEW] Forcing writing text of non-leaf nodes with :write_text => true
[NEW] Possibility to set custom user-agent; Specified default user agent
      as Microsoft IE6
[FIX] Fixed crawling to detail pages in case of leaving the
      original site (Credit: Michael Mazour)
[FIX] fixing the '//' problem - if the relative url contained two
      slashes, the fetching failed
[FIX] scrubyt assumed that documents have a list of nested elements
      (Credit: Rick Bradley)
[FIX] crawling to detail pages works also if the parent pattern is
      a string pattern
[FIX] shortcut url fixed again
[FIX] regexp pattern fixed in case its parent was a string
[FIX] refactoring the core classes, lots of bugfixes and stabilization
 
Avatar scrubber 437 posts

noah,

what exactly is the problem with export()?

 
Avatar noah 17 posts

I posted a bug report on it. Something to do with comments and parsing the original code.

I’ve got messy test files with lots of commented-out extractor definitions, but I found that moving those below the working Scrubyt::Extractor.define block takes care of things nicely.

See if you can recreate it. Could be a problem with my dependencies. I did install rubyinline and ruby2ruby with 0.2.8 (which I obviously needed to do).

 
Avatar scrubber 437 posts

Ah, ok. Thanks for the bug report, it’s great that some people are using the tracker :-). I am going to install trac or lighthouse soon.

The thing is that the exporting was rewritten from scratch (hence the dependencies on Ruby2Ruby and ParseTree). It is still possible to partially use the old exporting, which is the
my_pattern.export(__FILE__) 
style, but that is considered obsolete, because it goes to the learning extractor file and grabs the name of the extractor with a regexp. Hence the confusion if it finds a commented-out extractor declaration: it has no way of knowing the declaration is commented out, since it treats the file as a string. So, if you want to avoid this, call export with:
stuff.export('my_super_extractor.rb')
This way you are not depending on anything written in the learning extractor file.
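To illustrate why a commented-out declaration trips up a regexp-based name lookup, here is a rough sketch in plain Ruby. It is not scRUBYt's actual code: the regexp, the helper names and the sample file contents are all made up for the illustration. It only shows the general effect of treating a source file as a string.

```ruby
# Hypothetical contents of a learning extractor file, held as a string.
SOURCE = <<~'RUBY'
  # old_data = Scrubyt::Extractor.define do
  #   fetch 'http://example.com/old'
  # end

  new_data = Scrubyt::Extractor.define do
    fetch 'http://example.com/new'
  end
RUBY

# Assumed pattern: anything that looks like "name = Scrubyt::Extractor.define".
DECLARATION = /(\w+)\s*=\s*Scrubyt::Extractor\.define/

# Naive lookup: the first textual match wins, commented out or not.
def naive_extractor_name(source)
  source[DECLARATION, 1]
end

# Comment-aware variant: drop full-line comments before matching.
def comment_aware_extractor_name(source)
  code_only = source.lines.reject { |line| line.lstrip.start_with?('#') }.join
  code_only[DECLARATION, 1]
end

puts naive_extractor_name(SOURCE)         # the commented-out name
puts comment_aware_extractor_name(SOURCE) # the live extractor's name
```

In this sketch the naive lookup returns the commented-out name because it appears first in the file, which is consistent with the workaround of moving commented-out definitions below the working block.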