Subject: [jira] [Resolved] (NUTCH-1288) Generator should not generate filter and not found and denied and gone and permanently moved pages

     [ https://issues.apache.org/jira/browse/NUTCH-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche resolved NUTCH-1288.
----------------------------------
    Resolution: Invalid

This is not the right way to do it. If you don't want to retry such pages, implement a custom fetch schedule; don't hack AbstractFetchSchedule as you do. Hardcoding the schedule policy forces people to use Nutch the way you want to use it, which is not a good idea. Moreover, your patch removes useful information about the status of a page and replaces it with a more generic (and dubious) value.

> Generator should not generate filter and not found and denied and gone and
> permanently moved pages
> --------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1288
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1288
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, generator
>   Affects Versions: 1.4
>            Reporter: behnam nikbakht
>         Attachments: NUTCH-1288.patch
>
>
> Generator should not generate filter and not found and denied and gone and
> permanently moved pages.
> In the shouldFetch method in AbstractFetchSchedule, CrawlDatum must be checked
> against special fetch states such as not found, so that such pages are not
> generated again.
> So we can add a status in CrawlDatum that indicates invalid URLs, and set
> this status in fetch.
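
The suggestion in the resolution comment, keeping the retry policy in a pluggable schedule instead of hardcoding it in the shared base class, might look roughly like the sketch below. It is a self-contained illustration only: `PageStatus`, `PageRecord`, `FetchSchedule`, `DefaultSchedule` and `SkipDeadPagesSchedule` are hypothetical stand-ins invented for this example, not Nutch classes. In Nutch itself the analogous step would presumably be a subclass that overrides the `shouldFetch` method mentioned in the issue, with a different signature than the one assumed here.

```java
import java.util.EnumSet;
import java.util.Set;

/** Self-contained sketch; none of the types below are Nutch classes. */
public final class FetchScheduleSketch {

    /** Hypothetical page states, loosely mirroring those named in the issue. */
    enum PageStatus { UNFETCHED, FETCHED, GONE, PERM_REDIRECT }

    /** Hypothetical stand-in for a crawl record; the detailed status is preserved. */
    static final class PageRecord {
        final PageStatus status;
        final long nextFetchTime;

        PageRecord(PageStatus status, long nextFetchTime) {
            this.status = status;
            this.nextFetchTime = nextFetchTime;
        }
    }

    /** Pluggable policy: each deployment chooses its own implementation. */
    interface FetchSchedule {
        boolean shouldFetch(PageRecord page, long now);
    }

    /** Shared default: any page whose fetch time has passed is due, whatever its status. */
    static class DefaultSchedule implements FetchSchedule {
        @Override
        public boolean shouldFetch(PageRecord page, long now) {
            return page.nextFetchTime <= now;
        }
    }

    /** Custom policy: never retry pages in an assumed set of terminal states. */
    static final class SkipDeadPagesSchedule extends DefaultSchedule {
        private static final Set<PageStatus> TERMINAL =
                EnumSet.of(PageStatus.GONE, PageStatus.PERM_REDIRECT);

        @Override
        public boolean shouldFetch(PageRecord page, long now) {
            if (TERMINAL.contains(page.status)) {
                return false; // the restriction lives here, not in the shared default
            }
            return super.shouldFetch(page, now);
        }
    }

    public static void main(String[] args) {
        long now = 1_000L;
        PageRecord gone = new PageRecord(PageStatus.GONE, 0L);

        System.out.println(new DefaultSchedule().shouldFetch(gone, now));       // prints "true"
        System.out.println(new SkipDeadPagesSchedule().shouldFetch(gone, now)); // prints "false"
    }
}
```

With this arrangement the default behaviour is unchanged for everyone else, and the record keeps its precise status (gone versus permanently redirected) instead of collapsing into a single generic "invalid" value.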