Subject: [jira] [Updated] (NUTCH-1288) Generator should not generate filter and not found and denied and gone and permanently moved pages

     [ https://issues.apache.org/jira/browse/NUTCH-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

behnam nikbakht updated NUTCH-1288:
-----------------------------------

    Attachment: NUTCH-1288.patch

> Generator should not generate filter and not found and denied and gone and permanently moved pages
> --------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1288
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1288
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, generator
>    Affects Versions: 1.4
>            Reporter: behnam nikbakht
>         Attachments: NUTCH-1288.patch
>
>
> The generator should not generate filtered, not-found, denied, gone, or
> permanently moved pages.
> In the shouldFetch method of AbstractFetchSchedule, the CrawlDatum must be
> checked against special fetch states such as not found, so that these URLs
> are not generated again.
> To do this, we can add a status to CrawlDatum that indicates invalid URLs,
> and set this status during fetch.
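
The proposal in the issue description could be sketched roughly as follows. This is a minimal, self-contained illustration and not the attached patch or Nutch's actual code: the class `FetchFilterSketch`, the enums `Outcome` and `Status`, the `Datum` class, and the `statusAfterFetch` helper are all hypothetical stand-ins, assumed here only to show the idea of marking a URL invalid at fetch time and skipping it in `shouldFetch`.

```java
// Hypothetical stand-in types; these are NOT Nutch's CrawlDatum or AbstractFetchSchedule.
final class FetchFilterSketch {

    // Assumed fetch outcomes, named after the cases listed in the issue.
    enum Outcome { SUCCESS, FILTERED, NOT_FOUND, DENIED, GONE, MOVED_PERMANENTLY }

    // Assumed stored states; INVALID is the extra status the issue proposes.
    enum Status { UNFETCHED, FETCHED, INVALID }

    // Minimal record of a URL's crawl state.
    static final class Datum {
        final Status status;
        final long fetchTime; // earliest time (ms) at which the URL is due

        Datum(Status status, long fetchTime) {
            this.status = status;
            this.fetchTime = fetchTime;
        }
    }

    // "Set this status in fetch": every non-success outcome marks the URL invalid.
    static Status statusAfterFetch(Outcome outcome) {
        return outcome == Outcome.SUCCESS ? Status.FETCHED : Status.INVALID;
    }

    // "Check in shouldFetch": invalid URLs are never generated again;
    // all other URLs are generated once their fetch time has been reached.
    static boolean shouldFetch(Datum datum, long curTime) {
        if (datum.status == Status.INVALID) {
            return false;
        }
        return datum.fetchTime <= curTime;
    }

    public static void main(String[] args) {
        long now = 1_000L;
        Datum due = new Datum(Status.UNFETCHED, 500L);
        Datum gone = new Datum(statusAfterFetch(Outcome.GONE), 500L);
        Datum notYetDue = new Datum(Status.FETCHED, 2_000L);

        System.out.println(shouldFetch(due, now));       // true
        System.out.println(shouldFetch(gone, now));      // false
        System.out.println(shouldFetch(notYetDue, now)); // false
    }
}
```

Collapsing all rejected outcomes into one `INVALID` state follows the wording of the issue; whether such URLs should ever be retried (for example after a long interval) is a design question this sketch leaves open.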