Subject: I think I found a bug --> multiple_values_encountered_for_non_multiValued_field_title

So I've been getting this error
"multiple_values_encountered_for_non_multiValued_field_title" every once
in a while when I am trying to run solrindex. I can now say that this is
being caused by the index-more plugin (MoreIndexingFilter.java):

    private NutchDocument resetTitle(NutchDocument doc, ParseData data, String url) {
      String contentDisposition = data.getMeta(Metadata.CONTENT_DISPOSITION);
      if (contentDisposition == null)
        return doc;

      for (int i = 0; i < patterns.length; i++) {
        Matcher matcher = patterns[i].matcher(contentDisposition);
        if (matcher.find()) {
          doc.add("title", matcher.group(1));
          break;
        }
      }
      return doc;
    }

The problem is that, in my case, this method does not reset the title; it just adds a second one. It seems the original idea was that if CONTENT_DISPOSITION exists, the document will not already have a title set by another plugin (namely index-basic). Unfortunately this is not always the case, as you can see by running this command:

    bin/nutch indexchecker http://www.2modern.com/site/gift-registry.html

The relevant part of the output is:

    tstamp : Tue Feb 21 13:18:13 PST 2012
    type : text/html
    type : text
    type : html
    date : Tue Feb 21 13:18:13 PST 2012
    url : http://www.2modern.com/site/gift-registry.html
    content : 2Modern Gift Registry Modern Furniture & Lighting items in cart 0 checkout Returning 2Modern cu
    user_ranking : 25.0
    title : 2Modern Gift Registry
    title : gift-registry.html
    plutoz_ranking : 10.0
    categories : Furniture Home
    contentLength : 12924

As you can see, there are two titles. I think this would be easy to fix: just check whether a title already exists before using the file name as the title:

    if (contentDisposition == null || doc.getField("title") != null)
      return doc;

or, if the substitution must happen whenever CONTENT_DISPOSITION is present, at least remove the old title first:

    if (matcher.find()) {
      doc.removeField("title");
      doc.add("title", matcher.group(1));
      break;
    }

That being said, the real question is why NutchDocument doesn't observe the schema.xml file and always assumes that all fields are multi-valued?
    public void add(String name, Object value) {
      NutchField field = fields.get(name);
      if (field == null) {
        field = new NutchField(value);
        fields.put(name, field);
      } else {
        field.add(value);   // <-- always appends, regardless of the schema
      }
    }

--
Kaveh Minooie
www.plutoz.com
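For illustration, here is a minimal sketch of what a schema-aware add() could look like, assuming the set of multi-valued field names is known up front (in practice it would have to be derived from schema.xml, which this sketch does not do). SchemaAwareDocument and getValues are hypothetical names, not part of Nutch. For a single-valued field, a second add() replaces the earlier value instead of appending, which mirrors the "remove the old one" fix proposed in the post.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical stand-in for NutchDocument that knows which fields are
 * multi-valued (as schema.xml would declare them). Not the real Nutch class.
 */
public class SchemaAwareDocument {
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();
    private final Set<String> multiValuedFields;

    public SchemaAwareDocument(Set<String> multiValuedFields) {
        this.multiValuedFields = new HashSet<>(multiValuedFields);
    }

    public void add(String name, Object value) {
        List<Object> values = fields.get(name);
        if (values == null || !multiValuedFields.contains(name)) {
            // First value, or a single-valued field: start a fresh value list,
            // so a later add() replaces the earlier value instead of appending.
            values = new ArrayList<>();
            fields.put(name, values);
        }
        values.add(value);
    }

    /** Read-only view of a field's values; empty if the field is absent. */
    public List<Object> getValues(String name) {
        List<Object> values = fields.get(name);
        return values == null ? Collections.emptyList() : Collections.unmodifiableList(values);
    }

    public static void main(String[] args) {
        // Assumption: "type" is declared multi-valued, "title" single-valued.
        SchemaAwareDocument doc = new SchemaAwareDocument(Set.of("type"));
        doc.add("type", "text/html");
        doc.add("type", "text");
        doc.add("type", "html");
        doc.add("title", "2Modern Gift Registry"); // e.g. set by index-basic
        doc.add("title", "gift-registry.html");    // e.g. set by index-more
        System.out.println(doc.getValues("type"));  // [text/html, text, html]
        System.out.println(doc.getValues("title")); // [gift-registry.html]
    }
}
```

Whether the last value should win, as here, or the first (like the getField("title") guard) is a policy choice; the sketch uses last-write-wins only to keep it short.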