Subject: RE: Unusually long data import time?
Oh sure! As best as I can, anyway.

I have not set the Java heap size, or really configured the JVM at all.

The server running both SQL Server and Solr has:
* 2 Intel Xeon X5660 (each one is 2.8 GHz, 6 cores, 12 logical processors)
* 64 GB RAM
* One Solr instance (no shards)

I'm not using faceting.
My schema has these fields:
<field name="Id" type="string" indexed="true" stored="true" />
<field name="RecordId" type="int" indexed="true" stored="true" />
<field name="RecordType" type="string" indexed="true" stored="true" />
<field name="Name" type="LikeText" indexed="true" stored="true" termVectors="true" />
<field name="NameFuzzy" type="FuzzyText" indexed="true" stored="true" termVectors="true" />
<copyField source="Name" dest="NameFuzzy" />
<field name="NameType" type="string" indexed="true" stored="true" />

Custom types:

*LikeText
PatternReplaceCharFilterFactory ("\W+" => "")
KeywordTokenizerFactory
StopFilterFactory (~40 words in stoplist)
ASCIIFoldingFilterFactory
LowerCaseFilterFactory
EdgeNGramFilterFactory
LengthFilterFactory (min:3, max:512)

*FuzzyText
PatternReplaceCharFilterFactory ("\W+" => "")
KeywordTokenizerFactory
StopFilterFactory (~40 words in stoplist)
ASCIIFoldingFilterFactory
LowerCaseFilterFactory
NGramFilterFactory
LengthFilterFactory (min:3, max:512)
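
In schema.xml terms, the LikeText chain above would look roughly like the sketch below. It is only an illustration of the order of the analysis steps: the stopword file name and the minGramSize/maxGramSize values are assumed placeholders, not values taken from the actual config.

```xml
<fieldType name="LikeText" class="solr.TextField">
  <analyzer>
    <!-- Char filters run before the tokenizer: strip every run of non-word characters -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\W+" replacement="" />
    <!-- The whole remaining input becomes a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <!-- Stoplist file name is an assumption -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Gram sizes are assumptions; each token is expanded into its leading-edge n-grams -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25" />
    <filter class="solr.LengthFilterFactory" min="3" max="512" />
  </analyzer>
</fieldType>
```

FuzzyText would be the same chain with solr.NGramFilterFactory in place of solr.EdgeNGramFilterFactory, which emits far more terms per value and so is the more likely of the two to affect indexing time.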

Devon Baumgarten


-----Original Message-----
From: Glen Newton [mailto:glen.newton@xxxxxxxxx]
Sent: Wednesday, February 22, 2012 9:24 AM
To: solr-user@xxxxxxxxxxxxxxxxx
Subject: Re: Unusually long data import time?

Import times will depend on:
- hardware (speed of disks, cpu, # of cpus, amount of memory, etc)
- Java configuration (heap size, etc)
- Lucene/Solr configuration (many ...)
- Index configuration - how many fields, indexed how; faceting, etc
- OS configuration (usually to a lesser degree; _usually_)
- Network issues if non-local
- DB configuration (driver, etc)

If you can give more information about the above, people on this list
should be able to better indicate whether 18 hours sounds right for
your situation.

-Glen Newton

On Wed, Feb 22, 2012 at 10:14 AM, Devon Baumgarten
<dbaumgarten@xxxxxxxxxxxxxxxx> wrote:
> Hello,
>
> Would it be unusual for an import of 160 million documents to take 18 hours?
> Each document is less than 1 KB and I have the DataImportHandler using the
> jdbc driver to connect to SQL Server 2008. The full-import query calls a
> stored procedure that contains only a select from my target table.
>
> Is there any way I can speed this up? I saw recently someone on this list
> suggested a new user could get all their Solr data imported in under an hour.
> I sure hope that's true!
>
>
> Devon Baumgarten
>
>



--
-
http://zzzoot.blogspot.com/
-
