I had been having trouble crawling a large document library of over 60,000 items in a 64-bit (x64) SharePoint 2007 farm after the index became corrupt and I had to reset the crawled content. The only error I could find in the crawl log was “The item may be too large or corrupt.” The crawler stopped at around 33,000 items in this document library. I searched the internet extensively for this problem and found a few blogs describing it, each with a different solution. What fixed my issue was a mix of what I found. After the following changes the crawler was able to index all 60,000 items in the library.
- HKLM/SOFTWARE/Microsoft/Office Server/12.0/Search/Global/Gathering Manager/DedicatedFilterProcessMemoryQuota -> Change the value to: 256000000 Hex
- HKLM/SOFTWARE/Microsoft/Office Server/12.0/Search/Global/Gathering Manager/FilterProcessMemoryQuota -> Change the value to: 256000000 Hex
- HKLM/SOFTWARE/Microsoft/Office Server/12.0/Search/Global/Gathering Manager/FolderHighPriority -> Change the value to: 500 Hex
- HKLM/SOFTWARE/Microsoft/Office Server/12.0/Search/Global/Gathering Manager/DeleteOnErrorInterval -> Change the value to: 4 Decimal
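The registry changes above could also be scripted instead of typed into regedit. The following is only a rough sketch, not something I ran on the farm: it assumes Python is available on the index server, that all four values are REG_DWORD, and that the key lives under the "12.0" / "Gathering Manager" path. The names `plan` and `apply_settings` are made up for this example. It first does a dry run that converts each value using the base given in the list and checks that it fits in 32 bits; under the REG_DWORD assumption, 256000000 read as hexadecimal would not fit, so check in regedit which base you are really entering before applying anything.

```python
# Hypothetical helper, not part of SharePoint: plans and (optionally) applies
# the Gathering Manager registry values listed above.

# Key path relative to HKEY_LOCAL_MACHINE (assumed "12.0" / "Gathering Manager" form).
BASE = r"SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager"

# (value name, digits as listed above, numeric base as listed above)
SETTINGS = [
    ("DedicatedFilterProcessMemoryQuota", "256000000", 16),
    ("FilterProcessMemoryQuota", "256000000", 16),
    ("FolderHighPriority", "500", 16),
    ("DeleteOnErrorInterval", "4", 10),
]

DWORD_MAX = 0xFFFFFFFF  # largest value a REG_DWORD can hold


def plan(settings=SETTINGS):
    """Dry run: return (name, integer value, fits_in_dword) for each setting."""
    result = []
    for name, digits, base in settings:
        value = int(digits, base)
        result.append((name, value, value <= DWORD_MAX))
    return result


def apply_settings(settings=SETTINGS):
    """Write the values as REG_DWORD. Windows only; needs administrator rights."""
    import winreg  # imported here so the dry run also works on other systems

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, BASE, 0,
                        winreg.KEY_SET_VALUE) as key:
        for name, value, fits in plan(settings):
            if not fits:
                # Refuse to write something a DWORD cannot store.
                raise ValueError(f"{name}: {value} does not fit in a REG_DWORD")
            winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, value)


if __name__ == "__main__":
    for name, value, fits in plan():
        status = "ok" if fits else "too large for a DWORD"
        print(f"{name} = {value} ({status})")
```

Running the script only prints the plan; `apply_settings()` has to be called explicitly, and it is probably wise to export the key first so the old values can be restored.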
Search Time Out settings:
1. Central Administration -> Application Management -> Search section -> Manage search service
2. Manage Search Service page -> “Farm-level search settings”