Saturday, December 4, 2010

Index and search for PDF documents with SharePoint 2010

SharePoint Server 2010, like its predecessors, includes indexing and search capabilities. But what doesn’t come out of the box is the ability to index and search for PDF documents.

PDF is a format developed by Adobe, not Microsoft. If you want to be able to find Adobe PDF documents, or have the PDF icon appear when viewing PDF files in a SharePoint document library, you will need to set it up yourself:


1. Download and install Adobe’s 64-bit PDF iFilter (SharePoint Server 2010 runs only on 64-bit servers) from http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025
2. Download the Adobe PDF icon (select Small 17 x 17)
3. Give the icon a name or accept the default: ‘pdficon_small.gif’
4. Save (or copy) the icon to C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\TEMPLATE\IMAGES
5. Next, register the PDF icon in the DOCICON.XML file, as follows:
6. In Windows Explorer, navigate to C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\TEMPLATE\XML
7. Open the DOCICON.XML file for editing (I use Notepad, but you can also use an XML editor)
8. Ignore the <ByProgID> section and scroll down to the <ByExtension> section of the file
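9. Within the <ByExtension> section, insert a <Mapping Key="pdf" Value="pdficon_small.gif" /> element, using the icon name from step 3. The easiest way is to copy an existing Mapping entry and edit it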
10. Save and close the file
11. Next, add PDF to the list of file types that SharePoint Search recognizes, as follows:
12. In a web browser, open SharePoint Central Administration
13. Under Application Management, click Manage service applications
14. Scroll down the list of service applications and click Search Service Application
15. Within the Search Administration dashboard, in the sidebar on the left, click File Types
16. Click ‘New File Type’, enter PDF in the File extension box and click OK
17. Scroll down the list of file types and check that PDF is now listed and displays the PDF icon
18. Close the web browser
19. Stop and restart Internet Information Services (IIS). Note: this will temporarily take SharePoint offline. Open a command prompt (Start > Run, enter ‘cmd’) and type ‘iisreset’
20. Perform a full crawl of your content sources. Note: an incremental crawl is not sufficient after you add a new file type. SharePoint indexes only files whose extensions are listed under File Types and ignores everything else. An incremental crawl only looks at items that have changed since the previous crawl, so existing PDF files that have not been modified would be skipped; a full crawl re-examines all content and picks up every file with the newly added extension.
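The reasoning behind step 20 can be illustrated with a deliberately simplified model. This is not how SharePoint's crawler is implemented: the Document class, the integer timestamps and both crawl functions are hypothetical. The model only assumes that files with unlisted extensions are ignored and that an incremental crawl adds only items changed since the previous crawl.

```python
from dataclasses import dataclass

# Toy model only: hypothetical names, NOT SharePoint's actual crawler.


@dataclass
class Document:
    name: str
    modified: int  # hypothetical "last modified" time, in arbitrary units


def extension(name: str) -> str:
    """Return the lower-case file extension without the dot ('' if none)."""
    return name.rsplit(".", 1)[-1].lower() if "." in name else ""


def full_crawl(docs, file_types):
    """Examine every document; index only those with a listed extension."""
    return {d.name for d in docs if extension(d.name) in file_types}


def incremental_crawl(index, docs, file_types, last_crawl):
    """Keep the existing index and add only documents changed since last_crawl."""
    updated = set(index)
    for d in docs:
        if d.modified > last_crawl and extension(d.name) in file_types:
            updated.add(d.name)
    return updated


if __name__ == "__main__":
    docs = [
        Document("budget.docx", modified=1),
        Document("manual.pdf", modified=1),  # unchanged since the last crawl
        Document("spec.pdf", modified=5),    # edited after the last crawl
    ]
    file_types = {"docx"}                    # before PDF is added
    index = full_crawl(docs, file_types)     # initial crawl, assumed at time 3
    print(sorted(index))                     # ['budget.docx']

    file_types.add("pdf")                    # step 16: add PDF as a file type
    print(sorted(incremental_crawl(index, docs, file_types, last_crawl=3)))
    # ['budget.docx', 'spec.pdf']: manual.pdf is missed
    print(sorted(full_crawl(docs, file_types)))
    # ['budget.docx', 'manual.pdf', 'spec.pdf']
```

In the model, spec.pdf is picked up by the incremental crawl only because it was edited after the last crawl; manual.pdf is found only by the full crawl, which is the situation step 20 warns about.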

If you now perform a search, PDF files that match the query should appear in the results, with the PDF icon displayed next to them. The icon should also appear in any document library that contains PDF files.
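If you have to repeat the DOCICON.XML edit from steps 7 to 10 (for example, on more than one server), it can be scripted. The following is a minimal sketch using Python's standard-library ElementTree, not a tool from Microsoft or Adobe. It assumes the default SharePoint 2010 path from step 6 and the icon name from step 3; the names add_pdf_mapping and main are hypothetical. ElementTree does not keep XML comments or the original formatting, so the script writes a DOCICON.XML.bak backup first, and it is worth trying on a non-production server before relying on it.

```python
import os
import shutil
import xml.etree.ElementTree as ET

# Assumption: default SharePoint 2010 ("14" hive) location from step 6.
DOCICON_PATH = (r"C:\Program Files\Common Files\Microsoft Shared"
                r"\Web Server Extensions\14\TEMPLATE\XML\DOCICON.XML")


def add_pdf_mapping(xml_text: str, icon: str = "pdficon_small.gif") -> str:
    """Return DOCICON.XML text with a pdf Mapping in the ByExtension section.

    If a pdf mapping already exists, its icon is updated instead, so running
    this twice does not create duplicate entries.
    """
    root = ET.fromstring(xml_text)
    by_ext = root.find("ByExtension")  # direct child; ByProgID is left alone
    if by_ext is None:
        raise ValueError("DOCICON.XML has no <ByExtension> section")
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            mapping.set("Value", icon)
            break
    else:  # no existing pdf entry was found
        ET.SubElement(by_ext, "Mapping", {"Key": "pdf", "Value": icon})
    return ET.tostring(root, encoding="unicode")


def main(path: str = DOCICON_PATH) -> None:
    """Back up the file, then rewrite it with the pdf mapping added."""
    if not os.path.exists(path):
        print("Not found:", path)
        return
    shutil.copyfile(path, path + ".bak")
    with open(path, encoding="utf-8-sig") as f:  # tolerate a BOM if present
        updated = add_pdf_mapping(f.read())
    with open(path, "w", encoding="utf-8") as f:
        f.write(updated)
    print("Added pdf mapping to", path)


if __name__ == "__main__":
    main()
```

The function works on text rather than on the file directly, so the XML change can be checked on a sample string before the file itself is touched.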

Good luck!
