Confusing titles and PDF files in SharePoint Search

The search capabilities of Office 365 are a great way for users to find their content. The search engine provides you and your users with the option to create overviews or search for content. Within Office 365 you can also tweak the search configuration, so that your users get the information they require.

There is one minor downside to changing the existing search schema: it is hard to see whether your changes have been processed. A new managed property only becomes available and populated after a full crawl. As it can take up to seven days before one happens, you might have to wait a while. A change to an existing property can take up to seven days as well; however, because the property is already present, you might not notice when the change takes effect.
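One way to check whether a change has reached the index is to ask the search REST endpoint (`/_api/search/query`) which Title it currently returns for a known document. The snippet below is a minimal sketch that only builds the request URL. The site URL and the helper name `build_search_url` are placeholders for illustration, and authentication is left out because it depends on your tenant.

```python
from urllib.parse import quote


def build_search_url(site_url, query_text, properties):
    """Build a GET URL for the SharePoint search REST endpoint.

    Hypothetical helper: it only assembles the URL and sends nothing.
    """
    def quoted(value):
        # String parameters are wrapped in single quotes and percent-encoded.
        return "'" + quote(value, safe="") + "'"

    base = site_url.rstrip("/") + "/_api/search/query"
    return (
        f"{base}?querytext={quoted(query_text)}"
        f"&selectproperties={quoted(','.join(properties))}"
    )


# Placeholder site URL; replace it with your own site collection.
url = build_search_url(
    "https://contoso.sharepoint.com/sites/docs",
    "filetype:pdf",
    ["Title", "Path"],
)
print(url)
```

Requesting this URL from an authenticated session returns the Title and Path the index currently holds, which you can compare before and after a schema change.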

When working with documents in search within Office 365 (or SharePoint 2013), the first thing your users will be presented with is the title. This title has seen some changes in SharePoint 2013, as you can read in Show more relevant Titles in search results in SharePoint 2013. Instead of just showing the title of the document as provided in SharePoint, you can now also expect to see "The title extracted from the body of Word documents and PowerPoint presentations." So if you edit the Title managed property, you can see that there is a new crawled property mapped: MetadataExtractorTitle.

image

If you would rather show the title that your users provided in SharePoint, you can change the order of the mappings or simply remove MetadataExtractorTitle from the mapped properties. The column provided by SharePoint is ows_Title. Some other properties are required as well, so the minimal set would be the following:

image

Keep in mind that the Office Graph and Delve use search as well. Changing these settings might affect what you see in Delve.

Usually those would be all the required changes, yet in a recent case we had some trouble here. As it turns out, all files work fine with these settings except some PDF files. Users uploaded a PDF file and provided a title in SharePoint, yet in the search results the PDFs would show up with completely confusing titles. The cause was that the PDFs had been scanned with a document scanner, which added a property called Title with a prefix and a date/time stamp as its value. So if you open the PDF in a reader (or editor) and check the Document (or File) properties, you will see that value.

PDF Document properties

As it turns out, the crawler picks up that title and lets it override the one found in SharePoint. So rather than using the title that was configured in SharePoint, search uses what is stored in the PDF file. If you ever find yourself having problems with PDF files and search, make sure that the files do not contain exotic properties or a custom Title property, as it will show up in the search results.
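To spot such files before they are uploaded, you could scan them for an embedded Title. The following is a crude sketch, not a real PDF parser. It only finds a Title stored as a plain literal string, so it misses hex-encoded or UTF-16 titles and metadata inside compressed or encrypted objects, and it can also match the title of a bookmark. The function names and the rule "eight or more consecutive digits looks like a scanner stamp" are assumptions made for illustration.

```python
import re

# Literal-string form of a Title entry, e.g. /Title (SCAN_20140312101530)
_TITLE_RE = re.compile(rb"/Title\s*\(((?:\\.|[^\\()])*)\)")

# Assumed heuristic: a run of 8+ digits suggests a date/time stamp.
_STAMP_RE = re.compile(r"\d{8,}")


def embedded_pdf_title(data):
    """Return the first literal /Title value found in raw PDF bytes, or None."""
    match = _TITLE_RE.search(data)
    if match is None:
        return None
    return match.group(1).decode("latin-1")


def looks_machine_generated(title):
    """Guess whether a title was stamped by a device rather than typed by a user."""
    return _STAMP_RE.search(title) is not None


# Made-up bytes resembling a scanner-produced document information dictionary.
sample = b"1 0 obj\n<< /Title (SCAN_20140312101530) /Producer (X) >>\nendobj"
title = embedded_pdf_title(sample)
print(title, looks_machine_generated(title))
```

For real files, read the bytes with `open(path, "rb").read()` and treat a hit as a hint to open the document properties and check the Title by hand.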
