Full-text search finds matches for your search terms within the content of a document, whereas standard search only finds matches within document fields.
Note: For performance reasons, full-text search returns only the first 5,000 matching documents.
How to Search Document Content
You can only use full-text search from the Advanced Search dialog. To include document content in searches:
- Click the binoculars icon in the search bar to open Advanced Search.
- In Search Scope, choose Include Content.
- Fill in the remaining fields as needed.
When searching fields that contain alphanumeric characters and punctuation, Vault separates search terms into segments. This process is called “tokenization.”
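The following Python sketch illustrates the general idea of tokenization. It is not Vault's actual algorithm: the splitting rule (runs of letters and runs of digits become separate segments, punctuation acts as a separator, and case is ignored) and the names TOKEN_PATTERN and tokenize are assumptions made for illustration only.

```python
import re

# Hypothetical rule: a token is a run of ASCII letters or a run of digits.
# Punctuation and whitespace only separate tokens. Vault's real rules may differ.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|[0-9]+")


def tokenize(term: str) -> list[str]:
    """Split a search term into lowercase letter and digit segments."""
    return [segment.lower() for segment in TOKEN_PATTERN.findall(term)]


print(tokenize("ABC-123.pdf"))  # ['abc', '123', 'pdf']
```

Under this assumed rule, a search for "ABC-123" would be treated as the two segments "abc" and "123" rather than as one literal string.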
About Search Results
When you search within the content of a document, Vault runs separate searches for document fields and document content, and then merges them into a final set of results. If the search results include more than 5,000 documents, Vault displays a warning and limits the results to the 5,000 documents most relevant to your search terms. To see a complete set of results, apply additional filters and then perform another full-text search.
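The merge-and-limit behavior described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the Hit type, the numeric relevance score, and the rule of keeping the higher score when a document matches in both searches are hypothetical. Only the 5,000-document limit comes from this article.

```python
from dataclasses import dataclass

RESULT_LIMIT = 5000  # limit stated in this article


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float  # hypothetical relevance score; higher means more relevant


def merge_results(field_hits, content_hits, limit=RESULT_LIMIT):
    """Merge two hit lists and return (top hits, whether results were cut off)."""
    best = {}
    for hit in [*field_hits, *content_hits]:
        current = best.get(hit.doc_id)
        # Assumption: a document found by both searches keeps its higher score.
        if current is None or hit.score > current.score:
            best[hit.doc_id] = hit
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    truncated = len(ranked) > limit  # corresponds to showing the warning
    return ranked[:limit], truncated
```

For example, calling merge_results with a small limit returns the highest-scoring documents and a flag that is True when some matches were left out, which is the situation in which adding filters helps.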
Search Results Page
If Vault finds a match for your search terms within the document content, the search results page displays an excerpt from the document to provide context for the matching term.
Indexing for Full-Text Search
To support full-text search, Vault automatically indexes the full text of documents with supported source file formats. Document content is typically available for search within minutes after upload, but there may be a delay when many documents are being uploaded simultaneously. Vault also indexes document and object attachments.
Searchable Scanned Documents
Vault can extract and index text within scanned source documents that users upload as images or PDF files. This functionality, called Optical Character Recognition (OCR), allows you to use full-text search on these documents. Vault extracts only typed, English-language text.
Supported Formats for Text Extract
Vault automatically attempts to extract text from files in the following supported formats:
- PDF (only if the PDF does not already contain text)
- Portable Network Graphics (PNG)
- Tagged Image File Format (TIF, TIFF)
- JPEG (JPEG, JPG)
- Graphics Interchange Format (GIF)
- Bitmap (BMP)
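As a rough illustration of the format list above, the following Python sketch checks whether a file would be a candidate for OCR based on its extension. The function name is_ocr_candidate and the pdf_has_text flag are hypothetical. How Vault actually detects file types or existing text in a PDF is not described in this article, so the flag simply stands in for that check.

```python
from pathlib import Path

# Extensions taken from the supported image formats listed above.
IMAGE_EXTENSIONS = {".png", ".tif", ".tiff", ".jpeg", ".jpg", ".gif", ".bmp"}


def is_ocr_candidate(filename: str, pdf_has_text: bool = False) -> bool:
    """Return True if the file format is one where text extraction would be attempted."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        # PDFs qualify only if they do not already contain text.
        return not pdf_has_text
    return suffix in IMAGE_EXTENSIONS


print(is_ocr_candidate("scan.TIFF"))                      # True
print(is_ocr_candidate("report.pdf", pdf_has_text=True))  # False
```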