Vault uses several internal rules to find the right documents that match your search terms and to sort them in the most relevant way. This article explains the logic we use “behind-the-scenes” to find and sort documents. If you’re not seeing the documents you expect to see, or the documents are not presented in the right order, understanding the basics of our search algorithm may help you to find documents more effectively.
How We Search
By default, Vault searches for words that begin with the entered search string. For example, searching ind returns independent, but not find or rescind. You can override this behavior by using quotes, which perform an exact-match search.
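As a rough illustration of "starts with" matching, the sketch below filters a word list by prefix. The function name and the case-insensitive comparison are assumptions made for this example, not Vault's actual implementation.

```python
def starts_with_match(search_string: str, words: list[str]) -> list[str]:
    """Return the words that begin with the search string.

    Case-insensitive comparison is an assumption for this sketch.
    """
    prefix = search_string.lower()
    return [word for word in words if word.lower().startswith(prefix)]


words = ["independent", "find", "rescind"]
print(starts_with_match("ind", words))  # ['independent']
```

Both find and rescind contain ind, but neither begins with it, so only independent is returned.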
Note: Number-type fields are not included in search.
Some fields, like Checksum, use a field type called Text (Exact-Match). When searching, Vault only matches to fields with this type if the search term is an exact match for the document’s field value, including capitalization. This field type is only used on fields that are searched very infrequently, or where only finding an exact match makes sense.
Stop words are words that are so common in a language that including them in a search returns many irrelevant results. For each language, Vault has a list of stop words. For English, these include terms like and, the, and on. When these words are included in the search terms, Vault removes them when performing the search. You can use quotes to force inclusion of these terms. See a complete list of stop words.
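The stop word behavior can be sketched as follows. The stop word set contains only the three English examples named above (the real list is longer), and the function name and quote handling are simplifying assumptions for illustration.

```python
# Only the three examples from this article; the real list is longer.
ENGLISH_STOP_WORDS = {"and", "the", "on"}


def search_terms(query: str) -> list[str]:
    """Split a query into terms, dropping stop words unless the query is quoted."""
    query = query.strip()
    # A query wrapped in double quotes keeps every word, including stop words.
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        return query[1:-1].split()
    return [t for t in query.split() if t.lower() not in ENGLISH_STOP_WORDS]


print(search_terms("report on the study"))    # ['report', 'study']
print(search_terms('"report on the study"'))  # ['report', 'on', 'the', 'study']
```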
Vault separates search terms into various segments. This process is called “tokenization.” The following table explains how Vault splits terms:
|Tokenization Rule||Original Term||Tokenized Terms|
|Strip leading and trailing punctuation||Report (FDA)||Report, FDA|
|Strip & preserve leading zeros||0008670||0008670, 8670|
|Split on punctuation (hyphen, underscore, period, apostrophe, etc.)||CholeCap-300mg/400iu||CholeCap, 300mg, 400iu|
|Split on space||109839 CC US||109839, CC, US|
|Split on number||CC356||CC, 356|
|Case change||GludactaBrochure||Gludacta, Brochure|
|Preserve strings between punctuation||GL-45RLC-JA||GL, 45RLC, JA|
When searching for document fields containing any of the above, we recommend that you:
- Search with the complete field value if known: CA-MDD-415A
- Avoid searching with only the tail end of a term. For example, 9A-SOP will not find 129A-SOP.
- Use double quotes when you are searching for a phrase: “Report FDA”
- Only use leading zeros when they are included in the original term. Leading zeros are not stripped from search terms, so 000123 will not match when the original term is 0123.
Because we use “Starts with” search, Vault only finds partial matches on a segment if you’ve included the beginning. For example, a search for DD415A would not match MDD415A.
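The tokenization rules in the table can be approximated with a few small helpers. This is a minimal sketch: the function names and regular expressions are assumptions, each helper models one rule in isolation, and how Vault actually combines the rules is not described here.

```python
import re


def split_on_separators(term: str) -> list[str]:
    """Split on spaces and punctuation, dropping leading/trailing punctuation."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", term) if t]


def leading_zero_variants(token: str) -> list[str]:
    """Keep a numeric token as-is and add a copy without its leading zeros."""
    stripped = token.lstrip("0")
    if token.isdigit() and stripped and stripped != token:
        return [token, stripped]
    return [token]


def split_on_number(token: str) -> list[str]:
    """Separate runs of letters from runs of digits."""
    return re.findall(r"[A-Za-z]+|\d+", token)


def split_on_case_change(token: str) -> list[str]:
    """Split where a lowercase letter is followed by an uppercase letter."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token).split()


print(split_on_separators("Report (FDA)"))             # ['Report', 'FDA']
print(split_on_separators("CholeCap-300mg/400iu"))     # ['CholeCap', '300mg', '400iu']
print(split_on_separators("GL-45RLC-JA"))              # ['GL', '45RLC', 'JA']
print(leading_zero_variants("0008670"))                # ['0008670', '8670']
print(split_on_number("CC356"))                        # ['CC', '356']
print(split_on_case_change("GludactaBrochure"))        # ['Gludacta', 'Brochure']

# Leading zeros in the search term are kept, so 000123 is not a prefix
# of either indexed form of 0123.
indexed = leading_zero_variants("0123")                # ['0123', '123']
print(any(tok.startswith("000123") for tok in indexed))  # False
```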
Vault allows users to enter common special characters (@, #, $, Δ, etc.) in text fields. Vault search can find matches on special characters both when they are part of an alphanumeric string (like 53.4% or #wonderdrug) and when they are used by themselves.
However, special character support is only for metadata fields. When indexing document or attachment content for full-text search, Vault treats special characters in the content as a signal to split terms. The example below shows how Vault treats the same string differently based on whether it’s found in the document content or document metadata.
|String||Found In||Indexed Strings|
|wonderdruginfo@veeva.com||Document Source File||wonderdruginfo, veeva, com|
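The difference between content and metadata indexing can be sketched as below. Both function names are hypothetical, and the metadata rule (split on whitespace only, keeping special characters) is a simplifying assumption; the article only states that special characters split terms in content and are searchable in metadata.

```python
import re


def index_content(text: str) -> list[str]:
    """Full-text content: special characters act as term separators."""
    return re.findall(r"[A-Za-z0-9]+", text)


def index_metadata(text: str) -> list[str]:
    """Metadata fields: special characters stay attached to the term (assumed)."""
    return text.split()


print(index_content("#wonderdrug"))   # ['wonderdrug']
print(index_metadata("#wonderdrug"))  # ['#wonderdrug']
```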
To search for an exact match, put double quotes around the terms. (Single quotes will not change how Vault searches.) You can also put quotes around a single search term, like a document number. This will force an exact match of words and word order. For example, a search on “reduced blood pressure” would not return documents that contained the phrase blood pressure reduced. Note that this will not prevent search term segmentation.
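The word-order requirement of a quoted search can be illustrated with a simple phrase check. This is a sketch under simplifying assumptions (whitespace splitting, case-insensitive comparison, a hypothetical function name), not Vault's matching logic.

```python
def contains_phrase(phrase: str, text: str) -> bool:
    """Return True if the words of phrase appear consecutively, in order, in text."""
    phrase_words = phrase.lower().split()
    text_words = text.lower().split()
    if not phrase_words:
        return False
    n = len(phrase_words)
    # Slide a window of n words across the text and compare it to the phrase.
    return any(text_words[i:i + n] == phrase_words
               for i in range(len(text_words) - n + 1))


print(contains_phrase("reduced blood pressure",
                      "the treatment reduced blood pressure in adults"))  # True
print(contains_phrase("reduced blood pressure",
                      "blood pressure reduced in adults"))                # False
```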
If an Admin configures search synonyms, Vault expands search results based on the Admin-created thesaurus. When you search for a term that is listed as an entry in the thesaurus, Vault also returns results that include any of that entry’s synonyms. Your Admin can also choose whether each entry is multidirectional. If an entry is multidirectional, Vault also expands searches for the synonyms to include the entry.
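Synonym expansion might be modeled like this. The `ThesaurusEntry` class, the `expand` function, and the sample entry are all hypothetical and exist only to illustrate the one-way versus multidirectional behavior described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThesaurusEntry:
    entry: str
    synonyms: tuple[str, ...]
    multidirectional: bool = False


def expand(term: str, thesaurus: list[ThesaurusEntry]) -> set[str]:
    """Return the term plus any terms the thesaurus adds to the search."""
    expanded = {term}
    lowered = term.lower()
    for item in thesaurus:
        if lowered == item.entry.lower():
            # Searching for the entry always pulls in its synonyms.
            expanded.update(item.synonyms)
        elif item.multidirectional and lowered in {s.lower() for s in item.synonyms}:
            # Searching for a synonym pulls in the entry only if multidirectional.
            expanded.add(item.entry)
    return expanded


one_way = [ThesaurusEntry("hypertension", ("HTN",))]
two_way = [ThesaurusEntry("hypertension", ("HTN",), multidirectional=True)]

print(expand("hypertension", one_way) == {"hypertension", "HTN"})  # True
print(expand("HTN", one_way) == {"HTN"})                           # True
print(expand("HTN", two_way) == {"HTN", "hypertension"})           # True
```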
When you enter multiple search terms without quotes, Vault performs searches using the “OR” operator. The “OR” operator finds matches for any document that contains at least one of the search terms. Documents matching multiple terms appear earlier in the search results. See below for details on results ranking.
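A minimal sketch of "OR" matching follows: a document is returned if it matches at least one term, and documents matching more terms come first. The function name, the sample documents, and the use of simple prefix matching on whitespace-split words are assumptions for illustration.

```python
def or_search(terms: list[str], documents: dict[str, str]) -> list[str]:
    """Return names of documents matching at least one term, most terms first."""
    results = []
    for name, text in documents.items():
        words = text.lower().split()
        matched = sum(
            1 for term in terms
            if any(word.startswith(term.lower()) for word in words)
        )
        if matched:
            results.append((matched, name))
    # Stable sort: documents with equal counts keep their original order.
    results.sort(key=lambda item: -item[0])
    return [name for _, name in results]


documents = {
    "A": "stability report",
    "B": "clinical study report",
    "C": "training record",
}
print(or_search(["clinical", "report"], documents))  # ['B', 'A']
```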
Matching Across Document Versions
Vault matches search criteria across all document versions, but only returns a document if the latest version for which you have View Document permission matches the search criteria.
When you are assigned to multiple roles on a document because you belong to multiple groups, Vault may not return the latest document version. This happens when your search criteria only matches a prior document version and that version is the latest that one of your assigned roles can access. If you have access to the latest version of a particular document, an icon will appear next to the document name to indicate that a later version is available. Click on the icon to display the latest document version available to you and any role assignments that are causing the prior version of the document to appear in the results.
Example Search & Results
The table below shows the versions that exist for each document and whether Thomas, the user in this example, has View Document permission for each version.
|Document Number||Version & Status||View Permission||Match Details|
|SOP-1||0.1 – Draft||Yes||Match|
|SOP-1||0.2 – In Review||Yes||–|
|SOP-1||1.0 – Approved||Yes||Latest for user|
|SOP-2||1.0 – Approved||Yes||–|
|SOP-2||1.1 – Draft||Yes||Latest for user & Match|
|SOP-2||1.2 – In Review||No||–|
Thomas uses Advanced Search to search on Document Type = SOP and Status = Draft. For this search, Vault returns the following results:
- SOP-1: No match
- SOP-2: Match on v1.1
In another scenario, Thomas is assigned the Editor role for SOP-1 and has the View Document permission for v0.1. He is also assigned the Viewer role for SOP-1 and has the View Document permission for v1.0. If he filters on Document Type = SOP and Status = Draft, Vault returns a match for v0.1 with an icon next to the document name indicating that a later version is available.
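The first scenario can be reproduced with a short sketch of the rule "only return a document if the latest version the user can view matches." The `Version` class and `matching_version` function are hypothetical names, and the sketch does not model the multiple-role case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    number: str
    status: str
    can_view: bool


def matching_version(versions: list[Version], status: str) -> Optional[Version]:
    """versions is ordered oldest to newest.

    Return the latest viewable version if it matches the status, else None.
    """
    viewable = [v for v in versions if v.can_view]
    if not viewable:
        return None
    latest = viewable[-1]
    return latest if latest.status == status else None


sop_1 = [
    Version("0.1", "Draft", True),
    Version("0.2", "In Review", True),
    Version("1.0", "Approved", True),
]
sop_2 = [
    Version("1.0", "Approved", True),
    Version("1.1", "Draft", True),
    Version("1.2", "In Review", False),
]

print(matching_version(sop_1, "Draft"))         # None
print(matching_version(sop_2, "Draft").number)  # 1.1
```

SOP-1 has a Draft version, but Thomas's latest viewable version is 1.0 (Approved), so it is not returned. For SOP-2, version 1.2 is not viewable, so 1.1 (Draft) is the latest version available to him and it matches.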
Search results are returned in order of relevance. This does not affect which documents are found in the search, only the order in which Vault displays them. For relevance ranking, Vault uses various criteria to determine which documents appear earlier in the search results.
- Search Term Frequency: Documents with multiple matches to a single search term appear earlier.
- Search Term Proximity: For multi-term searches, documents that contain all search terms appear first, followed by documents that contain fewer search terms. When all matching terms are close together (within the same document field, for example), the document also appears earlier.
- Exact Matches: If a document contains an exact search term match, rather than a match on part of a word, it appears earlier.
- Document Name Field: If a search term matches a word in the Document Name field, the document appears earlier.
- Classification Field: If a search term matches a word in the Classification field (part of the document type), the document appears earlier.
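To make the criteria above concrete, the sketch below ranks documents with a tuple sort key. The priority order of the criteria, the field names, and the sample documents are all assumptions chosen for illustration; Vault's actual weighting is not described in this article, and term proximity is not modeled here.

```python
def words(text: str) -> list[str]:
    return text.lower().split()


def relevance_key(terms: list[str], doc: dict[str, str]) -> tuple[int, ...]:
    """Build a sort key; larger tuples rank earlier. The order is an assumption."""
    terms = [t.lower() for t in terms]
    name_words = words(doc["name"])
    class_words = words(doc["classification"])
    all_words = name_words + class_words + words(doc["body"])

    def hits(candidates: list[str]) -> int:
        # Count words that begin with any search term.
        return sum(1 for w in candidates if any(w.startswith(t) for t in terms))

    terms_matched = sum(1 for t in terms if any(w.startswith(t) for w in all_words))
    exact_matches = sum(1 for w in all_words if w in terms)
    return (terms_matched, exact_matches, hits(name_words),
            hits(class_words), hits(all_words))


def rank(terms: list[str], docs: list[dict[str, str]]) -> list[str]:
    ordered = sorted(docs, key=lambda d: relevance_key(terms, d), reverse=True)
    return [d["name"] for d in ordered]


docs = [
    {"name": "Annual Summary", "classification": "Report",
     "body": "cholecap dosing and cholecap storage"},
    {"name": "Cholecap Brochure", "classification": "Promotional",
     "body": "cholecap overview"},
    {"name": "Training Log", "classification": "Record",
     "body": "cholecaps mentioned once"},
]
print(rank(["cholecap"], docs))
# ['Cholecap Brochure', 'Annual Summary', 'Training Log']
```

In this example the brochure ranks first because the term also appears in its name, and the training log ranks last because it only contains a partial-word match.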
By default, Vault performs searches based on your Vault’s Base Language. To use multi-language search, your Admin must enable multilingual document handling, which adds the Language standard document field to your Vault. Vault automatically populates the Language field, but you can edit it to update the document’s language at any time. The Language field must be set to the correct language for Vault’s language-specific search functionality to work properly. You can modify your preferred search languages from your user profile.
When users search, Vault respects the language of a document by incorporating language-specific elements like word separators, stop words (ignores “a” and “the” in English), and word stemming. The Language field affects Vault searches on both document content and metadata.