A trie (pronounced “try”) stands for prefix tree, a data structure used to efficiently store and retrieve keys in a set of strings. This is often found in auto-complete/spell-checker systems.

There are 3 main operations:

  • void insert(String word) - inserts a word into the trie, in O(n)
  • bool search(String word) - returns true if word is found in the trie, found in O(n)
  • bool startsWith(String prefix) - returns true if there is some word in the trie with the prefix prefix, found in O(n)

    Note: n is the length of the given string argument

What’s it useful for?

  1. A lot of “word” validation problems where you are trying to match multiple strings in a given input in one single pass rather than multiple. For example, given a list of strings, find all the celebrity names. E.g. FIND: "Swift" or "omg" IN: "omg I <3 Taylor Swift 4ever!".
  2. Prefix-search where you must find all words with a given prefix. You can just traverse to end of the prefix in the trie and find all words following it rather than scanning a large list of words and checking word.startsWith(prefix).
  3. Spell checking
  4. Word filters (checking profanity), just scan all of the text in one pass and see if it matches any word in the trie.

The structure

A trie contains just one thing: a reference to the root, which is a trienode. Every node contains a mapping of possible following characters to other tries. Additionally, a node keeps track of whether the character at that point in the tree constitutes a complete word.

class TrieNode
	HashMap<Character, Node> children
	bool isCompleteWord

Operations

All operations are fairly similar in structure. We just keep track of which node we have traversed to and perform the corresponding operation.

Insertion

For every character, create a new trienode and add a mapping of that character from the current trienode to the new trienode. If a mapping already exists, just traverse to the trienode as you otherwise would. After we finish traversing we are on the last trienode representing the final letter, this trienode should be marked as isCompleteWord.

For every character, determine if a mapping exists for that character from the current trienode. If it does, traverse. Otherwise, return false as it does not exist.

If we manage to traverse to the end, we must still verify that the trienode is marked as isCompleteWord, otherwise, return false.

Starts With

This is the exact same as search, except that the word does not need to be marked as isCompleteWord.