easy portal contact
Language Switch

Glossary

Approximate String Matching:

How does one find ‘Client’ when someone types ‘Clint’? The answer is Approximate String Matching. This is a search procedure based on the measurement of structural proximity between strings.

The goal of this search procedure is to find words that are structurally closest to the search term, even if they do not match exactly.

What is Approximate String Matching?

Approximate String Matching is a computer science procedure that corrects flawed or inaccurate input with high probability. It is often referred to as Fuzzy Search or Similarity Search.

The method finds all matches that deviate only slightly from the search term, measured in distance points. This is particularly valuable for:

  • Very long words with a high probability of typos.
  • Historical data where spellings may vary.
  • Database queries involving input errors.

How Does Approximate String Matching Work?

The central principle is the calculation of the distance between the search term and the words in the index. This distance is determined using a distance measure algorithm.

The Levenshtein Distance

The best-known algorithm is the Levenshtein Distance (also called Edit Distance). It measures how many single edit operations are necessary to transform one word into another.

Edit operations include:

  • A Substitution (changing one character)
  • An Insertion (adding one character)
  • A Deletion (removing one character)
ExampleLevenshtein DistanceExplanation
CustomerCustmer1A deletion of the letter ‘o’
BerlinBerlim1A substitution of ‘n’ for ‘m’
SearchwordSearchwordee2Two insertions of ‘e’
SearchPhonetics8High distance, words are not similar.

Advantages and Differentiation

Approximate String Matching, together with Phonetic Search, forms a robust fault tolerance layer for Full-Text Search.

  • Efficient Error Correction: It corrects errors that are neither phonetic nor exact. This makes it the most comprehensive method for resolving typos.
  • Data Cleansing: The mechanism can be used for the automatic detection and merging of duplicates in databases (e.g., “Frankfurt am Main” and “Frankfurt a. M.”).
  • Clear Differentiation from Phonetics: This String Matching is script-based. It would effectively correct errors like “Maier” “Meier”. However, it would not recognize the difference between “Meyer” and “Meier” based on sound like Phonetic Search, as their Edit Distance is zero, but the phonetic similarity is given. The procedures are complementary.

Approximate String Matching offers a fundamental form of content-based approximation, but no contextual or semantic interpretation. It only recognizes the proximity of the letters.

Outlook and Conclusion

While advanced search functions in Full-Text Search (such as wildcards or the tilde operator for Fuzzy Search) enable this at the user level, Approximate String Matching is the underlying, formal principle. Unlike rigid Full-Text Search and sound-based Phonetic Search, this search Method uses algorithmic distance measures to calculate the structural proximity of two strings.

These procedures are essential for ensuring data quality in Document Management for increasing the hit rate in complex, error-prone databases.

easyarchive

Archive data securely and compliant.

Discover easy archive

easyDMS

Mann arbeitet mit easy DMS

Manage documents easily and efficiently.

Discover easy DMS
Newsroom Media Library Glossary
Newsletter

We will keep you regularly up to date. Subscribe to our newsletter and find out everything you need to know about the digitization of business processes. The topics will be prepared for you in a tailor-made and varied way.

Newsletter subscription