package stringdistance

Provides classes for calculating distances and fuzzy match similarities between two strings. Also provides implicits for using distance and fuzzy match scores as an operator, like:

val result = "abc" levenshtein "abc"

Includes functionality for phonetic comparisons between strings.

Overview

The main class to use is com.github.vickumar1981.stringdistance.StringDistance.

If you import com.github.vickumar1981.stringdistance.StringConverter._, the string distance and score functions become available as operators between two strings.
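
The operator syntax relies on Scala's implicit classes. The following is a minimal, self-contained sketch of that mechanism, not the library's implementation: `OperatorSketch`, `PrefixOps`, and `commonPrefixLength` are hypothetical names chosen for illustration, and the metric itself (shared prefix length) is only a stand-in for a real distance or score.

```scala
// Illustrative only: PrefixOps and commonPrefixLength are hypothetical names,
// not part of the stringdistance API.
object OperatorSketch {
  // Plain two-argument function: number of leading characters the strings share.
  def commonPrefixLength(s1: String, s2: String): Int =
    s1.zip(s2).takeWhile { case (a, b) => a == b }.length

  // Implicit class: once imported, the function can be called on any String,
  // either as s1.commonPrefixLength(s2) or infix as s1 commonPrefixLength s2.
  implicit class PrefixOps(s1: String) {
    def commonPrefixLength(s2: String): Int =
      OperatorSketch.commonPrefixLength(s1, s2)
  }
}
```

After `import OperatorSketch._`, both `"martha".commonPrefixLength("marhta")` and `"martha" commonPrefixLength "marhta"` compile, which is the same shape as the `"abc" levenshtein "abc"` example at the top of this page.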

To compare two strings phonetically, i.e. to check whether they sound alike, use the com.github.vickumar1981.stringdistance.StringSound object.

To use in Java, please use the corresponding classes in the com.github.vickumar1981.stringdistance.util package.

| Class | Description |
| --- | --- |
| com.github.vickumar1981.stringdistance.StringDistance | Singleton class with fuzzy match scores and distances |
| com.github.vickumar1981.stringdistance.StringConverter | Implicit conversions between strings s1 and s2 |
| com.github.vickumar1981.stringdistance.StringSound | Phonetic comparison between strings s1 and s2 |
| com.github.vickumar1981.stringdistance.util.StringDistance | Java class for fuzzy match scores and distances |
| com.github.vickumar1981.stringdistance.util.StringSound | Java class for phonetic comparison between strings s1 and s2 |

Linear Supertypes

SoundDefinitions, ScoreDefinitions, DistanceDefinitions, AnyRef, Any

Package Members

  1. package impl
  2. package implicits
  3. package util

Type Members

  1. trait CosineAlgorithm extends StringMetricAlgorithm

    A marker interface for the cosine similarity algorithm.

  2. trait DamerauLevenshteinAlgorithm extends StringMetricAlgorithm

    A marker interface for the damerau levenshtein distance algorithm.

  3. trait DiceCoefficientAlgorithm extends StringMetricAlgorithm

    A marker interface for the dice coefficient algorithm.

  4. trait DistanceAlgorithm[+T <: StringMetricAlgorithm] extends AnyRef

    A type class to extend a distance method to StringMetricAlgorithm.

  5. trait HammingAlgorithm extends StringMetricAlgorithm

    A marker interface for the hamming distance algorithm.

  6. trait JaccardAlgorithm extends StringMetricAlgorithm

    A marker interface for the jaccard similarity algorithm.

  7. trait JaroAlgorithm extends StringMetricAlgorithm

    A marker interface for the jaro similarity algorithm.

  8. trait JaroWinklerAlgorithm extends StringMetricAlgorithm

    A marker interface for the jaro winkler algorithm.

  9. trait LevenshteinAlgorithm extends StringMetricAlgorithm

    A marker interface for the levenshtein distance algorithm.

  10. trait LongestCommonSeqAlorithm extends StringMetricAlgorithm

    A marker interface for the longest common subsequence algorithm.

  11. trait MetaphoneAlgorithm extends StringMetricAlgorithm

    A marker interface for the metaphone algorithm.

  12. class MetaphoneImplWrapper extends MetaphoneImpl

    Java Wrapper for metaphone similarity.

  13. trait NGramAlgorithm extends StringMetricAlgorithm

    A marker interface for the n-gram similarity algorithm.

  14. trait NeedlemanWunschAlgorithm extends StringMetricAlgorithm

    A marker interface for the needleman wunsch similarity algorithm.

  15. trait OverlapAlgorithm extends StringMetricAlgorithm

    A marker interface for the overlap similarity algorithm.

  16. trait ScorableFromDistance[+T <: StringMetricAlgorithm] extends ScoringAlgorithm[T]

    A mix-in trait to extend a score method using the distance method to StringMetricAlgorithm.

  17. trait ScoringAlgorithm[+T <: StringMetricAlgorithm] extends AnyRef

    A type class to extend a score method to StringMetricAlgorithm.

  18. trait SmithWatermanAlgorithm extends StringMetricAlgorithm

    A marker interface for the smith waterman similarity algorithm.

  19. trait SmithWatermanGotohAlgorithm extends StringMetricAlgorithm

    A marker interface for the smith waterman gotoh similarity algorithm.

  20. trait SoundScoringAlgorithm[+T <: StringMetricAlgorithm] extends AnyRef

    A type class to extend a sound score method to StringMetricAlgorithm.

  21. trait SoundexAlgorithm extends StringMetricAlgorithm

    A marker interface for the soundex similarity algorithm.

  22. class SoundexImplWrapper extends SoundexImpl

    Java Wrapper for soundex similarity.

  23. trait StringMetric[A <: StringMetricAlgorithm] extends AnyRef

    Defines the implementation for a StringMetricAlgorithm by adding implicit definitions from DistanceAlgorithm, ScoringAlgorithm, WeightedDistanceAlgorithm, or WeightedScoringAlgorithm.

  24. trait StringMetricAlgorithm extends AnyRef

    A marker interface for the string metric algorithm.

  25. trait StringSoundMetric[A <: StringMetricAlgorithm] extends AnyRef
  26. trait TverskyAlgorithm extends StringMetricAlgorithm

    A marker interface for the tversky similarity algorithm.

  27. trait WeightedDistanceAlgorithm[+A <: StringMetricAlgorithm, B] extends AnyRef

    A type class to extend a distance method with a 2nd typed parameter to StringMetricAlgorithm.

  28. trait WeightedScoringAlgorithm[+A <: StringMetricAlgorithm, B] extends AnyRef

    A type class to extend a score method with a 2nd typed parameter to StringMetricAlgorithm.

  29. trait WeightedStringMetric[A <: StringMetricAlgorithm, B] extends AnyRef
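
The traits above combine marker interfaces with type classes. As a rough sketch of how such a design can fit together, the block below re-declares simplified versions of `StringMetricAlgorithm`, `DistanceAlgorithm`, `ScoringAlgorithm`, and `ScorableFromDistance` inside a standalone object. The method signatures, the `Metric` facade, and the normalization by the longer string's length are assumptions made for this example; the library's actual definitions may differ.

```scala
// A standalone sketch; signatures and the Metric facade are assumptions,
// not the library's actual definitions.
object TypeClassSketch {
  // Marker traits: no behavior, they only identify an algorithm at the type level.
  trait StringMetricAlgorithm
  trait HammingAlgorithm extends StringMetricAlgorithm

  // Type classes parameterized by the marker type.
  trait DistanceAlgorithm[+T <: StringMetricAlgorithm] {
    def distance(s1: String, s2: String): Int
  }
  trait ScoringAlgorithm[+T <: StringMetricAlgorithm] {
    def score(s1: String, s2: String): Double
  }

  // Mix-in that derives a score in [0, 1] from a distance
  // (assumed normalization: 1 - distance / longer length).
  trait ScorableFromDistance[+T <: StringMetricAlgorithm] extends ScoringAlgorithm[T] {
    def distance(s1: String, s2: String): Int
    def score(s1: String, s2: String): Double = {
      val maxLen = math.max(s1.length, s2.length)
      if (maxLen == 0) 1.0 else 1.0 - distance(s1, s2).toDouble / maxLen
    }
  }

  // Facade that looks up the implementation for marker type A implicitly.
  class Metric[A <: StringMetricAlgorithm] {
    def distance(s1: String, s2: String)(implicit ev: DistanceAlgorithm[A]): Int =
      ev.distance(s1, s2)
    def score(s1: String, s2: String)(implicit ev: ScoringAlgorithm[A]): Double =
      ev.score(s1, s2)
  }

  // Implicit instance: one object serves as both the distance and the score.
  implicit object HammingDistance
      extends DistanceAlgorithm[HammingAlgorithm]
      with ScorableFromDistance[HammingAlgorithm] {
    def distance(s1: String, s2: String): Int = {
      require(s1.length == s2.length, "Hamming distance needs equal-length strings")
      s1.zip(s2).count { case (a, b) => a != b }
    }
  }

  object Hamming extends Metric[HammingAlgorithm]
}
```

With `import TypeClassSketch._` in scope, `Hamming.distance("martha", "marhta")` resolves `HammingDistance` through the `HammingAlgorithm` marker type. Adding another algorithm in this sketch means adding a marker trait and an implicit object, without touching `Metric`.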

Value Members

  1. implicit def gapToGapAndWindow(g: Gap): (Gap, Int)
  2. object ArrayDistance

    Main class to work with generic arrays, Array[T], analogous to StringDistance

    import com.github.vickumar1981.stringdistance.ArrayDistance._
    
    // Example Levenshtein Distance and Score
    val levenshteinDist = Levenshtein.distance(Array("m", "a", "r", "t", "h", "a"), Array("m", "a", "r", "h", "t", "a")) // 2
    val levenshtein = Levenshtein.score(Array("m", "a", "r", "t", "h", "a"), Array("m", "a", "r", "h", "t", "a")) // 0.667
  3. object StringConverter

    Object to extend operations to the String class.

    import com.github.vickumar1981.stringdistance.StringConverter._
    
    // Scores between two strings
    val cosSimilarity: Double = "hello".cosine("chello")
    val damerau: Double = "martha".damerau("marhta")
    val diceCoefficient: Double = "martha".diceCoefficient("marhta")
    val hamming: Double = "martha".hamming("marhta")
    val jaccard: Double = "karolin".jaccard("kathrin")
    val jaro: Double = "martha".jaro("marhta")
    val jaroWinkler: Double = "martha".jaroWinkler("marhta")
    val levenshtein: Double = "martha".levenshtein("marhta")
    val needlemanWunsch: Double = "martha".needlemanWunsch("marhta")
    val ngramSimilarity: Double = "karolin".nGram("kathrin")
    val bigramSimilarity: Double = "karolin".nGram("kathrin", 2)
    val overlap: Double = "karolin".overlap("kathrin")
    val smithWaterman: Double = "martha".smithWaterman("marhta")
    val smithWatermanGotoh: Double = "martha".smithWatermanGotoh("marhta")
    val tversky: Double = "karolin".tversky("kathrin", 0.5)
    
    // return a List[String] of ngram tokens
    val tokens = "martha".tokens(2) // List("ma", "ar", "rt", "th", "ha")
    
    // Distances between two strings
    val damerauDist: Int = "martha".damerauDist("marhta")
    val hammingDist: Int = "martha".hammingDist("marhta")
    val levenshteinDist: Int = "martha".levenshteinDist("marhta")
    val longestCommonSeq: Int = "martha".longestCommonSeq("marhta")
    val ngramDist: Int = "karolin".nGramDist("kathrin")
    val bigramDist: Int = "karolin".nGramDist("kathrin", 2)
    
    // Phonetic similarity of two strings
    val metaphone: Boolean = "merci".metaphone("mercy")
    val soundex: Boolean = "merci".soundex("mercy")
  4. object StringDistance

    Main class to organize functionality of different string distance algorithms

    import com.github.vickumar1981.stringdistance.StringDistance._
    import com.github.vickumar1981.stringdistance.impl.{ConstantGap, LinearGap}
    
    // Scores between strings
    val cosSimilarity: Double = Cosine.score("hello", "chello")
    val damerau: Double = Damerau.score("martha", "marhta")
    val diceCoefficient: Double = DiceCoefficient.score("martha", "marhta")
    val hamming: Double = Hamming.score("martha", "marhta")
    val jaccard: Double = Jaccard.score("karolin", "kathrin", 1)
    val jaro: Double = Jaro.score("martha", "marhta")
    val jaroWinkler: Double = JaroWinkler.score("martha", "marhta", 0.1)
    val levenshtein: Double = Levenshtein.score("martha", "marhta")
    val needlemanWunsch: Double = NeedlemanWunsch.score("martha", "marhta", ConstantGap())
    val ngramSimilarity: Double = NGram.score("karolin", "kathrin", 1)
    val bigramSimilarity: Double = NGram.score("karolin", "kathrin", 2)
    val overlap: Double = Overlap.score("karolin", "kathrin", 1)
    val smithWaterman: Double = SmithWaterman.score("martha", "marhta", (LinearGap(gapValue = -1), Integer.MAX_VALUE))
    val smithWatermanGotoh: Double = SmithWatermanGotoh.score("martha", "marhta", ConstantGap())
    val tversky: Double = Tversky.score("karolin", "kathrin", 0.5)
    
    // Distances between strings
    val damerauDist: Int = Damerau.distance("martha", "marhta")
    val hammingDist: Int = Hamming.distance("martha", "marhta")
    val levenshteinDist: Int = Levenshtein.distance("martha", "marhta")
    val longestCommonSubSeq: Int = LongestCommonSeq.distance("martha", "marhta")
    val ngramDist: Int = NGram.distance("karolin", "kathrin", 1)
    val bigramDist: Int = NGram.distance("karolin", "kathrin", 2)
    
    // return a List[String] of ngram tokens
    val tokens = NGram.tokens("martha", 2) // List("ma", "ar", "rt", "th", "ha")
  5. object StringSound

    Main class to organize functionality of different phonetic/sound string algorithms

    import com.github.vickumar1981.stringdistance.StringSound._
    import com.github.vickumar1981.stringdistance.implicits._
    
    // Phonetic similarity between strings
    val metaphone: Boolean = Metaphone.score("merci", "mercy")
    val soundex: Boolean = Soundex.score("merci", "mercy")
  6. implicit object DamerauLevenshteinDistance extends LevenshteinDistanceImpl with DistanceAlgorithm[DamerauLevenshteinAlgorithm] with ScorableFromDistance[DamerauLevenshteinAlgorithm]

    Implicit definition of damerau levenshtein distance for DamerauLevenshteinAlgorithm.

    Definition Classes
    DistanceDefinitions
  7. implicit object HammingDistance extends HammingImpl with DistanceAlgorithm[HammingAlgorithm] with ScorableFromDistance[HammingAlgorithm]

    Implicit definition of hamming distance for HammingAlgorithm.

    Definition Classes
    DistanceDefinitions
  8. implicit object LevenshteinDistance extends LevenshteinDistanceImpl with DistanceAlgorithm[LevenshteinAlgorithm] with ScorableFromDistance[LevenshteinAlgorithm]

    Implicit definition of levenshtein distance for LevenshteinAlgorithm.

    Definition Classes
    DistanceDefinitions
  9. implicit object LongestCommonSeqDistance extends LongestCommonSeqImpl with DistanceAlgorithm[LongestCommonSeqAlorithm]

    Implicit definition of longest common subsequence for LongestCommonSeqAlorithm.

    Definition Classes
    DistanceDefinitions
  10. implicit object NGramDistance extends NGramImpl with WeightedDistanceAlgorithm[NGramAlgorithm, Int]

    Implicit definition of n-gram distance for NGramAlgorithm.

    Definition Classes
    DistanceDefinitions
  11. implicit object CosSimilarityScore extends CosSimilarityImpl with ScoringAlgorithm[CosineAlgorithm]

    Implicit definition of cosine similarity score for CosineAlgorithm.

    Definition Classes
    ScoreDefinitions
  12. implicit object DiceCoefficientScore extends DiceCoefficientImpl with ScoringAlgorithm[DiceCoefficientAlgorithm]

    Implicit definition of dice coefficient score for DiceCoefficientAlgorithm.

    Definition Classes
    ScoreDefinitions
  13. implicit object JaccardScore extends JaccardImpl with WeightedScoringAlgorithm[JaccardAlgorithm, Int]

    Implicit definition of jaccard score for JaccardAlgorithm.

    Definition Classes
    ScoreDefinitions
  14. implicit object JaroScore extends JaroImpl with ScoringAlgorithm[JaroAlgorithm]

    Implicit definition of jaro score for JaroAlgorithm.

    Definition Classes
    ScoreDefinitions
  15. implicit object JaroWinklerScore extends JaroImpl with WeightedScoringAlgorithm[JaroWinklerAlgorithm, Double]

    Implicit definition of jaro winkler score for JaroWinklerAlgorithm.

    Definition Classes
    ScoreDefinitions
  16. implicit object NGramScore extends NGramImpl with WeightedScoringAlgorithm[NGramAlgorithm, Int]

    Implicit definition of n-gram score for NGramAlgorithm.

    Definition Classes
    ScoreDefinitions
  17. implicit object NeedlemanWunschScore extends NeedlemanWunschImpl with WeightedScoringAlgorithm[NeedlemanWunschAlgorithm, ConstantGap]

    Implicit definition of needleman wunsch score for NeedlemanWunschAlgorithm.

    Definition Classes
    ScoreDefinitions
  18. implicit object OverlapScore extends OverlapImpl with WeightedScoringAlgorithm[OverlapAlgorithm, Int]

    Implicit definition of overlap score for OverlapAlgorithm.

    Definition Classes
    ScoreDefinitions
  19. implicit object SmithWatermanGotohScore extends SmithWatermanImpl with WeightedScoringAlgorithm[SmithWatermanGotohAlgorithm, ConstantGap]

    Implicit definition of smith waterman gotoh score for SmithWatermanGotohAlgorithm.

    Definition Classes
    ScoreDefinitions
  20. implicit object SmithWatermanScore extends SmithWatermanImpl with WeightedScoringAlgorithm[SmithWatermanAlgorithm, (Gap, Int)]

    Implicit definition of smith waterman score for SmithWatermanAlgorithm.

    Definition Classes
    ScoreDefinitions
  21. implicit object TverskyScore extends JaccardImpl with WeightedScoringAlgorithm[TverskyAlgorithm, Double]

    Implicit definition of tversky score for TverskyAlgorithm.

    Definition Classes
    ScoreDefinitions
  22. implicit object MetaphoneScore extends MetaphoneImpl with SoundScoringAlgorithm[MetaphoneAlgorithm]

    Implicit definition of metaphone score for MetaphoneAlgorithm.

    Definition Classes
    SoundDefinitions
  23. implicit object SoundexScore extends SoundexImpl with SoundScoringAlgorithm[SoundexAlgorithm]

    Implicit definition of soundex score for SoundexAlgorithm.

    Definition Classes
    SoundDefinitions
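
The values quoted in the comments of the ArrayDistance and tokens examples above (a Levenshtein distance of 2, a score of 0.667, and the bigram list for "martha") can be reproduced with a short standalone sketch. It does not call the library: `ValueSketch` and its methods are hypothetical, the score formula `1 - distance / longer length` is an assumption that happens to match the 0.667 in the example, and the handling of strings shorter than `n` is a choice made here, not documented library behavior.

```scala
// Standalone sketch; ValueSketch and its methods are hypothetical and only
// mirror the values shown in the examples above.
object ValueSketch {
  // Levenshtein distance over arbitrary element sequences,
  // using two rows of the dynamic-programming table.
  def levenshtein[T](a: Seq[T], b: Seq[T]): Int = {
    val as = a.toVector
    val bs = b.toVector
    // previous(j) = distance between the prefix of as processed so far and bs.take(j)
    var previous = Array.range(0, bs.length + 1)
    for (i <- 1 to as.length) {
      val current = new Array[Int](bs.length + 1)
      current(0) = i
      for (j <- 1 to bs.length) {
        val cost = if (as(i - 1) == bs(j - 1)) 0 else 1
        current(j) = math.min(
          math.min(current(j - 1) + 1, previous(j) + 1), // insertion, deletion
          previous(j - 1) + cost                          // substitution or match
        )
      }
      previous = current
    }
    previous(bs.length)
  }

  // Assumed normalization: 1 - distance / length of the longer sequence.
  def score[T](a: Seq[T], b: Seq[T]): Double = {
    val maxLen = math.max(a.length, b.length)
    if (maxLen == 0) 1.0 else 1.0 - levenshtein(a, b).toDouble / maxLen
  }

  // Character n-grams via a sliding window; returns Nil when the string
  // is shorter than n (a choice made for this sketch).
  def tokens(s: String, n: Int): List[String] =
    if (n <= 0 || s.length < n) Nil else s.sliding(n).toList
}
```

For the sequences `m, a, r, t, h, a` and `m, a, r, h, t, a`, the swapped `t` and `h` cost two substitutions, so the distance is 2 and the assumed score is 1 - 2/6, roughly 0.667.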
