package stringdistance
Provides classes for calculating distances and fuzzy match similarities between two strings. Also provides implicits for using distance and fuzzy match scores as an operator, like:
val result = "abc" levenshtein "abc"
Includes functionality for phonetic comparisons between strings.
Overview
The main class to use is com.github.vickumar1981.stringdistance.StringDistance
If you import com.github.vickumar1981.stringdistance.StringConverter._, you can use the string distance and score functions as operators between two strings.
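Operator syntax like this is typically built on Scala implicit classes. The sketch below shows the general mechanism only. It is not StringConverter's source, and `OperatorSyntaxSketch`, `StringMetricOps` and the simplified `hammingDist` and `tokens` bodies are hypothetical names written for illustration.

```scala
object OperatorSyntaxSketch {
  // An implicit class adds methods to String. Single-argument methods can also
  // be written infix, e.g. "martha" hammingDist "marhta".
  implicit class StringMetricOps(private val s1: String) extends AnyVal {
    // Simplified Hamming distance: count of positions whose characters differ.
    // Only defined for strings of equal length.
    def hammingDist(s2: String): Int = {
      require(s1.length == s2.length, "Hamming distance needs equal lengths")
      s1.indices.count(i => s1(i) != s2(i))
    }

    // Simplified n-gram tokenizer: every substring of length n, in order.
    def tokens(n: Int): List[String] = s1.sliding(n).toList
  }
}
```

After `import OperatorSyntaxSketch._`, both `"martha".hammingDist("marhta")` and `"martha" hammingDist "marhta"` evaluate to 2.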
To compare two strings phonetically, i.e. to check whether they sound alike, use the com.github.vickumar1981.stringdistance.StringSound class.
To use the library from Java, use the corresponding classes in the com.github.vickumar1981.stringdistance.util package:
- Java String Distance Class: com.github.vickumar1981.stringdistance.util.StringDistance
- Java String Sound Class: com.github.vickumar1981.stringdistance.util.StringSound
Class | Description
---|---
StringDistance | Singleton class with fuzzy match scores and distances
StringConverter | Implicit conversions between strings s1 and s2
StringSound | Phonetic comparison between strings s1 and s2
util.StringDistance | Java class for fuzzy match scores and distances
util.StringSound | Java class for phonetic comparison between strings s1 and s2
Type Members
- trait CosineAlgorithm extends StringMetricAlgorithm
A marker interface for the cosine similarity algorithm.
- trait DamerauLevenshteinAlgorithm extends StringMetricAlgorithm
A marker interface for the Damerau-Levenshtein distance algorithm.
- trait DiceCoefficientAlgorithm extends StringMetricAlgorithm
A marker interface for the Dice coefficient algorithm.
- trait DistanceAlgorithm[+T <: StringMetricAlgorithm] extends AnyRef
A type class that provides a distance method for a StringMetricAlgorithm.
- trait HammingAlgorithm extends StringMetricAlgorithm
A marker interface for the Hamming distance algorithm.
- trait JaccardAlgorithm extends StringMetricAlgorithm
A marker interface for the Jaccard similarity algorithm.
- trait JaroAlgorithm extends StringMetricAlgorithm
A marker interface for the Jaro similarity algorithm.
- trait JaroWinklerAlgorithm extends StringMetricAlgorithm
A marker interface for the Jaro-Winkler algorithm.
- trait LevenshteinAlgorithm extends StringMetricAlgorithm
A marker interface for the Levenshtein distance algorithm.
- trait LongestCommonSeqAlorithm extends StringMetricAlgorithm
A marker interface for the longest common subsequence algorithm.
- trait MetaphoneAlgorithm extends StringMetricAlgorithm
A marker interface for the Metaphone algorithm.
- class MetaphoneImplWrapper extends MetaphoneImpl
Java wrapper for Metaphone similarity.
- trait NGramAlgorithm extends StringMetricAlgorithm
A marker interface for the n-gram similarity algorithm.
- trait NeedlemanWunschAlgorithm extends StringMetricAlgorithm
A marker interface for the Needleman-Wunsch similarity algorithm.
- trait OverlapAlgorithm extends StringMetricAlgorithm
A marker interface for the overlap similarity algorithm.
- trait ScorableFromDistance[+T <: StringMetricAlgorithm] extends ScoringAlgorithm[T]
A mix-in trait that derives a score method from the distance method of a StringMetricAlgorithm.
- trait ScoringAlgorithm[+T <: StringMetricAlgorithm] extends AnyRef
A type class that provides a score method for a StringMetricAlgorithm.
- trait SmithWatermanAlgorithm extends StringMetricAlgorithm
A marker interface for the Smith-Waterman similarity algorithm.
- trait SmithWatermanGotohAlgorithm extends StringMetricAlgorithm
A marker interface for the Smith-Waterman-Gotoh similarity algorithm.
- trait SoundScoringAlgorithm[+T <: StringMetricAlgorithm] extends AnyRef
A type class that provides a sound (phonetic) score method for a StringMetricAlgorithm.
- trait SoundexAlgorithm extends StringMetricAlgorithm
A marker interface for the Soundex similarity algorithm.
- class SoundexImplWrapper extends SoundexImpl
Java wrapper for Soundex similarity.
- trait StringMetric[A <: StringMetricAlgorithm] extends AnyRef
Defines the implementation for a StringMetricAlgorithm by adding implicit definitions from DistanceAlgorithm, ScoringAlgorithm, WeightedDistanceAlgorithm, or WeightedScoringAlgorithm.
- trait StringMetricAlgorithm extends AnyRef
A marker interface for the string metric algorithm.
- trait StringSoundMetric[A <: StringMetricAlgorithm] extends AnyRef
- trait TverskyAlgorithm extends StringMetricAlgorithm
A marker interface for the Tversky similarity algorithm.
- trait WeightedDistanceAlgorithm[+A <: StringMetricAlgorithm, B] extends AnyRef
A type class that provides a distance method taking an additional parameter of type B (for example, an Int n-gram size) for a StringMetricAlgorithm.
- trait WeightedScoringAlgorithm[+A <: StringMetricAlgorithm, B] extends AnyRef
A type class that provides a score method taking an additional parameter of type B (for example, an Int n-gram size or a ConstantGap) for a StringMetricAlgorithm.
- trait WeightedStringMetric[A <: StringMetricAlgorithm, B] extends AnyRef
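The type members above follow a marker-trait plus type-class design: each algorithm is named by an empty trait, and DistanceAlgorithm, ScoringAlgorithm and ScorableFromDistance instances are found implicitly by that type. The following is a minimal, self-contained sketch of the pattern, not the library's code. The object `TypeClassSketch`, the simplified method signatures, and the score formula `1 - distance / maxLength` in ScorableFromDistance are illustrative assumptions; a Hamming instance is used only because it is short.

```scala
object TypeClassSketch {
  // Marker interfaces: no behaviour, they only name an algorithm at the type level.
  trait StringMetricAlgorithm
  trait HammingAlgorithm extends StringMetricAlgorithm

  // Type class: "algorithm T has a distance method".
  trait DistanceAlgorithm[+T <: StringMetricAlgorithm] {
    def distance(s1: String, s2: String): Int
  }

  // Type class: "algorithm T has a score method".
  trait ScoringAlgorithm[+T <: StringMetricAlgorithm] {
    def score(s1: String, s2: String): Double
  }

  // Mix-in that derives a score in [0, 1] from the distance.
  // The formula 1 - distance / maxLength is an assumption for illustration.
  trait ScorableFromDistance[+T <: StringMetricAlgorithm] extends ScoringAlgorithm[T] {
    def distance(s1: String, s2: String): Int
    def score(s1: String, s2: String): Double = {
      val maxLen = math.max(s1.length, s2.length)
      if (maxLen == 0) 1.0 else 1.0 - distance(s1, s2).toDouble / maxLen
    }
  }

  // Implicit instance: Hamming distance (positions with differing characters).
  implicit object HammingDistance
      extends DistanceAlgorithm[HammingAlgorithm]
      with ScorableFromDistance[HammingAlgorithm] {
    def distance(s1: String, s2: String): Int = {
      require(s1.length == s2.length, "Hamming distance needs equal lengths")
      s1.indices.count(i => s1(i) != s2(i))
    }
  }

  // Front end: the algorithm type A selects the implicit instance at compile time.
  trait StringMetric[A <: StringMetricAlgorithm] {
    def distance(s1: String, s2: String)(implicit algo: DistanceAlgorithm[A]): Int =
      algo.distance(s1, s2)
    def score(s1: String, s2: String)(implicit algo: ScoringAlgorithm[A]): Double =
      algo.score(s1, s2)
  }

  object Hamming extends StringMetric[HammingAlgorithm]
}
```

With `import TypeClassSketch._` in scope, `Hamming.distance("martha", "marhta")` resolves the implicit `HammingDistance` instance and returns 2, and `Hamming.score` reuses the same instance through ScorableFromDistance. In this design a new algorithm needs only a new marker trait and an implicit instance; StringMetric itself does not change.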
Value Members
- implicit def gapToGapAndWindow(g: Gap): (Gap, Int)
- object ArrayDistance
Main class to work with generic arrays, Array[T], analogous to StringDistance.
import com.github.vickumar1981.stringdistance.ArrayDistance._
// Example Levenshtein distance and score
val levenshteinDist = Levenshtein.distance(Array("m", "a", "r", "t", "h", "a"), Array("m", "a", "r", "h", "t", "a")) // 2
val levenshtein = Levenshtein.score(Array("m", "a", "r", "t", "h", "a"), Array("m", "a", "r", "h", "t", "a")) // 0.667
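ArrayDistance applies the algorithms to arrays of any element type. As a hedged illustration (not ArrayDistance's implementation), here is the textbook dynamic-programming Levenshtein distance written generically over `Array[T]`, plus a score that assumes the normalisation `1 - distance / maxLength`; under that assumption it gives 1 - 2/6 ≈ 0.667 for the arrays in the example above. `GenericLevenshteinSketch` is a hypothetical name.

```scala
object GenericLevenshteinSketch {
  // Two-row dynamic programme: at the start of iteration i, prev(j) holds the
  // distance between the first i-1 elements of a and the first j elements of b.
  def distance[T](a: Array[T], b: Array[T]): Int = {
    var prev = Array.tabulate(b.length + 1)(j => j)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        // Deletion, insertion, or substitution (free when elements are equal).
        curr(j) = math.min(math.min(prev(j) + 1, curr(j - 1) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(b.length)
  }

  // Normalised similarity in [0, 1]; 1.0 means identical arrays.
  def score[T](a: Array[T], b: Array[T]): Double = {
    val maxLen = math.max(a.length, b.length)
    if (maxLen == 0) 1.0 else 1.0 - distance(a, b).toDouble / maxLen
  }
}
```

Element comparison uses `==`, so the same code works for `Array[String]`, `Array[Char]`, `Array[Int]` and so on.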
- object StringConverter
Object to extend operations to the String class.
import com.github.vickumar1981.stringdistance.StringConverter._
// Scores between two strings
val cosSimilarity: Double = "hello".cosine("chello")
val damerau: Double = "martha".damerau("marhta")
val diceCoefficient: Double = "martha".diceCoefficient("marhta")
val hamming: Double = "martha".hamming("marhta")
val jaccard: Double = "karolin".jaccard("kathrin")
val jaro: Double = "martha".jaro("marhta")
val jaroWinkler: Double = "martha".jaroWinkler("marhta")
val levenshtein: Double = "martha".levenshtein("marhta")
val needlemanWunsch: Double = "martha".needlemanWunsch("marhta")
val ngramSimilarity: Double = "karolin".nGram("kathrin")
val bigramSimilarity: Double = "karolin".nGram("kathrin", 2)
val overlap: Double = "karolin".overlap("kathrin")
val smithWaterman: Double = "martha".smithWaterman("marhta")
val smithWatermanGotoh: Double = "martha".smithWatermanGotoh("marhta")
val tversky: Double = "karolin".tversky("kathrin", 0.5)
// Return a List[String] of n-gram tokens
val tokens = "martha".tokens(2) // List("ma", "ar", "rt", "th", "ha")
// Distances between two strings
val damerauDist: Int = "martha".damerauDist("marhta")
val hammingDist: Int = "martha".hammingDist("marhta")
val levenshteinDist: Int = "martha".levenshteinDist("marhta")
val longestCommonSeq: Int = "martha".longestCommonSeq("marhta")
val ngramDist: Int = "karolin".nGramDist("kathrin")
val bigramDist: Int = "karolin".nGramDist("kathrin", 2)
// Phonetic similarity of two strings
val metaphone: Boolean = "merci".metaphone("mercy")
val soundex: Boolean = "merci".soundex("mercy")
- object StringDistance
Main class to organize functionality of different string distance algorithms
import com.github.vickumar1981.stringdistance.StringDistance._
import com.github.vickumar1981.stringdistance.impl.{ConstantGap, LinearGap}
// Scores between strings
val cosSimilarity: Double = Cosine.score("hello", "chello")
val damerau: Double = Damerau.score("martha", "marhta")
val diceCoefficient: Double = DiceCoefficient.score("martha", "marhta")
val hamming: Double = Hamming.score("martha", "marhta")
val jaccard: Double = Jaccard.score("karolin", "kathrin", 1)
val jaro: Double = Jaro.score("martha", "marhta")
val jaroWinkler: Double = JaroWinkler.score("martha", "marhta", 0.1)
val levenshtein: Double = Levenshtein.score("martha", "marhta")
val needlemanWunsch: Double = NeedlemanWunsch.score("martha", "marhta", ConstantGap())
val ngramSimilarity: Double = NGram.score("karolin", "kathrin", 1)
val bigramSimilarity: Double = NGram.score("karolin", "kathrin", 2)
val overlap: Double = Overlap.score("karolin", "kathrin", 1)
val smithWaterman: Double = SmithWaterman.score("martha", "marhta", (LinearGap(gapValue = -1), Integer.MAX_VALUE))
val smithWatermanGotoh: Double = SmithWatermanGotoh.score("martha", "marhta", ConstantGap())
val tversky: Double = Tversky.score("karolin", "kathrin", 0.5)
// Distances between strings
val damerauDist: Int = Damerau.distance("martha", "marhta")
val hammingDist: Int = Hamming.distance("martha", "marhta")
val levenshteinDist: Int = Levenshtein.distance("martha", "marhta")
val longestCommonSubSeq: Int = LongestCommonSeq.distance("martha", "marhta")
val ngramDist: Int = NGram.distance("karolin", "kathrin", 1)
val bigramDist: Int = NGram.distance("karolin", "kathrin", 2)
// Return a List[String] of n-gram tokens
val tokens = NGram.tokens("martha", 2) // List("ma", "ar", "rt", "th", "ha")
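Several scores in the example (Jaccard, NGram, Overlap) take an integer third argument, which appears to be the n-gram size. The sketch below shows one common way to compute a Jaccard similarity over n-gram tokens: the size of the intersection divided by the size of the union of the two token sets. Treating tokens as sets rather than multisets is an assumption, the library's normalisation may differ, and `JaccardSketch` is a hypothetical name.

```scala
object JaccardSketch {
  // All substrings of length n, in order.
  def tokens(s: String, n: Int): List[String] = s.sliding(n).toList

  // size(A intersect B) / size(A union B) over the distinct n-gram tokens.
  def score(s1: String, s2: String, n: Int): Double = {
    val a = tokens(s1, n).toSet
    val b = tokens(s2, n).toSet
    val union = a union b
    if (union.isEmpty) 1.0 else (a intersect b).size.toDouble / union.size
  }
}
```

For "karolin" and "kathrin" this sketch gives 5/9 with n = 1 (shared letters k, a, r, i, n out of nine distinct letters) and 0.2 with n = 2 (shared bigrams "ka" and "in" out of ten distinct bigrams).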
- object StringSound
Main class to organize functionality of different phonetic/sound string algorithms
import com.github.vickumar1981.stringdistance.StringSound._
import com.github.vickumar1981.stringdistance.implicits._
// Phonetic similarity between strings
val metaphone: Boolean = Metaphone.score("merci", "mercy")
val soundex: Boolean = Soundex.score("merci", "mercy")
- implicit object DamerauLevenshteinDistance extends LevenshteinDistanceImpl with DistanceAlgorithm[DamerauLevenshteinAlgorithm] with ScorableFromDistance[DamerauLevenshteinAlgorithm]
Implicit definition of Damerau-Levenshtein distance for DamerauLevenshteinAlgorithm.
- Definition Classes
- DistanceDefinitions
- implicit object HammingDistance extends HammingImpl with DistanceAlgorithm[HammingAlgorithm] with ScorableFromDistance[HammingAlgorithm]
Implicit definition of Hamming distance for HammingAlgorithm.
- Definition Classes
- DistanceDefinitions
- implicit object LevenshteinDistance extends LevenshteinDistanceImpl with DistanceAlgorithm[LevenshteinAlgorithm] with ScorableFromDistance[LevenshteinAlgorithm]
Implicit definition of Levenshtein distance for LevenshteinAlgorithm.
- Definition Classes
- DistanceDefinitions
- implicit object LongestCommonSeqDistance extends LongestCommonSeqImpl with DistanceAlgorithm[LongestCommonSeqAlorithm]
Implicit definition of longest common subsequence distance for LongestCommonSeqAlorithm.
- Definition Classes
- DistanceDefinitions
- implicit object NGramDistance extends NGramImpl with WeightedDistanceAlgorithm[NGramAlgorithm, Int]
Implicit definition of n-gram distance for NGramAlgorithm.
- Definition Classes
- DistanceDefinitions
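Unlike plain Levenshtein, a Damerau-Levenshtein distance counts swapping two adjacent characters as a single edit, so "martha" versus "marhta" costs 1 instead of 2. The sketch below implements the restricted "optimal string alignment" variant; whether the library's DamerauLevenshteinDistance uses this variant or the unrestricted one is not stated here, so treat it as an illustration only. `OsaDistanceSketch` is a hypothetical name.

```scala
object OsaDistanceSketch {
  // Optimal string alignment: Levenshtein plus adjacent transpositions,
  // with the restriction that no substring is edited more than once.
  def distance(s1: String, s2: String): Int = {
    val d = Array.ofDim[Int](s1.length + 1, s2.length + 1)
    for (i <- 0 to s1.length) d(i)(0) = i
    for (j <- 0 to s2.length) d(0)(j) = j
    for (i <- 1 to s1.length; j <- 1 to s2.length) {
      val cost = if (s1(i - 1) == s2(j - 1)) 0 else 1
      // Deletion, insertion, or substitution.
      d(i)(j) = math.min(math.min(d(i - 1)(j) + 1, d(i)(j - 1) + 1), d(i - 1)(j - 1) + cost)
      // Transposition of two adjacent characters counts as one edit.
      if (i > 1 && j > 1 && s1(i - 1) == s2(j - 2) && s1(i - 2) == s2(j - 1))
        d(i)(j) = math.min(d(i)(j), d(i - 2)(j - 2) + 1)
    }
    d(s1.length)(s2.length)
  }
}
```

The restriction shows up on inputs such as "ca" and "abc": this variant returns 3, while the unrestricted Damerau-Levenshtein distance is 2.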
- implicit object CosSimilarityScore extends CosSimilarityImpl with ScoringAlgorithm[CosineAlgorithm]
Implicit definition of cosine similarity score for CosineAlgorithm.
- Definition Classes
- ScoreDefinitions
- implicit object DiceCoefficientScore extends DiceCoefficientImpl with ScoringAlgorithm[DiceCoefficientAlgorithm]
Implicit definition of Dice coefficient score for DiceCoefficientAlgorithm.
- Definition Classes
- ScoreDefinitions
- implicit object JaccardScore extends JaccardImpl with WeightedScoringAlgorithm[JaccardAlgorithm, Int]
Implicit definition of Jaccard score for JaccardAlgorithm.
- Definition Classes
- ScoreDefinitions
- implicit object JaroScore extends JaroImpl with ScoringAlgorithm[JaroAlgorithm]
Implicit definition of Jaro score for JaroAlgorithm.
- Definition Classes
- ScoreDefinitions
- implicit object JaroWinklerScore extends JaroImpl with WeightedScoringAlgorithm[JaroWinklerAlgorithm, Double]
Implicit definition of Jaro-Winkler score for JaroWinklerAlgorithm.
- Definition Classes
- ScoreDefinitions
- implicit object NGramScore extends NGramImpl with WeightedScoringAlgorithm[NGramAlgorithm, Int]
Implicit definition of n-gram score for NGramAlgorithm.
- Definition Classes
- ScoreDefinitions
- implicit object NeedlemanWunschScore extends NeedlemanWunschImpl with WeightedScoringAlgorithm[NeedlemanWunschAlgorithm, ConstantGap]
Implicit definition of Needleman-Wunsch score for NeedlemanWunschAlgorithm.
- Definition Classes
- ScoreDefinitions
- implicit object OverlapScore extends OverlapImpl with WeightedScoringAlgorithm[OverlapAlgorithm, Int]
Implicit definition of overlap score for OverlapAlgorithm.
- Definition Classes
- ScoreDefinitions
- implicit object SmithWatermanGotohScore extends SmithWatermanImpl with WeightedScoringAlgorithm[SmithWatermanGotohAlgorithm, ConstantGap]
Implicit definition of Smith-Waterman-Gotoh score for SmithWatermanGotohAlgorithm.
- Definition Classes
- ScoreDefinitions
- implicit object SmithWatermanScore extends SmithWatermanImpl with WeightedScoringAlgorithm[SmithWatermanAlgorithm, (Gap, Int)]
Implicit definition of Smith-Waterman score for SmithWatermanAlgorithm.
- Definition Classes
- ScoreDefinitions
- implicit object TverskyScore extends JaccardImpl with WeightedScoringAlgorithm[TverskyAlgorithm, Double]
Implicit definition of Tversky score for TverskyAlgorithm.
- Definition Classes
- ScoreDefinitions
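JaroWinklerScore takes a Double as its extra parameter; in the StringDistance example it is 0.1, which matches the conventional Winkler prefix scale. The sketch below implements the textbook Jaro and Jaro-Winkler formulas (match window of max(length)/2 - 1, common prefix capped at four characters). It is an illustration under those assumptions, not the library's JaroImpl, and `JaroWinklerSketch` is a hypothetical name.

```scala
object JaroWinklerSketch {
  // Jaro similarity: fraction of characters that match within a sliding
  // window, penalised by half the number of out-of-order matches.
  def jaro(s1: String, s2: String): Double =
    if (s1.isEmpty && s2.isEmpty) 1.0
    else if (s1.isEmpty || s2.isEmpty) 0.0
    else {
      val window = math.max(0, math.max(s1.length, s2.length) / 2 - 1)
      val matched1 = Array.fill(s1.length)(false)
      val matched2 = Array.fill(s2.length)(false)
      var matches = 0
      for (i <- s1.indices) {
        val lo = math.max(0, i - window)
        val hi = math.min(s2.length - 1, i + window)
        var j = lo
        var found = false
        // Take the first unmatched equal character inside the window.
        while (j <= hi && !found) {
          if (!matched2(j) && s1(i) == s2(j)) {
            matched1(i) = true; matched2(j) = true; matches += 1; found = true
          }
          j += 1
        }
      }
      if (matches == 0) 0.0
      else {
        // Matched characters of each string, in their original order.
        val a = s1.indices.filter(i => matched1(i)).map(i => s1(i))
        val b = s2.indices.filter(j => matched2(j)).map(j => s2(j))
        val transpositions = a.zip(b).count { case (x, y) => x != y } / 2.0
        val m = matches.toDouble
        (m / s1.length + m / s2.length + (m - transpositions) / m) / 3.0
      }
    }

  // Jaro-Winkler: boosts the Jaro score by the length of the common prefix
  // (at most 4 characters), scaled by p.
  def jaroWinkler(s1: String, s2: String, p: Double = 0.1): Double = {
    val j = jaro(s1, s2)
    val maxPrefix = math.min(4, math.min(s1.length, s2.length))
    val prefix = (0 until maxPrefix).takeWhile(i => s1(i) == s2(i)).length
    j + prefix * p * (1.0 - j)
  }
}
```

For "martha" and "marhta", all six characters match with one transposition, so Jaro = (1 + 1 + 5/6) / 3 = 17/18 ≈ 0.944; the shared prefix "mar" then lifts Jaro-Winkler to 17/18 + 3 × 0.1 × 1/18 ≈ 0.961.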
- implicit object MetaphoneScore extends MetaphoneImpl with SoundScoringAlgorithm[MetaphoneAlgorithm]
Implicit definition of Metaphone score for MetaphoneAlgorithm.
- Definition Classes
- SoundDefinitions
- implicit object SoundexScore extends SoundexImpl with SoundScoringAlgorithm[SoundexAlgorithm]
Implicit definition of Soundex score for SoundexAlgorithm.
- Definition Classes
- SoundDefinitions
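SoundexScore compares strings by their Soundex codes. As a hedged sketch (not the library's SoundexImpl), the classic American Soundex keeps the first letter, maps the remaining consonants to digits, drops vowels, collapses repeated digits (vowels separate repeats, h and w do not), and pads the result to four characters. Only ASCII letters are handled, and `SoundexSketch` and `sameSound` are hypothetical names.

```scala
object SoundexSketch {
  // Digit for each consonant group; '0' marks letters with no digit
  // (vowels, y, h, w).
  private def code(c: Char): Char = c.toLower match {
    case 'b' | 'f' | 'p' | 'v'                         => '1'
    case 'c' | 'g' | 'j' | 'k' | 'q' | 's' | 'x' | 'z' => '2'
    case 'd' | 't'                                     => '3'
    case 'l'                                           => '4'
    case 'm' | 'n'                                     => '5'
    case 'r'                                           => '6'
    case _                                             => '0'
  }

  // Four-character code: first letter, then up to three digits, padded with '0'.
  def encode(s: String): String = {
    val letters = s.filter(c => c < 128 && c.isLetter)
    if (letters.isEmpty) ""
    else {
      val sb = new StringBuilder
      sb.append(letters.head.toUpper)
      var last = code(letters.head)
      for (c <- letters.tail if sb.length < 4) {
        val d = code(c)
        // Skip letters without a digit and repeats of the previous digit.
        if (d != '0' && d != last) sb.append(d)
        // Vowels separate equal digits; h and w do not.
        val lower = c.toLower
        if (lower != 'h' && lower != 'w') last = d
      }
      sb.toString.padTo(4, '0')
    }
  }

  // Two strings "sound alike" if they have the same non-empty code.
  def sameSound(s1: String, s2: String): Boolean = {
    val a = encode(s1)
    a.nonEmpty && a == encode(s2)
  }
}
```

"merci" and "mercy" both encode to M620 in this sketch, so `sameSound("merci", "mercy")` is true; the h/w rule is what makes "Ashcraft" encode to A261 rather than A226.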