Name Description Size
api.rs 22993
ext.rs 1267
mod.rs ! Provides packed multiple substring search, principally for a small number of patterns. This sub-module provides vectorized routines for quickly finding matches of a small number of patterns. In general, users of this crate shouldn't need to interface with this module directly, as the primary [`AhoCorasick`](crate::AhoCorasick) searcher will use these routines automatically as a prefilter when applicable. However, in some cases, callers may want to bypass the Aho-Corasick machinery entirely and use this vectorized searcher directly. # Overview The primary types in this sub-module are: [`Searcher`] executes the actual search algorithm to report matches in a haystack. [`Builder`] accumulates patterns incrementally and can construct a `Searcher`. [`Config`] permits tuning the searcher, and itself will produce a `Builder` (which can then be used to build a `Searcher`). Currently, the only tuneable knob are the match semantics, but this may be expanded in the future. # Examples This example shows how to create a searcher from an iterator of patterns. By default, leftmost-first match semantics are used. (See the top-level [`MatchKind`] type for more details about match semantics, which apply similarly to packed substring search.) ``` use aho_corasick::{packed::{MatchKind, Searcher}, PatternID}; # fn example() -> Option<()> { let searcher = Searcher::new(["foobar", "foo"].iter().cloned())?; let matches: Vec<PatternID> = searcher .find_iter("foobar") .map(|mat| mat.pattern()) .collect(); assert_eq!(vec![PatternID::ZERO], matches); # Some(()) } # if cfg!(all(feature = "std", any( # target_arch = "x86_64", target_arch = "aarch64", # ))) { # example().unwrap() # } else { # assert!(example().is_none()); # } ``` This example shows how to use [`Config`] to change the match semantics to leftmost-longest: ``` use aho_corasick::{packed::{Config, MatchKind}, PatternID}; # fn example() -> Option<()> { let searcher = Config::new() .match_kind(MatchKind::LeftmostLongest) .builder() .add("foo") .add("foobar") .build()?; let matches: Vec<PatternID> = searcher .find_iter("foobar") .map(|mat| mat.pattern()) .collect(); assert_eq!(vec![PatternID::must(1)], matches); # Some(()) } # if cfg!(all(feature = "std", any( # target_arch = "x86_64", target_arch = "aarch64", # ))) { # example().unwrap() # } else { # assert!(example().is_none()); # } ``` # Packed substring searching Packed substring searching refers to the use of SIMD (Single Instruction, Multiple Data) to accelerate the detection of matches in a haystack. Unlike conventional algorithms, such as Aho-Corasick, SIMD algorithms for substring search tend to do better with a small number of patterns, where as Aho-Corasick generally maintains reasonably consistent performance regardless of the number of patterns you give it. Because of this, the vectorized searcher in this sub-module cannot be used as a general purpose searcher, since building the searcher may fail even when given a small number of patterns. However, in exchange, when searching for a small number of patterns, searching can be quite a bit faster than Aho-Corasick (sometimes by an order of magnitude). The key take away here is that constructing a searcher from a list of patterns is a fallible operation with no clear rules for when it will fail. While the precise conditions under which building a searcher can fail is specifically an implementation detail, here are some common reasons: Too many patterns were given. Typically, the limit is on the order of 100 or so, but this limit may fluctuate based on available CPU features. The available packed algorithms require CPU features that aren't available. For example, currently, this crate only provides packed algorithms for `x86_64` and `aarch64`. Therefore, constructing a packed searcher on any other target will always fail. Zero patterns were given, or one of the patterns given was empty. Packed searchers require at least one pattern and that all patterns are non-empty. Something else about the nature of the patterns (typically based on heuristics) suggests that a packed searcher would perform very poorly, so no searcher is built. 4454
pattern.rs 17789
rabinkarp.rs 6440
teddy
tests.rs 18926
vector.rs 58880