@MinDoc(copyright="Copyright 2010 A. Weinert", author="Albrecht Weinert", version="V.11", lastModified="5.02.2019", lastModifiedBy="A. Weinert", usage="extend for clever substring search", purpose="common type for clever searches like Rabin Karp (RK), KMP etc.") public abstract class CleverSSS extends Object implements CharSequence, Comparable<CleverSSS>
CleverSSS
object holdsRK
, Knuth Morris Pratt (KMP
) and others is their
very successful striving for O(n). The standard ("naive"
loopinloop) algorithms, also used in String
and consorts, are
of type O(n*m).RK
and KMP
) hide their
constructors behind factory respectively make() methods. Inheritors do
neither have to handle the cases, where an
"{link optimistic
" search is
sufficient nor do they have to handle the the trivial cases ofignored
optionally.java.lang.String
:
String
regards an empty subsequence as occurring everywhere
or being every other String
's prefix. But the implementing classes
factories must recognise those special cases and indirectly call this
classes factories or constructors accordingly.CleverSSS
objects made or returned by the factories are immutable.
The rationale behind making objects of this type immutable is their
multiple and threadsafe reusability. Inheritors must be immutable too
and should preferably be final.CharSequence
a CleverSSS
object, besides providing
the search algorithms, is at the same time (as of type
CharSequence
) the sub sequence to search for. In case of
ignoreCase
being true this sequence
is all
lower case. In case of ignoreWS
it does not contain any blanks at
all and may hence be significantly shorter than the CharSequence
supplied to the factory method as the original search sequence.indexOf()
type are where heavy duty text processing tools — like
SVNkeys
or FuR
—
do spend most of their processing time [sic!]. For example, beautifying
all SVN keys of all (some 200) Frame4J's source files by
SVNkeys
once took over 15 seconds. Shifting to
RK
and then to KMP
took this down to 600ms on the same
machine. CleverSSS
's inheritors is usually well worth the virtually nil
extra effort.RK
Modifier and Type  Class and Description 

static class 
CleverSSS.Simple
Simple substring search, clever only for the short and optimistic
cases.

Modifier and Type  Field and Description 

String 
algName
The algorithm's (short) name.

boolean 
ignoreCase
The comparison / search is not case sensitive.

boolean 
ignoreWS
The comparison / search has to ignore all white space.

int 
len
The sub sequence's length.

static int 
mskMod
The modulo mask used in the (RK sliding) hash.

boolean 
optimisticOK
The very simple optimistic search is OK.

static int 
primF
The prime factor used in the (RK sliding) hash.

protected char[] 
subC
The internal character array.

Modifier  Constructor and Description 

protected 
CleverSSS(String algName,
boolean ignoreCase,
boolean ignoreWS)
Constructor for subclass / factory use, empty search sequence.

protected 
CleverSSS(String algName,
char[] subC,
boolean ignoreCase,
boolean ignoreWS,
boolean optimisticOK)
Constructor for subclass / factory use, no checks.

protected 
CleverSSS(String algName,
char subOne,
boolean ignoreCase,
boolean ignoreWS)
Constructor for subclass / factory use, single character.

Modifier and Type  Method and Description 

int 
allWhere(long[] therFnd,
CharSequence sequ,
int sI,
int mxLen,
boolean overlap)
All (respectively up to limit) findings of this sub sequence.

static StringBuilder 
asPair(StringBuilder bastel,
long val)
Formate a long as int pair.

static StringBuilder 
asPairs(StringBuilder bastel,
long[] vals)
Formate a long array as int pairs .

char 
charAt(int index)
The subsequence's value at
index . 
StringBuffer 
commonState()
The (common) state as StringBuffer.

int 
compareTo(CleverSSS other)
Compare with other CleverSSS object for order.

int 
endIndexOf(CharSequence sequ,
int sI)
An "endIndexOf" using this object's settings.

int 
endIndexOf(CharSequence sequ,
int sI,
int mxLen)
An "endIndexOf" using this object's settings.

boolean 
equals(Object other)
Compare with other CleverSSS object for same state.

int 
hashCode()
The sub sequence's (RK) hash value.

protected abstract long 
implAlgWhere(CharSequence sequ,
int lk,
int sI,
int mxSi)
The algorithmic implementation of whereImpl().

protected int 
implAlgWhere(long[] therFnd,
CharSequence sequ,
int lk,
int sI,
int mxSi,
boolean overlap)
The algorithmic implementation of allWhere().

int 
indexOf(CharSequence sequ,
int sI)
An "indexOf" using this object's final settings.

int 
indexOf(CharSequence sequ,
int sI,
int mxLen)
An "indexOf" using this object's final settings.

int 
lastIndexOf(CharSequence sequ,
int sI)
A "lastIndexOf" using this object's settings.

int 
lastIndexOf(CharSequence sequ,
int sI,
int minSI)
A "lastIndexOf" using this object's settings.

long 
lastWhereImpl(CharSequence sequ,
int sI,
int minSI)
A "lastIndexOf" (lastWhere) using this object's
settings.

int 
length()
The sub sequence's length.

static CleverSSS 
make(boolean ignoreCase,
boolean ignoreWS)
Make a CleverSSS object, to search for the empty substring.

static CleverSSS 
make(char subOne,
boolean ignoreCase,
boolean ignoreWS)
Make a CleverSSS object, to search for a single character.

static CleverSSS 
makeSimple(CharSequence sub,
boolean ignoreCase,
boolean ignoreWS)
Make a "stupid" or simple reference implementation.

static long 
pair2long(int sI,
int eI)
An int pair as long value.

String 
state()
The state as String.

CharSequence 
subSequence(int start,
int end)
A "sub sub" sequence (made as String) .

static char[] 
toCharA(CharSequence sub,
boolean ignoreCase,
boolean ignoreWS)
Prepare a character array from a sequence.

String 
toString()
The content (sub sequence) as String.

int[] 
where(CharSequence sequ,
int sI)
An "indexOf" (where) using this object's settings.

int[] 
where(CharSequence sequ,
int sI,
int mxLen)
An "indexOf" (where) using this object's settings.

long 
whereImpl(CharSequence sequ,
int sI,
int mxLen)
An "indexOf" (where) using this object's settings.

clone, finalize, getClass, notify, notifyAll, wait, wait, wait
chars, codePoints
public static final int primF
(RK)
implementations / inheritors of this class.RK
, but that has to be explored then.hashCode()
,
mskMod
,
Constant Field Valuespublic static final int mskMod
primF
and mskMod
here is
transporting Rabin Karp
's implementation details to its (this)
superclass. The rationale behind this design decision is to have the same
hash
algorithm in all CleverSSS
types. And that is by nature clearly determined by
Rabin Karp
.hashCode()
,
primF
,
Constant Field Valuespublic final int len
length()
protected final char[] subC
public final boolean ignoreCase
hashCode()
hashing and else.public final boolean ignoreWS
internal array
. They are not counted for the
subsequence's length
nor considered for
hashing
.String.trim()
.public final boolean optimisticOK
case
or
white space
, as well as, of course, for empty or
length one sequences.optimisticOK
true all searches are handled in this class
without calling the inheritor's
implementation
that
can restrict its cleverness to the "hard" requests.public final String algName
protected CleverSSS(String algName, char[] subC, boolean ignoreCase, boolean ignoreWS, boolean optimisticOK)
optimisticOK
being forced to true regardless of the respective
parameter if subC
is null, empty or of length one.algName
 the algorithms short name; null or empty defaults to
"CleverSSS (ext.)"subC
 the sequence to search foroptimisticOK
 true for trivial (length <= 1) cases as well as
those where an optimistic O(n) left to right search is feasibleignoreCase
,
ignoreWS
protected CleverSSS(String algName, char subOne, boolean ignoreCase, boolean ignoreWS)
ignoreWS
is true and subOne
is a space (<=' ') this
is considered as the empty sequence case.subOne
 single character search sequenceignoreCase
 true: ignore caseignoreWS
 true: ignore white spacesignoreCase
,
ignoreWS
protected CleverSSS(String algName, boolean ignoreCase, boolean ignoreWS)
ignoring
white space if so requested.ignoreCase
 true: ignore caseignoreWS
 true: ignore white spacesignoreCase
,
ignoreWS
public final int hashCode()
CleverSSS
objects, like RK
and KMP
, may very well
be used as (hash) keys. In the case of ignoreCase
true the hash
value respectively the hashing function is not case sensitive. This would
be a feature not available with String
s as keys.Rabin Karp
algorithm but irrelevant here KMP
uses RK
's hash function
for consistency within the type CleverSSS
.public final boolean equals(Object other)
other
object is of type
CleverSSS
and has (completely) the same state.compareTo(other)
.public final int length()
length
in interface CharSequence
public final String toString()
CleverSSS
's type CharSequence
and the naive use in String concatenating expressions.toString
in interface CharSequence
toString
in class Object
public final char charAt(int index) throws NullPointerException, ArrayIndexOutOfBoundsException
index
.CharSequence
.CharSequence
.charAt
in interface CharSequence
NullPointerException
 if this sequence is emptyArrayIndexOutOfBoundsException
 if index < 0
or >= length
public final CharSequence subSequence(int start, int end)
subSequence
in interface CharSequence
public final int compareTo(CleverSSS other)
other
CleverSSS and a negative value if the
other
CleverSSS is regarded as greater.longer
(sub) sequence or
if the length
s are equal the (sub) sequence being
greater by simple character value compare.equals()
: this.equals(other)
true implies that this.compareTo(other)
returns 0. But not the other way round!this.compareTo(other)
== 0 implies equal
subsequences to search for both objects. So, the consistency might be
adequate for most uses as Comparable
.ignoreCase
and
ignoreWS
, that might be somewhat surprising, especially when set
differently on both CleverSSS
objects to compare.compareTo
in interface Comparable<CleverSSS>
NullPointerException
 if other is nullpublic String state()
CleverSSS
object. The purpose is debugging or fabrication of
often used searches. The result is usually multi line.public final StringBuffer commonState()
state()
public static CleverSSS make(boolean ignoreCase, boolean ignoreWS)
public static CleverSSS make(char subOne, boolean ignoreCase, boolean ignoreWS)
ignoreCase
 true: ignore caseignoreWS
 true: ignore white spacespublic static CleverSSS makeSimple(CharSequence sub, boolean ignoreCase, boolean ignoreWS)
simple
"
CleverSSS
object that is not so clever in the light of Rabin,
Knuth & alii by implementing the real searches (naively) by O(n*m)
loop in loop (as does String
.String
's
indexOf methods — non regarding the extra ability to optionally
ignore case
and / or
white space
.sub
 the sequence to search forignoreCase
 true: ignore caseignoreWS
 true: ignore white spacesignoreCase
,
ignoreWS
public static char[] toCharA(CharSequence sub, boolean ignoreCase, boolean ignoreWS)
sub
 the input sequenceignoreCase
 true means convert to lower caseignoreWS
 true means do not transfer white spacespublic static final StringBuilder asPair(StringBuilder bastel, long val)
public static long pair2long(int sI, int eI)
sI
 first int (start index) goes to the lower 32 bitseI
 first int (end index) goes to the upper 32 bitspublic static final StringBuilder asPairs(StringBuilder bastel, long[] vals)
asPair(StringBuilder, long)
. The formatting stops at the end
of the array or at the first element == 1L (exclusive).public final int indexOf(CharSequence sequ, int sI, int mxLen)
CleverSSS
's (sub)
sequence
in the CharSequence
seq
supplied. The search is done once starting at position sI
and
going to the end of seq
respectively mxLen 1
.case
or ignoring
white space
(or both or none) according to this
CleverSSS
object's final settings.(int)whereImpl(sequ, sI, mxLen)
whereImpl(CharSequence, int, int)
.sequ
 the sequence in which sub is searched forsI
 the index in sequ
to start the search for this
object's (sub) sequence (<0 is regarded as 0)mxLen
 if > 0 and < seq
's length it is taken as
"shortened" length of sequ
public final int indexOf(CharSequence sequ, int sI)
indexOf(seq, sI, 0)
and(int)whereImpl(sequ, sI, 0)
whereImpl(CharSequence, int, int)
.sequ
 the sequence in which sub is searched forsI
 the index in sequ
to start the search for sub
(<0 is regarded as 0)public final int endIndexOf(CharSequence sequ, int sI, int mxLen)
indexOf(seq, sI, mxLen
except
for returning the match's end index + 1 instead of the first index. This
value can only be unequal to first index plus subsequence's
length
in the case of ignoring
white space.(int)(whereImpl(sequ, sI, mxLen) >>> 32)
whereImpl(CharSequence, int, int)
.sequ
 the sequence in which sub is searched forsI
 the index in sequ
to start the search for sub
(<0 is regarded as 0)mxLen
 if > 0 and < seq
's length it is taken as
"shortened" length of sequ
public final int endIndexOf(CharSequence sequ, int sI)
indexOf(seq, sI
except for
returning the match's end index + 1 instead of the first index. This
value can only be unequal to first index plus subsequence's
length
in the case of ignoring
white space.enIndexOf(seq, sI, 0)
and (int)(whereImpl(sequ, sI, 0) >>> 32)
whereImpl(CharSequence, int, int)
.sequ
 the sequence in which sub is searched forsI
 the index in sequ
to start the search for sub
(<0 is regarded as 0)seq
or 1 if no matchpublic final int[] where(CharSequence sequ, int sI, int mxLen)
final long ret = whereImpl(sequ, sI, mxLen);
return new int[] {(int) ret, (int)(ret >>> 32)};
*
this being also the implementation based on the inheritors implementation
of whereImpl(CharSequence, int, int)
.sequ
 the sequence in which sub is searched forsI
 the index in sequ
to start the search for sub
(<0 is regarded as 0)mxLen
 if > 0 and < seq
's length it is taken as
shortened length of sequ
public final int[] where(CharSequence sequ, int sI)
sequ
 the sequence in which sub is searched forsI
 the index in sequ
to start the search for sub
(<0 is regarded as 0)public final int allWhere(long[] therFnd, CharSequence sequ, int sI, int mxLen, boolean overlap)
therFnd
)
findings of this substring in seq
. The found spots are returned
as modifications of the array therFnd
supplied; see
whereImpl()
for the
interpretation of the long value.therFnd
supplied is longer as the (returned) number of findings
the next element ([# of findings]) will be set to 1L. This ensures a
comfortable break out solution in (enhanced for) loops over that
array.implAlgWhere(long[], CharSequence, int, int, int, boolean)
that
might cleverly be overridden in extending classes, if applicable.therFnd
 where to put all findings; this array will be
modified up to [return value] and at least up to [0] (if that
element exists)sequ
 the sequence in which sub is searched forsI
 the index in sequ
to start the search for sub
(<0 is regarded as 0)mxLen
 if > 0 and < seq
's length it is taken as
shortened length of sequ
overlap
 if true the findings may overlap, meaning that
"otto" is twice found in "ottotto" ( 0 and 3)
instead of once (from beginning to end)implAlgWhere(long[], CharSequence, int, int, int, boolean)
,
whereImpl(CharSequence, int, int)
protected int implAlgWhere(long[] therFnd, CharSequence sequ, int lk, int sI, int mxSi, boolean overlap)
allWhere()
after handling all trivial, exceptional, illegal and optimistic
cases.implAlgWhere(CharSequence, int, int, int)
in an adequately
organised loop. That's what can be done for most algorithms. But some
(like Knuth Morris Pratt
) could do those multiple searches
much better.public final long whereImpl(CharSequence sequ, int sI, int mxLen)
CleverSSS
objects subsequence within the sequence
seq
supplied as parameter. This method handles all trivial
and optimistic
cases by itself
delegating
only
the hard requests to the inheritor's clever
algorthm
.RK (Rabin Karp)
or
KMP (Knuth &alii)
do implement the famous inventor's
clever substring search algorithm by implementing the abstract method
implAlgWhere(CharSequence, int, int, int)
.CleverSSS.Simple
on the other hand provides the
(String
standard) O(n*m) simple loop in loop implementation as
a reference.indexOf(CharSequence, int, int)
,
indexOf(CharSequence, int)
,
endIndexOf(CharSequence, int, int)
,
where(CharSequence, int, int)
or
where(CharSequence, int)
.whereImpl(CharSequence, int, int)
instead of only a
where(..)
is avoiding the
throwaway object (returning one long primitive value instead of an
two element int array).sequ
 the sequence in which sub is searched forsI
 the index in sequ
to start the search for sub
(<0 is regarded as 0)mxLen
 if > 0 and < seq
's length it is taken as
shortened length of sequ
implAlgWhere(CharSequence, int, int, int)
,
optimisticOK
protected abstract long implAlgWhere(CharSequence sequ, int lk, int sI, int mxSi)
whereImpl(CharSequence, int, int)
does all parameter
checks and handles all exceptional / illegal cases by returning 1. It
also handles the optimistic
cases.sequ
 the sequence to search inlk
 its correct or shortened) lengthsI
 the starting index for the searchmxSi
 the maximum possible value of the match indexpublic long lastWhereImpl(CharSequence sequ, int sI, int minSI)
TextHelper.lastIndexOf(CharSequence, CharSequence, int)
with the
difference of returning both the index of the first matching and of the
first non matching character (behind) in seq
in case of
a hit.ignoreWS
false these both numbers just differ by the
length
of the sub sequence searched for.ignoring
search the
difference mentioned can be greater than length
as well as shorter than the subsequence used to make this
CleverSSS
object.whereImpl(CharSequence, int, int)
.simple
"
search objects, but may disappoint if used with an object of one of the
(clever) inheritors like RK
or KMP
.sequ
 the sequence in which sub is searched forsI
 the index in sequ
to start the search for sub
(<0 or too large is regarded as from the end)minSI
 a stopper down to where seq will be searched;
<= 0 is taken as 0; > (adjusted) sI will return 1public final int lastIndexOf(CharSequence sequ, int sI)
(int)lastWhereImpl(sequ, sI, 0)
.public final int lastIndexOf(CharSequence sequ, int sI, int minSI)
(int)lastWhereImpl(sequ, sI, minSI)
.