Class Combiner<T>
- Type Parameters:
T - The generic type of the value which is mapped into each array position
- All Implemented Interfaces:
LongFunction<T[]>
Combiner - a combinatoric toolkit for NoSQLBench
Synopsis
Combiner is the core implementation of a combinatoric toolkit which is used by other NoSQLBench functions in a more type-specific way. It allows for a common approach to encoding unique values across a range of dimensions (which can be non-uniform) with an affine mapping between different forms of data.
Specifier
The specifier required by the constructor is a way to specify a range of character sets, each representing both the per-value labeling as well as the radix of each position in the associated index, value, or character position. Each position is delimited from the others with commas or semicolons. Each position can be either a single printable character or a range of characters separated by '-'. Optionally, you can repeat a position with a multiplier in the form of '*n' where n is any valid number.
Examples:
- "0-9A-F" - hexadecimal characters, one digit only ; 0123456789ABCDEF
- "0-9*12" - characters 0-9 in 12 digits, symbolic of values 000000000000 (0) .. 999999999999
- "5;5;5;-;8;6;7;-;5;3;0;9" - 12 digits with one character each, effectively 555-867-5309, a single value
- "0-9;*2_=24j36*5*1" - a somewhat random pattern with char sets [0123456789] and [*2_=24j36*5], showing how '*1' at the end can be used to escape '*5'
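The specifier rules above can be sketched in a few lines of Java. This is a minimal illustration and not the NoSQLBench implementation: the class name `SpecSketch` and its methods are hypothetical, and it assumes that only the last `*n` suffix of a position is read as a multiplier, which is what makes the `*1` escape in the final example work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of specifier parsing; names and details are assumptions.
final class SpecSketch {
    // Assumption: a trailing "*n" is the multiplier, everything before it is the charset.
    private static final Pattern REPEAT = Pattern.compile("^(.+)\\*(\\d+)$");

    // Split on commas or semicolons, then expand each position into its character set.
    static char[][] parse(String spec) {
        List<char[]> positions = new ArrayList<>();
        for (String part : spec.split("[,;]")) {
            int repeat = 1;
            Matcher m = REPEAT.matcher(part);
            if (m.matches()) {
                part = m.group(1);
                repeat = Integer.parseInt(m.group(2));
            }
            char[] charset = expand(part);
            for (int i = 0; i < repeat; i++) {
                positions.add(charset);
            }
        }
        return positions.toArray(new char[0][]);
    }

    // Expand "a-z" style ranges; any other character is taken literally.
    static char[] expand(String chars) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < chars.length()) {
            char c = chars.charAt(i);
            if (i + 2 < chars.length() && chars.charAt(i + 1) == '-') {
                char end = chars.charAt(i + 2);
                for (int x = c; x <= end; x++) {
                    sb.append((char) x);
                }
                i += 3;
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString().toCharArray();
    }

    public static void main(String[] args) {
        for (char[] position : parse("0-9;*2_=24j36*5*1")) {
            System.out.println(new String(position));
        }
    }
}
```

Under these assumptions, a lone `-` position is kept as a literal character because it has no start and end character around it.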
Value Function
The function provided in the constructor is used to symbolically map the characters in the encoding string to a value of any type. The value function will be called with a number of distinct values up to the cardinality of the largest position in the radix model. For example, a specifier of `A-Za-z0-9` would provide an input range from 0 to 61 inclusive to the value function. The combination of positions and unique values provides the overall cardinality; the value function itself is only responsible for the relatively lower-cardinality elements, which are combined to create higher-cardinality value arrays.
Types and Forms
Each form represents one way of seeing the data for a given cycle:
- ordinal (long) - also known as the cycle, or input. This is an enumeration of all distinct combinations.
- indexes (int[]) - an array of indexes, one for each position in the specifier and thus each element in the array or character in the encoding.
- encoding (String) - a string which encodes the ordinal and the indexes in a convenient label which is unique within the range of possible values.
- (values) array (T[]) - An array of the type T which can be provided via a mapping function. This is a mapping from the indexes through the provided value function.
Mapping between forms
The array value can be derived with apply(long), getArray(int[]), and
getArray(String),
given ordinal, indexes, or encoding as a starting point, respectively. These all ultimately use the one-way
function which you provide, so you can't go from the array form back to the others.
Mapping between the other three is fairly trivial:
- You can get indexes from ordinal and encoding with getIndexes(long) and getIndexes(String).
- You can get encoding from ordinal and indexes with getEncoding(long) and getEncoding(int[]).
- You can get ordinal from indexes or encoding with getOrdinal(int[]) and getOrdinal(String).
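The mappings between ordinal, indexes, and encoding amount to mixed-radix arithmetic. The sketch below illustrates this under two assumptions that the text does not confirm: the rightmost position varies fastest, and ordinals are non-negative. The class `FormsSketch` and its method names are hypothetical and deliberately differ from the Combiner method names.

```java
import java.util.Arrays;

// Hypothetical sketch of the ordinal <-> indexes <-> encoding mappings.
final class FormsSketch {
    private final char[][] charsets;

    FormsSketch(char[][] charsets) {
        this.charsets = charsets;
    }

    // ordinal -> indexes: repeated divide and remainder, rightmost position first (assumption).
    int[] indexesOf(long ordinal) {
        int[] indexes = new int[charsets.length];
        long rest = ordinal;
        for (int pos = charsets.length - 1; pos >= 0; pos--) {
            int radix = charsets[pos].length;
            indexes[pos] = (int) (rest % radix);
            rest /= radix;
        }
        return indexes;
    }

    // indexes -> encoding: pick one label character per position.
    String encodingOf(int[] indexes) {
        StringBuilder sb = new StringBuilder(indexes.length);
        for (int pos = 0; pos < indexes.length; pos++) {
            sb.append(charsets[pos][indexes[pos]]);
        }
        return sb.toString();
    }

    // encoding -> indexes: find each character in the charset of its position.
    int[] indexesOf(String encoding) {
        int[] indexes = new int[charsets.length];
        for (int pos = 0; pos < charsets.length; pos++) {
            indexes[pos] = new String(charsets[pos]).indexOf(encoding.charAt(pos));
            if (indexes[pos] < 0) {
                throw new IllegalArgumentException("invalid character at position " + pos);
            }
        }
        return indexes;
    }

    // indexes -> ordinal: Horner-style accumulation over the per-position radixes.
    long ordinalOf(int[] indexes) {
        long ordinal = 0;
        for (int pos = 0; pos < charsets.length; pos++) {
            ordinal = ordinal * charsets[pos].length + indexes[pos];
        }
        return ordinal;
    }

    public static void main(String[] args) {
        FormsSketch forms = new FormsSketch(new char[][] {"ab".toCharArray(), "012".toCharArray()});
        int[] indexes = forms.indexesOf(4L);
        System.out.println(Arrays.toString(indexes) + " " + forms.encodingOf(indexes));
    }
}
```

With the two charsets `ab` and `012` there are six combinations, so in this sketch ordinal 10 lands on the same indexes as ordinal 4.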
This makes it easy to derive textual identifiers for specific combinations of elements such as a vector, use them for cross-checks such as with correctness testing, and represent specific test values in a very convenient form within deterministic testing harnesses like NoSQLBench.
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description:
- T[] apply(long value) - Return an array of Combiner elements by indexing into the sequence of character sets and their relative cardinality to derive column-specific index, and then converting them to the type T through the provided function.
- static long[] computeRadixFactors(char[][] charsets)
- T[] getArray(int[] indexes)
- T[] getArray(String)
- String getEncoding(int[] indexes)
- String getEncoding(long ordinal)
- int[] getIndexes(long value) - Get the indexes directly which are used by apply(long)
- int[] getIndexes(String encoding)
- long getOrdinal(int[] indexes) - Using the provided column offsets, derive the ordinal value which matches it.
- long getOrdinal(String name) - Using the provided name, derive the ordinal value which matches it.
- static int[][] invertedIndexFor(char[][] charsetColumns)
- static int[][] invertedIndexFor(String charsetsSpecifier)
- static char[][] parseSpec(String rangesSpec) - Parse the spec, yielding an array of character arrays. Each position in the spec delimited by comma or semicolon is represented by an array.
- static char[] rangeFor(String range) - Parse the range and return set of characters in an array.
- rangeFor(startChar, endChar) - Create a list of characters from the US ASCII plane based on a start and end character.
-
Constructor Details
-
Combiner
Construct a combiner which can compose unique combinations of array data.
- Parameters:
spec - The string specifier, as explained in Combiner docs.
elementFunction - The function that indexes into a unique population of T elements
elementClazz - The component type for the values array which are produced by apply(long)
-
-
Method Details
-
parseSpec
Parse the spec, yielding an array of character arrays. Each position in the spec delimited by comma or semicolon is represented by an array. Each array is then constructed from rangeFor(String).
- Parameters:
rangesSpec - A range set specifier
- Returns:
An array of char arrays
-
rangeFor
Parse the range and return set of characters in an array. Any occurrences of a range specifier like a-z are expanded into the two characters and every one in between, in ordinal order. Otherwise, the characters are taken as they are presented. Each range is built and sanity checked by rangeFor(java.lang.String) to ensure ordering is valid as well as that the characters are all in the printable range of ordinal 32 to ordinal 126.
- Parameters:
range - a character range specifier like 'a-z' or '1357'
- Returns:
An array of characters
-
rangeFor
Create a list of characters from the US ASCII plane based on a start and end character.
- Parameters:
startChar - A single ASCII character
endChar - A single ASCII character, must be equal to or come after startChar
- Returns:
A list of characters in the range
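The ordering and printable-range checks described for the range helpers could look like the following. This is a sketch under assumptions: the class `RangeSketch` is hypothetical, the `char` parameter types and `char[]` return type are guesses, and the choice of `IllegalArgumentException` is illustrative only.

```java
// Hypothetical sketch of building a validated character range.
final class RangeSketch {
    static char[] rangeFor(char startChar, char endChar) {
        // Printable US ASCII is taken here as ordinals 32 through 126, as stated in the text.
        if (startChar < 32 || endChar > 126) {
            throw new IllegalArgumentException("characters must be in the printable range 32..126");
        }
        if (endChar < startChar) {
            throw new IllegalArgumentException("endChar must be equal to or come after startChar");
        }
        char[] range = new char[endChar - startChar + 1];
        for (int i = 0; i < range.length; i++) {
            range[i] = (char) (startChar + i);
        }
        return range;
    }

    public static void main(String[] args) {
        System.out.println(new String(rangeFor('a', 'e')));
    }
}
```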
-
invertedIndexFor
-
invertedIndexFor
public static int[][] invertedIndexFor(char[][] charsetColumns)
-
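No description is given for invertedIndexFor, so the following is only one plausible reading of the signature: for each column, a lookup table from character code to that character's index in the column's charset, which would make decoding an encoding string a constant-time lookup per position. The class name, the table size of 128, and the use of -1 for absent characters are all assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch: per-column lookup tables from character code to charset index.
final class InvertedIndexSketch {
    static int[][] invertedIndexFor(char[][] charsetColumns) {
        int[][] inverted = new int[charsetColumns.length][];
        for (int col = 0; col < charsetColumns.length; col++) {
            int[] lookup = new int[128]; // assumes US ASCII characters only
            Arrays.fill(lookup, -1);     // -1 marks a character that is not in this column
            char[] charset = charsetColumns[col];
            for (int idx = 0; idx < charset.length; idx++) {
                if (lookup[charset[idx]] < 0) {
                    lookup[charset[idx]] = idx; // first occurrence wins if a character repeats
                }
            }
            inverted[col] = lookup;
        }
        return inverted;
    }

    public static void main(String[] args) {
        int[][] inverted = invertedIndexFor(new char[][] {"abc".toCharArray(), "01".toCharArray()});
        System.out.println(inverted[0]['c'] + " " + inverted[1]['1']);
    }
}
```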
apply
Return an array of Combiner elements by indexing into the sequence of character sets and their relative cardinality to derive column-specific index, and then converting them to the type T through the provided function.
- Specified by:
apply in interface LongFunction<T[]>
- Parameters:
value - the function argument
- Returns:
a T[] which is identified by the provided value, unique if value is less than the maximum number of combinations, but repeated otherwise
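To make the behavior of apply concrete, here is a minimal sketch of deriving one index per position and passing each through a value function to fill a typed array. It is not the Combiner source: the class `ApplySketch` is hypothetical, `IntFunction<T>` stands in for whatever function type the real constructor accepts, and the rightmost-position-fastest ordering is an assumption. The element class is needed because Java cannot create a generic array without it.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.function.IntFunction;

// Hypothetical sketch of mapping an ordinal to an array of T values.
final class ApplySketch<T> {
    private final char[][] charsets;
    private final IntFunction<T> elementFunction; // assumed function type
    private final Class<T> elementClazz;

    ApplySketch(char[][] charsets, IntFunction<T> elementFunction, Class<T> elementClazz) {
        this.charsets = charsets;
        this.elementFunction = elementFunction;
        this.elementClazz = elementClazz;
    }

    @SuppressWarnings("unchecked")
    T[] apply(long ordinal) {
        // Create a T[] reflectively from the component type.
        T[] values = (T[]) Array.newInstance(elementClazz, charsets.length);
        long rest = ordinal;
        for (int pos = charsets.length - 1; pos >= 0; pos--) {
            int radix = charsets[pos].length;
            values[pos] = elementFunction.apply((int) (rest % radix));
            rest /= radix; // anything left after the first position is dropped, so ordinals wrap
        }
        return values;
    }

    public static void main(String[] args) {
        ApplySketch<String> sketch = new ApplySketch<>(
            new char[][] {"ab".toCharArray(), "012".toCharArray()}, i -> "v" + i, String.class);
        System.out.println(Arrays.toString(sketch.apply(4L)));
    }
}
```

Because the leftover quotient is discarded, an ordinal at or beyond the number of combinations repeats an earlier array, which matches the "repeated otherwise" note in the return description.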
-
getArray
- Parameters:
indexes - indexes derived from getIndexes(long)
- Returns:
a T[]
-
getArray
-
getEncoding
-
getEncoding
-
getIndexes
public int[] getIndexes(long value)
Get the indexes directly which are used by apply(long)
- Parameters:
value -
- Returns:
an offset array for each column in the provided charset specifiers
-
getIndexes
- Parameters:
encoding - the string encoding for the given ordinal
- Returns:
the indexes used to select a value from the value function for each position in the output array
-
getOrdinal
Using the provided name, derive the ordinal value which matches it.
- Parameters:
name - the textual name, expressed as an ASCII string
- Returns:
the long which can be used to construct the matching name or related array.
-
getOrdinal
public long getOrdinal(int[] indexes)
Using the provided column offsets, derive the ordinal value which matches it.
- Parameters:
indexes - the indexes used to derive an array of values, or equivalently a name
- Returns:
the long which can be used to construct the matching name or related array.
-
computeRadixFactors
public static long[] computeRadixFactors(char[][] charsets)
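computeRadixFactors carries no description, so the sketch below shows one conventional interpretation: the place value of each position in a mixed-radix number, meaning the product of the cardinalities of all positions to its right. Whether the real method orders positions this way is an assumption, and the class `RadixFactorsSketch` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: place values for a mixed-radix number, rightmost position fastest.
final class RadixFactorsSketch {
    static long[] radixFactors(char[][] charsets) {
        long[] factors = new long[charsets.length];
        long factor = 1L;
        for (int pos = charsets.length - 1; pos >= 0; pos--) {
            factors[pos] = factor;
            // multiplyExact throws ArithmeticException if the running product overflows a long
            factor = Math.multiplyExact(factor, (long) charsets[pos].length);
        }
        return factors;
    }

    public static void main(String[] args) {
        char[][] charsets = {"ab".toCharArray(), "012".toCharArray(), "0123456789".toCharArray()};
        System.out.println(Arrays.toString(radixFactors(charsets)));
    }
}
```

With factors like these, the index at a position can be read as `(ordinal / factors[pos]) % charsets[pos].length`, without carrying a running quotient between positions.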
-