Class WeightedStringsFromCSV

java.lang.Object
io.nosqlbench.virtdata.library.basics.shared.distributions.WeightedStringsFromCSV
All Implemented Interfaces:
LongFunction<String>
Direct Known Subclasses:
FirstNames, LastNames

public class WeightedStringsFromCSV extends Object implements LongFunction<String>
Provides sampling of a given field in a CSV file according to discrete probabilities. The CSV file must have headers which can be used to find the named columns for value and weight. The value column contains the string result to be returned by the function. The weight column contains the floating-point weight or mass associated with the value on the same line. All the weights are normalized automatically.
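The normalization and discrete selection described above can be sketched as follows. This is an illustration only, not the nosqlbench implementation: the class name `WeightedPickSketch`, the `pick` method, and the choice of a cumulative-weight table with binary search are all assumptions made for the example, and the sketch requires strictly positive weights for simplicity.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of weighted selection; not the library's actual code. */
final class WeightedPickSketch {
    private final String[] values;
    private final double[] cumulative; // cumulative[i] = sum of normalized weights 0..i

    WeightedPickSketch(List<String> values, List<Double> weights) {
        if (values.isEmpty() || values.size() != weights.size()) {
            throw new IllegalArgumentException("need exactly one weight per value");
        }
        double total = 0.0;
        for (double w : weights) {
            // Assumption of this sketch: weights are finite and strictly positive.
            if (!(w > 0.0) || Double.isInfinite(w)) {
                throw new IllegalArgumentException("weights must be finite and > 0");
            }
            total += w;
        }
        this.values = values.toArray(new String[0]);
        this.cumulative = new double[this.values.length];
        double running = 0.0;
        for (int i = 0; i < cumulative.length; i++) {
            running += weights.get(i) / total; // normalize so the weights sum to 1
            cumulative[i] = running;
        }
        cumulative[cumulative.length - 1] = 1.0; // guard against rounding drift
    }

    /** Returns the value whose cumulative range contains unit, where unit is in [0, 1]. */
    String pick(double unit) {
        if (!(unit >= 0.0 && unit <= 1.0)) {
            throw new IllegalArgumentException("unit must be in [0, 1]");
        }
        int idx = Arrays.binarySearch(cumulative, unit);
        if (idx < 0) {
            idx = -idx - 1; // not found: use the insertion point
        }
        return values[idx];
    }

    public static void main(String[] args) {
        // Weights 1 and 3 normalize to 0.25 and 0.75.
        WeightedPickSketch s = new WeightedPickSketch(
                Arrays.asList("apple", "banana"), Arrays.asList(1.0, 3.0));
        System.out.println(s.pick(0.10));
        System.out.println(s.pick(0.90));
    }
}
```

With weights 1 and 3, the first value covers the unit range [0, 0.25] and the second covers (0.25, 1], so the raw weights never need to sum to any particular total.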

If multiple file names are given, each file must use the same format, and all of them are read in the same way.

If the first word in the filenames list is 'map', then the values will not be pseudo-randomly selected. Instead, they are mapped in a stable but unsorted order as input values range from 0L to Long.MAX_VALUE.

Generally, you want to leave out the 'map' directive to get "random sampling" of these values.
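One way to picture the difference between the 'map' directive and the default behavior is shown below. This is a sketch under stated assumptions, not the library's code: it assumes the input is scaled to a unit-interval position by dividing by Long.MAX_VALUE, and it uses a SplitMix64-style mixer as a stand-in for whatever hash function the library applies. The names `SelectionModeSketch`, `mix`, and `unitFor` are hypothetical.

```java
/** Hypothetical sketch contrasting mapped and hashed selection. */
final class SelectionModeSketch {

    /** Stand-in hash: a SplitMix64-style mixer, masked to a non-negative long. */
    static long mix(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        x = x ^ (x >>> 31);
        return x & Long.MAX_VALUE; // clear the sign bit
    }

    /**
     * Converts an input in 0L..Long.MAX_VALUE to a position in [0, 1].
     * In map mode the input is scaled directly, so increasing inputs sweep
     * through the weighted values in a fixed order. Otherwise the input is
     * hashed first, so neighboring inputs land at unrelated positions.
     */
    static double unitFor(long input, boolean mapMode) {
        long v = mapMode ? input : mix(input);
        return (double) v / (double) Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        for (long i = 0; i < 3; i++) {
            System.out.println(i + " mapped=" + unitFor(i, true)
                    + " hashed=" + unitFor(i, false));
        }
    }
}
```

Under these assumptions, small consecutive inputs in map mode all fall near position 0 and therefore select the same value, which is why the hashed default is usually the better fit for generating varied data.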

This function works the same as the three-parameter form of WeightedStrings, which is deprecated in favor of this one. Use this one instead.

  • Constructor Details

    • WeightedStringsFromCSV

      public WeightedStringsFromCSV(String valueColumn, String weightColumn, String... filenames)
      Create a sampler of strings from the given CSV file. The CSV file must have plain CSV headers as its first line.
      Parameters:
      valueColumn - The name of the value column to be sampled
      weightColumn - The name of the weight column, which must be parsable as a double
      filenames - One or more file names which will be read into the sampler buffer
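The role of the header line and the two column names can be illustrated with a small stand-alone sketch. It is not how the library reads files: it takes already-loaded lines, uses a naive comma split with no support for quoted fields, and sums the weights of repeated values, all of which are choices made for this example. The names `CsvColumnsSketch` and `read` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: locate the value and weight columns by header name. */
final class CsvColumnsSketch {

    static Map<String, Double> read(List<String> lines, String valueColumn, String weightColumn) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("missing header line");
        }
        // The first line holds the headers; find the two named columns.
        String[] headers = lines.get(0).split(",", -1);
        int valueIdx = -1;
        int weightIdx = -1;
        for (int i = 0; i < headers.length; i++) {
            String h = headers[i].trim();
            if (h.equals(valueColumn)) valueIdx = i;
            if (h.equals(weightColumn)) weightIdx = i;
        }
        if (valueIdx < 0 || weightIdx < 0) {
            throw new IllegalArgumentException("header must contain both named columns");
        }
        // LinkedHashMap keeps the values in file order.
        Map<String, Double> weights = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            String[] cells = line.split(",", -1);
            String value = cells[valueIdx].trim();
            double weight = Double.parseDouble(cells[weightIdx].trim());
            weights.merge(value, weight, Double::sum); // sketch choice: sum repeats
        }
        return weights;
    }
}
```

Because columns are found by name, extra columns and column order do not matter in this sketch; only the two named headers need to be present.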
  • Method Details