Class LogRegWeka
- Author:
 - mo55
 
- 
Constructor Summary
Constructors
LogRegWeka() - Main Constructor: create a Multinomial Logistic Regression Model based on the passed training data, using the WEKA library
Method Summary
Modifier and Type, Method, Description (one method per line):
Double getAttFromProb(double p) - Important: this method ONLY works for binomial (2-class) datasets with a single attribute x, and will return null if that is not true
double[][] getCoefficients() - Return the coefficients of the classifier
double[][] getCoeffUncertainty()
double[] getDistribution(double[] x) - Passing the input array into the current regression model, a double array is passed back which contains the percentage values for each of the possible output classifications.
getModelError() - Return exception string from model if it failed to fit.
Double getPrediction(double[] x) - Get a prediction (output variable) based on the passed input array.
static void main(String[] args) - Test model using data from https://machinelearningmastery.com/logistic-regression-tutorial-for-machine-learning/
boolean setTrainingData(double[][] x, double[] y) - Take the training data passed in, convert it to something that WEKA understands, and then create a logistic regression model
boolean setTrainingData(double[] xVar, double[] yVar) - Logistic regression where the y variable is passed in as a single array (e.g. array of ranges) and the y variable may not be 0's or 1's.
boolean setTrainingData(double[] xVar, int[] yVar) - Logistic regression where the y variable is passed in as a single array (e.g. array of ranges) and the y variable may not be 0's or 1's.
- 
Constructor Details
- 
LogRegWeka
public LogRegWeka()
Main Constructor - create a Multinomial Logistic Regression Model based on the passed training data, using the WEKA library
 - 
 - 
Method Details
- 
setTrainingData
public boolean setTrainingData(double[] xVar, int[] yVar)
Logistic regression where the y variable is passed in as a single array (e.g. array of ranges) and the y variable may not be 0's or 1's.
- Parameters:
 xVar-
 yVar-
- Returns:
 
 - 
setTrainingData
public boolean setTrainingData(double[] xVar, double[] yVar)
Logistic regression where the y variable is passed in as a single array (e.g. array of ranges) and the y variable may not be 0's or 1's.
- Parameters:
 xVar-
 yVar-
- Returns:
 
 - 
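The single-array overloads above take one attribute per data point, while the general overload takes a 2d array with one row per data point. The following is a minimal sketch of how such 1-D inputs could be reshaped into that layout. The class `SingleAttributeData` and its methods are hypothetical helpers written for illustration; they are not part of LogRegWeka, and whether the overloads convert their inputs this way internally is an assumption.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of LogRegWeka. */
class SingleAttributeData {

    /** Turns a 1-D attribute array into an n-by-1 matrix (one row per data point). */
    static double[][] toColumn(double[] xVar) {
        double[][] x = new double[xVar.length][1];
        for (int i = 0; i < xVar.length; i++) {
            x[i][0] = xVar[i];
        }
        return x;
    }

    /** Widens integer class labels to doubles. */
    static double[] toDoubles(int[] yVar) {
        double[] y = new double[yVar.length];
        for (int i = 0; i < yVar.length; i++) {
            y[i] = yVar[i];
        }
        return y;
    }

    public static void main(String[] args) {
        double[] ranges = {1.5, 2.0, 3.5}; // illustrative attribute values
        int[] detected = {1, 1, 0};        // illustrative class labels
        System.out.println(Arrays.deepToString(toColumn(ranges)));
        System.out.println(Arrays.toString(toDoubles(detected)));
    }
}
```

The reshaped arrays have the form expected by the `double[][] x, double[] y` overload described next.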
setTrainingData
public boolean setTrainingData(double[][] x, double[] y)
Take the training data passed in, convert it to something that WEKA understands, and then create a logistic regression model.
- Parameters:
 x- a 2d array of training data. Columns are any number of input variables (x1, x2, x3... aka attributes) and rows are data points
 y- the output variable. Length of array should match number of rows in x parameter. Since this is a logistic regression, the output is considered 'nominal' and not numeric - a distinct classification, and not a continuous variable. It's odd that nominal values should be passed as doubles, but that's what WEKA wants. For best results, use consecutive integers starting at 0 - e.g. 0, 1, 2, 3 etc.
 Also, there can't be any gaps in the output of the training dataset - you can't have 0, 1, 2, 4. WEKA will throw an error.
 Keep track in your own code of what each value represents (e.g. for a binomial problem, 0=yes and 1=no; for a weather problem, 0=cold, 1=warm, 2=hot, etc).
- Returns:
 - true=successful, false=unsuccessful
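Because the class values must be consecutive integers starting at 0 with no gaps, it can help to remap raw labels before training and keep the mapping for later lookups. The following is a minimal sketch of one way to do that. `LabelEncoder`, `buildIndex` and `encode` are hypothetical names introduced for illustration; they are not part of LogRegWeka or WEKA.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of LogRegWeka. */
class LabelEncoder {

    /** Maps each distinct raw label to 0, 1, 2, ... in ascending order of the raw value. */
    static Map<Double, Integer> buildIndex(double[] rawLabels) {
        double[] distinct = Arrays.stream(rawLabels).distinct().sorted().toArray();
        Map<Double, Integer> index = new LinkedHashMap<>();
        for (int i = 0; i < distinct.length; i++) {
            index.put(distinct[i], i);
        }
        return index;
    }

    /** Rewrites raw labels as gap-free class values, still stored as doubles. */
    static double[] encode(double[] rawLabels, Map<Double, Integer> index) {
        double[] encoded = new double[rawLabels.length];
        for (int i = 0; i < rawLabels.length; i++) {
            Integer code = index.get(rawLabels[i]);
            if (code == null) {
                throw new IllegalArgumentException("Unknown label: " + rawLabels[i]);
            }
            encoded[i] = code;
        }
        return encoded;
    }

    public static void main(String[] args) {
        double[] raw = {0, 1, 2, 4, 1}; // has a gap: 3 is missing
        Map<Double, Integer> index = buildIndex(raw);
        System.out.println(Arrays.toString(encode(raw, index)));
    }
}
```

Keeping the `index` map around also records what each class value represents, as the parameter description recommends.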
 
 - 
getPrediction
Get a prediction (output variable) based on the passed input array. The order of the elements in the x array must match the order that was used in the training data. The Double output references the unique values that were used in the training data (0, 1, 2, etc).
- Parameters:
 x- an array containing the input variables to use in the regression
- Returns:
 
 - 
getDistribution
public double[] getDistribution(double[] x)
Passes the input array into the current regression model and returns a double array containing the percentage value for each of the possible output classifications. Thus, if there are 3 potential classes (0, 1 and 2), the method returns a 3-element array in which each index holds the probability of the input falling into the corresponding class.
- Parameters:
 x- an array containing the input variables to use in the regression
- Returns:
 
 - 
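Since each index of the returned distribution corresponds to one class value, the most likely class is the index holding the largest entry. The following is a minimal sketch of that lookup on a plain array. `DistributionUtil` and `mostLikelyClass` are hypothetical names introduced for illustration, and the sample values are made up. The result does not depend on whether the entries are fractions or 0-100 percentages.

```java
/** Hypothetical helper for illustration only; not part of LogRegWeka. */
class DistributionUtil {

    /** Returns the index (class value) with the largest entry; ties go to the lowest index. */
    static int mostLikelyClass(double[] distribution) {
        if (distribution == null || distribution.length == 0) {
            throw new IllegalArgumentException("distribution must be non-empty");
        }
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] distribution = {0.2, 0.7, 0.1}; // illustrative 3-class output
        System.out.println("Most likely class: " + mostLikelyClass(distribution));
    }
}
```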
getCoefficients
public double[][] getCoefficients()
Return the coefficients of the classifier
- Returns:
 
 - 
getCoeffUncertainty
public double[][] getCoeffUncertainty() - 
getAttFromProb
Important: this method ONLY works for binomial (2-class) datasets with a single attribute x, and will return null if that is not true
Given the probability p of classification as the second class, this method returns the attribute x required. If interested in the probability of classification as the first class, pass the value 1-p instead.
The equation solved is $P = \frac{1}{1 + e^{-(b_0 + b_1 x)}}$, where $P$ is the probability desired for the second class, $b_0$ and $b_1$ are the coefficients, and $x$ is the value that is solved for.
For example, suppose there are 2 possible classes, undetected (y=0) and detected (y=1), and a single input attribute, 'range'. To find the range required for a 70% probability of classification as detected (y=1), call this method and pass it 0.7. To find the range required for a 70% probability of classification as undetected (y=0), call this method and pass it 0.3.
- Parameters:
 p- the probability desired
- Returns:
 - a Double value for the attribute, or null if this method fails
 - Throws:
 ArithmeticException- thrown if the attribute calculation returns infinity or NaN
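Rearranging the logistic equation for the attribute gives x = (ln(p / (1 - p)) - b0) / b1. The following is a minimal sketch of that calculation, assuming the coefficients b0 and b1 are already known. `InverseLogistic` and its methods are hypothetical names, the coefficient values are made up, and the sign convention LogRegWeka uses for its coefficients is not stated here, so this is not necessarily how getAttFromProb is implemented.

```java
/** Hypothetical helper for illustration only; not part of LogRegWeka. */
class InverseLogistic {

    /** Forward model: P = 1 / (1 + e^-(b0 + b1*x)). */
    static double probability(double x, double b0, double b1) {
        return 1.0 / (1.0 + Math.exp(-(b0 + b1 * x)));
    }

    /** Solves the forward model for x given the desired probability p. */
    static double attributeForProbability(double p, double b0, double b1) {
        double x = (Math.log(p / (1.0 - p)) - b0) / b1;
        if (Double.isNaN(x) || Double.isInfinite(x)) {
            // Happens for p <= 0, p >= 1, or b1 == 0.
            throw new ArithmeticException("Attribute calculation returned infinity or NaN");
        }
        return x;
    }

    public static void main(String[] args) {
        double b0 = -3.0; // illustrative intercept
        double b1 = 0.5;  // illustrative slope
        double x = attributeForProbability(0.7, b0, b1);
        System.out.println("x for p = 0.7: " + x);
        System.out.println("round trip p: " + probability(x, b0, b1));
    }
}
```

Passing 1 - p instead of p yields the attribute value for the same probability of the first class, matching the description above.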
 - 
main
Test model using data from https://machinelearningmastery.com/logistic-regression-tutorial-for-machine-learning/
- Parameters:
 args-
 - 
getModelError
Return exception string from model if it failed to fit.
- Returns:
 - the modelError
 
 
 -