README
---
SBB Java v1
Date: May 9, 2014

Contents of this archive:
    SBB-J_vXX.jar
    parameters.arg
    dataset.train
    dataset.test

Description:
This software is a machine learning algorithm for classification tasks and
comes as a single JAR file. It is currently enclosed in a wrapper in order
to fulfill the FCUBE running requirements. Below you will find the
operational requirements of this software, split into two categories:
training and prediction.

Please Note: This program requires Java 1.7 to run, but I assume you have
that installed because not having Java 1.7 installed is so 2010.

Training:
Individuals are represented as Teams of Learners and Points in an
environment. Teams proceed by enforcing actions on Points, which yield an
outcome. When a Team attempts to act, it queries every Learner on the Team
about the best course of action and each Learner presents its suggestion.
The Team then chooses an action, and the Learner whose action is chosen is
given priority in later queries. Teams grow through each iteration and then
compete against each other on the test dataset. The Teams that presented
the most beneficial suggestions over time are considered when deciding
which Team is the best. When the best Team is found, it is saved to
"bestTeam.model" in the local directory. The program will then terminate.

Training Requirements:
As per the guidelines of FCUBE, the training session can be initiated by
executing SBB-J_vXX.jar as follows:

    java -jar SBB-J_vXX.jar -train datasetPath -minute min -properties argsPath

When these three options are given together, SBB-J will initiate training
mode. If any of them is absent, the software will not proceed.

In the above execution command, "datasetPath" is the path to the dataset
to be used for training. This dataset's values may be separated by commas
or spaces.
Each line of this dataset must contain some number of doubles (as defined
by the setDim parameter below) and a single double or long value to act as
a label. The number of lines in this dataset is defined by the trainSetSize
parameter below.

The -minute option takes a value "min", an integer giving the number of
minutes the software is allowed to train for. This time is enforced
internally: the software will cease training at the start of the first
generation that begins after the timer ends. When the timer ends, the
training software moves into a test phase where it determines which Team is
the best and saves it as a model file named "bestTeam.model" in the local
directory.

The -properties option accepts a path to a file containing additional
arguments for use by the SBB-J software. It can read Java-style properties
files (parameterName=parameterValue) as long as they contain no comments.
The accepted (and required) parameters are as follows:

seed: Default = 0.
    This parameter allows you to change the seed given to the random number
    generator.

envType: Default = datasetEnv.
    This is an internal distinction for the way SBB-J proceeds. You won't
    need to change this.

Psize: Default = 16.
    This is the maximum Point population size. When generating Points, the
    population will be this value.

Msize: Default = 16.
    This is the maximum Team population size. When generating Teams, the
    population will be this value.

pd: Default = 0.1.
    This is the probability that Learners will be modified by deletion.
    This is usually not modified.

pa: Default = 0.1.
    This is the probability that Learners will be modified by addition.
    This is usually not modified.

mua: Default = 0.1.
    This is the probability that Actions will be modified by mutation.
    This is usually not modified.

omega: Default = 9.
    This is the maximum size that any Team can be (how many Learners each
    has). Do not confuse this with Msize.

t: Default = 100.
    This is the number of generations that the training phase will run.

Pgap: Default = 4.
    This is the number of Points that are lost during selection. The
    population tends to be Psize - Pgap.

Mgap: Default = 4.
    This is the number of Teams that are lost during selection. The
    population tends to be Msize - Mgap.

trainSetName: Default = dataset.train.
    This is the dataset to be used for training. It is overridden by the
    training execution command.

testSetName: Default = dataset.test.
    This is the dataset to be used for testing. It is overridden by the
    test execution command below.

trainSetSize: Default = 3772.
    The training set size (lines in dataset). The program will not run in
    training mode if this is incorrect.

testSetSize: Default = 3428.
    The testing set size (lines in dataset). This does not need to be
    modified in this release.

setDim: Default = 21.
    The number of features of the environment. The program will not run in
    training mode if this is incorrect.

maxProgSize: Default = 48.
    The largest number of program commands a Learner can have. There's no
    need to change this value.

pBidMutate: Default = 0.1.
    This is the probability that bids will be modified by mutation.

pBidSwap: Default = 0.1.
    This is the probability that bids will be modified by swapping.

pBidDelete: Default = 0.1.
    This is the probability that bids will be modified by deletion.

pBidAdd: Default = 0.1.
    This is the probability that bids will be modified by addition.

statMod: Default = 10.
    This is how often statistical information will appear; currently once
    every statMod generations.

Testing:
The software will read in a Team model file, initialize a new environment
of Points, then evaluate the Team against the Points. As the Team attempts
to classify each Point, it will print its prediction to an output file.
When this process concludes, the program will terminate.
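
Once a prediction run has written its output file, the predictions can be
compared against known labels. The sketch below is a hypothetical helper and
not part of SBB-J: the class PredictionScorer and its method names are made
up for illustration. It assumes a labelled dataset whose label is the last
value on each line, and a prediction file holding one label per line in the
same order as the dataset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for illustration only; not part of SBB-J.
final class PredictionScorer {

    // Splits a line on commas and/or whitespace.
    static String[] splitFields(String line) {
        return line.trim().split("[,\\s]+");
    }

    // ASSUMPTION: the label is the last value on each dataset line.
    static double labelOf(String datasetLine) {
        String[] fields = splitFields(datasetLine);
        return Double.parseDouble(fields[fields.length - 1]);
    }

    // Returns the fraction of predictions that equal the matching label.
    static double accuracy(List<String> datasetLines, List<String> predictionLines) {
        if (datasetLines.isEmpty() || datasetLines.size() != predictionLines.size()) {
            throw new IllegalArgumentException(
                "need the same non-zero number of dataset lines and predictions");
        }
        int correct = 0;
        for (int i = 0; i < datasetLines.size(); i++) {
            double predicted = Double.parseDouble(predictionLines.get(i).trim());
            if (predicted == labelOf(datasetLines.get(i))) {
                correct++;
            }
        }
        return (double) correct / datasetLines.size();
    }

    // Usage: java PredictionScorer labelledDatasetPath predictionsPath
    public static void main(String[] args) throws IOException {
        List<String> dataset =
            Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> predictions =
            Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        System.out.println("accuracy = " + accuracy(dataset, predictions));
    }
}
```

Labels are compared as doubles, so a prediction written as "1" matches a
dataset label written as "1.0". Blank lines in either file are not skipped
and will cause an error in this sketch.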

Testing Requirements:
As per the guidelines of FCUBE, the testing session can be initiated by
executing SBB-J_vXX.jar as follows:

    java -jar SBB-J_vXX.jar -predict datasetPath -model teamPath -o outputPath

When these three options are given together, SBB-J will initiate testing
mode. If any of them is absent, the software will not proceed.

In the above execution command, "datasetPath" is the path to the dataset to
be used for testing. This dataset's values may be separated by commas or
spaces.

The -model option is followed by "teamPath", a path to the Team model file
to be tested. By default, the training software creates a model file named
"bestTeam.model" and stores it in the local directory.

The -o option is followed by "outputPath", a path to the file where the
Team's predictions will be saved. These predictions are stored as one label
per line, as per the FCUBE requirements.

Note that the testing software does not require a parameters file. The
restrictions of the training program (such as setDim) are lifted during the
testing phase.

Finally:
The above should be enough information to use the SBB-J software with
FCUBE. If you have any questions or do something weird (or even not weird)
to this software and it breaks, feel free to e-mail Robert
(Robert.Smith@dal.ca). Hopefully it doesn't come to that!
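
Because training will not start when trainSetSize or setDim does not match
the dataset, it can be useful to check a dataset file before launching a
run. The following is a minimal sketch of such a check and is not part of
SBB-J: the class DatasetCheck and its methods are hypothetical names. It
assumes every line must hold exactly setDim values plus one label, separated
by commas or spaces, and it uses Double.parseDouble as the numeric test,
which may not match SBB-J's own parsing in every case.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical pre-flight check for illustration only; not part of SBB-J.
final class DatasetCheck {

    // Splits a line on commas and/or whitespace.
    static String[] splitFields(String line) {
        return line.trim().split("[,\\s]+");
    }

    // True if the line holds setDim values plus one label, all numeric.
    static boolean isValidRow(String line, int setDim) {
        String[] fields = splitFields(line);
        if (fields.length != setDim + 1) {
            return false;
        }
        try {
            for (String field : fields) {
                Double.parseDouble(field);
            }
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Returns a description of the first problem found, or null if none.
    static String describeProblem(List<String> lines, int setDim, int trainSetSize) {
        if (lines.size() != trainSetSize) {
            return "expected " + trainSetSize + " lines but found " + lines.size();
        }
        for (int i = 0; i < lines.size(); i++) {
            if (!isValidRow(lines.get(i), setDim)) {
                return "line " + (i + 1) + " does not hold " + setDim
                    + " numeric values plus a label";
            }
        }
        return null;
    }

    // Usage: java DatasetCheck datasetPath setDim trainSetSize
    public static void main(String[] args) throws IOException {
        List<String> lines =
            Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        String problem = describeProblem(
            lines, Integer.parseInt(args[1]), Integer.parseInt(args[2]));
        System.out.println(problem == null ? "dataset looks consistent" : problem);
    }
}
```

With the default parameter values listed above, the check could be run as
"java DatasetCheck dataset.train 21 3772" after compiling the class. Blank
lines are counted as invalid rows in this sketch.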