Download binaries from the TreeExtra main web page. You can also download sources and compile them.
Prepare your data set following the instructions on input data format. You will need train, validation, test and attribute files. Here is a sample synthetic data set: data.train, data.valid, data.test, data.attr.
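If your data starts out as a single file, one way to obtain the train, validation and test files is a random split. The sketch below is only an illustration and rests on several assumptions: each data point occupies exactly one line with no header row (check the instructions on input data format), the 60/20/20 proportions are an arbitrary choice, and the helper name split_lines is hypothetical and not part of TreeExtra. The attribute file is not covered here.

```python
import random


def split_lines(lines, valid_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle data lines and split them into (train, valid, test) lists.

    Assumption: every non-empty line is one complete data point.
    """
    rows = [line for line in lines if line.strip()]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(rows)

    n_valid = int(len(rows) * valid_fraction)
    n_test = int(len(rows) * test_fraction)

    valid = rows[:n_valid]
    test = rows[n_valid:n_valid + n_test]
    train = rows[n_valid + n_test:]
    return train, valid, test
```

To use it, read the lines of your combined file, call split_lines on them, and write the three returned lists to data.train, data.valid and data.test.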
Create a new folder where you want to run this experiment and cd there. Output and temporary files will be placed in this folder.
Run ag_train (if needed, modify the file names in the following command line):
> ag_train -t data.train -v data.valid -r data.attr
The log output ends with a recommendation for which command to run next, most likely ag_expand. Keep following the recommendations (it often takes about 4 runs of ag_expand) until you have run ag_save.
... recommendation: ag_expand -n 11 -b 100
> ag_expand -n 11 -b 100
... recommendation: ag_expand -n 16 -b 140
> ag_expand -n 16 -b 140
... recommendation: ag_save -a 0.1 -n 11
> ag_save -a 0.1 -n 11
The best model is saved in the file model.bin.
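The "follow the recommendation" loop above can be scripted. What follows is a minimal sketch, not part of TreeExtra. It assumes that the recommended command appears in the log as text starting with ag_expand or ag_save and running to the end of the line, as in the transcript above; the real log wording may differ. The names next_command, follow_recommendations and run are hypothetical. In particular, run is a function you supply: it executes a command and returns the log text as a string.

```python
import re
import shlex

# Assumption: a recommendation is the text from "ag_expand" or "ag_save"
# to the end of the line, as in the transcript above.
_RECOMMENDATION = re.compile(r"\b(ag_expand|ag_save)\b[^\n]*")


def next_command(log_text):
    """Return the last recommended command in log_text as an argv list, or None."""
    matches = list(_RECOMMENDATION.finditer(log_text))
    if not matches:
        return None
    return shlex.split(matches[-1].group(0).strip())


def follow_recommendations(run, first_command, max_steps=20):
    """Run first_command, then each recommended command, stopping after ag_save.

    run is a caller-supplied (hypothetical) function: it takes an argv list,
    executes it, and returns the resulting log text.
    """
    history = []
    command = first_command
    for _ in range(max_steps):
        log_text = run(command)
        history.append(command)
        if command[0] == "ag_save":
            break  # the model has been saved; nothing left to follow
        command = next_command(log_text)
        if command is None:
            raise RuntimeError("no recommendation found in the log")
    else:
        raise RuntimeError("ag_save was not reached within max_steps")
    return history
```

A real run function could wrap subprocess.run(argv, capture_output=True, text=True) and return its stdout, assuming the tools print the log there. Passing run in as a parameter also lets you try the loop against canned log text before touching real data.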
Run ag_predict on the test data:
> ag_predict -p data.test -r data.attr
... RMSE: 0.253168
That's it. The predictions on the test set are saved in preds.txt.
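For reference, the RMSE reported by ag_predict is the root mean squared error between predictions and true values. The sketch below shows the standard formula and how you might recompute it yourself. It assumes that preds.txt holds one numeric prediction per line, in the same order as the test data, and that you have extracted the true response values into a separate one-value-per-line file. The helper names rmse and read_column are hypothetical.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equally long numeric sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("at least one value is required")
    squared_error = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared_error / len(predictions))


def read_column(path):
    """Read one float per non-empty line (assumed layout of preds.txt)."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]
```

For example, rmse(read_column("preds.txt"), read_column("targets.txt")) would recompute the score, where targets.txt is a hypothetical file name for the extracted true values.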
If you can afford a longer running time, I recommend repeating the same experiment in slow mode, which produces a more accurate model. To do so, run ag_train with the additional flag -s slow. The rest of the process is the same.
> ag_train -t data.train -v data.valid -r data.attr -s slow
> ag_expand ...
> ag_expand ...
...
> ag_save ...
> ag_predict -p data.test -r data.attr
... RMSE: 0.239593
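To put the two runs side by side, the relative reduction in RMSE can be computed from the two values reported above. This is plain arithmetic on the numbers from this sample data set; other data sets will give different figures.

```python
fast_rmse = 0.253168  # RMSE from the fast-mode run above
slow_rmse = 0.239593  # RMSE from the slow-mode run above

# Fraction by which slow mode lowered the test error.
relative_reduction = (fast_rmse - slow_rmse) / fast_rmse
print(f"RMSE reduced by {relative_reduction:.1%}")  # prints "RMSE reduced by 5.4%"
```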
Check out the rest of the Additive Groves manual for other options such as parallelization, evaluation by ROC, and "superfast" training with fixed parameters.