Tabata is a concept learning algorithm. Given a learning set of
instances of k disjoint classes, Tabata outputs a concept description
for each class. Each description is made of a variable number of
conjunctive terms. Each term is expressed as a list of atoms. Instances
are also represented as lists of atoms (negation is not handled by
Tabata, so negative literals have to be explicitly defined as atoms).
A decision procedure is provided to classify test instances.
Tabata is fully described in the ECAI 98 paper. Here we only give the
details needed to use it.
 
 

 SOURCES OF TABATA:

The main directory (TABATADIR) is organized as follows:

DeciCNClass     DeciKnClass     Read-Me         Tabata          lib
DeciCNStat      DeciKnStat      TRIAL           bin

It contains a Read-Me (this file).
The sources of Tabata are in the subdirectory SRC of the directory
Tabata.
 
 

SRC
 

SRC contains the C sources of Tabata (*.c, *.h and a makefile,
mak_newtabata).  This makefile works on workstations running Solaris and
should be modified to build on other operating systems.  The makefile
uses gcc as the compiler. The resulting binary "Tabata" is put in the
"bin" directory.

Some parameters of Tabata are "hard coded". For instance, the maximum
number of classes that can be processed is 40. If you want to run Tabata
with more than 40 classes, you must change the parameter CARDCLASSES in
the ".h" file and then recompile Tabata with "make -f mak_newtabata".
Note that recompiling Tabata requires the directories "bin" and "lib"
(see the makefile for more details).
The "hard coded" parameters, which can be found in the files
"definitions0.h", "definitions1.h" and "definitions2.h", are the
following (the default value is given in the last parentheses):

CARDCLASSES (maximum number of classes)(40),
LONGINSTANCES (maximum size of an instance (string of bits))(600),
LONGDIVERS (maximum size of the name of classes)(50),
MAXTABOU (maximum number of elements of the tabu list)(50),
MAXPOS (maximum number of positive instances)(20000),
MAXNEG (maximum number of negative instances)(20000).
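
As an illustration, a data file can be checked against these limits
before a run. The sketch below is not part of the distribution: the
helper name check_limits is hypothetical, the values are the defaults
listed above, and the file layout is the one described in the TRIAL
section (a string of bits, a blank, then the class name). Two points
are assumptions: a value is only flagged when it is strictly above the
limit, and the negatives of a class are taken to be all the instances
of the other classes.

```python
from collections import Counter

# Default limits as listed in this Read-Me (the definitions*.h files
# may differ if Tabata was recompiled).
LIMITS = {
    "CARDCLASSES": 40,
    "LONGINSTANCES": 600,
    "LONGDIVERS": 50,
    "MAXPOS": 20000,
    "MAXNEG": 20000,
}


def check_limits(lines, limits=LIMITS):
    """Return a list of limit violations (empty if none were found).

    `lines` is an iterable of "bits class" strings.
    Hypothetical helper, not shipped with Tabata.
    """
    problems = []
    per_class = Counter()
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or malformed line: ignored in this sketch
        bits, name = fields
        if len(bits) > limits["LONGINSTANCES"]:
            problems.append(f"line {number}: instance longer than LONGINSTANCES")
        if len(name) > limits["LONGDIVERS"]:
            problems.append(f"line {number}: class name longer than LONGDIVERS")
        per_class[name] += 1
    if len(per_class) > limits["CARDCLASSES"]:
        problems.append("more classes than CARDCLASSES")
    total = sum(per_class.values())
    for name, positives in per_class.items():
        # Assumption: the instances of a class are its positives and
        # all the other instances are its negatives.
        if positives > limits["MAXPOS"]:
            problems.append(f"class {name}: more positives than MAXPOS")
        if total - positives > limits["MAXNEG"]:
            problems.append(f"class {name}: more negatives than MAXNEG")
    return problems
```

An empty list means that no limit was exceeded under these assumptions.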

Remark: running Tabata without parameters returns the values of these
"frozen parameters".
 

TRIAL

TRIAL contains the material to test Tabata. The trial files have
been built from instances of the "waveform problem" from the UCI Irvine
databases (300 learning instances and 4700 test instances). The
instances are described using 21 numerical variables. A prior
discretization of the instances has been performed, resulting in 295
atoms. This preprocessing depends on the learning instances but does
not depend on the classes (see the paper for details).

"wave_trial.app" contains the learning set (have a look at it to see
its format).  "wave_trial.test" contains the test set.

Each line of these files is organized as follows:  a string of n bits
giving the atomic representation of the instance, then a blank
followed by a string naming the class to which the instance belongs.
The n bits correspond to the n atoms used to describe the instances.
Each bit is set to 1 if the corresponding atom belongs to the instance,
and to 0 otherwise.
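
To make the layout concrete, here is a minimal sketch that reads one
such line. The function name parse_instance is hypothetical, atoms are
numbered from 0 for convenience, and class names are assumed to contain
no blank.

```python
def parse_instance(line):
    """Split one "bits class" line into (set of atom indices, class name)."""
    bits, class_name = line.split()
    if set(bits) - {"0", "1"}:
        raise ValueError("the instance must be a string of 0s and 1s")
    # Atom i belongs to the instance when bit i is 1 (0-based numbering).
    atoms = {i for i, bit in enumerate(bits) if bit == "1"}
    return atoms, class_name
```

For instance, the line "01101 2" describes an instance of class 2 that
contains the atoms numbered 1, 2 and 4.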

As described in the paper, Tabata has 7 parameters.

1) InstancesFile (String): name of the learning file, which contains a
set of instances (expressed as strings of bits), each instance being
associated with a class name.

2) TabuSize (integer): size of the Tabu list (typically 10; it must be
less than MAXTABOU).

3) Starting (T or B): T means that the search starts from the Top of
the lattice, and B means that it starts from the Bottom of the
lattice (generally T is preferred).

4) Closure (O, N or M): concerns the computation of the neighborhood of
the current term. Tabata computes all most specific generalisations and
all most general specialisations of the current term.  N means that
specialisation is purely syntactic (adding one atom). O means that
syntactic specialisations are transformed into closed terms (most
specific ones). M is like N, but the current term is first transformed
into a closed term. O is usually the fastest since it searches a
reduced space, especially when the instance description is redundant.

5) OrderTest (O, N): concerns the last step of the computation of the
upper and lower covers of the current term. During this step, terms
that are NOT most specific generalisations (together with terms that
are NOT most general specialisations) are eliminated. N means that this
last step is not performed (which saves computation); O means that it
is performed (O is the usual value).

6) MaxSearch (Integer): this parameter is the stopping criterion.
If the current best term is not updated after $MaxSearch neighborhood
computations, the search is stopped.
 

7) ResultsPrefix (String): the results of Tabata are saved in
various files. The prefix of all these files is $ResultsPrefix. As
Tabata learns multiple concepts, it generates two files for each concept
(or Class).

The first one, $ResultsPrefix.SAT.$ClassName, contains the terms learned
by Tabata (left-hand sides of rules concluding $ClassName). Each term is
a closed term (when Tabata is run with $Closure = N, the closure is
performed before building $ResultsPrefix.SAT.$ClassName). Each line
contains a binary string (the term) followed by $ClassName, and then
by the numbers of positive and negative learning instances covered by
the term (usually the last number is 0).

The second one, $ResultsPrefix.DESAT.$ClassName, also contains the terms
learned by Tabata, but each learned (closed) term is first transformed
into a most general term that covers the same positive and negative
instances as the learned term.

Two more files, $ResultsPrefix.SATALL and $ResultsPrefix.DESATALL,
respectively contain the concatenation of all
$ResultsPrefix.SAT.$ClassName files and the concatenation of all
$ResultsPrefix.DESAT.$ClassName files.
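
As an illustration of the line format of the result files, the sketch
below reads one line of a $ResultsPrefix.SAT.$ClassName file. The names
LearnedTerm and parse_term_line are hypothetical, and the four fields
are assumed to be separated by blanks.

```python
from typing import NamedTuple


class LearnedTerm(NamedTuple):
    bits: str        # the term, one bit per atom
    class_name: str  # class concluded by the term
    positives: int   # positive learning instances covered by the term
    negatives: int   # negative learning instances covered (usually 0)


def parse_term_line(line):
    """Read one "term class positives negatives" line of a result file."""
    bits, class_name, positives, negatives = line.split()
    return LearnedTerm(bits, class_name, int(positives), int(negatives))
```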
 
 
 
 
 
 

Running "Tabata wave_trial.app 7 T O O 10 TOTO" generates the following
files:

TOTO.DESAT.0, TOTO.DESAT.1, TOTO.DESAT.2 contain the results of
Tabata for the three-class (0, 1, 2) problem whose learning
instances are in the file wave_trial.app. Here the terms are most
general ones. TOTO.DESATALL contains the terms of the three files
above.

TOTO.SAT.0, TOTO.SAT.1, TOTO.SAT.2 contain the results of Tabata
for the same three-class problem, but here the terms are most
specific ones. TOTO.SATALL contains the terms of the three files
above.
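
The command line and the names of the generated files can be sketched
as follows. Both helper names (tabata_command, result_files) are
hypothetical; the order of the arguments and the file naming are the
ones described above.

```python
def tabata_command(instances_file, tabu_size, starting, closure,
                   order_test, max_search, results_prefix):
    """Build the argument list of one Tabata run (the 7 parameters, in order)."""
    if starting not in ("T", "B"):
        raise ValueError("Starting must be T or B")
    if closure not in ("O", "N", "M"):
        raise ValueError("Closure must be O, N or M")
    if order_test not in ("O", "N"):
        raise ValueError("OrderTest must be O or N")
    return ["Tabata", instances_file, str(tabu_size), starting, closure,
            order_test, str(max_search), results_prefix]


def result_files(results_prefix, class_names):
    """List the files a run is expected to write, following the naming above."""
    files = []
    for kind in ("SAT", "DESAT"):
        files += [f"{results_prefix}.{kind}.{name}" for name in class_names]
        files.append(f"{results_prefix}.{kind}ALL")
    return files
```

The list returned by tabata_command can be passed to subprocess.run,
assuming that the Tabata binary is on the PATH.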
 

   DECISION:
 

Tabata comes with decision functions that take as inputs
concept definitions, as learned by Tabata, together with
test instances. As explained in the paper, there are two ways to apply
learned terms to new instances. The first one (referred to as [CN]) is
the usual way of handling unordered rules (as in the CN algorithm):

1) if one term matches the instance, the result is the corresponding
Class,

2) if several terms apply, the Class corresponding to the
highest sum of weights is returned,

3) if no term matches the instance, the default Class is returned.
This last case is rare when terms are most general ones, as new
instances most often match some term. So this mode should be used with
$ResultsPrefix.DESATALL files.
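
The three cases above can be sketched as follows. This is only an
illustration, not the code of DeciCNStat or DeciCNClass: the names
covers and decide_cn are hypothetical, the weights and the default
Class are passed in by the caller because the way they are obtained is
not detailed here, and ties are broken in favour of the class met
first (an assumption).

```python
from collections import defaultdict


def covers(term, instance):
    """A conjunctive term matches when each of its atoms is in the instance."""
    return all(i == "1" for t, i in zip(term, instance) if t == "1")


def decide_cn(terms, instance, default_class):
    """terms: iterable of (bits, class_name, weight) triples."""
    votes = defaultdict(int)
    for bits, class_name, weight in terms:
        if covers(bits, instance):
            votes[class_name] += weight
    if not votes:
        return default_class  # case 3: no term matches
    # Cases 1 and 2: the class with the highest sum of weights.
    return max(votes, key=votes.get)
```

Terms and instances are bit strings of the same length, as in the
learning and result files.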

The second one (referred to as [k=n]) is devoted to closed terms (i.e.
most specific terms) and is similar to the decision procedures usually
used in instance-based learning (see RISE for instance). In this
case, the decision function is the same as [CN] in the first two
situations (one term or several terms match the new instance) but
differs when no term matches: it then uses a k-nearest-neighbour
approach to classify the new instance. This mode should be used with
$ResultsPrefix.SATALL files.
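
A possible reading of this second mode is sketched below. The distance
actually used by DeciKnStat and DeciKnClass is not specified in this
file; the sketch assumes that the distance between a term and an
instance is the number of atoms of the term that are missing from the
instance, and that the k nearest terms vote with their weights. The
names distance and decide_knn are hypothetical, and the list of terms
is assumed to be non-empty.

```python
from collections import Counter


def distance(term, instance):
    """Assumed distance: atoms required by the term but absent from the instance."""
    return sum(1 for t, i in zip(term, instance) if t == "1" and i == "0")


def decide_knn(terms, instance, k=1):
    """terms: list of (bits, class_name, weight); returns the predicted class."""
    ranked = sorted(terms, key=lambda term: distance(term[0], instance))
    matching = [term for term in ranked if distance(term[0], instance) == 0]
    # Matching terms vote as in [CN]; otherwise the k nearest terms vote.
    voters = matching if matching else ranked[:k]
    votes = Counter()
    for _bits, class_name, weight in voters:
        votes[class_name] += weight
    return votes.most_common(1)[0][0]
```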

For each decision type ([CN] and [k=n]) we provide two programs.  The
first one gives statistics about the test instances and requires the
correct Class of each test instance as input. Its name is
either DeciCNStat ([CN] case) or DeciKnStat ([k=n] case). The second
one gives the predicted Class for each test instance. Its name is
either DeciCNClass or DeciKnClass.

--------------

DeciKnStat (subdirectory SRC of directory DeciKnStat) has five parameters:
 

ClassNames (string): name of the file containing the names of the
classes (as they appear in $ResultsPrefix.SATALL).

LearnedTerms (string): name of the file containing the results of
learning (generally $ResultsPrefix.SATALL).

TestFile (string): name of the file containing the test instances. Each
line contains an instance (a binary string) followed by the correct
Class name to predict.

kNN (integer): the number k of nearest neighbours (usually 1).

Weighting (M or m):  if several terms apply, the Class corresponding
to the highest sum of weights is returned. Each term has a weight. When
$Weighting is m, all weights are equal to 1. When  $Weighting is M the
weight of each term is the number of positive instances covered by this
term (as found in $ResultsPrefix.xxALL).

DeciKnStat only returns the overall accuracy (ratio of correctly
classified test instances).

Running "DeciKnStat Classes TOTO.SATALL wave_trial.test 1 M" returns
0.787447.
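
The accuracy is a plain ratio, as in the minimal sketch below (the
function name accuracy is hypothetical). With the 4700 test instances
of the trial, the value 0.787447 is consistent with 3701 correctly
classified instances (3701 / 4700 = 0.787446...).

```python
def accuracy(predicted, expected):
    """Ratio of correctly classified test instances."""
    if len(predicted) != len(expected):
        raise ValueError("one prediction per test instance is required")
    if not expected:
        raise ValueError("empty test set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```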

-------------

DeciKnClass (subdirectory SRC of directory DeciKnClass) has five parameters:

ClassNames (string)

LearnedTerms (string)

UnclassifiedFile (string): name of the file containing the test
instances. Here each line only contains the instance (a binary
string).

kNN (integer)

Weighting (M or m)

DeciKnClass returns a set of lines. Each line corresponds to a test
instance and contains the predicted Class.

Running "DeciKnClass Classes TOTO.SATALL wave_trial.unknown 1 M", where
file "wave_trial.unknown" contains six unclassified instances, returns
0
0
1
1
2
2
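
An UnclassifiedFile can be derived from a TestFile by dropping the
class name at the end of each line. A minimal sketch (the helper name
strip_classes is hypothetical, and blank lines are skipped):

```python
def strip_classes(test_lines):
    """Keep only the bit string of each "bits class" line."""
    return [line.split()[0] for line in test_lines if line.strip()]
```

Writing the returned strings one per line gives a file in the format
expected by DeciKnClass.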

--------------

DeciCNStat (subdirectory SRC of directory DeciCNStat) has three parameters:

ClassNames (string)

LearnedTerms (string): (generally $ResultsPrefix.DESATALL).

TestFile (string)
 

DeciCNStat only returns the overall accuracy (ratio of correctly
classified test instances).

Running "DeciCNStat Classes TOTO.DESATALL wave_trial.test" returns 0.746809.
 

 -------------

DeciCNClass (subdirectory SRC of directory DeciCNClass) has three parameters:

ClassNames (string)

LearnedTerms (string)

UnclassifiedFile (string): name of the file containing the test
instances. Here each line only contains the instance (a binary
string).

DeciCNClass returns a set of lines. Each line corresponds to a test
instance and contains the predicted Class.

Running "DeciCNClass Classes TOTO.DESATALL wave_trial.unknown" returns
0
0
1
1
2
2

---------------
Details and examples of how to code attributes (numeric, hierarchical,
nominal, boolean) as lists of atoms can be found in the companion file
"CodeMe".