text/misc/dbacl-1.3.lha

Short:DBACL - digramic Bayesian classifier
Author:"Laird A. Breyer" laird at lbreyer.com
Uploader:Diego Casorran <dcr8520 amiga org>
Type:text/misc
Version:1.3
Architecture:m68k-amigaos
Date:2003-08-24
Download:text/misc/dbacl-1.3.lha
Readme:text/misc/dbacl-1.3.readme
Downloads:364

PURPOSE

dbacl is a command line program which can be used to categorize
several types of text documents. Each document category is
constructed as a maximum entropy language model, with respect to
a reference measure based on digrams (character pairs).

Before recognition can take place, each category must be "learned"
from a text corpus. For example, an English category could be based on
a text file containing the collected works of Shakespeare. Project
Gutenberg (http://promo.net/pg/) makes many public domain works freely
available in electronic form.
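The learning step can be illustrated with a much simpler model than the one
dbacl actually builds. The Python sketch below (DigramModel and its methods
are hypothetical names, not part of dbacl) counts character digrams in a
training corpus and scores new text by its smoothed digram log-likelihood.
It uses add-one smoothing instead of dbacl's maximum entropy fit, so treat it
only as an illustration of what learning a category from character pairs
means.

```python
import math
from collections import Counter


class DigramModel:
    """Hypothetical sketch of a category learned from a text corpus.

    Counts character pairs and applies add-one smoothing. This is NOT
    dbacl's maximum entropy model, only the digram statistics it rests on.
    """

    def __init__(self, corpus: str):
        self.pairs = Counter(zip(corpus, corpus[1:]))  # counts of pairs (a, b)
        self.firsts = Counter(corpus[:-1])             # counts of a as left char
        # Alphabet size for smoothing; +1 lumps all unseen characters together.
        self.vocab = len(set(corpus)) + 1

    def prob(self, a: str, b: str) -> float:
        """Smoothed estimate of P(next char is b | current char is a)."""
        return (self.pairs[(a, b)] + 1) / (self.firsts[a] + self.vocab)

    def log_likelihood(self, text: str) -> float:
        """Sum of log P(b | a) over consecutive character pairs of text."""
        return sum(math.log(self.prob(a, b)) for a, b in zip(text, text[1:]))
```

Learning one such model per corpus and scoring the same document under each
model gives the per-category scores that the Bayesian comparison in the next
paragraph works from.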

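After learning, any number of text files can be compared, in terms
of Bayesian posterior probabilities, with up to 128 learned categories
in the default build. This limit is a compile-time setting in dbacl.h
(see BUILDING), and since it can be raised there, the practical number
of categories is limited only by available memory.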
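Turning per-category scores into posterior probabilities is a standard
application of Bayes' rule. The Python sketch below shows the general
technique, not dbacl's code: it takes hypothetical per-category
log-likelihoods, adds log priors (uniform unless given) and normalises with
the log-sum-exp trick. That trick matters because log-likelihoods of long
documents are large negative numbers whose exponentials underflow to zero.

```python
from __future__ import annotations

import math


def posteriors(log_likelihoods: dict[str, float],
               log_priors: dict[str, float] | None = None) -> dict[str, float]:
    """Hypothetical sketch: P(category | document) for every category.

    log_likelihoods maps each category to log P(document | category);
    log_priors maps each category to log P(category), uniform by default.
    """
    n = len(log_likelihoods)
    if log_priors is None:
        log_priors = {c: -math.log(n) for c in log_likelihoods}
    # Unnormalised log posterior: log P(document | c) + log P(c).
    joint = {c: ll + log_priors[c] for c, ll in log_likelihoods.items()}
    # Subtract the maximum before exponentiating so nothing underflows.
    m = max(joint.values())
    total = sum(math.exp(v - m) for v in joint.values())
    return {c: math.exp(v - m) / total for c, v in joint.items()}
```

For example, posteriors({"english": -1000.0, "french": -1002.0}) gives the
english category a posterior of 1/(1 + e^-2), roughly 0.88, whereas
exponentiating -1000 directly would give 0.0 in double precision.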

dbacl is bundled with a few other utilities:

- bayesol is a postprocessor which takes the dbacl output and computes
  an optimal decision based on the costs of misclassification. Together
  with dbacl, this allows the construction of sophisticated multilingual
  classification scripts, if you're not afraid of some shell scripting.
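The principle behind a cost-based decision is the Bayes decision rule: choose
the decision whose expected cost, averaged over the posterior probabilities,
is smallest. The Python sketch below illustrates that rule only. The function
name and the nested-dict cost table are hypothetical, and bayesol's actual
risk file format (see the bundled *.risk examples and man bayesol) is not
modelled here.

```python
def optimal_decision(posterior: dict, cost: dict) -> str:
    """Hypothetical sketch of the Bayes (minimum expected cost) decision.

    posterior[c] is P(true category = c | document).
    cost[d][c] is the cost of deciding d when the true category is c.
    Returns the decision d that minimises sum_c cost[d][c] * posterior[c].
    """

    def expected_cost(d: str) -> float:
        return sum(cost[d][c] * p for c, p in posterior.items())

    return min(cost, key=expected_cost)
```

With posteriors of 0.7 for spam and 0.3 for notspam, and a cost of 10 for
discarding a legitimate message against 1 for letting spam through, the
expected costs are 3.0 for deciding spam and 0.7 for deciding notspam, so the
rule keeps the message even though spam is the more probable category.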

- mailcross performs email classification cross validation. It can be used
  to assess the performance of custom email classification scripts based on
  dbacl and bayesol.
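Cross validation repeatedly holds out part of a labelled collection, learns
on the rest and counts errors on the held-out part. The Python sketch below
shows plain k-fold cross validation as a general illustration; the function
names and the caller-supplied learn and classify callables are hypothetical,
and mailcross's actual procedure and options are described in man mailcross.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists; every item lands in exactly one test fold."""
    items = list(items)
    random.Random(seed).shuffle(items)       # reproducible random order
    folds = [items[i::k] for i in range(k)]  # k folds of nearly equal size
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def cross_validated_error(labelled, learn, classify, k=10):
    """Estimate a classifier's error rate by k-fold cross validation.

    labelled: list of (message_text, true_category) pairs
    learn:    hypothetical callable, list of training pairs -> model
    classify: hypothetical callable, (model, message_text) -> category
    """
    errors = 0
    for train, test in k_fold_splits(labelled, k):
        model = learn(train)
        errors += sum(classify(model, text) != label for text, label in test)
    return errors / len(labelled)
```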

- mailinspect reads an mbox-style mail folder and displays the emails sorted
  by their similarity to any given category.
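Sorting a mail folder by similarity to a category amounts to scoring each
message and ordering by score. The Python sketch below uses the standard
mailbox module to read an mbox-style folder; the function names and the
caller-supplied score function are hypothetical and do not reflect how
mailinspect computes its own similarity values.

```python
import mailbox


def rank_messages(messages, score):
    """Return (score, subject) pairs, most similar to the category first.

    messages: iterable of email messages, e.g. a mailbox.mbox object
    score:    hypothetical callable mapping a message body (str) to a
              similarity value, where higher means more similar
    """
    ranked = []
    for msg in messages:
        # Non-multipart bodies decode to bytes; multipart ones give None.
        body = msg.get_payload(decode=True) or b""
        text = body.decode("utf-8", errors="replace")
        ranked.append((score(text), msg.get("Subject", "")))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


def rank_mbox(path, score):
    """Rank every message in the mbox folder at path (must already exist)."""
    return rank_messages(mailbox.mbox(path, create=False), score)
```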
 
DOCUMENTATION

See the bundled manpages. Generic installation instructions can be found in
the file INSTALL. A tutorial can be found in the file tutorial.html, and an
exposition of the algorithms is in dbacl.ps.

LICENSE

dbacl is distributed under the terms of the GNU General Public License (GPL),
which can be found in the file COPYING. The hash function code in the file
jenkins.c was written by Bob Jenkins and is in the public domain.
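For readers curious about this kind of function, here is Bob Jenkins's
well-known one-at-a-time hash, written in Python with explicit 32-bit
masking. This is an assumption made for illustration: Jenkins published
several hash functions and jenkins.c may contain a different one, and the
function name below is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits, as in C


def one_at_a_time(data: bytes) -> int:
    """Bob Jenkins's one-at-a-time hash of a byte string, as a 32-bit int."""
    h = 0
    for byte in data:
        h = (h + byte) & MASK32
        h = (h + (h << 10)) & MASK32
        h ^= h >> 6
    # Final mixing spreads the influence of the last bytes across the word.
    h = (h + (h << 3)) & MASK32
    h ^= h >> 11
    h = (h + (h << 15)) & MASK32
    return h
```

A hypothetical hash table with 2**n buckets could use
one_at_a_time(key) & (2**n - 1) as the bucket index; for instance,
one_at_a_time(b"a") is 0xCA2E9442.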

BUILDING

Several configuration options in the file dbacl.h can be changed if you want
to increase the maximum number of categories or to optimize hash table
overhead.

To build and install the program, execute the following steps from within
the dbacl source directory:

./configure
make
make install 

The last step should be executed with superuser privileges for a system-wide
installation. Alternatively,

./configure --prefix=/home/xyzzy
make 
make install

builds and installs into user xyzzy's home directory, without the need for
root privileges. In this case, the following environment variables
should be set permanently (e.g. in the file .profile):

export PATH=$PATH:/home/xyzzy/bin
export MANPATH=$MANPATH:/home/xyzzy/man

INTERNATIONALIZATION

dbacl uses the current locale for processing. 8-bit clean multibyte
character sets (such as UTF-8) are supported in the default mode,
while arbitrary multibyte character sets require the -i command line option.
If you intend to use the -i option together with regular expressions,
you must build with a wide character POSIX regex library: ensure that
the Boost library is present on the system and type

./configure WIDE_REGEX=1
make 
make install

Warning: there is a large performance penalty if you build dbacl this way,
which shows up whenever you use regular expressions. Only build this way if
you need correct regular expressions in a multibyte environment which isn't 
8-bit clean.
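One way to see why byte-oriented regular expressions can go wrong in some
multibyte character sets is to check whether any byte of a multibyte
character falls in the ASCII range. The Python sketch below performs that
check; the function name is hypothetical, and this only illustrates one
failure mode rather than describing how dbacl itself classifies character
sets.

```python
from __future__ import annotations


def ascii_bytes_inside_multibyte(text: str, encoding: str) -> list[int]:
    """List the ASCII-range bytes (< 0x80) found inside the encoded form
    of the non-ASCII characters of text.

    An empty result means a byte-level tool cannot mistake part of a
    multibyte character for an ASCII character such as a backslash.
    """
    found = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # genuine ASCII characters are expected to be ASCII bytes
        found.extend(b for b in ch.encode(encoding) if b < 0x80)
    return found
```

In Shift_JIS the katakana character ソ is encoded as 0x83 0x5C, and 0x5C is
the ASCII backslash, which a byte-oriented regex engine treats as an escape
character. In UTF-8 every byte of a multibyte character is 0x80 or above, so
the check returns an empty list.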

OTHER DEPENDENCIES

The main filter programs dbacl and bayesol have no special dependencies, and
can always be compiled. 

mailinspect uses the readline and slang libraries for screen management in
interactive mode. The configure script checks for these libraries and, if it
can't find them, compiles mailinspect without interactive support.

mailcross is a bash shell script which calls awk and formail at various
points. It will test for the existence of these programs in your path and
refuse to run if it can't find them.
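mailcross performs its own check in bash; the Python sketch below only
illustrates the same idea of verifying dependencies before running. The
function name is hypothetical, and the default list simply mirrors the two
programs named above.

```python
import shutil


def missing_programs(names=("awk", "formail")):
    """Return the programs in names that cannot be found on the PATH."""
    # shutil.which returns None when no executable of that name is found.
    return [name for name in names if shutil.which(name) is None]
```

A wrapper script could call missing_programs() at start-up and refuse to
continue, printing whatever names it returns.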

RUNNING

There is a tutorial which you can read with any web browser: point it to the
file tutorial.html. For command line options and examples of possible use,
type the following after installation:

man dbacl
man bayesol
man mailcross
man mailinspect

You can also find a technical description of the algorithms and statistics
in the PostScript file dbacl.ps.

TUTORIAL SAMPLES

The tutorial.html document comes with several sample text files:

- sample1.txt and sample4.txt are extracts from Mark Twain, Huckleberry Finn
- sample2.txt, sample3.txt and sample5.txt are extracts from Douglas Adams,
  The Hitchhiker's Guide to the Galaxy

AUTHOR

Laird A. Breyer <laird@lbreyer.com>



ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`
`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø

Latest update of this package can be found at  http://amiga.sourceforge.net/

ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`ø°`
`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø`°ø




ARCHIVE CONTENTS
LhA Freeware Version 2.2
Copyright © 1991-94 by Stefan Boberg.
Copyright © 1998-2000 by Jim Cooper and David Tritscher.

Listing of archive 'dbacl-1.3.lha':
Original  Packed Ratio    Date     Time    Name
-------- ------- ----- --------- --------  -------------
   77560   37453 51.7% 15-Jan-03 21:19:12 +bayesol.040
  162288   60196 62.9% 15-Jan-03 21:17:22 +dbacl.040
    1566    1023 34.6% 05-Dec-02 01:34:44 +japanese.txt
    5977    2189 63.3% 15-Dec-02 06:51:14 +mailcross
    5757    2368 58.8% 29-Dec-02 01:32:38 +mailcross.1
  228180   86699 62.0% 15-Jan-03 21:07:48 +mailinspect
  226292   85822 62.0% 15-Jan-03 21:27:36 +mailinspect.040
    6199    2593 58.1% 29-Dec-02 01:36:30 +mailinspect.1
       0       0  0.0% 21-Nov-02 11:36:26 +NEWS
     168     124 26.1% 07-Dec-02 13:08:36 +prop.pl
    4525    2150 52.4% 29-Dec-02 01:25:44 +README
    3318    1674 49.5% 06-Dec-02 00:05:42 +sample1.txt
    2605    1318 49.4% 06-Dec-02 00:02:34 +sample2.txt
    3073    1535 50.0% 05-Dec-02 23:57:12 +sample3.txt
    3283    1653 49.6% 06-Dec-02 00:53:20 +sample4.txt
    3757    1851 50.7% 08-Dec-02 08:38:06 +sample5.txt
    4055    1869 53.9% 08-Dec-02 05:15:14 +sample6.txt
     136      96 29.4% 06-Dec-02 10:52:08 +toy.risk
   29582   10326 65.0% 15-Dec-02 07:01:20 +tutorial.html
    3274    1580 51.7% 12-Aug-02 04:16:10 +ylwrap
      31      31  0.0% 17-Oct-02 13:41:40 +AUTHORS
   77816   37613 51.6% 15-Jan-03 21:02:54 +bayesol
    4202    1844 56.1% 29-Dec-02 01:32:06 +bayesol.1
    1267     655 48.3% 29-Dec-02 01:26:08 +ChangeLog
   17992    7014 61.0% 12-Aug-02 04:16:10 +COPYING
  161124   59441 63.1% 15-Jan-03 20:59:38 +dbacl
   14851    5694 61.6% 29-Dec-02 01:31:36 +dbacl.1
  435463  182427 58.1% 29-Nov-02 01:13:30 +dbacl.ps
     318     166 47.7% 08-Dec-02 09:06:46 +example1.risk
     452     236 47.7% 08-Dec-02 09:06:46 +example2.risk
     492     258 47.5% 08-Dec-02 09:06:46 +example3.risk
-------- ------- ----- --------- --------
 1485603  597898 59.7% Operation successful.




_____________________________
.Readme created with:  MRea  \
==============================================================================
>»>»>»>»> Some additional info about this archive:

Source:   http://prdownloads.sf.net/amiga/dbacl-1.3.lha?download
FileSize: 599252 Bytes

CRC: EBCC5E0C
MD5: 7D06389E578478190ECF577E3B6F7F1E
SHA: 17F53C8D799561B112250241692A392E72135851
==============================================================================


Contents of text/misc/dbacl-1.3.lha
 PERMSSN    UID  GID    PACKED    SIZE  RATIO     CRC       STAMP          NAME
---------- ----------- ------- ------- ------ ---------- ------------ -------------
[generic]                37453   77560  48.3% -lh5- 7221 Jan 15  2003 dbacl-1.3/bayesol.040
[generic]                60196  162288  37.1% -lh5- 32da Jan 15  2003 dbacl-1.3/dbacl.040
[generic]                 1023    1566  65.3% -lh5- e132 Dec  5  2002 dbacl-1.3/japanese.txt
[generic]                 2189    5977  36.6% -lh5- 3796 Dec 15  2002 dbacl-1.3/mailcross
[generic]                 2368    5757  41.1% -lh5- d7e5 Dec 29  2002 dbacl-1.3/mailcross.1
[generic]                86699  228180  38.0% -lh5- d432 Jan 15  2003 dbacl-1.3/mailinspect
[generic]                85822  226292  37.9% -lh5- f424 Jan 15  2003 dbacl-1.3/mailinspect.040
[generic]                 2593    6199  41.8% -lh5- e6e2 Dec 29  2002 dbacl-1.3/mailinspect.1
[generic]                    0       0 ****** -lh0- 0000 Nov 21  2002 dbacl-1.3/NEWS
[generic]                  124     168  73.8% -lh5- d782 Dec  7  2002 dbacl-1.3/prop.pl
[generic]                 2150    4525  47.5% -lh5- 1986 Dec 29  2002 dbacl-1.3/README
[generic]                 1674    3318  50.5% -lh5- bdc4 Dec  6  2002 dbacl-1.3/sample1.txt
[generic]                 1318    2605  50.6% -lh5- 390c Dec  6  2002 dbacl-1.3/sample2.txt
[generic]                 1535    3073  50.0% -lh5- b443 Dec  5  2002 dbacl-1.3/sample3.txt
[generic]                 1653    3283  50.4% -lh5- 3f8c Dec  6  2002 dbacl-1.3/sample4.txt
[generic]                 1851    3757  49.3% -lh5- 32f5 Dec  8  2002 dbacl-1.3/sample5.txt
[generic]                 1869    4055  46.1% -lh5- ee9e Dec  8  2002 dbacl-1.3/sample6.txt
[generic]                   96     136  70.6% -lh5- 1be7 Dec  6  2002 dbacl-1.3/toy.risk
[generic]                10326   29582  34.9% -lh5- ba34 Dec 15  2002 dbacl-1.3/tutorial.html
[generic]                 1580    3274  48.3% -lh5- 7a6f Aug 12  2002 dbacl-1.3/ylwrap
[generic]                   31      31 100.0% -lh0- 125b Oct 17  2002 dbacl-1.3/AUTHORS
[generic]                37613   77816  48.3% -lh5- 5553 Jan 15  2003 dbacl-1.3/bayesol
[generic]                 1844    4202  43.9% -lh5- 30e2 Dec 29  2002 dbacl-1.3/bayesol.1
[generic]                  655    1267  51.7% -lh5- 8f5a Dec 29  2002 dbacl-1.3/ChangeLog
[generic]                 7014   17992  39.0% -lh5- 4902 Aug 12  2002 dbacl-1.3/COPYING
[generic]                59441  161124  36.9% -lh5- 7b94 Jan 15  2003 dbacl-1.3/dbacl
[generic]                 5694   14851  38.3% -lh5- 7b60 Dec 29  2002 dbacl-1.3/dbacl.1
[generic]               182427  435463  41.9% -lh5- 9f42 Nov 29  2002 dbacl-1.3/dbacl.ps
[generic]                  166     318  52.2% -lh5- e575 Dec  8  2002 dbacl-1.3/example1.risk
[generic]                  236     452  52.2% -lh5- df2d Dec  8  2002 dbacl-1.3/example2.risk
[generic]                  258     492  52.4% -lh5- 48c3 Dec  8  2002 dbacl-1.3/example3.risk
---------- ----------- ------- ------- ------ ---------- ------------ -------------
 Total        31 files  597898 1485603  40.2%            Aug 24  2003