	* vpc.c (bow_barrel_set_vpc_priors_by_counting): Fix crash that
	occurs if limited vocabulary causes all files in a class to be
	empty.

	* stoplist.c (bow_stoplist_add_word): New function.

	* rainbow-stats.pl (confusion): Print percentage correct for each
	category.

	* istext.c (bow_fp_is_text): Also return 0 for files that have
	more than 30% of their lines of the same length.  This way we
	avoid files containing uuencoded blocks.

	* bow/libbow.h: Declare new function.

Tue Apr 22 11:19:03 1997  Andrew McCallum  <mccallum@jprc.com>

	* deflexer.c (bow_default_lexer): Add cast to initialization to
	avoid warning.

	Add a uniform, global way of keeping track of binary file format
	versions.
	* io.c (bow_file_format_version): New global variable.
	(bow_write_format_version_to_file): New function.
	(bow_read_format_version_from_file): New function.
	* bow/libbow.h (bow_file_format_version): Declare new global
	variable.
	(BOW_DEFAULT_FILE_FORMAT_VERSION): New macro.
	(bow_write_format_version_to_file): New function declaration.
	(bow_read_format_version_from_file): New function declaration.
	* rainbow.c (FORMAT_VERSION_FILENAME): New macro.
	(rainbow_archive): Write format version to disk.
	(rainbow_unarchive): Read it from disk if the file exists,
	otherwise set it to 3, which is the format version number of data
	before BOW_FILE_FORMAT_VERSION was added to the library.

	* rainbow.c (rainbow_options): New option "print-word-counts",
	alias for "print-counts-for-words".  Hide the latter option from
	the --help text.

	* rainbow-stats.pl (confusion): Print confusion matrix in a more
	readable format.

	Add new command-line option to rainbow for using only 0 or 1 word
	counts.
	* opts.c (bow_binary_word_counts): New global variable.
	(bow_options): New option "binary-word-counts".
	(parse_bow_opt): Handle it.
	* bow/libbow.h: Declare new global variable.
	* dv.c (bow_dv_add_di_count_weight): When BOW_BINARY_WORD_COUNTS
	is true, insist on keeping DV's entry count below 2, i.e. 0 or 1.

Fri Apr 18 16:09:06 1997  Andrew McCallum  <mccallum@jprc.com>

	* configure.in: Add -Wno-implicit to default CFLAGS.

	* rainbow.c (rainbow_lisp_query): Return if QUERY_WV is empty.
	(Previously would have crashed.)

	* tfidf.c (TFIDF_METHOD): Fix typo that defined
	_register_method_tfidf_.. functions without the last underscore.
	(Reported by Kamal Nigam.)

	* split.c (bow_test_split): When selecting documents for the test
	set, if we randomly pick a document that is already in the test
	set, don't just scan sequentially for the next non-test document;
	pick a new random number instead.  This will avoid long
	contiguous stretches of test documents.

	* naivebayes.c (bow_naivebayes_score): Move the handling of
	SCORE_WITH_LOG_PROBABILITIES.

	* barrel.c (bow_barrel_set_cdoc_priors_to_class_uniform): Assert
	that CDOC->PRIOR must be greater or equal, not just greater.

Thu Apr 10 14:54:08 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c: Fix the `compile-command'.
	(PRINT_TREE_SCORES): New macro.
	(hier_set_method): New function.
	(main): Call it if BOW_ARGP_METHOD is non-NULL.

	* deflexer.c (bow_default_lexer): Initialize it to -1, so that
	deflexer.o will get linked in under SunOS.  Ug.  See comment.

	* bow/libbow.h (bow_methods): Declare extern!

Wed Apr 9 11:14:13 1997  Andrew McCallum  <mccallum@jprc.com>

	* lex-html.c (bow_lexer_html_get_raw_word): Return last word in
	document, even if it is not followed by a non-word character!
	* lex-simple.c (bow_lexer_simple_get_raw_word): Likewise.

	* rainbow.c (rainbow_lisp_setup): Call all
	__attribute__((constructor)) functions here since this will be
	dynamically loaded and the constructor functions won't be called
	then.
	* opts.c (parse_bow_opt): Remove call to
	_bow_default_lexer_init(); moved to rainbow.c.

	Fix a bug whereby --skip-html was a no-op.
	* deflexer.c (bow_default_lexer_simple,
	bow_default_lexer_indirect, bow_default_lexer_gram,
	bow_default_lexer_html, bow_default_lexer_email): Change global
	variable from struct's to pointers to structs.
	(_bow_default_lexer_simple, _bow_default_lexer_gram,
	_bow_default_lexer_html, _bow_default_lexer_email): New static
	variables.
	(_bow_default_lexer_init): Set BOW_DEFAULT_LEXER_INDIRECT to point
	inside of BOW_DEFAULT_LEXER_GRAM, which is the BOW_DEFAULT_LEXER.
	* opts.c: Now use all default lexers as pointers to struct's
	instead of struct's.
	* bow/libbow.h (bow_default_lexer_simple,
	bow_default_lexer_indirect, bow_default_lexer_gram,
	bow_default_lexer_html, bow_default_lexer_email): Change global
	variable from struct's to pointers to structs.

	* vpc.c (bow_barrel_new_vpc_merge_then_weight): Assert the method
	name.

	* Makefile.in (dist-cmu, bow-$(BOW_VERSION).tar.gz): New targets.

Tue Apr 8 08:00:00 1997  Andrew McCallum  <mccallum@jprc.com>

	* Version (BOW_MINOR_VERSION): Version 0.7.
	* bow/libbow.h (BOW_MINOR_VERSION): Likewise.
	* rainbow.c (RAINBOW_MINOR_VERSION): Version 0.2.
	* arrow.c (ARROW_MINOR_VERSION): Version 0.2.
	* NEWS: Update for new version of library and rainbow.
	* readme.texi: Likewise.

	* Makefile.in (DIST_FILES): Add NEWS.

	* Makefile.in (dist): Fix invocation of `tr' for cvs rtag.

	* split.c (bow_test_next_wv): Initialize CURRENT_DI to avoid
	warning.
	* split.c (bow_test_split): Initialize DOC to avoid warning.
	* int4word.c (bow_words_keep_top_by_infogain): Initialize
	MAX_IG_WI to avoid warning.

	* dv.c (bow_dv_add_di_count_weight): Only give "overflowed short"
	message at BOW_VERBOSE level, not BOW_PROGRESS level.

	* crossbow.c (main): Initialize NORMALIZER to zero.

	* Makefile.in (dist): Create ./bow directory.  Fix invocation of
	argp.
	(snapshot): Likewise.

	* configure.in: Add -O to the default CFLAGS.

	* rainbow.c (rainbow_options): Improve some option help text.
	(rainbow_parse_opt) [INFOGAIN_PAIR_VECTOR_KEY]: Handle it.
	* opts.c (bow_options): Improve some option help text.

	* Makefile.in (version.texi): Define BOWVERSION instead of
	BOW_VERSION, so makeinfo can get the value.
	(%.dvi, %.info): Fix typo.

	* libbow.texi: Fix typos and begin preliminary documentation.

	* rainbow.c (rainbow_options): New option "repeat"/'r'.
	(rainbow_parse_opt): Handle it.
	(rainbow_arg_state): New member REPEAT_QUERY.
	(rainbow_query): Attend to REPEAT_QUERY.

	* naivebayes.c (bow_naivebayes_set_weights): Fix assertion so it
	works for both naivebayes and crossentropy.

Mon Apr 7 11:00:06 1997  Andrew McCallum  <mccallum@jprc.com>

	* sarray.c (bow_sarray_entry_at_keystr): If there is no index for
	that KEYSTR, print an error message.  This way if user mistypes a
	method name to rainbow's -m option, they get a message that makes
	some sense.

	* opts.c (_help_filter): New function to add the names of the
	available methods to the help text.
	(bow_argp): Put it in.

	Use strings to identify methods instead of integers.  Separate
	method declarations into separate .h files.
	* bow/tfidf.h, bow/naivebayes.h, bow/prind.h: New files.
	* Makefile.in (LIBBOW_H_FILES): Add files bow/naivebayes.h,
	bow/tfidf.h, bow/prind.h.
	* naivebayes.c (bow_method_naivebayes, bow_method_crossentropy):
	Use string method identifier instead of integer.
	* prind.c (bow_method_prind): Likewise.
	* tfidf.c (TFIDF_METHOD): Likewise.
	* rainbow.c (rainbow_parse_opt) [G]: Step through methods
	according to new BOW_METHODS bow_sarray, instead of old static
	array.
	* methods.c (bow_methods): Static array removed.
	(bow_methods): Renamed from _bow_str4method, and made non-static.
	* barrel.c (bow_method_id, _old_bow_methods): Put copies of what
	used to be in libbow.h here, so we can unarchive old-format
	barrel's.
	(BOW_DEFAULT_BARREL_VERSION): Changed from 2 to 3.
	(bow_barrel_new_from_data_fp): If VERSION_TAG is less than 3, read
	the method id integer and use _OLD_BOW_METHOD, otherwise, read a
	string and use new BOW_METHOD_AT_NAME().
	(bow_barrel_write): Write the method as a string instead of as an
	integer.
	* Makefile.in (ALL_CPPFLAGS): -I$(srcdir) instead of
	-I$(srcdir)/bow.
	* All files: Include <bow/libbow.h> instead of "libbow.h".
	* bow/libbow.h: Include <bow/tfidf.h>, <bow/naivebayes.h>,
	<bow/prind.h>.
	(bow_method_register_with_name, bow_method_at_name): Declare
	functions.
	(bow_method_id): Typedef removed.
	(bow_str_to_method_id): Macro removed.
	(bow_methods): Global variable removed.
	(bow_method_tfidf_words, bow_method_tfidf_log_words,
	bow_method_tfidf_log_occur, bow_params_tfidf): Removed.
	(bow_method_prind, bow_params_prind): Removed.
	(bow_method_naivebayes, bow_params_naivebayes): Removed.
	* methods.c (bow_method_at_name): Comment function.
	(bow_method_register_with_name): Likewise.
	* opts.c (parse_bow_opt) [m]: Use bow_method_at_name().
	* naivebayes.c: Use bow_method_register_with_name().  Add new
	method "crossentropy".
	(bow_naivebayes_score): Pay attention to
	SCORE_WITH_LOG_PROBABILITIES when setting class priors.  When it
	is true, use inverse of cross-entropy instead of negative!
	* prind.c: Use bow_method_register_with_name().
	* tfidf.c: Use bow_method_register_with_name().

	* rainbow.c (main): Strip any trailing `/'s from classnames, so
	FILENAME_TO_CLASSNAME() will find the classnames.  (Reported by
	Jason Rennie <jr6b@syrinx.res.cmu.edu>.)

	* rainbow-h.c (PRINT_COUNTS_FOR_WORD_KEY): New macro.
	(rainbowh_options): New option "print-counts-for-words".
	(rainbowh_parse_opt): Handle it.
	(struct rainbowh_arg_state): New member PRINTING_WORD.
	(hier_barrel_print_word_counts): New function.
	(main): Handle new option.  Do the right thing for `-O' if
	BOW_PRUNE_VOCAB_BY_OCCUR_COUNT_N.

	* info_gain.c (LEAVE_OUT_LAST_CLASS): Macro defined once at top.
	Changed from 0 to 1.

	* install.texi: Explain the results of --prefix.  Remove old
	references to Objective C installation.

Thu Apr 3 12:50:23 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (rainbow_test_files): Use macros for setting QUERY_WV
	weights, so we can handle case in which the wv normalizer is NULL!
	(main): Replace code for implementing word-count-printing with
	call to new function.
	* barrel.c (bow_barrel_set_cdoc_priors_to_class_uniform):
	Initialize ci2dc entries to zero!
	(bow_barrel_print_word_count): New function.

	* opts.c (bow_options): Add new option
	"naivebayes-score-with-log-probs".
	(parse_bow_opt): Handle it.

	* naivebayes.c (bow_naivebayes_score): Begin adding code to
	support SCORE_WITH_LOG_PROBABILITIES parameter; not yet finished.
	(bow_naivebayes_params): Add initializer for
	SCORE_WITH_LOG_PROBABILITIES, initialize it to BOW_NO.

	* bow/libbow.h: Declare new function.
	(bow_params_naivebayes): New entry SCORE_WITH_LOG_PROBABILITIES.

Wed Apr 2 10:07:30 1997  Andrew McCallum  <mccallum@jprc.com>

	* configure.in: Add a check to see if __attribute__((constructor))
	works.  If it does not, define CONSTRUCTOR_FAILS.

	* rainbow.c (rainbow_lisp_setup): Fix typo.

	* Makefile.in ($(PERL_RUNNABLE_FILES)): Use % in pattern and $< in
	rule so that we get the .pl file from the $(srcdir).

	* rainbow-h.c (rainbowh_options): New option
	"print-infogain-vector", 'I'.
	(struct rainbowh_arg_state): Add state for it.
	(rainbowh_parse_opt): Handle it.
	(hier_barrel_write_to_file): Close the FP after writing a barrel.
	(hier_barrel_set_vpc_with_weights): Construct and pass a
	CLASSNAMES array.
	(hier_barrel_set_cdoc_priors_to_class_uniform): New function.
	(_hier_barrel_set_node_scores): Print a little header/separator if
	BOW_PRINT_WORD_SCORES.
	(hier_barrel_test): Initialize the QUERY_WV to NULL, so
	BOW_TEST_NEXT_WV doesn't try to free unallocated memory.
	(hier_barrel_print_infogain): New function.
	(rainbowh_archive): New function.
	(rainbowh_unarchive): New function.
	(main): Use above two functions.  Deal with printing infogain.

	* rainbow.c: Re-written for using libargp.  This should make it
	work with the WebKB lisp crawler again.

	* prind.c (bow_prind_score): Make sure CDOC->FILENAME is non-NULL
	before trying to print it when BOW_PRINT_WORD_SCORES is true.

	* opts.c (parse_bow_opt) [ARGP_KEY_INIT]: Call
	_bow_default_lexer_init().
	* deflexer.c (_bow_default_lexer_init): Don't make it static.  Use
	static local variable to make sure we don't run through it twice.
	This is because we will call it explicitly in
	opts.c:parse_bow_opt(), because __attribute__ ((constructor))
	doesn't seem to work on SunOS.

	* Makefile.in (PERL_FILES): Added rainbow-ac.pl and rainbow-pr.pl.
	* (rainbow-ac.pl, rainbow-pr.pl): New files from Dayne Freitag
	<dayne@cs.cmu.edu>.

Tue Apr 1 10:11:03 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c (rainbowh_parse_opt): Implement option 'M' for
	use_maximum_likelihood_path.
	(hier_default_method): Renamed from METHOD; all uses changed.
	(hier_barrel): New member NUM_NON_REST_CDOCS, to keep track of
	DOC_BARREL->CDOCS->LENGTH *before* the `rest' documents start
	getting added, so that we can implement
	HIER_PARENT_DI_TO_CHILD_INDEX_AND_DI properly.
	(hier_barrel_new): Initialize it to -1.
	(hier_barrel_add_child): Set it.
	(hier_barrel_new_from_text_dir_leaf): Set it.
	(hier_barrel_write_to_file): Write it.
	(hier_barrel_new_from_file): Read it.
	(hier_parent_di_to_child_index_and_di): Use it.
	(hier_barrel_print): Print it instead of
	DOC_BARREL->CDOCS->LENGTH.