	(hier_barrel_add_stats): New function split out from
	HIER_BARREL_ADD_CHILD.
	(hier_barrel_add_child): Use it.
	(hier_barrel_add_rest): New function.
	(hier_barrel_new_from_text_dir): Call it to add `rest' documents.
	(hier_barrel_test): Allocate space for 3 times as many SCORES, to
	make room for the `rest' classes.
	(main): Set HIER_DEFAULT_METHOD from BOW_ARGP_METHOD, if non-NULL.

	* scale.c (bow_barrel_scale_weights_by_given_infogain): Only
	verbosify every 100 words.
	(bow_barrel_scale_weights_by_given_foilgain): Likewise.

	* vpc.c (bow_barrel_set_vpc_priors_by_counting): Fix indentation.

	* rainbow-h.c: Converted to do command-line argument processing
	with libargp.

	* opts.c (bow_options): Remove "version" 'V' option.  libargp can
	handle that automatically.
	(_print_version): New function to print both program version and
	library version.
	(argp_program_version_hook): Set it to _PRINT_VERSION().

	* rainbow.c (rainbow_print_usage): Function removed.  Libargp does
	that now.

Mon Mar 31 11:07:30 1997  Andrew McCallum  <mccallum@jprc.com>

	* barrel.c (bow_barrel_set_cdoc_priors_to_class_uniform): Use
	ALLOCA() instead of BOW_MALLOC() to avoid memory leak.

	* Makefile.in (configure, config.status): Sprinkle with $(srcdir).

	* configure.in: Move the setting of CFLAGS above AC_PROG_CC, so
	that it will have an effect.

	* install.texi: Mention how to set CPPFLAGS in the ./configure
	line.

	* vpc.c (bow_barrel_set_vpc_priors_by_counting): Properly set the
	CDOC->PRIOR's.

	* rainbow.c (INFOGAIN_PAIR_VECTOR_KEY): New macro.
	(rainbow_options): New option "infogain-pair-vector".
	(rainbow_parse_opt): Handle it.
	(main): Likewise.  When RAINBOW_WORD_COUNT_PRINTING, also print
	the total number of words in each class.

	* prind.c (bow_prind_set_weights): Get MAX_WI from MIN of
	WI2DVF->SIZE and BOW_NUM_WORDS(), not just BOW_NUM_WORDS().

	* opts.c (bow_uniform_class_priors): New global variable.
	(bow_options): New option "uniform-class-priors".
	(parse_bow_opt): Handle it.

	* naivebayes.c (bow_naivebayes_set_weights): Get MAX_WI from MIN
	of WI2DVF->SIZE and BOW_NUM_WORDS(), not just BOW_NUM_WORDS().
	(bow_naivebayes_score): Pay attention to
	BOW_UNIFORM_CLASS_PRIORS.  Don't sum in score of words that don't
	have a DV entry!  Previously we were allowing words that `aren't
	in the vocabulary' of the BARREL to contribute!  This was wrong.
	They were contributing according to the Laplace Estimators, and
	classes with larger numbers of words were getting penalized.

	* info_gain.c (bow_infogain_per_wi_new): Sum floating point
	CDOC->PRIOR's instead of incrementing an integer count of
	documents, so that infogain can be calculated from documents with
	different `weights'.
	(bow_infogain_per_wi_new_using_pairs): New function.  For now it
	prints its results instead of returning them.

	* barrel.c (bow_barrel_set_cdoc_priors_to_class_uniform): New
	function.

	* bow/libbow.h: Declare new functions.

Mon Mar 31 11:56:48 1997  Andrew McCallum  <mccallum@cs.cmu.edu>

	* Makefile.in (CFLAGS, CPPFLAGS): Get values from configure.

	* configure.in: Do AC_SUBST() for CPPFLAGS and CFLAGS.

Fri Mar 28 10:28:26 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c: Fix spelling: "heir" -> "hier".  How embarrassing!

	* dv.c (bow_dv_new_from_data_fp): Fix typo in feof() assertion.
	(Reported by Doreen Cheng <dcheng@PRPA.Philips.COM>.)

	* rainbow.c (PRINT_COUNTS_FOR_WORD_KEY): New macro.
	(rainbow_options): New option "print-counts-for-word".
	(rainbow_parse_opt): Handle it.
	(main): Implement it.

	* bow/libbow.h (bow_wi2dvf): Add new element to structure:
	`num_words'.
	(bow_barrel): Put `is_vpc' at end of structure instead of the
	beginning.

	* wi2dvf.c (bow_wi2dvf_new): Initialize NUM_WORDS.
	(bow_wi2dvf_add_di_wv): Increment it.
	(bow_wi2dvf_add_wi_di_count_weight): Likewise.
	(bow_wi2dvf_new_from_data_fp): Likewise.
	(bow_wi2dvf_remove_wi): Decrement it.
	(bow_wi2dvf_print_stats): Print it.

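The NUM_WORDS counter added to bow_wi2dvf above records how many words a barrel actually holds. One place such a count matters is add-one (Laplace) smoothing, where the vocabulary size sits in the denominator. The following is a minimal sketch, assuming the textbook estimate (count + 1) / (total + V); the function and parameter names are hypothetical and are not taken from libbow.

```c
/* Hypothetical helper, not part of libbow: add-one (Laplace) estimate
   of P(word | class).  VOCAB_SIZE stands in for the number of words
   actually present in the barrel (the role NUM_WORDS is described as
   playing), as opposed to a library-wide vocabulary size.  */
double
laplace_word_prob (int word_count_in_class, int total_words_in_class,
                   int vocab_size)
{
  return ((double) word_count_in_class + 1.0)
         / ((double) total_words_in_class + (double) vocab_size);
}
```

With a class total of 10 words and a vocabulary of 10, an unseen word gets 1/20. Passing a much larger vocabulary size, say 990, shrinks the same estimate to 1/1000, which illustrates why the choice of vocabulary size affects the smoothed probabilities.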
	* prind.c (bow_prind_set_weights): Use BARREL->WI2DVF->SIZE and
	BARREL->WI2DVF->NUM_WORDS instead of BOW_NUM_WORDS().  In
	particular, this will allow us to set the Laplace estimators using
	the correct number of words in the barrel, not the arbitrary
	libbow-wide vocabulary size.  Properly use CDOC->WORD_COUNT
	instead of overloading CDOC->NORMALIZER.
	(bow_prind_score): Likewise use BARREL->WI2DVF->SIZE and
	BARREL->WI2DVF->NUM_WORDS instead of BOW_NUM_WORDS().
	(bow_print_word_scores): Moved to opts.c.

	* opts.c (bow_print_word_scores): Global variable moved here from
	prind.c.
	(bow_options): New option "print-word-scores".
	(parse_bow_opt): Handle it.

	* naivebayes.c (bow_naivebayes_set_weights): Use
	BARREL->WI2DVF->SIZE and BARREL->WI2DVF->NUM_WORDS instead of
	BOW_NUM_WORDS().  In particular, this will allow us to set the
	Laplace estimators using the correct number of words in the
	barrel, not the arbitrary libbow-wide vocabulary size.
	(bow_naivebayes_score): Likewise, and add code to print the score
	contribution of each word when BOW_PRINT_WORD_SCORES is non-NULL.
	(SCORE_WITH_LOG_PROBABILITIES): New macro.

	* barrel.c (bow_barrel_printf): Comment out the code that would
	skip over documents that are not of type `model'.

Thu Mar 27 11:29:34 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-stats.pl: Make output labels more descriptive.  Say
	`average percentage accuracy'.

	* split.c (bow_test_split): Use the micro-seconds field from
	gettimeofday() instead of time() to set the random number
	generator seed.  Otherwise, if we re-call this function too
	quickly we'll get exactly the same seed, because time() returns a
	number of seconds.

	* demos/script: New shell script file that will demo rainbow, with
	running commentary.

	* demos/data: New directory containing 20 articles 2 newsgroups.
	This is for use with demos/script.

	* install.texi: Remove mention of `checks' and `examples'
	directories; they don't exist.
	(Reported by Doreen Cheng <dcheng@PRPA.Philips.COM>.)

Mon Mar 24 12:07:53 1997  Andrew McCallum  <mccallum@jprc.com>

	* Makefile.in (rainbow-lisp.o): Use $(ALL_CPPFLAGS) and
	$(ALL_CFLAGS) instead of non-ALL versions.

	* rainbow.c (rainbow_lisp_setup): Rewrite for use with libargp.

	* methods.c (bow_method_at_name): Fix typo.
	(bow_method_at_index): Likewise.

	* opts.c (parse_bow_opt): Use 'g' instead of 'N' for setting gram
	size.

	* rainbow.c (rainbow_lisp_query): Free the QUERY_WV before
	returning!

	* methods.c (bow_method_register_with_name): New function.
	(bow_method_at_name): New function.

	* arrow.c (PRINT_IDF_KEY): New macro.
	(arrow_options): Add new option "print-idf".
	(struct arrow_arg_state): New enum ARROW_PRINTING_IDF.
	(arrow_index): Prune the vocabulary if
	BOW_PRUNE_VOCAB_BY_OCCUR_COUNT_N is non-zero.
	(main): Add code to print idf values.

	* lex-simple.c (bow_alpha_lexer, bow_alpha_only_lexer,
	bow_white_lexer): Initialize STEM_FUNC to 0 instead of
	BOW_STEM_PORTER.

	* tfidf.c (bow_tfidf_set_weights): Comment out code that sets
	total_word_count.  Do the DF_TRANSFORM on DF, not on IDF!
	Otherwise we get negative IDF's.

	* rainbow-h.c (use_maximum_likelihood_path): New global variable.
	(_heir_barrel_set_node_scores): Use it.
	(main): Set it when -M passed on command line.
	(num_top_words): Moved from main-local variable to global.
	(heir_barrel_test): Reduce vocab by infogain.

Fri Mar 21 14:02:39 1997  Andrew McCallum  <mccallum@jprc.com>

	* bow/libbow.h (bow_lexer_simple): Add entry
	TOSS_WORDS_LONGER_THAN.
	(bow_wv_set_weights_to_count_times_idf): Declare new function.

	* wv.c (bow_wv_set_weights_to_count_times_idf): New function.

	* tfidf.c (bow_tfidf_set_weights): Comment out code saying that
	TFIDF is broken.  Rewrite the way IDF is calculated.
	(bow_tfidf_score): Set and normalize the QUERY_WV weights here
	(even though it is redundant) so that we can properly use the IDF
	from the BARREL when normalizing weights.  Normalize the QUERY_WV
	weight when incrementing CURRENT_SCORE.

	* prind.c (bow_prind_set_weights): Skip a document if it is not of
	type model, both when setting NORMALIZER and TOTAL_TERM_COUNT, and
	when setting weights.
	(bow_prind_score): Skip a document if it is not of type model.

	* lex-simple.c (bow_lexer_simple_postprocess_word): Add code to
	toss words longer than SELF->TOSS_WORDS_LONGER_THAN.  Set WORDLEN
	at beginning.  It appeared that it was getting used uninitialized
	before!
	(bow_alpha_lexer, bow_alpha_only_lexer, bow_white_lexer): Add
	value for new field TOSS_WORDS_LONGER_THAN.

	* opts.c (APPEND_STOPLIST_FILE_KEY): New macro.
	(bow_options): Added "append-stoplist-file".
	(parse_bow_opt): Handle new option.

	* int4str.c (_str2id): Return the absolute value of the old return
	value.  Sometimes with really long strings, the return value was
	going negative.
	(_str_hash_lookup): Assert that ID is non-negative.

Thu Mar 20 11:47:49 1997  Andrew McCallum  <mccallum@jprc.com>

	These changes by Karl Kleinpaste <karl@jprc.com>

	* int4word.c (bow_words_reread_from_file): Use fopen() instead of
	bow_fopen(), so we are sure not to call abort().

	* wv.c (bow_wv_sprintf): Fix function to account for length
	troubles properly.
	(bow_wv_sprintf_words): New function, prints the words themselves,
	rather than the word indices.

	* bow/libbow.h: Declare new function.

	* naivebayes.c (bow_naivebayes_set_weights): Add commented-out
	code that forces all counts to either 0 or 1.  This was used on
	some experiments with Shumeet.

	* lex-html.c (bow_lexer_html_get_raw_word): Add a ! to the
	FALSE_TO_END condition test, so we don't end the tokenization too
	early.

Tue Mar 18 14:47:35 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (rainbow_parse_opt) [ARGP_KEY_END]: Print a useful
	error when only one classname is given.
	(main): Check for rainbow_infogain_printing properly.

	* opts.c (parse_bow_opt) [ARGP_KEY_END]: Check for the existence
	of BOW_DATA_DIRNAME in a way that works even when the directory is
	owned by someone else.

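The int4str.c entry of Fri Mar 21 above describes a string hash whose result went negative on very long strings. A hedged sketch of one way to keep such an ID non-negative follows. It is not libbow's _str2id: the multiplier 31 and the function name are assumptions, and it swaps in a different technique from the entry, accumulating in unsigned arithmetic and masking off the sign bit rather than taking an absolute value, because abs(INT_MIN) itself overflows.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical string hash, not libbow's _str2id.  Unsigned arithmetic
   wraps around instead of overflowing, and masking with INT_MAX clears
   the top bit so the result always fits in a non-negative int.  */
int
str_hash_nonneg (const char *s)
{
  unsigned int h = 0;
  int id;

  while (*s)
    h = h * 31u + (unsigned char) *s++;

  id = (int) (h & (unsigned int) INT_MAX);
  assert (id >= 0);   /* mirrors the assertion the entry mentions */
  return id;
}
```

The assertion can never fire here by construction; it is kept only to show the kind of check the (_str_hash_lookup) entry describes.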
	* bow/libbow.h (bow_fread_string): Assert that the string length
	is non-negative.

	* barrel.c (_bow_barrel_version): New variable.
	(BOW_DEFAULT_BARREL_VERSION): New macro.
	(bow_barrel_new_from_data_fp): Read the version number instead of
	a null_tag.
	(bow_barrel_write): Likewise, for writing.

	* arrow.c (main): Remove redundant code that is now in opts.c.

Mon Mar 17 12:09:32 1997  Andrew McCallum  <mccallum@jprc.com>

	* Makefile.in (%.o:%.c): Fix the order on this pattern rule.
	($(DEMO_EXECUTABLES):%:%.o): Put $(DEMO_EXECUTABLES) at the
	beginning of this pattern, so it matches only those files.

	* arrow.c: Don't include getopt.h; we're using argp.h instead.
	(arrow_index): Fix typo.

	* configure.in: Don't look for getopt.h anymore.  We don't need it
	now that we are using libargp.

	* configure.in: AC_INIT looking for int4str.c instead of libbow.h.

	* Makefile.in (%): Use this pattern to make DEMO_EXECUTABLES
	instead of listing them all.  This avoids making all the .o's for
	one of the DEMO_EXECUTABLES.

	* rainbow.c: Converted to use argp command-line argument
	processing.

	* opts.c (bow_argp_method): Renamed from bow_default_method.
	(parse_bow_opt) [ARGP_KEY_INIT]: Add words to stoplist.

	* deflexer.c (_bow_default_lexer_init): Initialize
	bow_default_lexer to BOW_DEFAULT_LEXER_GRAM, not BOW_LEXER_GRAM!

	* bow/libbow.h (bow_argp_method): Renamed from bow_default_method.

	* arrow.c (arrow_parse_opt) [q]: Set query.filename.
	(arrow_index): BOW_DEFAULT_METHOD renamed to BOW_ARGP_METHOD.

	* arrow.c (arrow_index): Set the method according to
	BOW_DEFAULT_METHOD.

	* opts.c: Fleshed out into first working version.

	* error.c: Comment fix.  Include libbow.h and stdio.h.

	* deflexer.c (_bow_default_lexer_init): New constructor function.
	(bow_default_lexer_simple, bow_default_lexer_indirect,
	bow_default_lexer_gram, bow_default_lexer_html,
	bow_default_lexer_email): New variables, default instantiations of
	lexers.

	* bow/libbow.h: Add argp declarations.
	(bow_argp_children): New variable.
	(bow_prune_vocab_by_infogain_n): New variable.
	(bow_prune_vocab_by_occur_count_n): New variable.
	(bow_default_method): New variable.
	(bow_data_dirname): New variable.

	* arrow.c: Convert to using argp for command-line processing.

	* Makefile.in: Change all instances of `libbow.h' to
	`bow/libbow.h'.
	(includedir): Add `/bow' to end.
	(LIBBOW_C_FILES): Add opts.c.
	(ALL_CPPFLAGS): Add -I$(srcdir)/bow and -I$(srcdir)/argp.
	(rainbow-lisp.o): Use $< instead of rainbow.c, so VPATH will find
	it when compiling in a different directory than the source.

	* bow/libbow.h (STRINGIFY): New macro.
	(bow_default_lexer_simple, bow_default_lexer_indirect,
	bow_default_lexer_gram, bow_default_lexer_html,
	bow_default_lexer_email): Declare default instantiations of
	lexers.
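
The bow/libbow.h entry above adds a STRINGIFY macro without recording its definition. A common way to write such a macro is the two-level idiom sketched below, which expands its argument before turning it into a string literal. This is an assumption about the general technique, not libbow's actual definition; every name here carries a DEMO_ prefix to mark it as hypothetical.

```c
/* Assumed definitions, not copied from libbow.  The inner macro applies
   the # operator directly; the outer one forces its argument to be
   macro-expanded first.  */
#define DEMO_STRINGIFY_RAW(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_RAW (x)

/* Hypothetical numeric default that we want to show in a help string.  */
#define DEMO_GRAM_SIZE 3

const char *
demo_gram_size_string (void)
{
  /* Expands to the string literal "3".  */
  return DEMO_STRINGIFY (DEMO_GRAM_SIZE);
}
```

Using DEMO_STRINGIFY_RAW directly on DEMO_GRAM_SIZE would instead produce the macro's name as text, which is why the extra level of indirection is the usual choice when a numeric default has to appear inside an option's documentation string.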