	warning. --test-files-loo should now work.
	* prind.c: Convert scoring function to take LOO_CLASS argument.
	* kl.c: Likewise.
	* naivebayes.c: Likewise.
	* tfidf.c: Likewise.
	* evi.c: Likewise.
	* rainbow.c: Call scoring function with LOO_CLASS argument.
	* bow/libbow.h (bow_barrel_score): Add extra LOO_CLASS argument.
	(bow_method): Likewise to (*score) member.
	* rainbow-ac.pl: Make sure the last confidence number gets printed properly. Before, it was always just zero.
	* rainbow.c (rainbow_options): Added "test-files-loo" for Leave-One-Out testing. Not implemented yet, however.
	(struct rainbow_arg_state): New member LOO_CV.
	(rainbow_query): Do proper checks before using lisp score truncation.
	(rainbow_test): Likewise. Also, add (commented-out) code to print more stats.
	(main): Call _register_method_evi() to make sure it gets linked in.
	* Makefile.in (LIBBOW_C_FILES): Added evi.c.
	* evi.c: New file.
	* naivebayes.c (bow_naivebayes_set_weights): Add checks that make sure that Sum_w Pr(w|c) is 1 for all classes.
	* kl.c (bow_kl_score_loo): Implement normalized KL scores, with Witten-Bell discounting. (NOTE: NaiveBayes does not yet have Witten-Bell implemented. Thus the accuracy of Witten-Bell can be easily compared with Laplace by comparing "kl" with "naivebayes".)
	* rainbow-stats.pl (confusion): Initialize $MAX_CLASSNAME_LENGTH to the length of "classname", so that we still get proper formatting with very short classnames.
	* istext.c (bow_fp_is_text): Temporarily comment out the code that tries to avoid files with uuencoded blocks, because the current scheme also seems to avoid many HTML files. (Reported by Sean Slattery.) Warning: trying to index the 20_newsgroups data in this state will give bad results.

Mon Jun 23 11:59:50 1997  Andrew McCallum  <mccallum@jprc.com>

	* prind.c (bow_prind_score): Comment fixes. Describe the smoothing situation accurately.
	* int4word.c (bow_words_keep_top_by_infogain): Don't try to "keep" more words than are available in the BARREL!
	(Bug reported by Daniel A Dipasquo <greenface+@CMU.EDU>.) If NUM_WORDS_TO_KEEP is greater than or equal to the number of words in the BARREL, put all these words in the new vocabulary.

Wed Jun 11 16:40:14 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (rainbow_test): Don't do CommonLisp score truncation if the score is negative. (This change should be made to other score-printing functions too.)
	(main): Gratuitously call _register_method_kl(), so that kl.c gets linked in with the rainbow executable.
	* kl.c (_register_method_kl): Make sure we can't register the method twice, even if this function is called twice.
	* naivebayes.c (bow_naivebayes_score_loo): When using uniform class priors, set SCORES[CI] based on log of uniform distribution of classes, not to 1. When setting log_pr_tf, instead of using pow() before taking the log(), just multiply after using log().
	* Makefile.in (LIBBOW_C_FILES): Added kl.c.

Fri Jun 6 09:48:06 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (NO_LISP_SCORE_TRUNCATION_KEY): New macro.
	(rainbow_options): New option "no-lisp-score-truncation".
	(rainbow_parse_opt): Handle it.
	(struct rainbow_arg_state): New member USE_LISP_SCORE_TRUNCATION.
	(rainbow_query): Obey it.
	(rainbow_test): Likewise.
	(main): Make its default value 1.

Tue Jun 3 10:31:10 1997  Andrew McCallum  <mccallum@jprc.com>

	* readme.texi: Use BOWVERSION, not BOW_VERSION to match version.texi.

Thu May 29 15:25:06 1997  Andrew McCallum  <mccallum@jprc.com>

	* Version (BOW_MINOR_VERSION): Version 0.8.
	* bow/libbow.h (BOW_MINOR_VERSION): Version 0.8.
	* docnames.c (bow_map_filenames_from_dir): Remove local variables no longer used.

Mon May 26 12:59:50 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (main): New commented-out code for computing the number of word co-occurrences.

Fri May 23 11:34:05 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (USE_VOCAB_IN_FILE_KEY): New macro.
	(rainbow_options): New option "use-vocab-in-file".
	(rainbow_parse_opt): Handle it.
	(struct rainbow_arg_state): New member VOCAB_MAP.
	(rainbow_query): Use it to remove words from the vocabulary.
	(rainbow_test): Likewise.
	(main): Likewise.
	* rainbow-stats.pl (prune_from_classname): New global variable: a regular expression to be removed from the end of classnames before gathering stats on them. This allows us to gather stats on performance in the middle of class hierarchies.
	(read_trial): Use it.
	* int4str.c (bow_int4str_new_from_text_file): Return MAP instead of NULL!
	* barrel.c (bow_barrel_prune_words_not_in_map): Define MAX_WI and use it, so we don't ask for word indices larger than bow_num_words().
	(bow_barrel_print_word_count): Also print word probability according to counts.
	* rainbow-h.c (main) [printing_word_counts]: Print word that is being counted.

Wed May 21 15:01:51 1997  Andrew McCallum  <mccallum@jprc.com>

	* barrel.c (bow_barrel_prune_words_not_in_map): Remove the words instead of hiding them, so that future bow_keep_top_words_by_infogain() calls won't unhide them. This version got 46% on hier/yahoo-science (dataset with a 10-document-per-class threshold).
	* rainbow-h.c (rainbowh_options): Added --use-vocab-in-file command-line option.
	(rainbowh_arg_state): Added PARENT and CI_IN_PARENT. Added HIER_LEAF. Removed printing of leaf and intermediate results.
	(hier_barrel_prob_wi_in_ci): New function.
	(check_prob_wi_in_ci): New function.
	(_hier_barrel_local_score): New function.
	(_hier_barrel_set_node_scores): Use it.
	(hier_barrel_print_infogain): Print FULL_NAME with interspersed spaces, so it won't get lexed by bow_int4str_new_from_text_file().
	(main): Change defaults. Before, populate_by_scoring=1 and hier_structure=hier_niece. Populate branches first thing, and check prob_wi_ci consistency.
	* naivebayes.c (bow_naivebayes_score_loo): Comment change.
	* int4str.c (bow_int4str_new_from_text_file): New function.
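The May 21 barrel.c entry above turns on the difference between hiding a word, which a later unhide-all can reverse, and removing it outright. A minimal sketch in C, assuming a simplified fixed-size word table; the type `vocab` and the functions `vocab_init`, `vocab_hide`, `vocab_unhide_all`, `vocab_remove` and `vocab_is_visible` are hypothetical and are not libbow's wi2dvf API, which (per the Apr 30 wi2dvf.c entry) encodes hiding in negative SEEK_START values.

```c
#include <assert.h>

/* Hypothetical per-word states; libbow itself encodes hiding differently. */
enum entry_state { ENTRY_PRESENT, ENTRY_HIDDEN, ENTRY_REMOVED };

#define VOCAB_MAX 64   /* hypothetical fixed capacity for the sketch */

typedef struct {
  enum entry_state state[VOCAB_MAX];  /* one slot per word index WI */
  int capacity;                       /* number of word indices in use */
  int num_words;                      /* number of currently visible words */
} vocab;

void vocab_init (vocab *v, int capacity)
{
  assert (capacity >= 0 && capacity <= VOCAB_MAX);
  v->capacity = capacity;
  v->num_words = capacity;
  for (int wi = 0; wi < capacity; wi++)
    v->state[wi] = ENTRY_PRESENT;
}

/* Reversibly hide word WI.  Hiding a word that is already hidden (or
   removed) is a no-op instead of an assertion failure. */
void vocab_hide (vocab *v, int wi)
{
  assert (wi >= 0 && wi < v->capacity);
  if (v->state[wi] != ENTRY_PRESENT)
    return;
  v->state[wi] = ENTRY_HIDDEN;
  v->num_words--;
}

/* Make every hidden word visible again; removed words stay gone. */
void vocab_unhide_all (vocab *v)
{
  for (int wi = 0; wi < v->capacity; wi++)
    if (v->state[wi] == ENTRY_HIDDEN)
      {
        v->state[wi] = ENTRY_PRESENT;
        v->num_words++;
      }
}

/* Permanently remove word WI, so that a later vocab_unhide_all() cannot
   bring it back.  NUM_WORDS only drops if the word was still visible. */
void vocab_remove (vocab *v, int wi)
{
  assert (wi >= 0 && wi < v->capacity);
  if (v->state[wi] == ENTRY_PRESENT)
    v->num_words--;
  v->state[wi] = ENTRY_REMOVED;
}

int vocab_is_visible (const vocab *v, int wi)
{
  return wi >= 0 && wi < v->capacity && v->state[wi] == ENTRY_PRESENT;
}
```

Hiding an already-hidden word is a no-op in this sketch, in the spirit of the May 13 change from an assertion to an "if" in bow_wi2dvf_hide_wi().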
	* bow/libbow.h: Declare new functions.

Tue May 20 16:02:24 1997  Andrew McCallum  <mccallum@jprc.com>

	* barrel.c (bow_barrel_prune_words_not_in_map): New function.

Mon May 19 09:52:09 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-stats.pl (confusion): Calculate longest classname and use it to fix indentation.
	* wi2dvf.c (bow_wi2dvf_add_di_wv): Set SEEK_START to special flag 2.
	(bow_wi2dvf_add_wi_di_count_weight): Likewise.
	(bow_wi2dvf_hide_wi): Decrement WI2DVF->NUM_WORDS in the right place.
	(bow_wi2dvf_unhide_all_wi): Increment WI2DVF->NUM_WORDS.
	(bow_wi2dvf_write): Unhide all words first.
	(bow_wi2dvf_dv): Change assertion to deal with special flag 2.
	* rainbow.c (main): Pass new argument to bow_infogain_per_wi_print().
	* rainbow-h.c: Misc changes. Print infogain during run.
	(hier_barrel_set_local_class_model): Add IS_ROOT argument. Unhide vocabulary after pruning by infogain, so lower levels get all words.
	* naivebayes.c (M_EST_M): New macro.
	(M_EST_P): New macro.
	(bow_naivebayes_score_loo): Use them to implement M-estimates, instead of old Laplace smoothing.
	* info_gain.c (bow_infogain_per_wi_print): Add FP argument.
	* bow/libbow.h: Add argument to infogain function.
	* barrel.c: Fix the math for assigning CDOC->PRIOR, and add assertion checks.

Fri May 16 10:19:19 1997  Andrew McCallum  <mccallum@jprc.com>

	This was the state of the code on Thursday night.
	* rainbow-h.c: Add options for changing population scheme and tree structure. Add ability to output intermediate and leaf results.
	* naivebayes.c (WORD_PRIOR_COUNT): New macro. Current value 1.0.
	(bow_naivebayes_score_loo): Use it.

Thu May 15 16:22:27 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (rainbow_test): Assert that the ACTUAL_NUM_HITS returned by bow_barrel_score() is the same as the NUM_HITS_TO_RETRIEVE requested.
	* split.c (bow_test_split): Use rand() properly so that the number of test documents in each class is not so biased.
	Add special code that *ensures* that the test documents are evenly distributed across classes.
	* rainbow.c (rainbow_print_weight_vector): Don't use CDOC->NORMALIZER if the method is "naivebayes", because NaiveBayes doesn't use it. Previously the printed values were bogus.

Wed May 14 11:02:44 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c: -q RAINBOWH_QUERYING now seems to work.
	* naivebayes.c (bow_naivebayes_score_loo): Add assertion that CDOC->PRIOR is greater than zero. This restriction should be relaxed!
	* array.c (bow_array_free): Decrement length after testing for non-zero-ness, not before. Without this change, empty arrays would call free() on un-malloc'ed memory.

Tue May 13 18:16:31 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c: Add code for doing selective population of lower branches. This population seems to be working. Querying/scoring does not yet work.
	* wi2dvf.c (bow_wi2dvf_hide_wi): Change assertion to "if" so that we won't crash if we try to hide words that are already hidden.
	* split.c (bow_tmp_word_struct2): New type.
	(bow_model_next_wv): New function.
	(bow_nontest_next_wv): New function.
	* rainbow.c (rainbow_options): Fix documentation for test-files.
	(rainbow_test): Choose vocabulary by info gain *after* the test/train split. Add temporary code to test bow_naivebayes_score_loo(). Remove this later!
	* naivebayes.c (bow_naivebayes_score_loo): New function, copy of bow_naivebayes_score, with extra code to do leave-one-out testing if argument LOO is non-negative.
	(bow_naivebayes_score): Call above function with -1 for LOO.
	(bow_method_naivebayes): Change NORMALIZE_WEIGHTS from bow_barrel_normalize_weights_by_summing() to NULL. The normalizing function was not taking account of the Laplace smoothing numbers, and was giving incorrect weights.
	(bow_method_crossentropy): Likewise.
	* istext.c (bow_fp_is_text): Increase NUM_LINE_LENGTHS to NUM_TEST_CHARS to avoid potential crash.
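The May 13 naivebayes.c entries describe leave-one-out scoring with Laplace smoothing: when a training document is scored, its own word counts should not contribute to the class model that scores it. A minimal sketch in C under assumed names; `nb_counts`, `nb_pr_w_c`, NUM_CLASSES and VOCAB_SIZE are hypothetical and this is not the code of bow_naivebayes_score_loo(). Following the -1 convention described for bow_naivebayes_score(), passing -1 as the leave-one-out class disables it.

```c
#include <assert.h>

#define NUM_CLASSES 2  /* hypothetical sizes, chosen only for the sketch */
#define VOCAB_SIZE 3

/* Hypothetical training statistics: word_count[ci][wi] is how often word
   WI occurs in the training documents of class CI. */
typedef struct {
  double word_count[NUM_CLASSES][VOCAB_SIZE];
} nb_counts;

/* Laplace-smoothed Pr(w|c) for word WI in class CI.  If LOO_CLASS equals
   CI, the word counts of DOC (a training document of that class) are
   subtracted first, so DOC does not help estimate the model that scores
   it.  Pass LOO_CLASS = -1 to disable leave-one-out. */
double nb_pr_w_c (const nb_counts *m, const double doc[VOCAB_SIZE],
                  int ci, int wi, int loo_class)
{
  assert (ci >= 0 && ci < NUM_CLASSES && wi >= 0 && wi < VOCAB_SIZE);

  double class_total = 0;
  double doc_len = 0;
  for (int w = 0; w < VOCAB_SIZE; w++)
    {
      class_total += m->word_count[ci][w];
      doc_len += doc[w];
    }

  double count = m->word_count[ci][wi];
  if (ci == loo_class)
    {
      /* Hold DOC out of its own class's counts. */
      count -= doc[wi];
      class_total -= doc_len;
    }
  assert (count >= 0 && class_total >= 0);

  /* Add-one (Laplace) smoothing: each vocabulary word gets one
     pseudo-count, so the estimates for a class still sum to 1. */
  return (count + 1.0) / (class_total + VOCAB_SIZE);
}
```

A full naive Bayes scorer would add each word's document count times log Pr(w|c) to the log class prior; the Jun 11 entry notes multiplying after taking log() rather than calling pow() first.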
	* docnames.c (bow_map_filenames_from_dir): For directory names and filenames, make it use names of soft links, not the directories that the links point to.
	* barrel.c (bow_barrel_add_document): New function.
	* bow/libbow.h: Declare new function.
	* docnames.c (bow_map_filenames_from_dir): Change commented-out code so that, if uncommented, this function will work if you pass it a filename instead of a directory name.

Tue May 6 15:30:30 1997  Andrew McCallum  <mccallum@jprc.com>

	* Makefile.local (rainbow-h): Make it depend on libbow.a.
	* rainbow-h.c: May 5 changes from Andrew Ng.
	(rainbowh_unarchive): Switch order of unarchiving for vocabulary and hier_barrel.
	(hier_barrel_new_from_file): Use bow_barrel_new_from_data_file() instead of bow_barrel_new_from_fp(), so we close FILE*'s instead of keeping them open. Otherwise we run out of UNIX's available open file descriptors.
	* wi2dvf.c (FREE_WHEN_HIDING_WI): New macro.
	(bow_wi2dvf_hide_wi): Heed it.
	(bow_wi2dvf_dv): Don't check to make sure that WI is less than bow_num_words(). Check SEEK_START before returning a non-NULL DV, because if SEEK_START is less than -1, the DV should be considered `hidden'.
	* opts.c (bow_exclude_filename): New global variable.
	(bow_options): New option "exclude-filename".
	(parse_bow_opt): Handle it.
	* docnames.c (bow_map_filenames_from_dir): Make sure BOW_EXCLUDE_FILENAME is non-NULL before passing it to strcmp().
	* bow/libbow.h (bow_exclude_filename): Declare new global variable.
	* barrel.c (bow_barrel_set_cdoc_priors_to_class_uniform): Use bow_malloc() instead of alloca(), so that bow_realloc() will work. free() it at the end.
	(bow_barrel_new_from_data_file): New function.

Mon May 5 21:08:34 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c: Changes by Andrew Ng, before Andrew McCallum's changes to close barrel FP's.

Fri May 2 09:53:12 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c: Additions by Andrew Ng to implement cousin scheme.

Wed Apr 30 10:48:30 1997  Andrew McCallum  <mccallum@jprc.com>

	* Makefile.in: Include Makefile.local, avoiding error if it isn't present.
	* barrel.c (bow_barrel_keep_top_words_by_infogain): Unhide and hide the DVF's instead of removing them, so that we can call this function multiple times with increasing NUM_WORDS_TO_KEEP.
	* wi2dvf.c (bow_wi2dvf_hide_wi): New function.
	(bow_wi2dvf_unhide_all_wi): New function.
	(bow_wi2dvf_dv): Handle new negative values of SEEK_START set by BOW_WI2DVF_HIDE_WI().
	* bow/libbow.h: Declare new functions.
	(bow_doc_type): Add ignored_model, for rainbow-h.c.

Thu Apr 24 09:03:10 1997  Andrew McCallum  <mccallum@jprc.com>