Fri Mar 14 11:01:14 1997  Andrew McCallum  <mccallum@jprc.com>

	* Makefile.in (LIBBOW_C_FILES): Renamed defparser.c to deflexer.c.
	* deflexer.c: Renamed from defparser.c.

	Add the `argp' subdirectory, and incorporate it into the Makefile.
	* HACKING: Add argp autoconf instruction.
	* configure.in: Call AC_CONFIG_SUBDIRS to configure argp also.
	* Makefile.in (ALL_LIBS): Move it closer to $(DEMO_EXECUTABLES)
	target.
	$(DEMO_EXECUTABLES): Make this target depend on argp/libargp.a.
	(install): Call make install in argp directory also.
	(dist, snapshot): Call make in argp directory to include its
	files too.

Wed Mar 12 20:00:27 1997  Andrew McCallum  <mccallum@jprc.com>

	* Makefile.in (CPPFLAGS): Don't include $(DEFS) here, it's now in
	ALL_CPPFLAGS.
	* Makefile.in (ALL_CPPFLAGS): New variable.
	(ALL_CFLAGS): New variable.
	(.c.o): New pattern rule that uses above new variables.  Now Kamal
	can safely type `make CPPFLAGS=-DNDEBUG'.
	* rainbow-h.c (_heir_barrel_set_node_scores): Don't threshold the
	scores to 0/1.
	(strdup): New function.  Implement this local version to help with
	debugging.  Consider removing it later.
	* libbow.h (bow_params_prind): Remove variable SCALE_BY_FOILGAIN.
	It isn't needed since we have a function pointer for it in
	BOW_METHOD.
	* prind.c (bow_prind_params): Remove BOW_NO for scaling.
	* rainbow.c (rainbow_lisp_setup): Remove setting of
	BOW_PRIND_SCALE_BY_INFOGAIN; it now defaults to on.
	(rainbow_print_usage): Change the sense of -G.  It now turns off
	foilgain scaling, instead of turning it on.  (Actually, it was the
	default before this anyway.)
	(main): Given -G, zero-out the SCALE_WEIGHTS entries in all the
	methods.

Tue Mar 11 11:58:03 1997  Andrew McCallum  <mccallum@jprc.com>

	* Version (BOW_MINOR_VERSION): Version 0.6.
	* libbow.h: Likewise.
	* Makefile.in (DIST_FILES): Add TODO.  Remove p.inc.
	(p-alpha.o, p-alonly.o, p-white.o): Targets removed.
	* rainbow.c (rainbow_query): Use bow_barrel_ macros instead of
	indexing into the methods structure manually.
	* crossbow.c: Add copyright info.
	* readme.texi: Fill out.
	* libbow-desc.texi: Add description.
	* install.texi: Add pointer to the README.  Say that it requires
	GCC.
	* HACKING: Update CVS repository machine name.
	* tfidf.c (bow_tfidf_set_weights): Insert disclaimer explaining
	that TFIDF is broken.

Tue Mar 11 11:31:39 1997  Rahul Sukthankar  <rahuls@syzygy.jprc.com>

	* Makefile.in (DEMO_C_FILE): Added crossbow.c.
	* crossbow.c: New file.

Mon Mar 10 18:52:03 1997  Andrew McCallum  <mccallum@jprc.com>

	* int4str.c (_str2id): Keep return value smaller using modulus.
	This fixes bug Rosie Jones encountered with negative hash values.
	(_str_hash_lookup): Assert that H is non-negative.
	* Makefile.in (LIBBOW_C_FILES): Added lex-email.c.

Fri Mar 7 10:54:09 1997  Andrew McCallum  <mccallum@jprc.com>

	* int4word.c (bow_words_reread_from_file): Make sure LAST_FILE is
	non-NULL.

Tue, 18 Feb 1997 20:15:42 -0500  Jason Rennie  <jr6b@andrew.cmu.edu>

	* lex-email.c: New file.  Created lexer for e-mail/newsgroup
	messages.
	* lex-html.c: Changed code to allow words separated by HTML tags
	to be tokenized as single words.  <FONT SIZE=+2>B</FONT>ig is now
	tokenized as "Big".  Nested brackets are now ignored.  This should
	more closely model the way HTML is interpreted.
	* rainbow.c: Added rainbow_email_lexer as a bow_lexer_indirect.
	Added '-M' option to allow user to make use of
	rainbow_email_lexer.  rainbow_email_lexer will remove
	"Newsgroups:" and "Path:" headers from message.
	* libbow.h (bow_email_headers_to_remove): Declare new global
	variable.
	(bow_email_lexer): Likewise.

Tue Mar 4 11:51:53 1997  Andrew McCallum  <mccallum@jprc.com>

	* libbow.h (bow_barrel_scale_weights): Don't call underlying
	function if it's NULL.
	(bow_barrel_normalize_weights): Likewise.
	(bow_wv_set_weights): Likewise.
	(bow_wv_normalize_weights): Likewise.
	* vpc.c (bow_barrel_new_vpc_merge_then_weight): Use macros for
	weight setting.
	(bow_barrel_new_vpc_weight_then_merge): Likewise.
	* rainbow-h.c (_heir_barrel_cdoc_write): Write WORD_COUNT.
	(_heir_barrel_cdoc_read): Read it.
	(heir_dir_is_leaf): Check the return status from CHDIR(), and
	print appropriate error message.
	(heir_barrel_keep_top_words_by_infogain): Return immediately if
	num_words_to_keep is 0 or the children count is 0.
	(heir_barrel_set_vpc_with_weights): Return immediately if the
	children count is 0.
	(_heir_barrel_set_node_scores): Add temporary #if'ed code to make
	score either 1 or 0, so winner takes all.
	(heir_barrel_score_recurse): New argument DEPTH.  All callers
	changed.
	(main): Change default NUM_TOP_WORDS from 3000 to 0.  Add new
	command line arguments -m and -N.
	Changes made with Sean Slattery.
	* naivebayes.c (bow_naivebayes_set_weights): Store class-wide word
	count in CDOC->WORD_COUNT instead of overloading CDOC->NORMALIZER.
	(bow_naivebayes_score): Use CDOC->WORD_COUNT instead of
	CDOC->NORMALIZER.  Use it to fix PR_W_C in case where that word
	doesn't appear in the class.  Instead of (1.0 / MAX_WI) use
	(1.0 / (MAX_WI + CDOC->WORD_COUNT)).  Don't normalize the weight
	by CDOC->NORMALIZER because it is already set to be normalized
	correctly, including the words that don't appear in the class.
	(bow_method_naivebayes): Change the weight normalizing function
	from BOW_NORMALIZE_WEIGHTS_BY_SUMMING to NULL, because we don't
	use CDOC->NORMALIZER anymore.
	* libbow.h (bow_cdoc): Add WORD_COUNT.
	* barrel.c (_bow_barrel_cdoc_write): Write WORD_COUNT.
	(_bow_barrel_cdoc_read): Read it.
	* vpc.c (bow_barrel_new_vpc): Assert MAX_CI is positive, otherwise
	this means we didn't find any classes.

Wed Feb 26 11:08:50 1997  Andrew McCallum  <mccallum@jprc.com>

	* HACKING: Fix sandbox's name.

Wed Feb 19 11:27:55 1997  Andrew McCallum  <mccallum@jprc.com>

	* barrel.c (bow_barrel_keep_top_words_by_infogain): Return
	immediately if NUM_WORDS_TO_KEEP is 0.

Tue Feb 18 13:39:34 1997  Andrew McCallum  <mccallum@jprc.com>

	* libbow.h (bow_str_to_method_id): Use a temporary variable, so we
	can use statements like ARGI++ as an argument.
	* info_gain.c (bow_infogain_per_wi_new): Change assertion to
	handle round-off error.
	* rainbow-h.c: Include <math.h>, <time.h>.
	(heir_barrel): Add components INDEX_IN_PARENT, NUM_LEAVES,
	FULL_NAME.
	(heir_barrel_new): Set them.
	(heir_dir_is_leaf): Use chdir() so that symlinks are dealt with
	properly.  Free() the results of scandir().
	(heir_barrel_new_from_text_dir_leaf): Set FULL_NAME and add
	assertions.
	(_heir_barrel_new_from_text_dir_recurse): New parameter
	PARENT_NAME.  Move the chdir() to handle symlinks properly.  Don't
	make a SUBDIRNAME.
	(heir_barrel_new_from_text_dir): New function.
	(heir_barrel_write_to_file): Write new heir_barrel components.
	(heir_barrel_new_from_file): Read them.
	(heir_barrel_free): Free FULL_NAME.
	(heir_barrel_keep_top_words_by_infogain): New function.
	(heir_parent_di_to_child_index_and_di): New function.
	(heir_di_to_classname): New function.
	(heir_barrel_test_split): New function.
	(_heir_barrel_set_node_scores): Use bow_barrel_score() instead of
	bow_get_best_matches().
	(heir_barrel_print_scores_recurse): Return void not int.  Print
	all on same line.
	(heir_barrel_score_recurse): New function.
	(heir_barrel_score): New function.
	(heir_barrel_test): New function.
	(heir_barrel_print_weight_vectors): Change formatting.
	(set_vocabulary_from_file): New function (unused).
	(main): Allow user to set DATADIR (-d) and NUM_TOP_WORDS (-T),
	test (-t).
	Compile with -Wall.

Mon Feb 17 10:36:32 1997  Andrew McCallum  <mccallum@jprc.com>

	* configure.in: Remove check for <float.h>, all ANSI compilers
	should have it.
	* split.c: Remove SunOS declarations of rand() and srand().
	(RAND_MAX): Define macro, if not already defined.
	These two changes needed to compile on SunOS.
	* naivebayes.c (bow_naivebayes_set_weights): Uncomment assertion
	about METHOD->ID.
	* rainbow.c (rainbow_lisp_setup): Add `-N' to effective arguments.
	(rainbow_lisp_query): Fix typo in BOW_FOPEN() call.
	* rainbow.c (rainbow_query): Check for QUERY_WV being NULL, and
	output more useful messages in that case.
	* naivebayes.c (bow_naivebayes_score): Rearrange the code for
	stepping through a DV so we always get the CDOC.  This change
	should have no effect on the outcome.
	* lex-simple.c (bow_lexer_simple_open_text_fp): Fix test for
	matching END_PATTERN_PTR.  Don't push the DOCUMENT_END_PATTERN
	back on the input stream after we find it; this is a stylistic
	choice.
	* docnames.c (bow_map_filenames_from_dir): Pass relative instead
	of absolute directory names to recursive calls.  Before I was
	having trouble with symbolic links.  This seems to fix it.
	* int4word.c (bow_words_keep_top_by_infogain): Fix assertions;
	it's OK to have infogain equal to 0.
	* prind.c: Comment fixes.
	* foilgain.c (bow_foilgain_per_wi_ci_new): Use malloc() for
	POS_PER_WI_CI and NEG_PER_WI_CI, instead of using stack.  We were
	overflowing the stack before.

Tue Feb 11 12:15:30 1997  Andrew McCallum  <mccallum@jprc.com>

	* naivebayes.c (bow_naivebayes_score): When word doesn't appear in
	the class vector, make Pr(w|C) include CDOC->NORMALIZER.
	(Suggested by Sean Slattery).
	* naivebayes.c (bow_naivebayes_score): Fix constant in assertion.
	* configure.in: When perl5 isn't found, PERL will be "", not ":".
	Deal with it properly.
	* libbow.h: Don't bother with HAVE_FLOAT, just always include
	<float.h>.
	(bow_get_best_matches): Remove declaration.  The function no
	longer exists.
	* rainbow.c (rainbow_test): Use macros for accessing method
	functions.
	* split.c: Fix author comment.
	* info_gain.c (bow_infogain_per_wi_new): Use double instead of
	float, because before we were losing resolution and getting
	negative IG's.
	(bow_entropy): Likewise.

Mon Feb 10 16:25:04 1997  Andrew McCallum  <mccallum@jprc.com>

	* docnames.c (bow_map_filenames_from_dir): Use perror() when can't
	open directory.

Fri Feb 7 11:00:50 1997  Andrew McCallum  <mccallum@jprc.com>

	* int4word.c (bow_words_read_from_file): Fix typo.
	* rainbow.c (rainbow_lisp_query): Use bow_barrel_score instead of
	bow_get_best_matches.

	These changes by Tony Brusseau <brusseau@jprc.com>, with
	modifications by <mccallum@jprc.com>.
	* wv.c (bow_wv_new_from_text_string): New function.
	(bow_wv_sprintf): New function.
	* int4word.c (bow_words_set_map): Add new argument indicating if
	old map should be freed.  All callers changed.
	(bow_words_reread_from_file): New function.
	* docnames.c: Include <stdio.h>.
	(bow_map_filenames_from_dir): Add WindowsNT backslashes to first
	assertion.
	* libbow.h: Declare new functions.

Thu Feb 6 18:33:21 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c: Updated for below library changes.
	* arrow.c: Likewise.
	* libbow.h: Declare many new functions, variables and types,
	including:
	(bow_boolean): New type.
	(bow_wv_set_weights_to_count): New function declaration.
	(bow_wv_normalize_weights_by_vector_length): Likewise.
	(bow_wv_normalize_weights_by_summing): Likewise.
	(bow_str_to_method_id): Macro renamed from bow_str2method.
	(bow_method_id): New enum, replacing bow_method.
	(bow_method): Now a struct.
	(bow_barrel_set_weights, bow_barrel_scale_weights,
	bow_barrel_normalize_weights, bow_new_vpc_with_weights,
	bow_barrel_score, bow_wv_set_weights, bow_wv_normalize_weights):
	New macros.
	(bow_methods): New global variable declaration.
	(bow_params_*): New types.
	(bow_score): Renamed from bow_doc_score.
	* wv.c (bow_wv_set_weights_to_count): New function.
	* weight.c: File removed.
	* vpc.c (bow_barrel_new_vpc_merge_then_weight): New function.
	(bow_barrel_new_vpc_weight_then_merge): New function.
	(bow_barrel_set_vpc_priors_by_counting): Renamed from
	_bow_barrels_set_naivebayes_vpc_priors.
	* tfidf.c: File contents totally replaced to implement TFIDF.
	Functions removed from weight.c and other places.
	(bow_tfidf_set_weights): Function renamed.
	(bow_tfidf_score): Function renamed from bow_get_best_matches().
	(bow_tfidf_params_{words,log_words,log_occur}): New variables.
	(bow_method_tfidf_{words,log_words,log_occur}): New global
	variables.
	* prind.c (bow_prind_uniform_priors): Global variable removed.
	(bow_prind_scale_by_infogain): Likewise.
	(bow_prind_normalize_scores): Likewise.
	(bow_prind_set_weights): Renamed from
	_bow_barrel_set_prind_weights.
	(bow_prind_score): Renamed from _bow_score_prind_from_wv, and
	updated for library changes.
	(bow_prind_params): New variable.
	(bow_method_prind): New global variable.
	* score.c: File removed.
	* naivebayes.c (_bow_barrels_set_naivebayes_vpc_priors): Function
	removed.  Replacement in vpc.c.
	(bow_naivebayes_set_weights): Minor updates for library changes.
	(bow_naivebayes_params): New va