::::::::: :::::::: ::::::::: ::::::::::
:+: :+: :+: :+: :+: :+: :+:
+:+ +:+ +:+ +:+ +:+ +:+
+#++:++#+ +#++:++#++ +#++:++#: :#::+::#
+#+ +#+ +#+ +#+ +#+ +#+
#+# #+# #+# #+# #+# #+# #+#
######### ######## ### ### ###
http://blacksun.box.sk
____________________
______________________I Topic: I_____________________
\ I Search Engines I /
\ E-mail: I Ripped Apart I Written by: /
> I I <
/ rammal81@hotmail.com I____________________I Mikkkeee \
/___________________________> <_________________________\
OverView
\***************/
1-Search Engines
A--How do they work?
B--Subject Trees
2-Information Retrieval Concepts
3-Fuzzy Queries
4-Using Logical Expressions, Boolean Operators
5-More Signal and Less Noise, Deeper into Boolean expressions.
6-Meta Search Engines
7-Advanced Search Features (not for the jumpy people)
8-Date meta tags
9-Building a search engine in PHP
10-Final Note
\***************/
Okay, before we start: I assume you know how to use a browser and have some idea of how the Internet works. This text file won't make you very smart or "elite," but it will show you how search engines work and how to get the most out of them by writing faster and clearer queries.
Let's start. There are two major tools for locating information on the web: subject trees and search engines. I am positive you have seen some of these thingies, but you sit there and search for something simple, and all of a sudden you get 1739739 results for the word "Hacking." Now, do you have time to sit and browse through all those sites, or are you going to give up? I think you're going to start picking random sites and waste your precious time, but if you like wasting time, STOP READING THIS FILE. Since we don't like to waste time, I'll show you easier ways to sharpen your browsing, make your exploration more productive, and exercise some self-discipline to keep your time online focused.
.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.
1-Search Engines:
All of us online are always looking for stuff to browse, but most of the time we waste time just trying to find the good sites. For many people, searching the web is synonymous with using a search engine. A search engine is similar to the card catalog you use to locate a book in a library. Even though search engines give you many search options and features, the ones built on keyword search work best when you can tell them exactly what you're looking for. Many of you think, "ohh man, those just suck," but believe it or not, they are powerful tools that can locate resources you might never find any other way.
.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.
A-So how do they work?
Search engines are based on software programs called "spiders," which scan the web periodically, collecting urls (Uniform Resource Locators). A spider usually starts with a list of "seed" urls. It watches for hyperlinks to other urls and adds any url not already in its master list. Each new url is then visited in turn and scanned for more urls, and, you know, the spider just keeps visiting and visiting. Spiders collect web pages for search engines, and the engines keep sending spiders out onto the web to remain current. Each page found is indexed for keyword retrieval, and the results are added to massive databases of web page indices, from which you get a very fast response when you run a keyword search.
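Here is a minimal sketch of that seed-and-follow loop in Python. It is only an illustration, not how any particular engine does it: FAKE_WEB, crawl, and fake_fetch_links are made-up names, and the "web" is a small in-memory table so the example runs without a network connection.

```python
from collections import deque

# Hypothetical in-memory "web": each url maps to the urls it links to.
# A real spider would download each page and parse out its hyperlinks.
FAKE_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://a.example/", "http://d.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
}

def fake_fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return FAKE_WEB.get(url, [])

def crawl(seed_urls, fetch_links, max_pages=100):
    """Visit urls breadth-first and return them in the order visited."""
    master_list = set(seed_urls)   # every url seen so far
    to_visit = deque(seed_urls)    # urls waiting to be scanned
    visited = []
    while to_visit and len(visited) < max_pages:
        url = to_visit.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in master_list:   # only new urls get queued
                master_list.add(link)
                to_visit.append(link)
    return visited

order = crawl(["http://a.example/"], fake_fetch_links)
print(order)
```

The master list is what keeps the spider from looping forever when two pages link to each other, and max_pages is just a safety limit for the sketch.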
To start using a search engine, you must first construct a "search query." Most engines work by matching keywords in the query against a database of web page indices. This just means you type a topic in the search box and you get results. Search engines are generally incapable of finding the single best document for any given query, so they are designed to retrieve a number of possible documents in order to increase your chance of finding at least one good site. Each document or site returned for a query is called a "hit," and engines will usually return thousands of hits for a single query. Results that are irrelevant to your search are called "false hits": hits that don't address what you're trying to find. It is impossible to eliminate all false hits in a keyword search, so the term "noise" is used to describe the rate of these false hits. Our goal today is to increase the signal and reduce the noise when conducting a keyword web search.
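To put a number on "noise," here is a small Python sketch under some loud assumptions: the four pages, the search and noise_rate functions, and the hand-picked RELEVANT set are all invented for illustration. A real engine has no such list of relevant pages; only you can judge that.

```python
def search(query, pages):
    """Return the urls of pages that contain every keyword in the query."""
    keywords = query.lower().split()
    return [url for url, text in pages.items()
            if all(word in text.lower().split() for word in keywords)]

def noise_rate(hits, relevant):
    """Fraction of hits that are false hits (not in the relevant set)."""
    if not hits:
        return 0.0
    false_hits = [url for url in hits if url not in relevant]
    return len(false_hits) / len(hits)

# Hypothetical pages and a hand-judged set of the ones we actually wanted.
PAGES = {
    "site1": "hacking tutorials for unix systems",
    "site2": "hacking away at weeds in your garden",
    "site3": "unix security and hacking texts",
    "site4": "cooking recipes",
}
RELEVANT = {"site1", "site3"}

hits = search("hacking", PAGES)
print(hits, noise_rate(hits, RELEVANT))          # 3 hits, one of them false

narrower = search("hacking unix", PAGES)
print(narrower, noise_rate(narrower, RELEVANT))  # 2 hits, no false hits
```

Adding one more keyword drops the gardening page, so the noise rate falls from about 0.33 to 0.0 on this toy data.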
B-Subject Trees:
One other search strategy is to use a subject tree, or directory. A subject tree is a hierarchically organized collection of categories and subcategories that can be browsed to locate information. Subject trees are really just browsing aids because they require some exploration, but they are designed to get you where you need to go, e.g. www.yahoo.com . Almost every engine I have seen, or at least the large ones, has a subject tree for many topics. When using a subject tree, always work from the root out to the branches, at each level selecting the option that best fits your keyword. If you don't like to visit many pages and see crappy web art and pictures, then subject trees are your friends. A good subject tree will make it easy for you to get where you want to go without having to go back and look at other topics. These trees have a real advantage: they only index documents that have been checked by reviewers and accepted as legitimate sources of information. So this means the stuff they got is some good shit. These trees won't cover every topic or every site, but they will aid you in finding something.
Well, let's look at the bad side. A difficulty with subject trees is that storing everything relevant to a single topic under a single location is often impossible. For example, if you have to find some stuff on "hacking texts," you will have to look under computers, Internet, programming, and the list goes on. So this way might work, but usually it won't cover very large or broad topics.
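A subject tree can be pictured as nested categories. The Python sketch below assumes a tiny made-up directory (SUBJECT_TREE and its urls are hypothetical, not Yahoo's real layout) and shows both root-to-branch browsing and the problem of one topic being filed in more than one place.

```python
# Hypothetical directory: categories are dicts, leaves are lists of reviewed urls.
SUBJECT_TREE = {
    "Computers": {
        "Internet": {
            "Security": ["http://security.example/"],
        },
        "Programming": {
            "C": ["http://c-tutorial.example/"],
            "Security": ["http://secure-coding.example/"],
        },
    },
    "Recreation": {
        "Cooking": ["http://recipes.example/"],
    },
}

def browse(tree, path):
    """Walk from the root down the given categories and return what is there."""
    node = tree
    for category in path:
        node = node[category]   # raises KeyError if the category does not exist
    return node

def find_category(tree, name, trail=()):
    """Return every root-to-branch path that ends in a category called name."""
    found = []
    if isinstance(tree, dict):
        for category, subtree in tree.items():
            here = trail + (category,)
            if category == name:
                found.append(here)
            found.extend(find_category(subtree, name, here))
    return found

print(browse(SUBJECT_TREE, ["Computers", "Internet", "Security"]))
print(find_category(SUBJECT_TREE, "Security"))
```

The second call comes back with two different paths for "Security," which is the bad side in miniature: you have to check more than one branch to see everything on the topic.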
Which is better:
Okay, I have put together a table to show you which one fits your needs.
--------------------------------------------------------
|               | Search Engines   | Subject Trees     |
|---------------|------------------|-------------------|
|Quality of urls| No real control  | Human reviewers   |
|---------------|------------------|-------------------|
|Amount of noise| A huge problem   | No problem        |
|---------------|------------------|-------------------|
|Dead links     | Many, many       | Very few or none  |
|---------------|------------------|-------------------|
|Coverage       | Spiders find     | Few gaps          |
|               | everything       | (for broad topics)|
|---------------|------------------|-------------------|
|Ease of use    | Need to study    | No need for study |
|               | advanced features|                   |
|---------------|------------------|-------------------|
|Stability of   | Very unstable    | Very stable       |
|results        |                  |                   |
--------------------------------------------------------
+++++++++++++++++
Newbie Cool Note|
+++++++++++++++++
If you still have no idea what to look for or how to
look for it, I found a good little tool.
The WebCrawler search engine has set up a page
that displays the keywords of 28 random searches being
submitted to WebCrawler by real users in real time.
The display is automatically updated every 15 seconds, or
you can update it by pressing the refresh or reload button,
depending on your browser.
The url is http://webcrawler.com/cgi-bin/SearchTicker
++++++++++++++++++++++++++++
2-Information Retrieval Concepts
Let's start learning ways to find what we are looking for in a simple, easy manner. We are after sharper queries, but first we need to know how these systems really work.
Information Retrieval (IR) is a branch of computer science that deals with finding information in large text databases. It has been around for many years, but it only became widely known once the web took off. A web search engine is an IR system dressed up in a user-friendly interface. Under that nice, clean interface is a computer program that doesn't understand natural language and has no ability to comprehend your information needs. IR systems work by picking out the keywords in your input query and then trying to locate documents that contain those exact keywords. So it's your job to construct a good search query if you're going to hope for the best. You do that, but you still see crap sites showing up in your results, and you say, "Mikkkeee, this bullshit doesn't work, why do these lame sites keep showing up?" I am guessing you don't know anything about HTML, so let me explain.
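Quick detour before the HTML part: the "exact keywords" point is easy to see in a few lines of Python. This is only a sketch of the general idea of a keyword index, with made-up pages and made-up function names (build_index, lookup), and real engines are far more elaborate.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page urls that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, query):
    """Return the urls containing every query keyword, matched exactly."""
    keywords = query.lower().split()
    if not keywords:
        return set()
    results = set(index.get(keywords[0], set()))
    for word in keywords[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages, for illustration only.
PAGES = {
    "site1": "a guide to hacking unix",
    "site2": "famous hackers in history",
    "site3": "unix administration basics",
}
index = build_index(PAGES)

print(lookup(index, "hacking"))        # {'site1'}
print(lookup(index, "hackers"))        # {'site2'}
print(lookup(index, "how do i hack"))  # set()
```

The program has no idea that "hacking," "hackers," and "hack" are related, and a question typed in plain English finds nothing because no page contains all of those exact words.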
The ranking method used by the engines tries to put the documents most relevant to your keyword query at the very top of the list. When you see an engine coming back with urls for "cooking" when you searched for "hacking," the error is not the engine's fault or your fault: the engine was responding to hidden text on the web page. HTML documents can contain text that is not displayed by web browsers but is nevertheless used by search engines. In particular, a document may contain a list of keywords for retrieval purposes that is never displayed by the browser. A search engine can also use hidden text of this kind to describe the document in its hit list. Document keywords and document descriptions are created with the <META> tag in HTML, as shown in this example I have set up.
<!--META_START-->
<meta http-equiv="title" content="BLAH, BLAH, BLAH.">
<meta name="resource-type" content="document">
<meta name="classification" content="Public Domain">
<meta name="description" content="BLAH, BLAH, BLAH.">
<meta name="keywords" content="OKAY THIS IS WHERE YOU OR THE WEBMASTER PUTS ALL THE KEYWORDS TO MAKE THE ENGINE RESPOND IN YOUR HIT LIST.">
<meta name="distribution" content="global">
<meta name="copyright" content="Copyright