<HTML>

<!-- Header information-->
<HEAD>
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
   <META NAME="Author" CONTENT="Chris Maunder">
   <TITLE>Pre-emptive Multithreading Web Spider</TITLE>
</HEAD>

<!-- Set background properties -->
<body background="../fancyhome/back.gif" bgcolor="#FFFFFF" link="#B50029" vlink="#8E2323" alink="#FF0000">



<!-- Article Title -->
<CENTER><H3><FONT COLOR="#A0A099">
Pre-emptive Multithreading Web Spider
</FONT></H3></CENTER>
<CENTER><H3><HR></H3></CENTER>

<!-- Author and contact details -->
This article was contributed by <A HREF="mailto:sim@ayersoft.com">Sim Ayers</A>.

<!-- Sample image and source code/demo project -->
<P>
<IMG SRC="article.gif">&nbsp;<A HREF="spider.zip">Download Source Code and Example</A>
</p>
<br>

<!-- The article... -->

<p> The Win32 API supports applications that are pre-emptively multithreaded. 
This feature is especially useful for writing Internet spiders with MFC. 
The SPIDER project is an example of how to use pre-emptive multithreading 
to gather information on the Web using a SPIDER/ROBOT with the MFC WinInet classes.
<p>
This project produces a spidering program that checks 
Web sites for broken URL links. Link verification is done only on
href links. The program displays a continuously updated list of URLs in a CListView
that reports the status of each href link. The project could be used as a template for
gathering and indexing information to be stored in a database file for queries. 
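<p>The SPIDER project's own link parser is not reproduced in this article. As a rough illustration of what checking only href links involves, here is a minimal sketch in portable standard C++ (not MFC, and not the project's code) that pulls the values of <TT>href</TT> attributes out of a page. <TT>ExtractHrefs</TT> is a hypothetical name, and the scanner is deliberately naive: it ignores comments, scripts, and character entities.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not from the SPIDER project): returns the value of every
// href="...", href='...' or unquoted href=... attribute in an HTML string.
// Naive scan only; it is not a real HTML parser.
std::vector<std::string> ExtractHrefs(const std::string& html)
{
    // Lower-case copy (same length) so the search for "href" is case-insensitive.
    std::string lower(html);
    for (char& c : lower)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    std::vector<std::string> links;
    std::string::size_type pos = 0;
    while ((pos = lower.find("href", pos)) != std::string::npos) {
        pos += 4;
        // Skip whitespace between "href" and "=".
        while (pos < html.size() && std::isspace(static_cast<unsigned char>(html[pos])))
            ++pos;
        if (pos >= html.size() || html[pos] != '=')
            continue;  // "href" appeared, but not as an attribute name
        ++pos;
        while (pos < html.size() && std::isspace(static_cast<unsigned char>(html[pos])))
            ++pos;
        if (pos >= html.size())
            break;

        std::string::size_type end;
        if (html[pos] == '"' || html[pos] == '\'') {
            const char quote = html[pos++];
            end = html.find(quote, pos);
            if (end == std::string::npos)
                break;  // unterminated value: stop scanning
        } else {
            end = pos;  // unquoted value runs to whitespace or '>'
            while (end < html.size() && html[end] != '>' &&
                   !std::isspace(static_cast<unsigned char>(html[end])))
                ++end;
        }
        links.push_back(html.substr(pos, end - pos));  // original case preserved
        pos = end;
    }
    return links;
}
```

Each extracted value would then be resolved against the page's URL and checked.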
<p>
Search engines gather information on the Web using programs called Robots. 
Robots (also called Web Crawlers, Spiders, Worms, Web Wanderers, and Scooters) 
automatically gather and index information from around the Web, and then put 
that information into databases. (Note that a Robot will index a page, and then
follow the links on that page as a source for new URLs to index.) Users can then 
construct queries to search these databases to find the information they want. 
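<p>The index-then-follow loop a robot performs is essentially a breadth-first traversal with a set of already-seen URLs. The following sketch assumes a hypothetical <TT>LinkFetcher</TT> callback that stands in for downloading a page and extracting its links; <TT>Crawl</TT> is likewise a hypothetical name, and this is an illustration in standard C++, not the SPIDER project's code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical callback: given a URL, return the links found on that page.
// A real robot would download and parse the page here.
using LinkFetcher = std::function<std::vector<std::string>(const std::string&)>;

// Breadth-first crawl: index a page, then queue its links as new sources of URLs.
// Each URL is indexed at most once; 'maxPages' bounds the crawl.
std::vector<std::string> Crawl(const std::string& root, const LinkFetcher& fetchLinks,
                               std::size_t maxPages)
{
    std::vector<std::string> indexed;
    std::set<std::string> seen{root};
    std::queue<std::string> frontier;
    frontier.push(root);

    while (!frontier.empty() && indexed.size() < maxPages) {
        const std::string url = frontier.front();
        frontier.pop();
        indexed.push_back(url);                  // "index" the page
        for (const std::string& link : fetchLinks(url))
            if (seen.insert(link).second)        // true only for URLs not seen before
                frontier.push(link);
    }
    return indexed;
}
```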

<p>
By using pre-emptive multithreading, you can index a Web page of URL links
and start a new thread to follow each link as a new source of URLs to index.
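<p>A minimal sketch of the one-thread-per-link idea, written with <TT>std::thread</TT> rather than <TT>CWinThread</TT> so that it compiles without MFC. <TT>CheckAllLinks</TT> and the <TT>LinkCheck</TT> callback are hypothetical; the callback stands in for whatever per-link work (such as fetching the URL and returning its HTTP status) a real spider would do. This version places no cap on the number of threads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-link work, e.g. "fetch this URL and return its HTTP status".
using LinkCheck = std::function<int(const std::string&)>;

// Starts one thread per link and waits for all of them.
// Each thread writes only its own slot in 'results', so no locking is needed.
// Note: if creating a thread throws, the threads already started are still
// joinable; a production version would join them before propagating the error.
std::vector<int> CheckAllLinks(const std::vector<std::string>& links, const LinkCheck& check)
{
    std::vector<int> results(links.size(), 0);
    std::vector<std::thread> threads;
    threads.reserve(links.size());
    for (std::size_t i = 0; i < links.size(); ++i)
        threads.emplace_back([&results, &links, &check, i] { results[i] = check(links[i]); });
    for (std::thread& t : threads)
        t.join();  // wait for every worker to finish
    return results;
}
```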

<p>
The project uses an MDI CDocument with a
custom MDI child frame that displays a CEditView when downloading
Web pages and a CListView when checking URL links. The project also uses the CObArray, 
CInternetSession, CHttpConnection, CHttpFile, and CWinThread MFC classes. The CWinThread
class is used to create multiple threads instead of using the asynchronous mode
of CInternetSession, which is really a holdover from the 16-bit Windows Winsock platform.
<p>

The SPIDER project uses simple worker threads to check URL links or download a Web page.
The CSpiderThread class is derived from the CWinThread class, so each
CSpiderThread object can use the CWinThread message map.
By declaring DECLARE_MESSAGE_MAP() in the CSpiderThread class, the
user interface remains responsive to user input. This means you can
check the URL links on one Web server and, at the same time, download and open
a Web page from another Web server. The only time the user interface will become
unresponsive to user input is when the thread count exceeds MAXIMUM_WAIT_OBJECTS,
which is defined as 64.
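<p>One way to keep the thread count from exceeding MAXIMUM_WAIT_OBJECTS is to make the code that launches threads wait for a free slot. The sketch below is a portable standard C++ assumption, not something the SPIDER project does: the hypothetical <TT>ThreadGate</TT> class is a counting gate built from a mutex and a condition variable, and the hypothetical <TT>PeakConcurrency</TT> function is a small usage example that reports the highest number of workers ever holding a slot at once.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counting gate: at most 'limit' workers may hold a slot at once.
// The default of 64 mirrors MAXIMUM_WAIT_OBJECTS.
class ThreadGate {
public:
    explicit ThreadGate(int limit = 64) : limit_(limit) { assert(limit > 0); }

    // Blocks the caller until fewer than 'limit' slots are in use, then takes one.
    void Acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return active_ < limit_; });
        ++active_;
    }

    // Called by a worker when it finishes; wakes one launcher waiting in Acquire().
    void Release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            --active_;
        }
        cv_.notify_one();
    }

    // Number of slots currently in use.
    int Active() {
        std::lock_guard<std::mutex> lock(m_);
        return active_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int limit_;
    int active_ = 0;
};

// Usage example: launch 'tasks' short workers through a gate of size 'limit'
// and return the highest number of slots observed in use at the same time.
int PeakConcurrency(int tasks, int limit) {
    ThreadGate gate(limit);
    std::mutex peakMutex;
    int peak = 0;
    std::vector<std::thread> workers;
    for (int i = 0; i < tasks; ++i) {
        gate.Acquire();  // the launcher waits here whenever 'limit' workers are busy
        workers.emplace_back([&gate, &peakMutex, &peak] {
            {
                std::lock_guard<std::mutex> lock(peakMutex);
                peak = std::max(peak, gate.Active());
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            gate.Release();  // free the slot for the next launch
        });
    }
    for (std::thread& t : workers)
        t.join();
    return peak;
}
```

Because the launcher blocks in <TT>Acquire</TT>, the user-interface thread should not call it directly; in an MFC program the launching loop would itself need to run on a separate thread.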
<p>

In the constructor for each new CSpiderThread object, we supply the ThreadProc function and the
thread parameters to be passed to the ThreadProc function.
<FONT COLOR="#990000"><TT><PRE>

	// create a new CSpiderThread object that will run ThreadFunc with pThreadParams
	CSpiderThread* pThread = new CSpiderThread(CSpiderThread::ThreadFunc, pThreadParams);


</tt></PRE></FONT>
In the CSpiderThread constructor, we set the CWinThread* m_pThread pointer in the thread parameters structure so that ThreadFunc can find the
correct instance of this thread:
<FONT COLOR="#990000"><TT><PRE>
pThreadParams->m_pThread = this;
</tt></PRE></FONT>


The CSpiderThread ThreadProc function:
<FONT COLOR="#990000"><TT><PRE>

// simple worker thread Proc function
UINT CSpiderThread::ThreadFunc(LPVOID pParam)
{
	ThreadParams * lpThreadParams = (ThreadParams*) pParam;
	CSpiderThread* lpThread = (CSpiderThread*) lpThreadParams->m_pThread;
	
	lpThread->ThreadRun(lpThreadParams);

	// Use SendMessage instead of PostMessage here to keep the current thread count
	// synchronized. If the number of threads is greater than MAXIMUM_WAIT_OBJECTS (64),
	// the program will become unresponsive to user input.

	::SendMessage(lpThreadParams->m_hwndNotifyProgress,
		WM_USER_THREAD_DONE, 0, (LPARAM)lpThreadParams);  // deletes lpThreadParams and decrements the thread count

	return 0;
}
</tt></PRE></FONT>
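<p>ThreadFunc follows the usual pattern for thread entry points that take an untyped parameter: a static function recovers the parameter block, follows its back-pointer to the owning object, and calls a member function on it. The sketch below shows that trampoline in standard C++; <TT>Worker</TT> and <TT>Params</TT> are hypothetical stand-ins for CSpiderThread and ThreadParams, and the completion message to the notification window is omitted.

```cpp
#include <cassert>
#include <string>
#include <thread>

class Worker;

// Hypothetical parameter block, standing in for ThreadParams.
struct Params {
    Worker* self = nullptr;  // back-pointer, like ThreadParams::m_pThread
    std::string url;
    int status = 0;          // filled in by the worker
};

// Hypothetical stand-in for CSpiderThread.
class Worker {
public:
    // Static entry point taking an untyped parameter, like CSpiderThread::ThreadFunc.
    // It recovers the parameter block, then calls a member on the owning object.
    static unsigned ThreadFunc(void* param) {
        Params* p = static_cast<Params*>(param);
        assert(p != nullptr && p->self != nullptr);
        p->self->Run(*p);
        return 0;
    }

    int runs = 0;  // how many times Run() has executed

private:
    // Placeholder for the real work (the article's ThreadRun).
    void Run(Params& p) {
        ++runs;
        p.status = p.url.empty() ? 400 : 200;  // pretend result, not a real HTTP call
    }
};
```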
The structure passed to the CSpiderThread ThreadProc function:
<FONT COLOR="#990000"><TT><PRE>
typedef struct tagThreadParams
{
	HWND m_hwndNotifyProgress;
	HWND m_hwndNotifyView;
	CWinThread* m_pThread;
	CString m_pszURL;
	CString m_Contents;
	CString m_strServerName;
	CString m_strObject;
	CString m_checkURLName;
	CString m_string;
	DWORD m_dwServiceType;
	DWORD  m_threadID;
	DWORD m_Status;
	URLStatus m_pStatus;
	INTERNET_PORT  m_nPort;
	int m_type;
	BOOL m_RootLinks;

}ThreadParams; 

</tt></PRE></FONT>

After the CSpiderThread object has been created, we call its CreateThread function to start
execution of the new thread.

<FONT COLOR="#990000"><TT><PRE>

	if (!pThread->CreateThread())   //  Starts execution of a CWinThread object
	{
		AfxMessageBox("Cannot Start New Thread");
		delete pThread;
		pThread = NULL;
		delete pThreadParams;
		return FALSE;
	}    
</tt></PRE></FONT>
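<p>In standard C++, <TT>std::thread</TT> reports a creation failure by throwing <TT>std::system_error</TT>. The following sketch (with hypothetical <TT>JobParams</TT>, <TT>RunJob</TT>, and <TT>StartWorker</TT> names, not MFC) hands the parameter block to the thread through <TT>std::unique_ptr</TT>, so it is freed automatically whether or not the thread starts, with no explicit delete on the failure path.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <system_error>
#include <thread>
#include <utility>

// Hypothetical parameter block, standing in for ThreadParams.
struct JobParams {
    std::string url;
};

// Hypothetical worker body. It owns its parameters, which are freed when it returns.
void RunJob(std::unique_ptr<JobParams> params, int* checked)
{
    if (!params->url.empty())
        ++*checked;
}

// Starts RunJob on a new thread and returns false if the thread could not be created.
// Ownership of 'params' passes to the thread; if creation fails, the parameter
// block is destroyed no later than when this function returns, so nothing leaks.
bool StartWorker(std::unique_ptr<JobParams> params, int* checked, std::thread& worker)
{
    assert(!worker.joinable());  // assigning over a running thread would call std::terminate
    try {
        worker = std::thread(RunJob, std::move(params), checked);
    } catch (const std::system_error&) {
        return false;
    }
    return true;
}
```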

Once the new thread is running, we use ::SendMessage to send the URL link's status structure to the document's CListView:
<FONT COLOR="#990000"><TT><PRE>

	if(pThreadParams->m_hwndNotifyView != NULL)
		::SendMessage(pThreadParams->m_hwndNotifyView,WM_USER_CHECK_DONE, 0, (LPARAM) &pThreadParams->m_pStatus);
</tt></PRE></FONT>
Structure used for URL status:
<FONT COLOR="#990000"><TT><PRE>

typedef struct tagURLStatus
{
	CString m_URL;
	CString m_URLPage;
	CString m_StatusString;
	CString m_LastModified;
	CString m_ContentType;
	CString m_ContentLength;
	DWORD	m_Status;
}URLStatus, * PURLStatus;
</tt></PRE></FONT>

Each new thread creates a new CMyInternetSession object (derived from CInternetSession) with EnableStatusCallback set to TRUE,
so that we can check the status on all CInternetSession callbacks. The dwContext ID for callbacks is set to the 
thread ID.

<FONT COLOR="#990000"><TT><PRE>

BOOL CInetThread::InitServer()
{
	
	try{
		m_pSession = new CMyInternetSession(AgentName,m_nThreadID);
		int ntimeOut = 30;  // very important: too low a value can cause a server time-out,
							// and too high a value can hang the thread.
		/*
		The time-out value in milliseconds to use for Internet connection requests. 
		If a connection request takes longer than this timeout, the request is canceled.
		The default timeout is infinite. */
		m_pSession->SetOption(INTERNET_OPTION_CONNECT_TIMEOUT,1000* ntimeOut);
		
		/* The delay value in milliseconds to wait between connection retries.*/
		m_pSession->SetOption(INTERNET_OPTION_CONNECT_BACKOFF,1000);
		
		/* The retry count to use for Internet connection requests. If a connection 
		attempt still fails after the specified number of tries, the request is canceled.
		The default is five. */
		m_pSession->SetOption(INTERNET_OPTION_CONNECT_RETRIES,1);
		m_pSession->EnableStatusCallback(TRUE);

		}
		catch (CInternetException* pEx)
		{
			// catch errors from WinINet
			//pEx->ReportError();
			m_pSession = NULL;
			pEx->Delete();

			return FALSE ;
		}

	return TRUE;
}

</tt></PRE></FONT>

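The key to using the MFC WinInet classes in a single-threaded or multithreaded program is to surround
every MFC WinInet class function call with a try/catch block.
The Internet is unreliable at times, and WinInet reports failures such as connection time-outs or unreachable
servers by throwing a CInternetException. A request for a page that no longer exists, by contrast, normally
completes with an HTTP error status such as 404, which you can read with CHttpFile::QueryInfoStatusCode.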
<FONT COLOR="#990000"><TT><PRE>


	try
	{
			// some MFC WinInet class function
	}
	catch (CInternetException* pEx)
	{
		// catch errors from WinINet
		//pEx->ReportError();
		pEx->Delete();

		return FALSE ;
	}
 </tt></PRE></FONT>
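<p>MFC throws exceptions by pointer, so every handler must call <TT>Delete()</TT> on the exception it catches. A small template can keep that rule in one place. In this sketch, <TT>FakeInetException</TT> is a hypothetical stand-in so that the code compiles without MFC (in the real project the handler would catch <TT>CInternetException*</TT>), and <TT>CallGuarded</TT> is likewise a hypothetical name.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for MFC's CInternetException, which MFC throws by pointer.
struct FakeInetException {
    unsigned long error;            // plays the role of CInternetException::m_dwError
    void Delete() { delete this; }  // MFC-style exceptions are freed with Delete()
};

// Hypothetical helper: runs 'f' and converts an Internet exception into 'false',
// optionally reporting the error code. The caught exception is always deleted here.
template <class F>
bool CallGuarded(F&& f, unsigned long* errorOut = nullptr)
{
    try {
        std::forward<F>(f)();
    } catch (FakeInetException* pEx) {
        if (errorOut != nullptr)
            *errorOut = pEx->error;
        pEx->Delete();
        return false;
    }
    return true;
}
```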

<p>
The maximum count of threads is initially set to 64, 
but you can configure it to any number between 1 and 100. 
A number that is too high will result in failed connections, 
which means you will have to recheck the URL links. 
<p>
A rapid-fire succession of HTTP requests to a /cgi-bin/ directory could bring a server to its knees. 
The SPIDER program sends out about 4 HTTP requests per second (4 &times; 60 = 240 per minute), which can also
bring a server to its knees. Be careful about which server you are checking. Each server keeps
a log that records the IP address of the agent that requested each Web file, so you might get some nasty
email from an angry Web server administrator.
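<p>At 4 requests per second, the spider issues one request every 250 ms. A simple way to be gentler is to enforce a minimum interval between requests to the same server. The hypothetical <TT>RequestThrottle</TT> class below is a standard C++ sketch, not part of the SPIDER project: each caller reserves the next free time slot under a mutex and then sleeps until that slot arrives, so the scheduled start times are spaced at least <TT>interval</TT> apart however many threads share the throttle. For example, an interval of one second limits a server to about 60 requests a minute.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical politeness throttle for one server: hands out request slots
// spaced at least 'interval' apart, no matter how many threads call Wait().
class RequestThrottle {
public:
    explicit RequestThrottle(std::chrono::milliseconds interval)
        : interval_(interval), next_(std::chrono::steady_clock::now()) {
        assert(interval.count() >= 0);
    }

    // Blocks until the caller's reserved slot arrives.
    void Wait() {
        std::chrono::steady_clock::time_point slot;
        {
            std::lock_guard<std::mutex> lock(m_);
            const auto now = std::chrono::steady_clock::now();
            slot = (next_ > now) ? next_ : now;  // earliest free slot
            next_ = slot + interval_;            // reserve it for this caller
        }
        std::this_thread::sleep_until(slot);     // sleep outside the lock
    }

private:
    std::mutex m_;
    std::chrono::milliseconds interval_;
    std::chrono::steady_clock::time_point next_;
};
```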
<p>
A site can prevent any directory from being indexed by listing it in a robots.txt file in the server's 
root directory. This mechanism is usually used to protect /cgi-bin/ directories, because CGI scripts take more
server resources to retrieve than static pages.
<p>
When the SPIDER program checks URL links, its goal is to avoid requesting too many documents too quickly. 
The SPIDER program partially adheres to the standard for robot exclusion.
This standard is a joint agreement among robot developers that allows Web sites to limit
which URLs a robot requests. By using the standard to limit access, a robot will not 
retrieve any documents that a Web server wishes to disallow.
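<p>A robots.txt file lists path prefixes that robots should not request. The sketch below is a simplified reading of the robot exclusion standard in portable C++: it honors only <TT>User-agent: *</TT> records and plain <TT>Disallow</TT> path prefixes, ignores robot-specific records, and the function names <TT>DisallowedPrefixes</TT>, <TT>IsAllowed</TT>, <TT>TrimSpace</TT>, and <TT>LowerField</TT> are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Trims spaces, tabs and carriage returns from both ends.
std::string TrimSpace(const std::string& s)
{
    const std::string::size_type b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos)
        return "";
    const std::string::size_type e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Lower-cases a field name such as "User-Agent" (ASCII only).
std::string LowerField(std::string s)
{
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Hypothetical helper: collects the Disallow prefixes that apply to all robots.
std::vector<std::string> DisallowedPrefixes(const std::string& robotsTxt)
{
    std::vector<std::string> prefixes;
    std::istringstream in(robotsTxt);
    std::string line;
    bool inStarRecord = false;
    bool lastWasAgent = false;
    while (std::getline(in, line)) {
        const std::string::size_type hash = line.find('#');
        if (hash != std::string::npos)
            line.erase(hash);  // strip comments
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            continue;          // blank or malformed line
        const std::string field = LowerField(TrimSpace(line.substr(0, colon)));
        const std::string value = TrimSpace(line.substr(colon + 1));
        if (field == "user-agent") {
            if (!lastWasAgent)
                inStarRecord = false;  // a new record starts
            if (value == "*")
                inStarRecord = true;
            lastWasAgent = true;
        } else {
            lastWasAgent = false;
            // An empty Disallow value means "allow everything", so it adds nothing.
            if (field == "disallow" && inStarRecord && !value.empty())
                prefixes.push_back(value);
        }
    }
    return prefixes;
}

// True if 'path' does not start with any disallowed prefix.
bool IsAllowed(const std::string& path, const std::vector<std::string>& prefixes)
{
    for (const std::string& p : prefixes)
        if (path.compare(0, p.size(), p) == 0)
            return false;
    return true;
}
```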
<p>
Before checking the root URL, the program checks whether there is a robots.txt file
in the server's root directory. If the SPIDER program finds a robots.txt file, it will
abort the search. The program also checks every Web page for a robots META tag. If it finds
a <TT>&lt;META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"&gt;</TT> tag, it will not index the URLs
on that page.
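<p>A minimal sketch of the META tag check in portable C++: the hypothetical <TT>HasRobotsDirective</TT> function lower-cases the page, examines each <TT>&lt;meta</TT> tag, and reports whether a tag whose NAME is <TT>robots</TT> lists the requested directive in its CONTENT attribute. It is a naive string scan, not a real HTML parser, and <TT>ToLowerAscii</TT> is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Lower-cases a copy of 's' (ASCII only).
std::string ToLowerAscii(std::string s)
{
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Hypothetical helper: true if a META tag whose NAME is "robots" lists
// 'directive' (for example "noindex" or "nofollow") in its CONTENT attribute.
bool HasRobotsDirective(const std::string& html, const std::string& directive)
{
    const std::string page = ToLowerAscii(html);
    const std::string wanted = ToLowerAscii(directive);

    std::string::size_type pos = 0;
    while ((pos = page.find("<meta", pos)) != std::string::npos) {
        const std::string::size_type end = page.find('>', pos);
        if (end == std::string::npos)
            break;
        // Drop whitespace and quotes so NAME="ROBOTS" and name = robots compare alike.
        std::string compact;
        for (std::string::size_type i = pos; i < end; ++i) {
            const char c = page[i];
            if (!std::isspace(static_cast<unsigned char>(c)) && c != '"' && c != '\'')
                compact += c;
        }
        const std::string::size_type content = compact.find("content=");
        if (compact.find("name=robots") != std::string::npos &&
            content != std::string::npos &&
            compact.find(wanted, content) != std::string::npos)
            return true;
        pos = end;
    }
    return false;
}
```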
<p>
Build:
<br>  Windows 95
<br>  MFC/VC++ 5.0
<br>  WinInet.h  dated 9/25/97
<br>  WinInet.lib dated  9/16/97
<br>  WinInet.dll  dated  9/18/97

<p>
Problems:
<br> The thread count cannot always be kept below 64.
<br> The CListView is limited to 32,767 URL links.
<br> Not all URLs are parsed correctly, and CString functions occasionally crash the program on complex URLs.
<p>
Resources:
<br>Internet tools - Fred Forester 
<br><a href="http://www.amazon.com/exec/obidos/ISBN=0201634929/programcomA/">Multithreading Applications in Win32</a> 
<br> <a href ="http://www.oreilly.com/catalog/multithread/">Win32 Multithreaded Programming</a>
<!-- Remember to update this -->
<p>Last updated: 25 May 1998

<P><HR>

<!-- Codeguru contact details -->
<TABLE BORDER=0 WIDTH="100%">
<TR>
<TD WIDTH="33%"><FONT SIZE=-1><A HREF="http://www.codeguru.com">Goto HomePage</A></FONT></TD>

<TD WIDTH="33%">
<CENTER><FONT SIZE=-2>&copy; 1997 Zafir Anjum</FONT>&nbsp;</CENTER>
</TD>

<TD WIDTH="34%">
<DIV ALIGN=right><FONT SIZE=-1>Contact me: <A HREF="mailto:zafir@home.com">zafir@home.com</A>&nbsp;</FONT></DIV>
</TD>
</TR>
</TABLE>



</BODY>
</HTML>
