-
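No more worrying about your crawler's IP getting banned, or about not having the money to buy proxy IPs.
When writing a crawler in Python, you will run into sites that use anti-scraping measures; for example, repeatedly requesting the same page from the same IP is likely to get that IP banned. How can this be solved effectively? We can use proxy IPs and build a proxy IP pool.
Here is a way to obtain a large number of free, working, fast proxy IPs: visit the Xici (西刺) free proxy IP site.
The site lists many proxy IPs, but if you try them you will find that not every one works. So what we need to do is filter the working, fast, and stable IPs out of the list.
The method for building a free proxy IP pool described below:
Pros: free, plentiful, working, fast
Cons: needs periodic re-filtering
Main idea:
Scrape IP addresses from the site and store them
Verify that each IP is usable (request a test URL and check the response status code)
Format the IP addresses
The code is as follows:
1. Import packages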
import requests
from lxml import etree
import time
2. Fetch the proxy IPs listed on the Xici free proxy site
def get_all_proxy():
    # Scrape the first listing page and return proxies as 'http://ip:port' strings.
    url = 'http://www.xicidaili.com/nn/1'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    }
    # A timeout keeps the script from hanging if the site does not respond.
    response = requests.get(url, headers=headers, timeout=10)
    html_ele = etree.HTML(response.text)
    # Column 2 of the table holds the IP address, column 3 the port.
    ip_eles = html_ele.xpath('//table[@id="ip_list"]/tr/td[2]/text()')
    port_ele = html_ele.xpath('//table[@id="ip_list"]/tr/td[3]/text()')
    proxy_list = []
    for ip, port in zip(ip_eles, port_ele):
        proxy_str = 'http://' + ip + ':' + port
        proxy_list.append(proxy_str)
    return proxy_list
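3. Validate the fetched IPs
def check_all_proxy(proxy_list):
    valid_proxy_list = []
    # Test URL used to probe each proxy.
    url = 'http://www.baidu.com/'
    for proxy in proxy_list:
        proxy_dict = {
            'http': proxy
        }
        try:
            start_time = time.time()
            response = requests.get(url, proxies=proxy_dict, timeout=5)
            if response.status_code == 200:
                end_time = time.time()
                print('Proxy works: ' + proxy)
                print('Elapsed: ' + str(end_time - start_time))
                valid_proxy_list.append(proxy)
            else:
                # A non-200 answer is a bad response, not a timeout.
                print('Proxy returned status ' + str(response.status_code) + ': ' + proxy)
        except requests.RequestException:
            # Timeouts and connection errors both end up here.
            print('Proxy unavailable--------------->' + proxy)
    return valid_proxy_list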
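Checking proxies one at a time can be slow, because every dead proxy may wait out the full timeout. The following is only a rough sketch of one way to run the checks in parallel with a thread pool. The names `check_all_proxy_concurrently`, `checker`, and `max_workers` are hypothetical and do not come from the original post; `checker` is assumed to be any function that takes a proxy string and returns True when the proxy works, for example a wrapper around the `requests.get` call shown above.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all_proxy_concurrently(proxy_list, checker, max_workers=20):
    """Run checker(proxy) for every proxy in parallel and keep the ones that pass.

    checker is a hypothetical callable supplied by the caller; it should
    return True for a usable proxy and False otherwise (and not raise).
    """
    if not proxy_list:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as proxy_list.
        results = list(pool.map(checker, proxy_list))
    return [proxy for proxy, ok in zip(proxy_list, results) if ok]
```

Threads suit this job because the work is almost entirely waiting on the network; the worker count of 20 is an arbitrary starting point, not a tuned value.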
4. Print the resulting IP pool
if __name__ == '__main__':
    proxy_list = get_all_proxy()
    valid_proxy_list = check_all_proxy(proxy_list)
    print('--'*30)
    print(valid_proxy_list)
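Once the pool has been validated, a crawler would typically pick a different proxy for each request. The snippet below is a minimal sketch of that step, assuming the pool is the list of 'http://ip:port' strings produced above; `pick_proxy` is a hypothetical helper name, not part of the original post.

```python
import random


def pick_proxy(valid_proxy_list):
    """Return a requests-style proxies dict for one randomly chosen proxy."""
    if not valid_proxy_list:
        # An empty pool usually means the validation step needs to be re-run.
        raise ValueError('proxy pool is empty')
    proxy = random.choice(valid_proxy_list)
    return {'http': proxy}
```

The result can be passed straight to a request, for example `requests.get(url, proxies=pick_proxy(valid_proxy_list), timeout=5)`. Because free proxies go stale quickly, the pool should be rebuilt and re-validated from time to time.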
My technical ability is limited, so suggestions are welcome; I promise to stay positive and keep learning.
————————————————
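Copyright notice: This is an original article by CSDN blogger 「彬小二」, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/qq_39884947/article/details/86609930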
Tags:
python
ip
proxy
prevention
Upload time:
2019-11-15
Uploaded by: fygwz1982
-
In the hit CBS crime show Person of Interest, which debuted in 2011,
the two heroes—one a former Central Intelligence Agency agent and
the other a billionaire technology genius—work together using the
ubiquitous surveillance system in New York City to try to stop violent
crime. It’s referred to by some as a science fiction cop show. But the
use of advanced technology for crime analysis in almost every major
police department in the United States may surpass what’s depicted
on TV crime dramas such as Person of Interest. Real-time crime centers
(RTCCs) are a vital aspect of intelligent policing. Crime analysis
is no longer the stuff of science fiction. It’s real.
Tags:
Intelligence
Analysis
Crime
Upload time:
2020-05-25
Uploaded by: shancjb
-
This book attempts to provide an extensive overview of Long-Term Evolution
(LTE) networks. Understanding LTE and its Performance is purposely written to
appeal to a broad audience and to be of value to anyone who is interested in 3GPP
LTE or wireless broadband networks more generally. The aim of this book is to
offer comprehensive coverage of current state-of-the-art theoretical and
technological aspects of broadband mobile and wireless networks focusing on LTE. The
presentation starts from basic principles and proceeds smoothly to the most advanced
topics. The schemes provided are developed and oriented in the context of the current
closed standard, 3GPP LTE.
Tags:
Performance
LTE
and
its
Upload time:
2020-05-27
Uploaded by: shancjb
-
The wide deployment of wireless networks and mobile technologies, along with the
significant increase in the number of mobile device users, has created a very strong
demand for various wireless-based, mobile-based software application systems and
enabling technologies. This not only provides many new business opportunities and
challenges to wireless and networking service providers, mobile technology
vendors, and software industry and solution integrators, but also changes and enhances
people’s lives in many areas, including communications, information sharing and
exchange, commerce, home environment, education, and entertainment. Business
organizations and government agencies face new pressure for technology updates to
upgrade their networking infrastructures with wireless connectivity to enhance
enterprise-oriented systems and solutions.
Tags:
Wireless-Based
Software
Systems
Upload time:
2020-06-01
Uploaded by: shancjb
-
The main aim of this book is to present a unified, systematic description of
basic and advanced problems, methods and algorithms of modern control
theory considered as a foundation for the design of computer control
and management systems. The scope of the book differs considerably from
the topics of classical traditional control theory mainly oriented to the
needs of automatic control of technical devices and technological
processes. Taking into account a variety of new applications, the book presents
a compact and uniform description containing traditional analysis and
optimization problems for control systems as well as control problems with
non-probabilistic models of uncertainty, problems of learning, intelligent,
knowledge-based and operation systems, all important for applications in the
control of manufacturing processes, in project management and in the
control of computer systems.
Tags:
Modern_Control_Theory
Upload time:
2020-06-10
Uploaded by: shancjb
-
It all started rather innocuously. I walked into Dr GT Murthy’s office one fine day, and changed my life. “Doc” was then the General Manager, Central R&D, of a very large electrical company headquartered in Bombay. In his new state-of-the-art electronics center, he had hand-picked some of India’s best engineers (over a hundred already) ever assembled under one roof. Luckily, he too was originally a Physicist, and that certainly helped me gain some empathy. Nowadays he is in retirement, but I will always remember him as a thoroughly fair, honest and facts-oriented person, who led by example. There were several things I absorbed from him that are very much part of my basic engineering persona today. You can certainly look upon this book as an extension of what Doc started many years ago in India … because that’s what it really is! I certainly wouldn’t be here today if I hadn’t met Doc. And in fact, several of the brash, high-flying managers I’ve met in recent years desperately need some sort of crash course in technology and human values from Doc!
Tags:
switching power supply
Upload time:
2021-11-23
Uploaded by:
-
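The PADS Layout user interface is easy to use and efficient. While meeting the needs of professional users, PADS Layout also caters to users who are new to PCB software. This section of the tutorial covers the following: · Interacting with PADS Layout · Using the workspace · Setting grids · Using pan and zoom · Object-oriented selection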
Tags:
pads
Upload time:
2021-11-28
Uploaded by:
-
High-definition e-book: C++ Primer Plus, 6th Edition, English version, 1438 pages. Learning C++ is an adventure of discovery, particularly because the language accommodates several programming paradigms, including object-oriented programming,
generic programming, and the traditional procedural programming. The fifth edition of
this book described the language as set forth in the ISO C++ standards, informally
known as C++98 and C++03, or, sometimes as C++98/03. (The 2003 version was
largely a technical correction to the 1998 standard and didn’t add any new features.)
Since then, C++ continues to evolve. As this book is written, the international C++
Standards Committee has just approved a new version of the standard. This standard had
the informal name of C++0x while in development, and now it will be known as
C++11. Most contemporary compilers support C++98/03 quite well, and most of the
examples in this book comply with that standard. But many features of the new standard
already have appeared in some implementations, and this edition of C++ Primer Plus
explores these new features.
C++ Primer Plus discusses the basic C language and presents C++ features, making
this book self-contained. It presents C++ fundamentals and illustrates them with short,
to-the-point programs that are easy to copy and experiment with. You learn about
input/output (I/O), how to make programs perform repetitive tasks and make choices,
the many ways to handle data, and how to use functions. You learn about the many
features C++ has added to C, including the followi
Tags:
C++
Upload time:
2022-02-19
Uploaded by: trh505
-
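Head First Java is a complete learning guide to object-oriented (OO) programming and Java. Designed around learning theory, it takes you from the basics of the language through to topics such as threads, networking, and distributed programming. Most importantly, you learn to think like an object-oriented developer rather than just memorizing the book. Along the way you play games, do puzzles, solve mysteries, and interact with Java in unexpected ways. In these activities you also write a good number of real Java programs, such as a battleship-style game and a networked chat program. The richly illustrated Head First approach helps you absorb knowledge quickly, so open your mind and get ready to learn these key topics: ★ The Java language ★ Object-oriented development ★ Swing graphical interfaces ★ Using the Java API library ★ Writing, testing, and deploying applications ★ Exception handling; multithreading ★ Network programming ★ Collections and generics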
Tags:
java
Upload time:
2022-06-12
Uploaded by:
-
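The design of RFID (Radio Frequency Identification) middleware involves several layers of the system, such as data acquisition from RFID tags, tag data management, and RFID system security. For each layer, specific applications adopt different designs and implementations. Middleware designed this way, however, lacks consistency and flexibility, and designers cannot build RFID middleware within a single unified framework. A service-oriented architecture (SOA) for RFID middleware is a framework for software development across RFID application domains; it is a service-centered approach and environment for building distributed software systems that covers both the runtime environment and the programming architectural style. Developing RFID middleware with SOA improves the integrity, flexibility, and uniformity of the software design. Taking SOA as the foundation of RFID middleware design, this paper proposes a service-oriented RFID middleware platform architecture that addresses several problems in RFID middleware design, such as automatic parsing of EPC codes, RFID reader access, exchange and sharing of RFID tag data, and RFID system security. The paper builds the software architecture of the RFID middleware on SOA design principles and then, through system integration services (service lookup, service invocation, and service provision), clearly defines the RFID reader management service, the tag information service, the RFID security service, and others. This makes the middleware suitable for different RFID applications. Automatic parsing of EPC codes is implemented according to the EPCglobal standard, which both helps exchange and integrate RFID tag data across platforms and lowers the difficulty of building RFID systems for different applications.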
Tags:
rfid
Upload time:
2022-06-25
Uploaded by: