Web Spider Scraper Code, CSharp Code Library, 德仔网


[CSharp] Web Spider Scraper

Author: 寂静之秋 / Published 2013/3/1 / 679

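This code has been revised and is newer than the version on my blog. It detects UTF-8 and GB2312 automatically. Pages reported as iso-8859-1 cannot be identified, so for those the encoding has to be selected manually. The scraped HTML can be filtered down to plain text with regular expressions. The jQuery file is not included in the upload; after downloading, change the script path to point at your own copy. This has little effect on the program itself.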
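The code-behind file is not reproduced on this page, so the following is only a minimal sketch of the two steps described above, not the author's implementation. The class name `ScrapeSketch`, the method names, and the 4096-byte head size are all assumptions made for illustration. It checks for a UTF-8 byte order mark, then a declared `charset`, then falls back to a strict UTF-8 decode; a declared iso-8859-1 is treated as "unknown" so the caller can ask for a manual choice. On .NET Core and later, names such as GB2312 resolve only after `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` has been called; on .NET Framework they are available by default.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; not taken from WebClientDemo.aspx.cs.
public static class ScrapeSketch
{
    // Matches charset=... as it appears in a <meta> tag.
    private static readonly Regex CharsetPattern = new Regex(
        @"charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)", RegexOptions.IgnoreCase);

    // Returns the detected encoding, or null when it must be chosen manually.
    public static Encoding DetectEncoding(byte[] data)
    {
        // 1. A UTF-8 byte order mark is decisive.
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8;

        // 2. Read the start of the page as ASCII and look for a declared charset.
        //    (The 4096-byte window is an assumption.)
        string head = Encoding.ASCII.GetString(data, 0, Math.Min(data.Length, 4096));
        Match m = CharsetPattern.Match(head);
        if (m.Success)
        {
            string name = m.Groups[1].Value;
            // Treat iso-8859-1 as undetermined, as the description says.
            if (name.Equals("iso-8859-1", StringComparison.OrdinalIgnoreCase))
                return null;
            try { return Encoding.GetEncoding(name); }
            catch (ArgumentException) { return null; } // unknown or unsupported name
        }

        // 3. No declaration: accept UTF-8 only if the bytes decode strictly.
        try
        {
            new UTF8Encoding(false, true).GetString(data);
            return Encoding.UTF8;
        }
        catch (DecoderFallbackException) { return null; }
    }

    // Reduces HTML to plain text using regular expressions.
    public static string HtmlToText(string html)
    {
        // Remove script and style elements together with their contents.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Remove comments, then every remaining tag.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);
        text = Regex.Replace(text, @"<[^>]+>", " ");
        // Decode entities such as &amp; and collapse runs of whitespace.
        text = WebUtility.HtmlDecode(text);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Regular expressions are adequate for a quick text preview like this one, but they do not handle malformed or deeply nested markup reliably; an HTML parser would be the safer choice for anything beyond that.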


	<%@ Page Language="C#" AutoEventWireup="true" CodeFile="WebClientDemo.aspx.cs" Inherits="WebClientDemo" ValidateRequest="false" %>

	<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
	<html xmlns="http://www.w3.org/1999/xhtml">
	<head runat="server">
	 <title></title>
	 <script src="scripts/jquery-1.7.2.js" type="text/javascript"></script>
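	 <script type="text/javascript">
	 $(function () {
	     var txt1 = $("#TextBox1");
	     // Pre-fill the URL box with the scheme prefix when it is empty.
	     if (txt1.val() == null || txt1.val() == "") {
	         txt1.val("http://");
	     }
	     // Select the whole URL on hover, unless only the prefix is present.
	     txt1.mouseover(function () {
	         if ($(this).val() != "http://") {
	             $(this).select();
	         }
	     });
	     // Restore the prefix if the box is cleared.
	     txt1.change(function () {
	         if ($(this).val() == null || $(this).val() == "") {
	             $(this).val("http://");
	         }
	     });
	 });
	 </script>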
	</head>
	<body>
	 <form id="form1" runat="server">
	 <div>
	 <asp:TextBox ID="TextBox1" runat="server" Width="786px"></asp:TextBox>
	 <asp:DropDownList ID="ddl_encoding" runat="server">
	 <asp:ListItem Value="0">Auto-detect</asp:ListItem>
	 <asp:ListItem>UTF-8</asp:ListItem>
	 <asp:ListItem>GB2312</asp:ListItem>
	 <asp:ListItem>GBK</asp:ListItem>
	 <asp:ListItem>Big5</asp:ListItem>
	 </asp:DropDownList>
	 <asp:Button ID="Button1" runat="server" Text="OK" OnClick="Button1_Click" />
	 
	 <asp:TextBox ID="TextBox2" runat="server" Height="184px" TextMode="MultiLine" Width="796px"></asp:TextBox>
	 
	 <asp:TextBox ID="TextBox3" runat="server" Height="100px" TextMode="MultiLine" Width="796px"></asp:TextBox>
	 </div>
	 <div>
	 <asp:Literal ID="Literal1" runat="server"></asp:Literal>
	 </div>
	 </form>
	</body>
	</html>
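
The markup above offers an auto-detect entry with value `0` plus four named encodings, but the handler that reads `ddl_encoding` is not shown. As a rough sketch of how the selected value might be turned into an `Encoding`, assuming a hypothetical `autoDetect` callback that returns null when detection fails and an assumed UTF-8 fallback:

```csharp
using System;
using System.Text;

// Hypothetical helper; the real Button1_Click handler is not shown on this page.
public static class EncodingChoiceSketch
{
    // selectedValue: the DropDownList's SelectedValue. A ListItem without an
    // explicit Value attribute uses its text ("UTF-8", "GB2312", ...) as the value.
    // autoDetect: assumed callback that returns null when detection fails.
    public static Encoding Resolve(string selectedValue, Func<Encoding> autoDetect)
    {
        if (selectedValue == "0")
        {
            // Auto-detect entry; falling back to UTF-8 is an assumption.
            return autoDetect() ?? Encoding.UTF8;
        }
        // Named entry: look the encoding up by name.
        return Encoding.GetEncoding(selectedValue);
    }
}
```

On .NET Framework, which ASP.NET WebForms runs on, GB2312, GBK and Big5 can be looked up by name directly. On .NET Core and later they need `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` first.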
