Introduction to Selenium: Example of Scraping JD.com Product Information and Images


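I. Introduction to Selenium

II. Components of Selenium

III. Features of Selenium

IV. Basic usage of Selenium

1. Download the driver for your browser

2. Create the project and import the dependency

3. Getting started

3.1 Ways to locate elements

3.2 Basic element operations

3.3 Code demonstration

V. Example: scraping JD.com product information and images

5.1 Execution result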

I. Introduction to Selenium

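 Selenium is a tool for automated testing of web applications. Selenium tests run directly in the browser, exactly as if a real user were operating it. Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, and others. It is used for automated testing and for crawling JavaScript-rendered (dynamic) pages, which also helps get past some anti-scraping measures.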

II. Components of Selenium

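 1) Selenium IDE: a browser plugin, originally for Firefox, that records and replays simple browser actions. It is mainly used to quickly create bug-reproduction scripts, and recordings can be exported to several programming languages.
 2) Selenium RC: the core component of the older Selenium 1 (since superseded by WebDriver). It supports test scripts written in many languages and reaches the application under test through its server, which acts as a proxy.
 3) Selenium WebDriver (the focus of this article): a browser automation framework that accepts commands and sends them to the browser. It is implemented through browser-specific drivers, communicates with the browser directly, and controls it. Selenium WebDriver supports many programming languages, such as Java, C#, PHP, Python, Perl, and Ruby.
 4) Selenium Grid: a supporting tool for distributed testing that runs multiple tests in parallel to speed up test execution.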
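To make the WebDriver description concrete: a driver such as chromedriver is a small local HTTP server that speaks the W3C WebDriver protocol, and the language bindings turn calls like driver.get(url) into HTTP requests to it. The sketch below only builds two of those requests with the JDK's java.net.http API (Java 11+) and sends nothing. It assumes a chromedriver started by hand on its default port 9515, and the class and method names are hypothetical; in real code ChromeDriver does all of this for you.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class WebDriverProtocolSketch {
    // Assumption: chromedriver was started by hand and listens on its default port.
    static final String DRIVER_URL = "http://localhost:9515";

    // W3C "New Session": ask the driver to start a Chrome session.
    static HttpRequest newSession() {
        String body = "{\"capabilities\":{\"alwaysMatch\":{\"browserName\":\"chrome\"}}}";
        return post("/session", body);
    }

    // W3C "Navigate To": roughly what driver.get(url) becomes on the wire.
    // The URL is pasted in without JSON escaping, so it must not contain quotes.
    static HttpRequest navigateTo(String sessionId, String url) {
        return post("/session/" + sessionId + "/url", "{\"url\":\"" + url + "\"}");
    }

    // Both commands used here are JSON POST requests to the driver.
    private static HttpRequest post(String path, String json) {
        return HttpRequest.newBuilder(URI.create(DRIVER_URL + path))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

Sending newSession() with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) should return JSON whose value object holds a sessionId; every later command (navigate, find element, click) carries that id in its URL path.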

III. Features of Selenium

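 1) Open source and free
 2) Multi-browser support: Firefox, Chrome, IE, Opera, Edge
 3) Multi-platform support: Linux, Windows, macOS
 4) Multi-language support: Java, Python, Ruby, C#, JavaScript, C++
 5) Good support for web pages
 6) Simple (a simple API) and flexible (driven from your own programming language)
 7) Supports distributed execution of test cases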

IV. Basic usage of Selenium

Web crawling: data collection, data cleaning, data analysis!
Getting started with a Java crawler

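1. Download the driver for your browser

Each browser needs its own driver; this article uses Google Chrome as the example.

Install browser drivers | Selenium (the official download page; it is hosted abroad and may load slowly)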


1.1 Check your browser version and download the matching driver version


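1.2 Other download locations

Firefox browser driver

Download: https://github.com/mozilla/geckodriver/releases/ (pick the build for your operating system)

Chrome (Google) browser driver

Download: http://chromedriver.storage.googleapis.com/index.html or https://sites.google.com/a/chromium.org/chromedriver/home (pick the build for your operating system). Note that the storage.googleapis.com index stops at ChromeDriver 114; drivers for Chrome 115 and later are published through the Chrome for Testing availability dashboard.

IE browser driver

Download: http://selenium-release.storage.googleapis.com/index.html (pick the build for your operating system)

Microsoft Edge (EdgeHTML) browser driver

Download: Microsoft Edge WebDriver - Microsoft Edge Developer (pick the build for your operating system)

Microsoft Edge (Chromium) browser driver

Download: Microsoft Edge WebDriver - Microsoft Edge Developer (pick the build for your operating system)

Opera browser driver

Download: https://github.com/operasoftware/operachromiumdriver/releases (pick the build for your operating system)

Safari browser driver

No separate download is needed: macOS ships safaridriver with Safari, so you only have to enable remote automation once (for example by running safaridriver --enable) and can then run your code directly.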
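Each driver downloaded above is registered through a Java system property before the driver object is created; this article uses webdriver.chrome.driver for Chrome. The following sketch collects the property names that Selenium 3.x reads for each browser and fails fast when the executable path is wrong, a common cause of startup errors. DriverSetup and configure are hypothetical names, not part of Selenium.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;

class DriverSetup {
    // System property each Selenium 3.x driver class reads to find its executable.
    static final Map<String, String> DRIVER_PROPERTY = Map.of(
            "chrome", "webdriver.chrome.driver",
            "firefox", "webdriver.gecko.driver",
            "ie", "webdriver.ie.driver",
            "edge", "webdriver.edge.driver",
            "opera", "webdriver.opera.driver");

    /** Registers a downloaded driver executable; fails fast if it is missing. */
    static void configure(String browser, Path driverExecutable) {
        String key = DRIVER_PROPERTY.get(browser.toLowerCase(Locale.ROOT));
        if (key == null) {
            // Safari is absent on purpose: macOS ships safaridriver itself.
            throw new IllegalArgumentException("No driver property for " + browser);
        }
        if (!Files.isRegularFile(driverExecutable)) {
            throw new IllegalArgumentException("Driver not found: " + driverExecutable);
        }
        System.setProperty(key, driverExecutable.toAbsolutePath().toString());
    }
}
```

Call it once before creating the driver, for example DriverSetup.configure("chrome", Path.of("D:\\chromedriver.exe")) followed by new ChromeDriver().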

2. Create the project and import the dependency

Import the Selenium jar with Maven:

    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>3.141.59</version>
    </dependency>

Or download it from the official site:

Downloads | Selenium

Click Download on that page; older versions are available there too.


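3. Getting started

Option 1: open a browser window and operate on (or scrape) the page in it

    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;

    // Point Selenium at the driver downloaded earlier (path to the unzipped executable)
    System.setProperty("webdriver.chrome.driver", "D:\\chromedriver.exe");
    // Create the driver; this launches a visible Chrome window
    WebDriver driver = new ChromeDriver();
    try {
        // Open the site to be scraped
        driver.get("https://www.baidu.com");
    } finally {
        // quit() closes every window and shuts down the driver process;
        // close() alone would only close the current window
        driver.quit();
    }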

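Option 2: run without opening a browser window (headless mode)

    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.chrome.ChromeOptions;

    // Point Selenium at the driver
    System.setProperty("webdriver.chrome.driver", "D:\\chromedriver.exe");

    // Browser options
    ChromeOptions chromeOptions = new ChromeOptions();
    // Run Chrome without a visible window
    chromeOptions.addArguments("--headless");
    // Initialize the driver with those options
    WebDriver driver = new ChromeDriver(chromeOptions);
    try {
        // Open the site to be scraped
        driver.get("https://www.baidu.com");
    } finally {
        // Close all windows and release the driver process
        driver.quit();
    }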
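When switching between a visible and a headless run (the JD example later in this article does so with a boolean flag), it can help to build the Chrome arguments in one place. This plain-Java sketch uses a hypothetical chromeArgs helper whose result would be passed to chromeOptions.addArguments(...), which also accepts a List<String>. Setting --window-size is an added assumption: headless Chrome starts with a small default window (800x600), which can change how a responsive page lays out its content.

```java
import java.util.ArrayList;
import java.util.List;

class ChromeArgs {
    /**
     * Hypothetical helper: builds the argument list for ChromeOptions.addArguments(...).
     * The window size is always set so headed and headless runs see the same layout.
     */
    static List<String> chromeArgs(boolean showBrowser, int width, int height) {
        List<String> args = new ArrayList<>();
        if (!showBrowser) {
            args.add("--headless");
        }
        args.add("--window-size=" + width + "," + height);
        return args;
    }
}
```

For example, chromeOptions.addArguments(ChromeArgs.chromeArgs(false, 1920, 1080)) adds --headless and --window-size=1920,1080.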

3.1 Ways to locate elements

 1) By class name: driver.findElement(By.className("s_ipt"));
 2) By ID: driver.findElement(By.id("kw"));
 3) By name: driver.findElement(By.name("wd"));
 4) By tag name: driver.findElements(By.tagName("input"));
 5) By link text (exact match on the text of an <a> tag): driver.findElement(By.linkText("地图"));
 6) By partial link text (partial match on the text of an <a> tag): driver.findElement(By.partialLinkText("使用百"));
 7) By CSS selector: driver.findElement(By.cssSelector("#kw"));
 8) By XPath: driver.findElement(By.xpath("//*[@id=\"kw\"]"));

In the browser's developer tools, right-click an element node and use Copy to get its XPath or CSS selector.
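To see what each locator strategy matches, the sketch below evaluates roughly equivalent XPath expressions with the JDK's own javax.xml.xpath API against a hypothetical, well-formed page fragment (not real Baidu markup, and with English link text). It does not use Selenium, and Selenium implements each strategy differently internally; the point is only the matching rules, such as link text matching the whole text of an <a> tag while partial link text matches any part of it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class LocatorDemo {
    // Hypothetical, well-formed stand-in for a search page (not real Baidu markup).
    static final String PAGE = "<html><body>"
            + "<form><input id='kw' name='wd' class='s_ipt'/>"
            + "<input id='su' type='submit'/></form>"
            + "<a href='/map'>Map</a>"
            + "<a href='/duty'>About this search engine</a>"
            + "</body></html>";

    // XPath expressions that roughly mirror each Selenium locator strategy.
    static final String BY_ID = "//*[@id='kw']";
    static final String BY_NAME = "//*[@name='wd']";
    static final String BY_CLASS =
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' s_ipt ')]";
    static final String BY_TAG = "//input";
    static final String BY_LINK_TEXT = "//a[normalize-space(.)='Map']";
    static final String BY_PARTIAL_LINK_TEXT = "//a[contains(., 'About')]";

    /** Returns every node in PAGE matched by the XPath expression. */
    static NodeList select(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(PAGE.getBytes(StandardCharsets.UTF_8)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }
}
```

For this fragment, select(LocatorDemo.BY_TAG).getLength() is 2, just as findElements(By.tagName("input")) would return two elements on such a page.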


3.2 Basic element operations

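 1. Get a single element: driver.findElement (throws NoSuchElementException if nothing matches)

 2. Get multiple elements: driver.findElements (returns an empty list if nothing matches)

 3. Type text into an input: input.sendKeys("java");

 4. Click an element: element.click();

 5. Read an element attribute: nextPageEle.getAttribute("class")

 6. Read an element's text content: titleEle.getText()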

3.3 Code demonstration

package com.zking.selenium;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import java.util.List;

public class Dome {
    public static void main(String[] args) {

        // Set the driver path (replace with the location of your own driver)
        System.setProperty("webdriver.chrome.driver", "D:\\课件与作业\\课件\\课件\\临时\\Selenium\\01.Selenium之入门\\资料\\chromedriver_win32\\chromedriver.exe");
        // Create the driver
        WebDriver driver = new ChromeDriver();
        // Open the site to be scraped
        driver.get("https://www.baidu.com");

        // Class selector
        WebElement wq1 = driver.findElement(By.className("s-hotsearch-title"));
        // Tag name of the element
        System.out.println(wq1.getTagName());
        // Text content of the element
        System.out.println(wq1.getText());

        // ID selector
        WebElement wq2 = driver.findElement(By.id("hotsearch_data"));
        // getAttribute reads any attribute, here "value"
        System.out.println(wq2.getAttribute("value"));
        // ...and here "style"
        System.out.println(wq2.getAttribute("style"));

        // Find an element by its name attribute
        WebElement element = driver.findElement(By.name("tn"));
        System.out.println(element.getAttribute("value"));

        // Find elements by tag name
        List<WebElement> lista = driver.findElements(By.tagName("a"));
        for (WebElement e : lista) {
            // href attribute of each <a> tag
            System.out.println(e.getAttribute("href"));
        }

        // Link elements on the page
        // linkText: exact match on the link text
        WebElement element1 = driver.findElement(By.linkText("地图"));
        System.out.println(element1.getText());
        System.out.println(element1.getAttribute("href"));
        // partialLinkText: partial match on the link text
        WebElement element2 = driver.findElement(By.partialLinkText("使用百"));
        System.out.println(element2.getText());
        System.out.println(element2.getAttribute("href"));

        // CSS selector
        WebElement element3 = driver.findElement(By.cssSelector("#bottom_space"));
        System.out.println(element3.getAttribute("class"));
        // XPath
        WebElement element4 = driver.findElement(By.xpath("//*[@id=\"s-top-left\"]"));
        System.out.println(element4.getAttribute("class"));

        // ---------------------- Simulated search ----------------------
        // Type into the search box, then click the search button
        WebElement kw = driver.findElement(By.id("kw"));
        kw.sendKeys("java");   // type text into the input box
        WebElement element5 = driver.findElement(By.id("su"));
        element5.click();      // simulate a click

        // Commented out here, which leaves the browser open on the results page;
        // uncomment to close the browser and release the driver process
        //driver.close();
        //driver.quit();
    }
}

V. Example: scraping JD.com product information and images

5.1 Execution result

[Figure: execution result]

---------------------------------------------------- Example code ----------------------------------------------------

package com.zking.selenium;

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;

import static java.lang.Thread.sleep;

public class Dome1
{
    public static void main(String[] args) throws InterruptedException, IOException {
        // Set the driver path (replace with the location of your own driver)
        System.setProperty("webdriver.chrome.driver","D:\\课件与作业\\课件\\课件\\临时\\Selenium\\01.Selenium之入门\\资料\\chromedriver_win32\\chromedriver.exe");

        // true: scrape in a visible browser window; false: scrape headlessly
        boolean showBrowser = false;
        WebDriver driver;

        if (showBrowser) {
            // Create the driver
            driver = new ChromeDriver();
        } else {
            // Browser options
            ChromeOptions chromeOptions = new ChromeOptions();
            // Run Chrome without a visible window
            chromeOptions.addArguments("--headless");
            // Initialize the driver
            driver = new ChromeDriver(chromeOptions);
        }

        try {
            // Open the site to be scraped
            driver.get("https://www.jd.com");
            // Search for a product: type the keyword into the search box
            driver.findElement(By.id("key")).sendKeys("jk");
            // Find the search button and click it
            driver.findElement(By.cssSelector("#search > div > div.form > button")).click();

            // Pause so the results can load (tune to your network speed);
            // on a slow connection the elements may not exist yet and lookups will fail
            sleep(2000);
            // Cast to JavascriptExecutor: executeScript runs JS synchronously, executeAsyncScript asynchronously
            // Scroll to the bottom of the page so lazily loaded items are rendered
            ((JavascriptExecutor) driver).executeScript("window.scrollTo(0,document.body.scrollHeight)");
            sleep(2000);
            // Each product in the JD search results is an <li> under #J_goodsList
            List<WebElement> elements = driver.findElements(By.cssSelector("#J_goodsList ul li"));
            for (WebElement e : elements) {
                WebElement img;
                try {
                    img = e.findElement(By.cssSelector(".p-img")).findElement(By.tagName("img"));
                } catch (NoSuchElementException ex) {
                    // Skip list entries that have no product image
                    continue;
                }
                // Image address: JD lazy-loads images, and reading only src sometimes returns an empty value.
                // While data-lazy-img holds the (protocol-relative) address, use it; once it reads "done", use src.
                String lazy = img.getAttribute("data-lazy-img");
                String imgpath = (lazy == null || "done".equals(lazy))
                        ? img.getAttribute("src")
                        : "https:" + lazy;
                System.out.println("----------------------------------------------------------");
                System.out.println("Image:\n" + imgpath);
                // Download the image (change the target folder to your own)
                downloadImage(imgpath, "D:\\Temp\\TestImge");
                // Product description
                String text = e.findElement(By.cssSelector(".p-name a")).getText();
                // Price
                String price = e.findElement(By.cssSelector(".p-price strong i")).getText();

                System.out.println("Description:\n" + text + "\nPrice: " + price);
            }
        } finally {
            // Close all windows and release the driver process
            driver.quit();
        }
    }

    // Download an image into the given folder
    public static void downloadImage(String img, String path) throws IOException, InterruptedException {
        URL url = new URL(img);
        // File name = last segment of the URL path (getPath() drops any query string)
        String urlPath = url.getPath();
        Path dir = Paths.get(path);
        Files.createDirectories(dir);
        Path target = dir.resolve(urlPath.substring(urlPath.lastIndexOf('/') + 1));
        // try-with-resources closes the stream even if the copy fails
        try (InputStream inputStream = url.openStream()) {
            Files.copy(inputStream, target, StandardCopyOption.REPLACE_EXISTING);
        }
        // Short delay between downloads
        sleep(300);
    }
}
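The two string decisions inside the loop above, which attribute holds the image address and what to name the saved file, do not need a browser. Pulling them into plain functions, as in this sketch with hypothetical names, lets them be checked with ordinary unit tests. The startsWith("//") guard and the empty-attribute check are added assumptions about JD's markup, not something the original code verifies.

```java
import java.net.URI;

class JdImageUrls {
    /**
     * Mirrors the choice made in the loop: use data-lazy-img (protocol-relative)
     * while it holds an address; once it is missing, empty, or "done", use src.
     */
    static String resolveImageUrl(String dataLazyImg, String src) {
        if (dataLazyImg == null || dataLazyImg.isEmpty() || "done".equals(dataLazyImg)) {
            return src;
        }
        // Assumption: lazy addresses look like "//host/path", so add a scheme.
        return dataLazyImg.startsWith("//") ? "https:" + dataLazyImg : dataLazyImg;
    }

    /** Last path segment of the URL, without any query string or fragment. */
    static String fileNameOf(String url) {
        String path = URI.create(url).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

The loop would then call resolveImageUrl(img.getAttribute("data-lazy-img"), img.getAttribute("src")).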

Executing JavaScript in the page through JavascriptExecutor:

// Run JS: scroll to the bottom of the page
((JavascriptExecutor) driver).executeScript("window.scrollTo(0,document.body.scrollHeight)");
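The example relies on fixed sleep(2000) pauses after searching and after scrolling, which waste time on a fast connection and may still be too short on a slow one. Selenium's own remedy is an explicit wait (WebDriverWait in org.openqa.selenium.support.ui). The JDK-only sketch below shows the same polling idea with a hypothetical waitUntil helper, so the logic can be tested without a browser.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

class Waits {
    /**
     * Hypothetical helper: re-checks the condition every pollEvery until it holds
     * or the timeout passes. Returns true once the condition is met, false on
     * timeout (or if the waiting thread is interrupted).
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollEvery) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Compare via subtraction so nanoTime overflow is handled correctly
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollEvery.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                return false;
            }
        }
    }
}
```

With Selenium, the condition after scrolling could be () -> !driver.findElements(By.cssSelector("#J_goodsList ul li")).isEmpty(), replacing the second sleep(2000) with a wait that ends as soon as products appear.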
