
Some time ago I was looking for a way to search Google from a Java program. I was surprised to find that Google once offered a web search API, but it was deprecated long ago, and there is now no standard way to do this.

Basically, a Google search is an HTTP GET request in which the query is passed as a parameter in the URL, and we have seen earlier that there are several options for making such a request, such as Java HttpURLConnection or Apache HttpClient. The harder part is parsing the HTML response and extracting the useful information from it. That's why I chose jsoup, an open source HTML parser that can also fetch the HTML from a given URL.
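To make the "query parameter is part of the URL" idea concrete, here is a minimal sketch that builds such a GET URL with only the standard library. The class name SearchUrlBuilder is made up for illustration, the base URL is assumed to be the standard Google web search endpoint, and q and num are the two parameters the program in this post sends.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SearchUrlBuilder {

    // Assumption: the standard Google web search endpoint
    static final String BASE_URL = "https://www.google.com/search";

    // Builds the GET URL; the search term is URL-encoded so that spaces
    // and characters such as '&' or '+' cannot break the query string
    static String build(String searchTerm, int num) {
        return BASE_URL + "?q=" + encode(searchTerm) + "&num=" + num;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints https://www.google.com/search?q=java+design+patterns&num=10
        System.out.println(build("java design patterns", 10));
    }
}
```

The resulting string can be handed to any HTTP client, including Jsoup.connect.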

Below is a simple Java program that fetches Google search results and then parses the response to pull out the individual results.


package com.journaldev.jsoup;

import java.io.IOException;
import java.net.URLEncoder;
import java.util.Scanner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class GoogleSearchJava {

    // Assumed endpoint: the standard Google web search URL
    public static final String GOOGLE_SEARCH_URL = "https://www.google.com/search";

    public static void main(String[] args) throws IOException {
        // Taking search term input from console
        Scanner scanner = new Scanner(System.in);
        System.out.println("Please enter the search term.");
        String searchTerm = scanner.nextLine();
        System.out.println("Please enter the number of results. Example: 5 10 20");
        int num = scanner.nextInt();
        scanner.close();

        // URL-encode the term so that spaces and symbols survive in the query string
        String searchURL = GOOGLE_SEARCH_URL + "?q=" + URLEncoder.encode(searchTerm, "UTF-8") + "&num=" + num;

        // without proper User-Agent, we will get 403 error
        Document doc = Jsoup.connect(searchURL).userAgent("Mozilla/5.0").get();

        // below will print HTML data, save it to a file and open in browser to compare
        // System.out.println(doc.html());

        // If google search results HTML change the <h3 class="r" to <h3 class="r1"
        // we need to change below accordingly
        Elements results = doc.select("h3.r > a");

        for (Element result : results) {
            String linkHref = result.attr("href");
            String linkText = result.text();
            System.out.println("Text::" + linkText + ", URL::" + linkHref.substring(6, linkHref.indexOf("&")));
        }
    }
}
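The substring(6, indexOf("&")) call in the program only works when every href has exactly the expected shape, and it throws an exception when there is no "&" at all. Here is a more defensive sketch for pulling the target address out of a result link. It assumes the links are redirect-style hrefs such as /url?q=https://example.com/&sa=U; that shape is inferred from the substring arithmetic and is not guaranteed. ResultLinkParser is a hypothetical helper, not part of jsoup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class ResultLinkParser {

    // Assumed prefix of a redirect-style result link
    private static final String PREFIX = "/url?q=";

    // Returns the decoded target URL, or the href unchanged if it
    // does not have the assumed "/url?q=<target>&..." shape
    static String extractTarget(String href) {
        if (href == null || !href.startsWith(PREFIX)) {
            return href;
        }
        int end = href.indexOf('&', PREFIX.length());
        String raw = (end < 0)
                ? href.substring(PREFIX.length())
                : href.substring(PREFIX.length(), end);
        try {
            // The target is percent-encoded inside the q parameter
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints https://example.com/page
        System.out.println(extractTarget("/url?q=https://example.com/page&sa=U"));
    }
}
```

In the loop, this would be called as ResultLinkParser.extractTarget(result.attr("href")) in place of the substring expression.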

Below is sample output from the above program. I saved the HTML data to a file and opened it in a browser to confirm that the output is what we wanted. Compare the output with the image below.

Please enter the search term.
Please enter the number of results. Example: 5 10 20
Text::JournalDev, URL::=
Text::Java Interview Questions, URL::=
Text::Java design patterns, URL::=
Text::Tutorials, URL::=
Text::Java servlet, URL::=
Text::Spring Framework Tutorial ..., URL::=
Text::Java Design Patterns PDF ..., URL::=
Text::Pankaj Kumar (@JournalDev) | Twitter, URL::=
Text::JournalDev | Facebook, URL::=
Text::JournalDev - Chrome Web Store - Google, URL::=
Text::Debian -- Details of package libsystemd-journal-dev in wheezy, URL::=
Text::Debian -- Details of package libsystemd-journal-dev in wheezy ..., URL::=
Text::Debian -- Details of package libsystemd-journal-dev in sid, URL::=
Text::Debian -- Details of package libsystemd-journal-dev in jessie, URL::=
Text::Ubuntu – Details of package libsystemd-journal-dev in trusty, URL::=
Text::libsystemd-journal-dev : Utopic (14.10) : Ubuntu - Launchpad, URL::=
Text::Debian -- Details of package libghc-libsystemd-journal-dev in jessie, URL::=
Text::Advertise on JournalDev | BuySellAds, URL::=
Text::JournalDev | LinkedIn, URL::=
Text::How to install libsystemd-journal-dev package in Ubuntu Trusty, URL::=
Text::[global] auth supported = cephx ms bind ipv6 = true [mon] mon data ..., URL::=
Text::UbuntuUpdates - Package "libsystemd-journal-dev" (trusty 14.04), URL::=
Text::[Journal]Dev'err - Cursus Honorum - Enjin, URL::=

That's all for running a Google search from a Java program. Use it cautiously: if Google sees unusual traffic coming from your computer, chances are it will block you.
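One simple way to stay cautious is to leave a gap between consecutive requests. Below is a minimal sketch, assuming a hypothetical RequestThrottle helper (it is not part of jsoup) and an arbitrary two-second gap. I don't know of any interval that is guaranteed to avoid blocking, so treat the number as a placeholder.

```java
final class RequestThrottle {

    private final long minIntervalMillis;
    private long lastRequestMillis = -1; // -1 means no request has been made yet

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Returns how many milliseconds the caller should wait before sending
    // the next request, and records the time at which it will be sent
    long reserve(long nowMillis) {
        long wait = 0;
        if (lastRequestMillis >= 0) {
            long earliestAllowed = lastRequestMillis + minIntervalMillis;
            wait = Math.max(0, earliestAllowed - nowMillis);
        }
        lastRequestMillis = nowMillis + wait;
        return wait;
    }

    public static void main(String[] args) throws InterruptedException {
        RequestThrottle throttle = new RequestThrottle(2000); // placeholder gap
        for (int i = 1; i <= 3; i++) {
            Thread.sleep(throttle.reserve(System.currentTimeMillis()));
            System.out.println("request " + i + " would be sent now");
        }
    }
}
```

In the search program, the Thread.sleep(throttle.reserve(...)) line would go right before each Jsoup.connect call.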





