1. Create the Maven project

Adjust the location of the local Maven repository. For details, see: http://blog.csdn.net/tototuzuoquan/article/details/74571374

2. Write the POM file

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>cn.toto.spark</groupId>
    <artifactId>bigdata</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>1.7</maven.compiler.source>
        <maven.compiler.target>1.7</maven.compiler.target>
        <encoding>UTF-8</encoding>
        <scala.version>2.10.6</scala.version>
        <spark.version>1.6.2</spark.version>
        <hadoop.version>2.6.4</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.38</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-make:transitive</arg>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

3. Prepare the files to process

The IP rule file (ip.txt) looks like this:

1.0.1.0|1.0.3.255|16777472|16778239|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
1.0.8.0|1.0.15.255|16779264|16781311|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.0.32.0|1.0.63.255|16785408|16793599|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.1.0.0|1.1.0.255|16842752|16843007|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
1.1.2.0|1.1.7.255|16843264|16844799|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
1.1.8.0|1.1.63.255|16844800|16859135|亚洲|中国|广东|广州||电信|440100|China|CN|113.280637|23.125178
1.2.0.0|1.2.1.255|16908288|16908799|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302
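
Each line of ip.txt is pipe-delimited: the third and fourth fields hold the numeric start and end of the IP range, and the seventh field holds the province. The sketch below parses one sample line and checks that the dotted start address converts to the numeric start value. The names `IpRuleDemo`, `IpRule`, and `parseRule` are hypothetical and used only for illustration; the field positions come from the sample above.

```scala
object IpRuleDemo {
  // Hypothetical holder for the three fields the rest of the post relies on.
  case class IpRule(start: Long, end: Long, province: String)

  // Splits a pipe-delimited rule line; "|" is a regex metacharacter, so it is escaped.
  def parseRule(line: String): IpRule = {
    val fields = line.split("\\|")
    IpRule(fields(2).toLong, fields(3).toLong, fields(6))
  }

  // Shifts the accumulated value left by 8 bits and ORs in the next octet.
  def ip2Long(ip: String): Long =
    ip.split("[.]").foldLeft(0L)((acc, part) => (acc << 8) | part.toLong)

  def main(args: Array[String]): Unit = {
    val line = "1.0.1.0|1.0.3.255|16777472|16778239|亚洲|中国|福建|福州||电信|350100|China|CN|119.306239|26.075302"
    val rule = parseRule(line)
    println(rule)
    // The dotted addresses in fields 1 and 2 should agree with the numeric fields 3 and 4.
    println(ip2Long("1.0.1.0") == rule.start)
    println(ip2Long("1.0.3.255") == rule.end)
  }
}
```

Parsing the numeric fields to `Long` once, up front, avoids repeating the conversion on every comparison during the lookup.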

The access log file (access.log) looks like this:

20090121000132095572000|125.213.100.123|show.51.com|/shoplist.php?phpfile=shoplist2.php&style=1&sex=137|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0(Compatible Mozilla/4.0(Compatible-EmbeddedWB 14.59 http://bsalsa.com/ EmbeddedWB- 14.59  from: http://bsalsa.com/ )|http://show.51.com/main.php|
20090121000132124542000|117.101.215.133|www.jiayuan.com|/19245971|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; TencentTraveler 4.0)|http://photo.jiayuan.com/index.php?uidhash=d1c3b69e9b8355a5204474c749fb76ef|__tkist=0; myloc=50%7C5008; myage=2009; PROFILE=14469674%3A%E8%8B%A6%E6%B6%A9%E5%92%96%E5%95%A1%3Am%3Aphotos2.love21cn.com%2F45%2F1b%2F388111afac8195cc5d91ea286cdd%3A1%3A%3Ahttp%3A%2F%2Fimages.love21cn.com%2Fw4%2Fglobal%2Fi%2Fhykj_m.jpg; last_login_time=1232454068; SESSION_HASH=8176b100a84c9a095315f916d7fcbcf10021e3af; RAW_HASH=008a1bc48ff9ebafa3d5b4815edd04e9e7978050; COMMON_HASH=45388111afac8195cc5d91ea286cdd1b; pop_1232093956=1232468896968; pop_time=1232466715734; pop_1232245908=1232469069390; pop_1219903726=1232477601937; LOVESESSID=98b54794575bf547ea4b55e07efa2e9e; main_search:14469674=%7C%7C%7C00; registeruid=14469674; REG_URL_COOKIE=http%3A%2F%2Fphoto.jiayuan.com%2Fshowphoto.php%3Fuid_hash%3D0319bc5e33ba35755c30a9d88aaf46dc%26total%3D6%26p%3D5; click_count=0%2C3363619
20090121000132406516000|117.101.222.68|gg.xiaonei.com|/view.jsp?p=389|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; CIBA)|http://home.xiaonei.com/Home.do?id=229670724|_r01_=1; __utma=204579609.31669176.1231940225.1232462740.1232467011.145; __utmz=204579609.1231940225.1.1.utmccn=(direct)
20090121000132581311000|115.120.36.118|tj.tt98.com|/tj.htm|Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; TheWorld)|http://www.tt98.com/|
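
The access log is also pipe-delimited, and the client IP is the second field. A minimal sketch of extracting that field defensively is shown below. `AccessLogDemo` and `extractIp` are hypothetical names, and the IPv4 validation is an added assumption: the code later in this post simply takes `fields(1)` without checking it.

```scala
object AccessLogDemo {
  // Four groups of 1 to 3 digits separated by dots; range-checked below.
  private val Ipv4 = """(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})""".r

  // Returns the second pipe-delimited field if it looks like an IPv4 address.
  def extractIp(line: String): Option[String] = {
    val fields = line.split("\\|")
    if (fields.length < 2) None
    else fields(1) match {
      case Ipv4(a, b, c, d) if Seq(a, b, c, d).forall(_.toInt <= 255) => Some(fields(1))
      case _ => None
    }
  }

  def main(args: Array[String]): Unit = {
    val line = "20090121000132581311000|115.120.36.118|tj.tt98.com|/tj.htm|" +
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; TheWorld)|http://www.tt98.com/|"
    println(extractIp(line))
    println(extractIp("not a log line"))
  }
}
```

Returning an `Option` lets a Spark job drop malformed lines with `flatMap` instead of failing the whole task.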

4. Look up the location of an IP address

package cn.toto.spark

import java.io.{BufferedReader, FileInputStream, InputStreamReader}

import scala.collection.mutable.ArrayBuffer

/**
  * Created by toto on 2017/7/8.
  * Looks up the location information for an IP address.
  */
object IPLocationDemo {

  // Converts a dotted IPv4 string into its numeric value, one octet at a time.
  def ip2Long(ip: String): Long = {
    val fragments = ip.split("[.]")
    var ipNum = 0L
    for (i <- 0 until fragments.length) {
      ipNum = fragments(i).toLong | (ipNum << 8L)
    }
    ipNum
  }

  // Reads every line of the file into a buffer and always closes the reader.
  def readData(path: String) = {
    val br = new BufferedReader(new InputStreamReader(new FileInputStream(path)))
    val lines = new ArrayBuffer[String]()
    try {
      var s = br.readLine()
      while (s != null) {
        lines += s
        s = br.readLine()
      }
    } finally {
      br.close()
    }
    lines
  }

  // Binary search over rule lines sorted by range start.
  // Returns the index of the range containing ip, or -1 if there is none.
  def binarySearch(lines: ArrayBuffer[String], ip: Long): Int = {
    var low = 0
    var high = lines.length - 1
    while (low <= high) {
      val middle = (low + high) / 2
      // Split the line once per step instead of once per comparison.
      val fields = lines(middle).split("\\|")
      val start = fields(2).toLong
      val end = fields(3).toLong
      if (ip >= start && ip <= end)
        return middle
      if (ip < start)
        high = middle - 1
      else
        low = middle + 1
    }
    -1
  }

  /**
    * The result of a run is:
    * 2016917821
    * 120.55.0.0|120.55.255.255|2016870400|2016935935|亚洲|中国|浙江|杭州||阿里巴巴|330100|China|CN|120.153576|30.287459
    *
    * The value 2016917821 must lie between 2016870400 and 2016935935.
    * @param args
    */
  def main(args: Array[String]): Unit = {
    val ip = "120.55.185.61"
    val ipNum = ip2Long(ip)
    println(ipNum)
    val lines = readData("E:\\learnTempFolder\\ip.txt")
    val index = binarySearch(lines, ipNum)
    // binarySearch returns -1 when no range matches, so guard the lookup.
    if (index >= 0) print(lines(index))
    else print("No matching range for " + ip)
  }
}

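Output: the program prints the numeric value of the IP followed by the matching rule line, as recorded in the comment above `main`.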
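
The lookup in step 4 depends on the rules being sorted by range start and not overlapping. The sketch below runs the same binary search over the seven ranges from the ip.txt sample, with two changes that are assumptions of this example rather than the author's code: the ranges are pre-parsed into `(Long, Long)` pairs, and a miss is reported as `None` instead of `-1`. `RangeSearchDemo` and `search` are hypothetical names.

```scala
object RangeSearchDemo {
  // Sorted, non-overlapping (start, end) pairs copied from the ip.txt sample.
  val ranges: Array[(Long, Long)] = Array(
    (16777472L, 16778239L),
    (16779264L, 16781311L),
    (16785408L, 16793599L),
    (16842752L, 16843007L),
    (16843264L, 16844799L),
    (16844800L, 16859135L),
    (16908288L, 16908799L)
  )

  def ip2Long(ip: String): Long =
    ip.split("[.]").foldLeft(0L)((acc, part) => (acc << 8) | part.toLong)

  // Returns the index of the range containing ip, or None if it falls in a gap.
  def search(ranges: Array[(Long, Long)], ip: Long): Option[Int] = {
    var low = 0
    var high = ranges.length - 1
    while (low <= high) {
      val middle = low + (high - low) / 2
      val (start, end) = ranges(middle)
      if (ip < start) high = middle - 1
      else if (ip > end) low = middle + 1
      else return Some(middle)
    }
    None
  }

  def main(args: Array[String]): Unit = {
    println(search(ranges, ip2Long("1.0.9.1")))  // inside 1.0.8.0 to 1.0.15.255
    println(search(ranges, ip2Long("1.0.4.0")))  // in the gap after 1.0.3.255
  }
}
```

An `Option` result makes the "no match" case explicit at the call site, so indexing the rule array with `-1` cannot happen by accident.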


5. Look up IP locations and store the results in a MySQL database

The code is as follows:

package cn.toto.spark

import java.sql.{Connection, Date, DriverManager, PreparedStatement}

import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by toto on 2017/7/8.
  */
object IPLocation {

  // Writes one partition of (province, count) pairs to MySQL.
  val data2MySQL = (iterator: Iterator[(String, Int)]) => {
    var conn: Connection = null
    var ps: PreparedStatement = null
    val sql = "INSERT INTO location_info (location, counts, accesse_date) VALUES (?, ?, ?)"
    try {
      conn = DriverManager.getConnection("jdbc:mysql://192.168.106.100:3306/bigdata", "root", "123456")
      // Prepare the statement once per partition and reuse it for every row.
      ps = conn.prepareStatement(sql)
      iterator.foreach(line => {
        ps.setString(1, line._1)
        ps.setInt(2, line._2)
        ps.setDate(3, new Date(System.currentTimeMillis()))
        ps.executeUpdate()
      })
    } catch {
      case e: Exception => println("Mysql Exception: " + e.getMessage)
    } finally {
      if (ps != null)
        ps.close()
      if (conn != null)
        conn.close()
    }
  }

  // Converts a dotted IPv4 string into its numeric value.
  def ip2Long(ip: String): Long = {
    val fragments = ip.split("[.]")
    var ipNum = 0L
    for (i <- 0 until fragments.length) {
      ipNum = fragments(i).toLong | (ipNum << 8L)
    }
    ipNum
  }

  // Binary search over (start, end, province) rules sorted by range start.
  // Returns the index of the matching rule, or -1 if there is none.
  def binarySearch(lines: Array[(String, String, String)], ip: Long): Int = {
    var low = 0
    var high = lines.length - 1
    while (low <= high) {
      val middle = (low + high) / 2
      if ((ip >= lines(middle)._1.toLong) && (ip <= lines(middle)._2.toLong))
        return middle
      if (ip < lines(middle)._1.toLong)
        high = middle - 1
      else
        low = middle + 1
    }
    -1
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("IpLocation")
    val sc = new SparkContext(conf)

    val ipRulesRdd = sc.textFile("E://workspace//ip.txt").map(line => {
      val fields = line.split("\\|")
      val start_num = fields(2)
      val end_num = fields(3)
      val province = fields(6)
      (start_num, end_num, province)
    })
    // All of the IP mapping rules
    val ipRulesArray = ipRulesRdd.collect()

    // Broadcast the rules to the executors
    val ipRulesBroadcast = sc.broadcast(ipRulesArray)

    // Load the data to process and keep only the IP field
    val ipsRDD = sc.textFile("E://workspace//access.log").map(line => {
      val fields = line.split("\\|")
      fields(1)
    })

    val result = ipsRDD
      .map(ip => binarySearch(ipRulesBroadcast.value, ip2Long(ip)))
      // Skip IPs that match no rule (binarySearch returned -1)
      .filter(_ >= 0)
      // Each rule is (start of range, end of range, province name)
      .map(index => (ipRulesBroadcast.value(index)._3, 1))
      .reduceByKey(_ + _)

    // Write the result to MySQL
    result.foreachPartition(data2MySQL(_))

    //println(result.collect().toBuffer)

    sc.stop()
  }
}
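
The `data2MySQL` function relies on try/catch/finally to release the JDBC statement and connection even when an insert fails. The same guarantee can be factored into a small helper, sketched below. `ResourceDemo`, `withResource`, and `FakeConnection` are hypothetical names; `FakeConnection` is only a stand-in so the example runs without a database. Since Java 7, `java.sql.Connection` and `java.sql.PreparedStatement` implement `AutoCloseable`, so they would fit the same helper.

```scala
import scala.collection.mutable.ArrayBuffer

object ResourceDemo {
  // Runs body with the resource and always closes it, even if body throws.
  def withResource[R <: AutoCloseable, A](resource: R)(body: R => A): A =
    try body(resource)
    finally if (resource != null) resource.close()

  // Stand-in for a JDBC connection: records rows instead of sending them to MySQL.
  class FakeConnection extends AutoCloseable {
    var closed = false
    val written = ArrayBuffer[(String, Int)]()
    def write(row: (String, Int)): Unit = { written += row }
    override def close(): Unit = { closed = true }
  }

  def main(args: Array[String]): Unit = {
    val conn = new FakeConnection
    withResource(conn) { c =>
      Iterator(("广东", 3), ("福建", 4)).foreach(row => c.write(row))
    }
    println(conn.written)
    println(conn.closed)
  }
}
```

Unlike the catch block in `data2MySQL`, this helper lets the exception propagate after cleanup, which in a Spark job would fail the task and make the problem visible instead of only logging it.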

Database SQL:

CREATE DATABASE bigdata CHARACTER SET utf8;

USE bigdata;

CREATE TABLE location_info (
    id INT(10) AUTO_INCREMENT PRIMARY KEY,
    location VARCHAR(100),
    counts INT(10),
    accesse_date DATE
) ENGINE=INNODB DEFAULT CHARSET=utf8;

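Run the program. When it finishes, the per-province access counts have been written to the location_info table.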
