Series Introduction

Packetloop CTO Michael Baker (@cloudjunky) made a big splash when he presented ‘Finding Needles in Haystacks (the Size of Countries)’ at Black Hat Europe earlier this year. The paper outlines a toolkit based on Apache Pig, Packetpig @packetpig (available on GitHub), for doing network security monitoring and intrusion detection analysis on full packet captures using Hadoop.

In this series of posts, we’re going to introduce Big Data Security and explore using Packetpig on real full packet captures to understand and analyze networks. In this post, Michael will introduce big data security in the form of full data capture, Packetpig and Packetloop.

Introducing Packetpig

Intrusion detection is the analysis of network traffic to detect intruders on your network. Most intrusion detection systems (IDS) look for signatures of known attacks and identify them in real time. Packetpig is different. Packetpig analyzes full packet captures – that is, logs of every single packet sent across your network – after the fact. Because it runs Hadoop over full packet captures, Packetpig can detect ‘zero day’ or unknown exploits in historical data as new exploits are discovered. Which is to say that Packetpig can determine whether intruders are already in your network, how long they have been there, and what they have stolen or abused.
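To make the retrospective idea concrete, here is a minimal Python sketch of re-applying a newly published signature to previously captured traffic. The record and signature structures are invented for illustration – they are not Packetpig's actual formats, which operate on pcap files via Pig loaders:

```python
# Sketch: retrospective detection over stored traffic records.
# Record/signature shapes here are illustrative, not Packetpig's real format.

def rescan(history, signatures):
    """Re-apply a (possibly newer) signature set to historical records."""
    hits = []
    for record in history:
        for sig in signatures:
            # A naive substring match stands in for real signature matching.
            if sig["pattern"] in record["payload"]:
                hits.append({"time": record["time"], "sig": sig["name"]})
    return hits

# Traffic captured in January...
history = [
    {"time": "2013-01-05T10:00", "payload": "GET /index.html"},
    {"time": "2013-01-07T02:13", "payload": "GET /cgi-bin/;exploit-xyz"},
]

# ...scanned with a signature that was only published in March.
signatures = [{"name": "exploit-xyz", "pattern": ";exploit-xyz"}]

print(rescan(history, signatures))
```

Because the raw packets are kept, the scan can be repeated every time the signature set grows – exactly what a real-time-only IDS cannot do.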

Packetpig is a Network Security Monitoring (NSM) toolset where the ‘Big Data’ is full packet captures. Like a TiVo for your network, through its integration with Snort, p0f and custom Java loaders, Packetpig does deep packet inspection, file extraction, feature extraction, operating system detection, and other deep network analysis. Packetpig’s analysis of full packet captures focuses on providing as much context as possible to the analyst – context they have never had before. This is a ‘Big Data’ opportunity.

Full Packet Capture: A Big Data Opportunity

What makes full packet capture possible is cheap storage – the driving factor behind ‘big data.’ A standard 100Mbps internet connection can be cheaply logged for months with a 3TB disk. Apache Hadoop is optimized around cheap storage and data locality: putting spindles next to processor cores. And what better way to analyze full packet captures than with Apache Pig, a dataflow scripting interface on top of Hadoop?
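The "months on a 3TB disk" claim is easy to sanity-check with back-of-envelope arithmetic. The average utilisation figure below is an assumption (most links are far from saturated around the clock), not a number from the article:

```python
# Back-of-envelope capture retention: how long a disk holds full captures.
# avg_utilisation is an assumed fraction of link capacity actually used.

def retention_days(link_mbps, avg_utilisation, disk_tb):
    # Mbps -> bytes/second, scaled by utilisation, times seconds per day.
    bytes_per_day = link_mbps / 8 * 1e6 * avg_utilisation * 86400
    return disk_tb * 1e12 / bytes_per_day

# A 100 Mbps link averaging 3% utilisation fills a 3 TB disk in ~93 days.
print(round(retention_days(100, 0.03, 3)))
```

At full saturation the same disk lasts under three days, which is why the retention window depends so directly on how busy the link really is.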

In the enterprise today, there is no single location or system to provide a comprehensive view of a network in terms of threats, sessions, protocols and files. This information is generally distributed across domain-specific systems such as IDS correlation engines and data stores, Netflow repositories, bandwidth optimisation systems or Data Loss Prevention tools. Security Information and Event Management (SIEM) systems offer to consolidate this information, but they operate on logs – a digest or snippet of the original information. They don’t provide full-fidelity information that can be queried against an exact copy of the original incident.

Packet captures are a standard binary format for storing network data. They are cheap to perform, and the data can be stored in the cloud or on low-cost disk in the enterprise network. The length of retention can be based on the amount of data flowing through the network each day and the window of time you want to be able to peer into the past.

Pig, Packetpig and Open Source Tools

In developing Packetpig, Packetloop wanted to provide free tools for the analysis of network packet captures spanning weeks, months or even years. The simpler problems of capturing and storing network data had been solved, but no one had addressed the fundamental problem of analysis. Packetpig solves this by building on the Hadoop stack.

For us, wrapping Snort and p0f was a bit of an homage to how much security professionals value and rely on open source tools. We felt that if we didn’t offer an open source way of analysing full packet captures, we had missed a real opportunity to pioneer in this area. We wanted it to be simple, turnkey and easy for people to take our work and expand on it. This is why Apache Pig was selected for the project.

Understanding your Network

One of the first data sets we were given to analyse was a 3TB data set from a customer – every packet in and out of their 100Mbps internet connection for six weeks. It contained approximately 500,000 attacks. Making sense of this volume of information is incredibly difficult with current tooling; even Network Security Monitoring (NSM) tools struggle with data at this scale. But it’s not just size and scale – no existing toolset provides the same level of context. Packetpig allows you to join together information related to threats, sessions, protocols (deep packet inspection) and files, as well as geolocation and operating system detection information.
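The kind of context join described above – Packetpig expresses it in Pig Latin over its loaders – can be sketched in plain Python. The field names and lookup tables here are invented for illustration:

```python
# Sketch: enriching attack records with geolocation and OS-fingerprint
# context, keyed by source IP. All field names are illustrative.

attacks = [{"src": "203.0.113.9", "sig": "sql-injection"}]

# Stand-ins for geolocation and p0f-style OS detection lookups.
geo = {"203.0.113.9": "AU"}
os_fp = {"203.0.113.9": "Linux 2.6"}

# Join each attack with whatever context is available for its source.
enriched = [
    dict(a, country=geo.get(a["src"], "?"), os=os_fp.get(a["src"], "?"))
    for a in attacks
]

print(enriched)
```

In Packetpig the same idea is a JOIN between relations produced by different loaders; the point is that one query can bring threat, geography and OS context together for the analyst.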

We are currently logging all packets for a website for six months. This data set is currently around 0.6TB, and because all the packet captures are stored in S3 we can quickly scan through the dataset. More importantly, we can run a job nightly or every 15 minutes to correlate attack information with other data from Packetpig, providing the maximum context for security events.

Items of interest include:

  • Detecting anomalies and intrusion signatures
  • Learning the timeframe and identity of an attacker
  • Triaging incidents
  • “Show me packet captures I’ve never seen before.”

“Never before seen” is a powerful filter and isn’t limited to attack information. First introduced by Marcus Ranum, “never before seen” can be used to rule out normal network behaviour and only show sources, attacks, and traffic flows that are truly anomalous. For example, think in terms of the outbound communications from a web server. What attacks, clients and outbound communications are new or have never been seen before? In an instant, instead of looking for the normal, you are straight away looking at the abnormal or signs of misuse.

Agile Data

Packetloop uses the stack and iterative prototyping techniques outlined in the forthcoming book by Hortonworks’ own Russell Jurney, Agile Data (O’Reilly, March 2013). We use Hadoop, Pig, Mongo and Cassandra to explore datasets and help us encode important information into D3 visualisations. Currently we use all of these tools to aid our research before we add functionality to Packetloop. These prototypes become the palette our product is built from.
