Making Time for Instrumentation and Observability

How are the numbers looking?

Working in tech start-ups, we are often asked about the metrics of each technical component in production. Ambitious new start-ups have a “build it first, then they will come” mentality. Whether their offerings are going viral or are dead in the water, they prefer shipping new features to thinking about platform reliability.

Bigger shops with an analytics person (or team) on board can put together analysis on which direction the business should go. However, as the product grows bigger and more complex, it develops a temperament of its own.

You may be familiar with these customer service reports:

“Customer is saying the form loads for them, but it does not complete.”

“Customer is saying their identification request has not completed in a week. Shouldn’t it finish in three days?”

“Customer is complaining about the product not loading and blank documents are shown.”

If you have heard reports like these, you may also recognize this internal chatter:

“I thought we rolled out that fix last month, why is it coming back?”

“Didn’t we QA for international names?”

“Are we being hacked? Why is the site behaving so badly?”

For every customer report you hear about, ten to a hundred other customers reloaded, retried, and gave up. By the time the engineering team hears about a problem, it has affected thousands of users or more.

How can we stay on top of these technical issues? This is where instrumentation and observability come into the picture.

Is your code instrumented today?

  • Can you give me the number of successfully handled requests versus failed ones?
  • How about the number of times a customer order was inserted into the database?
  • If this data sits in a database and in remote log storage, how much effort would it take you to put together a report?
  • How about a report that updates hourly?

Conventional troubleshooting relies on building pattern-matching rules over log files. In some cases, operators log into the server to read the logs directly. The more components a system has, the more places errors can appear, seemingly out of thin air.

We know the mantra of keeping everything as simple as possible, of course. But some problems do require taking on additional expertise and complexity. At the end of the day, you may have more logs than time available to scan through them all.
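To make the conventional approach concrete, here is a minimal sketch of a pattern-matching rule that counts handled and failed requests from log lines. The log format, the regular expression, and the sample lines are all assumptions made up for illustration; real logs will need their own rules, and every new log format needs another one.

```python
import re
from collections import Counter

# Hypothetical access-log format, assumed for illustration only:
#   2020-09-01T12:00:00Z POST /orders 201
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$"
)


def summarize(lines):
    """Count handled vs failed requests by matching each log line."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            counts["unparsed"] += 1  # lines the rule does not cover
            continue
        status = int(match.group("status"))
        # Treat 5xx responses as failures; everything else as handled.
        counts["failed" if status >= 500 else "handled"] += 1
    return counts


# Made-up sample lines, including one the rule cannot parse.
sample_log = [
    "2020-09-01T12:00:00Z POST /orders 201",
    "2020-09-01T12:00:01Z GET /orders/42 200",
    "2020-09-01T12:00:02Z POST /orders 503",
    "Traceback (most recent call last):",
]
report = summarize(sample_log)
print(dict(report))
```

Note the "unparsed" bucket: anything the rule did not anticipate silently falls outside the report, which is part of why this approach becomes harder to maintain as a system grows.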

How do we detect warning signs before they impact business revenue, given limited team bandwidth?

The answer is: learn about open source instrumentation systems. Two products highly recommended by the community are Grafana and Prometheus (see https://radar.cncf.io/2020-09-observability).

Forget about logs. Focus on metrics first.

Instrumentation allows us to keep tabs on a program's current state.

  • We can declare a counter that is incremented whenever a record is successfully inserted into the database.
  • We can measure the amount of time an external system takes to handle our requests.
  • We can measure the current environment's CPU and memory usage to check for possible memory leaks.
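The first two ideas can be sketched with nothing but the Python standard library. This is an illustration of the concept, not the Prometheus client library's API; the class names, metric names, and the fake database and payment call are all hypothetical. A real client library would normally provide these primitives for you.

```python
import time
from contextlib import contextmanager


class EventCounter:
    """A count that only goes up, e.g. rows inserted."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters only go up")
        self.value += amount


class Timer:
    """Accumulates how many calls were timed and their total duration."""

    def __init__(self):
        self.count = 0
        self.total_seconds = 0.0

    @contextmanager
    def time(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the timed call raised.
            self.count += 1
            self.total_seconds += time.perf_counter() - start


orders_inserted = EventCounter()  # hypothetical metric
payment_call = Timer()            # hypothetical metric


def insert_order(db, order):
    db.append(order)        # stand-in for a real database insert
    orders_inserted.inc()   # count only after the insert succeeded


def charge(order):
    with payment_call.time():  # measure the external system's latency
        time.sleep(0.01)       # stand-in for a remote API call


fake_db = []
for order in ({"id": 1}, {"id": 2}):
    insert_order(fake_db, order)
    charge(order)

print(orders_inserted.value, payment_call.count)
```

Dividing `total_seconds` by `count` gives the average latency of the external call; keeping the two numbers separate lets a monitoring system compute that average over any time window.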

Whereas logs allow an investigator to pinpoint exactly where a user journey went wrong, metrics build a top-level model for the team to operate with.

Imagine a person going on a diet and exercise routine:

They are instructed to keep tabs on the calorie content of the food they eat, and on the intensity and length of each type of exercise they perform. Lastly, they must record their weight at regular intervals! When we are not thinking in terms of metrics, we lack the proper units to even frame our goals. A diet routine without numbers may work very well, but we cannot be certain until we measure.

The same can be said for creating and maintaining software offerings. If we are not thinking in terms of metrics such as:

  • 99th percentile request response time
  • server uptime
  • error and disconnect rates

then we are already lost in terms of quality. We can make tweaks, apply fixes, and push new features to our platform, but we cannot tell whether they make matters better or worse. When flying blind, the only thing we know is that the errors in the logs have stopped. Was that due to our fix, or would a restart have fixed it too? Who knows?!
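As a small worked example of two of the metrics named above, the sketch below computes a 99th percentile response time and an error rate. The latency samples and request counts are made up, and the nearest-rank method used here is one common percentile definition among several; a monitoring system may estimate percentiles differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def error_rate(failed, total):
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return failed / total if total else 0.0


# Made-up response times in milliseconds: 98 fast requests, 2 slow ones.
latencies_ms = [20] * 98 + [900, 1500]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)
rate = error_rate(failed=3, total=200)

print(f"p50={p50} ms, p99={p99} ms, mean={mean:.1f} ms, errors={rate:.1%}")
```

In this made-up data set the median and the mean both look healthy, while the 99th percentile exposes the slow requests that a few customers actually experienced. That is the reason to track a high percentile rather than an average.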

Forget about logs. Focus on metrics first.

“But we are too busy to spend time on measurements!”

No one is perfect. We have a limited amount of time for choosing winning strategies and implementing them. Writing code without customer input or feedback, and without insight into how the code is running, is a frightening reality many developers face.

If you care about winning and staying in business, you need to keep your customers happy and your services reliable enough. Just as the person dieting and exercising must measure calories, time their exercises, and record their weight, developer teams can sit down to figure out which numbers to measure, and then use Prometheus and Grafana to keep measuring them.

You literally cannot set objectives without measurements.

“OK, but how many steps are there?”

Now on to the business of monitoring itself. Here are eight steps any engineer can follow to gain insight into a platform:

  1. Install Prometheus. By default it retains 15 days of metrics (roughly two weeks), which is enough for most start-ups. If you are a bigger business that needs longer data retention, you know who to reach out to.

  2. Install Grafana. Ideally, it would use an external database (such as Postgres) instead of the embedded sqlite3 database. We want all monitoring-related components to be as reliable as possible.

  3. Install node_exporter on our virtual machines. Packages are available for Ubuntu, CentOS, and other flavors of Linux. This lightweight agent helps monitor resource usage on the machines.

  4. Import the Prometheus client library into the application, and start by tracking the number of errors and exceptions occurring in the system.

  5. Configure Prometheus to scrape both the application metrics and the node_exporter machine metrics. Verify that samples are flowing through.

  6. Set up Prometheus as a data source in Grafana, and create your very first dashboard to see how many errors are occurring in the system.

  7. Set up a Slack notification channel in Grafana so that you are warned when the error rate rises.

  8. Iterate, add, and refine metrics as the team becomes more knowledgeable about the types of metrics it cares about.
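To show what steps 4 and 5 amount to, here is a standard-library-only sketch of an application exposing an error counter on a `/metrics` endpoint in the Prometheus text exposition format, which is what Prometheus reads when it scrapes a target. In practice the Prometheus client library does this for you; the metric name, the `METRICS` dictionary, the helper functions, and the port are all hypothetical choices for this illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process metrics: name -> (type, help text, value).
METRICS = {
    "app_errors_total": ("counter", "Errors and exceptions raised by the app.", 0),
}


def record_error():
    """Increment the error counter; call this from exception handlers."""
    kind, help_text, value = METRICS["app_errors_total"]
    METRICS["app_errors_total"] = (kind, help_text, value + 1)


def render_metrics(metrics):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, (kind, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted; the port is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


# Simulate one failure so the counter has something to report.
try:
    raise RuntimeError("simulated failure")
except RuntimeError:
    record_error()

print(render_metrics(METRICS), end="")
```

Calling `serve()` in the application process would make the counter available at `/metrics` on the chosen port. Prometheus would then need a scrape target pointing at that host and port, after which the error count can be graphed and alerted on in Grafana.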

“How long will this effort take?”

How long should these items take?

For smaller footprints of under twenty machines and five applications, this exercise in instrumenting and monitoring will take a single engineer no longer than two weeks. That's 80 hours. Hire a contractor, and you will be done, complete with training and documentation, within a month. This is much less time than the many hours the engineering team would otherwise spend reading through logs in the future. Once the pipeline is established, many more types of measurement become possible.

Better metrics can lead to better business decisions.

Conclusion

Just as you wouldn't trust a hospital that treats patients without taking measurements, we cannot be sure of our product's reliability until we actually look at the numbers.

If you are serious about service reliability but are not sure where to begin, reach out to us with questions about observability at info@teamzerolabs.com.

Original article: https://medium.com/teamzerolabs/making-time-for-instrumentation-and-observability-ac873b063d6a
