爬虫goodreads数据

Photo of old books by Ed Robertson on Unsplash
埃德·罗伯森 ( Ed Robertson)的旧书照片,内容为Unsplash

什么 (What)

This is a data set of the first 50,000 book ids pulled from Goodreads’ API on July 30th, 2020. A few thousand ids did not make it through because the book id was changed, the URL or API broke, or the information was stored in an atypical format.

这是2020年7月30日从Goodreads的API中提取的前50,000个图书ID的数据集。由于图书ID发生更改,URL或API中断或信息存储在其中,数千个ID没有通过非典型格式。

为什么 (Why)

From the reader’s perspective, books are a multi-hour commitment of learning and leisure (they don’t call it Goodreads for nothing). From the author’s and publisher’s perspectives, books are a way of living (with some learning and leisure too). In both cases, knowing which factors explain and predict great books will save you time and money. Because while different people have different tastes and values, knowing how a book is rated in general is a sensible starting point. You can always update it later.

从读者的角度来看,书籍是学习和休闲的一个多小时的承诺(他们不称其为“ 读书无所作为”)。 从作者和出版者的角度来看,书籍是一种生活方式(也有一些学习和休闲的机会)。 在这两种情况下,了解哪些因素可以解释和预测优秀的书将为您节省时间和金钱。 因为尽管不同的人有不同的品味和价值观,但了解一本书的总体评级是明智的起点。 您以后可以随时对其进行更新。

环境 (Environment)

It’s good practice to work in a virtual environment, a sandbox with its own libraries and versions, so we’ll make one for this project. There are several ways to do this, but we’ll use Anaconda. To create and activate an Anaconda virtual environment called ‘gr’ (for Goodreads) using Python 3.7, run the following commands in your terminal or command line:

在虚拟环境中工作是一个好习惯,这是一个具有自己的库和版本的沙箱,因此我们将为该项目创建一个。 有几种方法可以做到这一点,但是我们将使用Anaconda 。 要使用Python 3.7创建并激活名为'gr'(用于Goodreads)的Anaconda虚拟环境,请在终端或命令行中运行以下命令:

装置 (Installations)

You should see ‘gr’ or whatever you named your environment at the left of your prompt. If so, run these commands. Anaconda will automatically install any dependencies of these packages, including matplotlib, numpy, pandas, and scikit-learn.

您应该在提示符左侧看到“ gr”或您所命名的环境。 如果是这样,请运行这些命令。 Anaconda将自动安装这些软件包的所有依赖项,包括matplotlib,numpy,pandas和scikit-learn。

进口货 (Imports)

数据采集 (Data Collection)

We pull the first 50,000 book ids and their associated information using a lightweight wrapper around the Goodreads API made by Michelle D. Zhang (code and documentation here), then write each as a dictionary to a JSON file called book_data.

我们使用轻量级包装器(由Michelle D. Zhang制作, 此处是代码和文档)使用轻量级包装提取前50,000个书籍ID及其相关信息,然后将每个书籍ID作为字典写到名为book_data的JSON文件中。

数据清理 (Data Cleaning)

We’ll define and describe some key functions below, but we’ll run them in one big wrangle function later.

我们将在下面定义和描述一些关键功能,但是稍后将在一个大的争用函数中运行它们。

威尔逊下界 (Wilson Lower Bound)

A rating of 4 stars based on 20 reviews and a rating of 4 stars based on 20,000 reviews are not equal. The rating based on more reviews has less uncertainty about it and is therefore a more reliable estimate of the “true” rating. In order to properly define and predict great books, we must transform average_rating by putting a penalty on uncertainty.

基于20条评论的4星评级和基于20,000条评论的4星评级不相等。 基于更多评论的评级具有较少的不确定性,因此是对“真实”评级的更可靠的估计。 为了正确定义和预测好书,我们必须通过对不确定性进行惩罚来转变average_rating

We’ll do this by calculating a Wilson Lower Bound, where we estimate the confidence interval of a particular rating and take its lower bound as the new rating. Ratings based on tens of thousands of reviews will barely be affected because their confidence intervals are narrow. Ratings based on fewer reviews, however, have wider confidence intervals and will be scaled down more.

我们将通过计算威尔逊下界(Wilson Lower Bound)来做到这一点,在此我们估计特定等级的置信区间,并将其下界作为新等级。 基于数万条评论的评分几乎不会受到影响,因为它们的置信区间很窄。 但是,基于较少评论的评分具有较大的置信区间,并且将被缩小。

Note: We modify the formula because our data is calculated from a 5-point system, not a binary system as described by Wilson. Specifically, we decrement average_rating by 1 for a conservative estimate of the true non-inflated rating, and then normalize it. If this penalty is too harsh or too light, more ratings will over time raise or lower the book’s rating, respectively. In other words, with more information, this adjustment is self-correcting.

注意 :我们修改公式是因为我们的数据是根据5点系统而不是Wilson所描述的二进制系统计算的。 具体来说,我们对平均非虚假评级的保守估计值将average_rating递减1,然后对其进行归一化。 如果此惩罚太苛刻或太轻,随着时间的流逝,更多的评级将分别提高或降低该书的评级。 换句话说,有了更多信息,此调整是自校正的。

体裁 (Genres)

Goodreads’ API returns ‘shelves’, which encompass actual genres like “science-fiction” and user-created categories like “to-read”. We extracted only the 5 most popular shelves when pulling the data to limit this kind of clean-up; here, we’ll finish the job.

Goodreads的API返回“货架”,其中包含诸如“科幻小说”之类的实际类型和诸如“待阅读”之类的用户创建类别。 在提取数据时,我们仅提取了5个最受欢迎的书架,以限制此类清理工作。 在这里,我们将完成工作。

After some inspection, we see that these substrings represent the bulk of non-genre shelves. We’ll filter them out using a regular expression. Note: We use two strings in the regex so the line doesn’t get cut off. Adjacent strings inside parantheses are joined at compile time.

经过检查,我们发现这些子字符串代表了大部分非类型的货架。 我们将使用正则表达式过滤掉它们。 注意 :我们在正则表达式中使用了两个字符串,因此该行不会被截断。 括号内的相邻字符串在编译时连接。

多合一清洁 (All-in-one Cleaning)

Now we’ll build and run one function to wrangle the data set. This way, the cleaning is more reproducible and debug-able.

现在,我们将构建并运行一个函数来纠缠数据集。 这样,清洁更可重复且可调试。

比较未调整和调整后的平均评分 (Compare Unadjusted and Adjusted Average Ratings)

Numerically, the central measures of tendency of mean (in blue) and median (in green) slightly decrease, and the variance significantly decreases.

在数值上,均值(蓝色)和中位数(绿色)趋势的中心度量略有降低,并且方差显着降低。

Visually, we can see the rating adjustment in the much smoother and wider distribution (although note that the x-axis is truncated). This is from eliminating outlier books with no or very few ratings, and scaling down ratings with high uncertainty.

在视觉上,我们可以看到等级调整更加平滑和广泛(尽管请注意,x轴被截断了)。 这是因为消除了没有评级或评级很少的离群图书,并降低了不确定性很高的评级。

Unadjusted mean: 3.82Unadjusted median: 3.93Unadjusted variance: 0.48
Adjusted mean: 3.71Adjusted median: 3.77Adjusted variance: 0.17

资料泄漏 (Data Leakage)

Because our target is derived from ratings, training our model using ratings is effectively training with the target. To avoid distorting the model, we must drop these columns.

由于我们的目标是从评级得出的,因此使用评级训练模型可以有效地对目标进行训练。 为了避免扭曲模型,我们必须删除这些列。

It is also possible that review_count is a bit of leakage, but it seems more like a proxy for popularity, not greatness, in the same way that pop(ular) songs are not often considered classics. Of course, we'll reconsider this if its permutation importance is suspiciously high.

review_count也可能有点泄漏,但似乎更像是流行的 review_count ,而不是review_count ,就像流行(ular)歌曲通常不被视为经典之类。 当然,如果其排列重要性很高,我们将重新考虑。

分割资料 (Split Data)

We’ll do an 85/15 train-test split, then re-split our train set to make the validation set about the same size as the test set.

我们将进行85/15火车测试拆分,然后重新拆分火车集,以使验证集的大小与测试集的大小相同。

(20281, 12) (20281,) (4348, 12) (4348,) (4347, 12) (4347,)

评估指标 (Evaluation Metrics)

With classes this imbalanced, accuracy (correct predictions / total predictions) can become misleading. There just aren’t enough true positives for this fraction to be the best measure of model performance. So we’ll also use ROC AUC, a Receiver Operator Characteristic Area Under the Curve. Here is a colored drawing of one, courtesy of Martin Thoma.

对于此类不平衡的类,准确性(正确的预测/总的预测)可能会引起误解。 对于这部分,没有足够的真实肯定作为衡量模型性能的最佳方法。 因此,我们还将使用ROC AUC,即曲线下方的接收器操作员特征区域。 这是马丁·托马(Martin Thoma)提供的彩色图画。

Drawing of ROC AUC in the style of XKCD
XKCD样式的ROC AUC图纸

The ROC curve is a plot of a classification model’s true positive rate (TPR) against its false positive rate (FPR). The ROC AUC is the area from [0, 1] under and to the right of this curve. Since optimal model performance maximizes true positives and minimizes false positives, the optimal point in this 1x1 plot is the top left, where the area under the curve (ROC AUC) = 1.

ROC曲线是分类模型的真实阳性率(TPR)相对其假阳性率(FPR)的图。 ROC AUC是此曲线下方和右侧的[0,1]区域。 由于最佳模型性能可最大化真实正值并最小化错误正值,因此此1x1图中的最佳点位于左上方,曲线下面积(ROC AUC)= 1。

For imbalanced classes such as great, ROC AUC outperforms accuracy as a metric because it better reflects the relationship between true positives and false positives. It also depicts the classifier’s performance across all its values, giving us more information about when and where the model improves, plateaus, or suffers.

对于不平衡类(例如great ,ROC AUC优于准确性作为度量标准,因为它可以更好地反映真假阳性之间的关系。 它还描述了分类器在其所有值上的表现,为我们提供了有关模型何时何地改进,平稳或遭受损失的更多信息。

拟合模型 (Fit Models)

Predicting great books is a binary classification problem, so we need a classifier. Below, we’ll encode, impute, and fit to the data a linear model (Logistic Regression) and two tree-based models (Random Forests and XGBoost), then compare them to each other and to the majority baseline. We’ll calculate their accuracy and ROC AUC, and then make a visualization.

预测好书是一个二进制分类问题,因此我们需要一个分类器。 下面,我们将对数据进行编码,估算和拟合,以线性模型(Logistic回归)和两个基于树的模型(Random Forests和XGBoost)进行比较,然后将它们相互比较并与多数基准进行比较。 我们将计算其准确性和ROC AUC,然后进行可视化。

多数阶级基线 (Majority Class Baseline)

First, by construction, great books are the top 20% of books by Wilson-adjusted rating. That means our majority class baseline (no books are great) has an accuracy of 80%.

首先,从结构上看,按威尔逊调整后的评分, great书籍是排名前20%的书籍。 这意味着我们的大多数班级基线( 没有一本好书)的准确性为80%。

Second, this “model” doesn’t improve, plateau, or suffer since it has no discernment to begin with. A randomly chosen positive would be treated no differently than a randomly chosen negative. In other wrods, its ROC AUC = 0.5.

第二,此“模型”没有改善,稳定或遭受损害,因为它一开始没有识别力。 随机选择的阳性与随机选择的阴性没有区别。 在其他情况下,其ROC AUC = 0.5。

Baseline Validation Accuracy: 0.8Baseline Validation ROC AUC: 0.5

逻辑回归 (Logistic Regression)

Now we’ll fit a linear model with cross-validation, re-calculate evaluation metrics, and plot a confusion matrix.

现在,我们将使用带有交叉验证的线性模型,重新计算评估指标,并绘制混淆矩阵。

Baseline Validation Accuracy: 0.8Logistic Regression Validation Accuracy: 0.8013Baseline Validation ROC AUC: 0.5Logistic Regression Validation ROC AUC: 0.6424

Logistic回归混淆矩阵 (Logistic Regression Confusion Matrix)

随机森林分类器 (Random Forest Classifier)

Now we’ll do the same as above with a tree-based model with bagging (Bootstrap AGGregation).

现在,我们将对带有袋装( B ootstrap AGG规约)的基于树的模型进行与上述相同的操作。

Baseline Validation Accuracy: 0.8Logistic Regression Validation Accuracy: 0.8013Random Forest Validation Accuracy: 0.8222Majority Class Baseline ROC AUC: 0.5Logistic Regression Validation ROC AUC: 0.6424Random Forest Validation ROC AUC: 0.8015

随机森林混淆矩阵 (Random Forest Confusion Matrix)

XGBoost分类器 (XGBoost Classifier)

Now we’ll do the same as above with another tree-based model, this time with boosting.

现在,我们将对另一个基于树的模型进行与上述相同的操作,这一次是使用boosting。

Baseline Validation Accuracy: 0.8Logistic Regression Validation Accuracy: 0.8013Random Forest Validation Accuracy: 0.8245XGBoost Validation Accuracy: 0.8427Majority Class Baseline ROC AUC: 0.5Logistic Regression Validation ROC AUC: 0.6424Random Forest Validation ROC AUC: 0.8011XGBoost Validation ROC AUC 0.84

XGBClassifier performes the best in accuracy and ROC AUC.

XGBClassifier在准确性和ROC AUC方面表现最佳。

图形和比较模型的ROC AUC (Graph and Compare Models’ ROC AUC)

Below, we see that logistic regression lags far behind XGBoost and Random Forests in achieving a high ROC AUC. Among the top two, XGBoost initially outperforms RandomForest, and then the two roughly converge around FPR=0.6. We see in the lower right legend, however, that XGBoost has the highest AUC of 0.84, followed by Random Forest at 0.80 and Logistic Regression at 0.64.

在下面,我们看到逻辑回归在实现高ROC AUC方面远远落后于XGBoost和Random Forests。 在前两个中,XGBoost最初优于RandomForest,然后两个大致在FPR = 0.6左右收敛。 但是,我们在右下角的图例中看到,XGBoost的AUC最高,为0.84,其次是随机森林,为0.80,逻辑回归为0.64。

In less technical language, the XGBoost model was the best at classifying great books as great (true positives) and not classifying not-great books as great (false positives).

用较少技术的语言,XGBoost模型最擅长将优秀书籍分类为优秀书籍(真实肯定),而不是将非优秀书籍分类为优秀书籍(错误肯定)。

排列重要性 (Permutation Importances)

One intuitive way of identifying whether and to what extent something is important is by seeing what happens when you take it away. This is the best in a situation unconstrained by time and money.

识别某些事物是否重要以及在什么程度上重要的一种直观方法是查看将其拿走时发生的情况。 在不受时间和金钱约束的情况下,这是最好的选择。

But in the real world with real constrains, we can use permutation instead. Instead of eliminating the column values values by dropping them, we eliminate the column’s signal by randomizing it. If the column really were a predictive feature, the order of its values would matter, and shuffling them around would substantially dilute if not destroy the relationship. So if the feature’s predictive power isn’t really hurt or is even helped by randomization, we can conclude that it is not actually important.

但是,在具有实际约束的现实世界中,我们可以改用排列。 我们没有通过丢弃来消除列值的值,而是通过随机化来消除列的信号 。 如果该列确实是一个预测性特征,则其值的顺序将很重要,并且如果不破坏该关系,将它们改组也将大大稀释。 因此,如果该功能的预测能力并没有真正受到损害,甚至没有受到随机化的帮助,我们可以得出结论,它实际上并不重要。

Let’s take a closer look at the permutation importances of our XGBoost model. We’ll have to refit it to be compatible with eli5.

让我们仔细看看XGBoost模型的排列重要性。 我们必须对其进行改装以使其与eli5兼容。

排列重要性分析 (Permutation Importance Analysis)

As we assumed at the beginning, review_count matters but it is not suspiciously high. This does not seem to rise to the level of data leakage. What this means is that if you were wondering what book to read next, a useful indicator is how many reviews it has, a proxy for how many others have read it.

正如我们在一开始所假设的那样, review_count重要,但并不高。 这似乎没有达到数据泄漏的程度。 这意味着如果您想知道接下来要读什么书,一个有用的指标是它拥有多少评论,以及多少其他书已经读过它的代理。

We see that genres is the second most important feature for ROC AUC in the XGBoost model.

我们看到genres是XGBoost模型中ROC AUC的第二重要功能。

author is third, which is surprising and perhaps a bit concerning. Because our test set is not big, the model may just be identifying authors whose books are the most highly rated in wilson-adjusted terms, such as J.K. Rowling and Suzanne Colins. More data would be useful to test this theory.

author排名第三,这令人惊讶,也许有点令人担忧。 由于我们的测试集不大,因此该模型可能只是在识别按威尔森调整后的书评鉴最高的作者,例如JK Rowling和Suzanne Colins。 更多的数据将有助于检验这一理论。

Fourth is num_pages. I thought this would be higher for two reasons:

第四是num_pages 。 我认为这会更高,原因有两个:

  1. Very long books’ ratings seem to have a bit of a ratings bias upward in that people willing to start and finish them will rate them higher. The long length screens out less interested marginal readers, who probably wouldn’t have rated the book highly in the first place.很长的书的评分似乎有一个较高的评分偏见,因为愿意开始和完成它们的人会给它们更高的评分。 篇幅较长的文章筛选出了那些不太感兴趣的边缘读者,他们可能不会一开始就对这本书给予很高的评价。
  2. Reading and showing off that you’re reading or have read long books is a sign of high social status. The archetypal example: Infinite Jest.阅读并炫耀自己正在阅读或已经读过很长的书是社会地位很高的标志。 原型示例:无限开玩笑。

带走 (Takeaway)

We’ve seen how to collect, clean, analyze, visualize, and model data. Some actionable takeaways are that when and who publishes a book doesn’t really matter, but its review count does — the more reviews, the better.

我们已经看到了如何收集,清理,分析,可视化和建模数据。 一些可行的建议是,何时以及谁出版一本书并不重要,但它的评论数却很重要–评论越多越好。

For further analysis, we could break down genres and authors to find out which ones were rated highest. For now, happy reading.

为了进行进一步的分析,我们可以分解genresauthors ,找出哪些被评为最高。 现在,阅读愉快。

翻译自: https://medium.com/@ryan.koul/predicting-great-books-from-goodreads-data-using-python-1d378e7ef926

爬虫goodreads数据


http://www.taodudu.cc/news/show-863443.html

相关文章:

  • opengl层次建模_层次建模简介
  • 如何用dds实现线性调频_用神经网络生成线性调频
  • azure_Azure ML算法备忘单
  • 矩阵乘法如何去逆矩阵_矩阵乘法和求逆
  • 机器学习数据倾斜的解决方法_机器学习并不总是解决数据问题的方法
  • gan简介_GAN简介
  • 使用TensorFlow训练神经网络进行价格预测
  • 您应该如何改变数据科学教育
  • r语言解释回归模型的假设_模型假设-解释
  • 参考文献_参考文献:
  • 深度学习用于视频检测_视频如何用于检测您的个性?
  • 角距离恒星_恒星问卷调查的10倍机器学习生产率
  • apache beam_Apache Beam ML模型部署
  • 转正老板让你谈谈你的看法_让我们谈谈逻辑回归
  • openai-gpt_GPT-3报告存在的问题
  • 机器学习 凝聚态物理_机器学习遇到了凝聚的问题
  • 量子计算 qiskit_将Tensorflow和Qiskit集成到量子机器学习中
  • throw 烦人_烦人的简单句子聚类
  • 使用NumPy优于Python列表的优势
  • 迷你5和迷你4区别_可变大小的视频迷你批处理
  • power bi可视化表_如何使用Power BI可视化数据?
  • 变形金刚2_变形金刚(
  • 机器学习 测试_测试优先机器学习
  • azure机器学习_Microsoft Azure机器学习x Udacity —第4课笔记
  • 机器学习嵌入式实现_机器学习中的嵌入
  • 无监督学习 k-means_无监督学习-第3部分
  • linkedin爬虫_机器学习的学生和从业者的常见问题在LinkedIn上提问
  • lime 深度学习_用LIME解释机器学习预测并建立信任
  • 神经网络 梯度下降_梯度下降优化器对神经网络训练的影响
  • 深度学习实践:计算机视觉_深度学习与传统计算机视觉技术:您应该选择哪个?

爬虫goodreads数据_使用Python从Goodreads数据中预测好书相关推荐

  1. python处理nc数据_利用python如何处理nc数据详解

    利用python如何处理nc数据详解 来源:中文源码网    浏览: 次    日期:2018年9月2日 [下载文档:  利用python如何处理nc数据详解.txt ] (友情提示:右键点上行txt ...

  2. python获取股票逐笔交易数据_利用python下载股票交易数据

    前段时间玩Python时无意看到了获取股票交易数据的tushare模块,由于自己对股票交易挺有兴趣,加上现在又在做数据挖掘工作,故想先将股票数据下载到数据库中,以便日后分析: # 导入需要用到的模块 ...

  3. python怎么存储数据_【Python】存储数据

    很多程序都要求用户输入某种信息,如让用户存储游戏首选项或者提供可视化数据,不管专注什么,程序都要将数据进行存储,那么如何存储呢? JSON(JavaScript Object Notation)格式最 ...

  4. 如何用python批量下载数据_使用Python批量下载数据

    这次依旧是,不过下载的是Australian Bureau of Statistics的数据,都是xls的表格,网址为:http://www.abs.gov.au.网页打开左边有棵树目录,里面记录的澳 ...

  5. python 网页上显示数据_用Python实现网页数据抓取

    需求: 获取某网站近10万条数据记录的相关详细信息. 分析:数据的基本信息存放于近1万个页面上,每个页面上10条记录.如果想获取特定数据记录的详细信息,需在基本信息页面上点击相应记录条目,跳转到详细信 ...

  6. python从文件中读取数据_【Python】从文件中读取数据

    从文件中读取数据 1.1 读取整个文件 要读取文件,需要一个包含几行文本的文件(文件PI_DESC.txt与file_reader.py在同一目录下) PI_DESC.txt 3.1415926535 ...

  7. python表格数据_用python读取表格数据

    我有一堆表格数据的文本文件.看起来像这样:1. BRISTOL CITY 42 16 4 1 43 13 8 7 6 23 27 59 2. Plymouth Argyle 42 18 3 0 47 ...

  8. python从sqlserver提取数据_通过Python读取sqlserver数据写成json文件的总结

    最近在做将sqlserver的数据迁移到redshift上面,然后再做的过程当中出现了很多问题.(通过sqlserver 读取数据,然后写成json文件然后长传到S3上面,然后通过S3copy的形式将 ...

  9. pythonurllib库获取yahoo财经数据_用Python从Yahoo财经中获取历史数据

    我写这篇文章是为了从YF直接从下载csv链接获取历史数据.它需要发出两个请求,一个请求获取cookie和面包屑,另一个请求获取数据.它返回一个pandas数据帧import re from io im ...

最新文章

  1. 降低预测过程计算成本,这些NLP模型压缩方法要知道
  2. 图像低频高频区域分离
  3. Ubuntu 16.04.6 LTS 发布:紧急修复 APT 漏洞
  4. 数据结构学习之路-第一章:绪论
  5. 通俗易懂设计模式解析——代理模式
  6. 2015 Autodesk 开发者日( DevDays)和 助力开发周火热报名中
  7. python manager ulimit_Linux ulimit命令
  8. 使用 golang-plugins 时要避免的事情
  9. 小学生计算机考试软件,中小学生准考证制作打印软件
  10. 苹果电脑上常用的4个数据库管理工具
  11. windows电脑借助多开软件实现ZALO程序多开
  12. 【RTD】AD7793三线式铂电阻PT100/PT1000应用
  13. 单系统 台电x80pro_台电X80HD安装Win8单系统教程
  14. C++面向对象小练习:几何图形类
  15. FastQC使用与结果详细解读
  16. word一行内英文单词之间间隔过大解决方法
  17. Twitter注册如何做到ip防关联
  18. 蒟蒻重返c++,学海拾贝(二)
  19. Kafka对Java程序员有多重要?连阿里都再用它处理亿万级数据统计
  20. xp系统t3登录不到服务器,用友T3软件服务器xp系统,客户端xp系统,登不上服务器...

热门文章

  1. github中删除已建仓库(二)
  2. 牵引力教育就业数据显示:很多大学毕业就等于失业?
  3. 数据库备份DBS 新增Region支持:华北2、华东2和华南1
  4. 批量put和单条put
  5. sersync+rsync同步
  6. 模板:什么是Traits
  7. 指定rviz的点启动_好消息!武汉已经启动新冠疫苗紧急接种工作
  8. wordpress漏洞_用软件工具扫描WordPress / Shopify主题恶意代码以及漏洞分析相关工具...
  9. Web UI⾃动化测试平台,专治Web 的各种疑难杂症。
  10. 南通大学计算机网络及应用,南通大学《计算机网络》课程设计资料.pdf