Python: parameters of wordcloud.WordCloud() explained

class WordCloud Found at: wordcloud.wordcloud

class WordCloud(object):
    """Word cloud object for generating and drawing.
    
    Parameters
    ----------
    font_path : string
        Font path to the font that will be used (OTF or TTF).
        Defaults to DroidSansMono path on a Linux machine. If you are on
        another OS or don't have this font, you need to adjust this path.

    width : int (default=400)
        Width of the canvas.

    height : int (default=200)
        Height of the canvas.

    prefer_horizontal : float (default=0.90)
        The ratio of times to try horizontal fitting as opposed to vertical.
        If prefer_horizontal < 1, the algorithm will try rotating the word
        if it doesn't fit. (There is currently no built-in way to get only
        vertical words.)

    mask : nd-array or None (default=None)
        If not None, gives a binary mask on where to draw words. If mask is
        not None, width and height will be ignored and the shape of mask will
        be used instead. All white (#FF or #FFFFFF) entries will be considered
        "masked out" while other entries will be free to draw on. [This
        changed in the most recent version!]

    scale : float (default=1)
        Scaling between computation and drawing. For large word-cloud images,
        using scale instead of larger canvas size is significantly faster, but
        might lead to a coarser fit for the words.

    min_font_size : int (default=4)
        Smallest font size to use. Will stop when there is no more room in
        this size.

    font_step : int (default=1)
        Step size for the font. font_step > 1 might speed up computation but
        give a worse fit.

    max_words : number (default=200)
        The maximum number of words.

    stopwords : set of strings or None
        The words that will be eliminated. If None, the built-in STOPWORDS
        list will be used.

    background_color : color value (default="black")
        Background color for the word cloud image.

    max_font_size : int or None (default=None)
        Maximum font size for the largest word. If None, height of the image
        is used.

    mode : string (default="RGB")
        Transparent background will be generated when mode is "RGBA" and
        background_color is None.

    relative_scaling : float (default=.5)
        Importance of relative word frequencies for font-size. With
        relative_scaling=0, only word-ranks are considered. With
        relative_scaling=1, a word that is twice as frequent will have twice
        the size. If you want to consider the word frequencies and not only
        their rank, relative_scaling around .5 often looks good.

        .. versionchanged: 2.0
            Default is now 0.5.

    color_func : callable, default=None
        Callable with parameters word, font_size, position, orientation,
        font_path, random_state that returns a PIL color for each word.
        Overrides "colormap".
        See colormap for specifying a matplotlib colormap instead.

    regexp : string or None (optional)
        Regular expression to split the input text into tokens in
        process_text. If None is specified, ``r"\w[\w']+"`` is used.

    collocations : bool, default=True
        Whether to include collocations (bigrams) of two words.

        .. versionadded: 2.0

    colormap : string or matplotlib colormap, default="viridis"
        Matplotlib colormap to randomly draw colors from for each word.
        Ignored if "color_func" is specified.

        .. versionadded: 2.0

    normalize_plurals : bool, default=True
        Whether to remove trailing 's' from words. If True and a word
        appears with and without a trailing 's', the one with trailing 's'
        is removed and its counts are added to the version without
        trailing 's' -- unless the word ends with 'ss'.

    Attributes
    ----------
    ``words_`` : dict of string to float
        Word tokens with associated frequency.

        .. versionchanged: 2.0
            ``words_`` is now a dictionary

    ``layout_`` : list of tuples ((string, float), int, (int, int), int, color)
        Encodes the fitted word cloud. Encodes for each word the string and
        its frequency, font size, position, orientation and color.

    Notes
    -----
    Larger canvases will make the code significantly slower. If you need a
    large word cloud, try a lower canvas size, and set the scale parameter.

    The algorithm might give more weight to the ranking of the words
    than their actual frequencies, depending on the ``max_font_size`` and the
    scaling heuristic.
    """

    def __init__(self, font_path=None, width=400, height=200, margin=2,
                 ranks_only=None, prefer_horizontal=.9, mask=None, scale=1,
                 color_func=None, max_words=200, min_font_size=4,
                 stopwords=None, random_state=None, background_color='black',
                 max_font_size=None, font_step=1, mode="RGB",
                 relative_scaling=.5, regexp=None, collocations=True,
                 colormap=None, normalize_plurals=True):
        if font_path is None:
            font_path = FONT_PATH
        if color_func is None and colormap is None:
            # we need a color map
            import matplotlib
            version = matplotlib.__version__
            if version[0] < "2" and version[2] < "5":
                colormap = "hsv"
            else:
                colormap = "viridis"
        self.colormap = colormap
        self.collocations = collocations
        self.font_path = font_path
        self.width = width
        self.height = height
        self.margin = margin
        self.prefer_horizontal = prefer_horizontal
        self.mask = mask
        self.scale = scale
        self.color_func = color_func or colormap_color_func(colormap)
        self.max_words = max_words
        self.stopwords = stopwords if stopwords is not None else STOPWORDS
        self.min_font_size = min_font_size
        self.font_step = font_step
        self.regexp = regexp
        if isinstance(random_state, int):
            random_state = Random(random_state)
        self.random_state = random_state
        self.background_color = background_color
        self.max_font_size = max_font_size
        self.mode = mode
        if relative_scaling < 0 or relative_scaling > 1:
            raise ValueError(
                "relative_scaling needs to be "
                "between 0 and 1, got %f." % 
                relative_scaling)
        self.relative_scaling = relative_scaling
        if ranks_only is not None:
            warnings.warn("ranks_only is deprecated and will be removed as"
                          " it had no effect. Look into relative_scaling.",
                          DeprecationWarning)
        self.normalize_plurals = normalize_plurals
    
    def fit_words(self, frequencies):
        """Create a word_cloud from words and frequencies.

        Alias to generate_from_frequencies.

        Parameters
        ----------
        frequencies : dict from string to float
            A dict containing words and their associated frequencies.

        Returns
        -------
        self
        """
        return self.generate_from_frequencies(frequencies)
    
    def generate_from_frequencies(self, frequencies, max_font_size=None):
        """Create a word_cloud from words and frequencies.

        Parameters
        ----------
        frequencies : dict from string to float
            A dict containing words and their associated frequencies.

        max_font_size : int
            Use this font-size instead of self.max_font_size

        Returns
        -------
        self

        """
        # make sure frequencies are sorted and normalized
        frequencies = sorted(frequencies.items(), key=itemgetter(1),
                             reverse=True)
        if len(frequencies) <= 0:
            raise ValueError("We need at least 1 word to plot a word cloud, "
                             "got %d." % len(frequencies))
        frequencies = frequencies[:self.max_words]
        # largest entry will be 1
        max_frequency = float(frequencies[0][1])
        frequencies = [(word, freq / max_frequency)
                       for word, freq in frequencies]
        if self.random_state is not None:
            random_state = self.random_state
        else:
            random_state = Random()
        if self.mask is not None:
            mask = self.mask
            width = mask.shape[1]
            height = mask.shape[0]
            if mask.dtype.kind == 'f':
                warnings.warn("mask image should be unsigned byte between 0"
                              " and 255. Got a float array")
            if mask.ndim == 2:
                boolean_mask = mask == 255
            elif mask.ndim == 3:
                # if all channels are white, mask out
                boolean_mask = np.all(mask[:, :, :3] == 255, axis=-1)
            else:
                raise ValueError("Got mask of invalid shape: %s"
                                 % str(mask.shape))
        else:
            boolean_mask = None
            height, width = self.height, self.width
        occupancy = IntegralOccupancyMap(height, width, boolean_mask)
        # create image
        img_grey = Image.new("L", (width, height))
        draw = ImageDraw.Draw(img_grey)
        img_array = np.asarray(img_grey)
        font_sizes, positions, orientations, colors = [], [], [], []
        last_freq = 1.
        if max_font_size is None:
            # if not provided use default font_size
            max_font_size = self.max_font_size
        if max_font_size is None:
            # figure out a good font size by trying to draw with
            # just the first two words
            if len(frequencies) == 1:
                # we only have one word. We make it big!
                font_size = self.height
            else:
                self.generate_from_frequencies(dict(frequencies[:2]), 
                    max_font_size=self.height)
                # find font sizes
                sizes = [x[1] for x in self.layout_]
                try:
                    font_size = int(2 * sizes[0] * sizes[1] / 
                        (sizes[0] + sizes[1]))
                # quick fix for if self.layout_ contains less than 2 values
                # on very small images it can be empty
                except IndexError:
                    try:
                        font_size = sizes[0]
                    except IndexError:
                        raise ValueError('canvas size is too small')
        else:
            font_size = max_font_size
        # we set self.words_ here because we called generate_from_frequencies
        # above... hurray for good design?
        self.words_ = dict(frequencies)
        # start drawing grey image
        for word, freq in frequencies:
            # select the font size
            rs = self.relative_scaling
            if rs != 0:
                font_size = int(round((rs * (freq / float(last_freq)) + 
                            (1 - rs)) * font_size))
            if random_state.random() < self.prefer_horizontal:
                orientation = None
            else:
                orientation = Image.ROTATE_90
            tried_other_orientation = False
            while True:
                # try to find a position
                font = ImageFont.truetype(self.font_path, font_size)
                # transpose font optionally
                transposed_font = ImageFont.TransposedFont(
                    font, orientation=orientation)
                # get size of resulting text
                box_size = draw.textsize(word, font=transposed_font)
                # find possible places using integral image:
                result = occupancy.sample_position(box_size[1] + self.margin,
                                                   box_size[0] + self.margin,
                                                   random_state)
                if result is not None or font_size < self.min_font_size:
                    # either we found a place or font-size went too small
                    break
                # if we didn't find a place, make font smaller
                # but first try to rotate!
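                if not tried_other_orientation and self.prefer_horizontal < 1:
                    # both branches yield ROTATE_90, so a word that started
                    # out vertical is not retried horizontally at this size
                    orientation = (Image.ROTATE_90 if orientation is None else
                                   Image.ROTATE_90)
                    tried_other_orientation = True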
                else:
                    font_size -= self.font_step
                    orientation = None
            
            if font_size < self.min_font_size:
                # we were unable to draw any more
                break
            x, y = np.array(result) + self.margin // 2
            # actually draw the text
            draw.text((y, x), word, fill="white", font=transposed_font)
            positions.append((x, y))
            orientations.append(orientation)
            font_sizes.append(font_size)
            colors.append(self.color_func(word, font_size=font_size, 
                    position=(x, y), 
                    orientation=orientation, 
                    random_state=random_state, 
                    font_path=self.font_path))
            # recompute integral image
            if self.mask is None:
                img_array = np.asarray(img_grey)
            else:
                img_array = np.asarray(img_grey) + boolean_mask
            # recompute bottom right
            # the order of the cumsum's is important for speed ?!
            occupancy.update(img_array, x, y)
            last_freq = freq
        
        self.layout_ = list(zip(frequencies, font_sizes, positions, 
                orientations, colors))
        return self
    
    def process_text(self, text):
        """Splits a long text into words, eliminates the stopwords.

        Parameters
        ----------
        text : string
            The text to be processed.

        Returns
        -------
        words : dict (string, int)
            Word tokens with associated frequency.

        ..versionchanged:: 1.2.2
            Changed return type from list of tuples to dict.

        Notes
        -----
        There are better ways to do word tokenization, but I don't want to
        include all those things.
        """
        stopwords = set([i.lower() for i in self.stopwords])
        flags = (re.UNICODE if sys.version < '3' and type(text) is unicode
                 else 0)
        regexp = self.regexp if self.regexp is not None else r"\w[\w']+"
        words = re.findall(regexp, text, flags)
        # remove stopwords
        words = [word for word in words if word.lower() not in stopwords]
        # remove 's
        words = [word[:-2] if word.lower().endswith("'s") else word
                 for word in words]
        # remove numbers
        words = [word for word in words if not word.isdigit()]
        if self.collocations:
            word_counts = unigrams_and_bigrams(words, self.normalize_plurals)
        else:
            word_counts, _ = process_tokens(words, self.normalize_plurals)
        return word_counts
    
    def generate_from_text(self, text):
        """Generate wordcloud from text.

        The input "text" is expected to be a natural text. If you pass a sorted
        list of words, words will appear in your output twice. To remove this
        duplication, set ``collocations=False``.

        Calls process_text and generate_from_frequencies.

        ..versionchanged:: 1.2.2
            Argument of generate_from_frequencies() is not return of
            process_text() any more.

        Returns
        -------
        self
        """
        words = self.process_text(text)
        self.generate_from_frequencies(words)
        return self
    
    def generate(self, text):
        """Generate wordcloud from text.

        The input "text" is expected to be a natural text. If you pass a sorted
        list of words, words will appear in your output twice. To remove this
        duplication, set ``collocations=False``.

        Alias to generate_from_text.

        Calls process_text and generate_from_frequencies.

        Returns
        -------
        self
        """
        return self.generate_from_text(text)
    
    def _check_generated(self):
        """Check if ``layout_`` was computed, otherwise raise error."""
        if not hasattr(self, "layout_"):
            raise ValueError("WordCloud has not been calculated, call generate"
                             " first.")
    
    def to_image(self):
        self._check_generated()
        if self.mask is not None:
            width = self.mask.shape[1]
            height = self.mask.shape[0]
        else:
            height, width = self.height, self.width
        img = Image.new(self.mode, (int(width * self.scale), 
                int(height * self.scale)), 
            self.background_color)
        draw = ImageDraw.Draw(img)
        for (word, count), font_size, position, orientation, color in self.layout_:
            font = ImageFont.truetype(self.font_path, 
                int(font_size * self.scale))
            transposed_font = ImageFont.TransposedFont(
                font, orientation=orientation)
            pos = int(position[1] * self.scale), int(position[0] * self.scale)
            draw.text(pos, word, fill=color, font=transposed_font)
        
        return img
    
    def recolor(self, random_state=None, color_func=None, colormap=None):
        """Recolor existing layout.

        Applying a new coloring is much faster than generating the whole
        wordcloud.

        Parameters
        ----------
        random_state : RandomState, int, or None, default=None
            If not None, a fixed random state is used. If an int is given, this
            is used as seed for a random.Random state.

        color_func : function or None, default=None
            Function to generate new color from word count, font size, position
            and orientation.  If None, self.color_func is used.

        colormap : string or matplotlib colormap, default=None
            Use this colormap to generate new colors. Ignored if color_func
            is specified. If None, self.color_func (or self.color_map) is used.

        Returns
        -------
        self
        """
        if isinstance(random_state, int):
            random_state = Random(random_state)
        self._check_generated()
        if color_func is None:
            if colormap is None:
                color_func = self.color_func
            else:
                color_func = colormap_color_func(colormap)
        self.layout_ = [(word_freq, font_size, position, orientation, 
                color_func(word=word_freq[0], font_size=font_size, 
                    position=position, orientation=orientation, 
                    random_state=random_state, 
                    font_path=self.font_path)) for 
            word_freq, font_size, position, orientation, _ in 
            self.layout_]
        return self
    
    def to_file(self, filename):
        """Export to image file.

        Parameters
        ----------
        filename : string
            Location to write to.

        Returns
        -------
        self
        """
        img = self.to_image()
        img.save(filename, optimize=True)
        return self
    
    def to_array(self):
        """Convert to numpy array.

        Returns
        -------
        image : nd-array size (height, width, 3)
            Word cloud image as numpy matrix.
        """
        return np.array(self.to_image())
    
    def __array__(self):
        """Convert to numpy array.

        Returns
        -------
        image : nd-array size (height, width, 3)
            Word cloud image as numpy matrix.
        """
        return self.to_array()
    
    def to_html(self):
        raise NotImplementedError("FIXME!!!")
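
The mask rule from the docstring is applied inside generate_from_frequencies. A 2-D mask is masked out wherever it equals 255. A 3-D mask is masked out only where the first three channels are all 255, via np.all(mask[:, :, :3] == 255, axis=-1). In both cases the mask's shape replaces width and height. The sketch below re-expresses that rule on nested Python lists so it runs without NumPy. to_boolean_mask is a hypothetical name, not part of wordcloud.

```python
def to_boolean_mask(mask):
    """Return a grid that is True wherever drawing is forbidden.

    Hypothetical pure-Python stand-in for wordcloud's NumPy mask handling.
    """
    first = mask[0][0]
    if isinstance(first, (int, float)):
        # 2-D (greyscale) mask: only pure white (255) is masked out
        return [[value == 255 for value in row] for row in mask]
    if isinstance(first, (list, tuple)):
        # 3-D (RGB/RGBA) mask: masked out only if R, G and B are all 255;
        # a 4th (alpha) channel is ignored, like mask[:, :, :3] in the source
        return [[all(c == 255 for c in pixel[:3]) for pixel in row]
                for row in mask]
    raise ValueError("Got mask of invalid shape")


grey = [[255, 0],
        [254, 255]]
print(to_boolean_mask(grey))  # [[True, False], [False, True]]

rgba = [[(255, 255, 255, 0), (255, 0, 255, 255)]]
print(to_boolean_mask(rgba))  # [[True, False]]
```

Only an exact 255 counts as white, so near-white pixels (254 above) stay drawable. An anti-aliased or JPEG-compressed mask image can therefore leave stray drawable specks along its edges.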
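
The relative_scaling description can be checked against the sizing rule in generate_from_frequencies. Frequencies are sorted, cut to max_words and divided by the largest value. Each word's size is then derived from the previous word's size as int(round((rs * (freq / last_freq) + (1 - rs)) * font_size)). The sketch below keeps only that arithmetic. It is a simplification: the real loop also lowers font_size by font_step whenever a word does not fit. font_sizes is a hypothetical helper.

```python
from operator import itemgetter


def font_sizes(frequencies, max_font_size, relative_scaling=0.5, max_words=200):
    """Font size per word under WordCloud's relative_scaling rule.

    Simplified sketch: nothing is drawn, so no word ever shrinks for lack
    of space (the real loop subtracts font_step when a word does not fit).
    """
    if relative_scaling < 0 or relative_scaling > 1:
        raise ValueError("relative_scaling needs to be between 0 and 1")
    items = sorted(frequencies.items(), key=itemgetter(1), reverse=True)
    if not items:
        raise ValueError("We need at least 1 word to plot a word cloud")
    # keep the top max_words and normalise so the largest frequency is 1.0
    items = items[:max_words]
    top = float(items[0][1])
    items = [(word, freq / top) for word, freq in items]

    sizes = {}
    font_size = max_font_size
    last_freq = 1.0
    for word, freq in items:
        if relative_scaling != 0:
            # blend "scale by the frequency ratio" with "keep the previous size"
            font_size = int(round((relative_scaling * (freq / last_freq)
                                   + (1 - relative_scaling)) * font_size))
        sizes[word] = font_size
        last_freq = freq
    return sizes


counts = {"python": 100, "cloud": 50, "word": 25}
print(font_sizes(counts, 100, relative_scaling=1))    # {'python': 100, 'cloud': 50, 'word': 25}
print(font_sizes(counts, 100, relative_scaling=0.5))  # {'python': 100, 'cloud': 75, 'word': 56}
print(font_sizes(counts, 100, relative_scaling=0))    # {'python': 100, 'cloud': 100, 'word': 100}
```

With relative_scaling=1 the sizes are proportional to frequency, matching "twice as frequent, twice the size". With 0, nothing here changes the size. In the real loop, sizes then drop only when placement fails, which is why that setting reflects rank alone.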
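
A color_func receives word, font_size, position, orientation, random_state and font_path as keyword arguments. It must return any color PIL accepts, such as an "hsl(h, s%, l%)" string. Because recolor() forwards random_state unchanged, the function may receive None there. The hypothetical grey_color_func below handles that case:

```python
import random


def grey_color_func(word, font_size, position, orientation,
                    random_state=None, **kwargs):
    """Hypothetical color_func: a random light grey as an HSL string."""
    # recolor() may pass random_state=None, so fall back to a fresh generator
    rng = random_state if random_state is not None else random.Random()
    return "hsl(0, 0%%, %d%%)" % rng.randint(60, 100)


# WordCloud calls color_func with keyword arguments, font_path included
color = grey_color_func(word="python", font_size=40, position=(10, 25),
                        orientation=None, random_state=random.Random(0),
                        font_path="DroidSansMono.ttf")
print(color)  # an "hsl(0, 0%, L%)" string with 60 <= L <= 100
```

It would be passed as WordCloud(color_func=grey_color_func) or wc.recolor(color_func=grey_color_func). In both cases it takes precedence over colormap.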
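
Before counting, process_text does four things. It tokenizes with regexp (default r"\w[\w']+"), drops stopwords case-insensitively, strips a trailing "'s", and discards purely numeric tokens. The stdlib sketch below reproduces only those filtering steps and leaves out counting, collocations and plural handling. tokenize is a hypothetical name.

```python
import re

DEFAULT_REGEXP = r"\w[\w']+"  # the pattern process_text uses when regexp=None


def tokenize(text, stopwords, regexp=None):
    """Mirror the filtering steps of WordCloud.process_text (no counting)."""
    stopwords = {s.lower() for s in stopwords}
    pattern = regexp if regexp is not None else DEFAULT_REGEXP
    words = re.findall(pattern, text)
    # remove stopwords, compared case-insensitively
    words = [w for w in words if w.lower() not in stopwords]
    # remove a trailing 's
    words = [w[:-2] if w.lower().endswith("'s") else w for w in words]
    # remove tokens made only of digits
    words = [w for w in words if not w.isdigit()]
    return words


print(tokenize("The cat's toy and 42 cats", {"the", "and"}))
# ['cat', 'toy', 'cats']
print(tokenize("I saw a cat", set()))
# ['saw', 'cat']
```

The default pattern needs at least two word characters, so one-letter words such as "a" or "I" never reach the cloud. A custom regexp such as r"\w+" keeps them.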
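
The normalize_plurals rule can be illustrated on a plain count dict. merge_plurals is a hypothetical helper written from the docstring's description alone. It is not wordcloud's own implementation (process_tokens), which is not shown in this listing.

```python
def merge_plurals(counts):
    """Fold 'word' + 's' into 'word' when both occur, unless it ends in 'ss'."""
    merged = {}
    for word, n in counts.items():
        singular = word[:-1]
        if word.endswith("s") and not word.endswith("ss") and singular in counts:
            key = singular  # e.g. the count of "cats" goes to "cat"
        else:
            key = word      # no singular present, or an "ss" word like "boss"
        merged[key] = merged.get(key, 0) + n
    return merged


print(merge_plurals({"cat": 2, "cats": 3, "boss": 1, "bos": 1, "dogs": 4}))
# {'cat': 5, 'boss': 1, 'bos': 1, 'dogs': 4}
```

"boss" stays separate even though "bos" also occurs, and "dogs" is kept as-is because "dog" never appears on its own.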
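
When max_font_size is None and there are at least two words, generate_from_frequencies first lays out the two most frequent words with max_font_size=self.height. It then starts from the harmonic mean of their two fitted sizes. If only one of them fitted it uses that size, and if none did it raises ValueError('canvas size is too small'). With a single word in total it simply uses self.height. A hypothetical helper isolating that step:

```python
def initial_font_size(trial_sizes):
    """Starting font size from the sizes of the first two laid-out words."""
    if len(trial_sizes) >= 2:
        a, b = trial_sizes[0], trial_sizes[1]
        # harmonic mean 2ab / (a + b), truncated to an int as in the source
        return int(2 * a * b / (a + b))
    if len(trial_sizes) == 1:
        # on very small canvases only one word may have fitted
        return trial_sizes[0]
    raise ValueError("canvas size is too small")


print(initial_font_size([120, 60]))  # 80
print(initial_font_size([50]))       # 50
```

The harmonic mean leans toward the smaller of the two sizes (80 rather than the arithmetic mean 90 for 120 and 60).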
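
IntegralOccupancyMap is not part of this listing, but the comment "find possible places using integral image" names the idea. A summed-area table answers "how many occupied pixels are in this box?" with four lookups. Every candidate corner can therefore be tested cheaply, and a free one picked at random. The following is a hypothetical pure-Python sketch of that idea, not wordcloud's implementation. Like the listing, it uses (row, column) order, which is why the source draws at (y, x).

```python
import random


def integral_image(grid):
    """Summed-area table with an extra leading row and column of zeros."""
    h, w = len(grid), len(grid[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            ii[i + 1][j + 1] = (grid[i][j] + ii[i][j + 1]
                                + ii[i + 1][j] - ii[i][j])
    return ii


def free_positions(grid, size_y, size_x):
    """All top-left (row, col) corners where a size_y x size_x box is empty."""
    ii = integral_image(grid)
    h, w = len(grid), len(grid[0])
    hits = []
    for i in range(h - size_y + 1):
        for j in range(w - size_x + 1):
            # occupied pixels inside the box, from four table lookups
            area = (ii[i + size_y][j + size_x] - ii[i][j + size_x]
                    - ii[i + size_y][j] + ii[i][j])
            if area == 0:
                hits.append((i, j))
    return hits


def sample_position(grid, size_y, size_x, rng):
    """A random free corner, or None if the box fits nowhere."""
    hits = free_positions(grid, size_y, size_x)
    return rng.choice(hits) if hits else None


grid = [[0, 0, 0],
        [0, 1, 0],   # one occupied pixel in the middle
        [0, 0, 0]]
print(free_positions(grid, 3, 1))                     # [(0, 0), (0, 2)]
print(sample_position(grid, 2, 2, random.Random(0)))  # None
```

In the listing, the box passed in is the text's bounding box plus margin. A None result sends the loop into its rotate-or-shrink branch.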
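
to_image redraws the stored layout at the output resolution. Every font size and position is multiplied by scale, and the stored (row, column) position is swapped into the (x, y) order that PIL's draw.text expects. Placement therefore runs on the small canvas and only the final drawing is enlarged, which is the speed versus fit trade-off the scale docstring describes. A hypothetical helper isolating that arithmetic:

```python
def scaled_draw_args(layout, scale=1.0):
    """(word, font size, (x, y)) as to_image would hand them to PIL."""
    out = []
    for (word, _freq), font_size, position, _orientation, _color in layout:
        row, col = position
        # layout_ stores (row, column); PIL wants (x, y) = (column, row)
        xy = (int(col * scale), int(row * scale))
        out.append((word, int(font_size * scale), xy))
    return out


layout = [(("python", 1.0), 40, (10, 25), None, "red")]
print(scaled_draw_args(layout, scale=2))  # [('python', 80, (50, 20))]
```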
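
recolor keeps every layout_ tuple and only recomputes its last element, the color, which is why it is much cheaper than generate. An int random_state is wrapped in random.Random(seed), so the same seed reproduces the same colors. A sketch on a plain list, with hypothetical recolor_layout and random_color_func names:

```python
import random


def random_color_func(word, font_size, position, orientation,
                      random_state=None, font_path=None):
    """Hypothetical color_func: a random hue at fixed saturation/lightness."""
    rng = random_state if random_state is not None else random.Random()
    return "hsl(%d, 80%%, 50%%)" % rng.randint(0, 359)


def recolor_layout(layout, color_func, random_state=None, font_path=None):
    """Replace only the color in each layout tuple, as WordCloud.recolor does."""
    if isinstance(random_state, int):
        random_state = random.Random(random_state)
    return [(word_freq, font_size, position, orientation,
             color_func(word=word_freq[0], font_size=font_size,
                        position=position, orientation=orientation,
                        random_state=random_state, font_path=font_path))
            for word_freq, font_size, position, orientation, _ in layout]


layout = [(("python", 1.0), 40, (10, 25), None, "red"),
          (("cloud", 0.5), 20, (60, 5), None, "red")]
first = recolor_layout(layout, random_color_func, random_state=42)
second = recolor_layout(layout, random_color_func, random_state=42)
print(first == second)  # True: same seed, same colors
```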

 
