PHP DOMDocument->getElementByID adding Â in place of empty &nbsp;

I'm using PHP's DOMDocument object to parse some HTML (fetched with cURL). When I get an element by ID and output it, every empty &nbsp; gets an additional character and becomes Â&nbsp;.

The Code:

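$document = new DOMDocument();
$document->validateOnParse = true;

// $handle is a cURL handle created earlier; curl_exec() returns the page
// as a string only if CURLOPT_RETURNTRANSFER was set on it
$document->loadHTML( curl_exec($handle) );

$element = $document->getElementById( __ELEMENT_ID__ );

// Whole document: prints as expected
echo $document->saveHTML();

// Single element: every &nbsp; shows up as Â&nbsp;
echo $document->saveHTML($element);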


The $document->saveHTML() call behaves as expected and prints out the entire page. But, as I said above, the echo $document->saveHTML($element) call turns every empty &nbsp; into Â&nbsp;.

This happens to every &nbsp; within $element.
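One way to see what the two calls actually emit is to check how each one serializes a no-break space. This is a hedged diagnostic sketch, not code from the post: the sample markup and the describeNbsp() helper are hypothetical, it assumes PHP 8 (for str_contains()) and the DOM extension, and the exact serialization can differ between PHP and libxml2 versions, so treat the printed result as something to verify on your own setup.

```php
<?php
// Hypothetical helper: report how a serialized string represents U+00A0 (no-break space).
function describeNbsp(string $html): string
{
    if (str_contains($html, '&nbsp;') || str_contains($html, '&#160;')) {
        return 'entity';
    }
    if (str_contains($html, "\u{00A0}")) {
        return 'raw UTF-8 bytes C2 A0';
    }
    return 'not found';
}

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them

// Hypothetical sample markup standing in for the page fetched with cURL.
$document = new DOMDocument();
$document->loadHTML('<div id="demo"><p>a&nbsp;b</p></div>');
$element = $document->getElementById('demo');

// Compare the whole-document output with the single-element output.
echo 'saveHTML():         ', describeNbsp($document->saveHTML()), "\n";
echo 'saveHTML($element): ', describeNbsp($document->saveHTML($element)), "\n";
```

If the second line reports raw bytes while the first reports an entity, the element output depends on how the browser decodes the page.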

What in this process (getting the element by ID and outputting it) is inserting this extra character? I could work around it, but I'm more interested in getting to the root cause.
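A likely mechanism, offered as an assumption rather than something the post confirms: &nbsp; decodes to U+00A0, which UTF-8 stores as the two bytes C2 A0. If those bytes reach a browser that decodes the page as ISO-8859-1 or Windows-1252 (a common fallback when no charset is declared), each byte becomes its own character, and 0xC2 is Â. The sketch below reproduces that misreading in plain PHP; latin1ToUtf8() is a hypothetical helper written out by hand so the example does not depend on the mbstring or iconv extensions.

```php
<?php
// Hypothetical helper: decode a byte string as ISO-8859-1 and re-encode it as UTF-8.
// In ISO-8859-1 every byte is one character whose code point equals the byte value.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $cp = ord($byte);
        if ($cp < 0x80) {
            $out .= $byte;                      // ASCII stays a single byte
        } else {
            $out .= chr(0xC0 | ($cp >> 6))      // 2-byte UTF-8 lead byte
                  . chr(0x80 | ($cp & 0x3F));   // continuation byte
        }
    }
    return $out;
}

$nbsp = "\u{00A0}";                 // what &nbsp; decodes to
echo bin2hex($nbsp), "\n";          // c2a0: two bytes in UTF-8

// What a Latin-1 reader "sees": U+00C2 (Â) followed by U+00A0 (no-break space).
$misread = latin1ToUtf8($nbsp);
echo bin2hex($misread), "\n";       // c382c2a0
var_dump($misread === "\u{00C2}\u{00A0}"); // bool(true)
```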


Updated: 2019-11-29 11:57


I was able to fix the problem by setting the character encoding of the page. The page I was fetching did not declare a character encoding, and my own page was just a snippet with no header information. When I added a character-encoding declaration to my page, the problem disappeared.
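The post does not show the exact declaration that was added. As a hedged sketch, assuming the intent is to declare UTF-8 for the page that prints the fragment, the output can be wrapped in a minimal document with a meta charset tag; renderUtf8Page() and the sample markup are hypothetical. Behind a web server, calling header('Content-Type: text/html; charset=utf-8') before any output declares the same thing at the HTTP level.

```php
<?php
// Hypothetical helper: wrap an HTML fragment in a minimal page that declares UTF-8,
// so the browser reads the bytes C2 A0 as one no-break space rather than "Â" plus a space.
function renderUtf8Page(string $fragment): string
{
    return "<!DOCTYPE html>\n"
        . "<html><head><meta charset=\"utf-8\"></head>\n"
        . "<body>" . $fragment . "</body></html>\n";
}

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them

// Hypothetical sample markup standing in for the page fetched with cURL.
$document = new DOMDocument();
$document->loadHTML('<div id="demo"><p>a&nbsp;b</p></div>');
$element = $document->getElementById('demo');

echo renderUtf8Page($document->saveHTML($element));
```

This only changes how the browser decodes the bytes; the fragment itself is left exactly as saveHTML($element) produced it.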



