<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Coranac &#187; blag</title>
	<atom:link href="http://www.coranac.com/category/blag/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.coranac.com</link>
	<description>my own little world</description>
	<lastBuildDate>Sat, 19 Nov 2011 16:43:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
		<item>
		<title>Filter juggling and comment preview</title>
		<link>http://www.coranac.com/2009/08/filter-juggling-and-comment-preview/</link>
		<comments>http://www.coranac.com/2009/08/filter-juggling-and-comment-preview/#comments</comments>
		<pubDate>Mon, 10 Aug 2009 11:49:50 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[blag]]></category>
		<category><![CDATA[codesnippet]]></category>
		<category><![CDATA[comment preview]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=103</guid>
		<description><![CDATA[One of the nice features of WordPress is that it already has a lot of functionality built-in. The whole thing is set up so that normal people can just install and start writing posts immediately, with WordPress taking care of all the details like converting HTML entities and adding newlines where appropriate. Of course, for [...]]]></description>
			<content:encoded><![CDATA[<p>
One of the nice features of WordPress is that it already has a lot of<br />
functionality built-in. The whole thing is set up so that normal people<br />
can just install and start writing posts immediately, with WordPress<br />
taking care of all the details like converting<br />
<a href="http://www.w3schools.com/tags/ref_entities.asp">HTML entities</a><br />
and adding newlines where appropriate.
</p>
<p>
Of course, for those that aren&#8217;t normal and that would like to write<br />
in raw HTML, these things are somewhat annoying. Fortunately, though,<br />
WordPress allows you to disable these kinds of filters. The catch is<br />
that you need to find out which filters to disable, namely,<br />
<code>wptexturize</code> (which converts HTML entities) and<br />
<code>wpautop</code> (which does newline control). WordPress also makes<br />
it easy to add additional filters, like the<br />
<a href="http://blog.hackerforhire.org/code-snippet/"<br />
rel="pingback">CodeSnippet plugin</a> that I use for code highlighting.
</p>
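<p>
For reference, turning those two filters off for post content could look roughly like this. It's a minimal sketch: that it lives in the theme's <tt>functions.php</tt> and that only the <code>the_content</code> hook matters are assumptions, and the <code>function_exists()</code> guard is only there so the file loads harmlessly outside WordPress.
</p>

```php
<?php
// Sketch for a theme's functions.php or a small plugin (placement is an assumption).
// The guard makes this a no-op when WordPress isn't loaded.
if (function_exists('remove_filter')) {
    remove_filter('the_content', 'wptexturize');  // stop the entity/quote conversion
    remove_filter('the_content', 'wpautop');      // stop the automatic <p> and <br />
}
```

<p>
Other hooks, such as the ones for excerpts or comments, would need their own calls, and <code>remove_filter()</code> only works if the priority matches the one the filter was registered with.
</p>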
<p>
However, with the number of filters available, sometimes things will<br />
clash. A good example of this is comments that have source code<br />
in them. Part of what CodeSnippet does is convert certain characters<br />
(specifically: &lsquo;&lt;&rsquo;, &lsquo;&gt;&rsquo;,<br />
&lsquo;&amp;&rsquo;) to their printable entities<br />
(&amp;lt;, &amp;gt;, &amp;amp;), so that they aren&#8217;t considered special HTML<br />
characters anymore. However, there are several other filters that<br />
have a similar task, so that when you write this:</p>
<p><!-- For example, for comments I still have<br />
<code>wptexturize</code> and <code>wpautop</code> enabled, since in<br />
all probability most readers here are merely slightly odd at best.<br />
Besides that, this is the expected comment behaviour and makes sure<br />
no-one gets to add &lt;script&gt; tags and all. -->
</p>
<p><blockquote>
<br />
Oh hai! This is a useful bitfield function.<div>&nbsp;</div><br />
&#91;code lang="cpp"]<br/><br />
template&lt;class T&gt;<br/><br />
inline void bfInsert(T &amp;y, u32 x, int start, int len)<br/><br />
{<br/><br />
 &nbsp; &nbsp;u32 mask= ((1&lt;&lt;len)-1) &lt;&lt; start;<br/><br />
 &nbsp; &nbsp;y &amp;= ~mask;<br/><br />
 &nbsp; &nbsp;y |= (x&lt;&lt;start) &amp; mask;<br/><br />
}<br/><br />
[/code]<br />

</blockquote>
</p>
<p>what it becomes is:</p>
<p><blockquote>
<br />
Oh hai! This is a useful bit function.</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="kw1">template</span><br />
<span class="kw1">inline</span> <span class="kw1">void</span> bfInsert(T &amp;amp;y, u32 x, <span class="kw1">int</span> start, <span class="kw1">int</span> len)<br />
{<br />
&nbsp; &nbsp; u32 mask= (<span class="nu0">1</span>&amp;lt;&amp;lt;len) &amp;lt;&amp;lt; start;<br />
&nbsp; &nbsp; y &amp;amp;= ~mask;<br />
&nbsp; &nbsp; y |= (x&amp;lt;&amp;lt;start) &amp;amp; mask;<br />
}</div>
</div>
<p>
</blockquote>
</p>
<p>
Not exactly pretty. Note that the template's <code>&lt;class T&gt;</code> is simply removed<br />
because it's seen as an illicit HTML tag, and all the special<br />
characters are doubly converted. This is still a mild example; I think<br />
if you place the angle brackets wrong, whole swaths of code can<br />
simply be eaten by the sanitizer.
</p>
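<p>
Both effects are easy to reproduce outside WordPress. The snippet below is only a sketch: <code>strip_tags()</code> and <code>htmlspecialchars()</code> are plain PHP stand-ins for whatever the actual filters do internally, not the functions WordPress calls.
</p>

```php
<?php
// Stand-in for the tag sanitizer: anything that looks like a tag is dropped.
$stripped = strip_tags('template<class T>');

// Stand-in for two filters that each encode the special characters.
$expr  = 'y |= (x<<start) & mask;';
$once  = htmlspecialchars($expr);   // what one filter should produce
$twice = htmlspecialchars($once);   // what you get when a second one joins in

echo $stripped, "\n";
echo $once, "\n";
echo $twice, "\n";
```

<p>
The first line loses its template argument entirely, and the last one shows the ampersands of the entities being encoded a second time.
</p>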
<p>
Unfortunately, finding out where the problem lies is tricky. Not<br />
only are there dozens of potential functions doing the conversion,<br />
they can be called from anywhere, so you have no idea where to<br />
start looking, and PHP isn't exactly rich in the debugger<br />
department. Worse still, in this<br />
particular case the place where the bad happens is actually before<br />
the comment is even saved to the database (but only for unregistered<br />
people; for me the code comments would work fine), and because comments<br />
are handled on a page that you don't actually ever see, random<br />
echo/print statements are useless as well.
</p>
<p>
But I think I finally got it: it was<br />
<code>wp_kses()</code> using (in a roundabout way)<br />
<code>wp_specialchars()</code> in the <tt>wp-includes/kses.php</tt><br />
<s>room</s>file. The contractor is actually<br />
<code>wp_filter_comment()</code> from <tt>wp-includes/comment.php</tt>,<br />
using the <code>pre_comment_content</code> filter as a middleman.
</p>
<p>
The trick now is to keep it from happening. What I've done is define<br />
not one but two <code>pre_comment_content</code> filters: one that<br />
pre-mangles the brackets and ampersand before <code>wp_kses</code>,<br />
and one that de-mangles them afterwards. Of course, this will only<br />
be of importance between &#91;code] tags. Exactly how to do this will<br />
depend on the plugin you're using, but in the case of<br />
CodeSnippet it goes like this:
</p>
<div class="php">
<div class="php proglist" style=" "><span class="co1">//# Put this along with the other add_filter() calls.</span></p>
<p><span class="co1">// Ensure in-&#91;code] entities ('&lt;&gt;&amp;') work out right in the end.</span><br />
add_filter(<span class="st_h">'pre_comment_content'</span>, <a href="http://www.php.net/array"><span class="kw3">array</span></a>(&amp;<span class="re0">$CodeSnippet</span>, <span class="st_h">'filterDeEntity'</span>), <span class="nu0">1</span>);<br />
add_filter(<span class="st_h">'pre_comment_content'</span>, <a href="http://www.php.net/array"><span class="kw3">array</span></a>(&amp;<span class="re0">$CodeSnippet</span>, <span class="st_h">'filterReEntity'</span>), <span class="nu0">50</span>);</p>
<p>...</p>
<p><span class="co1">//# Add these methods to the CodeSnippet class.</span><br />
&nbsp; &nbsp; <span class="co4">/** <br />
&nbsp; &nbsp; &nbsp;* Pre-encode HTML entities. Should come \e before wp_kses.<br />
&nbsp; &nbsp; &nbsp;*/</span><br />
&nbsp; &nbsp; <span class="kw2">function</span> filterDeEntity(<span class="re0">$content</span>)<br />
&nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="re0">$content</span>= &nbsp;<a href="http://www.php.net/preg_replace"><span class="kw3">preg_replace</span></a>(<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'#(\[code.*?\])(.*?)(\[/code\])#msie'</span>, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&quot;\\1&quot; . str_replace(<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; array(&quot;&lt;&quot;, &quot;&gt;&quot;, &quot;&amp;&quot;), <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; array(&quot;[|LT|]&quot;, &quot;[|GT|]&quot;, &quot;[|AMP|]&quot;), <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \'\\2\') . &quot;\\3&quot;;'</span>, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re0">$content</span>);<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="re0">$content</span>= <a href="http://www.php.net/str_replace"><span class="kw3">str_replace</span></a>(<span class="st_h">'\&quot;'</span>, <span class="st_h">'&quot;'</span>, <span class="re0">$content</span>);<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="re0">$content</span>;<br />
&nbsp; &nbsp; }<br />
&nbsp; &nbsp; <span class="co4">/** <br />
&nbsp; &nbsp; &nbsp;* Decode HTML entities. Should come \e after wp_kses.<br />
&nbsp; &nbsp; &nbsp;*/</span> <br />
&nbsp; &nbsp; <span class="kw2">function</span> filterReEntity(<span class="re0">$content</span>)<br />
&nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span>(<a href="http://www.php.net/strstr"><span class="kw3">strstr</span></a>(<span class="re0">$content</span>, <span class="st0">&quot;[|&quot;</span>))<br />
&nbsp; &nbsp; &nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re0">$content</span>= <a href="http://www.php.net/preg_replace"><span class="kw3">preg_replace</span></a>(<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'#(\[code.*?\])(.*?)(\[/code\])#msie'</span>, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&quot;\\1&quot; . str_replace(<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; array(&quot;[|LT|]&quot;, &quot;[|GT|]&quot;, &quot;[|AMP|]&quot;), <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; array(&quot;&lt;&quot;, &quot;&gt;&quot;, &quot;&amp;&quot;), <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \'\\2\') . &quot;\\3&quot;;'</span>, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re0">$content</span>);<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re0">$content</span>= <a href="http://www.php.net/str_replace"><span class="kw3">str_replace</span></a>(<span class="st_h">'\&quot;'</span>, <span class="st_h">'&quot;'</span>, <span class="re0">$content</span>);<br />
&nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="re0">$content</span>;<br />
&nbsp; &nbsp; }</div>
</div>
<p>
Notice that both methods are under the same filter group. The trick<br />
is that they have different priorities, which makes one act before<br />
<code>wp_kses()</code>, and one after. Also note that, because of the<br />
<code>e</code> modifier on the regexps, the replacement part of<br />
<code>preg_replace()</code> is evaluated as PHP code. This allows for shorter code, but is<br />
<i>very</i> fragile; it may be better to use<br />
<code>preg_replace_callback()</code> instead. In any case, written like<br />
this it seems to work:
</p>
<p><blockquote>
Oh hai! This is a useful bit function. </p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="kw1">template</span>&lt;<span class="kw1">class</span> T&gt;<br />
<span class="kw1">inline</span> <span class="kw1">void</span> bfInsert(T &amp;y, u32 x, <span class="kw1">int</span> start, <span class="kw1">int</span> len)<br />
{<br />
&nbsp; &nbsp;u32 mask= ((<span class="nu0">1</span>&lt;&lt;len)-<span class="nu0">1</span>)&lt;&lt;start;<br />
&nbsp; &nbsp;y &amp;= ~mask;<br />
&nbsp; &nbsp;y |= (x&lt;&lt;start) &amp; mask;<br />
}</div>
</div>
<p>
</blockquote>
</p>
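<p>
For the curious, here is roughly what the <code>preg_replace_callback()</code> route could look like, together with a toy version of the priority ordering. It is a sketch under assumptions, not the CodeSnippet code: the function names are made up, <code>htmlspecialchars()</code> merely stands in for the sanitizer, and the sanitizer is assumed to run at the default priority of 10.
</p>

```php
<?php
// Hypothetical standalone take on the two filters, built on
// preg_replace_callback() (the closures need PHP 5.3 or later).

// Replace $from with $to, but only between [code...] and [/code].
function swap_in_code_blocks($content, $from, $to)
{
    return preg_replace_callback(
        '#(\[code.*?\])(.*?)(\[/code\])#msi',
        function ($m) use ($from, $to) {
            return $m[1] . str_replace($from, $to, $m[2]) . $m[3];
        },
        $content);
}

$raw     = array('<', '>', '&');
$mangled = array('[|LT|]', '[|GT|]', '[|AMP|]');

// Toy filter table, priority => callback. htmlspecialchars() is only a
// stand-in for the sanitizer, assumed to sit at the default priority 10.
$filters = array(
    50 => function ($c) use ($raw, $mangled) {
        return swap_in_code_blocks($c, $mangled, $raw);   // de-mangle
    },
    10 => 'htmlspecialchars',
    1  => function ($c) use ($raw, $mangled) {
        return swap_in_code_blocks($c, $raw, $mangled);   // pre-mangle
    },
);

// Run the callbacks in ascending priority order, as WordPress does.
function run_filters($filters, $content)
{
    ksort($filters);
    foreach ($filters as $callback) {
        $content = call_user_func($callback, $content);
    }
    return $content;
}

$comment = 'a < b [code]y |= (x<<start) & mask;[/code]';
echo run_filters($filters, $comment), "\n";
```

<p>
Text outside the &#91;code] tags still gets encoded by the middle filter, while everything inside comes out untouched. Since nothing is evaluated as PHP code here, there is no quote escaping to undo afterwards either.
</p>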
<h4>Comment preview</h4>
<p>
The code-comment mangling is just one of the issues one can<br />
encounter in blog comments. It's usually impossible to see beforehand<br />
what will be accepted and what not. Is HTML allowed? Are all tags<br />
allowed, or just some or none at all? What about whitespace? Or<br />
BB-like tags? Basically, you'll never know what a comment will look<br />
like until you've submitted it, and by then it's too late to change it.
</p>
<p>
You know what'd be really helpful? A <b>comment preview</b>!
</p>
<p>
You'd think this'd be a fairly obvious feature for a blogging<br />
system to have, but apparently not.<br />
I was thinking of making my own preview functionality, but when<br />
attempting to do so several items within WP thwarted my efforts.<br />
Fortunately, it seems plugins of this sort exist already. The plugin<br />
I'm now using is <a href="http://blogwaffe.com/ajax-comment-preview/"<br />
rel="pingback">ajax-comment-preview</a>, which works pretty darn well.
</p>
<p><div>&nbsp;</div></p>
<p>
So anyway, comments should be able to handle code properly now and<br />
there's a comment-preview to show you what the comment will look<br />
like in the end. And there was much rejoicing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2009/08/filter-juggling-and-comment-preview/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>new and improved geshi</title>
		<link>http://www.coranac.com/2009/06/new-geshi/</link>
		<comments>http://www.coranac.com/2009/06/new-geshi/#comments</comments>
		<pubDate>Fri, 05 Jun 2009 19:13:08 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[blag]]></category>
		<category><![CDATA[geshi]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=69</guid>
		<description><![CDATA[With Tonc I pretty much did all the syntax highlighting of code manually. As you might expect, this experience was &#8211; well, the proper description is something not suitable for anyone under the age of several thousand, so let&#8217;s keep it at &#8220;somewhat less than pleasant&#8221;. So the first thing I looked for when starting this [...]]]></description>
			<content:encoded><![CDATA[<p>
With Tonc I pretty much did all the syntax highlighting of code manually.<br />
As you might expect, this experience was &ndash; well, the proper description is<br />
something not suitable for anyone under the age of several thousand,<br />
so let&#8217;s keep it at &ldquo;somewhat less than pleasant&rdquo;. So the first<br />
thing I looked for when starting this whole blogging gig was something<br />
that could do that automatically. In my case, that was<br />
<a href="http://wordpress.org/extend/plugins/codesnippet-20/">codesnippet</a>,<br />
which was built on the very awesome<br />
<a href="http://qbnz.com/highlighter/">Geshi</a>. There were some<br />
small problems with number formatting and whitespace handling, but<br />
overall it&#8217;s served me well.
</p>
<p>
The Geshi that came with it was &hellip; 1.0.7.20, I think. In any case, Geshi<br />
is now at 1.0.8.3, so I figured it was time for an upgrade. Most notable was<br />
that the way numbers were parsed has been greatly modified, with different<br />
types of representations now being parsed separately &ndash; and correctly<br />
to boot. Right now, it&#8217;s almost fully correct, as you can see from the list<br />
below:
</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="co1">// Regular int</span></p>
<p><span class="nu0">123</span><br />
<span class="nu0">123l</span><br />
<span class="nu0">123L</span><br />
123ll &nbsp; &nbsp; &nbsp; <span class="co1">// fail</span><br />
123LL &nbsp; &nbsp; &nbsp; <span class="co1">// fail</span></p>
<p>123u&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// fail</span><br />
123U&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// fail</span><br />
+<span class="nu0">123</span></p>
<p>-<span class="nu0">123</span></p>
<p><span class="co1">// Octal</span><br />
<span class="nu0">0123</span></p>
<p><span class="co1">// Hex</span><br />
<span class="nu0">0x12</span><br />
<span class="nu0">0x123</span><br />
0x123.4</p>
<p><span class="co1">// Float</span><br />
<span class="nu0">123.4</span><br />
<span class="nu0">123.4f</span><br />
<span class="nu0">123.4F</span><br />
+<span class="nu0">123.4</span><br />
-<span class="nu0">123.4</span><br />
<span class="nu0">1.2e3</span></p>
<p><span class="nu0">1.2E3</span><br />
<span class="nu0">1.2e+3</span><br />
<span class="nu0">1.2e-3</span></p>
<p><span class="co1">// Inner</span><br />
(<span class="nu0">1.23</span>)<br />
abc123de</div>
</div>
<p>Only some of the more special integer literals aren&#8217;t parsed correctly;<br />
specifically the unsigned (<code>-U</code>) and long long<br />
(<code>-LL</code>) suffixes aren&#8217;t accepted. I don&#8217;t suppose hex floats will<br />
work either, but that&#8217;s a GCC extension anyway.
</p>
<p>
To fix this, you need to modify geshi a little; specifically the<br />
GESHI_NUMBER_INT_CSTYLE regular expression:
</p>
<div class="none">
<div class="none proglist" style=" ">&nbsp; GESHI_NUMBER_INT_CSTYLE =&gt;<br />
&nbsp; &nbsp; <span class="st0">'(?&lt;![0-9a-z_<span class="es0">\.</span>%])(?&lt;![<span class="es0">\d</span><span class="es0">\.</span>]e[+<span class="es0">\-</span>])([1-9]<span class="es0">\d</span>*?|0)l(?![0-9a-z<span class="es0">\.</span>])'</span>,</div>
</div>
<p>
&hellip; yeah. I&#8217;m not sure why it&#8217;s formulated like that either. I&#8217;d have thought<br />
&#8216;<code>\b</code>&#8217; would have worked just as well, but alright. Anyway, notice the single &#8216;<code>l</code>&#8217; character in there? That needs to be extended to something<br />
that matches a potential single &#8216;<code>u</code>&#8217;, possibly followed by one or two<br />
&#8216;<code>l</code>&#8217;s. In other words: &#8216;<code>u?l{0,2}</code>&#8217;.
</p>
<div class="none">
<div class="none proglist" style=" ">&nbsp; GESHI_NUMBER_INT_CSTYLE =&gt;<br />
&nbsp; &nbsp; <span class="st0">'(?&lt;![0-9a-z_<span class="es0">\.</span>%])(?&lt;![<span class="es0">\d</span><span class="es0">\.</span>]e[+<span class="es0">\-</span>])([1-9]<span class="es0">\d</span>*?|0)<b>u?l{0,2}</b>(?![0-9a-z<span class="es0">\.</span>])'</span>,</div>
</div>
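<p>
A quick way to check the extended suffix is to run the pattern over a handful of literals. This is just a sketch: the helper name is made up, and the case-insensitive <code>i</code> flag is an assumption about how the pattern gets applied, added here so that the upper-case suffixes match too.
</p>

```php
<?php
// The pattern with the extended suffix. The 'i' flag is an assumption,
// so that 'U' and 'LL' are accepted as well.
$int_cstyle = '/(?<![0-9a-z_\.%])(?<![\d\.]e[+\-])([1-9]\d*?|0)u?l{0,2}(?![0-9a-z\.])/i';

// Hypothetical helper: true if the pattern matches the whole literal.
function is_cstyle_int($pattern, $literal)
{
    return preg_match($pattern, $literal, $m) === 1 && $m[0] === $literal;
}

$samples = array('123', '123l', '123LL', '123u', '123ULL',
                 '123lll', '123.4', '1.2e+3', 'abc123de');
foreach ($samples as $lit) {
    printf("%-10s %s\n", $lit, is_cstyle_int($int_cstyle, $lit) ? 'match' : 'no match');
}
```

<p>
Note that this only covers suffixes with the &#8216;<code>u</code>&#8217; in front; something like <code>123lu</code>, which C also allows, would still slip through.
</p>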
<h4 id="sssec-html">HTML in code</h4>
<p>
An astute reader may have noted the bold in the previous snippet. Normally,<br />
you can&#8217;t do that in Geshi. One of the things that Geshi does is translate<br />
special HTML characters like &#8216;<code>&lt;</code>&#8217; into entities like &#8220;<code>&amp;lt;</code>&#8221;<br />
so that it&#8217;ll turn up right on the resulting page. This, of course, is one of the<br />
things Geshi is expected to do. However, in this case it also makes it impossible<br />
to add HTML parts in the code snippet, which at times can be very useful.
</p>
<p>
So what do we do now? Well, we can use <i>escaped</i> HTML tags. Much like<br />
&#8220;<code>\n</code>&#8221; doesn&#8217;t actually mean backslash + &#8216;<code>n</code>&#8217; but a<br />
newline character, &#8220;<code>\&lt;</code>&#8221; can be used for the actual<br />
&#8216;<code>&lt;</code>&#8217;. And to <i>un</i>escape that, a double backslash can be used,<br />
much like it is in C.
</p>
<div class="none">
<div class="none proglist" style=" ">\\&lt;b\\&gt;BOLD\\&lt;/b\\&gt; &nbsp; &nbsp;becomes &nbsp; &nbsp; \&lt;b\&gt;BOLD\&lt;/b\&gt;</div>
</div>
<p>
There are several ways to implement this. One would be to modify it in the geshi<br />
code. I haven&#8217;t tried that route yet because I expect it could get messy. That&#8217;s<br />
arguably how it <i>should</i> be done, but it&#8217;s easier to do it after the fact:<br />
when all the conversions have been done. Basically, you need something like this:
</p>
<div class="php">
<div class="php proglist" style=" "><span class="co1">// Initialize geshi with the text to convert and language file to use.</span><br />
<span class="re0">$geshi</span> = <span class="kw2">new</span> GeSHi(<span class="re0">$text</span>, <span class="re0">$lang</span>, <span class="re0">$this</span>-&gt;geshi_path);</p>
<p><span class="co1">// This does the actual work.</span><br />
<span class="re0">$text</span>= <span class="re0">$geshi</span>-&gt;parse_code();</p>
<p><span class="co1">// Replace (un)escaped html entities.</span><br />
<span class="re0">$text</span>= <a href="http://www.php.net/str_replace"><span class="kw3">str_replace</span></a>(<br />
&nbsp; &nbsp; <a href="http://www.php.net/array"><span class="kw3">array</span></a>(<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Normal entities</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'\\\&amp;lt;'</span>, <span class="st_h">'\\\&amp;gt;'</span>, <span class="st_h">'\\\&amp;amp;'</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// In-string escapes get crap added, gaddammittohell &gt;_&lt;.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&lt;span class=&quot;es0&quot;&gt;&lt;&lt;/span&gt;'</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&lt;span class=&quot;es0&quot;&gt;&gt;&lt;/span&gt;'</span>, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&lt;span class=&quot;es0&quot;&gt;&amp;&lt;/span&gt;'</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Unescaped entities</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'\\\&amp;'</span>, <span class="st_h">'\\\&lt;'</span>, <span class="st_h">'\\\&gt;'</span>), <br />
&nbsp; &nbsp; <a href="http://www.php.net/array"><span class="kw3">array</span></a>(<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&lt;'</span> &nbsp; &nbsp; , <span class="st_h">'&gt;'</span> &nbsp; &nbsp; , <span class="st_h">'&amp;'</span>, &nbsp; &nbsp;&nbsp; &nbsp; <span class="co1">// Normal entities</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&lt;'</span> &nbsp; &nbsp; , <span class="st_h">'&gt;'</span> &nbsp; &nbsp; , <span class="st_h">'&amp;'</span>, &nbsp; &nbsp;&nbsp; &nbsp; <span class="co1">// In-string entities.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'\\\&amp;amp;'</span>, <span class="st_h">'\\\&amp;lt;'</span>, <span class="st_h">'\\\&amp;gt;'</span>&nbsp; &nbsp; <span class="co1">// Unescaped entities</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; ), <br />
&nbsp; &nbsp; <span class="re0">$text</span>);</div>
</div>
<p>
There are three sets of items to search &amp; replace here. Two of them are<br />
the basic escaped tag delimiters, so that they&#8217;ll actually result in HTML tags,<br />
and unescaped delimiters, so that you can print the combination itself. The<br />
remaining set is for HTML in string literals. Since the backslash has a<br />
specific meaning there as well, Geshi puts some highlighting stuff around it<br />
that would make the standard search fail. So that whole thing would need<br />
to be searched for and <s>destroyed</s>replaced.
</p>
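<p>
Stripped of the highlighter-specific parts, the idea can be sketched like this. The function name is made up and the in-string set is left out; what the sketch does rely on is that <code>str_replace()</code> with arrays works through its pairs one after another, so the second set can pick up whatever the first set leaves behind from a double backslash.
</p>

```php
<?php
// Hypothetical helper: un-escape "\&lt;"-style sequences in already
// entity-encoded text. The pairs are applied in order, top to bottom.
function unescape_html_delims($html)
{
    return str_replace(
        array(
            '\\&lt;', '\\&gt;', '\\&amp;',  // escaped delimiters -> real HTML
            '\\&',    '\\<',   '\\>'),      // leftovers of a double backslash
        array(
            '<',      '>',     '&',
            '\\&amp;', '\\&lt;', '\\&gt;'), // ... -> the literal combination
        $html);
}

echo unescape_html_delims('\\&lt;b\\&gt;BOLD\\&lt;/b\\&gt;'), "\n";
echo unescape_html_delims('\\\\&lt;b\\\\&gt;BOLD\\\\&lt;/b\\\\&gt;'), "\n";
```

<p>
The order within the second set matters: the lone backslash-ampersand has to be handled before the other two, or the entities they produce would get mangled once more.
</p>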
<p>
It&#8217;s ugly, I know, but it seems to work. It&#8217;d be nicer if this could be done<br />
in the parser itself, but I have a feeling that&#8217;d take changes in multiple<br />
places. Since I don&#8217;t know the code that well yet, I&#8217;m not touching that<br />
one with a ten-foot pole.
</p>
<p>
Lastly, let&#8217;s test the ARM asm highlighter:
</p>
<div class="gccarm">
<div class="gccarm proglist" style=" "><span class="co2">// Regular int</span><br />
<span class="nu0">123</span><br />
<span class="nu0">123l</span><br />
<span class="nu0">123L</span><br />
<span class="nu0">123ll</span><br />
<span class="nu0">123LL</span> &nbsp; <br />
<span class="nu0">123u</span><br />
<span class="nu0">123U</span><br />
+<span class="nu0">123</span><br />
-<span class="nu0">123</span></p>
<p><span class="co2">// Binary</span><br />
<span class="nu0">0b01100110</span><br />
<span class="nu0">0B10101010</span></p>
<p><span class="co2">// Octal</span><br />
<span class="nu0">0123</span></p>
<p><span class="co2">// Hex</span><br />
<span class="nu0">0x12</span><br />
<span class="nu0">0x123</span><br />
0x123.4</p>
<p><span class="co2">// Float</span><br />
<span class="nu0">123.4</span><br />
<span class="nu0">123.4f</span><br />
<span class="nu0">123.4F</span><br />
+<span class="nu0">123.4</span><br />
-<span class="nu0">123.4</span><br />
<span class="nu0">1.2e3</span><br />
<span class="nu0">1.2E3</span><br />
<span class="nu0">1.2e+3</span><br />
<span class="nu0">1.2e-3</span></p>
<p><span class="co2">// Inner</span><br />
(<span class="nu0">1.23</span>)<br />
abc123de</div>
</div>
<p>
Still works too. Bitchin&#8217;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2009/06/new-geshi/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

