<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: DMA vs ARM9, round 2 : invalidate considered harmful</title>
	<atom:link href="http://www.coranac.com/2010/03/dma-vs-arm9-round-2/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.coranac.com/2010/03/dma-vs-arm9-round-2/</link>
	<description>my own little world</description>
	<lastBuildDate>Fri, 23 Dec 2011 16:50:19 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
	<item>
		<title>By: sylvainulg</title>
		<link>http://www.coranac.com/2010/03/dma-vs-arm9-round-2/comment-page-1/#comment-5617</link>
		<dc:creator>sylvainulg</dc:creator>
		<pubDate>Fri, 22 Apr 2011 12:12:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.coranac.com/?p=175#comment-5617</guid>
		<description>I see. Looks like I had overlooked this part of GBATEK.</description>
		<content:encoded><![CDATA[<p>I see. Looks like I had overlooked this part of GBATEK.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cearn</title>
		<link>http://www.coranac.com/2010/03/dma-vs-arm9-round-2/comment-page-1/#comment-5609</link>
		<dc:creator>cearn</dc:creator>
		<pubDate>Wed, 20 Apr 2011 12:42:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.coranac.com/?p=175#comment-5609</guid>
		<description>The stack is in DTCM, which is invisible to the DMA controller. This is pretty much why &lt;code&gt;REG_DMA&lt;i&gt;n&lt;/i&gt;FILL&lt;/code&gt; exists, and why you can&#039;t DMA a local array.</description>
		<content:encoded><![CDATA[<p>The stack is in DTCM, which is invisible to the DMA controller. This is pretty much why <code>REG_DMA<i>n</i>FILL</code> exists, and why you can&#8217;t DMA a local array.</p>
]]></content:encoded>
	</item>
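<!--
A minimal sketch of the point made in the comment above: before starting a DMA
transfer, check that the buffer does not sit in DTCM, since the DMA controller
cannot see that memory. The names ranges_overlap and dma_can_see are
hypothetical helpers, not libnds calls. The DTCM base address is passed in as a
parameter because the mapping depends on the setup; only the 16 KB size is
taken as given.

```c
#include <assert.h>
#include <stdint.h>

#define DTCM_SIZE 0x4000u   /* 16 KB of data TCM on the DS ARM9 */

/* Returns nonzero if [addr, addr+size) overlaps [base, base+len).
   Sums are done in 64 bits so they cannot wrap around. */
static int ranges_overlap(uint32_t addr, uint32_t size,
                          uint32_t base, uint32_t len)
{
    uint64_t a_end = (uint64_t)addr + size;
    uint64_t b_end = (uint64_t)base + len;
    return size != 0 && len != 0 && addr < b_end && base < a_end;
}

/* Hypothetical guard: can the DMA controller see this buffer?
   dtcm_base is an assumption supplied by the caller. */
static int dma_can_see(const void *buf, uint32_t size, uint32_t dtcm_base)
{
    assert(size != 0);      /* a zero-length transfer is a caller bug */
    return !ranges_overlap((uint32_t)(uintptr_t)buf, size,
                           dtcm_base, DTCM_SIZE);
}
```

For a local array on a stack that lives in DTCM, dma_can_see would return 0.
That is the cue to go through a buffer in main RAM instead, or to use the fill
register for a constant fill.
-->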
	<item>
		<title>By: sylvainulg</title>
		<link>http://www.coranac.com/2010/03/dma-vs-arm9-round-2/comment-page-1/#comment-5608</link>
		<dc:creator>sylvainulg</dc:creator>
		<pubDate>Wed, 20 Apr 2011 12:19:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.coranac.com/?p=175#comment-5608</guid>
		<description>Just to check, has anyone ever managed to use the stack as the source or destination of a DMA transfer, or is that an awful idea per se?</description>
		<content:encoded><![CDATA[<p>Just to check, has anyone ever managed to use the stack as the source or destination of a DMA transfer, or is that an awful idea per se?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: steve.zhan</title>
		<link>http://www.coranac.com/2010/03/dma-vs-arm9-round-2/comment-page-1/#comment-3289</link>
		<dc:creator>steve.zhan</dc:creator>
		<pubDate>Mon, 05 Jul 2010 17:17:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.coranac.com/?p=175#comment-3289</guid>
		<description>Hi,
 // Assuming cached regions. Add tests for that yourself.
void dmaCopySafish(const void *src, void *dst, u32 size)
{
    DC_FlushRange(src, size);                       // Flush source.
    
    u32 addr= (u32)dst;
    if(addr % CACHE_LINE_SIZE)                      // Check head
        DC_FlushRange((void*)(addr), 1);
        
    if((addr+size) % CACHE_LINE_SIZE)               // Check tail.
        DC_FlushRange((void*)(addr+size), 1);

If another task interrupts the current one at this point and touches memory at the edge of the destination (which is not aligned to a cache line), that line may be cached again. If the CPU then modifies that memory,

    dmaCopy(src, dst, size);                        // Actual copy.
    DC_InvalidateRange(dst, size);                  // Final invalidate.

the modification is lost when the line is invalidated...

}</description>
		<content:encoded><![CDATA[<p>Hi,<br />
  // Assuming cached regions. Add tests for that yourself.<br />
 void dmaCopySafish(const void *src, void *dst, u32 size)<br />
 {<br />
     DC_FlushRange(src, size);                       // Flush source.</p>
<p>     u32 addr= (u32)dst;<br />
     if(addr % CACHE_LINE_SIZE)                      // Check head<br />
         DC_FlushRange((void*)(addr), 1);</p>
<p>     if((addr+size) % CACHE_LINE_SIZE)               // Check tail.<br />
         DC_FlushRange((void*)(addr+size), 1);</p>
<p> If another task interrupts the current one at this point and touches memory at the edge of the destination (which is not aligned to a cache line), that line may be cached again. If the CPU then modifies that memory, </p>
<p>     dmaCopy(src, dst, size);                        // Actual copy.<br />
     DC_InvalidateRange(dst, size);                  // Final invalidate.</p>
<p> the modification is lost when the line is invalidated&#8230;</p>
<p> }</p>
]]></content:encoded>
	</item>
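<!--
The race described in the comment above can be reproduced in a toy model. This
is only a sketch under stated assumptions: a single write-back cache line that
allocates on reads only (believed to match the ARM9 data cache, but treat that
as an assumption), 64 bytes of "main RAM", and a DMA that bypasses the cache.
None of these names are libnds calls; toy_system, run_scenario and the rest are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE 32u

typedef struct {
    uint8_t  ram[64];      /* "main RAM" */
    uint8_t  line[LINE];   /* the one cache line */
    uint32_t tag;          /* base address of the cached line */
    int      valid, dirty;
} toy_system;

typedef enum { IRQ_NONE, IRQ_BEFORE_INVALIDATE, IRQ_AFTER_INVALIDATE } irq_timing;

/* Write the line back if dirty, then drop it. */
static void toy_flush(toy_system *s)
{
    if (s->valid && s->dirty)
        memcpy(&s->ram[s->tag], s->line, LINE);
    s->valid = s->dirty = 0;
}

/* Drop the line without writing it back. */
static void toy_invalidate(toy_system *s) { s->valid = s->dirty = 0; }

/* CPU load: served from the cache, filling the line on a miss. */
static uint8_t cpu_read(toy_system *s, uint32_t addr)
{
    uint32_t base = addr & ~(LINE - 1u);
    assert(addr < sizeof s->ram);
    if (!s->valid || s->tag != base) {
        toy_flush(s);
        memcpy(s->line, &s->ram[base], LINE);
        s->tag = base;
        s->valid = 1;
    }
    return s->line[addr - base];
}

/* CPU store: hits update the cache only, misses go straight to RAM. */
static void cpu_write(toy_system *s, uint32_t addr, uint8_t v)
{
    uint32_t base = addr & ~(LINE - 1u);
    assert(addr < sizeof s->ram);
    if (s->valid && s->tag == base) {
        s->line[addr - base] = v;
        s->dirty = 1;
    } else {
        s->ram[addr] = v;
    }
}

/* The "interrupt": reads the neighbour (caching the line), then sets it to 2. */
static void irq_handler(toy_system *s)
{
    (void)cpu_read(s, 0);
    cpu_write(s, 0, 2);
}

/* Neighbour variable at byte 0, DMA destination at bytes 4..11 of the same
   line. Returns the neighbour value the CPU sees at the end. */
static uint8_t run_scenario(irq_timing when)
{
    toy_system s;
    memset(&s, 0, sizeof s);

    (void)cpu_read(&s, 0);
    cpu_write(&s, 0, 1);                  /* neighbour = 1, dirty in cache */
    toy_flush(&s);                        /* head flush: 1 reaches RAM */

    if (when == IRQ_BEFORE_INVALIDATE)
        irq_handler(&s);                  /* neighbour = 2, cache only */

    for (uint32_t i = 4; i < 12; i++)     /* DMA writes RAM directly */
        s.ram[i] = 0xAA;
    toy_invalidate(&s);                   /* final invalidate */

    if (when == IRQ_AFTER_INVALIDATE)
        irq_handler(&s);

    return cpu_read(&s, 0);
}
```

In this model the handler's update never reaches RAM in the
IRQ_BEFORE_INVALIDATE case, so the CPU reads back the old value. Aligning the
destination to whole cache lines removes the shared line altogether; keeping
interrupts masked from the boundary flush until after the invalidate may also
work, if the interrupt design allows it.
-->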
	<item>
		<title>By: cache link &#171; Corey&#39;s Journal</title>
		<link>http://www.coranac.com/2010/03/dma-vs-arm9-round-2/comment-page-1/#comment-3099</link>
		<dc:creator>cache link &#171; Corey&#39;s Journal</dc:creator>
		<pubDate>Fri, 23 Apr 2010 06:19:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.coranac.com/?p=175#comment-3099</guid>
		<description>[...] Cearn&#8217;s interesting discussion about ds cache and DMA [...]</description>
		<content:encoded><![CDATA[<p>[...] Cearn&#8217;s interesting discussion about ds cache and DMA [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sylvainulg</title>
		<link>http://www.coranac.com/2010/03/dma-vs-arm9-round-2/comment-page-1/#comment-3029</link>
		<dc:creator>sylvainulg</dc:creator>
		<pubDate>Wed, 31 Mar 2010 06:42:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.coranac.com/?p=175#comment-3029</guid>
		<description>&lt;q&gt;According to gbatek, it&#039;s this:&lt;/q&gt;
Ooh, yes, of course. Row-preload latency and all those RAS/CAS things. I tend to overlook them. And that basically explains the role of the &quot;cache line&quot; size mentioned above.

Thanks for the time you invest in talking about the issue. It&#039;s been a while since I found someone to have a DMA-related discussion with ;)</description>
		<content:encoded><![CDATA[<p><q>According to gbatek, it&#8217;s this:</q><br />
 Ooh, yes, of course. Row-preload latency and all those RAS/CAS things. I tend to overlook them. And that basically explains the role of the &#8220;cache line&#8221; size mentioned above.</p>
<p> Thanks for the time you invest in talking about the issue. It&#8217;s been a while since I found someone to have a DMA-related discussion with ;)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cearn</title>
		<link>http://www.coranac.com/2010/03/dma-vs-arm9-round-2/comment-page-1/#comment-3026</link>
		<dc:creator>cearn</dc:creator>
		<pubDate>Tue, 30 Mar 2010 14:08:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.coranac.com/?p=175#comment-3026</guid>
		<description>&lt;blockquote&gt;When Flush(destination) is believed to take too much time, we could replace it with
[code]
if (unaligned(dst.start)) 
   flush(dst.start &amp; alignment, ( dst.start &amp; alignment )+cache_line);
if (unaligned(dst.end))
   flush(dst.end &amp; alignment, ( dst.end &amp; alignment) + cache_line);
flush(src)
dma_copy(...)
invalidate(dst)
[/code]
couldn&#039;t we ?
&lt;/blockquote&gt;Argh, I forgot about this option. Yes, this is also possible. And probably the best solution, although it&#039;s a little awkward to write it out. I think it&#039;d be something like this:
[code lang=&quot;cpp&quot;]#define CACHE_LINE_SIZE	32

// Assuming cached regions. Add tests for that yourself.
void dmaCopySafish(const void *src, void *dst, u32 size)
{
	DC_FlushRange(src, size);          				// Flush source.
	
	u32 addr= (u32)dst;
	if(addr % CACHE_LINE_SIZE)						// Check head
		DC_FlushRange((void*)(addr), 1);
		
	if((addr+size) % CACHE_LINE_SIZE)				// Check tail.
		DC_FlushRange((void*)(addr+size), 1);

	dmaCopy(src, dst, size);						// Actual copy.
	DC_InvalidateRange(dst, size);					// Final invalidate.
}
[/code]

&lt;blockquote&gt;
Btw, I&#039;d suspect that RAM-to-RAM DMA copies being slower than RAM-to-VRAM could be due to the fact that RAM and VRAM are actually using distinct buses. When moving data through separate buses, reads and writes don&#039;t have to fight for bus bandwidth and can happen in parallel.
&lt;/blockquote&gt;

According to &lt;a href=&quot;http://nocash.emubase.de/gbatek.htm#dsdmatransfers&quot; rel=&quot;nofollow&quot;&gt;gbatek&lt;/a&gt;, it&#039;s this:

&lt;blockquote&gt;
&lt;b&gt;NDS Sequential Main Memory DMA&lt;/b&gt;&lt;br&gt;
Main RAM has different access time for sequential and non-sequential access. Normally DMA uses sequential access (except for the first word), however, if the source and destination addresses are both in Main RAM, then all accesses become non-sequential. In that case it would be faster to use two DMA transfers, one from Main RAM to a scratch buffer in WRAM, and one from WRAM to Main RAM.
&lt;/blockquote&gt;

&lt;br&gt;&lt;p&gt;
I just noticed this thread, where simonjhall warns against exactly this type of behaviour: &lt;a href=&quot;http://forum.gbadev.org/viewtopic.php?t=15294&quot; rel=&quot;nofollow&quot;&gt;http://forum.gbadev.org/viewtopic.php?t=15294&lt;/a&gt;.
&lt;/p&gt;</description>
		<content:encoded><![CDATA[<blockquote><p>When Flush(destination) is believed to take too much time, we could replace it with</p>
<div class="none">
<div class="none proglist" style=" ">if (unaligned(dst.start)) <br /> &nbsp; &nbsp;flush(dst.start &amp; alignment, ( dst.start &amp; alignment )+cache_line);<br /> if (unaligned(dst.end))<br /> &nbsp; &nbsp;flush(dst.end &amp; alignment, ( dst.end &amp; alignment) + cache_line);<br /> flush(src)<br /> dma_copy(&#8230;)<br /> invalidate(dst)</div>
</div>
<p> couldn&#8217;t we ?
 </p></blockquote>
<p>Argh, I forgot about this option. Yes, this is also possible. And probably the best solution, although it&#8217;s a little awkward to write it out. I think it&#8217;d be something like this:</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="kw1">#define</span> CACHE_LINE_SIZE <span class="nu0">32</span><br />
<br /> <span class="co1">// Assuming cached regions. Add tests for that yourself.</span><br /> <span class="kw1">void</span> dmaCopySafish(<span class="kw1">const</span> <span class="kw1">void</span> *src, <span class="kw1">void</span> *dst, u32 size)<br /> {<br /> &nbsp; &nbsp; DC_FlushRange(src, size); &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Flush source.</span><br /> &nbsp; &nbsp; <br /> &nbsp; &nbsp; u32 addr= (u32)dst;<br /> &nbsp; &nbsp; <span class="kw1">if</span>(addr % CACHE_LINE_SIZE)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Check head</span><br /> &nbsp; &nbsp; &nbsp; &nbsp; DC_FlushRange((<span class="kw1">void</span>*)(addr), <span class="nu0">1</span>);<br /> &nbsp; &nbsp; &nbsp; &nbsp; <br /> &nbsp; &nbsp; <span class="kw1">if</span>((addr+size) % CACHE_LINE_SIZE) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Check tail.</span><br /> &nbsp; &nbsp; &nbsp; &nbsp; DC_FlushRange((<span class="kw1">void</span>*)(addr+size), <span class="nu0">1</span>);<br />
<br /> &nbsp; &nbsp; dmaCopy(src, dst, size);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Actual copy.</span><br /> &nbsp; &nbsp; DC_InvalidateRange(dst, size);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Final invalidate.</span><br /> }</div>
</div>
<blockquote><p>
 Btw, I&#8217;d suspect that RAM-to-RAM DMA copies being slower than RAM-to-VRAM could be due to the fact that RAM and VRAM are actually using distinct buses. When moving data through separate buses, reads and writes don&#8217;t have to fight for bus bandwidth and can happen in parallel.
 </p></blockquote>
<p> According to <a href="http://nocash.emubase.de/gbatek.htm#dsdmatransfers" rel="nofollow">gbatek</a>, it&#8217;s this:</p>
<blockquote><p>
 <b>NDS Sequential Main Memory DMA</b><br />
 Main RAM has different access time for sequential and non-sequential access. Normally DMA uses sequential access (except for the first word), however, if the source and destination addresses are both in Main RAM, then all accesses become non-sequential. In that case it would be faster to use two DMA transfers, one from Main RAM to a scratch buffer in WRAM, and one from WRAM to Main RAM.
 </p></blockquote>
<p>
 I just noticed this thread, where simonjhall warns against exactly this type of behaviour: <a href="http://forum.gbadev.org/viewtopic.php?t=15294" rel="nofollow">http://forum.gbadev.org/viewtopic.php?t=15294</a>.
 </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sylvainulg</title>
		<link>http://www.coranac.com/2010/03/dma-vs-arm9-round-2/comment-page-1/#comment-3023</link>
		<dc:creator>sylvainulg</dc:creator>
		<pubDate>Mon, 29 Mar 2010 11:26:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.coranac.com/?p=175#comment-3023</guid>
		<description>When Flush(destination) is believed to take too much time, we could replace it with
&lt;code&gt;
if (unaligned(dst.start)) 
   flush(dst.start &amp; alignment, ( dst.start &amp; alignment )+cache_line);
if (unaligned(dst.end))
   flush(dst.end &amp; alignment, ( dst.end &amp; alignment) + cache_line);
flush(src)
dma_copy(...)
invalidate(dst)
&lt;/code&gt;

couldn&#039;t we ?

Btw, I&#039;d suspect that RAM-to-RAM DMA copies being slower than RAM-to-VRAM could be due to the fact that RAM and VRAM are actually using distinct buses. When moving data through separate buses, reads and writes don&#039;t have to fight for bus bandwidth and can happen in parallel.</description>
		<content:encoded><![CDATA[<p>When Flush(destination) is believed to take too much time, we could replace it with<br />
 <code><br />
 if (unaligned(dst.start))<br />
    flush(dst.start &amp; alignment, ( dst.start &amp; alignment )+cache_line);<br />
 if (unaligned(dst.end))<br />
    flush(dst.end &amp; alignment, ( dst.end &amp; alignment) + cache_line);<br />
 flush(src)<br />
 dma_copy(...)<br />
 invalidate(dst)<br />
 </code></p>
<p> couldn&#8217;t we ?</p>
<p> Btw, I&#8217;d suspect that RAM-to-RAM DMA copies being slower than RAM-to-VRAM could be due to the fact that RAM and VRAM are actually using distinct buses. When moving data through separate buses, reads and writes don&#8217;t have to fight for bus bandwidth and can happen in parallel.</p>
]]></content:encoded>
	</item>
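<!--
The pseudocode in the comment above boils down to finding the cache lines that
the destination only partly covers. A minimal sketch of that bookkeeping,
assuming the 32-byte line size used elsewhere in this thread; boundary_lines
and find_boundary_lines are hypothetical names, and the actual flush call is
left to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u   /* line size assumed in this thread */

/* Round an address down to the start of its cache line. */
static uint32_t line_base(uint32_t addr)
{
    return addr & ~(CACHE_LINE_SIZE - 1u);
}

/* Which boundary lines of a destination range need a flush. */
typedef struct {
    int      flush_head;   /* nonzero if dst starts in the middle of a line */
    int      flush_tail;   /* nonzero if dst ends in the middle of a line */
    uint32_t head_line;    /* base of the line holding the first byte */
    uint32_t tail_line;    /* base of the line holding the end address */
} boundary_lines;

static boundary_lines find_boundary_lines(uint32_t dst, uint32_t size)
{
    boundary_lines b;
    uint32_t end = dst + size;            /* one past the last byte */

    assert(size > 0);
    b.head_line  = line_base(dst);
    b.tail_line  = line_base(end);
    b.flush_head = (dst % CACHE_LINE_SIZE) != 0;
    b.flush_tail = (end % CACHE_LINE_SIZE) != 0;

    /* A short range can start and end inside one line: flush it once. */
    if (b.flush_head && b.flush_tail && b.head_line == b.tail_line)
        b.flush_tail = 0;
    return b;
}
```

A fully aligned destination needs no boundary flush at all, so the source
flush and the final invalidate are the only cache operations left. Anything
else costs at most two single-line flushes, however large the transfer is.
-->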
	<item>
		<title>By: Coranac » DMA vs ARM9, round 2 : invalidate considered harmful Zero Me</title>
		<link>http://www.coranac.com/2010/03/dma-vs-arm9-round-2/comment-page-1/#comment-3022</link>
		<dc:creator>Coranac » DMA vs ARM9, round 2 : invalidate considered harmful Zero Me</dc:creator>
		<pubDate>Sun, 28 Mar 2010 23:00:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.coranac.com/?p=175#comment-3022</guid>
		<description>[...] more:  Coranac » DMA vs ARM9, round 2 : invalidate considered harmful          By admin &#124; category: code zero &#124; tags: code zero, definitely-need, dma, invalidate, [...]</description>
		<content:encoded><![CDATA[<p>[...] more:  Coranac » DMA vs ARM9, round 2 : invalidate considered harmful          By admin | category: code zero | tags: code zero, definitely-need, dma, invalidate, [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

