Difference between revisions of "Frequently Asked Questions"

From Sniper

Revision as of 07:34, 16 January 2012

Q: Why do the cache statistics in Sniper for a particular application not agree with my HW performance counters?

A: There are two major reasons why the cache numbers might differ from hardware. The first and most important reason is how the cache counters collect data in Sniper. The cache access rates should look comparable to real hardware, but the miss rate might look completely different. This is because overlapping (outstanding) misses count as hits in Sniper, while on real hardware they would count as misses. Internally, the memory subsystem completes each access and gets the result immediately, and uses a queuing model to determine contention. Therefore, an overlapped miss on real hardware would be a hit in Sniper. One way to compare the number of L1-D cache misses seen in Sniper with hardware is to compare that number to the number of L2-D accesses (assuming a private L2-D cache). The number of L2-D cache accesses represents only the non-overlapped L1-D misses, which is how the statistics in Sniper work.
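
The comparison can be sketched in a few lines of Python. This is only an illustration, and it reads the suggestion above as comparing Sniper's L1-D miss count against the L2 access count measured on hardware. The function names and all counter values are hypothetical; they are not Sniper statistic names or real measurements.

```python
def miss_rate(misses, accesses):
    """Return misses per access; rejects a non-positive access count."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return misses / accesses


def compare_l1d(sniper_l1d_misses, sniper_l1d_accesses,
                hw_l2_accesses, hw_l1d_accesses):
    """Compare Sniper's L1-D miss rate with a hardware estimate.

    Assumption: with a private L2, the hardware L2 access count stands in
    for the non-overlapped L1-D misses, which is what Sniper counts.
    Returns (sniper_rate, hw_rate, relative_difference).
    """
    sim = miss_rate(sniper_l1d_misses, sniper_l1d_accesses)
    hw = miss_rate(hw_l2_accesses, hw_l1d_accesses)
    rel = abs(sim - hw) / hw if hw > 0 else float("inf")
    return sim, hw, rel


# Made-up counter values, purely for illustration.
sim, hw, rel = compare_l1d(
    sniper_l1d_misses=50_000, sniper_l1d_accesses=1_000_000,
    hw_l2_accesses=52_000, hw_l1d_accesses=1_000_000,
)
print(f"Sniper: {sim:.4f}  hardware (via L2 accesses): {hw:.4f}  "
      f"relative difference: {rel:.2%}")
```

Comparing against the hardware L1-D miss counter directly would instead include the overlapped misses, which is the mismatch described above.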

Q: Why does the CPI-stack format that I generate with Sniper differ from the SC11 paper results?

A: We have recently updated the CPI stack format to better reflect system resource contention. See our recent IISWC publication for more details on these changes.

Q: Why does the TLB code in Sniper not perform the way that I expect?

A: Sniper is a user-space simulator, and therefore doesn't model all of the hardware/operating-system interactions that one might expect to see. This is because the applications that we are targeting, HPC workloads, tend to see very few TLB misses. As an experiment, we looked into modeling the OS effects of TLB misses, but only from the perspective of OS-handled TLB misses. To use this, set the TLB size to that of the last-level TLB of the architecture you are modeling, and set the miss penalty to hundreds of cycles to account for the OS penalty. Modeling L1 TLBs is possible but is not currently implemented. Modifications to the memory subsystem to report TLB misses as part of the load and store access times would be necessary to get this working properly.
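
As a rough back-of-the-envelope illustration of why a flat penalty of hundreds of cycles matters little when TLB misses are rare, the overhead can be estimated as misses times penalty. This sketch is not how Sniper accounts for TLB misses internally; the function names and the example numbers are assumptions chosen only for illustration.

```python
def tlb_overhead_cycles(tlb_misses, miss_penalty_cycles):
    """Total cycles attributed to TLB misses under a flat per-miss penalty."""
    if tlb_misses < 0 or miss_penalty_cycles < 0:
        raise ValueError("counts and penalties must be non-negative")
    return tlb_misses * miss_penalty_cycles


def tlb_overhead_fraction(tlb_misses, miss_penalty_cycles, total_cycles):
    """Share of total execution cycles spent on TLB misses."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return tlb_overhead_cycles(tlb_misses, miss_penalty_cycles) / total_cycles


# Hypothetical run: 2,000 last-level TLB misses, a 300-cycle OS penalty,
# and one billion simulated cycles in total.
overhead = tlb_overhead_cycles(2_000, 300)
fraction = tlb_overhead_fraction(2_000, 300, 1_000_000_000)
print(f"TLB overhead: {overhead} cycles ({fraction:.4%} of the run)")
```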

Q: What are the license terms for using Sniper?

A: In short, the interval core model is protected under a US patent application. We automatically grant you a free license for using the interval model inside Sniper for academic purposes. For commercial use, please contact Lieven Eeckhout. All other code is licensed under the very liberal MIT license. You can view the full details on our License page.