Release Notes

Important information about the NVIDIA® Nsight™ Visual Studio Edition 3.1 release

Display Driver

You must install the NVIDIA display driver that supports the NVIDIA Nsight tools. If you have an NVIDIA graphics card installed on your target machine, you likely already have an NVIDIA display driver; however, NVIDIA Nsight requires a specific version of the driver in order to function properly. From the NVIDIA web site, download and install the following display driver (or newer):

Release 319 or newer

See below for more release information about:

CUDA Debugger

Graphics Debugger

Analysis Tools

CUDA Debugger

New in the 3.1 Release

  1. Debugging in Visual Studio 2012 is now supported.
  2. Microsoft Windows 8 operating systems are now supported.
  3. The CUDA 5.5 Toolkit is now supported.

Changed Features and Fixed Issues in the 3.1 Release of the CUDA Debugger

  1. Windows Vista is no longer supported. (20897)
  2. The CUDA Debugger now shows the SASS registers view by default if no registers view was previously selected. (22379)
  3. With the CUDA 5.5 Toolkit and higher, all __shared__ variables declared at function scope will no longer display in the Visual Studio Locals window. (15122)
  4. With the CUDA 5.5 Toolkit and higher, if execution is paused on a line of code that contains a conditional statement (such as an if or while statement) and you Step Over that statement, the CUDA Debugger now pauses at the next line of code after the conditional statement (after the closing curly brace). (5199)

Known Issues in the 3.1 Release of the CUDA Debugger

  1. Firewall and anti-intrusion software (e.g., McAfee Host Intrusion Prevention) will not allow remote debugger connections. Please disable such software or add an exclusion for Nsight Monitor. (22804)
  2. Fermi devices with an attached display can hang when stopping debugging while at a breakpoint. This occurs in hardware mode debugging. Please use preemption mode debugging, or switch to a GPU without an attached display. (18778)
  3. In some cases, when a CUDA application is built with the "Generate Relocatable Device Code" option and a CUDA kernel function is declared with both the __global__ and static qualifiers, the NVIDIA Nsight debugger might not be able to display local variables inside that function. To work around this issue, remove the static qualifier from the function. (21914)
  4. On Tesla architecture GPUs, warps frozen at an exception or inline breakpoint may re-report the event when the debugger suspends for other reasons. (This has been fixed on Fermi and Kepler GPUs.) (16327)
  5. You must enable Memory Checker before launching a process, and cannot change the setting while debugging. (18935, 18937)
  6. When the CUDA Debugger is used to debug CUDA applications that share resources with DirectX 9 (such as the "simpleD3D9" sample program), the debugger may display incorrect values for memory locations in those shared resources. This may happen when the GPU executing the application has Compute Capability 2.0 or higher. Incorrect memory contents may be displayed in any debug window (Autos, Locals, Watch, Warp Watch, or Memory). This issue does not affect applications using Direct3D 11. (13899)
  7. On Fermi GPUs, the CUDA Debugger cannot resume normally after an MMU fault, although the application can still be terminated. (13808)
  8. When using the CUDA Debugger with NVIDIA Nsight, breakpoints will not be hit in source files whose full paths contain non-ASCII characters. Any path containing a character with a code >= 128 is affected. (11429)
  9. If you experience hangs or TDRs while locally debugging CUDA on a single GPU (or when using the Software Preemption debugging mode in general), try disabling operating system features that use video hardware acceleration. For example, disable Aero on Windows 7, switch to a high-contrast desktop theme on Windows 8, or disable WPF acceleration.
  10. Variables do not appear for source code that is not executed. This occurs because the compiler optimizes aggressively even when you have not specified any compiler optimizations, and removes code that will never be executed from the output executable.
  11. Breakpoints will be hit multiple times on lines that contain more than one inline function call. For example, setting a breakpoint on:
    x = cos(a) + sin(b);
    will generate three breakpoints on that line: one for the evaluation of the expression, plus one for each function call on the line.
  12. On Tesla architecture GPUs (e.g., GeForce GTX 260, Tesla® C1060), hitting a breakpoint may prevent hardware error detection. It is possible that errors can be masked in the following situation:
    In this situation, the launch would succeed, but if run without breakpoints, a launch failure would occur.
    The above is just an example. It is possible that there are other ways for this to occur.
  13. Unloading a module does not refresh the state of breakpoints set in that module, so Visual Studio does not show the latest state of those breakpoints after the module has been unloaded.
  14. The Visual Studio Breakpoint "Filter" option is not supported for CUDA GPU breakpoints.
  15. The Visual Studio Breakpoint "Hitcount" option is not supported for CUDA GPU breakpoints.
  16. When starting Graphics or CUDA Debugging in Visual Studio, the user cannot specify environment variables to be used in the environment block of the launched process.
  17. The F5 hotkey (which is the default hotkey in Visual Studio for starting the CPU Debugger) does not start the CUDA Debugger.
    To start the CUDA Debugger, you must either change the key bindings or use the menu command:
    Nsight > Start CUDA Debugging.
  18. There is no support for automatically performing a Build when launching the CUDA Debugger.
  19. The Load Symbols option, or "Symbols settings," in the Modules view is not supported for CUDA debugging.
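The path limitation in item 8 above can be checked before a debugging session. The following is a minimal sketch, assuming source paths are available as UTF-8 encoded narrow strings; has_non_ascii is a hypothetical helper, not part of NVIDIA Nsight. It flags any path containing a byte with a value of 128 or higher, which every UTF-8 encoded non-ASCII character produces.

```cpp
#include <string>

// Hypothetical helper (not part of NVIDIA Nsight): returns true if any byte
// of `path` has a value >= 128, i.e. the path is not pure 7-bit ASCII.
// Assumes the path is a UTF-8 encoded narrow string.
bool has_non_ascii(const std::string& path) {
    for (unsigned char c : path) {
        if (c >= 128) {
            return true;  // found a byte outside the ASCII range
        }
    }
    return false;
}
```

For example, has_non_ascii("C:\\Projects\\kernel.cu") returns false, while a path containing a character such as "é" returns true.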

Graphics Inspector and Graphics Debugger

New in the 3.1 Release of the Graphics Debugger

  1. Debugging in Visual Studio 2012 is now supported.
  2. Microsoft Windows 8 operating systems are now supported.
  3. Direct3D 11.1 is now supported. (21025) 
  4. The Direct State Access (DSA) feature of OpenGL is now supported. (19981)
  5. Version 46 of the Microsoft HLSL compiler, which exports PDB-based debug information, is now supported. (9456)
  6. OpenGL scissor state is now displayed on the host. (22621) 
  7. Frame Debugging for bindless graphics is now supported. (22386)
  8. Improvements to the OpenGL Frame Debugger's event view. With this, you now can:

Changed Features and Fixed Issues in the 3.1 Release of the Graphics Debugger

  1. Direct3D 10 applications are no longer supported. (20673) 
  2. Windows Vista is no longer supported. (20897)
  3. Local shader debugging performance has been improved. (11405)
  4. CUDA interop memory updates are now captured when rendering a frame; this allows data modifications to a buffer to be viewed in the associated OpenGL resource. (20160)
  5. GL_LUMINANCE and GL_LUMINANCE_ALPHA formats are now supported. (23100) 
  6. Fixed some cases in which the geometry preview was not displayed; situations that are not supported (such as DrawAuto) are now properly reported.
  7. Improved handling of out-of-memory situations when profiling or saving a capture to disk.
  8. Performance improvements to the profiler, reducing the number of passes needed to calculate the bottleneck and utilization results.

Known Issues in the 3.1 Release of the Graphics Debugger

  1. Saving and loading Frame Profiler data for Direct3D applications has been disabled in this build due to a bug found late in the release cycle. The bug will be fixed and the feature re-enabled in a future release. (24313)
  2. If you experience a TDR on either your host or target system, it is recommended that you reboot your machine before trying to restart the debugging session. (22986)
  3. When performing local GPU debugging, there is a chance that the GPU will be paused long enough for the operating system's Timeout Detection and Recovery (TDR) mechanism to be triggered. The Nsight Monitor has settings in the General page to allow you to change the delay time (we recommend 30 seconds) or disable TDR completely. (22733)
  4. NVIDIA Nsight may not work correctly unless the application enables double buffering, since GLUT and other implementations may skip the SwapBuffers calls for a single-buffered window. (24590)
  5. Debugging HLSL Effects (anything compiled with an fx_N_M target) is not supported; only pure HLSL shaders can be debugged. (24891)
  6. If you encounter rendering issues when running a serialized capture, the cause could be that NVIDIA Nsight saves the debug-compiled shaders. This is done so that both dynamic shader editing and shader debugging function properly. However, the shaders generated by the Direct3D compiler in debug mode may contain bugs. To update to the latest compiler/runtime, you can run on Windows 8, or update your Windows 7 system to SP1 and install the platform update from http://support.microsoft.com/kb/2670838.
  7. Firewall and anti-intrusion software (e.g., McAfee Host Intrusion Prevention) will not allow remote debugger connections. Please disable such software or add an exclusion for Nsight Monitor. (22804)
  8. The following features are not available on GPUs other than Fermi and Kepler:
  9. The Pixel value in the tables for the OpenGL profiler represents shaded pixels (i.e., fragments that ran the fragment shader). If color writes are disabled and the fragment shader doesn't write Z, this value may be 0, even though depth values may still be written for the fragments. (22061)
  10. Due to limitations of the HLSL and GLSL compilers, debugging shaders that were concatenated from different source files using the #line directive to refer back to the original sources may not work as expected. (22067)
  11. The frame scrubber does not support the nvtxRangeBegin and nvtxRangeEnd functions, only nvtxRangePush and nvtxRangePop. (22163)
  12. Managed applications built with the AnyCpu configuration are not supported. The target application must be built using either the Win32 or x64 configurations.
  13. NVIDIA Nsight for Graphics uses a setting in the driver that enables instrumentation. If the application does not close cleanly, this setting can remain enabled, which can cause some additional CPU overhead. The Nsight Monitor has an option to disable this, in case you see any issues running your application outside of NVIDIA Nsight. (16936, 16962)
  14. VS tracepoints, enabled by the "When Hit" breakpoint option, do not work as expected when the "Continue Execution" option is also set. Only the first hit will be reported and the target application will hang afterwards. You will need to select Debug > Stop Debugging to resume the debug target. (10904)
  15. Source code syntax highlighting and the population of the Autos window with your program's variables rely on a mapping from file extensions to programming languages. If these features are not working, add the extension of your HLSL source files to the list in the Tools > Options dialog under Text Editor > File Extension, and associate it with Microsoft Visual C++. (12094)
  16. The Graphics Debugger's Autos window may not show all variables as expected. This may happen if a shader is compiled using preprocessor macros to conditionally include or exclude code lines, and those macro definitions may only be available at shader compile time. (12094)
  17. Forcing the target application to close through the task manager while in the frame debugger may crash the target application.
DirectX Known Issues
  1. If you pass the D3D11_MAP_FLAG_DO_NOT_WAIT flag to a Map call on a Direct3D 11 device context, the operation may not have finished yet, in which case Map returns 0x887A000A (DXGI_ERROR_WAS_STILL_DRAWING). This can happen when the capture is trying to restore a buffer to its frame-start state and the buffer is mapped early in the frame. Removing the D3D11_MAP_FLAG_DO_NOT_WAIT flag should resolve this. (24846)
  2. There are two DirectX shader compiler bugs that may cause incorrect stepping behavior.
    1. The DX shader compiler will map "end-of-block" instructions to the beginning line number of the block in the HLSL source.
    2. The DX shader compiler will map "implicit" returns to the beginning line number of the shader.

    These issues can be resolved by always adding an explicit return at the end of your shader. (14656)

  3. In some cases, very small vertex buffers cannot be retrieved from the GPU, so the Vertices3D view in the Graphics Focus Picker may not display the correct input vertices. (14192)
  4. If a pipeline stage does not have an object bound, then the related state is not displayed on the host. For example, if there is no pixel shader bound, then no Shader Resource Views will be shown in the Pixel Shader page. (15394)
  5. HLSL code cannot contain any non-ASCII characters. Any character with a character code >= 128 is not supported. (14760)
  6. Applications that intercept DirectX devices or objects by use of a shim object are not supported. This interferes with an internal mechanism and therefore cannot be handled properly. (14470)
  7. Debugging while running with Stereoscopic 3D is not supported. This will be fixed in a future version. In the meantime, please disable Stereoscopic 3D when debugging your application with NVIDIA Nsight. (12618)
  8. NVIDIA Nsight is incompatible with the debug runtime in all versions of Direct3D. While it may sometimes work, there are known incompatibilities that we are unable to support at this time.
  9. The Graphics Debugger does not support the Reference Rasterizer (RefRast) tool, which is the CPU rasterizer provided by Microsoft. The Graphics Debugger will signal an error if the IDXGIFactory::CreateSoftwareAdapter function is used for device creation.
  10. You may not be able to see all local or global variables in the Watch window. This can be due to optimizations performed by the HLSL compiler.
  11. You may not be able to set a breakpoint on certain lines of source code. This can be due to optimizations performed by the HLSL compiler.
  12. Expression evaluation and breakpoint conditions do not support HLSL built-in functions and vector and matrix expressions.
OpenGL Known Issues
  1. OpenGL compute shader debugging is not supported.
  2. Debugging of shaders in applications that use more than one OpenGL context may not work under some circumstances.
  3. If you see a significant drop in frame rate when running NVIDIA Nsight, it may be due to shader compilation optimizations that are disabled in the driver. This driver bug will be fixed in a future driver version; it may also cause the profiler to show the shader unit as more of a bottleneck than it truly is. (23163)
  4. Some debugger windows, such as the Shaders List or Focus Picker, use Direct3D shader type names instead of GLSL shader type names (e.g., "hull shader" rather than "tessellation control shader").
  5. There are times when the GUI may not refresh completely when debugging OpenGL programs. Please force the window to refresh by minimizing and restoring the window. (19869)
  6. If you are connected to your target using VNC and attempting to debug an OpenGL program, make sure to disable any "Hook" or "Mirror" driver option in the VNC server settings. (20686)
  7. We suggest that you disable any breakpoints in CUDA code before entering the Frame Debugger (i.e., Pause and Capture Frame). In some cases, hitting a breakpoint will cause the Frame Debugger to become unresponsive. (20721)
  8. GLSL Shader debugging can be unstable when running with multiple displays. Please run with a single monitor when debugging GLSL shaders. (21402)
  9. Visualization of Depth-Stencil formats in Visual Studio is limited in the OpenGL Graphics Debugger. The depth part of a DS format is displayed, but not the stencil. Note that this is not a limitation on the target application when running with NVIDIA Nsight. You can switch between Render Target, Depth, or Stencil through the HUD toolbar while the target application is being debugged.
C++ AMP Debugger Known Issues
  1. C++ AMP debugging is only supported on Fermi and Kepler GPUs.
  2. The Break for every thread (like CPU behavior) option is not supported.
  3. Attaching to an already running process may crash the process with drivers prior to R319. (17633)
  4. Editing variable values may not work for variables that are neither arrays nor declared with the tile_static modifier.

Analysis Tools

New in the 3.1 Release

  1. Debugging in Visual Studio 2012 is now supported.
  2. Microsoft Windows 8 operating systems are now supported.
  3. Direct3D 11.1 is now supported. (21025) 
  4. The CUDA 5.5 Toolkit is now supported.
  5. The Direct State Access (DSA) feature of OpenGL is now supported. (19981)
  6. Improved performance when tracing DirectX or OpenGL applications.
  7. Added the new SM pipeline utilization experiment.

Changed Features and Fixed Issues in the 3.1 Release of the Analysis Tools

  1. Direct3D 10 applications are no longer supported. (20673) 
  2. Windows Vista is no longer supported. (20897)
  3. The NvPmApi structure definition in NvPmApi.h has been updated and is not binary compatible with the previous version. A new size field has been added to improve structure versioning. (12997)
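Item 3 refers to size-based structure versioning. The general pattern is that the caller records the size of the structure it was compiled against, and the library uses that size to decide how many members it may safely write. The sketch below illustrates the pattern with hypothetical ExampleApiV1/ExampleApiV2 structures and a hypothetical fill_example_api function; it does not reproduce the real NvPmApi.h definitions, whose layout should be taken from the shipped header.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical structures illustrating size-based versioning.
// These are NOT the real NvPmApi.h definitions.
struct ExampleApiV1 {
    std::uint32_t size;   // set by the caller to sizeof(the struct it allocated)
    std::uint32_t alpha;
};

struct ExampleApiV2 {
    std::uint32_t size;
    std::uint32_t alpha;
    std::uint32_t beta;   // appended in version 2; earlier offsets are unchanged
};

// Library side: reads the caller's size field and copies in only as many
// bytes of the library's current structure as the caller's struct can hold.
// Returns false if the size is too small to be a known version.
bool fill_example_api(void* table) {
    std::uint32_t caller_size = 0;
    std::memcpy(&caller_size, table, sizeof caller_size);
    if (caller_size < sizeof(ExampleApiV1)) {
        return false;  // unknown or corrupt size: refuse rather than overrun
    }

    const ExampleApiV2 current{static_cast<std::uint32_t>(sizeof(ExampleApiV2)), 1, 2};
    const std::size_t n = caller_size < sizeof current ? caller_size : sizeof current;
    std::memcpy(table, &current, n);

    // Restore the caller's size so it still describes the caller's struct.
    std::memcpy(table, &caller_size, sizeof caller_size);
    return true;
}
```

An older client that allocates ExampleApiV1 receives alpha but is never written past its own 8 bytes, while a newer client allocating ExampleApiV2 also receives beta. Without a size field, the library would have no way to tell the two layouts apart, which is why adding one changes the binary layout once but makes later extensions safer.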

Known Issues in the 3.1 Release of the Analysis Tools

Analysis Activity Known Issues
  1. Tracing the following APIs is not supported in managed processes:
    • NVTX
    • OpenCL
    • Direct3D
    • OpenGL
    Launching a managed .exe for tracing with any of these APIs enabled will result in an "Access Denied" pop-up message, and the analysis session will not start.
    In Trace Process Tree mode, instrumentation for tracing these APIs can only propagate to native child processes. If a managed child process is launched, neither it nor any child process it launches (managed or native) can be instrumented by NVIDIA Nsight. The analysis session will continue unaffected and the user will not be notified of the problem, but the report will not contain data from managed processes and their children.
    System and CUDA tracing are fully supported in managed processes, and in Trace Process Tree mode, tracing support propagates to all child processes (native or managed).
    Managed processes are fully supported in the Profile CUDA modes.
  2. The stop-collection timer is implemented in Visual Studio, so the latency of communicating with the Nsight Monitor and the application can make the collection last longer than requested.
  3. CPU Thread Trace
    If the Windows Kernel Event Provider is already in use when a new capture session is launched, the collected data may produce unexpected results. For best results, ensure that no other kernel providers are running during an analysis session.
  4. CUDA Trace
    • CUDA trace does not show implicit memory transfers for graphics interop.
    • CUDA Runtime API trace does not capture the <<< >>> kernel launch syntax. Instead, the corresponding CUDA Runtime API calls are reported. Some of the CUDA Driver API calls that are executed by the CUDA Runtime may report errors, such as CUDA_ERROR_INVALID_CONTEXT, even though the usage of the CUDA Runtime API is valid. (6745)
    • When collecting trace information about CUDA kernels and memory transfers, sometimes the report file will not contain complete information about the kernels and memory transfers. This happens because retrieving the data interferes with the application and affects performance, so the tool only does it after these events: 
      • a call to cuCtxSynchronize()/cudaDeviceSynchronize(),
      • a call to cuCtxDestroy()/cudaDeviceReset(),
      • a call to cuStreamDestroy()/cudaStreamDestroy(),
      • the application launches enough kernels or memory transfers to fill up NVIDIA Nsight's buffer, so NVIDIA Nsight forces a context synchronize in order to retrieve the data.

      If your capture appears to be missing some or all kernel launch or memory transfer events, either force the data to flush by adding a call to cuCtxSynchronize()/cudaDeviceSynchronize() after all the CUDA work is finished, or, for an application that continuously launches kernels and memcpys, capture for longer so that enough data is generated to trigger NVIDIA Nsight's full-buffer flush. (4812)
  5. CUDA Profiler
    • On Tesla GPUs, branch counters include __syncthreads().
    • Profile Trigger increments by 1 per warp, not by 1 per active thread.
  6. OpenCL
    • The end timestamp can sometimes be recorded significantly after the completion of a command. If this occurs, adding a clFlush call after the affected command will fix the timestamp.
    • The start/end range for memory read and write commands includes both host and device time, whereas the CUDA start/end range includes only device time.
    • Viewing OpenCL Source or Binary code from the OpenCL Programming Builds or OpenCL Program Summary creates a temporary file in %TMP%. The temporary file is not deleted when the file is closed.
    • OpenCL reports occasionally do not contain device commands. This can occur if the OpenCL context/queue is not released or if fewer than 512 events occurred during a capture.
  7. DirectX/OpenGL Trace
    • Graphics workload information, such as draw calls and dispatches, is output in groups of 16384 workload events. As a consequence, a report will not contain any graphics workload information if too few draw calls occurred during a capture. Increasing the capture duration helps to work around this limitation.
    • Some applications, such as Chrome, run in a sandbox environment. The effects of such a sandbox on NVIDIA Nsight are hard to predict, so if you have trouble, read the documentation for the target application and disable any sandbox when possible. For Chrome, the applicable launch flag is -no-sandbox. (16426)
    • Running analysis for DirectX applications on a multi-GPU system may hang, and running frame timings for DirectX applications on a multi-GPU system may time out while waiting for the results. One possible solution is to connect the monitor to the other GPU. Failing that, run the analysis with only one GPU installed in the system.
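Items 4, 6, and 7 above appear to describe one underlying behavior: trace events are buffered and reach the report only when a buffer fills or, for CUDA, when the application reaches a synchronization point. The class below is a hypothetical model of that behavior, not NVIDIA Nsight code; the class name, the capacity parameter, and the assumption that the graphics case never receives an explicit flush are illustrative choices.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model (not NVIDIA Nsight code) of a trace collector that
// buffers events and writes them to the report only when the buffer fills
// or when the application reaches a synchronization point.
// Assumes capacity > 0.
class BufferedTrace {
public:
    explicit BufferedTrace(std::size_t capacity) : capacity_(capacity) {}

    // Called for every traced event (kernel launch, memcpy, draw call, ...).
    void record(int event_id) {
        pending_.push_back(event_id);
        if (pending_.size() == capacity_) {
            flush();  // buffer full: forced flush
        }
    }

    // Models a sync point such as cudaDeviceSynchronize() or cudaDeviceReset().
    void synchronize() { flush(); }

    // Events that would appear in the report if the capture ended now.
    const std::vector<int>& report() const { return report_; }

private:
    void flush() {
        report_.insert(report_.end(), pending_.begin(), pending_.end());
        pending_.clear();
    }

    std::size_t capacity_;
    std::vector<int> pending_;  // collected but not yet retrieved
    std::vector<int> report_;   // retrieved and written to the report
};
```

In this model, a collector with a capacity of 16384 that is never synchronized produces an empty report for a capture containing 10000 draw calls, which matches item 7; calling synchronize() after the CUDA work corresponds to the cuCtxSynchronize()/cudaDeviceSynchronize() workaround in item 4.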
Analysis Report Known Issues
Timeline Known Issues
  1. There can be an error of approximately 1 microsecond between CPU events and GPU events.
  2. Percentages displayed in the row labels and tool tips are based upon the full capture time.
  3. The mouse forward and back buttons cannot be used to navigate the report page system.
  4. CTRL+- toggles to the previous document instead of zooming out.
  5. Double-clicking on a row containing a line/area graph that also has children will expand/collapse the row as opposed to increasing the height to 66% of the view.
  6. Using VNC (virtual network computing) software to remotely open a Timeline Report can cause Visual Studio to crash. (7157)
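Items 1 and 2 above affect how Timeline numbers should be read. The two helpers below are hypothetical (not part of NVIDIA Nsight) and only sketch the arithmetic, assuming busy and capture times are given in the same unit and timestamps in microseconds; the 1 microsecond default tolerance is taken from item 1's approximate figure.

```cpp
#include <cmath>

// Item 2: row-label percentages are relative to the full capture time, so a
// row that is busy for a whole short region can still show a small value.
// busy and capture must use the same time unit.
double percent_of_capture(double busy, double capture) {
    return 100.0 * busy / capture;
}

// Item 1: CPU and GPU timestamps may disagree by roughly 1 microsecond, so
// treat the relative order of a CPU event and a GPU event as known only when
// they are further apart than that tolerance.
bool cpu_gpu_order_is_reliable(double cpu_us, double gpu_us,
                               double tolerance_us = 1.0) {
    return std::fabs(cpu_us - gpu_us) > tolerance_us;
}
```

For example, a kernel that is busy for 50 ms of a 1000 ms capture is labeled 5%, even if it was busy for the entire 50 ms region you are inspecting. Likewise, a CPU event at 10.0 µs and a GPU event at 10.5 µs should not be assumed to have occurred in that order.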


NVIDIA® Nsight™ Development Platform, Visual Studio Edition User Guide Rev. 3.1.130815 ©2009-2013. NVIDIA Corporation. All Rights Reserved.
