I was recently on-site at a large Dynamics NAV customer, helping them work through a number of significant performance issues they had encountered during two previous failed attempts to upgrade from NAV 3.70 to NAV 5.00 SP1. In summary, they had more than 100 concurrent users running relatively well on NAV 3.70 with a high transaction volume. When they attempted to go live on the upgraded NAV 5.00 SP1 version, they could barely get 10 users into the system without those users completely deadlocking each other, and it was, for lack of a better term, unusable. During the previous two upgrade attempts, the partner had deployed the NAV Application Benchmark tool and coded in all the customer business cases and custom functionality so they could do their scalability testing (very cool!). This is where I came in.
The first thing the partner and I did was run a 30-user mixed workload test for 10 minutes on the upgraded code, just to see where we were. During this test we received over 1100 SQL deadlocks. There were so many that it was difficult even to categorize where they were coming from. We reviewed the C/AL code for each of the profiles we were using, and used Client Monitor, trying to track down problems with the order in which these processes locked tables. It sounds easy, but it's not, and it took a considerable amount of time. As we were scanning through the code, I happened to look over the shoulder of one of the partner's developers, who had the code up on his screen for one of the processing-only reports we were using, and I noticed a comment in the code that caught my attention.
//CI-Perf 1.01 140206 CI-HJS 1.00
What first caught my attention was the word "Perf." I was very interested to know what performance modifications had been made to the code. The second thing that caught my attention was the date "140206," which meant this performance enhancement dated back to 2006, when they were running NAV 3.70. Further inspection of the code revealed that several of these enhancements corrected locking order bugs in the standard and localization code for version 3.70. These changes were absolutely necessary on 3.70 to achieve any kind of scalability, but on 5.00 SP1 they were fixing issues that no longer existed. Instead, they introduced locking order violations in several key NAV processes and caused a huge number of deadlocks. We went through all the code, "unwound" these locking order changes, and re-ran the benchmark tests*. Eventually we ran a 2-hour, 100-user mixed workload test that produced > 30 deadlocks, all on Reservation Entry and No. Series, and we also achieved a very high transaction volume.
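Tracking down these locking order violations by reading code and Client Monitor output is tedious. As a rough illustration of the check we were effectively doing by eye, here is a minimal Python sketch. It assumes you have already captured, for each process, the sequence of tables it locks (for example, by reading the C/AL code or a Client Monitor trace). The function name `lock_order_conflicts` and the table sequences in the usage example are hypothetical, not the actual NAV locking order. The sketch also ignores SQL Server row- and range-level locking details: it only flags pairs of tables that two processes lock in opposite order, which is the classic recipe for a deadlock.

```python
from itertools import combinations


def lock_order_conflicts(processes):
    """Find table pairs that two processes lock in opposite order.

    `processes` maps a (hypothetical) process name to the list of tables it
    locks, in the order the locks are taken. Only the first lock on each
    table is considered. Returns tuples of
    (process_a, process_b, table_locked_first_by_a, table_locked_second_by_a).
    """
    # Position of the first lock on each table, per process.
    first_pos = {}
    for name, tables in processes.items():
        pos = {}
        for i, table in enumerate(tables):
            pos.setdefault(table, i)
        first_pos[name] = pos

    conflicts = []
    for a, b in combinations(sorted(processes), 2):
        pa, pb = first_pos[a], first_pos[b]
        shared = sorted(set(pa) & set(pb))
        for x, y in combinations(shared, 2):
            # A conflict exists when the two processes disagree on which
            # of the two tables is locked first.
            if (pa[x] < pa[y]) != (pb[x] < pb[y]):
                first, second = (x, y) if pa[x] < pa[y] else (y, x)
                conflicts.append((a, b, first, second))
    return conflicts


if __name__ == "__main__":
    # Hypothetical lock sequences, for illustration only.
    processes = {
        "Post Sales Order": ["Sales Header", "Item Ledger Entry", "G/L Entry"],
        "Post Purchase Order": ["Purchase Header", "G/L Entry", "Item Ledger Entry"],
    }
    for conflict in lock_order_conflicts(processes):
        print(conflict)
```

Any pair it reports is a candidate for a locking order fix. An empty result only means the captured sequences agree with each other; it does not prove the processes are deadlock-free, because the real locks depend on which rows and ranges each process touches.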
The moral of the story: when you upgrade code from version to version, especially across a large version gap, be very careful about which C/AL performance optimizations you bring forward from the old version to the new one. Evaluate each one to make sure it is still necessary. Otherwise it could have the exact opposite effect on the newer version and cause performance issues.
Thanks go out to the partner who helped me work through all these issues while I was on-site!
*There were other changes made to executable versions and to SQL Server and database configuration that I will not detail here.
Senior Premier Field Engineer
Microsoft Dynamics
Microsoft Certified Master – SQL Server 2008