Monday, June 10, 2024 – Last week, I started generating 200,000 dummy data points. The generation worked but was quite slow. Over the weekend, the process stopped midway with a 401 Unauthorized error. I restarted the program and the generation, then reported the status to Mr. Peter. It turned out that the interruption was caused by the user's API access being limited to a fixed number of hours. Mr. Peter instructed me to extend the access duration on the identity server.
I then continued with my current task: investigating why the API calls for a list query were taking so long. I debugged a specific handler in which I had initially used a standard foreach loop to assign additional data to each list item, and which Mr. Peter had since improved by switching to a parallel foreach loop. A parallel foreach processes the items in a collection concurrently on multiple threads, which can significantly reduce overall execution time for large datasets or computation-heavy operations. A standard foreach loop, by contrast, processes the items one at a time.
I tested different degrees of parallelism and used a stopwatch to measure the execution time of each configuration. Going above 4 did not improve execution times, so I settled on 4 as the maximum degree of parallelism. Even with the parallel foreach in place, the query performance was still unsatisfactory: retrieving the list on the front end took over a minute. This prompted me to search for additional optimization strategies.
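A minimal sketch of how such a measurement could be set up, not the actual handler code. The names ParallelismBenchmark, Enrich, and Measure are hypothetical, and the Thread.Sleep call is only an assumed stand-in for the real per-item work, so the timings it prints say nothing about the real handler.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelismBenchmark
{
    // Hypothetical stand-in for the per-item work done in the handler.
    private static int Enrich(int item)
    {
        Thread.Sleep(1); // simulates a small blocking operation
        return item * 2;
    }

    // Processes all items with the given parallelism cap and returns the elapsed time.
    public static TimeSpan Measure(int[] items, int maxDegreeOfParallelism, out int[] results)
    {
        var output = new int[items.Length];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        var stopwatch = Stopwatch.StartNew();
        // Each iteration writes only to its own slot, so no locking is needed.
        Parallel.ForEach(Enumerable.Range(0, items.Length), options, i =>
        {
            output[i] = Enrich(items[i]);
        });
        stopwatch.Stop();

        results = output;
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        var items = Enumerable.Range(1, 200).ToArray();
        foreach (var degree in new[] { 1, 2, 4, 8 })
        {
            var elapsed = Measure(items, degree, out _);
            Console.WriteLine($"MaxDegreeOfParallelism={degree}: {elapsed.TotalMilliseconds:F0} ms");
        }
    }
}
```

Where the curve flattens depends on the machine and on whether the per-item work is CPU-bound or waiting on something else, so the cut-off has to be measured against the real workload, as was done here.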
Following a suggestion on Stack Overflow, I realized that my use of FirstOrDefault might be contributing to the slow performance. An article confirmed that FirstOrDefault could be less efficient than other methods for this kind of lookup. Each call with a predicate scans the collection from the start until it finds a match, so calling it once per list item repeats that scan every time. I decided to replace it with a dictionary built with ToDictionary, which, once built, finds an entry by key in roughly constant time. After this change, the retrieval time for the two lists, which are called asynchronously, dropped to 19 seconds, approximately 3 times faster than before.
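The difference between the two approaches can be sketched as follows. The classes Detail and ListItem and the method names are hypothetical, since the real entities in the handler are not shown in this log; only FirstOrDefault, ToDictionary, and TryGetValue are actual .NET calls.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes standing in for the real entities.
public class Detail
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class ListItem
{
    public int Id { get; set; }
    public int DetailId { get; set; }
    public string? DetailName { get; set; }
}

public static class LookupExample
{
    // Before: every item triggers a linear scan of details (O(n * m) overall).
    public static void EnrichWithFirstOrDefault(List<ListItem> items, List<Detail> details)
    {
        foreach (var item in items)
        {
            item.DetailName = details.FirstOrDefault(d => d.Id == item.DetailId)?.Name;
        }
    }

    // After: build the dictionary once, then each lookup is O(1) on average.
    // Note: ToDictionary throws an ArgumentException if two details share an Id.
    public static void EnrichWithDictionary(List<ListItem> items, List<Detail> details)
    {
        var detailsById = details.ToDictionary(d => d.Id);
        foreach (var item in items)
        {
            item.DetailName = detailsById.TryGetValue(item.DetailId, out var detail)
                ? detail.Name
                : null;
        }
    }
}
```

Both methods produce the same result, but the second one does far less work per item as the lists grow, at the cost of building the dictionary once up front.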
Despite this improvement, a retrieval time of about 19 seconds was still far from ideal. Mr. Peter therefore advised me to implement pagination for the two lists to improve performance further.
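As a rough illustration of what pagination could look like, here is a small sketch using Skip and Take. The Pagination class and the GetPage method are hypothetical names, and the sketch works on an in-memory sequence only; the actual implementation for the two lists has not been written yet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Pagination
{
    // Returns one page of results; pageNumber is 1-based.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        return source
            .Skip((pageNumber - 1) * pageSize) // skip the items of all earlier pages
            .Take(pageSize)                    // keep at most one page of items
            .ToList();
    }
}
```

For example, Pagination.GetPage(Enumerable.Range(1, 95), 2, 20) returns the 20 values from 21 to 40. When the data comes from a database, the same Skip and Take calls would normally be applied to the query itself, after a stable ordering, so that only one page of rows is fetched.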
