Optimizing Data Commit Strategy for Efficient Performance

Monday, July 1, 2024 – Last week, while waiting for data generation to complete so I could test the stock generation function, I re-ran the controller tests to make sure everything still passed. I discovered that I had overlooked one test in particular: the generate-stock test. The failure was caused by my use of a task when creating the stock, which I had introduced as an alternative to a foreach loop. As Mr. Peter explained, we cannot run tasks when calling services from the repository because of the multiple DB contexts involved.

To resolve this, I modified the code to use a foreach loop instead. My next concern was how to avoid reprocessing the same data every time the table is generated. Although the table configuration enforces unique data, looping through already-stored records is time-consuming and unnecessary. After I consulted Mr. Peter, he suggested using the Except function to exclude existing data from the list, so that the list contains only new data without a foreach loop checking each item individually.
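The idea behind Except is a set difference. A minimal sketch in Python (the post's actual code is C#/LINQ; this only illustrates the concept, assuming items compare by value as LINQ's default equality comparer does):

```python
def except_existing(incoming, existing):
    """Keep only items that are not already stored, in the spirit of
    LINQ's Except(): one set build instead of a per-item foreach lookup."""
    existing_set = set(existing)  # O(1) membership checks
    return [item for item in incoming if item not in existing_set]
```

Note that LINQ's Except also removes duplicates from the first sequence; this sketch keeps them, which is closer to "filter out what is already stored."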

Next, while examining and testing the logic for the specific data generation, which had been developed earlier but had performance issues, I found that the commit transaction was causing a slowdown at the end of the progress bar: committing a large volume of data in one transaction proved inefficient. I experimented with moving the commit inside every loop iteration, but that was also very slow; the progress bar advanced about once per second, so thousands of iterations would take thousands of seconds to complete.

Mr. Peter suggested committing transactions in batches of 500, which improved the speed. I then tested batch sizes of 1,000, 10,000, and 20,000, and found that committing every 10,000 records was the most efficient for now, at about 5-7 seconds per commit. This significantly sped up the overall generation compared to the 500-record batches, which took about 2 seconds each but required twenty times as many commits.
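The batched-commit pattern can be sketched as follows (a language-agnostic Python sketch, since the post does not include the C# code; `commit` stands in for the real transaction commit):

```python
def commit_in_batches(records, commit, batch_size=10_000):
    """Accumulate records and flush them in fixed-size batches,
    instead of one huge commit or one commit per record."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            commit(batch)  # one transaction per batch_size records
            batch = []
    if batch:              # flush the final partial batch
        commit(batch)
```

The batch size is a tuning knob: too small and the per-commit overhead dominates (the 500-record case), too large and each commit stalls; 10,000 was the sweet spot here.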

Improving Performance with Pagination and Efficient Data Handling

Monday, July 8, 2024 – This week, I was tasked with adding pagination to another segment of the stock management UI. This required changes to both the API query and the UI view model. First, I focused on the API query, incorporating all the necessary additions into the request query based on the existing front-end UI filters. Previously, all the data was retrieved in a single API call and stored in the background data grid collection, and the filtering logic was then handled in the view model. The original purpose of this approach was to give a faster response to the user's search.

However, after testing the UI with an expanded dataset of up to a hundred thousand records, retrieving all the data at once degraded the page's performance, taking up to 8-10 seconds to load a single page. This led to the decision to implement pagination, which significantly improves performance by fetching only 50-100 records per page.
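The page-fetching logic reduces to a simple skip/take computation (a Python sketch; in the real API this would be a Skip()/Take() pair applied to the query so the database, not the client, does the slicing):

```python
def get_page(records, page, page_size=50):
    """Return one page of results: skip past the earlier pages,
    then take page_size records. `page` is 1-based."""
    skip = (page - 1) * page_size
    return records[skip:skip + page_size]
```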

While working on the pagination logic for the query, one challenge was calculating the average total margin directly within the API query. Initially, I fetched all the stock data into a list, which slowed performance and defeated the purpose of pagination by pulling a large amount of data at once. To address this, I adopted a different approach.

I was reminded of LINQ's concept of Deferred Execution, where data is not immediately retrieved from the database. Instead, an expression tree representing the query is constructed, and the query is executed only when it is iterated over or when a method forcing execution (such as ToList() or ToListAsync()) is called. This insight led me to try a more efficient approach.
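The deferred-execution idea can be illustrated with a Python generator, which behaves analogously to a LINQ query: building it only describes the work, and nothing runs until something iterates the result (the LINQ case is C#; this is just an analogy):

```python
executed = []  # records which rows have actually been processed

def build_query(rows):
    """Generator analogue of a deferred LINQ query: defining it
    describes the pipeline but processes no rows."""
    for row in rows:
        executed.append(row)
        yield row * 2

q = build_query([1, 2, 3])  # deferred: no rows processed yet
assert executed == []
result = list(q)            # forces execution, like calling ToList()
```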

Thus, rather than immediately fetching the entire query result into a list, I applied filters so that only the necessary data was retrieved. I then used the Select() method to project only the required fields from the database, transforming the data into the desired shape. This approach offered greater flexibility, allowing transformations, calculations, or filters directly within the Select() projection. By keeping these transformations in the query, I fetched only the projected data and materialized it into a list only when necessary, which improved speed and efficiency.
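To make the projection-plus-aggregate idea concrete, here is a hedged Python sketch (field names like `price` and `cost` are illustrative, not the project's real schema): the average is computed from a lazy projection of just the needed fields, instead of from fully materialized rows.

```python
def average_margin(stocks):
    """Compute an average margin from a lazy projection of only the
    required fields, rather than loading whole records into a list."""
    margins = (s["price"] - s["cost"] for s in stocks)  # Select()-style projection
    total = count = 0
    for m in margins:  # single pass; nothing is materialized as a list
        total += m
        count += 1
    return total / count if count else 0.0
```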

Once the API query was complete, I fixed the controller tests and made sure they passed, then began working on the view and view model changes. By the end of the week, I had completed most of the assigned tasks. A few remaining UI issues will carry over to next week so I can ensure a flawless user experience.

Updating Logic for New Data Models

Monday, June 24, 2024 – In my previous blog, I mentioned that I created three additional tables to store unique names, establishing relationships with the existing table to facilitate querying without manually assigning names.

After adding these three new models, I needed to update many parts of the code logic. The first step was to ensure all three new tables held the latest data before generating stock data. To achieve this, I created a repository and the necessary services to retrieve the source names and items from their different contexts and store them in the new tables before creating the stock history data.

I then tested the new logic and encountered a few errors. The first occurred because I hadn't rebuilt the Cassandra container correctly. The next was an "unknown database" issue, which stumped me for a while. Mr. Peter identified that the debugged process was running too quickly and ending before the background work completed. He advised me to add a while loop that keeps running while the status is still pending, which allowed the code to execute fully instead of exiting too early.
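That waiting pattern amounts to a polling loop. A hedged Python sketch (`get_status` and the status strings are illustrative placeholders, not the project's real API; a timeout guards against spinning forever):

```python
import time

def wait_until_done(get_status, poll_interval=0.5, timeout=30.0):
    """Keep the process alive while a background task is still pending,
    polling its status instead of letting execution end prematurely."""
    deadline = time.monotonic() + timeout
    while get_status() == "pending":
        if time.monotonic() > deadline:
            raise TimeoutError("task still pending after timeout")
        time.sleep(poll_interval)  # re-check the status periodically
```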

After integrating the new tables, I realized that nearly all of the controller tests needed updates. This involved reassigning the item and source names for each test to reflect the new data structure. I meticulously went through each test, ensuring they were correctly updated and consistent with the new models. In the process, I took the opportunity to refactor and clean up the tests, improving their readability and efficiency. This comprehensive update not only fixed the immediate issues but also enhanced the overall robustness and maintainability of the test suite.