Implementing Excel Import Functionality and Tackling Minor Bug Fixes

Monday, August 12, 2024 – Towards the end of last week, I was tasked with creating a new functionality that allows data to be imported and read from an Excel file. This was a straightforward task requiring only two buttons for functionality. I started by working on the view and view model to get a comprehensive understanding of how the UI should operate and what data needed to be sent to the API.

Since this was my first time working with Excel import functionality, I spent some time familiarizing myself with the existing code to understand how previous imports were handled. After drafting the view and view model, I moved on to developing the API. My plan was to extract the data list from the imported Excel file, send it to the API, and then process it accordingly. After some work, I successfully completed the task, and the new feature worked as expected. However, after a review by Mr. Peter, I realized I had overlooked validation error handling for cases where the imported Excel file might contain faults or unexpected data.

In addition to this task, I also worked on fixing a few minor bugs related to previously developed features. These included adjusting the UI to fit the standard screen resolution of 120% used by most user, adding progress percentage logs to keep track of the progress via log files, and optimizing an API query that was taking too long to execute. By the end of the week, the Excel import feature required some minor checks for additional conditions before I could push the latest changes to the cloud server.

Navigating Migration Challenges and Debugging with Stack Traces

Monday, August 5, 2024 – After last week’s failed migration attempt, I identified the root cause as a mismatch between the data structure of the migration and the current table. After several attempts to resolve the issue, I decided to remove the problematic table entirely and recreate it using the existing SQL migration files.

Moving on to my next task, I needed to debug an issue where data generation was failing, causing the progress bar to close automatically without any visible error. This behavior led to errors going unnoticed. To address this, I added more robust error handling and logging to ensure that any errors would be flagged to the user. However, even after implementing these measures, the source of the error remained unclear. At this point, Mr. Peter suggested using stack traces for complex functions and demonstrated how to apply them effectively. With stack trace error handling, we could pinpoint the exact line where the error was thrown, making troubleshooting much easier.

Later in the week, Mr. Peter assigned me to research message queues and identify the best platform for our upcoming tasks. After exploring various options, I became particularly interested in RabbitMQ and delved into its documentation to learn more about this service.

Combining Data Tables for Efficient Referencing

Monday, July 29, 2024 – Last week, after I pushed all my changes to cloud server, Mr. Peter found two API tests failing. I quickly investigated the tests and found that one was failing because I had added a new table but hadn’t updated the data for that specific test. However, the second test revealed a loophole in my logic execution.

The issue stemmed from two newly created tables meant to store information about user type A and user type B. These tables were designed to serve as references to reduce the amount of data in a single table. During the unit test for this specific API execution, it was discovered that user type A actually required a reference from the user type B table, which only had an established relationship with a different kind of the table. This error necessitated rearranging the tables and combining user type A and user type B into one table so that both could be referenced without any obstruction.

As the week was ending, I made the necessary changes but encountered a migration failure. I plan to resolve this issue on Monday as quickly as possible before moving on to the next task.

Improving Performance Without Raw SQL

Monday, July 22, 2024 – In the previous week, I implemented a raw SQL query to set a column in a table to a condition, resolving a bug where the loading bar never ended. While the query executed correctly, Mr. Peter advised that, since we are using the EF Core approach, we should minimize the use of raw SQL queries unless absolutely necessary.

During this time, I also discovered another bug where a page displayed an “object set to reference” error. Upon careful debugging, I found that the error was due to the recent addition of a new table. The query from this table set a reference but didn’t correctly group it, resulting in an inability to map the correct data.

After solving the bug, I quickly worked on removing the raw SQL query. Initially, I tried using a foreach loop and experimented with multiple foreach iterations. However, waiting for all loops to finish before sending the results back to the UI caused a waiting time of about one minute, which was impractical. I then experimented with executing the task using async/await to see if there was any improvement in performance, but the total execution time remained similar.

After some consideration, to avoid long waiting times for an unknown response, I decided to implement the foreach loop along with a progress bar for the delete process.

Solving Report Section Bugs and Performance Issues

Monday, July 15, 2024 –Last week, I encountered a few bugs related to the report section. The first bug was with the overall margin calculations, which were incorrect even though the formula was right. After generating the margins and filtering a group of data with the same reference, some groups had a similar bug pattern. Specifically, one margin in the list was calculated as zero when it shouldn’t have been. I realized the issue was due to committing the generated margins in batches. I had forgotten to include an important function in the last batch, which caused the bug.

The second bug occurred in another section with multiple combo boxes for selecting dates and years. When a user selected a year, the combo box that should have displayed all years was null. Initially, I thought the problem was in the view model. However, after thorough debugging, I discovered the issue was with the API, which lacked a query to retrieve all existing years for the specific data. Once I identified the source of the bug, I was able to fix it.

The third bug of the week involved a button intended to remove a table of data. The query was supposed to be straightforward, but the loading bar never seemed to end. The problem was that the query took too long for the loading bar to handle. I solved this by implementing a raw SQL query, which made the function run in under ten seconds.

I pushed all the changes and will continue with my next task next week.

Optimizing Data Commit Strategy for Efficient Performance

Monday, July 1, 2024 – Last week, while waiting for the data generation to be completed, in order to test the stock generation function, I retested the controller tests to ensure everything was running successfully. However, I discovered that I had overlooked a test, particularly the generate stock test. The cause of the failure was due to my implementation of a task when creating the stock as an alternative to a foreach loop. This task caused the failure because, as Mr. Peter explained, we cannot run tasks when calling or using services from the repository due to the multiple DB contexts involved.

To resolve this, I modified the code to use a foreach loop instead. However, my next concern was how to handle the same data being called again every time the table is generated. Although the table configurations include settings for unique data, looping through already stored data is time-consuming and unnecessary. After consulting with Mr. Peter, he suggested using the Except function to exclude existing data from the list, ensuring that the list only contains new data without the need for a foreach loop to check each item individually.

Next, while examining and testing the logic for the specific data generation, which was previously developed but had performance issues, I found that the commit transaction was causing a slowdown at the end of the progress bar. Committing a large volume of data at once proved to be inefficient. I experimented with moving the transaction into every loop, but this still resulted in a very slow process, with the progress bar updating every second, meaning thousands of loops would take thousands of seconds to complete. 

Mr. Peter suggested committing transactions in batches of 500, which improved the speed. I then tested increasing the batch size to 1,000, 10,000, and 20,000. I found that committing every 10,000 records was the most efficient for now, with a commit time of about 5-7 seconds per batch. This significantly sped up the generation process compared to committing every 500 records in one batch, which took about 2 seconds per batch.

Improving Performance with Pagination and Efficient Data Handling

Monday, July 8, 2024 – This week, I was tasked with adding pagination to another segment of the stock management UI. This required changes to both the API query and the UI view model. First, I focused on the API query by incorporating all the necessary additions into the request query, based on the existing filters from the front-end UI filter. Previously, all the data was retrieved in a single API call and stored in the background data grid collection. The filtering logic was then handled in the view model. The initial purpose of this approach was to ensure a faster response based on the user’s search.

However, after testing the UI with an expanded dataset reaching up to a hundred thousand records, retrieving all data at once degraded the page’s performance, which could be taking up to 8-10 seconds to load a single page. This led to the decision to implement pagination, which would significantly improve performance by fetching only 50-100 records per page.

While working on the pagination logic for the query, one challenge was calculating the average total margin directly within the API query. Initially, I fetched all stock data into a list, which slowed down performance and defeated the purpose of pagination by fetching a large amount of data at once. To address this, I adopted a different approach.

I was reminded of LINQ’s concept of Deferred Execution, where data is not immediately retrieved from the database. Instead, an expression tree representing the query is constructed, and the query is executed only when iterated over or when a method forcing execution (such as ToList() or ToListAsync()) is called. This insight led me to try the more efficient approach.

Thus, rather than immediately fetching the entire query result into a list, I applied filters to retrieve only the necessary data. I then utilized the Select() method to specify and retrieve only the required fields from the database, enabling efficient data transformation into the desired format. This approach offered greater flexibility for handling data, allowing for transformations, calculations, or filters directly within the Select() projection. By implementing these transformations efficiently, I optimized performance by fetching the transformed data into a data format that I want and only turning them into a list only when necessary, resulting in improved speed and efficiency.

Once the API query was complete, I fixed the controller tests and began working on the view and view model changes, ensuring the controller tests passed successfully. By the end of the week, I had completed most of the assigned tasks. There are a few remaining UI issues that I will continue to check next week to ensure a flawless user experience.

Updating Logic for New Data Models

Monday, June 24, 2024 – In my previous blog, I mentioned that I created three additional tables to store unique names, establishing relationships with the existing table to facilitate querying without manually assigning names.

After adding these three new models, I needed to update many parts of the code logic. The first step was to ensure all three new tables had the latest data before generating stock data. To achieve this, I created a repository and the necessary services to retrieve data from source names and items from different contexts and store them in the new tables before creating the stock history data.

I then tested the new logic and encountered a few errors. The first error occurred because I didn’t rebuild the Cassandra container correctly. The next error was an “unknown database” issue, which stumped me for a while. Mr. Peter identified that the debugger was running too quickly, causing it to end prematurely. He advised me to use a while loop to keep the debugger running while the status was still pending. This adjustment allowed the code to execute correctly without ending too fast.

After integrating the new tables, I realized that nearly all of the controller tests needed updates. This involved reassigning the item and source names for each test to reflect the new data structure. I meticulously went through each test, ensuring they were correctly updated and consistent with the new models. In the process, I took the opportunity to refactor and clean up the tests, improving their readability and efficiency. This comprehensive update not only fixed the immediate issues but also enhanced the overall robustness and maintainability of the test suite.

Docker Crashes and Optimizing Backend Queries

Last week, several Docker containers repeatedly crashed despite my attempts to restart them. I consulted Mr. Peter and discovered that the disk had reached its maximum capacity of 20GB. To resolve this, I needed to increase the disk size by 50GB. I sought Mr. Peter’s guidance on how to proceed, but since I was not very familiar with Docker commands, I researched Docker containers further. Before making any changes to the disk, I created backup files to ensure data integrity in case of any issues. Despite several attempts, I was unsuccessful, and Mr. Peter eventually helped me expand the disk size while preserving the data.

Afterward, I resumed optimizing a query by implementing pagination instead of fetching the entire list in one API call. While working on a query to read a list of entities from the backend, I encountered an issue where I could not directly query a specific column which was not present in the table. Previously, filtering was done on the front end using BackgroundDataGridCollection, and I was assigning the data names by using a foreach loop into each of the wrapper. I sought Mr. Peter’s advice on the best approach. The next day, he suggested creating three additional tables to store unique names, establishing relationships with the existing table to facilitate querying without manually assigning names.

Following this, I initialized the models for the three new tables, configured their relationships, and added integration tests for each table.

By the end of the week, there were still many parts of the code needing optimization for a cleaner approach, which I plan to continue next week.

Debugging and Optimizing API Calls for Faster Data Retrieval

Monday, June 10, 2024 – Last week, I initiated the generation of 200,000 dummy data points. Although the data generation process worked, it was quite slow. Over the weekend, the process stopped midway and responded with a 401 unauthorized error. I restarted the program and the generation, and reported the status to Mr. Peter. It turned out that the reason for the interruption was a user API access limit of finite hours. Mr. Peter instructed me to extend the access duration on the identity server.

I then continued with my current task: debugging and investigating why the API calls for a list query were taking so long. While debugging a specific handler that Mr. Peter had previously improved by implementing a parallel foreach loop, I noted that I had initially used a standard foreach loop to assign additional data to each list item. The parallel foreach has the advantage of leveraging multiple threads to process items in the collection concurrently. This can significantly reduce the overall execution time, especially for large datasets or operations involving intensive computation. By distributing the workload across multiple threads, it maximizes CPU resource utilization, leading to faster task completion compared to a standard foreach loop, which processes each item sequentially.

However, despite testing different degrees of parallelism to optimize performance, where I used a stopwatch to measure the execution time of each configuration, having more than 4 degrees of parallelism did not show any improvements in execution times, so I settled on using 4 as the maximum degree of parallelism. Even after implementing the parallel foreach, the query performance was still unsatisfactory, taking over a minute to retrieve the list on the front end. This prompted me to search for additional optimization strategies.

Upon finding a suggestion on Stack Overflow, I realized that using FirstOrDefault might be contributing to the slow performance. An article confirmed that FirstOrDefault could be less efficient compared to other methods. I decided to replace it with ToDictionary, which was reported to have faster performance. After implementing this change, I observed a significant improvement in the retrieval time, reducing it to 19 seconds for the two lists called asynchronously, which was approximately 3 times faster.

Despite this improvement, the retrieval time of over 19 seconds was still far from ideal. Consequently, Mr. Peter advised me to implement pagination for the two lists to further enhance performance.