
Optimising Legacy Systems for Performance and Reliability

September 16, 2024 · Azure, Software Engineering, Tools

Imagine a team worked very hard to assemble Lego bricks into a train. Two years later, you are brought into the team and asked to disassemble it and rebuild a faster, more reliable train using the same Lego parts.

Here is what I went through when I was recently tasked with a similar mission.

The most important KPIs I measured against the non-functional requirements were (a measurement sketch follows the list):

  • CPU and memory usage, and network latency
  • Query performance against the database in a monolith, or against each microservice
  • Request rate
  • Throughput
  • Response time
  • Amount of data processed on the server vs. the client
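
To make these KPIs measurable, here is a minimal sketch, assuming an ASP.NET Core minimal API (the endpoint, log format, and names are illustrative, not the project's actual code), that records response time per request:

```csharp
// Program.cs: minimal ASP.NET Core app (illustrative sketch only).
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Inline middleware: time every request so response time (and, aggregated
// over a window, request rate and throughput) can be tracked per endpoint.
app.Use(async (context, next) =>
{
    var sw = System.Diagnostics.Stopwatch.StartNew();
    await next();
    sw.Stop();
    app.Logger.LogInformation("{Method} {Path} -> {Status} in {Ms} ms",
        context.Request.Method, context.Request.Path,
        context.Response.StatusCode, sw.ElapsedMilliseconds);
});

app.MapGet("/health", () => Results.Ok("up"));
app.Run();
```

Aggregating these log entries over a time window also yields the request-rate and throughput figures.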

I also looked for the following smoking guns:

  • Over-engineered parts
  • External libraries (I reevaluated the need for each of them)
  • Caching strategy: in-memory caching for dynamic data and CDNs for static data
  • Single points of failure
  • Incorrect use of asynchronous calls and concurrency (see the sketch after this list)
  • Unreliable tests on environments that were not like-for-like (in my experience, performance tests ran stress tests against APIM using a developer license on the dev environment, which handles up to 1,000 calls; this gave a false indication of how production performs and went unnoticed until investigated)
  • Frontend logic vs. backend logic (e.g. heavy JavaScript on the frontend invoking APIs written in .NET, vs. calls invoked from MVC clients)
  • Quick wins: code minification, image compression, CSS optimisation, etc.
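
To illustrate the asynchronous-call misuse above, here is a hedged sketch (hypothetical HttpClient code, not taken from the project) showing the sync-over-async anti-pattern and its fix:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public class OrderClient
{
    // Anti-pattern: sync-over-async. Calling .Result blocks a thread-pool
    // thread for the whole duration of the I/O and can deadlock wherever a
    // synchronisation context is present.
    public string GetOrderBlocking(HttpClient client) =>
        client.GetStringAsync("https://example.com/orders/1").Result;

    // Fix: stay asynchronous end-to-end so the thread is released while the
    // request is in flight.
    public async Task<string> GetOrderAsync(HttpClient client) =>
        await client.GetStringAsync("https://example.com/orders/1");
}
```

Blocking with .Result ties up threads under load, which surfaces as exactly the kind of throughput and reliability problems listed above.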

I compiled the above into a report before making any attempt to disassemble a working system, albeit one that was sluggish and unreliable according to end users. Bear in mind that compromises are inevitable; therefore, I recorded every decision along the way using ADRs (Architecture Decision Records) throughout the re-engineering process (a skeleton follows).
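
An ADR needs no special tooling; a lightweight skeleton like the following (my own illustrative format, not a formal standard) is enough:

```text
ADR-NNN: <short decision title>

Status:       Proposed | Accepted | Superseded
Date:         YYYY-MM-DD
Context:      The forces at play (performance budget, license limits, deadlines, ...)
Decision:     What was chosen and why.
Consequences: What becomes easier or harder as a result.
```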

The top trade-offs I encountered were:

  • Monetary (budgeting and costs)
  • Cloud limitations and cloud patterns (for mostly hybrid applications: on-premise + cloud, or multi-cloud ones)
  • Performance and reliability themselves, despite being the main goals of this exercise
  • Security
  • Integration contracts (commonly Swagger/OpenAPI) with third-party systems and with other microservices living under the umbrella of the project in question
  • Application versions and dependency versioning (backward compatibility)
  • Constant changes to business requirements while refactoring the current implementation of signed-off requirements
  • Management of the non-technical: teams’ resistance to change and team dynamics

The following table quantifies the performance gained from each change:

| Technique | Gain | Comments |
|---|---|---|
| Replacement of the ORM | Negative gain 🙁 | Milliseconds were lost: Entity Framework generates the SQL, whereas Dapper delegates that work to developers. |
| Introducing full-text search and indexing the relevant columns | 1 second per request 🙂 | Full-text search optimises string matching. Massive win. |
| DNS caching | Milliseconds (estimated) | Hard to measure, but I gained more reliability than performance. |
| Compressing calls | Up to 500 milliseconds | Aggregate where possible. Return IQueryable objects and iterate over IEnumerable, but don't confuse the two; this helps EF generate optimised queries (see the sketch below the table). |
| Removing external libraries | Reliability and consistency gain | Only use libraries you trust and that are consistent with the application's logic. I removed the library that created an in-memory database for unit tests and replaced it with Moq, which relieved the server's memory. |
| Removing unused static assets, minifying the code, and serving it from a CDN | Milliseconds | All of these static assets can be safely cached. |
| Database structure, connection pool, and locking mechanism | Up to 1 second | Depends on how many calls exceed the APIM license limit and wait to be served. |
| Overall | Up to 2 seconds, with a maximum of 7 seconds gained | Great for applications with limited bandwidth and budget, and for those running on low-end cloud licenses. |
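
To illustrate the IQueryable/IEnumerable point from the table, here is a minimal EF Core sketch (the Order entity and ShopContext are hypothetical, not the project's real model):

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity and context, purely to illustrate the point.
public class Order { public int Id { get; set; } public decimal Total { get; set; } }

public class ShopContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();
}

public static class OrderQueries
{
    // IQueryable: the filter is translated into SQL and executed by the
    // database, so only matching rows cross the network.
    public static IQueryable<Order> BigOrders(ShopContext db) =>
        db.Orders.Where(o => o.Total > 1000m);

    // IEnumerable via AsEnumerable(): every row is materialised first and the
    // filter then runs in application memory, which is the mistake the table
    // row warns about.
    public static IEnumerable<Order> BigOrdersInMemory(ShopContext db) =>
        db.Orders.AsEnumerable().Where(o => o.Total > 1000m);
}
```

Composing further Where/Select clauses onto BigOrders(db) lets EF fold everything into one optimised SQL query, while BigOrdersInMemory(db) drags the whole table across the wire before filtering.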

P.S.: all my posts are drawn from experience, not AI tools like ChatGPT.
