The title of this article may be a little provocative, but that's the general idea. This time, it's not just about sharing knowledge. We want to start a discussion about different approaches to programming, and also about how ignoring or underappreciating certain aspects can harm the final product.
Let's start by defining a microservice. It's not that easy, since there are many valid definitions. Some say a microservice is any service under 1,000 lines of code. Others say it's something built by a team small enough to be fed with 3 pizzas. Some people claim that a microservice is something you can completely rewrite during a sprint.
The microservices we worked on required more than one sprint, more than 3 pizzas, and more than 1,000 lines of code. Our definition of a microservice is something as small as possible and as big as necessary to run. In other words, if there's something you can remove and your code still works, it's not a microservice.
The funny thing is that these flaws have been known for over 20 years. In 1994, Peter Deutsch coined the Seven Fallacies of Distributed Computing, which James Gosling later expanded with an eighth. These two men charted the areas that deserve special attention during the microservice design phase.
Here are the conditions a microservice has to meet:
- It has to have a single responsibility (business domain)
- It should be implemented against a well-defined contract
- It should be possible to implement and deploy it independently of other microservices
How to approach a microservice project?
As software engineers, we decided that it was easier, at least for some of us, to design the app as a monolith but adopt the microservice approach to communication. That's how fire TMS came to be (read more about the project here). The stages of development were as follows (the modular-monolith stage is sketched in code right after the list):
- Start – monolith until business domain boundaries are established,
- Development – monolith with microservice communication (modular monolith),
- Scaling – switch to actual microservices.
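To make the modular-monolith stage concrete, here is a minimal sketch, assuming Go and made-up module names: the billing module talks to the shipping module only through a contract (an interface plus plain data types), never through its internal structures, so the shipping module can later be extracted into a real microservice without touching its callers.

```go
// Hypothetical sketch of "microservice communication inside a monolith":
// modules depend on contracts, not on each other's internals.
package main

import (
	"context"
	"fmt"
)

// ShippingContract is the only thing other modules may depend on.
// When shipping becomes a real microservice, this interface gets an
// HTTP/gRPC-backed implementation and the callers stay unchanged.
type ShippingContract interface {
	QuotePrice(ctx context.Context, weightKg float64) (float64, error)
}

// inProcessShipping is the in-monolith implementation of the contract.
type inProcessShipping struct{ pricePerKg float64 }

func (s inProcessShipping) QuotePrice(ctx context.Context, weightKg float64) (float64, error) {
	return weightKg * s.pricePerKg, nil
}

// billingModule depends on the contract, not on the concrete module.
type billingModule struct{ shipping ShippingContract }

func (b billingModule) InvoiceTotal(ctx context.Context, netAmount, weightKg float64) (float64, error) {
	freight, err := b.shipping.QuotePrice(ctx, weightKg)
	if err != nil {
		return 0, err
	}
	return netAmount + freight, nil
}

func main() {
	billing := billingModule{shipping: inProcessShipping{pricePerKg: 1.5}}
	total, _ := billing.InvoiceTotal(context.Background(), 100, 12)
	fmt.Printf("invoice total: %.2f\n", total)
}
```

When the scaling stage comes, the only new code is a network-backed implementation of the same interface; the business domain boundary has already been drawn.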
The limitations of microservices
Do you know that meme about wanting something cheap, fast, and good, but having to choose just two of those qualities? Another version of this (the CAP theorem) applies to microservices. You can have only two of these:
- Data consistency,
- High availability,
- Partition tolerance.
When does a set of microservices become a distributed monolith? It happens when:
- a change in one service requires changes in others (services cannot be deployed independently),
- services share a database,
- services communicate intensively with each other.
Microservice vs. Monolith
Let's take a minute to think about why we even create microservices. In theory, it's because:
- we want to make the programmers' lives easier,
- the app needs to be scalable,
- the app needs to be failure-proof,
- we need to use a technology that better fits a given task,
- we want to distribute tasks among many developers organized into teams.
Here's what a reality check shows:
Additional problems can and will appear when you launch such microservices. These dangers are well known, but the first time you encounter them, they feel brand new. Some of them can be coded away. Those that aren't become a problem for the administrators who deploy and maintain the microservices. Pushing even a part of error handling onto the infrastructure and administration makes these problems pile up. The next step is network congestion or alert spam in the monitoring. And we haven't even started on the flaws of distributed systems.
The top 8 flaws of distributed systems
1. The network isn’t failure-free
At some point, the network will fail. It's inevitable. When it does, your microservice can behave in a variety of ways. You might fail to establish a connection when the app starts or restarts, or an existing connection may drop. Common scenarios (a defensive-connection sketch follows the list):
- the app freezes and requires a restart,
- an endless 'awaiting response' state with no timeout,
- excessive resource use,
- no reconnect option, no rollback, no resuming,
- wrong message order, data corruption, duplicates.
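Here is a minimal sketch, assuming Go's standard net/http and a hypothetical health-check URL, of the defensive pattern that avoids most of the scenarios above: a hard per-request timeout and a bounded number of retries with backoff, so the app neither hangs in an endless 'awaiting response' state nor hammers a dependency that is already down.

```go
// Minimal sketch of calling a dependency defensively: every request has a
// hard timeout, failures are retried a bounded number of times with backoff,
// and the caller eventually gets an error instead of hanging forever.
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

func fetchWithRetry(ctx context.Context, url string, attempts int) (*http.Response, error) {
	client := &http.Client{Timeout: 2 * time.Second} // never wait indefinitely
	var lastErr error
	for i := 0; i < attempts; i++ {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return nil, err
		}
		resp, err := client.Do(req)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		// simple exponential backoff: 100ms, 200ms, 400ms, ...
		time.Sleep(time.Duration(100*(1<<i)) * time.Millisecond)
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}

func main() {
	// example.invalid is a reserved, never-resolvable domain, so this call
	// demonstrates the failure path of the retry loop.
	resp, err := fetchWithRetry(context.Background(), "http://example.invalid/health", 3)
	if err != nil {
		fmt.Println("gave up:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```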
2. There will be delays
This is something UI people know well: they test their app locally and think it works lightning-fast. As soon as it's installed on a server, or moved into a mobile app, the client that worked great in the dev environment becomes painfully slow in RC and production. The connection is also constrained when the client is reached via VPN. Delays may cause the app to act up. One answer is to use tools that artificially delay server responses and let you test the app over a slow connection; there's a sketch of that after the list. Or you could simply install it somewhere far away and see how it really works. Common scenarios:
- WAN connections,
- mobile apps
- too many connections
- AJAX queries
- delays in data presentation
- low responsiveness of the app
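One way to run that slow-connection test is to inject latency yourself. Below is a hedged sketch in Go: a hypothetical middleware that delays every response by a fixed amount, so the UI and its AJAX queries are exercised against WAN-like delays instead of a local, near-zero round trip.

```go
// Sketch of injecting artificial latency in a dev/test environment so the UI
// is tested against realistic delays rather than a local, instant round trip.
package main

import (
	"fmt"
	"net/http"
	"time"
)

// withLatency wraps any handler and delays every response by d.
func withLatency(d time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(d) // simulate a slow WAN / VPN / mobile link
		next.ServeHTTP(w, r)
	})
}

func main() {
	api := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, `{"status":"ok"}`)
	})
	// 300 ms is an arbitrary, illustrative round-trip penalty.
	http.Handle("/api/status", withLatency(300*time.Millisecond, api))
	http.ListenAndServe(":8080", nil)
}
```

In practice you would make the delay configurable (or random) and enable the wrapper only in dev and RC environments.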
3. The bandwidth is limited
We ran into this problem when trying to load a large number of short messages into Elasticsearch. Establishing a connection takes time. Sending a message takes time. What can you do about it? For one, you can aggregate your messages, but if the packets get too big and get lost, they are retransmitted, and the network clogs up with those retransmissions. That's why it's important to find the right balance for packet size. Mind that you can also affect other services using the same network, such as VoIP. Common scenarios (a simple batching sketch follows the list):
- generating too many short messages (no aggregation),
- generating overly large messages (retransmissions),
- data flow bottlenecks,
- network overloads, lost packets,
- limited bandwidth for other services (e.g. VoIP),
- dev / RC vs. production differences
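A sketch of the aggregation idea, in Go with a made-up batch size and a stubbed sender: short messages are buffered and shipped in bounded batches, which amortizes the per-connection and per-message overhead without producing payloads so large that a single loss triggers an expensive retransmission.

```go
// Sketch of aggregating short messages into bounded batches: small enough not
// to choke the network on retransmission, large enough to amortize overhead.
package main

import "fmt"

const maxBatchSize = 100 // illustrative threshold; tune for your payloads

type batcher struct {
	buf  []string
	send func(batch []string) // e.g. a bulk request to Elasticsearch
}

func (b *batcher) Add(msg string) {
	b.buf = append(b.buf, msg)
	if len(b.buf) >= maxBatchSize {
		b.Flush()
	}
}

func (b *batcher) Flush() {
	if len(b.buf) == 0 {
		return
	}
	b.send(b.buf)
	b.buf = nil
}

func main() {
	b := &batcher{send: func(batch []string) {
		fmt.Printf("sending batch of %d messages\n", len(batch))
	}}
	for i := 0; i < 250; i++ {
		b.Add(fmt.Sprintf("log line %d", i))
	}
	b.Flush() // flush the tail
}
```

A real batcher would also flush on a timer, so a half-full buffer doesn't sit around indefinitely.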
4. The network isn't secure
You can't assume that it is and send unencrypted data. If you do, you're leaving yourself open to attacks. Common scenarios (a TLS-enforcing client sketch follows the list):
- transmission of unencrypted data,
- external traffic monitoring and leaks of confidential data
- replay attack (e.g. on login) – replaying a recorded login transmission, which lets an attacker log in to your account from a different device and location,
- injection of unauthorized data (e.g. JavaScript into HTML) – inserting a script into the page returned to the user gives the attacker control over everything rendered in the user's browser,
- MitM attacks – whoever controls the access point can hijack the transmission, edit the data, and generally make use of whatever you send,
- automated attacks.
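On the client side, the minimum you can do is refuse to talk over anything weaker than modern TLS. A minimal Go sketch, with a placeholder endpoint:

```go
// Sketch of a client that refuses plaintext and outdated TLS: all traffic goes
// over HTTPS, and anything below TLS 1.2 is rejected at the transport level.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

func main() {
	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				MinVersion: tls.VersionTLS12, // reject SSLv3 / TLS 1.0 / TLS 1.1
			},
		},
	}
	// example.com stands in for your real service endpoint.
	resp, err := client.Get("https://example.com/")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```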
5. Topology changes
Assuming that the topology stays fixed is one of the common wrong assumptions. Common scenarios (a name-resolution sketch follows the list):
- adding or removing servers, or server instances – if you're building a scalable service, you should know that the topology will keep changing, because new instances of every service will keep appearing. You will also remove elements when a health check detects that a service isn't working as expected, or when something takes the host down,
- technology changes – your traffic may pass through someone else's network, over which you have no control,
- changes in packet routes – dynamic routing,
- no connection to hardcoded server addresses.
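A small sketch of living with a changing topology, assuming Go and a purely illustrative service name: instead of hardcoding an address, the service name is resolved at call time, so instances added or removed by scaling or health checks are picked up on the next lookup.

```go
// Sketch of resolving a service by name at call time instead of hardcoding an
// address, so added or removed instances are picked up on the next lookup.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func currentInstances(service string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	// In Kubernetes or Consul this would be a headless-service / SRV name;
	// "orders.internal" is purely illustrative.
	return net.DefaultResolver.LookupHost(ctx, service)
}

func main() {
	addrs, err := currentInstances("orders.internal")
	if err != nil {
		fmt.Println("lookup failed (expected with the made-up name):", err)
		return
	}
	fmt.Println("current instances:", addrs)
}
```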
6. Only one administrator?
This problem haunts many companies, even large ones, and it never ends well. A common scenario: a multi-level production environment with infra/app admins and an L1/L2 support split, where every segment has its own rules and policies. This means that whenever you want to push something to production, you have to consult every team separately. You can try to change this process and convince the client that your deployment doesn't require everyone's participation. You can try to automate it and just inform everyone that it happened. However, the client can flatly refuse. Results:
- difficulties with establishing the scope of responsibilities,
- scheduling multiple meetings to execute a single update
- conflicting policies between the teams.
7. Transport isn’t free
This is a common programmer's problem: not minding the amount of transmitted data. Common scenarios (a query sketch follows the list):
- SELECT * FROM – this gets you the data you want, but also a lot of data you don't want. Don't do it. Overusing this pattern in Elasticsearch is an admin's nightmare: the database slows down to a crawl, and you can't really optimize it. It's also discouraged in AWS – you could actually get removed for such practices. It's not only a waste of bandwidth but also an additional cost for you. Always double-check that you're fetching only the data you actually need,
- insufficient aggregation from the database server
- unoptimized graphics files,
- serialization that requires additional assets
- increasing costs due to incoming/outgoing traffic
- higher technical demands (better routers and connections)
- slow UI
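A sketch of the "fetch only what you need" rule, using Go's database/sql with a hypothetical shipments table and the Postgres driver as an example: explicit columns and a row limit instead of SELECT * FROM.

```go
// Sketch of asking the database only for what the UI actually needs,
// instead of SELECT * FROM: explicit columns plus a row limit.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // example Postgres driver; any database/sql driver works
)

func main() {
	// Connection string and table/column names are illustrative.
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Only the three columns the list view displays, capped at one page of rows.
	rows, err := db.Query(`SELECT id, name, status FROM shipments ORDER BY id LIMIT $1`, 50)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id int64
		var name, status string
		if err := rows.Scan(&id, &name, &status); err != nil {
			log.Fatal(err)
		}
		fmt.Println(id, name, status)
	}
}
```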
8. The network isn't homogeneous
There's only a slight chance you'll run into this problem, but the chance is still there. When you connect systems belonging to different parties, problems may arise. Common scenarios:
- connections between systems of different types, from different providers,
- different connection speeds
- using closed protocols
- MTU size
- limited or no possibility of connecting with other systems or applications.