September 2017 – Daniel Leaver

Web Scalability for Startup Engineers, by Artur Ejsmont

Artur Ejsmont is a guy with a really cool name, and a really cool book. He is also Head of Platform Engineering at Yahoo in Sydney.

Now to the book: it is hard for me to overstate how much relief I felt after reading it. I am a self-taught software developer and I had some serious holes in my knowledge when it came to architecting large full stack applications. I had of course come across disjointed articles here and there that covered some of the topics we are about to discuss, but this book simply does an incredible job of tying it all together. It gives the full picture and a much better intuition for how backend concepts are related.

Every line of the book packs knowledge and I had many “aha” moments while reading it. It’s not going to be possible to go into every area that the book covers, but I’m going to do my best at outlining this huge knowledge cloud!

“Front End Layer”

Artur mentions CDNs and caching as two ways to help the front end layer scale.

Content Delivery Networks use GeoDNS to point users to a local node of the network. They are ubiquitous in web development because they bring a number of benefits. The first is that they can cache static content (HTML, CSS, JavaScript, fonts, images, videos, audio). This is helpful because your web servers then don’t have to serve those requests at all.

The other main benefit of CDNs is that they are closer to the user than your web servers are. Some CDNs have nodes in over 50 different locations around the world, and a CDN uses GeoDNS (a DNS service that can infer the user’s physical location) to route a user to its closest node, drastically reducing latency. I’m going to write another book review shortly about a fantastic networking book I’m reading that goes into more depth about how CDNs drastically improve TCP/TLS latency, by reducing handshake and certificate negotiation times and reducing the impact of the slow start algorithm inherent to TCP connections.
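To give a feel for the "route to the closest node" idea, here is a rough sketch in Python. It is my own illustration, not how any particular CDN works: the node names and coordinates are made up, and a real GeoDNS service would first have to estimate the user's location from their IP address, which I simply assume is already known here.

```python
import math
from typing import Dict, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical edge nodes; a real CDN has many more.
EDGE_NODES: Dict[str, Coordinates] = {
    "sydney": (-33.87, 151.21),
    "london": (51.51, -0.13),
    "new-york": (40.71, -74.01),
}


def haversine_km(a: Coordinates, b: Coordinates) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_node(user: Coordinates, nodes: Dict[str, Coordinates] = EDGE_NODES) -> str:
    """Pick the node with the smallest distance to the user."""
    return min(nodes, key=lambda name: haversine_km(user, nodes[name]))
```

Geographic distance is only a proxy for network latency, so treat this as a simplification of what real routing does.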

On top of the previously covered HTTP and media caching, the book also talks about browser caching (localStorage). An app can quite easily use localStorage to keep track of its local state, and simply reload the last state when it is closed and reopened. Google Maps for instance might locally persist the latest map, so that the next time the user opens the app, it isn’t blank: there will already be a loaded map.

“Web Services”

The book goes into detail on load balancing and caching. HTTP caching is of course useful, as it avoids having to generate new content. But application objects that cannot be cached at the HTTP layer (in NGINX, for instance) can benefit from being cached in a fast in-memory database such as Redis (Redis is particularly useful because an expiry time can be set on each key).
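To make the object caching idea concrete, here is a minimal sketch in Python. It is my own illustration rather than code from the book: TTLCache is a hypothetical in-memory stand-in for a cache like Redis, and get_or_load is an assumed helper name for the usual "check the cache, otherwise load and store" pattern.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory stand-in for a cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (expires_at, value)
        self._store: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict the expired entry
            return None
        return value


def get_or_load(
    cache: TTLCache, key: str, loader: Callable[[], Any], ttl_seconds: float
) -> Any:
    """Return the cached object, or build it with loader() and cache it."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = loader()  # the expensive part, e.g. a database query
    cache.set(key, value, ttl_seconds)
    return value
```

The clock is injectable only so that expiry can be exercised without actually waiting. With a real Redis server the expiry would be handled by Redis itself.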

Keeping web services stateless is vitally important as it enables them to be horizontally scalable: hide any number of servers behind a load balancer, and as long as they are stateless, your application can scale. This enables better scalability than vertical scaling (buying a more and more powerful server to handle all traffic) because there isn’t a limit on the number of web servers you can have. If your load balancer is becoming overwhelmed, use more than one load balancer and use DNS to spread traffic across them.
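Here is a small sketch of why statelessness matters, written in Python as my own illustration. The server names and the RoundRobinBalancer class are hypothetical, and round robin is just one simple balancing strategy. Because the handlers keep nothing between requests, any of them can answer any request.

```python
import itertools
from typing import Callable, List

Handler = Callable[[str], str]


def make_stateless_server(name: str) -> Handler:
    """Build a handler whose answer depends only on the request itself."""

    def handle(request: str) -> str:
        return f"{name} handled {request}"

    return handle


class RoundRobinBalancer:
    """Send each incoming request to the next server in turn."""

    def __init__(self, servers: List[Handler]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def handle(self, request: str) -> str:
        server = next(self._cycle)
        return server(request)
```

Scaling out then amounts to passing a longer list of servers to the balancer.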

“Data Layer”

The book has the best introduction to SQL scalability that I have come across. The author talks about read replication, where multiple read replicas copy data from a master, to scale reads (but not writes); master-master replication, to improve reliability and avoid downtime; and vertical or horizontal sharding of databases.
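As a rough sketch of two of these ideas, here is some Python of my own (not from the book). ReplicatedRouter and shard_for are hypothetical names, the "databases" are just labels, and detecting reads by looking for a leading SELECT is a deliberate oversimplification.

```python
import hashlib
import itertools
from typing import List


class ReplicatedRouter:
    """Send writes to the master and spread reads across the replicas."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        # Fall back to the master if there are no replicas.
        self._replicas = itertools.cycle(replicas or [master])

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.master


def shard_for(key: str, shard_count: int) -> int:
    """Horizontal sharding: map a key (e.g. a user id) to a shard number."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

A stable hash is used rather than Python's built-in hash() so that the same key lands on the same shard across processes. Note that changing shard_count with this naive modulo scheme moves most keys to a different shard.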

Managing distributed SQL databases might be a bit of a hassle though, and the author also talks about some NoSQL databases and their pros and cons. NoSQL databases usually have a harder time guaranteeing ACID transactions, and usually offer a choice between high availability (low latency when getting and editing data) and high consistency (the retrieved data is always accurate). There are many trade-offs and gotchas that have to be properly understood when designing and using distributed databases (CAP theorem, and more). I have plans to read more about it soon, and if it’s any good I’ll be reviewing that too! The main pro of NoSQL databases is often that they are able to store terabytes to petabytes of data.

“Asynchronous Processing”

Ejsmont mainly talks about queues. He mentions that some services are publishers and others are workers. Queuing jobs helps your application use resources efficiently and avoid getting overwhelmed. The workers only take the next job once they are able to. Queues also encourage strong decoupling between parts of your application, which is a sign of good engineering.
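The publisher and worker relationship can be sketched with Python's standard library. This is my own illustration, with an in-process queue standing in for a real message broker, and run_workers is a hypothetical helper name.

```python
import queue
import threading
from typing import Callable, List


def run_workers(
    jobs: List[int], handler: Callable[[int], int], worker_count: int = 2
) -> List[int]:
    """Publish jobs to a queue and let workers pull them at their own pace."""
    job_queue: "queue.Queue[int]" = queue.Queue()
    results: List[int] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                # A worker only asks for the next job once it is free.
                job = job_queue.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            outcome = handler(job)
            with lock:
                results.append(outcome)

    # The publisher side: it only knows about the queue, not the workers.
    for job in jobs:
        job_queue.put(job)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The order of the results depends on thread scheduling, so callers should not rely on it.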

We have used asynchronous processes at TransferWise to process money orders by making them go through various stages (is there missing information? is the customer verified? is this a potential fraud? do we need to get more information about the recipient? are there enough funds in the account? did the payment go through?). We also use queues at Maple Inside with Google PubSub as part of our Event Driven Architecture.

There are however challenges to queues: it’s something extra to manage and deploy. Queues can sometimes crash if they become overwhelmed. And poisoned messages can systematically crash your whole application because they keep getting sent to new workers after crashing the previous ones.
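One common way to contain a poisoned message is to cap the number of attempts and set the message aside once the cap is reached. The sketch below is my own Python illustration of that idea, not something taken from the book: drain_with_dead_letter and max_attempts are hypothetical names, and a plain list plays the role of the "dead letter" store.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def drain_with_dead_letter(
    messages: List[str], handler: Callable[[str], None], max_attempts: int = 3
) -> Tuple[List[str], List[str]]:
    """Process every message, parking those that keep failing."""
    pending: Deque[Tuple[str, int]] = deque((m, 0) for m in messages)
    processed: List[str] = []
    dead_letters: List[str] = []

    while pending:
        message, attempts = pending.popleft()
        try:
            handler(message)
            processed.append(message)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                # Stop retrying so the message cannot take down more workers.
                dead_letters.append(message)
            else:
                pending.append((message, attempts))  # retry later

    return processed, dead_letters
```

Here a failing handler raises an exception; a message that crashes the whole worker process would need the attempt count to be tracked by the queue itself.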

“Searching for Data”

In one of the last parts of the book, Artur talks about search engines and how they might be useful for some applications. He mentions ElasticSearch in particular, a somewhat new but hot technology. Search engines might take in XML or JSON documents and are then able to efficiently search through them. A common pattern for an eCommerce website that sells cars might be to add a new document to ElasticSearch every time a new car is added to the cars database. When a user wants to search for a specific car model, a list is returned by ElasticSearch, and the latest data for each carId is fetched from the cars database.
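Here is a toy version of that pattern in Python. It is my own sketch: CarSearchIndex is a hypothetical, very naive inverted index standing in for ElasticSearch (it does not use the ElasticSearch API), and a plain dictionary stands in for the cars database.

```python
from collections import defaultdict
from typing import Dict, List, Set

Car = Dict[str, str]


class CarSearchIndex:
    """Naive inverted index: token -> ids of the cars that contain it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def index(self, car_id: int, document: Dict[str, str]) -> None:
        for value in document.values():
            for token in value.lower().split():
                self._postings[token].add(car_id)

    def search(self, query: str) -> List[int]:
        """Return the ids of cars matching every word of the query."""
        tokens = query.lower().split()
        if not tokens:
            return []
        matches = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            matches &= self._postings.get(token, set())
        return sorted(matches)


def add_car(db: Dict[int, Car], index: CarSearchIndex, car_id: int, car: Car) -> None:
    """Write the car to the database, then add a document to the index."""
    db[car_id] = car
    index.index(car_id, {"make": car["make"], "model": car["model"]})


def search_cars(db: Dict[int, Car], index: CarSearchIndex, query: str) -> List[Car]:
    """Ask the index for ids, then fetch the latest data from the database."""
    return [db[car_id] for car_id in index.search(query)]
```

Only the searchable fields go into the index, so a price change in the database is visible in search results without reindexing.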

There is a good deal more information in the book that I haven’t touched on. I’m particularly fond of the book and think it might be one of my favorite books of the year. It should really be recommended reading for any engineer, especially new ones.

Building Microservices: Designing Fine-Grained Systems, by Sam Newman

Microservices are all the rage these days and while I am not sure that they are apt for every product – especially during the early stages – they ARE popular and for good reason.

Service Oriented Architecture (aka microservices) has become popular because it enables us to split a monolithic code base into decoupled services, each managed and working independently of the others. The parts of the application that need to be scaled can be scaled on their own. And teams can take ownership of specific services.

I read this book a few months ago, and I was able to appreciate and immediately apply what now seem like straightforward principles. I have worked on microservices at TransferWise, and currently at Maple Inside, with great success.

Here are some of Sam Newman’s key concepts that were the most important to me:

Microservices should be as small as possible, but no smaller
It’s very hard to create distributed ACID transactions. 2 Phase Commit is probably one of the better ways to do it if you really wanted to, but it is best to avoid it altogether.
Microservices are hard

“Microservices should be as small as possible, but no smaller.”

Microservices should be as small as possible, keyword “as possible”. While at first it might be hard to find where the boundaries lie, it’s often possible to split the code base into components with a very limited range of responsibilities (Netflix has over 900 microservices).

For instance, an eCommerce company might use an email service with a simple interface that takes in a from, a to, and a body. The same company might also have an order service that reads and edits orders based on userId. To retrieve an order, you might first have to query your user service for the user (using the user email) and then use that user’s id to make a second request to the order service. Although it will add latency to your app, it shouldn’t add that much time to answer the request if you have proper database indexes and your services are on the same local area network.

Some services however cannot be split, or only with so much difficulty that it is simply not worth it. Those services are best kept together, because they are already as small “as possible”. Read the next section to understand when that might be the case.

“It’s very hard to create distributed ACID transactions. 2 Phase Commit is probably one of the better ways to do it if you really wanted to, but it’s often best to avoid it altogether.”

It’s incredibly hard to revert a transaction across multiple services if it fails for whatever reason. I believe the consensus is that it is better to use 2 phase commit if you want to go down the route of having transactions, but 2 phase commit algorithms don’t guarantee ACID transactions; they only increase the likelihood of a transaction succeeding. The way 2PC works is that it checks that every service involved in the transaction is able to carry it out before actually doing it, thus reducing the risk of the transaction failing.
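The two phases can be sketched in a few lines of Python. This is my own simplified illustration, with hypothetical Participant and two_phase_commit names. It leaves out everything that makes the real protocol hard: timeouts, crashes between the two phases, and durable logs.

```python
from typing import List


class Participant:
    """A service taking part in the transaction (in-memory stand-in)."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        """Phase 1: vote on whether this service is able to commit."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit everywhere only if every participant votes yes."""
    # Phase 1: ask every participant whether it can go ahead.
    votes = [participant.prepare() for participant in participants]

    # Phase 2: commit if the vote was unanimous, otherwise roll back.
    if all(votes):
        for participant in participants:
            participant.commit()
        return True
    for participant in participants:
        participant.rollback()
    return False
```

Even in this toy version you can see the weak spot: if the coordinator dies after some participants have committed and before others have, the system is left in an inconsistent state.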

The best approach is however to not split up parts of your application that require ACID transactions over several areas of responsibility.

“Microservices are hard”

Microservices add complexity: communicating between microservices, deciding which standards to use, understanding where the boundaries lie, making the system resilient, debugging distributed errors, deploying microservices, monitoring, logging, everything is harder.

In “Monolith First”, Martin Fowler makes the case that it might be preferable for some projects to start off as a monolith, especially until the project has been confirmed to be useful. The main advantages are the time saved between iterations, and the fact that there is no need at that point to split a huge monolith up into a SOA. It also gives the team the time to learn more about the problem domain, and gain the domain knowledge that will help them create more stable boundaries between areas of responsibility.

“Other”

The book talks in some detail about problems common to all microservices: logging, monitoring, analytics. Because the application is split up, so are logs, errors, and databases. And as logging, monitoring and analytics are vitally important to production systems, a good deal of thought has to go into how to centralize logs split across multiple services, monitor multiple services, and merge the contents of SQL and NoSQL databases for analytics.

The book also talks about the “SOA bus”, which is sort of similar to the CPU bus in the sense that it’s an efficient way to make your parts talk together. At Maple Inside, for instance, we are using Google PubSub and an event driven architecture (EDA) so that services can publish events for other services, while other services are able to subscribe to specific events. An example might be a user service that publishes a “User Created” event that the email service (and others) might be subscribed to, so that it can send out a welcome email.
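The “User Created” example can be sketched with a tiny in-process event bus in Python. This is my own illustration and not the Google PubSub API: EventBus, the topic name and the handler are all hypothetical, and events are delivered synchronously rather than over the network.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Subscriber = Callable[[Event], None]


class EventBus:
    """Deliver each published event to every subscriber of its topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, topic: str, subscriber: Subscriber) -> None:
        self._subscribers[topic].append(subscriber)

    def publish(self, topic: str, event: Event) -> None:
        # The publisher does not know who is listening, if anyone.
        for subscriber in list(self._subscribers.get(topic, [])):
            subscriber(event)


# Example wiring: the email service reacts to an event from the user service.
sent_emails: List[str] = []


def send_welcome_email(event: Event) -> None:
    sent_emails.append(f"Welcome, {event['email']}!")


bus = EventBus()
bus.subscribe("user.created", send_welcome_email)
bus.publish("user.created", {"email": "ada@example.com"})
```

The user service never calls the email service directly, which is what keeps the two decoupled.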

All in all it’s one of the best books currently out there about microservices. It took me about 20 hours to read, and I highly recommend it to anyone new to the topic.

World of Math Part 2 (Pre-Calc)

It’s been a while since I last wrote, but I have continued learning math in my absence. I made it beyond pre-calculus on Khan Academy, and am almost finished with calculus at the time of writing.

It has been an enriching journey, and I have learnt a lot and refreshed a lot. Particularly applicable to my day-to-day job are vectors, matrices, logarithms and exponentials, all of which are important math concepts to know when designing algorithms.

There were also a few other concepts that, while not applicable to everyday CS, were great to learn: the unit circle (trigonometry) and differentiation/integration (calculus).

It took me about 3 months studying 20 hours per week to complete all content on Khan Academy up to Calculus.

The journey continues, although I may take a break after completing calculus to study aspects of CS a bit more relevant to my day-to-day job.