It's not hard. You need an accomplice. The accomplice swaps two items on the shelf. You take one of the
misplaced items. You get charged for the other. But that's not the point.
Two days ago, I was on a trip to Seattle, and of course I visited the Amazon Go store. If you are
out of the loop, it's a store without checkout: you walk in, grab items from the shelves, and
walk out. That's it.
Amazon doesn't explain how it works, but we can infer some of it from observations.
When you walk out, you don't get a receipt instantly;
the app sends you a receipt later;
the time it takes their servers to produce the receipt varies. We had three people enter the store: the
person who didn't spend much time got his receipt in 2-3 minutes, the accomplice got his in ~5 minutes, and it
took Amazon a whopping 15-20 minutes to serve mine.
We can conclude that tricky interactions get sent for human review, e.g. to Mechanical Turk,
which Amazon conveniently owns. It seems that a bunch of object recognition coupled with a bit of mechanical-turking does the trick.
But it is the future
Once I had satisfied my curiosity and managed to trick the store, I returned to use it for real.
I walked in, grabbed a bottle of water, and walked out. It took 22 seconds. I got a receipt for a bottle of
water later, but I didn't even check.
Folks, this is the future.
In his article "Invisible Asymptotes", Eugene Wei attributes a lot of Amazon's success in winning
retail consumers' hearts to eliminating friction. He writes,
People hate paying for shipping. They despise it. It may sound banal, even self-evident, but understanding
that was, I'm convinced, so critical to much of how we unlocked growth at Amazon over the years.
Interestingly, Eugene doesn't apply this to Amazon Go, but that's probably one visit to Seattle away. ;-)
Waiting in checkout lines is the worst part of the brick-and-mortar shopping experience; that's obvious to
anyone who has shopped at least once.
Therefore, Amazon Go is the future.
By the way, does anyone need a bottle of salad dressing?
A Thunderbolt 3 external GPU setup that doesn't weigh 7 pounds (3kg)? Is it even possible? Of course
it is, but you'll need to do some simple tinkering. This post describes how you can do it in a
couple of hours if you're so inclined. You too can lose 4 lbs off your Thunderbolt 3 eGPU within a day!
This is a Thunderbolt3-enabled DIY setup based on the Thunderbolt3-enabled
components sourced from Akitio Node Pro (this and
other Amazon links are affiliate). This is not the first of its kind. Here's an example from Dec
2016 linked to me on reddit. And there are many other well-known DIY setups
for pre-TB3 tech. My setup weighs 1.5kg including the power supply,
and only 0.47kg without one, and it can fit larger video cards.
This setup does not aim to save money but to achieve superior portability, utilizing
Thunderbolt 3's plug-and-play capabilities while keeping everything light and small. It cost about
$400, but at least it doesn't need its own suitcase now!
I happened to speak to some industry professionals who actually know something about electronics. They
suggested that removing the case might create significant EMF interference which would manifest in wifi
connectivity issues. I ran some tests and wasn't able to detect any such effect. Perhaps it
only appears when you're having a LAN party with 10 of those in one room. But if you're worried about EMF, get a
Faraday bag ;-)
And if you own a business that produces Thunderbolt 3 enclosures, could you
please, pretty please, just make a retail solution that weighs 2 lbs, 75% of which would be the power
supply?
On 4k screens, portability, and priorities
Would you believe that an employed, experienced software engineer does not own a laptop? Neither
did the friends I told. Boy, did it make for some awkward job interview
conversations. "Let's code something on your laptop!" they'd say, and I would respond, "Oh, I don't own one,"
and get that suspicious "oh, really?" squint.
(Answer: I just don't.)
I finally gave in when gearing up for a vacation in my hometown. I recalled all the times I wanted
to use a laptop: mostly when writing blog posts on an airplane (like this one), researching bike
routes when traveling, and editing photos I've just taken. Many of these tasks have been mostly,
though not completely, taken over by smartphones (Lightroom for phones and
shooting RAW from a phone camera dealt a pretty severe blow).
I rarely need a laptop when away from a power outlet: I'm not the sort of explorer who ventures into
a remote village and emerges with a 10-page magazine article. In fact, I don't really look at the
laptop that much when I travel. But when I do look at the laptop, I demand the premium experience.
Many ultralight laptops offer a 1920x1080 screen in exchange for an extra 2-3 hours of battery
life... Please, my cell phone has more pixels! I settled on the HP Spectre x360 13-inch with a
4k display.
What a gorgeous screen it is! It is easily the best display I've ever owned, and probably the best
display I've ever looked at. How does one make use of this artifact (well, apart from over-processing
photos in Lightroom)? Play gorgeous games with gorgeous 3D graphics. Like The Witcher
3. Or DOOM (the new one, not the 1990s classics). Or Everybody's Gone to the
Rapture's serene landscapes.
The problem is, for a screen this gorgeous, the Spectre's internal video card is simply... bad. The
integrated Intel UHD 620 graphics card does not believe in speed. After rendering just 1 frame
of the idyllic British countryside, the video card froze for 3 seconds, refusing to
render another frame until it was done admiring the leaves, and the shades, and the reflections. It
produces less than 1 FPS at best, and its 3DMark score of 313 puts it solidly in the bottom 1% of
computers that have attempted the test.
The test doesn't favor the brave--who would attempt to 3dmark an integrated ultralight laptop video
card?--but it does show how bad the result is. How can we improve?
When my desktop PC's GeForce GTX 660 Ti saw the first frame of DOOM, it was in similar awe:
confused, confronted with a task more demanding than anything it had faced before. After struggling a bit and
rendering maybe three frames, the video card decided to retire, pack its things, and move to Florida,
while I replaced it with the state-of-the-art-but-not-too-crazy GeForce GTX 1070. DOOM
instantly became fast and fun. So the question is now obvious.
Turns out, tinkerers have been connecting external GPUs to laptops since forever. Over time, GPUs
required more and more power and space, while the feasible, and hence demanded, laptop size shrank.
The GPU power consumption trend has finally been reversed, but laptops keep getting lighter anyway.
A laptop is just a PC stuffed into a small plastic case, so connecting a GPU to it should be just like
connecting one to a desktop. Laptop manufacturers would leave a "PCI extension slot" either
exposed as a supported connector or at least available inside the case for the bravest to solder onto.
There are a lot of external GPU (eGPU) do-it-yourself and out-of-the-box solutions available.
But then, Intel developed a further extension of USB-C called Thunderbolt 3. The previous
generations of the interface were also named "Thunderbolt"; it's just that the lightning was meek and the thunder quiet.
eGPU after Thunderbolt 3
Apparently, not all graphics adapters are compatible with Thunderbolt 3, or with
the specific hardware I used. For example, I wasn't able to make my GeForce GTX 660 Ti
work with it (even before I took everything apart, if you must ask). My guess would be that older video
cards are not compatible. If in doubt, check this forum for compatibility reports.
Thunderbolt 3 is no magic. It's simply a standard for USB chips with higher wattage
and throughput... so high and fast that it allows you to connect, say, displays or even graphics
cards over USB. It "merely" quadruples the throughput of the USB 3 standard, and now you can do
plug-and-play for your video card. You would just buy a thing to plug your video card into, and
connect that thing to USB. Easy!
So all I need is to buy that "thing". Easy! There are plenty of Thunderbolt 3 "things"; here, take a look.
Notice something? That's right, they are all freaking gigantic and weigh 3-4 times more than the
ultralight laptop itself. Here, I bought one, and it's the size of an actual desktop computer I own!
The manufacturers are effectively saying: "Want Thunderbolt 3? Buy a 3 kg case!" Look, the Akitio Node Pro has a
freaking handle! A handle!
It didn't use to be this way. Before Thunderbolt 3 enabled plug-and-play, hackers still found ways
to attach an eGPU, as shown above. These solutions are tiny, and they cost pennies! How
do we get something similar with Thunderbolt 3?
My other option here would be a smaller external GPU like the Breakaway
Puck, which is indeed both smaller and cheaper. I decided against those, as I would have to
buy a new GPU less powerful than the one I already own. Besides, the power supplies
included with those are lackluster: they cite portability concerns, but they under-deliver still.
On top of that, the total weight would still be more than 1 kg, while delivering significantly less power
than the bigger enclosures. The bigger enclosures have enough power to both charge the laptop and supply
the GPU with enough wattage to churn those FPS.
Some speculate that Intel takes a pretty big cut in licensing fees for every Thunderbolt 3
device produced. Since they say it on the internet, it must be true. (See also
here; scroll to the middle.) This explains the $200+ price. But it does not
explain the 7 lbs of scrap.
It's time to take the matter into my own hands.
"When There's Nothing Left to Take Away"
...perfection is finally attained not when there is no longer anything to add, but when there is
no longer anything to take away.
So we're going to turn this
The procedure consists of two steps: disassembling the original box, and mounting the
enclosure onto a chunk of wood. Before we start, please understand the risks of this procedure, and
accept full responsibility for the results.
Tinkering with the device in a manner described in this post will definitely void your warranty, and
putting a source of heat next to a piece of wood is likely a fire hazard. Do not attempt unless you know what
you're doing.
We'll need the following ingredients:
Akitio Node Pro (to source the TB3 extension cards) (amazon);
a power supply powerful enough to
also charge the laptop (unlike, say, the non-"Pro" Akitio Node)! I can confirm it does. You
should shoot for 450-500W+ for that: the laptop charger alone draws 100W, and you can look up your
GPU's power draw in its spec sheet.
It was reasonably straightforward to disassemble the box. I needed T8 and T10 Torx screwdrivers as
well as the expected Phillips one. I got the T8 and T10 from a brick-and-mortar ACE Hardware store.
If you're curious, hex and flat screwdrivers only got me so far, until I faced a Boss
Screw: a T10 right next to a board. That's when I gave up and went to the hardware store:
Basically, just unscrew every bolt you can see and remove the part if you can; then find more screws
and repeat. This bit is tricky: you need to unplug this wire. I couldn't find any more ways to unscrew anything
else, so I just extracted it using a thin Allen key.
What we need are these three parts: two boards (one with the PCI slots, the other with the USB ports) and
the power supply. Look, they weigh only 1kg, as opposed to the other, non-essential parts that weigh 2.3kg.
Putting it back together (without the scrap)
The bottom board, once dismounted, reveals that it can't stand on its own and needs to be mounted on
standoffs at least 1cm tall. I decided to mount it onto a wooden board, which needs to be at least 3x5.
This board set worked, albeit only 1 of its 5 boards was rectangular (the fit is
pretty snug, so you only get one chance).
Wait, how did I know where to put them, and how did I secure them? Simple: I drilled the wood and
glued the standoffs in. I first tried to mark the spots to drill by putting nails through the mounting
holes, like so:
This did not work! I wasn't able to drive the nails in straight enough, and missed the correct
spots by a millimeter or so. I fixed it by tilting one of the standoffs a bit, but I did resort to
a different method of marking the remaining holes: drilling right through the board!
This looked sketchy, but it worked better than trying to hammer nails in vertically.
I used super glue to secure the standoffs to the wood. For added security, I put a bit of sawdust
back into the holes for a tighter grip. Some mention epoxy glue, but my dad said it's unsuitable
for gluing wood to metal, so I didn't research the question further (besides, I didn't have any,
and I didn't want to go to the hardware store again).
I practiced mounting standoffs on a different board first, and I highly recommend you do the same.
I only had one board at the time, so I couldn't afford to screw it up; if you just get more
boards, it'll be easier.
Finishing touches and results
After I plugged in the cards, the setup seemed a bit unstable. After all, that is a heavy, double-slot
monster of a GPU. So I added back one of the previously discarded assembly pieces and secured it
with a screw I found. However, I later lost that screw while on the road, ran the setup
for hours without the extra piece, and it worked well and didn't fall over (duh).
So here it is (without the power supply)
And here it is without the GPU (and without the power supply either)
The final setup weighs just 1.3 kg including the power supply, and 0.47 kg without it.
To improve portability, I used only the Phillips screws when putting things back, making sure
no T8 or T10 screws are needed, so I can travel with a regular Phillips screwdriver. Make sure
to pick a large enough screwdriver to unscrew that tricky bolt: I tried a small
screwdriver one might use for watches and couldn't get enough torque out of it.
And we're done. I ran a variety of tests and benchmarks. Note that I ran all benchmarks on the
laptop's internal display.
3DMark Time Spy
I ran the 3DMark Time Spy benchmark multiple times; here are the scores from one run. I also ran it on my PC
(same GPU, but a different, older CPU) to check whether switching from a PC to a laptop affects performance.
My desktop runs an Intel(R) Core(TM) i7-3770, whereas the laptop runs the much more modern Core i7-8550U. However,
it's known that CPUs haven't made much progress in single-core performance over the last several years; most
improvements have been in portability and energy efficiency.
Unfortunately, I didn't use 3DMark Pro, so I couldn't force 4k on the desktop; it ran at a lower
resolution. I suspect they'd be on par otherwise.
So it seems that the eGPU setup runs as well as the desktop setup with the same GPU (but a way older CPU).
I used the CUDA-Z software recommended by egpu.io to measure throughput.
It seems the connection does churn 2 GB/s of data each way, which is good. Overall, I don't really know how
to interpret the rest of the results.
I've played many hours of Witcher 3 (2015) on max settings to great effect. There was no lag, and
I enjoyed beautiful and responsive gameplay. I also played a bit of DOOM (2016) on Ultra settings, and it was
a bit too much (I got 20-30 FPS when walking and in most of the fights, but several simultaneous effects lagged a
bit). Non-Ultra settings were a blast though!
Both games were enjoyed in 4k resolution. It was awesome.
Portability and features
The laptop charges while playing; just make sure to get a 100W-enabled cable.
As an added bonus, Ubuntu Linux 18.04 (bionic) recognized the plugged-in video card without any
issues or configuration. I haven't yet tested machine learning acceleration, but I'll update this post when I
do.
Field Test Number One... oops!
How does this setup fare outside of benchmarks? I got several hours of continuous gameplay
until the stock Akitio power supply died of overheating and never recovered. The boards, the
GPU, and the laptop were OK, but the power supply just stopped working.
I can't quite tell what prompted it, but I did notice that the fan didn't turn on. It used to, but
now it didn't. I must have damaged the supply when transporting it in the checked-in luggage.
Instead of trying to recover, debug, and fix the dead power supply, I just bought a new one. I
immediately noticed the differences between Akitio's power supply and a typical ATX power unit.
Akitio's power supply is smaller than ATX, and it weighs less (0.8 kg vs 1.5 kg)
...it only has PCI-E cords, whereas an ATX supply has cords for everything
...it has two PCI-E power connectors, but they are on separate wires. The ATX
power supply I got has both attached to the same wire, and the distance between them is quite short
...it turns on when the switch on the back is flipped, whereas a normal ATX power supply requires
another switch to power up, which motherboards usually supply but we don't have yet.
Since the original supply was 500W, I tried to match it and settled on the 550W unit my local store had in
stock. You can choose anything that works; I was limited to what my local electronics outlet carried.
So I bought a Corsair RM550x, a 550W modular supply, which featured:
slightly lower weight than the alternatives (we pulled up the comparison table at the store,
and even less powerful units were listed as heavier);
modular wiring, so I could discard the excess wires these units usually carry for the ATX computer internals;
a somewhat higher price, which I accepted because I didn't want to destroy another power supply.
You also need to short-circuit two pins so the rear power switch actually turns the unit on (the
well-known "paperclip trick": jumping the PS_ON pin to a ground pin on the 24-pin ATX connector).
I also had to add some extenders. Note that there are several ways to short-circuit the power
supply's pins, so don't be confused if you see seemingly conflicting instructions.
However, I'm not quite satisfied with the result. The Corsair RM550x is large. While it doesn't have the
cooling issues the original Akitio supply did, I feel there's a middle ground here: something not as
large and more specialized.
When selecting an ATX power supply, also buy a PCI-E extender
along the way; 6-pin is enough (the Akitio "motherboard" piece is powered via a 6-pin PCI-E plug).
Most likely, the dual PCI-E connectors are designed for two video cards placed right next to one
another, whereas our setup needs a 10-15 inch wire.
You might also get by with a splitter instead.
Wi-fi interference testing
I've heard that the major reason Thunderbolt 3 eGPUs come in huge enclosures is to contain the electromagnetic
emissions. Supposedly, Thunderbolt 3 boards emit quite a bit of EMF radiation, and this can cause Wi-fi
interference.
I wasn't able to find evidence of that. I measured Wi-fi speeds and signal strength and
wasn't able to notice a drop. That doesn't mean there was no packet loss: perhaps the Wi-fi connection
was indeed degraded, but my 100 Mb/s broadband was too slow for it to actually affect the speeds. It also could be that
you'd need, say, 5 Thunderbolt 3 cards to emit enough EMF.
I used my cell phone to measure signal strength and the OOKLA speedtest to measure up- and download speeds.
I placed the cell phone in three places: 2 ft from the Wi-fi router, 12 ft from it, and in the other
room. I also put the eGPU into three states: 2 ft from the Wi-fi router, 10 ft from it, and completely
off. Here's what it looked like with the eGPU 2 ft away from the Wi-fi; you can see the Ubiquiti Wi-fi "White
Plate" in the top right corner:
I was running the Unigine Superposition benchmarks while measuring the signal strength and the download
speeds, in case the EMF interference only appears under load.
The science log is here. The results are in the table below; each cell contains "Download speed
(Mb/s) / Upload speed (Mb/s) / Signal strength (dB)".

| Phone position | eGPU off | eGPU 10 ft away | eGPU 2 ft away |
|-------------------|------------------|------------------|------------------|
| 2 ft from router | 114 / 14.0 / -31 | 114 / 12.8 / -22 | 116 / 13.3 / -22 |
| 10 ft from router | 118 / 14.2 / -38 | 119 / 13.1 / -34 | 116 / 13.4 / -35 |
| Other room | 110 / 13.8 / -53 | 116 / 13.8 / -49 | 116 / 13.6 / -54 |
So this means my Wi-fi stays pretty much unaffected regardless of eGPU presence. If there was packet drop, it
didn't affect the 100 Mb/s connection.
Buy a longer Thunderbolt 3-enabled USB-C cable ⚡
The USB cable that comes with the Akitio Node Pro is quite good but a bit too short. No
wonder: a longer cable will cost you. A detachable cable that affords the required throughput needs
to conform to the Thunderbolt 3 standard and support 40Gbps of data throughput. I simply bought a
pretty long, 6ft cable by Akitio, hoping for the best compatibility, and I've had no issues
with it. The field tests were done on that longer cable.
Putting the enclosure farther away reduces noise and improves mobility: you can put the setup close to the
power outlet and connect to it from the other side of the table.
Use that extra fan
The Akitio Node Pro had one extra fan to draw air into the case, and it is now unused.
Optionally, you can attach it to the board where it originally was. If I were to do this, I would
also screw some leftover standoffs onto the fan so it gets better intake: the original case had a
curve to separate it from the ground. However, I got good enough performance out of the video card without it.
A way to reduce EMF exposure is to put the emitter into a Faraday cage or a special Faraday bag.
This one actually works. As probably do others; but just a month ago there were many scams on
Amazon selling "Faraday cages" you were supposed to place on top of your router, which would "block EMF", improve your
health, and make your Wi-fi faster at the same time. 😂 This Faraday
bag actually works (I tested it by placing a cell phone inside and calling it, to no avail). I can't tell you
that you have to use it, but maybe it could put your mind at ease.
Final Evaluation and notes on performance
It works. Moreover, the power supply doesn't overheat: I have never even seen its fan turn on.
Perhaps the larger size allows the internals to be spaced better, or perhaps it's just of better
quality.
I've put in about 10 hours of gameplay on the new setup, including about 5 continuous hours. As I
test it out more, I'll update this post with better long-term evaluation data.
The performance is stellar, even on the internal display. Various measurements (done by other
people) show that using the laptop's internal display, or connecting an external display to the
laptop (as opposed to connecting it to the video card itself), saturates some bottleneck and results
in a performance drop. My findings are consistent with this blog post on egpu.io: with
a card like the GTX 1070, you won't notice the drop because you're getting 40+ FPS anyway.
After playing Witcher 3 at "Ultra High" quality (!), with full-screen antialiasing (!!), at 4k
resolution (!!!), for several hours (it was hard to resist anyway), I am happy to call it a success.
Moreover, the setup survived a transcontinental flight in the checked-in luggage, wrapped in
underwear and padded with socks. And now just one challenge remains: getting it through TSA in a
carry-on and proving that this bunch of wires is a gaming device.
This post contains a speed run of setting up AWS Elastic Beanstalk. It’s easy to lose your way in AWS
documentation, and I hope I can make it easy for you here.
We’re going to set up a very simple application that has only one type of instance. In my case, this instance
serves one Docker image with a simple web server listening on port 80. Hopefully, when this guide becomes
popular, the instance will scale up (wow!). Otherwise it will just be the cheapest thing AWS can do for
some simple custom code with no disk or mutable state (aka a database).
Choose the right product (which is “Elastic Beanstalk”)
The first challenge is to not confuse it with other, less useful Amazon products. It’s harder than it
seems. You do not want Amazon Elastic Container Service, despite the word “Container”
in its name, while “Elastic Beanstalk” only seems to offer beans, or stalking, or both. The “Container Service” is a
counterpart of Elastic Beanstalk that requires you to set everything up manually, including your Elastic Private Network,
Managed Instance Group, Elastic Load Balancer, and other Crap You Don’t Care About. On top of that, you will
have to manually update Docker installations. “So uncivilized.”
Configure private Docker Registry
The next challenge is to find a way to deploy Docker containers to your private repo. You need Amazon
Elastic Container Registry (this one both has the word “Container” in it and is actually useful). Create a
repo for your server image (let’s call it megaserver). Optionally (later), add a “Lifecycle Policy” that
deletes old images automatically. But for now, you need to configure the client.
Click on “View Push Commands”, which will show you something like this:
Go to the Elastic Beanstalk, and click “Create New Application” in the top right corner. Choose some
sensible name; it doesn’t matter. Then, inside the application, choose “Actions”, then “Create new
environment”. Choose “Docker” as the platform.
Make sure that your ~/.aws folder did not exist before running these commands! If you were playing with
other AWS products, you might have already written something there and corrupted it somehow. So if some auth
commands don’t work, try removing the folder and running the commands again:
rm -r ~/.aws
$(aws ecr get-login)
(The last line means “run aws ecr get-login, then run the command it printed to the console”. It prints a
docker login command that authorizes docker push to upload containers to the AWS registry.)
Now, your keys should be in ~/.aws/credentials. Mine looks like this:
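It follows the standard INI-style format; here's a sketch with placeholder values rather than my actual keys:

```
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = <your-secret-access-key>
```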
As part of the Elastic Beanstalk command-line tool workflow, you’ll need to create ebs/Dockerrun.aws.json in
your working folder. See here for the documentation. Here’s the file I created (note how it uses
the image repository name megaserver we created above):
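A minimal single-container sketch of such a file (version 1 of the format; the account ID and region match the placeholders in the push commands below):

```json
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "12345567890.dkr.ecr.us-west-2.amazonaws.com/megaserver:latest",
    "Update": "true"
  },
  "Ports": [
    { "ContainerPort": "80" }
  ]
}
```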
docker tag boring_volhard 12345567890.dkr.ecr.us-west-2.amazonaws.com/megaserver:latest
docker push 12345567890.dkr.ecr.us-west-2.amazonaws.com/megaserver:latest
eb deploy --verbose
If all the permissions are correct, your image will start deploying and will be available within 1-2 minutes.
Update the application
When you want to deploy the next version, just repeat the same commands. You’ll see how the image is
being updated on the environment page.
docker tag boring_volhard 12345567890.dkr.ecr.us-west-2.amazonaws.com/megaserver:latest
docker push 12345567890.dkr.ecr.us-west-2.amazonaws.com/megaserver:latest
eb deploy --verbose
If you set the environment parameters correctly (I don’t remember if you need to change the defaults or not),
it will perform a rolling update, replacing your running containers one by one.
Here’s the configuration that works for me. Note the “Rolling updates and deployments” section in the middle. This
website can scale to more instances based on network I/O (particularly, the O).
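As far as I can tell, the rolling-deployment part of that console configuration can also be pinned in code via an .ebextensions config file; a sketch (option names are from the aws:elasticbeanstalk:command namespace):

```yaml
# .ebextensions/rolling.config — deploy to one instance at a time
option_settings:
  aws:elasticbeanstalk:command:
    DeploymentPolicy: Rolling
    BatchSizeType: Fixed
    BatchSize: 1
```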
Keep the change
My bill for one micro instance is $0.85 per day, which brings it to… $26 a month. In a quirk of Amazon
Billing, it says I’m paying the most (65%) for the Load Balancer rather than for “running the instances”
(27%). Based on that, it seems to me these cost breakdowns are made up anyway. Consider this the minimum price at
which one can run AWS Beanstalk with Docker containers.
Ten years ago, I thought this website should be a content platform with video hosting, blog
platform, and other cool things. It was an essential learning experience: I’ve chased performance
bugs and reinvented highload
wheels. Today, armed with several years of Industry Experience™, I
am ready to present the highest-grade advice of a seasoned web engineer.
The right architecture for this blog is a set of static webpages served off a file system by a simple web
server such as Apache or nginx. One tiny shared VM would serve way more visitors than my content could ever
attract.
And I did just that. There’s no dynamic content here anymore. So… WordPress, you say?
Why do this again?
I hated managing the website manually, so why did I choose to endure one more manual-ish setup? First, WordPress,
especially the hosted one, is for wimps who love PHP. Second, I wanted to yet again immerse myself in whatever the
current hype was, in order to learn the latest cool tech. And the hype of 2017 was:
implemented in Go
deployed onto a managed cloud
with managed services for email, database, etc
…and maybe even serverless?
Everyone seems to like Amazon. I would have chosen Google Cloud Platform, of course, if I were
optimizing for quality and reliability. However, I chose AWS because it’s a) the hype; b) not where
I work. I’ve had enough exposure to Google Cloud as an insider, and I did want to expand my
horizons.
But how would I choose what exactly to do? What should my architecture achieve? Security,
long-term maintainability, and… money, of course.
My previous version of the blog ran on Gentoo Linux. It was hell, and it became unmaintainable. I rant about
it at length in my previous post.
Anything that updates quickly would work. I used whatever Linux. I literally don’t remember which one;
I need to look…
$ head -n 1 Dockerfile
What is it? I guess it’s Ubuntu or something else Debian-based, but I literally don’t care. OK.
Let’s optimize for… money!
Another motivation I had was to reduce cost. Before the migration, the single VM that
served my shitty Ruby on Rails app cost ~$50/month. Gosh, I could buy a brand-new computer
every year… So, can I do better than shelling out the whopping
$720/year for this stupid blog nobody reads? Can I do it at, say, $10 or $20 a month?
…spot prices on their GPU compute instances were $26 an hour for a four-GPU machine, and $6.50 an hour for a one-GPU machine. That’s the first time I’ve seen a computer that has human wages.
Turns out, I could get it cheaper, but only if I didn’t use the managed services. Managed is costly.
The smallest (tiniest) managed MySQL database costs $12/month on AWS. It supplies me with 1 CPU
and 1 gig of memory. This blog doesn’t need a damn CPU dedicated to the DB!
It doesn’t even need a DB! It needs to copy static pages from a basic file system to your screen!
Rabbit-holing is another problem. So what if I want AWS-managed Git? Yes, sure! That’d be free,
or $1 a month. Plus access to the Keys. The Keys would be $1/month. Oh, and logging of
key usage? That would be another whatever-per-access unless there are 10k accesses, but don’t worry,
for most workflows that’d be fine!..
Ugh. Can I get something more predictable? One way is to search for clarity, and the other is to get rid of things.
Getting rid of this. And of that.
Turns out, I can get by on the internet with free services for code storage, bug tracking, and file,
photo, and video storage. The $1/month I pay to Google covers 100 GB of crap, which I’m yet to fill.
GitHub and YouTube are the staples. I’ve explained more on private Git and other things in the
previous post.
Do I even need rendering?
So what about converting human-writable rich-text formats to HTML? WordPress would be too lame,
but I can’t justify my own rendering engine anymore. The highlight of the previous version was, of course,
the custom context-free parser generator that compiled a formal grammar into Ruby code.
It took a sweet 1-2 seconds (!) to render each page of this website (not a joke).
That thing burns in hell and gets replaced with Markdown.
There would be no database. The contents (i.e. the text on this and other pages) would be versioned
in Git and stored on a “local” disk (disks attached to only one machine are considered local
even if they are actually remote; that’s how cloud architectures work now).
If I wanted to change the contents or to post something new, here’s what my workflow would look like:
SSH onto the server.
Use vim to add or edit a file in a special folder.
Use git to push the change into the “release” branch.
Run a Ruby script that uses Markdown to format all the blog pages
into HTML. It uses ls to get the list of all pages and builds the [blog archives][archives]
page. That’s not much different from a DB-based architecture: after all, databases evolved out of
simple collections of files arranged into folders.
rsync the code onto the remote disk.
That’s it. nginx would serve the generated pages. Since making a new post invalidates all pages
anyway because you’d see it pop up in the sidebar to the left, there’s no need to be smart
about cache invalidation.
What a genius idea! It’s so obvious that I’m surprised nobody…
Docker can combine the local disk contents with a recipe called Dockerfile to produce a VM-like
image that could serve a website.
Now, it would be a fun idea to have a self-hosted Docker image. The image would contain the Git
repositories for website content and its own Docker files, and it would build itself and redeploy
itself using AWS APIs. I think it could work…
But let’s start with something simpler. Let’s simply build the Docker image from within another Docker container. It’s
easy enough, and it protects me from the loss of my home machine. In the end, the workflow looks like this:
The “Build” Docker image contains all the build tools, including the Hugo website builder.
Static contents (including the images and Markdown files) are in a git-versioned local folder.
The Build image runs hugo to create a folder with HTML, CSS, and other files that constitute the
entirety of the website.
Another Dockerfile describes the “Final” image, which combines [nginx:latest][nginx-docker] and the static
files created in the previous step.
The speed run of how to set up autoscaled container service on Amazon Cloud is in a separate post.
A speed run is a playthrough of an otherwise fun game done as fast as possible (primarily, on video). Setting up AWS is a very fun game. For example, it's easy to set up ECS, and then discover, halfway through, that you've made some wrong choices at the beginning that are unfixable, and you have to start over.
It works, it is blazingly fast, and it’s pretty cheap for a managed platform with rolling updates. My bill is
$0.85 per day, which brings it to… $26 a month. In a quirk of Amazon Billing, it says I’m paying the most
(65%) for the Load Balancer rather than for “running the instances” (27%). All these costs are bogus
Believe me, I tried to delete the Load Balancer (this kills the service) or switch to single-instance
configuration (this simply doesn't switch and quickly reverts back--sic!). I couldn't bring it
below $25, and I'm OK with that. Although, I could run this on App Engine for free...
I’ve achieved my goals: cut costs by almost a factor of 3, and reduced page load time by a factor of 10-100.
I fixed my blog. But I set out to fix it not to blog about fixing it; I wanted to explore something more
interesting. But perhaps, next time. ;-)
As I mentioned in my previous post, the previous version of this blog ran on a VM
powered by Gentoo Linux. Partly, that was the reason it was such a big mess and frankly,
a security hazard.
You see, I’ve become literally scared to update Gentoo. Installing updates on Gentoo is a challenging
puzzle game; in fact, it’s an NP-hard problem. It is a drain on your time, it’s a
family tragedy, and it is plain and simple a security threat. But let’s start at the very beginning,
when I first saw a Linux thingy in college….
In the beginning, there was Windows 3.11 for Workgroups. The first computers I interacted with ran
MS-DOS or Windows 3.11. Then Windows 95, and 98, and finally Windows XP. I thought Windows was all there was.
And then I went to a CS class in college, and wham! Gentoo.
I immediately fell in love with these green [ ok ] marks that show when a
portion of the system has completed loading. Unlike the never-ending scrollbar of Windows XP, it
fosters an immediate connection with the inner workings of the machine. You feel involved. You feel
in the know. You feel powerful.
So when I needed to choose a Linux distro to complete my coursework, it was Gentoo.
The main feature of Gentoo is that you build everything from source. Nothing connects you to
the inner workings more than literally witnessing the gcc invocations as they churn through the
kernel you manually configured, through the window manager, or a new version of perl. That’s
right, every single package–including the kernel–is rebuilt on your local machine. Why?
One thing is that you can enable unsafe optimizations and tie everything to your machine. Those
off-the-shelf distros have to work on a wide range of machines, but with Gentoo, you can compile
everything with gcc -O3 -march=corei7 -fenable-unsafe-life-choices.
It is insanely satisfying to watch. You haven’t lived if you’ve never seen Linux software compile.
If you haven’t seen it, watch it. It’s worth it. It’s like watching fire.
Another selling point: you can disable features and Reduce Bloat™. You don’t want to build a
desktop environment? Fine–all your packages will compile without the GUI bindings. You never use
PHP? Fine, no PHP bindings. You don’t like bzip2? (Wait, what?) You can disable that too!
You just specify it in the USE flags in your make.conf, like USE="-pgp -gtk -qt4
-bzip2", and then when you emerge your packages, they’ll build without them. (emerge is
Gentoo’s apt-get install).
Awesome. Wait, what did you say about Bzip2? You can compile your system without bzip and only with
gzip? Why do you even care? That’s because you’re a
college kid with a lot of time on your hands. Emerge on.
So I emerge. Back in 2005, it took really long to compile KDE 3. We would leave it
overnight to compile, and pray that our particular selection of USE flags
didn’t make it fail.
And then you try to update it. emerge -uDpav, I still remember it. It recompiles all your outdated packages
… or not. If you somehow forget to update the system (e.g. you leave for a vacation, or your cron
crashes) then come back in two weeks and try to update it… it will fail to compile. That’s when you’re introduced to dependency twister.
Since the system is its own build environment, every next version should be buildable on top of the
previous version. But sometimes it’s just not. It just doesn’t build. Some library is too old,
but in order to compile a new version, you need to downgrade another library. Or worse, build dependencies
form loops. Imagine package Foo needs a new version of library Bar to compile, and the new
version of library Bar requires a new version of Foo–this actually happens sometimes.
Then, you’d have to resolve them by temporarily disabling or re-enabling USE flags. Or randomly
rebuilding subsets of your packages (via helper tools like revdep-rebuild). Or applying the
updates in the correct order, but you need to figure out the order first.
As a result, your system quickly rots and becomes a security hazard. A computer that hasn’t been
updated for years and is open to the network is a security risk. My access logs showed that
automated bots were constantly trying to hack the website (polling URLs like /wp-admin/admin.php).
So that’s it. Unless the system can apply security updates quickly and reliably, it’s a security
hazard. Gentoo cannot.
I got tired of playing dependency twister around the time I graduated. I also got tired of trying to
update Ruby’s ActiveRecord every now and then. Nothing like doing this for several years to really
make you appreciate App Engine and similar products.
So I bid Gentoo farewell and moved on. I moved on to Whatever Linux keeps my Docker containers
up to date… which is now, I believe, Ubuntu? I don’t really know, and I no longer care.
This website just got a new look (with mobile layout), a new web hosting, and a new technology that powers it. I’ll post the summary in the coming days. And otherwise, just enjoy fewer lags and 500 errors. (Except today when I accidentally routed the traffic to the wrong Load Balancer. >_<)
I’ll be writing separate posts about these migrations, but here’s what went where.
Crash due to division by zero? How about multiplication by zero?
I recently advanced to the next level of floating-point arithmetic caveats. Division by zero is something we all know about, but multiplication by zero can be just as harmful.
An algorithm computed the weights of entities, then converted them to integers, as in
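(The original snippet isn’t preserved here; this is a minimal reconstruction of the pattern described below, with illustrative names `x`, `y`, and `compute_weight`:)

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cmath>

// Hypothetical reconstruction: compute a weight x * exp(y),
// cap it at INT_MAX, and convert it to an int.
int compute_weight(double x, double y) {
    double w = std::min(x * std::exp(y), (double)INT_MAX);
    int result = (int)w;  // undefined behavior if w is NaN;
                          // on x86 you typically get -2147483648
    assert(result >= 0);  // fails when x == 0 and y is large
    return result;
}
```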
This code may fail the assertion. Imagine y is large (say, 1000), so exp(y) no longer fits in a double. The value of exp(y) will be +Infinity in this case. Surprisingly, the comparison will correctly understand that INT_MAX is less than +Infinity, and the check will pass as expected. But here's where multiplication by zero kicks in.
What will happen if x is 0 and exp(y) is infinity? Any result would be mathematical nonsense, unless it's the special value "Not a Number". min() will then also return NaN, and integer conversion will happily convert it to... -2147483648. The assertion fails, and the production job crashes because it does not expect the result to be negative. We're multiplying two positive floating-point numbers; how can the result be negative?
Yet it is. All because of multiplication by zero.
Translucent areas depict waiting for something; incomplete lock statements have dashed border. Note that it doesn't matter in which order the top two acquisitions are made.
Can a cryptic entanglement of your mutex locks lead to a deadlock? It sure can. Deadlocking is the second thing your parents tell you about mutexes: if one thread acquires A, then acquires B before releasing A, and another thread does the same in the reverse order, the threads may potentially deadlock. And deadlock they will if the two first acquisitions happen to complete in separate threads. Here's the code of the two threads, and the sidenote depicts the problematic execution:
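(A reconstruction, not the original listing; std::timed_mutex with a timeout stands in for a plain lock() so the demonstration can observe the deadlock and still terminate:)

```cpp
#include <chrono>
#include <mutex>
#include <thread>

std::timed_mutex A, B;

// Returns true if the classic lock-order deadlock was observed:
// X holds A and cannot get B, while Y holds B.
bool demo_deadlock() {
    using namespace std::chrono_literals;
    bool x_stuck = false;
    std::thread x([&] {
        A.lock();                            // X: acquire A first...
        std::this_thread::sleep_for(100ms);  // let Y grab B
        // ...then B. With a plain B.lock() this would block forever;
        // the timeout lets the demo finish.
        x_stuck = !B.try_lock_for(100ms);
        if (!x_stuck) B.unlock();
        A.unlock();
    });
    std::thread y([&] {
        B.lock();                            // Y: acquire B first...
        std::this_thread::sleep_for(400ms);
        if (A.try_lock_for(1000ms))          // succeeds once X gives up
            A.unlock();
        B.unlock();
    });
    x.join();
    y.join();
    return x_stuck;
}
```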
But what about reader/writer locks, also known as "shared/exclusive" locks? Let's recap what these are first. Sometimes, to achieve greater efficiency, a mutex implementation supports two flavors of locking: reader and writer (otherwise known as "shared" and "exclusive"). If several threads only want to read from a shared variable, there's no need for each of them to wait for the others. That's where you'd use a ReaderLock operation on the mutex guarding the variable. If a thread wants to write, it invokes WriterLock, which means "do not run any readers or writers while I'm holding the lock". Here's a wiki entry for reference, and here's the standard Java API.
Seemingly OK Reader Lock Execution
We no longer have a "must-before" relation between B locks in two threads, so they don't deadlock. This looks OK, but it actually is not!
So imagine that both threads X and Y happen to use one of the locks as a reader lock. It seemingly should prevent deadlocking: if, say, B is a reader lock, then the execution specified above will make progress: B.ReaderLock() in thread X will not block waiting for thread Y to release it... right? Here's the code for clarity:
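(A sketch of that version; std::shared_mutex stands in for the reader/writer lock, with lock_shared playing the role of ReaderLock:)

```cpp
#include <mutex>
#include <shared_mutex>
#include <thread>

std::mutex A2;         // exclusive-only lock
std::shared_mutex B2;  // reader/writer lock

// Both threads take B as readers, so the AB/BA pattern makes progress:
// two reader locks on B do not exclude each other.
bool demo_progress() {
    using namespace std::chrono_literals;
    int done = 0;  // both increments happen while holding A2
    std::thread x([&] {
        std::lock_guard<std::mutex> la(A2);
        std::this_thread::sleep_for(100ms);
        std::shared_lock<std::shared_mutex> lb(B2);  // granted: Y only reads B
        ++done;
    });
    std::thread y([&] {
        std::shared_lock<std::shared_mutex> lb(B2);
        std::this_thread::sleep_for(100ms);
        std::lock_guard<std::mutex> la(A2);  // waits until X finishes, then proceeds
        ++done;
    });
    x.join();
    y.join();
    return done == 2;  // no deadlock: both threads complete
}
```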
Turns out, reader locks can deadlock. You just need to make reader lock wait for another reader lock's release; how?
Many mutual exclusion implementations make acquiring threads "form a line" of some sort to ensure fairness: no thread should wait forever for a lock. So a thread that tries to acquire a lock--either shared or exclusive--waits until all threads that called L.WrLock() earlier exit their critical sections. Fairness is especially important when you have reader and writer locks: if you allowed any reader to proceed while another reader is holding the lock, your writers could "starve" waiting for quiescence among the readers, which may never happen on a highly contended lock.
So, to make a reader lock wait on another reader lock, we need a writer lock between them.
Deadlocking Reader Lock Execution
Here's how three threads can interleave such that you have a deadlock between reader mutex locks. The "blocked by" relationship between these reader locks transitively hops over a writer lock in some other thread Z.
Assume that, in the execution described earlier, before thread X attempts to acquire the reader lock on B, thread Z chips in and invokes B.WrLock(), and only then X calls B.RdLock(). Because of the fairness concerns discussed above, X's B.RdLock() starts to wait for Z to acquire and then release B. Z's B.WrLock() waits for Y to release B.RdLock(). Y waits for X to release A. No thread makes progress, and there's the deadlock. Here's sample code of all three threads:
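(A runnable reconstruction, not the original listing. std::shared_mutex makes no fairness promises, so a tiny writer-priority lock is hand-rolled here--queued writers block newly arriving readers, the fairness policy discussed above--and X's RdLock uses a timeout so the demo terminates instead of deadlocking for real. The RdLock/WrLock names just mirror the notation in the text:)

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal writer-priority reader/writer lock: a queued writer blocks
// newly arriving readers.
class FairRWLock {
    std::mutex m_;
    std::condition_variable cv_;
    int readers_ = 0, writers_waiting_ = 0;
    bool writer_ = false;
public:
    void RdLock() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [&] { return !writer_ && writers_waiting_ == 0; });
        ++readers_;
    }
    bool TryRdLockFor(std::chrono::milliseconds t) {
        std::unique_lock<std::mutex> l(m_);
        if (!cv_.wait_for(l, t, [&] { return !writer_ && writers_waiting_ == 0; }))
            return false;
        ++readers_;
        return true;
    }
    void RdUnlock() {
        std::lock_guard<std::mutex> l(m_);
        if (--readers_ == 0) cv_.notify_all();
    }
    void WrLock() {
        std::unique_lock<std::mutex> l(m_);
        ++writers_waiting_;
        cv_.wait(l, [&] { return !writer_ && readers_ == 0; });
        --writers_waiting_;
        writer_ = true;
    }
    void WrUnlock() {
        std::lock_guard<std::mutex> l(m_);
        writer_ = false;
        cv_.notify_all();
    }
};

FairRWLock A3, B3;

bool demo_rw_deadlock() {
    using namespace std::chrono_literals;
    bool x_stuck = false;
    std::thread y([&] {
        B3.RdLock();                         // Y: holds B as a reader
        std::this_thread::sleep_for(300ms);  // ...while Z and X arrive
        A3.WrLock();                         // Y: waits for X to release A
        A3.WrUnlock();
        B3.RdUnlock();
    });
    std::thread z([&] {
        std::this_thread::sleep_for(100ms);
        B3.WrLock();                         // Z: queued writer, waits for reader Y
        B3.WrUnlock();
    });
    std::thread x([&] {
        A3.WrLock();                         // X: holds A
        std::this_thread::sleep_for(200ms);  // let Z queue up on B first
        // X's reader lock must wait behind queued writer Z: the cycle
        // X -> Z -> Y -> X is complete. The timeout breaks it.
        x_stuck = !B3.TryRdLockFor(300ms);
        if (!x_stuck) B3.RdUnlock();
        A3.WrUnlock();
    });
    x.join(); y.join(); z.join();
    return x_stuck;
}
```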
Note that you will have at least one writer lock for B somewhere (because if all acquisitions are Reader Locks, there's no point in having the lock at all.) Therefore, the only way to prevent this kind of deadlock is to not distinguish reader and writer locks when reasoning about progress guarantees.
This kind of deadlock needs at least three threads in order to bite you, but don't dismiss it outright! If the Internet has taught us anything, it's that while a million monkeys in front of typewriters will not eventually recreate the entire body of Shakespeare's work, they will at least trigger all possible race conditions in our typewriters, no matter how contrived the corresponding executions seem.
I used to wonder what this Error 502 meant when I visited my favorite websites. When I retried, the error usually seemed to disappear, and I found it's linked with the service being overwhelmed with requests at certain times. But why do I see it? Why doesn't the web server start many threads, leveraging the OS capabilities of CPU sharing? If it queues incoming requests, why doesn't it just delay their fulfillment, making them wait in line longer? Just like, you know, in any cafe, bank, or public office where people form lines and get what they need--eventually?
There are three answers to this. First, a fast error is more valuable than serving all requests slowly. Second, there are interesting nonlinear effects that happen when a server at capacity isn't using queueing. Which leads to the third point: if more requests arrive per second than you can process in one second, you will never be able to serve them all. You have to drop some. Let's take a closer look.
Fast error is better than slow result
If a service consumes the responses of several other services, it can often ignore the output of one of them and still provide value to the user. An example in the paper linked above is Google web search, which can skip displaying ads, maps, and other additional information, and simply serve search results.
This alone could be a good reason to drop a request, but this doesn't sound that convincing for the frontend server that faces a human end-user. Let's explore two ways of keeping the requests.
Latency Grows Exponentially If You're Not Limiting Rate
Imagine we're not limiting the number of worker threads, and just start processing everything we receive. In other words, we do not keep a queue of requests, and rely on the operating system to provide fair distribution of computing power among working threads. Say, if we have 4 cores, and every requests takes 1 second of CPU work, we can serve at 4 QPS; this is considered "at capacity".
What would happen if the rate grows beyond 4; say, to 5? Requests that arrived at the beginning would not be complete when the pack of new requests arrives. At the beginning of the 2nd second, each of the 5 earlier requests would have 20% of its work left, and since the execution speed drops by half, they would complete only after 1.5 seconds. The requests that arrived at second 1 would have finished shortly after, but 5 more requests arrive, decreasing the speed by a factor of two again; the latency for these requests would be 2 seconds. The latency for the next pack, arriving at the 3rd second, would be as much as 2¾ seconds, etc. Here's an illustration of how CPU time is distributed among requests; requests that arrived in the same second have the same color:
Let's assess how the latency grows with rate. In other words, let's compute function L(t, r), latency of a request arrived at the moment t if the rate of requests is r times higher than capacity. For instance, if our capacity is 4 QPS, but there are 5 incoming requests, then the rate parameter r = 1.25.
First, the number of in-flight requests at the beginning of a given second, N(t), grows linearly with t. Indeed, N(t) < QPS*t; on the other hand, the amount of CPU time owed to previous requests is QPS*t*(r-1)/r, which equals QPS*t*(1-1/r). So, (1-1/r)*QPS*t < N(t) < QPS*t, which means that N is a linear function of t, i.e. N(t) = a*t, where a > QPS*(1-1/r).
Second, the CPU time allotted to a request within a second [t, t+1) is at most 1/N(t). The latency of the request is then the minimal x such that S = 1/N(t) + 1/N(t+1) + ... + 1/N(t+x) ≥ r. Since N is linear, this sum equals 1/(a*t) + 1/(a*t + a) + 1/(a*t + 2a) + ... + 1/(a*t + xa). From integral calculus, we recall that the sum 1/2 + 1/3 + ... + 1/x can be approximated as ln(x), so S ≅ 1/a * (ln x - ln t). Hence, x ≥ t * exp(a*r).
We proved that L(t, r) ≳ t * exp(a*r) where a > (1-1/r), so latency grows linearly with time, and exponentially with rate of incoming requests if we try to serve them all right away.
Practical results, however, seem to suggest that the exponent grows sublinearly with r. Latency definitely grows faster than polynomially, but the true function in the exponent remains a mystery. The closest I've come so far is sqrt(r), but let me juggle it a bit first, and I'll share my calculations and graphs. Here's the simulation script; built-in rational numbers in Python are cool, btw.
Newly arriving requests slow down the currently executing ones, prolonging their latency. There is a way to mitigate it: queueing.
Latency Grows Linearly with Queues
Imagine the most naive solution: a first-come, first-served queue. All incoming requests are stored in the queue and served in the order they arrive, the handler popping them off as soon as it has capacity (a free CPU share or an idle worker thread).
When the server is below capacity, the queue is always empty, and the latency of each request is L0, the time it takes the handler to construct a response. What happens when the server reaches capacity?
The latency of a request that arrives at time t (we assume a constant rate r) is L(t, r) = Q(t, r)*L0 + L0, where Q(t, r) is the queue size when the request arrives. The size of the queue is the number of unserved requests, which is t*(r-1). Therefore, L(t, r) = t*(r-1)*L0 + L0: with queues, latency grows linearly with both rate and time.
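(This is not the linked Python script, just a quick sketch of both policies using the numbers from the example above--5 one-second jobs arriving per second against a capacity of 4 per second. FIFO latency grows linearly; processor sharing makes the same cohort of requests strictly slower:)

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Both helpers return the latency of the first request arriving
// at integer time t_arrive.

// FIFO queue: requests are served to completion in arrival order.
double fifo_latency(int t_arrive) {
    double server_free = 0.0, latency = 0.0;
    for (int k = 0; k <= t_arrive * 5; ++k) {       // requests 0..k, 5 per second
        double arrive = (double)(k / 5);            // arrival second of request k
        double start = std::max(server_free, arrive);
        server_free = start + 0.25;                 // capacity 4/s => 0.25 s each
        latency = server_free - arrive;
    }
    return latency;                                 // latency of request 5*t_arrive
}

// Processor sharing: no queue, all in-flight requests split the CPU equally.
double ps_latency(int t_arrive) {
    const int steps = 1000;                         // simulation steps per second
    const double dt = 1.0 / steps, capacity = 4.0;
    std::vector<double> rem;                        // remaining work per request
    std::vector<int> born;                          // arrival second per request
    for (int step = 0;; ++step) {
        if (step % steps == 0)                      // a new pack of 5 arrives
            for (int i = 0; i < 5; ++i) {
                rem.push_back(1.0);
                born.push_back(step / steps);
            }
        double share = capacity * dt / rem.size();  // equal CPU share per request
        for (double& r : rem) r -= share;
        for (size_t i = 0; i < rem.size();) {
            if (rem[i] > 0.0) { ++i; continue; }
            if (born[i] == t_arrive)                // the cohort we watch finished
                return (step + 1) * dt - born[i];
            rem.erase(rem.begin() + i);
            born.erase(born.begin() + i);
        }
    }
}
```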
You Can't Serve More Requests than What You Can
This may seem obvious, but is often overlooked. If you start queueing requests and serve them later, then you need to have the capacity dedicated to these requests later. Meanwhile, the next second more requests arrive than you can serve that second, and you have these old requests, too. Now you have three options:
Serve old requests first, but then your latency will start growing at least linearly as you spend more time over capacity;
Serve new requests first, but then old requests will remain in the queue until the first second when fewer requests arrive than you can serve. In most cases, this would mean waiting for hours, as usage patterns are usually cyclical.
Discard some requests without spending CPU time on serving them. Which leads us to error 502.
In other words, we have to drop (r-1)/r of all requests regardless of which scheduling mechanism we use. Whether the latency grows linearly or exponentially is irrelevant: it has to grow, and that is bad. Its exponential, turbulent growth without queueing is just an interesting artefact, and it gives you another reason to avoid starting all the incoming requests at once: doing so would simply choke the OS scheduler.
When I installed the SF Muni sign in my apartment, I realized it had contradictory requirements when it comes to positioning. On one hand, I want it to be visible from my couch, where I sit in the morning. On the other hand, I want it powered off when I sit in the very same spot in a darkened room, say, when I'm watching a movie. Manually turning it on and off (with a custom remote control?) would undermine the whole idea of a fully automated, always-on sign. Installing an automated power on/off knob is easy by itself, but what would trigger it?
Note that powering the LED sign on and off was not part of the official interface, but I worked around that by rendering an empty picture. I don't know how much energy it saves, but since the USB port is cold after a day of being "powered off," I can guess that it saves a lot.
I thought of using a schedule: say, power it on in the morning, and power it off in the evening. That would probably do, but it would be an indirect measurement. The schedule is not the point here: the actual problem is the contrast between the lit sign and the background. What if I measured the luminosity of the room directly? I now have a computer that is much closer to the hardware than what I used before, so why not install a light sensor?
What we need is to somehow read the current ambient light level and transform it into a digital form our scripts can read. A light sensor is basically a diode that changes its resistance based on the outside light conditions, and we need to read the current that flows through its circuit.
Unfortunately, the Raspberry Pi itself cannot do this: it lacks an analog-to-digital converter. Luckily for us, other small Pi-like computers have one.
One of the Arduino boards, the Nano. I chose the smallest one that had analog input. There are more Arduinos available.
One of the computers capable of this is the Arduino. Unlike the Pi, it does not run as a full-featured Linux machine. Instead, you load a program onto it from another computer, and the Arduino then runs that program. Well, that's not too different from what happens on the Pi, but you definitely can't load Linux onto small Arduinos.
So, since the light sensor can't be connected to the Pi, I bought an Arduino Nano for $10. You can probably buy a different board (the important thing to look for is an "analog port"), but I bought the smallest one because the Pi is already large enough; the board I bought is, like, ten times smaller than the Pi, and that's what I looked for.
Here's how it looks like, mounted:
Here's how the whole thing looks on my wall when it's bright enough.
Since my electronic circuitry skills are somewhere at the high-school science class level, I used this video as a guide. Here's the circuit the guy describes:
Please excuse my mad circuitry skills.
When the light is off
Note that the sign shuts off when there's not enough light. The fancy LEDs are conveniently hidden behind the corner and don't face my couch. :-)
The components used are really cheap. The Arduino itself cost $15. Light sensors and resistors (I used 1 kOhm) are sold for five bucks a hundred, but I went for a more expensive option from RadioShack because the store is close to where I live. Since I don't know how to solder, I used female-to-female jumper wires to connect the components (I cut off one wire's connector with scissors to connect it to the analog input A0).
So it worked like a charm. I played a bit with the circuitry (see the notes below) and can now watch TV without bright red LED lights imprinting on my retina. And I have a much weirder and more colorful installation on my wall, which is immensely cute as well as useful.
After I assembled the circuit, I proceeded to install the SDK. I didn't try it on my desktop computer, and did this directly on the Pi (because why bother?). apt-get install arduino didn't work, showing some error I forgot to write down, but the advice I googled was to run apt-get update first.
Then I launched the SDK from a remote console by typing arduino (I made sure I ssh-ed with the -X option, like ssh -X email@example.com), and had the GUI pop up on my desktop while actually running on the Pi. I pasted the program you can find in contrib, and pressed "Upload." It didn't work until I selected "Arduino Nano w/ATmega328" under the "Tools -> Board" menu. That was all the programming required: the Arduino runs this program every time it is powered on, but I still needed to collect the data. The numbers are printed to the USB serial port, and I could read them either from the Arduino SDK or via cat /dev/ttyUSB0 from a console.
This program only makes sure the numbers are printed to the serial port; a separate "collector" program that interacts with the sign client was still necessary. The collector script I wrote communicates with the muni sign program by creating a file, thus signaling that the sign should be shut down. I added another initscript and used rc-update to auto-start it.
When I first checked whether the sensor worked, I saw that its readings changed very abruptly with luminosity. The value read 830-840 unless I completely shoved the sensor under a cushion (where it read 0). My physics intuition nudged me to decrease the resistance, which I did, achieving greater smoothness. Now it can tell a dimly lit room from a fully lit one, which was my original goal. I use a threshold value of 100, but it will vary with the resistor and sensor you use.
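(A tiny sketch of what the collector-side thresholding could look like, using the cutoff of 100 from above. The dead-band margin is my own addition, not something from the original script; it keeps the sign from flickering when readings hover near the threshold:)

```cpp
// Decide whether the sign should be on, given the current analog reading
// (0..1023) and the previous state. A dead band around the threshold
// prevents rapid on/off flicker at dusk-like light levels.
bool sign_should_be_on(int reading, bool currently_on,
                       int threshold = 100, int margin = 10) {
    if (reading > threshold + margin) return true;   // clearly bright: sign on
    if (reading < threshold - margin) return false;  // clearly dark: sign off
    return currently_on;                             // in between: keep state
}
```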
Oh, and the light sensor is one-sided, so if the value you're reading is about zero no matter how bright the room is, you should probably insert it the other way around.
I also couldn't find a way to assign ttyUSB ports, so that LED sign is always on /dev/ttyUSB0, and Arduino's serial port is always on /dev/ttyUSB1, and they get assigned randomly. I just reboot Pi a couple of times until it gets it.
That's all, and I hope this will help me reassemble it again.