Other lessons from a Smiling Bot

The lessons I learned while building a Smiling Bot: an embedded CNN model for the deep learning for Visual Recognition course at Stanford.

There was this woman I really liked, and I needed to show off in some way. How do you impress a girl? Of course you show her your machine learning projects (gotta rock what you have 🤷). I no longer have the Muni Sign on my wall though–that’s been a reliable magnet for ladies–so I showed her my Smiling Robot.

She was impressed…not. Here are her exact words after she saw my Raspberry Pi correctly read a smile off of my face:

That’s nice… It’s slow though.

What?! Two months of hard work on top of ~10 years of CS education, several sleepless nights, modern groundbreaking research packed into a Raspberry Pi, a poster with Arnie, and all I got was…

It’s slow.

Yeah, sorry girl (not sorry)! I’ve only built a heavily optimized CNN using neural architecture search, but the non-neural, boosting-based Viola-Jones face detector that I downloaded from the internet without looking and haphazardly slapped on in Lesson II wasn’t that optimized. It only takes three more seconds to give the result (five seconds in total), and that’s on an ARMv5 chip. ARM vee five! Isn’t that awesome??

It’s slow.

In the heat of the moment, her response hurt my ego so much that I even started to explain what was going on. But I quickly shut up. Maybe I shouldn’t be that defensive. Not only was I missing out on an opportunity to better myself, it’s just not attractive for a guy to be defensive. Acting manly seemed more important at the time, so I swiftly moved on to other subjects, but her words stuck with me:

It’s slow.

Gotta admit, she’s right. Despite all the hard work I put into the Bot, it is a tad slow. I have excuses, but they don’t matter: that’s a valuable lesson to learn.

Customers do not care that your machine learning project was hard!

Of course they don’t care. People are surrounded by technology that, in the words of Arthur C. Clarke, is “indistinguishable from magic”.

Yes, most have no idea how to backpropagate gradients, but then they talk to (talk to!) their Google Home, and it knows what they’re looking for before they even say it.

They can’t explain why residual connections are a good idea for deep convolutional encoder-decoder architectures (to be honest, nobody can explain that; it just works, okay?). But they open Snapchat, and the filter swaps their face with their bff’s within a millisecond.

They can’t appreciate that simple linear models can do wonders when thousands of features are thrown at them. But then they open Facebook, and it shows them stuff they don’t like, but they still open it again and again… Well, that one doesn’t sound as magical–but the picture they snapped with their phone looks better than an amateur photographer’s attempts at art shot on a DSLR.

Then they open your product, and…

It’s slow.

They’re right. Don’t ship crap.