Engineering

How not to create 2D arrays in Python
Python has many idioms that make writing the language a joy. One of them is a one-line way to initialize an array of n elements.
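The excerpt does not show the code itself, so the following is only a minimal sketch of what the idiom and the pitfall hinted at in the title might look like. The function names `make_row`, `make_grid_shared`, and `make_grid` are hypothetical and chosen for illustration.

```python
def make_row(n, fill=0):
    """Build a flat list of n elements using list repetition."""
    return [fill] * n


def make_grid_shared(rows, cols, fill=0):
    """The tempting 2D version: the outer repetition copies references,
    so every row is the very same list object."""
    return [[fill] * cols] * rows


def make_grid(rows, cols, fill=0):
    """A list comprehension builds a fresh inner list for each row."""
    return [[fill] * cols for _ in range(rows)]


shared = make_grid_shared(2, 3)
shared[0][0] = 1  # also changes shared[1][0], since both rows alias one list

grid = make_grid(2, 3)
grid[0][0] = 1    # only the first row changes
```

Repetition is harmless for a flat list of immutable values such as integers. The surprise appears only when the repeated element is itself mutable, as with a nested list.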
Aug 2, 2024
Warehouse assignment with Google OR-Tools
Google OR-Tools is a powerful library with built-in support for certain classes of optimization problems. For example, the assignment problem is a well-known problem in which N customers should be assigned to M warehouses such that the total serving distance is minimized. Replacing warehouses with fire stations, we may want to minimize the longest distance instead.
Jul 24, 2024
Travelling Salesman problem with local search
Another approach to combinatorial optimization problems is to use local search (LS). With constraint programming (CP), we slowly expand the search frontier by making one choice after another, and prune the search space along the way by removing choices that do not meet constraints. Once a full solution is reached, we are certain that it is a valid solution.
Jul 6, 2024
Constraint programming with Google OR-Tools
Constraint programming (CP) is a technique employed in solving combinatorial problems. Combinatorial problems are a set of problems whose solutions are quick to verify but for which trying out all possible assignments takes exponential time. Brute-force approaches often fail to terminate or take too much CPU time to be of any practical use.
May 21, 2024
Notes on Front-end development
Front-end development is quick to get started with. As projects grow and code builds up, there are nuances that may help us avoid the painful road of catching intermittent bugs. Below are a few things I would like to note. The key points are to keep the application’s state lean, make program flow obvious, and add checks to your build pipeline.
Mar 28, 2023
Fast way to iterate through video frames with Python and OpenCV
Recently, I was working on a program to sample N frames from a source video and then assign each frame a score (from 0 to 1) for thumbnail-worthiness. Before long, it became apparent that decoding video frames was a performance bottleneck. In this post, I look into different ways of reading video frames with OpenCV and then speeding it up with multithreading.
Dec 13, 2019
Implement Shapenet face landmark detection in TensorFlow
In my previous post on building a face landmark detection model, the Shapenet paper was implemented in PyTorch. With PyTorch, however, running the model on mobile requires converting it to Caffe. Though there is a tool to take care of that, some operations are not supported, and in the case of Shapenet, it was not something I knew how to fix yet. It turned out to be simpler to just re-implement Shapenet in TensorFlow and then convert it to TensorFlow Lite.
Sep 12, 2019
Use MobileNetV2 as a feature extractor in TensorFlow
Applying machine learning to image processing tasks sometimes feels like toying with Lego blocks. One base block to extract feature vectors from images, another block to classify… Popular choices of feature extractors are MobileNet, ResNet, and Inception. And as with any other engineering problem, choosing a feature extractor is about considering trade-offs between speed, accuracy, and size. For my current task of dealing with ML on mobile devices, MobileNetV2 seems to be a good fit as it is fast, quantization-friendly, and does not sacrifice too much accuracy. TensorFlow provides a reference implementation of MobileNetV2 that makes using it much easier.
Jun 19, 2019
Train a face detector using TensorFlow object detection API.
About 3 years ago, putting together a face detection camera application for mobile devices was a more involved task. I remember a colleague sitting next to me back then tinkering with OpenCV and dlib to produce a demo with the right trade-off between size, speed and accuracy. As with every engineering problem, there is no one-size-fits-all solution. An on-device face detector may choose to reduce the size of input images to quicken detection, though lower resolution results in lower accuracy. Fast forward to the present, it has never been easier to customize your own face detection model, thanks to folks at Google who open-sourced their TensorFlow Object Detection API. Besides, platforms like Colab provide hobbyists with free access to machines capable of ML training.
May 13, 2019
Building a face landmark detection model using PyTorch
Having used dlib for face landmark detection, I thought implementing my own neural network to achieve a similar goal could be fun and help the learning process. There is a recently released paper that outlines an approach of using machine learning to set the parameters used in traditional statistical models. The author was nice enough to release his source code, which is a great starting point. So I forked it, changed the code to remove some bulky dependencies, and more or less rewrote it to better fit my mental model, understanding it better in the process.
Apr 3, 2019
Allocate objects on memory buffer for performance gain
I wrote about the cost of memory allocation in a recent post. Given a fixed amount of memory needed, reserving a large chunk in one go is cheaper than grabbing smaller chunks one at a time. I did not realize that C++ has the facility to take advantage of that until reading through the code of folly::IOBuf.
Mar 6, 2019
Echo server with libevent
Network programming is one area where non-blocking IO can be used to achieve higher performance. A typical server needs to handle a few hundred to a few thousand connections at a time. With the thread-pool based blocking model, when a new connection is established, the server thread serving that connection triggers a kernel system call to read data from the socket file descriptor and is blocked until data are available. Thus, to handle, say, 200 connections concurrently, the server needs to spawn 200 threads.
Mar 4, 2019
The underestimated cost of memory allocation
In languages like Java and C++, memory allocation is explicit and obvious. Programming with arrays in those languages means thinking about size in advance before allocation. If there is a need for a flexible-size list, the standard library is also explicit about whether the list is backed by an array or a linked list, so that programmers are mindful of operation complexity.
Feb 22, 2019
Implicit property getter can be harmful
Properties as a programming language feature have been around for a while. I first got to use them while developing a multi-tenant cloud-based point of sale application on the .NET platform. The idea is to avoid the verbosity of calling getter/setter methods by invoking them behind the scenes whenever a field is accessed or assigned.
Feb 19, 2019
Basic FBThrift example
Facebook re-open-sourced their fork of Apache Thrift some 4 years ago. Yet there is relatively little documentation or independent comparison to show whether there is any performance gain from moving from Apache Thrift to FBThrift. The two are not drop-in replacements for each other, so replacing one with the other requires effort. This post first looks at setting up and running FBThrift. The complete code example used in this post can be found here.
Jan 17, 2019
Floating point binary representation in JavaScript
Inspired by this post, which explains how a computer stores floating point numbers, here is a bit of JavaScript code that prints out a float32 number in binary format. Firstly, the main idea is that a floating point number is represented similarly to the scientific representation of numbers using E notation.
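The post itself uses JavaScript, and its code is not shown in this excerpt. As a rough illustration of the same idea, here is a minimal Python sketch that splits a float32 into its sign, exponent, and fraction bits. The function name `float32_bits` is hypothetical, and the 1/8/23 bit layout is the standard IEEE 754 single-precision format rather than anything taken from the post.

```python
import struct


def float32_bits(x):
    """Return (sign, exponent, fraction) bit strings for x stored as float32."""
    # Pack as a big-endian 32-bit float, then reinterpret the same 4 bytes
    # as an unsigned 32-bit integer so the raw bits can be formatted.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    # IEEE 754 single precision: 1 sign bit, 8 exponent bits, 23 fraction bits.
    return bits[0], bits[1:9], bits[9:]


sign, exponent, fraction = float32_bits(-2.5)
# -2.5 = -1.25 * 2**1, so the stored exponent is 1 + 127 (the bias).
print(sign, exponent, fraction)
```

The exponent field holds the power of two plus a bias of 127, which plays the role of the exponent after the E in scientific notation.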
Dec 19, 2018
Derivative of loss function in softmax classification
Though frameworks like TensorFlow and PyTorch have done the heavy lifting of implementing gradient descent, it helps to understand the nuts and bolts of how it works. After all, a neural network is pretty much a series of derivative functions. In this blog post, let’s look at getting the gradient of the loss function used in multi-class logistic regression.
Dec 17, 2018
Perspectives in designing ML cost function
For many different machine learning problems, finding a solution involves these similar steps.
Dec 16, 2018
How the dk.brics.automaton regex library works
The brics regex library is by far the fastest compared with OpenJDK java.util.regex and com.google.re2j. Let's look at what lies under the hood. dk.brics.automaton is a finite automata library with applications in regex. The idea is similar to Google re2j: construct a DFA from the regex string, so that matching an input string means advancing from one state to another. (Google re2j is surprisingly the slowest in my test case, which is probably due to my particular regex input or some bug in the Java port. I have not examined it yet.)
Nov 22, 2018
Misconceptions when scaling applications
Once in a while, engineering managers and software engineers would triumphantly tell me to use a particular piece of technology because it is scalable or good for concurrency. Unfortunately, scaling an application means trading off different factors affecting an app’s performance, and no single technology is a silver bullet that solves them all. Below are some of the misconceptions I often hear.
Nov 13, 2018
How regex is implemented
Recently at work, I needed to take a deeper look at how different regex libraries do optimization and how they combine regex patterns. This document examines two regex implementations: java.util.regex of OpenJDK and re2 of Google.
Oct 25, 2018
How Java loads classes
For a while, I have delegated the task of managing the java run command to the IDE. Eclipse, NetBeans, and IntelliJ all seem to do a decent job of masking the complexity of supplying java with JVM parameters, classpath, debugging options… Compared to other languages, the java run command can get horrendously long and is not very suitable for handcrafting. Recently, when working on some command generation at work and playing around with java command line arguments, I encountered some minor gotchas.
Oct 24, 2018
Shortcomings of web technologies in building a robust desktop application.
As computation power and data are shifting to the edge, web applications are more and more like installed desktop applications with (limited) file system access, threading, offline storage… Frameworks such as Electron, which allow browser-based applications to be packaged as standalone apps, are gaining popularity. However, with great power comes great responsibility. JavaScript, HTML, and CSS have always been used to build webpages, which are a stateless and forgiving environment. A broken webpage can be remedied by hitting the refresh button. The web is not expected to be fast; users’ expectations for it are lower than for an installed application.
Sep 6, 2018
Java byte literal for value greater than 0x80
In porting a piece of code from C++ to Java, I encountered a statement like this:
Jun 18, 2016
C++ '&' operator
Some notes on C++ references
Mar 23, 2016
Concurrency in Java context
Concurrency is an unavoidable fact in web development if the page ever gets past more than one user (which is pretty much any service out there). But concurrency also poses a problem for data consistency. This post is a back-to-the-basics summary of techniques I’m aware of.
Jan 9, 2016
Running Odoo 8 on PyPy
PyPy has not reached the point of compatibility where one could simply switch the interpreter. Some libraries need to be replaced, or installed with the latest development code. Still, one can get the Odoo web running with a few changes.
Jul 31, 2015
Tornado Non-blocking SMTP Client
Recently, I was developing a Tornado-based web application at work. The idea of using Tornado is to base the whole web application on Tornado’s single-threaded IOLoop, which enables the application to handle higher load compared to using other multi-threaded models (given the same hardware). Since no additional threads are spawned to handle concurrent requests, there is no extra memory overhead. Besides, that helps avoid the context-switching cost incurred when program control is passed between threads.
Jul 28, 2014
Fibonacci Tail Recursion
(Documenting my progress with Haskell, little by little.)
May 1, 2014