Jekyll2023-07-30T08:49:00+00:00https://vuamitom.github.io/feed.xmlTam VuRiding the wave of increasing entropy
Notes on Front-end development2023-03-28T10:06:20+00:002023-03-28T10:06:20+00:00https://vuamitom.github.io/2023/03/28/note-on-front-end-development<p>Front-end development is quick to get started with. As projects grow and code builds up, there are nuances that can help us avoid the painful road of chasing intermittent bugs. Below are a few things I would like to note. The key points are to keep the application state lean, make program flow obvious, and add checks to your build pipeline.</p>
<h2 id="encapsulate-actions-not-just-components">Encapsulate actions, not just components</h2>
<p>Think about the time when you need to re-use a component in multiple screens to accomplish something. For example, to ask the user if she really wants to do something, a popup modal is often deployed across screens. One way to do that is to create a shared ConfirmModal component, and embed it wherever it is needed.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o"><</span><span class="nx">ConfirmModal</span> <span class="nx">open</span><span class="o">=</span><span class="p">{</span><span class="nx">openConfirmModal</span><span class="p">}</span>
<span class="nx">message</span><span class="o">=</span><span class="p">{</span><span class="dl">"</span><span class="s2">Are you sure you want to delete X?</span><span class="dl">"</span><span class="p">}</span>
<span class="nx">onAccept</span><span class="o">=</span><span class="p">{</span><span class="nx">positiveAction</span><span class="p">}</span>
<span class="nx">onCancel</span><span class="o">=</span><span class="p">{()</span> <span class="o">=></span> <span class="nx">setOpenConfirmModal</span><span class="p">(</span><span class="kc">false</span><span class="p">)}</span><span class="sr">/</span><span class="err">>
</span>
<span class="kd">function</span> <span class="nx">dangerousAction</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">setOpenConfirmModal</span><span class="p">(</span><span class="kc">true</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>That’s a first step towards <a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself">DRY</a>. Yet we can go one step further. Notice that the host component does not need to manage the open/close state of the <code class="language-plaintext highlighter-rouge">ConfirmModal</code>. Instead, encapsulate the intent in a single function, and return to the host page only what it cares about.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="kd">function</span> <span class="nx">dangerousAction</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">confirmed</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">confirm</span><span class="p">(</span><span class="dl">'</span><span class="s1">Are you sure you want to delete X?</span><span class="dl">'</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">confirmed</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">//... continue with dangerousAction </span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
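<p>A minimal, framework-agnostic sketch of such a promise-based confirm helper is below. Here <code class="language-plaintext highlighter-rouge">showModal</code> and <code class="language-plaintext highlighter-rouge">hideModal</code> are hypothetical UI helpers standing in for your component library; the point is that the host only ever sees a boolean.</p>

```javascript
// A sketch of a confirm flow wrapped in a single promise-returning
// function. `showModal`/`hideModal` are hypothetical UI helpers.
function confirmDialog(message, { showModal, hideModal }) {
  return new Promise((resolve) => {
    showModal(message, {
      // the modal's open/close state lives here, not in the host page
      onAccept: () => { hideModal(); resolve(true); },
      onCancel: () => { hideModal(); resolve(false); },
    });
  });
}
```

The host page then just awaits the boolean, as in the snippet above, without tracking any modal state of its own.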
<h2 id="choose-a-minimal-state">Choose a minimal state</h2>
<p>Frontend applications are stateful. UI components are tied to a set of flags to know whether users have clicked a checkbox or selected another tab. Some state flags are variables that get updated by interrupts (events) triggered by a network callback or an interval timer. It’s important to choose a minimal set of flags that reflects your application state. Avoid state variables that are derivatives of one another.</p>
<p>This is bad</p>
<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>
<span class="nl">user</span><span class="p">:</span> <span class="nx">User</span><span class="p">;</span>
<span class="nl">isLoggedIn</span><span class="p">:</span> <span class="nx">boolean</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">isLoggedIn</code> is redundant, as its value can be inferred from <code class="language-plaintext highlighter-rouge">user</code>. Having it creates the extra responsibility of keeping the two variables in sync.</p>
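<p>A minimal sketch of the alternative: store only the source of truth and compute the derived value on read. The <code class="language-plaintext highlighter-rouge">user</code> object below is illustrative.</p>

```javascript
// Only `user` is stored; isLoggedIn is derived on every read,
// so it can never fall out of sync with the state.
const state = { user: null };

function isLoggedIn(s) {
  return s.user != null; // derived, never stored
}

state.user = { name: 'Alice' }; // hypothetical user object
console.log(isLoggedIn(state)); // true
```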
<h2 id="avoid-abusing-observer-pattern">Avoid abusing observer pattern</h2>
<p>Modern frameworks like Reactjs have built-in constructs to watch for changes in state variables. For example, when a user selects a new shipping address, another flow should be triggered to update the delivery fee. I often see code along the lines below, which works most of the time; except when it does not, it leads to hard-to-find bugs. Object comparison is not exactly the best way to check for changes. And if for some reason the user state data gets refetched, the handler would be triggered again.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="kd">const</span> <span class="nx">selectAddress</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="c1">// this function gets called somewhere </span>
<span class="c1">// ....</span>
<span class="nx">setShippingInfo</span><span class="p">(</span><span class="nx">newAddr</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">useEffect</span><span class="p">(()</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">updateCurrentCartShippingFee</span><span class="p">(</span><span class="nx">shippingInfo</span><span class="p">);</span>
<span class="p">},</span> <span class="p">[</span><span class="nx">shippingInfo</span><span class="p">])</span>
</code></pre></div></div>
<p>To me, it’s way better to make the flow obvious:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">selectAddress</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="c1">// ... do something </span>
<span class="nx">setShippingInfo</span><span class="p">(</span><span class="nx">newAddr</span><span class="p">);</span>
<span class="nx">updateCurrentCartShippingFee</span><span class="p">(</span><span class="nx">newAddr</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<h2 id="invest-in-your-build-time">Invest in your build time</h2>
<p>UI development usually means many quick iterations. Ideally, one would visualize the UI arrangement in one’s head and churn out CSS for as long as possible before having to look at the actual changes. In practice, the frequency of going back and forth between the code editor and the UI is high. Bret Victor, in his talk on <a href="https://www.youtube.com/watch?v=PUv66718DII">inventing on principle</a>, stressed the importance of an immediate feedback loop when creating. I think it’s especially important for UI development. Maybe that is part of the reason web technologies show up in areas they were not initially meant for, like desktop and mobile app development. The web’s instant reload makes iterating faster than in other frameworks like, say, Qt.</p>
<p>So, if your development project takes a long time to refresh the UI, view it as a critical problem. Take a look at alternative build tools, or split stable parts of the code into sub-packages that don’t need to be rebuilt every time.</p>
<h2 id="share-data-across-components">Share data across components</h2>
<p>Often, one component needs to communicate its changes to others outside of its hierarchy. There are two patterns to go about it:</p>
<ul>
<li>shared global state: basically, every component has access to a shared object or any of its child objects, which is the approach frameworks like Redux employ. Besides, each component has access to the setter function of that shared object. When it needs to tell another part of the app to change, it updates the shared state. Pros: easy to debug by logging a snapshot of the global state. Cons: state is not an obvious signal of action; you probably have to watch for state changes instead.</li>
<li>event bus: in implementation, an event bus is also a shared global object. But it differs in that the object holds no state, except for a list of subscribers. With this pattern, a component broadcasts immutable events, and other components act on those they are interested in. Each component maintains its own state instead of relying on that of a parent. Pros: modular components. Cons: you have to manage component lifecycles to unsubscribe properly.</li>
</ul>
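<p>The event-bus pattern above can be sketched in a few lines; event names and payloads here are illustrative.</p>

```javascript
// A minimal event bus: the shared object holds no application state,
// only a map from event names to subscriber lists.
class EventBus {
  constructor() { this.subscribers = new Map(); }

  subscribe(event, handler) {
    const list = this.subscribers.get(event) ?? [];
    list.push(handler);
    this.subscribers.set(event, list);
    // return an unsubscribe function so components can clean up
    // in their teardown/unmount hooks
    return () => {
      const i = list.indexOf(handler);
      if (i !== -1) list.splice(i, 1);
    };
  }

  publish(event, payload) {
    (this.subscribers.get(event) ?? []).forEach((h) => h(payload));
  }
}

const bus = new EventBus();
const unsubscribe = bus.subscribe('address:selected', (addr) => {
  console.log('recompute shipping fee for', addr.city);
});
bus.publish('address:selected', { city: 'Hanoi' });
unsubscribe(); // call this when the component is destroyed
```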
<h2 id="add-checks-in-your-pipeline">Add checks in your pipeline</h2>
<p>Website speed is important, so it’s better to be frugal about how much script gets downloaded to the browser. More often than not, someone adds a library that is either bloated or not tree-shake friendly. By the time people notice the unusual bundle size, the offending change can be a few MRs back, and it becomes really hard to filter through past changes. In my experience, having checks that automatically fail the pipeline when certain metrics degrade is really helpful.</p>
<h2 id="pick-the-right-framework">Pick the right framework</h2>
<p>Despite its popularity, I think React is more suitable for a small team of 2-3 developers than for larger projects. It offers many ways of doing the same thing and leaves the choice to developers, so a lot can go wrong when someone is new to the framework. For example, consider <code class="language-plaintext highlighter-rouge">useEffect</code> and its dependency array.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">fetchProductData</span> <span class="o">=</span> <span class="nx">useHelper</span><span class="p">();</span>
<span class="nx">useEffect</span><span class="p">(()</span> <span class="o">=></span> <span class="p">{</span>
<span class="c1">// ... </span>
<span class="nx">fetchProductData</span><span class="p">();</span>
<span class="p">},</span> <span class="p">[</span><span class="nx">fetchProductData</span><span class="p">]);</span>
</code></pre></div></div>
<p>The linter often suggests adding <code class="language-plaintext highlighter-rouge">fetchProductData</code> to the dependency array. But if <code class="language-plaintext highlighter-rouge">fetchProductData</code> is not wrapped in a <code class="language-plaintext highlighter-rouge">useCallback</code>, a new function is created on every render, so the effect fires again and again, risking pushing CPU usage to 99%. For the unsuspecting, it’s hard to notice until suffering the pain a few times.</p>
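<p>The root cause can be shown without React at all: dependency entries are compared by identity (React documents this as an <code class="language-plaintext highlighter-rouge">Object.is</code> comparison), and a function recreated on each render never equals its previous version.</p>

```javascript
// Simulate two renders of a component that defines its callback inline.
// Each call returns a brand-new function object, just like an
// un-memoized fetchProductData would on each render.
function render() {
  return () => { /* fetch product data */ };
}

const first = render();
const second = render();
console.log(Object.is(first, second)); // false → the effect re-runs
```

Wrapping the callback in <code class="language-plaintext highlighter-rouge">useCallback</code> preserves its identity across renders, which is what stops the loop.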
<p>For larger teams, I would suggest going with other, more opinionated frameworks, say Angular (though I have not tried that framework myself).</p>An unprecedented period2020-04-09T23:33:20+00:002020-04-09T23:33:20+00:00https://vuamitom.github.io/2020/04/09/unprecedented-period<p>As of today, millions of Vietnamese have gone through their first week of self-confinement since our Prime Minister ordered businesses to shut down and people to stay at home. To some, the isolation period may have started earlier. Companies like mine enforced remote working weeks prior to the official order. Self-confining voluntarily for almost a month is unprecedented and unthinkable in normal times. But there is nothing normal about the recent course of events.</p>
<p>A new kind of virus, which has wreaked havoc in Wuhan, China since Lunar New Year, has spread to more than two hundred countries in the world. In mere months, more than a million people have suffered from it, and tens of thousands have died. To divert the trajectory of doom, governments around the world are enforcing social distancing. To save lives, lives must be turned upside down. Schools and businesses are closed, leaving kids at home and their parents with them. In France, people are getting fined or even imprisoned for wandering on the street, further pushing the line of what is acceptable. Leaders and doctors are faced with uncomfortable trade-offs: pushing businesses into indebtedness and bankruptcy to prevent further deaths. Stories of Italian doctors weeping after pulling ventilators away from old patients to save younger ones circulated on social networks. Triaging patients directs care and resources to those who are most likely to be saved and would have the longest time left to live. This generally accepted utilitarian approach would have appalled many at another time, since it involves weighing lives on an impersonal scale. And it will get even more uncomfortable as the death toll decreases and economic damage mounts. People will start asking whether it’s worth the loss.</p>
<p>Among the things turned upside down by the virus is, with irony, Vietnamese people’s predisposition towards the West. For a long time, many men and women from this country have strived for opportunities to work abroad and to migrate. Rich European cities hold such allure that some accepted the risk of illegal trafficking to get there. But ever since the pandemic threatened to overwhelm medical systems in many European countries, there has been a reverted flow of people scrambling to repatriate. Before international flight routes closed, thousands of Vietnamese arrived in Hanoi and HCM city each day. Home sweet home. The splendor of Paris streets was overshadowed by the danger of getting infected without treatment. The reversal does not stop there. For many Westerners, Southeast Asian destinations have always been attractive due to their welcoming locals. Foreigners were often regarded with keen interest, as they were not only pleasantly different but also a highly valued revenue source. But since the virus got pretty much out of control in Europe, the locals have shied away from anyone who looks Caucasian. It was sad to hear stories of foreigners turned away by hotels. Hopefully, this temporary reversal of attitude won’t do too much damage to local tourism.</p>
<p>For many of us, fortunate enough to avoid infection and lay-offs, the last few weeks were mostly about dealing with confinement. Staying inside is not necessarily a burden. Some people, me among them, prefer occasional solitude and appreciate being away from crowds. Yet there seems to be a limit to it all. My friend, a dance instructor, once said that she felt like something was crumbling inside even though her work was going well despite the lockdown. Evolution has not changed our deeply wired need to be in a pack and to be outside. Maybe there is some merit in the way state media portray stay-at-home citizens as warriors in the war against Covid-19. We don’t fight against the virus, but rather against our visceral urge to go out when the day is nice and fair, which is harder than it sounds. Individualism and personality have been prized in this age of Youtube celebs and influencers. Yet never before has standing in line been this appreciated. Unfortunate souls who had contracted the virus and had their itineraries laid bare for all to see would attract scorn from the esteemed public for what could have been normal needs, e.g. attending a wedding or visiting a clinic. Impersonal times indeed. An upside is that one can start on books one has always wanted to read, chat up friends, or try a new hobby. My mom has found this a great opportunity to bake since everyone is around more often. When the legs laze, the mouth works.</p>
<p>So far the experience has been like hitting the brakes when you are on a highway. Activities come to a grinding halt. Every day around the globe, someone somewhere loses a loved one. Each death is an addition to the ever increasing number of casualties. For some, the prolonged calamity means suspended plans and unfulfilled promises. For others, it means shutting down businesses and letting employees go. The impact on society and on individuals will be long lasting. A silver lining of this circumstance is that people are brought closer together. Parents spend more time with their kids. Food giveaway booths are set up to help the poor. Let’s hope that better days will come.</p>Covid-192020-03-20T22:24:18+00:002020-03-20T22:24:18+00:00https://vuamitom.github.io/2020/03/20/covid19-in-hanoi<p>For the past 2 months, covid-19 has pushed the lives of people all over the globe into disarray. Everywhere, events are cancelled, cargo is held up at borders, trade is suspended. Covid-19 has become the all-dominating theme of dinner conversation. People talk about it in barber shops, in supermarkets, in taxis. And unlike the situation in Europe with thousands of confirmed cases, Vietnam has managed to keep the number below a hundred (though it is increasing daily). As a result, though covid-19 patients are referred to by number in the news, those numbers are still very personal. Pick a random person on the street, and most likely he/she can recount in detail where patient no. 34 went, what she did for a living, and who her F1 were (people in direct contact with her)…</p>
<p>The pandemic makes people ill, but the measures in place to stop it are what is disrupting daily lives. One may pin his hope on the seasonality of the disease. As the weather gets warmer, the pandemic may eventually subside, giving the world’s economy a break. Yet, even if that’s the case, the virus may inflict far more damage in its wake than the financial losses during temporary shutdowns. The short, acute shock from the pandemic can lead to systemic failure of the market if borrowing has reached an unsustainable level. Right before the outbreak, the world had enjoyed almost 11 consecutive years of economic expansion. Yet a year before that, I remember reading foreboding articles in The Economist anticipating a coming recession, given the cyclical nature of the world economy. Despite anticipation of a market crash, macroeconomic indicators had been better than ever. Inflation had been well-behaved and unemployment in the U.S. had been at a record low. The problem with that is that investors and consumers could have come to feel a false sense of security, believing that the good times would last forever, and thus increase debt-powered spending. Loans are often collateralized with marked-to-market assets, and companies estimate their ability to service loans based on good-time profits. An unexpected event like covid-19 is all it takes to temporarily shrink firms’ profits and create a shortage of cash, making loans unserviceable for companies. A few panicking investors sell their stocks, depressing asset prices. Depreciating collateral forces banks to sell and thus further depress prices, creating a self-reinforcing downward spiral.</p>
<p>This is what is happening in the stock market now. Two weeks ago, when news of patient no. 17 surfaced, investors anticipated a worsening pandemic situation and started to sell off their shares. As asset prices went downhill, investors who had used borrowed money to buy stocks were forced to liquidate their positions, not to mention investors who would jump into the action to avoid further losses. That was the opposite of economics 101: lower prices in the asset market did not trigger an increase in demand. Rather, demand dried up as a result of lower prices. This brought me some personal pain as I watched VNDirect’s dashboard covered in red. And the market decline has not shown any sign of stopping. Though, a choice to buy stocks now should be less risky than before the covid-19 outbreak, when the risk was not yet visible.</p>
<p>One lesson is that, had one looked at only inflation or the unemployment rate, one would have thought that the economy was doing very well. However, its ability to withstand occasional shocks was questionable. George Cooper suggested in his book <a href="https://www.amazon.com/Origin-Financial-Crises-Central-Efficient/dp/0307473457">Origin of financial crises</a> that, rather than traditional macroeconomic indicators, central banks should look at the debt level and the cost of borrowing to gauge the health of the economy.</p>
<p>Hanoi’s streets are still filled with motorbikes and cars, as busy as ever. The usually inefficient system of reckless scooter drivers turns out to be pretty resilient in stressful times. Had this pandemic happened 10 years later, after Hanoi has <a href="https://www.bbc.com/news/world-asia-40498052">managed to ban scooters</a>, people would have a much harder time getting around. Not everybody can afford a car, and crowded public buses could be virus hotbeds. Though public transportation is nice, the city would do better by encouraging bicycle riding as an active mode of getting around. In times of stress, being able to travel at will can be very helpful.</p>
<p>Companies, including mine, are tinkering with work-from-home policies. Though VTV aired a segment praising businesses moving in the remote-working direction, I think it is by no means a smooth transition for Vietnamese businesses. Working from home requires a certain level of trust in employees in order to work. I feel that that trust level is not really high, and there are reasons for that. But I do hope that, when the pandemic subsides, firms do not write off working from home as unworkable. Rather, they should see it as necessary, build permanent policies, and train their employees to work remotely on a more regular basis. Trust takes time.</p>
<p><img src="/content/images/hang_en.jpg" alt="En cave" /></p>
<p>The week right after patient no. 17 was hospitalized, fear gripped the whole nation, and my trip to Én cave was due. I got cold feet and wanted to postpone it, but it was too late for rescheduling, so I went ahead with the plan. The streets of Phong Nha were empty, and I was the only one staying in that twenty-plus-room guest house. Suddenly, anyone could be a potential virus carrier, but forgetting that fact for a moment, the trip was really enjoyable. The weather was nice, one had all the space to oneself, and the food was good. Nothing to complain about.</p>A decade in review2020-01-18T17:22:21+00:002020-01-18T17:22:21+00:00https://vuamitom.github.io/2020/01/18/decade-lookback<p>A decade would feel short when the clock was ticking past midnight on the last day of 2019. Where has all the time gone? Yet, with a simple act of comparison between the self now and a decade ago, that whole decade starts to appear more real, heavier and thicker in one’s awareness. Old perceptions, preferences, fears, worries, and desires were replaced by the new. Changes happened slowly enough that one hardly noticed, yet significantly enough to surprise us when compressed into a moment of reminiscing. This is my first time writing them down.</p>
<p>Career-wise, I earned my bachelor’s degree, got my first job, and worked in 5 different cities. A few things that I used to do when I first started to work but no longer do now:</p>
<ul>
<li>
<p>Be eager to impress by taking on tight deadlines and then spending sleepless nights and weekends to meet them. Needless to say, it was not good for my health. Trying to impress would prevent timely communication about progress concerns. And looking back, re-negotiating deadlines and resources when they looked infeasible might have been a better strategy. An individual contributor only has a fixed amount of time, but a team’s time is more elastic, as it can shift people from one project to another, more urgent one. Thus, instead of doubling down, ask for help and more “resources”.</p>
</li>
<li>
<p>Be possessive of the code. Software engineers write code based on their mental model of how a system works and interacts. Given the same business requirements, individuals have different ways of structuring their code. One may prefer a large centralized logic unit while another spreads logic into smaller, more specialized units. No amount of best practices will change that. After building and leading software teams, I’ve come to accept those idiosyncrasies and to trust teammates to do what is reasonable. In the end, metrics (performance and business) will drive system designs.</p>
</li>
<li>
<p>Be an energy spendthrift. A decade ago, putting a new programming language to the test in the next project sounded like a nice thing. I would maximize the unknown factors and plunge myself into figuring things out along the way. Now I would prefer a familiar toolset, unless there are compelling reasons otherwise. It is not that learning new things no longer excites me; I’m just more interested in the conceptual rather than the procedural aspects of things. And a decade ago, I would eagerly jump on board a new project and think about its practicality much later. A service to send SMS notifications of bus arrivals, a Shopify clone, … are among the projects that were quickly started and quickly abandoned. Youthful exuberance gave way to the understanding that the harder part of a product is not the software part.</p>
</li>
<li>
<p>Be concerned about doing cool things. In the beginning, there is a lot of talk about how one thing in software is shinier than others. Common beliefs were that software outsourcing is easy, mobile app development is more interesting than core banking, backend is harder than frontend… Having worked briefly in each, I think all of them are equally hard and require different expertise. Though friends and family sometimes believe that computer science graduates know everything about computers, including fixing broken ones, these are jobs for different people. And there is enough depth in each area for people to grow, whatever they choose to do.</p>
</li>
</ul>
<p>After nearly a decade of working in tech, there are things I find amusing about the industry.</p>
<ul>
<li>
<p>The occasional invention of new names/acronyms for existing concepts gives rise to much-overhyped new fields. Think microservices. Service-oriented architecture (SOA) has been around for a long time in the enterprise software world, yet it does not invoke as much trendiness when pronounced. Think NoSQL databases. Take a relational database, strip it of its SQL query processor, constraints, and triggers, and voila!, a NoSQL database. A few years ago, lots of developers flocked to NoSQL’s banner. But as their use cases grew more complicated, NoSQL databases were forced to develop capabilities that were already available in RDBMSes.</p>
</li>
<li>
<p>Software developers often blurt out library names as a knee-jerk reaction to a requirement. Need a cache? Use Redis/Memcached. Need to store time-series data? Use InfluxDB. If you have a hammer, everything looks like a nail. Careful consideration of the actual requirements, e.g. of a cache, would save everyone a lot of trouble. Similarly, the ancient argument that a fast application layer needs to be written in C is still ever present in a lot of people’s perception. It takes some effort to persuade the other person that it only matters if the bottleneck is the CPU.</p>
</li>
<li>
<p>When meeting software engineer strangers at awkward dinner parties, asking “Do you code Python/Java/…?” seems to be the popular conversation starter of choice. Yeah, that is a part of what we do.</p>
</li>
</ul>
<p>Near the end of this decade, I got hands-on with projects involving deep learning. The field is increasingly approachable to non-researchers. New releases of Pytorch or Tensorflow now allow users to train and run a model with a few lines of code. Thus, at the end of the day, what matters more is who owns the data. Though industry leaders often draw analogies between machine learning’s advancement and the earlier smartphone evolution, unlike the smartphone boom, machine learning is more incumbent-friendly, since established companies have an edge over upstarts and small businesses due to the resources they can pour into acquiring and labeling data. So it’s not likely that the likes of Whatsapp or Zalo, dominating consumer-oriented applications, will appear out of this trend. That said, there are already a lot of new AI-focused companies appearing, which is exciting still.</p>
<p>Given the limited time each of us has, having to put a decade behind is not anything to celebrate. But maybe the price is worth paying for the experiences we gained and the people we met. And it’s not that we have a choice anyway. So I’m grateful for what has been and am looking forward to what’s to come.</p>Fast way to iterate through video frames with Python and OpenCV2019-12-13T14:36:18+00:002019-12-13T14:36:18+00:00https://vuamitom.github.io/2019/12/13/fast-iterate-through-video-frames<p>Recently, I was working on a program to sample N frames from a source video and then assign each frame a score (from 0 to 1) in terms of thumbnail-worthiness. Before long, it became apparent that decoding video frames was a performance bottleneck. In this post, I look into different ways of reading video frames with OpenCV and then speeding them up with multithreading.</p>
<h2 id="reading-video-frames-with-opencv">Reading video frames with OpenCV</h2>
<p>Consider a video that is F frames long; we will need to select frames at a sample rate of <code class="language-plaintext highlighter-rouge">S = F/N</code>. Naively, one could loop through the video and read every single frame.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># naive version
</span><span class="n">cap</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">VideoCapture</span><span class="p">(</span><span class="n">video_path</span><span class="p">)</span>
<span class="n">success</span><span class="p">,</span> <span class="n">img</span> <span class="o">=</span> <span class="n">cap</span><span class="p">.</span><span class="n">read</span><span class="p">()</span>
<span class="n">fno</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="n">success</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">fno</span> <span class="o">%</span> <span class="n">sample_rate</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">do_something</span><span class="p">(</span><span class="n">img</span><span class="p">)</span>
    <span class="n">fno</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="c1"># read next frame</span>
    <span class="n">success</span><span class="p">,</span> <span class="n">img</span> <span class="o">=</span> <span class="n">cap</span><span class="p">.</span><span class="n">read</span><span class="p">()</span>
</code></pre></div></div>
<p>However, each call to <code class="language-plaintext highlighter-rouge">cap.read()</code> decodes every frame, including the ones we are about to discard, and performance was intolerable. A better alternative is to skip through frames and only decode the ones we need. OpenCV’s <code class="language-plaintext highlighter-rouge">grab</code> and <code class="language-plaintext highlighter-rouge">retrieve</code> functions (which the <code class="language-plaintext highlighter-rouge">read</code> function calls internally) can be used for this purpose.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">success</span> <span class="o">=</span> <span class="n">cap</span><span class="p">.</span><span class="n">grab</span><span class="p">()</span> <span class="c1"># get the next frame
</span><span class="n">fno</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="n">success</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">fno</span> <span class="o">%</span> <span class="n">sample_rate</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="c1"># decode only the frames we keep</span>
        <span class="n">_</span><span class="p">,</span> <span class="n">img</span> <span class="o">=</span> <span class="n">cap</span><span class="p">.</span><span class="n">retrieve</span><span class="p">()</span>
        <span class="n">do_something</span><span class="p">(</span><span class="n">img</span><span class="p">)</span>
    <span class="n">fno</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="c1"># grab() advances to the next frame without decoding it</span>
    <span class="n">success</span> <span class="o">=</span> <span class="n">cap</span><span class="p">.</span><span class="n">grab</span><span class="p">()</span>
</code></pre></div></div>
<p>This simple modification yields a much needed performance boost. But can we do better? Looking through the OpenCV documentation, it is possible to seek to a specific frame by calling <code class="language-plaintext highlighter-rouge">cap.set(cv2.CAP_PROP_POS_FRAMES, fno)</code>. The above snippet can be rewritten as below:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">total_frames</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">cap</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">cv2</span><span class="p">.</span><span class="n">CAP_PROP_FRAME_COUNT</span><span class="p">))</span>
<span class="k">for</span> <span class="n">fno</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">total_frames</span><span class="p">,</span> <span class="n">sample_rate</span><span class="p">):</span>
<span class="n">cap</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">cv2</span><span class="p">.</span><span class="n">CAP_PROP_POS_FRAMES</span><span class="p">,</span> <span class="n">fno</span><span class="p">)</span>
<span class="n">_</span><span class="p">,</span> <span class="n">image</span> <span class="o">=</span> <span class="n">cap</span><span class="p">.</span><span class="n">read</span><span class="p">()</span>
<span class="n">do_something</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
</code></pre></div></div>
<p>So which option is faster? It depends on the number of frames read relative to the total number of frames. The sparser the sampling, the faster the random-seeking method becomes relative to sequential reading. Since each <code class="language-plaintext highlighter-rouge">cap.set</code> takes a roughly fixed amount of time to reach a given frame, the seeking method’s cost grows linearly with the number of frames to sample. The incremental method, on the other hand, takes a roughly constant amount of time regardless of how many frames are sampled, since it must grab every frame either way.</p>
<p><img src="/content/images/opencv_read_frame.png" alt="Compare performance of two methods" /></p>
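<p>The crossover visible in the chart above can also be reasoned about with a simple cost model. The sketch below uses made-up per-operation timings (real values depend on codec, resolution and hardware), but it captures why sparse sampling favors seeking and dense sampling favors sequential grabbing:</p>

```python
# Illustrative cost model: sequential grab vs. random seek.
# The constants below are hypothetical, not measurements.
T_GRAB, T_SEEK, T_DECODE = 0.0002, 0.01, 0.002  # seconds per operation

def sequential_cost(total_frames, n_samples):
    # grab every frame, decode only the sampled ones
    return total_frames * T_GRAB + n_samples * T_DECODE

def seek_cost(total_frames, n_samples):
    # one seek + one decode per sampled frame
    return n_samples * (T_SEEK + T_DECODE)

# sparse sampling: seeking wins
assert seek_cost(30000, 10) < sequential_cost(30000, 10)
# dense sampling: sequential grabbing wins
assert sequential_cost(30000, 5000) < seek_cost(30000, 5000)
```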
<h2 id="sampling-video-frames-faster-with-multi-threads">Sampling video frames faster with multi-threads</h2>
<p>If neither of the above methods meets the speed requirement, the next natural step is to parallelize frame reading. The idea is simple: instead of one thread reading through the whole video, each worker thread reads a contiguous segment of it. (Python threads help here because OpenCV releases the GIL while decoding, so workers can decode frames in parallel.)</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fnos</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">total_frames</span><span class="p">,</span> <span class="n">sample_rate</span><span class="p">))</span>
<span class="n">n_threads</span> <span class="o">=</span> <span class="mi">4</span> <span class="c1"># number of worker threads reading video frames
</span><span class="n">tasks</span> <span class="o">=</span> <span class="p">[[]</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">n_threads</span><span class="p">)]</span> <span class="c1"># frame numbers assigned to each thread
</span><span class="n">frame_per_thread</span> <span class="o">=</span> <span class="n">math</span><span class="p">.</span><span class="n">ceil</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">fnos</span><span class="p">)</span> <span class="o">/</span> <span class="n">n_threads</span><span class="p">)</span>
<span class="n">tid</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">idx</span><span class="p">,</span> <span class="n">fno</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">fnos</span><span class="p">):</span>
<span class="n">tasks</span><span class="p">[</span><span class="n">math</span><span class="p">.</span><span class="n">floor</span><span class="p">(</span><span class="n">idx</span> <span class="o">/</span> <span class="n">frame_per_thread</span><span class="p">)].</span><span class="n">append</span><span class="p">(</span><span class="n">fno</span><span class="p">)</span>
</code></pre></div></div>
<p>A simple worker thread class can be implemented as below. It uses a synchronized queue to communicate between the main thread and the worker threads.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Worker</span><span class="p">(</span><span class="n">threading</span><span class="p">.</span><span class="n">Thread</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">threading</span><span class="p">.</span><span class="n">Thread</span><span class="p">.</span><span class="n">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">queue</span> <span class="o">=</span> <span class="n">queue</span><span class="p">.</span><span class="n">Queue</span><span class="p">(</span><span class="n">maxsize</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">decode</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">video_path</span><span class="p">,</span> <span class="n">fnos</span><span class="p">,</span> <span class="n">callback</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">queue</span><span class="p">.</span><span class="n">put</span><span class="p">((</span><span class="n">video_path</span><span class="p">,</span> <span class="n">fnos</span><span class="p">,</span> <span class="n">callback</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="s">"""the run loop to execute frame reading"""</span>
<span class="n">video_path</span><span class="p">,</span> <span class="n">fnos</span><span class="p">,</span> <span class="n">on_decode_callback</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">queue</span><span class="p">.</span><span class="n">get</span><span class="p">()</span>
<span class="n">cap</span> <span class="o">=</span> <span class="n">cv2</span><span class="p">.</span><span class="n">VideoCapture</span><span class="p">(</span><span class="n">video_path</span><span class="p">)</span>
<span class="c1"># set initial frame
</span> <span class="n">cap</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">cv2</span><span class="p">.</span><span class="n">CAP_PROP_POS_FRAMES</span><span class="p">,</span> <span class="n">fnos</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">success</span> <span class="o">=</span> <span class="n">cap</span><span class="p">.</span><span class="n">grab</span><span class="p">()</span>
<span class="n">idx</span><span class="p">,</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">fnos</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">while</span> <span class="n">success</span><span class="p">:</span>
<span class="k">if</span> <span class="n">count</span> <span class="o">==</span> <span class="n">fnos</span><span class="p">[</span><span class="n">idx</span><span class="p">]:</span>
<span class="n">success</span><span class="p">,</span> <span class="n">image</span> <span class="o">=</span> <span class="n">cap</span><span class="p">.</span><span class="n">retrieve</span><span class="p">()</span>
<span class="k">if</span> <span class="n">success</span><span class="p">:</span>
<span class="n">on_decode_callback</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">break</span>
<span class="n">idx</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">idx</span> <span class="o">>=</span> <span class="nb">len</span><span class="p">(</span><span class="n">fnos</span><span class="p">):</span>
<span class="k">break</span>
<span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">success</span> <span class="o">=</span> <span class="n">cap</span><span class="p">.</span><span class="n">grab</span><span class="p">()</span>
</code></pre></div></div>
<p>Putting them together:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># create and start threads
</span><span class="n">threads</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">n_threads</span><span class="p">):</span>
<span class="n">w</span> <span class="o">=</span> <span class="n">Worker</span><span class="p">()</span>
<span class="n">threads</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">w</span><span class="p">)</span>
<span class="n">w</span><span class="p">.</span><span class="n">start</span><span class="p">()</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">queue</span><span class="p">.</span><span class="n">Queue</span><span class="p">(</span><span class="n">maxsize</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
<span class="n">on_done</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">results</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="c1"># distribute the tasks from main to worker threads
</span><span class="k">for</span> <span class="n">idx</span><span class="p">,</span> <span class="n">w</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">threads</span><span class="p">):</span>
<span class="n">w</span><span class="p">.</span><span class="n">decode</span><span class="p">(</span><span class="n">video_path</span><span class="p">,</span> <span class="n">tasks</span><span class="p">[</span><span class="n">idx</span><span class="p">],</span> <span class="n">on_done</span><span class="p">)</span>
<span class="c1"># consume exactly one result per sampled frame
</span><span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">fnos</span><span class="p">)):</span>
    <span class="n">image</span> <span class="o">=</span> <span class="n">results</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">timeout</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
    <span class="n">do_something</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
</code></pre></div></div>Recently, I was working on a program to sample N frames from a source video and assign each frame a score (from 0 to 1) for thumbnail-worthiness. Before long, it became apparent that decoding video frames was a performance bottleneck. In this post, I look into different ways of reading video frames with OpenCV and then speed it up with multithreading.The Gene An Intimate History2019-11-14T14:36:18+00:002019-11-14T14:36:18+00:00https://vuamitom.github.io/2019/11/14/the-gene-history<p><a href="https://www.amazon.com/Gene-Intimate-History-Siddhartha-Mukherjee/dp/1432837818">The Gene: An Intimate History</a> is a moving account of the discovery and development of genetics, the science of inheritance and of the chemical basis that gives living things their forms and functions. The book is also the author’s deeply personal story of how genetically linked diseases burdened his extended family.</p>
<p>Within a century, genetics has gone from non-existence to one of mankind’s greatest endeavors to understand nature and ourselves. At the beginning of the 20th century, Mendel’s theory was re-discovered, stating that there exists a concrete, indivisible unit of inheritance that is passed on from parents to offspring. Physical traits from parents re-emerge in children distinctly rather than averaging out: crossbreeding a white-flowered plant with a red-flowered one yields either white or red offspring, not pink ones. Then, when scientists discovered that the human genome resides on 23 pairs of chromosomes, the gene took its material form, no longer an abstract theory. After that, Watson and Crick discovered the chemical structure of the gene as the DNA double helix. At the end of the 20th century, thousands of researchers joined efforts to sequence the human genome, a project as iconic of its century as landing people on the moon.</p>
<p>The human genome is approximately 3 billion DNA bases (A, G, T, C) long. Visually speaking, printing those bases double-sided on A4 paper would take a million pages to contain them all. The book provides a number of delightful details about our own genome. The number of genes that constitute the human genome is smaller than that of rice or corn; nature has a way to make more with less. It achieves complexity not with more materials but with a more intricate arrangement of those materials. Besides, throughout the human genome we share many genes with fruit flies and worms. Millions of years of evolution still leave traces behind: those genes stay dormant in our genome, inactive but unable to leave.</p>
<p>As mankind’s understanding of genes evolved, one thing has remained constant: the human aspiration to be better, to be closer to perfection. That desire at times led to horrors like the eugenics movement, which arrogantly proposed to get rid of bad genes. Thousands of people deemed imbeciles or dim-witted were put in camps, isolated, and sterilized. It was a cruel and confusing time; judging the quality of genes by subjective measurements like intelligence is absurd and prone to abuse. Under Nazi Germany, eugenics escalated to exterminating cripples, homosexuals, and Jews. Nowadays, genetic technologies have allowed researchers to tamper with human DNA, first of all to cure diseases. Many have in sight the goal of engineering a better human, not limited by what nature has shown us. A Netflix documentary, <a href="https://www.netflix.com/watch/80208834?trackId=14170286&tctx=1%2C2%2C72483a3e-cddb-4718-943b-092887cc9398-202796346%2Cf8917691-7153-43fd-84c7-1eb56881e528_130081853X3XX1573832856893%2Cf8917691-7153-43fd-84c7-1eb56881e528_ROOT">Unnatural Selection</a>, shows that so-called biohackers are already distributing DIY DNA kits with which brave souls can start altering themselves right now.</p>
<p>One thing to note is that, as Victor McKusick realized when documenting various genetically linked diseases, there is no normalcy in genes, just variation. What we call disease is more about the relationship between a genetic variation and its environment, whether it helps or hinders the organism’s survival. Genetic variations are humankind’s pool of resources to draw on in times of environmental shift, and the condition necessary for Darwinian evolution to work. That is to say, there is inherent risk in trying to engineer our DNA to perfection, which would deprive nature of the gene pool it needs.</p>
<p>The last century was about building the foundation of genetics. This century, building on that foundation together with advancements in computers and AI, genetics will surely move much faster. Who knows which path genetics will lead us down. For better or worse, we should expect radical changes in how we deal with diseases, in our sense of self, and in our views of morality and equality.</p>The Gene An Intimate History is a moving account of the discovery and development of genetics, the science of inheritance and of the chemical basis that gives living things their forms and functions. The book is also the author’s deeply personal story of how genetically linked diseases burdened his extended family.Implement shapenet face landmark detection in Tensorflow2019-09-12T21:41:40+00:002019-09-12T21:41:40+00:00https://vuamitom.github.io/2019/09/12/shapenet-face-landmark-tensorflow<p>In my previous post on <a href="/2019/04/03/face-landmark-detect">building face landmark detection model</a>, the <a href="https://github.com/justusschock/shapenet">Shapenet</a> paper was implemented in Pytorch. With Pytorch, however, running the model on mobile requires converting it to Caffe. Though there is a tool to take care of that, some operations are not supported, and in the case of Shapenet it was not something I knew how to fix yet. It turned out to be simpler to re-implement Shapenet in Tensorflow and then convert it to Tensorflow Lite.</p>
<h2 id="implement-shapenet-in-tensorflow">Implement Shapenet in Tensorflow</h2>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">predict_landmarks</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">pca_components</span><span class="p">,</span> <span class="n">is_training</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">feature_extractor</span><span class="o">=</span><span class="n">extractors</span><span class="p">.</span><span class="n">original_paper_feature_extractor</span><span class="p">):</span>
<span class="c1"># shape means are stored at index 0
</span> <span class="n">shape_mean</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="n">pca_components</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">name</span><span class="o">=</span><span class="s">'shape_means'</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">float32</span><span class="p">)</span>
<span class="n">components</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="n">pca_components</span><span class="p">[</span><span class="mi">1</span><span class="p">:],</span> <span class="n">name</span><span class="o">=</span><span class="s">'components'</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">float32</span><span class="p">)</span>
<span class="n">in_channels</span> <span class="o">=</span> <span class="mi">1</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">inputs</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span> <span class="o">==</span> <span class="mi">3</span> <span class="k">else</span> <span class="mi">3</span>
<span class="c1"># get number of PCA components
</span> <span class="n">n_components</span> <span class="o">=</span> <span class="n">components</span><span class="p">.</span><span class="n">shape</span><span class="p">.</span><span class="n">as_list</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">n_transforms</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">TRANSFORMS_OPS</span><span class="p">.</span><span class="n">items</span><span class="p">():</span>
<span class="n">n_transforms</span> <span class="o">+=</span> <span class="n">v</span>
<span class="n">num_out_params</span> <span class="o">=</span> <span class="n">n_components</span> <span class="o">+</span> <span class="n">n_transforms</span>
<span class="k">if</span> <span class="n">in_channels</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
<span class="n">inputs</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">expand_dims</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="c1"># extract features from input
</span> <span class="n">features</span> <span class="o">=</span> <span class="n">feature_extractor</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">num_out_params</span><span class="p">,</span> <span class="n">is_training</span><span class="p">)</span>
<span class="n">features</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">features</span><span class="p">,</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">num_out_params</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span>
<span class="c1"># run shape layer
</span> <span class="n">shapes</span> <span class="o">=</span> <span class="n">shape_layer</span><span class="p">(</span><span class="n">shape_mean</span><span class="p">,</span> <span class="n">components</span><span class="p">,</span> <span class="n">features</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="n">n_components</span><span class="p">])</span>
<span class="c1"># run transform layers. Transform parameters are stored
</span> <span class="c1"># in the last few indices of features vector.
</span> <span class="n">transformed_shapes</span> <span class="o">=</span> <span class="n">transform_layer</span><span class="p">(</span><span class="n">shapes</span><span class="p">,</span> <span class="n">features</span><span class="p">[:,</span> <span class="n">n_components</span><span class="p">:])</span>
<span class="k">return</span> <span class="n">transformed_shapes</span>
</code></pre></div></div>
<p>The implementation in Tensorflow is pretty straightforward:</p>
<ol>
  <li>Load the pre-calculated PCA components onto the Tensorflow graph as <code class="language-plaintext highlighter-rouge">tf.constant</code>s.</li>
  <li>Run a feature extractor network on the input images to retrieve the feature vector.</li>
  <li>Run the shape layer on the extracted features to get landmarks.</li>
  <li>Transform the output of the shape layer with the scale, translation and rotation parameters.</li>
</ol>
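<p>The shape layer in step 3 is just a PCA reconstruction: each predicted weight scales one shape component, and the weighted sum is added to the mean shape. A minimal numpy sketch of the idea (the function name and toy sizes here are mine, not from the repo):</p>

```python
import numpy as np

def shape_layer_np(shape_mean, components, weights):
    """Reconstruct landmarks from PCA: mean + sum_i w_i * component_i.

    shape_mean: (L, 2) mean landmark positions
    components: (C, L, 2) principal shape components
    weights:    (B, C) predicted weights per batch item
    returns:    (B, L, 2) landmark coordinates
    """
    # contract the component axis of weights against components
    return shape_mean + np.tensordot(weights, components, axes=([1], [0]))

# toy example: 1 component, 3 landmarks, batch of 1
mean = np.zeros((3, 2))
comps = np.ones((1, 3, 2))
w = np.array([[2.0]])
out = shape_layer_np(mean, comps, w)
assert out.shape == (1, 3, 2)
assert np.allclose(out, 2.0)
```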
<p>Detailed implementation of each layer can be found <a href="https://github.com/vuamitom/shapenet-tensorflow/tree/master/model/shapenet">here</a>.</p>
<h2 id="speeding-up-feature-extraction-with-depthwise-convolution">Speeding up feature extraction with depthwise convolution.</h2>
<p>When deploying a neural network on mobile, speed is key. Using the Tensorflow Lite <a href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark">benchmark tool</a>, we can see that convolution ops are among the most costly operations. By replacing the convolution layers with <a href="https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728">depthwise separable convolutions</a>, a quick speed-up can be achieved.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">depthwise_conv_feature_extractor</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">num_out_params</span><span class="p">,</span> <span class="n">is_training</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
<span class="k">return</span> <span class="n">original_paper_feature_extractor</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">num_out_params</span><span class="p">,</span> <span class="n">is_training</span><span class="o">=</span><span class="n">is_training</span><span class="p">,</span> <span class="n">use_depthwise</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">original_paper_feature_extractor</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">num_out_params</span><span class="p">,</span> <span class="n">is_training</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">use_depthwise</span><span class="o">=</span><span class="bp">False</span><span class="p">):</span>
<span class="s">""" Original feature extractor accepts a use_depthwise flag
"""</span>
<span class="n">norm_class</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">contrib</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="n">instance_norm</span>
<span class="c1"># use depthwise conv2d as a drop-in replacement for conv2d
</span> <span class="n">conv2d_class</span> <span class="o">=</span> <span class="n">slim</span><span class="p">.</span><span class="n">separable_conv2d</span> <span class="k">if</span> <span class="n">use_depthwise</span> <span class="k">else</span> <span class="n">slim</span><span class="p">.</span><span class="n">conv2d</span>
</code></pre></div></div>
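<p>The speed-up comes from the parameter (and FLOP) reduction: a standard k×k convolution mapping C_in to C_out channels has k·k·C_in·C_out weights, while the depthwise separable version has only k·k·C_in (depthwise) plus C_in·C_out (pointwise). A quick sanity check, ignoring bias terms:</p>

```python
def conv_params(k, c_in, c_out):
    # standard 2D convolution, no bias
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    # k*k depthwise filter per input channel + 1x1 pointwise conv
    return k * k * c_in + c_in * c_out

# e.g. a 3x3 conv mapping 64 -> 128 channels
assert conv_params(3, 64, 128) == 73728
assert separable_conv_params(3, 64, 128) == 8768
# roughly an 8x parameter reduction for this layer
```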
<h2 id="converting-to-tflite-model">Converting to Tflite model.</h2>
<p>In theory, converting a Tensorflow model to Tflite is just a matter of running the converter tool. However, similar to the Pytorch converter, a Tflite model requires the Tflite runtime to support every operation in the model graph. For Shapenet, <code class="language-plaintext highlighter-rouge">tf.broadcast_to</code> and <code class="language-plaintext highlighter-rouge">tf.cos</code> were two operations that were not supported at the time. Fortunately, it is pretty simple to implement your own from supported operations. (Note that deriving cosine as <code class="language-plaintext highlighter-rouge">(1 - sin**2)**0.5</code> recovers it only up to sign, so this workaround assumes the rotation angle stays within ±90 degrees.)</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># for tf.broadcast_to
</span><span class="k">def</span> <span class="nf">broadcast_to_batch</span><span class="p">(</span><span class="n">tensor</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
<span class="k">if</span> <span class="n">FOR_TFLITE</span><span class="p">:</span>
<span class="n">multiples</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">concat</span><span class="p">([[</span><span class="n">batch_size</span><span class="p">],</span> <span class="n">tf</span><span class="p">.</span><span class="n">ones</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">shape</span><span class="p">(</span><span class="n">tensor</span><span class="p">)),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">int32</span><span class="p">)],</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">tile</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">expand_dims</span><span class="p">(</span><span class="n">tensor</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="n">multiples</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="n">name</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">broadcast_to</span><span class="p">(</span><span class="n">tensor</span><span class="p">,</span> <span class="p">[</span><span class="n">batch_size</span><span class="p">,</span> <span class="o">*</span><span class="n">tensor</span><span class="p">.</span><span class="n">shape</span><span class="p">],</span> <span class="n">name</span><span class="o">=</span><span class="n">name</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># for tf.cos
</span><span class="k">def</span> <span class="nf">do_cos</span><span class="p">(</span><span class="n">tensor</span><span class="p">,</span> <span class="n">sin_tensor</span><span class="p">):</span>
<span class="k">if</span> <span class="n">FOR_TFLITE</span><span class="p">:</span>
<span class="c1"># calculate cos from sin
</span> <span class="c1"># since cos is not supported as of 1.13.1
</span> <span class="c1"># cos = (1 - sin**2)** 0.5
</span> <span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="nb">pow</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">tf</span><span class="p">.</span><span class="nb">pow</span><span class="p">(</span><span class="n">sin_tensor</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="mf">0.5</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">cos</span><span class="p">(</span><span class="n">tensor</span><span class="p">)</span>
</code></pre></div></div>In my previous post on building a face landmark detection model, the Shapenet paper was implemented in PyTorch. With PyTorch, however, running the model on mobile requires converting it to Caffe. Though there is a tool to take care of that, some operations are not supported, and in the case of Shapenet it was not something I knew how to fix yet. It turned out to be simpler to just re-implement Shapenet in Tensorflow and then convert it to Tensorflow Lite.Viettel’s mobile money and the threat to banks2019-07-13T12:31:40+00:002019-07-13T12:31:40+00:00https://vuamitom.github.io/2019/07/13/viettel-mobile-money-n-threat-to-bank<p>In a recent conversation, a dear friend of mine claimed that once telecommunication providers like Viettel and VNPT have their mobile money licenses approved, they would displace banks as providers of financial services. Banks would inevitably be sidelined. I frowned at the notion of an incoming onslaught of telcos, as mobile money is not a new concept. If one looks at it as the transfer and storage of monetary value via cellphone, the concept can be traced back to the era of feature phones. Back then, phone subscription owners could top up others’ mobile numbers with their existing mobile balances, and services like ringtones and quizzes could charge users via SMS. Thus mobile money has long been in limited use. And a few years ago, mobile wallets (e-wallets like ZaloPay, Momo…), smartphone applications that facilitate payment and money transfer via phone, already began their costly quests to acquire users. So what is the big deal about this mobile money development?</p>
<h3 id="what-is-mobile-money">What is mobile money</h3>
<p>It turns out that what made my friend excited about the mobile money license was not any additional use case over existing mobile wallets but rather the elimination of a requirement. In 2014, the State Bank of Vietnam mandated that each mobile wallet user provide a linked bank account held under the same name [1]. The rule has been instrumental in stopping people from activating mobile wallets, and it is a hindrance even to people who have bank accounts but no internet banking service. My mom is an example.</p>
<p>What changed is that this year, the government is about to allow telecom companies (Viettel, VNPT …) to provide mobile wallet services without linked bank accounts. Now my mom can open her own ViettelPay account, have someone else top up her initial balance via a transfer, a function built into most mobile wallets, and then start paying her electricity bill right away.</p>
<h3 id="why-is-it-bad-for-banks">Why is it bad for banks</h3>
<p>Banks have long been the gatekeepers to consumers’ pockets. They are uniquely paid to keep customers’ idle cash, enjoy a direct relationship with customers, and possess all the data on customers’ cash flow and solvency. Banks then loan out the cash, pay part of the interest to customers, and keep the rest. Besides, a deep understanding of customers’ financial capacity allows banks to upsell insurance and mortgages. Traditional payment networks like Visa and Mastercard delegate customer sign-up and relationship management to banks.</p>
<p>Despite the economic growth of recent years, only about 30% of Vietnam’s over-15 population have bank accounts [1], while mobile subscriptions cover more than 84% [2]. Arguably, telecom providers would have access to this vast and unfragmented potential market. There are only 3 or 4 telecom companies dominating the local market, compared to the great many banks, and acquiring users is just a matter of cultivating existing relationships. Thus telcos are well positioned to push banks to the fringe. Maybe one day banks, deprived of their direct relationship with customers, would become providers of loans and saving plans on platforms controlled by telcos. Or worse, risk being replaced entirely.</p>
<p><img src="/content/images/unbanked_pop.jpg" alt="" /></p>
<h3 id="why-it-is-alos-an-opportunity">Why it is alos an opportunity.</h3>
<p>Although the change in regulation brings new competitors into banks’ territory, it would be a mistake to overlook the opportunities it opens up for financial institutions.</p>
<p>Firstly, the pie would get bigger, benefiting every player, including banks. Instead of buying things with cash, more people would pay for goods and services by wielding their phones. Convenience aside, people would gain access to their transaction history, which would be instrumental in determining one’s creditworthiness. Imagine farmers applying for loans by bringing their mobile wallets to a nearby bank branch. At the same time, banks would be able to better evaluate risk and avoid bad debt without overlooking potential customers.</p>
<p>Secondly, in the long run, different mobile money systems would likely be connected, allowing users to transfer funds seamlessly between their different wallets and accounts. When that happens, banks can participate as just another wallet provider, competing on quality of service. One can already top up or withdraw money at zero fee with the likes of ZaloPay and Momo. I’m thinking about a deeper level of interconnectedness, where a ZaloPay user can buy goods from a merchant who uses Momo. Maybe that would require a shared payment protocol or a new intermediate payment network altogether, but I think it is a goal worth striving for. If a mobile application truly wants to act as a wallet, it should not dictate where the wallet owner can spend his or her money. In the Visa and Mastercard world, I can pay for my meal with either of them since they share the same standard. Besides, for mobile money to reach wide adoption, the system must have liquidity, and one way to achieve that is to ensure the unhindered flow of money.</p>
<h3 id="what-should-banks-do-now">What should banks do now</h3>
<p>Top of the list is to improve the user experience of mobile banking. Banking apps have always been complex systems with many functions, which may be one reason why most banking apps out there feel like a tiring attempt to squeeze many things into a little screen real estate. Opening an e-wallet like ZaloPay or Momo is much faster than opening a bank’s app, understandably due in part to the high security level imposed by banks. That said, banks could organize functions into sensitive and less sensitive groups, letting users enter the app much more quickly for non-sensitive features and re-prompting for authentication only when important actions are about to be taken. Besides, the inter-bank transfer system can be improved to rival the instant, zero-cost transfers offered by e-wallets.</p>
<p>Furthermore, banks should experiment with some form of zero-fee, mobile-first bank account. Opening a bank account involves a trip to a nearby branch, which can be time consuming, and keeping a bank account can be costly if one does not have much money to deposit. Not to mention that card payment is not widespread. That is why most people don’t bother to have a bank account yet. Now, with the spread of mobile payment, it could become possible for customers to open a bank account instantly via a mobile app and make limited transactions before submitting all the documents.</p>
<p>Last but not least, among the unbanked, banks can specifically target youngsters, who will soon grow up to be working, tax-paying adults. One idea is to encourage opening bank accounts for children or dependants of existing customers. In an increasingly cashless world, parents still need to give their children pocket money. Opening a family bank account meets that need and also secures a relationship with future customers.</p>
<h3 id="references">References</h3>
<ol>
<li>
<p><a href="https://asia.nikkei.com/Editor-s-Picks/FT-Confidential-Research/Red-tape-holds-Vietnam-back-in-digital-payments">https://asia.nikkei.com/Editor-s-Picks/FT-Confidential-Research/Red-tape-holds-Vietnam-back-in-digital-payments</a></p>
</li>
<li>
<p><a href="https://vietnamnews.vn/economy/418482/smartphone-users-cover-84-of-vn-population.html">https://vietnamnews.vn/economy/418482/smartphone-users-cover-84-of-vn-population.html</a></p>
</li>
</ol>Why the Libra was created2019-06-20T00:31:40+00:002019-06-20T00:31:40+00:00https://vuamitom.github.io/2019/06/20/facebook-and-the-libra<p>The day before yesterday, Facebook <a href="https://www.facebook.com/zuck/posts/10107693323579671">announced its new currency</a>, the Libra, as part of its effort to break into the payment market. Trying to eat the payment cake has long been an anticipated move for Facebook, since its Chinese social messaging counterpart, WeChat, has demonstrated how widespread mobile payment can be. But unlike WeChat, Facebook does not just provide a payment service; it goes as far as to create a blockchain-based currency. Why the trouble?</p>
<p>Imagine that Facebook did it WhatsApp’s way, which is to be a payment provider. It would then sit between customers and merchants in transactions in fiat money (USD, SGD, VND…). To buy a cup of tea, the customer would use Facebook’s mobile wallet app to scan the product’s QR code and tap to agree to pay; the merchant’s wallet balance would then be credited, and finally the customer could enjoy his cup of tea. But how money goes from the customer’s bank account to the merchant’s is not that straightforward. Besides flipping bytes and bits to reflect the new balances on the merchant’s and customer’s digital wallets, Facebook would need to rely on existing payment infrastructure to carry out the transaction. They have a few choices.</p>
<p>The simplest would be to trigger an inter-bank transaction between the user’s bank and the merchant’s. Interbank payment is a system that enables cross-border transactions between banks, which are identifiable by their SWIFT codes. If the two banks reside within the same country, they may rely on the national payment network instead. This approach is naive, as it results in a high cost per transaction and can take up to a few days to complete. Not to mention that if Facebook wants to charge a small transaction fee, such micropayments would be prohibitively costly under this system.</p>
<p>Alternatively, existing payment networks such as Visa or Mastercard would happily facilitate those transactions and take a good bite of the cake. This approach may be simple but, again, very costly.</p>
<p>The above two approaches are either slow or expensive. So one solution that is pretty popular among digital wallets is to exploit the fact that intra-bank transactions are free. Say the merchant and customer in the transaction hold accounts with bank A and bank B respectively. Facebook would then open its own accounts with these two banks. If a cup of coffee costs 80.000 VND, the customer’s account would be debited and Facebook’s account at bank B credited with the same amount. At the same time, at bank A, Facebook’s account would be debited and the merchant’s account credited with 80.000 VND. No money moves off a bank’s balance sheet. I guess this is the approach taken by most local digital wallets like ZaloPay, Momo, and ViettelPay. However, Facebook operates at a much larger scale, which leads to a more complicated issue: opening equivalent accounts with major banks in every country in which Facebook does business would be too much of a burden. This is a problem not faced by any other digital wallet provider, not even WeChat Pay or Alipay, since the latter mostly serve customers within China.</p>
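<p>The intra-bank trick above amounts to two ordinary ledger movements. The sketch below is purely illustrative: the <code>Ledger</code> class, the <code>settle</code> function, and the account names are my own invention, not any real payment system’s API.</p>

```python
class Ledger:
    """In-memory account balances at one bank; intra-bank transfers never leave its books."""
    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer(self, src, dst, amount):
        # a free intra-bank transfer: debit one account, credit another
        assert self.balances[src] >= amount, "insufficient funds"
        self.balances[src] -= amount
        self.balances[dst] += amount


def settle(customer_bank, merchant_bank, amount):
    # the customer pays the provider's account held at the customer's own bank...
    customer_bank.transfer("customer", "provider", amount)
    # ...while the provider pays the merchant from its account at the merchant's bank
    merchant_bank.transfer("provider", "merchant", amount)


bank_b = Ledger({"customer": 100_000, "provider": 500_000})  # customer's bank
bank_a = Ledger({"provider": 500_000, "merchant": 0})        # merchant's bank
settle(bank_b, bank_a, 80_000)  # an 80.000 VND cup of coffee
```

<p>Each bank only sees a routine internal transfer, while the provider’s net position across the two banks shifts by the transaction amount, which is why rebalancing those accounts becomes a burden at global scale.</p>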
<p>So what solution is left for Facebook? Cryptocurrency, which allows near-instant transactions without middle parties. The reason Facebook wants to issue its own currency instead of adopting existing stablecoins (USDT, USDC…) may be technical, since none of the existing networks meets the expected transactions-per-second requirement, and Bitcoin is both volatile in value and vulnerable to high transaction fees.</p>
<p>At the moment, the move to create the Libra looks like a decision motivated by necessity for Facebook. Whether they will be able to persuade lawmakers, onboard backers, and enroll merchants remains to be seen. If Facebook manages to pull it off, its impact will be far and wide.</p>Use MobileNetV2 as feature extractor in Tensorflow2019-06-19T21:41:40+00:002019-06-19T21:41:40+00:00https://vuamitom.github.io/2019/06/19/train-mobilenetv2-tensorflow<p>Applying machine learning to image processing tasks sometimes feels like toying with Lego blocks: one base block to extract feature vectors from images, another block to classify… Popular choices of feature extractors are MobileNet, ResNet, and Inception. And as with any other engineering problem, choosing a feature extractor is about weighing trade-offs between speed, accuracy, and size. For my current task of running ML on mobile devices, <a href="https://arxiv.org/abs/1801.04381">MobileNetV2</a> seems to be a good fit, as it is fast, quantization friendly, and does not sacrifice too much accuracy. Tensorflow provides a <a href="https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet">reference implementation</a> of MobileNetV2 that makes using it much easier.</p>
<p>First things first: clone the <a href="https://github.com/tensorflow/models">repo</a> and add it to the Python path.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">sys</span>
<span class="n">sys</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="s">'/path/to/tensorflow/models/research/slim'</span><span class="p">)</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">mobilenet_v2.mobilenet_base</code> returns the endpoint tensors produced by convolving the input image. For MobileNetV2, the last layer is <code class="language-plaintext highlighter-rouge">layer_20</code>. Output from MobileNetV2 can be used for classification or as input to SSDLite for object detection.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="k">def</span> <span class="nf">mobilenet</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">is_training</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">variable_scope</span><span class="p">(</span><span class="s">'myscope'</span><span class="p">,</span> <span class="n">reuse</span><span class="o">=</span><span class="n">reuse_weights</span><span class="p">)</span> <span class="k">as</span> <span class="n">scope</span><span class="p">:</span>
<span class="k">with</span> <span class="n">slim</span><span class="p">.</span><span class="n">arg_scope</span><span class="p">(</span>
<span class="n">mobilenet_v2</span><span class="p">.</span><span class="n">training_scope</span><span class="p">(</span><span class="n">is_training</span><span class="o">=</span><span class="n">is_training</span><span class="p">,</span> <span class="n">bn_decay</span><span class="o">=</span><span class="mf">0.9997</span><span class="p">)),</span> \
<span class="n">slim</span><span class="p">.</span><span class="n">arg_scope</span><span class="p">(</span>
<span class="p">[</span><span class="n">mobilenet</span><span class="p">.</span><span class="n">depth_multiplier</span><span class="p">],</span> <span class="n">min_depth</span><span class="o">=</span><span class="mi">16</span><span class="p">):</span>
<span class="k">with</span> <span class="p">(</span><span class="n">context_manager</span><span class="p">.</span><span class="n">IdentityContextManager</span><span class="p">()):</span>
<span class="n">_</span><span class="p">,</span> <span class="n">image_features</span> <span class="o">=</span> <span class="n">mobilenet_v2</span><span class="p">.</span><span class="n">mobilenet_base</span><span class="p">(</span>
<span class="n">od_ops</span><span class="p">.</span><span class="n">pad_to_multiple</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="mi">32</span><span class="p">),</span>
<span class="n">depth_multiplier</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span>
<span class="n">is_training</span><span class="o">=</span><span class="n">is_training</span><span class="p">,</span>
<span class="n">use_explicit_padding</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">scope</span><span class="o">=</span><span class="n">scope</span><span class="p">)</span>
<span class="c1"># last layer of moblilenetV2 is layer_20
</span> <span class="c1"># this is for demonstration purpuse.
</span> <span class="c1"># Layers such as fully_connected can be put here.
</span> <span class="k">return</span> <span class="n">image_features</span><span class="p">[</span><span class="s">'layer_20'</span><span class="p">]</span>
</code></pre></div></div>
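<p>For intuition, <code>pad_to_multiple(inputs, 32)</code> above pads the spatial dimensions up to the next multiple of 32 so that the network’s repeated stride-2 downsamplings divide the feature map evenly. A plain-Python sketch of that size arithmetic (the <code>padded_size</code> helper is mine, just for illustration):</p>

```python
def padded_size(dim, multiple=32):
    # round dim up to the nearest multiple, e.g. 225 -> 256
    return ((dim + multiple - 1) // multiple) * multiple

print(padded_size(224), padded_size(225), padded_size(256))  # 224 256 256
```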
<p>It is a pretty straightforward process. There is only one point to note: the <code class="language-plaintext highlighter-rouge">is_training</code> flag. When it is set to True, BatchNorm layers accumulate statistics such as the moving variance and moving mean; when it is False, they simply use those stored values. When I first trained my MobileNetV2-based network, the loss decreased fine and predictions worked as expected during training, yet the model produced junk output when <code class="language-plaintext highlighter-rouge">is_training</code> was set to False for evaluation, and quantizing the model seemed to make it gibberish. It turned out I needed to make the update of the moving variances and means a dependency of the training op, as cautioned <a href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py#L473">here</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">update_ops</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">compat</span><span class="p">.</span><span class="n">v1</span><span class="p">.</span><span class="n">get_collection</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">GraphKeys</span><span class="p">.</span><span class="n">UPDATE_OPS</span><span class="p">)</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">control_dependencies</span><span class="p">(</span><span class="n">update_ops</span><span class="p">):</span>
<span class="c1"># add dependency on "moving avg" for batch_norm
</span> <span class="n">train_op</span> <span class="o">=</span> <span class="n">optimizer</span><span class="p">.</span><span class="n">minimize</span><span class="p">(</span><span class="n">l1_loss</span><span class="p">,</span> <span class="n">global_step</span><span class="p">)</span>
</code></pre></div></div>
<p>Without it, Tensorflow does not write those statistics as part of the model, which breaks it after quantization.</p>
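<p>What those update ops maintain is essentially an exponential moving average of each batch’s statistics. A toy illustration of the update rule (the function name is my own; the 0.9997 decay mirrors the <code>bn_decay</code> used above):</p>

```python
def update_moving_stat(moving, batch_value, decay=0.9997):
    # the rule BatchNorm's update ops apply at each training step
    return decay * moving + (1.0 - decay) * batch_value

moving_mean = 0.0
for batch_mean in [0.5, 0.52, 0.48, 0.51]:
    moving_mean = update_moving_stat(moving_mean, batch_mean)
# with such a high decay, moving_mean creeps toward the batch statistics very slowly,
# which is why the ops must run on every training step to converge by evaluation time
```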