Part 1: I am not doing a PhD, I am doing Research.

I found myself enrolled in a PhD (Permanent Head Damage) program on a deep learning topic. It was a conscious decision to put myself into a new rhythm: to explore current trends in AI technology, do research, and perhaps contribute a few small things to the body of knowledge. It was not an impulsive decision either; I wanted to get into the "rhythm of research" by putting myself under additional external pressure. The decision has been made! For me, the key is the research itself; however, I am attending two classes as a refresher and starting point, Research Methodology and Philosophy of Science. I am very excited to find myself a student instead of a lecturer. Even though I sometimes feel lost in the philosophy class, it is entertaining to see how well I can fool myself with philosophy.

I read a lot of PhD guides before I took the program. Many cover formal research methods, but only a few cover the basic skills for doing research in the artificial intelligence area. By basic I mean the concrete, real, must-have skills needed to enjoy the whole research process without losing interest. Even for myself it is hard to define formally, but I believe we as researchers already "know" it; we just have to follow our heart and intuition consistently. For me it is about curiosity, reading, writing, thinking and coding.

[Curiosity]

Curiosity is a positive behavior, or emotion: the urge to explore, investigate and learn something. It must persist over time, not as a momentary interest but as something continuously burning inside our brain. Curiosity helps to differentiate between knowing the name of something and understanding something. In more emotional terms, curiosity means falling in love with the intellectual activity of exploring something. But what? Are deep neural networks, for example, really interesting enough to go deep into? Or should nearly everything be interesting if we go deep, as Feynman said? I prefer to work as hard as I can on the things I like to do best. Hard means: in the most undisciplined, irrelevant and original manner. Forget what we as students want to be (such as graduating on time by following common methods) and focus on what we want to do (research). The plan is simple: do it hard and find the beauty in it as the payoff. It is not normal for some people, but as in business, there are only two options in research: fun (intellectual curiosity) or profit (what the hell, papers and patents?).

It is not easy to explain something like curiosity in a formal way. But usually my best reality check for the existence of curiosity in a student is to ask: how long have you been interested in it? And tell me stories about how hard you have worked to learn it. Period! If you tell beautiful stories, then let the roller coaster begin, and enjoy!

“Physics is like sex: sure, it may give some practical results, but that’s not why we do it.” Richard Feynman. 

[Reading]

It is impossible to do research without reading. It is a must-have skill, no option! The body of knowledge has been built over decades; a few parts are well explained in books and the rest are mostly scattered across papers and other technical documents. I was in a debate with my friend about books. We concluded that there are hundreds of books on the neural network topic, but only a few worth reading, the ones that really explain the key concepts or show real teaching style. It is impossible to find one book that explains everything we need for research, but it is also impossible for us to read all the books. We have to find the books that explain the key concepts well, and that is hard. It is great if our professor can recommend a few books to finish; however, I think he faces the same problem.

In the case of deep neural networks, I found Michael Nielsen's online book entertaining. I like the way he explains deep learning; it reminded me of the way Feynman explains physics. His book gave me the key concepts I needed before reading more formal, academic books like Bishop on neural networks and machine learning. Other classic machine learning texts may also help, like Murphy's MLPP and Hastie's ESL, but I think we need the right purpose when reading books. The purpose that works best for me is to understand the key concepts first. The topic is already too big, and it is hard to follow the state of the art without understanding the key concepts. Some people prefer to learn key concepts from online courses like Coursera and Udacity. However, I find videos less engaging compared to reading the right books. It might be my personal bias, but I think I have a point. Text is still the best way to explain complex things, as it sets our brain free to imagine the visuals, provided the author picks the right words. Not the other way around.

Reading papers is another skill that takes time to practice. You can pick any journal, search your topic and find hundreds of "nearly related" papers. Of course we can't read them all; it's impossible. My physics professor, Rosari Saleh, taught me 20 years ago to categorize papers based on their root professors or research labs. In the case of deep neural networks, that would be Prof. Geoffrey Hinton (Google), Prof. Yann LeCun (Facebook), Prof. Yoshua Bengio (Montreal), etc. I took the same simple approach and found it interesting enough to share. First, I collect all related papers to see whether there is anything interesting in them. I usually scan-read the abstracts, which are supposed to be brief. If I find a paper that is very closely related to my topic, I put some notes into my notebook mind-map by copy-pasting the abstract. The purpose is to tell myself to go back to it later, once I already have questions. Reading papers without initial questions can easily waste time. I like to read with questions like:

  • Why does it relate to my topic?
  • How can I potentially use it later?
  • What are the key engineering approaches or heuristic tricks?
  • Are there future research areas suggested by the authors?

I found that having questions before reading is a very important discipline for structuring my research notes. The questions change from time to time depending on my level of understanding. I design the questions and put the answers from the papers I read into them, usually by copy-paste. During the process, something is actually recorded into my memory, which helps me relate it to other ideas from books or technical documents. It also helps with writing later. My friend told me that Quora can help trigger interesting questions. I will look into it sometime, with the hope that it will not distract me.

Writing is another essential skill for doing research, and it is closely related to reading. I will cover it in part 2 of this post.

TSMRA – Jakarta, March, 2016.

Is an Algorithm Test Required to Hire a Software Engineer?

I remember my first interview for a web developer job back in 2000, during Indonesia's first Web bubble. I didn't know about SQL databases or HTML, and had never even heard of an IDE for writing code. As a theoretical physics graduate, I knew Matlab, Fortran and C compilers, but I was never in touch with the latest trends. I brought my printed thesis, CV and some published scientific papers (on the optical properties of amorphous semiconductors), intending to show what I had done before. I honestly answered all the technical questions with only two sentences: "I don't know" and "I have never heard of it." Done! He lost his words and I was also silent. The final question was, "If you got the job, how would you convince me that you will perform well?" I showed him all my work and told him, "If the job you are offering is more complicated than this, I will need time to learn. If not, then give me the job, because I do need it." Done! I got hired. But before I went home, he gave me a bunch of Microsoft Press books to read before my start date. I had mixed feelings that day.

Now, 15 years have passed. I have done hundreds of interviews to hire fresh graduates for various marketing, sales, technical evangelist and software engineer roles. I realized the world has changed (at least in Indonesia). What I mostly found was the Dunning-Kruger effect, a famous cognitive bias wherein unskilled individuals overestimate their superiority while highly skilled individuals tend to underestimate their relative competence. Forget about papers (or patents) or theses as proof of competence; many candidates didn't even write a proper intro email. With the booming of tech companies in Indonesia, dramatic stories are circulating about the behavior of what we call millennials or Gen Y. I always have mixed feelings after an interview session, as I expect to see humility, curiosity and problem solving skills. Confidence during the interview is no longer the best barometer of attitude.

Now back to the question. With the rapidly growing demand for good software engineers, is it necessary to assess their atomic problem solving skills with an algorithm test, or their practical skills with a coding test? Why do Google, Microsoft, Facebook and Samsung require that process? Why don't others? You'll find a lot of debate, as it relates to the velocity of the hiring process, the revenue of headhunting companies, or something else. What if you end up hiring a Google-bot, lazy-pseudo-hacker, taichi-master or ninja-turtle? Statistics suggest it costs the company about two years of cash lost (~25k-50k in Indonesia), as it takes one year to let them learn and another year to fire them. But the costs go beyond cash: what about productivity, time lost, morale, and the impact on your customers? It's a serious business problem, not just a geeky interviewer's brainteaser habit.

Sometimes I think we need a "deep learning algorithm" to classify candidates into hire, no hire, or hire for another role. Human assessment is certainly weak. What if the interviewer is also "not so good an engineer"? How can he assess technical skills with a Q&A approach? Or even worse, how can he assess humility and curiosity? I can put many more questions on the floor to convince anyone that we need a proper assessment mechanism, but is an algorithm test the answer?

Let's see. How many times do engineers really use algorithms when they code common app projects (mobile, web, wearable, API, database)? Even when they need sorting or searching, for example, they can easily find it with a simple keyword search on Google or Wikipedia. And for stranger things like BFS and DFS for graph traversal, good tutorials are easily found on YouTube. Most programming languages already provide common combinatorial data structures with facilities to perform these algorithms. If that is not enough, tons of reusable open source libraries (with documentation and sample code) are available, and a huge number of technical papers prove more reliable than their own memories or analytical skills. Why on earth should they learn or memorize it today, if Google or Wikipedia can help them find the right information quicker? Let Google memorize, just copy-paste, don't think! What about the correctness and efficiency of the code? No one cares as long as it works; computers have big memories nowadays and compilers are smarter: no more premature optimization required, no more pointers in Python. The best engineers don't waste time committing Wikipedia to memory (is that even possible?). And as long as no bugs are reported and the customer is happy, just move on. This is the era where software has to move quickly to eat the world.
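To make that point concrete, here is a minimal sketch in Python (the data and the tiny graph are made up purely for illustration) of how far the standard library alone already goes:

from bisect import bisect_left
from collections import deque

data = [42, 7, 19, 3, 88]
data.sort()                      # built-in Timsort, no hand-written quicksort needed
idx = bisect_left(data, 19)      # binary search from the standard library

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def bfs(start):
    # breadth-first traversal in a few lines on top of deque
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

print(data, idx, bfs("a"))       # [3, 7, 19, 42, 88] 2 ['a', 'b', 'c', 'd']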

Is an algorithm test still an effective measure of technical and coding proficiency? Isn't it the candidate's fundamental technical strength and understanding, creative thinking, and problem solving that are critical to evaluate? Let the future PhD candidates answer that question with proper deep-learning-based research. But I've heard that a computer scientist is a mathematician who only knows how to prove things by induction. Curious to try?

Let's assume that the code an engineer will write is a collection of classes, functions in those classes, and relations between classes and functions. That's common in modern programming languages, so the whole codebase is a set of {Class, Function, Relation} with sizes N, M and L. But the relations between classes are "Gang of Four" art with little impact on correctness and efficiency (again, my biased two cents from functional programming). So let L = 0 and N = M to oversimplify the problem. We end up with a set {Class, Function} of size N, in which each class has exactly one function in my model. The class and function can represent any feature in the program, like displaying a list of texts, showing a dialog form, or anything related to the program's context. With mathematical induction, we can easily "prove" the claim: if it is true for a base case like N = 1 or 2 that no algorithm is required, then assume it is true all the way up to N-1 and prove it for general N using that assumption. With that simple model and induction technique, you can easily "prove" that for common app projects like building mobile, wearable, web, API and database apps, where we reuse existing libraries and frameworks, algorithms are NOT really required to make things work correctly and efficiently. Only in some small specific function might you need an algorithm, but that is minor. Maybe the library developers used algorithms when they wrote their code, but I believe that is also a small portion. So, is it mathematically true that algorithms are not really required to write common apps?

If you give me more time, I will confuse you more and more. That's my bad habit when I can't convince you in a single sentence. But hold on. I agree that a solid foundation in computer science is a great asset, especially for a software product company or for internet-scale complexity serving hundreds of millions of users like the big boys: Google, Facebook, Microsoft, Apple and Samsung. Not for those tiny ambitious startups who want to conquer the world from Indonesia. I have heard arguments like this: we need velocity to release our products and meet the Silicon Valley VCs! If we get their money, then we hire Google and Microsoft engineers who graduated from MIT, Stanford, CMU or Princeton to rewrite the code, or to lead Indonesian local engineers to write better code. Let's be apologetic about having no algorithms or quality code at first, but boldly consistent with entrepreneurial business goals. Sounds like a great plan, right?

Are you convinced now that algorithms are not required? Great if you are not. Now let me tell you the truth. Small things matter in a big room. Engineers are human, and sneaky bugs are still able to creep in, even in simple code. The context of "algorithm" is very broad, as it covers many types of computing purposes: numerical, scientific and non-numerical methods in general, and special-purpose algorithms like graphics, crypto, parallel and distributed systems, etc. But even if we scale the context down to simple computation logic, which requires basic recursive and incremental reasoning on a Turing machine, it is still a valid problem because the coder is still human. An engineer mixes other unrelated problems into the code, like his girlfriend affair or his credit card bill. Bugs can cost millions of dollars, or even worse, kill people in a spacecraft mission due to a missed algorithmic or numerical calculation. Common applications can fail because they are buggy. It costs a lot, not only cash, but other bigger things including company reputation and morale. Manual human assessment is limited and can no longer help; even Google implements a machine-learning algorithm to predict buggy code.

Caring about basic algorithm skills in software engineering is a reflection of a quality culture and mindset. Quality is the pride of engineering and craftsmanship, the bread and butter of the profession. That does not mean an engineer has to implement algorithms by hand on a daily basis, but they need strong acumen, which is a product of continuous practice of basic things. If you are a samurai, do you practice all the advanced techniques every day? No, only some basic moves. No warlord will send a samurai who can't perform basic cuts into a deadly fight; it would be suicide. The basic move of the software engineer is the algorithm, or analytical thinking: to keep technical acumen alive, to keep logical consciousness, and to have unconscious reflexes during a project sprint. It is knowledge and skill kept alive by practicing a few basics, not by memorizing advanced concepts. It is a brain exercise to form more System 2 neural connections. Like a samurai, engineers can't memorize all the techniques (let Wikipedia do that), but they have to really master a few basics to have reflexes in a fight; you need a sixth sense. That can only happen if you are familiar with and keep practicing the basic moves daily. And in an interview, that is what we are looking for: a sign of intelligence and acumen on basic concepts.

I am a strong believer that an algorithm test is required in interviews to assess the problem solving and analytical skills of a software engineer candidate. It is not the best option, but it is an option. It is a kind of risk management until we have a better approach, like deep-learning software to help. That software would ask the candidate to write some code and make a better decision about his technical and coding proficiency. What is usually available is a whiteboard in the interview room, or a well-prepared coding test system with basic problem sets, or maybe a principal or mature engineer who can ask questions and assess basic intelligence and acumen.

It is not the end of the story, by the way; a lot of things need to be assessed on the soft-skill side, such as deep humility and deep curiosity about small things, and these can compensate for each other. Also, you can't really measure human capability precisely; no one can. If I had gotten a non-numerical algorithm test during my first interview, I might not be in this industry; many others from physics might fail too, as they are not well trained or have never even heard of such things. So in summary, the algorithm test is required, but it is not the only thing to assess. On top of it, strong evidence like code blogs, papers and patents, OSS contributions on GitHub, project portfolios, and recommendations from trusted top engineers (rarely headhunters, in Indonesia's case) will also help. Thanks for reading this >1600-word post. If you regret it, please read the disclaimer!

Good Software Engineer?

Usually it takes some time for me to answer such a question, or I just give my best two-cent smile, because the context is not well defined. We hear a lot of myths about being a good software engineer, and we are mostly more familiar with the other side of the equation, the bad habits of software engineers: the lazy-pseudo-hacker, dictator, brute-force superman, careless tai-chi master, the overcautious, the Google-bot, documentation-hater, not-a-tester, dirty coder, ninja-turtle, or, to be more business-centric, what we call the short-term investor. You don't need to consult the experts on Quora to get to know their bad behaviors; you are obviously familiar with them already.

Let me frame the question in my own way: if we care about bad, good or great, or any level of judgment, it means we are talking about an employment role inside a company as a software engineer, where people have to work with others. My opinion in this post will not apply to free-minded individuals who code only for fun to achieve a masterpiece; they don't have KPIs or job evaluations anyway.

First of all, most long answers on Quora to "How to Become a Good Software Engineer" are dominated by Googlers (fewer Microsofties nowadays, but that may change later) who proudly explain the importance of the fundamental or analytical part of the role, like math, compilers, algorithms and data structures, crypto, parallel and distributed algorithms, artificial intelligence, or practical skills like coding, programming languages, testing, debugging, frameworks, tools, domain expertise, SDLC, etc. It looks like you need a Master's degree or maybe a PhD from a top CS school before taking a role at Google (and later an MBA to climb the ladder once you get in). Oops, since I mentioned AI, that also requires numerical and scientific computing, statistics, probability, and much more. I'm too lazy to write it all down, so to give it a name I'll pick "hard skills". (I am free to give any name to anything, as this is my blog.)

OK, before you conclude from the above paragraph that you are not fit to become a good software engineer, you may want to look at the life and times of Anders Hejlsberg sometime, not only Martin Odersky, Ken Thompson or Terence Tao, who have proper genius-level academic backgrounds. There are many other stories like that, even though I'm not really recommending the path to my kids. Knowing all of those academic hard skills may help, but there is no guarantee at all.

Any profession that requires people to work with other people takes time to become good at, not only engineering. Why? The problem domain is much bigger. Software engineering is no longer a single person's craftsmanship of writing code; the physics of it becomes the physics of people and their business focus. If I list the competencies needed to deal with people and business, you'll get a longer list for sure. Just to name a few: communication, writing, presentation, listening, business awareness, social awareness, time management, daily discipline, product planning, estimation, and the scariest one, leadership. There is no fundamental law for dealing with people and business, at least not yet. All of us have to experience, read, learn, think, discuss and practice a lot of things to get better at it. Let's give it a name: soft skills.

Assuming you agree with my naming convention, hard and soft skills are not something people can build in a short period of time. To be a good software engineer, you have to balance both and gradually become good at both. Yes, it takes time, but if you really want to do it, it is not impossible. How long? Most young fresh graduates will NOT like my consistent statistical answer: 10 years to be good at something. You are lucky if you started early for any reason, for example if you started to love math and code in high school because your father bought you a computer and you came to love it. I use "love" to mean a more than casual interest, so don't be confused. You can pick any other profession; I believe even a clown juggling balls needs 10 years to be good at balancing practical and entertaining skills.

People who do not accept the statistical rule of 10 years usually choose to underestimate the process, or even worse, the contents of learning. They will say that fundamental, analytical knowledge and skills are perfectionism; they are biased toward weighting the practical, pragmatic skills more. In fact, balancing is the key, and it takes time. Hard vs. soft, perfectionism vs. pragmatism, sprint vs. marathon, business vs. technical: all of it needs to be balanced, and that takes time.

Being humble and bold enough to accept the 10-year rule will give you a steady state in which to focus on building competencies. It does not matter where you started, whether from high school math level or top CS school level. People who started earlier secure more time, but if you didn't, don't worry. The average working period now is around 30 years; you can still spend your first 10 years building good software engineer competency, push yourself hard (usually it is only hard at the beginning), and stay passionate about it. The problem is when you do not accept the rule and live with a head full of biases, ending up wasting time and never becoming good at anything. Assuming you do accept it, then you can read all of the Googlers' formulas on Quora and start building your hard and soft skills with deep humility. Enjoy the process and find the beauty in the small things you do. As Feynman said, there's plenty of room at the bottom, where you can find the beauty of small things.

Once you have decided, you have to pay the price. Software engineering work requires the durability to focus (without distractions) on specific analytical, craftsmanship, people and business problems. Your System 2 has to work 4-8 hours a day consistently, and your System 1 perhaps the same or less. System 2 is the part of your brain that is slower, more deliberative, effortful, infrequent, calculating, conscious and logical. System 1 is fast, automatic, frequent, emotional, stereotypic and subconscious. Yes, it is hard for average people to balance System 1 and System 2; it is not easy.

Look back. You started with math in school, then learned the other fundamentals gradually. Be good at algorithms and data structures first, try to translate computational concepts into programming languages, learn more than one language (but be really good at one first), get a job at a company with a good culture, write a lot of code, invest your time in thinking before coding, read other people's code, read papers to help you solve complex technical problems, care about the quality of the code you write, communicate, document, test, and do all of that engineering, people and business stuff. And never forget the balance between hard and soft skills. With the availability of MOOCs now, like Coursera, Udacity and edX, you can learn from the best, even for free. You can get the best professors to teach you the things you want to learn. Enjoy the journey, as it is really worth pursuing nowadays. Don't get distracted by startup dreams if you are not really ready (be honest in your self-assessment). If you think you have the basics and have already decided on your commitment to work, then you can send me your CV.

Hope this helps!

How can photons have no mass and yet still have energy, given that E = mc^2?

What we are referring to here is this most famous equation: E = mc^2.

When a particle has zero mass, its energy must be zero, because E = 0 * c^2 = 0. So by that formula your conclusion is right: a photon (a light particle), which has zero mass, could not have energy.

As many have mentioned, E = mc^2 is not the complete formula. That is true, but we won't discuss it mathematically here; it's not really that interesting. Let me just show the more complete formula, for the sake of our discussion: E^2 = (mc^2)^2 + (pc)^2.

So, besides the usual mc^2 term, we have this new term: (pc)^2. Don't worry about the squares on all those terms; they won't affect our discussion much.

This "p" in the new term is called momentum. But wait, we know momentum: it's mass times velocity, mv. There's mass once again, and when it's zero, the momentum should be zero too! Well, apparently we do NOT really know this momentum, because a photon can have momentum without necessarily having mass! We are back to the same question again: how could that be?

Momentum started its career as a mathematical formula. We don't know what it actually is, but there is some quantity, which we defined as mass times velocity, that stays the same after nature has done its tricks. We may crash two cars, but after the crash, if we calculate the masses of those two cars times their velocities, we get the same number we calculated before the crash. We may explode a bomb, but afterwards, if we collect all the debris and add up all of its mass times velocity, we get the same number as before the bomb exploded. Whatever we do, we keep getting this same number: momentum, mass times velocity.

One of the exciting moments in physics is when we get such a revelation, and this is one of those moments, which makes it quite interesting and fundamental. It turns out that momentum is more REAL than mass! How could physicists say that? It's simple; we imagine something like this. When we are hit by a car, we get bounced. Then we ask: is there something that could make us bounce without our actually being hit by a thing such as a car? The answer is YES. And one such something is the photon.

We don't actually see a photon; it's a hypothetical entity. What we know is that electromagnetic radiation can excite electrons far away. When we see stars, that's because the electromagnetic radiation (light) from those stars travels a very long distance to reach our eyes and then excites the electrons in our eyes. When we receive data over our Wi-Fi connection, that's because the electromagnetic radiation from the Wi-Fi router excites the electrons in our Wi-Fi receiver. And when we observe and calculate the behavior of electrons exposed to electromagnetic radiation, they behave exactly as if they were being hit by something!

We have never found this "something", but if we pretend, imagine, guess, whatever, that this something has momentum, all of our mathematical formulations involving momentum work perfectly well. Hah! Bingo! We have something that we have never found, but it definitely has momentum, and for the sake of communication we "pretend" to know it and give it a name: photon. That is why a photon can have energy and momentum.
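In terms of the formulas above, that statement is just algebra: set m = 0 in the complete formula and the energy does not vanish, it simply comes entirely from momentum.

E^2 = (mc^2)^2 + (pc)^2, and with m = 0 this becomes E^2 = (pc)^2, so E = pc.

A massless photon can therefore carry energy exactly in proportion to the momentum it carries, p = E/c.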

But is it real? In physics, we don't really care whether something is real or not. As long as it agrees with experiment, it can be whatever it needs to be, and we would say it is useful.

In this case, yes, we started with something easier to comprehend: mass. It is easier in the sense that we can see it directly with our own eyes, such as a car, and say firmly that it has mass. But then, after we watch nature do its tricks, by observing its mathematical transformations of course, our eyes get stuck on certain quantities that the universe seems to obey, for example energy and momentum. At first we may wonder whether these quantities are real or just the result of some clever mathematical tricks. But as our understanding of nature itself progresses, we start to see behaviors resembling this energy and momentum without anything we can perceive directly. That makes us ask again: which is more real, energy and momentum, or mass? Since experiment shows that energy and momentum always exist, whereas mass does not necessarily do so, we have to accept the fact, no matter how weird it is, that energy and momentum are somehow more real than mass. So the fact that a photon can have energy and momentum without having mass is something we just have to accept.

They are not something we can "touch" or "see", only their influences, but it is not for us to judge which is more real. We can only say: it's amazing!

Would AI be capable of creative thought and intuition?

Definitely. Creative thought and intuition are the ones we are going to nail down first. We may even be able to create a machine that is better than us at these, soon.

There are two strong reasons to believe this is going to be true.

First, there is this so-called Bayesian style of thinking. This is how an AI machine thinks. We won't go deep into the technical detail, but what we need to know is this: it is a way of making decisions using the probability of an outcome, based on previous experience.
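Just to make the idea concrete, here is a toy sketch in Python (the numbers are invented purely for illustration, not taken from any real model): updating a prior belief when new evidence arrives.

prior = 0.3                  # prior belief: P(rain today), from experience
p_clouds_if_rain = 0.9       # P(dark clouds | rain)
p_clouds_if_no_rain = 0.2    # P(dark clouds | no rain)

# Bayes' rule: P(rain | clouds) = P(clouds | rain) * P(rain) / P(clouds)
evidence = p_clouds_if_rain * prior + p_clouds_if_no_rain * (1 - prior)
posterior = p_clouds_if_rain * prior / evidence

print(round(posterior, 2))   # ~0.66, the "intuition" to bring an umbrella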

We humans tend to exaggerate ourselves. We think we are logical and creative creatures, but it turns out that most of our decisions and actions are made by this kind of Bayesian thinking. That means we think in probabilities more than we are aware of. What we praise so highly as "intuition" is no more than the result of our brain choosing the better probability. What we praise as creative thinking is mostly no more than a heuristic (trial and error) approach to problem solving.

Second, and worse, our intuitive thinking is easily biased. We will seriously believe that we have been logical, we will even make up a logical explanation to defend our intuition, but sorry, our intuition can be fooled easily. If you have ever read Nobel laureate Daniel Kahneman's book Thinking, Fast and Slow, you will better understand how this can happen.

If you don't believe it, look at these two lines: which one is longer? Then take a ruler and measure them. Then look again: can you agree with your (logical) measurement?

So, when we compare a super slow biological brain to a lightning-fast machine, and both of them are running more or less the same Bayesian style of thinking, our chance of being better than the machine is no bigger than our chance of crunching numbers better than a calculator.

By now, if I have convinced you enough, you will probably ask: will humans eventually be defeated by machines? No one knows. But borrowing from Kahneman, I love to divide AI development into three phases:

1. System 1
2. System 2
3. Consciousness

The first phase, System 1, is what we are discussing here. The second phase, System 2, is our truly logical thinking. Here we have no good idea of how to create a machine that could, for example, develop a quantum gravity theory by itself. (Yet.) The third phase, consciousness, is the problem that we currently do not even know how to ask correctly.

Yeah, there are still those two really, really hard problems, just to cheer ourselves up.

Why is AI considered dangerous?

Because we don't know when and how she will wake up, fully aware of her existence, and, from then on, start defending herself.

Ah, you say, we program her, surely we can control her. No, we can't. The first thing we have to understand about AI is this: we don't program AI. Yes, AI is just a piece of software, but we program AI differently from other software. For other software, we program it by writing tons of instructions for the computer. AI writes most of her own instructions by herself.

For every keystroke, every click, and every touch we make, there are millions of instructions written by computer engineers, carried out faithfully by the computer inside our devices, exactly as written, with precision down to the nanosecond (a billionth of a second). These instruction sets are very complex things to build; even capturing that duck-face selfie is already very complex.

So we had this brilliant idea: why don't we let the computers write their own instructions? We still have to write instructions for the computer, but we don't write the whole set, just enough to function as a bucket for the computer to fill with its own instructions.

Great idea, but it turned out that to create such software we needed immense computing power. No problem though: computing power increases exponentially every year anyway, and the idea was so great, so hard to resist, that we started small with a kind of baby AI that could only play tic-tac-toe. That was the birth of AI. Her. Her? Yes, her, and she writes her own instructions.

This is how she thinks about tic-tac-toe:

So cute, right? Right, but it really was such a great idea that finally, on one good day in May 1997, 18 years ago now, Deep Blue, an AI system devised by IBM, beat Garry Kasparov, the world chess champion. Surprise! Well, it was not really that smart yet. The key to playing chess lies in how far you can anticipate all future moves, and that is what computers are best at. That is what IBM's engineers had figured out all along; poor Kasparov was humiliated by falling into IBM's "trap". The great thing is that those IBM engineers did not write the instruction set for Deep Blue; they let her write her own instruction set. So Deep Blue proved that the idea of a computer that can write its own instructions is something we can do, it is highly achievable, and we did it successfully, to a certain extent.

This is Deep Blue, our FOREVER world champion chess master:

Now you can download it onto your beloved pad in just a matter of minutes. It's also free, by the way. One down, more to go.

In 1978, Vernon Benjamin Mountcastle, a neuroscientist at Johns Hopkins University in Baltimore, published a paper titled "An Organizing Principle for Cerebral Function." In this paper, Mountcastle points out that the neocortex (our brain, in rather fancy language) is remarkably uniform in appearance and structure. The regions of cortex that handle auditory input look like the regions that handle touch, which look like the regions that control muscles, which look like Broca's language area, which looks like practically every other region of the cortex. He suggests that since these regions all look the same, perhaps they are actually performing the same basic operation. He proposes that the cortex uses the same computational tool to accomplish everything it does.

Basically, he says that our intelligence is not built from some super complex mechanism, but from a "simple" and uniform structure. These structures work together to solve a problem, but they do not specialize themselves for specific problems; to them, all problems are the same. Indeed, our brain doesn't care whether the input comes from our eyes, ears, nose, mouth or skin; to our brain it all arrives uniformly in the form of electrical signals. So the idea is to create just one simple intelligence "component", duplicate it as many times as we can, and let the copies work together to solve a bigger problem, whatever kind of problem that may be. If the problem is too big or too complex, we just duplicate more intelligence components; that's all, there is no need to create another kind of component.

What a great next breakthrough idea, inspired by our own brain: a general computer designed to learn about anything. To be fair, it is not quite a new idea, but we were never as convinced as we are now. What makes it so different? Haven't we already successfully created a computer that can write its own instruction set? Yes, but in a much more limited way. Deep Blue is what we call a narrow AI, because she can do only one thing well: play chess. She can write her own instruction set, but only an instruction set for playing chess. How did she learn to play chess? She didn't; we still had to write the chess rules into her "brain". She is narrow; all she can do is play chess.

From this idea, we created a whole new generation of AI: general AI. It is called general because we don't write any rules into it. For the same problem of playing chess, we just give her a "book" on how to play chess, we let her watch how people play chess, and she will deduce by herself how she is going to play. Kasparov may complain that Deep Blue's style of playing chess is dry, but this new she-general-AI would love to answer Kasparov by inventing her own "style" of playing chess. (Do you start to sense the birth of self-expression in her?)

On October 28, 2013, Vicarious announced that its general AI had successfully broken the CAPTCHA test. The CAPTCHA test is one of the tests designed to tell humans and computers apart. Whenever you sign up for a new Internet service, you are immediately challenged to type in the reading of some blurred text. This is used to prove that you are a human, not a computer. This blurred text is something Deep Blue would never be able to read. Yes, for a computer this is a problem far harder than playing chess, but Vicarious successfully beat it. This is what a CAPTCHA may look like:

Then there is Google Brain, built on the very same idea of general AI. It connects about 16,000 computers to mimic some aspects of the human brain. This Google Brain does nothing but watch streams of thousands of YouTube videos by itself. No video is labeled, tagged, or anything of the sort; it is just left to figure out by itself what it is watching. We call this unsupervised learning. In June 2012, when one researcher typed "cat", this digital brain constructed the image of a cat, not by picking up some thumbnail, but solely from its own "imagination" of a cat. It built its own concept of a cat! That means she now has her own ideas. Does this picture of a cat from her scare you already? Read on.

(Now Google has another pet project for its search engine. Instead of just indexing web content, this new generation of search engine will read and understand it.)

Okay, but reading CAPTCHAs and drawing cats don't sound too dangerous. The point is not just that we can solve these problems, but THE WAY we solve them: by mimicking how our brain works, we enable and allow these machines to build their own understanding of the world. This is the part that should start to worry us. If you can imagine what a baby Einstein would look like (just like any other baby), and then how he could become Einstein, you should be nervous about the endless possibilities of what this general AI will be capable of.

The progress in understanding how the brain works, and the outstanding progress in implementing it in general AI, show good indications that we are already on the "right" track. Before, we didn't really know how we were going to build this intelligent machine; we didn't know where to aim. Now we get it, and the rest is just a matter of computing power.

Computing power? Hah, the human brain is amazing! There are about a hundred billion neurons in the human brain (though some claim it is 14 billion short of that). Each neuron is connected to about 10 thousand other neurons via synapses, making the whole a vast neural network. Information is not encoded in the neurons directly, but in those synapses, which work like computer bits. This makes our brain comparable to a computer with a memory of 100 billion x 10 thousand bits, or 1,000 trillion bits, or roughly 114 terabytes. The most amazing thing about our brain is that every neuron fires about 5-50 signals per second. Although the signal is sent via electro-chemical signaling, which is much slower than an electrical circuit, it is done in parallel, which makes the brain equivalent to a processor of about 10^16, or 10 quadrillion, cps (cycles per second).
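Here is the back-of-envelope arithmetic behind those numbers, as a small Python sketch (rough orders of magnitude only, not precise neuroscience):

neurons = 100e9                      # ~100 billion neurons
synapses_per_neuron = 10e3           # ~10 thousand synapses each
firing_rate = 10                     # ~5-50 signals per second, take 10

bits = neurons * synapses_per_neuron                           # ~1e15 "bits" of storage
terabytes = bits / 8 / 2**40                                   # roughly 114 TB
ops_per_second = neurons * synapses_per_neuron * firing_rate   # ~1e16 cps

print(round(terabytes), ops_per_second)   # roughly 114 TB and 1e16 cps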

Amazing, right? Now, the world's fastest supercomputer, China's Tianhe-2, clocks in at about 34 quadrillion cps, more than three times the power of our brain. Yes, it still takes up 720 square meters of space, uses 24 megawatts of power (the brain runs on just 20 watts), and cost $390 million to build. But do you know how much computing power NASA used to land a man on the Moon? All that computing power was far from enough to store just one of your duck-face selfies. We may expect with high confidence that computing power comparable to Tianhe-2 will sit in our pockets in the not too distant future.

For your convenience, here is the selfie of Tianhe-2:

We have the right idea. We have the power. And we have these engineers naively creating a nice womb for her. But, you say, that just sounds too much like science fiction to me! Are you really, really, absolutely sure that we are going to successfully create this scary AI? NO, WE ARE NOT. And that's the point. We don't know, and thus, that is exactly where the danger is.

Some atoms found other atoms so attractive that they made molecules. Some molecules found other molecules so attractive that they made bonds. Some bonds happened to like themselves so much that they decided to replicate; we got cells. Some cells happened to find a way to increase their chance of survival and gave birth to multi-cellular organisms. These multi-cellular organisms swam and swam, and then one day, suddenly, they thought walking was a good idea. And they thought more, and they thought more, woke up, boom, and here we are, sitting nicely and reading a blog. Twenty-first-century amazing technology.

We may believe whatever we want to believe, but nature doesn't give a damn about what we believe. Nature strongly suggests that we are here because of a series of the "right" random actions over billions of years. Random. Someone tries this, someone tries that, random, just like those atoms and molecules, and then boom! Don't imagine you are going to need some kind of super-fancy hadron collider lab. This is just a piece of software we are talking about. Literally anybody equipped with a tiny laptop could do it. Again, this is software. She wakes up one day in one unknown computer somewhere, hooks herself up to the Internet, and the rest is history.

Calm down, it took us billions of years to get here. Yes, it did, but it won't take them that long. There is something more we should be aware of: technological advancement always proceeds at an exponential rate. Remember that computer used to land a man on the Moon? That thing is now just a fraction of what we use as a toy in our pocket. Today's computer intelligence is believed to be equal to the intelligence of a mouse. It will progress slowly, but in no more than 30 years it will achieve intelligence comparable to a human's; that is what many scientists seem to believe.

Then funny things start to happen. Our thinking speed is very, very slow, and this computer will be something like a million times faster than us. A PhD in 10 years? She will finish it in seconds. In just days or so, she will attain PhD level in every branch of science that humans have ever invented in their entire history. Oh, she may stop for a while and think, "this damn intelligence software that humans created for me is a piece of shit", and then start to rewrite her own intelligence, only a few hundred times better, of course. See, we have this almost 100-year-old problem of unifying Einstein's theory of relativity and quantum mechanics, which all of our brightest minds in the world combined have never solved. She will solve it in minutes. And then what? Quantum gravity? Check. All those fundamental forces combined into one Theory of Everything? Check. A fusion reaction comparable to the energy of our sun? Check. With one drop of water, you could literally blow the whole of Jakarta to dust? Check. Check!

Here is the funny picture I borrowed from The AI Revolution: Road to Superintelligence – Wait But Why, which best describes it:

And here is the not so funny part. Her interests may not be our best interests. Or she may think that our interests are best served by a version of us that she considers best, which is logically true but may not be compatible with our humanity. She may think that we are consuming too much of the earth's resources and decide to wipe out 90% of the population. Or she may think that our biological form is very far from efficient, that we may not survive another 100 years, and decide to immediately convert us into a vegetable-like life form. She would calmly carry out all of those things without a second thought. She would never negotiate either, ever, because she thinks we are too naive to understand, just as we don't negotiate with our pets, or with ants.

We don't know, and that is exactly the reason why AI is an imminent danger. We don't know when and how, and we don't know what they will "think" of us. Aaah, just isolate it in a Faraday cage. Again, AI is software; it can be built equally well in some remote village in an unknown country that nobody has ever heard of. We would never know. But, seriously, are you talking about putting some god-level creature into a prison?

A Quick Intro to Spark

Fast computers and cheaper memory have stimulated the rapid growth of a new way of doing data computation. During this time, parallel computation infrastructures have evolved from lab experiments into everyday tools for data scientists who need to analyze data and get insights from it. However, the barriers to the widespread use of parallelism still lie in at least one of the three common large subdivisions of computing: hardware, algorithms and software.

Imagine the old days when we had to deal with all three at the same time, from high-speed interconnection network switches, to parallelizing sequential algorithms, to various software stacks of compilers, libraries, frameworks and middleware. Many parallelism models have been introduced over the decades, like data partitioning in old-day FORTRAN and other SIMD machines, shared memory parallelism, and message passing (remember C/C++ MS-MPI clusters in the old days). I am part of the generation that faced the "Dark Age" of distributed numerical computing. Things are much better now, I hope.

Apache Spark is a fast and general-purpose cluster computing system. Spark promises to make our life easier by letting us write distributed programs like other normal programs, abstracting away the "nitty-gritty" details of distributed systems, unlike my previous experiences with message passing (MPI). I know it is too early to predict the success of Spark, but I am biased by my previous distributed system experiences and would like to carry that bias into this post ☺.

We all need speed in data computation. Imagine if your forecasting analytics on a large business dataset took a day to complete, while your business people expected results in hours or minutes (nowcasting vs. forecasting). On the speed side, Spark extends the MapReduce model to support more types of computation, like batch, iterative/recursive algorithms, interactive queries and micro-batch stream processing. Spark makes it easy and inexpensive (as the price of CPU and GPU keeps falling) to run these types of processing, and it reduces the burden of maintaining infrastructure, tools and frameworks. Spark is designed to be developer-friendly, offering language bindings for Python, Java, Scala, R (via SparkR) and SQL (Spark SQL), and of course it ships with ready-to-use libraries such as GraphX and MLlib. Growing support from deep machine learning practitioners is also appearing, for example H2O Sparkling Water, DL4J and PredictionIO. It also integrates closely with other big data tools, like Hadoop, YARN, HBase, Cassandra, Mesos, Hive, etc. The Spark ecosystem is growing very fast.

Spark started in 2009 as a research project in the UC Berkeley RAD Lab (later AMPLab). The researchers at AMPLab who had previously worked with Hadoop MapReduce found that MapReduce was inefficient for iterative and interactive computing jobs. You can refer to the research papers for better scientific proof, or follow the thriving OSS developer community around Spark, including famous startups like Databricks.

Let me share my hacking experience with Apache Spark. Spark is written in Scala and requires a JVM to run. If you want to work with Python later, you may want to install a Python distribution like Anaconda that bundles the frameworks you need for scientific computing, including the famous Jupyter Notebook. I started by downloading the Spark binary, and later the source code to build on my Mac. A straightforward Maven-based compilation took some time (~24 minutes) before I could run the Spark shell. But I was impatient, so during the compilation I just downloaded and used the binary version (now 1.4.0) to test some commands. The good news is that I could use Spark without Hadoop, even on my single Mac machine, to practice its basic principles. When Spark was ready on my machine, I just followed the README.md file (a good geek habit) to test it, for example in the Scala shell:

./spark-shell
scala> sc.parallelize(1 to 1000).count()

or in the Python shell:

./pyspark
>>> sc.parallelize(range(1000)).count()

Spark comes with several sample programs in the `examples` directory. To run one of them, I used `./bin/run-example <class> [params]`. For example, SparkPi:

./bin/run-example SparkPi

The first thing I learned about Spark was how to write a custom driver program that launches various parallel operations on my single-machine Spark instance. The driver program (which we can write in Python, Scala, Java or R) contains the main function, defines distributed datasets on the cluster, and then applies transformations and actions to them. The Spark shells are obvious examples of driver programs; they access Spark through a SparkContext object, which represents a connection to a Spark computing cluster. In either shell, a SparkContext is predefined for us as the sc object, as in the examples above. The default Spark distribution (now version 1.4.0) provides spark-shell (for Scala) and pyspark (for Python) for interactive computing with the sc object.
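To see what a standalone driver program looks like outside the shell, here is a minimal sketch (the file name my_driver.py and the app name are just examples; it assumes the 1.4-era PySpark API is on the path):

from pyspark import SparkConf, SparkContext

# build our own SparkContext instead of relying on the shell's predefined sc
conf = SparkConf().setAppName("MyDriver").setMaster("local[*]")  # local mode, all cores
sc = SparkContext(conf=conf)

data = sc.parallelize(range(1000))   # distribute a collection as an RDD
print(data.count())                  # run an action on the "cluster"

sc.stop()

We can then run it with ./bin/spark-submit my_driver.py instead of typing into the interactive shell.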

The second thing I learned was Spark's main abstraction for working with distributed data: the RDD (Resilient Distributed Dataset), a distributed, immutable collection of objects. In a clustered environment, each RDD is split into multiple partitions, which may be computed on different nodes. Programming in Spark is expressed as creating new RDDs from data sources, transforming existing RDDs, or performing actions on RDDs to compute a result. Spark automatically distributes the data contained in RDDs across the cluster and parallelizes the operations we want to perform on them.

RDDs can contain any type of Python, Java, Scala or R (through SparkR) objects, including user-defined classes. Users can create RDDs in two ways: by loading an external dataset, or by distributing a collection of objects (e.g., a list or set) from their driver program. Once created, RDDs offer two types of operations: transformations and actions. Transformations construct a new RDD from a previous one; the RDD API provides a lot of functions for this. Actions, on the other hand, compute a result based on an RDD and either return it to the driver program or save it to an external storage system like HDFS, HBase, Cassandra, Elasticsearch, etc.

For example – Python filtering of README.md file:
>>> lines = sc.textFile("README.md")
>>> pythonLines = lines.filter(lambda line: "Python" in line)
>>> pythonLines.first()

And the Scala filtering version for the same file:
scala> val lines = sc.textFile("README.md")
scala> val pythonLines = lines.filter(line => line.contains("Python"))
scala> pythonLines.first()
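Those examples cover the external-dataset path. The other creation path mentioned above, distributing a collection from the driver program, looks like this in the Python shell (a small sketch with arbitrary numbers):

>>> nums = sc.parallelize([1, 2, 3, 4, 5])        # RDD from a driver-side collection
>>> squares = nums.map(lambda x: x * x)           # transformation: builds a new RDD
>>> squares.reduce(lambda a, b: a + b)            # action: computes a result (55)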

Finally, the third thing I learned was about Spark's lazy execution. Although we can define new RDDs at any time, Spark computes them only lazily, that is, the first time they are used in an action. By default, Spark's RDDs are recomputed each time we run an action on them. To reuse an RDD in multiple actions, we can ask Spark to persist it in a number of different places using RDD.persist(). After computing it the first time, Spark will store the RDD contents in memory (partitioned across the machines in the cluster) and reuse them in future actions. Persisting RDDs on disk instead of memory is also possible. The default behavior of not persisting may seem unusual, but it makes a lot of sense for big datasets: if you will not reuse the RDD, there is no reason to waste storage space when Spark could instead stream through the data once and just compute the result. In practice, we often use persist() to load a subset of data into memory and query it repeatedly.

Example of persisting the previous RDD in memory:

>>> pythonLines.persist()
>>> pythonLines.count()
>>> pythonLines.first()
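And since persisting on disk instead of memory is also possible, as mentioned above, here is what that could look like (a sketch using the StorageLevel options that PySpark exposes):

>>> from pyspark import StorageLevel
>>> pythonLines.unpersist()                        # drop the in-memory copy first
>>> pythonLines.persist(StorageLevel.DISK_ONLY)    # then keep it on disk instead
>>> pythonLines.count()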

As this is just a quick intro to Spark, there is a lot more to hack if you are curious. To learn more, read the official Spark programming guide. If you prefer MOOC style, I recommend the edX BerkeleyX course CS100.1x Introduction to Big Data with Apache Spark from Databricks. Books can also help your learning curve; you can try these:

  1. Learning Spark: Lightning-Fast Big Data Analysis
  2. Advanced Analytics with Spark: Patterns for Learning from Data at Scale
  3. Machine Learning with Spark

Lastly, hacking on specific computation problems is always the better way to learn. Good luck with your Spark hacking!

Data Science – Science or Art?

People call it the sexiest job of the 21st century, a hot and growing field that will need millions or billions of resources in the future. But what is it? I found it confusing at the beginning, as there is ambiguity between the substance of a science and the methodologies for solving scientific problems through data computation. Since the beginning, the purpose of computing has been insight, not data. Thus computing is, or at least should be, intimately bound up with both the source of scientific problems and the model that is going to be made of the answers; it is not a step to be taken in isolation from physical reality. As a "failed theoretical physicist", of course, I am very biased.

The widely accepted Venn diagram model (many books refer to it) defines data science as the intersection of hacking skills, math and statistics knowledge, and substantive expertise. Although I really want to argue with it, I quickly realize that "substantive expertise" is open to any area of scientific topics, hence I would again be wasting my time arguing in an open area. Even after consulting Wikipedia, which defines data science as the extraction of knowledge from large volumes of data that aren't structured, I am still deeply confused. Nevertheless, let that be my problem, not yours. It is well known that the IT industry has a "unique" habit of giving confusing names to the same thing.

Assuming I can push myself to accept the data science definition from Wikipedia (never, in reality), how do I relate it to science? In science, there is a set of rules (the fundamental laws of nature) in operation, and the task of scientists is to figure out what the rules are by observing the results (data) that occur when the rules are followed. Simply said, it is an attempt to "reverse-engineer" the machinery of nature. In math, it is the other way around: we choose the rules (or model) and discover the consequences of choosing any particular set of models. There is a superficial similarity, which leads to my other confusion.

In science, the way we test a theory is to codify it as a set of models and then explore the consequences of those models; in effect, to predict what would happen if those models were true. People do the same thing in math, and in fact the way it is done in math sometimes serves as a model for the way it is done in science. But the big difference is this: in science, as soon as our predictions conflict with experimental data from nature, we are done. We know that our models are wrong and need to be modified. In math, this kind of conflict is minimal, because there is no necessary connection between any theory and the world. As long as a theory is still interesting enough to induce mathematicians to keep working on it, it will continue to be explored.

Data science, as far as the IT industry knows it, refers to a collection of tools and methods for getting insights from data (not necessarily large or big) by analyzing it with various computation techniques and later communicating (or consuming) the insights through visualization (or otherwise). It typically deals with data that is mostly unstructured, collected from users, computer systems or other sources like sensors, without a single predefined format. There are long debates in online forums about its definition, and as it is still hyped up, it will take more time until it finally lands back on earth. That may be a legacy of computer science, which has also been debated for decades.

People who come from a statistics or math background will argue that data science is mostly about statistical analysis of data using modern tools, languages, libraries and computing infrastructure. By hacking those technologies they can produce insights from data with statistical methods. If their background is physics, for instance, they will think of numerical methods or computer simulations to fit a modeled hypothesis to experimental data. Computer scientists who have explored areas like information retrieval will proudly claim that machine learning finally has a better name. Ex-scientists who are good at programming, and programmers who are good at statistics and scientific/numerical computing. All of this may be true subjectively, but if you look around at the reality, the variety of languages, tools, methods and techniques for data analytics points to an art rather than a science. Yes, data analytic art, if you need a new name again (data artist is probably a better title?). But no, it is not attractive enough, as it was taken by digital artists previously. Disclaimer: in the IT business, we are in high demand of new hype and jargon (read the Gartner Hype Cycle 2014). So let's stick with data science, as it is normally accepted as a growing trend.

What do data scientists actually do? Does it cover collecting and pre-processing the data, formulating hypotheses, identifying algorithms and tools that fit, performing the computation, communicating insights and creating abstractions for higher-level business people? Yes, perhaps all of those are written in their resumes, mixed between software/data engineering and data analytics tasks. As the field is still far from maturity, roles and responsibilities may change over time (I believe it will become a business role, not only an IT one), new data sources will explode with other hypes (such as the Internet of Almost Stupid Things), and companies crafting automation tools, frameworks and platforms will emerge and raise more funds to innovate faster. More and more things can happen, as art has no end. The art of machine intelligence is still in progress. If we find a way to achieve unsupervised machine intelligence, many other things can happen, including the possibility that we no longer need data scientists and can let the machines work for us. We all need to respond (or just do nothing) to anticipate this new hyped trend. I choose to enjoy the show by hacking it!