DB Tsai (https://www.dbtsai.com/blog)
The weblog of what I'm enthusiastic about.

Multinomial Logistic Regression with Apache Spark (https://www.dbtsai.com/blog/2014/multinomial-logistic-regression-with-apache-spark/)
Posted by DB Tsai on Mon, 23 Jun 2014 14:22:35 PDT | Categories: big data, computer, hadoop, machine learning, programming | Tags: Algorithm, Hadoop, L-BFGS, Machine Learning, MLlib, Multinomial Logistic Regression, Optimization, Spark
• Speaker: DB Tsai
• Date: June 20, 2014
• Location: Hacker Dojo, Mountain View, CA
• Host: Silicon Valley Machine Learning Meetup
• URL: http://www.meetup.com/Silicon-Valley-Machine-Learning/events/187398222/
• Slide: http://www.slideshare.net/dbtsai/2014-0620-mlor-36132297
• Video: https://www.youtube.com/watch?v=rYwZ09b1B1c

Logistic regression can be used to model not only binary outcomes but also, with some extension, multinomial outcomes. In this talk, DB will walk through the basic idea of binary logistic regression step by step, and then extend it to the multinomial case. He will show how easy it is with Spark to parallelize this iterative algorithm by using the in-memory RDD cache to scale horizontally (in the number of training examples). However, there is a mathematical limitation on scaling vertically (in the number of training features), because Newton’s method needs the full Hessian, whose size grows quadratically with the number of features; and many recent applications in document classification and computational linguistics are exactly of this high-dimensional type. He will explain how to address this problem by using the L-BFGS optimizer instead of Newton’s method.
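To make the horizontal-scaling argument concrete, here is a minimal Java sketch of the multinomial (softmax) loss and its gradient. It is not the talk’s code and not Spark MLlib’s API; the class and method names are hypothetical, and it assumes the symmetric parameterization with one weight vector per class (a K x d matrix), which may differ from the pivot-class formulation used in the slides. Because both the loss and the gradient are plain sums over examples, each partition of a cached RDD could compute a partial sum and the partial sums could be added up before being handed to an optimizer such as L-BFGS.

```java
import java.util.Arrays;

/** Hypothetical sketch: softmax negative log-likelihood and gradient for K classes, d features. */
final class SoftmaxLoss {

    /** p_k = exp(w_k . x) / sum_j exp(w_j . x), computed with the max-subtraction trick. */
    static double[] probabilities(double[][] w, double[] x) {
        int k = w.length;
        double[] p = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < k; c++) {
            double dot = 0.0;
            for (int i = 0; i < x.length; i++) dot += w[c][i] * x[i];
            p[c] = dot;                      // margin for class c
            max = Math.max(max, dot);
        }
        double sum = 0.0;
        for (int c = 0; c < k; c++) {
            p[c] = Math.exp(p[c] - max);     // subtracting max avoids overflow in exp
            sum += p[c];
        }
        for (int c = 0; c < k; c++) p[c] /= sum;
        return p;
    }

    /** Adds one example's gradient to grad (same K x d shape as w) and returns its loss. */
    static double addExample(double[][] w, double[] x, int label, double[][] grad) {
        double[] p = probabilities(w, x);
        for (int c = 0; c < w.length; c++) {
            double coeff = p[c] - (c == label ? 1.0 : 0.0);  // d(loss)/d(margin_c)
            for (int i = 0; i < x.length; i++) grad[c][i] += coeff * x[i];
        }
        return -Math.log(p[label]);
    }

    /** Summed loss over a dataset; grad is zeroed first and then filled with the summed gradient. */
    static double lossAndGradient(double[][] w, double[][] xs, int[] labels, double[][] grad) {
        for (double[] row : grad) Arrays.fill(row, 0.0);
        double loss = 0.0;
        for (int n = 0; n < xs.length; n++) loss += addExample(w, xs[n], labels[n], grad);
        return loss;
    }
}
```

The per-class gradient is (p_k - 1{y = k}) x. A Newton step would additionally need the Kd x Kd Hessian; L-BFGS avoids forming it by approximating curvature from a small number of recent position and gradient differences.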


© dbtsai for DB Tsai, 2014.

Implementing the in-Mapper Combiner for Performance Gains in Hadoop (https://www.dbtsai.com/blog/2013/implementing-the-in-mapper-combiner-for-performance-gains-in-hadoop/)
Posted by DB Tsai on Mon, 02 Dec 2013 10:21:15 PST | Categories: big data, computer, hadoop, Java, programming | Tags: English
By DB Tsai and Jenny Thompson. This article is also published on the blog of my company, Alpine Data Labs.

Before we start, here is the benchmark.
[Figure: word-count benchmark]

On the Alpine development team, we are always looking for ways to improve the efficiency of our algorithms. One of the most widely applicable and effective fixes we found was to implement the in-mapper combiner design pattern in our Hadoop-based algorithms. This can dramatically cut the amount of data transmitted across the network and speed up the algorithms by 20% to 50%.

For example, to implement the C4.5 decision tree algorithm, we need to compute the information gain, which essentially comes down to counting every combination of independent and dependent variable values; as a result, aggregating these counts in the mapper instead of emitting a record for every row greatly increases performance. We also use this technique for our naive Bayes classifier, linear regression, and correlation operators.
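For the C4.5 case, what the mappers aggregate is a contingency table of (attribute value, class) counts. The hypothetical helper below, an illustrative sketch rather than Alpine’s implementation, shows that information gain can be computed from those counts alone, which is why aggregating them locally loses nothing. (C4.5 itself normalizes this gain into a gain ratio, which also needs only the same counts.)

```java
/** Hypothetical sketch: information gain of one attribute computed purely from counts. */
final class InfoGain {

    /** Entropy in bits of a vector of class counts. */
    static double entropy(long[] classCounts) {
        long total = 0;
        for (long c : classCounts) total += c;
        double h = 0.0;
        for (long c : classCounts) {
            if (c == 0) continue;                    // 0 * log(0) is taken as 0
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** counts[v][k] = number of rows with attribute value v and class k. */
    static double informationGain(long[][] counts) {
        int numClasses = counts[0].length;
        long[] classTotals = new long[numClasses];
        long total = 0;
        for (long[] row : counts) {
            for (int k = 0; k < numClasses; k++) {
                classTotals[k] += row[k];
                total += row[k];
            }
        }
        // Weighted average entropy of the class distribution within each attribute value.
        double conditional = 0.0;
        for (long[] row : counts) {
            long rowTotal = 0;
            for (long c : row) rowTotal += c;
            if (rowTotal > 0) conditional += (double) rowTotal / total * entropy(row);
        }
        return entropy(classTotals) - conditional;
    }
}
```

For instance, a table of {{5, 0}, {0, 5}} (the attribute perfectly separates two balanced classes) gives a gain of 1 bit, while {{2, 2}, {3, 3}} gives 0.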

A high-level description and a pseudo-code example can be found in Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer (2010). The pattern has been used in several MapReduce applications; for example, Pig 0.10+ does this automatically. However, there are currently no ready-to-use concrete examples to be found online. Here we present a step-by-step example with code, using word count as the use case, along with benchmarks.
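As a rough illustration of the pattern only (the article’s actual Hadoop code is not reproduced here), the following framework-free Java sketch imitates the map/cleanup lifecycle of a Hadoop Mapper, with a plain callback standing in for the output context. The class name, the emit callback, and the maxBufferedKeys flush threshold are assumptions made for this example; the threshold reflects the memory caveat Lin and Dyer raise for in-mapper combining.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical word-count mapper using in-mapper combining (no Hadoop dependency). */
final class InMapperCombiningWordCount {
    private final Map<String, Integer> counts = new HashMap<>(); // partial counts held across records
    private final BiConsumer<String, Integer> emit;              // stands in for the mapper's output
    private final int maxBufferedKeys;                           // flush threshold bounding memory use

    InMapperCombiningWordCount(BiConsumer<String, Integer> emit, int maxBufferedKeys) {
        this.emit = emit;
        this.maxBufferedKeys = maxBufferedKeys;
    }

    /** Called once per input line: aggregate locally instead of emitting (word, 1) per token. */
    void map(String line) {
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
        }
        if (counts.size() >= maxBufferedKeys) flush();
    }

    /** Called once after the last record (like Hadoop's cleanup): emit whatever is buffered. */
    void cleanup() {
        flush();
    }

    private void flush() {
        counts.forEach(emit);
        counts.clear();
    }
}
```

Each mapper now emits one record per distinct word per flush instead of one (word, 1) record per occurrence, which is where the network savings come from; the trade-off is that the buffered map has to fit in the mapper’s memory.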
(...)
Read the rest of Implementing the in-Mapper Combiner for Performance Gains in Hadoop (1,218 words)


Prof. Andrew Ng: “Deep Learning: Machine learning via Large-scale Brain Simulations” (https://www.dbtsai.com/blog/2013/prof-andrew-ng-deep-learning-machine-learning-via-large-scale-brain-simulations/)
Posted by DB Tsai on Thu, 01 Aug 2013 23:15:11 PDT | Categories: big data, computer, machine learning
As a co-organizer of the SF Machine Learning Meetup, it was my pleasure to have Prof. Ng give us a talk about deep learning. However, he is so famous (co-founder of Coursera, the Stanford professor who teaches Machine Learning online for free, known for his deep learning work, and so on) that everyone already knows him well; as a result, I couldn’t find a good way to introduce him.

In the end, I came up with a joke to introduce him, and it seemed to work really well. Check out the video!


Java Concurrent Dynamic Object Pool for non Thread Safe Objects using Blocking Queue (https://www.dbtsai.com/blog/2013/java-concurrent-dynamic-object-pool-for-non-thread-safe-objects-using-blocking-queue/)
Posted by DB Tsai on Sun, 24 Feb 2013 15:15:46 PST | Categories: computer, Java, programming
I tried to integrate an external non-thread-safe library into my web project, and found that it was too expensive to create an instance of the object for each client thread. Initially, I solved this by sharing a single object across all the concurrent client threads and synchronizing on it externally.

However, the performance was still unacceptable, since every client thread had to wait its turn for that one object. So I designed a resource pool that holds a fixed number of objects and recycles them when threads no longer need them.

I wanted the resource objects in the pool to be created dynamically rather than all at once in the constructor. The pool starts empty, and when a client thread acquires a resource, the pool creates a new one on demand. Once the number of created objects reaches the size of the pool, new client threads block and wait for other threads to return a resource.

Finally, the pool should be fair: fairness ensures that the first thread to ask is the first thread to get a resource; otherwise, some threads could end up waiting forever.

This concurrent object pool can be built on a blocking queue from the java.util.concurrent package; one of its implementations, ArrayBlockingQueue, supports the fairness we require.
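A minimal sketch of such a pool, assuming a Supplier as the object factory and hypothetical names (DynamicObjectPool, acquire, release); the implementation in the full post may differ. Objects are created lazily up to maxSize, and once that limit is reached callers block in take() on an ArrayBlockingQueue constructed with fair = true.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical pool that lazily creates up to maxSize objects and recycles them. */
final class DynamicObjectPool<T> {
    private final BlockingQueue<T> idle;                 // released objects ready for reuse
    private final AtomicInteger created = new AtomicInteger();
    private final int maxSize;
    private final Supplier<? extends T> factory;

    DynamicObjectPool(int maxSize, Supplier<? extends T> factory) {
        if (maxSize <= 0) throw new IllegalArgumentException("maxSize must be positive");
        this.maxSize = maxSize;
        this.factory = factory;
        // fair = true: threads blocked in take() are served in FIFO order.
        this.idle = new ArrayBlockingQueue<>(maxSize, true);
    }

    /** Reuses an idle object, creates one if under the limit, otherwise blocks. */
    T acquire() throws InterruptedException {
        T obj = idle.poll();
        if (obj != null) {
            return obj;
        }
        // Reserve a creation slot atomically so concurrent callers never exceed maxSize.
        for (int n = created.get(); n < maxSize; n = created.get()) {
            if (created.compareAndSet(n, n + 1)) {
                try {
                    return factory.get();
                } catch (RuntimeException e) {
                    created.decrementAndGet();           // give the slot back if construction fails
                    throw e;
                }
            }
        }
        // Limit reached: wait for another thread to release an object.
        return idle.take();
    }

    /** Returns an object so that a waiting or future caller can reuse it. */
    void release(T obj) {
        if (!idle.offer(obj)) {
            throw new IllegalStateException("pool is full; object was not acquired from it");
        }
    }

    int createdCount() {
        return created.get();
    }
}
```

One caveat of this design: the non-blocking poll() fast path can occasionally let a newly arriving thread grab a just-released object ahead of a thread already blocked in take(), so strict FIFO ordering holds only among blocked threads. Callers should call release() in a finally block so that an exception cannot leak a pooled object.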

(...)
Read the rest of Java Concurrent Dynamic Object Pool for non Thread Safe Objects using Blocking Queue (541 words)


Hadoop M/R to implement “People You Might Know” friendship recommendation (https://www.dbtsai.com/blog/2013/hadoop-mr-to-implement-people-you-might-know-friendship-recommendation/)
Posted by DB Tsai on Tue, 22 Jan 2013 00:38:54 PST | Categories: big data, computer, hadoop, programming | Tags: English

The best friendship recommendations often come from friends. The key idea is that if two people have many mutual friends but are not friends themselves, the system should recommend that they connect.
Let’s assume that friendships are undirected: if A is a friend of B, then B is also a friend of A. This is the most common friendship model, used by Facebook, Google+, LinkedIn, and several other social networks. It’s not difficult to extend this to the directed model used by Twitter; however, we’ll focus on the undirected case throughout this post.
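The idea can be simulated in memory to check the logic before writing a real job. In this framework-free Java sketch (one common formulation, not necessarily the one in the full post), the map step emits, for each user, an “already friends” marker for every direct friend and a “mutual friend” record for every ordered pair of that user’s friends; the reduce step counts mutual friends per candidate and drops people who are already friends. MutualFriends, ALREADY_FRIENDS, and the shuffle map are hypothetical stand-ins for Hadoop’s types.

```java
import java.util.*;

/** Hypothetical in-memory simulation of a "People You Might Know" map/reduce job. */
final class MutualFriends {
    static final int ALREADY_FRIENDS = -1;   // marker: key and candidate are direct friends

    /** Map: emit (user -> friend, marker) and, for each pair of friends, (f1 -> f2, user). */
    static void map(int user, List<Integer> friends, Map<Integer, List<int[]>> shuffle) {
        for (int f : friends) {
            shuffle.computeIfAbsent(user, k -> new ArrayList<>()).add(new int[] {f, ALREADY_FRIENDS});
        }
        for (int f1 : friends) {
            for (int f2 : friends) {
                if (f1 != f2) {
                    // "user" is a mutual friend of f1 and f2
                    shuffle.computeIfAbsent(f1, k -> new ArrayList<>()).add(new int[] {f2, user});
                }
            }
        }
    }

    /** Reduce: count mutual friends per candidate, skip direct friends, keep the top N. */
    static List<Integer> reduce(int user, List<int[]> values, int topN) {
        Map<Integer, Integer> mutualCount = new HashMap<>();
        Set<Integer> direct = new HashSet<>();
        for (int[] v : values) {
            if (v[1] == ALREADY_FRIENDS) direct.add(v[0]);
            else mutualCount.merge(v[0], 1, Integer::sum);
        }
        List<Integer> candidates = new ArrayList<>();
        for (int c : mutualCount.keySet()) {
            if (!direct.contains(c)) candidates.add(c);
        }
        // Most mutual friends first; ties broken by smaller id so the output is deterministic.
        candidates.sort((a, b) -> {
            int byCount = Integer.compare(mutualCount.get(b), mutualCount.get(a));
            return byCount != 0 ? byCount : Integer.compare(a, b);
        });
        return new ArrayList<>(candidates.subList(0, Math.min(topN, candidates.size())));
    }

    /** Runs map over every user, groups by key (the "shuffle"), then reduces each group. */
    static Map<Integer, List<Integer>> recommend(Map<Integer, List<Integer>> graph, int topN) {
        Map<Integer, List<int[]>> shuffle = new TreeMap<>();
        graph.forEach((user, friends) -> map(user, friends, shuffle));
        Map<Integer, List<Integer>> out = new TreeMap<>();
        shuffle.forEach((user, values) -> out.put(user, reduce(user, values, topN)));
        return out;
    }
}
```

Emitting the direct-friend markers lets the reducer filter out people who are already connected without a second lookup; the cost is that map output grows with the square of each user’s friend count.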

The content of this article has been adapted by Dr. Matthew into an exercise for his students; you may want to check it out!
(...)
Read the rest of Hadoop M/R to implement “People You Might Know” friendship recommendation (1,074 words)


My startup, DuJour in Silicon Valley is now in StartX (https://www.dbtsai.com/blog/2012/my-startup-dujour-startx/)
Posted by DB Tsai on Sun, 22 Apr 2012 15:50:58 PDT | Categories: computer, life, miscellaneous, study abroad
For many months I’ve been working very hard in Silicon Valley as a co-founder of a startup, DuJour, and we are very excited to announce that our startup is now in StartX, the non-profit incubator and accelerator program for Stanford students and recent alums.

Have you seen those perfect men and women in fashion, like the supermodels for Victoria’s Secret? Thin, tall, with perfect skin tone. Have you ever looked in the mirror and wondered what it would be like to have that perfect body? What if I lost a few pounds and changed my skin tone?

You don’t need to wonder, because you are just as beautiful as those models.

DuJour is fashion by everyday people. On DuJour, you can find people who share your fashion traits, such as a love of vintage style or a curvy figure. You can join an exciting discussion, start your own fashion trend, or simply explore what’s new!

This is the video we submitted with our StartX application; it demonstrates our main idea visually.

I will introduce our site to all readers when we start alpha testing next month. If you can’t wait and are willing to help us debug, please send me a message and I’ll send you a test account.


Biking to school (https://www.dbtsai.com/blog/2011/biking-to-school/)
Posted by DB Tsai on Wed, 05 Oct 2011 10:45:47 PDT | Categories: life, miscellaneous, study abroad
A few days ago I decided to start exercising after seeing my belly in the mirror and finding that my shirts no longer fit well. Since coming to the United States last September, I have gained weight day by day without noticing. Maybe it is all because of high-calorie food such as cheeseburgers, steak, beer, and red wine. Another reason may be that I have to sit in the lab all day long running data analyses, so I don’t get enough exercise. When I was in Taipei, Taiwan, I used to take public transportation, which required me to at least walk to the station from school, work, or home. Taipei is much like New York in terms of traffic, and it is really unwise to drive in Taipei city: you will get stuck in traffic jams and have a hard time finding a parking spot. Therefore, the easiest way to get around Taipei is to take the subway; basically every place in Taipei city is accessible by subway. Since coming to the States, I haven’t moved around much: only from my dorm to the lab, to class, or to the grocery store by Zipcar. After moving to off-campus housing in Mountain View, I bought a pre-owned car, and that gave me an excuse not to exercise.

When my friends started to tease me about my looks, I thought they were just being too critical. Not until I saw myself in the mirror a few days ago did I realize how much I had changed and how serious it was. I made up my mind to exercise. I think it will benefit my health, make me more energetic, and eventually help me work more efficiently.

So far I have biked to school for a week, and I have kept track of how far and how fast I ride every day. Generally I bike about 5 miles in 30 minutes one way from my apartment to school, which works out to about 10 miles per hour. I post my biking records on Facebook to encourage myself to keep it up. I’ve found that biking is really good: not only does it save gas and money, it also gets me to school faster than driving in traffic.


My off-campus living at Stanford (https://www.dbtsai.com/blog/2011/my-off-campus-living-at-stanford/)
Posted by DB Tsai on Tue, 04 Oct 2011 13:08:17 PDT | Categories: life, miscellaneous, study abroad
I moved off campus this June. At first I was pretty satisfied with my living environment, since it is really close to the San Antonio shopping mall. Many supermarkets, banks, and restaurants, like Target, Wal-Mart, Trader Joe’s, Ross, Chase, Chili’s, and Fresh Choice, are all within one mile, so I can walk to them as my after-dinner exercise. However, there is one thing I have to deal with: the neighbors.

I’ve found that American houses are built very differently from those in Taiwan. While buildings in Taiwan are generally made of concrete and steel, Americans build houses out of wood, so sound easily leaks next door. I live on the first floor and can easily hear my upstairs neighbor walking. There was an unpleasant encounter a few weeks ago when I was discussing homework with my classmates at my place. We weren’t talking very loudly, but a few minutes later the upstairs tenant knocked on my door and told us to be quiet. It was embarrassing in front of my classmates, since I was the one who had suggested meeting at my apartment for a more relaxed setting.

Now it’s my turn to be really unhappy with my upstairs tenant. His walking around at midnight has disturbed my sleep for a while, but one night he exercised at 3 or 4 a.m. I don’t know what kind of exercise he was doing, but I clearly heard him drop something heavy on the floor; I guess it was a dumbbell. Exercising in the middle of the night is a big no-no, especially lifting dumbbells at 3 a.m.! I couldn’t sleep until 5 a.m., so I decided to write him a note. I first apologized for the earlier disturbance he thought my classmates and I had caused, and then told him how important a good night’s sleep is to me. I had heard that he is a nurse who works night shifts three days a week, so I pointed out that most people sleep at night, and that I really need good sleep because I do my lab work and attend classes during the day. I posted the note on his door, and I hope it will solve the problem, at least temporarily.

So I hope I can work hard, graduate soon, and start working. Then I can start earning and saving money; ideally, I plan to have my own house within five years. I guess I won’t have to worry about noise from neighbors then, since houses are much farther apart than apartments.


[Share] Free DNS hosting from he.net, a major US IDC (https://www.dbtsai.com/blog/2011/he-net_free_dns/)
Posted by DB Tsai on Fri, 06 May 2011 02:17:19 PDT | Categories: computer
https://dns.he.net/

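he.net (Hurricane Electric) is a long-established US IDC and Internet exchange company, founded in 1994. If you work with IPv6, you have probably heard of it. It offers a free IPv6 tunnel broker, something similar to a VPN, that lets IPv4-only computers get IPv6 connectivity. I use their tunnel broker for free IPv6 service, and the domain names I own are also hosted on Hurricane Electric’s DNS. Resolution is fast and stable.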

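This free DNS service from he.net lets you add up to 25 domains and supports A, AAAA, CNAME, MX, NS, TXT, and SRV records. If you use IPv6, the important point is that its name servers have IPv6 NS glue, so names still resolve in a “pure” IPv6-only network. Many DNS hosting services today let you set AAAA records but do not provide IPv6 glue for their name servers, so queries for your AAAA records basically still go over IPv4. That is fine on a dual-stack network, but on a pure IPv6 network the lookup fails.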

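They provide five name servers. I think you can trust the service of such a large data company; moreover, they run their own backbone reaching around the world, so they are a well-resourced company. Oh, and they also support adding slave zones and reverse zones.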


My birthday of ten thousand days old (https://www.dbtsai.com/blog/2011/my-birthday-of-ten-thousand-days-old/)
Posted by DB Tsai on Mon, 17 Jan 2011 08:00:50 PST | Categories: Uncategorized
dbtsai@dbtsai:~$ date -d "1983-09-02 UTC 864000000 seconds" +%F
2011-01-18
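The arithmetic behind the command is 10,000 days x 86,400 seconds/day = 864,000,000 seconds. As a cross-check, here is a small Java sketch using java.time (the class and method names are made up for this example) that computes the same date with pure calendar arithmetic. GNU date prints its result in the local time zone unless -u is given, whereas LocalDate has no time zone at all, so this version cannot shift by a day.

```java
import java.time.LocalDate;

/** Hypothetical helper reproducing the date-command arithmetic with java.time. */
final class TenThousandDays {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;   // 86,400
    static final long OFFSET_SECONDS = 864_000_000L;     // the offset passed to date

    /** Adds an offset given in seconds, which must be a whole number of days. */
    static LocalDate addSecondsAsDays(LocalDate start, long seconds) {
        if (seconds % SECONDS_PER_DAY != 0) {
            throw new IllegalArgumentException("offset is not a whole number of days");
        }
        return start.plusDays(seconds / SECONDS_PER_DAY); // 864,000,000 / 86,400 = 10,000 days
    }
}
```

Calling TenThousandDays.addSecondsAsDays(LocalDate.of(1983, 9, 2), TenThousandDays.OFFSET_SECONDS) returns 2011-01-18, matching the output above.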

