Problem Statement:
Let's say you have a number of documents. Each document has a number of words (we refer to them as terms). The problem is to identify the two documents that are most similar.
Given below is one of the solution approaches.
Use of Term Document Matrix:
Each row of a Term Document Matrix (let's name it D) is a document vector, with one column for every term in the entire corpus of documents. The matrix is sparse, as most documents contain only a few of the terms in the corpus. The value in each cell of the matrix is the term frequency, i.e. the number of times the term occurs in the document.
example:
docid   term1   term2   term3   term4   ...   term n
d1      2       1       0       0       ...   3
d2      0       2       3       4       ...   1
d3      1       0       4       2       ...   0
.
.
The transpose of the same Term Document Matrix (DT) will look as follows:
docid   d1   d2   d3   ...   dn
term1   2    0    1
term2   1    2    0
term3   0    3    4
.
.
We can create a Similarity Matrix (S) by multiplying D with DT, i.e. S = D * DT (a matrix product, not a cell-by-cell product).
The structure of the Similarity Matrix will be as follows:
docid   d1    d2    d3    ...   dn
d1      x11   x12   x13   ...   x1n
d2      x21   x22   x23   ...   x2n
d3      x31   x32   x33   ...   x3n
.
.
dn      xn1   xn2   xn3   ...   xnn
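where xmn = the sum, over all terms, of the products of the term frequencies in documents dm and dn (the dot product of the two document vectors).
Intuitively, the higher the value of xmn, the more similar the documents with doc ids dm and dn are.
Coming back to our original problem: find which two documents are most similar.
Simply look into the Similarity Matrix and find the row and column of the highest off-diagonal value. The diagonal entries (x11, x22, ...) compare each document with itself, so they are ignored.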
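The steps above can be sketched in R using only the three example documents and the five term columns shown earlier. This is a minimal sketch rather than a complete solution: the column label "termn" and the names S_off, best and most_similar are my own choices for illustration.

```r
# Term Document Matrix D from the example: rows are documents, columns are terms.
# Only the five columns shown in the example are used; "termn" stands in for "term n".
D <- matrix(c(2, 1, 0, 0, 3,
              0, 2, 3, 4, 1,
              1, 0, 4, 2, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("d1", "d2", "d3"),
                            c("term1", "term2", "term3", "term4", "termn")))

# Similarity Matrix: matrix product of D with its transpose
S <- D %*% t(D)

# Blank out the diagonal so a document is never matched with itself
S_off <- S
diag(S_off) <- NA

# Row and column of the highest remaining value.
# S is symmetric, so the pair appears twice; keep the first hit.
best <- which(S_off == max(S_off, na.rm = TRUE), arr.ind = TRUE)[1, ]
most_similar <- sort(rownames(S)[best])
print(most_similar)
```

For these three documents the largest off-diagonal entry is x23 = 0*1 + 2*0 + 3*4 + 4*2 + 1*0 = 20, so d2 and d3 come out as the most similar pair.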
Interpretations of technorealism
Reflections on Data Science, Leadership and Organizational behavior.
Saturday, May 18, 2013
Sunday, May 5, 2013
The Serving Leader
"The Serving Leader" by Ken Jennings and John Stahl-Wert was my weekend reading. The book talks about five actions that can transform teams, businesses and communities. My key takeaways from the book are:
- Serving leaders run to great purpose by holding out in front of their team a "reason why" that is so big that it requires and motivates everybody's best efforts
- They qualify to be first by putting other people first
- They raise the bar of expectation by being highly selective in choice of team leaders and by establishing high standards of performance
- They teach serving leader principles and practices and remove obstacles to performance. These actions multiply the serving leader's impact by educating and activating tier after tier of leadership
- They build on strength by arranging each person in the team to contribute what he or she is best at. This improves everyone's performance and solidifies teams by aligning the strengths of many people
Some of the thoughts were counterintuitive, such as "focus on strength over addressing weaknesses". Yet the argument that "it is foolish to pour all our energy into turning weaknesses to serviceable mediocrity" makes profound sense.
Overall a good read and fodder for introspection.
Wednesday, March 14, 2012
Semi-Closed mobile wallet - Transforming Indian mCommerce space
It's paradoxical that fewer Indians have access to toilets than use cell phones. In that context, the Government of India issuing "Semi closed mobile wallet" service licenses to seven entities a couple of months back could prove to be a critical milestone in the Indian mCommerce space. Indian telecom major Bharti Airtel and Nokia (closed wallet service) have already launched their mobile payment services. The Reserve Bank of India has already given its conditional nod to the finance ministry's proposal to allow 100% foreign direct investment (FDI) through the automatic route in these payment services. Needless to say, these are interesting times for the Indian mCommerce space.
How does a Semi Closed mobile wallet service work?
In 'semi-closed' mobile prepaid instruments, you can load money into your cell phone from a licenced company and make payments with it, but you can't use it to withdraw money.
For example, Airtel Money (a subsidiary of telecom major Bharti Airtel) allows easy subscription by dialling *400# from any Airtel phone, through their online portal, or by visiting the nearest Airtel Money retail outlet. Upon activation you get a mobile PIN. You load cash onto your cell phone through your netbanking account or manually at a retail outlet. Using Airtel Money, you can pay utility bills, shop for products at designated outlets (each outlet has a designated Airtel mobile number, and payment happens by transferring money to this number) and transfer money from one phone to another. At this point, online shopping using Airtel Money is not available.
Impact of Semi closed mobile wallet services on Indian mCommerce space
The semi-closed mobile wallet is particularly exciting because it has something in it for everyone: consumers, banks and telecom operators. It is a convenient cashless mode of transaction, accessible even to those consumers who are not eligible for a credit card. Banks could save the cost and effort of credit card maintenance and administration by getting more customers to switch to the semi-closed wallet.
That said, semi-closed mobile wallet services are at a very early stage of adoption. In order for mobile wallet services to replace credit cards, debit cards or the use of hard cash, the service providers must take a radically different approach, explore new monetization models and apply the true potential of technology.
Here are my thoughts on alternate business models that semi-closed mobile wallet service providers should consider:
- Incentivize mobile wallet adoption for merchants by driving down the interchange fee: According to a recent study, credit card companies extract upwards of $50 billion a year in interchange fees. Needless to say, interchange fees are never a hit with merchants. Mobile wallet service providers should incentivize merchants to adopt their services by reducing (or eliminating) the interchange fee, and focus on monetizing alternate value added services to earn their revenue
- Make the transactions secure and frictionless: Adopt contactless NFC technology. Take on arbitration responsibility (in case of a dispute) to give consumers the comfort they need
- Explore alternate revenue models: It is obvious that mobile wallets will compete with credit card companies in enabling B2C and B2B transactions. However, the mobile wallet service can be extended to other segments, such as money transfer between consumers, or positioned as a facilitator for government welfare schemes
- Explore value added services: Use the power of the web and analytics to provide value added services to consumers (e.g. location based services that show mobile wallet coupons for your nearest retailer, customer loyalty programs) and to merchants (analytics on customer purchases, targeted advertising)
In the Indian context, mobile commerce provides an immense opportunity for Indian consumers to simply skip credit cards and jump to the next stage of evolution in payment systems. If you are already working in this space, please feel free to share your views on how to make this transformation smooth yet effective.
Monday, January 16, 2012
Entrepreneurship Moments
Who is an entrepreneur?
In the mid-1980s, Harvard Business School professor Howard Stevenson defined entrepreneurship as “the pursuit of opportunity without regard to resources currently controlled”. Theoretically, anybody who demonstrates this behavior can be an entrepreneur.
What does it take to be an entrepreneur?
Unfortunately, for many, being an entrepreneur is an end state that requires harsh trade-offs. Some leave stable jobs to chase that one big idea. Others start by putting together a business plan backed by detailed market research. Some even start by searching for the right angel investor. Oftentimes, the very entrepreneurial spirit perishes in the complexities it entails.
Is it really that arduous to be an entrepreneur?
Not if we treat entrepreneurship as a journey, a collection of contiguous decision points, and apply our entrepreneurial spirit at each of those. Jon Burgstone, in his book “Breakthrough Entrepreneurship”, writes: “Every time you want to make any important decision, there are two possible courses of action. You can look at the array of choices that present themselves, pick the best available option and try to make it fit. Or, you can do what the true entrepreneur does: Figure out the best conceivable option and then make it available.”
So the right question would be “did I act like an entrepreneur” today? Yesterday? Every day? If the answer is consistently yes, you are a habitual entrepreneur.
Saturday, November 26, 2011
Analytics using R: Most active in my Twitter list
I follow some 80-odd people and news sources on my Twitter account. For a while I wondered which of these sources are most active on Twitter.
I picked a simple metric, '# of status messages posted to Twitter', as the measure of activity. Using R, I quickly wrote a program to generate my top 10 most active Twitter sources.
I realize the code is not optimally written. Any suggestions to refine the code will be appreciated.
Here is the bar plot of the result
As expected, news sources dominate the list. Among individuals, "Michael Hyatt" and "Jurgen Appelo" are most active.
If you are interested in 'R', here is the code to extract this report:
## Prerequisite: install the twitteR package with install.packages("twitteR")
## load twitteR package
library(twitteR)
## get a handle to a twitteR user object (in this case for user d_lalit)
tuser <- getUser('d_lalit')
## get list of friends of d_lalit
tfriends <- userFriends(tuser)
## create vectors to store the name and number of status messages for each friend
friendsCount <- length(tfriends)
friendsName <- character(friendsCount)
friendsMsgCount <- numeric(friendsCount)
for (i in seq_len(friendsCount)) {
  friendsName[i] <- tfriends[[i]]$screenName
  friendsMsgCount[i] <- as.numeric(tfriends[[i]]$statusesCount)
}
## sort the counts in decreasing order and extract the top 10 values
sortedlist <- sort(friendsMsgCount, index.return = TRUE, decreasing = TRUE)
top10friendsName <- character(10)
top10friendsMsgCount <- numeric(10)
for (i in 1:10) {
  top10friendsName[i] <- friendsName[sortedlist$ix[[i]]] ## index is stored under ix
  top10friendsMsgCount[i] <- as.numeric(sortedlist$x[[i]])
}
## plot the chart
barplot(top10friendsMsgCount, width = 0.25, names.arg = top10friendsName,
        horiz = FALSE, main = "Twitter friends by activity count",
        ylab = "Number of status messages", xlab = "twitter friends",
        space = 0.2, density = 50, angle = 45, cex.names = 0.7)
Update: 11/29/2011
In the latest version of the twitteR package, the method userFriends() has been deprecated. You may replace the line 'tfriends <- userFriends(tuser)' in the above code with the code given below:
tfriends <- tuser$getFriends()
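One possible refinement of the top 10 step replaces the second loop with a single call to order(). The sketch below is only an illustration: top_active is a hypothetical helper (it is not part of twitteR), and the sample names and counts are made up so that the code runs without a Twitter connection.

```r
# Hypothetical helper (not part of twitteR): return the n largest counts
# together with their names, sorted in decreasing order of count.
top_active <- function(names, counts, n = 10) {
  stopifnot(length(names) == length(counts))
  n <- min(n, length(counts))  # tolerate lists shorter than n
  idx <- order(counts, decreasing = TRUE)[seq_len(n)]
  data.frame(name = names[idx], count = counts[idx], stringsAsFactors = FALSE)
}

# Made-up sample data standing in for friendsName and friendsMsgCount
sample_names  <- c("alice", "bob", "carol", "dave")
sample_counts <- c(120, 4500, 980, 15)

top2 <- top_active(sample_names, sample_counts, n = 2)
print(top2)
```

With the vectors built in the post, the call would be top_active(friendsName, friendsMsgCount), and the name and count columns of the result can be passed to barplot() as names.arg and height.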
Thursday, October 27, 2011
Cheapest Tablet in the World - Made in India
Check out "Aakash", the world's cheapest tablet. It is unabashedly optimized for cost. It is to be priced at $35 apiece (subsidized by the Government of India) for educational institutes and $60 apiece for retail sale.
Aakash Tablet from Venturebeat on Vimeo.
Specifications
- Screen: 7 inches; 800-by-480 pixels; resistive touchscreen
- Operating system: Android 2.2, Froyo
- Processor: 366 MHz Conexant; HD video co-processor (both with graphics accelerators)
- Memory: 256MB RAM (internal); 2GB Flash (external)
- Storage: 2GB card included, expandable up to 32GB
- Ports: Two USB 2.0; 3.5mm audio out jack; 3.5mm audio in jack (No built-in speakers)
- Connectivity: GPRS; Wi-Fi 802.11 a,b,g
- Power: Up to 180 minutes on battery; AC adapter, 200-240 volt
- Weight: 350 grams