Quora has taken the tech and entrepreneurial world by storm, providing a system that works so fluidly that it is sometimes hard to see what the big fuss is all about. This slick tool is powered, not only by an intelligent crowd of askers and answerers, but by a well-crafted backend created by co-founders who honed their skills at Facebook.
It is not surprising that, with all the smart people using this smart tool, there are many pondering on how it works so well. The NoSQL boffins scratch their heads and ponder such questions as, “Why does Quora use MySQL as the data store rather than NoSQLs such as Cassandra, MongoDB, CouchDB, etc?“.
In this blog post I will delve into the snippets of information available on Quora and look at Quora from a technical perspective. What technical decisions have they made? What does their architecture look like? What languages and frameworks do they use? How do they make that search bar respond so quickly?
- Components Of Quora
- What’s Cooking Under That Hood?
- Charlie Cheever Follows “14 Rules for Faster-Loading Web Sites”
- Recommended Reading
Components Of Quora
The general components that make up Quora are…
- You can ask questions
- You can answer questions (anonymously if you desire)
- You can comment on answered questions
- You can vote-up or vote-down answers to questions
- Questions can be assigned to topics
- You can write a post (a informative statement, rather like a orphaned answer or blog post)
- You can follow questions, topics or other users
- A super-fast auto-complete search-box at the top, which doubles as the method for entering new questions
The last point, the super-fast auto-complete search-box, is one of the defining features of Quora. You can see immediately, as you begin to enter a question, whether somebody else has already asked that question or if there is a topic or post on the subject. Let’s start there…
What’s Cooking Under That Hood?
Only the questions, topic labels, user names or post titles are indexed and served up to the search-box. There is no full-text search, so searching the content of questions and answers will not work. The text that is indexed is tokenized so that words in a different order will still be matched. Prefix matching enables best matches to be shown before the entire word is entered. For instance, typing “mi” might immediately show “Microsoft” in the results.
There is some simple stemming of words, since “nears” matches “near”, but “pony” does not match “ponies”. “Topic-aliases” allow for similar matches on topic names, such as “startup” and “start-up”. These topic-aliases have been manually entered by users. Otherwise these would not match.
If a duplicate question is redirected to another question (a feature of Quora), then that original question will still appear in the search results, since it increases the chances of a match. There is no n-gram indexing, so slight mis-spellings will not match. For instance, “gooogle” (with an extra “o”) finds nothing.
Previously, they did use an open source search server, called Sphinx. It supports the features they are using above, but they have since moved from this due to real-time constraints. Their new solution is built in-house and allows them better prefix indexing and control over the matching algorithms. They built this in Python.
What libraries does Quora use for search?
Adam D’Angelo, Quora Founder (Nov 13, 2010)
Our search is custom-written. It doesn’t use any libraries aside from Thrift, and Python’s unicode library, which we use for unicode normalization.
Quora uses persistent connections. A HTTP connection is established with the server when you start typing the search query. This connection is kept open and further requests are made on this same open connection. The connection will terminate (times-out) if not used for 60 seconds. If a connection times-out then a new connection is established when typing begins.
To simulate the typing of a word into the search-box, I sent the following requests, character-by-character, across a persistent connection. For instance “butler” is six requests (“b”, “bu”, “but” … “butler”).
"butler" (6 chars) duration: 0.393 secs 0.065 secs per query "butler monkeys" (14 chars) duration: 0.672 secs 0.048 secs per query "fasdisajfosdffsa" (16 chars) duration: 0.746 secs 0.046 secs per query
That last query was used to test if there was a slow-down for a word that would obviously not be in a caching layer. I saw no slow-down. This means that they are not caching, caching is only used to take the load off the backend search engine or they are doing something smarter (e.g. if there is no match for “fasd” then there will be no match for “fasdi”, so abort).
Is Quora going to implement full-text search?
Adam D’Angelo, I made a lot of the early Quora … (Sep 1, 2010)
Yes, eventually. We haven’t implemented this yet because we’ve prioritized other things, but we will definitely do it in the future.
Webnode2 And LiveNode
They seem very pleased with the technology they have built and struggled to find its weaknesses. One weakness is that it is tricky for LiveNode to keep track of what is happening within the browser as it pushes changes from the server. If users A and B are viewing the same question then ones interactions will affect the other. For instance, if user A up-votes an answer then that answer will be promoted and will visibly move up the page. This display change will be pushed over AJAX to user B’s browser. Any prior browser-side change that user B made, such as expanding a comments section, might be lost.
While they would like to open-source LiveNode and have tried to keep code separation, doing so right now would be too much work and would take time away from their main goal, which is making Quora better.
Charlie Cheever points out that webnode2 is unrelated to the “free and easy website builder” called Webnode at webnode.com.
- Make Fewer HTTP Requests
- Use a Content Delivery Network
- Add an Expires Header
- Gzip Components
- Put Stylesheets at the Top
- Put Scripts at the Bottom
- Avoid CSS Expressions
- Reduce DNS Lookups
- Avoid Redirects
- Remove Duplicate Scripts
- Configure ETags
- Make AJAX Cacheable
Quora is a great example of a modern tech start-up. They are very small team who understand the technologies they are using very well. They have made considered choices in the technology they have selected and have a good vision of which components would be better written from scratch. They seem keen to share these in-house technologies with the open-source community and I look forward to when they have the time to make this a reality. I intend keep following Quora and writing about them more in future blog posts.
- How FriendFeed uses MySQL to store schema-less data
- Steve Souders’ 14 Rules for Faster-Loading Web Sites