Quora - What are good resources to learn about search engine architecture?

What are good resources to learn about search engine architecture?

I think Manning, Raghavan and Schütze's book on Information Retrieval covers a good bit of the theory behind search engines, but what are the best resources out there to learn about their distributed systems, network routing and scalability aspects? Pointers to books, or perhaps conference and journal papers, that talk about real-world designs and systems?

3 Answers

Krishna Gade, Twitter Search <- Bing <- Live Search <- M...
6 votes by Anon User, Anon User, Amund Tveit, (more)
If you're interested in the architecture of search engines as they are built in practice rather than in academia, the following papers are very good. The last one in particular gives you a good model for approaching the problem of designing a search engine's architecture.

- Evolution of Google's search architecture by Jeff Dean. http://research.google.com/peopl...
- Lessons from building large scale systems by Jeff Dean. http://www.cs.cornell.edu/projec...
- Operational Requirements for Scalable Search Systems. http://www.ir.iit.edu/~abdur/pub...

Also, I found that this IR lab produces good search architecture papers:
http://cis.poly.edu/westlab/publ...
There is also a new book, which is still incomplete: http://www.ir.uwaterloo.ca/book/

It focuses more on the use of IR in search engines.

Also, search engines are a large area; in general you can divide the field into a systems side and an algorithms side. The algorithms part is self-explanatory; the systems part refers to building large-scale distributed systems that enable the algorithms to run effectively and efficiently.
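
To make the systems/algorithms split concrete, here is a minimal Python sketch of one common pattern on the systems side: the documents are partitioned across shards, the query is sent to every shard in parallel, and the partial top-k lists are merged. Everything here is an assumption for illustration (the shard layout, the names search_shard and scatter_gather, and the term-count scoring); it is not taken from any of the systems or papers mentioned in this thread.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, query_terms, k):
    """Score each document in one shard by counting query-term occurrences."""
    hits = []
    for doc_id, text in shard.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in query_terms)
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(k, hits)  # this shard's local top-k


def scatter_gather(shards, query, k=3):
    """Send the query to all shards in parallel, then merge the partial results."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, terms, k), shards))
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [doc_id for _, doc_id in merged]


# Hypothetical two-shard corpus; in a real system each shard lives on its own machine.
shards = [
    {"a1": "search engine index", "a2": "network routing"},
    {"b1": "search search ranking", "b2": "cloud storage"},
]
top = scatter_gather(shards, "search ranking")
print(top)
```

The scoring function is the "algorithm" part and could be swapped out freely; the partitioning, fan-out and merge are the "systems" part, which is where replication, timeouts and failure handling would go in a real deployment.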

Some conferences to follow on this topic are: SOSP, WWW, SIGMOD, VLDB.

As for informal reading, I personally subscribe to the following blogs, and many of them talk about the challenges of building real systems (not necessarily search engines, but all kinds of distributed systems):

Werner Vogels (Amazon CTO)
http://www.allthingsdistributed....

James Hamilton (Amazon DE & VP of Engineering)
http://perspectives.mvdirona.com/

http://highscalability.com/ (lots of coverage of different systems: YouTube, Google, Flickr, etc.)

http://www.royans.net/arch/ (similar to highscalability, but updated less frequently)

http://googleresearch.blogspot.com/
For the ranking part of a search engine, SIGIR is the most relevant conference, followed by CIKM. A relatively new book on the topic is D. Grossman and O. Frieder, Information Retrieval: Algorithms and Heuristics, Kluwer Academic Publishers (see http://ir.iit.edu/~ophir/pub.html).
 
For the system implementation part, there is one more new book, http://www.search-engines-book.com/, which I have not read, so I am not sure whether it touches on distributed systems, network routing, etc.
 
You may study the open source project Lucene (an indexing and ranking library) or Solr (an enterprise search solution based on Lucene), which is used by many companies such as Netflix. Katta, http://katta.sourceforge.net/, is "Lucene & more in the cloud". You may read the source code or documentation to learn the implementation details. It is very easy to set up a search engine using Solr and play with it.
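
Before diving into the Lucene source, it may help to see the core idea of an indexing and ranking library in miniature. The Python sketch below builds a toy in-memory inverted index and ranks with a simple TF-IDF score. It is only an illustration: the class TinyIndex and its methods are made up for this example and are not Lucene's or Solr's API, and the whitespace tokenizer and scoring formula are simplifying assumptions.

```python
import math
from collections import Counter, defaultdict


class TinyIndex:
    """Toy in-memory inverted index with TF-IDF scoring (not Lucene's API)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add(self, doc_id, text):
        # Naive whitespace tokenization; real analyzers do much more.
        # Assumes each doc_id is added only once.
        for term, tf in Counter(text.lower().split()).items():
            self.postings[term][doc_id] = tf
        self.doc_count += 1

    def search(self, query, k=10):
        scores = defaultdict(float)
        for term in query.lower().split():
            docs = self.postings.get(term, {})
            if not docs:
                continue  # term does not occur in any document
            idf = math.log(1 + self.doc_count / len(docs))  # rarer terms weigh more
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc_id for stable output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


index = TinyIndex()
index.add("d1", "search engine architecture")
index.add("d2", "distributed systems at scale")
index.add("d3", "search engine indexing and ranking")
results = index.search("search ranking")
print(results)
```

Here d3 ranks above d1 because it matches both query terms and "ranking" is the rarer term; d2 matches nothing and is not returned. Production libraries add on-disk segments, compression, analyzers and more refined scoring on top of this basic structure.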
 
In academia, there is the Lemur project and the related Indri project (http://www.lemurproject.org/lemu...). Using this toolkit, it is easy to implement and test sophisticated search algorithms such as language-model-based ones.
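
As a rough idea of what a language-model-based ranking function looks like, here is a small Python sketch of query-likelihood scoring with Dirichlet smoothing. It is written from scratch and does not use the Lemur or Indri API; the function name query_likelihood, the toy documents and the value of the smoothing parameter mu are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def query_likelihood(query, doc, collection, mu=2000.0):
    """Log P(query | doc) under a unigram model with Dirichlet smoothing."""
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    coll_len = len(collection)
    score = 0.0
    for term in query:
        p_coll = coll_tf[term] / coll_len  # background (collection) probability
        if p_coll == 0.0:
            continue  # skip terms never seen in the collection
        # Smoothed estimate: document counts blended with the collection model.
        p = (doc_tf[term] + mu * p_coll) / (len(doc) + mu)
        score += math.log(p)
    return score


# Hypothetical two-document collection, already tokenized.
docs = {
    "d1": "search engine architecture and scalability".split(),
    "d2": "cooking recipes for busy people".split(),
}
collection = [term for tokens in docs.values() for term in tokens]
query = "search architecture".split()

ranked = sorted(
    docs,
    key=lambda d: query_likelihood(query, docs[d], collection),
    reverse=True,
)
print(ranked)
```

Smoothing is what keeps a document from getting zero probability just because it lacks one query term; mu controls how strongly the collection model is mixed in and would normally be tuned on real data.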
 