Tag Archives: Algorithm

[Online Education ]Design and Analysis of Algorithms I


About The Course

In this course you will learn several fundamental principles of algorithm design. You’ll learn the divide-and-conquer design paradigm, with applications to fast sorting, searching, and multiplication. You’ll learn several blazingly fast primitives for computing on graphs, such as how to compute connectivity information and shortest paths. Finally, we’ll study how allowing the computer to “flip coins” can lead to elegant and practical algorithms and data structures. Learn the answers to questions such as: How do data structures like heaps, hash tables, bloom filters, and balanced search trees actually work, anyway? How come QuickSort runs so fast? What can graph algorithms tell us about the structure of the Web and social networks? Did my 3rd-grade teacher explain only a suboptimal algorithm for multiplying two numbers?


How to program in at least one programming language (like C, Java, or Python); familiarity with proofs, including proofs by induction and by contradiction; and some discrete probability, like how to compute the probability that a poker hand is a full house. At Stanford, a version of this course is taken by sophomore, junior, and senior-level computer science majors.


No books are required for the course. However, three books have significantly influenced the instructor’s presentation and can be consulted for extra details. In order of decreasing relevance to the course, they are: Kleinberg & TardosAlgorithm Design, Dasgupta, Papadimitriou & VaziraniAlgorithms, and Cormen, Leiserson, Rivest, & SteinIntroduction to Algorithms.

No special software (e.g., development enviroment) is required to take this course.

The Instructor

Tim Roughgarden is an Associate Professor of Computer Science and (by courtesy) Management Science and Enginering at Stanford University, where he holds the Chambers Faculty Scholar development chair. At Stanford, he has taught the Design and Analysis of Algorithms course for the past eight years. His research concerns the theory and applications of algorithms, especially for networks, auctions and other game-theoretic applications, and data privacy. For his research, he has been awarded the ACM Grace Murray Hopper Award, the Presidential Early Career Award for Scientists and Engineers (PECASE), the Shapley Lecturership of the Game Theory Society, a Sloan Fellowship, INFORM’s Optimization Prize for Young Researchers, and the Mathematical Programming Society’s Tucker Prize.

Frequently Asked Questions

  1. When does the class start?The class will start in January 2012.
  2. What is the format of the class?The class will consist of lecture videos, which are broken into small chunks, usually between eight and twelve minutes each. Some of these may contain integrated quiz questions. There will also be standalone quizzes that are not part of video lectures, and programming assignments. There will be approximately two hours worth of video content per week.
  3. Will the text of the lectures be available?We hope to transcribe the lectures into text to make them more accessible for those not fluent in English. Stay tuned.
  4. Do I need to watch the lectures live?No. You can watch the lectures at your leisure.
  5. Can online students ask questions and/or contact the professor?Yes, but not directly There is a Q&A forum in which students rank questions and answers, so that the most important questions and the best answers bubble to the top. Teaching staff will monitor these forums, so that important questions not answered by other students can be addressed.
  6. Will other Stanford resources be available to online students?No.


[repost ]程序员面试题精选100题(05)-查找最小的k个元素









typedef multiset<int, greater<int> >  IntHeap;

// find k least numbers in a vector
void FindKLeastNumbers
const vector<int>& data,               // a vector of data
IntHeap& leastNumbers,                 // k least numbers, output
unsigned int k

if(k == 0 || data.size() < k)

vector<int>::const_iterator iter = data.begin();
for(; iter != data.end(); ++ iter)
// if less than k numbers was inserted into leastNumbers
if((leastNumbers.size()) < k)

// leastNumbers contains k numbers and it’s full now
// first number in leastNumbers is the greatest one
IntHeap::iterator iterFirst = leastNumbers.begin();

// if is less than the previous greatest number
if(*iter < *(leastNumbers.begin()))
// replace the previous greatest number



[repost ]Data Structure Visualizations


Currently, we have visualizations for the following data structures and algorithms:

  • Basics
  • Recursion
  • Indexing
  • Sorting
  • Heap-like Data Structures
  • Graph Algorithms
  • Dynamic Programming
  • Geometric Algorithms
  • Others …

    [repost ]Pearson’s Correlation Coefficient, r

    Correlation is a technique for investigating the relationship between two quantitative, continuous variables, for example, age and blood pressure. Pearson’s correlation coefficient (r) is a measure of the strength of the association between the two variables.

    The first step in studying the relationship between two continuous variables is to draw a scatter plot of the variables to check for linearity. The correlation coefficient should not be calculated if the relationship is not linear. For correlation only purposes, it does not really matter on which axis the variables are plotted. However, conventionally, the independent (or explanatory) variable is plotted on the x-axis (horizontally) and the dependent (or response) variable is plotted on the y-axis (vertically).

    The nearer the scatter of points is to a straight line, the higher the strength of association between the variables. Also, it does not matter what measurement units are used.

    Values of Pearson’s correlation coefficient

    Pearson’s correlation coefficient (r) for continuous (interval level) data ranges from -1 to +1:

    r = -1 data lie on a perfect straight line with a negative slope data lie on a perfect straight line with a negative slope
    r = 0 no linear relationship between the variables no linear relationship between the variables
    r = +1 data lie on a perfect straight line with a positive slope data lie on a perfect straight line with a positive slope

    Positive correlation indicates that both variables increase or decrease together, whereas negative correlation indicates that as one variable increases, so the other decreases, and vice versa.

    Example Scatterplots

    Identify the approximate value of Pearson’s correlation coefficient. There are 8 charts, and on choosing the correct answer, you will automatically move onto the next chart.


    Tip: that the square of the correlation coefficient indicates the proportion of variation of one variable ‘explained’ by the other (see Campbell & Machin, 1999 for more details).

    Statistical significance of r


    The t-test is used to establish if the correlation coefficient is significantly different from zero, and, hence that there is evidence of an association between the two variables. There is then the underlying assumption that the data is from a normal distribution sampled randomly. If this is not true, the conclusions may well be invalidated. If this is the case, then it is better to use Spearman’s coefficient of rank correlation (for non-parametric variables). See Campbell & Machin (1999) appendix A12 for calculations and more discussion of this.

    It is interesting to note that with larger samples, a low strength of correlation, for example r = 0.3, can be highly statistically significant (ie p < 0.01). However, is this an indication of a meaningful strength of association?

    NB Just because two variables are related, it does not necessarily mean that one directly causes the other!

    Worked example

    Nine students held their breath, once after breathing normally and relaxing for one minute, and once after hyperventilating for one minute. The table indicates how long (in sec) they were able to hold their breath. Is there an association between the two variables?


    chart showing scatter plot

    The chart shows the scatter plot (drawn in MS Excel) of the data, indicating the reasonableness of assuming a linear association between the variables.

    Hyperventilating times are considered to be the dependent variable, so are plotted on the vertical axis.

    Output from SPSS and Minitab are shown below:

    Select Analysis>Correlation>Bi-variate

    table of correlations

    Correlations: Normal, Hypervent

    Pearson correlation of Normal and Hypervent = 0.966
    P-Value = 0.000

    In conclusion, the printouts indicate that the strength of association between the variables is very high (r = 0.966), and that the correlation coefficient is very highly significantly different from zero (P < 0.001). Also, we can say that 93% (0.9662) of the variation in hyperventilating times is explained by normal breathing times.


    Similarity Arithmetics for Mahout 0.4 Hadoop based recommender Job

    Mahout 0.4 Hadoop based recommender Job has one parameter –similarityClassname that can specify the name of distributed similarity class to instantiate or a predefined similarity from SimilarityType

    Mahout 0.4 Hadoop based recommender Job has provided  following SimilarityType:


    The source code of those Similarity Arithmetics are in here:


    You can view the code to know the implementation or search it in:  http://wikipedia.org