CSCI-1080 Intro To CS: World Wide Web

Saint Louis University
Department of Computer Science

Web Science Preface

This part of the course introduces Web science. The course has no prerequisites and it is targeted to non-technical undergraduates.

The Web Science curriculum of this course is freely inspired by the following passage adapted from “Web Science: An Interdisciplinary Approach to understanding the Web”, by James Hendler, Nigel Shadbolt, Wendy Hall, and Tim Berners-Lee:

Web science, an emerging interdisciplinary field, takes the Web as its primary object of study. This study incorporates both the social interactions enabled by the Web's design and the applications that support them. The Web is often studied at the micro scale, as an infrastructure of protocols, programming languages, and applications. However, it is the interaction of human beings creating, linking, and consuming information that generates the Web's behavior as emergent properties at the macro scale. These properties often generate surprising properties that require new analytic methods to be understood.

For example, when Mosaic, the first popular Web browser, was released publicly in 1992, the number of users quickly grew by several orders of magnitude, with more than a million downloads in the first year. The wide deployment of Mosaic led to a need for a way to find relevant material on the growing Web, and thus search became an important application, and later an industry, in its own right. The enormous success of search engines has inevitably yielded techniques to game the algorithms (an unexpected result) to improve search rank, leading, in turn, to the development of better search technologies to defeat the gaming. More recent macro-scale examples include photo-sharing on Flickr, video-uploading on YouTube, and social-networking sites like Instagram, Facebook and Bonfyre (a growing social network startup based in St.Louis, MO).

The essence of Web science is to understand (see e.g., this video) and to design (see this other video) systems to produce the effects we want.

Given the breadth of the Web and its inherently multi-user (social) nature, its science is necessarily interdisciplinary, involving at least mathematics, computer science, sociology, psychology, and economics.

Micro vs. Macro: Themes of Web science

Four important themes of Web Science are

  • Micro: an individual acts

  • Macro: the world responds (or not) to an individual's action

  • Synthetic: something is created to produce a desired result

  • Analytic: laws are stated to explain observed phenomena

We focus on these themes as they apply to Web builders – people who contribute links and other content to the Web:

Synthetic Analytic
Micro An individual builds a website to produce a result. we do not speak to this quadrant
Macro “The world” builds a website to produce a result. Laws are stated to explain large-scale Web phenomena.


Some Web builders consider themselves Web developers; others consider themselves bloggers; others merely post an occasional comment on someone else's blog or discussion forum. We say “Web builder” to encompass the full spectrum of people who contribute links and other content to the Web.

Our lab curriculum provides an informal hands-on approach to the task of building a Web site. The web science material (which you are now reading) explain large scale Web phenomena and other interesting web algorithms and dynamics; for example, the Amazon recommendation algorithm and the Google PageRank algorithm.

The sociology, psychology, and economics of this course follow Duncan Watts’ Six Degrees, which we recommend as a narrative companion to our own material. Our complete suggested reading list is below.

Online safety

  • Protecting yourself from evildoers

  • Privacy, trust, and ownership

Networks

Basic mathematical foundations of networks:

Sets Explicit Notation for Sets rightarrow Cardinality rightarrow Subsets rightarrow Venn Diagrams rightarrow Union and Intersection rightarrow Ordered Lists rightarrow Implicit Notation for Sets rightarrow Logical Expressions rightarrow Compound expressions with “or” rightarrow Compound expressions with “and” rightarrow Union and intersection defined formally rightarrow Similarity of Sets

Graphs Undirected and Directed rightarrow Neighborhood and Degree rightarrow Density and Average Degree rightarrow Paths in undirected graphs defined formally rightarrow Paths in directed graphs rightarrow Length rightarrow Distance

Network Structure

Hubs, clusters, and other basic structural features of the Web:

Connected: a word of many meanings rightarrow Induced Subgraphs rightarrow “Connected” defined formally rightarrow Connected graphs and connected components rightarrow Hubs rightarrow Clusters rightarrow Defining clusters rightarrow Connected components and Cliques

  • See also:

Network Dynamics

How randomness, homophily, and cumulative advantage shape the Web:

Limitations of traditional graph theory rightarrow Introduction to network dynamics rightarrow Three models of dynamic graphs rightarrow Random graphs rightarrow Demonstration of random graph dynamics rightarrow Random graph algorithm rightarrow Clusters and homophily rightarrow Triadic closure rightarrow Triadic closure algorithm rightarrow Hubs and cumulative advantage rightarrow Preferential attachment algorithm

  • See also:

Variables, Probability, and Scale-Free Networks

Understanding that the Web is a scale-free network requires some probability theory:

Variables in mathematics rightarrow Variables in algorithms rightarrow Random variables rightarrow Discrete vs. continuous variables rightarrow Probability distributions rightarrow Degree distributions

  • General discussion of scale-free networks:

    • Six Degrees: Chapter 4, pp. 101-114

    • From previous chapter on Network Dynamics: Hubs and cumulative advantage and Preferential attachment algorithm

Information and Computation

Applying fundamental concepts of computer science to the Web

Information, computation, and algorithms rightarrow Summation: an example of what computation is rightarrow HTML: an example of what computation is not rightarrow Computing distance, part one: Information diffusion rightarrow Computing distance, part two: Example rightarrow Computing distance, part three: Algorithm

  • Examples of information diffusion on the Web:

  • See also:

Collaborative Filtering

How to compute personalized recommendations:

“Expert opinions” without the experts rightarrow Tuples: content of CF rightarrow Bipartite graphs: structure of CF rightarrow Structural equivalence: computation of CF rightarrow The four steps of collaborative filtering

The Long Tail

Niches and blockbusters in the world of Web commerce:

rightarrow Macro-analytic view of collaborative filtering rightarrow Power law revisited rightarrow Niches, megahits, and the neglected middle rightarrow Macro-analytic view of the long tail rightarrow Macro view of Web programming

  • See also:

    • The Long Tail, by Chris Anderson. Wired, October 2004.

    • Going Long, by John Cassidy. The New Yorker, July 2006. Six Degrees Chapter 7, pp 207-215: Information Externalities & Market Externalities

Influence in Networks

How to compute the influence of a Web page:

rightarrow Popularity, influence, and centrality rightarrow Introduction to PageRank rightarrow NetRank: a simplified version of PageRank rightarrow Normalization and convergence rightarrow The NetRank algorithm rightarrow Dividing by outdegree: the NR* formula rightarrow The PageRank formula rightarrow The damping factor: PageRank as probability

Competition and Cooperation

What happens when Web builders seek to increase their influence?

Games: Competition and Cooperation

rightarrow Dynamics of popularity and influence rightarrow PageRank competition rightarrow Doing the right thing rightarrow Mutually assured construction rightarrow Authority, reciprocity, reputation rightarrow Game theory rightarrow Winners’ dilemma

  • See also

The contect of this site is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.