
Knowevo Visualization and Discovery Facilitation Tools

This document describes the visualization and analysis tools developed by the Knowevo project team. The Library of Congress visualizations were developed after the completion of the start-up project and are based on a preliminary dataset that spans from the 1800s to 1950. Since only a limited dataset was used for these visualizations, the observations we make should be considered tentative until they are verified using the full Library of Congress book collection. The Encyclopedia Britannica (EB) tools are based on editions 3, 9, 11, and 15. Some of the EB tools are integrated with Wikipedia data.





I. Knowevo Library of Congress Tools: Historical Knowledge Mapping

1. Knowevo Galaxies of Knowledge


This tool allows the user to explore relations between the Library of Congress Classification (LCC) categories and the change in these relations over time. The visualization uses a force-directed graph in which the distance between two nodes is based on several metrics derived from the number of LCSH headings shared between the two corresponding categories. The size of a domain node corresponds to the number of books categorized under that domain for a given year. The nodes are colored according to the top-level category to which they belong. The large nodes correspond to LCC one-letter classes; the smaller "planets" around them correspond to LCC two-letter subclasses.

In the current prototype, the user chooses a year using the horizontal slider and adjusts the properties of the visual model (charge, strength of connections, distance between nodes, similarity measure) using the sliders in the upper left corner. The user can view either the state of the whole book collection as of a given year or the domain network for only the books published that year. The tool is available at http://knowevo.cs.uml.edu/demos/galaxies.
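As a rough illustration of how shared headings could drive node distances, the sketch below turns heading overlap into a target link length. It is a minimal sketch under stated assumptions: the Jaccard index stands in for the unspecified similarity metrics, and the function names, distance bounds, and toy data are all hypothetical rather than taken from the Knowevo code.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of headings common to both categories (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_distances(headings_by_class: dict,
                   min_dist: float = 30.0,
                   max_dist: float = 300.0) -> dict:
    """Map each connected pair of classes to a target link distance.

    The more headings two classes share, the shorter the link between them.
    Pairs with no shared headings get no link at all.
    """
    distances = {}
    for c1, c2 in combinations(sorted(headings_by_class), 2):
        sim = jaccard(headings_by_class[c1], headings_by_class[c2])
        if sim > 0:
            distances[(c1, c2)] = max_dist - sim * (max_dist - min_dist)
    return distances


# Hypothetical toy data: LCC subclass -> LCSH headings found on its books.
headings_by_class = {
    "QA": {"Algebra", "Geometry", "Mechanics"},
    "QC": {"Mechanics", "Optics", "Heat"},
    "PR": {"English poetry", "English drama"},
}
distances = link_distances(headings_by_class)
print(distances)
```

In this toy input only QA and QC share a heading, so they are the only linked pair. A force-directed layout would then pull them closer together than either sits to PR.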

The observed clustering reconstructs the intuitive "geography" of knowledge domains: for example, the proximity of science and technology and their distance from the clusters formed by the various humanities domains (see Figure 1).

lcc-mockup-1a.jpg

Figure 1. "Galaxy of knowledge" for the year 1950.

Figure 2 shows the proportionate growth of domains and their interconnectedness, from mostly unconnected domains in the year 1881, to a more connected configuration in the year 1912, to a densely interconnected map in the mid-century.

final-fig1a.png final-fig1b.png final-fig1c.png

Figure 2. Galaxies of Knowledge for years 1881, 1912, and 1947.





2. Knowevo Landscape of Knowledge


This tool visualizes the domains of knowledge as a 3-D mountainous surface rising from a square plane defined by the LCC categories. Each category is represented by a location on the plane, where the column corresponds to the top-level category, indicated by a letter (e.g. 'Q' for "Science"), and the row corresponds to a subcategory within that domain (e.g. 'QA' for "Mathematics"). The more books in a category, the higher the corresponding peak. Putting together historical "snapshots" of such surfaces allows one to observe and compare the growth of different regions of knowledge, with their valleys, hills, and peaks. (Similar methods have been used by Denton (2012) for visualizing the differences between regional libraries.) Figure 3 below shows examples of such landscapes for the years 1900 and 1950.

m1900-axis.png m1950-axis.png

Figure 3. "Landscapes of knowledge" for the years 1900 (left) and 1950 (right).
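The layout described above can be sketched as a height grid with one column per top-level class and one row per subclass. This is only an illustrative sketch: the function name, the choice to place the bare one-letter class in the first row, and the book counts are assumptions, not the tool's actual data model.

```python
def landscape_grid(counts: dict):
    """Arrange per-category book counts into a height grid.

    Columns are top-level LCC letters; rows are the remaining letters of the
    subclass code (the empty string, i.e. the bare one-letter class, sorts first).
    """
    columns = sorted({code[0] for code in counts})
    row_keys = sorted({code[1:] for code in counts})
    grid = [[0] * len(columns) for _ in row_keys]
    for code, n in counts.items():
        row = row_keys.index(code[1:])
        col = columns.index(code[0])
        grid[row][col] += n  # peak height = number of books
    return columns, row_keys, grid


# Hypothetical book counts for a single year.
counts = {"Q": 120, "QA": 340, "QC": 210, "T": 80, "TA": 150}
columns, row_keys, grid = landscape_grid(counts)
for key, heights in zip(row_keys, grid):
    print(repr(key), heights)
```

Building one such grid per year and rendering each as a surface gives the sequence of snapshots that the tool compares.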

This is a discovery tool that allows the user to observe interesting regularities in the way the domain landscape changes. For example, a comparison between 1900 and 1950 clearly shows a disproportionate rise in fiction and in American and English literature (LCC subclasses PR, PS, and PZ) and in American history (LCC classes E and F), as well as the appearance of new groups of domains in science (LCC class Q) and technology (LCC class T).

The visualization is available at http://knowevo.cs.uml.edu/demos/landscape.





3. Knowevo Pulse of Knowledge


This visualization is a stream graph that shows domains (and subdomains) as "streams" whose width changes over time, reflecting the proportion of the LC collection allotted to each domain in each year. Figure 4 shows the distribution of categories in the LC collection from 1850 to 1950. The two most conspicuous changes in the graph are (1) the widening of the lower yellow band (second from the bottom), representing American History, in 1861-1866, i.e. roughly the time of the American Civil War, and (2) the widening of the khaki band (at the top), representing World History, in 1914-1919, i.e. roughly the time of World War One. These data may suggest that people turn to history during times of great historical calamity.

lcc_rivers.png

Figure 4. "Pulse of knowledge" stream graph for the years 1850-1950 (all top LCC domains).
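The band widths in such a stream graph come down to each domain's share of the collection in each year. The sketch below shows that normalization step only. The function name and the counts are hypothetical, and the actual tool may smooth or order the bands differently.

```python
def yearly_shares(counts_by_year: dict) -> dict:
    """Convert per-year book counts into per-year proportions of the collection."""
    shares = {}
    for year, counts in counts_by_year.items():
        total = sum(counts.values())
        shares[year] = {
            cls: (n / total if total else 0.0) for cls, n in counts.items()
        }
    return shares


# Hypothetical counts for three top-level LCC classes in two years.
counts_by_year = {
    1860: {"E": 20, "Q": 30, "P": 50},
    1863: {"E": 60, "Q": 30, "P": 60},
}
shares = yearly_shares(counts_by_year)
for year in sorted(shares):
    print(year, shares[year])
```

In this toy input class E doubles its share between the two years even though class Q's absolute count is unchanged, which is the kind of relative widening the stream graph makes visible.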

Figure 5 shows the distribution of subdomains in the category Science (LCC class Q) for the years 1850-1950. Following the "pulses" of scientific disciplines, we can trace their relative fortunes during this period: note, for example, the slow decline of astronomy (reddish-brown band at the bottom of the chart) and the growth of chemistry toward the mid-20th century (the green band, 3rd from the bottom). We can also visually observe the appearance of a new domain, microbiology, which begins as a very narrow band (pink, 6th from the top) in the 1870s and begins to widen substantially in the early 20th century. Finally, we may even be observing a strong domain-wide impact of a single book: an abrupt widening of the green band representing zoology (at the top of the chart) in 1860-1862 seems to follow the publication of Darwin's On the Origin of Species in 1859, which caused great controversy and a burst of publication activity among biologists and zoologists. (We admit that in this case our hypothesis is little more than a guess, but the coincidence is too striking to neglect.)

This visualization is available at http://knowevo.cs.uml.edu/demos/pulse.

lcc_science_rivers.png

Figure 5. "Pulse of knowledge" stream graph for the subdomains of Science for the years 1850-1950.





II. Knowevo Encyclopedia Britannica Tools: Facebook of the Past

Below, we provide a set of screenshots for the educational online tools for tracking and mapping the social dimension of the history of knowledge about people and a history of reputations. These tools are available through the Facebook of the Past interface. The back-end of these tools is a database that contains all articles about people in different Encyclopedia editions (Britannica 3, 9, 11, 15, and Wikipedia). Articles on the same person are matched across editions to compile a master list of people. Each subject is characterized by measures of importance and centrality in respective editions, by the network of co-occurring subjects ("neighbors"), and by the list of categories accompanying this subject in Wikipedia. A walk-through demo of the tools is available at https://www.youtube.com/watch?v=R2TUde1MO2c.


1. Knowevo reputation graph

This tool maps historical changes in individual reputations. The user inputs the name of the person of interest, and then, if needed, is offered a choice among namesakes. Once the name is selected, the tool produces a graph that represents the importance of this person in successive historical editions of Britannica and in Wikipedia.

Figure 6 shows the relative importance of articles on Bach across different editions, measured as article volume relative to the volume of the entire encyclopedia. Note that, due to OCR noise, the current algorithm also linked in an article on Bach's son in the 11th edition. Figure 7 shows the relative importance of articles on Voltaire across different editions, measured in the same way.

importance-bach.jpg

Figure 6. Importance of Johann Sebastian Bach as reflected in different editions.

importance-voltaire-new.jpg

Figure 7. Importance of Voltaire as reflected in different editions.
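The importance measure plotted in Figures 6 and 7 is a ratio of article volume to edition volume. A minimal sketch of that ratio follows, assuming volume is counted in words. The function names, the edition labels, and all word counts are hypothetical placeholders, not measurements from the Knowevo database.

```python
def relative_importance(article_words: int, edition_words: int) -> float:
    """Article volume as a fraction of the whole edition's volume."""
    if edition_words <= 0:
        raise ValueError("edition_words must be positive")
    return article_words / edition_words


def importance_profile(article_sizes: dict, edition_sizes: dict) -> dict:
    """Importance of one person in each edition; a missing article counts as 0."""
    return {
        edition: relative_importance(article_sizes.get(edition, 0), total)
        for edition, total in edition_sizes.items()
    }


# Hypothetical word counts, for illustration only.
edition_sizes = {"EB3": 10_000_000, "EB9": 30_000_000, "EB11": 40_000_000}
article_sizes = {"EB9": 1_500, "EB11": 6_000}  # no article in EB3

profile = importance_profile(article_sizes, edition_sizes)
for edition, value in profile.items():
    print(edition, f"{value:.6%}")
```

Dividing by the edition size keeps editions of very different lengths comparable, which raw article length alone would not.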

2. Knowevo domain graph

The user picks the domain of interest. A domain is, effectively, a Wikipedia category or a cross-section of Wikipedia categories or lists (e.g. French 19th-century composers, chemists, members of the Romantic movement, etc.). Once the domain is picked, the system produces a historical graph of the changes in the relative importance of the members of the domain through time, across the historical editions of the Britannica and for Wikipedia. The resulting cumulative graph represents the history of the domain, i.e., the historical changes in its composition and in the relative hierarchies within it. Figure 8 shows the relative importance of different members of the category of 1860 US Presidential Candidates through time. Importance for a category is computed in the same way as importance for a single member, treating the category as the union of the text in all member articles.

category-importance-us-candidates-1860-new.jpg

Figure 8. Importance of different members of the category of 1860 US Presidential Candidates.
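Treating a category as the union of its members' article text means its importance is the combined member volume over the edition volume. The sketch below assumes each member has one distinct article, so the union is a plain sum. The function name and all word counts are hypothetical.

```python
def category_importance(member_sizes: dict, edition_words: int):
    """Return each member's share of the edition and the category's total share.

    Assumes one distinct article per member, so the volume of the union of
    member articles is the sum of the individual article volumes.
    """
    if edition_words <= 0:
        raise ValueError("edition_words must be positive")
    member_shares = {
        name: size / edition_words for name, size in member_sizes.items()
    }
    category_share = sum(member_sizes.values()) / edition_words
    return member_shares, category_share


# Hypothetical word counts for the 1860 candidates in a single edition.
member_sizes = {
    "Abraham Lincoln": 9_000,
    "Stephen A. Douglas": 2_000,
    "John C. Breckinridge": 800,
    "John Bell": 600,
}
member_shares, category_share = category_importance(member_sizes, 40_000_000)
print(f"category: {category_share:.6%}")
for name, share in member_shares.items():
    print(f"  {name}: {share:.6%}")
```

Stacking the member shares for each edition would give a cumulative graph of the kind shown in Figure 8, with the category total as its upper envelope.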

3. Peers and influences

This tool allows the user to investigate the social and intellectual connections of persons featured in the current edition of Britannica and in Wikipedia, and to reconstruct the underlying social graph, thus creating an entertaining interface for studying the history of human connections. This feature also allows the user to view the contemporaries of each historical personality ("peers"), as well as the people who influenced this person and those whom he or she influenced ("influences"). These three categories are determined by co-occurrence in Wikipedia and by the lifetime dates of each pair of people.
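One way to picture how lifetime dates could separate the three categories is sketched below. The exact rule used by the tool is not given here, so this is an assumed simplification: a co-occurring neighbor whose lifetime overlaps the subject's is a peer, one who died before the subject was born is a possible influence on the subject, and one born after the subject's death is possibly influenced by the subject. The type and function names are hypothetical, and co-occurrence is taken as already established.

```python
from typing import NamedTuple


class Person(NamedTuple):
    name: str
    born: int
    died: int


def classify(subject: Person, neighbor: Person) -> str:
    """Label a co-occurring neighbor using lifetime dates only (assumed rule)."""
    if neighbor.died < subject.born:
        return "influenced_by"  # neighbor lived entirely before the subject
    if neighbor.born > subject.died:
        return "influenced"     # neighbor lived entirely after the subject
    return "peer"               # lifetimes overlap


newton = Person("Isaac Newton", 1643, 1727)
neighbors = [
    Person("Galileo Galilei", 1564, 1642),
    Person("Gottfried Wilhelm Leibniz", 1646, 1716),
    Person("Pierre-Simon Laplace", 1749, 1827),
]
labels = {p.name: classify(newton, p) for p in neighbors}
print(labels)
```

A rule this coarse ignores influence between overlapping lifetimes, so it should be read as a starting point rather than as the method behind the figures.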

Figure 9 shows peer communities of Bach's connections. The goal here is to visualize Bach's social circles. The interface allows the user to adjust the granularity of the social circle display. Figures 10 and 11 show visualizations of social circles for Shakespeare and for Pushkin.

visualize-bach-1.jpg visualize-bach-2.jpg

Figure 9. Visualizing Bach's social circles at two different granularities.

visualize-shakespeare-new.jpg

Figure 10. Visualizing Shakespeare's social circles.

visualize-pushkin.jpg

Figure 11. Visualizing Pushkin's social circles.

Figure 12 shows the 20th century view of Isaac Newton's intellectual network of people who influenced him (in green), his contemporaries and peers (in pink), and his intellectual followers (blue). The size of the node represents relative importance of each person, using article size as the proxy.

newton-circles.jpg

Figure 12. Isaac Newton's peers and influences.

4. Bach, Handel, Mozart, Vivaldi: A use case study on tracking reputations in Britannica.

To illustrate the use of these tools, we examined the domain of 18th-century classical composers. We identified domain members in Britannica 11 and 15 by employing our matching algorithm (Luo et al., 2014) to map the relevant Wikipedia articles to the corresponding articles in Britannica. We then used the Knowevo importance ranking algorithms to rank the domain members in each edition. Figure 13 illustrates the change in importance. The legend on the left lists the composers alphabetically. The top two bar charts show their ranks in the 11th and 15th editions, respectively. The bottom bar chart shows the change in their reputation from the 11th to the 15th edition, sorted by absolute value.

Importance_Change.png

Figure 13. Biggest "movers and shakers" among the 18th century composers.
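The sorting behind the bottom chart can be sketched with the top-five ranks reported in Table 1. The function name and the sign convention (positive means the composer rose) are assumptions, and the sketch covers only composers ranked in both editions, so it does not handle newcomers such as Telemann and Vivaldi.

```python
def rank_changes(old_ranks: dict, new_ranks: dict) -> list:
    """Rank change per composer, largest absolute change first.

    A positive change means the composer moved up (toward rank 1).
    Only composers present in both rankings are compared.
    """
    changes = [
        (name, old_ranks[name] - new_ranks[name])
        for name in old_ranks
        if name in new_ranks
    ]
    # sorted() is stable, so ties keep their order from old_ranks.
    return sorted(changes, key=lambda item: -abs(item[1]))


rank_11th = {"Bach": 1, "Handel": 2, "Mozart": 3, "Gluck": 4, "Haydn": 5}
rank_15th = {"Mozart": 1, "Bach": 2, "Haydn": 3, "Gluck": 4, "Handel": 5}

movers = rank_changes(rank_11th, rank_15th)
for name, change in movers:
    print(f"{name:8s} {change:+d}")
```

On these five ranks Handel's drop of three places is the largest move, followed by the two-place rises of Mozart and Haydn.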

Table 1 shows the relative ranking for the most important 18th century composers in the 11th and 15th editions of Britannica. The top five composers remained the same, but the order of importance underwent a significant change. In the 15th edition, Mozart replaced Bach at the top of the hierarchy, a change potentially brought on by the era of sound recording which led to classical music reaching a wider audience; this may have proved detrimental to Bach's difficult polyphonies, while Mozart's light melody lines with suitable harmonic accompaniment rose in popularity.

The most drastic change within the top five composers was Handel's drop from second to fifth place. A possible explanation may lie in the history of genres: in the 20th century, the archaic genres of Italian opera and oratorio that defined Handel's oeuvre lost their popularity and were, in general, less frequently performed and recorded.

Two composers who did not have a dedicated article in the 11th edition but ranked quite high in the 15th edition are Georg Philipp Telemann and Antonio Vivaldi. Their 20th-century rediscovery is a well-known fact. Importantly, our algorithm was able to "catch" these two comebacks automatically.

Table 1. The rank of top five 18th century composers in the 11th and 15th edition of Britannica.

Rank  1911 (11th edition)          1985-2000 (15th edition)
1.    Johann Sebastian Bach        Wolfgang Amadeus Mozart
2.    George Frideric Handel       Johann Sebastian Bach
3.    Wolfgang Amadeus Mozart      Joseph Haydn
4.    Christoph Willibald Gluck    Christoph Willibald Gluck
5.    Joseph Haydn                 George Frideric Handel