Graph of Related HTML5 Tags

A graph built by scraping web pages, counting each page's HTML5 tags, and graphing the results with D3.js.

Getting the Data

I got the idea for a graph showing relationships between HTML tags after reading the MDN web docs page for HTML tags. The immediate issue was that no data connected individual tags to each other, apart from the categories MDN displays them in. To remedy this, I crawled the web and cataloged the tags used on each page.
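As a rough illustration of the cataloging step, the sketch below counts opening tags in a page's HTML with a regular expression. The function name `countTags` and the returned shape (tag name mapped to count) are assumptions made for this example, not necessarily what the crawler used. A regex is only an approximation: it will also match tag-like text inside scripts, and it breaks on attributes containing `>`, so a real crawler would more likely use an HTML parser.

```javascript
// Rough sketch: count opening tags in an HTML string.
// Hypothetical helper, not taken from the project's code.
function countTags(html) {
  const counts = new Map();
  // Matches "<name ...>" and captures the name. Closing tags ("</p>") and
  // comments ("<!-- -->") are skipped because a letter must follow "<".
  const openingTag = /<([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g;
  for (const match of html.matchAll(openingTag)) {
    const tag = match[1].toLowerCase(); // treat <P> and <p> as the same tag
    counts.set(tag, (counts.get(tag) || 0) + 1);
  }
  return Object.fromEntries(counts);
}

const sample = "<div><p>Hi</p><P>there</P><br/></div>";
console.log(countTags(sample)); // { div: 1, p: 2, br: 1 }
```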

I formatted the data like this:


Manipulating the Data

After a healthy amount of this data was acquired, it had to be parsed and turned into a network graph for later display. The main decision in building the graph was how to weight tag relationships. After trying a few schemes for scoring how important one tag is to another, I settled on this formula:

score = number_of_appearances_on_the_same_page /

This allows the score to give more weight to more related tags while still allowing some weight to more frequent tags.
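The numerator of the score can be gathered in a single pass over the scraped pages. The sketch below covers only that part: it reads "appearances on the same page" as the number of pages on which both tags occur, which is an assumption, and it does not attempt the rest of the formula. The input `pages` (one array of tag names per page) and the function name are hypothetical.

```javascript
// Count, for every ordered pair of tags, how many pages contain both.
// `pages` is a hypothetical input: one array of tag names per scraped page.
function countCoOccurrences(pages) {
  const together = {};
  for (const page of pages) {
    const tags = [...new Set(page)]; // each tag counts at most once per page
    for (const a of tags) {
      for (const b of tags) {
        if (a === b) continue; // a tag is not related to itself
        together[a] = together[a] || {};
        together[a][b] = (together[a][b] || 0) + 1;
      }
    }
  }
  return together;
}

const pages = [
  ["div", "p"],
  ["div", "p", "table"],
  ["table", "tr"],
];
console.log(countCoOccurrences(pages).div); // { p: 2, table: 1 }
```

The result is symmetric (`together.div.p` equals `together.p.div`), which keeps later lookups simple at the cost of storing every pair twice.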

From this, connections were generated by taking each tag's two most related tags. After these connections were found, duplicate connections were removed, and the tags and their connections were output as JSON for use in D3.js.
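The connection step described above can be sketched as follows. The input `scores` (each tag mapped to its score against every other tag) and the output shape (`nodes` with an `id`, `links` with `source` and `target`) are assumptions; the output shape is chosen because it is what d3-force examples commonly consume, not because the project is known to use it. A connection is treated as a duplicate when the same two tags are already linked in either direction.

```javascript
// Build a node/link graph from pairwise scores.
// `scores` is a hypothetical input: { tag: { otherTag: score, ... }, ... }
function buildGraph(scores, perTag = 2) {
  const seen = new Set();
  const links = [];
  for (const tag of Object.keys(scores)) {
    // Pick this tag's highest-scoring neighbours.
    const best = Object.entries(scores[tag])
      .sort(([, a], [, b]) => b - a)
      .slice(0, perTag);
    for (const [other] of best) {
      // Order-independent key so a->b and b->a count as one connection.
      const key = [tag, other].sort().join("|");
      if (seen.has(key)) continue;
      seen.add(key);
      links.push({ source: tag, target: other });
    }
  }
  // Every tag becomes a node, even if it ends up with no links.
  const nodes = Object.keys(scores).map((id) => ({ id }));
  return { nodes, links };
}

const graph = buildGraph({
  table: { tr: 0.9, td: 0.8, div: 0.1 },
  tr: { table: 0.9, td: 0.7 },
  td: { table: 0.8, tr: 0.7 },
  div: { table: 0.1 },
  ruby: {},
});
console.log(JSON.stringify(graph, null, 2));
```

In this example `ruby` has no scores, so it appears as a node without any links.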

Displaying the Data

D3.js was great for displaying the graph. It has plenty of features, but the one I was most interested in was d3-force, which lays out the node graph automatically based on spacing requirements. Another useful module was d3-zoom, which lets the network graph be zoomed; this was necessary because there were a lot of nodes to display.
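A minimal sketch of how d3-force and d3-zoom fit together is shown below. It is not the project's actual rendering code. It assumes D3 v6 or later (where the zoom listener receives the event as its argument), an existing `<svg>` element, and a `graph` object with `nodes` (each with an `id`) and `links` (each with `source` and `target`). The function name `renderGraph`, the colors, and the sizes are illustrative; `d3` is passed in as a parameter so the sketch does not depend on how the library is loaded.

```javascript
// Sketch of a force-directed, zoomable graph (assumes D3 v6+).
function renderGraph(d3, svgSelector, graph, width, height) {
  const svg = d3.select(svgSelector).attr("viewBox", [0, 0, width, height]);
  const g = svg.append("g"); // everything in this group pans and zooms together

  const link = g.selectAll("line").data(graph.links).join("line")
    .attr("stroke", "#999");
  const node = g.selectAll("circle").data(graph.nodes).join("circle")
    .attr("r", 5);

  // d3-force: links pull connected tags together, charge pushes nodes apart,
  // and the centering force keeps the layout in the middle of the view.
  const simulation = d3.forceSimulation(graph.nodes)
    .force("link", d3.forceLink(graph.links).id((d) => d.id))
    .force("charge", d3.forceManyBody())
    .force("center", d3.forceCenter(width / 2, height / 2))
    .on("tick", () => {
      // Copy the simulated positions onto the SVG elements on every tick.
      link.attr("x1", (d) => d.source.x).attr("y1", (d) => d.source.y)
          .attr("x2", (d) => d.target.x).attr("y2", (d) => d.target.y);
      node.attr("cx", (d) => d.x).attr("cy", (d) => d.y);
    });

  // d3-zoom: apply the current pan/zoom transform to the group.
  svg.call(d3.zoom().scaleExtent([0.1, 8]).on("zoom", (event) => {
    g.attr("transform", event.transform);
  }));

  return simulation;
}
```

In a browser with D3 loaded, this would be called as `renderGraph(d3, "svg", graph, 800, 600)`, where `graph` is the parsed JSON.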

Interpreting the Results

  • A few tags, such as <ruby>, aren't connected to any other tags. These tags were not encountered on any of the scraped pages.
  • There are clusters of connected tags, such as HTML5 tags and table tags.

Thanks for reading.
Get the code on GitHub!