Words Across Borders

Visualizing Word Frequencies Over Time and Space via Google Ngrams

Posted December 14, 2025

Click Here to Visit the Web App

(If the app is “asleep”, press the button to wake it back up.)

What is Words Across Borders?

Words Across Borders is an interactive dashboard/web app that lets users explore how often words or phrases appear in written texts, across different languages and over time. It is based on the Google Ngram Viewer dataset, which contains word frequency data for various languages from 1500 to 2022 (though this web app only includes data from 1600 onward, because the sample of texts from before then is too small). Google already has a visualization tool for this dataset, the Google Ngram Viewer, which lets users graph the frequency of input words over time. However, it provides only one visualization type and can only be used for one language at a time. Words Across Borders acts as an enhanced version that allows cross-language comparison and offers several visualization types. More details can be found in the “User Manual” tab of the web app.

Backstory

I created this project with a group of classmates as our final project for an introductory visualization course in my Master of Science in Data Science program. The primary goal of the final project was to apply course concepts on effective data visualization, with a requirement that the project be deployed publicly (as opposed to being a local file). We chose Streamlit over other options such as Tableau Public for hosting our app because we were already using Python for our visualizations, and because Tableau does not support dynamic data very well. We needed to make API calls to Google Ngram Viewer, since downloading the multi-terabyte dataset was infeasible.
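A minimal sketch of what such an API call might look like in Python follows. It is not the app's actual code. It assumes the unofficial, undocumented JSON endpoint that the Ngram Viewer page itself appears to use, and that the response is a list of objects with "ngram" and "timeseries" keys; both could change without notice. The function names and the corpus identifier "en-2019" are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: unofficial, undocumented endpoint; not a supported public API.
NGRAM_JSON_URL = "https://books.google.com/ngrams/json"


def build_ngram_url(phrase, corpus, year_start=1600, year_end=2022, smoothing=0):
    """Build the query URL for one phrase in one language corpus."""
    params = {
        "content": phrase,
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,  # assumed identifier format, e.g. "en-2019"
        "smoothing": smoothing,
    }
    return f"{NGRAM_JSON_URL}?{urlencode(params)}"


def parse_timeseries(payload, year_start=1600):
    """Turn the assumed response shape into {ngram: {year: frequency}}."""
    return {
        entry["ngram"]: {
            year_start + offset: freq
            for offset, freq in enumerate(entry["timeseries"])
        }
        for entry in payload
    }


def fetch_frequencies(phrase, corpus, year_start=1600, year_end=2022):
    """Fetch and parse frequencies over the network (requires internet access)."""
    url = build_ngram_url(phrase, corpus, year_start, year_end)
    with urlopen(url, timeout=10) as response:
        payload = json.load(response)
    return parse_timeseries(payload, year_start)
```

Calling fetch_frequencies once per language, each time with that language's corpus and a translated phrase, would give the per-language series needed for a cross-language comparison, without ever downloading the raw dataset.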

Each student in the class had to submit a short project proposal, and each was given three votes. The most highly voted projects would then be chosen. I had initially planned to work with a friend on something built around the Steam Web API, or some other video game dataset, trusting that we’d be able to get another 3 or 4 votes for that. As such, I had intended my own proposal to be a throwaway, just there to fulfill the requirement that every student submit one.

Having worked with Google Ngrams tangentially as part of an assignment in an undergraduate course at UC Berkeley (Computer Science 61B: Data Structures), this dataset came to mind immediately. So I proposed, partially as a joke, to explore the dataset to test whether certain stereotypes hold up, such as “do Asian parents indeed praise their kids less?” Spoilers: the phrase “proud of you” does appear less frequently in Chinese than in any of the Western languages, though the translation used for Chinese is also not the best. (One of the many limitations of our project is that finding accurate translations of the input term in other languages is more of a linguistics problem than a data science one, and was out of scope.) But this is a tangent and I digress.

During the in-class voting, I began to sweat as I realized I was rapidly gaining votes. Next thing you know, I had the highest or second highest number of votes, and my project was chosen. Students began to flock over to join my group… and I couldn’t really say no anymore. That’s not to say I wasn’t interested in the concept (it came to mind for a reason, after all), but I do find it quite funny that this was meant to be a throwaway pitch so that I could work with my friend on something else.

How to Use

Rather than giving you a wall of instructions here, I’d encourage you to just play around with the web app yourself. It should be fairly intuitive, and we spent quite a bit of our time on UI design, adding tooltips, etc. There is also a “User Manual” tab with more context and instructions if needed.