Social Network Analysis and Topic Modeling of codecentric’s Twitter friends and followers

24.7.2017 | 8 minutes reading time

Recently, Matthias Radtke wrote a very nice blog post on Topic Modeling of the codecentric Blog Articles, in which he gives a comprehensive introduction to Topic Modeling. In this article, I show a real-world example of how we can use Data Science to gain insights from text data and social network analysis.

I am using publicly available Twitter data to characterize codecentric’s friends and followers by

identifying the most “influential” followers and using text analysis tools like sentiment analysis to characterize their interests from their user descriptions,
performing Social Network Analysis on friends, followers and a subset of second degree connections to identify key players who will be able to pass on information to a wide reach of other users and
combining this network analysis with topic modeling to identify meta-groups with similar interests.

Knowing the interests and social network positions of our followers allows us to identify key users who are likely to retweet posts that fall within their range of interests and who will reach a wide audience.

Twitter Mining

Via the Twitter REST API, anybody can access Tweets, Timelines, Friends and Followers of users or hashtags. One drawback of the REST API is its rate limit of 15 requests per application per rate limit window (15 minutes). An alternative would be Twitter’s Streaming API, if you wanted to continuously stream data of specific users, topics or hashtags. Here though, I want to look at a snapshot of codecentric’s Twitter followers to show some of the possibilities that analyzing this information holds.
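To get a feeling for what this rate limit means in practice, here is a minimal sketch in Python (standard library only; it is not the R code used for this post). It assumes the limit of 15 requests per 15-minute window described above and that the first window is available immediately. The function names are hypothetical.

```python
import math


def windows_needed(n_requests: int, per_window: int = 15) -> int:
    """Number of rate limit windows needed to send n_requests."""
    return math.ceil(n_requests / per_window)


def minimum_wait_minutes(n_requests: int, per_window: int = 15,
                         window_minutes: int = 15) -> int:
    """Minutes spent waiting for fresh windows.

    Assumption: the first window can be used right away, so only the
    windows after the first one add waiting time.
    """
    extra_windows = max(0, windows_needed(n_requests, per_window) - 1)
    return extra_windows * window_minutes


# Example: 100 requests need 7 windows, i.e. at least 90 minutes of waiting.
requests_planned = 100
print(windows_needed(requests_planned), minimum_wait_minutes(requests_planned))
```

This is only back-of-the-envelope arithmetic; a real client would also have to account for paginated responses, where one user can cost more than one request.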

On July 15th, codecentric had 449 friends (users whom codecentric follows) and 2732 followers (users who follow codecentric); 261 of these users are both friends and followers.
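The overlap between friends and followers is plain set arithmetic. The following Python sketch (not the original R code) first derives the number of distinct users from the three counts above and then shows the same computation on sets of user IDs. The IDs and the function name are made up for illustration.

```python
# Counts taken from the text above.
friends = 449
followers = 2732
both = 261

unique_users = friends + followers - both  # each overlapping user counted once
only_friends = friends - both              # followed by codecentric, not following back
only_followers = followers - both          # following codecentric, not followed back


def overlap_summary(friend_ids: set, follower_ids: set) -> dict:
    """Summarize the overlap of two sets of user IDs."""
    return {
        "both": len(friend_ids & follower_ids),
        "only_friends": len(friend_ids - follower_ids),
        "only_followers": len(follower_ids - friend_ids),
        "unique_users": len(friend_ids | follower_ids),
    }


# Hypothetical toy IDs, just to show the set operations.
toy = overlap_summary({1, 2, 3}, {2, 3, 4, 5})
print(unique_users, only_friends, only_followers, toy)
```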

We now have the following information about these friends and followers:

user name
user screen name
user description (the short introduction that each user can write about themselves)
number of tweets per user
number of followers per user
number of friends per user
date of account creation
account location
account language
etc.

This data can tell us a lot about who is interested in codecentric and what we do. We can e.g. start with a simple exploratory data analysis and look at what languages the accounts are set to – no need for fancy models (just yet)!

Top 10 languages of codecentric’s Twitter friends and followers.

As we can see, the vast majority of friends and followers have English and German account settings. The insight derived from this is that tweeting in both German and English will find an audience among our followers (even though English would probably be more inclusive, assuming that most, if not all, German followers will also be able to understand English tweets).
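Such a language overview needs nothing more than counting. Here is a small Python sketch (standard library only, not the R code behind the plot); the account records and the field name "lang" are assumptions for illustration.

```python
from collections import Counter


def top_languages(accounts, n=10):
    """Return the n most common account languages as (language, count) pairs."""
    counts = Counter(a["lang"] for a in accounts if a.get("lang"))
    return counts.most_common(n)


# Hypothetical toy records; real data would come from the Twitter API.
toy_accounts = [
    {"screen_name": "user_a", "lang": "en"},
    {"screen_name": "user_b", "lang": "de"},
    {"screen_name": "user_c", "lang": "en"},
    {"screen_name": "user_d", "lang": "en"},
    {"screen_name": "user_e", "lang": "de"},
    {"screen_name": "user_f", "lang": "fr"},
]

print(top_languages(toy_accounts, n=2))
```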

Who are codecentric’s most influential followers and what are they interested in?

We can also try to identify our most influential followers. These would be followers with a big network (i.e. who have many followers) and who also tweet/re-tweet a lot. If we capture these followers’ interests with one of our tweets, they a) are more likely to re-tweet and b) will reach a bigger audience by doing so!

Correlation between follower count and the average number of tweets per day of codecentric’s Twitter followers.

The plot above shows the correlation between the number of followers codecentric’s followers have and how often they tweet.

Now that we know who our most influential followers are, we can analyze their short descriptions about themselves to find out what they are interested in. By proxy, this will give us an idea about which kind of tweets are most likely to capture their interest. Of course, this is not to say that these are the only people who (should) matter and that tweets should be tailored towards these interests only! Covering a wide range of topics makes for an interesting and authentic profile but since “knowledge is power”, it can be extremely valuable to know which tweets/posts are likely to increase visibility!

In order to extract information from the descriptions of the most influential followers (defined as the top 100 followers based on a score of follower count * average tweets per day), I am making use of text analysis and natural language processing tools.
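The score itself is easy to reproduce. Below is a minimal Python sketch (not the original R code). It assumes that the average number of tweets per day is the total tweet count divided by the account age in days at the time of the snapshot; the field names and toy users are hypothetical.

```python
from datetime import date


def tweets_per_day(tweet_count, created, snapshot):
    """Average tweets per day since account creation (assumed definition)."""
    days = max((snapshot - created).days, 1)  # avoid division by zero for new accounts
    return tweet_count / days


def influence_score(user, snapshot):
    """Score = follower count * average tweets per day."""
    return user["followers"] * tweets_per_day(user["tweets"], user["created"], snapshot)


def top_influential(users, snapshot, n=100):
    """Return the n users with the highest influence score."""
    return sorted(users, key=lambda u: influence_score(u, snapshot), reverse=True)[:n]


snapshot = date(2017, 7, 15)
toy_users = [
    {"name": "a", "followers": 1000, "tweets": 3650, "created": date(2016, 7, 15)},
    {"name": "b", "followers": 50000, "tweets": 365, "created": date(2016, 7, 15)},
    {"name": "c", "followers": 10, "tweets": 36500, "created": date(2016, 7, 15)},
]

print([u["name"] for u in top_influential(toy_users, snapshot, n=2)])
```

Multiplying the two factors means that an account needs both reach and activity to rank highly: in the toy data, the very active account with only 10 followers ends up last.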

To prepare the data, I am splitting the user descriptions into words, converting each word to its word stem and removing stop words.

We can now identify the most common words in these descriptions.
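The preparation steps can be sketched in a few lines of Python (standard library only; the post itself used R). Note the assumptions: the stop word list is a tiny illustrative one, and naive_stem is a deliberately crude, hypothetical suffix stripper standing in for a proper stemming algorithm.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "at", "for", "i", "is", "to", "with"}


def naive_stem(word):
    """Strip a few common English suffixes (crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ers", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(description):
    """Split into lowercase words, drop stop words, reduce each word to its stem."""
    tokens = re.findall(r"[a-z]+", description.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def word_counts(descriptions):
    """Count word stems over all descriptions."""
    return Counter(stem for d in descriptions for stem in preprocess(d))


counts = word_counts(["Data science and agile teams", "Agile data engineer"])
print(counts.most_common(2))
```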

Wordcloud showing most frequently used word stems in codecentric’s Twitter followers’ descriptions.

Not surprisingly, software development, agile and business are among the most common words. But IoT, data and science also occur frequently in our influential followers’ descriptions!

Instead of looking for the most common words, we can also look for the most common word pairs (bigrams).

Most frequently used word pairs in codecentric’s Twitter followers’ descriptions.

This graph shows the most common word pairs in our influential followers’ descriptions (arrow colors represent how often the pair occurs). Because we are looking at a relatively small set of followers, none of the word pairs occur exceptionally often. Still, data science is the most common word pair!
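Counting bigrams is a small extension of counting single words: we pair each word with its successor. A minimal Python sketch (not the original R code), with made-up token lists:

```python
from collections import Counter


def bigrams(tokens):
    """Return all pairs of consecutive tokens."""
    return list(zip(tokens, tokens[1:]))


def bigram_counts(token_lists):
    """Count bigrams over several tokenized descriptions."""
    return Counter(pair for tokens in token_lists for pair in bigrams(tokens))


# Hypothetical, already tokenized descriptions.
toy_descriptions = [
    ["data", "science", "team"],
    ["love", "data", "science"],
    ["agile", "coach"],
]

pair_counts = bigram_counts(toy_descriptions)
print(pair_counts.most_common(1))
```

Pairs are only formed within one description, so the last word of one user never gets paired with the first word of the next.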

Sentiment analysis

Sentiment analysis describes a collection of natural language processing tools and resources that are used to identify subjective information in text, like positive or negative sentiment, joy, disgust, fear, anger, etc.

Here, we can also use bigram analysis to identify negated meanings, i.e. words preceded by “not”, “no”, etc. In sentiment analysis, the meanings of negated words can then be reversed.

Overall sentiment score distribution of codecentric’s Twitter followers’ descriptions.

This plot shows the overall sentiment in the user descriptions of the most influential followers. Based on Bing Liu’s sentiment lexicon, we can score how many positive and negative words were used in each follower’s description. Because this lexicon is only available for the English language, we can only get reliable scores for followers with an English description (68 out of 100 followers have an English language setting). As we can see, the majority of followers have predominantly positive descriptions.
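To illustrate lexicon-based scoring with negation handling, here is a minimal Python sketch (not the R code used for the plot). The word lists are a tiny made-up toy lexicon, not Bing Liu’s lexicon, and the rule "flip the sign if the previous word is a negation" is a simplifying assumption.

```python
# Toy lexicon for illustration only.
POSITIVE = {"love", "great", "passionate", "happy"}
NEGATIVE = {"hate", "bad", "boring", "angry"}
NEGATIONS = {"not", "no", "never"}
PUNCTUATION = ".,!?;:"


def sentiment_score(text):
    """Positive minus negative word count, with negated words reversed."""
    tokens = [t.strip(PUNCTUATION) for t in text.lower().split()]
    score = 0
    for i, word in enumerate(tokens):
        value = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1 or 0
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value  # "not boring" counts as positive
        score += value
    return score


print(sentiment_score("Passionate developer, love great code"))
print(sentiment_score("not boring"))
```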

Social Network Analysis

Social networks describe interactions between people, e.g. Twitter friends and followers. The analysis of such networks makes use of graph theory.

Here, we can show codecentric’s Twitter followers and friends as a directed network: each node represents a user and edge arrows indicate who a user follows.
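A directed follow network can be represented as a mapping from each user to the set of users they follow. The following Python sketch (standard library only, not the R code of this post) builds such a structure from (follower, followed) pairs and derives in- and out-degree. The user names besides codecentric are hypothetical.

```python
from collections import defaultdict


def build_follow_graph(follows):
    """Build {user: set of users they follow} from (follower, followed) pairs."""
    graph = defaultdict(set)
    for follower, followed in follows:
        graph[follower].add(followed)
        graph.setdefault(followed, set())  # make sure every node is a key
    return dict(graph)


def out_degree(graph):
    """Number of friends (users followed) per user."""
    return {node: len(targets) for node, targets in graph.items()}


def in_degree(graph):
    """Number of followers per user within the network."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


toy_graph = build_follow_graph([
    ("codecentric", "alice"),
    ("alice", "codecentric"),
    ("bob", "codecentric"),
    ("alice", "bob"),
])
print(in_degree(toy_graph), out_degree(toy_graph))
```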

Because of Twitter’s API rate limit, I have only mined the friends lists of 106 of codecentric’s friends. Still, this leaves us with a network of 39929 second degree connections!

With graph theory we can calculate a number of metrics that allow us to identify key players in the network:

centrality and node degree to find nodes with many adjacent edges (i.e. users who are highly connected)
closeness to find central nodes (i.e. users that can spread information to many other users)
transitivity or clustering coefficient, which measures the probability that adjacent nodes are connected
PageRank or eigenvector centrality, which score nodes according to their connections to other high-scoring nodes
betweenness centrality (how many shortest paths between other nodes run through a node) and diameter (the longest of all shortest paths between nodes)
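Several of these metrics boil down to shortest path lengths, which breadth-first search delivers for unweighted graphs. The Python sketch below (not the original R code) computes closeness and diameter on a toy graph. It assumes one common variant of closeness, restricted to the nodes a user can actually reach, and a diameter defined over reachable pairs only.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(graph, source):
    """Reachable nodes divided by the sum of their distances (0.0 if none reachable)."""
    dist = shortest_path_lengths(graph, source)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def diameter(graph):
    """Longest shortest path between any two connected nodes."""
    return max(max(shortest_path_lengths(graph, node).values()) for node in graph)


# Hypothetical toy graph: a -> b -> c -> a, plus d -> a.
toy = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": {"a"}}
print(closeness(toy, "d"), diameter(toy))
```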

Below, I am showing the network graph with node size representing betweenness centrality. Nodes with high betweenness centrality are on the path between many other nodes, which makes them key connections or bridges between different groups of nodes. These users are very important because they are likely to pass on information to a wide reach of other users. Node positions are calculated with the Fruchterman-Reingold layout algorithm.
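For readers who want to see how betweenness centrality can be computed, here is a compact Python sketch of Brandes’ algorithm for unweighted directed graphs. It is an illustration, not the R implementation used for the plot, and it returns raw, unnormalized path counts. It assumes that every node appears as a key of the graph dictionary.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm on {node: set of successors}; unnormalized scores."""
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        # Phase 1: BFS from source, counting shortest paths (sigma).
        visited_order = []
        predecessors = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visited_order.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # w seen for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Phase 2: walk back from the farthest nodes and accumulate dependencies.
        delta = {node: 0.0 for node in graph}
        while visited_order:
            w = visited_order.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    return centrality


# Hypothetical toy network: "bridge" connects two otherwise separate groups.
toy = {"a": {"bridge"}, "b": {"bridge"}, "bridge": {"c", "d"}, "c": set(), "d": set()}
scores = betweenness_centrality(toy)
print(max(scores, key=scores.get))
```

In the toy network, all four shortest paths between the two groups run through the bridge node, which is exactly the kind of user the plot highlights with a large node size.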

Social network of codecentric’s Twitter friends and followers. Node size represents betweenness centrality.

Topic Modeling

We can now use the follower descriptions again to identify groups of users with similar interests. For a detailed introduction to topic modeling, see Matthias Radtke’s “Topic Modeling of the codecentric Blog Articles”.

Here, I am using Latent Dirichlet Allocation with the VEM algorithm to group codecentric’s first and second degree connections into five topics.

The wordcloud below visualizes the most characteristic words for each topic.

Wordcloud showing representative word stems of five topics from topic modelling.

Now, we want to know which topic each user in our network belongs to. We can find this out with the so-called gamma score, the estimated proportion of each topic in a user’s description. Each user is assigned the topic with the highest gamma score.
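Assigning each user to a topic is a simple argmax over the gamma scores. The Python sketch below (not the original R code) assumes that gamma is available as one list of five scores per user; the users and values are invented for illustration.

```python
def assign_topics(gamma):
    """Map each user to the 1-based index of the topic with the highest gamma score."""
    return {
        user: max(range(len(scores)), key=scores.__getitem__) + 1
        for user, scores in gamma.items()
    }


# Hypothetical gamma scores for five topics; each row sums to 1.
toy_gamma = {
    "user_a": [0.70, 0.10, 0.10, 0.05, 0.05],
    "user_b": [0.10, 0.15, 0.20, 0.50, 0.05],
    "user_c": [0.20, 0.20, 0.20, 0.20, 0.20],
}

print(assign_topics(toy_gamma))
```

If several topics share the highest score, as for user_c, this sketch picks the first one; that tie-breaking rule is an arbitrary choice.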

Social network of codecentric’s Twitter friends and followers. Colors indicate the topic from topic modelling.

This network shows the different interest groups of codecentric’s Twitter friends based on what topics they and their friends were assigned to (one user, for example, seems to follow many users assigned to topic 1, which is about software development).
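Combining the network with the topic assignments can be as simple as counting the topics of the users somebody follows. Here is a minimal Python sketch (not the R code behind the plot), with a hypothetical follow graph and hypothetical topic assignments:

```python
from collections import Counter


def friend_topic_profile(graph, topic_of, user):
    """Count the assigned topics of all users that the given user follows."""
    return Counter(topic_of[f] for f in graph.get(user, ()) if f in topic_of)


# Hypothetical data: who follows whom, and which topic each user was assigned.
toy_graph = {"alice": {"dev1", "dev2", "dev3", "data1"}, "bob": {"data1"}}
toy_topics = {"dev1": 1, "dev2": 1, "dev3": 1, "data1": 3}

profile = friend_topic_profile(toy_graph, toy_topics, "alice")
print(profile.most_common(1))
```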

Even though this network is far from representative, because it only shows a subgroup of second degree friends, we can already see the potential that this information contains! We now have a very good idea about the interests of our friends from a) their own Twitter descriptions and b) the descriptions of the users that they in turn follow. We could now, for example, generate a similar network with first and second degree followers. It would give us a good idea about the interests of users who are not (yet) followers. This information could be used to target specific interest groups by expanding or focusing more on topics where we see a potential for reaching many users via existing followers.

We could even imagine combining this approach with machine learning techniques to predict follower interests and sharing-potential.

All analyses have been done with R version 3.4.0.

Code is available via GitHub.

Blog author

Shirin Elsinghorst

People Lead & Principal Consultant Data/AI
