Post Annotations
Overview
Annotations have been added to the Post object from all v2 endpoints that return a Post object. Post annotations offer a way to understand contextual information about the Post itself. Though 100% of Posts are reviewed, due to the contents of Post text, only a portion are annotated.
-
Entity annotations (NER): Entities include people, places, products, and organizations, and are delivered in the
entity
payload section. They are programmatically assigned based on what is explicitly mentioned (named-entity recognition) in the Post text. -
Context annotations: Derived from analyzing a Post’s text, context annotations include a domain and entity pairing to help discover Posts on topics that may have been previously difficult to surface. We currently use 80+ domains to categorize Posts. A CSV file of available context annotation entities is available in our Github repository.
Post Annotation Types
Entities
Entity annotations are programmatically defined entities within the entities
field and are reflected as annotations in the payload. Each annotation has a confidence score and indicates where in the Post text the entities were identified (using start
and end
fields).
The entity annotation types include:
- Person - Examples: Barack Obama, Daniel, George W. Bush
- Place - Examples: Detroit, Cali, San Francisco
- Product - Examples: Mountain Dew, Mozilla Firefox
- Organization - Examples: Chicago White Sox, IBM
- Other - Examples: Diabetes, Super Bowl 50
Context
Last updated: June 2022
Context annotations are delivered in the context_annotations
field of the payload. They are inferred based on semantic analysis of keywords, hashtags, handles, etc., in the Post text and result in domain and/or entity labels. Currently, we use 80+ domains, as shown in the table below.
Domain Categories | Domain Codes |
---|---|
3: TV Shows | 46: Brand Category |
4: TV Episodes | 47: Brand |
6: Sports Events | 48: Product |
10: Person | 54: Musician |
11: Sport | 55: Music Genre |
12: Sports Team | 56: Actor |
13: Place | 58: Entertainment Personality |
22: TV Genres | 60: Athlete |
23: TV Channels | 65: Interests and Hobbies Vertical |
26: Sports League | 66: Interests and Hobbies Category |
27: American Football Game | 67: Interests and Hobbies |
28: NFL Football Game | 68: Hockey Game |
29: Events | 71: Video Game |
31: Community | 78: Video Game Publisher |
35: Politicians | 79: Video Game Hardware |
38: Political Race | 83: Cricket Match |
39: Basketball Game | 84: Book |
40: Sports Series | 85: Book Genre |
43: Soccer Match | 86: Movie |
44: Baseball Game | 87: Movie Genre |
45: Brand Vertical | 88: Political Body |
46: Brand Category | 89: Music Album |
47: Brand | 90: Radio Station |
48: Product | 91: Podcast |
54: Musician | 92: Sports Personality |
55: Music Genre | 93: Coach |
56: Actor | 94: Journalist |
58: Entertainment Personality | 95: TV Channel [Entity Service] |
60: Athlete | 109: Reoccurring Trends |
65: Interests and Hobbies Vertical | 110: Viral Accounts |
66: Interests and Hobbies Category | 114: Concert |
67: Interests and Hobbies | 115: Video Game Conference |
68: Hockey Game | 116: Video Game Tournament |
71: Video Game | 117: Movie Festival |
78: Video Game Publisher | 118: Award Show |
79: Video Game Hardware | 119: Holiday |
83: Cricket Match | 120: Digital Creator |
84: Book | 122: Fictional Character |
85: Book Genre | 130: Multimedia Franchise |
86: Movie | 131: Unified Twitter Taxonomy |
87: Movie Genre | 136: Video Game Personality |
88: Political Body | 137: eSports Team |
89: Music Album | 138: eSports Player |
90: Radio Station | 139: Fan Community |
91: Podcast | 149: Esports League |
92: Sports Personality | 152: Food |
93: Coach | 155: Weather |
94: Journalist | 156: Cities |
95: TV Channel [Entity Service] | 157: Colleges & Universities |
109: Reoccurring Trends | 158: Points of Interest |
110: Viral Accounts | 159: States |
114: Concert | 160: Countries |
115: Video Game Conference | 162: Exercise & Fitness |
116: Video Game Tournament | 163: Travel |
117: Movie Festival | 164: Fields of Study |
118: Award Show | 165: Technology |
119: Holiday | 166: Stocks |
120: Digital Creator | 167: Animals |
122: Fictional Character | 171: Local News |
130: Multimedia Franchise | 172: Global TV Show |
131: Unified Twitter Taxonomy | 173: Google Product Taxonomy |
136: Video Game Personality | 174: Digital Assets & Crypto |
137: eSports Team | 175: Emergency Events |
138: eSports Player |
Note: Domain 131 (Unified Twitter Taxonomy) refers to X’s User Facing Interest Taxonomy. This taxonomy helps to power features on the platform such as Topics.
Requesting Annotations
Sample Request
Sample Response
Sample App
Check out the Post Entity Extractor on Glitch to easily discover context annotations in Posts and see how this feature works.
Frequently asked questions
Context annotations
The questions below are specific to the context annotations element of Tweet annotations. For more details, please see the Overview page.
How does Twitter context annotations work?
How does Twitter context annotations work?
Twitter classifies Tweets semantically, meaning that we curate lists of keywords, hashtags, and @handles that are relevant to a given topic. If a Tweet contains the text we’ve specified, it will be labeled appropriately. This differs from a machine learning approach where a model is trained specifically to classify text (in this case, Tweets) and produce a probability score alongside the output/classification.
How do I know that your data is complete and trustworthy?
How do I know that your data is complete and trustworthy?
Twitter’s annotations are curated by domain experts using research and QA processes that have been refined over the course of several years. The process is supported by custom tooling to scale data tracking as far as we are able to maintain excellent precision and recall. In addition, our data is audited regularly by an internal team, having received a precision score of ~80% for the past several quarters.
How do you ensure precision?
How do you ensure precision?
Team members QA our entities on a daily basis to ensure high precision and recall. Additionally, our work is audited quarterly by an internal team, which manually reviews 10,000 Tweets across all of our domains to calculate a precision score.
How do you decide what to track?
How do you decide what to track?
For some domains, like sports and TV, we rely on automated ingest to build out our graph. In the News domain, we track data around stories published by the Twitter Moments team. Otherwise, the team uses a variety of research methods to identify topics to track that garner a high amount of conversation on the platform.
What historical support is available with Tweet Annotations?
What historical support is available with Tweet Annotations?
Data tracking begins the moment an entity is published; therefore, we do not annotate Tweets that were published prior to a given entity being tracked. For example, if an upstart brand/company is added to the taxonomy, we will not retroactively annotate Tweets about that brand prior to when the annotation was added.
Is Twitter able to annotate Tweets in non-english languages? If so, which languages and does the coverage of Tweets being annotated change?
Is Twitter able to annotate Tweets in non-english languages? If so, which languages and does the coverage of Tweets being annotated change?
Yes. Language coverage can vary depending on the domain and the market. English and Japanese are included in the majority of the biggest entities. Below, is a list of the languages and main markets that are covered today:
- English (US, UK)
- Japanese (Japan)
- Portuguese (Brazil)
- Spanish (Argentina, Mexico, Spain)
- Hindi (India)
- Arabic (Saudi Arabia)
- Turkish (Turkey)
- Indonesian (Indonesia)
- Russian (Russia)
- French (France)
Coming soon (~H2 2021):
- German (Germany)
- Tamil (India)
Below is a table of the top 15 countries ordered by the most coverage of annotated Tweets:
Rank | Country code | Country | % of Tweets annotated |
---|---|---|---|
1 | IN | India | 41% |
2 | VN | Vietnam | 36% |
3 | GB | Great Britain | 36% |
4 | EC | Ecuador | 35% |
5 | PE | Peru | 33% |
6 | US | United States | 32% |
7 | CA | Canada | 32% |
8 | AU | Australia | 31% |
9 | JP | Japan | 31% |
10 | PH | Philippines | 30% |
11 | SG | Singapore | 30% |
12 | MY | Malaysia | 30% |
13 | MX | Mexico | 30% |
14 | GB | Great Britain | 29% |
15 | NG | Nigeria | 29% |
What underlying 'semantics' does Twitter rely on to annotate a Tweet?
What underlying 'semantics' does Twitter rely on to annotate a Tweet?
Tweet annotations consist of the following semantics to annotate a Tweet:
- Accounts - we can annotate tweets from a given handle or mentioning this handle
- Hashtags
- Keywords/phrases
For customers that are familiar with the filtered steaming APIs such as PowerTrack, the semantics used by annotations are similar in principle to the boolean rules defined to filter a stream of Tweets. If a Tweet matches the underlying semantic conditions, it will be tagged accordingly.
Why do some Tweets have entities associated with them while others do not?
Why do some Tweets have entities associated with them while others do not?
The goal is to annotate as many Tweets as possible; however, there are several reasons why some Tweets are not annotated:
- Some Tweets aren’t semantically rich enough to be labelled and can’t be tagged with our current annotation rules
- Some Tweets aren’t topical
- The Tweet is about a very ephemeral topic that’s not in our graph
- We don’t cover the language/market
- We cover the language/market but we’re missing a topic or a specific term/account/hashtag related to a topic we already track
When there are multiple domains (for example, [3,30]), does the Entity ID remain the same?
When there are multiple domains (for example, [3,30]), does the Entity ID remain the same?
An entity can be part of multiple domains. The domain IDs will change but the entity ID remains the same. Donald Glover is a person (domain 10), an actor (domain 56) and a musician (domain 54) but his entity ID is still 875072662527029248.
Do you have an established timeline for show/movie tracking? In other words, how long is a show/movie tracked before/after release?
Do you have an established timeline for show/movie tracking? In other words, how long is a show/movie tracked before/after release?
Tracking starts a month prior to the release. For popular blockbusters, like a Marvel movie, we can start tracking them as soon as they start teasing about an upcoming release.
Do movies have a locale filter similar to the one for TV shows?
Do movies have a locale filter similar to the one for TV shows?
No, they do not.