How do you work with the output from this search client? · xdevplatform/search-tweets-python · Discussion #116 · GitHub
We are discussing how to handle client output... With v2 JSON, the search payloads are redesigned around a core "data" array containing the Tweets that matched the query, plus an independent "includes" object that contains User objects, referenced Tweets (e.g. original Tweets for Quote Tweets and Retweets), and other supporting Twitter objects (e.g. media, polls, places).
Before v2, the search payloads provided "atomic" Tweets, with all supporting object attributes inside the Tweet object.
We are working on a new "atomic" option that has the v2 client output atomic Tweet objects, doing the work of looking up the associated "includes" objects and injecting them into the "data" Tweets... I think that would be cool. Two of us (thanks Igor!) are experimenting with this and should have an update here soon (?).
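To make the idea concrete, here is a minimal sketch of what such an "atomic" expansion might look like. It is not the client's implementation: the function name `make_atomic` and the output keys `author`, `tweet`, and `media` are hypothetical choices for illustration. It only assumes the v2 payload fields `author_id`, `referenced_tweets`, and `attachments.media_keys`, and the `users`, `tweets`, and `media` lists under "includes".

```python
import copy


def make_atomic(response):
    """Return copies of the "data" Tweets with "includes" objects injected.

    Hypothetical helper: the injected keys ("author", "tweet", "media")
    are illustrative, not an official output format.
    """
    includes = response.get("includes", {})
    # Index the supporting objects by the IDs that Tweets use to refer to them.
    users = {u["id"]: u for u in includes.get("users", [])}
    tweets = {t["id"]: t for t in includes.get("tweets", [])}
    media = {m["media_key"]: m for m in includes.get("media", [])}

    atomic = []
    for tweet in response.get("data", []):
        tweet = copy.deepcopy(tweet)  # leave the original response untouched

        author = users.get(tweet.get("author_id"))
        if author is not None:
            tweet["author"] = author

        # Attach the full Tweet for each Retweet / Quote Tweet / reply reference.
        for ref in tweet.get("referenced_tweets", []):
            full = tweets.get(ref["id"])
            if full is not None:
                ref["tweet"] = full

        keys = tweet.get("attachments", {}).get("media_keys", [])
        if keys:
            tweet["attachments"]["media"] = [media[k] for k in keys if k in media]

        atomic.append(tweet)
    return atomic


# Made-up payload in the v2 shape, for illustration only.
response = {
    "data": [
        {
            "id": "2",
            "text": "RT example",
            "author_id": "10",
            "referenced_tweets": [{"type": "retweeted", "id": "1"}],
        }
    ],
    "includes": {
        "users": [{"id": "10", "username": "example_user"}],
        "tweets": [{"id": "1", "text": "original", "author_id": "11"}],
    },
}
atomic_tweets = make_atomic(response)
```

Referenced objects that are missing from "includes" are simply skipped, so a partial payload still yields usable Tweets.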
It would also be good to have the client just output the response as received from the search endpoint. Seems simple enough.
So, how do you work with the client output, and what new tricks would make your integrations easier?
I hope to plug this thing in as a "database loader", so maybe that will result in more built-in output options.
From another commit message that explains the reasoning:
This commit extracts the flatten option from sample and moves it into a
separate command that twarc users could run on their data. This is to
encourage people to collect the original data wherever possible.
So where previously you would have done this:
twarc2 sample --flatten > sample.jsonl
You will now do this:
twarc2 sample > sample.jsonl
twarc2 flatten sample.jsonl > sample-flattened.jsonl
Or, if you *really* don't want the original JSON, you can create a
pipeline:
twarc2 sample | twarc2 flatten > sample.jsonl
So in #112, the atomic format is identical to "flattened" in twarc parlance. In twarc, the idea is to store the original responses (the r variety in #112) and then optionally post-process them into the a variety.
After the tweets are retrieved, they usually get stored as-is in JSONL (one JSON object per line), gzipped. They are loaded into R or Python for analysis later, usually as CSVs, flattened into a dataframe with the pandas JSON reader or something like that.
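As a rough sketch of that workflow, the snippet below reads a gzipped JSONL file and writes a flat CSV. It uses only the standard library rather than pandas, so it is a stand-in for the dataframe step described above, not a reproduction of it. The function names and the dotted-column convention are assumptions made for illustration.

```python
import csv
import gzip
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys; lists are kept as JSON strings."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            row[name] = json.dumps(value)
        else:
            row[name] = value
    return row


def read_jsonl_gz(path):
    """Yield one parsed JSON object per non-empty line of a gzipped file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def jsonl_gz_to_csv(src, dst):
    """Write every object in src as one CSV row; return the column names."""
    rows = [flatten(obj) for obj in read_jsonl_gz(src)]
    # Objects may have different fields, so use the union as the header.
    columns = sorted({name for row in rows for name in row})
    with open(dst, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)  # missing fields are written as empty cells
    return columns
```

For example, `jsonl_gz_to_csv("sample.jsonl.gz", "sample.csv")` would convert a stored collection (both file names are hypothetical). The original gzipped JSONL stays untouched, in line with the advice to keep the original data and post-process copies.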