280k posts and counting. What do you want to know about Yas Forums?

I'm collecting data from pol. I can do lots of cool stuff with this data, but what do you guys want to know? Here are some basics that I've gotten so far.

Flag Distribution:
plotly.com/~aasonvpsvsjladvn/1/#/

Meme Flag Distribution:
plotly.com/~aasonvpsvsjladvn/3/#/

Information regarding the sample I used is in a comment on the first link. It's not perfect, but it should be representative enough.

I plan to eventually do analyses to find things like trends between flags (particularly the meme flags) and certain words/phrases.
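
A minimal sketch of what a flag-versus-phrase comparison could look like, assuming each saved post is available as a `(flag, text)` pair. The function name and the pair layout are hypothetical, not taken from OP's scraper.

```python
from collections import Counter


def phrase_rate_by_flag(posts, phrase):
    """Return, per flag, the share of posts whose text contains `phrase`.

    `posts` is an iterable of (flag, text) pairs (assumed layout).
    Matching is a plain case-insensitive substring test.
    """
    needle = phrase.lower()
    totals, hits = Counter(), Counter()
    for flag, text in posts:
        totals[flag] += 1
        if needle in text.lower():
            hits[flag] += 1
    # Share of matching posts per flag, 0.0 when a flag never matched.
    return {flag: hits[flag] / totals[flag] for flag in totals}
```

Comparing shares rather than raw counts keeps a flag with many posts from looking like a heavy user of a phrase just because it posts more.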

Attached: pol-stats.png (861x693, 123.14K)

If you're doing it from archives, you're as dumb as the think tanks who have done it for years.

This is my first go at it, so yes, the archives are most accessible. Scraping those ensures that I don't have to re-scrape the same ones for updates. I can always scale it up with minimal work.

You're a retard. You think you're special making this data set? You're not. Kill yourself

Why are you guys being so hard on him? It sounds like a fun project.

I just wanted to help, you big meanie!

Attached: a.jpg (400x400, 23.93K)

Mutts law

what kind of variables you working with?

Pure autism. Nicely done compiling fren

Attached: 151952975925.jpg (920x687, 119.17K)

wtf calm down schizos

r a r e
a
r
e

How did you get only 280k posts? /cvg/ threads alone should have more than that already.

You can chunk out 30% as USA [meme flag]

I've just been taking stuff from the archives. For now, I've been saving Thread#, UserID, TimePosted, Flag, URL of image, and post text.

It's super basic, but I just started Saturday night.
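
One possible way to store those six fields, sketched with SQLite from the Python standard library. The table name, column names, and helper functions are assumptions for illustration; the post above only names the fields, not how they are stored.

```python
import sqlite3

# Column names are hypothetical; they mirror the fields listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    thread_id   INTEGER NOT NULL,
    user_id     TEXT    NOT NULL,
    time_posted TEXT    NOT NULL,  -- ISO 8601 string
    flag        TEXT    NOT NULL,
    image_url   TEXT,              -- NULL when the post has no image
    post_text   TEXT    NOT NULL
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_post(conn, thread_id, user_id, time_posted, flag, image_url, post_text):
    """Insert one scraped post."""
    conn.execute(
        "INSERT INTO posts VALUES (?, ?, ?, ?, ?, ?)",
        (thread_id, user_id, time_posted, flag, image_url, post_text),
    )
    conn.commit()


def flag_counts(conn):
    """Return {flag: number_of_posts}, the basis for a flag distribution."""
    rows = conn.execute("SELECT flag, COUNT(*) FROM posts GROUP BY flag")
    return dict(rows.fetchall())
```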

I'm proud of you, bucko.

that sounds fun. how you gonna dive into the post text? seems like the most interesting one here.

I have an idea. Find this year's top 100 non-get posts with the most replies and post them.

Interesting shit

>Canada has the highest per capita use
Wtf is wrong with us

Keep it simple and post meme flags vs national id in one pic

bbc posts aren't going to write themselves

Tell us how many times we say nigger, nigger.

Like I said in OP, it'd be nice to see if certain meme flags say certain phrases more often. We might be able to spot some trends to better identify the shills.

I was originally hoping to see trends for certain words over time, but I underestimated just how much data is on Yas Forums. I'll have some trouble scraping a decent sample size regularly, but it can be done.

I don't expect to learn anything practical with this. It's just something I'll do while it interests me.

also non-op posts of course

How many times was nothingburger mentioned in the past 3 months on a weekly basis? I think this would be very groundbreaking data.

pic strangely related

Attached: Historyofnothingburgers.png (676x776, 178.08K)

I may eventually get some population data so we can see all the numbers per capita.
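
A rough sketch of the per-capita step, assuming post counts and populations are both dictionaries keyed by flag name. The function name and both input layouts are hypothetical, and any numbers fed in would have to come from a real population source.

```python
def per_capita(post_counts, populations, per=1_000_000):
    """Return posts per `per` inhabitants for each flag, highest first.

    Flags without a population entry (meme flags, for example) are skipped.
    """
    rates = {}
    for flag, count in post_counts.items():
        population = populations.get(flag)
        if population:  # skips missing and zero populations
            rates[flag] = count * per / population
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))
```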

Holy shit dude you have serious problems

Cool stuff OP. Don't let faggots get you down. I remember a long time ago moot saying that most of the worst shitposts were from Australia and he considered banning all of them.

I just started my data collection. I was surprised at how much people post on here. It's unlikely I can store all that data, but I'll keep adding to the database.

Of 280k posts I've saved from the past 2 days, only 334 mention the exact phrase "nothing burger". I'll need to use better logic to find all the variations.
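
One way to catch the spelling variations is a regular expression, sketched below together with a weekly tally. The pattern only covers spacing, hyphen, underscore, plural, and case variants; which variants matter is an assumption, and the `(datetime, text)` input layout is hypothetical.

```python
import re
from collections import Counter

# Matches "nothingburger", "nothing burger", "nothing-burgers", and so on.
PATTERN = re.compile(r"\bnothing[\s\-_]*burgers?\b", re.IGNORECASE)


def mentions(text):
    """Count how many times any variant appears in one post."""
    return len(PATTERN.findall(text))


def weekly_mentions(posts):
    """Tally mentions per ISO (year, week).

    `posts` is an iterable of (datetime, text) pairs (assumed layout).
    """
    counts = Counter()
    for when, text in posts:
        n = mentions(text)
        if n:
            iso = when.isocalendar()
            counts[(iso[0], iso[1])] += n
    return counts
```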

ew almost 8% angloids

That fade where your picture cuts off at the bottom is aesthetic as fuck.

What's the rarest memeflag, and also what's the most used word?

No, he's just a retard.

I haven't parsed out the "reply" portion of each post. That's going to take some work.

Of my tiny dataset, the OP with the most replies is this one.
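
Parsing the reply portion might look something like the sketch below. It assumes the saved post text still contains quote links in the usual `>>12345` form, which may not hold for every archive; the function names are hypothetical.

```python
import re
from collections import Counter

# A quote link is ">>" followed by a post number.
QUOTE_LINK = re.compile(r">>(\d+)")


def split_replies(text):
    """Return (list of quoted post numbers, remaining post text)."""
    targets = [int(n) for n in QUOTE_LINK.findall(text)]
    body = QUOTE_LINK.sub("", text).strip()
    return targets, body


def reply_counts(posts):
    """Count how many distinct posts reply to each post number.

    `posts` is an iterable of (post_id, text) pairs (assumed layout).
    A post quoting the same target twice is counted once.
    """
    counts = Counter()
    for _post_id, text in posts:
        for target in set(split_replies(text)[0]):
            counts[target] += 1
    return counts
```

`reply_counts(posts).most_common(1)` then gives the most-replied-to post in the sample.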

Can you pull out the most commonly used phrases? Looking for the most popular copypastas. I don't know how you'd program this but check the correlation of each post with every other post, and give me the top 10 most popular 3+ word phrases on Yas Forums.
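
Comparing every post with every other post gets expensive fast. A cheaper approximation, sketched below, counts word n-grams and keeps the most frequent ones. This swaps the pairwise correlation idea for plain n-gram counting, and the tokenizer and function names are assumptions.

```python
import re
from collections import Counter

# Crude tokenizer: lowercase letters, digits, and apostrophes.
WORD = re.compile(r"[a-z0-9']+")


def ngrams(text, n=3):
    """Return every run of n consecutive words in a post."""
    words = WORD.findall(text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def top_phrases(texts, n=3, k=10):
    """Return the k most common n-word phrases.

    Each phrase is counted once per post, so one post repeating a phrase
    many times cannot dominate the ranking.
    """
    counts = Counter()
    for text in texts:
        counts.update(set(ngrams(text, n)))
    return counts.most_common(k)
```

Long copypastas would show up as many overlapping 3-word phrases with the same count, so raising `n` is one way to surface them directly.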

THIS IS AN EXCELLENT IDEA

what is the distribution of posts ending in consecutive integers by countries

I am confident that Australia has a massive share of them relative to their overall posting share

Attached: 1584720704598.jpg (300x251, 41.41K)
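
A sketch of how that could be measured, reading "ending in consecutive integers" as post numbers that end in a run of identical digits. That reading is an assumption, as is having the post number available at all, since it is not among the fields listed earlier in the thread.

```python
from collections import Counter


def trailing_repeat(post_no):
    """Return the length of the run of identical digits ending the number."""
    digits = str(post_no)
    run = 1
    while run < len(digits) and digits[-run - 1] == digits[-1]:
        run += 1
    return run


def repeat_share_by_flag(posts, min_run=2):
    """Return, per flag, the share of posts ending in >= min_run equal digits.

    `posts` is an iterable of (post_no, flag) pairs (assumed layout).
    """
    totals, hits = Counter(), Counter()
    for post_no, flag in posts:
        totals[flag] += 1
        if trailing_repeat(post_no) >= min_run:
            hits[flag] += 1
    return {flag: hits[flag] / totals[flag] for flag in totals}
```

Comparing each flag's share here against its share of all posts would show whether any country is over-represented.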

I haven't broken it up by word yet, but the rarest memeflag is Jihadi cus who the fuck wants to be one of those lol

All meme flags:
plotly.com/~aasonvpsvsjladvn/3/#/

Now everyone post a picture of your face so I can include that in the data.

To those of us archiving and cataloguing the recent uptick in shill posts and COINTEL raids, do you have anything in your data set that might prove useful in tracking it? I know it's a difficult metric to tie down but copypastas, flag rarity to content type (looking at you chang), and image name could help. Any info in regards to that would be cool, but I feel you probably aren't one of us so meh.

Attached: d9cae6cfc4677fb3553d6dbffc4e1622d12be8edf80ad70d97310baf81eabf4c.png (1488x1470, 76.73K)

Does post text include if someone links to another post? This should be joined. You want to know whether a comment is a response or a standalone comment because it changes the context.

I’d also recommend collecting whether something is green text or not. If it's green text plus a link, then it’s often a quote or summary. Compare the green text against the linked text. This may help later in filtering duplicate text occurrences, or in not attributing text to users when it’s really just a quote that is then shit on. See where I’m going? You need to normalize the data before diving into the analysis so that it can be viewed in the proper context.

After that, I’d like to know the origin of certain topics: where they come from and when. It should then be possible to determine whether a certain topic evolves naturally or is pushed in a concerted effort. Then we can ask why.
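
The green text part of that normalization could be sketched as below, assuming green text is any line that starts with a single `>` and that quote links start with `>>`. The function name and return layout are hypothetical.

```python
def split_greentext(text):
    """Split a post into (green text lines, the poster's own lines).

    Lines starting with ">>" are treated as quote links rather than green
    text and are left in the second list for separate handling.
    """
    quoted, own = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">") and not stripped.startswith(">>"):
            quoted.append(stripped[1:].strip())
        elif stripped:
            own.append(stripped)
    return quoted, own
```

Keeping the two lists apart means phrase counts can be run on the poster's own words only, so quoted text is not attributed to the person quoting it.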

DELET THIS

There's missing information if you do it with the archive, so it's not reliable. Also, there are already several agencies doing the exact same thing, and they come to stupid conclusions all the time.

I haven't done much work parsing out the actual posts yet. Here are the most common posts (exact matches) with a count of at least 20. Hopefully the link works and people can't fuck with it; I'm not too familiar with Google Docs.

docs.google.com/spreadsheets/d/1I5kJ9C4iX_BTbVy8jufcFE4z5J4Wl5uhGaU-aBk-RZU/edit#gid=0

Keep in mind, I have a very small sample size.
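
For reference, an exact-match tally like that can be done in a few lines. The sketch below is not the method used for the spreadsheet; it additionally folds case and collapses whitespace before comparing, which is an assumption and slightly looser than a strict exact match.

```python
from collections import Counter


def common_exact_posts(texts, min_count=20):
    """Return (post text, count) pairs seen at least min_count times.

    Posts are compared after lowercasing and collapsing whitespace;
    empty posts are ignored. Results are ordered most common first.
    """
    counts = Counter(" ".join(t.split()).lower() for t in texts if t.strip())
    return [(text, n) for text, n in counts.most_common() if n >= min_count]
```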

BASED

no

This, at the end of the day with the way these forums function its difficult if not downright impossible to tie anything meaningful to posts and use it in any meaningful ways. Our big boy human brains are able to see things like typing style, stereotypes, inside jokes, etc. to be able to identify posters, shills, or to understand what prior knowledge/group the person has/belongs to. Artificial Intelligence is a far cry from being able to do that with any meaning or consistency but its grand to watch special interest groups spend millions just to say "They shitpost and are mean >:(". One special interest (israeli) organization used metadata to track images as they propagate around the web to help them shill but then Anons found a way to track it themselves and fucked it up, which was grand.

Attached: pewds.gif (690x472, 1.01M)

you're destroying pol

I want to know how many of those posts were Chinese bugmen on VPN.

Lmao!!!!
>despite being only 5.23% of total posts . . .

This is really a cool project. You should find a prepackaged chat bot and train it with only Canadian poster data and make a faggotbot.

Attached: 1584962483336.jpg (1200x960, 61.59K)

Roughly half the threads posted today have been chink slide threads. Who cares? This place is a shithole.

Do i have to call people a nigger to be welcomed here lol? i've been using Yas Forums for a few months now, and it's opened my eyes to things I never would have discovered otherwise since people are too afraid to talk about real issues in public.

Looking for shills would be great. I'm not sure if the archives will be good enough for that, and I don't know how machine learning works. I'd have to manually find trends which should be doable.

Attached: a.jpg (750x832, 76.24K)

Want me to tell you why you're a virgin?

You.

Shill detection is not data driven. It's agenda driven. Although i guess keywords would work on some level. But the keyword schizos are the worst anons because language adoption is not a real indication of anything.

>but what do you guys want to know?
Accuracy of mutts law. How quickly it rings true in every thread

Easy to VPN from chyna

Nah, not that you aren't a raving wignat. Just that most far right wingers aren't the "Making a large data set as a hobby" type. The point of the ideology is to appeal to the real world and getting away from dopamine drips such as internet addiction. Makes it difficult to teach proper OpSec or to combat the left on their own territory. It's getting better as the Zoomers flood in but it's still a tad uncommon. Sorry m8. If you could get some meaningful data on the topic it'd be helpful to us lamp-lighters trying to help newfags understand the difference between us and the shills/cointelpro.

Attached: 1583135644054.jpg (1400x1048, 268.36K)

>1 post by this id
Soulless chink bugman shill with small penis and smooth brain who thinks like a cat. His parents are ashamed.

>Why are you guys being so hard on him?
Why whiteknight for OP? You know it's just a journo "exposing hate" or shareblue, or ADL or JIDF or QCHQ or Reddit or discord trannies. it can't be good. FUCK OP

Wtf. Just get a notepad and for every thread write 1 for true and 0 for false. It's pretty fucking observable.

I'd expect more Kekistani, really. I guess fewer people come here for the lulz these days.

You're a fucking joke.

good. stay on your toes, fren.

Attached: a.gif (256x300, 13.92K)

why are you gay?

>

Attached: 1580046882633.gif (345x237, 1.86M)

KEKistan can go back to r/thedonald

I think it’d be interesting to track the evolution of a meme. I guess you have to do text, so maybe a "how many rolls of TP" kind of meme.

How so? Nigger.

Attached: maxresdefault.jpg (1280x720, 44.51K)

We have the highest ratio of foreign faggots spoofing an id to here

Flag distribution for the idiots that say "made for BBC" or anything analogous.

Why isn't there a [meme flag] behind Canada?

Attached: 1584978793287.jpg (700x465, 71.78K)

See for yourself.

docs.google.com/spreadsheets/d/1OIyYqfHaaNQ2OrV25x_Oj2n2zEPDRSs5ygvJd1A23Jo/edit#gid=0

You literally think you matter. It's fucking hilarious.