Do you ever feel that the only reason we have elections is to determine if the polls were right? - Robert Orben
I don't trust polls ever since I read that 62% of women have affairs during their lunch break. I've never met a woman in my life who would give up lunch for sex. - Erma Bombeck
Crazy old men are our entire source of information for polls. - P. J. O'Rourke
There are just a few days left until the presidential election in America, and if you even glance at the news, you are sure to come across the results of one, or more likely several, Public Opinion Polls (POPs) showing which way voters are currently leaning. For example: "A poll of 1,512 voters showed that Trump has 51% support, while Harris has 49%." Republicans are rejoicing, and Democrats are mobilizing. There may be grounds for optimism, but on the eve of the 2016 elections, Clinton's advantage was 13%, and you know how it ended. - So these polls are irrelevant, and there is no point in conducting them at all? - you ask. - An excellent question, to which there are not one but three answers: - YES, they do not matter! - NO, they do! - NO, they do not need to be conducted! - Confused? Of course, but let's try to figure it out.
Public opinion polls are conducted not only during elections but on all sorts of occasions: public opinion is an important argument in a dispute and can be decisive in making a decision, especially if there are no other arguments or you don’t want to reveal your true intentions for the time being. For many years, the decision to legalize gambling in Massachusetts was discussed, and polls were conducted until they got the “right result” - Yes, we really want a CASINO here! Or take the question of legalizing marijuana in the same state: 42% supported it in 2010, 43% in 2014, and suddenly, in 2016, it was already 52% FOR, and the law was passed; here, too, you need to know how to ask the right question. But for now, as an example, we will focus on polls that predict election results - the topic of the day.
Conducting any more or less serious survey is not an easy task, and it is done not by amateurs but by specialized firms, of which there are several dozen in our state alone and hundreds in the country as a whole. The most famous of them (Gallup, Pew, YouGov, Ipsos, Harris Poll, and Rasmussen) have thousands of employees and billion-dollar budgets. Along the way, we will find out why so much money is needed and what these employees do. Let's start with what a poll is, although many probably know this already or have even participated in one.
So, someone wants to know public opinion on some issue and asks a certain number of people: "Who will you vote for?" Of 100 people polled, 88 chose the Democrats, 2 the Republicans, and 10 abstained. The Democrats' victory is beyond doubt, but it turns out that the poll was conducted among Harvard students and professors, which means that this sample of respondents is not representative on a national scale. The poll is irrelevant, that is, useless, since the mood and affiliation of this group are already known. - And how can we make it representative? - We need to select respondents randomly; every US citizen should have an equal chance of being included in the sample. Easy to say!
But let's try. It is known that every legal resident of America has a unique number, the SSN (Social Security Number), and somewhere in the depths of the Social Security Administration (SSA) there is probably a database of all the holders of these numbers, in electronic format, of course, and not on punch cards. To begin with, you need to send the SSA a request for a list of all SSNs whose holders have the right to vote, that is, with children under 18, the deceased, convicts, green card holders, and similar disenfranchised folks filtered out. You will receive about two hundred million such "voting" numbers, after which the poll organizers hold a lottery and randomly (as in a lotto draw) select the required number of respondents, say 10 thousand. The survey will be anonymous, so first and last names are not needed, but you do need some way of reaching the chosen ones: phone numbers, emails, or postal addresses. A new request flies to the SSA, and if it is granted, you can start calling the ten thousand people on the list, or sending them letters (anonymous ones, since you really don’t know their names) or emails with your questions. Good luck!
This approach would be correct and statistically reliable; however, the SSA is unlikely to respond even to a request from the Presidential Administration or the FBI - they are very busy. Therefore, in practice, surveys are conducted by calling randomly selected numbers from phone books, which are, to some extent, available on the Internet (White Pages). Of course, you can collect random email addresses or send paper letters with questionnaires, but this is exotic; such surveys do exist, but it would be a pity to have the envelopes thrown in the trash unopened.
So, the primary POP method is phone calls. Specially trained people, called pollsters, call random people and ask not just the one question, "Who will you vote for?", but many. First come demographic questions, so as not to accidentally end up in a situation like "everyone is from Harvard," and then additional ones, to kill as many birds with one stone as possible: in addition to the presidential candidate, you can also ask about state senators, marijuana legalization, casinos, electric cars, abortion, and many other things that may interest the customer of the survey.
The typical number of demographic questions ranges from 5 to 10, including:
1) Gender – 2 categories (M/F)
2) Age - 10 categories (18-25, 26-35, 36-45, etc.)
3) Race - 5 categories
4) Education - 4 categories
5) Family income - 5 categories
6) Place of residence (state) - 50 categories
7) Employment - 3 categories
8) Marital status - 4 categories
This may be followed by questions about political orientation:
9) Which party do you belong to?
10) Your ideology (liberal, conservative, moderate, progressive…)
11) Registered as a voter? (yes/no)
12) Did you vote in the last elections?
13) Will you vote in this one?
These questions should be asked as early as possible because if the respondent does not intend to vote, the survey can be stopped, and such a respondent must not be included in the statistics. And finally, the main question:
14) Who will YOU vote for (Dem/Rep/Independent)?
Suppose the pollsters did their job and submitted 1,000 completed questionnaires, a typical number for a national (all-American) survey. Let's now look at a simple example of how statistical analysis is performed. The question is: "How will men and women (M/F) vote depending on race?" We use only two categories for race, white and non-white (W/B). Here are the results of the survey:
Group | Number of respondents (n) | For Dems | For Reps | Dem% | Rep% | Margin of Error |
M-W | 550 | 170 | 380 | 31% | 69% | 8.4% |
F-W | 150 | 90 | 60 | 60% | 40% | 16% |
M-B | 200 | 150 | 50 | 75% | 25% | 13.9% |
F-B | 100 | 60 | 40 | 60% | 40% | 20% |
Total: | 1000 | 470 | 530 | 47% | 53% | 6.2% |
The statistically possible error is calculated using a formula that depends on the number of people in each group (n); the larger the group, the more accurate the result. That is, the percentage for each category is not exact but falls within a range of values. For example, white women (F-W) will vote for the Democrats at a rate of 60% with a margin of +/-16%, that is, a spread from 44% to 76%. This is a probable but not guaranteed "victory" in this weight category, but even that cannot be said about non-white women: the spread is 40% to 80%. The same uncertainty applies to the voting results as a whole: the Republicans will receive from 47% to 59%. That is, they may lose.
If such accuracy does not suit someone, then to reduce the error, it is necessary to increase the sample size; in our case, to reduce the Margin of Error (MOE) from 6.2% to 5%, the total number of respondents n must be approximately 1,600. Note that in the current elections, the candidates are running much "closer." Let's look back at the first paragraph of this note, with the latest Gallup poll result of 51:49 in favor of Trump, and calculate the error, or MOE, as statisticians say, using the formula MOE² = 4/n, that is, MOE = 2/√n. For the n = 1,512 respondents declared by Gallup, MOE = 5.1%, and the poll results, to put it mildly, are uninformative.
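The arithmetic of the rule quoted above (MOE squared equals 4/n) is easy to check. The sketch below is only an illustration, and the function names are invented for it. For comparison it also prints the conventional 95% margin for a single proportion, 1.96·√(p(1−p)/n), which at p = 0.5 is about half of what this article's rule gives; the article's figure is closer to the margin on the gap between two candidates.

```python
import math

def moe_article(n: int) -> float:
    """Margin of error by the article's rule MOE^2 = 4/n, as a fraction."""
    return 2.0 / math.sqrt(n)

def moe_textbook(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Conventional 95% margin of error for a single proportion p."""
    return z * math.sqrt(p * (1.0 - p) / n)

def sample_size_article(target_moe: float) -> int:
    """Respondents needed to reach target_moe under MOE^2 = 4/n."""
    return round(4.0 / target_moe ** 2)

# Sample sizes mentioned in the text
for n in (1000, 1512, 40000):
    print(n, f"article: {moe_article(n):.1%}", f"textbook: {moe_textbook(n):.1%}")

# Sample sizes needed for a 5% and a 1% margin under the article's rule
print(sample_size_article(0.05), sample_size_article(0.01))
```

Either way, a 51:49 result from 1,512 respondents sits inside the margin, so the conclusion about that poll does not change.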
Well, here, as usual, everything has to be paid for, but if the customers are ready, then by increasing the survey to 40,000 respondents, we can reduce the MOE to 1%, which will allow us to say who will win with a spread of 51% to 49%. If only this (gigantic n!) were the only problem with polls! Already, in our modest example, we were unable to determine the election preferences of either non-white or white women. That would be fine, since only the main result (who wins?) is essential here, but the Customer did not like that result. Indeed, any POP specialist will immediately point out the shortcoming: women make up 50.5% of America, and in your survey, they are only 25%. The pathetic excuse that women are generally busier and do not have time to answer stupid questions does not work - this is discrimination, and we must fight it. And so, all women's answers are given a coefficient of 3 (three)! - Why? - Because only 250 women participated in this poll against 750 men, and the numbers should be equal! And here's what the new table looks like.
Group | Quantity (n) | For Dems | For Reps | Dem% | Rep% | Margin of Error |
M-W | 550 | 170 | 380 | 31% | 69% | 8.4% |
F-W | 150x3=450 | 270 | 180 | 60% | 40% | 9.4% |
M-B | 200 | 150 | 50 | 75% | 25% | 13.9% |
F-B | 100x3=300 | 180 | 120 | 60% | 40% | 11.5% |
Total: | 1500 | 770 | 730 | 51.3% | 48.7% | 5.1% |
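The effect of the coefficient can be reproduced from the raw counts. This is a minimal sketch rather than what any polling firm actually runs: the `totals` helper and the weights dictionary are hypothetical names, and the only inputs are the four rows of the first table.

```python
# (group, respondents, for_dems, for_reps), copied from the first table
survey = [
    ("M-W", 550, 170, 380),
    ("F-W", 150, 90, 60),
    ("M-B", 200, 150, 50),
    ("F-B", 100, 60, 40),
]

def totals(rows, weights=None):
    """Return (n, dem, rep, dem_share), applying an optional weight per group."""
    weights = weights or {}
    n = dem = rep = 0
    for group, count, d, r in rows:
        w = weights.get(group, 1)  # groups without a weight count once
        n += w * count
        dem += w * d
        rep += w * r
    return n, dem, rep, dem / n

raw = totals(survey)
weighted = totals(survey, {"F-W": 3, "F-B": 3})  # every woman counted three times

print("raw:     ", raw[:3], f"Dem share {raw[3]:.1%}")
print("weighted:", weighted[:3], f"Dem share {weighted[3]:.1%}")
```

Nothing about the answers changed; tripling the two women's rows alone moves the totals from 470:530 to 770:730.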
Of course, one can find another flaw in the fact that the percentage of white men is “artificially” inflated. Non-white men should also be given a coefficient of about two, but then the number of women should be increased again. For now, we will leave the result as is since the Customer is satisfied with the victory of democracy and no longer refuses to pay.
Instead, let's look at the demographics of this survey, take a magnifying glass, and ask its organizers for the results for individual groups of the population, for example, "How will Asian women living in Alabama, 30-40 years old, unmarried, unemployed, PhDs with an income of less than 30,000 vote?" The questionnaire had eight questions on demographics (see above), which means we need to collect everyone who fell into the right group and calculate the percentage of "donkeys" and "elephants". We sort through 1,000 questionnaires and, a miracle: there are two Asian women in this group, and both are democratically minded. Although the MOE for n=2 is about 130%, statistics never lie! - And why a miracle? - Because we have a great many demographic groups: not a few, but 1,200,000, yes, over a million; just multiply the numbers of categories for the first eight questions: 2 x 10 x 5 x 4 x 5 x 50 x 3 x 4 = 1,200,000. Suppose we randomly select 1,000 objects from a large set of objects with these 8 characteristics, each combination being equally likely. In that case, the statistical probability that at least 2 objects will share the same set of characteristics is about 34%; that is, with a probability of about 66%, every one of our 1,000 respondents will be alone in their group.
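For the skeptical Reader, this is the classic "birthday problem," and it can be checked in a few lines. The sketch assumes that every one of the 1,200,000 profiles is equally likely, which real populations certainly are not; the function name `p_shared_profile` is invented for illustration. Under that assumption, the chance of at least one shared profile among 1,000 respondents comes out at roughly one in three.

```python
import math

# Number of categories for each of the eight demographic questions
categories = [2, 10, 5, 4, 5, 50, 3, 4]
groups = math.prod(categories)

def p_shared_profile(respondents: int, n_groups: int) -> float:
    """Chance that at least two respondents fall into the same group,
    assuming all groups are equally likely."""
    # P(all distinct) = product over k of (1 - k / n_groups); summed in logs
    log_p_unique = sum(math.log1p(-k / n_groups) for k in range(respondents))
    return 1.0 - math.exp(log_p_unique)

print(groups)
print(f"{p_shared_profile(1000, groups):.1%}")
```

Real respondents cluster in the popular profiles, so actual coincidences are more common than this uniform estimate suggests.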
- You don't believe me? And rightly so: in our poll, statistical science was "put to shame" twice: first, there were two Asian women with the same profile, as they say, and second, there is a group of 110 white married men, all retirees aged 70-80, who match in all categories except the state of residence. The probability of a group of this size is statistically one in a billion, but nevertheless, it formed, and it is known how they will vote: 100:10, guess for whom.
- Well, - you ask - do we need millions of respondents to get group statistics? - Theoretically, yes, especially if we exclude retirees from the former USSR, but in practice, POP professionals have found a more economical solution. Of course, you won’t be able to squeeze anything worthwhile out of two people, but let’s say you have a homogeneous group of 50 white men aged 30 to 40, of whom 27, or 54%, intend (according to the survey results) to vote for the “reds.” In the group of white men aged 18–25, however, only 5 completed forms were received - not enough for analysis (well, young people don’t like all kinds of questions), but from past surveys we know that this group usually votes 11 percentage points more liberally than 30–40 year olds. Subtracting these 11 points from 54%, we get 43%, which means a victory for the “blues” in this group. If there is insufficient data, we use proportions, as in the example of discrimination against women, or previously found correlations between groups. If you want it badly enough, everything is possible!
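The "economical solution" described above amounts to a single subtraction. The sketch below uses only the numbers from the example; the helper names are hypothetical, and the 11-point shift is the historical assumption taken on faith from "past surveys."

```python
def share(votes_for: int, respondents: int) -> float:
    """Observed share of a group voting for a given side."""
    return votes_for / respondents

def borrow_from_reference(reference_share: float, historical_shift: float) -> float:
    """Estimate a sparse group's share from a well-sampled reference group
    plus a shift taken from past surveys; the result is clamped to [0, 1]."""
    return min(1.0, max(0.0, reference_share + historical_shift))

# 27 of 50 white men aged 30-40 intend to vote "red"
ref = share(27, 50)

# The 18-25 group is assumed to vote 11 points more liberally
young = borrow_from_reference(ref, -0.11)

print(f"reference: {ref:.0%}, extrapolated 18-25: {young:.0%}")
```

The five forms actually received from the younger group never enter the calculation.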
Until now, we have assumed that the survey data is quite real and that the problems are due to a lack of human material (sample size) and insufficient randomization (randomness in selecting people for polling). Now let's look at how this data is obtained in practice. As we know, potential voters are phoned by specially hired pollsters, who are given specially compiled questionnaires and receive special training in conducting a conversation and extracting all the information with minimal expenditure of energy. It is clear to everyone that this is not an easy task: most people do not answer calls from unknown numbers at all, and upon hearing about a poll, especially a political one, they immediately hang up, or get rude and then hang up, since there are many more exciting ways to spend 20-30 minutes of your time, especially outside working hours. So, you are lucky if you get one consent to be surveyed out of twenty to thirty calls. Then you need to keep the fish on the hook until he/she/it answers all 15-20 questions, which works out to one completed questionnaire per hour, if you are lucky again.
The pollster's pay can be time-based, $10-15 per hour with extra pay for hazardous work, or piecework, say $20 for each questionnaire. Both options are obviously problematic for the employer, since the first leads to gigantic expenses and the second encourages falsification: the pollster is tempted to fill out the forms himself or at least add the missing data. Ideally, if you are smart, there is no need to call anyone anywhere, and the result will still be there, but you shouldn't go overboard and turn in a hundred nicely completed questionnaires a day when others have fewer than ten. You must also fill them out wisely, knowing there is approximate red/blue parity in the country: no more than 55% of your voters should vote for the Democrats unless the poll is conducted in California.
In companies that deal with POPs, there are employees called auditors who, based on the same probability theory, can quickly identify suspicious results (outliers) and whoever generated them. But what will this give? Where will you find new, honest people willing to work at a loss? Let's say the audit revealed falsification. Does that mean the poll results should be annulled and the customer's money returned? Ha-ha! Here we can also estimate the costs: to survey 1,000 prospective voters, the pollsters are paid 20 thousand, the technical staff another 10, and the statisticians 50; taking into account the company's profit margin, the customer is invoiced for 300K, otherwise your work will not be respected. Well, of course, the money comes from the election campaign funds.
There are also questions for the respondents: how honestly do you, Ladies and Gentlemen, answer the poll questions? Do you always correctly report your gender (according to 26 categories), income level, and marital status? Is it true that you will vote for our vice president in the elections, and are you not a hidden racist or misogynist, whatever that word means? Perhaps you do not believe that our poll guarantees anonymity and that it is absolutely impossible to establish your identity from your phone number.
Well, what a disgusting pessimist this author is; he doesn't believe in people, honesty, or scientific statistics - if not before this moment, then certainly now, the Reader thought. - But no, I'm an optimist, and I want to believe that pollsters (are they connected to Poltergeist?) exist at all. I drive away thoughts about all the polls being generated on the computers of the companies conducting these polls. Will you say it's unethical? Of course, but it's cheap and practical. I myself was called once, though it was a long time ago.
Nevertheless, many people believe there are POP companies whose polls accurately predict election results. Doesn't that prove that their scientific methods are correct? - Let's split this good question into two. If we talk about the results as a whole, that is, who was elected president, then the probability of guessing is the same as when tossing a coin: you either guess right or make a mistake (50:50), and of course many will guess right, especially if polls are conducted every day. The same goes for state elections: in the six key (swing) states we (pollsters) use the coin method, and in the rest no one conducts polls - there's no need. But as for demogroup statistics, there's nothing to check at all. - How did white married women aged 30-40 vote in Massachusetts? - Who can know, if the vote is secret? So, to determine how they voted, we need to conduct a poll again, an exit poll this time, and we already know about this stuff; it is even more problematic than pre-election predictions.
Do the owners of the polling companies and the mathematicians and statisticians who serve them know how these surveys are really conducted? - Of course they do, that's what they are experts for. - And the VIP Customers of the surveys? - They don't know and don't want to know (ignorance is bliss); what they want is what they get. - Then what's the point of this circus? - Now we've gotten to the heart of the matter! The point is that the sole purpose of these polls is not to find out public opinion but to shape or influence it. What should we think about published poll results such as "The Republicans are leading by 3%"? - That the People are for us (or for them), our slogans are correct, keep it up! - A 10% lead can signal to voters that the job is done and victory is in the bag, so someone will go fishing instead of voting, then return and see that everything turned out the other way around, but it's too late: you, guy, didn't vote, and "the last straw broke the camel's back."
Let me immediately clarify that this cannot happen in America and other democracies. Still, in third-world countries and authoritarian regimes, the results of POPs can serve as a kind of damper (from the German: a device for stabilizing a system during sharp fluctuations). For example, AI algorithms or other reliable sources show that the government candidate is losing to the opposition candidate by 20%, which is visible to the naked eye, that is, to the broad masses of the population. During this period, the results of multiple polls are published in a continuous stream, showing a minimal gap in one direction or the other. Little by little, this convinces the masses that the scales can tip either way. When, as a result of gross manipulation of the results, a candidate loyal to the regime wins by a small margin of 666 votes out of 100 million, the number of dissatisfied people and the danger of a riot are significantly reduced. - Well, our polls predicted a tough fight - and so it turned out.
But does the truth really not interest anyone? After all, it is crucial to know how people are predisposed and what issues concern them, and to conduct an election campaign based on this real information. In 1993, before the elections, Netanyahu was caught cheating on his wife, and it seemed that he should withdraw his candidacy. Still, the POP showed that his popularity had only increased, and there was no reason to worry. Of course, accurate information about the state of mind, especially during elections, is vital, but interestingly, polls are not the tool for obtaining it. Everything that is needed can be extracted from social networks and analyzed using special algorithms, which use Artificial Intelligence and predict how each(!) person will vote with a probability of 95%, unlike polls, whose accuracy borders on fortune-telling from coffee grounds or the entrails of sacrificial animals. These are the results that “real guys” use to make decisions.
***
Dear readers, I understand that you are tired of this pseudoscience, statistics. But be vigilant and do not forget what Comrade Lenin wrote about us back in 1913: "People have always been and always will be stupid victims of deception and self-deception in politics until they learn to seek out the interests of one or another class behind any moral, religious, political, social phrases, statements, promises."
© Dimus, October 2024
I don't even know what to say to this... Maybe I should sing a song: "I have only one amusement left: fingers in my mouth and a merry whistle. An ill fame has spread that I am a statistician and a brawler..."
Everyone has their own problems. Not everyone cares about voting in the USA.
Guys, when you comment, I can't tell who you are unless you are registered on Dimus.me. Please sign with at least your initials, or register.
Dimulya, allow me to sympathize... I'll be upset if you actually need this crap. It didn't land...
Strange amusements you have over there... You've finished all your chores and have taken up classifying devils... Clearly too much of the good life...🐷