Table 1. Headwords with the most definitions.
This fat-tailed, almost power-law distribution is not limited to the number of definitions per headword; the number of definitions contributed by each user follows a similar distribution, shown in figure 5b. The majority of users have contributed only once, while a few power users have contributed far more definitions.
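One common way to inspect such a fat-tailed distribution is the complementary cumulative distribution function (CCDF), which on log-log axes appears roughly as a straight line for power-law-like data. The following is a minimal sketch under that assumption; the function name `ccdf` and the toy counts are hypothetical and not taken from the UD data.

```python
from collections import Counter


def ccdf(counts):
    """Return sorted (k, P(X >= k)) pairs for a list of positive counts."""
    n = len(counts)
    freq = Counter(counts)
    remaining = n  # number of observations with value >= current k
    points = []
    for k in sorted(freq):
        points.append((k, remaining / n))
        remaining -= freq[k]
    return points


if __name__ == "__main__":
    # Hypothetical toy data: number of definitions per headword.
    definitions_per_headword = [1, 1, 1, 1, 1, 1, 2, 2, 3, 10]
    for k, p in ccdf(definitions_per_headword):
        print(k, p)
```

The same function can be applied to the number of definitions per user; plotting the pairs on logarithmic axes makes the tail behaviour visible.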
These types of distributions are common in self-organized human systems, particularly in similar crowd-based systems such as Wikipedia [28, 29] or the citizen science projects of Zooniverse [3], in social media activity levels such as on Twitter [30], and in content-sharing systems such as Reddit or Digg [31]. A noteworthy feature of UD is that users can express their evaluation of the different definitions for each headword by up- or down-voting them. The votes show a similar pattern: many definitions have received very few votes, both up and down, and few definitions have received many.
Figure 6b shows a scatter plot of the number of down votes versus the number of up votes for each definition. There is a striking correlation between the two, which suggests that the number of votes a definition receives reflects its visibility rather than its quality. However, there seems to be a systematic deviation from a perfect correlation: the number of up votes generally exceeds the number of down votes.
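The two quantities discussed here, the correlation between up and down votes and the ratio of up votes to down votes, could be computed along the following lines. This is only a sketch: taking the correlation on log-transformed counts, the use of `log1p` to handle zero votes, and all function names are assumptions made for illustration, not the procedure used for figure 6.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_vote_correlation(votes):
    """Correlate log(1 + up) with log(1 + down); votes is a list of (up, down)."""
    ups = [math.log1p(up) for up, _ in votes]
    downs = [math.log1p(down) for _, down in votes]
    return pearson(ups, downs)


def up_down_ratios(votes):
    """Ratio of up votes to down votes, skipping definitions with no down votes."""
    return [up / down for up, down in votes if down > 0]


if __name__ == "__main__":
    # Hypothetical (up, down) vote counts for five definitions.
    toy_votes = [(120, 40), (15, 9), (3, 1), (900, 310), (2, 5)]
    print(log_vote_correlation(toy_votes))
    print(up_down_ratios(toy_votes))
```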
The deviation is more evident in figure 6c, which shows the distribution of the ratio of up votes to down votes. There is wide variation among the definitions: some have more than 10 times as many up votes as down votes, and some the other way around.
Figure 6.
We now compare the number of unique headwords in UD to the number of unique headwords in Wiktionary, another crowd-sourced dictionary. Wiktionary follows a different policy from that of UD. The content in Wiktionary is created and maintained by administrators (selected by the community), registered users and anonymous contributors [14].
In contrast to UD, Wiktionary has many different mechanisms to ensure that the content adheres to the community guidelines. Each page is accompanied by a talk page, where users can discuss the content of the page and resolve any possible conflicts. Furthermore, Wiktionary provides guidelines for the structure and content of the entries.
Capitalization is consistent, and content or headwords that do not meet the Wiktionary guidelines are removed. For example, while both UD and Wiktionary have misspelled headwords e. Wiktionary entries thus undergo a deeper level of curation. Because of the inconsistent capitalization in UD, we experiment with three approaches to match the headwords between the two dictionaries: no preprocessing, lowercasing of all characters, and mixed.
The number of unique headwords in UD is much higher and the lexical overlap is relatively low.
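A headword comparison of this kind reduces to set operations over normalized headwords. The sketch below covers only the first two matching approaches (no preprocessing and lowercasing); the "mixed" approach is not specified in this section, so it is left out. The function name `headword_overlap` and the toy headword lists are hypothetical.

```python
def headword_overlap(ud_headwords, wiktionary_headwords, normalize=None):
    """Count unique headwords found only in UD, in both, or only in Wiktionary."""
    norm = normalize if normalize is not None else (lambda s: s)
    ud = {norm(h) for h in ud_headwords}
    wikt = {norm(h) for h in wiktionary_headwords}
    return {
        "ud_only": len(ud - wikt),
        "both": len(ud & wikt),
        "wiktionary_only": len(wikt - ud),
    }


if __name__ == "__main__":
    # Hypothetical toy headword lists, not real dictionary dumps.
    ud = ["LOL", "lol", "Emo", "yolo"]
    wikt = ["lol", "emo", "aardvark"]
    print(headword_overlap(ud, wikt))                       # no preprocessing
    print(headword_overlap(ud, wikt, normalize=str.lower))  # lowercasing
```

Note that lowercasing merges case variants within UD as well, so it changes the number of unique UD headwords and not just the size of the overlap.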
Sometimes there is a match on the lexical level i.
Table 2. Headword comparison between UD and Wiktionary. The table reports the unique number of headwords in each category. No threshold was applied.
Because there is little curation of UD content, there are many headwords that would not typically be included in a dictionary. Examples include nicknames and proper names e. Emptybottleaphobia 9. Based on manual inspection, it seems that these are often headwords with only one entry. We therefore also perform a matching that considers only headwords from UD with at least two entries (table 3).
In this way, we use the number of entries as a crude proxy for whether the headword is of interest to a wider group of people. Note that this filtering is not applied to Wiktionary, because each headword has only one page and headwords that do not match Wiktionary guidelines are already removed by the community. For example, an important criterion for inclusion in Wiktionary is that the term is reasonably widely attested, e.
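The filtering step described above amounts to counting entries per headword and keeping headwords that reach a threshold. A minimal sketch, assuming the UD data is available as one headword string per entry; the function name and the toy list are hypothetical.

```python
from collections import Counter


def headwords_with_min_entries(entry_headwords, min_entries=2):
    """Return the set of headwords that have at least `min_entries` entries.

    `entry_headwords` holds one headword per entry, so a headword with
    three definitions appears three times.
    """
    counts = Counter(entry_headwords)
    return {headword for headword, n in counts.items() if n >= min_entries}


if __name__ == "__main__":
    # Hypothetical toy data: one item per UD entry.
    entries = ["emo", "emo", "emo", "love", "love", "xqzzt"]
    print(sorted(headwords_with_min_entries(entries)))
```

The resulting set can then be matched against the Wiktionary headwords in the same way as in the unfiltered comparison.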
In this comparison, the number of unique headwords in Wiktionary is higher than that of UD. From a manual inspection we see that many Wiktionary-specific headwords include domain-specific and encyclopaedic words e. We also find that many of the popular UD headwords i.
Table 3. Only UD headwords with at least two entries are included.
In this section, we present our analyses of the different types of content as well as the offensiveness of the content in UD. We now analyse several aspects of the content in UD that we expect to differ from content typically found in traditional dictionaries as well as in Wiktionary. For example, manual inspection suggested that UD has a higher coverage of informal and infrequent words and of proper nouns e. Many of the headwords are not covered in knowledge bases or encyclopaedias.
To characterize the data, we therefore annotated a sample of the data using crowdsourcing (see Data and methods). In order to limit the dominance of headwords with only one entry (which represent the majority of headwords in UD), the sample was created by taking headwords from each of the 11 frequency bins (see table 10 for details on the way the bins were created and sampled from).
Note that the last two bins are very small. For each headword, we include up to three entries (top ranked, second ranked and random) based on up and down votes. Annotations were collected on the entry level, and crowdworkers were shown the headword, definition and example. Dictionaries are usually selective with including proper nouns e. In contrast, in UD many entries describe proper nouns. We therefore asked crowdworkers whether the entry described a proper noun (yes or no). Figure 7 shows the fraction of proper nouns by frequency bin in our stratified sample.
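The per-headword entry selection described above (top ranked, second ranked and random) could be sketched as follows. The text does not state how entries are ranked from their votes or which pool the random entry is drawn from, so ranking by up votes minus down votes and drawing the random entry from the remaining entries are both assumptions; `select_entries` and the dictionary keys are hypothetical names.

```python
import random


def select_entries(entries, score=lambda e: e["up"] - e["down"], rng=None):
    """Pick up to three entries for one headword: top, second, and one random.

    Assumption: entries are ranked by `score` (up minus down votes by default)
    and the random entry is drawn from those not already selected.
    """
    rng = rng if rng is not None else random.Random(0)  # fixed seed for repeatability
    ranked = sorted(entries, key=score, reverse=True)
    chosen = ranked[:2]
    rest = ranked[2:]
    if rest:
        chosen.append(rng.choice(rest))
    return chosen


if __name__ == "__main__":
    # Hypothetical entries for a single headword.
    toy_entries = [
        {"id": "a", "up": 50, "down": 5},
        {"id": "b", "up": 20, "down": 30},
        {"id": "c", "up": 7, "down": 1},
        {"id": "d", "up": 3, "down": 0},
    ]
    print([e["id"] for e in select_entries(toy_entries)])
```

Headwords with one or two entries simply yield fewer than three selected entries.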
Figure 7. Proper nouns.
Most dictionaries strive towards objective content. We therefore asked the crowdworkers whether the definition describes the meaning of the word, expresses a personal opinion, or both. Figures 8 and 9 show the fraction of entries labeled as opinion, meaning or both, separated according to whether they were annotated as describing proper nouns. In higher frequency bins, the fraction of entries marked as opinion is higher. We also find that the number of entries marked as opinion is higher for proper nouns.
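Fractions of this kind can be obtained by grouping the annotated entries by frequency bin and normalizing the label counts within each bin. The sketch below assumes one aggregated label per entry, given as `(bin, label)` pairs; that input format and the function name are assumptions for illustration.

```python
from collections import Counter, defaultdict


def label_fractions_by_bin(annotations):
    """Map each frequency bin to the fraction of entries carrying each label.

    `annotations` is an iterable of (bin_id, label) pairs, one per entry.
    """
    counts = defaultdict(Counter)
    for bin_id, label in annotations:
        counts[bin_id][label] += 1
    return {
        bin_id: {label: n / sum(counter.values()) for label, n in counter.items()}
        for bin_id, counter in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical annotations: (frequency bin, label).
    toy = [(0, "meaning"), (0, "meaning"), (0, "opinion"), (1, "both"), (1, "opinion")]
    for bin_id, fractions in sorted(label_fractions_by_bin(toy).items()):
        print(bin_id, fractions)
```

Running it separately on proper-noun and non-proper-noun entries gives the split used for the two figures.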
While most entries are marked as describing a meaning, the considerable presence of opinions suggests that the type of content in UD is different from that in traditional dictionaries [13, pp.
Figure 8. Meaning versus opinions (proper nouns were excluded).
Figure 9. Meaning versus opinions (proper noun entries only).
UD enables quick recording of new words and new meanings, many of which may not yet have seen widespread usage.
Furthermore, as discussed in the previous section, some entries are about made-up words or words that only concern a small community. In contrast, many dictionaries require that included headwords should be attested i. These observations suggest that many definitions in UD may not be familiar to people. To quantify this, we asked crowdworkers whether they were familiar with the meaning of the word. The majority of the entries in UD were not familiar to the crowdworkers.
Figure 10 shows that in higher frequency bins, more definitions are marked as being familiar, suggesting that the number of definitions per headword is indeed related to the general usage of a headword.
Figure 10. Familiarity (proper nouns and opinion entries were excluded).
The focus of UD on slang words [33] means that many of the words are usually not appropriate in formal conversations, such as a job interview. To quantify this, we asked crowdworkers whether the word in the described meaning can be used in a formal conversation. As figure 11 shows, most of the words in their described meanings were indeed not appropriate for use in formal settings.
Figure 11. Formality (proper nouns and opinion entries were excluded).
Furthermore, the existence of such content on platforms could signal to other users that such content is acceptable and impact the social norms of the platform [36]. In response, various online platforms have integrated different mechanisms to detect, report and remove inappropriate content. In contrast, regulation is minimal in UD, and one of its characteristics is its often offensive content.
UD not only contains offensive entries describing the meaning of offensive words, but there are also offensive entries for non-offensive words e. We note, however, that UD also contains non-offensive definitions for offensive words e. To investigate how offensive content is distributed in UD, we ran a crowdsourcing task on CrowdFlower (see Data and methods for more details).