Psychological Profiling Via Twitter

This weekend I was playing with a bunch of different linguistic analysis methods to better understand ReTweets, and while I uncovered a ton of cool new data which I’ll be sharing a little later this week, I also came upon an idea I think is pretty awesome, probably groundbreaking, and definitely worth Twittering about.

Communication is a window into a person’s mind, and the way a person talks can tell you a lot about how they think. Linguists have developed two methods for decoding the written word into a meaningful profile of a person’s cognitive processes.

One method is called the Regressive Imagery Dictionary (RID). This coding scheme is designed to measure the amount and type of three categories of content: primordial (the unconscious way you think, like in dreams), conceptual (logical and rational thought) and emotional.

Significantly more primordial content has been found in the poetry of poets who exhibit signs of psychopathology than in that of poets who exhibit no such signs (Martindale, 1975). There is also more primordial content in the fantasy stories of creative as opposed to uncreative subjects (Martindale & Dailey, 1996), in psychoanalytic sessions marked by therapeutic “work” as opposed to those marked by resistance and defensiveness (Reynes, Martindale & Dahl, 1984), and in sentences containing verbal tics as opposed to asymptomatic sentences (Martindale, 1977). A cross-cultural study of folktales from forty-five preliterate societies revealed, as predicted from the “primitive mentality” hypothesis of Lévy-Bruhl (1910) and Werner (1948), that amount of primary process content in folktales is negatively related to the degree of sociocultural complexity of the societies that produced them (Martindale, 1976). Martindale and Fischer (1977) found that psilocybin (a drug that has about the same effect as LSD) increases the amount of primordial content in written stories. Marijuana has a similar effect (West et al., 1983). Research has also revealed more primordial content in verbal productions of younger children as compared with older children (West, Martindale, & Sutton-Smith, 1985) and of schizophrenic subjects as compared with control subjects (West & Martindale, 1988).

The other method is Linguistic Inquiry and Word Count (LIWC). In development for over 15 years, the LIWC measures the cognitive and emotional properties of a person based on the words they use.

In order to provide an efficient and effective method for studying the various emotional, cognitive, and structural components present in individuals’ verbal and written speech samples, we originally developed a text analysis application called Linguistic Inquiry and Word Count, or LIWC.
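Both systems work by matching the words in a text against lists of words assigned to categories, then reporting how much of the text falls into each category. Here is a minimal sketch of that idea in Python. The three-category word list is a toy example invented for illustration, not the real RID or LIWC dictionaries, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: NOT the real RID or LIWC word lists.
TOY_CATEGORIES = {
    "primordial": {"dream", "sea", "fire", "sleep"},
    "conceptual": {"think", "know", "plan", "reason"},
    "emotional": {"love", "hate", "happy", "afraid"},
}


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_rates(text, categories=TOY_CATEGORIES):
    """Return the share of tokens (0 to 1) that fall into each category."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {name: counts[name] / total for name in categories}
```

Calling category_rates("I love to dream and think") gives each of the three toy categories a rate of one word in six. A real dictionary would be far larger and would normally be applied to stemmed words.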

I’ve combined these two systems with a Porter stemming algorithm and my own Twitter analysis infrastructure to create TweetPsych.

TweetPsych uses the LIWC and RID to build a psychological profile of a person based on the content of their Tweets. It compares the content of a user’s Tweets to a baseline reading I’ve built by analyzing an ever-expanding group of over 1.5 million random Tweets, then highlights areas where the user stands out.
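The comparison against a baseline can be sketched roughly like this. This is an illustration only, not TweetPsych's actual algorithm: the function name, the example rates, and the 1.5x cutoff for "stands out" are all assumptions made for the example.

```python
def standout_categories(user_rates, baseline_rates, min_ratio=1.5):
    """Flag categories where a user's rate differs sharply from the baseline.

    Both arguments map category names to rates (share of words, 0 to 1).
    min_ratio is an assumed cutoff, not a value taken from TweetPsych.
    """
    standouts = {}
    for name, base in baseline_rates.items():
        if base <= 0:
            continue  # no baseline signal to compare against
        ratio = user_rates.get(name, 0.0) / base
        if ratio >= min_ratio:
            standouts[name] = ("above baseline", ratio)
        elif ratio <= 1 / min_ratio:
            standouts[name] = ("below baseline", ratio)
    return standouts


# Made-up numbers, purely for illustration.
baseline = {"emotional": 0.04, "conceptual": 0.05, "primordial": 0.02}
user = {"emotional": 0.08, "conceptual": 0.05, "primordial": 0.01}
result = standout_categories(user, baseline)
```

With these made-up numbers, "emotional" is flagged as above baseline and "primordial" as below it, while "conceptual" matches the baseline and is left out.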

The service analyzes your last 1000 Tweets; as such, it works best on users who have posted more than 1000 updates. It is also better suited to analyzing accounts that are operated by a single user and use Twitter in a conversational manner, rather than simply as a content distribution platform. It takes a few moments to analyze an account the first time, but subsequent views of a profile will load faster.

I’ve tried to translate the codes that come from the two linguistic systems into more meaningful explanations, but I may have missed a few. I will continue to expand these definitions, while also refining the system and algorithm to better analyze Twitter-specific content.

I think the possibilities of a system like this are enormous, from matching like-minded users to identifying users who exhibit certain useful or desirable traits. I’d love to hear your thoughts on where this could be improved or where I could take this technology next.