It’s Massive and It’s Patented
This is the first in a series of technology posts where we’ll give folks a peek behind the curtain at how we build and deliver our insights. We’ll start by sharing our patent, which provides a good overview of our system and core logic. The patent claims sit at the core of GraphMassive, enabling us to understand consumers in ways that until now have been impossible.
I began creating GraphMassive with the goal of understanding the people behind user-generated content. This was during the rise of social networks, which made the connections between people available at scale for the first time. I approached the problem by putting a person and their connections at the core, then attaching that person’s content and metadata, including network impacts (e.g., likes and retweets). The result was a rich information source for analysis. Over time, I was able to convince Lanny Ripple and Nathan Halko to join me on this adventure. They’ve contributed many great ideas and concepts, allowing the idea to become reality. All three of us share in the patent on systems and methods for identifying and analyzing internet users for SpotRight.
Figure 3 shows one instantiation of a system that collects data from the internet, scores the derived information, associates it with a social profile, and makes it searchable for querying and reporting. At SpotRight, we didn’t just come up with an idea, write a patent, and call it a day. We’ve implemented it using a scalable lambda architecture, with all the cool big data tools in play: Cassandra, Hadoop, Spark, Kafka, ZooKeeper, Solr, and more. In future technology posts, we’ll share some of the things we do with these tools.
The patented claims at a high level are:
A server system for collecting, storing, and analyzing social profiles, including features such as:
- a graph builder module and graph data store for handling edges between users representing relationships and their strength;
- merging user nodes based on graph data;
- a compute reach module that computes reach values based on quantitative analysis of the graph data store;
- a compute impact module that computes the number and quality of actions taken in response to a user’s publication of content on the Internet;
- a scoring module that calculates scores from the reach and impact values determined by the analysis module and assigns those scores to social profiles in the searchable social profile data store.
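To make the reach and impact modules a little more concrete, here is a minimal Python sketch. It is not our production code and not the patented implementation: the `SocialGraph` class, the `ACTION_QUALITY` weights, and the linear blend in `compute_score` are all assumptions chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical per-action quality weights; the claims do not specify any values.
ACTION_QUALITY = {"like": 1.0, "retweet": 2.0}


class SocialGraph:
    """Directed graph whose edge weights stand in for relationship strength."""

    def __init__(self):
        # inbound[user][other] = strength of the relationship from other to user
        self.inbound = defaultdict(dict)

    def add_edge(self, source, target, strength):
        self.inbound[target][source] = strength


def compute_reach(graph, user):
    """Sum inbound edge strengths, so both count and quality contribute."""
    return sum(graph.inbound.get(user, {}).values())


def compute_impact(actions):
    """Quality-weighted count of actions taken in response to a user's content."""
    return sum(ACTION_QUALITY.get(action, 0.0) for action in actions)


def compute_score(reach, impact, reach_weight=0.5):
    """Blend reach and impact; the linear blend is an assumption of this sketch."""
    return reach_weight * reach + (1.0 - reach_weight) * impact


graph = SocialGraph()
graph.add_edge("bob", "alice", 1.0)    # explicit link, e.g. a follow
graph.add_edge("carol", "alice", 0.5)  # implied link, e.g. a reply

reach = compute_reach(graph, "alice")
impact = compute_impact(["like", "retweet", "like"])
print(reach, impact, compute_score(reach, impact))  # 1.5 4.0 2.75
```

Summing edge strengths is only one way to fold the number and quality of relationships into a single reach value; a real system would likely use richer graph analysis.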
A method for generating reports enhancing an understanding of Internet users based on their generated content and actions taken by others in response to the generated content, the method comprising:
- collecting content or other raw data via a crawler module that accesses web pages from a network interface of a server system;
- associating the content or other raw data with a social profile residing in or being added to a memory;
- calculating scores of reach and impact for the social profile based on terms parsed from the content or other raw data, wherein the reach scores are based at least in part on a number and quality of relationships between a user associated with the social profile and other users, and where the relationships are identified based on (1) explicit links between users and (2) user actions that imply relationships;
- receiving a query, via the network interface, for users fitting one or more contexts;
- identifying social profiles fitting the one or more contexts;
- returning a report in response to the query, transmitted through the network interface, containing the social profiles fitting the one or more contexts, wherein the order of the social profiles is based on the scores.
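The query and report steps of this method can be sketched in a few lines of Python. Everything here is hypothetical: the `SocialProfile` layout, the `query_profiles` function, and the choice to require a match on every requested context are assumptions for illustration, not details taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class SocialProfile:
    user: str
    # Hypothetical layout: context term -> score already computed for this profile
    context_scores: dict = field(default_factory=dict)


def query_profiles(profiles, contexts):
    """Return profiles fitting every requested context, highest total score first."""
    matches = [p for p in profiles if all(c in p.context_scores for c in contexts)]
    return sorted(
        matches,
        key=lambda p: sum(p.context_scores[c] for c in contexts),
        reverse=True,
    )


profiles = [
    SocialProfile("alice", {"cycling": 2.75, "coffee": 1.0}),
    SocialProfile("bob", {"cycling": 4.0}),
    SocialProfile("carol", {"coffee": 3.0}),
]

report = [p.user for p in query_profiles(profiles, ["cycling"])]
print(report)  # ['bob', 'alice']
```

In a deployed system this lookup would more plausibly be served by a search index than by an in-memory list; the sketch only shows the score-based ordering.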
A method comprising:
- seeding a crawler module;
- crawling the Internet to find new users that can be used to create new social profiles, the first crawling based on the seeding;
- crawling the Internet to further populate existing social profiles, the second crawling based on the seeding including at least consumer data from customer transactions, wherein the existing social profiles are based on existing customers of a marketing client;
- parsing content or other raw data generated by the first and second crawling into terms;
- associating the terms and the content or other raw data with the existing social profiles or the new social profiles;
- computing scores for each term based on reach and impact values for each term, where the reach values represent a number of relationships between a user and other users as well as a quality of those relationships;
- associating the scores for each term with the existing social profiles and the new social profiles;
- returning a report to the marketing client via the network interface in response to a query received via the network interface, the report including one or more social profiles matching contexts found in the query.
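As a rough illustration of the parsing and per-term scoring steps, the following Python sketch splits content into terms and accumulates a score for each one. The tokenizer, the `score_terms` function, and the choice to multiply reach by impact are assumptions made for this example; the claims only say that term scores are based on reach and impact values.

```python
import re
from collections import defaultdict


def parse_terms(text):
    """Lowercase the text and split it into simple word terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_terms(posts, reach):
    """Accumulate a score per term from (text, impact) pairs.

    Multiplying reach by impact is an assumption of this sketch.
    """
    scores = defaultdict(float)
    for text, impact in posts:
        for term in set(parse_terms(text)):  # count each term once per post
            scores[term] += reach * impact
    return dict(scores)


posts = [
    ("Love my new trail bike", 3.0),
    ("Bike maintenance tips", 1.0),
]
term_scores = score_terms(posts, reach=2.0)
print(term_scores["bike"], term_scores["trail"])  # 8.0 6.0
```

The resulting term scores are what would then be attached to a new or existing social profile.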
These claims, implemented at the core of our platform, allow us to find scored social profiles for any input segment, enrich them with offline data, and deliver powerful insight reports to our customers. With our patented platform and the volume of data we have available, we’re just scratching the surface of the types of insights we can provide on your customers, your competitors’ social media followers, or almost any segment you can define.