Quantifying Trust

Production managers, sensory experts, and quality control professionals use the Gastrograph system to detect flaws and batch variations in their products. Although all of our clients take the flavor profile and consistency of their products seriously, and review with a critical palate, not all reviews carry equal trust and weight. Any single Gastrograph review does not depend solely on the product; the reviewer’s experience, preferences, environment, and health all play a role in their response to flavor.

Trust matters to our clients because they want to know how much confidence they can place in a flavor profile (\({\it F}\)). For example, if they ask us for a report on a product \({\it X}\), we provide the Objective Flavor Profile and the perceived quality, both of which are computed from the trusted reviews (I will explain how I determine which reviews are trustworthy). In short, our clients get a concrete flavor analysis of \({\it X}\) computed with trust-based weights derived from the reviews.

I will outline how we compute trust through an example. Along the way we will take a few detours to contrast a model based strictly on the number of reviews someone has completed with a model that is consistent with that number but is additionally based on the character of experienced reviewers’ flavor profiles:

Consider product \({\it X}\) and suppose there exists a set of \(40 \lt N \lt 65\) distinct quality control experts who have produced 1000 \({\it F}\)s for \({\it X}\), where \[{\it F} \in (\mathbb{Z}^{24} \times {\it ReviewerID} \times {\it ReviewID} \times {\it aor})\] such that \({\it 0 < F_{aor} \space \forall \space q \in ReviewerID}\). AmountOfReviews (\({\it aor}\)) is “experience”: the number of \({\it F}\)s reviewer \({\it q}\) has made as of the submission of \({\it F}\). \({\it aor}\) is an absolute count, so it counts from reviewer \({\it q}\)’s first ever \({\it F}\), not from the beginning of the month. Because our reviewers are quality control professionals, we can safely assume that our distribution in \({\it flavor \space space}\) (\(\mathbb{Z}^{24}\)) is a good profile of \({\it X}\). Any \({\it F}\), regardless of reviewer, that “breaks” away from our distribution can then be considered a poor review, except for expected variations in the direction of a specific flaw, contamination, or batch variation.
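To make the notation concrete, here is a minimal Python sketch of what one review \({\it F}\) might look like as a record. The field names are illustrative, not the actual Gastrograph schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FlavorProfile:
    """One review F: a point in flavor space plus its metadata (illustrative fields)."""
    flavors: List[int]   # 24 integer flavor intensities, the point in flavor space (Z^24)
    reviewer_id: str     # q, the quality control expert who submitted F
    review_id: str       # unique id of this submission
    aor: int             # number of reviews q had completed as of this submission (> 0)

# A hypothetical review of product X
example = FlavorProfile(flavors=[3, 0, 2, 1] * 6, reviewer_id="q17", review_id="r0042", aor=112)
```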

I am using k-Nearest Neighbors (kNN) to explore the structure of our data and ultimately identify high-trust \({\it F}\)s. kNN begins with classified data, so I first labeled the \({\it F}\)s in a way that is consistent with \(F_{aor}\): \(\mu\) (most experienced), \(\alpha\) (average experience), and \(\lambda\) (least experienced). I then run kNN on the classified \({\it F}\)s to develop classification rules for unlabeled data.

The classification rules are therefore not learned from \(F_{aor}\) alone, but are \(\bf consistent\) with \(F_{aor}\). These rules will be used to label future unlabeled \({\it F}\)s.
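As a rough illustration of this step, here is a sketch in Python using scikit-learn: the \(\mu\)/\(\alpha\)/\(\lambda\) labels are derived from \(F_{aor}\) (terciles are just one plausible cut), but the kNN rules themselves are learned from the 24-dimensional flavor vectors. The data here are random placeholders.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# labeled_flavors: (n, 24) integer flavor vectors; labeled_aor: aor at submission time.
# Random placeholders stand in for the real reviews of X.
labeled_flavors = np.random.randint(0, 6, size=(1000, 24))
labeled_aor = np.random.randint(1, 301, size=1000)

# Labels consistent with F_aor: terciles are one illustrative choice of cut points.
cuts = np.quantile(labeled_aor, [1 / 3, 2 / 3])
labels = np.where(labeled_aor > cuts[1], "mu",          # most experienced
         np.where(labeled_aor > cuts[0], "alpha",       # average experience
                                         "lambda"))     # least experienced

# Learn classification rules from the flavor profiles themselves;
# aor only supplied the training labels.
knn = KNeighborsClassifier(n_neighbors=5, metric="manhattan")
knn.fit(labeled_flavors, labels)

# Apply the learned rules to unanalyzed reviews of X ("Unlabeled").
unlabeled_flavors = np.random.randint(0, 6, size=(200, 24))
predicted = knn.predict(unlabeled_flavors)
```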

A question should be on your mind at this point: why not divide reviews \(\bf strictly\) by \(F_{aor}\) (experience)?
First, I don’t classify \({\it F}\)s by \(F_{aor}\) alone, because then the unlabeled data (unanalyzed \({\it F}\)s of \({\it X}\)), called \({\it Unlabeled}\), would be classified as follows:
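For contrast, that strictly \(F_{aor}\)-based split would look something like the following sketch, which reuses the cut points from the code above and never consults the flavor vectors.

```python
# Strict classification by experience alone: each unlabeled F is assigned to
# mu / alpha / lambda purely from its reviewer's aor, using the same
# illustrative tercile cut points as above. The 24 flavor values are ignored.
unlabeled_aor = np.random.randint(1, 301, size=200)
strict_labels = np.where(unlabeled_aor > cuts[1], "mu",
                np.where(unlabeled_aor > cuts[0], "alpha", "lambda"))
```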

In other words, we would be classifying \({\it Unlabeled}\) in an uninformed fashion: the underlying flavor profile of each review is never used to determine the classification labels. If you compare the results above to the classification results below, which are also carried out on \({\it Unlabeled}\) but where the classification rules were learned from the \({\it F}\)s themselves, you can see a significant difference.

One can see that there are \({\it F}\)s whose \(F_{aor}\) is low but that are still considered trustworthy reviews, and are therefore members of subset \(\mu\). In other words, we can trust such a review based on the reviewer’s ability to perceive flavor, not only the number of reviews he or she has completed. I ultimately want to classify \({\it F}\)s as the second method illustrates, because reviewers have unique flavor tolerances and learning curves. In my model it is entirely possible that \({\it Reviewers \space A}\) and \({\it B}\) have completed the same number of reviews but do not have the same amount of “experience”, which is not an uncommon real-world scenario. This flexibility and descriptive power is impossible in a model based strictly on \(F_{aor}\).

Second, it could be that \({\it q}\), with \(F_{aor}(latest) = 200\), is reviewing better than \({\it p}\), who has \(F_{aor}(latest) = 300\). A case like this may be common. Consider, for example, that Jack may have worked at the company for only 2 years while Adam has been there for 6, yet Jack has 10 more years of tasting experience than Adam. This model partially accounts for a reviewer’s previous tasting experience.

A strict classification on experience would value Jack’s reviews less than Adam’s, even though Jack is the more experienced reviewer. You might then ask: why use \(F_{aor}\) to classify data at all? Again, we don’t want a \(\bf strict\) classification by \(F_{aor}\), but one that is \(\bf consistent\) with it. For example, reconsider the graph above of the results of classifying an unlabeled data set.

Now I will outline how I quantify trust per \({\it F}\) for \({\it X}\). First, I apply the classification rules from the kNN to \({\it Unlabeled}\). Second, I determine the medoid of all \({\it F} \in \mu\), and then run a similarity analysis that compares every \({\it F}\) in \(Unlabeled - \{F_{medoid}\}\) to \(F_{medoid}\). The medoid \(F_{medoid}\) is the \({\it F}\) such that \[\sum \limits _{i} d(F_{medoid}, F_{i}) \le \sum \limits _{i} d(F_j, F_i) \space \space \forall \space {\it j}\] where \(F_j, F_i \in \mu\) and \(d\) is the Manhattan metric. The average and the medoid differ only in that the medoid must be a data point, while the average need not be, and frequently isn’t, a point in our set. I picked the medoid precisely because it belongs to our data set: I want to compare unlabeled \({\it F}\)s to an \({\it F}\) that belongs to \(\mathbb{Z}^{24}\), not to something that might lie in \(\mathbb{Q}^{24} - \mathbb{Z}^{24}\). In other words, I want to compare unlabeled \({\it F}\)s to an \({\it F}\) that can actually exist in our \({\it flavor \space space}\).
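Continuing the sketch above, the medoid step might look like this: the medoid is found by brute force over pairwise Manhattan distances within \(\mu\), and the remaining \({\it F}\)s are then compared against it.

```python
import numpy as np
from scipy.spatial.distance import cdist

def medoid(points: np.ndarray) -> np.ndarray:
    """Return the row of `points` minimizing the summed Manhattan distance to all rows."""
    distances = cdist(points, points, metric="cityblock")  # pairwise L1 distances
    return points[distances.sum(axis=1).argmin()]

# Flavor vectors of all F classified into mu by the kNN rules.
mu_flavors = labeled_flavors[labels == "mu"]
f_medoid = medoid(mu_flavors)

# Similarity analysis: distance of every remaining F to the medoid.
dist_to_medoid = cdist(unlabeled_flavors, f_medoid[None, :], metric="cityblock").ravel()
```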

The results from the similarity analysis will be used to compute the Trust function, whose values range from 0 to 100. Our clients can then view this number to see the level of trust that can be placed in each individual \({\it F}\).

In order to develop a more accurate model of Trust, I am now working on incorporating \({\it aor}\). So far I have considered two similar approaches: i) apply a transformation to \({\it aor}\) and include it in my \({\it flavor \space space}\); ii) determine a metric for \(flavor \space space \times aor\) that would “properly weight” \({\it aor}\). The transformation would “discretize” \({\it aor}\) and translate it to a scale from 0 to 5, the same scale flavor is measured on. I would then add \({\it aor}\) as another axis in \({\it flavor \space space}\), and prep my data and compute trust values exactly as outlined earlier. In practice I would need to test various transformations and see whether they improve our models by plotting trust vs. \({\it aor}\). There has also been work done in metric learning that I will apply to our data.
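One possible transformation, purely as a sketch: a log curve mapped onto the 0 to 5 flavor scale and rounded, so the augmented point stays on an integer lattice. The actual curve would be chosen empirically, as described above.

```python
import numpy as np

def discretize_aor(aor: np.ndarray, aor_max: int = 300) -> np.ndarray:
    """Map raw review counts onto the 0-5 flavor scale.

    A log curve is one plausible choice: early reviews add "experience" quickly,
    later reviews add it more slowly. The right transformation would be picked
    empirically, e.g. by plotting trust vs aor for several candidates.
    """
    scaled = 5 * np.log1p(aor) / np.log1p(aor_max)
    return np.clip(np.rint(scaled), 0, 5).astype(int)

# Append the transformed aor as a 25th axis of flavor space.
augmented = np.column_stack([labeled_flavors, discretize_aor(labeled_aor)])
```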

As far as metrics go, the Manhattan metric is biased towards \({\it aor}\), because \({\it aor}\) has no upper bound and in our example ranges from 0 to 300. My aim is therefore to determine a metric that produces more accurate classifications, and work has been done toward that end: a metric is learned from our data such that it maximizes the distance between points that should not be classified together and minimizes the distance between points that should be classified in one group.
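As a stand-in for a fully learned metric, here is a sketch of a weighted Manhattan distance over \(flavor \space space \times aor\) in which the \({\it aor}\) axis is downweighted so its 0 to 300 range cannot dominate. A metric learning method would choose these weights (or a full linear map) from the data rather than by hand.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Raw aor appended as a 25th coordinate (approach ii: keep aor unscaled,
# let the metric weight it instead).
augmented_raw = np.column_stack([labeled_flavors, labeled_aor])

# Hand-picked weights: the 24 flavor axes keep weight 1, the aor axis is scaled
# so its 0-300 range contributes roughly as much as a 0-5 flavor axis.
weights = np.ones(25)
weights[-1] = 5 / 300

def weighted_manhattan(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Pairwise weighted L1 distances between the rows of A and the rows of B."""
    return cdist(A, B, metric="cityblock", w=weights)

# A learned metric would instead choose the weights to pull same-class F's
# together and push different-class F's apart.
```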

In conclusion, my approach exploits the power of kNN so that the structure of our data dictates which \({\it F}\)s are the most trustworthy. If you are curious about how the similarity analysis is carried out and how the Trust function is defined, visit our jobs page to join our team.


Arturo Leon

Data Scientist

Studied math at Florida International University. When not reading a math or physics textbook, I spend my time playing basketball, soccer and videogames.
