GradientOne provides several tools for performing meta-analyses of results. As devices become increasingly complex, the point or points of failure can be harder to identify. Incorporating data mining techniques such as machine learning or exploration into the process of bringing new products to market can help at multiple stages in the pipeline: in the research and development phase, by predicting which avenues of implementation are most likely to be fruitful; in the manufacturing stage, by identifying faulty equipment before the final steps in the process, saving operating costs and time; and in the market stage, by identifying devices that need to be recalled or patched before failures are reported. This first post will go over how to visualize and explore categorical data with a hypothetical example.
A robotics company wants to investigate the cause of failures. They have iterated the design of their robots over the years, such that although each robot runs the same software and has the same chassis, the components may have changed. For example, some robots have Nylon Supports instead of aluminum ones, some robots have a Copper Heatsink on the CPU, some robots have Mecanum Wheels instead of regular ones, and some robots have Lithium Batteries instead of Ni-Cad batteries. The robotics company has a record of all of the components that went into each of their robots, and uploads each part manifest as a result, where a 0 means that a part is not in the robot, and a 1 means that it is:
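For illustration, here is one way such a 0/1 part manifest could be encoded before upload. The column names and helper function are hypothetical, not GradientOne's actual schema:

```python
# Hypothetical manifest columns: 1 = the optional part is installed, 0 = the
# default part is used (aluminum supports, regular wheels, Ni-Cad battery).
COLUMNS = ["nylon_supports", "copper_heatsink", "mecanum_wheels", "lithium_battery"]

def manifest_row(installed_parts):
    """Encode a set of installed part names as a 0/1 vector in column order."""
    return [1 if col in installed_parts else 0 for col in COLUMNS]

robots = [
    manifest_row({"mecanum_wheels", "copper_heatsink"}),
    manifest_row({"nylon_supports", "lithium_battery"}),
    manifest_row(set()),  # baseline robot with all default parts
]
```

Each row, together with the robot's pass/fail result, would then be uploaded as one record.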
Since this data is categorical rather than numerical, looking at the scatterplots does not yield useful results:
Instead, the Decision Tree meta-analysis is the best tool for categorical data. Since the Decision Tree analysis is a supervised learning method, whereas the Scatterplot Matrix was unsupervised, the test engineer needs to provide one additional input: the dependent variable. In this case, the dependent variable is the pass/fail Result:
View Results takes you to the Decision Tree. Two trees are generated. The first is the Optimal Tree, which separates the most passes from failures using the fewest decision points:
From this tree, we can see that the combination of Mecanum wheels with either a copper heatsink or a lithium battery will likely result in a failure, and that the combination of regular wheels, aluminum supports, and a copper heatsink also leads to failures. Scrolling down to the partition plot, we can see that this tree partitions the passes and failures well: there are relatively few passes in any section dominated by failures, and vice versa:
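The rules read off this hypothetical tree can be written as a simple predicate. The function name and its boolean arguments are illustrative only:

```python
def predicted_fail(mecanum, copper_heatsink, lithium, aluminum_supports):
    """Failure rules read off the hypothetical Optimal Tree described above."""
    # Mecanum wheels combined with a copper heatsink or a lithium battery
    if mecanum and (copper_heatsink or lithium):
        return True
    # Regular wheels with aluminum supports and a copper heatsink
    if not mecanum and aluminum_supports and copper_heatsink:
        return True
    return False
```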
The second is the Symmetric Tree, which can be seen by changing the value in the dropdown. It expands all the parameters in the same order, which makes it useful for data exploration:
The partition plot for the symmetric tree has more partitions than the optimal tree, but still shows a good separation between passes and failures. From this graph, we can see that the Mecanum wheels are the primary cause of failures, though it is harder to tell how the other components contribute:
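One rough way to quantify "primary cause" from the same 0/1 data is to compare the failure rate of robots with each part against robots without it. This is a hypothetical sketch, not a GradientOne feature:

```python
def failure_rate_lift(rows, labels, columns):
    """For each part, failure rate with the part minus failure rate without it."""
    def rate(ls):
        return ls.count("fail") / len(ls) if ls else 0.0

    lifts = {}
    for j, col in enumerate(columns):
        with_part = [lab for row, lab in zip(rows, labels) if row[j] == 1]
        without_part = [lab for row, lab in zip(rows, labels) if row[j] == 0]
        lifts[col] = rate(with_part) - rate(without_part)
    return lifts
```

A part whose lift is near 1.0 almost always accompanies a failure, while a part near 0.0 appears equally often in passing and failing robots.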