Hi Folks.
I cobbled together a program to perform data mining on a large
collection of sizable ISAM files. The algorithms will mine the
warehoused data for relevant statistics, and generate predictive
analytics to guide management decisions and measure performance.
However, extracting real meaning from data can be challenging,
fiendishly complex, and wildly counterintuitive. Major factors to
consider: bad data, flawed processes, and misinterpretation of
results can produce false positives and false negatives, which can
lead to inaccurate conclusions and injudicious business decisions.
I would like your professional opinions on the following questions:
1). In the social sciences, is it practical or useful to develop
a predictive model?
2). Are there any ironclad guarantees around predictive models?
Good questions!
Here are some particularly dubious answers, from the least authoritative member of this group.
1) Yes
2) No
As a young fellow, I naively thought that good predictive models were unachievable because of the nature of the "soft sciences" themselves. My mother was a sociologist; I took my degrees in physics and astronomy. Hence my faulty prejudice! Unable to find employment after my education, I ended up in engineering... and they've been stuck with me for over 45 years.
Anyway, let me give you an anecdote that could be helpful. In continuous process control we often gather large volumes of data. As part of quality control, lab testing can determine specific quality levels. However, there's a common problem: the lab analysis is often not available soon enough to adjust the production controls. We have a discipline called statistical process control, but we are hardly ever really satisfied with what we can do.
Nonetheless, the large volumes of data accumulated in process control and quality analysis are a wonderful resource on which a neural network can be trained.
So, I've set up neural networks specifically in order to perform some type of real time process/quality control.
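If it helps to see the shape of it, here is a rough modern sketch in Python of that kind of setup. The file name and the sensor/quality column names are invented for illustration, not from any real plant:

# Train a small feed-forward net to predict a lab quality figure from
# live sensor readings, using historical records where both are known.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

df = pd.read_csv("process_history.csv")        # hypothetical history file
X = df[["temp", "pressure", "flow"]].values    # invented sensor columns
y = df["quality"].values                       # lab-measured quality

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scaler = StandardScaler().fit(X_train)         # nets want scaled inputs
net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000)
net.fit(scaler.transform(X_train), y_train)

print("R^2 on held-out data:", net.score(scaler.transform(X_test), y_test))

The point is only that the training data come free from the quality-control archive, exactly as described above.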
Yet, even when the neural net achieves higher than 90% accuracy, people complain. They ask, "But HOW does the net know? What rules is it using?" However, the nature of the neural net does not easily expose why it arrives at its predictions.
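These days there is at least a partial answer to the "what rules?" question: distill the net into something readable, for instance by fitting a shallow decision tree to the net's own predictions and printing its branches as rules. A rough, self-contained Python sketch (synthetic stand-in data, invented feature names):

# Treat any fitted model as a black box and distill it into a
# depth-3 decision tree whose branches read like rules.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 3))                # stand-in sensor data
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.05, 1000)

net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)

surrogate = DecisionTreeRegressor(max_depth=3)
surrogate.fit(X, net.predict(X))               # mimic the net, not the data

print(export_text(surrogate, feature_names=["temp", "pressure", "flow"]))

The tree only approximates the net, of course, but at least it gives people rules they can read.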
In the end, people prefer inaccuracies from a predictive model that they understand, to accuracies from one they cannot understand.
So in my opinion, given the data issues you describe, plus all the presuppositions on which they may be founded, I think you may be in a situation that is so fundamentally complex that you will never be able to prove to yourself -- let alone others -- that you have a useful predictive model.
Of course these issues may have been solved by others since my day. But I rather doubt it.
I will not be offended if you find this completely useless. However, if you just say "Thank you," I can maintain the illusion that I can still be helpful.
Good luck, ma'am.
On 22/05/2018 1:42 AM, Kellie Fitton wrote:
Hi Folks.
I cobbled together a program to perform data mining on a large
collection of sizable ISAM files. The algorithms will mine the
warehoused data for relevant statistics, and generate predictive
analytics to guide management decisions and measure performance.
However, extracting real meaning from data can be challenging,
fiendishly complex, and wildly counterintuitive. Major factors to
consider: bad data, flawed processes, and misinterpretation of
results can produce false positives and false negatives, which can
lead to inaccurate conclusions and injudicious business decisions.
I would like your professional opinions on the following questions:
1). In the social sciences, is it practical or useful to develop
a predictive model?
Yes. BUT, there are some provisos...:
(These provisos apply to changing the alpha factor on an ancient
Inventory Control system to reflect seasonal demand fluctuations,
to constructing a deep neural network for a specific AI application
and loading tens of millions of data points into it, or to a simple
self-modifying heuristic programming example; in other words, to ANY
kind of software where the "results" are predicated on previous
results and modified within desirable constraints.)
1. The predictions (no matter what the algorithm claims) should be
considered accurate to within 50%. In other words, the model can be
used to give a "general likelihood" of what is going to happen.
2. No financial risk of any kind must be taken based on the prediction.
3. The rules above don't get changed if the model is within 10%
(unless it is run across at least 1000 datasets and ALWAYS predicts
within 10% of the actual outcome; see the sketch after this list).
In other words, the "credibility" of the model may improve, but that
doesn't alter the rules for using it.
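For what it's worth, the check in rule 3 is easy to automate. A minimal Python sketch, assuming you can replay the model over historical datasets where the actual outcome is already known ("model" here is just any callable; nothing below comes from a real library):

def always_within_10_percent(model, datasets):
    # datasets: list of (inputs, actual_outcome) pairs with known outcomes
    if len(datasets) < 1000:
        return False                     # not enough evidence yet
    for inputs, actual in datasets:
        if actual == 0:
            return False                 # can't judge a relative error
        predicted = model(inputs)
        if abs(predicted - actual) / abs(actual) > 0.10:
            return False                 # a single miss is enough to fail
    return True

Until that returns True, the rules stand.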
2). Are there any ironclad guarantees around predictive models?
No.
But that doesn't mean they are worthless.
Some classes of problem can ONLY be solved by a computer using
heuristics or AI, because it would take longer than the time
available to solve them using traditional methods.
If a heuristic model finds its way through a complex maze, the solution
may not be the BEST one, but it is better than NO solution.
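To make the maze example concrete, here is a toy greedy best-first search in Python. It always moves toward the goal by Manhattan distance, so the path it finds is not guaranteed to be the shortest, but it comes back quickly with *a* path:

# Greedy best-first search through a grid maze ('#' marks a wall).
import heapq

def solve(maze, start, goal):
    def h(p):                            # heuristic: Manhattan distance
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), start, [start])]
    seen = {start}
    while frontier:
        _, (r, c), path = heapq.heappop(frontier)
        if (r, c) == goal:
            return path                  # a path, not necessarily the best
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if (0 <= nr < len(maze) and 0 <= nc < len(maze[0])
                    and maze[nr][nc] != '#' and (nr, nc) not in seen):
                seen.add((nr, nc))
                heapq.heappush(frontier,
                               (h((nr, nc)), (nr, nc), path + [(nr, nc)]))
    return None                          # no path exists at all

maze = ["....#...",
        ".##.#.#.",
        ".#....#.",
        "...##..."]
print(solve(maze, (0, 0), (3, 7)))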
If an AI net predicts cases of cholera within 5 miles of your location,
you might well laugh it off but you'd probably renew your vaccination,
just in case...
This whole field is expanding rapidly and it is likely that much more reliable predictions will be available within the next few years. It
might then become possible to relax rules 1 and 2, but for now you
should treat the output from a predictive model with extreme skepticism, even when it gets it pretty much right...
It's like the "Pirate Rules" in Pirates of the Caribbean; more of a "guideline", really.
Pete.
--
I used to write COBOL; now I can do anything...
Thank you for your feedback.
Hi Doc Trins O'Grace,
I appreciate your informative feedback. I interviewed some
of my friends who are working as engineers and their answers
rather surprised me. They said that to increase the accuracy of
their predictive analytics, they must leverage the power of a new
wrinkle in their field: reliance on Machine Learning and Artificial
Intelligence (AI).
I find it shocking that the most sophisticated
predictive software can become fully non-predictive in just a
two-week period, due to the complexity, uncertainty and
unpredictability of our connected world. Case in point: the
collapse of the financial services firm Lehman Brothers, and the
Great Recession that began in 2007, which was not predicted by
economists who are trained to forecast and who use predictive
analysis.
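From what my friends describe, the only practical defense is to watch the model's accuracy decay in real time. A minimal sketch of that idea in Python (the window size and accuracy floor are arbitrary illustration values, not anyone's standard):

# Track rolling accuracy; flag the model when it sags below a floor.
from collections import deque

class DriftMonitor:
    def __init__(self, window=200, floor=0.80):
        self.recent = deque(maxlen=window)   # 1 = hit, 0 = miss
        self.floor = floor

    def record(self, predicted, actual):
        self.recent.append(1 if predicted == actual else 0)

    def gone_stale(self):
        if len(self.recent) < self.recent.maxlen:
            return False                     # not enough evidence yet
        return sum(self.recent) / len(self.recent) < self.floor

When gone_stale() flips to True, the usual response is to retrain on fresher data rather than keep trusting the old model.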
Hi Doc Trins O'Grace,
You are absolutely right -- the trouble is knowing your inputs.
I must ask myself whether the data are good, and make sure that
the data used, as well as the processes that generate and organize
them, are of the highest quality and fully understood. I don't want
to spend a long time and resources only to find a bug in the data.
One can still get problems even with the best data. Garbage in,
garbage out. Predictive analytics are risky by nature; they are
valid as long as the input data are also valid.
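To keep at least the obvious garbage out, I intend to run basic sanity checks before any record reaches the model. A minimal Python sketch (the field names and "plausible range" are placeholders; the real rules would have to come from whoever owns the upstream processes):

def validate_record(rec, required=("id", "date", "amount")):
    problems = []
    for field in required:
        if rec.get(field) in (None, ""):
            problems.append("missing " + field)
    amount = rec.get("amount")
    if isinstance(amount, (int, float)) and not (0 <= amount <= 1_000_000):
        problems.append("amount out of plausible range: " + str(amount))
    return problems                      # empty list means the record passed

records = [{"id": 1, "date": "2018-05-23", "amount": 250.0},
           {"id": 2, "date": "", "amount": -50.0}]
for rec in records:
    for problem in validate_record(rec):
        print("record", rec["id"], "->", problem)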
On 23/05/2018 8:07 AM, Kellie Fitton wrote:
One can still get problems even with the best data. Garbage in,
garbage out. Predictive analytics are risky by nature; they are
valid as long as the input data are also valid.
Sadly, no, they are not.
In article <fmkbvdFmonU1@mid.individual.net>,
pete dashwood <dashwood@enternet.co.nz> wrote:
On 23/05/2018 8:07 AM, Kellie Fitton wrote:
[snip]
One can still get problems even with the best data. Garbage in,
garbage out. Predictive analytics are risky by nature; they are
valid as long as the input data are also valid.
Sadly, no, they are not.
Mr Dashwood, it seems that folks no longer study the Hawthorne effect.
DD
On 25/05/2018 6:12 AM, docdwarf@panix.com wrote:
In article <fmkbvdFmonU1@mid.individual.net>,
pete dashwood <dashwood@enternet.co.nz> wrote:
On 23/05/2018 8:07 AM, Kellie Fitton wrote:
[snip]
One can still get problems even with the best data. Garbage in,
garbage out. Predictive analytics are risky by nature; they are
valid as long as the input data are also valid.
Sadly, no, they are not.
Mr Dashwood, it seems that folks no longer study the Hawthorne effect.
Sorry Doc, not sure of your allusion here. My position has been
consistent throughout the thread (whether it was observed or not... :-)):
"Don't trust the results of analytics."