A Unique Book on Data Mining and Mathematical Logic.
Data Mining and Knowledge Discovery Via Logic-Based Methods: Theory, Algorithms, and Applications


by Evangelos Triantaphyllou


A monograph expected to be published by the end of fall 2009 or early spring 2010 by Springer. It is 418 pages long.
ISBN ****************





TABLE OF CONTENTS
List of Figures..................................................xv 
List of Tables..................................................xxi 
Foreword......................................................xxvii 
Preface........................................................xxxi 
Acknowledgments..............................................xxxvii 


STRONG>
Chapter 1
INTRODUCTION......................................................1
1.1   	What is Data Mining and Knowledge Discovery?..............1
1.2	Some Potential Application Areas for Data Mining 
        and Knowledge Discovery...................................2
1.2.1 		Applications in Engineering.......................2
1.2.2		Applications In Medical Sciences..................2
1.2.3		Applications In the Basic Sciences................3
1.2.4		Applications In Business..........................4
1.2.5		Applications In the Political and 
		Social Sciences...................................4
1.3	The Data Mining and Knowledge Discovery Process...........5
1.3.1		Problem Definition................................6
1.3.2		Collecting the Data...............................6
1.3.3		Data Pre-Processing...............................8
1.3.4		Application of the Main Data Mining and
		Knowledge Discovery Algorithms....................8
1.3.5		Interpretation of the Results of the Data 
		Mining and Knowledge Discovery Process...........10
1.4  	Four Key Research Challenges in Data Mining and 
	Knowledge Discovery......................................10
1.4.1		Collecting Observations about the Behavior 
		of a System......................................11
1.4.2  		Identifying Patterns from Collections of Data....11
1.4.3  		Which Data to Consider for Evaluation Next?......15
1.4.4  		Do Patterns Always Exist in Data?................17
1.5  	Concluding Remarks.......................................18
Move UP to the Top of the Webpage


Chapter 2
INFERRING A BOOLEAN FUNCTION FROM POSITIVE 
AND NEGATIVE EXAMPLES............................................19
2.1  	Introduction.............................................19
2.2	Some Background Information..............................20
2.3	Data Binarization........................................24
2.4   	Definitions and Terminology..............................28
2.5	Generating Clauses from Negative Examples Only...........31
2.6	Clause Inference as a Satisfiability Problem.............32
2.7	An SAT Approach for Inferring CNF Clauses................33
2.8	The One Clause At a Time (OCAT) Concept..................34
2.9   	A Branch-and-Bound Approach 
	for Inferring a Single Clause............................38
2.10   	A Heuristic for Problem Pre-Processing...................45
2.11	Some Computational Results...............................47
2.12  	Concluding Remarks.......................................51
APPENDIX  	
The SAT Formulation for the Illustrative Example 
(the CNF case)...................................................53
Move UP to the Top of the Webpage


Chapter 3
A REVISED BRANCH-AND-BOUND APPROACH FOR INFERRING A
BOOLEAN FUNCTION FROM EXAMPLES...................................55
3.1	Some Background Information..............................55
3.2	The Revised Branch-and-Bound Algorithm...................55
3.2.1  		Generating a Single CNF Clause...................56
3.2.2		Generating a Single DNF Clause...................61
3.3	Some Computational Results...............................62
3.4	Concluding Remarks.......................................69
Move UP to the Top of the Webpage


Chapter 4
SOME FAST HEURISTICS FOR INFERRING A BOOLEAN FUNCTION
FROM EXAMPLES....................................................71
4. 1	Some Background Information..............................71
4.2	A Fast Heuristic for Inferring a Boolean Function 
	From Complete Data.......................................73
4.3	A Fast Heuristic for Inferring a Boolean Function 
	From Incomplete Data.....................................79
4.4	Some Computational Results...............................83
4.4.1 		Results for the RA1 Algorithm on the 
		Wisconsin Cancer Data............................85
4.4.2	  	Results for the RA2 Heuristic on the 
		Wisconsin Cancer Data with Some Missing Values...90
4.4.3 		Comparison of the RA1 Algorithm and the B&B  
		Method by Using Large Random Data Sets...........93
4.5 	Concluding Remarks......................................100
Move UP to the Top of the Webpage


Chapter 5
AN APPROACH TO GUIDED LEARNING OF BOOLEAN FUNCTIONS.............103
5.1  	Some Background Information.............................103
5.2	Problem Description.....................................106
5.3 	The Proposed Approach...................................108
5.4   	On the Number of Candidate Solutions....................113
5.5 	An Illustrative Example.................................114
5.6  	Some Computational Results..............................116 
5.7   	Concluding Remarks......................................127
Move UP to the Top of the Webpage


Chapter 6
AN INCREMENTAL LEARNING ALGORITHM FOR INFERRING 
BOOLEAN FUNCTIONS...............................................129
6.1  	Some Background Information.............................129
6.2  	Problem Description.....................................130
6.3  	Some Related Developments...............................131
6.4  	The Proposed Incremental Algorithm......................135
6.4.1		Repairing a Boolean Function that 
     		Incorrectly Rejects a Positive Example..........135
6.4.2		Repairing of a Boolean Function that 
  		Incorrectly Accepts a Negative Example..........138
6.4.3		Computational Complexity of the Algorithms 
		for the ILE Approach............................139
6.5  	Experimental Data.......................................140
6.6  	Analysis of the Computational Results...................140
6.6.1	 	Results on the Classification Accuracy..........141
6.6.2		Results on the Number of Clauses................144
6.6.3		Results on the CPU Times........................147
6.7  	Concluding Remarks......................................149
Move UP to the Top of the Webpage


Chapter 7
A DUALITY RELATIONSHIP BETWEEN BOOLEAN FUNCTIONS IN CNF 
AND DNF DERIVABLE FROM THE SAME TRAINING EXAMPLES...............151
7.1	Introduction............................................151
7.2 	Generating Boolean Functions in CNF and DNF.............152
7.3	An Example of Deriving Boolean Functions in 
	CNF and DNF.............................................152
7.4 	Some Computational Results..............................153
7.5  	Concluding Remarks......................................155
Move UP to the Top of the Webpage


Chapter 8
THE REJECTABILITY GRAPH OF TWO SETS OF EXAMPLES.................157
8.1	Introduction............................................157
8.2 	The Definition of the Rejectability Graph...............158
8.2.1   	Properties of the Rejectability Graph...........160
8.2.2		On the Minimum Clique Cover of the 
		Rejectability Graph.............................162 
8.3   	Problem Decomposition...................................163
8.3.1   	Connected Components............................163
8.3.2   	Clique Cover....................................164
8.4     An Example of Using the Rejectability Graph.............165
8.5	Some Computational Results..............................167
8.6   	Concluding Remarks......................................175
Move UP to the Top of the Webpage


Chapter 9
THE RELIABILITY ISSUE IN DATA MINING: THE CASE OF 
COMPUTER-AIDED BREAST CANCER DIAGNOSIS..........................177
9.1 	Introduction............................................177
9.2  	Some Background Information on Computer-Aided 
	Breast Cancer Diagnosis.................................177
9.3  	Reliability Criteria....................................179
9.4  	The Representation / Narrow Vicinity Hypothesis.........183
9.5 	Some Computational Results..............................185
9.6	Concluding Remarks......................................189
APPENDIX  I 
Definitions of the Key Attributes...............................191 
APPENDIX  II 
Technical Procedures............................................193
9.A.1	The Interactive Approach................................193
9.A.2	The Hierarchical Approach...............................194
9.A.3	The Monotonicity Property...............................194
9.A.4	Logical Discriminant Functions..........................195
Move UP to the Top of the Webpage


Chapter 10
DATA MINING AND KNOWLEDGE DISCOVERY BY MEANS OF 
MONOTONE BOOLEAN FUNCTIONS......................................197
10.1  	Introduction............................................197
10.2 	Background Information..................................199
10.2.1 		Problem Descriptions............................199
10.2.2 		Hierarchical Decomposition of Variables.........202
10.2.3 		Some Key Properties of Monotone 
		Boolean Functions...............................204
10.2.4 		Existing Approaches to Problem 1................209
10.2.5 		An Existing Approach to Problem 2...............211
10.2.6 		Existing Approaches to Problem 3................211
10.2.7 		Stochastic Models for Problem 3.................211
10.3 	Inference Objectives and Methodology....................214
10.3.1 		The Inference Objective for Problem 1...........214
10.3.2 		The Inference Objective for Problem 2...........215
10.3.3 		The Inference Objective for Problem 3...........215
10.3.4	Incremental Updates for the Fixed 					
	Misclassification Probability Model.....................216
10.3.5 		Selection Criteria for Problem 1................216
10.3.6 		Selection Criteria for 
		Problems 2.1, 2.2, and 2.3......................217
10.3.7 		Selection Criterion for Problem 3...............218
10.4 	Experimental Results....................................223
10.4.1 		Experimental Results for Problem 1..............223
10.4.2 		Experimental Results for Problem 2..............225
10.4.3		Experimental Results for Problem 3..............228
10.5 	Summary and Discussion..................................232
10.5.1 		Summary of the Research Findings................232
10.5.2 		Significance of the Research Findings...........234
10.5.3 		Future Research Directions......................235
10.6 	Concluding Remarks......................................236
Move UP to the Top of the Webpage


Chapter 11
SOME APPLICATION ISSUES ON MONOTONE BOOLEAN FUNCTIONS...........237
11.1	Some Background Information.............................237
11.2 	Expressing Any Boolean Function in Terms of 
	Monotone Ones...........................................237
11.3	Formulations of Diagnostic Problems as the Inference  
	of Nested Monotone Boolean Functions....................239
11.3.1		An Application to a Reliability 
		Engineering Problem.............................239
11.3.2		An Application to the Breast Cancer 
		Diagnosis Problem...............................240
11.4 	 	Design Problems.................................241
11.5 	 	Process Diagnosis Problems......................243
11.6	Three Major Illusions in the Evaluation of 
	the Accuracy of Data Mining Models......................243
11.6.1	 	First Illusion:  The Single Index 
		Accuracy Rate...................................243
11.6.2		Second Illusion:  Accurate Diagnosis 
		without Hard Cases..............................244
11.6.3		Third Illusion:  High Accuracy on Random 
		Test Data Only..................................244
11.7 	Identification of the Monotonicity Property.............245
11.8	Concluding Remarks......................................248
Move UP to the Top of the Webpage


Chapter 12
MINING OF ASSOCIATION RULES.....................................249
12.1  	Some Background Information.............................249
12.2  	Problem Description.....................................251
12.3 	Methodology.............................................252
12.3.1 		Some Related Algorithms.........................252
12.3.2  	Alterations to the RA1 Algorithm................253
12.4  	Computational Experiments...............................257
12.5  	Concluding Remarks......................................264
Move UP to the Top of the Webpage


Chapter 13
DATA MINING OF TEXT DOCUMENTS...................................267
13.1		Some Background Information.....................267
13.2		A Brief Description of the Document 
		Clustering Process..............................270
13.3		Using the OACT Approach to Classify 
		Text Documents..................................271
13.4		An Overview of the Vector Space Model...........273
13.5 	A Guided Learning Approach for the Classification of
		Text Documents..................................276
13.6		Experimental Data...............................276
13.7		Testing Methodology.............................278
13.7.1		The Leave-One-Out Cross Validation..............279
13.7.2		The 30/30 Cross Validation......................279
13.7.3		Statistical Performance of Both Algorithms......279
13.7.4		Experimental Setting for the Guided Learning 
		Approach........................................280
13.8	Results for the Leave-One-Out and the 30/30 Cross  
	Validations.............................................280
13.9	Results for the Guided Learning Approach................284
13.10  	Concluding Remarks......................................288
Move UP to the Top of the Webpage


Chapter  14
FIRST CASE STUDY:  PREDICTING MUSCLE FATIGUE FROM EMG SIGNALS...289
14.1 	Introduction............................................289
14.2	General Problem Description.............................289
14.3	Experimental Data.......................................290
14.4	Analysis of the EMG Data................................291
14.4.1		The Effects of Load and Electrode Orientation...291
14.4.2		The Effects of Muscle Condition, 
		Load, and Electrode.............................292
14.5	A Comparative Analysis of the EMG Data..................293
14.5.1 		Results by the OCAT / RA1 Approach..............294
14.5.2 		Results by the Fisher's Linear 
		Discriminant Analysis...........................295
14.5.3  	Results by Logistic Regression..................296
14.5.4 		A Neural Network Approach.......................297
14.6	Concluding Remarks......................................299
Move UP to the Top of the Webpage


Chapter 15
SECOND CASE STUDY:  INFERENCE OF DIAGNOSTIC RULES FOR 
BREAST CANCER...................................................301
15.1	Introduction............................................301
15.2	Description of the Data Set.............................301
15.3 	Description of the Inferred Rules.......................305
15.4 	Concluding Remarks......................................310
Move UP to the Top of the Webpage


Chapter 16
A FUZZY LOGIC APPROACH TO ATTRIBUTE FORMALIZATION: ANALYSIS 
OF LOBULATION FOR BREAST CANCER DIAGNOSIS.......................311
16.1 	Introduction............................................311
16.2 	Some Background Information on Digital
	Mammography.............................................311
16.3	Some Background Information on Fuzzy Sets...............313
16.4  		Formalization with Fuzzy Logic..................316
16.5		Degrees of Lobularity and Microlobularity.......321
16.6 	Concluding Remarks......................................324	
Move UP to the Top of the Webpage
 

Chapter 17 
CONCLUSIONS....................................................325
17.1	General Concluding Remarks.............................325
17.2	Twelve Key Areas of Potential Future Research on Data
	Mining and Knowledge Discovery from Databases..........326
17.2.1 		Overfitting and Overgeneralization.............326
17.2.2 		Guided Learning................................326
17.2.3		Stochasticity..................................327
17.2.4 		More on Monotonicity...........................327
17.2.5		Visualization..................................327
17.2.6 		Systems for Distributed Environments...........328
17.2.7 		Developing Better Exact Algorithms 
		and Heuristics.................................328
17.2.8 		Hybridization	and Other Algorithmic Issues...328
17.2.9 		Systems with Self Explanatory Capabilities.....329
17.2.10 	New Systems for Image Analysis.................329
17.2.11		Systems for Web Applications...................329
17.2.12		Developing More Applications...................329
17.3	Epilogue...............................................330
Move UP to the Top of the Webpage


REFERENCES.....................................................333


Subject Index..................................................353


Author Index...................................................363


About the Author...............................................369

Move UP to the Top of the Webpage
 

Visit Dr. Triantaphyllou's Homepage
Dr. Triantaphyllou's Books / Special Issues web site     a new site!

Send suggestions / comments to Dr. E. Triantaphyllou (trianta@lsu.edu).