Dear Editors,

Please find below our detailed response to the reviews of each of the four referees and to Foster's suggestions. The postscript version of the paper is ?????

We believe that the comments of the reviewers (and Foster) have helped us to improve the paper substantially, and we are grateful. We have complied with almost all of the reviewers' comments, and below we go through each of them individually.

To summarize, the main changes we have made in the paper (as suggested by Foster) are:

1. Reorganized the empirical results section to provide a clearer focus.
2. Carried out a thorough editing (to the best of our ability) of the writing used in the paper. Hopefully the result is a clearer, easier-to-read paper.
3. Revised Section 5 ("Lessons learned") to add what we believe are the associated opportunities for machine learning research: this was a very useful suggestion and should give the paper a broader readership.
4. Cleaned up numerous typos, figures, etc.

We look forward to your response.

Michael Burl (on behalf of all authors)

ITEMIZATION OF CHANGES IN REVISED VERSION

> In addition to resubmitting the revised paper, we ask that you send
> e-mail to mljapps@postofc.corp.sgi.com describing how you addressed
> the reviewers comments described below. Not everything needs to be
> "fixed," as you may disagree with the reviewer on some points.
> However, please do take the comments given to you seriously. The
> reviewers are well-known researchers and their comments and
> suggestions should generally improve the quality of your paper. We
> URGE you to contact us if you have questions about our comments or
> those of the reviewers. One of us (Foster in this case) has
> personally read the paper and has summarized the important comments.
>
>
> ----------------------------------------------------------------------
> Editor's comments (Foster Provost)
>
> As described by the reviewers, there are three areas of weakness in
> the paper. I will address each in turn.
The first two must be > addressed for it to be satisfactory for publication in the journal. > The third will make the paper a much stronger contribution to the > science of Machine Learning, and so should be considered seriously. > Also, several reviewers wondered (privately) how much overlap > there was with other published work on this application. Please > clarify the similarities and differences. > This paper is significantly different from any prior published work. It contains far more extensive experimental results: previously published papers only included results now described under "initial experiments". In addition, all of the previous papers were relatively short conference papers or overview book chapters. This paper is the only complete description of the JARtool system. > Three areas of weakness: > > 1) The empirical evaluation section lacks focus. It should be > redesigned and rewritten. I had the same reaction as Reviewer 1 as I > *worked* my way through this section. As Reviewer 1 states, "Why not > start by stating your important hypotheses then tell the reader what's > testing what. . ." As I read through, I found myself struggling to > figure out what you were trying to show, and then once I did, I had to > go back and determine whether or not I thought you had shown it. I > would have preferred that each section be organized as: > > - Here's what we're going to show > - Here's why we had to set up the experiments as we did (to show it) > - Here are the supporting results > - Here's some analysis/discussion > > For example, consider Section 3.5: > > - Combined classifiers outperform "baseline" > - Experimental setup (and why) > - Supporting results (consistent with other experiments) > - Not fielded because ... > The empirical results section was completely reorganized and rewritten. The order should now "flow" much better than in the original version. 
In particular there is a clear progression from "initial experiments" through "extended experiments" to "follow-up" hypothesis testing. The section should be much easier to read and understand. > Futhermore, the empirical results in paragraphs 4-9 of Section 2.5 > seem misplaced. The misplacement was apparent in reading, because of > the necessity of the forward reference to the ROC description and the > necessity of the italicized phrase beginning page 16. These results > seem more appropriate as followup analyses for Section 3.6. As far as > I can determine, the point of Section 3.6 is to show that the method > does not perform as well on images taken from geographically diverse > areas of the planet as on images taken from "homogeneous" areas. The > results from Section 2.5 seem to be a followup analysis of "Why not?" > What used to be Section 2.5 has now been moved "forward" to the end of Section 3 as you suggest: this should again improve readability considerably. > 2) The quality of the writing is uneven. The reviewers give many > specific recommendations for improvement. However, as suggested by > Reviewer 1, what the paper really needs is thorough, diligent editing > with the goal of sharpening each paragraph. The figures need > particular work, as described by the reviewers. > One of us specifically focused on this issue and carefully reviewed the paper for readability, and carried out extensive "word-smithing". We cannot guarantee that the readability is exemplary at this point, but it is clearly much better than the original version. Almost all of the figures have been revised with a view towards consistency and readability. > 3) As the paper stands, it is interesting as a description of a > prominent, real-world application. However, the contribution to the > science of Machine Learning seems weaker than I believe it could be. > As Reviewer 2 points out: "What is much less clear ... is what lessons > the RESEARCH community should draw from this paper." 
> > I believe that for a science such as Machine Learning to remain > vibrant (and not become collection of esoterica), the applied/academic > research cycle must be completed with the publication of applied > papers that point out important things missing from the science, and > point out where the science should be directed in order to have > practical import. > > Since you have considerable experience in applying machine learning > technologies *and* an appreciation for the science, I urge you > strongly to give the matter deep consideration. I believe that a > well-constructed set of "lessons for the research community" would > significantly increase the scientific contribution of this paper. For > generality, the lessons should be linked to other applications where > they also apply. I realize that this is not a trivial undertaking; > however, given the breadth of knowledge of the literature represented > by the seven authors, I would expect that the task would not be > prohibitive. > > Let me suggest a few areas that stand out, based on resonance with > applications familiar to me. > > * Choosing/creating class labels (What is the best thing to do when > experts disagree with each other? In the MAX application, we > encountered the same difficulty. [I can give you a to-appear > paper, if you'd like.]) > > * Comparing classifiers -- "accuracy" was not really important (ROC > comparisons were). This seems to come out almost every application. > (check out: http://www.cs.umass.edu/~fawcett/papers/KDD-97.ps.gz) > > * Feature engineering was very important > > * example independence (for training/test splits) > > * use of prior knowledge > > How do these important issues compare with the foci and directions of > ML research? Is ML research concentrating on the right problems? Are > there simplifying assumptions made in the research community that > should be weakened? 
> > Since applications work is by necessity limited in the number of > alternatives it can explore, it is important for it to say to the > research community, "Ok, here's a facet of our problem for which you > gave us little guidance. Our solution may be ad hoc, but it works. > We believe that the problem will manifest itself in other > applications. Would you please study it in a more principled fashion, > so future practitioners have more guidance than we did." > We have thought carefully about this and have updated Section 5 (lessons learned) to add more specific recommendations on what research issues researchers might look at to help the practitioners. Since several of us are machine learning researchers ourselves, we tried to keep this section short and to the point, since clearly one could generate many pages of desiderata! But the new section 5 should reflect the spirit and intention of your comments above. ********************************************************************* > Reviewers' comments follow. Please address their specific criticisms. > > ---------------------------------------------------------------------- > > REVIEWER 1: > > Paper Review Form > for > Machine Learning journal > Special Issue on Applications of Machine Learning > and the Knowledge Discovery Process > > > Title: Learning to Recognize Volcanos on Venus > Author(s): Burl, Asker, Smyth, Fayyad, Perona, Crumpler & Aubele > > > ______________________________________________________________________ > APPROPRIATENESS: Is this paper appropriate for this special issue? > ______________________________________________________________________ > > Yes. > > ______________________________________________________________________ > CONTRIBUTION: What contribution is made by this paper? Assess the > novelty of the contribution. Assess the generality of > the contribution. How significant is the contribution > from an applications perspective? From a research > perspective? 
Will it stimulate future research? Does > it provide useful knowledge for future applications? > > Note that the contribution may be more general than the > introduction of a new learning method. Possible > alternatives are: analyses of simplifying assumptions > commonly made in machine learning literature that cannot > be made in applications, comparisons of methods for > addressing such simplifying assumptions, modifications > of existing methods to address applications issues, > methods for other aspects of the knowledge discovery > process, analysis of why existing methods fail for a > particular application, analysis of the overall process > of applying machine learning methods, etc. > ______________________________________________________________________ > > The main contribution is the demonstration of a very substantial > application requiring ML and other technologies to solve an important > problem in planetary geology. The abstract problem (automated feature > engineering from pixel images where class assignment is expensive > and subjective) is common to many real-world problems, so practitioners > will find the wealth of techniques explored here valuable. > No very new technique is proposed here, but the wide scope of techniques > integrated is impressive. As well as empowering and emboldening > practitioners, I believe it will be stimulate research because of the > interesting problems it suggests. > > ______________________________________________________________________ > DEFINITION & MOTIVATION: > > Is the application task clearly defined? Is the need for machine > learning motivated well? Are existing or alternative methods > discussed. Are the important facets of the task identified in > such a way that the results will generalize to similar problems? > Is a class of problems with similar characteristics defined? > Are the evaluation criteria well motivated from the perspective > of the application task (i.e., do they reflect the target task)? 
>
> Is the knowledge discovery process well defined? Does the paper
> focus on the entire knowledge discovery process, or a particular
> segment?
>
> Are the methods clearly defined? Is their use motivated well?
> Is their choice (over other methods) well justified? Are the
> methods specific to this problem, or will they generalize? Are
> the limitations on generalization enumerated? Are the methods
> technically sound?
> ______________________________________________________________________
>
> Yes on all counts, subject to the following.
>
> I did not check all technical details; some range outside my competence.
>
> The authors do not address the comprehensibility of their classifiers,
> so a few critics might argue that this isn't knowledge discovery.
> In my opinion this avenue could still be explored, and the paper
> should not be discounted on this criterion.
>
> The paper shows the typical signs of having a large number of authors:
> parts that don't fit together well, an uneven standard of editing,
> and verbose ``committee prose'' lacking sharpness and brevity.
> The paper has hardly one paragraph that would resist tightening,

The entire paper was reviewed and revised for readability, as mentioned earlier.

> but here are a few examples that stick out.
> * p. 2 "there is a a clear-cut need for automated algorithms..."
> Cut "-cut" and "automated" (Are there any non-automated algorithms?)
> * p.20: "Geologist B is relatively conservative relative to Geologist A."
> * The "asymptotic performance of the method" sounds optimistic and impressive,
> a humble phrase such as "the best performance possible" would do better.
> * The first paragraph of 2.2 takes three long sentences to say that the quality
> of the images isn't high enough to support unequivocal labels.
> * p.25 "This application has been no exception in this regard."

We have corrected all of these and are grateful for the referee's diligence in spotting them.
> Of course many journal articles are long-winded in expression, > but this paper has too many floating generalizations in the empirical > evaluation, a place where we should be seeing hypotheses pinpointed onto data. > P.24: ``the geologist's detection performance appears to be in the same > general region as it was for the homogeneous images.'' > Changed to be more precise. > ______________________________________________________________________ > SUPPORT: How do the authors provide support for the paper's > contribution (e.g., empirical evaluation, theoretical > justification, demonstration, survey of the literature, > etc.). Do they provide sufficient support for their claims? > Is the argument technically & logically sound? > ______________________________________________________________________ > > The meandering empirical evaluation left me uncertain. On p. 18, there's a long > jump ahead to Figure 11; I found myself prospecting for conclusions > rather than having them delivered to me already refined. Why not start > by stating your important hypotheses then tell the reader what's testing what. As mentioned earlier, this section has been moved "forward" and the entire results section reorganized. > Several flaws are distracting at best and undermine confidence at worst. > As far as I can determine, Figure 2.5 on p.16 should be Figure 7; "Old F" > at the top of Figure 7 should be "OLD4"; "New F" should be "NEW5"; > all axes in Figure 7 should be given labels. These flaws suggest that > the authors have not sweated over the Figures in a long and earnest attempt > to make them more easily render the key hypotheses to the reader. Corrected. > > As I understand it, Leave-one-out is a limiting case of cross-validation, > but here LOO and XVAL are used to distinguish whether an image or example > is reserved for testing: this confuses. 
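A small sketch may help pin down the distinction the reviewer raises here: reserving one example per fold and reserving one whole image per fold are both forms of cross-validation, and they differ only in the unit that is held out. The code below is illustrative only. The function names and the toy image ids are assumptions made for this note, not part of JARtool, and it does not claim to reproduce which of the two schemes the paper labels LOO or XVAL.

```python
from collections import defaultdict


def leave_one_example_out(n_examples):
    """Yield (train, test) index lists, reserving a single example per fold."""
    for i in range(n_examples):
        train = [j for j in range(n_examples) if j != i]
        yield train, [i]


def leave_one_image_out(image_ids):
    """Yield (train, test) index lists, reserving every example from one image per fold."""
    by_image = defaultdict(list)
    for idx, img in enumerate(image_ids):
        by_image[img].append(idx)
    for img, test in by_image.items():
        # Training indices are all examples that come from a different image.
        train = [j for j in range(len(image_ids)) if image_ids[j] != img]
        yield train, test


# Hypothetical toy data: five candidate regions drawn from three images.
image_ids = ["img1", "img1", "img2", "img3", "img3"]
example_folds = list(leave_one_example_out(len(image_ids)))
image_folds = list(leave_one_image_out(image_ids))
```

With the image-level split, no image contributes examples to both the training and the test side of a fold, which is the property that matters when examples from the same image are not independent.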
>
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* change the confusing text
*
*********************************************************************
> The round figures in Table 2 suggest that two figures after the decimal
> point may be spurious precision.
>

Changed all numbers to have a single digit after the decimal point.

> ______________________________________________________________________
> DISCUSSION: Does the paper provide an adequate discussion of related
> work? Does it describe similarities, differences, and
> progress? Does the paper discuss the implications of its
> contribution? Does the paper argue for a new research
> direction? If so, does the paper provide an adequate
> description of the current state of research in this
> direction? Does the paper identify the limitations of its
> contribution and the simplifying assumptions made? Does
> the paper discuss lessons learned in applying machine
> learning that are of general interest?
>
> Does the paper discuss non-technical issues in applying
> machine learning that may be generally useful?
> ______________________________________________________________________
>
> The related work discussed is almost entirely from the applications dimension;
> the paper largely ignores other work in ML.
>
> * Techniques for dealing with contentious experts were
> investigated extensively in the EMYCIN project, but none of this is mentioned.

This is a good suggestion, but given the large literature on multiple experts (e.g., in statistics and behavioral research), we did not think that EMYCIN was any more relevant than the many other possible references. We added a reference to Cooke, a very useful overview text on this topic, for readers who want to find out more about the many aspects of this issue.
> * The basic methodology was pioneered by Michie in the '80's (in fact
> I recall him mentioning an application that classified chocolates as
> defective based on camera images, but I don't have a reference. Ask him.)
> But Michie is not mentioned,

One of us (LA) sent email to D.Michie@ed.ac.uk asking for the reference on 6/18/97 but did not get a reply. If someone can supply the reference we would be happy to include it.

> nor is Shapiro's work on structured
> induction, which might be relevant for future work on comprehensibility.

As far as we can remember, Shapiro's work dealt with categorical/discrete-valued attributes, so its relevance to what are essentially real-valued, high-dimensional sets of pixels is not obvious. If we are wrong and the referee can point out how Shapiro's work could be useful, we would be interested to hear about it.

> * None of the several paper on ROC published in the ML literature is cited.

We know of only one paper in ML (Spackman, 1989) that explicitly talks about ROC curves. We would be happy to add any more references. We also added two classic references here, both well-known textbooks in the field (Green and Swets, and Macmillan and Creelman).

> * Although the authors point to the scarcity of experts' time to label,
> they do not mention any of the ML techniques for selecting which
> examples to present to the experts for labels.

During the project we considered using active learning to select which examples to label next, but it would have been quite a complicated piece of software, so we did not implement it. Given that we did not implement it, and that the paper is already quite long, we declined to add a discussion and explanation of it in the revision.

>
>
> GENERAL: Is the paper well-organized and well-written? Does it use
> standard terminology? Has the author provided sufficient
> background? Are an appropriate number of informative figures
> included? Are results presented clearly?
Does the abstract
> adequately reflect the contents?
>
> Does anything need to be added or deleted?
> ______________________________________________________________________
>
> The references are particularly sloppy.
> * Missing reference number on p.15. [?]

Done.

> * On p.25, two references are given using the authors' names, whereas
> all the other references are numeric.

Done.

> * Reference 15 (Geman, Bienenstock & Doursat) is an excellent paper
> meriting wider attention, but I see no relevance to this article.
> As far as I could determine from a quick check, it and references 6, 34 and
> 41 are not cited in the text.

Done.

> * In reference 31, the second author's first name is Yann and his
> last name is "le Cun"; he should not be cited as Y.L. Cun.

Done.

> * Canonical references are sometime omitted in favor of more recent ones.
> For example, for LDA the oldest citation is Duda & Hart, and Egan's
**********MCB??? ******************
> classic book on ROC curves is not mentioned.

We added Green and Swets' book, which predates Egan's by about 10 years, but we would be happy to add Egan too if the reviewer would like us to.

> * On p.4, a paper by "Fayyad and colleagues" is cited. This violates
> the convention of listing all authors on the first citation. Since those
> authors are a subset of the authors of this paper they are unlikely to
> complain about being slighted; perhaps the phrase "Some of the authors [17]"
> could be substituted if they prefer.

Changed to Fayyad et al.

>
> Miscellaneous comments
>
> p.2 [one orbit... returned more data than all previous ... missions combined.]
> This sounds very impressive, but the terrestrially-minded may have no idea
> how many previous missions were run. Certainly I don't.

Counting missions is a bit tricky: there are between 10 and 30, depending on how one counts. So we reordered the sentences here to give the total amount of data returned from Magellan.
> > * p.13 Numbers below 13 (such as twelve) should be spelled out. > So change "Training consists of a 3-step process" and "These 4 images" > Done. > p.2 "volcanism on Venus which is the most.." Garden path sentence. > The following noun phrase refers to volcanism, not Venus. > Done. > * p.25 "Namely,an"; also missing or garbled text in point 5. Done. > > * I didn't understand the sentence beginning "A major drawback of LDA". > LDA is often used in high-dimensional binary classification tasks. > Removed the sentence. We agree, it is confusing. > * Please label all axes. (e.g. Fig 6 y-axis). ********************************************************************* * Things to do: label fig 6 y-axis ********************************************************************* * * MCB? * ********************************************************************* > Also, the ^2 in Figure 9 > is so tiny and distant from "km" that non-geologists might think them unrelated. > Done. > * The figure on page 4 uses the acronym SVD without explanation. > Since its purpose is introductory, the acronym could be omitted without loss. > ********************************************************************* * Things to do: MCB???? ********************************************************************* * * generate new figure without (SVD) * ********************************************************************* > * p.7 "Graben are ridges... see Figure 2." I have no idea what ridges I > should be seeing. An arrow and label might help. > ********************************************************************* * Things to do: MCB ********************************************************************* * * add an arrow in figure 2 * ********************************************************************* > * Delete the period after the word "Abstract." Done. 
(required modifying the style file) > > ---------------------------------------------------------------------- > > REVIEWER 2: > > > ______________________________________________________________________ > Paper Review Form > for > Machine Learning journal > Special Issue on Applications of Machine Learning > and the Knowledge Discovery Process > > > Title:Learning to Recognize Volcanoes on Venus > Author(s): MC Burl, L Asker, ... > > > ______________________________________________________________________ > APPROPRIATENESS: Is this paper appropriate for this special issue? > ______________________________________________________________________ > > yes. > > ______________________________________________________________________ > CONTRIBUTION: What contribution is made by this paper? Assess the > novelty of the contribution. Assess the generality of > the contribution. How significant is the contribution > from an applications perspective? From a research > perspective? Will it stimulate future research? Does > it provide useful knowledge for future applications? > > Note that the contribution may be more general than the > introduction of a new learning method. Possible > alternatives are: analyses of simplifying assumptions > commonly made in machine learning literature that cannot > be made in applications, comparisons of methods for > addressing such simplifying assumptions, modifications > of existing methods to address applications issues, > methods for other aspects of the knowledge discovery > process, analysis of why existing methods fail for a > particular application, analysis of the overall process > of applying machine learning methods, etc. > ______________________________________________________________________ > > The paper is a detailed discussion of one application, > with particular emphasis on the data processing PRECEDING machine > learning. 
The section on lessons learned is rather brief, but most of > the lessons are abundantly evident (although not necessarily > explicitly discussed) in the detailed description of the various > parts of the system/process. The lessons learned are similar to those > learned in other applications - in that sense they are not all > that novel - but this application is high profile and important > so the paper contributes heavyweight evidence in support of other > applications' observations. The paper certainly does supply > useful knowledge for future applications, at least in terms of > general considerations/guidelines. > > What is much less clear -- this is the only real weakness of the paper -- > is what lessons the RESEARCH community should draw from this paper. > The authors should add a discussion of this, perhaps in the form of > a research agenda, or research issues/methods relevant to applications > such as these. > Section 5 was comprehensively revised to add what the research agenda should be as a result of understanding the needs of these types of large-scale learning projects. > ______________________________________________________________________ > DEFINITION & MOTIVATION: > > Is the application task clearly defined? Is the need for machine > learning motivated well? Are existing or alternative methods > discussed. Are the important facets of the task identified in > such a way that the results will generalize to similar problems? > Is a class of problems with similar characteristics defined? > Are the evaluation criteria well motivated from the perspective > of the application task (i.e., do they reflect the target task)? > > Is the knowledge discovery process well defined? Does the paper > focus on the entire knowledge discovery process, or a particular > segment? > > Are the methods clearly defined? Is their use motivated well? > Is their choice (over other methods) well justified? Are the > methods specific to this problem, or will they generalize? 
Are > the limitations on generalization enumerated? Are the methods > technically sound? > ______________________________________________________________________ > > With minor exceptions, the answer to all these questions is "yes". > > > ______________________________________________________________________ > SUPPORT: How do the authors provide support for the paper's > contribution (e.g., empirical evaluation, theoretical > justification, demonstration, survey of the literature, > etc.). Do they provide sufficient support for their claims? > Is the argument technically & logically sound? > ______________________________________________________________________ > > Support is given in two forms. The first is scientific: an objective > measure is defined and used to evaluate the system's performance. > Ample support of this type is given for all claims made. > The second type of support is user-acceptance. The paper is not entirely > clear on the degree of user-testing/acceptance, or even if it is too early > to say (the final version will only be released in late spring 1997). ********************************************************************* * Things to do: MCB ********************************************************************* * * It would be nice to say that it is fielded! * Is it??? * ********************************************************************* > > One argument that is not clearly presented is the analysis of the > experiment in Figure 7 (p.16), starting with "The key observation is...". > The logic underlying this argument is subtle and it took quite a lot of > head scratching for me to reconstruct it (and even now I have some doubts). > Perhaps the trickiest point is how the non-volcanoes in one image can > disrupt classification of the volcanoes in another and yet the volcanoes > in the first image are similar to those in the other. This may be too > simple-minded a way of putting things, but it is how even a careful > reader will reason. 
The argument needs to be fully, carefully given so that
> it will convince readers.
>
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* This text needs to be clarified!
* I think only MCB can do this... LA
*
*********************************************************************
> On a related matter, I do not understand what justification the authors
> have in mind for the claim (p.16): "The effect of including the
> test example in the training set is not important...". Could not the
> large difference between tts:svd-gauss and xvt:svd-gauss (on the New
> data) be due to precisely this point ?
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* This too needs to be clarified
*
*********************************************************************
>
> Another technical point that is not well explained is the use of FROC
> instead of ROC (p.18). ROC is initially defined as having absolute
> numbers on its axes, but the next paragraph says it plots percentages
> (the paper calls them probabilities, or "rates"). I don't see why an
> ROC with percentages cannot be used, why must the false alarm RATE
> be divided by the number of images ? A rate is a rate.
> It's even more confusing (to me) to divide by an area (km^2) - and the
> paper seems to freely move between "images" and "km^2" (the figures
> use km^2 but the corresponding tables use "per image"). If there is
> a good reason, please explain it more clearly.
>
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* I agree with the reviewer. The use of "false alarm rate/ km2" is
* a mistake. It has been changed to "False Alarms / km$^{2}$" in
* the figures in the experimental section.
*
* However, we need to have a uniform format for all figures.
* Can you (MCB) change Figure 7 to be false alarms per square km instead?
*
* LA has changed all the tables to include a line with info about false
* alarms per square km!
*
*********************************************************************
> In some experiments you use geologist A as the reference, in others
> you use consensus as the reference. It is a pity you don't use the
> same reference throughout - as it is, results of different experiments
> cannot be compared (see next point). Could you possibly use consensus
> as a reference for all the experiments - do you have the data ? If you do,
> please use it. If there are practical reasons preventing you from
> doing this, they should be listed among the lessons learned.

Added:
*   Whenever possible we have used the consensus labels as the
*   reference; in the cases where we only had access to individual
*   labels, we have used the labels of Geologist A as the reference.
*
* to the first paragraph of Section 3.
*
>
> In Figure 11, the "starting" detection rate (when false alarm rate/km^2
> is 0.001) is MUCH higher - double - that in any other Figure. Why ?
>
*********************************************************************
* Things to do: MCB ????
*********************************************************************
*
* explain the axes on the figures (maybe in a comment to the reviewer only?)
* The axes do not start at 0!!
* 6/19/97 LA
*
*********************************************************************
> ______________________________________________________________________
> DISCUSSION: Does the paper provide an adequate discussion of related
> work? Does it describe similarities, differences, and
> progress? Does the paper discuss the implications of its
> contribution? Does the paper argue for a new research
> direction? If so, does the paper provide an adequate
> description of the current state of research in this
> direction?
Does the paper identify the limitations of its > contribution and the simplifying assumptions made? Does > the paper discuss lessons learned in applying machine > learning that are of general interest? > > Does the paper discuss non-technical issues in applying > machine learning that may be generally useful? > ______________________________________________________________________ > > As mentioned above, the paper lack a discussion of the implications of > its findings. As I understand it, JARtool is the artful combination > of off-the-shelf -- indeed textbook -- techniques (matched filters > for FOA, PCA for dimensionality reduction, simple statistical learning > for the final classification). Do the authors feel that simple, > well known techniques will suffice (if used cleverly) for a wide range > of applications ? If so, what sort of applications require more > sophisticated techniques ? What research directions would they propose ? > What weaknesses of the simple techniques did they notice ? In the "lessons learned" section 5 we added some text to mention these issues. For example, we have added a significant number of research directions. ******* * MCB???? Mention limitations in the conclusion: we had * a number of these in the PAMI paper we could pull out ****** > > ______________________________________________________________________ > GENERAL: Is the paper well-organized and well-written? Does it use > standard terminology? Has the author provided sufficient > background? Are an appropriate number of informative figures > included? Are results presented clearly? Does the abstract > adequately reflect the contents? > > Does anything need to be added or deleted? > ______________________________________________________________________ > > The paper is very well written and organized. A large part of the > paper deals with techniques from the statistical/pattern recognition > literature (matched filters, PCA, quadratic/Gaussian classifiers). 
> The authors astutely recognized that these will not be familiar
> to many of the readers and have included a brief description of them.
> These descriptions are helpful but not always entirely satisfactory: in
> particular, I would ask them to improve the discussion of singular value
> decomposition, and to add a full description of the quadratic/Gaussian
> learning algorithm/classifier that they use.
>
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* we need a 1 sentence definition of what PCA is. (precisely).
*
* include one sentence PRECISE DEFINITION of what SVD is.
* Relate to PCA. (e.g. "SVD is a numerical technique to
* estimate eigenvectors in cases where...")
* i.e. WHY are we using it?
*
* define quadratic/Gaussian classifier precisely. e.g. as in
* Duda and Hart, e.g.
* log p(w$_{i}$|x) = -(1/2)(x-$\mu_{i}$)$^{T}\Sigma_{i}^{-1}$(x-$\mu_{i}$) etc...
* to give quadratic form. Parameters estimated directly by
* maximum likelihood.
*
*
*********************************************************************
> My remaining comments are minor points of clarification.
> p. 4 - "however." all by itself as a sentence is wrong.

Corrected.

> pp.6-7 (section 2.1): refers twice to Figure 2 ("Observe that...",
> "see Figure 2") to explain features of the images but in Figure 2
> there is nothing to indicate where the volcanoes, graben etc. are
> so these references explain nothing
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* include pointers to graben, volcanoes in fig.2
*
*
*********************************************************************
> p.6 - the last few sentences describing what volcanoes look like in
> these images (starting with "Observe that...") are very hard to follow.
> I think we need to have a portion of image with a prototypical
> volcano to accompany this description.
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* maybe add a fig. with one "prototype" volcano
*
*********************************************************************
> p.11 - "the results are not particularly sensitive to the value of k"
> OK, but please tell us what k is, or over what range the results remain
> the same. Aside from mere interest this is important on the next
> page when we talk about dimensionality (k^2).

Added:
   (where $k$ is the number of pixels of the height and width of the
   region).
after:
   Let ${\bf v}_i$ denote a $k \times k$ pixel region around the
   $i$-th training volcano

> p.11 - you mention (p.10) that FOA should operate with a high detection
> rate, regardless of false positive rate. I would like to see some numbers
> in this section - what are the typical number of volcanoes and
> non-volcanoes FOA finds ? Again, this is important to understand
> the claim (p.12) that the number of (positive) training examples is small
> relative to the dimensionality.
*********************************************************************
* Things to do: MCB, LA
*********************************************************************
*
* Add a forward pointer to the experimental results???
*
*********************************************************************
> This is also a good opportunity to
> mention the fact that FOA does miss some volcanoes (this is currently
> mentioned in the experimental results).

This is in fact already mentioned where the FOA is first defined.

> p.14 (Figure 6): more explanation is needed of (b) and (c).
> For instance, the manner of displaying the principal components is nowhere
> explained.
*********************************************************************
* Things to do: MCB ???
*********************************************************************
*
* explain the manner of displaying the principal components
*
*
*********************************************************************
> Neither is "importance" of a feature or the "singular value
> decay" which are the key points being discussed.
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* explain the "singular value decay"
*
*********************************************************************
> The Y-axis in (c) has no title.
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* The Y-axis in (c) has no title.
*
*********************************************************************
> There should be letters (a), (b), and (c) in the figure.
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* add (a), (b), and (c) in the figure.
*
*********************************************************************
> p.15 - "there are arguments [?]". Fill in the missing reference and
> summarize the arguments.

Corrected.

> p.15 - "the choice of classifier is a secondary effect" - I think you mean
> "the choice of classifier is of secondary importance".

Corrected.

> p.16 (Figure 7) - it was not easy to tell the different lines apart, I had
> to stare closely. One suggestion is to change the Y-axis to be
> the DIFFERENCE between the two training/testing methods - instead of
> two curves for each learning algorithm you would then have just one
> and big/small differences would be immediately apparent.
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* change figure 7
*
*********************************************************************
> p.16 - it is not acceptable to refer the reader several pages forward
> to the description of the ROC methodology. Without that description
> Figure 7 is incomprehensible. At the very least you need to say
> enough here so that the reader knows how to read the figure and
> appreciate how these curves can be used to compare performance. Without
> that the subtle argument that follows is meaningless.

Moved the whole paragraph starting with:
   "An important characteristic of any automatic detection..."
and ending with:
   "... vary considerably from image to image."
and Figure 7 forward to the experimental section. (We also reorganized
the experimental results in general.)

> p.17 - the phrase "false alarms" is being used where I think it is more
> appropriate to say "non-volcanoes" (these are false alarms of the FOA
> but would it not be less confusing to use "false alarms" to refer
> only to the non-volcanoes misclassified by the final classifier ?).
> There may be other places where this should be corrected.

It is fairly common in automatic target recognition (for example) to use
matched filters as FOAs and to refer to false alarms, *even if these
"false alarms" are processed by subsequent stages and possibly
eliminated*. Since we have also used the term in our other publications,
we would prefer not to make this change.

> p.18 - "... probability threshold at an unknown test region ..." - the word
> "which" is missing after "at"

Corrected.

> p.19 - "The FROC curve is implicitly parameterized..." - I think I can
> guess what you mean but please state this more explicitly/clearly.
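
A note toward the reformulation requested here. This is a minimal sketch
under our own assumptions, not the JARtool code: each point of a FROC
curve corresponds to one value of the probability threshold applied to
the classifier output, and sweeping that threshold traces out the curve,
which is what "implicitly parameterized" is taken to mean. The function
name froc_points, its arguments, and the numbers in the usage note are
hypothetical and chosen only for illustration.

```python
def froc_points(scores, is_volcano, n_reference_volcanoes, area_km2, thresholds):
    """Return one (false alarms per km^2, detection rate) pair per threshold.

    scores: estimated volcano probability for each candidate region
            passed on by the focus of attention (FOA)
    is_volcano: reference label (True/False) for each candidate
    n_reference_volcanoes: number of volcanoes in the reference list,
            including any that the FOA missed (assumed known)
    area_km2: total imaged area covered by the candidates
    """
    points = []
    for t in thresholds:
        # A candidate is reported as a volcano when its score reaches t.
        detected = sum(1 for s, v in zip(scores, is_volcano) if v and s >= t)
        false_alarms = sum(1 for s, v in zip(scores, is_volcano) if not v and s >= t)
        points.append((false_alarms / area_km2, detected / n_reference_volcanoes))
    return points
```

For example, with candidate scores [0.9, 0.8, 0.6, 0.4, 0.2], reference
labels [True, False, True, False, True], four reference volcanoes (one of
them missed by the FOA) and 100 km^2 of imagery, the thresholds 0.7, 0.3
and 0.0 give detection rates 0.25, 0.5 and 0.75. The curve cannot rise
above 0.75 in this toy case because a volcano missed by the FOA is never
passed to the classifier.
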
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* reformulate!
*
*********************************************************************
> p.19 - in describing ROC/FROC curves you forgot to say the most
> important thing - that perfect performance is in the upper left
> corner of the plot and therefore if one performance curve is above
> and/or to the left of another then it is strictly superior.
> i.e. you need to explain how to compare two such curves.
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* explain
*
*********************************************************************
> pp.19ff - in addition to the number of volcanoes in the training set
> please tell us the number of non-volcanoes
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* add the number of non-volcanoes for threshold 0.35
* for all the training sets (maybe in the appendix?)
*
*********************************************************************
> Figures 9 - 12: all include a line labelled FOA=0.35, but this line
> appears at different heights - shouldn't it be the same everywhere ?
> Also, what does 0.35 mean ? I think you mean that 0.35% of the volcanoes
> are missed by FOA: if that is right, you should label the line 99.65
> since the Y-axis is detection rate.
* comment to the reviewer: No, it does not mean that 0.35% of the
* volcanoes are missed by the FOA. The number 0.35 is the
* threshold used by the FOA.
*
* changed: The dashed line across the top of Figure~\ref{fig:old4}
* is the best performance possible.
*
* to: The dashed line across the top of Figure~\ref{fig:old4}
* marks the best possible performance using a FOA threshold of 0.35
*
> p.20 - "The '*' symbol is the performance...": I could not find this symbol
> anywhere in Figure 10. More generally I had trouble finding most of the
> single points in these figures, can you use a bigger/bolder font for them ?

Changed the symbols to a bigger font and moved the 2 in km$^{2}$

> p.20 - "The results shown as a dashed line in..." please mention that this
> line is labelled FOA.

Done.

> pp.20ff - how were the operating points in Tables 2-5 chosen ?
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* add a sentence about how they were chosen.
*
* They were chosen so that they would be as close as possible
* to the scientists' performance. The points on the ROC curve
* that are closest to each respective scientist have to lie between
* the highest and the lowest operating point.
*
* This can be described better! Can you do this Padhraic?
*
*********************************************************************
> pp.20ff - the figures' X-axis is "false alarm RATE/km^2" whereas
> the corresponding tables give "number of false alarms per image".
> Why have we change from rate to number, and from km^2 to image ?
**** MCB check******
*
* added a line to each table with FA / 10000 km2
*
* 6/21/97 LA
*
*********************************************************************
*********************************************************************
* Things to do: -------- OK --------
*********************************************************************
*
* changed the figures to false alarms / km$^{2}$
*
*********************************************************************
> p.22 - "gives a better accuracy". Accuracy ? Do you mean detection
> rate for a given false alarm rate ? Be precise.
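
One way to make this comparison precise, offered as a sketch under
assumptions rather than as the evaluation code used for the paper: fix a
false alarm rate and report the best detection rate each method reaches
without exceeding it. The function name detection_rate_at and the
representation of a curve as a list of (false alarms per km^2, detection
rate) pairs are hypothetical.

```python
def detection_rate_at(curve, max_false_alarm_rate):
    """Best detection rate among operating points within a false alarm budget.

    curve: iterable of (false alarms per km^2, detection rate) pairs,
           one pair per operating point (an assumed representation)
    max_false_alarm_rate: the false alarm rate at which methods are compared
    Returns 0.0 when no operating point stays within the budget.
    """
    eligible = [det for fa, det in curve if fa <= max_false_alarm_rate]
    return max(eligible) if eligible else 0.0
```

Calling this function with the same max_false_alarm_rate on the curves of
two methods yields two detection rates that can be compared directly,
which is a more specific statement than "better accuracy".
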
*********************************************************************
* Things to do: MCB: still needs fixing, Lars
* new sentence is not much better than the old one
*********************************************************************
*
* changed "In general, the combined classifier approach
* gives a better accuracy than the baseline method"
*
* to: "In general, the combined classifier approach
* performs slightly better than the baseline method"
*
*
*
* 6/20/97 LA
*
*********************************************************************
> p.22 - "due to its simplicity and speed". The combined classifier, if
> I understand it, is just the voting (or maybe even "or") of 4 of
> the simple classifiers - so it is just 4 times slower, a negligible
> amount is it not ? And why would anyone regard this obvious way
> of combining 4 simple classifiers as unacceptably complex ?
*********************************************************************
* Things to do: MCB - check
*********************************************************************
*
* changed: This result is consistent with other experiments
* described in \cite{Asker97a}.
*
* into: The result is consistent with other experiments
* presented in \cite{Asker97a} (this paper also
* describes the technique used for combining the classifiers).
*
*
* Somebody should explain that the combined approach requires
* more examples before it can get the same accuracy.
* 6/20/97 LA
*
*********************************************************************
> p.25 - "The geologists find that the performance...is quite satisfactory".
> It is important to tell us WHERE on the ROC curve the performance is
> regarded as satisfactory.
*********************************************************************
* Things to do: MCB check
*********************************************************************
*
* added:
* The geologists find that the performance of the method,
* in terms of finding the location of the category 1 and 2 volcanoes,
* is quite satisfactory, and expect that the fielded version of the
* system will considerably reduce their work effort in creating a
* comprehensive catalogue of the small volcanoes on Venus.
* Here:
* The actual
* operating point on the ROC curve is still to be determined.
* After discussions with the scientists, we decided to design the
* interface in such a way that the operating point can be easily changed.
* Both our experiments and the scientists' tests of the system indicate
* that the optimal choice of operating point will vary between different
* areas of the planet depending on factors such as terrain type and local
* distribution of the volcanoes.
*
* 6/22/97 LA
*
*********************************************************************
> p.26 - the sentence fragment starting "and widely ignored in published..."
> is out of place.

Corrected.

> p.26 - lesson #6, and (p.27) THE MOST DIFFICULT issue you faced,
> concerns how to use prior knowledge to help solve the problem.
> Yet nowhere in the paper do I recall seeing this issue mentioned
> or a discussion of the method you used to exploit prior knowledge.
> If you did somehow systematically use prior knowledge, please tell us
> what the knowledge was, how you elicited from experts, and how you
> exploited it in designing the system.
>

The reviewer makes a good point: lesson #6 does not match the rest of
the paper, so we removed it.

> ----------------------------------------------------------------------
>
> REVIEWER 3:
>
>
> Learning to Recognize Volcanoes on Venus
> M.C. Burl, L. Asker, P. Smyth, U.M. Fayad, P. Perona, L. Crumpler, and
> J. Aubele
>
> Appropriateness:
>
> Yes, this paper is definitely appropriate. Using your call, it 1)
> describes a significant real-world problem, 2) it focuses on the
> application, 3) it covers the entire process from raw data to fielded
> system.
>
> Contribution:
>
> The closest machine learning work to this application is the SKICAT
> work by the same authors. But as the paper mentions the segmenting of
> the objects from the background is relatively easy in that domain and
> there is a good set of predefined features provided by the domain
> experts. It is this novel part of the KDD process (creation of the
> dataset from the initial raw data) which is dwelt upon in this paper
> (although the entire KDD process is discussed). The paper mentions
> that several non-ML methods have been tried on this problem without
> great success (and cites the work).
>
> The paper states that the JARtool is been evaluated for use on other
> detection problems. Although it has not been generalized already, I
> think it is clear that it can be used in other areas. Additional
> techniques might need to be used in the focus of attention, labeling
> of training data, and feature extraction areas to make this possible.
> These techniques themselves are the focus of this paper and it is
> unclear how well they will generalize. But since there has been so
> little work in this area (data engineering etc.), I believe this lack
> of well-defined generalization is to be expected. We are still
> creating enough data with the current applications to be able to
> generalize it in the future.
>
> This is a significant application. The paper states that when the
> final version is released in late spring 97 it is expected to have a
> major impact on geologist's ability to access information regarding
> small volcanoes in the Magellan dataset. This would be stronger if
> the system was already released and the geologist were already
> routinely using it. Hopefully by the time of a final submission of
> this paper, a stronger statement could be made.
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* Same as earlier. What is the status now?
*
* 6/22/97 LA
*
*********************************************************************
>
> The paper contains significant research in the area of data
> engineering. This area has not been highly represented amongst the
> published documents of machine learning. But it is a (if not the)
> critical area for any successful application. The main focus of this
> paper is the process of getting from the raw data to a set of features
> suitable for a "off-the-shelf" machine learning algorithm. Hopefully
> this will stimulate the field into focusing more attention on this
> area, instead of continuing to focus their attention on new algorithms
> that are .05% more accurate on the Irvine dataset. Their particular
> approach might turn out to be useful for other image detection
> problems. They are considering applying JARtool itself to other
> domains. The exact techniques used here are not directly applicable
> to non-image domains, but the general process of transforming the data
> is the same. So I believe the general process will be useful even in
> non-image domains.
>
>
> Definition & Motivation:
>
> The application task is clearly defined and examples are given of non
> machine learning tasks which were not very successful. One question
> which arises is the final results show that testing on homogeneous
> images was very successful and testing on non-homogeneous images was
> much less so. Could this fully account for the non-success of the
> non-ML techniques (that they tested on non-homogeneous data). Since no
> direct comparison is done, this is a question which remains in the
> readers mind.
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* I don't think this is the case; I think Wiles & Forshaw
* used simulated (?) data which was pretty homogeneous. Mike can check
* this.
*
* + our own matched filter results
*
*
*********************************************************************
>
> The results will generalize to other image detection problems.
> Slightly different normalizations might have to be added or changed
> but the general technique should work. One question which arose from
> reading the paper is why normalization for the direction of the
> illumination is not tried. The class of image detection problems is a
> fairly large and significant one.
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* Mike: generate some response to reviewer on why this is not
* important given PCA. (and SAR? LA)
*
*
*********************************************************************
>
> The evaluation criteria was well motivated by the application task but
> is unfortunately fairly hard to quantify. They wanted the system to
> do perform as well as a geologist. Sine there is a lot of variation
> between the geologists themselves this is a difficult claim to
> experimentally validate.
>
> The whole KD process is presented in sufficient detail. But the data
> engineering portion (or the transformation from the raw data to the
> final dataset used for learning) is the main focus of the paper.
>
> The methods (both for learning and for transforming the data) are
> clearly defined. The data transformations are motivated well. The
> machine learning algorithm is chosen because it provides some
> properties (posterior probability estimates) of advantage in the
> application. But the point is made that all the classification
> algorithms tried performed in the same range. This nicely
> illustrates the point that the particular machine learning algorithm
> chosen is not usually an important issue in a successful application.
>
> The methods used throughout the KDD process are technically sound.
>
>
> SUPPORT:
>
> The support given for the technique is empirical and technically
> sound. One interesting facet of this is that since the evaluation is
> "to perform as well as a person" the empirical evaluation is not
> straight forward. This paper will give people a good model of
> preparing a similar type of evaluation in other domains.
>
> Discussion:
>
> The discussion of related work is extensive and compares the
> similarities and differences with other techniques were needed. The
> paper emphasizes that deriving an appropriate representation is a key
> ingredient to a successful application. It also states that another
> key issue is how to model subjective human opinion in evaluating a
> system's performance to a persons or to get training data. The paper
> includes a section on lessons learned which are of general interest.
>
> General:
>
> The paper is well organized and well written. It uses standard
> terminology and gives sufficient background about the application. It
> contains appropriate figures and the results are presented clearly.
> The abstract is adequate.
>
> The paper might be a bit long, but describing the domain and the image
> analysis steps are necessary and take a good deal of space.
>
> Confidence:
>
> high
>
> Recommendation:
>
> accept with minor revising
>
> General fixes for author:
>
> pg 8 "Two label label events..."

Corrected

>
> I believe a number of your citations are incorrect. For instance in
> page 15 you cite [?] and on page 16 you talk about Figure 2.5 which
> doesn't exist.. I believe it is Figure 7.

Corrected.
>
> You discuss these here as ROC curves but I believe from your
> discussion later that they are actually FROC curves (this is a bit
> confusing). Also presenting these curves here without explaining them
> until later is a bit frustrating to the reader. Could you at least
> name the axis of these charts here so the reader doesn't have to wait
> 3 pages until he understands them.
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* change the figures to FROC curves
*
*********************************************************************

We also moved this section forward to the experimental results section,
which makes it "flow" much better.

>
> Please try to match up the tables and figures better (I know this is
> difficult in Tex), but for instance Figure 10 on page 21 goes with
> table 3 on page 22 instead of table 2 which appears right below it.
*********************************************************************
* Things to do: MCB
*********************************************************************
*
* match up the tables and figures (after everything else is done)
*
*********************************************************************
>
> You talk about the "*" being "one of the authors". This leaves the
> reader wondering why a "machine learning person" is so good at
> detecting volcanoes relative to geologist A and B. If "*" is in fact
> a geologist please mention that in the paper.
*********************************************************************
* Things to do: MCB, PJS, LA
*********************************************************************
*
* This is confusing, especially since Jayne and Larry are
* co-authors. We can no longer say "one of the authors (MCB)".
*
* Apart from that, MCB spent a long time working on volcanoes!!
* Can we say this in the paper? (We have old data with Caltech
* students who were not volcano experts (-but everyone does pretty
* well))
* 6/22/97 LA
*
*********************************************************************
>
> On Page 24 you say "clearly both classifiers are performing...but only
> one classifier is shown in Figure 12 which is being referred to.
*********************************************************************
* Things to do: -------- OK --------
*********************************************************************
*
* changed: Clearly both classifiers are performing worse than
* on the more homogeneous image sets. For example...
*
* into: Here the classifier performs worse than on the more
* homogeneous data sets.
*
* 6/22/97 LA
*
*********************************************************************
>
> On page 25 you start going to a different method of cites (now using names
> instead of numbers).
>

Corrected.

> On Page 26 "Namely,an" needs a blank.
>

Corrected.

> ----------------------------------------------------------------------
>
> REVIEWER 4:
>
>
>                          Paper Review Form
>                                 for
>                       Machine Learning journal
>          Special Issue on Applications of Machine Learning
>                 and the Knowledge Discovery Process
>
>
> Title: Learning to Recognize Volcanoes on Venus
> Author(s): Burl, Asker, et al.
>
>
> ______________________________________________________________________
> APPROPRIATENESS: Is this paper appropriate for this special issue? yes
> ______________________________________________________________________
>
>
> ______________________________________________________________________
> CONTRIBUTION: What contribution is made by this paper? Assess the
>               novelty of the contribution. Assess the generality of
>               the contribution. How significant is the contribution
>               from an applications perspective? From a research
>               perspective? Will it stimulate future research? Does
>               it provide useful knowledge for future applications?
>
>               Note that the contribution may be more general than the
>               introduction of a new learning method. Possible
>               alternatives are: analyses of simplifying assumptions
>               commonly made in machine learning literature that cannot
>               be made in applications, comparisons of methods for
>               addressing such simplifying assumptions, modifications
>               of existing methods to address applications issues,
>               methods for other aspects of the knowledge discovery
>               process, analysis of why existing methods fail for a
>               particular application, analysis of the overall process
>               of applying machine learning methods, etc.
> ______________________________________________________________________
> The authors have taken general machine learning and statistical analysis
> techniques and have defined methods that allow planetary geologists to
> deal effectively with large image databases.
>
> ______________________________________________________________________
> DEFINITION & MOTIVATION:
>
>    Is the application task clearly defined? Is the need for machine
>    learning motivated well? Are existing or alternative methods
>    discussed. Are the important facets of the task identified in
>    such a way that the results will generalize to similar problems?
>    Is a class of problems with similar characteristics defined?
>    Are the evaluation criteria well motivated from the perspective
>    of the application task (i.e., do they reflect the target task)?
>
>    Is the knowledge discovery process well defined? Does the paper
>    focus on the entire knowledge discovery process, or a particular
>    segment?
>
>    Are the methods clearly defined? Is their use motivated well?
>    Is their choice (over other methods) well justified? Are the
>    methods specific to this problem, or will they generalize? Are
>    the limitations on generalization enumerated? Are the methods
>    technically sound?
> ______________________________________________________________________
> All aspects of the paper are well motivated. The KDD process is well
> defined. The paper focuses on the entire KDD process and through the
> participation of users it shows how the process is actually being used.
>
> ______________________________________________________________________
> SUPPORT:    How do the authors provide support for the paper's
>             contribution (e.g., empirical evaluation, theoretical
>             justification, demonstration, survey of the literature,
>             etc.). Do they provide sufficient support for their claims?
>             Is the argument technically & logically sound?
> ______________________________________________________________________
> The authors provide adequate support for the paper's contribution.
>
> ______________________________________________________________________
> DISCUSSION: Does the paper provide an adequate discussion of related
>             work? Does it describe similarities, differences, and
>             progress? Does the paper discuss the implications of its
>             contribution? Does the paper argue for a new research
>             direction? If so, does the paper provide an adequate
>             description of the current state of research in this
>             direction? Does the paper identify the limitations of its
>             contribution and the simplifying assumptions made? Does
>             the paper discuss lessons learned in applying machine
>             learning that are of general interest?
>
>             Does the paper discuss non-technical issues in applying
>             machine learning that may be generally useful?
> ______________________________________________________________________
> The paper provides good discussion of related work.
>
>
> ______________________________________________________________________
> GENERAL:    Is the paper well-organized and well-written? Does it use
>             standard terminology? Has the author provided sufficient
>             background? Are an appropriate number of informative figures
>             included? Are results presented clearly? Does the abstract
>             adequately reflect the contents?
>
>             Does anything need to be added or deleted?
> ______________________________________________________________________
> The paper stands well as it is.
>
>