Introduction.
The research question of this study asked how the legal reasoning, as defined by Levi, of first year law students compared with that of third year law students. Essay question answers, thinking aloud protocols, researcher observation notes, and student background data were collected. The data were examined through visual inspection, statistical analysis, and computer searches, and the results for the two groups of students were compared.
The findings of this study are consistent with the prior work of Bryden (1984). As previously noted, Bryden raised the possibility that the small differences he found between first and third year law students would disappear if the first year students had the benefit of a little more legal education. This study introduced such a change. Instead of using beginning students as Bryden had, this study selected first year students who had completed twenty credit hours over two semesters. As Bryden would have predicted, no difference was found.
The expectation in this study, expressed in the alternative hypothesis, was that third year law students would demonstrate superior performance if Bryden's research approach were modified. This study explored modifications regarding both the definition of what was being measured and the different data collection techniques which could be used. After reviewing a number of alternatives contained in the literature, Levi's (1949) definition of legal reasoning was adopted. The literature likewise provided multiple data collection and analysis approaches. If the expected result had occurred and third year students had performed better, then Bryden's questions about legal education would have been largely answered and this study would have been additional support for the proposition that expertise can be taught.
The expectation of this study was not fulfilled. No difference was found between the legal reasoning, as defined, of first and third year law students. Reasoning by analogy data was sparse so there was little to which Collins' factors could be applied. Schon's reflection in action was almost totally absent. Moreover, although the findings are consistent with the work of Bryden, it is difficult to accept the null hypothesis because those same findings appear counter-intuitive. From the perspective of a teacher, it would seem that a group with an average of 74.3 credit hours of legal education would do better than a group with an average of only 20.5, especially when what is being measured is what law schools profess to teach, that is, legal reasoning. A contrary finding of no difference might be taken as a direct challenge to present legal education methods. In addition, as discussed in the review of the literature, it would provide further support to those who question whether thinking and problem solving skills can be taught.
Rather than accept the null hypothesis, this discussion will review other explanations for the findings. Once again, the review of the literature is a fertile source of possible explanations of why this study's expectation was not met. Thus this discussion will begin with a section considering the possibility that group selection procedures are responsible for the finding of no difference between the first and third year students. Then the following sections will inquire whether the responsible factor or factors stem either from the way legal reasoning was defined or from the research design that was chosen. Those sections will be followed by a summary and by conclusions based upon the discussion.
Group selection procedures as an explanation.
The first possible explanation of this study's findings of no difference between the groups is based on the argument that the findings are not a reflection on legal education's effectiveness, either at the school studied or at others, but rather are a mere artifact of the two particular groups which were selected and studied. This possibility is perhaps the most difficult to dismiss and, ultimately, may simply have to be acknowledged in light of the practical limitations of research in general and this study in particular.
One of the more concise presentations of this possibility's underlying reasons is included in the work of Campbell and Stanley (1966). In their terminology, the essay score analysis in this study would be a static-group research design for which the internal and external threats to validity are well defined. One of those internal validity threats is the "selection" factor which arises from the failure to have complete randomization in the process of choosing participants.
The "selection" factor, the possibility that differential recruitment influenced the composition of the groups studied, cannot be ruled out in this study. Here there was little information on why some students participated and why many others did not. Students mentioned time problems even though sessions were held at any time convenient for the student. Others said that the $25.00 was not enough compensation compared to their outside employment prospects. Unmentioned but probably present was a fear of exposing their abilities to a professor whom they might have for a teacher in a future class. Also unmentioned but likely to be present was a hope by some to obtain special help in test taking.
Each of these reasons could lead to the selection problems raised by Campbell and Stanley. Students with great time pressures might be more apt to be top or bottom performing students. Likewise employment concerns might be more important for bottom students, because they would be less likely to have scholarships, or top students because of their enhanced employment prospects. However, little data was available regarding the reasons for student participation and no formal data was available about what implications those reasons might raise.
Selection of students at only one law school also brings with it threats to the external validity of this study. The possibility exists that the institution studied was not typical and therefore the results should not be generalized to other schools. As noted by Bryden (1984), it is of some help that most law schools use many of the same textbooks and often teach from them in similar ways. However, generalizability concerns remain.
Campbell and Stanley (1966) also explain that not all threats to validity can be overcome. Some assumptions remain, and might be false, even in the best designed study. Here, practical constraints necessitated more than the minimum number of assumptions. Therefore it must be allowed, especially for the statistical analysis of the essay scores, that the way the students were selected may be responsible for any difference or lack of difference found in this study. The balance of this discussion is presented in the context of the limitations raised by Campbell and Stanley.
The definition of legal reasoning as an explanation.
The second possible explanation of this study's finding of no difference between first and third year students' legal reasoning is based on the idea that this study attempted to measure the wrong educational outcome. Both the review of the literature and the data provide some support for the possibility that the "legal reasoning" supposedly learned by law students is not Levi's (1949) legal reasoning by analogy but rather a further application of the students' common sense to a new substantive knowledge base.
The review of the literature included references to scholars who would view legal reasoning as common sense. Included in those references were Fejfar (1986), Lonergan (1957), and Minsky (1986). If legal reasoning is basically common sense, and if law students come to law school already equipped with those skills, then an alternative educational outcome from a legal education would be acquisition of a body of substantive knowledge. As noted in the review of the literature, this would be consistent with the medical education findings of Elstein et al. (1978) and Feltovich et al. (1984). It would also be consistent with the view of legal education that students sometimes have (White, 1986).
In addition, the data here provide some support for the possibility that the definition of legal reasoning used in the study is responsible for the finding of no difference. For example, the computer word searches revealed no occurrences in the transcribed protocols of the word "analogy." Also, the first year students did best on the "LL" question which was most closely related to the material that they had already studied. In addition, even the highest scores of the groups averaged only between "4" and "5" whereas students using Levi's reasoning would be expected to score at least "5." These findings do not decide the issue but they are consistent with the view that expertise acquisition is more the learning of a large data base of information than the learning of a skill like Levi's reasoning by analogy.
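A computer word search of the kind described above can be sketched as follows. This is only an illustrative sketch in Python and not the software actually used in this study. The function name `count_word` and the protocol identifier `P-example` are assumptions introduced for illustration; the sample sentence is the protocol remark quoted elsewhere in this discussion.

```python
import re

def count_word(transcripts, word):
    """Count whole-word, case-insensitive occurrences of `word` in each transcript.

    `transcripts` maps a protocol identifier to its transcribed text.
    Returns a dict mapping each identifier to its count.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return {pid: len(pattern.findall(text)) for pid, text in transcripts.items()}

# Hypothetical protocol store; the identifier is made up for this sketch.
sample = {"P-example": "Can't panic here just because I'm taking an exam."}

counts = count_word(sample, "analogy")
total_occurrences = sum(counts.values())
```

A whole-word search for a single form would not count related forms such as "analogous" or "analogies." Whether the original search covered such forms is not stated here, so any reading of a zero count should keep that limitation in mind.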
At first glance, these challenges appear to go to the heart of this study. The basic assumption here was that law schools teach students to do "legal reasoning" and that Levi's (1949) reasoning by analogy was the best description of what that meant. However, even if students come to law school with some of those skills already in place, and even if learning a substantive body of knowledge is a primary goal, still the data and results in this study are not necessarily rendered superfluous. It can even be argued that the results become more interesting if these challenges are correct.
If law students already know how to reason by analogy, then it would seem even more likely that third year law students would achieve higher scores on this study's measures. The third year students' presumably larger body of legal knowledge would form the basis for more analogies than would be available to a first year student. Thus it might be argued that the challenges raised in this section, far from explaining the finding of no difference in the performance of the first and third year students, actually provide additional incentive for seeking explanations of why so little reasoning by analogy data was obtained. Even if the study's basic assumption about the nature of legal reasoning was incorrect, still the data collection methods cast a wide enough net to catch any differences between the two groups of students. Therefore this set of challenges raises important questions but, by questioning the basic assumption of the study, it does not automatically provide an explanation for why this study found no difference between the two groups.
Research design as an explanation.
Even if this study's student selection procedures were appropriate, and even if assumptions about the definition of legal reasoning did not compromise the results, the possibility still exists that aspects of this study's design would cause it to fail to detect differences between the first and third year students' legal reasoning. The findings in Chapter IV prompt consideration of at least five aspects of the study's design as candidates to explain the study's results. First, it may be that the design did not motivate the students sufficiently to elicit performance on the level addressed by Levi's definition of legal reasoning. Second, the measurement scales used for scoring the essay questions may not have correctly implemented Levi's definition of legal reasoning. Third, the design might have actively interfered with what was being measured. Fourth, the study's measurements may have been collected at the wrong time. Finally, the study's design may have been weakened by the absence of additional control groups. Each of these possibilities will be discussed separately in the following paragraphs.
Student motivation is one possible explanation for this study's results. The level of student motivation might have been important in at least two ways. One is that the first year students, having been more recently introduced to legal reasoning, might have had more motivation to demonstrate their ability to use the new way of thinking. This additional motivation might have increased first year student performance enough to mask any difference that the third year students might have otherwise shown. The second way that motivation might have been important is if neither first nor third year students were sufficiently motivated to perform very well. The uniformly low level of performance again might have masked the presence of differences between the groups.
This study included no formal measures of student motivation. However, the researcher's observations of the students during data collection provide some anecdotal data about student motivation. Based on that data, it can be said that some students appeared to throw themselves completely into the task while others appeared willing to take the participation money and run. Moreover the latter category appeared to be a small minority of both the first and third year student groups. No other motivational patterns stood out from the data. However, due to the type of data used in drawing these conclusions, the possible role of motivation cannot be ruled out as an explanation for the results here. Even though it would be contrary to the researcher's personal impressions, it may be that no differences were found between the first and third year students' legal reasoning because of the influence of motivation on group performance.
The essay scoring scales are a second aspect of the study's design that may explain the finding of no difference between the first and third year student groups. The seven point scale was developed from Levi's definition in order to achieve more reliable scoring than had been achieved in previous studies where legal "issues" had typically been counted. To explore whether the seven point scale was really one ordinal scale or a combination of several dichotomous scales, the data was recoded into two dichotomous scales and the results were compared with the results from the seven point scale. The two dichotomous scales were based upon two key elements of Levi's definition - the use of facts and the use of cases. The seven point scale's classifications used combinations of these elements rather than separating them into two different scales.
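The recoding step can be illustrated with a short sketch in Python. It is hypothetical and is not the study's actual procedure. Only two levels of the seven point scale are described in this discussion (a "4" for use of a rule with facts and a "5" for use of a case alone), so only those two levels are mapped. The names `Dichotomous`, `KNOWN_LEVELS`, and `recode` are assumptions introduced for illustration.

```python
from typing import NamedTuple, Optional

class Dichotomous(NamedTuple):
    """The two yes/no scales derived from Levi's definition."""
    uses_facts: bool
    uses_case: bool

# Only the two levels described in this discussion are mapped here.
# The remaining levels of the seven point scale are deliberately left out
# of this sketch rather than guessed.
KNOWN_LEVELS = {
    4: Dichotomous(uses_facts=True, uses_case=False),   # rule with facts
    5: Dichotomous(uses_facts=False, uses_case=True),   # case alone
}

def recode(score: int) -> Optional[Dichotomous]:
    """Return the two dichotomous codes for a score, or None if unmapped."""
    return KNOWN_LEVELS.get(score)

recoded = [recode(s) for s in (4, 5, 4)]
facts_count = sum(1 for r in recoded if r is not None and r.uses_facts)
```

Separating the two elements in this way lets each be tallied on its own, which is the comparison with the combined seven point scale that the paragraph above describes.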
On the one hand, the findings indicate that the scales worked well. Even though the students used reasoning on a lower level than expected, the scales were broad enough to describe and capture the process in a way that could be analyzed as planned.
On the other hand, the scales did not appear to ease scoring decisions as much as had been hoped. Even after being checked out at the orientation session with the researcher, each of the scorers called the researcher at least once to clarify how the scoring classifications should be applied. Most of the questions concerned how much evidence was necessary to conclude that a student's answer was using a legal rule or a case. The possible combinations of factors proved to be extensive. Thus, rather than eliminating the ambiguity involved in deciding what an "issue" is and how it should be counted, the scoring scale may have merely placed the ambiguity into a new setting. Naturally ambiguity from any source would introduce measurement error which could conceal the effects that were being measured.
Even though the scales appeared to be broad enough to measure the form of legal reasoning used by the students, it also appears that the breadth of coverage may have sacrificed the precision needed to measure variations among the students. Particularly on the "TANK" question, the histogram of the scores shows little variability. Thus it appears that the scoring scale may have been too coarse a scale to make fine distinctions among student answers.
A third way that aspects of the study's design could have been responsible for the study's results is that the design may have interfered with the data collection process. Design interference could take several forms, each of which might interfere with data collection enough to explain why no difference was found between the legal reasoning of the first and third year students. The first way that the design could interfere would be if the students mistook the data collection process for a legal examination for which they had learned a specialized method of response. Bryden (1984) recognized the possibility that a legal examination might evoke such a response from students. As a consequence, this study was designed so as to reduce the likelihood that the problems would be viewed as part of a legal examination. For example, the students wrote on lawyers' yellow pads rather than bluebooks and they worked at an office desk. Nevertheless, it was clear from various remarks of the participants that the "examination factor" had not been removed. Thus, in one protocol, the student said: "Can't panic here just because I'm taking an exam."
Another protocol provided a clue as to how the students might respond if they believed that they were taking an examination. That protocol, number 1942, referred to IRAC, an acronym standing for: Issue, Rule, Application, and Conclusion (Kelso & Kelso, 1984, p. 481). Although perhaps suitable for some contexts, IRAC would appear to favor deductive reasoning over Levi's reasoning by analogy. Searching student outline banks and similar materials was not a part of this study's design. An informal review of several student course outlines was at first reassuring in that it showed that students had been exposed to typical lines of appellate cases which are used to teach reasoning by analogy. However, the informal review also supported the possibility that an IRAC type approach to examination questions may be part of law school's hidden curriculum. As noted in the review of the literature, Gross (1984) details how students may be learning an oversimplified legal reasoning process for use on examinations. One key part of the oversimplification is that students work with abstracted legal rules rather than with the authority itself (pp. 426-427).
If this study's data collection process was viewed by students as a legal examination, and if students learn a specialized method of response for legal examinations, then that might explain why no difference was found between the first and third year students. The specialized responses might not accurately reflect what was being learned in law school. The students might be learning Levi's (1949) reasoning by analogy but the study's design might interfere with their demonstration of it because a different response would be triggered by the data collection procedure. The students also might not be learning Levi's reasoning by analogy, as Gross would maintain, but instead an oversimplified process of using abstract rules. In either case, essay questions would trigger exactly the kind of rule dominated responses that were found in this study. As noted earlier, the essay scores averaged, at best, between "4" and "5" with a "4" denoting the use of a rule with facts and a "5" denoting the use of a case alone. Use of a case, without its facts, is like using the case as shorthand for a rule in the manner described by Gross.
Rule dominated responses also may have occurred because, as noted earlier, the students may not have been sufficiently motivated to engage in the more involved reasoning demands of Levi's process. However, whether caused by lack of motivation or the triggering of a specialized response, an additional impact of receiving rule dominated responses is that the use of rules rather than case analogies leaves little room for analysis using Collins' factors. Likewise a lack of motivation or the triggering of a specialized response could explain why Schon's reflection-in-action did not appear more often in the data.
The design of this study may have interfered with data collection in another way. Going beyond the problems of students thinking that they were taking an examination, Groen and Patel (1988) maintain that two factors are responsible for the results of investigations, like this one, which are designed along the lines of studies such as de Groot (1965) and Elstein et al. (1978). The first factor is that pattern recognition methodology, which proved useful in studying chess problems presented on de Groot's chess board, is not well adapted to verbally complex domains. In other words, chess pieces on a board may present a pattern for recognition but abstract words present a different type of problem requiring different methods. The second factor is that nonroutine, difficult problems will trigger the backward (hypothetico-deductive) reasoning found by Elstein et al. (1978). Groen and Patel (1988) themselves found forward reasoning when a different research design was used.
Here backward (deductive) reasoning was found. Largely absent was Levi's reasoning by analogy which, according to Levi (1949), is an imperfect form of forward or inductive reasoning. Moreover the essay questions used in the study deliberately included difficult portions so as to avoid the automated processing described by Johnston and Afflerbach (1985). However, it would seem that the easy portions of the essay questions would have avoided difficulties raised by Groen and Patel. Since students provided little evidence of Levi's reasoning by analogy even for the easy portions of the questions, Groen and Patel's two factors may not be a complete explanation of why no difference was found between the first and third year students.
Finally, the design of this study may have interfered with data collection due to the peculiar demands associated with writing an essay response while thinking aloud. Prior research has coupled thinking aloud with primarily mental activities such as problem solving (Ericsson & Simon, 1984) or reading (Johnston & Afflerbach, 1985). However, prior research leaves open the possibility that a task like writing may interfere with the collection of thinking aloud data or vice versa, especially when the task is being performed by a person who is not yet an expert. Here the data also raised the possibility of this type of interference because several participants commented on the difficulty of both writing and thinking aloud. Likewise some participants wrote and thought aloud in separate steps, not concurrently. An example of how this type of interference could be important is provided by data in the researcher's notes showing participants using analogies even though they did not include them in their written answers. Since all six of those participants were third year students, the absence of the analogies from their written answers, and thus from their scores, might have had an important impact on statistical comparisons of group scores. However, the analysis of the data did not focus on this source of interference and thus the possibility of its impact here can be acknowledged but not resolved.
A fourth reason why students may not have manifested Levi's reasoning by analogy, even though they had learned it, is that the process involved may not be a monotonic function of learning, that is, that persons who have learned a certain amount may not perform as well as persons who have learned either more or less than that amount. Lesgold, Glaser, Rubinson, Klopfer, Feltovich, and Wang (1988) give the example of a baby whose locomotion may become less efficient when automatic and secure creeping gives way to toddling. In their own work, Lesgold et al. (1988) found that third and fourth year radiology residents performed less well than either experts or first and second year residents on some films. Strauss and Stavy (1982), quoted by Lesgold et al., state that one possible reason for nonmonotonicities is: "[o]scillation between a familiar but inadequate mental representation of the problem situation, and an improved but still new and 'untrusted' representation system which is correct."
Here the findings of Lesgold et al. might explain why even third year students used very general rules to solve the problems rather than applying the more sophisticated reasoning process which presumably they had learned. If, as Herbert Simon (Chase & Simon, 1973) has proposed, it takes 10,000 to 20,000 hours (five to ten work years) to become a chess master, then third year students might be at too early a point in their development to show a difference based upon the change in their reasoning. If a nonmonotonic process is involved, the third year students might even do worse than the first year students on some questions, as was the case here and for Lesgold et al.'s radiology residents.
The fifth aspect of the study's design that might affect at least the interpretation of the results is that no data was collected either from pre-law students or from lawyers several years after their graduation from law school. The presence of these control groups would have shed light on a number of the possible interpretations of the data. For example, the possibility that law students come to law school already equipped with legal reasoning would have been checked by data from pre-law students. Likewise the presence of law graduates with several years experience would have checked the possibility that a nonmonotonic process was being learned. However, for the present, the contribution of such control groups must await future studies.
Summary and discussion.
This discussion began with a recognition of the possible conclusions that could be drawn from a finding of no difference between the legal reasoning of first and third year law students. It was recognized that those conclusions could go to the heart of both present legal education methods and the possibility of teaching expertise in general.
Since the present system of legal education has been in effect for about one hundred years (Stevens, 1983), and since it therefore can arguably be entitled to a benefit of the doubt, this discussion has focused on explanations that account for the finding of no difference between the groups and yet do not overthrow legal education as we presently know it.
Reviewing the three sections of challenges discussed above, it would appear that a "middle of the road" interpretation of the findings is probably the best approach. Extreme interpretations are, of course, possible. The limitations explained by Campbell and Stanley (1966) can be used to attribute the findings to peculiarities of the individual students studied with no justification for drawing any wider conclusions either for the law school involved or for law schools in general. At the other extreme, one could argue that the finding of no difference, even though one group had about three and a half times as much instruction, means that this study is one more on the list of studies which prove that it is misleading to claim that expertise can be taught. At a minimum, this would mesh with previous calls for reducing the law school curriculum from three to two years (e.g., Packer & Ehrlich, 1972).
In between the extreme interpretations is the possibility of interpreting these findings in a "middle of the road" fashion. From the one extreme, that interpretation could allow that the study's design, while not rendering the data superfluous, nevertheless could be modified both to better address the problem of motivating participants and to reduce the "examination factor." For example, the use of simulated clients might be seen by students as being more realistic and thus increase motivation while still allowing data collection without students automatically applying a specialized set of responses that they may have learned for examination taking. From the other extreme, that middle position interpretation could allow that legal education may contain a "hidden curriculum" at variance with what the process is claimed to be. For example, it may be that knowledge base acquisition and deductive reasoning play a larger role than is often acknowledged. This could lead to measurement tools more focused on the knowledge base acquisition and deductive reasoning rather than upon legal reasoning as defined by Levi.
In addition to taking bits of wisdom from more extreme interpretive positions, a middle position might also concentrate on the possibility that a nonmonotonic process is being learned and that therefore increased learning will not always be accompanied by immediate improvements in performance. Although this would be frustrating for both students trying to learn expertise and teachers trying to measure that same learning, still it would appear to better represent what is involved in the acquisition of expertise.
Finally, taking a middle position would allow these findings to be more easily integrated with the findings of Bryden (1984). That study used traditional legal education measurement techniques. Issues were counted and totals compared for beginning and third year students. Here data was collected on the level of how issues are identified, on the level of legal reasoning by analogy as defined by Levi (1949). Uses of analogies by first and third year students were counted and compared. In both studies, no significant differences were found when the data was examined as a whole.
Although neither Bryden's study nor this one can be taken as dispositive, still both together can be read as presenting a stronger case that some modifications would strengthen legal education. Bryden urged greater use of programmed learning. This study would support at least greater recognition of the role of knowledge acquisition. It does not seem sufficient to rely on teaching legal reasoning or "thinking like a lawyer" as a set of skills apart from a large knowledge base.
In addition, both studies can be read as supporting the idea, often expressed before, that legal educators might spend more time explicitly describing the process they are teaching. Absent that explicit teaching, it appears that students may be quickly learning a specialized response to essay examinations. Although in one sense that shows that legal education can be very effective, it appears that the specialized response is learned more as part of a hidden curriculum than as an explicitly desired learning outcome. Moreover, it appears that such a specialized response, since it treats the law superficially as a set of overly abstracted rules, and since it does not involve use of law's richer resources, would be detrimental to students in their future practice of law.
Any changes in legal education will encounter predictable obstacles (Cramton, 1982). However, attempting to overcome such a specialized response by continuing to teach legal reasoning implicitly through the use of the case method may be effective for some but it probably is not the best method, particularly if a nonmonotonic process is being learned. Since a nonmonotonic process may delay positive feedback about what has been learned, students might become discouraged when long hours of work appear to produce no positive results and even seem to make matters worse. They may opt for what appears, at least short term, to work better. The fact that some students might succeed despite these obstacles would be of little comfort to educators if it appears that the educational system could do more for many others to whom that educational system has held itself out as a source of help.
Reviewing the preceding findings and discussion, some conclusions will be drawn about the questions addressed in this study. Those conclusions then lead to two levels of recommendations. On one level, recommendations can be made concerning future work on these questions. On another level, recommendations can focus on the broader challenge of teaching expertise.
Many possible explanations could be responsible for this study's finding of no difference between the legal reasoning, as defined by Levi (1949), of first and third year law students. This researcher concludes that the most likely candidates are interference between the task of writing and the thinking aloud process, the "examination" factor with its associated specialized response, and the nonmonotonic nature of expertise learning. These appear the most attractive alternatives, not only because they would account for the results, but also because they would most directly lead to future remedial action.
If the results are due to interference between writing and thinking aloud or to participants thinking that they are taking a law school examination which calls for a specialized response, then data gathering can be modified to better capture the reasoning that they are learning. The suggestions of Groen and Patel (1988), for example, would attack part of this problem. If a nonmonotonic process is being learned, then longitudinal studies could receive more emphasis. To the extent that repeated measures present special problems and opportunities, reference can be made to the testing and measurement advice of Willett (1988). In any case, conclusions and recommendations on this level allow practical steps to be taken toward resolving the larger questions of how problem solving and expertise in general can be defined, taught, and measured.
On a broader level, this researcher declines to conclude that the teaching of expertise is impossible or that legal education is in need of a revolution. However, it would appear that this study, read in light of other studies on the acquisition of expertise, supports at least one more general recommendation. As noted previously, the recommendation has been made before but it merits repetition. The studies support a call for legal educators, and other teachers of expertise, to be more explicit about what we teach, no matter how that goal is defined. If we teach a type of reasoning, we can identify and explain it. If we teach a knowledge base, we can acknowledge its importance and perhaps tailor our teaching methods to make its acquisition most effective and efficient. If we teach various analysis or learning skills, we can try to help students to see their progress even as their performance may appear to deteriorate. If acquisition of expertise is a long, hard process, we may have to examine how to instill a love of life-long learning.
It is likely that legal education, and the teaching of expertise in general, requires teaching to be explicit in all the ways outlined in the preceding paragraph, plus perhaps other ways of being explicit that have not yet been discovered. If that is the case, then we should acknowledge the magnitude of the task and refuse to be satisfied with short-term, partial solutions such as over-reliance upon any one teaching or testing method. It is hoped that this study can help in the search for longer-term solutions in two ways. First, it documents part of the legal education experience for the benefit of others who seek to teach expertise. Second, it may help to forge links between legal education and research on teaching expertise in general, so that legal educators can take better advantage of what has been learned by others.
It is also hoped that this study will help build a greater appreciation for what has already been accomplished in the teaching of expertise. For example, although law school testing and state bar examinations can be easily criticized, and although they could certainly be better, this study would support a view that they may be preferable to many alternatives. If a nonmonotonic process is involved, then heavy emphasis on testing the acquisition of knowledge may be appropriate for a student who has been in the field for three years or less. It may also, of course, raise questions about whether testing at a later point might be appropriate, not only to protect the public served by the expert, but also to provide an incentive for the expert to continue the process of gaining expertise.
By raising the possibility of what amounts to the recertification of experts, this set of conclusions has obviously moved far beyond the subjects and context of this study. This is an appropriate point to observe that much research remains to be done but that this study must, indeed, conclude. The torch passed by Bryden (1984) and others has been carried here, but it must again be passed on. Further conclusions will best be drawn when more data has been gathered and examined.