Sunday, March 25, 2012

IX. TOPICAL CONTENT


A. RELATED LITERATURE


Module 1: OVERVIEW IN ASSESSMENT OF LEARNING



                     
Do pupils make more progress by taking charge of their own learning?

Assessment for Learning – The process of seeking and interpreting evidence for use by learners and their teachers to decide where the learners are in their learning, where they need to go and how best to get there.

Assessment for Learning key findings:
Aims to help pupils to know and recognise the standards that they are aiming for.  Learners need to be clear about exactly what they have to achieve in order to progress, and learning goals should be shared with them.  They need to understand what counts as good work.
Aims to encourage pupils to be active learners*, which evidence suggests leads them to make greater improvements than passive learners*.
Involves pupils in peer and self-assessment.  Ultimately, learners must be responsible for their own learning; the teacher cannot do that for them.  So pupils must be actively involved in the process and need to be encouraged to see for themselves how they have progressed in their learning and what they need to do to improve.  Teachers need to encourage pupils to review their work critically and constructively.
Provides feedback, which leads to pupils recognising their next steps and how to take them.  Feedback should be about the qualities of the work with specific advice on what needs to be done in order to improve.  Pupils need to be given the time to act on advice and make decisions about their work, rather than being the passive recipients of teachers’ judgements.

* active learners can be defined as those who:
·         can select information for their own use
·         can offer  and challenge opinions
·         independently judge their own work against given criteria and standards
* passive learners can be defined as those who:
·         are recipients of knowledge
·         await guidance at every stage before making progress
·         rely on others to judge the quality of their work

Module 2: ESTABLISHING THE LEARNING TARGETS




Learning Targets

  • This includes both what students will know, understand, and be able to do, and the criteria that will be used to judge performance.
  • The criteria used to judge performance can be thought of as the different dimensions of student performance that will be used to judge whether or not your objectives have been met.
  • Learning targets emphasize the link between instruction and assessment: when writing these objectives, you should always be thinking about assessment.  How will you determine if students have learned what you have taught?  What observable behaviors (either verbal or written) will demonstrate that students have met the objectives?  If you find these questions difficult to answer, then you probably have not written very good learning objectives.

Types of Learning Targets
  • Knowledge and Simple Understanding: This includes mastery of facts and information, typically through recall (e.g., dates, definitions, and principles), as well as simple understanding (e.g., summarizing a paragraph, explaining a chart, and giving examples).
  • Deep Understanding and Reasoning:  This includes problem solving, critical thinking, synthesis, comparing, higher order thinking skills, and judgment.
  • Skills:  This involves something that a student must demonstrate in a way other than answering questions.  These types of targets involve a behavior in which knowledge, understanding, and reasoning are used overtly.
  • Products:  This includes a sample of student work (e.g., a paper, report, artwork, or other project) that demonstrates the application of knowledge, understanding, reasoning, and skills.
  • Affective:  This includes attitudes, values, interests, feelings, and beliefs.
Sources of Learning Targets
  • Bloom's Taxonomy
  • National, State, and District Standards
  • Textbooks
Module 3: KEYS TO EFFECTIVE TESTING



Test Characteristics
A test, as an instrument, must possess certain qualities before it can be considered a usable test.  A test should therefore possess the characteristics listed below, which are interdependent and together make a test what it should be.
They include:
• Validity- A test is valid when it fulfils its purpose(s), that is, when it measures what it is intended to measure and to the extent desired. The characteristics of the testee can blur the true validity of a test; that is, the test can produce false results that do not truly represent what it intends to measure in a student. For example, if a learner has difficulty accessing the Internet for course materials and participation, this can give a wrong impression of the learner's commitment to log in and ability in course work.
• Reliability- The consistency with which a test measures what it is supposed to measure is its reliability. It is the ‘extent to which a particular measurement is consistent and reproducible’.
• Objectivity- The fairness of a test to the testee. A biased test does not portray objectivity and hence is not reliable. A test that is objective has high validity and reliability.
• Discrimination- A good test must be able to distinguish between poor and good learners; it should show even slight differences in learner attainment and achievement that make it possible to distinguish between poor and good learners. What criteria are likely to satisfy these conditions?
• Comprehensiveness- A test whose items cover much of the content of the course, that is, the subject matter, is said to be comprehensive and hence capable of fulfilling its purpose.
• Ease of administration- A good test should not pose difficulties in administration.
• Practicality and scoring- Assigning a quantitative value to a test result should not be difficult.
• Usability- A good test should be usable, unambiguous, and clearly stated, with one meaning only.

Module 4: Development of Assessment Tools



Assessment Tools

Below are assessment tools and techniques, along with specific geoscience examples and resources.
Concept Maps - A diagramming technique for assessing how well students see the "big picture".
Concept Tests - Conceptual multiple-choice questions that are useful in large classes.
Knowledge Survey - Students answer whether they could answer a survey of course content questions.
Exams - Find tips on how to make exams better assessment instruments.
Oral Presentations - Tips for evaluating student presentations.
Poster Presentations - Tips for evaluating poster presentations.
Peer Review - Having students assess themselves and each other.
Portfolios - A collection of evidence to demonstrate mastery of a given set of concepts.
Rubrics - A set of evaluation criteria based on learning goals and student performance.
Written Reports - Tips for assessing written reports.
Other Assessment Types - Includes concept sketches, case studies, seminar-style courses, mathematical thinking, and performance assessments.
Topics of Particular Interest
Large Class Assessment - Learn more about assessment strategies that are particularly useful for large classes and see examples of how techniques were employed in geoscience classes.
Using Technology - Learn more about how technology can improve classroom assessment and see how techniques were employed in geoscience classes.

Module 5: Characteristics of a Good Test



Reliability & Validity


We often think of reliability and validity as separate ideas but, in fact, they're related to each other. Here, I want to show you two ways you can think about their relationship.
One of my favorite metaphors for the relationship between reliability and validity is that of the target. Think of the center of the target as the concept that you are trying to measure. Imagine that for each person you are measuring, you are taking a shot at the target. If you measure the concept perfectly for a person, you are hitting the center of the target. If you don't, you are missing the center. The more you are off for that person, the further you are from the center.

The figure above shows four possible situations. In the first one, you are hitting the target consistently, but you are missing the center of the target. That is, you are consistently and systematically measuring the wrong value for all respondents. This measure is reliable, but not valid (that is, it's consistent but wrong). The second shows hits that are randomly spread across the target. You seldom hit the center of the target but, on average, you are getting the right answer for the group (but not very well for individuals). In this case, you get a valid group estimate, but you are inconsistent. Here, you can clearly see that reliability is directly related to the variability of your measure. The third scenario shows a case where your hits are spread across the target and you are consistently missing the center. Your measure in this case is neither reliable nor valid. Finally, we see the "Robin Hood" scenario -- you consistently hit the center of the target. Your measure is both reliable and valid (I bet you never thought of Robin Hood in those terms before).
Another way we can think about the relationship between reliability and validity is shown in the figure below. Here, we set up a 2x2 table. The columns of the table indicate whether you are trying to measure the same or different concepts. The rows show whether you are using the same or different methods of measurement. Imagine that we have two concepts we would like to measure, student verbal and math ability. Furthermore, imagine that we can measure each of these in two ways. First, we can use a written, paper-and-pencil exam (very much like the SAT or GRE exams). Second, we can ask the student's classroom teacher to give us a rating of the student's ability based on their own classroom observation.

The first cell on the upper left shows the comparison of the verbal written test score with the verbal written test score. But how can we compare the same measure with itself? We could do this by estimating the reliability of the written test through a test-retest correlation, parallel forms, or an internal consistency measure. What we are estimating in this cell is the reliability of the measure.
The cell on the lower left shows a comparison of the verbal written measure with the verbal teacher observation rating. Because we are trying to measure the same concept, we are looking at convergent validity.
The cell on the upper right shows the comparison of the verbal written exam with the math written exam. Here, we are comparing two different concepts (verbal versus math) and so we would expect the relationship to be lower than a comparison of the same concept with itself (e.g., verbal versus verbal or math versus math). Thus, we are trying to discriminate between two concepts and we would consider this discriminant validity.
Finally, we have the cell on the lower right. Here, we are comparing the verbal written exam with the math teacher observation rating. Like the cell on the upper right, we are also trying to compare two different concepts (verbal versus math) and so this is a discriminant validity estimate. But here, we are also trying to compare two different methods of measurement (written exam versus teacher observation rating). So, we'll call this very discriminant to indicate that we would expect the relationship in this cell to be even lower than in the one above it.
The four cells incorporate the different values that we examine in the multitrait-multimethod approach to estimating construct validity.
When we look at reliability and validity in this way, we see that, rather than being distinct, they actually form a continuum. On one end is the situation where the concepts and methods of measurement are the same (reliability) and on the other is the situation where concepts and methods of measurement are different (very discriminant validity).
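To make the four cells concrete, here is a minimal Python sketch of the comparisons described above. The simulated student scores, error levels, and variable names are invented purely for illustration (they are not from the text); the point is only to show which correlation fills which cell of the 2x2 table.

```python
# A minimal sketch of the 2x2 reliability/validity table described above.
# All scores are simulated and purely illustrative; they are not real data.
import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of hypothetical students

# Hypothetical "true" abilities.
verbal_ability = rng.normal(0, 1, n)
math_ability = rng.normal(0, 1, n)

def measure(true_score, error_sd):
    """Return one noisy measurement of a true score."""
    return true_score + rng.normal(0, error_sd, n)

verbal_written_a = measure(verbal_ability, 0.4)  # written exam, form A
verbal_written_b = measure(verbal_ability, 0.4)  # written exam, form B (parallel form)
verbal_teacher   = measure(verbal_ability, 0.7)  # teacher observation rating
math_written     = measure(math_ability, 0.4)
math_teacher     = measure(math_ability, 0.7)

def r(x, y):
    return float(np.corrcoef(x, y)[0, 1])

print("Reliability (same concept, same method):          ", round(r(verbal_written_a, verbal_written_b), 2))
print("Convergent validity (same concept, diff. method): ", round(r(verbal_written_a, verbal_teacher), 2))
print("Discriminant validity (diff. concept, same method):", round(r(verbal_written_a, math_written), 2))
print("'Very discriminant' (diff. concept, diff. method): ", round(r(verbal_written_a, math_teacher), 2))
```

Run on such simulated data, the printed correlations typically fall in exactly the order the text predicts: reliability highest, then convergent, then discriminant, then "very discriminant".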

Module 6. Analyzing and Using of Test Item Data




Basic Concepts in Item and Test Analysis
Making fair and systematic evaluations of others' performance can be a challenging task. Judgments cannot be made solely on the basis of intuition, haphazard guessing, or custom (Sax, 1989). Teachers, employers, and others in evaluative positions use a variety of tools to assist them in their evaluations. Tests are tools that are frequently used to facilitate the evaluation process. When norm-referenced tests are developed for instructional purposes, to assess the effects of educational programs, or for educational research purposes, it can be very important to conduct item and test analyses.
Test analysis examines how the test items perform as a set. Item analysis "investigates the performance of items considered individually either in relation to some external criterion or in relation to the remaining items on the test" (Thompson & Levitov, 1985, p. 163). These analyses evaluate the quality of items and of the test as a whole. Such analyses can also be employed to revise and improve both items and the test as a whole.
However, some best practices in item and test analysis are too infrequently used in actual practice. The purpose of the present paper is to summarize the recommendations for item and test analysis practices, as these are reported in commonly-used measurement textbooks (Crocker & Algina, 1986; Gronlund & Linn, 1990; Pedhazur & Schmelkin, 1991; Sax, 1989; Thorndike, Cunningham, Thorndike, & Hagen, 1991). These tools include item difficulty, item discrimination, and item distractors.

Item Difficulty
Item difficulty is simply the percentage of students taking the test who answered the item correctly. The larger the percentage getting an item right, the easier the item. The higher the difficulty index, the easier the item is understood to be (Wood, 1960). To compute the item difficulty, divide the number of people answering the item correctly by the total number of people answering the item. The proportion for the item is usually denoted as p and is called item difficulty (Crocker & Algina, 1986). An item answered correctly by 85% of the examinees would have an item difficulty, or p value, of .85, whereas an item answered correctly by 50% of the examinees would have a lower item difficulty, or p value, of .50.
The p value is basically a behavioral measure. Rather than defining difficulty in terms of some intrinsic characteristic of the item, difficulty is defined in terms of the relative frequency with which those taking the test choose the correct response (Thorndike et al., 1991). For instance, in the example below, which item is more difficult?
  1. Who was Boliver Scagnasty?
  2. Who was Martin Luther King?
One cannot determine which item is more difficult simply by reading the questions. One can recognize the name in the second question more readily than that in the first. But saying that the first question is more difficult than the second, simply because the name in the second question is easily recognized, would be to compute the difficulty of the item using an intrinsic characteristic. This method determines the difficulty of the item in a much more subjective manner than that of a p value.
Another implication of a p value is that the difficulty is a characteristic of both the item and the sample taking the test. For example, an English test item that is very difficult for an elementary student will be very easy for a high school student. A p value also provides a common measure of the difficulty of test items that measure completely different domains. It is very difficult to determine whether answering a history question involves knowledge that is more obscure, complex, or specialized than that needed to answer a math problem. When p values are used to define difficulty, it is very simple to determine whether an item on a history test is more difficult than a specific item on a math test taken by the same group of students.
To make this more concrete, take into consideration the following examples. When the correct answer is not chosen (p = 0), there are no individual differences in the "score" on that item. As shown in Table 1, the correct answer C was not chosen by either the upper group or the lower group. (The upper group and lower group will be explained later.) The same is true when everyone taking the test chooses the correct response, as is seen in Table 2. An item with a p value of 0.0 or a p value of 1.0 does not contribute to measuring individual differences, and is almost certain to be useless. Item difficulty has a profound effect on both the variability of test scores and the precision with which test scores discriminate among different groups of examinees (Thorndike et al., 1991). When all of the test items are extremely difficult, the great majority of the test scores will be very low. When all items are extremely easy, most test scores will be extremely high. In either case, test scores will show very little variability. Thus, extreme p values directly restrict the variability of test scores.
In discussing the procedure for determining the minimum and maximum score on a test, Thompson and Levitov (1985) stated that items tend to improve test reliability when the percentage of students who correctly answer the item is halfway between the percentage expected to correctly answer if pure guessing governed responses and the percentage (100%) who would correctly answer if everyone knew the answer. (pp. 164-165)
For example, many teachers may think that the minimum score on a test consisting of 100 items with four alternatives each is 0, when in actuality the theoretical floor on such a test is 25. This is the score that would be most likely if a student answered every item by guessing (e.g., without even being given the test booklet containing the items).
Similarly, the ideal percentage of correct answers on a four-choice multiple-choice test is not 70-90%. According to Thompson and Levitov (1985), the ideal difficulty for such an item would be halfway between the percentage from pure guessing (25%) and 100%, that is, 25% + (100% − 25%)/2 = 62.5%. Therefore, for a test with 100 items with four alternatives each, the ideal mean percentage of correct items, for the purpose of maximizing score reliability, is roughly 63%. Tables 3, 4, and 5 show examples of items with p values of roughly 63%.
Table 3
Maximum Item Difficulty Example Illustrating Individual Differences

Group          Item Response
               A     B     C*    D
Upper group    1     0     13    3
Lower group    2     5      5    6

Note. * denotes correct response
Item difficulty: (13 + 5)/30 = .60
Discrimination index: (13 − 5)/15 = .53
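
Using the counts in Table 3, here is a minimal Python sketch of the item difficulty and discrimination computations described above. The function names are just illustrative; the ideal-difficulty line applies Thompson and Levitov's halfway-between-chance-and-100% rule from the preceding paragraph.

```python
# Item analysis for one multiple-choice item, using the counts in Table 3.
# Upper and lower groups each contain 15 examinees; C is the keyed (correct) answer.

def item_difficulty(correct_count, total_count):
    """p value: proportion of all examinees who answered the item correctly."""
    return correct_count / total_count

def discrimination_index(upper_correct, lower_correct, group_size):
    """Difference between the upper- and lower-group proportions correct."""
    return (upper_correct - lower_correct) / group_size

upper_correct = 13   # upper group choosing C
lower_correct = 5    # lower group choosing C
group_size = 15
total = 2 * group_size

p = item_difficulty(upper_correct + lower_correct, total)           # (13 + 5)/30 = 0.60
d = discrimination_index(upper_correct, lower_correct, group_size)  # (13 - 5)/15 ≈ 0.53

# Ideal difficulty for a 4-option item (Thompson & Levitov, 1985):
# halfway between the chance level (0.25) and 1.0.
chance = 1 / 4
ideal_p = chance + (1 - chance) / 2   # 0.625, i.e. roughly 63%

print(f"p = {p:.2f}, D = {d:.2f}, ideal p for 4 options = {ideal_p:.3f}")
```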

Module 7. Educational Statistics




Mean, Median, Mode, and Range


Mean, median, and mode are three kinds of "averages". There are many "averages" in statistics, but these are, I think, the three most common, and are certainly the three you are most likely to encounter in your pre-statistics courses, if the topic comes up at all.
The "mean" is the "average" you're used to, where you add up all the numbers and then divide by the number of numbers. The "median" is the "middle" value in the list of numbers. To find the median, your numbers have to be listed in numerical order, so you may have to rewrite your list first. The "mode" is the value that occurs most often. If no number is repeated, then there is no mode for the list.

The "range" is just the difference between the largest and smallest values.

Find the mean, median, mode, and range for the following list of values:
13, 18, 13, 14, 13, 16, 14, 21, 13

The mean is the usual average, so:

(13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) ÷ 9 = 15

Note that the mean isn't a value from the original list. This is a common result. You should not assume that your mean will be one of your original numbers.

The median is the middle value, so I'll have to rewrite the list in order:

13, 13, 13, 13, 14, 14, 16, 18, 21

There are nine numbers in the list, so the middle one will be the (9 + 1) ÷ 2 = 10 ÷ 2 = 5th number:

13, 13, 13, 13, 14, 14, 16, 18, 21

So the median is 14.  


The mode is the number that is repeated more often than any other, so 13 is the mode.

The largest value in the list is 21, and the smallest is 13, so the range is 21 – 13 = 8.

mean: 15
median: 14
mode: 13
range: 8

Note: The formula for the place to find the median is "( [the number of data points] + 1) ÷ 2", but you don't have to use this formula. You can just count in from both ends of the list until you meet in the middle, if you prefer. Either way will work.
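As a quick cross-check, Python's standard statistics module reproduces these values for the list in the example above; here is a minimal sketch (multimode requires Python 3.8 or later).

```python
# Mean, median, mode(s), and range for the example list above.
import statistics

values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mean = statistics.mean(values)             # 15
median = statistics.median(values)         # 14
modes = statistics.multimode(values)       # [13]; returns every most-common value
value_range = max(values) - min(values)    # 21 - 13 = 8

print(mean, median, modes, value_range)    # 15 14 [13] 8
```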

Find the mean, median, mode, and range for the following list of values:
1, 2, 4, 7

The mean is the usual average: (1 + 2 + 4 + 7) ÷ 4 = 14 ÷ 4 = 3.5

The median is the middle number. In this example, the numbers are already listed in numerical order, so I don't have to rewrite the list. But there is no "middle" number, because there are an even number of numbers. In this case, the median is the mean (the usual average) of the middle two values: (2 + 4) ÷ 2 = 6 ÷ 2 = 3

The mode is the number that is repeated most often, but all the numbers appear only once. Then there is no mode.

The largest value is 7, the smallest is 1, and their difference is 6, so the range is 6.

mean: 3.5
median: 3
mode: none
range: 6

The list values were whole numbers, but the mean was a decimal value. Getting a decimal value for the mean (or for the median, if you have an even number of data points) is perfectly okay; don't round your answers to try to match the format of the other numbers.

Find the mean, median, mode, and range for the following list of values:
 8, 9, 10, 10, 10, 11, 11, 11, 12, 13

The mean is the usual average:

(8 + 9 + 10 + 10 + 10 + 11 + 11 + 11 + 12 + 13) ÷ 10 = 105 ÷ 10 = 10.5

The median is the middle value. In a list of ten values, that will be the (10 + 1) ÷ 2 = 5.5th value; that is, I'll need to average the fifth and sixth numbers to find the median:

(10 + 11) ÷ 2 = 21 ÷ 2 = 10.5

The mode is the number repeated most often. This list has two values, 10 and 11, that are each repeated three times, so there are two modes.

The largest value is 13 and the smallest is 8, so the range is 13 – 8 = 5.

mean: 10.5
median: 10.5
modes: 10 and 11
range: 5

While unusual, it can happen that two of the averages (the mean and the median, in this case) will have the same value.

Note: Depending on your text or your instructor, the above data set may be viewed as having no mode (rather than two modes), since no single solitary number was repeated more often than any other. I've seen books that go either way; there doesn't seem to be a consensus on the "right" definition of "mode" in the above case. So if you're not certain how you should answer the "mode" part of the above example, ask your instructor before the next test.

About the only hard part of finding the mean, median, and mode is keeping straight which "average" is which. Just remember the following:

mean: regular meaning of "average"
median: middle value
mode: most often

(In the above, I've used the term "average" rather casually. The technical definition of "average" is the arithmetic mean: adding up the values and then dividing by the number of values. Since you're probably more familiar with the concept of "average" than with "measure of central tendency", I used the more comfortable term.)

A student has gotten the following grades on his tests: 87, 95, 76, and 88. He wants an 85 or better overall. What is the minimum grade he must get on the last test in order to achieve that average?
The unknown score is "x". Then the desired average is:

(87 + 95 + 76 + 88 + x) ÷ 5 = 85

Multiplying through by 5 and simplifying, I get:
87 + 95 + 76 + 88 + x = 425
              346 + x = 425
                    x = 79

He needs to get at least a 79 on the last test.
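
In general, the required score on the final test is the number of tests times the target average, minus the sum of the scores already earned. Here is a minimal Python sketch of that calculation (the function name is just illustrative):

```python
# Minimum score needed on the last test to reach a desired overall average.
def required_last_score(previous_scores, target_average):
    n = len(previous_scores) + 1               # total number of tests, including the last one
    return n * target_average - sum(previous_scores)

print(required_last_score([87, 95, 76, 88], 85))   # 5*85 - 346 = 79
```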

Module 8. Rubrics, Portfolio and Performance-Based Assessment



Uses of a Portfolio
A portfolio is simply a portable collection of artifacts (Brown and Knight, 1994), or a record of what the creator has to offer in terms of range, quality of work/knowledge, level of skill attainment, and capabilities. Redman (1994) described portfolios as a source of good practice or, in the terms of Meeting the Challenge (DoH, 2000) and the Health Act (1999), evidence of competent practice. Redman (1994) also said that a portfolio is not a historical record or a profile of current competence, but a living, growing collection of evidence that mirrors the growth of its creator.  But what are the uses of a portfolio?
• Development meeting/appraisal  
At appraisal, your portfolio demonstrates that the activities and objectives have been undertaken and achieved, and where your career is heading.
• Course participation A  portfolio enables the learner to become more reflective, to  be able to recognize strengths and limitations and become more aware of the learning they have achieved (Brookfield, 1995).  
• Career development/promotion Competencies or benchmarks specify  performance criteria that have to be achieved  by employees seeking promotion  — a portfolio demonstrates achievement or development in these areas.
• Monitoring practice Evaluation of practice demonstrates  ongoing commitment to competence and can lead to improved performance in practice,  which can be  presented in the portfolio.
• Job/secondments/course applications A portfolio can  be compiled for a specific purpose and can contain specially selected material that provides evidence of capability and transferable skills matched to each of the criteria listed on the job specification.
• Accreditation of prior experience and learning (entry to a course) Preparing and presenting evidence of when and how learning occurred, what learning outcomes were achieved and how new learning is to be used in the future is a crucial role for a  portfolio. Certificates, reflective logs and testimonials can all contribute to this evidence.
• Marketability More people are becoming self-employed and combining jobs with private practice. The skills and abilities of the portfolio holder are clearly defined and can readily be reviewed by prospective employers or contractors.  Portfolios can be like a brochure about you or your service.
• Continuing professional development The process of structuring, reflecting on and recording activity is a learning process in itself and can help to develop strategies for thinking and reasoning in and on practice.
• Planning for the future CPD, which is driven by a development plan, can be defined as: “The maintenance and enhancement of the knowledge, expertise and competence of professionals throughout their careers according  to a  plan formulated  with regard to the needs of the professional, the employer, the profession and society'”(Madden and  Mitchell, 1993).
• Fed up? Need some reality orientation? Looking back through a portfolio at the things you have achieved and the situations you have handled reminds you that things are constantly changing and developing, and that you are making a difference. A portfolio reminds you of what you have achieved.
• Build up your own evidence base A development plan, a portfolio and especially the reflective log will help deepen your knowledge of occupational therapy related issues, keep you up to date and maintain the skills needed for you to carry out your job. It allows you to demonstrate that the service you offer to your clients or patients is the most appropriate one for them.  A portfolio has many uses, and will be looked at in its various forms by many different people. However, in essence it is a presentation of the interests, objectives, experiences, skills and development of the creator.

Module 9. Grading and Reporting Practices




Purpose for Grades:
The primary purpose for grading should be to communicate with students, parents, and others (the board, the school division, post secondary institutions, and the Ministry) about their achievement of learning goals. 
The secondary purposes for grading include providing teachers with information for instructional planning…and providing teachers, administrators, parents, and students with information for evaluation of school programs and for student placement.  Grades and other communication about student achievement should be based on solid, high-quality evidence. Teachers should be able to describe that evidence and explain how they arrived at any judgments about the quality of student work. (Brookhart, 2009)
Definitions:
·         Grade
The number or letter reported at the end of a period of time as a summary statement of student performance and/or achievement 
·         Assessment
Gathering evidence of student learning.  Planned or serendipitous activities that provide information about students’ understanding and skill in a specific  measurement topic (Marzano, 2006)
·         Descriptive Feedback
Gives information that enables the learner to adjust what he or she is doing in order to improve. Descriptive feedback comes from many sources such as teachers, peers, and the students themselves, as they compare their work to samples and related criteria. (Davies, 2008) “The most powerful single innovation that enhances achievement is feedback. The simplest prescription for improving education must be ‘dollops of feedback’.” (Marzano, 2006)
(Assessment FOR Learning – AFL is formative assessment plus deep involvement of the learner).  Descriptive feedback is not reported as a grade.
·         Evaluation and/or Evaluative Feedback
Tells the learner how she or he has performed as compared to others or to some standard. Evidence helps the teacher consider whether the student has learned what was needed, and how well they learned it. Evaluative feedback is often reported using letters, numbers, checks, or other symbols. (Assessment OF Learning) (Davies, 2008). The process of making judgments about the levels of students’ understanding or skill based on an assessment (Marzano, 2006)



Example:
Grading System
The formula for the computation of grades for non-review subjects shall be as follows:


Class Standing        25%
Preliminary Exam      35%
Final Exam            40%
TOTAL                100%



For review subjects, the formula is as follows:
Class Standing        20%
Preliminary Exams     35%
Revalida              20%
Final Exams           35%
TOTAL                100%
The numerical 5-point system observed in the University shall be used for purposes of the transcript of records.
96 - 100     1.00     Excellent
94 - 95      1.25     Very Good
92 - 93      1.50     Very Good
89 - 91      1.75     Good
87 - 88      2.00     Good
84 - 86      2.25     Good
82 - 83      2.50     Fair
79 - 81      2.75     Fair
75 - 78      3.00     Pass
Below 75     5.00     Failure
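
For illustration only, here is a minimal Python sketch of how a weighted final grade and its transcript equivalent could be computed under this system. The component scores and function names are hypothetical; the weights are those listed above for non-review subjects, and the conversion follows the 5-point scale above.

```python
# Weighted final grade for a non-review subject and its 5-point equivalent.
# Component scores below are hypothetical; weights and scale follow the tables above.

WEIGHTS = {"class_standing": 0.25, "prelim_exam": 0.35, "final_exam": 0.40}

# (lower bound of bracket, 5-point grade, descriptive rating)
SCALE = [
    (96, 1.00, "Excellent"),
    (94, 1.25, "Very Good"),
    (92, 1.50, "Very Good"),
    (89, 1.75, "Good"),
    (87, 2.00, "Good"),
    (84, 2.25, "Good"),
    (82, 2.50, "Fair"),
    (79, 2.75, "Fair"),
    (75, 3.00, "Pass"),
]

def final_grade(scores):
    """Weighted average of the percentage scores, using the non-review weights."""
    return sum(scores[part] * weight for part, weight in WEIGHTS.items())

def to_five_point(percentage):
    """Map a percentage grade to its 5-point equivalent and rating."""
    for lower_bound, grade, rating in SCALE:
        if percentage >= lower_bound:
            return grade, rating
    return 5.00, "Failure"   # below 75

scores = {"class_standing": 90, "prelim_exam": 85, "final_exam": 88}  # hypothetical
pct = final_grade(scores)                    # 0.25*90 + 0.35*85 + 0.40*88 = 87.45
print(round(pct, 2), to_five_point(pct))     # 87.45 (2.0, 'Good')
```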

WF - Withdrew without permission
WP - Withdrew with permission
FA - Failure due to absences (Failed)
