A. RELATED LITERATURE
Do pupils make more progress by taking charge of
their own learning?
Assessment for Learning – The process of seeking
and interpreting evidence for use by learners and their teachers to decide
where the learners are in their learning, where they need to go and how best to
get there.
Assessment for Learning key findings:
· Aims to help pupils to know and recognise the standards that they are aiming for. Learners need to be clear about exactly what they have to achieve in order to progress, and learning goals should be shared with them. They need to understand what counts as good work.
· Aims to encourage pupils to be active* learners, which evidence suggests leads them to make greater improvements than passive* learners.
· Involves pupils in peer and self-assessment. Ultimately, learners must be responsible for their own learning; the teacher cannot do that for them. So pupils must be actively involved in the process and need to be encouraged to see for themselves how they have progressed in their learning and what they need to do to improve. Teachers need to encourage pupils to review their work critically and constructively.
· Provides feedback, which leads to pupils recognising their next steps and how to take them. Feedback should be about the qualities of the work, with specific advice on what needs to be done in order to improve. Pupils need to be given the time to act on advice and make decisions about their work, rather than being the passive recipients of teachers’ judgements.
* Active learners can be defined as those who:
· can select information for their own use
· can offer and challenge opinions
· independently judge their own work against given criteria and standards
* Passive learners can be defined as those who:
· are recipients of knowledge
· await guidance at every stage before making progress
· rely on others to judge the quality of their work
Learning Targets
- A learning target includes both what students will know, understand, and be able to do and the criteria that will be used to judge performance.
- The criteria that are used to judge performance can be thought of as the different dimensions of student performance that will be used to judge whether or not your objectives have been met.
- Learning targets emphasize the link between instruction and assessment: when writing these objectives, you should always be thinking about assessment. How will you determine if students have learned what you have taught? What observable behaviors (either verbal or written) will demonstrate that students have met the objectives? If you find these questions difficult to answer, then you probably have not written very good learning objectives.
Types of Learning Targets
- Knowledge and Simple Understanding: This includes mastery of facts and information, typically through recall (e.g., dates, definitions, and principles), as well as simple understanding (e.g., summarizing a paragraph, explaining a chart, and giving examples).
- Deep Understanding and Reasoning: This includes problem solving, critical thinking, synthesis, comparing, higher-order thinking skills, and judgment.
- Skills: This involves something that a student must demonstrate in a way other than answering questions. This type of target involves a behavior in which the knowledge, understanding, and reasoning are used overtly.
- Products: This includes a sample of student work (e.g., a paper, report, artwork, or other project) that demonstrates the ability to use knowledge, understanding, reasoning, and skills.
- Affective: This includes attitudes, values, interests, feelings, and beliefs.
Sources of Learning Targets
- Bloom's Taxonomy
- National, State, and District Standards
- Textbooks
Test Characteristics
A test, as an instrument, must possess certain qualities before it can be considered usable as a test.
A test should therefore possess the characteristics listed below, which are interdependent and together make a test what it should be.
They include:
• Validity - A test is valid when it fulfils its purpose(s), that is, when it measures what it is intended to measure, to the extent desired. The characteristics of the testee can blur the true validity of a test. That is, the test can produce false results that do not truly represent what it is intended to measure in a student. For example, if a learner has difficulty accessing the Internet for course materials and participation, the results can give the wrong impression of the learner's commitment to logging in and ability in the course work.
• Reliability - The consistency with which a test accurately measures what it is supposed to measure is its strength in reliability. It is the ‘extent to which a particular measurement is consistent and reproducible’.
• Objectivity - The fairness of a test to the testee. A biased test does not show objectivity and hence is not reliable. A test that is objective has high validity and reliability.
• Discrimination - A good test must be able to distinguish between poor and good learners; it should show the slight differences in learners' attainment and achievement that make this distinction possible. What are the likely criteria for satisfying these conditions?
• Comprehensiveness - A test whose items cover much of the content of the course, that is, the subject matter, is said to be comprehensive and hence capable of fulfilling its purpose.
• Ease of administration - A good test should not pose difficulties in administration.
• Practicality and scoring - Assigning a quantitative value to a test result should not be difficult. Why, what and how.
• Usability - A good test should be usable, unambiguous and clearly stated, with one meaning only.
Assessment Tools
Below are links to assessment tools and techniques along with specific
geoscience examples and resources.
Concept Maps - A diagramming technique for assessing how well students
see the "big picture".
Concept Tests - Conceptual multiple-choice questions that are useful in
large classes.
Knowledge Survey - Students answer whether they could answer a survey of
course content questions.
Exams - Find tips on how to make exams better assessment instruments.
Oral Presentations - Tips for evaluating student presentations.
Poster Presentations - Tips for evaluating poster presentations.
Peer Review - Having students assess themselves and each other.
Portfolios - A collection of evidence to demonstrate mastery of a given
set of concepts.
Rubrics - A set of evaluation criteria based on learning goals and
student performance.
Written Reports - Tips for assessing written reports.
Other Assessment Types - Includes concept sketches, case studies,
seminar-style courses, mathematical thinking and performance assessments.
Topics of Particular Interest
Large Class Assessment - Learn more about assessment strategies that are
particularly useful for large classes and see examples of how techniques were
employed in geoscience classes.
Using Technology - Learn more about how technology can improve classroom
assessment and see how techniques were employed in geoscience classes.
Reliability & Validity
We often think of reliability and validity as separate ideas but, in fact, they're related to each other. Here, I want to show you two ways you can think about their relationship.
One of my favorite metaphors for the relationship between reliability and validity is
that of the target. Think of the center of the target as the concept that you
are trying to measure. Imagine that for each person you are measuring, you are
taking a shot at the target. If you measure the concept perfectly for a person,
you are hitting the center of the target. If you don't, you are missing the
center. The more you are off for that person, the further you are from the center.
The figure above shows four possible situations. In the first one, you
are hitting the target consistently, but you are missing the center of the
target. That is, you are consistently and systematically measuring the wrong
value for all respondents. This measure is reliable, but not valid (that is,
it's consistent but wrong). The second shows hits that are randomly spread
across the target. You seldom hit the center of the target but, on average, you
are getting the right answer for the group (but not very well for individuals).
In this case, you get a valid group estimate, but you are inconsistent. Here,
you can clearly see that reliability is directly related to the variability of
your measure. The third scenario shows a case where your hits are spread across
the target and you are consistently missing the center. Your measure in this
case is neither reliable nor valid. Finally, we see the "Robin Hood"
scenario -- you consistently hit the center of the target. Your measure is both
reliable and valid (I bet you never thought of Robin Hood in those terms
before).
Another way we can think about the relationship between reliability and
validity is shown in the figure below. Here, we set up a 2x2 table. The columns
of the table indicate whether you are trying to measure the same or different
concepts. The rows show whether you are using the same or different methods of
measurement. Imagine that we have two concepts we would like to measure,
student verbal and math ability. Furthermore, imagine that we can measure each
of these in two ways. First, we can use a written, paper-and-pencil exam (very
much like the SAT or GRE exams). Second, we can ask the student's classroom
teacher to give us a rating of the student's ability based on their own
classroom observation.
The first cell on the upper left shows the comparison of the verbal
written test score with the verbal written test score. But how can we compare
the same measure with itself? We could do this by estimating the reliability of
the written test through a test-retest correlation, parallel forms, or an
internal consistency measure. What we
are estimating in this cell is the reliability of the measure.
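To make the test-retest idea concrete, here is a minimal sketch that estimates reliability as the Pearson correlation between two administrations of the same written test. The score lists are hypothetical and the function name `pearson_r` is my own; a real study would use many more students.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)


# Hypothetical verbal written test scores for five students, tested twice
first_sitting = [52, 61, 70, 48, 66]
second_sitting = [55, 60, 72, 50, 63]

reliability = pearson_r(first_sitting, second_sitting)
print(f"test-retest reliability estimate: {reliability:.2f}")
```

A value close to 1 suggests the test orders students consistently across the two sittings; a value near 0 suggests it does not.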
The cell on the lower left shows a comparison of the verbal written
measure with the verbal teacher observation rating. Because we are trying to
measure the same concept, we are looking at convergent validity.
The cell on the upper right shows the comparison of the verbal written
exam with the math written exam. Here, we are comparing two different concepts
(verbal versus math) and so we would expect the relationship to be lower than a
comparison of the same concept with itself (e.g., verbal versus verbal or math
versus math). Thus, we are trying to discriminate between two concepts and we
would consider this discriminant validity.
Finally, we have the cell on the lower right. Here, we are comparing the
verbal written exam with the math teacher observation rating. Like the cell on
the upper right, we are also trying to compare two different concepts (verbal
versus math) and so this is a discriminant validity estimate. But here, we are
also trying to compare two different methods of measurement (written exam
versus teacher observation rating). So, we'll call this very discriminant
to indicate that we would expect the relationship in this cell to be even lower
than in the one above it.
The four cells incorporate the different values that we examine in
the multitrait-multimethod approach to estimating construct validity.
When we look at reliability and validity in this way, we see that, rather
than being distinct, they actually form a continuum. On one end is the
situation where the concepts and methods of measurement are the same
(reliability) and on the other is the situation where concepts and methods of
measurement are different (very discriminant validity).
Basic Concepts in Item and Test Analysis
Making fair and systematic evaluations of
others' performance can be a challenging task. Judgments cannot be made solely
on the basis of intuition, haphazard guessing, or custom (Sax, 1989). Teachers,
employers, and others in evaluative positions use a variety of tools to assist
them in their evaluations. Tests are tools that are frequently used to
facilitate the evaluation process. When norm-referenced tests are developed for
instructional purposes, to assess the effects of educational programs, or for
educational research purposes, it can be very important to conduct item and
test analyses.
Test analysis examines how the test items
perform as a set. Item analysis "investigates the performance of items
considered individually either in relation to some external criterion or in
relation to the remaining items on the test" (Thompson & Levitov,
1985, p. 163). These analyses evaluate the quality of items and of the test as
a whole. Such analyses can also be employed to revise and improve both items
and the test as a whole.
However, some best practices in item and test
analysis are too infrequently used in actual practice. The purpose of the
present paper is to summarize the recommendations for item and test analysis
practices, as these are reported in commonly-used measurement textbooks
(Crocker & Algina, 1986; Gronlund & Linn, 1990; Pedhazur &
Schmelkin, 1991; Sax, 1989; Thorndike, Cunningham, Thorndike, & Hagen,
1991). These tools include item difficulty, item discrimination, and item
distractors.
Item Difficulty
Item difficulty is simply the percentage of
students taking the test who answered the item correctly. The larger the
percentage getting an item right, the easier the item. The higher the
difficulty index, the easier the item is understood to be (Wood, 1960). To
compute the item difficulty, divide the number of people answering the item
correctly by the total number of people answering the item. The proportion for the
item is usually denoted as p and is called item difficulty (Crocker
& Algina, 1986). An item answered correctly by 85% of the examinees would
have an item difficulty, or p value, of .85, whereas an item
answered correctly by 50% of the examinees would have a lower item difficulty,
or p value, of .50.
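The calculation described above can be sketched in a few lines. The response data and the function name `item_difficulty` are hypothetical, chosen only to reproduce the .85 case from the text.

```python
def item_difficulty(responses, correct_answer):
    """Proportion of examinees who chose the keyed answer (the p value)."""
    if not responses:
        raise ValueError("at least one response is required")
    n_correct = sum(1 for r in responses if r == correct_answer)
    return n_correct / len(responses)


# Hypothetical item: 20 examinees, four options, "C" is the keyed answer.
responses = ["C"] * 17 + ["A", "B", "D"]

p = item_difficulty(responses, "C")
print(f"p = {p:.2f}")  # 17 of 20 examinees answered correctly
```

Because p depends only on who answered correctly, the same item can yield a different p value when a different group takes the test.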
A p value is basically a
behavioral measure. Rather than defining difficulty in terms of some intrinsic
characteristic of the item, difficulty is defined in terms of the relative
frequency with which those taking the test choose the correct response
(Thorndike et al., 1991). For instance, in the example below, which item is more
difficult?
- Who was Boliver Scagnasty?
- Who was Martin Luther King?
One cannot determine which item is more
difficult simply by reading the questions. One can recognize the name in the
second question more readily than that in the first. But saying that the first
question is more difficult than the second, simply because the name in the
second question is easily recognized, would be to compute the difficulty of the
item using an intrinsic characteristic. This method determines the difficulty
of the item in a much more subjective manner than that of a p value.
Another implication of a p value
is that the difficulty is a characteristic of both the item and the sample
taking the test. For example, an English test item that is very difficult for
an elementary student will be very easy for a high school student. A p value
also provides a common measure of the difficulty of test items that measure
completely different domains. It is very difficult to determine whether
answering a history question involves knowledge that is more obscure, complex,
or specialized than that needed to answer a math problem. When p values
are used to define difficulty, it is very simple to determine whether an item
on a history test is more difficult than a specific item on a math test taken
by the same group of students.
To make this more concrete, take into
consideration the following examples. When the correct answer is not chosen (p =
0), there are no individual differences in the "score" on that item.
As shown in Table 1, the correct answer C was not chosen by either the upper
group or the lower group. (The upper group and lower group will be explained
later.) The same is true when everyone taking the test chooses the correct
response, as is seen in Table 2. An item with a p value of .0
or a p value of 1.0 does not contribute to measuring
individual differences, and is therefore almost certain to be useless. Item
difficulty has a profound effect on both the variability of test scores and the
precision with which test scores discriminate among different groups of
examinees (Thorndike et al., 1991). When all of the test items are extremely
difficult, the great majority of the test scores will be very low. When all
items are extremely easy, most test scores will be extremely high. In either
case, test scores will show very little variability. Thus, extreme p values
directly restrict the variability of test scores.
In discussing the procedure for determining
the minimum and maximum score on a test, Thompson and Levitov (1985) stated
that items tend to improve test reliability when the percentage of students who
correctly answer the item is halfway between the percentage expected to
correctly answer if pure guessing governed responses and the percentage (100%)
who would correctly answer if everyone knew the answer. (pp. 164-165)
For example, many teachers may think that the
minimum score on a test consisting of 100 items with four alternatives each is
0, when in actuality the theoretical floor on such a test is 25. This is the
score that would be most likely if a student answered every item by guessing
(e.g., without even being given the test booklet containing the items).
Similarly, the ideal percentage of correct
answers on a four-choice multiple-choice test is not 70-90%. According to
Thompson and Levitov (1985), the ideal difficulty for such an item would be
halfway between the percentage of pure guess (25%) and 100%, that is,
25% + (100% - 25%)/2 = 62.5%. Therefore, for a test with 100 items with four
alternatives each, the ideal mean percentage of correct items, for the purpose
of maximizing score reliability, is roughly 63%. Tables 3, 4, and 5 show
examples of items with p values of roughly 63%.
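As a quick check of the arithmetic, the sketch below computes the chance-level score and the halfway point between chance and 100% for items with a given number of alternatives. The function names are my own and are not taken from Thompson and Levitov.

```python
def chance_score(n_items, n_alternatives):
    """Expected number correct if every item is answered by pure guessing."""
    return n_items / n_alternatives


def ideal_difficulty(n_alternatives):
    """Halfway point between the chance proportion and a perfect 1.0."""
    chance = 1 / n_alternatives
    return chance + (1 - chance) / 2


print(chance_score(100, 4))   # expected guessing score on the 100-item test
print(ideal_difficulty(4))    # 0.625, i.e. roughly 63%
```

The same function gives 0.75 for true/false items (two alternatives) and 0.60 for five-option items, which shows how the target difficulty shifts with the number of alternatives.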
Table 3
Maximum Item Difficulty Example Illustrating Individual Differences
Group       | Item Response
            | A | B | C* | D
Upper group | 1 | 0 | 13 | 3
Lower group | 2 | 5 | 5  | 6
Note. * denotes correct response
Item difficulty: p = (13 + 5)/30 = .60
Discrimination Index: (13 - 5)/15 = .53
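The two statistics under Table 3 can be reproduced with a short sketch. It assumes upper and lower groups of 15 examinees each, which is what the denominators 30 and 15 in the formulas imply; the function name `item_statistics` is hypothetical.

```python
def item_statistics(upper_correct, lower_correct, group_size):
    """Return (item difficulty, discrimination index) for one item.

    upper_correct / lower_correct: number in each group choosing the keyed answer.
    group_size: number of examinees in each group (assumed equal for both).
    """
    difficulty = (upper_correct + lower_correct) / (2 * group_size)
    discrimination = (upper_correct - lower_correct) / group_size
    return difficulty, discrimination


# Keyed answer C in Table 3: 13 in the upper group, 5 in the lower group
p, d = item_statistics(upper_correct=13, lower_correct=5, group_size=15)
print(f"difficulty = {p:.2f}, discrimination = {d:.2f}")  # 0.60 and 0.53
```

A positive discrimination index means more upper-group than lower-group examinees chose the keyed answer, which is the pattern one hopes to see.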
Mean, Median, Mode, and Range
Mean, median, and mode are three kinds of "averages". There are many "averages" in statistics, but these are, I think, the three most common, and are certainly the three you are most likely to encounter in your pre-statistics courses, if the topic comes up at all.
The "mean"
is the "average" you're used to, where you add up all the numbers and
then divide by the number of numbers. The "median" is the
"middle" value in the list of numbers. To find the median, your
numbers have to be listed in numerical order, so you may have to rewrite your
list first. The "mode" is the value that occurs most often. If no
number is repeated, then there is no mode for the list.
The "range"
is just the difference between the largest and smallest values.
Find the mean,
median, mode, and range for the following list of values:
13, 18, 13, 14, 13,
16, 14, 21, 13
The mean is the usual
average, so:
(13 + 18 + 13 + 14 +
13 + 16 + 14 + 21 + 13) ÷ 9 = 15
Note that the mean
isn't a value from the original list. This is a common result. You should not
assume that your mean will be one of your original numbers.
The median is the
middle value, so I'll have to rewrite the list in order:
13, 13, 13, 13, 14,
14, 16, 18, 21
There are nine
numbers in the list, so the middle one will be the (9 + 1) ÷ 2 = 10 ÷ 2 = 5th
number:
13, 13, 13, 13, 14,
14, 16, 18, 21
So the median is
14.
The mode is the number that is repeated more often than any other, so 13 is the mode.
The largest value in
the list is 21, and the smallest is 13, so the range is 21 – 13 = 8.
mean: 15
median: 14
mode: 13
range: 8
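If you want to check answers like these by machine, here is a minimal sketch using Python's standard library. The helper names `modes` and `summarize` are my own, and `modes` follows the convention used here that a list with no repeated value has no mode (it returns an empty list).

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """All values tied for most frequent; empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


def summarize(values):
    """Mean, median, mode(s), and range of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "modes": modes(values),
        "range": max(values) - min(values),
    }


values = [13, 18, 13, 14, 13, 16, 14, 21, 13]
for name, result in summarize(values).items():
    print(f"{name}: {result}")
```

Note that `median` sorts the values itself, so the list does not need to be rewritten in order first.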
Note: The formula for
the place to find the median is "( [the number of data points] + 1) ÷
2", but you don't have to use this formula. You can just count in from
both ends of the list until you meet in the middle, if you prefer. Either way
will work.
Find the mean,
median, mode, and range for the following list of values:
1, 2, 4, 7
The mean is the usual
average: (1 + 2 + 4 + 7) ÷ 4 = 14 ÷ 4 = 3.5
The median is the
middle number. In this example, the numbers are already listed in numerical
order, so I don't have to rewrite the list. But there is no "middle"
number, because there are an even number of numbers. In this case, the median
is the mean (the usual average) of the middle two values: (2 + 4) ÷ 2 = 6 ÷ 2 =
3
The mode is the
number that is repeated most often, but all the numbers appear only once. Then
there is no mode.
The largest value is
7, the smallest is 1, and their difference is 6, so the range is 6.
mean: 3.5
median: 3
mode: none
range: 6
The list values were
whole numbers, but the mean was a decimal value. Getting a decimal value for
the mean (or for the median, if you have an even number of data points) is
perfectly okay; don't round your answers to try to match the format of the
other numbers.
Find the mean,
median, mode, and range for the following list of values:
8, 9, 10, 10, 10, 11, 11, 11, 12, 13
The mean is the usual
average:
(8 + 9 + 10 + 10 + 10
+ 11 + 11 + 11 + 12 + 13) ÷ 10 = 105 ÷ 10 = 10.5
The median is the
middle value. In a list of ten values, that will be the (10 + 1) ÷ 2 = 5.5th
value; that is, I'll need to average the fifth and sixth numbers to find the
median:
(10 + 11) ÷ 2 = 21 ÷
2 = 10.5
The mode is the
number repeated most often. This list has two values that are repeated three
times.
The largest value is
13 and the smallest is 8, so the range is 13 – 8 = 5.
mean: 10.5
median: 10.5
modes: 10 and 11
range: 5
While unusual, it can
happen that two of the averages (the mean and the median, in this case) will
have the same value.
Note: Depending on
your text or your instructor, the above data set may be viewed as having no
mode (rather than two modes), since no single solitary number was repeated more
often than any other. I've seen books that go either way; there doesn't seem to
be a consensus on the "right" definition of "mode" in the
above case. So if you're not certain how you should answer the "mode"
part of the above example, ask your instructor before the next test.
About the only hard
part of finding the mean, median, and mode is keeping straight which
"average" is which. Just remember the following:
mean: regular meaning
of "average"
median: middle value
mode: most often
(In the above, I've
used the term "average" rather casually. The technical definition of
"average" is the arithmetic mean: adding up the values and then
dividing by the number of values. Since you're probably more familiar with the
concept of "average" than with "measure of central
tendency", I used the more comfortable term.)
A student has gotten
the following grades on his tests: 87, 95, 76, and 88. He wants an 85 or better
overall. What is the minimum grade he must get on the last test in order to
achieve that average?
The unknown score is
"x". Then the desired average is:
(87 + 95 + 76 + 88 +
x) ÷ 5 = 85
Multiplying through
by 5 and simplifying, I get:
87 + 95 + 76 + 88 + x
= 425
346 + x = 425
x = 79
He needs to get at
least a 79 on the last test.
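The same algebra can be written as a small function: multiply the target average by the total number of tests, then subtract the scores already earned. The function name `minimum_final_score` is hypothetical.

```python
def minimum_final_score(scores, target_average):
    """Score needed on one remaining test to reach the target average."""
    n_tests = len(scores) + 1          # tests taken so far plus the last one
    required_total = target_average * n_tests
    return required_total - sum(scores)


needed = minimum_final_score([87, 95, 76, 88], 85)
print(needed)  # 425 - 346 = 79
```

If the result comes out higher than the maximum possible score on the test, the target average is out of reach.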
Uses of a Portfolio
A portfolio is simply a portable collection of artifacts (Brown and Knight, 1994), or a record of what the creator has to offer in terms of range, quality of work/knowledge/level of skill attainment and capabilities. Redman (1994) described portfolios as a source of good practice or, in the terms of Meeting the Challenge (DoH, 2000) and the Health Act (1999), evidence of competent practice. Redman (1994) also said that a portfolio is not a historical record or a profile of current competence, but a living, growing collection of evidence that mirrors the growth of its creator. But what are the uses of a portfolio?
• Development meeting/appraisal: At appraisal, your portfolio demonstrates that the activities and objectives have been undertaken and achieved, and shows where your career is heading.
• Course participation: A portfolio enables the learner to become more reflective, to recognize strengths and limitations and to become more aware of the learning they have achieved (Brookfield, 1995).
• Career development/promotion: Competencies or benchmarks specify performance criteria that have to be achieved by employees seeking promotion; a portfolio demonstrates achievement or development in these areas.
• Monitoring practice: Evaluation of practice demonstrates ongoing commitment to competence and can lead to improved performance in practice, which can be presented in the portfolio.
• Job/secondments/course applications: A portfolio can be compiled for a specific purpose and can contain specially selected material that provides evidence of capability and transferable skills matched to each of the criteria listed on the job specification.
• Accreditation of prior experience and learning (entry to a course): Preparing and presenting evidence of when and how learning occurred, what learning outcomes were achieved and how new learning is to be used in the future is a crucial role for a portfolio. Certificates, reflective logs and testimonials can all contribute to this evidence.
• Marketability: More people are becoming self-employed and combining jobs with private practice. The skills and abilities of the portfolio holder are clearly defined and can readily be reviewed by prospective employers or contractors. Portfolios can be like a brochure about you or your service.
• Continuing professional development: The process of structuring, reflecting on and recording activity is a learning process in itself and can help to develop strategies for thinking and reasoning in and on practice.
• Planning for the future: CPD, which is driven by a development plan, can be defined as: “The maintenance and enhancement of the knowledge, expertise and competence of professionals throughout their careers according to a plan formulated with regard to the needs of the professional, the employer, the profession and society” (Madden and Mitchell, 1993).
• Fed up? Need some reality orientation? Looking back through a portfolio at the things you have achieved and handled reminds you that things are constantly changing and developing, and that you are making a difference. A portfolio reminds you of what you have achieved.
• Build up your own evidence base: A development plan, portfolio and especially the reflective log will help you deepen your knowledge of occupational therapy related issues, keep up to date and maintain the skills you need to carry out your job. It allows you to demonstrate that the service you offer to your clients or patients is the most appropriate one for them.
A portfolio has many uses, and will be looked at in its various forms by many different people. However, in essence it is a presentation of the interests, objectives, experiences, skills and development of the creator.
Purpose for Grades:
The primary purpose
for grading should be to communicate with students, parents, and others (the
board, the school division, post secondary institutions, and the Ministry)
about their achievement of learning goals.
The secondary
purposes for grading include providing teachers with information for
instructional planning…and providing teachers, administrators, parents, and
students with information for evaluation of school programs and for student
placement. Grades and other communication
about student achievement should be based on solid, high-quality evidence.
Teachers should be able to describe that evidence and explain how they arrived
at any judgments about the quality of student work (Brookhart, 2009).
Definitions:
· Grade: The number or letter reported at the end of a period of time as a summary statement of student performance and/or achievement.
· Assessment: Gathering evidence of student learning. Planned or serendipitous activities that provide information about students’ understanding and skill in a specific measurement topic (Marzano, 2006).
· Descriptive Feedback: Gives information that enables the learner to adjust what he or she is doing in order to improve. Descriptive feedback comes from many sources such as teachers, peers, and the students themselves, as they compare their work to samples and related criteria (Davies, 2008). “The most powerful single innovation that enhances achievement is feedback. The simplest prescription for improving education must be ‘dollops of feedback’.” (Marzano, 2006) (Assessment FOR Learning – AFL is formative assessment plus deep involvement of the learner.) Descriptive feedback is not reported as a grade.
· Evaluation and/or Evaluative Feedback: Tells the learner how she or he has performed as compared to others or to some standard. Evidence helps the teacher consider whether the student has learned what was needed, and how well they learned it. Evaluative feedback is often reported using letters, numbers, checks, or other symbols (Assessment OF Learning) (Davies, 2008). The process of making judgments about the levels of students’ understanding or skill based on an assessment (Marzano, 2006).
Grading System
The formula for the computation of grades for non-review subjects shall be as follows:
Class Standing   | 25%
Preliminary Exam | 35%
Final Exam       | 40%
TOTAL            | 100%
For review subjects, the formula is as follows:
Class Standing    | 20%
Preliminary Exams | 35%
Revalida          | 20%
Final Exams       | 35%
TOTAL             | 100%
The numerical 5-point system observed in the University shall be used for purposes of the transcript of records.
96 - 100 | 1.00 | Excellent
94 - 95  | 1.25 | Very Good
92 - 93  | 1.50 | Very Good
89 - 91  | 1.75 | Good
87 - 88  | 2.00 | Good
84 - 86  | 2.25 | Good
82 - 83  | 2.50 | Fair
79 - 81  | 2.75 | Fair
75 - 78  | 3.00 | Pass
Below 75 | 5.00 | Failure
WF | Withdrew without permission
WP | Withdrew with permission
FA | Failure due to absences | Failed
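As an illustration of how the non-review formula and the 5-point scale fit together, here is a minimal sketch. The component scores are hypothetical, and rounding the weighted percentage to the nearest whole number before looking up the band is my own assumption, since the table lists whole-number ranges only.

```python
import math

# Non-review subject weights, in percent, as listed in the grading formula
NON_REVIEW_WEIGHTS = {"class_standing": 25, "preliminary_exam": 35, "final_exam": 40}

# (lowest percentage in the band, point grade, description), highest band first
GRADE_BANDS = [
    (96, "1.00", "Excellent"),
    (94, "1.25", "Very Good"),
    (92, "1.50", "Very Good"),
    (89, "1.75", "Good"),
    (87, "2.00", "Good"),
    (84, "2.25", "Good"),
    (82, "2.50", "Fair"),
    (79, "2.75", "Fair"),
    (75, "3.00", "Pass"),
]


def weighted_percentage(scores, weights=NON_REVIEW_WEIGHTS):
    """Combine component scores (each 0-100) using percentage weights."""
    return sum(scores[name] * weight for name, weight in weights.items()) / 100


def point_grade(percentage):
    """Map a percentage to (point grade, description) on the 5-point scale."""
    rounded = math.floor(percentage + 0.5)  # assumption: round half up first
    for lowest, grade, description in GRADE_BANDS:
        if rounded >= lowest:
            return grade, description
    return "5.00", "Failure"


# Hypothetical student in a non-review subject
scores = {"class_standing": 90, "preliminary_exam": 80, "final_exam": 85}
total = weighted_percentage(scores)
print(total, point_grade(total))
```

The withdrawal and absence codes (WF, WP, FA) are administrative marks rather than computed grades, so they are left out of the lookup.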