From Wikipedia, the free encyclopedia
|Purpose||Comparison of education attainment across the world|
|Membership||59 government education departments|
|Head of the Early Childhood and Schools Division||Michael Davidson|
|Main organ||PISA Governing Body (Chair – Lorna Bertrand, England)|
The Programme for International Student Assessment (PISA) is a worldwide study by the Organisation for Economic Co-operation and Development (OECD) of the scholastic performance of 15-year-old school pupils in mathematics, science, and reading, conducted in member and non-member nations. It was first performed in 2000 and has been repeated every three years since. Its aim is to improve education policies and outcomes.
470,000 15-year-old students representing 65 nations and territories participated in PISA 2009. An additional 50,000 students representing nine nations were tested in 2010.
The Trends in International Mathematics and Science Study (TIMSS) and the Progress in International Reading Literacy Study (PIRLS) by the International Association for the Evaluation of Educational Achievement are similar studies.
PISA stands in a tradition of international school studies, undertaken since the late 1950s by the International Association for the Evaluation of Educational Achievement (IEA). Much of PISA's methodology follows the example of the Trends in International Mathematics and Science Study (TIMSS, started in 1995), which in turn was much influenced by the U.S. National Assessment of Educational Progress (NAEP). The reading component of PISA is inspired by the IEA's Progress in International Reading Literacy Study (PIRLS).
The PISA mathematics literacy test asks students to apply their mathematical knowledge to solve problems set in real-world contexts. To solve the problems students must activate a number of mathematical competencies as well as a broad range of mathematical content knowledge. TIMSS, on the other hand, measures more traditional classroom content such as an understanding of fractions and decimals and the relationship between them (curriculum attainment). PISA claims to measure education's application to real-life problems and lifelong learning (workforce knowledge).
In the reading test, "OECD/PISA does not measure the extent to which 15-year-old students are fluent readers or how competent they are at word recognition tasks or spelling." Instead, they should be able to "construct, extend and reflect on the meaning of what they have read across a wide range of continuous and non-continuous texts."
Developed from 1997, the first PISA assessment was carried out in 2000. The results of each period of assessment take about a year and a half to analyse. First results were published in November 2001. The release of raw data and the publication of the technical report and data handbook only took place in spring 2002. The triennial repeats follow a similar schedule; the process of seeing through a single PISA cycle, start-to-finish, always takes over four years.
Every period of assessment focuses on one of the three competence fields of reading, math, science; but the two others are tested as well. After nine years, a full cycle is completed: after 2000, reading was again the main domain in 2009.
|Period||Focus||OECD countries||Partner countries||Participating students||Notes|
|2000||Reading||28||4 + 11||265,000||The Netherlands disqualified from data analysis. 11 additional non-OECD countries took the test in 2002.|
|2003||Mathematics||30||11||275,000||UK disqualified from data analysis. Also included test in problem solving.|
|2006||Science||30||27||400,000||Reading scores for US disqualified from analysis due to misprint in testing materials.|
|2009||Reading||34||41 + 10||470,000||10 additional non-OECD countries took the test in 2010.|
PISA is sponsored, governed, and coordinated by the OECD. The test design, implementation, and data analysis are delegated to an international consortium of research and educational institutions led by the Australian Council for Educational Research (ACER). ACER leads in developing and implementing sampling procedures and assisting with monitoring sampling outcomes across these countries. The assessment instruments fundamental to PISA's reading, mathematics, science, problem-solving, computer-based testing, background and contextual questionnaires are similarly constructed and refined by ACER. ACER also develops purpose-built software to assist in sampling and data capture, and analyses all data. The source code of the data analysis software is not made public.
The students tested by PISA are aged between 15 years and 3 months and 16 years and 2 months at the beginning of the assessment period. The school year pupils are in is not taken into consideration. Only students at school are tested, not home-schoolers. In PISA 2006, however, several countries also used a grade-based sample of students. This made it possible to study how age and school year interact.
To fulfill OECD requirements, each country must draw a sample of at least 5,000 students. In small countries like Iceland and Luxembourg, where there are fewer than 5,000 students per year, an entire age cohort is tested. Some countries used much larger samples than required to allow comparisons between regions.
Each student takes a two-hour handwritten test. Part of the test is multiple-choice and part involves fuller answers. There are six and a half hours of assessment material, but no student is tested on all the parts. Following the cognitive test, participating students spend nearly one more hour answering a questionnaire on their background including learning habits, motivation and family. School directors fill in a questionnaire describing school demographics, funding, etc.
In selected countries, PISA started experimentation with computer adaptive testing.
Countries are allowed to combine PISA with complementary national tests.
Germany does this in a very extensive way: On the day following the international test, students take a national test called PISA-E (E=Ergänzung=complement). Test items of PISA-E are closer to TIMSS than to PISA. While only about 5,000 German students participate in the international and the national test, another 45,000 take only the latter. This large sample is needed to allow an analysis by federal states. Following a clash about the interpretation of 2006 results, the OECD warned Germany that it might withdraw the right to use the "PISA" label for national tests.
From the beginning, PISA has been designed with one particular method of data analysis in mind. Since students work on different test booklets, raw scores must be 'scaled' to allow meaningful comparisons. Scores are thus scaled so that the OECD average in each domain (mathematics, reading and science) is 500 and the standard deviation is 100.
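The rescaling to a mean of 500 and a standard deviation of 100 is a linear transformation. The following is a minimal sketch of that idea only, not PISA's actual procedure: the function name `rescale` and the input values are hypothetical, and the real standardisation is anchored to a weighted OECD student population rather than to whatever sample is passed in.

```python
from statistics import mean, pstdev

def rescale(scores, target_mean=500.0, target_sd=100.0):
    """Linearly transform scores so that they have the target mean and SD."""
    m = mean(scores)
    s = pstdev(scores)  # population standard deviation of the input
    if s == 0:
        raise ValueError("scores have zero spread; cannot rescale")
    return [target_mean + target_sd * (x - m) / s for x in scores]

# Hypothetical ability estimates on an arbitrary internal scale.
raw = [-1.2, -0.4, 0.0, 0.3, 1.3]
scaled = rescale(raw)
```

Because the transformation is linear, it preserves the ordering of students and the relative distances between them; only the unit and origin of the scale change.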
This scaling is done using the Rasch model of item response theory (IRT). According to IRT, it is not possible to assess the competence of students who solved none or all of the test items. This problem is circumvented by imposing a Gaussian prior probability distribution of competences. The scaling procedure is described in nearly identical terms in the Technical Reports of PISA 2000, 2003, 2006. NAEP and TIMSS use similar scaling methods.
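In the Rasch model, the probability that a student of ability θ answers an item of difficulty b correctly is 1 / (1 + exp(−(θ − b))). The sketch below illustrates why a Gaussian prior resolves the all-or-nothing problem: without a prior, the likelihood of a perfect response pattern keeps increasing as θ grows, so no finite estimate exists, whereas the prior pulls the estimate back to a finite value. The function names, item difficulties and the use of a simple posterior-mode (MAP) estimate are assumptions made for illustration; PISA's operational scaling is more elaborate than this.

```python
import math

def rasch_prob(theta, b):
    """Rasch probability of a correct answer for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def map_ability(responses, difficulties, prior_mean=0.0, prior_sd=1.0, iters=50):
    """Posterior-mode ability estimate under a Gaussian prior (Newton's method).

    responses: list of 0/1 item scores; difficulties: matching item difficulties.
    """
    theta = prior_mean
    for _ in range(iters):
        probs = [rasch_prob(theta, b) for b in difficulties]
        # Gradient and second derivative of the log posterior with respect to theta.
        grad = sum(x - p for x, p in zip(responses, probs))
        grad -= (theta - prior_mean) / prior_sd ** 2
        hess = -sum(p * (1.0 - p) for p in probs) - 1.0 / prior_sd ** 2
        theta -= grad / hess
    return theta

# Hypothetical five-item booklet with difficulties on the logit scale.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
all_correct = map_ability([1, 1, 1, 1, 1], difficulties)
none_correct = map_ability([0, 0, 0, 0, 0], difficulties)
```

Both extreme response patterns receive finite estimates here, one above and one below the prior mean, which is the effect the prior is meant to achieve.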
All PISA results are tabulated by country; recent PISA cycles have separate provincial or regional results for some countries. Most public attention concentrates on just one outcome: countries' mean scores and their rankings against one another. In the official reports, however, country-by-country rankings are given not as simple league tables but as cross tables indicating for each pair of countries whether or not mean score differences are statistically significant (unlikely to be due to random fluctuations in student sampling or in item functioning). In favorable cases, a difference of 9 points is sufficient to be considered significant.
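A pairwise comparison of this kind can be sketched as a two-sided z-test on the difference between two independent country means. The function name, the 1.96 critical value (a 5% significance level) and the standard errors in the example are assumptions chosen for illustration; the official reports may use additional error components or adjustments not modelled here.

```python
import math

def difference_is_significant(mean_a, se_a, mean_b, se_b, z_crit=1.96):
    """Return True if two independent means differ significantly.

    se_a and se_b are the standard errors of the two country means.
    """
    z = (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(z) > z_crit

# Hypothetical countries 9 points apart, with small and slightly larger standard errors.
small_se = difference_is_significant(509.0, 3.0, 500.0, 3.0)
larger_se = difference_is_significant(509.0, 3.5, 500.0, 3.5)
```

With standard errors of 3.0 points each, the 9-point gap counts as significant under these assumptions; with 3.5 points each it does not, which is why the same score difference can be significant for one pair of countries and not for another.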
PISA never combines mathematics, science and reading domain scores into an overall score. However, commentators have sometimes combined test results from all three domains into an overall country ranking. Such meta-analysis is not endorsed by the OECD, although official summaries sometimes use scores from a testing cycle's principal domain as a proxy for overall student ability.
PISA 2012 was presented on 3 December 2013, with results for around 510,000 participating students in all 34 OECD member countries and 31 partner countries. This testing cycle had a particular focus on mathematics, where the mean score was 494.
Of the partner countries, only selected areas of three countries—India, Venezuela and China—were assessed. PISA 2009+, released in December 2011, included data from 10 additional partner countries which had testing delayed from 2009 to 2010 because of scheduling constraints.
The results for PISA 2003 were released on 14 December 2004. This PISA cycle tested 275,000 15-year-olds on mathematics, science, reading and problem solving and involved schools from 30 OECD member countries and 11 partner countries. Note that for Science and Reading, the means displayed are for "All Students", but for these two subjects (domains), not all of the students answered questions in these domains. In the 2003 OECD Technical Report (pages 208, 209), there are different country means (different from those displayed below) available for students who had exposure to these domains.
The results for the first cycle of the PISA survey were released on 14 November 2001. 265,000 15-year-olds were tested in 28 OECD countries and 4 partner countries on mathematics, science and reading. An additional 11 countries were tested later in 2002.
The correlation between PISA 2003 and TIMSS 2003 grade 8 country means is 0.84 in mathematics, 0.95 in science. The values go down to 0.66 and 0.79 if the two worst performing developing countries are excluded. Correlations between different scales and studies are around 0.80. The high correlations between different scales and studies indicate common causes of country differences (e.g. educational quality, culture, wealth or genes) or a homogeneous underlying factor of cognitive competence. European Economic Area countries perform slightly better in PISA; the Commonwealth of Independent States and Asian countries in TIMSS. Content balance and years of schooling explain most of the variation.
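The sensitivity of such correlations to a few low-scoring countries can be illustrated with a plain Pearson coefficient. The country means below are invented for the example and are not PISA or TIMSS results; the sketch only shows the mechanism by which two outlying countries can inflate a correlation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented country means; the first two entries play the role of the
# two lowest-performing countries.
study_a = [300, 320, 480, 490, 500, 510, 520, 530]
study_b = [310, 330, 500, 480, 520, 495, 530, 510]

r_all = pearson(study_a, study_b)
r_trimmed = pearson(study_a[2:], study_b[2:])  # same data without the two outliers
```

In this invented data set, the two low scorers agree across both studies and stretch the range of values, so `r_all` is considerably higher than `r_trimmed`, mirroring the drop described above.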
For many countries, the results from PISA 2000 were surprising. In Germany and the United States, for example, the comparatively low scores brought on heated debate about how the school system should be changed. Some headlines in national newspapers, for example, were:
Students from Shanghai, China, had the top scores of every category (Mathematics, Reading and Science) in PISA 2009 and 2012. In discussing these results, PISA spokesman Andreas Schleicher, Deputy Director for Education and head of the analysis division at the OECD’s directorate for education, described Shanghai as a pioneer of educational reform in which "there has been a sea change in pedagogy". Schleicher stated that Shanghai abandoned its "focus on educating a small elite, and instead worked to construct a more inclusive system. They also significantly increased teacher pay and training, reducing the emphasis on rote learning and focusing classroom activities on problem solving."
Schleicher also states that PISA tests administered in rural China have produced some results approaching the OECD average: Citing further, as-yet-unpublished OECD research, Schleicher said, "We have actually done Pisa in 12 of the provinces in China. Even in some of the very poor areas you get performance close to the OECD average." Schleicher says that for a developing country, China's 99.4% enrollment in primary education is "the envy of many countries". He maintains that junior secondary school participation rates in China are now 99%; and in Shanghai, not only has senior secondary school enrollment attained 98%, but admissions into higher education have achieved 80% of the relevant age group. Schleicher believes that this growth reflects quality, not just quantity, which he contends the top PISA ranking of Shanghai's secondary education confirms. Schleicher believes that China has also expanded school access and has moved away from learning by rote. According to Schleicher, Russia performs well in rote-based assessments, but not in PISA, whereas China does well in both rote-based and broader assessments.
Critics of PISA counter that in Shanghai and other Chinese cities, most children of migrant workers can only attend city schools up to the ninth grade, and must return to their parents' hometowns for high school due to hukou restrictions, thus skewing the composition of the city's high school students in favor of wealthier local families. A population chart of Shanghai reproduced in The New York Times shows a steep drop off in the number of 15-year-olds residing there. According to Schleicher, 27% of Shanghai's 15-year-olds are excluded from its school system (and hence from testing). As a result, the percentage of Shanghai's 15-year-olds tested by PISA was 73%, lower than the 89% tested in the US.
Education professor Yong Zhao has noted that PISA 2009 did not receive much attention in the Chinese media, and that the high scores in China are due to excessive workload and testing, adding that it's "no news that the Chinese education system is excellent in preparing outstanding test takers, just like other education systems within the Confucian cultural circle: Singapore, Korea, Japan, and Hong Kong."
The stable, high marks of Finnish students have attracted a lot of attention. According to Hannu Simola the results reflect a paradoxical mix of progressive policies implemented through a rather conservative pedagogic setting, where the high levels of teachers' academic preparation, social status, professionalism and motivation for the job are concomitant with the adherence to traditional roles and methods by both teachers and pupils in Finland's changing, but still quite paternalistic culture. Others advance Finland's low poverty rate as a reason for its success. Finnish education reformer Pasi Sahlberg attributes Finland's high educational achievements to its emphasis on social and educational equality and stress on cooperation and collaboration, as opposed to the competition among teachers and schools that prevails in other nations.
Of the 74 countries tested in the PISA 2009 cycle including the "+" nations, the two Indian states came up 72nd and 73rd out of 74 in both reading and maths, and 73rd and 74th in science. India's poor performance may not be linguistic, as some suggested. For example, 12.87% of US students indicated that the language of the test differed from the language spoken at home, while 30.77% of Himachal Pradesh students did so, a significantly higher percentage. However, unlike American students, those Indian students with a different language at home did better on the PISA test than those with the same language. India's poor performance on the PISA test is consistent with its poor performance in the only other instance when India's government allowed an international organization to test its students, and with India's own testing of its elite students in a study titled Student Learning in the Metros 2006. These studies were conducted using TIMSS questions. The poor result in PISA was greeted with dismay in the Indian media. The BBC reported that as of 2008, only 15% of India's students reach high school.
India pulled out of the 2012 round of PISA testing in August 2012, with the Indian government attributing its action to the unfairness of PISA testing to Indian students. The Indian Express reported on 9/3/2012 that "The ministry (of education) has concluded that there was a socio-cultural disconnect between the questions and Indian students. The ministry will write to the OECD and drive home the need to factor in India's 'socio-cultural milieu'. India's participation in the next PISA cycle will hinge on this". The Indian Express also noted that "Considering that over 70 nations participate in PISA, it is uncertain whether an exception would be made for India".
In June 2013, the Indian government, still concerned with the future prospect of fairness of PISA testing relating to Indian students, again pulled India out from the 2015 round of PISA testing.
In 2013, the Times Educational Supplement (TES) published an article, "Is PISA Fundamentally Flawed?" by William Stewart, detailing serious critiques of PISA's conceptual foundations and methods advanced by statisticians at major universities.
In the article, Professor Harvey Goldstein of the University of Bristol was quoted as saying that when the OECD tries to rule out questions suspected of bias, it can have the effect of "smoothing out" key differences between countries. "That is leaving out many of the important things," he warned. "They simply don't get commented on. What you are looking at is something that happens to be common. But (is it) worth looking at? PISA results are taken at face value as providing some sort of common standard across countries. But as soon as you begin to unpick it, I think that all falls apart."
University of Copenhagen Professor Svend Kreiner, who examined in detail PISA's 2006 reading results, noted that in 2006 only about ten percent of the students who took part in PISA were tested on all 28 reading questions. "This in itself is ridiculous," Kreiner told Stewart. "Most people don't know that half of the students taking part in PISA (2006) do not respond to any reading item at all. Despite that, PISA assigns reading scores to these children."
Queen's University Belfast mathematician Dr. Hugh Morrison stated that he found the statistical model underlying PISA to contain a fundamental, insoluble mathematical error that renders PISA rankings "valueless". Goldstein remarked that Dr. Morrison's objection highlights “an important technical issue” if not a “profound conceptual error”. However, Goldstein cautioned that PISA has been "used inappropriately", contending that some of the blame for this "lies with PISA itself. I think it tends to say too much for what it can do and it tends not to publicise the negative or the weaker aspects.” Professors Morrison and Goldstein expressed dismay at the OECD's response to criticism. Morrison said that when he first published his criticisms of PISA in 2004 and also personally queried several of the OECD's "senior people" about them, his points were met with “absolute silence” and have yet to be addressed. “I was amazed at how unforthcoming they were,” he told TES. “That makes me suspicious.” “Pisa steadfastly ignored many of these issues,” he says. “I am still concerned.”
Professor Kreiner agreed: “One of the problems that everybody has with PISA is that they don’t want to discuss things with people criticising or asking questions concerning the results. They didn’t want to talk to me at all. I am sure it is because they can’t defend themselves.”
Two studies have compared high achievers in mathematics on the PISA and those on the U.S. National Assessment of Educational Progress (NAEP). Comparisons were made between those scoring at the "advanced" and "proficient" levels in mathematics on the NAEP with the corresponding performance on the PISA. Overall, 30 nations had higher percentages than the U.S. of students at the "advanced" level of mathematics. The only OECD countries with worse results were Portugal, Greece, Turkey, and Mexico. Six percent of U.S. students were "advanced" in mathematics compared to 28 percent in Taiwan. The highest ranked state in the U.S. (Massachusetts) was just 15th in the world if it was compared with the nations participating in the PISA. 31 nations had higher percentages of "proficient" students than the U.S. Massachusetts was again the best U.S. state, but it ranked just ninth in the world if compared with the nations participating in the PISA.
Comparisons with results for the Trends in International Mathematics and Science Study (TIMSS) appear to give different results—suggesting that the U.S. states actually do better in world rankings. This can likely be traced to the different material being covered and the United States teaching mathematics in a style less harmonious with the "Realistic Mathematics Education" which forms the basis of the exam. Countries that commonly use this teaching method score higher on PISA, and less highly on TIMSS and other assessments. 
Stephen Krashen, professor emeritus at the University of Southern California, and Mel Riddile of the NASSP attributed the relatively low performance of students in the United States to the country's high rate of child poverty, which exceeds that of other OECD countries. However, individual US schools with poverty rates comparable to Finland's (below 10%), as measured by reduced-price school lunch participation, outperform Finland; and US schools in the 10–24% reduced-price lunch range are not far behind.
Reduced school lunch participation is the only available intra-poverty indicator for US schoolchildren. In the United States, schools in locations in which less than 10% of the students qualified for free or reduced-price lunch averaged PISA scores of 551 (higher than any other OECD country). This can be compared with the other OECD countries (which have tabled figures on children living in relative poverty):
|Country||Percent of reduced school lunches (US) / Percent of relative child poverty (other OECD countries)||PISA score|
|United States||< 10%||551|
|United States||> 75%||446|
In 2013 Martin Carnoy of the Stanford University Graduate School of Education and Richard Rothstein of the Economic Policy Institute released a report, "What do international tests really show about U.S. student performance?", analyzing the 2009 PISA database. Their report found that U.S. PISA test scores had been lowered by a sampling error that over-represented adolescents from the most disadvantaged American schools in the test-taking sample. The authors cautioned that international test scores are often “interpreted to show that American students perform poorly when compared to students internationally” and that school reformers then conclude that “U.S. public education is failing.” Such inferences, made before the data has been carefully analyzed, they say, “are too glib” and “may lead policymakers to pursue inappropriate and even harmful reforms.”
Carnoy and Rothstein observe that in all countries, students from disadvantaged backgrounds perform worse than those from advantaged backgrounds, and the US has a greater percentage of students from disadvantaged backgrounds. The sampling error on the PISA results lowered U.S. scores for 15-year-olds even further, they say. The authors add, however, that in countries such as Finland, the scores of disadvantaged students tend to be stagnant, whereas in the U.S. the scores of disadvantaged students have been steadily rising over time, albeit still lagging behind those of their more advantaged peers. When the figures are adjusted for social class, the PISA scores of all US students would still remain behind those of the highest scoring countries. Nevertheless, the scores of US students of all social backgrounds have shown a trajectory of improvement over time, notably in mathematics, a circumstance PISA's report fails to take into account.
Carnoy and Rothstein write that PISA spokesman Schleicher has been quoted saying that “international education benchmarks make disappointing reading for the U.S.” and that “in the U.S. in particular, poverty was destiny. Low-income American students did (and still do) much worse than high-income ones on PISA. But poor kids in Finland and Canada do far better relative to their more privileged peers, despite their disadvantages” (Ripley 2011). Carnoy and Rothstein state that their report's analysis shows Schleicher and Ripley's claims to be untrue. They further fault the way PISA's results have persistently been released to the press before experts have time to evaluate them, and they charge the OECD reports with inconsistency in explaining such factors as the role of parental education. Carnoy and Rothstein also note with alarm that the US secretary of education Arne Duncan regularly consults with PISA's Andreas Schleicher in formulating educational policy before other experts have been given a chance to analyze the results. Carnoy and Rothstein's report (written before the release of the 2011 database) concludes:
We are most certain of this: To make judgments only on the basis of national average scores, on only one test, at only one point in time, without comparing trends on different tests that purport to measure the same thing, and without disaggregation by social class groups, is the worst possible choice. But, unfortunately, this is how most policymakers and analysts approach the field.
The most recent test for which an international database is presently available is PISA, administered in 2009. A database for TIMSS 2011 is scheduled for release in mid-January 2013. In December 2013, PISA will announce results and make data available from its 2012 test administration. Scholars will then be able to dig into TIMSS 2011 and PISA 2012 databases so they can place the publicly promoted average national results in proper context. The analyses we have presented in this report should caution policymakers to await understanding of this context before drawing conclusions about lessons from TIMSS or PISA assessments.
Although PISA and TIMSS officials and researchers themselves generally refrain from hypothesizing about the large and stable differences in student achievement between countries, since 2000 a large literature on the differences in PISA and TIMSS results and their possible causes has emerged. Data from PISA have furnished several researchers, notably Eric Hanushek, Ludger Woessmann, Heiner Rindermann, and Stephen J. Ceci, with material for books and articles about the relationship between student achievement and economic development, democratization, and health; as well as the roles of such single educational factors as high-stakes exams, the presence or absence of private schools, and the effects and timing of ability tracking.
PISA 2006 reading literacy results are not reported for the United States because of an error in printing the test booklets. Furthermore, as a result of the printing error, the mean performance in mathematics and science may be misestimated by approximately 1 score point. The impact is below one standard error.