500'000€ Prize for Compressing Human Knowledge
(widely known as the Hutter Prize. Total payout so far: 29'945€)

Compress the 1GB file enwik9 to less than the current record of about 110MB

News: Kaido Orav & Byron Knoll are the eighth winners! Congratulations!
... the contest continues ...
Being able to compress well is closely related to intelligence as explained below. While intelligence is a slippery concept, file sizes are hard numbers. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 1GB of Wikipedia better than your predecessors, your (de)compressor likely has to be smart(er). The intention of this prize is to encourage development of intelligent compressors/programs as a path to AGI.

Interview with Lex Fridman (26.Feb'20) (Video, Audio, Tweet)

The Task

Losslessly compress the 1GB file enwik9 to less than 110MB. The precise conditions, including constraints and relaxations, are given at http://prize.hutter1.net/hrules.htm.

Remark: A zipped version of enwik9, enwik9.zip (≈300MB), is available for download.
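As a rough illustration of what "lossless" and "total size" mean in this task, the sketch below compresses a small sample with Python's standard lzma module, checks that decompression restores the input byte for byte, and adds a decompressor size to the archive size. The lzma module is only a stand-in for a real entry, not a competitive compressor. The helper name roundtrip_report, the sample text, and the decompressor size are assumptions made for illustration; the rules page remains the authoritative description.

```python
import hashlib
import lzma


def roundtrip_report(data: bytes, decompressor_bytes: int = 0) -> dict:
    """Compress `data`, decompress it again, and report the sizes.

    `decompressor_bytes` is a hypothetical size of the decompressor program.
    It is added to the archive size to give a total, as the record tables do.
    """
    archive = lzma.compress(data)        # general-purpose baseline compressor
    restored = lzma.decompress(archive)
    total = len(archive) + decompressor_bytes
    return {
        "original_bytes": len(data),
        "archive_bytes": len(archive),
        "total_bytes": total,
        # Lossless means the restored bytes are identical to the input.
        "lossless": hashlib.sha256(restored).digest() == hashlib.sha256(data).digest(),
        "compression_factor": len(data) / total,
    }


if __name__ == "__main__":
    sample = b"Wikipedia is an extensive snapshot of human knowledge. " * 500
    print(roundtrip_report(sample, decompressor_bytes=1000))
```

Counting the decompressor together with the archive means that knowledge hidden inside the program itself still shows up in the total.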

Motivation

This compression contest is motivated by the fact that being able to compress well is closely related to acting intelligently, thus reducing the slippery concept of intelligence to hard file size numbers. In order to compress data, one has to find regularities in them, which is intrinsically difficult (many researchers make a living from analyzing data and finding compact models). So compressors that beat the current "dumb" compressors need to be smart(er). Since the prize is meant to stimulate the development of "universally" smart compressors, we need a "universal" corpus of data. Arguably the online encyclopedia Wikipedia is a good snapshot of the Human World Knowledge. So the ultimate compressor of it should "understand" all human knowledge, i.e. be really smart. enwik9 is a hopefully representative 1GB extract from Wikipedia.

Detailed Rules for Participation

Baseline Enwik9 and Previous Records Enwik8

Author (enwik9)          | Date        | Decompressor  | Total Size   | Compr. Factor | RAM   | Time | %     | Award     | Sponsor
You?                     | 202?        | ?             | <109'685'197 | >9.12         | <10GB | <50h | >1%   | >5'000€   | Marcus Hutter
Kaido Orav & Byron Knoll | 3.Sep 2024  | fx2-cmix      | 110'793'128  | 9.03          | 95%   | 99%  | 1.59% | 7'950€    | Marcus Hutter
Kaido Orav               | 2.Feb 2024  | fx-cmix       | 112'578'322  | 8.88          | 8.9GB | ~50h | 1.38% | 6'911€    | Marcus Hutter
Saurabh Kumar            | 16.Jul 2023 | fast cmix     | 114'156'155  | 8.76          | 8.4GB | 43h  | 1.04% | 5'187€    | Marcus Hutter
Artemiy Margaritov       | 31.May 2021 | starlit ...   | 115'352'938  | 8.67          | 10GB  | ~50h | 1.1%  | 9000€     | Marcus Hutter
Alexander Rhatushnyak    | 4.Jul 2019  | phda9v1.8 ... | 116'673'681  | 8.58          | 6.3GB | ~23h | --    | pre-prize | -
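
The "You?" row can be reproduced with a little arithmetic. The sketch below assumes that an entry must improve on the current record by at least 1%, and that the award is the 500'000€ fund multiplied by the relative improvement 1-S/L (S the new total size, L the previous record). Both assumptions are inferred from this page, not taken from the rules page, which remains authoritative. The function names are hypothetical.

```python
PRIZE_FUND_EUR = 500_000      # total prize stated at the top of this page
MIN_IMPROVEMENT_PERCENT = 1   # assumed hurdle, from the ">1%" entry in the "You?" row


def next_record_bound(current_record: int) -> int:
    """Return the exclusive upper bound on the total size of a new record.

    Assumes an improvement of at least 1% is required, so a valid entry
    must be strictly smaller than the returned value.
    """
    largest_valid = current_record * (100 - MIN_IMPROVEMENT_PERCENT) // 100
    return largest_valid + 1


def award_eur(previous_record: int, new_total: int) -> float:
    """Assumed award: prize fund times the relative improvement 1 - S/L."""
    return PRIZE_FUND_EUR * (1 - new_total / previous_record)


if __name__ == "__main__":
    # Current record fx2-cmix: 110'793'128 bytes.
    print(next_record_bound(110_793_128))            # 109685197
    # fx-cmix (112'578'322) over fast cmix (114'156'155).
    print(round(award_eur(114_156_155, 112_578_322)))  # 6911
```

Integer arithmetic is used for the bound so that the result does not depend on floating-point rounding.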

Author (enwik8)       | Date        | Decompressor | Total Size | Compr. Factor | RAM    | Time | %     | Award     | Sponsor
Alexander Rhatushnyak | 4.Nov 2017  | phda9 ...    | 15'284'944 | 6.54          | 1048MB | ~5h  | 4.17% | 2085€     | Marcus Hutter
Alexander Rhatushnyak | 23.May 2009 | decomp8 ...  | 15'949'688 | 6.27          | 936MB  | ~9h  | 3.2%  | 1614€     | Marcus Hutter
Alexander Rhatushnyak | 14.May 2007 | paq8hp12 -7  | 16'481'655 | 6.07          | 936MB  | 9h   | 3.5%  | 1732€     | Marcus Hutter
Alexander Rhatushnyak | 25.Sep 2006 | paq8hp5 -7   | 17'073'018 | 5.86          | 900MB  | 5h   | 6.8%  | 3416€     | Marcus Hutter
Matt Mahoney          | 24.Mar 2006 | paq8f -7     | 18'324'887 | 5.46          | 854MB  | 5h   | --    | pre-prize | -

More Information

History

Committee

Frequently Asked Questions (FAQ)

Contestants and Winners for enwik8

So far we have received the submissions below for enwik8. Each was (or is) open for public comment and verification for 30 days before an award decision was (or will be) made. Comments should be made to the Hutter Prize Discussion Forum or by email to members of the Prize committee.
Date      | Author                | Decompressor | Compression Options      | Size of archive | Size of decompr. | Total Size | %Improve (1-S/L) | Compr. Factor | Bits/Char | Memory | Time  | Note
4.Nov'17  | Alexander Rhatushnyak | phda9        | compressed_enwik8 enwik8 | 15'242'496      | 42'448           | 15'284'944 | 4.17%            | 6.54          | 1.225     | 1048MB | ~5h   | Meets all prize criteria. Fourth winner!
23.May'09 | Alexander Rhatushnyak | decomp8      | archive8.bin enwik8      | 15'932'968      | 16'720           | 15'949'688 | 3.2%             | 6.27          | 1.278     | 936MB  | ~9h   | Meets all prize criteria. Third winner!
22.Apr'09 | Alexander Rhatushnyak | decomp8      | archive8.bin enwik8      | 15'970'425      | 16'252           | 15'986'677 | 3.0%             | 6.26          | 1.279     | 924MB  | 9h    | 3.0% improvement over new baseline paq8hp12
14.May'07 | Alexander Rhatushnyak | paq8hp12     | -7                       | 16'381'959      | 99'696           | 16'481'655 | 3.5%             | 6.07          | 1.319     | 936MB  | 9h    | Meets all prize criteria. Second winner!
...       | "                     | ...          | ...                      | ...             | ...              | ...        | ...              | ...           | ...       | ...    | ...   |
6.Nov'06  | Alexander Rhatushnyak | paq8hp6      | -7                       | 16'731'800      | 170'400          | 16'902'200 | 1%               | 5.92          | 1.352     | 941MB  | 5h    | 1% improvement over new baseline paq8hp5
25.Sep'06 | Alexander Rhatushnyak | paq8hp5      | -7                       | 16'898'402      | 174'616          | 17'073'018 | 6.8%             | 5.86          | 1.366     | 900MB  | 5h    | Meets all prize criteria. First winner!
10.Sep'06 | Alexander Rhatushnyak | paq8hp4      | -7                       | 17'039'173      | 206'336          | 17'245'509 | 5.9%             | 5.80          | 1.380     | 803MB  | 5h    | Superseded by paq8hp5
3.Sep'06  | Alexander Rhatushnyak | paq8hp3      | -7                       | 17'241'280      | 178'468          | 17'419'748 | 4.9%             | 5.74          | 1.394     | 742MB  | 5h    | Superseded by paq8hp4
28.Aug'06 | Alexander Rhatushnyak | paq8hp2      | -7                       | 17'390'460      | 205'276          | 17'595'736 | 4.0%             | 5.68          | 1.408     | 747MB  | 5h    | Superseded by paq8hp3
21.Aug'06 | Alexander Rhatushnyak | paq8hp1      | -7                       | 17'566'769      | 206'764          | 17'773'533 | 3.0%             | 5.63          | 1.422     | 748MB  | 5h    | Superseded by paq8hp2
20.Aug'06 | Alexander Rhatushnyak | paq8hkcc     | -7                       | 17'597'599      | 244'224          | 17'841'823 | 2.6%             | 5.61          | 1.43      | 747MB  | 5h    | Superseded by paq8hp1
16.Aug'06 | Dmitry Shkarin        | durilca0.5h  | -m1650 -o21 -t2          | 17'958'687      | 86'016           | 18'044'703 | 1.5%             | 5.54          | 1.444     | 1650MB | 30min | Fails to meet the reasonable memory limitations
16.Aug'06 | Rudi Cilibrasi        | raq8g        | -7                       | 18'132'399      | 34'816           | 18'167'215 | 0.9%             | 5.50          | 1.453     | 1089MB | 7h    | Fails to meet the 1% hurdle, and others
24.Mar'06 | Matt Mahoney          | paq8f        | -7                       | 18'289'559      | 35'328           | 18'324'887 | 0%               | 5.46          | 1.466     | 854MB  | 5h    | Pre-prize baseline

The time for decompression/compression is estimated for a 2GHz P4 until 2010 and for a 2.7GHz i7 since 2017. The percent (%) improvement is measured against the previous record, which serves as the baseline. More details on the (de)compressors can be found here.
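
The derived columns of the enwik8 table can be recomputed from the two size columns. The following is a minimal sketch under some assumptions: enwik8 is taken to be 10^8 bytes, the compression factor is taken as corpus size divided by total size, and bits per character is taken as 8 times total size divided by corpus size. These readings reproduce the paq8hp5 row, but some rows of the table differ in the last digit. The function name table_metrics is hypothetical.

```python
ENWIK8_BYTES = 10**8  # assumed corpus size: the first 10^8 bytes of Wikipedia


def table_metrics(archive_bytes: int, decompressor_bytes: int,
                  previous_total: int, corpus_bytes: int = ENWIK8_BYTES) -> dict:
    """Recompute the derived table columns for one submission."""
    total = archive_bytes + decompressor_bytes           # S: archive + decompressor
    return {
        "total_size": total,
        # %Improve = 1 - S/L, with L the previous record (the baseline).
        "improvement_percent": 100 * (1 - total / previous_total),
        "compression_factor": corpus_bytes / total,
        "bits_per_char": 8 * total / corpus_bytes,       # one char = one byte
    }


if __name__ == "__main__":
    # paq8hp5 (25.Sep'06) measured against the paq8f baseline of 18'324'887.
    m = table_metrics(16_898_402, 174_616, previous_total=18_324_887)
    print(m["total_size"])                      # 17073018
    print(round(m["improvement_percent"], 1))   # 6.8
    print(round(m["compression_factor"], 2))    # 5.86
    print(round(m["bits_per_char"], 3))         # 1.366
```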

Links (Further Information/Discussion/News)

Core Resources:
Further Recommended Technical Reading relevant to the Compression=AI Paradigm:
Post-2020 Discussion (enwik9, €500k):
Pre-2020 Discussion (enwik8, €50k):

Warning: The average quality of the posts in the discussion groups and mailing lists is very low. Most participants don't know the underlying scientific concepts and some have not even read the rationale behind the contest. For a cleaned summary consult the frequently asked questions. The competition was also announced or discussed in many blogs.

Disclaimer: Copying and distribution of this page (http://prize.hutter1.net) is permitted, provided the source is cited. The prize will be paid if the solution reflects the spirit of the contest. In particular decompressors (secretly) receiving any kind of "outside" information are forbidden. Also in order to verify your claim we need to be able to run your executable on our machines within reasonable space and time constraints. This is a privately run and funded contest. Payment of the prize cannot be legally enforced. The smallest claimable prize is 5'000€. After an award, the prize formula (L) will be adapted. Rules may change at any time to meet the goals of fairness, accuracy, maximizing public participation, and recognizing existing practice. July 2006. Updated Feb.2020.