Introduction

This page describes the file formats used for the 9x9 experiments. It also lists the PHP scripts that read, write or modify them, in case you want to see working code. A separate page explains the file formats using detailed, concrete examples.

Positions

The position file is a simple list of positions. To add a new position, append it to the end of the file; there is no header. Each position is made up of 2 or 3 blocks: PositionID, PositionDB, then optionally a PositionExtraInfo. (In the released data the PositionExtraInfo block is never used.) Without PositionExtraInfo each 9x9 position is therefore 92 bytes (10 + 82); with it, 100 bytes (10 + 82 + 8).

PositionID
10 bytes. Binary.
First 8 bytes are a zobrist hash (see below). Binary.
1 byte of tiebreaker. Currently always binary 0; intended to be used when we find two different positions with the same zobrist hash.
1 byte of who to play next. 'B' or 'W'. Ascii.
PositionDB
82 bytes (if 9x9).
81 bytes, in ASCII. # for black, O for white, . for empty point. The data goes from the top-left corner, one row at a time. So byte 1 is the top-left corner, byte 9 is the top-right corner, byte 73 is the bottom-left corner, and byte 81 is the bottom-right corner.
Final byte is 1 if there is a PositionExtraInfo section, 0 otherwise. Binary.
PositionExtraInfo
8 bytes. Binary.
4 bytes: unique ID of the source file (e.g. the sgf file), numbered from 1. Corresponds to an entry in master_game_file_list.dat.
2 bytes: variation number within that game. (If the sgf is edited such that variations are added or removed, this will become out of date.)
2 bytes: move number within that variation.
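
The layout above can be sketched as a small standalone reader and writer. This is only an illustration: the function names and array keys are made up here and are not taken from the released scripts, and the byte order of the PositionExtraInfo integers is not stated on this page, so little-endian is assumed.

```php
<?php
// Sketch only: function and array key names are illustrative,
// not taken from the released scripts.

// Build one record without PositionExtraInfo (92 bytes on 9x9).
function build_position_record(string $hash8, string $to_play, string $board): string
{
    // 8-byte hash + tiebreaker 0 + 'B'/'W' + board + "no extra info" flag
    return $hash8 . "\0" . $to_play . $board . "\0";
}

// Split the raw bytes of a position file into records.
function parse_position_records(string $data, int $points = 81): array
{
    $fixed = 10 + $points + 1;            // PositionID + PositionDB
    $len = strlen($data);
    $records = [];
    for ($pos = 0; $pos + $fixed <= $len; ) {
        $record = [
            'hash'       => bin2hex(substr($data, $pos, 8)),   // 16 hex characters
            'tiebreaker' => ord($data[$pos + 8]),
            'to_play'    => $data[$pos + 9],                    // 'B' or 'W'
            'board'      => substr($data, $pos + 10, $points),  // '#', 'O', '.'
            'extra'      => null,
        ];
        $has_extra = ord($data[$pos + $fixed - 1]) === 1;
        $pos += $fixed;
        if ($has_extra) {
            if ($pos + 8 > $len) {
                break;                    // truncated trailing record
            }
            // ASSUMPTION: little-endian integers; the byte order is not documented here.
            $record['extra'] = unpack('Vsource_id/vvariation/vmove', substr($data, $pos, 8));
            $pos += 8;
        }
        $records[] = $record;
    }
    return $records;
}

// Usage: one empty-board record, Black to play.
$data = build_position_record(str_repeat("\0", 8), 'B', str_repeat('.', 81));
$records = parse_position_records($data);
echo strlen($data), ' ', count($records), ' ', $records[0]['to_play'], "\n"; // 92 1 B
```

Because the flag byte decides whether 8 more bytes follow, the file has to be read sequentially; the record count cannot be derived from the file size alone. Appending a record built this way could be done with file_put_contents($fname, $record, FILE_APPEND).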

Scripts That Read It

score_positions.php (it also outputs the queue files which are in the same format).

Scripts That Write/Modify It

parse_position_file.php takes ascii board positions and converts them to the binary format.

analyze_position_dat.php has a few modes. "view" outputs ascii board positions. "histogram" gives a breakdown by number of stones on the board. "merge" joins files together, removing duplicates (this is useful when dealing with the queue files that are output by the score_positions.php script).
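
As a rough illustration of what a duplicate-removing merge involves (not the code in analyze_position_dat.php), the sketch below merges lists of raw records. It assumes that two records count as duplicates when their 10-byte PositionID blocks are identical; the function name is hypothetical.

```php
<?php
// Sketch only. ASSUMPTION: two records are duplicates when their
// 10-byte PositionID blocks are identical.
function merge_position_records(array ...$record_lists): array
{
    $merged = [];
    foreach ($record_lists as $records) {
        foreach ($records as $raw) {
            $id = bin2hex(substr($raw, 0, 10));   // hex string key, readable when debugging
            if (!isset($merged[$id])) {
                $merged[$id] = $raw;              // the first occurrence wins
            }
        }
    }
    return array_values($merged);
}

// Usage with shortened stand-in records (real ones are 92 or 100 bytes).
$a = ["AAAAAAAA\0B...", "BBBBBBBB\0W..."];
$b = ["BBBBBBBB\0W...", "CCCCCCCC\0B..."];
echo count(merge_position_records($a, $b)), "\n"; // 3
```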

Related Library Files/Functions

lib/file_read.inc: load_position_file($fname,$callback,$early_stop_file='')

Scored Positions

The scored position file is a simple list of scored positions. To add a new scored position simply append it to the end of the file; there is no header. Each scored position is made up of 2 blocks: PositionID, then a PositionScore. Each scored position is therefore exactly 12 bytes (10 + 2).

There is one file for each program and ruleset combination, so that information does not need to be stored in the file (it is implicit in the filename, e.g. "fuego_chinese.dat").

The file format is independent of board size; however the definition of PositionScore (and the PHP implementation of it) carries the assumption that the correct komi is likely to be 4, 5, 6 or 7, which may mean it is not suitable for other board sizes.

PositionID
(see definition above)
PositionScore
2 bytes, binary
score: on-board score. 3 means 3 or less, 8 means 8 or higher.
confidence: from 0 (meaning likely to be wrong) to 100 (meaning program is very confident about this score).
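
A minimal sketch of reading these 12-byte records follows. It assumes the 2-byte PositionScore is one unsigned byte of score followed by one unsigned byte of confidence, which this page implies but does not state. The function names are hypothetical, and the hex key is simply a choice made for this example.

```php
<?php
// Sketch only. ASSUMPTION: PositionScore is one byte of score
// followed by one byte of confidence.
const SCORED_RECORD_SIZE = 12;   // 10-byte PositionID + 2-byte PositionScore

function parse_scored_positions(string $data): array
{
    $scores = [];
    $count = intdiv(strlen($data), SCORED_RECORD_SIZE);
    for ($i = 0; $i < $count; $i++) {
        $rec = substr($data, $i * SCORED_RECORD_SIZE, SCORED_RECORD_SIZE);
        $id = bin2hex(substr($rec, 0, 10));      // PositionID as 20 hex characters
        $scores[$id] = [
            'score'      => ord($rec[10]),       // 3 = "3 or less", 8 = "8 or higher"
            'confidence' => ord($rec[11]),       // 0 to 100
        ];
    }
    return $scores;
}

// Clamp a raw on-board score into the stored range 3..8.
function clamp_score(int $raw): int
{
    return max(3, min(8, $raw));
}

// Usage: one hand-made record.
$data = hex2bin('0123456789abcdef') . "\0" . 'B' . chr(6) . chr(95);
foreach (parse_scored_positions($data) as $id => $s) {
    echo $id, ' ', $s['score'], ' ', $s['confidence'], "\n"; // 0123456789abcdef0042 6 95
}
```

Since every record has the same size, the number of scored positions is simply the file size divided by 12.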

Scripts That Read It

read_scored_positions.php

diff_scored_positions.php

Scripts That Write It

score_positions.php

Related Library Files/Functions

lib/file_read.inc: load_scored_positions($fname,$callback). You can use 'load_scored_position_objects_into_memory_callback' as the 2nd parameter, which will load all positions into the $GLOBALS['all_positions'] global array, where the key is the PositionID (as a string, 24 characters), and the value is the PositionScore object.

Zobrist Hashing

The PositionID links the PositionDB object (i.e. the board position) with the PositionScore object (i.e. the result of analysis). Eight of its 10 bytes are a Zobrist hash code of the board position. You can of course compute your own Zobrist codes from the board positions and implement some mapping (or even modify the data files to use your own codes). However, if you wish to use the built-in codes (which is required to submit more positions, for instance), this section describes them.

The set of hashcodes is stored in zobrist_board_81.dat, which is included in all published data files. The file is in ascii. The first 3 lines form a header:

  1. Number of bytes used in the hash code. I.e. 8 means a 64-bit hash.
  2. Number of columns. For go we use three: black, white, ko-illegal
  3. Number of rows; i.e. the number of board locations. 81 for 9x9 go.
The remaining lines are in CSV format: one line for each board location, with the three columns in the order black, white, ko-illegal point. The hash codes are written in hex, not binary, so an 8-byte code uses 16 ASCII characters.

Note: the positions in the current published data are just board positions, out of the context of any game, which means there is no ko-illegal point. The Zobrist codes used in the PositionID object (in the input positions and the scored positions files) will therefore be made up only of values from the first two columns. If you discover any discrepancies please let me know.
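
The table format can be sketched in standalone PHP as follows. This is not the code in lib/zobrist_hash.inc. It assumes the codes are combined with XOR (the usual Zobrist construction) and that table line i corresponds to byte i of the PositionDB board string. It also hashes the board exactly as given, whereas the published PositionIDs take the lowest code over all 8 rotations, so the result only matches when the unrotated position happens to be the lowest. The tiny two-point table in the usage lines is made up.

```php
<?php
// Sketch only; function names are illustrative.

// Parse the text of a table file: 3 header lines, then one CSV line per point.
function parse_zobrist_table(string $text): array
{
    $lines = preg_split('/\r\n|\r|\n/', trim($text));
    $bytes = (int)$lines[0];   // bytes per hash code, e.g. 8
    $cols  = (int)$lines[1];   // black, white, ko-illegal
    $rows  = (int)$lines[2];   // board locations, e.g. 81
    $table = [];
    for ($i = 0; $i < $rows; $i++) {
        $cells = array_map('trim', explode(',', $lines[3 + $i]));
        $table[] = array_map('hex2bin', array_slice($cells, 0, $cols));
    }
    return ['bytes' => $bytes, 'table' => $table];
}

// ASSUMPTION: codes are XORed together; empty points contribute nothing.
function zobrist_hash(string $board, array $z): string
{
    $hash = str_repeat("\0", $z['bytes']);
    for ($i = 0, $n = strlen($board); $i < $n; $i++) {
        if ($board[$i] === '#') {
            $hash ^= $z['table'][$i][0];   // black column
        } elseif ($board[$i] === 'O') {
            $hash ^= $z['table'][$i][1];   // white column
        }
    }
    return $hash;
}

// Usage with a made-up table for a 2-point "board".
$demo = "8\n3\n2\n"
      . "0000000000000001,0000000000000002,0000000000000004\n"
      . "0000000000000010,0000000000000020,0000000000000040\n";
$z = parse_zobrist_table($demo);
echo bin2hex(zobrist_hash('#O', $z)), "\n"; // 0000000000000021
echo bin2hex(zobrist_hash('..', $z)), "\n"; // 0000000000000000
```

PHP applies ^ byte by byte when both operands are strings, so no 64-bit integer arithmetic is needed.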

Related Library Files/Functions

The lib/zobrist_hash.inc PHP script contains classes and functions to load, generate and use this data. lib/prepare_zobrist_9x9.inc is the more specific code built on top. (Warning: the published zobrist_board_81.dat cannot be reproduced from these scripts, for historical reasons; therefore make sure that zobrist_board_81.dat is always present in the same directory as the script being run, otherwise a new file will be generated.)

In the published data files, the class of interest is GoZobristHash64R, which computes a 64-bit hash code for all 8 rotations of the position, then uses the one with the lowest value. This means that if two positions are the same after some rotation, only the first one is analyzed, and the cached result is used for the second. (Bear in mind that if any move information is being stored, you will have to work out how the position was rotated in order to unrotate it.)
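
To illustrate the idea of picking the lowest code over 8 rotations, here is a standalone sketch. It is not GoZobristHash64R. The numbering of the 8 images is arbitrary and need not match what get_rotation() returns, "lowest" is taken to mean byte-wise string comparison, and md5 is used only as a stand-in hash function for the demonstration.

```php
<?php
// Sketch only: produce the 8 symmetric images of a square board string.
// ASSUMPTION: the index order is arbitrary and need not match get_rotation().
function board_symmetries(string $board, int $size = 9): array
{
    $m = $size - 1;
    $out = array_fill(0, 8, '');
    for ($r = 0; $r < $size; $r++) {
        for ($c = 0; $c < $size; $c++) {
            // Source coordinates for output point (r, c) in each image.
            $src = [
                [$r, $c], [$m - $c, $r], [$m - $r, $m - $c], [$c, $m - $r],   // rotations
                [$r, $m - $c], [$m - $c, $m - $r], [$m - $r, $c], [$c, $r],   // mirror images
            ];
            foreach ($src as $k => [$sr, $sc]) {
                $out[$k] .= $board[$sr * $size + $sc];
            }
        }
    }
    return $out;
}

// Hash every image and keep the lowest code, plus which image produced it.
function canonical_form(string $board, callable $hash, int $size = 9): array
{
    $best = null;
    foreach (board_symmetries($board, $size) as $k => $image) {
        $h = $hash($image);
        if ($best === null || strcmp($h, $best['hash']) < 0) {
            $best = ['hash' => $h, 'rotation' => $k, 'board' => $image];
        }
    }
    return $best;
}

// Usage on a 3x3 board, with md5 as a stand-in hash function.
$a = canonical_form('#O.......', 'md5', 3);
$b = canonical_form('..#..O...', 'md5', 3);   // the same position, rotated
var_dump($a['hash'] === $b['hash']); // bool(true)
```

Keeping the index of the winning image alongside the code is what makes it possible to map any stored move coordinates back to the original orientation.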

Here is sample code to make a hash code from an ascii board position (as 81 ascii characters, i.e. the style used in the PositionDB object):

// $boardpos: 81 ASCII characters ('#', 'O', '.'), as in the PositionDB object
$Z = new GoZobristHash64R;
$Z->init_from_board_position($boardpos, 81);
Then use $Z->get() to get the code as a binary string; a code of 0 means an empty board. Use $Z->get_as_hex() to get it as 16 ASCII characters. To discover how the position was rotated, use $Z->get_rotation(), which returns 0 to 7, where 0 means not rotated.


© Copyright 2010 Darren Cook <darren@dcook.org>