Opened 5 years ago

Closed 4 years ago

#16924 closed enhancement (fixed)

Read DIMACS files

Reported by: malb Owned by:
Priority: major Milestone: sage-6.4
Component: interfaces: optional Keywords:
Cc: ncohen, tmonteil Merged in:
Authors: Travis Scholl Reviewers: Martin Albrecht
Report Upstream: N/A Work issues:
Branch: 169849b (Commits) Commit: 169849b79de10292f16c42fe4ad87be189566767
Dependencies: Stopgaps:

Description

Sage writes dimacs files (for SAT solvers) but doesn't seem to read them.

Change History (32)

comment:1 Changed 5 years ago by ncohen

  • Cc ncohen added

comment:2 Changed 5 years ago by tmonteil

  • Cc tmonteil added

comment:3 Changed 5 years ago by vdelecroix

Hello,

seems to be done in sympy (which is shipped with Sage)

sage: from sympy.logic.utilities.dimacs import load
sage: load('1 2 \n 3')
And(Or(cnf_1, cnf_2), cnf_3)

But I do not know how sympy fits with the current SAT stuff in Sage.

Vincent

comment:4 Changed 5 years ago by malb

it's very easy to load/parse: just turn each line into a tuple and feed those to the SatSolver class with add_clause. Something like

for line in dimacs_file:
    if line.startswith("c"):
        continue # comment
    if line.startswith("p"):
        continue # header
    line = line.split(" "))
    clause = map(int,[e for e in line if e])
    clause = clause[:-1] # strip trailing zero
    solver.add_clause(clause)

comment:5 Changed 5 years ago by tmonteil

This is perhaps off-topic. I have read the dimacs.py file and i am surprised that the class maintains the data as an opened file (see self._tail). I am thinking of the case, where the data is a string given by propcalc.formula("a|b").satformat(), it looks weird to me that the class has to create a file on the disk for that. Is there a reason for that design choice that i could be missing ? Why not maintaining as a list of tuples of integers (corresponding to a list of clauses), and opening a file only when writing and loading data to/from a file ?

comment:6 Changed 5 years ago by malb

Hi, the idea was to avoid having the clauses in memory twice: once in the solver and once (in a bloated format) in Python. The rationale behind this was that in my applications these SAT clauses can be rather big (think 1-2GB).

comment:7 Changed 4 years ago by tscholl2

  • Authors set to Travis Scholl
  • Branch set to u/tscholl2/read_dimacs_16924
  • Commit set to d9274c65814bbf7a5601345c3bda1afcb38e26b2
  • Status changed from new to needs_review

I added a read method as described in the comments here. I split it into two methods, a read method with the usual read(filename) signature and also a load(file_object) method which actually reads the lines from the file. This way it's easy to use other kinds of buffers besides files. Also it allows easy testing using fake files from StringIO. The import uses the six package so the example is compatible with both python 2 and 3. It should read line by line so it shouldn't store the entire files in memory as it sounds like people use somewhat larger files for this.


New commits:

d9274c6added satsolver method to read in DIMACS files

comment:8 Changed 4 years ago by malb

I think file objects are a rare thing in the Sage "API", i.e. as far as I know we don't usually use them. I'm not sure there should be two methods, just one with a filename. Also, aren't "read" and "load" a bit too generic as names given that these are methods on solver instances?

comment:9 Changed 4 years ago by tscholl2

Thanks for the comment.

I wasn't sure about the kind of use case this class gets. You mentioned in a comment that you sometimes use files on the order of gigabytes. I thought it might be useful to have easy access to a method which you only need a stream in case you don't have a disk file (for example a memory mapped file object, a StringIO file object, or some kind of stream from stdin). I don't know if you can open those with fopen.

However, if you think that most people just read from a file on the disk, then I could easily remove the load function.

As for the read name, I used read because there was already a write method and it seemed complementary. Do you think parse is a better name?

comment:10 Changed 4 years ago by malb

You're right read makes sense given that there is a write. How about both functions are merged into one? If the parameter is a string it is considered a file name? Otherwise it is considered a file object?

comment:11 Changed 4 years ago by git

  • Commit changed from d9274c65814bbf7a5601345c3bda1afcb38e26b2 to 623cbede7a8e9b8b3d0da89dbac7bcda1fced153

Branch pushed to git repo; I updated commit sha1. New commits:

623cbedmerged read and load methods

comment:12 Changed 4 years ago by tscholl2

That's a great idea. I merged the methods and uploaded the new version.

comment:13 Changed 4 years ago by kcrisman

  • Component changed from PLEASE CHANGE to interfaces

comment:14 Changed 4 years ago by malb

  • Status changed from needs_review to positive_review

Looks good to me.

comment:15 Changed 4 years ago by malb

  • Reviewers set to Martin Albrecht

comment:16 Changed 4 years ago by vbraun

  • Status changed from positive_review to needs_work

Docs don't build

comment:17 Changed 4 years ago by git

  • Commit changed from 623cbede7a8e9b8b3d0da89dbac7bcda1fced153 to 79c7c68572efc3c5c01848051d6afa30954331c4

Branch pushed to git repo; I updated commit sha1. New commits:

79c7c68added an 'r'

comment:18 Changed 4 years ago by tscholl2

  • Status changed from needs_work to needs_review

Added an "r" so the doc string is raw text instead of a string so that the "\n"'s in the example don't crash the doc building.

comment:19 Changed 4 years ago by jdemeyer

  • Component changed from interfaces to interfaces: optional

comment:20 Changed 4 years ago by vdelecroix

Hello,

The docstring of the function should follow the scheme

def my_function(x,y,z):
    r"""
    one line description

    longer description if needed

    EXAMPLES::

        sage: my_function(1,2,3)

The following is not helpful in the documentation

See http://trac.sagemath.org/ticket/16924

It would make sense in an AUTHOR section. Or within the TESTS section if it would correspond to some bug fix. Moreover, the syntax is now trac:`16924`.

In your example, I do not understand what you want to show.

sage: solver.read(file_object)
sage: solver
DIMACS Solver: ''

It seems that nothing happened... Would it be possible to print the clauses?

Vincent

Last edited 4 years ago by ncohen (previous) (diff)

comment:21 Changed 4 years ago by git

  • Commit changed from 79c7c68572efc3c5c01848051d6afa30954331c4 to 34d8f00f1175b40bb7274934d217261eab2025e0

Branch pushed to git repo; I updated commit sha1. New commits:

34d8f00fixed docstring and example

comment:22 Changed 4 years ago by tscholl2

Thanks for the comments!

I changed the doc string as you suggested and changed the example to print out the clauses. It's a lot better for the example.

Please let me know if you notice anything else that doesn't fit in.

comment:23 follow-up: Changed 4 years ago by vdelecroix

I hardly found specification for DIMACS files. In http://people.sc.fsu.edu/~jburkardt/data/cnf/cnf.html they say several things that you are not taking care of:

  • the comments should only be at the begining
  • there is one line starting with p cnf NUM_VAR NUM_CLAUSES (the same format is used for other constraint satisfcation problems)
  • a clause might extend on more than one line and a clause might start on the same line as another one

In other words, as far as I understand, this would be valid

c hey
c ho
p cnf 3 2
1 -2 0 3
1   0

Moreover, it is written In some examples of CNF files, the rule that the variables are numbered from 1 to N is not followed. The file might declare that there are 10 variables, for instance, but allow them to be numbered 2 through 11.

It would be nice if you know some link with the exact specification of the format.

comment:24 in reply to: ↑ 23 ; follow-up: Changed 4 years ago by malb

Replying to vdelecroix:

  • the comments should only be at the begining

Why enforce that restriction when there is no harm (as far as I can see) in being more liberal?

  • there is one line starting with p cnf NUM_VAR NUM_CLAUSES (the same format is used for other constraint satisfcation problems)

As far as I know, many parsers these days ignore this (source: I chatted to Máté Soos ages ago, I might be misremembering, though)

  • a clause might extend on more than one line and a clause might start on the same line as another one

I did not know about that one.

In other words, as far as I understand, this would be valid

c hey
c ho
p cnf 3 2
1 -2 0 3
1   0

comment:25 in reply to: ↑ 24 Changed 4 years ago by vdelecroix

Replying to malb:

Replying to vdelecroix:

I very naively read the web page I mentioned. I am fine with the current state, but I just wanted to be sure that anybody (that uses this format) agrees with this parser. And of course, I an not the required person to judge that.

One option would be to write down in the docstring that the DIMACS format is not well specified.

Vincent

comment:26 Changed 4 years ago by tscholl2

Sorry it took me a while, I'm at a Sage days working on other things. But I asked people here and got directed to a SAT solver competition website http://people.sc.fsu.edu/~jburkardt/data/cnf/cnf.html.

I copied their rules here

  • The file can start with comments, that is lines begining with the character c.
  • Right after the comments, there is the line p cnf nbvar nbclauses indicating that the instance is in CNF format; nbvar is the exact number of variables appearing in the file; nbclauses is the exact number of clauses contained in the file.
  • Then the clauses follow. Each clause is a sequence of distinct non-null numbers between -nbvar and nbvar ending with 0 on the same line; it cannot contain the opposite literals i and -i simultaneously. Positive numbers denote the corresponding variables. Negative numbers denote the negations of the corresponding variables.

The clause about "ending with 0 on the same line" means they don't allow the multi-line clauses. So it sounds like the definition of "DIMACS standards" depends on who you ask.

I will modify the docstring to clearly describe the variation it assumes the file is in.

Should I post links to these pages describing different rules for DIMACS files in the docstring as well?

EDIT: Sorry I realized I copied the wrong link. Here is the corect one: http://www.satcompetition.org/2009/format-benchmarks2009.html

Last edited 4 years ago by tscholl2 (previous) (diff)

comment:27 Changed 4 years ago by malb

Hi, might as well thrown in the links. Sounds like a good plan.

comment:28 Changed 4 years ago by ncohen

If, instead of reimplementing it yourself, you prefer to sweat over GLPK's API, then you can try calling "glp_read_cnfsat" [1]. That's a way to delegate the desponsibility of handling all cases :-P

Nathann

[1] http://elis.dvo.ru/~lab_11/glpk-doc/cnfsat.pdf

comment:29 Changed 4 years ago by git

  • Commit changed from 34d8f00f1175b40bb7274934d217261eab2025e0 to 169849b79de10292f16c42fe4ad87be189566767

Branch pushed to git repo; I updated commit sha1. New commits:

169849bnoted differences in DIMACS file specs in docs

comment:30 Changed 4 years ago by tscholl2

I added some lines to the documentation for the read method which explains the differences in DIMACS files with links to various "standards". I rebuilt the docs and everything seemed to be formatted alright.

I thought the method Martin wrote is flexible and simple enough that it wasn't worth it to call (and sweat over) GLPK's definition of DIMAC files. Thank you for pointing out another source though Nathann!

comment:31 Changed 4 years ago by malb

  • Status changed from needs_review to positive_review

Looks good to me.

comment:32 Changed 4 years ago by vbraun

  • Branch changed from u/tscholl2/read_dimacs_16924 to 169849b79de10292f16c42fe4ad87be189566767
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.