Opened 8 years ago
Closed 7 years ago
#16924 closed enhancement (fixed)
Read DIMACS files
Reported by: | malb | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-6.4 |
Component: | interfaces: optional | Keywords: | |
Cc: | ncohen, tmonteil | Merged in: | |
Authors: | Travis Scholl | Reviewers: | Martin Albrecht |
Report Upstream: | N/A | Work issues: | |
Branch: | 169849b (Commits, GitHub, GitLab) | Commit: | 169849b79de10292f16c42fe4ad87be189566767 |
Dependencies: | Stopgaps: |
Description
Sage writes dimacs files (for SAT solvers) but doesn't seem to read them.
Change History (32)
comment:1 Changed 8 years ago by
- Cc ncohen added
comment:2 Changed 8 years ago by
- Cc tmonteil added
comment:3 Changed 8 years ago by
comment:4 Changed 8 years ago by
it's very easy to load/parse: just turn each line into a tuple and feed those to the SatSolver class with add_clause. Something like
for line in dimacs_file: if line.startswith("c"): continue # comment if line.startswith("p"): continue # header line = line.split(" ")) clause = map(int,[e for e in line if e]) clause = clause[:-1] # strip trailing zero solver.add_clause(clause)
comment:5 Changed 8 years ago by
This is perhaps off-topic. I have read the dimacs.py
file and i am surprised that the class maintains the data as an opened file (see self._tail
). I am thinking of the case, where the data is a string given by propcalc.formula("a|b").satformat()
, it looks weird to me that the class has to create a file on the disk for that. Is there a reason for that design choice that i could be missing ? Why not maintaining as a list of tuples of integers (corresponding to a list of clauses), and opening a file only when writing and loading data to/from a file ?
comment:6 Changed 8 years ago by
Hi, the idea was to avoid having the clauses in memory twice: once in the solver and once (in a bloated format) in Python. The rationale behind this was that in my applications these SAT clauses can be rather big (think 1-2GB).
comment:7 Changed 7 years ago by
- Branch set to u/tscholl2/read_dimacs_16924
- Commit set to d9274c65814bbf7a5601345c3bda1afcb38e26b2
- Status changed from new to needs_review
I added a read
method as described in the comments here. I split it into two methods, a read
method with the usual read(filename)
signature and also a load(file_object)
method which actually reads the lines from the file. This way it's easy to use other kinds of buffers besides files. Also it allows easy testing using fake files from StringIO. The import uses the six
package so the example is compatible with both python 2 and 3. It should read line by line so it shouldn't store the entire files in memory as it sounds like people use somewhat larger files for this.
New commits:
d9274c6 | added satsolver method to read in DIMACS files
|
comment:8 Changed 7 years ago by
I think file objects are a rare thing in the Sage "API", i.e. as far as I know we don't usually use them. I'm not sure there should be two methods, just one with a filename. Also, aren't "read" and "load" a bit too generic as names given that these are methods on solver instances?
comment:9 Changed 7 years ago by
Thanks for the comment.
I wasn't sure about the kind of use case this class gets. You mentioned in a comment that you sometimes use files on the order of gigabytes. I thought it might be useful to have easy access to a method which you only need a stream in case you don't have a disk file (for example a memory mapped file object, a StringIO file object, or some kind of stream from stdin). I don't know if you can open those with fopen
.
However, if you think that most people just read from a file on the disk, then I could easily remove the load
function.
As for the read
name, I used read
because there was already a write
method and it seemed complementary. Do you think parse
is a better name?
comment:10 Changed 7 years ago by
You're right read
makes sense given that there is a write
. How about both functions are merged into one? If the parameter is a string it is considered a file name? Otherwise it is considered a file object?
comment:11 Changed 7 years ago by
- Commit changed from d9274c65814bbf7a5601345c3bda1afcb38e26b2 to 623cbede7a8e9b8b3d0da89dbac7bcda1fced153
Branch pushed to git repo; I updated commit sha1. New commits:
623cbed | merged read and load methods
|
comment:12 Changed 7 years ago by
That's a great idea. I merged the methods and uploaded the new version.
comment:13 Changed 7 years ago by
- Component changed from PLEASE CHANGE to interfaces
comment:14 Changed 7 years ago by
- Status changed from needs_review to positive_review
Looks good to me.
comment:15 Changed 7 years ago by
- Reviewers set to Martin Albrecht
comment:16 Changed 7 years ago by
- Status changed from positive_review to needs_work
Docs don't build
comment:17 Changed 7 years ago by
- Commit changed from 623cbede7a8e9b8b3d0da89dbac7bcda1fced153 to 79c7c68572efc3c5c01848051d6afa30954331c4
Branch pushed to git repo; I updated commit sha1. New commits:
79c7c68 | added an 'r'
|
comment:18 Changed 7 years ago by
- Status changed from needs_work to needs_review
Added an "r" so the doc string is raw text instead of a string so that the "\n"'s in the example don't crash the doc building.
comment:19 Changed 7 years ago by
- Component changed from interfaces to interfaces: optional
comment:20 Changed 7 years ago by
Hello,
The docstring of the function should follow the scheme
def my_function(x,y,z): r""" one line description longer description if needed EXAMPLES:: sage: my_function(1,2,3)
The following is not helpful in the documentation
See http://trac.sagemath.org/ticket/16924
It would make sense in an AUTHOR
section. Or within the TESTS
section if it would correspond to some bug fix. Moreover, the syntax is now trac:`16924`
.
In your example, I do not understand what you want to show.
sage: solver.read(file_object) sage: solver DIMACS Solver: ''
It seems that nothing happened... Would it be possible to print the clauses?
Vincent
comment:21 Changed 7 years ago by
- Commit changed from 79c7c68572efc3c5c01848051d6afa30954331c4 to 34d8f00f1175b40bb7274934d217261eab2025e0
Branch pushed to git repo; I updated commit sha1. New commits:
34d8f00 | fixed docstring and example
|
comment:22 Changed 7 years ago by
Thanks for the comments!
I changed the doc string as you suggested and changed the example to print out the clauses. It's a lot better for the example.
Please let me know if you notice anything else that doesn't fit in.
comment:23 follow-up: ↓ 24 Changed 7 years ago by
I hardly found specification for DIMACS files. In http://people.sc.fsu.edu/~jburkardt/data/cnf/cnf.html they say several things that you are not taking care of:
- the comments should only be at the begining
- there is one line starting with
p cnf NUM_VAR NUM_CLAUSES
(the same format is used for other constraint satisfcation problems) - a clause might extend on more than one line and a clause might start on the same line as another one
In other words, as far as I understand, this would be valid
c hey c ho p cnf 3 2 1 -2 0 3 1 0
Moreover, it is written In some examples of CNF files, the rule that the variables are numbered from 1 to N is not followed. The file might declare that there are 10 variables, for instance, but allow them to be numbered 2 through 11.
It would be nice if you know some link with the exact specification of the format.
comment:24 in reply to: ↑ 23 ; follow-up: ↓ 25 Changed 7 years ago by
Replying to vdelecroix:
- the comments should only be at the begining
Why enforce that restriction when there is no harm (as far as I can see) in being more liberal?
- there is one line starting with
p cnf NUM_VAR NUM_CLAUSES
(the same format is used for other constraint satisfcation problems)
As far as I know, many parsers these days ignore this (source: I chatted to Máté Soos ages ago, I might be misremembering, though)
- a clause might extend on more than one line and a clause might start on the same line as another one
I did not know about that one.
In other words, as far as I understand, this would be valid
c hey c ho p cnf 3 2 1 -2 0 3 1 0
comment:25 in reply to: ↑ 24 Changed 7 years ago by
Replying to malb:
Replying to vdelecroix:
I very naively read the web page I mentioned. I am fine with the current state, but I just wanted to be sure that anybody (that uses this format) agrees with this parser. And of course, I an not the required person to judge that.
One option would be to write down in the docstring that the DIMACS format is not well specified.
Vincent
comment:26 Changed 7 years ago by
Sorry it took me a while, I'm at a Sage days working on other things. But I asked people here and got directed to a SAT solver competition website http://people.sc.fsu.edu/~jburkardt/data/cnf/cnf.html.
I copied their rules here
- The file can start with comments, that is lines begining with the character c.
- Right after the comments, there is the line p cnf nbvar nbclauses indicating that the instance is in CNF format; nbvar is the exact number of variables appearing in the file; nbclauses is the exact number of clauses contained in the file.
- Then the clauses follow. Each clause is a sequence of distinct non-null numbers between -nbvar and nbvar ending with 0 on the same line; it cannot contain the opposite literals i and -i simultaneously. Positive numbers denote the corresponding variables. Negative numbers denote the negations of the corresponding variables.
The clause about "ending with 0 on the same line" means they don't allow the multi-line clauses. So it sounds like the definition of "DIMACS standards" depends on who you ask.
I will modify the docstring to clearly describe the variation it assumes the file is in.
Should I post links to these pages describing different rules for DIMACS files in the docstring as well?
EDIT: Sorry I realized I copied the wrong link. Here is the corect one: http://www.satcompetition.org/2009/format-benchmarks2009.html
comment:27 Changed 7 years ago by
Hi, might as well thrown in the links. Sounds like a good plan.
comment:28 Changed 7 years ago by
If, instead of reimplementing it yourself, you prefer to sweat over GLPK's API, then you can try calling "glp_read_cnfsat" [1]. That's a way to delegate the desponsibility of handling all cases :-P
Nathann
comment:29 Changed 7 years ago by
- Commit changed from 34d8f00f1175b40bb7274934d217261eab2025e0 to 169849b79de10292f16c42fe4ad87be189566767
Branch pushed to git repo; I updated commit sha1. New commits:
169849b | noted differences in DIMACS file specs in docs
|
comment:30 Changed 7 years ago by
I added some lines to the documentation for the read method which explains the differences in DIMACS files with links to various "standards". I rebuilt the docs and everything seemed to be formatted alright.
I thought the method Martin wrote is flexible and simple enough that it wasn't worth it to call (and sweat over) GLPK's definition of DIMAC files. Thank you for pointing out another source though Nathann!
comment:31 Changed 7 years ago by
- Status changed from needs_review to positive_review
Looks good to me.
comment:32 Changed 7 years ago by
- Branch changed from u/tscholl2/read_dimacs_16924 to 169849b79de10292f16c42fe4ad87be189566767
- Resolution set to fixed
- Status changed from positive_review to closed
Hello,
seems to be done in
sympy
(which is shipped with Sage)But I do not know how
sympy
fits with the current SAT stuff in Sage.Vincent