PyMLNs: Markov logic networks in Python
This package consists of:
- An implementation of MLNs as a Python module (MLN.py) that you can use to work with MLNs in your own Python scripts
- Graphical tools for performing inference in MLNs and learning the parameters of MLNs,
using either PyMLNs itself, J-MLNs (a Java implementation of MLNs that is shipped with ProbCog) or the Alchemy system as the underlying engine.
Prerequisites:
- Python 2.4 or above
- The required Python module pyparsing is packaged with PyMLNs (v1.4.6); if you want or need to use a different version, you can install it separately.
- Recommended optional Python modules:
- To enable parameter learning with the PyMLNs engine, you need SciPy and NumPy.
- To speed up calculations with the internal engine on i386-compatible machines, I highly recommend installing Psyco.
The Graphical Tools
Two graphical tools, whose usage is hopefully self-explanatory, are part of the package: an inference tool (queryTool.py) and a parameter learning tool (learningTool.py). Simply invoke them using the Python interpreter. (On Windows, do not use pythonw.exe to run them, because the console output is an integral part of these tools.)
python queryTool.py
python learningTool.py
General Usage
Both tools work with .mln and .db files in the current directory and will by default write output files to the current directory, too. (Note that when you invoke the tools, the working directory need not be the directory in which the tools themselves are located, which is why I recommend that you create appropriate shortcuts.) The tools are designed to be invoked from a console. Simply change to the directory in which the files you want to work with are located and then invoke the tool you want to use.
The general workflow is then as follows: You select the files you want to work with, edit them as needed, or even create new files directly from within the GUI. Then you set any further options (e.g. the number of inference steps to take) and click the button at the very bottom to start the procedure.
Once you start the actual algorithm, the tool window itself will be hidden as long as the job is running, while the output of the algorithm is written to the console for you to follow. At the beginning, the tools list the main input parameters for your convenience, and, once the task is completed, the query tool additionally outputs the inference results to the console (so even if you are using the Alchemy system, there is not really a need to open the results file that is generated).
Configuration
You may want to modify the configuration settings in config.py:
- If you want to use the tools to invoke the Alchemy system (multiple Alchemy installations are supported), you will have to configure the paths where these installations can be found, as well as the set of command-line switches that applies to your version (they have changed over time).
- You can also configure the file masks for MLN and database files, as well as naming conventions for output files (based on input filenames and settings used), which comes in handy when you are dealing with more than just a handful of files.
- Further options concern the user interface, output variants and the workflow. These are documented in config.py itself.
Integrated Editors
The tools feature integrated editors for .db and .mln files. If you modify a file in an internal editor, it will automatically be saved as soon as you invoke the learning or inference method (i.e. when you press the button at the very bottom) or whenever you press the save button to the right of the dropdown menu.
If you want to save to a different filename, you may do so by changing the filename in the text input directly below the editor (which is activated as soon as the editor content changes) and then clicking on the save button.
Session Management
The tools will save all the settings you made whenever the learning or inference method is invoked, so that you can easily resume a session (all the information is saved to a configuration file). Moreover, the query tool will save context-specific information:
- The query tool remembers the query you last made for each evidence database, so when you reselect a database, the query you last made with that database is automatically restored.
- The model extension that you selected is also associated with the training database (because model extensions typically serve to augment the evidence, e.g. with additional formulas that specify virtual evidence).
- The additional parameters you specify are saved separately for each inference engine.
Command-Line Options
When started from the command line, the tools interpret and take over Alchemy-style command-line parameters (to some degree); for example, you can directly select the input MLN file by passing "-i <mln file>" as a command-line parameter to learningTool.py. Options that cannot be interpreted are added to the "additional options" input.
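For example, assuming placeholder files smoking.mln and smoking-train.db in the working directory, an invocation might look as follows. Only -i is documented above as being interpreted; other Alchemy-style switches may either be picked up or simply end up in the "additional options" input:
python learningTool.py -i smoking.mln
python queryTool.py -i smoking.mln -e smoking-train.db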
Tool-Specific Fields
Query Tool
- Queries
a comma-separated list of queries, where a query can be any one of the following:
- a ground atom, e.g. foobar(X,Y)
- the name of a predicate, e.g. foobar
- a ground formula, e.g. foobar(X,Y) ^ foobar(Y,X) (internal engine only)
- Max. Steps, Num. Chains
the maximum number of steps to run sampling-based algorithms, and the number of parallel chains to use
If you leave the fields empty, defaults will be used.
- Add. params
additional parameters to pass to the inference method
For the internal engine, you can specify a comma-separated list of assignments to parameters of the infer method you are calling (refer to MLN.py for valid options). For example, with exact inference, setting debug to True (i.e. writing "debug=True" into the input field) will print the entire distribution over possible worlds. For MC-SAT, you could specify "debug=True, debugLevel=30" to get a detailed log of what the algorithm does (changing debugLevel will affect the depth of the analysis).
For J-MLNs or the Alchemy system, you can simply supply additional command-line parameters to pass on to J-MLNs' BLNinfer or Alchemy's infer, respectively.
File Formats
The file formats for MLN and database files that our Python implementation of MLNs processes are for the most part compatible with the ones used by the Alchemy system.
General conventions
- All constant symbols that aren't integers must begin with an upper-case letter
- Domain symbols must begin with a lower-case letter
- Identifiers may contain only alphanumeric characters, "-", "_" and "'"
MLN Files
An MLN file may contain:
- C++-style comments
i.e. // and /* */
- Domain declarations
to assign a set of constants to a particular type/domain
e.g. domFoo = {A, B, C}
- Predicate declarations
to declare a predicate and the types/domains that apply to each of its arguments
e.g. myPredicate(domFoo, domBar)
A predicate declaration may coincide with a rule for mutual exclusiveness and exhaustiveness (see below).
- Rules for mutual exclusiveness and exhaustiveness
to declare that for a particular binding of some of the parameters of a predicate, the value assignments of the remaining parameters are mutually exclusive and exhaustive, i.e. that the remaining parameters are functionally determined by the others.
For example, you can add the rule myPredicate(domFoo, domBar!) to declare that the second parameter of myPredicate is functionally determined by the first (i.e. that for each binding of the first argument there is exactly one binding of the second for which the atom is true).
- Formulas with attached weights
as constraints on the set of possible worlds that is implicitly defined by an MLN's set of predicates and a set of (typed) constants with which it is combined.
A formula must always be specified either along with a weight preceding it or, in case of a hard constraint, a period (.) succeeding it. Usually, a weight will be specified as a numeric constant, but when using the internal engine, weights can also be specified as arithmetic expressions, which may contain calls to functions of the Python math module (and the special function logx which returns -100 when passed 0). Note, however, that the expression must not contain any spaces. For example, you could specify an expression such as log(4)/2 instead of 0.69314718055994529.
The formulas themselves may make use of the following operators/syntactic elements (operators in order of precedence):
- existential quantification, e.g. EXIST x myPred(x,MyConstant) or EXIST x,y (...)
Quantification applies only to the formula that follows immediately after the list of quantified variables, so if it is a complex formula, enclose it in parentheses.
- equality, e.g. x=y
- negation, e.g. !myPred(x,y) or !(x=y)
- disjunction, e.g. myPred(x,y) v myPred(y,x)
- conjunction, e.g. myPred(x,y) ^ myPred(y,x)
- implication, e.g. myPred(x,y) ^ myPred(y,z) => myPred(x,z)
- biimplication, e.g. myPred(x,y) <=> myPred(y,x)
When a formula that contains free variables is grounded, there will be a separate instance of the formula for each grounding of the free variables in the ground Markov network (each having the same weight).
While the internal engine may perform a CNF conversion of the formulas, it does not decompose the CNF formulas into individual clauses if they are made up of more than one conjunct. With the internal engine, all formulas are indivisible.
- Formula templates
An atom in a formula can be prefixed with an asterisk (*) to define a template that stands for two variants of the formula, one with the positive literal and one with the negative literal. (e.g. *myPred(x,y))
Moreover, you can prefix a variable that is an argument of an atom with a + character to define a template that will generate one formula for each possible binding of that variable to one of the domain elements applicable to that argument. (e.g. myPred(+x,y))
- Probability constraints on formulas (internal engine only)
You may want to require that certain formulas have a fixed prior marginal probability regardless of the size of the domain with which a model is instantiated. This is accomplished by dynamically adjusting the weight of the formula when instantiating a ground Markov network.
e.g. P(myPred(x,y)) = 0.75 or P(myPred(x,y) ^ myPred(y,x)) = 0.9
Similarly, you may want to require that the posterior marginal probability of a ground formula be fixed. This essentially corresponds to a specification of soft evidence.
e.g. R(myPred(X,Y) v myPred(Y,X)) = 0.8
Any formulas for which a constraint is specified must also be part of the MLN (i.e. you must add them to the MLN, with some weight).
Note: Probability constraints are extensions of the original MLN formalism.
Limitations:
- no support for functions, numbers/numeric operators or anything related to them
- formulas must always be preceded by a weight or be terminated by a period, even if they are only to be used in an input MLN for parameter learning
- no definition can span multiple lines
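To illustrate, here is a small example .mln file that combines several of the elements described above (the domain, predicate and constant names are made up for illustration):
// domain declarations
domFoo = {Alice, Bob, Carol}
domBar = {Red, Green}
// predicate declarations; the ! marks the second argument of myPredicate
// as functionally determined by the first
myPredicate(domFoo, domBar!)
myPred(domFoo, domFoo)
// weighted formula (the weight could also be an expression such as log(4)/2)
1.2 myPred(x,y) ^ myPred(y,z) => myPred(x,z)
// hard constraint, terminated by a period
myPred(x,y) => myPred(y,x).
// formula template: one formula per possible binding of x
0.5 myPred(+x,y)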
Database/Evidence Files
A database file may contain:
- C++-style comments
i.e. // and /* */
- Positive and negative ground literals
e.g. myPred(A,B) or !myPred(A,B)
- Soft evidence on ground atoms
e.g. 0.6 myPred(A,B). Note that soft evidence is supported only by the internal engine and only when using the inference algorithms MC-SAT (which corresponds to MC-SAT-PC when soft evidence is used) and IPFP-M. Soft evidence on non-atomic formulas can be handled using posterior probability constraints (see above).
- Domain extensions
as domain declarations (see above); useful if you want to define constants without making any statements about them.
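A matching evidence file for the example model above might look like this (again, all names are made up):
// positive and negative ground literals
myPred(Alice,Bob)
!myPred(Bob,Carol)
// soft evidence on a ground atom (internal engine only)
0.6 myPred(Carol,Alice)
// domain extension: add a constant without making any statements about it
domFoo = {Dave}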
Modules
The main functionality of PyMLNs is contained in MLN.py (everything directly related to Markov logic, including inference and parameter learning) and FOL.py (first-order logic). The graphical tools expose but a small fraction of the full functionality. Use Python's built-in help function on the modules to find out more about what's there – or simply take a look at the source files; there is quite a bit of documentation available (though not quite enough).
The MLN module also contains a main app – a little helper script – that offers some basic functions that may be useful.
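To give a rough idea of scripted use, here is a minimal sketch of an inference script. The class and method names used below (MLN, groundMRF, infer) and their signatures are assumptions made for illustration only; consult help(MLN) or the source of MLN.py for the actual API.
# minimal sketch -- the API shown here is assumed, not verified; see MLN.py
from MLN import MLN
mln = MLN("smoking.mln")                  # load an MLN from a file (assumed constructor)
mrf = mln.groundMRF("smoking-train.db")   # ground it with an evidence database (assumed method)
results = mrf.infer(["Smokes"])           # query a predicate (assumed method and signature)
print(results)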
Contact
If you have any questions or comments, please don't hesitate to contact me.