The Python Interpreter


Embedding a Python interpreter to run simple Python scripts is easy, but things got troublesome once the interpreter needed extension modules and multiple threads each needed an interpreter to execute scripts. Below are some brief notes made while going through the help documentation:
  
  PyThreadState* Py_NewInterpreter()
   Create a new sub-interpreter. This is an (almost) totally separate environment for the execution of Python code. In particular, the new interpreter has separate, independent versions of all imported modules, including the fundamental modules __builtin__, __main__ and sys. The table of loaded modules (sys.modules) and the module search path (sys.path) are also separate. The new environment has no sys.argv variable. It has new standard I/O stream file objects sys.stdin, sys.stdout and sys.stderr (however these refer to the same underlying FILE structures in the C library).
  
  
   The return value points to the first thread state created in the new sub-interpreter. This thread state is made in the current thread state. Note that no actual thread is created; see the discussion of thread states below. If creation of the new interpreter is unsuccessful, NULL is returned; no exception is set since the exception state is stored in the current thread state and there may not be a current thread state. (Like all other Python/C API functions, the global interpreter lock must be held before calling this function and is still held when it returns; however, unlike most other Python/C API functions, there needn't be a current thread state on entry.)
  
   Extension modules are shared between (sub-)interpreters as follows: the first time a particular extension is imported, it is initialized normally, and a (shallow) copy of its module's dictionary is squirreled away. When the same extension is imported by another (sub-)interpreter, a new module is initialized and filled with the contents of this copy; the extension's init function is not called. Note that this is different from what happens when an extension is imported after the interpreter has been completely re-initialized by calling Py_Finalize() and Py_Initialize(); in that case, the extension's initmodule function is called again.
  
   Bugs and caveats: Because sub-interpreters (and the main interpreter) are part of the same process, the insulation between them isn't perfect -- for example, using low-level file operations like os.close() they can (accidentally or maliciously) affect each other's open files. Because of the way extensions are shared between (sub-)interpreters, some extensions may not work properly; this is especially likely when the extension makes use of (static) global variables, or when the extension manipulates its module's dictionary after its initialization. It is possible to insert objects created in one sub-interpreter into a namespace of another sub-interpreter; this should be done with great care to avoid sharing user-defined functions, methods, instances or classes between sub-interpreters, since import operations executed by such objects may affect the wrong (sub-)interpreter's dictionary of loaded modules. (XXX This is a hard-to-fix bug that will be addressed in a future release.)
  
  void Py_EndInterpreter( PyThreadState *tstate)
   Destroy the (sub-)interpreter represented by the given thread state. The given thread state must be the current thread state. See the discussion of thread states below. When the call returns, the current thread state is NULL. All thread states associated with this interpreter are destroyed. (The global interpreter lock must be held before calling this function and is still held when it returns.) Py_Finalize() will destroy all sub-interpreters that haven't been explicitly destroyed at that point.
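   Putting Py_NewInterpreter() and Py_EndInterpreter() together, here is a minimal sketch of the lifecycle described above, assuming the main interpreter is already initialized and the calling thread holds the GIL with a valid thread state:

#include <Python.h>

/* Run a code snippet in a throwaway sub-interpreter.
   The caller must already hold the global interpreter lock. */
void run_in_subinterpreter(const char *code)
{
    PyThreadState *main_tstate = PyThreadState_Get(); /* remember the current state */
    PyThreadState *sub_tstate = Py_NewInterpreter();  /* becomes the current state */

    if (sub_tstate == NULL)
        return; /* creation failed; no exception is set, as noted above */

    PyRun_SimpleString(code);        /* runs in the sub-interpreter's __main__ */

    Py_EndInterpreter(sub_tstate);   /* the current thread state is now NULL */
    PyThreadState_Swap(main_tstate); /* restore the previous thread state */
}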
  
   Thread State and the Global Interpreter Lock
  
   The Python interpreter is not fully thread safe. In order to support multi-threaded Python programs, there's a global lock that must be held by the current thread before it can safely access Python objects. Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice.
  
   Therefore, the rule exists that only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions. In order to support multi-threaded Python programs, the interpreter regularly releases and reacquires the lock -- by default, every 100 bytecode instructions (this can be changed with sys.setcheckinterval()). The lock is also released and reacquired around potentially blocking I/O operations like reading or writing a file, so that other threads can run while the thread that requests the I/O is waiting for the I/O operation to complete.
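   For threads that Python did not create, one common way to follow this rule is the PyGILState API (available since Python 2.3); note that it targets the main interpreter and does not mix with sub-interpreters. A minimal sketch:

#include <Python.h>

/* Called from a C thread that Python did not create.
   The main program must have called PyEval_InitThreads() once. */
void call_python_from_c_thread(void)
{
    PyGILState_STATE gstate = PyGILState_Ensure(); /* take the GIL, get a thread state */

    PyRun_SimpleString("import sys\n"
                       "sys.stdout.write('hello from a C thread\\n')\n");

    PyGILState_Release(gstate); /* give the GIL back */
}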
  
  
  Problem:
  When SUV takes values in the range 1-100,
  PyObject * PySUVValue = PyInt_FromLong((long)suv);
  leaves PySUVValue with a reference count far greater than 1.
  PyObject * PySUVValue = Py_BuildValue("i", suv);
  shows the same behavior. A look at the Python help documentation revealed the reason:
   The current implementation keeps an array of integer objects for all integers between -1 and 100, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
   It turns out the interpreter keeps an array of integer objects for all integers between -1 and 100; creating an int in that range just hands back a reference to the existing cached object, which is why the reference count you observe is usually not 1.
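   The caching is easy to observe from C; a small sketch using the Python 2 API:

#include <Python.h>
#include <stdio.h>

int main(void)
{
    Py_Initialize();

    /* Both calls hand back the same cached object for a small integer. */
    PyObject *a = PyInt_FromLong(5);
    PyObject *b = PyInt_FromLong(5);

    printf("same object: %d\n", a == b);           /* prints 1 */
    printf("refcount: %ld\n", (long)a->ob_refcnt); /* far greater than 2 */

    Py_DECREF(a);
    Py_DECREF(b);
    Py_Finalize();
    return 0;
}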
  
   int PyImport_AppendInittab( char *name, void (*initfunc)(void))
   Add a single module to the existing table of built-in modules. This is a convenience wrapper around PyImport_ExtendInittab(), returning -1 if the table could not be extended. The new module can be imported by the name name, and uses the function initfunc as the initialization function called on the first attempted import. This should be called before Py_Initialize().
   Calling PyImport_AppendInittab(name, initname) before Py_Initialize() has the same effect as calling initname() yourself after Py_Initialize().
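   For example, for a hypothetical extension module "spam" whose init function is initspam:

#include <Python.h>

extern void initspam(void); /* init function of a hypothetical "spam" module */

int main(void)
{
    /* Must be called before Py_Initialize(). */
    if (PyImport_AppendInittab("spam", initspam) == -1)
        return 1; /* the table could not be extended */

    Py_Initialize();
    PyRun_SimpleString("import spam"); /* initspam() runs on this first import */
    Py_Finalize();
    return 0;
}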
  
  struct _inittab
  Structure describing a single entry in the list of built-in modules. Each of these structures gives the name and initialization function for a module built into the interpreter. Programs which embed Python may use an array of these structures in conjunction with PyImport_ExtendInittab() to provide additional built-in modules. The structure is defined in Include/import.h as:
  
  struct _inittab {
   char *name;
   void (*initfunc)(void);
  };
  
  
  int PyImport_ExtendInittab( struct _inittab *newtab)
  
  Add a collection of modules to the table of built-in modules. The newtab array must end with a sentinel entry which contains NULL for the name field; failure to provide the sentinel value can result in a memory fault. Returns 0 on success or -1 if insufficient memory could be allocated to extend the internal table. In the event of failure, no modules are added to the internal table. This should be called before Py_Initialize().
   Note: a simple way to guarantee the sentinel entry is to zero the whole array before filling in the real entries:

   struct _inittab newtab[2];
   memset(newtab, '\0', sizeof newtab);
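   A fuller sketch, registering two hypothetical modules foo and bar in one call:

#include <Python.h>

extern void initfoo(void); /* hypothetical extension init functions */
extern void initbar(void);

static struct _inittab newtab[] = {
    {"foo", initfoo},
    {"bar", initbar},
    {NULL, NULL} /* sentinel entry; omitting it risks a memory fault */
};

int main(void)
{
    /* Must run before Py_Initialize(); on failure (-1) nothing was added. */
    if (PyImport_ExtendInittab(newtab) == -1)
        return 1;

    Py_Initialize();
    PyRun_SimpleString("import foo, bar");
    Py_Finalize();
    return 0;
}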
  
Problems encountered and their solutions

(1) When a Python script executed by the embedded interpreter called CORBA services, the interpreter always failed during teardown.
 Solution: release all CORBA objects in the Python script; once every CORBA object is released, the problem no longer occurs.
(2) Several threads in the program need to execute Python scripts at the same time, so each thread embeds its own Python sub-interpreter; memory grew when the sub-interpreters imported extension modules.
 Solution: call PyImport_ExtendInittab() before the Python interpreter is initialized; the individual sub-interpreters then no longer need to load the extension modules separately.
(3) Each thread embeds a sub-interpreter, runs its Python script, and destroys the sub-interpreter; repeating this cycle made memory grow.
 Solution: apply the fix from problem (2) and also set these two flags:
 Py_OptimizeFlag = 2;
 Py_NoSiteFlag = 1;
After that things improved, but a small amount of memory still grows; this needs further investigation. See the sketch of the combined pattern below.
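A sketch of the combined pattern from problems (2) and (3), with a hypothetical extension module "myext"; each worker runs its script in its own short-lived sub-interpreter:

#include <Python.h>

extern void initmyext(void); /* hypothetical shared extension module */

/* Executed by each worker thread; the GIL must be held around this. */
static void run_script_in_subinterpreter(const char *script)
{
    PyThreadState *main_tstate = PyThreadState_Get();
    PyThreadState *sub = Py_NewInterpreter();
    if (sub != NULL) {
        PyRun_SimpleString(script);
        Py_EndInterpreter(sub);
    }
    PyThreadState_Swap(main_tstate);
}

int main(void)
{
    /* Fix for problem (2): register extensions once, before initialization. */
    PyImport_AppendInittab("myext", initmyext);

    /* Flags from problem (3). */
    Py_OptimizeFlag = 2; /* like python -OO */
    Py_NoSiteFlag = 1;   /* skip "import site" */

    Py_Initialize();
    PyEval_InitThreads(); /* create the GIL; this thread now holds it */

    run_script_in_subinterpreter("import myext\n");

    Py_Finalize();
    return 0;
}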

Original link: http://blog.tianya.cn/blogger/post_read.asp?BlogID=231837&PostID=12326434
