Advanced regex tips I have learned recently
Before we get started
*
Matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy).
*?
Matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy).
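A small sketch of the difference, using Python's standard `re` module. The sample string is made up for illustration.

```python
import re

text = "<a><b>"

# Greedy: .* grabs as much as it can, so the match runs to the last ">".
greedy = re.findall(r"<.*>", text)

# Lazy: .*? stops at the first ">" that lets the pattern succeed.
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<a><b>']
print(lazy)    # ['<a>', '<b>']
```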
How to match the text between two words
On a single line
word1(.*)word2
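As a quick illustration in Python (the sample line is invented): because `.*` is greedy, the pattern above captures everything up to the last `word2` on the line. If the words can repeat on one line, the lazy form `.*?` may be closer to what you want.

```python
import re

line = "word1 hello word2 and word1 bye word2"

# Greedy capture: runs from the first word1 to the LAST word2 on the line.
greedy = re.search(r"word1(.*)word2", line).group(1)

# Lazy capture: each match stops at the nearest following word2.
lazy = re.findall(r"word1(.*?)word2", line)

print(repr(greedy))  # ' hello word2 and word1 bye '
print(lazy)          # [' hello ', ' bye ']
```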
Across multiple lines
word1((\s|.)*?)word2
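A short Python sketch of why the multi-line form is needed. By default `.` does not match a newline, so `(\s|.)` is used to cover it. The `re.DOTALL` variant shown at the end is an alternative not mentioned in the notes above; the sample text is invented.

```python
import re

text = "word1 first\nsecond word2 tail"

# Without newline support the plain pattern finds nothing here.
plain = re.search(r"word1(.*?)word2", text)

# (\s|.) matches any character, including "\n", so the span can cross lines.
multi = re.search(r"word1((\s|.)*?)word2", text)

# Alternative: let "." match newlines via the DOTALL flag.
dotall = re.search(r"word1(.*?)word2", text, flags=re.DOTALL)

print(plain)                   # None
print(repr(multi.group(1)))    # ' first\nsecond '
print(repr(dotall.group(1)))   # ' first\nsecond '
```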
Examples
Match Python docstrings and comments
(?P<documentation>(?:(?:\s+[\"\']{3}(?:(?:\s|.)*?)[\"|\']{3}\n+)?(?:[ \t]*?\#(?:.*?)\n+)*)*)?
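A minimal sketch of trying this pattern from Python. Since every part of it is optional, it can match an empty string anywhere, so the sketch anchors it with `match` at the start of a snippet that begins right after a hypothetical function header. The sample source is invented.

```python
import re

DOC_PATTERN = re.compile(
    r"(?P<documentation>(?:(?:\s+[\"\']{3}(?:(?:\s|.)*?)[\"|\']{3}\n+)?(?:[ \t]*?\#(?:.*?)\n+)*)*)?"
)

# Text as it would appear directly after a "def ...:" line (assumed sample).
body = '\n    """Doc string."""\n    # a note\n    return 1\n'

# match() anchors at position 0, where the docstring's leading whitespace starts.
found = DOC_PATTERN.match(body)
documentation = found.group("documentation")

print(repr(documentation))
```

The captured text should cover the docstring and the comment line that follows it, and stop before the first line of real code.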
Match Python classes, functions, and comments
# For code blocks
(?P<code_block>(?:[ \t]*)(?P<code_head>(?:(?:(?:@(?:.*)\s+)*)*(?:(?:class)|(?:(?:async\s+)*def)))[ \t]*(?:\w+)\s*\((?:.*?)\)(?:[ \t]*->[ \t]*(?:(.*)*))?:\n+)(?P<code_body>(?:(?:)(?:[ \t]+[^\n]*)|\n)+))
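A small sketch of using the code-block pattern from Python and reading its named groups. The sample source is invented. Note that, as written, the pattern expects parentheses after the name, so a bare `class Foo:` would not be picked up.

```python
import re

CODE_BLOCK = re.compile(
    r"(?P<code_block>(?:[ \t]*)(?P<code_head>(?:(?:(?:@(?:.*)\s+)*)*(?:(?:class)|(?:(?:async\s+)*def)))[ \t]*(?:\w+)\s*\((?:.*?)\)(?:[ \t]*->[ \t]*(?:(.*)*))?:\n+)(?P<code_body>(?:(?:)(?:[ \t]+[^\n]*)|\n)+))"
)

source = "def add(a, b):\n    return a + b\n\nx = 1\n"

found = CODE_BLOCK.search(source)
code_head = found.group("code_head")   # the "def ...:" line
code_body = found.group("code_body")   # indented lines and blank lines after it

print(repr(code_head))  # 'def add(a, b):\n'
print(repr(code_body))  # '    return a + b\n\n'
```

The body group keeps consuming indented or blank lines, so it ends at the first line that starts in column zero (`x = 1` here).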
or
# For all header information
(?P<class_or_function_top_defination>(?: *@(?:.*?)\n+)* *(?:\s+(?P<is_class>class)|(?P<is_function>def|async +def)) +(?:(?:\n|.)*?):\n+)(?P<documentation>(?:(?:\s+[\"\']{3}(?:(?:\s|.)*?)[\"|\']{3}\n+)?(?:[ \t]*?\#(?:.*?)\n+)*)*)?(?P<class_or_function_propertys>(?(is_class)((?![ \t]+(?:def|class) )(?:(?:.*?): *(?:.*?) *= *(?:.*?)\n)*)|(?:)))?
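A hedged sketch of running the header pattern from Python. The pattern is split into three string pieces purely for readability; the pieces concatenate to the exact pattern above. The conditional `(?(is_class)...)` only tries to collect `name: type = value` lines when the `is_class` group matched. As written, the `class` branch requires whitespace before the keyword, so the invented sample starts with a newline.

```python
import re

TOP = r"(?P<class_or_function_top_defination>(?: *@(?:.*?)\n+)* *(?:\s+(?P<is_class>class)|(?P<is_function>def|async +def)) +(?:(?:\n|.)*?):\n+)"
DOC = r"(?P<documentation>(?:(?:\s+[\"\']{3}(?:(?:\s|.)*?)[\"|\']{3}\n+)?(?:[ \t]*?\#(?:.*?)\n+)*)*)?"
PROPS = r"(?P<class_or_function_propertys>(?(is_class)((?![ \t]+(?:def|class) )(?:(?:.*?): *(?:.*?) *= *(?:.*?)\n)*)|(?:)))?"

HEAD_PATTERN = re.compile(TOP + DOC + PROPS)

# Assumed sample: a class with a docstring and two annotated attributes.
source = '\nclass Point:\n    """A 2D point."""\n    x: int = 0\n    y: int = 0\n'

found = HEAD_PATTERN.search(source)
is_class = found.group("is_class")
is_function = found.group("is_function")
documentation = found.group("documentation")
properties = found.group("class_or_function_propertys")

print(is_class, is_function)
print(repr(documentation))
print(repr(properties))
```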
Author
yingshaoxo@gmail.com