This macro shows all Wiki pages that have links to external web sites.
I am hoping Juergen will clean up my code a bit and add it to the MoinMoin distro. Meanwhile, I am posting it here for people to check out.
Combining my URL checker code on EfnetPythonWiki:JürgenHermann with the ExternalLinks macro (note the proposed name change) would be incredibly nifty. The checking should be done via a link generated by the macro (i.e. only on request by the user). --jh
I agree, but I think we should leave it for the next version. I am out of commission for the next four or five weeks, because I'm doing a cross-country move and then starting a new job. You have my permission to rename the macro to ExternalLinks (much better name) and do with it as you please.
-- SteveHowell
In the current form, I won't take it into the official code base anyway, since ripping the URLs with your own machinery is too likely to break later; and at the same time, there is no official machinery yet. Note that I made some of the page scanning regexes available in 0.11 [1], so you can be somewhat more official after the 0.11 release.
[1] Use MoinMoin.parser.wiki.Parser.word_rule for wikinames, and MoinMoin.parser.wiki.Parser.url_rule for URLs.
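A minimal sketch of how that suggestion might look in the macro, assuming (this is not confirmed above) that `Parser.url_rule` is a regex pattern string that can be handed to `re.compile`. `StandInParser` and `compile_url_regex` are hypothetical names, and the stand-in pattern is a deliberate simplification so the snippet runs without MoinMoin installed:

```python
import re


class StandInParser:
    """Hypothetical stand-in for MoinMoin.parser.wiki.Parser (assumption)."""
    # Simplified placeholder pattern; the real url_rule in MoinMoin will differ.
    url_rule = r"(?:http|https|ftp)\:[^\s<]+"


def compile_url_regex(parser_cls=StandInParser):
    # Reuse the parser's own rule instead of keeping a private copy in the macro.
    return re.compile(parser_cls.url_rule)


url_re = compile_url_regex()
print(url_re.findall("see http://example.com/a and ftp://host/b"))
```

With MoinMoin 0.11 or later installed, the real parser class would presumably be passed in place of the stand-in, so the macro follows any future changes to the official rule.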
code
"""
MoinMoin - OutsideLinks Macro
Copyright (c) 2002 by Steve Howell <showell@zipcon.net>
All rights reserved, see COPYING for details.
Show all Wiki pages that have links to external web sites.
"""
# Imports
import re, string
from MoinMoin import config, wikiutil
from MoinMoin.Page import Page

def execute(macro, args):
    # Build a list of pages, each followed by a bullet list of its external links.
    all_pages = wikiutil.getPageList(config.text_dir)
    all_pages.sort()
    result = ""
    for page in all_pages:
        if not ignore(page):
            links = getLinks(Page(page).get_raw_body())
            if len(links) > 0:
                result = result + macro.formatter.pagelink(page)
                result = result + macro.formatter.bullet_list(1)
                for link in links:
                    result = result + macro.formatter.listitem(1)
                    result = result + macro.formatter.url(link)
                    result = result + macro.formatter.listitem(0)
                result = result + macro.formatter.bullet_list(0)
    return result

def ignore(page):
    # Skip pages whose names contain any of these substrings.
    return re.search("Hermann", page) \
        or re.search("Help", page) \
        or re.search("MoinMoin", page) \
        or re.search("Wiki", page)
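
def getLinks(text):
    # Collect every external URL in the page text, line by line.
    link = link_regex()
    lines = string.split(text, '\n')
    result = []
    for line in lines:
        for item in link.findall(line):
            # item[0] is the full URL; skip anything containing ".gif".
            if not re.search(r"\.gif", item[0]):
                result.append(item[0])
    return result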
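
def link_regex():
    # Match URLs with a known scheme. A punctuation character is accepted only
    # when a non-punctuation character follows, so trailing punctuation is left out.
    return re.compile(r"((%(url)s)\:([^\s\<%(punct)s]|([%(punct)s][^\s\<%(punct)s]))+)"
        % {
            'url': 'http|https|ftp|nntp|news|telnet|file',
            'punct': re.escape('''"'}]|:,.)?!'''),
        })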
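
To see what the link regex picks up without a running wiki, here is a standalone sketch for current Python 3. It copies the pattern and the `.gif` filter from the macro above; the names `LINK_RE`, `get_links` and `sample`, and the sample text itself, are made up for illustration and are not part of the macro:

```python
import re

URL_SCHEMES = 'http|https|ftp|nntp|news|telnet|file'
PUNCT = re.escape('''"'}]|:,.)?!''')

# Same pattern as link_regex() in the macro, compiled once at import time.
LINK_RE = re.compile(
    r"((%(url)s)\:([^\s\<%(punct)s]|([%(punct)s][^\s\<%(punct)s]))+)"
    % {'url': URL_SCHEMES, 'punct': PUNCT})


def get_links(text):
    """Return external URLs found in text, skipping any that contain '.gif'."""
    links = []
    for line in text.split('\n'):
        for groups in LINK_RE.findall(line):
            url = groups[0]  # first group is the whole URL
            if not re.search(r"\.gif", url):
                links.append(url)
    return links


sample = ("See http://example.com/page. Also ftp://host/file.txt, "
          "and http://x.org/logo.gif")
print(get_links(sample))
# prints ['http://example.com/page', 'ftp://host/file.txt']
```

Note that the sentence-ending period and the comma are not swallowed into the URLs, and the `.gif` link is dropped. The filter matches `.gif` anywhere in the URL, so a host such as `a.gifts.example` would be skipped as well.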