May 12th, 2012
Recently our organization has grown to an immense size, and we are starting to outgrow the 10.6 Wiki Server that we use primarily for our intranet. I have been looking at the 10.7 Wiki Server, but it is not much better. Our intranet has been plagued with bouts of corruption and plist issues that have caused slow load times and severe data loss. It's pretty clear that we need to move to a more stable platform for storing this information. We have looked at WordPress and Drupal for this; however, the biggest issue is getting the data from the Wiki Server into one of these installations. Both Drupal and WordPress have many plugins or modules that can import content from CSV, but getting a Wiki Server content set into CSV is not as easy as it sounds.
I found this script, which works great at extracting the information stored in the plist file in each of the page folders in the Wiki structure. However, what I really wanted was the content of the page.html file stored in each .page folder. I wrote a helper script that recursively copies and runs the script with a few modifications and then exports all the data I wanted to CSV. The helper then copies the CSV files to the main export folder and deletes all the files that it created in the Wiki Server structure.
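For anyone who wants to see the general shape of that walk-and-export step, here is a minimal Python sketch. It is not one of the scripts from the repository, just an illustration. It assumes each .page folder holds a page.plist next to page.html; the plist file name, the `title` key, and the function name `export_pages` are my assumptions, so adjust them to match your deployment.

```python
import csv
import os
import plistlib


def export_pages(wiki_root, csv_path):
    """Walk wiki_root and write one CSV row per .page folder found."""
    rows = []
    for folder, _subdirs, files in os.walk(wiki_root):
        if not folder.endswith(".page"):
            continue
        meta = {}
        # Assumed file name; change it to whatever your page folders contain.
        if "page.plist" in files:
            with open(os.path.join(folder, "page.plist"), "rb") as fh:
                meta = plistlib.load(fh)
        html = ""
        if "page.html" in files:
            with open(os.path.join(folder, "page.html"), encoding="utf-8") as fh:
                html = fh.read()
        # "title" is a hypothetical key used only for illustration.
        rows.append([os.path.basename(folder), meta.get("title", ""), html])

    rows.sort()  # keep the output order stable across filesystems
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["folder", "title", "content"])
        writer.writerows(rows)
    return len(rows)
```

Calling `export_pages("/path/to/wiki", "/export/pages.csv")` (both paths are placeholders) returns the number of pages written. Unlike my helper, this sketch never writes anything into the Wiki Server structure, so there is nothing to clean up afterwards.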
To use this script, copy the folder and all three of the scripts inside it to the root level of your Server HD. Each script has a variable you must set. Once you have set the initial path of your Wiki deployment and the base URL structure, you need to make the files executable. You can do this with:
chmod -R 700 /export
This should make the scripts executable. Once done, run the run.sh script with sudo, which triggers the export. This is nowhere near perfect, so I have opened up a GitHub repository for the changes that I have made and for the helper script that runs these recursively. The export also includes content in user blogs.
The one challenge I am having is keeping the encoding at UTF-8 when running the script that exports the page.html file content, so that I don’t get any artifacts or odd characters.
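One approach that might help, sketched in Python rather than taken from the scripts themselves, is to read page.html as raw bytes, decode it explicitly, and always write the CSV as UTF-8. The function names `read_text` and `write_csv` and the `latin-1` fallback are assumptions for illustration. I have not confirmed which encoding the Wiki Server uses for pages that are not valid UTF-8.

```python
import csv


def read_text(path, fallback="latin-1"):
    """Return the file's content as str, preferring UTF-8."""
    with open(path, "rb") as fh:
        raw = fh.read()
    try:
        # utf-8-sig also strips a leading byte-order mark if one is present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: pages that are not valid UTF-8 use the fallback encoding.
        return raw.decode(fallback)


def write_csv(path, rows):
    """Write rows to a CSV file that is always encoded as UTF-8."""
    with open(path, "w", newline="", encoding="utf-8") as out:
        csv.writer(out).writerows(rows)
```

Decoding once on the way in and encoding once on the way out avoids relying on the system default encoding, which is a common source of odd characters in exports like this.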
Here are the scripts