A website I am working on right now accepts public comments, and one of the requirements is a basic check for dirty language. Surprisingly, for such a common problem, I wasn’t able to find any code on the net that did what I wanted, so I ended up writing my own.

The Censor class is pretty simple: you give it a list of words you want to censor, either simple text or with wildcards, and the censor will star out any matches it finds.

IList<string> censoredWords = new List<string>
{
  "gosh",
  "drat",
  "darn*",
};
 
Censor censor = new Censor(censoredWords);
string result;
 
result = censor.CensorText("I stubbed my toe. Gosh it hurts!");
// I stubbed my toe. **** it hurts!
 
result = censor.CensorText("The midrate on the USD -> EUR forex trade has soured my day. Drat!");
// The midrate on the USD -> EUR forex trade has soured my day. ****!
 
result = censor.CensorText("Gosh darnit, my shoe laces are undone.");
// **** ******, my shoe laces are undone.

The first example is a simple whole word match on gosh. The second example replaces drat but doesn’t star out the drat in midrate. The final example shows the censor starring out multiple matches and also matching darnit against the wildcard darn*.
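The whole word behaviour comes down to regex word boundaries. Here is a minimal sketch of just that idea, separate from the Censor class; WordBoundaryDemo and Star are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the Censor class.
public static class WordBoundaryDemo
{
  // Replaces every match of the pattern with the same number of stars.
  public static string Star(string text, string pattern)
  {
    return Regex.Replace(text, pattern, m => new string('*', m.Length),
      RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
  }
}

// WordBoundaryDemo.Star("midrate Drat!", @"\bdrat\b")  ->  "midrate ****!"
// WordBoundaryDemo.Star("midrate Drat!", "drat")       ->  "mi****e ****!"
```

Without the \b anchors the pattern also hits the drat inside midrate, which is exactly the false positive the second example avoids.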

I’m passing a collection of strings in my examples but it is easy enough to find a list of swear words on the net, put them in a text file and call something like File.ReadAllLines to get an array of words to filter on.
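Loading the list from a file might look something like the sketch below. It assumes one word per line and, as my own convention rather than anything the Censor requires, skips blank lines and lines starting with #. CensoredWordList, Parse and Load are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for reading a censored word list.
public static class CensoredWordList
{
  // Trims each line and skips blanks and '#' comment lines.
  public static IList<string> Parse(IEnumerable<string> lines)
  {
    if (lines == null)
      throw new ArgumentNullException("lines");

    List<string> words = new List<string>();
    foreach (string line in lines)
    {
      string word = line.Trim();
      if (word.Length == 0 || word.StartsWith("#", StringComparison.Ordinal))
        continue;

      words.Add(word);
    }

    return words;
  }

  // Reads the file at the given path, one word per line.
  public static IList<string> Load(string path)
  {
    return Parse(File.ReadAllLines(path));
  }
}
```

The result can be passed straight to the constructor, for example new Censor(CensoredWordList.Load("censored-words.txt")), where the file name is made up.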

The code:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Censor
{
  public IList<string> CensoredWords { get; private set; }
 
  public Censor(IEnumerable<string> censoredWords)
  {
    if (censoredWords == null)
      throw new ArgumentNullException("censoredWords");
 
    CensoredWords = new List<string>(censoredWords);
  }
 
  public string CensorText(string text)
  {
    if (text == null)
      throw new ArgumentNullException("text");
 
    string censoredText = text;
 
    foreach (string censoredWord in CensoredWords)
    {
      string regularExpression = ToRegexPattern(censoredWord);
 
      censoredText = Regex.Replace(censoredText, regularExpression, StarCensoredMatch,
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
 
    return censoredText;
  }
 
  private static string StarCensoredMatch(Match m)
  {
    string word = m.Captures[0].Value;
 
    return new string('*', word.Length);
  }
 
  private static string ToRegexPattern(string wildcardSearch)
  {
    // Escape regex metacharacters first, then translate the wildcards:
    // * matches any run of word characters, ? matches exactly one.
    string regexPattern = Regex.Escape(wildcardSearch);

    regexPattern = regexPattern.Replace(@"\*", @"\w*");
    regexPattern = regexPattern.Replace(@"\?", @"\w");

    // Anchor on word boundaries so drat does not match inside midrate.
    // A leading wildcard such as *darn still works because \w* can
    // consume the start of the word after the opening boundary.
    return @"\b" + regexPattern + @"\b";
  }
}


Json.NET usage up 193% *

Mmm.... pi

* Chris immediately started to point out the statistical flaws of the survey. My response was that regardless, Json.NET’s share of the flawed survey market has increased ;)

Remember, you can always find East by staring directly at the sun.

This American Life - Classifieds

This American Life is a fantastic weekly podcast about everyday aspects of life. Each week’s show has a theme and then a number of stories around that theme.

This episode is based on stories gathered from classified ads, all found in a single newspaper on a single day. A man who posted an ad about a missing dog, a personal ad to win back a lost love, and my favourite: assembling a disparate group of musicians from classified ads to form a band for one day only and record a song – Rocket Man.

Jon Langford and the One Day Band - Rocket Man.mp3:

Hanselminutes - Visiting Fog Creek Software and Joel Spolsky

Scott Hanselman interviews Joel Spolsky about technology, business, blogging and shark jumping. I can’t say I agree with many of Spolsky’s opinions on programming (eww) but he has a good head for business and marketing.

I hope this has taught you kids a lesson: kids never learn.

While Fog Creek is best known for a project management tool called FogBugz, they have also created a remote desktop service called Copilot. A DVD I bought not long after I started my first software development job is a documentary about Copilot called Aardvark'd: Twelve Weeks with Geeks. Aardvark'd is a fun little movie about the software and the small group of interns who made it. Recommended.

Developers and multithreading fall into 3 camps:

  • Doesn’t know anything. This developer avoids thinking about other threads at all costs. Followers of the HttpContext. 90% of developers.
  • Knows everything. This developer is either a savant or writes operating systems for fun. Most likely sports a sweet hippy beard. 3.14159% of developers.
  • Doesn’t know everything but knows enough. This developer will happily write multithreaded applications. It turns out, however, that he/she doesn’t know enough and said applications are full of hard-to-find, intermittent bugs just waiting to corrupt data and deadlock. The rest of us.

I fall into the third category: knows enough to be dangerous.

The latest example of my thread safety failure is knowing that two threads modifying a dictionary at the same time is very bad, but then not considering that getting from a dictionary that is being modified is also not a terribly good idea. Thanks to Amir for the pointer.

ThreadSafeStore

I’ve written a simple helper class that wraps a Dictionary called ThreadSafeStore. ThreadSafeStore treats its internal Dictionary as immutable. Each time a new value is added, the wrapper creates a new Dictionary containing the existing values plus the new one and reassigns the internal reference. There is no chance a thread could access the Dictionary while it is being modified. ThreadSafeStore is aimed at read performance, with no lock required on read.

The downside to this approach is that a new Dictionary is created with every add. An improvement would be to introduce a second level Dictionary and buffer up new values before adding them to the main store, reducing the total number of objects allocated. For now I’ve left it simple. Suggestions welcome :)

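using System;
using System.Collections.Generic;

public class ThreadSafeStore<TKey, TValue>
{
  // volatile so that readers outside the lock always see the most
  // recently published Dictionary reference
  private volatile Dictionary<TKey, TValue> _store;
  private readonly Func<TKey, TValue> _creator;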
 
  public ThreadSafeStore(Func<TKey, TValue> creator)
  {
    if (creator == null)
      throw new ArgumentNullException("creator");
 
    _creator = creator;
  }
 
  public TValue Get(TKey key)
  {
    if (_store == null)
      return AddValue(key);
 
    TValue value;
    if (!_store.TryGetValue(key, out value))
      return AddValue(key);
 
    return value;
  }
 
  private TValue AddValue(TKey key)
  {
    lock (this)
    {
      TValue value;
 
      if (_store == null)
      {
        _store = new Dictionary<TKey, TValue>();
        value = _creator(key);
        _store[key] = value;
      }
      else
      {
        // double check locking
        if (_store.TryGetValue(key, out value))
          return value;
 
        Dictionary<TKey, TValue> newStore = new Dictionary<TKey, TValue>(_store);
        value = _creator(key);
        newStore[key] = value;
 
        _store = newStore;
      }
 
      return value;
    }
  }
}
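For comparison, .NET 4.0 adds ConcurrentDictionary, which covers similar ground. The sketch below is a different technique from the copy-on-write approach above, not a drop-in replacement: ConcurrentStore is a hypothetical name, and wrapping values in Lazy is my own assumption about how to keep the creator from running more than once per key.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical alternative built on ConcurrentDictionary (.NET 4.0+).
public class ConcurrentStore<TKey, TValue>
{
  private readonly ConcurrentDictionary<TKey, Lazy<TValue>> _store =
    new ConcurrentDictionary<TKey, Lazy<TValue>>();
  private readonly Func<TKey, TValue> _creator;

  public ConcurrentStore(Func<TKey, TValue> creator)
  {
    if (creator == null)
      throw new ArgumentNullException("creator");

    _creator = creator;
  }

  public TValue Get(TKey key)
  {
    // GetOrAdd may invoke its factory more than once under contention,
    // but only one Lazy is stored per key, and that Lazy runs _creator
    // at most once (its default mode is thread safe).
    return _store.GetOrAdd(key, k => new Lazy<TValue>(() => _creator(k))).Value;
  }
}
```

The trade-off is that reads go through ConcurrentDictionary's own synchronisation rather than a plain Dictionary lookup, so it is worth measuring before assuming either one is faster for a read-heavy cache.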

That's why I love elementary school, Edna. The children believe anything you tell them.

Over the weekend I presented a session on .NET 4.0 Code Contracts at Code Camp 2009.

Code Contracts were one of my favourite announcements at PDC2008, so I volunteered to talk about them to force myself into learning everything I could :D

The slides for my presentation can be downloaded here.

 

Thanks to the organisers and other speakers for all the great content. Standout session goes to Ivan for not only covering everything new in C# 4.0 in 4 minutes but also showing us all some sweet dance moves.
