Using natural language processing (NLP) and a newly developed web crawler, we analyzed 75 of edX's most popular MOOCs. First, the crawler extracted each course's textual data from readings (HTML), video transcripts, and assessments available in the Open edX learning management system (LMS). Second, through statistical analysis, we identified patterns in the course materials. Finally, we performed a text readability assessment based on word embeddings and compared the courses using clustering.
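
To make the final step concrete, the following is a minimal sketch of how courses might be compared by clustering embedding-based representations of their text. It is an illustration under stated assumptions, not the pipeline used in this study: the two-dimensional `TOY_EMBEDDINGS` table, the sample course texts, and the helper names (`course_vector`, `kmeans`) are all hypothetical, and a real analysis would load pretrained word vectors and use an established clustering library.

```python
import re

# Hypothetical toy embeddings (2-D) standing in for pretrained word vectors.
TOY_EMBEDDINGS = {
    "derivative": (0.9, 0.1),
    "integral": (0.8, 0.2),
    "theorem": (0.85, 0.15),
    "poem": (0.1, 0.9),
    "novel": (0.2, 0.8),
    "metaphor": (0.15, 0.85),
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def course_vector(text, embeddings):
    """Represent a course as the mean embedding of its in-vocabulary tokens."""
    vectors = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vectors:
        raise ValueError("no in-vocabulary tokens in text")
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain k-means, using the first k points as initial centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


# Hypothetical course texts, standing in for crawled readings and transcripts.
courses = {
    "Calculus 101": "The derivative and the integral are linked by a theorem.",
    "Intro to Literature": "A poem or a novel often relies on metaphor.",
    "Real Analysis": "Each theorem about the integral is proved.",
    "Creative Writing": "Write a poem built around one metaphor.",
}

names = list(courses)
vectors = [course_vector(courses[name], TOY_EMBEDDINGS) for name in names]
labels = kmeans(vectors, k=2)
clusters = dict(zip(names, labels))
```

In this toy setting, `clusters` places the two mathematics courses in one group and the two writing courses in the other. Averaging word vectors is only one of several possible ways to derive a course-level representation from embeddings.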