Geeking out with Kaggle PASSNYC data

I’ve spent my free time recently exploring one of the latest Kaggle challenges, PASSNYC: Data Science for Good Challenge.

I’ve been working in Python, but I’m considering switching to R to run k-means.

My code is still a work in progress, so please be gentle 🙂
https://www.kaggle.com/ambiguouserror/passnyc-data-science-for-good-challenge

Overview

PASSNYC is a not-for-profit organization that facilitates a collective impact that is dedicated to broadening educational opportunities for New York City’s talented and underserved students. New York City is home to some of the most impressive educational institutions in the world, yet in recent years, the City’s specialized high schools – institutions with historically transformative impact on student outcomes – have seen a shift toward more homogeneous student body demographics.

PASSNYC uses public data to identify students within New York City’s under-performing school districts and, through consulting and collaboration with partners, aims to increase the diversity of students taking the Specialized High School Admissions Test (SHSAT). By focusing efforts in under-performing areas that are historically underrepresented in SHSAT registration, we will help pave the path to specialized high schools for a more diverse group of students.

Problem Statement

PASSNYC and its partners provide outreach services that improve the chances of students taking the SHSAT and receiving placements in these specialized high schools. The current process of identifying schools is effective, but PASSNYC could have an even greater impact with a more informed, granular approach to quantifying the potential for outreach at a given school. Proxies that have been good indicators of these types of schools include data on English Language Learners, Students with Disabilities, Students on Free/Reduced Lunch, and Students with Temporary Housing.

Part of this challenge is to assess the needs of students by using publicly available data to quantify the challenges they face in taking the SHSAT. The best solutions will enable PASSNYC to identify the schools where minority and underserved students stand to gain the most from services like after school programs, test preparation, mentoring, or resources for parents.

Submissions for the Main Prize Track will be judged based on the following general criteria:

Performance – How well does the solution match schools and the needs of students to PASSNYC services? PASSNYC will not be able to live test every submission, so a strong entry will clearly articulate why it is effective at tackling the problem.

Influential – The PASSNYC team wants to put the winning submissions to work quickly. Therefore a good entry will be easy to understand and will enable PASSNYC to convince stakeholders where services are needed the most.

Shareable – PASSNYC works with over 60 partner organizations to offer services such as test preparation, tutoring, mentoring, extracurricular programs, educational consultants, community and student groups, trade associations, and more. Winning submissions will be able to provide convincing insights to a wide subset of these organizations.
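Since the plan is k-means, it may be worth knowing it is only a few lines in Python too. Below is a minimal sketch of plain Lloyd's k-means in NumPy, clustering schools on the four proxies named in the problem statement. The feature values are made-up toy numbers, not PASSNYC data, and the column order, the standardisation step and the deterministic farthest-first seeding are all assumptions for illustration; scikit-learn's `KMeans` would be the usual off-the-shelf alternative.

```python
import numpy as np


def standardise(X):
    """Scale each column to mean 0, std 1 so no single proxy dominates distances."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def farthest_first_init(X, k):
    """Deterministic seeding: start at the first row, then keep adding the row
    farthest from the centroids chosen so far (a simpler cousin of k-means++)."""
    centroids = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[dist.argmax()])
    return np.array(centroids)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first_init(X, k)
    for _ in range(n_iter):
        # Assign each school to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its schools (keep it if a cluster empties)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy, made-up percentages per school (NOT real PASSNYC data). Assumed columns:
# English Language Learners, Students with Disabilities,
# Free/Reduced Lunch, Temporary Housing
X = np.array([
    [5, 10, 30, 2],
    [6, 12, 35, 3],
    [4, 11, 28, 2],
    [25, 22, 90, 15],
    [28, 20, 88, 14],
    [24, 21, 92, 16],
], dtype=float)

labels, centroids = kmeans(standardise(X), k=2)
print(labels)  # [0 0 0 1 1 1]
```

Standardising first matters here because Free/Reduced Lunch percentages span a much wider range than the other proxies and would otherwise dominate the distance.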

notepad plus plus for Linux and other cool stuff I’ve found this week


notepad plus plus for Linux

I use Notepad++ at work on a Windows machine; I find it great for taking notes and editing SQL scripts on the go.

For a long time Geany has been my go-to Linux text editor, but it may get less use now that Notepad++ is available on Linux via snap 🙂

Crazy, too: I think this is my first snap install!

This is still the Windows exe running under Wine, so it’s not a fully fledged native Linux install, but it has all of the familiar features.

Simply install it by typing the following into a terminal:
sudo snap install notepad-plus-plus


Which machine learning algorithm should I use?

machine learning cheat sheet
This link appeared on my Twitter feed this week; it’s a year-old blog entry from SAS, but still useful.

There’s a guide on how to use it here: machine-learning-algorithm-use


How to make a bouncing ball simulator in Python

This caught my eye while scrolling through Twitter: Boing Boing, a cool, well-presented project, and something a little different from what I usually use Python for.
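At its heart a bouncing ball simulator is a small physics update repeated every frame: apply gravity, move the ball, and reverse (and damp) its velocity when it hits the floor. Here is a minimal one-dimensional sketch using semi-implicit Euler integration. It is not the code from the linked project; the gravity, time step and restitution values are assumptions.

```python
GRAVITY = -9.81     # m/s^2 (assumed)
RESTITUTION = 0.8   # fraction of speed kept after each bounce (assumed)


def step(y, vy, dt=0.01):
    """Advance the ball by one time step (semi-implicit Euler)."""
    vy += GRAVITY * dt   # update velocity first...
    y += vy * dt         # ...then position using the new velocity
    if y < 0:            # hit the floor: clamp and bounce, losing some energy
        y = 0.0
        vy = -vy * RESTITUTION
    return y, vy


def simulate(y0, steps, dt=0.01):
    """Drop a ball from height y0 and return its height after each step."""
    y, vy = y0, 0.0
    heights = []
    for _ in range(steps):
        y, vy = step(y, vy, dt)
        heights.append(y)
    return heights


heights = simulate(10.0, 2000)
print(max(heights[300:]) < 10.0)  # True: each bounce peaks lower than the drop
```

In a real game loop, `step` would be called once per frame and the ball drawn at the returned height.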


Day Tripper tab by The Beatles

I was in the gym earlier in the week when this awesome promo video from The Beatles started playing on the large screen on the back wall; I’ve had this guitar riff in my head ever since 🙂

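E|---------------------------|---------------------------|
B|---------------------------|---------------------------|
G|---------------------------|---------------------------|
D|------------2-0----4----0-2|------------2-0----4----0-2|
A|----------2------2----2----|----------2------2----2----|
E|---0--3-4------------------|---0--3-4------------------|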

Here’s what the band said about the song


I’ve nearly finished Horizon Zero Dawn, but I just don’t want it to end. I’ve been travelling around the map, enjoying the scenery and finding all of the collectables.

Then I discovered that Call of Duty: Black Ops III was free on the PS4 this week, so my attention has been diverted… times like this I’m thankful for having fibre 🙂

It’s been one of my favourite CoD multiplayer games since Modern Warfare; I’ve had heaps of fun and am keeping a positive KDR 🙂

I’m thinking of committing to completing Bloodborne next from my list of games to finish… but no point rushing into anything, eh?

Aug 17: games and a fuck up

Yesterday I fucked up… a little

I’ve been playing with a Python script to rearrange the films in my movie directory; it moves every file of a selected type from a directory and all of its sub-directories into a chosen target directory.
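A minimal sketch of that kind of mover using `os.walk` and `shutil.move` (an illustration of the idea, not the script itself). The function name `move_by_extension`, the case-insensitive extension match, skipping the destination folder during the walk, and renaming on name collisions are all assumptions.

```python
import os
import shutil


def move_by_extension(src_root, dest_dir, extension):
    """Move every file ending in `extension` under src_root into dest_dir.

    Returns a list of (original_path, new_path) pairs.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_abs = os.path.abspath(dest_dir)
    moves = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        # Prune the destination folder so already-moved files aren't revisited
        dirnames[:] = [d for d in dirnames
                       if os.path.abspath(os.path.join(dirpath, d)) != dest_abs]
        for name in filenames:
            if not name.lower().endswith(extension.lower()):
                continue
            src = os.path.join(dirpath, name)
            target = os.path.join(dest_dir, name)
            # Don't overwrite same-named files coming from different folders
            base, ext = os.path.splitext(name)
            n = 1
            while os.path.exists(target):
                target = os.path.join(dest_dir, "%s_%d%s" % (base, n, ext))
                n += 1
            shutil.move(src, target)
            moves.append((src, target))
    return moves
```

Returning the list of moves costs nothing and keeps a record of where everything came from.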

Useful eh?

I thought I might use it to group all of my photos, too.

Last night I ran it on a drive’s root directory for all *.txt files.

Let me rephrase that: I moved every text file from every directory on the drive into a single folder in the root directory.

This was only a backup drive, but I expect a few programs might fail because their config files have moved… on the upside, I know a good place to look for them: the folder in the root directory named ‘txt files’ 🙂
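A run like that is much less scary if it can be reversed. One way is to record every (original path, new path) pair as a JSON manifest and replay it backwards. Below is a minimal sketch; the manifest format and the names `save_manifest` and `undo_from_manifest` are hypothetical. `dry_run` defaults to `True`, so the first call only prints what it would do.

```python
import json
import os
import shutil


def save_manifest(moves, path):
    """Write a list of (original_path, new_path) pairs to a JSON file."""
    with open(path, "w") as f:
        json.dump(moves, f, indent=2)


def undo_from_manifest(path, dry_run=True):
    """Move every file back to where it came from, newest move first."""
    with open(path) as f:
        moves = json.load(f)
    for original, moved in reversed(moves):
        if dry_run:
            print("would move %s -> %s" % (moved, original))
            continue
        parent = os.path.dirname(original)
        if parent:
            os.makedirs(parent, exist_ok=True)  # recreate folders if needed
        shutil.move(moved, original)
```

Replaying in reverse order means that if a file was moved twice, it retraces its steps correctly.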


retro need

Tired of current-gen titles, and with mates having borrowed the games I fancied playing, I set up my Xbox 360 and PS3 again.

Such an awesome collection of titles; I can’t wait to introduce them to the boy.

I found it mad that I’ve bought the PS4 remakes of a few of them 🙂

This then took me down a retro route, and I installed a MAME emulator on Linux to go back to my favourite platform title, Wardner.

Wardner

Not everyone sees the beauty of the game, but I have fond memories of playing a Japanese version of it at lunch and after school, and of getting through the full game on 10p.


The rest of my game time has been committed to Elite Dangerous.
A colleague is new to the game, so I logged on with the intention of passing on some credits to help him on his way.

Within the first hour I made 3 insurance claims on expensive ships, simply because I’d forgotten the controls.

Mitigation kicked in and I bought a Sidewinder to bring back the muscle memory and remember what the fuck I was doing.

Version 2.4, named ‘The Return’, is in alpha testing and due for release soon: Thargoids are coming!
So naturally I’ve been hooked again in preparation.

I’ve been grinding out at Sothis / Ceos, increasing Federation and Explorer rank, currently sitting at Lieutenant / Baron / Expert / Entrepreneur / Ranger… Elite not too far away.

I’m running a Clipper with two class 6 First Class cabins; in a single jump I’m making as much as 2.5MM credits, inching up the rank with each departure of passengers.

I’m tempted to follow the Road to Riches for something different to do; I always enjoyed flying the Asp… I also fancy climbing back into a Diamondback and getting my combat skillz back up to speed, which could be useful with the Thargoids’ return.

It’s great that these aren’t mutually exclusive, as my bank balance is looking healthy enough to kit out a few ships to a high grade.

I’ve been tempted to go down the Python route, although not sure why I’m reluctant… maybe it’s the idea that I’ll then have to grind engineers to better equip the ship?

I’ve also joined the Mobius group, which means I can play online without the distraction and strife of players who want to blow me away or steal my hard-earned credits.

To paraphrase Braben, see you all in the black o7

Python as a platform for games


After my last post I got thinking about why I chose Unity, and whether Python could be an option as a language to build games.

I have much more need for Python on a daily basis; although I’m currently limited to version 2.7 at work, I do use it for the current Feed2Twitter Raspberry Pi setup.

So for the past few days I’ve been digging around searching for material and tutorials to play with.

One impressive example comes from a post on the Ubuntu Forums from over 7 years ago:

I was experimenting with Python over the last week, and I made this example app for various 3D stuff. I titled it PolarisGP, and I was making it with lax intentions of making a flight racing game. However, in case I lose interest, I’d like to put it out there for posterity.

It includes some examples of pretty cool techniques:

  • Basic linear algebra
  • Quaternion rotation
  • Axis/magnitude angular velocity
  • Display lists
  • .obj file loading
  • Efficiently extracting “up,left,forward” vectors from a matrix
  • Using an inverse matrix (in this case, from a quaternion) to create a camera system
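To give a flavour of the quaternion items in that list, here is a minimal pure-Python sketch (not PolarisGP’s code). It covers axis-angle quaternions, composing rotations, converting to a rotation matrix, reading the up/left/forward vectors straight out of the matrix columns, and using the conjugate quaternion as the inverse rotation a camera needs. The (w, x, y, z) ordering and the +X right / +Y up / -Z forward axis convention are assumptions; PolarisGP may use different ones.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) rotating `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2) / norm
    return (math.cos(angle / 2), ax * s, ay * s, az * s)


def quat_multiply(a, b):
    """Hamilton product a*b: the combined rotation applies b first, then a."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_conjugate(q):
    """For a unit quaternion the conjugate is the inverse rotation."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_to_matrix(q):
    """3x3 rotation matrix (row-major lists) for a unit quaternion."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def basis_vectors(m):
    """Up/left/forward are just (negated) matrix columns: no trig needed."""
    right = (m[0][0], m[1][0], m[2][0])
    up = (m[0][1], m[1][1], m[2][1])
    forward = (-m[0][2], -m[1][2], -m[2][2])
    left = tuple(-c for c in right)
    return up, left, forward


# A camera's view rotation is the inverse of its orientation:
q = quat_from_axis_angle((0, 0, 1), math.pi / 2)
view = quat_to_matrix(quat_conjugate(q))  # equals the transpose of quat_to_matrix(q)
```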

Thanks CurvedInfinity, wherever you are now 🙂

I’ve uploaded the code to GitHub, as it’s easier to share… and I wanted to get more familiar with hosting code there: https://github.com/gamer-geek/PolarisGP

Python script to quickly delete all Reddit posts


Here’s a script that has been in my library for a while; I don’t know why I haven’t shared it before now. All credit to nearengine 🙂

This does a simple delete of your entire Reddit post history, with no double delete like other options.

redditPurge is a simple script I wrote while learning Python. It destroys your entire Reddit history, letting you start with a clean slate or delete your account with less of a trace.

usage: ./redditPurge.py username password

license: Do whatever you want with it! If you find it useful please send a tweet (@nearengine) or link to this page.

```python
#!/usr/bin/env python
# redditPurge: delete every post and comment in a Reddit account's history.
# Uses Reddit's legacy /api/login endpoint and modhash.
# print() calls and range() keep it runnable on both Python 2.7 and 3.

import sys
import time

import requests

HEAD = {'User-Agent': 'redditPurge 0.1'}
RATE_LIMIT = 2  # Reddit's API rate limit is 2s between requests


def fail(r, what):
    """Display the best error we can from a Reddit response, then exit."""
    try:
        errors = r.json().get('json', {}).get('errors')
    except (ValueError, AttributeError):  # body was not a JSON object
        errors = None
    if errors:
        print('[ ERROR ] ' + str(r.status_code) + ': ' +
              str(errors[0][0]) + ' ' + str(errors[0][1]))
    else:
        print('[ ERROR ] ' + what)
    sys.exit(1)


def fetch_page(client, url):
    """Fetch one page of things; return (ids, karma, 'after' cursor)."""
    r = client.get(url, headers=HEAD)
    try:
        data = r.json()['data']
        children = data['children']
    except (ValueError, KeyError, TypeError):
        fail(r, 'fetching things')
    # Make sure things were found
    if not children:
        fail(r, 'fetching things')
    ids = [child['data']['name'] for child in children]
    karma = sum(child['data'].get('ups', 0) - child['data'].get('downs', 0)
                for child in children)
    return ids, karma, data.get('after')


# Get username & password from arguments
if len(sys.argv) != 3:
    print('usage: ' + sys.argv[0] + ' username password')
    sys.exit(1)
username, password = sys.argv[1], sys.argv[2]

# Do login; the modhash must accompany every write request
client = requests.Session()
login = {'user': username, 'passwd': password, 'api_type': 'json'}
r = client.post('https://ssl.reddit.com/api/login', data=login, headers=HEAD)
try:
    modhash = r.json()['json']['data']['modhash']
except (ValueError, KeyError, TypeError):
    fail(r, 'getting modhash')

# Fetch 100 things at a time, following the 'after' cursor until it runs out
print('[ OK ] please wait while your things are fetched...')
url = 'https://www.reddit.com/user/' + username + '/overview.json?limit=100'
things_list, karma, after = fetch_page(client, url)
while after is not None:
    time.sleep(RATE_LIMIT)
    ids, page_karma, after = fetch_page(client, url + '&after=' + after)
    things_list.extend(ids)
    karma += page_karma

# Now delete all the things!
print('[ OK ] done fetching. you\'re sacrificing ' + str(karma) +
      ' karma today! here we go:')
total = len(things_list)
for count, thing_id in enumerate(things_list, 1):
    r = client.post('https://www.reddit.com/api/del',
                    data={'id': thing_id, 'uh': modhash}, headers=HEAD)
    print('[ ' + str(r.status_code) + ' ] ' + thing_id +
          ' (' + str(count) + '/' + str(total) + ')')
    time.sleep(RATE_LIMIT)
```

Academy of Code

I’ve been trawling GitHub for useful code and found some interesting material, but it’s time to step up my learning.

After reading through this Reddit/Python thread: How did you learn Python

I’ve chosen Codecademy.

I’ve only recently begun, but I like the uncluttered text and being able to enter code directly on the site.
Let’s see how long I can stick with it before I get distracted by something else 😉

MEGA python

I’m investigating uses for MEGA, now that an Android app has recently become available.

I’m considering using it for secure communication: since all data is encrypted, multiple people with a login to the same account could share files that act as messages.

Another option is to upload the message as a file, share an encrypted link to it, and delete the file once it has been read… although I’m not sure how secure any of this is.

Also today I’ve opened the digital code book again, and am trying to pick up a little useful knowledge in Python.
Fun Sunday, eh? 😉