Intro homework 'Are we in danger?'

  • Hi,

    In case you have not started answering Evan’s workshop 1 homework, here is a starting point.

    1. According to the Wikipedia definition, an object is potentially hazardous, with a risk of colliding with Earth, if:
      a) the “minimum orbit intersection distance” is smaller than 0.05 au. The variable name is moid_au, and ‘au’ stands for astronomical units.
      b) it has an “absolute magnitude” of 22 or brighter. The variable name is h_mag.

    2. I think we basically have to import the data, visualize the variable moid_au in a histogram, and determine how many objects have moid_au < 0.05 au and, of these, how many have an h_mag >= 22.

    3. Useful DataCamp material: at the end of chapter two of “Importing Data in Python (Part 2)” you can see “JSON–from the web to Python”.

    I imported the data like this:

    #importing libraries
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import json
    import requests

    #data url
    url= ‘

    # Package the request, send it and catch the response: r
    r = requests.get(url)

    # Decode the JSON response into Python objects: data
    data = r.json()

    # The variable moid_au (minimum orbit intersection distance) determines whether an object is potentially dangerous: there is a risk if moid_au < 0.05 au (au = astronomical units).

  • @beire_2018 basically the shape of the histogram tells you about the density function. Other factors matter too, e.g. whether the observations are independent of each other.

  • I have to learn what the Poisson distribution means, how it works in Python, and under which conditions it applies.

    According to Evan we should focus more on the basic question and framework before any coding.

    I believe we can fact check our findings with this data:
    “observed anytime” + “any impact prob” + “any Palermo scale” + “H<=22”
    (even if we do any H, we can expand our current dataset/fact check)
    (some variables have different meanings, could be useless…)

    We should get hold of confirmed Earth impact data.
    (something better than this:
    With some luck we can train a supervised model in order to predict future collisions…

  • @beire_2018 the Poisson distribution may be useful to model events such as the number of meteorites greater than 1 meter in diameter that strike Earth in a year. I guess the next step should be adding a bit more stats.
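
    To make the Poisson idea concrete, here is a minimal sketch in plain Python. The rate of 3 events per year is a made-up number for illustration, not a measured impact rate, and the function names are hypothetical. The model only makes sense if events occur independently and at a roughly constant average rate.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average count per interval is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(k, lam):
    """Probability of at most k events in the interval."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


# Hypothetical average number of events per year (illustration only).
rate_per_year = 3.0

p_none = poisson_pmf(0, rate_per_year)   # no event in a year
p_at_least_one = 1.0 - p_none            # one or more events in a year
p_at_most_five = poisson_cdf(5, rate_per_year)

print(f"P(0 events)   = {p_none:.4f}")
print(f"P(>=1 event)  = {p_at_least_one:.4f}")
print(f"P(<=5 events) = {p_at_most_five:.4f}")
```

    If SciPy is available, scipy.stats.poisson offers the same pmf and cdf, but the hand-written version shows what the formula does.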

  • @beire_2018 ok, so the answer is 22 !! ☺

  • @irene-del-carmen said in Intro homework 'Are we in danger?':

    why you calculate mean and standard deviation.

    Yes, I corrected the x-axis label to MOID (au).
    We simply calculated the mean and standard deviation because Evan asked for them.
    The key goal was to write our own standard deviation function, import it, and use it instead of numpy.
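
    A hand-written standard deviation could look like the sketch below. The function names and the ddof parameter are assumptions for illustration, not the code from the notebook. With ddof=0 it computes the population standard deviation, which is also what numpy.std returns by default.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


def std(values, ddof=0):
    """Standard deviation; ddof=0 is the population form, ddof=1 the sample form."""
    values = list(values)
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("not enough values for the requested ddof")
    m = mean(values)
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


# Small made-up sample to check the function by hand.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(sample), std(sample))
```

    Saved in a file such as my_stats.py (a hypothetical name), the functions can be imported into the notebook with "from my_stats import mean, std".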

    The homework is never finished, though I have stopped working on it for now.

    The number of objects of concern is given by len(df) at the end: 22 dangerous objects.
    I believe these objects are more an opportunity than a danger, maybe for resource mining.
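
    The counting step can be sketched without pandas, assuming the decoded JSON is a list of dictionaries with the keys moid_au and h_mag and that values may arrive as strings. The helper names and the sample records are made up. On the magnitude scale a smaller number means a brighter object, so this sketch reads “22 or brighter” as h_mag <= 22; the thread is not unanimous on the direction of that comparison, so treat it as an assumption.

```python
def to_float(value):
    """Convert a JSON value (often a string) to float; None if missing or malformed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def count_objects_of_concern(records, moid_limit=0.05, h_limit=22.0):
    """Return (count with MOID below the limit, count of those also bright enough)."""
    close = []
    for record in records:
        moid = to_float(record.get("moid_au"))
        if moid is not None and moid < moid_limit:
            close.append(record)

    close_and_bright = []
    for record in close:
        h_mag = to_float(record.get("h_mag"))
        # Smaller magnitude means brighter, so "22 or brighter" is h_mag <= 22.
        if h_mag is not None and h_mag <= h_limit:
            close_and_bright.append(record)

    return len(close), len(close_and_bright)


# Made-up records for illustration only (not real catalogue entries).
sample = [
    {"designation": "A", "moid_au": "0.015", "h_mag": "20.1"},
    {"designation": "B", "moid_au": "0.030", "h_mag": "23.5"},
    {"designation": "C", "moid_au": "0.400", "h_mag": "18.0"},
    {"designation": "D", "h_mag": "21.0"},  # missing moid_au, so it is skipped
]

n_close, n_concern = count_objects_of_concern(sample)
print(n_close, n_concern)
```

    Filtering on MOID first and on magnitude second follows the “is it close, then how bright is it” order discussed in the thread.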

  • @beire_2018 I think there is a misunderstanding about the histograms. They are meant to analyze one variable. A histogram shows how many times each value (or range of values) of this variable appears. Therefore it is not correct to label the x-axis (x_label) with another variable, as you did.
    I also don’t understand why you calculate mean and standard deviation.
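
    What a histogram counts can be shown in a few lines of plain Python. This is only a sketch: the MOID values are made up, the function name is hypothetical, and a bin width of 0.05 au is chosen just to line up with the 0.05 au threshold.

```python
import math


def histogram_counts(values, bin_width):
    """Count the values in each bin [i * bin_width, (i + 1) * bin_width), keyed by i."""
    counts = {}
    for value in values:
        index = math.floor(value / bin_width)
        counts[index] = counts.get(index, 0) + 1
    # Note: a value exactly on a bin edge may land in the neighbouring bin
    # because of floating-point rounding.
    return dict(sorted(counts.items()))


# Made-up MOID values in au, for illustration only.
moid_values = [0.01, 0.02, 0.04, 0.07, 0.12, 0.31]
bin_width = 0.05

counts = histogram_counts(moid_values, bin_width)
for index, count in counts.items():
    left_edge = index * bin_width
    print(f"{left_edge:.2f} to {left_edge + bin_width:.2f} au: {'#' * count}")
```

    Only one variable goes in, and the x-axis is that same variable (MOID in au); the y-axis is the count per bin. With matplotlib, plt.hist(moid) followed by plt.xlabel('MOID (au)') draws the same picture.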

    Yesterday @Ali-Talbi said the homework was finished. If you are in the same team, I think there is still no clear answer to how many objects have moid_au < 0.05 and h_mag > 22, but you are getting there.

    The code you wrote does not work for me. I am still doing it slowly, rechecking the courses. 👣 👣

  • @beire_2018 Excellent, Beire, thank you. I like your solution of searching for another data format (.csv).
    However, I struggled with the JSON. I transformed it into a dataframe because it’s an object I’m more familiar with. Basically:

    # Create a dataframe from the JSON data: df
    df = pd.DataFrame.from_dict(data, orient='columns')

    I’m not so sure how much cleaning we need (forget about melting, etc). I simply make a variable with the two columns I need. The length of moid is the same as the dataframe, so no null values.
    moid = df['moid_au']

    token = 6cb535ac9eaa7b3169f35766f70ee969c60d77606010b6d5

    I skipped the data cleaning phase by using a newer, bigger dataset (Discovery_Statistics.csv)
    and got stuck at the statistical inference analysis.

    (current thinking: Chebyshev’s inequality, maybe calculating z-scores)
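
    A minimal sketch of both ideas, using only the standard library. The sample values are made up and the function names are hypothetical. A z-score says how many standard deviations a value lies from the mean, and Chebyshev’s inequality states that at most 1/k² of the values of any distribution can lie k or more standard deviations from the mean.

```python
import statistics


def z_scores(values):
    """Distance of each value from the mean, in population standard deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    if s == 0:
        raise ValueError("all values are identical, z-scores are undefined")
    return [(x - m) / s for x in values]


def chebyshev_bound(k):
    """Upper bound on the fraction of values at least k standard deviations from the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k**2)


# Made-up sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
scores = z_scores(sample)

k = 2
observed = sum(1 for z in scores if abs(z) >= k) / len(scores)
print(f"fraction with |z| >= {k}: {observed:.3f} (Chebyshev bound: {chebyshev_bound(k):.3f})")
```

    The bound is loose but needs no assumption about the shape of the distribution, which may help when the histogram is clearly not bell-shaped.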

  • @beire_2018 Thank you very much! I guess this is the result of the hours you spent last Thursday. I simply read the definition on Wikipedia and never found out what q_au_1 and q_au_2 were. I just updated that h_mag matters if it is bigger than 22. However, the instructions say to reduce everything to one dimension. Therefore I guess the hierarchy is: if it is close to Earth, then let’s look at the magnitude.

    I won’t be able to be at Digityser today, but maybe we can keep updating things after we get something? There is a communication channel Raptorate open, we are chatting there. Otherwise, can I join your slack group? Thanks.

  • @change09 Thanks for replying, that’s the meaning of posting!

    I agree with you because the definition on Wikipedia includes it: “Potentially hazardous asteroids (PHAs) are defined as having a minimum orbital intersection distance with Earth of less than 0.05 astronomical units (19.5 lunar distances) and an absolute magnitude of 22 or brighter”.

    However, the instructions said “reduce to one dimension”, so I ignored the rest for the moment.

  • I am currently at digitizer and hope to meet and talk about the project.
    This evening I plan on starting a new notebook and making a clean, finished version.
    I agree with your post, and here is why:

    designation = year + identifier of the object in space (ex: 2016 WF9)
    discovery_date = the date the NEOWISE space telescope first observed the object (discovery), as yyyy-mm-dd
    h_mag = absolute magnitude of the object, on a log scale where smaller values are brighter (ex: H (mag) = 20.1)
    = only objects larger than roughly 140 meters in diameter (absolute magnitude of 22 or brighter, H <= 22)
    = see :
    i_deg = inclination of the orbit in degrees °, the tilt of the object’s orbit around a body (irrelevant)
    = ex: Inclination 14.995° or i (deg): 15
    moid_au = minimum orbit intersection distance
    = An object is classified as a potentially hazardous object (PHO), that is, posing a possible risk to Earth,
    = if, among other conditions, its Earth MOID is less than 0.05 AU.
    = (MOID < 0.05)
    = ex: Earth MOID for 2016 WF9 = 0.0156 AU ~ MOID (au): 0.015 in dataset
    orbit_class = group name of the near-Earth orbit type, relative to Earth’s orbit around the Sun (irrelevant)
    period_yr = how many Earth years the object takes to complete its own full orbit
    = the time a given astronomical object takes to complete one orbit around another object
    pha = potentially hazardous asteroid (Y/N), binary data
    = ex: 2016 WF9 is a suspected extinct comet, classified as near-Earth object and potentially hazardous asteroid of the Apollo group
    q_au_1 = (AU) perihelion distance, the minimum distance from the Sun (Earth is at 1 AU)
    = q (au): 0.98 or Perihelion 0.9816 AU
    = the point where the body comes closest to the Sun
    q_au_2 = (AU) aphelion distance, the maximum distance from the Sun (Earth is at 1 AU)
    = Q (au): 4.76 or Aphelion 4.7614 AU
    = the point in the orbit where the celestial body is farthest from the Sun

    Here we can grab a better dataset and fix the missing values perhaps…

    working environment =
    provide the token = 6cb535ac9eaa7b3169f35766f70ee969c60d77606010b6d5

    This will open up the Jupyter notebook with Python 3 and all the latest updates and modules.
    Yesterday I played around and created the modules for sum, standard deviation etc…

  • I would like to add a small point here: please also consider the PHA factor, which is a combination of magnitude and MOID as per the definition on the NASA website.