
Create your first project


You'll complete this simple workflow:

  • Deploy the engine
  • Connect to your data
  • Run an analytics workload
  • Suspend the project

Before you begin

  • From your AI Unlimited admin, get these items:

• The IP address or hostname of the AI Unlimited manager

    • The environment variables for your cloud service provider

    For AWS, these are AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN.

  • Access the AI Unlimited manager and get your API key.


  • Connect to JupyterLab, open a notebook, and select the AI Unlimited kernel.

Note

If you don't have JupyterLab or the AI Unlimited kernel, see Jupyter installation options.

Connect and run your first workload

Tip

Run %help for details on all magic commands, or %help <command> for details on a specific one. You can also learn about the magic commands provided specifically by the AI Unlimited kernel.

  1. Connect to the AI Unlimited manager.

    For this simple workflow, connect without TLS, which means traffic between the notebook and the AI Unlimited manager is not encrypted. The connection variables are not detailed here because they are explained in the magic command reference.
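    Something like the following, assuming the %workspaces_config magic with host, apikey, and withtls parameters; run %help workspaces_config to confirm the exact syntax. The host and API key placeholders are the values you got earlier, and withtls=F disables TLS.

    ```
    %workspaces_config host=<Manager_IP_or_Hostname>, apikey=<Your_API_Key>, withtls=F
    ```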

  2. Create a new project.

    You can give the project any name. The command also identifies your cloud service provider (CSP).
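    One plausible form, assuming a %project_create magic with project and env parameters; run %help project_create for the exact syntax and any optional parameters, such as a project team.

    ```
    %project_create project=<Project_Name>, env=aws
    ```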

  3. Optionally, create an authorization object to store the CSP credentials.

    An authorization stores the CSP credentials the engine uses for external connectivity. Typically you create one shared by the project team or one for a single user. This sample workflow reads no external data source, so you can skip this step; a sketch follows in case you need one.
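    One plausible form, assuming a %project_auth_create magic with name, project, key, secret, and region parameters; run %help project_auth_create to confirm. The authorization name is a placeholder.

    ```
    %project_auth_create name=<Auth_Name>, project=<Project_Name>, key=<ACCESS_KEY_ID>, secret=<SECRET_ACCESS_KEY>, region=<REGION>
    ```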

    Replace ACCESS_KEY_ID, SECRET_ACCESS_KEY, and REGION with your values. The parameters shown are for AWS.

  4. Deploy the engine.

    Replace Project_Name with the name of the project you created in step 2. The size can be small, medium, large, or extralarge; the default is small.
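    For example, assuming a %project_engine_deploy magic with name and size parameters; run %help project_engine_deploy to confirm.

    ```
    %project_engine_deploy name=<Project_Name>, size=small
    ```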

    The deployment process takes a few minutes. It generates a password.

  5. Connect to the project.

    When prompted during the connection, enter the password generated by the deployment.
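    Assuming the %connect magic takes the project name (run %help connect to confirm); enter the generated password at the prompt that follows.

    ```
    %connect <Project_Name>
    ```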

  6. Run the sample workload.

    In this sample workload, you create two new tables, load sales data into them from CSV files included with the sample notebooks (for example, FILEPATH=notebooks/sql/data/salescenter.csv), visualize the results, and then drop the tables.

    Note

    Make sure you do not have tables named SalesCenter or SalesDemo in the selected database.

    a. Create a table to store the sales center data.
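    A sketch of the table definition, assuming salescenter.csv holds a sales center ID and name; adjust the column names and types to match the file.

    ```sql
    -- One row per sales center
    CREATE TABLE SalesCenter (
        Sales_Center_ID   INTEGER NOT NULL,
        Sales_Center_Name VARCHAR(255)
    )
    PRIMARY INDEX (Sales_Center_ID);
    ```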

    b. Load data into the SalesCenter table using the %dataload magic command.
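    Assuming the %dataload magic takes DATABASE, TABLE, and FILEPATH parameters (run %help dataload to confirm); the database name is a placeholder.

    ```
    %dataload DATABASE=<Database_Name>, TABLE=SalesCenter, FILEPATH=notebooks/sql/data/salescenter.csv
    ```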

    Note

    Unable to locate the salescenter.csv file? Download the file from GitHub Demo: Charting and Visualization Data.

    Verify that the data was inserted.

    c. Create a table with the sales demo data.
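    Again a sketch, assuming salesdemo.csv holds a sales center ID with units and sales amounts; adjust to match the file.

    ```sql
    -- One row per demo sales record
    CREATE TABLE SalesDemo (
        Sales_Center_ID INTEGER NOT NULL,
        Units           DECIMAL(15, 4),
        Sales           DECIMAL(15, 2)
    )
    PRIMARY INDEX (Sales_Center_ID);
    ```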

    d. Load data into the SalesDemo table using the %dataload magic command.
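    Same pattern as before, assuming salesdemo.csv ships alongside salescenter.csv.

    ```
    %dataload DATABASE=<Database_Name>, TABLE=SalesDemo, FILEPATH=notebooks/sql/data/salesdemo.csv
    ```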

    Note

    Unable to locate the salesdemo.csv file? Download the file from GitHub Demo: Charting and Visualization Data.

    Verify that the sales demo data was inserted successfully.

    Open the Navigator for your connection and verify that the tables were created. Run a row count on the tables to verify that the data was loaded.
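    For example, with standard SQL row counts; each count should match the number of data rows in its CSV file.

    ```sql
    SELECT COUNT(*) FROM SalesCenter;
    SELECT COUNT(*) FROM SalesDemo;
    ```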


    e. Use charting magic to visualize the result.


    Provide X and Y axes for your chart.
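    A sketch, assuming the %chart magic takes the X and Y column names from the last result set (run %help chart to confirm); the query and title are illustrative.

    ```sql
    -- Join the two tables to pair each center with its sales
    SELECT s.Sales_Center_Name, d.Sales
    FROM SalesCenter s
    JOIN SalesDemo d
        ON s.Sales_Center_ID = d.Sales_Center_ID;
    ```

    ```
    %chart Sales_Center_Name, Sales, title=Sales by Center
    ```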

    f. Drop the tables.
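    Standard SQL handles the cleanup:

    ```sql
    DROP TABLE SalesCenter;
    DROP TABLE SalesDemo;
    ```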

  7. Back up your project metadata and object definitions in your Git repository.

    Is "metadata" too ENG-ish for a new data scientist? Should we say "data object definitions"?

  8. Suspend the engine.
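    Assuming a %project_engine_suspend magic with a project parameter; run %help project_engine_suspend to confirm.

    ```
    %project_engine_suspend project=<Project_Name>
    ```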

You're done! You've run your first workload.