issue154:python

Below are the differences between two revisions of this page: issue154:python [2020/03/01 17:17] – created by auntiee | issue154:python [2020/03/05 14:46] (current version) – andre_domenech
As many of you who have been reading this column for a while might know, one of my hobbies is cooking. Since I'm the one in the family who cooks every night, I'd have to say that it's my favorite hobby.

My eighth article for Full Circle, back in FCM#34 (February 2010), was about creating a very small and generic cookbook database. The fact that it is 10 years to the month since I first wrote about the database program hasn't escaped me.

Anyway, I've started re-writing the program, pretty much from scratch, and again using Page as the GUI designer. I wanted to give it a newer, sleeker look, with a nicer interface, and there have been many things that I've wanted to add for years, but just never got around to doing. Things like adding a way to have a picture of the finished product, a way to grab a recipe from one of the many recipe websites I search, and more.

While I'm still in the process of development and the UI is still somewhat in flux, …

As I said, one of the things that I was both excited and worried about was the webpage scraper. I've tried writing a generic scraper before with limited success, but never could wrap my head around it properly. Part of the problem was that other things with a higher priority level would come up just as I was starting to be comfortable with the process, and I would have to put the project on hold. By the time I got around to revisiting the project, I had to spend a good while trying to remember what I was doing and how I had done it. I ended up so frustrated that I started searching the web for tips and tricks that others had posted that might give me a leg up on my learning process.

I stumbled upon a really nice project called "recipe-scrapers" that seemed to have been created just for this issue. It's a free and open-source library that provides custom scrapers for a large number of recipe websites, …

Let's look at how to install the library and utilize it.
+ | |||
The repository is located at https://… You can install the library with pip:

$ pip install recipe-scrapers

You can also clone or download the repository and, once you have it on your machine, go to the main folder (recipe-scrapers) and use pip to install it directly from the source...

$ pip install -e .
This is a good way to install it if you are interested in how the program works and if you want to write your own scrapers.

Now, open your favorite IDE or editor and create a new file. Let's call it "scrapertest.py".

Of course, the first thing you need to do is import the library:

from recipe_scrapers import scrape_me

Next, you will need a recipe page that you want to scrape. You need to find one that is a single recipe, not a category page. For this tutorial, we will use a page from Allrecipes.com that provides the recipe for Asian Orange Chicken: https://…
The next thing that we should do is create a variable to hold the URL of the site page...

site = 'https://…'

Now we create an instance of the scraper:

scraper = scrape_me(site)

Once we have that done, we can start grabbing some of the information that the scraper comes back with. Each bit of information is handled by a separate method.

Note: Some scrapers may provide more or less information, depending on the site and whether the author of the scraper included it.

From the above code, we will be able to get the recipe title, the total amount of time that the recipe will take to prepare, the number of servings it will make (yields), a list of the ingredients, …

When we run the program, the output looks like that shown on the next page, top.
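Since the snippets above are only fragments, here is a minimal, self-contained sketch of what the whole test script might look like. `FakeScraper` is a hypothetical stand-in I wrote so the example runs offline; with the real library you would delete it and use `scraper = scrape_me(site)` instead. The sample values are invented, not taken from the Allrecipes page.

```python
# Hypothetical stand-in for the object returned by recipe_scrapers.scrape_me().
# It exposes the same separate methods the article describes: title(),
# total_time(), yields(), and ingredients().
class FakeScraper:
    def title(self):
        return 'Asian Orange Chicken'

    def total_time(self):
        # Total preparation time in minutes (sample value)
        return 75

    def yields(self):
        return '8 servings'

    def ingredients(self):
        # A Python list of ingredient strings (sample values)
        return ['1 1/2 cups water', '2 tablespoons orange juice']


def show_recipe(scraper):
    # Each bit of information is handled by a separate method on the scraper.
    print(f'Title: {scraper.title()}')
    print(f'Total time: {scraper.total_time()} minutes')
    print(f'Yields: {scraper.yields()}')
    print(f'Ingredients: {scraper.ingredients()}')


show_recipe(FakeScraper())
```

With the real library, the only change is replacing `FakeScraper()` with `scrape_me(site)` for a supported recipe URL.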
It's obvious that the ingredients come back as a Python list, so let's change the program a little bit to make the data a bit more readable. Comment out the line that prints the ingredients as a "glob" and replace it with...

# print(f'Ingredients: {scraper.ingredients()}')
ingredients = scraper.ingredients()
print('Ingredients:')
for ing in ingredients:
    print(f'    {ing}')

With this small change, our output now looks like that shown far right.
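To see the difference the loop makes without the library installed, here is a quick sketch with a hard-coded sample list standing in for `scraper.ingredients()` (the ingredient strings are invented):

```python
# Sample data standing in for scraper.ingredients()
ingredients = ['1 1/2 cups water', '2 tablespoons orange juice',
               '1/4 cup lemon juice']

# The original "glob" print: the whole Python list on one line
print(f'Ingredients: {ingredients}')

# The more readable version: one indented ingredient per line
print('Ingredients:')
for ing in ingredients:
    print(f'    {ing}')
```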
Now, let's make the program a bit more usable by allowing the user to enter the URL at runtime, rather than hard-coding it. Comment out the line that assigns the URL (site = '') and replace it with...

site = input('Please enter the website URL to scrape (blank line to quit) -> ')

if site != '':
    # site = 'https://…'
    scraper = scrape_me(site)

Be sure to indent the rest of the code so that it all falls under the if statement.
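Interactive `input()` calls are awkward to test, so one way to sketch the same blank-line guard is to pull it into a small function. The name `scrape_if_given` and the lambda are my own illustration, not part of the article's code; the lambda simply stands in for `scrape_me`.

```python
def scrape_if_given(site, scrape_fn):
    """Call scrape_fn(site) when a URL was entered; return None for a blank line."""
    if site != '':
        return scrape_fn(site)
    return None


# Usage with a dummy scrape function standing in for scrape_me:
result = scrape_if_given('https://example.com/recipe',
                         lambda url: f'scraper for {url}')
print(result)                                  # scraper for https://example.com/recipe
print(scrape_if_given('', lambda url: url))    # None (blank line means quit)
```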
For this run, we'll use a different known-good recipe page, again from Allrecipes: https://…

Now, when you run the program with the new URL, your output looks like this:

With a bit more cleanup of the output portion of the code, it will be pretty nice. However, what happens when you enter a website that is not one of the sites supported by the library? Let's take a look by trying a site that I know is not supported (below): https://…
This error is easy to avoid. All of the sites that are supported are stored in a dictionary named SCRAPERS. What we will want to do is grab the domain from the URL and see if it is in the SCRAPERS dictionary. We can do that by importing the urlparse function from the standard library…

from urllib.parse import urlparse

Be sure to place this at the top of the file, just under the other import statement. The existing code will be shown here as 'not bold' and the new code as 'bold' (top right).

Again, be sure to match the indentation level of the rest of the code. Finally, at the very end of the code, add the following two lines (below).
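The domain check itself can be sketched with the standard library alone. In the real library, SCRAPERS maps host names to scraper classes; here a tiny hypothetical dictionary stands in for it, and the helper name `is_supported` is my own.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the library's SCRAPERS dictionary,
# which maps host names to scraper classes.
SCRAPERS = {
    'allrecipes.com': 'AllRecipesScraper',
    'bbcgoodfood.com': 'BBCGoodFoodScraper',
}


def is_supported(site):
    # Grab the domain from the URL and strip a leading 'www.' so it
    # matches the bare host names used as dictionary keys.
    host = urlparse(site).netloc
    if host.startswith('www.'):
        host = host[4:]
    return host in SCRAPERS


print(is_supported('https://www.allrecipes.com/recipe/61024/'))  # True
print(is_supported('https://www.example.com/some-recipe/'))      # False
```

If the check fails, the program can print a "not supported" message instead of crashing inside the scraper.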
Now, when you run the program using the unsupported URL, you'll see the following…

Please enter the website URL to scrape (blank line to quit) ->

Sorry, that website is not currently supported.

That's it. This base code can easily be worked into a GUI form as well. Here's a shot of what my GUI scraper form looks like.
As I usually do, I've put the code up on Pastebin at https://…

Until next time,

Keep coding!